nextcloud-mcp-server

Author	SHA1	Message	Date
Chris Coutinho	578de4d7d6	feat(observability): Add comprehensive monitoring with Prometheus and OpenTelemetry - Add Prometheus metrics for HTTP, MCP tools, Nextcloud API, OAuth, vector sync, and DB operations - Add OpenTelemetry distributed tracing with OTLP export - Add structured JSON logging with trace context correlation - Add ObservabilityMiddleware for automatic HTTP instrumentation - Add app_name attribute to all client classes for per-app metrics - Add configuration for metrics, tracing, and logging via environment variables - Add documentation in docs/observability.md - Fix graceful degradation when tracing is disabled (default state) - Fix uvicorn logging configuration to use observability formatters 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-09 08:54:04 +01:00
Chris Coutinho	857d8f2152	feat: add Qdrant local mode support with in-memory and persistent storage Adds flexible Qdrant deployment modes to reduce infrastructure requirements for local development and smaller deployments: Configuration Changes: - Add QDRANT_LOCATION environment variable (mutually exclusive with QDRANT_URL) - Three modes: network (URL), in-memory (:memory:, default), persistent (file path) - Settings dataclass validation via __post_init__ ensures mutual exclusivity - API key warning when set in local mode (ignored, only for network mode) Client Initialization: - Auto-detect mode: network (url + api_key) vs local (:memory: or path=) - In-memory: AsyncQdrantClient(":memory:") - zero config default - Persistent: AsyncQdrantClient(path="/app/data/qdrant") - file storage - Network: AsyncQdrantClient(url, api_key) - production mode Docker Compose Updates: - Qdrant service moved to optional profile (--profile qdrant) - MCP service uses QDRANT_LOCATION=:memory: by default - Added mcp-data volume for persistent storage (/app/data) - No hard dependency on qdrant service Documentation: - Comprehensive configuration guide in docs/configuration.md - All three modes documented with pros/cons - Docker Compose examples for each mode - Environment variable reference table Tests: - 13 new config validation tests (mutual exclusivity, defaults, warnings) - Persistent mode integration test (create, close, reopen, verify persistence) - All 82 unit tests + 5 smoke tests pass Breaking Change: - Default changed from QDRANT_URL=http://qdrant:6333 to QDRANT_LOCATION=:memory: - Simplifies local development (no external service needed) - Production deployments: explicitly set QDRANT_URL or QDRANT_LOCATION Related: ADR-007 background vector sync implementation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-09 07:07:07 +01:00
Chris Coutinho	fdd82f59e2	feat: implement semantic search tool and fix vector sync issues (ADR-007 Phase 3) Completes the ADR-007 implementation by adding user-facing semantic search functionality. Previous phases implemented scanner and processor for background indexing; this adds the query interface. Changes: - Add nc_notes_semantic_search MCP tool for natural language queries - Fix Qdrant point IDs to use UUIDs instead of strings (was causing 400 errors) - Reduce scan interval default from 1 hour to 5 minutes for faster updates - Add SemanticSearchResult and SemanticSearchNotesResponse models - Implement dual-phase authorization (Qdrant filter + Nextcloud API verification) The semantic search enables finding notes by meaning rather than exact keywords, using vector embeddings to understand query intent. Point ID fix resolves critical bug where all document indexing failed with "invalid point ID" errors. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-08 21:51:12 +01:00
Chris Coutinho	8f45e996e8	feat: implement vector sync scanner and processor (ADR-007 Phase 2) Implements background vector database synchronization using anyio TaskGroups for BasicAuth mode with single-user credentials. Scanner Implementation: - Periodic document discovery (hourly, configurable) - Timestamp-based change detection (Nextcloud vs Qdrant) - Wake event for immediate scanning on-demand - Supports both initial sync (all docs) and incremental sync (changes only) - Detects deleted documents and queues for removal Processor Implementation: - Concurrent document processing pool (3 workers default) - I/O-bound embedding generation via Ollama API - Retry logic with exponential backoff (3 retries) - Document chunking (512 words, 50-word overlap) - Handles both index and delete operations - Upserts vectors to Qdrant with rich metadata App Lifespan Integration: - Extended AppContext with background task state - Modified app_lifespan_basic() to start tasks via anyio TaskGroups - Graceful shutdown with coordinated task cancellation - Only activates when VECTOR_SYNC_ENABLED=true Embedding Service: - OllamaEmbeddingProvider with TLS support - Singleton pattern for shared client instances - Batch embedding support for efficiency - Auto-detects embedding dimension (768 for nomic-embed-text) Qdrant Client: - Async client wrapper with singleton pattern - Auto-creates collection on first use - COSINE distance metric for semantic similarity - Integrates with embedding service for dimension detection Health Check Enhancement: - Added Qdrant status check to /health/ready endpoint - Only checks when VECTOR_SYNC_ENABLED=true - 2-second timeout for health probe - Reports connection errors with details Configuration: - VECTOR_SYNC_ENABLED: Enable background sync - VECTOR_SYNC_SCAN_INTERVAL: Scanner frequency (3600s default) - VECTOR_SYNC_PROCESSOR_WORKERS: Concurrent processors (3 default) - QDRANT_URL, QDRANT_API_KEY, QDRANT_COLLECTION: Vector DB config - OLLAMA_BASE_URL, OLLAMA_EMBEDDING_MODEL: Embedding service config Dependencies Added: - qdrant-client>=1.7.0: Vector database client Docker Compose: - Added Qdrant service with health check - Exposed ports 6333 (REST) and 6334 (gRPC) - Configured MCP service with vector sync environment - Added qdrant-data volume for persistence Known Issue: - FastMCP lifespan not triggering for streamable-http transport - Background tasks will start once lifespan integration is complete - Lifespan triggers on MCP session establishment, not server startup Related: ADR-007 Background Vector Database Synchronization 🤖 Generated with Claude Code (https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-08 21:14:38 +01:00
Chris Coutinho	9fab6cb550	feat: Implement ADR-005 unified token verifier to eliminate token passthrough vulnerability Replace two non-compliant token verifiers (NextcloudTokenVerifier and ProgressiveConsentTokenVerifier) with a single UnifiedTokenVerifier that properly validates token audiences per MCP Security Best Practices specification. The previous implementation had a critical security vulnerability where tokens intended for the MCP server were passed directly to Nextcloud APIs without proper audience validation (token passthrough anti-pattern). This violates OAuth 2.0 security principles and the MCP specification. Changes: - Add UnifiedTokenVerifier supporting two compliant modes: * Multi-audience mode (default): Validates tokens contain BOTH MCP and Nextcloud audiences, enabling direct use without exchange * Token exchange mode (opt-in): Validates MCP audience only, exchanges for Nextcloud tokens via RFC 8693 with caching to minimize latency - Remove token passthrough vulnerability from context.py and context_helper.py - Implement token exchange caching (5-minute TTL default) to reduce network calls - Add required environment variables for audience validation: * NEXTCLOUD_MCP_SERVER_URL - MCP server URL (used as audience) * NEXTCLOUD_RESOURCE_URI - Nextcloud resource identifier * TOKEN_EXCHANGE_CACHE_TTL - Cache TTL for exchanged tokens - Update docker-compose.yml with resource URI configuration for both OAuth modes - Add comprehensive test suite (29 tests) covering both authentication modes - Remove legacy NextcloudTokenVerifier and ProgressiveConsentTokenVerifier Security improvements: - Eliminates token passthrough anti-pattern - Enforces proper audience separation between MCP and Nextcloud - Complies with MCP Security Best Practices and RFC 8707/8693 - Maintains performance with token exchange caching Test results: 65/65 unit tests passed, 5/5 smoke tests passed 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-05 18:53:14 +01:00
Chris Coutinho	15113dbb03	fix: remove Hybrid Flow, make Progressive Consent default (ADR-004) Eliminates scope escalation security vulnerability by removing Hybrid Flow and making Progressive Consent the only OAuth mode. Changes: - Delete oauth_callback() and oauth_token() (Hybrid Flow only, ~314 lines) - Fix scope flows: Flow 1 requests resource scopes, Flow 2 requests identity+offline - Remove ENABLE_PROGRESSIVE_CONSENT flag (always enabled in OAuth mode) - Update documentation to reflect Progressive Consent as default - Delete test_adr004_hybrid_flow.py test file - Remove unused variables (ruff lint fixes) Security improvements: - No scope escalation: client gets exactly what it requests - Clear separation: MCP session tokens vs Nextcloud offline tokens - OAuth2 compliant: follows best practices for scope handling 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-04 00:26:07 +01:00
Chris Coutinho	71e77e95bc	refactor: integrate token exchange into unified get_client() pattern Resolves the token exchange implementation gap where get_session_client() was implemented but never used by tools. Unifies token acquisition into a single async get_client() method that handles both pass-through and token exchange modes transparently. Core Changes: - Make get_client() async and merge token exchange logic into it - Remove scopes parameter from token exchange (Nextcloud doesn't support OAuth scopes) - Update all 8 tool modules to use await get_client(ctx) - Fix provisioning decorator to skip checks in BasicAuth mode Token Acquisition Modes: 1. BasicAuth: Returns shared client (no token operations) 2. OAuth pass-through (default): Verifies and passes Flow 1 token to Nextcloud 3. OAuth token exchange (opt-in): Exchanges Flow 1 token for ephemeral token via RFC 8693 Key Architectural Clarifications: - Progressive Consent (Flow 1/2) = Authorization architecture - Token Exchange = Token acquisition pattern during tool execution - Refresh tokens from Flow 2 are NEVER used for tool calls (only background jobs) - Nextcloud scopes are "soft-scopes" enforced by MCP server, not IdP Documentation Updates: - ADR-004: Added comprehensive token acquisition patterns section - CRITICAL-TOKEN-EXCHANGE-PATTERN.md: Updated to reflect implementation status - CLAUDE.md: Updated architectural patterns with async get_client() Testing: - All 36 unit tests passing - All 4 smoke tests passing (BasicAuth mode) - Linting issues fixed (ruff) Configuration: ENABLE_TOKEN_EXCHANGE=false (default) - pass-through mode ENABLE_TOKEN_EXCHANGE=true (opt-in) - token exchange mode 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-03 20:33:56 +01:00
Chris Coutinho	a36038422b	feat: Add text processing background worker for telling client about progress	2025-10-25 19:52:45 +02:00
Chris Coutinho	2147fc1696	refactor: Transform document parsing into pluggable processor architecture Refactors PR #190's hardcoded Unstructured.io integration into a flexible, extensible plugin system supporting multiple text extraction engines. - `DocumentProcessor` ABC: Abstract interface for all processors - `ProcessorRegistry`: Central registry for discovery and routing - `ProcessingResult`: Standardized output format across processors - `UnstructuredProcessor`: Refactored from `UnstructuredClient` - `TesseractProcessor`: Local OCR for images (lightweight alternative) - `CustomHTTPProcessor`: Generic wrapper for custom HTTP APIs - New `get_document_processor_config()` returns structured config - Supports enabling/disabling individual processors - Per-processor configuration via environment variables - Breaking Change: `ENABLE_UNSTRUCTURED_PARSING` replaced with: - `ENABLE_DOCUMENT_PROCESSING=true/false` (master switch) - `ENABLE_UNSTRUCTURED=true/false` (per-processor) - `ENABLE_TESSERACT=true/false` - `ENABLE_CUSTOM_PROCESSOR=true/false` - `parse_document()` now uses `ProcessorRegistry` - Auto-selects appropriate processor based on MIME type - Processor priority system (Unstructured=10, Tesseract=5, Custom=1) - `initialize_document_processors()` registers processors at startup - Integrated into both BasicAuth and OAuth lifespans - Graceful degradation if processors fail to initialize ```env ENABLE_DOCUMENT_PROCESSING=false ENABLE_UNSTRUCTURED=false UNSTRUCTURED_API_URL=http://unstructured:8000 UNSTRUCTURED_STRATEGY=auto # auto|fast|hi_res UNSTRUCTURED_LANGUAGES=eng,deu ENABLE_TESSERACT=false TESSERACT_LANG=eng ENABLE_CUSTOM_PROCESSOR=false CUSTOM_PROCESSOR_URL=http://localhost:9000/process CUSTOM_PROCESSOR_TYPES=application/pdf,image/jpeg ``` - Removed: `tests/test_unstructured_config.py` (legacy tests) - Added: `tests/unit/test_document_processor_config.py` - 7 unit tests for new config system - Tests individual and multi-processor configurations - Added: - `nextcloud_mcp_server/document_processors/__init__.py` - `nextcloud_mcp_server/document_processors/base.py` - `nextcloud_mcp_server/document_processors/registry.py` - `nextcloud_mcp_server/document_processors/unstructured.py` - `nextcloud_mcp_server/document_processors/tesseract.py` - `nextcloud_mcp_server/document_processors/custom_http.py` - `tests/unit/test_document_processor_config.py` - Modified: - `nextcloud_mcp_server/config.py` - New plugin config system - `nextcloud_mcp_server/app.py` - Processor initialization - `nextcloud_mcp_server/utils/document_parser.py` - Uses registry - `nextcloud_mcp_server/server/webdav.py` - Import updates - `env.sample` - New configuration format - `docker-compose.yml` - (profile changes from previous work) - Removed: - `nextcloud_mcp_server/client/unstructured_client.py` - Replaced by UnstructuredProcessor - `tests/test_unstructured_config.py` - Replaced with new tests ✅ Extensible: Add processors without modifying core code ✅ Testable: Mock processors for unit tests ✅ Configurable: Enable only needed processors ✅ Flexible: Choose fast (Tesseract) vs accurate (Unstructured) ✅ Opt-in: Disabled by default, no mandatory dependencies Users upgrading from PR #190 need to update environment variables: ```bash ENABLE_UNSTRUCTURED_PARSING=true ENABLE_DOCUMENT_PROCESSING=true ENABLE_UNSTRUCTURED=true ``` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-25 19:28:35 +02:00
yuisheaven	64649c902d	Merge branch 'master' into feature/introduce_files_parsing_with_unstructured_service_for_webdav_files_retrieval	2025-10-21 20:37:00 +02:00
Chris Coutinho	5e829fc7e7	refactor: Unify logging & remove factory deployment	2025-10-18 01:15:06 +02:00
yuisheaven	3ff6346c03	ran ruff format via uv	2025-10-05 02:16:42 +02:00
yuisheaven	c9a687171a	added envs for unstructured to control OCR quality and OCR languages	2025-10-04 05:21:02 +02:00
yuisheaven	76dce41ed9	added first version of the new document_parser utility and added it to the webdav file retrieval logic	2025-10-04 04:28:24 +02:00
Chris Coutinho	a2c78ee1ef	test: Add tests for MCP tools and resources	2025-07-27 17:43:55 +02:00
Chris Coutinho	ee32a1bfe8	feat: Switch to using async client	2025-06-06 18:41:57 +02:00
Chris Coutinho	e93eb9d302	fix: Configure logging	2025-05-25 11:46:41 +02:00
Chris Coutinho	b0012d6e4a	wip Move testing to container	2025-05-10 12:47:10 +02:00
Chris Coutinho	0d8666a2d7	Initial commit	2025-05-04 23:24:55 +02:00

19 Commits
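
Commit 8f45e996e8 describes document chunking with 512-word chunks and a 50-word overlap. The block below is only a minimal sketch of that idea, assuming chunks are built from whitespace-separated words. The function name `chunk_words` and its parameters are hypothetical and are not taken from the repository, whose actual chunker may differ.

```python
def chunk_words(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of whitespace-separated words.

    Hypothetical helper: the defaults mirror the 512-word / 50-word-overlap
    figures quoted in the commit message, nothing more.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # each chunk starts this many words after the previous one
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text, so no
        # trailing chunk made up purely of overlap is emitted.
        if start + chunk_size >= len(words):
            break
    return chunks
```

With the defaults, consecutive chunks share 50 words, so a sentence that straddles a chunk boundary still appears whole in at least one chunk in most cases. Empty input yields an empty list.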