nextcloud-mcp-server

Author	SHA1	Message	Date
Chris Coutinho	3aa7128f45	feat: add chunk position tracking to vector indexing and search Track character offsets (start_offset, end_offset) for each chunk in vector database metadata, enabling precise chunk highlighting in visualization pane. Changes: - processor.py: Store chunk_start_offset and chunk_end_offset in Qdrant metadata - processor.py: Added metadata_version=2 to indicate position tracking support - search/semantic.py: Return chunk positions from search results - server/semantic.py: Expose chunk positions in API responses (SemanticSearchResult) Enables viz pane to: 1. Display exact matched chunk with surrounding context 2. Highlight the precise portion of text that matched the query 3. Build user trust by showing what the RAG system actually retrieved Position tracking uses ChunkWithPosition dataclass from document_chunker.py which provides character-accurate offsets in the original document. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-17 06:47:58 +01:00
Chris Coutinho	c28fc955ca	Merge origin/master into feature/bm25 Resolved conflicts: - viz_routes.py: Kept bm25's extract_dense_vector() function for robust vector handling - hybrid.py: Removed (bm25 uses native Qdrant RRF fusion instead) - uv.lock: Regenerated after accepting master's dependencies This merge brings in: - RAG evaluation framework (ADR-013) - Performance optimizations (double-fetch elimination) - Migration from asyncio to anyio - OpenTelemetry tracing improvements - Notes app enhancements 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 11:52:40 +01:00
Chris Coutinho	02700a8e2c	perf: Eliminate double-fetching in semantic search sampling Performance optimization that removes redundant verification step and makes content fetching parallel in nc_semantic_search_answer tool. Changes: - Remove verification.py module (only had 1 caller) - Refactor nc_semantic_search to do inline deduplication instead of calling verify_search_results() - Migrate verification patterns (anyio task group, semaphore limiting) to nc_semantic_search_answer's content fetching - Change content fetching from sequential loop to parallel execution Performance impact: - Before: 10 API calls (5 parallel verification + 5 sequential content) = ~5.5s overhead - After: 5 API calls (parallel content fetch) = ~0.5s overhead - Result: 50% fewer API calls, ~10x faster for sampling operations Technical details: - Uses anyio.create_task_group() for structured concurrency - Semaphore limiting (max_concurrent=20) prevents connection pool exhaustion - Index-based storage maintains result ordering - Expected failures (deleted notes) logged at debug level - Deduplication handles hybrid search returning same doc from dense + sparse 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 10:25:04 +01:00
Chris Coutinho	6fe5596c13	feat: Implement BM25 hybrid search with native Qdrant RRF fusion Replace custom keyword/fuzzy search algorithms with industry-standard BM25 sparse vectors, combined with dense semantic vectors using Qdrant's native Reciprocal Rank Fusion (RRF). This consolidates search architecture and improves relevance for both semantic and keyword queries. Key changes: - Add fastembed dependency for BM25 sparse vector generation - Update Qdrant collection schema to support named vectors (dense + sparse) - Create BM25SparseEmbeddingProvider using FastEmbed's Qdrant/bm25 model - Implement BM25HybridSearchAlgorithm with native Qdrant RRF prefetch - Update document processor to generate both dense and sparse embeddings - Simplify nc_semantic_search() tool to use BM25 hybrid only - Remove legacy keyword.py, fuzzy.py, and custom hybrid.py (736 lines) - Update ADR-014 with implementation notes and test results Benefits: - Consolidated architecture (single Qdrant database) - Native database-level RRF fusion (more efficient) - Industry-standard BM25 (replaces brittle custom keyword search) - Better relevance across semantic and keyword queries - Simplified codebase (-285 net lines) Tests: All 125 tests passing (118 unit, 7 integration) Implements ADR-014: Replace Custom Keyword Search with BM25 Hybrid Search 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 06:59:44 +01:00
Chris Coutinho	42376483ab	refactor: Optimize Nextcloud access verification with centralized filtering Move access verification from individual search algorithms to final output stage, eliminating redundant API calls and improving performance. ## Changes New: - `search/verification.py`: Centralized verification using anyio task groups - Deduplicates results by (doc_id, doc_type) before verification - Verifies all unique documents in parallel using structured concurrency - Filters out inaccessible documents in single pass Modified Search Algorithms: - `search/semantic.py`: Removed _deduplicate_and_verify() and _verify_document_access() - `search/keyword.py`: Removed _verify_access() and parallel verification - `search/fuzzy.py`: Removed _verify_access() and parallel verification - `search/hybrid.py`: Removed nextcloud_client parameter passing All algorithms now return unverified results from Qdrant payload. Modified Output Stages: - `server/semantic.py`: Added verify_search_results() call after search - `auth/viz_routes.py`: Added verify_search_results() call after search Both endpoints now verify access once at final stage with deduplication. ## Performance Impact Before: - Hybrid mode (limit=10): 30 API calls (10 per algorithm × 3 algorithms) - Single algorithm: 10-20 API calls (with verification buffer) After: - Hybrid mode (limit=10): 10 API calls (deduplicated verification) - Single algorithm: 10 API calls (deduplicated verification) Performance Gain: 3x reduction in API calls for hybrid search ## Architecture Benefits - Separation of concerns: Algorithms handle scoring, output stage handles security - Deduplication: Each document verified exactly once - Parallel execution: All verifications run concurrently via anyio task groups - Consistency: Same verification logic across MCP tools and viz endpoints 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-15 06:21:06 +01:00
Chris Coutinho	2a078093ed	refactor!: Make all search algorithms query Qdrant payload, not Nextcloud BREAKING CHANGE: Search algorithms now require Qdrant to be populated. Vector sync must be enabled and documents indexed for search to work. - Keyword and fuzzy search now query Qdrant scroll API for title/excerpt - Remove inefficient Nextcloud API fetching pattern - Add optional Nextcloud verification for security - Deduplicate by (doc_id, doc_type) tuple, keeping chunk_index=0 - Align with document processor pattern that already stores text in Qdrant	2025-11-15 01:56:41 +01:00
Chris Coutinho	f3bdb8b885	feat: Update nc_semantic_search tool with algorithm selection Implements ADR-012 by adding multi-algorithm support to the MCP tool. Key changes: - Added algorithm parameter: "semantic"|"keyword"|"fuzzy"|"hybrid" (default: "hybrid") - Added weight parameters for hybrid mode configuration - Replaced direct Qdrant/embedding calls with search module abstractions - Updated docstring to describe all four algorithms - Simplified implementation: ~50 lines vs ~150 lines (67% reduction) - Better error handling for missing vector sync Algorithm selection: - semantic: Pure vector similarity (requires VECTOR_SYNC_ENABLED=true) - keyword: Token-based matching with weighted title/content scoring - fuzzy: Character overlap for typo tolerance - hybrid: RRF fusion with configurable weights (default: 0.5/0.3/0.2) Backward compatibility: - Tool name unchanged (nc_semantic_search) - New parameters have sensible defaults - Existing clients get hybrid search automatically (better than pure semantic) - search_method field in response reflects actual algorithm used Weight validation: - Performed in HybridSearchAlgorithm constructor - Must sum to ≤1.0 and all non-negative - At least one weight must be > 0 - Clear error messages on validation failure Next: Update viz pane to use same algorithms 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-15 00:25:55 +01:00
Chris Coutinho	c3023d2cc3	feat: Complete Phase 5 - Instrument all 93 MCP tools Applied @instrument_tool decorator to all 86 remaining tools across 8 server files. Instrumented files: - calendar.py: 16 tools - contacts.py: 7 tools - deck.py: 25 tools - webdav.py: 11 tools - tables.py: 6 tools - sharing.py: 5 tools - cookbook.py: 13 tools - semantic.py: 3 tools Total: 93 tools instrumented (7 in notes.py + 86 in other files) These metrics populate: - MCP Tool Calls panel (by tool name and status) - MCP Tool Duration panel (histogram) - MCP Tool Errors panel (by tool name and error type) This completes PR #295 - All 5 phases of metrics instrumentation done: ✅ Phase 1: Queue size metrics (2 locations) ✅ Phase 2: Health checks (1 location) ✅ Phase 3: Database operations (3 methods) ✅ Phase 4: OAuth token metrics (3 locations) ✅ Phase 5: MCP tool metrics (93 tools) All 34 dashboard panels now have data sources.	2025-11-13 16:58:44 +01:00
Chris Coutinho	4ea5ed72d4	feat: Add Grafana dashboard and vector sync metric instrumentation Implement comprehensive observability for vector database synchronization with Grafana dashboard and Prometheus metrics. ## Part 1: Grafana Dashboard Created all-in-one operations dashboard with 7 rows and 34 panels: ### Dashboard Structure: - Overview Row: Request rate, error rate, P95 latency, active requests - HTTP Metrics (RED): Request/error rates by endpoint, latency percentiles - MCP Tools: Call volume, error rates, execution duration by tool - Nextcloud API: API calls/latency by app, retry patterns - OAuth & Authentication: Token validations, exchanges, cache hit rate - Dependencies & Health: Status for Nextcloud/Qdrant/Keycloak/Unstructured - Vector Sync: Processing throughput, queue depth, Qdrant operations ### Helm Chart Integration: - Added dashboard-configmap.yaml template for automatic provisioning - Configured Grafana sidecar auto-discovery (label: grafana_dashboard="1") - Added dashboards configuration section in values.yaml (opt-in) - Updated Chart.yaml with dashboard annotations - Enhanced NOTES.txt with dashboard deployment instructions - Comprehensive documentation in dashboards/README.md Dashboard supports dynamic filtering via variables: - datasource: Prometheus data source selection - namespace: Filter by Kubernetes namespace - pod: Multi-select pod filtering - interval: Query interval (1m/5m/10m/30m/1h) ## Part 2: Vector Sync Metric Instrumentation Implemented metric recording throughout vector sync pipeline: ### metrics.py: Added convenience functions: - record_vector_sync_scan() - Track documents per scan - record_vector_sync_processing() - Track processing duration/status - record_qdrant_operation() - Track database operations - update_vector_sync_queue_size() - Track queue depth ### scanner.py: - Record number of documents found in each scan - Enables monitoring of scan throughput ### processor.py: - Record processing duration for each document - 
Track success/failure status with timing - Record Qdrant upsert/delete operations - Handle all code paths (success, deletion, error) ### semantic.py: - Wrap Qdrant query_points with try/except - Record search operation success/failure ## Metrics Exposed: - mcp_vector_sync_documents_scanned_total - mcp_vector_sync_documents_processed_total{status} - mcp_vector_sync_processing_duration_seconds (histogram) - mcp_vector_sync_queue_size (gauge) - mcp_qdrant_operations_total{operation,status} This enables monitoring of: - Scan and processing throughput - Processing latency (P50/P95/P99) - Error rates for processing and Qdrant operations - Queue depth trends - Complete observability of vector sync pipeline ## Testing: Verified locally that metrics are recorded correctly: - 36 documents scanned - 3 documents processed (avg 7.5s each) - 3 successful Qdrant upsert operations - Search operations tracked ## Deployment: Enable dashboard provisioning in Helm values: ```yaml dashboards: enabled: true grafanaFolder: "Nextcloud MCP" ``` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-13 11:49:20 +01:00
Chris Coutinho	e575c8e57b	feat(vector): Support multiple embedding models with auto-generated collection names This PR enables safe switching between embedding models and multi-server deployments by implementing auto-generated Qdrant collection names based on deployment ID and model name. ## Problem Previously, all deployments used a single hardcoded collection name "nextcloud_content", which caused two critical issues: 1. Dimension mismatches when switching models: Changing OLLAMA_EMBEDDING_MODEL (e.g., nomic-embed-text at 768D → all-minilm at 384D) would cause runtime errors as vectors couldn't be inserted into a collection with incompatible dimensions. 2. Collection collisions in multi-server setups: Multiple MCP servers sharing a single Qdrant instance would overwrite each other's data, making horizontal scaling impossible. ## Solution ### Auto-Generated Collection Naming Collections are now automatically named using the pattern: `{deployment-id}-{model-name}` Deployment ID: Uses `OTEL_SERVICE_NAME` if configured (and not default value), otherwise falls back to `hostname` for simple Docker deployments. Model Name: From `OLLAMA_EMBEDDING_MODEL` with path separators sanitized. Examples: - `my-mcp-server-nomic-embed-text` (with OTEL_SERVICE_NAME=my-mcp-server) - `mcp-container-all-minilm` (simple Docker, hostname=mcp-container) Override: Users can still set `QDRANT_COLLECTION` explicitly to bypass auto-generation for backward compatibility. ### Dimension Validation Added startup validation that checks collection dimensions match the embedding service. 
If a mismatch is detected, the server fails fast with a clear error message explaining: - Expected vs actual dimensions - Likely cause (model change) - Solutions (delete collection, use different name, or revert model) ### Improved Sampling Error Handling Enhanced MCP sampling rejection handling to treat user rejections as normal behavior rather than errors: - User rejections ("rejected", "denied") → INFO log, no traceback - Unsupported clients → INFO log, no traceback - Other MCP errors → WARNING log, no traceback - Unexpected errors → ERROR log WITH traceback This aligns with the MCP specification where clients SHOULD prompt users for approval/denial of sampling requests. ## Changes ### Core Implementation - nextcloud_mcp_server/config.py: Added `get_collection_name()` method with deployment ID detection and model name sanitization - nextcloud_mcp_server/vector/qdrant_client.py: Dimension validation on collection open with helpful error messages - nextcloud_mcp_server/vector/{scanner,processor}.py: Updated to use `get_collection_name()` - nextcloud_mcp_server/auth/userinfo_routes.py: Vector sync status uses `get_collection_name()` - nextcloud_mcp_server/server/semantic.py: - Updated semantic search tools to use `get_collection_name()` - Improved sampling rejection error handling (McpError vs Exception) ### Documentation - docs/semantic-search-architecture.md: New comprehensive architecture document (557 lines) covering background sync, semantic search flow, RAG implementation, and deployment modes - docs/configuration.md: Added detailed "Qdrant Collection Naming" section with examples and multi-server deployment guidance - docker-compose.yml: Added comments explaining collection naming behavior - README.md: Updated semantic search descriptions to clarify experimental status, Notes-only support, and infrastructure requirements ## Migration Guide For existing single-server deployments: Option 1 (Recommended): Use explicit collection name for continuity 
```bash QDRANT_COLLECTION=nextcloud_content # Keep existing collection ``` Option 2: Allow auto-generation and re-embed ```bash # Remove QDRANT_COLLECTION override # New collection will be created based on deployment ID + model # Requires re-embedding all documents (may take time) ``` For new multi-server deployments: Set unique OTEL service names per server: ```bash # Server 1 OTEL_SERVICE_NAME=mcp-prod OLLAMA_EMBEDDING_MODEL=nomic-embed-text # → Collection: "mcp-prod-nomic-embed-text" # Server 2 OTEL_SERVICE_NAME=mcp-staging OLLAMA_EMBEDDING_MODEL=nomic-embed-text # → Collection: "mcp-staging-nomic-embed-text" ``` ## Benefits ✅ Safe model switching: Each model gets its own collection, preventing dimension mismatch errors ✅ Multi-server support: Multiple MCP servers can share one Qdrant instance without conflicts ✅ Clear ownership: Collection names show which deployment and model owns the data ✅ Better error messages: Dimension validation provides actionable guidance ✅ Backward compatible: Existing deployments can continue using `QDRANT_COLLECTION` override ## Testing Validated with: - Single-server deployments (default hostname-based naming) - Multi-server deployments (OTEL service name-based naming) - Model switching scenarios (dimension validation) - Collection override scenarios (backward compatibility) Next steps: Testing various Ollama embedding models to investigate optimal chunk sizes and performance characteristics. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-10 01:18:30 +01:00
Chris Coutinho	72232f937a	refactor: migrate vector sync from asyncio.Queue to anyio memory object streams Replace asyncio.Queue with anyio.create_memory_object_stream() throughout the vector sync system for better library consistency and improved shutdown semantics. ## Changes Made scanner.py: - Changed parameter type from `asyncio.Queue` to `MemoryObjectSendStream[DocumentTask]` - Replaced all `await document_queue.put()` calls with `await send_stream.send()` - Wrapped scanner loop in `async with send_stream:` context manager for automatic cleanup - Updated log messages: "Queued" → "Sent" - Removed `import asyncio` (no longer needed) processor.py: - Changed parameter type from `asyncio.Queue` to `MemoryObjectReceiveStream[DocumentTask]` - Replaced `asyncio.wait_for(document_queue.get(), timeout=1.0)` with `anyio.fail_after(1.0)` + `await receive_stream.receive()` - Removed all `document_queue.task_done()` calls (not needed with streams) - Added `anyio.EndOfStream` exception handling for graceful shutdown when scanner closes - Removed `import asyncio` (no longer needed) app.py: - Removed `import asyncio` from top-level imports - Added `from anyio.streams.memory import MemoryObjectReceiveStream, MemoryObjectSendStream` - Updated AppContext dataclass: - Replaced `document_queue: Optional[asyncio.Queue]` with: - `document_send_stream: Optional[MemoryObjectSendStream]` - `document_receive_stream: Optional[MemoryObjectReceiveStream]` - Updated `app_lifespan_basic()`: - Replaced `asyncio.Queue(maxsize=...)` with `anyio.create_memory_object_stream(max_buffer_size=...)` - Pass `send_stream` to scanner_task - Pass `receive_stream.clone()` to each processor_task (enables multiple consumers) - Updated AppContext yield to include both streams - Updated `starlette_lifespan()`: - Same changes as app_lifespan_basic for streamable-http transport - Removed `import asyncio as asyncio_module` (no longer needed) - Updated app.state storage to use send_stream and receive_stream 
semantic.py: - Updated `nc_get_vector_sync_status()` tool: - Access `document_receive_stream` instead of `document_queue` from lifespan context - Use `stream_stats.current_buffer_used` instead of `queue.qsize()` for pending count - More reliable metrics (qsize() was not guaranteed accurate) ## Benefits 1. Library Consistency: Pure anyio throughout codebase (was mixing asyncio.Queue with anyio.Event and anyio.create_task_group) 2. Graceful Shutdown: `async with send_stream:` automatically closes stream on exit, signaling EndOfStream to all processors 3. Better Timeout Handling: `anyio.fail_after()` is more idiomatic than `asyncio.wait_for()` 4. Stream Cloning: Easy to add multiple consumers via `receive_stream.clone()` 5. Better Statistics: `.statistics()` provides accurate buffer metrics (qsize() was unreliable) 6. Type Safety: Separate send/receive types prevent accidental misuse 7. No task_done() tracking: Streams handle completion automatically ## Testing - ✅ All 69 unit tests passing - ✅ All 5 smoke tests passing - ✅ No regressions in functionality - ✅ Graceful shutdown behavior improved ## References - https://anyio.readthedocs.io/en/stable/why.html#queue-fix - https://anyio.readthedocs.io/en/stable/streams.html#memory-object-streams 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-09 06:43:44 +01:00
Chris Coutinho	4b026e9aa0	feat: implement ADR-009 - refactor semantic search to use generic semantic:read scope This implements ADR-009, which documents the decision to use a generic `semantic:read` OAuth scope instead of requiring all app-specific scopes for semantic search functionality. Changes: - Created new `nextcloud_mcp_server/models/semantic.py` with semantic search models - SemanticSearchResult (with new doc_type field for multi-app support) - SemanticSearchResponse - SamplingSearchResponse - VectorSyncStatusResponse - Created new `nextcloud_mcp_server/server/semantic.py` with semantic search tools - nc_semantic_search (renamed from nc_notes_semantic_search) - nc_semantic_search_answer (renamed from nc_notes_semantic_search_answer) - nc_get_vector_sync_status (renamed from nc_notes_get_vector_sync_status) - All tools now use @require_scopes("semantic:read") instead of "notes:read" - Updated `nextcloud_mcp_server/server/notes.py` - Removed semantic search tools (moved to semantic.py) - Removed semantic search model imports - Removed unused MCP imports (ModelHint, ModelPreferences, etc.) 
- Updated `nextcloud_mcp_server/models/notes.py` - Removed semantic search models (moved to semantic.py) - Updated `nextcloud_mcp_server/app.py` - Import configure_semantic_tools - Register semantic tools when VECTOR_SYNC_ENABLED=true - Updated `nextcloud_mcp_server/server/__init__.py` - Export configure_semantic_tools - Updated tests - tests/integration/test_sampling.py: Use new tool names - tests/unit/test_response_models.py: Import from semantic.py, add doc_type field Architecture: - Semantic search is now a cross-app feature, not tied to Notes - Uses dual-phase authorization: semantic:read scope + per-document verification - Supports future multi-app indexing (notes, calendar, deck, files, contacts) Test results: - All 69 unit tests passing - All 5 smoke tests passing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-09 05:53:53 +01:00
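
Commits 6fe5596c13 and f3bdb8b885 in the log above rely on Reciprocal Rank Fusion (RRF) to merge dense and sparse rankings. As a rough illustration of the idea only, and not the code in this repository or Qdrant's internal implementation, a minimal pure-Python sketch might look like this. The function name `rrf_fuse`, the sample document IDs, and the constant `k=60` (a commonly used value) are assumptions made for the example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. `k` is a smoothing constant (assumed value).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc_id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a dense (semantic) and a sparse (BM25) query.
dense_hits = ["note-3", "note-1", "note-7"]
sparse_hits = ["note-1", "note-9", "note-3"]

fused = rrf_fuse([dense_hits, sparse_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents that appear in both lists (here `note-1` and `note-3`) rise above those found by only one retriever, which is the property a hybrid search depends on.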

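Commit e575c8e57b describes collection names of the form `{deployment-id}-{model-name}`, with an explicit `QDRANT_COLLECTION` override. The sketch below is one possible reading of that rule, not the repository's `get_collection_name()`. The function name `derive_collection_name`, the placeholder default service name, the fallback model name, and the exact sanitization (replacing `/` and `\` with `-`) are all assumptions.

```python
import socket

# Hypothetical placeholder: the real default service name is not given in the log.
ASSUMED_DEFAULT_SERVICE_NAME = "default-service"


def derive_collection_name(env, hostname=None):
    """Return a Qdrant collection name derived from deployment ID and model.

    `env` is a mapping of environment variables (for example os.environ).
    """
    # An explicit override wins, as the commit message describes.
    override = env.get("QDRANT_COLLECTION")
    if override:
        return override

    service = env.get("OTEL_SERVICE_NAME")
    if service and service != ASSUMED_DEFAULT_SERVICE_NAME:
        deployment_id = service
    else:
        # Fall back to the machine hostname for simple Docker deployments.
        deployment_id = hostname or socket.gethostname()

    # A missing model name gets a placeholder purely for this sketch.
    model = env.get("OLLAMA_EMBEDDING_MODEL", "unknown-model")
    # Assumed sanitization: replace path separators with dashes.
    model = model.replace("/", "-").replace("\\", "-")
    return f"{deployment_id}-{model}"


print(derive_collection_name(
    {"OTEL_SERVICE_NAME": "my-mcp-server", "OLLAMA_EMBEDDING_MODEL": "nomic-embed-text"}
))
print(derive_collection_name(
    {"OLLAMA_EMBEDDING_MODEL": "all-minilm"}, hostname="mcp-container"
))
print(derive_collection_name({"QDRANT_COLLECTION": "nextcloud_content"}))
```

Passing the environment as a plain mapping keeps the function easy to test; the three calls reproduce the two naming examples and the override case from the commit message.
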
12 Commits
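
Commit 3aa7128f45 stores a start and end character offset for every chunk so that a viewer can highlight the matched text. A minimal sketch of that idea follows, assuming a simple fixed-size character window with overlap. The `Chunk` class is only a stand-in for the `ChunkWithPosition` dataclass named in the commit (its real fields are not shown in the log), and `chunk_with_offsets`, `highlight`, and the size and overlap values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Stand-in for the ChunkWithPosition dataclass named in the commit."""
    text: str
    start_offset: int  # inclusive character index in the original document
    end_offset: int    # exclusive character index in the original document


def chunk_with_offsets(document, size=40, overlap=10):
    """Split `document` into overlapping fixed-size character windows."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    chunks = []
    step = size - overlap
    for start in range(0, len(document), step):
        end = min(start + size, len(document))
        chunks.append(Chunk(document[start:end], start, end))
        if end == len(document):
            break
    return chunks


def highlight(document, chunk, marker="**"):
    """Wrap the matched chunk in markers, keeping the surrounding context."""
    return (
        document[:chunk.start_offset]
        + marker + document[chunk.start_offset:chunk.end_offset] + marker
        + document[chunk.end_offset:]
    )


doc = "Vector search returns chunks, and offsets let a viewer highlight the exact match."
chunks = chunk_with_offsets(doc)
for c in chunks:
    # The stored offsets always recover the chunk text from the original.
    assert doc[c.start_offset:c.end_offset] == c.text
print(highlight(doc, chunks[1]))
```

Because the offsets index into the original document rather than into the chunk list, the highlighted span can be shown with as much surrounding context as the viewer wants.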