nextcloud-mcp-server

Author	SHA1	Message	Date
Chris Coutinho	c8d9cc24e0	refactor: migrate asyncio to anyio for consistent structured concurrency Replace asyncio primitives with anyio equivalents throughout the codebase to establish a single async pattern. This provides better structured concurrency with automatic cancellation on errors and aligns with the pytest anyio configuration. Changes: - hybrid.py: Replace asyncio.gather() with anyio task groups - token_broker.py: Replace asyncio.Lock() with anyio.Lock() - storage.py: Replace asyncio.run() with anyio.run() - app.py: Replace tg.start_soon() with await tg.start() for task status - processor.py: Add task_status parameter for structured startup - scanner.py: Add task_status parameter for structured startup - CLAUDE.md: Update async/await patterns guidance The change from start_soon() to await tg.start() enables proper task initialization signaling, ensuring background tasks are ready before proceeding. This follows anyio best practices for structured concurrency. All 118 unit tests pass with the new implementation. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 03:51:45 +01:00
Chris Coutinho	eaeb8eae28	feat: Normalize hybrid search RRF scores to 0-1 range Improve user comprehension by scaling RRF scores to match the intuitive 0-1 range used by other search algorithms. ## Problem RRF (Reciprocal Rank Fusion) scores had a drastically different scale than semantic/keyword/fuzzy scores: - Semantic similarity: 0.0 to 1.0 (typical: 0.5-0.9) - RRF scores: 0.0 to ~0.016 (typical: 0.005-0.015) This caused user confusion - a score of 0.0078 looked terrible but was actually excellent (near theoretical maximum). ## Solution Normalize RRF scores using the formula: `normalized_score = rrf_score * (rrf_k + 1) / total_weight` Where: - rrf_k = 60 (RRF constant) - total_weight = sum of algorithm weights (default: 1.0) Example transformation: - Before: 0.0078 (confusing) - After: 0.477 (intuitive) ## Changes nextcloud_mcp_server/search/hybrid.py: - Store total_weight as instance variable (line 63) - Calculate normalization factor in _reciprocal_rank_fusion() (line 209) - Apply normalization to all RRF scores (line 217) - Preserve raw RRF score in metadata for debugging (line 222) ## Impact User Experience: - Hybrid search scores now comparable with semantic/keyword/fuzzy - Score of 0.5 indicates good match across all algorithms - Consistent scale improves score threshold usability Backward Compatibility: - Raw RRF scores preserved in metadata["rrf_score_raw"] - Result ordering unchanged (normalization is linear transformation) - Breaking change: Existing score thresholds need adjustment Performance: - Negligible overhead (single multiplication per result) ## Testing Verified with nc_semantic_search and nc_semantic_search_answer: - Hybrid scores now 0.47-0.7 range (was 0.003-0.011) - Semantic scores unchanged (0.75) - Result ordering preserved 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-15 06:48:58 +01:00
Chris Coutinho	42376483ab	refactor: Optimize Nextcloud access verification with centralized filtering Move access verification from individual search algorithms to final output stage, eliminating redundant API calls and improving performance. ## Changes New: - `search/verification.py`: Centralized verification using anyio task groups - Deduplicates results by (doc_id, doc_type) before verification - Verifies all unique documents in parallel using structured concurrency - Filters out inaccessible documents in single pass Modified Search Algorithms: - `search/semantic.py`: Removed _deduplicate_and_verify() and _verify_document_access() - `search/keyword.py`: Removed _verify_access() and parallel verification - `search/fuzzy.py`: Removed _verify_access() and parallel verification - `search/hybrid.py`: Removed nextcloud_client parameter passing All algorithms now return unverified results from Qdrant payload. Modified Output Stages: - `server/semantic.py`: Added verify_search_results() call after search - `auth/viz_routes.py`: Added verify_search_results() call after search Both endpoints now verify access once at final stage with deduplication. ## Performance Impact Before: - Hybrid mode (limit=10): 30 API calls (10 per algorithm × 3 algorithms) - Single algorithm: 10-20 API calls (with verification buffer) After: - Hybrid mode (limit=10): 10 API calls (deduplicated verification) - Single algorithm: 10 API calls (deduplicated verification) Performance Gain: 3x reduction in API calls for hybrid search ## Architecture Benefits - Separation of concerns: Algorithms handle scoring, output stage handles security - Deduplication: Each document verified exactly once - Parallel execution: All verifications run concurrently via anyio task groups - Consistency: Same verification logic across MCP tools and viz endpoints 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-15 06:21:06 +01:00
Chris Coutinho	b5b03bfd78	feat: Add multi-document Protocol with cross-app search support Implements NextcloudClientProtocol for multi-document type search following user requirement that document types are not 1:1 with apps (e.g., Notes app specializes in markdown, while Files/WebDAV handles multiple file types). Key Changes: - NextcloudClientProtocol: Generic protocol with app-specific client properties - get_indexed_doc_types(): Query Qdrant for actually-indexed document types - Document dispatch: All algorithms check Qdrant before attempting access - Cross-type deduplication: Use (doc_id, doc_type) tuples in hybrid RRF Search Algorithm Updates: - Semantic: Added _verify_document_access() with dispatch to appropriate client - Deduplication by (doc_id, doc_type) tuple - Only "note" verification implemented, others return None with info log - Keyword: Added _fetch_documents() dispatch method - Queries Qdrant for available types before fetching - Supports cross-app search when doc_type=None - Fuzzy: Same pattern as keyword search - Hybrid: Already uses (doc_id, doc_type) for deduplication (no changes needed) Future-Proof Design: - File/calendar verification stubs in place - Clear logging when unsupported types found - Easy to extend when processor indexes new document types Currently Supported: - "note" documents fully implemented and tested - Other types gracefully handled (logged but skipped) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-15 01:19:29 +01:00
Chris Coutinho	11e620f2d1	feat: Implement unified search algorithm module Creates shared search module with four algorithms implementing ADR-012: - Semantic search (vector similarity via Qdrant) - Keyword search (token-based matching from ADR-001) - Fuzzy search (character overlap matching) - Hybrid search (RRF fusion from ADR-003) Architecture: - Base SearchAlgorithm interface for consistent API - SearchResult dataclass for unified result format - All algorithms async and independently testable - Proper logging and error handling throughout Semantic Search (search/semantic.py): - Extracted from server/semantic.py - Vector similarity using Qdrant query_points - Dual-phase authorization (vector filter + API verification) - Deduplication of document chunks - Configurable score threshold (default: 0.7) Keyword Search (search/keyword.py): - Implements ADR-001 token-based matching - Title matches weighted 3x higher than content - Case-insensitive token matching - Relevance scoring with normalization - Excerpt extraction with context Fuzzy Search (search/fuzzy.py): - Simple character overlap calculation - Configurable threshold (default: 70%) - Typo-tolerant matching - Fast and dependency-free Hybrid Search (search/hybrid.py): - Reciprocal Rank Fusion (RRF) from ADR-003 - Parallel execution of sub-algorithms - Configurable weights per algorithm - RRF constant k=60 (standard value) - Weight validation (must sum ≤1.0) All algorithms: - Share NextcloudClient for document access - Support user_id filtering (multi-tenant) - Support doc_type filtering (currently notes only) - Return consistent SearchResult objects - Properly formatted with ruff and type-checked Next steps: Update MCP tool to use these algorithms 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-15 00:10:19 +01:00

5 Commits