nextcloud-mcp-server

Author	SHA1	Message	Date
Chris Coutinho	d374bfa1e5	feat(viz): Add dual-score display and improve UI controls This commit enhances the vector visualization interface with better score transparency and improved UX: Dual-Score Display: - Store original algorithm scores before normalization (viz_routes.py:203) - Display both raw and normalized scores: "Raw Score: 0.842 (89% relative)" - Update plot hover text with dual scores (userinfo_routes.py:740) - Fixes issue where all queries showed at least one 100% match regardless of actual relevance (normalization artifact) UI Improvements: 1. Fusion Method dropdown: Changed from x-show to :disabled - Prevents jarring layout shift when switching algorithms - Dropdown stays visible but grayed out when Semantic is selected - Better UX with opacity: 0.5 and cursor: not-allowed 2. Score Threshold: Changed step from 0.1 to "any" - Allows arbitrary float precision (0.7, 0.85, 0.123) - Users can now fine-tune threshold values 3. Document Types: Converted multi-select to checkbox grid - Replaced clunky Ctrl/Cmd multi-select listbox - Checkbox grid with cleaner layout - Positioned left of Score Threshold and Result Limit inputs - More intuitive UX Technical Details: - Raw score ranges vary by algorithm: - Semantic: 0.0-1.0 (cosine similarity) - BM25 RRF: ~0.001-0.033 (Reciprocal Rank Fusion) - BM25 DBSF: Can exceed 1.0 (Distribution-Based Score Fusion) - Normalized scores (0-1) used for visual encoding (marker size, color) - Original scores preserved in API response via getattr fallback Files modified: - nextcloud_mcp_server/auth/viz_routes.py (store original_score) - nextcloud_mcp_server/auth/templates/vector_viz.html (UI controls) - nextcloud_mcp_server/auth/userinfo_routes.py (plot hover text) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-17 08:05:49 +01:00
Chris Coutinho	2522b13d35	ci: Add unit tests to ci	2025-11-17 06:51:40 +01:00
Chris Coutinho	6cfd7e2729	feat: add configurable fusion algorithms for BM25 hybrid search Added support for two fusion algorithms (RRF and DBSF) to combine dense semantic and sparse BM25 search results, with comprehensive documentation and unit tests. Changes: - Added fusion parameter to nc_semantic_search and nc_semantic_search_answer tools - Updated ADR-014 with detailed comparison of RRF vs DBSF fusion algorithms - Added unit tests for fusion algorithm initialization and validation - Updated search_method in responses to include fusion type (e.g., "bm25_hybrid_rrf") Fusion Algorithms: - RRF (Reciprocal Rank Fusion): Default, rank-based, general-purpose - DBSF (Distribution-Based Score Fusion): Score normalization using statistics RRF is recommended for most use cases due to its robustness and established track record. DBSF may provide better results when retrieval systems have very different score distributions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-17 06:48:43 +01:00
Chris Coutinho	3aa7128f45	feat: add chunk position tracking to vector indexing and search Track character offsets (start_offset, end_offset) for each chunk in vector database metadata, enabling precise chunk highlighting in visualization pane. Changes: - processor.py: Store chunk_start_offset and chunk_end_offset in Qdrant metadata - processor.py: Added metadata_version=2 to indicate position tracking support - search/semantic.py: Return chunk positions from search results - server/semantic.py: Expose chunk positions in API responses (SemanticSearchResult) Enables viz pane to: 1. Display exact matched chunk with surrounding context 2. Highlight the precise portion of text that matched the query 3. Build user trust by showing what the RAG system actually retrieved Position tracking uses ChunkWithPosition dataclass from document_chunker.py which provides character-accurate offsets in the original document. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-17 06:47:58 +01:00
Chris Coutinho	c3282534eb	feat: add vector viz template and chunk context endpoint Extracted vector visualization HTML template to separate file to resolve syntax conflicts between Jinja2, Alpine.js, and CSS. Added chunk context endpoint for fetching matched chunks with surrounding text. Changes: - Moved vector_viz.html to templates/ directory (separates Jinja2/Alpine.js/CSS) - Added /app/chunk-context endpoint for retrieving chunk text with context - Updated .dockerignore to include HTML files in Docker builds - Moved anthropic and boto3 to main dependencies (needed for production features) - Added jinja2 dependency for template rendering Fixes Jinja2 TemplateSyntaxError caused by CSS colons being parsed as Jinja2 syntax when template was inline in Python code. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-17 06:46:52 +01:00
Chris Coutinho	862308418e	fix: prevent infinite loop in DocumentChunker with position tracking Fixed a critical infinite loop bug in document_chunker.py that occurred when the overlap parameter caused the chunker to not make forward progress. Changes: - Added ChunkWithPosition dataclass to track character positions - Refactored chunk_text() to use regex word matching for accurate position tracking - Added safety check to ensure forward progress (next_start_idx > start_idx) - Changed return type from list[str] to list[ChunkWithPosition] The bug manifested when: 1. end_idx reached len(word_matches) (processing last chunk) 2. next_start_idx = end_idx - overlap would not advance past start_idx 3. Loop would continue indefinitely without making progress Fix ensures chunker always terminates by breaking when not advancing. All 9 unit tests now pass in 1.66s (previously timing out at 180s). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-17 06:39:15 +01:00
Chris Coutinho	3464b21845	fix: Relax SearchResult validation to support DBSF fusion scores > 1.0 Fix false-positive validation error where DBSF (Distribution-Based Score Fusion) correctly produces scores > 1.0 but SearchResult validation incorrectly rejected them. Root Cause: SearchResult.__post_init__() enforced scores in [0.0, 1.0] range, but DBSF sums normalized scores from multiple retrieval systems (dense semantic + sparse BM25), resulting in scores like 1.55 when both systems strongly agree a document is relevant. Changes: - Relaxed validation to allow any score ≥ 0.0 (algorithms.py:147-157) - Updated SearchResult and SemanticSearchResult documentation to explain score ranges for RRF ([0.0, 1.0]) vs DBSF (unbounded) - Added comprehensive test coverage for both fusion methods - Added DBSF fusion option to vector visualization UI - Updated viz routes and vizApp() to support fusion parameter selection Testing: All 157 unit tests pass, type checking passes, ruff passes Fixes error: "Configuration error: Score must be between 0.0 and 1.0, got 1.1528953" 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-17 06:32:30 +01:00
Chris Coutinho	ea01ce7673	Merge pull request #311 from cbcoutinho/renovate/python-replacement chore(deps): replace python docker tag with docker.io/library/python	2025-11-16 12:11:52 +01:00
Chris Coutinho	216cb94383	Merge branch 'master' into renovate/python-replacement	2025-11-16 12:11:36 +01:00
Chris Coutinho	5f3e0b84a3	Merge pull request #310 from cbcoutinho/renovate/pin-dependencies chore(deps): pin dependencies	2025-11-16 12:10:57 +01:00
github-actions[bot]	39131cefcc	bump: version 0.39.0 → 0.40.0 nextcloud-mcp-server-0.40.0 v0.40.0	2025-11-16 11:09:40 +00:00
Chris Coutinho	9498c0fa36	Merge pull request #309 from cbcoutinho/feature/bedrock feat: Unified Provider Architecture + Amazon Bedrock Support	2025-11-16 12:09:12 +01:00
Chris Coutinho	ed33b39062	docs: fix ADR-014 template text and numbering - Remove template instruction text from line 1 - Fix ADR numbering from 007 to 014 to match filename	2025-11-16 12:08:37 +01:00
Chris Coutinho	1504df6fb5	Merge branch 'master' into feature/bedrock	2025-11-16 12:08:23 +01:00
renovate-bot-cbcoutinho[bot]	392e1536b9	chore(deps): replace python docker tag with docker.io/library/python	2025-11-16 11:07:34 +00:00
renovate-bot-cbcoutinho[bot]	00ed3f07e5	chore(deps): pin dependencies	2025-11-16 11:07:28 +00:00
github-actions[bot]	050e9a56b9	bump: version 0.38.0 → 0.39.0 nextcloud-mcp-server-0.39.0 v0.39.0	2025-11-16 11:02:48 +00:00
Chris Coutinho	7fccd47722	Merge pull request #304 from cbcoutinho/feature/bm25 feat: Replace custom keyword search with BM25 hybrid search via Qdrant	2025-11-16 12:02:18 +01:00
Chris Coutinho	f65b95ef07	Update Dockerfile	2025-11-16 11:58:13 +01:00
Chris Coutinho	c28fc955ca	Merge origin/master into feature/bm25 Resolved conflicts: - viz_routes.py: Kept bm25's extract_dense_vector() function for robust vector handling - hybrid.py: Removed (bm25 uses native Qdrant RRF fusion instead) - uv.lock: Regenerated after accepting master's dependencies This merge brings in: - RAG evaluation framework (ADR-013) - Performance optimizations (double-fetch elimination) - Migration from asyncio to anyio - OpenTelemetry tracing improvements - Notes app enhancements 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 11:52:40 +01:00
Chris Coutinho	ad4b45889f	fix: suppress Starlette middleware type warnings in ty checker	2025-11-16 11:43:50 +01:00
Chris Coutinho	5b484c9226	feat: add unified provider architecture with Amazon Bedrock support Refactored LLM provider infrastructure to support sustainable additions of new providers with both embedding and text generation capabilities. ## Major Changes ### Unified Provider Architecture (ADR-015) - Created `nextcloud_mcp_server/providers/` with unified Provider ABC - Providers now support optional capabilities (embeddings and/or generation) - Auto-detection registry with priority: Bedrock → Ollama → Simple - Backward compatible - existing code continues to work ### New Providers - BedrockProvider: Full Amazon Bedrock integration - Embeddings: Titan Embed, Cohere Embed models - Generation: Claude, Llama, Titan Text, Mistral models - Model-specific request/response handling - AWS credential chain integration - OllamaProvider: Migrated with both capabilities support - AnthropicProvider: Moved from test code to production providers - SimpleProvider: Migrated in-memory fallback provider ### Breaking Changes None - full backward compatibility maintained: - `embedding.get_embedding_service()` still works - RAG evaluation tests updated to use unified providers - All existing tests pass (127 unit tests) ### Testing - Added 9 comprehensive Bedrock unit tests with mocked boto3 - All existing unit tests pass - Type checking (ty) and linting (ruff) pass - Verified backward compatibility ### Documentation - `docs/ADR-015-unified-provider-architecture.md`: Comprehensive ADR - `docs/bedrock-setup.md`: AWS setup guide with IAM permissions - `CLAUDE.md`: Updated with provider architecture section ### Dependencies - Added `boto3>=1.35.0` to dev dependencies (optional) ## Environment Variables ### Bedrock - `AWS_REGION`: AWS region (e.g., "us-east-1") - `BEDROCK_EMBEDDING_MODEL`: Model ID for embeddings - `BEDROCK_GENERATION_MODEL`: Model ID for generation - `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`: Optional credentials ### Ollama - `OLLAMA_BASE_URL`: API URL - `OLLAMA_EMBEDDING_MODEL`: Embedding model (default: "nomic-embed-text") - `OLLAMA_GENERATION_MODEL`: Generation model ## AWS Bedrock Permissions Required Minimal IAM policy: ```json { "Effect": "Allow", "Action": ["bedrock:InvokeModel"], "Resource": ["arn:aws:bedrock:::foundation-model/"] } ``` See `docs/bedrock-setup.md` for detailed setup instructions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 11:36:58 +01:00
github-actions[bot]	b58b200452	bump: version 0.37.0 → 0.38.0 nextcloud-mcp-server-0.38.0 v0.38.0	2025-11-16 10:18:37 +00:00
Chris Coutinho	c1aad94aa7	Merge pull request #308 from cbcoutinho/revert-305-feature/notes Revert "Feature/notes"	2025-11-16 11:18:12 +01:00
github-actions[bot]	10129354d9	bump: version 0.36.0 → 0.37.0	2025-11-16 10:18:00 +00:00
Chris Coutinho	259d33b41d	Revert "Feature/notes"	2025-11-16 11:17:59 +01:00
Chris Coutinho	32d8eaaab6	Merge pull request #305 from cbcoutinho/feature/notes Feature/notes	2025-11-16 11:17:51 +01:00
Chris Coutinho	8799450c7d	Merge pull request #306 from cbcoutinho/rag-evaluation feat: RAG evaluation framework with performance improvements	2025-11-16 11:17:41 +01:00
Chris Coutinho	1a02819999	Merge pull request #307 from cbcoutinho/feature/mcp-tool-tracing feat: Add OpenTelemetry tracing to @instrument_tool decorator	2025-11-16 11:17:33 +01:00
Chris Coutinho	c4bf077050	feat: Add OpenTelemetry tracing to @instrument_tool decorator Enhances the @instrument_tool decorator to create distributed traces for all MCP tool executions, improving observability and debugging. Changes: - Modified @instrument_tool to wrap tool execution in trace_operation - Added automatic span creation with mcp.tool.* span names - Sanitized tool arguments before adding to span attributes (excludes password, token, secret, api_key, etag, ctx) - Limited argument strings to 500 characters to prevent huge spans - Maintained existing Prometheus metrics functionality - Updated docs/observability.md to reflect correct decorator name - Added comprehensive unit tests All ~50+ MCP tools now emit traces automatically without code changes. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 11:16:05 +01:00
Chris Coutinho	f559ca049e	Merge branch 'rag-evaluation'	2025-11-16 10:26:19 +01:00
Chris Coutinho	02700a8e2c	perf: Eliminate double-fetching in semantic search sampling Performance optimization that removes redundant verification step and makes content fetching parallel in nc_semantic_search_answer tool. Changes: - Remove verification.py module (only had 1 caller) - Refactor nc_semantic_search to do inline deduplication instead of calling verify_search_results() - Migrate verification patterns (anyio task group, semaphore limiting) to nc_semantic_search_answer's content fetching - Change content fetching from sequential loop to parallel execution Performance impact: - Before: 10 API calls (5 parallel verification + 5 sequential content) = ~5.5s overhead - After: 5 API calls (parallel content fetch) = ~0.5s overhead - Result: 50% fewer API calls, ~10x faster for sampling operations Technical details: - Uses anyio.create_task_group() for structured concurrency - Semaphore limiting (max_concurrent=20) prevents connection pool exhaustion - Index-based storage maintains result ordering - Expected failures (deleted notes) logged at debug level - Deduplication handles hybrid search returning same doc from dense + sparse 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 10:25:04 +01:00
Chris Coutinho	8e7b3c3ded	Merge branch 'feature/notes'	2025-11-16 09:18:58 +01:00
Chris Coutinho	758cd5dbfb	build: bump submodule	2025-11-16 09:18:45 +01:00
Chris Coutinho	c74695af16	Merge branch 'feature/notes'	2025-11-16 08:28:00 +01:00
Chris Coutinho	f36f92120c	build: bump submodule	2025-11-16 08:27:49 +01:00
Chris Coutinho	1faf572546	Merge branch 'feature/bm25' Resolves conflict in viz_routes.py by combining: - Named vector extraction from feature/bm25 - Performance timing from master	2025-11-16 08:18:39 +01:00
Chris Coutinho	944b6dcf5a	fix: Handle named vectors in visualization and semantic search - viz_routes.py: Extract "dense" vector from named vector dict - semantic.py: Specify using="dense" for BM25 hybrid collections - Fixes "X must be 2D array" error in hybrid search - Fixes "Dense vector is not found" error in semantic search 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 08:16:35 +01:00
Chris Coutinho	2aa82d849c	Merge branch 'feature/bm25'	2025-11-16 07:57:36 +01:00
Chris Coutinho	fc6a2f14e4	fix: Update vizApp to use bm25_hybrid algorithm and remove deprecated weights The visualization UI was still using the old 'hybrid' algorithm name and weight parameters that were replaced by the BM25 hybrid search refactor. This caused "Unknown algorithm: hybrid" errors when using the search & visualize feature. Changes: - Update default algorithm from 'hybrid' to 'bm25_hybrid' - Update default scoreThreshold from 0.7 to 0.0 to match backend - Remove deprecated semanticWeight, keywordWeight, fuzzyWeight parameters - Remove weight parameters from search request Fixes the visualization search functionality after BM25 hybrid refactor. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 07:54:20 +01:00
Chris Coutinho	d1fb7eb633	Merge branch 'rag-evaluation'	2025-11-16 07:46:17 +01:00
Chris Coutinho	5e80f22d42	Merge pull request #303 from cbcoutinho/renovate/commitizen-tools-commitizen-action-0.x chore(deps): update commitizen-tools/commitizen-action action to v0.25.0	2025-11-16 07:37:05 +01:00
Chris Coutinho	96cee48258	build: Migrate image to debian-based	2025-11-16 07:32:01 +01:00
Chris Coutinho	16c22c953b	fix: Update viz routes to use BM25 hybrid search after refactor - Remove obsolete search algorithm imports (Fuzzy, Keyword, Hybrid) - Update UI to only show Semantic and BM25 Hybrid algorithms - Replace manual weight controls with RRF fusion info message - Update default algorithm from "hybrid" to "bm25_hybrid" - Remove weight parameters (semantic_weight, keyword_weight, fuzzy_weight) - Update score_threshold default from 0.7 to 0.0 for RRF scoring - Document ty type checker in CLAUDE.md Fixes unresolved-import type errors after BM25 refactor. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 07:23:11 +01:00
Chris Coutinho	529daf2b48	ci: temp disable sse in ci	2025-11-16 07:03:18 +01:00
Chris Coutinho	137d1d6c75	perf: fix vector viz search performance and visual encoding This commit addresses critical performance issues with vector visualization search (reducing time from 40s to ~2s) and improves result visualization through better visual encoding. ## Performance Fixes ### 1. Fix blocking sleep in retry decorator (base.py:51) - Changed `time.sleep(5)` to `await anyio.sleep(5)` in @retry_on_429 - Prevents entire event loop from freezing during rate limit retries - Impact: Reduced search time from 22s to 16s initially ### 2. Add concurrency limiting for verification (verification.py:77-93) - Added `anyio.Semaphore(20)` to limit concurrent HTTP requests - Prevents connection pool exhaustion (RequestError) from 90+ simultaneous requests - Fixes false filtering (was filtering 77/90 results incorrectly) - Note: Semaphore still in code but verification removed from viz endpoint ### 3. Remove unnecessary verification from viz endpoint (viz_routes.py:483-486) - Visualization only needs Qdrant metadata (title, excerpt), not full content - Verification only required for sampling (LLM needs full note content) - Impact: Reduced search time from 43.7s to ~2s (final fix) ### 4. Restore streaming scanner pattern (scanner.py) - Process notes one-at-a-time using async generator - Avoids loading all notes into memory ## Visualization Improvements ### 5. Result-relative score normalization (viz_routes.py:489-504) - Normalize scores within result set: best=1.0, worst=0.0 - Removes arbitrary RRF normalization (theoretical max didn't make sense) - Makes visual encoding meaningful regardless of algorithm scores ### 6. Power scaling for marker sizes (userinfo_routes.py:743) - Changed from linear `8 + (score * 12)` to power `6 + (score² * 14)` - Creates dramatic visual contrast: 0.0→6px, 0.5→9.5px, 1.0→20px - Combined with opacity (0.2-1.0) for clear visual hierarchy ### 7. Multi-channel visual encoding (userinfo_routes.py:740-745) - Size: Exponentially scaled with score² - Opacity: Linear 0.2-1.0 (keeps all points visible) - Color: Viridis gradient (blue→yellow) - Effect: Top results are large/bright/opaque, context results small/dim/transparent ## Result - Search time: 40s → ~2s (20x faster) - Visual contrast: Subtle → dramatic (clear result hierarchy) - No arbitrary cutoffs: All results visible, best naturally highlighted 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 07:01:35 +01:00
Chris Coutinho	b96657c935	ci: Add open-webui to docker-compose	2025-11-16 07:00:20 +01:00
Chris Coutinho	6fe5596c13	feat: Implement BM25 hybrid search with native Qdrant RRF fusion Replace custom keyword/fuzzy search algorithms with industry-standard BM25 sparse vectors, combined with dense semantic vectors using Qdrant's native Reciprocal Rank Fusion (RRF). This consolidates search architecture and improves relevance for both semantic and keyword queries. Key changes: - Add fastembed dependency for BM25 sparse vector generation - Update Qdrant collection schema to support named vectors (dense + sparse) - Create BM25SparseEmbeddingProvider using FastEmbed's Qdrant/bm25 model - Implement BM25HybridSearchAlgorithm with native Qdrant RRF prefetch - Update document processor to generate both dense and sparse embeddings - Simplify nc_semantic_search() tool to use BM25 hybrid only - Remove legacy keyword.py, fuzzy.py, and custom hybrid.py (736 lines) - Update ADR-014 with implementation notes and test results Benefits: - Consolidated architecture (single Qdrant database) - Native database-level RRF fusion (more efficient) - Industry-standard BM25 (replaces brittle custom keyword search) - Better relevance across semantic and keyword queries - Simplified codebase (-285 net lines) Tests: All 125 tests passing (118 unit, 7 integration) Implements ADR-014: Replace Custom Keyword Search with BM25 Hybrid Search 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 06:59:44 +01:00
Chris Coutinho	b174e7f8fb	ci: Add notes app for development	2025-11-16 06:57:28 +01:00
Chris Coutinho	f5bc3e3bc3	docs: init ADR	2025-11-16 06:24:25 +01:00

1 2 3 4 5 ...

1104 Commits