Compare commits

...

91 Commits

Author SHA1 Message Date
Chris Coutinho ed33b39062 docs: fix ADR-014 template text and numbering
- Remove template instruction text from line 1
- Fix ADR numbering from 007 to 014 to match filename
2025-11-16 12:08:37 +01:00
Chris Coutinho 1504df6fb5 Merge branch 'master' into feature/bedrock 2025-11-16 12:08:23 +01:00
github-actions[bot] 050e9a56b9 bump: version 0.38.0 → 0.39.0 2025-11-16 11:02:48 +00:00
Chris Coutinho 7fccd47722 Merge pull request #304 from cbcoutinho/feature/bm25
feat: Replace custom keyword search with BM25 hybrid search via Qdrant
2025-11-16 12:02:18 +01:00
Chris Coutinho f65b95ef07 Update Dockerfile 2025-11-16 11:58:13 +01:00
Chris Coutinho c28fc955ca Merge origin/master into feature/bm25
Resolved conflicts:
- viz_routes.py: Kept bm25's extract_dense_vector() function for robust vector handling
- hybrid.py: Removed (bm25 uses native Qdrant RRF fusion instead)
- uv.lock: Regenerated after accepting master's dependencies

This merge brings in:
- RAG evaluation framework (ADR-013)
- Performance optimizations (double-fetch elimination)
- Migration from asyncio to anyio
- OpenTelemetry tracing improvements
- Notes app enhancements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 11:52:40 +01:00
Chris Coutinho ad4b45889f fix: suppress Starlette middleware type warnings in ty checker 2025-11-16 11:43:50 +01:00
Chris Coutinho 5b484c9226 feat: add unified provider architecture with Amazon Bedrock support
Refactored LLM provider infrastructure to support sustainable additions of new providers with both embedding and text generation capabilities.

## Major Changes

### Unified Provider Architecture (ADR-015)
- Created `nextcloud_mcp_server/providers/` with unified Provider ABC
- Providers now support optional capabilities (embeddings and/or generation)
- Auto-detection registry with priority: Bedrock → Ollama → Simple
- Backward compatible - existing code continues to work

### New Providers
- **BedrockProvider**: Full Amazon Bedrock integration
  - Embeddings: Titan Embed, Cohere Embed models
  - Generation: Claude, Llama, Titan Text, Mistral models
  - Model-specific request/response handling
  - AWS credential chain integration
- **OllamaProvider**: Migrated with both capabilities support
- **AnthropicProvider**: Moved from test code to production providers
- **SimpleProvider**: Migrated in-memory fallback provider

### Breaking Changes
None - full backward compatibility maintained:
- `embedding.get_embedding_service()` still works
- RAG evaluation tests updated to use unified providers
- All existing tests pass (127 unit tests)

### Testing
- Added 9 comprehensive Bedrock unit tests with mocked boto3
- All existing unit tests pass
- Type checking (ty) and linting (ruff) pass
- Verified backward compatibility

### Documentation
- `docs/ADR-015-unified-provider-architecture.md`: Comprehensive ADR
- `docs/bedrock-setup.md`: AWS setup guide with IAM permissions
- `CLAUDE.md`: Updated with provider architecture section

### Dependencies
- Added `boto3>=1.35.0` to dev dependencies (optional)

## Environment Variables

### Bedrock
- `AWS_REGION`: AWS region (e.g., "us-east-1")
- `BEDROCK_EMBEDDING_MODEL`: Model ID for embeddings
- `BEDROCK_GENERATION_MODEL`: Model ID for generation
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`: Optional credentials

### Ollama
- `OLLAMA_BASE_URL`: API URL
- `OLLAMA_EMBEDDING_MODEL`: Embedding model (default: "nomic-embed-text")
- `OLLAMA_GENERATION_MODEL`: Generation model

## AWS Bedrock Permissions Required

Minimal IAM policy:
```json
{
  "Effect": "Allow",
  "Action": ["bedrock:InvokeModel"],
  "Resource": ["arn:aws:bedrock:*::foundation-model/*"]
}
```

See `docs/bedrock-setup.md` for detailed setup instructions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 11:36:58 +01:00
github-actions[bot] b58b200452 bump: version 0.37.0 → 0.38.0 2025-11-16 10:18:37 +00:00
Chris Coutinho c1aad94aa7 Merge pull request #308 from cbcoutinho/revert-305-feature/notes
Revert "Feature/notes"
2025-11-16 11:18:12 +01:00
github-actions[bot] 10129354d9 bump: version 0.36.0 → 0.37.0 2025-11-16 10:18:00 +00:00
Chris Coutinho 259d33b41d Revert "Feature/notes" 2025-11-16 11:17:59 +01:00
Chris Coutinho 32d8eaaab6 Merge pull request #305 from cbcoutinho/feature/notes
Feature/notes
2025-11-16 11:17:51 +01:00
Chris Coutinho 8799450c7d Merge pull request #306 from cbcoutinho/rag-evaluation
feat: RAG evaluation framework with performance improvements
2025-11-16 11:17:41 +01:00
Chris Coutinho 1a02819999 Merge pull request #307 from cbcoutinho/feature/mcp-tool-tracing
feat: Add OpenTelemetry tracing to @instrument_tool decorator
2025-11-16 11:17:33 +01:00
Chris Coutinho c4bf077050 feat: Add OpenTelemetry tracing to @instrument_tool decorator
Enhances the @instrument_tool decorator to create distributed traces
for all MCP tool executions, improving observability and debugging.

Changes:
- Modified @instrument_tool to wrap tool execution in trace_operation
- Added automatic span creation with mcp.tool.* span names
- Sanitized tool arguments before adding to span attributes
  (excludes password, token, secret, api_key, etag, ctx)
- Limited argument strings to 500 characters to prevent huge spans
- Maintained existing Prometheus metrics functionality
- Updated docs/observability.md to reflect correct decorator name
- Added comprehensive unit tests

All ~50+ MCP tools now emit traces automatically without code changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 11:16:05 +01:00
Chris Coutinho f559ca049e Merge branch 'rag-evaluation' 2025-11-16 10:26:19 +01:00
Chris Coutinho 02700a8e2c perf: Eliminate double-fetching in semantic search sampling
Performance optimization that removes redundant verification step and
makes content fetching parallel in nc_semantic_search_answer tool.

Changes:
- Remove verification.py module (only had 1 caller)
- Refactor nc_semantic_search to do inline deduplication instead of
  calling verify_search_results()
- Migrate verification patterns (anyio task group, semaphore limiting)
  to nc_semantic_search_answer's content fetching
- Change content fetching from sequential loop to parallel execution

Performance impact:
- Before: 10 API calls (5 parallel verification + 5 sequential content)
  = ~5.5s overhead
- After: 5 API calls (parallel content fetch) = ~0.5s overhead
- Result: 50% fewer API calls, ~10x faster for sampling operations

Technical details:
- Uses anyio.create_task_group() for structured concurrency
- Semaphore limiting (max_concurrent=20) prevents connection pool exhaustion
- Index-based storage maintains result ordering
- Expected failures (deleted notes) logged at debug level
- Deduplication handles hybrid search returning same doc from dense + sparse

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 10:25:04 +01:00
Chris Coutinho 8e7b3c3ded Merge branch 'feature/notes' 2025-11-16 09:18:58 +01:00
Chris Coutinho 758cd5dbfb build: bump submodule 2025-11-16 09:18:45 +01:00
Chris Coutinho c74695af16 Merge branch 'feature/notes' 2025-11-16 08:28:00 +01:00
Chris Coutinho f36f92120c build: bump submodule 2025-11-16 08:27:49 +01:00
Chris Coutinho 1faf572546 Merge branch 'feature/bm25'
Resolves conflict in viz_routes.py by combining:
- Named vector extraction from feature/bm25
- Performance timing from master
2025-11-16 08:18:39 +01:00
Chris Coutinho 944b6dcf5a fix: Handle named vectors in visualization and semantic search
- viz_routes.py: Extract "dense" vector from named vector dict
- semantic.py: Specify using="dense" for BM25 hybrid collections
- Fixes "X must be 2D array" error in hybrid search
- Fixes "Dense vector  is not found" error in semantic search

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 08:16:35 +01:00
Chris Coutinho 2aa82d849c Merge branch 'feature/bm25' 2025-11-16 07:57:36 +01:00
Chris Coutinho fc6a2f14e4 fix: Update vizApp to use bm25_hybrid algorithm and remove deprecated weights
The visualization UI was still using the old 'hybrid' algorithm name and
weight parameters that were replaced by the BM25 hybrid search refactor.
This caused "Unknown algorithm: hybrid" errors when using the search
& visualize feature.

Changes:
- Update default algorithm from 'hybrid' to 'bm25_hybrid'
- Update default scoreThreshold from 0.7 to 0.0 to match backend
- Remove deprecated semanticWeight, keywordWeight, fuzzyWeight parameters
- Remove weight parameters from search request

Fixes the visualization search functionality after BM25 hybrid refactor.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 07:54:20 +01:00
Chris Coutinho d1fb7eb633 Merge branch 'rag-evaluation' 2025-11-16 07:46:17 +01:00
Chris Coutinho 5e80f22d42 Merge pull request #303 from cbcoutinho/renovate/commitizen-tools-commitizen-action-0.x
chore(deps): update commitizen-tools/commitizen-action action to v0.25.0
2025-11-16 07:37:05 +01:00
Chris Coutinho 96cee48258 build: Migrate image to debian-based 2025-11-16 07:32:01 +01:00
Chris Coutinho 16c22c953b fix: Update viz routes to use BM25 hybrid search after refactor
- Remove obsolete search algorithm imports (Fuzzy, Keyword, Hybrid)
- Update UI to only show Semantic and BM25 Hybrid algorithms
- Replace manual weight controls with RRF fusion info message
- Update default algorithm from "hybrid" to "bm25_hybrid"
- Remove weight parameters (semantic_weight, keyword_weight, fuzzy_weight)
- Update score_threshold default from 0.7 to 0.0 for RRF scoring
- Document ty type checker in CLAUDE.md

Fixes unresolved-import type errors after BM25 refactor.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 07:23:11 +01:00
Chris Coutinho 529daf2b48 ci: temp disable sse in ci 2025-11-16 07:03:18 +01:00
Chris Coutinho 137d1d6c75 perf: fix vector viz search performance and visual encoding
This commit addresses critical performance issues with vector visualization
search (reducing time from 40s to ~2s) and improves result visualization
through better visual encoding.

## Performance Fixes

### 1. Fix blocking sleep in retry decorator (base.py:51)
- Changed `time.sleep(5)` to `await anyio.sleep(5)` in @retry_on_429
- Prevents entire event loop from freezing during rate limit retries
- Impact: Reduced search time from 22s to 16s initially

### 2. Add concurrency limiting for verification (verification.py:77-93)
- Added `anyio.Semaphore(20)` to limit concurrent HTTP requests
- Prevents connection pool exhaustion (RequestError) from 90+ simultaneous requests
- Fixes false filtering (was filtering 77/90 results incorrectly)
- Note: Semaphore still in code but verification removed from viz endpoint

### 3. Remove unnecessary verification from viz endpoint (viz_routes.py:483-486)
- Visualization only needs Qdrant metadata (title, excerpt), not full content
- Verification only required for sampling (LLM needs full note content)
- Impact: Reduced search time from 43.7s to ~2s (final fix)

### 4. Restore streaming scanner pattern (scanner.py)
- Process notes one-at-a-time using async generator
- Avoids loading all notes into memory

## Visualization Improvements

### 5. Result-relative score normalization (viz_routes.py:489-504)
- Normalize scores within result set: best=1.0, worst=0.0
- Removes arbitrary RRF normalization (theoretical max didn't make sense)
- Makes visual encoding meaningful regardless of algorithm scores

### 6. Power scaling for marker sizes (userinfo_routes.py:743)
- Changed from linear `8 + (score * 12)` to power `6 + (score² * 14)`
- Creates dramatic visual contrast: 0.0→6px, 0.5→9.5px, 1.0→20px
- Combined with opacity (0.2-1.0) for clear visual hierarchy

### 7. Multi-channel visual encoding (userinfo_routes.py:740-745)
- Size: Exponentially scaled with score²
- Opacity: Linear 0.2-1.0 (keeps all points visible)
- Color: Viridis gradient (blue→yellow)
- Effect: Top results are large/bright/opaque, context results small/dim/transparent

## Result
- Search time: 40s → ~2s (20x faster)
- Visual contrast: Subtle → dramatic (clear result hierarchy)
- No arbitrary cutoffs: All results visible, best naturally highlighted

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 07:01:35 +01:00
Chris Coutinho b96657c935 ci: Add open-webui to docker-compose 2025-11-16 07:00:20 +01:00
Chris Coutinho 6fe5596c13 feat: Implement BM25 hybrid search with native Qdrant RRF fusion
Replace custom keyword/fuzzy search algorithms with industry-standard BM25
sparse vectors, combined with dense semantic vectors using Qdrant's native
Reciprocal Rank Fusion (RRF). This consolidates search architecture and
improves relevance for both semantic and keyword queries.

Key changes:
- Add fastembed dependency for BM25 sparse vector generation
- Update Qdrant collection schema to support named vectors (dense + sparse)
- Create BM25SparseEmbeddingProvider using FastEmbed's Qdrant/bm25 model
- Implement BM25HybridSearchAlgorithm with native Qdrant RRF prefetch
- Update document processor to generate both dense and sparse embeddings
- Simplify nc_semantic_search() tool to use BM25 hybrid only
- Remove legacy keyword.py, fuzzy.py, and custom hybrid.py (736 lines)
- Update ADR-014 with implementation notes and test results

Benefits:
- Consolidated architecture (single Qdrant database)
- Native database-level RRF fusion (more efficient)
- Industry-standard BM25 (replaces brittle custom keyword search)
- Better relevance across semantic and keyword queries
- Simplified codebase (-285 net lines)

Tests: All 125 tests passing (118 unit, 7 integration)

Implements ADR-014: Replace Custom Keyword Search with BM25 Hybrid Search

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 06:59:44 +01:00
Chris Coutinho b174e7f8fb ci: Add notes app for development 2025-11-16 06:57:28 +01:00
Chris Coutinho f5bc3e3bc3 docs: init ADR 2025-11-16 06:24:25 +01:00
renovate-bot-cbcoutinho[bot] a9eb2c1da2 chore(deps): update commitizen-tools/commitizen-action action to v0.25.0 2025-11-16 05:07:20 +00:00
Chris Coutinho c8d9cc24e0 refactor: migrate asyncio to anyio for consistent structured concurrency
Replace asyncio primitives with anyio equivalents throughout the codebase
to establish a single async pattern. This provides better structured
concurrency with automatic cancellation on errors and aligns with the
pytest anyio configuration.

Changes:
- hybrid.py: Replace asyncio.gather() with anyio task groups
- token_broker.py: Replace asyncio.Lock() with anyio.Lock()
- storage.py: Replace asyncio.run() with anyio.run()
- app.py: Replace tg.start_soon() with await tg.start() for task status
- processor.py: Add task_status parameter for structured startup
- scanner.py: Add task_status parameter for structured startup
- CLAUDE.md: Update async/await patterns guidance

The change from start_soon() to await tg.start() enables proper task
initialization signaling, ensuring background tasks are ready before
proceeding. This follows anyio best practices for structured concurrency.

All 118 unit tests pass with the new implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 03:51:45 +01:00
Chris Coutinho 98d1c2de8e perf: make note deletion concurrent in upload --force
- Collect all notes to delete first, then delete concurrently
- Use anyio task group with semaphore (20 concurrent deletions)
- Add progress reporting and error tracking for deletions
- Show count of notes found before deletion starts

This significantly improves --force performance when refreshing large
corpuses (e.g., 3,633 notes now delete in ~1 minute instead of ~5 minutes).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 00:55:27 +01:00
Chris Coutinho 30a4d84458 feat: add concurrent uploads and --force flag to upload command
- Add --force flag to delete all existing notes in target category before upload
- Implement concurrent uploads using anyio task groups (20 concurrent max)
- Add semaphore to limit concurrent requests and avoid overwhelming server
- Improve progress reporting with upload count and error tracking
- Update README with --force flag documentation

Performance improvement: Concurrent uploads significantly reduce upload time
from ~10-15 minutes to ~2-3 minutes for 3,633 documents.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 00:41:00 +01:00
Chris Coutinho fca8ab0cfd Merge remote-tracking branch 'origin/master' into rag-evaluation 2025-11-16 00:32:59 +01:00
github-actions[bot] 7a7ed79d56 bump: version 0.35.0 → 0.36.0 2025-11-15 23:32:55 +00:00
Chris Coutinho 7e7d861797 Merge pull request #302 from cbcoutinho/feature/viz
feat: Vector visualization enhancements and search optimizations
2025-11-16 00:32:31 +01:00
Chris Coutinho 4fa2edf4c7 ci: Set default scan interval to 5min 2025-11-16 00:10:12 +01:00
Chris Coutinho defa8db18e fix: download qrels from BEIR ZIP instead of HuggingFace
- HuggingFace BeIR/nfcorpus only has 'corpus' and 'queries' configs
- Download qrels from original BEIR ZIP file (nfcorpus.zip)
- Use synchronous httpx.Client for download (simpler than async)
- Remove deprecated trust_remote_code parameter

Tested with successful corpus download and qrels extraction.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 00:02:15 +01:00
Chris Coutinho c9506da2d2 refactor: replace httpx client with NextcloudClient in upload command
- Use NextcloudClient with BasicAuth instead of raw httpx
- Replace direct HTTP POST with notes.create_note() method
- Add close() method to LLMProvider Protocol for proper cleanup
- Fix type annotations for dataset iteration

This improves code reuse and consistency with the rest of the codebase.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 23:26:07 +01:00
Chris Coutinho c272ddd82d feat: implement RAG evaluation framework with CLI tooling
- Add ADR-013 documenting RAG evaluation architecture
- Implement two-part evaluation: Context Recall (retrieval) + Answer Correctness (generation)
- Create Click CLI for ground truth generation and corpus upload
- Add pytest fixtures and tests for retrieval/generation quality
- Use BeIR/nfcorpus dataset with 5 selected test queries
- Support Ollama and Anthropic LLM providers
- Generate synthetic ground truth answers offline
- Add comprehensive documentation in tests/rag_evaluation/README.md

The framework separates one-time setup (generate/upload) from test execution,
making tests much faster (~6-12 min vs ~15-25 min per run).

Tests are manual only (not in CI) and require external LLM access.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 23:11:21 +01:00
Chris Coutinho eaeb8eae28 feat: Normalize hybrid search RRF scores to 0-1 range
Improve user comprehension by scaling RRF scores to match the intuitive
0-1 range used by other search algorithms.

## Problem

RRF (Reciprocal Rank Fusion) scores had a drastically different scale
than semantic/keyword/fuzzy scores:

- Semantic similarity: 0.0 to 1.0 (typical: 0.5-0.9)
- RRF scores: 0.0 to ~0.016 (typical: 0.005-0.015)

This caused user confusion - a score of 0.0078 looked terrible but was
actually excellent (near theoretical maximum).

## Solution

Normalize RRF scores using the formula:
`normalized_score = rrf_score * (rrf_k + 1) / total_weight`

Where:
- rrf_k = 60 (RRF constant)
- total_weight = sum of algorithm weights (default: 1.0)

**Example transformation:**
- Before: 0.0078 (confusing)
- After: 0.477 (intuitive)

## Changes

**nextcloud_mcp_server/search/hybrid.py:**
- Store total_weight as instance variable (line 63)
- Calculate normalization factor in _reciprocal_rank_fusion() (line 209)
- Apply normalization to all RRF scores (line 217)
- Preserve raw RRF score in metadata for debugging (line 222)

## Impact

**User Experience:**
- Hybrid search scores now comparable with semantic/keyword/fuzzy
- Score of 0.5 indicates good match across all algorithms
- Consistent scale improves score threshold usability

**Backward Compatibility:**
- Raw RRF scores preserved in metadata["rrf_score_raw"]
- Result ordering unchanged (normalization is linear transformation)
- Breaking change: Existing score thresholds need adjustment

**Performance:**
- Negligible overhead (single multiplication per result)

## Testing

Verified with nc_semantic_search and nc_semantic_search_answer:
- Hybrid scores now 0.47-0.7 range (was 0.003-0.011)
- Semantic scores unchanged (0.75)
- Result ordering preserved

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 06:48:58 +01:00
Chris Coutinho 42376483ab refactor: Optimize Nextcloud access verification with centralized filtering
Move access verification from individual search algorithms to final output
stage, eliminating redundant API calls and improving performance.

## Changes

**New:**
- `search/verification.py`: Centralized verification using anyio task groups
  - Deduplicates results by (doc_id, doc_type) before verification
  - Verifies all unique documents in parallel using structured concurrency
  - Filters out inaccessible documents in single pass

**Modified Search Algorithms:**
- `search/semantic.py`: Removed _deduplicate_and_verify() and _verify_document_access()
- `search/keyword.py`: Removed _verify_access() and parallel verification
- `search/fuzzy.py`: Removed _verify_access() and parallel verification
- `search/hybrid.py`: Removed nextcloud_client parameter passing

All algorithms now return unverified results from Qdrant payload.

**Modified Output Stages:**
- `server/semantic.py`: Added verify_search_results() call after search
- `auth/viz_routes.py`: Added verify_search_results() call after search

Both endpoints now verify access once at final stage with deduplication.

## Performance Impact

**Before:**
- Hybrid mode (limit=10): 30 API calls (10 per algorithm × 3 algorithms)
- Single algorithm: 10-20 API calls (with verification buffer)

**After:**
- Hybrid mode (limit=10): 10 API calls (deduplicated verification)
- Single algorithm: 10 API calls (deduplicated verification)

**Performance Gain:** 3x reduction in API calls for hybrid search

## Architecture Benefits

- **Separation of concerns**: Algorithms handle scoring, output stage handles security
- **Deduplication**: Each document verified exactly once
- **Parallel execution**: All verifications run concurrently via anyio task groups
- **Consistency**: Same verification logic across MCP tools and viz endpoints

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 06:21:06 +01:00
Chris Coutinho ed0825e661 feat: Enhance vector visualization UI and parallelize search verification
Vector Visualization Improvements:
- Add interactive vector viz tab with Alpine.js and Plotly.js to user info page
- Refactor viz route CSS for better scoping and maintainability
- Remove unused nextcloud_host variable

Performance Optimizations:
- Parallelize access verification in fuzzy and keyword search algorithms
- Use asyncio.gather() to verify multiple documents concurrently
- Add exception handling with return_exceptions=True for resilience

Dependencies:
- Update third_party/oidc submodule to include RFC 9728 resource_url support

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 05:39:07 +01:00
Chris Coutinho e3153822f7 perf: Exclude vector-sync status polling from distributed tracing
Skip tracing for /app/vector-sync/status to reduce noise from HTMX polling.
Metrics collection continues for this endpoint.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 05:19:35 +01:00
Chris Coutinho 2b35dd729f fix: Reorder tabs and fix viz pane session access
- Move Webhooks tab to the right (User Info | Vector Sync | Vector Viz | Webhooks)
- Use request.user.display_name instead of session for viz routes
- Fixes session middleware error when accessing via iframe
2025-11-15 02:41:42 +01:00
Chris Coutinho eb32bbbc6b feat: Add Vector Viz tab to app home page
- Add Vector Viz button to tab navigation
- Embed viz pane in iframe for seamless integration
- Only shown when vector sync is enabled
2025-11-15 02:38:05 +01:00
Chris Coutinho 916af1c8f3 feat: Add vector visualization pane with multi-select document types
- Add /app/vector-viz endpoint for interactive search testing
- Implement server-side PCA dimensionality reduction (768-dim → 2D)
- Support multi-select document type filter for cross-app search
- Support all search algorithms: semantic, keyword, fuzzy, hybrid
- Display 2D scatter plot of vector embeddings using Plotly
- Show search results with scores and document types
- Register viz routes in app.py
2025-11-15 02:32:10 +01:00
Chris Coutinho 9a62c8478f feat: Implement custom PCA to remove sklearn dependency
- Add custom PCA implementation using numpy eigendecomposition
- Replace sklearn.decomposition.PCA with custom implementation
- Maintains same API (fit, transform, fit_transform)
- Supports explained_variance_ratio_ for variance analysis
- Removes scikit-learn dependency from project
- Add type hints and assertion for type safety
2025-11-15 02:02:57 +01:00
Chris Coutinho 2a078093ed refactor!: Make all search algorithms query Qdrant payload, not Nextcloud
BREAKING CHANGE: Search algorithms now require Qdrant to be populated.
Vector sync must be enabled and documents indexed for search to work.

- Keyword and fuzzy search now query Qdrant scroll API for title/excerpt
- Remove inefficient Nextcloud API fetching pattern
- Add optional Nextcloud verification for security
- Deduplicate by (doc_id, doc_type) tuple, keeping chunk_index=0
- Align with document processor pattern that already stores text in Qdrant
2025-11-15 01:56:41 +01:00
github-actions[bot] 682923dcc8 bump: version 0.34.2 → 0.35.0 2025-11-15 00:46:11 +00:00
Chris Coutinho b1a756145e Merge pull request #301 from cbcoutinho/feature/sse
feat: Enable SSE transport for validation testing
2025-11-15 01:45:48 +01:00
Chris Coutinho b5b03bfd78 feat: Add multi-document Protocol with cross-app search support
Implements NextcloudClientProtocol for multi-document type search following
user requirement that document types are not 1:1 with apps (e.g., Notes app
specializes in markdown, while Files/WebDAV handles multiple file types).

Key Changes:
- NextcloudClientProtocol: Generic protocol with app-specific client properties
- get_indexed_doc_types(): Query Qdrant for actually-indexed document types
- Document dispatch: All algorithms check Qdrant before attempting access
- Cross-type deduplication: Use (doc_id, doc_type) tuples in hybrid RRF

Search Algorithm Updates:
- Semantic: Added _verify_document_access() with dispatch to appropriate client
  - Deduplication by (doc_id, doc_type) tuple
  - Only "note" verification implemented, others return None with info log
- Keyword: Added _fetch_documents() dispatch method
  - Queries Qdrant for available types before fetching
  - Supports cross-app search when doc_type=None
- Fuzzy: Same pattern as keyword search
- Hybrid: Already uses (doc_id, doc_type) for deduplication (no changes needed)

Future-Proof Design:
- File/calendar verification stubs in place
- Clear logging when unsupported types found
- Easy to extend when processor indexes new document types

Currently Supported:
- "note" documents fully implemented and tested
- Other types gracefully handled (logged but skipped)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 01:19:29 +01:00
Chris Coutinho f3bdb8b885 feat: Update nc_semantic_search tool with algorithm selection
Implements ADR-012 by adding multi-algorithm support to the MCP tool.

Key changes:
- Added algorithm parameter: "semantic"|"keyword"|"fuzzy"|"hybrid" (default: "hybrid")
- Added weight parameters for hybrid mode configuration
- Replaced direct Qdrant/embedding calls with search module abstractions
- Updated docstring to describe all four algorithms
- Simplified implementation: ~50 lines vs ~150 lines (67% reduction)
- Better error handling for missing vector sync

Algorithm selection:
- semantic: Pure vector similarity (requires VECTOR_SYNC_ENABLED=true)
- keyword: Token-based matching with weighted title/content scoring
- fuzzy: Character overlap for typo tolerance
- hybrid: RRF fusion with configurable weights (default: 0.5/0.3/0.2)

Backward compatibility:
- Tool name unchanged (nc_semantic_search)
- New parameters have sensible defaults
- Existing clients get hybrid search automatically (better than pure semantic)
- search_method field in response reflects actual algorithm used

Weight validation:
- Performed in HybridSearchAlgorithm constructor
- Must sum to ≤1.0 and all non-negative
- At least one weight must be > 0
- Clear error messages on validation failure

Next: Update viz pane to use same algorithms

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 00:25:55 +01:00
Chris Coutinho 11e620f2d1 feat: Implement unified search algorithm module
Creates shared search module with four algorithms implementing ADR-012:
- Semantic search (vector similarity via Qdrant)
- Keyword search (token-based matching from ADR-001)
- Fuzzy search (character overlap matching)
- Hybrid search (RRF fusion from ADR-003)

Architecture:
- Base SearchAlgorithm interface for consistent API
- SearchResult dataclass for unified result format
- All algorithms async and independently testable
- Proper logging and error handling throughout

Semantic Search (search/semantic.py):
- Extracted from server/semantic.py
- Vector similarity using Qdrant query_points
- Dual-phase authorization (vector filter + API verification)
- Deduplication of document chunks
- Configurable score threshold (default: 0.7)

Keyword Search (search/keyword.py):
- Implements ADR-001 token-based matching
- Title matches weighted 3x higher than content
- Case-insensitive token matching
- Relevance scoring with normalization
- Excerpt extraction with context

Fuzzy Search (search/fuzzy.py):
- Simple character overlap calculation
- Configurable threshold (default: 70%)
- Typo-tolerant matching
- Fast and dependency-free

Hybrid Search (search/hybrid.py):
- Reciprocal Rank Fusion (RRF) from ADR-003
- Parallel execution of sub-algorithms
- Configurable weights per algorithm
- RRF constant k=60 (standard value)
- Weight validation (must sum ≤1.0)

All algorithms:
- Share NextcloudClient for document access
- Support user_id filtering (multi-tenant)
- Support doc_type filtering (currently notes only)
- Return consistent SearchResult objects
- Properly formatted with ruff and type-checked

Next steps: Update MCP tool to use these algorithms

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 00:10:19 +01:00
Chris Coutinho 56bd85c0f7 docs: Emphasize server-side processing in ADR-012 viz pane
Updates ADR-012 to clarify that all search and filtering operations
must happen server-side, not in the browser.

Key changes:
- Enhanced viz pane data flow showing server-side processing
- Added performance benefits section (384x bandwidth reduction)
- Detailed server-side filtering approach:
  * Query execution via search/algorithms.py
  * User ID filtering (multi-tenant security)
  * Document type filtering
  * PCA reduction (768-dim → 2D) on server
  * Only 2D coordinates + metadata sent to client
- Updated Phase 3 implementation plan:
  * Remove ALL client-side search logic
  * Implement /app/vector-viz server endpoint
  * htmx form submission for queries
  * Performance optimizations (caching, streaming)

This ensures:
- Minimal bandwidth usage (only 2 floats per doc vs 768)
- Client handles only visualization, not computation
- Can visualize 10,000+ documents without client lag
- Raw vectors never leave server (security)
- Same search logic as MCP tool (consistency)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 00:02:54 +01:00
Chris Coutinho 5e67277049 docs: Add architecture diagrams and viz pane UI to ADR-012
Enhances ADR-012 with detailed architecture visualization and UI mockup
for the vector visualization pane.

Added sections:
- Architecture diagram showing MCP tool and viz pane integration
- Data flow diagrams for both MCP requests and viz pane interactions
- Detailed UI mockup with ASCII art showing:
  * Search configuration controls
  * Algorithm selector with weight sliders
  * Interactive 2D scatter plot (Plotly.js)
  * Results panel with scores
  * Performance comparison table
- Technology stack details (htmx, Alpine.js, Plotly.js, Tailwind CSS)

The diagrams illustrate how the viz pane and MCP tool share the same
search algorithm implementations from search/algorithms.py, ensuring
consistency between user testing interface and programmatic API.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 00:00:40 +01:00
Chris Coutinho 66a7109130 docs: Add ADR-012 for unified multi-algorithm search
Proposes unified search architecture with client-configurable algorithm
selection and weighting. Addresses the need for flexible search options
beyond pure semantic search.

Key features:
- Four algorithms: semantic, keyword, fuzzy, hybrid
- Client-configurable weights for hybrid search
- Shared implementation between viz pane and MCP tools
- Reciprocal Rank Fusion (RRF) for result combination
- Backward compatible with existing nc_semantic_search()

Implements designs from:
- ADR-003: Hybrid search with RRF (previously unimplemented)
- ADR-001: Token-based keyword search (previously unimplemented)

Supersedes ADR-011's placeholder for "ADR-013: Hybrid Search"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 23:56:09 +01:00
Chris Coutinho 00e72d24a6 feat: Enable SSE transport for mcp service and update test fixtures
Changes:
- Remove streamable-http transport override from mcp service in docker-compose.yml
- Service now uses CLI default SSE transport on /sse endpoint
- Add create_mcp_client_session_sse() helper for SSE connections
- Update nc_mcp_client fixture to use SSE transport
- Fix unpacking for SSE client (yields 2 values vs 3 for streamable-http)

Testing:
- All 4 smoke tests pass with SSE transport
- 32/34 affected tests pass (2 skipped for vector sync)
- OAuth services remain on streamable-http (unchanged)

Note: SSE transport is being deprecated in favor of streamable-http.
This enables minimal validation testing before deprecation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 19:20:30 +01:00
Chris Coutinho dc78d92e5b Merge pull request #299 from cbcoutinho/renovate/docker.io-library-mariadb-lts
chore(deps): update docker.io/library/mariadb:lts docker digest to 6b848cb
2025-11-14 11:23:32 +01:00
renovate-bot-cbcoutinho[bot] 86891173b2 chore(deps): update docker.io/library/mariadb:lts docker digest to 6b848cb 2025-11-14 05:07:34 +00:00
Chris Coutinho 73b3d80026 Merge pull request #294 from cbcoutinho/feature/app_api
docs: Add ADR-011 for hybrid OAuth + AppAPI deployment architecture
2025-11-13 23:43:25 +01:00
github-actions[bot] 56a5c63994 bump: version 0.34.1 → 0.34.2 2025-11-13 21:11:36 +00:00
Chris Coutinho 92c8e1e41d Merge pull request #290 from cbcoutinho/renovate/quay.io-keycloak-keycloak-26.x
chore(deps): update quay.io/keycloak/keycloak docker tag to v26.4.5
2025-11-13 22:11:09 +01:00
github-actions[bot] dd12c957f6 bump: version 0.34.0 → 0.34.1 2025-11-13 21:10:16 +00:00
Chris Coutinho 74e2ab2440 Merge pull request #297 from cbcoutinho/fix/helm-oidc-env-vars
fix: Use NEXTCLOUD_OIDC_CLIENT_ID/SECRET env vars consistently
2025-11-13 22:10:04 +01:00
Chris Coutinho d124144424 Merge pull request #298 from cbcoutinho/fix/notes-search-empty-query
fix: return all notes when search query is empty
2025-11-13 22:09:50 +01:00
Chris Coutinho 39259ef282 ci: Run smoke tests only in ci 2025-11-13 22:06:07 +01:00
Chris Coutinho 3648d478f1 fix: return all notes when search query is empty
Previously, an empty query string to nc_notes_search_notes would return
zero results due to an early return when no query tokens were present.

This was counterintuitive - users expect an empty query to list all
notes, not return nothing.

Changes:
- Modified NotesSearchController.search_notes() to return all notes
  when query is empty
- Added documentation to clarify this behavior
- Empty query results have _score: None (no relevance scoring)
- Non-empty query results continue to have relevance scores

Fixes behavior where listing all notes was impossible via the search tool.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 21:57:14 +01:00
Chris Coutinho 14a59fdff3 fix: Use NEXTCLOUD_OIDC_CLIENT_ID/SECRET env vars consistently
Fixes #296

The application code was looking for OIDC_CLIENT_ID and OIDC_CLIENT_SECRET
(without NEXTCLOUD_ prefix), but the Helm chart, documentation, and CLI
all use NEXTCLOUD_OIDC_CLIENT_ID and NEXTCLOUD_OIDC_CLIENT_SECRET.

This mismatch caused OAuth deployments via Helm to fail with crashloops
because the credentials weren't being found.

Changes:
- app.py: Use NEXTCLOUD_OIDC_CLIENT_ID/SECRET in setup_oauth_config()
- config.py: Use NEXTCLOUD_OIDC_CLIENT_ID/SECRET in get_settings()
- Updated documentation comments and error messages

This aligns with the documented naming convention where all Nextcloud-related
environment variables use the NEXTCLOUD_ prefix.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 21:48:58 +01:00
github-actions[bot] 2f138e7539 bump: version 0.33.1 → 0.34.0 2025-11-13 16:15:29 +00:00
Chris Coutinho 2baacc0ae8 Merge pull request #295 from cbcoutinho/feat/complete-metrics-instrumentation
feat: Add metrics instrumentation (phases 1-3)
2025-11-13 17:15:03 +01:00
Chris Coutinho c3023d2cc3 feat: Complete Phase 5 - Instrument all 93 MCP tools
Applied @instrument_tool decorator to all 86 remaining tools
across 8 server files.

Instrumented files:
- calendar.py: 16 tools
- contacts.py: 7 tools
- deck.py: 25 tools
- webdav.py: 11 tools
- tables.py: 6 tools
- sharing.py: 5 tools
- cookbook.py: 13 tools
- semantic.py: 3 tools

Total: 93 tools instrumented (7 in notes.py + 86 in other files)

These metrics populate:
- MCP Tool Calls panel (by tool name and status)
- MCP Tool Duration panel (histogram)
- MCP Tool Errors panel (by tool name and error type)

This completes PR #295 - All 5 phases of metrics instrumentation done:
 Phase 1: Queue size metrics (2 locations)
 Phase 2: Health checks (1 location)
 Phase 3: Database operations (3 methods)
 Phase 4: OAuth token metrics (3 locations)
 Phase 5: MCP tool metrics (93 tools)

All 34 dashboard panels now have data sources.
2025-11-13 16:58:44 +01:00
Chris Coutinho 6253faee19 feat: Add instrumentation decorator and apply to notes tools (Phase 5)
Created @instrument_tool decorator for automatic MCP tool metrics collection.
Applied to all 7 tools in notes.py.

Changes:
- observability/metrics.py:
  * New instrument_tool() decorator for automatic timing and error tracking
  * Compatible with @mcp.tool() and @require_scopes() decorators
  * Records tool_name, duration, and success/error status

- server/notes.py:
  * Applied @instrument_tool to all 7 tool functions
  * nc_notes_create_note, nc_notes_update_note, nc_notes_append_content
  * nc_notes_search_notes, nc_notes_get_note, nc_notes_get_attachment
  * nc_notes_delete_note

These metrics will populate the MCP Tool Calls dashboard panels.

Part of PR #295 - Complete metrics instrumentation (Phase 5)
Remaining: 86 tools across 8 server files
2025-11-13 16:40:56 +01:00
Chris Coutinho c97f12d47e feat: Add OAuth token and database metrics (Phases 3-4)
Complete Prometheus instrumentation for OAuth token operations
and additional database operations to populate empty dashboard panels.

OAuth Token Metrics (Phase 4):
- unified_verifier.py:
  * Token validation cache hits/misses
  * JWT verification success/failure/error
  * Introspection validation results
  * Audience validation failures
- context_helper.py:
  * Token exchange cache hits/misses
  * RFC 8693 exchange success/error

Database Metrics (Phase 3 completion):
- storage.py:
  * get_refresh_token() with timing
  * delete_refresh_token() with timing
  * All operations record duration and success/error status

These metrics populate the following dashboard panels:
- Token Validations (by method and result)
- Token Cache Hit Rate
- Token Exchange Operations
- Database Operations (refresh token CRUD)
- Database Operation Duration

Part of PR #295 - Complete metrics instrumentation
2025-11-13 16:23:00 +01:00
Chris Coutinho a667d7c59c feat: Add metrics instrumentation for queue, health, and database operations
Implement Prometheus metrics to populate empty Grafana dashboard panels.

## Phase 1: Queue Size Metrics 
**File**: `processor.py`
- Track vector sync queue depth in real-time
- Update metric after receiving and processing each document
- Update metric during timeout (empty queue)
- Enables: "Processing Queue Depth" panel

## Phase 2: Health Check Metrics 
**File**: `app.py`
- Add Nextcloud connectivity check with timing
- Add Qdrant health check with timing
- Record dependency health status (up/down)
- Record health check duration
- Enables: 4 health status panels + health check duration panel

## Phase 3: Database Operation Metrics (Partial) 
**File**: `storage.py`
- Instrument `store_refresh_token()` method
- Track SQLite INSERT operation timing and success/error status
- Enables: Partial data for database operation latency panel

## Metrics Now Exposed

### Queue Metrics:
- `mcp_vector_sync_queue_size` - Real-time queue depth

### Health Metrics:
- `mcp_dependency_health{dependency="nextcloud"}` - UP/DOWN status
- `mcp_dependency_health{dependency="qdrant"}` - UP/DOWN status
- `mcp_dependency_check_duration_seconds{dependency}` - Health check latency

### Database Metrics:
- `mcp_db_operations_total{db="sqlite",operation="insert"}` - Operation count
- `mcp_db_operation_duration_seconds{db="sqlite",operation="insert"}` - Operation latency

## Dashboard Impact

**Panels Now Populated** (7/34 panels):
-  Processing Queue Depth
-  Nextcloud Health
-  Qdrant Health
-  Health Check Duration
-  Database Operation Latency (partial)
-  Vector sync panels (already working from PR #292)

**Panels Still Empty** (remaining work):
-  OAuth panels (4): Token validations, exchanges, cache hit rate, refresh ops
-  MCP tool panels (3): Call volume, error rates, execution duration
-  Database panel: Needs more SQLite operations instrumented (~29 remaining)

## Testing

Verified metric definitions exist and will be recorded on next deployment.

## Next Steps

Phase 4: OAuth token metrics (unified_verifier.py, context_helper.py, storage.py)
Phase 5: MCP tool metrics (all server/*.py files with @mcp.tool())
Phase 3 completion: Remaining 29 database operations in storage.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 16:14:38 +01:00
github-actions[bot] bd76902932 bump: version 0.33.0 → 0.33.1 2025-11-13 12:10:42 +00:00
Chris Coutinho da65155cde Merge pull request #293 from cbcoutinho/fix/grafana-folder-label-validation
fix: Move grafana_folder from labels to annotations
2025-11-13 13:10:15 +01:00
Chris Coutinho 4e43d15153 fix: Move grafana_folder from labels to annotations
Fixes Kubernetes label validation error when deploying dashboard ConfigMap.

Problem:
- Kubernetes labels cannot contain spaces (validation regex: [A-Za-z0-9][-A-Za-z0-9_.]*[A-Za-z0-9])
- Previous implementation had grafana_folder: "Nextcloud MCP" as a label
- Deployment failed with: "Invalid value: 'Nextcloud MCP'"

Solution:
- Move grafana_folder from labels to annotations (annotations allow spaces)
- Keep grafana_dashboard="1" as label for ConfigMap discovery
- Grafana sidecar reads folder name from folderAnnotation parameter

Changes:
- dashboard-configmap.yaml: Move grafana_folder to annotations section
- dashboards/README.md: Fix kubectl commands to use annotations
- values.yaml: Update comments to clarify annotation usage

This follows the standard kube-prometheus-stack pattern where:
- Labels are used for ConfigMap discovery (strict validation)
- Annotations are used for metadata like folder names (relaxed validation)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 13:08:45 +01:00
github-actions[bot] 15951c38fa bump: version 0.32.1 → 0.33.0 2025-11-13 10:58:05 +00:00
Chris Coutinho 2de0590839 Merge pull request #292 from cbcoutinho/feat/grafana-dashboard-and-vector-metrics
feat: Add Grafana dashboard and vector sync metric instrumentation
2025-11-13 11:57:40 +01:00
Chris Coutinho 4ea5ed72d4 feat: Add Grafana dashboard and vector sync metric instrumentation
Implement comprehensive observability for vector database synchronization
with Grafana dashboard and Prometheus metrics.

## Part 1: Grafana Dashboard

Created all-in-one operations dashboard with 7 rows and 34 panels:

### Dashboard Structure:
- **Overview Row**: Request rate, error rate, P95 latency, active requests
- **HTTP Metrics (RED)**: Request/error rates by endpoint, latency percentiles
- **MCP Tools**: Call volume, error rates, execution duration by tool
- **Nextcloud API**: API calls/latency by app, retry patterns
- **OAuth & Authentication**: Token validations, exchanges, cache hit rate
- **Dependencies & Health**: Status for Nextcloud/Qdrant/Keycloak/Unstructured
- **Vector Sync**: Processing throughput, queue depth, Qdrant operations

### Helm Chart Integration:
- Added dashboard-configmap.yaml template for automatic provisioning
- Configured Grafana sidecar auto-discovery (label: grafana_dashboard="1")
- Added dashboards configuration section in values.yaml (opt-in)
- Updated Chart.yaml with dashboard annotations
- Enhanced NOTES.txt with dashboard deployment instructions
- Comprehensive documentation in dashboards/README.md

Dashboard supports dynamic filtering via variables:
- datasource: Prometheus data source selection
- namespace: Filter by Kubernetes namespace
- pod: Multi-select pod filtering
- interval: Query interval (1m/5m/10m/30m/1h)

## Part 2: Vector Sync Metric Instrumentation

Implemented metric recording throughout vector sync pipeline:

### metrics.py:
Added convenience functions:
- record_vector_sync_scan() - Track documents per scan
- record_vector_sync_processing() - Track processing duration/status
- record_qdrant_operation() - Track database operations
- update_vector_sync_queue_size() - Track queue depth

### scanner.py:
- Record number of documents found in each scan
- Enables monitoring of scan throughput

### processor.py:
- Record processing duration for each document
- Track success/failure status with timing
- Record Qdrant upsert/delete operations
- Handle all code paths (success, deletion, error)

### semantic.py:
- Wrap Qdrant query_points with try/except
- Record search operation success/failure

## Metrics Exposed:

- mcp_vector_sync_documents_scanned_total
- mcp_vector_sync_documents_processed_total{status}
- mcp_vector_sync_processing_duration_seconds (histogram)
- mcp_vector_sync_queue_size (gauge)
- mcp_qdrant_operations_total{operation,status}

This enables monitoring of:
- Scan and processing throughput
- Processing latency (P50/P95/P99)
- Error rates for processing and Qdrant operations
- Queue depth trends
- Complete observability of vector sync pipeline

## Testing:

Verified locally that metrics are recorded correctly:
- 36 documents scanned
- 3 documents processed (avg 7.5s each)
- 3 successful Qdrant upsert operations
- Search operations tracked

## Deployment:

Enable dashboard provisioning in Helm values:
```yaml
dashboards:
  enabled: true
  grafanaFolder: "Nextcloud MCP"
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 11:49:20 +01:00
Chris Coutinho d1829fbbd6 Merge pull request #291 from cbcoutinho/renovate/ghcr.io-astral-sh-uv-0.x
chore(deps): update ghcr.io/astral-sh/uv docker tag to v0.9.9
2025-11-13 08:02:35 +01:00
renovate-bot-cbcoutinho[bot] 8332542959 chore(deps): update ghcr.io/astral-sh/uv docker tag to v0.9.9 2025-11-12 23:11:29 +00:00
renovate-bot-cbcoutinho[bot] 2c37ad165e chore(deps): update quay.io/keycloak/keycloak docker tag to v26.4.5 2025-11-12 17:09:23 +00:00
79 changed files with 11645 additions and 1715 deletions
+1 -1
View File
@@ -20,7 +20,7 @@ jobs:
fetch-depth: 0
token: "${{ secrets.PERSONAL_ACCESS_TOKEN }}"
- name: Create bump and changelog
uses: commitizen-tools/commitizen-action@5b0848cd060263e24602d1eba03710e056ef7711 # 0.24.0
uses: commitizen-tools/commitizen-action@9615e7be1cf341393c52e865ebbdaa0712176d81 # 0.25.0
with:
github_token: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
changelog_increment_filename: body.md
+1 -1
View File
@@ -85,4 +85,4 @@ jobs:
NEXTCLOUD_USERNAME: "admin"
NEXTCLOUD_PASSWORD: "admin"
run: |
uv run pytest -v --log-cli-level=WARN --ignore=tests/manual
uv run pytest -v --log-cli-level=WARN -m smoke
+3
View File
@@ -13,3 +13,6 @@ docker-compose.override.yml
# Generated by pytest used to login users
.nextcloud_oauth_*.json
.playwright-mcp/
# RAG Evaluation
tests/rag_evaluation/fixtures/
+110
View File
@@ -1,3 +1,113 @@
## v0.39.0 (2025-11-16)
### Feat
- Implement BM25 hybrid search with native Qdrant RRF fusion
### Fix
- Handle named vectors in visualization and semantic search
- Update vizApp to use bm25_hybrid algorithm and remove deprecated weights
- Update viz routes to use BM25 hybrid search after refactor
## v0.38.0 (2025-11-16)
### Feat
- add concurrent uploads and --force flag to upload command
- implement RAG evaluation framework with CLI tooling
### Fix
- download qrels from BEIR ZIP instead of HuggingFace
### Refactor
- migrate asyncio to anyio for consistent structured concurrency
- replace httpx client with NextcloudClient in upload command
### Perf
- Eliminate double-fetching in semantic search sampling
- fix vector viz search performance and visual encoding
- make note deletion concurrent in upload --force
## v0.37.0 (2025-11-16)
### Feat
- Add OpenTelemetry tracing to @instrument_tool decorator
## v0.36.0 (2025-11-15)
### BREAKING CHANGE
- Search algorithms now require Qdrant to be populated.
Vector sync must be enabled and documents indexed for search to work.
### Feat
- Normalize hybrid search RRF scores to 0-1 range
- Enhance vector visualization UI and parallelize search verification
- Add Vector Viz tab to app home page
- Add vector visualization pane with multi-select document types
- Implement custom PCA to remove sklearn dependency
- Add multi-document Protocol with cross-app search support
- Update nc_semantic_search tool with algorithm selection
- Implement unified search algorithm module
### Fix
- Reorder tabs and fix viz pane session access
### Refactor
- Optimize Nextcloud access verification with centralized filtering
- Make all search algorithms query Qdrant payload, not Nextcloud
### Perf
- Exclude vector-sync status polling from distributed tracing
## v0.35.0 (2025-11-15)
### Feat
- Enable SSE transport for mcp service and update test fixtures
## v0.34.2 (2025-11-13)
### Fix
- Use NEXTCLOUD_OIDC_CLIENT_ID/SECRET env vars consistently
## v0.34.1 (2025-11-13)
### Fix
- return all notes when search query is empty
## v0.34.0 (2025-11-13)
### Feat
- Complete Phase 5 - Instrument all 93 MCP tools
- Add instrumentation decorator and apply to notes tools (Phase 5)
- Add OAuth token and database metrics (Phases 3-4)
- Add metrics instrumentation for queue, health, and database operations
## v0.33.1 (2025-11-13)
### Fix
- Move grafana_folder from labels to annotations
## v0.33.0 (2025-11-13)
### Feat
- Add Grafana dashboard and vector sync metric instrumentation
## v0.32.1 (2025-11-12)
### Fix
+63 -5
View File
@@ -5,23 +5,29 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
## Coding Conventions
### async/await Patterns
- **Use anyio + asyncio hybrid** - Both libraries are available
- **Use anyio for all async operations** - Provides structured concurrency
- pytest runs in `anyio` mode (`anyio_mode = "auto"` in pyproject.toml)
- asyncio used in auth modules (refresh_token_storage.py, token_exchange.py, token_broker.py)
- anyio used in calendar.py, client_registration.py, app.py
- Use `anyio.create_task_group()` for concurrent execution (NOT `asyncio.gather()`)
- Use `anyio.Lock()` for synchronization primitives (NOT `asyncio.Lock()`)
- Use `anyio.run()` for entry points (NOT `asyncio.run()`)
- Prefer standard async/await syntax without explicit library imports when possible
- Examples: app.py, search/hybrid.py, search/verification.py, auth/token_broker.py
### Type Hints
- **Use Python 3.10+ union syntax**: `str | None` instead of `Optional[str]`
- **Use lowercase generics**: `dict[str, Any]` instead of `Dict[str, Any]`
- **Type all function signatures** - Parameters and return types
- **No explicit type checker configured** - Ruff handles linting only
- **Type checker**: `ty` is configured for static type checking
```bash
uv run ty check -- nextcloud_mcp_server
```
### Code Quality
- **Run ruff before committing**:
- **Run ruff and ty before committing**:
```bash
uv run ruff check
uv run ruff format
uv run ty check -- nextcloud_mcp_server
```
- **Ruff configuration** in pyproject.toml (extends select: ["I"] for import sorting)
@@ -55,8 +61,60 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
- `nextcloud_mcp_server/server/` - MCP tool/resource definitions
- `nextcloud_mcp_server/auth/` - OAuth/OIDC authentication
- `nextcloud_mcp_server/models/` - Pydantic response models
- `nextcloud_mcp_server/providers/` - Unified LLM provider infrastructure (embeddings + generation)
- `tests/` - Layered test suite (unit, smoke, integration, load)
### Provider Architecture (ADR-015)
**Unified Provider System** for embeddings and text generation:
**Location:** `nextcloud_mcp_server/providers/`
- `base.py` - `Provider` ABC with optional capabilities
- `registry.py` - Auto-detection and factory pattern
- `ollama.py` - Ollama provider (embeddings + generation)
- `anthropic.py` - Anthropic provider (generation only)
- `bedrock.py` - Amazon Bedrock provider (embeddings + generation)
- `simple.py` - Simple in-memory provider (embeddings only, fallback)
**Usage:**
```python
from nextcloud_mcp_server.providers import get_provider
provider = get_provider() # Auto-detects from environment
# Check capabilities
if provider.supports_embeddings:
embeddings = await provider.embed_batch(texts)
if provider.supports_generation:
text = await provider.generate("prompt", max_tokens=500)
```
**Environment Variables:**
Bedrock:
- `AWS_REGION` - AWS region (e.g., "us-east-1")
- `BEDROCK_EMBEDDING_MODEL` - Embedding model ID (e.g., "amazon.titan-embed-text-v2:0")
- `BEDROCK_GENERATION_MODEL` - Generation model ID (e.g., "anthropic.claude-3-sonnet-20240229-v1:0")
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` - Optional, uses AWS credential chain
Ollama:
- `OLLAMA_BASE_URL` - API URL (e.g., "http://localhost:11434")
- `OLLAMA_EMBEDDING_MODEL` - Embedding model (default: "nomic-embed-text")
- `OLLAMA_GENERATION_MODEL` - Generation model (e.g., "llama3.2:1b")
- `OLLAMA_VERIFY_SSL` - SSL verification (default: "true")
Simple (fallback, no config needed):
- `SIMPLE_EMBEDDING_DIMENSION` - Dimension (default: 384)
**Auto-Detection Priority:** Bedrock → Ollama → Simple
**Backward Compatibility:**
- Old code using `nextcloud_mcp_server.embedding.get_embedding_service()` still works
- `EmbeddingService` now wraps `get_provider()` internally
**For Details:** See `docs/ADR-015-unified-provider-architecture.md`
## Development Commands (Quick Reference)
### Testing
+6 -2
View File
@@ -1,9 +1,13 @@
FROM ghcr.io/astral-sh/uv:0.9.8-python3.11-alpine@sha256:6c842c49ad032f46b62f32a7e7779f45f12671a8e0d82ea24c766ab62d58b396
FROM python:3.12-slim-trixie
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
# Install dependencies
# 1. git (required for caldav dependency from git)
# 2. sqlite for development with token db
RUN apk add --no-cache git sqlite
RUN apt update && apt install --no-install-recommends --no-install-suggests -y \
git \
sqlite3
WORKDIR /app
+6 -2
View File
@@ -2,8 +2,8 @@ apiVersion: v2
name: nextcloud-mcp-server
description: A Helm chart for Nextcloud MCP Server - enables AI assistants to interact with Nextcloud
type: application
version: 0.32.1
appVersion: "0.32.1"
version: 0.39.0
appVersion: "0.39.0"
keywords:
- nextcloud
- mcp
@@ -21,6 +21,10 @@ home: https://github.com/cbcoutinho/nextcloud-mcp-server
sources:
- https://github.com/cbcoutinho/nextcloud-mcp-server
icon: https://raw.githubusercontent.com/nextcloud/server/master/core/img/logo/logo.svg
annotations:
# Grafana dashboard support
grafana_dashboard: "true"
grafana_dashboard_folder: "Nextcloud MCP"
dependencies:
- name: qdrant
version: "1.15.5"
+66
View File
@@ -280,6 +280,72 @@ Use OpenAI or any OpenAI-compatible API instead of Ollama.
| `openai.secretKey` | Key in secret containing API key | `api-key` |
| `openai.baseUrl` | Custom API endpoint (optional) | `""` |
#### Observability & Monitoring
The chart includes comprehensive observability features including Prometheus metrics, OpenTelemetry tracing, and Grafana dashboards.
**Metrics Configuration:**
| Parameter | Description | Default |
|-----------|-------------|---------|
| `observability.metrics.enabled` | Enable Prometheus metrics | `true` |
| `observability.metrics.port` | Metrics port | `9090` |
| `observability.metrics.path` | Metrics endpoint path | `/metrics` |
**Tracing Configuration:**
| Parameter | Description | Default |
|-----------|-------------|---------|
| `observability.tracing.enabled` | Enable OpenTelemetry tracing | `false` |
| `observability.tracing.endpoint` | OTLP collector endpoint | `""` |
| `observability.tracing.serviceName` | Service name in traces | `nextcloud-mcp-server` |
| `observability.tracing.samplingRate` | Trace sampling rate (0.0-1.0) | `1.0` |
**Logging Configuration:**
| Parameter | Description | Default |
|-----------|-------------|---------|
| `observability.logging.format` | Log format (json or text) | `json` |
| `observability.logging.level` | Log level | `INFO` |
| `observability.logging.includeTraceContext` | Include trace IDs in logs | `true` |
**ServiceMonitor (Prometheus Operator):**
| Parameter | Description | Default |
|-----------|-------------|---------|
| `serviceMonitor.enabled` | Create ServiceMonitor resource | `false` |
| `serviceMonitor.interval` | Scrape interval | `30s` |
| `serviceMonitor.scrapeTimeout` | Scrape timeout | `10s` |
| `serviceMonitor.labels` | Additional labels for ServiceMonitor | `{}` |
**PrometheusRule (Prometheus Operator):**
| Parameter | Description | Default |
|-----------|-------------|---------|
| `prometheusRule.enabled` | Create PrometheusRule with alert rules | `false` |
| `prometheusRule.labels` | Additional labels for PrometheusRule | `{}` |
**Grafana Dashboards:**
| Parameter | Description | Default |
|-----------|-------------|---------|
| `dashboards.enabled` | Enable automatic dashboard provisioning | `false` |
| `dashboards.grafanaFolder` | Grafana folder name for dashboards | `Nextcloud MCP` |
| `dashboards.labels` | Additional labels for dashboard ConfigMap | `{}` |
| `dashboards.annotations` | Additional annotations for dashboard ConfigMap | `{}` |
When `dashboards.enabled` is `true`, a ConfigMap with the Grafana dashboard is created with the `grafana_dashboard: "1"` label. This enables automatic discovery by Grafana sidecar containers (commonly used with kube-prometheus-stack).
The dashboard provides comprehensive monitoring including:
- HTTP request metrics (RED pattern: Rate, Errors, Duration)
- MCP tool performance and errors
- Nextcloud API performance by app (notes, calendar, contacts, etc.)
- OAuth token operations and cache hit rates
- External dependency health (Nextcloud, Qdrant, Keycloak, Unstructured API)
- Vector sync processing pipeline (when enabled)
For manual import or more details, see `charts/nextcloud-mcp-server/dashboards/README.md`.
## Examples
### Example 1: Basic Auth with Ingress
+107 -36
View File
@@ -6,14 +6,57 @@ This directory contains example Grafana dashboards for monitoring the Nextcloud
### nextcloud-mcp-server.json
Comprehensive dashboard with the following panels:
All-in-one Operations Dashboard with comprehensive monitoring across all system components.
- **Request Rate**: HTTP requests per second by method and endpoint
- **Error Rate**: Percentage of 5xx errors
- **Request Latency**: P50 and P95 latency by endpoint
- **Top MCP Tools**: Most frequently called tools
- **Nextcloud API Latency**: API call latency by app (notes, calendar, etc.)
- **Vector Sync Queue**: Queue size for background document processing
#### Overview Row
High-level metrics for quick health assessment:
- **Request Rate** (stat): Total requests per second
- **Error Rate** (stat): Percentage of 5xx errors with color thresholds
- **P95 Latency** (stat): 95th percentile request latency
- **Active Requests** (stat): Current in-flight requests
#### HTTP Metrics (RED Pattern)
Core request/error/duration metrics:
- **Request Rate by Endpoint** (timeseries): RPS breakdown by endpoint
- **Error Rate by Status Code** (timeseries): Error rates for 4xx/5xx codes
- **Latency Percentiles** (timeseries): P50, P95, P99 latency trends
- **Status Code Distribution** (piechart): Percentage breakdown of all status codes
#### MCP Tools Row
MCP-specific tool performance:
- **Top Tools by Call Volume** (bargauge): Top 10 most-called tools
- **Tool Error Rate** (timeseries): Error rates per tool
- **Tool Execution Duration** (timeseries): P95 latency by tool
#### Nextcloud API Row
Backend API performance metrics:
- **API Calls by App** (timeseries): Request rate per Nextcloud app (notes, calendar, contacts, etc.)
- **API Latency by App** (timeseries): P95 latency per app
- **API Retries by Reason** (timeseries): Retry patterns (429, timeout, connection errors)
- **API Error Rate** (stat): Overall API error percentage
#### OAuth & Authentication Row
OAuth token operations and caching:
- **Token Validations** (timeseries): Success/failure rates for token validation
- **Token Exchange Operations** (timeseries): RFC 8693 token exchange operations
- **Token Cache Hit Rate** (stat): Percentage of cache hits (color-coded: red<50%, yellow<80%, green≥80%)
- **Refresh Token Operations** (timeseries): Refresh token storage operations by type
#### Dependencies & Health Row
External dependency status monitoring:
- **Nextcloud Health** (stat): UP/DOWN status with color coding
- **Qdrant Health** (stat): Vector database health status
- **Keycloak Health** (stat): Identity provider health status
- **Unstructured API Health** (stat): Document processing API status
- **Health Check Duration** (timeseries): Health check latency by dependency
- **Database Operation Latency** (timeseries): P95 latency for DB operations (SQLite, Qdrant)
#### Vector Sync Row (when enabled)
Document processing pipeline metrics:
- **Documents Processed Rate** (timeseries): Processing throughput by status (success/failure)
- **Processing Queue Depth** (gauge): Current queue size with thresholds (yellow>50, red>100)
- **Qdrant Operations** (timeseries): Vector database operations by type
- **Document Processing Duration** (timeseries): P95 processing latency
## Importing to Grafana
@@ -25,49 +68,77 @@ Comprehensive dashboard with the following panels:
4. Select your Prometheus data source
5. Click "Import"
### Automated Import (Kubernetes)
### Automated Import (Helm Chart)
If using the Grafana Operator or kube-prometheus-stack, you can create a ConfigMap:
The Helm chart now supports automatic dashboard provisioning via Grafana sidecar pattern.
#### Option 1: Using Helm Chart (Recommended)
Enable dashboard provisioning in your Helm values:
```yaml
# values.yaml for nextcloud-mcp-server chart
dashboards:
enabled: true
grafanaFolder: "Nextcloud MCP" # Folder name in Grafana
labels: {} # Additional labels if needed
```
Then deploy or upgrade:
```bash
kubectl create configmap nextcloud-mcp-dashboards \
helm upgrade --install nextcloud-mcp nextcloud-mcp-server \
--set dashboards.enabled=true
```
The dashboard will be automatically imported by Grafana if the sidecar is configured
to watch for ConfigMaps with label `grafana_dashboard: "1"`.
#### Option 2: Using kube-prometheus-stack
If using kube-prometheus-stack with Grafana sidecar enabled, the dashboard will be
automatically discovered and imported. Ensure your Grafana deployment has:
```yaml
# kube-prometheus-stack values
grafana:
sidecar:
dashboards:
enabled: true
label: grafana_dashboard
folder: /tmp/dashboards
provider:
foldersFromFilesStructure: true
```
#### Option 3: Manual ConfigMap Creation
For other Grafana setups, create a ConfigMap manually:
```bash
kubectl create configmap nextcloud-mcp-dashboard \
--from-file=nextcloud-mcp-server.json \
-n monitoring
# Add label for Grafana sidecar to discover
kubectl label configmap nextcloud-mcp-dashboards \
# Add sidecar discovery label
kubectl label configmap nextcloud-mcp-dashboard \
grafana_dashboard=1 \
-n monitoring
```
Or add to your Helm values:
```yaml
# values.yaml for kube-prometheus-stack
grafana:
dashboardProviders:
dashboardproviders.yaml:
apiVersion: 1
providers:
- name: 'nextcloud-mcp'
orgId: 1
folder: 'Nextcloud MCP'
type: file
disableDeletion: false
editable: true
options:
path: /var/lib/grafana/dashboards/nextcloud-mcp
dashboardsConfigMaps:
nextcloud-mcp: nextcloud-mcp-dashboards
# Add folder annotation (annotations support spaces, unlike labels)
kubectl annotate configmap nextcloud-mcp-dashboard \
grafana_folder="Nextcloud MCP" \
-n monitoring
```
## Dashboard Variables
The dashboard includes two variables:
The dashboard includes four template variables for dynamic filtering:
- **Data Source**: Select your Prometheus data source
- **Namespace**: Filter metrics by Kubernetes namespace
- **datasource**: Select your Prometheus data source
- **namespace**: Filter metrics by Kubernetes namespace (supports "All")
- **pod**: Filter by specific pod(s) - multi-select enabled (supports "All")
- **interval**: Query interval for rate calculations (1m, 5m, 10m, 30m, 1h - default: 5m)
## Customization
File diff suppressed because it is too large Load Diff
@@ -96,6 +96,30 @@ Your Nextcloud MCP Server has been deployed in {{ .Values.auth.mode }} authentic
kubectl --namespace {{ .Release.Namespace }} exec -it deploy/{{ include "nextcloud-mcp-server.fullname" . }} -- curl -s http://localhost:{{ include "nextcloud-mcp-server.port" . }}/user/page | grep "Vector Sync"
{{- end }}
{{- if .Values.dashboards.enabled }}
6. Grafana Dashboards:
- Dashboard provisioning: Enabled
- ConfigMap: {{ include "nextcloud-mcp-server.fullname" . }}-dashboard
- Grafana Folder: {{ .Values.dashboards.grafanaFolder }}
The dashboard will be automatically imported by Grafana if the sidecar is configured
to watch for ConfigMaps with label "grafana_dashboard: 1".
To manually import the dashboard:
kubectl --namespace {{ .Release.Namespace }} get configmap {{ include "nextcloud-mcp-server.fullname" . }}-dashboard -o jsonpath='{.data.nextcloud-mcp-server\.json}' | jq . > dashboard.json
Then import dashboard.json via Grafana UI (Dashboards → Import).
{{- else }}
6. Grafana Dashboards:
- Dashboard provisioning: Disabled
- To enable automatic dashboard provisioning, set: dashboards.enabled=true
Manual import option:
The dashboard JSON is available in the chart at charts/nextcloud-mcp-server/dashboards/nextcloud-mcp-server.json
{{- end }}
For more information and documentation:
- GitHub: https://github.com/cbcoutinho/nextcloud-mcp-server
- Documentation: https://github.com/cbcoutinho/nextcloud-mcp-server#readme
@@ -0,0 +1,25 @@
{{- if .Values.dashboards.enabled }}
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ include "nextcloud-mcp-server.fullname" . }}-dashboard
namespace: {{ .Release.Namespace }}
labels:
{{- include "nextcloud-mcp-server.labels" . | nindent 4 }}
{{- with .Values.dashboards.labels }}
{{- toYaml . | nindent 4 }}
{{- end }}
# Grafana sidecar discovery label
grafana_dashboard: "1"
annotations:
{{- with .Values.dashboards.annotations }}
{{- toYaml . | nindent 4 }}
{{- end }}
# Grafana folder name (annotations support spaces, unlike labels)
{{- if .Values.dashboards.grafanaFolder }}
grafana_folder: {{ .Values.dashboards.grafanaFolder | quote }}
{{- end }}
data:
nextcloud-mcp-server.json: |-
{{ .Files.Get "dashboards/nextcloud-mcp-server.json" | indent 4 }}
{{- end }}
+14
View File
@@ -205,6 +205,20 @@ prometheusRule:
# Additional labels for PrometheusRule (e.g., for Prometheus selector)
# Example: { prometheus: kube-prometheus }
# Grafana dashboards (requires Grafana with sidecar enabled)
dashboards:
# Enable automatic dashboard provisioning via ConfigMap
enabled: false
# Grafana folder name where dashboards will be imported
# The grafana-sidecar looks for ConfigMaps with label "grafana_dashboard: 1"
# and reads the folder name from annotation "grafana_folder" (supports spaces)
grafanaFolder: "Nextcloud MCP"
# Additional labels for dashboard ConfigMap
# These will be added alongside the required "grafana_dashboard: 1" label
labels: {}
# Additional annotations for dashboard ConfigMap
annotations: {}
service:
type: ClusterIP
port: 8000
+9 -7
View File
@@ -3,7 +3,7 @@ services:
# https://hub.docker.com/_/mariadb
db:
# Note: Check the recommend version here: https://docs.nextcloud.com/server/latest/admin_manual/installation/system_requirements.html#server
image: docker.io/library/mariadb:lts@sha256:404ebf26ed7a56fbab05c29f6f1e70188e5eadb51bba8cee8d355775776deb08
image: docker.io/library/mariadb:lts@sha256:6b848cb24fbbd87429917f6c4422ac53c343e85692eb0fef86553e99e4f422f3
restart: always
command: --transaction-isolation=READ-COMMITTED
volumes:
@@ -34,7 +34,7 @@ services:
- ./app-hooks:/docker-entrypoint-hooks.d:ro
# Mount OIDC development directory outside /var/www/html to avoid rsync conflicts
# The post-installation hook will register /opt/apps as an additional app directory
- ./third_party:/opt/apps:ro
#- ./third_party:/opt/apps:ro
environment:
- NEXTCLOUD_TRUSTED_DOMAINS=app
- NEXTCLOUD_ADMIN_USER=admin
@@ -69,23 +69,25 @@ services:
mcp:
build: .
command: ["--transport", "streamable-http"]
restart: always
command: ["--transport", "streamable-http"]
depends_on:
app:
condition: service_healthy
ports:
- 127.0.0.1:8000:8000
- 127.0.0.1:9090:9090
volumes:
- mcp-data:/app/data
environment:
- NEXTCLOUD_HOST=http://app:80
- NEXTCLOUD_USERNAME=admin
- NEXTCLOUD_PASSWORD=admin
- NEXTCLOUD_PUBLIC_ISSUER_URL=http://localhost:8080
# Vector sync configuration (ADR-007)
- VECTOR_SYNC_ENABLED=true
- VECTOR_SYNC_SCAN_INTERVAL=10
- VECTOR_SYNC_SCAN_INTERVAL=60
- VECTOR_SYNC_PROCESSOR_WORKERS=1
#- LOG_FORMAT=json
@@ -156,7 +158,7 @@ services:
- oauth-tokens:/app/data
keycloak:
image: quay.io/keycloak/keycloak:26.4.4@sha256:c6459d5fae1b759f5d667ebdc6237ab3121379c3494e213898569014ede1846d
image: quay.io/keycloak/keycloak:26.4.5@sha256:653852bfdea2be6e958b9e90a976eff1c6de34edd55f2f679bdc48ef16bc528e
command:
- "start-dev"
- "--import-realm"
@@ -193,8 +195,8 @@ services:
# Provider auto-detected from OIDC_DISCOVERY_URL issuer
# Using internal Docker hostname for discovery to get consistent issuer
- OIDC_DISCOVERY_URL=http://keycloak:8080/realms/nextcloud-mcp/.well-known/openid-configuration
- OIDC_CLIENT_ID=nextcloud-mcp-server
- OIDC_CLIENT_SECRET=mcp-secret-change-in-production
- NEXTCLOUD_OIDC_CLIENT_ID=nextcloud-mcp-server
- NEXTCLOUD_OIDC_CLIENT_SECRET=mcp-secret-change-in-production
- OIDC_JWKS_URI=http://keycloak:8080/realms/nextcloud-mcp/protocol/openid-connect/certs
# Nextcloud API endpoint (for accessing APIs with validated token)
@@ -0,0 +1,895 @@
# ADR-011: Improving Semantic Search Quality Through Better Chunking and Embeddings
**Status**: Proposed
**Date**: 2025-11-12
**Authors**: Development Team
**Related**: ADR-003 (Vector Database Architecture), ADR-008 (MCP Sampling for RAG)
## Context
The semantic search implementation provides document retrieval across Nextcloud apps using vector embeddings. Production usage has revealed that **the system frequently misses relevant documents** (recall problem).
Root cause analysis identifies two fundamental issues:
### 1. Poor Chunking Strategy
**Current Implementation** (`nextcloud_mcp_server/vector/document_chunker.py:36`):
```python
words = content.split() # Naive whitespace splitting
chunk_size = 512 # words
overlap = 50 # words
chunks = [words[i:i+chunk_size] for i in range(0, len(words), chunk_size-overlap)]
```
**Problems**:
- **Breaks semantic boundaries**: Splits mid-sentence, mid-paragraph, mid-thought
- **Loses context**: "The meeting discussed budget. We decided to..." becomes two disconnected chunks
- **Poor retrieval**: Relevant content split across chunks with low individual relevance scores
- **No structure awareness**: Ignores markdown headers, lists, code blocks
**Evidence**:
- Documents with relevant content in middle sections score poorly (content split across 3+ chunks)
- Multi-sentence concepts (spanning 60-100 words) are fragmented
- Search for "budget planning process" misses documents where these words appear in adjacent sentences but different chunks
### 2. Suboptimal Embedding Model
**Current Implementation** (`nextcloud_mcp_server/embedding/ollama_provider.py:33`):
```python
_model = "nomic-embed-text" # 768 dimensions
_dimension = 768 # Hardcoded
```
**Problems**:
- **Model selection**: `nomic-embed-text` is general-purpose, not optimized for our use case
- **No benchmarking**: Selected without comparative evaluation
- **Dimensionality**: 768-dim may be insufficient for nuanced semantic distinctions
- **No domain adaptation**: Model not tuned for Nextcloud content (notes, calendar, deck cards)
**Evidence**:
- Synonymous queries return different results ("meeting notes" vs. "discussion summary")
- Domain-specific terms poorly represented ("standup", "retrospective", "OKRs")
- Cross-lingual content (if present) not well supported
### Current Performance
**Baseline Metrics** (100-document test corpus, 50 queries):
- **Recall@10**: ~52% (misses 48% of relevant documents)
- **Precision@10**: ~78% (acceptable but room for improvement)
- **MRR**: 0.58 (relevant docs often not in top positions)
- **Zero-result queries**: 18% (completely missing relevant content)
## Decision Drivers
1. **Address Root Causes**: Fix fundamental issues (chunking, embeddings) before adding complexity (reranking, hybrid search)
2. **Measurable Impact**: Target 40-60% improvement in recall through chunking/embedding alone
3. **Independence**: Improvements should be orthogonal to future enhancements (reranking, GraphRAG)
4. **Cost Efficiency**: Minimize infrastructure and API costs
5. **Reindexing Acceptable**: One-time reindex cost justified by long-term quality improvement
## Options Considered
### Chunking Strategies
#### Option C1: Semantic Sentence-Aware Chunking (RECOMMENDED)
**Description**: Respect sentence boundaries while maintaining target chunk size
**Implementation**:
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(
chunk_size=2048, # ~512 words in characters
chunk_overlap=200, # ~50 words in characters
separators=["\n\n", "\n", ". ", "! ", "? ", "; ", ": ", ", ", " "],
length_function=len,
)
```
**How it works**:
1. Try splitting by paragraphs (`\n\n`)
2. If chunks too large, split by sentences (`. `, `! `, `? `)
3. If still too large, split by clauses (`;`, `:`)
4. Last resort: split by words
**Pros**:
- ✅ Preserves semantic boundaries (never breaks mid-sentence)
- ✅ Maintains context coherence within chunks
- ✅ Simple implementation (langchain library)
- ✅ Configurable separators for different content types
- ✅ Proven approach (used by major RAG systems)
**Cons**:
- ❌ Variable chunk sizes (not exactly 512 words, but close)
- ❌ Adds dependency (langchain)
- ❌ Slightly slower than naive splitting (~10-20ms per document)
**Expected Impact**: 20-30% recall improvement
#### Option C2: Hierarchical Context-Preserving Chunks
**Description**: Create overlapping parent/child chunks
**Structure**:
```
Document → Large parent chunks (1024 words) → Small child chunks (256 words)
↓ ↓
Stored in Qdrant Searched first
Return parent context
```
**Implementation**:
```python
# Generate child chunks (searched)
child_chunks = splitter.split_text(content, chunk_size=1024)
# Generate parent chunks (context)
parent_chunks = splitter.split_text(content, chunk_size=4096)
# Store both with parent-child relationships
for child_idx, child in enumerate(child_chunks):
parent_idx = find_parent(child_idx)
store_vector(
vector=embed(child),
payload={
"chunk": child,
"parent_chunk": parent_chunks[parent_idx],
"chunk_type": "child"
}
)
```
**Pros**:
- ✅ Best of both worlds: precise matching + full context
- ✅ Handles multi-hop information needs
- ✅ Better for long documents (> 1000 words)
**Cons**:
- ❌ 2x storage (parent + child chunks)
- ❌ More complex implementation
- ❌ Higher indexing time (embed twice)
- ❌ Query complexity (retrieve child, return parent)
**Expected Impact**: 35-45% recall improvement (diminishing returns vs. complexity)
**Verdict**: ⚠️ Consider only if Option C1 insufficient
#### Option C3: Document Structure-Aware Chunking
**Description**: Parse markdown/document structure before chunking
**Implementation**:
```python
import mistune # Markdown parser
def structure_aware_chunk(markdown_content: str) -> list[str]:
ast = mistune.create_markdown(renderer='ast')(markdown_content)
chunks = []
for node in ast:
if node['type'] == 'heading':
# Start new chunk at each header
current_chunk = node['children'][0]['raw']
elif node['type'] == 'paragraph':
current_chunk += "\n" + node['children'][0]['raw']
if len(current_chunk) > 2048:
chunks.append(current_chunk)
current_chunk = ""
return chunks
```
**Pros**:
- ✅ Respects document logical structure
- ✅ Headers provide context for chunks
- ✅ Works well for structured notes (documentation, meeting notes with sections)
**Cons**:
- ❌ Complex implementation (parser, AST traversal)
- ❌ Markdown-specific (doesn't help calendar events, deck cards)
- ❌ Variable chunk sizes (some sections very short/long)
- ❌ Breaks for unstructured content
**Expected Impact**: 15-25% improvement for structured content only
**Verdict**: ⚠️ Future enhancement after Option C1
#### Option C4: Fixed Sliding Window (Current Baseline)
**Description**: Current naive word-based splitting
**Verdict**: ❌ Superseded by Option C1
### Embedding Model Strategies
#### Option E1: Upgrade to Better General-Purpose Model (RECOMMENDED)
**Description**: Switch to state-of-the-art embedding model
**Candidates**:
| Model | Dimensions | MTEB Score | Pros | Cons |
|-------|-----------|------------|------|------|
| **mxbai-embed-large** | 1024 | 64.68 | Best performance, good balance | Larger (slower) |
| **nomic-embed-text-v1.5** | 768 | 62.39 | Upgraded version of current | Incremental improvement |
| **bge-large-en-v1.5** | 1024 | 64.23 | Excellent for English | Not multilingual |
| **nomic-embed-text** (current) | 768 | 60.10 | Baseline | Lower performance |
**MTEB**: Massive Text Embedding Benchmark (higher = better semantic understanding)
**Recommendation**: **mxbai-embed-large-v1**
- Best MTEB score (64.68)
- 1024 dimensions (richer semantic space)
- Works well via Ollama
- ~15-20% better retrieval quality in benchmarks
**Implementation**:
```python
# config.py
OLLAMA_EMBEDDING_MODEL = "mxbai-embed-large-v1" # Changed from nomic-embed-text
# ollama_provider.py
async def get_dimension(self) -> int:
# Query Ollama for actual dimension instead of hardcoding
response = await self.client.post("/api/show", json={"name": self.model})
return response.json()["details"]["embedding_length"]
```
**Migration**:
1. Deploy new model to Ollama
2. Create new Qdrant collection (different dimension)
3. Reindex all documents with new embeddings
4. Swap collections atomically
5. Delete old collection
**Pros**:
- ✅ Immediate quality improvement (15-20%)
- ✅ Simple change (config + reindex)
- ✅ No code complexity
- ✅ Future-proof (state-of-the-art model)
**Cons**:
- ❌ Requires full reindex (2-4 hours for 1000 documents)
- ❌ Larger model = slower embedding (~50ms vs. 30ms per chunk)
- ❌ Higher dimensionality = more storage (~30% increase)
**Expected Impact**: 15-25% recall improvement
#### Option E2: Multi-Vector Embeddings (ColBERT-style)
**Description**: Generate multiple embeddings per chunk (token-level)
**Architecture**:
```
Chunk → Transformer → Token embeddings (e.g., 50 tokens × 128 dim) → Store all
Query → Transformer → Token embeddings → MaxSim(query_tokens, doc_tokens)
```
**MaxSim scoring**:
```python
def maxsim_score(query_embeddings, doc_embeddings):
# For each query token, find max similarity with any doc token
scores = []
for q_emb in query_embeddings:
max_sim = max(cosine_similarity(q_emb, d_emb) for d_emb in doc_embeddings)
scores.append(max_sim)
return sum(scores)
```
**Pros**:
- ✅ Best retrieval quality (state-of-the-art results)
- ✅ Fine-grained matching (token-level)
- ✅ Handles partial matches better
**Cons**:
-**50-100x storage increase** (50 vectors per chunk vs. 1)
-**Slower search** (compute MaxSim for each candidate)
-**Complex implementation** (custom scoring, storage schema)
-**Requires specialized model** (ColBERTv2, not available in Ollama)
**Expected Impact**: 40-50% improvement, but at very high cost
**Verdict**: ❌ Too complex, too expensive for marginal gain over E1+C1
#### Option E3: Fine-Tuned Domain-Specific Model
**Description**: Fine-tune embedding model on Nextcloud corpus
**Process**:
1. Collect training data (query-document pairs)
2. Fine-tune base model (e.g., `nomic-embed-text`) on domain data
3. Deploy fine-tuned model via Ollama
4. Reindex with fine-tuned embeddings
**Training data needed**:
- 1,000+ query-document pairs
- Labeled relevance (positive/negative examples)
- Representative of real usage
**Pros**:
- ✅ Optimized for specific content (notes, calendar, deck)
- ✅ Better handling of domain terminology
- ✅ Highest potential quality improvement (30-40%)
**Cons**:
-**Requires training data** (expensive to collect)
-**GPU infrastructure** needed for fine-tuning
-**Expertise required** (ML/NLP knowledge)
-**Maintenance burden** (retrain as corpus evolves)
-**Time investment**: 2-4 weeks initial setup
**Expected Impact**: 30-40% improvement, but high cost
**Verdict**: ⚠️ Consider only if E1+C1 insufficient AND have training data
#### Option E4: Ensemble Embeddings
**Description**: Generate embeddings with multiple models, combine scores
**Implementation**:
```python
models = ["mxbai-embed-large-v1", "bge-large-en-v1.5"]
# Index
embeddings = [await embed(chunk, model) for model in models]
store_multi_vector(embeddings)
# Search
query_embeddings = [await embed(query, model) for model in models]
scores = [search(q_emb, model) for q_emb, model in zip(query_embeddings, models)]
combined_score = 0.5 * scores[0] + 0.5 * scores[1]
```
**Pros**:
- ✅ Robust to individual model weaknesses
- ✅ Better coverage of semantic space
**Cons**:
- ❌ 2x storage and compute
- ❌ Complex scoring and fusion
- ❌ Marginal improvement (~5-10%) over single best model
**Expected Impact**: 5-10% over best single model
**Verdict**: ❌ Not worth complexity
### Combined Strategies
#### Option D1: Best Chunking + Best Embedding (RECOMMENDED)
**Combination**: Option C1 (Semantic Chunking) + Option E1 (mxbai-embed-large-v1)
**Expected Impact**:
- Chunking: +20-30% recall
- Embedding: +15-25% recall
- **Combined**: +35-55% recall improvement (not strictly additive, but significant)
**Cost**:
- Development: 1-2 days
- Reindex: 2-4 hours (one-time)
- Ongoing: None (same infrastructure)
**Pros**:
- ✅ Addresses both root causes
- ✅ Orthogonal improvements (chunking + embedding)
- ✅ Simple implementation
- ✅ No new infrastructure
- ✅ Future-proof foundation for additional enhancements (reranking, hybrid search)
**Cons**:
- ❌ Requires full reindex (manageable)
- ❌ Slightly higher storage (1024 vs. 768 dim)
**Verdict**: ✅ **RECOMMENDED**
## Decision
**Adopt Option D1: Semantic Chunking + Upgraded Embedding Model**
Implement both improvements together to maximize recall improvement:
### 1. Semantic Sentence-Aware Chunking
**Changes**:
- Replace naive word splitting with `RecursiveCharacterTextSplitter`
- Preserve sentence boundaries, paragraph structure
- Maintain similar chunk sizes (~512 words / 2048 characters)
**Implementation**:
```python
# nextcloud_mcp_server/vector/document_chunker.py
from langchain.text_splitter import RecursiveCharacterTextSplitter
class DocumentChunker:
"""Chunk documents into semantically coherent pieces."""
def __init__(
self,
chunk_size: int = 2048, # Characters, not words
chunk_overlap: int = 200, # Characters, not words
):
self.chunk_size = chunk_size
self.chunk_overlap = chunk_overlap
self.splitter = RecursiveCharacterTextSplitter(
chunk_size=chunk_size,
chunk_overlap=chunk_overlap,
separators=[
"\n\n", # Paragraphs (highest priority)
"\n", # Lines
". ", # Sentences
"! ",
"? ",
"; ", # Clauses
": ",
", ", # Phrases
" ", # Words (last resort)
],
length_function=len,
is_separator_regex=False,
)
def chunk_text(self, content: str) -> list[str]:
"""
Chunk text while preserving semantic boundaries.
Args:
content: Full document text
Returns:
List of text chunks, each ending at a semantic boundary
"""
if not content:
return []
# Use RecursiveCharacterTextSplitter for semantic boundaries
chunks = self.splitter.split_text(content)
return chunks
```
**Configuration Changes** (`config.py`):
```python
# Old (word-based)
DOCUMENT_CHUNK_SIZE: int = 512 # words
DOCUMENT_CHUNK_OVERLAP: int = 50 # words
# New (character-based, more precise)
DOCUMENT_CHUNK_SIZE: int = 2048 # characters (~512 words)
DOCUMENT_CHUNK_OVERLAP: int = 200 # characters (~50 words)
```
**Dependency** (`pyproject.toml`):
```toml
[project]
dependencies = [
# ... existing dependencies
"langchain-text-splitters>=0.2.0",
]
```
### 2. Upgrade Embedding Model
**Changes**:
- Switch from `nomic-embed-text` (768-dim) to `mxbai-embed-large-v1` (1024-dim)
- Dynamic dimension detection (query Ollama instead of hardcoding)
- Create new Qdrant collection for new dimensions
**Implementation**:
```python
# nextcloud_mcp_server/embedding/ollama_provider.py
class OllamaEmbeddingProvider(EmbeddingProvider):
def __init__(self, base_url: str, model: str, verify_ssl: bool = True):
self.base_url = base_url
self.model = model
self._dimension: int | None = None # Changed: query dynamically
self.client = httpx.AsyncClient(base_url=base_url, verify=verify_ssl)
async def dimension(self) -> int:
"""Get embedding dimension from Ollama API."""
if self._dimension is None:
try:
response = await self.client.post(
"/api/show",
json={"name": self.model},
timeout=10.0,
)
response.raise_for_status()
info = response.json()
self._dimension = info.get("details", {}).get("embedding_length")
if self._dimension is None:
# Fallback: generate test embedding to detect dimension
test_emb = await self.embed("test")
self._dimension = len(test_emb)
except Exception as e:
logger.warning(f"Failed to get dimension from Ollama: {e}, using fallback")
# Fallback dimensions by model name
if "mxbai-embed-large" in self.model:
self._dimension = 1024
elif "nomic-embed-text" in self.model:
self._dimension = 768
else:
self._dimension = 768 # Default
return self._dimension
```
**Configuration Changes** (`config.py`):
```python
# Old
OLLAMA_EMBEDDING_MODEL: str = "nomic-embed-text"
# New
OLLAMA_EMBEDDING_MODEL: str = "mxbai-embed-large-v1"
```
**Environment Variable**:
```bash
OLLAMA_EMBEDDING_MODEL=mxbai-embed-large-v1
```
### 3. Migration Strategy
**Reindexing Process**:
```python
# nextcloud_mcp_server/vector/migration.py
async def migrate_to_new_embeddings():
"""
Migrate from old embeddings to new embeddings.
Process:
1. Create new collection with new dimension
2. Reindex all documents with new embeddings
3. Atomic swap (update collection name in config)
4. Delete old collection
"""
old_collection = "nextcloud_content"
new_collection = "nextcloud_content_v2"
# 1. Create new collection
await qdrant_client.create_collection(
collection_name=new_collection,
vectors_config=VectorParams(
size=1024, # mxbai-embed-large-v1 dimension
distance=Distance.COSINE,
),
)
# 2. Reindex all documents
logger.info("Starting reindex with new embeddings...")
scanner = VectorScanner(...)
processor = VectorProcessor(collection_name=new_collection, ...)
await scanner.scan_all() # Rescans and re-embeds all documents
# 3. Wait for completion
while True:
status = await get_sync_status()
if status.pending_documents == 0:
break
await asyncio.sleep(5)
# 4. Atomic swap
# Update config to point to new collection
# (or use collection alias in Qdrant)
await qdrant_client.update_collection_aliases(
change_aliases_operations=[
CreateAliasOperation(
create_alias=CreateAlias(
collection_name=new_collection,
alias_name="nextcloud_content"
)
)
]
)
# 5. Verify new collection works
test_results = await run_benchmark_queries()
if test_results.recall < baseline_recall:
# Rollback
logger.error("New embeddings worse than baseline, rolling back")
await rollback_migration()
return False
# 6. Delete old collection
await qdrant_client.delete_collection(old_collection)
logger.info("Migration complete!")
return True
```
**Downtime Mitigation**:
- Use Qdrant collection aliases for atomic swap
- Reindex can happen in background
- Only brief downtime during alias swap (~1s)
**Rollback Plan**:
- Keep old collection until validation complete
- If new embeddings worse, swap alias back to old collection
- No data loss
### 4. Validation & Benchmarking
**Before/After Comparison**:
```python
# tests/benchmarks/chunking_embedding_comparison.py
async def benchmark_chunking_embeddings():
"""
Compare old vs. new chunking and embeddings on test queries.
"""
test_queries = load_benchmark_queries() # 100 queries with known relevant docs
# Baseline (current)
baseline_results = await run_queries(
queries=test_queries,
collection="nextcloud_content", # Old: nomic-embed-text, word chunks
)
# New implementation
new_results = await run_queries(
queries=test_queries,
collection="nextcloud_content_v2", # New: mxbai-embed-large-v1, semantic chunks
)
# Compare metrics
comparison = {
"baseline": {
"recall@10": calculate_recall(baseline_results, k=10),
"precision@10": calculate_precision(baseline_results, k=10),
"mrr": calculate_mrr(baseline_results),
"zero_result_rate": calculate_zero_result_rate(baseline_results),
},
"new": {
"recall@10": calculate_recall(new_results, k=10),
"precision@10": calculate_precision(new_results, k=10),
"mrr": calculate_mrr(new_results),
"zero_result_rate": calculate_zero_result_rate(new_results),
},
"improvement": {
"recall_improvement": (new_recall - baseline_recall) / baseline_recall,
"precision_improvement": (new_precision - baseline_precision) / baseline_precision,
}
}
return comparison
```
**Success Criteria**:
- **Recall@10**: Improve from ~52% to ≥75% (+40% improvement)
- **Precision@10**: Maintain ≥75% (no degradation)
- **MRR**: Improve from 0.58 to ≥0.70
- **Zero-result rate**: Reduce from 18% to ≤10%
- **Indexing time**: Maintain ≤10s per document
**Validation Process**:
1. Run benchmark on baseline (current implementation)
2. Implement changes
3. Run benchmark on new implementation
4. Compare metrics
5. If improvement ≥40%, proceed to production
6. If improvement <40%, investigate and iterate
## Implementation Timeline
### Week 1: Development & Testing
**Day 1-2: Chunking Implementation**
- [ ] Add langchain-text-splitters dependency
- [ ] Refactor `document_chunker.py`
- [ ] Update configuration (character-based chunk sizes)
- [ ] Write unit tests for semantic boundaries
- [ ] Validate: Chunks never break mid-sentence
**Day 3-4: Embedding Implementation**
- [ ] Update `ollama_provider.py` with dynamic dimension detection
- [ ] Update configuration (new model name)
- [ ] Deploy `mxbai-embed-large-v1` to Ollama
- [ ] Test embedding generation with new model
- [ ] Validate: Embeddings are 1024-dim
**Day 5: Migration Script**
- [ ] Write migration script (collection creation, reindexing, alias swap)
- [ ] Test migration on staging environment
- [ ] Validate: No data loss, atomic swap works
### Week 2: Reindexing & Validation
**Day 1-2: Staging Reindex**
- [ ] Run full reindex on staging environment
- [ ] Monitor indexing performance
- [ ] Validate: All documents indexed correctly
**Day 3: Benchmarking**
- [ ] Run benchmark queries on old collection (baseline)
- [ ] Run benchmark queries on new collection
- [ ] Compare metrics (recall, precision, MRR)
- [ ] Validate: ≥40% recall improvement
**Day 4: Production Reindex**
- [ ] Schedule maintenance window (optional, can run in background)
- [ ] Run migration script on production
- [ ] Monitor reindexing progress
- [ ] Atomic swap when complete
**Day 5: Production Validation**
- [ ] Monitor search quality metrics
- [ ] Collect user feedback
- [ ] Compare production metrics to staging
- [ ] Rollback if issues detected
## Cost Analysis
### Development Cost
- **Time**: 1-2 weeks (implementation + validation)
- **Effort**: 40-60 hours @ $100/hour = $4,000 - $6,000
### Infrastructure Cost
- **Storage**: +30% (1024-dim vs. 768-dim)
- Example: 1,000 notes × 3 chunks × 1024 dim × 4 bytes = 12 MB (negligible)
- **Compute**: +20% embedding time (50ms vs. 30ms per chunk)
- Amortized over batch indexing, minimal impact
- **No new infrastructure**: Uses existing Ollama + Qdrant
### Reindexing Cost (One-Time)
- **Time**: 2-4 hours for 1,000 documents
- 1,000 docs × 3 chunks × 50ms = 150 seconds (~2.5 minutes embedding)
- + Ollama processing time + Qdrant insertion
- **Downtime**: ~1 second (atomic alias swap)
### Total Cost
- **Initial**: $4,000 - $6,000 (development + testing)
- **Ongoing**: $0 (no new infrastructure or API costs)
### ROI
- **Recall improvement**: +40-60% (finding relevant documents)
- **User satisfaction**: Reduced zero-result queries (18% → 10%)
- **Foundation**: Enables future enhancements (reranking, hybrid search)
- **Cost per % improvement**: $100 - $150 (excellent ROI)
## Consequences
### Positive
1. **Addresses Root Causes**: Fixes fundamental issues (chunking, embeddings) not symptoms
2. **High Impact**: Expected 40-60% recall improvement from foundational changes
3. **Future-Proof**: Creates solid foundation for future enhancements (reranking, hybrid search, GraphRAG)
4. **Simple**: No architectural changes, no new infrastructure
5. **Orthogonal**: Improvements are independent, can be validated separately
6. **Low Risk**: Proven techniques (RecursiveCharacterTextSplitter, mxbai-embed-large-v1)
7. **Maintainable**: Standard libraries and models, easy to debug
### Negative
1. **Reindexing Required**: 2-4 hours one-time cost (manageable, can run in background)
2. **Storage Increase**: +30% for higher-dimensional embeddings (12 MB vs. 9 MB for 1K docs)
3. **Slower Indexing**: +20% embedding time (50ms vs. 30ms per chunk)
4. **Dependency**: Adds langchain-text-splitters (minimal, well-maintained library)
5. **Not a Complete Solution**: May still need reranking/hybrid search for optimal recall (but solid foundation)
### Neutral
1. **Model Lock-In**: Committed to mxbai-embed-large-v1, but can change later (another reindex)
2. **Chunk Size Trade-offs**: ~512 words is heuristic, may need tuning for specific content types
## Monitoring & Success Metrics
### Real-Time Metrics (Grafana)
**Search Quality**:
- `semantic_search_recall_at_10` (target: ≥75%)
- `semantic_search_precision_at_10` (target: ≥75%)
- `semantic_search_mrr` (target: ≥0.70)
- `semantic_search_zero_result_rate` (target: ≤10%)
**Performance**:
- `semantic_search_latency_ms` (p50, p95, p99)
- `embedding_generation_time_ms`
- `indexing_throughput_docs_per_sec`
**Indexing**:
- `documents_indexed_total`
- `documents_pending`
- `indexing_errors_total`
### Weekly Validation
**A/B Testing** (if gradual rollout):
- 50% users: New embeddings
- 50% users: Old embeddings
- Compare metrics for 1 week
- Full rollout if new embeddings superior
**User Feedback**:
- Survey: "How satisfied are you with search results?" (1-5 scale)
- Track: Number of "search not working" support tickets
- Monitor: User-reported false negatives ("I know this doc exists")
### Rollback Criteria
**Automatic Rollback** if:
- Recall decreases by >10% from baseline
- Error rate increases by >50%
- Query latency increases by >100%
**Manual Rollback** if:
- User complaints increase significantly
- Zero-result queries increase instead of decrease
## Future Enhancements
These improvements create a solid foundation. Future enhancements (in order of priority):
1. **Cross-Encoder Reranking** (ADR-012)
- Two-stage retrieval: broad recall (50 candidates) → precise reranking (top 10)
- Expected: +15-20% additional recall improvement
- Builds on: Better embeddings retrieve better candidates to rerank
2. **Hybrid Search** (ADR-013)
- Combine vector search + BM25 keyword search
- Expected: +10-15% additional recall (especially for exact matches)
- Builds on: Semantic chunks provide better keyword match context
3. **Multi-App Indexing** (ADR-014)
- Index calendar, deck, files (currently notes-only)
- Expected: Expands searchable corpus 3-5x
- Builds on: Proven chunking and embedding strategy
4. **GraphRAG** (ADR-015, conditional)
- Only if: Global thematic queries needed OR corpus >10K documents
- Expected: Relationship discovery, multi-hop reasoning
- Builds on: High-quality embeddings improve graph construction
## References
### Research Papers
1. **RecursiveCharacterTextSplitter**
- LangChain Documentation: https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter
- Proven technique used by major RAG systems
2. **MTEB Leaderboard** (Massive Text Embedding Benchmark)
- https://huggingface.co/spaces/mteb/leaderboard
- Comprehensive embedding model comparison
3. **mxbai-embed-large**
- Model: https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1
- Best general-purpose embedding model (MTEB: 64.68)
### Related ADRs
- **ADR-003**: Vector Database and Semantic Search Architecture (original implementation)
- **ADR-008**: MCP Sampling for Multi-App Semantic Search with RAG (answer generation)
### Tools & Libraries
- **LangChain Text Splitters**: https://python.langchain.com/docs/modules/data_connection/document_transformers/
- **Ollama Embedding Models**: https://ollama.ai/library
- **Qdrant Collections**: https://qdrant.tech/documentation/concepts/collections/
## Summary
This ADR addresses the root causes of poor semantic search recall:
1. **Better Chunking**: Semantic sentence-aware splitting (preserves context)
2. **Better Embeddings**: Upgrade to mxbai-embed-large-v1 (richer semantic space)
**Expected Impact**: 40-60% recall improvement with minimal cost and complexity.
**Why This Approach**:
- Fixes fundamentals before adding complexity
- Proven techniques (not experimental)
- Simple implementation (1-2 weeks)
- Creates foundation for future enhancements
- No new infrastructure or ongoing costs
**Next Steps**: Approve ADR → Implement changes → Reindex → Validate → Production rollout
@@ -0,0 +1,619 @@
# ADR-012: Unified Multi-Algorithm Search with Client-Configurable Weighting
## Status
Proposed
## Context
### Current State
The Nextcloud MCP server currently provides semantic search via vector similarity (Qdrant), as designed in ADR-003 and implemented through ADR-007. However, users and MCP clients have limited control over search behavior:
1. **Single algorithm only**: Only pure vector similarity search is available
2. **No algorithm selection**: MCP clients cannot choose between semantic, keyword, or fuzzy approaches
3. **No weighting control**: Clients cannot adjust the balance between different search methods
4. **Disconnected implementations**: Viz pane uses different search algorithms than MCP tools
5. **Limited flexibility**: No way to optimize search for different use cases (exact match vs. conceptual similarity)
### User Needs
Different search scenarios require different algorithms:
- **Exact match queries**: "Find note titled 'Q1 Budget'" → keyword search preferred
- **Conceptual queries**: "What are my goals for next quarter?" → semantic search preferred
- **Typo-tolerant queries**: "Find note about kuberntes" → fuzzy search needed
- **Balanced queries**: "Find documentation about API endpoints" → hybrid search optimal
Additionally, users need a **testing interface** (viz pane) to:
- Experiment with different search algorithms on their own documents
- Visualize search results and algorithm behavior
- Tune weights for optimal results
- Understand which algorithm works best for their queries
### Technical Requirements
1. **Unified interface**: Single MCP tool supporting multiple algorithms
2. **Client control**: MCP clients specify algorithm and weights via tool parameters
3. **Backward compatibility**: Existing `nc_semantic_search()` behavior preserved
4. **Shared implementation**: Viz pane and MCP tools use identical search algorithms
5. **User accessibility**: Viz pane available to all logged-in users with vector sync enabled
6. **Performance**: Minimal overhead for algorithm selection
## Decision
We will implement a **unified multi-algorithm search architecture** with the following components:
### Architecture Diagram
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ MCP Client / User Browser │
│ │
│ ┌──────────────────────────┐ ┌──────────────────────────────────┐ │
│ │ MCP Tool Call │ │ Viz Pane (Browser UI) │ │
│ │ │ │ │ │
│ │ nc_semantic_search( │ │ - Algorithm selector dropdown │ │
│ │ query="kubernetes", │ │ - Weight adjustment sliders │ │
│ │ algorithm="hybrid", │ │ - Interactive 2D scatter plot │ │
│ │ semantic_weight=0.5, │ │ - Side-by-side comparison │ │
│ │ keyword_weight=0.3, │ │ - Real-time search testing │ │
│ │ fuzzy_weight=0.2 │ │ │ │
│ │ ) │ │ │ │
│ └───────────┬──────────────┘ └────────────┬─────────────────────┘ │
└──────────────┼─────────────────────────────────────┼────────────────────────┘
│ │
│ MCP Protocol │ HTTPS (htmx)
│ │
┌──────────────▼──────────────────────────────────────▼────────────────────────┐
│ MCP Server (/app endpoint) │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ Unified Search Interface (server/semantic.py) │ │
│ │ │ │
│ │ @mcp.tool() nc_semantic_search(algorithm, weights...) │ │
│ │ ├─ Validate parameters (weights sum ≤1.0) │ │
│ │ ├─ Dispatch to algorithm selector │ │
│ │ └─ Return ranked SearchResponse │ │
│ └────────────────────────────┬────────────────────────────────────────────┘ │
│ │ │
│ ┌────────────────────────────▼────────────────────────────────────────────┐ │
│ │ Algorithm Dispatcher (search/algorithms.py) │ │
│ │ │ │
│ │ if algorithm == "semantic": → semantic.py │ │
│ │ if algorithm == "keyword": → keyword.py │ │
│ │ if algorithm == "fuzzy": → fuzzy.py │ │
│ │ if algorithm == "hybrid": → hybrid.py (RRF fusion) │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ semantic.py │ │ keyword.py │ │ fuzzy.py │ │
│ │ │ │ │ │ │ │
│ │ • Query Qdrant │ │ • Token matching │ │ • Char overlap │ │
│ │ • Cosine dist │ │ • Title weight │ │ • 70% threshold │ │
│ │ • Score ≥0.7 │ │ • ADR-001 logic │ │ • Simple impl │ │
│ └────────┬─────────┘ └────────┬─────────┘ └────────┬─────────┘ │
│ │ │ │ │
│ └─────────────────────┼──────────────────────┘ │
│ │ │
│ ┌──────────────────────────────▼──────────────────────────────────────────┐ │
│ │ hybrid.py (Reciprocal Rank Fusion) │ │
│ │ │ │
│ │ 1. Run algorithms in parallel (semantic, keyword, fuzzy) │ │
│ │ 2. Collect ranked results from each │ │
│ │ 3. Apply RRF formula: score = weight / (k + rank) │ │
│ │ 4. Combine scores across algorithms │ │
│ │ 5. Re-rank by combined score │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
└───────────────────────────────────┬───────────────────────────────────────────┘
┌───────────────┴───────────────┐
│ │
┌──────────▼──────────┐ ┌─────────▼────────────┐
│ Qdrant Vector DB │ │ Nextcloud APIs │
│ │ │ │
│ • Vector search │ │ • Access verification│
│ • user_id filter │ │ • Full metadata fetch│
│ • Score threshold │ │ • Permission checks │
│ • 768-dim embeddings│ │ │
└─────────────────────┘ └──────────────────────┘
```
### Data Flow
#### MCP Tool Request
```
1. Client calls nc_semantic_search(query, algorithm="hybrid", weights...)
2. Server validates parameters (weights sum ≤1.0)
3. Dispatcher routes to hybrid.py
4. Hybrid search runs semantic, keyword, fuzzy in parallel
5. RRF combines results with weighted scores
6. Access verification via Nextcloud API
7. Return ranked SearchResponse to client
```
#### Viz Pane Request (Server-Side Processing)
```
1. User navigates to /app (Vector Visualization tab)
2. Browser loads vector-viz fragment via htmx
3. User enters query and adjusts algorithm/weights
4. htmx sends request to /app/vector-viz endpoint
5. Server executes search via search/algorithms.py:
- Filters by user_id (multi-tenant security)
- Applies selected algorithm (semantic/keyword/fuzzy/hybrid)
- Filters by document type (notes/files/calendar/contacts)
- Retrieves matching results + metadata
6. Server performs PCA reduction (768-dim → 2D):
- Converts matching results to 2D coordinates
- Only sends coordinates + metadata (not full vectors)
- Dramatically reduces bandwidth (e.g., 768 floats → 2 floats per doc)
7. Server returns JSON: {results: [...], coordinates_2d: [...], stats: {...}}
8. Browser receives lightweight response
9. Plotly.js renders interactive scatter plot
10. Matching results highlighted (blue), non-matches grayed (40% opacity)
```
**Performance Benefits of Server-Side Processing**:
- **Bandwidth reduction**: ~384x less data (2 floats vs 768 floats per document)
- **Client efficiency**: Browser only handles visualization, not computation
- **Scalability**: Can visualize 10,000+ documents without client-side lag
- **Security**: Raw vectors never leave server
- **Consistency**: Same search logic as MCP tool (no drift)
### 1. Core Search Algorithms
Four search algorithms will be available:
#### a) Semantic Search (Vector Similarity)
- **Method**: Cosine distance in 768-dimensional embedding space
- **Implementation**: Qdrant `query_points` with user_id filtering
- **Use case**: Conceptual queries, finding related content
- **Current status**: Implemented in `nextcloud_mcp_server/server/semantic.py`
#### b) Keyword Search (Token-Based)
- **Method**: Token matching with weighted scoring (from ADR-001)
- **Implementation**: Title matches weighted 3x higher than content
- **Use case**: Exact phrase matching, known titles
- **Current status**: Designed in ADR-001, not implemented
#### c) Fuzzy Search (Character Overlap)
- **Method**: Simple character-based similarity (70% threshold)
- **Implementation**: Character set comparison (current viz pane approach)
- **Use case**: Typo tolerance, approximate matching
- **Current status**: Implemented in viz pane only
#### d) Hybrid Search (Multi-Algorithm Fusion)
- **Method**: Reciprocal Rank Fusion (RRF) from ADR-003
- **Implementation**: Parallel execution + score combination
- **Use case**: Balanced queries, general-purpose search
- **Current status**: Designed in ADR-003, not implemented
### 2. Unified MCP Tool Interface
```python
@mcp.tool()
@require_scopes("semantic:read")
async def nc_semantic_search(
query: str,
ctx: Context,
limit: int = 10,
score_threshold: float = 0.7,
algorithm: Literal["semantic", "keyword", "fuzzy", "hybrid"] = "hybrid",
semantic_weight: float = 0.5,
keyword_weight: float = 0.3,
fuzzy_weight: float = 0.2,
) -> SearchResponse:
"""
Search Nextcloud content using configurable algorithms.
Args:
query: Natural language search query
ctx: MCP context for authentication
limit: Maximum results to return
score_threshold: Minimum similarity score (semantic/hybrid only)
algorithm: Search algorithm to use
semantic_weight: Weight for semantic results (hybrid only, default: 0.5)
keyword_weight: Weight for keyword results (hybrid only, default: 0.3)
fuzzy_weight: Weight for fuzzy results (hybrid only, default: 0.2)
Returns:
Ranked search results with scores and excerpts
"""
```
**Key decisions**:
- **Single tool name**: Keep `nc_semantic_search` for backward compatibility
- **Algorithm parameter**: Explicit selection via enum
- **Weight parameters**: Client-configurable, only apply to hybrid mode
- **Validation**: Weights must sum to ≤1.0, enforced server-side
- **Defaults**: Hybrid mode with balanced weights (semantic 50%, keyword 30%, fuzzy 20%)
### 3. Shared Algorithm Implementation
Extract search algorithms into reusable module:
```
nextcloud_mcp_server/
├── search/
│ ├── __init__.py
│ ├── algorithms.py # Core search implementations
│ ├── semantic.py # Vector similarity search
│ ├── keyword.py # Token-based search (ADR-001)
│ ├── fuzzy.py # Character overlap search
│ └── hybrid.py # RRF fusion (ADR-003)
└── server/
└── semantic.py # MCP tool wrapper
```
**Benefits**:
- Viz pane and MCP tools share identical implementations
- Testable in isolation
- Easy to add new algorithms (e.g., BM25, neural reranking)
- Clear separation of concerns
### 4. Viz Pane Integration
Update viz pane (`nextcloud_mcp_server/auth/userinfo_routes.py`) to:
1. **Use shared algorithms**: Import from `search/algorithms.py`
2. **Server-side filtering**: All search and filtering operations happen server-side
- Query execution via shared search backend
- Document type filtering (notes, files, calendar, contacts)
- User ID filtering for multi-tenant security
- Only matching results + metadata sent to client
- Reduces bandwidth and improves performance
3. **PCA reduction**: Server performs dimensionality reduction (768-dim → 2D)
- Only 2D coordinates sent to browser for visualization
- Dramatically reduces data transfer vs sending full vectors
- Enables visualization of large document collections
4. **User accessibility**: Available to all users with vector sync enabled
5. **Security**: Filter results by `user_id` (only show user's own documents)
6. **Interactive testing**: Allow users to:
- Select algorithm type
- Adjust weights (hybrid mode)
- Compare results across algorithms
- Visualize result distribution in 2D space
#### Viz Pane UI Components
```
┌────────────────────────────────────────────────────────────────────────┐
│ Vector Visualization [Status] │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Search Configuration │ │
│ │ │ │
│ │ Query: [_______________________________________________] [Search]│ │
│ │ │ │
│ │ Algorithm: [Hybrid ▼] [Semantic] [Keyword] [Fuzzy] │ │
│ │ │ │
│ │ Weights (Hybrid Mode): │ │
│ │ Semantic: [========50========] 0.5 │ │
│ │ Keyword: [======30====== ] 0.3 │ │
│ │ Fuzzy: [====20==== ] 0.2 │ │
│ │ │ │
│ │ Document Types: ☑ Notes ☑ Files ☑ Calendar ☑ Contacts │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Vector Space Visualization (PCA 2D Projection) │ │
│ │ │ │
│ │ ▲ │ │
│ │ PC2 │ ● ● ● 🔵 Matching results (full opacity) │ │
│ │ │ ● ● ● ⚪ Non-matching results (40% opacity) │ │
│ │ │ 🔵 ● ● │ │
│ │ │ ● 🔵 ● Hover: Show document title + excerpt │ │
│ │ │ ● ● 🔵 ● Click: Open document in Nextcloud │ │
│ │ ────┼──●─🔵──●─●────► PC1 │ │
│ │ │ ● ● ● │ │
│ │ │ 🔵 ● ● Explained Variance: │ │
│ │ │ ● ● ● PC1: 23.4% | PC2: 18.7% │ │
│ │ │ ● ● │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Search Results (12 matching documents) │ │
│ │ │ │
│ │ 🔵 Kubernetes Setup Guide Score: 0.87 │ │
│ │ "...configure kubectl to connect to cluster..." │ │
│ │ [Open in Nextcloud] │ │
│ │ │ │
│ │ 🔵 Container Orchestration Notes Score: 0.82 │ │
│ │ "...deployment strategies for kubernetes..." │ │
│ │ [Open in Nextcloud] │ │
│ │ │ │
│ │ 🔵 K8s Troubleshooting Score: 0.79 │ │
│ │ "...common kuberntes errors and solutions..." │ │
│ │ [Open in Nextcloud] │ │
│ │ │ │
│ │ [Show More Results...] │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Algorithm Performance Comparison │ │
│ │ │ │
│ │ Algorithm │ Results │ Avg Score │ Time (ms) │ Precision │ │
│ │ ─────────────┼─────────┼───────────┼───────────┼─────────── │ │
│ │ Semantic │ 45 │ 0.78 │ 145ms │ ████░ 0.82 │ │
│ │ Keyword │ 23 │ 0.91 │ 42ms │ ███░░ 0.67 │ │
│ │ Fuzzy │ 67 │ 0.72 │ 89ms │ ██░░░ 0.45 │ │
│ │ Hybrid (RRF) │ 52 │ 0.84 │ 198ms │ █████ 0.89 │ │
│ └──────────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────────────┘
```
**Key UI Features**:
1. **Search Input**: Real-time query testing with instant visualization
2. **Algorithm Selector**: Dropdown + quick-select buttons
3. **Weight Sliders**: Visual adjustment with live preview (hybrid mode only)
4. **Document Type Filters**: Checkboxes for notes, files, calendar, contacts
5. **2D Scatter Plot**: Interactive Plotly.js visualization
- Blue dots = matching documents (full opacity)
- Gray dots = non-matching documents (40% opacity)
- Hover = show title + excerpt tooltip
- Click = open document in Nextcloud
- Zoom/pan controls for exploration
6. **Results Panel**: Ranked list with scores and excerpts
7. **Performance Table**: Compare algorithm speed and accuracy
8. **Explained Variance**: Show how much information PCA preserves
**Technology Stack**:
- **Frontend**: htmx for dynamic loading, Alpine.js for reactivity
- **Visualization**: Plotly.js for interactive scatter plots
- **Styling**: Tailwind CSS (consistent with existing /app UI)
- **Backend**: Shared `search/algorithms.py` implementation
### 5. Reciprocal Rank Fusion (RRF) for Hybrid Search
Following ADR-003's design:
```python
def reciprocal_rank_fusion(
results: dict[str, list[SearchResult]],
weights: dict[str, float],
k: int = 60
) -> list[SearchResult]:
"""
Combine multiple ranked result lists using RRF.
Args:
results: Dict of algorithm_name -> ranked results
weights: Dict of algorithm_name -> weight (0-1)
k: RRF constant (default: 60, standard value)
Returns:
Combined and re-ranked results
"""
scores = defaultdict(float)
for algo_name, algo_results in results.items():
weight = weights.get(algo_name, 0.0)
for rank, result in enumerate(algo_results, start=1):
# RRF formula: 1 / (k + rank)
rrf_score = weight / (k + rank)
scores[result.doc_id] += rrf_score
# Sort by combined score, return top results
return sorted(scores.items(), key=lambda x: x[1], reverse=True)
```
**RRF properties**:
- **Rank-based**: Uses position, not raw scores (handles score scale differences)
- **Proven effective**: Standard approach in information retrieval
- **Configurable**: `k` parameter controls rank decay (default: 60)
- **Weight support**: Allows algorithm-specific importance
## Implementation Plan
### Phase 1: Extract and Unify Algorithms (Week 1)
1. Create `nextcloud_mcp_server/search/` module
2. Implement `algorithms.py` with base interface
3. Extract semantic search logic from `server/semantic.py`
4. Implement keyword search from ADR-001 design
5. Extract fuzzy search from viz pane
6. Implement RRF hybrid search from ADR-003
7. Add comprehensive unit tests for each algorithm
### Phase 2: Update MCP Tool (Week 1-2)
1. Add `algorithm` parameter to `nc_semantic_search()`
2. Add weight parameters (`semantic_weight`, etc.)
3. Implement algorithm dispatcher
4. Add parameter validation (weights sum ≤1.0)
5. Update response model to include algorithm metadata
6. Maintain backward compatibility (default: hybrid)
7. Add integration tests for all algorithm modes
### Phase 3: Update Viz Pane (Week 2)
**Critical: All processing must happen server-side**
1. **Remove client-side search filtering**
- Delete JavaScript-based keyword/fuzzy matching
- Remove client-side document type filtering
- No search logic in browser
2. **Implement server-side endpoint** (`/app/vector-viz`)
- Accept query, algorithm, weights, doc_type filters
- Execute search via `search/algorithms.py`
- Filter results by user_id (security)
- Perform PCA reduction (768-dim → 2D)
- Return JSON with 2D coordinates + metadata only
3. **Update frontend**
- htmx form submission to `/app/vector-viz`
- Algorithm selector dropdown
- Weight adjustment sliders (htmx updates on change)
- Document type checkboxes
- Plotly.js visualization of server response
4. **Performance optimization**
- Limit results to user's documents only
- Cache PCA transformation (invalidate on new vectors)
- Stream large result sets if needed
- Add loading indicators for server processing
### Phase 4: Documentation and Testing (Week 2-3)
1. Update MCP tool documentation
2. Add algorithm selection guide
3. Document weight tuning recommendations
4. Add end-to-end tests (MCP + viz pane)
5. Performance benchmarks for each algorithm
6. Update CLAUDE.md with search patterns
## Consequences
### Positive
1. **Flexibility**: MCP clients can optimize search for their use case
2. **Unified implementation**: Single source of truth for search algorithms
3. **User empowerment**: Viz pane enables query testing and tuning
4. **Backward compatible**: Existing semantic search behavior preserved
5. **Extensible**: Easy to add new algorithms (BM25, neural reranking)
6. **Testable**: Each algorithm can be unit tested independently
7. **Standards-based**: RRF is proven in production systems
### Negative
1. **Complexity**: More parameters for clients to understand
2. **API surface**: Larger tool signature (8 parameters)
3. **Performance**: Hybrid search requires multiple queries
4. **Validation overhead**: Weight validation adds processing
5. **Documentation burden**: Need to explain when to use each algorithm
### Neutral
1. **Weight defaults**: May need tuning based on user feedback
2. **Algorithm performance**: Will vary by content type and query
3. **Viz pane adoption**: Unknown if users will utilize testing interface
## Alternatives Considered
### Alternative 1: Separate Tools Per Algorithm
```python
@mcp.tool()
async def nc_semantic_search(query: str, ctx: Context, ...) -> SearchResponse:
"""Pure vector similarity search."""
@mcp.tool()
async def nc_keyword_search(query: str, ctx: Context, ...) -> SearchResponse:
"""Pure keyword matching."""
@mcp.tool()
async def nc_hybrid_search(query: str, ctx: Context, weights: dict, ...) -> SearchResponse:
"""Hybrid search with weights."""
```
**Rejected because**:
- API proliferation (3+ tools instead of 1)
- Harder to discover capabilities
- Backward compatibility issues
- DRY violation (repeated parameters)
### Alternative 2: Server-Wide Configuration Only
```python
# .env configuration
SEARCH_ALGORITHM=hybrid
SEMANTIC_WEIGHT=0.5
KEYWORD_WEIGHT=0.3
FUZZY_WEIGHT=0.2
```
**Rejected because**:
- No per-query flexibility
- MCP clients cannot optimize for different tasks
- Requires server restart for changes
- User's requirement: "expose a way for users to override the default weights"
### Alternative 3: Production-Grade Fuzzy (Levenshtein/RapidFuzz)
**Rejected because**:
- Adds external dependency
- Simple character overlap performs adequately
- Can always upgrade later if needed
- User's preference: "Keep simple character overlap"
## Related ADRs
- **ADR-001**: Enhanced Note Search (keyword algorithm design)
- **ADR-003**: Vector Database and Semantic Search (hybrid search + RRF design)
- **ADR-007**: Background Vector Sync (semantic search implementation)
- **ADR-008**: MCP Sampling for RAG (uses semantic search results)
- **ADR-009**: Semantic Search OAuth Scope (security model)
- **ADR-011**: Improving Semantic Search Quality (mentions future "ADR-013" for hybrid search)
**This ADR supersedes**:
- ADR-011's placeholder for "ADR-013: Hybrid Search"
**This ADR implements**:
- ADR-003's hybrid search design (previously unimplemented)
- ADR-001's keyword search design (previously unimplemented)
## References
- **Reciprocal Rank Fusion**: Cormack, G. V., Clarke, C. L., & Buettcher, S. (2009). "Reciprocal rank fusion outperforms condorcet and individual rank learning methods." SIGIR '09.
- **Vector Search**: Malkov, Y. A., & Yashunin, D. A. (2018). "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs." TPAMI.
- **Hybrid Search Best Practices**: Qdrant documentation on hybrid search patterns
- **MCP Protocol**: Model Context Protocol specification for tool design
## Implementation Notes
### Weight Validation
```python
def validate_weights(
semantic_weight: float,
keyword_weight: float,
fuzzy_weight: float
) -> None:
"""Validate hybrid search weights."""
if semantic_weight < 0 or keyword_weight < 0 or fuzzy_weight < 0:
raise ValueError("Weights must be non-negative")
total = semantic_weight + keyword_weight + fuzzy_weight
if total > 1.0:
raise ValueError(f"Weights sum to {total:.2f}, must be ≤1.0")
if total == 0.0:
raise ValueError("At least one weight must be > 0")
```
### Backward Compatibility
The default behavior (`algorithm="hybrid"` with balanced weights) provides better results than current pure semantic search, while maintaining the same tool name and signature structure. Existing clients will automatically benefit from hybrid search without code changes.
### Performance Considerations
- **Semantic search**: ~50-200ms (vector DB query)
- **Keyword search**: ~10-50ms (in-memory token matching)
- **Fuzzy search**: ~20-100ms (character comparison)
- **Hybrid search**: ~100-300ms (parallel execution + fusion)
Parallel execution of algorithms minimizes hybrid search latency.
### Security Model
All algorithms respect the same security boundaries:
1. **User filtering**: Qdrant queries filter by `user_id`
2. **Access verification**: Results verified via Nextcloud API
3. **OAuth scope**: `semantic:read` required for all algorithms
4. **Viz pane**: Shows only current user's documents
## Success Metrics
1. **Adoption**: % of MCP clients using algorithm parameter
2. **Performance**: Search latency percentiles (p50, p95, p99)
3. **Quality**: User satisfaction with result relevance
4. **Viz pane usage**: % of users accessing testing interface
5. **Weight distribution**: Most common weight configurations
## Future Enhancements
1. **Additional algorithms**: BM25, TF-IDF, neural reranking
2. **Auto-tuning**: Learn optimal weights per user
3. **Query analysis**: Automatic algorithm selection based on query
4. **Cross-app search**: Extend beyond notes to calendar, files, etc.
5. **Feedback loop**: Use click-through rate to improve weights
+254
View File
@@ -0,0 +1,254 @@
## ADR-013: RAG Evaluation Testing Framework
**Status:** Proposed
**Date:** 2025-11-15
### Context
The `nc_semantic_search_answer` tool implements a Retrieval-Augmented Generation (RAG) system where:
1. **Retrieval**: Vector sync pipeline indexes Nextcloud documents (notes, calendar, contacts, etc.) into a vector database
2. **Generation**: MCP client's LLM synthesizes answers from retrieved documents via MCP sampling (ADR-008)
We need a testing framework to evaluate RAG system performance and identify whether failures occur in retrieval (wrong documents found) or generation (poor answer quality). This framework must use industry-standard evaluation methodologies while remaining practical to implement and maintain.
To establish a baseline, we will use the **BeIR/nfcorpus** dataset (medical/biomedical corpus) with ~5,000 documents and established query/answer pairs.
Homepage: https://www.cl.uni-heidelberg.de/statnlpgroup/nfcorpus/
Download: https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/nfcorpus.zip
### Decision
We will implement a **two-part evaluation framework** that independently tests retrieval and generation quality using pytest fixtures.
#### In Scope
**1. Retrieval Evaluation**
Tests the vector sync/embedding pipeline's ability to find relevant documents.
- **Metric: Context Recall** (Did we retrieve documents containing the answer?)
- **Evaluation method**: Heuristic - Check if ground-truth document IDs appear in top-k retrieval results
- **Test**: Query → Semantic search → Assert expected doc IDs present
**2. Generation Evaluation**
Tests the MCP client LLM's ability to synthesize correct answers from retrieved context.
- **Metric: Answer Correctness** (Is the generated answer factually correct?)
- **Evaluation method**: LLM-as-judge - Compare RAG answer against ground-truth answer
- **Test**: Query → `nc_semantic_search_answer` → LLM evaluates answer vs. ground truth (binary true/false)
#### Out of Scope (Initial Implementation)
- **Context Relevance/Precision**: Measuring irrelevant documents in retrieval results
- **Faithfulness/Groundedness**: Detecting hallucinations not supported by retrieved context
- **Answer Relevance**: Whether answer addresses the specific question asked
- **Out-of-Scope Handling**: Testing "I don't know" responses when answer isn't in context
- **Continuous benchmarking**: Automated tracking of metric trends over time
- **Custom domain datasets**: Production-specific test data (medical corpus used initially)
These remain valuable for future iterations but add complexity beyond our initial goals.
#### Implementation
**Test Structure**
Location: `tests/rag_evaluation/`
- `test_retrieval_quality.py` - Retrieval evaluation tests
- `test_generation_quality.py` - Generation evaluation tests
- `conftest.py` - Fixtures for test data, MCP clients, and evaluation LLMs
**Required Pytest Fixtures**
1. **`nfcorpus_test_data`** (session-scoped)
- Downloads/caches BeIR nfcorpus dataset at runtime
- Loads 5 pre-selected test queries with:
- Query text
- Pre-generated ground-truth answer (from `tests/rag_evaluation/fixtures/ground_truth.json`)
- Expected document IDs (from qrels with score=2)
- Uploads all corpus documents as notes in test Nextcloud instance
- Triggers vector sync to index documents
- Waits for indexing completion
- Returns test case data structure
2. **`mcp_sampling_client`** (session-scoped)
- Creates MCP client that supports sampling
- Configurable LLM provider (ollama or anthropic) via environment:
- `RAG_EVAL_PROVIDER=ollama` (default) or `anthropic`
- `RAG_EVAL_OLLAMA_BASE_URL=http://localhost:11434`
- `RAG_EVAL_OLLAMA_MODEL=llama3.1:8b`
- `RAG_EVAL_ANTHROPIC_API_KEY=sk-...`
- `RAG_EVAL_ANTHROPIC_MODEL=claude-3-5-sonnet-20241022`
- Returns configured MCP client fixture
3. **`evaluation_llm`** (session-scoped)
- Separate LLM instance for evaluation (independent from MCP client)
- Same provider configuration as `mcp_sampling_client`
- Returns callable: `async def evaluate(prompt: str) -> str`
**Test Implementation Examples**
```python
# tests/rag_evaluation/test_retrieval_quality.py
async def test_retrieval_recall(nc_client, nfcorpus_test_data):
"""Test that semantic search retrieves documents containing the answer."""
for test_case in nfcorpus_test_data:
# Perform semantic search (retrieval only, no generation)
results = await nc_client.notes.semantic_search(
query=test_case.query,
limit=10
)
retrieved_doc_ids = {r.document_id for r in results}
expected_doc_ids = set(test_case.expected_document_ids)
# Context Recall: Are expected documents in top-k results?
recall = len(expected_doc_ids & retrieved_doc_ids) / len(expected_doc_ids)
assert recall >= 0.8, f"Recall {recall} below threshold for query: {test_case.query}"
# tests/rag_evaluation/test_generation_quality.py
async def test_answer_correctness(mcp_sampling_client, evaluation_llm, nfcorpus_test_data):
"""Test that RAG system generates factually correct answers."""
for test_case in nfcorpus_test_data:
# Execute full RAG pipeline (retrieval + generation)
result = await mcp_sampling_client.call_tool(
"nc_semantic_search_answer",
arguments={"query": test_case.query, "limit": 5}
)
rag_answer = result["generated_answer"]
# LLM-as-judge evaluation
evaluation_prompt = f"""Compare these two answers and respond with only TRUE or FALSE.
Question: {test_case.query}
Generated Answer: {rag_answer}
Ground Truth Answer: {test_case.ground_truth}
Are these answers semantically equivalent (do they convey the same factual information)?
Respond with only: TRUE or FALSE"""
evaluation_result = await evaluation_llm(evaluation_prompt)
assert evaluation_result.strip().upper() == "TRUE", \
f"Answer mismatch for query: {test_case.query}\nGot: {rag_answer}\nExpected: {test_case.ground_truth}"
```
**Dataset Integration**
The BeIR nfcorpus dataset structure:
- **corpus.jsonl**: 3,633 medical/biomedical documents (articles from PubMed)
- **queries.jsonl**: 3,237 queries (questions)
- **qrels/*.tsv**: Relevance judgments mapping query IDs to document IDs with scores (2=highly relevant, 1=somewhat relevant)
**Important**: The dataset provides relevance judgments (which documents answer which queries) but does NOT include ground truth answers. We must generate synthetic ground truth offline.
**Selected Test Queries** (5 diverse candidates):
1. **PLAIN-2630**: "Alkylphenol Endocrine Disruptors and Allergies" (5 words, 21 highly relevant docs)
2. **PLAIN-2660**: "How Long to Detox From Fish Before Pregnancy?" (8 words, 20 highly relevant docs)
3. **PLAIN-2510**: "Coffee and Artery Function" (4 words, 16 highly relevant docs)
4. **PLAIN-2430**: "Preventing Brain Loss with B Vitamins?" (6 words, 15 highly relevant docs)
5. **PLAIN-2690**: "Chronic Headaches and Pork Tapeworms" (5 words, 14 highly relevant docs)
**Ground Truth Generation** (offline, pre-test):
Ground truth answers will be generated offline using a script that:
1. Loads nfcorpus dataset
2. For each selected query, extracts top 3-5 highly relevant documents
3. Uses an LLM (ollama/anthropic) to synthesize a reference answer
4. Stores ground truth in `tests/rag_evaluation/fixtures/ground_truth.json`
```python
# tools/generate_rag_ground_truth.py
async def generate_ground_truth(query: str, relevant_docs: List[dict], llm: LLMProvider) -> str:
"""Generate synthetic ground truth answer from highly relevant documents."""
context = "\n\n".join([
f"Document {i+1}:\nTitle: {doc['title']}\n{doc['text']}"
for i, doc in enumerate(relevant_docs[:5])
])
prompt = f"""Based on the following documents, provide a comprehensive answer to this question:
Question: {query}
{context}
Provide a factual, well-structured answer that synthesizes information from the documents.
Focus on accuracy and completeness."""
return await llm.generate(prompt, max_tokens=500)
```
**Dataset Loading at Test Runtime** (in `nfcorpus_test_data` fixture):
1. Download nfcorpus dataset (cached in pytest temp directory)
2. Load corpus, queries, and qrels (relevance judgments)
3. Load pre-generated ground truth from `tests/rag_evaluation/fixtures/ground_truth.json`
4. Upload all corpus documents as Nextcloud notes
5. Trigger vector sync to index documents
6. Wait for indexing completion
7. Return test cases with query, ground truth, and expected doc IDs
**LLM Provider Abstraction**
```python
# tests/rag_evaluation/llm_providers.py
class LLMProvider(Protocol):
async def generate(self, prompt: str, max_tokens: int = 100) -> str: ...
class OllamaProvider:
def __init__(self, base_url: str, model: str):
self.base_url = base_url
self.model = model
async def generate(self, prompt: str, max_tokens: int = 100) -> str:
# Use httpx to call Ollama API
...
class AnthropicProvider:
def __init__(self, api_key: str, model: str):
self.client = anthropic.AsyncAnthropic(api_key=api_key)
self.model = model
async def generate(self, prompt: str, max_tokens: int = 100) -> str:
message = await self.client.messages.create(
model=self.model,
max_tokens=max_tokens,
messages=[{"role": "user", "content": prompt}]
)
return message.content[0].text
```
### Consequences
**Positive:**
* **Actionable debugging**: Separate retrieval/generation tests pinpoint failure location
* **Industry-standard metrics**: Context Recall and Answer Correctness are recognized RAG evaluation metrics
* **Simple initial implementation**: Binary LLM evaluation (true/false) is straightforward to implement and interpret
* **Extensible framework**: Easy to add more metrics (faithfulness, relevance) later
* **Standardized benchmark**: nfcorpus provides objective comparison against published RAG systems
* **Hybrid evaluation**: Combines efficiency (heuristics for retrieval) with quality (LLM-as-judge for generation)
* **Provider flexibility**: Supports both local (Ollama) and cloud (Anthropic) LLM evaluation
**Negative:**
* **Medical domain bias**: nfcorpus is medical/biomedical content, may not represent production use cases (personal notes, calendar events, etc.)
* **Manual test execution**: Tests require external LLM access and are not integrated into CI pipeline
* **Limited initial coverage**: Starting with only 5 queries provides limited statistical confidence
* **Evaluation cost**: LLM-as-judge for generation evaluation incurs API costs (Anthropic) or requires local inference (Ollama)
* **Single metric per component**: Initial scope tests only one metric per component, missing other important quality dimensions
* **Synthetic ground truth**: Ground truth answers are LLM-generated, not human-validated, which may introduce evaluation bias
* **Large corpus upload**: Uploading 3,633 documents at test runtime may be slow; caching strategy needed
**Future Work:**
* Expand to 50-100 queries for statistical significance
* Add custom test dataset with production-representative documents (meeting notes, task lists, etc.)
* Implement additional metrics (faithfulness, context relevance, answer relevance)
* Create automated benchmarking dashboard to track metric trends
* Test multi-hop reasoning (synthesis questions requiring multiple documents)
* Evaluate out-of-scope handling ("I don't know" responses)
+153
View File
@@ -0,0 +1,153 @@
# ADR-014: Replace Custom Keyword Search with BM25 Hybrid Search via Qdrant
**Date:** 2025-11-16
**Status:** Implemented
---
### 1. Context
Our RAG application currently employs two separate retrieval mechanisms:
1. **Dense (Semantic) Search:** Using vector embeddings stored in our Qdrant database to find semantically similar context.
2. **Keyword Search:** A custom-built fuzzy/character-based search to match-specific keywords, acronyms, and product codes that semantic search often misses.
This dual-system approach has several drawbacks:
* **Poor Relevance:** Our current keyword search is basic (e.g., `LIKE` queries or simple fuzzy matching). It is not as effective as modern full-text search algorithms like BM25.
* **Clunky Fusion:** We lack a robust, principled method to combine the results from the two systems. This leads to disjointed logic in the application layer and suboptimal context being passed to the LLM.
* **Architectural Complexity:** We must maintain two separate search pathways (one to Qdrant, one to the keyword search mechanism), increasing code complexity and maintenance overhead.
Our vector database, **Qdrant**, natively supports **hybrid search** by combining dense vectors with BM25-based **sparse vectors** in a single collection.
### 2. Decision
We will **deprecate and remove** the existing custom keyword/fuzzy search functionality.
We will **replace it by implementing native hybrid search within Qdrant**. This involves:
1. **Modifying the Qdrant Collection:** Updating our collection to support a named sparse vector index configured for BM25.
2. **Updating the Ingestion Pipeline:** For every document chunk, we will generate and upsert *both*:
* Its **dense vector** (from our existing embedding model).
* Its **sparse vector** (generated using a BM25-compatible model, e.g., `Qdrant/bm25` from `fastembed`).
3. **Refactoring Retrieval Logic:** All retrieval calls will be consolidated into a single Qdrant query using the `query_points` endpoint. This query will use the `prefetch` parameter to execute both dense and sparse searches, and Qdrant's built-in **Reciprocal Rank Fusion (RRF)** to automatically merge the results into a single, relevance-ranked list.
4. **Backfilling:** A one-time migration script will be created to generate and add sparse vectors for all existing documents in the Qdrant collection.
---
### 3. Considered Options
#### Option 1: Native Qdrant Hybrid Search (Chosen)
* Use Qdrant's built-in sparse vector and RRF capabilities.
* **Pros:**
* **Consolidated Architecture:** Manages both dense and sparse indexes in one database.
* **No Data Sync Issues:** Updates are atomic. A single `upsert` updates both representations.
* **Built-in Fusion:** RRF is handled natively and efficiently by the database.
* **Superior Relevance:** Replaces our brittle custom search with the industry-standard BM25.
* **Cons:**
* Requires a one-time data backfill which may be time-consuming.
* Adds a new step (sparse vector generation) to the ingestion pipeline.
#### Option 2: External Full-Text Search (e.g., Elasticsearch)
* Keep Qdrant for dense search and add a separate Elasticsearch/OpenSearch cluster for BM25.
* **Pros:**
* Provides a very powerful, dedicated full-text search engine.
* **Cons:**
* **High Complexity:** Introduces a new, stateful service to deploy, manage, and scale.
* **Data Sync Nightmare:** We would be responsible for ensuring that the document IDs and content in Qdrant and Elasticsearch are always perfectly synchronized. This is a major source of bugs.
* **Manual Fusion:** The application would have to query both systems and perform RRF manually.
#### Option 3: Keep Current System
* Make no changes.
* **Pros:**
* No engineering effort required.
* **Cons:**
* Fails to address the known relevance and architectural problems.
* Our RAG application's performance will remain suboptimal, especially for keyword-sensitive queries.
---
### 4. Rationale
**Option 1 is the clear winner.** It directly solves our primary problem (poor keyword matching) by adopting the industry-standard BM25.
Critically, it achieves this while **simplifying** our overall architecture, not complicating it. By leveraging features already present in our existing database (Qdrant), we avoid the massive operational and synchronization overhead of adding a second search system (Option 2).
This decision consolidates our retrieval logic, eliminates the data consistency problem, and moves the complex fusion logic (RRF) from the application layer into the database, where it can be performed more efficiently.
### 5. Consequences
**New Work:**
* **Ingestion:** The data ingestion pipeline must be updated to add the `fastembed` library (or similar), generate sparse vectors, and upsert them to the new named vector field in Qdrant.
* **Retrieval:** The application's retrieval service must be refactored to use the `query_points` endpoint with `prefetch` and `fusion=models.Fusion.RRF`.
* **Migration:** A one-time backfill script must be written and executed to add sparse vectors for all existing documents.
* **Infrastructure:** The Qdrant collection schema must be updated (or re-created) to add the `sparse_vectors_config`.
**Positive:**
* **Improved Accuracy:** Retrieval will be significantly more accurate, handling both semantic and keyword queries robustly.
* **Simplified Code:** The application's retrieval logic will be cleaner and simpler, with one endpoint instead of two.
* **Reduced Maintenance:** We will remove the custom fuzzy-search code, which is brittle and difficult to maintain.
**Negative:**
* The data backfill process will require careful management to avoid downtime.
* Ingestion time will slightly increase due to the extra step of sparse vector generation. This is considered a negligible trade-off for the gains in relevance.
---
### 6. Implementation Notes
**Implementation completed on 2025-11-16**
**Key Changes:**
1. **Dependencies** (pyproject.toml:25):
- Added `fastembed>=0.4.2` for BM25 sparse vector embeddings
- Adjusted `pillow` version constraint to be compatible with fastembed
2. **Qdrant Collection Schema** (nextcloud_mcp_server/vector/qdrant_client.py:113-128):
- Updated to named vectors: `{"dense": VectorParams(...), "sparse": SparseVectorParams(...)}`
- Added sparse vector configuration with BM25 index
- Maintains backward compatibility with existing collections (detects legacy schema)
3. **BM25 Embedding Provider** (nextcloud_mcp_server/embedding/bm25_provider.py):
- Created `BM25SparseEmbeddingProvider` using FastEmbed's `Qdrant/bm25` model
- Implements `encode()` and `encode_batch()` methods
- Returns sparse vectors as `{indices: list[int], values: list[float]}` format
4. **Document Indexing Pipeline** (nextcloud_mcp_server/vector/processor.py:229-255):
- Generates both dense (semantic) and sparse (BM25) embeddings for each document chunk
- Updates `PointStruct` to use named vectors: `vector={"dense": ..., "sparse": ...}`
- Maintains same chunking strategy (512 words, 50-word overlap)
5. **BM25 Hybrid Search Algorithm** (nextcloud_mcp_server/search/bm25_hybrid.py):
- Implements `BM25HybridSearchAlgorithm` using Qdrant's native RRF fusion
- Uses `prefetch` parameter for parallel dense + sparse search
- Applies `fusion=models.Fusion.RRF` for automatic result merging
- Maintains same deduplication and filtering logic as semantic search
6. **MCP Tool Updates** (nextcloud_mcp_server/server/semantic.py:39-68):
- Simplified `nc_semantic_search()` to use BM25 hybrid only
- Removed `algorithm`, `semantic_weight`, `keyword_weight`, `fuzzy_weight` parameters
- Updated default `score_threshold=0.0` for RRF scoring
- Returns `search_method="bm25_hybrid"` in responses
7. **Legacy Algorithm Removal**:
- Deleted `nextcloud_mcp_server/search/keyword.py` (278 lines)
- Deleted `nextcloud_mcp_server/search/fuzzy.py` (220 lines)
- Deleted `nextcloud_mcp_server/search/hybrid.py` (238 lines - custom RRF)
- Updated `nextcloud_mcp_server/search/__init__.py` to export only BM25 hybrid
**Migration Strategy:**
- No migration required (vector sync feature is experimental)
- New documents automatically indexed with both dense + sparse vectors
- Collection re-creation on first startup with updated schema
**Test Results:**
- All unit tests passing (118 passed)
- All integration tests passing (7 semantic search tests)
- Code formatting verified with ruff
**Benefits Realized:**
- ✅ Consolidated architecture (single Qdrant database for both dense + sparse)
- ✅ Native RRF fusion (database-level, more efficient)
- ✅ Industry-standard BM25 (replaces custom keyword search)
- ✅ Simplified codebase (removed 736 lines of legacy code)
- ✅ Better relevance (handles both semantic and keyword queries)
@@ -0,0 +1,380 @@
# ADR-015: Unified Provider Architecture for Embeddings and Text Generation
**Status:** Accepted
**Date:** 2025-01-16
**Deciders:** Development Team
**Related:** ADR-003 (Vector Database), ADR-008 (MCP Sampling), ADR-013 (RAG Evaluation)
## Context
Prior to this refactoring, the codebase had two separate provider systems:
1. **Embedding Providers** (`nextcloud_mcp_server/embedding/`)
- Used `EmbeddingProvider` ABC with methods: `embed()`, `embed_batch()`, `get_dimension()`
- Had auto-detection via `EmbeddingService._detect_provider()`
- Used for semantic search and vector indexing (production)
2. **LLM Providers** (`tests/rag_evaluation/llm_providers.py`)
- Used `LLMProvider` Protocol with method: `generate()`
- Had separate factory function `create_llm_provider()`
- Used only for RAG evaluation tests (not production)
This fragmentation created several problems:
### Problems with Dual Provider Systems
1. **Code Duplication**
- Ollama configuration appeared in both `embedding/service.py` and `tests/rag_evaluation/llm_providers.py`
- Similar provider detection logic in multiple places
- Separate singleton patterns for each system
2. **Limited Extensibility**
- Hard-coded provider detection in `EmbeddingService._detect_provider()`
- No support for providers that offer both capabilities (like Bedrock)
- Adding new providers required modifying multiple files
3. **Inconsistent Patterns**
- BM25 provider didn't follow `EmbeddingProvider` ABC
- Different method names across providers (`embed` vs `encode`)
- ABC vs Protocol for type checking
4. **Difficult Scaling**
- Adding Amazon Bedrock (our third provider) would exacerbate all issues
- No clear path for future providers (OpenAI, Cohere, etc.)
### Amazon Bedrock Requirements
Bedrock naturally supports **both** embeddings and text generation:
- **Embeddings**: `amazon.titan-embed-text-v1/v2`, `cohere.embed-*`
- **Text Generation**: `anthropic.claude-*`, `meta.llama3-*`, `amazon.titan-text-*`
- **Unified API**: Single `invoke_model()` method via bedrock-runtime
This made it the perfect opportunity to establish a unified provider architecture.
## Decision
We refactored the provider infrastructure to use a **unified Provider ABC** with optional capabilities:
### 1. Unified Provider Interface
**New Structure:**
```
nextcloud_mcp_server/providers/
├── __init__.py
├── base.py # Provider ABC with optional capabilities
├── registry.py # Auto-detection and factory
├── ollama.py # Supports both embedding + generation
├── anthropic.py # Generation only
├── bedrock.py # Supports both embedding + generation
└── simple.py # Embedding only (testing fallback)
```
**Base Class (`providers/base.py`):**
```python
class Provider(ABC):
@property
@abstractmethod
def supports_embeddings(self) -> bool:
"""Whether this provider supports embedding generation."""
pass
@property
@abstractmethod
def supports_generation(self) -> bool:
"""Whether this provider supports text generation."""
pass
@abstractmethod
async def embed(self, text: str) -> list[float]:
"""Generate embedding (raises NotImplementedError if not supported)."""
pass
@abstractmethod
async def embed_batch(self, texts: list[str]) -> list[list[float]]:
"""Generate batch embeddings (raises NotImplementedError if not supported)."""
pass
@abstractmethod
def get_dimension(self) -> int:
"""Get embedding dimension (raises NotImplementedError if not supported)."""
pass
@abstractmethod
async def generate(self, prompt: str, max_tokens: int = 500) -> str:
"""Generate text (raises NotImplementedError if not supported)."""
pass
@abstractmethod
async def close(self) -> None:
"""Close provider and release resources."""
pass
```
### 2. Provider Registry
**Auto-Detection Priority** (`providers/registry.py`):
```python
class ProviderRegistry:
@staticmethod
def create_provider() -> Provider:
# 1. Bedrock (AWS_REGION or BEDROCK_*_MODEL)
# 2. Ollama (OLLAMA_BASE_URL)
# 3. Simple (fallback)
```
**Environment Variables:**
**Bedrock:**
- `AWS_REGION`: AWS region (e.g., "us-east-1")
- `AWS_ACCESS_KEY_ID`: AWS access key (optional, uses credential chain)
- `AWS_SECRET_ACCESS_KEY`: AWS secret key (optional)
- `BEDROCK_EMBEDDING_MODEL`: Model ID for embeddings (e.g., "amazon.titan-embed-text-v2:0")
- `BEDROCK_GENERATION_MODEL`: Model ID for text generation (e.g., "anthropic.claude-3-sonnet-20240229-v1:0")
**Ollama:**
- `OLLAMA_BASE_URL`: Ollama API base URL (e.g., "http://localhost:11434")
- `OLLAMA_EMBEDDING_MODEL`: Model for embeddings (default: "nomic-embed-text")
- `OLLAMA_GENERATION_MODEL`: Model for text generation (e.g., "llama3.2:1b")
- `OLLAMA_VERIFY_SSL`: Verify SSL certificates (default: "true")
**Simple (no configuration, fallback):**
- `SIMPLE_EMBEDDING_DIMENSION`: Embedding dimension (default: 384)
### 3. Backward Compatibility
**Old Code Continues to Work:**
```python
# Old way (still works)
from nextcloud_mcp_server.embedding import get_embedding_service
service = get_embedding_service() # Returns singleton Provider
embeddings = await service.embed_batch(texts)
```
**New Way (recommended):**
```python
# New way (cleaner)
from nextcloud_mcp_server.providers import get_provider
provider = get_provider() # Returns singleton Provider
embeddings = await provider.embed_batch(texts)
# Can also use generation if provider supports it
if provider.supports_generation:
text = await provider.generate("prompt")
```
**Migration Path:**
- `embedding/service.py` now wraps `providers.get_provider()` for compatibility
- `tests/rag_evaluation/llm_providers.py` now uses unified providers
- Old imports still work, marked as deprecated in docstrings
### 4. Amazon Bedrock Implementation
**Features:**
- Supports both embeddings and text generation
- Model-specific request/response handling for:
- Titan Embed (amazon.titan-embed-text-*)
- Cohere Embed (cohere.embed-*)
- Claude (anthropic.claude-*)
- Llama (meta.llama3-*)
- Titan Text (amazon.titan-text-*)
- Mistral (mistral.*)
- Uses boto3 bedrock-runtime client
- Graceful degradation if boto3 not installed
- Async implementation matching existing patterns
**Model-Specific Handling:**
```python
# Bedrock embedding request (Titan)
{"inputText": text}
# Bedrock generation request (Claude)
{
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": max_tokens,
"temperature": 0.7,
"messages": [{"role": "user", "content": prompt}]
}
```
## Consequences
### Positive
1. **Sustainable Provider Additions**
- New providers only need to implement `Provider` ABC
- Auto-detection via environment variables
- No modifications to existing code required
2. **Code Consolidation**
- Single provider interface instead of two
- Unified configuration pattern
- Eliminated duplication
3. **Better Extensibility**
- Providers can support one or both capabilities
- Clear capability detection via properties
- Registry pattern simplifies auto-detection
4. **Improved Testing**
- RAG evaluation can use any provider (Ollama, Anthropic, Bedrock)
- Comprehensive unit tests for all providers
- Mocked boto3 tests for Bedrock
5. **Production-Ready Bedrock Support**
- Full embedding and generation support
- Multiple model families supported
- AWS credential chain integration
### Neutral
1. **Optional Boto3 Dependency**
- boto3 is dev dependency only (not required for core functionality)
- Bedrock provider gracefully fails if boto3 not installed
- Users who want Bedrock must `pip install boto3`
2. **Capability Properties**
- All providers must implement capability properties
- Methods raise `NotImplementedError` if capability not supported
- Clear error messages guide users to alternatives
### Negative
1. **Migration Effort**
- Existing code must be migrated to new imports (optional, backward compatible)
- Documentation needs updating
- Users must learn new environment variables
2. **Increased Complexity**
- Provider base class has more methods (embedding + generation)
- More environment variables to configure
- Capability detection adds runtime checks
## Implementation
### Files Created
**New Provider Infrastructure:**
- `nextcloud_mcp_server/providers/__init__.py`
- `nextcloud_mcp_server/providers/base.py`
- `nextcloud_mcp_server/providers/registry.py`
- `nextcloud_mcp_server/providers/ollama.py`
- `nextcloud_mcp_server/providers/anthropic.py`
- `nextcloud_mcp_server/providers/bedrock.py`
- `nextcloud_mcp_server/providers/simple.py`
**Tests:**
- `tests/unit/providers/__init__.py`
- `tests/unit/providers/test_bedrock.py` (9 unit tests)
**Documentation:**
- `docs/ADR-015-unified-provider-architecture.md` (this file)
### Files Modified
**Backward Compatibility:**
- `nextcloud_mcp_server/embedding/service.py` - Now wraps `get_provider()`
- `tests/rag_evaluation/llm_providers.py` - Uses unified providers
**Dependencies:**
- `pyproject.toml` - Added `boto3>=1.35.0` to dev dependencies
### Testing Results
**Unit Tests:** 127 passed (including 9 new Bedrock tests)
**Type Checking:** All checks passed (ty)
**Linting:** All checks passed (ruff)
**Backward Compatibility:** Verified - existing embedding tests work
## Alternatives Considered
### Alternative 1: Keep Separate Provider Systems
**Pros:**
- No refactoring needed
- Simpler short-term
**Cons:**
- Bedrock would need to be implemented twice
- Continued code duplication
- No long-term scalability
**Decision:** Rejected - technical debt would continue to grow
### Alternative 2: Separate Embedding and Generation Providers
Use composition instead of unified interface:
```python
class CombinedProvider:
def __init__(self, embedding: EmbeddingProvider, generation: LLMProvider):
self.embedding = embedding
self.generation = generation
```
**Pros:**
- Clearer separation of concerns
- Simpler individual providers
**Cons:**
- Bedrock and Ollama naturally do both - artificial separation
- More complex configuration (two providers to configure)
- More boilerplate code
**Decision:** Rejected - unified interface better matches provider capabilities
### Alternative 3: Plugin System
Dynamic provider registration via entry points:
```python
# setup.py
entry_points={
'nextcloud_mcp.providers': [
'ollama = nextcloud_mcp_server.providers.ollama:OllamaProvider',
'bedrock = nextcloud_mcp_server.providers.bedrock:BedrockProvider',
]
}
```
**Pros:**
- Most extensible
- Third-party providers possible
**Cons:**
- Over-engineered for current needs
- Added complexity
- No immediate benefit
**Decision:** Deferred - can add later if needed
## Future Work
1. **Additional Providers**
- OpenAI (embeddings + generation)
- Cohere (embeddings + generation)
- Google Vertex AI
- Azure OpenAI
2. **Provider Features**
- Streaming generation support
- Batch API optimization (when available)
- Model-specific optimizations
- Cost tracking and metrics
3. **Configuration Improvements**
- Provider profiles (development, production)
- Model aliasing (e.g., "small", "large")
- Fallback provider chains
4. **Testing**
- Integration tests with real Bedrock endpoints
- Performance benchmarking across providers
- Cost comparison analysis
## References
- [boto3 Bedrock Runtime Documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime.html)
- [Amazon Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html)
- ADR-003: Vector Database and Semantic Search
- ADR-008: MCP Sampling for Semantic Search
- ADR-013: RAG Evaluation Framework
+338
View File
@@ -0,0 +1,338 @@
# Amazon Bedrock Setup Guide
This guide covers how to configure the Nextcloud MCP Server to use Amazon Bedrock for embeddings and text generation.
## Prerequisites
1. **AWS Account** with access to Amazon Bedrock
2. **boto3 library** installed: `pip install boto3` or `uv sync --group dev`
3. **Model Access** - Request access to models in AWS Bedrock console
## Required AWS Permissions
### IAM Policy for Bedrock Access
The AWS IAM user or role needs the following permissions:
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "BedrockInvokeModels",
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Resource": [
"arn:aws:bedrock:*::foundation-model/*"
]
}
]
}
```
### Minimal Permissions (Production)
For production deployments, restrict to specific models:
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "BedrockEmbeddings",
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel"
],
"Resource": [
"arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
]
},
{
"Sid": "BedrockGeneration",
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel"
],
"Resource": [
"arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
]
}
]
}
```
### Additional Permissions (Optional)
For advanced use cases:
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "BedrockListModels",
"Effect": "Allow",
"Action": [
"bedrock:ListFoundationModels",
"bedrock:GetFoundationModel"
],
"Resource": "*"
},
{
"Sid": "BedrockAsyncInvoke",
"Effect": "Allow",
"Action": [
"bedrock:InvokeModelAsync",
"bedrock:GetAsyncInvoke",
"bedrock:ListAsyncInvokes"
],
"Resource": [
"arn:aws:bedrock:*::foundation-model/*"
]
}
]
}
```
## Model Access
Before using Bedrock models, you must request access in the AWS Console:
1. Navigate to **Amazon Bedrock****Model access**
2. Click **Manage model access**
3. Select models you want to use:
- **Embeddings:** Amazon Titan Embed Text, Cohere Embed
- **Text Generation:** Anthropic Claude, Meta Llama, Amazon Titan Text
4. Click **Request model access**
5. Wait for approval (usually instant for most models)
## Supported Models
### Embedding Models
| Provider | Model ID | Dimensions | Best For |
|----------|----------|------------|----------|
| Amazon Titan | `amazon.titan-embed-text-v1` | 1,536 | General purpose |
| Amazon Titan | `amazon.titan-embed-text-v2:0` | 1,024 | Latest, improved quality |
| Cohere | `cohere.embed-english-v3` | 1,024 | English text |
| Cohere | `cohere.embed-multilingual-v3` | 1,024 | Multilingual |
### Text Generation Models
| Provider | Model ID | Context | Best For |
|----------|----------|---------|----------|
| Anthropic | `anthropic.claude-3-sonnet-20240229-v1:0` | 200K | Balanced performance |
| Anthropic | `anthropic.claude-3-haiku-20240307-v1:0` | 200K | Fast, cost-effective |
| Anthropic | `anthropic.claude-3-opus-20240229-v1:0` | 200K | Highest quality |
| Meta | `meta.llama3-8b-instruct-v1:0` | 8K | Fast, open-source |
| Meta | `meta.llama3-70b-instruct-v1:0` | 8K | High quality |
| Amazon | `amazon.titan-text-express-v1` | 8K | Fast, low cost |
| Mistral | `mistral.mistral-7b-instruct-v0:2` | 32K | Efficient |
## Configuration
### Environment Variables
**Required:**
```bash
AWS_REGION=us-east-1
```
**Optional (at least one model required):**
```bash
# For embeddings
BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
# For text generation (RAG evaluation)
BEDROCK_GENERATION_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
```
**AWS Credentials (choose one method):**
**Method 1: Environment Variables**
```bash
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```
**Method 2: AWS Credentials File** (`~/.aws/credentials`)
```ini
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```
**Method 3: IAM Role** (when running on AWS EC2/ECS/Lambda)
- No credentials needed, uses instance/task role automatically
### Docker Configuration
Add to your `docker-compose.yml`:
```yaml
services:
mcp:
environment:
- AWS_REGION=us-east-1
- BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
- BEDROCK_GENERATION_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
- AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
- AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
```
Or use AWS credentials file volume mount:
```yaml
services:
mcp:
volumes:
- ~/.aws:/root/.aws:ro
environment:
- AWS_REGION=us-east-1
- BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
```
## Usage Examples
### Embeddings Only
```bash
export AWS_REGION=us-east-1
export BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
export AWS_ACCESS_KEY_ID=your-key
export AWS_SECRET_ACCESS_KEY=your-secret
uv run nextcloud-mcp-server
```
### Both Embeddings and Generation
```bash
export AWS_REGION=us-east-1
export BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
export BEDROCK_GENERATION_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
# For RAG evaluation with Bedrock
export RAG_EVAL_PROVIDER=bedrock
export RAG_EVAL_BEDROCK_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
uv run python -m tests.rag_evaluation.evaluate
```
### Programmatic Usage
```python
from nextcloud_mcp_server.providers import BedrockProvider
# Embeddings only
provider = BedrockProvider(
region_name="us-east-1",
embedding_model="amazon.titan-embed-text-v2:0",
)
embeddings = await provider.embed_batch(["text1", "text2"])
# Both capabilities
provider = BedrockProvider(
region_name="us-east-1",
embedding_model="amazon.titan-embed-text-v2:0",
generation_model="anthropic.claude-3-sonnet-20240229-v1:0",
)
# Generate embeddings
embedding = await provider.embed("query text")
# Generate text
response = await provider.generate("Write a summary", max_tokens=500)
```
## Cost Considerations
### Embedding Costs (as of Jan 2025)
| Model | Price per 1K tokens |
|-------|---------------------|
| Titan Embed Text v2 | $0.0001 |
| Cohere Embed English v3 | $0.0001 |
### Generation Costs (as of Jan 2025)
| Model | Input (per 1K tokens) | Output (per 1K tokens) |
|-------|----------------------|------------------------|
| Claude 3 Haiku | $0.00025 | $0.00125 |
| Claude 3 Sonnet | $0.003 | $0.015 |
| Claude 3 Opus | $0.015 | $0.075 |
| Llama 3 8B | $0.0003 | $0.0006 |
| Titan Text Express | $0.0002 | $0.0006 |
**Note:** Prices vary by region. Check [AWS Bedrock Pricing](https://aws.amazon.com/bedrock/pricing/) for current rates.
## Troubleshooting
### Error: "Executable doesn't exist" or boto3 not found
**Solution:**
```bash
uv sync --group dev # Installs boto3
```
### Error: "AccessDeniedException"
**Causes:**
1. IAM permissions missing
2. Model access not requested
3. Wrong AWS region
**Solution:**
1. Verify IAM policy includes `bedrock:InvokeModel`
2. Request model access in Bedrock console
3. Check model is available in your region
### Error: "ResourceNotFoundException"
**Cause:** Invalid model ID or model not available in region
**Solution:**
- Verify model ID matches exactly (case-sensitive)
- Check model availability in your AWS region
- Use `aws bedrock list-foundation-models` to see available models
### Error: "ThrottlingException"
**Cause:** Rate limit exceeded
**Solution:**
- Reduce request rate
- Request quota increase via AWS Support
- Use batch operations where possible
## Security Best Practices
1. **Use IAM Roles** when running on AWS infrastructure
2. **Rotate Access Keys** regularly if using IAM users
3. **Restrict Permissions** to only required models
4. **Enable CloudTrail** for audit logging
5. **Use AWS Secrets Manager** for credential management
6. **Monitor Costs** with AWS Cost Explorer and Budgets
## Regional Availability
Amazon Bedrock is available in:
- **US East (N. Virginia)**: `us-east-1` ✅ Most models
- **US West (Oregon)**: `us-west-2` ✅ Most models
- **Asia Pacific (Singapore)**: `ap-southeast-1`
- **Asia Pacific (Tokyo)**: `ap-northeast-1`
- **Europe (Frankfurt)**: `eu-central-1`
**Note:** Model availability varies by region. Check the [AWS Bedrock documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html) for current availability.
## References
- [AWS Bedrock Documentation](https://docs.aws.amazon.com/bedrock/)
- [AWS Bedrock Pricing](https://aws.amazon.com/bedrock/pricing/)
- [boto3 Bedrock Runtime API](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime.html)
- [Provider Architecture ADR](./ADR-015-unified-provider-architecture.md)
+1 -1
View File
@@ -243,7 +243,7 @@ If you see cardinality warnings:
The observability stack integrates at multiple layers:
1. **HTTP Layer**: `ObservabilityMiddleware` tracks all HTTP requests
2. **MCP Layer**: Tools use `@trace_mcp_tool` for span creation
2. **MCP Layer**: Tools use `@instrument_tool` for automatic metrics and trace span creation
3. **Client Layer**: `BaseNextcloudClient` tracks all API calls
4. **OAuth Layer**: Token operations are traced and metered
5. **Background Tasks**: Vector sync operations emit metrics/traces
+66 -14
View File
@@ -1,5 +1,6 @@
import logging
import os
import time
from collections.abc import AsyncIterator
from contextlib import AsyncExitStack, asynccontextmanager
from dataclasses import dataclass
@@ -44,6 +45,10 @@ from nextcloud_mcp_server.observability import (
setup_metrics,
setup_tracing,
)
from nextcloud_mcp_server.observability.metrics import (
record_dependency_check,
set_dependency_health,
)
from nextcloud_mcp_server.server import (
configure_calendar_tools,
configure_contacts_tools,
@@ -441,7 +446,7 @@ async def app_lifespan_basic(server: FastMCP) -> AsyncIterator[AppContext]:
# Start background tasks using anyio TaskGroup
async with anyio.create_task_group() as tg:
# Start scanner task
tg.start_soon(
await tg.start(
scanner_task,
send_stream,
shutdown_event,
@@ -452,7 +457,7 @@ async def app_lifespan_basic(server: FastMCP) -> AsyncIterator[AppContext]:
# Start processor pool (each gets a cloned receive stream)
for i in range(settings.vector_sync_processor_workers):
tg.start_soon(
await tg.start(
processor_task,
i,
receive_stream.clone(),
@@ -502,9 +507,9 @@ async def setup_oauth_config():
- External IdP mode: OIDC_DISCOVERY_URL points to external provider
→ External IdP for OAuth, Nextcloud user_oidc validates tokens and provides API access
Uses generic OIDC environment variables:
Uses OIDC environment variables:
- OIDC_DISCOVERY_URL: OIDC discovery endpoint (optional, defaults to NEXTCLOUD_HOST)
- OIDC_CLIENT_ID / OIDC_CLIENT_SECRET: Static credentials (optional, uses DCR if not provided)
- NEXTCLOUD_OIDC_CLIENT_ID / NEXTCLOUD_OIDC_CLIENT_SECRET: Static credentials (optional, uses DCR if not provided)
- NEXTCLOUD_OIDC_SCOPES: Requested OAuth scopes
This is done synchronously before FastMCP initialization because FastMCP
@@ -628,19 +633,21 @@ async def setup_oauth_config():
)
# Load client credentials (static or dynamic registration)
client_id = os.getenv("OIDC_CLIENT_ID")
client_secret = os.getenv("OIDC_CLIENT_SECRET")
client_id = os.getenv("NEXTCLOUD_OIDC_CLIENT_ID")
client_secret = os.getenv("NEXTCLOUD_OIDC_CLIENT_SECRET")
if client_id and client_secret:
logger.info(f"Using static OIDC client credentials: {client_id}")
elif registration_endpoint:
logger.info("OIDC_CLIENT_ID not set, attempting Dynamic Client Registration")
logger.info(
"NEXTCLOUD_OIDC_CLIENT_ID not set, attempting Dynamic Client Registration"
)
client_id, client_secret = await load_oauth_client_credentials(
nextcloud_host=nextcloud_host, registration_endpoint=registration_endpoint
)
else:
raise ValueError(
"OIDC_CLIENT_ID and OIDC_CLIENT_SECRET environment variables are required "
"NEXTCLOUD_OIDC_CLIENT_ID and NEXTCLOUD_OIDC_CLIENT_SECRET environment variables are required "
"when the OIDC provider does not support Dynamic Client Registration. "
f"Discovery URL: {discovery_url}"
)
@@ -1140,7 +1147,7 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
# Start background tasks using anyio TaskGroup
async with anyio_module.create_task_group() as tg:
# Start scanner task
tg.start_soon(
await tg.start(
scanner_task,
send_stream,
shutdown_event,
@@ -1151,7 +1158,7 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
# Start processor pool (each gets a cloned receive stream)
for i in range(settings.vector_sync_processor_workers):
tg.start_soon(
await tg.start(
processor_task,
i,
receive_stream.clone(),
@@ -1205,12 +1212,35 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
checks = {}
is_ready = True
# Check Nextcloud host configuration
# Check Nextcloud host configuration and connectivity
nextcloud_host = os.getenv("NEXTCLOUD_HOST")
if nextcloud_host:
checks["nextcloud_configured"] = "ok"
# Try to connect to Nextcloud
start_time = time.time()
try:
async with httpx.AsyncClient(timeout=2.0) as client:
response = await client.get(f"{nextcloud_host}/status.php")
duration = time.time() - start_time
if response.status_code == 200:
checks["nextcloud_reachable"] = "ok"
set_dependency_health("nextcloud", True)
else:
checks["nextcloud_reachable"] = (
f"error: status {response.status_code}"
)
set_dependency_health("nextcloud", False)
is_ready = False
record_dependency_check("nextcloud", duration)
except Exception as e:
duration = time.time() - start_time
checks["nextcloud_reachable"] = f"error: {str(e)}"
set_dependency_health("nextcloud", False)
record_dependency_check("nextcloud", duration)
is_ready = False
else:
checks["nextcloud_configured"] = "error: NEXTCLOUD_HOST not set"
set_dependency_health("nextcloud", False)
is_ready = False
# Check authentication configuration
@@ -1238,20 +1268,29 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
qdrant_url = os.getenv("QDRANT_URL") # Only set in network mode
if vector_sync_enabled and qdrant_url:
start_time = time.time()
try:
async with httpx.AsyncClient(timeout=2.0) as client:
response = await client.get(f"{qdrant_url}/readyz")
duration = time.time() - start_time
if response.status_code == 200:
checks["qdrant"] = "ok"
set_dependency_health("qdrant", True)
else:
checks["qdrant"] = f"error: status {response.status_code}"
set_dependency_health("qdrant", False)
is_ready = False
record_dependency_check("qdrant", duration)
except Exception as e:
duration = time.time() - start_time
checks["qdrant"] = f"error: {str(e)}"
set_dependency_health("qdrant", False)
record_dependency_check("qdrant", duration)
is_ready = False
elif vector_sync_enabled:
# Using embedded Qdrant (memory or persistent mode)
checks["qdrant"] = "embedded"
set_dependency_health("qdrant", True)
status_code = 200 if is_ready else 503
return JSONResponse(
@@ -1438,6 +1477,10 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
user_info_html,
vector_sync_status_fragment,
)
from nextcloud_mcp_server.auth.viz_routes import (
vector_visualization_html,
vector_visualization_search,
)
from nextcloud_mcp_server.auth.webhook_routes import (
disable_webhook_preset,
enable_webhook_preset,
@@ -1457,6 +1500,15 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
vector_sync_status_fragment,
methods=["GET"],
), # /app/vector-sync/status
# Vector visualization routes
Route(
"/vector-viz", vector_visualization_html, methods=["GET"]
), # /app/vector-viz
Route(
"/vector-viz/search",
vector_visualization_search,
methods=["GET"],
), # /app/vector-viz/search
# Webhook management routes (admin-only)
Route("/webhooks", webhook_management_pane, methods=["GET"]), # /app/webhooks
Route(
@@ -1471,7 +1523,7 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
browser_app = Starlette(routes=browser_routes)
browser_app.add_middleware(
AuthenticationMiddleware,
AuthenticationMiddleware, # type: ignore[invalid-argument-type]
backend=SessionAuthBackend(oauth_enabled=oauth_enabled),
)
@@ -1561,7 +1613,7 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
# Add CORS middleware to allow browser-based clients like MCP Inspector
app.add_middleware(
CORSMiddleware,
CORSMiddleware, # type: ignore[invalid-argument-type]
allow_origins=["*"], # Allow all origins for development
allow_credentials=True,
allow_methods=["*"],
@@ -1571,7 +1623,7 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
# Add observability middleware (metrics + tracing)
if settings.metrics_enabled or settings.otel_exporter_otlp_endpoint:
app.add_middleware(ObservabilityMiddleware)
app.add_middleware(ObservabilityMiddleware) # type: ignore[invalid-argument-type]
logger.info("Observability middleware enabled (metrics and/or tracing)")
# Add exception handler for scope challenges (OAuth mode only)
+19 -7
View File
@@ -12,6 +12,10 @@ from mcp.server.fastmcp import Context
from ..client import NextcloudClient
from ..config import get_settings
from ..observability.metrics import (
oauth_token_cache_hits_total,
oauth_token_exchange_total,
)
from .token_exchange import exchange_token_for_audience
logger = logging.getLogger(__name__)
@@ -138,6 +142,7 @@ async def get_session_client_from_context(
logger.debug(
f"Using cached exchanged token (expires in {expiry - time.time():.1f}s)"
)
oauth_token_cache_hits_total.labels(hit="true").inc()
return NextcloudClient.from_token(
base_url=base_url, token=cached_token, username=username
)
@@ -145,17 +150,24 @@ async def get_session_client_from_context(
logger.debug("Cached token expired, removing from cache")
del _exchange_cache[cache_key]
oauth_token_cache_hits_total.labels(hit="false").inc()
# Perform RFC 8693 token exchange
logger.info(f"Exchanging MCP token for Nextcloud API token (user: {username})")
# Exchange for Nextcloud resource URI audience
exchanged_token, expires_in = await exchange_token_for_audience(
subject_token=mcp_token,
requested_audience=settings.nextcloud_resource_uri or "nextcloud",
requested_scopes=None, # Nextcloud doesn't support scopes
)
try:
# Exchange for Nextcloud resource URI audience
exchanged_token, expires_in = await exchange_token_for_audience(
subject_token=mcp_token,
requested_audience=settings.nextcloud_resource_uri or "nextcloud",
requested_scopes=None, # Nextcloud doesn't support scopes
)
oauth_token_exchange_total.labels(status="success").inc()
logger.info(f"Token exchange successful. Token expires in {expires_in}s")
logger.info(f"Token exchange successful. Token expires in {expires_in}s")
except Exception:
oauth_token_exchange_total.labels(status="error").inc()
raise
# Cache the exchanged token
# Use the minimum of exchange TTL and configured cache TTL
+109 -80
View File
@@ -35,6 +35,8 @@ from typing import Any, Optional
import aiosqlite
from cryptography.fernet import Fernet
from nextcloud_mcp_server.observability.metrics import record_db_operation
logger = logging.getLogger(__name__)
@@ -292,35 +294,43 @@ class RefreshTokenStorage:
# For Flow 2, set provisioned_at timestamp
provisioned_at = now if flow_type == "flow2" else None
async with aiosqlite.connect(self.db_path) as db:
await db.execute(
"""
INSERT OR REPLACE INTO refresh_tokens
(user_id, encrypted_token, expires_at, created_at, updated_at,
flow_type, token_audience, provisioned_at, provisioning_client_id, scopes)
VALUES (?, ?, ?, COALESCE((SELECT created_at FROM refresh_tokens WHERE user_id = ?), ?), ?,
?, ?, ?, ?, ?)
""",
(
user_id,
encrypted_token,
expires_at,
user_id,
now,
now,
flow_type,
token_audience,
provisioned_at,
provisioning_client_id,
scopes_json,
),
)
await db.commit()
start_time = time.time()
try:
async with aiosqlite.connect(self.db_path) as db:
await db.execute(
"""
INSERT OR REPLACE INTO refresh_tokens
(user_id, encrypted_token, expires_at, created_at, updated_at,
flow_type, token_audience, provisioned_at, provisioning_client_id, scopes)
VALUES (?, ?, ?, COALESCE((SELECT created_at FROM refresh_tokens WHERE user_id = ?), ?), ?,
?, ?, ?, ?, ?)
""",
(
user_id,
encrypted_token,
expires_at,
user_id,
now,
now,
flow_type,
token_audience,
provisioned_at,
provisioning_client_id,
scopes_json,
),
)
await db.commit()
duration = time.time() - start_time
record_db_operation("sqlite", "insert", duration, "success")
logger.info(
f"Stored refresh token for user {user_id}"
+ (f" (expires at {expires_at})" if expires_at else "")
)
logger.info(
f"Stored refresh token for user {user_id}"
+ (f" (expires at {expires_at})" if expires_at else "")
)
except Exception:
duration = time.time() - start_time
record_db_operation("sqlite", "insert", duration, "error")
raise
# Audit log
await self._audit_log(
@@ -422,40 +432,45 @@ class RefreshTokenStorage:
if not self._initialized:
await self.initialize()
async with aiosqlite.connect(self.db_path) as db:
async with db.execute(
"""
SELECT encrypted_token, expires_at, flow_type, token_audience,
provisioned_at, provisioning_client_id, scopes
FROM refresh_tokens WHERE user_id = ?
""",
(user_id,),
) as cursor:
row = await cursor.fetchone()
if not row:
logger.debug(f"No refresh token found for user {user_id}")
return None
(
encrypted_token,
expires_at,
flow_type,
token_audience,
provisioned_at,
provisioning_client_id,
scopes_json,
) = row
# Check expiration
if expires_at is not None and expires_at < time.time():
logger.warning(
f"Refresh token for user {user_id} has expired (expired at {expires_at})"
)
await self.delete_refresh_token(user_id)
return None
start_time = time.time()
try:
async with aiosqlite.connect(self.db_path) as db:
async with db.execute(
"""
SELECT encrypted_token, expires_at, flow_type, token_audience,
provisioned_at, provisioning_client_id, scopes
FROM refresh_tokens WHERE user_id = ?
""",
(user_id,),
) as cursor:
row = await cursor.fetchone()
if not row:
logger.debug(f"No refresh token found for user {user_id}")
duration = time.time() - start_time
record_db_operation("sqlite", "select", duration, "success")
return None
(
encrypted_token,
expires_at,
flow_type,
token_audience,
provisioned_at,
provisioning_client_id,
scopes_json,
) = row
# Check expiration
if expires_at is not None and expires_at < time.time():
logger.warning(
f"Refresh token for user {user_id} has expired (expired at {expires_at})"
)
await self.delete_refresh_token(user_id)
duration = time.time() - start_time
record_db_operation("sqlite", "select", duration, "success")
return None
decrypted_token = self.cipher.decrypt(encrypted_token).decode()
scopes = json.loads(scopes_json) if scopes_json else None
@@ -463,6 +478,9 @@ class RefreshTokenStorage:
f"Retrieved refresh token for user {user_id} (flow_type: {flow_type})"
)
duration = time.time() - start_time
record_db_operation("sqlite", "select", duration, "success")
return {
"refresh_token": decrypted_token,
"expires_at": expires_at,
@@ -474,6 +492,8 @@ class RefreshTokenStorage:
"scopes": scopes,
}
except Exception as e:
duration = time.time() - start_time
record_db_operation("sqlite", "select", duration, "error")
logger.error(f"Failed to decrypt refresh token for user {user_id}: {e}")
return None
@@ -568,25 +588,34 @@ class RefreshTokenStorage:
if not self._initialized:
await self.initialize()
async with aiosqlite.connect(self.db_path) as db:
cursor = await db.execute(
"DELETE FROM refresh_tokens WHERE user_id = ?",
(user_id,),
)
await db.commit()
deleted = cursor.rowcount > 0
start_time = time.time()
try:
async with aiosqlite.connect(self.db_path) as db:
cursor = await db.execute(
"DELETE FROM refresh_tokens WHERE user_id = ?",
(user_id,),
)
await db.commit()
deleted = cursor.rowcount > 0
if deleted:
logger.info(f"Deleted refresh token for user {user_id}")
await self._audit_log(
event="delete_refresh_token",
user_id=user_id,
auth_method="offline_access",
)
else:
logger.debug(f"No refresh token to delete for user {user_id}")
duration = time.time() - start_time
record_db_operation("sqlite", "delete", duration, "success")
return deleted
if deleted:
logger.info(f"Deleted refresh token for user {user_id}")
await self._audit_log(
event="delete_refresh_token",
user_id=user_id,
auth_method="offline_access",
)
else:
logger.debug(f"No refresh token to delete for user {user_id}")
return deleted
except Exception:
duration = time.time() - start_time
record_db_operation("sqlite", "delete", duration, "error")
raise
async def get_all_user_ids(self) -> list[str]:
"""
@@ -1281,7 +1310,7 @@ async def generate_encryption_key() -> str:
# Example usage
if __name__ == "__main__":
import asyncio
import anyio
async def main():
# Generate a key for testing
@@ -1289,4 +1318,4 @@ if __name__ == "__main__":
print(f"Generated encryption key: {key}")
print(f"Set this in your environment: export TOKEN_ENCRYPTION_KEY='{key}'")
asyncio.run(main())
anyio.run(main)
+2 -2
View File
@@ -14,11 +14,11 @@ The Token Broker provides:
- Session vs background token separation (RFC 8693)
"""
import asyncio
import logging
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple
import anyio
import httpx
import jwt
from cryptography.fernet import Fernet
@@ -43,7 +43,7 @@ class TokenCache:
self._cache: Dict[str, Tuple[str, datetime]] = {}
self._ttl = timedelta(seconds=ttl_seconds)
self._early_refresh = timedelta(seconds=early_refresh_seconds)
self._lock = asyncio.Lock()
self._lock = anyio.Lock()
async def get(self, user_id: str) -> Optional[str]:
"""Get cached token if valid."""
@@ -26,6 +26,10 @@ from jwt import PyJWKClient
from mcp.server.auth.provider import AccessToken, TokenVerifier
from nextcloud_mcp_server.config import Settings
from nextcloud_mcp_server.observability.metrics import (
oauth_token_cache_hits_total,
record_oauth_token_validation,
)
logger = logging.getLogger(__name__)
@@ -105,8 +109,11 @@ class UnifiedTokenVerifier(TokenVerifier):
cached = self._get_cached_token(token)
if cached:
logger.debug("Token found in cache")
oauth_token_cache_hits_total.labels(hit="true").inc()
return cached
oauth_token_cache_hits_total.labels(hit="false").inc()
# Both modes do the same validation (MCP audience only)
return await self._verify_mcp_audience(token)
@@ -124,13 +131,24 @@ class UnifiedTokenVerifier(TokenVerifier):
Returns:
AccessToken if valid with MCP audience, None otherwise
"""
validation_method = "unknown"
try:
# Attempt JWT verification first
if self._is_jwt_format(token) and self.jwks_client:
validation_method = "jwt"
payload = await self._verify_jwt_signature(token)
if payload:
record_oauth_token_validation("jwt", "valid")
else:
record_oauth_token_validation("jwt", "invalid")
else:
# Fall back to introspection for opaque tokens
validation_method = "introspect"
payload = await self._introspect_token(token)
if payload:
record_oauth_token_validation("introspect", "valid")
else:
record_oauth_token_validation("introspect", "invalid")
if not payload:
return None
@@ -146,6 +164,8 @@ class UnifiedTokenVerifier(TokenVerifier):
f"Got {audiences}, need MCP ({self.settings.oidc_client_id} or "
f"{self.settings.nextcloud_mcp_server_url})"
)
# Record as invalid due to audience mismatch
record_oauth_token_validation(validation_method, "invalid")
return None
# Log based on mode for clarity
@@ -163,6 +183,7 @@ class UnifiedTokenVerifier(TokenVerifier):
except Exception as e:
logger.error(f"Token verification failed: {e}")
record_oauth_token_validation(validation_method, "error")
return None
def _has_mcp_audience(self, payload: dict[str, Any]) -> bool:
@@ -489,6 +489,16 @@ async def user_info_html(request: Request) -> HTMLResponse:
str(request.url_for("oauth_logout")) if oauth_ctx else "/oauth/logout"
)
# Get Nextcloud host for generating links to apps (used by viz tab)
# Use public issuer URL if available (for browser-accessible links),
# otherwise fall back to NEXTCLOUD_HOST from settings
from nextcloud_mcp_server.config import get_settings
settings = get_settings()
nextcloud_host_for_links = (
os.getenv("NEXTCLOUD_PUBLIC_ISSUER_URL") or settings.nextcloud_host
)
# Build host info HTML (BasicAuth only)
host_info_html = ""
if auth_mode == "basic":
@@ -658,6 +668,121 @@ async def user_info_html(request: Request) -> HTMLResponse:
<!-- Alpine.js for tab state management -->
<script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"></script>
<!-- Plotly.js for vector visualization -->
<script src="https://cdn.plot.ly/plotly-2.27.0.min.js"></script>
<!-- Vector visualization app (Alpine.js component) -->
<script>
function vizApp() {{
return {{
query: '',
algorithm: 'bm25_hybrid',
showAdvanced: false,
docTypes: [''], // Default to "All Types"
limit: 50,
scoreThreshold: 0.0,
loading: false,
results: [],
async executeSearch() {{
this.loading = true;
this.results = [];
try {{
const params = new URLSearchParams({{
query: this.query,
algorithm: this.algorithm,
limit: this.limit,
score_threshold: this.scoreThreshold,
}});
// Add doc_types parameter (filter out empty string for "All Types")
const selectedTypes = this.docTypes.filter(t => t !== '');
if (selectedTypes.length > 0) {{
params.append('doc_types', selectedTypes.join(','));
}}
const response = await fetch(`/app/vector-viz/search?${{params}}`);
const data = await response.json();
if (data.success) {{
this.results = data.results;
this.renderPlot(data.coordinates_2d, data.results);
}} else {{
alert('Search failed: ' + data.error);
}}
}} catch (error) {{
alert('Error: ' + error.message);
}} finally {{
this.loading = false;
}}
}},
renderPlot(coordinates, results) {{
// Calculate score range for auto-scaling
const scores = results.map(r => r.score);
const minScore = Math.min(...scores);
const maxScore = Math.max(...scores);
const trace = {{
x: coordinates.map(c => c[0]),
y: coordinates.map(c => c[1]),
mode: 'markers',
type: 'scatter',
text: results.map(r => `${{r.title}}<br>Score: ${{r.score.toFixed(3)}}`),
marker: {{
// Multi-channel encoding: size + opacity + color for visual hierarchy
// Power scaling (score^2) amplifies visual differences dramatically
// score=0.0 → 6px, score=0.5 → 9.5px, score=1.0 → 20px
size: results.map(r => 6 + (Math.pow(r.score, 2) * 14)),
// Linear opacity scaling (0.2-1.0 range keeps all points visible)
opacity: results.map(r => 0.2 + (r.score * 0.8)),
// Color gradient shows score
color: scores,
colorscale: 'Viridis',
showscale: true,
colorbar: {{ title: 'Relative Score' }},
// Scores are normalized 0-1 within result set
cmin: 0,
cmax: 1
}}
}};
const layout = {{
title: `Vector Space (PCA 2D) - ${{results.length}} results`,
xaxis: {{ title: 'PC1' }},
yaxis: {{ title: 'PC2' }},
hovermode: 'closest',
height: 600
}};
Plotly.newPlot('viz-plot', [trace], layout);
}},
getNextcloudUrl(result) {{
// Generate Nextcloud URL based on document type
// Use the actual Nextcloud host (port 8080), not the MCP server
const baseUrl = '{nextcloud_host_for_links}';
switch (result.doc_type) {{
case 'note':
return `${{baseUrl}}/apps/notes/note/${{result.id}}`;
case 'file':
return `${{baseUrl}}/apps/files/?fileId=${{result.id}}`;
case 'calendar':
return `${{baseUrl}}/apps/calendar`;
case 'contact':
return `${{baseUrl}}/apps/contacts`;
case 'deck':
return `${{baseUrl}}/apps/deck`;
default:
return `${{baseUrl}}`;
}}
}}
}}
}}
</script>
<style>
body {{
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif;
@@ -846,6 +971,18 @@ async def user_info_html(request: Request) -> HTMLResponse:
Vector Sync
</button>
'''
}
{
""
if not show_vector_sync_tab
else '''
<button
class="tab"
:class="activeTab === 'vector-viz' ? 'active' : ''"
@click="activeTab = 'vector-viz'">
Vector Viz
</button>
'''
}
{
""
@@ -881,6 +1018,19 @@ async def user_info_html(request: Request) -> HTMLResponse:
{
""
if not show_vector_sync_tab
else '''
<!-- Vector Viz Tab -->
<div class="tab-pane" x-show="activeTab === 'vector-viz'" x-transition.opacity.duration.150ms>
<div hx-get="/app/vector-viz" hx-trigger="load" hx-swap="outerHTML">
<p style="color: #999;">Loading vector visualization...</p>
</div>
</div>
'''
}
{
""
if not show_webhooks_tab
else f'''
<!-- Webhooks Tab (admin-only, loaded dynamically) -->
+596
View File
@@ -0,0 +1,596 @@
"""Vector visualization routes for testing search algorithms.
Provides a web UI for users to test different search algorithms on their own
indexed documents and visualize results in 2D space using PCA.
All processing happens server-side following ADR-012:
- Search execution via shared search/algorithms.py
- PCA dimensionality reduction (768-dim → 2D)
- Only 2D coordinates + metadata sent to client
- Bandwidth-efficient (2 floats per doc vs 768)
"""
import logging
import time
import numpy as np
from starlette.authentication import requires
from starlette.requests import Request
from starlette.responses import HTMLResponse, JSONResponse
from nextcloud_mcp_server.config import get_settings
from nextcloud_mcp_server.search import (
BM25HybridSearchAlgorithm,
SemanticSearchAlgorithm,
)
from nextcloud_mcp_server.vector.pca import PCA
from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
logger = logging.getLogger(__name__)
@requires("authenticated", redirect="oauth_login")
async def vector_visualization_html(request: Request) -> HTMLResponse:
"""Vector visualization page with search controls and interactive plot.
Provides UI for testing search algorithms with real-time visualization.
Requires vector sync to be enabled.
Args:
request: Starlette request object
Returns:
HTML page with search interface
"""
settings = get_settings()
if not settings.vector_sync_enabled:
return HTMLResponse(
"""
<div>
<h2>Vector Visualization</h2>
<div style="padding: 20px; background: #fff3cd; border: 1px solid #ffc107; border-radius: 4px;">
Vector sync is not enabled. Set VECTOR_SYNC_ENABLED=true to use this feature.
</div>
</div>
"""
)
# Get user info from auth context
username = (
request.user.display_name
if hasattr(request.user, "display_name")
else "unknown"
)
html_content = f"""
<style>
.viz-card {{
background: white;
border-radius: 8px;
padding: 20px;
margin-bottom: 20px;
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
}}
.viz-controls {{
margin-bottom: 20px;
}}
.viz-control-row {{
display: grid;
grid-template-columns: 2fr 1fr auto;
gap: 12px;
margin-bottom: 12px;
align-items: end;
}}
.viz-control-group {{
margin-bottom: 15px;
}}
.viz-control-group label {{
display: block;
margin-bottom: 5px;
font-weight: 500;
color: #333;
}}
.viz-control-group input[type="text"],
.viz-control-group input[type="number"],
.viz-control-group select {{
width: 100%;
padding: 8px 12px;
border: 1px solid #ddd;
border-radius: 4px;
font-size: 14px;
}}
.viz-control-group input[type="range"] {{
width: 100%;
}}
.viz-control-group select[multiple] {{
min-height: 100px;
}}
.viz-weight-display {{
display: inline-block;
min-width: 40px;
text-align: right;
color: #666;
}}
.viz-btn {{
background: #0066cc;
color: white;
border: none;
padding: 10px 20px;
border-radius: 4px;
cursor: pointer;
font-size: 14px;
font-weight: 500;
}}
.viz-btn:hover {{
background: #0052a3;
}}
.viz-btn-secondary {{
background: #6c757d;
color: white;
border: none;
padding: 6px 12px;
border-radius: 4px;
cursor: pointer;
font-size: 13px;
margin-bottom: 12px;
}}
.viz-btn-secondary:hover {{
background: #5a6268;
}}
#viz-plot-container {{
width: 100%;
height: 600px;
position: relative;
}}
#viz-plot {{
width: 100%;
height: 100%;
}}
.viz-loading {{
text-align: center;
padding: 40px;
color: #666;
}}
.viz-loading-overlay {{
position: absolute;
inset: 0;
display: flex;
align-items: center;
justify-content: center;
background: white;
color: #666;
}}
.viz-no-results {{
text-align: center;
padding: 40px;
color: #666;
font-style: italic;
}}
.viz-advanced-section {{
margin-top: 16px;
padding: 16px;
background: #f8f9fa;
border-radius: 4px;
border: 1px solid #dee2e6;
}}
.viz-advanced-grid {{
display: grid;
grid-template-columns: 1fr 1fr;
gap: 20px;
}}
.viz-info-box {{
background: #e3f2fd;
border-left: 4px solid #2196f3;
padding: 12px;
margin-bottom: 20px;
font-size: 14px;
}}
</style>
<div x-data="vizApp()">
<div class="viz-card">
<h2>Vector Visualization</h2>
<div class="viz-info-box">
Testing search algorithms on your indexed documents. User: <strong>{username}</strong>
</div>
<form @submit.prevent="executeSearch">
<div class="viz-controls">
<!-- Main Controls -->
<div class="viz-control-group">
<label>Search Query</label>
<input type="text" x-model="query" placeholder="Enter search query..." required />
</div>
<div class="viz-control-row">
<div class="viz-control-group" style="margin-bottom: 0;">
<label>Algorithm</label>
<select x-model="algorithm">
<option value="semantic">Semantic (Dense Vectors)</option>
<option value="bm25_hybrid" selected>BM25 Hybrid (Dense + Sparse RRF)</option>
</select>
</div>
<div style="display: flex; align-items: flex-end;">
<button type="submit" class="viz-btn" style="width: 100%;">Search & Visualize</button>
</div>
<div style="display: flex; align-items: flex-end;">
<button type="button" class="viz-btn-secondary" @click="showAdvanced = !showAdvanced" style="white-space: nowrap;">
<span x-text="showAdvanced ? 'Hide Advanced' : 'Advanced'"></span>
</button>
</div>
</div>
<!-- Advanced Options (Collapsible) -->
<div class="viz-advanced-section" x-show="showAdvanced" x-transition.opacity.duration.200ms>
<h3 style="margin-top: 0; margin-bottom: 16px; font-size: 16px;">Advanced Options</h3>
<div class="viz-advanced-grid">
<div class="viz-control-group">
<label>Document Types</label>
<select x-model="docTypes" multiple>
<option value="">All Types (cross-app search)</option>
<option value="note">Notes</option>
<option value="file">Files</option>
<option value="calendar">Calendar Events</option>
<option value="contact">Contacts</option>
<option value="deck">Deck Cards</option>
</select>
<small style="color: #666; display: block; margin-top: 4px;">
Hold Ctrl/Cmd to select multiple
</small>
</div>
<div>
<div class="viz-control-group">
<label>Score Threshold (Semantic/Hybrid)</label>
<input type="number" x-model.number="scoreThreshold" min="0" max="1" step="0.1" />
</div>
<div class="viz-control-group">
<label>Result Limit</label>
<input type="number" x-model.number="limit" min="1" max="100" />
</div>
</div>
</div>
<!-- Info: BM25 Hybrid uses native RRF fusion (no manual weights) -->
<div x-show="algorithm === 'bm25_hybrid'" style="margin-top: 16px; padding: 12px; background: #e9ecef; border-radius: 4px;">
<p style="margin: 0; font-size: 14px; color: #666;">
<strong>BM25 Hybrid Search:</strong> Uses Qdrant's native Reciprocal Rank Fusion (RRF)
to automatically combine dense semantic vectors with sparse BM25 keyword vectors.
No manual weight tuning required.
</p>
</div>
</div>
</div>
</form>
</div>
<div class="viz-card">
<div id="viz-plot-container">
<div x-show="loading" class="viz-loading-overlay" x-transition.opacity.duration.200ms>
Executing search and computing PCA projection...
</div>
<div id="viz-plot" x-show="!loading" x-transition.opacity.duration.200ms></div>
</div>
</div>
<div class="viz-card">
<h3>Search Results (<span x-text="loading ? '...' : results.length"></span>)</h3>
<div x-show="loading" class="viz-loading" x-transition.opacity.duration.200ms>
Loading results...
</div>
<div x-show="!loading && results.length === 0" class="viz-no-results" x-transition.opacity.duration.200ms>
No results found. Try a different query or adjust your search parameters.
</div>
<template x-if="!loading && results.length > 0">
<div x-transition.opacity.duration.200ms>
<template x-for="result in results" :key="result.id">
<div style="padding: 12px; border-bottom: 1px solid #eee;">
<a :href="getNextcloudUrl(result)" target="_blank" style="font-weight: 500; color: #0066cc; text-decoration: none;">
<span x-text="result.title"></span>
</a>
<div style="font-size: 14px; color: #666; margin-top: 4px;" x-text="result.excerpt"></div>
<div style="font-size: 12px; color: #999; margin-top: 4px;">
Score: <span x-text="result.score.toFixed(3)"></span> |
Type: <span x-text="result.doc_type"></span>
</div>
</div>
</template>
</div>
</template>
</div>
</div>
"""
return HTMLResponse(content=html_content)
@requires("authenticated", redirect="oauth_login")
async def vector_visualization_search(request: Request) -> JSONResponse:
"""Execute server-side search and return 2D coordinates + results.
All processing happens server-side:
1. Execute search via shared algorithm module
2. Fetch matching vectors from Qdrant
3. Apply PCA reduction (768-dim → 2D)
4. Return coordinates + metadata only
Args:
request: Starlette request with query parameters
Returns:
JSON response with coordinates_2d and results
"""
settings = get_settings()
if not settings.vector_sync_enabled:
return JSONResponse(
{"success": False, "error": "Vector sync not enabled"},
status_code=400,
)
# Get user info from auth context
username = (
request.user.display_name if hasattr(request.user, "display_name") else None
)
if not username:
return JSONResponse(
{"success": False, "error": "User not authenticated"},
status_code=401,
)
# Parse query parameters
query = request.query_params.get("query", "")
algorithm = request.query_params.get("algorithm", "bm25_hybrid")
limit = int(request.query_params.get("limit", "50"))
score_threshold = float(request.query_params.get("score_threshold", "0.0"))
# Parse doc_types (comma-separated list, None = all types)
doc_types_param = request.query_params.get("doc_types", "")
doc_types = doc_types_param.split(",") if doc_types_param else None
logger.info(
f"Viz search: user={username}, query='{query}', "
f"algorithm={algorithm}, limit={limit}, doc_types={doc_types}"
)
try:
# Start total request timer
request_start = time.perf_counter()
# Get authenticated HTTP client from session
# In BasicAuth mode: uses username/password from session
# In OAuth mode: uses access token from session
from nextcloud_mcp_server.auth.userinfo_routes import (
_get_authenticated_client_for_userinfo,
)
async with await _get_authenticated_client_for_userinfo(request) as http_client: # noqa: F841
# Create search algorithm (no client needed - verification removed)
if algorithm == "semantic":
search_algo = SemanticSearchAlgorithm(score_threshold=score_threshold)
elif algorithm == "bm25_hybrid":
search_algo = BM25HybridSearchAlgorithm(score_threshold=score_threshold)
else:
return JSONResponse(
{"success": False, "error": f"Unknown algorithm: {algorithm}"},
status_code=400,
)
# Execute search (supports cross-app when doc_types=None)
# Get unverified results with buffer for filtering
search_start = time.perf_counter()
all_results = []
if doc_types is None or len(doc_types) == 0:
# Cross-app search - search all indexed types
unverified_results = await search_algo.search(
query=query,
user_id=username,
limit=limit * 2, # Buffer for verification filtering
doc_type=None, # Search all types
score_threshold=score_threshold,
)
all_results.extend(unverified_results)
else:
# Search each document type and combine
for doc_type in doc_types:
unverified_results = await search_algo.search(
query=query,
user_id=username,
limit=limit * 2, # Buffer for verification filtering
doc_type=doc_type,
score_threshold=score_threshold,
)
all_results.extend(unverified_results)
# Sort by score before verification
all_results.sort(key=lambda r: r.score, reverse=True)
# No verification needed for visualization - we only need Qdrant metadata
# (title, excerpt, doc_type) which is already in search results.
# Verification is only needed for sampling (LLM needs full content).
search_results = all_results[:limit]
search_duration = time.perf_counter() - search_start
# Normalize scores relative to this result set for better visualization
# (best result = 1.0, worst result = 0.0 within THIS result set)
# This makes visual encoding meaningful regardless of RRF normalization
if search_results:
scores = [r.score for r in search_results]
min_score, max_score = min(scores), max(scores)
score_range = max_score - min_score if max_score > min_score else 1.0
logger.info(
f"Normalizing scores for viz: original range [{min_score:.3f}, {max_score:.3f}] "
f"→ [0.0, 1.0]"
)
# Rescale each result's score to 0-1 within this result set
for r in search_results:
r.score = (r.score - min_score) / score_range
if not search_results:
return JSONResponse(
{
"success": True,
"results": [],
"coordinates_2d": [],
"message": "No results found",
}
)
# Fetch vectors for matching results from Qdrant
vector_fetch_start = time.perf_counter()
qdrant_client = await get_qdrant_client()
doc_ids = [r.id for r in search_results]
# Retrieve vectors for the matching documents
from qdrant_client.models import FieldCondition, Filter, MatchAny
points_response = await qdrant_client.scroll(
collection_name=settings.get_collection_name(),
scroll_filter=Filter(
must=[
FieldCondition(
key="doc_id",
match=MatchAny(any=[str(doc_id) for doc_id in doc_ids]),
),
FieldCondition(
key="user_id",
match={"value": username},
),
]
),
limit=len(doc_ids) * 2, # Account for multiple chunks per doc
with_vectors=["dense"], # Only fetch dense vectors for visualization
with_payload=["doc_id"], # Need doc_id to map vectors to results
)
points = points_response[0]
if not points:
return JSONResponse(
{
"success": True,
"results": [],
"coordinates_2d": [],
"message": "No vectors found for results",
}
)
# Extract dense vectors (handle both named and unnamed vectors)
def extract_dense_vector(point):
if point.vector is None:
return None
# If named vectors (dict), extract "dense"
if isinstance(point.vector, dict):
return point.vector.get("dense")
# If unnamed vector (array), use directly
return point.vector
vectors = np.array(
[v for v in (extract_dense_vector(p) for p in points) if v is not None]
)
vector_fetch_duration = time.perf_counter() - vector_fetch_start
if len(vectors) < 2:
# Not enough points for PCA
return JSONResponse(
{
"success": True,
"results": [
{
"id": r.id,
"doc_type": r.doc_type,
"title": r.title,
"excerpt": r.excerpt,
"score": r.score,
}
for r in search_results
],
"coordinates_2d": [[0, 0]] * len(search_results),
"message": "Not enough vectors for PCA",
}
)
# Apply PCA dimensionality reduction (768-dim → 2D)
pca_start = time.perf_counter()
pca = PCA(n_components=2)
coords_2d = pca.fit_transform(vectors)
pca_duration = time.perf_counter() - pca_start
# After fit, these attributes are guaranteed to be set
assert pca.explained_variance_ratio_ is not None
logger.info(
f"PCA explained variance: PC1={pca.explained_variance_ratio_[0]:.3f}, "
f"PC2={pca.explained_variance_ratio_[1]:.3f}"
)
# Map results to coordinates (use first chunk per document)
result_coords = []
seen_doc_ids = set()
for point, coord in zip(points, coords_2d):
if point.payload:
doc_id = int(point.payload.get("doc_id", 0))
if doc_id not in seen_doc_ids and doc_id in doc_ids:
seen_doc_ids.add(doc_id)
result_coords.append(coord.tolist())
# Build response
response_results = [
{
"id": r.id,
"doc_type": r.doc_type,
"title": r.title,
"excerpt": r.excerpt,
"score": r.score,
}
for r in search_results
]
# Calculate total request duration
total_duration = time.perf_counter() - request_start
# Log comprehensive timing metrics
logger.info(
f"Viz search timing: total={total_duration * 1000:.1f}ms, "
f"search={search_duration * 1000:.1f}ms ({search_duration / total_duration * 100:.1f}%), "
f"vector_fetch={vector_fetch_duration * 1000:.1f}ms ({vector_fetch_duration / total_duration * 100:.1f}%), "
f"pca={pca_duration * 1000:.1f}ms ({pca_duration / total_duration * 100:.1f}%), "
f"results={len(search_results)}, vectors={len(vectors)}"
)
return JSONResponse(
{
"success": True,
"results": response_results,
"coordinates_2d": result_coords[: len(search_results)],
"pca_variance": {
"pc1": float(pca.explained_variance_ratio_[0]),
"pc2": float(pca.explained_variance_ratio_[1]),
},
"timing": {
"total_ms": round(total_duration * 1000, 2),
"search_ms": round(search_duration * 1000, 2),
"vector_fetch_ms": round(vector_fetch_duration * 1000, 2),
"pca_ms": round(pca_duration * 1000, 2),
"num_results": len(search_results),
"num_vectors": len(vectors),
},
}
)
except Exception as e:
logger.error(f"Viz search error: {e}", exc_info=True)
return JSONResponse(
{"success": False, "error": str(e)},
status_code=500,
)
+2 -1
View File
@@ -5,6 +5,7 @@ import time
from abc import ABC
from functools import wraps
import anyio
from httpx import AsyncClient, HTTPStatusError, RequestError, codes
from nextcloud_mcp_server.observability.metrics import (
@@ -47,7 +48,7 @@ def retry_on_429(func):
# Record retry metric (extract app name from args if available)
if len(args) > 0 and hasattr(args[0], "app_name"):
record_nextcloud_api_retry(app=args[0].app_name, reason="429")
time.sleep(5)
await anyio.sleep(5)
elif e.response.status_code == 404:
# 404 errors are often expected (e.g., checking if attachments exist)
# Log as debug instead of warning
+1 -1
View File
@@ -40,7 +40,7 @@ class NotesClient(BaseNextcloudClient):
seen_ids: set[int] = set()
while True:
params: Dict[str, Any] = {"chunkSize": 10}
params: Dict[str, Any] = {"chunkSize": 100}
if cursor:
params["chunkCursor"] = cursor
if prune_before is not None:
+2 -2
View File
@@ -288,8 +288,8 @@ def get_settings() -> Settings:
return Settings(
# OAuth/OIDC settings
oidc_discovery_url=os.getenv("OIDC_DISCOVERY_URL"),
oidc_client_id=os.getenv("OIDC_CLIENT_ID"),
oidc_client_secret=os.getenv("OIDC_CLIENT_SECRET"),
oidc_client_id=os.getenv("NEXTCLOUD_OIDC_CLIENT_ID"),
oidc_client_secret=os.getenv("NEXTCLOUD_OIDC_CLIENT_SECRET"),
oidc_issuer=os.getenv("OIDC_ISSUER"),
# Nextcloud settings
nextcloud_host=os.getenv("NEXTCLOUD_HOST"),
@@ -12,13 +12,24 @@ class NotesSearchController:
"""
Search notes using token-based matching with relevance ranking.
Returns notes sorted by relevance score.
If query is empty, returns all notes.
"""
search_results = []
query_tokens = self._process_query(query)
# If empty query after processing, return empty results
# If empty query after processing, return all notes
if not query_tokens:
return []
async for note in notes:
search_results.append(
{
"id": note.get("id"),
"title": note.get("title"),
"category": note.get("category"),
"modified": note.get("modified"),
"_score": None, # No score for unfiltered results
}
)
return search_results
# Process and score each note
async for note in notes:
+9 -2
View File
@@ -1,6 +1,13 @@
"""Embedding service package for generating vector embeddings."""
from .service import EmbeddingService, get_embedding_service
from .bm25_provider import BM25SparseEmbeddingProvider
from .service import EmbeddingService, get_bm25_service, get_embedding_service
from .simple_provider import SimpleEmbeddingProvider
__all__ = ["EmbeddingService", "get_embedding_service", "SimpleEmbeddingProvider"]
__all__ = [
"EmbeddingService",
"get_embedding_service",
"BM25SparseEmbeddingProvider",
"get_bm25_service",
"SimpleEmbeddingProvider",
]
@@ -0,0 +1,74 @@
"""BM25 sparse embedding provider using FastEmbed."""
import logging
from typing import Any
from fastembed import SparseTextEmbedding
logger = logging.getLogger(__name__)
class BM25SparseEmbeddingProvider:
"""
BM25 sparse embedding provider for hybrid search.
Uses FastEmbed's BM25 model to generate sparse vectors for keyword-based
retrieval. These sparse vectors are combined with dense semantic vectors
in Qdrant using Reciprocal Rank Fusion (RRF) for hybrid search.
Unlike dense embeddings which have fixed dimensions, sparse embeddings
have variable-length vectors with (index, value) pairs representing
term frequencies in the BM25 vocabulary.
"""
def __init__(self, model_name: str = "Qdrant/bm25"):
"""
Initialize BM25 sparse embedding provider.
Args:
model_name: FastEmbed BM25 model name (default: Qdrant/bm25)
"""
self.model_name = model_name
logger.info(f"Initializing BM25 sparse embedding provider: {model_name}")
# Initialize FastEmbed sparse embedding model
self.model = SparseTextEmbedding(model_name=model_name)
logger.info(f"BM25 sparse embedding model loaded: {model_name}")
def encode(self, text: str) -> dict[str, Any]:
"""
Generate BM25 sparse embedding for a single text.
Args:
text: Input text to encode
Returns:
Dictionary with 'indices' and 'values' keys for Qdrant sparse vector
"""
# FastEmbed returns a generator, take first result
sparse_embedding = next(iter(self.model.embed([text])))
return {
"indices": sparse_embedding.indices.tolist(),
"values": sparse_embedding.values.tolist(),
}
def encode_batch(self, texts: list[str]) -> list[dict[str, Any]]:
"""
Generate BM25 sparse embeddings for multiple texts (batched).
Args:
texts: List of texts to encode
Returns:
List of dictionaries with 'indices' and 'values' for each text
"""
sparse_embeddings = list(self.model.embed(texts))
return [
{
"indices": emb.indices.tolist(),
"values": emb.values.tolist(),
}
for emb in sparse_embeddings
]
+33 -42
View File
@@ -1,56 +1,30 @@
"""Embedding service with provider detection."""
"""Embedding service with provider detection.
DEPRECATED: This module is maintained for backward compatibility.
New code should use nextcloud_mcp_server.providers.get_provider() directly.
"""
import logging
import os
from .base import EmbeddingProvider
from .ollama_provider import OllamaEmbeddingProvider
from .simple_provider import SimpleEmbeddingProvider
from nextcloud_mcp_server.providers import get_provider
from .bm25_provider import BM25SparseEmbeddingProvider
logger = logging.getLogger(__name__)
class EmbeddingService:
"""Unified embedding service with automatic provider detection."""
"""
Unified embedding service with automatic provider detection.
DEPRECATED: This class wraps the new unified provider infrastructure
for backward compatibility. New code should use
nextcloud_mcp_server.providers.get_provider() directly.
"""
def __init__(self):
"""Initialize embedding service with auto-detected provider."""
self.provider = self._detect_provider()
def _detect_provider(self) -> EmbeddingProvider:
"""
Auto-detect available embedding provider.
Checks environment variables in order:
1. OLLAMA_BASE_URL - Use Ollama provider (production)
2. OPENAI_API_KEY - Use OpenAI provider (future)
3. Fallback to SimpleEmbeddingProvider (testing/development)
Returns:
Configured embedding provider
"""
# Ollama provider (production)
ollama_url = os.getenv("OLLAMA_BASE_URL")
if ollama_url:
logger.info(f"Using Ollama embedding provider: {ollama_url}")
return OllamaEmbeddingProvider(
base_url=ollama_url,
model=os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text"),
verify_ssl=os.getenv("OLLAMA_VERIFY_SSL", "true").lower() == "true",
)
# OpenAI provider (future implementation)
# openai_key = os.getenv("OPENAI_API_KEY")
# if openai_key:
# return OpenAIEmbeddingProvider(api_key=openai_key)
# Fallback to simple provider for development/testing
logger.warning(
"No embedding provider configured (OLLAMA_BASE_URL or OPENAI_API_KEY not set). "
"Using SimpleEmbeddingProvider for testing/development. "
"For production, configure an external embedding service."
)
return SimpleEmbeddingProvider(dimension=384)
self.provider = get_provider()
async def embed(self, text: str) -> list[float]:
"""
@@ -109,3 +83,20 @@ def get_embedding_service() -> EmbeddingService:
if _embedding_service is None:
_embedding_service = EmbeddingService()
return _embedding_service
# BM25 sparse embedding singleton
_bm25_service: BM25SparseEmbeddingProvider | None = None
def get_bm25_service() -> BM25SparseEmbeddingProvider:
"""
Get singleton BM25 sparse embedding service instance.
Returns:
Global BM25SparseEmbeddingProvider instance
"""
global _bm25_service
if _bm25_service is None:
_bm25_service = BM25SparseEmbeddingProvider()
return _bm25_service
@@ -39,7 +39,12 @@ class HealthCheckFilter(logging.Filter):
message = record.getMessage()
return not any(
endpoint in message
for endpoint in ["/health/live", "/health/ready", "/metrics"]
for endpoint in [
"/health/live",
"/health/ready",
"/metrics",
"/app/vector-sync/status",
]
)
@@ -352,3 +352,115 @@ def record_dependency_check(dependency: str, duration: float) -> None:
duration: Check duration in seconds
"""
dependency_check_duration_seconds.labels(dependency=dependency).observe(duration)
def record_vector_sync_scan(documents_found: int) -> None:
"""
Record documents scanned during vector sync.
Args:
documents_found: Number of documents discovered in scan
"""
vector_sync_documents_scanned_total.inc(documents_found)
def record_vector_sync_processing(duration: float, status: str = "success") -> None:
"""
Record document processing with duration and status.
Args:
duration: Processing duration in seconds
status: "success" or "error"
"""
vector_sync_documents_processed_total.labels(status=status).inc()
vector_sync_processing_duration_seconds.observe(duration)
def record_qdrant_operation(operation: str, status: str = "success") -> None:
"""
Record Qdrant vector database operation.
Args:
operation: Operation type ("upsert", "search", "delete")
status: "success" or "error"
"""
qdrant_operations_total.labels(operation=operation, status=status).inc()
def update_vector_sync_queue_size(size: int) -> None:
"""
Update vector sync queue size gauge.
Args:
size: Current queue size
"""
vector_sync_queue_size.set(size)
# =============================================================================
# Decorator for Automatic Tool Instrumentation
# =============================================================================
def instrument_tool(func):
"""
Decorator to automatically instrument MCP tool functions with metrics and tracing.
Wraps async tool functions to record execution time, success/error status, and
create OpenTelemetry trace spans. Compatible with @mcp.tool() and @require_scopes()
decorators.
Usage:
@mcp.tool()
@require_scopes("notes:write")
@instrument_tool
async def nc_notes_create_note(...):
...
Args:
func: The async function to instrument
Returns:
Wrapped function with metrics and tracing instrumentation
"""
import functools
import time
from nextcloud_mcp_server.observability.tracing import trace_operation
@functools.wraps(func)
async def wrapper(*args, **kwargs):
tool_name = func.__name__
start_time = time.time()
# Extract tool arguments for tracing (sanitize sensitive fields)
# kwargs contains the actual arguments passed to the tool
tool_args = {
k: v
for k, v in kwargs.items()
if k not in ("password", "token", "secret", "api_key", "etag", "ctx")
}
# Create trace span with metrics collection
with trace_operation(
f"mcp.tool.{tool_name}",
attributes={
"mcp.tool.name": tool_name,
"mcp.tool.args": str(tool_args)[:500]
if tool_args
else None, # Limit to 500 chars
},
record_exception=True,
):
try:
result = await func(*args, **kwargs)
duration = time.time() - start_time
record_tool_call(tool_name, duration, "success")
return result
except Exception as e:
duration = time.time() - start_time
record_tool_call(tool_name, duration, "error")
record_tool_error(tool_name, type(e).__name__)
raise
return wrapper
@@ -66,8 +66,12 @@ class ObservabilityMiddleware(BaseHTTPMiddleware):
# Record start time
start_time = time.time()
# Skip tracing for health/metrics endpoints to reduce noise
should_trace = not (path.startswith("/health/") or path == "/metrics")
# Skip tracing for health/metrics/polling endpoints to reduce noise
should_trace = not (
path.startswith("/health/")
or path == "/metrics"
or path == "/app/vector-sync/status"
)
try:
if should_trace:
@@ -0,0 +1,18 @@
"""Unified provider infrastructure for embeddings and text generation."""
from .anthropic import AnthropicProvider
from .base import Provider
from .bedrock import BedrockProvider
from .ollama import OllamaProvider
from .registry import get_provider, reset_provider
from .simple import SimpleProvider
__all__ = [
"Provider",
"OllamaProvider",
"AnthropicProvider",
"SimpleProvider",
"BedrockProvider",
"get_provider",
"reset_provider",
]
@@ -0,0 +1,97 @@
"""Unified Anthropic provider for text generation."""
import logging
from anthropic import AsyncAnthropic
from .base import Provider
logger = logging.getLogger(__name__)
class AnthropicProvider(Provider):
"""
Anthropic provider for text generation.
Supports Claude models via the Anthropic API.
Note: Anthropic doesn't provide embedding models, only text generation.
"""
def __init__(self, api_key: str, model: str = "claude-3-5-sonnet-20241022"):
"""
Initialize Anthropic provider.
Args:
api_key: Anthropic API key
model: Model name (e.g., "claude-3-5-sonnet-20241022")
"""
self.client = AsyncAnthropic(api_key=api_key)
self.model = model
logger.info(f"Initialized Anthropic provider (model={model})")
@property
def supports_embeddings(self) -> bool:
"""Whether this provider supports embedding generation."""
return False
@property
def supports_generation(self) -> bool:
"""Whether this provider supports text generation."""
return True
async def embed(self, text: str) -> list[float]:
"""
Generate embedding vector for text.
Raises:
NotImplementedError: Anthropic doesn't provide embedding models
"""
raise NotImplementedError(
"Embedding not supported by Anthropic - use Ollama or Bedrock for embeddings"
)
async def embed_batch(self, texts: list[str]) -> list[list[float]]:
"""
Generate embeddings for multiple texts.
Raises:
NotImplementedError: Anthropic doesn't provide embedding models
"""
raise NotImplementedError(
"Embedding not supported by Anthropic - use Ollama or Bedrock for embeddings"
)
def get_dimension(self) -> int:
"""
Get embedding dimension.
Raises:
NotImplementedError: Anthropic doesn't provide embedding models
"""
raise NotImplementedError(
"Embedding not supported by Anthropic - use Ollama or Bedrock for embeddings"
)
async def generate(self, prompt: str, max_tokens: int = 500) -> str:
"""
Generate text using Anthropic API.
Args:
prompt: The prompt to generate from
max_tokens: Maximum tokens to generate
Returns:
Generated text
"""
message = await self.client.messages.create(
model=self.model,
max_tokens=max_tokens,
temperature=0.7,
messages=[{"role": "user", "content": prompt}],
)
return message.content[0].text
async def close(self) -> None:
"""Close the client (no-op for Anthropic SDK)."""
pass
+91
View File
@@ -0,0 +1,91 @@
"""Unified provider interface for embeddings and text generation."""
from abc import ABC, abstractmethod
class Provider(ABC):
"""
Unified base class for LLM providers.
Providers can support embeddings, text generation, or both.
Use capability properties to determine what features are available.
"""
@property
@abstractmethod
def supports_embeddings(self) -> bool:
"""Whether this provider supports embedding generation."""
pass
@property
@abstractmethod
def supports_generation(self) -> bool:
"""Whether this provider supports text generation."""
pass
@abstractmethod
async def embed(self, text: str) -> list[float]:
"""
Generate embedding vector for text.
Args:
text: Input text to embed
Returns:
Vector embedding as list of floats
Raises:
NotImplementedError: If provider doesn't support embeddings
"""
pass
@abstractmethod
async def embed_batch(self, texts: list[str]) -> list[list[float]]:
"""
Generate embeddings for multiple texts (optimized).
Args:
texts: List of texts to embed
Returns:
List of vector embeddings
Raises:
NotImplementedError: If provider doesn't support embeddings
"""
pass
@abstractmethod
def get_dimension(self) -> int:
"""
Get embedding dimension for this provider.
Returns:
Vector dimension (e.g., 768 for nomic-embed-text)
Raises:
NotImplementedError: If provider doesn't support embeddings
"""
pass
@abstractmethod
async def generate(self, prompt: str, max_tokens: int = 500) -> str:
"""
Generate text from a prompt.
Args:
prompt: The prompt to generate from
max_tokens: Maximum tokens to generate
Returns:
Generated text
Raises:
NotImplementedError: If provider doesn't support generation
"""
pass
@abstractmethod
async def close(self) -> None:
"""Close the provider and release resources."""
pass
+397
View File
@@ -0,0 +1,397 @@
"""Amazon Bedrock provider for embeddings and text generation."""
import json
import logging
from typing import Any
try:
import boto3
from botocore.exceptions import BotoCoreError, ClientError
BOTO3_AVAILABLE = True
except ImportError:
BOTO3_AVAILABLE = False
from .base import Provider
logger = logging.getLogger(__name__)
class BedrockProvider(Provider):
"""
Amazon Bedrock provider supporting both embeddings and text generation.
Uses AWS Bedrock Runtime API with boto3. Supports various model families:
- Embeddings: amazon.titan-embed-text-v1, amazon.titan-embed-text-v2, cohere.embed-*
- Text Generation: anthropic.claude-*, meta.llama3-*, amazon.titan-text-*, mistral.*, etc.
Requires AWS credentials configured via:
- Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION)
- AWS credentials file (~/.aws/credentials)
- IAM role (when running on AWS)
"""
def __init__(
self,
region_name: str | None = None,
embedding_model: str | None = None,
generation_model: str | None = None,
aws_access_key_id: str | None = None,
aws_secret_access_key: str | None = None,
):
"""
Initialize Bedrock provider.
Args:
region_name: AWS region (e.g., "us-east-1"). Defaults to AWS_REGION env var.
embedding_model: Model ID for embeddings (e.g., "amazon.titan-embed-text-v2:0").
None disables embeddings.
generation_model: Model ID for text generation (e.g., "anthropic.claude-3-sonnet-20240229-v1:0").
None disables generation.
aws_access_key_id: AWS access key (optional, uses default credential chain if not provided)
aws_secret_access_key: AWS secret key (optional, uses default credential chain if not provided)
Raises:
ImportError: If boto3 is not installed
"""
if not BOTO3_AVAILABLE:
raise ImportError(
"boto3 is required for Bedrock provider. Install with: pip install boto3"
)
self.embedding_model = embedding_model
self.generation_model = generation_model
self._dimension: int | None = None # Detected dynamically
# Initialize bedrock-runtime client
client_kwargs: dict[str, Any] = {}
if region_name:
client_kwargs["region_name"] = region_name
if aws_access_key_id:
client_kwargs["aws_access_key_id"] = aws_access_key_id
if aws_secret_access_key:
client_kwargs["aws_secret_access_key"] = aws_secret_access_key
self.client = boto3.client("bedrock-runtime", **client_kwargs)
logger.info(
f"Initialized Bedrock provider in region {region_name or 'default'} "
f"(embedding_model={embedding_model}, generation_model={generation_model})"
)
@property
def supports_embeddings(self) -> bool:
"""Whether this provider supports embedding generation."""
return self.embedding_model is not None
@property
def supports_generation(self) -> bool:
"""Whether this provider supports text generation."""
return self.generation_model is not None
def _create_embedding_request(self, text: str) -> dict[str, Any]:
"""
Create model-specific embedding request payload.
Args:
text: Input text to embed
Returns:
Request payload dict for the embedding model
"""
if not self.embedding_model:
raise NotImplementedError(
"Embedding not supported - no embedding_model configured"
)
# Titan Embed models
if self.embedding_model.startswith("amazon.titan-embed"):
return {"inputText": text}
# Cohere Embed models
elif self.embedding_model.startswith("cohere.embed"):
return {"texts": [text], "input_type": "search_document"}
# Unknown model - try Titan format as default
else:
logger.warning(
f"Unknown embedding model format for {self.embedding_model}, "
"using Titan format as default"
)
return {"inputText": text}
def _parse_embedding_response(self, response: dict[str, Any]) -> list[float]:
"""
Parse model-specific embedding response.
Args:
response: Raw response from Bedrock
Returns:
Embedding vector as list of floats
"""
# Titan Embed models
if self.embedding_model and self.embedding_model.startswith(
"amazon.titan-embed"
):
return response["embedding"]
# Cohere Embed models
elif self.embedding_model and self.embedding_model.startswith("cohere.embed"):
return response["embeddings"][0]
# Unknown model - try Titan format as default
else:
logger.warning(
f"Unknown embedding response format for {self.embedding_model}, "
"trying Titan format"
)
return response.get("embedding", response.get("embeddings", [None])[0])
async def embed(self, text: str) -> list[float]:
"""
Generate embedding vector for text.
Args:
text: Input text to embed
Returns:
Vector embedding as list of floats
Raises:
NotImplementedError: If embeddings not enabled (no embedding_model)
ClientError: If Bedrock API call fails
"""
if not self.supports_embeddings:
raise NotImplementedError(
"Embedding not supported - no embedding_model configured"
)
try:
request_body = self._create_embedding_request(text)
response = self.client.invoke_model(
modelId=self.embedding_model,
body=json.dumps(request_body),
accept="application/json",
contentType="application/json",
)
response_body = json.loads(response["body"].read())
embedding = self._parse_embedding_response(response_body)
return embedding
except (BotoCoreError, ClientError) as e:
logger.error(f"Bedrock embedding error: {e}")
raise
async def embed_batch(self, texts: list[str]) -> list[list[float]]:
"""
Generate embeddings for multiple texts.
Note: Current implementation sends requests sequentially.
Future optimization could use asyncio for concurrent requests.
Args:
texts: List of texts to embed
Returns:
List of vector embeddings
Raises:
NotImplementedError: If embeddings not enabled (no embedding_model)
ClientError: If Bedrock API call fails
"""
if not self.supports_embeddings:
raise NotImplementedError(
"Embedding not supported - no embedding_model configured"
)
embeddings = []
for text in texts:
embedding = await self.embed(text)
embeddings.append(embedding)
return embeddings
async def _detect_dimension(self):
"""
Detect embedding dimension by generating a test embedding.
"""
if self._dimension is None and self.supports_embeddings:
logger.debug(
f"Detecting embedding dimension for model {self.embedding_model}..."
)
test_embedding = await self.embed("test")
self._dimension = len(test_embedding)
logger.info(
f"Detected embedding dimension: {self._dimension} "
f"for model {self.embedding_model}"
)
def get_dimension(self) -> int:
"""
Get embedding dimension.
Returns:
Vector dimension for the configured embedding model
Raises:
NotImplementedError: If embeddings not enabled (no embedding_model)
RuntimeError: If dimension not detected yet (call _detect_dimension first)
"""
if not self.supports_embeddings:
raise NotImplementedError(
"Embedding not supported - no embedding_model configured"
)
if self._dimension is None:
raise RuntimeError(
f"Embedding dimension not detected yet for model {self.embedding_model}. "
"Call _detect_dimension() first or generate an embedding."
)
return self._dimension
def _create_generation_request(
self, prompt: str, max_tokens: int
) -> dict[str, Any]:
"""
Create model-specific text generation request payload.
Args:
prompt: The prompt to generate from
max_tokens: Maximum tokens to generate
Returns:
Request payload dict for the generation model
"""
if not self.generation_model:
raise NotImplementedError(
"Text generation not supported - no generation_model configured"
)
# Anthropic Claude models
if self.generation_model.startswith("anthropic.claude"):
return {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": max_tokens,
"temperature": 0.7,
"messages": [{"role": "user", "content": prompt}],
}
# Meta Llama models
elif self.generation_model.startswith("meta.llama"):
return {"prompt": prompt, "max_gen_len": max_tokens, "temperature": 0.7}
# Amazon Titan Text models
elif self.generation_model.startswith("amazon.titan-text"):
return {
"inputText": prompt,
"textGenerationConfig": {
"maxTokenCount": max_tokens,
"temperature": 0.7,
},
}
# Mistral models
elif self.generation_model.startswith("mistral"):
return {"prompt": prompt, "max_tokens": max_tokens, "temperature": 0.7}
# Unknown model - try Claude format as default
else:
logger.warning(
f"Unknown generation model format for {self.generation_model}, "
"using Claude format as default"
)
return {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": max_tokens,
"temperature": 0.7,
"messages": [{"role": "user", "content": prompt}],
}
def _parse_generation_response(self, response: dict[str, Any]) -> str:
"""
Parse model-specific text generation response.
Args:
response: Raw response from Bedrock
Returns:
Generated text
"""
# Anthropic Claude models
if self.generation_model and self.generation_model.startswith(
"anthropic.claude"
):
return response["content"][0]["text"]
# Meta Llama models
elif self.generation_model and self.generation_model.startswith("meta.llama"):
return response["generation"]
# Amazon Titan Text models
elif self.generation_model and self.generation_model.startswith(
"amazon.titan-text"
):
return response["results"][0]["outputText"]
# Mistral models
elif self.generation_model and self.generation_model.startswith("mistral"):
return response["outputs"][0]["text"]
# Unknown model - try common response fields
else:
logger.warning(
f"Unknown generation response format for {self.generation_model}, "
"trying common fields"
)
# Try common response field names
for field in ["text", "generation", "outputText", "completion"]:
if field in response:
return response[field]
# Last resort: return JSON string
return json.dumps(response)
async def generate(self, prompt: str, max_tokens: int = 500) -> str:
"""
Generate text from a prompt.
Args:
prompt: The prompt to generate from
max_tokens: Maximum tokens to generate
Returns:
Generated text
Raises:
NotImplementedError: If generation not enabled (no generation_model)
ClientError: If Bedrock API call fails
"""
if not self.supports_generation:
raise NotImplementedError(
"Text generation not supported - no generation_model configured"
)
try:
request_body = self._create_generation_request(prompt, max_tokens)
response = self.client.invoke_model(
modelId=self.generation_model,
body=json.dumps(request_body),
accept="application/json",
contentType="application/json",
)
response_body = json.loads(response["body"].read())
text = self._parse_generation_response(response_body)
return text
except (BotoCoreError, ClientError) as e:
logger.error(f"Bedrock generation error: {e}")
raise
async def close(self) -> None:
"""Close the client (no-op for boto3 clients)."""
pass
+221
View File
@@ -0,0 +1,221 @@
"""Unified Ollama provider for embeddings and text generation."""
import logging
import httpx
from .base import Provider
logger = logging.getLogger(__name__)
class OllamaProvider(Provider):
"""
Ollama provider supporting both embeddings and text generation.
Supports TLS, SSL verification, and automatic model loading.
"""
def __init__(
self,
base_url: str,
embedding_model: str | None = None,
generation_model: str | None = None,
verify_ssl: bool = True,
timeout: httpx.Timeout | None = None,
):
"""
Initialize Ollama provider.
Args:
base_url: Ollama API base URL (e.g., https://ollama.internal.example.com:443)
embedding_model: Model for embeddings (e.g., "nomic-embed-text"). None disables embeddings.
generation_model: Model for text generation (e.g., "llama3.2:1b"). None disables generation.
verify_ssl: Verify SSL certificates (default: True)
timeout: HTTP timeout configuration
"""
self.base_url = base_url.rstrip("/")
self.embedding_model = embedding_model
self.generation_model = generation_model
self.verify_ssl = verify_ssl
if timeout is None:
timeout = httpx.Timeout(timeout=120, connect=5)
self.client = httpx.AsyncClient(verify=verify_ssl, timeout=timeout)
self._dimension: int | None = None # Detected dynamically for embeddings
logger.info(
f"Initialized Ollama provider: {base_url} "
f"(embedding_model={embedding_model}, generation_model={generation_model}, "
f"verify_ssl={verify_ssl})"
)
# Pre-check and auto-load models
if embedding_model:
self._check_model_is_loaded(embedding_model, autoload=True)
if generation_model:
self._check_model_is_loaded(generation_model, autoload=True)
@property
def supports_embeddings(self) -> bool:
"""Whether this provider supports embedding generation."""
return self.embedding_model is not None
@property
def supports_generation(self) -> bool:
"""Whether this provider supports text generation."""
return self.generation_model is not None
async def embed(self, text: str) -> list[float]:
"""
Generate embedding vector for text.
Args:
text: Input text to embed
Returns:
Vector embedding as list of floats
Raises:
NotImplementedError: If embeddings not enabled (no embedding_model)
"""
if not self.supports_embeddings:
raise NotImplementedError(
"Embedding not supported - no embedding_model configured"
)
response = await self.client.post(
f"{self.base_url}/api/embeddings",
json={"model": self.embedding_model, "prompt": text},
)
response.raise_for_status()
return response.json()["embedding"]
async def embed_batch(self, texts: list[str]) -> list[list[float]]:
"""
Generate embeddings for multiple texts (batched requests).
Note: Ollama doesn't have native batch API, so we send requests sequentially.
Args:
texts: List of texts to embed
Returns:
List of vector embeddings
Raises:
NotImplementedError: If embeddings not enabled (no embedding_model)
"""
if not self.supports_embeddings:
raise NotImplementedError(
"Embedding not supported - no embedding_model configured"
)
embeddings = []
for text in texts:
embedding = await self.embed(text)
embeddings.append(embedding)
return embeddings
async def _detect_dimension(self):
"""
Detect embedding dimension by generating a test embedding.
This method queries the model to determine the actual dimension
instead of relying on hardcoded values.
"""
if self._dimension is None and self.supports_embeddings:
logger.debug(
f"Detecting embedding dimension for model {self.embedding_model}..."
)
test_embedding = await self.embed("test")
self._dimension = len(test_embedding)
logger.info(
f"Detected embedding dimension: {self._dimension} "
f"for model {self.embedding_model}"
)
def get_dimension(self) -> int:
"""
Get embedding dimension.
Returns:
Vector dimension for the configured embedding model
Raises:
NotImplementedError: If embeddings not enabled (no embedding_model)
RuntimeError: If dimension not detected yet (call _detect_dimension first)
"""
if not self.supports_embeddings:
raise NotImplementedError(
"Embedding not supported - no embedding_model configured"
)
if self._dimension is None:
raise RuntimeError(
f"Embedding dimension not detected yet for model {self.embedding_model}. "
"Call _detect_dimension() first or generate an embedding."
)
return self._dimension
async def generate(self, prompt: str, max_tokens: int = 500) -> str:
"""
Generate text from a prompt.
Args:
prompt: The prompt to generate from
max_tokens: Maximum tokens to generate
Returns:
Generated text
Raises:
NotImplementedError: If generation not enabled (no generation_model)
"""
if not self.supports_generation:
raise NotImplementedError(
"Text generation not supported - no generation_model configured"
)
response = await self.client.post(
f"{self.base_url}/api/generate",
json={
"model": self.generation_model,
"prompt": prompt,
"stream": False,
"options": {
"num_predict": max_tokens,
"temperature": 0.7,
},
},
)
response.raise_for_status()
data = response.json()
return data["response"]
def _check_model_is_loaded(self, model: str, autoload: bool = True):
"""
Check if model is loaded in Ollama, optionally auto-loading it.
Args:
model: Model name to check
autoload: Whether to automatically pull the model if not loaded
"""
response = httpx.get(f"{self.base_url}/api/tags")
response.raise_for_status()
models = [m["name"] for m in response.json().get("models", [])]
logger.info("Ollama has following models pre-loaded: %s", models)
if (model not in models) and autoload:
logger.warning(
"Model '%s' not yet available in ollama, attempting to pull now...",
model,
)
response = httpx.post(f"{self.base_url}/api/pull", json={"model": model})
response.raise_for_status()
async def close(self) -> None:
"""Close HTTP client."""
await self.client.aclose()
+126
View File
@@ -0,0 +1,126 @@
"""Provider registry and factory for auto-detection and instantiation."""
import logging
import os
from .base import Provider
from .bedrock import BedrockProvider
from .ollama import OllamaProvider
from .simple import SimpleProvider
logger = logging.getLogger(__name__)
class ProviderRegistry:
"""
Registry for provider auto-detection and instantiation.
Checks environment variables in priority order and creates appropriate provider:
1. Bedrock (AWS_REGION + BEDROCK_*_MODEL)
2. Ollama (OLLAMA_BASE_URL)
3. Simple (fallback for testing/development)
"""
@staticmethod
def create_provider() -> Provider:
"""
Auto-detect and create provider based on environment variables.
Priority order:
1. Bedrock - if AWS_REGION or BEDROCK_EMBEDDING_MODEL is set
2. Ollama - if OLLAMA_BASE_URL is set
3. Simple - fallback for testing/development
Returns:
Provider instance
Environment Variables:
Bedrock:
- AWS_REGION: AWS region (e.g., "us-east-1")
- AWS_ACCESS_KEY_ID: AWS access key (optional, uses credential chain)
- AWS_SECRET_ACCESS_KEY: AWS secret key (optional)
- BEDROCK_EMBEDDING_MODEL: Model ID for embeddings (e.g., "amazon.titan-embed-text-v2:0")
- BEDROCK_GENERATION_MODEL: Model ID for text generation (e.g., "anthropic.claude-3-sonnet-20240229-v1:0")
Ollama:
- OLLAMA_BASE_URL: Ollama API base URL (e.g., "http://localhost:11434")
- OLLAMA_EMBEDDING_MODEL: Model for embeddings (default: "nomic-embed-text")
- OLLAMA_GENERATION_MODEL: Model for text generation (e.g., "llama3.2:1b")
- OLLAMA_VERIFY_SSL: Verify SSL certificates (default: "true")
Simple (no configuration needed, fallback):
- SIMPLE_EMBEDDING_DIMENSION: Embedding dimension (default: 384)
"""
# 1. Check for Bedrock
aws_region = os.getenv("AWS_REGION")
bedrock_embedding_model = os.getenv("BEDROCK_EMBEDDING_MODEL")
bedrock_generation_model = os.getenv("BEDROCK_GENERATION_MODEL")
if aws_region or bedrock_embedding_model or bedrock_generation_model:
logger.info(
f"Using Bedrock provider: region={aws_region}, "
f"embedding_model={bedrock_embedding_model}, "
f"generation_model={bedrock_generation_model}"
)
return BedrockProvider(
region_name=aws_region,
embedding_model=bedrock_embedding_model,
generation_model=bedrock_generation_model,
aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
)
# 2. Check for Ollama
ollama_url = os.getenv("OLLAMA_BASE_URL")
if ollama_url:
embedding_model = os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text")
generation_model = os.getenv("OLLAMA_GENERATION_MODEL")
verify_ssl = os.getenv("OLLAMA_VERIFY_SSL", "true").lower() == "true"
logger.info(
f"Using Ollama provider: {ollama_url}, "
f"embedding_model={embedding_model}, "
f"generation_model={generation_model}"
)
return OllamaProvider(
base_url=ollama_url,
embedding_model=embedding_model,
generation_model=generation_model,
verify_ssl=verify_ssl,
)
# 3. Fallback to Simple provider for development/testing
dimension = int(os.getenv("SIMPLE_EMBEDDING_DIMENSION", "384"))
logger.warning(
"No provider configured (AWS_REGION, OLLAMA_BASE_URL not set). "
"Using SimpleProvider for testing/development. "
"For production, configure Bedrock or Ollama."
)
return SimpleProvider(dimension=dimension)
# Singleton instance
_provider: Provider | None = None
def get_provider() -> Provider:
"""
Get singleton provider instance.
Returns:
Global Provider instance (auto-detected on first call)
"""
global _provider
if _provider is None:
_provider = ProviderRegistry.create_provider()
return _provider
def reset_provider():
"""
Reset singleton provider instance.
Useful for testing or reconfiguration.
"""
global _provider
_provider = None
+149
View File
@@ -0,0 +1,149 @@
"""Simple in-process embedding provider for testing.
This provider uses a basic TF-IDF-like approach with feature hashing to generate
deterministic embeddings without requiring external services. Suitable for testing
but not for production use.
"""
import hashlib
import math
import re
from collections import Counter
from .base import Provider
class SimpleProvider(Provider):
"""Simple deterministic embedding provider using feature hashing.
This implementation:
- Tokenizes text into words
- Uses feature hashing to map words to fixed-size vectors
- Applies TF-IDF-like weighting
- Normalizes vectors to unit length
Not suitable for production but good for testing semantic search infrastructure.
Only supports embeddings, not text generation.
"""
def __init__(self, dimension: int = 384):
"""Initialize simple embedding provider.
Args:
dimension: Embedding dimension (default: 384)
"""
self.dimension = dimension
@property
def supports_embeddings(self) -> bool:
"""Whether this provider supports embedding generation."""
return True
@property
def supports_generation(self) -> bool:
"""Whether this provider supports text generation."""
return False
def _tokenize(self, text: str) -> list[str]:
"""Tokenize text into lowercase words.
Args:
text: Input text
Returns:
List of lowercase word tokens
"""
# Simple word tokenization
text = text.lower()
words = re.findall(r"\b\w+\b", text)
return words
def _hash_word(self, word: str) -> int:
"""Hash word to dimension index.
Args:
word: Word to hash
Returns:
Index in range [0, dimension)
"""
hash_bytes = hashlib.md5(word.encode()).digest()
hash_int = int.from_bytes(hash_bytes[:4], byteorder="big")
return hash_int % self.dimension
def _embed_single(self, text: str) -> list[float]:
"""Generate embedding for single text.
Args:
text: Input text
Returns:
Normalized embedding vector
"""
tokens = self._tokenize(text)
if not tokens:
return [0.0] * self.dimension
# Count term frequencies
term_freq = Counter(tokens)
# Initialize vector
vector = [0.0] * self.dimension
# Apply TF weighting with feature hashing
for word, count in term_freq.items():
idx = self._hash_word(word)
# Simple TF weighting: log(1 + count)
vector[idx] += math.log1p(count)
# Normalize to unit length
norm = math.sqrt(sum(x * x for x in vector))
if norm > 0:
vector = [x / norm for x in vector]
return vector
async def embed(self, text: str) -> list[float]:
"""Generate embedding vector for text.
Args:
text: Input text to embed
Returns:
Vector embedding as list of floats
"""
return self._embed_single(text)
async def embed_batch(self, texts: list[str]) -> list[list[float]]:
"""Generate embeddings for multiple texts.
Args:
texts: List of texts to embed
Returns:
List of vector embeddings
"""
return [self._embed_single(text) for text in texts]
def get_dimension(self) -> int:
"""Get embedding dimension.
Returns:
Vector dimension
"""
return self.dimension
async def generate(self, prompt: str, max_tokens: int = 500) -> str:
"""
Generate text from a prompt.
Raises:
NotImplementedError: Simple provider doesn't support text generation
"""
raise NotImplementedError(
"Text generation not supported by Simple provider - use Ollama, Anthropic, or Bedrock"
)
async def close(self) -> None:
"""Close the provider (no-op for simple provider)."""
pass
+27
View File
@@ -0,0 +1,27 @@
"""Search algorithms module for BM25 hybrid search.
This module provides BM25 hybrid search combining:
- Dense semantic vectors (vector similarity via embeddings)
- Sparse BM25 vectors (keyword-based retrieval)
Results are fused using Qdrant's native Reciprocal Rank Fusion (RRF) for
optimal relevance across both semantic and keyword queries.
"""
from nextcloud_mcp_server.search.algorithms import (
NextcloudClientProtocol,
SearchAlgorithm,
SearchResult,
get_indexed_doc_types,
)
from nextcloud_mcp_server.search.bm25_hybrid import BM25HybridSearchAlgorithm
from nextcloud_mcp_server.search.semantic import SemanticSearchAlgorithm
__all__ = [
"NextcloudClientProtocol",
"SearchAlgorithm",
"SearchResult",
"get_indexed_doc_types",
"SemanticSearchAlgorithm",
"BM25HybridSearchAlgorithm",
]
+200
View File
@@ -0,0 +1,200 @@
"""Base interfaces and data structures for search algorithms."""
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable
@runtime_checkable
class NextcloudClientProtocol(Protocol):
"""Protocol for Nextcloud client supporting multi-document search.
This protocol defines the interface that search algorithms need from a
Nextcloud client to access documents across different apps (Notes, Files,
Calendar, etc.). The client provides access to app-specific sub-clients
that handle the actual API calls.
Document types (e.g., "note", "file", "calendar") are NOT 1:1 with apps.
For example, the Notes app specializes in markdown files, while Files/WebDAV
handles multiple file types. The abstraction is at the document type level.
Search algorithms query Qdrant to determine which document types are actually
indexed before attempting to access them, enabling graceful cross-app search.
"""
username: str
# App-specific clients that search algorithms dispatch to
@property
def notes(self) -> Any:
"""Notes client for accessing note documents."""
...
@property
def webdav(self) -> Any:
"""WebDAV client for accessing file documents."""
...
@property
def calendar(self) -> Any:
"""Calendar client for accessing event/task documents."""
...
@property
def contacts(self) -> Any:
"""Contacts client for accessing contact card documents."""
...
@property
def deck(self) -> Any:
"""Deck client for accessing deck card documents."""
...
@property
def cookbook(self) -> Any:
"""Cookbook client for accessing recipe documents."""
...
@property
def tables(self) -> Any:
"""Tables client for accessing table row documents."""
...
async def get_indexed_doc_types(user_id: str) -> set[str]:
"""Query Qdrant to get actually-indexed document types for a user.
This enables search algorithms to check which document types are available
before attempting to search/verify them, allowing graceful cross-app search.
Args:
user_id: User ID to filter by
Returns:
Set of document type strings (e.g., {"note", "file", "calendar"})
Example:
>>> types = await get_indexed_doc_types("alice")
>>> if "note" in types:
... # Search notes
"""
import logging
from qdrant_client.models import FieldCondition, Filter, MatchValue
from nextcloud_mcp_server.config import get_settings
from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
logger = logging.getLogger(__name__)
settings = get_settings()
qdrant_client = await get_qdrant_client()
collection = settings.get_collection_name()
# Use scroll to sample documents and extract doc_types
# Note: This could be optimized with a facet/aggregation query if Qdrant adds support
try:
scroll_results, _next_offset = await qdrant_client.scroll(
collection_name=collection,
scroll_filter=Filter(
must=[FieldCondition(key="user_id", match=MatchValue(value=user_id))]
),
limit=1000, # Sample size to discover types
with_payload=["doc_type"],
with_vectors=False, # Don't need vectors for type discovery
)
doc_types = {
point.payload.get("doc_type")
for point in scroll_results
if point.payload.get("doc_type")
}
logger.debug(f"Found indexed document types for user {user_id}: {doc_types}")
return doc_types
except Exception as e:
logger.warning(f"Failed to query Qdrant for doc_types: {e}")
return set()
@dataclass
class SearchResult:
"""A single search result with metadata and score.
Attributes:
id: Document ID
doc_type: Document type (note, file, calendar, contact, etc.)
title: Document title
excerpt: Content excerpt showing match context
score: Relevance score (0.0-1.0, higher is better)
metadata: Additional algorithm-specific metadata
"""
id: int
doc_type: str
title: str
excerpt: str
score: float
metadata: dict[str, Any] | None = None
def __post_init__(self):
"""Validate score is in valid range."""
if not 0.0 <= self.score <= 1.0:
raise ValueError(f"Score must be between 0.0 and 1.0, got {self.score}")
class SearchAlgorithm(ABC):
"""Abstract base class for search algorithms.
All search algorithms must implement the search() method with consistent
interface, allowing them to be used interchangeably.
"""
@abstractmethod
async def search(
self,
query: str,
user_id: str,
limit: int = 10,
doc_type: str | None = None,
**kwargs: Any,
) -> list[SearchResult]:
"""Execute search with the given parameters.
Args:
query: Search query string
user_id: User ID for multi-tenant filtering
limit: Maximum number of results to return
doc_type: Optional document type filter (note, file, calendar, etc.)
**kwargs: Algorithm-specific parameters
Returns:
List of SearchResult objects ranked by relevance
Raises:
McpError: If search fails or configuration is invalid
"""
pass
@property
@abstractmethod
def name(self) -> str:
"""Return algorithm name for identification."""
pass
@property
def supports_scoring(self) -> bool:
"""Whether this algorithm provides meaningful relevance scores.
Default: True. Override if algorithm doesn't support scoring.
"""
return True
@property
def requires_vector_db(self) -> bool:
"""Whether this algorithm requires vector database.
Default: False. Override for semantic search.
"""
return False
+206
View File
@@ -0,0 +1,206 @@
"""BM25 hybrid search algorithm using Qdrant native RRF fusion."""
import logging
from typing import Any
from qdrant_client import models
from qdrant_client.models import FieldCondition, Filter, MatchValue
from nextcloud_mcp_server.config import get_settings
from nextcloud_mcp_server.embedding import get_bm25_service, get_embedding_service
from nextcloud_mcp_server.observability.metrics import record_qdrant_operation
from nextcloud_mcp_server.search.algorithms import SearchAlgorithm, SearchResult
from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
logger = logging.getLogger(__name__)
class BM25HybridSearchAlgorithm(SearchAlgorithm):
"""
Hybrid search combining dense semantic vectors with BM25 sparse vectors.
Uses Qdrant's native Reciprocal Rank Fusion (RRF) to automatically merge
results from both dense (semantic) and sparse (BM25 keyword) searches.
This provides the best of both worlds: semantic understanding for conceptual
queries and precise keyword matching for specific terms, acronyms, and codes.
The fusion happens efficiently in the database using the prefetch mechanism,
eliminating the need for application-layer result merging.
"""
def __init__(self, score_threshold: float = 0.0):
"""
Initialize BM25 hybrid search algorithm.
Args:
score_threshold: Minimum RRF score (0-1, default: 0.0 to allow RRF scoring)
Note: RRF produces normalized scores, so threshold is typically lower
"""
self.score_threshold = score_threshold
@property
def name(self) -> str:
return "bm25_hybrid"
@property
def requires_vector_db(self) -> bool:
return True
async def search(
self,
query: str,
user_id: str,
limit: int = 10,
doc_type: str | None = None,
**kwargs: Any,
) -> list[SearchResult]:
"""
Execute hybrid search using dense + sparse vectors with native RRF fusion.
Returns unverified results from Qdrant. Access verification should be
performed separately at the final output stage using verify_search_results().
Args:
query: Natural language or keyword search query
user_id: User ID for filtering
limit: Maximum results to return
doc_type: Optional document type filter
**kwargs: Additional parameters (score_threshold override)
Returns:
List of unverified SearchResult objects ranked by RRF fusion score
Raises:
McpError: If vector sync is not enabled or search fails
"""
settings = get_settings()
score_threshold = kwargs.get("score_threshold", self.score_threshold)
logger.info(
f"BM25 hybrid search: query='{query}', user={user_id}, "
f"limit={limit}, score_threshold={score_threshold}, doc_type={doc_type}"
)
# Generate dense embedding for semantic search
embedding_service = get_embedding_service()
dense_embedding = await embedding_service.embed(query)
logger.debug(f"Generated dense embedding (dimension={len(dense_embedding)})")
# Generate sparse embedding for BM25 keyword search
bm25_service = get_bm25_service()
sparse_embedding = bm25_service.encode(query)
logger.debug(
f"Generated sparse embedding "
f"({len(sparse_embedding['indices'])} non-zero terms)"
)
# Build Qdrant filter
filter_conditions = [
FieldCondition(
key="user_id",
match=MatchValue(value=user_id),
)
]
# Add doc_type filter if specified
if doc_type:
filter_conditions.append(
FieldCondition(
key="doc_type",
match=MatchValue(value=doc_type),
)
)
query_filter = Filter(must=filter_conditions)
# Execute hybrid search with Qdrant native RRF fusion
qdrant_client = await get_qdrant_client()
try:
# Use prefetch to run both dense and sparse searches
# Qdrant will automatically merge results using RRF
search_response = await qdrant_client.query_points(
collection_name=settings.get_collection_name(),
prefetch=[
# Dense semantic search
models.Prefetch(
query=dense_embedding,
using="dense",
limit=limit * 2, # Get extra for deduplication
filter=query_filter,
),
# Sparse BM25 search
models.Prefetch(
query=models.SparseVector(
indices=sparse_embedding["indices"],
values=sparse_embedding["values"],
),
using="sparse",
limit=limit * 2, # Get extra for deduplication
filter=query_filter,
),
],
# RRF fusion query (no additional query needed, just fusion)
query=models.FusionQuery(fusion=models.Fusion.RRF),
limit=limit * 2, # Get extra for deduplication
score_threshold=score_threshold,
with_payload=True,
with_vectors=False, # Don't return vectors to save bandwidth
)
record_qdrant_operation("search", "success")
except Exception:
record_qdrant_operation("search", "error")
raise
logger.info(
f"Qdrant RRF fusion returned {len(search_response.points)} results "
f"(before deduplication)"
)
if search_response.points:
# Log top 3 RRF scores to help with threshold tuning
top_scores = [p.score for p in search_response.points[:3]]
logger.debug(f"Top 3 RRF fusion scores: {top_scores}")
# Deduplicate by (doc_id, doc_type) - multiple chunks per document
seen_docs = set()
results = []
for result in search_response.points:
doc_id = int(result.payload["doc_id"])
doc_type = result.payload.get("doc_type", "note")
doc_key = (doc_id, doc_type)
# Skip if we've already seen this document
if doc_key in seen_docs:
continue
seen_docs.add(doc_key)
# Return unverified results (verification happens at output stage)
results.append(
SearchResult(
id=doc_id,
doc_type=doc_type,
title=result.payload.get("title", "Untitled"),
excerpt=result.payload.get("excerpt", ""),
score=result.score, # RRF fusion score
metadata={
"chunk_index": result.payload.get("chunk_index"),
"total_chunks": result.payload.get("total_chunks"),
"search_method": "bm25_hybrid_rrf",
},
)
)
if len(results) >= limit:
break
logger.info(f"Returning {len(results)} unverified results after deduplication")
if results:
result_details = [
f"{r.doc_type}_{r.id} (score={r.score:.3f}, title='{r.title}')"
for r in results[:5] # Show top 5
]
logger.debug(f"Top results: {', '.join(result_details)}")
return results
+167
View File
@@ -0,0 +1,167 @@
"""Semantic search algorithm using vector similarity (Qdrant)."""
import logging
from typing import Any
from qdrant_client.models import FieldCondition, Filter, MatchValue
from nextcloud_mcp_server.config import get_settings
from nextcloud_mcp_server.embedding import get_embedding_service
from nextcloud_mcp_server.observability.metrics import record_qdrant_operation
from nextcloud_mcp_server.search.algorithms import SearchAlgorithm, SearchResult
from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
logger = logging.getLogger(__name__)
class SemanticSearchAlgorithm(SearchAlgorithm):
"""Semantic search using vector similarity in Qdrant.
Searches documents by meaning rather than exact keywords using
768-dimensional embeddings and cosine distance.
"""
def __init__(self, score_threshold: float = 0.7):
"""Initialize semantic search algorithm.
Args:
score_threshold: Minimum similarity score (0-1, default: 0.7)
"""
self.score_threshold = score_threshold
@property
def name(self) -> str:
return "semantic"
@property
def requires_vector_db(self) -> bool:
return True
async def search(
self,
query: str,
user_id: str,
limit: int = 10,
doc_type: str | None = None,
**kwargs: Any,
) -> list[SearchResult]:
"""Execute semantic search using vector similarity.
Returns unverified results from Qdrant. Access verification should be
performed separately at the final output stage using verify_search_results().
Args:
query: Natural language search query
user_id: User ID for filtering
limit: Maximum results to return
doc_type: Optional document type filter
**kwargs: Additional parameters (score_threshold override)
Returns:
List of unverified SearchResult objects ranked by similarity score
Raises:
McpError: If vector sync is not enabled or search fails
"""
settings = get_settings()
score_threshold = kwargs.get("score_threshold", self.score_threshold)
logger.info(
f"Semantic search: query='{query}', user={user_id}, "
f"limit={limit}, score_threshold={score_threshold}, doc_type={doc_type}"
)
# Generate embedding for query
embedding_service = get_embedding_service()
query_embedding = await embedding_service.embed(query)
logger.debug(
f"Generated embedding for query (dimension={len(query_embedding)})"
)
# Build Qdrant filter
filter_conditions = [
FieldCondition(
key="user_id",
match=MatchValue(value=user_id),
)
]
# Add doc_type filter if specified
if doc_type:
filter_conditions.append(
FieldCondition(
key="doc_type",
match=MatchValue(value=doc_type),
)
)
# Search Qdrant
qdrant_client = await get_qdrant_client()
try:
search_response = await qdrant_client.query_points(
collection_name=settings.get_collection_name(),
query=query_embedding,
using="dense", # Use named dense vector (BM25 hybrid collections)
query_filter=Filter(must=filter_conditions),
limit=limit * 2, # Get extra for deduplication
score_threshold=score_threshold,
with_payload=True,
with_vectors=False, # Don't return vectors to save bandwidth
)
record_qdrant_operation("search", "success")
except Exception:
record_qdrant_operation("search", "error")
raise
logger.info(
f"Qdrant returned {len(search_response.points)} results "
f"(before deduplication)"
)
if search_response.points:
# Log top 3 scores to help with threshold tuning
top_scores = [p.score for p in search_response.points[:3]]
logger.debug(f"Top 3 similarity scores: {top_scores}")
# Deduplicate by (doc_id, doc_type) - multiple chunks per document
seen_docs = set()
results = []
for result in search_response.points:
doc_id = int(result.payload["doc_id"])
doc_type = result.payload.get("doc_type", "note")
doc_key = (doc_id, doc_type)
# Skip if we've already seen this document
if doc_key in seen_docs:
continue
seen_docs.add(doc_key)
# Return unverified results (verification happens at output stage)
results.append(
SearchResult(
id=doc_id,
doc_type=doc_type,
title=result.payload.get("title", "Untitled"),
excerpt=result.payload.get("excerpt", ""),
score=result.score,
metadata={
"chunk_index": result.payload.get("chunk_index"),
"total_chunks": result.payload.get("total_chunks"),
},
)
)
if len(results) >= limit:
break
logger.info(f"Returning {len(results)} unverified results after deduplication")
if results:
result_details = [
f"{r.doc_type}_{r.id} (score={r.score:.3f}, title='{r.title}')"
for r in results[:5] # Show top 5
]
logger.debug(f"Top results: {', '.join(result_details)}")
return results
+17
View File
@@ -12,6 +12,7 @@ from nextcloud_mcp_server.models.calendar import (
ListTodosResponse,
Todo,
)
from nextcloud_mcp_server.observability.metrics import instrument_tool
logger = logging.getLogger(__name__)
@@ -20,6 +21,7 @@ def configure_calendar_tools(mcp: FastMCP):
# Calendar tools
@mcp.tool()
@require_scopes("calendar:read")
@instrument_tool
async def nc_calendar_list_calendars(ctx: Context) -> ListCalendarsResponse:
"""List all available calendars for the user"""
client = await get_client(ctx)
@@ -30,6 +32,7 @@ def configure_calendar_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("calendar:write")
@instrument_tool
async def nc_calendar_create_event(
calendar_name: str,
title: str,
@@ -106,6 +109,7 @@ def configure_calendar_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("calendar:read")
@instrument_tool
async def nc_calendar_list_events(
calendar_name: str,
ctx: Context,
@@ -208,6 +212,7 @@ def configure_calendar_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("calendar:read")
@instrument_tool
async def nc_calendar_get_event(
calendar_name: str,
event_uid: str,
@@ -220,6 +225,7 @@ def configure_calendar_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("calendar:write")
@instrument_tool
async def nc_calendar_update_event(
calendar_name: str,
event_uid: str,
@@ -293,6 +299,7 @@ def configure_calendar_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("calendar:write")
@instrument_tool
async def nc_calendar_delete_event(
calendar_name: str,
event_uid: str,
@@ -304,6 +311,7 @@ def configure_calendar_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("calendar:write")
@instrument_tool
async def nc_calendar_create_meeting(
title: str,
date: str,
@@ -370,6 +378,7 @@ def configure_calendar_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("calendar:read")
@instrument_tool
async def nc_calendar_get_upcoming_events(
ctx: Context,
calendar_name: str = "", # Empty = all calendars
@@ -420,6 +429,7 @@ def configure_calendar_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("calendar:read")
@instrument_tool
async def nc_calendar_find_availability(
duration_minutes: int,
ctx: Context,
@@ -500,6 +510,7 @@ def configure_calendar_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("calendar:write")
@instrument_tool
async def nc_calendar_bulk_operations(
operation: str, # "update", "delete", "move"
ctx: Context,
@@ -749,6 +760,7 @@ def configure_calendar_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("calendar:write")
@instrument_tool
async def nc_calendar_manage_calendar(
action: str, # "create", "delete", "update", "list"
ctx: Context,
@@ -818,6 +830,7 @@ def configure_calendar_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("todo:read", "calendar:read")
@instrument_tool
async def nc_calendar_list_todos(
calendar_name: str,
ctx: Context,
@@ -863,6 +876,7 @@ def configure_calendar_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("todo:write", "calendar:read")
@instrument_tool
async def nc_calendar_create_todo(
calendar_name: str,
summary: str,
@@ -906,6 +920,7 @@ def configure_calendar_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("todo:write", "calendar:read")
@instrument_tool
async def nc_calendar_update_todo(
calendar_name: str,
todo_uid: str,
@@ -966,6 +981,7 @@ def configure_calendar_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("todo:write", "calendar:read")
@instrument_tool
async def nc_calendar_delete_todo(
calendar_name: str,
todo_uid: str,
@@ -986,6 +1002,7 @@ def configure_calendar_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("todo:read", "calendar:read")
@instrument_tool
async def nc_calendar_search_todos(
ctx: Context,
status: Optional[str] = None,
+8
View File
@@ -4,6 +4,7 @@ from mcp.server.fastmcp import Context, FastMCP
from nextcloud_mcp_server.auth import require_scopes
from nextcloud_mcp_server.context import get_client
from nextcloud_mcp_server.observability.metrics import instrument_tool
logger = logging.getLogger(__name__)
@@ -12,6 +13,7 @@ def configure_contacts_tools(mcp: FastMCP):
# Contacts tools
@mcp.tool()
@require_scopes("contacts:read")
@instrument_tool
async def nc_contacts_list_addressbooks(ctx: Context):
"""List all addressbooks for the user."""
client = await get_client(ctx)
@@ -19,6 +21,7 @@ def configure_contacts_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("contacts:read")
@instrument_tool
async def nc_contacts_list_contacts(ctx: Context, *, addressbook: str):
"""List all contacts in the specified addressbook."""
client = await get_client(ctx)
@@ -26,6 +29,7 @@ def configure_contacts_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("contacts:write")
@instrument_tool
async def nc_contacts_create_addressbook(
ctx: Context, *, name: str, display_name: str
):
@@ -42,6 +46,7 @@ def configure_contacts_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("contacts:write")
@instrument_tool
async def nc_contacts_delete_addressbook(ctx: Context, *, name: str):
"""Delete an addressbook."""
client = await get_client(ctx)
@@ -49,6 +54,7 @@ def configure_contacts_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("contacts:write")
@instrument_tool
async def nc_contacts_create_contact(
ctx: Context, *, addressbook: str, uid: str, contact_data: dict
):
@@ -66,6 +72,7 @@ def configure_contacts_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("contacts:write")
@instrument_tool
async def nc_contacts_delete_contact(ctx: Context, *, addressbook: str, uid: str):
"""Delete a contact."""
client = await get_client(ctx)
@@ -73,6 +80,7 @@ def configure_contacts_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("contacts:write")
@instrument_tool
async def nc_contacts_update_contact(
ctx: Context, *, addressbook: str, uid: str, contact_data: dict, etag: str = ""
):
+14
View File
@@ -24,6 +24,7 @@ from nextcloud_mcp_server.models.cookbook import (
UpdateRecipeResponse,
Version,
)
from nextcloud_mcp_server.observability.metrics import instrument_tool
logger = logging.getLogger(__name__)
@@ -72,6 +73,7 @@ def configure_cookbook_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("cookbook:write")
@instrument_tool
async def nc_cookbook_import_recipe(url: str, ctx: Context) -> ImportRecipeResponse:
"""Import a recipe from a URL using schema.org metadata.
@@ -129,6 +131,7 @@ def configure_cookbook_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("cookbook:read")
@instrument_tool
async def nc_cookbook_list_recipes(ctx: Context) -> ListRecipesResponse:
"""Get all recipes in the database"""
client = await get_client(ctx)
@@ -154,6 +157,7 @@ def configure_cookbook_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("cookbook:read")
@instrument_tool
async def nc_cookbook_get_recipe(recipe_id: int, ctx: Context) -> Recipe:
"""Get a specific recipe by its ID"""
client = await get_client(ctx)
@@ -179,6 +183,7 @@ def configure_cookbook_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("cookbook:write")
@instrument_tool
async def nc_cookbook_create_recipe(
name: str,
description: str | None = None,
@@ -258,6 +263,7 @@ def configure_cookbook_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("cookbook:write")
@instrument_tool
async def nc_cookbook_update_recipe(
recipe_id: int,
name: str | None = None,
@@ -347,6 +353,7 @@ def configure_cookbook_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("cookbook:write")
@instrument_tool
async def nc_cookbook_delete_recipe(
recipe_id: int, ctx: Context
) -> DeleteRecipeResponse:
@@ -382,6 +389,7 @@ def configure_cookbook_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("cookbook:read")
@instrument_tool
async def nc_cookbook_search_recipes(
query: str, ctx: Context
) -> SearchRecipesResponse:
@@ -418,6 +426,7 @@ def configure_cookbook_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("cookbook:read")
@instrument_tool
async def nc_cookbook_list_categories(ctx: Context) -> ListCategoriesResponse:
"""Get all known categories.
@@ -445,6 +454,7 @@ def configure_cookbook_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("cookbook:read")
@instrument_tool
async def nc_cookbook_get_recipes_in_category(
category: str, ctx: Context
) -> ListRecipesResponse:
@@ -481,6 +491,7 @@ def configure_cookbook_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("cookbook:read")
@instrument_tool
async def nc_cookbook_list_keywords(ctx: Context) -> ListKeywordsResponse:
"""Get all known keywords/tags"""
client = await get_client(ctx)
@@ -506,6 +517,7 @@ def configure_cookbook_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("cookbook:read")
@instrument_tool
async def nc_cookbook_get_recipes_with_keywords(
keywords: list[str], ctx: Context
) -> ListRecipesResponse:
@@ -540,6 +552,7 @@ def configure_cookbook_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("cookbook:write")
@instrument_tool
async def nc_cookbook_set_config(
folder: str | None = None,
update_interval: int | None = None,
@@ -583,6 +596,7 @@ def configure_cookbook_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("cookbook:write")
@instrument_tool
async def nc_cookbook_reindex(ctx: Context) -> ReindexResponse:
"""Trigger a rescan of all recipes into the caching database.
+26
View File
@@ -18,6 +18,7 @@ from nextcloud_mcp_server.models.deck import (
LabelOperationResponse,
StackOperationResponse,
)
from nextcloud_mcp_server.observability.metrics import instrument_tool
logger = logging.getLogger(__name__)
@@ -118,6 +119,7 @@ def configure_deck_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("deck:read")
@instrument_tool
async def deck_get_boards(ctx: Context) -> list[DeckBoard]:
"""Get all Nextcloud Deck boards"""
client = await get_client(ctx)
@@ -126,6 +128,7 @@ def configure_deck_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("deck:read")
@instrument_tool
async def deck_get_board(ctx: Context, board_id: int) -> DeckBoard:
"""Get details of a specific Nextcloud Deck board"""
client = await get_client(ctx)
@@ -134,6 +137,7 @@ def configure_deck_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("deck:read")
@instrument_tool
async def deck_get_stacks(ctx: Context, board_id: int) -> list[DeckStack]:
"""Get all stacks in a Nextcloud Deck board"""
client = await get_client(ctx)
@@ -142,6 +146,7 @@ def configure_deck_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("deck:read")
@instrument_tool
async def deck_get_stack(ctx: Context, board_id: int, stack_id: int) -> DeckStack:
"""Get details of a specific Nextcloud Deck stack"""
client = await get_client(ctx)
@@ -150,6 +155,7 @@ def configure_deck_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("deck:read")
@instrument_tool
async def deck_get_cards(
ctx: Context, board_id: int, stack_id: int
) -> list[DeckCard]:
@@ -162,6 +168,7 @@ def configure_deck_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("deck:read")
@instrument_tool
async def deck_get_card(
ctx: Context, board_id: int, stack_id: int, card_id: int
) -> DeckCard:
@@ -172,6 +179,7 @@ def configure_deck_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("deck:read")
@instrument_tool
async def deck_get_labels(ctx: Context, board_id: int) -> list[DeckLabel]:
"""Get all labels in a Nextcloud Deck board"""
client = await get_client(ctx)
@@ -180,6 +188,7 @@ def configure_deck_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("deck:read")
@instrument_tool
async def deck_get_label(ctx: Context, board_id: int, label_id: int) -> DeckLabel:
"""Get details of a specific Nextcloud Deck label"""
client = await get_client(ctx)
@@ -190,6 +199,7 @@ def configure_deck_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("deck:write")
@instrument_tool
async def deck_create_board(
ctx: Context, title: str, color: str
) -> CreateBoardResponse:
@@ -207,6 +217,7 @@ def configure_deck_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("deck:write")
@instrument_tool
async def deck_create_stack(
ctx: Context, board_id: int, title: str, order: int
) -> CreateStackResponse:
@@ -223,6 +234,7 @@ def configure_deck_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("deck:write")
@instrument_tool
async def deck_update_stack(
ctx: Context,
board_id: int,
@@ -249,6 +261,7 @@ def configure_deck_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("deck:write")
@instrument_tool
async def deck_delete_stack(
ctx: Context, board_id: int, stack_id: int
) -> StackOperationResponse:
@@ -270,6 +283,7 @@ def configure_deck_tools(mcp: FastMCP):
# Card Tools
@mcp.tool()
@require_scopes("deck:write")
@instrument_tool
async def deck_create_card(
ctx: Context,
board_id: int,
@@ -304,6 +318,7 @@ def configure_deck_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("deck:write")
@instrument_tool
async def deck_update_card(
ctx: Context,
board_id: int,
@@ -357,6 +372,7 @@ def configure_deck_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("deck:write")
@instrument_tool
async def deck_delete_card(
ctx: Context, board_id: int, stack_id: int, card_id: int
) -> CardOperationResponse:
@@ -379,6 +395,7 @@ def configure_deck_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("deck:write")
@instrument_tool
async def deck_archive_card(
ctx: Context, board_id: int, stack_id: int, card_id: int
) -> CardOperationResponse:
@@ -401,6 +418,7 @@ def configure_deck_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("deck:write")
@instrument_tool
async def deck_unarchive_card(
ctx: Context, board_id: int, stack_id: int, card_id: int
) -> CardOperationResponse:
@@ -423,6 +441,7 @@ def configure_deck_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("deck:write")
@instrument_tool
async def deck_reorder_card(
ctx: Context,
board_id: int,
@@ -455,6 +474,7 @@ def configure_deck_tools(mcp: FastMCP):
# Label Tools
@mcp.tool()
@require_scopes("deck:write")
@instrument_tool
async def deck_create_label(
ctx: Context, board_id: int, title: str, color: str
) -> CreateLabelResponse:
@@ -471,6 +491,7 @@ def configure_deck_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("deck:write")
@instrument_tool
async def deck_update_label(
ctx: Context,
board_id: int,
@@ -497,6 +518,7 @@ def configure_deck_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("deck:write")
@instrument_tool
async def deck_delete_label(
ctx: Context, board_id: int, label_id: int
) -> LabelOperationResponse:
@@ -518,6 +540,7 @@ def configure_deck_tools(mcp: FastMCP):
# Card-Label Assignment Tools
@mcp.tool()
@require_scopes("deck:write")
@instrument_tool
async def deck_assign_label_to_card(
ctx: Context, board_id: int, stack_id: int, card_id: int, label_id: int
) -> CardOperationResponse:
@@ -541,6 +564,7 @@ def configure_deck_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("deck:write")
@instrument_tool
async def deck_remove_label_from_card(
ctx: Context, board_id: int, stack_id: int, card_id: int, label_id: int
) -> CardOperationResponse:
@@ -565,6 +589,7 @@ def configure_deck_tools(mcp: FastMCP):
# Card-User Assignment Tools
@mcp.tool()
@require_scopes("deck:write")
@instrument_tool
async def deck_assign_user_to_card(
ctx: Context, board_id: int, stack_id: int, card_id: int, user_id: str
) -> CardOperationResponse:
@@ -588,6 +613,7 @@ def configure_deck_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("deck:write")
@instrument_tool
async def deck_unassign_user_from_card(
ctx: Context, board_id: int, stack_id: int, card_id: int, user_id: str
) -> CardOperationResponse:
+8
View File
@@ -17,6 +17,7 @@ from nextcloud_mcp_server.models.notes import (
SearchNotesResponse,
UpdateNoteResponse,
)
from nextcloud_mcp_server.observability.metrics import instrument_tool
logger = logging.getLogger(__name__)
@@ -86,6 +87,7 @@ def configure_notes_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("notes:write")
@instrument_tool
async def nc_notes_create_note(
title: str, content: str, category: str, ctx: Context
) -> CreateNoteResponse:
@@ -132,6 +134,7 @@ def configure_notes_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("notes:write")
@instrument_tool
async def nc_notes_update_note(
note_id: int,
etag: str,
@@ -197,6 +200,7 @@ def configure_notes_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("notes:write")
@instrument_tool
async def nc_notes_append_content(
note_id: int, content: str, ctx: Context
) -> AppendContentResponse:
@@ -247,6 +251,7 @@ def configure_notes_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("notes:read")
@instrument_tool
async def nc_notes_search_notes(query: str, ctx: Context) -> SearchNotesResponse:
"""Search notes by title or content, returning only id, title, and category (requires notes:read scope)."""
client = await get_client(ctx)
@@ -293,6 +298,7 @@ def configure_notes_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("notes:read")
@instrument_tool
async def nc_notes_get_note(note_id: int, ctx: Context) -> Note:
"""Get a specific note by its ID (requires notes:read scope)"""
client = await get_client(ctx)
@@ -322,6 +328,7 @@ def configure_notes_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("notes:read")
@instrument_tool
async def nc_notes_get_attachment(
note_id: int, attachment_filename: str, ctx: Context
) -> dict[str, str]:
@@ -368,6 +375,7 @@ def configure_notes_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("notes:write")
@instrument_tool
async def nc_notes_delete_note(note_id: int, ctx: Context) -> DeleteNoteResponse:
"""Delete a note permanently"""
logger.info("Deleting note %s", note_id)
+196 -147
View File
@@ -2,7 +2,8 @@
import logging
from httpx import HTTPStatusError, RequestError
import anyio
from httpx import RequestError
from mcp.server.fastmcp import Context, FastMCP
from mcp.shared.exceptions import McpError
from mcp.types import (
@@ -21,6 +22,10 @@ from nextcloud_mcp_server.models.semantic import (
SemanticSearchResult,
VectorSyncStatusResponse,
)
from nextcloud_mcp_server.observability.metrics import (
instrument_tool,
)
from nextcloud_mcp_server.search.bm25_hybrid import BM25HybridSearchAlgorithm
logger = logging.getLogger(__name__)
@@ -30,184 +35,158 @@ def configure_semantic_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("semantic:read")
@instrument_tool
async def nc_semantic_search(
query: str, ctx: Context, limit: int = 10, score_threshold: float = 0.7
query: str,
ctx: Context,
limit: int = 10,
doc_types: list[str] | None = None,
score_threshold: float = 0.0,
) -> SemanticSearchResponse:
"""
Semantic search across all indexed Nextcloud apps using vector embeddings.
Search Nextcloud content using BM25 hybrid search with cross-app support.
Searches documents by meaning rather than exact keywords across notes, calendar
events, deck cards, files, and contacts. Requires vector database synchronization
to be enabled (VECTOR_SYNC_ENABLED=true).
Uses Qdrant's native hybrid search combining:
- Dense semantic vectors: For conceptual similarity and natural language queries
- BM25 sparse vectors: For precise keyword matching, acronyms, and specific terms
Results are automatically fused using Reciprocal Rank Fusion (RRF) in the
database for optimal relevance. This provides the best of both semantic
understanding and keyword precision.
Requires VECTOR_SYNC_ENABLED=true. Currently only "note" documents are
fully supported for indexing.
Args:
query: Natural language search query
query: Natural language or keyword search query
limit: Maximum number of results to return (default: 10)
score_threshold: Minimum similarity score (0-1, default: 0.7)
doc_types: Document types to search (e.g., ["note", "file"]). None = search all indexed types (default)
score_threshold: Minimum RRF fusion score (0-1, default: 0.0 for RRF scoring)
Returns:
SemanticSearchResponse with matching documents and similarity scores
SemanticSearchResponse with matching documents ranked by RRF fusion scores
"""
from qdrant_client.models import FieldCondition, Filter, MatchValue
from nextcloud_mcp_server.config import get_settings
from nextcloud_mcp_server.embedding import get_embedding_service
from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
settings = get_settings()
# Check if vector sync is enabled
if not settings.vector_sync_enabled:
raise McpError(
ErrorData(
code=-1,
message="Semantic search is not enabled. Set VECTOR_SYNC_ENABLED=true and ensure vector database is configured.",
)
)
client = await get_client(ctx)
username = client.username
logger.info(
f"Semantic search: query='{query}', user={username}, "
f"BM25 hybrid search: query='{query}', user={username}, "
f"limit={limit}, score_threshold={score_threshold}"
)
# Check that vector sync is enabled
if not settings.vector_sync_enabled:
raise McpError(
ErrorData(
code=-1,
message="BM25 hybrid search requires VECTOR_SYNC_ENABLED=true",
)
)
try:
# Generate embedding for query
embedding_service = get_embedding_service()
query_embedding = await embedding_service.embed(query)
logger.debug(
f"Generated embedding for query (dimension={len(query_embedding)})"
)
# Create BM25 hybrid search algorithm
search_algo = BM25HybridSearchAlgorithm(score_threshold=score_threshold)
# Search Qdrant with user filtering
# Note: Currently only searching notes (doc_type="note")
# Future: Remove doc_type filter to search all apps
qdrant_client = await get_qdrant_client()
search_response = await qdrant_client.query_points(
collection_name=settings.get_collection_name(),
query=query_embedding,
query_filter=Filter(
must=[
FieldCondition(
key="user_id",
match=MatchValue(value=username),
),
FieldCondition(
key="doc_type",
match=MatchValue(value="note"),
),
]
),
limit=limit * 2, # Get extra for filtering
score_threshold=score_threshold,
with_payload=True,
with_vectors=False, # Don't return vectors to save bandwidth
)
# Execute search across requested document types
# If doc_types is None, search all indexed types (cross-app search)
# If doc_types is a list, search only those types
all_results = []
logger.info(
f"Qdrant returned {len(search_response.points)} results "
f"(before deduplication and access verification)"
)
if search_response.points:
# Log top 3 scores to help with threshold tuning
top_scores = [p.score for p in search_response.points[:3]]
logger.debug(f"Top 3 similarity scores: {top_scores}")
if doc_types is None:
# Cross-app search: search all indexed types
# Get unverified results from Qdrant
unverified_results = await search_algo.search(
query=query,
user_id=username,
limit=limit * 2, # Get extra for access filtering
doc_type=None, # Signal to search all types
score_threshold=score_threshold,
)
all_results.extend(unverified_results)
else:
# Search specific document types
# For each requested type, execute search and combine results
for dtype in doc_types:
unverified_results = await search_algo.search(
query=query,
user_id=username,
limit=limit * 2, # Get extra for combining and filtering
doc_type=dtype,
score_threshold=score_threshold,
)
all_results.extend(unverified_results)
# Deduplicate by document ID (multiple chunks per document)
seen_doc_ids = set()
# Sort combined results by score
all_results.sort(key=lambda r: r.score, reverse=True)
# Deduplicate results (hybrid search may return same doc from dense + sparse)
# Qdrant already filters by user_id for multi-tenant isolation
# Sampling tool will verify access when fetching full content
seen = set()
unique_results = []
for result in all_results:
key = (result.id, result.doc_type)
if key not in seen:
seen.add(key)
unique_results.append(result)
search_results = unique_results[:limit] # Final limit after deduplication
# Convert SearchResult objects to SemanticSearchResult for response
results = []
for r in search_results:
results.append(
SemanticSearchResult(
id=r.id,
doc_type=r.doc_type,
title=r.title,
category=r.metadata.get("category", "") if r.metadata else "",
excerpt=r.excerpt,
score=r.score,
chunk_index=r.metadata.get("chunk_index", 0)
if r.metadata
else 0,
total_chunks=r.metadata.get("total_chunks", 1)
if r.metadata
else 1,
)
)
for result in search_response.points:
doc_id = int(result.payload["doc_id"])
doc_type = result.payload.get("doc_type", "note")
# Skip if we've already seen this document
if doc_id in seen_doc_ids:
continue
seen_doc_ids.add(doc_id)
# Verify access via Nextcloud API (dual-phase authorization)
# Currently only supports notes, will be extended to other apps
if doc_type == "note":
try:
note = await client.notes.get_note(doc_id)
results.append(
SemanticSearchResult(
id=doc_id,
doc_type="note",
title=result.payload["title"],
category=note.get("category", ""),
excerpt=result.payload["excerpt"],
score=result.score,
chunk_index=result.payload["chunk_index"],
total_chunks=result.payload["total_chunks"],
)
)
if len(results) >= limit:
break
except HTTPStatusError as e:
if e.response.status_code == 403:
# User lost access, skip this document
logger.debug(f"Skipping note {doc_id}: access denied (403)")
continue
elif e.response.status_code == 404:
# Document was deleted but not yet removed from vector DB
logger.debug(
f"Skipping note {doc_id}: not found (404), "
f"likely deleted after indexing"
)
continue
else:
# Log other errors but continue processing
logger.warning(
f"Error verifying access to note {doc_id}: {e.response.status_code}"
)
continue
logger.info(
f"Returning {len(results)} results after deduplication and access verification"
)
if results:
result_details = [
f"note_{r.id} (score={r.score:.3f}, title='{r.title}')"
for r in results[:5] # Show top 5
]
logger.debug(f"Top results: {', '.join(result_details)}")
logger.info(f"Returning {len(results)} results from BM25 hybrid search")
return SemanticSearchResponse(
results=results,
query=query,
total_found=len(results),
search_method="semantic",
search_method="bm25_hybrid",
)
except ValueError as e:
if "No embedding provider configured" in str(e):
error_msg = str(e)
if "No embedding provider configured" in error_msg:
raise McpError(
ErrorData(
code=-1,
message="Embedding service not configured. Set OLLAMA_BASE_URL environment variable.",
)
)
raise McpError(ErrorData(code=-1, message=f"Configuration error: {str(e)}"))
raise McpError(
ErrorData(code=-1, message=f"Configuration error: {error_msg}")
)
except RequestError as e:
raise McpError(
ErrorData(code=-1, message=f"Network error during search: {str(e)}")
)
except Exception as e:
logger.error(f"Semantic search error: {e}", exc_info=True)
raise McpError(
ErrorData(code=-1, message=f"Semantic search failed: {str(e)}")
)
logger.error(f"Search error: {e}", exc_info=True)
raise McpError(ErrorData(code=-1, message=f"Search failed: {str(e)}"))
@mcp.tool()
@require_scopes("semantic:read")
@instrument_tool
async def nc_semantic_search_answer(
query: str,
ctx: Context,
@@ -331,21 +310,91 @@ def configure_semantic_tools(mcp: FastMCP):
success=True,
)
# 4. Construct context from retrieved documents
# 4. Fetch full content for notes in parallel (also verifies access)
# Use anyio task group for concurrent fetching with semaphore to prevent
# connection pool exhaustion
client = await get_client(ctx)
accessible_results = [None] * len(search_response.results)
full_contents = [None] * len(search_response.results)
# Limit concurrent requests to prevent connection pool exhaustion
max_concurrent = 20
semaphore = anyio.Semaphore(max_concurrent)
async def fetch_content(index: int, result: SemanticSearchResult):
"""Fetch full content for a single document (parallel with semaphore)."""
async with semaphore:
if result.doc_type == "note":
try:
note = await client.notes.get_note(result.id)
# Note is accessible, store result and full content
content = note.get("content", "")
accessible_results[index] = result
full_contents[index] = content
logger.debug(
f"Fetched full content for note {result.id} "
f"(length: {len(content)} chars)"
)
except Exception as e:
# Note might have been deleted or permissions changed
# Leave as None to filter out later
logger.debug(
f"Note {result.id} not accessible: {e}. "
f"Excluding from results."
)
else:
# Non-note document types (future: calendar, deck, files)
# For now, keep them with excerpts
accessible_results[index] = result
# full_contents[index] remains None (will use excerpt)
# Run all fetches in parallel using anyio task group
async with anyio.create_task_group() as tg:
for idx, result in enumerate(search_response.results):
tg.start_soon(fetch_content, idx, result)
# Filter out None (inaccessible notes) while preserving order
final_pairs = [
(r, c) for r, c in zip(accessible_results, full_contents) if r is not None
]
accessible_results = [r for r, c in final_pairs]
full_contents = [c for r, c in final_pairs]
# Check if we filtered out all results
if not accessible_results:
logger.warning(f"All search results became inaccessible for query: {query}")
return SamplingSearchResponse(
query=query,
generated_answer="All matching documents are no longer accessible.",
sources=[],
total_found=0,
search_method="semantic_sampling",
success=True,
)
# 5. Construct context from accessible documents with full content
context_parts = []
for idx, result in enumerate(search_response.results, 1):
for idx, (result, content) in enumerate(
zip(accessible_results, full_contents), 1
):
# Use full content if available (notes), otherwise use excerpt
if content is not None:
content_field = f"Content: {content}"
else:
content_field = f"Excerpt: {result.excerpt}"
context_parts.append(
f"[Document {idx}]\n"
f"Type: {result.doc_type}\n"
f"Title: {result.title}\n"
f"Category: {result.category}\n"
f"Excerpt: {result.excerpt}\n"
f"{content_field}\n"
f"Relevance Score: {result.score:.2f}\n"
)
context = "\n".join(context_parts)
# 5. Construct prompt - reuse user's query, add context and instructions
# 6. Construct prompt - reuse user's query, add context and instructions
prompt = (
f"{query}\n\n"
f"Here are relevant documents from Nextcloud (notes, calendar events, deck cards, files, contacts):\n\n"
@@ -361,7 +410,6 @@ def configure_semantic_tools(mcp: FastMCP):
)
# 6. Request LLM completion via MCP sampling with timeout
import anyio
try:
with anyio.fail_after(30):
@@ -401,8 +449,8 @@ def configure_semantic_tools(mcp: FastMCP):
return SamplingSearchResponse(
query=query,
generated_answer=generated_answer,
sources=search_response.results,
total_found=search_response.total_found,
sources=accessible_results,
total_found=len(accessible_results),
search_method="semantic_sampling",
model_used=sampling_result.model,
stop_reason=sampling_result.stopReason,
@@ -419,11 +467,11 @@ def configure_semantic_tools(mcp: FastMCP):
generated_answer=(
f"[Sampling request timed out]\n\n"
f"The answer generation took too long (>30s). "
f"Found {search_response.total_found} relevant documents. "
f"Found {len(accessible_results)} relevant documents. "
f"Please review the sources below or try a simpler query."
),
sources=search_response.results,
total_found=search_response.total_found,
sources=accessible_results,
total_found=len(accessible_results),
search_method="semantic_sampling_timeout",
success=True,
)
@@ -454,11 +502,11 @@ def configure_semantic_tools(mcp: FastMCP):
query=query,
generated_answer=(
f"[{user_message}]\n\n"
f"Found {search_response.total_found} relevant documents. "
f"Found {len(accessible_results)} relevant documents. "
f"Please review the sources below."
),
sources=search_response.results,
total_found=search_response.total_found,
sources=accessible_results,
total_found=len(accessible_results),
search_method=search_method,
success=True,
)
@@ -475,17 +523,18 @@ def configure_semantic_tools(mcp: FastMCP):
query=query,
generated_answer=(
f"[Unexpected error during sampling]\n\n"
f"Found {search_response.total_found} relevant documents. "
f"Found {len(accessible_results)} relevant documents. "
f"Please review the sources below."
),
sources=search_response.results,
total_found=search_response.total_found,
sources=accessible_results,
total_found=len(accessible_results),
search_method="semantic_sampling_error",
success=True,
)
@mcp.tool()
@require_scopes("semantic:read")
@instrument_tool
async def nc_get_vector_sync_status(ctx: Context) -> VectorSyncStatusResponse:
"""Get the current vector sync status.
+6
View File
@@ -6,6 +6,7 @@ from mcp.server.fastmcp import Context, FastMCP
from nextcloud_mcp_server.auth import require_scopes
from nextcloud_mcp_server.context import get_client
from nextcloud_mcp_server.observability.metrics import instrument_tool
def configure_sharing_tools(mcp: FastMCP):
@@ -17,6 +18,7 @@ def configure_sharing_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("sharing:write")
@instrument_tool
async def nc_share_create(
path: str,
share_with: str,
@@ -56,6 +58,7 @@ def configure_sharing_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("sharing:write")
@instrument_tool
async def nc_share_delete(share_id: int, ctx: Context) -> str:
"""Delete a share by its ID.
@@ -75,6 +78,7 @@ def configure_sharing_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("sharing:write")
@instrument_tool
async def nc_share_get(share_id: int, ctx: Context) -> str:
"""Get information about a specific share.
@@ -93,6 +97,7 @@ def configure_sharing_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("sharing:write")
@instrument_tool
async def nc_share_list(
ctx: Context, path: str | None = None, shared_with_me: bool = False
) -> str:
@@ -114,6 +119,7 @@ def configure_sharing_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("sharing:write")
@instrument_tool
async def nc_share_update(share_id: int, permissions: int, ctx: Context) -> str:
"""Update the permissions of an existing share.
+7
View File
@@ -4,6 +4,7 @@ from mcp.server.fastmcp import Context, FastMCP
from nextcloud_mcp_server.auth import require_scopes
from nextcloud_mcp_server.context import get_client
from nextcloud_mcp_server.observability.metrics import instrument_tool
logger = logging.getLogger(__name__)
@@ -12,6 +13,7 @@ def configure_tables_tools(mcp: FastMCP):
# Tables tools
@mcp.tool()
@require_scopes("tables:read")
@instrument_tool
async def nc_tables_list_tables(ctx: Context):
"""List all tables available to the user"""
client = await get_client(ctx)
@@ -19,6 +21,7 @@ def configure_tables_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("tables:read")
@instrument_tool
async def nc_tables_get_schema(table_id: int, ctx: Context):
"""Get the schema/structure of a specific table including columns and views"""
client = await get_client(ctx)
@@ -26,6 +29,7 @@ def configure_tables_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("tables:read")
@instrument_tool
async def nc_tables_read_table(
table_id: int,
ctx: Context,
@@ -38,6 +42,7 @@ def configure_tables_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("tables:write")
@instrument_tool
async def nc_tables_insert_row(table_id: int, data: dict, ctx: Context):
"""Insert a new row into a table.
@@ -48,6 +53,7 @@ def configure_tables_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("tables:write")
@instrument_tool
async def nc_tables_update_row(row_id: int, data: dict, ctx: Context):
"""Update an existing row in a table.
@@ -58,6 +64,7 @@ def configure_tables_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("tables:write")
@instrument_tool
async def nc_tables_delete_row(row_id: int, ctx: Context):
"""Delete a row from a table"""
client = await get_client(ctx)
+12
View File
@@ -5,6 +5,7 @@ from mcp.server.fastmcp import Context, FastMCP
from nextcloud_mcp_server.auth import require_scopes
from nextcloud_mcp_server.context import get_client
from nextcloud_mcp_server.models import DirectoryListing, FileInfo, SearchFilesResponse
from nextcloud_mcp_server.observability.metrics import instrument_tool
from nextcloud_mcp_server.utils.document_parser import (
is_parseable_document,
parse_document,
@@ -17,6 +18,7 @@ def configure_webdav_tools(mcp: FastMCP):
# WebDAV file system tools
@mcp.tool()
@require_scopes("files:read")
@instrument_tool
async def nc_webdav_list_directory(
ctx: Context, path: str = ""
) -> DirectoryListing:
@@ -50,6 +52,7 @@ def configure_webdav_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("files:read")
@instrument_tool
async def nc_webdav_read_file(path: str, ctx: Context):
"""Read the content of a file from NextCloud.
@@ -130,6 +133,7 @@ def configure_webdav_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("files:write")
@instrument_tool
async def nc_webdav_write_file(
path: str, content: str, ctx: Context, content_type: str | None = None
):
@@ -158,6 +162,7 @@ def configure_webdav_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("files:write")
@instrument_tool
async def nc_webdav_create_directory(path: str, ctx: Context):
"""Create a directory in NextCloud.
@@ -172,6 +177,7 @@ def configure_webdav_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("files:write")
@instrument_tool
async def nc_webdav_delete_resource(path: str, ctx: Context):
"""Delete a file or directory in NextCloud.
@@ -186,6 +192,7 @@ def configure_webdav_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("files:write")
@instrument_tool
async def nc_webdav_move_resource(
source_path: str, destination_path: str, ctx: Context, overwrite: bool = False
):
@@ -206,6 +213,7 @@ def configure_webdav_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("files:write")
@instrument_tool
async def nc_webdav_copy_resource(
source_path: str, destination_path: str, ctx: Context, overwrite: bool = False
):
@@ -226,6 +234,7 @@ def configure_webdav_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("files:read")
@instrument_tool
async def nc_webdav_search_files(
ctx: Context,
scope: str = "",
@@ -342,6 +351,7 @@ def configure_webdav_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("files:read")
@instrument_tool
async def nc_webdav_find_by_name(
pattern: str, ctx: Context, scope: str = "", limit: int | None = None
) -> SearchFilesResponse:
@@ -369,6 +379,7 @@ def configure_webdav_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("files:read")
@instrument_tool
async def nc_webdav_find_by_type(
mime_type: str, ctx: Context, scope: str = "", limit: int | None = None
) -> SearchFilesResponse:
@@ -396,6 +407,7 @@ def configure_webdav_tools(mcp: FastMCP):
@mcp.tool()
@require_scopes("files:read")
@instrument_tool
async def nc_webdav_list_favorites(
ctx: Context, scope: str = "", limit: int | None = None
) -> SearchFilesResponse:
+140
View File
@@ -0,0 +1,140 @@
"""Custom PCA implementation for dimensionality reduction.
Implements Principal Component Analysis without scikit-learn dependency.
Used for reducing high-dimensional embeddings (768-dim) to 2D for visualization.
"""
import logging
import numpy as np
logger = logging.getLogger(__name__)
class PCA:
"""Principal Component Analysis for dimensionality reduction.
Simple implementation that finds principal components via eigendecomposition
of the covariance matrix. Suitable for small-to-medium datasets.
Attributes:
n_components: Number of principal components to keep
mean_: Mean of training data (set during fit)
components_: Principal components (eigenvectors)
explained_variance_: Variance explained by each component
explained_variance_ratio_: Fraction of total variance explained
"""
def __init__(self, n_components: int = 2):
"""Initialize PCA.
Args:
n_components: Number of components to keep (default: 2)
"""
if n_components < 1:
raise ValueError(f"n_components must be >= 1, got {n_components}")
self.n_components = n_components
self.mean_: np.ndarray | None = None
self.components_: np.ndarray | None = None
self.explained_variance_: np.ndarray | None = None
self.explained_variance_ratio_: np.ndarray | None = None
def fit(self, X: np.ndarray) -> "PCA":
"""Fit PCA model to data.
Args:
X: Training data of shape (n_samples, n_features)
Returns:
self (for method chaining)
Raises:
ValueError: If X has fewer features than n_components
"""
X = np.asarray(X)
if X.ndim != 2:
raise ValueError(f"X must be 2D array, got shape {X.shape}")
n_samples, n_features = X.shape
if n_features < self.n_components:
raise ValueError(
f"n_components={self.n_components} > n_features={n_features}"
)
# Center data
self.mean_ = np.mean(X, axis=0)
X_centered = X - self.mean_
# Compute covariance matrix
# Use (X^T X) / (n-1) for numerical stability with high-dim data
cov = np.cov(X_centered.T)
# Eigendecomposition
eigenvalues, eigenvectors = np.linalg.eigh(cov)
# Sort by eigenvalue (descending)
idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]
# Keep top n_components
self.components_ = eigenvectors[:, : self.n_components].T
self.explained_variance_ = eigenvalues[: self.n_components]
# Calculate explained variance ratio
total_variance = np.sum(eigenvalues)
if total_variance > 0:
self.explained_variance_ratio_ = self.explained_variance_ / total_variance
else:
self.explained_variance_ratio_ = np.zeros(self.n_components)
logger.debug(
f"PCA fit: {n_samples} samples, {n_features} features → "
f"{self.n_components} components, "
f"explained variance: {self.explained_variance_ratio_}"
)
return self
def transform(self, X: np.ndarray) -> np.ndarray:
"""Transform data to principal component space.
Args:
X: Data to transform of shape (n_samples, n_features)
Returns:
Transformed data of shape (n_samples, n_components)
Raises:
ValueError: If PCA not fitted yet
"""
if self.mean_ is None or self.components_ is None:
raise ValueError("PCA not fitted yet. Call fit() first.")
X = np.asarray(X)
if X.ndim != 2:
raise ValueError(f"X must be 2D array, got shape {X.shape}")
# Center using training mean
X_centered = X - self.mean_
# Project onto principal components
X_transformed = np.dot(X_centered, self.components_.T)
return X_transformed
def fit_transform(self, X: np.ndarray) -> np.ndarray:
"""Fit PCA model and transform data in one step.
Args:
X: Training data of shape (n_samples, n_features)
Returns:
Transformed data of shape (n_samples, n_components)
"""
self.fit(X)
return self.transform(X)
+108 -54
View File
@@ -8,13 +8,19 @@ import time
import uuid
import anyio
from anyio.abc import TaskStatus
from anyio.streams.memory import MemoryObjectReceiveStream
from httpx import HTTPStatusError
from qdrant_client.models import FieldCondition, Filter, MatchValue, PointStruct
from nextcloud_mcp_server.client import NextcloudClient
from nextcloud_mcp_server.config import get_settings
from nextcloud_mcp_server.embedding import get_embedding_service
from nextcloud_mcp_server.embedding import get_bm25_service, get_embedding_service
from nextcloud_mcp_server.observability.metrics import (
record_qdrant_operation,
record_vector_sync_processing,
update_vector_sync_queue_size,
)
from nextcloud_mcp_server.observability.tracing import trace_operation
from nextcloud_mcp_server.vector.document_chunker import DocumentChunker
from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
@@ -29,6 +35,8 @@ async def processor_task(
shutdown_event: anyio.Event,
nc_client: NextcloudClient,
user_id: str,
*,
task_status: TaskStatus = anyio.TASK_STATUS_IGNORED,
):
"""
Process documents from stream concurrently.
@@ -48,20 +56,34 @@ async def processor_task(
shutdown_event: Event signaling shutdown
nc_client: Authenticated Nextcloud client
user_id: User being processed
task_status: Status object for signaling task readiness
"""
logger.info(f"Processor {worker_id} started")
# Signal that the task has started and is ready
task_status.started()
while not shutdown_event.is_set():
try:
# Get document with timeout (allows checking shutdown)
with anyio.fail_after(1.0):
doc_task = await receive_stream.receive()
# Update queue size metric after receiving
stream_stats = receive_stream.statistics()
update_vector_sync_queue_size(stream_stats.current_buffer_used)
# Process document
await process_document(doc_task, nc_client)
# Update queue size metric after processing
stream_stats = receive_stream.statistics()
update_vector_sync_queue_size(stream_stats.current_buffer_used)
except TimeoutError:
# No documents available, continue
# No documents available, update metric to show empty queue
stream_stats = receive_stream.statistics()
update_vector_sync_queue_size(stream_stats.current_buffer_used)
continue
except anyio.EndOfStream:
@@ -90,6 +112,8 @@ async def process_document(doc_task: DocumentTask, nc_client: NextcloudClient):
doc_task: Document task to process
nc_client: Authenticated Nextcloud client
"""
start_time = time.time()
logger.debug(
f"Processing {doc_task.doc_type}_{doc_task.doc_id} "
f"for {doc_task.user_id} ({doc_task.operation})"
@@ -105,58 +129,79 @@ async def process_document(doc_task: DocumentTask, nc_client: NextcloudClient):
"vector_sync.doc_operation": doc_task.operation,
},
):
qdrant_client = await get_qdrant_client()
settings = get_settings()
try:
qdrant_client = await get_qdrant_client()
settings = get_settings()
# Handle deletion
if doc_task.operation == "delete":
await qdrant_client.delete(
collection_name=settings.get_collection_name(),
points_selector=Filter(
must=[
FieldCondition(
key="user_id",
match=MatchValue(value=doc_task.user_id),
),
FieldCondition(
key="doc_id",
match=MatchValue(value=doc_task.doc_id),
),
FieldCondition(
key="doc_type",
match=MatchValue(value=doc_task.doc_type),
),
]
),
)
logger.info(
f"Deleted {doc_task.doc_type}_{doc_task.doc_id} for {doc_task.user_id}"
)
return
# Handle deletion
if doc_task.operation == "delete":
await qdrant_client.delete(
collection_name=settings.get_collection_name(),
points_selector=Filter(
must=[
FieldCondition(
key="user_id",
match=MatchValue(value=doc_task.user_id),
),
FieldCondition(
key="doc_id",
match=MatchValue(value=doc_task.doc_id),
),
FieldCondition(
key="doc_type",
match=MatchValue(value=doc_task.doc_type),
),
]
),
)
logger.info(
f"Deleted {doc_task.doc_type}_{doc_task.doc_id} for {doc_task.user_id}"
)
# Handle indexing with retry
max_retries = 3
retry_delay = 1.0
# Record successful deletion metrics
duration = time.time() - start_time
record_qdrant_operation("delete", "success")
record_vector_sync_processing(duration, "success")
return
for attempt in range(max_retries):
try:
await _index_document(doc_task, nc_client, qdrant_client)
return # Success
# Handle indexing with retry
max_retries = 3
retry_delay = 1.0
except (HTTPStatusError, Exception) as e:
if attempt < max_retries - 1:
logger.warning(
f"Retry {attempt + 1}/{max_retries} for "
f"{doc_task.doc_type}_{doc_task.doc_id}: {e}"
)
await anyio.sleep(retry_delay)
retry_delay *= 2 # Exponential backoff
else:
logger.error(
f"Failed to index {doc_task.doc_type}_{doc_task.doc_id} "
f"after {max_retries} retries: {e}"
)
raise
for attempt in range(max_retries):
try:
await _index_document(doc_task, nc_client, qdrant_client)
# Record successful processing metrics
duration = time.time() - start_time
record_qdrant_operation("upsert", "success")
record_vector_sync_processing(duration, "success")
return # Success
except (HTTPStatusError, Exception) as e:
if attempt < max_retries - 1:
logger.warning(
f"Retry {attempt + 1}/{max_retries} for "
f"{doc_task.doc_type}_{doc_task.doc_id}: {e}"
)
await anyio.sleep(retry_delay)
retry_delay *= 2 # Exponential backoff
else:
logger.error(
f"Failed to index {doc_task.doc_type}_{doc_task.doc_id} "
f"after {max_retries} retries: {e}"
)
# Record failed processing metrics
duration = time.time() - start_time
record_qdrant_operation("upsert", "error")
record_vector_sync_processing(duration, "error")
raise
except Exception:
# Catch any other unexpected errors
duration = time.time() - start_time
record_vector_sync_processing(duration, "error")
raise
async def _index_document(
@@ -188,15 +233,21 @@ async def _index_document(
)
chunks = chunker.chunk_text(content)
# Generate embeddings (I/O bound - external API call)
# Generate dense embeddings (I/O bound - external API call)
embedding_service = get_embedding_service()
embeddings = await embedding_service.embed_batch(chunks)
dense_embeddings = await embedding_service.embed_batch(chunks)
# Generate sparse embeddings (BM25 for keyword matching)
bm25_service = get_bm25_service()
sparse_embeddings = bm25_service.encode_batch(chunks)
# Prepare Qdrant points
indexed_at = int(time.time())
points = []
for i, (chunk, embedding) in enumerate(zip(chunks, embeddings)):
for i, (chunk, dense_emb, sparse_emb) in enumerate(
zip(chunks, dense_embeddings, sparse_embeddings)
):
# Generate deterministic UUID for point ID
# Using uuid5 with DNS namespace and combining doc info
point_name = f"{doc_task.doc_type}:{doc_task.doc_id}:chunk:{i}"
@@ -205,7 +256,10 @@ async def _index_document(
points.append(
PointStruct(
id=point_id,
vector=embedding,
vector={
"dense": dense_emb,
"sparse": sparse_emb,
},
payload={
"user_id": doc_task.user_id,
"doc_id": doc_task.doc_id,
+24 -9
View File
@@ -2,7 +2,7 @@
import logging
from qdrant_client import AsyncQdrantClient
from qdrant_client import AsyncQdrantClient, models
from qdrant_client.models import Distance, VectorParams
from nextcloud_mcp_server.config import get_settings
@@ -84,7 +84,12 @@ async def get_qdrant_client() -> AsyncQdrantClient:
f"Collection '{collection_name}' found, validating dimensions..."
)
collection_info = await _qdrant_client.get_collection(collection_name)
actual_dimension = collection_info.config.params.vectors.size
# Handle both named vectors (dict) and legacy single vector
vectors = collection_info.config.params.vectors
if isinstance(vectors, dict):
actual_dimension = vectors["dense"].size
else:
actual_dimension = vectors.size
# Validate dimension matches
if actual_dimension != expected_dimension:
@@ -112,17 +117,27 @@ async def get_qdrant_client() -> AsyncQdrantClient:
)
await _qdrant_client.create_collection(
collection_name=collection_name,
vectors_config=VectorParams(
size=expected_dimension,
distance=Distance.COSINE,
),
vectors_config={
"dense": VectorParams(
size=expected_dimension,
distance=Distance.COSINE,
),
},
sparse_vectors_config={
"sparse": models.SparseVectorParams(
index=models.SparseIndexParams(
on_disk=False,
)
),
},
)
logger.info(
f"Created Qdrant collection: {collection_name}\n"
f" Dimension: {expected_dimension}\n"
f" Model: {settings.ollama_embedding_model}\n"
f" Dense vector dimension: {expected_dimension}\n"
f" Dense embedding model: {settings.ollama_embedding_model}\n"
f" Sparse vectors: BM25 (for hybrid search)\n"
f" Distance: COSINE\n"
f"Background sync will index all documents with this embedding model."
f"Background sync will index all documents with dense + sparse vectors."
)
return _qdrant_client
+69 -56
View File
@@ -8,11 +8,13 @@ import time
from dataclasses import dataclass
import anyio
from anyio.abc import TaskStatus
from anyio.streams.memory import MemoryObjectSendStream
from qdrant_client.models import FieldCondition, Filter, MatchValue
from nextcloud_mcp_server.client import NextcloudClient
from nextcloud_mcp_server.config import get_settings
from nextcloud_mcp_server.observability.metrics import record_vector_sync_scan
from nextcloud_mcp_server.observability.tracing import trace_operation
from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
@@ -92,6 +94,8 @@ async def scanner_task(
wake_event: anyio.Event,
nc_client: NextcloudClient,
user_id: str,
*,
task_status: TaskStatus = anyio.TASK_STATUS_IGNORED,
):
"""
Periodic scanner that detects changed documents for enabled user.
@@ -104,10 +108,14 @@ async def scanner_task(
wake_event: Event to trigger immediate scan
nc_client: Authenticated Nextcloud client
user_id: User to scan
task_status: Status object for signaling task readiness
"""
logger.info(f"Scanner task started for user: {user_id}")
settings = get_settings()
# Signal that the task has started and is ready
task_status.started()
async with send_stream:
while not shutdown_event.is_set():
try:
@@ -174,70 +182,43 @@ async def scan_user_documents(
f"[SCAN-{scan_id}] Using pruneBefore={prune_before} to optimize data transfer"
)
# Fetch all notes from Nextcloud
notes = [
note
async for note in nc_client.notes.get_all_notes(prune_before=prune_before)
]
logger.info(f"[SCAN-{scan_id}] Found {len(notes)} notes for {user_id}")
# Get indexed state from Qdrant first (for incremental sync)
indexed_docs = {}
if not initial_sync:
qdrant_client = await get_qdrant_client()
scroll_result = await qdrant_client.scroll(
collection_name=get_settings().get_collection_name(),
scroll_filter=Filter(
must=[
FieldCondition(key="user_id", match=MatchValue(value=user_id)),
FieldCondition(key="doc_type", match=MatchValue(value="note")),
]
),
with_payload=["doc_id", "indexed_at"],
with_vectors=False,
limit=10000,
)
if initial_sync:
# Send everything on first sync
for note in notes:
modified_at = note.get("modified", 0)
await send_stream.send(
DocumentTask(
user_id=user_id,
doc_id=str(note["id"]),
doc_type="note",
operation="index",
modified_at=modified_at,
)
)
logger.info(f"Sent {len(notes)} documents for initial sync: {user_id}")
return
indexed_docs = {
point.payload["doc_id"]: point.payload["indexed_at"]
for point in scroll_result[0]
}
# Get indexed state from Qdrant
qdrant_client = await get_qdrant_client()
scroll_result = await qdrant_client.scroll(
collection_name=get_settings().get_collection_name(),
scroll_filter=Filter(
must=[
FieldCondition(key="user_id", match=MatchValue(value=user_id)),
FieldCondition(key="doc_type", match=MatchValue(value="note")),
]
),
with_payload=["doc_id", "indexed_at"],
with_vectors=False,
limit=10000,
)
logger.debug(f"Found {len(indexed_docs)} indexed documents in Qdrant")
indexed_docs = {
point.payload["doc_id"]: point.payload["indexed_at"]
for point in scroll_result[0]
}
logger.debug(f"Found {len(indexed_docs)} indexed documents in Qdrant")
# Compare and queue changes
# Stream notes from Nextcloud and process immediately
note_count = 0
queued = 0
nextcloud_doc_ids = {str(note["id"]) for note in notes}
nextcloud_doc_ids = set()
for note in notes:
async for note in nc_client.notes.get_all_notes(prune_before=prune_before):
note_count += 1
doc_id = str(note["id"])
indexed_at = indexed_docs.get(doc_id)
nextcloud_doc_ids.add(doc_id)
modified_at = note.get("modified", 0)
# If document reappeared, remove from potentially_deleted
doc_key = (user_id, doc_id)
if doc_key in _potentially_deleted:
logger.debug(
f"Document {doc_id} reappeared, removing from deletion grace period"
)
del _potentially_deleted[doc_key]
# Send if never indexed or modified since last index
if indexed_at is None or modified_at > indexed_at:
if initial_sync:
# Send everything on first sync
await send_stream.send(
DocumentTask(
user_id=user_id,
@@ -248,6 +229,38 @@ async def scan_user_documents(
)
)
queued += 1
else:
# Incremental sync: compare with indexed state
indexed_at = indexed_docs.get(doc_id)
# If document reappeared, remove from potentially_deleted
doc_key = (user_id, doc_id)
if doc_key in _potentially_deleted:
logger.debug(
f"Document {doc_id} reappeared, removing from deletion grace period"
)
del _potentially_deleted[doc_key]
# Send if never indexed or modified since last index
if indexed_at is None or modified_at > indexed_at:
await send_stream.send(
DocumentTask(
user_id=user_id,
doc_id=doc_id,
doc_type="note",
operation="index",
modified_at=modified_at,
)
)
queued += 1
# Log and record metrics after streaming
logger.info(f"[SCAN-{scan_id}] Found {note_count} notes for {user_id}")
record_vector_sync_scan(note_count)
if initial_sync:
logger.info(f"Sent {queued} documents for initial sync: {user_id}")
return
# Check for deleted documents (in Qdrant but not in Nextcloud)
# Use grace period: only delete after 2 consecutive scans confirm absence
+6 -2
View File
@@ -1,6 +1,6 @@
[project]
name = "nextcloud-mcp-server"
version = "0.32.1"
version = "0.39.0"
description = "Model Context Protocol (MCP) server for Nextcloud integration - enables AI assistants to interact with Nextcloud data"
authors = [
{name = "Chris Coutinho", email = "chris@coutinho.io"}
@@ -12,7 +12,7 @@ keywords = ["nextcloud", "mcp", "model-context-protocol", "llm", "ai", "claude",
dependencies = [
"mcp[cli] (>=1.21,<1.22)",
"httpx (>=0.28.1,<0.29.0)",
"pillow (>=12.0.0,<12.1.0)",
"pillow (>=10.3.0,<12.0.0)", # Compatible with fastembed
"icalendar (>=6.0.0,<7.0.0)",
"pythonvcard4>=0.2.0",
"pydantic>=2.11.4",
@@ -22,6 +22,7 @@ dependencies = [
"aiosqlite>=0.20.0", # Async SQLite for refresh token storage
"authlib>=1.6.5",
"qdrant-client>=1.7.0",
"fastembed>=0.4.2", # BM25 sparse vector embeddings for hybrid search
# Observability dependencies
"prometheus-client>=0.21.0", # Prometheus metrics
"opentelemetry-api>=1.28.2", # OpenTelemetry API
@@ -102,7 +103,10 @@ module-root = ""
[dependency-groups]
dev = [
"anthropic>=0.42.0", # For RAG evaluation with Anthropic LLMs
"boto3>=1.35.0", # For Amazon Bedrock provider (optional)
"commitizen>=4.8.2",
"datasets>=3.3.0", # For BeIR nfcorpus dataset loading
"ipython>=9.2.0",
"playwright>=1.49.1",
"pytest>=8.3.5",
-307
View File
@@ -1,307 +0,0 @@
#!/usr/bin/env python3
"""Script to automatically add @require_scopes decorators to MCP tools.
This script parses server module files and adds appropriate scope decorators
based on the operation type (read vs write).
Usage:
python scripts/add_scope_decorators.py [--dry-run] [--file FILE]
"""
import argparse
import ast
import re
from pathlib import Path
from typing import List, Tuple
# Operation patterns for classification
READ_PATTERNS = [
r".*_get_.*",
r".*_get$",
r".*_list_.*",
r".*_list$",
r".*_search_.*",
r".*_search$",
r".*_read_.*",
r".*_read$",
r".*_find_.*",
r".*_find$",
r".*_fetch_.*",
r".*_fetch$",
r".*_retrieve_.*",
r".*_retrieve$",
]
WRITE_PATTERNS = [
r".*_create_.*",
r".*_create$",
r".*_update_.*",
r".*_update$",
r".*_delete_.*",
r".*_delete$",
r".*_append_.*",
r".*_append$",
r".*_modify_.*",
r".*_modify$",
r".*_set_.*",
r".*_set$",
r".*_add_.*",
r".*_add$",
r".*_remove_.*",
r".*_remove$",
r".*_edit_.*",
r".*_edit$",
r".*_move_.*",
r".*_move$",
r".*_copy_.*",
r".*_copy$",
r".*_upload_.*",
r".*_upload$",
r".*_download_.*",
r".*_download$",
r".*_share_.*",
r".*_share$",
r".*_unshare_.*",
r".*_unshare$",
r".*_bulk_.*", # Bulk operations are typically writes
]
def classify_operation(func_name: str) -> str | None:
"""Classify a function as read or write operation.
Args:
func_name: Function name to classify
Returns:
"nc:read", "nc:write", or None if cannot classify
"""
# Check write patterns first (more specific)
for pattern in WRITE_PATTERNS:
if re.match(pattern, func_name):
return "nc:write"
# Check read patterns
for pattern in READ_PATTERNS:
if re.match(pattern, func_name):
return "nc:read"
return None
def has_scope_decorator(decorators: List[ast.expr]) -> bool:
"""Check if function already has @require_scopes decorator."""
for decorator in decorators:
if isinstance(decorator, ast.Call):
if (
isinstance(decorator.func, ast.Name)
and decorator.func.id == "require_scopes"
):
return True
elif isinstance(decorator, ast.Name) and decorator.name == "require_scopes":
return True
return False
def has_mcp_tool_decorator(decorators: List[ast.expr]) -> bool:
"""Check if function has @mcp.tool() decorator."""
for decorator in decorators:
if isinstance(decorator, ast.Call):
if isinstance(decorator.func, ast.Attribute):
if decorator.func.attr == "tool":
return True
return False
def find_tools_needing_decorators(
file_path: Path, verbose: bool = False
) -> List[Tuple[str, int, str]]:
"""Find all tools that need scope decorators.
Returns:
List of (function_name, line_number, required_scope)
"""
with open(file_path) as f:
content = f.read()
try:
tree = ast.parse(content)
except SyntaxError as e:
print(f" ⚠️ Syntax error in {file_path}: {e}")
return []
tools_to_update = []
total_functions = 0
mcp_tools = 0
already_has_scope = 0
cannot_classify = 0
for node in ast.walk(tree):
if isinstance(node, ast.FunctionDef):
total_functions += 1
if verbose and node.decorator_list:
decorators_str = [
ast.unparse(d) if hasattr(ast, "unparse") else str(d)
for d in node.decorator_list
]
print(f" Function {node.name} has decorators: {decorators_str}")
# Check if it's an MCP tool
if not has_mcp_tool_decorator(node.decorator_list):
continue
mcp_tools += 1
# Check if it already has scope decorator
if has_scope_decorator(node.decorator_list):
already_has_scope += 1
continue
# Classify operation
scope = classify_operation(node.name)
if scope:
tools_to_update.append((node.name, node.lineno, scope))
else:
cannot_classify += 1
if verbose:
print(f" ⚠️ Cannot classify: {node.name}")
if verbose:
print(
f" Debug: total_functions={total_functions}, mcp_tools={mcp_tools}, already_has_scope={already_has_scope}, cannot_classify={cannot_classify}"
)
return tools_to_update
def add_decorator_to_file(
file_path: Path, dry_run: bool = False, verbose: bool = False
) -> int:
"""Add @require_scopes decorators to tools in a file.
Returns:
Number of decorators added
"""
tools = find_tools_needing_decorators(file_path, verbose=verbose)
if not tools:
return 0
print(f"\n📝 {file_path.relative_to(Path.cwd())}")
with open(file_path) as f:
lines = f.readlines()
# Check if require_scopes is already imported
has_import = False
import_line_idx = None
for i, line in enumerate(lines):
if "from nextcloud_mcp_server.auth import" in line and "require_scopes" in line:
has_import = True
break
elif "from nextcloud_mcp_server.auth import" in line:
import_line_idx = i
# Add import if needed
if not has_import:
if import_line_idx is not None:
# Add require_scopes to existing import
old_line = lines[import_line_idx]
if "(" in old_line:
# Multi-line import
print(
" ⚠️ Multi-line import detected, please add manually: from nextcloud_mcp_server.auth import require_scopes"
)
else:
# Single line import - add require_scopes
lines[import_line_idx] = (
old_line.rstrip().rstrip(")").rstrip() + ", require_scopes)\n"
)
print(" ✓ Added require_scopes to import")
else:
# No auth import exists, add new import
# Find first import line
for i, line in enumerate(lines):
if line.startswith("from nextcloud_mcp_server"):
lines.insert(
i, "from nextcloud_mcp_server.auth import require_scopes\n"
)
print(
" ✓ Added import: from nextcloud_mcp_server.auth import require_scopes"
)
break
# Add decorators to tools (in reverse order to preserve line numbers)
for func_name, line_num, scope in reversed(tools):
# Find the @mcp.tool() decorator line
for i in range(line_num - 1, max(0, line_num - 10), -1):
if "@mcp.tool()" in lines[i]:
# Get indentation from @mcp.tool() line
indent = len(lines[i]) - len(lines[i].lstrip())
decorator_line = " " * indent + f'@require_scopes("{scope}")\n'
lines.insert(i + 1, decorator_line)
print(f'{func_name}:{line_num} → @require_scopes("{scope}")')
break
if not dry_run:
with open(file_path, "w") as f:
f.writelines(lines)
print(" 💾 Saved changes")
else:
print(" 🔍 DRY RUN - no changes written")
return len(tools)
def main():
parser = argparse.ArgumentParser(
description="Add @require_scopes decorators to MCP tools"
)
parser.add_argument(
"--dry-run",
action="store_true",
help="Show what would be changed without modifying files",
)
parser.add_argument(
"--file",
type=Path,
help="Process a single file instead of all server modules",
)
parser.add_argument(
"--verbose",
"-v",
action="store_true",
help="Show debug information",
)
args = parser.parse_args()
server_dir = Path(__file__).parent.parent / "nextcloud_mcp_server" / "server"
if args.file:
files = [args.file]
else:
files = sorted(server_dir.glob("*.py"))
files = [f for f in files if f.name != "__init__.py"]
print("🔍 Scanning for tools needing scope decorators...")
print(
f" {'DRY RUN MODE - No changes will be made' if args.dry_run else 'LIVE MODE - Files will be modified'}"
)
total_added = 0
for file_path in files:
added = add_decorator_to_file(
file_path, dry_run=args.dry_run, verbose=args.verbose
)
total_added += added
print(f"\n{'📊 Summary (dry run)' if args.dry_run else '✅ Complete'}")
print(f" Total decorators added: {total_added}")
if args.dry_run:
print("\n💡 Run without --dry-run to apply changes")
if __name__ == "__main__":
main()
-232
View File
@@ -1,232 +0,0 @@
#!/usr/bin/env python3
"""Simpler script to add @require_scopes decorators using regex.
This script uses regex patterns to find @mcp.tool() decorators and adds
the appropriate @require_scopes decorator based on function name patterns.
Usage:
python scripts/add_scope_decorators_simple.py [--dry-run]
"""
import argparse
import re
from pathlib import Path
# Operation patterns for classification
READ_KEYWORDS = [
"get",
"list",
"search",
"read",
"find",
"fetch",
"retrieve",
"upcoming",
]
WRITE_KEYWORDS = [
"create",
"update",
"delete",
"append",
"modify",
"set",
"add",
"remove",
"edit",
"move",
"copy",
"upload",
"download",
"share",
"unshare",
"bulk",
"manage",
"import",
"reindex",
"archive",
"unarchive",
"reorder",
"assign",
"unassign",
"insert",
"write",
]
def classify_function(func_name: str) -> str | None:
"""Classify a function name as read or write operation."""
func_lower = func_name.lower()
# Check write keywords first (more specific)
for keyword in WRITE_KEYWORDS:
if f"_{keyword}_" in func_lower or func_lower.endswith(f"_{keyword}"):
return "nc:write"
# Check read keywords
for keyword in READ_KEYWORDS:
if f"_{keyword}_" in func_lower or func_lower.endswith(f"_{keyword}"):
return "nc:read"
return None
def process_file(file_path: Path, dry_run: bool = False) -> int:
"""Process a single file to add @require_scopes decorators.
Returns:
Number of decorators added
"""
with open(file_path) as f:
lines = f.readlines()
# Check if require_scopes is already imported
has_import = False
import_line_idx = None
for i, line in enumerate(lines):
if "from nextcloud_mcp_server.auth import" in line:
if "require_scopes" in line:
has_import = True
else:
import_line_idx = i
modified = False
decorators_added = 0
# Find all @mcp.tool() decorators
i = 0
while i < len(lines):
line = lines[i]
# Look for @mcp.tool() decorator
if re.match(r"\s*@mcp\.tool\(\)", line):
# Check if next line already has @require_scopes
if i + 1 < len(lines) and "@require_scopes" in lines[i + 1]:
i += 1
continue
# Find the function definition (should be on next line or after other decorators)
func_line_idx = i + 1
while func_line_idx < len(lines) and not lines[
func_line_idx
].strip().startswith("async def"):
func_line_idx += 1
if func_line_idx >= len(lines):
i += 1
continue
# Extract function name
func_match = re.match(r"\s*async def (\w+)\(", lines[func_line_idx])
if not func_match:
i += 1
continue
func_name = func_match.group(1)
scope = classify_function(func_name)
if scope:
# Get indentation from @mcp.tool() line
indent = len(line) - len(line.lstrip())
decorator_line = " " * indent + f'@require_scopes("{scope}")\n'
# Insert after @mcp.tool()
lines.insert(i + 1, decorator_line)
decorators_added += 1
modified = True
print(f'{func_name} → @require_scopes("{scope}")')
else:
print(f" ⚠️ Cannot classify: {func_name}")
i += 1
# Add import if needed and decorators were added
if decorators_added > 0 and not has_import:
if import_line_idx is not None:
# Add to existing import
old_line = lines[import_line_idx]
if old_line.rstrip().endswith(")"):
lines[import_line_idx] = old_line.rstrip()[:-1] + ", require_scopes)\n"
else:
lines[import_line_idx] = old_line.rstrip() + ", require_scopes\n"
print(" ✓ Added require_scopes to existing import")
modified = True
else:
# No auth import exists, add new import after last 'from nextcloud_mcp_server' import
last_nc_import_idx = None
for i, line in enumerate(lines):
if line.startswith("from nextcloud_mcp_server"):
last_nc_import_idx = i
if last_nc_import_idx is not None:
lines.insert(
last_nc_import_idx + 1,
"from nextcloud_mcp_server.auth import require_scopes\n",
)
print(
" ✓ Added new import: from nextcloud_mcp_server.auth import require_scopes"
)
modified = True
else:
print(" ⚠️ Could not find place to add require_scopes import")
# Write changes
if modified and not dry_run:
with open(file_path, "w") as f:
f.writelines(lines)
print(f" 💾 Saved changes to {file_path.name}")
elif dry_run and decorators_added > 0:
print(f" 🔍 DRY RUN - would add {decorators_added} decorators")
return decorators_added
def main():
parser = argparse.ArgumentParser(
description="Add @require_scopes decorators to MCP tools"
)
parser.add_argument(
"--dry-run",
action="store_true",
help="Show what would be changed without modifying files",
)
parser.add_argument(
"--file",
type=Path,
help="Process a single file instead of all server modules",
)
args = parser.parse_args()
server_dir = Path(__file__).parent.parent / "nextcloud_mcp_server" / "server"
if args.file:
files = [args.file]
else:
files = sorted(server_dir.glob("*.py"))
files = [f for f in files if f.name != "__init__.py"]
print("🔍 Scanning for tools needing scope decorators...")
print(
f" {'DRY RUN MODE - No changes will be made' if args.dry_run else 'LIVE MODE - Files will be modified'}"
)
total_added = 0
for file_path in files:
file_path = file_path.resolve() # Convert to absolute path
try:
display_path = file_path.relative_to(Path.cwd())
except ValueError:
display_path = file_path.name
print(f"\n📝 {display_path}")
added = process_file(file_path, dry_run=args.dry_run)
total_added += added
print(f"\n{'📊 Summary (dry run)' if args.dry_run else '✅ Complete'}")
print(f" Total decorators added: {total_added}")
if args.dry_run and total_added > 0:
print("\n💡 Run without --dry-run to apply changes")
if __name__ == "__main__":
main()
-90
View File
@@ -1,90 +0,0 @@
#!/bin/bash
set -e
echo "=== Testing Separate Clients Architecture ==="
echo ""
# Check both clients exist in Keycloak
echo "1. Verifying Keycloak clients..."
docker compose exec -T app curl -s http://keycloak:8080/realms/nextcloud-mcp/.well-known/openid-configuration > /dev/null && echo "✓ Keycloak realm available"
# Check user_oidc provider configuration
echo ""
echo "2. Checking user_oidc provider..."
PROVIDER_INFO=$(docker compose exec -T app php occ user_oidc:provider keycloak)
echo "$PROVIDER_INFO" | grep -q "nextcloud" && echo "✓ user_oidc configured with 'nextcloud' client"
# Get token from nextcloud-mcp-server client
echo ""
echo "3. Getting token from 'nextcloud-mcp-server' client..."
TOKEN=$(curl -s -X POST "http://localhost:8888/realms/nextcloud-mcp/protocol/openid-connect/token" \
-d "grant_type=password" \
-d "client_id=nextcloud-mcp-server" \
-d "client_secret=mcp-secret-change-in-production" \
-d "username=admin" \
-d "password=admin" \
-d "scope=openid profile email offline_access" | jq -r '.access_token')
if [ "$TOKEN" = "null" ] || [ -z "$TOKEN" ]; then
echo "✗ Failed to get token"
exit 1
fi
echo "✓ Got token from nextcloud-mcp-server client"
# Check token claims
echo ""
echo "4. Inspecting token claims..."
CLAIMS=$(echo "$TOKEN" | cut -d'.' -f2 | base64 -d 2>/dev/null | jq '{aud, azp, iss, preferred_username}')
echo "$CLAIMS"
AUD=$(echo "$CLAIMS" | jq -r '.aud')
AZP=$(echo "$CLAIMS" | jq -r '.azp')
echo ""
echo "Architecture validation:"
if [ "$AUD" = "nextcloud" ]; then
echo " ✓ aud='nextcloud' - Token intended for Nextcloud resource server"
else
echo " ✗ FAILED: aud='$AUD', expected 'nextcloud'"
exit 1
fi
if [ "$AZP" = "nextcloud-mcp-server" ]; then
echo " ✓ azp='nextcloud-mcp-server' - Token requested by MCP Server client"
else
echo " ✗ FAILED: azp='$AZP', expected 'nextcloud-mcp-server'"
exit 1
fi
# Test with Nextcloud API
echo ""
echo "5. Testing token with Nextcloud API..."
HTTP_CODE=$(curl -s -w "%{http_code}" -o /tmp/nc_response.json \
-H "Authorization: Bearer $TOKEN" \
"http://localhost:8080/ocs/v2.php/cloud/capabilities?format=json")
echo "HTTP Status: $HTTP_CODE"
if [ "$HTTP_CODE" = "200" ]; then
echo "✓ Token validated successfully!"
echo ""
echo "===================================================================="
echo "SUCCESS: Separate Clients Architecture Working!"
echo "===================================================================="
echo ""
echo "Summary:"
echo " - MCP Server client: 'nextcloud-mcp-server' (requests tokens)"
echo " - Resource server: 'nextcloud' (validates tokens via user_oidc)"
echo " - Token audience: 'nextcloud' (proper resource targeting)"
echo " - Token azp: 'nextcloud-mcp-server' (identifies requester)"
echo ""
echo "This architecture supports:"
echo " - Future multi-resource tokens: aud=['nextcloud', 'other-service']"
echo " - Clear separation of OAuth client vs resource server"
echo " - RFC 8707 Resource Indicators compliance"
else
echo "✗ Token validation failed"
cat /tmp/nc_response.json
exit 1
fi
+57 -2
View File
@@ -9,6 +9,7 @@ import pytest
from httpx import HTTPStatusError
from mcp import ClientSession
from mcp.client.session import RequestContext
from mcp.client.sse import sse_client
from mcp.client.streamable_http import streamablehttp_client
from mcp.types import ElicitRequestParams, ElicitResult, ErrorData
@@ -165,6 +166,51 @@ async def create_mcp_client_session(
logger.debug(f"{client_name} client session cleaned up successfully")
async def create_mcp_client_session_sse(
url: str,
token: str | None = None,
client_name: str = "MCP",
elicitation_callback: Any = None,
) -> AsyncGenerator[ClientSession, Any]:
"""
Factory function to create an MCP client session using SSE transport.
Similar to create_mcp_client_session but uses SSE transport instead of streamable-http.
Uses native async context managers to ensure correct LIFO cleanup order.
Args:
url: MCP server URL (e.g., "http://localhost:8000/sse")
token: Optional OAuth access token for Bearer authentication
client_name: Client name for logging (e.g., "Basic MCP (SSE)")
elicitation_callback: Optional callback for handling elicitation requests
Yields:
Initialized MCP ClientSession
Note:
SSE transport is being deprecated in favor of streamable-http.
This function exists for compatibility testing only.
"""
logger.info(f"Creating SSE client for {client_name}")
# Prepare headers with OAuth token if provided
headers = {"Authorization": f"Bearer {token}"} if token else None
# Use native async with - Python ensures LIFO cleanup
# Cleanup order will be: ClientSession.__aexit__ -> sse_client.__aexit__
# Note: sse_client yields only (read_stream, write_stream), not 3 values like streamablehttp_client
async with sse_client(url, headers=headers) as (read_stream, write_stream):
async with ClientSession(
read_stream, write_stream, elicitation_callback=elicitation_callback
) as session:
await session.initialize()
logger.info(f"{client_name} client session initialized successfully")
yield session
# Cleanup happens automatically in LIFO order - no exception suppression needed
logger.debug(f"{client_name} client session cleaned up successfully")
@pytest.fixture(scope="session")
async def nc_client(anyio_backend) -> AsyncGenerator[NextcloudClient, Any]:
"""
@@ -203,12 +249,21 @@ async def nc_client(anyio_backend) -> AsyncGenerator[NextcloudClient, Any]:
@pytest.fixture(scope="session")
async def nc_mcp_client(anyio_backend) -> AsyncGenerator[ClientSession, Any]:
"""
Fixture to create an MCP client session for integration tests using streamable-http.
Fixture to create an MCP client session for integration tests using SSE transport.
Uses anyio pytest plugin for proper async fixture handling.
Note: SSE transport is being deprecated. This fixture uses SSE for compatibility testing.
"""
# async for session in create_mcp_client_session_sse(
# url="http://localhost:8000/sse", client_name="Basic MCP (SSE)"
# ):
# yield session
async for session in create_mcp_client_session(
url="http://localhost:8000/mcp", client_name="Basic MCP"
url="http://localhost:8000/mcp",
client_name="Basic MCP (HTTP)",
):
yield session
+278
View File
@@ -0,0 +1,278 @@
# RAG Evaluation Tests
This directory contains tests for evaluating the Retrieval-Augmented Generation (RAG) system in the Nextcloud MCP server, specifically the `nc_semantic_search_answer` tool.
## Architecture
The RAG system has two components that are tested independently:
1. **Retrieval** - Vector sync/embedding pipeline (indexed Nextcloud documents → vector database)
2. **Generation** - MCP client LLM synthesis (retrieved context → natural language answer)
See [ADR-013](../../docs/ADR-013-rag-evaluation.md) for full architectural details.
## Test Structure
```
tests/rag_evaluation/
├── README.md # This file
├── conftest.py # Pytest fixtures
├── llm_providers.py # LLM provider abstraction (Ollama/Anthropic)
├── fixtures/
│ └── ground_truth.json # Pre-generated reference answers
├── test_retrieval_quality.py # Retrieval evaluation (Context Recall)
└── test_generation_quality.py # Generation evaluation (Answer Correctness)
```
## Metrics
### Retrieval Evaluation
- **Metric**: Context Recall
- **Method**: Heuristic - Check if ground-truth document IDs appear in top-k results
- **Target**: ≥80% recall
### Generation Evaluation
- **Metric**: Answer Correctness
- **Method**: LLM-as-judge - Compare RAG answer vs ground truth (binary true/false)
- **Evaluation**: External LLM evaluates semantic equivalence
## Dataset
**BeIR/nfcorpus** - Medical/biomedical corpus with ~3,600 documents
**Test Queries** (5 selected):
1. PLAIN-2630: "Alkylphenol Endocrine Disruptors and Allergies" (21 relevant docs)
2. PLAIN-2660: "How Long to Detox From Fish Before Pregnancy?" (20 relevant docs)
3. PLAIN-2510: "Coffee and Artery Function" (16 relevant docs)
4. PLAIN-2430: "Preventing Brain Loss with B Vitamins?" (15 relevant docs)
5. PLAIN-2690: "Chronic Headaches and Pork Tapeworms" (14 relevant docs)
## Setup
### 1. Install Dependencies
```bash
uv sync --group dev
```
This installs:
- `anthropic>=0.42.0` - For Anthropic LLM evaluation
- `click>=8.1.8` - For CLI interface
- `datasets>=3.3.0` - For BeIR nfcorpus dataset loading
### 2. Configure LLM Provider
Set environment variables for your LLM provider:
**Option A: Ollama (default, local/remote)**
```bash
export RAG_EVAL_PROVIDER=ollama
export OLLAMA_HOST=https://ollama.example.com # or RAG_EVAL_OLLAMA_BASE_URL
export RAG_EVAL_OLLAMA_MODEL=llama3.2:1b
```
**Option B: Anthropic (cloud)**
```bash
export RAG_EVAL_PROVIDER=anthropic
export RAG_EVAL_ANTHROPIC_API_KEY=sk-ant-...
export RAG_EVAL_ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
```
### 3. One-Time Setup: Generate Ground Truth
Generate synthetic reference answers for the 5 test queries:
```bash
uv run python tools/rag_eval_cli.py generate
```
**What this does:**
- Downloads nfcorpus dataset to `tests/rag_evaluation/fixtures/nfcorpus/` (cached locally)
- For each of the 5 selected queries, extracts highly relevant documents
- Uses configured LLM to synthesize a reference answer
- Saves to `tests/rag_evaluation/fixtures/ground_truth.json`
**Optional flags:**
- `--provider ollama|anthropic` - Override LLM provider
- `--model MODEL_NAME` - Override model name
- `--force-download` - Re-download nfcorpus dataset
### 4. One-Time Setup: Upload Corpus to Nextcloud
Upload all 3,633 nfcorpus documents as Nextcloud notes:
```bash
uv run python tools/rag_eval_cli.py upload \
--nextcloud-url http://localhost:8000 \
--username admin \
--password admin
```
**What this does:**
- Downloads nfcorpus dataset (if not already cached)
- Uploads all documents as notes in Nextcloud
- Saves document ID → note ID mapping to `tests/rag_evaluation/fixtures/note_mapping.json`
**Optional flags:**
- `--category CATEGORY` - Custom category for notes (default: `nfcorpus_rag_eval`)
- `--force-download` - Re-download nfcorpus dataset
- `--force` - Delete all existing notes in the target category before uploading (efficient corpus refresh)
**Important:** This step requires:
- A running Nextcloud instance with vector sync enabled
- Notes app installed
- Valid credentials
**Duration:** ~10-15 minutes to upload 3,633 documents
## Running Tests
### Run All RAG Evaluation Tests
```bash
uv run pytest tests/rag_evaluation/ -v
```
### Run Specific Test Suites
**Retrieval Quality Only:**
```bash
uv run pytest tests/rag_evaluation/test_retrieval_quality.py -v
```
**Generation Quality Only:**
```bash
uv run pytest tests/rag_evaluation/test_generation_quality.py -v
```
### Run Individual Tests
```bash
uv run pytest tests/rag_evaluation/test_retrieval_quality.py::test_retrieval_context_recall -v
uv run pytest tests/rag_evaluation/test_generation_quality.py::test_answer_correctness -v
```
## Test Execution Flow
**Prerequisites** (one-time setup):
1. Generated ground truth (`tools/rag_eval_cli.py generate`)
2. Uploaded corpus to Nextcloud (`tools/rag_eval_cli.py upload`)
### Retrieval Quality Tests
1. **Setup** (`nfcorpus_test_data` fixture):
- Loads pre-generated ground truth from `fixtures/ground_truth.json`
- Loads note mapping from `fixtures/note_mapping.json`
- Returns test cases with expected note IDs
2. **Test** (`test_retrieval_context_recall`):
- For each query: Perform semantic search (top-10)
- Extract retrieved note IDs
- Calculate Context Recall = (expected ∩ retrieved) / expected
- Assert recall ≥ 80%
3. **Cleanup**:
- None required (notes persist in Nextcloud for reuse)
### Generation Quality Tests
1. **Setup**:
- Same as retrieval tests (reuses `nfcorpus_test_data` fixture)
- Creates evaluation LLM provider
2. **Test** (`test_answer_correctness`):
- For each query: Call `nc_semantic_search_answer` MCP tool
- Extract generated answer
- Use LLM-as-judge to compare vs ground truth
- Assert semantic equivalence (TRUE/FALSE)
3. **Cleanup**:
- LLM provider closed
## Expected Test Duration
**One-time setup:**
- **Generate ground truth**: ~5-10 minutes (5 queries with LLM generation)
- **Upload corpus**: ~10-15 minutes (3,633 documents)
- **Total setup**: ~15-25 minutes
**Test execution** (after setup):
- **Retrieval tests**: ~1-2 minutes (5 queries, no upload/cleanup)
- **Generation tests**: ~5-10 minutes (RAG generation + LLM evaluation)
- **Total per run**: ~6-12 minutes
**Note**: These are NOT smoke tests and are NOT run in CI.
## Limitations & Future Work
**Current Limitations:**
- Only 5 test queries (limited statistical confidence)
- Medical domain bias (may not represent production use cases)
- Synthetic ground truth (LLM-generated, not human-validated)
- Manual test execution (requires external LLM access)
**Future Enhancements:**
- Expand to 50-100 queries for statistical significance
- Add custom test dataset with production-representative documents
- Implement additional metrics (faithfulness, context relevance, answer relevance)
- Create automated benchmarking dashboard
- Test multi-hop reasoning (synthesis questions)
- Evaluate out-of-scope handling ("I don't know" responses)
## Troubleshooting
### Tests Fail with "Ground truth file not found"
Run the generate command first:
```bash
uv run python tools/rag_eval_cli.py generate
```
### Tests Fail with "Note mapping file not found"
Run the upload command first:
```bash
uv run python tools/rag_eval_cli.py upload --nextcloud-url http://localhost:8000 --username admin --password admin
```
### Tests Fail with "MCP sampling client not yet implemented"
The `mcp_sampling_client` fixture is a placeholder. You need to implement MCP client creation with sampling support. See the TODO in `conftest.py`.
### Upload Command Fails
Common issues:
1. **Nextcloud not running**: Ensure Nextcloud is accessible at the URL
2. **Invalid credentials**: Verify username/password
3. **Notes app not installed**: Install Notes app in Nextcloud
4. **Network timeout**: Increase timeout in CLI (currently 60s)
### LLM Timeout
If ground truth generation times out:
1. Increase timeout in `llm_providers.py` (currently 10 min)
2. Use a faster model: `--model llama3.2:1b`
3. Check Ollama/Anthropic service availability
### Dataset Download Fails
The nfcorpus dataset is downloaded automatically. If download fails:
1. Check internet connection
2. Manually download from: https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/nfcorpus.zip
3. Extract to `tests/rag_evaluation/fixtures/nfcorpus/`
4. Or use HuggingFace datasets cache: `~/.cache/huggingface/datasets/BeIR___nfcorpus/`
### Vector Sync Not Indexing Documents
After uploading, vector sync must index the documents:
1. Check vector sync is enabled in Nextcloud
2. Trigger manual sync if needed
3. Wait for background job to process all documents
4. Verify in Qdrant that vectors exist for uploaded notes
## References
- [ADR-013: RAG Evaluation Testing Framework](../../docs/ADR-013-rag-evaluation.md)
- [ADR-008: MCP Sampling for Semantic Search](../../docs/ADR-008-mcp-sampling-for-semantic-search.md)
- [BeIR Benchmark](https://github.com/beir-cellar/beir)
- [NFCorpus Dataset](https://www.cl.uni-heidelberg.de/statnlpgroup/nfcorpus/)
+1
View File
@@ -0,0 +1 @@
"""RAG evaluation tests for the Nextcloud MCP semantic search system."""
+145
View File
@@ -0,0 +1,145 @@
"""Pytest fixtures for RAG evaluation tests.
IMPORTANT: Before running these tests, you must:
1. Generate ground truth: uv run python tools/rag_eval_cli.py generate
2. Upload corpus: uv run python tools/rag_eval_cli.py upload --nextcloud-url http://localhost:8000 --username admin --password admin
This ensures that the ground truth and note mappings are available.
"""
import json
from pathlib import Path
from typing import Any
import pytest
from tests.rag_evaluation.llm_providers import create_llm_provider
# Paths
FIXTURES_DIR = Path(__file__).parent / "fixtures"
GROUND_TRUTH_FILE = FIXTURES_DIR / "ground_truth.json"
NOTE_MAPPING_FILE = FIXTURES_DIR / "note_mapping.json"
@pytest.fixture(scope="session")
def ground_truth_data() -> list[dict[str, Any]]:
"""Load pre-generated ground truth data.
Returns:
List of test cases with query, ground truth answer, and expected doc IDs
Raises:
FileNotFoundError: If ground_truth.json doesn't exist
"""
if not GROUND_TRUTH_FILE.exists():
raise FileNotFoundError(
f"Ground truth file not found: {GROUND_TRUTH_FILE}\n"
"Run: uv run python tools/rag_eval_cli.py generate"
)
with open(GROUND_TRUTH_FILE) as f:
return json.load(f)
@pytest.fixture(scope="session")
def note_mapping() -> dict[str, int]:
"""Load document ID → note ID mapping.
Returns:
Dict mapping nfcorpus document ID to Nextcloud note ID
Raises:
FileNotFoundError: If note_mapping.json doesn't exist
"""
if not NOTE_MAPPING_FILE.exists():
raise FileNotFoundError(
f"Note mapping file not found: {NOTE_MAPPING_FILE}\n"
"Run: uv run python tools/rag_eval_cli.py upload --nextcloud-url ... --username ... --password ..."
)
with open(NOTE_MAPPING_FILE) as f:
return json.load(f)
@pytest.fixture(scope="session")
def nfcorpus_test_data(
ground_truth_data: list[dict[str, Any]],
note_mapping: dict[str, int],
):
"""Prepare nfcorpus test data for evaluation.
This fixture combines ground truth answers with note mappings to create
test cases ready for retrieval and generation quality tests.
Args:
ground_truth_data: Pre-generated ground truth answers
note_mapping: Document ID → note ID mapping
Returns:
List of test cases with query, ground truth, expected doc IDs, and note IDs
"""
test_cases = []
for gt in ground_truth_data:
# Map expected document IDs to note IDs
expected_note_ids = [
note_mapping.get(doc_id)
for doc_id in gt["expected_document_ids"]
if doc_id in note_mapping
]
# Filter out None values (docs that weren't uploaded)
expected_note_ids = [nid for nid in expected_note_ids if nid is not None]
test_cases.append(
{
"query_id": gt["query_id"],
"query_text": gt["query_text"],
"ground_truth_answer": gt["ground_truth_answer"],
"expected_document_ids": gt["expected_document_ids"],
"expected_note_ids": expected_note_ids,
"highly_relevant_count": gt["highly_relevant_count"],
}
)
return test_cases
@pytest.fixture(scope="session")
async def evaluation_llm():
"""Create LLM provider for evaluation (separate from MCP client).
Environment variables:
RAG_EVAL_PROVIDER: Provider type (ollama or anthropic)
RAG_EVAL_OLLAMA_BASE_URL: Ollama base URL (or OLLAMA_HOST)
RAG_EVAL_OLLAMA_MODEL: Ollama model name
RAG_EVAL_ANTHROPIC_API_KEY: Anthropic API key
RAG_EVAL_ANTHROPIC_MODEL: Anthropic model name
Returns:
LLM provider instance (OllamaProvider or AnthropicProvider)
"""
llm = create_llm_provider()
yield llm
await llm.close()
@pytest.fixture(scope="session")
async def mcp_sampling_client():
"""Create MCP client that supports sampling for RAG generation.
This fixture creates an MCP client configured to support sampling,
which is required for testing the nc_semantic_search_answer tool.
TODO: Implement MCP client with sampling support
For now, this is a placeholder.
Returns:
MCP client instance with sampling enabled
"""
# TODO: Implement MCP client creation with sampling support
# This will require:
# 1. Creating an MCP client configured for sampling
# 2. Authenticating with Nextcloud
# 3. Ensuring sampling is enabled
pytest.skip("MCP sampling client not yet implemented")
+89
View File
@@ -0,0 +1,89 @@
"""LLM provider abstraction for RAG evaluation.
DEPRECATED: This module is maintained for backward compatibility with RAG evaluation tests.
New code should use nextcloud_mcp_server.providers directly.
Supports Ollama (local), Anthropic (cloud), and Bedrock (AWS) providers for both ground truth
generation and evaluation.
"""
import os
from nextcloud_mcp_server.providers import (
AnthropicProvider,
BedrockProvider,
OllamaProvider,
Provider,
)
def create_llm_provider(
provider: str | None = None,
ollama_base_url: str | None = None,
ollama_model: str | None = None,
anthropic_api_key: str | None = None,
anthropic_model: str | None = None,
bedrock_region: str | None = None,
bedrock_model: str | None = None,
) -> Provider:
"""Create an LLM provider from environment variables or arguments.
Args:
provider: Provider type ('ollama', 'anthropic', or 'bedrock').
Defaults to RAG_EVAL_PROVIDER env var or 'ollama'
ollama_base_url: Ollama base URL. Defaults to RAG_EVAL_OLLAMA_BASE_URL or 'http://localhost:11434'
ollama_model: Ollama model. Defaults to RAG_EVAL_OLLAMA_MODEL or 'llama3.2:1b'
anthropic_api_key: Anthropic API key. Defaults to RAG_EVAL_ANTHROPIC_API_KEY env var
anthropic_model: Anthropic model. Defaults to RAG_EVAL_ANTHROPIC_MODEL or 'claude-3-5-sonnet-20241022'
bedrock_region: AWS region. Defaults to RAG_EVAL_BEDROCK_REGION or AWS_REGION env var
bedrock_model: Bedrock model ID. Defaults to RAG_EVAL_BEDROCK_MODEL or
'anthropic.claude-3-sonnet-20240229-v1:0'
Returns:
Provider instance
Raises:
ValueError: If provider is invalid or required credentials are missing
"""
# Get provider from args or env
provider = provider or os.environ.get("RAG_EVAL_PROVIDER", "ollama")
if provider == "ollama":
# Try RAG_EVAL_OLLAMA_BASE_URL, then OLLAMA_HOST, then default
base_url = (
ollama_base_url
or os.environ.get("RAG_EVAL_OLLAMA_BASE_URL")
or os.environ.get("OLLAMA_HOST")
or "http://localhost:11434"
)
model = ollama_model or os.environ.get("RAG_EVAL_OLLAMA_MODEL", "llama3.2:1b")
return OllamaProvider(
base_url=base_url, embedding_model=None, generation_model=model
)
elif provider == "anthropic":
api_key = anthropic_api_key or os.environ.get("RAG_EVAL_ANTHROPIC_API_KEY")
if not api_key:
raise ValueError(
"Anthropic API key required. Set RAG_EVAL_ANTHROPIC_API_KEY environment variable."
)
model = anthropic_model or os.environ.get(
"RAG_EVAL_ANTHROPIC_MODEL", "claude-3-5-sonnet-20241022"
)
return AnthropicProvider(api_key=api_key, model=model)
elif provider == "bedrock":
region = bedrock_region or os.environ.get(
"RAG_EVAL_BEDROCK_REGION", os.environ.get("AWS_REGION", "us-east-1")
)
model = bedrock_model or os.environ.get(
"RAG_EVAL_BEDROCK_MODEL", "anthropic.claude-3-sonnet-20240229-v1:0"
)
return BedrockProvider(
region_name=region, embedding_model=None, generation_model=model
)
else:
raise ValueError(
f"Invalid provider: {provider}. Must be 'ollama', 'anthropic', or 'bedrock'."
)
@@ -0,0 +1,139 @@
"""Tests for RAG generation quality (Answer Correctness metric).
These tests evaluate whether the MCP client LLM generates factually correct
answers from retrieved context using the nc_semantic_search_answer tool.
Metric: Answer Correctness
- Measures: Is the generated answer factually correct?
- Method: LLM-as-judge - Compare RAG answer vs ground truth (binary true/false)
- Evaluation: External LLM evaluates semantic equivalence
"""
import pytest
@pytest.mark.integration
async def test_answer_correctness(
mcp_sampling_client,
evaluation_llm,
nfcorpus_test_data,
):
"""Test that RAG system generates factually correct answers.
For each test query:
1. Execute full RAG pipeline via nc_semantic_search_answer MCP tool
2. Extract generated answer from RAG response
3. Use LLM-as-judge to compare against ground truth (binary true/false)
4. Assert answer is semantically equivalent to ground truth
This tests the quality of the generation component (MCP client LLM).
"""
results_summary = []
for test_case in nfcorpus_test_data:
query = test_case["query_text"]
ground_truth = test_case["ground_truth_answer"]
print(f"\n{'=' * 80}")
print(f"Query: {query}")
# Execute full RAG pipeline
print("Executing RAG pipeline...")
rag_result = await mcp_sampling_client.call_tool(
"nc_semantic_search_answer",
arguments={"query": query, "limit": 5},
)
rag_answer = rag_result["generated_answer"]
print(f"RAG Answer preview: {rag_answer[:200]}...")
print(f"Ground Truth preview: {ground_truth[:200]}...")
# LLM-as-judge evaluation
evaluation_prompt = f"""Compare these two answers and respond with only TRUE or FALSE.
Question: {query}
Generated Answer: {rag_answer}
Ground Truth Answer: {ground_truth}
Are these answers semantically equivalent (do they convey the same factual information)?
Respond with only: TRUE or FALSE"""
print("Evaluating answer correctness...")
evaluation_result = await evaluation_llm.generate(
evaluation_prompt,
max_tokens=10,
)
is_correct = evaluation_result.strip().upper() == "TRUE"
result = {
"query_id": test_case["query_id"],
"query": query,
"rag_answer_length": len(rag_answer),
"ground_truth_length": len(ground_truth),
"is_correct": is_correct,
"evaluation_result": evaluation_result.strip(),
}
results_summary.append(result)
print(f" Evaluation: {evaluation_result.strip()}")
print(f" Status: {'✓ CORRECT' if is_correct else '✗ INCORRECT'}")
# Assert answer correctness
assert is_correct, (
f"Answer mismatch for query: {query}\n\n"
f"Generated Answer:\n{rag_answer}\n\n"
f"Ground Truth:\n{ground_truth}\n\n"
f"Evaluation: {evaluation_result.strip()}"
)
# Print summary
print(f"\n{'=' * 80}")
print("Answer Correctness Summary:")
print(f" Total queries: {len(results_summary)}")
print(f" Correct: {sum(r['is_correct'] for r in results_summary)}")
print(f" Incorrect: {sum(not r['is_correct'] for r in results_summary)}")
accuracy = sum(r["is_correct"] for r in results_summary) / len(results_summary)
print(f" Accuracy: {accuracy:.2%}")
print(f"{'=' * 80}")
@pytest.mark.integration
async def test_answer_contains_sources(mcp_sampling_client, nfcorpus_test_data):
"""Test that RAG answers include source citations.
This is a basic quality check - we verify that the nc_semantic_search_answer
tool returns both a generated answer and source documents.
"""
for test_case in nfcorpus_test_data:
query = test_case["query_text"]
# Execute RAG pipeline
rag_result = await mcp_sampling_client.call_tool(
"nc_semantic_search_answer",
arguments={"query": query, "limit": 5},
)
# Check response structure
assert "generated_answer" in rag_result, "Response missing 'generated_answer'"
assert "sources" in rag_result, "Response missing 'sources'"
# Check sources are provided
sources = rag_result["sources"]
assert len(sources) > 0, f"No sources returned for query: {query}"
# Check each source has required fields
for i, source in enumerate(sources):
assert "document_id" in source or "id" in source, (
f"Source {i} missing document ID"
)
assert "excerpt" in source or "content" in source or "text" in source, (
f"Source {i} missing content"
)
print(f"Query: {query}")
print(f" Sources provided: {len(sources)}")
print(" Status: ✓ PASS")
@@ -0,0 +1,143 @@
"""Tests for RAG retrieval quality (Context Recall metric).
These tests evaluate whether the vector sync/embedding pipeline successfully
retrieves documents containing the answer to a query.
Metric: Context Recall
- Measures: Did we retrieve documents containing the answer?
- Method: Heuristic - Check if ground-truth document IDs appear in top-k results
- Target: ≥80% recall (at least 80% of expected docs in top-10 results)
"""
import pytest
@pytest.mark.integration
async def test_retrieval_context_recall(nc_client, nfcorpus_test_data):
"""Test that semantic search retrieves documents containing the answer.
For each test query:
1. Perform semantic search (retrieval only, no generation)
2. Extract retrieved document IDs from top-k results
3. Calculate Context Recall: intersection of retrieved and expected docs
4. Assert recall meets threshold (≥80%)
This tests the quality of the vector sync/embedding pipeline.
"""
# Top-k documents to retrieve
k = 10
# Minimum acceptable recall
min_recall = 0.8
results_summary = []
for test_case in nfcorpus_test_data:
query = test_case["query_text"]
expected_note_ids = set(test_case["expected_note_ids"])
# Perform semantic search (retrieval only)
search_results = await nc_client.notes.semantic_search(
query=query,
limit=k,
)
# Extract retrieved note IDs
retrieved_note_ids = {result["id"] for result in search_results}
# Calculate Context Recall
intersection = expected_note_ids & retrieved_note_ids
recall = len(intersection) / len(expected_note_ids) if expected_note_ids else 0
# Store results
result = {
"query_id": test_case["query_id"],
"query": query,
"expected_count": len(expected_note_ids),
"retrieved_count": len(retrieved_note_ids),
"intersection_count": len(intersection),
"recall": recall,
"passed": recall >= min_recall,
}
results_summary.append(result)
# Print detailed result for this query
print(f"\n{'=' * 80}")
print(f"Query: {query}")
print(f" Expected docs: {len(expected_note_ids)}")
print(f" Retrieved (top-{k}): {len(retrieved_note_ids)}")
print(f" Intersection: {len(intersection)}")
print(f" Context Recall: {recall:.2%}")
print(f" Status: {'✓ PASS' if result['passed'] else '✗ FAIL'}")
# Assert recall meets threshold
assert recall >= min_recall, (
f"Context Recall {recall:.2%} below threshold {min_recall:.2%} "
f"for query: {query}\n"
f"Expected {len(expected_note_ids)} docs, found {len(intersection)} in top-{k}"
)
# Print summary
print(f"\n{'=' * 80}")
print("Context Recall Summary:")
print(f" Total queries: {len(results_summary)}")
print(f" Passed: {sum(r['passed'] for r in results_summary)}")
print(f" Failed: {sum(not r['passed'] for r in results_summary)}")
print(
f" Average recall: {sum(r['recall'] for r in results_summary) / len(results_summary):.2%}"
)
print(f"{'=' * 80}")
@pytest.mark.integration
async def test_retrieval_top1_precision(nc_client, nfcorpus_test_data):
"""Test that the top-1 retrieved document is highly relevant.
This is a stricter test than context recall - we verify that
the single most relevant document (rank 1) is in the expected set.
This tests whether the ranking is good, not just retrieval.
"""
results_summary = []
for test_case in nfcorpus_test_data:
query = test_case["query_text"]
expected_note_ids = set(test_case["expected_note_ids"])
# Perform semantic search
search_results = await nc_client.notes.semantic_search(
query=query,
limit=1, # Only top-1
)
# Check if top result is in expected set
if search_results:
top_result_id = search_results[0]["id"]
is_relevant = top_result_id in expected_note_ids
else:
is_relevant = False
result = {
"query_id": test_case["query_id"],
"query": query,
"top_result_id": search_results[0]["id"] if search_results else None,
"is_relevant": is_relevant,
}
results_summary.append(result)
print(f"\nQuery: {query}")
print(f" Top-1 relevant: {'✓ YES' if is_relevant else '✗ NO'}")
# This is informational - we don't assert here
# Some queries may have multiple valid top results
# Print summary
precision_at_1 = sum(r["is_relevant"] for r in results_summary) / len(
results_summary
)
print(f"\n{'=' * 80}")
print(f"Precision@1: {precision_at_1:.2%}")
print(
f" ({sum(r['is_relevant'] for r in results_summary)}/{len(results_summary)} queries)"
)
print(f"{'=' * 80}")
+1
View File
@@ -0,0 +1 @@
"""Unit tests for provider infrastructure."""
+280
View File
@@ -0,0 +1,280 @@
"""Unit tests for Bedrock provider."""
import json
from unittest.mock import MagicMock
import pytest
from nextcloud_mcp_server.providers.bedrock import BOTO3_AVAILABLE, BedrockProvider
@pytest.fixture
def mock_bedrock_client(mocker):
"""Mock boto3 bedrock-runtime client."""
if not BOTO3_AVAILABLE:
pytest.skip("boto3 not installed")
mock_client = MagicMock()
mocker.patch("boto3.client", return_value=mock_client)
return mock_client
@pytest.mark.unit
async def test_bedrock_embedding_titan(mock_bedrock_client):
"""Test Bedrock embedding with Titan model."""
# Mock response
mock_response = {
"body": MagicMock(
read=MagicMock(
return_value=json.dumps({"embedding": [0.1, 0.2, 0.3]}).encode()
)
)
}
mock_bedrock_client.invoke_model.return_value = mock_response
# Create provider
provider = BedrockProvider(
region_name="us-east-1",
embedding_model="amazon.titan-embed-text-v2:0",
generation_model=None,
)
# Test embedding
embedding = await provider.embed("test text")
assert embedding == [0.1, 0.2, 0.3]
mock_bedrock_client.invoke_model.assert_called_once()
call_args = mock_bedrock_client.invoke_model.call_args
assert call_args.kwargs["modelId"] == "amazon.titan-embed-text-v2:0"
body = json.loads(call_args.kwargs["body"])
assert body == {"inputText": "test text"}
@pytest.mark.unit
async def test_bedrock_embedding_batch(mock_bedrock_client):
"""Test Bedrock batch embedding."""
# Mock response
mock_response = {
"body": MagicMock(
read=MagicMock(
return_value=json.dumps({"embedding": [0.1, 0.2, 0.3]}).encode()
)
)
}
mock_bedrock_client.invoke_model.return_value = mock_response
# Create provider
provider = BedrockProvider(
region_name="us-east-1",
embedding_model="amazon.titan-embed-text-v2:0",
generation_model=None,
)
# Test batch embedding
embeddings = await provider.embed_batch(["text1", "text2"])
assert len(embeddings) == 2
assert embeddings[0] == [0.1, 0.2, 0.3]
assert embeddings[1] == [0.1, 0.2, 0.3]
assert mock_bedrock_client.invoke_model.call_count == 2
@pytest.mark.unit
async def test_bedrock_generation_claude(mock_bedrock_client):
"""Test Bedrock text generation with Claude model."""
# Mock response
mock_response = {
"body": MagicMock(
read=MagicMock(
return_value=json.dumps(
{"content": [{"text": "Generated response"}]}
).encode()
)
)
}
mock_bedrock_client.invoke_model.return_value = mock_response
# Create provider
provider = BedrockProvider(
region_name="us-east-1",
embedding_model=None,
generation_model="anthropic.claude-3-sonnet-20240229-v1:0",
)
# Test generation
text = await provider.generate("test prompt", max_tokens=100)
assert text == "Generated response"
mock_bedrock_client.invoke_model.assert_called_once()
call_args = mock_bedrock_client.invoke_model.call_args
assert call_args.kwargs["modelId"] == "anthropic.claude-3-sonnet-20240229-v1:0"
body = json.loads(call_args.kwargs["body"])
assert body["messages"][0]["content"] == "test prompt"
assert body["max_tokens"] == 100
@pytest.mark.unit
async def test_bedrock_generation_llama(mock_bedrock_client):
"""Test Bedrock text generation with Llama model."""
# Mock response
mock_response = {
"body": MagicMock(
read=MagicMock(
return_value=json.dumps({"generation": "Llama response"}).encode()
)
)
}
mock_bedrock_client.invoke_model.return_value = mock_response
# Create provider
provider = BedrockProvider(
region_name="us-east-1",
embedding_model=None,
generation_model="meta.llama3-8b-instruct-v1:0",
)
# Test generation
text = await provider.generate("test prompt")
assert text == "Llama response"
body = json.loads(mock_bedrock_client.invoke_model.call_args.kwargs["body"])
assert body["prompt"] == "test prompt"
assert "max_gen_len" in body
@pytest.mark.unit
async def test_bedrock_both_capabilities(mock_bedrock_client):
"""Test Bedrock with both embedding and generation models."""
# Mock responses
embed_response = {
"body": MagicMock(
read=MagicMock(return_value=json.dumps({"embedding": [0.1, 0.2]}).encode())
)
}
gen_response = {
"body": MagicMock(
read=MagicMock(
return_value=json.dumps({"content": [{"text": "Response"}]}).encode()
)
)
}
# Mock to return different responses based on modelId
def mock_invoke(modelId, body, **kwargs):
if "embed" in modelId:
return embed_response
else:
return gen_response
mock_bedrock_client.invoke_model.side_effect = mock_invoke
# Create provider with both models
provider = BedrockProvider(
region_name="us-east-1",
embedding_model="amazon.titan-embed-text-v2:0",
generation_model="anthropic.claude-3-sonnet-20240229-v1:0",
)
assert provider.supports_embeddings is True
assert provider.supports_generation is True
# Test both capabilities
embedding = await provider.embed("test")
assert embedding == [0.1, 0.2]
text = await provider.generate("test")
assert text == "Response"
@pytest.mark.unit
async def test_bedrock_no_embeddings():
"""Test Bedrock provider with no embedding model raises error."""
provider = BedrockProvider(
region_name="us-east-1",
embedding_model=None,
generation_model="anthropic.claude-3-sonnet-20240229-v1:0",
)
assert provider.supports_embeddings is False
with pytest.raises(NotImplementedError, match="no embedding_model configured"):
await provider.embed("test")
with pytest.raises(NotImplementedError, match="no embedding_model configured"):
await provider.embed_batch(["test"])
with pytest.raises(NotImplementedError, match="no embedding_model configured"):
provider.get_dimension()
@pytest.mark.unit
async def test_bedrock_no_generation():
"""Test Bedrock provider with no generation model raises error."""
provider = BedrockProvider(
region_name="us-east-1",
embedding_model="amazon.titan-embed-text-v2:0",
generation_model=None,
)
assert provider.supports_generation is False
with pytest.raises(NotImplementedError, match="no generation_model configured"):
await provider.generate("test")
@pytest.mark.unit
async def test_bedrock_dimension_detection(mock_bedrock_client):
"""Test dimension detection for Bedrock embeddings."""
# Mock response with specific dimension
mock_response = {
"body": MagicMock(
read=MagicMock(
return_value=json.dumps(
{"embedding": [0.1] * 1536} # 1536-dim embedding
).encode()
)
)
}
mock_bedrock_client.invoke_model.return_value = mock_response
provider = BedrockProvider(
region_name="us-east-1",
embedding_model="amazon.titan-embed-text-v2:0",
)
# Dimension not detected yet
with pytest.raises(RuntimeError, match="not detected yet"):
provider.get_dimension()
# Detect dimension
await provider._detect_dimension()
# Now dimension should be available
assert provider.get_dimension() == 1536
@pytest.mark.unit
async def test_bedrock_cohere_embedding(mock_bedrock_client):
"""Test Bedrock with Cohere embedding model."""
# Mock response
mock_response = {
"body": MagicMock(
read=MagicMock(
return_value=json.dumps({"embeddings": [[0.1, 0.2, 0.3]]}).encode()
)
)
}
mock_bedrock_client.invoke_model.return_value = mock_response
provider = BedrockProvider(
region_name="us-east-1",
embedding_model="cohere.embed-english-v3",
)
embedding = await provider.embed("test text")
assert embedding == [0.1, 0.2, 0.3]
body = json.loads(mock_bedrock_client.invoke_model.call_args.kwargs["body"])
assert body == {"texts": ["test text"], "input_type": "search_document"}
+217
View File
@@ -0,0 +1,217 @@
"""
Unit tests for @instrument_tool decorator.
Tests that the decorator correctly instruments MCP tools with both
Prometheus metrics and OpenTelemetry tracing.
"""
from unittest.mock import MagicMock, patch
import pytest
from nextcloud_mcp_server.observability.metrics import instrument_tool
pytestmark = pytest.mark.unit
@pytest.fixture
def mock_metrics():
"""Mock Prometheus metrics."""
with (
patch(
"nextcloud_mcp_server.observability.metrics.record_tool_call"
) as mock_record,
patch(
"nextcloud_mcp_server.observability.metrics.record_tool_error"
) as mock_error,
):
yield {"record_tool_call": mock_record, "record_tool_error": mock_error}
@pytest.fixture
def mock_tracer():
"""Mock OpenTelemetry tracer."""
with patch(
"nextcloud_mcp_server.observability.tracing.trace_operation"
) as mock_trace:
# Configure mock to act as a context manager that allows exceptions to propagate
mock_trace.return_value.__enter__ = MagicMock(return_value=None)
mock_trace.return_value.__exit__ = MagicMock(
return_value=False
) # Return False to allow exceptions to propagate
yield mock_trace
class TestInstrumentToolDecorator:
"""Test the @instrument_tool decorator."""
async def test_decorator_creates_trace_span(self, mock_tracer, mock_metrics):
"""Test that decorator creates OpenTelemetry span with correct attributes."""
@instrument_tool
async def example_tool(query: str, limit: int = 10):
return {"results": []}
# Call the tool
await example_tool(query="test query", limit=5)
# Verify trace_operation was called with correct parameters
mock_tracer.assert_called_once()
call_args = mock_tracer.call_args
# Check span name
assert call_args[0][0] == "mcp.tool.example_tool"
# Check span attributes
attributes = call_args[1]["attributes"]
assert attributes["mcp.tool.name"] == "example_tool"
assert "query" in attributes["mcp.tool.args"]
assert "test query" in attributes["mcp.tool.args"]
assert "limit" in attributes["mcp.tool.args"]
# Verify record_exception parameter
assert call_args[1]["record_exception"] is True
async def test_decorator_sanitizes_sensitive_arguments(
self, mock_tracer, mock_metrics
):
"""Test that sensitive arguments are excluded from span attributes."""
@instrument_tool
async def example_tool(
query: str, password: str, token: str, api_key: str, ctx: object
):
return {"success": True}
# Call with sensitive parameters
await example_tool(
query="test",
password="secret123",
token="bearer_token",
api_key="api_key_123",
ctx=MagicMock(),
)
# Verify trace was created
mock_tracer.assert_called_once()
attributes = mock_tracer.call_args[1]["attributes"]
# Check that sensitive fields are NOT in attributes
tool_args = attributes["mcp.tool.args"]
assert "password" not in tool_args
assert "secret123" not in tool_args
assert "token" not in tool_args
assert "bearer_token" not in tool_args
assert "api_key" not in tool_args
assert "api_key_123" not in tool_args
assert "ctx" not in tool_args
# Check that non-sensitive field IS included
assert "query" in tool_args
assert "test" in tool_args
async def test_decorator_limits_argument_string_length(
self, mock_tracer, mock_metrics
):
"""Test that tool arguments are limited to 500 characters."""
@instrument_tool
async def example_tool(query: str):
return {"results": []}
# Create a very long query string (>500 chars)
long_query = "x" * 1000
await example_tool(query=long_query)
# Verify arguments were truncated
mock_tracer.assert_called_once()
attributes = mock_tracer.call_args[1]["attributes"]
tool_args = attributes["mcp.tool.args"]
assert len(tool_args) <= 500
async def test_decorator_records_success_metrics(self, mock_tracer, mock_metrics):
"""Test that successful tool execution records metrics."""
@instrument_tool
async def example_tool():
return {"success": True}
# Call the tool
await example_tool()
# Verify success metrics were recorded
mock_metrics["record_tool_call"].assert_called_once()
call_args = mock_metrics["record_tool_call"].call_args
assert call_args[0][0] == "example_tool" # tool_name
assert isinstance(call_args[0][1], float) # duration
assert call_args[0][2] == "success" # status
async def test_decorator_records_error_metrics(self, mock_tracer, mock_metrics):
"""Test that tool errors are recorded in metrics."""
@instrument_tool
async def failing_tool():
raise ValueError("Test error")
# Call the tool and expect exception
with pytest.raises(ValueError, match="Test error"):
await failing_tool()
# Verify error metrics were recorded
mock_metrics["record_tool_call"].assert_called_once()
call_args = mock_metrics["record_tool_call"].call_args
assert call_args[0][0] == "failing_tool" # tool_name
assert isinstance(call_args[0][1], float) # duration
assert call_args[0][2] == "error" # status
# Verify error type was recorded
mock_metrics["record_tool_error"].assert_called_once()
error_args = mock_metrics["record_tool_error"].call_args
assert error_args[0][0] == "failing_tool" # tool_name
assert error_args[0][1] == "ValueError" # error_type
async def test_decorator_preserves_function_metadata(
self, mock_tracer, mock_metrics
):
"""Test that decorator preserves function name and docstring."""
@instrument_tool
async def example_tool():
"""This is a test tool."""
return {"success": True}
# Verify function metadata is preserved
assert example_tool.__name__ == "example_tool"
assert example_tool.__doc__ == "This is a test tool."
async def test_decorator_preserves_return_value(self, mock_tracer, mock_metrics):
"""Test that decorator returns the original function's return value."""
@instrument_tool
async def example_tool(value: int):
return {"result": value * 2}
# Call the tool
result = await example_tool(value=5)
# Verify return value is unchanged
assert result == {"result": 10}
async def test_decorator_with_no_arguments(self, mock_tracer, mock_metrics):
"""Test decorator with tool that takes no arguments."""
@instrument_tool
async def no_args_tool():
return {"status": "ok"}
# Call the tool
await no_args_tool()
# Verify tracing works with no arguments
mock_tracer.assert_called_once()
attributes = mock_tracer.call_args[1]["attributes"]
# tool_args should be None when there are no kwargs
assert attributes["mcp.tool.args"] is None
+587
View File
@@ -0,0 +1,587 @@
#!/usr/bin/env python3
"""RAG Evaluation Management CLI.
Commands:
generate - Generate ground truth answers from nfcorpus dataset
upload - Upload nfcorpus documents as Nextcloud notes
Usage:
# Generate ground truth
uv run python tools/rag_eval_cli.py generate
# Upload corpus to Nextcloud
uv run python tools/rag_eval_cli.py upload --nextcloud-url http://localhost:8000 --username admin --password admin
"""
import io
import json
import sys
import zipfile
from pathlib import Path
from typing import Any
import anyio
import click
import httpx
from datasets import load_dataset
from httpx import BasicAuth
# Add parent directory to path to import from tests/
sys.path.insert(0, str(Path(__file__).parent.parent))
from nextcloud_mcp_server.client import NextcloudClient
from tests.rag_evaluation.llm_providers import create_llm_provider
# Paths
FIXTURES_DIR = Path(__file__).parent.parent / "tests" / "rag_evaluation" / "fixtures"
CORPUS_DIR = FIXTURES_DIR / "nfcorpus"
GROUND_TRUTH_FILE = FIXTURES_DIR / "ground_truth.json"
NOTE_MAPPING_FILE = FIXTURES_DIR / "note_mapping.json"
# Dataset URL
NFCORPUS_URL = (
"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/nfcorpus.zip"
)
# Selected test queries (from ADR-013)
SELECTED_QUERIES = [
"PLAIN-2630", # Alkylphenol Endocrine Disruptors and Allergies
"PLAIN-2660", # How Long to Detox From Fish Before Pregnancy?
"PLAIN-2510", # Coffee and Artery Function
"PLAIN-2430", # Preventing Brain Loss with B Vitamins?
"PLAIN-2690", # Chronic Headaches and Pork Tapeworms
]
def ensure_corpus_downloaded(force_download: bool = False) -> Path:
"""Ensure nfcorpus dataset is downloaded to fixtures directory.
Args:
force_download: Force re-download even if corpus exists
Returns:
Path to corpus directory
Raises:
RuntimeError: If download fails
"""
if CORPUS_DIR.exists() and not force_download:
click.echo(f"Corpus already exists at {CORPUS_DIR}")
return CORPUS_DIR
click.echo(f"Downloading nfcorpus dataset to {CORPUS_DIR}...")
# Create fixtures directory
FIXTURES_DIR.mkdir(parents=True, exist_ok=True)
# Download using HuggingFace datasets library (handles caching)
try:
# Download corpus
click.echo(" Downloading corpus...")
corpus_dataset = load_dataset(
"BeIR/nfcorpus",
"corpus",
split="corpus",
)
# Download queries
click.echo(" Downloading queries...")
queries_dataset = load_dataset(
"BeIR/nfcorpus",
"queries",
split="queries",
)
# Save to local fixtures directory as JSONL
CORPUS_DIR.mkdir(parents=True, exist_ok=True)
# Save corpus
with open(CORPUS_DIR / "corpus.jsonl", "w") as f:
for doc in corpus_dataset:
f.write(json.dumps(doc) + "\n")
# Save queries
with open(CORPUS_DIR / "queries.jsonl", "w") as f:
for query in queries_dataset:
f.write(json.dumps(query) + "\n")
# Download qrels from BEIR directly (not available via HuggingFace)
click.echo(" Downloading qrels from BEIR ZIP...")
with httpx.Client(timeout=300.0) as client:
response = client.get(NFCORPUS_URL)
response.raise_for_status()
# Extract qrels from ZIP
with zipfile.ZipFile(io.BytesIO(response.content)) as zf:
# The qrels are in nfcorpus/qrels/test.tsv within the ZIP
qrels_path = "nfcorpus/qrels/test.tsv"
qrels_dir = CORPUS_DIR / "qrels"
qrels_dir.mkdir(parents=True, exist_ok=True)
qrels_content = zf.read(qrels_path).decode("utf-8")
with open(qrels_dir / "test.tsv", "w") as f:
f.write(qrels_content)
click.echo(f"Dataset downloaded to {CORPUS_DIR}")
return CORPUS_DIR
except Exception as e:
raise RuntimeError(f"Failed to download nfcorpus dataset: {e}") from e
def load_corpus(corpus_dir: Path) -> dict[str, dict]:
"""Load corpus documents from local directory.
Args:
corpus_dir: Path to corpus directory
Returns:
Dict mapping document ID to document data
"""
corpus = {}
with open(corpus_dir / "corpus.jsonl") as f:
for line in f:
doc = json.loads(line)
corpus[doc["_id"]] = doc
return corpus
def load_queries(corpus_dir: Path) -> dict[str, dict]:
"""Load queries from local directory.
Args:
corpus_dir: Path to corpus directory
Returns:
Dict mapping query ID to query data
"""
queries = {}
with open(corpus_dir / "queries.jsonl") as f:
for line in f:
query = json.loads(line)
queries[query["_id"]] = query
return queries
def load_qrels(corpus_dir: Path) -> dict[str, list[tuple[str, int]]]:
"""Load query relevance judgments from local directory.
Args:
corpus_dir: Path to corpus directory
Returns:
Dict mapping query ID to list of (doc_id, score) tuples
"""
qrels: dict[str, list[tuple[str, int]]] = {}
with open(corpus_dir / "qrels" / "test.tsv") as f:
next(f) # Skip header
for line in f:
query_id, corpus_id, score = line.strip().split("\t")
if query_id not in qrels:
qrels[query_id] = []
qrels[query_id].append((corpus_id, int(score)))
# Sort by score descending
for query_id in qrels:
qrels[query_id].sort(key=lambda x: x[1], reverse=True)
return qrels
async def generate_ground_truth_answer(
query_text: str, relevant_docs: list[dict[str, Any]], llm
) -> str:
"""Generate ground truth answer from highly relevant documents.
Args:
query_text: The query/question
relevant_docs: List of highly relevant documents (top 5)
llm: LLM provider instance
Returns:
Generated ground truth answer
"""
# Construct context from documents
context_parts = []
for i, doc in enumerate(relevant_docs, 1):
context_parts.append(
f"Document {i}:\nTitle: {doc['title']}\nText: {doc['text']}\n"
)
context = "\n".join(context_parts)
# Generate ground truth
prompt = f"""Based on the following medical/biomedical documents, provide a comprehensive, factual answer to this question.
Question: {query_text}
{context}
Instructions:
- Provide a clear, well-structured answer that synthesizes information from the documents
- Focus on accuracy and completeness
- Use specific facts and findings from the documents
- Keep the answer concise but informative (2-4 paragraphs)
- Do not make up information not present in the documents
Answer:"""
click.echo(f" Generating answer for: {query_text}")
answer = await llm.generate(prompt, max_tokens=500)
click.echo(f" Generated {len(answer)} characters")
return answer.strip()
@click.group()
def cli():
"""RAG Evaluation Management CLI.
Manage ground truth generation and corpus upload for RAG evaluation tests.
"""
pass
@cli.command()
@click.option(
"--provider",
type=click.Choice(["ollama", "anthropic"]),
default="ollama",
help="LLM provider to use for generation",
)
@click.option(
"--model",
help="Model name (default: llama3.2:1b for Ollama, claude-3-5-sonnet-20241022 for Anthropic)",
)
@click.option(
"--force-download",
is_flag=True,
help="Force re-download of nfcorpus dataset",
)
def generate(provider: str, model: str | None, force_download: bool):
"""Generate ground truth answers for RAG evaluation.
This command:
1. Downloads nfcorpus dataset (if not already cached)
2. For each selected query, extracts highly relevant documents
3. Uses an LLM to synthesize a reference answer
4. Saves ground truth to fixtures/ground_truth.json
Environment variables:
RAG_EVAL_PROVIDER: Provider type (ollama or anthropic)
RAG_EVAL_OLLAMA_BASE_URL: Ollama base URL
RAG_EVAL_OLLAMA_MODEL: Ollama model name
RAG_EVAL_ANTHROPIC_API_KEY: Anthropic API key
RAG_EVAL_ANTHROPIC_MODEL: Anthropic model name
"""
async def _generate():
click.echo("=" * 80)
click.echo("RAG Ground Truth Generation")
click.echo("=" * 80)
# Ensure corpus is downloaded
corpus_dir = ensure_corpus_downloaded(force_download)
# Load dataset
click.echo("\nLoading nfcorpus dataset...")
corpus = load_corpus(corpus_dir)
queries = load_queries(corpus_dir)
qrels = load_qrels(corpus_dir)
click.echo(f"Loaded {len(corpus)} documents, {len(queries)} queries")
# Create LLM provider
click.echo("\nInitializing LLM provider...")
try:
llm = create_llm_provider(
provider=provider,
ollama_model=model if provider == "ollama" else None,
anthropic_model=model if provider == "anthropic" else None,
)
provider_type = type(llm).__name__
click.echo(f"Using provider: {provider_type}")
except ValueError as e:
click.echo(f"\nError: {e}", err=True)
return 1
# Generate ground truth for each selected query
ground_truth_data = []
try:
for query_id in SELECTED_QUERIES:
if query_id not in queries:
click.echo(
f"\nWarning: Query {query_id} not found in dataset", err=True
)
continue
query = queries[query_id]
query_text = query["text"]
# Get highly relevant documents (score=2)
if query_id not in qrels:
click.echo(
f"\nWarning: No relevance judgments for {query_id}", err=True
)
continue
highly_relevant_doc_ids = [
doc_id for doc_id, score in qrels[query_id] if score == 2
]
if not highly_relevant_doc_ids:
click.echo(
f"\nWarning: No highly relevant docs for {query_id}", err=True
)
continue
# Get top 5 highly relevant documents
relevant_docs = []
for doc_id in highly_relevant_doc_ids[:5]:
if doc_id in corpus:
relevant_docs.append(corpus[doc_id])
if not relevant_docs:
click.echo(
f"\nWarning: Could not load documents for {query_id}", err=True
)
continue
# Generate ground truth answer
click.echo(f"\n{'-' * 80}")
ground_truth_answer = await generate_ground_truth_answer(
query_text, relevant_docs, llm
)
# Store result
ground_truth_data.append(
{
"query_id": query_id,
"query_text": query_text,
"ground_truth_answer": ground_truth_answer,
"expected_document_ids": highly_relevant_doc_ids,
"highly_relevant_count": len(highly_relevant_doc_ids),
}
)
click.echo(f" Preview: {ground_truth_answer[:200]}...")
finally:
await llm.close()
# Save ground truth
GROUND_TRUTH_FILE.parent.mkdir(parents=True, exist_ok=True)
with open(GROUND_TRUTH_FILE, "w") as f:
json.dump(ground_truth_data, f, indent=2)
click.echo(f"\n{'=' * 80}")
click.echo(f"Generated {len(ground_truth_data)} ground truth answers")
click.echo(f"Saved to: {GROUND_TRUTH_FILE}")
click.echo("=" * 80)
return 0
sys.exit(anyio.run(_generate))
@cli.command()
@click.option(
"--nextcloud-url",
envvar="NEXTCLOUD_HOST",
required=True,
help="Nextcloud base URL (e.g., http://localhost:8000)",
)
@click.option(
"--username",
envvar="NEXTCLOUD_USERNAME",
required=True,
help="Nextcloud username",
)
@click.option(
"--password",
envvar="NEXTCLOUD_PASSWORD",
required=True,
help="Nextcloud password",
)
@click.option(
"--category",
default="nfcorpus_rag_eval",
help="Category/folder for uploaded notes",
)
@click.option(
"--force-download",
is_flag=True,
help="Force re-download of nfcorpus dataset",
)
@click.option(
"--force",
is_flag=True,
help="Delete all existing notes in the target category before uploading",
)
def upload(
nextcloud_url: str,
username: str,
password: str,
category: str,
force_download: bool,
force: bool,
):
"""Upload nfcorpus corpus documents as Nextcloud notes.
This command:
1. Downloads nfcorpus dataset (if not already cached)
2. Optionally deletes existing notes in target category (--force)
3. Uploads all corpus documents as Nextcloud notes
4. Saves document ID → note ID mapping to fixtures/note_mapping.json
The note mapping file is used by pytest tests to map expected document IDs
to actual note IDs in Nextcloud.
"""
async def _upload():
click.echo("=" * 80)
click.echo("Upload nfcorpus Corpus to Nextcloud")
click.echo("=" * 80)
# Ensure corpus is downloaded
corpus_dir = ensure_corpus_downloaded(force_download)
# Load corpus
click.echo("\nLoading corpus...")
corpus = load_corpus(corpus_dir)
click.echo(f"Loaded {len(corpus)} documents")
# Create Nextcloud client
click.echo(f"\nConnecting to Nextcloud at {nextcloud_url}...")
nc_client = NextcloudClient(
base_url=nextcloud_url,
username=username,
auth=BasicAuth(username, password),
)
try:
# Delete existing notes in category if force is specified
if force:
click.echo(
f"\n--force specified: Deleting existing notes in category '{category}'..."
)
# Collect notes to delete
notes_to_delete = []
async for note in nc_client.notes.get_all_notes():
if note.get("category") == category:
notes_to_delete.append(note["id"])
if not notes_to_delete:
click.echo(f"No existing notes found in category '{category}'")
else:
click.echo(f"Found {len(notes_to_delete)} notes to delete")
deleted_count = 0
delete_errors = []
delete_semaphore = anyio.Semaphore(20)
async def delete_note(note_id: int):
"""Delete a single note."""
nonlocal deleted_count
async with delete_semaphore:
try:
await nc_client.notes.delete_note(note_id)
deleted_count += 1
if deleted_count % 100 == 0:
click.echo(f" Deleted {deleted_count} notes...")
except Exception as e:
error_msg = f"Error deleting note {note_id}: {e}"
delete_errors.append(error_msg)
click.echo(f" {error_msg}", err=True)
# Delete all notes concurrently
async with anyio.create_task_group() as tg:
for note_id in notes_to_delete:
tg.start_soon(delete_note, note_id)
click.echo(
f"Deleted {deleted_count} existing notes in category '{category}'"
)
if delete_errors:
click.echo(
f"Encountered {len(delete_errors)} errors during deletion",
err=True,
)
# Upload documents concurrently
click.echo(f"\nUploading {len(corpus)} documents as notes (concurrent)...")
click.echo(f"Category: {category}")
note_mapping = {}
uploaded_count = 0
upload_errors = []
# Semaphore to limit concurrent uploads (avoid overwhelming server)
max_concurrent = 20
semaphore = anyio.Semaphore(max_concurrent)
async def upload_document(doc_id: str, doc: dict[str, Any]):
"""Upload a single document as a note."""
nonlocal uploaded_count
async with semaphore:
title = f"[{doc_id}] {doc['title'][:100]}" # Truncate long titles
content = doc["text"]
try:
note_data = await nc_client.notes.create_note(
title=title,
content=content,
category=category,
)
# Store mapping
note_id = note_data["id"]
note_mapping[doc_id] = note_id
uploaded_count += 1
# Progress indicator every 100 docs
if uploaded_count % 100 == 0:
click.echo(
f" Uploaded {uploaded_count}/{len(corpus)} documents..."
)
except Exception as e:
error_msg = f"Error uploading {doc_id}: {e}"
upload_errors.append(error_msg)
click.echo(f" {error_msg}", err=True)
# Upload all documents concurrently using task group
async with anyio.create_task_group() as tg:
for doc_id, doc in corpus.items():
tg.start_soon(upload_document, doc_id, doc)
click.echo(f"\nUploaded {uploaded_count} documents successfully")
if upload_errors:
click.echo(
f"Encountered {len(upload_errors)} errors during upload", err=True
)
# Save note mapping
with open(NOTE_MAPPING_FILE, "w") as f:
json.dump(note_mapping, f, indent=2)
click.echo(f"Saved note mapping to: {NOTE_MAPPING_FILE}")
click.echo(f" Mapped {len(note_mapping)} document IDs to note IDs")
finally:
# Close the Nextcloud client
await nc_client.close()
click.echo("=" * 80)
click.echo("Upload complete!")
click.echo("=" * 80)
return 0
sys.exit(anyio.run(_upload))
if __name__ == "__main__":
cli()
Generated
+1545 -162
View File
File diff suppressed because it is too large Load Diff