Compare commits

87 Commits

Author SHA1 Message Date
Chris Coutinho c4ce28f05d fix: Improve 3D plot rendering with explicit dimensions and window resize support
- Get container dimensions before creating Plotly layout to render at correct size immediately
- Add init() method with window resize listener for responsive plot sizing
- Remove post-render resize call (no longer needed with explicit dimensions)
- Improve colorbar positioning and scene domain configuration

This eliminates the visual "jump" during initial render and ensures the plot resizes smoothly when the browser window changes size.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 19:43:20 +01:00
Chris Coutinho c126c3ec03 fix: Preserve 3D plot camera and improve documentation
This commit addresses PR feedback and fixes plot camera behavior.

## JavaScript Fix - Camera Preservation
- Changed plot update strategy from recreating layout to using Plotly.restyle()
- Query point visibility now toggles via restyle() which only modifies trace visibility
- Camera position/zoom naturally preserved since layout remains untouched
- Resolves jumpy plot behavior when toggling "Show Query Point" checkbox

Related: nextcloud_mcp_server/auth/static/vector-viz.js:58-73

## Documentation Improvements
- Condensed vector-sync-ui.md from 316 to 94 lines (~70% reduction)
- Removed redundant FAQ section (content merged into main sections)
- Simplified use cases from 4 detailed sections to 3 focused paragraphs
- Streamlined troubleshooting to 3 common issues
- Merged technical details into overview section
- Retained all essential information while improving readability

## Screenshot Updates
Removed old/outdated images (5 files):
- rag-workflow-bidirectional-final.png
- rag-workflow-prominent-llm.png
- rag-workflow-simple-final.png
- vector-viz-interface.png
- welcome-page.png

Replaced with current screenshots (3 files):
- vector-viz-document-types-2col.png - Now shows plot + results
- vector-viz-chunk-context.png - Centered content view
- vector-viz-results.png - Updated results list

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 14:10:53 +01:00
Chris Coutinho 9bd02d7ef7 fix: Preserve 3D plot camera position and fix CSS loading
Two fixes for the vector visualization page:

1. **CSS Loading Fix**: Moved CSS <link> from vector_viz.html fragment
   to user_info.html <head> block. HTMX fragments don't process <link>
   tags in <head>, which left the page unstyled. The CSS now loads correctly.

2. **Camera Preservation**: Modified renderPlot() to preserve camera
   position when toggling query point visibility. Previously, toggling
   the "Show Query Point" checkbox would reset zoom/rotation to default.
   Now reads existing camera settings from plot before updating.

Related: nextcloud_mcp_server/auth/static/vector-viz.js:123-130
Related: nextcloud_mcp_server/auth/templates/user_info.html:12

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 13:51:08 +01:00
Chris Coutinho 53689d076b feat: Improve vector visualization with static assets and fixes
- Extract CSS and JavaScript into separate static files
  - Created nextcloud_mcp_server/auth/static/vector-viz.css
  - Created nextcloud_mcp_server/auth/static/vector-viz.js
  - Updated templates to reference external assets

- Fix vector visualization issues:
  - Normalize vectors before PCA to match Qdrant's cosine distance
  - Add zero-norm and NaN detection/handling for large datasets
  - Enable responsive Plotly sizing (autosize + responsive config)
  - Widen plot area to full viewport width with minimized margins

- Improve visualization accuracy:
  - Query point now positioned correctly relative to documents
  - Handles 200+ points without JSON serialization errors
  - Full-width plot maximizes screen space utilization
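
The normalization step can be sketched as follows. Once every vector has unit length, squared Euclidean distance equals 2 - 2*cos(a, b), so a PCA projection of the normalized vectors reflects the same geometry as cosine distance. This is a minimal stdlib sketch, not the project's code: the function name `normalize_rows` and the choice to drop bad rows (rather than repair them) are assumptions.

```python
import math


def normalize_rows(vectors):
    """L2-normalize each vector, skipping rows that cannot be normalized.

    Returns (kept, dropped): the unit-length vectors and the indices of
    rows that were zero-norm or contained NaN/inf. Dropping such rows is
    an assumption made for this sketch.
    """
    kept, dropped = [], []
    for idx, vec in enumerate(vectors):
        # NaN/inf values would poison the PCA and break JSON serialization.
        if not all(math.isfinite(x) for x in vec):
            dropped.append(idx)
            continue
        norm = math.sqrt(sum(x * x for x in vec))
        # A zero vector has no direction, so cosine distance is undefined.
        if norm == 0.0:
            dropped.append(idx)
            continue
        kept.append([x / norm for x in vec])
    return kept, dropped


unit, skipped = normalize_rows([[3.0, 4.0], [0.0, 0.0], [float("nan"), 1.0]])
```

The query vector would need the same treatment before projection so that it lands in the same space as the documents.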

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 04:10:44 +01:00
Chris Coutinho 9db20a4d01 feat: Redesign UI to match Nextcloud ecosystem aesthetic
This commit updates the web interface to better align with Nextcloud's
design system and improve the Vector Viz layout.

Changes:
- Replace emoji icons with Material Design SVG icons for better
  consistency with Nextcloud apps
- Simplify navigation styling with minimal padding and subtle active
  states (250px width)
- Update CSS variables to match Nextcloud design system
- Restructure Vector Viz from two-column to single-column vertical
  layout for better plot visibility
- Move search controls to compact horizontal grid at top
- Make navigation toggle always visible (not just on mobile)
- Fix plot container sizing with overflow:visible to prevent colorbar
  clipping
- Remove heavy shadows and custom card styling for cleaner aesthetic
- Add error and success page templates with consistent styling

Technical details:
- Preserve Alpine.js for reactive functionality
- Use CSS Grid for responsive horizontal controls layout
- Add smooth transitions for navigation collapse/expand
- Maintain HTMX for dynamic content loading

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 00:45:19 +01:00
Chris Coutinho 73e8012707 Merge pull request #325 from cbcoutinho/renovate/docker.io-library-python-3.12-slim-trixie
chore(deps): update docker.io/library/python:3.12-slim-trixie docker digest to 2bbc83f
2025-11-18 14:06:14 +01:00
Chris Coutinho c2fd87a5d3 Merge pull request #324 from cbcoutinho/renovate/docker.io-library-nextcloud-32.0.1
chore(deps): update docker.io/library/nextcloud:32.0.1 docker digest to f6232ea
2025-11-18 14:03:38 +01:00
github-actions[bot] 441d94301e bump: version 0.42.0 → 0.43.0 2025-11-18 12:56:15 +00:00
Chris Coutinho b488d69939 Merge pull request #326 from cbcoutinho/feature/notes2
feat: Replace custom document chunker with LangChain MarkdownTextSplitter
2025-11-18 13:55:34 +01:00
Chris Coutinho eec923eff5 feat: Replace custom document chunker with LangChain MarkdownTextSplitter
Migrates from custom word-based chunking to LangChain's MarkdownTextSplitter
for better semantic search quality. This implements the chunking portion of
ADR-011.

Changes:
- Replace custom regex word chunker with MarkdownTextSplitter
- Optimized for Markdown content (headers, code blocks, lists)
- Convert from word-based (512 words) to character-based (2048 chars) chunking
- Maintain backward-compatible ChunkWithPosition interface
- Update configuration defaults and validation
- Update all unit tests (12/12 passing)

Benefits:
- Respects markdown structure boundaries
- Never breaks code blocks or headers mid-chunk
- Preserves semantic coherence within chunks
- Expected 20-30% improvement in recall quality
- Industry-standard approach (used by production RAG systems)

Note: Full reindex required to apply new chunking to existing documents.
Current vector database still contains old word-based chunks.

Related: ADR-011 (Improving Semantic Search Quality)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 12:17:23 +01:00
renovate-bot-cbcoutinho[bot] 3642faf32c chore(deps): update docker.io/library/python:3.12-slim-trixie docker digest to 2bbc83f 2025-11-18 11:08:08 +00:00
renovate-bot-cbcoutinho[bot] 3b1cd96722 chore(deps): update docker.io/library/nextcloud:32.0.1 docker digest to f6232ea 2025-11-18 11:08:03 +00:00
Chris Coutinho 219d064459 Merge pull request #321 from cbcoutinho/renovate/pin-dependencies
chore(deps): pin ghcr.io/astral-sh/uv docker tag to 29bd450
2025-11-18 00:15:32 +01:00
Chris Coutinho d0ab8d071a Merge pull request #322 from cbcoutinho/renovate/actions-checkout-digest
chore(deps): update actions/checkout digest to 93cb6ef
2025-11-18 00:15:20 +01:00
Chris Coutinho b792e9d9a3 Merge pull request #323 from cbcoutinho/renovate/docker.io-library-mariadb-lts
chore(deps): update docker.io/library/mariadb:lts docker digest to 1cac849
2025-11-18 00:14:46 +01:00
renovate-bot-cbcoutinho[bot] 4288814ff4 chore(deps): update docker.io/library/mariadb:lts docker digest to 1cac849 2025-11-17 23:11:14 +00:00
renovate-bot-cbcoutinho[bot] f34a1c5677 chore(deps): update actions/checkout digest to 93cb6ef 2025-11-17 23:11:10 +00:00
renovate-bot-cbcoutinho[bot] 6d48f90112 chore(deps): pin ghcr.io/astral-sh/uv docker tag to 29bd450 2025-11-17 23:11:04 +00:00
Chris Coutinho b72aeca55f test: Add custom notes app 2025-11-17 22:14:01 +01:00
Chris Coutinho c1ae818b75 Merge pull request #317 from cbcoutinho/renovate/ghcr.io-astral-sh-uv-latest
chore(deps): update ghcr.io/astral-sh/uv:latest docker digest to 29bd450
2025-11-17 19:40:24 +01:00
Chris Coutinho ebca2bfc70 build: pin uv to 0.9.10, use --no-cache 2025-11-17 19:33:15 +01:00
Chris Coutinho 6dcd0bae48 Merge pull request #318 from cbcoutinho/renovate/actions-checkout-5.x
chore(deps): update actions/checkout action to v5.0.1
2025-11-17 19:23:32 +01:00
Chris Coutinho 818f643dca Merge pull request #319 from cbcoutinho/renovate/qdrant-1.x
chore(deps): update helm release qdrant to v1.16.0
2025-11-17 19:23:25 +01:00
Chris Coutinho d31b490f13 Merge pull request #320 from cbcoutinho/renovate/qdrant-qdrant-1.x
chore(deps): update qdrant/qdrant docker tag to v1.16.0
2025-11-17 19:23:16 +01:00
renovate-bot-cbcoutinho[bot] 839cf159b8 chore(deps): update qdrant/qdrant docker tag to v1.16.0 2025-11-17 17:09:02 +00:00
renovate-bot-cbcoutinho[bot] cefb438017 chore(deps): update helm release qdrant to v1.16.0 2025-11-17 17:08:54 +00:00
renovate-bot-cbcoutinho[bot] efc78a835e chore(deps): update actions/checkout action to v5.0.1 2025-11-17 17:08:34 +00:00
renovate-bot-cbcoutinho[bot] fa25a1b4df chore(deps): update ghcr.io/astral-sh/uv:latest docker digest to 29bd450 2025-11-17 17:08:28 +00:00
github-actions[bot] 8367208a03 bump: version 0.41.0 → 0.42.0 2025-11-17 07:25:33 +00:00
Chris Coutinho 52acc4bc07 Merge pull request #316 from cbcoutinho/feature/cleanup
feat(viz): Add dual-score display and improve UI controls
2025-11-17 08:25:04 +01:00
Chris Coutinho d374bfa1e5 feat(viz): Add dual-score display and improve UI controls
This commit enhances the vector visualization interface with better score
transparency and improved UX:

**Dual-Score Display:**
- Store original algorithm scores before normalization (viz_routes.py:203)
- Display both raw and normalized scores: "Raw Score: 0.842 (89% relative)"
- Update plot hover text with dual scores (userinfo_routes.py:740)
- Fixes issue where all queries showed at least one 100% match regardless
  of actual relevance (normalization artifact)

**UI Improvements:**
1. Fusion Method dropdown: Changed from x-show to :disabled
   - Prevents jarring layout shift when switching algorithms
   - Dropdown stays visible but grayed out when Semantic is selected
   - Better UX with opacity: 0.5 and cursor: not-allowed

2. Score Threshold: Changed step from 0.1 to "any"
   - Allows arbitrary float precision (0.7, 0.85, 0.123)
   - Users can now fine-tune threshold values

3. Document Types: Converted multi-select to checkbox grid
   - Replaced clunky Ctrl/Cmd multi-select listbox
   - Checkbox grid with cleaner layout
   - Positioned left of Score Threshold and Result Limit inputs
   - More intuitive UX

**Technical Details:**
- Raw score ranges vary by algorithm:
  - Semantic: 0.0-1.0 (cosine similarity)
  - BM25 RRF: ~0.001-0.033 (Reciprocal Rank Fusion)
  - BM25 DBSF: Can exceed 1.0 (Distribution-Based Score Fusion)
- Normalized scores (0-1) used for visual encoding (marker size, color)
- Original scores preserved in API response via getattr fallback

Files modified:
- nextcloud_mcp_server/auth/viz_routes.py (store original_score)
- nextcloud_mcp_server/auth/templates/vector_viz.html (UI controls)
- nextcloud_mcp_server/auth/userinfo_routes.py (plot hover text)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 08:05:49 +01:00
github-actions[bot] b1f7b1d30b bump: version 0.40.0 → 0.41.0 2025-11-17 05:57:12 +00:00
Chris Coutinho b8bdbb499f Merge pull request #315 from cbcoutinho/feature/cleanup
Feature/cleanup
2025-11-17 06:56:43 +01:00
Chris Coutinho 2522b13d35 ci: Add unit tests to ci 2025-11-17 06:51:40 +01:00
Chris Coutinho 6cfd7e2729 feat: add configurable fusion algorithms for BM25 hybrid search
Added support for two fusion algorithms (RRF and DBSF) to combine dense
semantic and sparse BM25 search results, with comprehensive documentation
and unit tests.

Changes:
- Added fusion parameter to nc_semantic_search and nc_semantic_search_answer tools
- Updated ADR-014 with detailed comparison of RRF vs DBSF fusion algorithms
- Added unit tests for fusion algorithm initialization and validation
- Updated search_method in responses to include fusion type (e.g., "bm25_hybrid_rrf")

Fusion Algorithms:
- RRF (Reciprocal Rank Fusion): Default, rank-based, general-purpose
- DBSF (Distribution-Based Score Fusion): Score normalization using statistics

RRF is recommended for most use cases due to its robustness and established
track record. DBSF may provide better results when retrieval systems have
very different score distributions.
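
To illustrate why RRF is described as rank-based, here is a minimal sketch of the textbook formula, where each document scores the sum of 1 / (k + rank) over the lists it appears in. The function name `rrf_fuse` and the constant k=60 (the value commonly used in the RRF literature) are assumptions; the commit does not state which constant the search backend uses.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Only ranks matter: the raw scores of the underlying retrievers are
    ignored, which is what makes RRF robust to mismatched score scales.
    k=60 is an assumed, conventional constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense = ["a", "b", "c"]   # hypothetical dense (semantic) ranking
sparse = ["b", "d", "a"]  # hypothetical sparse (BM25) ranking
fused = rrf_fuse([dense, sparse])
```

With two input lists and k=60, the highest possible fused score is 2/61, roughly 0.033, so RRF scores are small by construction and should not be read as similarity percentages.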

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 06:48:43 +01:00
Chris Coutinho 3aa7128f45 feat: add chunk position tracking to vector indexing and search
Track character offsets (start_offset, end_offset) for each chunk in vector
database metadata, enabling precise chunk highlighting in visualization pane.

Changes:
- processor.py: Store chunk_start_offset and chunk_end_offset in Qdrant metadata
- processor.py: Added metadata_version=2 to indicate position tracking support
- search/semantic.py: Return chunk positions from search results
- server/semantic.py: Expose chunk positions in API responses (SemanticSearchResult)

Enables viz pane to:
1. Display exact matched chunk with surrounding context
2. Highlight the precise portion of text that matched the query
3. Build user trust by showing what the RAG system actually retrieved

Position tracking uses ChunkWithPosition dataclass from document_chunker.py
which provides character-accurate offsets in the original document.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 06:47:58 +01:00
Chris Coutinho c3282534eb feat: add vector viz template and chunk context endpoint
Extracted vector visualization HTML template to separate file to resolve
syntax conflicts between Jinja2, Alpine.js, and CSS. Added chunk context
endpoint for fetching matched chunks with surrounding text.

Changes:
- Moved vector_viz.html to templates/ directory (separates Jinja2/Alpine.js/CSS)
- Added /app/chunk-context endpoint for retrieving chunk text with context
- Updated .dockerignore to include HTML files in Docker builds
- Moved anthropic and boto3 to main dependencies (needed for production features)
- Added jinja2 dependency for template rendering

Fixes Jinja2 TemplateSyntaxError caused by CSS colons being parsed as
Jinja2 syntax when template was inline in Python code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 06:46:52 +01:00
Chris Coutinho 862308418e fix: prevent infinite loop in DocumentChunker with position tracking
Fixed a critical infinite loop bug in document_chunker.py that occurred
when the overlap parameter caused the chunker to not make forward progress.

Changes:
- Added ChunkWithPosition dataclass to track character positions
- Refactored chunk_text() to use regex word matching for accurate position tracking
- Added safety check to ensure forward progress (next_start_idx > start_idx)
- Changed return type from list[str] to list[ChunkWithPosition]

The bug manifested when:
1. end_idx reached len(word_matches) (processing last chunk)
2. next_start_idx = end_idx - overlap would not advance past start_idx
3. Loop would continue indefinitely without making progress

Fix ensures chunker always terminates by breaking when not advancing.
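
The forward-progress guard can be sketched like this. It is a simplified stand-in for the word-based chunker, not the code in document_chunker.py: the field names on `ChunkWithPosition`, the `\S+` word pattern, and the default sizes are assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class ChunkWithPosition:
    text: str
    start_offset: int  # character offset in the original document
    end_offset: int    # exclusive end offset


def chunk_text(content, chunk_size=512, overlap=50):
    """Split content into overlapping word-based chunks with offsets."""
    word_matches = list(re.finditer(r"\S+", content))
    chunks = []
    start_idx = 0
    while start_idx < len(word_matches):
        end_idx = min(start_idx + chunk_size, len(word_matches))
        start_char = word_matches[start_idx].start()
        end_char = word_matches[end_idx - 1].end()
        chunks.append(
            ChunkWithPosition(content[start_char:end_char], start_char, end_char)
        )
        if end_idx >= len(word_matches):
            break  # last chunk emitted
        next_start_idx = end_idx - overlap
        # Safety check: stop if the overlap would keep us from advancing.
        if next_start_idx <= start_idx:
            break
        start_idx = next_start_idx
    return chunks


doc = "a b c d e f g"
chunks = chunk_text(doc, chunk_size=3, overlap=1)
```

Breaking guarantees termination, but with an overlap at least as large as the chunk size it also stops before the end of the text; a stricter variant could reject such a configuration up front.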

All 9 unit tests now pass in 1.66s (previously timing out at 180s).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 06:39:15 +01:00
Chris Coutinho 3464b21845 fix: Relax SearchResult validation to support DBSF fusion scores > 1.0
Fix false-positive validation error where DBSF (Distribution-Based Score
Fusion) correctly produces scores > 1.0 but SearchResult validation
incorrectly rejected them.

**Root Cause**: SearchResult.__post_init__() enforced scores in [0.0, 1.0]
range, but DBSF sums normalized scores from multiple retrieval systems
(dense semantic + sparse BM25), resulting in scores like 1.55 when both
systems strongly agree a document is relevant.

**Changes**:
- Relaxed validation to allow any score ≥ 0.0 (algorithms.py:147-157)
- Updated SearchResult and SemanticSearchResult documentation to explain
  score ranges for RRF ([0.0, 1.0]) vs DBSF (unbounded)
- Added comprehensive test coverage for both fusion methods
- Added DBSF fusion option to vector visualization UI
- Updated viz routes and vizApp() to support fusion parameter selection
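
A minimal sketch of the relaxed check, assuming a dataclass shaped roughly like `SearchResult`. The field names and the error wording here are illustrative, not copied from algorithms.py.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    id: int       # hypothetical field
    title: str    # hypothetical field
    score: float

    def __post_init__(self):
        # Only a lower bound is enforced: DBSF sums normalized scores
        # from several retrievers, so values above 1.0 are legitimate.
        if self.score < 0.0:
            raise ValueError(f"Score must be non-negative, got {self.score}")


dbsf_hit = SearchResult(id=1, title="Example note", score=1.55)
```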

**Testing**: All 157 unit tests pass, type checking passes, ruff passes

Fixes error: "Configuration error: Score must be between 0.0 and 1.0, got 1.1528953"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 06:32:30 +01:00
Chris Coutinho ea01ce7673 Merge pull request #311 from cbcoutinho/renovate/python-replacement
chore(deps): replace python docker tag with docker.io/library/python
2025-11-16 12:11:52 +01:00
Chris Coutinho 216cb94383 Merge branch 'master' into renovate/python-replacement 2025-11-16 12:11:36 +01:00
Chris Coutinho 5f3e0b84a3 Merge pull request #310 from cbcoutinho/renovate/pin-dependencies
chore(deps): pin dependencies
2025-11-16 12:10:57 +01:00
github-actions[bot] 39131cefcc bump: version 0.39.0 → 0.40.0 2025-11-16 11:09:40 +00:00
Chris Coutinho 9498c0fa36 Merge pull request #309 from cbcoutinho/feature/bedrock
feat: Unified Provider Architecture + Amazon Bedrock Support
2025-11-16 12:09:12 +01:00
Chris Coutinho ed33b39062 docs: fix ADR-014 template text and numbering
- Remove template instruction text from line 1
- Fix ADR numbering from 007 to 014 to match filename
2025-11-16 12:08:37 +01:00
Chris Coutinho 1504df6fb5 Merge branch 'master' into feature/bedrock 2025-11-16 12:08:23 +01:00
renovate-bot-cbcoutinho[bot] 392e1536b9 chore(deps): replace python docker tag with docker.io/library/python 2025-11-16 11:07:34 +00:00
renovate-bot-cbcoutinho[bot] 00ed3f07e5 chore(deps): pin dependencies 2025-11-16 11:07:28 +00:00
github-actions[bot] 050e9a56b9 bump: version 0.38.0 → 0.39.0 2025-11-16 11:02:48 +00:00
Chris Coutinho 7fccd47722 Merge pull request #304 from cbcoutinho/feature/bm25
feat: Replace custom keyword search with BM25 hybrid search via Qdrant
2025-11-16 12:02:18 +01:00
Chris Coutinho f65b95ef07 Update Dockerfile 2025-11-16 11:58:13 +01:00
Chris Coutinho c28fc955ca Merge origin/master into feature/bm25
Resolved conflicts:
- viz_routes.py: Kept bm25's extract_dense_vector() function for robust vector handling
- hybrid.py: Removed (bm25 uses native Qdrant RRF fusion instead)
- uv.lock: Regenerated after accepting master's dependencies

This merge brings in:
- RAG evaluation framework (ADR-013)
- Performance optimizations (double-fetch elimination)
- Migration from asyncio to anyio
- OpenTelemetry tracing improvements
- Notes app enhancements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 11:52:40 +01:00
Chris Coutinho ad4b45889f fix: suppress Starlette middleware type warnings in ty checker 2025-11-16 11:43:50 +01:00
Chris Coutinho 5b484c9226 feat: add unified provider architecture with Amazon Bedrock support
Refactored LLM provider infrastructure to support sustainable additions of new providers with both embedding and text generation capabilities.

## Major Changes

### Unified Provider Architecture (ADR-015)
- Created `nextcloud_mcp_server/providers/` with unified Provider ABC
- Providers now support optional capabilities (embeddings and/or generation)
- Auto-detection registry with priority: Bedrock → Ollama → Simple
- Backward compatible - existing code continues to work

### New Providers
- **BedrockProvider**: Full Amazon Bedrock integration
  - Embeddings: Titan Embed, Cohere Embed models
  - Generation: Claude, Llama, Titan Text, Mistral models
  - Model-specific request/response handling
  - AWS credential chain integration
- **OllamaProvider**: Migrated with both capabilities support
- **AnthropicProvider**: Moved from test code to production providers
- **SimpleProvider**: Migrated in-memory fallback provider

### Breaking Changes
None - full backward compatibility maintained:
- `embedding.get_embedding_service()` still works
- RAG evaluation tests updated to use unified providers
- All existing tests pass (127 unit tests)

### Testing
- Added 9 comprehensive Bedrock unit tests with mocked boto3
- All existing unit tests pass
- Type checking (ty) and linting (ruff) pass
- Verified backward compatibility

### Documentation
- `docs/ADR-015-unified-provider-architecture.md`: Comprehensive ADR
- `docs/bedrock-setup.md`: AWS setup guide with IAM permissions
- `CLAUDE.md`: Updated with provider architecture section

### Dependencies
- Added `boto3>=1.35.0` to dev dependencies (optional)

## Environment Variables

### Bedrock
- `AWS_REGION`: AWS region (e.g., "us-east-1")
- `BEDROCK_EMBEDDING_MODEL`: Model ID for embeddings
- `BEDROCK_GENERATION_MODEL`: Model ID for generation
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`: Optional credentials

### Ollama
- `OLLAMA_BASE_URL`: API URL
- `OLLAMA_EMBEDDING_MODEL`: Embedding model (default: "nomic-embed-text")
- `OLLAMA_GENERATION_MODEL`: Generation model

## AWS Bedrock Permissions Required

Minimal IAM policy:
```json
{
  "Effect": "Allow",
  "Action": ["bedrock:InvokeModel"],
  "Resource": ["arn:aws:bedrock:*::foundation-model/*"]
}
```

See `docs/bedrock-setup.md` for detailed setup instructions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 11:36:58 +01:00
github-actions[bot] b58b200452 bump: version 0.37.0 → 0.38.0 2025-11-16 10:18:37 +00:00
Chris Coutinho c1aad94aa7 Merge pull request #308 from cbcoutinho/revert-305-feature/notes
Revert "Feature/notes"
2025-11-16 11:18:12 +01:00
github-actions[bot] 10129354d9 bump: version 0.36.0 → 0.37.0 2025-11-16 10:18:00 +00:00
Chris Coutinho 259d33b41d Revert "Feature/notes" 2025-11-16 11:17:59 +01:00
Chris Coutinho 32d8eaaab6 Merge pull request #305 from cbcoutinho/feature/notes
Feature/notes
2025-11-16 11:17:51 +01:00
Chris Coutinho 8799450c7d Merge pull request #306 from cbcoutinho/rag-evaluation
feat: RAG evaluation framework with performance improvements
2025-11-16 11:17:41 +01:00
Chris Coutinho 1a02819999 Merge pull request #307 from cbcoutinho/feature/mcp-tool-tracing
feat: Add OpenTelemetry tracing to @instrument_tool decorator
2025-11-16 11:17:33 +01:00
Chris Coutinho c4bf077050 feat: Add OpenTelemetry tracing to @instrument_tool decorator
Enhances the @instrument_tool decorator to create distributed traces
for all MCP tool executions, improving observability and debugging.

Changes:
- Modified @instrument_tool to wrap tool execution in trace_operation
- Added automatic span creation with mcp.tool.* span names
- Sanitized tool arguments before adding to span attributes
  (excludes password, token, secret, api_key, etag, ctx)
- Limited argument strings to 500 characters to prevent huge spans
- Maintained existing Prometheus metrics functionality
- Updated docs/observability.md to reflect correct decorator name
- Added comprehensive unit tests

All 50+ MCP tools now emit traces automatically without code changes.
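
The argument sanitization described above might look like the following sketch. The excluded key names and the 500-character limit come from the commit message; the exact-match rule, the helper name `sanitize_tool_args`, and the `mcp.tool.arg.` attribute prefix are assumptions.

```python
SENSITIVE_KEYS = {"password", "token", "secret", "api_key", "etag", "ctx"}
MAX_ATTR_LEN = 500


def sanitize_tool_args(kwargs):
    """Turn tool keyword arguments into span-safe string attributes.

    Sensitive keys are dropped entirely and every value is stringified
    and truncated so a single large argument cannot bloat the span.
    """
    attrs = {}
    for key, value in kwargs.items():
        if key.lower() in SENSITIVE_KEYS:
            continue
        # Hypothetical attribute naming scheme.
        attrs[f"mcp.tool.arg.{key}"] = str(value)[:MAX_ATTR_LEN]
    return attrs


attrs = sanitize_tool_args(
    {"query": "meeting notes", "password": "hunter2", "content": "x" * 2000}
)
```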

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 11:16:05 +01:00
Chris Coutinho f559ca049e Merge branch 'rag-evaluation' 2025-11-16 10:26:19 +01:00
Chris Coutinho 02700a8e2c perf: Eliminate double-fetching in semantic search sampling
Performance optimization that removes redundant verification step and
makes content fetching parallel in nc_semantic_search_answer tool.

Changes:
- Remove verification.py module (only had 1 caller)
- Refactor nc_semantic_search to do inline deduplication instead of
  calling verify_search_results()
- Migrate verification patterns (anyio task group, semaphore limiting)
  to nc_semantic_search_answer's content fetching
- Change content fetching from sequential loop to parallel execution

Performance impact:
- Before: 10 API calls (5 parallel verification + 5 sequential content)
  = ~5.5s overhead
- After: 5 API calls (parallel content fetch) = ~0.5s overhead
- Result: 50% fewer API calls, ~10x faster for sampling operations

Technical details:
- Uses anyio.create_task_group() for structured concurrency
- Semaphore limiting (max_concurrent=20) prevents connection pool exhaustion
- Index-based storage maintains result ordering
- Expected failures (deleted notes) logged at debug level
- Deduplication handles hybrid search returning same doc from dense + sparse
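
The parallel fetch pattern can be sketched as below. To keep the example runnable without third-party packages it uses the standard library's asyncio.gather() and asyncio.Semaphore as stand-ins for anyio.create_task_group() and anyio.Semaphore; `fetch_all`, `fake_fetch`, and treating LookupError as the "deleted note" case are assumptions.

```python
import asyncio


async def fetch_all(ids, fetch_one, max_concurrent=20):
    """Fetch all items concurrently while preserving input order."""
    semaphore = asyncio.Semaphore(max_concurrent)
    # Index-based storage: each worker writes to its own slot, so the
    # result order matches the input order regardless of completion order.
    results = [None] * len(ids)

    async def worker(index, item_id):
        async with semaphore:  # cap in-flight requests
            try:
                results[index] = await fetch_one(item_id)
            except LookupError:
                # Expected failure (e.g. a deleted note): leave slot empty.
                pass

    await asyncio.gather(*(worker(i, item_id) for i, item_id in enumerate(ids)))
    return results


async def fake_fetch(note_id):
    """Hypothetical fetcher standing in for a real API call."""
    await asyncio.sleep(0)
    if note_id == 3:
        raise LookupError("note deleted")
    return f"note-{note_id}"


contents = asyncio.run(fetch_all([1, 2, 3], fake_fetch))
```

Unlike an anyio task group, gather() does not cancel sibling tasks when one raises an unexpected error, so this stand-in shows the ordering and rate limiting but not the structured-concurrency guarantees.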

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 10:25:04 +01:00
Chris Coutinho 8e7b3c3ded Merge branch 'feature/notes' 2025-11-16 09:18:58 +01:00
Chris Coutinho c74695af16 Merge branch 'feature/notes' 2025-11-16 08:28:00 +01:00
Chris Coutinho 1faf572546 Merge branch 'feature/bm25'
Resolves conflict in viz_routes.py by combining:
- Named vector extraction from feature/bm25
- Performance timing from master
2025-11-16 08:18:39 +01:00
Chris Coutinho 944b6dcf5a fix: Handle named vectors in visualization and semantic search
- viz_routes.py: Extract "dense" vector from named vector dict
- semantic.py: Specify using="dense" for BM25 hybrid collections
- Fixes "X must be 2D array" error in hybrid search
- Fixes "Dense vector  is not found" error in semantic search

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 08:16:35 +01:00
Chris Coutinho 2aa82d849c Merge branch 'feature/bm25' 2025-11-16 07:57:36 +01:00
Chris Coutinho fc6a2f14e4 fix: Update vizApp to use bm25_hybrid algorithm and remove deprecated weights
The visualization UI was still using the old 'hybrid' algorithm name and
weight parameters that were replaced by the BM25 hybrid search refactor.
This caused "Unknown algorithm: hybrid" errors when using the search
& visualize feature.

Changes:
- Update default algorithm from 'hybrid' to 'bm25_hybrid'
- Update default scoreThreshold from 0.7 to 0.0 to match backend
- Remove deprecated semanticWeight, keywordWeight, fuzzyWeight parameters
- Remove weight parameters from search request

Fixes the visualization search functionality after BM25 hybrid refactor.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 07:54:20 +01:00
Chris Coutinho d1fb7eb633 Merge branch 'rag-evaluation' 2025-11-16 07:46:17 +01:00
Chris Coutinho 5e80f22d42 Merge pull request #303 from cbcoutinho/renovate/commitizen-tools-commitizen-action-0.x
chore(deps): update commitizen-tools/commitizen-action action to v0.25.0
2025-11-16 07:37:05 +01:00
Chris Coutinho 96cee48258 build: Migrate image to debian-based 2025-11-16 07:32:01 +01:00
Chris Coutinho 16c22c953b fix: Update viz routes to use BM25 hybrid search after refactor
- Remove obsolete search algorithm imports (Fuzzy, Keyword, Hybrid)
- Update UI to only show Semantic and BM25 Hybrid algorithms
- Replace manual weight controls with RRF fusion info message
- Update default algorithm from "hybrid" to "bm25_hybrid"
- Remove weight parameters (semantic_weight, keyword_weight, fuzzy_weight)
- Update score_threshold default from 0.7 to 0.0 for RRF scoring
- Document ty type checker in CLAUDE.md

Fixes unresolved-import type errors after BM25 refactor.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 07:23:11 +01:00
Chris Coutinho 529daf2b48 ci: temp disable sse in ci 2025-11-16 07:03:18 +01:00
Chris Coutinho 137d1d6c75 perf: fix vector viz search performance and visual encoding
This commit addresses critical performance issues with vector visualization
search (reducing time from 40s to ~2s) and improves result visualization
through better visual encoding.

## Performance Fixes

### 1. Fix blocking sleep in retry decorator (base.py:51)
- Changed `time.sleep(5)` to `await anyio.sleep(5)` in @retry_on_429
- Prevents entire event loop from freezing during rate limit retries
- Impact: Reduced search time from 22s to 16s initially

### 2. Add concurrency limiting for verification (verification.py:77-93)
- Added `anyio.Semaphore(20)` to limit concurrent HTTP requests
- Prevents connection pool exhaustion (RequestError) from 90+ simultaneous requests
- Fixes false filtering (was filtering 77/90 results incorrectly)
- Note: Semaphore still in code but verification removed from viz endpoint

### 3. Remove unnecessary verification from viz endpoint (viz_routes.py:483-486)
- Visualization only needs Qdrant metadata (title, excerpt), not full content
- Verification only required for sampling (LLM needs full note content)
- Impact: Reduced search time from 43.7s to ~2s (final fix)

### 4. Restore streaming scanner pattern (scanner.py)
- Process notes one-at-a-time using async generator
- Avoids loading all notes into memory

## Visualization Improvements

### 5. Result-relative score normalization (viz_routes.py:489-504)
- Normalize scores within result set: best=1.0, worst=0.0
- Removes arbitrary RRF normalization (theoretical max didn't make sense)
- Makes visual encoding meaningful regardless of algorithm scores

### 6. Power scaling for marker sizes (userinfo_routes.py:743)
- Changed from linear `8 + (score * 12)` to power `6 + (score² * 14)`
- Creates dramatic visual contrast: 0.0→6px, 0.5→9.5px, 1.0→20px
- Combined with opacity (0.2-1.0) for clear visual hierarchy

### 7. Multi-channel visual encoding (userinfo_routes.py:740-745)
- Size: Quadratically scaled with score²
- Opacity: Linear 0.2-1.0 (keeps all points visible)
- Color: Viridis gradient (blue→yellow)
- Effect: Top results are large/bright/opaque, context results small/dim/transparent
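
The encoding in items 5 to 7 can be sketched as three small functions. The size formula is the one quoted above; the linear opacity formula (0.2 + 0.8 * score), the handling of empty or all-equal result sets, and the function names are assumptions.

```python
def normalize_scores(raw_scores):
    """Rescale scores within one result set so best=1.0 and worst=0.0."""
    if not raw_scores:
        return []
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        # All results scored the same: treat them all as "best" (assumed).
        return [1.0 for _ in raw_scores]
    return [(s - lo) / (hi - lo) for s in raw_scores]


def marker_size(score):
    """Power scaling: 6 + score^2 * 14, for a normalized score in [0, 1]."""
    return 6 + (score ** 2) * 14


def marker_opacity(score):
    """Linear opacity from 0.2 to 1.0 (assumed formula)."""
    return 0.2 + score * 0.8


relative = normalize_scores([0.010, 0.020, 0.030])
sizes = [marker_size(s) for s in relative]
```

Because the top result always normalizes to 1.0, the normalized value only ranks results within one query and says nothing about absolute relevance.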

## Result
- Search time: 40s → ~2s (20x faster)
- Visual contrast: Subtle → dramatic (clear result hierarchy)
- No arbitrary cutoffs: All results visible, best naturally highlighted

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 07:01:35 +01:00
Chris Coutinho 6fe5596c13 feat: Implement BM25 hybrid search with native Qdrant RRF fusion
Replace custom keyword/fuzzy search algorithms with industry-standard BM25
sparse vectors, combined with dense semantic vectors using Qdrant's native
Reciprocal Rank Fusion (RRF). This consolidates search architecture and
improves relevance for both semantic and keyword queries.

Key changes:
- Add fastembed dependency for BM25 sparse vector generation
- Update Qdrant collection schema to support named vectors (dense + sparse)
- Create BM25SparseEmbeddingProvider using FastEmbed's Qdrant/bm25 model
- Implement BM25HybridSearchAlgorithm with native Qdrant RRF prefetch
- Update document processor to generate both dense and sparse embeddings
- Simplify nc_semantic_search() tool to use BM25 hybrid only
- Remove legacy keyword.py, fuzzy.py, and custom hybrid.py (736 lines)
- Update ADR-014 with implementation notes and test results

Benefits:
- Consolidated architecture (single Qdrant database)
- Native database-level RRF fusion (more efficient)
- Industry-standard BM25 (replaces brittle custom keyword search)
- Better relevance across semantic and keyword queries
- Simplified codebase (-285 net lines)

Tests: All 125 tests passing (118 unit, 7 integration)

Implements ADR-014: Replace Custom Keyword Search with BM25 Hybrid Search

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 06:59:44 +01:00
Chris Coutinho f5bc3e3bc3 docs: init ADR 2025-11-16 06:24:25 +01:00
renovate-bot-cbcoutinho[bot] a9eb2c1da2 chore(deps): update commitizen-tools/commitizen-action action to v0.25.0 2025-11-16 05:07:20 +00:00
Chris Coutinho c8d9cc24e0 refactor: migrate asyncio to anyio for consistent structured concurrency
Replace asyncio primitives with anyio equivalents throughout the codebase
to establish a single async pattern. This provides better structured
concurrency with automatic cancellation on errors and aligns with the
pytest anyio configuration.

Changes:
- hybrid.py: Replace asyncio.gather() with anyio task groups
- token_broker.py: Replace asyncio.Lock() with anyio.Lock()
- storage.py: Replace asyncio.run() with anyio.run()
- app.py: Replace tg.start_soon() with await tg.start() for task status
- processor.py: Add task_status parameter for structured startup
- scanner.py: Add task_status parameter for structured startup
- CLAUDE.md: Update async/await patterns guidance

The change from start_soon() to await tg.start() enables proper task
initialization signaling, ensuring background tasks are ready before
proceeding. This follows anyio best practices for structured concurrency.

All 118 unit tests pass with the new implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 03:51:45 +01:00
Chris Coutinho 98d1c2de8e perf: make note deletion concurrent in upload --force
- Collect all notes to delete first, then delete concurrently
- Use anyio task group with semaphore (20 concurrent deletions)
- Add progress reporting and error tracking for deletions
- Show count of notes found before deletion starts

This significantly improves --force performance when refreshing large
corpuses (e.g., 3,633 notes now delete in ~1 minute instead of ~5 minutes).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 00:55:27 +01:00
Chris Coutinho 30a4d84458 feat: add concurrent uploads and --force flag to upload command
- Add --force flag to delete all existing notes in target category before upload
- Implement concurrent uploads using anyio task groups (20 concurrent max)
- Add semaphore to limit concurrent requests and avoid overwhelming server
- Improve progress reporting with upload count and error tracking
- Update README with --force flag documentation

Performance improvement: Concurrent uploads significantly reduce upload time
from ~10-15 minutes to ~2-3 minutes for 3,633 documents.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 00:41:00 +01:00
Chris Coutinho fca8ab0cfd Merge remote-tracking branch 'origin/master' into rag-evaluation 2025-11-16 00:32:59 +01:00
Chris Coutinho 4fa2edf4c7 ci: Set default scan interval to 5min 2025-11-16 00:10:12 +01:00
Chris Coutinho defa8db18e fix: download qrels from BEIR ZIP instead of HuggingFace
- HuggingFace BeIR/nfcorpus only has 'corpus' and 'queries' configs
- Download qrels from original BEIR ZIP file (nfcorpus.zip)
- Use synchronous httpx.Client for download (simpler than async)
- Remove deprecated trust_remote_code parameter

Tested with successful corpus download and qrels extraction.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 00:02:15 +01:00
Chris Coutinho c9506da2d2 refactor: replace httpx client with NextcloudClient in upload command
- Use NextcloudClient with BasicAuth instead of raw httpx
- Replace direct HTTP POST with notes.create_note() method
- Add close() method to LLMProvider Protocol for proper cleanup
- Fix type annotations for dataset iteration

This improves code reuse and consistency with the rest of the codebase.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 23:26:07 +01:00
Chris Coutinho c272ddd82d feat: implement RAG evaluation framework with CLI tooling
- Add ADR-013 documenting RAG evaluation architecture
- Implement two-part evaluation: Context Recall (retrieval) + Answer Correctness (generation)
- Create Click CLI for ground truth generation and corpus upload
- Add pytest fixtures and tests for retrieval/generation quality
- Use BeIR/nfcorpus dataset with 5 selected test queries
- Support Ollama and Anthropic LLM providers
- Generate synthetic ground truth answers offline
- Add comprehensive documentation in tests/rag_evaluation/README.md

The framework separates one-time setup (generate/upload) from test execution,
making tests much faster (~6-12 min vs ~15-25 min per run).

Tests are manual only (not in CI) and require external LLM access.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 23:11:21 +01:00
92 changed files with 10184 additions and 2237 deletions
+2
@@ -5,3 +5,5 @@
!uv.lock
!nextcloud_mcp_server/**/*.py
!nextcloud_mcp_server/**/*.html
!nextcloud_mcp_server/auth/static/*
+2 -2
@@ -15,12 +15,12 @@ jobs:
packages: write
steps:
- name: Check out
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5
uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd # v5
with:
fetch-depth: 0
token: "${{ secrets.PERSONAL_ACCESS_TOKEN }}"
- name: Create bump and changelog
uses: commitizen-tools/commitizen-action@5b0848cd060263e24602d1eba03710e056ef7711 # 0.24.0
uses: commitizen-tools/commitizen-action@9615e7be1cf341393c52e865ebbdaa0712176d81 # 0.25.0
with:
github_token: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
changelog_increment_filename: body.md
+1 -1
@@ -12,7 +12,7 @@ jobs:
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5
uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd # v5
- name: Docker meta
id: meta
+1 -1
@@ -14,7 +14,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5
uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd # v5
with:
fetch-depth: 0
+1 -1
@@ -18,7 +18,7 @@ jobs:
contents: read
steps:
- name: Checkout
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5
uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd # v5
- name: Install uv
uses: astral-sh/setup-uv@5a7eac68fb9809dea845d802897dc5c723910fa3 # v7.1.3
- name: Install Python 3.11
+3 -3
@@ -9,7 +9,7 @@ jobs:
linting:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
- uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd # v5.0.1
- name: Install the latest version of uv
uses: astral-sh/setup-uv@5a7eac68fb9809dea845d802897dc5c723910fa3 # v7.1.3
- name: Check format
@@ -27,7 +27,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
- uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd # v5.0.1
with:
submodules: 'true'
@@ -85,4 +85,4 @@ jobs:
NEXTCLOUD_USERNAME: "admin"
NEXTCLOUD_PASSWORD: "admin"
run: |
uv run pytest -v --log-cli-level=WARN -m smoke
uv run pytest -v --log-cli-level=WARN -m unit -m smoke
+3
@@ -13,3 +13,6 @@ docker-compose.override.yml
# Generated by pytest used to login users
.nextcloud_oauth_*.json
.playwright-mcp/
# RAG Evaluation
tests/rag_evaluation/fixtures/
-3
@@ -1,6 +1,3 @@
[submodule "oidc"]
path = third_party/oidc
url = https://github.com/cbcoutinho/oidc
[submodule "third_party/oidc"]
path = third_party/oidc
url = https://github.com/cbcoutinho/oidc
+75
@@ -1,3 +1,78 @@
## v0.43.0 (2025-11-18)
### Feat
- Replace custom document chunker with LangChain MarkdownTextSplitter
## v0.42.0 (2025-11-17)
### Feat
- **viz**: Add dual-score display and improve UI controls
## v0.41.0 (2025-11-17)
### Feat
- add configurable fusion algorithms for BM25 hybrid search
- add chunk position tracking to vector indexing and search
- add vector viz template and chunk context endpoint
### Fix
- prevent infinite loop in DocumentChunker with position tracking
- Relax SearchResult validation to support DBSF fusion scores > 1.0
## v0.40.0 (2025-11-16)
### Feat
- add unified provider architecture with Amazon Bedrock support
### Fix
- suppress Starlette middleware type warnings in ty checker
## v0.39.0 (2025-11-16)
### Feat
- Implement BM25 hybrid search with native Qdrant RRF fusion
### Fix
- Handle named vectors in visualization and semantic search
- Update vizApp to use bm25_hybrid algorithm and remove deprecated weights
- Update viz routes to use BM25 hybrid search after refactor
## v0.38.0 (2025-11-16)
### Feat
- add concurrent uploads and --force flag to upload command
- implement RAG evaluation framework with CLI tooling
### Fix
- download qrels from BEIR ZIP instead of HuggingFace
### Refactor
- migrate asyncio to anyio for consistent structured concurrency
- replace httpx client with NextcloudClient in upload command
### Perf
- Eliminate double-fetching in semantic search sampling
- fix vector viz search performance and visual encoding
- make note deletion concurrent in upload --force
## v0.37.0 (2025-11-16)
### Feat
- Add OpenTelemetry tracing to @instrument_tool decorator
## v0.36.0 (2025-11-15)
### BREAKING CHANGE
+63 -5
@@ -5,23 +5,29 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
## Coding Conventions
### async/await Patterns
- **Use anyio + asyncio hybrid** - Both libraries are available
- **Use anyio for all async operations** - Provides structured concurrency
- pytest runs in `anyio` mode (`anyio_mode = "auto"` in pyproject.toml)
- asyncio used in auth modules (refresh_token_storage.py, token_exchange.py, token_broker.py)
- anyio used in calendar.py, client_registration.py, app.py
- Use `anyio.create_task_group()` for concurrent execution (NOT `asyncio.gather()`)
- Use `anyio.Lock()` for synchronization primitives (NOT `asyncio.Lock()`)
- Use `anyio.run()` for entry points (NOT `asyncio.run()`)
- Prefer standard async/await syntax without explicit library imports when possible
- Examples: app.py, search/hybrid.py, search/verification.py, auth/token_broker.py
### Type Hints
- **Use Python 3.10+ union syntax**: `str | None` instead of `Optional[str]`
- **Use lowercase generics**: `dict[str, Any]` instead of `Dict[str, Any]`
- **Type all function signatures** - Parameters and return types
- **No explicit type checker configured** - Ruff handles linting only
- **Type checker**: `ty` is configured for static type checking
```bash
uv run ty check -- nextcloud_mcp_server
```
### Code Quality
- **Run ruff before committing**:
- **Run ruff and ty before committing**:
```bash
uv run ruff check
uv run ruff format
uv run ty check -- nextcloud_mcp_server
```
- **Ruff configuration** in pyproject.toml (extends select: ["I"] for import sorting)
@@ -55,8 +61,60 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
- `nextcloud_mcp_server/server/` - MCP tool/resource definitions
- `nextcloud_mcp_server/auth/` - OAuth/OIDC authentication
- `nextcloud_mcp_server/models/` - Pydantic response models
- `nextcloud_mcp_server/providers/` - Unified LLM provider infrastructure (embeddings + generation)
- `tests/` - Layered test suite (unit, smoke, integration, load)
### Provider Architecture (ADR-015)
**Unified Provider System** for embeddings and text generation:
**Location:** `nextcloud_mcp_server/providers/`
- `base.py` - `Provider` ABC with optional capabilities
- `registry.py` - Auto-detection and factory pattern
- `ollama.py` - Ollama provider (embeddings + generation)
- `anthropic.py` - Anthropic provider (generation only)
- `bedrock.py` - Amazon Bedrock provider (embeddings + generation)
- `simple.py` - Simple in-memory provider (embeddings only, fallback)
**Usage:**
```python
from nextcloud_mcp_server.providers import get_provider

provider = get_provider()  # Auto-detects from environment

# Check capabilities
if provider.supports_embeddings:
    embeddings = await provider.embed_batch(texts)
if provider.supports_generation:
    text = await provider.generate("prompt", max_tokens=500)
```
**Environment Variables:**
Bedrock:
- `AWS_REGION` - AWS region (e.g., "us-east-1")
- `BEDROCK_EMBEDDING_MODEL` - Embedding model ID (e.g., "amazon.titan-embed-text-v2:0")
- `BEDROCK_GENERATION_MODEL` - Generation model ID (e.g., "anthropic.claude-3-sonnet-20240229-v1:0")
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` - Optional, uses AWS credential chain
Ollama:
- `OLLAMA_BASE_URL` - API URL (e.g., "http://localhost:11434")
- `OLLAMA_EMBEDDING_MODEL` - Embedding model (default: "nomic-embed-text")
- `OLLAMA_GENERATION_MODEL` - Generation model (e.g., "llama3.2:1b")
- `OLLAMA_VERIFY_SSL` - SSL verification (default: "true")
Simple (fallback, no config needed):
- `SIMPLE_EMBEDDING_DIMENSION` - Dimension (default: 384)
**Auto-Detection Priority:** Bedrock → Ollama → Simple
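
The priority order can be illustrated with a short sketch. Which variables actually trigger each provider is not stated here, so treating `AWS_REGION` and `OLLAMA_BASE_URL` as the triggers is an assumption, and `detect_provider` is a hypothetical name rather than the function in `registry.py`.

```python
import os
from collections.abc import Mapping


def detect_provider(env: Mapping[str, str] = os.environ) -> str:
    """Return the first provider whose (assumed) trigger variable is set."""
    # Assumption: these variables act as the detection triggers.
    if env.get("AWS_REGION"):
        return "bedrock"
    if env.get("OLLAMA_BASE_URL"):
        return "ollama"
    # The fallback provider needs no configuration.
    return "simple"


print(detect_provider({"OLLAMA_BASE_URL": "http://localhost:11434"}))
```

Checking the providers in a fixed order means an environment with both sets of variables resolves to Bedrock.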
**Backward Compatibility:**
- Old code using `nextcloud_mcp_server.embedding.get_embedding_service()` still works
- `EmbeddingService` now wraps `get_provider()` internally
**For Details:** See `docs/ADR-015-unified-provider-architecture.md`
## Development Commands (Quick Reference)
### Testing
+7 -3
@@ -1,15 +1,19 @@
FROM ghcr.io/astral-sh/uv:0.9.9-python3.11-alpine@sha256:0faa7934fac1db7f5056f159c1224d144bab864fd2677a4066d25a686ae32edd
FROM docker.io/library/python:3.12-slim-trixie@sha256:2bbc83fcf744fb96397408e595395fadc9d8d84ae38bfc30b8766d533f4f8e7e
COPY --from=ghcr.io/astral-sh/uv:0.9.10@sha256:29bd45092ea8902c0bbb7f0a338f0494a382b1f4b18355df5be270ade679ff1d /uv /uvx /bin/
# Install dependencies
# 1. git (required for caldav dependency from git)
# 2. sqlite for development with token db
RUN apk add --no-cache git sqlite
RUN apt update && apt install --no-install-recommends --no-install-suggests -y \
git \
sqlite3 && apt clean
WORKDIR /app
COPY . .
RUN uv sync --locked --no-dev --no-editable
RUN uv sync --locked --no-dev --no-editable --no-cache
ENV PYTHONUNBUFFERED=1
ENV VIRTUAL_ENV=/app/.venv
+11
@@ -1,3 +1,7 @@
<p align="center">
<img src="astrolabe.svg" alt="Nextcloud MCP Server" width="128" height="128">
</p>
# Nextcloud MCP Server
[![Docker Image](https://img.shields.io/badge/docker-ghcr.io/cbcoutinho/nextcloud--mcp--server-blue)](https://github.com/cbcoutinho/nextcloud-mcp-server/pkgs/container/nextcloud-mcp-server)
@@ -29,6 +33,12 @@ docker run -p 127.0.0.1:8000:8000 --env-file .env --rm \
# 3. Test the connection
curl http://127.0.0.1:8000/health/ready
# 4. Connect to the endpoint
http://127.0.0.1:8000/sse
# 4. Or with --transport streamable-http
http://127.0.0.1:8000/mcp
```
**Next Steps:**
@@ -123,6 +133,7 @@ This enables natural language queries and helps discover related content across
- **[App Documentation](docs/)** - Notes, Calendar, Contacts, WebDAV, Deck, Cookbook, Tables
- **[Document Processing](docs/configuration.md#document-processing)** - OCR and text extraction setup
- **[Semantic Search Architecture](docs/semantic-search-architecture.md)** - Experimental vector search (Notes only, opt-in)
- **[Vector Sync UI Guide](docs/user-guide/vector-sync-ui.md)** - Browser interface for semantic search visualization and testing
### Advanced Topics
- **[OAuth Architecture](docs/oauth-architecture.md)** - How OAuth works (experimental)
@@ -9,19 +9,19 @@ if [ -d /opt/apps/notes ]; then
echo "Development notes app found at /opt/apps/notes"
# Remove any existing notes app in apps (from app store or old symlink)
if [ -e /var/www/html/apps/notes ]; then
if [ -e /var/www/html/custom_apps/notes ]; then
echo "Removing existing notes in apps..."
rm -rf /var/www/html/apps/notes
rm -rf /var/www/html/custom_apps/notes
fi
# Create symlink from apps to the mounted development version
# Per Nextcloud docs: apps outside server root need symlinks in server root
echo "Creating symlink: apps/notes -> /opt/apps/notes"
ln -sf /opt/apps/notes /var/www/html/apps/notes
echo "Creating symlink: custom_apps/notes -> /opt/apps/notes"
ln -sf /opt/apps/notes /var/www/html/custom_apps/notes
echo "Enabling notes app from /opt/apps (development mode via symlink)"
php /var/www/html/occ app:enable notes
elif [ -d /var/www/html/apps/notes ]; then
elif [ -d /var/www/html/custom_apps/notes ]; then
echo "notes app directory found in apps (already installed)"
php /var/www/html/occ app:enable notes
else
+3 -3
@@ -1,9 +1,9 @@
dependencies:
- name: qdrant
repository: https://qdrant.github.io/qdrant-helm
version: 1.15.5
version: 1.16.0
- name: ollama
repository: https://otwld.github.io/ollama-helm
version: 1.34.0
digest: sha256:d51c97d05be2614b751c0dd7267ef7dc959eff5ebef859c5f895c5c554b7a874
generated: "2025-11-09T17:08:02.86648061Z"
digest: sha256:9dfb8d6e3d5488f669d4c37f3a766213b598ff3de2aead2c734789736c7835b4
generated: "2025-11-17T17:08:48.055530019Z"
+3 -3
@@ -2,8 +2,8 @@ apiVersion: v2
name: nextcloud-mcp-server
description: A Helm chart for Nextcloud MCP Server - enables AI assistants to interact with Nextcloud
type: application
version: 0.36.0
appVersion: "0.36.0"
version: 0.43.0
appVersion: "0.43.0"
keywords:
- nextcloud
- mcp
@@ -27,7 +27,7 @@ annotations:
grafana_dashboard_folder: "Nextcloud MCP"
dependencies:
- name: qdrant
version: "1.15.5"
version: "1.16.0"
repository: https://qdrant.github.io/qdrant-helm
condition: qdrant.networkMode.deploySubchart
- name: ollama
+9 -19
@@ -3,7 +3,7 @@ services:
# https://hub.docker.com/_/mariadb
db:
# Note: Check the recommend version here: https://docs.nextcloud.com/server/latest/admin_manual/installation/system_requirements.html#server
image: docker.io/library/mariadb:lts@sha256:6b848cb24fbbd87429917f6c4422ac53c343e85692eb0fef86553e99e4f422f3
image: docker.io/library/mariadb:lts@sha256:1cac8492bd78b1ec693238dc600be173397efd7b55eabc725abc281dc855b482
restart: always
command: --transaction-isolation=READ-COMMITTED
volumes:
@@ -21,7 +21,7 @@ services:
restart: always
app:
image: docker.io/library/nextcloud:32.0.1@sha256:5b043f7ea2f609d5ff5635f475c30d303bec17775a5c3f7fa435e3818e669120
image: docker.io/library/nextcloud:32.0.1@sha256:f6232ea49059c075e9dc65a18bc4f729d53957982644977ccc5dbbeb99988f09
restart: always
ports:
- 0.0.0.0:8080:80
@@ -34,7 +34,7 @@ services:
- ./app-hooks:/docker-entrypoint-hooks.d:ro
# Mount OIDC development directory outside /var/www/html to avoid rsync conflicts
# The post-installation hook will register /opt/apps as an additional app directory
#- ./third_party:/opt/apps:ro
- ./third_party:/opt/apps:ro
environment:
- NEXTCLOUD_TRUSTED_DOMAINS=app
- NEXTCLOUD_ADMIN_USER=admin
@@ -70,11 +70,13 @@ services:
mcp:
build: .
restart: always
command: ["--transport", "streamable-http"]
depends_on:
app:
condition: service_healthy
ports:
- 127.0.0.1:8000:8000
- 127.0.0.1:9090:9090
volumes:
- mcp-data:/app/data
environment:
@@ -85,7 +87,7 @@ services:
# Vector sync configuration (ADR-007)
- VECTOR_SYNC_ENABLED=true
- VECTOR_SYNC_SCAN_INTERVAL=10
- VECTOR_SYNC_SCAN_INTERVAL=60
- VECTOR_SYNC_PROCESSOR_WORKERS=1
#- LOG_FORMAT=json
@@ -193,8 +195,8 @@ services:
# Provider auto-detected from OIDC_DISCOVERY_URL issuer
# Using internal Docker hostname for discovery to get consistent issuer
- OIDC_DISCOVERY_URL=http://keycloak:8080/realms/nextcloud-mcp/.well-known/openid-configuration
- OIDC_CLIENT_ID=nextcloud-mcp-server
- OIDC_CLIENT_SECRET=mcp-secret-change-in-production
- NEXTCLOUD_OIDC_CLIENT_ID=nextcloud-mcp-server
- NEXTCLOUD_OIDC_CLIENT_SECRET=mcp-secret-change-in-production
- OIDC_JWKS_URI=http://keycloak:8080/realms/nextcloud-mcp/protocol/openid-connect/certs
# Nextcloud API endpoint (for accessing APIs with validated token)
@@ -223,7 +225,7 @@ services:
- keycloak-oauth-storage:/app/.oauth
qdrant:
image: qdrant/qdrant:v1.15.5@sha256:0fb8897412abc81d1c0430a899b9a81eb8328aa634e7242d1bc804c1fe8fe863
image: qdrant/qdrant:v1.16.0@sha256:1005201498cf927d835383d0f918b17d8c9da7db58550f169f694455e42d78f4
restart: always
ports:
- 127.0.0.1:6333:6333 # REST API
@@ -240,17 +242,6 @@ services:
profiles:
- qdrant
open-webui:
image: ghcr.io/open-webui/open-webui:main
environment:
- OLLAMA_BASE_URL=https://ollama.internal.coutinho.io
ports:
- 127.0.0.1:3000:8080
volumes:
- open-webui:/app/backend/data
profiles:
- open-webui
volumes:
nextcloud:
db:
@@ -260,4 +251,3 @@ volumes:
keycloak-oauth-storage:
qdrant-data:
mcp-data:
open-webui:
@@ -1,7 +1,8 @@
# ADR-011: Improving Semantic Search Quality Through Better Chunking and Embeddings
**Status**: Proposed
**Status**: Partially Implemented (Chunking Complete, Embeddings Pending)
**Date**: 2025-11-12
**Implementation Date**: 2025-11-18 (Chunking)
**Authors**: Development Team
**Related**: ADR-003 (Vector Database Architecture), ADR-008 (MCP Sampling for RAG)
@@ -893,3 +894,50 @@ This ADR addresses the root causes of poor semantic search recall:
- No new infrastructure or ongoing costs
**Next Steps**: Approve ADR → Implement changes → Reindex → Validate → Production rollout
## Implementation Status
### Completed (2025-11-18)
**✅ Semantic Markdown-Aware Chunking (Option C1 + C3 Hybrid)**
Implementation details:
- Replaced custom word-based chunking with `MarkdownTextSplitter` from LangChain
- Optimized for Nextcloud Notes markdown content with special handling for:
- Headers (`#`, `##`, `###`, etc.)
- Code blocks (` ``` `)
- Lists (`-`, `*`, `1.`)
- Horizontal rules (`---`)
- Paragraphs and sentences
- Maintained `ChunkWithPosition` interface for backward compatibility
- Updated configuration defaults:
- `DOCUMENT_CHUNK_SIZE`: 512 words → 2048 characters
- `DOCUMENT_CHUNK_OVERLAP`: 50 words → 200 characters
- Updated unit tests to verify position tracking and boundary preservation
- All tests passing with markdown-aware character-based chunking
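
To make the size and overlap arithmetic concrete, here is a deliberately simplified sketch of character-based chunking with position tracking. It is not markdown-aware and does not use LangChain; the function name and the tuple layout are assumptions made for illustration only.

```python
def chunk_with_positions(
    text: str, chunk_size: int = 2048, overlap: int = 200
) -> list[tuple[int, int, str]]:
    """Split text into (start, end, chunk) windows that overlap by `overlap` chars."""
    if overlap >= chunk_size:
        # Without this guard the window would never advance (infinite loop).
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks: list[tuple[int, int, str]] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append((start, end, text[start:end]))
        if end == len(text):
            break
        start += step
    return chunks


for start, end, _ in chunk_with_positions("x" * 5000):
    print(start, end)
```

A real splitter would move each boundary to the nearest header, code fence, or paragraph break instead of cutting at a fixed offset, but the start/end bookkeeping is the same.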
**Files Modified**:
- `nextcloud_mcp_server/vector/document_chunker.py` - LangChain integration
- `nextcloud_mcp_server/config.py` - Character-based defaults
- `tests/unit/test_document_chunker.py` - Updated test suite
**Dependencies Added**:
- `langchain-text-splitters>=1.0.0` (already present in `pyproject.toml`)
**Migration Required**:
- ⚠️ Full reindex required to apply new chunking strategy
- Existing documents in vector database use old word-based chunks
- See "Migration Strategy" section above for reindexing process
### Pending
**⏳ Embedding Model Upgrade (Option E1)**
Still to be implemented:
- Switch from `nomic-embed-text` (768-dim) to `mxbai-embed-large-v1` (1024-dim)
- Implement dynamic dimension detection in `ollama_provider.py`
- Create migration script for collection reindexing
- Run benchmarking to validate improvement
- Deploy to production with atomic collection swap
**Estimated Timeline**: 1-2 weeks for implementation and validation
+254
@@ -0,0 +1,254 @@
## ADR-013: RAG Evaluation Testing Framework
**Status:** Proposed
**Date:** 2025-11-15
### Context
The `nc_semantic_search_answer` tool implements a Retrieval-Augmented Generation (RAG) system where:
1. **Retrieval**: Vector sync pipeline indexes Nextcloud documents (notes, calendar, contacts, etc.) into a vector database
2. **Generation**: MCP client's LLM synthesizes answers from retrieved documents via MCP sampling (ADR-008)
We need a testing framework to evaluate RAG system performance and identify whether failures occur in retrieval (wrong documents found) or generation (poor answer quality). This framework must use industry-standard evaluation methodologies while remaining practical to implement and maintain.
To establish a baseline, we will use the **BeIR/nfcorpus** dataset (medical/biomedical corpus) with 3,633 documents and established query relevance judgments (qrels).
Homepage: https://www.cl.uni-heidelberg.de/statnlpgroup/nfcorpus/
Download: https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/nfcorpus.zip
### Decision
We will implement a **two-part evaluation framework** that independently tests retrieval and generation quality using pytest fixtures.
#### In Scope
**1. Retrieval Evaluation**
Tests the vector sync/embedding pipeline's ability to find relevant documents.
- **Metric: Context Recall** (Did we retrieve documents containing the answer?)
- **Evaluation method**: Heuristic - Check if ground-truth document IDs appear in top-k retrieval results
- **Test**: Query → Semantic search → Assert expected doc IDs present
**2. Generation Evaluation**
Tests the MCP client LLM's ability to synthesize correct answers from retrieved context.
- **Metric: Answer Correctness** (Is the generated answer factually correct?)
- **Evaluation method**: LLM-as-judge - Compare RAG answer against ground-truth answer
- **Test**: Query → `nc_semantic_search_answer` → LLM evaluates answer vs. ground truth (binary true/false)
#### Out of Scope (Initial Implementation)
- **Context Relevance/Precision**: Measuring irrelevant documents in retrieval results
- **Faithfulness/Groundedness**: Detecting hallucinations not supported by retrieved context
- **Answer Relevance**: Whether answer addresses the specific question asked
- **Out-of-Scope Handling**: Testing "I don't know" responses when answer isn't in context
- **Continuous benchmarking**: Automated tracking of metric trends over time
- **Custom domain datasets**: Production-specific test data (medical corpus used initially)
These remain valuable for future iterations but add complexity beyond our initial goals.
#### Implementation
**Test Structure**
Location: `tests/rag_evaluation/`
- `test_retrieval_quality.py` - Retrieval evaluation tests
- `test_generation_quality.py` - Generation evaluation tests
- `conftest.py` - Fixtures for test data, MCP clients, and evaluation LLMs
**Required Pytest Fixtures**
1. **`nfcorpus_test_data`** (session-scoped)
- Downloads/caches BeIR nfcorpus dataset at runtime
- Loads 5 pre-selected test queries with:
- Query text
- Pre-generated ground-truth answer (from `tests/rag_evaluation/fixtures/ground_truth.json`)
- Expected document IDs (from qrels with score=2)
- Uploads all corpus documents as notes in test Nextcloud instance
- Triggers vector sync to index documents
- Waits for indexing completion
- Returns test case data structure
2. **`mcp_sampling_client`** (session-scoped)
- Creates MCP client that supports sampling
- Configurable LLM provider (ollama or anthropic) via environment:
- `RAG_EVAL_PROVIDER=ollama` (default) or `anthropic`
- `RAG_EVAL_OLLAMA_BASE_URL=http://localhost:11434`
- `RAG_EVAL_OLLAMA_MODEL=llama3.1:8b`
- `RAG_EVAL_ANTHROPIC_API_KEY=sk-...`
- `RAG_EVAL_ANTHROPIC_MODEL=claude-3-5-sonnet-20241022`
- Returns configured MCP client fixture
3. **`evaluation_llm`** (session-scoped)
- Separate LLM instance for evaluation (independent from MCP client)
- Same provider configuration as `mcp_sampling_client`
- Returns callable: `async def evaluate(prompt: str) -> str`
**Test Implementation Examples**
```python
# tests/rag_evaluation/test_retrieval_quality.py
async def test_retrieval_recall(nc_client, nfcorpus_test_data):
    """Test that semantic search retrieves documents containing the answer."""
    for test_case in nfcorpus_test_data:
        # Perform semantic search (retrieval only, no generation)
        results = await nc_client.notes.semantic_search(
            query=test_case.query,
            limit=10
        )
        retrieved_doc_ids = {r.document_id for r in results}
        expected_doc_ids = set(test_case.expected_document_ids)

        # Context Recall: Are expected documents in top-k results?
        recall = len(expected_doc_ids & retrieved_doc_ids) / len(expected_doc_ids)
        assert recall >= 0.8, f"Recall {recall} below threshold for query: {test_case.query}"


# tests/rag_evaluation/test_generation_quality.py
async def test_answer_correctness(mcp_sampling_client, evaluation_llm, nfcorpus_test_data):
    """Test that RAG system generates factually correct answers."""
    for test_case in nfcorpus_test_data:
        # Execute full RAG pipeline (retrieval + generation)
        result = await mcp_sampling_client.call_tool(
            "nc_semantic_search_answer",
            arguments={"query": test_case.query, "limit": 5}
        )
        rag_answer = result["generated_answer"]

        # LLM-as-judge evaluation
        evaluation_prompt = f"""Compare these two answers and respond with only TRUE or FALSE.
Question: {test_case.query}
Generated Answer: {rag_answer}
Ground Truth Answer: {test_case.ground_truth}
Are these answers semantically equivalent (do they convey the same factual information)?
Respond with only: TRUE or FALSE"""
        evaluation_result = await evaluation_llm(evaluation_prompt)
        assert evaluation_result.strip().upper() == "TRUE", \
            f"Answer mismatch for query: {test_case.query}\nGot: {rag_answer}\nExpected: {test_case.ground_truth}"
```
**Dataset Integration**
The BeIR nfcorpus dataset structure:
- **corpus.jsonl**: 3,633 medical/biomedical documents (articles from PubMed)
- **queries.jsonl**: 3,237 queries (questions)
- **qrels/*.tsv**: Relevance judgments mapping query IDs to document IDs with scores (2=highly relevant, 1=somewhat relevant)
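
As an illustration of how the relevance judgments could be turned into expected document IDs, the sketch below filters qrels rows to those with score 2. It assumes the TSV has a header row with `query-id`, `corpus-id`, and `score` columns; the function name and the document IDs in the sample are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_highly_relevant(qrels_tsv: str, min_score: int = 2) -> dict[str, list[str]]:
    """Map each query ID to the document IDs judged at or above `min_score`."""
    relevant: dict[str, list[str]] = defaultdict(list)
    reader = csv.DictReader(io.StringIO(qrels_tsv), delimiter="\t")
    for row in reader:
        if int(row["score"]) >= min_score:
            relevant[row["query-id"]].append(row["corpus-id"])
    return dict(relevant)


# Hypothetical rows; the document IDs are made up for illustration.
sample = (
    "query-id\tcorpus-id\tscore\n"
    "PLAIN-2510\tMED-0001\t2\n"
    "PLAIN-2510\tMED-0002\t1\n"
    "PLAIN-2430\tMED-0003\t2\n"
)
print(load_highly_relevant(sample))
```

Restricting the expected set to score 2 keeps the recall check focused on documents that directly address the query rather than loosely related ones.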
**Important**: The dataset provides relevance judgments (which documents answer which queries) but does NOT include ground truth answers. We must generate synthetic ground truth offline.
**Selected Test Queries** (5 diverse candidates):
1. **PLAIN-2630**: "Alkylphenol Endocrine Disruptors and Allergies" (5 words, 21 highly relevant docs)
2. **PLAIN-2660**: "How Long to Detox From Fish Before Pregnancy?" (8 words, 20 highly relevant docs)
3. **PLAIN-2510**: "Coffee and Artery Function" (4 words, 16 highly relevant docs)
4. **PLAIN-2430**: "Preventing Brain Loss with B Vitamins?" (6 words, 15 highly relevant docs)
5. **PLAIN-2690**: "Chronic Headaches and Pork Tapeworms" (5 words, 14 highly relevant docs)
**Ground Truth Generation** (offline, pre-test):
Ground truth answers will be generated offline using a script that:
1. Loads nfcorpus dataset
2. For each selected query, extracts top 3-5 highly relevant documents
3. Uses an LLM (ollama/anthropic) to synthesize a reference answer
4. Stores ground truth in `tests/rag_evaluation/fixtures/ground_truth.json`
```python
# tools/generate_rag_ground_truth.py
async def generate_ground_truth(query: str, relevant_docs: list[dict], llm: LLMProvider) -> str:
    """Generate synthetic ground truth answer from highly relevant documents."""
    # Concatenate at most five documents into a single context string
    context = "\n\n".join([
        f"Document {i+1}:\nTitle: {doc['title']}\n{doc['text']}"
        for i, doc in enumerate(relevant_docs[:5])
    ])
    prompt = f"""Based on the following documents, provide a comprehensive answer to this question:
Question: {query}
{context}
Provide a factual, well-structured answer that synthesizes information from the documents.
Focus on accuracy and completeness."""
    return await llm.generate(prompt, max_tokens=500)
```
**Dataset Loading at Test Runtime** (in `nfcorpus_test_data` fixture):
1. Download nfcorpus dataset (cached in pytest temp directory)
2. Load corpus, queries, and qrels (relevance judgments)
3. Load pre-generated ground truth from `tests/rag_evaluation/fixtures/ground_truth.json`
4. Upload all corpus documents as Nextcloud notes
5. Trigger vector sync to index documents
6. Wait for indexing completion
7. Return test cases with query, ground truth, and expected doc IDs
**LLM Provider Abstraction**
```python
# tests/rag_evaluation/llm_providers.py
from typing import Protocol

import anthropic


class LLMProvider(Protocol):
    async def generate(self, prompt: str, max_tokens: int = 100) -> str: ...


class OllamaProvider:
    def __init__(self, base_url: str, model: str):
        self.base_url = base_url
        self.model = model

    async def generate(self, prompt: str, max_tokens: int = 100) -> str:
        # Use httpx to call Ollama API
        ...


class AnthropicProvider:
    def __init__(self, api_key: str, model: str):
        self.client = anthropic.AsyncAnthropic(api_key=api_key)
        self.model = model

    async def generate(self, prompt: str, max_tokens: int = 100) -> str:
        message = await self.client.messages.create(
            model=self.model,
            max_tokens=max_tokens,
            messages=[{"role": "user", "content": prompt}]
        )
        # The first content block holds the generated text
        return message.content[0].text
```
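Any object with a matching `generate()` coroutine satisfies the `LLMProvider` protocol, which makes a binary LLM-as-judge check easy to exercise offline. The sketch below is illustrative only: `CannedProvider`, `judge_answer`, and the prompt wording are hypothetical and not taken from the test suite.

```python
import asyncio
from typing import Protocol


class LLMProvider(Protocol):
    async def generate(self, prompt: str, max_tokens: int = 100) -> str: ...


class CannedProvider:
    """Hypothetical stand-in that returns a fixed reply, so the example runs offline."""

    def __init__(self, reply: str):
        self.reply = reply

    async def generate(self, prompt: str, max_tokens: int = 100) -> str:
        return self.reply


async def judge_answer(llm: LLMProvider, question: str, reference: str, answer: str) -> bool:
    """Ask the LLM for a TRUE/FALSE verdict and parse it leniently on case and whitespace."""
    prompt = (
        "Does the ANSWER agree with the REFERENCE for this QUESTION? "
        "Reply with exactly TRUE or FALSE.\n"
        f"QUESTION: {question}\nREFERENCE: {reference}\nANSWER: {answer}"
    )
    verdict = (await llm.generate(prompt, max_tokens=5)).strip().upper()
    return verdict.startswith("TRUE")


passed = asyncio.run(judge_answer(CannedProvider("true"), "q", "ref", "ans"))
failed = asyncio.run(judge_answer(CannedProvider("False."), "q", "ref", "ans"))
print(passed, failed)
```

Swapping `CannedProvider` for `OllamaProvider` or `AnthropicProvider` would not require changes to `judge_answer`, since the protocol is structural.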
### Consequences
**Positive:**
* **Actionable debugging**: Separate retrieval/generation tests pinpoint failure location
* **Industry-standard metrics**: Context Recall and Answer Correctness are recognized RAG evaluation metrics
* **Simple initial implementation**: Binary LLM evaluation (true/false) is straightforward to implement and interpret
* **Extensible framework**: Easy to add more metrics (faithfulness, relevance) later
* **Standardized benchmark**: nfcorpus provides objective comparison against published RAG systems
* **Hybrid evaluation**: Combines efficiency (heuristics for retrieval) with quality (LLM-as-judge for generation)
* **Provider flexibility**: Supports both local (Ollama) and cloud (Anthropic) LLM evaluation
**Negative:**
* **Medical domain bias**: nfcorpus is medical/biomedical content and may not represent production use cases (personal notes, calendar events, etc.)
* **Manual test execution**: Tests require external LLM access and are not integrated into CI pipeline
* **Limited initial coverage**: Starting with only 5 queries provides limited statistical confidence
* **Evaluation cost**: LLM-as-judge for generation evaluation incurs API costs (Anthropic) or requires local inference (Ollama)
* **Single metric per component**: Initial scope tests only one metric per component, missing other important quality dimensions
* **Synthetic ground truth**: Ground truth answers are LLM-generated, not human-validated, which may introduce evaluation bias
* **Large corpus upload**: Uploading 3,633 documents at test runtime may be slow; caching strategy needed
**Future Work:**
* Expand to 50-100 queries for statistical significance
* Add custom test dataset with production-representative documents (meeting notes, task lists, etc.)
* Implement additional metrics (faithfulness, context relevance, answer relevance)
* Create automated benchmarking dashboard to track metric trends
* Test multi-hop reasoning (synthesis questions requiring multiple documents)
* Evaluate out-of-scope handling ("I don't know" responses)
# ADR-014: Replace Custom Keyword Search with BM25 Hybrid Search via Qdrant
**Date:** 2025-11-16
**Status:** Implemented
---
### 1. Context
Our RAG application currently employs two separate retrieval mechanisms:
1. **Dense (Semantic) Search:** Using vector embeddings stored in our Qdrant database to find semantically similar context.
2. **Keyword Search:** A custom-built fuzzy/character-based search to match specific keywords, acronyms, and product codes that semantic search often misses.
This dual-system approach has several drawbacks:
* **Poor Relevance:** Our current keyword search is basic (e.g., `LIKE` queries or simple fuzzy matching). It is not as effective as modern full-text search algorithms like BM25.
* **Clunky Fusion:** We lack a robust, principled method to combine the results from the two systems. This leads to disjointed logic in the application layer and suboptimal context being passed to the LLM.
* **Architectural Complexity:** We must maintain two separate search pathways (one to Qdrant, one to the keyword search mechanism), increasing code complexity and maintenance overhead.
Our vector database, **Qdrant**, natively supports **hybrid search** by combining dense vectors with BM25-based **sparse vectors** in a single collection.
### 2. Decision
We will **deprecate and remove** the existing custom keyword/fuzzy search functionality.
We will **replace it by implementing native hybrid search within Qdrant**. This involves:
1. **Modifying the Qdrant Collection:** Updating our collection to support a named sparse vector index configured for BM25.
2. **Updating the Ingestion Pipeline:** For every document chunk, we will generate and upsert *both*:
    * Its **dense vector** (from our existing embedding model).
    * Its **sparse vector** (generated using a BM25-compatible model, e.g., `Qdrant/bm25` from `fastembed`).
3. **Refactoring Retrieval Logic:** All retrieval calls will be consolidated into a single Qdrant query using the `query_points` endpoint. This query will use the `prefetch` parameter to execute both dense and sparse searches, and Qdrant's built-in **Reciprocal Rank Fusion (RRF)** to automatically merge the results into a single, relevance-ranked list.
4. **Backfilling:** A one-time migration script will be created to generate and add sparse vectors for all existing documents in the Qdrant collection.
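As a rough illustration of step 3, the request body for Qdrant's Query API (`POST /collections/{collection_name}/points/query`) can be sketched as a plain dictionary. The vector names `dense` and `sparse` match the named vectors described in the implementation notes of this ADR; the function name, the `prefetch_limit` default, and the sample vectors are assumptions made for the example.

```python
import json


def build_hybrid_query(dense_vector, sparse_indices, sparse_values, limit=10, prefetch_limit=50):
    """Build a Query API body that runs two prefetches and fuses them with RRF."""
    return {
        "prefetch": [
            # Dense (semantic) candidates
            {"query": dense_vector, "using": "dense", "limit": prefetch_limit},
            # Sparse (BM25) candidates
            {
                "query": {"indices": sparse_indices, "values": sparse_values},
                "using": "sparse",
                "limit": prefetch_limit,
            },
        ],
        # The outer query only fuses the prefetched candidate lists
        "query": {"fusion": "rrf"},
        "limit": limit,
        "with_payload": True,
    }


body = build_hybrid_query([0.1, 0.2, 0.3], [7, 42], [1.5, 0.8])
print(json.dumps(body, indent=2))
```

With the Python client, the same shape is expressed through `query_points` with `prefetch` entries and a fusion query, as described above.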
---
### 3. Considered Options
#### Option 1: Native Qdrant Hybrid Search (Chosen)
* Use Qdrant's built-in sparse vector and RRF capabilities.
* **Pros:**
    * **Consolidated Architecture:** Manages both dense and sparse indexes in one database.
    * **No Data Sync Issues:** Updates are atomic. A single `upsert` updates both representations.
    * **Built-in Fusion:** RRF is handled natively and efficiently by the database.
    * **Superior Relevance:** Replaces our brittle custom search with the industry-standard BM25.
* **Cons:**
    * Requires a one-time data backfill which may be time-consuming.
    * Adds a new step (sparse vector generation) to the ingestion pipeline.
#### Option 2: External Full-Text Search (e.g., Elasticsearch)
* Keep Qdrant for dense search and add a separate Elasticsearch/OpenSearch cluster for BM25.
* **Pros:**
    * Provides a very powerful, dedicated full-text search engine.
* **Cons:**
    * **High Complexity:** Introduces a new, stateful service to deploy, manage, and scale.
    * **Data Sync Nightmare:** We would be responsible for ensuring that the document IDs and content in Qdrant and Elasticsearch are always perfectly synchronized. This is a major source of bugs.
    * **Manual Fusion:** The application would have to query both systems and perform RRF manually.
#### Option 3: Keep Current System
* Make no changes.
* **Pros:**
    * No engineering effort required.
* **Cons:**
    * Fails to address the known relevance and architectural problems.
    * Our RAG application's performance will remain suboptimal, especially for keyword-sensitive queries.
---
### 4. Rationale
**Option 1 is the clear winner.** It directly solves our primary problem (poor keyword matching) by adopting the industry-standard BM25.
Critically, it achieves this while **simplifying** our overall architecture, not complicating it. By leveraging features already present in our existing database (Qdrant), we avoid the massive operational and synchronization overhead of adding a second search system (Option 2).
This decision consolidates our retrieval logic, eliminates the data consistency problem, and moves the complex fusion logic (RRF) from the application layer into the database, where it can be performed more efficiently.
### 5. Consequences
**New Work:**
* **Ingestion:** The data ingestion pipeline must be updated to add the `fastembed` library (or similar), generate sparse vectors, and upsert them to the new named vector field in Qdrant.
* **Retrieval:** The application's retrieval service must be refactored to use the `query_points` endpoint with `prefetch` and `fusion=models.Fusion.RRF`.
* **Migration:** A one-time backfill script must be written and executed to add sparse vectors for all existing documents.
* **Infrastructure:** The Qdrant collection schema must be updated (or re-created) to add the `sparse_vectors_config`.
**Positive:**
* **Improved Accuracy:** Retrieval will be significantly more accurate, handling both semantic and keyword queries robustly.
* **Simplified Code:** The application's retrieval logic will be cleaner and simpler, with one endpoint instead of two.
* **Reduced Maintenance:** We will remove the custom fuzzy-search code, which is brittle and difficult to maintain.
**Negative:**
* The data backfill process will require careful management to avoid downtime.
* Ingestion time will slightly increase due to the extra step of sparse vector generation. This is considered a negligible trade-off for the gains in relevance.
---
### 6. Implementation Notes
**Implementation completed on 2025-11-16**
**Key Changes:**
1. **Dependencies** (pyproject.toml:25):
- Added `fastembed>=0.4.2` for BM25 sparse vector embeddings
- Adjusted `pillow` version constraint to be compatible with fastembed
2. **Qdrant Collection Schema** (nextcloud_mcp_server/vector/qdrant_client.py:113-128):
- Updated to named vectors: `{"dense": VectorParams(...), "sparse": SparseVectorParams(...)}`
- Added sparse vector configuration with BM25 index
- Maintains backward compatibility with existing collections (detects legacy schema)
3. **BM25 Embedding Provider** (nextcloud_mcp_server/embedding/bm25_provider.py):
- Created `BM25SparseEmbeddingProvider` using FastEmbed's `Qdrant/bm25` model
- Implements `encode()` and `encode_batch()` methods
- Returns sparse vectors as `{indices: list[int], values: list[float]}` format
4. **Document Indexing Pipeline** (nextcloud_mcp_server/vector/processor.py:229-255):
- Generates both dense (semantic) and sparse (BM25) embeddings for each document chunk
- Updates `PointStruct` to use named vectors: `vector={"dense": ..., "sparse": ...}`
- Maintains same chunking strategy (512 words, 50-word overlap)
5. **BM25 Hybrid Search Algorithm** (nextcloud_mcp_server/search/bm25_hybrid.py):
- Implements `BM25HybridSearchAlgorithm` using Qdrant's native RRF fusion
- Uses `prefetch` parameter for parallel dense + sparse search
- Applies `fusion=models.Fusion.RRF` for automatic result merging
- Maintains same deduplication and filtering logic as semantic search
6. **MCP Tool Updates** (nextcloud_mcp_server/server/semantic.py:39-68):
- Simplified `nc_semantic_search()` to use BM25 hybrid only
- Removed `algorithm`, `semantic_weight`, `keyword_weight`, `fuzzy_weight` parameters
- Updated default `score_threshold=0.0` for RRF scoring
- Returns `search_method="bm25_hybrid"` in responses
7. **Legacy Algorithm Removal**:
- Deleted `nextcloud_mcp_server/search/keyword.py` (278 lines)
- Deleted `nextcloud_mcp_server/search/fuzzy.py` (220 lines)
- Deleted `nextcloud_mcp_server/search/hybrid.py` (238 lines - custom RRF)
- Updated `nextcloud_mcp_server/search/__init__.py` to export only BM25 hybrid
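The chunking strategy mentioned in item 4 (512 words with a 50-word overlap) can be illustrated with a minimal sketch. This is not the code in `processor.py`; the function name `chunk_words` and the whitespace-based word splitting are assumptions made for the example.

```python
def chunk_words(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks where consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # each chunk starts this many words after the previous one
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_words(sample)
print(len(chunks), [len(c.split()) for c in chunks])
```

Each chunk produced this way would receive both a dense and a sparse embedding before being upserted as one point.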
**Migration Strategy:**
- No migration required (vector sync feature is experimental)
- New documents automatically indexed with both dense + sparse vectors
- Collection re-creation on first startup with updated schema
**Test Results:**
- All unit tests passing (118 passed)
- All integration tests passing (7 semantic search tests)
- Code formatting verified with ruff
**Benefits Realized:**
- ✅ Consolidated architecture (single Qdrant database for both dense + sparse)
- ✅ Native fusion algorithms (database-level, more efficient)
- ✅ Industry-standard BM25 (replaces custom keyword search)
- ✅ Simplified codebase (removed 736 lines of legacy code)
- ✅ Better relevance (handles both semantic and keyword queries)
- ✅ Configurable fusion methods (RRF and DBSF)
---
### 7. Fusion Algorithm Options
**Update: 2025-11-16**
The BM25 hybrid search now supports two fusion algorithms for combining dense (semantic) and sparse (BM25) search results:
#### Reciprocal Rank Fusion (RRF)
**Default fusion method.** RRF is a widely-used, well-established algorithm that combines rankings from multiple retrieval systems using the reciprocal rank formula:
```
RRF(doc) = Σ 1/(k + rank_i(doc))
```
where `k` is a constant (typically 60) and `rank_i(doc)` is the rank of the document in retrieval system `i`.
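To make the formula concrete, the following sketch fuses two small rankings by hand. It assumes 1-based ranks and `k = 60`, as in the formula above; Qdrant's internal implementation may differ in such details, and the function name `rrf_fuse` is hypothetical.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Combine ranked lists of document IDs using Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document missing from a ranking simply contributes nothing for it
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense_ranking = ["a", "b", "c"]   # order returned by semantic search
sparse_ranking = ["c", "a", "d"]  # order returned by BM25 search
fused = rrf_fuse([dense_ranking, sparse_ranking])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Document `a` ranks first because it sits near the top of both lists (1/61 + 1/62), ahead of `c` (1/63 + 1/61), while `b` and `d` each appear in only one list.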
**Characteristics:**
- **General-purpose**: Works well across diverse query types and document collections
- **Rank-based**: Focuses on relative rankings rather than absolute scores
- **Established**: Well-tested, documented, and understood in IR literature
- **Robust**: Less sensitive to score distribution differences between systems
**When to use RRF:**
- Default choice for most use cases
- When you have mixed query types (semantic + keyword)
- When retrieval systems have very different score ranges
- When you want predictable, well-understood behavior
#### Distribution-Based Score Fusion (DBSF)
**Alternative fusion method.** DBSF normalizes scores from each retrieval system using distribution statistics before combining them:
1. **Normalization**: For each query, calculates mean (μ) and standard deviation (σ) of scores
2. **Outlier handling**: Uses μ ± 3σ as normalization bounds
3. **Fusion**: Sums normalized scores across systems
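The three steps can be sketched in a few lines of Python. This is an approximation for intuition only: it assumes the population standard deviation, clamps normalized scores to [0, 1], and maps an all-equal score list to 0.5, none of which is confirmed to match Qdrant's implementation. The function names are hypothetical.

```python
from statistics import mean, pstdev


def dbsf_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores into [0, 1] using mean +/- 3 standard deviations as bounds."""
    if not scores:
        return {}
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    low, high = mu - 3 * sigma, mu + 3 * sigma
    if high == low:
        return {doc: 0.5 for doc in scores}  # assumption: all scores identical
    return {doc: min(max((s - low) / (high - low), 0.0), 1.0) for doc, s in scores.items()}


def dbsf_fuse(systems: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Sum the normalized scores of each document across retrieval systems."""
    fused: dict[str, float] = {}
    for scores in systems:
        for doc, value in dbsf_normalize(scores).items():
            fused[doc] = fused.get(doc, 0.0) + value
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


dense_scores = {"a": 0.9, "b": 0.8, "c": 0.1}    # cosine-like scores
sparse_scores = {"c": 12.0, "a": 7.0, "d": 2.0}  # BM25-like scores
dbsf_result = dbsf_fuse([dense_scores, sparse_scores])
print(dbsf_result)
```

Unlike RRF, the size of the gap between scores matters here: a document that wins one system by a wide margin keeps more of that advantage after fusion.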
**Characteristics:**
- **Score-aware**: Uses actual relevance scores, not just rankings
- **Statistical**: Normalizes based on score distribution properties
- ⚠️ **Experimental**: Newer algorithm, less battle-tested than RRF
- ⚠️ **Sensitive**: May behave differently depending on score distributions
**When to use DBSF:**
- When retrieval systems have vastly different score ranges that RRF doesn't balance well
- When you want to experiment with score-based (vs rank-based) fusion
- When statistical normalization better matches your use case
- For A/B testing against RRF to measure retrieval quality improvements
#### Configuration
Both fusion algorithms are exposed via the `fusion` parameter in MCP tools:
```python
# Use RRF (default)
response = await nc_semantic_search(
    query="async programming",
    fusion="rrf"  # Can be omitted, RRF is default
)

# Use DBSF
response = await nc_semantic_search(
    query="async programming",
    fusion="dbsf"
)
```
The `nc_semantic_search_answer` tool also supports the `fusion` parameter and passes it through to the underlying search.
#### Future: Configurable Weights
**Current limitation**: Neither RRF nor DBSF currently supports per-system weights (e.g., 0.8 for semantic, 0.2 for BM25). This is a Qdrant platform limitation tracked in [qdrant/qdrant#6067](https://github.com/qdrant/qdrant/issues/6067).
When Qdrant adds weight support, the `fusion` parameter can be extended to accept weight configurations:
```python
# Hypothetical future API
response = await nc_semantic_search(
    query="async programming",
    fusion="rrf",
    fusion_weights={"dense": 0.7, "sparse": 0.3}  # Not yet implemented
)
```
**Recommendation**: Start with RRF (default). If you encounter cases where keyword matches are under- or over-weighted, experiment with DBSF. Monitor [qdrant/qdrant#6067](https://github.com/qdrant/qdrant/issues/6067) for configurable weight support.
# ADR-015: Unified Provider Architecture for Embeddings and Text Generation
**Status:** Accepted
**Date:** 2025-01-16
**Deciders:** Development Team
**Related:** ADR-003 (Vector Database), ADR-008 (MCP Sampling), ADR-013 (RAG Evaluation)
## Context
Prior to this refactoring, the codebase had two separate provider systems:
1. **Embedding Providers** (`nextcloud_mcp_server/embedding/`)
- Used `EmbeddingProvider` ABC with methods: `embed()`, `embed_batch()`, `get_dimension()`
- Had auto-detection via `EmbeddingService._detect_provider()`
- Used for semantic search and vector indexing (production)
2. **LLM Providers** (`tests/rag_evaluation/llm_providers.py`)
- Used `LLMProvider` Protocol with method: `generate()`
- Had separate factory function `create_llm_provider()`
- Used only for RAG evaluation tests (not production)
This fragmentation created several problems:
### Problems with Dual Provider Systems
1. **Code Duplication**
- Ollama configuration appeared in both `embedding/service.py` and `tests/rag_evaluation/llm_providers.py`
- Similar provider detection logic in multiple places
- Separate singleton patterns for each system
2. **Limited Extensibility**
- Hard-coded provider detection in `EmbeddingService._detect_provider()`
- No support for providers that offer both capabilities (like Bedrock)
- Adding new providers required modifying multiple files
3. **Inconsistent Patterns**
- BM25 provider didn't follow `EmbeddingProvider` ABC
- Different method names across providers (`embed` vs `encode`)
- ABC vs Protocol for type checking
4. **Difficult Scaling**
- Adding Amazon Bedrock (our third provider) would exacerbate all issues
- No clear path for future providers (OpenAI, Cohere, etc.)
### Amazon Bedrock Requirements
Bedrock naturally supports **both** embeddings and text generation:
- **Embeddings**: `amazon.titan-embed-text-v1/v2`, `cohere.embed-*`
- **Text Generation**: `anthropic.claude-*`, `meta.llama3-*`, `amazon.titan-text-*`
- **Unified API**: Single `invoke_model()` method via bedrock-runtime
This made it the perfect opportunity to establish a unified provider architecture.
## Decision
We refactored the provider infrastructure to use a **unified Provider ABC** with optional capabilities:
### 1. Unified Provider Interface
**New Structure:**
```
nextcloud_mcp_server/providers/
├── __init__.py
├── base.py # Provider ABC with optional capabilities
├── registry.py # Auto-detection and factory
├── ollama.py # Supports both embedding + generation
├── anthropic.py # Generation only
├── bedrock.py # Supports both embedding + generation
└── simple.py # Embedding only (testing fallback)
```
**Base Class (`providers/base.py`):**
```python
from abc import ABC, abstractmethod


class Provider(ABC):
    @property
    @abstractmethod
    def supports_embeddings(self) -> bool:
        """Whether this provider supports embedding generation."""
        pass

    @property
    @abstractmethod
    def supports_generation(self) -> bool:
        """Whether this provider supports text generation."""
        pass

    @abstractmethod
    async def embed(self, text: str) -> list[float]:
        """Generate embedding (raises NotImplementedError if not supported)."""
        pass

    @abstractmethod
    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        """Generate batch embeddings (raises NotImplementedError if not supported)."""
        pass

    @abstractmethod
    def get_dimension(self) -> int:
        """Get embedding dimension (raises NotImplementedError if not supported)."""
        pass

    @abstractmethod
    async def generate(self, prompt: str, max_tokens: int = 500) -> str:
        """Generate text (raises NotImplementedError if not supported)."""
        pass

    @abstractmethod
    async def close(self) -> None:
        """Close provider and release resources."""
        pass
```
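To show how the optional capabilities are meant to be consumed, here is a minimal sketch of a generation-only provider. The `Provider` class below is a trimmed-down copy used only to keep the example self-contained, and `EchoProvider` and `describe` are hypothetical names that do not exist in the codebase.

```python
import asyncio
from abc import ABC, abstractmethod


class Provider(ABC):
    """Trimmed-down copy of the base class, just enough for this example."""

    @property
    @abstractmethod
    def supports_embeddings(self) -> bool: ...

    @property
    @abstractmethod
    def supports_generation(self) -> bool: ...

    @abstractmethod
    async def embed(self, text: str) -> list[float]: ...

    @abstractmethod
    async def generate(self, prompt: str, max_tokens: int = 500) -> str: ...


class EchoProvider(Provider):
    """Hypothetical generation-only provider used purely for illustration."""

    @property
    def supports_embeddings(self) -> bool:
        return False

    @property
    def supports_generation(self) -> bool:
        return True

    async def embed(self, text: str) -> list[float]:
        raise NotImplementedError("EchoProvider does not support embeddings")

    async def generate(self, prompt: str, max_tokens: int = 500) -> str:
        return prompt[:max_tokens]


async def describe(provider: Provider, text: str) -> str:
    # Check the capability flag first instead of catching NotImplementedError
    if provider.supports_embeddings:
        vector = await provider.embed(text)
        return f"embedded into {len(vector)} dimensions"
    return await provider.generate(text)


result = asyncio.run(describe(EchoProvider(), "hello"))
print(result)
```

Checking the capability property up front keeps the `NotImplementedError` path as a safety net rather than part of normal control flow.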
### 2. Provider Registry
**Auto-Detection Priority** (`providers/registry.py`):
```python
class ProviderRegistry:
    @staticmethod
    def create_provider() -> Provider:
        # 1. Bedrock (AWS_REGION or BEDROCK_*_MODEL)
        # 2. Ollama (OLLAMA_BASE_URL)
        # 3. Simple (fallback)
        ...
```
**Environment Variables:**
**Bedrock:**
- `AWS_REGION`: AWS region (e.g., "us-east-1")
- `AWS_ACCESS_KEY_ID`: AWS access key (optional, uses credential chain)
- `AWS_SECRET_ACCESS_KEY`: AWS secret key (optional)
- `BEDROCK_EMBEDDING_MODEL`: Model ID for embeddings (e.g., "amazon.titan-embed-text-v2:0")
- `BEDROCK_GENERATION_MODEL`: Model ID for text generation (e.g., "anthropic.claude-3-sonnet-20240229-v1:0")
**Ollama:**
- `OLLAMA_BASE_URL`: Ollama API base URL (e.g., "http://localhost:11434")
- `OLLAMA_EMBEDDING_MODEL`: Model for embeddings (default: "nomic-embed-text")
- `OLLAMA_GENERATION_MODEL`: Model for text generation (e.g., "llama3.2:1b")
- `OLLAMA_VERIFY_SSL`: Verify SSL certificates (default: "true")
**Simple (no configuration, fallback):**
- `SIMPLE_EMBEDDING_DIMENSION`: Embedding dimension (default: 384)
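The detection priority can be sketched as a pure function over the environment. This is an illustration of the ordering described above rather than the code in `registry.py`; the function name `detect_provider_name` and the returned string labels are assumptions.

```python
import os
from collections.abc import Mapping


def detect_provider_name(env: Mapping[str, str] = os.environ) -> str:
    """Return the provider that would be selected for the given environment."""
    # 1. Bedrock wins if a region or any Bedrock model is configured
    if (
        env.get("AWS_REGION")
        or env.get("BEDROCK_EMBEDDING_MODEL")
        or env.get("BEDROCK_GENERATION_MODEL")
    ):
        return "bedrock"
    # 2. Ollama is next if a base URL is set
    if env.get("OLLAMA_BASE_URL"):
        return "ollama"
    # 3. Otherwise fall back to the simple in-process provider
    return "simple"


print(detect_provider_name({"OLLAMA_BASE_URL": "http://localhost:11434"}))
```

Passing the environment as a mapping keeps the priority logic easy to unit test without patching `os.environ`.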
### 3. Backward Compatibility
**Old Code Continues to Work:**
```python
# Old way (still works)
from nextcloud_mcp_server.embedding import get_embedding_service
service = get_embedding_service() # Returns singleton Provider
embeddings = await service.embed_batch(texts)
```
**New Way (recommended):**
```python
# New way (cleaner)
from nextcloud_mcp_server.providers import get_provider

provider = get_provider()  # Returns singleton Provider
embeddings = await provider.embed_batch(texts)

# Can also use generation if provider supports it
if provider.supports_generation:
    text = await provider.generate("prompt")
```
**Migration Path:**
- `embedding/service.py` now wraps `providers.get_provider()` for compatibility
- `tests/rag_evaluation/llm_providers.py` now uses unified providers
- Old imports still work, marked as deprecated in docstrings
### 4. Amazon Bedrock Implementation
**Features:**
- Supports both embeddings and text generation
- Model-specific request/response handling for:
  - Titan Embed (amazon.titan-embed-text-*)
  - Cohere Embed (cohere.embed-*)
  - Claude (anthropic.claude-*)
  - Llama (meta.llama3-*)
  - Titan Text (amazon.titan-text-*)
  - Mistral (mistral.*)
- Uses boto3 bedrock-runtime client
- Graceful degradation if boto3 not installed
- Async implementation matching existing patterns
**Model-Specific Handling:**
```python
# Bedrock embedding request (Titan)
{"inputText": text}

# Bedrock generation request (Claude)
{
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": max_tokens,
    "temperature": 0.7,
    "messages": [{"role": "user", "content": prompt}]
}
```
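One way to organize this model-specific handling is to dispatch on the model ID prefix and serialize the payload to JSON, which is the form boto3's `invoke_model(modelId=..., body=...)` expects. The sketch below covers only the two request shapes shown above; `build_request_body` is a hypothetical helper, not the provider's actual method.

```python
import json


def build_request_body(model_id: str, text: str, max_tokens: int = 500) -> str:
    """Return the JSON request body for the given Bedrock model family."""
    if model_id.startswith("amazon.titan-embed-text"):
        payload = {"inputText": text}
    elif model_id.startswith("anthropic.claude"):
        payload = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "temperature": 0.7,
            "messages": [{"role": "user", "content": text}],
        }
    else:
        # Other families (Cohere, Llama, Titan Text, Mistral) need their own formats
        raise ValueError(f"No request format defined for model: {model_id}")
    return json.dumps(payload)


titan_body = build_request_body("amazon.titan-embed-text-v2:0", "hello world")
claude_body = build_request_body("anthropic.claude-3-sonnet-20240229-v1:0", "Summarize this", 200)
print(titan_body)
print(claude_body)
```

Failing loudly on an unknown prefix is a deliberate choice in this sketch, since sending the wrong payload shape to a model produces a validation error that is harder to trace.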
## Consequences
### Positive
1. **Sustainable Provider Additions**
- New providers only need to implement `Provider` ABC
- Auto-detection via environment variables
- No modifications to existing code required
2. **Code Consolidation**
- Single provider interface instead of two
- Unified configuration pattern
- Eliminated duplication
3. **Better Extensibility**
- Providers can support one or both capabilities
- Clear capability detection via properties
- Registry pattern simplifies auto-detection
4. **Improved Testing**
- RAG evaluation can use any provider (Ollama, Anthropic, Bedrock)
- Comprehensive unit tests for all providers
- Mocked boto3 tests for Bedrock
5. **Production-Ready Bedrock Support**
- Full embedding and generation support
- Multiple model families supported
- AWS credential chain integration
### Neutral
1. **Optional Boto3 Dependency**
- boto3 is a dev dependency only (not required for core functionality)
- Bedrock provider fails gracefully if boto3 is not installed
- Users who want Bedrock must install boto3 (`pip install boto3`)
2. **Capability Properties**
- All providers must implement capability properties
- Methods raise `NotImplementedError` if capability not supported
- Clear error messages guide users to alternatives
### Negative
1. **Migration Effort**
- Existing code must be migrated to new imports (optional, backward compatible)
- Documentation needs updating
- Users must learn new environment variables
2. **Increased Complexity**
- Provider base class has more methods (embedding + generation)
- More environment variables to configure
- Capability detection adds runtime checks
## Implementation
### Files Created
**New Provider Infrastructure:**
- `nextcloud_mcp_server/providers/__init__.py`
- `nextcloud_mcp_server/providers/base.py`
- `nextcloud_mcp_server/providers/registry.py`
- `nextcloud_mcp_server/providers/ollama.py`
- `nextcloud_mcp_server/providers/anthropic.py`
- `nextcloud_mcp_server/providers/bedrock.py`
- `nextcloud_mcp_server/providers/simple.py`
**Tests:**
- `tests/unit/providers/__init__.py`
- `tests/unit/providers/test_bedrock.py` (9 unit tests)
**Documentation:**
- `docs/ADR-015-unified-provider-architecture.md` (this file)
### Files Modified
**Backward Compatibility:**
- `nextcloud_mcp_server/embedding/service.py` - Now wraps `get_provider()`
- `tests/rag_evaluation/llm_providers.py` - Uses unified providers
**Dependencies:**
- `pyproject.toml` - Added `boto3>=1.35.0` to dev dependencies
### Testing Results
**Unit Tests:** 127 passed (including 9 new Bedrock tests)
**Type Checking:** All checks passed (ty)
**Linting:** All checks passed (ruff)
**Backward Compatibility:** Verified - existing embedding tests work
## Alternatives Considered
### Alternative 1: Keep Separate Provider Systems
**Pros:**
- No refactoring needed
- Simpler short-term
**Cons:**
- Bedrock would need to be implemented twice
- Continued code duplication
- No long-term scalability
**Decision:** Rejected - technical debt would continue to grow
### Alternative 2: Separate Embedding and Generation Providers
Use composition instead of unified interface:
```python
class CombinedProvider:
    def __init__(self, embedding: EmbeddingProvider, generation: LLMProvider):
        self.embedding = embedding
        self.generation = generation
```
**Pros:**
- Clearer separation of concerns
- Simpler individual providers
**Cons:**
- Bedrock and Ollama naturally do both - artificial separation
- More complex configuration (two providers to configure)
- More boilerplate code
**Decision:** Rejected - unified interface better matches provider capabilities
### Alternative 3: Plugin System
Dynamic provider registration via entry points:
```python
# setup.py
entry_points={
    'nextcloud_mcp.providers': [
        'ollama = nextcloud_mcp_server.providers.ollama:OllamaProvider',
        'bedrock = nextcloud_mcp_server.providers.bedrock:BedrockProvider',
    ]
}
```
**Pros:**
- Most extensible
- Third-party providers possible
**Cons:**
- Over-engineered for current needs
- Added complexity
- No immediate benefit
**Decision:** Deferred - can add later if needed
## Future Work
1. **Additional Providers**
- OpenAI (embeddings + generation)
- Cohere (embeddings + generation)
- Google Vertex AI
- Azure OpenAI
2. **Provider Features**
- Streaming generation support
- Batch API optimization (when available)
- Model-specific optimizations
- Cost tracking and metrics
3. **Configuration Improvements**
- Provider profiles (development, production)
- Model aliasing (e.g., "small", "large")
- Fallback provider chains
4. **Testing**
- Integration tests with real Bedrock endpoints
- Performance benchmarking across providers
- Cost comparison analysis
## References
- [boto3 Bedrock Runtime Documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime.html)
- [Amazon Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html)
- ADR-003: Vector Database and Semantic Search
- ADR-008: MCP Sampling for Semantic Search
- ADR-013: RAG Evaluation Framework
# Amazon Bedrock Setup Guide
This guide covers how to configure the Nextcloud MCP Server to use Amazon Bedrock for embeddings and text generation.
## Prerequisites
1. **AWS Account** with access to Amazon Bedrock
2. **boto3 library** installed: `pip install boto3` or `uv sync --group dev`
3. **Model Access** - Request access to models in AWS Bedrock console
## Required AWS Permissions
### IAM Policy for Bedrock Access
The AWS IAM user or role needs the following permissions:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BedrockInvokeModels",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/*"
      ]
    }
  ]
}
```
### Minimal Permissions (Production)
For production deployments, restrict to specific models:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BedrockEmbeddings",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel"
      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
      ]
    },
    {
      "Sid": "BedrockGeneration",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel"
      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
      ]
    }
  ]
}
```
### Additional Permissions (Optional)
For advanced use cases:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BedrockListModels",
      "Effect": "Allow",
      "Action": [
        "bedrock:ListFoundationModels",
        "bedrock:GetFoundationModel"
      ],
      "Resource": "*"
    },
    {
      "Sid": "BedrockAsyncInvoke",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModelAsync",
        "bedrock:GetAsyncInvoke",
        "bedrock:ListAsyncInvokes"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/*"
      ]
    }
  ]
}
```
## Model Access
Before using Bedrock models, you must request access in the AWS Console:
1. Navigate to **Amazon Bedrock** → **Model access**
2. Click **Manage model access**
3. Select models you want to use:
   - **Embeddings:** Amazon Titan Embed Text, Cohere Embed
   - **Text Generation:** Anthropic Claude, Meta Llama, Amazon Titan Text
4. Click **Request model access**
5. Wait for approval (usually instant for most models)
## Supported Models
### Embedding Models
| Provider | Model ID | Dimensions | Best For |
|----------|----------|------------|----------|
| Amazon Titan | `amazon.titan-embed-text-v1` | 1,536 | General purpose |
| Amazon Titan | `amazon.titan-embed-text-v2:0` | 1,024 | Latest, improved quality |
| Cohere | `cohere.embed-english-v3` | 1,024 | English text |
| Cohere | `cohere.embed-multilingual-v3` | 1,024 | Multilingual |
### Text Generation Models
| Provider | Model ID | Context | Best For |
|----------|----------|---------|----------|
| Anthropic | `anthropic.claude-3-sonnet-20240229-v1:0` | 200K | Balanced performance |
| Anthropic | `anthropic.claude-3-haiku-20240307-v1:0` | 200K | Fast, cost-effective |
| Anthropic | `anthropic.claude-3-opus-20240229-v1:0` | 200K | Highest quality |
| Meta | `meta.llama3-8b-instruct-v1:0` | 8K | Fast, open-source |
| Meta | `meta.llama3-70b-instruct-v1:0` | 8K | High quality |
| Amazon | `amazon.titan-text-express-v1` | 8K | Fast, low cost |
| Mistral | `mistral.mistral-7b-instruct-v0:2` | 32K | Efficient |
## Configuration
### Environment Variables
**Required:**
```bash
AWS_REGION=us-east-1
```
**Optional (at least one model required):**
```bash
# For embeddings
BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
# For text generation (RAG evaluation)
BEDROCK_GENERATION_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
```
**AWS Credentials (choose one method):**
**Method 1: Environment Variables**
```bash
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```
**Method 2: AWS Credentials File** (`~/.aws/credentials`)
```ini
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```
**Method 3: IAM Role** (when running on AWS EC2/ECS/Lambda)
- No credentials needed, uses instance/task role automatically
### Docker Configuration
Add to your `docker-compose.yml`:
```yaml
services:
  mcp:
    environment:
      - AWS_REGION=us-east-1
      - BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
      - BEDROCK_GENERATION_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
```
Or use AWS credentials file volume mount:
```yaml
services:
  mcp:
    volumes:
      - ~/.aws:/root/.aws:ro
    environment:
      - AWS_REGION=us-east-1
      - BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
```
## Usage Examples
### Embeddings Only
```bash
export AWS_REGION=us-east-1
export BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
export AWS_ACCESS_KEY_ID=your-key
export AWS_SECRET_ACCESS_KEY=your-secret
uv run nextcloud-mcp-server
```
### Both Embeddings and Generation
```bash
export AWS_REGION=us-east-1
export BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
export BEDROCK_GENERATION_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
# For RAG evaluation with Bedrock
export RAG_EVAL_PROVIDER=bedrock
export RAG_EVAL_BEDROCK_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
uv run python -m tests.rag_evaluation.evaluate
```
### Programmatic Usage
```python
import asyncio

from nextcloud_mcp_server.providers import BedrockProvider


async def main() -> None:
    # Embeddings only
    provider = BedrockProvider(
        region_name="us-east-1",
        embedding_model="amazon.titan-embed-text-v2:0",
    )
    embeddings = await provider.embed_batch(["text1", "text2"])

    # Both capabilities
    provider = BedrockProvider(
        region_name="us-east-1",
        embedding_model="amazon.titan-embed-text-v2:0",
        generation_model="anthropic.claude-3-sonnet-20240229-v1:0",
    )

    # Generate embeddings
    embedding = await provider.embed("query text")

    # Generate text
    response = await provider.generate("Write a summary", max_tokens=500)


# The provider methods are coroutines, so they must run inside an event loop
asyncio.run(main())
```
## Cost Considerations
### Embedding Costs (as of Jan 2025)
| Model | Price per 1K tokens |
|-------|---------------------|
| Titan Embed Text v2 | $0.0001 |
| Cohere Embed English v3 | $0.0001 |
### Generation Costs (as of Jan 2025)
| Model | Input (per 1K tokens) | Output (per 1K tokens) |
|-------|----------------------|------------------------|
| Claude 3 Haiku | $0.00025 | $0.00125 |
| Claude 3 Sonnet | $0.003 | $0.015 |
| Claude 3 Opus | $0.015 | $0.075 |
| Llama 3 8B | $0.0003 | $0.0006 |
| Titan Text Express | $0.0002 | $0.0006 |
**Note:** Prices vary by region. Check [AWS Bedrock Pricing](https://aws.amazon.com/bedrock/pricing/) for current rates.
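As a rough worked example of the tables above, the following sketch estimates what a workload might cost. The rates are copied from the tables (Jan 2025); the short model labels, function names, and token counts are illustrative assumptions, not Bedrock model IDs or part of this project.

```python
# Per-1K-token rates copied from the tables above (Jan 2025).
# The dictionary keys are hypothetical short labels, not Bedrock model IDs.
EMBEDDING_RATES = {
    "titan-embed-text-v2": 0.0001,
    "cohere-embed-english-v3": 0.0001,
}
GENERATION_RATES = {  # (input rate, output rate)
    "claude-3-haiku": (0.00025, 0.00125),
    "claude-3-sonnet": (0.003, 0.015),
    "claude-3-opus": (0.015, 0.075),
}


def embedding_cost(model: str, tokens: int) -> float:
    """Cost in USD of embedding `tokens` tokens."""
    return tokens / 1000 * EMBEDDING_RATES[model]


def generation_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one generation call."""
    in_rate, out_rate = GENERATION_RATES[model]
    return input_tokens / 1000 * in_rate + output_tokens / 1000 * out_rate


# Hypothetical workload: index 1M tokens once, then answer one question
index_cost = embedding_cost("titan-embed-text-v2", 1_000_000)
answer_cost = generation_cost("claude-3-sonnet", 2_000, 500)
print(f"Embedding 1M tokens: ${index_cost:.4f}")
print(f"One Sonnet answer:   ${answer_cost:.4f}")
```

At these rates, embedding is usually a small part of the bill; generation output tokens dominate for the larger Claude models.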
## Troubleshooting
### Error: "Executable doesn't exist" or boto3 not found
**Solution:**
```bash
uv sync --group dev # Installs boto3
```
### Error: "AccessDeniedException"
**Causes:**
1. IAM permissions missing
2. Model access not requested
3. Wrong AWS region
**Solution:**
1. Verify IAM policy includes `bedrock:InvokeModel`
2. Request model access in Bedrock console
3. Check model is available in your region
### Error: "ResourceNotFoundException"
**Cause:** Invalid model ID or model not available in region
**Solution:**
- Verify model ID matches exactly (case-sensitive)
- Check model availability in your AWS region
- Use `aws bedrock list-foundation-models` to see available models
### Error: "ThrottlingException"
**Cause:** Rate limit exceeded
**Solution:**
- Reduce request rate
- Request quota increase via AWS Support
- Use batch operations where possible
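One common way to reduce the request rate is to retry throttled calls with exponential backoff and jitter. The sketch below is a generic, synchronous illustration: `ThrottledError` and `call_with_backoff` are hypothetical names, not part of boto3 or this project. In real code you would more likely catch `botocore.exceptions.ClientError` and check whether the error code is `ThrottlingException`.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error raised by the service."""


def call_with_backoff(func, max_attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call `func`, retrying on ThrottledError with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # The delay cap doubles each attempt: 0.5s, 1s, 2s, ... up to max_delay
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))


# Demo: a function that is throttled twice before succeeding
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ThrottledError("rate limit exceeded")
    return "ok"


# Pass a no-op sleep so the demo finishes instantly
result = call_with_backoff(flaky, sleep=lambda seconds: None)
print(result, "after", calls["n"], "calls")
```

The random jitter spreads out retries from concurrent workers so they do not all hit the API again at the same moment.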
## Security Best Practices
1. **Use IAM Roles** when running on AWS infrastructure
2. **Rotate Access Keys** regularly if using IAM users
3. **Restrict Permissions** to only required models
4. **Enable CloudTrail** for audit logging
5. **Use AWS Secrets Manager** for credential management
6. **Monitor Costs** with AWS Cost Explorer and Budgets
## Regional Availability
Amazon Bedrock is available in several regions, including:
- **US East (N. Virginia)**: `us-east-1` ✅ Most models
- **US West (Oregon)**: `us-west-2` ✅ Most models
- **Asia Pacific (Singapore)**: `ap-southeast-1`
- **Asia Pacific (Tokyo)**: `ap-northeast-1`
- **Europe (Frankfurt)**: `eu-central-1`
**Note:** Model availability varies by region. Check the [AWS Bedrock documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html) for current availability.
## References
- [AWS Bedrock Documentation](https://docs.aws.amazon.com/bedrock/)
- [AWS Bedrock Pricing](https://aws.amazon.com/bedrock/pricing/)
- [boto3 Bedrock Runtime API](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime.html)
- [Provider Architecture ADR](./ADR-015-unified-provider-architecture.md)
Binary file not shown. Size: 83 KiB
Binary file not shown. Size: 82 KiB
Binary file not shown. Size: 282 KiB
Binary file not shown. Size: 143 KiB
Binary file not shown. Size: 244 KiB
Binary file not shown. Size: 483 KiB

+1 -1
@@ -243,7 +243,7 @@ If you see cardinality warnings:
The observability stack integrates at multiple layers:
1. **HTTP Layer**: `ObservabilityMiddleware` tracks all HTTP requests
-2. **MCP Layer**: Tools use `@trace_mcp_tool` for span creation
+2. **MCP Layer**: Tools use `@instrument_tool` for automatic metrics and trace span creation
3. **Client Layer**: `BaseNextcloudClient` tracks all API calls
4. **OAuth Layer**: Token operations are traced and metered
5. **Background Tasks**: Vector sync operations emit metrics/traces
+93
@@ -0,0 +1,93 @@
# Vector Sync UI Guide
This guide covers the browser-based interface for the Nextcloud MCP Server's semantic search and vector synchronization features.
## Overview
The Vector Sync UI (`/app`) provides an interactive interface to test semantic search queries and visualize results from your Nextcloud documents. It exposes the same retrieval capabilities that LLMs use in Retrieval-Augmented Generation (RAG) workflows. The interface is built with Alpine.js for reactive state, htmx for dynamic updates, and Plotly.js for 3D visualization.
**Supported Apps**: Notes, Files (text/PDF), Calendar (events/tasks), Contacts (CardDAV), and Deck are indexed and searchable.
## Accessing the UI
Navigate to `/app` after authentication:
- **BasicAuth mode**: `http://localhost:8000/app` (uses credentials from environment)
- **OAuth mode**: `http://localhost:8000/app` (redirects to login if not authenticated)
## Tabs
### Welcome Page
Landing page that introduces semantic search and RAG workflows. Shows authentication status, explains how vector embeddings work, and provides feature navigation. The content adapts depending on whether `VECTOR_SYNC_ENABLED` is set to `true`.
### User Info
Displays authentication details and session information:
- **BasicAuth**: Username, mode badge, Nextcloud host
- **OAuth**: Username, session ID (truncated), background access status, IdP profile, revocation option
### Vector Sync Status
Real-time monitoring of document indexing:
- **Indexed Documents**: Total chunks stored in Qdrant vector database (immediately searchable)
- **Pending Documents**: Queue awaiting embedding processing
- **Status**: "✓ Idle" (green) when up-to-date, "⟳ Syncing" (orange) during processing
Auto-refreshes every 10 seconds via htmx. Check this tab after adding content to verify indexing completion.
### Vector Visualization
Interactive search interface with 3D PCA plot of semantic space.
**Search Controls**:
- **Query**: Natural language search (e.g., "health benefits of coffee")
- **Algorithm**: Semantic (Dense) for pure vector search, or BM25 Hybrid (default) combining vectors + keywords
- **Fusion** (Hybrid only): RRF (Reciprocal Rank Fusion) or DBSF (Distribution-Based Score Fusion)
- **Advanced**: Filter by document type, adjust score threshold (0.0-1.0), set result limit (max 100)
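To make the RRF option above concrete, here is a minimal sketch of the textbook Reciprocal Rank Fusion formula, where each document scores the sum of `1 / (k + rank)` over the rankings it appears in. The function name, the document IDs, and the constant `k=60` are assumptions for illustration; the fusion that the search backend actually performs may use different constants or details.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list of (doc_id, score)."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1)
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from the dense (semantic) and BM25 (keyword) searches
dense = ["note_12", "file_7", "note_3"]
bm25 = ["file_7", "note_3", "deck_1"]

fused = reciprocal_rank_fusion([dense, bm25])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Documents that appear in both lists (`file_7`, `note_3`) rise above documents that rank first in only one list, which is the behavior that makes RRF useful when the two scores are not on a comparable scale.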
**3D Visualization**:
The plot uses Principal Component Analysis (PCA) to reduce 768-dimensional embeddings to 3D. Documents are positioned by semantic similarity with the query point shown in red. Point size and opacity indicate relevance, and the Viridis color scale shows relative scores (yellow = highest match).
**Normalization**: Vectors are L2-normalized before PCA so that distances in the plot are consistent with Qdrant's cosine distance and the query point lands near similar documents. Without normalization, differences in vector magnitude cause misleading spatial separation.
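The effect of L2 normalization can be illustrated with a small pure-Python sketch. It is not the server's PCA code; the helper names and the toy 3-dimensional vectors are assumptions. It shows that two vectors pointing in the same direction are far apart in raw Euclidean distance but coincide once normalized, and that for unit vectors the squared Euclidean distance equals `2 - 2 * cosine_similarity`.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = [1.0, 2.0, 2.0]      # toy "query embedding"
doc = [10.0, 20.0, 20.0]     # same direction, 10x the magnitude
other = [2.0, 1.0, -2.0]     # orthogonal to the query

raw_gap = euclidean(query, doc)
unit_gap = euclidean(l2_normalize(query), l2_normalize(doc))
print(f"raw distance: {raw_gap:.3f}, normalized distance: {unit_gap:.3f}")

# For unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
lhs = euclidean(l2_normalize(query), l2_normalize(other)) ** 2
rhs = 2 - 2 * cosine(query, other)
print(f"squared distance: {lhs:.3f}, 2 - 2cos: {rhs:.3f}")
```

Because PCA operates on Euclidean geometry, normalizing first means the projection reflects direction only, which is what cosine distance measures.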
**Results List**:
Each result shows document title (clickable link to Nextcloud), excerpt, raw score, relative percentage, and document type. Click "Show Chunk" to view the matched text segment with surrounding context (up to 500 characters before/after).
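The "Show Chunk" behavior can be sketched as a slice around the matched offsets. This is an illustration under assumptions: the function name and the shape of the returned dictionary are hypothetical, and the real `/app/chunk-context` response is not documented here.

```python
def chunk_with_context(text, start, end, context=500):
    """Return the matched chunk plus up to `context` characters on each side."""
    before_start = max(0, start - context)
    after_end = min(len(text), end + context)
    return {
        "before": text[before_start:start],
        "matched": text[start:end],
        "after": text[end:after_end],
        # Flags a UI could use to decide whether to draw a leading/trailing ellipsis
        "has_more_before": before_start > 0,
        "has_more_after": after_end < len(text),
    }


# Toy document: 600 filler characters, an 18-character match, 100 filler characters
doc_text = "A" * 600 + "coffee lowers risk" + "B" * 100
result = chunk_with_context(doc_text, start=600, end=618)

print(len(result["before"]), repr(result["matched"]), len(result["after"]))
print(result["has_more_before"], result["has_more_after"])
```

Clamping with `max` and `min` keeps the slice valid when the chunk sits near the start or end of the document.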
## Configuration
**Required**:
```bash
VECTOR_SYNC_ENABLED=true
```
**Optional** (for browser-accessible links):
```bash
NEXTCLOUD_PUBLIC_ISSUER_URL=https://your-public-nextcloud-url.com
```
**Admin Access**: The Webhooks tab is visible only to Nextcloud admins (verified via the Provisioning API).
## Use Cases
**Testing Search Queries**: Preview results before they reach LLMs in RAG workflows. Compare semantic vs. hybrid algorithms, verify relevance scores, and validate that correct documents are retrieved. Use chunk context to see exactly which text segments match and why unexpected documents appear.
**Monitoring Indexing**: Track real-time progress after creating or modifying documents. Check if the queue is backing up (high pending count) or confirm the system is idle after bulk imports. Verify documents become searchable immediately after indexing completes.
**Algorithm Comparison**: Pure semantic search excels at conceptual queries and synonyms. BM25 hybrid combines semantic understanding with precise keyword matching for better accuracy on specific terms. Experiment with RRF vs. DBSF fusion for different score distributions.
## Troubleshooting
**Vector Sync Tab Not Visible**: Set `VECTOR_SYNC_ENABLED=true` and restart the server.
**No Search Results**: Check Vector Sync Status to confirm documents are indexed (not just pending). Try broader queries or lower the score threshold in Advanced options. Initial indexing may take time depending on document volume.
**Links to Nextcloud Apps Not Working**: Set `NEXTCLOUD_PUBLIC_ISSUER_URL` to your browser-accessible Nextcloud URL for correct link generation.
## Related Documentation
- [Configuration Guide](../configuration.md) - Environment variables and settings
- [Authentication Modes](../authentication.md) - BasicAuth vs OAuth setup
- [Installation Guide](../installation.md) - Getting started
- [ADR-008: MCP Sampling for RAG](../ADR-008-mcp-sampling-for-rag.md) - Technical details on RAG workflows
+23 -8
@@ -24,6 +24,7 @@ from starlette.middleware.authentication import AuthenticationMiddleware
from starlette.middleware.cors import CORSMiddleware
from starlette.responses import JSONResponse, RedirectResponse
from starlette.routing import Mount, Route
from starlette.staticfiles import StaticFiles
from nextcloud_mcp_server.auth import (
InsufficientScopeError,
@@ -446,7 +447,7 @@ async def app_lifespan_basic(server: FastMCP) -> AsyncIterator[AppContext]:
# Start background tasks using anyio TaskGroup
async with anyio.create_task_group() as tg:
# Start scanner task
-tg.start_soon(
+await tg.start(
scanner_task,
send_stream,
shutdown_event,
@@ -457,7 +458,7 @@ async def app_lifespan_basic(server: FastMCP) -> AsyncIterator[AppContext]:
# Start processor pool (each gets a cloned receive stream)
for i in range(settings.vector_sync_processor_workers):
-tg.start_soon(
+await tg.start(
processor_task,
i,
receive_stream.clone(),
@@ -1147,7 +1148,7 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
# Start background tasks using anyio TaskGroup
async with anyio_module.create_task_group() as tg:
# Start scanner task
-tg.start_soon(
+await tg.start(
scanner_task,
send_stream,
shutdown_event,
@@ -1158,7 +1159,7 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
# Start processor pool (each gets a cloned receive stream)
for i in range(settings.vector_sync_processor_workers):
-tg.start_soon(
+await tg.start(
processor_task,
i,
receive_stream.clone(),
@@ -1478,6 +1479,7 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
vector_sync_status_fragment,
)
from nextcloud_mcp_server.auth.viz_routes import (
chunk_context_endpoint,
vector_visualization_html,
vector_visualization_search,
)
@@ -1490,7 +1492,7 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
# Create a separate Starlette app for browser routes that need session auth
# This prevents SessionAuthBackend from interfering with FastMCP's OAuth
browser_routes = [
-Route("/", user_info_html, methods=["GET"]), # /app → webapp (HTML UI)
+Route("/", user_info_html, methods=["GET"]), # /app → user info with all tabs
Route(
"/revoke", revoke_session, methods=["POST"], name="revoke_session_endpoint"
), # /app/revoke → revoke_session
@@ -1509,6 +1511,11 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
vector_visualization_search,
methods=["GET"],
), # /app/vector-viz/search
Route(
"/chunk-context",
chunk_context_endpoint,
methods=["GET"],
), # /app/chunk-context
# Webhook management routes (admin-only)
Route("/webhooks", webhook_management_pane, methods=["GET"]), # /app/webhooks
Route(
@@ -1521,9 +1528,17 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
),
]
# Add static files mount if directory exists
static_dir = os.path.join(os.path.dirname(__file__), "auth", "static")
if os.path.isdir(static_dir):
browser_routes.append(
Mount("/static", StaticFiles(directory=static_dir), name="static")
)
logger.info(f"Mounted static files from {static_dir}")
browser_app = Starlette(routes=browser_routes)
browser_app.add_middleware(
-AuthenticationMiddleware,
+AuthenticationMiddleware, # type: ignore[invalid-argument-type]
backend=SessionAuthBackend(oauth_enabled=oauth_enabled),
)
@@ -1613,7 +1628,7 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
# Add CORS middleware to allow browser-based clients like MCP Inspector
app.add_middleware(
-CORSMiddleware,
+CORSMiddleware, # type: ignore[invalid-argument-type]
allow_origins=["*"], # Allow all origins for development
allow_credentials=True,
allow_methods=["*"],
@@ -1623,7 +1638,7 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
# Add observability middleware (metrics + tracing)
if settings.metrics_enabled or settings.otel_exporter_otlp_endpoint:
-app.add_middleware(ObservabilityMiddleware)
+app.add_middleware(ObservabilityMiddleware) # type: ignore[invalid-argument-type]
logger.info("Observability middleware enabled (metrics and/or tracing)")
# Add exception handler for scope challenges (OAuth mode only)
Binary file not shown. Size: 18 KiB

@@ -0,0 +1,192 @@
.viz-layout {
display: flex;
flex-direction: column;
gap: 16px;
height: 100%;
min-height: 0;
overflow-y: auto;
}
.viz-card {
background: var(--color-main-background);
border-radius: 0;
padding: 16px;
box-shadow: none;
}
.viz-controls-card {
flex: 0 0 auto;
border-bottom: 1px solid var(--color-border);
padding-bottom: 16px;
}
.viz-controls-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
gap: 12px;
align-items: end;
}
@media (min-width: 768px) {
.viz-controls-grid {
grid-template-columns: 2fr 1.5fr 1.5fr auto auto;
}
}
.viz-control-group {
display: flex;
flex-direction: column;
gap: 4px;
}
.viz-control-group label {
font-weight: 500;
color: var(--color-main-text);
font-size: 13px;
}
.viz-control-group input[type="text"],
.viz-control-group input[type="number"],
.viz-control-group select {
width: 100%;
padding: 7px 10px;
border: 1px solid var(--color-border-dark);
border-radius: var(--border-radius);
font-size: 14px;
background: var(--color-main-background);
color: var(--color-main-text);
}
.viz-control-group input:focus,
.viz-control-group select:focus {
outline: none;
border-color: var(--color-primary-element);
}
.viz-control-group input[type="range"] {
width: 100%;
}
.viz-control-group select[multiple] {
min-height: 100px;
}
.viz-weight-display {
display: inline-block;
min-width: 40px;
text-align: right;
color: #666;
}
.viz-btn {
background: var(--color-primary-element);
color: white;
border: none;
padding: 7px 16px;
border-radius: var(--border-radius);
cursor: pointer;
font-size: 14px;
font-weight: 500;
white-space: nowrap;
}
.viz-btn:hover {
background: #0052a3;
}
.viz-btn-secondary {
background: #6c757d;
color: white;
border: none;
padding: 7px 16px;
border-radius: var(--border-radius);
cursor: pointer;
font-size: 14px;
white-space: nowrap;
}
.viz-btn-secondary:hover {
background: #5a6268;
}
.viz-card-plot {
flex: 0 0 auto;
display: flex;
flex-direction: column;
min-height: 500px;
height: 600px;
/* Remove horizontal padding to extend to full viewport width */
padding-left: 0;
padding-right: 0;
margin-left: -16px;
margin-right: -16px;
}
#viz-plot-container {
width: 100%;
height: 100%;
position: relative;
overflow: visible;
}
#viz-plot {
width: 100%;
height: 100%;
}
.viz-loading {
text-align: center;
padding: 40px;
color: #666;
}
.viz-loading-overlay {
position: absolute;
inset: 0;
display: flex;
align-items: center;
justify-content: center;
background: white;
color: #666;
}
.viz-no-results {
text-align: center;
padding: 40px;
color: #666;
font-style: italic;
}
.viz-advanced-section {
margin-top: 12px;
padding: 12px;
background: var(--color-background-hover);
border-radius: var(--border-radius);
border: 1px solid var(--color-border);
}
.viz-info-box {
background: var(--color-primary-element-light);
border-left: 3px solid var(--color-primary-element);
padding: 10px 12px;
margin-bottom: 16px;
font-size: 13px;
color: var(--color-main-text);
}
.chunk-toggle-btn {
background: #6c757d;
color: white;
border: none;
padding: 4px 10px;
border-radius: 3px;
cursor: pointer;
font-size: 12px;
margin-top: 6px;
}
.chunk-toggle-btn:hover {
background: #5a6268;
}
.chunk-context {
background: var(--color-background-hover);
border: 1px solid var(--color-border);
border-radius: var(--border-radius);
padding: 12px;
margin-top: 8px;
font-family: 'SFMono-Regular', 'Consolas', 'Liberation Mono', 'Menlo', monospace;
font-size: 13px;
line-height: 1.6;
white-space: pre-wrap;
word-wrap: break-word;
}
.chunk-text {
color: var(--color-text-maxcontrast);
}
.chunk-matched {
background: #fff3cd;
border: 1px solid #ffc107;
padding: 2px 4px;
border-radius: var(--border-radius);
font-weight: 500;
color: var(--color-main-text);
}
.chunk-ellipsis {
color: var(--color-text-maxcontrast);
font-style: italic;
}
@@ -0,0 +1,253 @@
// Initialize vizApp for vector visualization
function vizApp() {
return {
query: '',
algorithm: 'bm25_hybrid',
fusion: 'rrf',
showAdvanced: false,
showQueryPoint: true,
docTypes: [''],
limit: 50,
scoreThreshold: 0.0,
loading: false,
results: [],
coordinates: null,
queryCoords: null,
expandedChunks: {},
chunkLoading: {},
init() {
// Set up window resize listener to resize plot
window.addEventListener('resize', () => {
if (this.coordinates && this.results.length > 0) {
Plotly.Plots.resize('viz-plot');
}
});
},
async executeSearch() {
this.loading = true;
this.results = [];
try {
const params = new URLSearchParams({
query: this.query,
algorithm: this.algorithm,
limit: this.limit,
score_threshold: this.scoreThreshold,
});
if (this.algorithm === 'bm25_hybrid') {
params.append('fusion', this.fusion);
}
const selectedTypes = this.docTypes.filter(t => t !== '');
if (selectedTypes.length > 0) {
params.append('doc_types', selectedTypes.join(','));
}
const response = await fetch(`/app/vector-viz/search?${params}`);
const data = await response.json();
if (data.success) {
this.results = data.results;
this.coordinates = data.coordinates_3d;
this.queryCoords = data.query_coords;
this.renderPlot(this.coordinates, this.queryCoords, this.results);
} else {
alert('Search failed: ' + data.error);
}
} catch (error) {
alert('Error: ' + error.message);
} finally {
this.loading = false;
}
},
updatePlot() {
// Toggle query point visibility without recreating the plot
// This preserves camera position naturally since layout is untouched
if (this.coordinates && this.queryCoords && this.results.length > 0) {
const plotDiv = document.getElementById('viz-plot');
// If plot exists, just toggle the query trace visibility
if (plotDiv && plotDiv.data && plotDiv.data.length >= 2) {
// Trace index 1 is the query point
Plotly.restyle('viz-plot', { visible: this.showQueryPoint }, [1]);
} else {
// Plot doesn't exist yet, render it
this.renderPlot(this.coordinates, this.queryCoords, this.results);
}
}
},
renderPlot(coordinates, queryCoords, results) {
// Get container dimensions before creating layout
const container = document.getElementById('viz-plot-container');
const width = container.clientWidth;
const height = container.clientHeight;
const scores = results.map(r => r.score);
// Trace index 0: Document results (always visible)
const documentTrace = {
x: coordinates.map(c => c[0]),
y: coordinates.map(c => c[1]),
z: coordinates.map(c => c[2]),
mode: 'markers',
type: 'scatter3d',
name: 'Documents',
visible: true,
customdata: results.map((r, i) => ({
title: r.title,
raw_score: r.original_score,
relative_score: r.score,
x: coordinates[i][0],
y: coordinates[i][1],
z: coordinates[i][2]
})),
hovertemplate:
'<b>%{customdata.title}</b><br>' +
'Raw Score: %{customdata.raw_score:.3f} (%{customdata.relative_score:.0%} relative)<br>' +
'(x=%{customdata.x}, y=%{customdata.y}, z=%{customdata.z})' +
'<extra></extra>',
marker: {
size: results.map(r => 4 + (Math.pow(r.score, 2) * 10)),
opacity: results.map(r => 0.3 + (r.score * 0.7)),
color: scores,
colorscale: 'Viridis',
showscale: true,
colorbar: {
title: 'Relative Score',
x: 1.02,
xanchor: 'left',
thickness: 20,
len: 0.8
},
cmin: 0,
cmax: 1
}
};
// Trace index 1: Query point (visibility controlled by toggle)
const queryTrace = {
x: [queryCoords[0]],
y: [queryCoords[1]],
z: [queryCoords[2]],
mode: 'markers',
type: 'scatter3d',
name: 'Query',
visible: this.showQueryPoint, // Initial visibility from state
hovertemplate:
'<b>Search Query</b><br>' +
`(x=${queryCoords[0]}, y=${queryCoords[1]}, z=${queryCoords[2]})` +
'<extra></extra>',
marker: {
size: 10,
color: '#ef5350', // Subdued red (Material Design Red 400)
line: {
color: '#c62828', // Darker red border (Material Design Red 800)
width: 1
}
}
};
const layout = {
title: `Vector Space (PCA 3D) - ${results.length} results`,
width: width, // Explicit width from container
height: height, // Explicit height from container
scene: {
xaxis: { title: 'PC1' },
yaxis: { title: 'PC2' },
zaxis: { title: 'PC3' },
camera: {
eye: { x: 1.5, y: 1.5, z: 1.5 }
},
// Full width for 3D scene
domain: {
x: [0, 1],
y: [0, 1]
}
},
hovermode: 'closest',
autosize: true, // Enable auto-sizing for window resizes
showlegend: false, // Hide legend
margin: { l: 0, r: 100, t: 40, b: 0 } // Right margin for colorbar
};
// Always render both traces - visibility is controlled by the visible property
const traces = [documentTrace, queryTrace];
// Enable responsive resizing
const config = {
responsive: true,
displayModeBar: true
};
// Use newPlot() with explicit dimensions - renders at correct size immediately
// Camera position will be preserved by subsequent Plotly.restyle() calls in updatePlot()
Plotly.newPlot('viz-plot', traces, layout, config);
},
getNextcloudUrl(result) {
// Use global NEXTCLOUD_BASE_URL if set; otherwise fall back to relative URLs on the current origin
const baseUrl = window.NEXTCLOUD_BASE_URL || '';
switch (result.doc_type) {
case 'note':
return `${baseUrl}/apps/notes/note/${result.id}`;
case 'file':
return `${baseUrl}/apps/files/?fileId=${result.id}`;
case 'calendar':
return `${baseUrl}/apps/calendar`;
case 'contact':
return `${baseUrl}/apps/contacts`;
case 'deck':
return `${baseUrl}/apps/deck`;
default:
return `${baseUrl}`;
}
},
hasChunkPosition(result) {
return result.chunk_start_offset != null && result.chunk_end_offset != null;
},
isChunkExpanded(resultKey) {
return this.expandedChunks[resultKey] !== undefined;
},
async toggleChunk(result) {
const resultKey = `${result.doc_type}_${result.id}`;
if (this.isChunkExpanded(resultKey)) {
delete this.expandedChunks[resultKey];
return;
}
this.chunkLoading[resultKey] = true;
try {
const params = new URLSearchParams({
doc_type: result.doc_type,
doc_id: result.id,
start: result.chunk_start_offset,
end: result.chunk_end_offset,
context: 500
});
const response = await fetch(`/app/chunk-context?${params}`);
const data = await response.json();
if (data.success) {
this.expandedChunks[resultKey] = data;
} else {
alert('Failed to load chunk: ' + data.error);
}
} catch (error) {
alert('Error loading chunk: ' + error.message);
} finally {
delete this.chunkLoading[resultKey];
}
}
};
}
+2 -2
@@ -1310,7 +1310,7 @@ async def generate_encryption_key() -> str:
# Example usage
if __name__ == "__main__":
-import asyncio
+import anyio
async def main():
# Generate a key for testing
@@ -1318,4 +1318,4 @@ if __name__ == "__main__":
print(f"Generated encryption key: {key}")
print(f"Set this in your environment: export TOKEN_ENCRYPTION_KEY='{key}'")
-asyncio.run(main())
+anyio.run(main)
@@ -0,0 +1,524 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="theme-color" content="#0082c9">
<title>{% block title %}Nextcloud MCP Server{% endblock %}</title>
<!-- Favicon -->
<link rel="icon" type="image/svg+xml" href="data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' width='32' height='32' viewBox='0 0 512 512'><rect width='512' height='512' rx='80' ry='80' fill='%230082C9'/><path d='M255.9 21.04c-11.8 0-22.2 4.08-28.6 10.01-5.6 4.98-8.6 11.41-8.6 18.11 0 5.55 2.2 11.01 5.9 15.48-16.4 4.97-30.1 13.64-39 24.53 22.1-7.67 45.7-11.86 70.3-11.86 24.6 0 48.3 4.19 70.3 11.86-8.9-10.89-22.6-19.56-39-24.53 3.9-4.47 5.9-9.93 5.9-15.48 0-6.7-3-13.13-8.5-18.11-6.4-5.93-16.9-10.01-28.7-10.01zm0 20.34c5.3 0 10.1 1.27 13.6 3.52 1.7 1.16 3.4 2.43 3.4 4.27 0 1.76-1.7 3.03-3.4 4.19-3.5 2.33-8.3 3.61-13.6 3.61-5.3 0-10.1-1.28-13.6-3.61-1.6-1.16-3.3-2.43-3.3-4.19 0-1.84 1.7-3.11 3.3-4.27 3.5-2.25 8.3-3.52 13.6-3.52zm.1 48.1c-110.8 0-200.72 90.02-200.72 200.82S145.2 491 256 491s200.7-89.9 200.7-200.7c0-110.8-89.9-200.82-200.7-200.82zm0 32.62c92.9 0 168.2 75.3 168.2 168.2 0 92.8-75.3 168.2-168.2 168.2-92.9 0-168.26-75.4-168.26-168.2 0-92.9 75.36-168.2 168.26-168.2zm-8.2 6.3c-9.6.5-19 1.9-28.3 4.1l2.3 7.8c8.4-2 17.1-3.3 26-3.8v-8.1zm16.2 0v8.1c9 .5 17.7 1.8 26 3.8l2.2-7.8c-9.1-2.2-18.6-3.6-28.2-4.1zm-60 8.5c-9 3.2-17.6 7-25.8 11.6l4.1 7.1c7.7-4.3 15.6-7.9 23.9-10.8l-2.2-7.9zm103.7 0-2 7.9c8.4 2.9 16.2 6.5 23.8 10.8l4.2-7.1c-8.2-4.6-16.9-8.4-26-11.6zm-143.3 20.3c-7.5 5.4-14.6 11.4-21.1 17.9l5.8 5.8c5.9-6.1 12.5-11.7 19.5-16.6l-4.2-7.1zm182.9 0-4 7.1c6.9 4.9 13.5 10.5 19.5 16.6l5.7-5.8c-6.5-6.5-13.7-12.5-21.2-17.9zm-91.4 11.5c-37 0-67.4 28.6-70.3 64.9l15.9 4.7c.7-29.6 24.7-53.4 54.4-53.4 30.1 0 54.4 24.4 54.4 54.3 0 15-6.2 28.7-16 38.5l.1.1c1.7 2.7 3 5.6 4.1 8.6.9 3 1.7 5.7 2.3 8.6v.4c33.8-16.7 57.2-51.5 57.2-91.7 0-3.8-.2-7.3-.6-10.9-3.2-3.3-6.3-6.4-9.8-9.5 1.5 6.5 2.3 13.4 2.3 20.4 0 28.7-13 54.7-33.5 71.8 6.3-10.6 10.1-23 10.1-36.3 0-38.9-31.7-70.5-70.6-70.5zm-91.8 14.6c-3.3 3.1-6.5 6.2-9.7 9.5-.3 3.6-.5 7.1-.5 10.9 0 7.3.7 14.2 2.1 20.9l9.1 2.7c-2.1-7.5-3.1-15.4-3.1-23.6 0-7 .7-13.9 2.1-20.4zm-31.6 4c-5.8 7.1-10.9 14.6-15.4 22.6l7.1 4c4.1-7.4 
8.8-14.3 14-20.8l-5.7-5.8zm246.8 0-5.7 5.8c5.3 6.5 10 13.4 13.9 20.8l7.1-4c-4.4-8-9.5-15.5-15.3-22.6zm-269.2 37.1c-2.5 5.7-4.6 11.4-6.4 17.6l.1-.3c3.4-5 7.9-9.3 12.9-12.5l.3-.6-6.9-4.2zm291.8 0-7.2 4.2c3.2 7.3 5.7 15.1 7.6 23.1l7.9-2.1c-2.1-8.8-4.9-17.3-8.3-25.2zm-261.2 11.5c-13.4.1-25.7 9-29.7 22.5l114.8 34.2c-4.9 16.7 4.6 34.2 21.2 39.2L361.7 366c16.6 5 34.1-4.4 39.1-21l-114.6-34.4c4.9-16.5-4.7-34.1-21.3-39.1 0 0-72.4-21.5-114.8-34.3-3.1-.9-6.3-1.4-9.4-1.3zm-42.09 29.7c-.9 6.9-1.4 14-1.4 21.3 0 1.3.1 2.9.1 4.2h8.09v-4.2c0-6.5.4-12.9 1.2-19.2l-7.99-2.1zm314.59 0-7.9 2.1c.7 6.3 1.3 12.7 1.3 19.2 0 1.3 0 2.9-.2 4.2h8.2v-4.2c0-7.3-.5-14.4-1.4-21.3zm-157.3 24.7c6.3 0 11.5 5 11.5 11.3 0 6.4-5.2 11.6-11.5 11.6s-11.5-5.2-11.5-11.6c0-6.3 5.2-11.3 11.5-11.3zM98.51 307.4c1 8.2 2.89 16.4 5.09 24.3l7.9-2.1c-2.1-7.2-3.8-14.6-4.8-22.2h-8.19zm306.69 0c-1.1 7.6-2.7 15-4.8 22.2l7.8 2.1c2.2-7.9 4.1-16.1 5.2-24.3h-8.2zm-191.3 10.9c-19 13.3-31.4 35.3-31.4 60.1 0 10.4 2.3 20.4 6.2 29.7 8.8 4.9 17.9 8.8 27.6 11.7-10.8-10.7-17.5-25.2-17.5-41.4 0-19 9.3-36 23.7-46.3-3.8-4.1-6.7-8.7-8.6-13.8zM116.8 345l-7.9 2c3.1 7.6 6.8 14.7 11 21.6l6.9-4.2c-3.8-6.2-7-12.8-10-19.4zm194.8 20.5c.9 4.1 1.4 8.5 1.4 12.9 0 16.2-6.7 30.7-17.4 41.4 9.6-2.9 18.8-6.8 27.5-11.7 4-9.3 6.2-19.3 6.2-29.7 0-2.7-.2-5.2-.4-7.7l-17.3-5.2zM136 377.9l-7.1 4.1c4.7 6.2 9.7 12.1 15.3 17.3l5.7-5.5c-5.1-5-9.7-10.3-13.9-15.9zm243.9 2.3-.2.1c-2.1.3-4 .6-6.2.7h-.1c-3.6 4.5-7.3 8.8-11.5 12.8l5.8 5.5c5.5-5.2 10.5-11.1 15.2-17.3l-3-1.8zm-217.8 24-5.9 5.9c6 4.8 12.2 9.7 18.8 13.6l3.8-7.8c-5.7-2.9-11.4-6.8-16.7-11.7zm187.7 0c-5.4 4.9-11.1 8.8-16.8 11.7l3.9 7.8c6.5-3.9 12.8-8.8 18.7-13.6l-5.8-5.9zm-156.4 19.5-4.1 6.8c6.6 4 13.7 5.8 20.7 8.8l2.2-7.9c-6.5-1.9-12.7-4.8-18.8-7.7zm125.2 0c-6.2 2.9-12.5 5.8-19.1 7.7l2.3 7.9c7.2-3 14-4.8 20.7-8.8l-3.9-6.8zm-90.7 11.7-2 7.8c7.1 1 14.5 1.9 21.9 1.9v-7.7c-6.8 0-13.5-1.1-19.9-2zm55.9 0c-6.3.9-13 2-19.8 2v7.7c7.5 0 14.8-.9 22.1-1.9l-2.3-7.8z' fill='%23fff'/></svg>">
<!-- Open Sans font -->
<style>
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: normal;
src: local('Open Sans'), local('OpenSans');
}
@font-face {
font-family: 'Open Sans';
font-style: normal;
font-weight: bold;
src: local('Open Sans Semibold'), local('OpenSans-Semibold');
}
</style>
{% block extra_head %}{% endblock %}
<style>
/* Nextcloud App Design System */
/* CSS Variables */
:root {
/* Primary Colors */
--color-primary: #00679e;
--color-primary-element: #00679e;
--color-primary-light: #e5eff5;
--color-primary-element-light: #e5eff5;
/* Background Colors */
--color-main-background: #ffffff;
--color-background-dark: #ededed;
--color-background-hover: #f5f5f5;
/* Text Colors */
--color-main-text: #222222;
--color-text-maxcontrast: #6b6b6b;
--color-text-light: #767676;
/* Border Colors */
--color-border: #ededed;
--color-border-dark: #dbdbdb;
/* Borders & Radius */
--border-radius: 3px;
--border-radius-large: 10px;
--border-radius-pill: 100px;
/* Spacing */
--default-grid-baseline: 4px;
--default-clickable-area: 44px;
}
/* SVG Icon Styles */
.nav-icon {
width: 20px;
height: 20px;
display: inline-block;
fill: var(--color-main-text);
opacity: 0.7;
}
.app-navigation-entry.active .nav-icon {
fill: var(--color-primary-element);
opacity: 1;
}
/* General */
* {
box-sizing: border-box;
}
body {
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif;
color: var(--color-main-text);
background: var(--color-main-background);
margin: 0;
padding: 0;
}
h1, h2, h3 {
font-weight: 300;
line-height: 1.2;
}
h1 {
font-size: 32px;
margin: 0 0 20px 0;
color: var(--color-main-text);
}
h2 {
font-size: 20px;
margin: 20px 0 12px 0;
color: var(--color-main-text);
border-bottom: 1px solid var(--color-border);
padding-bottom: 8px;
}
h3 {
font-size: 16px;
margin: 16px 0 8px 0;
color: var(--color-main-text);
font-weight: 500;
}
img {
max-width: 100%;
}
/* App Header (simplified, no full menu) */
.app-header {
height: 50px;
background: var(--color-primary-element);
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
position: sticky;
top: 0;
z-index: 100;
display: flex;
align-items: center;
padding: 0 20px;
}
.app-header__brand {
color: white;
font-size: 18px;
font-weight: 600;
text-decoration: none;
display: flex;
align-items: center;
gap: 12px;
}
.app-header__brand:hover {
opacity: 0.9;
}
.app-header__logo {
height: 32px;
width: 32px;
fill: white;
}
/* App Layout */
.app-content-wrapper {
display: flex;
height: calc(100vh - 50px);
overflow: hidden;
}
/* Side Navigation */
#app-navigation {
width: 250px;
background: var(--color-main-background);
border-right: 1px solid var(--color-border);
display: flex;
flex-direction: column;
flex-shrink: 0;
transition: margin-left 0.3s ease;
}
#app-navigation.app-navigation--closed {
margin-left: -250px;
}
.app-navigation__content {
flex: 1;
overflow-y: auto;
padding: 8px;
display: flex;
flex-direction: column;
}
.app-navigation-list {
list-style: none;
padding: 0;
margin: 0;
flex: 1;
}
.app-navigation-entry {
position: relative;
margin-bottom: 2px;
}
.app-navigation-entry__wrapper {
display: flex;
align-items: center;
position: relative;
}
.app-navigation-entry-link {
display: flex;
align-items: center;
padding: 0 8px;
min-height: var(--default-clickable-area);
border-radius: var(--border-radius);
transition: background-color 100ms ease-in-out;
text-decoration: none;
color: var(--color-main-text);
flex: 1;
font-size: 14px;
}
.app-navigation-entry-link:hover {
background-color: var(--color-background-hover);
}
.app-navigation-entry.active .app-navigation-entry-link {
background-color: var(--color-primary-element-light);
font-weight: 500;
}
.app-navigation-entry-icon {
width: var(--default-clickable-area);
height: var(--default-clickable-area);
display: flex;
align-items: center;
justify-content: center;
margin-right: 0;
}
.app-navigation-entry__name {
flex: 1;
white-space: nowrap;
overflow: hidden;
text-overflow: ellipsis;
}
.app-navigation-entry__counter {
margin-left: auto;
padding: 2px 6px;
border-radius: var(--border-radius-pill);
background-color: var(--color-background-dark);
font-size: 11px;
color: var(--color-text-maxcontrast);
min-width: 20px;
text-align: center;
}
.app-navigation__settings {
list-style: none;
padding: 8px 0 0 0;
margin: 8px 0 0 0;
border-top: 1px solid var(--color-border);
flex-shrink: 0;
}
.app-navigation-toggle {
display: flex;
align-items: center;
justify-content: center;
position: fixed;
top: 60px;
left: 10px;
z-index: 110;
background: var(--color-main-background);
border: 1px solid var(--color-border);
border-radius: var(--border-radius);
padding: 8px 12px;
cursor: pointer;
box-shadow: 0 0 5px rgba(0,0,0,0.1);
transition: left 0.3s ease;
}
.app-navigation-toggle:hover {
background: var(--color-background-hover);
}
#app-navigation:not(.app-navigation--closed) .app-navigation-toggle {
left: 260px;
}
/* Main Content Area */
#app-content {
flex: 1;
overflow-y: auto;
background: var(--color-main-background);
}
.page-content {
max-width: 1000px;
margin: 0 auto;
padding: 24px;
}
.content-section {
background: var(--color-main-background);
border-radius: 0;
padding: 0;
box-shadow: none;
}
.content-section h1 {
font-size: 24px;
font-weight: 600;
margin-bottom: 24px;
}
.content-section h2 {
font-size: 18px;
font-weight: 500;
margin: 24px 0 12px 0;
border-bottom: none;
padding-bottom: 0;
}
.content-section h3 {
font-size: 16px;
font-weight: 500;
}
/* Responsive */
@media (max-width: 768px) {
#app-navigation {
position: fixed;
height: calc(100vh - 50px);
z-index: 105;
box-shadow: 2px 0 8px rgba(0,0,0,0.1);
}
.page-content {
padding: 16px;
}
}
/* Footer */
footer.page-footer {
background-color: #0F0833;
color: #ffffff;
padding: 40px 0;
margin-top: 60px;
}
footer.page-footer .bootstrap-container {
max-width: 1200px;
margin: 0 auto;
padding: 0 20px;
}
footer.page-footer h1 {
font-size: 15px;
font-weight: bold;
line-height: 1.8;
color: #ffffff;
margin-top: 20px;
}
footer.page-footer ul {
list-style-type: none;
padding-left: 0;
}
footer.page-footer li {
font-size: 13px;
line-height: 1.8;
color: #ffffff;
margin-top: 0;
}
footer.page-footer li a {
color: #ffffff;
text-decoration: none;
display: block;
padding: 4px 0;
}
footer.page-footer li a:hover {
text-decoration: underline;
}
footer.page-footer p {
font-size: 15px;
line-height: 1.8;
color: #ffffff;
}
footer.page-footer p.copyright {
color: rgba(255, 255, 255, 0.5);
font-size: 13px;
text-align: center;
margin-top: 30px;
}
/* Buttons */
.btn {
border-radius: 50px;
padding: 10px 20px;
text-decoration: none;
display: inline-block;
cursor: pointer;
border: none;
font-size: 14px;
transition: all 0.3s;
}
.btn-primary {
background: #0082C9;
border: 1px solid #0062C9;
color: #fff;
}
.btn-primary:hover {
background: #006ba3;
}
/* Tables */
table {
width: 100%;
border-collapse: collapse;
margin: 20px 0;
}
td {
padding: 12px 8px;
border-bottom: 1px solid var(--color-border);
font-size: 14px;
}
td:first-child {
width: 180px;
color: var(--color-text-maxcontrast);
font-weight: 500;
}
code {
background-color: var(--color-background-dark);
padding: 2px 6px;
border-radius: var(--border-radius);
font-family: 'SFMono-Regular', 'Consolas', 'Liberation Mono', 'Menlo', monospace;
font-size: 90%;
color: var(--color-main-text);
}
/* Badges */
.badge {
display: inline-block;
padding: 3px 8px;
border-radius: 12px;
font-size: 12px;
font-weight: bold;
text-transform: uppercase;
}
.badge-oauth {
background-color: #4caf50;
color: white;
}
.badge-basic {
background-color: #2196f3;
color: white;
}
/* Messages */
.warning {
background-color: #fff3cd;
border-left: 4px solid #ffc107;
padding: 15px;
margin: 15px 0;
color: #856404;
}
.info-message {
background-color: #e3f2fd;
border-left: 4px solid #2196f3;
padding: 15px;
margin: 15px 0;
color: #1565c0;
}
.error {
background-color: #ffebee;
border-left: 4px solid #d32f2f;
padding: 15px;
margin: 15px 0;
color: #c62828;
}
.success {
background-color: #e8f5e9;
border: 2px solid #4caf50;
padding: 30px;
border-radius: 8px;
text-align: center;
}
.success h1 {
color: #4caf50;
}
{% block extra_styles %}{% endblock %}
</style>
</head>
<body>
<!-- App Header -->
<header class="app-header">
<a href="/app" class="app-header__brand">
<svg class="app-header__logo" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512">
<path d="M255.9 21.04c-11.8 0-22.2 4.08-28.6 10.01-5.6 4.98-8.6 11.41-8.6 18.11 0 5.55 2.2 11.01 5.9 15.48-16.4 4.97-30.1 13.64-39 24.53 22.1-7.67 45.7-11.86 70.3-11.86 24.6 0 48.3 4.19 70.3 11.86-8.9-10.89-22.6-19.56-39-24.53 3.9-4.47 5.9-9.93 5.9-15.48 0-6.7-3-13.13-8.5-18.11-6.4-5.93-16.9-10.01-28.7-10.01zm0 20.34c5.3 0 10.1 1.27 13.6 3.52 1.7 1.16 3.4 2.43 3.4 4.27 0 1.76-1.7 3.03-3.4 4.19-3.5 2.33-8.3 3.61-13.6 3.61-5.3 0-10.1-1.28-13.6-3.61-1.6-1.16-3.3-2.43-3.3-4.19 0-1.84 1.7-3.11 3.3-4.27 3.5-2.25 8.3-3.52 13.6-3.52zm.1 48.1c-110.8 0-200.72 90.02-200.72 200.82S145.2 491 256 491s200.7-89.9 200.7-200.7c0-110.8-89.9-200.82-200.7-200.82zm0 32.62c92.9 0 168.2 75.3 168.2 168.2 0 92.8-75.3 168.2-168.2 168.2-92.9 0-168.26-75.4-168.26-168.2 0-92.9 75.36-168.2 168.26-168.2zm-8.2 6.3c-9.6.5-19 1.9-28.3 4.1l2.3 7.8c8.4-2 17.1-3.3 26-3.8v-8.1zm16.2 0v8.1c9 .5 17.7 1.8 26 3.8l2.2-7.8c-9.1-2.2-18.6-3.6-28.2-4.1zm-60 8.5c-9 3.2-17.6 7-25.8 11.6l4.1 7.1c7.7-4.3 15.6-7.9 23.9-10.8l-2.2-7.9zm103.7 0-2 7.9c8.4 2.9 16.2 6.5 23.8 10.8l4.2-7.1c-8.2-4.6-16.9-8.4-26-11.6zm-143.3 20.3c-7.5 5.4-14.6 11.4-21.1 17.9l5.8 5.8c5.9-6.1 12.5-11.7 19.5-16.6l-4.2-7.1zm182.9 0-4 7.1c6.9 4.9 13.5 10.5 19.5 16.6l5.7-5.8c-6.5-6.5-13.7-12.5-21.2-17.9zm-91.4 11.5c-37 0-67.4 28.6-70.3 64.9l15.9 4.7c.7-29.6 24.7-53.4 54.4-53.4 30.1 0 54.4 24.4 54.4 54.3 0 15-6.2 28.7-16 38.5l.1.1c1.7 2.7 3 5.6 4.1 8.6.9 3 1.7 5.7 2.3 8.6v.4c33.8-16.7 57.2-51.5 57.2-91.7 0-3.8-.2-7.3-.6-10.9-3.2-3.3-6.3-6.4-9.8-9.5 1.5 6.5 2.3 13.4 2.3 20.4 0 28.7-13 54.7-33.5 71.8 6.3-10.6 10.1-23 10.1-36.3 0-38.9-31.7-70.5-70.6-70.5zm-91.8 14.6c-3.3 3.1-6.5 6.2-9.7 9.5-.3 3.6-.5 7.1-.5 10.9 0 7.3.7 14.2 2.1 20.9l9.1 2.7c-2.1-7.5-3.1-15.4-3.1-23.6 0-7 .7-13.9 2.1-20.4zm-31.6 4c-5.8 7.1-10.9 14.6-15.4 22.6l7.1 4c4.1-7.4 8.8-14.3 14-20.8l-5.7-5.8zm246.8 0-5.7 5.8c5.3 6.5 10 13.4 13.9 20.8l7.1-4c-4.4-8-9.5-15.5-15.3-22.6zm-269.2 37.1c-2.5 5.7-4.6 11.4-6.4 17.6l.1-.3c3.4-5 7.9-9.3 12.9-12.5l.3-.6-6.9-4.2zm291.8 0-7.2 4.2c3.2 7.3 5.7 
15.1 7.6 23.1l7.9-2.1c-2.1-8.8-4.9-17.3-8.3-25.2zm-261.2 11.5c-13.4.1-25.7 9-29.7 22.5l114.8 34.2c-4.9 16.7 4.6 34.2 21.2 39.2L361.7 366c16.6 5 34.1-4.4 39.1-21l-114.6-34.4c4.9-16.5-4.7-34.1-21.3-39.1 0 0-72.4-21.5-114.8-34.3-3.1-.9-6.3-1.4-9.4-1.3zm-42.09 29.7c-.9 6.9-1.4 14-1.4 21.3 0 1.3.1 2.9.1 4.2h8.09v-4.2c0-6.5.4-12.9 1.2-19.2l-7.99-2.1zm314.59 0-7.9 2.1c.7 6.3 1.3 12.7 1.3 19.2 0 1.3 0 2.9-.2 4.2h8.2v-4.2c0-7.3-.5-14.4-1.4-21.3zm-157.3 24.7c6.3 0 11.5 5 11.5 11.3 0 6.4-5.2 11.6-11.5 11.6s-11.5-5.2-11.5-11.6c0-6.3 5.2-11.3 11.5-11.3zM98.51 307.4c1 8.2 2.89 16.4 5.09 24.3l7.9-2.1c-2.1-7.2-3.8-14.6-4.8-22.2h-8.19zm306.69 0c-1.1 7.6-2.7 15-4.8 22.2l7.8 2.1c2.2-7.9 4.1-16.1 5.2-24.3h-8.2zm-191.3 10.9c-19 13.3-31.4 35.3-31.4 60.1 0 10.4 2.3 20.4 6.2 29.7 8.8 4.9 17.9 8.8 27.6 11.7-10.8-10.7-17.5-25.2-17.5-41.4 0-19 9.3-36 23.7-46.3-3.8-4.1-6.7-8.7-8.6-13.8zM116.8 345l-7.9 2c3.1 7.6 6.8 14.7 11 21.6l6.9-4.2c-3.8-6.2-7-12.8-10-19.4zm194.8 20.5c.9 4.1 1.4 8.5 1.4 12.9 0 16.2-6.7 30.7-17.4 41.4 9.6-2.9 18.8-6.8 27.5-11.7 4-9.3 6.2-19.3 6.2-29.7 0-2.7-.2-5.2-.4-7.7l-17.3-5.2zM136 377.9l-7.1 4.1c4.7 6.2 9.7 12.1 15.3 17.3l5.7-5.5c-5.1-5-9.7-10.3-13.9-15.9zm243.9 2.3-.2.1c-2.1.3-4 .6-6.2.7h-.1c-3.6 4.5-7.3 8.8-11.5 12.8l5.8 5.5c5.5-5.2 10.5-11.1 15.2-17.3l-3-1.8zm-217.8 24-5.9 5.9c6 4.8 12.2 9.7 18.8 13.6l3.8-7.8c-5.7-2.9-11.4-6.8-16.7-11.7zm187.7 0c-5.4 4.9-11.1 8.8-16.8 11.7l3.9 7.8c6.5-3.9 12.8-8.8 18.7-13.6l-5.8-5.9zm-156.4 19.5-4.1 6.8c6.6 4 13.7 5.8 20.7 8.8l2.2-7.9c-6.5-1.9-12.7-4.8-18.8-7.7zm125.2 0c-6.2 2.9-12.5 5.8-19.1 7.7l2.3 7.9c7.2-3 14-4.8 20.7-8.8l-3.9-6.8zm-90.7 11.7-2 7.8c7.1 1 14.5 1.9 21.9 1.9v-7.7c-6.8 0-13.5-1.1-19.9-2zm55.9 0c-6.3.9-13 2-19.8 2v7.7c7.5 0 14.8-.9 22.1-1.9l-2.3-7.8z" fill="#fff"/>
</svg>
<span>Nextcloud MCP Server</span>
</a>
</header>
<!-- App Content Wrapper (Sidebar + Main Content) -->
{% block content %}{% endblock %}
{% block scripts %}{% endblock %}
</body>
</html>
@@ -0,0 +1,19 @@
{% extends "base.html" %}
{% block title %}{{ error_title|default('Error') }} - Nextcloud MCP Server{% endblock %}
{% block content %}
<h1>{{ error_title|default('Error') }}</h1>
<div class="error">
<strong>Error:</strong> {{ error_message }}
</div>
{% if login_url %}
<p><a href="{{ login_url }}" class="btn btn-primary">Login again</a></p>
{% endif %}
{% if back_url %}
<p><a href="{{ back_url }}" class="btn">Go Back</a></p>
{% endif %}
{% endblock %}
@@ -0,0 +1,21 @@
{% extends "base.html" %}
{% block title %}{{ success_title|default('Success') }} - Nextcloud MCP Server{% endblock %}
{% block extra_head %}
{% if redirect_url and redirect_delay %}
<meta http-equiv="refresh" content="{{ redirect_delay }};url={{ redirect_url }}">
{% endif %}
{% endblock %}
{% block content %}
<div class="success">
<h1>{{ success_title|default('✓ Success') }}</h1>
{% for message in success_messages %}
<p>{{ message }}</p>
{% endfor %}
{% if redirect_url %}
<p>Redirecting...</p>
{% endif %}
</div>
{% endblock %}
@@ -0,0 +1,650 @@
{% extends "base.html" %}
{% block title %}Nextcloud MCP Server{% endblock %}
{% block extra_head %}
<!-- htmx for dynamic loading -->
<script src="https://unpkg.com/htmx.org@1.9.10"></script>
<!-- Alpine.js for state management -->
<script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"></script>
<!-- Plotly.js for vector visualization -->
<script src="https://cdn.plot.ly/plotly-3.3.0.min.js"></script>
<!-- Vector Viz static assets -->
<link rel="stylesheet" href="/app/static/vector-viz.css">
{% endblock %}
{% block extra_styles %}
/* Smooth htmx transitions */
.htmx-swapping {
opacity: 0;
transition: opacity 200ms ease-out;
}
.htmx-settling {
opacity: 1;
transition: opacity 200ms ease-in;
}
/* Logout button styling */
.logout-section {
margin-top: 20px;
padding-top: 20px;
border-top: 1px solid var(--color-border);
}
/* Welcome tab specific styles */
.hero-section {
background: linear-gradient(135deg, var(--color-primary-element) 0%, #0082c9 100%);
color: white;
padding: 60px 24px;
margin: -24px -24px 40px -24px;
border-radius: 0 0 var(--border-radius-large) var(--border-radius-large);
text-align: center;
}
.hero-section h1 {
color: white;
font-size: 36px;
margin: 0 0 16px 0;
font-weight: 600;
}
.hero-section p {
font-size: 18px;
opacity: 0.95;
max-width: 700px;
margin: 0 auto;
line-height: 1.6;
}
.feature-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(280px, 1fr));
gap: 24px;
margin: 32px 0;
}
.feature-card {
background: var(--color-main-background);
border: 2px solid var(--color-border);
border-radius: var(--border-radius-large);
padding: 24px;
transition: all 0.2s;
cursor: pointer;
text-decoration: none;
color: inherit;
display: block;
}
.feature-card:hover {
border-color: var(--color-primary-element);
box-shadow: 0 4px 12px rgba(0, 103, 158, 0.15);
transform: translateY(-2px);
}
.feature-card h3 {
color: var(--color-primary-element);
font-size: 20px;
margin: 12px 0 8px 0;
font-weight: 600;
display: flex;
align-items: center;
gap: 12px;
}
.feature-card p {
color: var(--color-text-maxcontrast);
font-size: 14px;
line-height: 1.6;
margin: 8px 0 0 0;
}
.feature-icon {
width: 48px;
height: 48px;
background: var(--color-primary-element-light);
border-radius: var(--border-radius);
display: flex;
align-items: center;
justify-content: center;
margin-bottom: 8px;
}
.feature-icon svg {
width: 28px;
height: 28px;
fill: var(--color-primary-element);
}
.info-section {
background: var(--color-background-hover);
border-radius: var(--border-radius-large);
padding: 32px;
margin: 32px 0;
}
.info-section h2 {
color: var(--color-main-text);
font-size: 24px;
margin: 0 0 16px 0;
border: none;
padding: 0;
}
.info-section p {
color: var(--color-text-maxcontrast);
line-height: 1.7;
margin: 12px 0;
}
.info-section ul {
margin: 12px 0;
padding-left: 24px;
}
.info-section li {
color: var(--color-text-maxcontrast);
line-height: 1.7;
margin: 8px 0;
}
.info-section code {
background: var(--color-main-background);
padding: 2px 8px;
border-radius: var(--border-radius);
font-size: 13px;
}
.auth-status {
background: var(--color-primary-element-light);
border-left: 4px solid var(--color-primary-element);
padding: 16px 20px;
margin: 24px 0;
border-radius: var(--border-radius);
display: flex;
align-items: center;
gap: 12px;
}
.auth-status svg {
width: 24px;
height: 24px;
fill: var(--color-primary-element);
flex-shrink: 0;
}
.auth-status-text {
flex: 1;
}
.auth-status-text strong {
display: block;
color: var(--color-main-text);
font-size: 14px;
margin-bottom: 4px;
}
.auth-status-text span {
color: var(--color-text-maxcontrast);
font-size: 13px;
}
{% endblock %}
{% block content %}
<div class="app-content-wrapper" x-data="{ activeSection: 'welcome', navOpen: true }">
<!-- Side Navigation -->
<nav id="app-navigation" :class="{ 'app-navigation--closed': !navOpen }">
<div class="app-navigation__content">
<!-- Navigation List -->
<ul class="app-navigation-list">
<li class="app-navigation-entry" :class="{ 'active': activeSection === 'welcome' }">
<div class="app-navigation-entry__wrapper">
<a href="#"
@click.prevent="activeSection = 'welcome'"
class="app-navigation-entry-link">
<span class="app-navigation-entry-icon">
<svg class="nav-icon" viewBox="0 0 24 24">
<path d="M10,20V14H14V20H19V12H22L12,3L2,12H5V20H10Z" />
</svg>
</span>
<span class="app-navigation-entry__name">Welcome</span>
</a>
</div>
</li>
<li class="app-navigation-entry" :class="{ 'active': activeSection === 'user-info' }">
<div class="app-navigation-entry__wrapper">
<a href="#"
@click.prevent="activeSection = 'user-info'"
class="app-navigation-entry-link">
<span class="app-navigation-entry-icon">
<svg class="nav-icon" viewBox="0 0 24 24">
<path d="M12,4A4,4 0 0,1 16,8A4,4 0 0,1 12,12A4,4 0 0,1 8,8A4,4 0 0,1 12,4M12,14C16.42,14 20,15.79 20,18V20H4V18C4,15.79 7.58,14 12,14Z" />
</svg>
</span>
<span class="app-navigation-entry__name">User Info</span>
</a>
</div>
</li>
{% if show_vector_sync_tab %}
<li class="app-navigation-entry" :class="{ 'active': activeSection === 'vector-sync' }">
<div class="app-navigation-entry__wrapper">
<a href="#"
@click.prevent="activeSection = 'vector-sync'"
class="app-navigation-entry-link">
<span class="app-navigation-entry-icon">
<svg class="nav-icon" viewBox="0 0 24 24">
<path d="M12,18A6,6 0 0,1 6,12C6,11 6.25,10.03 6.7,9.2L5.24,7.74C4.46,8.97 4,10.43 4,12A8,8 0 0,0 12,20V23L16,19L12,15M12,4V1L8,5L12,9V6A6,6 0 0,1 18,12C18,13 17.75,13.97 17.3,14.8L18.76,16.26C19.54,15.03 20,13.57 20,12A8,8 0 0,0 12,4Z" />
</svg>
</span>
<span class="app-navigation-entry__name">Vector Sync</span>
</a>
</div>
</li>
<li class="app-navigation-entry" :class="{ 'active': activeSection === 'vector-viz' }">
<div class="app-navigation-entry__wrapper">
<a href="#"
@click.prevent="activeSection = 'vector-viz'"
class="app-navigation-entry-link">
<span class="app-navigation-entry-icon">
<svg class="nav-icon" viewBox="0 0 24 24">
<path d="M22,21H2V3H4V19H6V10H10V19H12V6H16V19H18V14H22V21Z" />
</svg>
</span>
<span class="app-navigation-entry__name">Vector Viz</span>
</a>
</div>
</li>
{% endif %}
{% if show_webhooks_tab %}
<li class="app-navigation-entry" :class="{ 'active': activeSection === 'webhooks' }">
<div class="app-navigation-entry__wrapper">
<a href="#"
@click.prevent="activeSection = 'webhooks'"
class="app-navigation-entry-link">
<span class="app-navigation-entry-icon">
<svg class="nav-icon" viewBox="0 0 24 24">
<path d="M10.59,13.41C11,13.8 11,14.44 10.59,14.83C10.2,15.22 9.56,15.22 9.17,14.83C7.22,12.88 7.22,9.71 9.17,7.76V7.76L12.71,4.22C14.66,2.27 17.83,2.27 19.78,4.22C21.73,6.17 21.73,9.34 19.78,11.29L18.29,12.78C18.3,11.96 18.17,11.14 17.89,10.36L18.36,9.88C19.54,8.71 19.54,6.81 18.36,5.64C17.19,4.46 15.29,4.46 14.12,5.64L10.59,9.17C9.41,10.34 9.41,12.24 10.59,13.41M13.41,9.17C13.8,8.78 14.44,8.78 14.83,9.17C16.78,11.12 16.78,14.29 14.83,16.24V16.24L11.29,19.78C9.34,21.73 6.17,21.73 4.22,19.78C2.27,17.83 2.27,14.66 4.22,12.71L5.71,11.22C5.7,12.04 5.83,12.86 6.11,13.65L5.64,14.12C4.46,15.29 4.46,17.19 5.64,18.36C6.81,19.54 8.71,19.54 9.88,18.36L13.41,14.83C14.59,13.66 14.59,11.76 13.41,10.59C13,10.2 13,9.56 13.41,9.17Z" />
</svg>
</span>
<span class="app-navigation-entry__name">Webhooks</span>
</a>
</div>
</li>
{% endif %}
</ul>
<!-- Settings/Logout at bottom -->
{% if logout_url %}
<ul class="app-navigation__settings">
<li class="app-navigation-entry">
<div class="app-navigation-entry__wrapper">
<a href="{{ logout_url }}" class="app-navigation-entry-link">
<span class="app-navigation-entry-icon">
<svg class="nav-icon" viewBox="0 0 24 24">
<path d="M16,17V14H9V10H16V7L21,12L16,17M14,2A2,2 0 0,1 16,4V6H14V4H5V20H14V18H16V20A2,2 0 0,1 14,22H5A2,2 0 0,1 3,20V4A2,2 0 0,1 5,2H14Z" />
</svg>
</span>
<span class="app-navigation-entry__name">Logout</span>
</a>
</div>
</li>
</ul>
{% endif %}
</div>
<!-- Toggle Button (mobile) -->
<button type="button"
@click="navOpen = !navOpen"
class="app-navigation-toggle"
aria-label="Toggle navigation"
:aria-expanded="navOpen.toString()">
<span aria-hidden="true">&#9776;</span>
</button>
</nav>
<!-- Main Content Area -->
<main id="app-content">
<div class="page-content">
<!-- Welcome Section -->
<div x-show="activeSection === 'welcome'">
<!-- Hero Section -->
<div class="hero-section">
<h1>Welcome to Nextcloud MCP Server</h1>
<p>
Interactive user interface for semantic search and document retrieval.
Test queries, visualize results, and explore your Nextcloud content using RAG workflows.
</p>
</div>
<!-- Authentication Status -->
<div class="auth-status">
<svg viewBox="0 0 24 24">
<path d="M12,4A4,4 0 0,1 16,8A4,4 0 0,1 12,12A4,4 0 0,1 8,8A4,4 0 0,1 12,4M12,14C16.42,14 20,15.79 20,18V20H4V18C4,15.79 7.58,14 12,14Z" />
</svg>
<div class="auth-status-text">
<strong>Authenticated as: {{ username }}</strong>
<span>Authentication mode: <code>{{ auth_mode }}</code></span>
</div>
</div>
{% if vector_sync_enabled %}
<!-- Vector Sync Enabled Content -->
<div class="info-section">
<h2>About Semantic Search</h2>
<p>
This interface provides access to <strong>semantic search</strong> capabilities powered by vector embeddings.
Unlike traditional keyword search, semantic search understands the <em>meaning</em> of your queries and finds
conceptually similar content across your Nextcloud apps.
</p>
<p>
<strong>How it works:</strong>
</p>
<ul>
<li>Documents from Notes, Calendar, Files, Contacts, and Deck are indexed into a vector database</li>
<li>Each document chunk is converted to a 768-dimensional vector embedding that captures semantic meaning</li>
<li>Queries are also converted to embeddings and matched against document vectors using similarity search</li>
<li>Results can be retrieved using pure semantic search or hybrid BM25 search combining keywords and semantics</li>
</ul>
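<!--
The matching step listed above can be sketched as follows. This is a minimal illustration, not the server's implementation: cosine similarity is an assumed metric (the text only says "similarity search"), and cosineSimilarity, rankBySimilarity, and the chunk shape are hypothetical names.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank hypothetical chunks (objects with id and vector) against a query vector,
// highest similarity first, keeping at most `limit` results.
function rankBySimilarity(queryVector, chunks, limit = 10) {
  return chunks
    .map((chunk) => ({ id: chunk.id, score: cosineSimilarity(queryVector, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

In practice this ranking would normally be delegated to the vector database; the sketch only shows what a similarity score means.
-->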
</div>
<div class="info-section">
<h2>RAG Workflow Integration</h2>
<p>
This UI allows you to <strong>test the same queries that Large Language Models (LLMs) would use</strong> in a
Retrieval-Augmented Generation (RAG) workflow. When an AI assistant needs to answer questions about your data:
</p>
<ul>
<li><strong>Step 1:</strong> The assistant converts your question into a search query</li>
<li><strong>Step 2:</strong> The MCP server retrieves relevant document chunks using semantic search</li>
<li><strong>Step 3:</strong> Retrieved context is passed to the LLM to generate an informed answer</li>
</ul>
<!-- RAG Workflow Diagram -->
<div style="background: var(--color-main-background); border: 2px solid var(--color-primary-element); border-radius: var(--border-radius-large); padding: 24px; margin: 24px 0; overflow-x: auto;">
<div style="text-align: center; font-weight: 600; margin-bottom: 20px; color: var(--color-primary-element); font-size: 16px;">
MCP Sampling RAG Workflow
</div>
<!-- Four-component bidirectional flow -->
<div style="max-width: 1000px; margin: 0 auto;">
<div style="display: grid; grid-template-columns: 0.7fr auto 1fr auto 1fr auto 0.9fr; gap: 10px; align-items: center;">
<!-- User -->
<div style="background: var(--color-background-hover); border: 2px solid var(--color-border); border-radius: var(--border-radius-large); padding: 14px; text-align: center;">
<div style="font-size: 26px; margin-bottom: 5px;">👤</div>
<div style="font-weight: 600; color: var(--color-main-text); font-size: 12px;">User</div>
<div style="font-size: 9px; color: var(--color-text-maxcontrast); font-style: italic; margin-top: 5px; line-height: 1.2;">
"What are health<br>benefits of coffee?"
</div>
</div>
<!-- Arrow User <-> Client -->
<div style="text-align: center;">
<div style="font-size: 20px; color: var(--color-text-maxcontrast);">&harr;</div>
</div>
<!-- MCP Client + LLM (combined) -->
<div style="background: var(--color-primary-element-light); border: 2px solid var(--color-primary-element); border-radius: var(--border-radius-large); padding: 12px; text-align: center;">
<div style="font-weight: 600; color: var(--color-primary-element); font-size: 13px; margin-bottom: 8px;">MCP Client + LLM</div>
<div style="background: var(--color-main-background); border-radius: var(--border-radius); padding: 8px; margin-bottom: 6px;">
<div style="font-size: 9px; color: var(--color-text-maxcontrast);">(Claude Code)</div>
</div>
<div style="background: var(--color-main-background); border-radius: var(--border-radius); padding: 8px; border: 2px solid var(--color-primary-element);">
<div style="font-size: 16px; margin-bottom: 2px;">🧠</div>
<div style="font-weight: 600; color: var(--color-main-text); font-size: 10px;">Client's LLM</div>
<div style="font-size: 8px; color: var(--color-text-maxcontrast);">(Claude)</div>
</div>
<div style="margin-top: 8px; font-size: 8px; color: var(--color-text-maxcontrast); line-height: 1.2;">
<strong>Enables RAG:</strong><br>
Receives context,<br>
generates answer
</div>
</div>
<!-- Arrow Client <-> Server -->
<div style="text-align: center;">
<div style="font-size: 20px; color: var(--color-primary-element);">&harr;</div>
<div style="font-size: 7px; color: var(--color-text-maxcontrast); margin-top: 2px; font-weight: 600; line-height: 1.1;">
Query +<br>
Sampling
</div>
</div>
<!-- MCP Server -->
<div style="background: var(--color-primary-element-light); border: 2px solid var(--color-primary-element); border-radius: var(--border-radius-large); padding: 12px; text-align: center;">
<div style="font-weight: 600; color: var(--color-primary-element); font-size: 13px; margin-bottom: 8px;">MCP Server</div>
<div style="background: var(--color-main-background); border-radius: var(--border-radius); padding: 7px; margin-bottom: 5px;">
<div style="font-weight: 600; color: var(--color-main-text); font-size: 9px; margin-bottom: 2px;">1. Semantic Search</div>
<div style="font-size: 7px; color: var(--color-text-maxcontrast); line-height: 1.2;">
Vector embeddings<br>
BM25 Hybrid + RRF
</div>
</div>
<div style="background: var(--color-main-background); border-radius: var(--border-radius); padding: 7px; margin-bottom: 5px;">
<div style="font-weight: 600; color: var(--color-main-text); font-size: 9px; margin-bottom: 2px;">2. Retrieve Context</div>
<div style="font-size: 7px; color: var(--color-text-maxcontrast); line-height: 1.2;">
Top relevant docs<br>
with scores
</div>
</div>
<div style="background: var(--color-main-background); border-radius: var(--border-radius); padding: 7px; margin-bottom: 5px;">
<div style="font-weight: 600; color: var(--color-main-text); font-size: 9px; margin-bottom: 2px;">3. Format Response</div>
<div style="font-size: 7px; color: var(--color-text-maxcontrast); line-height: 1.2;">
Document chunks<br>
with citations
</div>
</div>
<div style="background: var(--color-main-background); border-radius: var(--border-radius); padding: 7px;">
<div style="font-weight: 600; color: var(--color-main-text); font-size: 9px; margin-bottom: 2px;">4. Send to LLM</div>
<div style="font-size: 7px; color: var(--color-text-maxcontrast); line-height: 1.2;">
Via MCP sampling<br>
for answer generation
</div>
</div>
</div>
<!-- Arrow Server <-> Nextcloud -->
<div style="text-align: center;">
<div style="font-size: 20px; color: var(--color-primary-element);">&harr;</div>
<div style="font-size: 7px; color: var(--color-text-maxcontrast); margin-top: 2px; font-weight: 600; line-height: 1.1;">
Retrieve
</div>
</div>
<!-- Nextcloud -->
<div style="background: var(--color-background-hover); border: 2px solid var(--color-border); border-radius: var(--border-radius-large); padding: 12px; text-align: center; position: relative;">
<img src="/app/static/nextcloud-logo.png" alt="Nextcloud" style="width: 40px; height: 40px; margin-bottom: 6px;" />
<div style="font-weight: 600; color: var(--color-main-text); font-size: 12px; margin-bottom: 4px;">Nextcloud</div>
<div style="font-size: 8px; color: var(--color-text-maxcontrast); line-height: 1.2;">
Notes, Calendar,<br>
Files, Contacts,<br>
Deck
</div>
</div>
</div>
<!-- Explanation below diagram -->
<div style="margin-top: 24px; padding: 16px; background: var(--color-background-hover); border-radius: var(--border-radius); border-left: 4px solid var(--color-primary-element);">
<div style="font-size: 12px; color: var(--color-main-text); line-height: 1.6;">
<strong>How RAG works via MCP Sampling:</strong>
</div>
<ol style="margin: 8px 0 0 0; padding-left: 20px; font-size: 11px; color: var(--color-text-maxcontrast); line-height: 1.6;">
<li>User asks question through MCP Client</li>
<li>Client sends query to MCP Server</li>
<li>Server retrieves relevant document context from Nextcloud</li>
<li><strong>Server sends context back to Client's LLM</strong> (MCP Sampling)</li>
<li>Client's LLM generates answer with citations using retrieved context</li>
<li>Answer returned to user</li>
</ol>
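<!--
Steps 3 to 5 above can be sketched as a function that turns retrieved chunks into a sampling request. This is an illustration under stated assumptions, not the server's code: buildSamplingRequest and the chunk fields (title, text) are hypothetical, and the request is assumed to follow the shape of the MCP sampling/createMessage method.

```javascript
// Build a sampling request that asks the client's LLM to answer a question
// using only the retrieved chunks, cited by their position in the list.
function buildSamplingRequest(question, chunks, maxTokens = 512) {
  // Number each chunk so the model can cite it as [1], [2], ...
  const context = chunks
    .map((chunk, index) => `[${index + 1}] ${chunk.title}: ${chunk.text}`)
    .join("\n");

  const prompt =
    "Answer using only the context below and cite sources as [n].\n\n" +
    `Context:\n${context}\n\n` +
    `Question: ${question}`;

  const message = { role: "user", content: { type: "text", text: prompt } };
  return {
    method: "sampling/createMessage", // assumed MCP method name
    params: { messages: [message], maxTokens },
  };
}
```

The server side only assembles context and the request; the model that answers is whichever one the client has configured.
-->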
<div style="margin-top: 8px; font-size: 10px; color: var(--color-text-maxcontrast); font-style: italic;">
The server has no LLM - it only retrieves context. The client's existing LLM is reused for answer generation.
</div>
</div>
</div>
</div>
<p style="margin-top: 16px;">
<strong>Key Point:</strong> The MCP server retrieves context but doesn't generate answers itself.
Through <strong>MCP sampling</strong>, it asks the client's LLM to generate the response, so users
keep full control over which model is used and answer generation stays on the client side.
</p>
<p>
By using this interface, you can preview search results, understand relevance scores, and verify
that the system retrieves the right information before it reaches the LLM.
</p>
</div>
<!-- Feature Cards -->
<h2>Available Features</h2>
<div class="feature-grid">
<a href="#" @click.prevent="activeSection = 'user-info'" class="feature-card">
<div class="feature-icon">
<svg viewBox="0 0 24 24">
<path d="M12,4A4,4 0 0,1 16,8A4,4 0 0,1 12,12A4,4 0 0,1 8,8A4,4 0 0,1 12,4M12,14C16.42,14 20,15.79 20,18V20H4V18C4,15.79 7.58,14 12,14Z" />
</svg>
</div>
<h3>User Information</h3>
<p>
View your authentication details, session information, and IdP profile.
Manage background access permissions.
</p>
</a>
<a href="#" @click.prevent="activeSection = 'vector-sync'" class="feature-card">
<div class="feature-icon">
<svg viewBox="0 0 24 24">
<path d="M12,18A6,6 0 0,1 6,12C6,11 6.25,10.03 6.7,9.2L5.24,7.74C4.46,8.97 4,10.43 4,12A8,8 0 0,0 12,20V23L16,19L12,15M12,4V1L8,5L12,9V6A6,6 0 0,1 18,12C18,13 17.75,13.97 17.3,14.8L18.76,16.26C19.54,15.03 20,13.57 20,12A8,8 0 0,0 12,4Z" />
</svg>
</div>
<h3>Vector Sync Status</h3>
<p>
Monitor real-time indexing progress with metrics for indexed documents, pending queue,
and synchronization status.
</p>
</a>
<a href="#" @click.prevent="activeSection = 'vector-viz'" class="feature-card">
<div class="feature-icon">
<svg viewBox="0 0 24 24">
<path d="M22,21H2V3H4V19H6V10H10V19H12V6H16V19H18V14H22V21Z" />
</svg>
</div>
<h3>Vector Visualization</h3>
<p>
Interactive search interface with 3D PCA visualization. Compare algorithms,
view relevance scores, and explore matched document chunks.
</p>
</a>
</div>
{% else %}
<!-- Vector Sync Disabled Content -->
<div class="warning">
<h3 style="margin-top: 0;">Vector Sync is Disabled</h3>
<p>
Semantic search and vector visualization features are currently disabled.
To enable these features, set <code>VECTOR_SYNC_ENABLED=true</code> in your environment configuration.
</p>
<p style="margin-bottom: 0;">
<strong>Learn more:</strong>
<a href="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/docs/configuration.md" target="_blank" rel="noopener noreferrer" style="color: inherit; text-decoration: underline;">
Configuration Guide
</a>
</p>
</div>
<!-- Limited Feature Card -->
<h2>Available Features</h2>
<div class="feature-grid">
<a href="#" @click.prevent="activeSection = 'user-info'" class="feature-card">
<div class="feature-icon">
<svg viewBox="0 0 24 24">
<path d="M12,4A4,4 0 0,1 16,8A4,4 0 0,1 12,12A4,4 0 0,1 8,8A4,4 0 0,1 12,4M12,14C16.42,14 20,15.79 20,18V20H4V18C4,15.79 7.58,14 12,14Z" />
</svg>
</div>
<h3>User Information</h3>
<p>
View your authentication details, session information, and IdP profile.
Manage background access permissions.
</p>
</a>
</div>
{% endif %}
<!-- Documentation Section -->
<div class="info-section" style="margin-top: 40px;">
<h2>Documentation</h2>
<p>
For detailed information about configuration, authentication modes, and advanced features,
please refer to the project documentation:
</p>
<ul>
<li><a href="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/docs/installation.md" target="_blank" rel="noopener noreferrer">Installation Guide</a></li>
<li><a href="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/docs/configuration.md" target="_blank" rel="noopener noreferrer">Configuration Options</a></li>
<li><a href="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/docs/authentication.md" target="_blank" rel="noopener noreferrer">Authentication Modes</a></li>
{% if vector_sync_enabled %}
<li><a href="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/docs/user-guide/vector-sync-ui.md" target="_blank" rel="noopener noreferrer">Vector Sync UI Guide</a></li>
{% endif %}
</ul>
</div>
</div>
<!-- User Info Section -->
<div x-show="activeSection === 'user-info'">
<div class="content-section">
<h1>User Information</h1>
{{ user_info_tab_html|safe }}
</div>
</div>
{% if show_vector_sync_tab %}
<!-- Vector Sync Section -->
<div x-show="activeSection === 'vector-sync'">
<div class="content-section">
<h1>Vector Sync Status</h1>
{{ vector_sync_tab_html|safe }}
</div>
</div>
<!-- Vector Viz Section -->
<div x-show="activeSection === 'vector-viz'">
<div class="content-section">
<h1>Vector Visualization</h1>
<div hx-get="/app/vector-viz" hx-trigger="load" hx-swap="outerHTML">
<p style="color: #999;">Loading vector visualization...</p>
</div>
</div>
</div>
{% endif %}
{% if show_webhooks_tab %}
<!-- Webhooks Section -->
<div x-show="activeSection === 'webhooks'">
<div class="content-section">
<h1>Webhook Management</h1>
{{ webhooks_tab_html|safe }}
</div>
</div>
{% endif %}
</div>
</main>
</div>
<script>
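// Set global Nextcloud base URL for use in external JS.
// tojson emits a quoted, script-safe string literal, so the value cannot break out of the script.
window.NEXTCLOUD_BASE_URL = {{ nextcloud_host_for_links|tojson }};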
</script>
<script src="/app/static/vector-viz.js"></script>
{% endblock %}
@@ -0,0 +1,165 @@
<div x-data="vizApp()">
<div class="viz-layout">
<!-- Top: Search Controls -->
<div class="viz-card viz-controls-card">
<form @submit.prevent="executeSearch">
<div class="viz-controls-grid">
<div class="viz-control-group">
<label>Search Query</label>
<input type="text" x-model="query" placeholder="Enter search query..." required />
</div>
<div class="viz-control-group">
<label>Algorithm</label>
<select x-model="algorithm">
<option value="semantic">Semantic (Dense)</option>
<option value="bm25_hybrid" selected>BM25 Hybrid</option>
</select>
</div>
<div class="viz-control-group">
<label>Fusion</label>
<select x-model="fusion" :disabled="algorithm !== 'bm25_hybrid'" :style="algorithm !== 'bm25_hybrid' ? 'opacity: 0.5; cursor: not-allowed;' : ''">
<option value="rrf" selected>RRF</option>
<option value="dbsf">DBSF</option>
</select>
</div>
<div class="viz-control-group">
<label>&nbsp;</label>
<button type="submit" class="viz-btn">Search</button>
</div>
<div class="viz-control-group">
<label>&nbsp;</label>
<button type="button" class="viz-btn-secondary" @click="showAdvanced = !showAdvanced">
<span x-text="showAdvanced ? 'Hide' : 'Advanced'"></span>
</button>
</div>
</div>
<!-- Advanced Options (Collapsible) -->
<div x-show="showAdvanced" style="margin-top: 16px;">
<div class="viz-controls-grid" style="grid-template-columns: repeat(auto-fit, minmax(150px, 1fr));">
<div class="viz-control-group">
<label>Document Types</label>
<div style="display: grid; grid-template-columns: 1fr 1fr; gap: 8px; font-size: 13px;">
<label style="display: flex; align-items: center; cursor: pointer; font-weight: normal;">
<input type="checkbox" x-model="docTypes" value="" style="margin-right: 4px;">
<span>All</span>
</label>
<label style="display: flex; align-items: center; cursor: pointer; font-weight: normal;">
<input type="checkbox" x-model="docTypes" value="note" style="margin-right: 4px;">
<span>Notes</span>
</label>
<label style="display: flex; align-items: center; cursor: pointer; font-weight: normal;">
<input type="checkbox" x-model="docTypes" value="file" style="margin-right: 4px;">
<span>Files</span>
</label>
<label style="display: flex; align-items: center; cursor: pointer; font-weight: normal;">
<input type="checkbox" x-model="docTypes" value="calendar" style="margin-right: 4px;">
<span>Calendar</span>
</label>
<label style="display: flex; align-items: center; cursor: pointer; font-weight: normal;">
<input type="checkbox" x-model="docTypes" value="contact" style="margin-right: 4px;">
<span>Contacts</span>
</label>
<label style="display: flex; align-items: center; cursor: pointer; font-weight: normal;">
<input type="checkbox" x-model="docTypes" value="deck" style="margin-right: 4px;">
<span>Deck</span>
</label>
</div>
</div>
<div class="viz-control-group">
<label>Score Threshold</label>
<input type="number" x-model.number="scoreThreshold" min="0" max="1" step="any" />
</div>
<div class="viz-control-group">
<label>Result Limit</label>
<input type="number" x-model.number="limit" min="1" max="1000" />
</div>
<div class="viz-control-group">
<label>Display Options</label>
<label style="display: flex; align-items: center; cursor: pointer; font-weight: normal; margin-top: 4px;">
<input type="checkbox" x-model="showQueryPoint" @change="updatePlot()" style="margin-right: 6px;">
<span>Show Query Point</span>
</label>
</div>
</div>
</div>
</form>
</div>
<!-- Plot -->
<div class="viz-card viz-card-plot">
<div id="viz-plot-container">
<div x-show="loading" class="viz-loading-overlay" x-transition.opacity.duration.200ms>
Executing search and computing PCA projection...
</div>
<div id="viz-plot" x-show="!loading" x-transition.opacity.duration.200ms></div>
</div>
</div>
<!-- Results -->
<div class="viz-card" style="flex: 0 0 auto;">
<h3 style="margin-top: 0;">Search Results (<span x-text="loading ? '...' : results.length"></span>)</h3>
<div x-show="loading" class="viz-loading" x-transition.opacity.duration.200ms>
Loading results...
</div>
<div x-show="!loading && results.length === 0" class="viz-no-results" x-transition.opacity.duration.200ms>
No results found. Try a different query or adjust your search parameters.
</div>
<template x-if="!loading && results.length > 0">
<div x-transition.opacity.duration.200ms>
<template x-for="result in results" :key="result.id">
<div style="padding: 12px; border-bottom: 1px solid #eee;">
<a :href="getNextcloudUrl(result)" target="_blank" style="font-weight: 500; color: #0066cc; text-decoration: none;">
<span x-text="result.title"></span>
</a>
<div style="font-size: 14px; color: #666; margin-top: 4px;" x-text="result.excerpt"></div>
<div style="font-size: 12px; color: #999; margin-top: 4px;">
Raw Score: <span x-text="result.original_score.toFixed(3)"></span>
(<span x-text="(result.score * 100).toFixed(0)"></span>% relative) |
Type: <span x-text="result.doc_type"></span>
</div>
<!-- Show Chunk button (only if chunk position is available) -->
<template x-if="hasChunkPosition(result)">
<button
class="chunk-toggle-btn"
@click="toggleChunk(result)"
x-text="isChunkExpanded(`${result.doc_type}_${result.id}`) ? 'Hide Chunk' : 'Show Chunk'"
></button>
</template>
<!-- Chunk context (expanded inline) -->
<template x-if="isChunkExpanded(`${result.doc_type}_${result.id}`)">
<div class="chunk-context" x-transition.opacity.duration.200ms>
<template x-if="chunkLoading[`${result.doc_type}_${result.id}`]">
<div style="color: #666; font-style: italic;">Loading chunk...</div>
</template>
<template x-if="!chunkLoading[`${result.doc_type}_${result.id}`]">
<div>
<template x-if="expandedChunks[`${result.doc_type}_${result.id}`]?.has_more_before">
<span class="chunk-ellipsis">...</span>
</template>
<span class="chunk-text" x-text="expandedChunks[`${result.doc_type}_${result.id}`]?.before_context"></span><span class="chunk-matched" x-text="expandedChunks[`${result.doc_type}_${result.id}`]?.chunk_text"></span><span class="chunk-text" x-text="expandedChunks[`${result.doc_type}_${result.id}`]?.after_context"></span><template x-if="expandedChunks[`${result.doc_type}_${result.id}`]?.has_more_after">
<span class="chunk-ellipsis">...</span>
</template>
</div>
</template>
</div>
</template>
</div>
</template>
</div>
</template>
</div><!-- Search Results -->
</div><!-- .viz-layout -->
</div><!-- x-data="vizApp()" -->
@@ -0,0 +1,392 @@
{% extends "base.html" %}
{% block title %}Welcome - Nextcloud MCP Server{% endblock %}
{% block extra_head %}
<!-- Alpine.js for interactive elements -->
<script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"></script>
{% endblock %}
{% block extra_styles %}
/* Welcome page specific styles */
.hero-section {
background: linear-gradient(135deg, var(--color-primary-element) 0%, #0082c9 100%);
color: white;
padding: 60px 24px;
margin: -24px -24px 40px -24px;
border-radius: 0 0 var(--border-radius-large) var(--border-radius-large);
text-align: center;
}
.hero-section h1 {
color: white;
font-size: 36px;
margin: 0 0 16px 0;
font-weight: 600;
}
.hero-section p {
font-size: 18px;
opacity: 0.95;
max-width: 700px;
margin: 0 auto;
line-height: 1.6;
}
.feature-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(280px, 1fr));
gap: 24px;
margin: 32px 0;
}
.feature-card {
background: var(--color-main-background);
border: 2px solid var(--color-border);
border-radius: var(--border-radius-large);
padding: 24px;
transition: all 0.2s;
cursor: pointer;
text-decoration: none;
color: inherit;
display: block;
}
.feature-card:hover {
border-color: var(--color-primary-element);
box-shadow: 0 4px 12px rgba(0, 103, 158, 0.15);
transform: translateY(-2px);
}
.feature-card h3 {
color: var(--color-primary-element);
font-size: 20px;
margin: 12px 0 8px 0;
font-weight: 600;
display: flex;
align-items: center;
gap: 12px;
}
.feature-card p {
color: var(--color-text-maxcontrast);
font-size: 14px;
line-height: 1.6;
margin: 8px 0 0 0;
}
.feature-icon {
width: 48px;
height: 48px;
background: var(--color-primary-element-light);
border-radius: var(--border-radius);
display: flex;
align-items: center;
justify-content: center;
margin-bottom: 8px;
}
.feature-icon svg {
width: 28px;
height: 28px;
fill: var(--color-primary-element);
}
.info-section {
background: var(--color-background-hover);
border-radius: var(--border-radius-large);
padding: 32px;
margin: 32px 0;
}
.info-section h2 {
color: var(--color-main-text);
font-size: 24px;
margin: 0 0 16px 0;
border: none;
padding: 0;
}
.info-section p {
color: var(--color-text-maxcontrast);
line-height: 1.7;
margin: 12px 0;
}
.info-section ul {
margin: 12px 0;
padding-left: 24px;
}
.info-section li {
color: var(--color-text-maxcontrast);
line-height: 1.7;
margin: 8px 0;
}
.info-section code {
background: var(--color-main-background);
padding: 2px 8px;
border-radius: var(--border-radius);
font-size: 13px;
}
.auth-status {
background: var(--color-primary-element-light);
border-left: 4px solid var(--color-primary-element);
padding: 16px 20px;
margin: 24px 0;
border-radius: var(--border-radius);
display: flex;
align-items: center;
gap: 12px;
}
.auth-status svg {
width: 24px;
height: 24px;
fill: var(--color-primary-element);
flex-shrink: 0;
}
.auth-status-text {
flex: 1;
}
.auth-status-text strong {
display: block;
color: var(--color-main-text);
font-size: 14px;
margin-bottom: 4px;
}
.auth-status-text span {
color: var(--color-text-maxcontrast);
font-size: 13px;
}
{% endblock %}
{% block content %}
<div class="app-content-wrapper">
<!-- Main Content Area -->
<main id="app-content">
<div class="page-content">
<!-- Hero Section -->
<div class="hero-section">
<h1>Welcome to Nextcloud MCP Server</h1>
<p>
Interactive user interface for semantic search and document retrieval.
Test queries, visualize results, and explore your Nextcloud content using RAG workflows.
</p>
</div>
<!-- Authentication Status -->
<div class="auth-status">
<svg viewBox="0 0 24 24">
<path d="M12,4A4,4 0 0,1 16,8A4,4 0 0,1 12,12A4,4 0 0,1 8,8A4,4 0 0,1 12,4M12,14C16.42,14 20,15.79 20,18V20H4V18C4,15.79 7.58,14 12,14Z" />
</svg>
<div class="auth-status-text">
<strong>Authenticated as: {{ username }}</strong>
<span>Authentication mode: <code>{{ auth_mode }}</code></span>
</div>
</div>
{% if vector_sync_enabled %}
<!-- Vector Sync Enabled Content -->
<div class="info-section">
<h2>About Semantic Search</h2>
<p>
This interface provides access to <strong>semantic search</strong> capabilities powered by vector embeddings.
Unlike traditional keyword search, semantic search understands the <em>meaning</em> of your queries and finds
conceptually similar content across your Nextcloud apps.
</p>
<p>
<strong>How it works:</strong>
</p>
<ul>
<li>Documents from Notes, Calendar, Files, Contacts, and Deck are indexed into a vector database</li>
<li>Each document chunk is converted to a 768-dimensional vector embedding that captures semantic meaning</li>
<li>Queries are also converted to embeddings and matched against document vectors using similarity search</li>
<li>Results can be retrieved using pure semantic search or hybrid BM25 search combining keywords and semantics</li>
</ul>
</div>
<div class="info-section">
<h2>RAG Workflow Integration</h2>
<p>
This UI allows you to <strong>test the same queries that Large Language Models (LLMs) would use</strong> in a
Retrieval-Augmented Generation (RAG) workflow. When an AI assistant needs to answer questions about your data:
</p>
<ul>
<li><strong>Step 1:</strong> The assistant converts your question into a search query</li>
<li><strong>Step 2:</strong> The MCP server retrieves relevant document chunks using semantic search</li>
<li><strong>Step 3:</strong> Retrieved context is passed to the LLM to generate an informed answer</li>
</ul>
<!-- RAG Workflow Diagram -->
<div style="background: var(--color-main-background); border: 2px solid var(--color-primary-element); border-radius: var(--border-radius-large); padding: 24px; margin: 24px 0; font-family: 'SFMono-Regular', 'Consolas', 'Liberation Mono', 'Menlo', monospace; font-size: 13px; line-height: 1.8; overflow-x: auto;">
<div style="text-align: center; font-weight: 600; margin-bottom: 16px; color: var(--color-primary-element); font-size: 14px;">
MCP Sampling RAG Workflow
</div>
<pre style="margin: 0; color: var(--color-main-text);">
┌─────────────────┐
<strong>MCP Client</strong> │ User asks: "What are health benefits of coffee?"
│ (Claude Code) │
└────────┬────────┘
│ (1) User question
┌────────────────────────────────────────────────────────────────────────┐
<strong>Nextcloud MCP Server</strong>
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ <strong>nc_semantic_search_answer</strong> Tool (MCP Sampling-enabled) │ │
│ │ │ │
│ │ (2) Semantic Search │ │
│ │ ┌────────────────────────────────────────────────────────┐ │ │
│ │ │ Query: "health benefits of coffee" │ │ │
│ │ │ → Convert to 768D vector embedding │ │ │
│ │ │ → Search Qdrant (BM25 Hybrid + RRF fusion) │ │ │
│ │ │ → Retrieve top 5 relevant document chunks │ │ │
│ │ └────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ (3) Construct Prompt with Context │ │
│ │ ┌────────────────────────────────────────────────────────┐ │ │
│ │ │ "What are health benefits of coffee? │ │ │
│ │ │ │ │ │
│ │ │ Documents: │ │ │
│ │ │ - [MED-2155] Effects of habitual coffee consumption...│ │ │
│ │ │ - [MED-1646] Beverage consumption guidance... │ │ │
│ │ │ - [MED-1627] Coffee and depression risk... │ │ │
│ │ │ ... │ │ │
│ │ │ │ │ │
│ │ │ Provide answer with citations." │ │ │
│ │ └────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ (4) MCP Sampling Request │ │
│ │ ─────────────────────────────────────────────────────────────> │ │
│ └──────────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────────────┘
│ Sampling request with prompt + context
┌─────────────────┐
<strong>MCP Client</strong> │ (5) Client's LLM generates answer using retrieved context
│ (Claude) │ → "Coffee consumption (2-3 cups/day) is associated with
└────────┬────────┘ reduced risk of type 2 diabetes, cardiovascular disease,
│ and improved liver health (Document 1, 2)..."
│ (6) Answer with citations
┌─────────────────┐
│ User │ Receives comprehensive answer with source citations
└─────────────────┘</pre>
</div>
<p style="margin-top: 16px;">
<strong>Key Point:</strong> The MCP server retrieves context but doesn't generate answers itself.
Through <strong>MCP sampling</strong>, it asks the client's LLM to generate the response, giving users
full control over which model is used and keeping answer generation on the client side.
</p>
<p>
By using this interface, you can preview search results, understand relevance scores, and verify
that the system retrieves the right information before it reaches the LLM.
</p>
</div>
<!-- Feature Cards -->
<h2>Available Features</h2>
<div class="feature-grid">
<a href="/app/user-info" class="feature-card">
<div class="feature-icon">
<svg viewBox="0 0 24 24">
<path d="M12,4A4,4 0 0,1 16,8A4,4 0 0,1 12,12A4,4 0 0,1 8,8A4,4 0 0,1 12,4M12,14C16.42,14 20,15.79 20,18V20H4V18C4,15.79 7.58,14 12,14Z" />
</svg>
</div>
<h3>User Information</h3>
<p>
View your authentication details, session information, and IdP profile.
Manage background access permissions.
</p>
</a>
<a href="/app/user-info#vector-sync" class="feature-card">
<div class="feature-icon">
<svg viewBox="0 0 24 24">
<path d="M12,18A6,6 0 0,1 6,12C6,11 6.25,10.03 6.7,9.2L5.24,7.74C4.46,8.97 4,10.43 4,12A8,8 0 0,0 12,20V23L16,19L12,15M12,4V1L8,5L12,9V6A6,6 0 0,1 18,12C18,13 17.75,13.97 17.3,14.8L18.76,16.26C19.54,15.03 20,13.57 20,12A8,8 0 0,0 12,4Z" />
</svg>
</div>
<h3>Vector Sync Status</h3>
<p>
Monitor real-time indexing progress with metrics for indexed documents, pending queue,
and synchronization status.
</p>
</a>
<a href="/app/user-info#vector-viz" class="feature-card">
<div class="feature-icon">
<svg viewBox="0 0 24 24">
<path d="M22,21H2V3H4V19H6V10H10V19H12V6H16V19H18V14H22V21Z" />
</svg>
</div>
<h3>Vector Visualization</h3>
<p>
Interactive search interface with 3D PCA visualization. Compare algorithms,
view relevance scores, and explore matched document chunks.
</p>
</a>
</div>
{% else %}
<!-- Vector Sync Disabled Content -->
<div class="warning">
<h3 style="margin-top: 0;">Vector Sync is Disabled</h3>
<p>
Semantic search and vector visualization features are currently disabled.
To enable these features, set <code>VECTOR_SYNC_ENABLED=true</code> in your environment configuration.
</p>
<p style="margin-bottom: 0;">
<strong>Learn more:</strong>
<a href="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/docs/configuration.md" target="_blank" style="color: inherit; text-decoration: underline;">
Configuration Guide
</a>
</p>
</div>
<!-- Limited Feature Card -->
<h2>Available Features</h2>
<div class="feature-grid">
<a href="/app/user-info" class="feature-card">
<div class="feature-icon">
<svg viewBox="0 0 24 24">
<path d="M12,4A4,4 0 0,1 16,8A4,4 0 0,1 12,12A4,4 0 0,1 8,8A4,4 0 0,1 12,4M12,14C16.42,14 20,15.79 20,18V20H4V18C4,15.79 7.58,14 12,14Z" />
</svg>
</div>
<h3>User Information</h3>
<p>
View your authentication details, session information, and IdP profile.
Manage background access permissions.
</p>
</a>
</div>
{% endif %}
<!-- Documentation Section -->
<div class="info-section" style="margin-top: 40px;">
<h2>Documentation</h2>
<p>
For detailed information about configuration, authentication modes, and advanced features,
please refer to the project documentation:
</p>
<ul>
<li><a href="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/docs/installation.md" target="_blank">Installation Guide</a></li>
<li><a href="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/docs/configuration.md" target="_blank">Configuration Options</a></li>
<li><a href="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/docs/authentication.md" target="_blank">Authentication Modes</a></li>
{% if vector_sync_enabled %}
<li><a href="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/docs/user-guide/vector-sync-ui.md" target="_blank">Vector Sync UI Guide</a></li>
{% endif %}
</ul>
</div>
</div>
</main>
</div>
{% endblock %}
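The welcome template above mentions hybrid BM25 search with RRF fusion. As a rough illustration of what Reciprocal Rank Fusion does, here is a minimal sketch. It is not the fusion code this project or Qdrant actually runs: the function name `rrf_fuse`, the document ids, and the constant `k=60` (a commonly used default) are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    Hypothetical helper for illustration only; `k` is an assumed constant.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up result lists standing in for a dense (semantic) and a sparse (BM25) query.
dense_hits = ["note_1", "file_7", "note_3"]
sparse_hits = ["file_7", "note_9", "note_1"]
fused = rrf_fuse([dense_hits, sparse_hits])
```

Documents that rank well in both lists (`file_7`, `note_1`) rise above those that appear in only one, which is the behaviour the "BM25 Hybrid" option is meant to expose in the UI.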
+2 -2
@@ -14,11 +14,11 @@ The Token Broker provides:
- Session vs background token separation (RFC 8693)
"""
import asyncio
import logging
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple
import anyio
import httpx
import jwt
from cryptography.fernet import Fernet
@@ -43,7 +43,7 @@ class TokenCache:
self._cache: Dict[str, Tuple[str, datetime]] = {}
self._ttl = timedelta(seconds=ttl_seconds)
self._early_refresh = timedelta(seconds=early_refresh_seconds)
self._lock = asyncio.Lock()
self._lock = anyio.Lock()
async def get(self, user_id: str) -> Optional[str]:
"""Get cached token if valid."""
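The hunk above only shows the cache fields and the lock swap, so the following is a sketch of how a TTL cache with an early-refresh window might behave, not the project's actual `TokenCache`. The class name `SketchTokenCache`, the `set` method, the default durations, and the expiry rule are assumptions. It uses `asyncio.Lock` purely to stay stdlib-only; the diff switches to `anyio.Lock`, which is used with the same `async with` pattern.

```python
import asyncio
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple


class SketchTokenCache:
    """Hypothetical stand-in for TokenCache; behaviour is assumed, not confirmed."""

    def __init__(self, ttl_seconds: int = 300, early_refresh_seconds: int = 30):
        # Maps user_id -> (token, expiry time).
        self._cache: Dict[str, Tuple[str, datetime]] = {}
        self._ttl = timedelta(seconds=ttl_seconds)
        self._early_refresh = timedelta(seconds=early_refresh_seconds)
        self._lock = asyncio.Lock()

    async def set(self, user_id: str, token: str) -> None:
        """Store a token that expires one TTL from now."""
        async with self._lock:
            expiry = datetime.now(timezone.utc) + self._ttl
            self._cache[user_id] = (token, expiry)

    async def get(self, user_id: str) -> Optional[str]:
        """Return the token unless it is inside the early-refresh window."""
        async with self._lock:
            entry = self._cache.get(user_id)
            if entry is None:
                return None
            token, expiry = entry
            # Treat the token as stale slightly before it really expires,
            # so callers refresh it before a request can fail.
            if datetime.now(timezone.utc) >= expiry - self._early_refresh:
                del self._cache[user_id]
                return None
            return token


async def demo() -> Tuple[Optional[str], Optional[str]]:
    fresh = SketchTokenCache(ttl_seconds=300, early_refresh_seconds=30)
    await fresh.set("alice", "token-a")
    # TTL shorter than the refresh window: the entry is stale immediately.
    stale = SketchTokenCache(ttl_seconds=10, early_refresh_seconds=30)
    await stale.set("alice", "token-b")
    return await fresh.get("alice"), await stale.get("alice")


result = asyncio.run(demo())
```

Holding the lock around both the lookup and the eviction keeps concurrent tasks from observing a half-updated entry, which is presumably why the original class carries a lock at all.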
+58 -502
@@ -9,15 +9,21 @@ For OAuth mode: Requires browser-based OAuth login to establish session.
import logging
import os
from pathlib import Path
from typing import Any
import httpx
from jinja2 import Environment, FileSystemLoader
from starlette.authentication import requires
from starlette.requests import Request
from starlette.responses import HTMLResponse, JSONResponse
logger = logging.getLogger(__name__)
# Setup Jinja2 environment for templates
_template_dir = Path(__file__).parent / "templates"
_jinja_env = Environment(loader=FileSystemLoader(_template_dir))
async def _get_authenticated_client_for_userinfo(request: Request) -> httpx.AsyncClient:
"""Get an authenticated HTTP client for user info page operations.
@@ -431,51 +437,14 @@ async def user_info_html(request: Request) -> HTMLResponse:
oauth_ctx = getattr(request.app.state, "oauth_context", None)
login_url = str(request.url_for("oauth_login")) if oauth_ctx else "/oauth/login"
error_html = f"""
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Error - Nextcloud MCP Server</title>
<style>
body {{
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif;
max-width: 800px;
margin: 50px auto;
padding: 20px;
background-color: #f5f5f5;
}}
.container {{
background: white;
border-radius: 8px;
padding: 30px;
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
}}
h1 {{
color: #d32f2f;
margin-top: 0;
}}
.error {{
background-color: #ffebee;
border-left: 4px solid #d32f2f;
padding: 15px;
margin: 20px 0;
}}
</style>
</head>
<body>
<div class="container">
<h1>Error Retrieving User Info</h1>
<div class="error">
<strong>Error:</strong> {user_context["error"]}
</div>
<p><a href="{login_url}">Login again</a></p>
</div>
</body>
</html>
"""
return HTMLResponse(content=error_html)
template = _jinja_env.get_template("error.html")
return HTMLResponse(
content=template.render(
error_title="Error Retrieving User Info",
error_message=user_context["error"],
login_url=login_url,
)
)
# Build HTML response
auth_mode = user_context.get("auth_mode", "unknown")
@@ -654,398 +623,26 @@ async def user_info_html(request: Request) -> HTMLResponse:
</div>
"""
html_content = f"""
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Nextcloud MCP Server</title>
# Check if vector sync is enabled (needed for Welcome tab)
vector_sync_enabled = os.getenv("VECTOR_SYNC_ENABLED", "false").lower() == "true"
<!-- htmx for dynamic loading -->
<script src="https://unpkg.com/htmx.org@1.9.10"></script>
<!-- Alpine.js for tab state management -->
<script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"></script>
<!-- Plotly.js for vector visualization -->
<script src="https://cdn.plot.ly/plotly-2.27.0.min.js"></script>
<!-- Vector visualization app (Alpine.js component) -->
<script>
function vizApp() {{
return {{
query: '',
algorithm: 'hybrid',
showAdvanced: false,
docTypes: [''], // Default to "All Types"
limit: 50,
scoreThreshold: 0.7,
semanticWeight: 0.5,
keywordWeight: 0.3,
fuzzyWeight: 0.2,
loading: false,
results: [],
async executeSearch() {{
this.loading = true;
this.results = [];
try {{
const params = new URLSearchParams({{
query: this.query,
algorithm: this.algorithm,
limit: this.limit,
score_threshold: this.scoreThreshold,
semantic_weight: this.semanticWeight,
keyword_weight: this.keywordWeight,
fuzzy_weight: this.fuzzyWeight,
}});
// Add doc_types parameter (filter out empty string for "All Types")
const selectedTypes = this.docTypes.filter(t => t !== '');
if (selectedTypes.length > 0) {{
params.append('doc_types', selectedTypes.join(','));
}}
const response = await fetch(`/app/vector-viz/search?${{params}}`);
const data = await response.json();
if (data.success) {{
this.results = data.results;
this.renderPlot(data.coordinates_2d, data.results);
}} else {{
alert('Search failed: ' + data.error);
}}
}} catch (error) {{
alert('Error: ' + error.message);
}} finally {{
this.loading = false;
}}
}},
renderPlot(coordinates, results) {{
const trace = {{
x: coordinates.map(c => c[0]),
y: coordinates.map(c => c[1]),
mode: 'markers',
type: 'scatter',
text: results.map(r => `${{r.title}}<br>Score: ${{r.score.toFixed(3)}}`),
marker: {{
size: 8,
color: results.map(r => r.score),
colorscale: 'Viridis',
showscale: true,
colorbar: {{ title: 'Score' }},
cmin: 0,
cmax: 1
}}
}};
const layout = {{
title: `Vector Space (PCA 2D) - ${{results.length}} results`,
xaxis: {{ title: 'PC1' }},
yaxis: {{ title: 'PC2' }},
hovermode: 'closest',
height: 600
}};
Plotly.newPlot('viz-plot', [trace], layout);
}},
getNextcloudUrl(result) {{
// Generate Nextcloud URL based on document type
// Use the actual Nextcloud host (port 8080), not the MCP server
const baseUrl = '{nextcloud_host_for_links}';
switch (result.doc_type) {{
case 'note':
return `${{baseUrl}}/apps/notes/note/${{result.id}}`;
case 'file':
return `${{baseUrl}}/apps/files/?fileId=${{result.id}}`;
case 'calendar':
return `${{baseUrl}}/apps/calendar`;
case 'contact':
return `${{baseUrl}}/apps/contacts`;
case 'deck':
return `${{baseUrl}}/apps/deck`;
default:
return `${{baseUrl}}`;
}}
}}
}}
}}
</script>
<style>
body {{
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif;
max-width: 900px;
margin: 50px auto;
padding: 20px;
background-color: #f5f5f5;
}}
.container {{
background: white;
border-radius: 8px;
padding: 30px;
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
min-height: calc(100vh - 200px);
}}
h1 {{
color: #0082c9;
margin-top: 0;
border-bottom: 2px solid #0082c9;
padding-bottom: 10px;
}}
h2 {{
color: #333;
margin-top: 20px;
border-bottom: 1px solid #e0e0e0;
padding-bottom: 5px;
}}
/* Tab navigation */
.tabs {{
display: flex;
gap: 0;
margin: 20px 0 0 0;
border-bottom: 2px solid #e0e0e0;
}}
.tab {{
padding: 12px 24px;
cursor: pointer;
background: transparent;
border: none;
font-size: 14px;
font-weight: 500;
color: #666;
border-bottom: 2px solid transparent;
margin-bottom: -2px;
transition: all 0.2s;
}}
.tab:hover {{
color: #0082c9;
background-color: #f5f5f5;
}}
.tab.active {{
color: #0082c9;
border-bottom-color: #0082c9;
}}
/* Tab content - use grid to overlay panes */
.tab-content {{
padding: 20px 0;
display: grid;
}}
/* Tab panes - all occupy the same grid cell to overlay */
.tab-pane {{
grid-area: 1 / 1;
}}
/* Tables */
table {{
width: 100%;
border-collapse: collapse;
margin: 15px 0;
}}
td {{
padding: 10px;
border-bottom: 1px solid #e0e0e0;
}}
td:first-child {{
width: 200px;
color: #666;
}}
code {{
background-color: #f5f5f5;
padding: 2px 6px;
border-radius: 3px;
font-family: 'Courier New', monospace;
}}
/* Badges */
.badge {{
display: inline-block;
padding: 3px 8px;
border-radius: 12px;
font-size: 12px;
font-weight: bold;
text-transform: uppercase;
}}
.badge-oauth {{
background-color: #4caf50;
color: white;
}}
.badge-basic {{
background-color: #2196f3;
color: white;
}}
/* Messages */
.warning {{
background-color: #fff3cd;
border-left: 4px solid #ffc107;
padding: 15px;
margin: 15px 0;
color: #856404;
}}
.info-message {{
background-color: #e3f2fd;
border-left: 4px solid #2196f3;
padding: 15px;
margin: 15px 0;
color: #1565c0;
}}
/* Buttons */
.button {{
display: inline-block;
padding: 10px 20px;
background-color: #d32f2f;
color: white;
text-decoration: none;
border-radius: 4px;
transition: background-color 0.3s;
border: none;
cursor: pointer;
font-size: 14px;
}}
.button:hover {{
background-color: #b71c1c;
}}
.button-primary {{
background-color: #0082c9;
}}
.button-primary:hover {{
background-color: #006ba3;
}}
/* Logout section */
.logout {{
margin-top: 30px;
padding-top: 20px;
border-top: 1px solid #e0e0e0;
}}
/* Smooth htmx content swaps */
.htmx-swapping {{
opacity: 0;
transition: opacity 200ms ease-out;
}}
/* Smooth htmx content settling */
.htmx-settling {{
opacity: 1;
transition: opacity 200ms ease-in;
}}
</style>
</head>
<body>
<div class="container" x-data="{{ activeTab: 'user-info' }}">
<h1>Nextcloud MCP Server</h1>
<!-- Tab Navigation -->
<div class="tabs">
<button
class="tab"
:class="activeTab === 'user-info' ? 'active' : ''"
@click="activeTab = 'user-info'">
User Info
</button>
{
""
if not show_vector_sync_tab
else '''
<button
class="tab"
:class="activeTab === 'vector-sync' ? 'active' : ''"
@click="activeTab = 'vector-sync'">
Vector Sync
</button>
'''
}
{
""
if not show_vector_sync_tab
else '''
<button
class="tab"
:class="activeTab === 'vector-viz' ? 'active' : ''"
@click="activeTab = 'vector-viz'">
Vector Viz
</button>
'''
}
{
""
if not show_webhooks_tab
else '''
<button
class="tab"
:class="activeTab === 'webhooks' ? 'active' : ''"
@click="activeTab = 'webhooks'">
Webhooks
</button>
'''
}
</div>
<!-- Tab Content -->
<div class="tab-content">
<!-- User Info Tab -->
<div class="tab-pane" x-show="activeTab === 'user-info'" x-transition.opacity.duration.150ms>
{user_info_tab_html}
</div>
{
""
if not show_vector_sync_tab
else f'''
<!-- Vector Sync Tab -->
<div class="tab-pane" x-show="activeTab === 'vector-sync'" x-transition.opacity.duration.150ms>
{vector_sync_tab_html}
</div>
'''
}
{
""
if not show_vector_sync_tab
else '''
<!-- Vector Viz Tab -->
<div class="tab-pane" x-show="activeTab === 'vector-viz'" x-transition.opacity.duration.150ms>
<div hx-get="/app/vector-viz" hx-trigger="load" hx-swap="outerHTML">
<p style="color: #999;">Loading vector visualization...</p>
</div>
</div>
'''
}
{
""
if not show_webhooks_tab
else f'''
<!-- Webhooks Tab (admin-only, loaded dynamically) -->
<div class="tab-pane" x-show="activeTab === 'webhooks'" x-transition.opacity.duration.150ms>
{webhooks_tab_html}
</div>
'''
}
</div>
{
f'<div class="logout"><a href="{logout_url}" class="button">Logout</a></div>'
if auth_mode == "oauth"
else ""
}
</div>
</body>
</html>
"""
return HTMLResponse(content=html_content)
# Render template
template = _jinja_env.get_template("user_info.html")
return HTMLResponse(
content=template.render(
user_info_tab_html=user_info_tab_html,
vector_sync_tab_html=vector_sync_tab_html,
webhooks_tab_html=webhooks_tab_html,
show_vector_sync_tab=show_vector_sync_tab,
show_webhooks_tab=show_webhooks_tab,
logout_url=logout_url if auth_mode == "oauth" else None,
nextcloud_host_for_links=nextcloud_host_for_links,
# Additional context for Welcome tab
vector_sync_enabled=vector_sync_enabled,
username=username,
auth_mode=auth_mode,
)
)
@requires("authenticated", redirect="oauth_login")
@@ -1065,17 +662,12 @@ async def revoke_session(request: Request) -> HTMLResponse:
oauth_ctx = getattr(request.app.state, "oauth_context", None)
if not oauth_ctx:
template = _jinja_env.get_template("error.html")
return HTMLResponse(
"""
<!DOCTYPE html>
<html>
<head><title>Error</title></head>
<body>
<h1>Error</h1>
<p>OAuth mode not enabled</p>
</body>
</html>
""",
content=template.render(
error_title="Error",
error_message="OAuth mode not enabled",
),
status_code=400,
)
@@ -1083,17 +675,12 @@ async def revoke_session(request: Request) -> HTMLResponse:
session_id = request.cookies.get("mcp_session")
if not storage or not session_id:
template = _jinja_env.get_template("error.html")
return HTMLResponse(
"""
<!DOCTYPE html>
<html>
<head><title>Error</title></head>
<body>
<h1>Error</h1>
<p>Session not found</p>
</body>
</html>
""",
content=template.render(
error_title="Error",
error_message="Session not found",
),
status_code=400,
)
@@ -1106,57 +693,26 @@ async def revoke_session(request: Request) -> HTMLResponse:
# Redirect back to user page
user_page_url = str(request.url_for("user_info_html"))
template = _jinja_env.get_template("success.html")
return HTMLResponse(
f"""
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="refresh" content="2;url={user_page_url}">
<title>Background Access Revoked</title>
<style>
body {{
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
max-width: 600px;
margin: 50px auto;
padding: 20px;
text-align: center;
}}
.success {{
background-color: #e8f5e9;
border: 2px solid #4caf50;
padding: 30px;
border-radius: 8px;
}}
h1 {{
color: #4caf50;
}}
</style>
</head>
<body>
<div class="success">
<h1>✓ Background Access Revoked</h1>
<p>Your refresh token has been deleted successfully.</p>
<p>Browser session remains active.</p>
<p>Redirecting back to user page...</p>
</div>
</body>
</html>
"""
content=template.render(
success_title="✓ Background Access Revoked",
success_messages=[
"Your refresh token has been deleted successfully.",
"Browser session remains active.",
],
redirect_url=user_page_url,
redirect_delay=2,
)
)
except Exception as e:
logger.error(f"Failed to revoke background access: {e}")
template = _jinja_env.get_template("error.html")
return HTMLResponse(
f"""
<!DOCTYPE html>
<html>
<head><title>Error</title></head>
<body>
<h1>Error</h1>
<p>Failed to revoke background access: {e}</p>
</body>
</html>
""",
content=template.render(
error_title="Error",
error_message=f"Failed to revoke background access: {e}",
),
status_code=500,
)
+353 -360
@@ -1,27 +1,29 @@
"""Vector visualization routes for testing search algorithms.
Provides a web UI for users to test different search algorithms on their own
indexed documents and visualize results in 2D space using PCA.
indexed documents and visualize results in 3D space using PCA.
All processing happens server-side following ADR-012:
- Search execution via shared search/algorithms.py
- PCA dimensionality reduction (768-dim → 2D)
- Only 2D coordinates + metadata sent to client
- Bandwidth-efficient (2 floats per doc vs 768)
- Query embedding generation
- PCA dimensionality reduction (768-dim → 3D)
- Only 3D coordinates + metadata sent to client
- Bandwidth-efficient (3 floats per doc vs 768)
"""
import logging
import time
from pathlib import Path
import numpy as np
from jinja2 import Environment, FileSystemLoader
from starlette.authentication import requires
from starlette.requests import Request
from starlette.responses import HTMLResponse, JSONResponse
from nextcloud_mcp_server.config import get_settings
from nextcloud_mcp_server.search import (
FuzzySearchAlgorithm,
HybridSearchAlgorithm,
KeywordSearchAlgorithm,
BM25HybridSearchAlgorithm,
SemanticSearchAlgorithm,
)
from nextcloud_mcp_server.vector.pca import PCA
@@ -29,6 +31,10 @@ from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
logger = logging.getLogger(__name__)
# Setup Jinja2 environment for templates
_template_dir = Path(__file__).parent / "templates"
_jinja_env = Environment(loader=FileSystemLoader(_template_dir))
@requires("authenticated", redirect="oauth_login")
async def vector_visualization_html(request: Request) -> HTMLResponse:
@@ -64,284 +70,28 @@ async def vector_visualization_html(request: Request) -> HTMLResponse:
else "unknown"
)
html_content = f"""
<style>
.viz-card {{
background: white;
border-radius: 8px;
padding: 20px;
margin-bottom: 20px;
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
}}
.viz-controls {{
margin-bottom: 20px;
}}
.viz-control-row {{
display: grid;
grid-template-columns: 2fr 1fr auto;
gap: 12px;
margin-bottom: 12px;
align-items: end;
}}
.viz-control-group {{
margin-bottom: 15px;
}}
.viz-control-group label {{
display: block;
margin-bottom: 5px;
font-weight: 500;
color: #333;
}}
.viz-control-group input[type="text"],
.viz-control-group input[type="number"],
.viz-control-group select {{
width: 100%;
padding: 8px 12px;
border: 1px solid #ddd;
border-radius: 4px;
font-size: 14px;
}}
.viz-control-group input[type="range"] {{
width: 100%;
}}
.viz-control-group select[multiple] {{
min-height: 100px;
}}
.viz-weight-display {{
display: inline-block;
min-width: 40px;
text-align: right;
color: #666;
}}
.viz-btn {{
background: #0066cc;
color: white;
border: none;
padding: 10px 20px;
border-radius: 4px;
cursor: pointer;
font-size: 14px;
font-weight: 500;
}}
.viz-btn:hover {{
background: #0052a3;
}}
.viz-btn-secondary {{
background: #6c757d;
color: white;
border: none;
padding: 6px 12px;
border-radius: 4px;
cursor: pointer;
font-size: 13px;
margin-bottom: 12px;
}}
.viz-btn-secondary:hover {{
background: #5a6268;
}}
#viz-plot-container {{
width: 100%;
height: 600px;
position: relative;
}}
#viz-plot {{
width: 100%;
height: 100%;
}}
.viz-loading {{
text-align: center;
padding: 40px;
color: #666;
}}
.viz-loading-overlay {{
position: absolute;
inset: 0;
display: flex;
align-items: center;
justify-content: center;
background: white;
color: #666;
}}
.viz-no-results {{
text-align: center;
padding: 40px;
color: #666;
font-style: italic;
}}
.viz-advanced-section {{
margin-top: 16px;
padding: 16px;
background: #f8f9fa;
border-radius: 4px;
border: 1px solid #dee2e6;
}}
.viz-advanced-grid {{
display: grid;
grid-template-columns: 1fr 1fr;
gap: 20px;
}}
.viz-info-box {{
background: #e3f2fd;
border-left: 4px solid #2196f3;
padding: 12px;
margin-bottom: 20px;
font-size: 14px;
}}
</style>
<div x-data="vizApp()">
<div class="viz-card">
<h2>Vector Visualization</h2>
<div class="viz-info-box">
Testing search algorithms on your indexed documents. User: <strong>{username}</strong>
</div>
<form @submit.prevent="executeSearch">
<div class="viz-controls">
<!-- Main Controls -->
<div class="viz-control-group">
<label>Search Query</label>
<input type="text" x-model="query" placeholder="Enter search query..." required />
</div>
<div class="viz-control-row">
<div class="viz-control-group" style="margin-bottom: 0;">
<label>Algorithm</label>
<select x-model="algorithm">
<option value="semantic">Semantic (Vector Similarity)</option>
<option value="keyword">Keyword (Token Matching)</option>
<option value="fuzzy">Fuzzy (Character Overlap)</option>
<option value="hybrid" selected>Hybrid (RRF Fusion)</option>
</select>
</div>
<div style="display: flex; align-items: flex-end;">
<button type="submit" class="viz-btn" style="width: 100%;">Search & Visualize</button>
</div>
<div style="display: flex; align-items: flex-end;">
<button type="button" class="viz-btn-secondary" @click="showAdvanced = !showAdvanced" style="white-space: nowrap;">
<span x-text="showAdvanced ? 'Hide Advanced' : 'Advanced'"></span>
</button>
</div>
</div>
<!-- Advanced Options (Collapsible) -->
<div class="viz-advanced-section" x-show="showAdvanced" x-transition.opacity.duration.200ms>
<h3 style="margin-top: 0; margin-bottom: 16px; font-size: 16px;">Advanced Options</h3>
<div class="viz-advanced-grid">
<div class="viz-control-group">
<label>Document Types</label>
<select x-model="docTypes" multiple>
<option value="">All Types (cross-app search)</option>
<option value="note">Notes</option>
<option value="file">Files</option>
<option value="calendar">Calendar Events</option>
<option value="contact">Contacts</option>
<option value="deck">Deck Cards</option>
</select>
<small style="color: #666; display: block; margin-top: 4px;">
Hold Ctrl/Cmd to select multiple
</small>
</div>
<div>
<div class="viz-control-group">
<label>Score Threshold (Semantic/Hybrid)</label>
<input type="number" x-model.number="scoreThreshold" min="0" max="1" step="0.1" />
</div>
<div class="viz-control-group">
<label>Result Limit</label>
<input type="number" x-model.number="limit" min="1" max="100" />
</div>
</div>
</div>
<!-- Hybrid Weights (only when hybrid selected) -->
<div x-show="algorithm === 'hybrid'" style="margin-top: 16px; padding: 12px; background: #e9ecef; border-radius: 4px;">
<label style="margin-bottom: 12px; display: block;">Hybrid Algorithm Weights</label>
<div style="margin-bottom: 8px;">
<label style="display: inline-block; width: 100px; font-weight: normal;">Semantic:</label>
<input type="range" x-model.number="semanticWeight" min="0" max="1" step="0.1" style="width: 200px; display: inline-block;">
<span class="viz-weight-display" x-text="semanticWeight.toFixed(1)"></span>
</div>
<div style="margin-bottom: 8px;">
<label style="display: inline-block; width: 100px; font-weight: normal;">Keyword:</label>
<input type="range" x-model.number="keywordWeight" min="0" max="1" step="0.1" style="width: 200px; display: inline-block;">
<span class="viz-weight-display" x-text="keywordWeight.toFixed(1)"></span>
</div>
<div>
<label style="display: inline-block; width: 100px; font-weight: normal;">Fuzzy:</label>
<input type="range" x-model.number="fuzzyWeight" min="0" max="1" step="0.1" style="width: 200px; display: inline-block;">
<span class="viz-weight-display" x-text="fuzzyWeight.toFixed(1)"></span>
</div>
</div>
</div>
</div>
</form>
</div>
<div class="viz-card">
<div id="viz-plot-container">
<div x-show="loading" class="viz-loading-overlay" x-transition.opacity.duration.200ms>
Executing search and computing PCA projection...
</div>
<div id="viz-plot" x-show="!loading" x-transition.opacity.duration.200ms></div>
</div>
</div>
<div class="viz-card">
<h3>Search Results (<span x-text="loading ? '...' : results.length"></span>)</h3>
<div x-show="loading" class="viz-loading" x-transition.opacity.duration.200ms>
Loading results...
</div>
<div x-show="!loading && results.length === 0" class="viz-no-results" x-transition.opacity.duration.200ms>
No results found. Try a different query or adjust your search parameters.
</div>
<template x-if="!loading && results.length > 0">
<div x-transition.opacity.duration.200ms>
<template x-for="result in results" :key="result.id">
<div style="padding: 12px; border-bottom: 1px solid #eee;">
<a :href="getNextcloudUrl(result)" target="_blank" style="font-weight: 500; color: #0066cc; text-decoration: none;">
<span x-text="result.title"></span>
</a>
<div style="font-size: 14px; color: #666; margin-top: 4px;" x-text="result.excerpt"></div>
<div style="font-size: 12px; color: #999; margin-top: 4px;">
Score: <span x-text="result.score.toFixed(3)"></span> |
Type: <span x-text="result.doc_type"></span>
</div>
</div>
</template>
</div>
</template>
</div>
</div>
"""
# Load and render template
template = _jinja_env.get_template("vector_viz.html")
html_content = template.render(username=username)
return HTMLResponse(content=html_content)
@requires("authenticated", redirect="oauth_login")
async def vector_visualization_search(request: Request) -> JSONResponse:
"""Execute server-side search and return 2D coordinates + results.
"""Execute server-side search and return 3D coordinates + results.
All processing happens server-side:
1. Execute search via shared algorithm module
2. Fetch matching vectors from Qdrant
3. Apply PCA reduction (768-dim → 2D)
4. Return coordinates + metadata only
2. Generate query embedding
3. Fetch matching vectors from Qdrant
4. Apply PCA reduction (768-dim → 3D) to query + documents
5. Return coordinates + metadata only
Args:
request: Starlette request with query parameters
Returns:
JSON response with coordinates_2d and results
JSON response with coordinates_3d and results (including query point)
"""
settings = get_settings()
@@ -364,12 +114,10 @@ async def vector_visualization_search(request: Request) -> JSONResponse:
# Parse query parameters
query = request.query_params.get("query", "")
algorithm = request.query_params.get("algorithm", "hybrid")
algorithm = request.query_params.get("algorithm", "bm25_hybrid")
limit = int(request.query_params.get("limit", "50"))
score_threshold = float(request.query_params.get("score_threshold", "0.7"))
semantic_weight = float(request.query_params.get("semantic_weight", "0.5"))
keyword_weight = float(request.query_params.get("keyword_weight", "0.3"))
fuzzy_weight = float(request.query_params.get("fuzzy_weight", "0.2"))
score_threshold = float(request.query_params.get("score_threshold", "0.0"))
fusion = request.query_params.get("fusion", "rrf") # Default to RRF
# Parse doc_types (comma-separated list, None = all types)
doc_types_param = request.query_params.get("doc_types", "")
@@ -377,71 +125,26 @@ async def vector_visualization_search(request: Request) -> JSONResponse:
logger.info(
f"Viz search: user={username}, query='{query}', "
f"algorithm={algorithm}, limit={limit}, doc_types={doc_types}"
f"algorithm={algorithm}, fusion={fusion}, limit={limit}, doc_types={doc_types}"
)
try:
# Start total request timer
request_start = time.perf_counter()
# Get authenticated HTTP client from session
# In BasicAuth mode: uses username/password from session
# In OAuth mode: uses access token from session
from nextcloud_mcp_server.auth.userinfo_routes import (
_get_authenticated_client_for_userinfo,
)
from nextcloud_mcp_server.client.notes import NotesClient
async with await _get_authenticated_client_for_userinfo(request) as http_client:
# Create NotesClient directly with authenticated HTTP client
notes_client = NotesClient(http_client, username)
# Wrap in a minimal client object for search algorithms
# This conforms to NextcloudClientProtocol but only implements notes
class MinimalNextcloudClient:
def __init__(self, notes_client, username):
self._notes = notes_client
self.username = username
@property
def notes(self):
return self._notes
@property
def webdav(self):
return None
@property
def calendar(self):
return None
@property
def contacts(self):
return None
@property
def deck(self):
return None
@property
def cookbook(self):
return None
@property
def tables(self):
return None
nextcloud_client = MinimalNextcloudClient(notes_client, username)
# Create search algorithm
async with await _get_authenticated_client_for_userinfo(request) as http_client: # noqa: F841
# Create search algorithm (no client needed - verification removed)
if algorithm == "semantic":
search_algo = SemanticSearchAlgorithm(score_threshold=score_threshold)
elif algorithm == "keyword":
search_algo = KeywordSearchAlgorithm()
elif algorithm == "fuzzy":
search_algo = FuzzySearchAlgorithm()
elif algorithm == "hybrid":
search_algo = HybridSearchAlgorithm(
semantic_weight=semantic_weight,
keyword_weight=keyword_weight,
fuzzy_weight=fuzzy_weight,
elif algorithm == "bm25_hybrid":
search_algo = BM25HybridSearchAlgorithm(
score_threshold=score_threshold, fusion=fusion
)
else:
return JSONResponse(
@@ -451,6 +154,7 @@ async def vector_visualization_search(request: Request) -> JSONResponse:
# Execute search (supports cross-app when doc_types=None)
# Get unverified results with buffer for filtering
search_start = time.perf_counter()
all_results = []
if doc_types is None or len(doc_types) == 0:
# Cross-app search - search all indexed types
@@ -476,25 +180,45 @@ async def vector_visualization_search(request: Request) -> JSONResponse:
# Sort by score before verification
all_results.sort(key=lambda r: r.score, reverse=True)
# Verify access for all results (deduplicates and filters)
from nextcloud_mcp_server.search.verification import verify_search_results
# No verification needed for visualization - we only need Qdrant metadata
# (title, excerpt, doc_type) which is already in search results.
# Verification is only needed for sampling (LLM needs full content).
search_results = all_results[:limit]
search_duration = time.perf_counter() - search_start
verified_results = await verify_search_results(
all_results, nextcloud_client
# Store original scores and normalize for visualization
# (best result = 1.0, worst result = 0.0 within THIS result set)
# This makes visual encoding meaningful regardless of RRF normalization
if search_results:
scores = [r.score for r in search_results]
min_score, max_score = min(scores), max(scores)
score_range = max_score - min_score if max_score > min_score else 1.0
logger.info(
f"Normalizing scores for viz: original range [{min_score:.3f}, {max_score:.3f}] "
f"→ [0.0, 1.0]"
)
search_results = verified_results[:limit]
# Store original score and rescale to 0-1 for visualization
for r in search_results:
# Store original score before normalization
r.original_score = r.score
# Rescale for visual encoding
r.score = (r.score - min_score) / score_range
if not search_results:
return JSONResponse(
{
"success": True,
"results": [],
"coordinates_2d": [],
"coordinates_3d": [],
"query_coords": None,
"message": "No results found",
}
)
# Fetch vectors for matching results from Qdrant
vector_fetch_start = time.perf_counter()
qdrant_client = await get_qdrant_client()
doc_ids = [r.id for r in search_results]
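The per-result-set rescaling introduced in this hunk can be sketched in isolation. This is a minimal illustration, not the code from the diff: the helper name `rescale_scores` is hypothetical, and it works on plain floats rather than on result objects.

```python
def rescale_scores(scores: list[float]) -> list[float]:
    """Min-max rescale scores to [0, 1] within a single result set."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    # When all scores are equal, use a range of 1.0 to avoid dividing by zero.
    span = hi - lo if hi > lo else 1.0
    return [(s - lo) / span for s in scores]


if __name__ == "__main__":
    print(rescale_scores([2.0, 1.0, 1.5]))
    print(rescale_scores([0.5, 0.5]))
```

One consequence of this design is that a result set with identical scores maps every entry to 0.0, so the visual encoding carries no ranking information in that case.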
@@ -516,7 +240,7 @@ async def vector_visualization_search(request: Request) -> JSONResponse:
]
),
limit=len(doc_ids) * 2, # Account for multiple chunks per doc
with_vectors=True,
with_vectors=["dense"], # Only fetch dense vectors for visualization
with_payload=["doc_id"], # Need doc_id to map vectors to results
)
@@ -532,11 +256,31 @@ async def vector_visualization_search(request: Request) -> JSONResponse:
}
)
# Extract vectors
vectors = np.array([p.vector for p in points if p.vector is not None])
# Extract dense vectors and group by document
def extract_dense_vector(point):
if point.vector is None:
return None
# If named vectors (dict), extract "dense"
if isinstance(point.vector, dict):
return point.vector.get("dense")
# If unnamed vector (array), use directly
return point.vector
if len(vectors) < 2:
# Not enough points for PCA
# Group chunk vectors by doc_id
from collections import defaultdict
doc_chunks = defaultdict(list)
for point in points:
if point.payload:
doc_id = int(point.payload.get("doc_id", 0))
vector = extract_dense_vector(point)
if vector is not None:
doc_chunks[doc_id].append(vector)
vector_fetch_duration = time.perf_counter() - vector_fetch_start
if len(doc_chunks) < 2:
# Not enough documents for PCA
return JSONResponse(
{
"success": True,
@@ -550,33 +294,131 @@ async def vector_visualization_search(request: Request) -> JSONResponse:
}
for r in search_results
],
"coordinates_2d": [[0, 0]] * len(search_results),
"message": "Not enough vectors for PCA",
"coordinates_3d": [[0, 0, 0]] * len(search_results),
"query_coords": [0, 0, 0],
"message": "Not enough documents for PCA",
}
)
# Apply PCA dimensionality reduction (768-dim → 2D)
pca = PCA(n_components=2)
coords_2d = pca.fit_transform(vectors)
# Detect embedding dimension from first available vector
embedding_dim = None
for chunks in doc_chunks.values():
if chunks:
embedding_dim = len(chunks[0])
break
if embedding_dim is None:
return JSONResponse(
{
"success": False,
"error": "Could not determine embedding dimension",
},
status_code=500,
)
logger.info(f"Detected embedding dimension: {embedding_dim}")
# Average chunk vectors per document to create document-level embeddings
# Maintain order of search_results for coordinate mapping
doc_vectors = []
for result in search_results:
if result.id in doc_chunks:
# Average all chunk embeddings for this document
chunk_vectors = np.array(doc_chunks[result.id])
avg_vector = np.mean(chunk_vectors, axis=0)
doc_vectors.append(avg_vector)
logger.debug(f"Doc {result.id}: averaged {len(chunk_vectors)} chunks")
else:
# Document not found in vectors (shouldn't happen)
logger.warning(f"Doc {result.id} not found in fetched vectors")
# Use zero vector as fallback with detected dimension
doc_vectors.append(np.zeros(embedding_dim))
doc_vectors = np.array(doc_vectors)
# Generate query embedding for visualization
query_embed_start = time.perf_counter()
from nextcloud_mcp_server.embedding.service import get_embedding_service
embedding_service = get_embedding_service()
query_embedding = await embedding_service.embed(query)
query_embed_duration = time.perf_counter() - query_embed_start
logger.info(f"Generated query embedding (dimension={len(query_embedding)})")
# Combine query vector with document vectors for PCA
# Query will be the last point in the array
all_vectors = np.vstack([doc_vectors, np.array([query_embedding])])
# Normalize vectors to unit length (L2 normalization).
# Qdrant ranks by COSINE distance, which depends only on vector direction
# (angle), not magnitude, whereas PCA operates on Euclidean geometry, which
# is sensitive to both. For unit vectors, squared Euclidean distance equals
# 2 * (1 - cosine similarity), so after normalization nearness in the
# projected space tracks cosine similarity (up to the variance lost by PCA).
norms = np.linalg.norm(all_vectors, axis=1, keepdims=True)
# Check for zero-norm vectors (can happen with empty/corrupted embeddings)
zero_norm_mask = norms[:, 0] < 1e-10
if zero_norm_mask.any():
zero_indices = np.where(zero_norm_mask)[0]
logger.warning(
f"Found {zero_norm_mask.sum()} zero-norm vectors at indices {zero_indices.tolist()}. "
"Replacing with small epsilon to avoid division by zero."
)
# Replace zero norms with small epsilon to avoid NaN
norms[zero_norm_mask] = 1e-10
all_vectors_normalized = all_vectors / norms
logger.info(
f"Normalized vectors: query_norm={norms[-1][0]:.3f}, "
f"doc_norm_range=[{norms[:-1].min():.3f}, {norms[:-1].max():.3f}]"
)
# Apply PCA dimensionality reduction (768-dim → 3D) on normalized vectors
pca_start = time.perf_counter()
pca = PCA(n_components=3)
coords_3d = pca.fit_transform(all_vectors_normalized)
pca_duration = time.perf_counter() - pca_start
# After fit, these attributes are guaranteed to be set
assert pca.explained_variance_ratio_ is not None
logger.info(
f"PCA explained variance: PC1={pca.explained_variance_ratio_[0]:.3f}, "
f"PC2={pca.explained_variance_ratio_[1]:.3f}"
# Check for NaN values in PCA output (numerical instability)
nan_mask = np.isnan(coords_3d)
if nan_mask.any():
nan_rows = np.where(nan_mask.any(axis=1))[0]
logger.error(
f"Found NaN values in PCA output at {len(nan_rows)} points: {nan_rows.tolist()[:10]}. "
"Replacing NaN with 0.0 to prevent JSON serialization error."
)
# Replace NaN with 0 to allow JSON serialization
coords_3d = np.nan_to_num(coords_3d, nan=0.0)
# Split query coords from document coords
# Round to 2 decimal places for cleaner display
query_coords_3d = [
round(float(x), 2) for x in coords_3d[-1]
] # Last point is query
doc_coords_3d = coords_3d[:-1] # All but last are documents
total_chunks = sum(len(chunks) for chunks in doc_chunks.values())
avg_chunks_per_doc = (
total_chunks / len(doc_vectors) if doc_vectors.size > 0 else 0
)
# Map results to coordinates (use first chunk per document)
result_coords = []
seen_doc_ids = set()
logger.info(
f"PCA explained variance: PC1={pca.explained_variance_ratio_[0]:.3f}, "
f"PC2={pca.explained_variance_ratio_[1]:.3f}, "
f"PC3={pca.explained_variance_ratio_[2]:.3f}"
)
logger.info(
f"Embedding stats: documents={len(doc_vectors)}, "
f"total_chunks={total_chunks}, avg_chunks_per_doc={avg_chunks_per_doc:.1f}, "
f"query_dim={len(query_embedding)}, doc_vector_dim={doc_vectors.shape[1] if doc_vectors.size > 0 else 0}"
)
for point, coord in zip(points, coords_2d):
if point.payload:
doc_id = int(point.payload.get("doc_id", 0))
if doc_id not in seen_doc_ids and doc_id in doc_ids:
seen_doc_ids.add(doc_id)
result_coords.append(coord.tolist())
# Coordinates already match search_results order (1:1 mapping)
result_coords = [[round(float(x), 2) for x in coord] for coord in doc_coords_3d]
# Build response
response_results = [
@@ -585,19 +427,48 @@ async def vector_visualization_search(request: Request) -> JSONResponse:
"doc_type": r.doc_type,
"title": r.title,
"excerpt": r.excerpt,
"score": r.score,
"score": r.score, # Normalized score for visual encoding (0-1)
"original_score": getattr(
r, "original_score", r.score
), # Raw score from algorithm
"chunk_start_offset": r.chunk_start_offset,
"chunk_end_offset": r.chunk_end_offset,
}
for r in search_results
]
# Calculate total request duration
total_duration = time.perf_counter() - request_start
# Log comprehensive timing metrics
logger.info(
f"Viz search timing: total={total_duration * 1000:.1f}ms, "
f"search={search_duration * 1000:.1f}ms ({search_duration / total_duration * 100:.1f}%), "
f"vector_fetch={vector_fetch_duration * 1000:.1f}ms ({vector_fetch_duration / total_duration * 100:.1f}%), "
f"query_embed={query_embed_duration * 1000:.1f}ms ({query_embed_duration / total_duration * 100:.1f}%), "
f"pca={pca_duration * 1000:.1f}ms ({pca_duration / total_duration * 100:.1f}%), "
f"results={len(search_results)}, doc_vectors={len(doc_vectors)}"
)
return JSONResponse(
{
"success": True,
"results": response_results,
"coordinates_2d": result_coords[: len(search_results)],
"coordinates_3d": result_coords[: len(search_results)],
"query_coords": query_coords_3d,
"pca_variance": {
"pc1": float(pca.explained_variance_ratio_[0]),
"pc2": float(pca.explained_variance_ratio_[1]),
"pc3": float(pca.explained_variance_ratio_[2]),
},
"timing": {
"total_ms": round(total_duration * 1000, 2),
"search_ms": round(search_duration * 1000, 2),
"vector_fetch_ms": round(vector_fetch_duration * 1000, 2),
"query_embed_ms": round(query_embed_duration * 1000, 2),
"pca_ms": round(pca_duration * 1000, 2),
"num_results": len(search_results),
"num_doc_vectors": len(doc_vectors),
},
}
)
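The reason for L2-normalizing before PCA can be checked numerically: for unit vectors, squared Euclidean distance is a fixed function of cosine similarity. The sketch below uses only the standard library and hypothetical helper names (`l2_normalize`, `cosine_similarity`, `squared_euclidean`). It does not reproduce the PCA step, which additionally loses whatever variance the first three components do not capture.

```python
import math


def l2_normalize(v: list[float], eps: float = 1e-10) -> list[float]:
    """Scale v to unit length; near-zero norms are clamped to eps."""
    norm = max(math.sqrt(sum(x * x for x in v)), eps)
    return [x / norm for x in v]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_euclidean(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


if __name__ == "__main__":
    a, b = [3.0, 4.0, 0.0], [1.0, 0.0, 2.0]
    ua, ub = l2_normalize(a), l2_normalize(b)
    # For unit vectors: ||ua - ub||^2 == 2 * (1 - cos(a, b))
    print(squared_euclidean(ua, ub), 2 * (1 - cosine_similarity(a, b)))
```

Clamping the norm to a small epsilon keeps an all-zero vector at the origin instead of producing NaN, which mirrors the zero-norm guard in the hunk above.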
@@ -608,3 +479,125 @@ async def vector_visualization_search(request: Request) -> JSONResponse:
{"success": False, "error": str(e)},
status_code=500,
)
@requires("authenticated", redirect="oauth_login")
async def chunk_context_endpoint(request: Request) -> JSONResponse:
"""Fetch chunk text with surrounding context for visualization.
This endpoint retrieves the matched chunk along with surrounding text
to provide context for the search result. Used by the viz pane to
display chunks inline.
Query parameters:
doc_type: Document type (e.g., "note")
doc_id: Document ID
start: Chunk start offset (character position)
end: Chunk end offset (character position)
context: Characters of context before/after (default: 500)
Returns:
JSON with chunk_text, before_context, after_context, and flags
"""
try:
# Get query parameters
doc_type = request.query_params.get("doc_type")
doc_id = request.query_params.get("doc_id")
start_str = request.query_params.get("start")
end_str = request.query_params.get("end")
context_chars = int(request.query_params.get("context", "500"))
# Validate required parameters
if not all([doc_type, doc_id, start_str, end_str]):
return JSONResponse(
{
"success": False,
"error": "Missing required parameters: doc_type, doc_id, start, end",
},
status_code=400,
)
start = int(start_str)
end = int(end_str)
# Currently only support notes
if doc_type != "note":
return JSONResponse(
{"success": False, "error": f"Unsupported doc_type: {doc_type}"},
status_code=400,
)
# Get authenticated HTTP client and fetch note
from nextcloud_mcp_server.auth.userinfo_routes import (
_get_authenticated_client_for_userinfo,
)
from nextcloud_mcp_server.client.notes import NotesClient
# Get username from request auth
username = (
request.user.display_name
if hasattr(request.user, "display_name")
else "unknown"
)
# Create notes client with authenticated HTTP client
http_client = await _get_authenticated_client_for_userinfo(request)
notes_client = NotesClient(http_client, username)
# Fetch full note content
note = await notes_client.get_note(int(doc_id))
full_content = f"{note['title']}\n\n{note['content']}"
# Validate offsets
if start < 0 or end > len(full_content) or start >= end:
return JSONResponse(
{
"success": False,
"error": f"Invalid offsets: start={start}, end={end}, content_length={len(full_content)}",
},
status_code=400,
)
# Extract chunk
chunk_text = full_content[start:end]
# Extract context before and after
before_start = max(0, start - context_chars)
before_context = full_content[before_start:start]
after_end = min(len(full_content), end + context_chars)
after_context = full_content[end:after_end]
# Determine if there's more content
has_more_before = before_start > 0
has_more_after = after_end < len(full_content)
logger.info(
f"Fetched chunk context for {doc_type}_{doc_id}: "
f"chunk_len={len(chunk_text)}, before_len={len(before_context)}, "
f"after_len={len(after_context)}"
)
return JSONResponse(
{
"success": True,
"chunk_text": chunk_text,
"before_context": before_context,
"after_context": after_context,
"has_more_before": has_more_before,
"has_more_after": has_more_after,
}
)
except ValueError as e:
logger.error(f"Invalid parameter format: {e}")
return JSONResponse(
{"success": False, "error": f"Invalid parameter format: {e}"},
status_code=400,
)
except Exception as e:
logger.error(f"Chunk context error: {e}", exc_info=True)
return JSONResponse(
{"success": False, "error": str(e)},
status_code=500,
)
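The offset arithmetic in `chunk_context_endpoint` can be exercised without a Nextcloud server. The following is a sketch under that assumption: `extract_chunk_context` is a hypothetical pure function that takes the document text directly and raises `ValueError` where the endpoint returns a 400 response.

```python
def extract_chunk_context(
    full_content: str, start: int, end: int, context_chars: int = 500
) -> dict:
    """Return the chunk at [start, end) plus up to context_chars on each side."""
    if start < 0 or end > len(full_content) or start >= end:
        raise ValueError(
            f"Invalid offsets: start={start}, end={end}, "
            f"content_length={len(full_content)}"
        )
    # Clamp the context window to the bounds of the document.
    before_start = max(0, start - context_chars)
    after_end = min(len(full_content), end + context_chars)
    return {
        "chunk_text": full_content[start:end],
        "before_context": full_content[before_start:start],
        "after_context": full_content[end:after_end],
        "has_more_before": before_start > 0,
        "has_more_after": after_end < len(full_content),
    }


if __name__ == "__main__":
    print(extract_chunk_context("0123456789abcdefghij", 8, 12, context_chars=3))
```

Keeping this logic separate from the HTTP handler would also make the boundary cases (chunk at the start or end of a note) straightforward to unit test.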
+2 -1
@@ -5,6 +5,7 @@ import time
from abc import ABC
from functools import wraps
import anyio
from httpx import AsyncClient, HTTPStatusError, RequestError, codes
from nextcloud_mcp_server.observability.metrics import (
@@ -47,7 +48,7 @@ def retry_on_429(func):
# Record retry metric (extract app name from args if available)
if len(args) > 0 and hasattr(args[0], "app_name"):
record_nextcloud_api_retry(app=args[0].app_name, reason="429")
time.sleep(5)
await anyio.sleep(5)
elif e.response.status_code == 404:
# 404 errors are often expected (e.g., checking if attachments exist)
# Log as debug instead of warning
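Replacing `time.sleep(5)` with `await anyio.sleep(5)` matters because a blocking sleep inside a coroutine stalls the whole event loop for the duration of the wait. The sketch below shows the same pattern with the standard library's `asyncio.sleep` standing in for `anyio.sleep`. The `RateLimited` exception, the `retry_on_rate_limit` decorator, and its parameters are hypothetical and are not the project's `retry_on_429`.

```python
import asyncio
from functools import wraps


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 response error."""


def retry_on_rate_limit(max_retries: int = 3, delay: float = 0.01):
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return await func(*args, **kwargs)
                except RateLimited:
                    if attempt == max_retries - 1:
                        raise
                    # Yield to the event loop while waiting; time.sleep()
                    # here would block every other coroutine on the loop.
                    await asyncio.sleep(delay)

        return wrapper

    return decorator


async def demo() -> tuple[str, int]:
    attempts = 0

    @retry_on_rate_limit(max_retries=3, delay=0.01)
    async def flaky() -> str:
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise RateLimited()
        return "ok"

    return await flaky(), attempts


if __name__ == "__main__":
    print(asyncio.run(demo()))
```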
+1 -1
@@ -40,7 +40,7 @@ class NotesClient(BaseNextcloudClient):
seen_ids: set[int] = set()
while True:
params: Dict[str, Any] = {"chunkSize": 10}
params: Dict[str, Any] = {"chunkSize": 100}
if cursor:
params["chunkCursor"] = cursor
if prune_before is not None:
+7 -7
@@ -181,8 +181,8 @@ class Settings:
ollama_verify_ssl: bool = True
# Document chunking settings (for vector embeddings)
document_chunk_size: int = 512 # Words per chunk
document_chunk_overlap: int = 50 # Overlapping words between chunks
document_chunk_size: int = 2048 # Characters per chunk
document_chunk_overlap: int = 200 # Overlapping characters between chunks
# Observability settings
metrics_enabled: bool = True
@@ -227,10 +227,10 @@ class Settings:
f"Overlap should be 10-20% of chunk size for optimal results."
)
if self.document_chunk_size < 100:
if self.document_chunk_size < 512:
logger.warning(
f"DOCUMENT_CHUNK_SIZE is set to {self.document_chunk_size} words, which is quite small. "
f"Smaller chunks may lose context. Consider using at least 256 words."
f"DOCUMENT_CHUNK_SIZE is set to {self.document_chunk_size} characters, which is quite small. "
f"Smaller chunks may lose context. Consider using at least 1024 characters."
)
if self.document_chunk_overlap < 0:
@@ -335,8 +335,8 @@ def get_settings() -> Settings:
ollama_embedding_model=os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text"),
ollama_verify_ssl=os.getenv("OLLAMA_VERIFY_SSL", "true").lower() == "true",
# Document chunking settings
document_chunk_size=int(os.getenv("DOCUMENT_CHUNK_SIZE", "512")),
document_chunk_overlap=int(os.getenv("DOCUMENT_CHUNK_OVERLAP", "50")),
document_chunk_size=int(os.getenv("DOCUMENT_CHUNK_SIZE", "2048")),
document_chunk_overlap=int(os.getenv("DOCUMENT_CHUNK_OVERLAP", "200")),
# Observability settings
metrics_enabled=os.getenv("METRICS_ENABLED", "true").lower() == "true",
metrics_port=int(os.getenv("METRICS_PORT", "9090")),
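To make the switch from word-based to character-based settings concrete, here is a minimal fixed-width chunker that reports character offsets. It is an assumption-laden sketch: the function name `chunk_text` is hypothetical, and the project's real chunker may split on word or sentence boundaries rather than at exact character positions.

```python
def chunk_text(
    text: str, chunk_size: int = 2048, overlap: int = 200
) -> list[tuple[int, int, str]]:
    """Split text into (start, end, chunk) triples measured in characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    chunks: list[tuple[int, int, str]] = []
    step = chunk_size - overlap  # Each new chunk starts this far after the last.
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append((start, end, text[start:end]))
        if end == len(text):
            break
        start += step
    return chunks


if __name__ == "__main__":
    for start, end, chunk in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(start, end, chunk)
```

With the new defaults (2048 and 200), the overlap is roughly 10% of the chunk size, which sits at the lower end of the 10-20% range recommended by the validation warning.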
+9 -2
@@ -1,6 +1,13 @@
"""Embedding service package for generating vector embeddings."""
from .service import EmbeddingService, get_embedding_service
from .bm25_provider import BM25SparseEmbeddingProvider
from .service import EmbeddingService, get_bm25_service, get_embedding_service
from .simple_provider import SimpleEmbeddingProvider
__all__ = ["EmbeddingService", "get_embedding_service", "SimpleEmbeddingProvider"]
__all__ = [
"EmbeddingService",
"get_embedding_service",
"BM25SparseEmbeddingProvider",
"get_bm25_service",
"SimpleEmbeddingProvider",
]
@@ -0,0 +1,74 @@
"""BM25 sparse embedding provider using FastEmbed."""
import logging
from typing import Any
from fastembed import SparseTextEmbedding
logger = logging.getLogger(__name__)
class BM25SparseEmbeddingProvider:
"""
BM25 sparse embedding provider for hybrid search.
Uses FastEmbed's BM25 model to generate sparse vectors for keyword-based
retrieval. These sparse vectors are combined with dense semantic vectors
in Qdrant using Reciprocal Rank Fusion (RRF) for hybrid search.
Unlike dense embeddings which have fixed dimensions, sparse embeddings
have variable-length vectors with (index, value) pairs representing
term frequencies in the BM25 vocabulary.
"""
def __init__(self, model_name: str = "Qdrant/bm25"):
"""
Initialize BM25 sparse embedding provider.
Args:
model_name: FastEmbed BM25 model name (default: Qdrant/bm25)
"""
self.model_name = model_name
logger.info(f"Initializing BM25 sparse embedding provider: {model_name}")
# Initialize FastEmbed sparse embedding model
self.model = SparseTextEmbedding(model_name=model_name)
logger.info(f"BM25 sparse embedding model loaded: {model_name}")
def encode(self, text: str) -> dict[str, Any]:
"""
Generate BM25 sparse embedding for a single text.
Args:
text: Input text to encode
Returns:
Dictionary with 'indices' and 'values' keys for Qdrant sparse vector
"""
# FastEmbed returns a generator, take first result
sparse_embedding = next(iter(self.model.embed([text])))
return {
"indices": sparse_embedding.indices.tolist(),
"values": sparse_embedding.values.tolist(),
}
def encode_batch(self, texts: list[str]) -> list[dict[str, Any]]:
"""
Generate BM25 sparse embeddings for multiple texts (batched).
Args:
texts: List of texts to encode
Returns:
List of dictionaries with 'indices' and 'values' for each text
"""
sparse_embeddings = list(self.model.embed(texts))
return [
{
"indices": emb.indices.tolist(),
"values": emb.values.tolist(),
}
for emb in sparse_embeddings
]
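The provider's docstring mentions Reciprocal Rank Fusion (RRF) for combining dense and sparse rankings. A generic version of that idea is sketched below. The constant `k = 60` is a conventional choice assumed here for illustration, and Qdrant's internal constant, rank indexing, and score scaling are not confirmed by this diff.

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    dense_ranking = ["a", "b", "c"]   # e.g. from semantic vectors
    sparse_ranking = ["b", "d", "a"]  # e.g. from BM25 sparse vectors
    for doc_id, score in reciprocal_rank_fusion([dense_ranking, sparse_ranking]):
        print(doc_id, round(score, 5))
```

Because RRF uses only ranks, it needs no calibration between dense similarity scores and BM25 scores, which are on different scales.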
+33 -42
@@ -1,56 +1,30 @@
"""Embedding service with provider detection."""
"""Embedding service with provider detection.
DEPRECATED: This module is maintained for backward compatibility.
New code should use nextcloud_mcp_server.providers.get_provider() directly.
"""
import logging
import os
from .base import EmbeddingProvider
from .ollama_provider import OllamaEmbeddingProvider
from .simple_provider import SimpleEmbeddingProvider
from nextcloud_mcp_server.providers import get_provider
from .bm25_provider import BM25SparseEmbeddingProvider
logger = logging.getLogger(__name__)
class EmbeddingService:
"""Unified embedding service with automatic provider detection."""
"""
Unified embedding service with automatic provider detection.
DEPRECATED: This class wraps the new unified provider infrastructure
for backward compatibility. New code should use
nextcloud_mcp_server.providers.get_provider() directly.
"""
def __init__(self):
"""Initialize embedding service with auto-detected provider."""
self.provider = self._detect_provider()
def _detect_provider(self) -> EmbeddingProvider:
"""
Auto-detect available embedding provider.
Checks environment variables in order:
1. OLLAMA_BASE_URL - Use Ollama provider (production)
2. OPENAI_API_KEY - Use OpenAI provider (future)
3. Fallback to SimpleEmbeddingProvider (testing/development)
Returns:
Configured embedding provider
"""
# Ollama provider (production)
ollama_url = os.getenv("OLLAMA_BASE_URL")
if ollama_url:
logger.info(f"Using Ollama embedding provider: {ollama_url}")
return OllamaEmbeddingProvider(
base_url=ollama_url,
model=os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text"),
verify_ssl=os.getenv("OLLAMA_VERIFY_SSL", "true").lower() == "true",
)
# OpenAI provider (future implementation)
# openai_key = os.getenv("OPENAI_API_KEY")
# if openai_key:
# return OpenAIEmbeddingProvider(api_key=openai_key)
# Fallback to simple provider for development/testing
logger.warning(
"No embedding provider configured (OLLAMA_BASE_URL or OPENAI_API_KEY not set). "
"Using SimpleEmbeddingProvider for testing/development. "
"For production, configure an external embedding service."
)
return SimpleEmbeddingProvider(dimension=384)
self.provider = get_provider()
async def embed(self, text: str) -> list[float]:
"""
@@ -109,3 +83,20 @@ def get_embedding_service() -> EmbeddingService:
if _embedding_service is None:
_embedding_service = EmbeddingService()
return _embedding_service
# BM25 sparse embedding singleton
_bm25_service: BM25SparseEmbeddingProvider | None = None
def get_bm25_service() -> BM25SparseEmbeddingProvider:
"""
Get singleton BM25 sparse embedding service instance.
Returns:
Global BM25SparseEmbeddingProvider instance
"""
global _bm25_service
if _bm25_service is None:
_bm25_service = BM25SparseEmbeddingProvider()
return _bm25_service
+14 -1
@@ -19,9 +19,22 @@ class SemanticSearchResult(BaseModel):
default="", description="Document category (notes) or location (calendar)"
)
excerpt: str = Field(description="Excerpt from matching chunk")
score: float = Field(description="Semantic similarity score (0-1)")
score: float = Field(
description=(
"Relevance score (≥ 0.0, higher is better). "
"Score range depends on fusion method: "
"RRF produces scores in [0.0, 1.0], "
"DBSF can exceed 1.0 (sum of normalized scores from multiple systems)"
)
)
chunk_index: int = Field(description="Index of matching chunk in document")
total_chunks: int = Field(description="Total number of chunks in document")
chunk_start_offset: Optional[int] = Field(
default=None, description="Character position where chunk starts in document"
)
chunk_end_offset: Optional[int] = Field(
default=None, description="Character position where chunk ends in document"
)
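To illustrate why the score range depends on the fusion method, here is a minimal sketch of Reciprocal Rank Fusion. It is a generic textbook form, not Qdrant's implementation: the function name `rrf_fuse`, the 1-based ranks, and the constant `k=60` are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Fuse ranked lists of document IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return dict(scores)


# Hypothetical result lists from a dense and a sparse retriever.
dense_ranking = ["a", "b", "c"]
sparse_ranking = ["b", "a", "d"]
fused = rrf_fuse([dense_ranking, sparse_ranking])
```

RRF uses only rank positions, so its output is bounded by the number of lists and the constant. A fusion method that sums normalized raw scores from each retriever has no such fixed ceiling, which is why callers should treat the score as relative rather than as a probability.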
class SemanticSearchResponse(BaseResponse):
@@ -39,7 +39,12 @@ class HealthCheckFilter(logging.Filter):
message = record.getMessage()
return not any(
endpoint in message
for endpoint in ["/health/live", "/health/ready", "/metrics"]
for endpoint in [
"/health/live",
"/health/ready",
"/metrics",
"/app/vector-sync/status",
]
)
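The effect of adding a path to this filter can be checked in isolation. The sketch below is a self-contained stand-in: `PathFilter` and `make_record` are hypothetical names, and only the substring-matching logic mirrors the diff.

```python
import logging


class PathFilter(logging.Filter):
    """Drop log records whose message mentions any of the given paths."""

    def __init__(self, paths: list[str]) -> None:
        super().__init__()
        self.paths = paths

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        # Returning False suppresses the record.
        return not any(path in message for path in self.paths)


def make_record(message: str) -> logging.LogRecord:
    """Build a bare INFO record for testing the filter."""
    return logging.LogRecord("access", logging.INFO, "example.py", 0, message, (), None)


noisy_paths = PathFilter(
    ["/health/live", "/health/ready", "/metrics", "/app/vector-sync/status"]
)
```

Because matching is by substring, any log message that happens to contain one of these paths is suppressed, not only access-log lines.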
+37 -14
@@ -404,10 +404,11 @@ def update_vector_sync_queue_size(size: int) -> None:
def instrument_tool(func):
"""
Decorator to automatically instrument MCP tool functions with metrics.
Decorator to automatically instrument MCP tool functions with metrics and tracing.
Wraps async tool functions to record execution time and success/error status.
Compatible with @mcp.tool() and @require_scopes() decorators.
Wraps async tool functions to record execution time, success/error status, and
create OpenTelemetry trace spans. Compatible with @mcp.tool() and @require_scopes()
decorators.
Usage:
@mcp.tool()
@@ -420,24 +421,46 @@ def instrument_tool(func):
func: The async function to instrument
Returns:
Wrapped function with metrics instrumentation
Wrapped function with metrics and tracing instrumentation
"""
import functools
import time
from nextcloud_mcp_server.observability.tracing import trace_operation
@functools.wraps(func)
async def wrapper(*args, **kwargs):
tool_name = func.__name__
start_time = time.time()
try:
result = await func(*args, **kwargs)
duration = time.time() - start_time
record_tool_call(tool_name, duration, "success")
return result
except Exception as e:
duration = time.time() - start_time
record_tool_call(tool_name, duration, "error")
record_tool_error(tool_name, type(e).__name__)
raise
# Extract tool arguments for tracing (sanitize sensitive fields)
# kwargs contains the actual arguments passed to the tool
tool_args = {
k: v
for k, v in kwargs.items()
if k not in ("password", "token", "secret", "api_key", "etag", "ctx")
}
# Create trace span with metrics collection
with trace_operation(
f"mcp.tool.{tool_name}",
attributes={
"mcp.tool.name": tool_name,
"mcp.tool.args": str(tool_args)[:500]
if tool_args
else None, # Limit to 500 chars
},
record_exception=True,
):
try:
result = await func(*args, **kwargs)
duration = time.time() - start_time
record_tool_call(tool_name, duration, "success")
return result
except Exception as e:
duration = time.time() - start_time
record_tool_call(tool_name, duration, "error")
record_tool_error(tool_name, type(e).__name__)
raise
return wrapper
@@ -0,0 +1,18 @@
"""Unified provider infrastructure for embeddings and text generation."""
from .anthropic import AnthropicProvider
from .base import Provider
from .bedrock import BedrockProvider
from .ollama import OllamaProvider
from .registry import get_provider, reset_provider
from .simple import SimpleProvider
__all__ = [
"Provider",
"OllamaProvider",
"AnthropicProvider",
"SimpleProvider",
"BedrockProvider",
"get_provider",
"reset_provider",
]
@@ -0,0 +1,97 @@
"""Unified Anthropic provider for text generation."""
import logging
from anthropic import AsyncAnthropic
from .base import Provider
logger = logging.getLogger(__name__)
class AnthropicProvider(Provider):
"""
Anthropic provider for text generation.
Supports Claude models via the Anthropic API.
Note: Anthropic doesn't provide embedding models, only text generation.
"""
def __init__(self, api_key: str, model: str = "claude-3-5-sonnet-20241022"):
"""
Initialize Anthropic provider.
Args:
api_key: Anthropic API key
model: Model name (e.g., "claude-3-5-sonnet-20241022")
"""
self.client = AsyncAnthropic(api_key=api_key)
self.model = model
logger.info(f"Initialized Anthropic provider (model={model})")
@property
def supports_embeddings(self) -> bool:
"""Whether this provider supports embedding generation."""
return False
@property
def supports_generation(self) -> bool:
"""Whether this provider supports text generation."""
return True
async def embed(self, text: str) -> list[float]:
"""
Generate embedding vector for text.
Raises:
NotImplementedError: Anthropic doesn't provide embedding models
"""
raise NotImplementedError(
"Embedding not supported by Anthropic - use Ollama or Bedrock for embeddings"
)
async def embed_batch(self, texts: list[str]) -> list[list[float]]:
"""
Generate embeddings for multiple texts.
Raises:
NotImplementedError: Anthropic doesn't provide embedding models
"""
raise NotImplementedError(
"Embedding not supported by Anthropic - use Ollama or Bedrock for embeddings"
)
def get_dimension(self) -> int:
"""
Get embedding dimension.
Raises:
NotImplementedError: Anthropic doesn't provide embedding models
"""
raise NotImplementedError(
"Embedding not supported by Anthropic - use Ollama or Bedrock for embeddings"
)
async def generate(self, prompt: str, max_tokens: int = 500) -> str:
"""
Generate text using Anthropic API.
Args:
prompt: The prompt to generate from
max_tokens: Maximum tokens to generate
Returns:
Generated text
"""
message = await self.client.messages.create(
model=self.model,
max_tokens=max_tokens,
temperature=0.7,
messages=[{"role": "user", "content": prompt}],
)
return message.content[0].text
async def close(self) -> None:
"""Close the client (no-op for Anthropic SDK)."""
pass
+91
@@ -0,0 +1,91 @@
"""Unified provider interface for embeddings and text generation."""
from abc import ABC, abstractmethod
class Provider(ABC):
"""
Unified base class for LLM providers.
Providers can support embeddings, text generation, or both.
Use capability properties to determine what features are available.
"""
@property
@abstractmethod
def supports_embeddings(self) -> bool:
"""Whether this provider supports embedding generation."""
pass
@property
@abstractmethod
def supports_generation(self) -> bool:
"""Whether this provider supports text generation."""
pass
@abstractmethod
async def embed(self, text: str) -> list[float]:
"""
Generate embedding vector for text.
Args:
text: Input text to embed
Returns:
Vector embedding as list of floats
Raises:
NotImplementedError: If provider doesn't support embeddings
"""
pass
@abstractmethod
async def embed_batch(self, texts: list[str]) -> list[list[float]]:
"""
Generate embeddings for multiple texts (optimized).
Args:
texts: List of texts to embed
Returns:
List of vector embeddings
Raises:
NotImplementedError: If provider doesn't support embeddings
"""
pass
@abstractmethod
def get_dimension(self) -> int:
"""
Get embedding dimension for this provider.
Returns:
Vector dimension (e.g., 768 for nomic-embed-text)
Raises:
NotImplementedError: If provider doesn't support embeddings
"""
pass
@abstractmethod
async def generate(self, prompt: str, max_tokens: int = 500) -> str:
"""
Generate text from a prompt.
Args:
prompt: The prompt to generate from
max_tokens: Maximum tokens to generate
Returns:
Generated text
Raises:
NotImplementedError: If provider doesn't support generation
"""
pass
@abstractmethod
async def close(self) -> None:
"""Close the provider and release resources."""
pass
+397
@@ -0,0 +1,397 @@
"""Amazon Bedrock provider for embeddings and text generation."""
import json
import logging
from typing import Any
try:
import boto3
from botocore.exceptions import BotoCoreError, ClientError
BOTO3_AVAILABLE = True
except ImportError:
BOTO3_AVAILABLE = False
from .base import Provider
logger = logging.getLogger(__name__)
class BedrockProvider(Provider):
"""
Amazon Bedrock provider supporting both embeddings and text generation.
Uses AWS Bedrock Runtime API with boto3. Supports various model families:
- Embeddings: amazon.titan-embed-text-v1, amazon.titan-embed-text-v2, cohere.embed-*
- Text Generation: anthropic.claude-*, meta.llama3-*, amazon.titan-text-*, mistral.*, etc.
Requires AWS credentials configured via:
- Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION)
- AWS credentials file (~/.aws/credentials)
- IAM role (when running on AWS)
"""
def __init__(
self,
region_name: str | None = None,
embedding_model: str | None = None,
generation_model: str | None = None,
aws_access_key_id: str | None = None,
aws_secret_access_key: str | None = None,
):
"""
Initialize Bedrock provider.
Args:
region_name: AWS region (e.g., "us-east-1"). Defaults to AWS_REGION env var.
embedding_model: Model ID for embeddings (e.g., "amazon.titan-embed-text-v2:0").
None disables embeddings.
generation_model: Model ID for text generation (e.g., "anthropic.claude-3-sonnet-20240229-v1:0").
None disables generation.
aws_access_key_id: AWS access key (optional, uses default credential chain if not provided)
aws_secret_access_key: AWS secret key (optional, uses default credential chain if not provided)
Raises:
ImportError: If boto3 is not installed
"""
if not BOTO3_AVAILABLE:
raise ImportError(
"boto3 is required for Bedrock provider. Install with: pip install boto3"
)
self.embedding_model = embedding_model
self.generation_model = generation_model
self._dimension: int | None = None # Detected dynamically
# Initialize bedrock-runtime client
client_kwargs: dict[str, Any] = {}
if region_name:
client_kwargs["region_name"] = region_name
if aws_access_key_id:
client_kwargs["aws_access_key_id"] = aws_access_key_id
if aws_secret_access_key:
client_kwargs["aws_secret_access_key"] = aws_secret_access_key
self.client = boto3.client("bedrock-runtime", **client_kwargs)
logger.info(
f"Initialized Bedrock provider in region {region_name or 'default'} "
f"(embedding_model={embedding_model}, generation_model={generation_model})"
)
@property
def supports_embeddings(self) -> bool:
"""Whether this provider supports embedding generation."""
return self.embedding_model is not None
@property
def supports_generation(self) -> bool:
"""Whether this provider supports text generation."""
return self.generation_model is not None
def _create_embedding_request(self, text: str) -> dict[str, Any]:
"""
Create model-specific embedding request payload.
Args:
text: Input text to embed
Returns:
Request payload dict for the embedding model
"""
if not self.embedding_model:
raise NotImplementedError(
"Embedding not supported - no embedding_model configured"
)
# Titan Embed models
if self.embedding_model.startswith("amazon.titan-embed"):
return {"inputText": text}
# Cohere Embed models
elif self.embedding_model.startswith("cohere.embed"):
return {"texts": [text], "input_type": "search_document"}
# Unknown model - try Titan format as default
else:
logger.warning(
f"Unknown embedding model format for {self.embedding_model}, "
"using Titan format as default"
)
return {"inputText": text}
def _parse_embedding_response(self, response: dict[str, Any]) -> list[float]:
"""
Parse model-specific embedding response.
Args:
response: Raw response from Bedrock
Returns:
Embedding vector as list of floats
"""
# Titan Embed models
if self.embedding_model and self.embedding_model.startswith(
"amazon.titan-embed"
):
return response["embedding"]
# Cohere Embed models
elif self.embedding_model and self.embedding_model.startswith("cohere.embed"):
return response["embeddings"][0]
# Unknown model - try Titan format as default
else:
logger.warning(
f"Unknown embedding response format for {self.embedding_model}, "
"trying Titan format"
)
return response.get("embedding", response.get("embeddings", [None])[0])
async def embed(self, text: str) -> list[float]:
"""
Generate embedding vector for text.
Args:
text: Input text to embed
Returns:
Vector embedding as list of floats
Raises:
NotImplementedError: If embeddings not enabled (no embedding_model)
ClientError: If Bedrock API call fails
"""
if not self.supports_embeddings:
raise NotImplementedError(
"Embedding not supported - no embedding_model configured"
)
try:
request_body = self._create_embedding_request(text)
response = self.client.invoke_model(
modelId=self.embedding_model,
body=json.dumps(request_body),
accept="application/json",
contentType="application/json",
)
response_body = json.loads(response["body"].read())
embedding = self._parse_embedding_response(response_body)
return embedding
except (BotoCoreError, ClientError) as e:
logger.error(f"Bedrock embedding error: {e}")
raise
async def embed_batch(self, texts: list[str]) -> list[list[float]]:
"""
Generate embeddings for multiple texts.
Note: Current implementation sends requests sequentially.
Future optimization could use asyncio for concurrent requests.
Args:
texts: List of texts to embed
Returns:
List of vector embeddings
Raises:
NotImplementedError: If embeddings not enabled (no embedding_model)
ClientError: If Bedrock API call fails
"""
if not self.supports_embeddings:
raise NotImplementedError(
"Embedding not supported - no embedding_model configured"
)
embeddings = []
for text in texts:
embedding = await self.embed(text)
embeddings.append(embedding)
return embeddings
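The docstring above mentions concurrent requests as a future optimization. One possible shape, sketched under assumptions: boto3 clients are synchronous, so each call is moved to a worker thread with `asyncio.to_thread`, and a semaphore caps concurrency. `blocking_embed` is a hypothetical stand-in for the SDK call, and `max_concurrency=4` is an arbitrary choice.

```python
import asyncio
import time


def blocking_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a synchronous SDK call such as invoke_model."""
    time.sleep(0.01)  # simulate network latency
    return [float(len(text))]


async def embed_batch_concurrent(
    texts: list[str], max_concurrency: int = 4
) -> list[list[float]]:
    """Embed texts concurrently while preserving input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def embed_one(text: str) -> list[float]:
        async with semaphore:
            # Run the blocking call in a thread so the event loop stays responsive.
            return await asyncio.to_thread(blocking_embed, text)

    # gather() returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(embed_one(text) for text in texts))


vectors = asyncio.run(embed_batch_concurrent(["a", "bb", "ccc"]))
```

The same wrapping would also keep a single `embed()` call from blocking the event loop. Whether the concurrency limit suits a given account depends on service quotas, which this sketch does not model.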
async def _detect_dimension(self):
"""
Detect embedding dimension by generating a test embedding.
"""
if self._dimension is None and self.supports_embeddings:
logger.debug(
f"Detecting embedding dimension for model {self.embedding_model}..."
)
test_embedding = await self.embed("test")
self._dimension = len(test_embedding)
logger.info(
f"Detected embedding dimension: {self._dimension} "
f"for model {self.embedding_model}"
)
def get_dimension(self) -> int:
"""
Get embedding dimension.
Returns:
Vector dimension for the configured embedding model
Raises:
NotImplementedError: If embeddings not enabled (no embedding_model)
RuntimeError: If dimension not detected yet (call _detect_dimension first)
"""
if not self.supports_embeddings:
raise NotImplementedError(
"Embedding not supported - no embedding_model configured"
)
if self._dimension is None:
raise RuntimeError(
f"Embedding dimension not detected yet for model {self.embedding_model}. "
"Call _detect_dimension() first or generate an embedding."
)
return self._dimension
def _create_generation_request(
self, prompt: str, max_tokens: int
) -> dict[str, Any]:
"""
Create model-specific text generation request payload.
Args:
prompt: The prompt to generate from
max_tokens: Maximum tokens to generate
Returns:
Request payload dict for the generation model
"""
if not self.generation_model:
raise NotImplementedError(
"Text generation not supported - no generation_model configured"
)
# Anthropic Claude models
if self.generation_model.startswith("anthropic.claude"):
return {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": max_tokens,
"temperature": 0.7,
"messages": [{"role": "user", "content": prompt}],
}
# Meta Llama models
elif self.generation_model.startswith("meta.llama"):
return {"prompt": prompt, "max_gen_len": max_tokens, "temperature": 0.7}
# Amazon Titan Text models
elif self.generation_model.startswith("amazon.titan-text"):
return {
"inputText": prompt,
"textGenerationConfig": {
"maxTokenCount": max_tokens,
"temperature": 0.7,
},
}
# Mistral models
elif self.generation_model.startswith("mistral"):
return {"prompt": prompt, "max_tokens": max_tokens, "temperature": 0.7}
# Unknown model - try Claude format as default
else:
logger.warning(
f"Unknown generation model format for {self.generation_model}, "
"using Claude format as default"
)
return {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": max_tokens,
"temperature": 0.7,
"messages": [{"role": "user", "content": prompt}],
}
def _parse_generation_response(self, response: dict[str, Any]) -> str:
"""
Parse model-specific text generation response.
Args:
response: Raw response from Bedrock
Returns:
Generated text
"""
# Anthropic Claude models
if self.generation_model and self.generation_model.startswith(
"anthropic.claude"
):
return response["content"][0]["text"]
# Meta Llama models
elif self.generation_model and self.generation_model.startswith("meta.llama"):
return response["generation"]
# Amazon Titan Text models
elif self.generation_model and self.generation_model.startswith(
"amazon.titan-text"
):
return response["results"][0]["outputText"]
# Mistral models
elif self.generation_model and self.generation_model.startswith("mistral"):
return response["outputs"][0]["text"]
# Unknown model - try common response fields
else:
logger.warning(
f"Unknown generation response format for {self.generation_model}, "
"trying common fields"
)
# Try common response field names
for field in ["text", "generation", "outputText", "completion"]:
if field in response:
return response[field]
# Last resort: return JSON string
return json.dumps(response)
async def generate(self, prompt: str, max_tokens: int = 500) -> str:
"""
Generate text from a prompt.
Args:
prompt: The prompt to generate from
max_tokens: Maximum tokens to generate
Returns:
Generated text
Raises:
NotImplementedError: If generation not enabled (no generation_model)
ClientError: If Bedrock API call fails
"""
if not self.supports_generation:
raise NotImplementedError(
"Text generation not supported - no generation_model configured"
)
try:
request_body = self._create_generation_request(prompt, max_tokens)
response = self.client.invoke_model(
modelId=self.generation_model,
body=json.dumps(request_body),
accept="application/json",
contentType="application/json",
)
response_body = json.loads(response["body"].read())
text = self._parse_generation_response(response_body)
return text
except (BotoCoreError, ClientError) as e:
logger.error(f"Bedrock generation error: {e}")
raise
async def close(self) -> None:
"""Close the client (no-op for boto3 clients)."""
pass
+221
@@ -0,0 +1,221 @@
"""Unified Ollama provider for embeddings and text generation."""
import logging
import httpx
from .base import Provider
logger = logging.getLogger(__name__)
class OllamaProvider(Provider):
"""
Ollama provider supporting both embeddings and text generation.
Supports TLS, SSL verification, and automatic model loading.
"""
def __init__(
self,
base_url: str,
embedding_model: str | None = None,
generation_model: str | None = None,
verify_ssl: bool = True,
timeout: httpx.Timeout | None = None,
):
"""
Initialize Ollama provider.
Args:
base_url: Ollama API base URL (e.g., https://ollama.internal.example.com:443)
embedding_model: Model for embeddings (e.g., "nomic-embed-text"). None disables embeddings.
generation_model: Model for text generation (e.g., "llama3.2:1b"). None disables generation.
verify_ssl: Verify SSL certificates (default: True)
timeout: HTTP timeout configuration
"""
self.base_url = base_url.rstrip("/")
self.embedding_model = embedding_model
self.generation_model = generation_model
self.verify_ssl = verify_ssl
if timeout is None:
timeout = httpx.Timeout(timeout=120, connect=5)
self.client = httpx.AsyncClient(verify=verify_ssl, timeout=timeout)
self._dimension: int | None = None # Detected dynamically for embeddings
logger.info(
f"Initialized Ollama provider: {base_url} "
f"(embedding_model={embedding_model}, generation_model={generation_model}, "
f"verify_ssl={verify_ssl})"
)
# Pre-check and auto-load models
if embedding_model:
self._check_model_is_loaded(embedding_model, autoload=True)
if generation_model:
self._check_model_is_loaded(generation_model, autoload=True)
@property
def supports_embeddings(self) -> bool:
"""Whether this provider supports embedding generation."""
return self.embedding_model is not None
@property
def supports_generation(self) -> bool:
"""Whether this provider supports text generation."""
return self.generation_model is not None
async def embed(self, text: str) -> list[float]:
"""
Generate embedding vector for text.
Args:
text: Input text to embed
Returns:
Vector embedding as list of floats
Raises:
NotImplementedError: If embeddings not enabled (no embedding_model)
"""
if not self.supports_embeddings:
raise NotImplementedError(
"Embedding not supported - no embedding_model configured"
)
response = await self.client.post(
f"{self.base_url}/api/embeddings",
json={"model": self.embedding_model, "prompt": text},
)
response.raise_for_status()
return response.json()["embedding"]
async def embed_batch(self, texts: list[str]) -> list[list[float]]:
"""
Generate embeddings for multiple texts.
Note: This implementation calls the single-input /api/embeddings endpoint once per text, so requests are sent sequentially.
Args:
texts: List of texts to embed
Returns:
List of vector embeddings
Raises:
NotImplementedError: If embeddings not enabled (no embedding_model)
"""
if not self.supports_embeddings:
raise NotImplementedError(
"Embedding not supported - no embedding_model configured"
)
embeddings = []
for text in texts:
embedding = await self.embed(text)
embeddings.append(embedding)
return embeddings
async def _detect_dimension(self):
"""
Detect embedding dimension by generating a test embedding.
This method queries the model to determine the actual dimension
instead of relying on hardcoded values.
"""
if self._dimension is None and self.supports_embeddings:
logger.debug(
f"Detecting embedding dimension for model {self.embedding_model}..."
)
test_embedding = await self.embed("test")
self._dimension = len(test_embedding)
logger.info(
f"Detected embedding dimension: {self._dimension} "
f"for model {self.embedding_model}"
)
def get_dimension(self) -> int:
"""
Get embedding dimension.
Returns:
Vector dimension for the configured embedding model
Raises:
NotImplementedError: If embeddings not enabled (no embedding_model)
RuntimeError: If dimension not detected yet (call _detect_dimension first)
"""
if not self.supports_embeddings:
raise NotImplementedError(
"Embedding not supported - no embedding_model configured"
)
if self._dimension is None:
raise RuntimeError(
f"Embedding dimension not detected yet for model {self.embedding_model}. "
"Call _detect_dimension() first or generate an embedding."
)
return self._dimension
async def generate(self, prompt: str, max_tokens: int = 500) -> str:
"""
Generate text from a prompt.
Args:
prompt: The prompt to generate from
max_tokens: Maximum tokens to generate
Returns:
Generated text
Raises:
NotImplementedError: If generation not enabled (no generation_model)
"""
if not self.supports_generation:
raise NotImplementedError(
"Text generation not supported - no generation_model configured"
)
response = await self.client.post(
f"{self.base_url}/api/generate",
json={
"model": self.generation_model,
"prompt": prompt,
"stream": False,
"options": {
"num_predict": max_tokens,
"temperature": 0.7,
},
},
)
response.raise_for_status()
data = response.json()
return data["response"]
def _check_model_is_loaded(self, model: str, autoload: bool = True):
"""
Check if model is loaded in Ollama, optionally auto-loading it.
Args:
model: Model name to check
autoload: Whether to automatically pull the model if not loaded
"""
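# Honor the configured SSL verification setting for this synchronous pre-check
response = httpx.get(f"{self.base_url}/api/tags", verify=self.verify_ssl)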
response.raise_for_status()
models = [m["name"] for m in response.json().get("models", [])]
logger.info("Ollama has following models pre-loaded: %s", models)
if (model not in models) and autoload:
logger.warning(
"Model '%s' not yet available in ollama, attempting to pull now...",
model,
)
response = httpx.post(
f"{self.base_url}/api/pull", json={"model": model}, verify=self.verify_ssl
)
response.raise_for_status()
async def close(self) -> None:
"""Close HTTP client."""
await self.client.aclose()
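One detail of the `model not in models` check may be worth verifying. The sketch below assumes that the model list reports names with an explicit tag (for example `nomic-embed-text:latest`) and that an untagged name means `:latest`. Both are assumptions about Ollama's behavior, and the helper names are hypothetical.

```python
def normalize_model_name(name: str) -> str:
    """Append the assumed default tag when the name carries none."""
    return name if ":" in name else f"{name}:latest"


def is_model_available(requested: str, listed: list[str]) -> bool:
    """Compare model names after normalizing both sides to 'name:tag' form."""
    wanted = normalize_model_name(requested)
    return any(normalize_model_name(item) == wanted for item in listed)


# Hypothetical contents of the model list returned by the server.
listed_models = ["nomic-embed-text:latest", "llama3.2:1b"]
```

If that assumption holds, a plain membership test would treat `nomic-embed-text` as missing and trigger a pull on every start, which normalizing both sides avoids.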
+126
@@ -0,0 +1,126 @@
"""Provider registry and factory for auto-detection and instantiation."""
import logging
import os
from .base import Provider
from .bedrock import BedrockProvider
from .ollama import OllamaProvider
from .simple import SimpleProvider
logger = logging.getLogger(__name__)
class ProviderRegistry:
"""
Registry for provider auto-detection and instantiation.
Checks environment variables in priority order and creates the appropriate provider:
1. Bedrock (any of AWS_REGION, BEDROCK_EMBEDDING_MODEL, BEDROCK_GENERATION_MODEL)
2. Ollama (OLLAMA_BASE_URL)
3. Simple (fallback for testing/development)
"""
@staticmethod
def create_provider() -> Provider:
"""
Auto-detect and create provider based on environment variables.
Priority order:
1. Bedrock - if AWS_REGION, BEDROCK_EMBEDDING_MODEL, or BEDROCK_GENERATION_MODEL is set
2. Ollama - if OLLAMA_BASE_URL is set
3. Simple - fallback for testing/development
Returns:
Provider instance
Environment Variables:
Bedrock:
- AWS_REGION: AWS region (e.g., "us-east-1")
- AWS_ACCESS_KEY_ID: AWS access key (optional, uses credential chain)
- AWS_SECRET_ACCESS_KEY: AWS secret key (optional)
- BEDROCK_EMBEDDING_MODEL: Model ID for embeddings (e.g., "amazon.titan-embed-text-v2:0")
- BEDROCK_GENERATION_MODEL: Model ID for text generation (e.g., "anthropic.claude-3-sonnet-20240229-v1:0")
Ollama:
- OLLAMA_BASE_URL: Ollama API base URL (e.g., "http://localhost:11434")
- OLLAMA_EMBEDDING_MODEL: Model for embeddings (default: "nomic-embed-text")
- OLLAMA_GENERATION_MODEL: Model for text generation (e.g., "llama3.2:1b")
- OLLAMA_VERIFY_SSL: Verify SSL certificates (default: "true")
Simple (no configuration needed, fallback):
- SIMPLE_EMBEDDING_DIMENSION: Embedding dimension (default: 384)
"""
# 1. Check for Bedrock
aws_region = os.getenv("AWS_REGION")
bedrock_embedding_model = os.getenv("BEDROCK_EMBEDDING_MODEL")
bedrock_generation_model = os.getenv("BEDROCK_GENERATION_MODEL")
if aws_region or bedrock_embedding_model or bedrock_generation_model:
logger.info(
f"Using Bedrock provider: region={aws_region}, "
f"embedding_model={bedrock_embedding_model}, "
f"generation_model={bedrock_generation_model}"
)
return BedrockProvider(
region_name=aws_region,
embedding_model=bedrock_embedding_model,
generation_model=bedrock_generation_model,
aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
)
# 2. Check for Ollama
ollama_url = os.getenv("OLLAMA_BASE_URL")
if ollama_url:
embedding_model = os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text")
generation_model = os.getenv("OLLAMA_GENERATION_MODEL")
verify_ssl = os.getenv("OLLAMA_VERIFY_SSL", "true").lower() == "true"
logger.info(
f"Using Ollama provider: {ollama_url}, "
f"embedding_model={embedding_model}, "
f"generation_model={generation_model}"
)
return OllamaProvider(
base_url=ollama_url,
embedding_model=embedding_model,
generation_model=generation_model,
verify_ssl=verify_ssl,
)
# 3. Fallback to Simple provider for development/testing
dimension = int(os.getenv("SIMPLE_EMBEDDING_DIMENSION", "384"))
logger.warning(
"No provider configured (none of AWS_REGION, BEDROCK_EMBEDDING_MODEL, "
"BEDROCK_GENERATION_MODEL, or OLLAMA_BASE_URL is set). "
"Using SimpleProvider for testing/development. "
"For production, configure Bedrock or Ollama."
)
return SimpleProvider(dimension=dimension)
# Singleton instance
_provider: Provider | None = None
def get_provider() -> Provider:
"""
Get singleton provider instance.
Returns:
Global Provider instance (auto-detected on first call)
"""
global _provider
if _provider is None:
_provider = ProviderRegistry.create_provider()
return _provider
def reset_provider():
"""
Reset singleton provider instance.
Useful for testing or reconfiguration.
"""
global _provider
_provider = None
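The priority order implemented by `create_provider()` can be expressed as a pure function over an environment mapping, which makes the precedence easy to test. This is a sketch only: `detect_provider_name` is a hypothetical helper that returns a label instead of constructing a provider.

```python
def detect_provider_name(env: dict[str, str]) -> str:
    """Return which provider the registry's checks would select for this environment."""
    # 1. Any Bedrock-related variable wins, even if Ollama is also configured.
    if (
        env.get("AWS_REGION")
        or env.get("BEDROCK_EMBEDDING_MODEL")
        or env.get("BEDROCK_GENERATION_MODEL")
    ):
        return "bedrock"
    # 2. Ollama is used only when no Bedrock variable is set.
    if env.get("OLLAMA_BASE_URL"):
        return "ollama"
    # 3. Fallback for testing/development.
    return "simple"


both_configured = {
    "AWS_REGION": "us-east-1",
    "OLLAMA_BASE_URL": "http://localhost:11434",
}
selected = detect_provider_name(both_configured)
```

One consequence of this ordering is that an environment where `AWS_REGION` is set for unrelated reasons selects Bedrock even when `OLLAMA_BASE_URL` is present. Because `get_provider()` caches the result, tests that change the environment would need to call `reset_provider()` before the next lookup.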
+149
@@ -0,0 +1,149 @@
"""Simple in-process embedding provider for testing.
This provider uses log-scaled term-frequency weighting with feature hashing to generate
deterministic embeddings without requiring external services. No inverse document
frequency (IDF) is computed. Suitable for testing but not for production use.
"""
import hashlib
import math
import re
from collections import Counter
from .base import Provider
class SimpleProvider(Provider):
"""Simple deterministic embedding provider using feature hashing.
This implementation:
- Tokenizes text into words
- Uses feature hashing to map words to fixed-size vectors
- Applies log-scaled term-frequency weighting, log(1 + count), with no IDF component
- Normalizes vectors to unit length
Not suitable for production but good for testing semantic search infrastructure.
Only supports embeddings, not text generation.
"""
def __init__(self, dimension: int = 384):
"""Initialize simple embedding provider.
Args:
dimension: Embedding dimension (default: 384)
"""
self.dimension = dimension
@property
def supports_embeddings(self) -> bool:
"""Whether this provider supports embedding generation."""
return True
@property
def supports_generation(self) -> bool:
"""Whether this provider supports text generation."""
return False
def _tokenize(self, text: str) -> list[str]:
"""Tokenize text into lowercase words.
Args:
text: Input text
Returns:
List of lowercase word tokens
"""
# Simple word tokenization
text = text.lower()
words = re.findall(r"\b\w+\b", text)
return words
def _hash_word(self, word: str) -> int:
"""Hash word to dimension index.
Args:
word: Word to hash
Returns:
Index in range [0, dimension)
"""
hash_bytes = hashlib.md5(word.encode()).digest()
hash_int = int.from_bytes(hash_bytes[:4], byteorder="big")
return hash_int % self.dimension
def _embed_single(self, text: str) -> list[float]:
"""Generate embedding for single text.
Args:
text: Input text
Returns:
Normalized embedding vector
"""
tokens = self._tokenize(text)
if not tokens:
return [0.0] * self.dimension
# Count term frequencies
term_freq = Counter(tokens)
# Initialize vector
vector = [0.0] * self.dimension
# Apply TF weighting with feature hashing
for word, count in term_freq.items():
idx = self._hash_word(word)
# Simple TF weighting: log(1 + count)
vector[idx] += math.log1p(count)
# Normalize to unit length
norm = math.sqrt(sum(x * x for x in vector))
if norm > 0:
vector = [x / norm for x in vector]
return vector
async def embed(self, text: str) -> list[float]:
"""Generate embedding vector for text.
Args:
text: Input text to embed
Returns:
Vector embedding as list of floats
"""
return self._embed_single(text)
async def embed_batch(self, texts: list[str]) -> list[list[float]]:
"""Generate embeddings for multiple texts.
Args:
texts: List of texts to embed
Returns:
List of vector embeddings
"""
return [self._embed_single(text) for text in texts]
def get_dimension(self) -> int:
"""Get embedding dimension.
Returns:
Vector dimension
"""
return self.dimension
async def generate(self, prompt: str, max_tokens: int = 500) -> str:
"""
Generate text from a prompt.
Raises:
NotImplementedError: Simple provider doesn't support text generation
"""
raise NotImplementedError(
"Text generation not supported by Simple provider - use Ollama, Anthropic, or Bedrock"
)
async def close(self) -> None:
"""Close the provider (no-op for simple provider)."""
pass
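The embedding path above (tokenize, hash each word to a bucket, add log-scaled term frequency, L2-normalize) can be sketched as a standalone script. This is a minimal illustration rather than the provider itself: the free functions `hash_word`, `embed_text`, and `cosine`, and the dimension of 64, are assumptions chosen for the example.

```python
import hashlib
import math
import re
from collections import Counter


def hash_word(word: str, dimension: int) -> int:
    """Map a word to a bucket index in [0, dimension) using the first 4 MD5 bytes."""
    digest = hashlib.md5(word.encode()).digest()
    return int.from_bytes(digest[:4], byteorder="big") % dimension


def embed_text(text: str, dimension: int = 64) -> list[float]:
    """Hashed log-TF vector, normalized to unit length (all zeros for empty input)."""
    tokens = re.findall(r"\b\w+\b", text.lower())
    vector = [0.0] * dimension
    for word, count in Counter(tokens).items():
        # Words that collide in the same bucket simply add up
        vector[hash_word(word, dimension)] += math.log1p(count)
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else vector


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    query = embed_text("calendar sync error")
    doc = embed_text("Calendar sync fails with an error after upgrade")
    print(f"similarity: {cosine(query, doc):.3f}")
```

Because buckets are shared, unrelated words can collide, and a smaller dimension makes collisions more likely. The vectors capture word overlap only, not meaning.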
+8 -14
@@ -1,13 +1,11 @@
"""Search algorithms module for unified multi-algorithm search.
"""Search algorithms module for BM25 hybrid search.
This module provides a unified interface for different search algorithms:
- Semantic search (vector similarity)
- Keyword search (token-based matching)
- Fuzzy search (character overlap)
- Hybrid search (RRF fusion of multiple algorithms)
This module provides BM25 hybrid search combining:
- Dense semantic vectors (vector similarity via embeddings)
- Sparse BM25 vectors (keyword-based retrieval)
All algorithms share the same interface and can be used interchangeably by both
MCP tools and the visualization pane.
Results are fused using Qdrant's native Reciprocal Rank Fusion (RRF) for
optimal relevance across both semantic and keyword queries.
"""
from nextcloud_mcp_server.search.algorithms import (
@@ -16,9 +14,7 @@ from nextcloud_mcp_server.search.algorithms import (
SearchResult,
get_indexed_doc_types,
)
from nextcloud_mcp_server.search.fuzzy import FuzzySearchAlgorithm
from nextcloud_mcp_server.search.hybrid import HybridSearchAlgorithm
from nextcloud_mcp_server.search.keyword import KeywordSearchAlgorithm
from nextcloud_mcp_server.search.bm25_hybrid import BM25HybridSearchAlgorithm
from nextcloud_mcp_server.search.semantic import SemanticSearchAlgorithm
__all__ = [
@@ -27,7 +23,5 @@ __all__ = [
"SearchResult",
"get_indexed_doc_types",
"SemanticSearchAlgorithm",
"KeywordSearchAlgorithm",
"FuzzySearchAlgorithm",
"HybridSearchAlgorithm",
"BM25HybridSearchAlgorithm",
]
+17 -4
@@ -127,8 +127,12 @@ class SearchResult:
doc_type: Document type (note, file, calendar, contact, etc.)
title: Document title
excerpt: Content excerpt showing match context
score: Relevance score (0.0-1.0, higher is better)
score: Relevance score (≥ 0.0, higher is better)
- RRF fusion: scores in [0.0, 1.0]
- DBSF fusion: scores can exceed 1.0 (sum of normalized scores)
metadata: Additional algorithm-specific metadata
chunk_start_offset: Character position where chunk starts (None if not available)
chunk_end_offset: Character position where chunk ends (None if not available)
"""
id: int
@@ -137,11 +141,20 @@ class SearchResult:
excerpt: str
score: float
metadata: dict[str, Any] | None = None
chunk_start_offset: int | None = None
chunk_end_offset: int | None = None
def __post_init__(self):
"""Validate score is in valid range."""
if not 0.0 <= self.score <= 1.0:
raise ValueError(f"Score must be between 0.0 and 1.0, got {self.score}")
"""Validate score is non-negative.
Note: Different fusion methods produce different score ranges:
- RRF (Reciprocal Rank Fusion): Bounded to [0.0, 1.0]
- DBSF (Distribution-Based Score Fusion): Unbounded (can exceed 1.0)
DBSF sums normalized scores from multiple systems, so scores can be
1.5, 2.0, etc. when multiple systems agree a document is highly relevant.
"""
if self.score < 0.0:
raise ValueError(f"Score must be non-negative, got {self.score}")
class SearchAlgorithm(ABC):
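The relaxed check in this hunk can be illustrated with a stripped-down dataclass. This is only a sketch: `ScoredHit` is a hypothetical stand-in for `SearchResult` with most fields omitted, and the scores are made up.

```python
from dataclasses import dataclass


@dataclass
class ScoredHit:
    """Hypothetical, reduced stand-in for SearchResult (illustration only)."""

    id: int
    score: float

    def __post_init__(self) -> None:
        # Only a lower bound is enforced, so fused scores above 1.0 are accepted
        if self.score < 0.0:
            raise ValueError(f"Score must be non-negative, got {self.score}")


if __name__ == "__main__":
    print(ScoredHit(id=1, score=0.42))  # within [0.0, 1.0]
    print(ScoredHit(id=2, score=1.5))   # above 1.0, accepted by the new check
    try:
        ScoredHit(id=3, score=-0.1)
    except ValueError as exc:
        print(f"rejected: {exc}")
```

Under the old check the second instance would have raised, which is why the upper bound had to go before DBSF scores could pass through.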
+223
@@ -0,0 +1,223 @@
"""BM25 hybrid search algorithm using Qdrant native RRF fusion."""
import logging
from typing import Any
from qdrant_client import models
from qdrant_client.models import FieldCondition, Filter, MatchValue
from nextcloud_mcp_server.config import get_settings
from nextcloud_mcp_server.embedding import get_bm25_service, get_embedding_service
from nextcloud_mcp_server.observability.metrics import record_qdrant_operation
from nextcloud_mcp_server.search.algorithms import SearchAlgorithm, SearchResult
from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
logger = logging.getLogger(__name__)
class BM25HybridSearchAlgorithm(SearchAlgorithm):
"""
Hybrid search combining dense semantic vectors with BM25 sparse vectors.
Uses Qdrant's native Reciprocal Rank Fusion (RRF) to automatically merge
results from both dense (semantic) and sparse (BM25 keyword) searches.
This provides the best of both worlds: semantic understanding for conceptual
queries and precise keyword matching for specific terms, acronyms, and codes.
The fusion happens efficiently in the database using the prefetch mechanism,
eliminating the need for application-layer result merging.
"""
def __init__(self, score_threshold: float = 0.0, fusion: str = "rrf"):
"""
Initialize BM25 hybrid search algorithm.
Args:
score_threshold: Minimum fusion score (default: 0.0, which keeps every fused result)
Note: RRF scores fall in [0.0, 1.0]; DBSF scores can exceed 1.0
fusion: Fusion algorithm to use: "rrf" (Reciprocal Rank Fusion, default)
or "dbsf" (Distribution-Based Score Fusion)
Raises:
ValueError: If fusion is not "rrf" or "dbsf"
"""
if fusion not in ("rrf", "dbsf"):
raise ValueError(
f"Invalid fusion algorithm '{fusion}'. Must be 'rrf' or 'dbsf'"
)
self.score_threshold = score_threshold
self.fusion = models.Fusion.RRF if fusion == "rrf" else models.Fusion.DBSF
self.fusion_name = fusion
@property
def name(self) -> str:
return "bm25_hybrid"
@property
def requires_vector_db(self) -> bool:
return True
async def search(
self,
query: str,
user_id: str,
limit: int = 10,
doc_type: str | None = None,
**kwargs: Any,
) -> list[SearchResult]:
"""
Execute hybrid search using dense + sparse vectors with Qdrant-native fusion (RRF or DBSF).
Returns unverified results from Qdrant. Access verification should be
performed separately at the final output stage using verify_search_results().
Args:
query: Natural language or keyword search query
user_id: User ID for filtering
limit: Maximum results to return
doc_type: Optional document type filter
**kwargs: Additional parameters (score_threshold override)
Returns:
List of unverified SearchResult objects ranked by fusion score (RRF or DBSF)
Raises:
McpError: If vector sync is not enabled or search fails
"""
settings = get_settings()
score_threshold = kwargs.get("score_threshold", self.score_threshold)
logger.info(
f"BM25 hybrid search: query='{query}', user={user_id}, "
f"limit={limit}, score_threshold={score_threshold}, doc_type={doc_type}, "
f"fusion={self.fusion_name}"
)
# Generate dense embedding for semantic search
embedding_service = get_embedding_service()
dense_embedding = await embedding_service.embed(query)
logger.debug(f"Generated dense embedding (dimension={len(dense_embedding)})")
# Generate sparse embedding for BM25 keyword search
bm25_service = get_bm25_service()
sparse_embedding = bm25_service.encode(query)
logger.debug(
f"Generated sparse embedding "
f"({len(sparse_embedding['indices'])} non-zero terms)"
)
# Build Qdrant filter
filter_conditions = [
FieldCondition(
key="user_id",
match=MatchValue(value=user_id),
)
]
# Add doc_type filter if specified
if doc_type:
filter_conditions.append(
FieldCondition(
key="doc_type",
match=MatchValue(value=doc_type),
)
)
query_filter = Filter(must=filter_conditions)
# Execute hybrid search with Qdrant's native fusion (RRF or DBSF)
qdrant_client = await get_qdrant_client()
try:
# Use prefetch to run both dense and sparse searches
# Qdrant merges the results using the configured fusion (RRF or DBSF)
search_response = await qdrant_client.query_points(
collection_name=settings.get_collection_name(),
prefetch=[
# Dense semantic search
models.Prefetch(
query=dense_embedding,
using="dense",
limit=limit * 2, # Get extra for deduplication
filter=query_filter,
),
# Sparse BM25 search
models.Prefetch(
query=models.SparseVector(
indices=sparse_embedding["indices"],
values=sparse_embedding["values"],
),
using="sparse",
limit=limit * 2, # Get extra for deduplication
filter=query_filter,
),
],
# Fusion query (RRF or DBSF based on initialization)
query=models.FusionQuery(fusion=self.fusion),
limit=limit * 2, # Get extra for deduplication
score_threshold=score_threshold,
with_payload=True,
with_vectors=False, # Don't return vectors to save bandwidth
)
record_qdrant_operation("search", "success")
except Exception:
record_qdrant_operation("search", "error")
raise
logger.info(
f"Qdrant {self.fusion_name.upper()} fusion returned {len(search_response.points)} results "
f"(before deduplication)"
)
if search_response.points:
# Log top 3 fusion scores to help with threshold tuning
top_scores = [p.score for p in search_response.points[:3]]
logger.debug(
f"Top 3 {self.fusion_name.upper()} fusion scores: {top_scores}"
)
# Deduplicate by (doc_id, doc_type) - multiple chunks per document
seen_docs = set()
results = []
for result in search_response.points:
doc_id = int(result.payload["doc_id"])
doc_type = result.payload.get("doc_type", "note")
doc_key = (doc_id, doc_type)
# Skip if we've already seen this document
if doc_key in seen_docs:
continue
seen_docs.add(doc_key)
# Return unverified results (verification happens at output stage)
results.append(
SearchResult(
id=doc_id,
doc_type=doc_type,
title=result.payload.get("title", "Untitled"),
excerpt=result.payload.get("excerpt", ""),
score=result.score, # Fusion score (RRF or DBSF)
metadata={
"chunk_index": result.payload.get("chunk_index"),
"total_chunks": result.payload.get("total_chunks"),
"search_method": f"bm25_hybrid_{self.fusion_name}",
},
chunk_start_offset=result.payload.get("chunk_start_offset"),
chunk_end_offset=result.payload.get("chunk_end_offset"),
)
)
if len(results) >= limit:
break
logger.info(f"Returning {len(results)} unverified results after deduplication")
if results:
result_details = [
f"{r.doc_type}_{r.id} (score={r.score:.3f}, title='{r.title}')"
for r in results[:5] # Show top 5
]
logger.debug(f"Top results: {', '.join(result_details)}")
return results
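The chunk-to-document deduplication at the end of `search()` can be isolated as a small pure function. This is a sketch under assumptions: `dedupe_chunks` is a hypothetical helper, and points are modelled as plain payload dicts instead of Qdrant point objects.

```python
def dedupe_chunks(points: list[dict], limit: int) -> list[dict]:
    """Keep the first (highest-ranked) chunk per (doc_id, doc_type), up to limit."""
    seen: set[tuple[int, str]] = set()
    results: list[dict] = []
    for payload in points:
        key = (int(payload["doc_id"]), payload.get("doc_type", "note"))
        if key in seen:
            continue  # a better-ranked chunk of this document was already kept
        seen.add(key)
        results.append(payload)
        if len(results) >= limit:
            break
    return results


if __name__ == "__main__":
    ranked = [
        {"doc_id": "7", "doc_type": "note", "chunk_index": 0},
        {"doc_id": 7, "doc_type": "note", "chunk_index": 1},  # duplicate of note 7
        {"doc_id": 7, "doc_type": "file", "chunk_index": 0},  # same id, other type
        {"doc_id": 9, "chunk_index": 2},                      # doc_type defaults to "note"
    ]
    for payload in dedupe_chunks(ranked, limit=10):
        print(payload)
```

The `limit * 2` over-fetch does not guarantee `limit` distinct documents. If one document contributes many of the top-ranked chunks, fewer results come back after deduplication.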
-219
@@ -1,219 +0,0 @@
"""Fuzzy search algorithm using character overlap matching on Qdrant payload."""
import logging
from typing import Any
from qdrant_client.models import FieldCondition, Filter, MatchValue
from nextcloud_mcp_server.config import get_settings
from nextcloud_mcp_server.search.algorithms import SearchAlgorithm, SearchResult
from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
logger = logging.getLogger(__name__)
class FuzzySearchAlgorithm(SearchAlgorithm):
"""Fuzzy search using simple character-based similarity.
Implements character overlap matching with configurable threshold:
- Compares character sets between query and text
- Requires configurable % character overlap to match (default: 70%)
- Tolerant to typos and minor variations
"""
def __init__(self, threshold: float = 0.7):
"""Initialize fuzzy search algorithm.
Args:
threshold: Minimum character overlap ratio (0-1, default: 0.7)
"""
if not 0.0 <= threshold <= 1.0:
raise ValueError(f"Threshold must be between 0.0 and 1.0, got {threshold}")
self.threshold = threshold
@property
def name(self) -> str:
return "fuzzy"
async def search(
self,
query: str,
user_id: str,
limit: int = 10,
doc_type: str | None = None,
**kwargs: Any,
) -> list[SearchResult]:
"""Execute fuzzy search using character overlap on Qdrant payload.
Queries Qdrant for all indexed documents, then scores based on character
overlap in title and excerpt fields. Returns unverified results - access
verification should be performed separately at the final output stage.
Args:
query: Search query
user_id: User ID for filtering
limit: Maximum results to return
doc_type: Optional document type filter (None = all types)
**kwargs: Additional parameters (threshold override)
Returns:
List of unverified SearchResult objects ranked by character overlap score
"""
settings = get_settings()
threshold = kwargs.get("threshold", self.threshold)
logger.info(
f"Fuzzy search: query='{query}', user={user_id}, "
f"limit={limit}, threshold={threshold}, doc_type={doc_type}"
)
# Build Qdrant filter
filter_conditions = [
FieldCondition(key="user_id", match=MatchValue(value=user_id))
]
if doc_type:
filter_conditions.append(
FieldCondition(key="doc_type", match=MatchValue(value=doc_type))
)
# Scroll through Qdrant to get all matching documents
qdrant_client = await get_qdrant_client()
collection = settings.get_collection_name()
all_points = []
offset = None
# Scroll through all points matching filter
while True:
scroll_result, next_offset = await qdrant_client.scroll(
collection_name=collection,
scroll_filter=Filter(must=filter_conditions),
limit=100, # Batch size
offset=offset,
with_payload=["doc_id", "doc_type", "title", "excerpt", "chunk_index"],
with_vectors=False, # Don't need vectors
)
all_points.extend(scroll_result)
if next_offset is None:
break
offset = next_offset
logger.debug(f"Retrieved {len(all_points)} points from Qdrant for fuzzy search")
# Deduplicate by (doc_id, doc_type) - keep first chunk
seen_docs = {}
for point in all_points:
doc_id = int(point.payload["doc_id"])
dtype = point.payload.get("doc_type", "note")
doc_key = (doc_id, dtype)
chunk_idx = point.payload.get("chunk_index", 0)
if doc_key not in seen_docs or chunk_idx == 0:
seen_docs[doc_key] = point
logger.debug(f"Deduplicated to {len(seen_docs)} unique documents")
# Score each document based on fuzzy matches
scored_results = []
query_lower = query.lower()
for doc_key, point in seen_docs.items():
doc_id, dtype = doc_key
title = point.payload.get("title", "")
excerpt = point.payload.get("excerpt", "")
# Check title match
title_score = self._calculate_char_overlap(query_lower, title.lower())
# Check excerpt match
excerpt_score = self._calculate_char_overlap(query_lower, excerpt.lower())
# Use best score
best_score = max(title_score, excerpt_score)
if best_score >= threshold:
match_location = "title" if title_score >= excerpt_score else "excerpt"
scored_results.append(
{
"doc_id": doc_id,
"doc_type": dtype,
"title": title,
"excerpt": excerpt
if excerpt_score >= title_score
else f"Title match: {title}",
"score": best_score,
"match_location": match_location,
}
)
# Sort by score (descending) and limit
scored_results.sort(key=lambda x: x["score"], reverse=True)
top_results = scored_results[:limit]
# Return unverified results (verification happens at output stage)
final_results = []
for result in top_results:
final_results.append(
SearchResult(
id=result["doc_id"],
doc_type=result["doc_type"],
title=result["title"],
excerpt=result["excerpt"],
score=result["score"],
metadata={"match_location": result["match_location"]},
)
)
logger.info(f"Fuzzy search returned {len(final_results)} unverified results")
if final_results:
result_details = [
f"{r.doc_type}_{r.id} (score={r.score:.3f}, title='{r.title}')"
for r in final_results[:5]
]
logger.debug(f"Top fuzzy results: {', '.join(result_details)}")
return final_results
def _calculate_char_overlap(self, query: str, text: str) -> float:
"""Calculate character overlap ratio between query and text.
Args:
query: Query string (normalized)
text: Text to compare (normalized)
Returns:
Overlap ratio (0.0-1.0)
"""
if not query or not text:
return 0.0
# Convert to character sets
query_chars = set(query)
text_chars = set(text)
# Calculate overlap
overlap = query_chars & text_chars
overlap_ratio = len(overlap) / len(query_chars)
return overlap_ratio
def _extract_excerpt(self, content: str, max_length: int = 200) -> str:
"""Extract excerpt from content.
Args:
content: Full document content
max_length: Maximum excerpt length
Returns:
Excerpt string
"""
if not content:
return ""
excerpt = content[:max_length].strip()
if len(content) > max_length:
excerpt += "..."
return excerpt
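A quick way to see how the removed character-overlap score behaves is to run it on a few strings. The sketch below copies the logic of `_calculate_char_overlap` into a free function; the name `char_overlap` and the sample strings are assumptions for illustration.

```python
def char_overlap(query: str, text: str) -> float:
    """Fraction of the query's distinct characters that also occur in the text."""
    if not query or not text:
        return 0.0
    query_chars = set(query)
    return len(query_chars & set(text)) / len(query_chars)


if __name__ == "__main__":
    samples = [
        ("listen", "silent"),                       # anagram
        ("cat", "the quick brown fox and a dog"),   # unrelated sentence
        ("abc", "xyz"),                             # no shared characters
    ]
    for query, text in samples:
        print(f"{query!r} vs {text!r}: {char_overlap(query, text):.2f}")
```

Since only character sets are compared, anagrams and any sufficiently long text containing the query's letters reach the maximum score, so the measure has little discriminating power on real excerpts.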
-237
@@ -1,237 +0,0 @@
"""Hybrid search algorithm using Reciprocal Rank Fusion (RRF)."""
import asyncio
import logging
from collections import defaultdict
from typing import Any
from nextcloud_mcp_server.search.algorithms import SearchAlgorithm, SearchResult
from nextcloud_mcp_server.search.fuzzy import FuzzySearchAlgorithm
from nextcloud_mcp_server.search.keyword import KeywordSearchAlgorithm
from nextcloud_mcp_server.search.semantic import SemanticSearchAlgorithm
logger = logging.getLogger(__name__)
class HybridSearchAlgorithm(SearchAlgorithm):
"""Hybrid search combining multiple algorithms using Reciprocal Rank Fusion.
Implements RRF from ADR-003 to combine results from:
- Semantic search (vector similarity)
- Keyword search (token matching)
- Fuzzy search (character overlap)
RRF formula: score = weight / (k + rank)
where k=60 (standard value) and rank is 1-indexed position.
"""
DEFAULT_RRF_K = 60 # Standard RRF constant
def __init__(
self,
semantic_weight: float = 0.5,
keyword_weight: float = 0.3,
fuzzy_weight: float = 0.2,
rrf_k: int = DEFAULT_RRF_K,
):
"""Initialize hybrid search with algorithm weights.
Args:
semantic_weight: Weight for semantic results (default: 0.5)
keyword_weight: Weight for keyword results (default: 0.3)
fuzzy_weight: Weight for fuzzy results (default: 0.2)
rrf_k: RRF constant for rank decay (default: 60)
Raises:
ValueError: If weights are invalid
"""
# Validate weights
if semantic_weight < 0 or keyword_weight < 0 or fuzzy_weight < 0:
raise ValueError("Weights must be non-negative")
total_weight = semantic_weight + keyword_weight + fuzzy_weight
if total_weight > 1.0:
raise ValueError(f"Weights sum to {total_weight:.2f}, must be ≤1.0")
if total_weight == 0.0:
raise ValueError("At least one weight must be > 0")
self.semantic_weight = semantic_weight
self.keyword_weight = keyword_weight
self.fuzzy_weight = fuzzy_weight
self.rrf_k = rrf_k
self.total_weight = total_weight
# Initialize sub-algorithms
self.semantic = SemanticSearchAlgorithm()
self.keyword = KeywordSearchAlgorithm()
self.fuzzy = FuzzySearchAlgorithm()
@property
def name(self) -> str:
return "hybrid"
@property
def requires_vector_db(self) -> bool:
# Requires vector DB if semantic search has non-zero weight
return self.semantic_weight > 0
async def search(
self,
query: str,
user_id: str,
limit: int = 10,
doc_type: str | None = None,
**kwargs: Any,
) -> list[SearchResult]:
"""Execute hybrid search using RRF to combine algorithms.
Returns unverified results from combined algorithms. Access verification
should be performed separately at the final output stage.
Args:
query: Search query
user_id: User ID for filtering
limit: Maximum results to return
doc_type: Optional document type filter
**kwargs: Additional parameters passed to sub-algorithms
Returns:
List of unverified SearchResult objects ranked by RRF combined score
"""
logger.info(
f"Hybrid search: query='{query}', user={user_id}, limit={limit}, "
f"weights=(semantic={self.semantic_weight}, keyword={self.keyword_weight}, "
f"fuzzy={self.fuzzy_weight})"
)
# Run algorithms in parallel
tasks = []
algo_names = []
if self.semantic_weight > 0:
tasks.append(
self.semantic.search(query, user_id, limit * 2, doc_type, **kwargs)
)
algo_names.append("semantic")
if self.keyword_weight > 0:
tasks.append(
self.keyword.search(query, user_id, limit * 2, doc_type, **kwargs)
)
algo_names.append("keyword")
if self.fuzzy_weight > 0:
tasks.append(
self.fuzzy.search(query, user_id, limit * 2, doc_type, **kwargs)
)
algo_names.append("fuzzy")
# Execute searches in parallel
results_list = await asyncio.gather(*tasks)
# Build results dict
algo_results = {}
for algo_name, results in zip(algo_names, results_list):
algo_results[algo_name] = results
logger.debug(f"{algo_name} returned {len(results)} results")
# Combine using RRF
combined_results = self._reciprocal_rank_fusion(
algo_results,
{
"semantic": self.semantic_weight,
"keyword": self.keyword_weight,
"fuzzy": self.fuzzy_weight,
},
limit,
)
logger.info(f"Hybrid search returned {len(combined_results)} combined results")
if combined_results:
result_details = [
f"{r.doc_type}_{r.id} (score={r.score:.3f}, title='{r.title}')"
for r in combined_results[:5]
]
logger.debug(f"Top hybrid results: {', '.join(result_details)}")
return combined_results
def _reciprocal_rank_fusion(
self,
algo_results: dict[str, list[SearchResult]],
weights: dict[str, float],
limit: int,
) -> list[SearchResult]:
"""Combine multiple ranked result lists using RRF.
Args:
algo_results: Dict of algorithm_name -> ranked results
weights: Dict of algorithm_name -> weight (0-1)
limit: Maximum results to return
Returns:
Combined and re-ranked results
"""
# Track RRF scores per document
rrf_scores: dict[tuple[int, str], float] = defaultdict(float)
# Track best result object for each document
best_results: dict[tuple[int, str], SearchResult] = {}
for algo_name, results in algo_results.items():
weight = weights.get(algo_name, 0.0)
if weight == 0:
continue
for rank, result in enumerate(results, start=1):
doc_key = (result.id, result.doc_type)
# RRF formula: weight / (k + rank)
rrf_score = weight / (self.rrf_k + rank)
rrf_scores[doc_key] += rrf_score
# Track best result object (prefer higher original scores)
if doc_key not in best_results:
best_results[doc_key] = result
elif result.score > best_results[doc_key].score:
best_results[doc_key] = result
# Sort by combined RRF score
sorted_docs = sorted(
rrf_scores.items(),
key=lambda x: x[1],
reverse=True,
)[:limit]
# Calculate normalization factor to scale RRF scores to 0-1 range
# Theoretical max RRF score = total_weight / (rrf_k + 1)
# Normalization factor = (rrf_k + 1) / total_weight
normalization_factor = (self.rrf_k + 1) / self.total_weight
# Build final results with normalized RRF scores
final_results = []
for doc_key, rrf_score in sorted_docs:
result = best_results[doc_key]
# Normalize RRF score to 0-1 range for better user comprehension
normalized_score = rrf_score * normalization_factor
# Create new result with normalized score
# Keep original metadata but add RRF details
metadata = result.metadata or {}
metadata["rrf_score_raw"] = rrf_score # Original RRF score
metadata["original_score"] = result.score # Original algorithm score
metadata["normalization_factor"] = normalization_factor
final_results.append(
SearchResult(
id=result.id,
doc_type=result.doc_type,
title=result.title,
excerpt=result.excerpt,
score=normalized_score, # Use normalized score (0-1 range)
metadata=metadata,
)
)
return final_results
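The weighted RRF formula used by the removed class, `weight / (k + rank)` summed per document and scaled by `(k + 1) / total_weight`, can be worked through on a toy input. This is a sketch only: `weighted_rrf` is a hypothetical helper that operates on bare document IDs, and the rankings are made up.

```python
from collections import defaultdict


def weighted_rrf(
    rankings: dict[str, list[str]],
    weights: dict[str, float],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse ranked ID lists; a document ranked first everywhere scores about 1.0."""
    scores: dict[str, float] = defaultdict(float)
    for name, ranked_ids in rankings.items():
        weight = weights.get(name, 0.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):  # ranks are 1-indexed
            scores[doc_id] += weight / (k + rank)
    # The theoretical maximum is total_weight / (k + 1); divide by it to normalize
    factor = (k + 1) / sum(weights.values())
    fused = [(doc_id, score * factor) for doc_id, score in scores.items()]
    return sorted(fused, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    rankings = {
        "semantic": ["a", "b"],
        "keyword": ["a", "c"],
        "fuzzy": ["a"],
    }
    weights = {"semantic": 0.5, "keyword": 0.3, "fuzzy": 0.2}
    for doc_id, score in weighted_rrf(rankings, weights):
        print(f"{doc_id}: {score:.3f}")
```

Only ranks enter the formula, so the original per-algorithm scores have no influence on the fused order. A document missing from one list just contributes nothing for that algorithm.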
-277
@@ -1,277 +0,0 @@
"""Keyword search algorithm using token-based matching on Qdrant payload (ADR-001)."""
import logging
from typing import Any
from qdrant_client.models import FieldCondition, Filter, MatchValue
from nextcloud_mcp_server.config import get_settings
from nextcloud_mcp_server.search.algorithms import SearchAlgorithm, SearchResult
from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
logger = logging.getLogger(__name__)
class KeywordSearchAlgorithm(SearchAlgorithm):
"""Keyword search using token-based matching with weighted scoring.
Implements token-based search from ADR-001:
- Title matches weighted 3x higher than content matches
- Case-insensitive token matching
- Relevance scoring based on match frequency and location
"""
# Weighting constants from ADR-001
TITLE_WEIGHT = 3.0
CONTENT_WEIGHT = 1.0
@property
def name(self) -> str:
return "keyword"
async def search(
self,
query: str,
user_id: str,
limit: int = 10,
doc_type: str | None = None,
**kwargs: Any,
) -> list[SearchResult]:
"""Execute keyword search using token matching on Qdrant payload.
Queries Qdrant for all indexed documents, then scores based on token
matches in title and excerpt fields. Returns unverified results - access
verification should be performed separately at the final output stage.
Args:
query: Search query to tokenize and match
user_id: User ID for filtering
limit: Maximum results to return
doc_type: Optional document type filter (None = all types)
**kwargs: Additional parameters (unused)
Returns:
List of unverified SearchResult objects ranked by keyword match score
"""
settings = get_settings()
logger.info(
f"Keyword search: query='{query}', user={user_id}, "
f"limit={limit}, doc_type={doc_type}"
)
# Tokenize query
query_tokens = self._process_query(query)
logger.debug(f"Query tokens: {query_tokens}")
# Build Qdrant filter
filter_conditions = [
FieldCondition(key="user_id", match=MatchValue(value=user_id))
]
if doc_type:
filter_conditions.append(
FieldCondition(key="doc_type", match=MatchValue(value=doc_type))
)
# Scroll through Qdrant to get all matching documents
# We need title and excerpt from payload for token matching
qdrant_client = await get_qdrant_client()
collection = settings.get_collection_name()
all_points = []
offset = None
# Scroll through all points matching filter
while True:
scroll_result, next_offset = await qdrant_client.scroll(
collection_name=collection,
scroll_filter=Filter(must=filter_conditions),
limit=100, # Batch size
offset=offset,
with_payload=[
"doc_id",
"doc_type",
"title",
"excerpt",
"chunk_index",
"total_chunks",
],
with_vectors=False, # Don't need vectors for keyword search
)
all_points.extend(scroll_result)
if next_offset is None:
break
offset = next_offset
logger.debug(
f"Retrieved {len(all_points)} points from Qdrant for keyword search"
)
# Deduplicate by (doc_id, doc_type) - keep best chunk per document
seen_docs = {}
for point in all_points:
doc_id = int(point.payload["doc_id"])
dtype = point.payload.get("doc_type", "note")
doc_key = (doc_id, dtype)
# Keep first chunk (chunk_index=0) as it has the most relevant content
chunk_idx = point.payload.get("chunk_index", 0)
if doc_key not in seen_docs or chunk_idx == 0:
seen_docs[doc_key] = point
logger.debug(f"Deduplicated to {len(seen_docs)} unique documents")
# Score each document based on keyword matches
scored_results = []
for doc_key, point in seen_docs.items():
doc_id, dtype = doc_key
title = point.payload.get("title", "")
excerpt = point.payload.get("excerpt", "")
# Calculate keyword match score
score = self._calculate_score(query_tokens, title, excerpt)
if score > 0: # Only include matches
scored_results.append(
{
"doc_id": doc_id,
"doc_type": dtype,
"title": title,
"excerpt": excerpt,
"score": score,
}
)
# Sort by score (descending) and limit
scored_results.sort(key=lambda x: x["score"], reverse=True)
top_results = scored_results[:limit]
# Return unverified results (verification happens at output stage)
final_results = []
for result in top_results:
final_results.append(
SearchResult(
id=result["doc_id"],
doc_type=result["doc_type"],
title=result["title"],
excerpt=result["excerpt"],
score=result["score"],
metadata={},
)
)
logger.info(f"Keyword search returned {len(final_results)} unverified results")
if final_results:
result_details = [
f"{r.doc_type}_{r.id} (score={r.score:.3f}, title='{r.title}')"
for r in final_results[:5]
]
logger.debug(f"Top keyword results: {', '.join(result_details)}")
return final_results
def _process_query(self, query: str) -> list[str]:
"""Tokenize and normalize query.
Args:
query: Raw query string
Returns:
List of normalized tokens
"""
# Convert to lowercase and split into tokens
tokens = query.lower().split()
# Filter out very short tokens (optional)
tokens = [token for token in tokens if len(token) > 1]
return tokens
def _calculate_score(
self, query_tokens: list[str], title: str, content: str
) -> float:
"""Calculate relevance score based on token matches.
Args:
query_tokens: List of query tokens
title: Document title
content: Document content
Returns:
Relevance score (0.0-1.0)
"""
if not query_tokens:
return 0.0
# Process title and content
title_tokens = title.lower().split()
content_tokens = content.lower().split()
score = 0.0
# Count matches in title
title_matches = sum(1 for qt in query_tokens if qt in title_tokens)
if query_tokens: # Avoid division by zero
title_match_ratio = title_matches / len(query_tokens)
score += self.TITLE_WEIGHT * title_match_ratio
# Count matches in content
content_matches = sum(1 for qt in query_tokens if qt in content_tokens)
if query_tokens:
content_match_ratio = content_matches / len(query_tokens)
score += self.CONTENT_WEIGHT * content_match_ratio
# Normalize score to 0-1 range
# Max score would be TITLE_WEIGHT + CONTENT_WEIGHT if all tokens match everywhere
max_score = self.TITLE_WEIGHT + self.CONTENT_WEIGHT
normalized_score = min(score / max_score, 1.0)
return normalized_score
def _extract_excerpt(
self, content: str, query_tokens: list[str], max_length: int = 200
) -> str:
"""Extract excerpt showing match context.
Args:
content: Full document content
query_tokens: Query tokens to find
max_length: Maximum excerpt length in characters
Returns:
Excerpt string with context around matches
"""
if not content:
return ""
content_lower = content.lower()
# Find first occurrence of any query token
first_match_pos = -1
for token in query_tokens:
pos = content_lower.find(token)
if pos != -1:
if first_match_pos == -1 or pos < first_match_pos:
first_match_pos = pos
if first_match_pos == -1:
# No matches found, return beginning
return content[:max_length].strip() + (
"..." if len(content) > max_length else ""
)
# Extract context around match
start = max(0, first_match_pos - max_length // 2)
end = min(len(content), first_match_pos + max_length // 2)
excerpt = content[start:end].strip()
# Add ellipsis if truncated
if start > 0:
excerpt = "..." + excerpt
if end < len(content):
excerpt = excerpt + "..."
return excerpt
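The scoring rule of the removed keyword algorithm (title matches weighted 3x, content 1x, normalized by the maximum of 4.0) can be traced on a small input. The sketch below folds `_process_query` and `_calculate_score` into one hypothetical function, `keyword_score`; the sample strings are assumptions.

```python
TITLE_WEIGHT = 3.0
CONTENT_WEIGHT = 1.0


def keyword_score(query: str, title: str, content: str) -> float:
    """Share of query tokens found in title and content, weighted and scaled to 0-1."""
    tokens = [t for t in query.lower().split() if len(t) > 1]
    if not tokens:
        return 0.0
    title_tokens = title.lower().split()
    content_tokens = content.lower().split()
    title_ratio = sum(t in title_tokens for t in tokens) / len(tokens)
    content_ratio = sum(t in content_tokens for t in tokens) / len(tokens)
    raw = TITLE_WEIGHT * title_ratio + CONTENT_WEIGHT * content_ratio
    return min(raw / (TITLE_WEIGHT + CONTENT_WEIGHT), 1.0)


if __name__ == "__main__":
    # "meeting" matches the title (1 of 2 tokens), both tokens match the content:
    # (3.0 * 0.5 + 1.0 * 1.0) / 4.0
    print(keyword_score("meeting notes", "Meeting agenda", "notes from the meeting"))
    # Whitespace splitting keeps punctuation attached, so "notes." is not "notes"
    print(keyword_score("notes", "Notes.", ""))
```

Matching is exact per whitespace-separated token, and every matching term counts the same no matter how rare it is. BM25 adds inverse document frequency and length normalization on top of term matching.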
+3
@@ -101,6 +101,7 @@ class SemanticSearchAlgorithm(SearchAlgorithm):
search_response = await qdrant_client.query_points(
collection_name=settings.get_collection_name(),
query=query_embedding,
using="dense", # Use named dense vector (BM25 hybrid collections)
query_filter=Filter(must=filter_conditions),
limit=limit * 2, # Get extra for deduplication
score_threshold=score_threshold,
@@ -149,6 +150,8 @@ class SemanticSearchAlgorithm(SearchAlgorithm):
"chunk_index": result.payload.get("chunk_index"),
"total_chunks": result.payload.get("total_chunks"),
},
chunk_start_offset=result.payload.get("chunk_start_offset"),
chunk_end_offset=result.payload.get("chunk_end_offset"),
)
)
-122
@@ -1,122 +0,0 @@
"""Access verification for search results.
This module provides centralized verification of Nextcloud access permissions
for search results. Verification happens at the final output stage (MCP tool/viz endpoint)
rather than within individual search algorithms, preventing redundant API calls.
Key benefits:
- Deduplication: Each document verified exactly once (even in hybrid mode)
- Parallel execution: All verifications run concurrently via anyio task groups
- Separation of concerns: Algorithms handle scoring, this module handles security
"""
import logging
from dataclasses import replace
from typing import Protocol

import anyio

from nextcloud_mcp_server.search.algorithms import SearchResult

logger = logging.getLogger(__name__)


class NextcloudClientProtocol(Protocol):
    """Protocol for Nextcloud client with app-specific access."""

    @property
    def notes(self):
        """Notes client for accessing notes API."""
        ...


async def verify_search_results(
    results: list[SearchResult],
    nextcloud_client: NextcloudClientProtocol,
) -> list[SearchResult]:
    """
    Verify Nextcloud access for search results.

    Deduplicates by (doc_id, doc_type), verifies in parallel using anyio task groups,
    and filters out inaccessible documents. Maintains original result ordering.

    Args:
        results: Unverified search results from Qdrant
        nextcloud_client: Nextcloud client for access checks

    Returns:
        Verified and accessible results (same order as input)

    Example:
        >>> unverified = await search_algo.search(query="test", limit=10)
        >>> verified = await verify_search_results(unverified, client)
        >>> # verified contains only documents user can access
    """
    # Deduplicate by (doc_id, doc_type) while preserving order
    # This is critical for hybrid search where same doc may appear in multiple algorithm results
    seen = set()
    unique_results = []
    for result in results:
        key = (result.id, result.doc_type)
        if key not in seen:
            seen.add(key)
            unique_results.append(result)

    if not unique_results:
        return []

    logger.debug(
        f"Verifying access for {len(unique_results)} unique documents "
        f"(from {len(results)} total results)"
    )

    # Verify all unique documents in parallel using anyio task group
    # Use list to maintain order (index-based storage)
    verified_results = [None] * len(unique_results)

    async def verify_one(index: int, result: SearchResult):
        """
        Verify a single document and store result at index.

        Args:
            index: Position in verified_results list
            result: Search result to verify
        """
        try:
            if result.doc_type == "note":
                # Fetch note to verify access and get fresh metadata
                note = await nextcloud_client.notes.get_note(result.id)
                # Update metadata with fresh data from Nextcloud
                updated_metadata = {**(result.metadata or {}), **note}
                verified_results[index] = replace(result, metadata=updated_metadata)
            # TODO: Add verification for other doc types (calendar, deck, file, etc.)
            else:
                # For now, assume other types are accessible
                # In production, add proper verification for each type
                logger.debug(
                    f"No verification implemented for doc_type={result.doc_type}, "
                    "assuming accessible"
                )
                verified_results[index] = result
        except Exception as e:
            # Document is inaccessible (403, 404, or other error)
            # Log at debug level since this is expected for filtered results
            logger.debug(f"Document {result.doc_type}/{result.id} not accessible: {e}")
            verified_results[index] = None

    # Run all verifications in parallel using anyio task group
    # This provides structured concurrency with automatic cancellation on errors
    async with anyio.create_task_group() as tg:
        for idx, result in enumerate(unique_results):
            tg.start_soon(verify_one, idx, result)

    # Filter out None (inaccessible) and return verified results
    accessible = [r for r in verified_results if r is not None]
    logger.debug(
        f"Verification complete: {len(accessible)} accessible, "
        f"{len(unique_results) - len(accessible)} filtered out"
    )
    return accessible
+100 -94
@@ -1,8 +1,8 @@
"""Semantic search MCP tools using vector database."""
import logging
from typing import Literal
import anyio
from httpx import RequestError
from mcp.server.fastmcp import Context, FastMCP
from mcp.shared.exceptions import McpError
@@ -25,12 +25,7 @@ from nextcloud_mcp_server.models.semantic import (
from nextcloud_mcp_server.observability.metrics import (
instrument_tool,
)
from nextcloud_mcp_server.search import (
FuzzySearchAlgorithm,
HybridSearchAlgorithm,
KeywordSearchAlgorithm,
SemanticSearchAlgorithm,
)
from nextcloud_mcp_server.search.bm25_hybrid import BM25HybridSearchAlgorithm
logger = logging.getLogger(__name__)
@@ -46,36 +41,34 @@ def configure_semantic_tools(mcp: FastMCP):
ctx: Context,
limit: int = 10,
doc_types: list[str] | None = None,
score_threshold: float = 0.7,
algorithm: Literal["semantic", "keyword", "fuzzy", "hybrid"] = "hybrid",
semantic_weight: float = 0.5,
keyword_weight: float = 0.3,
fuzzy_weight: float = 0.2,
score_threshold: float = 0.0,
fusion: str = "rrf",
) -> SemanticSearchResponse:
"""
Search Nextcloud content using configurable algorithms with cross-app support.
Search Nextcloud content using BM25 hybrid search with cross-app support.
Supports multiple search algorithms with client-configurable weighting:
- semantic: Vector similarity search (requires VECTOR_SYNC_ENABLED=true)
- keyword: Token-based matching (title matches weighted 3x)
- fuzzy: Character overlap matching (typo-tolerant)
- hybrid: Combines all algorithms using Reciprocal Rank Fusion (default)
Uses Qdrant's native hybrid search combining:
- Dense semantic vectors: For conceptual similarity and natural language queries
- BM25 sparse vectors: For precise keyword matching, acronyms, and specific terms
Document types are queried from the vector database to determine what's
actually indexed. Currently only "note" documents are fully supported.
Results are automatically fused using the selected fusion algorithm in the
database for optimal relevance. This provides the best of both semantic
understanding and keyword precision.
Requires VECTOR_SYNC_ENABLED=true. Currently only "note" documents are
fully supported for indexing.
Args:
query: Natural language search query
query: Natural language or keyword search query
limit: Maximum number of results to return (default: 10)
doc_types: Document types to search (e.g., ["note", "file"]). None = search all indexed types (default)
score_threshold: Minimum similarity score for semantic/hybrid (0-1, default: 0.7)
algorithm: Search algorithm to use (default: "hybrid")
semantic_weight: Weight for semantic results in hybrid mode (default: 0.5)
keyword_weight: Weight for keyword results in hybrid mode (default: 0.3)
fuzzy_weight: Weight for fuzzy results in hybrid mode (default: 0.2)
score_threshold: Minimum fusion score (0-1, default: 0.0)
fusion: Fusion algorithm: "rrf" (Reciprocal Rank Fusion, default) or "dbsf" (Distribution-Based Score Fusion)
RRF: Good general-purpose fusion using reciprocal ranks
DBSF: Uses distribution-based normalization, may better balance different score ranges
Returns:
SemanticSearchResponse with matching documents and relevance scores
SemanticSearchResponse with matching documents ranked by fusion scores
"""
from nextcloud_mcp_server.config import get_settings
@@ -84,42 +77,24 @@ def configure_semantic_tools(mcp: FastMCP):
username = client.username
logger.info(
f"Search: query='{query}', user={username}, algorithm={algorithm}, "
f"limit={limit}, score_threshold={score_threshold}"
f"BM25 hybrid search: query='{query}', user={username}, "
f"limit={limit}, score_threshold={score_threshold}, fusion={fusion}"
)
# Check that vector sync is enabled
if not settings.vector_sync_enabled:
raise McpError(
ErrorData(
code=-1,
message="BM25 hybrid search requires VECTOR_SYNC_ENABLED=true",
)
)
try:
# Create appropriate algorithm instance
if algorithm == "semantic":
if not settings.vector_sync_enabled:
raise McpError(
ErrorData(
code=-1,
message="Semantic search requires VECTOR_SYNC_ENABLED=true",
)
)
search_algo = SemanticSearchAlgorithm(score_threshold=score_threshold)
elif algorithm == "keyword":
search_algo = KeywordSearchAlgorithm()
elif algorithm == "fuzzy":
search_algo = FuzzySearchAlgorithm()
elif algorithm == "hybrid":
if semantic_weight > 0 and not settings.vector_sync_enabled:
raise McpError(
ErrorData(
code=-1,
message="Hybrid search with semantic component requires VECTOR_SYNC_ENABLED=true",
)
)
search_algo = HybridSearchAlgorithm(
semantic_weight=semantic_weight,
keyword_weight=keyword_weight,
fuzzy_weight=fuzzy_weight,
)
else:
raise McpError(
ErrorData(code=-1, message=f"Unknown algorithm: {algorithm}")
)
# Create BM25 hybrid search algorithm with specified fusion
search_algo = BM25HybridSearchAlgorithm(
score_threshold=score_threshold, fusion=fusion
)
# Execute search across requested document types
# If doc_types is None, search all indexed types (cross-app search)
@@ -153,11 +128,18 @@ def configure_semantic_tools(mcp: FastMCP):
# Sort combined results by score
all_results.sort(key=lambda r: r.score, reverse=True)
# Verify access for all results (deduplicates and filters)
from nextcloud_mcp_server.search.verification import verify_search_results
# Deduplicate results (hybrid search may return same doc from dense + sparse)
# Qdrant already filters by user_id for multi-tenant isolation
# Sampling tool will verify access when fetching full content
seen = set()
unique_results = []
for result in all_results:
key = (result.id, result.doc_type)
if key not in seen:
seen.add(key)
unique_results.append(result)
verified_results = await verify_search_results(all_results, client)
search_results = verified_results[:limit] # Final limit after verification
search_results = unique_results[:limit] # Final limit after deduplication
# Convert SearchResult objects to SemanticSearchResult for response
results = []
@@ -176,16 +158,18 @@ def configure_semantic_tools(mcp: FastMCP):
total_chunks=r.metadata.get("total_chunks", 1)
if r.metadata
else 1,
chunk_start_offset=r.chunk_start_offset,
chunk_end_offset=r.chunk_end_offset,
)
)
logger.info(f"Returning {len(results)} results from {algorithm} search")
logger.info(f"Returning {len(results)} results from BM25 hybrid search")
return SemanticSearchResponse(
results=results,
query=query,
total_found=len(results),
search_method=algorithm,
search_method=f"bm25_hybrid_{fusion}",
)
except ValueError as e:
@@ -217,6 +201,7 @@ def configure_semantic_tools(mcp: FastMCP):
limit: int = 5,
score_threshold: float = 0.7,
max_answer_tokens: int = 500,
fusion: str = "rrf",
) -> SamplingSearchResponse:
"""
Semantic search with LLM-generated answer using MCP sampling.
@@ -241,6 +226,7 @@ def configure_semantic_tools(mcp: FastMCP):
limit: Maximum number of documents to retrieve (default: 5)
score_threshold: Minimum similarity score 0-1 (default: 0.7)
max_answer_tokens: Maximum tokens for generated answer (default: 500)
fusion: Fusion algorithm: "rrf" (Reciprocal Rank Fusion, default) or "dbsf" (Distribution-Based Score Fusion)
Returns:
SamplingSearchResponse containing:
@@ -280,6 +266,7 @@ def configure_semantic_tools(mcp: FastMCP):
ctx=ctx,
limit=limit,
score_threshold=score_threshold,
fusion=fusion,
)
# 2. Handle no results case - don't waste a sampling call
@@ -334,35 +321,55 @@ def configure_semantic_tools(mcp: FastMCP):
success=True,
)
# 4. Fetch full content for notes to provide complete context to LLM
# Filter out inaccessible notes (deleted or permissions changed)
# 4. Fetch full content for notes in parallel (also verifies access)
# Use anyio task group for concurrent fetching with semaphore to prevent
# connection pool exhaustion
client = await get_client(ctx)
accessible_results = []
full_contents = [] # Full content for accessible notes
accessible_results = [None] * len(search_response.results)
full_contents = [None] * len(search_response.results)
for result in search_response.results:
if result.doc_type == "note":
try:
note = await client.notes.get_note(result.id)
# Note is accessible, store full content
accessible_results.append(result)
full_contents.append(note.get("content", ""))
logger.debug(
f"Fetched full content for note {result.id} "
f"(length: {len(full_contents[-1])} chars)"
)
except Exception as e:
# Note might have been deleted or permissions changed
# Filter it out to avoid corrupting LLM with inaccessible data
logger.warning(
f"Failed to fetch full content for note {result.id}: {e}. "
f"Excluding from results."
)
else:
# Non-note document types (future: calendar, deck, files)
# For now, keep them with excerpts
accessible_results.append(result)
full_contents.append(None)
# Limit concurrent requests to prevent connection pool exhaustion
max_concurrent = 20
semaphore = anyio.Semaphore(max_concurrent)
async def fetch_content(index: int, result: SemanticSearchResult):
"""Fetch full content for a single document (parallel with semaphore)."""
async with semaphore:
if result.doc_type == "note":
try:
note = await client.notes.get_note(result.id)
# Note is accessible, store result and full content
content = note.get("content", "")
accessible_results[index] = result
full_contents[index] = content
logger.debug(
f"Fetched full content for note {result.id} "
f"(length: {len(content)} chars)"
)
except Exception as e:
# Note might have been deleted or permissions changed
# Leave as None to filter out later
logger.debug(
f"Note {result.id} not accessible: {e}. "
f"Excluding from results."
)
else:
# Non-note document types (future: calendar, deck, files)
# For now, keep them with excerpts
accessible_results[index] = result
# full_contents[index] remains None (will use excerpt)
# Run all fetches in parallel using anyio task group
async with anyio.create_task_group() as tg:
for idx, result in enumerate(search_response.results):
tg.start_soon(fetch_content, idx, result)
# Filter out None (inaccessible notes) while preserving order
final_pairs = [
(r, c) for r, c in zip(accessible_results, full_contents) if r is not None
]
accessible_results = [r for r, c in final_pairs]
full_contents = [c for r, c in final_pairs]
# Check if we filtered out all results
if not accessible_results:
@@ -414,7 +421,6 @@ def configure_semantic_tools(mcp: FastMCP):
)
# 6. Request LLM completion via MCP sampling with timeout
import anyio
try:
with anyio.fail_after(30):
+64 -25
@@ -1,51 +1,90 @@
"""Document chunking for large texts."""
"""Document chunking for large texts using LangChain text splitters."""
import logging
from dataclasses import dataclass
from langchain_text_splitters import RecursiveCharacterTextSplitter
logger = logging.getLogger(__name__)
class DocumentChunker:
"""Chunk large documents for optimal embedding."""
@dataclass
class ChunkWithPosition:
"""A text chunk with its character position in the original document."""
def __init__(self, chunk_size: int = 512, overlap: int = 50):
text: str
start_offset: int # Character position where chunk starts
end_offset: int # Character position where chunk ends (exclusive)
class DocumentChunker:
"""Chunk large documents for optimal embedding using LangChain text splitters.
Uses RecursiveCharacterTextSplitter, which tries to preserve semantic
boundaries by splitting on paragraph and line breaks, then on spaces,
before resorting to character-level splitting.
"""
def __init__(self, chunk_size: int = 2048, overlap: int = 200):
"""
Initialize document chunker.
Args:
chunk_size: Number of words per chunk (default: 512)
overlap: Number of overlapping words between chunks (default: 50)
chunk_size: Number of characters per chunk (default: 2048)
overlap: Number of overlapping characters between chunks (default: 200)
"""
self.chunk_size = chunk_size
self.overlap = overlap
def chunk_text(self, content: str) -> list[str]:
"""
Split text into overlapping chunks.
# Initialize LangChain RecursiveCharacterTextSplitter
# Uses hierarchical splitting with the splitter's default separators,
# tried in order:
# - Paragraphs (\n\n)
# - Lines (\n)
# - Words (spaces)
# - Characters (last resort)
# This favors paragraph and line boundaries; a sentence can still be split
# when a single paragraph is longer than chunk_size
self.splitter = RecursiveCharacterTextSplitter(
chunk_size=chunk_size,
chunk_overlap=overlap,
add_start_index=True, # Enable position tracking
strip_whitespace=True,
)
Uses simple word-based chunking with configurable overlap to preserve
context across chunk boundaries.
def chunk_text(self, content: str) -> list[ChunkWithPosition]:
"""
Split text into overlapping chunks with position tracking.
Uses LangChain's RecursiveCharacterTextSplitter to create chunks that
respect natural boundaries by splitting at paragraph and line breaks
before resorting to word or character-level splitting. Sentences stay
intact where possible, but a long paragraph can still be split
mid-sentence. Preserves character positions for each chunk to enable
precise document retrieval.
Args:
content: Text content to chunk
Returns:
List of text chunks (may be single item if content is small)
List of chunks with their character positions in the original content
"""
# Simple word-based chunking
words = content.split()
# Handle empty content - return single empty chunk for backward compatibility
if not content:
return [ChunkWithPosition(text="", start_offset=0, end_offset=0)]
if len(words) <= self.chunk_size:
return [content]
# Use LangChain to create documents with position tracking
docs = self.splitter.create_documents([content])
chunks = []
start = 0
# Convert LangChain Documents to ChunkWithPosition objects
chunks = [
ChunkWithPosition(
text=doc.page_content,
start_offset=doc.metadata.get("start_index", 0),
end_offset=doc.metadata.get("start_index", 0) + len(doc.page_content),
)
for doc in docs
]
while start < len(words):
end = start + self.chunk_size
chunk_words = words[start:end]
chunks.append(" ".join(chunk_words))
start = end - self.overlap
logger.debug(f"Chunked document into {len(chunks)} chunks ({len(words)} words)")
logger.debug(
f"Chunked document into {len(chunks)} chunks "
f"(chunk_size={self.chunk_size}, overlap={self.overlap})"
)
return chunks
+28 -6
@@ -8,13 +8,14 @@ import time
import uuid
import anyio
from anyio.abc import TaskStatus
from anyio.streams.memory import MemoryObjectReceiveStream
from httpx import HTTPStatusError
from qdrant_client.models import FieldCondition, Filter, MatchValue, PointStruct
from nextcloud_mcp_server.client import NextcloudClient
from nextcloud_mcp_server.config import get_settings
from nextcloud_mcp_server.embedding import get_embedding_service
from nextcloud_mcp_server.embedding import get_bm25_service, get_embedding_service
from nextcloud_mcp_server.observability.metrics import (
record_qdrant_operation,
record_vector_sync_processing,
@@ -34,6 +35,8 @@ async def processor_task(
shutdown_event: anyio.Event,
nc_client: NextcloudClient,
user_id: str,
*,
task_status: TaskStatus = anyio.TASK_STATUS_IGNORED,
):
"""
Process documents from stream concurrently.
@@ -53,9 +56,13 @@ async def processor_task(
shutdown_event: Event signaling shutdown
nc_client: Authenticated Nextcloud client
user_id: User being processed
task_status: Status object for signaling task readiness
"""
logger.info(f"Processor {worker_id} started")
# Signal that the task has started and is ready
task_status.started()
while not shutdown_event.is_set():
try:
# Get document with timeout (allows checking shutdown)
@@ -226,15 +233,24 @@ async def _index_document(
)
chunks = chunker.chunk_text(content)
# Generate embeddings (I/O bound - external API call)
# Extract chunk texts for embedding
chunk_texts = [chunk.text for chunk in chunks]
# Generate dense embeddings (I/O bound - external API call)
embedding_service = get_embedding_service()
embeddings = await embedding_service.embed_batch(chunks)
dense_embeddings = await embedding_service.embed_batch(chunk_texts)
# Generate sparse embeddings (BM25 for keyword matching)
bm25_service = get_bm25_service()
sparse_embeddings = bm25_service.encode_batch(chunk_texts)
# Prepare Qdrant points
indexed_at = int(time.time())
points = []
for i, (chunk, embedding) in enumerate(zip(chunks, embeddings)):
for i, (chunk, dense_emb, sparse_emb) in enumerate(
zip(chunks, dense_embeddings, sparse_embeddings)
):
# Generate deterministic UUID for point ID
# Using uuid5 with DNS namespace and combining doc info
point_name = f"{doc_task.doc_type}:{doc_task.doc_id}:chunk:{i}"
@@ -243,18 +259,24 @@ async def _index_document(
points.append(
PointStruct(
id=point_id,
vector=embedding,
vector={
"dense": dense_emb,
"sparse": sparse_emb,
},
payload={
"user_id": doc_task.user_id,
"doc_id": doc_task.doc_id,
"doc_type": doc_task.doc_type,
"title": title,
"excerpt": chunk[:200],
"excerpt": chunk.text[:200],
"indexed_at": indexed_at,
"modified_at": doc_task.modified_at,
"etag": etag,
"chunk_index": i,
"total_chunks": len(chunks),
"chunk_start_offset": chunk.start_offset,
"chunk_end_offset": chunk.end_offset,
"metadata_version": 2, # v2 includes position metadata
},
)
)
+24 -9
@@ -2,7 +2,7 @@
import logging
from qdrant_client import AsyncQdrantClient
from qdrant_client import AsyncQdrantClient, models
from qdrant_client.models import Distance, VectorParams
from nextcloud_mcp_server.config import get_settings
@@ -84,7 +84,12 @@ async def get_qdrant_client() -> AsyncQdrantClient:
f"Collection '{collection_name}' found, validating dimensions..."
)
collection_info = await _qdrant_client.get_collection(collection_name)
actual_dimension = collection_info.config.params.vectors.size
# Handle both named vectors (dict) and legacy single vector
vectors = collection_info.config.params.vectors
if isinstance(vectors, dict):
actual_dimension = vectors["dense"].size
else:
actual_dimension = vectors.size
# Validate dimension matches
if actual_dimension != expected_dimension:
@@ -112,17 +117,27 @@ async def get_qdrant_client() -> AsyncQdrantClient:
)
await _qdrant_client.create_collection(
collection_name=collection_name,
vectors_config=VectorParams(
size=expected_dimension,
distance=Distance.COSINE,
),
vectors_config={
"dense": VectorParams(
size=expected_dimension,
distance=Distance.COSINE,
),
},
sparse_vectors_config={
"sparse": models.SparseVectorParams(
index=models.SparseIndexParams(
on_disk=False,
)
),
},
)
logger.info(
f"Created Qdrant collection: {collection_name}\n"
f" Dimension: {expected_dimension}\n"
f" Model: {settings.ollama_embedding_model}\n"
f" Dense vector dimension: {expected_dimension}\n"
f" Dense embedding model: {settings.ollama_embedding_model}\n"
f" Sparse vectors: BM25 (for hybrid search)\n"
f" Distance: COSINE\n"
f"Background sync will index all documents with this embedding model."
f"Background sync will index all documents with dense + sparse vectors."
)
return _qdrant_client
+68 -59
@@ -8,6 +8,7 @@ import time
from dataclasses import dataclass
import anyio
from anyio.abc import TaskStatus
from anyio.streams.memory import MemoryObjectSendStream
from qdrant_client.models import FieldCondition, Filter, MatchValue
@@ -93,6 +94,8 @@ async def scanner_task(
wake_event: anyio.Event,
nc_client: NextcloudClient,
user_id: str,
*,
task_status: TaskStatus = anyio.TASK_STATUS_IGNORED,
):
"""
Periodic scanner that detects changed documents for enabled user.
@@ -105,10 +108,14 @@ async def scanner_task(
wake_event: Event to trigger immediate scan
nc_client: Authenticated Nextcloud client
user_id: User to scan
task_status: Status object for signaling task readiness
"""
logger.info(f"Scanner task started for user: {user_id}")
settings = get_settings()
# Signal that the task has started and is ready
task_status.started()
async with send_stream:
while not shutdown_event.is_set():
try:
@@ -175,73 +182,43 @@ async def scan_user_documents(
f"[SCAN-{scan_id}] Using pruneBefore={prune_before} to optimize data transfer"
)
# Fetch all notes from Nextcloud
notes = [
note
async for note in nc_client.notes.get_all_notes(prune_before=prune_before)
]
logger.info(f"[SCAN-{scan_id}] Found {len(notes)} notes for {user_id}")
# Get indexed state from Qdrant first (for incremental sync)
indexed_docs = {}
if not initial_sync:
qdrant_client = await get_qdrant_client()
scroll_result = await qdrant_client.scroll(
collection_name=get_settings().get_collection_name(),
scroll_filter=Filter(
must=[
FieldCondition(key="user_id", match=MatchValue(value=user_id)),
FieldCondition(key="doc_type", match=MatchValue(value="note")),
]
),
with_payload=["doc_id", "indexed_at"],
with_vectors=False,
limit=10000,
)
# Record documents scanned
record_vector_sync_scan(len(notes))
indexed_docs = {
point.payload["doc_id"]: point.payload["indexed_at"]
for point in scroll_result[0]
}
if initial_sync:
# Send everything on first sync
for note in notes:
modified_at = note.get("modified", 0)
await send_stream.send(
DocumentTask(
user_id=user_id,
doc_id=str(note["id"]),
doc_type="note",
operation="index",
modified_at=modified_at,
)
)
logger.info(f"Sent {len(notes)} documents for initial sync: {user_id}")
return
logger.debug(f"Found {len(indexed_docs)} indexed documents in Qdrant")
# Get indexed state from Qdrant
qdrant_client = await get_qdrant_client()
scroll_result = await qdrant_client.scroll(
collection_name=get_settings().get_collection_name(),
scroll_filter=Filter(
must=[
FieldCondition(key="user_id", match=MatchValue(value=user_id)),
FieldCondition(key="doc_type", match=MatchValue(value="note")),
]
),
with_payload=["doc_id", "indexed_at"],
with_vectors=False,
limit=10000,
)
indexed_docs = {
point.payload["doc_id"]: point.payload["indexed_at"]
for point in scroll_result[0]
}
logger.debug(f"Found {len(indexed_docs)} indexed documents in Qdrant")
# Compare and queue changes
# Stream notes from Nextcloud and process immediately
note_count = 0
queued = 0
nextcloud_doc_ids = {str(note["id"]) for note in notes}
nextcloud_doc_ids = set()
for note in notes:
async for note in nc_client.notes.get_all_notes(prune_before=prune_before):
note_count += 1
doc_id = str(note["id"])
indexed_at = indexed_docs.get(doc_id)
nextcloud_doc_ids.add(doc_id)
modified_at = note.get("modified", 0)
# If document reappeared, remove from potentially_deleted
doc_key = (user_id, doc_id)
if doc_key in _potentially_deleted:
logger.debug(
f"Document {doc_id} reappeared, removing from deletion grace period"
)
del _potentially_deleted[doc_key]
# Send if never indexed or modified since last index
if indexed_at is None or modified_at > indexed_at:
if initial_sync:
# Send everything on first sync
await send_stream.send(
DocumentTask(
user_id=user_id,
@@ -252,6 +229,38 @@ async def scan_user_documents(
)
)
queued += 1
else:
# Incremental sync: compare with indexed state
indexed_at = indexed_docs.get(doc_id)
# If document reappeared, remove from potentially_deleted
doc_key = (user_id, doc_id)
if doc_key in _potentially_deleted:
logger.debug(
f"Document {doc_id} reappeared, removing from deletion grace period"
)
del _potentially_deleted[doc_key]
# Send if never indexed or modified since last index
if indexed_at is None or modified_at > indexed_at:
await send_stream.send(
DocumentTask(
user_id=user_id,
doc_id=doc_id,
doc_type="note",
operation="index",
modified_at=modified_at,
)
)
queued += 1
# Log and record metrics after streaming
logger.info(f"[SCAN-{scan_id}] Found {note_count} notes for {user_id}")
record_vector_sync_scan(note_count)
if initial_sync:
logger.info(f"Sent {queued} documents for initial sync: {user_id}")
return
# Check for deleted documents (in Qdrant but not in Nextcloud)
# Use grace period: only delete after 2 consecutive scans confirm absence
+8 -2
@@ -1,6 +1,6 @@
[project]
name = "nextcloud-mcp-server"
version = "0.36.0"
version = "0.43.0"
description = "Model Context Protocol (MCP) server for Nextcloud integration - enables AI assistants to interact with Nextcloud data"
authors = [
{name = "Chris Coutinho", email = "chris@coutinho.io"}
@@ -12,7 +12,7 @@ keywords = ["nextcloud", "mcp", "model-context-protocol", "llm", "ai", "claude",
dependencies = [
"mcp[cli] (>=1.21,<1.22)",
"httpx (>=0.28.1,<0.29.0)",
"pillow (>=12.0.0,<12.1.0)",
"pillow (>=10.3.0,<12.0.0)", # Compatible with fastembed
"icalendar (>=6.0.0,<7.0.0)",
"pythonvcard4>=0.2.0",
"pydantic>=2.11.4",
@@ -22,6 +22,9 @@ dependencies = [
"aiosqlite>=0.20.0", # Async SQLite for refresh token storage
"authlib>=1.6.5",
"qdrant-client>=1.7.0",
"fastembed>=0.7.3", # BM25 sparse vector embeddings for hybrid search
"anthropic>=0.42.0", # For RAG evaluation with Anthropic LLMs
"boto3>=1.35.0", # For Amazon Bedrock provider (optional)
# Observability dependencies
"prometheus-client>=0.21.0", # Prometheus metrics
"opentelemetry-api>=1.28.2", # OpenTelemetry API
@@ -31,6 +34,8 @@ dependencies = [
"opentelemetry-instrumentation-logging>=0.49b2", # Logging integration
"opentelemetry-exporter-otlp-proto-grpc>=1.28.2", # OTLP gRPC exporter
"python-json-logger>=3.2.0", # Structured JSON logging
"jinja2>=3.1.6",
"langchain-text-splitters>=1.0.0",
]
classifiers = [
"Development Status :: 4 - Beta",
@@ -103,6 +108,7 @@ module-root = ""
[dependency-groups]
dev = [
"commitizen>=4.8.2",
"datasets>=3.3.0", # For BeIR nfcorpus dataset loading
"ipython>=9.2.0",
"playwright>=1.49.1",
"pytest>=8.3.5",
+9 -2
@@ -255,8 +255,15 @@ async def nc_mcp_client(anyio_backend) -> AsyncGenerator[ClientSession, Any]:
Note: SSE transport is being deprecated. This fixture now connects over the HTTP transport (`/mcp` endpoint); the previous SSE variant is left commented out.
"""
async for session in create_mcp_client_session_sse(
url="http://localhost:8000/sse", client_name="Basic MCP (SSE)"
# async for session in create_mcp_client_session_sse(
# url="http://localhost:8000/sse", client_name="Basic MCP (SSE)"
# ):
# yield session
async for session in create_mcp_client_session(
url="http://localhost:8000/mcp",
client_name="Basic MCP (HTTP)",
):
yield session
+278
@@ -0,0 +1,278 @@
# RAG Evaluation Tests
This directory contains tests for evaluating the Retrieval-Augmented Generation (RAG) system in the Nextcloud MCP server, specifically the `nc_semantic_search_answer` tool.
## Architecture
The RAG system has two components that are tested independently:
1. **Retrieval** - Vector sync/embedding pipeline (indexed Nextcloud documents → vector database)
2. **Generation** - MCP client LLM synthesis (retrieved context → natural language answer)
See [ADR-013](../../docs/ADR-013-rag-evaluation.md) for full architectural details.
## Test Structure
```
tests/rag_evaluation/
├── README.md                    # This file
├── conftest.py                  # Pytest fixtures
├── llm_providers.py             # LLM provider abstraction (Ollama/Anthropic)
├── fixtures/
│   └── ground_truth.json        # Pre-generated reference answers
├── test_retrieval_quality.py    # Retrieval evaluation (Context Recall)
└── test_generation_quality.py   # Generation evaluation (Answer Correctness)
```
## Metrics
### Retrieval Evaluation
- **Metric**: Context Recall
- **Method**: Heuristic - Check if ground-truth document IDs appear in top-k results
- **Target**: ≥80% recall
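
As a rough illustration of the heuristic above, Context Recall can be computed as a set overlap. This is a minimal sketch, not the code in `test_retrieval_quality.py`; the function name `context_recall` and the note IDs are hypothetical.

```python
def context_recall(expected_ids: set[int], retrieved_ids: list[int]) -> float:
    """Fraction of expected note IDs that appear in the retrieved results."""
    if not expected_ids:
        raise ValueError("expected_ids must not be empty")
    hits = expected_ids & set(retrieved_ids)
    return len(hits) / len(expected_ids)


# Hypothetical note IDs, for illustration only
expected = {101, 102, 103, 104, 105}
retrieved_top_k = [102, 999, 101, 105, 104, 555]

recall = context_recall(expected, retrieved_top_k)
print(f"recall={recall:.2f}, meets 80% target: {recall >= 0.8}")
```

With a fixed top-k, recall computed this way cannot exceed `k / len(expected_ids)` when the expected set is larger than `k`, which is worth keeping in mind when choosing the expected set for each query.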
### Generation Evaluation
- **Metric**: Answer Correctness
- **Method**: LLM-as-judge - Compare RAG answer vs ground truth (binary true/false)
- **Evaluation**: External LLM evaluates semantic equivalence
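
The binary LLM-as-judge check could look roughly like the sketch below. The prompt wording, the helper names (`build_judge_prompt`, `parse_verdict`, `judge`), and the `generate` callable are assumptions for illustration; the real prompt and provider interface live in the test suite and `llm_providers.py` and may differ.

```python
from typing import Callable

# Hypothetical prompt; the actual wording used by the tests may differ
JUDGE_TEMPLATE = (
    "You are grading a RAG answer.\n"
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Candidate answer: {candidate}\n"
    "Reply with exactly one word, TRUE or FALSE: is the candidate "
    "semantically equivalent to the reference?"
)


def build_judge_prompt(question: str, reference: str, candidate: str) -> str:
    return JUDGE_TEMPLATE.format(
        question=question, reference=reference, candidate=candidate
    )


def parse_verdict(raw: str) -> bool:
    """Map the judge's reply to a boolean; reject anything ambiguous."""
    words = raw.strip().split()
    token = words[0].strip(".,:;!\"'").upper() if words else ""
    if token not in ("TRUE", "FALSE"):
        raise ValueError(f"Unparseable judge reply: {raw!r}")
    return token == "TRUE"


def judge(
    question: str,
    reference: str,
    candidate: str,
    generate: Callable[[str], str],
) -> bool:
    """`generate` stands in for whatever sends a prompt to the evaluation LLM."""
    return parse_verdict(generate(build_judge_prompt(question, reference, candidate)))


# Stub LLM so the sketch runs without network access
verdict = judge("Q?", "reference", "candidate", generate=lambda prompt: "TRUE")
print(verdict)
```

Raising on an unparseable reply, rather than treating it as FALSE, keeps judge formatting failures from being counted as wrong answers.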
## Dataset
**BeIR/nfcorpus** - Medical/biomedical corpus with ~3,600 documents
**Test Queries** (5 selected):
1. PLAIN-2630: "Alkylphenol Endocrine Disruptors and Allergies" (21 relevant docs)
2. PLAIN-2660: "How Long to Detox From Fish Before Pregnancy?" (20 relevant docs)
3. PLAIN-2510: "Coffee and Artery Function" (16 relevant docs)
4. PLAIN-2430: "Preventing Brain Loss with B Vitamins?" (15 relevant docs)
5. PLAIN-2690: "Chronic Headaches and Pork Tapeworms" (14 relevant docs)
## Setup
### 1. Install Dependencies
```bash
uv sync --group dev
```
This installs:
- `anthropic>=0.42.0` - For Anthropic LLM evaluation
- `click>=8.1.8` - For CLI interface
- `datasets>=3.3.0` - For BeIR nfcorpus dataset loading
### 2. Configure LLM Provider
Set environment variables for your LLM provider:
**Option A: Ollama (default, local/remote)**
```bash
export RAG_EVAL_PROVIDER=ollama
export OLLAMA_HOST=https://ollama.example.com # or RAG_EVAL_OLLAMA_BASE_URL
export RAG_EVAL_OLLAMA_MODEL=llama3.2:1b
```
**Option B: Anthropic (cloud)**
```bash
export RAG_EVAL_PROVIDER=anthropic
export RAG_EVAL_ANTHROPIC_API_KEY=sk-ant-...
export RAG_EVAL_ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
```
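
For orientation, the environment variables above might be resolved along these lines. This is a sketch under assumptions: the `EvalProviderConfig` dataclass and `load_provider_config` function are hypothetical, and the precedence of `RAG_EVAL_OLLAMA_BASE_URL` over `OLLAMA_HOST` is a guess rather than documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EvalProviderConfig:
    provider: str
    model: Optional[str]
    base_url: Optional[str] = None
    api_key: Optional[str] = None


def load_provider_config(env: Optional[Mapping[str, str]] = None) -> EvalProviderConfig:
    """Resolve the evaluation LLM settings from environment variables."""
    env = os.environ if env is None else env
    provider = env.get("RAG_EVAL_PROVIDER", "ollama").lower()  # Ollama is the default

    if provider == "ollama":
        # Assumed precedence: the RAG_EVAL_-specific variable wins over OLLAMA_HOST
        base_url = env.get("RAG_EVAL_OLLAMA_BASE_URL") or env.get("OLLAMA_HOST")
        return EvalProviderConfig(
            provider, env.get("RAG_EVAL_OLLAMA_MODEL"), base_url=base_url
        )

    if provider == "anthropic":
        api_key = env.get("RAG_EVAL_ANTHROPIC_API_KEY")
        if not api_key:
            raise ValueError("RAG_EVAL_ANTHROPIC_API_KEY is required for anthropic")
        return EvalProviderConfig(
            provider, env.get("RAG_EVAL_ANTHROPIC_MODEL"), api_key=api_key
        )

    raise ValueError(f"Unsupported RAG_EVAL_PROVIDER: {provider!r}")


# Explicit mapping instead of the real environment, so the example is reproducible
print(load_provider_config({"OLLAMA_HOST": "https://ollama.example.com"}))
```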
### 3. One-Time Setup: Generate Ground Truth
Generate synthetic reference answers for the 5 test queries:
```bash
uv run python tools/rag_eval_cli.py generate
```
**What this does:**
- Downloads nfcorpus dataset to `tests/rag_evaluation/fixtures/nfcorpus/` (cached locally)
- For each of the 5 selected queries, extracts highly relevant documents
- Uses configured LLM to synthesize a reference answer
- Saves to `tests/rag_evaluation/fixtures/ground_truth.json`
**Optional flags:**
- `--provider ollama|anthropic` - Override LLM provider
- `--model MODEL_NAME` - Override model name
- `--force-download` - Re-download nfcorpus dataset
### 4. One-Time Setup: Upload Corpus to Nextcloud
Upload all 3,633 nfcorpus documents as Nextcloud notes:
```bash
uv run python tools/rag_eval_cli.py upload \
--nextcloud-url http://localhost:8000 \
--username admin \
--password admin
```
**What this does:**
- Downloads nfcorpus dataset (if not already cached)
- Uploads all documents as notes in Nextcloud
- Saves document ID → note ID mapping to `tests/rag_evaluation/fixtures/note_mapping.json`
**Optional flags:**
- `--category CATEGORY` - Custom category for notes (default: `nfcorpus_rag_eval`)
- `--force-download` - Re-download nfcorpus dataset
- `--force` - Delete all existing notes in the target category before uploading (efficient corpus refresh)
**Important:** This step requires:
- A running Nextcloud instance with vector sync enabled
- Notes app installed
- Valid credentials
**Duration:** ~10-15 minutes to upload 3,633 documents
## Running Tests
### Run All RAG Evaluation Tests
```bash
uv run pytest tests/rag_evaluation/ -v
```
### Run Specific Test Suites
**Retrieval Quality Only:**
```bash
uv run pytest tests/rag_evaluation/test_retrieval_quality.py -v
```
**Generation Quality Only:**
```bash
uv run pytest tests/rag_evaluation/test_generation_quality.py -v
```
### Run Individual Tests
```bash
uv run pytest tests/rag_evaluation/test_retrieval_quality.py::test_retrieval_context_recall -v
uv run pytest tests/rag_evaluation/test_generation_quality.py::test_answer_correctness -v
```
## Test Execution Flow
**Prerequisites** (one-time setup):
1. Generated ground truth (`tools/rag_eval_cli.py generate`)
2. Uploaded corpus to Nextcloud (`tools/rag_eval_cli.py upload`)
### Retrieval Quality Tests
1. **Setup** (`nfcorpus_test_data` fixture):
   - Loads pre-generated ground truth from `fixtures/ground_truth.json`
   - Loads note mapping from `fixtures/note_mapping.json`
   - Returns test cases with expected note IDs
2. **Test** (`test_retrieval_context_recall`):
   - For each query: Perform semantic search (top-10)
   - Extract retrieved note IDs
   - Calculate Context Recall = |expected ∩ retrieved| / |expected|
   - Assert recall ≥ 80%
3. **Cleanup**:
   - None required (notes persist in Nextcloud for reuse)
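The recall calculation in step 2 can be sketched in isolation. This is a minimal illustration and not the test itself: the function name `context_recall` and the sample note IDs are made up, and returning 0.0 for an empty expected set mirrors the fallback used in `test_retrieval_context_recall`.

```python
def context_recall(expected_ids: set[int], retrieved_ids: set[int]) -> float:
    """Fraction of expected note IDs that appear among the retrieved IDs."""
    if not expected_ids:
        # No expected documents: treat as zero recall, as the test does.
        return 0.0
    return len(expected_ids & retrieved_ids) / len(expected_ids)


# Illustrative IDs: 4 of the 5 expected notes were retrieved.
expected = {1, 2, 3, 4, 5}
retrieved = {2, 3, 4, 5, 9, 10}
recall = context_recall(expected, retrieved)
print(f"Context Recall: {recall:.0%}")
assert recall >= 0.8  # same threshold as the test
```

Because the denominator is the size of the expected set, retrieving extra irrelevant notes does not lower the score; only missing expected notes does.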
### Generation Quality Tests
1. **Setup**:
   - Same as retrieval tests (reuses `nfcorpus_test_data` fixture)
   - Creates evaluation LLM provider
2. **Test** (`test_answer_correctness`):
   - For each query: Call `nc_semantic_search_answer` MCP tool
   - Extract generated answer
   - Use LLM-as-judge to compare vs ground truth
   - Assert semantic equivalence (TRUE/FALSE)
3. **Cleanup**:
   - LLM provider closed
## Expected Test Duration
**One-time setup:**
- **Generate ground truth**: ~5-10 minutes (5 queries with LLM generation)
- **Upload corpus**: ~10-15 minutes (3,633 documents)
- **Total setup**: ~15-25 minutes
**Test execution** (after setup):
- **Retrieval tests**: ~1-2 minutes (5 queries, no upload/cleanup)
- **Generation tests**: ~5-10 minutes (RAG generation + LLM evaluation)
- **Total per run**: ~6-12 minutes
**Note**: These are NOT smoke tests and are NOT run in CI.
## Limitations & Future Work
**Current Limitations:**
- Only 5 test queries (limited statistical confidence)
- Medical domain bias (may not represent production use cases)
- Synthetic ground truth (LLM-generated, not human-validated)
- Manual test execution (requires external LLM access)
**Future Enhancements:**
- Expand to 50-100 queries for statistical significance
- Add custom test dataset with production-representative documents
- Implement additional metrics (faithfulness, context relevance, answer relevance)
- Create automated benchmarking dashboard
- Test multi-hop reasoning (synthesis questions)
- Evaluate out-of-scope handling ("I don't know" responses)
## Troubleshooting
### Tests Fail with "Ground truth file not found"
Run the generate command first:
```bash
uv run python tools/rag_eval_cli.py generate
```
### Tests Fail with "Note mapping file not found"
Run the upload command first:
```bash
uv run python tools/rag_eval_cli.py upload --nextcloud-url http://localhost:8000 --username admin --password admin
```
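Both missing-file errors above can be caught before invoking pytest. The following pre-flight check is a sketch under the assumption that the fixtures live in `tests/rag_evaluation/fixtures/`; the names `REQUIRED_FIXTURES` and `missing_fixtures` are illustrative and do not exist in the repository:

```python
from pathlib import Path

# File name -> command that creates it (commands as given in this README).
REQUIRED_FIXTURES = {
    "ground_truth.json": "uv run python tools/rag_eval_cli.py generate",
    "note_mapping.json": (
        "uv run python tools/rag_eval_cli.py upload "
        "--nextcloud-url ... --username ... --password ..."
    ),
}


def missing_fixtures(fixtures_dir: Path) -> dict[str, str]:
    """Return {missing file name: command that creates it}."""
    return {
        name: command
        for name, command in REQUIRED_FIXTURES.items()
        if not (fixtures_dir / name).exists()
    }


if __name__ == "__main__":
    fixtures_dir = Path("tests/rag_evaluation/fixtures")
    for name, command in missing_fixtures(fixtures_dir).items():
        print(f"Missing {name}; run: {command}")
```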
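### Tests Skipped with "MCP sampling client not yet implemented"
The `mcp_sampling_client` fixture is a placeholder that calls `pytest.skip`, so the generation quality tests are reported as skipped rather than failed. To run them, implement MCP client creation with sampling support. See the TODO in `conftest.py`.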
### Upload Command Fails
Common issues:
1. **Nextcloud not running**: Ensure Nextcloud is accessible at the URL
2. **Invalid credentials**: Verify username/password
3. **Notes app not installed**: Install Notes app in Nextcloud
4. **Network timeout**: Increase timeout in CLI (currently 60s)
### LLM Timeout
If ground truth generation times out:
1. Increase timeout in `llm_providers.py` (currently 10 min)
2. Use a faster model: `--model llama3.2:1b`
3. Check Ollama/Anthropic service availability
### Dataset Download Fails
The nfcorpus dataset is downloaded automatically. If download fails:
1. Check internet connection
2. Manually download from: https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/nfcorpus.zip
3. Extract to `tests/rag_evaluation/fixtures/nfcorpus/`
4. Or use HuggingFace datasets cache: `~/.cache/huggingface/datasets/BeIR___nfcorpus/`
### Vector Sync Not Indexing Documents
After uploading, vector sync must index the documents:
1. Check vector sync is enabled in Nextcloud
2. Trigger manual sync if needed
3. Wait for background job to process all documents
4. Verify in Qdrant that vectors exist for uploaded notes
## References
- [ADR-013: RAG Evaluation Testing Framework](../../docs/ADR-013-rag-evaluation.md)
- [ADR-008: MCP Sampling for Semantic Search](../../docs/ADR-008-mcp-sampling-for-semantic-search.md)
- [BeIR Benchmark](https://github.com/beir-cellar/beir)
- [NFCorpus Dataset](https://www.cl.uni-heidelberg.de/statnlpgroup/nfcorpus/)
+1
@@ -0,0 +1 @@
"""RAG evaluation tests for the Nextcloud MCP semantic search system."""
+145
@@ -0,0 +1,145 @@
"""Pytest fixtures for RAG evaluation tests.
IMPORTANT: Before running these tests, you must:
1. Generate ground truth: uv run python tools/rag_eval_cli.py generate
2. Upload corpus: uv run python tools/rag_eval_cli.py upload --nextcloud-url http://localhost:8000 --username admin --password admin
This ensures that the ground truth and note mappings are available.
"""
import json
from pathlib import Path
from typing import Any
import pytest
from tests.rag_evaluation.llm_providers import create_llm_provider
# Paths
FIXTURES_DIR = Path(__file__).parent / "fixtures"
GROUND_TRUTH_FILE = FIXTURES_DIR / "ground_truth.json"
NOTE_MAPPING_FILE = FIXTURES_DIR / "note_mapping.json"
@pytest.fixture(scope="session")
def ground_truth_data() -> list[dict[str, Any]]:
"""Load pre-generated ground truth data.
Returns:
List of test cases with query, ground truth answer, and expected doc IDs
Raises:
FileNotFoundError: If ground_truth.json doesn't exist
"""
if not GROUND_TRUTH_FILE.exists():
raise FileNotFoundError(
f"Ground truth file not found: {GROUND_TRUTH_FILE}\n"
"Run: uv run python tools/rag_eval_cli.py generate"
)
with open(GROUND_TRUTH_FILE) as f:
return json.load(f)
@pytest.fixture(scope="session")
def note_mapping() -> dict[str, int]:
"""Load document ID → note ID mapping.
Returns:
Dict mapping nfcorpus document ID to Nextcloud note ID
Raises:
FileNotFoundError: If note_mapping.json doesn't exist
"""
if not NOTE_MAPPING_FILE.exists():
raise FileNotFoundError(
f"Note mapping file not found: {NOTE_MAPPING_FILE}\n"
"Run: uv run python tools/rag_eval_cli.py upload --nextcloud-url ... --username ... --password ..."
)
with open(NOTE_MAPPING_FILE) as f:
return json.load(f)
@pytest.fixture(scope="session")
def nfcorpus_test_data(
ground_truth_data: list[dict[str, Any]],
note_mapping: dict[str, int],
):
"""Prepare nfcorpus test data for evaluation.
This fixture combines ground truth answers with note mappings to create
test cases ready for retrieval and generation quality tests.
Args:
ground_truth_data: Pre-generated ground truth answers
note_mapping: Document ID → note ID mapping
Returns:
List of test cases with query, ground truth, expected doc IDs, and note IDs
"""
test_cases = []
for gt in ground_truth_data:
# Map expected document IDs to note IDs, skipping docs that weren't uploaded
expected_note_ids = [
note_mapping[doc_id]
for doc_id in gt["expected_document_ids"]
if doc_id in note_mapping
]
test_cases.append(
{
"query_id": gt["query_id"],
"query_text": gt["query_text"],
"ground_truth_answer": gt["ground_truth_answer"],
"expected_document_ids": gt["expected_document_ids"],
"expected_note_ids": expected_note_ids,
"highly_relevant_count": gt["highly_relevant_count"],
}
)
return test_cases
@pytest.fixture(scope="session")
async def evaluation_llm():
"""Create LLM provider for evaluation (separate from MCP client).
Environment variables:
RAG_EVAL_PROVIDER: Provider type (ollama, anthropic, or bedrock)
RAG_EVAL_OLLAMA_BASE_URL: Ollama base URL (or OLLAMA_HOST)
RAG_EVAL_OLLAMA_MODEL: Ollama model name
RAG_EVAL_ANTHROPIC_API_KEY: Anthropic API key
RAG_EVAL_ANTHROPIC_MODEL: Anthropic model name
RAG_EVAL_BEDROCK_REGION: AWS region for Bedrock (or AWS_REGION)
RAG_EVAL_BEDROCK_MODEL: Bedrock model ID
Returns:
LLM provider instance (OllamaProvider, AnthropicProvider, or BedrockProvider)
"""
llm = create_llm_provider()
yield llm
await llm.close()
@pytest.fixture(scope="session")
async def mcp_sampling_client():
"""Create MCP client that supports sampling for RAG generation.
This fixture creates an MCP client configured to support sampling,
which is required for testing the nc_semantic_search_answer tool.
TODO: Implement MCP client with sampling support
For now, this is a placeholder.
Returns:
MCP client instance with sampling enabled
"""
# TODO: Implement MCP client creation with sampling support
# This will require:
# 1. Creating an MCP client configured for sampling
# 2. Authenticating with Nextcloud
# 3. Ensuring sampling is enabled
pytest.skip("MCP sampling client not yet implemented")
+89
@@ -0,0 +1,89 @@
"""LLM provider abstraction for RAG evaluation.
DEPRECATED: This module is maintained for backward compatibility with RAG evaluation tests.
New code should use nextcloud_mcp_server.providers directly.
Supports Ollama (local), Anthropic (cloud), and Bedrock (AWS) providers for both ground truth
generation and evaluation.
"""
import os
from nextcloud_mcp_server.providers import (
AnthropicProvider,
BedrockProvider,
OllamaProvider,
Provider,
)
def create_llm_provider(
provider: str | None = None,
ollama_base_url: str | None = None,
ollama_model: str | None = None,
anthropic_api_key: str | None = None,
anthropic_model: str | None = None,
bedrock_region: str | None = None,
bedrock_model: str | None = None,
) -> Provider:
"""Create an LLM provider from environment variables or arguments.
Args:
provider: Provider type ('ollama', 'anthropic', or 'bedrock').
Defaults to RAG_EVAL_PROVIDER env var or 'ollama'
ollama_base_url: Ollama base URL. Defaults to RAG_EVAL_OLLAMA_BASE_URL or 'http://localhost:11434'
ollama_model: Ollama model. Defaults to RAG_EVAL_OLLAMA_MODEL or 'llama3.2:1b'
anthropic_api_key: Anthropic API key. Defaults to RAG_EVAL_ANTHROPIC_API_KEY env var
anthropic_model: Anthropic model. Defaults to RAG_EVAL_ANTHROPIC_MODEL or 'claude-3-5-sonnet-20241022'
bedrock_region: AWS region. Defaults to RAG_EVAL_BEDROCK_REGION, then AWS_REGION, then 'us-east-1'
bedrock_model: Bedrock model ID. Defaults to RAG_EVAL_BEDROCK_MODEL or
'anthropic.claude-3-sonnet-20240229-v1:0'
Returns:
Provider instance
Raises:
ValueError: If provider is invalid or required credentials are missing
"""
# Get provider from args or env
provider = provider or os.environ.get("RAG_EVAL_PROVIDER", "ollama")
if provider == "ollama":
# Try RAG_EVAL_OLLAMA_BASE_URL, then OLLAMA_HOST, then default
base_url = (
ollama_base_url
or os.environ.get("RAG_EVAL_OLLAMA_BASE_URL")
or os.environ.get("OLLAMA_HOST")
or "http://localhost:11434"
)
model = ollama_model or os.environ.get("RAG_EVAL_OLLAMA_MODEL", "llama3.2:1b")
return OllamaProvider(
base_url=base_url, embedding_model=None, generation_model=model
)
elif provider == "anthropic":
api_key = anthropic_api_key or os.environ.get("RAG_EVAL_ANTHROPIC_API_KEY")
if not api_key:
raise ValueError(
"Anthropic API key required. Set RAG_EVAL_ANTHROPIC_API_KEY environment variable."
)
model = anthropic_model or os.environ.get(
"RAG_EVAL_ANTHROPIC_MODEL", "claude-3-5-sonnet-20241022"
)
return AnthropicProvider(api_key=api_key, model=model)
elif provider == "bedrock":
region = bedrock_region or os.environ.get(
"RAG_EVAL_BEDROCK_REGION", os.environ.get("AWS_REGION", "us-east-1")
)
model = bedrock_model or os.environ.get(
"RAG_EVAL_BEDROCK_MODEL", "anthropic.claude-3-sonnet-20240229-v1:0"
)
return BedrockProvider(
region_name=region, embedding_model=None, generation_model=model
)
else:
raise ValueError(
f"Invalid provider: {provider}. Must be 'ollama', 'anthropic', or 'bedrock'."
)
@@ -0,0 +1,139 @@
"""Tests for RAG generation quality (Answer Correctness metric).
These tests evaluate whether the MCP client LLM generates factually correct
answers from retrieved context using the nc_semantic_search_answer tool.
Metric: Answer Correctness
- Measures: Is the generated answer factually correct?
- Method: LLM-as-judge - Compare RAG answer vs ground truth (binary true/false)
- Evaluation: External LLM evaluates semantic equivalence
"""
import pytest
@pytest.mark.integration
async def test_answer_correctness(
mcp_sampling_client,
evaluation_llm,
nfcorpus_test_data,
):
"""Test that RAG system generates factually correct answers.
For each test query:
1. Execute full RAG pipeline via nc_semantic_search_answer MCP tool
2. Extract generated answer from RAG response
3. Use LLM-as-judge to compare against ground truth (binary true/false)
4. Assert answer is semantically equivalent to ground truth
This tests the quality of the generation component (MCP client LLM).
"""
results_summary = []
for test_case in nfcorpus_test_data:
query = test_case["query_text"]
ground_truth = test_case["ground_truth_answer"]
print(f"\n{'=' * 80}")
print(f"Query: {query}")
# Execute full RAG pipeline
print("Executing RAG pipeline...")
rag_result = await mcp_sampling_client.call_tool(
"nc_semantic_search_answer",
arguments={"query": query, "limit": 5},
)
rag_answer = rag_result["generated_answer"]
print(f"RAG Answer preview: {rag_answer[:200]}...")
print(f"Ground Truth preview: {ground_truth[:200]}...")
# LLM-as-judge evaluation
evaluation_prompt = f"""Compare these two answers and respond with only TRUE or FALSE.
Question: {query}
Generated Answer: {rag_answer}
Ground Truth Answer: {ground_truth}
Are these answers semantically equivalent (do they convey the same factual information)?
Respond with only: TRUE or FALSE"""
print("Evaluating answer correctness...")
evaluation_result = await evaluation_llm.generate(
evaluation_prompt,
max_tokens=10,
)
is_correct = evaluation_result.strip().upper() == "TRUE"
result = {
"query_id": test_case["query_id"],
"query": query,
"rag_answer_length": len(rag_answer),
"ground_truth_length": len(ground_truth),
"is_correct": is_correct,
"evaluation_result": evaluation_result.strip(),
}
results_summary.append(result)
print(f" Evaluation: {evaluation_result.strip()}")
print(f" Status: {'✓ CORRECT' if is_correct else '✗ INCORRECT'}")
# Assert answer correctness
assert is_correct, (
f"Answer mismatch for query: {query}\n\n"
f"Generated Answer:\n{rag_answer}\n\n"
f"Ground Truth:\n{ground_truth}\n\n"
f"Evaluation: {evaluation_result.strip()}"
)
# Print summary
print(f"\n{'=' * 80}")
print("Answer Correctness Summary:")
print(f" Total queries: {len(results_summary)}")
print(f" Correct: {sum(r['is_correct'] for r in results_summary)}")
print(f" Incorrect: {sum(not r['is_correct'] for r in results_summary)}")
accuracy = sum(r["is_correct"] for r in results_summary) / len(results_summary)
print(f" Accuracy: {accuracy:.2%}")
print(f"{'=' * 80}")
@pytest.mark.integration
async def test_answer_contains_sources(mcp_sampling_client, nfcorpus_test_data):
"""Test that RAG answers include source citations.
This is a basic quality check - we verify that the nc_semantic_search_answer
tool returns both a generated answer and source documents.
"""
for test_case in nfcorpus_test_data:
query = test_case["query_text"]
# Execute RAG pipeline
rag_result = await mcp_sampling_client.call_tool(
"nc_semantic_search_answer",
arguments={"query": query, "limit": 5},
)
# Check response structure
assert "generated_answer" in rag_result, "Response missing 'generated_answer'"
assert "sources" in rag_result, "Response missing 'sources'"
# Check sources are provided
sources = rag_result["sources"]
assert len(sources) > 0, f"No sources returned for query: {query}"
# Check each source has required fields
for i, source in enumerate(sources):
assert "document_id" in source or "id" in source, (
f"Source {i} missing document ID"
)
assert "excerpt" in source or "content" in source or "text" in source, (
f"Source {i} missing content"
)
print(f"Query: {query}")
print(f" Sources provided: {len(sources)}")
print(" Status: ✓ PASS")
@@ -0,0 +1,143 @@
"""Tests for RAG retrieval quality (Context Recall metric).
These tests evaluate whether the vector sync/embedding pipeline successfully
retrieves documents containing the answer to a query.
Metric: Context Recall
- Measures: Did we retrieve documents containing the answer?
- Method: Heuristic - Check if ground-truth document IDs appear in top-k results
- Target: 80% recall (at least 80% of expected docs in top-10 results)
"""
import pytest
@pytest.mark.integration
async def test_retrieval_context_recall(nc_client, nfcorpus_test_data):
"""Test that semantic search retrieves documents containing the answer.
For each test query:
1. Perform semantic search (retrieval only, no generation)
2. Extract retrieved document IDs from top-k results
3. Calculate Context Recall: fraction of expected docs that appear in the retrieved set
4. Assert recall meets threshold (80%)
This tests the quality of the vector sync/embedding pipeline.
"""
# Top-k documents to retrieve
k = 10
# Minimum acceptable recall
min_recall = 0.8
results_summary = []
for test_case in nfcorpus_test_data:
query = test_case["query_text"]
expected_note_ids = set(test_case["expected_note_ids"])
# Perform semantic search (retrieval only)
search_results = await nc_client.notes.semantic_search(
query=query,
limit=k,
)
# Extract retrieved note IDs
retrieved_note_ids = {result["id"] for result in search_results}
# Calculate Context Recall
intersection = expected_note_ids & retrieved_note_ids
recall = len(intersection) / len(expected_note_ids) if expected_note_ids else 0
# Store results
result = {
"query_id": test_case["query_id"],
"query": query,
"expected_count": len(expected_note_ids),
"retrieved_count": len(retrieved_note_ids),
"intersection_count": len(intersection),
"recall": recall,
"passed": recall >= min_recall,
}
results_summary.append(result)
# Print detailed result for this query
print(f"\n{'=' * 80}")
print(f"Query: {query}")
print(f" Expected docs: {len(expected_note_ids)}")
print(f" Retrieved (top-{k}): {len(retrieved_note_ids)}")
print(f" Intersection: {len(intersection)}")
print(f" Context Recall: {recall:.2%}")
print(f" Status: {'✓ PASS' if result['passed'] else '✗ FAIL'}")
# Assert recall meets threshold
assert recall >= min_recall, (
f"Context Recall {recall:.2%} below threshold {min_recall:.2%} "
f"for query: {query}\n"
f"Expected {len(expected_note_ids)} docs, found {len(intersection)} in top-{k}"
)
# Print summary
print(f"\n{'=' * 80}")
print("Context Recall Summary:")
print(f" Total queries: {len(results_summary)}")
print(f" Passed: {sum(r['passed'] for r in results_summary)}")
print(f" Failed: {sum(not r['passed'] for r in results_summary)}")
print(
f" Average recall: {sum(r['recall'] for r in results_summary) / len(results_summary):.2%}"
)
print(f"{'=' * 80}")
@pytest.mark.integration
async def test_retrieval_top1_precision(nc_client, nfcorpus_test_data):
"""Test that the top-1 retrieved document is highly relevant.
This is a stricter test than context recall - we verify that
the single most relevant document (rank 1) is in the expected set.
This tests whether the ranking is good, not just retrieval.
"""
results_summary = []
for test_case in nfcorpus_test_data:
query = test_case["query_text"]
expected_note_ids = set(test_case["expected_note_ids"])
# Perform semantic search
search_results = await nc_client.notes.semantic_search(
query=query,
limit=1, # Only top-1
)
# Check if top result is in expected set
if search_results:
top_result_id = search_results[0]["id"]
is_relevant = top_result_id in expected_note_ids
else:
is_relevant = False
result = {
"query_id": test_case["query_id"],
"query": query,
"top_result_id": search_results[0]["id"] if search_results else None,
"is_relevant": is_relevant,
}
results_summary.append(result)
print(f"\nQuery: {query}")
print(f" Top-1 relevant: {'✓ YES' if is_relevant else '✗ NO'}")
# This is informational - we don't assert here
# Some queries may have multiple valid top results
# Print summary
precision_at_1 = sum(r["is_relevant"] for r in results_summary) / len(
results_summary
)
print(f"\n{'=' * 80}")
print(f"Precision@1: {precision_at_1:.2%}")
print(
f" ({sum(r['is_relevant'] for r in results_summary)}/{len(results_summary)} queries)"
)
print(f"{'=' * 80}")
+1
@@ -0,0 +1 @@
"""Unit tests for provider infrastructure."""
+280
@@ -0,0 +1,280 @@
"""Unit tests for Bedrock provider."""
import json
from unittest.mock import MagicMock
import pytest
from nextcloud_mcp_server.providers.bedrock import BOTO3_AVAILABLE, BedrockProvider
@pytest.fixture
def mock_bedrock_client(mocker):
"""Mock boto3 bedrock-runtime client."""
if not BOTO3_AVAILABLE:
pytest.skip("boto3 not installed")
mock_client = MagicMock()
mocker.patch("boto3.client", return_value=mock_client)
return mock_client
@pytest.mark.unit
async def test_bedrock_embedding_titan(mock_bedrock_client):
"""Test Bedrock embedding with Titan model."""
# Mock response
mock_response = {
"body": MagicMock(
read=MagicMock(
return_value=json.dumps({"embedding": [0.1, 0.2, 0.3]}).encode()
)
)
}
mock_bedrock_client.invoke_model.return_value = mock_response
# Create provider
provider = BedrockProvider(
region_name="us-east-1",
embedding_model="amazon.titan-embed-text-v2:0",
generation_model=None,
)
# Test embedding
embedding = await provider.embed("test text")
assert embedding == [0.1, 0.2, 0.3]
mock_bedrock_client.invoke_model.assert_called_once()
call_args = mock_bedrock_client.invoke_model.call_args
assert call_args.kwargs["modelId"] == "amazon.titan-embed-text-v2:0"
body = json.loads(call_args.kwargs["body"])
assert body == {"inputText": "test text"}
@pytest.mark.unit
async def test_bedrock_embedding_batch(mock_bedrock_client):
"""Test Bedrock batch embedding."""
# Mock response
mock_response = {
"body": MagicMock(
read=MagicMock(
return_value=json.dumps({"embedding": [0.1, 0.2, 0.3]}).encode()
)
)
}
mock_bedrock_client.invoke_model.return_value = mock_response
# Create provider
provider = BedrockProvider(
region_name="us-east-1",
embedding_model="amazon.titan-embed-text-v2:0",
generation_model=None,
)
# Test batch embedding
embeddings = await provider.embed_batch(["text1", "text2"])
assert len(embeddings) == 2
assert embeddings[0] == [0.1, 0.2, 0.3]
assert embeddings[1] == [0.1, 0.2, 0.3]
assert mock_bedrock_client.invoke_model.call_count == 2
@pytest.mark.unit
async def test_bedrock_generation_claude(mock_bedrock_client):
"""Test Bedrock text generation with Claude model."""
# Mock response
mock_response = {
"body": MagicMock(
read=MagicMock(
return_value=json.dumps(
{"content": [{"text": "Generated response"}]}
).encode()
)
)
}
mock_bedrock_client.invoke_model.return_value = mock_response
# Create provider
provider = BedrockProvider(
region_name="us-east-1",
embedding_model=None,
generation_model="anthropic.claude-3-sonnet-20240229-v1:0",
)
# Test generation
text = await provider.generate("test prompt", max_tokens=100)
assert text == "Generated response"
mock_bedrock_client.invoke_model.assert_called_once()
call_args = mock_bedrock_client.invoke_model.call_args
assert call_args.kwargs["modelId"] == "anthropic.claude-3-sonnet-20240229-v1:0"
body = json.loads(call_args.kwargs["body"])
assert body["messages"][0]["content"] == "test prompt"
assert body["max_tokens"] == 100
@pytest.mark.unit
async def test_bedrock_generation_llama(mock_bedrock_client):
"""Test Bedrock text generation with Llama model."""
# Mock response
mock_response = {
"body": MagicMock(
read=MagicMock(
return_value=json.dumps({"generation": "Llama response"}).encode()
)
)
}
mock_bedrock_client.invoke_model.return_value = mock_response
# Create provider
provider = BedrockProvider(
region_name="us-east-1",
embedding_model=None,
generation_model="meta.llama3-8b-instruct-v1:0",
)
# Test generation
text = await provider.generate("test prompt")
assert text == "Llama response"
body = json.loads(mock_bedrock_client.invoke_model.call_args.kwargs["body"])
assert body["prompt"] == "test prompt"
assert "max_gen_len" in body
@pytest.mark.unit
async def test_bedrock_both_capabilities(mock_bedrock_client):
"""Test Bedrock with both embedding and generation models."""
# Mock responses
embed_response = {
"body": MagicMock(
read=MagicMock(return_value=json.dumps({"embedding": [0.1, 0.2]}).encode())
)
}
gen_response = {
"body": MagicMock(
read=MagicMock(
return_value=json.dumps({"content": [{"text": "Response"}]}).encode()
)
)
}
# Mock to return different responses based on modelId
def mock_invoke(modelId, body, **kwargs):
if "embed" in modelId:
return embed_response
else:
return gen_response
mock_bedrock_client.invoke_model.side_effect = mock_invoke
# Create provider with both models
provider = BedrockProvider(
region_name="us-east-1",
embedding_model="amazon.titan-embed-text-v2:0",
generation_model="anthropic.claude-3-sonnet-20240229-v1:0",
)
assert provider.supports_embeddings is True
assert provider.supports_generation is True
# Test both capabilities
embedding = await provider.embed("test")
assert embedding == [0.1, 0.2]
text = await provider.generate("test")
assert text == "Response"
@pytest.mark.unit
async def test_bedrock_no_embeddings():
"""Test Bedrock provider with no embedding model raises error."""
provider = BedrockProvider(
region_name="us-east-1",
embedding_model=None,
generation_model="anthropic.claude-3-sonnet-20240229-v1:0",
)
assert provider.supports_embeddings is False
with pytest.raises(NotImplementedError, match="no embedding_model configured"):
await provider.embed("test")
with pytest.raises(NotImplementedError, match="no embedding_model configured"):
await provider.embed_batch(["test"])
with pytest.raises(NotImplementedError, match="no embedding_model configured"):
provider.get_dimension()
@pytest.mark.unit
async def test_bedrock_no_generation():
"""Test Bedrock provider with no generation model raises error."""
provider = BedrockProvider(
region_name="us-east-1",
embedding_model="amazon.titan-embed-text-v2:0",
generation_model=None,
)
assert provider.supports_generation is False
with pytest.raises(NotImplementedError, match="no generation_model configured"):
await provider.generate("test")
@pytest.mark.unit
async def test_bedrock_dimension_detection(mock_bedrock_client):
"""Test dimension detection for Bedrock embeddings."""
# Mock response with specific dimension
mock_response = {
"body": MagicMock(
read=MagicMock(
return_value=json.dumps(
{"embedding": [0.1] * 1536} # 1536-dim embedding
).encode()
)
)
}
mock_bedrock_client.invoke_model.return_value = mock_response
provider = BedrockProvider(
region_name="us-east-1",
embedding_model="amazon.titan-embed-text-v2:0",
)
# Dimension not detected yet
with pytest.raises(RuntimeError, match="not detected yet"):
provider.get_dimension()
# Detect dimension
await provider._detect_dimension()
# Now dimension should be available
assert provider.get_dimension() == 1536
@pytest.mark.unit
async def test_bedrock_cohere_embedding(mock_bedrock_client):
"""Test Bedrock with Cohere embedding model."""
# Mock response
mock_response = {
"body": MagicMock(
read=MagicMock(
return_value=json.dumps({"embeddings": [[0.1, 0.2, 0.3]]}).encode()
)
)
}
mock_bedrock_client.invoke_model.return_value = mock_response
provider = BedrockProvider(
region_name="us-east-1",
embedding_model="cohere.embed-english-v3",
)
embedding = await provider.embed("test text")
assert embedding == [0.1, 0.2, 0.3]
body = json.loads(mock_bedrock_client.invoke_model.call_args.kwargs["body"])
assert body == {"texts": ["test text"], "input_type": "search_document"}
+1
@@ -0,0 +1 @@
"""Unit tests for search algorithms."""
+54
@@ -0,0 +1,54 @@
"""Unit tests for BM25 hybrid search algorithm."""
import pytest
from qdrant_client import models
from nextcloud_mcp_server.search.bm25_hybrid import BM25HybridSearchAlgorithm
@pytest.mark.unit
def test_bm25_hybrid_initialization_default():
"""Test BM25HybridSearchAlgorithm initializes with default RRF fusion."""
algo = BM25HybridSearchAlgorithm()
assert algo.score_threshold == 0.0
assert algo.fusion == models.Fusion.RRF
assert algo.fusion_name == "rrf"
assert algo.name == "bm25_hybrid"
@pytest.mark.unit
def test_bm25_hybrid_initialization_with_rrf():
"""Test BM25HybridSearchAlgorithm initializes with explicit RRF fusion."""
algo = BM25HybridSearchAlgorithm(score_threshold=0.5, fusion="rrf")
assert algo.score_threshold == 0.5
assert algo.fusion == models.Fusion.RRF
assert algo.fusion_name == "rrf"
@pytest.mark.unit
def test_bm25_hybrid_initialization_with_dbsf():
"""Test BM25HybridSearchAlgorithm initializes with DBSF fusion."""
algo = BM25HybridSearchAlgorithm(score_threshold=0.7, fusion="dbsf")
assert algo.score_threshold == 0.7
assert algo.fusion == models.Fusion.DBSF
assert algo.fusion_name == "dbsf"
@pytest.mark.unit
def test_bm25_hybrid_invalid_fusion_raises_error():
"""Test BM25HybridSearchAlgorithm raises ValueError for invalid fusion."""
with pytest.raises(ValueError) as exc_info:
BM25HybridSearchAlgorithm(fusion="invalid")
assert "Invalid fusion algorithm 'invalid'" in str(exc_info.value)
assert "Must be 'rrf' or 'dbsf'" in str(exc_info.value)
@pytest.mark.unit
def test_bm25_hybrid_requires_vector_db():
"""Test BM25HybridSearchAlgorithm reports it requires vector database."""
algo = BM25HybridSearchAlgorithm()
assert algo.requires_vector_db is True
+135
@@ -0,0 +1,135 @@
"""Unit tests for SearchResult validation."""
import pytest
from nextcloud_mcp_server.search.algorithms import SearchResult
@pytest.mark.unit
def test_search_result_rrf_score_in_range():
"""Test SearchResult accepts RRF scores in [0.0, 1.0] range."""
result = SearchResult(
id=1,
doc_type="note",
title="Test Note",
excerpt="Test excerpt",
score=0.85,
)
assert result.score == 0.85
@pytest.mark.unit
def test_search_result_rrf_score_at_lower_bound():
"""Test SearchResult accepts RRF score at lower bound (0.0)."""
result = SearchResult(
id=1,
doc_type="note",
title="Test Note",
excerpt="Test excerpt",
score=0.0,
)
assert result.score == 0.0
@pytest.mark.unit
def test_search_result_rrf_score_at_upper_bound():
"""Test SearchResult accepts RRF score at upper bound (1.0)."""
result = SearchResult(
id=1,
doc_type="note",
title="Test Note",
excerpt="Test excerpt",
score=1.0,
)
assert result.score == 1.0
@pytest.mark.unit
def test_search_result_dbsf_score_above_one():
"""Test SearchResult accepts DBSF scores > 1.0.
DBSF (Distribution-Based Score Fusion) sums normalized scores from multiple
systems (dense semantic + sparse BM25), so scores can exceed 1.0 when both
systems strongly agree a document is relevant.
"""
# Typical DBSF score when both systems agree
result = SearchResult(
id=1,
doc_type="note",
title="Highly Relevant Note",
excerpt="Contains keywords and is semantically similar",
score=1.55,
)
assert result.score == 1.55
@pytest.mark.unit
def test_search_result_dbsf_score_edge_case():
"""Test SearchResult accepts DBSF maximum theoretical score (2.0).
Maximum DBSF score with 2 systems: 1.0 (dense) + 1.0 (sparse) = 2.0
"""
result = SearchResult(
id=1,
doc_type="note",
title="Perfect Match",
excerpt="Perfect semantic and keyword match",
score=2.0,
)
assert result.score == 2.0
@pytest.mark.unit
def test_search_result_negative_score_raises_error():
"""Test SearchResult rejects negative scores."""
with pytest.raises(ValueError) as exc_info:
SearchResult(
id=1,
doc_type="note",
title="Test Note",
excerpt="Test excerpt",
score=-0.1,
)
assert "Score must be non-negative" in str(exc_info.value)
assert "got -0.1" in str(exc_info.value)
@pytest.mark.unit
def test_search_result_with_metadata():
"""Test SearchResult with optional metadata field."""
result = SearchResult(
id=1,
doc_type="note",
title="Test Note",
excerpt="Test excerpt",
score=1.25,
metadata={"fusion_method": "dbsf", "dense_score": 0.8, "sparse_score": 0.45},
)
assert result.score == 1.25
assert result.metadata["fusion_method"] == "dbsf"
assert result.metadata["dense_score"] == 0.8
assert result.metadata["sparse_score"] == 0.45
@pytest.mark.unit
def test_search_result_with_chunk_offsets():
"""Test SearchResult with chunk offset information."""
result = SearchResult(
id=1,
doc_type="note",
title="Test Note",
excerpt="matching chunk text",
score=0.9,
chunk_start_offset=100,
chunk_end_offset=500,
)
assert result.chunk_start_offset == 100
assert result.chunk_end_offset == 500
+8 -8
@@ -159,8 +159,8 @@ class TestChunkConfigValidation:
def test_default_chunk_settings(self):
"""Test default chunk size and overlap values."""
settings = Settings()
-assert settings.document_chunk_size == 512
-assert settings.document_chunk_overlap == 50
+assert settings.document_chunk_size == 2048
+assert settings.document_chunk_overlap == 200
def test_valid_chunk_settings(self):
"""Test valid chunk size and overlap configuration."""
@@ -205,7 +205,7 @@ class TestChunkConfigValidation:
)
def test_small_chunk_size_warning(self, caplog):
-"""Test that chunk size < 100 triggers warning."""
+"""Test that chunk size < 512 triggers warning."""
import logging
caplog.set_level(logging.WARNING, logger="nextcloud_mcp_server.config")
@@ -214,19 +214,19 @@ class TestChunkConfigValidation:
document_chunk_overlap=10,
)
assert (
-"DOCUMENT_CHUNK_SIZE is set to 64 words, which is quite small"
+"DOCUMENT_CHUNK_SIZE is set to 64 characters, which is quite small"
in caplog.text
)
-assert "Consider using at least 256 words" in caplog.text
+assert "Consider using at least 1024 characters" in caplog.text
def test_reasonable_chunk_size_no_warning(self, caplog):
-"""Test that chunk size >= 100 doesn't trigger warning."""
+"""Test that chunk size >= 512 doesn't trigger warning."""
import logging
caplog.set_level(logging.WARNING, logger="nextcloud_mcp_server.config")
Settings(
-document_chunk_size=256,
-document_chunk_overlap=25,
+document_chunk_size=1024,
+document_chunk_overlap=100,
)
assert "DOCUMENT_CHUNK_SIZE" not in caplog.text
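Review note: the updated assertions switch the unit from words to characters and imply a warning threshold of 512 with a suggested minimum of 1024. A minimal sketch of a check that would produce the asserted log fragments, assuming a standalone helper; `warn_if_chunk_size_small` and the two constants are hypothetical names, and the exact wording between the two asserted fragments is a guess:

```python
import logging

# Same logger name the tests pass to caplog.set_level().
logger = logging.getLogger("nextcloud_mcp_server.config")

# Values inferred from the updated tests; the real Settings class may differ.
SMALL_CHUNK_THRESHOLD = 512
RECOMMENDED_MIN_CHUNK = 1024


def warn_if_chunk_size_small(chunk_size: int) -> bool:
    """Log a warning for small chunk sizes; return True if one was logged."""
    if chunk_size < SMALL_CHUNK_THRESHOLD:
        logger.warning(
            "DOCUMENT_CHUNK_SIZE is set to %d characters, which is quite small. "
            "Consider using at least %d characters.",
            chunk_size,
            RECOMMENDED_MIN_CHUNK,
        )
        return True
    return False
```

With these values a chunk size of 64 warns, while 1024 (the value used in `test_reasonable_chunk_size_no_warning`) does not.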
+288
@@ -0,0 +1,288 @@
"""Unit tests for DocumentChunker with LangChain text splitters."""
from nextcloud_mcp_server.vector.document_chunker import (
ChunkWithPosition,
DocumentChunker,
)
class TestDocumentChunkerPositions:
"""Test suite for DocumentChunker position tracking functionality."""
def test_single_chunk_simple_text(self):
"""Test that single-chunk documents return correct positions."""
chunker = DocumentChunker(chunk_size=2048, overlap=200)
content = "This is a short document."
chunks = chunker.chunk_text(content)
assert len(chunks) == 1
assert isinstance(chunks[0], ChunkWithPosition)
assert chunks[0].text == content
assert chunks[0].start_offset == 0
assert chunks[0].end_offset == len(content)
def test_multiple_chunks_positions(self):
"""Test that multi-chunk documents have correct positions."""
# Use small chunk size to force multiple chunks
chunker = DocumentChunker(chunk_size=50, overlap=10)
# Create content longer than chunk size
content = (
"This is the first sentence with some important content. "
"This is the second sentence with more details. "
"This is the third sentence continuing the discussion. "
"This is the fourth sentence adding more context."
)
chunks = chunker.chunk_text(content)
# Verify we got multiple chunks
assert len(chunks) > 1
# Verify all chunks are ChunkWithPosition
for chunk in chunks:
assert isinstance(chunk, ChunkWithPosition)
# Verify first chunk starts at 0
assert chunks[0].start_offset == 0
# Verify last chunk ends at content length
assert chunks[-1].end_offset == len(content)
# Verify chunks are contiguous or overlap (minimal gaps allowed)
for i in range(len(chunks) - 1):
# Next chunk should start at or near current chunk end
# Allow small gaps (1-2 chars) for whitespace/punctuation at boundaries
gap = chunks[i + 1].start_offset - chunks[i].end_offset
assert gap <= 2, f"Gap too large between chunks: {gap} characters"
# Verify we can reconstruct the content using positions
for chunk in chunks:
extracted = content[chunk.start_offset : chunk.end_offset]
assert extracted == chunk.text
def test_chunk_positions_with_whitespace(self):
"""Test position tracking with various whitespace."""
chunker = DocumentChunker(chunk_size=30, overlap=5)
content = "First sentence here. Second sentence.\n\nThird sentence.\tFourth sentence."
chunks = chunker.chunk_text(content)
# Verify positions correctly handle whitespace
for chunk in chunks:
extracted = content[chunk.start_offset : chunk.end_offset]
assert extracted == chunk.text
# LangChain strips whitespace by default
assert len(chunk.text.strip()) > 0
def test_empty_content(self):
"""Test that empty content returns a single empty chunk."""
chunker = DocumentChunker(chunk_size=2048, overlap=200)
content = ""
chunks = chunker.chunk_text(content)
assert len(chunks) == 1
assert chunks[0].text == ""
assert chunks[0].start_offset == 0
assert chunks[0].end_offset == 0
def test_chunk_overlap_positions(self):
"""Test that overlapping chunks have correct positions."""
chunker = DocumentChunker(chunk_size=50, overlap=15)
content = (
"This is sentence one with content. "
"This is sentence two with more. "
"This is sentence three continuing. "
"This is sentence four adding details."
)
chunks = chunker.chunk_text(content)
# Verify overlap exists if we have multiple chunks
if len(chunks) > 1:
for i in range(len(chunks) - 1):
current_chunk = chunks[i]
next_chunk = chunks[i + 1]
# Verify positions are valid
assert next_chunk.start_offset >= 0
assert current_chunk.end_offset <= len(content)
# With overlap, next chunk may start before current ends
assert next_chunk.start_offset <= current_chunk.end_offset
def test_unicode_content_positions(self):
"""Test position tracking with Unicode characters."""
chunker = DocumentChunker(chunk_size=50, overlap=10)
content = (
"Hello 世界. こんにちは there. мир Привет world. שלום مرحبا 你好 friend."
)
chunks = chunker.chunk_text(content)
# Verify all chunks extract correctly
for chunk in chunks:
extracted = content[chunk.start_offset : chunk.end_offset]
assert extracted == chunk.text
# Verify full coverage
if len(chunks) == 1:
assert chunks[0].start_offset == 0
assert chunks[0].end_offset == len(content)
def test_realistic_note_content(self):
"""Test with realistic note content similar to Nextcloud Notes."""
chunker = DocumentChunker(chunk_size=200, overlap=50)
content = """My Project Notes
This is a note about my project. It contains several paragraphs of text
that should be chunked appropriately for embedding.
## Key Points
- First important point with some details
- Second point that needs to be remembered
- Third point for future reference
The document continues with more content here. We want to make sure that
the chunking preserves context across boundaries while maintaining proper
position tracking for each chunk.
This allows us to highlight the exact chunk that matched a search query,
which builds trust in the RAG system."""
chunks = chunker.chunk_text(content)
# Should have multiple chunks
assert len(chunks) > 1
# Verify all chunks
for chunk in chunks:
assert isinstance(chunk, ChunkWithPosition)
# Verify extraction
extracted = content[chunk.start_offset : chunk.end_offset]
assert extracted == chunk.text
# Verify positions are valid
assert chunk.start_offset >= 0
assert chunk.end_offset <= len(content)
assert chunk.start_offset < chunk.end_offset
def test_semantic_boundary_preservation(self):
"""Test that LangChain creates semantically coherent chunks."""
chunker = DocumentChunker(chunk_size=100, overlap=20)
content = (
"First sentence is here. "
"Second sentence follows. "
"Third sentence continues. "
"Fourth sentence ends."
)
chunks = chunker.chunk_text(content)
# Verify all chunks are extractable using their positions
for chunk in chunks:
extracted = content[chunk.start_offset : chunk.end_offset]
assert extracted == chunk.text
# Verify chunk text is meaningful (not empty or just whitespace)
assert len(chunk.text.strip()) > 0
# Verify positions are valid
assert chunk.start_offset >= 0
assert chunk.end_offset <= len(content)
assert chunk.start_offset < chunk.end_offset
def test_paragraph_boundary_preservation(self):
"""Test that LangChain preserves paragraph boundaries."""
chunker = DocumentChunker(chunk_size=80, overlap=15)
content = """First paragraph here.
Second paragraph here.
Third paragraph here.
Fourth paragraph here."""
chunks = chunker.chunk_text(content)
# LangChain should prefer splitting at paragraph boundaries (\n\n)
# Verify we got at least one chunk
assert len(chunks) >= 1
# Verify all positions work correctly
for chunk in chunks:
extracted = content[chunk.start_offset : chunk.end_offset]
assert extracted == chunk.text
def test_default_parameters(self):
"""Test that default parameters work correctly."""
chunker = DocumentChunker() # Use defaults: 2048 chars, 200 overlap
# Create content that's smaller than default chunk size
content = (
"This is a short note with a few sentences. It should fit in one chunk."
)
chunks = chunker.chunk_text(content)
assert len(chunks) == 1
assert chunks[0].text == content
assert chunks[0].start_offset == 0
assert chunks[0].end_offset == len(content)
def test_large_document_chunking(self):
"""Test chunking of a large document."""
chunker = DocumentChunker(chunk_size=100, overlap=20)
# Create a large document with multiple paragraphs
paragraphs = [
f"This is paragraph {i} with some meaningful content about topic {i}. "
f"It contains multiple sentences to make it realistic. "
f"The content should be properly chunked."
for i in range(10)
]
content = "\n\n".join(paragraphs)
chunks = chunker.chunk_text(content)
# Should create multiple chunks
assert len(chunks) > 1
# Verify all chunks are valid
for chunk in chunks:
assert isinstance(chunk, ChunkWithPosition)
assert len(chunk.text) > 0
# Verify extraction
extracted = content[chunk.start_offset : chunk.end_offset]
assert extracted == chunk.text
# Verify first and last positions
assert chunks[0].start_offset == 0
assert chunks[-1].end_offset == len(content)
def test_position_tracking_with_overlap(self):
"""Test that position tracking works correctly with overlap."""
chunker = DocumentChunker(chunk_size=50, overlap=15)
content = "A" * 25 + ". " + "B" * 25 + ". " + "C" * 25 + ". " + "D" * 25 + "."
chunks = chunker.chunk_text(content)
if len(chunks) > 1:
# Verify overlap creates correct positions
for i in range(len(chunks) - 1):
# Each chunk should be extractable
assert (
content[chunks[i].start_offset : chunks[i].end_offset]
== chunks[i].text
)
# Next chunk should overlap with current
# (start before current ends)
if chunks[i + 1].start_offset < chunks[i].end_offset:
# There is overlap - verify content matches
overlap_start = chunks[i + 1].start_offset
overlap_end = chunks[i].end_offset
overlap_text = content[overlap_start:overlap_end]
assert overlap_text in chunks[i].text
assert overlap_text in chunks[i + 1].text
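Review note: every test in this file relies on one invariant, `content[chunk.start_offset:chunk.end_offset] == chunk.text`. The sketch below shows that invariant with a plain sliding window over characters. It is not the LangChain-based splitter the PR uses (it ignores sentence and paragraph boundaries and does not strip whitespace); `ChunkSketch` and `chunk_with_positions` are hypothetical names used only for illustration:

```python
from dataclasses import dataclass


@dataclass
class ChunkSketch:
    """Hypothetical stand-in for ChunkWithPosition."""

    text: str
    start_offset: int
    end_offset: int


def chunk_with_positions(
    content: str, chunk_size: int = 2048, overlap: int = 200
) -> list[ChunkSketch]:
    """Split content into fixed-size character windows and record offsets."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if len(content) <= chunk_size:
        # Mirrors the tests: short or empty input yields exactly one chunk.
        return [ChunkSketch(content, 0, len(content))]

    chunks = []
    start = 0
    while True:
        end = min(start + chunk_size, len(content))
        chunks.append(ChunkSketch(content[start:end], start, end))
        if end == len(content):
            break
        # Step back so neighbouring chunks share `overlap` characters.
        start = end - overlap
    return chunks
```

Because offsets are Python string indices, the same slicing works for the Unicode test content without any byte-level bookkeeping.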
+217
@@ -0,0 +1,217 @@
"""
Unit tests for @instrument_tool decorator.
Tests that the decorator correctly instruments MCP tools with both
Prometheus metrics and OpenTelemetry tracing.
"""
from unittest.mock import MagicMock, patch
import pytest
from nextcloud_mcp_server.observability.metrics import instrument_tool
pytestmark = pytest.mark.unit
@pytest.fixture
def mock_metrics():
"""Mock Prometheus metrics."""
with (
patch(
"nextcloud_mcp_server.observability.metrics.record_tool_call"
) as mock_record,
patch(
"nextcloud_mcp_server.observability.metrics.record_tool_error"
) as mock_error,
):
yield {"record_tool_call": mock_record, "record_tool_error": mock_error}
@pytest.fixture
def mock_tracer():
"""Mock OpenTelemetry tracer."""
with patch(
"nextcloud_mcp_server.observability.tracing.trace_operation"
) as mock_trace:
# Configure mock to act as a context manager that allows exceptions to propagate
mock_trace.return_value.__enter__ = MagicMock(return_value=None)
mock_trace.return_value.__exit__ = MagicMock(
return_value=False
) # Return False to allow exceptions to propagate
yield mock_trace
class TestInstrumentToolDecorator:
"""Test the @instrument_tool decorator."""
async def test_decorator_creates_trace_span(self, mock_tracer, mock_metrics):
"""Test that decorator creates OpenTelemetry span with correct attributes."""
@instrument_tool
async def example_tool(query: str, limit: int = 10):
return {"results": []}
# Call the tool
await example_tool(query="test query", limit=5)
# Verify trace_operation was called with correct parameters
mock_tracer.assert_called_once()
call_args = mock_tracer.call_args
# Check span name
assert call_args[0][0] == "mcp.tool.example_tool"
# Check span attributes
attributes = call_args[1]["attributes"]
assert attributes["mcp.tool.name"] == "example_tool"
assert "query" in attributes["mcp.tool.args"]
assert "test query" in attributes["mcp.tool.args"]
assert "limit" in attributes["mcp.tool.args"]
# Verify record_exception parameter
assert call_args[1]["record_exception"] is True
async def test_decorator_sanitizes_sensitive_arguments(
self, mock_tracer, mock_metrics
):
"""Test that sensitive arguments are excluded from span attributes."""
@instrument_tool
async def example_tool(
query: str, password: str, token: str, api_key: str, ctx: object
):
return {"success": True}
# Call with sensitive parameters
await example_tool(
query="test",
password="secret123",
token="bearer_token",
api_key="api_key_123",
ctx=MagicMock(),
)
# Verify trace was created
mock_tracer.assert_called_once()
attributes = mock_tracer.call_args[1]["attributes"]
# Check that sensitive fields are NOT in attributes
tool_args = attributes["mcp.tool.args"]
assert "password" not in tool_args
assert "secret123" not in tool_args
assert "token" not in tool_args
assert "bearer_token" not in tool_args
assert "api_key" not in tool_args
assert "api_key_123" not in tool_args
assert "ctx" not in tool_args
# Check that non-sensitive field IS included
assert "query" in tool_args
assert "test" in tool_args
async def test_decorator_limits_argument_string_length(
self, mock_tracer, mock_metrics
):
"""Test that tool arguments are limited to 500 characters."""
@instrument_tool
async def example_tool(query: str):
return {"results": []}
# Create a very long query string (>500 chars)
long_query = "x" * 1000
await example_tool(query=long_query)
# Verify arguments were truncated
mock_tracer.assert_called_once()
attributes = mock_tracer.call_args[1]["attributes"]
tool_args = attributes["mcp.tool.args"]
assert len(tool_args) <= 500
async def test_decorator_records_success_metrics(self, mock_tracer, mock_metrics):
"""Test that successful tool execution records metrics."""
@instrument_tool
async def example_tool():
return {"success": True}
# Call the tool
await example_tool()
# Verify success metrics were recorded
mock_metrics["record_tool_call"].assert_called_once()
call_args = mock_metrics["record_tool_call"].call_args
assert call_args[0][0] == "example_tool" # tool_name
assert isinstance(call_args[0][1], float) # duration
assert call_args[0][2] == "success" # status
async def test_decorator_records_error_metrics(self, mock_tracer, mock_metrics):
"""Test that tool errors are recorded in metrics."""
@instrument_tool
async def failing_tool():
raise ValueError("Test error")
# Call the tool and expect exception
with pytest.raises(ValueError, match="Test error"):
await failing_tool()
# Verify error metrics were recorded
mock_metrics["record_tool_call"].assert_called_once()
call_args = mock_metrics["record_tool_call"].call_args
assert call_args[0][0] == "failing_tool" # tool_name
assert isinstance(call_args[0][1], float) # duration
assert call_args[0][2] == "error" # status
# Verify error type was recorded
mock_metrics["record_tool_error"].assert_called_once()
error_args = mock_metrics["record_tool_error"].call_args
assert error_args[0][0] == "failing_tool" # tool_name
assert error_args[0][1] == "ValueError" # error_type
async def test_decorator_preserves_function_metadata(
self, mock_tracer, mock_metrics
):
"""Test that decorator preserves function name and docstring."""
@instrument_tool
async def example_tool():
"""This is a test tool."""
return {"success": True}
# Verify function metadata is preserved
assert example_tool.__name__ == "example_tool"
assert example_tool.__doc__ == "This is a test tool."
async def test_decorator_preserves_return_value(self, mock_tracer, mock_metrics):
"""Test that decorator returns the original function's return value."""
@instrument_tool
async def example_tool(value: int):
return {"result": value * 2}
# Call the tool
result = await example_tool(value=5)
# Verify return value is unchanged
assert result == {"result": 10}
async def test_decorator_with_no_arguments(self, mock_tracer, mock_metrics):
"""Test decorator with tool that takes no arguments."""
@instrument_tool
async def no_args_tool():
return {"status": "ok"}
# Call the tool
await no_args_tool()
# Verify tracing works with no arguments
mock_tracer.assert_called_once()
attributes = mock_tracer.call_args[1]["attributes"]
# tool_args should be None when there are no kwargs
assert attributes["mcp.tool.args"] is None
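Review note: taken together, the tests describe a decorator that drops sensitive keyword arguments, caps the stringified arguments at 500 characters, times the call, records a success or error status, and re-raises exceptions. A minimal sketch of that shape, assuming an in-memory list in place of the Prometheus and OpenTelemetry backends; `instrument_tool_sketch`, `SENSITIVE_KEYS`, and `recorded_calls` are hypothetical names, and the real `instrument_tool` may be structured differently:

```python
import asyncio
import functools
import time

# Argument names the tests expect to be excluded from span attributes.
SENSITIVE_KEYS = {"password", "token", "api_key", "ctx"}
MAX_ARGS_LENGTH = 500

# Stand-in for the metrics/tracing backends: (name, duration, status, args).
recorded_calls = []


def instrument_tool_sketch(func):
    """Hypothetical decorator: sanitize kwargs, time the call, record outcome."""

    @functools.wraps(func)  # preserves __name__ and __doc__
    async def wrapper(*args, **kwargs):
        safe = {k: v for k, v in kwargs.items() if k not in SENSITIVE_KEYS}
        # None when the tool is called without keyword arguments.
        tool_args = str(safe)[:MAX_ARGS_LENGTH] if kwargs else None
        start = time.perf_counter()
        status = "success"
        try:
            return await func(*args, **kwargs)
        except Exception:
            status = "error"
            raise  # the caller still sees the original exception
        finally:
            duration = time.perf_counter() - start
            recorded_calls.append((func.__name__, duration, status, tool_args))

    return wrapper
```

Recording in `finally` keeps the success and error paths symmetric, which matches the tests asserting that `record_tool_call` is invoked exactly once in both cases.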
+587
@@ -0,0 +1,587 @@
#!/usr/bin/env python3
"""RAG Evaluation Management CLI.
Commands:
generate - Generate ground truth answers from nfcorpus dataset
upload - Upload nfcorpus documents as Nextcloud notes
Usage:
# Generate ground truth
uv run python tools/rag_eval_cli.py generate
# Upload corpus to Nextcloud
uv run python tools/rag_eval_cli.py upload --nextcloud-url http://localhost:8000 --username admin --password admin
"""
import io
import json
import sys
import zipfile
from pathlib import Path
from typing import Any
import anyio
import click
import httpx
from datasets import load_dataset
from httpx import BasicAuth
# Add parent directory to path to import from tests/
sys.path.insert(0, str(Path(__file__).parent.parent))
from nextcloud_mcp_server.client import NextcloudClient
from tests.rag_evaluation.llm_providers import create_llm_provider
# Paths
FIXTURES_DIR = Path(__file__).parent.parent / "tests" / "rag_evaluation" / "fixtures"
CORPUS_DIR = FIXTURES_DIR / "nfcorpus"
GROUND_TRUTH_FILE = FIXTURES_DIR / "ground_truth.json"
NOTE_MAPPING_FILE = FIXTURES_DIR / "note_mapping.json"
# Dataset URL
NFCORPUS_URL = (
"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/nfcorpus.zip"
)
# Selected test queries (from ADR-013)
SELECTED_QUERIES = [
"PLAIN-2630", # Alkylphenol Endocrine Disruptors and Allergies
"PLAIN-2660", # How Long to Detox From Fish Before Pregnancy?
"PLAIN-2510", # Coffee and Artery Function
"PLAIN-2430", # Preventing Brain Loss with B Vitamins?
"PLAIN-2690", # Chronic Headaches and Pork Tapeworms
]
def ensure_corpus_downloaded(force_download: bool = False) -> Path:
"""Ensure nfcorpus dataset is downloaded to fixtures directory.
Args:
force_download: Force re-download even if corpus exists
Returns:
Path to corpus directory
Raises:
RuntimeError: If download fails
"""
if CORPUS_DIR.exists() and not force_download:
click.echo(f"Corpus already exists at {CORPUS_DIR}")
return CORPUS_DIR
click.echo(f"Downloading nfcorpus dataset to {CORPUS_DIR}...")
# Create fixtures directory
FIXTURES_DIR.mkdir(parents=True, exist_ok=True)
# Download using HuggingFace datasets library (handles caching)
try:
# Download corpus
click.echo(" Downloading corpus...")
corpus_dataset = load_dataset(
"BeIR/nfcorpus",
"corpus",
split="corpus",
)
# Download queries
click.echo(" Downloading queries...")
queries_dataset = load_dataset(
"BeIR/nfcorpus",
"queries",
split="queries",
)
# Save to local fixtures directory as JSONL
CORPUS_DIR.mkdir(parents=True, exist_ok=True)
# Save corpus
with open(CORPUS_DIR / "corpus.jsonl", "w") as f:
for doc in corpus_dataset:
f.write(json.dumps(doc) + "\n")
# Save queries
with open(CORPUS_DIR / "queries.jsonl", "w") as f:
for query in queries_dataset:
f.write(json.dumps(query) + "\n")
# Download qrels from BEIR directly (not available via HuggingFace)
click.echo(" Downloading qrels from BEIR ZIP...")
with httpx.Client(timeout=300.0) as client:
response = client.get(NFCORPUS_URL)
response.raise_for_status()
# Extract qrels from ZIP
with zipfile.ZipFile(io.BytesIO(response.content)) as zf:
# The qrels are in nfcorpus/qrels/test.tsv within the ZIP
qrels_path = "nfcorpus/qrels/test.tsv"
qrels_dir = CORPUS_DIR / "qrels"
qrels_dir.mkdir(parents=True, exist_ok=True)
qrels_content = zf.read(qrels_path).decode("utf-8")
with open(qrels_dir / "test.tsv", "w") as f:
f.write(qrels_content)
click.echo(f"Dataset downloaded to {CORPUS_DIR}")
return CORPUS_DIR
except Exception as e:
raise RuntimeError(f"Failed to download nfcorpus dataset: {e}") from e
def load_corpus(corpus_dir: Path) -> dict[str, dict]:
"""Load corpus documents from local directory.
Args:
corpus_dir: Path to corpus directory
Returns:
Dict mapping document ID to document data
"""
corpus = {}
with open(corpus_dir / "corpus.jsonl") as f:
for line in f:
doc = json.loads(line)
corpus[doc["_id"]] = doc
return corpus
def load_queries(corpus_dir: Path) -> dict[str, dict]:
"""Load queries from local directory.
Args:
corpus_dir: Path to corpus directory
Returns:
Dict mapping query ID to query data
"""
queries = {}
with open(corpus_dir / "queries.jsonl") as f:
for line in f:
query = json.loads(line)
queries[query["_id"]] = query
return queries
def load_qrels(corpus_dir: Path) -> dict[str, list[tuple[str, int]]]:
"""Load query relevance judgments from local directory.
Args:
corpus_dir: Path to corpus directory
Returns:
Dict mapping query ID to list of (doc_id, score) tuples
"""
qrels: dict[str, list[tuple[str, int]]] = {}
with open(corpus_dir / "qrels" / "test.tsv") as f:
next(f) # Skip header
for line in f:
query_id, corpus_id, score = line.strip().split("\t")
if query_id not in qrels:
qrels[query_id] = []
qrels[query_id].append((corpus_id, int(score)))
# Sort by score descending
for query_id in qrels:
qrels[query_id].sort(key=lambda x: x[1], reverse=True)
return qrels
async def generate_ground_truth_answer(
query_text: str, relevant_docs: list[dict[str, Any]], llm
) -> str:
"""Generate ground truth answer from highly relevant documents.
Args:
query_text: The query/question
relevant_docs: List of highly relevant documents (top 5)
llm: LLM provider instance
Returns:
Generated ground truth answer
"""
# Construct context from documents
context_parts = []
for i, doc in enumerate(relevant_docs, 1):
context_parts.append(
f"Document {i}:\nTitle: {doc['title']}\nText: {doc['text']}\n"
)
context = "\n".join(context_parts)
# Generate ground truth
prompt = f"""Based on the following medical/biomedical documents, provide a comprehensive, factual answer to this question.
Question: {query_text}
{context}
Instructions:
- Provide a clear, well-structured answer that synthesizes information from the documents
- Focus on accuracy and completeness
- Use specific facts and findings from the documents
- Keep the answer concise but informative (2-4 paragraphs)
- Do not make up information not present in the documents
Answer:"""
click.echo(f" Generating answer for: {query_text}")
answer = await llm.generate(prompt, max_tokens=500)
click.echo(f" Generated {len(answer)} characters")
return answer.strip()
@click.group()
def cli():
"""RAG Evaluation Management CLI.
Manage ground truth generation and corpus upload for RAG evaluation tests.
"""
pass
@cli.command()
@click.option(
"--provider",
type=click.Choice(["ollama", "anthropic"]),
default="ollama",
help="LLM provider to use for generation",
)
@click.option(
"--model",
help="Model name (default: llama3.2:1b for Ollama, claude-3-5-sonnet-20241022 for Anthropic)",
)
@click.option(
"--force-download",
is_flag=True,
help="Force re-download of nfcorpus dataset",
)
def generate(provider: str, model: str | None, force_download: bool):
"""Generate ground truth answers for RAG evaluation.
This command:
1. Downloads nfcorpus dataset (if not already cached)
2. For each selected query, extracts highly relevant documents
3. Uses an LLM to synthesize a reference answer
4. Saves ground truth to fixtures/ground_truth.json
Environment variables:
RAG_EVAL_PROVIDER: Provider type (ollama or anthropic)
RAG_EVAL_OLLAMA_BASE_URL: Ollama base URL
RAG_EVAL_OLLAMA_MODEL: Ollama model name
RAG_EVAL_ANTHROPIC_API_KEY: Anthropic API key
RAG_EVAL_ANTHROPIC_MODEL: Anthropic model name
"""
async def _generate():
click.echo("=" * 80)
click.echo("RAG Ground Truth Generation")
click.echo("=" * 80)
# Ensure corpus is downloaded
corpus_dir = ensure_corpus_downloaded(force_download)
# Load dataset
click.echo("\nLoading nfcorpus dataset...")
corpus = load_corpus(corpus_dir)
queries = load_queries(corpus_dir)
qrels = load_qrels(corpus_dir)
click.echo(f"Loaded {len(corpus)} documents, {len(queries)} queries")
# Create LLM provider
click.echo("\nInitializing LLM provider...")
try:
llm = create_llm_provider(
provider=provider,
ollama_model=model if provider == "ollama" else None,
anthropic_model=model if provider == "anthropic" else None,
)
provider_type = type(llm).__name__
click.echo(f"Using provider: {provider_type}")
except ValueError as e:
click.echo(f"\nError: {e}", err=True)
return 1
# Generate ground truth for each selected query
ground_truth_data = []
try:
for query_id in SELECTED_QUERIES:
if query_id not in queries:
click.echo(
f"\nWarning: Query {query_id} not found in dataset", err=True
)
continue
query = queries[query_id]
query_text = query["text"]
# Get highly relevant documents (score=2)
if query_id not in qrels:
click.echo(
f"\nWarning: No relevance judgments for {query_id}", err=True
)
continue
highly_relevant_doc_ids = [
doc_id for doc_id, score in qrels[query_id] if score == 2
]
if not highly_relevant_doc_ids:
click.echo(
f"\nWarning: No highly relevant docs for {query_id}", err=True
)
continue
# Get top 5 highly relevant documents
relevant_docs = []
for doc_id in highly_relevant_doc_ids[:5]:
if doc_id in corpus:
relevant_docs.append(corpus[doc_id])
if not relevant_docs:
click.echo(
f"\nWarning: Could not load documents for {query_id}", err=True
)
continue
# Generate ground truth answer
click.echo(f"\n{'-' * 80}")
ground_truth_answer = await generate_ground_truth_answer(
query_text, relevant_docs, llm
)
# Store result
ground_truth_data.append(
{
"query_id": query_id,
"query_text": query_text,
"ground_truth_answer": ground_truth_answer,
"expected_document_ids": highly_relevant_doc_ids,
"highly_relevant_count": len(highly_relevant_doc_ids),
}
)
click.echo(f" Preview: {ground_truth_answer[:200]}...")
finally:
await llm.close()
# Save ground truth
GROUND_TRUTH_FILE.parent.mkdir(parents=True, exist_ok=True)
with open(GROUND_TRUTH_FILE, "w") as f:
json.dump(ground_truth_data, f, indent=2)
click.echo(f"\n{'=' * 80}")
click.echo(f"Generated {len(ground_truth_data)} ground truth answers")
click.echo(f"Saved to: {GROUND_TRUTH_FILE}")
click.echo("=" * 80)
return 0
sys.exit(anyio.run(_generate))
@cli.command()
@click.option(
"--nextcloud-url",
envvar="NEXTCLOUD_HOST",
required=True,
help="Nextcloud base URL (e.g., http://localhost:8000)",
)
@click.option(
"--username",
envvar="NEXTCLOUD_USERNAME",
required=True,
help="Nextcloud username",
)
@click.option(
"--password",
envvar="NEXTCLOUD_PASSWORD",
required=True,
help="Nextcloud password",
)
@click.option(
"--category",
default="nfcorpus_rag_eval",
help="Category/folder for uploaded notes",
)
@click.option(
"--force-download",
is_flag=True,
help="Force re-download of nfcorpus dataset",
)
@click.option(
"--force",
is_flag=True,
help="Delete all existing notes in the target category before uploading",
)
def upload(
nextcloud_url: str,
username: str,
password: str,
category: str,
force_download: bool,
force: bool,
):
"""Upload nfcorpus corpus documents as Nextcloud notes.
This command:
1. Downloads nfcorpus dataset (if not already cached)
2. Optionally deletes existing notes in target category (--force)
3. Uploads all corpus documents as Nextcloud notes
4. Saves the document ID to note ID mapping to fixtures/note_mapping.json
The note mapping file is used by pytest tests to map expected document IDs
to actual note IDs in Nextcloud.
"""
async def _upload():
click.echo("=" * 80)
click.echo("Upload nfcorpus Corpus to Nextcloud")
click.echo("=" * 80)
# Ensure corpus is downloaded
corpus_dir = ensure_corpus_downloaded(force_download)
# Load corpus
click.echo("\nLoading corpus...")
corpus = load_corpus(corpus_dir)
click.echo(f"Loaded {len(corpus)} documents")
# Create Nextcloud client
click.echo(f"\nConnecting to Nextcloud at {nextcloud_url}...")
nc_client = NextcloudClient(
base_url=nextcloud_url,
username=username,
auth=BasicAuth(username, password),
)
try:
# Delete existing notes in category if force is specified
if force:
click.echo(
f"\n--force specified: Deleting existing notes in category '{category}'..."
)
# Collect notes to delete
notes_to_delete = []
async for note in nc_client.notes.get_all_notes():
if note.get("category") == category:
notes_to_delete.append(note["id"])
if not notes_to_delete:
click.echo(f"No existing notes found in category '{category}'")
else:
click.echo(f"Found {len(notes_to_delete)} notes to delete")
deleted_count = 0
delete_errors = []
delete_semaphore = anyio.Semaphore(20)
async def delete_note(note_id: int):
"""Delete a single note."""
nonlocal deleted_count
async with delete_semaphore:
try:
await nc_client.notes.delete_note(note_id)
deleted_count += 1
if deleted_count % 100 == 0:
click.echo(f" Deleted {deleted_count} notes...")
except Exception as e:
error_msg = f"Error deleting note {note_id}: {e}"
delete_errors.append(error_msg)
click.echo(f" {error_msg}", err=True)
# Delete all notes concurrently
async with anyio.create_task_group() as tg:
for note_id in notes_to_delete:
tg.start_soon(delete_note, note_id)
click.echo(
f"Deleted {deleted_count} existing notes in category '{category}'"
)
if delete_errors:
click.echo(
f"Encountered {len(delete_errors)} errors during deletion",
err=True,
)
# Upload documents concurrently
click.echo(f"\nUploading {len(corpus)} documents as notes (concurrent)...")
click.echo(f"Category: {category}")
note_mapping = {}
uploaded_count = 0
upload_errors = []
# Semaphore to limit concurrent uploads (avoid overwhelming server)
max_concurrent = 20
semaphore = anyio.Semaphore(max_concurrent)
async def upload_document(doc_id: str, doc: dict[str, Any]):
"""Upload a single document as a note."""
nonlocal uploaded_count
async with semaphore:
title = f"[{doc_id}] {doc['title'][:100]}" # Truncate long titles
content = doc["text"]
try:
note_data = await nc_client.notes.create_note(
title=title,
content=content,
category=category,
)
# Store mapping
note_id = note_data["id"]
note_mapping[doc_id] = note_id
uploaded_count += 1
# Progress indicator every 100 docs
if uploaded_count % 100 == 0:
click.echo(
f" Uploaded {uploaded_count}/{len(corpus)} documents..."
)
except Exception as e:
error_msg = f"Error uploading {doc_id}: {e}"
upload_errors.append(error_msg)
click.echo(f" {error_msg}", err=True)
# Upload all documents concurrently using task group
async with anyio.create_task_group() as tg:
for doc_id, doc in corpus.items():
tg.start_soon(upload_document, doc_id, doc)
click.echo(f"\nUploaded {uploaded_count} documents successfully")
if upload_errors:
click.echo(
f"Encountered {len(upload_errors)} errors during upload", err=True
)
# Save note mapping
with open(NOTE_MAPPING_FILE, "w") as f:
json.dump(note_mapping, f, indent=2)
click.echo(f"Saved note mapping to: {NOTE_MAPPING_FILE}")
click.echo(f" Mapped {len(note_mapping)} document IDs to note IDs")
finally:
# Close the Nextcloud client
await nc_client.close()
click.echo("=" * 80)
click.echo("Upload complete!")
click.echo("=" * 80)
return 0
sys.exit(anyio.run(_upload))
if __name__ == "__main__":
cli()
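Review note: both the delete and upload paths in this CLI use the same pattern: a semaphore of 20 limits in-flight requests, every item gets its own task, and per-item failures are collected instead of aborting the batch. A minimal sketch of that pattern in isolation, written with stdlib `asyncio` rather than the `anyio` task groups the CLI uses so that it runs without extra dependencies; `run_bounded` and its parameters are hypothetical names:

```python
import asyncio


async def run_bounded(items, worker, max_concurrent: int = 20):
    """Run worker(item) for every item, at most max_concurrent at a time.

    Returns (results, errors): results maps item -> worker result,
    errors is a list of (item, exception) pairs.
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    results = {}
    errors = []

    async def guarded(item):
        async with semaphore:
            try:
                results[item] = await worker(item)
            except Exception as exc:
                # Collect and continue, as the CLI does for failed uploads.
                errors.append((item, exc))

    await asyncio.gather(*(guarded(item) for item in items))
    return results, errors
```

Catching the exception inside each task matters: with a task group, an uncaught exception in one task would cancel its siblings, so a single failed upload would stop the whole batch.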
Generated
+1783 -163
File diff suppressed because it is too large