Commit Graph

252 Commits

Author SHA1 Message Date
Chris Coutinho 050b4d2eeb Update plotly cdn to cloudflare 2025-11-26 21:50:39 +01:00
Chris Coutinho c4ce28f05d fix: Improve 3D plot rendering with explicit dimensions and window resize support
- Get container dimensions before creating Plotly layout to render at correct size immediately
- Add init() method with window resize listener for responsive plot sizing
- Remove post-render resize call (no longer needed with explicit dimensions)
- Improve colorbar positioning and scene domain configuration

This eliminates the visual "jump" during initial render and ensures the plot resizes smoothly when the browser window changes size.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 19:43:20 +01:00
Chris Coutinho c126c3ec03 fix: Preserve 3D plot camera and improve documentation
This commit addresses PR feedback and fixes plot camera behavior.

## JavaScript Fix - Camera Preservation
- Changed plot update strategy from recreating layout to using Plotly.restyle()
- Query point visibility now toggles via restyle() which only modifies trace visibility
- Camera position/zoom naturally preserved since layout remains untouched
- Resolves jumpy plot behavior when toggling "Show Query Point" checkbox

Related: nextcloud_mcp_server/auth/static/vector-viz.js:58-73

## Documentation Improvements
- Condensed vector-sync-ui.md from 316 to 94 lines (~70% reduction)
- Removed redundant FAQ section (content merged into main sections)
- Simplified use cases from 4 detailed sections to 3 focused paragraphs
- Streamlined troubleshooting to 3 common issues
- Merged technical details into overview section
- Retained all essential information while improving readability

## Screenshot Updates
Removed old/outdated images (5 files):
- rag-workflow-bidirectional-final.png
- rag-workflow-prominent-llm.png
- rag-workflow-simple-final.png
- vector-viz-interface.png
- welcome-page.png

Replaced with current screenshots (3 files):
- vector-viz-document-types-2col.png - Now shows plot + results
- vector-viz-chunk-context.png - Centered content view
- vector-viz-results.png - Updated results list

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 14:10:53 +01:00
Chris Coutinho 9bd02d7ef7 fix: Preserve 3D plot camera position and fix CSS loading
Two fixes for the vector visualization page:

1. **CSS Loading Fix**: Moved CSS <link> from vector_viz.html fragment
   to user_info.html <head> block. HTMX fragments don't process <link>
   tags in <head>, causing unstyled page. Now CSS loads correctly.

2. **Camera Preservation**: Modified renderPlot() to preserve camera
   position when toggling query point visibility. Previously, toggling
   the "Show Query Point" checkbox would reset zoom/rotation to default.
   Now reads existing camera settings from plot before updating.

Related: nextcloud_mcp_server/auth/static/vector-viz.js:123-130
Related: nextcloud_mcp_server/auth/templates/user_info.html:12

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 13:51:08 +01:00
Chris Coutinho 53689d076b feat: Improve vector visualization with static assets and fixes
- Extract CSS and JavaScript into separate static files
  - Created nextcloud_mcp_server/auth/static/vector-viz.css
  - Created nextcloud_mcp_server/auth/static/vector-viz.js
  - Updated templates to reference external assets

- Fix vector visualization issues:
  - Normalize vectors before PCA to match Qdrant's cosine distance
  - Add zero-norm and NaN detection/handling for large datasets
  - Enable responsive Plotly sizing (autosize + responsive config)
  - Widen plot area to full viewport width with minimized margins

- Improve visualization accuracy:
  - Query point now positioned correctly relative to documents
  - Handles 200+ points without JSON serialization errors
  - Full-width plot maximizes screen space utilization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 04:10:44 +01:00
Chris Coutinho 9db20a4d01 feat: Redesign UI to match Nextcloud ecosystem aesthetic
This commit updates the web interface to better align with Nextcloud's
design system and improve the Vector Viz layout.

Changes:
- Replace emoji icons with Material Design SVG icons for better
  consistency with Nextcloud apps
- Simplify navigation styling with minimal padding and subtle active
  states (250px width)
- Update CSS variables to match Nextcloud design system
- Restructure Vector Viz from two-column to single-column vertical
  layout for better plot visibility
- Move search controls to compact horizontal grid at top
- Make navigation toggle always visible (not just on mobile)
- Fix plot container sizing with overflow:visible to prevent colorbar
  clipping
- Remove heavy shadows and custom card styling for cleaner aesthetic
- Add error and success page templates with consistent styling

Technical details:
- Preserve Alpine.js for reactive functionality
- Use CSS Grid for responsive horizontal controls layout
- Add smooth transitions for navigation collapse/expand
- Maintain HTMX for dynamic content loading

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 00:45:19 +01:00
Chris Coutinho eec923eff5 feat: Replace custom document chunker with LangChain MarkdownTextSplitter
Migrates from custom word-based chunking to LangChain's MarkdownTextSplitter
for better semantic search quality. This implements the chunking portion of
ADR-011.

Changes:
- Replace custom regex word chunker with MarkdownTextSplitter
- Optimized for Markdown content (headers, code blocks, lists)
- Convert from word-based (512 words) to character-based (2048 chars) chunking
- Maintain backward-compatible ChunkWithPosition interface
- Update configuration defaults and validation
- Update all unit tests (12/12 passing)

Benefits:
- Respects markdown structure boundaries
- Never breaks code blocks or headers mid-chunk
- Preserves semantic coherence within chunks
- Expected 20-30% improvement in recall quality
- Industry-standard approach (used by production RAG systems)

Note: Full reindex required to apply new chunking to existing documents.
Current vector database still contains old word-based chunks.

Related: ADR-011 (Improving Semantic Search Quality)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 12:17:23 +01:00
Chris Coutinho d374bfa1e5 feat(viz): Add dual-score display and improve UI controls
This commit enhances the vector visualization interface with better score
transparency and improved UX:

**Dual-Score Display:**
- Store original algorithm scores before normalization (viz_routes.py:203)
- Display both raw and normalized scores: "Raw Score: 0.842 (89% relative)"
- Update plot hover text with dual scores (userinfo_routes.py:740)
- Fixes issue where all queries showed at least one 100% match regardless
  of actual relevance (normalization artifact)

**UI Improvements:**
1. Fusion Method dropdown: Changed from x-show to :disabled
   - Prevents jarring layout shift when switching algorithms
   - Dropdown stays visible but grayed out when Semantic is selected
   - Better UX with opacity: 0.5 and cursor: not-allowed

2. Score Threshold: Changed step from 0.1 to "any"
   - Allows arbitrary float precision (0.7, 0.85, 0.123)
   - Users can now fine-tune threshold values

3. Document Types: Converted multi-select to checkbox grid
   - Replaced clunky Ctrl/Cmd multi-select listbox
   - Checkbox grid with cleaner layout
   - Positioned left of Score Threshold and Result Limit inputs
   - More intuitive UX

**Technical Details:**
- Raw score ranges vary by algorithm:
  - Semantic: 0.0-1.0 (cosine similarity)
  - BM25 RRF: ~0.001-0.033 (Reciprocal Rank Fusion)
  - BM25 DBSF: Can exceed 1.0 (Distribution-Based Score Fusion)
- Normalized scores (0-1) used for visual encoding (marker size, color)
- Original scores preserved in API response via getattr fallback

Files modified:
- nextcloud_mcp_server/auth/viz_routes.py (store original_score)
- nextcloud_mcp_server/auth/templates/vector_viz.html (UI controls)
- nextcloud_mcp_server/auth/userinfo_routes.py (plot hover text)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 08:05:49 +01:00
Chris Coutinho 3aa7128f45 feat: add chunk position tracking to vector indexing and search
Track character offsets (start_offset, end_offset) for each chunk in vector
database metadata, enabling precise chunk highlighting in visualization pane.

Changes:
- processor.py: Store chunk_start_offset and chunk_end_offset in Qdrant metadata
- processor.py: Added metadata_version=2 to indicate position tracking support
- search/semantic.py: Return chunk positions from search results
- server/semantic.py: Expose chunk positions in API responses (SemanticSearchResult)

Enables viz pane to:
1. Display exact matched chunk with surrounding context
2. Highlight the precise portion of text that matched the query
3. Build user trust by showing what the RAG system actually retrieved

Position tracking uses ChunkWithPosition dataclass from document_chunker.py
which provides character-accurate offsets in the original document.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 06:47:58 +01:00
Chris Coutinho c3282534eb feat: add vector viz template and chunk context endpoint
Extracted vector visualization HTML template to separate file to resolve
syntax conflicts between Jinja2, Alpine.js, and CSS. Added chunk context
endpoint for fetching matched chunks with surrounding text.

Changes:
- Moved vector_viz.html to templates/ directory (separates Jinja2/Alpine.js/CSS)
- Added /app/chunk-context endpoint for retrieving chunk text with context
- Updated .dockerignore to include HTML files in Docker builds
- Moved anthropic and boto3 to main dependencies (needed for production features)
- Added jinja2 dependency for template rendering

Fixes Jinja2 TemplateSyntaxError caused by CSS colons being parsed as
Jinja2 syntax when template was inline in Python code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 06:46:52 +01:00
Chris Coutinho 862308418e fix: prevent infinite loop in DocumentChunker with position tracking
Fixed a critical infinite loop bug in document_chunker.py that occurred
when the overlap parameter caused the chunker to not make forward progress.

Changes:
- Added ChunkWithPosition dataclass to track character positions
- Refactored chunk_text() to use regex word matching for accurate position tracking
- Added safety check to ensure forward progress (next_start_idx > start_idx)
- Changed return type from list[str] to list[ChunkWithPosition]

The bug manifested when:
1. end_idx reached len(word_matches) (processing last chunk)
2. next_start_idx = end_idx - overlap would not advance past start_idx
3. Loop would continue indefinitely without making progress

Fix ensures chunker always terminates by breaking when not advancing.

All 9 unit tests now pass in 1.66s (previously timing out at 180s).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 06:39:15 +01:00
Chris Coutinho 3464b21845 fix: Relax SearchResult validation to support DBSF fusion scores > 1.0
Fix false-positive validation error where DBSF (Distribution-Based Score
Fusion) correctly produces scores > 1.0 but SearchResult validation
incorrectly rejected them.

**Root Cause**: SearchResult.__post_init__() enforced scores in [0.0, 1.0]
range, but DBSF sums normalized scores from multiple retrieval systems
(dense semantic + sparse BM25), resulting in scores like 1.55 when both
systems strongly agree a document is relevant.

**Changes**:
- Relaxed validation to allow any score ≥ 0.0 (algorithms.py:147-157)
- Updated SearchResult and SemanticSearchResult documentation to explain
  score ranges for RRF ([0.0, 1.0]) vs DBSF (unbounded)
- Added comprehensive test coverage for both fusion methods
- Added DBSF fusion option to vector visualization UI
- Updated viz routes and vizApp() to support fusion parameter selection

**Testing**: All 157 unit tests pass, type checking passes, ruff passes

Fixes error: "Configuration error: Score must be between 0.0 and 1.0, got 1.1528953"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 06:32:30 +01:00
Chris Coutinho 1504df6fb5 Merge branch 'master' into feature/bedrock 2025-11-16 12:08:23 +01:00
Chris Coutinho c28fc955ca Merge origin/master into feature/bm25
Resolved conflicts:
- viz_routes.py: Kept bm25's extract_dense_vector() function for robust vector handling
- hybrid.py: Removed (bm25 uses native Qdrant RRF fusion instead)
- uv.lock: Regenerated after accepting master's dependencies

This merge brings in:
- RAG evaluation framework (ADR-013)
- Performance optimizations (double-fetch elimination)
- Migration from asyncio to anyio
- OpenTelemetry tracing improvements
- Notes app enhancements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 11:52:40 +01:00
Chris Coutinho ad4b45889f fix: suppress Starlette middleware type warnings in ty checker 2025-11-16 11:43:50 +01:00
Chris Coutinho 5b484c9226 feat: add unified provider architecture with Amazon Bedrock support
Refactored LLM provider infrastructure to support sustainable additions of new providers with both embedding and text generation capabilities.

## Major Changes

### Unified Provider Architecture (ADR-015)
- Created `nextcloud_mcp_server/providers/` with unified Provider ABC
- Providers now support optional capabilities (embeddings and/or generation)
- Auto-detection registry with priority: Bedrock → Ollama → Simple
- Backward compatible - existing code continues to work

### New Providers
- **BedrockProvider**: Full Amazon Bedrock integration
  - Embeddings: Titan Embed, Cohere Embed models
  - Generation: Claude, Llama, Titan Text, Mistral models
  - Model-specific request/response handling
  - AWS credential chain integration
- **OllamaProvider**: Migrated with both capabilities support
- **AnthropicProvider**: Moved from test code to production providers
- **SimpleProvider**: Migrated in-memory fallback provider

### Breaking Changes
None - full backward compatibility maintained:
- `embedding.get_embedding_service()` still works
- RAG evaluation tests updated to use unified providers
- All existing tests pass (127 unit tests)

### Testing
- Added 9 comprehensive Bedrock unit tests with mocked boto3
- All existing unit tests pass
- Type checking (ty) and linting (ruff) pass
- Verified backward compatibility

### Documentation
- `docs/ADR-015-unified-provider-architecture.md`: Comprehensive ADR
- `docs/bedrock-setup.md`: AWS setup guide with IAM permissions
- `CLAUDE.md`: Updated with provider architecture section

### Dependencies
- Added `boto3>=1.35.0` to dev dependencies (optional)

## Environment Variables

### Bedrock
- `AWS_REGION`: AWS region (e.g., "us-east-1")
- `BEDROCK_EMBEDDING_MODEL`: Model ID for embeddings
- `BEDROCK_GENERATION_MODEL`: Model ID for generation
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`: Optional credentials

### Ollama
- `OLLAMA_BASE_URL`: API URL
- `OLLAMA_EMBEDDING_MODEL`: Embedding model (default: "nomic-embed-text")
- `OLLAMA_GENERATION_MODEL`: Generation model

## AWS Bedrock Permissions Required

Minimal IAM policy:
```json
{
  "Effect": "Allow",
  "Action": ["bedrock:InvokeModel"],
  "Resource": ["arn:aws:bedrock:*::foundation-model/*"]
}
```

See `docs/bedrock-setup.md` for detailed setup instructions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 11:36:58 +01:00
Chris Coutinho 8799450c7d Merge pull request #306 from cbcoutinho/rag-evaluation
feat: RAG evaluation framework with performance improvements
2025-11-16 11:17:41 +01:00
Chris Coutinho c4bf077050 feat: Add OpenTelemetry tracing to @instrument_tool decorator
Enhances the @instrument_tool decorator to create distributed traces
for all MCP tool executions, improving observability and debugging.

Changes:
- Modified @instrument_tool to wrap tool execution in trace_operation
- Added automatic span creation with mcp.tool.* span names
- Sanitized tool arguments before adding to span attributes
  (excludes password, token, secret, api_key, etag, ctx)
- Limited argument strings to 500 characters to prevent huge spans
- Maintained existing Prometheus metrics functionality
- Updated docs/observability.md to reflect correct decorator name
- Added comprehensive unit tests

All ~50+ MCP tools now emit traces automatically without code changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 11:16:05 +01:00
Chris Coutinho f559ca049e Merge branch 'rag-evaluation' 2025-11-16 10:26:19 +01:00
Chris Coutinho 02700a8e2c perf: Eliminate double-fetching in semantic search sampling
Performance optimization that removes redundant verification step and
makes content fetching parallel in nc_semantic_search_answer tool.

Changes:
- Remove verification.py module (only had 1 caller)
- Refactor nc_semantic_search to do inline deduplication instead of
  calling verify_search_results()
- Migrate verification patterns (anyio task group, semaphore limiting)
  to nc_semantic_search_answer's content fetching
- Change content fetching from sequential loop to parallel execution

Performance impact:
- Before: 10 API calls (5 parallel verification + 5 sequential content)
  = ~5.5s overhead
- After: 5 API calls (parallel content fetch) = ~0.5s overhead
- Result: 50% fewer API calls, ~10x faster for sampling operations

Technical details:
- Uses anyio.create_task_group() for structured concurrency
- Semaphore limiting (max_concurrent=20) prevents connection pool exhaustion
- Index-based storage maintains result ordering
- Expected failures (deleted notes) logged at debug level
- Deduplication handles hybrid search returning same doc from dense + sparse

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 10:25:04 +01:00
Chris Coutinho 1faf572546 Merge branch 'feature/bm25'
Resolves conflict in viz_routes.py by combining:
- Named vector extraction from feature/bm25
- Performance timing from master
2025-11-16 08:18:39 +01:00
Chris Coutinho 944b6dcf5a fix: Handle named vectors in visualization and semantic search
- viz_routes.py: Extract "dense" vector from named vector dict
- semantic.py: Specify using="dense" for BM25 hybrid collections
- Fixes "X must be 2D array" error in hybrid search
- Fixes "Dense vector  is not found" error in semantic search

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 08:16:35 +01:00
Chris Coutinho 2aa82d849c Merge branch 'feature/bm25' 2025-11-16 07:57:36 +01:00
Chris Coutinho fc6a2f14e4 fix: Update vizApp to use bm25_hybrid algorithm and remove deprecated weights
The visualization UI was still using the old 'hybrid' algorithm name and
weight parameters that were replaced by the BM25 hybrid search refactor.
This caused "Unknown algorithm: hybrid" errors when using the search
& visualize feature.

Changes:
- Update default algorithm from 'hybrid' to 'bm25_hybrid'
- Update default scoreThreshold from 0.7 to 0.0 to match backend
- Remove deprecated semanticWeight, keywordWeight, fuzzyWeight parameters
- Remove weight parameters from search request

Fixes the visualization search functionality after BM25 hybrid refactor.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 07:54:20 +01:00
Chris Coutinho 16c22c953b fix: Update viz routes to use BM25 hybrid search after refactor
- Remove obsolete search algorithm imports (Fuzzy, Keyword, Hybrid)
- Update UI to only show Semantic and BM25 Hybrid algorithms
- Replace manual weight controls with RRF fusion info message
- Update default algorithm from "hybrid" to "bm25_hybrid"
- Remove weight parameters (semantic_weight, keyword_weight, fuzzy_weight)
- Update score_threshold default from 0.7 to 0.0 for RRF scoring
- Document ty type checker in CLAUDE.md

Fixes unresolved-import type errors after BM25 refactor.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 07:23:11 +01:00
Chris Coutinho 529daf2b48 ci: temp disable sse in ci 2025-11-16 07:03:18 +01:00
Chris Coutinho 137d1d6c75 perf: fix vector viz search performance and visual encoding
This commit addresses critical performance issues with vector visualization
search (reducing time from 40s to ~2s) and improves result visualization
through better visual encoding.

## Performance Fixes

### 1. Fix blocking sleep in retry decorator (base.py:51)
- Changed `time.sleep(5)` to `await anyio.sleep(5)` in @retry_on_429
- Prevents entire event loop from freezing during rate limit retries
- Impact: Reduced search time from 22s to 16s initially

### 2. Add concurrency limiting for verification (verification.py:77-93)
- Added `anyio.Semaphore(20)` to limit concurrent HTTP requests
- Prevents connection pool exhaustion (RequestError) from 90+ simultaneous requests
- Fixes false filtering (was filtering 77/90 results incorrectly)
- Note: Semaphore still in code but verification removed from viz endpoint

### 3. Remove unnecessary verification from viz endpoint (viz_routes.py:483-486)
- Visualization only needs Qdrant metadata (title, excerpt), not full content
- Verification only required for sampling (LLM needs full note content)
- Impact: Reduced search time from 43.7s to ~2s (final fix)

### 4. Restore streaming scanner pattern (scanner.py)
- Process notes one-at-a-time using async generator
- Avoids loading all notes into memory

## Visualization Improvements

### 5. Result-relative score normalization (viz_routes.py:489-504)
- Normalize scores within result set: best=1.0, worst=0.0
- Removes arbitrary RRF normalization (theoretical max didn't make sense)
- Makes visual encoding meaningful regardless of algorithm scores

### 6. Power scaling for marker sizes (userinfo_routes.py:743)
- Changed from linear `8 + (score * 12)` to power `6 + (score² * 14)`
- Creates dramatic visual contrast: 0.0→6px, 0.5→9.5px, 1.0→20px
- Combined with opacity (0.2-1.0) for clear visual hierarchy

### 7. Multi-channel visual encoding (userinfo_routes.py:740-745)
- Size: Exponentially scaled with score²
- Opacity: Linear 0.2-1.0 (keeps all points visible)
- Color: Viridis gradient (blue→yellow)
- Effect: Top results are large/bright/opaque, context results small/dim/transparent

## Result
- Search time: 40s → ~2s (20x faster)
- Visual contrast: Subtle → dramatic (clear result hierarchy)
- No arbitrary cutoffs: All results visible, best naturally highlighted

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 07:01:35 +01:00
Chris Coutinho 6fe5596c13 feat: Implement BM25 hybrid search with native Qdrant RRF fusion
Replace custom keyword/fuzzy search algorithms with industry-standard BM25
sparse vectors, combined with dense semantic vectors using Qdrant's native
Reciprocal Rank Fusion (RRF). This consolidates search architecture and
improves relevance for both semantic and keyword queries.

Key changes:
- Add fastembed dependency for BM25 sparse vector generation
- Update Qdrant collection schema to support named vectors (dense + sparse)
- Create BM25SparseEmbeddingProvider using FastEmbed's Qdrant/bm25 model
- Implement BM25HybridSearchAlgorithm with native Qdrant RRF prefetch
- Update document processor to generate both dense and sparse embeddings
- Simplify nc_semantic_search() tool to use BM25 hybrid only
- Remove legacy keyword.py, fuzzy.py, and custom hybrid.py (736 lines)
- Update ADR-014 with implementation notes and test results

Benefits:
- Consolidated architecture (single Qdrant database)
- Native database-level RRF fusion (more efficient)
- Industry-standard BM25 (replaces brittle custom keyword search)
- Better relevance across semantic and keyword queries
- Simplified codebase (-285 net lines)

Tests: All 125 tests passing (118 unit, 7 integration)

Implements ADR-014: Replace Custom Keyword Search with BM25 Hybrid Search

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 06:59:44 +01:00
Chris Coutinho c8d9cc24e0 refactor: migrate asyncio to anyio for consistent structured concurrency
Replace asyncio primitives with anyio equivalents throughout the codebase
to establish a single async pattern. This provides better structured
concurrency with automatic cancellation on errors and aligns with the
pytest anyio configuration.

Changes:
- hybrid.py: Replace asyncio.gather() with anyio task groups
- token_broker.py: Replace asyncio.Lock() with anyio.Lock()
- storage.py: Replace asyncio.run() with anyio.run()
- app.py: Replace tg.start_soon() with await tg.start() for task status
- processor.py: Add task_status parameter for structured startup
- scanner.py: Add task_status parameter for structured startup
- CLAUDE.md: Update async/await patterns guidance

The change from start_soon() to await tg.start() enables proper task
initialization signaling, ensuring background tasks are ready before
proceeding. This follows anyio best practices for structured concurrency.

All 118 unit tests pass with the new implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 03:51:45 +01:00
Chris Coutinho eaeb8eae28 feat: Normalize hybrid search RRF scores to 0-1 range
Improve user comprehension by scaling RRF scores to match the intuitive
0-1 range used by other search algorithms.

## Problem

RRF (Reciprocal Rank Fusion) scores had a drastically different scale
than semantic/keyword/fuzzy scores:

- Semantic similarity: 0.0 to 1.0 (typical: 0.5-0.9)
- RRF scores: 0.0 to ~0.016 (typical: 0.005-0.015)

This caused user confusion - a score of 0.0078 looked terrible but was
actually excellent (near theoretical maximum).

## Solution

Normalize RRF scores using the formula:
`normalized_score = rrf_score * (rrf_k + 1) / total_weight`

Where:
- rrf_k = 60 (RRF constant)
- total_weight = sum of algorithm weights (default: 1.0)

**Example transformation:**
- Before: 0.0078 (confusing)
- After: 0.477 (intuitive)

## Changes

**nextcloud_mcp_server/search/hybrid.py:**
- Store total_weight as instance variable (line 63)
- Calculate normalization factor in _reciprocal_rank_fusion() (line 209)
- Apply normalization to all RRF scores (line 217)
- Preserve raw RRF score in metadata for debugging (line 222)

## Impact

**User Experience:**
- Hybrid search scores now comparable with semantic/keyword/fuzzy
- Score of 0.5 indicates good match across all algorithms
- Consistent scale improves score threshold usability

**Backward Compatibility:**
- Raw RRF scores preserved in metadata["rrf_score_raw"]
- Result ordering unchanged (normalization is linear transformation)
- Breaking change: Existing score thresholds need adjustment

**Performance:**
- Negligible overhead (single multiplication per result)

## Testing

Verified with nc_semantic_search and nc_semantic_search_answer:
- Hybrid scores now 0.47-0.7 range (was 0.003-0.011)
- Semantic scores unchanged (0.75)
- Result ordering preserved

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 06:48:58 +01:00
Chris Coutinho 42376483ab refactor: Optimize Nextcloud access verification with centralized filtering
Move access verification from individual search algorithms to final output
stage, eliminating redundant API calls and improving performance.

## Changes

**New:**
- `search/verification.py`: Centralized verification using anyio task groups
  - Deduplicates results by (doc_id, doc_type) before verification
  - Verifies all unique documents in parallel using structured concurrency
  - Filters out inaccessible documents in single pass

**Modified Search Algorithms:**
- `search/semantic.py`: Removed _deduplicate_and_verify() and _verify_document_access()
- `search/keyword.py`: Removed _verify_access() and parallel verification
- `search/fuzzy.py`: Removed _verify_access() and parallel verification
- `search/hybrid.py`: Removed nextcloud_client parameter passing

All algorithms now return unverified results from Qdrant payload.

**Modified Output Stages:**
- `server/semantic.py`: Added verify_search_results() call after search
- `auth/viz_routes.py`: Added verify_search_results() call after search

Both endpoints now verify access once at final stage with deduplication.

## Performance Impact

**Before:**
- Hybrid mode (limit=10): 30 API calls (10 per algorithm × 3 algorithms)
- Single algorithm: 10-20 API calls (with verification buffer)

**After:**
- Hybrid mode (limit=10): 10 API calls (deduplicated verification)
- Single algorithm: 10 API calls (deduplicated verification)

**Performance Gain:** 3x reduction in API calls for hybrid search

## Architecture Benefits

- **Separation of concerns**: Algorithms handle scoring, output stage handles security
- **Deduplication**: Each document verified exactly once
- **Parallel execution**: All verifications run concurrently via anyio task groups
- **Consistency**: Same verification logic across MCP tools and viz endpoints

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 06:21:06 +01:00
Chris Coutinho ed0825e661 feat: Enhance vector visualization UI and parallelize search verification
Vector Visualization Improvements:
- Add interactive vector viz tab with Alpine.js and Plotly.js to user info page
- Refactor viz route CSS for better scoping and maintainability
- Remove unused nextcloud_host variable

Performance Optimizations:
- Parallelize access verification in fuzzy and keyword search algorithms
- Use asyncio.gather() to verify multiple documents concurrently
- Add exception handling with return_exceptions=True for resilience

Dependencies:
- Update third_party/oidc submodule to include RFC 9728 resource_url support

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 05:39:07 +01:00
Chris Coutinho e3153822f7 perf: Exclude vector-sync status polling from distributed tracing
Skip tracing for /app/vector-sync/status to reduce noise from HTMX polling.
Metrics collection continues for this endpoint.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 05:19:35 +01:00
Chris Coutinho 2b35dd729f fix: Reorder tabs and fix viz pane session access
- Move Webhooks tab to the right (User Info | Vector Sync | Vector Viz | Webhooks)
- Use request.user.display_name instead of session for viz routes
- Fixes session middleware error when accessing via iframe
2025-11-15 02:41:42 +01:00
Chris Coutinho eb32bbbc6b feat: Add Vector Viz tab to app home page
- Add Vector Viz button to tab navigation
- Embed viz pane in iframe for seamless integration
- Only shown when vector sync is enabled
2025-11-15 02:38:05 +01:00
Chris Coutinho 916af1c8f3 feat: Add vector visualization pane with multi-select document types
- Add /app/vector-viz endpoint for interactive search testing
- Implement server-side PCA dimensionality reduction (768-dim → 2D)
- Support multi-select document type filter for cross-app search
- Support all search algorithms: semantic, keyword, fuzzy, hybrid
- Display 2D scatter plot of vector embeddings using Plotly
- Show search results with scores and document types
- Register viz routes in app.py
2025-11-15 02:32:10 +01:00
Chris Coutinho 9a62c8478f feat: Implement custom PCA to remove sklearn dependency
- Add custom PCA implementation using numpy eigendecomposition
- Replace sklearn.decomposition.PCA with custom implementation
- Maintains same API (fit, transform, fit_transform)
- Supports explained_variance_ratio_ for variance analysis
- Removes scikit-learn dependency from project
- Add type hints and assertion for type safety
2025-11-15 02:02:57 +01:00
Chris Coutinho 2a078093ed refactor!: Make all search algorithms query Qdrant payload, not Nextcloud
BREAKING CHANGE: Search algorithms now require Qdrant to be populated.
Vector sync must be enabled and documents indexed for search to work.

- Keyword and fuzzy search now query Qdrant scroll API for title/excerpt
- Remove inefficient Nextcloud API fetching pattern
- Add optional Nextcloud verification for security
- Deduplicate by (doc_id, doc_type) tuple, keeping chunk_index=0
- Align with document processor pattern that already stores text in Qdrant
2025-11-15 01:56:41 +01:00
Chris Coutinho b5b03bfd78 feat: Add multi-document Protocol with cross-app search support
Implements NextcloudClientProtocol for multi-document type search following
user requirement that document types are not 1:1 with apps (e.g., Notes app
specializes in markdown, while Files/WebDAV handles multiple file types).

Key Changes:
- NextcloudClientProtocol: Generic protocol with app-specific client properties
- get_indexed_doc_types(): Query Qdrant for actually-indexed document types
- Document dispatch: All algorithms check Qdrant before attempting access
- Cross-type deduplication: Use (doc_id, doc_type) tuples in hybrid RRF

Search Algorithm Updates:
- Semantic: Added _verify_document_access() with dispatch to appropriate client
  - Deduplication by (doc_id, doc_type) tuple
  - Only "note" verification implemented, others return None with info log
- Keyword: Added _fetch_documents() dispatch method
  - Queries Qdrant for available types before fetching
  - Supports cross-app search when doc_type=None
- Fuzzy: Same pattern as keyword search
- Hybrid: Already uses (doc_id, doc_type) for deduplication (no changes needed)

Future-Proof Design:
- File/calendar verification stubs in place
- Clear logging when unsupported types found
- Easy to extend when processor indexes new document types

Currently Supported:
- "note" documents fully implemented and tested
- Other types gracefully handled (logged but skipped)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 01:19:29 +01:00
Chris Coutinho f3bdb8b885 feat: Update nc_semantic_search tool with algorithm selection
Implements ADR-012 by adding multi-algorithm support to the MCP tool.

Key changes:
- Added algorithm parameter: "semantic"|"keyword"|"fuzzy"|"hybrid" (default: "hybrid")
- Added weight parameters for hybrid mode configuration
- Replaced direct Qdrant/embedding calls with search module abstractions
- Updated docstring to describe all four algorithms
- Simplified implementation: ~50 lines vs ~150 lines (67% reduction)
- Better error handling for missing vector sync

Algorithm selection:
- semantic: Pure vector similarity (requires VECTOR_SYNC_ENABLED=true)
- keyword: Token-based matching with weighted title/content scoring
- fuzzy: Character overlap for typo tolerance
- hybrid: RRF fusion with configurable weights (default: 0.5/0.3/0.2)

Backward compatibility:
- Tool name unchanged (nc_semantic_search)
- New parameters have sensible defaults
- Existing clients get hybrid search automatically (better than pure semantic)
- search_method field in response reflects actual algorithm used

Weight validation:
- Performed in HybridSearchAlgorithm constructor
- Must sum to ≤1.0 and all non-negative
- At least one weight must be > 0
- Clear error messages on validation failure

Next: Update viz pane to use same algorithms

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 00:25:55 +01:00
Chris Coutinho 11e620f2d1 feat: Implement unified search algorithm module
Creates shared search module with four algorithms implementing ADR-012:
- Semantic search (vector similarity via Qdrant)
- Keyword search (token-based matching from ADR-001)
- Fuzzy search (character overlap matching)
- Hybrid search (RRF fusion from ADR-003)

Architecture:
- Base SearchAlgorithm interface for consistent API
- SearchResult dataclass for unified result format
- All algorithms async and independently testable
- Proper logging and error handling throughout

Semantic Search (search/semantic.py):
- Extracted from server/semantic.py
- Vector similarity using Qdrant query_points
- Dual-phase authorization (vector filter + API verification)
- Deduplication of document chunks
- Configurable score threshold (default: 0.7)

Keyword Search (search/keyword.py):
- Implements ADR-001 token-based matching
- Title matches weighted 3x higher than content
- Case-insensitive token matching
- Relevance scoring with normalization
- Excerpt extraction with context

Fuzzy Search (search/fuzzy.py):
- Simple character overlap calculation
- Configurable threshold (default: 70%)
- Typo-tolerant matching
- Fast and dependency-free

Hybrid Search (search/hybrid.py):
- Reciprocal Rank Fusion (RRF) from ADR-003
- Parallel execution of sub-algorithms
- Configurable weights per algorithm
- RRF constant k=60 (standard value)
- Weight validation (must sum ≤1.0)

All algorithms:
- Share NextcloudClient for document access
- Support user_id filtering (multi-tenant)
- Support doc_type filtering (currently notes only)
- Return consistent SearchResult objects
- Properly formatted with ruff and type-checked

Next steps: Update MCP tool to use these algorithms

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 00:10:19 +01:00
Chris Coutinho 74e2ab2440 Merge pull request #297 from cbcoutinho/fix/helm-oidc-env-vars
fix: Use NEXTCLOUD_OIDC_CLIENT_ID/SECRET env vars consistently
2025-11-13 22:10:04 +01:00
Chris Coutinho 3648d478f1 fix: return all notes when search query is empty
Previously, an empty query string to nc_notes_search_notes would return
zero results due to an early return when no query tokens were present.

This was counterintuitive - users expect an empty query to list all
notes, not return nothing.

Changes:
- Modified NotesSearchController.search_notes() to return all notes
  when query is empty
- Added documentation to clarify this behavior
- Empty query results have _score: None (no relevance scoring)
- Non-empty query results continue to have relevance scores

Fixes behavior where listing all notes was impossible via the search tool.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 21:57:14 +01:00
Chris Coutinho 14a59fdff3 fix: Use NEXTCLOUD_OIDC_CLIENT_ID/SECRET env vars consistently
Fixes #296

The application code was looking for OIDC_CLIENT_ID and OIDC_CLIENT_SECRET
(without NEXTCLOUD_ prefix), but the Helm chart, documentation, and CLI
all use NEXTCLOUD_OIDC_CLIENT_ID and NEXTCLOUD_OIDC_CLIENT_SECRET.

This mismatch caused OAuth deployments via Helm to fail with crashloops
because the credentials weren't being found.

Changes:
- app.py: Use NEXTCLOUD_OIDC_CLIENT_ID/SECRET in setup_oauth_config()
- config.py: Use NEXTCLOUD_OIDC_CLIENT_ID/SECRET in get_settings()
- Updated documentation comments and error messages

This aligns with the documented naming convention where all Nextcloud-related
environment variables use the NEXTCLOUD_ prefix.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 21:48:58 +01:00
Chris Coutinho c3023d2cc3 feat: Complete Phase 5 - Instrument all 93 MCP tools
Applied @instrument_tool decorator to all 86 remaining tools
across 8 server files.

Instrumented files:
- calendar.py: 16 tools
- contacts.py: 7 tools
- deck.py: 25 tools
- webdav.py: 11 tools
- tables.py: 6 tools
- sharing.py: 5 tools
- cookbook.py: 13 tools
- semantic.py: 3 tools

Total: 93 tools instrumented (7 in notes.py + 86 in other files)

These metrics populate:
- MCP Tool Calls panel (by tool name and status)
- MCP Tool Duration panel (histogram)
- MCP Tool Errors panel (by tool name and error type)

This completes PR #295 - All 5 phases of metrics instrumentation done:
 Phase 1: Queue size metrics (2 locations)
 Phase 2: Health checks (1 location)
 Phase 3: Database operations (3 methods)
 Phase 4: OAuth token metrics (3 locations)
 Phase 5: MCP tool metrics (93 tools)

All 34 dashboard panels now have data sources.
2025-11-13 16:58:44 +01:00
Chris Coutinho 6253faee19 feat: Add instrumentation decorator and apply to notes tools (Phase 5)
Created @instrument_tool decorator for automatic MCP tool metrics collection.
Applied to all 7 tools in notes.py.

Changes:
- observability/metrics.py:
  * New instrument_tool() decorator for automatic timing and error tracking
  * Compatible with @mcp.tool() and @require_scopes() decorators
  * Records tool_name, duration, and success/error status

- server/notes.py:
  * Applied @instrument_tool to all 7 tool functions
  * nc_notes_create_note, nc_notes_update_note, nc_notes_append_content
  * nc_notes_search_notes, nc_notes_get_note, nc_notes_get_attachment
  * nc_notes_delete_note

These metrics will populate the MCP Tool Calls dashboard panels.

Part of PR #295 - Complete metrics instrumentation (Phase 5)
Remaining: 86 tools across 8 server files
2025-11-13 16:40:56 +01:00
Chris Coutinho c97f12d47e feat: Add OAuth token and database metrics (Phases 3-4)
Complete Prometheus instrumentation for OAuth token operations
and additional database operations to populate empty dashboard panels.

OAuth Token Metrics (Phase 4):
- unified_verifier.py:
  * Token validation cache hits/misses
  * JWT verification success/failure/error
  * Introspection validation results
  * Audience validation failures
- context_helper.py:
  * Token exchange cache hits/misses
  * RFC 8693 exchange success/error

Database Metrics (Phase 3 completion):
- storage.py:
  * get_refresh_token() with timing
  * delete_refresh_token() with timing
  * All operations record duration and success/error status

These metrics populate the following dashboard panels:
- Token Validations (by method and result)
- Token Cache Hit Rate
- Token Exchange Operations
- Database Operations (refresh token CRUD)
- Database Operation Duration

Part of PR #295 - Complete metrics instrumentation
2025-11-13 16:23:00 +01:00
Chris Coutinho a667d7c59c feat: Add metrics instrumentation for queue, health, and database operations
Implement Prometheus metrics to populate empty Grafana dashboard panels.

## Phase 1: Queue Size Metrics 
**File**: `processor.py`
- Track vector sync queue depth in real-time
- Update metric after receiving and processing each document
- Update metric during timeout (empty queue)
- Enables: "Processing Queue Depth" panel

## Phase 2: Health Check Metrics 
**File**: `app.py`
- Add Nextcloud connectivity check with timing
- Add Qdrant health check with timing
- Record dependency health status (up/down)
- Record health check duration
- Enables: 4 health status panels + health check duration panel

## Phase 3: Database Operation Metrics (Partial) 
**File**: `storage.py`
- Instrument `store_refresh_token()` method
- Track SQLite INSERT operation timing and success/error status
- Enables: Partial data for database operation latency panel

## Metrics Now Exposed

### Queue Metrics:
- `mcp_vector_sync_queue_size` - Real-time queue depth

### Health Metrics:
- `mcp_dependency_health{dependency="nextcloud"}` - UP/DOWN status
- `mcp_dependency_health{dependency="qdrant"}` - UP/DOWN status
- `mcp_dependency_check_duration_seconds{dependency}` - Health check latency

### Database Metrics:
- `mcp_db_operations_total{db="sqlite",operation="insert"}` - Operation count
- `mcp_db_operation_duration_seconds{db="sqlite",operation="insert"}` - Operation latency

## Dashboard Impact

**Panels Now Populated** (7/34 panels):
-  Processing Queue Depth
-  Nextcloud Health
-  Qdrant Health
-  Health Check Duration
-  Database Operation Latency (partial)
-  Vector sync panels (already working from PR #292)

**Panels Still Empty** (remaining work):
-  OAuth panels (4): Token validations, exchanges, cache hit rate, refresh ops
-  MCP tool panels (3): Call volume, error rates, execution duration
-  Database panel: Needs more SQLite operations instrumented (~29 remaining)

## Testing

Verified metric definitions exist and will be recorded on next deployment.

## Next Steps

Phase 4: OAuth token metrics (unified_verifier.py, context_helper.py, storage.py)
Phase 5: MCP tool metrics (all server/*.py files with @mcp.tool())
Phase 3 completion: Remaining 29 database operations in storage.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 16:14:38 +01:00
Chris Coutinho 4ea5ed72d4 feat: Add Grafana dashboard and vector sync metric instrumentation
Implement comprehensive observability for vector database synchronization
with Grafana dashboard and Prometheus metrics.

## Part 1: Grafana Dashboard

Created all-in-one operations dashboard with 7 rows and 34 panels:

### Dashboard Structure:
- **Overview Row**: Request rate, error rate, P95 latency, active requests
- **HTTP Metrics (RED)**: Request/error rates by endpoint, latency percentiles
- **MCP Tools**: Call volume, error rates, execution duration by tool
- **Nextcloud API**: API calls/latency by app, retry patterns
- **OAuth & Authentication**: Token validations, exchanges, cache hit rate
- **Dependencies & Health**: Status for Nextcloud/Qdrant/Keycloak/Unstructured
- **Vector Sync**: Processing throughput, queue depth, Qdrant operations

### Helm Chart Integration:
- Added dashboard-configmap.yaml template for automatic provisioning
- Configured Grafana sidecar auto-discovery (label: grafana_dashboard="1")
- Added dashboards configuration section in values.yaml (opt-in)
- Updated Chart.yaml with dashboard annotations
- Enhanced NOTES.txt with dashboard deployment instructions
- Comprehensive documentation in dashboards/README.md

Dashboard supports dynamic filtering via variables:
- datasource: Prometheus data source selection
- namespace: Filter by Kubernetes namespace
- pod: Multi-select pod filtering
- interval: Query interval (1m/5m/10m/30m/1h)

## Part 2: Vector Sync Metric Instrumentation

Implemented metric recording throughout vector sync pipeline:

### metrics.py:
Added convenience functions:
- record_vector_sync_scan() - Track documents per scan
- record_vector_sync_processing() - Track processing duration/status
- record_qdrant_operation() - Track database operations
- update_vector_sync_queue_size() - Track queue depth

### scanner.py:
- Record number of documents found in each scan
- Enables monitoring of scan throughput

### processor.py:
- Record processing duration for each document
- Track success/failure status with timing
- Record Qdrant upsert/delete operations
- Handle all code paths (success, deletion, error)

### semantic.py:
- Wrap Qdrant query_points with try/except
- Record search operation success/failure

## Metrics Exposed:

- mcp_vector_sync_documents_scanned_total
- mcp_vector_sync_documents_processed_total{status}
- mcp_vector_sync_processing_duration_seconds (histogram)
- mcp_vector_sync_queue_size (gauge)
- mcp_qdrant_operations_total{operation,status}

This enables monitoring of:
- Scan and processing throughput
- Processing latency (P50/P95/P99)
- Error rates for processing and Qdrant operations
- Queue depth trends
- Complete observability of vector sync pipeline

## Testing:

Verified locally that metrics are recorded correctly:
- 36 documents scanned
- 3 documents processed (avg 7.5s each)
- 3 successful Qdrant upsert operations
- Search operations tracked

## Deployment:

Enable dashboard provisioning in Helm values:
```yaml
dashboards:
  enabled: true
  grafanaFolder: "Nextcloud MCP"
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 11:49:20 +01:00
Chris Coutinho 6812e1aca7 fix: add dynamic dimension detection for Ollama embedding models
This fixes dimension mismatch errors when using embedding models with
non-standard dimensions (e.g., qwen3-embedding:4b produces 2560-dim
vectors instead of the hardcoded 768).

Changes:
- OllamaEmbeddingProvider: Detect dimensions dynamically by generating
  test embedding instead of hardcoding to 768
- qdrant_client: Call dimension detection before collection creation
- app.py: Initialize Qdrant collection before starting background tasks
  in streamable-http transport path
- tests: Fix integration tests to properly mock EmbeddingService wrapper

Fixes dimension mismatch error:
"could not broadcast input array from shape (2560,) into shape (768,)"

All integration tests passing (6/6).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-12 02:46:30 +01:00