9 Commits

Author SHA1 Message Date
Chris Coutinho 02700a8e2c perf: Eliminate double-fetching in semantic search sampling
Performance optimization that removes redundant verification step and
makes content fetching parallel in nc_semantic_search_answer tool.

Changes:
- Remove verification.py module (only had 1 caller)
- Refactor nc_semantic_search to do inline deduplication instead of
  calling verify_search_results()
- Migrate verification patterns (anyio task group, semaphore limiting)
  to nc_semantic_search_answer's content fetching
- Change content fetching from sequential loop to parallel execution

Performance impact:
- Before: 10 API calls (5 parallel verification + 5 sequential content)
  = ~5.5s overhead
- After: 5 API calls (parallel content fetch) = ~0.5s overhead
- Result: 50% fewer API calls, ~10x faster for sampling operations

Technical details:
- Uses anyio.create_task_group() for structured concurrency
- Semaphore limiting (max_concurrent=20) prevents connection pool exhaustion
- Index-based storage maintains result ordering
- Expected failures (deleted notes) logged at debug level
- Deduplication handles hybrid search returning same doc from dense + sparse
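
A minimal sketch of the parallel content fetch described above. The commit uses anyio task groups; this sketch swaps in asyncio from the standard library so it runs without extra dependencies. `fetch_note`, the note IDs, and the simulated "deleted note" failure are hypothetical stand-ins, not the project's actual code.

```python
import asyncio
import logging
from typing import Optional

logger = logging.getLogger(__name__)

MAX_CONCURRENT = 20  # mirrors the max_concurrent=20 limit named in the commit


async def fetch_note(note_id: int) -> str:
    """Hypothetical stand-in for a Nextcloud API call."""
    await asyncio.sleep(0.01)
    if note_id < 0:  # simulate a note that was deleted after indexing
        raise LookupError(f"note {note_id} not found")
    return f"content of note {note_id}"


async def fetch_all(note_ids: list[int]) -> list[Optional[str]]:
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    # Index-based storage: slot i always belongs to note_ids[i],
    # so result ordering does not depend on completion order.
    results: list[Optional[str]] = [None] * len(note_ids)

    async def fetch_one(index: int, note_id: int) -> None:
        async with semaphore:  # cap the number of in-flight requests
            try:
                results[index] = await fetch_note(note_id)
            except LookupError as exc:
                # Expected failure: log quietly and leave the slot empty.
                logger.debug("Skipping note %s: %s", note_id, exc)

    await asyncio.gather(*(fetch_one(i, n) for i, n in enumerate(note_ids)))
    return results
```

Calling `asyncio.run(fetch_all([1, -1, 3]))` leaves `None` in the slot of the note that failed, so callers can drop missing entries without disturbing the order of the rest.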

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 10:25:04 +01:00
Chris Coutinho 42376483ab refactor: Optimize Nextcloud access verification with centralized filtering
Move access verification from individual search algorithms to final output
stage, eliminating redundant API calls and improving performance.

## Changes

**New:**
- `search/verification.py`: Centralized verification using anyio task groups
  - Deduplicates results by (doc_id, doc_type) before verification
  - Verifies all unique documents in parallel using structured concurrency
  - Filters out inaccessible documents in single pass

**Modified Search Algorithms:**
- `search/semantic.py`: Removed _deduplicate_and_verify() and _verify_document_access()
- `search/keyword.py`: Removed _verify_access() and parallel verification
- `search/fuzzy.py`: Removed _verify_access() and parallel verification
- `search/hybrid.py`: Removed nextcloud_client parameter passing

All algorithms now return unverified results from Qdrant payload.

**Modified Output Stages:**
- `server/semantic.py`: Added verify_search_results() call after search
- `auth/viz_routes.py`: Added verify_search_results() call after search

Both endpoints now verify access once at final stage with deduplication.

## Performance Impact

**Before:**
- Hybrid mode (limit=10): 30 API calls (10 per algorithm × 3 algorithms)
- Single algorithm: 10-20 API calls (with verification buffer)

**After:**
- Hybrid mode (limit=10): 10 API calls (deduplicated verification)
- Single algorithm: 10 API calls (deduplicated verification)

**Performance Gain:** 3x reduction in API calls for hybrid search

## Architecture Benefits

- **Separation of concerns**: Algorithms handle scoring, output stage handles security
- **Deduplication**: Each document verified exactly once
- **Parallel execution**: All verifications run concurrently via anyio task groups
- **Consistency**: Same verification logic across MCP tools and viz endpoints

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 06:21:06 +01:00
Chris Coutinho 2a078093ed refactor!: Make all search algorithms query Qdrant payload, not Nextcloud
BREAKING CHANGE: Search algorithms now require Qdrant to be populated.
Vector sync must be enabled and documents indexed for search to work.

- Keyword and fuzzy search now query Qdrant scroll API for title/excerpt
- Remove inefficient Nextcloud API fetching pattern
- Add optional Nextcloud verification for security
- Deduplicate by (doc_id, doc_type) tuple, keeping chunk_index=0
- Align with document processor pattern that already stores text in Qdrant
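
The deduplication rule above can be sketched roughly as follows. The field names `doc_id`, `doc_type`, and `chunk_index` come from this commit message; the plain-dict result shape and the function name are assumptions made for illustration.

```python
from typing import Any


def deduplicate(results: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep one result per (doc_id, doc_type), preferring chunk_index == 0."""
    best: dict[tuple[Any, str], dict[str, Any]] = {}
    for result in results:
        key = (result["doc_id"], result["doc_type"])
        current = best.get(key)
        # Take the first hit for a document, but let its first chunk
        # (chunk_index == 0) replace any later chunk seen earlier.
        if current is None or (
            result["chunk_index"] == 0 and current["chunk_index"] != 0
        ):
            best[key] = result
    return list(best.values())
```

Keying on the tuple rather than `doc_id` alone keeps documents from different apps apart even when their numeric IDs collide.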
2025-11-15 01:56:41 +01:00
Chris Coutinho f3bdb8b885 feat: Update nc_semantic_search tool with algorithm selection
Implements ADR-012 by adding multi-algorithm support to the MCP tool.

Key changes:
- Added algorithm parameter: "semantic"|"keyword"|"fuzzy"|"hybrid" (default: "hybrid")
- Added weight parameters for hybrid mode configuration
- Replaced direct Qdrant/embedding calls with search module abstractions
- Updated docstring to describe all four algorithms
- Simplified implementation: ~50 lines vs ~150 lines (67% reduction)
- Better error handling for missing vector sync

Algorithm selection:
- semantic: Pure vector similarity (requires VECTOR_SYNC_ENABLED=true)
- keyword: Token-based matching with weighted title/content scoring
- fuzzy: Character overlap for typo tolerance
- hybrid: RRF fusion with configurable weights (default: 0.5/0.3/0.2)
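
For readers unfamiliar with RRF (reciprocal rank fusion), here is a small sketch of a weighted variant. It is not taken from the project: the function name, the smoothing constant `k=60` (a common convention), and the mapping of the 0.5/0.3/0.2 defaults to semantic/keyword/fuzzy are all assumptions.

```python
from collections import defaultdict

RRF_K = 60  # assumption: conventional constant; the commit does not state one


def weighted_rrf(
    rankings: dict[str, list[str]],
    weights: dict[str, float],
    k: int = RRF_K,
) -> list[tuple[str, float]]:
    """Fuse per-algorithm rankings into one list sorted by fused score."""
    scores: dict[str, float] = defaultdict(float)
    for algorithm, ranked_ids in rankings.items():
        weight = weights.get(algorithm, 0.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            # Each algorithm contributes weight / (k + rank) per document.
            scores[doc_id] += weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A document that appears in several rankings accumulates score from each, which is why hybrid results can surface items that no single algorithm ranked first.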

Backward compatibility:
- Tool name unchanged (nc_semantic_search)
- New parameters have sensible defaults
- Existing clients get hybrid search automatically (better than pure semantic)
- search_method field in response reflects actual algorithm used

Weight validation:
- Performed in HybridSearchAlgorithm constructor
- Must sum to ≤1.0 and all non-negative
- At least one weight must be > 0
- Clear error messages on validation failure
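
The validation rules listed above might look something like this. It is a standalone sketch, not the actual `HybridSearchAlgorithm` constructor; the function name, parameter names, and error wording are assumptions.

```python
def validate_weights(semantic: float, keyword: float, fuzzy: float) -> None:
    """Raise ValueError if the hybrid weights break any of the three rules."""
    weights = {"semantic": semantic, "keyword": keyword, "fuzzy": fuzzy}

    negative = [name for name, value in weights.items() if value < 0]
    if negative:
        raise ValueError(f"Weights must be non-negative: {', '.join(negative)}")

    total = sum(weights.values())
    if total > 1.0 + 1e-9:  # small tolerance for floating-point rounding
        raise ValueError(f"Weights must sum to <= 1.0, got {total:.3f}")

    if total == 0:
        raise ValueError("At least one weight must be > 0")
```

Checking for negatives first keeps the later checks simple: once every weight is non-negative, a zero total can only mean that all weights are zero.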

Next: Update viz pane to use same algorithms

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 00:25:55 +01:00
Chris Coutinho c3023d2cc3 feat: Complete Phase 5 - Instrument all 93 MCP tools
Applied @instrument_tool decorator to all 86 remaining tools
across 8 server files.

Instrumented files:
- calendar.py: 16 tools
- contacts.py: 7 tools
- deck.py: 25 tools
- webdav.py: 11 tools
- tables.py: 6 tools
- sharing.py: 5 tools
- cookbook.py: 13 tools
- semantic.py: 3 tools

Total: 93 tools instrumented (7 in notes.py + 86 in other files)

These metrics populate:
- MCP Tool Calls panel (by tool name and status)
- MCP Tool Duration panel (histogram)
- MCP Tool Errors panel (by tool name and error type)
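
To illustrate the kind of data these panels need, here is a hypothetical sketch of a decorator in the spirit of `@instrument_tool`. The real decorator is not shown in this commit and presumably records Prometheus metrics; the in-memory counters, the example tool, and the status labels below are assumptions.

```python
import asyncio
import functools
import time
from collections import Counter, defaultdict

TOOL_CALLS: Counter = Counter()     # (tool name, status) -> call count
TOOL_ERRORS: Counter = Counter()    # (tool name, error type) -> error count
TOOL_DURATIONS = defaultdict(list)  # tool name -> durations in seconds


def instrument_tool(func):
    """Record call count, duration, and error type for an async tool."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "success"
        try:
            return await func(*args, **kwargs)
        except Exception as exc:
            status = "error"
            TOOL_ERRORS[(func.__name__, type(exc).__name__)] += 1
            raise  # instrumentation must not swallow the tool's error
        finally:
            TOOL_CALLS[(func.__name__, status)] += 1
            TOOL_DURATIONS[func.__name__].append(time.perf_counter() - start)

    return wrapper


@instrument_tool
async def nc_example_tool(fail: bool = False) -> str:
    """Hypothetical tool used only to exercise the decorator."""
    if fail:
        raise ValueError("boom")
    return "ok"
```

Recording in `finally` means the duration and call count are captured on both the success and the error path.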

This completes PR #295 - All 5 phases of metrics instrumentation done:
- Phase 1: Queue size metrics (2 locations)
- Phase 2: Health checks (1 location)
- Phase 3: Database operations (3 methods)
- Phase 4: OAuth token metrics (3 locations)
- Phase 5: MCP tool metrics (93 tools)

All 34 dashboard panels now have data sources.
2025-11-13 16:58:44 +01:00
Chris Coutinho 4ea5ed72d4 feat: Add Grafana dashboard and vector sync metric instrumentation
Implement comprehensive observability for vector database synchronization
with Grafana dashboard and Prometheus metrics.

## Part 1: Grafana Dashboard

Created all-in-one operations dashboard with 7 rows and 34 panels:

### Dashboard Structure:
- **Overview Row**: Request rate, error rate, P95 latency, active requests
- **HTTP Metrics (RED)**: Request/error rates by endpoint, latency percentiles
- **MCP Tools**: Call volume, error rates, execution duration by tool
- **Nextcloud API**: API calls/latency by app, retry patterns
- **OAuth & Authentication**: Token validations, exchanges, cache hit rate
- **Dependencies & Health**: Status for Nextcloud/Qdrant/Keycloak/Unstructured
- **Vector Sync**: Processing throughput, queue depth, Qdrant operations

### Helm Chart Integration:
- Added dashboard-configmap.yaml template for automatic provisioning
- Configured Grafana sidecar auto-discovery (label: grafana_dashboard="1")
- Added dashboards configuration section in values.yaml (opt-in)
- Updated Chart.yaml with dashboard annotations
- Enhanced NOTES.txt with dashboard deployment instructions
- Comprehensive documentation in dashboards/README.md

Dashboard supports dynamic filtering via variables:
- datasource: Prometheus data source selection
- namespace: Filter by Kubernetes namespace
- pod: Multi-select pod filtering
- interval: Query interval (1m/5m/10m/30m/1h)

## Part 2: Vector Sync Metric Instrumentation

Implemented metric recording throughout vector sync pipeline:

### metrics.py:
Added convenience functions:
- record_vector_sync_scan() - Track documents per scan
- record_vector_sync_processing() - Track processing duration/status
- record_qdrant_operation() - Track database operations
- update_vector_sync_queue_size() - Track queue depth

### scanner.py:
- Record number of documents found in each scan
- Enables monitoring of scan throughput

### processor.py:
- Record processing duration for each document
- Track success/failure status with timing
- Record Qdrant upsert/delete operations
- Handle all code paths (success, deletion, error)

### semantic.py:
- Wrap Qdrant query_points with try/except
- Record search operation success/failure

## Metrics Exposed:

- mcp_vector_sync_documents_scanned_total
- mcp_vector_sync_documents_processed_total{status}
- mcp_vector_sync_processing_duration_seconds (histogram)
- mcp_vector_sync_queue_size (gauge)
- mcp_qdrant_operations_total{operation,status}

This enables monitoring of:
- Scan and processing throughput
- Processing latency (P50/P95/P99)
- Error rates for processing and Qdrant operations
- Queue depth trends
- Complete observability of vector sync pipeline

## Testing:

Verified locally that metrics are recorded correctly:
- 36 documents scanned
- 3 documents processed (avg 7.5s each)
- 3 successful Qdrant upsert operations
- Search operations tracked

## Deployment:

Enable dashboard provisioning in Helm values:
```yaml
dashboards:
  enabled: true
  grafanaFolder: "Nextcloud MCP"
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 11:49:20 +01:00
Chris Coutinho e575c8e57b feat(vector): Support multiple embedding models with auto-generated collection names
This PR enables safe switching between embedding models and multi-server
deployments by implementing auto-generated Qdrant collection names based on
deployment ID and model name.

## Problem

Previously, all deployments used a single hardcoded collection name
"nextcloud_content", which caused two critical issues:

1. **Dimension mismatches when switching models**: Changing
   OLLAMA_EMBEDDING_MODEL (e.g., nomic-embed-text at 768D → all-minilm at
   384D) would cause runtime errors as vectors couldn't be inserted into a
   collection with incompatible dimensions.

2. **Collection collisions in multi-server setups**: Multiple MCP servers
   sharing a single Qdrant instance would overwrite each other's data,
   making horizontal scaling impossible.

## Solution

### Auto-Generated Collection Naming

Collections are now automatically named using the pattern:
`{deployment-id}-{model-name}`

**Deployment ID**: Uses `OTEL_SERVICE_NAME` if configured (and not default
value), otherwise falls back to `hostname` for simple Docker deployments.

**Model Name**: From `OLLAMA_EMBEDDING_MODEL` with path separators sanitized.

**Examples**:
- `my-mcp-server-nomic-embed-text` (with OTEL_SERVICE_NAME=my-mcp-server)
- `mcp-container-all-minilm` (simple Docker, hostname=mcp-container)

**Override**: Users can still set `QDRANT_COLLECTION` explicitly to bypass
auto-generation for backward compatibility.
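
The naming rules above could be sketched as follows. This is not the actual method from `config.py`: the default service name, the choice of `-` as the replacement character, and passing the environment in as a mapping are assumptions made so the example is self-contained.

```python
import re
import socket
from typing import Mapping

# Assumption: the commit does not say what the default OTEL service name is.
DEFAULT_SERVICE_NAME = "nextcloud-mcp-server"


def get_collection_name(env: Mapping[str, str]) -> str:
    """Derive a Qdrant collection name from deployment ID and model name."""
    explicit = env.get("QDRANT_COLLECTION")
    if explicit:
        return explicit  # explicit override bypasses auto-generation

    service_name = env.get("OTEL_SERVICE_NAME", "")
    if service_name and service_name != DEFAULT_SERVICE_NAME:
        deployment_id = service_name
    else:
        deployment_id = socket.gethostname()  # simple Docker fallback

    # Replace path separators so the model name is safe in a collection name.
    model_name = re.sub(r"[/\\]+", "-", env["OLLAMA_EMBEDDING_MODEL"])
    return f"{deployment_id}-{model_name}"
```

With `OTEL_SERVICE_NAME=my-mcp-server` and `OLLAMA_EMBEDDING_MODEL=nomic-embed-text`, this sketch returns `my-mcp-server-nomic-embed-text`, matching the first example above.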

### Dimension Validation

Added startup validation that checks collection dimensions match the
embedding service. If a mismatch is detected, the server fails fast with a
clear error message explaining:
- Expected vs actual dimensions
- Likely cause (model change)
- Solutions (delete collection, use different name, or revert model)
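
A rough sketch of such a fail-fast check, assuming the caller has already looked up both dimensions. The exception class, function name, and message wording are hypothetical; only the three pieces of guidance come from the list above.

```python
class DimensionMismatchError(RuntimeError):
    """Raised at startup when collection and embedding dimensions differ."""


def validate_dimensions(collection: str, collection_dim: int, embedding_dim: int) -> None:
    """Fail fast if the collection cannot store the model's vectors."""
    if collection_dim == embedding_dim:
        return
    raise DimensionMismatchError(
        f"Collection '{collection}' expects {collection_dim}-dimensional vectors, "
        f"but the embedding service produces {embedding_dim}-dimensional vectors. "
        "Likely cause: the embedding model was changed. "
        "Solutions: delete the collection, use a different collection name, "
        "or revert to the previous model."
    )
```

Raising at startup rather than on the first upsert surfaces the problem before any documents are processed.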

### Improved Sampling Error Handling

Enhanced MCP sampling rejection handling to treat user rejections as normal
behavior rather than errors:

- **User rejections** ("rejected", "denied") → INFO log, no traceback
- **Unsupported clients** → INFO log, no traceback
- **Other MCP errors** → WARNING log, no traceback
- **Unexpected errors** → ERROR log WITH traceback

This aligns with the MCP specification where clients SHOULD prompt users for
approval/denial of sampling requests.
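
One way to express this mapping is a small classifier that returns a log level and a traceback flag. This is a sketch only: `McpError` below is a local stand-in for the MCP SDK's error type, and the substring checks for unsupported clients are assumptions, since the commit names only "rejected" and "denied".

```python
import logging


class McpError(Exception):
    """Local stand-in for the MCP SDK error type, so the sketch runs alone."""


def classify_sampling_error(exc: Exception) -> tuple[int, bool]:
    """Return (log level, include traceback) for a sampling failure."""
    if isinstance(exc, McpError):
        message = str(exc).lower()
        if "rejected" in message or "denied" in message:
            return logging.INFO, False  # user said no: normal behavior
        if "not supported" in message or "unsupported" in message:
            return logging.INFO, False  # client cannot do sampling
        return logging.WARNING, False   # other protocol-level errors
    return logging.ERROR, True          # unexpected: keep the traceback
```

A caller could then log with `logger.log(level, "Sampling failed: %s", exc, exc_info=with_traceback)`.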

## Changes

### Core Implementation

- **nextcloud_mcp_server/config.py**: Added `get_collection_name()` method
  with deployment ID detection and model name sanitization
- **nextcloud_mcp_server/vector/qdrant_client.py**: Dimension validation on
  collection open with helpful error messages
- **nextcloud_mcp_server/vector/{scanner,processor}.py**: Updated to use
  `get_collection_name()`
- **nextcloud_mcp_server/auth/userinfo_routes.py**: Vector sync status uses
  `get_collection_name()`
- **nextcloud_mcp_server/server/semantic.py**:
  - Updated semantic search tools to use `get_collection_name()`
  - Improved sampling rejection error handling (McpError vs Exception)

### Documentation

- **docs/semantic-search-architecture.md**: New comprehensive architecture
  document (557 lines) covering background sync, semantic search flow, RAG
  implementation, and deployment modes
- **docs/configuration.md**: Added detailed "Qdrant Collection Naming"
  section with examples and multi-server deployment guidance
- **docker-compose.yml**: Added comments explaining collection naming behavior
- **README.md**: Updated semantic search descriptions to clarify
  experimental status, Notes-only support, and infrastructure requirements

## Migration Guide

**For existing single-server deployments:**

Option 1 (Recommended): Use explicit collection name for continuity
```bash
QDRANT_COLLECTION=nextcloud_content  # Keep existing collection
```

Option 2: Allow auto-generation and re-embed
```bash
# Remove QDRANT_COLLECTION override
# New collection will be created based on deployment ID + model
# Requires re-embedding all documents (may take time)
```

**For new multi-server deployments:**

Set unique OTEL service names per server:
```bash
# Server 1
OTEL_SERVICE_NAME=mcp-prod
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
# → Collection: "mcp-prod-nomic-embed-text"

# Server 2
OTEL_SERVICE_NAME=mcp-staging
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
# → Collection: "mcp-staging-nomic-embed-text"
```

## Benefits

- **Safe model switching**: Each model gets its own collection, preventing
  dimension mismatch errors
- **Multi-server support**: Multiple MCP servers can share one Qdrant
  instance without conflicts
- **Clear ownership**: Collection names show which deployment and model owns
  the data
- **Better error messages**: Dimension validation provides actionable
  guidance
- **Backward compatible**: Existing deployments can continue using
  `QDRANT_COLLECTION` override

## Testing

Validated with:
- Single-server deployments (default hostname-based naming)
- Multi-server deployments (OTEL service name-based naming)
- Model switching scenarios (dimension validation)
- Collection override scenarios (backward compatibility)

Next steps: Testing various Ollama embedding models to investigate optimal
chunk sizes and performance characteristics.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-10 01:18:30 +01:00
Chris Coutinho 72232f937a refactor: migrate vector sync from asyncio.Queue to anyio memory object streams
Replace asyncio.Queue with anyio.create_memory_object_stream() throughout
the vector sync system for better library consistency and improved shutdown
semantics.

## Changes Made

**scanner.py**:
- Changed parameter type from `asyncio.Queue` to `MemoryObjectSendStream[DocumentTask]`
- Replaced all `await document_queue.put()` calls with `await send_stream.send()`
- Wrapped scanner loop in `async with send_stream:` context manager for automatic cleanup
- Updated log messages: "Queued" → "Sent"
- Removed `import asyncio` (no longer needed)

**processor.py**:
- Changed parameter type from `asyncio.Queue` to `MemoryObjectReceiveStream[DocumentTask]`
- Replaced `asyncio.wait_for(document_queue.get(), timeout=1.0)` with `anyio.fail_after(1.0)` + `await receive_stream.receive()`
- Removed all `document_queue.task_done()` calls (not needed with streams)
- Added `anyio.EndOfStream` exception handling for graceful shutdown when scanner closes
- Removed `import asyncio` (no longer needed)

**app.py**:
- Removed `import asyncio` from top-level imports
- Added `from anyio.streams.memory import MemoryObjectReceiveStream, MemoryObjectSendStream`
- Updated AppContext dataclass:
  - Replaced `document_queue: Optional[asyncio.Queue]` with:
    - `document_send_stream: Optional[MemoryObjectSendStream]`
    - `document_receive_stream: Optional[MemoryObjectReceiveStream]`
- Updated `app_lifespan_basic()`:
  - Replaced `asyncio.Queue(maxsize=...)` with `anyio.create_memory_object_stream(max_buffer_size=...)`
  - Pass `send_stream` to scanner_task
  - Pass `receive_stream.clone()` to each processor_task (enables multiple consumers)
  - Updated AppContext yield to include both streams
- Updated `starlette_lifespan()`:
  - Same changes as app_lifespan_basic for streamable-http transport
  - Removed `import asyncio as asyncio_module` (no longer needed)
  - Updated app.state storage to use send_stream and receive_stream

**semantic.py**:
- Updated `nc_get_vector_sync_status()` tool:
  - Access `document_receive_stream` instead of `document_queue` from lifespan context
  - Use `stream_stats.current_buffer_used` instead of `queue.qsize()` for pending count
  - More reliable metrics (qsize() was not guaranteed accurate)

## Benefits

1. **Library Consistency**: Pure anyio throughout codebase (was mixing asyncio.Queue with anyio.Event and anyio.create_task_group)
2. **Graceful Shutdown**: `async with send_stream:` automatically closes stream on exit, signaling EndOfStream to all processors
3. **Better Timeout Handling**: `anyio.fail_after()` is more idiomatic than `asyncio.wait_for()`
4. **Stream Cloning**: Easy to add multiple consumers via `receive_stream.clone()`
5. **Better Statistics**: `.statistics()` provides accurate buffer metrics (qsize() was unreliable)
6. **Type Safety**: Separate send/receive types prevent accidental misuse
7. **No task_done() tracking**: Streams handle completion automatically

## Testing

- All 69 unit tests passing
- All 5 smoke tests passing
- No regressions in functionality
- Graceful shutdown behavior improved

## References

- https://anyio.readthedocs.io/en/stable/why.html#queue-fix
- https://anyio.readthedocs.io/en/stable/streams.html#memory-object-streams

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-09 06:43:44 +01:00
Chris Coutinho 4b026e9aa0 feat: implement ADR-009 - refactor semantic search to use generic semantic:read scope
This implements ADR-009, which documents the decision to use a generic
`semantic:read` OAuth scope instead of requiring all app-specific scopes
for semantic search functionality.

Changes:
- Created new `nextcloud_mcp_server/models/semantic.py` with semantic search models
  - SemanticSearchResult (with new doc_type field for multi-app support)
  - SemanticSearchResponse
  - SamplingSearchResponse
  - VectorSyncStatusResponse

- Created new `nextcloud_mcp_server/server/semantic.py` with semantic search tools
  - nc_semantic_search (renamed from nc_notes_semantic_search)
  - nc_semantic_search_answer (renamed from nc_notes_semantic_search_answer)
  - nc_get_vector_sync_status (renamed from nc_notes_get_vector_sync_status)
  - All tools now use @require_scopes("semantic:read") instead of "notes:read"

- Updated `nextcloud_mcp_server/server/notes.py`
  - Removed semantic search tools (moved to semantic.py)
  - Removed semantic search model imports
  - Removed unused MCP imports (ModelHint, ModelPreferences, etc.)

- Updated `nextcloud_mcp_server/models/notes.py`
  - Removed semantic search models (moved to semantic.py)

- Updated `nextcloud_mcp_server/app.py`
  - Import configure_semantic_tools
  - Register semantic tools when VECTOR_SYNC_ENABLED=true

- Updated `nextcloud_mcp_server/server/__init__.py`
  - Export configure_semantic_tools

- Updated tests
  - tests/integration/test_sampling.py: Use new tool names
  - tests/unit/test_response_models.py: Import from semantic.py, add doc_type field

Architecture:
- Semantic search is now a cross-app feature, not tied to Notes
- Uses dual-phase authorization: semantic:read scope + per-document verification
- Supports future multi-app indexing (notes, calendar, deck, files, contacts)

Test results:
- All 69 unit tests passing
- All 5 smoke tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-09 05:53:53 +01:00