Commit Graph

280 Commits

Author SHA1 Message Date
Chris Coutinho 208365cd3d feat: Add OpenAI provider support for embeddings and generation
Adds OpenAI provider to the unified provider architecture (ADR-015),
supporting:
- OpenAI API (api.openai.com)
- GitHub Models API (models.github.ai/inference)
- OpenAI-compatible endpoints (Fireworks, Together, etc.)

Features:
- Embedding support with text-embedding-3-small/large models
- Text generation via chat completions API
- Automatic retry with exponential backoff for rate limits
- Provider auto-detection in registry (priority after Bedrock)

Environment variables:
- OPENAI_API_KEY: API key (required)
- OPENAI_BASE_URL: Base URL override (optional)
- OPENAI_EMBEDDING_MODEL: Embedding model (default: text-embedding-3-small)
- OPENAI_GENERATION_MODEL: Generation model (default: gpt-4o-mini)

Also adds:
- Integration tests for RAG pipeline with MCP sampling
- MCP client sampling support for integration tests
- Ground truth Q&A pairs for Nextcloud User Manual

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-23 00:33:32 +01:00
Chris Coutinho 03b984d5a7 fix(smithery): Enable JSON response format for scanner compatibility
The Smithery scanner was reporting "0 tools" despite the server returning
valid tool definitions. Root cause: the server was returning SSE-formatted
responses (event: message\ndata: {...}) which the scanner couldn't parse.

Changes:
- Add json_response=True to FastMCP for Smithery stateless mode
- Clean up verbose docstring examples in semantic.py and webdav.py

The MCP spec allows both SSE and plain JSON responses for HTTP transport.
Setting json_response=True returns Content-Type: application/json with
plain JSON-RPC instead of text/event-stream with SSE format.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 22:01:18 +01:00
Chris Coutinho ea79e94842 Merge pull request #343 from cbcoutinho/fix/vector-viz-search
perf: Optimize vector viz search performance
2025-11-22 19:53:40 +01:00
Chris Coutinho b0612cfa0f perf: Optimize vector viz search performance
- Replace sequential Qdrant scroll calls with batch retrieve
  (50 HTTP requests → 1 request, ~50x faster vector fetch)

- Add point_id to SearchResult to enable batch retrieval by Qdrant point ID

- Reuse query embedding from search algorithm in viz_routes
  (eliminates redundant embedding call, saves ~30ms)

- Make BM25 encode() async with thread pool to avoid blocking event loop
  (~4.4s was blocking, now properly async)

- Run PCA computation in thread pool to avoid blocking event loop
  (~1.2s was blocking, now properly async)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 19:47:43 +01:00
Chris Coutinho 3b41776110 Merge pull request #342 from cbcoutinho/feature/smithery
feat: Add Smithery stateless deployment support (ADR-016)
2025-11-22 19:39:53 +01:00
Chris Coutinho 39fba49cfe fix(smithery): Add JSON Schema metadata to mcp-config endpoint
Add proper JSON Schema metadata fields per Smithery documentation:
- $schema: JSON Schema draft-07
- $id: Schema identifier URL
- title: Human-readable title
- description: Schema description
- x-query-style: "flat" (no nested objects in our schema)
- additionalProperties: false

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 18:28:05 +01:00
Chris Coutinho 706a15f0bc fix(smithery): Use container runtime pattern for config discovery
ADR-016: For container runtime deployment, Smithery does not auto-generate
the .well-known/mcp-config endpoint like it does for Python CLI runtime.

Changes:
- Remove [tool.smithery] from pyproject.toml (not used in container mode)
- Remove smithery_server.py (Python CLI runtime specific)
- Add .well-known/mcp-config endpoint to return JSON Schema config
- Add SmitheryConfigMiddleware to extract config from URL query params
- Use ContextVar to pass session config to tool handlers

The container runtime passes config as URL query parameters to /mcp:
  GET /mcp?nextcloud_url=...&username=...&app_password=...

Tested:
- All 164 unit tests passing
- Docker container builds successfully
- .well-known/mcp-config returns valid JSON Schema
- Health endpoints working

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 18:22:55 +01:00
Chris Coutinho b8dc413b73 feat: Add Smithery CLI deployment support
- Add smithery package as dependency
- Create smithery_server.py with @smithery.server() decorator
- Add SmitheryConfigSchema for session config (nextcloud_url, username, app_password)
- Add [tool.smithery] section to pyproject.toml
- Remove manual .well-known/mcp-config endpoint (Smithery handles this)

Smithery CLI will automatically:
- Extract config schema from the decorated function
- Handle session config parsing from query parameters
- Make config accessible via ctx.session_config in tools

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 18:05:33 +01:00
Chris Coutinho 8d29ce0122 fix: Add Smithery lifespan and auth mode detection
- Add SmitheryAppContext dataclass for stateless mode
- Add app_lifespan_smithery() with minimal lifespan (no shared state)
- Update is_oauth_mode() to detect Smithery mode and return BasicAuth
- Use Smithery lifespan when SMITHERY_DEPLOYMENT=true
- Add .well-known/mcp-config endpoint for config discovery
- Skip document processors in Smithery mode (not enabled)

Fixes startup issues in Smithery mode where missing env credentials
would incorrectly trigger OAuth mode.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 17:48:53 +01:00
Chris Coutinho f93d650992 feat: Implement ADR-016 Smithery stateless deployment mode
Adds support for Smithery hosted deployment with stateless operation:

- Add DeploymentMode enum with SELF_HOSTED and SMITHERY_STATELESS modes
- Add get_deployment_mode() to detect mode from SMITHERY_DEPLOYMENT env var
- Update get_client() to create per-request clients from session config
- Add conditional tool registration (skip semantic search in Smithery mode)
- Add conditional /app admin UI mounting (skip in Smithery mode)
- Create smithery.yaml with configSchema for user credentials
- Create Dockerfile.smithery for minimal stateless container
- Create smithery_main.py entrypoint for Smithery deployment

In Smithery mode:
- Users provide nextcloud_url, username, app_password via session config
- Each request creates a fresh NextcloudClient (no state between requests)
- Semantic search tools are disabled (no vector database)
- Admin UI (/app) is disabled (no webhooks, vector viz)

All existing self-hosted functionality remains unchanged.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 17:30:42 +01:00
Chris Coutinho 34fd17ba55 fix: Use alpha_composite for proper RGBA highlight blending
Drawing directly with ImageDraw on RGBA mode doesn't blend alpha
properly. Use Image.alpha_composite() with a transparent overlay
to achieve correct semi-transparent highlight fills.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 17:04:29 +01:00
Chris Coutinho 8baa07db84 fix: Remove pymupdf.layout.activate() to fix page_chunks behavior
pymupdf.layout.activate() causes pymupdf4llm.to_markdown() to ignore the
page_chunks=True option, returning a single string instead of list[dict].
This broke per-page chunking needed for semantic search indexing.

See: https://github.com/pymupdf/pymupdf4llm/issues/323

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 16:58:35 +01:00
Chris Coutinho ba8a53803a refactor: Simplify PDF text extraction with single to_markdown call
Replace parallel per-page extraction with single to_markdown(page_chunks=True)
call. This is more efficient as pymupdf4llm can optimize internally for
full-document processing instead of making N separate calls for N pages.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 03:52:02 +01:00
Chris Coutinho 31fade9730 perf: Optimize PDF processing with parallel extraction and single-render highlights
Phase 1 - PDF Highlighting Optimization:
- Render each page ONCE instead of once per chunk (N chunks = 1 render, not N)
- Use PIL to draw bounding boxes on copied base images (fast) instead of
  re-rendering page via pymupdf (slow)
- Add _find_chunk_bbox() to extract bbox without modifying page

Phase 2 - Parallel Page Extraction:
- Use anyio task group with run_sync() for parallel page extraction
- Each page extracted in separate thread via anyio.to_thread.run_sync()
- Event loop stays responsive during extraction
- Remove obsolete _process_sync() method

Expected improvement: 30-50% reduction in total PDF processing time.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 03:11:56 +01:00
Chris Coutinho fffe483c02 fix: Centralize PDF processing and generate separate images per chunk
Previously, pymupdf4llm.to_markdown() was called twice - once in
PyMuPDFProcessor during indexing and again in PDFHighlighter during
visualization. Different image path lengths caused different character
offsets, leading to highlighted pages not matching their chunks.

Also fixed issue where all chunks on the same page showed all highlights
instead of just their own highlight. Now restores original page contents
between chunks using xref stream caching.

Changes:
- Add PDFHighlighter class requiring pre-computed page_boundaries and
  full_text from document processor (no fallback extraction)
- Pass pre-computed data from processor to highlighter
- Extract page-relative portion of chunk text for cross-page chunks
- Add bounding box highlighting using text anchor search
- Run highlight generation in parallel with embedding/BM25
- Cache and restore page contents to isolate highlights per chunk

Results: Highlighting success rate improved from 51% to 95% (121/128).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 02:46:30 +01:00
Chris Coutinho a62a007c87 feat: Add context expansion to semantic search with chunk overlap removal
Implements optional context expansion for semantic search results that
fetches adjacent chunks (N-1 and N+1) from Qdrant to provide before/after
context. Removes configurable chunk overlap (default 200 chars) to avoid
duplicate text appearing in both context and excerpt.

Key changes:
- Add include_context and context_chars parameters to nc_semantic_search
  and nc_semantic_search_answer tools
- Implement Qdrant cache fast path for chunk retrieval (avoids re-fetching
  and re-parsing documents, especially important for PDFs)
- Add _get_chunk_by_index_from_qdrant() to fetch adjacent chunks
- Remove chunk overlap from before_context (last N chars) and after_context
  (first N chars) to prevent duplicate text
- Fetch context in parallel with anyio.Semaphore (max 20 concurrent)
- Pass through page_number from SearchResult to SemanticSearchResult
- Remove document-level deduplication (keep chunk-level dedup from algorithm)

Context expansion is opt-in via include_context=true parameter. When enabled:
- Populates has_context_expansion, marked_text, before_context, after_context
- Adds truncation flags when context exceeds context_chars limit
- Falls back to document fetch for legacy data with truncated excerpts

Related: nextcloud_mcp_server/search/context.py:87-382,
         nextcloud_mcp_server/server/semantic.py:161-255
2025-11-21 01:02:22 +01:00
Chris Coutinho 5a251a99e6 fix: Set is_placeholder=False in processor to fix search filtering
The processor was not setting is_placeholder field when writing real
document chunks to Qdrant. This caused the placeholder filter to exclude
all documents (since None != False), resulting in 0 search results.

Now explicitly sets is_placeholder: False in payload when writing real
indexed chunks, allowing search filters to correctly distinguish between
placeholders and real documents.
2025-11-20 17:15:19 +01:00
Chris Coutinho 25ef33de7f feat: Use Ollama native batch API in embed_batch()
- Switch from sequential loop to /api/embed batch endpoint
- Use 'input' array parameter instead of individual 'prompt' requests
- Process in chunks of 32 to avoid quality degradation (issue #6262)
- Reduces HTTP overhead: 128 texts = 4 requests instead of 128
- Maintains backward compatibility with embed() for single embeddings

Ref: ollama/ollama#6262
2025-11-20 16:50:13 +01:00
Chris Coutinho ec2c274cd9 fix: Increase placeholder staleness threshold to 5x scan interval
- Changed from 2x (120s) to 5x (300s) scan interval
- Large PDFs take 3-4 minutes to process, need longer threshold
- Prevents premature requeuing of in-flight documents
2025-11-20 15:36:49 +01:00
Chris Coutinho 47f0b3db9a fix: Add placeholder staleness check to prevent duplicate processing
- Only requeue documents if placeholder is older than 2x scan interval (120s default)
- Prevents scanner from immediately requeuing in-flight documents
- Fixes issue where PDFs were being reprocessed every 60 seconds
- Staleness check applied to both notes and files scanning logic
2025-11-20 15:30:10 +01:00
Chris Coutinho 233de3508f fix: Use empty SparseVector instead of None for placeholders
Qdrant validation rejects None for sparse vectors in named vector dicts.
Use models.SparseVector(indices=[], values=[]) instead to create valid
empty sparse vectors for placeholder points.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 15:15:10 +01:00
Chris Coutinho 13b2d0048c feat: Implement Qdrant placeholder state management
Introduces a placeholder-based state tracking system to prevent duplicate
document processing during the gap between scanner queuing and processor
completion.

**Key Changes:**

1. **Placeholder Helper Functions** (`vector/placeholder.py`):
   - `write_placeholder_point()` - Creates zero-vector placeholder when queuing
   - `query_document_metadata()` - Queries for existing entry (placeholder or real)
   - `delete_placeholder_point()` - Removes placeholder before writing real vectors
   - `get_placeholder_filter()` - Filters placeholders from user-facing queries

2. **Scanner Updates** (`vector/scanner.py`):
   - Replace `indexed_at` comparison with `modified_at` comparison
   - Write placeholder before queuing each document
   - Query per-document metadata instead of bulk-querying indexed_at
   - Fixes bug where files were resubmitted every scan cycle

3. **Processor Updates** (`vector/processor.py`):
   - Delete placeholder before upserting real vectors
   - Ensures no duplicate points in Qdrant

4. **Query Filters** (all search files):
   - Add `get_placeholder_filter()` to all user-facing queries
   - Ensures placeholders never appear in search results or visualizations
   - Applied to: bm25_hybrid.py, semantic.py, viz_routes.py, algorithms.py

**Architecture:**
- Placeholders use zero vectors with dimension from embedding service
- Payload includes `is_placeholder: True` flag for filtering
- Status field tracks: "pending", "processing", "completed", "failed"
- Deterministic UUIDs using uuid5 for consistent point IDs

**Impact:**
- Eliminates duplicate processing of same documents
- Fixes race condition where long-running documents get queued multiple times
- Prevents scanner from resubmitting files every scan cycle
- Maintains clean separation between in-flight and indexed documents

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 15:04:00 +01:00
Chris Coutinho 944dd760ca fix: Return empty array instead of null for query_coords when no results
When vector visualization search returns zero results, the code was returning
query_coords: null, which caused JavaScript error "can't access property 0,
queryCoords is null" when the frontend tried to access the array.

Changed to return empty array [] to match expected type and prevent crash.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 14:18:02 +01:00
Chris Coutinho d67aa6ae5c fix: Align PDF text extraction between indexing and context expansion
This commit fixes two critical issues with PDF processing:

1. **Text extraction mismatch (context expansion bug)**:
   - Indexing used pymupdf4llm.to_markdown() producing markdown text
   - Context expansion used page.get_text() producing plain text
   - Different text formats caused character offset misalignment
   - Search would find correct chunk, but expansion showed wrong section
   - Fixed by making context.py use pymupdf4llm.to_markdown() consistently

2. **Diagnostic logging for page number assignment**:
   - Added logging to verify page_boundaries exist in metadata
   - Added logging to verify assign_page_numbers() assigns values
   - Helps diagnose why page numbers show as null in search results

3. **mime_type storage bug**:
   - Fixed incorrect field reference in processor.py:405
   - Was using file_metadata.get("content_type", "")
   - Should use content_type from WebDAV response

Changes:
- nextcloud_mcp_server/search/context.py: Use pymupdf4llm.to_markdown()
  for PDF text extraction to match indexing method
- nextcloud_mcp_server/vector/processor.py: Add diagnostic logging for
  page boundaries and assignment, fix mime_type storage
- tests/unit/client/test_webdav.py: Fix import sorting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 13:57:50 +01:00
Chris Coutinho f1a5fac1b9 fix: Update models and viz to use int-only doc_id
- algorithms.py: Revert SearchResult.id to int (all docs use int IDs now)
- semantic.py: Revert SemanticSearchResult.id to int, remove Union import
- viz_routes.py: Remove str() conversion when querying doc_id from Qdrant
- viz_routes.py: Convert doc_id from query param to int in chunk context

Fixes vector visualization which was collapsing all chunks to a single
point because Qdrant queries were failing to match doc_id (string vs int).
2025-11-20 12:32:27 +01:00
Chris Coutinho d0691d5aa0 feat: Switch files to use numeric IDs with file_path resolution
- scanner.py: Use file_info['id'] as doc_id instead of file_path
- scanner.py: Pass file_path in DocumentTask for content retrieval
- processor.py: Store file_path in Qdrant payload for later lookup
- context.py: Add _get_file_path_from_qdrant() to resolve file_id → file_path
- context.py: Update get_chunk_with_context() to handle file ID resolution

This makes the system resilient to file renames since file IDs are stable
identifiers in Nextcloud, while file paths can change.
2025-11-20 12:00:47 +01:00
Chris Coutinho f1610bbd2e fix: Reconstruct full content for notes to match indexed offsets
Notes are indexed as "{title}\n\n{content}" in processor.py but were
being retrieved as just content during chunk expansion, causing
chunk_start_offset and chunk_end_offset to be misaligned.

This fix reconstructs the full content structure when fetching notes
for chunk expansion, ensuring the displayed chunks match the excerpts
shown in search results.

Fixes chunk/excerpt mismatch reported in vector visualization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 11:33:12 +01:00
Chris Coutinho 327d843f64 feat: Implement per-chunk vector visualization with context expansion
Major improvements to vector visualization page:
- Refactor PCA to display individual chunks instead of averaged documents
- Add context expansion module for fetching surrounding text from notes and PDFs
- Update deduplication to use (doc_id, doc_type, chunk_start, chunk_end) keys
- Fix Alpine.js rendering with chunk-specific keys including offsets
- Refactor authentication helper to return NextcloudClient for better reuse
- Add async context manager support to NextcloudClient

Technical details:
- viz_routes.py: Fetch specific chunk vectors instead of averaging per document
- context.py: New module supporting both notes and PDF text extraction via PyMuPDF
- search algorithms: Extract page_number, chunk_index, total_chunks from Qdrant
- vector-viz.js/html: Use chunk positions in expansion tracking keys

This enables users to see which specific chunks match their query
and view them with surrounding context in the PCA visualization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 11:22:20 +01:00
Chris Coutinho b8010270c1 fix: Add async/await, PDF metadata, and type safety fixes
This commit addresses multiple issues with async operations, PDF metadata
extraction, and type safety in document processing and search.

## Async/Await Fixes
- processor.py:259 - Added await for chunker.chunk_text(content)
- processor.py:270 - Added await for bm25_service.encode_batch(chunk_texts)
- tests/unit/test_document_chunker.py - Converted all 12 test methods to async

## PDF Metadata Enhancement
- pymupdf.py:143 - Added file_size metadata extraction
- pymupdf.py:145-206 - Refactored to extract text page-by-page
  - Manually loop through pages instead of using page_chunks=True
  - Generate page_boundaries metadata for precise page tracking
  - Works around pymupdf.layout.activate() breaking page_chunks=True
- processor.py:32-66 - Added assign_page_numbers() helper function
  - Assigns page numbers to chunks based on overlap with page boundaries
  - Handles chunks spanning multiple pages
- processor.py:298-300 - Call assign_page_numbers() for PDF files

## Type Safety Fixes
- bm25_hybrid.py:184 - Removed int() conversion of doc_id
- semantic.py:131 - Removed int() conversion of doc_id
- viz_routes.py:275 - Removed int() conversion of doc_id
- Added comments documenting that doc_id can be int (notes) or str (file paths)

## Testing
- All 18 tests passing (12 unit + 6 integration)
- No type errors in modified files
- Container logs show successful processing
- Vector viz searches working correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 02:37:07 +01:00
Chris Coutinho c4ce28f05d fix: Improve 3D plot rendering with explicit dimensions and window resize support
- Get container dimensions before creating Plotly layout to render at correct size immediately
- Add init() method with window resize listener for responsive plot sizing
- Remove post-render resize call (no longer needed with explicit dimensions)
- Improve colorbar positioning and scene domain configuration

This eliminates the visual "jump" during initial render and ensures the plot resizes smoothly when the browser window changes size.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 19:43:20 +01:00
Chris Coutinho c126c3ec03 fix: Preserve 3D plot camera and improve documentation
This commit addresses PR feedback and fixes plot camera behavior.

## JavaScript Fix - Camera Preservation
- Changed plot update strategy from recreating layout to using Plotly.restyle()
- Query point visibility now toggles via restyle() which only modifies trace visibility
- Camera position/zoom naturally preserved since layout remains untouched
- Resolves jumpy plot behavior when toggling "Show Query Point" checkbox

Related: nextcloud_mcp_server/auth/static/vector-viz.js:58-73

## Documentation Improvements
- Condensed vector-sync-ui.md from 316 to 94 lines (~70% reduction)
- Removed redundant FAQ section (content merged into main sections)
- Simplified use cases from 4 detailed sections to 3 focused paragraphs
- Streamlined troubleshooting to 3 common issues
- Merged technical details into overview section
- Retained all essential information while improving readability

## Screenshot Updates
Removed old/outdated images (5 files):
- rag-workflow-bidirectional-final.png
- rag-workflow-prominent-llm.png
- rag-workflow-simple-final.png
- vector-viz-interface.png
- welcome-page.png

Replaced with current screenshots (3 files):
- vector-viz-document-types-2col.png - Now shows plot + results
- vector-viz-chunk-context.png - Centered content view
- vector-viz-results.png - Updated results list

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 14:10:53 +01:00
Chris Coutinho 9bd02d7ef7 fix: Preserve 3D plot camera position and fix CSS loading
Two fixes for the vector visualization page:

1. **CSS Loading Fix**: Moved CSS <link> from vector_viz.html fragment
   to user_info.html <head> block. HTMX fragments don't process <link>
   tags in <head>, causing unstyled page. Now CSS loads correctly.

2. **Camera Preservation**: Modified renderPlot() to preserve camera
   position when toggling query point visibility. Previously, toggling
   the "Show Query Point" checkbox would reset zoom/rotation to default.
   Now reads existing camera settings from plot before updating.

Related: nextcloud_mcp_server/auth/static/vector-viz.js:123-130
Related: nextcloud_mcp_server/auth/templates/user_info.html:12

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 13:51:08 +01:00
Chris Coutinho 53689d076b feat: Improve vector visualization with static assets and fixes
- Extract CSS and JavaScript into separate static files
  - Created nextcloud_mcp_server/auth/static/vector-viz.css
  - Created nextcloud_mcp_server/auth/static/vector-viz.js
  - Updated templates to reference external assets

- Fix vector visualization issues:
  - Normalize vectors before PCA to match Qdrant's cosine distance
  - Add zero-norm and NaN detection/handling for large datasets
  - Enable responsive Plotly sizing (autosize + responsive config)
  - Widen plot area to full viewport width with minimized margins

- Improve visualization accuracy:
  - Query point now positioned correctly relative to documents
  - Handles 200+ points without JSON serialization errors
  - Full-width plot maximizes screen space utilization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 04:10:44 +01:00
Chris Coutinho 9db20a4d01 feat: Redesign UI to match Nextcloud ecosystem aesthetic
This commit updates the web interface to better align with Nextcloud's
design system and improve the Vector Viz layout.

Changes:
- Replace emoji icons with Material Design SVG icons for better
  consistency with Nextcloud apps
- Simplify navigation styling with minimal padding and subtle active
  states (250px width)
- Update CSS variables to match Nextcloud design system
- Restructure Vector Viz from two-column to single-column vertical
  layout for better plot visibility
- Move search controls to compact horizontal grid at top
- Make navigation toggle always visible (not just on mobile)
- Fix plot container sizing with overflow:visible to prevent colorbar
  clipping
- Remove heavy shadows and custom card styling for cleaner aesthetic
- Add error and success page templates with consistent styling

Technical details:
- Preserve Alpine.js for reactive functionality
- Use CSS Grid for responsive horizontal controls layout
- Add smooth transitions for navigation collapse/expand
- Maintain HTMX for dynamic content loading

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 00:45:19 +01:00
Chris Coutinho eec923eff5 feat: Replace custom document chunker with LangChain MarkdownTextSplitter
Migrates from custom word-based chunking to LangChain's MarkdownTextSplitter
for better semantic search quality. This implements the chunking portion of
ADR-011.

Changes:
- Replace custom regex word chunker with MarkdownTextSplitter
- Optimized for Markdown content (headers, code blocks, lists)
- Convert from word-based (512 words) to character-based (2048 chars) chunking
- Maintain backward-compatible ChunkWithPosition interface
- Update configuration defaults and validation
- Update all unit tests (12/12 passing)

Benefits:
- Respects markdown structure boundaries
- Never breaks code blocks or headers mid-chunk
- Preserves semantic coherence within chunks
- Expected 20-30% improvement in recall quality
- Industry-standard approach (used by production RAG systems)

Note: Full reindex required to apply new chunking to existing documents.
Current vector database still contains old word-based chunks.

Related: ADR-011 (Improving Semantic Search Quality)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-18 12:17:23 +01:00
Chris Coutinho d374bfa1e5 feat(viz): Add dual-score display and improve UI controls
This commit enhances the vector visualization interface with better score
transparency and improved UX:

**Dual-Score Display:**
- Store original algorithm scores before normalization (viz_routes.py:203)
- Display both raw and normalized scores: "Raw Score: 0.842 (89% relative)"
- Update plot hover text with dual scores (userinfo_routes.py:740)
- Fixes issue where all queries showed at least one 100% match regardless
  of actual relevance (normalization artifact)

**UI Improvements:**
1. Fusion Method dropdown: Changed from x-show to :disabled
   - Prevents jarring layout shift when switching algorithms
   - Dropdown stays visible but grayed out when Semantic is selected
   - Better UX with opacity: 0.5 and cursor: not-allowed

2. Score Threshold: Changed step from 0.1 to "any"
   - Allows arbitrary float precision (0.7, 0.85, 0.123)
   - Users can now fine-tune threshold values

3. Document Types: Converted multi-select to checkbox grid
   - Replaced clunky Ctrl/Cmd multi-select listbox
   - Checkbox grid with cleaner layout
   - Positioned left of Score Threshold and Result Limit inputs
   - More intuitive UX

**Technical Details:**
- Raw score ranges vary by algorithm:
  - Semantic: 0.0-1.0 (cosine similarity)
  - BM25 RRF: ~0.001-0.033 (Reciprocal Rank Fusion)
  - BM25 DBSF: Can exceed 1.0 (Distribution-Based Score Fusion)
- Normalized scores (0-1) used for visual encoding (marker size, color)
- Original scores preserved in API response via getattr fallback

Files modified:
- nextcloud_mcp_server/auth/viz_routes.py (store original_score)
- nextcloud_mcp_server/auth/templates/vector_viz.html (UI controls)
- nextcloud_mcp_server/auth/userinfo_routes.py (plot hover text)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 08:05:49 +01:00
Chris Coutinho 3aa7128f45 feat: add chunk position tracking to vector indexing and search
Track character offsets (start_offset, end_offset) for each chunk in vector
database metadata, enabling precise chunk highlighting in visualization pane.

Changes:
- processor.py: Store chunk_start_offset and chunk_end_offset in Qdrant metadata
- processor.py: Added metadata_version=2 to indicate position tracking support
- search/semantic.py: Return chunk positions from search results
- server/semantic.py: Expose chunk positions in API responses (SemanticSearchResult)

Enables viz pane to:
1. Display exact matched chunk with surrounding context
2. Highlight the precise portion of text that matched the query
3. Build user trust by showing what the RAG system actually retrieved

Position tracking uses ChunkWithPosition dataclass from document_chunker.py
which provides character-accurate offsets in the original document.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 06:47:58 +01:00
Chris Coutinho c3282534eb feat: add vector viz template and chunk context endpoint
Extracted vector visualization HTML template to separate file to resolve
syntax conflicts between Jinja2, Alpine.js, and CSS. Added chunk context
endpoint for fetching matched chunks with surrounding text.

Changes:
- Moved vector_viz.html to templates/ directory (separates Jinja2/Alpine.js/CSS)
- Added /app/chunk-context endpoint for retrieving chunk text with context
- Updated .dockerignore to include HTML files in Docker builds
- Moved anthropic and boto3 to main dependencies (needed for production features)
- Added jinja2 dependency for template rendering

Fixes Jinja2 TemplateSyntaxError caused by CSS colons being parsed as
Jinja2 syntax when template was inline in Python code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 06:46:52 +01:00
Chris Coutinho 862308418e fix: prevent infinite loop in DocumentChunker with position tracking
Fixed a critical infinite loop bug in document_chunker.py that occurred
when the overlap parameter caused the chunker to not make forward progress.

Changes:
- Added ChunkWithPosition dataclass to track character positions
- Refactored chunk_text() to use regex word matching for accurate position tracking
- Added safety check to ensure forward progress (next_start_idx > start_idx)
- Changed return type from list[str] to list[ChunkWithPosition]

The bug manifested when:
1. end_idx reached len(word_matches) (processing last chunk)
2. next_start_idx = end_idx - overlap would not advance past start_idx
3. Loop would continue indefinitely without making progress

Fix ensures chunker always terminates by breaking when not advancing.

All 9 unit tests now pass in 1.66s (previously timing out at 180s).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 06:39:15 +01:00
Chris Coutinho 3464b21845 fix: Relax SearchResult validation to support DBSF fusion scores > 1.0
Fix false-positive validation error where DBSF (Distribution-Based Score
Fusion) correctly produces scores > 1.0 but SearchResult validation
incorrectly rejected them.

**Root Cause**: SearchResult.__post_init__() enforced scores in [0.0, 1.0]
range, but DBSF sums normalized scores from multiple retrieval systems
(dense semantic + sparse BM25), resulting in scores like 1.55 when both
systems strongly agree a document is relevant.

**Changes**:
- Relaxed validation to allow any score ≥ 0.0 (algorithms.py:147-157)
- Updated SearchResult and SemanticSearchResult documentation to explain
  score ranges for RRF ([0.0, 1.0]) vs DBSF (unbounded)
- Added comprehensive test coverage for both fusion methods
- Added DBSF fusion option to vector visualization UI
- Updated viz routes and vizApp() to support fusion parameter selection

**Testing**: All 157 unit tests pass, type checking passes, ruff passes

Fixes error: "Configuration error: Score must be between 0.0 and 1.0, got 1.1528953"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 06:32:30 +01:00
Chris Coutinho 1504df6fb5 Merge branch 'master' into feature/bedrock 2025-11-16 12:08:23 +01:00
Chris Coutinho c28fc955ca Merge origin/master into feature/bm25
Resolved conflicts:
- viz_routes.py: Kept bm25's extract_dense_vector() function for robust vector handling
- hybrid.py: Removed (bm25 uses native Qdrant RRF fusion instead)
- uv.lock: Regenerated after accepting master's dependencies

This merge brings in:
- RAG evaluation framework (ADR-013)
- Performance optimizations (double-fetch elimination)
- Migration from asyncio to anyio
- OpenTelemetry tracing improvements
- Notes app enhancements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 11:52:40 +01:00
Chris Coutinho ad4b45889f fix: suppress Starlette middleware type warnings in ty checker 2025-11-16 11:43:50 +01:00
Chris Coutinho 5b484c9226 feat: add unified provider architecture with Amazon Bedrock support
Refactored LLM provider infrastructure to support sustainable additions of new providers with both embedding and text generation capabilities.

## Major Changes

### Unified Provider Architecture (ADR-015)
- Created `nextcloud_mcp_server/providers/` with unified Provider ABC
- Providers now support optional capabilities (embeddings and/or generation)
- Auto-detection registry with priority: Bedrock → Ollama → Simple
- Backward compatible - existing code continues to work

### New Providers
- **BedrockProvider**: Full Amazon Bedrock integration
  - Embeddings: Titan Embed, Cohere Embed models
  - Generation: Claude, Llama, Titan Text, Mistral models
  - Model-specific request/response handling
  - AWS credential chain integration
- **OllamaProvider**: Migrated with both capabilities support
- **AnthropicProvider**: Moved from test code to production providers
- **SimpleProvider**: Migrated in-memory fallback provider

### Breaking Changes
None - full backward compatibility maintained:
- `embedding.get_embedding_service()` still works
- RAG evaluation tests updated to use unified providers
- All existing tests pass (127 unit tests)

### Testing
- Added 9 comprehensive Bedrock unit tests with mocked boto3
- All existing unit tests pass
- Type checking (ty) and linting (ruff) pass
- Verified backward compatibility

### Documentation
- `docs/ADR-015-unified-provider-architecture.md`: Comprehensive ADR
- `docs/bedrock-setup.md`: AWS setup guide with IAM permissions
- `CLAUDE.md`: Updated with provider architecture section

### Dependencies
- Added `boto3>=1.35.0` to dev dependencies (optional)

## Environment Variables

### Bedrock
- `AWS_REGION`: AWS region (e.g., "us-east-1")
- `BEDROCK_EMBEDDING_MODEL`: Model ID for embeddings
- `BEDROCK_GENERATION_MODEL`: Model ID for generation
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`: Optional credentials

### Ollama
- `OLLAMA_BASE_URL`: API URL
- `OLLAMA_EMBEDDING_MODEL`: Embedding model (default: "nomic-embed-text")
- `OLLAMA_GENERATION_MODEL`: Generation model

## AWS Bedrock Permissions Required

Minimal IAM policy:
```json
{
  "Effect": "Allow",
  "Action": ["bedrock:InvokeModel"],
  "Resource": ["arn:aws:bedrock:*::foundation-model/*"]
}
```

See `docs/bedrock-setup.md` for detailed setup instructions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 11:36:58 +01:00
Chris Coutinho 8799450c7d Merge pull request #306 from cbcoutinho/rag-evaluation
feat: RAG evaluation framework with performance improvements
2025-11-16 11:17:41 +01:00
Chris Coutinho c4bf077050 feat: Add OpenTelemetry tracing to @instrument_tool decorator
Enhances the @instrument_tool decorator to create distributed traces
for all MCP tool executions, improving observability and debugging.

Changes:
- Modified @instrument_tool to wrap tool execution in trace_operation
- Added automatic span creation with mcp.tool.* span names
- Sanitized tool arguments before adding to span attributes
  (excludes password, token, secret, api_key, etag, ctx)
- Limited argument strings to 500 characters to prevent huge spans
- Maintained existing Prometheus metrics functionality
- Updated docs/observability.md to reflect correct decorator name
- Added comprehensive unit tests

All ~50+ MCP tools now emit traces automatically without code changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 11:16:05 +01:00
Chris Coutinho f559ca049e Merge branch 'rag-evaluation' 2025-11-16 10:26:19 +01:00
Chris Coutinho 02700a8e2c perf: Eliminate double-fetching in semantic search sampling
Performance optimization that removes redundant verification step and
makes content fetching parallel in nc_semantic_search_answer tool.

Changes:
- Remove verification.py module (only had 1 caller)
- Refactor nc_semantic_search to do inline deduplication instead of
  calling verify_search_results()
- Migrate verification patterns (anyio task group, semaphore limiting)
  to nc_semantic_search_answer's content fetching
- Change content fetching from sequential loop to parallel execution

Performance impact:
- Before: 10 API calls (5 parallel verification + 5 sequential content)
  = ~5.5s overhead
- After: 5 API calls (parallel content fetch) = ~0.5s overhead
- Result: 50% fewer API calls, ~10x faster for sampling operations

Technical details:
- Uses anyio.create_task_group() for structured concurrency
- Semaphore limiting (max_concurrent=20) prevents connection pool exhaustion
- Index-based storage maintains result ordering
- Expected failures (deleted notes) logged at debug level
- Deduplication handles hybrid search returning same doc from dense + sparse

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 10:25:04 +01:00
Chris Coutinho 1faf572546 Merge branch 'feature/bm25'
Resolves conflict in viz_routes.py by combining:
- Named vector extraction from feature/bm25
- Performance timing from master
2025-11-16 08:18:39 +01:00
Chris Coutinho 944b6dcf5a fix: Handle named vectors in visualization and semantic search
- viz_routes.py: Extract "dense" vector from named vector dict
- semantic.py: Specify using="dense" for BM25 hybrid collections
- Fixes "X must be 2D array" error in hybrid search
- Fixes "Dense vector  is not found" error in semantic search

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 08:16:35 +01:00