Adds comprehensive vector search support for Nextcloud Deck cards,
including semantic search indexing, chunk preview in the vector viz UI,
and proper deep linking to cards.
**Vector Search Indexing**
- Add deck_card scanning in scanner.py (scan_deck_cards function)
- Index cards from non-archived, non-deleted boards
- Store metadata: board_id, board_title, stack_id, stack_title, card_type, duedate, owner
- Content structure: title + "\n\n" + description (matches indexing format)
- Incremental sync based on lastModified timestamp
- Deletion tracking with grace period
**Vector Visualization Support**
- Add deck_card handler in context.py for chunk preview expansion
- Include board_id in search result metadata (bm25_hybrid.py, semantic.py)
- Expose metadata in viz_routes.py JSON responses
- Update vector-viz.js to construct proper Deck URLs: /apps/deck/board/{board_id}/card/{card_id}
- Update vector_viz.html filter label from "Deck" to "Deck Cards"
**Bug Fixes**
- Skip soft-deleted boards (deletedAt > 0) to prevent 403 Forbidden errors
- Applies to scanner, processor, and context expansion code paths
- Deck API returns deleted boards but rejects stack access with 403
**Testing**
- Add integration tests in test_deck_vector_search.py:
- test_deck_card_semantic_search: Filtered search with doc_type="deck_card"
- test_deck_card_appears_in_cross_app_search: Cross-app search includes deck cards
- test_deck_card_chunk_context: Chunk context fetching for viz preview
**Documentation**
- Update README.md: Add Deck cards to semantic search feature list
- Update semantic-search-architecture.md: Document deck_card support
- Update nc_semantic_search tool documentation
**Type Safety**
- Fix type narrowing for page_boundaries (could be None) using cast()
- Fix scanner.py payload None check for type safety
Resolves vector search for Deck cards across indexing, search, and visualization.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add support for news_item document type in the vector visualization page:
- Add "News" checkbox to document type filter options
- Add URL handler to link news items to /apps/news/item/{id}
- Add content fetching for news items in chunk context expansion
This enables users to search and view news articles in the vector
visualization, with clickable links back to Nextcloud News and the
ability to expand chunks to see full article context.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, pymupdf4llm.to_markdown() was called twice - once in
PyMuPDFProcessor during indexing and again in PDFHighlighter during
visualization. Different image path lengths caused different character
offsets, leading to highlighted pages not matching their chunks.
Also fixed issue where all chunks on the same page showed all highlights
instead of just their own highlight. Now restores original page contents
between chunks using xref stream caching.
Changes:
- Add PDFHighlighter class requiring pre-computed page_boundaries and
full_text from document processor (no fallback extraction)
- Pass pre-computed data from processor to highlighter
- Extract page-relative portion of chunk text for cross-page chunks
- Add bounding box highlighting using text anchor search
- Run highlight generation in parallel with embedding/BM25
- Cache and restore page contents to isolate highlights per chunk
Results: Highlighting success rate improved from 51% to 95% (121/128).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implements optional context expansion for semantic search results that
fetches adjacent chunks (N-1 and N+1) from Qdrant to provide before/after
context. Removes configurable chunk overlap (default 200 chars) to avoid
duplicate text appearing in both context and excerpt.
Key changes:
- Add include_context and context_chars parameters to nc_semantic_search
and nc_semantic_search_answer tools
- Implement Qdrant cache fast path for chunk retrieval (avoids re-fetching
and re-parsing documents, especially important for PDFs)
- Add _get_chunk_by_index_from_qdrant() to fetch adjacent chunks
- Remove chunk overlap from before_context (last N chars) and after_context
(first N chars) to prevent duplicate text
- Fetch context in parallel with anyio.Semaphore (max 20 concurrent)
- Pass through page_number from SearchResult to SemanticSearchResult
- Remove document-level deduplication (keep chunk-level dedup from algorithm)
Context expansion is opt-in via include_context=true parameter. When enabled:
- Populates has_context_expansion, marked_text, before_context, after_context
- Adds truncation flags when context exceeds context_chars limit
- Falls back to document fetch for legacy data with truncated excerpts
Related: nextcloud_mcp_server/search/context.py:87-382,
nextcloud_mcp_server/server/semantic.py:161-255
Major improvements to vector visualization page:
- Refactor PCA to display individual chunks instead of averaged documents
- Add context expansion module for fetching surrounding text from notes and PDFs
- Update deduplication to use (doc_id, doc_type, chunk_start, chunk_end) keys
- Fix Alpine.js rendering with chunk-specific keys including offsets
- Refactor authentication helper to return NextcloudClient for better reuse
- Add async context manager support to NextcloudClient
Technical details:
- viz_routes.py: Fetch specific chunk vectors instead of averaging per document
- context.py: New module supporting both notes and PDF text extraction via PyMuPDF
- search algorithms: Extract page_number, chunk_index, total_chunks from Qdrant
- vector-viz.js/html: Use chunk positions in expansion tracking keys
This enables users to see which specific chunks match their query
and view them with surrounding context in the PCA visualization.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Two fixes for the vector visualization page:
1. **CSS Loading Fix**: Moved CSS <link> from vector_viz.html fragment
to user_info.html <head> block. HTMX fragments don't process <link>
tags in <head>, causing unstyled page. Now CSS loads correctly.
2. **Camera Preservation**: Modified renderPlot() to preserve camera
position when toggling query point visibility. Previously, toggling
the "Show Query Point" checkbox would reset zoom/rotation to default.
Now reads existing camera settings from plot before updating.
Related: nextcloud_mcp_server/auth/static/vector-viz.js:123-130
Related: nextcloud_mcp_server/auth/templates/user_info.html:12
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Extract CSS and JavaScript into separate static files
- Created nextcloud_mcp_server/auth/static/vector-viz.css
- Created nextcloud_mcp_server/auth/static/vector-viz.js
- Updated templates to reference external assets
- Fix vector visualization issues:
- Normalize vectors before PCA to match Qdrant's cosine distance
- Add zero-norm and NaN detection/handling for large datasets
- Enable responsive Plotly sizing (autosize + responsive config)
- Widen plot area to full viewport width with minimized margins
- Improve visualization accuracy:
- Query point now positioned correctly relative to documents
- Handles 200+ points without JSON serialization errors
- Full-width plot maximizes screen space utilization
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit updates the web interface to better align with Nextcloud's
design system and improve the Vector Viz layout.
Changes:
- Replace emoji icons with Material Design SVG icons for better
consistency with Nextcloud apps
- Simplify navigation styling with minimal padding and subtle active
states (250px width)
- Update CSS variables to match Nextcloud design system
- Restructure Vector Viz from two-column to single-column vertical
layout for better plot visibility
- Move search controls to compact horizontal grid at top
- Make navigation toggle always visible (not just on mobile)
- Fix plot container sizing with overflow:visible to prevent colorbar
clipping
- Remove heavy shadows and custom card styling for cleaner aesthetic
- Add error and success page templates with consistent styling
Technical details:
- Preserve Alpine.js for reactive functionality
- Use CSS Grid for responsive horizontal controls layout
- Add smooth transitions for navigation collapse/expand
- Maintain HTMX for dynamic content loading
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit enhances the vector visualization interface with better score
transparency and improved UX:
**Dual-Score Display:**
- Store original algorithm scores before normalization (viz_routes.py:203)
- Display both raw and normalized scores: "Raw Score: 0.842 (89% relative)"
- Update plot hover text with dual scores (userinfo_routes.py:740)
- Fixes issue where all queries showed at least one 100% match regardless
of actual relevance (normalization artifact)
**UI Improvements:**
1. Fusion Method dropdown: Changed from x-show to :disabled
- Prevents jarring layout shift when switching algorithms
- Dropdown stays visible but grayed out when Semantic is selected
- Better UX with opacity: 0.5 and cursor: not-allowed
2. Score Threshold: Changed step from 0.1 to "any"
- Allows arbitrary float precision (0.7, 0.85, 0.123)
- Users can now fine-tune threshold values
3. Document Types: Converted multi-select to checkbox grid
- Replaced clunky Ctrl/Cmd multi-select listbox
- Checkbox grid with cleaner layout
- Positioned left of Score Threshold and Result Limit inputs
- More intuitive UX
**Technical Details:**
- Raw score ranges vary by algorithm:
- Semantic: 0.0-1.0 (cosine similarity)
- BM25 RRF: ~0.001-0.033 (Reciprocal Rank Fusion)
- BM25 DBSF: Can exceed 1.0 (Distribution-Based Score Fusion)
- Normalized scores (0-1) used for visual encoding (marker size, color)
- Original scores preserved in API response via getattr fallback
Files modified:
- nextcloud_mcp_server/auth/viz_routes.py (store original_score)
- nextcloud_mcp_server/auth/templates/vector_viz.html (UI controls)
- nextcloud_mcp_server/auth/userinfo_routes.py (plot hover text)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fix false-positive validation error where DBSF (Distribution-Based Score
Fusion) correctly produces scores > 1.0 but SearchResult validation
incorrectly rejected them.
**Root Cause**: SearchResult.__post_init__() enforced scores in [0.0, 1.0]
range, but DBSF sums normalized scores from multiple retrieval systems
(dense semantic + sparse BM25), resulting in scores like 1.55 when both
systems strongly agree a document is relevant.
**Changes**:
- Relaxed validation to allow any score ≥ 0.0 (algorithms.py:147-157)
- Updated SearchResult and SemanticSearchResult documentation to explain
score ranges for RRF ([0.0, 1.0]) vs DBSF (unbounded)
- Added comprehensive test coverage for both fusion methods
- Added DBSF fusion option to vector visualization UI
- Updated viz routes and vizApp() to support fusion parameter selection
**Testing**: All 157 unit tests pass, type checking passes, ruff passes
Fixes error: "Configuration error: Score must be between 0.0 and 1.0, got 1.1528953"
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>