nextcloud-mcp-server

Author	SHA1	Message	Date
Chris Coutinho	20404cf3f2	feat(vector): add Deck card vector search with visualization support Adds comprehensive vector search support for Nextcloud Deck cards, including semantic search indexing, chunk preview in the vector viz UI, and proper deep linking to cards. Vector Search Indexing - Add deck_card scanning in scanner.py (scan_deck_cards function) - Index cards from non-archived, non-deleted boards - Store metadata: board_id, board_title, stack_id, stack_title, card_type, duedate, owner - Content structure: title + "\n\n" + description (matches indexing format) - Incremental sync based on lastModified timestamp - Deletion tracking with grace period Vector Visualization Support - Add deck_card handler in context.py for chunk preview expansion - Include board_id in search result metadata (bm25_hybrid.py, semantic.py) - Expose metadata in viz_routes.py JSON responses - Update vector-viz.js to construct proper Deck URLs: /apps/deck/board/{board_id}/card/{card_id} - Update vector_viz.html filter label from "Deck" to "Deck Cards" Bug Fixes - Skip soft-deleted boards (deletedAt > 0) to prevent 403 Forbidden errors - Applies to scanner, processor, and context expansion code paths - Deck API returns deleted boards but rejects stack access with 403 Testing - Add integration tests in test_deck_vector_search.py: - test_deck_card_semantic_search: Filtered search with doc_type="deck_card" - test_deck_card_appears_in_cross_app_search: Cross-app search includes deck cards - test_deck_card_chunk_context: Chunk context fetching for viz preview Documentation - Update README.md: Add Deck cards to semantic search feature list - Update semantic-search-architecture.md: Document deck_card support - Update nc_semantic_search tool documentation Type Safety - Fix type narrowing for page_boundaries (could be None) using cast() - Fix scanner.py payload None check for type safety Resolves vector search for Deck cards across indexing, search, and visualization. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-13 23:51:18 +01:00
Chris Coutinho	9d0a993c2a	feat(vector-viz): add news_item support for links and chunk expansion Add support for news_item document type in the vector visualization page: - Add "News" checkbox to document type filter options - Add URL handler to link news items to /apps/news/item/{id} - Add content fetching for news items in chunk context expansion This enables users to search and view news articles in the vector visualization, with clickable links back to Nextcloud News and the ability to expand chunks to see full article context. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-13 21:34:47 +01:00
Chris Coutinho	fffe483c02	fix: Centralize PDF processing and generate separate images per chunk Previously, pymupdf4llm.to_markdown() was called twice - once in PyMuPDFProcessor during indexing and again in PDFHighlighter during visualization. Different image path lengths caused different character offsets, leading to highlighted pages not matching their chunks. Also fixed issue where all chunks on the same page showed all highlights instead of just their own highlight. Now restores original page contents between chunks using xref stream caching. Changes: - Add PDFHighlighter class requiring pre-computed page_boundaries and full_text from document processor (no fallback extraction) - Pass pre-computed data from processor to highlighter - Extract page-relative portion of chunk text for cross-page chunks - Add bounding box highlighting using text anchor search - Run highlight generation in parallel with embedding/BM25 - Cache and restore page contents to isolate highlights per chunk Results: Highlighting success rate improved from 51% to 95% (121/128). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-22 02:46:30 +01:00
Chris Coutinho	a62a007c87	feat: Add context expansion to semantic search with chunk overlap removal Implements optional context expansion for semantic search results that fetches adjacent chunks (N-1 and N+1) from Qdrant to provide before/after context. Removes configurable chunk overlap (default 200 chars) to avoid duplicate text appearing in both context and excerpt. Key changes: - Add include_context and context_chars parameters to nc_semantic_search and nc_semantic_search_answer tools - Implement Qdrant cache fast path for chunk retrieval (avoids re-fetching and re-parsing documents, especially important for PDFs) - Add _get_chunk_by_index_from_qdrant() to fetch adjacent chunks - Remove chunk overlap from before_context (last N chars) and after_context (first N chars) to prevent duplicate text - Fetch context in parallel with anyio.Semaphore (max 20 concurrent) - Pass through page_number from SearchResult to SemanticSearchResult - Remove document-level deduplication (keep chunk-level dedup from algorithm) Context expansion is opt-in via include_context=true parameter. When enabled: - Populates has_context_expansion, marked_text, before_context, after_context - Adds truncation flags when context exceeds context_chars limit - Falls back to document fetch for legacy data with truncated excerpts Related: nextcloud_mcp_server/search/context.py:87-382, nextcloud_mcp_server/server/semantic.py:161-255	2025-11-21 01:02:22 +01:00
Chris Coutinho	327d843f64	feat: Implement per-chunk vector visualization with context expansion Major improvements to vector visualization page: - Refactor PCA to display individual chunks instead of averaged documents - Add context expansion module for fetching surrounding text from notes and PDFs - Update deduplication to use (doc_id, doc_type, chunk_start, chunk_end) keys - Fix Alpine.js rendering with chunk-specific keys including offsets - Refactor authentication helper to return NextcloudClient for better reuse - Add async context manager support to NextcloudClient Technical details: - viz_routes.py: Fetch specific chunk vectors instead of averaging per document - context.py: New module supporting both notes and PDF text extraction via PyMuPDF - search algorithms: Extract page_number, chunk_index, total_chunks from Qdrant - vector-viz.js/html: Use chunk positions in expansion tracking keys This enables users to see which specific chunks match their query and view them with surrounding context in the PCA visualization. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 11:22:20 +01:00
Chris Coutinho	9bd02d7ef7	fix: Preserve 3D plot camera position and fix CSS loading Two fixes for the vector visualization page: 1. CSS Loading Fix: Moved CSS <link> from vector_viz.html fragment to user_info.html <head> block. HTMX fragments don't process <link> tags in <head>, causing unstyled page. Now CSS loads correctly. 2. Camera Preservation: Modified renderPlot() to preserve camera position when toggling query point visibility. Previously, toggling the "Show Query Point" checkbox would reset zoom/rotation to default. Now reads existing camera settings from plot before updating. Related: nextcloud_mcp_server/auth/static/vector-viz.js:123-130 Related: nextcloud_mcp_server/auth/templates/user_info.html:12 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 13:51:08 +01:00
Chris Coutinho	53689d076b	feat: Improve vector visualization with static assets and fixes - Extract CSS and JavaScript into separate static files - Created nextcloud_mcp_server/auth/static/vector-viz.css - Created nextcloud_mcp_server/auth/static/vector-viz.js - Updated templates to reference external assets - Fix vector visualization issues: - Normalize vectors before PCA to match Qdrant's cosine distance - Add zero-norm and NaN detection/handling for large datasets - Enable responsive Plotly sizing (autosize + responsive config) - Widen plot area to full viewport width with minimized margins - Improve visualization accuracy: - Query point now positioned correctly relative to documents - Handles 200+ points without JSON serialization errors - Full-width plot maximizes screen space utilization 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 04:10:44 +01:00
Chris Coutinho	9db20a4d01	feat: Redesign UI to match Nextcloud ecosystem aesthetic This commit updates the web interface to better align with Nextcloud's design system and improve the Vector Viz layout. Changes: - Replace emoji icons with Material Design SVG icons for better consistency with Nextcloud apps - Simplify navigation styling with minimal padding and subtle active states (250px width) - Update CSS variables to match Nextcloud design system - Restructure Vector Viz from two-column to single-column vertical layout for better plot visibility - Move search controls to compact horizontal grid at top - Make navigation toggle always visible (not just on mobile) - Fix plot container sizing with overflow:visible to prevent colorbar clipping - Remove heavy shadows and custom card styling for cleaner aesthetic - Add error and success page templates with consistent styling Technical details: - Preserve Alpine.js for reactive functionality - Use CSS Grid for responsive horizontal controls layout - Add smooth transitions for navigation collapse/expand - Maintain HTMX for dynamic content loading 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 00:45:19 +01:00
Chris Coutinho	d374bfa1e5	feat(viz): Add dual-score display and improve UI controls This commit enhances the vector visualization interface with better score transparency and improved UX: Dual-Score Display: - Store original algorithm scores before normalization (viz_routes.py:203) - Display both raw and normalized scores: "Raw Score: 0.842 (89% relative)" - Update plot hover text with dual scores (userinfo_routes.py:740) - Fixes issue where all queries showed at least one 100% match regardless of actual relevance (normalization artifact) UI Improvements: 1. Fusion Method dropdown: Changed from x-show to :disabled - Prevents jarring layout shift when switching algorithms - Dropdown stays visible but grayed out when Semantic is selected - Better UX with opacity: 0.5 and cursor: not-allowed 2. Score Threshold: Changed step from 0.1 to "any" - Allows arbitrary float precision (0.7, 0.85, 0.123) - Users can now fine-tune threshold values 3. Document Types: Converted multi-select to checkbox grid - Replaced clunky Ctrl/Cmd multi-select listbox - Checkbox grid with cleaner layout - Positioned left of Score Threshold and Result Limit inputs - More intuitive UX Technical Details: - Raw score ranges vary by algorithm: - Semantic: 0.0-1.0 (cosine similarity) - BM25 RRF: ~0.001-0.033 (Reciprocal Rank Fusion) - BM25 DBSF: Can exceed 1.0 (Distribution-Based Score Fusion) - Normalized scores (0-1) used for visual encoding (marker size, color) - Original scores preserved in API response via getattr fallback Files modified: - nextcloud_mcp_server/auth/viz_routes.py (store original_score) - nextcloud_mcp_server/auth/templates/vector_viz.html (UI controls) - nextcloud_mcp_server/auth/userinfo_routes.py (plot hover text) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-17 08:05:49 +01:00
Chris Coutinho	3464b21845	fix: Relax SearchResult validation to support DBSF fusion scores > 1.0 Fix false-positive validation error where DBSF (Distribution-Based Score Fusion) correctly produces scores > 1.0 but SearchResult validation incorrectly rejected them. Root Cause: SearchResult.__post_init__() enforced scores in [0.0, 1.0] range, but DBSF sums normalized scores from multiple retrieval systems (dense semantic + sparse BM25), resulting in scores like 1.55 when both systems strongly agree a document is relevant. Changes: - Relaxed validation to allow any score ≥ 0.0 (algorithms.py:147-157) - Updated SearchResult and SemanticSearchResult documentation to explain score ranges for RRF ([0.0, 1.0]) vs DBSF (unbounded) - Added comprehensive test coverage for both fusion methods - Added DBSF fusion option to vector visualization UI - Updated viz routes and vizApp() to support fusion parameter selection Testing: All 157 unit tests pass, type checking passes, ruff passes Fixes error: "Configuration error: Score must be between 0.0 and 1.0, got 1.1528953" 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-17 06:32:30 +01:00

10 Commits