nextcloud-mcp-server

Author	SHA1	Message	Date
Chris Coutinho	d67aa6ae5c	fix: Align PDF text extraction between indexing and context expansion This commit fixes two critical issues with PDF processing: 1. Text extraction mismatch (context expansion bug): - Indexing used pymupdf4llm.to_markdown() producing markdown text - Context expansion used page.get_text() producing plain text - Different text formats caused character offset misalignment - Search would find correct chunk, but expansion showed wrong section - Fixed by making context.py use pymupdf4llm.to_markdown() consistently 2. Diagnostic logging for page number assignment: - Added logging to verify page_boundaries exist in metadata - Added logging to verify assign_page_numbers() assigns values - Helps diagnose why page numbers show as null in search results 3. mime_type storage bug: - Fixed incorrect field reference in processor.py:405 - Was using file_metadata.get("content_type", "") - Should use content_type from WebDAV response Changes: - nextcloud_mcp_server/search/context.py: Use pymupdf4llm.to_markdown() for PDF text extraction to match indexing method - nextcloud_mcp_server/vector/processor.py: Add diagnostic logging for page boundaries and assignment, fix mime_type storage - tests/unit/client/test_webdav.py: Fix import sorting 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 13:57:50 +01:00
Chris Coutinho	f1a5fac1b9	fix: Update models and viz to use int-only doc_id - algorithms.py: Revert SearchResult.id to int (all docs use int IDs now) - semantic.py: Revert SemanticSearchResult.id to int, remove Union import - viz_routes.py: Remove str() conversion when querying doc_id from Qdrant - viz_routes.py: Convert doc_id from query param to int in chunk context Fixes vector visualization which was collapsing all chunks to a single point because Qdrant queries were failing to match doc_id (string vs int).	2025-11-20 12:32:27 +01:00
Chris Coutinho	d0691d5aa0	feat: Switch files to use numeric IDs with file_path resolution - scanner.py: Use file_info['id'] as doc_id instead of file_path - scanner.py: Pass file_path in DocumentTask for content retrieval - processor.py: Store file_path in Qdrant payload for later lookup - context.py: Add _get_file_path_from_qdrant() to resolve file_id → file_path - context.py: Update get_chunk_with_context() to handle file ID resolution This makes the system resilient to file renames since file IDs are stable identifiers in Nextcloud, while file paths can change.	2025-11-20 12:00:47 +01:00
Chris Coutinho	f1610bbd2e	fix: Reconstruct full content for notes to match indexed offsets Notes are indexed as "{title}\n\n{content}" in processor.py but were being retrieved as just content during chunk expansion, causing chunk_start_offset and chunk_end_offset to be misaligned. This fix reconstructs the full content structure when fetching notes for chunk expansion, ensuring the displayed chunks match the excerpts shown in search results. Fixes chunk/excerpt mismatch reported in vector visualization. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 11:33:12 +01:00
Chris Coutinho	327d843f64	feat: Implement per-chunk vector visualization with context expansion Major improvements to vector visualization page: - Refactor PCA to display individual chunks instead of averaged documents - Add context expansion module for fetching surrounding text from notes and PDFs - Update deduplication to use (doc_id, doc_type, chunk_start, chunk_end) keys - Fix Alpine.js rendering with chunk-specific keys including offsets - Refactor authentication helper to return NextcloudClient for better reuse - Add async context manager support to NextcloudClient Technical details: - viz_routes.py: Fetch specific chunk vectors instead of averaging per document - context.py: New module supporting both notes and PDF text extraction via PyMuPDF - search algorithms: Extract page_number, chunk_index, total_chunks from Qdrant - vector-viz.js/html: Use chunk positions in expansion tracking keys This enables users to see which specific chunks match their query and view them with surrounding context in the PCA visualization. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 11:22:20 +01:00
Chris Coutinho	b8010270c1	fix: Add async/await, PDF metadata, and type safety fixes This commit addresses multiple issues with async operations, PDF metadata extraction, and type safety in document processing and search. ## Async/Await Fixes - processor.py:259 - Added await for chunker.chunk_text(content) - processor.py:270 - Added await for bm25_service.encode_batch(chunk_texts) - tests/unit/test_document_chunker.py - Converted all 12 test methods to async ## PDF Metadata Enhancement - pymupdf.py:143 - Added file_size metadata extraction - pymupdf.py:145-206 - Refactored to extract text page-by-page - Manually loop through pages instead of using page_chunks=True - Generate page_boundaries metadata for precise page tracking - Works around pymupdf.layout.activate() breaking page_chunks=True - processor.py:32-66 - Added assign_page_numbers() helper function - Assigns page numbers to chunks based on overlap with page boundaries - Handles chunks spanning multiple pages - processor.py:298-300 - Call assign_page_numbers() for PDF files ## Type Safety Fixes - bm25_hybrid.py:184 - Removed int() conversion of doc_id - semantic.py:131 - Removed int() conversion of doc_id - viz_routes.py:275 - Removed int() conversion of doc_id - Added comments documenting that doc_id can be int (notes) or str (file paths) ## Testing - All 18 tests passing (12 unit + 6 integration) - No type errors in modified files - Container logs show successful processing - Vector viz searches working correctly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 02:37:07 +01:00
Chris Coutinho	0f24bdb17a	docs: Add svg	2025-11-19 23:44:23 +01:00
github-actions[bot]	bf11f16e2f	bump: version 0.43.0 → 0.44.0 nextcloud-mcp-server-0.44.0 v0.44.0	2025-11-19 22:43:03 +00:00
Chris Coutinho	bf05ff8d6e	Merge pull request #329 from cbcoutinho/feature/nextcloud-ui-improvements feat: Redesign UI and improve vector visualization	2025-11-19 23:42:32 +01:00
Chris Coutinho	c4ce28f05d	fix: Improve 3D plot rendering with explicit dimensions and window resize support - Get container dimensions before creating Plotly layout to render at correct size immediately - Add init() method with window resize listener for responsive plot sizing - Remove post-render resize call (no longer needed with explicit dimensions) - Improve colorbar positioning and scene domain configuration This eliminates the visual "jump" during initial render and ensures the plot resizes smoothly when the browser window changes size. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 19:43:20 +01:00
Chris Coutinho	9b2a06964b	Merge pull request #331 from cbcoutinho/renovate/commitizen-tools-commitizen-action-0.x chore(deps): update commitizen-tools/commitizen-action action to v0.26.0	2025-11-19 14:42:06 +01:00
Chris Coutinho	c126c3ec03	fix: Preserve 3D plot camera and improve documentation This commit addresses PR feedback and fixes plot camera behavior. ## JavaScript Fix - Camera Preservation - Changed plot update strategy from recreating layout to using Plotly.restyle() - Query point visibility now toggles via restyle() which only modifies trace visibility - Camera position/zoom naturally preserved since layout remains untouched - Resolves jumpy plot behavior when toggling "Show Query Point" checkbox Related: nextcloud_mcp_server/auth/static/vector-viz.js:58-73 ## Documentation Improvements - Condensed vector-sync-ui.md from 316 to 94 lines (~70% reduction) - Removed redundant FAQ section (content merged into main sections) - Simplified use cases from 4 detailed sections to 3 focused paragraphs - Streamlined troubleshooting to 3 common issues - Merged technical details into overview section - Retained all essential information while improving readability ## Screenshot Updates Removed old/outdated images (5 files): - rag-workflow-bidirectional-final.png - rag-workflow-prominent-llm.png - rag-workflow-simple-final.png - vector-viz-interface.png - welcome-page.png Replaced with current screenshots (3 files): - vector-viz-document-types-2col.png - Now shows plot + results - vector-viz-chunk-context.png - Centered content view - vector-viz-results.png - Updated results list 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 14:10:53 +01:00
Chris Coutinho	9bd02d7ef7	fix: Preserve 3D plot camera position and fix CSS loading Two fixes for the vector visualization page: 1. CSS Loading Fix: Moved CSS <link> from vector_viz.html fragment to user_info.html <head> block. HTMX fragments don't process <link> tags in <head>, causing unstyled page. Now CSS loads correctly. 2. Camera Preservation: Modified renderPlot() to preserve camera position when toggling query point visibility. Previously, toggling the "Show Query Point" checkbox would reset zoom/rotation to default. Now reads existing camera settings from plot before updating. Related: nextcloud_mcp_server/auth/static/vector-viz.js:123-130 Related: nextcloud_mcp_server/auth/templates/user_info.html:12 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 13:51:08 +01:00
renovate-bot-cbcoutinho[bot]	e38a830f02	chore(deps): update commitizen-tools/commitizen-action action to v0.26.0	2025-11-19 11:07:37 +00:00
Chris Coutinho	18b753c3c7	Merge pull request #330 from cbcoutinho/renovate/docker.io-library-nextcloud-32.0.1 chore(deps): update docker.io/library/nextcloud:32.0.1 docker digest to d572839	2025-11-19 09:57:27 +01:00
renovate-bot-cbcoutinho[bot]	b0735bae85	chore(deps): update docker.io/library/nextcloud:32.0.1 docker digest to d572839	2025-11-19 05:08:00 +00:00
Chris Coutinho	53689d076b	feat: Improve vector visualization with static assets and fixes - Extract CSS and JavaScript into separate static files - Created nextcloud_mcp_server/auth/static/vector-viz.css - Created nextcloud_mcp_server/auth/static/vector-viz.js - Updated templates to reference external assets - Fix vector visualization issues: - Normalize vectors before PCA to match Qdrant's cosine distance - Add zero-norm and NaN detection/handling for large datasets - Enable responsive Plotly sizing (autosize + responsive config) - Widen plot area to full viewport width with minimized margins - Improve visualization accuracy: - Query point now positioned correctly relative to documents - Handles 200+ points without JSON serialization errors - Full-width plot maximizes screen space utilization 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 04:10:44 +01:00
Chris Coutinho	0f7d6c0e33	Merge pull request #327 from cbcoutinho/renovate/docker.io-library-python-3.12-slim-trixie chore(deps): update docker.io/library/python:3.12-slim-trixie docker digest to 2e683fc	2025-11-19 01:53:05 +01:00
Chris Coutinho	16701fdb72	Merge pull request #328 from cbcoutinho/renovate/docker.io-library-redis-alpine chore(deps): update docker.io/library/redis:alpine docker digest to 5013e94	2025-11-19 01:52:57 +01:00
Chris Coutinho	9db20a4d01	feat: Redesign UI to match Nextcloud ecosystem aesthetic This commit updates the web interface to better align with Nextcloud's design system and improve the Vector Viz layout. Changes: - Replace emoji icons with Material Design SVG icons for better consistency with Nextcloud apps - Simplify navigation styling with minimal padding and subtle active states (250px width) - Update CSS variables to match Nextcloud design system - Restructure Vector Viz from two-column to single-column vertical layout for better plot visibility - Move search controls to compact horizontal grid at top - Make navigation toggle always visible (not just on mobile) - Fix plot container sizing with overflow:visible to prevent colorbar clipping - Remove heavy shadows and custom card styling for cleaner aesthetic - Add error and success page templates with consistent styling Technical details: - Preserve Alpine.js for reactive functionality - Use CSS Grid for responsive horizontal controls layout - Add smooth transitions for navigation collapse/expand - Maintain HTMX for dynamic content loading 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 00:45:19 +01:00
renovate-bot-cbcoutinho[bot]	7ddf8370e6	chore(deps): update docker.io/library/redis:alpine docker digest to 5013e94	2025-11-18 23:10:41 +00:00
renovate-bot-cbcoutinho[bot]	98dff98e9c	chore(deps): update docker.io/library/python:3.12-slim-trixie docker digest to 2e683fc	2025-11-18 23:10:36 +00:00
Chris Coutinho	73e8012707	Merge pull request #325 from cbcoutinho/renovate/docker.io-library-python-3.12-slim-trixie chore(deps): update docker.io/library/python:3.12-slim-trixie docker digest to 2bbc83f	2025-11-18 14:06:14 +01:00
Chris Coutinho	c2fd87a5d3	Merge pull request #324 from cbcoutinho/renovate/docker.io-library-nextcloud-32.0.1 chore(deps): update docker.io/library/nextcloud:32.0.1 docker digest to f6232ea	2025-11-18 14:03:38 +01:00
github-actions[bot]	441d94301e	bump: version 0.42.0 → 0.43.0 nextcloud-mcp-server-0.43.0 v0.43.0	2025-11-18 12:56:15 +00:00
Chris Coutinho	b488d69939	Merge pull request #326 from cbcoutinho/feature/notes2 feat: Replace custom document chunker with LangChain MarkdownTextSplitter	2025-11-18 13:55:34 +01:00
Chris Coutinho	eec923eff5	feat: Replace custom document chunker with LangChain MarkdownTextSplitter Migrates from custom word-based chunking to LangChain's MarkdownTextSplitter for better semantic search quality. This implements the chunking portion of ADR-011. Changes: - Replace custom regex word chunker with MarkdownTextSplitter - Optimized for Markdown content (headers, code blocks, lists) - Convert from word-based (512 words) to character-based (2048 chars) chunking - Maintain backward-compatible ChunkWithPosition interface - Update configuration defaults and validation - Update all unit tests (12/12 passing) Benefits: - Respects markdown structure boundaries - Never breaks code blocks or headers mid-chunk - Preserves semantic coherence within chunks - Expected 20-30% improvement in recall quality - Industry-standard approach (used by production RAG systems) Note: Full reindex required to apply new chunking to existing documents. Current vector database still contains old word-based chunks. Related: ADR-011 (Improving Semantic Search Quality) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-18 12:17:23 +01:00
renovate-bot-cbcoutinho[bot]	3642faf32c	chore(deps): update docker.io/library/python:3.12-slim-trixie docker digest to 2bbc83f	2025-11-18 11:08:08 +00:00
renovate-bot-cbcoutinho[bot]	3b1cd96722	chore(deps): update docker.io/library/nextcloud:32.0.1 docker digest to f6232ea	2025-11-18 11:08:03 +00:00
Chris Coutinho	219d064459	Merge pull request #321 from cbcoutinho/renovate/pin-dependencies chore(deps): pin ghcr.io/astral-sh/uv docker tag to 29bd450	2025-11-18 00:15:32 +01:00
Chris Coutinho	d0ab8d071a	Merge pull request #322 from cbcoutinho/renovate/actions-checkout-digest chore(deps): update actions/checkout digest to 93cb6ef	2025-11-18 00:15:20 +01:00
Chris Coutinho	b792e9d9a3	Merge pull request #323 from cbcoutinho/renovate/docker.io-library-mariadb-lts chore(deps): update docker.io/library/mariadb:lts docker digest to 1cac849	2025-11-18 00:14:46 +01:00
renovate-bot-cbcoutinho[bot]	4288814ff4	chore(deps): update docker.io/library/mariadb:lts docker digest to 1cac849	2025-11-17 23:11:14 +00:00
renovate-bot-cbcoutinho[bot]	f34a1c5677	chore(deps): update actions/checkout digest to 93cb6ef	2025-11-17 23:11:10 +00:00
renovate-bot-cbcoutinho[bot]	6d48f90112	chore(deps): pin ghcr.io/astral-sh/uv docker tag to 29bd450	2025-11-17 23:11:04 +00:00
Chris Coutinho	b72aeca55f	test: Add custom notes app	2025-11-17 22:14:01 +01:00
Chris Coutinho	c1ae818b75	Merge pull request #317 from cbcoutinho/renovate/ghcr.io-astral-sh-uv-latest chore(deps): update ghcr.io/astral-sh/uv:latest docker digest to 29bd450	2025-11-17 19:40:24 +01:00
Chris Coutinho	ebca2bfc70	build: pin uv to 0.9.10, use --no-cache	2025-11-17 19:33:15 +01:00
Chris Coutinho	6dcd0bae48	Merge pull request #318 from cbcoutinho/renovate/actions-checkout-5.x chore(deps): update actions/checkout action to v5.0.1	2025-11-17 19:23:32 +01:00
Chris Coutinho	818f643dca	Merge pull request #319 from cbcoutinho/renovate/qdrant-1.x chore(deps): update helm release qdrant to v1.16.0	2025-11-17 19:23:25 +01:00
Chris Coutinho	d31b490f13	Merge pull request #320 from cbcoutinho/renovate/qdrant-qdrant-1.x chore(deps): update qdrant/qdrant docker tag to v1.16.0	2025-11-17 19:23:16 +01:00
renovate-bot-cbcoutinho[bot]	839cf159b8	chore(deps): update qdrant/qdrant docker tag to v1.16.0	2025-11-17 17:09:02 +00:00
renovate-bot-cbcoutinho[bot]	cefb438017	chore(deps): update helm release qdrant to v1.16.0	2025-11-17 17:08:54 +00:00
renovate-bot-cbcoutinho[bot]	efc78a835e	chore(deps): update actions/checkout action to v5.0.1	2025-11-17 17:08:34 +00:00
renovate-bot-cbcoutinho[bot]	fa25a1b4df	chore(deps): update ghcr.io/astral-sh/uv:latest docker digest to 29bd450	2025-11-17 17:08:28 +00:00
github-actions[bot]	8367208a03	bump: version 0.41.0 → 0.42.0 nextcloud-mcp-server-0.42.0 v0.42.0	2025-11-17 07:25:33 +00:00
Chris Coutinho	52acc4bc07	Merge pull request #316 from cbcoutinho/feature/cleanup feat(viz): Add dual-score display and improve UI controls	2025-11-17 08:25:04 +01:00
Chris Coutinho	d374bfa1e5	feat(viz): Add dual-score display and improve UI controls This commit enhances the vector visualization interface with better score transparency and improved UX: Dual-Score Display: - Store original algorithm scores before normalization (viz_routes.py:203) - Display both raw and normalized scores: "Raw Score: 0.842 (89% relative)" - Update plot hover text with dual scores (userinfo_routes.py:740) - Fixes issue where all queries showed at least one 100% match regardless of actual relevance (normalization artifact) UI Improvements: 1. Fusion Method dropdown: Changed from x-show to :disabled - Prevents jarring layout shift when switching algorithms - Dropdown stays visible but grayed out when Semantic is selected - Better UX with opacity: 0.5 and cursor: not-allowed 2. Score Threshold: Changed step from 0.1 to "any" - Allows arbitrary float precision (0.7, 0.85, 0.123) - Users can now fine-tune threshold values 3. Document Types: Converted multi-select to checkbox grid - Replaced clunky Ctrl/Cmd multi-select listbox - Checkbox grid with cleaner layout - Positioned left of Score Threshold and Result Limit inputs - More intuitive UX Technical Details: - Raw score ranges vary by algorithm: - Semantic: 0.0-1.0 (cosine similarity) - BM25 RRF: ~0.001-0.033 (Reciprocal Rank Fusion) - BM25 DBSF: Can exceed 1.0 (Distribution-Based Score Fusion) - Normalized scores (0-1) used for visual encoding (marker size, color) - Original scores preserved in API response via getattr fallback Files modified: - nextcloud_mcp_server/auth/viz_routes.py (store original_score) - nextcloud_mcp_server/auth/templates/vector_viz.html (UI controls) - nextcloud_mcp_server/auth/userinfo_routes.py (plot hover text) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-17 08:05:49 +01:00
github-actions[bot]	b1f7b1d30b	bump: version 0.40.0 → 0.41.0 nextcloud-mcp-server-0.41.0 v0.41.0	2025-11-17 05:57:12 +00:00
Chris Coutinho	b8bdbb499f	Merge pull request #315 from cbcoutinho/feature/cleanup Feature/cleanup	2025-11-17 06:56:43 +01:00

1153 Commits