nextcloud-mcp-server

Author	SHA1	Message	Date
renovate-bot-cbcoutinho[bot]	2abedd6b4b	chore(deps): update actions/checkout action to v6	2025-11-20 17:12:30 +00:00
Chris Coutinho	5a251a99e6	fix: Set is_placeholder=False in processor to fix search filtering The processor was not setting is_placeholder field when writing real document chunks to Qdrant. This caused the placeholder filter to exclude all documents (since None != False), resulting in 0 search results. Now explicitly sets is_placeholder: False in payload when writing real indexed chunks, allowing search filters to correctly distinguish between placeholders and real documents.	2025-11-20 17:15:19 +01:00
Chris Coutinho	25ef33de7f	feat: Use Ollama native batch API in embed_batch() - Switch from sequential loop to /api/embed batch endpoint - Use 'input' array parameter instead of individual 'prompt' requests - Process in chunks of 32 to avoid quality degradation (issue #6262) - Reduces HTTP overhead: 128 texts = 4 requests instead of 128 - Maintains backward compatibility with embed() for single embeddings Ref: ollama/ollama#6262	2025-11-20 16:50:13 +01:00
Chris Coutinho	ec2c274cd9	fix: Increase placeholder staleness threshold to 5x scan interval - Changed from 2x (120s) to 5x (300s) scan interval - Large PDFs take 3-4 minutes to process, need longer threshold - Prevents premature requeuing of in-flight documents	2025-11-20 15:36:49 +01:00
Chris Coutinho	47f0b3db9a	fix: Add placeholder staleness check to prevent duplicate processing - Only requeue documents if placeholder is older than 2x scan interval (120s default) - Prevents scanner from immediately requeuing in-flight documents - Fixes issue where PDFs were being reprocessed every 60 seconds - Staleness check applied to both notes and files scanning logic	2025-11-20 15:30:10 +01:00
Chris Coutinho	233de3508f	fix: Use empty SparseVector instead of None for placeholders Qdrant validation rejects None for sparse vectors in named vector dicts. Use models.SparseVector(indices=[], values=[]) instead to create valid empty sparse vectors for placeholder points. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 15:15:10 +01:00
Chris Coutinho	13b2d0048c	feat: Implement Qdrant placeholder state management Introduces a placeholder-based state tracking system to prevent duplicate document processing during the gap between scanner queuing and processor completion. Key Changes: 1. Placeholder Helper Functions (`vector/placeholder.py`): - `write_placeholder_point()` - Creates zero-vector placeholder when queuing - `query_document_metadata()` - Queries for existing entry (placeholder or real) - `delete_placeholder_point()` - Removes placeholder before writing real vectors - `get_placeholder_filter()` - Filters placeholders from user-facing queries 2. Scanner Updates (`vector/scanner.py`): - Replace `indexed_at` comparison with `modified_at` comparison - Write placeholder before queuing each document - Query per-document metadata instead of bulk-querying indexed_at - Fixes bug where files were resubmitted every scan cycle 3. Processor Updates (`vector/processor.py`): - Delete placeholder before upserting real vectors - Ensures no duplicate points in Qdrant 4. Query Filters (all search files): - Add `get_placeholder_filter()` to all user-facing queries - Ensures placeholders never appear in search results or visualizations - Applied to: bm25_hybrid.py, semantic.py, viz_routes.py, algorithms.py Architecture: - Placeholders use zero vectors with dimension from embedding service - Payload includes `is_placeholder: True` flag for filtering - Status field tracks: "pending", "processing", "completed", "failed" - Deterministic UUIDs using uuid5 for consistent point IDs Impact: - Eliminates duplicate processing of same documents - Fixes race condition where long-running documents get queued multiple times - Prevents scanner from resubmitting files every scan cycle - Maintains clean separation between in-flight and indexed documents 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 15:04:00 +01:00
Chris Coutinho	944dd760ca	fix: Return empty array instead of null for query_coords when no results When vector visualization search returns zero results, the code was returning query_coords: null, which caused JavaScript error "can't access property 0, queryCoords is null" when the frontend tried to access the array. Changed to return empty array [] to match expected type and prevent crash. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 14:18:02 +01:00
Chris Coutinho	d67aa6ae5c	fix: Align PDF text extraction between indexing and context expansion This commit fixes two critical issues with PDF processing: 1. Text extraction mismatch (context expansion bug): - Indexing used pymupdf4llm.to_markdown() producing markdown text - Context expansion used page.get_text() producing plain text - Different text formats caused character offset misalignment - Search would find correct chunk, but expansion showed wrong section - Fixed by making context.py use pymupdf4llm.to_markdown() consistently 2. Diagnostic logging for page number assignment: - Added logging to verify page_boundaries exist in metadata - Added logging to verify assign_page_numbers() assigns values - Helps diagnose why page numbers show as null in search results 3. mime_type storage bug: - Fixed incorrect field reference in processor.py:405 - Was using file_metadata.get("content_type", "") - Should use content_type from WebDAV response Changes: - nextcloud_mcp_server/search/context.py: Use pymupdf4llm.to_markdown() for PDF text extraction to match indexing method - nextcloud_mcp_server/vector/processor.py: Add diagnostic logging for page boundaries and assignment, fix mime_type storage - tests/unit/client/test_webdav.py: Fix import sorting 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 13:57:50 +01:00
Chris Coutinho	f1a5fac1b9	fix: Update models and viz to use int-only doc_id - algorithms.py: Revert SearchResult.id to int (all docs use int IDs now) - semantic.py: Revert SemanticSearchResult.id to int, remove Union import - viz_routes.py: Remove str() conversion when querying doc_id from Qdrant - viz_routes.py: Convert doc_id from query param to int in chunk context Fixes vector visualization which was collapsing all chunks to a single point because Qdrant queries were failing to match doc_id (string vs int).	2025-11-20 12:32:27 +01:00
Chris Coutinho	d0691d5aa0	feat: Switch files to use numeric IDs with file_path resolution - scanner.py: Use file_info['id'] as doc_id instead of file_path - scanner.py: Pass file_path in DocumentTask for content retrieval - processor.py: Store file_path in Qdrant payload for later lookup - context.py: Add _get_file_path_from_qdrant() to resolve file_id → file_path - context.py: Update get_chunk_with_context() to handle file ID resolution This makes the system resilient to file renames since file IDs are stable identifiers in Nextcloud, while file paths can change.	2025-11-20 12:00:47 +01:00
Chris Coutinho	f1610bbd2e	fix: Reconstruct full content for notes to match indexed offsets Notes are indexed as "{title}\n\n{content}" in processor.py but were being retrieved as just content during chunk expansion, causing chunk_start_offset and chunk_end_offset to be misaligned. This fix reconstructs the full content structure when fetching notes for chunk expansion, ensuring the displayed chunks match the excerpts shown in search results. Fixes chunk/excerpt mismatch reported in vector visualization. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 11:33:12 +01:00
Chris Coutinho	327d843f64	feat: Implement per-chunk vector visualization with context expansion Major improvements to vector visualization page: - Refactor PCA to display individual chunks instead of averaged documents - Add context expansion module for fetching surrounding text from notes and PDFs - Update deduplication to use (doc_id, doc_type, chunk_start, chunk_end) keys - Fix Alpine.js rendering with chunk-specific keys including offsets - Refactor authentication helper to return NextcloudClient for better reuse - Add async context manager support to NextcloudClient Technical details: - viz_routes.py: Fetch specific chunk vectors instead of averaging per document - context.py: New module supporting both notes and PDF text extraction via PyMuPDF - search algorithms: Extract page_number, chunk_index, total_chunks from Qdrant - vector-viz.js/html: Use chunk positions in expansion tracking keys This enables users to see which specific chunks match their query and view them with surrounding context in the PCA visualization. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 11:22:20 +01:00
Chris Coutinho	b8010270c1	fix: Add async/await, PDF metadata, and type safety fixes This commit addresses multiple issues with async operations, PDF metadata extraction, and type safety in document processing and search. ## Async/Await Fixes - processor.py:259 - Added await for chunker.chunk_text(content) - processor.py:270 - Added await for bm25_service.encode_batch(chunk_texts) - tests/unit/test_document_chunker.py - Converted all 12 test methods to async ## PDF Metadata Enhancement - pymupdf.py:143 - Added file_size metadata extraction - pymupdf.py:145-206 - Refactored to extract text page-by-page - Manually loop through pages instead of using page_chunks=True - Generate page_boundaries metadata for precise page tracking - Works around pymupdf.layout.activate() breaking page_chunks=True - processor.py:32-66 - Added assign_page_numbers() helper function - Assigns page numbers to chunks based on overlap with page boundaries - Handles chunks spanning multiple pages - processor.py:298-300 - Call assign_page_numbers() for PDF files ## Type Safety Fixes - bm25_hybrid.py:184 - Removed int() conversion of doc_id - semantic.py:131 - Removed int() conversion of doc_id - viz_routes.py:275 - Removed int() conversion of doc_id - Added comments documenting that doc_id can be int (notes) or str (file paths) ## Testing - All 18 tests passing (12 unit + 6 integration) - No type errors in modified files - Container logs show successful processing - Vector viz searches working correctly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 02:37:07 +01:00
Chris Coutinho	0f24bdb17a	docs: Add svg	2025-11-19 23:44:23 +01:00
github-actions[bot]	bf11f16e2f	bump: version 0.43.0 → 0.44.0 nextcloud-mcp-server-0.44.0 v0.44.0	2025-11-19 22:43:03 +00:00
Chris Coutinho	bf05ff8d6e	Merge pull request #329 from cbcoutinho/feature/nextcloud-ui-improvements feat: Redesign UI and improve vector visualization	2025-11-19 23:42:32 +01:00
Chris Coutinho	c4ce28f05d	fix: Improve 3D plot rendering with explicit dimensions and window resize support - Get container dimensions before creating Plotly layout to render at correct size immediately - Add init() method with window resize listener for responsive plot sizing - Remove post-render resize call (no longer needed with explicit dimensions) - Improve colorbar positioning and scene domain configuration This eliminates the visual "jump" during initial render and ensures the plot resizes smoothly when the browser window changes size. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 19:43:20 +01:00
Chris Coutinho	9b2a06964b	Merge pull request #331 from cbcoutinho/renovate/commitizen-tools-commitizen-action-0.x chore(deps): update commitizen-tools/commitizen-action action to v0.26.0	2025-11-19 14:42:06 +01:00
Chris Coutinho	c126c3ec03	fix: Preserve 3D plot camera and improve documentation This commit addresses PR feedback and fixes plot camera behavior. ## JavaScript Fix - Camera Preservation - Changed plot update strategy from recreating layout to using Plotly.restyle() - Query point visibility now toggles via restyle() which only modifies trace visibility - Camera position/zoom naturally preserved since layout remains untouched - Resolves jumpy plot behavior when toggling "Show Query Point" checkbox Related: nextcloud_mcp_server/auth/static/vector-viz.js:58-73 ## Documentation Improvements - Condensed vector-sync-ui.md from 316 to 94 lines (~70% reduction) - Removed redundant FAQ section (content merged into main sections) - Simplified use cases from 4 detailed sections to 3 focused paragraphs - Streamlined troubleshooting to 3 common issues - Merged technical details into overview section - Retained all essential information while improving readability ## Screenshot Updates Removed old/outdated images (5 files): - rag-workflow-bidirectional-final.png - rag-workflow-prominent-llm.png - rag-workflow-simple-final.png - vector-viz-interface.png - welcome-page.png Replaced with current screenshots (3 files): - vector-viz-document-types-2col.png - Now shows plot + results - vector-viz-chunk-context.png - Centered content view - vector-viz-results.png - Updated results list 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 14:10:53 +01:00
Chris Coutinho	9bd02d7ef7	fix: Preserve 3D plot camera position and fix CSS loading Two fixes for the vector visualization page: 1. CSS Loading Fix: Moved CSS <link> from vector_viz.html fragment to user_info.html <head> block. HTMX fragments don't process <link> tags in <head>, causing unstyled page. Now CSS loads correctly. 2. Camera Preservation: Modified renderPlot() to preserve camera position when toggling query point visibility. Previously, toggling the "Show Query Point" checkbox would reset zoom/rotation to default. Now reads existing camera settings from plot before updating. Related: nextcloud_mcp_server/auth/static/vector-viz.js:123-130 Related: nextcloud_mcp_server/auth/templates/user_info.html:12 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 13:51:08 +01:00
renovate-bot-cbcoutinho[bot]	e38a830f02	chore(deps): update commitizen-tools/commitizen-action action to v0.26.0	2025-11-19 11:07:37 +00:00
Chris Coutinho	18b753c3c7	Merge pull request #330 from cbcoutinho/renovate/docker.io-library-nextcloud-32.0.1 chore(deps): update docker.io/library/nextcloud:32.0.1 docker digest to d572839	2025-11-19 09:57:27 +01:00
renovate-bot-cbcoutinho[bot]	b0735bae85	chore(deps): update docker.io/library/nextcloud:32.0.1 docker digest to d572839	2025-11-19 05:08:00 +00:00
Chris Coutinho	53689d076b	feat: Improve vector visualization with static assets and fixes - Extract CSS and JavaScript into separate static files - Created nextcloud_mcp_server/auth/static/vector-viz.css - Created nextcloud_mcp_server/auth/static/vector-viz.js - Updated templates to reference external assets - Fix vector visualization issues: - Normalize vectors before PCA to match Qdrant's cosine distance - Add zero-norm and NaN detection/handling for large datasets - Enable responsive Plotly sizing (autosize + responsive config) - Widen plot area to full viewport width with minimized margins - Improve visualization accuracy: - Query point now positioned correctly relative to documents - Handles 200+ points without JSON serialization errors - Full-width plot maximizes screen space utilization 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 04:10:44 +01:00
Chris Coutinho	0f7d6c0e33	Merge pull request #327 from cbcoutinho/renovate/docker.io-library-python-3.12-slim-trixie chore(deps): update docker.io/library/python:3.12-slim-trixie docker digest to 2e683fc	2025-11-19 01:53:05 +01:00
Chris Coutinho	16701fdb72	Merge pull request #328 from cbcoutinho/renovate/docker.io-library-redis-alpine chore(deps): update docker.io/library/redis:alpine docker digest to 5013e94	2025-11-19 01:52:57 +01:00
Chris Coutinho	9db20a4d01	feat: Redesign UI to match Nextcloud ecosystem aesthetic This commit updates the web interface to better align with Nextcloud's design system and improve the Vector Viz layout. Changes: - Replace emoji icons with Material Design SVG icons for better consistency with Nextcloud apps - Simplify navigation styling with minimal padding and subtle active states (250px width) - Update CSS variables to match Nextcloud design system - Restructure Vector Viz from two-column to single-column vertical layout for better plot visibility - Move search controls to compact horizontal grid at top - Make navigation toggle always visible (not just on mobile) - Fix plot container sizing with overflow:visible to prevent colorbar clipping - Remove heavy shadows and custom card styling for cleaner aesthetic - Add error and success page templates with consistent styling Technical details: - Preserve Alpine.js for reactive functionality - Use CSS Grid for responsive horizontal controls layout - Add smooth transitions for navigation collapse/expand - Maintain HTMX for dynamic content loading 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 00:45:19 +01:00
renovate-bot-cbcoutinho[bot]	7ddf8370e6	chore(deps): update docker.io/library/redis:alpine docker digest to 5013e94	2025-11-18 23:10:41 +00:00
renovate-bot-cbcoutinho[bot]	98dff98e9c	chore(deps): update docker.io/library/python:3.12-slim-trixie docker digest to 2e683fc	2025-11-18 23:10:36 +00:00
Chris Coutinho	73e8012707	Merge pull request #325 from cbcoutinho/renovate/docker.io-library-python-3.12-slim-trixie chore(deps): update docker.io/library/python:3.12-slim-trixie docker digest to 2bbc83f	2025-11-18 14:06:14 +01:00
Chris Coutinho	c2fd87a5d3	Merge pull request #324 from cbcoutinho/renovate/docker.io-library-nextcloud-32.0.1 chore(deps): update docker.io/library/nextcloud:32.0.1 docker digest to f6232ea	2025-11-18 14:03:38 +01:00
github-actions[bot]	441d94301e	bump: version 0.42.0 → 0.43.0 nextcloud-mcp-server-0.43.0 v0.43.0	2025-11-18 12:56:15 +00:00
Chris Coutinho	b488d69939	Merge pull request #326 from cbcoutinho/feature/notes2 feat: Replace custom document chunker with LangChain MarkdownTextSplitter	2025-11-18 13:55:34 +01:00
Chris Coutinho	eec923eff5	feat: Replace custom document chunker with LangChain MarkdownTextSplitter Migrates from custom word-based chunking to LangChain's MarkdownTextSplitter for better semantic search quality. This implements the chunking portion of ADR-011. Changes: - Replace custom regex word chunker with MarkdownTextSplitter - Optimized for Markdown content (headers, code blocks, lists) - Convert from word-based (512 words) to character-based (2048 chars) chunking - Maintain backward-compatible ChunkWithPosition interface - Update configuration defaults and validation - Update all unit tests (12/12 passing) Benefits: - Respects markdown structure boundaries - Never breaks code blocks or headers mid-chunk - Preserves semantic coherence within chunks - Expected 20-30% improvement in recall quality - Industry-standard approach (used by production RAG systems) Note: Full reindex required to apply new chunking to existing documents. Current vector database still contains old word-based chunks. Related: ADR-011 (Improving Semantic Search Quality) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-18 12:17:23 +01:00
renovate-bot-cbcoutinho[bot]	3642faf32c	chore(deps): update docker.io/library/python:3.12-slim-trixie docker digest to 2bbc83f	2025-11-18 11:08:08 +00:00
renovate-bot-cbcoutinho[bot]	3b1cd96722	chore(deps): update docker.io/library/nextcloud:32.0.1 docker digest to f6232ea	2025-11-18 11:08:03 +00:00
Chris Coutinho	219d064459	Merge pull request #321 from cbcoutinho/renovate/pin-dependencies chore(deps): pin ghcr.io/astral-sh/uv docker tag to 29bd450	2025-11-18 00:15:32 +01:00
Chris Coutinho	d0ab8d071a	Merge pull request #322 from cbcoutinho/renovate/actions-checkout-digest chore(deps): update actions/checkout digest to 93cb6ef	2025-11-18 00:15:20 +01:00
Chris Coutinho	b792e9d9a3	Merge pull request #323 from cbcoutinho/renovate/docker.io-library-mariadb-lts chore(deps): update docker.io/library/mariadb:lts docker digest to 1cac849	2025-11-18 00:14:46 +01:00
renovate-bot-cbcoutinho[bot]	4288814ff4	chore(deps): update docker.io/library/mariadb:lts docker digest to 1cac849	2025-11-17 23:11:14 +00:00
renovate-bot-cbcoutinho[bot]	f34a1c5677	chore(deps): update actions/checkout digest to 93cb6ef	2025-11-17 23:11:10 +00:00
renovate-bot-cbcoutinho[bot]	6d48f90112	chore(deps): pin ghcr.io/astral-sh/uv docker tag to 29bd450	2025-11-17 23:11:04 +00:00
Chris Coutinho	b72aeca55f	test: Add custom notes app	2025-11-17 22:14:01 +01:00
Chris Coutinho	c1ae818b75	Merge pull request #317 from cbcoutinho/renovate/ghcr.io-astral-sh-uv-latest chore(deps): update ghcr.io/astral-sh/uv:latest docker digest to 29bd450	2025-11-17 19:40:24 +01:00
Chris Coutinho	ebca2bfc70	build: pin uv to 0.9.10, use --no-cache	2025-11-17 19:33:15 +01:00
Chris Coutinho	6dcd0bae48	Merge pull request #318 from cbcoutinho/renovate/actions-checkout-5.x chore(deps): update actions/checkout action to v5.0.1	2025-11-17 19:23:32 +01:00
Chris Coutinho	818f643dca	Merge pull request #319 from cbcoutinho/renovate/qdrant-1.x chore(deps): update helm release qdrant to v1.16.0	2025-11-17 19:23:25 +01:00
Chris Coutinho	d31b490f13	Merge pull request #320 from cbcoutinho/renovate/qdrant-qdrant-1.x chore(deps): update qdrant/qdrant docker tag to v1.16.0	2025-11-17 19:23:16 +01:00
renovate-bot-cbcoutinho[bot]	839cf159b8	chore(deps): update qdrant/qdrant docker tag to v1.16.0	2025-11-17 17:09:02 +00:00
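
The placeholder commits above (13b2d0048c, 47f0b3db9a, ec2c274cd9) describe deterministic uuid5 point IDs and a staleness threshold of 5x the scan interval. The following is a minimal sketch of that decision logic, not the repository's code. It assumes the 60-second scan interval implied by "2x (120s)" and "5x (300s)". The function names `placeholder_point_id` and `should_requeue`, the uuid5 namespace and key format, and the payload field `queued_at` are hypothetical and chosen only for illustration.

```python
import uuid
from typing import Optional

SCAN_INTERVAL_S = 60       # assumed default, implied by "2x (120s)" / "5x (300s)"
STALENESS_MULTIPLIER = 5   # raised from 2 in ec2c274cd9


def placeholder_point_id(doc_type: str, doc_id: int) -> str:
    """Return a deterministic ID so the same document always maps to one point."""
    # Namespace and key format are illustrative assumptions.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{doc_type}:{doc_id}:placeholder"))


def should_requeue(payload: Optional[dict], modified_at: float, now: float) -> bool:
    """Decide whether the scanner should queue a document again."""
    if payload is None:
        return True  # no entry yet: queue it
    if payload.get("is_placeholder"):
        # Document is in flight: requeue only once the placeholder is stale.
        age = now - payload.get("queued_at", 0.0)  # "queued_at" is hypothetical
        return age > SCAN_INTERVAL_S * STALENESS_MULTIPLIER
    # Real entry: requeue only if the source changed after it was indexed.
    return modified_at > payload.get("modified_at", 0.0)
```

With these assumptions, a placeholder written 200 seconds ago is left alone, while one written 301 seconds ago is treated as stale and requeued. This matches the motivation in ec2c274cd9, where large PDFs need 3-4 minutes to process.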

1211 Commits
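
Commit b8010270c1 mentions an `assign_page_numbers()` helper that assigns page numbers to chunks based on overlap with page boundaries, including chunks that span multiple pages. The log does not show how it resolves a multi-page chunk, so the sketch below makes an assumption: it picks the page with the largest character overlap. The function name `assign_page_numbers_sketch` and the dictionary keys `start`, `end`, `page`, and `page_number` are hypothetical.

```python
from typing import Optional


def assign_page_numbers_sketch(chunks: list[dict], page_boundaries: list[dict]) -> None:
    """Annotate each chunk in place with the page it overlaps most.

    chunks:          dicts with character offsets "start" and "end"
    page_boundaries: dicts with "page", "start", "end" (end is exclusive)
    """
    for chunk in chunks:
        best_page: Optional[int] = None
        best_overlap = 0
        for page in page_boundaries:
            # Length of the intersection of the two half-open ranges.
            overlap = min(chunk["end"], page["end"]) - max(chunk["start"], page["start"])
            if overlap > best_overlap:
                best_page, best_overlap = page["page"], overlap
        # Stays None when the chunk overlaps no known page.
        chunk["page_number"] = best_page
```

Under this assumption, a chunk covering offsets 80-200 against pages spanning 0-100 and 100-250 is assigned to the second page, because 100 of its characters fall there and only 20 fall on the first.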