nextcloud-mcp-server

Author	SHA1	Message	Date
smithery-ai[bot]	d9f458a8b7	Update README	2025-11-22 18:40:14 +00:00
Chris Coutinho	3b41776110	Merge pull request #342 from cbcoutinho/feature/smithery feat: Add Smithery stateless deployment support (ADR-016)	2025-11-22 19:39:53 +01:00
Chris Coutinho	3e3d38696c	docs(smithery): Make Smithery the primary Quick Start option Reorganize README to promote Smithery as the fastest way to get started: - Quick Start now features Smithery one-click deployment - Docker instructions moved to separate "Docker (Self-Hosted)" section - Added note about Smithery's stateless mode limitations 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-22 19:38:11 +01:00
Chris Coutinho	7b22e5be0f	build: add smithery image to docker compose	2025-11-22 19:06:25 +01:00
Chris Coutinho	39fba49cfe	fix(smithery): Add JSON Schema metadata to mcp-config endpoint Add proper JSON Schema metadata fields per Smithery documentation: - $schema: JSON Schema draft-07 - $id: Schema identifier URL - title: Human-readable title - description: Schema description - x-query-style: "flat" (no nested objects in our schema) - additionalProperties: false 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-22 18:28:05 +01:00
Chris Coutinho	706a15f0bc	fix(smithery): Use container runtime pattern for config discovery ADR-016: For container runtime deployment, Smithery does not auto-generate the .well-known/mcp-config endpoint like it does for Python CLI runtime. Changes: - Remove [tool.smithery] from pyproject.toml (not used in container mode) - Remove smithery_server.py (Python CLI runtime specific) - Add .well-known/mcp-config endpoint to return JSON Schema config - Add SmitheryConfigMiddleware to extract config from URL query params - Use ContextVar to pass session config to tool handlers The container runtime passes config as URL query parameters to /mcp: GET /mcp?nextcloud_url=...&username=...&app_password=... Tested: - All 164 unit tests passing - Docker container builds successfully - .well-known/mcp-config returns valid JSON Schema - Health endpoints working 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-22 18:22:55 +01:00
Chris Coutinho	b8dc413b73	feat: Add Smithery CLI deployment support - Add smithery package as dependency - Create smithery_server.py with @smithery.server() decorator - Add SmitheryConfigSchema for session config (nextcloud_url, username, app_password) - Add [tool.smithery] section to pyproject.toml - Remove manual .well-known/mcp-config endpoint (Smithery handles this) Smithery CLI will automatically: - Extract config schema from the decorated function - Handle session config parsing from query parameters - Make config accessible via ctx.session_config in tools 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-22 18:05:33 +01:00
Chris Coutinho	8d29ce0122	fix: Add Smithery lifespan and auth mode detection - Add SmitheryAppContext dataclass for stateless mode - Add app_lifespan_smithery() with minimal lifespan (no shared state) - Update is_oauth_mode() to detect Smithery mode and return BasicAuth - Use Smithery lifespan when SMITHERY_DEPLOYMENT=true - Add .well-known/mcp-config endpoint for config discovery - Skip document processors in Smithery mode (not enabled) Fixes startup issues in Smithery mode where missing env credentials would incorrectly trigger OAuth mode. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-22 17:48:53 +01:00
Chris Coutinho	a272e7cbab	build: Fix Dockerfile.smithery	2025-11-22 17:35:16 +01:00
Chris Coutinho	ce55b239e2	build: Fix Dockerfile.smithery	2025-11-22 17:33:12 +01:00
Chris Coutinho	432ab73741	build: Add missing deps	2025-11-22 17:32:20 +01:00
Chris Coutinho	f93d650992	feat: Implement ADR-016 Smithery stateless deployment mode Adds support for Smithery hosted deployment with stateless operation: - Add DeploymentMode enum with SELF_HOSTED and SMITHERY_STATELESS modes - Add get_deployment_mode() to detect mode from SMITHERY_DEPLOYMENT env var - Update get_client() to create per-request clients from session config - Add conditional tool registration (skip semantic search in Smithery mode) - Add conditional /app admin UI mounting (skip in Smithery mode) - Create smithery.yaml with configSchema for user credentials - Create Dockerfile.smithery for minimal stateless container - Create smithery_main.py entrypoint for Smithery deployment In Smithery mode: - Users provide nextcloud_url, username, app_password via session config - Each request creates a fresh NextcloudClient (no state between requests) - Semantic search tools are disabled (no vector database) - Admin UI (/app) is disabled (no webhooks, vector viz) All existing self-hosted functionality remains unchanged. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-22 17:30:42 +01:00
github-actions[bot]	f9da19d1a1	bump: version 0.44.1 → 0.45.0 nextcloud-mcp-server-0.45.0 v0.45.0	2025-11-22 16:14:35 +00:00
Chris Coutinho	d2b6a26fe4	Merge pull request #341 from cbcoutinho/fix/async-await-and-pdf-metadata fix: Async/await patterns, PDF metadata, and vector visualization improvements	2025-11-22 17:14:06 +01:00
Chris Coutinho	482ef89a73	docs: Add ADR-016 for Smithery stateless deployment Add architecture decision record for supporting Smithery-hosted MCP server in a stateless mode for multi-user public Nextcloud instances. Key decisions: - New SMITHERY_STATELESS deployment mode alongside SELF_HOSTED - Session-based configuration (nextcloud_url, username, app_password) - Feature subset excluding semantic search and background sync - Admin UI (/app) excluded in Smithery mode - Per-request client creation from session config This enables users to try the MCP server without self-hosting infrastructure while supporting multiple Nextcloud instances. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-22 17:13:18 +01:00
Chris Coutinho	34fd17ba55	fix: Use alpha_composite for proper RGBA highlight blending Drawing directly with ImageDraw on RGBA mode doesn't blend alpha properly. Use Image.alpha_composite() with a transparent overlay to achieve correct semi-transparent highlight fills. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-22 17:04:29 +01:00
Chris Coutinho	8baa07db84	fix: Remove pymupdf.layout.activate() to fix page_chunks behavior pymupdf.layout.activate() causes pymupdf4llm.to_markdown() to ignore the page_chunks=True option, returning a single string instead of list[dict]. This broke per-page chunking needed for semantic search indexing. See: https://github.com/pymupdf/pymupdf4llm/issues/323 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-22 16:58:35 +01:00
Chris Coutinho	ba8a53803a	refactor: Simplify PDF text extraction with single to_markdown call Replace parallel per-page extraction with single to_markdown(page_chunks=True) call. This is more efficient as pymupdf4llm can optimize internally for full-document processing instead of making N separate calls for N pages. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-22 03:52:02 +01:00
Chris Coutinho	31fade9730	perf: Optimize PDF processing with parallel extraction and single-render highlights Phase 1 - PDF Highlighting Optimization: - Render each page ONCE instead of once per chunk (N chunks = 1 render, not N) - Use PIL to draw bounding boxes on copied base images (fast) instead of re-rendering page via pymupdf (slow) - Add _find_chunk_bbox() to extract bbox without modifying page Phase 2 - Parallel Page Extraction: - Use anyio task group with run_sync() for parallel page extraction - Each page extracted in separate thread via anyio.to_thread.run_sync() - Event loop stays responsive during extraction - Remove obsolete _process_sync() method Expected improvement: 30-50% reduction in total PDF processing time. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-22 03:11:56 +01:00
Chris Coutinho	fffe483c02	fix: Centralize PDF processing and generate separate images per chunk Previously, pymupdf4llm.to_markdown() was called twice - once in PyMuPDFProcessor during indexing and again in PDFHighlighter during visualization. Different image path lengths caused different character offsets, leading to highlighted pages not matching their chunks. Also fixed issue where all chunks on the same page showed all highlights instead of just their own highlight. Now restores original page contents between chunks using xref stream caching. Changes: - Add PDFHighlighter class requiring pre-computed page_boundaries and full_text from document processor (no fallback extraction) - Pass pre-computed data from processor to highlighter - Extract page-relative portion of chunk text for cross-page chunks - Add bounding box highlighting using text anchor search - Run highlight generation in parallel with embedding/BM25 - Cache and restore page contents to isolate highlights per chunk Results: Highlighting success rate improved from 51% to 95% (121/128). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-22 02:46:30 +01:00
Chris Coutinho	8c79993280	Merge pull request #334 from cbcoutinho/renovate/docker.io-library-redis-alpine chore(deps): update docker.io/library/redis:alpine docker digest to 6cbef35	2025-11-21 14:24:54 +01:00
Chris Coutinho	8a0672a6be	Merge pull request #339 from cbcoutinho/renovate/astral-sh-setup-uv-7.x chore(deps): update astral-sh/setup-uv action to v7.1.4	2025-11-21 14:24:42 +01:00
Chris Coutinho	395f798ee2	Merge pull request #340 from cbcoutinho/renovate/ollama-1.x chore(deps): update helm release ollama to v1.35.0	2025-11-21 14:24:26 +01:00
renovate-bot-cbcoutinho[bot]	debff75221	chore(deps): update helm release ollama to v1.35.0	2025-11-21 11:09:18 +00:00
renovate-bot-cbcoutinho[bot]	4bf0a6c22e	chore(deps): update astral-sh/setup-uv action to v7.1.4	2025-11-21 11:08:53 +00:00
Chris Coutinho	fb025821cb	Merge pull request #335 from cbcoutinho/renovate/ghcr.io-astral-sh-uv-0.x chore(deps): update ghcr.io/astral-sh/uv docker tag to v0.9.11	2025-11-21 09:45:31 +01:00
Chris Coutinho	ff880fd4c9	Merge pull request #338 from cbcoutinho/renovate/docker.io-library-nextcloud-32.x chore(deps): update docker.io/library/nextcloud docker tag to v32.0.2	2025-11-21 09:34:20 +01:00
renovate-bot-cbcoutinho[bot]	03495d901d	chore(deps): update docker.io/library/nextcloud docker tag to v32.0.2	2025-11-21 05:14:28 +00:00
github-actions[bot]	798958f20a	bump: version 0.44.0 → 0.44.1 nextcloud-mcp-server-0.44.1 v0.44.1	2025-11-21 00:39:23 +00:00
Chris Coutinho	699295c5be	Merge pull request #336 from cbcoutinho/renovate/mcp-1.x fix(deps): update dependency mcp to >=1.22,<1.23	2025-11-21 01:38:50 +01:00
Chris Coutinho	a62a007c87	feat: Add context expansion to semantic search with chunk overlap removal Implements optional context expansion for semantic search results that fetches adjacent chunks (N-1 and N+1) from Qdrant to provide before/after context. Removes configurable chunk overlap (default 200 chars) to avoid duplicate text appearing in both context and excerpt. Key changes: - Add include_context and context_chars parameters to nc_semantic_search and nc_semantic_search_answer tools - Implement Qdrant cache fast path for chunk retrieval (avoids re-fetching and re-parsing documents, especially important for PDFs) - Add _get_chunk_by_index_from_qdrant() to fetch adjacent chunks - Remove chunk overlap from before_context (last N chars) and after_context (first N chars) to prevent duplicate text - Fetch context in parallel with anyio.Semaphore (max 20 concurrent) - Pass through page_number from SearchResult to SemanticSearchResult - Remove document-level deduplication (keep chunk-level dedup from algorithm) Context expansion is opt-in via include_context=true parameter. When enabled: - Populates has_context_expansion, marked_text, before_context, after_context - Adds truncation flags when context exceeds context_chars limit - Falls back to document fetch for legacy data with truncated excerpts Related: nextcloud_mcp_server/search/context.py:87-382, nextcloud_mcp_server/server/semantic.py:161-255	2025-11-21 01:02:22 +01:00
renovate-bot-cbcoutinho[bot]	d4fc1de80d	fix(deps): update dependency mcp to >=1.22,<1.23	2025-11-20 23:11:11 +00:00
renovate-bot-cbcoutinho[bot]	0902b5653f	chore(deps): update ghcr.io/astral-sh/uv docker tag to v0.9.11	2025-11-20 23:10:47 +00:00
renovate-bot-cbcoutinho[bot]	0b6a02075c	chore(deps): update docker.io/library/redis:alpine docker digest to 6cbef35	2025-11-20 23:10:43 +00:00
Chris Coutinho	7880a8de30	Merge pull request #333 from cbcoutinho/renovate/actions-checkout-6.x chore(deps): update actions/checkout action to v6	2025-11-20 20:17:21 +01:00
renovate-bot-cbcoutinho[bot]	2abedd6b4b	chore(deps): update actions/checkout action to v6	2025-11-20 17:12:30 +00:00
Chris Coutinho	5a251a99e6	fix: Set is_placeholder=False in processor to fix search filtering The processor was not setting is_placeholder field when writing real document chunks to Qdrant. This caused the placeholder filter to exclude all documents (since None != False), resulting in 0 search results. Now explicitly sets is_placeholder: False in payload when writing real indexed chunks, allowing search filters to correctly distinguish between placeholders and real documents.	2025-11-20 17:15:19 +01:00
Chris Coutinho	25ef33de7f	feat: Use Ollama native batch API in embed_batch() - Switch from sequential loop to /api/embed batch endpoint - Use 'input' array parameter instead of individual 'prompt' requests - Process in chunks of 32 to avoid quality degradation (issue #6262) - Reduces HTTP overhead: 128 texts = 4 requests instead of 128 - Maintains backward compatibility with embed() for single embeddings Ref: ollama/ollama#6262	2025-11-20 16:50:13 +01:00
Chris Coutinho	ec2c274cd9	fix: Increase placeholder staleness threshold to 5x scan interval - Changed from 2x (120s) to 5x (300s) scan interval - Large PDFs take 3-4 minutes to process, need longer threshold - Prevents premature requeuing of in-flight documents	2025-11-20 15:36:49 +01:00
Chris Coutinho	47f0b3db9a	fix: Add placeholder staleness check to prevent duplicate processing - Only requeue documents if placeholder is older than 2x scan interval (120s default) - Prevents scanner from immediately requeuing in-flight documents - Fixes issue where PDFs were being reprocessed every 60 seconds - Staleness check applied to both notes and files scanning logic	2025-11-20 15:30:10 +01:00
Chris Coutinho	233de3508f	fix: Use empty SparseVector instead of None for placeholders Qdrant validation rejects None for sparse vectors in named vector dicts. Use models.SparseVector(indices=[], values=[]) instead to create valid empty sparse vectors for placeholder points. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 15:15:10 +01:00
Chris Coutinho	13b2d0048c	feat: Implement Qdrant placeholder state management Introduces a placeholder-based state tracking system to prevent duplicate document processing during the gap between scanner queuing and processor completion. Key Changes: 1. Placeholder Helper Functions (`vector/placeholder.py`): - `write_placeholder_point()` - Creates zero-vector placeholder when queuing - `query_document_metadata()` - Queries for existing entry (placeholder or real) - `delete_placeholder_point()` - Removes placeholder before writing real vectors - `get_placeholder_filter()` - Filters placeholders from user-facing queries 2. Scanner Updates (`vector/scanner.py`): - Replace `indexed_at` comparison with `modified_at` comparison - Write placeholder before queuing each document - Query per-document metadata instead of bulk-querying indexed_at - Fixes bug where files were resubmitted every scan cycle 3. Processor Updates (`vector/processor.py`): - Delete placeholder before upserting real vectors - Ensures no duplicate points in Qdrant 4. Query Filters (all search files): - Add `get_placeholder_filter()` to all user-facing queries - Ensures placeholders never appear in search results or visualizations - Applied to: bm25_hybrid.py, semantic.py, viz_routes.py, algorithms.py Architecture: - Placeholders use zero vectors with dimension from embedding service - Payload includes `is_placeholder: True` flag for filtering - Status field tracks: "pending", "processing", "completed", "failed" - Deterministic UUIDs using uuid5 for consistent point IDs Impact: - Eliminates duplicate processing of same documents - Fixes race condition where long-running documents get queued multiple times - Prevents scanner from resubmitting files every scan cycle - Maintains clean separation between in-flight and indexed documents 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 15:04:00 +01:00
Chris Coutinho	944dd760ca	fix: Return empty array instead of null for query_coords when no results When vector visualization search returns zero results, the code was returning query_coords: null, which caused JavaScript error "can't access property 0, queryCoords is null" when the frontend tried to access the array. Changed to return empty array [] to match expected type and prevent crash. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 14:18:02 +01:00
Chris Coutinho	d67aa6ae5c	fix: Align PDF text extraction between indexing and context expansion This commit fixes two critical issues with PDF processing: 1. Text extraction mismatch (context expansion bug): - Indexing used pymupdf4llm.to_markdown() producing markdown text - Context expansion used page.get_text() producing plain text - Different text formats caused character offset misalignment - Search would find correct chunk, but expansion showed wrong section - Fixed by making context.py use pymupdf4llm.to_markdown() consistently 2. Diagnostic logging for page number assignment: - Added logging to verify page_boundaries exist in metadata - Added logging to verify assign_page_numbers() assigns values - Helps diagnose why page numbers show as null in search results 3. mime_type storage bug: - Fixed incorrect field reference in processor.py:405 - Was using file_metadata.get("content_type", "") - Should use content_type from WebDAV response Changes: - nextcloud_mcp_server/search/context.py: Use pymupdf4llm.to_markdown() for PDF text extraction to match indexing method - nextcloud_mcp_server/vector/processor.py: Add diagnostic logging for page boundaries and assignment, fix mime_type storage - tests/unit/client/test_webdav.py: Fix import sorting 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 13:57:50 +01:00
Chris Coutinho	f1a5fac1b9	fix: Update models and viz to use int-only doc_id - algorithms.py: Revert SearchResult.id to int (all docs use int IDs now) - semantic.py: Revert SemanticSearchResult.id to int, remove Union import - viz_routes.py: Remove str() conversion when querying doc_id from Qdrant - viz_routes.py: Convert doc_id from query param to int in chunk context Fixes vector visualization which was collapsing all chunks to a single point because Qdrant queries were failing to match doc_id (string vs int).	2025-11-20 12:32:27 +01:00
Chris Coutinho	d0691d5aa0	feat: Switch files to use numeric IDs with file_path resolution - scanner.py: Use file_info['id'] as doc_id instead of file_path - scanner.py: Pass file_path in DocumentTask for content retrieval - processor.py: Store file_path in Qdrant payload for later lookup - context.py: Add _get_file_path_from_qdrant() to resolve file_id → file_path - context.py: Update get_chunk_with_context() to handle file ID resolution This makes the system resilient to file renames since file IDs are stable identifiers in Nextcloud, while file paths can change.	2025-11-20 12:00:47 +01:00
Chris Coutinho	f1610bbd2e	fix: Reconstruct full content for notes to match indexed offsets Notes are indexed as "{title}\n\n{content}" in processor.py but were being retrieved as just content during chunk expansion, causing chunk_start_offset and chunk_end_offset to be misaligned. This fix reconstructs the full content structure when fetching notes for chunk expansion, ensuring the displayed chunks match the excerpts shown in search results. Fixes chunk/excerpt mismatch reported in vector visualization. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 11:33:12 +01:00
Chris Coutinho	327d843f64	feat: Implement per-chunk vector visualization with context expansion Major improvements to vector visualization page: - Refactor PCA to display individual chunks instead of averaged documents - Add context expansion module for fetching surrounding text from notes and PDFs - Update deduplication to use (doc_id, doc_type, chunk_start, chunk_end) keys - Fix Alpine.js rendering with chunk-specific keys including offsets - Refactor authentication helper to return NextcloudClient for better reuse - Add async context manager support to NextcloudClient Technical details: - viz_routes.py: Fetch specific chunk vectors instead of averaging per document - context.py: New module supporting both notes and PDF text extraction via PyMuPDF - search algorithms: Extract page_number, chunk_index, total_chunks from Qdrant - vector-viz.js/html: Use chunk positions in expansion tracking keys This enables users to see which specific chunks match their query and view them with surrounding context in the PCA visualization. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 11:22:20 +01:00
Chris Coutinho	b8010270c1	fix: Add async/await, PDF metadata, and type safety fixes This commit addresses multiple issues with async operations, PDF metadata extraction, and type safety in document processing and search. ## Async/Await Fixes - processor.py:259 - Added await for chunker.chunk_text(content) - processor.py:270 - Added await for bm25_service.encode_batch(chunk_texts) - tests/unit/test_document_chunker.py - Converted all 12 test methods to async ## PDF Metadata Enhancement - pymupdf.py:143 - Added file_size metadata extraction - pymupdf.py:145-206 - Refactored to extract text page-by-page - Manually loop through pages instead of using page_chunks=True - Generate page_boundaries metadata for precise page tracking - Works around pymupdf.layout.activate() breaking page_chunks=True - processor.py:32-66 - Added assign_page_numbers() helper function - Assigns page numbers to chunks based on overlap with page boundaries - Handles chunks spanning multiple pages - processor.py:298-300 - Call assign_page_numbers() for PDF files ## Type Safety Fixes - bm25_hybrid.py:184 - Removed int() conversion of doc_id - semantic.py:131 - Removed int() conversion of doc_id - viz_routes.py:275 - Removed int() conversion of doc_id - Added comments documenting that doc_id can be int (notes) or str (file paths) ## Testing - All 18 tests passing (12 unit + 6 integration) - No type errors in modified files - Container logs show successful processing - Vector viz searches working correctly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 02:37:07 +01:00
Chris Coutinho	0f24bdb17a	docs: Add svg	2025-11-19 23:44:23 +01:00
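
Note on commits 47f0b3db9a and ec2c274cd9: the placeholder staleness check they describe can be sketched roughly as below. This is a minimal illustration, not the repository's code. The function name `is_placeholder_stale`, its parameters, and the use of `datetime` timestamps are assumptions made for the example. The 60-second scan interval is inferred from the figures in the log itself (2x = 120s, 5x = 300s).

```python
from datetime import datetime, timedelta, timezone


def is_placeholder_stale(
    queued_at: datetime,
    now: datetime,
    scan_interval_seconds: int = 60,  # inferred from the log: 2x = 120s, 5x = 300s
    multiplier: int = 5,  # ec2c274cd9 raised this from 2 to 5
) -> bool:
    """Return True if a placeholder is old enough that the document may be requeued.

    Hypothetical helper: a placeholder younger than the threshold is treated
    as still in flight, so the scanner should leave it alone.
    """
    threshold = timedelta(seconds=scan_interval_seconds * multiplier)
    return now - queued_at > threshold


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # A PDF queued 200 seconds ago is still within the 300-second threshold.
    print(is_placeholder_stale(now - timedelta(seconds=200), now))
```

With the earlier 2x multiplier the same 200-second-old placeholder would count as stale, which matches the premature requeuing of large PDFs described in ec2c274cd9.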


1196 Commits
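
Note on commit 25ef33de7f: the request-count arithmetic in that message (128 texts processed in chunks of 32 gives 4 requests instead of 128) can be checked with a short sketch. The helper name `split_into_batches` is hypothetical and the example makes no HTTP calls; the actual `embed_batch()` implementation is not shown in this log.

```python
from typing import List


def split_into_batches(texts: List[str], batch_size: int = 32) -> List[List[str]]:
    """Split texts into consecutive batches of at most batch_size items.

    Hypothetical helper: each returned batch would correspond to one request
    carrying an 'input' array, as the commit message describes.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]


if __name__ == "__main__":
    texts = [f"chunk {i}" for i in range(128)]
    batches = split_into_batches(texts)
    # 128 texts in batches of 32 means 4 requests rather than 128.
    print(len(batches))
```

A final batch may be smaller than 32 when the number of texts is not a multiple of the batch size; for example, 70 texts would yield batches of 32, 32, and 6.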