- Add type casts for Starlette app state access
- Add assertions for cipher, card, board, stack after initialization
- Add None checks for XML element text attributes
- Handle __package__ being None in tracing setup
- Fix TokenBrokerService initialization to use storage credentials
Resolves 42 type warnings from ty-check, enabling CI linting to pass.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Addresses reviewer feedback on PR #395 about O(n²) performance issue.
Changes:
- scanner.py: Add metadata field to DocumentTask with board_id/stack_id
- scanner.py: Populate metadata during deck card scanning (both initial and incremental sync)
- processor.py: Use metadata for O(1) card lookup via get_card() API when available
- processor.py: Fallback to iteration for legacy data without metadata
- context.py: Add _get_deck_metadata_from_qdrant() helper to retrieve metadata from Qdrant
- context.py: Use metadata for fast path lookup in chunk context expansion
- context.py: Add user_id parameter to _fetch_document_text() for metadata retrieval
Performance Impact:
- Before: O(boards × stacks × cards) iteration for each card lookup
- After: O(1) direct API call using stored board_id/stack_id
- Graceful degradation: Falls back to iteration for legacy data
Testing:
- All existing integration tests pass (test_deck_vector_search.py)
- Type checking passes with no new errors
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Adds comprehensive vector search support for Nextcloud Deck cards,
including semantic search indexing, chunk preview in the vector viz UI,
and proper deep linking to cards.
**Vector Search Indexing**
- Add deck_card scanning in scanner.py (scan_deck_cards function)
- Index cards from non-archived, non-deleted boards
- Store metadata: board_id, board_title, stack_id, stack_title, card_type, duedate, owner
- Content structure: title + "\n\n" + description (matches indexing format)
- Incremental sync based on lastModified timestamp
- Deletion tracking with grace period
**Vector Visualization Support**
- Add deck_card handler in context.py for chunk preview expansion
- Include board_id in search result metadata (bm25_hybrid.py, semantic.py)
- Expose metadata in viz_routes.py JSON responses
- Update vector-viz.js to construct proper Deck URLs: /apps/deck/board/{board_id}/card/{card_id}
- Update vector_viz.html filter label from "Deck" to "Deck Cards"
**Bug Fixes**
- Skip soft-deleted boards (deletedAt > 0) to prevent 403 Forbidden errors
- Applies to scanner, processor, and context expansion code paths
- Deck API returns deleted boards but rejects stack access with 403
**Testing**
- Add integration tests in test_deck_vector_search.py:
- test_deck_card_semantic_search: Filtered search with doc_type="deck_card"
- test_deck_card_appears_in_cross_app_search: Cross-app search includes deck cards
- test_deck_card_chunk_context: Chunk context fetching for viz preview
**Documentation**
- Update README.md: Add Deck cards to semantic search feature list
- Update semantic-search-architecture.md: Document deck_card support
- Update nc_semantic_search tool documentation
**Type Safety**
- Fix type narrowing for page_boundaries (could be None) using cast()
- Fix scanner.py payload None check for type safety
Resolves vector search for Deck cards across indexing, search, and visualization.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add support for news_item document type in the vector visualization page:
- Add "News" checkbox to document type filter options
- Add URL handler to link news items to /apps/news/item/{id}
- Add content fetching for news items in chunk context expansion
This enables users to search and view news articles in the vector
visualization, with clickable links back to Nextcloud News and the
ability to expand chunks to see full article context.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements optional context expansion for semantic search results that
fetches adjacent chunks (N-1 and N+1) from Qdrant to provide before/after
context. Removes configurable chunk overlap (default 200 chars) to avoid
duplicate text appearing in both context and excerpt.
Key changes:
- Add include_context and context_chars parameters to nc_semantic_search
and nc_semantic_search_answer tools
- Implement Qdrant cache fast path for chunk retrieval (avoids re-fetching
and re-parsing documents, especially important for PDFs)
- Add _get_chunk_by_index_from_qdrant() to fetch adjacent chunks
- Remove chunk overlap from before_context (last N chars) and after_context
(first N chars) to prevent duplicate text
- Fetch context in parallel with anyio.Semaphore (max 20 concurrent)
- Pass through page_number from SearchResult to SemanticSearchResult
- Remove document-level deduplication (keep chunk-level dedup from algorithm)
Context expansion is opt-in via include_context=true parameter. When enabled:
- Populates has_context_expansion, marked_text, before_context, after_context
- Adds truncation flags when context exceeds context_chars limit
- Falls back to document fetch for legacy data with truncated excerpts
Related: nextcloud_mcp_server/search/context.py:87-382,
nextcloud_mcp_server/server/semantic.py:161-255
This commit fixes two critical issues with PDF processing:
1. **Text extraction mismatch (context expansion bug)**:
- Indexing used pymupdf4llm.to_markdown() producing markdown text
- Context expansion used page.get_text() producing plain text
- Different text formats caused character offset misalignment
- Search would find correct chunk, but expansion showed wrong section
- Fixed by making context.py use pymupdf4llm.to_markdown() consistently
2. **Diagnostic logging for page number assignment**:
- Added logging to verify page_boundaries exist in metadata
- Added logging to verify assign_page_numbers() assigns values
- Helps diagnose why page numbers show as null in search results
3. **mime_type storage bug**:
- Fixed incorrect field reference in processor.py:405
- Was using file_metadata.get("content_type", "")
- Should use content_type from WebDAV response
Changes:
- nextcloud_mcp_server/search/context.py: Use pymupdf4llm.to_markdown()
for PDF text extraction to match indexing method
- nextcloud_mcp_server/vector/processor.py: Add diagnostic logging for
page boundaries and assignment, fix mime_type storage
- tests/unit/client/test_webdav.py: Fix import sorting
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- scanner.py: Use file_info['id'] as doc_id instead of file_path
- scanner.py: Pass file_path in DocumentTask for content retrieval
- processor.py: Store file_path in Qdrant payload for later lookup
- context.py: Add _get_file_path_from_qdrant() to resolve file_id → file_path
- context.py: Update get_chunk_with_context() to handle file ID resolution
This makes the system resilient to file renames since file IDs are stable
identifiers in Nextcloud, while file paths can change.
Notes are indexed as "{title}\n\n{content}" in processor.py but were
being retrieved as just content during chunk expansion, causing
chunk_start_offset and chunk_end_offset to be misaligned.
This fix reconstructs the full content structure when fetching notes
for chunk expansion, ensuring the displayed chunks match the excerpts
shown in search results.
Fixes chunk/excerpt mismatch reported in vector visualization.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Major improvements to vector visualization page:
- Refactor PCA to display individual chunks instead of averaged documents
- Add context expansion module for fetching surrounding text from notes and PDFs
- Update deduplication to use (doc_id, doc_type, chunk_start, chunk_end) keys
- Fix Alpine.js rendering with chunk-specific keys including offsets
- Refactor authentication helper to return NextcloudClient for better reuse
- Add async context manager support to NextcloudClient
Technical details:
- viz_routes.py: Fetch specific chunk vectors instead of averaging per document
- context.py: New module supporting both notes and PDF text extraction via PyMuPDF
- search algorithms: Extract page_number, chunk_index, total_chunks from Qdrant
- vector-viz.js/html: Use chunk positions in expansion tracking keys
This enables users to see which specific chunks match their query
and view them with surrounding context in the PCA visualization.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>