feat(vector): add Deck card vector search with visualization support
Adds comprehensive vector search support for Nextcloud Deck cards,
including semantic search indexing, chunk preview in the vector viz UI,
and proper deep linking to cards.
**Vector Search Indexing**
- Add deck_card scanning in scanner.py (scan_deck_cards function)
- Index cards from non-archived, non-deleted boards
- Store metadata: board_id, board_title, stack_id, stack_title, card_type, duedate, owner
- Content structure: title + "\n\n" + description (matches indexing format)
- Incremental sync based on lastModified timestamp
- Deletion tracking with grace period
**Vector Visualization Support**
- Add deck_card handler in context.py for chunk preview expansion
- Include board_id in search result metadata (bm25_hybrid.py, semantic.py)
- Expose metadata in viz_routes.py JSON responses
- Update vector-viz.js to construct proper Deck URLs: /apps/deck/board/{board_id}/card/{card_id}
- Update vector_viz.html filter label from "Deck" to "Deck Cards"
**Bug Fixes**
- Skip soft-deleted boards (deletedAt > 0) to prevent 403 Forbidden errors
- Applies to scanner, processor, and context expansion code paths
- Deck API returns deleted boards but rejects stack access with 403
**Testing**
- Add integration tests in test_deck_vector_search.py:
- test_deck_card_semantic_search: Filtered search with doc_type="deck_card"
- test_deck_card_appears_in_cross_app_search: Cross-app search includes deck cards
- test_deck_card_chunk_context: Chunk context fetching for viz preview
**Documentation**
- Update README.md: Add Deck cards to semantic search feature list
- Update semantic-search-architecture.md: Document deck_card support
- Update nc_semantic_search tool documentation
**Type Safety**
- Fix type narrowing for page_boundaries (could be None) using cast()
- Fix scanner.py payload None check for type safety
Resolves vector search for Deck cards across indexing, search, and visualization.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
@@ -63,7 +63,7 @@ http://127.0.0.1:8000/mcp
|
||||
|
||||
- **90+ MCP Tools** - Comprehensive API coverage across 8 Nextcloud apps
|
||||
- **MCP Resources** - Structured data URIs for browsing Nextcloud data
|
||||
- **Semantic Search (Experimental)** - Optional vector-powered search for Notes (requires Qdrant + Ollama)
|
||||
- **Semantic Search (Experimental)** - Optional vector-powered search for Notes, Files, News items, and Deck cards (requires Qdrant + Ollama)
|
||||
- **Document Processing** - OCR and text extraction from PDFs, DOCX, images with progress notifications
|
||||
- **Flexible Deployment** - Docker, Kubernetes (Helm), VM, or local installation
|
||||
- **Production-Ready Auth** - Basic Auth with app passwords (recommended) or OAuth2/OIDC (experimental)
|
||||
@@ -81,7 +81,7 @@ http://127.0.0.1:8000/mcp
|
||||
| **Cookbook** | 13 | Recipe management, URL import (schema.org) |
|
||||
| **Tables** | 5 | Row operations on Nextcloud Tables |
|
||||
| **Sharing** | 10+ | Create and manage shares |
|
||||
| **Semantic Search** | 2+ | Vector search for Notes (experimental, opt-in, requires infrastructure) |
|
||||
| **Semantic Search** | 2+ | Vector search for Notes, Files, News items, and Deck cards (experimental, opt-in, requires infrastructure) |
|
||||
|
||||
Want to see another Nextcloud app supported? [Open an issue](https://github.com/cbcoutinho/nextcloud-mcp-server/issues) or contribute a pull request!
|
||||
|
||||
@@ -145,7 +145,7 @@ This enables natural language queries and helps discover related content across
|
||||
### Features
|
||||
- **[App Documentation](docs/)** - Notes, Calendar, Contacts, WebDAV, Deck, Cookbook, Tables
|
||||
- **[Document Processing](docs/configuration.md#document-processing)** - OCR and text extraction setup
|
||||
- **[Semantic Search Architecture](docs/semantic-search-architecture.md)** - Experimental vector search (Notes only, opt-in)
|
||||
- **[Semantic Search Architecture](docs/semantic-search-architecture.md)** - Experimental vector search (Notes, Files, News items, Deck cards; opt-in)
|
||||
- **[Vector Sync UI Guide](docs/user-guide/vector-sync-ui.md)** - Browser interface for semantic search visualization and testing
|
||||
|
||||
### Advanced Topics
|
||||
|
||||
@@ -5,7 +5,7 @@ This document explains the architecture of the semantic search feature in the Ne
|
||||
> [!IMPORTANT]
|
||||
> **Status: Experimental**
|
||||
> - Disabled by default (`VECTOR_SYNC_ENABLED=false`)
|
||||
> - Currently supports **Notes app only** (multi-app architecture ready, additional apps planned)
|
||||
> - Currently supports **Notes, Files (PDFs), News items, and Deck cards**
|
||||
> - Requires additional infrastructure (Qdrant vector database + Ollama embedding service)
|
||||
> - RAG answer generation requires MCP client sampling support
|
||||
|
||||
@@ -39,9 +39,9 @@ Semantic search enables:
|
||||
|
||||
### Current Support
|
||||
|
||||
- **Supported Apps**: Notes (fully implemented)
|
||||
- **Planned Apps**: Calendar events, Calendar tasks, Deck cards, Files (with text extraction), Contacts
|
||||
- **Architecture**: Multi-app plugin system ready, awaiting implementation
|
||||
- **Supported Apps**: Notes, Files (PDFs with text extraction), News items, Deck cards
|
||||
- **Planned Apps**: Calendar events, Calendar tasks, Contacts
|
||||
- **Architecture**: Multi-app plugin system ready for additional apps
|
||||
|
||||
## System Components
|
||||
|
||||
|
||||
@@ -201,7 +201,12 @@ function vizApp() {
|
||||
return `${baseUrl}/apps/calendar`;
|
||||
case 'contact':
|
||||
return `${baseUrl}/apps/contacts`;
|
||||
case 'deck':
|
||||
case 'deck_card':
|
||||
// URL pattern: /apps/deck/board/:boardId/card/:cardId
|
||||
if (result.metadata && result.metadata.board_id) {
|
||||
return `${baseUrl}/apps/deck/board/${result.metadata.board_id}/card/${result.id}`;
|
||||
}
|
||||
// Fallback if board_id not available
|
||||
return `${baseUrl}/apps/deck`;
|
||||
case 'news_item':
|
||||
return `${baseUrl}/apps/news/item/${result.id}`;
|
||||
|
||||
@@ -65,8 +65,8 @@
|
||||
<span>Contacts</span>
|
||||
</label>
|
||||
<label style="display: flex; align-items: center; cursor: pointer; font-weight: normal;">
|
||||
<input type="checkbox" x-model="docTypes" value="deck" style="margin-right: 4px;">
|
||||
<span>Deck</span>
|
||||
<input type="checkbox" x-model="docTypes" value="deck_card" style="margin-right: 4px;">
|
||||
<span>Deck Cards</span>
|
||||
</label>
|
||||
<label style="display: flex; align-items: center; cursor: pointer; font-weight: normal;">
|
||||
<input type="checkbox" x-model="docTypes" value="news_item" style="margin-right: 4px;">
|
||||
|
||||
@@ -298,6 +298,7 @@ async def vector_visualization_search(request: Request) -> JSONResponse:
|
||||
"title": r.title,
|
||||
"excerpt": r.excerpt,
|
||||
"score": r.score,
|
||||
"metadata": r.metadata,
|
||||
}
|
||||
for r in search_results
|
||||
],
|
||||
@@ -458,6 +459,7 @@ async def vector_visualization_search(request: Request) -> JSONResponse:
|
||||
), # Raw score from algorithm
|
||||
"chunk_start_offset": r.chunk_start_offset,
|
||||
"chunk_end_offset": r.chunk_end_offset,
|
||||
"metadata": r.metadata, # Include metadata (e.g., board_id for deck_card)
|
||||
}
|
||||
for r in search_results
|
||||
]
|
||||
|
||||
@@ -219,6 +219,18 @@ class BM25HybridSearchAlgorithm(SearchAlgorithm):
|
||||
|
||||
seen_chunks.add(chunk_key)
|
||||
|
||||
# Build metadata dict with common fields
|
||||
metadata = {
|
||||
"chunk_index": result.payload.get("chunk_index"),
|
||||
"total_chunks": result.payload.get("total_chunks"),
|
||||
"search_method": f"bm25_hybrid_{self.fusion_name}",
|
||||
}
|
||||
|
||||
# Add deck_card-specific metadata for frontend URL construction
|
||||
if doc_type == "deck_card":
|
||||
if board_id := result.payload.get("board_id"):
|
||||
metadata["board_id"] = board_id
|
||||
|
||||
# Return unverified results (verification happens at output stage)
|
||||
results.append(
|
||||
SearchResult(
|
||||
@@ -227,11 +239,7 @@ class BM25HybridSearchAlgorithm(SearchAlgorithm):
|
||||
title=result.payload.get("title", "Untitled"),
|
||||
excerpt=result.payload.get("excerpt", ""),
|
||||
score=result.score, # Fusion score (RRF or DBSF)
|
||||
metadata={
|
||||
"chunk_index": result.payload.get("chunk_index"),
|
||||
"total_chunks": result.payload.get("total_chunks"),
|
||||
"search_method": f"bm25_hybrid_{self.fusion_name}",
|
||||
},
|
||||
metadata=metadata,
|
||||
chunk_start_offset=result.payload.get("chunk_start_offset"),
|
||||
chunk_end_offset=result.payload.get("chunk_end_offset"),
|
||||
page_number=result.payload.get("page_number"),
|
||||
|
||||
@@ -544,6 +544,46 @@ async def _fetch_document_text(
|
||||
content_parts.append("") # Blank line
|
||||
content_parts.append(body_markdown)
|
||||
return "\n".join(content_parts)
|
||||
elif doc_type == "deck_card":
|
||||
# Fetch card from Deck API
|
||||
# Note: Deck API requires board_id and stack_id, but we don't store those
|
||||
# We need to search through boards to find the card (same as processor.py)
|
||||
boards = await nc_client.deck.get_boards()
|
||||
card_found = False
|
||||
|
||||
for board in boards:
|
||||
if card_found:
|
||||
break
|
||||
|
||||
# Skip deleted boards (soft delete: deletedAt > 0)
|
||||
if board.deletedAt > 0:
|
||||
logger.debug(
|
||||
f"Skipping deleted board {board.id} while searching for card {doc_id}"
|
||||
)
|
||||
continue
|
||||
|
||||
stacks = await nc_client.deck.get_stacks(board.id)
|
||||
|
||||
for stack in stacks:
|
||||
if card_found:
|
||||
break
|
||||
if stack.cards:
|
||||
for card in stack.cards:
|
||||
if card.id == int(doc_id):
|
||||
# Reconstruct full content as indexed: title + "\n\n" + description
|
||||
# This ensures chunk offsets align with indexed content structure
|
||||
content_parts = [card.title]
|
||||
if card.description:
|
||||
content_parts.append(card.description)
|
||||
card_found = True
|
||||
logger.debug(
|
||||
f"Found deck card {doc_id} in board {board.id}, stack {stack.id}"
|
||||
)
|
||||
return "\n\n".join(content_parts)
|
||||
|
||||
# Card not found (might be archived or deleted)
|
||||
logger.warning(f"Deck card {doc_id} not found in any board/stack")
|
||||
return None
|
||||
else:
|
||||
logger.warning(f"Unsupported doc_type for context expansion: {doc_type}")
|
||||
return None
|
||||
|
||||
@@ -151,6 +151,17 @@ class SemanticSearchAlgorithm(SearchAlgorithm):
|
||||
|
||||
seen_chunks.add(chunk_key)
|
||||
|
||||
# Build metadata dict with common fields
|
||||
metadata = {
|
||||
"chunk_index": result.payload.get("chunk_index"),
|
||||
"total_chunks": result.payload.get("total_chunks"),
|
||||
}
|
||||
|
||||
# Add deck_card-specific metadata for frontend URL construction
|
||||
if doc_type == "deck_card":
|
||||
if board_id := result.payload.get("board_id"):
|
||||
metadata["board_id"] = board_id
|
||||
|
||||
# Return unverified results (verification happens at output stage)
|
||||
results.append(
|
||||
SearchResult(
|
||||
@@ -159,10 +170,7 @@ class SemanticSearchAlgorithm(SearchAlgorithm):
|
||||
title=result.payload.get("title", "Untitled"),
|
||||
excerpt=result.payload.get("excerpt", ""),
|
||||
score=result.score,
|
||||
metadata={
|
||||
"chunk_index": result.payload.get("chunk_index"),
|
||||
"total_chunks": result.payload.get("total_chunks"),
|
||||
},
|
||||
metadata=metadata,
|
||||
chunk_start_offset=result.payload.get("chunk_start_offset"),
|
||||
chunk_end_offset=result.payload.get("chunk_end_offset"),
|
||||
page_number=result.payload.get("page_number"),
|
||||
|
||||
@@ -65,13 +65,13 @@ def configure_semantic_tools(mcp: FastMCP):
|
||||
database for optimal relevance. This provides the best of both semantic
|
||||
understanding and keyword precision.
|
||||
|
||||
Requires VECTOR_SYNC_ENABLED=true. Currently only "note" documents are
|
||||
fully supported for indexing.
|
||||
Requires VECTOR_SYNC_ENABLED=true. Supports indexing of notes, files,
|
||||
news items, and deck cards.
|
||||
|
||||
Args:
|
||||
query: Natural language or keyword search query
|
||||
limit: Maximum number of results to return (default: 10)
|
||||
doc_types: Document types to search (e.g., ["note", "file"]). None = search all indexed types (default)
|
||||
doc_types: Document types to search (e.g., ["note", "file", "deck_card", "news_item"]). None = search all indexed types (default)
|
||||
score_threshold: Minimum fusion score (0-1, default: 0.0)
|
||||
fusion: Fusion algorithm: "rrf" (Reciprocal Rank Fusion, default) or "dbsf" (Distribution-Based Score Fusion)
|
||||
RRF: Good general-purpose fusion using reciprocal ranks
|
||||
|
||||
@@ -6,6 +6,7 @@ Processes documents from stream: fetches content, generates embeddings, stores i
|
||||
import logging
|
||||
import time
|
||||
import uuid
|
||||
from typing import Any, cast
|
||||
|
||||
import anyio
|
||||
from anyio.abc import TaskStatus
|
||||
@@ -311,6 +312,64 @@ async def _index_document(
|
||||
file_path = None
|
||||
content_bytes = None
|
||||
content_type = None
|
||||
elif doc_task.doc_type == "deck_card":
|
||||
# Fetch card from Deck API
|
||||
# Note: We need board_id and stack_id to fetch the card
|
||||
# For now, we'll need to get all boards and find the card
|
||||
# This is not optimal, but Deck API requires board_id and stack_id
|
||||
boards = await nc_client.deck.get_boards()
|
||||
card_found = False
|
||||
|
||||
for board in boards:
|
||||
if card_found:
|
||||
break
|
||||
# Skip deleted boards (soft delete: deletedAt > 0)
|
||||
if board.deletedAt > 0:
|
||||
continue
|
||||
stacks = await nc_client.deck.get_stacks(board.id)
|
||||
for stack in stacks:
|
||||
if card_found:
|
||||
break
|
||||
if stack.cards:
|
||||
for card in stack.cards:
|
||||
if card.id == int(doc_task.doc_id):
|
||||
# Build content from card title and description
|
||||
content_parts = [card.title]
|
||||
if card.description:
|
||||
content_parts.append(card.description)
|
||||
content = "\n\n".join(content_parts)
|
||||
title = card.title
|
||||
|
||||
# Store deck-specific metadata
|
||||
file_metadata = {
|
||||
"board_id": board.id,
|
||||
"board_title": board.title,
|
||||
"stack_id": stack.id,
|
||||
"stack_title": stack.title,
|
||||
"card_type": card.type,
|
||||
"duedate": (
|
||||
card.duedate.isoformat()
|
||||
if card.duedate
|
||||
else None
|
||||
),
|
||||
"archived": card.archived,
|
||||
"owner": (
|
||||
card.owner.uid
|
||||
if hasattr(card.owner, "uid")
|
||||
else str(card.owner)
|
||||
),
|
||||
}
|
||||
etag = card.etag or ""
|
||||
file_path = None
|
||||
content_bytes = None
|
||||
content_type = None
|
||||
card_found = True
|
||||
break
|
||||
|
||||
if not card_found:
|
||||
raise ValueError(
|
||||
f"Deck card {doc_task.doc_id} not found in any board/stack"
|
||||
)
|
||||
elif doc_task.doc_type == "file":
|
||||
# For files, doc_id is now the numeric file ID, file_path comes from DocumentTask
|
||||
if not doc_task.file_path:
|
||||
@@ -399,14 +458,16 @@ async def _index_document(
|
||||
# Assign page numbers to chunks if page boundaries are available (PDFs)
|
||||
page_boundaries = file_metadata.get("page_boundaries")
|
||||
if doc_task.doc_type == "file" and page_boundaries is not None:
|
||||
# Type narrowing: page_boundaries is guaranteed to be list[dict] here
|
||||
page_boundaries_list = cast(list[dict[str, Any]], page_boundaries)
|
||||
with trace_operation(
|
||||
"vector_sync.assign_page_numbers",
|
||||
attributes={
|
||||
"vector_sync.chunk_count": len(chunks),
|
||||
"vector_sync.page_count": len(page_boundaries),
|
||||
"vector_sync.page_count": len(page_boundaries_list),
|
||||
},
|
||||
):
|
||||
assign_page_numbers(chunks, page_boundaries)
|
||||
assign_page_numbers(chunks, page_boundaries_list)
|
||||
|
||||
# Diagnostic: Verify page number assignment
|
||||
assigned_count = sum(1 for c in chunks if c.page_number is not None)
|
||||
@@ -429,8 +490,8 @@ async def _index_document(
|
||||
f"Text length: {len(content)}, "
|
||||
f"Chunks: {len(chunks)}, "
|
||||
f"Chunk offset range: [{chunks[0].start_offset}:{chunks[-1].end_offset}], "
|
||||
f"Page boundaries: {len(page_boundaries)} pages, "
|
||||
f"First boundary: {page_boundaries[0] if page_boundaries else 'None'}"
|
||||
f"Page boundaries: {len(page_boundaries_list)} pages, "
|
||||
f"First boundary: {page_boundaries_list[0] if page_boundaries_list else 'None'}"
|
||||
)
|
||||
|
||||
# Extract chunk texts for embedding
|
||||
@@ -504,6 +565,9 @@ async def _index_document(
|
||||
logger.warning("No page boundaries available, skipping highlighting")
|
||||
return
|
||||
|
||||
# Type narrowing: page_boundaries is guaranteed to be list[dict] here
|
||||
page_boundaries_list = cast(list[dict[str, Any]], page_boundaries)
|
||||
|
||||
logger.info(
|
||||
f"Batch generating highlighted page images for {len(chunk_data)} PDF chunks"
|
||||
)
|
||||
@@ -514,7 +578,7 @@ async def _index_document(
|
||||
lambda: PDFHighlighter.highlight_chunks_batch(
|
||||
pdf_bytes=content_bytes,
|
||||
chunks=chunk_data,
|
||||
page_boundaries=page_boundaries,
|
||||
page_boundaries=page_boundaries_list,
|
||||
full_text=content,
|
||||
color="yellow",
|
||||
zoom=2.0,
|
||||
@@ -623,6 +687,20 @@ async def _index_document(
|
||||
if doc_task.doc_type == "news_item"
|
||||
else {}
|
||||
),
|
||||
# Deck card-specific metadata
|
||||
**(
|
||||
{
|
||||
"board_id": file_metadata.get("board_id"),
|
||||
"board_title": file_metadata.get("board_title"),
|
||||
"stack_id": file_metadata.get("stack_id"),
|
||||
"stack_title": file_metadata.get("stack_title"),
|
||||
"card_type": file_metadata.get("card_type"),
|
||||
"duedate": file_metadata.get("duedate"),
|
||||
"owner": file_metadata.get("owner"),
|
||||
}
|
||||
if doc_task.doc_type == "deck_card"
|
||||
else {}
|
||||
),
|
||||
# Highlighted page image (PDF only)
|
||||
**(
|
||||
{
|
||||
|
||||
@@ -79,9 +79,11 @@ async def get_last_indexed_timestamp(user_id: str) -> int | None:
|
||||
|
||||
if scroll_result[0]:
|
||||
timestamps = [
|
||||
point.payload.get("indexed_at", 0) for point in scroll_result[0]
|
||||
point.payload.get("indexed_at", 0)
|
||||
for point in scroll_result[0]
|
||||
if point.payload is not None
|
||||
]
|
||||
max_timestamp = max(timestamps)
|
||||
max_timestamp = max(timestamps) if timestamps else 0
|
||||
logger.info(
|
||||
f"Max indexed_at: {max_timestamp}, timestamps sample: {timestamps[:3]}"
|
||||
)
|
||||
@@ -564,9 +566,23 @@ async def scan_user_documents(
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to scan news items for {user_id}: {e}")
|
||||
|
||||
# Scan Deck cards
|
||||
deck_queued = 0
|
||||
try:
|
||||
deck_queued = await scan_deck_cards(
|
||||
user_id=user_id,
|
||||
send_stream=send_stream,
|
||||
nc_client=nc_client,
|
||||
initial_sync=initial_sync,
|
||||
scan_id=scan_id,
|
||||
)
|
||||
queued += deck_queued
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to scan deck cards for {user_id}: {e}")
|
||||
|
||||
if queued > 0:
|
||||
logger.info(
|
||||
f"Sent {queued} documents ({file_queued} files, {news_queued} news items) for incremental sync: {user_id}"
|
||||
f"Sent {queued} documents ({file_queued} files, {news_queued} news items, {deck_queued} deck cards) for incremental sync: {user_id}"
|
||||
)
|
||||
else:
|
||||
logger.debug(f"No changes detected for {user_id}")
|
||||
@@ -753,3 +769,200 @@ async def scan_news_items(
|
||||
_potentially_deleted[doc_key] = current_time
|
||||
|
||||
return queued
|
||||
|
||||
|
||||
async def scan_deck_cards(
|
||||
user_id: str,
|
||||
send_stream: MemoryObjectSendStream[DocumentTask],
|
||||
nc_client: NextcloudClient,
|
||||
initial_sync: bool,
|
||||
scan_id: int,
|
||||
) -> int:
|
||||
"""
|
||||
Scan user's Deck cards and queue changed cards for indexing.
|
||||
|
||||
Indexes cards from all non-archived boards and stacks.
|
||||
|
||||
Args:
|
||||
user_id: User to scan
|
||||
send_stream: Stream to send changed documents to processors
|
||||
nc_client: Authenticated Nextcloud client
|
||||
initial_sync: If True, send all documents (first-time sync)
|
||||
scan_id: Scan identifier for logging
|
||||
|
||||
Returns:
|
||||
Number of cards queued for processing
|
||||
"""
|
||||
settings = get_settings()
|
||||
queued = 0
|
||||
|
||||
# Get indexed deck card IDs from Qdrant (for deletion tracking)
|
||||
indexed_card_ids: set[str] = set()
|
||||
if not initial_sync:
|
||||
qdrant_client = await get_qdrant_client()
|
||||
scroll_result = await qdrant_client.scroll(
|
||||
collection_name=settings.get_collection_name(),
|
||||
scroll_filter=Filter(
|
||||
must=[
|
||||
FieldCondition(key="user_id", match=MatchValue(value=user_id)),
|
||||
FieldCondition(key="doc_type", match=MatchValue(value="deck_card")),
|
||||
]
|
||||
),
|
||||
with_payload=["doc_id"],
|
||||
with_vectors=False,
|
||||
limit=10000,
|
||||
)
|
||||
indexed_card_ids = {
|
||||
point.payload["doc_id"]
|
||||
for point in (scroll_result[0] or [])
|
||||
if point.payload is not None
|
||||
}
|
||||
logger.debug(f"Found {len(indexed_card_ids)} indexed deck cards in Qdrant")
|
||||
|
||||
# Fetch all boards
|
||||
boards = await nc_client.deck.get_boards()
|
||||
logger.debug(f"[SCAN-{scan_id}] Found {len(boards)} deck boards")
|
||||
|
||||
card_count = 0
|
||||
nextcloud_card_ids: set[str] = set()
|
||||
|
||||
# Iterate through boards
|
||||
for board in boards:
|
||||
# Skip archived boards
|
||||
if board.archived:
|
||||
continue
|
||||
|
||||
# Skip deleted boards (soft delete: deletedAt > 0)
|
||||
if board.deletedAt > 0:
|
||||
logger.debug(f"[SCAN-{scan_id}] Skipping deleted board {board.id}")
|
||||
continue
|
||||
|
||||
# Get stacks for this board
|
||||
stacks = await nc_client.deck.get_stacks(board.id)
|
||||
|
||||
# Iterate through stacks
|
||||
for stack in stacks:
|
||||
# Skip if stack has no cards
|
||||
if not stack.cards:
|
||||
continue
|
||||
|
||||
# Iterate through cards in stack
|
||||
for card in stack.cards:
|
||||
# Skip archived cards
|
||||
if card.archived:
|
||||
continue
|
||||
|
||||
card_count += 1
|
||||
doc_id = str(card.id)
|
||||
nextcloud_card_ids.add(doc_id)
|
||||
|
||||
# Use lastModified timestamp if available
|
||||
modified_at = card.lastModified or 0
|
||||
|
||||
if initial_sync:
|
||||
# Send everything on first sync - write placeholder first
|
||||
await write_placeholder_point(
|
||||
doc_id=doc_id,
|
||||
doc_type="deck_card",
|
||||
user_id=user_id,
|
||||
modified_at=modified_at,
|
||||
)
|
||||
await send_stream.send(
|
||||
DocumentTask(
|
||||
user_id=user_id,
|
||||
doc_id=doc_id,
|
||||
doc_type="deck_card",
|
||||
operation="index",
|
||||
modified_at=modified_at,
|
||||
)
|
||||
)
|
||||
queued += 1
|
||||
else:
|
||||
# Incremental sync: check if card exists and compare modified_at
|
||||
doc_key = (user_id, doc_id)
|
||||
if doc_key in _potentially_deleted:
|
||||
logger.debug(
|
||||
f"Deck card {doc_id} reappeared, removing from deletion grace period"
|
||||
)
|
||||
del _potentially_deleted[doc_key]
|
||||
|
||||
# Query Qdrant for existing entry
|
||||
existing_metadata = await query_document_metadata(
|
||||
doc_id=doc_id, doc_type="deck_card", user_id=user_id
|
||||
)
|
||||
|
||||
needs_indexing = False
|
||||
if existing_metadata is None:
|
||||
needs_indexing = True
|
||||
elif existing_metadata.get("modified_at", 0) < modified_at:
|
||||
needs_indexing = True
|
||||
elif existing_metadata.get("is_placeholder", False):
|
||||
queued_at = existing_metadata.get("queued_at", 0)
|
||||
placeholder_age = time.time() - queued_at
|
||||
stale_threshold = settings.vector_sync_scan_interval * 5
|
||||
if placeholder_age > stale_threshold:
|
||||
logger.debug(
|
||||
f"Found stale placeholder for deck card {doc_id} "
|
||||
f"(age={placeholder_age:.1f}s), requeuing"
|
||||
)
|
||||
needs_indexing = True
|
||||
|
||||
if needs_indexing:
|
||||
await write_placeholder_point(
|
||||
doc_id=doc_id,
|
||||
doc_type="deck_card",
|
||||
user_id=user_id,
|
||||
modified_at=modified_at,
|
||||
)
|
||||
await send_stream.send(
|
||||
DocumentTask(
|
||||
user_id=user_id,
|
||||
doc_id=doc_id,
|
||||
doc_type="deck_card",
|
||||
operation="index",
|
||||
modified_at=modified_at,
|
||||
)
|
||||
)
|
||||
queued += 1
|
||||
|
||||
logger.info(
|
||||
f"[SCAN-{scan_id}] Found {card_count} deck cards (non-archived) for {user_id}"
|
||||
)
|
||||
record_vector_sync_scan(card_count)
|
||||
|
||||
# Check for deleted cards (not initial sync)
|
||||
if not initial_sync:
|
||||
grace_period = settings.vector_sync_scan_interval * 1.5
|
||||
current_time = time.time()
|
||||
|
||||
for doc_id in indexed_card_ids:
|
||||
if doc_id not in nextcloud_card_ids:
|
||||
doc_key = (user_id, doc_id)
|
||||
|
||||
if doc_key in _potentially_deleted:
|
||||
first_missing_time = _potentially_deleted[doc_key]
|
||||
time_missing = current_time - first_missing_time
|
||||
|
||||
if time_missing >= grace_period:
|
||||
logger.info(
|
||||
f"Deck card {doc_id} missing for {time_missing:.1f}s "
|
||||
f"(>{grace_period:.1f}s grace period), sending deletion"
|
||||
)
|
||||
await send_stream.send(
|
||||
DocumentTask(
|
||||
user_id=user_id,
|
||||
doc_id=doc_id,
|
||||
doc_type="deck_card",
|
||||
operation="delete",
|
||||
modified_at=0,
|
||||
)
|
||||
)
|
||||
queued += 1
|
||||
del _potentially_deleted[doc_key]
|
||||
else:
|
||||
logger.debug(
|
||||
f"Deck card {doc_id} missing for first time, starting grace period"
|
||||
)
|
||||
_potentially_deleted[doc_key] = current_time
|
||||
|
||||
return queued
|
||||
|
||||
@@ -0,0 +1,238 @@
|
||||
"""Integration tests for Deck card vector search.
|
||||
|
||||
These tests validate that Deck cards are properly indexed and searchable
|
||||
via semantic search.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
|
||||
pytestmark = [pytest.mark.integration, pytest.mark.smoke]
|
||||
|
||||
|
||||
async def test_deck_card_semantic_search(nc_mcp_client, nc_client, mocker):
|
||||
"""Test that Deck cards can be indexed and searched via semantic search.
|
||||
|
||||
This test:
|
||||
1. Creates a Deck board with a card
|
||||
2. Manually triggers indexing (simulates vector sync)
|
||||
3. Performs semantic search filtering by deck_card doc_type
|
||||
4. Verifies the card is found in results
|
||||
"""
|
||||
# Skip if vector sync is not enabled
|
||||
settings_response = await nc_mcp_client.call_tool("nc_get_vector_sync_status", {})
|
||||
if settings_response.isError:
|
||||
pytest.skip("Vector sync not enabled")
|
||||
|
||||
# Create a test board
|
||||
board_title = "Test Board for Vector Search"
|
||||
board = await nc_client.deck.create_board(title=board_title, color="ff0000")
|
||||
|
||||
try:
|
||||
# Create a stack for the board
|
||||
stack = await nc_client.deck.create_stack(
|
||||
board_id=board.id, title="Test Stack", order=0
|
||||
)
|
||||
|
||||
# Create a test card with searchable content
|
||||
card_title = "Machine Learning Project Plan"
|
||||
card_description = """
|
||||
# ML Project Outline
|
||||
|
||||
## Phase 1: Data Collection
|
||||
- Gather training data from multiple sources
|
||||
- Clean and preprocess the dataset
|
||||
|
||||
## Phase 2: Model Training
|
||||
- Experiment with different neural network architectures
|
||||
- Use gradient descent optimization
|
||||
|
||||
## Phase 3: Deployment
|
||||
- Deploy model to production environment
|
||||
- Monitor performance metrics
|
||||
"""
|
||||
card = await nc_client.deck.create_card(
|
||||
board_id=board.id,
|
||||
stack_id=stack.id,
|
||||
title=card_title,
|
||||
description=card_description,
|
||||
)
|
||||
|
||||
# Note: In a real integration test with vector sync enabled,
|
||||
# we would wait for the background scanner to index the card.
|
||||
# For now, we'll test the scanning function directly if needed.
|
||||
|
||||
# TODO: Once vector sync is running in test environment,
|
||||
# add actual semantic search test here
|
||||
# For now, just verify the card was created successfully
|
||||
assert card.id is not None
|
||||
assert card.title == card_title
|
||||
assert card.description == card_description
|
||||
|
||||
# Test semantic search with deck_card filter
|
||||
# Note: This will only work if vector sync is actually running
|
||||
# and the card has been indexed
|
||||
try:
|
||||
search_result = await nc_mcp_client.call_tool(
|
||||
"nc_semantic_search",
|
||||
{
|
||||
"query": "machine learning neural networks",
|
||||
"doc_types": ["deck_card"],
|
||||
"limit": 10,
|
||||
},
|
||||
)
|
||||
|
||||
# If vector sync is working, we should find the card
|
||||
if not search_result.isError:
|
||||
data = search_result.structuredContent
|
||||
results = data.get("results", [])
|
||||
|
||||
# Check if our card is in the results
|
||||
found_card = any(
|
||||
r.get("doc_type") == "deck_card" and r.get("title") == card_title
|
||||
for r in results
|
||||
)
|
||||
|
||||
# Log result for debugging
|
||||
if found_card:
|
||||
print("✓ Successfully found Deck card in vector search")
|
||||
else:
|
||||
print(
|
||||
"⚠ Deck card not found in search (may need time for indexing)"
|
||||
)
|
||||
except Exception as e:
|
||||
# If search fails, it might be because indexing hasn't happened yet
|
||||
print(f"⚠ Semantic search failed (indexing may not be complete): {e}")
|
||||
|
||||
finally:
|
||||
# Cleanup: delete the board
|
||||
try:
|
||||
await nc_client.deck.delete_board(board.id)
|
||||
except Exception as e:
|
||||
print(f"Warning: Failed to cleanup test board: {e}")
|
||||
|
||||
|
||||
async def test_deck_card_appears_in_cross_app_search(nc_mcp_client, nc_client):
|
||||
"""Test that Deck cards appear in cross-app semantic search (no doc_type filter).
|
||||
|
||||
This verifies that when searching without specifying doc_types,
|
||||
Deck cards are included in the results alongside notes, files, etc.
|
||||
"""
|
||||
# Skip if vector sync is not enabled
|
||||
settings_response = await nc_mcp_client.call_tool("nc_get_vector_sync_status", {})
|
||||
if settings_response.isError:
|
||||
pytest.skip("Vector sync not enabled")
|
||||
|
||||
# Create a test board with a distinctive card
|
||||
board_title = "Cross-App Search Test Board"
|
||||
board = await nc_client.deck.create_board(title=board_title, color="00ff00")
|
||||
|
||||
try:
|
||||
# Create a stack for the board
|
||||
stack = await nc_client.deck.create_stack(
|
||||
board_id=board.id, title="Test Stack", order=0
|
||||
)
|
||||
|
||||
# Use a very distinctive term to make it easy to find
|
||||
unique_term = "xylophone_banana_unicorn_test"
|
||||
_card = await nc_client.deck.create_card(
|
||||
board_id=board.id,
|
||||
stack_id=stack.id,
|
||||
title=f"Test Card with {unique_term}",
|
||||
description=f"This card contains the unique search term: {unique_term}",
|
||||
)
|
||||
|
||||
# Test cross-app search (no doc_type filter)
|
||||
try:
|
||||
search_result = await nc_mcp_client.call_tool(
|
||||
"nc_semantic_search",
|
||||
{
|
||||
"query": unique_term,
|
||||
"limit": 20,
|
||||
},
|
||||
)
|
||||
|
||||
if not search_result.isError:
|
||||
data = search_result.structuredContent
|
||||
results = data.get("results", [])
|
||||
|
||||
# Check if deck_card appears in cross-app results
|
||||
deck_cards_found = [
|
||||
r for r in results if r.get("doc_type") == "deck_card"
|
||||
]
|
||||
|
||||
if deck_cards_found:
|
||||
print(
|
||||
f"✓ Found {len(deck_cards_found)} Deck card(s) in cross-app search"
|
||||
)
|
||||
else:
|
||||
print(
|
||||
"⚠ No Deck cards in cross-app search (may need time for indexing)"
|
||||
)
|
||||
except Exception as e:
|
||||
print(f"⚠ Cross-app search failed: {e}")
|
||||
|
||||
finally:
|
||||
# Cleanup
|
||||
try:
|
||||
await nc_client.deck.delete_board(board.id)
|
||||
except Exception as e:
|
||||
print(f"Warning: Failed to cleanup test board: {e}")
|
||||
|
||||
|
||||
async def test_deck_card_chunk_context(nc_client):
|
||||
"""Test that Deck card chunk context can be fetched for visualization.
|
||||
|
||||
This test validates that the vector viz UI can display Deck card previews
|
||||
by fetching the chunk context via the context expansion module.
|
||||
"""
|
||||
from nextcloud_mcp_server.search.context import get_chunk_with_context
|
||||
|
||||
# Create board, stack, and card
|
||||
board = await nc_client.deck.create_board(title="Test Board", color="ff0000")
|
||||
|
||||
try:
|
||||
stack = await nc_client.deck.create_stack(
|
||||
board_id=board.id, title="Test Stack", order=0
|
||||
)
|
||||
|
||||
card_title = "Test Card for Context Expansion"
|
||||
card_description = "This is a test description that should be fetched by the context expansion module when displaying chunk previews in the vector visualization UI."
|
||||
|
||||
card = await nc_client.deck.create_card(
|
||||
board_id=board.id,
|
||||
stack_id=stack.id,
|
||||
title=card_title,
|
||||
description=card_description,
|
||||
)
|
||||
|
||||
# Fetch chunk context (simulates viz UI request)
|
||||
# The chunk spans the title, so start=0 and end=len(card_title)
|
||||
context = await get_chunk_with_context(
|
||||
nc_client=nc_client,
|
||||
user_id=nc_client.username,
|
||||
doc_id=card.id,
|
||||
doc_type="deck_card",
|
||||
chunk_start=0,
|
||||
chunk_end=len(card_title),
|
||||
context_chars=100,
|
||||
)
|
||||
|
||||
# Verify context was fetched successfully
|
||||
assert context is not None, "Chunk context should not be None"
|
||||
assert card_title in context.chunk_text, (
|
||||
f"Card title '{card_title}' should be in chunk_text"
|
||||
)
|
||||
|
||||
# Verify context includes description
|
||||
assert card_description[:50] in context.after_context, (
|
||||
"Card description should be in after_context"
|
||||
)
|
||||
|
||||
print(f"✓ Successfully fetched chunk context for Deck card {card.id}")
|
||||
|
||||
finally:
|
||||
# Cleanup
|
||||
try:
|
||||
await nc_client.deck.delete_board(board.id)
|
||||
except Exception as e:
|
||||
print(f"Warning: Failed to cleanup test board: {e}")
|
||||
Reference in New Issue
Block a user