feat: implement semantic search tool and fix vector sync issues (ADR-007 Phase 3)

Completes the ADR-007 implementation by adding user-facing semantic search functionality. Previous phases implemented scanner and processor for background indexing; this adds the query interface.

Changes:
- Add nc_notes_semantic_search MCP tool for natural language queries
- Fix Qdrant point IDs to use UUIDs instead of strings (was causing 400 errors)
- Reduce scan interval default from 1 hour to 5 minutes for faster updates
- Add SemanticSearchResult and SemanticSearchNotesResponse models
- Implement dual-phase authorization (Qdrant filter + Nextcloud API verification)

The semantic search enables finding notes by meaning rather than exact keywords, using vector embeddings to understand query intent. Point ID fix resolves critical bug where all document indexing failed with "invalid point ID" errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
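
The point ID fix can be illustrated in isolation. Qdrant accepts only unsigned integers or UUIDs as point IDs, so an arbitrary string such as `note_42_0` is rejected. The sketch below derives a deterministic UUID from the same name format used in the diff; the helper name `make_point_id` and its parameters are hypothetical and exist only for illustration, not as part of the committed code.

```python
import uuid


def make_point_id(doc_type: str, doc_id: str, chunk_index: int) -> str:
    """Return a deterministic UUID string for one chunk of a document.

    Hypothetical helper: mirrors the name format in the diff
    ("<doc_type>:<doc_id>:chunk:<index>") hashed with uuid5 under the
    DNS namespace, so the same chunk always maps to the same ID.
    """
    point_name = f"{doc_type}:{doc_id}:chunk:{chunk_index}"
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, point_name))


# Re-indexing the same chunk yields the same ID, so an upsert overwrites
# the existing point instead of creating a duplicate.
first = make_point_id("note", "42", 0)
second = make_point_id("note", "42", 0)
other_chunk = make_point_id("note", "42", 1)
```

Because uuid5 is a pure function of the namespace and name, no ID mapping has to be stored: the processor can recompute the ID of any chunk from the document type, document ID, and chunk index.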
2025-11-08 21:51:12 +01:00
parent 4dbb2eb468
commit fdd82f59e2
4 changed files with 175 additions and 3 deletions
@@ -6,6 +6,7 @@ Processes documents from queue: fetches content, generates embeddings, stores in
 import asyncio
 import logging
 import time
+import uuid

 import anyio
 from httpx import HTTPStatusError
@@ -187,9 +188,14 @@ async def _index_document(
    points = []

    for i, (chunk, embedding) in enumerate(zip(chunks, embeddings)):
+        # Generate deterministic UUID for point ID
+        # Using uuid5 with DNS namespace and combining doc info
+        point_name = f"{doc_task.doc_type}:{doc_task.doc_id}:chunk:{i}"
+        point_id = str(uuid.uuid5(uuid.NAMESPACE_DNS, point_name))
+
        points.append(
            PointStruct(
-                id=f"{doc_task.doc_type}_{doc_task.doc_id}_{i}",
+                id=point_id,
                vector=embedding,
                payload={
                    "user_id": doc_task.user_id,