20404cf3f2
Adds comprehensive vector search support for Nextcloud Deck cards,
including semantic search indexing, chunk preview in the vector viz UI,
and proper deep linking to cards.
**Vector Search Indexing**
- Add deck_card scanning in scanner.py (scan_deck_cards function)
- Index cards from non-archived, non-deleted boards
- Store metadata: board_id, board_title, stack_id, stack_title, card_type, duedate, owner
- Content structure: title + "\n\n" + description (matches indexing format)
- Incremental sync based on lastModified timestamp
- Deletion tracking with grace period
**Vector Visualization Support**
- Add deck_card handler in context.py for chunk preview expansion
- Include board_id in search result metadata (bm25_hybrid.py, semantic.py)
- Expose metadata in viz_routes.py JSON responses
- Update vector-viz.js to construct proper Deck URLs: /apps/deck/board/{board_id}/card/{card_id}
- Update vector_viz.html filter label from "Deck" to "Deck Cards"
**Bug Fixes**
- Skip soft-deleted boards (deletedAt > 0) to prevent 403 Forbidden errors
- Applies to scanner, processor, and context expansion code paths
- Deck API returns deleted boards but rejects stack access with 403
**Testing**
- Add integration tests in test_deck_vector_search.py:
- test_deck_card_semantic_search: Filtered search with doc_type="deck_card"
- test_deck_card_appears_in_cross_app_search: Cross-app search includes deck cards
- test_deck_card_chunk_context: Chunk context fetching for viz preview
**Documentation**
- Update README.md: Add Deck cards to semantic search feature list
- Update semantic-search-architecture.md: Document deck_card support
- Update nc_semantic_search tool documentation
**Type Safety**
- Fix type narrowing for page_boundaries (could be None) using cast()
- Fix scanner.py payload None check for type safety
Resolves vector search for Deck cards across indexing, search, and visualization.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
922 lines
32 KiB
Markdown
922 lines
32 KiB
Markdown
# Semantic Search Architecture
|
|
|
|
This document explains the architecture of the semantic search feature in the Nextcloud MCP Server, including background synchronization, vector search, and optional AI-generated answers via MCP sampling.
|
|
|
|
> [!IMPORTANT]
|
|
> **Status: Experimental**
|
|
> - Disabled by default (`VECTOR_SYNC_ENABLED=false`)
|
|
> - Currently supports **Notes, Files (PDFs), News items, and Deck cards**
|
|
> - Requires additional infrastructure (Qdrant vector database + Ollama embedding service)
|
|
> - RAG answer generation requires MCP client sampling support
|
|
|
|
## Overview
|
|
|
|
### What is Semantic Search?
|
|
|
|
**Semantic search** finds information based on **meaning** rather than exact keyword matches. It uses vector embeddings to understand that "car" and "automobile" are similar, or that "bread recipe" matches "how to bake bread."
|
|
|
|
**Traditional keyword search:**
|
|
```
|
|
Query: "machine learning"
|
|
Matches: Only notes containing "machine learning" exactly
|
|
Misses: Notes with "neural networks", "AI models", "deep learning"
|
|
```
|
|
|
|
**Semantic search:**
|
|
```
|
|
Query: "machine learning"
|
|
Matches: Notes about machine learning, neural networks, AI, deep learning, etc.
|
|
Understanding: Semantic similarity via vector embeddings
|
|
```
|
|
|
|
### Why It Matters
|
|
|
|
Semantic search enables:
|
|
- **Natural language queries** - Ask questions in plain language
|
|
- **Conceptual discovery** - Find related content even with different terminology
|
|
- **Cross-reference insights** - Connect ideas across your knowledge base
|
|
- **AI-powered answers** - Generate summaries with citations (optional, requires MCP sampling)
|
|
|
|
### Current Support
|
|
|
|
- **Supported Apps**: Notes, Files (PDFs with text extraction), News items, Deck cards
|
|
- **Planned Apps**: Calendar events, Calendar tasks, Contacts
|
|
- **Architecture**: Multi-app plugin system ready for additional apps
|
|
|
|
## System Components
|
|
|
|
```mermaid
|
|
graph TB
|
|
subgraph "MCP Client"
|
|
Client[Claude Desktop, IDEs, etc.]
|
|
end
|
|
|
|
subgraph "Nextcloud MCP Server"
|
|
MCP[MCP Server]
|
|
Scanner[Background Scanner<br/>Hourly Change Detection]
|
|
Queue[Document Queue]
|
|
Processor[Embedding Processors<br/>Concurrent Workers]
|
|
end
|
|
|
|
subgraph "Infrastructure"
|
|
Qdrant[(Qdrant<br/>Vector Database)]
|
|
Ollama[Ollama<br/>Embedding Service]
|
|
NC[Nextcloud<br/>Notes API, CalDAV, etc.]
|
|
end
|
|
|
|
Client <-->|MCP Protocol| MCP
|
|
Scanner -->|Fetch Changes| NC
|
|
Scanner -->|Enqueue Documents| Queue
|
|
Queue -->|Process Batch| Processor
|
|
Processor -->|Generate Embeddings| Ollama
|
|
Processor -->|Store Vectors| Qdrant
|
|
MCP -->|Search Queries| Qdrant
|
|
MCP -->|Verify Access| NC
|
|
```
|
|
|
|
**Component Roles:**
|
|
|
|
- **MCP Server**: Exposes semantic search tools (`nc_semantic_search`, `nc_semantic_search_answer`, `nc_get_vector_sync_status`)
|
|
- **Background Scanner**: Discovers changed documents every hour using ETag-based change detection
|
|
- **Document Queue**: Holds pending documents for embedding generation
|
|
- **Embedding Processors**: Generate vector embeddings via Ollama (concurrent workers)
|
|
- **Qdrant Vector Database**: Stores document vectors with metadata and user_id filtering
|
|
- **Ollama Embedding Service**: Converts text to 768-dimensional vectors (default: `nomic-embed-text` model)
|
|
- **Nextcloud APIs**: Source of truth for documents and access control verification
|
|
|
|
## How It Works: Background Synchronization
|
|
|
|
Background synchronization runs automatically when `VECTOR_SYNC_ENABLED=true`, discovering changes and indexing documents without user intervention.
|
|
|
|
```mermaid
|
|
sequenceDiagram
|
|
participant Timer
|
|
participant Scanner
|
|
participant NC as Nextcloud API
|
|
participant Queue
|
|
participant Processor
|
|
participant Ollama
|
|
participant Qdrant
|
|
|
|
Timer->>Scanner: Trigger (hourly)
|
|
Scanner->>NC: Fetch all notes<br/>(Notes API)
|
|
NC-->>Scanner: Notes with ETags
|
|
Scanner->>Qdrant: Check indexed documents
|
|
Qdrant-->>Scanner: Existing ETags
|
|
Scanner->>Scanner: Identify changes<br/>(new/modified/deleted)
|
|
Scanner->>Queue: Enqueue changed docs
|
|
|
|
loop Continuous Processing
|
|
Processor->>Queue: Fetch batch
|
|
Queue-->>Processor: Documents
|
|
Processor->>Ollama: Generate embeddings
|
|
Ollama-->>Processor: 768-dim vectors
|
|
Processor->>Qdrant: Upsert vectors<br/>(with user_id, doc_type)
|
|
end
|
|
```
|
|
|
|
### Scanner Behavior
|
|
|
|
**Hourly Trigger:**
|
|
- Runs every hour (configurable)
|
|
- Fetches all notes from Nextcloud Notes API
|
|
- Compares ETags with Qdrant's indexed state
|
|
- Enqueues new/modified documents
|
|
|
|
**Change Detection:**
|
|
- **New documents**: No entry in Qdrant → enqueue for indexing
|
|
- **Modified documents**: ETag mismatch → enqueue for re-indexing
|
|
- **Deleted documents**: In Qdrant but not in Nextcloud → delete from Qdrant
|
|
|
|
**Multi-App Plugin Architecture:**
|
|
```python
|
|
# Each app implements DocumentScanner interface
|
|
class NotesScanner(DocumentScanner):
|
|
async def scan(self) -> list[Document]:
|
|
# Fetch notes, detect changes, return documents
|
|
```
|
|
|
|
Currently only `NotesScanner` is implemented. Future: `CalendarScanner`, `DeckScanner`, `FilesScanner`, etc.
|
|
|
|
### Queue Processing
|
|
|
|
**Document Queue:**
|
|
- In-memory FIFO queue (not persistent across restarts)
|
|
- Holds documents pending embedding generation
|
|
- Batch processing for efficiency
|
|
|
|
**Processor Pool:**
|
|
- Concurrent workers using `anyio.TaskGroup`
|
|
- Process documents in parallel (default: 4 workers)
|
|
- Each worker: fetch document → generate embedding → store in Qdrant
|
|
|
|
**Backpressure Handling:**
|
|
- Queue size limits prevent memory exhaustion
|
|
- Slow consumers (Ollama) naturally pace the system
|
|
|
|
### Vector Storage
|
|
|
|
**Qdrant Collection Schema:**
|
|
```
|
|
{
|
|
"id": "note_123",
|
|
"vector": [768 dimensions],
|
|
"payload": {
|
|
"user_id": "alice",
|
|
"doc_type": "note",
|
|
"doc_id": "123",
|
|
"title": "Machine Learning Notes",
|
|
"content": "Neural networks are...",
|
|
"etag": "abc123",
|
|
"last_modified": "2025-01-15T10:30:00Z"
|
|
}
|
|
}
|
|
```
|
|
|
|
**Key Fields:**
|
|
- `user_id`: Multi-tenancy filtering (each user's vectors isolated)
|
|
- `doc_type`: App identifier ("note", "event", "card", etc.)
|
|
- `etag`: Change detection for incremental updates
|
|
- `chunk_index`: Position of this chunk within the document (0-indexed)
|
|
- `total_chunks`: Total number of chunks for this document
|
|
- `excerpt`: First 200 characters of chunk (for display)
|
|
|
|
### Document Chunking Strategy
|
|
|
|
Documents are chunked before embedding to handle content larger than the embedding model's context window and to improve search precision.
|
|
|
|
**Configuration:**
|
|
```dotenv
|
|
DOCUMENT_CHUNK_SIZE=512 # Words per chunk (default)
|
|
DOCUMENT_CHUNK_OVERLAP=50 # Overlapping words between chunks (default)
|
|
```
|
|
|
|
**Chunking Process:**
|
|
1. **Text combination**: Document title + content (e.g., `"Note Title\n\nNote content..."`)
|
|
2. **Word-based splitting**: Simple whitespace tokenization
|
|
3. **Sliding window**: Create overlapping chunks
|
|
4. **Individual embedding**: Each chunk gets its own vector
|
|
5. **Separate storage**: Each chunk stored as distinct point in Qdrant
|
|
|
|
**Example:**
|
|
```
|
|
Document (1000 words):
|
|
→ Chunk 0: words 0-511
|
|
→ Chunk 1: words 462-973 (overlaps by 50 words)
|
|
→ Chunk 2: words 924-999 (last chunk, partial)
|
|
|
|
Each chunk stored as separate vector with metadata:
|
|
- chunk_index: 0, 1, 2
|
|
- total_chunks: 3
|
|
- excerpt: First 200 chars of each chunk
|
|
```
|
|
|
|
**Search Behavior:**
|
|
- **Vector search** operates on chunks (not whole documents)
|
|
- **Deduplication** collapses multiple matching chunks from same document
|
|
- **Best match** returns highest-scoring chunk's excerpt
|
|
- **Access verification** still performed at document level
|
|
|
|
**Tuning Recommendations:**
|
|
- **Small chunks (256-384 words)**: More precise, less context, more storage
|
|
- **Large chunks (768-1024 words)**: More context, less precise, less storage
|
|
- **Overlap (10-20% of chunk size)**: Preserves context across boundaries
|
|
- **Match to embedding model**: Consider model's context window when sizing
|
|
|
|
**Important**: Changing chunk size requires re-embedding all documents. Use the collection naming strategy to manage different chunking configurations.
|
|
|
|
### Collection Naming and Model Switching
|
|
|
|
**Auto-generated collection names:**
|
|
- **Format:** `{deployment-id}-{model-name}`
|
|
- **Deployment ID:** `OTEL_SERVICE_NAME` (if configured) or `hostname` (fallback)
|
|
- **Model name:** `OLLAMA_EMBEDDING_MODEL`
|
|
- **Example:** `"my-mcp-server-nomic-embed-text"`, `"mcp-container-all-minilm"`
|
|
|
|
**Why model-based naming:**
|
|
- Ensures each embedding model gets its own collection
|
|
- Prevents dimension mismatches when switching models
|
|
- Enables safe model experimentation (new model = new collection)
|
|
- Supports multi-server deployments (different deployment IDs)
|
|
|
|
**Switching embedding models:**
|
|
|
|
Collections are **mutually exclusive** - vectors from one embedding model cannot be used with another. When you change the embedding model:
|
|
|
|
1. **New collection is created** with the new model's dimensions
|
|
2. **Full re-embedding occurs** - scanner processes all documents again
|
|
3. **Old collection remains** - can be deleted manually if no longer needed
|
|
4. **Dimension validation** - server fails fast if collection dimension doesn't match model
|
|
|
|
**Example workflow:**
|
|
```bash
|
|
# Start with nomic-embed-text (768 dimensions)
|
|
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
|
|
# Collection: "my-server-nomic-embed-text"
|
|
# → Scanner indexes 1000 notes → 1000 vectors in collection
|
|
|
|
# Switch to all-minilm (384 dimensions)
|
|
OLLAMA_EMBEDDING_MODEL=all-minilm
|
|
# Collection: "my-server-all-minilm"
|
|
# → Scanner detects 0 indexed documents → re-embeds 1000 notes
|
|
# → Old collection "my-server-nomic-embed-text" still exists in Qdrant
|
|
```
|
|
|
|
**Re-embedding performance:**
|
|
- CPU-only: 1-5 notes/second
|
|
- With GPU: 50-200 notes/second
|
|
- 1000 notes: 3-16 minutes (CPU) or 5-20 seconds (GPU)
|
|
|
|
**Multi-server deployments:**
|
|
|
|
Multiple MCP servers can share one Qdrant instance safely:
|
|
|
|
```bash
|
|
# Server 1 (Production)
|
|
OTEL_SERVICE_NAME=mcp-prod
|
|
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
|
|
# → Collection: "mcp-prod-nomic-embed-text"
|
|
|
|
# Server 2 (Staging with different model)
|
|
OTEL_SERVICE_NAME=mcp-staging
|
|
OLLAMA_EMBEDDING_MODEL=all-minilm
|
|
# → Collection: "mcp-staging-all-minilm"
|
|
```
|
|
|
|
Each deployment gets its own collection - no naming collisions or dimension conflicts.
|
|
|
|
## How It Works: Semantic Search
|
|
|
|
Semantic search converts user queries into vectors and finds similar documents using cosine similarity.
|
|
|
|
```mermaid
|
|
sequenceDiagram
|
|
participant User
|
|
participant MCP as MCP Server
|
|
participant Ollama
|
|
participant Qdrant
|
|
participant NC as Nextcloud API
|
|
|
|
User->>MCP: nc_semantic_search("machine learning")
|
|
MCP->>MCP: Check OAuth scope<br/>(semantic:read)
|
|
MCP->>Ollama: Generate query embedding
|
|
Ollama-->>MCP: Query vector (768-dim)
|
|
MCP->>Qdrant: Search similar vectors<br/>(filter: user_id=alice)
|
|
Qdrant-->>MCP: Top K results<br/>(with similarity scores)
|
|
|
|
loop For each result
|
|
MCP->>NC: Verify access<br/>(fetch note by ID)
|
|
alt Access granted
|
|
NC-->>MCP: Note metadata
|
|
else Access denied (404/401)
|
|
MCP->>MCP: Filter out result
|
|
end
|
|
end
|
|
|
|
MCP-->>User: Search results<br/>(with scores, excerpts)
|
|
```
|
|
|
|
### Dual-Phase Authorization
|
|
|
|
**Phase 1: OAuth Scope Check**
|
|
- Verify user has `semantic:read` scope
|
|
- Rejects unauthorized users immediately
|
|
|
|
**Phase 2: Per-Document Verification**
|
|
- For each search result, fetch document via app API (Notes, Calendar, etc.)
|
|
- If fetch succeeds (200 OK), user has access
|
|
- If fetch fails (404 Not Found, 401 Unauthorized), filter out result
|
|
- **Security**: Prevents information leakage from vector search alone
|
|
|
|
**Rationale:**
|
|
- Vector database doesn't know about sharing, permissions changes, or deleted documents
|
|
- App APIs are source of truth for access control
|
|
- Verification ensures users only see documents they can access
|
|
|
|
### Search Flow
|
|
|
|
1. **Query Embedding**: Convert user query to 768-dimensional vector via Ollama
|
|
2. **Vector Search**: Find top K similar vectors in Qdrant (cosine similarity)
|
|
3. **User Filtering**: Qdrant pre-filters by `user_id` (multi-tenancy)
|
|
4. **Access Verification**: Fetch each document via app API to verify current access
|
|
5. **Result Ranking**: Return results sorted by similarity score
|
|
6. **Response**: Include document excerpts, metadata, and similarity scores
|
|
|
|
### Performance
|
|
|
|
- **Query latency**: 50-200ms typical (embedding + vector search + verification)
|
|
- **Accuracy**: Depends on embedding model quality (`nomic-embed-text` recommended)
|
|
- **Scalability**: Qdrant handles millions of vectors efficiently
|
|
|
|
## How It Works: RAG with MCP Sampling (Optional)
|
|
|
|
The `nc_semantic_search_answer` tool generates AI-powered answers with citations using **MCP sampling** - requesting the MCP client's LLM to generate text.
|
|
|
|
```mermaid
|
|
sequenceDiagram
|
|
participant User
|
|
participant MCP as MCP Server
|
|
participant Client as MCP Client<br/>(Claude Desktop)
|
|
participant LLM as Client's LLM<br/>(Claude, GPT, etc.)
|
|
|
|
User->>MCP: nc_semantic_search_answer("What are my Q1 goals?")
|
|
MCP->>MCP: Semantic search<br/>(find relevant notes)
|
|
MCP->>MCP: Construct prompt<br/>(query + documents + instructions)
|
|
MCP->>Client: Sampling request<br/>(MCP Protocol)
|
|
Client->>User: Prompt for approval<br/>(optional, client-controlled)
|
|
User-->>Client: Approve
|
|
Client->>LLM: Generate answer<br/>(with context)
|
|
LLM-->>Client: Answer with citations
|
|
Client-->>MCP: Sampling response
|
|
MCP-->>User: Generated answer<br/>(with source documents)
|
|
```
|
|
|
|
### MCP Sampling Architecture
|
|
|
|
**Why MCP Sampling?**
|
|
- **No server-side LLM**: MCP server has no API keys, doesn't call LLMs directly
|
|
- **Client controls everything**: Which model, who pays, user approval prompts
|
|
- **Privacy**: Documents stay with the client's LLM provider, not a third-party
|
|
- **Flexibility**: Works with any MCP client that supports sampling (Claude Desktop, future clients)
|
|
|
|
**Prompt Construction:**
|
|
```
|
|
User Query: {query}
|
|
|
|
Relevant Documents:
|
|
1. Document: {title} (Note)
|
|
Content: {excerpt}
|
|
|
|
2. Document: {title} (Note)
|
|
Content: {excerpt}
|
|
|
|
Instructions:
|
|
- Provide a comprehensive answer to the user's query
|
|
- Use the documents above as context
|
|
- Include citations: "According to Document 1 (title)..."
|
|
- If documents don't contain enough information, say so
|
|
```
|
|
|
|
**Graceful Fallback:**
|
|
```python
|
|
try:
|
|
result = await ctx.session.create_message(...)
|
|
return answer_with_citations
|
|
except Exception as e:
|
|
# Fallback: Return documents without generated answer
|
|
return SearchResponse(
|
|
generated_answer=f"[Sampling unavailable: {e}]",
|
|
sources=search_results
|
|
)
|
|
```
|
|
|
|
**Client Support:**
|
|
- **Requires**: MCP client with sampling capability
|
|
- **Known support**: Claude Desktop (as of Claude 3.5+)
|
|
- **Graceful degradation**: Returns raw documents if sampling unavailable
|
|
|
|
## Authentication & Security
|
|
|
|
### OAuth Scopes
|
|
|
|
**`semantic:read`** - Search permission
|
|
- Allows using `nc_semantic_search` and `nc_semantic_search_answer` tools
|
|
- Does NOT grant access to documents (verified via app APIs)
|
|
- Required for any semantic search operation
|
|
|
|
**`semantic:write`** - Sync control permission
|
|
- Allows enabling/disabling background sync (`provision_vector_sync`, `deprovision_vector_sync`)
|
|
- Controls whether user's documents are indexed
|
|
- Currently not implemented in OAuth mode (BasicAuth only)
|
|
|
|
### Dual-Phase Authorization Pattern
|
|
|
|
**Phase 1: Scope Check** (semantic:read)
|
|
- Verifies user authorized to search
|
|
- Prevents unauthorized vector database access
|
|
|
|
**Phase 2: Document Verification** (app-specific APIs)
|
|
- For each search result, fetch via Notes API, CalDAV, etc.
|
|
- If user can fetch → include in results
|
|
- If user cannot fetch (404/401) → filter out
|
|
- **Security**: Vector search cannot leak documents user shouldn't see
|
|
|
|
**Example Scenario:**
|
|
1. Alice creates note "Secret Project X"
|
|
2. Background sync indexes note with `user_id=alice`
|
|
3. Bob searches for "project"
|
|
4. Vector search finds "Secret Project X" (vector similarity)
|
|
5. Qdrant filters by `user_id=bob` → no match (Alice's note excluded)
|
|
6. Even if Bob somehow got the doc_id, Phase 2 verification would fail (404 Not Found)
|
|
|
|
### Offline Access for Background Sync
|
|
|
|
**Why needed:**
|
|
- Background scanner runs hourly without user interaction
|
|
- Requires valid access tokens to fetch documents from Nextcloud APIs
|
|
- User's session token expires after hours/days
|
|
|
|
**OAuth Mode (ADR-004 Flow 2):**
|
|
- User explicitly provisions offline access via `provision_nextcloud_access` tool
|
|
- Server requests `offline_access` scope → receives refresh token
|
|
- Refresh token stored securely (database, encrypted)
|
|
- Background sync uses refresh tokens to obtain access tokens
|
|
|
|
**BasicAuth Mode:**
|
|
- Username/password stored in environment variables
|
|
- Always available for background operations
|
|
- Simpler but less secure (credentials never expire)
|
|
|
|
## Deployment Modes
|
|
|
|
### Authentication Modes
|
|
|
|
| Mode | Security | Offline Access | Background Sync | Best For |
|
|
|------|----------|----------------|-----------------|----------|
|
|
| **BasicAuth** | Lower (credentials in env) | Always available | ✅ Works immediately | Single-user, development, testing |
|
|
| **OAuth** | Higher (tokens, scopes) | User must provision | ⚠️ Not yet implemented | Multi-user, production |
|
|
|
|
**BasicAuth:**
|
|
- Set `NEXTCLOUD_USERNAME` and `NEXTCLOUD_PASSWORD`
|
|
- Background sync works immediately when `VECTOR_SYNC_ENABLED=true`
|
|
- Credentials stored in `.env` file (secure server access required)
|
|
|
|
**OAuth:**
|
|
- Client authenticates with `semantic:read` scope
|
|
- User must explicitly provision offline access (future: `provision_vector_sync` tool)
|
|
- Background sync only works for users who provisioned access
|
|
- More secure: tokens expire, user controls access
|
|
|
|
### Qdrant Deployment Modes
|
|
|
|
| Mode | Configuration | Persistence | Scalability | Best For |
|
|
|------|---------------|-------------|-------------|----------|
|
|
| **In-Memory** (default) | `QDRANT_LOCATION=:memory:` | ❌ Lost on restart | Single instance | Testing, development |
|
|
| **Persistent Local** | `QDRANT_LOCATION=/data/qdrant` | ✅ Survives restarts | Single instance | Small deployments |
|
|
| **Network** | `QDRANT_URL=http://qdrant:6333` | ✅ Dedicated service | ✅ Horizontal scaling | Production |
|
|
|
|
**In-Memory Mode:**
|
|
```bash
|
|
VECTOR_SYNC_ENABLED=true
|
|
# QDRANT_LOCATION not set → defaults to :memory:
|
|
```
|
|
- Fastest startup
|
|
- No disk I/O
|
|
- **Warning**: All vectors lost when server restarts (must re-index)
|
|
|
|
**Persistent Local Mode:**
|
|
```bash
|
|
VECTOR_SYNC_ENABLED=true
|
|
QDRANT_LOCATION=/var/lib/qdrant
|
|
```
|
|
- Vectors survive restarts
|
|
- Single server only (no distributed setup)
|
|
- Disk I/O for durability
|
|
|
|
**Network Mode (Recommended for Production):**
|
|
```bash
|
|
VECTOR_SYNC_ENABLED=true
|
|
QDRANT_URL=http://qdrant:6333
|
|
QDRANT_API_KEY=secret # optional
|
|
```
|
|
- Dedicated Qdrant service (Docker, Kubernetes)
|
|
- Horizontal scaling (multiple MCP servers → one Qdrant)
|
|
- High availability options
|
|
|
|
### Embedding Service Options
|
|
|
|
| Service | Configuration | Cost | Performance | Best For |
|
|
|---------|---------------|------|-------------|----------|
|
|
| **Ollama** (recommended) | `OLLAMA_BASE_URL=http://ollama:11434` | Free (self-hosted) | Fast (local GPU) | Production, development |
|
|
| **OpenAI** (future) | `OPENAI_API_KEY=sk-...` | Paid (API) | Fast (cloud) | Cloud deployments |
|
|
| **Fallback** | No config | Free | Slow (random) | Testing only (not production) |
|
|
|
|
**Ollama Setup (Recommended):**
|
|
```bash
|
|
# docker-compose.yml
|
|
services:
|
|
ollama:
|
|
image: ollama/ollama
|
|
volumes:
|
|
- ollama-data:/root/.ollama
|
|
ports:
|
|
- "11434:11434"
|
|
|
|
# Pull embedding model
|
|
docker compose exec ollama ollama pull nomic-embed-text
|
|
```
|
|
|
|
**Environment Configuration:**
|
|
```bash
|
|
OLLAMA_BASE_URL=http://ollama:11434
|
|
OLLAMA_EMBEDDING_MODEL=nomic-embed-text # 768-dimensional vectors
|
|
```
|
|
|
|
**Model Options:**
|
|
- `nomic-embed-text` (default): 768-dim, optimized for semantic search
|
|
- `all-minilm`: Smaller, faster, slightly less accurate
|
|
- `mxbai-embed-large`: Larger, more accurate, slower
|
|
|
|
## Configuration Overview
|
|
|
|
### Key Environment Variables
|
|
|
|
**Enable Semantic Search:**
|
|
```bash
|
|
VECTOR_SYNC_ENABLED=true # Default: false (opt-in)
|
|
```
|
|
|
|
**Qdrant Vector Database:**
|
|
```bash
|
|
# In-memory mode (default if VECTOR_SYNC_ENABLED=true)
|
|
# QDRANT_LOCATION not set → uses :memory:
|
|
|
|
# Persistent local mode
|
|
QDRANT_LOCATION=/var/lib/qdrant
|
|
|
|
# Network mode (production)
|
|
QDRANT_URL=http://qdrant:6333
|
|
QDRANT_API_KEY=secret # optional
|
|
```
|
|
|
|
**Ollama Embedding Service:**
|
|
```bash
|
|
OLLAMA_BASE_URL=http://ollama:11434
|
|
OLLAMA_EMBEDDING_MODEL=nomic-embed-text # Default
|
|
```
|
|
|
|
**Scanner Configuration:**
|
|
```bash
|
|
VECTOR_SYNC_INTERVAL=3600 # Scan interval in seconds (default: 1 hour)
|
|
```
|
|
|
|
### Resource Requirements
|
|
|
|
**Qdrant:**
|
|
- **Memory**: ~100-200 MB base + ~1 KB per vector (1M vectors ≈ 1 GB)
|
|
- **Disk**: Persistent mode only, ~200 bytes per vector
|
|
- **CPU**: Low (indexing) to moderate (search)
|
|
|
|
**Ollama:**
|
|
- **Memory**: 2-4 GB for `nomic-embed-text` model
|
|
- **CPU**: High during embedding generation, idle otherwise
|
|
- **GPU**: Optional but recommended (10-100x faster)
|
|
|
|
**MCP Server:**
|
|
- **Memory**: +50-100 MB for background sync workers
|
|
- **CPU**: Moderate during scanning/processing, low otherwise
|
|
|
|
### Trade-offs
|
|
|
|
| Consideration | In-Memory Qdrant | Persistent Qdrant | Network Qdrant |
|
|
|---------------|------------------|-------------------|----------------|
|
|
| Setup complexity | ✅ Minimal | ✅ Easy | ⚠️ Requires separate service |
|
|
| Durability | ❌ Lost on restart | ✅ Survives restarts | ✅ Survives restarts |
|
|
| Scalability | ❌ Single instance | ❌ Single instance | ✅ Horizontal scaling |
|
|
| Performance | ✅ Fastest | ✅ Fast | ⚠️ Network latency |
|
|
|
|
## Operational Behavior
|
|
|
|
### What Happens When VECTOR_SYNC_ENABLED=true
|
|
|
|
**Immediate (Server Startup):**
|
|
1. MCP server connects to Qdrant (creates collection if needed)
|
|
2. MCP server connects to Ollama (verifies embedding model available)
|
|
3. Background scanner starts (schedules hourly runs)
|
|
4. Document queue and processors initialize
|
|
|
|
**First Scan (Within 1 hour):**
|
|
1. Scanner fetches all notes from Nextcloud
|
|
2. Compares with Qdrant (likely empty on first run)
|
|
3. Enqueues all notes for indexing
|
|
4. Processors generate embeddings (may take minutes for large note collections)
|
|
5. Vectors stored in Qdrant with user_id filtering
|
|
|
|
**Hourly Thereafter:**
|
|
1. Scanner fetches all notes
|
|
2. Identifies new/modified/deleted notes (ETag comparison)
|
|
3. Enqueues changes only
|
|
4. Incremental updates processed
|
|
|
|
### Performance Expectations
|
|
|
|
**Embedding Generation:**
|
|
- **Without GPU**: 1-5 notes/second (CPU-bound)
|
|
- **With GPU**: 50-200 notes/second (highly parallel)
|
|
- **Initial indexing**: 100 notes ≈ 20-100 seconds (CPU), 1-2 seconds (GPU)
|
|
|
|
**Search Query:**
|
|
- **Embedding generation**: 50-100ms
|
|
- **Vector search**: 10-50ms (depends on collection size)
|
|
- **Access verification**: 20-100ms per document (Nextcloud API calls)
|
|
- **Total latency**: 100-300ms typical
|
|
|
|
**Resource Usage:**
|
|
- **Idle**: Minimal (background scanner sleeps)
|
|
- **Scanning**: Moderate CPU (ETag checks, API calls)
|
|
- **Processing**: High CPU/GPU (embedding generation)
|
|
- **Searching**: Low to moderate (depends on query frequency)
|
|
|
|
### Background Sync Behavior
|
|
|
|
**Scanner Triggers:**
|
|
- Hourly (configurable via `VECTOR_SYNC_INTERVAL`)
|
|
- Manual trigger via `nc_trigger_vector_sync` (future)
|
|
|
|
**Queue Processing:**
|
|
- Continuous (workers always running)
|
|
- Batch processing (fetch 10 documents at a time)
|
|
- Concurrent workers (4 by default)
|
|
|
|
**Error Handling:**
|
|
- Individual document failures logged but don't stop scanning
|
|
- Retries for transient errors (network timeouts, rate limits)
|
|
- Failed documents skipped, re-attempted on next scan
|
|
|
|
**What Gets Indexed:**
|
|
- **Notes**: All notes accessible to the authenticated user
|
|
- **Future**: Calendar events, tasks, deck cards, files with text extraction, contacts
|
|
|
|
## Monitoring & Observability
|
|
|
|
### MCP Tools
|
|
|
|
**`nc_get_vector_sync_status`** - Check sync status
|
|
```python
|
|
{
|
|
"total_documents": 1234,
|
|
"indexed_documents": 1200,
|
|
"pending_documents": 34,
|
|
"sync_enabled": true,
|
|
"last_scan": "2025-01-15T14:30:00Z",
|
|
"status": "syncing" # idle | syncing | error
|
|
}
|
|
```
|
|
|
|
**Interpreting Status:**
|
|
- `idle`: No pending work, last scan completed successfully
|
|
- `syncing`: Currently processing documents
|
|
- `error`: Last scan failed (check logs)
|
|
|
|
### Logs to Check
|
|
|
|
**Scanner Logs:**
|
|
```
|
|
[INFO] Vector sync scanner started (interval: 3600s)
|
|
[INFO] Scanning notes: found 150 documents
|
|
[INFO] Changes detected: 5 new, 2 modified, 1 deleted
|
|
[INFO] Enqueued 7 documents for processing
|
|
```
|
|
|
|
**Processor Logs:**
|
|
```
|
|
[INFO] Processing document: note_123
|
|
[DEBUG] Generated embedding (768 dimensions)
|
|
[INFO] Stored vector in Qdrant: note_123
|
|
```
|
|
|
|
**Error Logs:**
|
|
```
|
|
[ERROR] Failed to generate embedding for note_123: Connection timeout
|
|
[WARN] Qdrant connection lost, retrying...
|
|
[ERROR] Ollama embedding failed: Model not found
|
|
```
|
|
|
|
**Log Locations:**
|
|
- **Docker**: `docker compose logs mcp`
|
|
- **Local**: stdout (redirect to file if needed)
|
|
- **Kubernetes**: `kubectl logs -f deployment/nextcloud-mcp-server`
|
|
|
|
### Metrics to Monitor
|
|
|
|
**Indexing Progress:**
|
|
- Total documents vs indexed documents
|
|
- Pending queue size
|
|
- Processing rate (docs/second)
|
|
|
|
**Search Performance:**
|
|
- Query latency (p50, p95, p99)
|
|
- Results per query
|
|
- Verification overhead (API calls per query)
|
|
|
|
**Resource Usage:**
|
|
- Qdrant memory/disk usage
|
|
- Ollama CPU/GPU usage
|
|
- MCP server memory
|
|
|
|
For detailed observability setup, see [docs/observability.md](observability.md).
|
|
|
|
## Troubleshooting from Architecture Perspective
|
|
|
|
### Documents Not Appearing in Search
|
|
|
|
**Diagnosis Flow:**
|
|
1. Check sync status: `nc_get_vector_sync_status`
|
|
- `sync_enabled: false` → Enable with `VECTOR_SYNC_ENABLED=true`
|
|
- `status: error` → Check scanner logs for failures
|
|
2. Check queue size:
|
|
- `pending_documents > 0` → Processing in progress, wait
|
|
- `pending_documents == 0` but `indexed_documents` low → Scan hasn't run yet (wait up to 1 hour)
|
|
3. Check Qdrant:
|
|
- Connection errors in logs → Verify `QDRANT_URL` or `QDRANT_LOCATION`
|
|
- Collection empty → First scan hasn't completed
|
|
4. Check Ollama:
|
|
- Embedding errors in logs → Verify `OLLAMA_BASE_URL`
|
|
- Model not found → Pull model: `ollama pull nomic-embed-text`
|
|
|
|
**Common Causes:**
|
|
- Sync disabled (default): Enable `VECTOR_SYNC_ENABLED=true`
|
|
- Ollama not running: Start Ollama service
|
|
- Qdrant not accessible: Check network/URL
|
|
- First scan in progress: Wait up to 1 hour + processing time
|
|
|
|
### Slow Search Performance
|
|
|
|
**Diagnosis:**
|
|
1. **Query embedding slow (>500ms)**:
|
|
- Ollama overloaded or CPU-bound
|
|
- Solution: Use GPU, upgrade CPU, or reduce concurrent requests
|
|
2. **Vector search slow (>200ms)**:
|
|
- Large collection (millions of vectors)
|
|
- Solution: Use network Qdrant with SSDs, add indexing
|
|
3. **Verification slow (>500ms)**:
|
|
- Many results to verify (10+ documents)
|
|
- Nextcloud API slow or overloaded
|
|
- Solution: Reduce `limit` parameter, optimize Nextcloud
|
|
|
|
**Performance Tuning:**
|
|
- Reduce search `limit` (default: 10 results)
|
|
- Use network Qdrant for large collections
|
|
- Enable Ollama GPU acceleration
|
|
- Check Nextcloud API response times
|
|
|
|
### Background Sync Stopped
|
|
|
|
**Diagnosis:**
|
|
1. Check logs for errors:
|
|
- Authentication failures (401/403) → Token expired (OAuth) or credentials invalid (BasicAuth)
|
|
- Connection timeouts → Network issues with Nextcloud/Qdrant/Ollama
|
|
- Rate limiting (429) → Reduce scan frequency
|
|
2. Check `nc_get_vector_sync_status`:
|
|
- `status: error` → See logs for details
|
|
- `last_scan` timestamp old (>2 hours) → Scanner may have crashed
|
|
3. Verify services:
|
|
- Qdrant accessible: `curl http://qdrant:6333/`
|
|
- Ollama accessible: `curl http://ollama:11434/api/tags`
|
|
- Nextcloud accessible: Check API health
|
|
|
|
**OAuth Mode (Future):**
|
|
- Offline access token expired → Re-provision via `provision_vector_sync`
|
|
- User deprovisioned access → Sync stops intentionally
|
|
|
|
### Out of Memory
|
|
|
|
**Diagnosis:**
|
|
1. Check Qdrant mode:
|
|
- In-memory mode with large collection → Switch to persistent or network mode
|
|
2. Check embedding batch size:
|
|
- Too many documents processed simultaneously → Reduce worker count
|
|
3. Check Ollama memory:
|
|
- Large models loaded → Use smaller embedding model
|
|
|
|
**Solutions:**
|
|
- Use persistent or network Qdrant (frees server memory)
|
|
- Reduce concurrent processor workers
|
|
- Use smaller embedding model (`all-minilm` instead of `nomic-embed-text`)
|
|
- Increase server memory allocation
|
|
|
|
## Limitations & Future Work
|
|
|
|
### Current Limitations
|
|
|
|
1. **Notes App Only**
|
|
- Architecture supports multiple apps (plugin system ready)
|
|
- Only `NotesScanner` and `NotesProcessor` implemented
|
|
- Future: Calendar, Deck, Files, Contacts
|
|
|
|
2. **MCP Sampling Support**
|
|
- `nc_semantic_search_answer` requires client sampling capability
|
|
- Not all MCP clients support sampling yet
|
|
- Graceful fallback: Returns documents without generated answer
|
|
|
|
3. **OAuth Background Sync**
|
|
- User-controlled background jobs not yet implemented
|
|
- Currently works in BasicAuth mode only
|
|
- Future: Users opt-in via `provision_vector_sync` tool
|
|
|
|
4. **No Incremental Updates**
|
|
- Document changes trigger full re-embedding
|
|
- Cannot update just modified paragraphs
|
|
- Future: Paragraph-level chunking and incremental updates
|
|
|
|
5. **No Query Caching**
|
|
- Each search generates new query embedding
|
|
- Repeated queries re-search Qdrant
|
|
- Future: Cache recent query embeddings and results
|
|
|
|
6. **Single Embedding Model**
|
|
- Uses one model for all documents and queries
|
|
- Cannot customize per app or user
|
|
- Future: App-specific or user-selected models
|
|
|
|
### Future Enhancements
|
|
|
|
**Multi-App Support** (In Progress):
|
|
- Scanner plugins for Calendar, Deck, Files, Contacts
|
|
- Unified vector search across all apps
|
|
- App-specific metadata in vector payloads
|
|
|
|
**User-Controlled Sync (OAuth Mode)**:
|
|
- `provision_vector_sync` and `deprovision_vector_sync` tools
|
|
- Per-user background job scheduling
|
|
- User dashboard for sync status and controls
|
|
|
|
**Advanced Search Features**:
|
|
- Hybrid search (vector + keyword combined)
|
|
- Filtering by date range, app type, tags
|
|
- Aggregations and faceted search
|
|
- Search result explanations (why this matched)
|
|
|
|
**Performance Optimizations**:
|
|
- Query caching for repeated searches
|
|
- Incremental document updates (paragraph-level)
|
|
- Batch query processing
|
|
- Qdrant HNSW indexing tuning
|
|
|
|
**Embedding Improvements**:
|
|
- Support for OpenAI embeddings (ada-002, text-embedding-3)
|
|
- Multi-language embedding models
|
|
- Fine-tuned models for Nextcloud content
|
|
- Paragraph-level chunking for long documents
|
|
|
|
## References
|
|
|
|
### Architecture Decision Records (ADRs)
|
|
|
|
- **[ADR-003: Vector Database Semantic Search](ADR-003-vector-database-semantic-search.md)** - Qdrant selection rationale, embedding strategy, hybrid search (superseded by ADR-007 but technical decisions remain valid)
|
|
- **[ADR-007: Background Vector Sync Job Management](ADR-007-background-vector-sync-job-management.md)** - Current implementation, Scanner-Queue-Processor architecture, plugin system
|
|
- **[ADR-008: MCP Sampling for Semantic Search](ADR-008-mcp-sampling-for-semantic-search.md)** - RAG with MCP sampling, client-server separation, prompt construction
|
|
- **[ADR-009: Semantic Search OAuth Scope](ADR-009-semantic-search-oauth-scope.md)** - OAuth scope model, dual-phase authorization, security rationale
|
|
|
|
### Configuration & Setup
|
|
|
|
- **[Configuration Guide](configuration.md)** - Environment variables, Qdrant setup, Ollama setup, detailed configuration options
|
|
- **[Installation Guide](installation.md)** - Deployment options (Docker, Kubernetes, local)
|
|
- **[Running the Server](running.md)** - Starting the server, transport options, testing
|
|
|
|
### Monitoring & Troubleshooting
|
|
|
|
- **[Observability Guide](observability.md)** - Logging, metrics, tracing, debugging
|
|
- **[Troubleshooting](troubleshooting.md)** - General issues and solutions
|
|
|
|
### Related Documentation
|
|
|
|
- **[OAuth Architecture](oauth-architecture.md)** - OAuth flows, scopes, token management
|
|
- **[Comparison with Context Agent](comparison-context-agent.md)** - When to use Nextcloud MCP Server vs Context Agent
|
|
|
|
---
|
|
|
|
**Questions or Issues?**
|
|
- [Open an issue](https://github.com/cbcoutinho/nextcloud-mcp-server/issues)
|
|
- [Contribute improvements](https://github.com/cbcoutinho/nextcloud-mcp-server/pulls)
|