Files
nextcloud-mcp-server/nextcloud_mcp_server/embedding/ollama_provider.py
T
Chris Coutinho 8f45e996e8 feat: implement vector sync scanner and processor (ADR-007 Phase 2)
Implements background vector database synchronization using anyio
TaskGroups for BasicAuth mode with single-user credentials.

Scanner Implementation:
- Periodic document discovery (hourly, configurable)
- Timestamp-based change detection (Nextcloud vs Qdrant)
- Wake event for immediate scanning on-demand
- Supports both initial sync (all docs) and incremental sync (changes only)
- Detects deleted documents and queues for removal

Processor Implementation:
- Concurrent document processing pool (3 workers default)
- I/O-bound embedding generation via Ollama API
- Retry logic with exponential backoff (3 retries)
- Document chunking (512 words, 50-word overlap)
- Handles both index and delete operations
- Upserts vectors to Qdrant with rich metadata

App Lifespan Integration:
- Extended AppContext with background task state
- Modified app_lifespan_basic() to start tasks via anyio TaskGroups
- Graceful shutdown with coordinated task cancellation
- Only activates when VECTOR_SYNC_ENABLED=true

Embedding Service:
- OllamaEmbeddingProvider with TLS support
- Singleton pattern for shared client instances
- Batch embedding support for efficiency
- Auto-detects embedding dimension (768 for nomic-embed-text)

Qdrant Client:
- Async client wrapper with singleton pattern
- Auto-creates collection on first use
- COSINE distance metric for semantic similarity
- Integrates with embedding service for dimension detection

Health Check Enhancement:
- Added Qdrant status check to /health/ready endpoint
- Only checks when VECTOR_SYNC_ENABLED=true
- 2-second timeout for health probe
- Reports connection errors with details

Configuration:
- VECTOR_SYNC_ENABLED: Enable background sync
- VECTOR_SYNC_SCAN_INTERVAL: Scanner frequency (3600s default)
- VECTOR_SYNC_PROCESSOR_WORKERS: Concurrent processors (3 default)
- QDRANT_URL, QDRANT_API_KEY, QDRANT_COLLECTION: Vector DB config
- OLLAMA_BASE_URL, OLLAMA_EMBEDDING_MODEL: Embedding service config

Dependencies Added:
- qdrant-client>=1.7.0: Vector database client

Docker Compose:
- Added Qdrant service with health check
- Exposed ports 6333 (REST) and 6334 (gRPC)
- Configured MCP service with vector sync environment
- Added qdrant-data volume for persistence

Known Issue:
- FastMCP lifespan not triggering for streamable-http transport
- Background tasks will start once lifespan integration is complete
- Lifespan triggers on MCP session establishment, not server startup

Related: ADR-007 Background Vector Database Synchronization

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-08 21:14:38 +01:00

86 lines
2.4 KiB
Python

"""Ollama embedding provider."""
import logging
import httpx
from .base import EmbeddingProvider
logger = logging.getLogger(__name__)
class OllamaEmbeddingProvider(EmbeddingProvider):
"""Ollama embedding provider with TLS support."""
def __init__(
self,
base_url: str,
model: str = "nomic-embed-text",
verify_ssl: bool = True,
):
"""
Initialize Ollama embedding provider.
Args:
base_url: Ollama API base URL (e.g., https://ollama.internal.coutinho.io:443)
model: Embedding model name (default: nomic-embed-text)
verify_ssl: Verify SSL certificates (default: True)
"""
self.base_url = base_url.rstrip("/")
self.model = model
self.verify_ssl = verify_ssl
self.client = httpx.AsyncClient(verify=verify_ssl, timeout=30.0)
self._dimension = 768 # nomic-embed-text default
logger.info(
f"Initialized Ollama provider: {base_url} (model={model}, verify_ssl={verify_ssl})"
)
async def embed(self, text: str) -> list[float]:
"""
Generate embedding vector for text.
Args:
text: Input text to embed
Returns:
Vector embedding as list of floats
"""
response = await self.client.post(
f"{self.base_url}/api/embeddings",
json={"model": self.model, "prompt": text},
)
response.raise_for_status()
return response.json()["embedding"]
async def embed_batch(self, texts: list[str]) -> list[list[float]]:
"""
Generate embeddings for multiple texts (batched requests).
Note: Ollama doesn't have native batch API, so we send requests sequentially.
For better performance with large batches, consider using asyncio.gather().
Args:
texts: List of texts to embed
Returns:
List of vector embeddings
"""
embeddings = []
for text in texts:
embedding = await self.embed(text)
embeddings.append(embedding)
return embeddings
def get_dimension(self) -> int:
"""
Get embedding dimension.
Returns:
Vector dimension (768 for nomic-embed-text)
"""
return self._dimension
async def close(self):
"""Close HTTP client."""
await self.client.aclose()