Files
nextcloud-mcp-server/nextcloud_mcp_server/vector/qdrant_client.py
T
Chris Coutinho e575c8e57b feat(vector): Support multiple embedding models with auto-generated collection names
This PR enables safe switching between embedding models and multi-server
deployments by implementing auto-generated Qdrant collection names based on
deployment ID and model name.

## Problem

Previously, all deployments used a single hardcoded collection name
"nextcloud_content", which caused two critical issues:

1. **Dimension mismatches when switching models**: Changing
   OLLAMA_EMBEDDING_MODEL (e.g., nomic-embed-text at 768D → all-minilm at
   384D) would cause runtime errors as vectors couldn't be inserted into a
   collection with incompatible dimensions.

2. **Collection collisions in multi-server setups**: Multiple MCP servers
   sharing a single Qdrant instance would overwrite each other's data,
   making horizontal scaling impossible.

## Solution

### Auto-Generated Collection Naming

Collections are now automatically named using the pattern:
\`{deployment-id}-{model-name}\`

**Deployment ID**: Uses \`OTEL_SERVICE_NAME\` if configured (and not default
value), otherwise falls back to \`hostname\` for simple Docker deployments.

**Model Name**: From \`OLLAMA_EMBEDDING_MODEL\` with path separators sanitized.

**Examples**:
- \`my-mcp-server-nomic-embed-text\` (with OTEL_SERVICE_NAME=my-mcp-server)
- \`mcp-container-all-minilm\` (simple Docker, hostname=mcp-container)

**Override**: Users can still set \`QDRANT_COLLECTION\` explicitly to bypass
auto-generation for backward compatibility.

### Dimension Validation

Added startup validation that checks collection dimensions match the
embedding service. If a mismatch is detected, the server fails fast with a
clear error message explaining:
- Expected vs actual dimensions
- Likely cause (model change)
- Solutions (delete collection, use different name, or revert model)

### Improved Sampling Error Handling

Enhanced MCP sampling rejection handling to treat user rejections as normal
behavior rather than errors:

- **User rejections** ("rejected", "denied") → INFO log, no traceback
- **Unsupported clients** → INFO log, no traceback
- **Other MCP errors** → WARNING log, no traceback
- **Unexpected errors** → ERROR log WITH traceback

This aligns with the MCP specification where clients SHOULD prompt users for
approval/denial of sampling requests.

## Changes

### Core Implementation

- **nextcloud_mcp_server/config.py**: Added \`get_collection_name()\` method
  with deployment ID detection and model name sanitization
- **nextcloud_mcp_server/vector/qdrant_client.py**: Dimension validation on
  collection open with helpful error messages
- **nextcloud_mcp_server/vector/{scanner,processor}.py**: Updated to use
  \`get_collection_name()\`
- **nextcloud_mcp_server/auth/userinfo_routes.py**: Vector sync status uses
  \`get_collection_name()\`
- **nextcloud_mcp_server/server/semantic.py**:
  - Updated semantic search tools to use \`get_collection_name()\`
  - Improved sampling rejection error handling (McpError vs Exception)

### Documentation

- **docs/semantic-search-architecture.md**: New comprehensive architecture
  document (557 lines) covering background sync, semantic search flow, RAG
  implementation, and deployment modes
- **docs/configuration.md**: Added detailed "Qdrant Collection Naming"
  section with examples and multi-server deployment guidance
- **docker-compose.yml**: Added comments explaining collection naming behavior
- **README.md**: Updated semantic search descriptions to clarify
  experimental status, Notes-only support, and infrastructure requirements

## Migration Guide

**For existing single-server deployments:**

Option 1 (Recommended): Use explicit collection name for continuity
\`\`\`bash
QDRANT_COLLECTION=nextcloud_content  # Keep existing collection
\`\`\`

Option 2: Allow auto-generation and re-embed
\`\`\`bash
# Remove QDRANT_COLLECTION override
# New collection will be created based on deployment ID + model
# Requires re-embedding all documents (may take time)
\`\`\`

**For new multi-server deployments:**

Set unique OTEL service names per server:
\`\`\`bash
# Server 1
OTEL_SERVICE_NAME=mcp-prod
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
# → Collection: "mcp-prod-nomic-embed-text"

# Server 2
OTEL_SERVICE_NAME=mcp-staging
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
# → Collection: "mcp-staging-nomic-embed-text"
\`\`\`

## Benefits

 **Safe model switching**: Each model gets its own collection, preventing
   dimension mismatch errors
 **Multi-server support**: Multiple MCP servers can share one Qdrant
   instance without conflicts
 **Clear ownership**: Collection names show which deployment and model owns
   the data
 **Better error messages**: Dimension validation provides actionable
   guidance
 **Backward compatible**: Existing deployments can continue using
   \`QDRANT_COLLECTION\` override

## Testing

Validated with:
- Single-server deployments (default hostname-based naming)
- Multi-server deployments (OTEL service name-based naming)
- Model switching scenarios (dimension validation)
- Collection override scenarios (backward compatibility)

Next steps: Testing various Ollama embedding models to investigate optimal
chunk sizes and performance characteristics.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-10 01:18:30 +01:00

116 lines
4.4 KiB
Python

"""Qdrant client wrapper."""
import logging
from qdrant_client import AsyncQdrantClient
from qdrant_client.models import Distance, VectorParams
from nextcloud_mcp_server.config import get_settings
logger = logging.getLogger(__name__)
# Singleton instance
_qdrant_client: AsyncQdrantClient | None = None
async def get_qdrant_client() -> AsyncQdrantClient:
"""
Get singleton Qdrant client instance.
Automatically creates collection on first use if it doesn't exist.
Supports three Qdrant modes:
- Network mode: QDRANT_URL set (e.g., http://qdrant:6333)
- In-memory mode: QDRANT_LOCATION=:memory: (default if nothing configured)
- Persistent local mode: QDRANT_LOCATION=/path/to/data
Returns:
Configured AsyncQdrantClient instance
Raises:
Exception: If Qdrant connection fails or collection creation fails
"""
global _qdrant_client
if _qdrant_client is None:
settings = get_settings()
# Detect mode and initialize client accordingly
if settings.qdrant_url:
# Network mode
logger.info(f"Using Qdrant network mode: {settings.qdrant_url}")
_qdrant_client = AsyncQdrantClient(
url=settings.qdrant_url,
api_key=settings.qdrant_api_key,
timeout=30,
)
elif settings.qdrant_location:
# Local mode (either :memory: or persistent path)
if settings.qdrant_location == ":memory:":
logger.info("Using Qdrant in-memory mode: :memory:")
_qdrant_client = AsyncQdrantClient(":memory:")
else:
# Persistent local mode - use path parameter
logger.info(f"Using Qdrant persistent mode: {settings.qdrant_location}")
_qdrant_client = AsyncQdrantClient(path=settings.qdrant_location)
else:
# Should not happen due to __post_init__ validation, but handle gracefully
logger.warning("No Qdrant mode configured, defaulting to :memory:")
_qdrant_client = AsyncQdrantClient(":memory:")
# Get collection name (auto-generated from deployment ID + model)
collection_name = settings.get_collection_name()
# Import here to avoid circular dependency
from nextcloud_mcp_server.embedding import get_embedding_service
embedding_service = get_embedding_service()
expected_dimension = embedding_service.get_dimension()
try:
# Get existing collection
collection_info = await _qdrant_client.get_collection(collection_name)
actual_dimension = collection_info.config.params.vectors.size
# Validate dimension matches
if actual_dimension != expected_dimension:
raise ValueError(
f"Dimension mismatch for collection '{collection_name}':\n"
f" Expected: {expected_dimension} (from embedding model '{settings.ollama_embedding_model}')\n"
f" Found: {actual_dimension}\n"
f"This usually means you changed the embedding model.\n"
f"Solutions:\n"
f" 1. Delete the old collection: Collection will be recreated with new dimensions\n"
f" 2. Set QDRANT_COLLECTION to use a different collection name\n"
f" 3. Revert OLLAMA_EMBEDDING_MODEL to the original model"
)
logger.info(
f"Using existing Qdrant collection: {collection_name} "
f"(dimension={actual_dimension}, model={settings.ollama_embedding_model})"
)
except Exception as e:
# Check if it's a dimension mismatch error (re-raise it)
if isinstance(e, ValueError):
raise
# Collection doesn't exist, create it
await _qdrant_client.create_collection(
collection_name=collection_name,
vectors_config=VectorParams(
size=expected_dimension,
distance=Distance.COSINE,
),
)
logger.info(
f"Created Qdrant collection: {collection_name}\n"
f" Dimension: {expected_dimension}\n"
f" Model: {settings.ollama_embedding_model}\n"
f" Distance: COSINE\n"
f"Background sync will index all documents with this embedding model."
)
return _qdrant_client