e575c8e57b
This PR enables safe switching between embedding models and multi-server
deployments by implementing auto-generated Qdrant collection names based on
deployment ID and model name.
## Problem
Previously, all deployments used a single hardcoded collection name
"nextcloud_content", which caused two critical issues:
1. **Dimension mismatches when switching models**: Changing
OLLAMA_EMBEDDING_MODEL (e.g., nomic-embed-text at 768D → all-minilm at
384D) would cause runtime errors as vectors couldn't be inserted into a
collection with incompatible dimensions.
2. **Collection collisions in multi-server setups**: Multiple MCP servers
sharing a single Qdrant instance would overwrite each other's data,
making horizontal scaling impossible.
## Solution
### Auto-Generated Collection Naming
Collections are now automatically named using the pattern:
`{deployment-id}-{model-name}`
**Deployment ID**: Uses `OTEL_SERVICE_NAME` if it is configured and not left at
its default value; otherwise falls back to the `hostname`, which suits simple
Docker deployments.
**Model Name**: Taken from `OLLAMA_EMBEDDING_MODEL`, with path separators sanitized.
**Examples**:
- \`my-mcp-server-nomic-embed-text\` (with OTEL_SERVICE_NAME=my-mcp-server)
- \`mcp-container-all-minilm\` (simple Docker, hostname=mcp-container)
**Override**: Users can still set `QDRANT_COLLECTION` explicitly to bypass
auto-generation for backward compatibility.
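The naming rules above can be sketched as a small pure function. This is an illustration only, not the actual `get_collection_name()` from `config.py`: the function name `build_collection_name`, its parameters, and the `DEFAULT_SERVICE_NAME` placeholder are assumptions, since the PR description does not state what the default service name is.

```python
import socket
from typing import Optional

# Assumption: placeholder for the "default" OTEL service name that should be
# ignored. The real default value is not given in this PR description.
DEFAULT_SERVICE_NAME = "nextcloud-mcp-server"


def build_collection_name(
    model: str,
    otel_service_name: Optional[str] = None,
    hostname: Optional[str] = None,
    override: Optional[str] = None,
) -> str:
    """Return the explicit override, else '{deployment-id}-{model-name}'."""
    # An explicit QDRANT_COLLECTION-style override always wins.
    if override:
        return override

    # Prefer a non-default service name, otherwise fall back to the hostname.
    if otel_service_name and otel_service_name != DEFAULT_SERVICE_NAME:
        deployment_id = otel_service_name
    else:
        deployment_id = hostname or socket.gethostname()

    # Sanitize path separators so names like "org/model" stay flat.
    model_name = model.replace("/", "-").replace("\\", "-")
    return f"{deployment_id}-{model_name}"
```

With these assumptions, `build_collection_name("nomic-embed-text", otel_service_name="my-mcp-server")` yields `my-mcp-server-nomic-embed-text`, matching the first example above.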
### Dimension Validation
Added startup validation that checks that the collection's vector dimension
matches the dimension reported by the embedding service. If a mismatch is
detected, the server fails fast with a clear error message explaining:
- Expected vs actual dimensions
- Likely cause (model change)
- Solutions (delete collection, use different name, or revert model)
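As a rough sketch, the check reduces to comparing two integers and raising with guidance. The helper name `validate_dimension` and its parameters are hypothetical; in the PR the check lives inline in `qdrant_client.py`, and the message wording here is abbreviated.

```python
def validate_dimension(collection: str, expected: int, actual: int, model: str) -> None:
    """Raise ValueError with guidance when the stored vector size differs."""
    if actual == expected:
        return
    raise ValueError(
        f"Dimension mismatch for collection '{collection}':\n"
        f"  Expected: {expected} (from embedding model '{model}')\n"
        f"  Found: {actual}\n"
        "This usually means the embedding model changed.\n"
        "Solutions: delete the collection, set QDRANT_COLLECTION to a "
        "different name, or revert OLLAMA_EMBEDDING_MODEL."
    )
```

Raising at startup, rather than on the first insert, surfaces the problem before any background sync work begins.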
### Improved Sampling Error Handling
Enhanced MCP sampling rejection handling to treat user rejections as normal
behavior rather than errors:
- **User rejections** ("rejected", "denied") → INFO log, no traceback
- **Unsupported clients** → INFO log, no traceback
- **Other MCP errors** → WARNING log, no traceback
- **Unexpected errors** → ERROR log WITH traceback
This aligns with the MCP specification where clients SHOULD prompt users for
approval/denial of sampling requests.
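The four cases above amount to a mapping from exception to log level and traceback flag. The sketch below shows one way to express that. It is not the code in `semantic.py`: the `McpError` class is a local stand-in for the MCP SDK's error type so the example stays self-contained, and the substring used to detect unsupported clients is an assumption.

```python
import logging
from typing import Tuple

logger = logging.getLogger("sampling")


class McpError(Exception):
    """Local stand-in for the MCP SDK error type (assumption)."""


def classify_sampling_failure(exc: Exception) -> Tuple[int, bool]:
    """Return (log level, include traceback) for a failed sampling request."""
    if isinstance(exc, McpError):
        message = str(exc).lower()
        # User declined the request: normal behavior, not an error.
        if "rejected" in message or "denied" in message:
            return logging.INFO, False
        # Client cannot do sampling at all (assumed message wording).
        if "not supported" in message or "unsupported" in message:
            return logging.INFO, False
        # Any other protocol-level error.
        return logging.WARNING, False
    # Anything that is not an MCP error is unexpected: keep the traceback.
    return logging.ERROR, True


def log_sampling_failure(exc: Exception) -> None:
    """Log the failure at the level chosen by classify_sampling_failure."""
    level, with_traceback = classify_sampling_failure(exc)
    logger.log(
        level,
        "Sampling unavailable: %s",
        exc,
        exc_info=exc if with_traceback else None,
    )
```

Separating classification from logging keeps the decision table easy to test without capturing log output.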
## Changes
### Core Implementation
- **nextcloud_mcp_server/config.py**: Added `get_collection_name()` method
with deployment ID detection and model name sanitization
- **nextcloud_mcp_server/vector/qdrant_client.py**: Dimension validation on
collection open with helpful error messages
- **nextcloud_mcp_server/vector/{scanner,processor}.py**: Updated to use
`get_collection_name()`
- **nextcloud_mcp_server/auth/userinfo_routes.py**: Vector sync status uses
`get_collection_name()`
- **nextcloud_mcp_server/server/semantic.py**:
  - Updated semantic search tools to use `get_collection_name()`
  - Improved sampling rejection error handling (McpError vs Exception)
### Documentation
- **docs/semantic-search-architecture.md**: New comprehensive architecture
document (557 lines) covering background sync, semantic search flow, RAG
implementation, and deployment modes
- **docs/configuration.md**: Added detailed "Qdrant Collection Naming"
section with examples and multi-server deployment guidance
- **docker-compose.yml**: Added comments explaining collection naming behavior
- **README.md**: Updated semantic search descriptions to clarify
experimental status, Notes-only support, and infrastructure requirements
## Migration Guide
**For existing single-server deployments:**
Option 1 (Recommended): Use explicit collection name for continuity
```bash
QDRANT_COLLECTION=nextcloud_content # Keep existing collection
```
Option 2: Allow auto-generation and re-embed
```bash
# Remove QDRANT_COLLECTION override
# New collection will be created based on deployment ID + model
# Requires re-embedding all documents (may take time)
```
**For new multi-server deployments:**
Set unique OTEL service names per server:
```bash
# Server 1
OTEL_SERVICE_NAME=mcp-prod
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
# → Collection: "mcp-prod-nomic-embed-text"

# Server 2
OTEL_SERVICE_NAME=mcp-staging
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
# → Collection: "mcp-staging-nomic-embed-text"
```
## Benefits
✅ **Safe model switching**: Each model gets its own collection, preventing
dimension mismatch errors
✅ **Multi-server support**: Multiple MCP servers can share one Qdrant
instance without conflicts
✅ **Clear ownership**: Collection names show which deployment and model owns
the data
✅ **Better error messages**: Dimension validation provides actionable
guidance
✅ **Backward compatible**: Existing deployments can continue using
`QDRANT_COLLECTION` override
## Testing
Validated with:
- Single-server deployments (default hostname-based naming)
- Multi-server deployments (OTEL service name-based naming)
- Model switching scenarios (dimension validation)
- Collection override scenarios (backward compatibility)
Next steps: Testing various Ollama embedding models to investigate optimal
chunk sizes and performance characteristics.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
116 lines
4.4 KiB
Python
"""Qdrant client wrapper."""

import logging

from qdrant_client import AsyncQdrantClient
from qdrant_client.models import Distance, VectorParams

from nextcloud_mcp_server.config import get_settings

logger = logging.getLogger(__name__)


# Singleton instance
_qdrant_client: AsyncQdrantClient | None = None


async def get_qdrant_client() -> AsyncQdrantClient:
    """
    Get singleton Qdrant client instance.

    Automatically creates collection on first use if it doesn't exist.

    Supports three Qdrant modes:
    - Network mode: QDRANT_URL set (e.g., http://qdrant:6333)
    - In-memory mode: QDRANT_LOCATION=:memory: (default if nothing configured)
    - Persistent local mode: QDRANT_LOCATION=/path/to/data

    Returns:
        Configured AsyncQdrantClient instance

    Raises:
        Exception: If Qdrant connection fails or collection creation fails
    """
    global _qdrant_client

    if _qdrant_client is None:
        settings = get_settings()

        # Detect mode and initialize client accordingly
        if settings.qdrant_url:
            # Network mode
            logger.info(f"Using Qdrant network mode: {settings.qdrant_url}")
            _qdrant_client = AsyncQdrantClient(
                url=settings.qdrant_url,
                api_key=settings.qdrant_api_key,
                timeout=30,
            )
        elif settings.qdrant_location:
            # Local mode (either :memory: or persistent path)
            if settings.qdrant_location == ":memory:":
                logger.info("Using Qdrant in-memory mode: :memory:")
                _qdrant_client = AsyncQdrantClient(":memory:")
            else:
                # Persistent local mode - use path parameter
                logger.info(f"Using Qdrant persistent mode: {settings.qdrant_location}")
                _qdrant_client = AsyncQdrantClient(path=settings.qdrant_location)
        else:
            # Should not happen due to __post_init__ validation, but handle gracefully
            logger.warning("No Qdrant mode configured, defaulting to :memory:")
            _qdrant_client = AsyncQdrantClient(":memory:")

        # Get collection name (auto-generated from deployment ID + model)
        collection_name = settings.get_collection_name()

        # Import here to avoid circular dependency
        from nextcloud_mcp_server.embedding import get_embedding_service

        embedding_service = get_embedding_service()
        expected_dimension = embedding_service.get_dimension()

        try:
            # Get existing collection
            collection_info = await _qdrant_client.get_collection(collection_name)
            actual_dimension = collection_info.config.params.vectors.size

            # Validate dimension matches
            if actual_dimension != expected_dimension:
                raise ValueError(
                    f"Dimension mismatch for collection '{collection_name}':\n"
                    f" Expected: {expected_dimension} (from embedding model '{settings.ollama_embedding_model}')\n"
                    f" Found: {actual_dimension}\n"
                    f"This usually means you changed the embedding model.\n"
                    f"Solutions:\n"
                    f" 1. Delete the old collection: Collection will be recreated with new dimensions\n"
                    f" 2. Set QDRANT_COLLECTION to use a different collection name\n"
                    f" 3. Revert OLLAMA_EMBEDDING_MODEL to the original model"
                )

            logger.info(
                f"Using existing Qdrant collection: {collection_name} "
                f"(dimension={actual_dimension}, model={settings.ollama_embedding_model})"
            )

        except Exception as e:
            # Check if it's a dimension mismatch error (re-raise it)
            if isinstance(e, ValueError):
                raise

            # Collection doesn't exist, create it
            await _qdrant_client.create_collection(
                collection_name=collection_name,
                vectors_config=VectorParams(
                    size=expected_dimension,
                    distance=Distance.COSINE,
                ),
            )
            logger.info(
                f"Created Qdrant collection: {collection_name}\n"
                f" Dimension: {expected_dimension}\n"
                f" Model: {settings.ollama_embedding_model}\n"
                f" Distance: COSINE\n"
                f"Background sync will index all documents with this embedding model."
            )

    return _qdrant_client