feat(vector): Support multiple embedding models with auto-generated collection names
This PR enables safe switching between embedding models and multi-server
deployments by implementing auto-generated Qdrant collection names based on
deployment ID and model name.
## Problem
Previously, all deployments used a single hardcoded collection name
"nextcloud_content", which caused two critical issues:
1. **Dimension mismatches when switching models**: Changing
OLLAMA_EMBEDDING_MODEL (e.g., nomic-embed-text at 768D → all-minilm at
384D) would cause runtime errors as vectors couldn't be inserted into a
collection with incompatible dimensions.
2. **Collection collisions in multi-server setups**: Multiple MCP servers
sharing a single Qdrant instance would overwrite each other's data,
making horizontal scaling impossible.
## Solution
### Auto-Generated Collection Naming
Collections are now automatically named using the pattern:
\`{deployment-id}-{model-name}\`
**Deployment ID**: Uses \`OTEL_SERVICE_NAME\` if configured (and not default
value), otherwise falls back to \`hostname\` for simple Docker deployments.
**Model Name**: From \`OLLAMA_EMBEDDING_MODEL\` with path separators sanitized.
**Examples**:
- \`my-mcp-server-nomic-embed-text\` (with OTEL_SERVICE_NAME=my-mcp-server)
- \`mcp-container-all-minilm\` (simple Docker, hostname=mcp-container)
**Override**: Users can still set \`QDRANT_COLLECTION\` explicitly to bypass
auto-generation for backward compatibility.
### Dimension Validation
Added startup validation that checks collection dimensions match the
embedding service. If a mismatch is detected, the server fails fast with a
clear error message explaining:
- Expected vs actual dimensions
- Likely cause (model change)
- Solutions (delete collection, use different name, or revert model)
### Improved Sampling Error Handling
Enhanced MCP sampling rejection handling to treat user rejections as normal
behavior rather than errors:
- **User rejections** ("rejected", "denied") → INFO log, no traceback
- **Unsupported clients** → INFO log, no traceback
- **Other MCP errors** → WARNING log, no traceback
- **Unexpected errors** → ERROR log WITH traceback
This aligns with the MCP specification where clients SHOULD prompt users for
approval/denial of sampling requests.
## Changes
### Core Implementation
- **nextcloud_mcp_server/config.py**: Added \`get_collection_name()\` method
with deployment ID detection and model name sanitization
- **nextcloud_mcp_server/vector/qdrant_client.py**: Dimension validation on
collection open with helpful error messages
- **nextcloud_mcp_server/vector/{scanner,processor}.py**: Updated to use
\`get_collection_name()\`
- **nextcloud_mcp_server/auth/userinfo_routes.py**: Vector sync status uses
\`get_collection_name()\`
- **nextcloud_mcp_server/server/semantic.py**:
- Updated semantic search tools to use \`get_collection_name()\`
- Improved sampling rejection error handling (McpError vs Exception)
### Documentation
- **docs/semantic-search-architecture.md**: New comprehensive architecture
document (557 lines) covering background sync, semantic search flow, RAG
implementation, and deployment modes
- **docs/configuration.md**: Added detailed "Qdrant Collection Naming"
section with examples and multi-server deployment guidance
- **docker-compose.yml**: Added comments explaining collection naming behavior
- **README.md**: Updated semantic search descriptions to clarify
experimental status, Notes-only support, and infrastructure requirements
## Migration Guide
**For existing single-server deployments:**
Option 1 (Recommended): Use explicit collection name for continuity
\`\`\`bash
QDRANT_COLLECTION=nextcloud_content # Keep existing collection
\`\`\`
Option 2: Allow auto-generation and re-embed
\`\`\`bash
# Remove QDRANT_COLLECTION override
# New collection will be created based on deployment ID + model
# Requires re-embedding all documents (may take time)
\`\`\`
**For new multi-server deployments:**
Set unique OTEL service names per server:
\`\`\`bash
# Server 1
OTEL_SERVICE_NAME=mcp-prod
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
# → Collection: "mcp-prod-nomic-embed-text"
# Server 2
OTEL_SERVICE_NAME=mcp-staging
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
# → Collection: "mcp-staging-nomic-embed-text"
\`\`\`
## Benefits
✅ **Safe model switching**: Each model gets its own collection, preventing
dimension mismatch errors
✅ **Multi-server support**: Multiple MCP servers can share one Qdrant
instance without conflicts
✅ **Clear ownership**: Collection names show which deployment and model owns
the data
✅ **Better error messages**: Dimension validation provides actionable
guidance
✅ **Backward compatible**: Existing deployments can continue using
\`QDRANT_COLLECTION\` override
## Testing
Validated with:
- Single-server deployments (default hostname-based naming)
- Multi-server deployments (OTEL service name-based naming)
- Model switching scenarios (dimension validation)
- Collection override scenarios (backward compatibility)
Next steps: Testing various Ollama embedding models to investigate optimal
chunk sizes and performance characteristics.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
@@ -68,17 +68,25 @@ def configure_semantic_tools(mcp: FastMCP):
|
||||
client = await get_client(ctx)
|
||||
username = client.username
|
||||
|
||||
logger.info(
|
||||
f"Semantic search: query='{query}', user={username}, "
|
||||
f"limit={limit}, score_threshold={score_threshold}"
|
||||
)
|
||||
|
||||
try:
|
||||
# Generate embedding for query
|
||||
embedding_service = get_embedding_service()
|
||||
query_embedding = await embedding_service.embed(query)
|
||||
logger.debug(
|
||||
f"Generated embedding for query (dimension={len(query_embedding)})"
|
||||
)
|
||||
|
||||
# Search Qdrant with user filtering
|
||||
# Note: Currently only searching notes (doc_type="note")
|
||||
# Future: Remove doc_type filter to search all apps
|
||||
qdrant_client = await get_qdrant_client()
|
||||
search_response = await qdrant_client.query_points(
|
||||
collection_name=settings.qdrant_collection,
|
||||
collection_name=settings.get_collection_name(),
|
||||
query=query_embedding,
|
||||
query_filter=Filter(
|
||||
must=[
|
||||
@@ -98,6 +106,15 @@ def configure_semantic_tools(mcp: FastMCP):
|
||||
with_vectors=False, # Don't return vectors to save bandwidth
|
||||
)
|
||||
|
||||
logger.info(
|
||||
f"Qdrant returned {len(search_response.points)} results "
|
||||
f"(before deduplication and access verification)"
|
||||
)
|
||||
if search_response.points:
|
||||
# Log top 3 scores to help with threshold tuning
|
||||
top_scores = [p.score for p in search_response.points[:3]]
|
||||
logger.debug(f"Top 3 similarity scores: {top_scores}")
|
||||
|
||||
# Deduplicate by document ID (multiple chunks per document)
|
||||
seen_doc_ids = set()
|
||||
results = []
|
||||
@@ -137,9 +154,14 @@ def configure_semantic_tools(mcp: FastMCP):
|
||||
except HTTPStatusError as e:
|
||||
if e.response.status_code == 403:
|
||||
# User lost access, skip this document
|
||||
logger.debug(f"Skipping note {doc_id}: access denied (403)")
|
||||
continue
|
||||
elif e.response.status_code == 404:
|
||||
# Document was deleted but not yet removed from vector DB
|
||||
logger.debug(
|
||||
f"Skipping note {doc_id}: not found (404), "
|
||||
f"likely deleted after indexing"
|
||||
)
|
||||
continue
|
||||
else:
|
||||
# Log other errors but continue processing
|
||||
@@ -148,6 +170,16 @@ def configure_semantic_tools(mcp: FastMCP):
|
||||
)
|
||||
continue
|
||||
|
||||
logger.info(
|
||||
f"Returning {len(results)} results after deduplication and access verification"
|
||||
)
|
||||
if results:
|
||||
result_details = [
|
||||
f"note_{r.id} (score={r.score:.3f}, title='{r.title}')"
|
||||
for r in results[:5] # Show top 5
|
||||
]
|
||||
logger.debug(f"Top results: {', '.join(result_details)}")
|
||||
|
||||
return SemanticSearchResponse(
|
||||
results=results,
|
||||
query=query,
|
||||
@@ -259,7 +291,47 @@ def configure_semantic_tools(mcp: FastMCP):
|
||||
success=True,
|
||||
)
|
||||
|
||||
# 3. Construct context from retrieved documents
|
||||
# 3. Check if client supports sampling
|
||||
from mcp.types import ClientCapabilities, SamplingCapability
|
||||
|
||||
client_has_sampling = ctx.session.check_client_capability(
|
||||
ClientCapabilities(sampling=SamplingCapability())
|
||||
)
|
||||
|
||||
# Log capability check result for debugging
|
||||
logger.info(
|
||||
f"Sampling capability check: client_has_sampling={client_has_sampling}, "
|
||||
f"query='{query}'"
|
||||
)
|
||||
if hasattr(ctx.session, "_client_params") and ctx.session._client_params:
|
||||
client_caps = ctx.session._client_params.capabilities
|
||||
logger.debug(
|
||||
f"Client advertised capabilities: "
|
||||
f"roots={client_caps.roots is not None}, "
|
||||
f"sampling={client_caps.sampling is not None}, "
|
||||
f"experimental={client_caps.experimental is not None}"
|
||||
)
|
||||
|
||||
if not client_has_sampling:
|
||||
logger.info(
|
||||
f"Client does not support sampling (query: '{query}'), "
|
||||
f"returning {len(search_response.results)} documents"
|
||||
)
|
||||
return SamplingSearchResponse(
|
||||
query=query,
|
||||
generated_answer=(
|
||||
f"[Sampling not supported by client]\n\n"
|
||||
f"Your MCP client doesn't support answer generation. "
|
||||
f"Found {search_response.total_found} relevant documents. "
|
||||
f"Please review the sources below."
|
||||
),
|
||||
sources=search_response.results,
|
||||
total_found=search_response.total_found,
|
||||
search_method="semantic_sampling_unsupported",
|
||||
success=True,
|
||||
)
|
||||
|
||||
# 4. Construct context from retrieved documents
|
||||
context_parts = []
|
||||
for idx, result in enumerate(search_response.results, 1):
|
||||
context_parts.append(
|
||||
@@ -273,7 +345,7 @@ def configure_semantic_tools(mcp: FastMCP):
|
||||
|
||||
context = "\n".join(context_parts)
|
||||
|
||||
# 4. Construct prompt - reuse user's query, add context and instructions
|
||||
# 5. Construct prompt - reuse user's query, add context and instructions
|
||||
prompt = (
|
||||
f"{query}\n\n"
|
||||
f"Here are relevant documents from Nextcloud (notes, calendar events, deck cards, files, contacts):\n\n"
|
||||
@@ -282,31 +354,35 @@ def configure_semantic_tools(mcp: FastMCP):
|
||||
f"Cite the document numbers when referencing specific information."
|
||||
)
|
||||
|
||||
logger.debug(
|
||||
f"Requesting sampling for query: {query} "
|
||||
f"({len(search_response.results)} documents retrieved)"
|
||||
logger.info(
|
||||
f"Initiating sampling request: query_length={len(query)}, "
|
||||
f"documents={len(search_response.results)}, "
|
||||
f"prompt_length={len(prompt)}, max_tokens={max_answer_tokens}"
|
||||
)
|
||||
|
||||
# 5. Request LLM completion via MCP sampling
|
||||
try:
|
||||
sampling_result = await ctx.session.create_message(
|
||||
messages=[
|
||||
SamplingMessage(
|
||||
role="user",
|
||||
content=TextContent(type="text", text=prompt),
|
||||
)
|
||||
],
|
||||
max_tokens=max_answer_tokens,
|
||||
temperature=0.7,
|
||||
model_preferences=ModelPreferences(
|
||||
hints=[ModelHint(name="claude-3-5-sonnet")],
|
||||
intelligencePriority=0.8,
|
||||
speedPriority=0.5,
|
||||
),
|
||||
include_context="thisServer",
|
||||
)
|
||||
# 6. Request LLM completion via MCP sampling with timeout
|
||||
import anyio
|
||||
|
||||
# 6. Extract answer from sampling response
|
||||
try:
|
||||
with anyio.fail_after(30):
|
||||
sampling_result = await ctx.session.create_message(
|
||||
messages=[
|
||||
SamplingMessage(
|
||||
role="user",
|
||||
content=TextContent(type="text", text=prompt),
|
||||
)
|
||||
],
|
||||
max_tokens=max_answer_tokens,
|
||||
temperature=0.7,
|
||||
model_preferences=ModelPreferences(
|
||||
hints=[ModelHint(name="claude-3-5-sonnet")],
|
||||
intelligencePriority=0.8,
|
||||
speedPriority=0.5,
|
||||
),
|
||||
include_context="thisServer",
|
||||
)
|
||||
|
||||
# 7. Extract answer from sampling response
|
||||
if sampling_result.content.type == "text":
|
||||
generated_answer = sampling_result.content.text
|
||||
else:
|
||||
@@ -318,7 +394,8 @@ def configure_semantic_tools(mcp: FastMCP):
|
||||
|
||||
logger.info(
|
||||
f"Sampling successful: model={sampling_result.model}, "
|
||||
f"stop_reason={sampling_result.stopReason}"
|
||||
f"stop_reason={sampling_result.stopReason}, "
|
||||
f"answer_length={len(generated_answer)}"
|
||||
)
|
||||
|
||||
return SamplingSearchResponse(
|
||||
@@ -332,23 +409,78 @@ def configure_semantic_tools(mcp: FastMCP):
|
||||
success=True,
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
# Fallback: Return documents without generated answer
|
||||
except TimeoutError:
|
||||
logger.warning(
|
||||
f"Sampling failed ({type(e).__name__}: {e}), "
|
||||
f"Sampling request timed out after 30 seconds for query: '{query}', "
|
||||
f"returning search results only"
|
||||
)
|
||||
return SamplingSearchResponse(
|
||||
query=query,
|
||||
generated_answer=(
|
||||
f"[Sampling request timed out]\n\n"
|
||||
f"The answer generation took too long (>30s). "
|
||||
f"Found {search_response.total_found} relevant documents. "
|
||||
f"Please review the sources below or try a simpler query."
|
||||
),
|
||||
sources=search_response.results,
|
||||
total_found=search_response.total_found,
|
||||
search_method="semantic_sampling_timeout",
|
||||
success=True,
|
||||
)
|
||||
|
||||
except McpError as e:
|
||||
# Expected MCP protocol errors (user rejection, unsupported, etc.)
|
||||
error_msg = str(e)
|
||||
|
||||
if "rejected" in error_msg.lower() or "denied" in error_msg.lower():
|
||||
# User explicitly declined - this is normal, not an error
|
||||
logger.info(f"User declined sampling request for query: '{query}'")
|
||||
search_method = "semantic_sampling_user_declined"
|
||||
user_message = "User declined to generate an answer"
|
||||
elif "not supported" in error_msg.lower():
|
||||
# Client doesn't support sampling - also normal
|
||||
logger.info(f"Sampling not supported by client for query: '{query}'")
|
||||
search_method = "semantic_sampling_unsupported"
|
||||
user_message = "Sampling not supported by this client"
|
||||
else:
|
||||
# Other MCP protocol errors
|
||||
logger.warning(
|
||||
f"MCP error during sampling for query '{query}': {error_msg}"
|
||||
)
|
||||
search_method = "semantic_sampling_mcp_error"
|
||||
user_message = f"Sampling unavailable: {error_msg}"
|
||||
|
||||
return SamplingSearchResponse(
|
||||
query=query,
|
||||
generated_answer=(
|
||||
f"[Sampling unavailable: {str(e)}]\n\n"
|
||||
f"[{user_message}]\n\n"
|
||||
f"Found {search_response.total_found} relevant documents. "
|
||||
f"Please review the sources below."
|
||||
),
|
||||
sources=search_response.results,
|
||||
total_found=search_response.total_found,
|
||||
search_method="semantic_sampling_fallback",
|
||||
search_method=search_method,
|
||||
success=True,
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
# Truly unexpected errors - these SHOULD have tracebacks
|
||||
logger.error(
|
||||
f"Unexpected error during sampling for query '{query}': "
|
||||
f"{type(e).__name__}: {e}",
|
||||
exc_info=True,
|
||||
)
|
||||
|
||||
return SamplingSearchResponse(
|
||||
query=query,
|
||||
generated_answer=(
|
||||
f"[Unexpected error during sampling]\n\n"
|
||||
f"Found {search_response.total_found} relevant documents. "
|
||||
f"Please review the sources below."
|
||||
),
|
||||
sources=search_response.results,
|
||||
total_found=search_response.total_found,
|
||||
search_method="semantic_sampling_error",
|
||||
success=True,
|
||||
)
|
||||
|
||||
@@ -413,7 +545,7 @@ def configure_semantic_tools(mcp: FastMCP):
|
||||
|
||||
# Count documents in collection
|
||||
count_result = await qdrant_client.count(
|
||||
collection_name=settings.qdrant_collection
|
||||
collection_name=settings.get_collection_name()
|
||||
)
|
||||
indexed_count = count_result.count
|
||||
|
||||
|
||||
Reference in New Issue
Block a user