feat: implement MCP sampling for semantic search RAG (ADR-008)
Add nc_notes_semantic_search_answer tool that combines semantic search with MCP sampling to generate natural language answers from retrieved Nextcloud Notes. This enables Retrieval-Augmented Generation (RAG) patterns without requiring a server-side LLM. Key features: - Client-side LLM generation via ctx.session.create_message() - Graceful fallback when sampling unavailable - Proper source citations in generated answers - No results optimization (skips sampling when no docs found) - Comprehensive unit and integration tests Implementation details: - SamplingSearchResponse model with generated_answer and sources - Fixed prompt template with document context and citation instructions - Model preferences hint Claude Sonnet for balanced performance - Falls back to returning documents without answer on sampling failure Updates: - Add ADR-008 documenting sampling architecture decision - Add MCP sampling pattern guidance to CLAUDE.md - Update README.md and docs/notes.md (7 → 9 tools) - Add 4 unit tests and 6 integration tests 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
@@ -108,3 +108,41 @@ class SemanticSearchNotesResponse(BaseResponse):
|
||||
search_method: str = Field(
|
||||
default="semantic", description="Search method used (semantic or hybrid)"
|
||||
)
|
||||
|
||||
|
||||
class SamplingSearchResponse(BaseResponse):
|
||||
"""Response from semantic search with LLM-generated answer via MCP sampling.
|
||||
|
||||
This response includes both a generated natural language answer (created by
|
||||
the MCP client's LLM via sampling) and the source documents used to generate
|
||||
that answer. Users can read the answer for quick information and review
|
||||
sources for verification and deeper exploration.
|
||||
|
||||
Attributes:
|
||||
query: The original user query
|
||||
generated_answer: Natural language answer generated by client's LLM
|
||||
sources: List of semantic search results used as context
|
||||
total_found: Total number of matching documents found
|
||||
search_method: Always "semantic_sampling" for this response type
|
||||
model_used: Name of model that generated the answer (e.g., "claude-3-5-sonnet")
|
||||
stop_reason: Why generation stopped ("endTurn", "maxTokens", etc.)
|
||||
"""
|
||||
|
||||
query: str = Field(..., description="Original user query")
|
||||
generated_answer: str = Field(
|
||||
..., description="LLM-generated answer based on retrieved documents"
|
||||
)
|
||||
sources: List[SemanticSearchResult] = Field(
|
||||
default_factory=list,
|
||||
description="Source documents with excerpts and relevance scores",
|
||||
)
|
||||
total_found: int = Field(..., description="Total matching documents")
|
||||
search_method: str = Field(
|
||||
default="semantic_sampling", description="Search method used"
|
||||
)
|
||||
model_used: Optional[str] = Field(
|
||||
default=None, description="Model that generated the answer"
|
||||
)
|
||||
stop_reason: Optional[str] = Field(
|
||||
default=None, description="Reason generation stopped"
|
||||
)
|
||||
|
||||
Reference in New Issue
Block a user