11e620f2d1
Creates shared search module with four algorithms implementing ADR-012:
- Semantic search (vector similarity via Qdrant)
- Keyword search (token-based matching from ADR-001)
- Fuzzy search (character overlap matching)
- Hybrid search (RRF fusion from ADR-003)

Architecture:
- Base SearchAlgorithm interface for consistent API
- SearchResult dataclass for unified result format
- All algorithms async and independently testable
- Proper logging and error handling throughout

Semantic Search (search/semantic.py):
- Extracted from server/semantic.py
- Vector similarity using Qdrant query_points
- Dual-phase authorization (vector filter + API verification)
- Deduplication of document chunks
- Configurable score threshold (default: 0.7)

Keyword Search (search/keyword.py):
- Implements ADR-001 token-based matching
- Title matches weighted 3x higher than content
- Case-insensitive token matching
- Relevance scoring with normalization
- Excerpt extraction with context

Fuzzy Search (search/fuzzy.py):
- Simple character overlap calculation
- Configurable threshold (default: 70%)
- Typo-tolerant matching
- Fast and dependency-free

Hybrid Search (search/hybrid.py):
- Reciprocal Rank Fusion (RRF) from ADR-003
- Parallel execution of sub-algorithms
- Configurable weights per algorithm
- RRF constant k=60 (standard value)
- Weight validation (must sum ≤1.0)

All algorithms:
- Share NextcloudClient for document access
- Support user_id filtering (multi-tenant)
- Support doc_type filtering (currently notes only)
- Return consistent SearchResult objects
- Properly formatted with ruff and type-checked

Next steps: Update MCP tool to use these algorithms

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
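The hybrid search notes above mention Reciprocal Rank Fusion with k=60 and per-algorithm weights that must sum to at most 1.0. The contents of search/hybrid.py are not shown here, so the following is only a minimal sketch of weighted RRF under assumptions: the function name `rrf_fuse`, the use of weights as plain multipliers on each 1/(k + rank) term, and the sample rankings are all hypothetical.

```python
from collections import defaultdict

RRF_K = 60  # constant named in the commit message


def rrf_fuse(
    rankings: dict[str, list[int]],
    weights: dict[str, float],
    k: int = RRF_K,
) -> list[tuple[int, float]]:
    """Fuse ranked document-ID lists with weighted Reciprocal Rank Fusion.

    Hypothetical helper: each algorithm contributes weight / (k + rank)
    for every document it returned, where rank starts at 1.
    """
    # Mirrors the "must sum <= 1.0" rule; the small tolerance absorbs float error.
    if sum(weights.values()) > 1.0 + 1e-9:
        raise ValueError("weights must sum to at most 1.0")

    fused: dict[int, float] = defaultdict(float)
    for algorithm, doc_ids in rankings.items():
        weight = weights.get(algorithm, 0.0)
        for rank, doc_id in enumerate(doc_ids, start=1):
            fused[doc_id] += weight / (k + rank)

    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up rankings (document IDs, best first) from three sub-algorithms.
rankings = {"semantic": [3, 1, 2], "keyword": [1, 3], "fuzzy": [2, 1]}
weights = {"semantic": 0.5, "keyword": 0.3, "fuzzy": 0.2}
fused = rrf_fuse(rankings, weights)
```

With these sample inputs, document 1 ranks first because all three lists contain it, even though only the keyword list places it at the top. Since the weights sum to at most 1.0, every fused score stays below 1/(k + 1), which keeps it inside the 0.0-1.0 range that SearchResult enforces.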
88 lines
2.4 KiB
Python
"""Base interfaces and data structures for search algorithms."""
|
|
|
|
from abc import ABC, abstractmethod
|
|
from dataclasses import dataclass
|
|
from typing import Any
|
|
|
|
|
|
@dataclass
|
|
class SearchResult:
|
|
"""A single search result with metadata and score.
|
|
|
|
Attributes:
|
|
id: Document ID
|
|
doc_type: Document type (note, file, calendar, contact, etc.)
|
|
title: Document title
|
|
excerpt: Content excerpt showing match context
|
|
score: Relevance score (0.0-1.0, higher is better)
|
|
metadata: Additional algorithm-specific metadata
|
|
"""
|
|
|
|
id: int
|
|
doc_type: str
|
|
title: str
|
|
excerpt: str
|
|
score: float
|
|
metadata: dict[str, Any] | None = None
|
|
|
|
def __post_init__(self):
|
|
"""Validate score is in valid range."""
|
|
if not 0.0 <= self.score <= 1.0:
|
|
raise ValueError(f"Score must be between 0.0 and 1.0, got {self.score}")
|
|
|
|
|
|
class SearchAlgorithm(ABC):
|
|
"""Abstract base class for search algorithms.
|
|
|
|
All search algorithms must implement the search() method with consistent
|
|
interface, allowing them to be used interchangeably.
|
|
"""
|
|
|
|
@abstractmethod
|
|
async def search(
|
|
self,
|
|
query: str,
|
|
user_id: str,
|
|
limit: int = 10,
|
|
doc_type: str | None = None,
|
|
**kwargs: Any,
|
|
) -> list[SearchResult]:
|
|
"""Execute search with the given parameters.
|
|
|
|
Args:
|
|
query: Search query string
|
|
user_id: User ID for multi-tenant filtering
|
|
limit: Maximum number of results to return
|
|
doc_type: Optional document type filter (note, file, calendar, etc.)
|
|
**kwargs: Algorithm-specific parameters
|
|
|
|
Returns:
|
|
List of SearchResult objects ranked by relevance
|
|
|
|
Raises:
|
|
McpError: If search fails or configuration is invalid
|
|
"""
|
|
pass
|
|
|
|
@property
|
|
@abstractmethod
|
|
def name(self) -> str:
|
|
"""Return algorithm name for identification."""
|
|
pass
|
|
|
|
@property
|
|
def supports_scoring(self) -> bool:
|
|
"""Whether this algorithm provides meaningful relevance scores.
|
|
|
|
Default: True. Override if algorithm doesn't support scoring.
|
|
"""
|
|
return True
|
|
|
|
@property
|
|
def requires_vector_db(self) -> bool:
|
|
"""Whether this algorithm requires vector database.
|
|
|
|
Default: False. Override for semantic search.
|
|
"""
|
|
return False
|