nextcloud-mcp-server/nextcloud_mcp_server/search/algorithms.py
Chris Coutinho 11e620f2d1 feat: Implement unified search algorithm module
Creates shared search module with four algorithms implementing ADR-012:
- Semantic search (vector similarity via Qdrant)
- Keyword search (token-based matching from ADR-001)
- Fuzzy search (character overlap matching)
- Hybrid search (RRF fusion from ADR-003)

Architecture:
- Base SearchAlgorithm interface for consistent API
- SearchResult dataclass for unified result format
- All algorithms async and independently testable
- Proper logging and error handling throughout

Semantic Search (search/semantic.py):
- Extracted from server/semantic.py
- Vector similarity using Qdrant query_points
- Dual-phase authorization (vector filter + API verification)
- Deduplication of document chunks
- Configurable score threshold (default: 0.7)
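
The chunk deduplication and score threshold listed above could look roughly like the sketch below. It assumes each vector hit has already been reduced to a `(doc_id, score)` pair; the function name and the hit format are hypothetical and are not taken from search/semantic.py.

```python
def dedupe_chunks(
    hits: list[tuple[int, float]], score_threshold: float = 0.7
) -> list[tuple[int, float]]:
    """Keep the best-scoring chunk per document, dropping hits below the threshold."""
    best: dict[int, float] = {}
    for doc_id, score in hits:
        if score < score_threshold:
            continue  # below the configurable threshold (default 0.7)
        if score > best.get(doc_id, float("-inf")):
            best[doc_id] = score  # remember only the strongest chunk of each document
    # Highest score first, so callers can truncate to their result limit
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

Several chunks of one note then collapse into a single entry that carries the score of its best chunk.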

Keyword Search (search/keyword.py):
- Implements ADR-001 token-based matching
- Title matches weighted 3x higher than content
- Case-insensitive token matching
- Relevance scoring with normalization
- Excerpt extraction with context
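
As an illustration of the weighting described above, here is a minimal scoring sketch. The 3x title weight and case-insensitive token matching come from the list above; the tokenizer, the function names, and the normalization formula (dividing by the best possible score) are assumptions, not the contents of search/keyword.py.

```python
import re

TITLE_WEIGHT = 3.0    # title matches weighted 3x higher than content
CONTENT_WEIGHT = 1.0


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_score(query: str, title: str, content: str) -> float:
    """Return a relevance score in [0.0, 1.0] for one document."""
    query_tokens = tokenize(query)
    if not query_tokens:
        return 0.0
    title_tokens = tokenize(title)
    content_tokens = tokenize(content)
    raw = sum(
        TITLE_WEIGHT * (token in title_tokens)
        + CONTENT_WEIGHT * (token in content_tokens)
        for token in query_tokens
    )
    # Assumed normalization: every query token found in both title and content scores 1.0
    best_possible = (TITLE_WEIGHT + CONTENT_WEIGHT) * len(query_tokens)
    return raw / best_possible
```

Under this assumed normalization, a query token found only in the title contributes 0.75 of its maximum and one found only in the content contributes 0.25.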

Fuzzy Search (search/fuzzy.py):
- Simple character overlap calculation
- Configurable threshold (default: 70%)
- Typo-tolerant matching
- Fast and dependency-free
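
One way to read "character overlap" is as a multiset intersection of characters, sketched below with only the standard library. The exact formula in search/fuzzy.py is not shown here, so the function names and the choice to divide by the query length are assumptions.

```python
from collections import Counter


def char_overlap(query: str, text: str) -> float:
    """Fraction of the query's characters that also occur in the text (order ignored)."""
    query_chars = Counter(query.lower())
    text_chars = Counter(text.lower())
    if not query_chars:
        return 0.0
    shared = sum((query_chars & text_chars).values())  # multiset intersection
    return shared / sum(query_chars.values())


def fuzzy_match(query: str, text: str, threshold: float = 0.7) -> bool:
    """True when the overlap reaches the configurable threshold (default 70%)."""
    return char_overlap(query, text) >= threshold
```

Because character order is ignored, a transposition typo such as "meetign" still matches "meeting", at the cost of also matching anagrams.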

Hybrid Search (search/hybrid.py):
- Reciprocal Rank Fusion (RRF) from ADR-003
- Parallel execution of sub-algorithms
- Configurable weights per algorithm
- RRF constant k=60 (standard value)
- Weight validation (must sum ≤1.0)

All algorithms:
- Share NextcloudClient for document access
- Support user_id filtering (multi-tenant)
- Support doc_type filtering (currently notes only)
- Return consistent SearchResult objects
- Properly formatted with ruff and type-checked

Next steps: Update MCP tool to use these algorithms

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 00:10:19 +01:00

88 lines
2.4 KiB
Python

"""Base interfaces and data structures for search algorithms."""

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class SearchResult:
    """A single search result with metadata and score.

    Attributes:
        id: Document ID
        doc_type: Document type (note, file, calendar, contact, etc.)
        title: Document title
        excerpt: Content excerpt showing match context
        score: Relevance score (0.0-1.0, higher is better)
        metadata: Additional algorithm-specific metadata
    """

    id: int
    doc_type: str
    title: str
    excerpt: str
    score: float
    metadata: dict[str, Any] | None = None

    def __post_init__(self):
        """Validate score is in valid range."""
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"Score must be between 0.0 and 1.0, got {self.score}")


class SearchAlgorithm(ABC):
    """Abstract base class for search algorithms.

    All search algorithms must implement the search() method with consistent
    interface, allowing them to be used interchangeably.
    """

    @abstractmethod
    async def search(
        self,
        query: str,
        user_id: str,
        limit: int = 10,
        doc_type: str | None = None,
        **kwargs: Any,
    ) -> list[SearchResult]:
        """Execute search with the given parameters.

        Args:
            query: Search query string
            user_id: User ID for multi-tenant filtering
            limit: Maximum number of results to return
            doc_type: Optional document type filter (note, file, calendar, etc.)
            **kwargs: Algorithm-specific parameters

        Returns:
            List of SearchResult objects ranked by relevance

        Raises:
            McpError: If search fails or configuration is invalid
        """
        pass

    @property
    @abstractmethod
    def name(self) -> str:
        """Return algorithm name for identification."""
        pass

    @property
    def supports_scoring(self) -> bool:
        """Whether this algorithm provides meaningful relevance scores.

        Default: True. Override if algorithm doesn't support scoring.
        """
        return True

    @property
    def requires_vector_db(self) -> bool:
        """Whether this algorithm requires vector database.

        Default: False. Override for semantic search.
        """
        return False