Chris Coutinho
|
11e620f2d1
|
feat: Implement unified search algorithm module
Creates shared search module with four algorithms implementing ADR-012:
- Semantic search (vector similarity via Qdrant)
- Keyword search (token-based matching from ADR-001)
- Fuzzy search (character overlap matching)
- Hybrid search (RRF fusion from ADR-003)
Architecture:
- Base SearchAlgorithm interface for consistent API
- SearchResult dataclass for unified result format
- All algorithms async and independently testable
- Proper logging and error handling throughout
Semantic Search (search/semantic.py):
- Extracted from server/semantic.py
- Vector similarity using Qdrant query_points
- Dual-phase authorization (vector filter + API verification)
- Deduplication of document chunks
- Configurable score threshold (default: 0.7)
Keyword Search (search/keyword.py):
- Implements ADR-001 token-based matching
- Title matches weighted 3x higher than content
- Case-insensitive token matching
- Relevance scoring with normalization
- Excerpt extraction with context
Fuzzy Search (search/fuzzy.py):
- Simple character overlap calculation
- Configurable threshold (default: 70%)
- Typo-tolerant matching
- Fast and dependency-free
Hybrid Search (search/hybrid.py):
- Reciprocal Rank Fusion (RRF) from ADR-003
- Parallel execution of sub-algorithms
- Configurable weights per algorithm
- RRF constant k=60 (standard value)
- Weight validation (must sum ≤1.0)
All algorithms:
- Share NextcloudClient for document access
- Support user_id filtering (multi-tenant)
- Support doc_type filtering (currently notes only)
- Return consistent SearchResult objects
- Properly formatted with ruff and type-checked
Next steps: Update MCP tool to use these algorithms
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
|
2025-11-15 00:10:19 +01:00 |
|