diff --git a/docs/ADR-012-unified-multi-algorithm-search.md b/docs/ADR-012-unified-multi-algorithm-search.md
index a788a6a..1fc3738 100644
--- a/docs/ADR-012-unified-multi-algorithm-search.md
+++ b/docs/ADR-012-unified-multi-algorithm-search.md
@@ -43,6 +43,105 @@ Additionally, users need a **testing interface** (viz pane) to:
 
 We will implement a **unified multi-algorithm search architecture** with the following components:
 
+### Architecture Diagram
+
+```
+┌──────────────────────────────────────────────────────────────────────────────┐
+│  MCP Client / User Browser                                                   │
+│                                                                              │
+│  ┌──────────────────────────┐    ┌────────────────────────────────────┐      │
+│  │ MCP Tool Call            │    │ Viz Pane (Browser UI)              │      │
+│  │                          │    │                                    │      │
+│  │ nc_semantic_search(      │    │ - Algorithm selector dropdown      │      │
+│  │   query="kubernetes",    │    │ - Weight adjustment sliders        │      │
+│  │   algorithm="hybrid",    │    │ - Interactive 2D scatter plot      │      │
+│  │   semantic_weight=0.5,   │    │ - Side-by-side comparison          │      │
+│  │   keyword_weight=0.3,    │    │ - Real-time search testing         │      │
+│  │   fuzzy_weight=0.2       │    │                                    │      │
+│  │ )                        │    │                                    │      │
+│  └────────────┬─────────────┘    └─────────────────┬──────────────────┘      │
+└───────────────┼────────────────────────────────────┼─────────────────────────┘
+                │                                    │
+                │ MCP Protocol                       │ HTTPS (htmx)
+                │                                    │
+┌───────────────▼────────────────────────────────────▼─────────────────────────┐
+│  MCP Server (/app endpoint)                                                  │
+│                                                                              │
+│ ┌──────────────────────────────────────────────────────────────────────────┐ │
+│ │  Unified Search Interface (server/semantic.py)                           │ │
+│ │                                                                          │ │
+│ │  @mcp.tool() nc_semantic_search(algorithm, weights...)                   │ │
+│ │    ├─ Validate parameters (weights sum ≤1.0)                             │ │
+│ │    ├─ Dispatch to algorithm selector                                     │ │
+│ │    └─ Return ranked SearchResponse                                       │ │
+│ └────────────────────────────┬─────────────────────────────────────────────┘ │
+│                              │                                               │
+│ ┌────────────────────────────▼─────────────────────────────────────────────┐ │
+│ │  Algorithm Dispatcher (search/algorithms.py)                             │ │
+│ │                                                                          │ │
+│ │    if algorithm == "semantic":  → semantic.py                            │ │
+│ │    if algorithm == "keyword":   → keyword.py                             │ │
+│ │    if algorithm == "fuzzy":     → fuzzy.py                               │ │
+│ │    if algorithm == "hybrid":    → hybrid.py (RRF fusion)                 │ │
+│ └──────────────────────────────────────────────────────────────────────────┘ │
+│                                                                              │
+│  ┌──────────────────┐      ┌──────────────────┐      ┌──────────────────┐    │
+│  │ semantic.py      │      │ keyword.py       │      │ fuzzy.py         │    │
+│  │                  │      │                  │      │                  │    │
+│  │ • Query Qdrant   │      │ • Token matching │      │ • Char overlap   │    │
+│  │ • Cosine dist    │      │ • Title weight   │      │ • 70% threshold  │    │
+│  │ • Score ≥0.7     │      │ • ADR-001 logic  │      │ • Simple impl    │    │
+│  └────────┬─────────┘      └────────┬─────────┘      └────────┬─────────┘    │
+│           │                         │                         │              │
+│           └─────────────────────────┼─────────────────────────┘              │
+│                                     │                                        │
+│ ┌───────────────────────────────────▼──────────────────────────────────────┐ │
+│ │  hybrid.py (Reciprocal Rank Fusion)                                      │ │
+│ │                                                                          │ │
+│ │  1. Run algorithms in parallel (semantic, keyword, fuzzy)                │ │
+│ │  2. Collect ranked results from each                                     │ │
+│ │  3. Apply RRF formula: score = weight / (k + rank)                       │ │
+│ │  4. Combine scores across algorithms                                     │ │
+│ │  5. Re-rank by combined score                                            │ │
+│ └──────────────────────────────────────────────────────────────────────────┘ │
+└──────────────────────────────────────┬───────────────────────────────────────┘
+                                       │
+                   ┌───────────────────┴────────────────┐
+                   │                                    │
+        ┌──────────▼───────────┐            ┌───────────▼────────────┐
+        │ Qdrant Vector DB     │            │ Nextcloud APIs         │
+        │                      │            │                        │
+        │ • Vector search      │            │ • Access verification  │
+        │ • user_id filter     │            │ • Full metadata fetch  │
+        │ • Score threshold    │            │ • Permission checks    │
+        │ • 768-dim embeddings │            │                        │
+        └──────────────────────┘            └────────────────────────┘
+```
+
+### Data Flow
+
+#### MCP Tool Request
+```
+1. Client calls nc_semantic_search(query, algorithm="hybrid", weights...)
+2. Server validates parameters (weights sum ≤1.0)
+3. Dispatcher routes to hybrid.py
+4. Hybrid search runs semantic, keyword, fuzzy in parallel
+5. RRF combines results with weighted scores
+6. Access verification via Nextcloud API
+7. Return ranked SearchResponse to client
+```
+
+#### Viz Pane Request
+```
+1. User navigates to /app (Vector Visualization tab)
+2. Browser loads vector-viz fragment via htmx
+3. User adjusts algorithm selector and weight sliders
+4. Browser sends the query to the server, which runs the same search/algorithms.py backend
+5. PCA reduces vectors to 2D for visualization
+6. Plotly.js renders interactive scatter plot
+7. Matching results highlighted, non-matches grayed out
+```
+
 ### 1. Core Search Algorithms
 
 Four search algorithms will be available:
@@ -148,6 +247,98 @@ Update viz pane (`nextcloud_mcp_server/auth/userinfo_routes.py`) to:
 - Compare results across algorithms
 - Visualize result distribution in 2D space
 
+#### Viz Pane UI Components
+
+```
+┌────────────────────────────────────────────────────────────────────────────┐
+│  Vector Visualization                                             [Status] │
+├────────────────────────────────────────────────────────────────────────────┤
+│                                                                            │
+│ ┌────────────────────────────────────────────────────────────────────────┐ │
+│ │  Search Configuration                                                  │ │
+│ │                                                                        │ │
+│ │  Query: [_____________________________________________] [Search]       │ │
+│ │                                                                        │ │
+│ │  Algorithm: [Hybrid ▼]  [Semantic] [Keyword] [Fuzzy]                   │ │
+│ │                                                                        │ │
+│ │  Weights (Hybrid Mode):                                                │ │
+│ │    Semantic: [========50========] 0.5                                  │ │
+│ │    Keyword:  [======30======    ] 0.3                                  │ │
+│ │    Fuzzy:    [====20====        ] 0.2                                  │ │
+│ │                                                                        │ │
+│ │  Document Types: ☑ Notes  ☑ Files  ☑ Calendar  ☑ Contacts              │ │
+│ └────────────────────────────────────────────────────────────────────────┘ │
+│                                                                            │
+│ ┌────────────────────────────────────────────────────────────────────────┐ │
+│ │  Vector Space Visualization (PCA 2D Projection)                        │ │
+│ │                                                                        │ │
+│ │        ▲                                                               │ │
+│ │   PC2  │  ●    ●      ●        🔵 Matching results (full opacity)      │ │
+│ │        │      ●   ●       ●    ⚪ Non-matching results (40% opacity)   │ │
+│ │        │   🔵     ●    ●                                               │ │
+│ │        │  ●     🔵    ●        Hover: Show document title + excerpt    │ │
+│ │        │     ●  ●   🔵  ●      Click: Open document in Nextcloud       │ │
+│ │    ────┼──●──🔵───●──●────────► PC1                                    │ │
+│ │        │    ●    ●      ●                                              │ │
+│ │        │  🔵    ●     ●        Explained Variance:                     │ │
+│ │        │      ●   ●    ●       PC1: 23.4% | PC2: 18.7%                 │ │
+│ │        │    ●       ●                                                  │ │
+│ │        │                                                               │ │
+│ └────────────────────────────────────────────────────────────────────────┘ │
+│                                                                            │
+│ ┌────────────────────────────────────────────────────────────────────────┐ │
+│ │  Search Results (12 matching documents)                                │ │
+│ │                                                                        │ │
+│ │  🔵 Kubernetes Setup Guide                                Score: 0.87  │ │
+│ │     "...configure kubectl to connect to cluster..."                    │ │
+│ │     [Open in Nextcloud]                                                │ │
+│ │                                                                        │ │
+│ │  🔵 Container Orchestration Notes                         Score: 0.82  │ │
+│ │     "...deployment strategies for kubernetes..."                       │ │
+│ │     [Open in Nextcloud]                                                │ │
+│ │                                                                        │ │
+│ │  🔵 K8s Troubleshooting                                   Score: 0.79  │ │
+│ │     "...common kubernetes errors and solutions..."                     │ │
+│ │     [Open in Nextcloud]                                                │ │
+│ │                                                                        │ │
+│ │  [Show More Results...]                                                │ │
+│ └────────────────────────────────────────────────────────────────────────┘ │
+│                                                                            │
+│ ┌────────────────────────────────────────────────────────────────────────┐ │
+│ │  Algorithm Performance Comparison                                      │ │
+│ │                                                                        │ │
+│ │  Algorithm    │ Results │ Avg Score │ Time (ms) │ Precision            │ │
+│ │  ─────────────┼─────────┼───────────┼───────────┼────────────          │ │
+│ │  Semantic     │      45 │      0.78 │       145 │ ████░ 0.82           │ │
+│ │  Keyword      │      23 │      0.91 │        42 │ ███░░ 0.67           │ │
+│ │  Fuzzy        │      67 │      0.72 │        89 │ ██░░░ 0.45           │ │
+│ │  Hybrid (RRF) │      52 │      0.84 │       198 │ █████ 0.89           │ │
+│ └────────────────────────────────────────────────────────────────────────┘ │
+└────────────────────────────────────────────────────────────────────────────┘
+```
+
+**Key UI Features**:
+
+1. **Search Input**: Real-time query testing with instant visualization
+2. **Algorithm Selector**: Dropdown + quick-select buttons
+3. **Weight Sliders**: Visual adjustment with live preview (hybrid mode only)
+4. **Document Type Filters**: Checkboxes for notes, files, calendar, contacts
+5. **2D Scatter Plot**: Interactive Plotly.js visualization
+   - Blue dots = matching documents (full opacity)
+   - Gray dots = non-matching documents (40% opacity)
+   - Hover = show title + excerpt tooltip
+   - Click = open document in Nextcloud
+   - Zoom/pan controls for exploration
+6. **Results Panel**: Ranked list with scores and excerpts
+7. **Performance Table**: Compare algorithm speed and precision
+8. **Explained Variance**: Show how much of the original variance the two principal components preserve
+
+**Technology Stack**:
+- **Frontend**: htmx for dynamic loading, Alpine.js for reactivity
+- **Visualization**: Plotly.js for interactive scatter plots
+- **Styling**: Tailwind CSS (consistent with existing /app UI)
+- **Backend**: Shared `search/algorithms.py` implementation
+
 ### 5. Reciprocal Rank Fusion (RRF) for Hybrid Search
 
 Following ADR-003's design:
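The hybrid step in the `hybrid.py` box of the architecture diagram (score = weight / (k + rank), combined across algorithms, then re-ranked) can be sketched in Python as follows. This is an illustrative sketch only and is not the contents of `search/hybrid.py`. The function names `validate_weights` and `weighted_rrf` are hypothetical. Each algorithm is assumed to return document IDs ordered best-first. The default `k = 60` is also an assumption: it is the value commonly used for RRF, and ADR-003 may specify a different constant.

```python
from collections import defaultdict


def validate_weights(weights: dict[str, float]) -> None:
    """Hypothetical helper: reject negative weights or weights summing above 1.0."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    # Small tolerance so that e.g. 0.5 + 0.3 + 0.2 is not rejected by float rounding.
    if total > 1.0 + 1e-9:
        raise ValueError(f"weights sum to {total:.3f}, expected <= 1.0")


def weighted_rrf(
    rankings: dict[str, list[str]],
    weights: dict[str, float],
    k: int = 60,  # assumed RRF constant; ADR-003 may choose another value
) -> list[tuple[str, float]]:
    """Hypothetical weighted Reciprocal Rank Fusion.

    `rankings` maps an algorithm name ("semantic", "keyword", "fuzzy") to
    document IDs ordered best-first. Each document receives
    weight / (k + rank) from every algorithm that returned it (rank starts
    at 1), and the contributions are summed.
    """
    validate_weights(weights)
    scores: dict[str, float] = defaultdict(float)
    for algorithm, doc_ids in rankings.items():
        weight = weights.get(algorithm, 0.0)  # algorithms without a weight contribute nothing
        for rank, doc_id in enumerate(doc_ids, start=1):
            scores[doc_id] += weight / (k + rank)
    # Re-rank by combined score (highest first); ties broken by ID for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    rankings = {
        "semantic": ["k8s-setup", "orchestration-notes", "k8s-troubleshooting"],
        "keyword": ["k8s-troubleshooting", "k8s-setup"],
        "fuzzy": ["orchestration-notes"],
    }
    weights = {"semantic": 0.5, "keyword": 0.3, "fuzzy": 0.2}
    for doc_id, score in weighted_rrf(rankings, weights):
        print(f"{doc_id}: {score:.5f}")
```

RRF uses only ranks. As a result, Qdrant's cosine scores and the keyword and fuzzy match scores never need to be normalized onto a common scale. That property is what makes RRF a reasonable fit for fusing heterogeneous scorers like these.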