docs: Emphasize server-side processing in ADR-012 viz pane

Updates ADR-012 to clarify that all search and filtering operations must happen server-side, not in the browser. Key changes: - Enhanced viz pane data flow showing server-side processing - Added performance benefits section (384x bandwidth reduction) - Detailed server-side filtering approach: * Query execution via search/algorithms.py * User ID filtering (multi-tenant security) * Document type filtering * PCA reduction (768-dim → 2D) on server * Only 2D coordinates + metadata sent to client - Updated Phase 3 implementation plan: * Remove ALL client-side search logic * Implement /app/vector-viz server endpoint * htmx form submission for queries * Performance optimizations (caching, streaming) This ensures: - Minimal bandwidth usage (only 2 floats per doc vs 768) - Client handles only visualization, not computation - Can visualize 10,000+ documents without client lag - Raw vectors never leave server (security) - Same search logic as MCP tool (consistency) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 00:02:54 +01:00
parent 5e67277049
commit 56bd85c0f7
1 changed files with 59 additions and 17 deletions
@@ -131,17 +131,34 @@ We will implement a **unified multi-algorithm search architecture** with the fol
 7. Return ranked SearchResponse to client
 ```

-#### Viz Pane Request
+#### Viz Pane Request (Server-Side Processing)
 ```
 1. User navigates to /app (Vector Visualization tab)
 2. Browser loads vector-viz fragment via htmx
-3. User adjusts algorithm selector and weight sliders
-4. JavaScript calls same search/algorithms.py backend
-5. PCA reduces vectors to 2D for visualization
-6. Plotly.js renders interactive scatter plot
-7. Matching results highlighted, non-matches grayed out
+3. User enters query and adjusts algorithm/weights
+4. htmx sends request to /app/vector-viz endpoint
+5. Server executes search via search/algorithms.py:
+   - Filters by user_id (multi-tenant security)
+   - Applies selected algorithm (semantic/keyword/fuzzy/hybrid)
+   - Filters by document type (notes/files/calendar/contacts)
+   - Retrieves matching results + metadata
+6. Server performs PCA reduction (768-dim → 2D):
+   - Converts matching results to 2D coordinates
+   - Only sends coordinates + metadata (not full vectors)
+   - Dramatically reduces bandwidth (e.g., 768 floats → 2 floats per doc)
+7. Server returns JSON: {results: [...], coordinates_2d: [...], stats: {...}}
+8. Browser receives lightweight response
+9. Plotly.js renders interactive scatter plot
+10. Matching results highlighted (blue), non-matches grayed (40% opacity)
 ```

+**Performance Benefits of Server-Side Processing**:
+- **Bandwidth reduction**: ~384x less data (2 floats vs 768 floats per document)
+- **Client efficiency**: Browser only handles visualization, not computation
+- **Scalability**: Can visualize 10,000+ documents without client-side lag
+- **Security**: Raw vectors never leave server
+- **Consistency**: Same search logic as MCP tool (no drift)
+
 ### 1. Core Search Algorithms

 Four search algorithms will be available:
@@ -238,10 +255,19 @@ nextcloud_mcp_server/
 Update viz pane (`nextcloud_mcp_server/auth/userinfo_routes.py`) to:

 1. **Use shared algorithms**: Import from `search/algorithms.py`
-2. **Remove client-side filtering**: Call server-side search methods
-3. **User accessibility**: Available to all users with vector sync enabled
-4. **Security**: Filter results by `user_id` (only show user's own documents)
-5. **Interactive testing**: Allow users to:
+2. **Server-side filtering**: All search and filtering operations happen server-side
+   - Query execution via shared search backend
+   - Document type filtering (notes, files, calendar, contacts)
+   - User ID filtering for multi-tenant security
+   - Only matching results + metadata sent to client
+   - Reduces bandwidth and improves performance
+3. **PCA reduction**: Server performs dimensionality reduction (768-dim → 2D)
+   - Only 2D coordinates sent to browser for visualization
+   - Dramatically reduces data transfer vs sending full vectors
+   - Enables visualization of large document collections
+4. **User accessibility**: Available to all users with vector sync enabled
+5. **Security**: Filter results by `user_id` (only show user's own documents)
+6. **Interactive testing**: Allow users to:
   - Select algorithm type
   - Adjust weights (hybrid mode)
   - Compare results across algorithms
@@ -403,13 +429,29 @@ def reciprocal_rank_fusion(

 ### Phase 3: Update Viz Pane (Week 2)

-1. Remove client-side search filtering
-2. Call shared `search/algorithms.py` methods
-3. Add user_id filtering for multi-user security
-4. Add algorithm selector dropdown
-5. Add weight adjustment controls (sliders)
-6. Update visualization to show algorithm-specific metadata
-7. Add side-by-side comparison mode
+**Critical: All processing must happen server-side**
+
+1. **Remove client-side search filtering**
+   - Delete JavaScript-based keyword/fuzzy matching
+   - Remove client-side document type filtering
+   - No search logic in browser
+2. **Implement server-side endpoint** (`/app/vector-viz`)
+   - Accept query, algorithm, weights, doc_type filters
+   - Execute search via `search/algorithms.py`
+   - Filter results by user_id (security)
+   - Perform PCA reduction (768-dim → 2D)
+   - Return JSON with 2D coordinates + metadata only
+3. **Update frontend**
+   - htmx form submission to `/app/vector-viz`
+   - Algorithm selector dropdown
+   - Weight adjustment sliders (htmx updates on change)
+   - Document type checkboxes
+   - Plotly.js visualization of server response
+4. **Performance optimization**
+   - Limit results to user's documents only
+   - Cache PCA transformation (invalidate on new vectors)
+   - Stream large result sets if needed
+   - Add loading indicators for server processing

 ### Phase 4: Documentation and Testing (Week 2-3)