fix: Preserve 3D plot camera and improve documentation

This commit addresses PR feedback and fixes plot camera behavior.

## JavaScript Fix - Camera Preservation
- Changed plot update strategy from recreating layout to using Plotly.restyle()
- Query point visibility now toggles via restyle() which only modifies trace visibility
- Camera position/zoom naturally preserved since layout remains untouched
- Resolves jumpy plot behavior when toggling "Show Query Point" checkbox

Related: nextcloud_mcp_server/auth/static/vector-viz.js:58-73
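
The update strategy described above can be sketched as a small decision function. This is an illustrative sketch only, not the code in the commit: `planPlotUpdate` is a hypothetical helper name, and it assumes (as the diff does) that Plotly attaches the rendered traces to the plot element as `data` and that the query point is trace index 1.

```javascript
// Hypothetical helper: decide how to update the plot when the toggle changes.
// `plotDiv` is the element Plotly rendered into (or null if nothing exists yet).
function planPlotUpdate(plotDiv, showQueryPoint) {
  const hasBothTraces =
    Boolean(plotDiv) && Array.isArray(plotDiv.data) && plotDiv.data.length >= 2;

  if (hasBothTraces) {
    // Only the query trace's visibility changes; the layout (and camera) is untouched.
    return { action: 'restyle', update: { visible: showQueryPoint }, traces: [1] };
  }

  // No plot yet: a full render is needed.
  return { action: 'render' };
}

// In the browser, the plan would be applied roughly like this:
// const plan = planPlotUpdate(document.getElementById('viz-plot'), show);
// if (plan.action === 'restyle') {
//   Plotly.restyle('viz-plot', plan.update, plan.traces);
// }
```

Keeping the decision separate from the Plotly call makes the branch easy to check without a browser.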

## Documentation Improvements
- Condensed vector-sync-ui.md from 316 to 94 lines (~70% reduction)
- Removed redundant FAQ section (content merged into main sections)
- Simplified use cases from 4 detailed sections to 3 focused paragraphs
- Streamlined troubleshooting to 3 common issues
- Merged technical details into overview section
- Retained all essential information while improving readability

## Screenshot Updates
Removed old/outdated images (5 files):
- rag-workflow-bidirectional-final.png
- rag-workflow-prominent-llm.png
- rag-workflow-simple-final.png
- vector-viz-interface.png
- welcome-page.png

Replaced with current screenshots (3 files):
- vector-viz-document-types-2col.png - Now shows plot + results
- vector-viz-chunk-context.png - Centered content view
- vector-viz-results.png - Updated results list

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Chris Coutinho
2025-11-19 14:10:53 +01:00
parent 9bd02d7ef7
commit c126c3ec03
10 changed files with 66 additions and 283 deletions
Binary files not shown (8 images, sizes only):
- Before: 243 KiB
- Before: 373 KiB
- Before: 251 KiB
- Before: 298 KiB, After: 282 KiB
- Before: 106 KiB, After: 143 KiB
- Before: 84 KiB
- Before: 118 KiB, After: 244 KiB
- Before: 464 KiB

+42 -264
@@ -1,315 +1,93 @@
# Vector Sync UI Guide
This guide covers the browser-based user interface for the Nextcloud MCP Server's semantic search and vector synchronization features.
This guide covers the browser-based interface for the Nextcloud MCP Server's semantic search and vector synchronization features.
## Overview
The Vector Sync UI (`/app`) is a browser-based interface that allows you to interact with semantic search results from documents in your Nextcloud instance. This UI provides the same query capabilities that Large Language Models (LLMs) use in Retrieval-Augmented Generation (RAG) workflows, allowing you to test queries and visualize results in an intuitive way.
The Vector Sync UI (`/app`) provides an interactive interface to test semantic search queries and visualize results from your Nextcloud documents. It exposes the same retrieval capabilities that LLMs use in Retrieval-Augmented Generation (RAG) workflows, powered by Alpine.js for reactive state, htmx for dynamic updates, and Plotly.js for 3D visualization.
**Supported Apps**: Notes, Files (text/PDF), Calendar (events/tasks), Contacts (CardDAV), and Deck are indexed and searchable.
## Accessing the UI
Navigate to the `/app` endpoint of your MCP server after authentication:
- **BasicAuth mode**: `http://localhost:8000/app` (credentials from environment)
Navigate to `/app` after authentication:
- **BasicAuth mode**: `http://localhost:8000/app` (uses credentials from environment)
- **OAuth mode**: `http://localhost:8000/app` (redirects to login if not authenticated)
## Welcome Page
## Tabs
The welcome page is the default landing page when you access `/app`. It provides an introduction to the MCP server's capabilities and adapts its content based on whether vector sync is enabled.
### Welcome Page
![Welcome Page](../images/welcome-page.png)
Landing page that introduces semantic search and RAG workflows. Shows authentication status, explains how vector embeddings work, and provides feature navigation. Adapts content based on whether `VECTOR_SYNC_ENABLED=true`.
### When Vector Sync is Enabled
### User Info
The welcome page includes:
Displays authentication details and session information:
- **BasicAuth**: Username, mode badge, Nextcloud host
- **OAuth**: Username, session ID (truncated), background access status, IdP profile, revocation option
- **Authentication status** - Shows your username and authentication mode
- **About Semantic Search** - Explanation of semantic search capabilities and how it works
- **RAG Workflow Integration** - How the UI fits into RAG workflows and helps test LLM queries
- **Feature cards** - Quick navigation to User Info, Vector Sync Status, and Vector Visualization
### Vector Sync Status
### When Vector Sync is Disabled
Real-time monitoring of document indexing:
- **Indexed Documents**: Total chunks stored in Qdrant vector database (immediately searchable)
- **Pending Documents**: Queue awaiting embedding processing
- **Status**: "✓ Idle" (green) when up-to-date, "⟳ Syncing" (orange) during processing
If `VECTOR_SYNC_ENABLED=false`, the welcome page displays:
Auto-refreshes every 10 seconds via htmx. Check this tab after adding content to verify indexing completion.
- A warning message explaining that vector sync is disabled
- Link to configuration documentation for enabling the feature
- Limited navigation (User Info only)
### Vector Visualization
## User Info Tab
Interactive search interface with 3D PCA plot of semantic space.
Access user information and session details by navigating to `/app/user-info` or clicking "User Info" in the welcome page.
**Search Controls**:
- **Query**: Natural language search (e.g., "health benefits of coffee")
- **Algorithm**: Semantic (Dense) for pure vector search, or BM25 Hybrid (default) combining vectors + keywords
- **Fusion** (Hybrid only): RRF (Reciprocal Rank Fusion) or DBSF (Distribution-Based Score Fusion)
- **Advanced**: Filter by document type, adjust score threshold (0.0-1.0), set result limit (max 100)
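
To make the Fusion option more concrete, here is a minimal sketch of Reciprocal Rank Fusion. It assumes 1-based ranks and the commonly used constant `k = 60`. The function name and the sample document IDs are hypothetical, and the vector database's actual implementation may differ.

```javascript
// Hypothetical sketch of Reciprocal Rank Fusion (RRF).
// Each input ranking is an array of document IDs ordered best-first.
function reciprocalRankFusion(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      const rank = index + 1; // 1-based rank
      scores.set(docId, (scores.get(docId) || 0) + 1 / (k + rank));
    });
  }
  // Sort by fused score, highest first
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}

// Illustrative inputs: one ranking from dense vectors, one from BM25
const dense = ['note-1', 'file-7', 'card-3'];
const sparse = ['file-7', 'note-9', 'note-1'];
const fused = reciprocalRankFusion([dense, sparse]);
```

Documents that rank well in both lists rise to the top, which is why RRF works as a general-purpose default when the two score scales are not comparable.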
![User Info Tab](../images/user-info-tab.png)
**3D Visualization**:
### What's Displayed
The plot uses Principal Component Analysis (PCA) to reduce 768-dimensional embeddings to 3D. Documents are positioned by semantic similarity with the query point shown in red. Point size and opacity indicate relevance, and the Viridis color scale shows relative scores (yellow = highest match).
**BasicAuth Mode:**
- Username
- Authentication mode badge
- Nextcloud host connection URL
**Critical Fix**: Vectors are L2-normalized before PCA to match Qdrant's cosine distance, ensuring query points position accurately near similar documents. Without normalization, magnitude differences cause misleading spatial separation.
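
The effect of L2 normalization can be illustrated with a short sketch. The server-side code is not shown in this guide, so the helper names below are hypothetical and JavaScript is used only for illustration.

```javascript
// Hypothetical helper: scale a vector to unit L2 norm.
function l2Normalize(vector) {
  const norm = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
  // Leave zero vectors unchanged to avoid dividing by zero
  return norm === 0 ? vector.slice() : vector.map(v => v / norm);
}

// Dot product; for unit vectors this equals cosine similarity
function dot(a, b) {
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}

// Same direction, very different magnitude
const shortVec = l2Normalize([3, 4]);
const longVec = l2Normalize([30, 40]);
const similarity = dot(shortVec, longVec);
```

After normalization the two vectors coincide, so a projection such as PCA sees only direction, which is what cosine distance compares.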
**OAuth Mode:**
- Username
- Authentication mode badge
- Session ID (truncated for security)
- Background access status (granted or not granted)
- IdP profile information (if available)
- Option to revoke background access
**Results List**:
### Navigation
Each result shows document title (clickable link to Nextcloud), excerpt, raw score, relative percentage, and document type. Click "Show Chunk" to view the matched text segment with surrounding context (up to 500 characters before/after).
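
As a rough illustration of the chunk context window, the sketch below slices a document around a matched span. It assumes the match is identified by character offsets; the function name, parameters, and return shape are hypothetical and not taken from the server code.

```javascript
// Hypothetical sketch: extract a matched chunk plus surrounding context.
// `start` and `end` are character offsets of the match (end exclusive).
function chunkWithContext(text, start, end, windowSize = 500) {
  const from = Math.max(0, start - windowSize);
  const to = Math.min(text.length, end + windowSize);
  return {
    before: text.slice(from, start), // up to windowSize characters before
    match: text.slice(start, end),   // the matched chunk itself
    after: text.slice(end, to)       // up to windowSize characters after
  };
}

// Small window so the result is easy to read
const context = chunkWithContext('abcdefghij', 4, 6, 2);
```

Clamping the offsets to the document bounds keeps the helper safe for matches near the start or end of a document.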
The user info page includes a sidebar with tabs for:
- **Home** - Returns to the welcome page
- **User Info** - Current page
- **Vector Sync** - Real-time sync status (if vector sync enabled)
- **Vector Viz** - Interactive visualization (if vector sync enabled)
- **Webhooks** - Admin-only webhook management (if user is admin)
## Vector Sync Status Tab
Monitor real-time indexing progress and synchronization status.
![Vector Sync Status](../images/vector-sync-status.png)
### Metrics Displayed
| Metric | Description |
|--------|-------------|
| **Indexed Documents** | Total number of document chunks stored in Qdrant vector database |
| **Pending Documents** | Number of documents in the processing queue waiting to be embedded |
| **Status** | Current sync state: "✓ Idle" (green) or "⟳ Syncing" (orange) |
### Real-Time Updates
The status tab uses htmx to automatically refresh every 10 seconds, providing live updates without manual page refreshes.
### What the Metrics Mean
- **Indexed Documents**: These are document chunks that have been converted to 768-dimensional vector embeddings and stored in Qdrant. These documents are immediately searchable via semantic search.
- **Pending Documents**: Documents in the queue that are awaiting embedding processing. The processor workers will gradually process these documents based on available resources.
- **Idle Status**: No documents are currently being processed. The system is up-to-date.
- **Syncing Status**: Documents are actively being processed and indexed. This is normal after adding new content or on initial sync.
## Vector Visualization Tab
Interactive search interface with 2D visualization of results in semantic space.
![Vector Visualization Interface](../images/vector-viz-interface.png)
### Search Controls
**Search Query**
- Enter natural language queries to search your Nextcloud documents
- Examples: "health benefits of coffee vs tea", "python testing frameworks", "project deadlines"
**Algorithm Selection**
- **Semantic (Dense)**: Pure semantic search using vector similarity
- **BM25 Hybrid** (default): Combines semantic search with keyword matching using BM25 sparse vectors
**Fusion Method** (for BM25 Hybrid only)
- **RRF** (Reciprocal Rank Fusion): General-purpose fusion using reciprocal ranks
- **DBSF** (Distribution-Based Score Fusion): Distribution-based normalization for better score balancing
**Advanced Options**
- Document types filter (Notes, Files, Calendar, Contacts, Deck)
- Score threshold (0.0-1.0)
- Result limit (default: 50, max: 100)
### Search Results
![Vector Visualization Results](../images/vector-viz-results.png)
The visualization displays:
1. **2D PCA Plot** - Documents projected into 2D space using Principal Component Analysis
- Point size indicates relevance score (larger = more relevant)
- Point opacity correlates with score (more opaque = higher score)
- Color scale (Viridis) represents similarity (yellow = highest match)
- Hover over points to see document details
2. **Results List** - Searchable documents with:
- Document title (clickable link to Nextcloud app)
- Snippet preview of matched content
- Raw score and relative score percentage
- Document type (note, file, calendar, etc.)
- "Show Chunk" button to expand matched text
### Viewing Chunk Context
Click "Show Chunk" to view the matched text with surrounding context.
![Chunk Context View](../images/vector-viz-chunk-context.png)
The chunk context view displays:
- **Highlighted matched chunk** - The specific text segment that matched your query (highlighted in yellow)
- **Surrounding context** - Up to 500 characters before and after the match for better understanding
- **Full document link** - Click the title to open the document in the Nextcloud app
### Understanding the 2D Visualization
The PCA (Principal Component Analysis) plot reduces 768-dimensional vector embeddings to 2D for visualization:
- **Proximity** - Documents closer together in 2D space are semantically similar
- **Clusters** - Groups of related documents appear as clusters
- **Outliers** - Distant points represent documents with unique content
- **Query position** - Your search query is embedded and plotted alongside results
**Note**: PCA is a dimensionality reduction technique that preserves as much variance as possible, but some information is lost in the projection from 768D to 2D.
## Configuration Requirements
### Required Environment Variables
To enable vector sync features:
## Configuration
**Required**:
```bash
VECTOR_SYNC_ENABLED=true
```
### Optional Configuration
For browser-accessible links to Nextcloud apps (Notes, Files, etc.):
**Optional** (for browser-accessible links):
```bash
NEXTCLOUD_PUBLIC_ISSUER_URL=https://your-public-nextcloud-url.com
```
If not set, falls back to `NEXTCLOUD_HOST` from settings.
### Admin Access
The Webhooks tab is only visible to users with Nextcloud admin privileges. Admin status is checked via the Nextcloud Provisioning API.
**Admin Access**: The Webhooks tab is visible only to Nextcloud admins (verified via the Provisioning API).
## Use Cases
### 1. Monitoring Document Indexing
**Testing Search Queries**: Preview results before they reach LLMs in RAG workflows. Compare semantic vs. hybrid algorithms, verify relevance scores, and validate that correct documents are retrieved. Use chunk context to see exactly which text segments match and why unexpected documents appear.
Use the Vector Sync Status tab to:
- Verify documents are being indexed after creation/modification
- Check if the indexing queue is backing up (high pending count)
- Confirm the system is idle after bulk document imports
**Monitoring Indexing**: Track real-time progress after creating or modifying documents. Check if the queue is backing up (high pending count) or confirm the system is idle after bulk imports. Verify documents become searchable immediately after indexing completes.
### 2. Testing Search Queries
Use the Vector Visualization tab to:
- Test queries before they're used by LLMs in RAG workflows
- Compare semantic vs. hybrid search algorithms
- Verify that relevant documents are being retrieved
- Understand relevance scores and ranking
### 3. Debugging Search Results
Use chunk context to:
- See exactly which text segments match your query
- Verify that the matched content is relevant
- Identify why unexpected documents appear in results
- Understand the surrounding context of matches
### 4. Algorithm Comparison
Experiment with different search approaches:
- **Pure semantic**: Best for conceptual queries and synonyms
- **BM25 hybrid with RRF**: Balanced approach combining keywords and semantics
- **BM25 hybrid with DBSF**: Alternative fusion for different score distributions
## Technical Details
### Frontend Stack
- **Alpine.js** - Reactive state management for UI interactions
- **htmx** - Server-driven dynamic updates for status polling
- **Plotly.js** - Interactive 2D scatter plot visualization
- **Nextcloud design system** - Consistent styling matching Nextcloud ecosystem
### Backend Processing
- **Server-side PCA** - Dimensionality reduction performed on the server to minimize bandwidth
- **Chunk-level search** - Searches operate on document chunks (not whole documents)
- **Document deduplication** - Multiple chunks from the same document are deduplicated in results
- **Timing metrics** - All search operations log performance metrics for monitoring
### Supported Apps
Documents from the following Nextcloud apps are indexed and searchable:
- **Notes** - All notes and their content
- **Files** - Supported file types (text, PDF, etc.)
- **Calendar** - Calendar events and tasks (VTODO)
- **Contacts** - Contact information (CardDAV)
- **Deck** - Deck cards and board content
**Algorithm Comparison**: Pure semantic search excels at conceptual queries and synonyms. BM25 hybrid combines semantic understanding with precise keyword matching for better accuracy on specific terms. Experiment with RRF vs. DBSF fusion for different score distributions.
## Troubleshooting
### Vector Sync Tab Not Visible
**Vector Sync Tab Not Visible**: Set `VECTOR_SYNC_ENABLED=true` and restart the server.
**Cause**: `VECTOR_SYNC_ENABLED` is not set to `true`
**No Search Results**: Check Vector Sync Status to confirm documents are indexed (not just pending). Try broader queries or lower the score threshold in Advanced options. Initial indexing may take time depending on document volume.
**Solution**: Set the environment variable and restart the MCP server:
```bash
export VECTOR_SYNC_ENABLED=true
docker compose restart mcp
```
### No Search Results
**Possible causes**:
1. No documents have been indexed yet (check Vector Sync Status)
2. Query doesn't match indexed content
3. Score threshold is too high
**Solutions**:
- Wait for documents to be indexed (check "Indexed Documents" count)
- Try broader or different queries
- Lower the score threshold in Advanced options
### Chunk Context Not Loading
**Cause**: Network error or document no longer exists
**Solution**: Check browser console for errors and verify the document still exists in Nextcloud
### Links to Nextcloud Apps Not Working
**Cause**: `NEXTCLOUD_PUBLIC_ISSUER_URL` not configured or incorrect
**Solution**: Set the public URL for browser-accessible links:
```bash
export NEXTCLOUD_PUBLIC_ISSUER_URL=https://your-public-nextcloud-url.com
```
**Links to Nextcloud Apps Not Working**: Set `NEXTCLOUD_PUBLIC_ISSUER_URL` to your browser-accessible Nextcloud URL for correct link generation.
## Related Documentation
- [Configuration Guide](../configuration.md) - Environment variables and settings
- [Authentication Modes](../authentication.md) - BasicAuth vs OAuth setup
- [Installation Guide](../installation.md) - Getting started with the MCP server
- [Installation Guide](../installation.md) - Getting started
- [ADR-008: MCP Sampling for RAG](../ADR-008-mcp-sampling-for-rag.md) - Technical details on RAG workflows
## FAQ
**Q: Can I use this UI without vector sync enabled?**
A: Yes, but you'll only have access to the User Info tab. Vector Sync and Vector Visualization features require `VECTOR_SYNC_ENABLED=true`.
**Q: How often does the status refresh?**
A: The Vector Sync Status tab polls every 10 seconds automatically using htmx.
**Q: What's the difference between BM25 Hybrid and Semantic search?**
A: Semantic search uses only vector embeddings for conceptual similarity. BM25 Hybrid combines semantic search with traditional keyword matching (BM25 sparse vectors) for better precision on exact terms.
**Q: Can I search across multiple Nextcloud apps at once?**
A: Yes! By default, searches query all indexed apps. Use the Advanced options to filter by specific document types.
**Q: Why do some documents have higher scores than others?**
A: Scores represent semantic similarity to your query. Higher scores indicate better matches based on vector similarity (semantic search) or a combination of vector similarity and keyword matching (BM25 hybrid).
**Q: What does the color scale represent in the PCA plot?**
A: The Viridis color scale represents relative relevance scores, with yellow indicating the most relevant documents and purple indicating lower relevance.
+24 -19
@@ -56,16 +56,26 @@ function vizApp() {
},
updatePlot() {
// Re-render plot with current data when toggle changes
// Toggle query point visibility without recreating the plot
// This preserves camera position naturally since layout is untouched
if (this.coordinates && this.queryCoords && this.results.length > 0) {
this.renderPlot(this.coordinates, this.queryCoords, this.results);
const plotDiv = document.getElementById('viz-plot');
// If plot exists, just toggle the query trace visibility
if (plotDiv && plotDiv.data && plotDiv.data.length >= 2) {
// Trace index 1 is the query point
Plotly.restyle('viz-plot', { visible: this.showQueryPoint }, [1]);
} else {
// Plot doesn't exist yet, render it
this.renderPlot(this.coordinates, this.queryCoords, this.results);
}
}
},
renderPlot(coordinates, queryCoords, results) {
const scores = results.map(r => r.score);
// Trace 1: Document results
// Trace 1: Document results (always visible)
const documentTrace = {
x: coordinates.map(c => c[0]),
y: coordinates.map(c => c[1]),
@@ -73,6 +83,7 @@ function vizApp() {
mode: 'markers',
type: 'scatter3d',
name: 'Documents',
visible: true,
customdata: results.map((r, i) => ({
title: r.title,
raw_score: r.original_score,
@@ -98,7 +109,7 @@ function vizApp() {
}
};
// Trace 2: Query point (distinct marker)
// Trace 2: Query point (visibility controlled by toggle)
const queryTrace = {
x: [queryCoords[0]],
y: [queryCoords[1]],
@@ -106,6 +117,7 @@ function vizApp() {
mode: 'markers',
type: 'scatter3d',
name: 'Query',
visible: this.showQueryPoint, // Initial visibility from state
hovertemplate:
'<b>Search Query</b><br>' +
`(x=${queryCoords[0]}, y=${queryCoords[1]}, z=${queryCoords[2]})` +
@@ -120,22 +132,15 @@ function vizApp() {
}
};
// Preserve camera position if plot already exists
const plotDiv = document.getElementById('viz-plot');
let cameraSettings = { eye: { x: 1.5, y: 1.5, z: 1.5 } }; // Default camera position
if (plotDiv && plotDiv.layout && plotDiv.layout.scene && plotDiv.layout.scene.camera) {
// Plot exists and has been interacted with - preserve current camera
cameraSettings = plotDiv.layout.scene.camera;
}
const layout = {
title: `Vector Space (PCA 3D) - ${results.length} results`,
scene: {
xaxis: { title: 'PC1' },
yaxis: { title: 'PC2' },
zaxis: { title: 'PC3' },
camera: cameraSettings
camera: {
eye: { x: 1.5, y: 1.5, z: 1.5 }
}
},
hovermode: 'closest',
autosize: true, // Enable auto-sizing to fit container
@@ -143,8 +148,8 @@ function vizApp() {
margin: { l: 0, r: 0, t: 40, b: 0 } // Minimize margins for full width
};
// Conditionally include query trace based on toggle
const traces = this.showQueryPoint ? [documentTrace, queryTrace] : [documentTrace];
// Always render both traces - visibility is controlled by the visible property
const traces = [documentTrace, queryTrace];
// Enable responsive resizing
const config = {
@@ -152,9 +157,9 @@ function vizApp() {
displayModeBar: true
};
// Use Plotly.react() instead of newPlot() to preserve camera position
// when toggling query point visibility
Plotly.react('viz-plot', traces, layout, config);
// Use newPlot() for initial render - camera position will be preserved
// by subsequent Plotly.restyle() calls in updatePlot()
Plotly.newPlot('viz-plot', traces, layout, config);
},
getNextcloudUrl(result) {