feat: Improve vector visualization with static assets and fixes
- Extract CSS and JavaScript into separate static files - Created nextcloud_mcp_server/auth/static/vector-viz.css - Created nextcloud_mcp_server/auth/static/vector-viz.js - Updated templates to reference external assets - Fix vector visualization issues: - Normalize vectors before PCA to match Qdrant's cosine distance - Add zero-norm and NaN detection/handling for large datasets - Enable responsive Plotly sizing (autosize + responsive config) - Widen plot area to full viewport width with minimized margins - Improve visualization accuracy: - Query point now positioned correctly relative to documents - Handles 200+ points without JSON serialization errors - Full-width plot maximizes screen space utilization 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
@@ -6,3 +6,4 @@
|
||||
|
||||
!nextcloud_mcp_server/**/*.py
|
||||
!nextcloud_mcp_server/**/*.html
|
||||
!nextcloud_mcp_server/auth/static/*.png
|
||||
|
||||
@@ -133,6 +133,7 @@ This enables natural language queries and helps discover related content across
|
||||
- **[App Documentation](docs/)** - Notes, Calendar, Contacts, WebDAV, Deck, Cookbook, Tables
|
||||
- **[Document Processing](docs/configuration.md#document-processing)** - OCR and text extraction setup
|
||||
- **[Semantic Search Architecture](docs/semantic-search-architecture.md)** - Experimental vector search (Notes only, opt-in)
|
||||
- **[Vector Sync UI Guide](docs/user-guide/vector-sync-ui.md)** - Browser interface for semantic search visualization and testing
|
||||
|
||||
### Advanced Topics
|
||||
- **[OAuth Architecture](docs/oauth-architecture.md)** - How OAuth works (experimental)
|
||||
|
||||
|
After Width: | Height: | Size: 243 KiB |
|
After Width: | Height: | Size: 373 KiB |
|
After Width: | Height: | Size: 251 KiB |
|
After Width: | Height: | Size: 83 KiB |
|
After Width: | Height: | Size: 82 KiB |
|
After Width: | Height: | Size: 298 KiB |
|
After Width: | Height: | Size: 106 KiB |
|
After Width: | Height: | Size: 84 KiB |
|
After Width: | Height: | Size: 118 KiB |
|
After Width: | Height: | Size: 464 KiB |
|
After Width: | Height: | Size: 483 KiB |
@@ -0,0 +1,315 @@
|
||||
# Vector Sync UI Guide
|
||||
|
||||
This guide covers the browser-based user interface for the Nextcloud MCP Server's semantic search and vector synchronization features.
|
||||
|
||||
## Overview
|
||||
|
||||
The Vector Sync UI (`/app`) is a browser-based interface that allows you to interact with semantic search results from documents in your Nextcloud instance. This UI provides the same query capabilities that Large Language Models (LLMs) use in Retrieval-Augmented Generation (RAG) workflows, allowing you to test queries and visualize results in an intuitive way.
|
||||
|
||||
## Accessing the UI
|
||||
|
||||
Navigate to the `/app` endpoint of your MCP server after authentication:
|
||||
|
||||
- **BasicAuth mode**: `http://localhost:8000/app` (credentials from environment)
|
||||
- **OAuth mode**: `http://localhost:8000/app` (redirects to login if not authenticated)
|
||||
|
||||
## Welcome Page
|
||||
|
||||
The welcome page is the default landing page when you access `/app`. It provides an introduction to the MCP server's capabilities and adapts its content based on whether vector sync is enabled.
|
||||
|
||||

|
||||
|
||||
### When Vector Sync is Enabled
|
||||
|
||||
The welcome page includes:
|
||||
|
||||
- **Authentication status** - Shows your username and authentication mode
|
||||
- **About Semantic Search** - Explanation of semantic search capabilities and how it works
|
||||
- **RAG Workflow Integration** - How the UI fits into RAG workflows and helps test LLM queries
|
||||
- **Feature cards** - Quick navigation to User Info, Vector Sync Status, and Vector Visualization
|
||||
|
||||
### When Vector Sync is Disabled
|
||||
|
||||
If `VECTOR_SYNC_ENABLED=false`, the welcome page displays:
|
||||
|
||||
- A warning message explaining that vector sync is disabled
|
||||
- Link to configuration documentation for enabling the feature
|
||||
- Limited navigation (User Info only)
|
||||
|
||||
## User Info Tab
|
||||
|
||||
Access user information and session details by navigating to `/app/user-info` or clicking "User Info" in the welcome page.
|
||||
|
||||

|
||||
|
||||
### What's Displayed
|
||||
|
||||
**BasicAuth Mode:**
|
||||
- Username
|
||||
- Authentication mode badge
|
||||
- Nextcloud host connection URL
|
||||
|
||||
**OAuth Mode:**
|
||||
- Username
|
||||
- Authentication mode badge
|
||||
- Session ID (truncated for security)
|
||||
- Background access status (granted or not granted)
|
||||
- IdP profile information (if available)
|
||||
- Option to revoke background access
|
||||
|
||||
### Navigation
|
||||
|
||||
The user info page includes a sidebar with tabs for:
|
||||
- **Home** - Returns to the welcome page
|
||||
- **User Info** - Current page
|
||||
- **Vector Sync** - Real-time sync status (if vector sync enabled)
|
||||
- **Vector Viz** - Interactive visualization (if vector sync enabled)
|
||||
- **Webhooks** - Admin-only webhook management (if user is admin)
|
||||
|
||||
## Vector Sync Status Tab
|
||||
|
||||
Monitor real-time indexing progress and synchronization status.
|
||||
|
||||

|
||||
|
||||
### Metrics Displayed
|
||||
|
||||
| Metric | Description |
|
||||
|--------|-------------|
|
||||
| **Indexed Documents** | Total number of document chunks stored in Qdrant vector database |
|
||||
| **Pending Documents** | Number of documents in the processing queue waiting to be embedded |
|
||||
| **Status** | Current sync state: "✓ Idle" (green) or "⟳ Syncing" (orange) |
|
||||
|
||||
### Real-Time Updates
|
||||
|
||||
The status tab uses htmx to automatically refresh every 10 seconds, providing live updates without manual page refreshes.
|
||||
|
||||
### What the Metrics Mean
|
||||
|
||||
- **Indexed Documents**: These are document chunks that have been converted to 768-dimensional vector embeddings and stored in Qdrant. These documents are immediately searchable via semantic search.
|
||||
|
||||
- **Pending Documents**: Documents in the queue that are awaiting embedding processing. The processor workers will gradually process these documents based on available resources.
|
||||
|
||||
- **Idle Status**: No documents are currently being processed. The system is up-to-date.
|
||||
|
||||
- **Syncing Status**: Documents are actively being processed and indexed. This is normal after adding new content or on initial sync.
|
||||
|
||||
## Vector Visualization Tab
|
||||
|
||||
Interactive search interface with 2D visualization of results in semantic space.
|
||||
|
||||

|
||||
|
||||
### Search Controls
|
||||
|
||||
**Search Query**
|
||||
- Enter natural language queries to search your Nextcloud documents
|
||||
- Examples: "health benefits of coffee vs tea", "python testing frameworks", "project deadlines"
|
||||
|
||||
**Algorithm Selection**
|
||||
- **Semantic (Dense)**: Pure semantic search using vector similarity
|
||||
- **BM25 Hybrid** (default): Combines semantic search with keyword matching using BM25 sparse vectors
|
||||
|
||||
**Fusion Method** (for BM25 Hybrid only)
|
||||
- **RRF** (Reciprocal Rank Fusion): General-purpose fusion using reciprocal ranks
|
||||
- **DBSF** (Distribution-Based Score Fusion): Distribution-based normalization for better score balancing
|
||||
|
||||
**Advanced Options**
|
||||
- Document types filter (Notes, Files, Calendar, Contacts, Deck)
|
||||
- Score threshold (0.0-1.0)
|
||||
- Result limit (default: 50, max: 100)
|
||||
|
||||
### Search Results
|
||||
|
||||

|
||||
|
||||
The visualization displays:
|
||||
|
||||
1. **2D PCA Plot** - Documents projected into 2D space using Principal Component Analysis
|
||||
- Point size indicates relevance score (larger = more relevant)
|
||||
- Point opacity correlates with score (more opaque = higher score)
|
||||
- Color scale (Viridis) represents similarity (yellow = highest match)
|
||||
- Hover over points to see document details
|
||||
|
||||
2. **Results List** - Searchable documents with:
|
||||
- Document title (clickable link to Nextcloud app)
|
||||
- Snippet preview of matched content
|
||||
- Raw score and relative score percentage
|
||||
- Document type (note, file, calendar, etc.)
|
||||
- "Show Chunk" button to expand matched text
|
||||
|
||||
### Viewing Chunk Context
|
||||
|
||||
Click "Show Chunk" to view the matched text with surrounding context.
|
||||
|
||||

|
||||
|
||||
The chunk context view displays:
|
||||
- **Highlighted matched chunk** - The specific text segment that matched your query (highlighted in yellow)
|
||||
- **Surrounding context** - Up to 500 characters before and after the match for better understanding
|
||||
- **Full document link** - Click the title to open the document in the Nextcloud app
|
||||
|
||||
### Understanding the 2D Visualization
|
||||
|
||||
The PCA (Principal Component Analysis) plot reduces 768-dimensional vector embeddings to 2D for visualization:
|
||||
|
||||
- **Proximity** - Documents closer together in 2D space are semantically similar
|
||||
- **Clusters** - Groups of related documents appear as clusters
|
||||
- **Outliers** - Distant points represent documents with unique content
|
||||
- **Query position** - Your search query is embedded and plotted alongside results
|
||||
|
||||
**Note**: PCA is a dimensionality reduction technique that preserves as much variance as possible, but some information is lost in the projection from 768D to 2D.
|
||||
|
||||
## Configuration Requirements
|
||||
|
||||
### Required Environment Variables
|
||||
|
||||
To enable vector sync features:
|
||||
|
||||
```bash
|
||||
VECTOR_SYNC_ENABLED=true
|
||||
```
|
||||
|
||||
### Optional Configuration
|
||||
|
||||
For browser-accessible links to Nextcloud apps (Notes, Files, etc.):
|
||||
|
||||
```bash
|
||||
NEXTCLOUD_PUBLIC_ISSUER_URL=https://your-public-nextcloud-url.com
|
||||
```
|
||||
|
||||
If not set, falls back to `NEXTCLOUD_HOST` from settings.
|
||||
|
||||
### Admin Access
|
||||
|
||||
The Webhooks tab is only visible to users with Nextcloud admin privileges. Admin status is checked via the Nextcloud Provisioning API.
|
||||
|
||||
## Use Cases
|
||||
|
||||
### 1. Monitoring Document Indexing
|
||||
|
||||
Use the Vector Sync Status tab to:
|
||||
- Verify documents are being indexed after creation/modification
|
||||
- Check if the indexing queue is backing up (high pending count)
|
||||
- Confirm the system is idle after bulk document imports
|
||||
|
||||
### 2. Testing Search Queries
|
||||
|
||||
Use the Vector Visualization tab to:
|
||||
- Test queries before they're used by LLMs in RAG workflows
|
||||
- Compare semantic vs. hybrid search algorithms
|
||||
- Verify that relevant documents are being retrieved
|
||||
- Understand relevance scores and ranking
|
||||
|
||||
### 3. Debugging Search Results
|
||||
|
||||
Use chunk context to:
|
||||
- See exactly which text segments match your query
|
||||
- Verify that the matched content is relevant
|
||||
- Identify why unexpected documents appear in results
|
||||
- Understand the surrounding context of matches
|
||||
|
||||
### 4. Algorithm Comparison
|
||||
|
||||
Experiment with different search approaches:
|
||||
- **Pure semantic**: Best for conceptual queries and synonyms
|
||||
- **BM25 hybrid with RRF**: Balanced approach combining keywords and semantics
|
||||
- **BM25 hybrid with DBSF**: Alternative fusion for different score distributions
|
||||
|
||||
## Technical Details
|
||||
|
||||
### Frontend Stack
|
||||
|
||||
- **Alpine.js** - Reactive state management for UI interactions
|
||||
- **htmx** - Server-driven dynamic updates for status polling
|
||||
- **Plotly.js** - Interactive 2D scatter plot visualization
|
||||
- **Nextcloud design system** - Consistent styling matching Nextcloud ecosystem
|
||||
|
||||
### Backend Processing
|
||||
|
||||
- **Server-side PCA** - Dimensionality reduction performed on the server to minimize bandwidth
|
||||
- **Chunk-level search** - Searches operate on document chunks (not whole documents)
|
||||
- **Document deduplication** - Multiple chunks from the same document are deduplicated in results
|
||||
- **Timing metrics** - All search operations log performance metrics for monitoring
|
||||
|
||||
### Supported Apps
|
||||
|
||||
Documents from the following Nextcloud apps are indexed and searchable:
|
||||
|
||||
- **Notes** - All notes and their content
|
||||
- **Files** - Supported file types (text, PDF, etc.)
|
||||
- **Calendar** - Calendar events and tasks (VTODO)
|
||||
- **Contacts** - Contact information (CardDAV)
|
||||
- **Deck** - Deck cards and board content
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Vector Sync Tab Not Visible
|
||||
|
||||
**Cause**: `VECTOR_SYNC_ENABLED` is not set to `true`
|
||||
|
||||
**Solution**: Set the environment variable and restart the MCP server:
|
||||
```bash
|
||||
export VECTOR_SYNC_ENABLED=true
|
||||
docker compose restart mcp
|
||||
```
|
||||
|
||||
### No Search Results
|
||||
|
||||
**Possible causes**:
|
||||
1. No documents have been indexed yet (check Vector Sync Status)
|
||||
2. Query doesn't match indexed content
|
||||
3. Score threshold is too high
|
||||
|
||||
**Solutions**:
|
||||
- Wait for documents to be indexed (check "Indexed Documents" count)
|
||||
- Try broader or different queries
|
||||
- Lower the score threshold in Advanced options
|
||||
|
||||
### Chunk Context Not Loading
|
||||
|
||||
**Cause**: Network error or document no longer exists
|
||||
|
||||
**Solution**: Check browser console for errors and verify the document still exists in Nextcloud
|
||||
|
||||
### Links to Nextcloud Apps Not Working
|
||||
|
||||
**Cause**: `NEXTCLOUD_PUBLIC_ISSUER_URL` not configured or incorrect
|
||||
|
||||
**Solution**: Set the public URL for browser-accessible links:
|
||||
```bash
|
||||
export NEXTCLOUD_PUBLIC_ISSUER_URL=https://your-public-nextcloud-url.com
|
||||
```
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [Configuration Guide](../configuration.md) - Environment variables and settings
|
||||
- [Authentication Modes](../authentication.md) - BasicAuth vs OAuth setup
|
||||
- [Installation Guide](../installation.md) - Getting started with the MCP server
|
||||
- [ADR-008: MCP Sampling for RAG](../ADR-008-mcp-sampling-for-rag.md) - Technical details on RAG workflows
|
||||
|
||||
## FAQ
|
||||
|
||||
**Q: Can I use this UI without vector sync enabled?**
|
||||
|
||||
A: Yes, but you'll only have access to the User Info tab. Vector Sync and Vector Visualization features require `VECTOR_SYNC_ENABLED=true`.
|
||||
|
||||
**Q: How often does the status refresh?**
|
||||
|
||||
A: The Vector Sync Status tab polls every 10 seconds automatically using htmx.
|
||||
|
||||
**Q: What's the difference between BM25 Hybrid and Semantic search?**
|
||||
|
||||
A: Semantic search uses only vector embeddings for conceptual similarity. BM25 Hybrid combines semantic search with traditional keyword matching (BM25 sparse vectors) for better precision on exact terms.
|
||||
|
||||
**Q: Can I search across multiple Nextcloud apps at once?**
|
||||
|
||||
A: Yes! By default, searches query all indexed apps. Use the Advanced options to filter by specific document types.
|
||||
|
||||
**Q: Why do some documents have higher scores than others?**
|
||||
|
||||
A: Scores represent semantic similarity to your query. Higher scores indicate better matches based on vector similarity (semantic search) or a combination of vector similarity and keyword matching (BM25 hybrid).
|
||||
|
||||
**Q: What does the color scale represent in the PCA plot?**
|
||||
|
||||
A: The Viridis color scale represents relative relevance scores, with yellow indicating the most relevant documents and purple indicating lower relevance.
|
||||
@@ -24,6 +24,7 @@ from starlette.middleware.authentication import AuthenticationMiddleware
|
||||
from starlette.middleware.cors import CORSMiddleware
|
||||
from starlette.responses import JSONResponse, RedirectResponse
|
||||
from starlette.routing import Mount, Route
|
||||
from starlette.staticfiles import StaticFiles
|
||||
|
||||
from nextcloud_mcp_server.auth import (
|
||||
InsufficientScopeError,
|
||||
@@ -1491,7 +1492,7 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
|
||||
# Create a separate Starlette app for browser routes that need session auth
|
||||
# This prevents SessionAuthBackend from interfering with FastMCP's OAuth
|
||||
browser_routes = [
|
||||
Route("/", user_info_html, methods=["GET"]), # /app → webapp (HTML UI)
|
||||
Route("/", user_info_html, methods=["GET"]), # /app → user info with all tabs
|
||||
Route(
|
||||
"/revoke", revoke_session, methods=["POST"], name="revoke_session_endpoint"
|
||||
), # /app/revoke → revoke_session
|
||||
@@ -1527,6 +1528,14 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
|
||||
),
|
||||
]
|
||||
|
||||
# Add static files mount if directory exists
|
||||
static_dir = os.path.join(os.path.dirname(__file__), "auth", "static")
|
||||
if os.path.isdir(static_dir):
|
||||
browser_routes.append(
|
||||
Mount("/static", StaticFiles(directory=static_dir), name="static")
|
||||
)
|
||||
logger.info(f"Mounted static files from {static_dir}")
|
||||
|
||||
browser_app = Starlette(routes=browser_routes)
|
||||
browser_app.add_middleware(
|
||||
AuthenticationMiddleware, # type: ignore[invalid-argument-type]
|
||||
|
||||
|
After Width: | Height: | Size: 18 KiB |
@@ -0,0 +1,192 @@
|
||||
.viz-layout {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 16px;
|
||||
height: 100%;
|
||||
min-height: 0;
|
||||
overflow-y: auto;
|
||||
}
|
||||
.viz-card {
|
||||
background: var(--color-main-background);
|
||||
border-radius: 0;
|
||||
padding: 16px;
|
||||
box-shadow: none;
|
||||
}
|
||||
.viz-controls-card {
|
||||
flex: 0 0 auto;
|
||||
border-bottom: 1px solid var(--color-border);
|
||||
padding-bottom: 16px;
|
||||
}
|
||||
.viz-controls-grid {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
|
||||
gap: 12px;
|
||||
align-items: end;
|
||||
}
|
||||
@media (min-width: 768px) {
|
||||
.viz-controls-grid {
|
||||
grid-template-columns: 2fr 1.5fr 1.5fr auto auto;
|
||||
}
|
||||
}
|
||||
.viz-control-group {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 4px;
|
||||
}
|
||||
.viz-control-group label {
|
||||
font-weight: 500;
|
||||
color: var(--color-main-text);
|
||||
font-size: 13px;
|
||||
}
|
||||
.viz-control-group input[type="text"],
|
||||
.viz-control-group input[type="number"],
|
||||
.viz-control-group select {
|
||||
width: 100%;
|
||||
padding: 7px 10px;
|
||||
border: 1px solid var(--color-border-dark);
|
||||
border-radius: var(--border-radius);
|
||||
font-size: 14px;
|
||||
background: var(--color-main-background);
|
||||
color: var(--color-main-text);
|
||||
}
|
||||
.viz-control-group input:focus,
|
||||
.viz-control-group select:focus {
|
||||
outline: none;
|
||||
border-color: var(--color-primary-element);
|
||||
}
|
||||
.viz-control-group input[type="range"] {
|
||||
width: 100%;
|
||||
}
|
||||
.viz-control-group select[multiple] {
|
||||
min-height: 100px;
|
||||
}
|
||||
.viz-weight-display {
|
||||
display: inline-block;
|
||||
min-width: 40px;
|
||||
text-align: right;
|
||||
color: #666;
|
||||
}
|
||||
.viz-btn {
|
||||
background: var(--color-primary-element);
|
||||
color: white;
|
||||
border: none;
|
||||
padding: 7px 16px;
|
||||
border-radius: var(--border-radius);
|
||||
cursor: pointer;
|
||||
font-size: 14px;
|
||||
font-weight: 500;
|
||||
white-space: nowrap;
|
||||
}
|
||||
.viz-btn:hover {
|
||||
background: #0052a3;
|
||||
}
|
||||
.viz-btn-secondary {
|
||||
background: #6c757d;
|
||||
color: white;
|
||||
border: none;
|
||||
padding: 7px 16px;
|
||||
border-radius: var(--border-radius);
|
||||
cursor: pointer;
|
||||
font-size: 14px;
|
||||
white-space: nowrap;
|
||||
}
|
||||
.viz-btn-secondary:hover {
|
||||
background: #5a6268;
|
||||
}
|
||||
.viz-card-plot {
|
||||
flex: 0 0 auto;
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
min-height: 500px;
|
||||
height: 600px;
|
||||
/* Remove horizontal padding to extend to full viewport width */
|
||||
padding-left: 0;
|
||||
padding-right: 0;
|
||||
margin-left: -16px;
|
||||
margin-right: -16px;
|
||||
}
|
||||
#viz-plot-container {
|
||||
width: 100%;
|
||||
height: 100%;
|
||||
position: relative;
|
||||
overflow: visible;
|
||||
}
|
||||
#viz-plot {
|
||||
width: 100%;
|
||||
height: 100%;
|
||||
}
|
||||
.viz-loading {
|
||||
text-align: center;
|
||||
padding: 40px;
|
||||
color: #666;
|
||||
}
|
||||
.viz-loading-overlay {
|
||||
position: absolute;
|
||||
inset: 0;
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
background: white;
|
||||
color: #666;
|
||||
}
|
||||
.viz-no-results {
|
||||
text-align: center;
|
||||
padding: 40px;
|
||||
color: #666;
|
||||
font-style: italic;
|
||||
}
|
||||
.viz-advanced-section {
|
||||
margin-top: 12px;
|
||||
padding: 12px;
|
||||
background: var(--color-background-hover);
|
||||
border-radius: var(--border-radius);
|
||||
border: 1px solid var(--color-border);
|
||||
}
|
||||
.viz-info-box {
|
||||
background: var(--color-primary-element-light);
|
||||
border-left: 3px solid var(--color-primary-element);
|
||||
padding: 10px 12px;
|
||||
margin-bottom: 16px;
|
||||
font-size: 13px;
|
||||
color: var(--color-main-text);
|
||||
}
|
||||
.chunk-toggle-btn {
|
||||
background: #6c757d;
|
||||
color: white;
|
||||
border: none;
|
||||
padding: 4px 10px;
|
||||
border-radius: 3px;
|
||||
cursor: pointer;
|
||||
font-size: 12px;
|
||||
margin-top: 6px;
|
||||
}
|
||||
.chunk-toggle-btn:hover {
|
||||
background: #5a6268;
|
||||
}
|
||||
.chunk-context {
|
||||
background: var(--color-background-hover);
|
||||
border: 1px solid var(--color-border);
|
||||
border-radius: var(--border-radius);
|
||||
padding: 12px;
|
||||
margin-top: 8px;
|
||||
font-family: 'SFMono-Regular', 'Consolas', 'Liberation Mono', 'Menlo', monospace;
|
||||
font-size: 13px;
|
||||
line-height: 1.6;
|
||||
white-space: pre-wrap;
|
||||
word-wrap: break-word;
|
||||
}
|
||||
.chunk-text {
|
||||
color: var(--color-text-maxcontrast);
|
||||
}
|
||||
.chunk-matched {
|
||||
background: #fff3cd;
|
||||
border: 1px solid #ffc107;
|
||||
padding: 2px 4px;
|
||||
border-radius: var(--border-radius);
|
||||
font-weight: 500;
|
||||
color: var(--color-main-text);
|
||||
}
|
||||
.chunk-ellipsis {
|
||||
color: var(--color-text-maxcontrast);
|
||||
font-style: italic;
|
||||
}
|
||||
@@ -0,0 +1,212 @@
|
||||
// Initialize vizApp for vector visualization
|
||||
function vizApp() {
|
||||
return {
|
||||
query: '',
|
||||
algorithm: 'bm25_hybrid',
|
||||
fusion: 'rrf',
|
||||
showAdvanced: false,
|
||||
showQueryPoint: true,
|
||||
docTypes: [''],
|
||||
limit: 50,
|
||||
scoreThreshold: 0.0,
|
||||
loading: false,
|
||||
results: [],
|
||||
coordinates: null,
|
||||
queryCoords: null,
|
||||
expandedChunks: {},
|
||||
chunkLoading: {},
|
||||
|
||||
async executeSearch() {
|
||||
this.loading = true;
|
||||
this.results = [];
|
||||
|
||||
try {
|
||||
const params = new URLSearchParams({
|
||||
query: this.query,
|
||||
algorithm: this.algorithm,
|
||||
limit: this.limit,
|
||||
score_threshold: this.scoreThreshold,
|
||||
});
|
||||
|
||||
if (this.algorithm === 'bm25_hybrid') {
|
||||
params.append('fusion', this.fusion);
|
||||
}
|
||||
|
||||
const selectedTypes = this.docTypes.filter(t => t !== '');
|
||||
if (selectedTypes.length > 0) {
|
||||
params.append('doc_types', selectedTypes.join(','));
|
||||
}
|
||||
|
||||
const response = await fetch(`/app/vector-viz/search?${params}`);
|
||||
const data = await response.json();
|
||||
|
||||
if (data.success) {
|
||||
this.results = data.results;
|
||||
this.coordinates = data.coordinates_3d;
|
||||
this.queryCoords = data.query_coords;
|
||||
this.renderPlot(this.coordinates, this.queryCoords, this.results);
|
||||
} else {
|
||||
alert('Search failed: ' + data.error);
|
||||
}
|
||||
} catch (error) {
|
||||
alert('Error: ' + error.message);
|
||||
} finally {
|
||||
this.loading = false;
|
||||
}
|
||||
},
|
||||
|
||||
updatePlot() {
|
||||
// Re-render plot with current data when toggle changes
|
||||
if (this.coordinates && this.queryCoords && this.results.length > 0) {
|
||||
this.renderPlot(this.coordinates, this.queryCoords, this.results);
|
||||
}
|
||||
},
|
||||
|
||||
renderPlot(coordinates, queryCoords, results) {
|
||||
const scores = results.map(r => r.score);
|
||||
|
||||
// Trace 1: Document results
|
||||
const documentTrace = {
|
||||
x: coordinates.map(c => c[0]),
|
||||
y: coordinates.map(c => c[1]),
|
||||
z: coordinates.map(c => c[2]),
|
||||
mode: 'markers',
|
||||
type: 'scatter3d',
|
||||
name: 'Documents',
|
||||
customdata: results.map((r, i) => ({
|
||||
title: r.title,
|
||||
raw_score: r.original_score,
|
||||
relative_score: r.score,
|
||||
x: coordinates[i][0],
|
||||
y: coordinates[i][1],
|
||||
z: coordinates[i][2]
|
||||
})),
|
||||
hovertemplate:
|
||||
'<b>%{customdata.title}</b><br>' +
|
||||
'Raw Score: %{customdata.raw_score:.3f} (%{customdata.relative_score:.0%} relative)<br>' +
|
||||
'(x=%{customdata.x}, y=%{customdata.y}, z=%{customdata.z})' +
|
||||
'<extra></extra>',
|
||||
marker: {
|
||||
size: results.map(r => 4 + (Math.pow(r.score, 2) * 10)),
|
||||
opacity: results.map(r => 0.3 + (r.score * 0.7)),
|
||||
color: scores,
|
||||
colorscale: 'Viridis',
|
||||
showscale: true,
|
||||
colorbar: { title: 'Relative Score' },
|
||||
cmin: 0,
|
||||
cmax: 1
|
||||
}
|
||||
};
|
||||
|
||||
// Trace 2: Query point (distinct marker)
|
||||
const queryTrace = {
|
||||
x: [queryCoords[0]],
|
||||
y: [queryCoords[1]],
|
||||
z: [queryCoords[2]],
|
||||
mode: 'markers',
|
||||
type: 'scatter3d',
|
||||
name: 'Query',
|
||||
hovertemplate:
|
||||
'<b>Search Query</b><br>' +
|
||||
`(x=${queryCoords[0]}, y=${queryCoords[1]}, z=${queryCoords[2]})` +
|
||||
'<extra></extra>',
|
||||
marker: {
|
||||
size: 10,
|
||||
color: '#ef5350', // Subdued red (Material Design Red 400)
|
||||
line: {
|
||||
color: '#c62828', // Darker red border (Material Design Red 800)
|
||||
width: 1
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
const layout = {
|
||||
title: `Vector Space (PCA 3D) - ${results.length} results`,
|
||||
scene: {
|
||||
xaxis: { title: 'PC1' },
|
||||
yaxis: { title: 'PC2' },
|
||||
zaxis: { title: 'PC3' },
|
||||
camera: {
|
||||
eye: { x: 1.5, y: 1.5, z: 1.5 }
|
||||
}
|
||||
},
|
||||
hovermode: 'closest',
|
||||
autosize: true, // Enable auto-sizing to fit container
|
||||
showlegend: true,
|
||||
margin: { l: 0, r: 0, t: 40, b: 0 } // Minimize margins for full width
|
||||
};
|
||||
|
||||
// Conditionally include query trace based on toggle
|
||||
const traces = this.showQueryPoint ? [documentTrace, queryTrace] : [documentTrace];
|
||||
|
||||
// Enable responsive resizing
|
||||
const config = {
|
||||
responsive: true,
|
||||
displayModeBar: true
|
||||
};
|
||||
|
||||
Plotly.newPlot('viz-plot', traces, layout, config);
|
||||
},
|
||||
|
||||
getNextcloudUrl(result) {
|
||||
// Use global NEXTCLOUD_BASE_URL if set, otherwise construct from window location
|
||||
const baseUrl = window.NEXTCLOUD_BASE_URL || '';
|
||||
switch (result.doc_type) {
|
||||
case 'note':
|
||||
return `${baseUrl}/apps/notes/note/${result.id}`;
|
||||
case 'file':
|
||||
return `${baseUrl}/apps/files/?fileId=${result.id}`;
|
||||
case 'calendar':
|
||||
return `${baseUrl}/apps/calendar`;
|
||||
case 'contact':
|
||||
return `${baseUrl}/apps/contacts`;
|
||||
case 'deck':
|
||||
return `${baseUrl}/apps/deck`;
|
||||
default:
|
||||
return `${baseUrl}`;
|
||||
}
|
||||
},
|
||||
|
||||
hasChunkPosition(result) {
|
||||
return result.chunk_start_offset != null && result.chunk_end_offset != null;
|
||||
},
|
||||
|
||||
isChunkExpanded(resultKey) {
|
||||
return this.expandedChunks[resultKey] !== undefined;
|
||||
},
|
||||
|
||||
async toggleChunk(result) {
|
||||
const resultKey = `${result.doc_type}_${result.id}`;
|
||||
|
||||
if (this.isChunkExpanded(resultKey)) {
|
||||
delete this.expandedChunks[resultKey];
|
||||
return;
|
||||
}
|
||||
|
||||
this.chunkLoading[resultKey] = true;
|
||||
|
||||
try {
|
||||
const params = new URLSearchParams({
|
||||
doc_type: result.doc_type,
|
||||
doc_id: result.id,
|
||||
start: result.chunk_start_offset,
|
||||
end: result.chunk_end_offset,
|
||||
context: 500
|
||||
});
|
||||
|
||||
const response = await fetch(`/app/chunk-context?${params}`);
|
||||
const data = await response.json();
|
||||
|
||||
if (data.success) {
|
||||
this.expandedChunks[resultKey] = data;
|
||||
} else {
|
||||
alert('Failed to load chunk: ' + data.error);
|
||||
}
|
||||
} catch (error) {
|
||||
alert('Error loading chunk: ' + error.message);
|
||||
} finally {
|
||||
delete this.chunkLoading[resultKey];
|
||||
}
|
||||
}
|
||||
};
|
||||
}
|
||||
@@ -31,15 +31,187 @@
|
||||
padding-top: 20px;
|
||||
border-top: 1px solid var(--color-border);
|
||||
}
|
||||
|
||||
/* Welcome tab specific styles */
|
||||
.hero-section {
|
||||
background: linear-gradient(135deg, var(--color-primary-element) 0%, #0082c9 100%);
|
||||
color: white;
|
||||
padding: 60px 24px;
|
||||
margin: -24px -24px 40px -24px;
|
||||
border-radius: 0 0 var(--border-radius-large) var(--border-radius-large);
|
||||
text-align: center;
|
||||
}
|
||||
|
||||
.hero-section h1 {
|
||||
color: white;
|
||||
font-size: 36px;
|
||||
margin: 0 0 16px 0;
|
||||
font-weight: 600;
|
||||
}
|
||||
|
||||
.hero-section p {
|
||||
font-size: 18px;
|
||||
opacity: 0.95;
|
||||
max-width: 700px;
|
||||
margin: 0 auto;
|
||||
line-height: 1.6;
|
||||
}
|
||||
|
||||
.feature-grid {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(auto-fit, minmax(280px, 1fr));
|
||||
gap: 24px;
|
||||
margin: 32px 0;
|
||||
}
|
||||
|
||||
.feature-card {
|
||||
background: var(--color-main-background);
|
||||
border: 2px solid var(--color-border);
|
||||
border-radius: var(--border-radius-large);
|
||||
padding: 24px;
|
||||
transition: all 0.2s;
|
||||
cursor: pointer;
|
||||
text-decoration: none;
|
||||
color: inherit;
|
||||
display: block;
|
||||
}
|
||||
|
||||
.feature-card:hover {
|
||||
border-color: var(--color-primary-element);
|
||||
box-shadow: 0 4px 12px rgba(0, 103, 158, 0.15);
|
||||
transform: translateY(-2px);
|
||||
}
|
||||
|
||||
.feature-card h3 {
|
||||
color: var(--color-primary-element);
|
||||
font-size: 20px;
|
||||
margin: 12px 0 8px 0;
|
||||
font-weight: 600;
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 12px;
|
||||
}
|
||||
|
||||
.feature-card p {
|
||||
color: var(--color-text-maxcontrast);
|
||||
font-size: 14px;
|
||||
line-height: 1.6;
|
||||
margin: 8px 0 0 0;
|
||||
}
|
||||
|
||||
.feature-icon {
|
||||
width: 48px;
|
||||
height: 48px;
|
||||
background: var(--color-primary-element-light);
|
||||
border-radius: var(--border-radius);
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
margin-bottom: 8px;
|
||||
}
|
||||
|
||||
.feature-icon svg {
|
||||
width: 28px;
|
||||
height: 28px;
|
||||
fill: var(--color-primary-element);
|
||||
}
|
||||
|
||||
.info-section {
|
||||
background: var(--color-background-hover);
|
||||
border-radius: var(--border-radius-large);
|
||||
padding: 32px;
|
||||
margin: 32px 0;
|
||||
}
|
||||
|
||||
.info-section h2 {
|
||||
color: var(--color-main-text);
|
||||
font-size: 24px;
|
||||
margin: 0 0 16px 0;
|
||||
border: none;
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
.info-section p {
|
||||
color: var(--color-text-maxcontrast);
|
||||
line-height: 1.7;
|
||||
margin: 12px 0;
|
||||
}
|
||||
|
||||
.info-section ul {
|
||||
margin: 12px 0;
|
||||
padding-left: 24px;
|
||||
}
|
||||
|
||||
.info-section li {
|
||||
color: var(--color-text-maxcontrast);
|
||||
line-height: 1.7;
|
||||
margin: 8px 0;
|
||||
}
|
||||
|
||||
.info-section code {
|
||||
background: var(--color-main-background);
|
||||
padding: 2px 8px;
|
||||
border-radius: var(--border-radius);
|
||||
font-size: 13px;
|
||||
}
|
||||
|
||||
.auth-status {
|
||||
background: var(--color-primary-element-light);
|
||||
border-left: 4px solid var(--color-primary-element);
|
||||
padding: 16px 20px;
|
||||
margin: 24px 0;
|
||||
border-radius: var(--border-radius);
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 12px;
|
||||
}
|
||||
|
||||
.auth-status svg {
|
||||
width: 24px;
|
||||
height: 24px;
|
||||
fill: var(--color-primary-element);
|
||||
flex-shrink: 0;
|
||||
}
|
||||
|
||||
.auth-status-text {
|
||||
flex: 1;
|
||||
}
|
||||
|
||||
.auth-status-text strong {
|
||||
display: block;
|
||||
color: var(--color-main-text);
|
||||
font-size: 14px;
|
||||
margin-bottom: 4px;
|
||||
}
|
||||
|
||||
.auth-status-text span {
|
||||
color: var(--color-text-maxcontrast);
|
||||
font-size: 13px;
|
||||
}
|
||||
{% endblock %}
|
||||
|
||||
{% block content %}
|
||||
<div class="app-content-wrapper" x-data="{ activeSection: 'user-info', navOpen: true }">
|
||||
<div class="app-content-wrapper" x-data="{ activeSection: 'welcome', navOpen: true }">
|
||||
<!-- Side Navigation -->
|
||||
<nav id="app-navigation" :class="{ 'app-navigation--closed': !navOpen }">
|
||||
<div class="app-navigation__content">
|
||||
<!-- Navigation List -->
|
||||
<ul class="app-navigation-list">
|
||||
<li class="app-navigation-entry" :class="{ 'active': activeSection === 'welcome' }">
|
||||
<div class="app-navigation-entry__wrapper">
|
||||
<a href="#"
|
||||
@click.prevent="activeSection = 'welcome'"
|
||||
class="app-navigation-entry-link">
|
||||
<span class="app-navigation-entry-icon">
|
||||
<svg class="nav-icon" viewBox="0 0 24 24">
|
||||
<path d="M10,20V14H14V20H19V12H22L12,3L2,12H5V20H10Z" />
|
||||
</svg>
|
||||
</span>
|
||||
<span class="app-navigation-entry__name">Welcome</span>
|
||||
</a>
|
||||
</div>
|
||||
</li>
|
||||
|
||||
<li class="app-navigation-entry" :class="{ 'active': activeSection === 'user-info' }">
|
||||
<div class="app-navigation-entry__wrapper">
|
||||
<a href="#"
|
||||
@@ -135,6 +307,297 @@
|
||||
<!-- Main Content Area -->
|
||||
<main id="app-content">
|
||||
<div class="page-content">
|
||||
<!-- Welcome Section -->
|
||||
<div x-show="activeSection === 'welcome'">
|
||||
<!-- Hero Section -->
|
||||
<div class="hero-section">
|
||||
<h1>Welcome to Nextcloud MCP Server</h1>
|
||||
<p>
|
||||
Interactive user interface for semantic search and document retrieval.
|
||||
Test queries, visualize results, and explore your Nextcloud content using RAG workflows.
|
||||
</p>
|
||||
</div>
|
||||
|
||||
<!-- Authentication Status -->
|
||||
<div class="auth-status">
|
||||
<svg viewBox="0 0 24 24">
|
||||
<path d="M12,4A4,4 0 0,1 16,8A4,4 0 0,1 12,12A4,4 0 0,1 8,8A4,4 0 0,1 12,4M12,14C16.42,14 20,15.79 20,18V20H4V18C4,15.79 7.58,14 12,14Z" />
|
||||
</svg>
|
||||
<div class="auth-status-text">
|
||||
<strong>Authenticated as: {{ username }}</strong>
|
||||
<span>Authentication mode: <code>{{ auth_mode }}</code></span>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
{% if vector_sync_enabled %}
|
||||
<!-- Vector Sync Enabled Content -->
|
||||
<div class="info-section">
|
||||
<h2>About Semantic Search</h2>
|
||||
<p>
|
||||
This interface provides access to <strong>semantic search</strong> capabilities powered by vector embeddings.
|
||||
Unlike traditional keyword search, semantic search understands the <em>meaning</em> of your queries and finds
|
||||
conceptually similar content across your Nextcloud apps.
|
||||
</p>
|
||||
<p>
|
||||
<strong>How it works:</strong>
|
||||
</p>
|
||||
<ul>
|
||||
<li>Documents from Notes, Calendar, Files, Contacts, and Deck are indexed into a vector database</li>
|
||||
<li>Each document chunk is converted to a 768-dimensional vector embedding that captures semantic meaning</li>
|
||||
<li>Queries are also converted to embeddings and matched against document vectors using similarity search</li>
|
||||
<li>Results can be retrieved using pure semantic search or hybrid BM25 search combining keywords and semantics</li>
|
||||
</ul>
|
||||
</div>
|
||||
|
||||
<div class="info-section">
|
||||
<h2>RAG Workflow Integration</h2>
|
||||
<p>
|
||||
This UI allows you to <strong>test the same queries that Large Language Models (LLMs) would use</strong> in a
|
||||
Retrieval-Augmented Generation (RAG) workflow. When an AI assistant needs to answer questions about your data:
|
||||
</p>
|
||||
<ul>
|
||||
<li><strong>Step 1:</strong> The assistant converts your question into a search query</li>
|
||||
<li><strong>Step 2:</strong> The MCP server retrieves relevant document chunks using semantic search</li>
|
||||
<li><strong>Step 3:</strong> Retrieved context is passed to the LLM to generate an informed answer</li>
|
||||
</ul>
|
||||
|
||||
<!-- RAG Workflow Diagram -->
|
||||
<div style="background: var(--color-main-background); border: 2px solid var(--color-primary-element); border-radius: var(--border-radius-large); padding: 24px; margin: 24px 0; overflow-x: auto;">
|
||||
<div style="text-align: center; font-weight: 600; margin-bottom: 20px; color: var(--color-primary-element); font-size: 16px;">
|
||||
MCP Sampling RAG Workflow
|
||||
</div>
|
||||
|
||||
<!-- Four-component bidirectional flow -->
|
||||
<div style="max-width: 1000px; margin: 0 auto;">
|
||||
<div style="display: grid; grid-template-columns: 0.7fr auto 1fr auto 1fr auto 0.9fr; gap: 10px; align-items: center;">
|
||||
<!-- User -->
|
||||
<div style="background: var(--color-background-hover); border: 2px solid var(--color-border); border-radius: var(--border-radius-large); padding: 14px; text-align: center;">
|
||||
<div style="font-size: 26px; margin-bottom: 5px;">👤</div>
|
||||
<div style="font-weight: 600; color: var(--color-main-text); font-size: 12px;">User</div>
|
||||
<div style="font-size: 9px; color: var(--color-text-maxcontrast); font-style: italic; margin-top: 5px; line-height: 1.2;">
|
||||
"What are health<br>benefits of coffee?"
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- Arrow User <-> Client -->
|
||||
<div style="text-align: center;">
|
||||
<div style="font-size: 20px; color: var(--color-text-maxcontrast);">↔</div>
|
||||
</div>
|
||||
|
||||
<!-- MCP Client + LLM (combined) -->
|
||||
<div style="background: var(--color-primary-element-light); border: 2px solid var(--color-primary-element); border-radius: var(--border-radius-large); padding: 12px; text-align: center;">
|
||||
<div style="font-weight: 600; color: var(--color-primary-element); font-size: 13px; margin-bottom: 8px;">MCP Client + LLM</div>
|
||||
|
||||
<div style="background: var(--color-main-background); border-radius: var(--border-radius); padding: 8px; margin-bottom: 6px;">
|
||||
<div style="font-size: 9px; color: var(--color-text-maxcontrast);">(Claude Code)</div>
|
||||
</div>
|
||||
|
||||
<div style="background: var(--color-main-background); border-radius: var(--border-radius); padding: 8px; border: 2px solid var(--color-primary-element);">
|
||||
<div style="font-size: 16px; margin-bottom: 2px;">🧠</div>
|
||||
<div style="font-weight: 600; color: var(--color-main-text); font-size: 10px;">Client's LLM</div>
|
||||
<div style="font-size: 8px; color: var(--color-text-maxcontrast);">(Claude)</div>
|
||||
</div>
|
||||
|
||||
<div style="margin-top: 8px; font-size: 8px; color: var(--color-text-maxcontrast); line-height: 1.2;">
|
||||
<strong>Enables RAG:</strong><br>
|
||||
Receives context,<br>
|
||||
generates answer
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- Arrow Client <-> Server -->
|
||||
<div style="text-align: center;">
|
||||
<div style="font-size: 20px; color: var(--color-primary-element);">↔</div>
|
||||
<div style="font-size: 7px; color: var(--color-text-maxcontrast); margin-top: 2px; font-weight: 600; line-height: 1.1;">
|
||||
Query +<br>
|
||||
Sampling
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- MCP Server -->
|
||||
<div style="background: var(--color-primary-element-light); border: 2px solid var(--color-primary-element); border-radius: var(--border-radius-large); padding: 12px; text-align: center;">
|
||||
<div style="font-weight: 600; color: var(--color-primary-element); font-size: 13px; margin-bottom: 8px;">MCP Server</div>
|
||||
|
||||
<div style="background: var(--color-main-background); border-radius: var(--border-radius); padding: 7px; margin-bottom: 5px;">
|
||||
<div style="font-weight: 600; color: var(--color-main-text); font-size: 9px; margin-bottom: 2px;">1. Semantic Search</div>
|
||||
<div style="font-size: 7px; color: var(--color-text-maxcontrast); line-height: 1.2;">
|
||||
Vector embeddings<br>
|
||||
BM25 Hybrid + RRF
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div style="background: var(--color-main-background); border-radius: var(--border-radius); padding: 7px; margin-bottom: 5px;">
|
||||
<div style="font-weight: 600; color: var(--color-main-text); font-size: 9px; margin-bottom: 2px;">2. Retrieve Context</div>
|
||||
<div style="font-size: 7px; color: var(--color-text-maxcontrast); line-height: 1.2;">
|
||||
Top relevant docs<br>
|
||||
with scores
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div style="background: var(--color-main-background); border-radius: var(--border-radius); padding: 7px; margin-bottom: 5px;">
|
||||
<div style="font-weight: 600; color: var(--color-main-text); font-size: 9px; margin-bottom: 2px;">3. Format Response</div>
|
||||
<div style="font-size: 7px; color: var(--color-text-maxcontrast); line-height: 1.2;">
|
||||
Document chunks<br>
|
||||
with citations
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div style="background: var(--color-main-background); border-radius: var(--border-radius); padding: 7px;">
|
||||
<div style="font-weight: 600; color: var(--color-main-text); font-size: 9px; margin-bottom: 2px;">4. Send to LLM</div>
|
||||
<div style="font-size: 7px; color: var(--color-text-maxcontrast); line-height: 1.2;">
|
||||
Via MCP sampling<br>
|
||||
for answer generation
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- Arrow Server <-> Nextcloud -->
|
||||
<div style="text-align: center;">
|
||||
<div style="font-size: 20px; color: var(--color-primary-element);">↔</div>
|
||||
<div style="font-size: 7px; color: var(--color-text-maxcontrast); margin-top: 2px; font-weight: 600; line-height: 1.1;">
|
||||
Retrieve
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- Nextcloud -->
|
||||
<div style="background: var(--color-background-hover); border: 2px solid var(--color-border); border-radius: var(--border-radius-large); padding: 12px; text-align: center; position: relative;">
|
||||
<img src="/app/static/nextcloud-logo.png" alt="Nextcloud" style="width: 40px; height: 40px; margin-bottom: 6px;" />
|
||||
<div style="font-weight: 600; color: var(--color-main-text); font-size: 12px; margin-bottom: 4px;">Nextcloud</div>
|
||||
<div style="font-size: 8px; color: var(--color-text-maxcontrast); line-height: 1.2;">
|
||||
Notes, Calendar,<br>
|
||||
Files, Contacts,<br>
|
||||
Deck
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- Explanation below diagram -->
|
||||
<div style="margin-top: 24px; padding: 16px; background: var(--color-background-hover); border-radius: var(--border-radius); border-left: 4px solid var(--color-primary-element);">
|
||||
<div style="font-size: 12px; color: var(--color-main-text); line-height: 1.6;">
|
||||
<strong>How RAG works via MCP Sampling:</strong>
|
||||
</div>
|
||||
<ol style="margin: 8px 0 0 0; padding-left: 20px; font-size: 11px; color: var(--color-text-maxcontrast); line-height: 1.6;">
|
||||
<li>User asks question through MCP Client</li>
|
||||
<li>Client sends query to MCP Server</li>
|
||||
<li>Server retrieves relevant document context from Nextcloud</li>
|
||||
<li><strong>Server sends context back to Client's LLM</strong> (MCP Sampling)</li>
|
||||
<li>Client's LLM generates answer with citations using retrieved context</li>
|
||||
<li>Answer returned to user</li>
|
||||
</ol>
|
||||
<div style="margin-top: 8px; font-size: 10px; color: var(--color-text-maxcontrast); font-style: italic;">
|
||||
The server has no LLM - it only retrieves context. The client's existing LLM is reused for answer generation.
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<p style="margin-top: 16px;">
|
||||
<strong>Key Point:</strong> The MCP server retrieves context but doesn't generate answers itself.
|
||||
Through <strong>MCP sampling</strong>, it requests the client's LLM to generate responses, giving users
|
||||
full control over which model is used and ensuring all processing happens client-side.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
By using this interface, you can preview search results, understand relevance scores, and verify
|
||||
that the system retrieves the right information before it reaches the LLM.
|
||||
</p>
|
||||
</div>
|
||||
|
||||
<!-- Feature Cards -->
|
||||
<h2>Available Features</h2>
|
||||
<div class="feature-grid">
|
||||
<a href="#" @click.prevent="activeSection = 'user-info'" class="feature-card">
|
||||
<div class="feature-icon">
|
||||
<svg viewBox="0 0 24 24">
|
||||
<path d="M12,4A4,4 0 0,1 16,8A4,4 0 0,1 12,12A4,4 0 0,1 8,8A4,4 0 0,1 12,4M12,14C16.42,14 20,15.79 20,18V20H4V18C4,15.79 7.58,14 12,14Z" />
|
||||
</svg>
|
||||
</div>
|
||||
<h3>User Information</h3>
|
||||
<p>
|
||||
View your authentication details, session information, and IdP profile.
|
||||
Manage background access permissions.
|
||||
</p>
|
||||
</a>
|
||||
|
||||
<a href="#" @click.prevent="activeSection = 'vector-sync'" class="feature-card">
|
||||
<div class="feature-icon">
|
||||
<svg viewBox="0 0 24 24">
|
||||
<path d="M12,18A6,6 0 0,1 6,12C6,11 6.25,10.03 6.7,9.2L5.24,7.74C4.46,8.97 4,10.43 4,12A8,8 0 0,0 12,20V23L16,19L12,15M12,4V1L8,5L12,9V6A6,6 0 0,1 18,12C18,13 17.75,13.97 17.3,14.8L18.76,16.26C19.54,15.03 20,13.57 20,12A8,8 0 0,0 12,4Z" />
|
||||
</svg>
|
||||
</div>
|
||||
<h3>Vector Sync Status</h3>
|
||||
<p>
|
||||
Monitor real-time indexing progress with metrics for indexed documents, pending queue,
|
||||
and synchronization status.
|
||||
</p>
|
||||
</a>
|
||||
|
||||
<a href="#" @click.prevent="activeSection = 'vector-viz'" class="feature-card">
|
||||
<div class="feature-icon">
|
||||
<svg viewBox="0 0 24 24">
|
||||
<path d="M22,21H2V3H4V19H6V10H10V19H12V6H16V19H18V14H22V21Z" />
|
||||
</svg>
|
||||
</div>
|
||||
<h3>Vector Visualization</h3>
|
||||
<p>
|
||||
Interactive search interface with 2D PCA visualization. Compare algorithms,
|
||||
view relevance scores, and explore matched document chunks.
|
||||
</p>
|
||||
</a>
|
||||
</div>
|
||||
|
||||
{% else %}
|
||||
<!-- Vector Sync Disabled Content -->
|
||||
<div class="warning">
|
||||
<h3 style="margin-top: 0;">Vector Sync is Disabled</h3>
|
||||
<p>
|
||||
Semantic search and vector visualization features are currently disabled.
|
||||
To enable these features, set <code>VECTOR_SYNC_ENABLED=true</code> in your environment configuration.
|
||||
</p>
|
||||
<p style="margin-bottom: 0;">
|
||||
<strong>Learn more:</strong>
|
||||
<a href="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/docs/configuration.md" target="_blank" style="color: inherit; text-decoration: underline;">
|
||||
Configuration Guide
|
||||
</a>
|
||||
</p>
|
||||
</div>
|
||||
|
||||
<!-- Limited Feature Card -->
|
||||
<h2>Available Features</h2>
|
||||
<div class="feature-grid">
|
||||
<a href="#" @click.prevent="activeSection = 'user-info'" class="feature-card">
|
||||
<div class="feature-icon">
|
||||
<svg viewBox="0 0 24 24">
|
||||
<path d="M12,4A4,4 0 0,1 16,8A4,4 0 0,1 12,12A4,4 0 0,1 8,8A4,4 0 0,1 12,4M12,14C16.42,14 20,15.79 20,18V20H4V18C4,15.79 7.58,14 12,14Z" />
|
||||
</svg>
|
||||
</div>
|
||||
<h3>User Information</h3>
|
||||
<p>
|
||||
View your authentication details, session information, and IdP profile.
|
||||
Manage background access permissions.
|
||||
</p>
|
||||
</a>
|
||||
</div>
|
||||
{% endif %}
|
||||
|
||||
<!-- Documentation Section -->
|
||||
<div class="info-section" style="margin-top: 40px;">
|
||||
<h2>Documentation</h2>
|
||||
<p>
|
||||
For detailed information about configuration, authentication modes, and advanced features,
|
||||
please refer to the project documentation:
|
||||
</p>
|
||||
<ul>
|
||||
<li><a href="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/docs/installation.md" target="_blank">Installation Guide</a></li>
|
||||
<li><a href="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/docs/configuration.md" target="_blank">Configuration Options</a></li>
|
||||
<li><a href="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/docs/authentication.md" target="_blank">Authentication Modes</a></li>
|
||||
{% if vector_sync_enabled %}
|
||||
<li><a href="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/docs/user-guide/vector-sync-ui.md" target="_blank">Vector Sync UI Guide</a></li>
|
||||
{% endif %}
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- User Info Section -->
|
||||
<div x-show="activeSection === 'user-info'">
|
||||
<div class="content-section">
|
||||
@@ -177,149 +640,8 @@
|
||||
</div>
|
||||
|
||||
<script>
|
||||
// Initialize vizApp for vector visualization (if needed)
|
||||
function vizApp() {
|
||||
return {
|
||||
query: '',
|
||||
algorithm: 'bm25_hybrid',
|
||||
fusion: 'rrf',
|
||||
showAdvanced: false,
|
||||
docTypes: [''],
|
||||
limit: 50,
|
||||
scoreThreshold: 0.0,
|
||||
loading: false,
|
||||
results: [],
|
||||
expandedChunks: {},
|
||||
chunkLoading: {},
|
||||
|
||||
async executeSearch() {
|
||||
this.loading = true;
|
||||
this.results = [];
|
||||
|
||||
try {
|
||||
const params = new URLSearchParams({
|
||||
query: this.query,
|
||||
algorithm: this.algorithm,
|
||||
limit: this.limit,
|
||||
score_threshold: this.scoreThreshold,
|
||||
});
|
||||
|
||||
if (this.algorithm === 'bm25_hybrid') {
|
||||
params.append('fusion', this.fusion);
|
||||
}
|
||||
|
||||
const selectedTypes = this.docTypes.filter(t => t !== '');
|
||||
if (selectedTypes.length > 0) {
|
||||
params.append('doc_types', selectedTypes.join(','));
|
||||
}
|
||||
|
||||
const response = await fetch(`/app/vector-viz/search?${params}`);
|
||||
const data = await response.json();
|
||||
|
||||
if (data.success) {
|
||||
this.results = data.results;
|
||||
this.renderPlot(data.coordinates_2d, data.results);
|
||||
} else {
|
||||
alert('Search failed: ' + data.error);
|
||||
}
|
||||
} catch (error) {
|
||||
alert('Error: ' + error.message);
|
||||
} finally {
|
||||
this.loading = false;
|
||||
}
|
||||
},
|
||||
|
||||
renderPlot(coordinates, results) {
|
||||
const scores = results.map(r => r.score);
|
||||
const trace = {
|
||||
x: coordinates.map(c => c[0]),
|
||||
y: coordinates.map(c => c[1]),
|
||||
mode: 'markers',
|
||||
type: 'scatter',
|
||||
text: results.map(r => `${r.title}<br>Raw Score: ${r.original_score.toFixed(3)} (${(r.score * 100).toFixed(0)}% relative)`),
|
||||
marker: {
|
||||
size: results.map(r => 6 + (Math.pow(r.score, 2) * 14)),
|
||||
opacity: results.map(r => 0.2 + (r.score * 0.8)),
|
||||
color: scores,
|
||||
colorscale: 'Viridis',
|
||||
showscale: true,
|
||||
colorbar: { title: 'Relative Score' },
|
||||
cmin: 0,
|
||||
cmax: 1
|
||||
}
|
||||
};
|
||||
|
||||
const layout = {
|
||||
title: `Vector Space (PCA 2D) - ${results.length} results`,
|
||||
xaxis: { title: 'PC1' },
|
||||
yaxis: { title: 'PC2' },
|
||||
hovermode: 'closest',
|
||||
height: 600
|
||||
};
|
||||
|
||||
Plotly.newPlot('viz-plot', [trace], layout);
|
||||
},
|
||||
|
||||
getNextcloudUrl(result) {
|
||||
const baseUrl = '{{ nextcloud_host_for_links }}';
|
||||
switch (result.doc_type) {
|
||||
case 'note':
|
||||
return `${baseUrl}/apps/notes/note/${result.id}`;
|
||||
case 'file':
|
||||
return `${baseUrl}/apps/files/?fileId=${result.id}`;
|
||||
case 'calendar':
|
||||
return `${baseUrl}/apps/calendar`;
|
||||
case 'contact':
|
||||
return `${baseUrl}/apps/contacts`;
|
||||
case 'deck':
|
||||
return `${baseUrl}/apps/deck`;
|
||||
default:
|
||||
return `${baseUrl}`;
|
||||
}
|
||||
},
|
||||
|
||||
hasChunkPosition(result) {
|
||||
return result.chunk_start_offset != null && result.chunk_end_offset != null;
|
||||
},
|
||||
|
||||
isChunkExpanded(resultKey) {
|
||||
return this.expandedChunks[resultKey] !== undefined;
|
||||
},
|
||||
|
||||
async toggleChunk(result) {
|
||||
const resultKey = `${result.doc_type}_${result.id}`;
|
||||
|
||||
if (this.isChunkExpanded(resultKey)) {
|
||||
delete this.expandedChunks[resultKey];
|
||||
return;
|
||||
}
|
||||
|
||||
this.chunkLoading[resultKey] = true;
|
||||
|
||||
try {
|
||||
const params = new URLSearchParams({
|
||||
doc_type: result.doc_type,
|
||||
doc_id: result.id,
|
||||
start: result.chunk_start_offset,
|
||||
end: result.chunk_end_offset,
|
||||
context: 500
|
||||
});
|
||||
|
||||
const response = await fetch(`/app/chunk-context?${params}`);
|
||||
const data = await response.json();
|
||||
|
||||
if (data.success) {
|
||||
this.expandedChunks[resultKey] = data;
|
||||
} else {
|
||||
alert('Failed to load chunk: ' + data.error);
|
||||
}
|
||||
} catch (error) {
|
||||
alert('Error loading chunk: ' + error.message);
|
||||
} finally {
|
||||
delete this.chunkLoading[resultKey];
|
||||
}
|
||||
}
|
||||
};
|
||||
}
|
||||
// Set global Nextcloud base URL for use in external JS
|
||||
window.NEXTCLOUD_BASE_URL = '{{ nextcloud_host_for_links }}';
|
||||
</script>
|
||||
<script src="/app/static/vector-viz.js"></script>
|
||||
{% endblock %}
|
||||
|
||||
@@ -1,192 +1,4 @@
|
||||
<style>
|
||||
.viz-layout {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 16px;
|
||||
height: 100%;
|
||||
min-height: 0;
|
||||
overflow-y: auto;
|
||||
}
|
||||
.viz-card {
|
||||
background: var(--color-main-background);
|
||||
border-radius: 0;
|
||||
padding: 16px;
|
||||
box-shadow: none;
|
||||
}
|
||||
.viz-controls-card {
|
||||
flex: 0 0 auto;
|
||||
border-bottom: 1px solid var(--color-border);
|
||||
padding-bottom: 16px;
|
||||
}
|
||||
.viz-controls-grid {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
|
||||
gap: 12px;
|
||||
align-items: end;
|
||||
}
|
||||
@media (min-width: 768px) {
|
||||
.viz-controls-grid {
|
||||
grid-template-columns: 2fr 1.5fr 1.5fr auto auto;
|
||||
}
|
||||
}
|
||||
.viz-control-group {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 4px;
|
||||
}
|
||||
.viz-control-group label {
|
||||
font-weight: 500;
|
||||
color: var(--color-main-text);
|
||||
font-size: 13px;
|
||||
}
|
||||
.viz-control-group input[type="text"],
|
||||
.viz-control-group input[type="number"],
|
||||
.viz-control-group select {
|
||||
width: 100%;
|
||||
padding: 7px 10px;
|
||||
border: 1px solid var(--color-border-dark);
|
||||
border-radius: var(--border-radius);
|
||||
font-size: 14px;
|
||||
background: var(--color-main-background);
|
||||
color: var(--color-main-text);
|
||||
}
|
||||
.viz-control-group input:focus,
|
||||
.viz-control-group select:focus {
|
||||
outline: none;
|
||||
border-color: var(--color-primary-element);
|
||||
}
|
||||
.viz-control-group input[type="range"] {
|
||||
width: 100%;
|
||||
}
|
||||
.viz-control-group select[multiple] {
|
||||
min-height: 100px;
|
||||
}
|
||||
.viz-weight-display {
|
||||
display: inline-block;
|
||||
min-width: 40px;
|
||||
text-align: right;
|
||||
color: #666;
|
||||
}
|
||||
.viz-btn {
|
||||
background: var(--color-primary-element);
|
||||
color: white;
|
||||
border: none;
|
||||
padding: 7px 16px;
|
||||
border-radius: var(--border-radius);
|
||||
cursor: pointer;
|
||||
font-size: 14px;
|
||||
font-weight: 500;
|
||||
white-space: nowrap;
|
||||
}
|
||||
.viz-btn:hover {
|
||||
background: #0052a3;
|
||||
}
|
||||
.viz-btn-secondary {
|
||||
background: #6c757d;
|
||||
color: white;
|
||||
border: none;
|
||||
padding: 7px 16px;
|
||||
border-radius: var(--border-radius);
|
||||
cursor: pointer;
|
||||
font-size: 14px;
|
||||
white-space: nowrap;
|
||||
}
|
||||
.viz-btn-secondary:hover {
|
||||
background: #5a6268;
|
||||
}
|
||||
.viz-card-plot {
|
||||
flex: 0 0 auto;
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
min-height: 500px;
|
||||
height: 600px;
|
||||
}
|
||||
#viz-plot-container {
|
||||
width: 100%;
|
||||
height: 100%;
|
||||
position: relative;
|
||||
overflow: visible;
|
||||
}
|
||||
#viz-plot {
|
||||
width: 100%;
|
||||
height: 100%;
|
||||
}
|
||||
.viz-loading {
|
||||
text-align: center;
|
||||
padding: 40px;
|
||||
color: #666;
|
||||
}
|
||||
.viz-loading-overlay {
|
||||
position: absolute;
|
||||
inset: 0;
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
background: white;
|
||||
color: #666;
|
||||
}
|
||||
.viz-no-results {
|
||||
text-align: center;
|
||||
padding: 40px;
|
||||
color: #666;
|
||||
font-style: italic;
|
||||
}
|
||||
.viz-advanced-section {
|
||||
margin-top: 12px;
|
||||
padding: 12px;
|
||||
background: var(--color-background-hover);
|
||||
border-radius: var(--border-radius);
|
||||
border: 1px solid var(--color-border);
|
||||
}
|
||||
.viz-info-box {
|
||||
background: var(--color-primary-element-light);
|
||||
border-left: 3px solid var(--color-primary-element);
|
||||
padding: 10px 12px;
|
||||
margin-bottom: 16px;
|
||||
font-size: 13px;
|
||||
color: var(--color-main-text);
|
||||
}
|
||||
.chunk-toggle-btn {
|
||||
background: #6c757d;
|
||||
color: white;
|
||||
border: none;
|
||||
padding: 4px 10px;
|
||||
border-radius: 3px;
|
||||
cursor: pointer;
|
||||
font-size: 12px;
|
||||
margin-top: 6px;
|
||||
}
|
||||
.chunk-toggle-btn:hover {
|
||||
background: #5a6268;
|
||||
}
|
||||
.chunk-context {
|
||||
background: var(--color-background-hover);
|
||||
border: 1px solid var(--color-border);
|
||||
border-radius: var(--border-radius);
|
||||
padding: 12px;
|
||||
margin-top: 8px;
|
||||
font-family: 'SFMono-Regular', 'Consolas', 'Liberation Mono', 'Menlo', monospace;
|
||||
font-size: 13px;
|
||||
line-height: 1.6;
|
||||
white-space: pre-wrap;
|
||||
word-wrap: break-word;
|
||||
}
|
||||
.chunk-text {
|
||||
color: var(--color-text-maxcontrast);
|
||||
}
|
||||
.chunk-matched {
|
||||
background: #fff3cd;
|
||||
border: 1px solid #ffc107;
|
||||
padding: 2px 4px;
|
||||
border-radius: var(--border-radius);
|
||||
font-weight: 500;
|
||||
color: var(--color-main-text);
|
||||
}
|
||||
.chunk-ellipsis {
|
||||
color: var(--color-text-maxcontrast);
|
||||
font-style: italic;
|
||||
}
|
||||
</style>
|
||||
<link rel="stylesheet" href="/app/static/vector-viz.css">
|
||||
|
||||
<div x-data="vizApp()">
|
||||
<div class="viz-layout">
|
||||
@@ -233,7 +45,7 @@
|
||||
<div class="viz-controls-grid" style="grid-template-columns: repeat(auto-fit, minmax(150px, 1fr));">
|
||||
<div class="viz-control-group">
|
||||
<label>Document Types</label>
|
||||
<div style="display: flex; flex-wrap: wrap; gap: 8px; font-size: 13px;">
|
||||
<div style="display: grid; grid-template-columns: 1fr 1fr; gap: 8px; font-size: 13px;">
|
||||
<label style="display: flex; align-items: center; cursor: pointer; font-weight: normal;">
|
||||
<input type="checkbox" x-model="docTypes" value="" style="margin-right: 4px;">
|
||||
<span>All</span>
|
||||
@@ -268,7 +80,15 @@
|
||||
|
||||
<div class="viz-control-group">
|
||||
<label>Result Limit</label>
|
||||
<input type="number" x-model.number="limit" min="1" max="100" />
|
||||
<input type="number" x-model.number="limit" min="1" max="1000" />
|
||||
</div>
|
||||
|
||||
<div class="viz-control-group">
|
||||
<label>Display Options</label>
|
||||
<label style="display: flex; align-items: center; cursor: pointer; font-weight: normal; margin-top: 4px;">
|
||||
<input type="checkbox" x-model="showQueryPoint" @change="updatePlot()" style="margin-right: 6px;">
|
||||
<span>Show Query Point</span>
|
||||
</label>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
@@ -0,0 +1,392 @@
|
||||
{% extends "base.html" %}
|
||||
|
||||
{% block title %}Welcome - Nextcloud MCP Server{% endblock %}
|
||||
|
||||
{% block extra_head %}
|
||||
<!-- Alpine.js for interactive elements -->
|
||||
<script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"></script>
|
||||
{% endblock %}
|
||||
|
||||
{% block extra_styles %}
|
||||
/* Welcome page specific styles */
|
||||
.hero-section {
|
||||
background: linear-gradient(135deg, var(--color-primary-element) 0%, #0082c9 100%);
|
||||
color: white;
|
||||
padding: 60px 24px;
|
||||
margin: -24px -24px 40px -24px;
|
||||
border-radius: 0 0 var(--border-radius-large) var(--border-radius-large);
|
||||
text-align: center;
|
||||
}
|
||||
|
||||
.hero-section h1 {
|
||||
color: white;
|
||||
font-size: 36px;
|
||||
margin: 0 0 16px 0;
|
||||
font-weight: 600;
|
||||
}
|
||||
|
||||
.hero-section p {
|
||||
font-size: 18px;
|
||||
opacity: 0.95;
|
||||
max-width: 700px;
|
||||
margin: 0 auto;
|
||||
line-height: 1.6;
|
||||
}
|
||||
|
||||
.feature-grid {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(auto-fit, minmax(280px, 1fr));
|
||||
gap: 24px;
|
||||
margin: 32px 0;
|
||||
}
|
||||
|
||||
.feature-card {
|
||||
background: var(--color-main-background);
|
||||
border: 2px solid var(--color-border);
|
||||
border-radius: var(--border-radius-large);
|
||||
padding: 24px;
|
||||
transition: all 0.2s;
|
||||
cursor: pointer;
|
||||
text-decoration: none;
|
||||
color: inherit;
|
||||
display: block;
|
||||
}
|
||||
|
||||
.feature-card:hover {
|
||||
border-color: var(--color-primary-element);
|
||||
box-shadow: 0 4px 12px rgba(0, 103, 158, 0.15);
|
||||
transform: translateY(-2px);
|
||||
}
|
||||
|
||||
.feature-card h3 {
|
||||
color: var(--color-primary-element);
|
||||
font-size: 20px;
|
||||
margin: 12px 0 8px 0;
|
||||
font-weight: 600;
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 12px;
|
||||
}
|
||||
|
||||
.feature-card p {
|
||||
color: var(--color-text-maxcontrast);
|
||||
font-size: 14px;
|
||||
line-height: 1.6;
|
||||
margin: 8px 0 0 0;
|
||||
}
|
||||
|
||||
.feature-icon {
|
||||
width: 48px;
|
||||
height: 48px;
|
||||
background: var(--color-primary-element-light);
|
||||
border-radius: var(--border-radius);
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
margin-bottom: 8px;
|
||||
}
|
||||
|
||||
.feature-icon svg {
|
||||
width: 28px;
|
||||
height: 28px;
|
||||
fill: var(--color-primary-element);
|
||||
}
|
||||
|
||||
.info-section {
|
||||
background: var(--color-background-hover);
|
||||
border-radius: var(--border-radius-large);
|
||||
padding: 32px;
|
||||
margin: 32px 0;
|
||||
}
|
||||
|
||||
.info-section h2 {
|
||||
color: var(--color-main-text);
|
||||
font-size: 24px;
|
||||
margin: 0 0 16px 0;
|
||||
border: none;
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
.info-section p {
|
||||
color: var(--color-text-maxcontrast);
|
||||
line-height: 1.7;
|
||||
margin: 12px 0;
|
||||
}
|
||||
|
||||
.info-section ul {
|
||||
margin: 12px 0;
|
||||
padding-left: 24px;
|
||||
}
|
||||
|
||||
.info-section li {
|
||||
color: var(--color-text-maxcontrast);
|
||||
line-height: 1.7;
|
||||
margin: 8px 0;
|
||||
}
|
||||
|
||||
.info-section code {
|
||||
background: var(--color-main-background);
|
||||
padding: 2px 8px;
|
||||
border-radius: var(--border-radius);
|
||||
font-size: 13px;
|
||||
}
|
||||
|
||||
.auth-status {
|
||||
background: var(--color-primary-element-light);
|
||||
border-left: 4px solid var(--color-primary-element);
|
||||
padding: 16px 20px;
|
||||
margin: 24px 0;
|
||||
border-radius: var(--border-radius);
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 12px;
|
||||
}
|
||||
|
||||
.auth-status svg {
|
||||
width: 24px;
|
||||
height: 24px;
|
||||
fill: var(--color-primary-element);
|
||||
flex-shrink: 0;
|
||||
}
|
||||
|
||||
.auth-status-text {
|
||||
flex: 1;
|
||||
}
|
||||
|
||||
.auth-status-text strong {
|
||||
display: block;
|
||||
color: var(--color-main-text);
|
||||
font-size: 14px;
|
||||
margin-bottom: 4px;
|
||||
}
|
||||
|
||||
.auth-status-text span {
|
||||
color: var(--color-text-maxcontrast);
|
||||
font-size: 13px;
|
||||
}
|
||||
{% endblock %}
|
||||
|
||||
{% block content %}
|
||||
<div class="app-content-wrapper">
|
||||
<!-- Main Content Area -->
|
||||
<main id="app-content">
|
||||
<div class="page-content">
|
||||
<!-- Hero Section -->
|
||||
<div class="hero-section">
|
||||
<h1>Welcome to Nextcloud MCP Server</h1>
|
||||
<p>
|
||||
Interactive user interface for semantic search and document retrieval.
|
||||
Test queries, visualize results, and explore your Nextcloud content using RAG workflows.
|
||||
</p>
|
||||
</div>
|
||||
|
||||
<!-- Authentication Status -->
|
||||
<div class="auth-status">
|
||||
<svg viewBox="0 0 24 24">
|
||||
<path d="M12,4A4,4 0 0,1 16,8A4,4 0 0,1 12,12A4,4 0 0,1 8,8A4,4 0 0,1 12,4M12,14C16.42,14 20,15.79 20,18V20H4V18C4,15.79 7.58,14 12,14Z" />
|
||||
</svg>
|
||||
<div class="auth-status-text">
|
||||
<strong>Authenticated as: {{ username }}</strong>
|
||||
<span>Authentication mode: <code>{{ auth_mode }}</code></span>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
{% if vector_sync_enabled %}
|
||||
<!-- Vector Sync Enabled Content -->
|
||||
<div class="info-section">
|
||||
<h2>About Semantic Search</h2>
|
||||
<p>
|
||||
This interface provides access to <strong>semantic search</strong> capabilities powered by vector embeddings.
|
||||
Unlike traditional keyword search, semantic search understands the <em>meaning</em> of your queries and finds
|
||||
conceptually similar content across your Nextcloud apps.
|
||||
</p>
|
||||
<p>
|
||||
<strong>How it works:</strong>
|
||||
</p>
|
||||
<ul>
|
||||
<li>Documents from Notes, Calendar, Files, Contacts, and Deck are indexed into a vector database</li>
|
||||
<li>Each document chunk is converted to a 768-dimensional vector embedding that captures semantic meaning</li>
|
||||
<li>Queries are also converted to embeddings and matched against document vectors using similarity search</li>
|
||||
<li>Results can be retrieved using pure semantic search or hybrid BM25 search combining keywords and semantics</li>
|
||||
</ul>
|
||||
</div>
|
||||
|
||||
<div class="info-section">
|
||||
<h2>RAG Workflow Integration</h2>
|
||||
<p>
|
||||
This UI allows you to <strong>test the same queries that Large Language Models (LLMs) would use</strong> in a
|
||||
Retrieval-Augmented Generation (RAG) workflow. When an AI assistant needs to answer questions about your data:
|
||||
</p>
|
||||
<ul>
|
||||
<li><strong>Step 1:</strong> The assistant converts your question into a search query</li>
|
||||
<li><strong>Step 2:</strong> The MCP server retrieves relevant document chunks using semantic search</li>
|
||||
<li><strong>Step 3:</strong> Retrieved context is passed to the LLM to generate an informed answer</li>
|
||||
</ul>
|
||||
|
||||
<!-- RAG Workflow Diagram -->
|
||||
<div style="background: var(--color-main-background); border: 2px solid var(--color-primary-element); border-radius: var(--border-radius-large); padding: 24px; margin: 24px 0; font-family: 'SFMono-Regular', 'Consolas', 'Liberation Mono', 'Menlo', monospace; font-size: 13px; line-height: 1.8; overflow-x: auto;">
|
||||
<div style="text-align: center; font-weight: 600; margin-bottom: 16px; color: var(--color-primary-element); font-size: 14px;">
|
||||
MCP Sampling RAG Workflow
|
||||
</div>
|
||||
<pre style="margin: 0; color: var(--color-main-text);">
|
||||
┌─────────────────┐
|
||||
│ <strong>MCP Client</strong> │ User asks: "What are health benefits of coffee?"
|
||||
│ (Claude Code) │
|
||||
└────────┬────────┘
|
||||
│ (1) User question
|
||||
↓
|
||||
┌────────────────────────────────────────────────────────────────────────┐
|
||||
│ <strong>Nextcloud MCP Server</strong> │
|
||||
│ ┌──────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ <strong>nc_semantic_search_answer</strong> Tool (MCP Sampling-enabled) │ │
|
||||
│ │ │ │
|
||||
│ │ (2) Semantic Search │ │
|
||||
│ │ ┌────────────────────────────────────────────────────────┐ │ │
|
||||
│ │ │ Query: "health benefits of coffee" │ │ │
|
||||
│ │ │ → Convert to 768D vector embedding │ │ │
|
||||
│ │ │ → Search Qdrant (BM25 Hybrid + RRF fusion) │ │ │
|
||||
│ │ │ → Retrieve top 5 relevant document chunks │ │ │
|
||||
│ │ └────────────────────────────────────────────────────────┘ │ │
|
||||
│ │ │ │
|
||||
│ │ (3) Construct Prompt with Context │ │
|
||||
│ │ ┌────────────────────────────────────────────────────────┐ │ │
|
||||
│ │ │ "What are health benefits of coffee? │ │ │
|
||||
│ │ │ │ │ │
|
||||
│ │ │ Documents: │ │ │
|
||||
│ │ │ - [MED-2155] Effects of habitual coffee consumption...│ │ │
|
||||
│ │ │ - [MED-1646] Beverage consumption guidance... │ │ │
|
||||
│ │ │ - [MED-1627] Coffee and depression risk... │ │ │
|
||||
│ │ │ ... │ │ │
|
||||
│ │ │ │ │ │
|
||||
│ │ │ Provide answer with citations." │ │ │
|
||||
│ │ └────────────────────────────────────────────────────────┘ │ │
|
||||
│ │ │ │
|
||||
│ │ (4) MCP Sampling Request │ │
|
||||
│ │ ─────────────────────────────────────────────────────────────> │ │
|
||||
│ └──────────────────────────────────────────────────────────────────┘ │
|
||||
└────────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
│ Sampling request with prompt + context
|
||||
↓
|
||||
┌─────────────────┐
|
||||
│ <strong>MCP Client</strong> │ (5) Client's LLM generates answer using retrieved context
|
||||
│ (Claude) │ → "Coffee consumption (2-3 cups/day) is associated with
|
||||
└────────┬────────┘ reduced risk of type 2 diabetes, cardiovascular disease,
|
||||
│ and improved liver health (Document 1, 2)..."
|
||||
│
|
||||
│ (6) Answer with citations
|
||||
↓
|
||||
┌─────────────────┐
|
||||
│ User │ Receives comprehensive answer with source citations
|
||||
└─────────────────┘</pre>
|
||||
</div>
|
||||
|
||||
<p style="margin-top: 16px;">
|
||||
<strong>Key Point:</strong> The MCP server retrieves context but doesn't generate answers itself.
|
||||
Through <strong>MCP sampling</strong>, it requests the client's LLM to generate responses, giving users
|
||||
full control over which model is used and ensuring all processing happens client-side.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
By using this interface, you can preview search results, understand relevance scores, and verify
|
||||
that the system retrieves the right information before it reaches the LLM.
|
||||
</p>
|
||||
</div>
|
||||
|
||||
<!-- Feature Cards -->
|
||||
<h2>Available Features</h2>
|
||||
<div class="feature-grid">
|
||||
<a href="/app/user-info" class="feature-card">
|
||||
<div class="feature-icon">
|
||||
<svg viewBox="0 0 24 24">
|
||||
<path d="M12,4A4,4 0 0,1 16,8A4,4 0 0,1 12,12A4,4 0 0,1 8,8A4,4 0 0,1 12,4M12,14C16.42,14 20,15.79 20,18V20H4V18C4,15.79 7.58,14 12,14Z" />
|
||||
</svg>
|
||||
</div>
|
||||
<h3>User Information</h3>
|
||||
<p>
|
||||
View your authentication details, session information, and IdP profile.
|
||||
Manage background access permissions.
|
||||
</p>
|
||||
</a>
|
||||
|
||||
<a href="/app/user-info#vector-sync" class="feature-card">
|
||||
<div class="feature-icon">
|
||||
<svg viewBox="0 0 24 24">
|
||||
<path d="M12,18A6,6 0 0,1 6,12C6,11 6.25,10.03 6.7,9.2L5.24,7.74C4.46,8.97 4,10.43 4,12A8,8 0 0,0 12,20V23L16,19L12,15M12,4V1L8,5L12,9V6A6,6 0 0,1 18,12C18,13 17.75,13.97 17.3,14.8L18.76,16.26C19.54,15.03 20,13.57 20,12A8,8 0 0,0 12,4Z" />
|
||||
</svg>
|
||||
</div>
|
||||
<h3>Vector Sync Status</h3>
|
||||
<p>
|
||||
Monitor real-time indexing progress with metrics for indexed documents, pending queue,
|
||||
and synchronization status.
|
||||
</p>
|
||||
</a>
|
||||
|
||||
<a href="/app/user-info#vector-viz" class="feature-card">
|
||||
<div class="feature-icon">
|
||||
<svg viewBox="0 0 24 24">
|
||||
<path d="M22,21H2V3H4V19H6V10H10V19H12V6H16V19H18V14H22V21Z" />
|
||||
</svg>
|
||||
</div>
|
||||
<h3>Vector Visualization</h3>
|
||||
<p>
|
||||
Interactive search interface with 2D PCA visualization. Compare algorithms,
|
||||
view relevance scores, and explore matched document chunks.
|
||||
</p>
|
||||
</a>
|
||||
</div>
|
||||
|
||||
{% else %}
|
||||
<!-- Vector Sync Disabled Content -->
|
||||
<div class="warning">
|
||||
<h3 style="margin-top: 0;">Vector Sync is Disabled</h3>
|
||||
<p>
|
||||
Semantic search and vector visualization features are currently disabled.
|
||||
To enable these features, set <code>VECTOR_SYNC_ENABLED=true</code> in your environment configuration.
|
||||
</p>
|
||||
<p style="margin-bottom: 0;">
|
||||
<strong>Learn more:</strong>
|
||||
<a href="https://github.com/YOUR_REPO/docs/configuration.md" target="_blank" style="color: inherit; text-decoration: underline;">
|
||||
Configuration Guide
|
||||
</a>
|
||||
</p>
|
||||
</div>
|
||||
|
||||
<!-- Limited Feature Card -->
|
||||
<h2>Available Features</h2>
|
||||
<div class="feature-grid">
|
||||
<a href="/app/user-info" class="feature-card">
|
||||
<div class="feature-icon">
|
||||
<svg viewBox="0 0 24 24">
|
||||
<path d="M12,4A4,4 0 0,1 16,8A4,4 0 0,1 12,12A4,4 0 0,1 8,8A4,4 0 0,1 12,4M12,14C16.42,14 20,15.79 20,18V20H4V18C4,15.79 7.58,14 12,14Z" />
|
||||
</svg>
|
||||
</div>
|
||||
<h3>User Information</h3>
|
||||
<p>
|
||||
View your authentication details, session information, and IdP profile.
|
||||
Manage background access permissions.
|
||||
</p>
|
||||
</a>
|
||||
</div>
|
||||
{% endif %}
|
||||
|
||||
<!-- Documentation Section -->
|
||||
<div class="info-section" style="margin-top: 40px;">
|
||||
<h2>Documentation</h2>
|
||||
<p>
|
||||
For detailed information about configuration, authentication modes, and advanced features,
|
||||
please refer to the project documentation:
|
||||
</p>
|
||||
<ul>
|
||||
<li><a href="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/docs/installation.md" target="_blank">Installation Guide</a></li>
|
||||
<li><a href="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/docs/configuration.md" target="_blank">Configuration Options</a></li>
|
||||
<li><a href="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/docs/authentication.md" target="_blank">Authentication Modes</a></li>
|
||||
{% if vector_sync_enabled %}
|
||||
<li><a href="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/docs/user-guide/vector-sync-ui.md" target="_blank">Vector Sync UI Guide</a></li>
|
||||
{% endif %}
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
</main>
|
||||
</div>
|
||||
{% endblock %}
|
||||
@@ -623,6 +623,9 @@ async def user_info_html(request: Request) -> HTMLResponse:
|
||||
</div>
|
||||
"""
|
||||
|
||||
# Check if vector sync is enabled (needed for Welcome tab)
|
||||
vector_sync_enabled = os.getenv("VECTOR_SYNC_ENABLED", "false").lower() == "true"
|
||||
|
||||
# Render template
|
||||
template = _jinja_env.get_template("user_info.html")
|
||||
return HTMLResponse(
|
||||
@@ -634,6 +637,10 @@ async def user_info_html(request: Request) -> HTMLResponse:
|
||||
show_webhooks_tab=show_webhooks_tab,
|
||||
logout_url=logout_url if auth_mode == "oauth" else None,
|
||||
nextcloud_host_for_links=nextcloud_host_for_links,
|
||||
# Additional context for Welcome tab
|
||||
vector_sync_enabled=vector_sync_enabled,
|
||||
username=username,
|
||||
auth_mode=auth_mode,
|
||||
)
|
||||
)
|
||||
|
||||
|
||||
@@ -1,13 +1,14 @@
|
||||
"""Vector visualization routes for testing search algorithms.
|
||||
|
||||
Provides a web UI for users to test different search algorithms on their own
|
||||
indexed documents and visualize results in 2D space using PCA.
|
||||
indexed documents and visualize results in 3D space using PCA.
|
||||
|
||||
All processing happens server-side following ADR-012:
|
||||
- Search execution via shared search/algorithms.py
|
||||
- PCA dimensionality reduction (768-dim → 2D)
|
||||
- Only 2D coordinates + metadata sent to client
|
||||
- Bandwidth-efficient (2 floats per doc vs 768)
|
||||
- Query embedding generation
|
||||
- PCA dimensionality reduction (768-dim → 3D)
|
||||
- Only 3D coordinates + metadata sent to client
|
||||
- Bandwidth-efficient (3 floats per doc vs 768)
|
||||
"""
|
||||
|
||||
import logging
|
||||
@@ -77,19 +78,20 @@ async def vector_visualization_html(request: Request) -> HTMLResponse:
|
||||
|
||||
@requires("authenticated", redirect="oauth_login")
|
||||
async def vector_visualization_search(request: Request) -> JSONResponse:
|
||||
"""Execute server-side search and return 2D coordinates + results.
|
||||
"""Execute server-side search and return 3D coordinates + results.
|
||||
|
||||
All processing happens server-side:
|
||||
1. Execute search via shared algorithm module
|
||||
2. Fetch matching vectors from Qdrant
|
||||
3. Apply PCA reduction (768-dim → 2D)
|
||||
4. Return coordinates + metadata only
|
||||
2. Generate query embedding
|
||||
3. Fetch matching vectors from Qdrant
|
||||
4. Apply PCA reduction (768-dim → 3D) to query + documents
|
||||
5. Return coordinates + metadata only
|
||||
|
||||
Args:
|
||||
request: Starlette request with query parameters
|
||||
|
||||
Returns:
|
||||
JSON response with coordinates_2d and results
|
||||
JSON response with coordinates_3d and results (including query point)
|
||||
"""
|
||||
settings = get_settings()
|
||||
|
||||
@@ -209,7 +211,8 @@ async def vector_visualization_search(request: Request) -> JSONResponse:
|
||||
{
|
||||
"success": True,
|
||||
"results": [],
|
||||
"coordinates_2d": [],
|
||||
"coordinates_3d": [],
|
||||
"query_coords": None,
|
||||
"message": "No results found",
|
||||
}
|
||||
)
|
||||
@@ -253,7 +256,7 @@ async def vector_visualization_search(request: Request) -> JSONResponse:
|
||||
}
|
||||
)
|
||||
|
||||
# Extract dense vectors (handle both named and unnamed vectors)
|
||||
# Extract dense vectors and group by document
|
||||
def extract_dense_vector(point):
|
||||
if point.vector is None:
|
||||
return None
|
||||
@@ -263,13 +266,21 @@ async def vector_visualization_search(request: Request) -> JSONResponse:
|
||||
# If unnamed vector (array), use directly
|
||||
return point.vector
|
||||
|
||||
vectors = np.array(
|
||||
[v for v in (extract_dense_vector(p) for p in points) if v is not None]
|
||||
)
|
||||
# Group chunk vectors by doc_id
|
||||
from collections import defaultdict
|
||||
|
||||
doc_chunks = defaultdict(list)
|
||||
for point in points:
|
||||
if point.payload:
|
||||
doc_id = int(point.payload.get("doc_id", 0))
|
||||
vector = extract_dense_vector(point)
|
||||
if vector is not None:
|
||||
doc_chunks[doc_id].append(vector)
|
||||
|
||||
vector_fetch_duration = time.perf_counter() - vector_fetch_start
|
||||
|
||||
if len(vectors) < 2:
|
||||
# Not enough points for PCA
|
||||
if len(doc_chunks) < 2:
|
||||
# Not enough documents for PCA
|
||||
return JSONResponse(
|
||||
{
|
||||
"success": True,
|
||||
@@ -283,35 +294,131 @@ async def vector_visualization_search(request: Request) -> JSONResponse:
|
||||
}
|
||||
for r in search_results
|
||||
],
|
||||
"coordinates_2d": [[0, 0]] * len(search_results),
|
||||
"message": "Not enough vectors for PCA",
|
||||
"coordinates_3d": [[0, 0, 0]] * len(search_results),
|
||||
"query_coords": [0, 0, 0],
|
||||
"message": "Not enough documents for PCA",
|
||||
}
|
||||
)
|
||||
|
||||
# Apply PCA dimensionality reduction (768-dim → 2D)
|
||||
# Detect embedding dimension from first available vector
|
||||
embedding_dim = None
|
||||
for chunks in doc_chunks.values():
|
||||
if chunks:
|
||||
embedding_dim = len(chunks[0])
|
||||
break
|
||||
|
||||
if embedding_dim is None:
|
||||
return JSONResponse(
|
||||
{
|
||||
"success": False,
|
||||
"error": "Could not determine embedding dimension",
|
||||
},
|
||||
status_code=500,
|
||||
)
|
||||
|
||||
logger.info(f"Detected embedding dimension: {embedding_dim}")
|
||||
|
||||
# Average chunk vectors per document to create document-level embeddings
|
||||
# Maintain order of search_results for coordinate mapping
|
||||
doc_vectors = []
|
||||
for result in search_results:
|
||||
if result.id in doc_chunks:
|
||||
# Average all chunk embeddings for this document
|
||||
chunk_vectors = np.array(doc_chunks[result.id])
|
||||
avg_vector = np.mean(chunk_vectors, axis=0)
|
||||
doc_vectors.append(avg_vector)
|
||||
logger.debug(f"Doc {result.id}: averaged {len(chunk_vectors)} chunks")
|
||||
else:
|
||||
# Document not found in vectors (shouldn't happen)
|
||||
logger.warning(f"Doc {result.id} not found in fetched vectors")
|
||||
# Use zero vector as fallback with detected dimension
|
||||
doc_vectors.append(np.zeros(embedding_dim))
|
||||
|
||||
doc_vectors = np.array(doc_vectors)
|
||||
|
||||
# Generate query embedding for visualization
|
||||
query_embed_start = time.perf_counter()
|
||||
from nextcloud_mcp_server.embedding.service import get_embedding_service
|
||||
|
||||
embedding_service = get_embedding_service()
|
||||
query_embedding = await embedding_service.embed(query)
|
||||
query_embed_duration = time.perf_counter() - query_embed_start
|
||||
|
||||
logger.info(f"Generated query embedding (dimension={len(query_embedding)})")
|
||||
|
||||
# Combine query vector with document vectors for PCA
|
||||
# Query will be the last point in the array
|
||||
all_vectors = np.vstack([doc_vectors, np.array([query_embedding])])
|
||||
|
||||
# Normalize vectors to unit length (L2 normalization)
|
||||
# This is critical because Qdrant uses COSINE distance, which only measures
|
||||
# vector direction (angle), not magnitude. PCA uses Euclidean distance which
|
||||
# considers both direction and magnitude. By normalizing to unit length,
|
||||
# Euclidean distances in PCA space will match cosine distances.
|
||||
norms = np.linalg.norm(all_vectors, axis=1, keepdims=True)
|
||||
|
||||
# Check for zero-norm vectors (can happen with empty/corrupted embeddings)
|
||||
zero_norm_mask = norms[:, 0] < 1e-10
|
||||
if zero_norm_mask.any():
|
||||
zero_indices = np.where(zero_norm_mask)[0]
|
||||
logger.warning(
|
||||
f"Found {zero_norm_mask.sum()} zero-norm vectors at indices {zero_indices.tolist()}. "
|
||||
"Replacing with small epsilon to avoid division by zero."
|
||||
)
|
||||
# Replace zero norms with small epsilon to avoid NaN
|
||||
norms[zero_norm_mask] = 1e-10
|
||||
|
||||
all_vectors_normalized = all_vectors / norms
|
||||
logger.info(
|
||||
f"Normalized vectors: query_norm={norms[-1][0]:.3f}, "
|
||||
f"doc_norm_range=[{norms[:-1].min():.3f}, {norms[:-1].max():.3f}]"
|
||||
)
|
||||
|
||||
# Apply PCA dimensionality reduction (768-dim → 3D) on normalized vectors
|
||||
pca_start = time.perf_counter()
|
||||
pca = PCA(n_components=2)
|
||||
coords_2d = pca.fit_transform(vectors)
|
||||
pca = PCA(n_components=3)
|
||||
coords_3d = pca.fit_transform(all_vectors_normalized)
|
||||
pca_duration = time.perf_counter() - pca_start
|
||||
|
||||
# After fit, these attributes are guaranteed to be set
|
||||
assert pca.explained_variance_ratio_ is not None
|
||||
|
||||
logger.info(
|
||||
f"PCA explained variance: PC1={pca.explained_variance_ratio_[0]:.3f}, "
|
||||
f"PC2={pca.explained_variance_ratio_[1]:.3f}"
|
||||
# Check for NaN values in PCA output (numerical instability)
|
||||
nan_mask = np.isnan(coords_3d)
|
||||
if nan_mask.any():
|
||||
nan_rows = np.where(nan_mask.any(axis=1))[0]
|
||||
logger.error(
|
||||
f"Found NaN values in PCA output at {len(nan_rows)} points: {nan_rows.tolist()[:10]}. "
|
||||
"Replacing NaN with 0.0 to prevent JSON serialization error."
|
||||
)
|
||||
# Replace NaN with 0 to allow JSON serialization
|
||||
coords_3d = np.nan_to_num(coords_3d, nan=0.0)
|
||||
|
||||
# Split query coords from document coords
|
||||
# Round to 2 decimal places for cleaner display
|
||||
query_coords_3d = [
|
||||
round(float(x), 2) for x in coords_3d[-1]
|
||||
] # Last point is query
|
||||
doc_coords_3d = coords_3d[:-1] # All but last are documents
|
||||
|
||||
total_chunks = sum(len(chunks) for chunks in doc_chunks.values())
|
||||
avg_chunks_per_doc = (
|
||||
total_chunks / len(doc_vectors) if doc_vectors.size > 0 else 0
|
||||
)
|
||||
|
||||
# Map results to coordinates (use first chunk per document)
|
||||
result_coords = []
|
||||
seen_doc_ids = set()
|
||||
logger.info(
|
||||
f"PCA explained variance: PC1={pca.explained_variance_ratio_[0]:.3f}, "
|
||||
f"PC2={pca.explained_variance_ratio_[1]:.3f}, "
|
||||
f"PC3={pca.explained_variance_ratio_[2]:.3f}"
|
||||
)
|
||||
logger.info(
|
||||
f"Embedding stats: documents={len(doc_vectors)}, "
|
||||
f"total_chunks={total_chunks}, avg_chunks_per_doc={avg_chunks_per_doc:.1f}, "
|
||||
f"query_dim={len(query_embedding)}, doc_vector_dim={doc_vectors.shape[1] if doc_vectors.size > 0 else 0}"
|
||||
)
|
||||
|
||||
for point, coord in zip(points, coords_2d):
|
||||
if point.payload:
|
||||
doc_id = int(point.payload.get("doc_id", 0))
|
||||
if doc_id not in seen_doc_ids and doc_id in doc_ids:
|
||||
seen_doc_ids.add(doc_id)
|
||||
result_coords.append(coord.tolist())
|
||||
# Coordinates already match search_results order (1:1 mapping)
|
||||
result_coords = [[round(float(x), 2) for x in coord] for coord in doc_coords_3d]
|
||||
|
||||
# Build response
|
||||
response_results = [
|
||||
@@ -338,26 +445,30 @@ async def vector_visualization_search(request: Request) -> JSONResponse:
|
||||
f"Viz search timing: total={total_duration * 1000:.1f}ms, "
|
||||
f"search={search_duration * 1000:.1f}ms ({search_duration / total_duration * 100:.1f}%), "
|
||||
f"vector_fetch={vector_fetch_duration * 1000:.1f}ms ({vector_fetch_duration / total_duration * 100:.1f}%), "
|
||||
f"query_embed={query_embed_duration * 1000:.1f}ms ({query_embed_duration / total_duration * 100:.1f}%), "
|
||||
f"pca={pca_duration * 1000:.1f}ms ({pca_duration / total_duration * 100:.1f}%), "
|
||||
f"results={len(search_results)}, vectors={len(vectors)}"
|
||||
f"results={len(search_results)}, doc_vectors={len(doc_vectors)}"
|
||||
)
|
||||
|
||||
return JSONResponse(
|
||||
{
|
||||
"success": True,
|
||||
"results": response_results,
|
||||
"coordinates_2d": result_coords[: len(search_results)],
|
||||
"coordinates_3d": result_coords[: len(search_results)],
|
||||
"query_coords": query_coords_3d,
|
||||
"pca_variance": {
|
||||
"pc1": float(pca.explained_variance_ratio_[0]),
|
||||
"pc2": float(pca.explained_variance_ratio_[1]),
|
||||
"pc3": float(pca.explained_variance_ratio_[2]),
|
||||
},
|
||||
"timing": {
|
||||
"total_ms": round(total_duration * 1000, 2),
|
||||
"search_ms": round(search_duration * 1000, 2),
|
||||
"vector_fetch_ms": round(vector_fetch_duration * 1000, 2),
|
||||
"query_embed_ms": round(query_embed_duration * 1000, 2),
|
||||
"pca_ms": round(pca_duration * 1000, 2),
|
||||
"num_results": len(search_results),
|
||||
"num_vectors": len(vectors),
|
||||
"num_doc_vectors": len(doc_vectors),
|
||||
},
|
||||
}
|
||||
)
|
||||
|
||||
@@ -3,7 +3,7 @@
|
||||
import logging
|
||||
from dataclasses import dataclass
|
||||
|
||||
from langchain_text_splitters import MarkdownTextSplitter
|
||||
from langchain_text_splitters import RecursiveCharacterTextSplitter
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
@@ -20,9 +20,9 @@ class ChunkWithPosition:
|
||||
class DocumentChunker:
|
||||
"""Chunk large documents for optimal embedding using LangChain text splitters.
|
||||
|
||||
Uses MarkdownTextSplitter which is optimized for Markdown content like
|
||||
Nextcloud Notes. Respects markdown structure (headers, code blocks, lists)
|
||||
while maintaining semantic boundaries.
|
||||
Uses RecursiveCharacterTextSplitter which preserves semantic boundaries
|
||||
by splitting on sentence and paragraph boundaries before resorting to
|
||||
character-level splitting.
|
||||
"""
|
||||
|
||||
def __init__(self, chunk_size: int = 2048, overlap: int = 200):
|
||||
@@ -36,15 +36,14 @@ class DocumentChunker:
|
||||
self.chunk_size = chunk_size
|
||||
self.overlap = overlap
|
||||
|
||||
# Initialize LangChain MarkdownTextSplitter
|
||||
# Optimized for Markdown content with special handling for:
|
||||
# - Headers (# ## ###)
|
||||
# - Code blocks (``` ```)
|
||||
# - Lists (- * 1.)
|
||||
# - Horizontal rules (---)
|
||||
# - Paragraphs and sentences
|
||||
# This preserves both markdown structure and semantic boundaries
|
||||
self.splitter = MarkdownTextSplitter(
|
||||
# Initialize LangChain RecursiveCharacterTextSplitter
|
||||
# Uses hierarchical splitting to preserve semantic boundaries:
|
||||
# - Paragraphs (\n\n)
|
||||
# - Sentences (. ! ?)
|
||||
# - Words (spaces)
|
||||
# - Characters (last resort)
|
||||
# This prevents mid-sentence splitting while maintaining semantic coherence
|
||||
self.splitter = RecursiveCharacterTextSplitter(
|
||||
chunk_size=chunk_size,
|
||||
chunk_overlap=overlap,
|
||||
add_start_index=True, # Enable position tracking
|
||||
@@ -55,14 +54,14 @@ class DocumentChunker:
|
||||
"""
|
||||
Split text into overlapping chunks with position tracking.
|
||||
|
||||
Uses LangChain's MarkdownTextSplitter to create chunks that respect
|
||||
both markdown structure and semantic boundaries. Optimized for Nextcloud
|
||||
Notes content with special handling for headers, code blocks, lists, etc.
|
||||
Preserves character positions for each chunk to enable precise document
|
||||
retrieval.
|
||||
Uses LangChain's RecursiveCharacterTextSplitter to create chunks that
|
||||
preserve semantic boundaries by splitting at paragraphs and sentences
|
||||
before resorting to word or character-level splitting. This ensures
|
||||
sentences are kept intact. Preserves character positions for each chunk
|
||||
to enable precise document retrieval.
|
||||
|
||||
Args:
|
||||
content: Markdown text content to chunk
|
||||
content: Text content to chunk
|
||||
|
||||
Returns:
|
||||
List of chunks with their character positions in the original content
|
||||
|
||||
@@ -159,8 +159,8 @@ class TestChunkConfigValidation:
|
||||
def test_default_chunk_settings(self):
|
||||
"""Test default chunk size and overlap values."""
|
||||
settings = Settings()
|
||||
assert settings.document_chunk_size == 512
|
||||
assert settings.document_chunk_overlap == 50
|
||||
assert settings.document_chunk_size == 2048
|
||||
assert settings.document_chunk_overlap == 200
|
||||
|
||||
def test_valid_chunk_settings(self):
|
||||
"""Test valid chunk size and overlap configuration."""
|
||||
@@ -205,7 +205,7 @@ class TestChunkConfigValidation:
|
||||
)
|
||||
|
||||
def test_small_chunk_size_warning(self, caplog):
|
||||
"""Test that chunk size < 100 triggers warning."""
|
||||
"""Test that chunk size < 512 triggers warning."""
|
||||
import logging
|
||||
|
||||
caplog.set_level(logging.WARNING, logger="nextcloud_mcp_server.config")
|
||||
@@ -214,19 +214,19 @@ class TestChunkConfigValidation:
|
||||
document_chunk_overlap=10,
|
||||
)
|
||||
assert (
|
||||
"DOCUMENT_CHUNK_SIZE is set to 64 words, which is quite small"
|
||||
"DOCUMENT_CHUNK_SIZE is set to 64 characters, which is quite small"
|
||||
in caplog.text
|
||||
)
|
||||
assert "Consider using at least 256 words" in caplog.text
|
||||
assert "Consider using at least 1024 characters" in caplog.text
|
||||
|
||||
def test_reasonable_chunk_size_no_warning(self, caplog):
|
||||
"""Test that chunk size >= 100 doesn't trigger warning."""
|
||||
"""Test that chunk size >= 512 doesn't trigger warning."""
|
||||
import logging
|
||||
|
||||
caplog.set_level(logging.WARNING, logger="nextcloud_mcp_server.config")
|
||||
Settings(
|
||||
document_chunk_size=256,
|
||||
document_chunk_overlap=25,
|
||||
document_chunk_size=1024,
|
||||
document_chunk_overlap=100,
|
||||
)
|
||||
assert "DOCUMENT_CHUNK_SIZE" not in caplog.text
|
||||
|
||||
|
||||