Enable users to tune document chunking parameters to match their embedding model and content type by adding DOCUMENT_CHUNK_SIZE and DOCUMENT_CHUNK_OVERLAP environment variables.

- **config.py**: Added `document_chunk_size` (default: 512) and `document_chunk_overlap` (default: 50) configuration fields with validation:
  - Ensures overlap < chunk_size
  - Warns if chunk_size < 100 words
  - Prevents negative overlap values
- **processor.py**: Updated DocumentChunker instantiation to use config settings instead of hardcoded values (line 174-177)
- **tests/unit/test_config.py**: Added TestChunkConfigValidation class with 9 tests covering:
  - Default values
  - Valid configurations
  - Validation errors (overlap >= chunk_size, negative overlap)
  - Warning for small chunk sizes
  - Environment variable loading
- **docs/configuration.md**: Added comprehensive "Document Chunking Configuration" section with:
  - Chunk size selection guidance (256-384 vs 512 vs 768-1024 words)
  - Overlap recommendations (10-20% of chunk size)
  - Configuration examples for different use cases
  - Added env vars to reference table
- **docs/semantic-search-architecture.md**: Added "Document Chunking Strategy" section with:
  - Chunking process explanation
  - Example showing sliding window behavior
  - Search behavior with chunks
  - Tuning recommendations
- **env.sample**: Added complete "Semantic Search & Vector Sync Configuration" section with:
  - Vector sync settings
  - Qdrant configuration (3 modes)
  - Ollama embedding service
  - Document chunking configuration
- **docker-compose.yml**: Added commented examples for DOCUMENT_CHUNK_SIZE and DOCUMENT_CHUNK_OVERLAP with usage notes

```bash
DOCUMENT_CHUNK_SIZE=512
DOCUMENT_CHUNK_OVERLAP=50
```

1. `overlap` must be less than `chunk_size`
2. `overlap` cannot be negative
3. Warning issued if `chunk_size` < 100 words

**Precise matching** (small notes, specific queries):

```bash
DOCUMENT_CHUNK_SIZE=256
DOCUMENT_CHUNK_OVERLAP=25
```

**Balanced** (default, general purpose):

```bash
DOCUMENT_CHUNK_SIZE=512
DOCUMENT_CHUNK_OVERLAP=50
```

**Contextual** (long documents, broader topics):

```bash
DOCUMENT_CHUNK_SIZE=1024
DOCUMENT_CHUNK_OVERLAP=100
```

- ✅ **User control** - Tune chunking to match embedding model capabilities
- ✅ **Experimentation** - Test different chunk sizes for optimal results
- ✅ **Model alignment** - Match chunk size to embedding context window
- ✅ **Backward compatible** - Defaults maintain existing behavior
- ✅ **Well validated** - Comprehensive tests prevent misconfiguration

All 22 config validation tests pass (9 new tests for chunking):

- Default values work correctly
- Validation prevents invalid configurations
- Environment variables load properly
- Warning system works as expected

With configurable chunk sizes, users can now experiment with different Ollama embedding models and tune chunk parameters for optimal semantic search quality.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Configuration
The Nextcloud MCP server requires configuration to connect to your Nextcloud instance. Configuration is provided through environment variables, typically stored in a .env file.
Quick Start
Create a .env file based on env.sample:
cp env.sample .env
# Edit .env with your Nextcloud details
Then choose your authentication mode:
- OAuth2/OIDC Configuration (Recommended)
- Basic Authentication Configuration
OAuth2/OIDC Configuration
OAuth2/OIDC is the recommended authentication mode for production deployments.
Minimal Configuration (Auto-registration)
# .env file for OAuth with auto-registration
NEXTCLOUD_HOST=https://your.nextcloud.instance.com
# Leave these EMPTY for OAuth mode
NEXTCLOUD_USERNAME=
NEXTCLOUD_PASSWORD=
This minimal configuration uses dynamic client registration to automatically register an OAuth client at startup.
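The rule that empty credentials select OAuth mode can be sketched as follows. This is a minimal illustration, not the server's actual detection code: `detect_auth_mode` is a hypothetical helper, and treating a half-filled credential pair as an error is an assumption.

```python
from typing import Mapping


def detect_auth_mode(env: Mapping[str, str]) -> str:
    """Return "oauth" or "basic" depending on which credentials are set.

    Hypothetical helper; the server's real detection logic may differ.
    """
    username = env.get("NEXTCLOUD_USERNAME", "").strip()
    password = env.get("NEXTCLOUD_PASSWORD", "").strip()

    if username and password:
        return "basic"
    if not username and not password:
        return "oauth"
    # Assumption: only one of the two values being set is a configuration mistake.
    raise ValueError(
        "Set both NEXTCLOUD_USERNAME and NEXTCLOUD_PASSWORD, or leave both empty"
    )
```

Passing `os.environ` as `env` would apply the same check to the variables loaded from `.env`.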
Full Configuration (Pre-configured Client)
# .env file for OAuth with pre-configured client
NEXTCLOUD_HOST=https://your.nextcloud.instance.com
# OAuth Client Credentials (optional - auto-registers if not provided)
NEXTCLOUD_OIDC_CLIENT_ID=your-client-id
NEXTCLOUD_OIDC_CLIENT_SECRET=your-client-secret
# OAuth Callback Settings (optional)
NEXTCLOUD_MCP_SERVER_URL=http://localhost:8000
# Leave these EMPTY for OAuth mode
NEXTCLOUD_USERNAME=
NEXTCLOUD_PASSWORD=
Environment Variables Reference
| Variable | Required | Default | Description |
|---|---|---|---|
| NEXTCLOUD_HOST | ✅ Yes | - | Full URL of your Nextcloud instance (e.g., https://cloud.example.com) |
| NEXTCLOUD_OIDC_CLIENT_ID | ⚠️ Optional | - | OAuth client ID (auto-registers if empty) |
| NEXTCLOUD_OIDC_CLIENT_SECRET | ⚠️ Optional | - | OAuth client secret (auto-registers if empty) |
| NEXTCLOUD_MCP_SERVER_URL | ⚠️ Optional | http://localhost:8000 | MCP server URL for OAuth callbacks |
| NEXTCLOUD_USERNAME | ❌ Must be empty | - | Leave empty to enable OAuth mode |
| NEXTCLOUD_PASSWORD | ❌ Must be empty | - | Leave empty to enable OAuth mode |
Prerequisites
Before using OAuth configuration:
- Install required Nextcloud apps (both are required):
  - oidc - OIDC Identity Provider (Apps → Security)
  - user_oidc - OpenID Connect user backend (Apps → Security)
- Configure the apps:
  - Enable dynamic client registration (if using auto-registration) - Settings → OIDC
  - Enable Bearer token validation:
    php occ config:system:set user_oidc oidc_provider_bearer_validation --value=true --type=boolean
- Apply Bearer token patch - The user_oidc app requires a patch for non-OCS endpoints - See Upstream Status for details
See the OAuth Setup Guide for detailed step-by-step instructions, or OAuth Quick Start for a 5-minute setup.
Basic Authentication (Legacy)
Basic Authentication is maintained for backward compatibility. It uses username and password credentials.
Warning
Security Notice: Basic Authentication stores credentials in environment variables and is less secure than OAuth. Use OAuth for production deployments.
Configuration
# .env file for BasicAuth mode
NEXTCLOUD_HOST=https://your.nextcloud.instance.com
NEXTCLOUD_USERNAME=your_nextcloud_username
NEXTCLOUD_PASSWORD=your_app_password_or_password
Environment Variables Reference
| Variable | Required | Description |
|---|---|---|
| NEXTCLOUD_HOST | ✅ Yes | Full URL of your Nextcloud instance |
| NEXTCLOUD_USERNAME | ✅ Yes | Your Nextcloud username |
| NEXTCLOUD_PASSWORD | ✅ Yes | Recommended: Use a dedicated Nextcloud App Password. Generate one in Nextcloud Security settings. Alternatively, use your login password (less secure). |
Semantic Search Configuration (Optional)
The MCP server includes semantic search capabilities powered by vector embeddings. This feature requires a vector database (Qdrant) and an embedding service.
Qdrant Vector Database Modes
The server supports three Qdrant deployment modes:
- In-Memory Mode (Default) - Simplest for development and testing
- Persistent Local Mode - For single-instance deployments with persistence
- Network Mode - For production with dedicated Qdrant service
1. In-Memory Mode (Default)
No configuration needed! If neither QDRANT_URL nor QDRANT_LOCATION is set, the server defaults to in-memory mode:
# No Qdrant configuration needed - defaults to :memory:
VECTOR_SYNC_ENABLED=true
Pros:
- Zero configuration
- Fast startup
- Perfect for testing
Cons:
- Data lost on restart
- Limited to available RAM
2. Persistent Local Mode
For single-instance deployments that need persistence without a separate Qdrant service:
# Local persistent storage
QDRANT_LOCATION=/app/data/qdrant # Or any writable path
VECTOR_SYNC_ENABLED=true
Pros:
- Data persists across restarts
- No separate service needed
- Suitable for small/medium deployments
Cons:
- Limited to single instance
- Shares resources with MCP server
3. Network Mode
For production deployments with a dedicated Qdrant service:
# Network mode configuration
QDRANT_URL=http://qdrant:6333
QDRANT_API_KEY=your-secret-api-key # Optional
QDRANT_COLLECTION=nextcloud_content # Optional
VECTOR_SYNC_ENABLED=true
Pros:
- Scalable and performant
- Can be shared across multiple MCP instances
- Supports clustering and replication
Cons:
- Requires separate Qdrant service
- More complex deployment
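The choice between the three modes can be summarized in a few lines of Python. This is a sketch under stated assumptions: `resolve_qdrant_mode` and the returned dictionary keys are hypothetical and only illustrate the precedence described above, not the server's own configuration code.

```python
from typing import Mapping


def resolve_qdrant_mode(env: Mapping[str, str]) -> dict:
    """Map environment variables to one of the three documented Qdrant modes."""
    url = env.get("QDRANT_URL", "").strip()
    location = env.get("QDRANT_LOCATION", "").strip()

    if url and location:
        # The reference table lists these two variables as mutually exclusive.
        raise ValueError("QDRANT_URL and QDRANT_LOCATION are mutually exclusive")
    if url:
        settings = {"mode": "network", "url": url}
        api_key = env.get("QDRANT_API_KEY", "").strip()
        if api_key:
            settings["api_key"] = api_key  # optional, network mode only
        return settings
    if location and location != ":memory:":
        return {"mode": "persistent", "path": location}
    # Neither variable set (or an explicit ":memory:"): in-memory default.
    return {"mode": "memory", "location": ":memory:"}
```

Rejecting a configuration that sets both variables is a design choice of this sketch; it surfaces the conflict at startup instead of silently preferring one mode.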
Qdrant Collection Naming
Collection names are automatically generated to include the embedding model, ensuring safe model switching and preventing dimension mismatches.
Auto-Generated Naming (Default)
Format: {deployment-id}-{model-name}
Components:
- Deployment ID: OTEL_SERVICE_NAME (if configured) or hostname (fallback)
- Model name: OLLAMA_EMBEDDING_MODEL
Examples:
# With OTEL service name configured
OTEL_SERVICE_NAME=my-mcp-server
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
# → Collection: "my-mcp-server-nomic-embed-text"
# Simple Docker deployment (OTEL not configured)
# hostname=mcp-container
OLLAMA_EMBEDDING_MODEL=all-minilm
# → Collection: "mcp-container-all-minilm"
Switching Embedding Models
When you change OLLAMA_EMBEDDING_MODEL, a new collection is automatically created:
# Initial setup
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
# Collection: "my-server-nomic-embed-text" (768 dimensions)
# Change model
OLLAMA_EMBEDDING_MODEL=all-minilm
# Collection: "my-server-all-minilm" (384 dimensions)
# → New collection created, full re-embedding occurs
Important:
- Collections are mutually exclusive - vectors cannot be shared between different embedding models
- Switching models requires re-embedding all documents (may take time for large note collections)
- Old collection remains in Qdrant and can be deleted manually if no longer needed
Explicit Override
Set QDRANT_COLLECTION to use a specific collection name:
QDRANT_COLLECTION=my-custom-collection # Bypasses auto-generation
Use cases:
- Backward compatibility with existing deployments
- Custom naming schemes
- Sharing a collection across deployments (advanced)
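The naming rules, including the explicit override, could be expressed roughly as follows. `collection_name` is a hypothetical helper written for illustration; the real implementation may normalize names differently (for example, model tags containing `:` are not handled here).

```python
import socket
from typing import Mapping


def collection_name(env: Mapping[str, str]) -> str:
    """Return the Qdrant collection name implied by the environment."""
    explicit = env.get("QDRANT_COLLECTION", "").strip()
    if explicit:
        return explicit  # explicit override bypasses auto-generation

    # Deployment ID: OTEL_SERVICE_NAME if configured, otherwise the hostname.
    deployment_id = env.get("OTEL_SERVICE_NAME", "").strip() or socket.gethostname()
    # Model name: falls back to the documented default model.
    model = env.get("OLLAMA_EMBEDDING_MODEL", "").strip() or "nomic-embed-text"
    return f"{deployment_id}-{model}"
```

Because the model name is part of the result, changing OLLAMA_EMBEDDING_MODEL yields a different collection name without touching the old collection.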
Multi-Server Deployments
Each server should have a unique deployment ID to avoid collection collisions:
# Server 1 (Production)
OTEL_SERVICE_NAME=mcp-prod
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
# → Collection: "mcp-prod-nomic-embed-text"
# Server 2 (Staging)
OTEL_SERVICE_NAME=mcp-staging
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
# → Collection: "mcp-staging-nomic-embed-text"
# Server 3 (Different model)
OTEL_SERVICE_NAME=mcp-experimental
OLLAMA_EMBEDDING_MODEL=bge-large
# → Collection: "mcp-experimental-bge-large"
Benefits:
- Multiple MCP servers can share one Qdrant instance safely
- No naming collisions between deployments
- Clear collection ownership (the name shows which deployment and model a collection belongs to)
Dimension Validation
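The server validates collection dimensions on startup. If the existing collection does not match the configured embedding model, it reports an error such as: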
Dimension mismatch for collection 'my-server-nomic-embed-text':
Expected: 384 (from embedding model 'all-minilm')
Found: 768
This usually means you changed the embedding model.
Solutions:
1. Delete the old collection: Collection will be recreated with new dimensions
2. Set QDRANT_COLLECTION to use a different collection name
3. Revert OLLAMA_EMBEDDING_MODEL to the original model
What this prevents:
- Runtime errors from dimension mismatches
- Data corruption in Qdrant
- Confusing error messages during indexing
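A startup check of this kind can be sketched in a few lines. The names `DimensionMismatchError` and `check_dimensions` are hypothetical, and the caller is assumed to have already looked up the model's dimension and the collection's stored vector size.

```python
class DimensionMismatchError(ValueError):
    """Hypothetical error type for a collection/model dimension mismatch."""


def check_dimensions(collection: str, model: str, expected: int, found: int) -> None:
    """Raise if the collection's vector size differs from the model's output size."""
    if expected == found:
        return
    raise DimensionMismatchError(
        f"Dimension mismatch for collection '{collection}':\n"
        f"  Expected: {expected} (from embedding model '{model}')\n"
        f"  Found: {found}"
    )
```

For example, `check_dimensions("my-server-nomic-embed-text", "all-minilm", 384, 768)` raises, which mirrors the situation in the error message above.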
Vector Sync Configuration
Control background indexing behavior:
# Vector sync settings (ADR-007)
VECTOR_SYNC_ENABLED=true # Enable background indexing
VECTOR_SYNC_SCAN_INTERVAL=300 # Scan interval in seconds (default: 5 minutes)
VECTOR_SYNC_PROCESSOR_WORKERS=3 # Concurrent indexing workers (default: 3)
VECTOR_SYNC_QUEUE_MAX_SIZE=10000 # Max queued documents (default: 10000)
# Document chunking settings (for vector embeddings)
DOCUMENT_CHUNK_SIZE=512 # Words per chunk (default: 512)
DOCUMENT_CHUNK_OVERLAP=50 # Overlapping words between chunks (default: 50)
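To illustrate how the worker count and the queue limit interact, here is a generic bounded-queue worker pool using `asyncio`. It is only a sketch of the pattern, not the server's indexing code: `run_indexing` is a hypothetical function, and the `asyncio.sleep(0)` call stands in for chunking, embedding, and storing a document.

```python
import asyncio


async def run_indexing(doc_ids, workers=3, queue_max_size=10000):
    """Index documents with a bounded queue and a fixed pool of workers."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=queue_max_size)
    indexed = []

    async def worker():
        while True:
            doc_id = await queue.get()
            try:
                await asyncio.sleep(0)  # placeholder for chunk + embed + upsert
                indexed.append(doc_id)
            finally:
                queue.task_done()

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    for doc_id in doc_ids:
        # put() waits while the queue is full, so the scanner cannot outrun the workers.
        await queue.put(doc_id)
    await queue.join()  # wait until every queued document has been processed
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return indexed
```

In this pattern, more workers raise throughput at the cost of more concurrent embedding requests, while the queue size bounds how much pending work is held in memory.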
Embedding Service Configuration
The server uses an embedding service to generate vector representations. Two options are available:
Ollama (Recommended)
Use a local Ollama instance for embeddings:
OLLAMA_BASE_URL=http://ollama:11434
OLLAMA_EMBEDDING_MODEL=nomic-embed-text # Default model
OLLAMA_VERIFY_SSL=true # Verify SSL certificates
Simple Embedding Provider (Fallback)
If OLLAMA_BASE_URL is not set, the server uses a simple random embedding provider for testing. This is not suitable for production as it generates random embeddings with no semantic meaning.
Document Chunking Configuration
The server chunks documents before embedding to handle documents larger than the embedding model's context window. Chunk size and overlap can be tuned based on your embedding model and content type.
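Word-based chunking with overlap can be pictured as a sliding window over the document's words. The following is a minimal sketch that assumes whitespace-separated words; `chunk_words` is a hypothetical function and is not the server's DocumentChunker.

```python
def chunk_words(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks that share `overlap` words with the previous chunk."""
    if overlap < 0 or overlap >= chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    if not words:
        return []

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults, the window advances 462 words at a time, so a 1,200-word note produces three chunks, each beginning with the last 50 words of the one before it.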
Choosing Chunk Size
Smaller chunks (256-384 words):
- More precise matching
- Less context per chunk
- Better for finding specific information
- Higher storage requirements (more vectors)
Larger chunks (768-1024 words):
- More context per chunk
- Less precise matching
- Better for understanding broader topics
- Lower storage requirements (fewer vectors)
Default (512 words):
- Balanced approach suitable for most use cases
- Works well with typical note lengths
- Good compromise between precision and context
Choosing Overlap
Overlap preserves context across chunk boundaries. Recommended settings:
- 10-20% of chunk size (e.g., 50-100 words for 512-word chunks)
- Too small (<10%): May lose context at boundaries
- Too large (>20%): Redundant storage, diminishing returns
Examples:
# Precise matching for short notes
DOCUMENT_CHUNK_SIZE=256
DOCUMENT_CHUNK_OVERLAP=25
# Default balanced configuration
DOCUMENT_CHUNK_SIZE=512
DOCUMENT_CHUNK_OVERLAP=50
# More context for long documents
DOCUMENT_CHUNK_SIZE=1024
DOCUMENT_CHUNK_OVERLAP=100
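Important: Changing chunk size or overlap requires re-embedding all documents. Auto-generated collection names (see "Qdrant Collection Naming" above) are built from the deployment ID and embedding model only, so they do not change with chunk settings; to keep vectors from different chunk configurations separate, set QDRANT_COLLECTION to a distinct name.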
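The constraints on these two variables (overlap below chunk size, no negative overlap, and a warning for chunk sizes under 100 words) can be sketched as a small loader. `load_chunk_settings` is a hypothetical helper and the message wording is an assumption; the project's own validation lives in its configuration code.

```python
import warnings
from typing import Mapping


def load_chunk_settings(env: Mapping[str, str]) -> tuple[int, int]:
    """Read and validate DOCUMENT_CHUNK_SIZE and DOCUMENT_CHUNK_OVERLAP."""
    chunk_size = int(env.get("DOCUMENT_CHUNK_SIZE", "512"))
    overlap = int(env.get("DOCUMENT_CHUNK_OVERLAP", "50"))

    if overlap < 0:
        raise ValueError("DOCUMENT_CHUNK_OVERLAP cannot be negative")
    if overlap >= chunk_size:
        raise ValueError("DOCUMENT_CHUNK_OVERLAP must be less than DOCUMENT_CHUNK_SIZE")
    if chunk_size < 100:
        # Small chunks are allowed, but they carry little context per vector.
        warnings.warn(f"DOCUMENT_CHUNK_SIZE={chunk_size} is below 100 words")
    return chunk_size, overlap
```

Failing fast on an invalid pair avoids a chunker whose window never advances, which is what an overlap equal to or larger than the chunk size would cause.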
Environment Variables Reference
| Variable | Required | Default | Description |
|---|---|---|---|
| QDRANT_URL | ⚠️ Optional | - | Qdrant service URL (network mode) - mutually exclusive with QDRANT_LOCATION |
| QDRANT_LOCATION | ⚠️ Optional | :memory: | Local Qdrant path (:memory: or /path/to/data) - mutually exclusive with QDRANT_URL |
| QDRANT_API_KEY | ⚠️ Optional | - | Qdrant API key (network mode only) |
| QDRANT_COLLECTION | ⚠️ Optional | nextcloud_content | Qdrant collection name |
| VECTOR_SYNC_ENABLED | ⚠️ Optional | false | Enable background vector indexing |
| VECTOR_SYNC_SCAN_INTERVAL | ⚠️ Optional | 300 | Document scan interval (seconds) |
| VECTOR_SYNC_PROCESSOR_WORKERS | ⚠️ Optional | 3 | Concurrent indexing workers |
| VECTOR_SYNC_QUEUE_MAX_SIZE | ⚠️ Optional | 10000 | Max queued documents |
| OLLAMA_BASE_URL | ⚠️ Optional | - | Ollama API endpoint for embeddings |
| OLLAMA_EMBEDDING_MODEL | ⚠️ Optional | nomic-embed-text | Embedding model to use |
| OLLAMA_VERIFY_SSL | ⚠️ Optional | true | Verify SSL certificates |
| DOCUMENT_CHUNK_SIZE | ⚠️ Optional | 512 | Words per chunk for document embedding |
| DOCUMENT_CHUNK_OVERLAP | ⚠️ Optional | 50 | Overlapping words between chunks (must be < chunk size) |
Docker Compose Example
Enable network mode Qdrant with docker-compose:
services:
  mcp:
    environment:
      - QDRANT_URL=http://qdrant:6333
      - VECTOR_SYNC_ENABLED=true
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - 127.0.0.1:6333:6333
    volumes:
      - qdrant-data:/qdrant/storage
    profiles:
      - qdrant # Optional service
volumes:
  qdrant-data:
Start with Qdrant service:
docker-compose --profile qdrant up
Or use default in-memory mode (no --profile needed):
docker-compose up
Loading Environment Variables
After creating your .env file, load the environment variables:
On Linux/macOS
# Load all variables from .env (set -a exports every variable assigned while it is active)
set -a
. ./.env
set +a
On Windows (PowerShell)
# Load variables from .env
Get-Content .env | ForEach-Object {
    if ($_ -match '^\s*([^#][^=]*)\s*=\s*(.*)$') {
        [Environment]::SetEnvironmentVariable($matches[1].Trim(), $matches[2].Trim(), "Process")
    }
}
Via Docker
# Docker reads the variables from .env when you pass --env-file
docker run -p 127.0.0.1:8000:8000 --env-file .env --rm \
ghcr.io/cbcoutinho/nextcloud-mcp-server:latest
CLI Configuration
Some configuration options can also be provided via CLI arguments. CLI arguments take precedence over environment variables.
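The precedence rule can be illustrated with a small resolver: a CLI value wins, then the environment variable, then the built-in default. `resolve_option` is a hypothetical helper written for this example and is not part of the server's CLI code.

```python
from typing import Mapping, Optional


def resolve_option(
    cli_value: Optional[str],
    env: Mapping[str, str],
    env_name: str,
    default: Optional[str] = None,
) -> Optional[str]:
    """Pick a setting: CLI argument first, then environment variable, then default."""
    if cli_value is not None:
        return cli_value
    env_value = env.get(env_name, "").strip()
    if env_value:
        return env_value
    return default
```

For instance, passing `--mcp-server-url` on the command line would override NEXTCLOUD_MCP_SERVER_URL, and with neither set the documented default of http://localhost:8000 applies.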
OAuth-related CLI Options
uv run nextcloud-mcp-server --help
Options:
--oauth / --no-oauth Force OAuth mode (if enabled) or
BasicAuth mode (if disabled). By default,
auto-detected based on environment
variables.
--oauth-client-id TEXT OAuth client ID (can also use
NEXTCLOUD_OIDC_CLIENT_ID env var)
--oauth-client-secret TEXT OAuth client secret (can also use
NEXTCLOUD_OIDC_CLIENT_SECRET env var)
--mcp-server-url TEXT MCP server URL for OAuth callbacks (can
also use NEXTCLOUD_MCP_SERVER_URL env
var) [default: http://localhost:8000]
Server Options
Options:
-h, --host TEXT Server host [default: 127.0.0.1]
-p, --port INTEGER Server port [default: 8000]
-w, --workers INTEGER Number of worker processes
-r, --reload Enable auto-reload
-l, --log-level [critical|error|warning|info|debug|trace]
Logging level [default: info]
-t, --transport [sse|streamable-http|http]
MCP transport protocol [default: sse]
App Selection
Options:
-e, --enable-app [notes|tables|webdav|calendar|contacts|deck]
Enable specific Nextcloud app APIs. Can
be specified multiple times. If not
specified, all apps are enabled.
Example CLI Usage
# OAuth mode with custom client and port
uv run nextcloud-mcp-server --oauth \
--oauth-client-id abc123 \
--oauth-client-secret xyz789 \
--port 8080
# BasicAuth mode with specific apps only
uv run nextcloud-mcp-server --no-oauth \
--enable-app notes \
--enable-app calendar
Configuration Best Practices
For Development
- Use BasicAuth for quick setup and testing
- Or use OAuth with auto-registration (dynamic client registration)
- Store .env file in your project directory
- Add .env to .gitignore
For Production
- Always use OAuth2/OIDC with pre-configured clients
- Store OAuth client credentials securely
- Use environment variables from your deployment platform (Docker secrets or Kubernetes Secrets for credentials, Kubernetes ConfigMaps for non-sensitive settings, etc.)
- Never commit credentials to version control
- SQLite database permissions are handled automatically by the server
For Docker
- Mount OAuth client storage as a volume for persistence:
  docker run -v $(pwd)/.oauth:/app/.oauth --env-file .env \
    ghcr.io/cbcoutinho/nextcloud-mcp-server:latest
- Use Docker secrets for sensitive values in production
See Also
- OAuth Quick Start - 5-minute OAuth setup for development
- OAuth Setup Guide - Detailed OAuth configuration for production
- OAuth Architecture - How OAuth works in the MCP server
- Upstream Status - Required patches and upstream PRs
- Authentication - Authentication modes comparison
- Running the Server - Starting the server with different configurations
- Troubleshooting - Common configuration issues
- OAuth Troubleshooting - OAuth-specific troubleshooting