feat(config): consolidate configuration with smart dependency resolution (ADR-021)

Simplifies configuration by consolidating overlapping settings and adding automatic dependency resolution. This makes semantic search significantly easier to configure while maintaining 100% backward compatibility.

## Key Changes

### Variable Renaming (Backward Compatible)
- `VECTOR_SYNC_ENABLED` → `ENABLE_SEMANTIC_SEARCH` (old name still works)
- `ENABLE_OFFLINE_ACCESS` → `ENABLE_BACKGROUND_OPERATIONS` (old name still works)
- Deprecation warnings are logged when old names are used
- Old names will be removed in v1.0.0

### Smart Dependency Resolution
- `ENABLE_SEMANTIC_SEARCH` automatically enables background operations in multi-user modes
- No need to set both `ENABLE_OFFLINE_ACCESS` and `VECTOR_SYNC_ENABLED` anymore
- Single-user mode does not auto-enable background operations (not needed)

### Explicit Mode Selection (Optional)
- New `MCP_DEPLOYMENT_MODE` environment variable
- Valid values: `single_user_basic`, `multi_user_basic`, `oauth_single_audience`, `oauth_token_exchange`, `smithery`
- Removes ambiguity about which deployment mode is active
- Falls back to auto-detection if not set (existing behavior)

### Configuration Templates
- Reorganized `env.sample` by deployment mode with clear sections
- Added mode-specific quick-start templates:
  - `env.sample.single-user` - Simplest configuration
  - `env.sample.oauth-multi-user` - Recommended multi-user setup
  - `env.sample.oauth-advanced` - Token exchange mode

## Implementation Details

### Files Modified
- `nextcloud_mcp_server/config.py` - Smart dependency resolution helpers
- `nextcloud_mcp_server/config_validators.py` - Simplified validation, explicit mode
- `tests/unit/test_config_validators.py` - 19 new tests (60 total, all passing)
- `env.sample` - Reorganized by deployment mode
- `docs/configuration.md` - Complete rewrite with consolidated approach
- `docs/troubleshooting.md` - New consolidation troubleshooting section
- `README.md` - Updated variable references

### New Files
- `docs/ADR-021-configuration-consolidation.md` - Architecture decision record
- `docs/configuration-migration-v2.md` - Comprehensive migration guide
- `env.sample.single-user` - Single-user quick-start template
- `env.sample.oauth-multi-user` - OAuth multi-user quick-start template
- `env.sample.oauth-advanced` - Token exchange quick-start template

## User Impact

### Before (Confusing)
```bash
ENABLE_OFFLINE_ACCESS=true   # Why both?
VECTOR_SYNC_ENABLED=true     # What's the relationship?
```

### After (Simplified)
```bash
MCP_DEPLOYMENT_MODE=oauth_single_audience  # Explicit (optional)
ENABLE_SEMANTIC_SEARCH=true                # Auto-enables background ops!
```

### Benefits
- 📉 One variable instead of two to enable semantic search
- 📋 Clear intent ("I want semantic search")
- 🎯 Explicit mode declaration available
- 🔄 100% backward compatible
- ✅ All 265 unit tests passing

## Testing
- All 60 config validation tests passing
- 10 new tests for configuration consolidation
- 9 new tests for explicit mode selection
- Full unit test suite: 265 tests passing
- Backward compatibility verified

## Migration
Users can migrate at their own pace. Old variable names continue to work and log deprecation warnings. See `docs/configuration-migration-v2.md` for detailed migration instructions.

Related: ADR-021

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
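The smart dependency resolution and explicit mode selection described in this commit message can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in `nextcloud_mcp_server/config.py`: the helper names (`detect_mode`, `background_operations_enabled`, `_flag`), the accepted boolean spellings, and the auto-detection order are all hypothetical. The multi-user set excludes `smithery`, assuming that mode needs no background operations because `env.sample` describes it as having no persistent storage and no vector sync.

```python
from collections.abc import Mapping

# Valid values for MCP_DEPLOYMENT_MODE, as listed in this commit.
VALID_MODES = {
    "single_user_basic",
    "multi_user_basic",
    "oauth_single_audience",
    "oauth_token_exchange",
    "smithery",
}

# Modes in which semantic search needs stored credentials for background work.
# Assumption: smithery is excluded (stateless, no vector sync).
MULTI_USER_MODES = {"multi_user_basic", "oauth_single_audience", "oauth_token_exchange"}


def _flag(env: Mapping[str, str], name: str) -> bool:
    """Interpret an environment variable as a boolean (hypothetical parsing rule)."""
    return env.get(name, "").strip().lower() in {"1", "true", "yes", "on"}


def detect_mode(env: Mapping[str, str]) -> str:
    """Return the explicit mode if set and valid, otherwise auto-detect one.

    The auto-detection order below is illustrative, not the server's heuristic.
    """
    explicit = env.get("MCP_DEPLOYMENT_MODE", "").strip()
    if explicit:
        if explicit not in VALID_MODES:
            # Fail early instead of silently falling back to auto-detection.
            raise ValueError(
                f"Invalid MCP_DEPLOYMENT_MODE={explicit!r}; "
                f"expected one of {sorted(VALID_MODES)}"
            )
        return explicit
    if _flag(env, "ENABLE_TOKEN_EXCHANGE"):
        return "oauth_token_exchange"
    if _flag(env, "ENABLE_MULTI_USER_BASIC_AUTH"):
        return "multi_user_basic"
    if env.get("NEXTCLOUD_USERNAME") and env.get("NEXTCLOUD_PASSWORD"):
        return "single_user_basic"
    # No credentials and no multi-user flags: assume OAuth via DCR.
    return "oauth_single_audience"


def background_operations_enabled(env: Mapping[str, str], mode: str) -> bool:
    """Semantic search implies background operations, but only in multi-user modes."""
    if _flag(env, "ENABLE_BACKGROUND_OPERATIONS"):
        return True  # An explicit setting is always honored.
    return _flag(env, "ENABLE_SEMANTIC_SEARCH") and mode in MULTI_USER_MODES
```

For example, `{"ENABLE_MULTI_USER_BASIC_AUTH": "true", "ENABLE_SEMANTIC_SEARCH": "true"}` resolves to `multi_user_basic` with background operations enabled, while the same search flag alongside single-user credentials leaves them off. In this sketch an invalid explicit mode raises immediately, matching the `env.sample` advice to set `MCP_DEPLOYMENT_MODE` to catch configuration errors early.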
2025-12-21 20:36:36 +01:00
parent 34273ec01e
commit 1a5bb10cd0
12 changed files with 2257 additions and 259 deletions
@@ -1,203 +1,236 @@
-# Nextcloud Instance
+# ============================================
+# DEPLOYMENT MODE SELECTION
+# ============================================
+# Optional: Explicitly declare deployment mode (ADR-021)
+# If not set, mode is auto-detected from other settings
+# Valid values: single_user_basic, multi_user_basic, oauth_single_audience,
+#               oauth_token_exchange, smithery
+#
+# Recommendation: Set this for clarity and to catch configuration errors early
+#MCP_DEPLOYMENT_MODE=oauth_single_audience
+
+# ============================================
+# COMMON SETTINGS (Required for all modes)
+# ============================================
+# Your Nextcloud instance URL (without trailing slash)
 NEXTCLOUD_HOST=

-# ===== AUTHENTICATION MODE =====
-# Choose ONE of the following:
-
-# Option 1: OAuth2/OIDC (RECOMMENDED - More Secure)
-# - Requires Nextcloud OIDC app installed and configured
-# - Admin must enable "Dynamic Client Registration" in OIDC app settings
-# - Leave NEXTCLOUD_USERNAME and NEXTCLOUD_PASSWORD empty to use OAuth mode
-# - OAuth client credentials are stored encrypted in SQLite (TOKEN_STORAGE_DB)
-# - Optional: Pre-register client and provide credentials (otherwise auto-registers)
-NEXTCLOUD_OIDC_CLIENT_ID=
-NEXTCLOUD_OIDC_CLIENT_SECRET=
-NEXTCLOUD_MCP_SERVER_URL=http://localhost:8000
-
-# OAuth Storage Configuration (SQLite storage for OAuth clients and refresh tokens)
-# TOKEN_ENCRYPTION_KEY: Required for encrypting OAuth client secrets and refresh tokens
-# Generate with: python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
-#TOKEN_ENCRYPTION_KEY=
-# TOKEN_STORAGE_DB: Path to SQLite database (default: /app/data/tokens.db)
-#TOKEN_STORAGE_DB=/app/data/tokens.db
-
-# ===== ADR-004 PROGRESSIVE CONSENT CONFIGURATION =====
-# Enable Progressive Consent mode (dual OAuth flows)
-# When enabled: Flow 1 for client auth, Flow 2 for Nextcloud resource access
-# When disabled: Uses existing hybrid flow (backward compatible)
-
-# MCP Server OAuth Client Configuration
-# The MCP server's own OAuth client credentials for Flow 2
-# If not set, will use dynamic client registration
-#MCP_SERVER_CLIENT_ID=
-#MCP_SERVER_CLIENT_SECRET=
-
-# Allowed MCP Client IDs (comma-separated list)
-# Client IDs that are allowed to authenticate in Flow 1
-# Examples: claude-desktop,continue-dev,zed-editor
-#ALLOWED_MCP_CLIENTS=claude-desktop,continue-dev,zed-editor
-
-# Token cache configuration for Token Broker Service
-# Cache TTL in seconds (default: 300 = 5 minutes)
-#TOKEN_CACHE_TTL=300
-# Early refresh threshold in seconds (default: 30)
-#TOKEN_CACHE_EARLY_REFRESH=30
-
-# Option 2: Basic Authentication (LEGACY - Less Secure)
-# - Requires username and password
-# - Credentials stored in environment variables
-# - Use only for backward compatibility or if OAuth unavailable
-# - If these are set, OAuth mode is disabled
+# ============================================
+# SINGLE-USER BASICAUTH MODE
+# ============================================
+# Simplest deployment - one user, credentials in environment
+# Use for: Personal instances, local development, testing
+#
+# Required:
 NEXTCLOUD_USERNAME=
 NEXTCLOUD_PASSWORD=
+#
+# Optional features (semantic search, document processing):
+# See "Optional Features" section below

+# ============================================
+# MULTI-USER BASICAUTH MODE
+# ============================================
+# Users provide credentials in request headers (pass-through)
+# Use for: Multi-user without OAuth, simple shared deployments
+#
+# Required:
+#ENABLE_MULTI_USER_BASIC_AUTH=true
+#
+# Optional - Background Operations (for semantic search, future features):
+# Enable background token storage using app passwords (via Astrolabe)
+# Required for semantic search in multi-user mode
+# Note: ENABLE_SEMANTIC_SEARCH automatically enables this in multi-user modes
+#ENABLE_BACKGROUND_OPERATIONS=true
+#NEXTCLOUD_OIDC_CLIENT_ID=
+#NEXTCLOUD_OIDC_CLIENT_SECRET=
+#TOKEN_ENCRYPTION_KEY=
+#TOKEN_STORAGE_DB=/app/data/tokens.db
+#
+# Optional features (semantic search, document processing):
+# See "Optional Features" section below
+
+# ============================================
+# OAUTH SINGLE-AUDIENCE MODE (Recommended)
+# ============================================
+# Multi-user OAuth with single-audience tokens
+# Use for: Multi-user production deployments, enhanced security
+# Tokens work for both MCP server and Nextcloud APIs (pass-through)
+#
+# Required: None (uses Dynamic Client Registration if credentials not provided)
+#
+# Optional - Pre-registered OAuth Client:
+# If you pre-register the client instead of using DCR:
+#NEXTCLOUD_OIDC_CLIENT_ID=
+#NEXTCLOUD_OIDC_CLIENT_SECRET=
+#
+# Optional - Background Operations (for semantic search, future features):
+# Enable refresh token storage for offline access
+# Note: ENABLE_SEMANTIC_SEARCH automatically enables this in multi-user modes
+#ENABLE_BACKGROUND_OPERATIONS=true
+#TOKEN_ENCRYPTION_KEY=
+#TOKEN_STORAGE_DB=/app/data/tokens.db
+#
+# Optional - Custom OIDC Discovery:
+# Auto-detected from NEXTCLOUD_HOST if not set
+#NEXTCLOUD_OIDC_DISCOVERY_URL=
+#
+# Optional - Custom Scopes:
+# Default: openid profile email offline_access notes:* calendar:* contacts:* tables:* webdav:* deck:* cookbook:*
+#NEXTCLOUD_OIDC_SCOPES=openid profile email notes:* calendar:*
+#
+# MCP Server URL (for OAuth redirects):
+#NEXTCLOUD_MCP_SERVER_URL=http://localhost:8000
+#
+# Optional features (semantic search, document processing):
+# See "Optional Features" section below
+
+# ============================================
+# OAUTH TOKEN EXCHANGE MODE (Advanced)
+# ============================================
+# Multi-user OAuth with RFC 8693 token exchange
+# Use for: Advanced deployments requiring separate MCP and Nextcloud tokens
+# MCP tokens are separate from Nextcloud tokens
+#
+# Required:
+#ENABLE_TOKEN_EXCHANGE=true
+#
+# Optional - Pre-registered OAuth Client:
+# If you pre-register the client instead of using DCR:
+#NEXTCLOUD_OIDC_CLIENT_ID=
+#NEXTCLOUD_OIDC_CLIENT_SECRET=
+#
+# Optional - Token Exchange Configuration:
+# Cache TTL in seconds (default: 300 = 5 minutes)
+#TOKEN_EXCHANGE_CACHE_TTL=300
+#
+# Optional - Background Operations:
+# Note: ENABLE_SEMANTIC_SEARCH automatically enables this in multi-user modes
+#ENABLE_BACKGROUND_OPERATIONS=true
+#TOKEN_ENCRYPTION_KEY=
+#TOKEN_STORAGE_DB=/app/data/tokens.db
+#
+# Optional - Custom OIDC Discovery:
+#NEXTCLOUD_OIDC_DISCOVERY_URL=
+#
+# MCP Server URL (for OAuth redirects):
+#NEXTCLOUD_MCP_SERVER_URL=http://localhost:8000
+#
+# Optional features (semantic search, document processing):
+# See "Optional Features" section below
+
+# ============================================
+# SMITHERY STATELESS MODE
+# ============================================
+# Stateless multi-tenant deployment for Smithery platform
+# Configuration comes from session URL parameters
+# No persistent storage, no OAuth, no vector sync
+#
+# Required: None (all config from session URL)
+# This mode is activated automatically when deployed to Smithery
+
+# ============================================
+# OPTIONAL FEATURES (All Deployment Modes)
+# ============================================
+
+# ===== SEMANTIC SEARCH =====
+# AI-powered semantic search across Nextcloud content
+# Requires: Qdrant vector database + embedding provider (Ollama, Bedrock, or Simple fallback)
+#
+# Enable semantic search:
+#ENABLE_SEMANTIC_SEARCH=true
+#
+# Note for Multi-User Modes:
+# ENABLE_SEMANTIC_SEARCH automatically enables background operations when needed
+# No need to set ENABLE_BACKGROUND_OPERATIONS separately
+# The server will automatically request refresh tokens and store them encrypted
+#
+# Vector Database - Choose ONE mode:
+# 1. In-memory (default): Set neither QDRANT_URL nor QDRANT_LOCATION
+# 2. Persistent local: Set QDRANT_LOCATION=/path/to/data
+# 3. Network: Set QDRANT_URL=http://qdrant:6333
+#
+#QDRANT_URL=http://qdrant:6333
+#QDRANT_LOCATION=:memory:
+#QDRANT_API_KEY=
+#QDRANT_COLLECTION=nextcloud_content
+#
+# Embedding Provider - Choose ONE:
+# 1. Ollama (recommended for local deployment):
+#OLLAMA_BASE_URL=http://ollama:11434
+#OLLAMA_EMBEDDING_MODEL=nomic-embed-text
+#OLLAMA_VERIFY_SSL=true
+#
+# 2. Amazon Bedrock (for AWS deployments):
+#AWS_REGION=us-east-1
+#BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
+# Optional: AWS credentials (uses credential chain if not set)
+#AWS_ACCESS_KEY_ID=
+#AWS_SECRET_ACCESS_KEY=
+#
+# 3. Simple (automatic fallback, no configuration needed)
+# Uses basic in-memory embeddings if no provider configured
+#
+# Document Chunking:
+# Configure how documents are split before embedding
+#DOCUMENT_CHUNK_SIZE=512
+#DOCUMENT_CHUNK_OVERLAP=50
+
+# ===== SEMANTIC SEARCH TUNING =====
+# Advanced parameters for vector sync background operations
+# Only modify if you understand the implications
+#
+# Document scan interval in seconds (default: 300 = 5 minutes)
+#VECTOR_SYNC_SCAN_INTERVAL=300
+#
+# Concurrent indexing workers (default: 3)
+#VECTOR_SYNC_PROCESSOR_WORKERS=3
+#
+# Max queued documents (default: 10000)
+#VECTOR_SYNC_QUEUE_MAX_SIZE=10000
+
+# ===== DOCUMENT PROCESSING =====
+# Extract text from PDFs, images, DOCX, etc. for semantic search
+# Disabled by default
+#
+#ENABLE_DOCUMENT_PROCESSING=false
+#DOCUMENT_PROCESSOR=unstructured
+#
+# Unstructured.io Processor (recommended):
+#ENABLE_UNSTRUCTURED=false
+#UNSTRUCTURED_API_URL=http://unstructured:8000
+#UNSTRUCTURED_TIMEOUT=120
+#UNSTRUCTURED_STRATEGY=auto
+#UNSTRUCTURED_LANGUAGES=eng,deu
+#PROGRESS_INTERVAL=10
+#
+# Tesseract OCR (lightweight, images only):
+#ENABLE_TESSERACT=false
+#TESSERACT_CMD=/usr/bin/tesseract
+#TESSERACT_LANG=eng
+#
+# Custom Processor (your own API):
+#ENABLE_CUSTOM_PROCESSOR=false
+#CUSTOM_PROCESSOR_NAME=my_ocr
+#CUSTOM_PROCESSOR_URL=http://localhost:9000/process
+#CUSTOM_PROCESSOR_API_KEY=
+#CUSTOM_PROCESSOR_TIMEOUT=60
+#CUSTOM_PROCESSOR_TYPES=application/pdf,image/jpeg,image/png
+
+# ===== SECURITY & ADVANCED =====
 # Cookie security (browser UI)
 # Auto-detects from NEXTCLOUD_HOST protocol if not set
-# Set explicitly for non-standard setups
 #COOKIE_SECURE=true

 # ============================================
-# Document Processing Configuration
+# DEPRECATED VARIABLES (Backward Compatibility)
 # ============================================
-# Enable document processing (PDF, DOCX, images, etc.)
-# Set to false to disable all document processing
-ENABLE_DOCUMENT_PROCESSING=false
-
-# Default processor to use when multiple are available
-# Options: unstructured, tesseract, custom
-DOCUMENT_PROCESSOR=unstructured
-
-# ============================================
-# Unstructured.io Processor
-# ============================================
-# Enable Unstructured processor (requires unstructured service in docker-compose)
-# This is a cloud-based/API processor supporting many document types
-ENABLE_UNSTRUCTURED=false
-
-# Unstructured API endpoint
-UNSTRUCTURED_API_URL=http://unstructured:8000
-
-# Request timeout in seconds (default: 120)
-# OCR operations can take 30-120 seconds for large documents
-UNSTRUCTURED_TIMEOUT=120
-
-# Parsing strategy: auto, fast, hi_res
-# - auto: Automatically choose based on document type
-# - fast: Fast parsing without OCR
-# - hi_res: High-resolution with OCR (slowest, most accurate)
-UNSTRUCTURED_STRATEGY=auto
-
-# OCR languages (comma-separated ISO 639-3 codes)
-# Common: eng=English, deu=German, fra=French, spa=Spanish
-UNSTRUCTURED_LANGUAGES=eng,deu
-
-# Progress reporting interval in seconds (default: 10)
-# During long-running OCR operations, progress notifications are sent to the MCP client
-# at this interval to prevent timeouts and provide status updates
-PROGRESS_INTERVAL=10
-
-# ============================================
-# Tesseract Processor (Local OCR)
-# ============================================
-# Enable Tesseract processor (requires tesseract binary installed)
-# This is a local, lightweight OCR solution for images only
-ENABLE_TESSERACT=false
-
-# Path to tesseract executable (optional, auto-detected if in PATH)
-#TESSERACT_CMD=/usr/bin/tesseract
-
-# OCR language (e.g., eng, deu, eng+deu for multiple)
-TESSERACT_LANG=eng
-
-# ============================================
-# Custom Processor (Your own API)
-# ============================================
-# Enable custom document processor via HTTP API
-ENABLE_CUSTOM_PROCESSOR=false
-
-# Unique name for your processor
-#CUSTOM_PROCESSOR_NAME=my_ocr
-
-# Your custom processor API endpoint
-#CUSTOM_PROCESSOR_URL=http://localhost:9000/process
-
-# Optional API key for authentication
-#CUSTOM_PROCESSOR_API_KEY=your-api-key-here
-
-# Request timeout in seconds
-#CUSTOM_PROCESSOR_TIMEOUT=60
-
-# Comma-separated MIME types your processor supports
-#CUSTOM_PROCESSOR_TYPES=application/pdf,image/jpeg,image/png
-
-# ============================================
-# Semantic Search & Vector Sync Configuration
-# ============================================
-# EXPERIMENTAL: Semantic search for Notes app (multi-app support planned)
-# Requires: Qdrant vector database + Ollama embedding service
-# Disabled by default
-
-# Enable background vector indexing
-VECTOR_SYNC_ENABLED=false
-
-# Document scan interval in seconds (default: 300 = 5 minutes)
-# How often to check for new/updated documents
-#VECTOR_SYNC_SCAN_INTERVAL=300
-
-# Concurrent indexing workers (default: 3)
-# Number of parallel workers for embedding generation
-#VECTOR_SYNC_PROCESSOR_WORKERS=3
-
-# Max queued documents (default: 10000)
-# Maximum documents waiting to be processed
-#VECTOR_SYNC_QUEUE_MAX_SIZE=10000
-
-# ============================================
-# Qdrant Vector Database Configuration
-# ============================================
-# Choose ONE of three modes:
-# 1. In-memory mode (default): Set neither QDRANT_URL nor QDRANT_LOCATION
-# 2. Persistent local: Set QDRANT_LOCATION=/path/to/data
-# 3. Network mode: Set QDRANT_URL=http://qdrant:6333
-
-# Network mode: URL to Qdrant service
-#QDRANT_URL=http://qdrant:6333
-
-# Local mode: Path to store vectors (use :memory: for in-memory)
-#QDRANT_LOCATION=:memory:
-
-# API key for network mode (optional)
-#QDRANT_API_KEY=
-
-# Collection name (optional - auto-generated if not set)
-# Auto-generation format: {deployment-id}-{model-name}
-# Allows safe model switching and multi-server deployments
-#QDRANT_COLLECTION=nextcloud_content
-
-# ============================================
-# Ollama Embedding Service Configuration
-# ============================================
-# Ollama endpoint for embeddings (if not set, uses SimpleEmbeddingProvider fallback)
-#OLLAMA_BASE_URL=http://ollama:11434
-
-# Embedding model to use (default: nomic-embed-text, 768 dimensions)
-# Changing this creates a new collection (requires re-embedding all documents)
-#OLLAMA_EMBEDDING_MODEL=nomic-embed-text
-
-# Verify SSL certificates (default: true)
-#OLLAMA_VERIFY_SSL=true
-
-# ============================================
-# Document Chunking Configuration
-# ============================================
-# Configure how documents are split before embedding
-
-# Words per chunk (default: 512)
-# Smaller chunks (256-384): More precise, less context, more storage
-# Larger chunks (768-1024): More context, less precise, less storage
-#DOCUMENT_CHUNK_SIZE=512
-
-# Overlapping words between chunks (default: 50)
-# Recommended: 10-20% of chunk size
-# Preserves context across chunk boundaries
-#DOCUMENT_CHUNK_OVERLAP=50
+# These variables still work but will be removed in v1.0.0
+# Please migrate to new names:
+#
+# Old Name                 → New Name
+# VECTOR_SYNC_ENABLED      → ENABLE_SEMANTIC_SEARCH
+# ENABLE_OFFLINE_ACCESS    → ENABLE_BACKGROUND_OPERATIONS
+#
+# Migration is optional - both old and new names work
+# Deprecation warnings will be logged when old names are used
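The deprecated-name table above amounts to a fallback lookup that logs a warning. The following Python sketch illustrates that pattern; the `get_setting` helper and `DEPRECATED_ALIASES` mapping are hypothetical, and it assumes the new name takes precedence when both are set. The actual precedence and warning text in this commit may differ.

```python
import logging
from collections.abc import Mapping
from typing import Optional

logger = logging.getLogger(__name__)

# Old name -> new name, as listed in the DEPRECATED VARIABLES section.
DEPRECATED_ALIASES = {
    "VECTOR_SYNC_ENABLED": "ENABLE_SEMANTIC_SEARCH",
    "ENABLE_OFFLINE_ACCESS": "ENABLE_BACKGROUND_OPERATIONS",
}


def get_setting(env: Mapping[str, str], new_name: str) -> Optional[str]:
    """Read a setting by its new name, falling back to a deprecated alias.

    Assumption: the new name wins when both old and new names are set.
    """
    if new_name in env:
        return env[new_name]
    for old_name, target in DEPRECATED_ALIASES.items():
        if target == new_name and old_name in env:
            # Old name still works, but tell the operator to migrate.
            logger.warning(
                "%s is deprecated and will be removed in v1.0.0; use %s instead",
                old_name,
                new_name,
            )
            return env[old_name]
    return None  # Neither name is set; caller applies its own default.
```

With this sketch, `get_setting({"VECTOR_SYNC_ENABLED": "true"}, "ENABLE_SEMANTIC_SEARCH")` returns `"true"` and logs a deprecation warning, so existing deployments keep working while the warning nudges them toward the new name.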