Chris Coutinho
1707b2e6e1
feat: add self-signed SSL certificate support for Nextcloud connections
...
Add NEXTCLOUD_VERIFY_SSL and NEXTCLOUD_CA_BUNDLE env vars to configure
TLS certificate verification for all outbound Nextcloud connections.
Centralizes SSL config via a new HTTP client factory (http.py) used by
all 27 Nextcloud-bound call sites, including API clients, OIDC endpoints,
OAuth flows, and health checks.
Closes #560
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-02-16 09:21:21 +01:00
Chris Coutinho
1a5bb10cd0
feat(config): consolidate configuration with smart dependency resolution (ADR-021)
...
Simplifies configuration by consolidating overlapping settings and adding
automatic dependency resolution. This makes semantic search configuration
significantly easier for users while maintaining 100% backward compatibility.
## Key Changes
### Variable Renaming (Backward Compatible)
- `VECTOR_SYNC_ENABLED` → `ENABLE_SEMANTIC_SEARCH` (old name still works)
- `ENABLE_OFFLINE_ACCESS` → `ENABLE_BACKGROUND_OPERATIONS` (old name still works)
- Deprecation warnings logged when old names used
- Old names will be removed in v1.0.0
### Smart Dependency Resolution
- `ENABLE_SEMANTIC_SEARCH` automatically enables background operations in multi-user modes
- No need to set both `ENABLE_OFFLINE_ACCESS` and `VECTOR_SYNC_ENABLED` anymore
- Single-user mode doesn't auto-enable background ops (not needed)
### Explicit Mode Selection (Optional)
- New `MCP_DEPLOYMENT_MODE` environment variable
- Valid values: single_user_basic, multi_user_basic, oauth_single_audience,
oauth_token_exchange, smithery
- Removes ambiguity about which deployment mode is active
- Falls back to auto-detection if not set (existing behavior)
### Configuration Templates
- Reorganized `env.sample` by deployment mode with clear sections
- Added mode-specific quick-start templates:
- `env.sample.single-user` - Simplest configuration
- `env.sample.oauth-multi-user` - Recommended multi-user
- `env.sample.oauth-advanced` - Token exchange mode
## Implementation Details
### Files Modified
- `nextcloud_mcp_server/config.py` - Smart dependency resolution helpers
- `nextcloud_mcp_server/config_validators.py` - Simplified validation, explicit mode
- `tests/unit/test_config_validators.py` - 19 new tests (60 total, all passing)
- `env.sample` - Reorganized by deployment mode
- `docs/configuration.md` - Complete rewrite with consolidated approach
- `docs/troubleshooting.md` - New consolidation troubleshooting section
- `README.md` - Updated variable references
### New Files
- `docs/ADR-021-configuration-consolidation.md` - Architecture decision record
- `docs/configuration-migration-v2.md` - Comprehensive migration guide
- `env.sample.single-user` - Single-user quick-start template
- `env.sample.oauth-multi-user` - OAuth multi-user quick-start template
- `env.sample.oauth-advanced` - Token exchange quick-start template
## User Impact
### Before (Confusing)
```bash
ENABLE_OFFLINE_ACCESS=true # Why both?
VECTOR_SYNC_ENABLED=true # What's the relationship?
```
### After (Simplified)
```bash
MCP_DEPLOYMENT_MODE=oauth_single_audience # Explicit (optional)
ENABLE_SEMANTIC_SEARCH=true # Auto-enables background ops!
```
### Benefits
- 📉 2 fewer variables to understand for semantic search
- 📋 Clear intent ("I want semantic search")
- 🎯 Explicit mode declaration available
- 🔄 100% backward compatible
- ✅ All 265 unit tests passing
## Testing
- All 60 config validation tests passing
- 10 new tests for configuration consolidation
- 9 new tests for explicit mode selection
- Full unit test suite: 265 tests passing
- Backward compatibility verified
## Migration
Users can migrate at their own pace. Old variable names continue working
with deprecation warnings. See docs/configuration-migration-v2.md for
detailed migration instructions.
Related: ADR-021
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com >
2025-12-21 20:36:36 +01:00
Chris Coutinho
daabd90359
fix(security): address critical security issues from PR #401 code review
...
Implemented 6 critical security fixes identified during PR #401 review:
1. Token Rotation Race Condition (Issue 1)
- Added in-progress marker pattern to prevent concurrent refresh
- Prevents token invalidation when multiple requests refresh simultaneously
- File: token_broker.py:324, 343-390
2. Hardcoded Localhost URL (Issue 2)
- Added getNextcloudBaseUrl() with fallback chain
- Supports overwrite.cli.url, trusted_domains, and localhost fallback
- File: IdpTokenRefresher.php:38-61, 116
3. Error Information Leakage (Issue 3)
- Replaced 13 instances of str(e) with sanitized errors
- Prevents exposure of stack traces, paths, and tokens
- File: management.py:368, 444, 492, 510, 546, 571, 625, 643, 695, 750, 919, 956, 1121
4. Input Validation Gaps (Issue 4)
- Added validation helpers: _parse_int_param, _parse_float_param, _validate_query_string
- Applied bounds checking to get_chunk_context and unified_search
- File: management.py:119-164, 807-835, 1197-1212
5. PHP Refresh Token Validation (Issue 5)
- Added explicit refresh_token presence check
- Prevents silent token rotation failures
- File: IdpTokenRefresher.php:122-132
6. Cookie Security Configuration (Issue 6)
- Added _should_use_secure_cookies() with auto-detection
- Supports explicit COOKIE_SECURE env var or auto-detect from NEXTCLOUD_HOST
- Files: browser_oauth_routes.py:27-44, 470; env.sample:54-57
Testing:
- Unit tests: 195 passed
- Integration tests: 102 passed, 4 skipped
- OAuth tests: 9 passed
- All linting and type checks passed
Follow-up work tracked in issues #408-#417
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com >
2025-12-19 13:57:33 +01:00
Chris Coutinho
cb39b3fca4
feat(vector): Add configurable chunk size and overlap for document embedding
...
Enable users to tune document chunking parameters to match their embedding
model and content type by adding DOCUMENT_CHUNK_SIZE and DOCUMENT_CHUNK_OVERLAP
environment variables.
- **config.py**: Added `document_chunk_size` (default: 512) and
`document_chunk_overlap` (default: 50) configuration fields with validation:
- Ensures overlap < chunk_size
- Warns if chunk_size < 100 words
- Prevents negative overlap values
- **processor.py**: Updated DocumentChunker instantiation to use config
settings instead of hardcoded values (line 174-177)
- **tests/unit/test_config.py**: Added TestChunkConfigValidation class with
9 tests covering:
- Default values
- Valid configurations
- Validation errors (overlap >= chunk_size, negative overlap)
- Warning for small chunk sizes
- Environment variable loading
- **docs/configuration.md**: Added comprehensive "Document Chunking
Configuration" section with:
- Chunk size selection guidance (256-384 vs 512 vs 768-1024 words)
- Overlap recommendations (10-20% of chunk size)
- Configuration examples for different use cases
- Added env vars to reference table
- **docs/semantic-search-architecture.md**: Added "Document Chunking Strategy"
section with:
- Chunking process explanation
- Example showing sliding window behavior
- Search behavior with chunks
- Tuning recommendations
- **env.sample**: Added complete "Semantic Search & Vector Sync Configuration"
section with:
- Vector sync settings
- Qdrant configuration (3 modes)
- Ollama embedding service
- Document chunking configuration
- **docker-compose.yml**: Added commented examples for DOCUMENT_CHUNK_SIZE and
DOCUMENT_CHUNK_OVERLAP with usage notes
\`\`\`bash
DOCUMENT_CHUNK_SIZE=512
DOCUMENT_CHUNK_OVERLAP=50
\`\`\`
1. \`overlap\` must be less than \`chunk_size\`
2. \`overlap\` cannot be negative
3. Warning issued if \`chunk_size\` < 100 words
**Precise matching** (small notes, specific queries):
\`\`\`bash
DOCUMENT_CHUNK_SIZE=256
DOCUMENT_CHUNK_OVERLAP=25
\`\`\`
**Balanced** (default, general purpose):
\`\`\`bash
DOCUMENT_CHUNK_SIZE=512
DOCUMENT_CHUNK_OVERLAP=50
\`\`\`
**Contextual** (long documents, broader topics):
\`\`\`bash
DOCUMENT_CHUNK_SIZE=1024
DOCUMENT_CHUNK_OVERLAP=100
\`\`\`
✅ **User control** - Tune chunking to match embedding model capabilities
✅ **Experimentation** - Test different chunk sizes for optimal results
✅ **Model alignment** - Match chunk size to embedding context window
✅ **Backward compatible** - Defaults maintain existing behavior
✅ **Well validated** - Comprehensive tests prevent misconfiguration
All 22 config validation tests pass (9 new tests for chunking):
- Default values work correctly
- Validation prevents invalid configurations
- Environment variables load properly
- Warning system works as expected
With configurable chunk sizes, users can now experiment with different Ollama
embedding models and tune chunk parameters for optimal semantic search quality.
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-10 02:47:57 +01:00
Chris Coutinho
15113dbb03
fix: remove Hybrid Flow, make Progressive Consent default (ADR-004)
...
Eliminates scope escalation security vulnerability by removing Hybrid Flow
and making Progressive Consent the only OAuth mode.
Changes:
- Delete oauth_callback() and oauth_token() (Hybrid Flow only, ~314 lines)
- Fix scope flows: Flow 1 requests resource scopes, Flow 2 requests identity+offline
- Remove ENABLE_PROGRESSIVE_CONSENT flag (always enabled in OAuth mode)
- Update documentation to reflect Progressive Consent as default
- Delete test_adr004_hybrid_flow.py test file
- Remove unused variables (ruff lint fixes)
Security improvements:
- No scope escalation: client gets exactly what it requests
- Clear separation: MCP session tokens vs Nextcloud offline tokens
- OAuth2 compliant: follows best practices for scope handling
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-04 00:26:07 +01:00
Chris Coutinho
d16bcdcfbb
feat: Implement ADR-004 Progressive Consent foundation components
...
- Token Broker Service manages Nextcloud access tokens with audience validation
- Implements short-lived token caching (5-minute TTL) with early refresh
- Enhanced token storage schema with ADR-004 fields (flow_type, audience, provisioning)
- MCP provisioning tools for explicit Flow 2 resource authorization
- Comprehensive unit tests for Token Broker Service (14 tests, all passing)
- Environment configuration for Progressive Consent mode
This implements the foundation for the dual OAuth flow architecture where:
- Flow 1: MCP clients authenticate to MCP server (aud: "mcp-server")
- Flow 2: MCP server gets delegated Nextcloud access (aud: "nextcloud")
Users must explicitly call provision_nextcloud_access tool to grant resource access,
implementing the "stateless by default" principle from ADR-004.
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-03 07:51:07 +01:00
Chris Coutinho
849c67c32a
fix: Complete Keycloak external IdP integration with all tests passing
...
This commit completes the Keycloak external IdP integration for the MCP
server, implementing ADR-002 Tier 2 (External Identity Provider) with
full Bearer token authentication support.
Key Changes:
1. **Keycloak backchannel-dynamic configuration**
- Added --hostname-strict=false and --hostname-backchannel-dynamic=true
- Allows external issuer (localhost:8888) with internal endpoints (keycloak:8080)
- Solves Docker networking issue where containers can't reach localhost
2. **CORSMiddleware Bearer token patch**
- Created app-hooks/patches/cors-bearer-token.patch from upstream commit 8fb5e77db82
- Allows Bearer tokens to bypass CORS/CSRF checks (stateless authentication)
- Applied via post-installation hook 20-apply-cors-bearer-token-patch.sh
- Enables app-specific APIs (Notes, Calendar, etc.) to work with Bearer tokens
3. **Patch organization**
- Moved patches to app-hooks/patches/ directory
- Updated docker-compose.yml to mount entire app-hooks directory
- Consolidated patch management for better maintainability
4. **Test improvements**
- All 11 Keycloak integration tests passing
- Tests validate OAuth token acquisition, MCP connectivity, token validation,
tool execution, token persistence, user provisioning, scope filtering,
and error handling
Architecture:
- Keycloak acts as external OAuth/OIDC identity provider
- MCP server uses Keycloak tokens to access Nextcloud APIs
- Nextcloud user_oidc app validates Bearer tokens from Keycloak
- No admin credentials needed - all API access uses user's OAuth tokens
Cache Note:
- Discovery and JWKS caches must be cleared when switching Keycloak configurations
- Use: docker compose exec redis redis-cli DEL "<cache-key>"
- Or: docker compose exec app php occ user_oidc:provider keycloak --clientid nextcloud
Related:
- ADR-002: Vector sync background jobs authentication
- Validates external IdP integration pattern
- Demonstrates offline_access with refresh tokens (Tier 1 & 2)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-02 22:03:20 +01:00
Chris Coutinho
a36038422b
feat: Add text processing background worker for telling client about progress
2025-10-25 19:52:45 +02:00
Chris Coutinho
2147fc1696
refactor: Transform document parsing into pluggable processor architecture
...
Refactors PR #190 's hardcoded Unstructured.io integration into a flexible,
extensible plugin system supporting multiple text extraction engines.
- **`DocumentProcessor` ABC**: Abstract interface for all processors
- **`ProcessorRegistry`**: Central registry for discovery and routing
- **`ProcessingResult`**: Standardized output format across processors
- **`UnstructuredProcessor`**: Refactored from `UnstructuredClient`
- **`TesseractProcessor`**: Local OCR for images (lightweight alternative)
- **`CustomHTTPProcessor`**: Generic wrapper for custom HTTP APIs
- New `get_document_processor_config()` returns structured config
- Supports enabling/disabling individual processors
- Per-processor configuration via environment variables
- **Breaking Change**: `ENABLE_UNSTRUCTURED_PARSING` replaced with:
- `ENABLE_DOCUMENT_PROCESSING=true/false` (master switch)
- `ENABLE_UNSTRUCTURED=true/false` (per-processor)
- `ENABLE_TESSERACT=true/false`
- `ENABLE_CUSTOM_PROCESSOR=true/false`
- `parse_document()` now uses `ProcessorRegistry`
- Auto-selects appropriate processor based on MIME type
- Processor priority system (Unstructured=10, Tesseract=5, Custom=1)
- `initialize_document_processors()` registers processors at startup
- Integrated into both BasicAuth and OAuth lifespans
- Graceful degradation if processors fail to initialize
```env
ENABLE_DOCUMENT_PROCESSING=false
ENABLE_UNSTRUCTURED=false
UNSTRUCTURED_API_URL=http://unstructured:8000
UNSTRUCTURED_STRATEGY=auto # auto|fast|hi_res
UNSTRUCTURED_LANGUAGES=eng,deu
ENABLE_TESSERACT=false
TESSERACT_LANG=eng
ENABLE_CUSTOM_PROCESSOR=false
CUSTOM_PROCESSOR_URL=http://localhost:9000/process
CUSTOM_PROCESSOR_TYPES=application/pdf,image/jpeg
```
- **Removed**: `tests/test_unstructured_config.py` (legacy tests)
- **Added**: `tests/unit/test_document_processor_config.py`
- 7 unit tests for new config system
- Tests individual and multi-processor configurations
- **Added**:
- `nextcloud_mcp_server/document_processors/__init__.py`
- `nextcloud_mcp_server/document_processors/base.py`
- `nextcloud_mcp_server/document_processors/registry.py`
- `nextcloud_mcp_server/document_processors/unstructured.py`
- `nextcloud_mcp_server/document_processors/tesseract.py`
- `nextcloud_mcp_server/document_processors/custom_http.py`
- `tests/unit/test_document_processor_config.py`
- **Modified**:
- `nextcloud_mcp_server/config.py` - New plugin config system
- `nextcloud_mcp_server/app.py` - Processor initialization
- `nextcloud_mcp_server/utils/document_parser.py` - Uses registry
- `nextcloud_mcp_server/server/webdav.py` - Import updates
- `env.sample` - New configuration format
- `docker-compose.yml` - (profile changes from previous work)
- **Removed**:
- `nextcloud_mcp_server/client/unstructured_client.py` - Replaced by UnstructuredProcessor
- `tests/test_unstructured_config.py` - Replaced with new tests
✅ **Extensible**: Add processors without modifying core code
✅ **Testable**: Mock processors for unit tests
✅ **Configurable**: Enable only needed processors
✅ **Flexible**: Choose fast (Tesseract) vs accurate (Unstructured)
✅ **Opt-in**: Disabled by default, no mandatory dependencies
Users upgrading from PR #190 need to update environment variables:
```bash
ENABLE_UNSTRUCTURED_PARSING=true
ENABLE_DOCUMENT_PROCESSING=true
ENABLE_UNSTRUCTURED=true
```
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-10-25 19:28:35 +02:00
yuisheaven
64649c902d
Merge branch 'master' into feature/introduce_files_parsing_with_unstructured_service_for_webdav_files_retrieval
2025-10-21 20:37:00 +02:00
Chris Coutinho
4d7e4b9a4b
feat(server): Experimental support for OAuth2/OIDC authentication
2025-10-14 01:22:15 +02:00
yuisheaven
c9a687171a
added envs for unstructured to control OCR quality and OCR languages
2025-10-04 05:21:02 +02:00
yuisheaven
642108ee91
added new "unstructured" docker service to compose stack and introduced new envs
2025-10-04 04:27:31 +02:00
Chris Coutinho
0d8666a2d7
Initial commit
2025-05-04 23:24:55 +02:00