2147fc1696
Refactors PR #190's hardcoded Unstructured.io integration into a flexible, extensible plugin system supporting multiple text extraction engines.

- **`DocumentProcessor` ABC**: Abstract interface for all processors
- **`ProcessorRegistry`**: Central registry for discovery and routing
- **`ProcessingResult`**: Standardized output format across processors

- **`UnstructuredProcessor`**: Refactored from `UnstructuredClient`
- **`TesseractProcessor`**: Local OCR for images (lightweight alternative)
- **`CustomHTTPProcessor`**: Generic wrapper for custom HTTP APIs

- New `get_document_processor_config()` returns structured config
- Supports enabling/disabling individual processors
- Per-processor configuration via environment variables
- **Breaking Change**: `ENABLE_UNSTRUCTURED_PARSING` replaced with:
  - `ENABLE_DOCUMENT_PROCESSING=true/false` (master switch)
  - `ENABLE_UNSTRUCTURED=true/false` (per-processor)
  - `ENABLE_TESSERACT=true/false`
  - `ENABLE_CUSTOM_PROCESSOR=true/false`

- `parse_document()` now uses `ProcessorRegistry`
- Auto-selects appropriate processor based on MIME type
- Processor priority system (Unstructured=10, Tesseract=5, Custom=1)

- `initialize_document_processors()` registers processors at startup
- Integrated into both BasicAuth and OAuth lifespans
- Graceful degradation if processors fail to initialize

```env
ENABLE_DOCUMENT_PROCESSING=false

ENABLE_UNSTRUCTURED=false
UNSTRUCTURED_API_URL=http://unstructured:8000
UNSTRUCTURED_STRATEGY=auto  # auto|fast|hi_res
UNSTRUCTURED_LANGUAGES=eng,deu

ENABLE_TESSERACT=false
TESSERACT_LANG=eng

ENABLE_CUSTOM_PROCESSOR=false
CUSTOM_PROCESSOR_URL=http://localhost:9000/process
CUSTOM_PROCESSOR_TYPES=application/pdf,image/jpeg
```

- **Removed**: `tests/test_unstructured_config.py` (legacy tests)
- **Added**: `tests/unit/test_document_processor_config.py`
  - 7 unit tests for new config system
  - Tests individual and multi-processor configurations

- **Added**:
  - `nextcloud_mcp_server/document_processors/__init__.py`
  - `nextcloud_mcp_server/document_processors/base.py`
  - `nextcloud_mcp_server/document_processors/registry.py`
  - `nextcloud_mcp_server/document_processors/unstructured.py`
  - `nextcloud_mcp_server/document_processors/tesseract.py`
  - `nextcloud_mcp_server/document_processors/custom_http.py`
  - `tests/unit/test_document_processor_config.py`
- **Modified**:
  - `nextcloud_mcp_server/config.py` - New plugin config system
  - `nextcloud_mcp_server/app.py` - Processor initialization
  - `nextcloud_mcp_server/utils/document_parser.py` - Uses registry
  - `nextcloud_mcp_server/server/webdav.py` - Import updates
  - `env.sample` - New configuration format
  - `docker-compose.yml` - (profile changes from previous work)
- **Removed**:
  - `nextcloud_mcp_server/client/unstructured_client.py` - Replaced by UnstructuredProcessor
  - `tests/test_unstructured_config.py` - Replaced with new tests

✅ **Extensible**: Add processors without modifying core code
✅ **Testable**: Mock processors for unit tests
✅ **Configurable**: Enable only needed processors
✅ **Flexible**: Choose fast (Tesseract) vs accurate (Unstructured)
✅ **Opt-in**: Disabled by default, no mandatory dependencies

Users upgrading from PR #190 need to update environment variables:

```bash
# Old (PR #190), no longer read
ENABLE_UNSTRUCTURED_PARSING=true

# New: master switch plus per-processor switch
ENABLE_DOCUMENT_PROCESSING=true
ENABLE_UNSTRUCTURED=true
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
JWT Scope Truncation Fix - Summary
Problem
When using JWT tokens with many scopes, the scope claim in the JWT payload was being truncated, causing only 32 out of 90 tools to be visible to the MCP client.
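To check whether a particular token is affected, the scope claim can be read directly from the JWT payload. The following is a minimal sketch, assuming a standard three-segment JWT whose scope claim is a space-separated string. The helper names and the unsigned sample token are hypothetical and exist only for illustration; they are not part of the MCP server or the OIDC app.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def scope_claim(token: str) -> list[str]:
    """Return the scopes in a JWT payload WITHOUT verifying the signature."""
    _header, payload, _signature = token.split(".")
    claims = json.loads(b64url_decode(payload))
    return claims.get("scope", "").split()


def make_unsigned_token(claims: dict) -> str:
    """Build an unsigned sample token (illustration only, never for real use)."""
    def enc(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    return f"{enc({'alg': 'none'})}.{enc(claims)}."


token = make_unsigned_token({"sub": "demo", "scope": "openid notes:read deck:write"})
print(scope_claim(token))
```

A trailing fragment such as a scope name cut off mid-word in this list is a sign that the string was truncated before the token was issued.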
Root Cause
Multiple hardcoded string length limits in the Nextcloud OIDC app code:
- Database schema: oc_oidc_access_tokens.scope column was VARCHAR(128), too small for the 247-character scope string
- Code truncation in TokenGenerationRequestListener.php: substr($scopes, 0, 128) on line 83
- Code truncation in LoginRedirectorController.php: substr($scope, 0, 128) on line 437
- Client scope limits: multiple places truncating allowed_scopes to 255 characters
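The effect of a 128-character limit can be reproduced in a few lines. This sketch applies a Python slice (the equivalent of the PHP substr call) to the current scope string listed at the end of this document. Because the todo scopes were added as part of this change, the exact cut point here may differ from what was observed before the fix.

```python
FULL_SCOPES = (
    "openid profile email notes:read notes:write calendar:read calendar:write "
    "todo:read todo:write contacts:read contacts:write cookbook:read "
    "cookbook:write deck:read deck:write tables:read tables:write "
    "files:read files:write sharing:read sharing:write"
)

LIMIT = 128  # old VARCHAR(128) column and substr(..., 0, 128) limit

# Python equivalent of substr($scopes, 0, 128)
truncated = FULL_SCOPES[:LIMIT]

full = FULL_SCOPES.split()
kept = truncated.split()

# A scope only survives if it is still a complete, known scope name.
intact = [s for s in kept if s in full]
lost = [s for s in full if s not in intact]

print(len(FULL_SCOPES))   # total length of the scope string
print(repr(kept[-1]))     # last token after truncation (cut mid-word)
print(lost)               # scopes that no longer reach the client
```

The last token is a fragment that matches no real scope, so every tool that requires it, or any scope after it, is hidden from the client.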
Solution
Fixed all truncation points to support up to 512 characters:
Database Migration (Version0015Date20251123100100.php)
// Increase oidc_clients.allowed_scopes from 256 to 512
$table->changeColumn('allowed_scopes', [
    'notnull' => false,
    'length' => 512,
]);

// Increase oidc_access_tokens.scope from 128 to 512
$table->changeColumn('scope', [
    'notnull' => true,
    'length' => 512,
]);
Code Changes
- TokenGenerationRequestListener.php line 83: 128 → 512
- LoginRedirectorController.php line 437: 128 → 512
- SettingsController.php line 232: 255 → 511
- DynamicRegistrationController.php lines 182, 420: 255 → 511
Application Changes
- Added todo scopes to default scope lists:
  - nextcloud_mcp_server/app.py
  - tests/conftest.py (DEFAULT_FULL_SCOPES, DEFAULT_READ_SCOPES, DEFAULT_WRITE_SCOPES)
- Skipped obsolete tests:
  - test_scope_classification: script no longer exists
  - test_all_tools_classified: script no longer exists
Verification
Before Fix
- Scope length in database: 128 characters (truncated)
- Tools visible: 32 out of 90 (about 36%)
- Missing scopes: deck, tables, files, sharing, partial cookbook:write
After Fix
- Scope length in database: 247 characters (full string)
- Tools visible: 90 out of 90 (100%)
- All scopes present and complete
Test Results
$ uv run pytest tests/server/test_scope_authorization.py -v
===== 13 passed, 2 skipped in 22.11s =====
All scope authorization tests pass, including:
- ✅ Full access token shows all 90 tools
- ✅ Read-only token filters write tools
- ✅ Write-only token filters read tools
- ✅ JWT consent scenarios work correctly
- ✅ PRM endpoint lists all scopes
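The link between token scopes and visible tools can be sketched as a simple filter. The source does not show how the server actually performs this filtering, so everything below is an assumption: the tool names, the one-scope-per-tool model, and the visible_tools helper are hypothetical and only illustrate why a shortened scope string hides tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str            # hypothetical tool name
    required_scope: str  # scope a token must carry for the tool to be listed


# Hypothetical registry; the real server exposes 90 tools.
TOOLS = [
    Tool("notes_get", "notes:read"),
    Tool("notes_create", "notes:write"),
    Tool("deck_list_boards", "deck:read"),
    Tool("deck_create_card", "deck:write"),
]


def visible_tools(token_scope: str) -> list[str]:
    """List the tools whose required scope appears in the token's scope string."""
    granted = set(token_scope.split())
    return [tool.name for tool in TOOLS if tool.required_scope in granted]


# Read-only token: write tools are filtered out.
print(visible_tools("openid notes:read deck:read"))

# Truncated token: the deck scopes were cut off, so deck tools disappear.
print(visible_tools("openid notes:read notes:write"))
```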
Files Modified
OIDC App (third_party/oidc/)
- lib/Migration/Version0015Date20251123100100.php - Database schema migration
- lib/Listener/TokenGenerationRequestListener.php - Token generation scope limit
- lib/Controller/LoginRedirectorController.php - OAuth flow scope limit
- lib/Controller/SettingsController.php - Client settings scope limit
- lib/Controller/DynamicRegistrationController.php - DCR scope limits
MCP Server
- nextcloud_mcp_server/app.py - Added todo scopes to default scopes
- tests/conftest.py - Added todo scopes to all scope constants
- tests/server/test_scope_authorization.py - Skipped obsolete tests
Impact
- ✅ All 90 MCP tools now accessible with full access token
- ✅ JWT tokens contain complete scope information
- ✅ No more scope truncation at any layer
- ✅ Database supports up to 512 characters (247 currently used, 265-char margin)
- ✅ Future-proof for adding more scopes
Current Scope String
openid profile email notes:read notes:write calendar:read calendar:write todo:read todo:write contacts:read contacts:write cookbook:read cookbook:write deck:read deck:write tables:read tables:write files:read files:write sharing:read sharing:write
- Length: 247 characters
- Capacity: 512 characters
- Margin: 265 characters (107% headroom)
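The margin and headroom figures follow directly from the two lengths. A short sketch of the arithmetic is below; the estimate of how many further scopes fit is an added illustration, and it assumes new scopes are about as long as calendar:write, the longest current scope.

```python
scope_length = 247  # current scope string length
capacity = 512      # column length after the migration

margin = capacity - scope_length
headroom_pct = margin / scope_length * 100

# Rough estimate (assumption): each new scope costs its own length plus
# one separating space, sized like the longest current scope.
cost_per_scope = len("calendar:write") + 1
extra_scopes = margin // cost_per_scope

print(margin)               # characters still free
print(round(headroom_pct))  # headroom relative to current usage, in percent
print(extra_scopes)         # additional scopes of that size that would fit
```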