nextcloud-mcp-server/SCOPE_TRUNCATION_FIX.md
Chris Coutinho 2147fc1696 refactor: Transform document parsing into pluggable processor architecture
Refactors PR #190's hardcoded Unstructured.io integration into a flexible,
extensible plugin system supporting multiple text extraction engines.

- **`DocumentProcessor` ABC**: Abstract interface for all processors
- **`ProcessorRegistry`**: Central registry for discovery and routing
- **`ProcessingResult`**: Standardized output format across processors

- **`UnstructuredProcessor`**: Refactored from `UnstructuredClient`
- **`TesseractProcessor`**: Local OCR for images (lightweight alternative)
- **`CustomHTTPProcessor`**: Generic wrapper for custom HTTP APIs

- New `get_document_processor_config()` returns structured config
- Supports enabling/disabling individual processors
- Per-processor configuration via environment variables
- **Breaking Change**: `ENABLE_UNSTRUCTURED_PARSING` replaced with:
  - `ENABLE_DOCUMENT_PROCESSING=true/false` (master switch)
  - `ENABLE_UNSTRUCTURED=true/false` (per-processor)
  - `ENABLE_TESSERACT=true/false`
  - `ENABLE_CUSTOM_PROCESSOR=true/false`

- `parse_document()` now uses `ProcessorRegistry`
- Auto-selects appropriate processor based on MIME type
- Processor priority system (Unstructured=10, Tesseract=5, Custom=1)

- `initialize_document_processors()` registers processors at startup
- Integrated into both BasicAuth and OAuth lifespans
- Graceful degradation if processors fail to initialize
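
The MIME-type routing and priority selection described above can be sketched roughly as follows. This is a simplified illustration, not the code in `registry.py`. Only the class names and the priority values (Unstructured=10, Tesseract=5, Custom=1) come from this commit. The method names (`register`, `get_processor`, `supports`, `process`), the `supported_mime_types` attribute, the `ProcessingResult` fields, and the `StubProcessor` test double are all assumptions.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProcessingResult:
    """Standardized output shared by all processors (field names assumed)."""
    text: str
    processor: str
    metadata: dict = field(default_factory=dict)


class DocumentProcessor(ABC):
    """Abstract processor interface; attribute and method names are hypothetical."""
    name: str = "base"
    priority: int = 0
    supported_mime_types: frozenset = frozenset()

    def supports(self, mime_type: str) -> bool:
        return mime_type in self.supported_mime_types

    @abstractmethod
    def process(self, content: bytes, mime_type: str) -> ProcessingResult:
        ...


class ProcessorRegistry:
    """Routes a MIME type to the highest-priority processor that supports it."""

    def __init__(self) -> None:
        self._processors: list[DocumentProcessor] = []

    def register(self, processor: DocumentProcessor) -> None:
        self._processors.append(processor)

    def get_processor(self, mime_type: str) -> Optional[DocumentProcessor]:
        # Keep only processors that accept this MIME type, then pick the top priority.
        candidates = [p for p in self._processors if p.supports(mime_type)]
        return max(candidates, key=lambda p: p.priority, default=None)


class StubProcessor(DocumentProcessor):
    """Hypothetical test double standing in for the real processors."""

    def __init__(self, name: str, priority: int, mime_types: set[str]) -> None:
        self.name = name
        self.priority = priority
        self.supported_mime_types = frozenset(mime_types)

    def process(self, content: bytes, mime_type: str) -> ProcessingResult:
        return ProcessingResult(text=content.decode(errors="replace"), processor=self.name)


registry = ProcessorRegistry()
registry.register(StubProcessor("unstructured", 10, {"application/pdf", "image/jpeg"}))
registry.register(StubProcessor("tesseract", 5, {"image/jpeg", "image/png"}))
registry.register(StubProcessor("custom", 1, {"application/pdf", "image/jpeg"}))

print(registry.get_processor("image/jpeg").name)  # unstructured
print(registry.get_processor("image/png").name)   # tesseract
print(registry.get_processor("text/csv"))         # None
```

When several processors accept the same type, the highest priority wins. A type that no enabled processor supports yields `None`, which leaves room for the graceful-degradation path.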

```env
ENABLE_DOCUMENT_PROCESSING=false

ENABLE_UNSTRUCTURED=false
UNSTRUCTURED_API_URL=http://unstructured:8000
UNSTRUCTURED_STRATEGY=auto  # auto|fast|hi_res
UNSTRUCTURED_LANGUAGES=eng,deu

ENABLE_TESSERACT=false
TESSERACT_LANG=eng

ENABLE_CUSTOM_PROCESSOR=false
CUSTOM_PROCESSOR_URL=http://localhost:9000/process
CUSTOM_PROCESSOR_TYPES=application/pdf,image/jpeg
```
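
A minimal sketch of how a `get_document_processor_config()`-style loader could turn these variables into structured config. It assumes three things: the master switch gates every per-processor flag, comma-separated values become lists, and the defaults mirror the sample values above. The returned dictionary shape is an assumption, not the actual return type in `config.py`.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def _flag(env: Mapping[str, str], name: str) -> bool:
    # Treat common truthy spellings as "on"; anything else (or unset) is "off".
    return env.get(name, "false").strip().lower() in {"1", "true", "yes", "on"}


def _csv(env: Mapping[str, str], name: str, default: str) -> list[str]:
    return [item.strip() for item in env.get(name, default).split(",") if item.strip()]


def get_document_processor_config(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical shape: {"enabled": bool, "processors": {name: settings}}."""
    config: dict = {"enabled": _flag(env, "ENABLE_DOCUMENT_PROCESSING"), "processors": {}}
    if not config["enabled"]:
        # Master switch off: per-processor flags are ignored entirely.
        return config

    if _flag(env, "ENABLE_UNSTRUCTURED"):
        config["processors"]["unstructured"] = {
            "api_url": env.get("UNSTRUCTURED_API_URL", "http://unstructured:8000"),
            "strategy": env.get("UNSTRUCTURED_STRATEGY", "auto"),
            "languages": _csv(env, "UNSTRUCTURED_LANGUAGES", "eng"),
        }
    if _flag(env, "ENABLE_TESSERACT"):
        config["processors"]["tesseract"] = {"lang": env.get("TESSERACT_LANG", "eng")}
    if _flag(env, "ENABLE_CUSTOM_PROCESSOR"):
        config["processors"]["custom"] = {
            "url": env["CUSTOM_PROCESSOR_URL"],  # assumed required when enabled
            "mime_types": _csv(env, "CUSTOM_PROCESSOR_TYPES", ""),
        }
    return config


cfg = get_document_processor_config({
    "ENABLE_DOCUMENT_PROCESSING": "true",
    "ENABLE_TESSERACT": "true",
    "TESSERACT_LANG": "deu",
})
print(cfg)  # {'enabled': True, 'processors': {'tesseract': {'lang': 'deu'}}}
```

Passing the environment as a mapping keeps the loader easy to unit-test without touching `os.environ`.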

- **Removed**: `tests/test_unstructured_config.py` (legacy tests)
- **Added**: `tests/unit/test_document_processor_config.py`
  - 7 unit tests for new config system
  - Tests individual and multi-processor configurations

- **Added**:
  - `nextcloud_mcp_server/document_processors/__init__.py`
  - `nextcloud_mcp_server/document_processors/base.py`
  - `nextcloud_mcp_server/document_processors/registry.py`
  - `nextcloud_mcp_server/document_processors/unstructured.py`
  - `nextcloud_mcp_server/document_processors/tesseract.py`
  - `nextcloud_mcp_server/document_processors/custom_http.py`
  - `tests/unit/test_document_processor_config.py`

- **Modified**:
  - `nextcloud_mcp_server/config.py` - New plugin config system
  - `nextcloud_mcp_server/app.py` - Processor initialization
  - `nextcloud_mcp_server/utils/document_parser.py` - Uses registry
  - `nextcloud_mcp_server/server/webdav.py` - Import updates
  - `env.sample` - New configuration format
  - `docker-compose.yml` - (profile changes from previous work)

- **Removed**:
  - `nextcloud_mcp_server/client/unstructured_client.py` - Replaced by UnstructuredProcessor
  - `tests/test_unstructured_config.py` - Replaced with new tests

- **Extensible**: Add processors without modifying core code
- **Testable**: Mock processors for unit tests
- **Configurable**: Enable only needed processors
- **Flexible**: Choose fast (Tesseract) vs accurate (Unstructured)
- **Opt-in**: Disabled by default, no mandatory dependencies

Users upgrading from PR #190 need to update environment variables:
```bash
# Before (PR #190)
ENABLE_UNSTRUCTURED_PARSING=true

# After
ENABLE_DOCUMENT_PROCESSING=true
ENABLE_UNSTRUCTURED=true
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-25 19:28:35 +02:00


# JWT Scope Truncation Fix - Summary
## Problem
When using JWT tokens with many scopes, the `scope` claim in the JWT payload was being truncated, causing only 32 out of 90 tools to be visible to the MCP client.
## Root Cause
Multiple hardcoded string length limits in the Nextcloud OIDC app's database schema and code:
1. **Database schema**: `oc_oidc_access_tokens.scope` column was `VARCHAR(128)` - too small for the 247-character scope string
2. **Code truncation in TokenGenerationRequestListener.php**: `substr($scopes, 0, 128)` on line 83
3. **Code truncation in LoginRedirectorController.php**: `substr($scope, 0, 128)` on line 437
4. **Client scope limits**: Multiple places truncating `allowed_scopes` to 255 characters
## Solution
Raised every truncation point to support scope strings of up to 512 characters (511 where `allowed_scopes` is truncated in code):
### Database Migration (Version0015Date20251123100100.php)
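```php
// Inside the migration's changeSchema(IOutput $output, Closure $schemaClosure, array $options)
/** @var \OCP\DB\ISchemaWrapper $schema */
$schema = $schemaClosure();

// Increase oidc_clients.allowed_scopes from 256 to 512
$table = $schema->getTable('oidc_clients');
$table->changeColumn('allowed_scopes', [
    'notnull' => false,
    'length' => 512,
]);

// Increase oidc_access_tokens.scope from 128 to 512
$table = $schema->getTable('oidc_access_tokens');
$table->changeColumn('scope', [
    'notnull' => true,
    'length' => 512,
]);

return $schema;
```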
### Code Changes
1. **TokenGenerationRequestListener.php** line 83: `128` → `512`
2. **LoginRedirectorController.php** line 437: `128` → `512`
3. **SettingsController.php** line 232: `255` → `511`
4. **DynamicRegistrationController.php** lines 182, 420: `255` → `511`
### Application Changes
1. **Added todo scopes** to default scope lists:
   - `nextcloud_mcp_server/app.py`
   - `tests/conftest.py` (`DEFAULT_FULL_SCOPES`, `DEFAULT_READ_SCOPES`, `DEFAULT_WRITE_SCOPES`)
2. **Skipped obsolete tests**:
   - `test_scope_classification` - Script no longer exists
   - `test_all_tools_classified` - Script no longer exists
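
One way to keep the three test constants in sync when an app such as `todo` is added is to derive them from a single app list. This is a hypothetical layout: the real `tests/conftest.py` may spell the strings out literally. It also assumes that the read-only and write-only sets include the `openid profile email` base scopes. The derived full set reproduces the current scope string listed at the end of this document.

```python
# Hypothetical layout for tests/conftest.py; the real file may differ.
OIDC_SCOPES = ["openid", "profile", "email"]
APPS = ["notes", "calendar", "todo", "contacts", "cookbook", "deck", "tables", "files", "sharing"]

DEFAULT_READ_SCOPES = " ".join(OIDC_SCOPES + [f"{app}:read" for app in APPS])
DEFAULT_WRITE_SCOPES = " ".join(OIDC_SCOPES + [f"{app}:write" for app in APPS])
DEFAULT_FULL_SCOPES = " ".join(
    OIDC_SCOPES + [f"{app}:{access}" for app in APPS for access in ("read", "write")]
)

print(len(DEFAULT_FULL_SCOPES))  # 247
```

With this layout, adding a new app means touching one list instead of three constants.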
## Verification
### Before Fix
- Scope length in database: **128 characters** (truncated)
- Tools visible: **32 out of 90** (about 36%)
- Missing scopes: `deck`, `tables`, `files`, `sharing`, partial `cookbook:write`
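
The missing-scope list can be reproduced by cutting the scope string at 128 characters, which is what `substr($scopes, 0, 128)` does. This sketch assumes the pre-fix string was the current one without the two `todo` scopes, since this fix is what added them. It classifies each requested scope as intact, partial, or missing.

```python
from __future__ import annotations

# Assumption: the pre-fix scope string is the current one minus the todo scopes.
PRE_FIX_SCOPES = (
    "openid profile email notes:read notes:write calendar:read calendar:write "
    "contacts:read contacts:write cookbook:read cookbook:write deck:read deck:write "
    "tables:read tables:write files:read files:write sharing:read sharing:write"
)


def classify_truncation(scope_string: str, limit: int) -> dict[str, list[str]]:
    """Classify each scope as intact, partial, or missing after truncation."""
    stored = scope_string[:limit]  # same effect as PHP substr($scopes, 0, $limit) on ASCII
    kept = stored.split(" ")
    result: dict[str, list[str]] = {"intact": [], "partial": [], "missing": []}
    for i, scope in enumerate(scope_string.split(" ")):
        fragment = kept[i] if i < len(kept) else ""
        if fragment == scope:
            result["intact"].append(scope)
        elif fragment:
            result["partial"].append(scope)  # a prefix survived, e.g. "cookbook:wri"
        else:
            result["missing"].append(scope)
    return result


print(len(PRE_FIX_SCOPES))  # 226
truncated = classify_truncation(PRE_FIX_SCOPES, 128)
print(truncated["partial"])  # ['cookbook:write']
print(truncated["missing"])  # ['deck:read', 'deck:write', 'tables:read', 'tables:write', 'files:read', 'files:write', 'sharing:read', 'sharing:write']
```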
### After Fix
- Scope length in database: **247 characters** (full string)
- Tools visible: **90 out of 90** (100%)
- All scopes present and complete
### Test Results
```bash
$ uv run pytest tests/server/test_scope_authorization.py -v
===== 13 passed, 2 skipped in 22.11s =====
```
All scope authorization tests pass, including:
- ✅ Full access token shows all 90 tools
- ✅ Read-only token filters write tools
- ✅ Write-only token filters read tools
- ✅ JWT consent scenarios work correctly
- ✅ PRM endpoint lists all scopes
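
The filtering these tests exercise can be illustrated with a small model. Each tool declares the scopes it requires, and a token sees a tool only when all of those scopes are among the scopes it was granted. The tool names and the mapping below are hypothetical, and the MCP server's actual registration mechanism is not shown. The second call shows why a truncated claim hides tools: a fragment like `cookbook:wr` matches no required scope.

```python
from __future__ import annotations

# Hypothetical tool -> required-scopes mapping (the real server registers 90 tools).
TOOL_SCOPES: dict[str, set[str]] = {
    "notes_get_note": {"notes:read"},
    "notes_create_note": {"notes:write"},
    "deck_list_boards": {"deck:read"},
    "deck_create_card": {"deck:write"},
    "sharing_create_share": {"sharing:write"},
}


def visible_tools(scope_claim: str) -> list[str]:
    """Return tools whose required scopes are all present in the space-separated claim."""
    granted = set(scope_claim.split())
    return sorted(name for name, required in TOOL_SCOPES.items() if required <= granted)


print(visible_tools("openid notes:read deck:read"))  # ['deck_list_boards', 'notes_get_note']
print(visible_tools("openid notes:read notes:write cookbook:wr"))  # ['notes_create_note', 'notes_get_note']
```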
## Files Modified
### OIDC App (third_party/oidc/)
- `lib/Migration/Version0015Date20251123100100.php` - Database schema migration
- `lib/Listener/TokenGenerationRequestListener.php` - Token generation scope limit
- `lib/Controller/LoginRedirectorController.php` - OAuth flow scope limit
- `lib/Controller/SettingsController.php` - Client settings scope limit
- `lib/Controller/DynamicRegistrationController.php` - DCR scope limits
### MCP Server
- `nextcloud_mcp_server/app.py` - Added todo scopes to default scopes
- `tests/conftest.py` - Added todo scopes to all scope constants
- `tests/server/test_scope_authorization.py` - Skipped obsolete tests
## Impact
- ✅ All 90 MCP tools now accessible with full access token
- ✅ JWT tokens contain complete scope information
- ✅ No more scope truncation at any layer
- ✅ Database supports up to 512 characters (247 currently used, 265-char margin)
- ✅ Future-proof for adding more scopes
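
To confirm that an issued token carries the complete scope claim, the JWT payload can be decoded locally. The sketch below uses only the standard library and does **not** verify the signature, so treat it as a diagnostic aid, never as an authorization check. The token in the example is built locally with a placeholder signature, and the helper names are hypothetical.

```python
from __future__ import annotations

import base64
import json


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def scope_claim(token: str) -> str:
    """Return the `scope` claim of a JWT WITHOUT verifying its signature."""
    _header, payload, _signature = token.split(".")
    return json.loads(_b64url_decode(payload))["scope"]


def missing_scopes(token: str, expected: str) -> set[str]:
    """Scopes in `expected` that the token's claim does not contain."""
    return set(expected.split()) - set(scope_claim(token).split())


# Build an example token with a placeholder signature to exercise the decoder.
claims = {"sub": "admin", "scope": "openid profile email notes:read notes:write"}
token = ".".join([
    _b64url_encode(json.dumps({"alg": "none", "typ": "JWT"}).encode()),
    _b64url_encode(json.dumps(claims).encode()),
    "signature",
])
print(scope_claim(token))  # openid profile email notes:read notes:write
print(missing_scopes(token, "openid notes:read deck:read"))  # {'deck:read'}
```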
## Current Scope String
```
openid profile email notes:read notes:write calendar:read calendar:write todo:read todo:write contacts:read contacts:write cookbook:read cookbook:write deck:read deck:write tables:read tables:write files:read files:write sharing:read sharing:write
```
**Length**: 247 characters
**Capacity**: 512 characters
**Margin**: 265 characters (107% headroom)
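
The length, margin, and headroom figures can be re-derived directly. Headroom is taken here as the margin divided by the current length, which is how 265 / 247 works out to roughly 107%.

```python
CURRENT_SCOPES = (
    "openid profile email notes:read notes:write calendar:read calendar:write "
    "todo:read todo:write contacts:read contacts:write cookbook:read cookbook:write "
    "deck:read deck:write tables:read tables:write files:read files:write "
    "sharing:read sharing:write"
)
CAPACITY = 512  # new length of oidc_access_tokens.scope

length = len(CURRENT_SCOPES)
margin = CAPACITY - length
headroom = margin / length  # spare capacity relative to what is used today

print(length, margin, f"{headroom:.0%}")  # 247 265 107%
```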