nextcloud-mcp-server/SCOPE_TRUNCATION_FIX.md

JWT Scope Truncation Fix - Summary

Problem

When a JWT carried many scopes, the scope claim in its payload was truncated, so only 32 of the 90 tools were visible to the MCP client.

Root Cause

Multiple hardcoded string length limits in the Nextcloud OIDC app code:

  1. Database schema: the oc_oidc_access_tokens.scope column was VARCHAR(128), too small for the 247-character scope string
  2. Code truncation in TokenGenerationRequestListener.php: substr($scopes, 0, 128) on line 83
  3. Code truncation in LoginRedirectorController.php: substr($scope, 0, 128) on line 437
  4. Client scope limits: several places truncated allowed_scopes to 255 characters
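
The effect of a fixed-length cut on a space-separated scope string can be reproduced in a few lines. The sketch below is illustrative Python, not the OIDC app's PHP. The pre-fix scope string is an assumption: it is the current scope string without the todo scopes, which were added as part of this fix.

```python
# Illustrative only: mimics PHP's substr($scopes, 0, 128) on a scope string.
# ASSUMPTION: the pre-fix scope string is the current one minus the todo scopes.
PRE_FIX_SCOPES = (
    "openid profile email notes:read notes:write calendar:read calendar:write "
    "contacts:read contacts:write cookbook:read cookbook:write deck:read "
    "deck:write tables:read tables:write files:read files:write sharing:read "
    "sharing:write"
)


def truncate_scopes(scopes: str, limit: int = 128) -> str:
    """Cut the string at a fixed length, ignoring scope boundaries."""
    return scopes[:limit]


requested = PRE_FIX_SCOPES.split()
surviving = truncate_scopes(PRE_FIX_SCOPES).split()

# Scopes that no longer appear intact after the cut.
lost = [s for s in requested if s not in surviving]
# Tokens in the truncated string that are not valid scopes (cut mid-word).
damaged = [s for s in surviving if s not in requested]

print("lost:", lost)
print("damaged:", damaged)
```

Under that assumption the cut lands inside cookbook:write, and every scope after it (deck, tables, files, sharing) disappears, which matches the missing scopes listed under Verification.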

Solution

Raised every truncation point to 511 or 512 characters:

Database Migration (Version0015Date20251123100100.php)

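// Excerpt from changeSchema(). $schema is assumed to be the schema
// wrapper returned by the migration's schema closure.

// Increase oidc_clients.allowed_scopes from 256 to 512
$table = $schema->getTable('oidc_clients');
$table->changeColumn('allowed_scopes', [
    'notnull' => false,
    'length' => 512,
]);

// Increase oidc_access_tokens.scope from 128 to 512
$table = $schema->getTable('oidc_access_tokens');
$table->changeColumn('scope', [
    'notnull' => true,
    'length' => 512,
]);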

Code Changes

  1. TokenGenerationRequestListener.php line 83: 128 → 512
  2. LoginRedirectorController.php line 437: 128 → 512
  3. SettingsController.php line 232: 255 → 511
  4. DynamicRegistrationController.php lines 182, 420: 255 → 511
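
As a rough cross-check of these limits (128 raised to 512, 255 raised to 511), the sketch below reports where a scope string of a given length would still be cut. It is illustrative Python only; the dictionary keys are descriptive labels and the helper name is hypothetical.

```python
# Limits per truncation point before and after the fix (values from this summary).
OLD_LIMITS = {
    "TokenGenerationRequestListener.php": 128,
    "LoginRedirectorController.php": 128,
    "SettingsController.php": 255,
    "DynamicRegistrationController.php": 255,
}
NEW_LIMITS = {
    "TokenGenerationRequestListener.php": 512,
    "LoginRedirectorController.php": 512,
    "SettingsController.php": 511,
    "DynamicRegistrationController.php": 511,
}


def truncation_points(scopes: str, limits: dict[str, int]) -> list[str]:
    """Return the places where a scope string of this length would be cut."""
    return [where for where, limit in limits.items() if len(scopes) > limit]


# Stand-in with the same length as the current 247-character scope string.
sample = "x" * 247

print("cut before fix:", truncation_points(sample, OLD_LIMITS))
print("cut after fix:", truncation_points(sample, NEW_LIMITS))
```

With the old limits only the two 128-character cuts affect a 247-character string; the 255-character client limits would have started to bite once the string grew past 255.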

Application Changes

  1. Added todo scopes to default scope lists:
    • nextcloud_mcp_server/app.py
    • tests/conftest.py (DEFAULT_FULL_SCOPES, DEFAULT_READ_SCOPES, DEFAULT_WRITE_SCOPES)
  2. Skipped obsolete tests:
    • test_scope_classification - Script no longer exists
    • test_all_tools_classified - Script no longer exists

Verification

Before Fix

  • Scope length in database: 128 characters (truncated)
  • Tools visible: 32 out of 90 (about 36%)
  • Missing scopes: deck, tables, files, sharing, partial cookbook:write

After Fix

  • Scope length in database: 247 characters (full string)
  • Tools visible: 90 out of 90 (100%)
  • All scopes present and complete
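
One way to confirm that a token carries the full scope claim is to decode its payload and inspect the claim directly. The following is a minimal sketch using only the Python standard library; it does not verify the signature, the helper names are hypothetical, and the demo token is built locally rather than issued by Nextcloud.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying the signature."""
    payload_b64 = token.split(".")[1]
    # base64url segments in a JWT omit padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def _b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (demo helper)."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Locally built, unsigned demo token; a real token would come from the OIDC app.
demo_token = ".".join(
    [_b64url({"alg": "none"}), _b64url({"scope": "openid notes:read notes:write"}), ""]
)

claims = decode_jwt_payload(demo_token)
scope = claims["scope"]
print(len(scope), scope.split())
```

A truncated claim shows up as a length that stops at exactly the limit (for example 128) and a final entry that is not a valid scope name.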

Test Results

$ uv run pytest tests/server/test_scope_authorization.py -v
===== 13 passed, 2 skipped in 22.11s =====

All scope authorization tests pass, including:

  • Full access token shows all 90 tools
  • Read-only token filters write tools
  • Write-only token filters read tools
  • JWT consent scenarios work correctly
  • PRM (Protected Resource Metadata) endpoint lists all scopes
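
The filtering behaviour these tests exercise can be pictured as a lookup from each tool to the scope it requires. The sketch below is an assumption about the general shape, not the server's actual implementation; the tool names and the TOOL_SCOPES mapping are hypothetical.

```python
# Hypothetical registry: tool names and their required scopes are made up
# for illustration and do not mirror the real MCP server's tool list.
TOOL_SCOPES = {
    "notes_get": "notes:read",
    "notes_create": "notes:write",
    "deck_list_boards": "deck:read",
    "deck_create_card": "deck:write",
}


def visible_tools(scope_claim: str) -> list[str]:
    """Return the tools whose required scope appears in the token's scope claim."""
    granted = set(scope_claim.split())
    return sorted(name for name, scope in TOOL_SCOPES.items() if scope in granted)


# Read-only token: write tools are filtered out.
print(visible_tools("openid notes:read deck:read"))
# Claim cut off before the deck scopes: every deck tool disappears.
print(visible_tools("openid notes:read notes:write"))
```

This is why a truncated scope claim translated directly into missing tools: any scope that fell past the cut could never match.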

Files Modified

OIDC App (third_party/oidc/)

  • lib/Migration/Version0015Date20251123100100.php - Database schema migration
  • lib/Listener/TokenGenerationRequestListener.php - Token generation scope limit
  • lib/Controller/LoginRedirectorController.php - OAuth flow scope limit
  • lib/Controller/SettingsController.php - Client settings scope limit
  • lib/Controller/DynamicRegistrationController.php - DCR scope limits

MCP Server

  • nextcloud_mcp_server/app.py - Added todo scopes to default scopes
  • tests/conftest.py - Added todo scopes to all scope constants
  • tests/server/test_scope_authorization.py - Skipped obsolete tests

Impact

  • All 90 MCP tools now accessible with full access token
  • JWT tokens contain complete scope information
  • Scope strings are no longer truncated in the database, token generation, login redirect, or client registration code
  • Database supports up to 512 characters (247 currently used, 265-char margin)
  • Room to add more scopes before the limit is reached

Current Scope String

openid profile email notes:read notes:write calendar:read calendar:write todo:read todo:write contacts:read contacts:write cookbook:read cookbook:write deck:read deck:write tables:read tables:write files:read files:write sharing:read sharing:write

  • Length: 247 characters
  • Capacity: 512 characters
  • Margin: 265 characters (107% headroom)
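
These figures can be re-derived from the scope string itself. The short check below is a sketch that recomputes the length, the margin against the 512-character column, and the headroom as a percentage of the current length; the variable names are illustrative.

```python
# Current scope string, copied from this document.
SCOPES = (
    "openid profile email notes:read notes:write calendar:read calendar:write "
    "todo:read todo:write contacts:read contacts:write cookbook:read "
    "cookbook:write deck:read deck:write tables:read tables:write files:read "
    "files:write sharing:read sharing:write"
)
CAPACITY = 512  # new column length for oidc_access_tokens.scope

length = len(SCOPES)
margin = CAPACITY - length
# Headroom: how much the string could grow, relative to its current size.
headroom_pct = round(100 * margin / length)

print(f"length={length} margin={margin} headroom={headroom_pct}%")
```

Note that the client-side allowed_scopes checks were raised to 511 rather than 512, so the effective margin there is one character smaller.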