2147fc1696
Refactors PR #190's hardcoded Unstructured.io integration into a flexible, extensible plugin system supporting multiple text extraction engines. - **`DocumentProcessor` ABC**: Abstract interface for all processors - **`ProcessorRegistry`**: Central registry for discovery and routing - **`ProcessingResult`**: Standardized output format across processors - **`UnstructuredProcessor`**: Refactored from `UnstructuredClient` - **`TesseractProcessor`**: Local OCR for images (lightweight alternative) - **`CustomHTTPProcessor`**: Generic wrapper for custom HTTP APIs - New `get_document_processor_config()` returns structured config - Supports enabling/disabling individual processors - Per-processor configuration via environment variables - **Breaking Change**: `ENABLE_UNSTRUCTURED_PARSING` replaced with: - `ENABLE_DOCUMENT_PROCESSING=true/false` (master switch) - `ENABLE_UNSTRUCTURED=true/false` (per-processor) - `ENABLE_TESSERACT=true/false` - `ENABLE_CUSTOM_PROCESSOR=true/false` - `parse_document()` now uses `ProcessorRegistry` - Auto-selects appropriate processor based on MIME type - Processor priority system (Unstructured=10, Tesseract=5, Custom=1) - `initialize_document_processors()` registers processors at startup - Integrated into both BasicAuth and OAuth lifespans - Graceful degradation if processors fail to initialize ```env ENABLE_DOCUMENT_PROCESSING=false ENABLE_UNSTRUCTURED=false UNSTRUCTURED_API_URL=http://unstructured:8000 UNSTRUCTURED_STRATEGY=auto # auto|fast|hi_res UNSTRUCTURED_LANGUAGES=eng,deu ENABLE_TESSERACT=false TESSERACT_LANG=eng ENABLE_CUSTOM_PROCESSOR=false CUSTOM_PROCESSOR_URL=http://localhost:9000/process CUSTOM_PROCESSOR_TYPES=application/pdf,image/jpeg ``` - **Removed**: `tests/test_unstructured_config.py` (legacy tests) - **Added**: `tests/unit/test_document_processor_config.py` - 7 unit tests for new config system - Tests individual and multi-processor configurations - **Added**: - `nextcloud_mcp_server/document_processors/__init__.py` - `nextcloud_mcp_server/document_processors/base.py` - `nextcloud_mcp_server/document_processors/registry.py` - `nextcloud_mcp_server/document_processors/unstructured.py` - `nextcloud_mcp_server/document_processors/tesseract.py` - `nextcloud_mcp_server/document_processors/custom_http.py` - `tests/unit/test_document_processor_config.py` - **Modified**: - `nextcloud_mcp_server/config.py` - New plugin config system - `nextcloud_mcp_server/app.py` - Processor initialization - `nextcloud_mcp_server/utils/document_parser.py` - Uses registry - `nextcloud_mcp_server/server/webdav.py` - Import updates - `env.sample` - New configuration format - `docker-compose.yml` - (profile changes from previous work) - **Removed**: - `nextcloud_mcp_server/client/unstructured_client.py` - Replaced by UnstructuredProcessor - `tests/test_unstructured_config.py` - Replaced with new tests ✅ **Extensible**: Add processors without modifying core code ✅ **Testable**: Mock processors for unit tests ✅ **Configurable**: Enable only needed processors ✅ **Flexible**: Choose fast (Tesseract) vs accurate (Unstructured) ✅ **Opt-in**: Disabled by default, no mandatory dependencies Users upgrading from PR #190 need to update environment variables: ```bash ENABLE_UNSTRUCTURED_PARSING=true ENABLE_DOCUMENT_PROCESSING=true ENABLE_UNSTRUCTURED=true ``` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
101 lines
3.7 KiB
YAML
101 lines
3.7 KiB
YAML
services:
|
|
# Note: MariaDB is external service. You can find more information about the configuration here:
|
|
# https://hub.docker.com/_/mariadb
|
|
db:
|
|
# Note: Check the recommend version here: https://docs.nextcloud.com/server/latest/admin_manual/installation/system_requirements.html#server
|
|
image: docker.io/library/mariadb:lts@sha256:ae6119716edac6998ae85508431b3d2e666530ddf4e94c61a10710caec9b0f71
|
|
restart: always
|
|
command: --transaction-isolation=READ-COMMITTED
|
|
volumes:
|
|
- db:/var/lib/mysql
|
|
environment:
|
|
- MYSQL_ROOT_PASSWORD=password
|
|
- MYSQL_PASSWORD=password
|
|
- MYSQL_DATABASE=nextcloud
|
|
- MYSQL_USER=nextcloud
|
|
|
|
# Note: Redis is an external service. You can find more information about the configuration here:
|
|
# https://hub.docker.com/_/redis
|
|
redis:
|
|
image: docker.io/library/redis:alpine@sha256:59b6e694653476de2c992937ebe1c64182af4728e54bb49e9b7a6c26614d8933
|
|
restart: always
|
|
|
|
app:
|
|
image: docker.io/library/nextcloud:32.0.1@sha256:42a36b4711191273a9cf8cebfd35602909eb1bee461b7076d4d5a57f7ec2b81e
|
|
restart: always
|
|
ports:
|
|
- 0.0.0.0:8080:80
|
|
depends_on:
|
|
- redis
|
|
- db
|
|
volumes:
|
|
- nextcloud:/var/www/html
|
|
- ./app-hooks/post-installation:/docker-entrypoint-hooks.d/post-installation:ro
|
|
# Mount OIDC development directory outside /var/www/html to avoid rsync conflicts
|
|
# The post-installation hook will register /opt/apps as an additional app directory
|
|
- ./third_party/oidc:/opt/apps/oidc:ro
|
|
environment:
|
|
- NEXTCLOUD_TRUSTED_DOMAINS=app
|
|
- NEXTCLOUD_ADMIN_USER=admin
|
|
- NEXTCLOUD_ADMIN_PASSWORD=admin
|
|
- MYSQL_PASSWORD=password
|
|
- MYSQL_DATABASE=nextcloud
|
|
- MYSQL_USER=nextcloud
|
|
- MYSQL_HOST=db
|
|
- REDIS_HOST=redis
|
|
|
|
recipes:
|
|
image: docker.io/library/nginx:alpine@sha256:61e01287e546aac28a3f56839c136b31f590273f3b41187a36f46f6a03bbfe22
|
|
restart: always
|
|
volumes:
|
|
- ./tests/fixtures/test_recipe.html:/usr/share/nginx/html/test_recipe.html:ro
|
|
- ./tests/fixtures/nginx.conf:/etc/nginx/nginx.conf:ro
|
|
|
|
unstructured:
|
|
image: downloads.unstructured.io/unstructured-io/unstructured-api:latest
|
|
restart: always
|
|
ports:
|
|
- 127.0.0.1:8002:8000
|
|
# Unstructured API runs on port 8000 internally
|
|
# We expose it on 8002 externally to avoid conflict
|
|
profiles:
|
|
- text-extraction
|
|
|
|
mcp:
|
|
build: .
|
|
command: ["--transport", "streamable-http"]
|
|
restart: always
|
|
depends_on:
|
|
- app
|
|
ports:
|
|
- 127.0.0.1:8000:8000
|
|
environment:
|
|
- NEXTCLOUD_HOST=http://app:80
|
|
- NEXTCLOUD_USERNAME=admin
|
|
- NEXTCLOUD_PASSWORD=admin
|
|
|
|
mcp-oauth:
|
|
build: .
|
|
command: ["--transport", "streamable-http", "--oauth", "--port", "8001", "--oauth-token-type", "jwt"]
|
|
restart: always
|
|
depends_on:
|
|
- app
|
|
ports:
|
|
- 127.0.0.1:8001:8001
|
|
environment:
|
|
- NEXTCLOUD_HOST=http://app:80
|
|
- NEXTCLOUD_MCP_SERVER_URL=http://localhost:8001
|
|
- NEXTCLOUD_PUBLIC_ISSUER_URL=http://localhost:8080
|
|
- NEXTCLOUD_OIDC_CLIENT_STORAGE=/app/.oauth/nextcloud_oauth_client.json
|
|
- NEXTCLOUD_OIDC_SCOPES=openid profile email notes:read notes:write calendar:read calendar:write contacts:read contacts:write cookbook:read cookbook:write deck:read deck:write tables:read tables:write files:read files:write sharing:read sharing:write todo:read todo:write
|
|
# No USERNAME/PASSWORD - will use OAuth with Dynamic Client Registration
|
|
# Client credentials will be registered and stored in volume on first startup
|
|
# JWT token type is used for testing (faster validation, scopes embedded in token)
|
|
volumes:
|
|
- oauth-client-storage:/app/.oauth
|
|
|
|
volumes:
|
|
nextcloud:
|
|
db:
|
|
oauth-client-storage:
|