Compare commits

...

39 Commits

Author SHA1 Message Date
Chris Coutinho b11c3ddfb6 build: Rename /helm -> /charts 2025-10-29 10:30:48 +01:00
Chris Coutinho 562c102711 feat(server): Add /live & /health endpoints 2025-10-29 10:29:30 +01:00
Chris Coutinho d7a8719d0e build: Remove duplicate --host 2025-10-29 01:40:36 +01:00
Chris Coutinho 97fa9ef8a7 build: Update helm chart README and instructions 2025-10-29 01:37:08 +01:00
Chris Coutinho 77dd17b3e1 build: fix templating/linting errors 2025-10-29 01:37:07 +01:00
Chris Coutinho d56ec33b77 build: update helm chart 2025-10-29 01:37:07 +01:00
Chris Coutinho a1c5acc1c2 feat: Initialize helm chart 2025-10-29 01:37:03 +01:00
Chris Coutinho 6833f7f117 Merge pull request #242 from cbcoutinho/renovate/pin-dependencies
chore(deps): pin downloads.unstructured.io/unstructured-io/unstructured-api docker tag to a43ab55
2025-10-26 02:43:56 +02:00
renovate-bot-cbcoutinho[bot] 7db2a5c586 chore(deps): pin downloads.unstructured.io/unstructured-io/unstructured-api docker tag to a43ab55 2025-10-25 22:05:59 +00:00
Chris Coutinho b76c10f18c Merge branch 'docs/oauth-arch' 2025-10-25 22:08:02 +02:00
Chris Coutinho ab7411d9fd test: Fix tests 2025-10-25 22:07:46 +02:00
Chris Coutinho d02fe3c3b6 Merge pull request #241 from cbcoutinho/docs/oauth-arch
docs: Update OAuth architecture
2025-10-25 21:58:45 +02:00
Chris Coutinho 49f9cead69 docs: Update OAuth architecture 2025-10-25 21:54:30 +02:00
Chris Coutinho 415b1c901b docs: Parse available scopes from registered tools and update docs 2025-10-25 21:16:40 +02:00
Chris Coutinho 90b96a8afe docs: Remove old [skip ci] 2025-10-25 20:43:12 +02:00
github-actions[bot] 57a2157c58 bump: version 0.20.0 → 0.21.0 2025-10-25 18:33:56 +00:00
Chris Coutinho bfdc33c390 Merge branch 'feature/document-parsing-registry' 2025-10-25 20:33:17 +02:00
Chris Coutinho 8844c07ecb docs: Update README [skip ci] 2025-10-25 20:27:41 +02:00
Chris Coutinho 0a0ef10989 Merge pull request #240 from cbcoutinho/feature/document-parsing-registry
Transform document parsing into pluggable processor architecture
2025-10-25 20:25:38 +02:00
Chris Coutinho 9414d9c9c3 test: Add integration marker to user/group tests 2025-10-25 20:16:14 +02:00
Chris Coutinho 8a52df4a8e test: Skip unstructured tests if not enabled 2025-10-25 20:13:41 +02:00
Chris Coutinho a36038422b feat: Add text processing background worker for telling client about progress 2025-10-25 19:52:45 +02:00
Chris Coutinho 2147fc1696 refactor: Transform document parsing into pluggable processor architecture
Refactors PR #190's hardcoded Unstructured.io integration into a flexible,
extensible plugin system supporting multiple text extraction engines.

- **`DocumentProcessor` ABC**: Abstract interface for all processors
- **`ProcessorRegistry`**: Central registry for discovery and routing
- **`ProcessingResult`**: Standardized output format across processors

- **`UnstructuredProcessor`**: Refactored from `UnstructuredClient`
- **`TesseractProcessor`**: Local OCR for images (lightweight alternative)
- **`CustomHTTPProcessor`**: Generic wrapper for custom HTTP APIs

- New `get_document_processor_config()` returns structured config
- Supports enabling/disabling individual processors
- Per-processor configuration via environment variables
- **Breaking Change**: `ENABLE_UNSTRUCTURED_PARSING` replaced with:
  - `ENABLE_DOCUMENT_PROCESSING=true/false` (master switch)
  - `ENABLE_UNSTRUCTURED=true/false` (per-processor)
  - `ENABLE_TESSERACT=true/false`
  - `ENABLE_CUSTOM_PROCESSOR=true/false`

- `parse_document()` now uses `ProcessorRegistry`
- Auto-selects appropriate processor based on MIME type
- Processor priority system (Unstructured=10, Tesseract=5, Custom=1)

- `initialize_document_processors()` registers processors at startup
- Integrated into both BasicAuth and OAuth lifespans
- Graceful degradation if processors fail to initialize

```env
ENABLE_DOCUMENT_PROCESSING=false

ENABLE_UNSTRUCTURED=false
UNSTRUCTURED_API_URL=http://unstructured:8000
UNSTRUCTURED_STRATEGY=auto  # auto|fast|hi_res
UNSTRUCTURED_LANGUAGES=eng,deu

ENABLE_TESSERACT=false
TESSERACT_LANG=eng

ENABLE_CUSTOM_PROCESSOR=false
CUSTOM_PROCESSOR_URL=http://localhost:9000/process
CUSTOM_PROCESSOR_TYPES=application/pdf,image/jpeg
```

- **Removed**: `tests/test_unstructured_config.py` (legacy tests)
- **Added**: `tests/unit/test_document_processor_config.py`
  - 7 unit tests for new config system
  - Tests individual and multi-processor configurations

- **Added**:
  - `nextcloud_mcp_server/document_processors/__init__.py`
  - `nextcloud_mcp_server/document_processors/base.py`
  - `nextcloud_mcp_server/document_processors/registry.py`
  - `nextcloud_mcp_server/document_processors/unstructured.py`
  - `nextcloud_mcp_server/document_processors/tesseract.py`
  - `nextcloud_mcp_server/document_processors/custom_http.py`
  - `tests/unit/test_document_processor_config.py`

- **Modified**:
  - `nextcloud_mcp_server/config.py` - New plugin config system
  - `nextcloud_mcp_server/app.py` - Processor initialization
  - `nextcloud_mcp_server/utils/document_parser.py` - Uses registry
  - `nextcloud_mcp_server/server/webdav.py` - Import updates
  - `env.sample` - New configuration format
  - `docker-compose.yml` - (profile changes from previous work)

- **Removed**:
  - `nextcloud_mcp_server/client/unstructured_client.py` - Replaced by UnstructuredProcessor
  - `tests/test_unstructured_config.py` - Replaced with new tests

 **Extensible**: Add processors without modifying core code
 **Testable**: Mock processors for unit tests
 **Configurable**: Enable only needed processors
 **Flexible**: Choose fast (Tesseract) vs accurate (Unstructured)
 **Opt-in**: Disabled by default, no mandatory dependencies

Users upgrading from PR #190 need to update environment variables:
```bash
ENABLE_UNSTRUCTURED_PARSING=true

ENABLE_DOCUMENT_PROCESSING=true
ENABLE_UNSTRUCTURED=true
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-25 19:28:35 +02:00
Chris Coutinho a19017c686 Merge pull request #190 from yuisheaven/feature/introduce_files_parsing_with_unstructured_service_for_webdav_files_retrieval
Introduce files parsing with "unstructured" service for webdav files retrieval
2025-10-25 19:11:27 +02:00
yuisheaven f0e5333e43 Merge branch 'master' into feature/introduce_files_parsing_with_unstructured_service_for_webdav_files_retrieval 2025-10-25 17:23:38 +02:00
Chris Coutinho 553e84e5f2 Merge pull request #239 from cbcoutinho/renovate/docker.io-library-nextcloud-32.x
chore(deps): update docker.io/library/nextcloud docker tag to v32.0.1
2025-10-25 12:28:24 +02:00
renovate-bot-cbcoutinho[bot] ff20031601 chore(deps): update docker.io/library/nextcloud docker tag to v32.0.1 2025-10-25 10:06:16 +00:00
yuisheaven db79afacb9 improved tests - fixing the linting 2025-10-23 22:56:25 +02:00
yuisheaven 6730dd4a4b added new tests for unstructured api (pdf and docx workflow) 2025-10-23 22:38:27 +02:00
yuisheaven 8734c4b292 add new tests for unstructured config 2025-10-23 22:37:52 +02:00
yuisheaven 29df645d53 Merge branch 'master' into feature/introduce_files_parsing_with_unstructured_service_for_webdav_files_retrieval 2025-10-23 21:30:09 +02:00
yuisheaven 98627593d5 corrected smaller merge issues 2025-10-21 20:55:33 +02:00
yuisheaven 64649c902d Merge branch 'master' into feature/introduce_files_parsing_with_unstructured_service_for_webdav_files_retrieval 2025-10-21 20:37:00 +02:00
yuisheaven 3ff6346c03 ran ruff format via uv 2025-10-05 02:16:42 +02:00
yuisheaven c9a687171a added envs for unstructured to control OCR quality and OCR languages 2025-10-04 05:21:02 +02:00
yuisheaven df5f85e0c6 updated claude.md test instructs to consider checking for .env file if probems occur regarding unset envs 2025-10-04 04:28:59 +02:00
yuisheaven 76dce41ed9 added first versoin of the new document_parser utility and added it to the webdav file retrieval logic 2025-10-04 04:28:24 +02:00
yuisheaven 642108ee91 added new "unstructured" docker service to compose stack and introduced new envs 2025-10-04 04:27:31 +02:00
yuisheaven ce5724f05e adjusted pyproject.toml config and uv.lock 2025-10-04 04:26:33 +02:00
41 changed files with 5084 additions and 1104 deletions
+29
View File
@@ -0,0 +1,29 @@
name: Release Charts
on:
push:
tags:
- v*
jobs:
release:
# depending on default permission settings for your org (contents being read-only or read-write for workloads), you will have to add permissions
# see: https://docs.github.com/en/actions/security-guides/automatic-token-authentication#modifying-the-permissions-for-the-github_token
permissions:
contents: write
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Configure Git
run: |
git config user.name "$GITHUB_ACTOR"
git config user.email "$GITHUB_ACTOR@users.noreply.github.com"
- name: Run chart-releaser
uses: helm/chart-releaser-action@v1.7.0
env:
CR_TOKEN: "${{ secrets.GITHUB_TOKEN }}"
+10
View File
@@ -1,3 +1,13 @@
## v0.21.0 (2025-10-25)
### Feat
- Add text processing background worker for telling client about progress
### Refactor
- Transform document parsing into pluggable processor architecture
## v0.20.0 (2025-10-24)
### Feat
+2
View File
@@ -38,6 +38,8 @@ uv run pytest -m integration -v
uv run pytest -m "not integration" -v
```
! Hint: If the tests are failing due to missing environment variables, then usually the correct .env has not been created or not correctly configured yet.
### Load Testing
```bash
# Run benchmark with default settings (10 workers, 30 seconds)
+59 -9
View File
@@ -23,6 +23,7 @@ The Nextcloud MCP (Model Context Protocol) server allows Large Language Models l
| **Calendar** | ✅ Full CalDAV + tasks (20+ tools) | ✅ Events, free/busy, tasks (4 tools) |
| **Contacts** | ✅ Full CardDAV (8 tools) | ✅ Find person, current user (2 tools) |
| **Files (WebDAV)** | ✅ Full filesystem access (12 tools) | ✅ Read, folder tree, sharing (3 tools) |
| **Document Processing** | ✅ OCR with progress (PDF, DOCX, images) | ❌ Not implemented |
| **Deck** | ✅ Full project management (15 tools) | ✅ Basic board/card ops (2 tools) |
| **Tables** | ✅ Row operations (5 tools) | ❌ Not implemented |
| **Cookbook** | ✅ Full recipe management (13 tools) | ❌ Not implemented |
@@ -185,18 +186,67 @@ The server exposes Nextcloud functionality through MCP tools (for actions) and r
The server provides 90+ tools across 8 Nextcloud apps. When using OAuth, tools are dynamically filtered based on your granted scopes.
For a complete list of all supported OAuth scopes and their descriptions, see [OAuth Scopes Documentation](docs/oauth-architecture.md#oauth-scopes).
#### Available Tool Categories
| App | Tools | Read Scope | Write Scope | Operations |
|-----|-------|-----------|-------------|------------|
| **Notes** | 7 | `mcp:notes:read` | `mcp:notes:write` | Create, read, update, delete, search notes |
| **Calendar** | 20+ | `mcp:calendar:read` | `mcp:calendar:write` | Events, todos (tasks), calendars, recurring events, attendees |
| **Contacts** | 8 | `mcp:contacts:read` | `mcp:contacts:write` | Create, read, update, delete contacts and address books |
| **Files (WebDAV)** | 12 | `mcp:files:read` | `mcp:files:write` | List, read, upload, delete, move files and folders |
| **Deck** | 15 | `mcp:deck:read` | `mcp:deck:write` | Boards, stacks, cards, labels, assignments |
| **Cookbook** | 13 | `mcp:cookbook:read` | `mcp:cookbook:write` | Recipes, import from URLs, search, categories |
| **Tables** | 5 | `mcp:tables:read` | `mcp:tables:write` | Row operations on Nextcloud Tables |
| **Sharing** | 10+ | `mcp:sharing:read` | `mcp:sharing:write` | Create, manage, delete shares |
| **Notes** | 7 | `notes:read` | `notes:write` | Create, read, update, delete, search notes |
| **Calendar** | 20+ | `calendar:read` `todo:read` | `calendar:write` `todo:write` | Events, todos (tasks), calendars, recurring events, attendees |
| **Contacts** | 8 | `contacts:read` | `contacts:write` | Create, read, update, delete contacts and address books |
| **Files (WebDAV)** | 12 | `files:read` | `files:write` | List, read, upload, delete, move files; **OCR/document processing** |
| **Deck** | 15 | `deck:read` | `deck:write` | Boards, stacks, cards, labels, assignments |
| **Cookbook** | 13 | `cookbook:read` | `cookbook:write` | Recipes, import from URLs, search, categories |
| **Tables** | 5 | `tables:read` | `tables:write` | Row operations on Nextcloud Tables |
| **Sharing** | 10+ | `sharing:read` | `sharing:write` | Create, manage, delete shares |
#### Document Processing (Optional)
The WebDAV file reading tool (`nc_webdav_read_file`) supports **automatic text extraction** from documents and images:
**Supported Formats:**
- **Documents**: PDF, DOCX, PPTX, XLSX, RTF, ODT, EPUB
- **Images**: PNG, JPEG, TIFF, BMP (with OCR)
- **Email**: EML, MSG files
**Features:**
- **Progress Notifications**: Long-running OCR operations (up to 120s) send progress updates every 10 seconds to prevent client timeouts
- **Pluggable Architecture**: Multiple processor backends (Unstructured.io, Tesseract, custom HTTP APIs)
- **Automatic Detection**: Files are processed based on MIME type
- **Graceful Fallback**: Returns base64-encoded content if processing fails
**Configuration:**
```dotenv
# Enable document processing (optional)
ENABLE_DOCUMENT_PROCESSING=true
# Unstructured.io processor (cloud/API-based, supports many formats)
ENABLE_UNSTRUCTURED=true
UNSTRUCTURED_API_URL=http://localhost:8002
UNSTRUCTURED_STRATEGY=auto # auto, fast, or hi_res
UNSTRUCTURED_LANGUAGES=eng,deu
PROGRESS_INTERVAL=10 # Progress update interval in seconds
# Tesseract processor (local OCR, images only)
ENABLE_TESSERACT=false
TESSERACT_LANG=eng
# Custom HTTP processor
ENABLE_CUSTOM_PROCESSOR=false
CUSTOM_PROCESSOR_URL=http://localhost:9000/process
CUSTOM_PROCESSOR_TYPES=application/pdf,image/jpeg
```
**Example Usage:**
```
AI: "Read the contents of Documents/report.pdf"
→ Uses nc_webdav_read_file tool with automatic OCR processing
→ Returns extracted text with parsing metadata
→ Sends progress updates during long operations
```
See [env.sample](env.sample) for complete configuration options.
**Example Tools:**
- `nc_notes_create_note` - Create a new note
@@ -209,7 +259,7 @@ The server provides 90+ tools across 8 Nextcloud apps. When using OAuth, tools a
- And 80+ more...
> [!TIP]
> **OAuth Scope Filtering**: When connecting via OAuth, MCP clients will only see tools for which you've granted access. For example, granting only `mcp:notes:read` and `mcp:notes:write` will show 7 Notes tools instead of all 90+ tools. See [OAuth Troubleshooting - Limited Scopes](docs/oauth-troubleshooting.md#limited-scopes---only-seeing-notes-tools) if you're only seeing a subset of tools.
> **OAuth Scope Filtering**: When connecting via OAuth, MCP clients will only see tools for which you've granted access. For example, granting only `notes:read` and `notes:write` will show 7 Notes tools instead of all 90+ tools. See [OAuth Scopes Documentation](docs/oauth-architecture.md#oauth-scopes) for the complete scope reference, or [OAuth Troubleshooting - Limited Scopes](docs/oauth-troubleshooting.md#limited-scopes---only-seeing-notes-tools) if you're only seeing a subset of tools.
>
> **Known Issue**: Claude Code and some other MCP clients may only request/grant Notes scopes during initial connection. Track progress at [#234](https://github.com/cbcoutinho/nextcloud-mcp-server/issues/234).
+23
View File
@@ -0,0 +1,23 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
+23
View File
@@ -0,0 +1,23 @@
apiVersion: v2
name: nextcloud-mcp-server
description: A Helm chart for Nextcloud MCP Server - enables AI assistants to interact with Nextcloud
type: application
version: 0.1.0
appVersion: "0.21.0"
keywords:
- nextcloud
- mcp
- model-context-protocol
- llm
- ai
- claude
- webdav
- caldav
- carddav
maintainers:
- name: Chris Coutinho
email: chris@coutinho.io
home: https://github.com/cbcoutinho/nextcloud-mcp-server
sources:
- https://github.com/cbcoutinho/nextcloud-mcp-server
icon: https://raw.githubusercontent.com/nextcloud/server/master/core/img/logo/logo.svg
+470
View File
@@ -0,0 +1,470 @@
# Nextcloud MCP Server Helm Chart
This Helm chart deploys the Nextcloud MCP (Model Context Protocol) Server on a Kubernetes cluster, enabling AI assistants to interact with your Nextcloud instance.
## Prerequisites
- Kubernetes 1.19+
- Helm 3.0+
- A running Nextcloud instance (accessible from the Kubernetes cluster)
- Nextcloud credentials (username/password for basic auth OR OAuth client for OAuth mode)
## Installation
### Quick Start with Basic Authentication
```bash
# Install with basic auth (recommended for most users)
helm install nextcloud-mcp ./helm/nextcloud-mcp-server \
--set nextcloud.host=https://cloud.example.com \
--set auth.basic.username=myuser \
--set auth.basic.password=mypassword
```
### Using a values file
Create a `custom-values.yaml` file:
```yaml
nextcloud:
host: https://cloud.example.com
auth:
mode: basic
basic:
username: myuser
password: mypassword
resources:
limits:
cpu: 1000m
memory: 512Mi
requests:
cpu: 100m
memory: 128Mi
```
Install with your custom values:
```bash
helm install nextcloud-mcp ./helm/nextcloud-mcp-server -f custom-values.yaml
```
### OAuth Authentication Mode (Experimental)
**Warning:** OAuth mode is experimental and requires patches to the Nextcloud `user_oidc` app. See the [Authentication Guide](https://github.com/cbcoutinho/nextcloud-mcp-server#authentication) for details.
```yaml
nextcloud:
host: https://cloud.example.com
mcpServerUrl: https://mcp.example.com
publicIssuerUrl: https://cloud.example.com
auth:
mode: oauth
oauth:
# Optional: provide pre-registered client credentials
# If not provided, will use Dynamic Client Registration
clientId: "your-client-id"
clientSecret: "your-client-secret"
persistence:
enabled: true
size: 100Mi
ingress:
enabled: true
className: nginx
hosts:
- host: mcp.example.com
paths:
- path: /
pathType: Prefix
tls:
- secretName: nextcloud-mcp-tls
hosts:
- mcp.example.com
```
## Configuration
### Key Configuration Parameters
#### Nextcloud Connection
| Parameter | Description | Default |
|-----------|-------------|---------|
| `nextcloud.host` | URL of your Nextcloud instance (required) | `""` |
| `nextcloud.mcpServerUrl` | MCP server URL for OAuth callbacks (OAuth only, optional) | Smart default* |
| `nextcloud.publicIssuerUrl` | Public issuer URL for OAuth (OAuth only, optional) | Smart default** |
**Smart Defaults:**
- `*mcpServerUrl`: If not set, automatically uses ingress host (if enabled) or `http://localhost:8000` (for port-forward setups)
- `**publicIssuerUrl`: If not set, automatically defaults to `nextcloud.host` (which works when both clients and MCP server access Nextcloud at the same URL)
#### Authentication
| Parameter | Description | Default |
|-----------|-------------|---------|
| `auth.mode` | Authentication mode: `basic` or `oauth` | `basic` |
| `auth.basic.username` | Nextcloud username (basic auth) | `""` |
| `auth.basic.password` | Nextcloud password (basic auth) | `""` |
| `auth.basic.existingSecret` | Use existing secret for credentials | `""` |
| `auth.oauth.clientId` | OAuth client ID (OAuth mode, optional) | `""` |
| `auth.oauth.clientSecret` | OAuth client secret (OAuth mode, optional) | `""` |
| `auth.oauth.persistence.enabled` | Enable persistent storage for OAuth | `true` |
| `auth.oauth.persistence.size` | Size of OAuth storage PVC | `100Mi` |
#### Image Configuration
| Parameter | Description | Default |
|-----------|-------------|---------|
| `image.repository` | Container image repository | `ghcr.io/cbcoutinho/nextcloud-mcp-server` |
| `image.tag` | Container image tag | `""` (uses chart appVersion) |
| `image.pullPolicy` | Image pull policy | `IfNotPresent` |
#### Resources
| Parameter | Description | Default |
|-----------|-------------|---------|
| `resources.limits.cpu` | CPU limit | `1000m` |
| `resources.limits.memory` | Memory limit | `512Mi` |
| `resources.requests.cpu` | CPU request | `100m` |
| `resources.requests.memory` | Memory request | `128Mi` |
#### Service
| Parameter | Description | Default |
|-----------|-------------|---------|
| `service.type` | Service type | `ClusterIP` |
| `service.port` | Service port | `8000` |
| `service.oauthPort` | OAuth service port | `8001` |
#### Ingress
| Parameter | Description | Default |
|-----------|-------------|---------|
| `ingress.enabled` | Enable ingress | `false` |
| `ingress.className` | Ingress class name | `""` |
| `ingress.hosts` | Ingress host configuration | See values.yaml |
| `ingress.tls` | Ingress TLS configuration | `[]` |
#### Autoscaling
| Parameter | Description | Default |
|-----------|-------------|---------|
| `autoscaling.enabled` | Enable HPA | `false` |
| `autoscaling.minReplicas` | Minimum replicas | `1` |
| `autoscaling.maxReplicas` | Maximum replicas | `10` |
| `autoscaling.targetCPUUtilizationPercentage` | Target CPU % | `80` |
#### Health Probes
| Parameter | Description | Default |
|-----------|-------------|---------|
| `livenessProbe.httpGet.path` | Liveness probe endpoint | `/health/live` |
| `livenessProbe.initialDelaySeconds` | Initial delay for liveness | `30` |
| `livenessProbe.periodSeconds` | Check interval for liveness | `10` |
| `readinessProbe.httpGet.path` | Readiness probe endpoint | `/health/ready` |
| `readinessProbe.initialDelaySeconds` | Initial delay for readiness | `10` |
| `readinessProbe.periodSeconds` | Check interval for readiness | `5` |
The application exposes HTTP health check endpoints:
- `/health/live` - Liveness probe (checks if application is running)
- `/health/ready` - Readiness probe (checks if application is ready to serve traffic)
#### Document Processing (Optional)
| Parameter | Description | Default |
|-----------|-------------|---------|
| `documentProcessing.enabled` | Enable document processing | `false` |
| `documentProcessing.defaultProcessor` | Default processor | `unstructured` |
| `documentProcessing.unstructured.enabled` | Enable Unstructured.io processor | `false` |
| `documentProcessing.unstructured.apiUrl` | Unstructured API URL | `http://unstructured:8000` |
| `documentProcessing.tesseract.enabled` | Enable Tesseract OCR | `false` |
## Examples
### Example 1: Basic Auth with Ingress
```yaml
nextcloud:
host: https://cloud.example.com
auth:
mode: basic
basic:
username: admin
password: secure-password
ingress:
enabled: true
className: nginx
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
hosts:
- host: mcp.example.com
paths:
- path: /
pathType: Prefix
tls:
- secretName: mcp-tls
hosts:
- mcp.example.com
resources:
limits:
cpu: 2000m
memory: 1Gi
requests:
cpu: 200m
memory: 256Mi
```
### Example 2: Using Existing Secrets
#### Basic Auth with Existing Secret
Create a secret manually:
```bash
kubectl create secret generic nextcloud-credentials \
--from-literal=username=myuser \
--from-literal=password=mypassword
```
Then reference it in your values:
```yaml
nextcloud:
host: https://cloud.example.com
auth:
mode: basic
basic:
existingSecret: nextcloud-credentials
usernameKey: username
passwordKey: password
```
#### OAuth with Existing Secret (Pre-registered Client)
If you have a pre-registered OAuth client:
```bash
kubectl create secret generic nextcloud-oauth-creds \
--from-literal=clientId=my-oauth-client-id \
--from-literal=clientSecret=my-oauth-client-secret
```
Then reference it in your values:
```yaml
nextcloud:
host: https://cloud.example.com
# mcpServerUrl and publicIssuerUrl are optional!
# If not set, mcpServerUrl defaults to ingress host or localhost
# publicIssuerUrl defaults to nextcloud.host
auth:
mode: oauth
oauth:
existingSecret: nextcloud-oauth-creds
clientIdKey: clientId
clientSecretKey: clientSecret
persistence:
enabled: true
ingress:
enabled: true
hosts:
- host: mcp.example.com
paths:
- path: /
pathType: Prefix
tls:
- secretName: mcp-tls
hosts:
- mcp.example.com
```
### Example 3: OAuth with Document Processing and Dynamic Client Registration
This example shows OAuth without pre-registered credentials (using DCR) and optional URL values:
```yaml
nextcloud:
host: https://cloud.example.com
# mcpServerUrl will automatically use ingress host (https://mcp.example.com)
# publicIssuerUrl will automatically default to nextcloud.host
auth:
mode: oauth
oauth:
# No clientId/clientSecret - will use Dynamic Client Registration!
persistence:
enabled: true
storageClass: fast-ssd
size: 200Mi
documentProcessing:
enabled: true
defaultProcessor: unstructured
unstructured:
enabled: true
apiUrl: http://unstructured-api:8000
strategy: hi_res
languages: eng,deu,fra
ingress:
enabled: true
className: nginx
hosts:
- host: mcp.example.com
paths:
- path: /
pathType: Prefix
```
### Example 4: High Availability with Autoscaling
```yaml
replicaCount: 2
autoscaling:
enabled: true
minReplicas: 2
maxReplicas: 20
targetCPUUtilizationPercentage: 70
targetMemoryUtilizationPercentage: 80
resources:
limits:
cpu: 2000m
memory: 1Gi
requests:
cpu: 500m
memory: 512Mi
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app.kubernetes.io/name
operator: In
values:
- nextcloud-mcp-server
topologyKey: kubernetes.io/hostname
```
## Upgrading
### To upgrade an existing deployment:
```bash
helm upgrade nextcloud-mcp ./helm/nextcloud-mcp-server -f custom-values.yaml
```
### To upgrade with new values:
```bash
helm upgrade nextcloud-mcp ./helm/nextcloud-mcp-server \
--set image.tag=0.21.0 \
--set resources.limits.memory=1Gi
```
## Uninstalling
```bash
helm uninstall nextcloud-mcp
```
**Note:** This will delete all resources including PVCs. If you want to preserve OAuth client data, backup the PVC before uninstalling.
## Troubleshooting
### Check pod status
```bash
kubectl get pods -l app.kubernetes.io/name=nextcloud-mcp-server
```
### View logs
```bash
kubectl logs -l app.kubernetes.io/name=nextcloud-mcp-server --tail=100 -f
```
### Check health endpoints
The application exposes health check endpoints for monitoring:
```bash
# Port forward to the service
kubectl port-forward svc/nextcloud-mcp 8000:8000
# Check liveness (if app is running)
curl http://localhost:8000/health/live
# Check readiness (if app is ready to serve traffic)
curl http://localhost:8000/health/ready
```
**Example responses:**
Liveness (always returns 200 if running):
```json
{
"status": "alive",
"mode": "basic"
}
```
Readiness (returns 200 if ready, 503 if not ready):
```json
{
"status": "ready",
"checks": {
"nextcloud_configured": "ok",
"auth_mode": "basic",
"auth_configured": "ok"
}
}
```
### Common Issues
1. **Connection refused to Nextcloud**
- Verify `nextcloud.host` is accessible from the Kubernetes cluster
- Check network policies and firewall rules
2. **Authentication failures**
- For basic auth: verify username/password are correct
- For OAuth: check that OIDC app is properly configured
3. **OAuth persistence issues**
- Verify PVC is bound: `kubectl get pvc`
- Check storage class exists: `kubectl get storageclass`
4. **Resource constraints**
- Increase memory limits if seeing OOM errors
- Adjust CPU requests based on load
## Security Considerations
1. **Secrets Management**: Consider using external secret management (e.g., Sealed Secrets, External Secrets Operator)
2. **TLS**: Always use TLS/HTTPS for production deployments
3. **Network Policies**: Restrict network access to necessary services only
4. **RBAC**: Review and customize ServiceAccount permissions as needed
5. **App Passwords**: For basic auth, use Nextcloud app passwords instead of main account passwords
## Support
- GitHub Issues: https://github.com/cbcoutinho/nextcloud-mcp-server/issues
- Documentation: https://github.com/cbcoutinho/nextcloud-mcp-server#readme
## License
This chart is licensed under AGPL-3.0, consistent with the Nextcloud MCP Server project.
@@ -0,0 +1,80 @@
Thank you for installing {{ .Chart.Name }}!
Your Nextcloud MCP Server has been deployed in {{ .Values.auth.mode }} authentication mode.
1. Get the application URL by running these commands:
{{- if .Values.ingress.enabled }}
{{- range $host := .Values.ingress.hosts }}
{{- range .paths }}
http{{ if $.Values.ingress.tls }}s{{ end }}://{{ $host.host }}{{ .path }}
{{- end }}
{{- end }}
{{- else if contains "NodePort" .Values.service.type }}
export NODE_PORT=$(kubectl get --namespace {{ .Release.Namespace }} -o jsonpath="{.spec.ports[0].nodePort}" services {{ include "nextcloud-mcp-server.fullname" . }})
export NODE_IP=$(kubectl get nodes --namespace {{ .Release.Namespace }} -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT
{{- else if contains "LoadBalancer" .Values.service.type }}
NOTE: It may take a few minutes for the LoadBalancer IP to be available.
You can watch the status of by running 'kubectl get --namespace {{ .Release.Namespace }} svc -w {{ include "nextcloud-mcp-server.fullname" . }}'
export SERVICE_IP=$(kubectl get svc --namespace {{ .Release.Namespace }} {{ include "nextcloud-mcp-server.fullname" . }} --template "{{"{{ range (index .status.loadBalancer.ingress 0) }}{{.}}{{ end }}"}}")
echo http://$SERVICE_IP:{{ .Values.service.port }}
{{- else if contains "ClusterIP" .Values.service.type }}
export POD_NAME=$(kubectl get pods --namespace {{ .Release.Namespace }} -l "app.kubernetes.io/name={{ include "nextcloud-mcp-server.name" . }},app.kubernetes.io/instance={{ .Release.Name }}" -o jsonpath="{.items[0].metadata.name}")
export CONTAINER_PORT=$(kubectl get pod --namespace {{ .Release.Namespace }} $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
echo "Visit http://127.0.0.1:8080 to use your MCP server"
kubectl --namespace {{ .Release.Namespace }} port-forward $POD_NAME 8080:$CONTAINER_PORT
{{- end }}
2. Check the deployment status:
kubectl --namespace {{ .Release.Namespace }} get pods -l "app.kubernetes.io/name={{ include "nextcloud-mcp-server.name" . }},app.kubernetes.io/instance={{ .Release.Name }}"
{{- if eq .Values.auth.mode "basic" }}
3. Basic Authentication Mode:
{{- if .Values.auth.basic.existingSecret }}
- Credentials: (using existing secret {{ .Values.auth.basic.existingSecret }})
{{- else }}
- Username: {{ .Values.auth.basic.username }}
- Password: (stored in secret {{ include "nextcloud-mcp-server.basicAuthSecretName" . }})
{{- end }}
- Connected to: {{ .Values.nextcloud.host }}
{{- else if eq .Values.auth.mode "oauth" }}
3. OAuth Authentication Mode:
- Server URL: {{ include "nextcloud-mcp-server.mcpServerUrl" . }}
- Issuer URL: {{ include "nextcloud-mcp-server.publicIssuerUrl" . }}
- Connected to: {{ .Values.nextcloud.host }}
{{- if .Values.auth.oauth.existingSecret }}
- Using existing OAuth client secret: {{ .Values.auth.oauth.existingSecret }}
{{- else if and .Values.auth.oauth.clientId .Values.auth.oauth.clientSecret }}
- Using pre-registered OAuth client
{{- else }}
- Using Dynamic Client Registration (DCR)
{{- end }}
{{- if .Values.auth.oauth.persistence.enabled }}
- OAuth client credentials are persisted in PVC: {{ include "nextcloud-mcp-server.oauthPvcName" . }}
{{- end }}
IMPORTANT: OAuth mode is experimental and requires patches to the user_oidc app.
See: https://github.com/cbcoutinho/nextcloud-mcp-server#authentication
{{- end }}
{{- if .Values.documentProcessing.enabled }}
4. Document Processing:
- Enabled: {{ .Values.documentProcessing.enabled }}
- Default processor: {{ .Values.documentProcessing.defaultProcessor }}
{{- if .Values.documentProcessing.unstructured.enabled }}
- Unstructured API: {{ .Values.documentProcessing.unstructured.apiUrl }}
{{- end }}
{{- end }}
For more information and documentation:
- GitHub: https://github.com/cbcoutinho/nextcloud-mcp-server
- Documentation: https://github.com/cbcoutinho/nextcloud-mcp-server#readme
To upgrade this deployment:
helm upgrade {{ .Release.Name }} nextcloud-mcp-server
To uninstall:
helm uninstall {{ .Release.Name }}
@@ -0,0 +1,146 @@
{{/*
Expand the name of the chart.
*/}}
{{- define "nextcloud-mcp-server.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}
{{/*
Create a default fully qualified app name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
If release name contains chart name it will be used as a full name.
*/}}
{{- define "nextcloud-mcp-server.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end }}
{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "nextcloud-mcp-server.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
{{- end }}
{{/*
Common labels
*/}}
{{- define "nextcloud-mcp-server.labels" -}}
helm.sh/chart: {{ include "nextcloud-mcp-server.chart" . }}
{{ include "nextcloud-mcp-server.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}
{{/*
Selector labels
*/}}
{{- define "nextcloud-mcp-server.selectorLabels" -}}
app.kubernetes.io/name: {{ include "nextcloud-mcp-server.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
{{/*
Create the name of the service account to use
*/}}
{{- define "nextcloud-mcp-server.serviceAccountName" -}}
{{- if .Values.serviceAccount.create }}
{{- default (include "nextcloud-mcp-server.fullname" .) .Values.serviceAccount.name }}
{{- else }}
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}
{{/*
Create the name of the secret to use for basic auth
*/}}
{{- define "nextcloud-mcp-server.basicAuthSecretName" -}}
{{- if .Values.auth.basic.existingSecret }}
{{- .Values.auth.basic.existingSecret }}
{{- else }}
{{- include "nextcloud-mcp-server.fullname" . }}-basic-auth
{{- end }}
{{- end }}
{{/*
Create the name of the secret to use for OAuth
*/}}
{{- define "nextcloud-mcp-server.oauthSecretName" -}}
{{- if .Values.auth.oauth.existingSecret }}
{{- .Values.auth.oauth.existingSecret }}
{{- else }}
{{- include "nextcloud-mcp-server.fullname" . }}-oauth
{{- end }}
{{- end }}
{{/*
Create the name of the PVC to use for OAuth storage
*/}}
{{- define "nextcloud-mcp-server.oauthPvcName" -}}
{{- if .Values.auth.oauth.persistence.existingClaim }}
{{- .Values.auth.oauth.persistence.existingClaim }}
{{- else }}
{{- include "nextcloud-mcp-server.fullname" . }}-oauth-storage
{{- end }}
{{- end }}
{{/*
Return the appropriate MCP server port based on auth mode
*/}}
{{- define "nextcloud-mcp-server.port" -}}
{{- if eq .Values.auth.mode "oauth" }}
{{- .Values.auth.oauth.port }}
{{- else }}
{{- .Values.mcp.port }}
{{- end }}
{{- end }}
{{/*
Return the image tag
*/}}
{{- define "nextcloud-mcp-server.imageTag" -}}
{{- .Values.image.tag | default .Chart.AppVersion }}
{{- end }}
{{/*
Return the public issuer URL for OAuth
Defaults to nextcloud.host if not specified
*/}}
{{- define "nextcloud-mcp-server.publicIssuerUrl" -}}
{{- if .Values.nextcloud.publicIssuerUrl }}
{{- .Values.nextcloud.publicIssuerUrl }}
{{- else }}
{{- .Values.nextcloud.host }}
{{- end }}
{{- end }}
{{/*
Return the MCP server URL for OAuth callbacks
If not specified:
- Uses ingress host if ingress is enabled
- Otherwise defaults to http://localhost:8000 (for port-forward setups)
*/}}
{{- define "nextcloud-mcp-server.mcpServerUrl" -}}
{{- if .Values.nextcloud.mcpServerUrl }}
{{- .Values.nextcloud.mcpServerUrl }}
{{- else if .Values.ingress.enabled }}
{{- $host := index .Values.ingress.hosts 0 }}
{{- if .Values.ingress.tls }}
{{- printf "https://%s" $host.host }}
{{- else }}
{{- printf "http://%s" $host.host }}
{{- end }}
{{- else }}
{{- printf "http://localhost:%d" (int .Values.mcp.port) }}
{{- end }}
{{- end }}
@@ -0,0 +1,189 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "nextcloud-mcp-server.fullname" . }}
labels:
{{- include "nextcloud-mcp-server.labels" . | nindent 4 }}
spec:
{{- if not .Values.autoscaling.enabled }}
replicas: {{ .Values.replicaCount }}
{{- end }}
selector:
matchLabels:
{{- include "nextcloud-mcp-server.selectorLabels" . | nindent 6 }}
template:
metadata:
annotations:
checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}
{{- with .Values.podAnnotations }}
{{- toYaml . | nindent 8 }}
{{- end }}
labels:
{{- include "nextcloud-mcp-server.labels" . | nindent 8 }}
{{- with .Values.podLabels }}
{{- toYaml . | nindent 8 }}
{{- end }}
spec:
{{- with .Values.imagePullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
serviceAccountName: {{ include "nextcloud-mcp-server.serviceAccountName" . }}
securityContext:
{{- toYaml .Values.podSecurityContext | nindent 8 }}
{{- with .Values.initContainers }}
initContainers:
{{- toYaml . | nindent 8 }}
{{- end }}
containers:
- name: {{ .Chart.Name }}
securityContext:
{{- toYaml .Values.securityContext | nindent 12 }}
image: "{{ .Values.image.repository }}:{{ include "nextcloud-mcp-server.imageTag" . }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
args:
- "--transport"
- "{{ .Values.mcp.transport }}"
{{- if eq .Values.auth.mode "oauth" }}
- "--oauth"
- "--port"
- "{{ .Values.auth.oauth.port }}"
- "--oauth-token-type"
- "{{ .Values.auth.oauth.tokenType }}"
{{- end }}
ports:
- name: http
containerPort: {{ include "nextcloud-mcp-server.port" . }}
protocol: TCP
env:
# Nextcloud connection
- name: NEXTCLOUD_HOST
value: {{ .Values.nextcloud.host | quote }}
{{- if eq .Values.auth.mode "basic" }}
# Basic auth mode
- name: NEXTCLOUD_USERNAME
valueFrom:
secretKeyRef:
name: {{ include "nextcloud-mcp-server.basicAuthSecretName" . }}
key: {{ .Values.auth.basic.usernameKey }}
- name: NEXTCLOUD_PASSWORD
valueFrom:
secretKeyRef:
name: {{ include "nextcloud-mcp-server.basicAuthSecretName" . }}
key: {{ .Values.auth.basic.passwordKey }}
{{- else if eq .Values.auth.mode "oauth" }}
# OAuth mode
- name: NEXTCLOUD_MCP_SERVER_URL
value: {{ include "nextcloud-mcp-server.mcpServerUrl" . | quote }}
- name: NEXTCLOUD_PUBLIC_ISSUER_URL
value: {{ include "nextcloud-mcp-server.publicIssuerUrl" . | quote }}
- name: NEXTCLOUD_OIDC_CLIENT_STORAGE
value: "/app/.oauth/nextcloud_oauth_client.json"
- name: NEXTCLOUD_OIDC_SCOPES
value: {{ .Values.auth.oauth.scopes | quote }}
{{- if .Values.auth.oauth.clientId }}
- name: NEXTCLOUD_OIDC_CLIENT_ID
valueFrom:
secretKeyRef:
name: {{ include "nextcloud-mcp-server.oauthSecretName" . }}
key: {{ .Values.auth.oauth.clientIdKey }}
- name: NEXTCLOUD_OIDC_CLIENT_SECRET
valueFrom:
secretKeyRef:
name: {{ include "nextcloud-mcp-server.oauthSecretName" . }}
key: {{ .Values.auth.oauth.clientSecretKey }}
{{- end }}
{{- end }}
{{- if .Values.documentProcessing.enabled }}
# Document processing
- name: ENABLE_DOCUMENT_PROCESSING
value: {{ .Values.documentProcessing.enabled | quote }}
- name: DOCUMENT_PROCESSOR
value: {{ .Values.documentProcessing.defaultProcessor | quote }}
- name: PROGRESS_INTERVAL
value: {{ .Values.documentProcessing.progressInterval | quote }}
{{- if .Values.documentProcessing.unstructured.enabled }}
- name: ENABLE_UNSTRUCTURED
value: "true"
- name: UNSTRUCTURED_API_URL
value: {{ .Values.documentProcessing.unstructured.apiUrl | quote }}
- name: UNSTRUCTURED_TIMEOUT
value: {{ .Values.documentProcessing.unstructured.timeout | quote }}
- name: UNSTRUCTURED_STRATEGY
value: {{ .Values.documentProcessing.unstructured.strategy | quote }}
- name: UNSTRUCTURED_LANGUAGES
value: {{ .Values.documentProcessing.unstructured.languages | quote }}
{{- end }}
{{- if .Values.documentProcessing.tesseract.enabled }}
- name: ENABLE_TESSERACT
value: "true"
{{- if .Values.documentProcessing.tesseract.cmd }}
- name: TESSERACT_CMD
value: {{ .Values.documentProcessing.tesseract.cmd | quote }}
{{- end }}
- name: TESSERACT_LANG
value: {{ .Values.documentProcessing.tesseract.lang | quote }}
{{- end }}
{{- if .Values.documentProcessing.custom.enabled }}
- name: ENABLE_CUSTOM_PROCESSOR
value: "true"
- name: CUSTOM_PROCESSOR_NAME
value: {{ .Values.documentProcessing.custom.name | quote }}
- name: CUSTOM_PROCESSOR_URL
value: {{ .Values.documentProcessing.custom.url | quote }}
{{- if .Values.documentProcessing.custom.apiKey }}
- name: CUSTOM_PROCESSOR_API_KEY
value: {{ .Values.documentProcessing.custom.apiKey | quote }}
{{- end }}
- name: CUSTOM_PROCESSOR_TIMEOUT
value: {{ .Values.documentProcessing.custom.timeout | quote }}
- name: CUSTOM_PROCESSOR_TYPES
value: {{ .Values.documentProcessing.custom.types | quote }}
{{- end }}
{{- end }}
{{- with .Values.extraEnv }}
{{- toYaml . | nindent 12 }}
{{- end }}
{{- with .Values.extraEnvFrom }}
envFrom:
{{- toYaml . | nindent 12 }}
{{- end }}
livenessProbe:
{{- toYaml .Values.livenessProbe | nindent 12 }}
readinessProbe:
{{- toYaml .Values.readinessProbe | nindent 12 }}
resources:
{{- toYaml .Values.resources | nindent 12 }}
volumeMounts:
- name: tmp
mountPath: /tmp
{{- if and (eq .Values.auth.mode "oauth") .Values.auth.oauth.persistence.enabled }}
- name: oauth-storage
mountPath: /app/.oauth
{{- end }}
{{- with .Values.volumeMounts }}
{{- toYaml . | nindent 12 }}
{{- end }}
volumes:
- name: tmp
emptyDir: {}
{{- if and (eq .Values.auth.mode "oauth") .Values.auth.oauth.persistence.enabled }}
- name: oauth-storage
persistentVolumeClaim:
claimName: {{ include "nextcloud-mcp-server.oauthPvcName" . }}
{{- end }}
{{- with .Values.volumes }}
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.affinity }}
affinity:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
{{- end }}
@@ -0,0 +1,32 @@
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: {{ include "nextcloud-mcp-server.fullname" . }}
labels:
{{- include "nextcloud-mcp-server.labels" . | nindent 4 }}
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: {{ include "nextcloud-mcp-server.fullname" . }}
minReplicas: {{ .Values.autoscaling.minReplicas }}
maxReplicas: {{ .Values.autoscaling.maxReplicas }}
metrics:
{{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
{{- end }}
{{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
{{- end }}
{{- end }}
@@ -0,0 +1,61 @@
{{- if .Values.ingress.enabled -}}
{{- $fullName := include "nextcloud-mcp-server.fullname" . -}}
{{- $svcPort := .Values.service.port -}}
{{- if and .Values.ingress.className (not (semverCompare ">=1.18-0" .Capabilities.KubeVersion.GitVersion)) }}
{{- if not (hasKey .Values.ingress.annotations "kubernetes.io/ingress.class") }}
{{- $_ := set .Values.ingress.annotations "kubernetes.io/ingress.class" .Values.ingress.className}}
{{- end }}
{{- end }}
{{- if semverCompare ">=1.19-0" .Capabilities.KubeVersion.GitVersion -}}
apiVersion: networking.k8s.io/v1
{{- else if semverCompare ">=1.14-0" .Capabilities.KubeVersion.GitVersion -}}
apiVersion: networking.k8s.io/v1beta1
{{- else -}}
apiVersion: extensions/v1beta1
{{- end }}
kind: Ingress
metadata:
name: {{ $fullName }}
labels:
{{- include "nextcloud-mcp-server.labels" . | nindent 4 }}
{{- with .Values.ingress.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
{{- if and .Values.ingress.className (semverCompare ">=1.18-0" .Capabilities.KubeVersion.GitVersion) }}
ingressClassName: {{ .Values.ingress.className }}
{{- end }}
{{- if .Values.ingress.tls }}
tls:
{{- range .Values.ingress.tls }}
- hosts:
{{- range .hosts }}
- {{ . | quote }}
{{- end }}
secretName: {{ .secretName }}
{{- end }}
{{- end }}
rules:
{{- range .Values.ingress.hosts }}
- host: {{ .host | quote }}
http:
paths:
{{- range .paths }}
- path: {{ .path }}
{{- if and .pathType (semverCompare ">=1.18-0" $.Capabilities.KubeVersion.GitVersion) }}
pathType: {{ .pathType }}
{{- end }}
backend:
{{- if semverCompare ">=1.19-0" $.Capabilities.KubeVersion.GitVersion }}
service:
name: {{ $fullName }}
port:
number: {{ $svcPort }}
{{- else }}
serviceName: {{ $fullName }}
servicePort: {{ $svcPort }}
{{- end }}
{{- end }}
{{- end }}
{{- end }}
@@ -0,0 +1,17 @@
{{- if and (eq .Values.auth.mode "oauth") .Values.auth.oauth.persistence.enabled (not .Values.auth.oauth.persistence.existingClaim) }}
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: {{ include "nextcloud-mcp-server.fullname" . }}-oauth-storage
labels:
{{- include "nextcloud-mcp-server.labels" . | nindent 4 }}
spec:
accessModes:
- {{ .Values.auth.oauth.persistence.accessMode }}
{{- if .Values.auth.oauth.persistence.storageClass }}
storageClassName: {{ .Values.auth.oauth.persistence.storageClass }}
{{- end }}
resources:
requests:
storage: {{ .Values.auth.oauth.persistence.size }}
{{- end }}
@@ -0,0 +1,29 @@
{{- if eq .Values.auth.mode "basic" }}
{{- if not .Values.auth.basic.existingSecret }}
apiVersion: v1
kind: Secret
metadata:
name: {{ include "nextcloud-mcp-server.fullname" . }}-basic-auth
labels:
{{- include "nextcloud-mcp-server.labels" . | nindent 4 }}
type: Opaque
data:
{{ .Values.auth.basic.usernameKey }}: {{ .Values.auth.basic.username | b64enc | quote }}
{{ .Values.auth.basic.passwordKey }}: {{ .Values.auth.basic.password | b64enc | quote }}
{{- end }}
{{- end }}
---
{{- if eq .Values.auth.mode "oauth" }}
{{- if and .Values.auth.oauth.clientId (not .Values.auth.oauth.existingSecret) }}
apiVersion: v1
kind: Secret
metadata:
name: {{ include "nextcloud-mcp-server.fullname" . }}-oauth
labels:
{{- include "nextcloud-mcp-server.labels" . | nindent 4 }}
type: Opaque
data:
{{ .Values.auth.oauth.clientIdKey }}: {{ .Values.auth.oauth.clientId | b64enc | quote }}
{{ .Values.auth.oauth.clientSecretKey }}: {{ .Values.auth.oauth.clientSecret | b64enc | quote }}
{{- end }}
{{- end }}
@@ -0,0 +1,19 @@
apiVersion: v1
kind: Service
metadata:
name: {{ include "nextcloud-mcp-server.fullname" . }}
labels:
{{- include "nextcloud-mcp-server.labels" . | nindent 4 }}
{{- with .Values.service.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
type: {{ .Values.service.type }}
ports:
- port: {{ .Values.service.port }}
targetPort: http
protocol: TCP
name: http
selector:
{{- include "nextcloud-mcp-server.selectorLabels" . | nindent 4 }}
@@ -0,0 +1,13 @@
{{- if .Values.serviceAccount.create -}}
apiVersion: v1
kind: ServiceAccount
metadata:
name: {{ include "nextcloud-mcp-server.serviceAccountName" . }}
labels:
{{- include "nextcloud-mcp-server.labels" . | nindent 4 }}
{{- with .Values.serviceAccount.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
automountServiceAccountToken: {{ .Values.serviceAccount.automount }}
{{- end }}
+268
View File
@@ -0,0 +1,268 @@
# Default values for nextcloud-mcp-server
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
# Number of replicas
replicaCount: 1
image:
repository: ghcr.io/cbcoutinho/nextcloud-mcp-server
pullPolicy: IfNotPresent
# Overrides the image tag whose default is the chart appVersion.
tag: ""
imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""
# Nextcloud connection settings
nextcloud:
# URL of your Nextcloud instance (required)
# Example: https://cloud.example.com
host: ""
# MCP server URL for OAuth callbacks (OAuth mode only)
# If not specified, will be constructed from ingress.hosts[0] if ingress is enabled,
# or defaults to http://localhost:8000 (suitable for port-forward setups)
# Example: https://mcp.example.com
mcpServerUrl: ""
# Public issuer URL for OAuth (OAuth mode only)
# If not specified, defaults to nextcloud.host
# Only set this if your Nextcloud is accessible at a different URL for OAuth
# Example: https://cloud.example.com
publicIssuerUrl: ""
# Authentication configuration
# Choose either basic auth OR oauth (not both)
auth:
# Authentication mode: "basic" or "oauth"
# basic: Uses username/password (recommended for most users)
# oauth: Uses OAuth2/OIDC (experimental, requires patches)
mode: basic
# Basic authentication settings
basic:
# Nextcloud username (ignored if existingSecret is set)
username: ""
# Nextcloud password or app password (recommended) (ignored if existingSecret is set)
password: ""
# Use existing secret instead of creating one
# If set, username and password above are ignored
# Secret must contain keys specified in usernameKey and passwordKey
# Example:
# kubectl create secret generic my-nextcloud-creds \
# --from-literal=username=myuser \
# --from-literal=password=mypassword
existingSecret: ""
# Keys in the existing secret
usernameKey: "username"
passwordKey: "password"
# OAuth2/OIDC settings (experimental)
oauth:
# Port for OAuth MCP server (default: 8001)
port: 8001
# OAuth token type: "jwt" or "opaque"
tokenType: "jwt"
# Pre-registered OAuth client ID (optional, ignored if existingSecret is set)
# If not provided and no existingSecret, will use Dynamic Client Registration (DCR)
clientId: ""
# Pre-registered OAuth client secret (optional, ignored if existingSecret is set)
clientSecret: ""
# OAuth scopes to request (space-separated)
scopes: "openid profile email notes:read notes:write calendar:read calendar:write contacts:read contacts:write cookbook:read cookbook:write deck:read deck:write tables:read tables:write files:read files:write sharing:read sharing:write todo:read todo:write"
# Use existing secret for OAuth client credentials
# If set, clientId and clientSecret above are ignored
# Secret must contain keys specified in clientIdKey and clientSecretKey
# Example:
# kubectl create secret generic my-oauth-creds \
# --from-literal=clientId=my-client-id \
# --from-literal=clientSecret=my-client-secret
existingSecret: ""
# Keys in the existing secret
clientIdKey: "clientId"
clientSecretKey: "clientSecret"
# Persistent storage for OAuth client credentials
persistence:
enabled: true
# Storage class (leave empty for default)
storageClass: ""
accessMode: ReadWriteOnce
size: 100Mi
# Use existing PVC
existingClaim: ""
# MCP server configuration
mcp:
# Transport mode (default: streamable-http for SSE)
transport: "streamable-http"
# Port for basic auth mode
port: 8000
# Document processing configuration (optional)
documentProcessing:
# Enable document processing (PDF, DOCX, images, etc.)
enabled: false
# Default processor: unstructured, tesseract, or custom
defaultProcessor: "unstructured"
# Progress reporting interval in seconds
progressInterval: 10
# Unstructured.io processor
unstructured:
enabled: false
# Unstructured API endpoint
apiUrl: "http://unstructured:8000"
# Request timeout in seconds
timeout: 120
# Parsing strategy: auto, fast, or hi_res
strategy: "auto"
# OCR languages (comma-separated ISO 639-3 codes)
languages: "eng,deu"
# Tesseract processor (local OCR)
tesseract:
enabled: false
# Path to tesseract executable (optional, auto-detected if in PATH)
cmd: ""
# OCR language (e.g., eng, deu, eng+deu for multiple)
lang: "eng"
# Custom processor
custom:
enabled: false
# Unique name for your processor
name: "my_ocr"
# Custom processor API endpoint
url: ""
# Optional API key for authentication
apiKey: ""
# Request timeout in seconds
timeout: 60
# Comma-separated MIME types your processor supports
types: "application/pdf,image/jpeg,image/png"
serviceAccount:
# Specifies whether a service account should be created
create: true
# Automatically mount a ServiceAccount's API credentials?
automount: true
# Annotations to add to the service account
annotations: {}
# The name of the service account to use.
# If not set and create is true, a name is generated using the fullname template
name: ""
podAnnotations: {}
podLabels: {}
podSecurityContext:
fsGroup: 2000
securityContext:
capabilities:
drop:
- ALL
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
service:
type: ClusterIP
port: 8000
# For OAuth mode, you may want to expose both ports
oauthPort: 8001
annotations: {}
ingress:
enabled: false
className: ""
annotations: {}
# kubernetes.io/ingress.class: nginx
# kubernetes.io/tls-acme: "true"
# cert-manager.io/cluster-issuer: letsencrypt-prod
hosts:
- host: mcp.example.com
paths:
- path: /
pathType: Prefix
tls: []
# - secretName: nextcloud-mcp-tls
# hosts:
# - mcp.example.com
resources:
# We recommend setting resource requests and limits
limits:
cpu: 1000m
memory: 512Mi
requests:
cpu: 100m
memory: 128Mi
# Liveness probe configuration
# Checks if the application process is running
livenessProbe:
httpGet:
path: /health/live
port: http
scheme: HTTP
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
# Readiness probe configuration
# Checks if the application is ready to serve traffic
readinessProbe:
httpGet:
path: /health/ready
port: http
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
# Autoscaling configuration
autoscaling:
enabled: false
minReplicas: 1
maxReplicas: 10
targetCPUUtilizationPercentage: 80
# targetMemoryUtilizationPercentage: 80
# Additional volumes on the output Deployment definition.
volumes: []
# - name: foo
# secret:
# secretName: mysecret
# optional: false
# Additional volumeMounts on the output Deployment definition.
volumeMounts: []
# - name: foo
# mountPath: "/etc/foo"
# readOnly: true
nodeSelector: {}
tolerations: []
affinity: {}
# Init containers
initContainers: []
# Additional environment variables
extraEnv: []
# - name: CUSTOM_VAR
# value: "custom_value"
# Additional environment variables from ConfigMaps or Secrets
extraEnvFrom: []
# - configMapRef:
# name: my-configmap
# - secretRef:
# name: my-secret
+11 -1
View File
@@ -21,7 +21,7 @@ services:
restart: always
app:
image: docker.io/library/nextcloud:32.0.0@sha256:f9bec5c77a8d5603354b990550a4d24487deae6e589dd20ce870e43e28460e18
image: docker.io/library/nextcloud:32.0.1@sha256:42a36b4711191273a9cf8cebfd35602909eb1bee461b7076d4d5a57f7ec2b81e
restart: always
ports:
- 0.0.0.0:8080:80
@@ -51,6 +51,16 @@ services:
- ./tests/fixtures/test_recipe.html:/usr/share/nginx/html/test_recipe.html:ro
- ./tests/fixtures/nginx.conf:/etc/nginx/nginx.conf:ro
unstructured:
image: downloads.unstructured.io/unstructured-io/unstructured-api:latest@sha256:a43ab55898599157fb0e0e097dabb8ecdd1d8e3df1ae5b67c6e15a136b171a6c
restart: always
ports:
- 127.0.0.1:8002:8000
# Unstructured API runs on port 8000 internally
# We expose it on 8002 externally to avoid conflict
profiles:
- unstructured
mcp:
build: .
command: ["--transport", "streamable-http"]
+542 -114
View File
@@ -8,166 +8,463 @@ The Nextcloud MCP Server acts as an **OAuth 2.0 Resource Server**, protecting ac
## Architecture Diagram
The complete OAuth flow includes server startup (with DCR), client discovery (with PRM), authorization (with PKCE), and API access phases:
```
═══════════════════════════════════════════════════════════════════════════════════
Phase 0: MCP Server Startup & Client Registration (DCR - RFC 7591)
═══════════════════════════════════════════════════════════════════════════════════
┌──────────────────┐ ┌─────────────────┐
│ MCP Server │ │ Nextcloud │
│ (Resource │ │ (OIDC Provider)│
│ Server) │ │ │
└────────┬─────────┘ └────────┬────────┘
│ │
│ 0a. OIDC Discovery │
├────────────────────────────────────>│
│ GET │
| /.well-known/openid-configuration │
│ │
│ 0b. Discovery response │
│<────────────────────────────────────┤
│ {issuer, endpoints, PKCE methods} │
│ │
│ 0c. Register OAuth client (DCR) │
├────────────────────────────────────>│
│ POST /apps/oidc/register │
│ {client_name, redirect_uris, │
│ scopes, token_type} │
│ │
│ 0d. Client credentials │
│<────────────────────────────────────┤
│ {client_id, client_secret} │
│ → Saved to .nextcloud_oauth_*.json │
│ │
│ ✓ Server ready for connections │
═══════════════════════════════════════════════════════════════════════════════════
Phase 1: Client Connection & Discovery (PRM - RFC 9728)
═══════════════════════════════════════════════════════════════════════════════════
┌─────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ │ │ │ │
│ MCP Client │ │ MCP Server │ │ Nextcloud
│ (Claude, │ │ (Resource │ │ Instance
│ etc.) │ │ Server) │ │ │
│ │ │ MCP Server │ │ Nextcloud
│ MCP Client │ │ (Resource │ │ Instance
│ (Claude) │ │ Server) │ │
│ │ │ │ │ │
└──────┬──────┘ └────────┬─────────┘ └────────┬────────┘
│ │ │
│ │
│ 1. Connect to MCP │ │
1a. Connect to MCP │ │
├─────────────────────────────────>│ │
│ │ │
2. Return auth settings │ │
│ (issuer_url, scopes) │ │
1b. Return auth settings │ │
│<─────────────────────────────────┤ │
│ {issuer_url, resource_url} │ │
│ │ │
│ │
│ 3. Start OAuth flow (with PKCE) │ │
├──────────────────────────────────┼────────────────────────────────────>│
│ │ /apps/oidc/authorize │
│ │ │
│ 4. User authenticates in browser│ │
│<─────────────────────────────────┼─────────────────────────────────────┤
│ │ │
│ 5. Authorization code (redirect)│ │
│<─────────────────────────────────┤ │
│ │ │
│ 6. Exchange code for token │ │
├──────────────────────────────────┼────────────────────────────────────>│
│ │ /apps/oidc/token │
│ │ │
│ 7. Access token │ │
│<─────────────────────────────────┼─────────────────────────────────────┤
│ │ │
│ │ │
│ 8. API request with Bearer token│ │
1c. PRM Discovery (RFC 9728) │ │
├─────────────────────────────────>│ │
Authorization: Bearer xxx │ │
GET /.well-known/oauth- │ │
│ protected-resource/mcp │ │
│ │ │
│ 9. Validate token via userinfo
│ ├────────────────────────────────────>│
│ │ /apps/oidc/userinfo │
│ │ │
│ │ 10. User info (token valid) │
│ │<────────────────────────────────────┤
│ │ │
│ │ 11. Nextcloud API request │
│ ├────────────────────────────────────>│
│ │ Authorization: Bearer xxx │
│ │ (Notes, Calendar, etc.) │
│ │ │
│ │ 12. API response │
│ │<────────────────────────────────────┤
│ │ │
│ 13. MCP tool response │ │
1d. PRM response (scopes!) │
│<─────────────────────────────────┤ │
│ {resource, scopes_supported, │ ← Dynamically discovered from │
│ authorization_servers} │ @require_scopes decorators │
│ │ │
═══════════════════════════════════════════════════════════════════════════════════
Phase 2: OAuth Authorization Flow (PKCE - RFC 7636)
═══════════════════════════════════════════════════════════════════════════════════
│ │ │
│ 2a. Generate PKCE challenge │ │
│ code_verifier = random(43-128) │ │
│ code_challenge = SHA256(verif.) │ │
│ │ │
│ 2b. Authorization request │ │
├──────────────────────────────────┼────────────────────────────────────>│
│ /apps/oidc/authorize? │ │
│ client_id=xxx │ │
│ &code_challenge=abc... │ │
│ &code_challenge_method=S256 │ │
│ &scope=openid notes:read ... │ │
│ │ │
│ 2c. User consent page │ │
│<─────────────────────────────────┼─────────────────────────────────────┤
│ (Browser: Select scopes) │ │
│ │ │
│ 2d. User grants scopes │ │
├──────────────────────────────────┼────────────────────────────────────>│
│ │ │
│ 2e. Authorization code redirect │ │
│<─────────────────────────────────┼─────────────────────────────────────┤
│ callback?code=xyz123 │ │
│ │ │
│ 2f. Exchange code for token │ │
├──────────────────────────────────┼────────────────────────────────────>│
│ POST /apps/oidc/token │ │
│ {code, code_verifier, │ ← Validates PKCE challenge │
│ client_id, client_secret} │ │
│ │ │
│ 2g. Access token (JWT/opaque) │ │
│<─────────────────────────────────┼─────────────────────────────────────┤
│ {access_token, token_type, │ │
│ scope: "openid notes:read...") │ ← User's granted scopes │
│ │ │
═══════════════════════════════════════════════════════════════════════════════════
Phase 3: MCP Tool Access (Scope-based Authorization)
═══════════════════════════════════════════════════════════════════════════════════
│ │ │
│ 3a. list_tools request │ │
├─────────────────────────────────>│ │
│ Authorization: Bearer <token> │ │
│ │ │
│ │ 3b. Validate token │
│ ├────────────────────────────────────>│
│ │ GET /apps/oidc/userinfo │
│ │ Authorization: Bearer <token> │
│ │ │
│ │ 3c. Token valid + scopes │
│ │<────────────────────────────────────┤
│ │ {sub, scopes, ...} │
│ │ ← Cached for 1 hour │
│ │ │
│ 3d. Filtered tool list │ │
│<─────────────────────────────────┤ ← Only tools matching user's │
│ [tools matching token scopes] │ token scopes (via @require_scopes)
│ │ │
│ 3e. Call tool │ │
├─────────────────────────────────>│ │
│ nc_notes_get_note(note_id=1) │ ← @require_scopes("notes:read") │
│ Authorization: Bearer <token> │ │
│ │ │
│ │ 3f. Scope check PASSED │
│ │ ✓ Token has notes:read │
│ │ │
│ │ 3g. Nextcloud API call │
│ ├────────────────────────────────────>│
│ │ GET /apps/notes/api/v1/notes/1 │
│ │ Authorization: Bearer <token> │
│ │ ← user_oidc validates Bearer token │
│ │ │
│ │ 3h. API response │
│ │<────────────────────────────────────┤
│ │ {id: 1, title: "Note", ...} │
│ │ │
│ 3i. MCP tool response │ │
│<─────────────────────────────────┤ │
│ {note data} │ │
│ │ │
═══════════════════════════════════════════════════════════════════════════════════
Insufficient Scope Example (Step-Up Authorization)
═══════════════════════════════════════════════════════════════════════════════════
│ 4a. Call write tool │ │
├─────────────────────────────────>│ │
│ nc_notes_create_note(...) │ ← @require_scopes("notes:write") │
│ Authorization: Bearer <token> │ │
│ │ │
│ │ 4b. Scope check FAILED │
│ │ ✗ Token only has notes:read │
│ │ │
│ 4c. 403 Insufficient Scope │ │
│<─────────────────────────────────┤ │
│ WWW-Authenticate: Bearer │ │
│ error="insufficient_scope", │ │
│ scope="notes:write", │ │
│ resource_metadata="..." │ │
│ │ │
│ → Client can re-authorize with │ │
│ additional scopes (Step-Up) │ │
│ │ │
```
## Components
### 1. MCP Client
- Any MCP-compatible client (Claude Desktop, Claude Code, custom clients)
### 1. MCP Client (e.g., Claude Desktop, Claude Code)
**Capabilities**:
- Discovers OAuth configuration via MCP server
- Queries PRM endpoint for supported scopes
- Initiates OAuth flow with PKCE (Proof Key for Code Exchange)
- Stores and sends access token with each request
- **Example**: Claude Desktop, Claude Code
- Handles scope-based tool filtering
- Supports step-up authorization (re-auth for additional scopes)
### 2. MCP Server (Resource Server)
- **Role**: OAuth 2.0 Resource Server
- **Location**: This Nextcloud MCP Server implementation
- **Responsibilities**:
- Validates Bearer tokens by calling Nextcloud's userinfo endpoint
- Caches validated tokens (default: 1 hour TTL)
- Creates authenticated Nextcloud client instances per-user
- Enforces PKCE requirements (S256 code challenge method)
- Exposes Nextcloud functionality via MCP tools
**Examples**: Claude Desktop, Claude Code, MCP Inspector, custom MCP clients
### 2. MCP Server (Resource Server - This Implementation)
**Role**: OAuth 2.0 Resource Server (RFC 6749)
**Responsibilities**:
#### Startup Phase
- **OIDC Discovery**: Queries `/.well-known/openid-configuration` for OAuth endpoints
- **PKCE Validation**: Verifies server advertises S256 code challenge method
- **Dynamic Client Registration (DCR)**: Automatically registers OAuth client via `/apps/oidc/register` (RFC 7591)
- Or loads pre-configured client credentials
- Saves credentials to `.nextcloud_oauth_client.json`
- **Tool Registration**: Loads all MCP tools with their `@require_scopes` decorators
#### Client Connection Phase
- **Auth Settings**: Returns OAuth issuer URL and resource URL
- **PRM Endpoint**: Exposes `/.well-known/oauth-protected-resource/mcp` (RFC 9728)
- Dynamically discovers scopes from all registered tools
- Returns `scopes_supported` list based on `@require_scopes` decorators
#### Request Processing Phase
- **Token Validation**: Validates Bearer tokens via Nextcloud userinfo endpoint
- Supports both JWT and opaque tokens
- Caches validation results (1-hour TTL)
- Extracts user identity and granted scopes
- **Scope Enforcement**:
- Filters `list_tools` based on user's token scopes
- Validates scopes before executing each tool
- Returns 403 with `WWW-Authenticate` header for insufficient scopes
- **Per-User Clients**: Creates authenticated `NextcloudClient` instance per user
- Uses Bearer token for all Nextcloud API requests
- User-specific permissions and audit trails
**Key Files**:
- [`app.py`](../nextcloud_mcp_server/app.py) - OAuth mode detection and configuration
- [`auth/token_verifier.py`](../nextcloud_mcp_server/auth/token_verifier.py) - Token validation logic
- [`app.py`](../nextcloud_mcp_server/app.py) - OAuth mode, DCR, PRM endpoint
- [`auth/token_verifier.py`](../nextcloud_mcp_server/auth/token_verifier.py) - Token validation (userinfo + introspection + JWT)
- [`auth/context_helper.py`](../nextcloud_mcp_server/auth/context_helper.py) - Per-user client creation
- [`auth/scope_authorization.py`](../nextcloud_mcp_server/auth/scope_authorization.py) - `@require_scopes` decorator, scope discovery
- [`auth/client_registration.py`](../nextcloud_mcp_server/auth/client_registration.py) - DCR implementation (RFC 7591)
### 3. Nextcloud OIDC Apps
#### a) `oidc` - OIDC Identity Provider
- **Role**: OAuth 2.0 Authorization Server
- **Location**: Nextcloud app (`apps/oidc`)
- **Endpoints**:
- `/.well-known/openid-configuration` - Discovery endpoint
- `/apps/oidc/authorize` - Authorization endpoint
- `/apps/oidc/token` - Token endpoint
- `/apps/oidc/userinfo` - User info endpoint (token validation)
- `/apps/oidc/jwks` - JSON Web Key Set
- `/apps/oidc/register` - Dynamic client registration
**Role**: OAuth 2.0 Authorization Server + OIDC Provider
**Location**: Nextcloud app (`apps/oidc`)
**Endpoints**:
- `/.well-known/openid-configuration` - OIDC Discovery (RFC 8414)
- `/apps/oidc/authorize` - Authorization endpoint (OAuth 2.0 + PKCE)
- `/apps/oidc/token` - Token endpoint (issues JWT or opaque tokens)
- `/apps/oidc/userinfo` - UserInfo endpoint (OIDC Core, used for token validation)
- `/apps/oidc/jwks` - JSON Web Key Set (for JWT signature verification)
- `/apps/oidc/register` - Dynamic Client Registration endpoint (RFC 7591)
- `/apps/oidc/introspect` - Token Introspection endpoint (RFC 7662, optional)
**Token Types**:
- **JWT tokens**: Self-contained tokens with embedded scopes, validated via JWKS or userinfo
- **Opaque tokens**: Random strings, validated via userinfo or introspection endpoint
**Configuration**:
```bash
# Enable dynamic client registration (optional)
# Settings → OIDC → "Allow dynamic client registration"
# Enable dynamic client registration (recommended for development)
# Nextcloud Admin → Settings → OIDC → "Allow dynamic client registration"
# Enable token introspection (optional, for opaque token validation)
# Nextcloud Admin → Settings → OIDC → "Enable token introspection"
```
#### b) `user_oidc` - OpenID Connect User Backend
- **Role**: Bearer token validation middleware
- **Location**: Nextcloud app (`apps/user_oidc`)
- **Responsibilities**:
- Validates Bearer tokens for Nextcloud API requests
- Creates user sessions from valid Bearer tokens
- Integrates with Nextcloud's authentication system
**Role**: Bearer token validation middleware for Nextcloud APIs
**Location**: Nextcloud app (`apps/user_oidc`)
**Responsibilities**:
- Intercepts Nextcloud API requests with `Authorization: Bearer` header
- Validates tokens against OIDC provider (`oidc` app)
- Creates authenticated user sessions
- Enforces user-specific permissions on API requests
**Configuration**:
```bash
# Enable Bearer token validation (required)
# Enable Bearer token validation (required for OAuth mode)
php occ config:system:set user_oidc oidc_provider_bearer_validation --value=true --type=boolean
```
> [!IMPORTANT]
> The `user_oidc` app requires a patch to properly support Bearer token authentication for non-OCS endpoints. See [Upstream Status](oauth-upstream-status.md) for details.
> The `user_oidc` app requires a patch to properly support Bearer token authentication for non-OCS endpoints (like Notes API, Calendar API). See [Upstream Status](oauth-upstream-status.md) for patch details and PR status.
### 4. Nextcloud Instance
- **Role**: Resource Owner / API Provider
- **Provides**: Notes, Calendar, Contacts, Deck, Files, etc.
**Role**: Resource Owner + API Provider
**APIs Exposed**:
- **Notes API**: `/apps/notes/api/v1/` - Note CRUD operations
- **Calendar (CalDAV)**: `/remote.php/dav/calendars/` - Events and todos
- **Contacts (CardDAV)**: `/remote.php/dav/addressbooks/` - Contact management
- **Cookbook API**: `/apps/cookbook/api/v1/` - Recipe management
- **Deck API**: `/apps/deck/api/v1.0/` - Kanban boards
- **Tables API**: `/apps/tables/api/2/` - Table row operations
- **WebDAV (Files)**: `/remote.php/dav/files/` - File operations
- **Sharing API**: `/ocs/v2.php/apps/files_sharing/api/v1/` - Share management
## Authentication Flow
### Phase 1: OAuth Authorization (Steps 1-7)
The OAuth flow consists of four distinct phases (see diagram above for visual representation):
1. **Client Connects**: MCP client connects to MCP server
2. **Auth Settings**: MCP server returns OAuth settings:
```json
{
"issuer_url": "https://nextcloud.example.com",
"resource_server_url": "http://localhost:8000",
"required_scopes": ["openid", "profile"]
}
```
3. **OAuth Flow**: Client initiates OAuth flow with PKCE
- Generates `code_verifier` (random string)
- Calculates `code_challenge` = SHA256(code_verifier)
- Redirects user to `/apps/oidc/authorize` with `code_challenge`
4. **User Authentication**: User logs in to Nextcloud via browser
5. **Authorization Code**: Nextcloud redirects back with authorization code
6. **Token Exchange**: Client exchanges code for access token
- Sends `code` + `code_verifier` to `/apps/oidc/token`
- OIDC app validates PKCE challenge
7. **Access Token**: Client receives access token (JWT or opaque)
### Phase 0: MCP Server Startup (One-time Setup)
### Phase 2: API Access (Steps 8-13)
**Happens**: On MCP server first startup
8. **API Request**: Client sends MCP request with Bearer token
9. **Token Validation**: MCP server validates token:
- Checks cache (1-hour TTL by default)
- If not cached, calls `/apps/oidc/userinfo` with Bearer token
- Extracts username from `sub` or `preferred_username` claim
10. **User Info**: Nextcloud returns user info if token is valid
11. **Nextcloud API Call**: MCP server calls Nextcloud API on behalf of user
- Creates `NextcloudClient` instance with Bearer token
- User-specific permissions apply
12. **API Response**: Nextcloud returns data
13. **MCP Response**: MCP server returns formatted response to client
**Steps**:
1. **OIDC Discovery** (`GET /.well-known/openid-configuration`)
- MCP server queries Nextcloud for OAuth endpoints
- Validates PKCE support (requires `S256` code challenge method)
- Extracts endpoints: authorize, token, userinfo, jwks, register
2. **Dynamic Client Registration** (`POST /apps/oidc/register`)
- If no pre-configured client credentials exist
- MCP server registers itself as OAuth client (RFC 7591)
- Provides: client name, redirect URIs, requested scopes, token type
- Receives: `client_id`, `client_secret`
- Saves credentials to `.nextcloud_oauth_client.json`
3. **Tool Registration**
- All MCP tools loaded with their `@require_scopes` decorators
- Scope metadata stored for later discovery
**Result**: MCP server ready to accept client connections
### Phase 1: Client Discovery (Per MCP Client Connection)
**Happens**: When MCP client first connects
**Steps**:
1. **MCP Connection**
- Client connects to MCP server
- Server returns OAuth auth settings (issuer URL, resource URL)
2. **PRM Discovery** (`GET /.well-known/oauth-protected-resource/mcp`)
- Client queries Protected Resource Metadata endpoint (RFC 9728)
- Server **dynamically discovers** scopes from all registered tools
- Returns: resource URL, `scopes_supported` list, authorization servers
- Client now knows which scopes are available
**Result**: Client knows OAuth configuration and available scopes
### Phase 2: OAuth Authorization (PKCE Flow - RFC 7636)
**Happens**: User authorizes access
**Steps**:
1. **PKCE Challenge Generation** (Client-side)
- Generate `code_verifier`: random 43-128 character string
- Calculate `code_challenge`: `BASE64URL(SHA256(code_verifier))`
2. **Authorization Request** (`GET /apps/oidc/authorize`)
- Client redirects user to Nextcloud consent page
- Parameters:
- `client_id`: OAuth client ID
- `code_challenge`: SHA256 hash of verifier
- `code_challenge_method`: `S256`
- `scope`: Requested scopes (e.g., `openid notes:read notes:write`)
- `redirect_uri`: MCP server callback URL
3. **User Consent**
- User authenticates to Nextcloud (if not already logged in)
- User reviews and approves/denies requested scopes
- Can select subset of requested scopes
4. **Authorization Code**
- Nextcloud redirects to `callback?code=xyz123`
- Code is bound to PKCE challenge
5. **Token Exchange** (`POST /apps/oidc/token`)
- Client sends:
- Authorization `code`
- `code_verifier` (proves possession of original challenge)
- `client_id` and `client_secret`
- Nextcloud validates PKCE challenge: `SHA256(code_verifier) == code_challenge`
- Nextcloud issues access token
6. **Access Token Response**
- Token type: JWT or opaque (configurable)
- Contains user's **granted scopes** (may be subset of requested)
- Client stores token for subsequent requests
**Result**: Client has valid access token with granted scopes
### Phase 3: MCP Tool Access (Scope-Based Authorization)
**Happens**: Every MCP tool invocation
**Steps**:
#### Tool Listing (`list_tools`)
1. **List Tools Request**
- Client sends `list_tools` with `Authorization: Bearer <token>`
2. **Token Validation**
- MCP server calls `/apps/oidc/userinfo` with Bearer token
- Nextcloud returns user info including **granted scopes**
- Result cached for 1 hour
3. **Dynamic Tool Filtering**
- Server compares token scopes with each tool's `@require_scopes`
- Only returns tools where user has all required scopes
- Example: Token with `notes:read` sees 4 read tools, not 3 write tools
4. **Filtered Tool List**
- Client receives only tools they can use
#### Tool Execution (e.g., `nc_notes_get_note`)
1. **Tool Call**
- Client invokes tool with `Authorization: Bearer <token>`
2. **Scope Validation**
- `@require_scopes` decorator extracts token scopes
- Verifies token contains required scope (e.g., `notes:read`)
- If missing → 403 with `WWW-Authenticate` header (step-up auth)
- If present → continues execution
3. **Nextcloud API Call**
- MCP server creates `NextcloudClient` with Bearer token
- Calls Nextcloud API (e.g., `GET /apps/notes/api/v1/notes/1`)
- `user_oidc` app validates Bearer token again
- Request executes as authenticated user
4. **Response**
- Nextcloud returns data
- MCP server formats response
- Returns to client
**Result**: User can only access tools and data they have permissions for
### Phase 4: Insufficient Scope Handling (Step-Up Authorization)
**Happens**: When user lacks required scopes
**Steps**:
1. **Tool Call with Insufficient Scopes**
- User calls `nc_notes_create_note` (requires `notes:write`)
- But token only has `notes:read`
2. **Scope Validation Fails**
- `@require_scopes("notes:write")` decorator checks token
- Finds `notes:write` missing
3. **403 Response with Challenge**
- Returns `403 Forbidden`
- Includes `WWW-Authenticate` header:
```
Bearer error="insufficient_scope",
scope="notes:write",
resource_metadata="http://localhost:8000/.well-known/oauth-protected-resource/mcp"
```
4. **Client Re-Authorization** (Optional)
- Client can initiate new OAuth flow requesting additional scopes
- User re-consents with expanded permissions
- New token includes both `notes:read` and `notes:write`
**Result**: User can dynamically upgrade permissions without full re-authentication
## Token Validation
@@ -272,14 +569,145 @@ client = get_client_from_context(ctx)
- Protects against authorization code interception
### Scopes
- Required scopes: `openid`, `profile`
- Additional scopes inferred from userinfo response
- Base required scopes: `openid`, `profile`, `email`
- App-specific scopes control access to individual Nextcloud apps
- See [OAuth Scopes](#oauth-scopes) section for complete scope reference
### Token Validation
- Every MCP request validates Bearer token
- Cached for performance (1-hour default)
- Calls userinfo endpoint for validation
## OAuth Scopes
The Nextcloud MCP Server implements fine-grained OAuth scopes for each Nextcloud app integration. Scopes control which tools are visible and accessible to users based on their granted permissions.
### Scope-Based Access Control
When using OAuth authentication:
1. **Dynamic Discovery**: The server automatically discovers all required scopes from `@require_scopes` decorators on MCP tools
2. **Tool Filtering**: Tools are dynamically filtered based on the user's token scopes - users only see tools they have permission to use
3. **Per-Tool Enforcement**: Each tool validates required scopes before execution, returning a 403 error if insufficient scopes are present
### Supported Scopes
The server supports the following OAuth scopes, organized by Nextcloud app:
#### Base OIDC Scopes
- `openid` - OpenID Connect authentication (required)
- `profile` - Access to user profile information (required)
- `email` - Access to user email address (required)
#### Notes App
- `notes:read` - Read notes, search notes, get note attachments
- `notes:write` - Create, update, append to, and delete notes
#### Calendar App
- `calendar:read` - List calendars, read events, search events
- `calendar:write` - Create, update, and delete calendars and events
#### Calendar Tasks (VTODO)
- `todo:read` - List and read CalDAV tasks
- `todo:write` - Create, update, and delete CalDAV tasks
#### Contacts App
- `contacts:read` - List address books and read contacts (CardDAV)
- `contacts:write` - Create, update, and delete address books and contacts
#### Cookbook App
- `cookbook:read` - Read recipes, search recipes
- `cookbook:write` - Create, update, and delete recipes
#### Deck App
- `deck:read` - List boards, stacks, cards, and labels
- `deck:write` - Create, update, and delete boards, stacks, cards, and labels
#### Tables App
- `tables:read` - List tables and read rows
- `tables:write` - Create, update, and delete rows in tables
#### Files (WebDAV)
- `files:read` - List files, read file contents, search files
- `files:write` - Upload, update, move, copy, and delete files
#### Sharing
- `sharing:read` - List shares and read share information
- `sharing:write` - Create, update, and delete shares
### Scope Discovery
The MCP server provides scope discovery through two mechanisms:
#### 1. Protected Resource Metadata (PRM) Endpoint
```bash
# Query the PRM endpoint
curl http://localhost:8000/.well-known/oauth-protected-resource/mcp
# Response includes dynamically discovered scopes
{
"resource": "http://localhost:8000/mcp",
"scopes_supported": ["openid", "profile", "email", "notes:read", ...],
"authorization_servers": ["https://nextcloud.example.com"],
"bearer_methods_supported": ["header"],
"resource_signing_alg_values_supported": ["RS256"]
}
```
The `scopes_supported` field is **dynamically generated** from all registered MCP tools, ensuring it always reflects the actual available scopes.
#### 2. Scope Enforcement via Decorators
Tools are decorated with `@require_scopes()` to declare their required permissions:
```python
from nextcloud_mcp_server.auth import require_scopes
@mcp.tool()
@require_scopes("notes:read")
async def nc_notes_get_note(ctx: Context, note_id: int):
"""Get a specific note by ID"""
# Implementation
```
### Client Registration Scopes
During OAuth client registration (dynamic or manual), clients request a set of scopes that define the **maximum allowed** scopes for that client. The actual per-tool enforcement is handled separately via decorators.
**Environment Variable**:
```bash
NEXTCLOUD_OIDC_SCOPES="openid profile email notes:read notes:write calendar:read calendar:write ..."
```
**Default**: All supported scopes (recommended for development)
> **Note**: Client registration scopes define the maximum permissions. The MCP server's PRM endpoint dynamically advertises the actual supported scopes based on registered tools.
### Step-Up Authorization
The server supports OAuth step-up authorization (RFC 8693). If a user attempts to use a tool requiring scopes they don't have:
1. Tool returns `403 Forbidden` with `InsufficientScopeError`
2. Response includes `WWW-Authenticate` header listing missing scopes:
```
WWW-Authenticate: Bearer error="insufficient_scope", scope="notes:write", resource_metadata="..."
```
3. Client can re-authorize with additional scopes
### Scope Validation
All scope enforcement happens at two levels:
1. **Tool Visibility**: During `list_tools` requests, only tools matching the user's token scopes are returned
2. **Execution Time**: When calling a tool, the `@require_scopes` decorator validates the token has necessary scopes
**Example**:
```python
# User token has: ["openid", "profile", "email", "notes:read"]
# They will see: 4 read-only notes tools
# They will NOT see: 3 write notes tools (notes:write required)
# Attempting to call a write tool returns 403 Forbidden
```
## Configuration
See [Configuration Guide](configuration.md) for all OAuth environment variables:
+74
View File
@@ -21,3 +21,77 @@ NEXTCLOUD_MCP_SERVER_URL=http://localhost:8000
# - If these are set, OAuth mode is disabled
NEXTCLOUD_USERNAME=
NEXTCLOUD_PASSWORD=
# ============================================
# Document Processing Configuration
# ============================================
# Enable document processing (PDF, DOCX, images, etc.)
# Set to false to disable all document processing
ENABLE_DOCUMENT_PROCESSING=false
# Default processor to use when multiple are available
# Options: unstructured, tesseract, custom
DOCUMENT_PROCESSOR=unstructured
# ============================================
# Unstructured.io Processor
# ============================================
# Enable Unstructured processor (requires unstructured service in docker-compose)
# This is a cloud-based/API processor supporting many document types
ENABLE_UNSTRUCTURED=false
# Unstructured API endpoint
UNSTRUCTURED_API_URL=http://unstructured:8000
# Request timeout in seconds (default: 120)
# OCR operations can take 30-120 seconds for large documents
UNSTRUCTURED_TIMEOUT=120
# Parsing strategy: auto, fast, hi_res
# - auto: Automatically choose based on document type
# - fast: Fast parsing without OCR
# - hi_res: High-resolution with OCR (slowest, most accurate)
UNSTRUCTURED_STRATEGY=auto
# OCR languages (comma-separated ISO 639-3 codes)
# Common: eng=English, deu=German, fra=French, spa=Spanish
UNSTRUCTURED_LANGUAGES=eng,deu
# Progress reporting interval in seconds (default: 10)
# During long-running OCR operations, progress notifications are sent to the MCP client
# at this interval to prevent timeouts and provide status updates
PROGRESS_INTERVAL=10
# ============================================
# Tesseract Processor (Local OCR)
# ============================================
# Enable Tesseract processor (requires tesseract binary installed)
# This is a local, lightweight OCR solution for images only
ENABLE_TESSERACT=false
# Path to tesseract executable (optional, auto-detected if in PATH)
#TESSERACT_CMD=/usr/bin/tesseract
# OCR language (e.g., eng, deu, eng+deu for multiple)
TESSERACT_LANG=eng
# ============================================
# Custom Processor (Your own API)
# ============================================
# Enable custom document processor via HTTP API
ENABLE_CUSTOM_PROCESSOR=false
# Unique name for your processor
#CUSTOM_PROCESSOR_NAME=my_ocr
# Your custom processor API endpoint
#CUSTOM_PROCESSOR_URL=http://localhost:9000/process
# Optional API key for authentication
#CUSTOM_PROCESSOR_API_KEY=your-api-key-here
# Request timeout in seconds
#CUSTOM_PROCESSOR_TIMEOUT=60
# Comma-separated MIME types your processor supports
#CUSTOM_PROCESSOR_TYPES=application/pdf,image/jpeg,image/png
+183 -26
View File
@@ -18,13 +18,19 @@ from starlette.routing import Mount, Route
from nextcloud_mcp_server.auth import (
InsufficientScopeError,
NextcloudTokenVerifier,
discover_all_scopes,
get_access_token_scopes,
has_required_scopes,
is_jwt_token,
)
from nextcloud_mcp_server.client import NextcloudClient
from nextcloud_mcp_server.config import LOGGING_CONFIG, setup_logging
from nextcloud_mcp_server.config import (
LOGGING_CONFIG,
get_document_processor_config,
setup_logging,
)
from nextcloud_mcp_server.context import get_client as get_nextcloud_client
from nextcloud_mcp_server.document_processors import get_registry
from nextcloud_mcp_server.server import (
configure_calendar_tools,
configure_contacts_tools,
@@ -39,6 +45,92 @@ from nextcloud_mcp_server.server import (
logger = logging.getLogger(__name__)
def initialize_document_processors():
"""Initialize and register document processors based on configuration.
This function reads the environment configuration and registers available
processors (Unstructured, Tesseract, Custom HTTP) with the global registry.
"""
config = get_document_processor_config()
if not config["enabled"]:
logger.info("Document processing disabled")
return
registry = get_registry()
registered_count = 0
# Register Unstructured processor
if "unstructured" in config["processors"]:
unst_config = config["processors"]["unstructured"]
try:
from nextcloud_mcp_server.document_processors.unstructured import (
UnstructuredProcessor,
)
processor = UnstructuredProcessor(
api_url=unst_config["api_url"],
timeout=unst_config["timeout"],
default_strategy=unst_config["strategy"],
default_languages=unst_config["languages"],
progress_interval=unst_config.get("progress_interval", 10),
)
registry.register(processor, priority=10)
logger.info(f"Registered Unstructured processor: {unst_config['api_url']}")
registered_count += 1
except Exception as e:
logger.warning(f"Failed to register Unstructured processor: {e}")
# Register Tesseract processor
if "tesseract" in config["processors"]:
tess_config = config["processors"]["tesseract"]
try:
from nextcloud_mcp_server.document_processors.tesseract import (
TesseractProcessor,
)
processor = TesseractProcessor(
tesseract_cmd=tess_config.get("tesseract_cmd"),
default_lang=tess_config["lang"],
)
registry.register(processor, priority=5)
logger.info(f"Registered Tesseract processor: lang={tess_config['lang']}")
registered_count += 1
except Exception as e:
logger.warning(f"Failed to register Tesseract processor: {e}")
# Register custom processor
if "custom" in config["processors"]:
custom_config = config["processors"]["custom"]
try:
from nextcloud_mcp_server.document_processors.custom_http import (
CustomHTTPProcessor,
)
processor = CustomHTTPProcessor(
name=custom_config["name"],
api_url=custom_config["api_url"],
api_key=custom_config.get("api_key"),
timeout=custom_config["timeout"],
supported_types=custom_config["supported_types"],
)
registry.register(processor, priority=1)
logger.info(
f"Registered Custom processor '{custom_config['name']}': {custom_config['api_url']}"
)
registered_count += 1
except Exception as e:
logger.warning(f"Failed to register Custom processor: {e}")
if registered_count > 0:
logger.info(
f"Document processing initialized with {registered_count} processor(s): "
f"{', '.join(registry.list_processors())}"
)
else:
logger.warning("Document processing enabled but no processors registered")
def validate_pkce_support(discovery: dict, discovery_url: str) -> None:
"""
Validate that the OIDC provider properly advertises PKCE support.
@@ -192,7 +284,15 @@ async def load_oauth_client_credentials(
redirect_uris = [f"{mcp_server_url}/oauth/callback"]
# Get scopes from environment or use defaults
# Default: all app-specific read/write scopes
# Note: Client registration happens BEFORE tools are registered, so we can't
# dynamically discover scopes here. These scopes define the "maximum allowed"
# scopes for this OAuth client. The actual per-tool scope enforcement happens
# via @require_scopes decorators, and the PRM endpoint advertises the actual
# supported scopes dynamically.
#
# IMPORTANT: Keep this list in sync with all @require_scopes decorators
# when adding new apps, or set NEXTCLOUD_OIDC_SCOPES environment variable
# to override.
default_scopes = (
"openid profile email "
"notes:read notes:write "
@@ -257,6 +357,9 @@ async def app_lifespan_basic(server: FastMCP) -> AsyncIterator[AppContext]:
client = NextcloudClient.from_env()
logger.info("Client initialization complete")
# Initialize document processors
initialize_document_processors()
try:
yield AppContext(client=client)
finally:
@@ -317,6 +420,9 @@ async def app_lifespan_oauth(server: FastMCP) -> AsyncIterator[OAuthAppContext]:
logger.info("OAuth initialization complete")
# Initialize document processors
initialize_document_processors()
try:
yield OAuthAppContext(
nextcloud_host=nextcloud_host, token_verifier=token_verifier
@@ -542,12 +648,79 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
await stack.enter_async_context(mcp.session_manager.run())
yield
# Health check endpoints for Kubernetes probes
def health_live(request):
"""Liveness probe endpoint.
Returns 200 OK if the application process is running.
This is a simple check that doesn't verify external dependencies.
"""
return JSONResponse(
{
"status": "alive",
"mode": "oauth" if oauth_enabled else "basic",
}
)
async def health_ready(request):
"""Readiness probe endpoint.
Returns 200 OK if the application is ready to serve traffic.
Checks that required configuration is present.
"""
checks = {}
is_ready = True
# Check Nextcloud host configuration
nextcloud_host = os.getenv("NEXTCLOUD_HOST")
if nextcloud_host:
checks["nextcloud_configured"] = "ok"
else:
checks["nextcloud_configured"] = "error: NEXTCLOUD_HOST not set"
is_ready = False
# Check authentication configuration
if oauth_enabled:
# OAuth mode - just verify we got this far (token_verifier initialized in lifespan)
checks["auth_mode"] = "oauth"
checks["auth_configured"] = "ok"
else:
# BasicAuth mode - verify credentials are set
username = os.getenv("NEXTCLOUD_USERNAME")
password = os.getenv("NEXTCLOUD_PASSWORD")
if username and password:
checks["auth_mode"] = "basic"
checks["auth_configured"] = "ok"
else:
checks["auth_mode"] = "basic"
checks["auth_configured"] = "error: credentials not set"
is_ready = False
status_code = 200 if is_ready else 503
return JSONResponse(
{
"status": "ready" if is_ready else "not_ready",
"checks": checks,
},
status_code=status_code,
)
# Add Protected Resource Metadata (PRM) endpoint for OAuth mode
routes = []
# Add health check routes (available in both OAuth and BasicAuth modes)
routes.append(Route("/health/live", health_live, methods=["GET"]))
routes.append(Route("/health/ready", health_ready, methods=["GET"]))
logger.info("Health check endpoints enabled: /health/live, /health/ready")
if oauth_enabled:
def oauth_protected_resource_metadata(request):
"""RFC 9728 Protected Resource Metadata endpoint."""
"""RFC 9728 Protected Resource Metadata endpoint.
Dynamically discovers supported scopes from registered MCP tools.
This ensures the advertised scopes always match the actual tool requirements.
"""
mcp_server_url = os.getenv(
"NEXTCLOUD_MCP_SERVER_URL", "http://localhost:8000"
)
@@ -561,30 +734,14 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
# Fallback to NEXTCLOUD_HOST if PUBLIC_ISSUER_URL not set
public_issuer_url = os.getenv("NEXTCLOUD_HOST", "")
# Dynamically discover all scopes from registered tools
# This provides a single source of truth based on @require_scopes decorators
supported_scopes = discover_all_scopes(mcp)
return JSONResponse(
{
"resource": resource_url,
"scopes_supported": [
"openid",
"notes:read",
"notes:write",
"calendar:read",
"calendar:write",
"todo:read",
"todo:write",
"contacts:read",
"contacts:write",
"cookbook:read",
"cookbook:write",
"deck:read",
"deck:write",
"tables:read",
"tables:write",
"files:read",
"files:write",
"sharing:read",
"sharing:write",
],
"scopes_supported": supported_scopes,
"authorization_servers": [public_issuer_url],
"bearer_methods_supported": ["header"],
"resource_signing_alg_values_supported": ["RS256"],
@@ -735,9 +892,9 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
@click.option(
"--oauth-scopes",
envvar="NEXTCLOUD_OIDC_SCOPES",
default="openid profile email notes:read notes:write calendar:read calendar:write contacts:read contacts:write cookbook:read cookbook:write deck:read deck:write tables:read tables:write files:read files:write sharing:read sharing:write",
default="openid profile email notes:read notes:write calendar:read calendar:write todo:read todo:write contacts:read contacts:write cookbook:read cookbook:write deck:read deck:write tables:read tables:write files:read files:write sharing:read sharing:write",
show_default=True,
help="OAuth scopes to request (can also use NEXTCLOUD_OIDC_SCOPES env var)",
help="OAuth scopes to request during client registration. These define the maximum allowed scopes for the client. Note: Actual supported scopes are discovered dynamically from MCP tools at runtime. (can also use NEXTCLOUD_OIDC_SCOPES env var)",
)
@click.option(
"--oauth-token-type",
+2
View File
@@ -7,6 +7,7 @@ from .scope_authorization import (
InsufficientScopeError,
ScopeAuthorizationError,
check_scopes,
discover_all_scopes,
get_access_token_scopes,
get_required_scopes,
has_required_scopes,
@@ -25,6 +26,7 @@ __all__ = [
"ScopeAuthorizationError",
"InsufficientScopeError",
"check_scopes",
"discover_all_scopes",
"get_access_token_scopes",
"get_required_scopes",
"has_required_scopes",
@@ -276,3 +276,68 @@ def has_required_scopes(func: Callable, user_scopes: set[str]) -> bool:
# Check if user has all required scopes
return set(required).issubset(user_scopes)
def discover_all_scopes(mcp) -> list[str]:
"""
Dynamically discover all OAuth scopes required by registered MCP tools.
This function inspects all registered tools and extracts their required scopes
from the @require_scopes decorator metadata. It provides a single source of truth
for available scopes based on the actual tool implementations.
Args:
mcp: FastMCP instance with registered tools
Returns:
Sorted list of unique scope strings, including base OIDC scopes
Example:
```python
from mcp.server.fastmcp import FastMCP
mcp = FastMCP("My Server")
@mcp.tool()
@require_scopes("notes:read")
async def get_notes():
pass
@mcp.tool()
@require_scopes("notes:write")
async def create_note():
pass
scopes = discover_all_scopes(mcp)
# Returns: ["notes:read", "notes:write", "openid", "profile", "email"]
```
Note:
- Base OIDC scopes (openid, profile, email) are always included
- Scopes are deduplicated and sorted alphabetically
- Only scopes from decorated tools are included
- Must be called after tools are registered
"""
# Start with base OIDC scopes that are always required
all_scopes = {"openid", "profile", "email"}
# Get all registered tools
try:
tools = mcp._tool_manager.list_tools()
except AttributeError:
logger.warning("FastMCP instance does not have _tool_manager attribute")
return sorted(all_scopes)
# Extract scopes from each tool
for tool in tools:
# Get the original function (tools have a .fn attribute)
func = getattr(tool, "fn", None)
if func is None:
continue
# Extract scopes using existing helper
tool_scopes = get_required_scopes(func)
all_scopes.update(tool_scopes)
# Return sorted list of unique scopes
return sorted(all_scopes)
+67
View File
@@ -1,4 +1,6 @@
import logging.config
import os
from typing import Any
LOGGING_CONFIG = {
"version": 1,
@@ -51,3 +53,68 @@ LOGGING_CONFIG = {
def setup_logging():
logging.config.dictConfig(LOGGING_CONFIG)
# Document Processing Configuration
def get_document_processor_config() -> dict[str, Any]:
"""Get document processor configuration from environment.
Returns:
Dict with processor configs:
{
"enabled": bool,
"default_processor": str,
"processors": {
"unstructured": {...},
"tesseract": {...},
"custom": {...},
}
}
"""
config: dict[str, Any] = {
"enabled": os.getenv("ENABLE_DOCUMENT_PROCESSING", "false").lower() == "true",
"default_processor": os.getenv("DOCUMENT_PROCESSOR", "unstructured"),
"processors": {},
}
# Unstructured configuration
if os.getenv("ENABLE_UNSTRUCTURED", "false").lower() == "true":
config["processors"]["unstructured"] = {
"api_url": os.getenv("UNSTRUCTURED_API_URL", "http://unstructured:8000"),
"timeout": int(os.getenv("UNSTRUCTURED_TIMEOUT", "120")),
"strategy": os.getenv("UNSTRUCTURED_STRATEGY", "auto"),
"languages": [
lang.strip()
for lang in os.getenv("UNSTRUCTURED_LANGUAGES", "eng,deu").split(",")
if lang.strip()
],
"progress_interval": int(os.getenv("PROGRESS_INTERVAL", "10")),
}
# Tesseract configuration
if os.getenv("ENABLE_TESSERACT", "false").lower() == "true":
config["processors"]["tesseract"] = {
"tesseract_cmd": os.getenv("TESSERACT_CMD"), # None = auto-detect
"lang": os.getenv("TESSERACT_LANG", "eng"),
}
# Custom processor (via HTTP API)
if os.getenv("ENABLE_CUSTOM_PROCESSOR", "false").lower() == "true":
custom_url = os.getenv("CUSTOM_PROCESSOR_URL")
if custom_url:
supported_types_str = os.getenv("CUSTOM_PROCESSOR_TYPES", "application/pdf")
supported_types = {
t.strip() for t in supported_types_str.split(",") if t.strip()
}
config["processors"]["custom"] = {
"name": os.getenv("CUSTOM_PROCESSOR_NAME", "custom"),
"api_url": custom_url,
"api_key": os.getenv("CUSTOM_PROCESSOR_API_KEY"),
"timeout": int(os.getenv("CUSTOM_PROCESSOR_TIMEOUT", "60")),
"supported_types": supported_types,
}
return config
@@ -0,0 +1,12 @@
"""Document processing plugins for extracting text from various file formats."""
from .base import DocumentProcessor, ProcessingResult, ProcessorError
from .registry import ProcessorRegistry, get_registry
__all__ = [
"DocumentProcessor",
"ProcessingResult",
"ProcessorError",
"ProcessorRegistry",
"get_registry",
]
@@ -0,0 +1,126 @@
"""Abstract base class for document processing plugins."""
from abc import ABC, abstractmethod
from collections.abc import Awaitable, Callable
from typing import Any, Optional
from pydantic import BaseModel
class ProcessingResult(BaseModel):
"""Standardized result from any document processor."""
text: str
"""Extracted text content"""
metadata: dict[str, Any]
"""Processor-specific metadata"""
processor: str
"""Name of processor that handled this (e.g., 'unstructured', 'tesseract')"""
success: bool = True
"""Whether processing succeeded"""
error: Optional[str] = None
"""Error message if processing failed"""
class DocumentProcessor(ABC):
"""Abstract base class for document processing plugins.
Document processors extract text from various file formats (PDF, DOCX, images, etc.).
Each processor implements this interface and can be registered with the ProcessorRegistry.
Example:
class MyProcessor(DocumentProcessor):
@property
def name(self) -> str:
return "my_processor"
@property
def supported_mime_types(self) -> set[str]:
return {"application/pdf", "image/jpeg"}
async def process(self, content: bytes, content_type: str, **kwargs) -> ProcessingResult:
# Extract text from content
return ProcessingResult(text="...", metadata={}, processor=self.name)
async def health_check(self) -> bool:
return True
"""
@property
@abstractmethod
def name(self) -> str:
"""Unique identifier for this processor (e.g., 'unstructured', 'tesseract')."""
pass
@property
@abstractmethod
def supported_mime_types(self) -> set[str]:
"""Set of MIME types this processor can handle.
Examples: {"application/pdf", "image/jpeg", "image/png"}
"""
pass
@abstractmethod
async def process(
self,
content: bytes,
content_type: str,
filename: Optional[str] = None,
options: Optional[dict[str, Any]] = None,
progress_callback: Optional[
Callable[[float, Optional[float], Optional[str]], Awaitable[None]]
] = None,
) -> ProcessingResult:
"""Process a document and extract text.
Args:
content: Document bytes
content_type: MIME type of the document
filename: Optional filename for format detection
options: Processor-specific options (e.g., OCR language, strategy)
progress_callback: Optional async callback for progress updates.
Called as: await progress_callback(progress, total, message)
- progress: Current progress value (monotonically increasing)
- total: Optional total value (None if unknown)
- message: Optional human-readable status message
Returns:
ProcessingResult with extracted text and metadata
Raises:
ProcessorError: If processing fails
"""
pass
@abstractmethod
async def health_check(self) -> bool:
"""Check if processor is available and healthy.
Returns:
True if processor is ready to use, False otherwise
"""
pass
def supports(self, content_type: str) -> bool:
"""Check if this processor supports the given MIME type.
Args:
content_type: MIME type (may include parameters like "application/pdf; charset=utf-8")
Returns:
True if this processor can handle the type
"""
# Strip parameters from content type
base_type = content_type.split(";")[0].strip().lower()
return base_type in self.supported_mime_types
class ProcessorError(Exception):
"""Raised when document processing fails."""
pass
@@ -0,0 +1,150 @@
"""Generic HTTP API processor wrapper for custom document processing services."""
import logging
from collections.abc import Awaitable, Callable
from typing import Any, Optional
import httpx
from .base import DocumentProcessor, ProcessingResult, ProcessorError
logger = logging.getLogger(__name__)
class CustomHTTPProcessor(DocumentProcessor):
"""Generic HTTP API processor wrapper.
Allows integration with any custom document processing API that follows
a simple request/response pattern. This makes it easy to integrate your
own text extraction services without writing a full processor.
Expected API Contract:
- POST request with file as multipart/form-data
- Response: {"text": "extracted text", "metadata": {...}}
Example:
processor = CustomHTTPProcessor(
name="my_ocr",
api_url="https://my-ocr-service.com/process",
api_key="secret",
supported_types={"application/pdf", "image/jpeg"},
)
result = await processor.process(pdf_bytes, "application/pdf")
"""
def __init__(
self,
api_url: str,
api_key: Optional[str] = None,
timeout: int = 60,
supported_types: Optional[set[str]] = None,
name: str = "custom",
):
"""Initialize custom HTTP processor.
Args:
api_url: Your API endpoint (should accept POST with multipart/form-data)
api_key: Optional API key for authentication (sent as Bearer token)
timeout: Request timeout in seconds (default: 60)
supported_types: MIME types your API supports
name: Unique name for this processor (default: "custom")
"""
self.api_url = api_url
self.api_key = api_key
self.timeout = timeout
self._name = name
self._supported_types = supported_types or set()
logger.info(f"Initialized CustomHTTPProcessor: {name} -> {api_url}")
@property
def name(self) -> str:
return self._name
@property
def supported_mime_types(self) -> set[str]:
return self._supported_types
async def process(
self,
content: bytes,
content_type: str,
filename: Optional[str] = None,
options: Optional[dict[str, Any]] = None,
progress_callback: Optional[
Callable[[float, Optional[float], Optional[str]], Awaitable[None]]
] = None,
) -> ProcessingResult:
"""Process via custom HTTP API.
Args:
content: Document bytes
content_type: MIME type
filename: Optional filename
options: Custom options (passed as form data to API)
Returns:
ProcessingResult with extracted text and metadata
Raises:
ProcessorError: If API call fails
"""
options = options or {}
# Prepare request
files = {"file": (filename or "document", content, content_type)}
headers = {}
if self.api_key:
headers["Authorization"] = f"Bearer {self.api_key}"
try:
async with httpx.AsyncClient(timeout=self.timeout) as client:
response = await client.post(
self.api_url,
files=files,
headers=headers,
data=options, # Pass options as form data
)
response.raise_for_status()
# Parse response
result = response.json()
text = result.get("text", "")
metadata = result.get("metadata", {})
logger.debug(
f"Custom processor '{self.name}' extracted {len(text)} characters"
)
return ProcessingResult(
text=text,
metadata=metadata,
processor=self.name,
success=True,
)
except httpx.HTTPError as e:
logger.error(f"Custom processor '{self.name}' HTTP error: {e}")
raise ProcessorError(f"API call failed: {str(e)}") from e
except Exception as e:
logger.error(f"Custom processor '{self.name}' failed: {e}")
raise ProcessorError(f"Processing failed: {str(e)}") from e
async def health_check(self) -> bool:
"""Check if custom API is available.
Returns:
True if API responds with status < 500
"""
try:
async with httpx.AsyncClient(timeout=5) as client:
# Try GET request to check availability
response = await client.get(
self.api_url,
headers={"User-Agent": "nextcloud-mcp-server"},
)
return response.status_code < 500
except Exception as e:
logger.warning(f"Custom processor '{self.name}' health check failed: {e}")
return False
@@ -0,0 +1,171 @@
"""Central registry for document processors."""
import logging
from collections.abc import Awaitable, Callable
from typing import Any, Optional
from .base import DocumentProcessor, ProcessingResult, ProcessorError
logger = logging.getLogger(__name__)
class ProcessorRegistry:
"""Central registry for document processors.
Manages registration and routing of document processing requests to
appropriate processors based on MIME types and priorities.
Example:
registry = ProcessorRegistry()
registry.register(UnstructuredProcessor(...), priority=10)
registry.register(TesseractProcessor(...), priority=5)
# Auto-select processor based on MIME type
result = await registry.process(pdf_bytes, "application/pdf")
# Force specific processor
result = await registry.process(img_bytes, "image/png", processor_name="tesseract")
"""
def __init__(self):
self._processors: dict[str, tuple[DocumentProcessor, int]] = {}
self._priority_order: list[str] = []
def register(self, processor: DocumentProcessor, priority: int = 0):
"""Register a document processor.
Args:
processor: Processor instance to register
priority: Higher priority processors are tried first (default: 0)
"""
name = processor.name
if name in self._processors:
logger.warning(f"Processor '{name}' already registered, replacing")
self._processors[name] = (processor, priority)
# Update priority order
if name in self._priority_order:
self._priority_order.remove(name)
# Insert in priority order (higher priority first)
inserted = False
for i, existing_name in enumerate(self._priority_order):
existing_priority = self._processors[existing_name][1]
if priority > existing_priority:
self._priority_order.insert(i, name)
inserted = True
break
if not inserted:
self._priority_order.append(name)
logger.info(
f"Registered processor: {name} "
f"(priority={priority}, supports={len(processor.supported_mime_types)} types)"
)
def get_processor(self, name: str) -> Optional[DocumentProcessor]:
"""Get a processor by name.
Args:
name: Processor name
Returns:
DocumentProcessor instance or None if not found
"""
if name in self._processors:
return self._processors[name][0]
return None
def find_processor(self, content_type: str) -> Optional[DocumentProcessor]:
"""Find the first processor that supports the given MIME type.
Processors are checked in priority order (highest priority first).
Args:
content_type: MIME type to match
Returns:
First matching processor or None
"""
for name in self._priority_order:
processor = self._processors[name][0]
if processor.supports(content_type):
logger.debug(f"Found processor '{name}' for type '{content_type}'")
return processor
logger.debug(f"No processor found for type '{content_type}'")
return None
def list_processors(self) -> list[str]:
"""List all registered processor names in priority order.
Returns:
List of processor names (highest priority first)
"""
return list(self._priority_order)
async def process(
self,
content: bytes,
content_type: str,
filename: Optional[str] = None,
processor_name: Optional[str] = None,
options: Optional[dict[str, Any]] = None,
progress_callback: Optional[
Callable[[float, Optional[float], Optional[str]], Awaitable[None]]
] = None,
) -> ProcessingResult:
"""Process a document using available processors.
Args:
content: Document bytes
content_type: MIME type
filename: Optional filename for format detection
processor_name: Force specific processor (or None for auto-select)
options: Processing options passed to processor
progress_callback: Optional async callback for progress updates
Returns:
ProcessingResult with extracted text and metadata
Raises:
ProcessorError: If no processor found or processing fails
"""
# Find processor
if processor_name:
processor = self.get_processor(processor_name)
if not processor:
raise ProcessorError(
f"Processor '{processor_name}' not found. "
f"Available: {', '.join(self.list_processors())}"
)
else:
processor = self.find_processor(content_type)
if not processor:
raise ProcessorError(
f"No processor found for type: {content_type}. "
f"Registered processors: {', '.join(self.list_processors())}"
)
logger.info(f"Processing with '{processor.name}' processor")
# Process
return await processor.process(
content, content_type, filename, options, progress_callback
)
# Global registry instance
_registry = ProcessorRegistry()
def get_registry() -> ProcessorRegistry:
"""Get the global processor registry.
Returns:
Singleton ProcessorRegistry instance
"""
return _registry
@@ -0,0 +1,165 @@
"""Document processor using Tesseract OCR (local)."""
import logging
import shutil
from collections.abc import Awaitable, Callable
from typing import Any, Optional
from .base import DocumentProcessor, ProcessingResult, ProcessorError
logger = logging.getLogger(__name__)
try:
import io
import pytesseract
from PIL import Image
TESSERACT_AVAILABLE = True
except ImportError:
TESSERACT_AVAILABLE = False
class TesseractProcessor(DocumentProcessor):
"""Document processor using Tesseract OCR (local).
This processor runs OCR locally using the Tesseract engine, which is
faster and more lightweight than cloud-based solutions but requires
Tesseract to be installed on the system.
Requirements:
- tesseract binary installed (e.g., apt install tesseract-ocr)
- Python packages: pip install pytesseract pillow
Example:
processor = TesseractProcessor(default_lang="eng+deu")
result = await processor.process(image_bytes, "image/jpeg")
"""
SUPPORTED_TYPES = {
"image/jpeg",
"image/png",
"image/tiff",
"image/bmp",
"image/gif",
}
def __init__(
self,
tesseract_cmd: Optional[str] = None,
default_lang: str = "eng",
):
"""Initialize Tesseract processor.
Args:
tesseract_cmd: Path to tesseract executable (None = auto-detect)
default_lang: Default OCR language (e.g., "eng", "deu", "eng+deu")
Raises:
ProcessorError: If Tesseract or required packages not available
"""
if not TESSERACT_AVAILABLE:
raise ProcessorError(
"Tesseract processor requires: pip install pytesseract pillow"
)
if tesseract_cmd:
pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
elif not shutil.which("tesseract"):
raise ProcessorError(
"Tesseract not found in PATH. Install with: apt install tesseract-ocr"
)
self.default_lang = default_lang
logger.info(f"Initialized TesseractProcessor: lang={default_lang}")
@property
def name(self) -> str:
return "tesseract"
@property
def supported_mime_types(self) -> set[str]:
return self.SUPPORTED_TYPES
async def process(
self,
content: bytes,
content_type: str,
filename: Optional[str] = None,
options: Optional[dict[str, Any]] = None,
progress_callback: Optional[
Callable[[float, Optional[float], Optional[str]], Awaitable[None]]
] = None,
) -> ProcessingResult:
"""Process image via Tesseract OCR.
Args:
content: Image bytes
content_type: Image MIME type
filename: Optional filename
options: Processing options:
- lang: OCR language(s) (default: from init)
- config: Tesseract config string
Returns:
ProcessingResult with extracted text and metadata
Raises:
ProcessorError: If OCR fails
"""
options = options or {}
lang = options.get("lang", self.default_lang)
config = options.get("config", "")
try:
# Load image
image = Image.open(io.BytesIO(content))
# Run OCR
text = pytesseract.image_to_string(image, lang=lang, config=config)
# Get additional data for confidence scores
data = pytesseract.image_to_data(
image, lang=lang, output_type=pytesseract.Output.DICT
)
# Calculate average confidence
confidences = [c for c in data["conf"] if c != -1]
avg_confidence = sum(confidences) / len(confidences) if confidences else 0
metadata = {
"text_length": len(text),
"language": lang,
"image_size": image.size,
"image_mode": image.mode,
"confidence": round(avg_confidence, 2),
"words_detected": len([c for c in data["conf"] if c != -1]),
}
logger.debug(
f"Tesseract OCR completed: {len(text)} chars, "
f"confidence={avg_confidence:.1f}%"
)
return ProcessingResult(
text=text.strip(),
metadata=metadata,
processor=self.name,
success=True,
)
except Exception as e:
logger.error(f"Tesseract processing failed: {e}")
raise ProcessorError(f"OCR failed: {str(e)}") from e
async def health_check(self) -> bool:
"""Check if Tesseract is available.
Returns:
True if Tesseract is installed and working
"""
try:
pytesseract.get_tesseract_version()
return True
except Exception:
return False
@@ -0,0 +1,310 @@
"""Document processor using Unstructured.io API."""
import io
import logging
import time
from collections.abc import Awaitable, Callable
from typing import Any, Optional
import anyio
import httpx
from .base import DocumentProcessor, ProcessingResult, ProcessorError
logger = logging.getLogger(__name__)
class UnstructuredProcessor(DocumentProcessor):
"""Document processor using Unstructured.io API.
The Unstructured API provides document parsing capabilities for various formats
including PDF, DOCX, images with OCR, and more.
API Documentation: https://docs.unstructured.io/api-reference/api-services/api-parameters
"""
# Supported MIME types for Unstructured
SUPPORTED_TYPES = {
"application/pdf",
"application/vnd.openxmlformats-officedocument.wordprocessingml.document",
"application/msword",
"application/vnd.openxmlformats-officedocument.presentationml.presentation",
"application/vnd.ms-powerpoint",
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
"application/vnd.ms-excel",
"application/rtf",
"text/rtf",
"application/vnd.oasis.opendocument.text",
"application/epub+zip",
"message/rfc822",
"application/vnd.ms-outlook",
"image/jpeg",
"image/png",
"image/tiff",
"image/bmp",
}
def __init__(
self,
api_url: str,
timeout: int = 120,
default_strategy: str = "auto",
default_languages: Optional[list[str]] = None,
progress_interval: int = 10,
):
"""Initialize Unstructured processor.
Args:
api_url: Unstructured API endpoint
timeout: Request timeout in seconds (default: 120)
default_strategy: Default parsing strategy - "auto", "fast", or "hi_res"
default_languages: Default OCR language codes (e.g., ["eng", "deu"])
progress_interval: Seconds between progress updates (default: 10)
"""
self.api_url = api_url
self.timeout = timeout
self.default_strategy = default_strategy
self.default_languages = default_languages or ["eng"]
self.progress_interval = progress_interval
logger.info(
f"Initialized UnstructuredProcessor: {api_url}, "
f"strategy={default_strategy}, languages={self.default_languages}, "
f"progress_interval={progress_interval}s"
)
@property
def name(self) -> str:
return "unstructured"
@property
def supported_mime_types(self) -> set[str]:
return self.SUPPORTED_TYPES
async def _run_progress_poller(
self,
stop_event: anyio.Event,
progress_callback: Callable[
[float, Optional[float], Optional[str]], Awaitable[None]
],
start_time: float,
):
"""Run progress poller that reports status every N seconds.
Args:
stop_event: Event to signal when processing is complete
progress_callback: Async callback to report progress
start_time: Time when processing started (from time.time())
"""
logger.debug("Starting progress poller")
while not stop_event.is_set():
try:
# Wait for the event to be set, with a timeout equal to progress_interval
with anyio.fail_after(self.progress_interval):
await stop_event.wait()
# If wait() finished, the event was set (processing complete)
break
except TimeoutError:
# Timeout occurred - time to send a progress update
if not stop_event.is_set(): # Double-check in case of race condition
elapsed = int(time.time() - start_time)
message = (
f"Processing document with unstructured... ({elapsed}s elapsed)"
)
try:
await progress_callback(
progress=float(elapsed),
total=None, # Unknown total duration
message=message,
)
logger.debug(f"Progress update sent: {elapsed}s elapsed")
except Exception as e:
logger.warning(f"Failed to send progress update: {e}")
logger.debug("Progress poller stopped")
async def _make_api_request(
self,
content: bytes,
content_type: str,
filename: Optional[str],
strategy: str,
languages: list[str],
extract_image_block_types: Optional[list[str]],
) -> ProcessingResult:
"""Make the actual API request to Unstructured.
Args:
content: Document bytes
content_type: MIME type
filename: Optional filename
strategy: Processing strategy
languages: OCR languages
extract_image_block_types: Image element types to extract
Returns:
ProcessingResult with extracted text and metadata
Raises:
ProcessorError: If processing fails
"""
# Prepare multipart request
files = {
"files": (
filename or "document",
io.BytesIO(content),
content_type or "application/octet-stream",
)
}
data = {
"strategy": strategy,
"languages": ",".join(languages),
}
if extract_image_block_types:
data["extract_image_block_types"] = ",".join(extract_image_block_types)
logger.debug(
f"Processing with Unstructured API: strategy={strategy}, languages={languages}"
)
try:
async with httpx.AsyncClient(timeout=self.timeout) as client:
response = await client.post(
f"{self.api_url}/general/v0/general",
files=files,
data=data,
)
response.raise_for_status()
# Parse response
elements = response.json()
# Extract text and metadata
texts = []
element_types: dict[str, int] = {}
for element in elements:
if "text" in element and element["text"]:
texts.append(element["text"])
el_type = element.get("type", "unknown")
element_types[el_type] = element_types.get(el_type, 0) + 1
parsed_text = "\n\n".join(texts)
metadata = {
"element_count": len(elements),
"text_length": len(parsed_text),
"element_types": element_types,
"strategy": strategy,
"languages": languages,
}
logger.debug(
f"Successfully processed: {len(elements)} elements, "
f"{len(parsed_text)} characters"
)
return ProcessingResult(
text=parsed_text,
metadata=metadata,
processor=self.name,
success=True,
)
except httpx.HTTPError as e:
logger.error(f"Unstructured API HTTP error: {e}")
raise ProcessorError(f"HTTP error: {str(e)}") from e
except Exception as e:
logger.error(f"Unstructured API processing failed: {e}")
raise ProcessorError(f"Processing failed: {str(e)}") from e
async def process(
self,
content: bytes,
content_type: str,
filename: Optional[str] = None,
options: Optional[dict[str, Any]] = None,
progress_callback: Optional[
Callable[[float, Optional[float], Optional[str]], Awaitable[None]]
] = None,
) -> ProcessingResult:
"""Process document via Unstructured API.
Args:
content: Document bytes
content_type: MIME type
filename: Optional filename for format detection
options: Processing options:
- strategy: "auto", "fast", or "hi_res" (default: from init)
- languages: List of language codes (default: from init)
- extract_image_block_types: Types of image elements to extract
progress_callback: Optional async callback for progress updates
Returns:
ProcessingResult with extracted text and metadata
Raises:
ProcessorError: If processing fails
"""
options = options or {}
# Extract options with defaults
strategy = options.get("strategy", self.default_strategy)
languages = options.get("languages", self.default_languages)
extract_image_block_types = options.get("extract_image_block_types")
# If no progress callback, just make the request directly
if progress_callback is None:
return await self._make_api_request(
content=content,
content_type=content_type,
filename=filename,
strategy=strategy,
languages=languages,
extract_image_block_types=extract_image_block_types,
)
# With progress callback: run API request + progress poller concurrently
stop_event = anyio.Event()
start_time = time.time()
result = None
async def capture_result():
nonlocal result
try:
result = await self._make_api_request(
content=content,
content_type=content_type,
filename=filename,
strategy=strategy,
languages=languages,
extract_image_block_types=extract_image_block_types,
)
finally:
# Signal poller to stop after API request completes
stop_event.set()
# Run both tasks concurrently using anyio task groups
async with anyio.create_task_group() as tg:
tg.start_soon(capture_result)
tg.start_soon(
self._run_progress_poller, stop_event, progress_callback, start_time
)
return result
async def health_check(self) -> bool:
"""Check if Unstructured API is available.
Returns:
True if API is healthy, False otherwise
"""
try:
async with httpx.AsyncClient(timeout=5) as client:
response = await client.get(f"{self.api_url}/healthcheck")
return response.status_code == 200
except Exception as e:
logger.warning(f"Unstructured health check failed: {e}")
return False
+47 -2
View File
@@ -5,6 +5,10 @@ from mcp.server.fastmcp import Context, FastMCP
from nextcloud_mcp_server.auth import require_scopes
from nextcloud_mcp_server.context import get_client
from nextcloud_mcp_server.models import DirectoryListing, FileInfo, SearchFilesResponse
from nextcloud_mcp_server.utils.document_parser import (
is_parseable_document,
parse_document,
)
logger = logging.getLogger(__name__)
@@ -53,12 +57,53 @@ def configure_webdav_tools(mcp: FastMCP):
path: Full path to the file to read
Returns:
Dict with path, content, content_type, size, and encoding (if binary)
Text files are decoded to UTF-8, binary files are base64 encoded
Dict with path, content, content_type, size, and optional parsing metadata
- Text files are decoded to UTF-8
- Documents (PDF, DOCX, etc.) are parsed and text is extracted
- Other binary files are base64 encoded
Examples:
# Read a text file
result = await nc_webdav_read_file("Documents/readme.txt")
logger.info(result['content']) # Decoded text content
# Read a PDF document (automatically parsed)
result = await nc_webdav_read_file("Documents/report.pdf")
logger.info(result['content']) # Extracted text from PDF
logger.info(result['parsing_metadata']) # Document parsing info
# Read a binary file
result = await nc_webdav_read_file("Images/photo.jpg")
logger.info(result['encoding']) # 'base64'
"""
client = get_client(ctx)
content, content_type = await client.webdav.read_file(path)
# Check if this is a parseable document (PDF, DOCX, etc.)
# is_parseable_document() checks if document processing is enabled
if is_parseable_document(content_type):
try:
logger.info(f"Parsing document '{path}' of type '{content_type}'")
parsed_text, metadata = await parse_document(
content,
content_type,
filename=path,
progress_callback=ctx.report_progress,
)
return {
"path": path,
"content": parsed_text,
"content_type": content_type,
"size": len(content),
"parsed": True,
"parsing_metadata": metadata,
}
except Exception as e:
logger.warning(
f"Failed to parse document '{path}', falling back to base64: {e}"
)
# Fall through to base64 encoding on parse failure
# For text files, decode content for easier viewing
if content_type and content_type.startswith("text/"):
try:
+1
View File
@@ -0,0 +1 @@
"""Utility functions for the Nextcloud MCP server."""
@@ -0,0 +1,100 @@
"""Document parsing utilities using pluggable processor registry."""
import base64
import logging
from collections.abc import Awaitable, Callable
from typing import Optional, Tuple
from nextcloud_mcp_server.config import get_document_processor_config
from nextcloud_mcp_server.document_processors import (
ProcessorError,
get_registry,
)
logger = logging.getLogger(__name__)
def is_parseable_document(content_type: Optional[str]) -> bool:
"""Check if a document type can be parsed by any registered processor.
Args:
content_type: The MIME type of the document
Returns:
True if any processor can handle this type, False otherwise
"""
if not content_type:
return False
config = get_document_processor_config()
if not config["enabled"]:
return False
registry = get_registry()
processor = registry.find_processor(content_type)
return processor is not None
async def parse_document(
content: bytes,
content_type: Optional[str],
filename: Optional[str] = None,
progress_callback: Optional[
Callable[[float, Optional[float], Optional[str]], Awaitable[None]]
] = None,
) -> Tuple[str, dict]:
"""Parse a document using registered processors.
This function uses the processor registry to find an appropriate
processor for the given document type and extract text from it.
Args:
content: The document content as bytes
content_type: The MIME type of the document
filename: Optional filename to help with format detection
progress_callback: Optional async callback for progress updates during long operations
Returns:
Tuple of (parsed_text, metadata) where:
- parsed_text: The extracted text content
- metadata: Additional metadata about the parsing
Raises:
ValueError: If the document type is not supported
Exception: If parsing fails
"""
if not content_type:
raise ValueError("Content type is required for document parsing")
config = get_document_processor_config()
if not config["enabled"]:
raise ValueError("Document processing is disabled")
registry = get_registry()
logger.debug(f"Parsing document of type '{content_type}'")
try:
# Process using registry (auto-selects processor based on MIME type)
result = await registry.process(
content=content,
content_type=content_type,
filename=filename,
progress_callback=progress_callback,
)
logger.info(f"Successfully parsed document with '{result.processor}' processor")
return result.text, result.metadata
except ProcessorError as e:
logger.error(f"Document processing failed: {e}")
# Fallback to base64 with error metadata
parsed_text = f"Document could not be parsed. Base64 content: {base64.b64encode(content).decode('ascii')[:200]}..."
metadata = {
"mime_type": content_type,
"text_length": len(parsed_text),
"parsing_method": "fallback_base64",
"error": str(e),
}
return parsed_text, metadata
+2 -1
View File
@@ -1,6 +1,6 @@
[project]
name = "nextcloud-mcp-server"
version = "0.20.0"
version = "0.21.0"
description = "Model Context Protocol (MCP) server for Nextcloud integration - enables AI assistants to interact with Nextcloud data"
authors = [
{name = "Chris Coutinho", email = "chris@coutinho.io"}
@@ -91,6 +91,7 @@ dev = [
"pytest-playwright-asyncio>=0.7.1",
"pytest-timeout>=2.3.1",
"ruff>=0.11.13",
"reportlab>=4.0.0",
]
[project.scripts]
@@ -0,0 +1,143 @@
"""Integration tests for document processing with progress notifications."""
import io
import pytest
from PIL import Image
pytestmark = pytest.mark.integration
class TestDocumentProcessingProgress:
"""Test document processing with progress notifications."""
async def test_unstructured_processor_with_progress_callback(self, nc_client):
"""Test that UnstructuredProcessor calls progress callback during processing."""
import os
# Skip if unstructured is not enabled
if os.getenv("ENABLE_UNSTRUCTURED", "false").lower() != "true":
pytest.skip("Unstructured processor not enabled")
from nextcloud_mcp_server.document_processors.unstructured import (
UnstructuredProcessor,
)
# Track progress callback invocations
progress_updates = []
async def track_progress(progress: float, total: float | None, message: str):
progress_updates.append(
{"progress": progress, "total": total, "message": message}
)
# Create processor configured to use local unstructured service
processor = UnstructuredProcessor(
api_url=os.getenv("UNSTRUCTURED_API_URL", "http://unstructured:8000"),
timeout=120,
progress_interval=2, # 2 second intervals for testing
)
# Create a simple test image (which requires OCR processing)
# This should take long enough to trigger at least one progress update
img = Image.new("RGB", (400, 200), color=(73, 109, 137))
buffer = io.BytesIO()
img.save(buffer, format="PNG")
test_image = buffer.getvalue()
# Process with progress callback
result = await processor.process(
content=test_image,
content_type="image/png",
filename="test.png",
progress_callback=track_progress,
)
# Verify processing succeeded
assert result.success is True
assert result.processor == "unstructured"
assert isinstance(result.text, str)
# Note: Progress updates may or may not occur depending on processing speed
# If updates occurred, verify their structure
if progress_updates:
for update in progress_updates:
assert isinstance(update["progress"], float)
assert update["total"] is None # Unknown total
assert "Processing document with unstructured" in update["message"]
assert "elapsed" in update["message"]
async def test_webdav_read_file_sends_progress_notifications(
self, nc_mcp_client, nc_client
):
"""Test that reading a document via WebDAV MCP tool sends progress notifications."""
import os
# Skip if document processing is not enabled
if os.getenv("ENABLE_DOCUMENT_PROCESSING", "false").lower() != "true":
pytest.skip("Document processing not enabled")
# Create a test image file in Nextcloud via WebDAV
from PIL import Image
img = Image.new("RGB", (400, 200), color=(100, 150, 200))
buffer = io.BytesIO()
img.save(buffer, format="PNG")
test_image = buffer.getvalue()
# Upload test file
test_path = "test_progress.png"
await nc_client.webdav.write_file(test_path, test_image, "image/png")
try:
# Read file via MCP tool (which should trigger document processing)
# The MCP client will automatically track progress notifications
result = await nc_mcp_client.call_tool(
"nc_webdav_read_file", arguments={"path": test_path}
)
# Note: FastMCP progress notifications are sent automatically by ctx.report_progress
# We can't easily capture them in this test without mocking the MCP transport layer
# The important thing is that the code path is exercised without errors
assert result.isError is False
finally:
# Cleanup
try:
await nc_client.webdav.delete_resource(test_path)
except Exception:
pass # Ignore cleanup errors
async def test_progress_callback_not_required(self, nc_client):
"""Test that processing works without progress callback (backward compatibility)."""
import os
if os.getenv("ENABLE_UNSTRUCTURED", "false").lower() != "true":
pytest.skip("Unstructured processor not enabled")
from nextcloud_mcp_server.document_processors.unstructured import (
UnstructuredProcessor,
)
processor = UnstructuredProcessor(
api_url=os.getenv("UNSTRUCTURED_API_URL", "http://unstructured:8000"),
timeout=120,
)
# Create simple test image
img = Image.new("RGB", (200, 100), color=(50, 100, 150))
buffer = io.BytesIO()
img.save(buffer, format="PNG")
test_image = buffer.getvalue()
# Process WITHOUT progress callback
result = await processor.process(
content=test_image,
content_type="image/png",
filename="test.png",
progress_callback=None, # Explicitly None
)
# Should still work
assert result.success is True
assert result.processor == "unstructured"
+155
View File
@@ -0,0 +1,155 @@
"""Integration tests for Unstructured API functionality."""
import json
import logging
import os
import uuid
from io import BytesIO
import pytest
from mcp.client.session import ClientSession
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
from nextcloud_mcp_server.client import NextcloudClient
logger = logging.getLogger(__name__)
@pytest.fixture
async def test_base_path(nc_client: NextcloudClient):
"""Base path for test files/directories."""
test_dir = f"mcp_test_unstructured_{uuid.uuid4().hex[:8]}"
await nc_client.webdav.create_directory(test_dir)
yield test_dir
try:
await nc_client.webdav.delete_resource(test_dir)
except Exception:
pass # Ignore cleanup errors
def create_test_pdf(text: str) -> bytes:
"""Create a simple PDF document with the given text."""
buffer = BytesIO()
c = canvas.Canvas(buffer, pagesize=letter)
c.drawString(100, 750, text)
c.save()
buffer.seek(0)
return buffer.getvalue()
@pytest.mark.skipif(
condition=os.getenv("ENABLE_UNSTRUCTURED", "false") != "true",
reason="Unstructured is not enabled",
)
async def test_unstructured_api_enabled_parsing(
nc_client: NextcloudClient, test_base_path: str, nc_mcp_client: ClientSession
):
"""Test that documents are parsed using the Unstructured API when enabled."""
test_file = f"{test_base_path}/test_unstructured_pdf.pdf"
test_text = "This is a test PDF document for Unstructured API parsing"
try:
# Create a simple PDF
pdf_content = create_test_pdf(test_text)
# Upload the PDF
await nc_client.webdav.write_file(
test_file, pdf_content, content_type="application/pdf"
)
logger.info(f"Uploaded PDF file: {test_file}")
# Read the PDF using MCP tool (should parse via Unstructured API)
mcp_result = await nc_mcp_client.call_tool(
"nc_webdav_read_file", arguments={"path": test_file}
)
# Extract content from the MCP result
if hasattr(mcp_result.content[0], "text"):
result_text = mcp_result.content[0].text
else:
# Fallback for other content types
result_text = str(mcp_result.content[0])
# Parse the JSON response
result = json.loads(result_text)
# Verify the result structure
assert "path" in result
assert "content" in result
assert "content_type" in result
assert "parsed" in result # Should be present when parsing succeeds
# The content should be readable text, not base64
content = result["content"]
assert isinstance(content, str)
assert len(content) > 0
assert "test" in content.lower() # Should contain our test text
# Should have parsing metadata
assert "parsing_metadata" in result
parsing_metadata = result["parsing_metadata"]
assert parsing_metadata["parsing_method"] == "unstructured_api"
logger.info("Successfully parsed PDF using Unstructured API")
finally:
# Clean up
try:
await nc_client.webdav.delete_resource(test_file)
except Exception:
pass # Ignore cleanup errors
@pytest.mark.skipif(
condition=os.getenv("ENABLE_UNSTRUCTURED", "false") != "true",
reason="Unstructured is not enabled",
)
async def test_unstructured_api_with_docx(
nc_client: NextcloudClient, test_base_path: str, nc_mcp_client: ClientSession
):
"""Test Unstructured API with DOCX files."""
test_file = f"{test_base_path}/test_unstructured_docx.docx"
try:
# Create a simple DOCX-like file for testing purposes
# Since we're removing python-docx dependency, we'll create a simple file
docx_content = (
b"This is a mock DOCX file content for testing Unstructured API parsing"
)
# Upload the file
await nc_client.webdav.write_file(
test_file,
docx_content,
content_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
)
logger.info(f"Uploaded DOCX file: {test_file}")
# Read the file using MCP tool
mcp_result = await nc_mcp_client.call_tool(
"nc_webdav_read_file", arguments={"path": test_file}
)
# Extract content from the MCP result
if hasattr(mcp_result.content[0], "text"):
result_text = mcp_result.content[0].text
else:
# Fallback for other content types
result_text = str(mcp_result.content[0])
# Parse the JSON response
result = json.loads(result_text)
# Verify the result structure
assert "path" in result
assert "content" in result
assert "content_type" in result
logger.info("Successfully processed DOCX file with Unstructured API")
finally:
# Clean up
try:
await nc_client.webdav.delete_resource(test_file)
except Exception:
pass # Ignore cleanup errors
+4
View File
@@ -1,5 +1,9 @@
import pytest
from nextcloud_mcp_server.client import NextcloudClient
pytestmark = pytest.mark.integration
async def test_create_and_delete_user(nc_client: NextcloudClient, test_user):
"""Test creating a user and verifying deletion (cleanup by fixture)."""
+11 -3
View File
@@ -253,9 +253,17 @@ def test_default_values(runner, clean_env, monkeypatch):
_ = runner.invoke(run, [])
# Verify default values
assert (
captured_env["NEXTCLOUD_OIDC_SCOPES"]
== "openid profile email notes:read notes:write calendar:read calendar:write contacts:read contacts:write cookbook:read cookbook:write deck:read deck:write tables:read tables:write files:read files:write sharing:read sharing:write"
assert captured_env["NEXTCLOUD_OIDC_SCOPES"] == (
"openid profile email "
"notes:read notes:write "
"calendar:read calendar:write "
"todo:read todo:write "
"contacts:read contacts:write "
"cookbook:read cookbook:write "
"deck:read deck:write "
"tables:read tables:write "
"files:read files:write "
"sharing:read sharing:write"
)
assert captured_env["NEXTCLOUD_OIDC_TOKEN_TYPE"] == "bearer"
assert captured_env["NEXTCLOUD_MCP_SERVER_URL"] == "http://localhost:8000"
@@ -0,0 +1,136 @@
"""Unit tests for document processor configuration."""
import os
import pytest
pytestmark = pytest.mark.unit
class TestDocumentProcessorConfig:
"""Test document processor configuration system."""
def test_config_disabled_by_default(self):
"""Test that document processing is disabled by default."""
from nextcloud_mcp_server.config import get_document_processor_config
os.environ.pop("ENABLE_DOCUMENT_PROCESSING", None)
config = get_document_processor_config()
assert config["enabled"] is False
def test_config_enabled(self):
"""Test enabling document processing."""
from nextcloud_mcp_server.config import get_document_processor_config
os.environ["ENABLE_DOCUMENT_PROCESSING"] = "true"
try:
config = get_document_processor_config()
assert config["enabled"] is True
finally:
os.environ.pop("ENABLE_DOCUMENT_PROCESSING", None)
def test_unstructured_processor_config(self):
"""Test Unstructured processor configuration."""
from nextcloud_mcp_server.config import get_document_processor_config
os.environ["ENABLE_UNSTRUCTURED"] = "true"
os.environ["UNSTRUCTURED_API_URL"] = "http://test:8000"
os.environ["UNSTRUCTURED_STRATEGY"] = "hi_res"
os.environ["UNSTRUCTURED_LANGUAGES"] = "eng,fra"
os.environ["UNSTRUCTURED_TIMEOUT"] = "60"
try:
config = get_document_processor_config()
assert "unstructured" in config["processors"]
unst_config = config["processors"]["unstructured"]
assert unst_config["api_url"] == "http://test:8000"
assert unst_config["strategy"] == "hi_res"
assert unst_config["languages"] == ["eng", "fra"]
assert unst_config["timeout"] == 60
finally:
os.environ.pop("ENABLE_UNSTRUCTURED", None)
os.environ.pop("UNSTRUCTURED_API_URL", None)
os.environ.pop("UNSTRUCTURED_STRATEGY", None)
os.environ.pop("UNSTRUCTURED_LANGUAGES", None)
os.environ.pop("UNSTRUCTURED_TIMEOUT", None)
def test_tesseract_processor_config(self):
"""Test Tesseract processor configuration."""
from nextcloud_mcp_server.config import get_document_processor_config
os.environ["ENABLE_TESSERACT"] = "true"
os.environ["TESSERACT_LANG"] = "eng+deu"
os.environ["TESSERACT_CMD"] = "/usr/local/bin/tesseract"
try:
config = get_document_processor_config()
assert "tesseract" in config["processors"]
tess_config = config["processors"]["tesseract"]
assert tess_config["lang"] == "eng+deu"
assert tess_config["tesseract_cmd"] == "/usr/local/bin/tesseract"
finally:
os.environ.pop("ENABLE_TESSERACT", None)
os.environ.pop("TESSERACT_LANG", None)
os.environ.pop("TESSERACT_CMD", None)
def test_custom_processor_config(self):
"""Test custom processor configuration."""
from nextcloud_mcp_server.config import get_document_processor_config
os.environ["ENABLE_CUSTOM_PROCESSOR"] = "true"
os.environ["CUSTOM_PROCESSOR_NAME"] = "my_ocr"
os.environ["CUSTOM_PROCESSOR_URL"] = "http://localhost:9000/process"
os.environ["CUSTOM_PROCESSOR_API_KEY"] = "secret"
os.environ["CUSTOM_PROCESSOR_TIMEOUT"] = "30"
os.environ["CUSTOM_PROCESSOR_TYPES"] = "application/pdf,image/jpeg"
try:
config = get_document_processor_config()
assert "custom" in config["processors"]
custom_config = config["processors"]["custom"]
assert custom_config["name"] == "my_ocr"
assert custom_config["api_url"] == "http://localhost:9000/process"
assert custom_config["api_key"] == "secret"
assert custom_config["timeout"] == 30
assert "application/pdf" in custom_config["supported_types"]
assert "image/jpeg" in custom_config["supported_types"]
finally:
os.environ.pop("ENABLE_CUSTOM_PROCESSOR", None)
os.environ.pop("CUSTOM_PROCESSOR_NAME", None)
os.environ.pop("CUSTOM_PROCESSOR_URL", None)
os.environ.pop("CUSTOM_PROCESSOR_API_KEY", None)
os.environ.pop("CUSTOM_PROCESSOR_TIMEOUT", None)
os.environ.pop("CUSTOM_PROCESSOR_TYPES", None)
def test_multiple_processors(self):
"""Test configuration with multiple processors enabled."""
from nextcloud_mcp_server.config import get_document_processor_config
os.environ["ENABLE_DOCUMENT_PROCESSING"] = "true"
os.environ["ENABLE_UNSTRUCTURED"] = "true"
os.environ["ENABLE_TESSERACT"] = "true"
try:
config = get_document_processor_config()
assert config["enabled"] is True
assert "unstructured" in config["processors"]
assert "tesseract" in config["processors"]
finally:
os.environ.pop("ENABLE_DOCUMENT_PROCESSING", None)
os.environ.pop("ENABLE_UNSTRUCTURED", None)
os.environ.pop("ENABLE_TESSERACT", None)
def test_default_processor_selection(self):
"""Test default processor configuration."""
from nextcloud_mcp_server.config import get_document_processor_config
os.environ.pop("DOCUMENT_PROCESSOR", None)
config = get_document_processor_config()
assert config["default_processor"] == "unstructured"
os.environ["DOCUMENT_PROCESSOR"] = "tesseract"
try:
config = get_document_processor_config()
assert config["default_processor"] == "tesseract"
finally:
os.environ.pop("DOCUMENT_PROCESSOR", None)
+164
View File
@@ -0,0 +1,164 @@
"""Unit tests for progress notification system."""
import time
from unittest.mock import AsyncMock
import anyio
import pytest
pytestmark = pytest.mark.unit
class TestProgressNotification:
"""Test progress notification in document processors."""
async def test_progress_callback_called_during_processing(self):
"""Test that progress callback is called at intervals during processing."""
from nextcloud_mcp_server.document_processors.unstructured import (
UnstructuredProcessor,
)
# Mock progress callback to track calls
progress_callback = AsyncMock()
# Create processor with 1-second interval for faster testing
processor = UnstructuredProcessor(
api_url="http://test:8000",
timeout=10,
progress_interval=1,
)
# Create a mock event and start time
stop_event = anyio.Event()
start_time = time.time()
# Run the poller for 3 seconds, then stop it
async def stop_after_delay():
await anyio.sleep(3.5)
stop_event.set()
# Run poller and stopper concurrently
async with anyio.create_task_group() as tg:
tg.start_soon(
processor._run_progress_poller,
stop_event,
progress_callback,
start_time,
)
tg.start_soon(stop_after_delay)
# Verify progress callback was called at least 3 times (1s, 2s, 3s)
assert progress_callback.call_count >= 3
# Verify each call had correct structure
for call in progress_callback.call_args_list:
# Calls are made with keyword arguments
assert "progress" in call.kwargs
assert "total" in call.kwargs
assert "message" in call.kwargs
progress = call.kwargs["progress"]
total = call.kwargs["total"]
message = call.kwargs["message"]
assert isinstance(progress, float)
assert total is None # Unknown total for unstructured
assert "Processing document with unstructured" in message
assert "elapsed" in message
async def test_progress_poller_stops_when_event_set(self):
"""Test that progress poller stops immediately when event is set."""
from nextcloud_mcp_server.document_processors.unstructured import (
UnstructuredProcessor,
)
progress_callback = AsyncMock()
processor = UnstructuredProcessor(
api_url="http://test:8000",
timeout=10,
progress_interval=10, # Long interval
)
stop_event = anyio.Event()
start_time = time.time()
# Set event immediately
stop_event.set()
# Run poller
await processor._run_progress_poller(stop_event, progress_callback, start_time)
# Should not call progress callback since event was already set
assert progress_callback.call_count == 0
async def test_progress_callback_exception_handled(self):
"""Test that exceptions in progress callback don't crash the poller."""
from nextcloud_mcp_server.document_processors.unstructured import (
UnstructuredProcessor,
)
# Mock callback that raises exception
progress_callback = AsyncMock(side_effect=Exception("Callback error"))
processor = UnstructuredProcessor(
api_url="http://test:8000",
timeout=10,
progress_interval=1,
)
stop_event = anyio.Event()
start_time = time.time()
# Run poller for 2 seconds
async def stop_after_delay():
await anyio.sleep(2.5)
stop_event.set()
# Should not raise exception even though callback fails
async with anyio.create_task_group() as tg:
tg.start_soon(
processor._run_progress_poller,
stop_event,
progress_callback,
start_time,
)
tg.start_soon(stop_after_delay)
# Callback should have been called (and failed) at least twice
assert progress_callback.call_count >= 2
async def test_process_without_progress_callback(self):
"""Test that processing works without progress callback (backward compatibility)."""
from nextcloud_mcp_server.document_processors.unstructured import (
UnstructuredProcessor,
)
processor = UnstructuredProcessor(
api_url="http://test:8000",
timeout=10,
progress_interval=1,
)
# Mock the _make_api_request method to avoid actual HTTP call
from unittest.mock import patch
from nextcloud_mcp_server.document_processors.base import ProcessingResult
mock_result = ProcessingResult(
text="Test content",
metadata={"test": "data"},
processor="unstructured",
success=True,
)
with patch.object(
processor, "_make_api_request", return_value=mock_result
) as mock_request:
# Call process without progress_callback
result = await processor.process(
content=b"test", content_type="application/pdf", progress_callback=None
)
# Should call _make_api_request directly
assert result == mock_result
mock_request.assert_called_once()
Generated
+973 -948
View File
File diff suppressed because it is too large Load Diff