Compare commits
12 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| ce666934f2 | |||
| cdf69b3ea8 | |||
| a6e5f3d8ff | |||
| f44bf3e8f2 | |||
| 37141003d8 | |||
| c787abf2f3 | |||
| b32324cb76 | |||
| 640a7818f9 | |||
| 8e5d0b5df1 | |||
| 851d21f56e | |||
| bf4eed6007 | |||
| 3a41860d27 |
@@ -5,5 +5,7 @@ __pycache__/
|
||||
.env.local
|
||||
.env.*.local
|
||||
|
||||
docker-compose.override.yml
|
||||
|
||||
# Generated by pytest used to login users
|
||||
.nextcloud_oauth_*.json
|
||||
|
||||
@@ -1,3 +1,20 @@
|
||||
## v0.31.1 (2025-11-10)
|
||||
|
||||
### Refactor
|
||||
|
||||
- simplify OpenTelemetry tracing configuration
|
||||
|
||||
## v0.31.0 (2025-11-10)
|
||||
|
||||
### Feat
|
||||
|
||||
- skip tracing for health and metrics endpoints
|
||||
|
||||
### Fix
|
||||
|
||||
- add retry logic for ETag conflicts in category change test
|
||||
- optimize Notes API pagination with pruneBefore parameter
|
||||
|
||||
## v0.30.0 (2025-11-10)
|
||||
|
||||
### Feat
|
||||
|
||||
+1
-1
@@ -9,7 +9,7 @@ WORKDIR /app
|
||||
|
||||
COPY . .
|
||||
|
||||
RUN uv sync --locked --no-dev
|
||||
RUN uv sync --locked --no-dev --no-editable
|
||||
|
||||
ENV PYTHONUNBUFFERED=1
|
||||
ENV VIRTUAL_ENV=/app/.venv
|
||||
|
||||
@@ -2,8 +2,8 @@ apiVersion: v2
|
||||
name: nextcloud-mcp-server
|
||||
description: A Helm chart for Nextcloud MCP Server - enables AI assistants to interact with Nextcloud
|
||||
type: application
|
||||
version: 0.30.0
|
||||
appVersion: "0.30.0"
|
||||
version: 0.31.1
|
||||
appVersion: "0.31.1"
|
||||
keywords:
|
||||
- nextcloud
|
||||
- mcp
|
||||
|
||||
@@ -218,8 +218,6 @@ spec:
|
||||
- name: METRICS_PORT
|
||||
value: {{ .Values.observability.metrics.port | quote }}
|
||||
{{- if .Values.observability.tracing.enabled }}
|
||||
- name: OTEL_ENABLED
|
||||
value: "true"
|
||||
- name: OTEL_EXPORTER_OTLP_ENDPOINT
|
||||
value: {{ .Values.observability.tracing.endpoint | quote }}
|
||||
- name: OTEL_SERVICE_NAME
|
||||
|
||||
+7
-3
@@ -98,16 +98,20 @@ services:
|
||||
#- QDRANT_URL=http://qdrant:6333 # Uncomment for network mode
|
||||
#- QDRANT_API_KEY=${QDRANT_API_KEY:-my_secret_api_key} # Only for network mode
|
||||
|
||||
# Observability
|
||||
#- OTEL_SERVICE_NAME=nextcloud-mcp-docker-compose
|
||||
#- OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
|
||||
|
||||
# Collection naming: Auto-generated as {deployment-id}-{model-name}
|
||||
# - Deployment ID: OTEL_SERVICE_NAME (if set) or hostname (fallback)
|
||||
# - Model name: OLLAMA_EMBEDDING_MODEL
|
||||
# - Example: "nextcloud-mcp-server-nomic-embed-text"
|
||||
# - Changing models creates new collection (requires re-embedding)
|
||||
# - Set QDRANT_COLLECTION to override auto-generation:
|
||||
- QDRANT_COLLECTION=nextcloud_content
|
||||
#- QDRANT_COLLECTION=nextcloud_content
|
||||
|
||||
# Ollama configuration (optional - uses SimpleEmbeddingProvider if not set)
|
||||
# - OLLAMA_BASE_URL=https://ollama.internal.coutinho.io:443
|
||||
# - OLLAMA_BASE_URL=http://ollama:11434
|
||||
# - OLLAMA_EMBEDDING_MODEL=nomic-embed-text # Changing this creates new collection
|
||||
# - OLLAMA_VERIFY_SSL=false
|
||||
|
||||
@@ -219,7 +223,7 @@ services:
|
||||
- keycloak-oauth-storage:/app/.oauth
|
||||
|
||||
qdrant:
|
||||
image: qdrant/qdrant:v1.15.5
|
||||
image: qdrant/qdrant:v1.15.5@sha256:0fb8897412abc81d1c0430a899b9a81eb8328aa634e7242d1bc804c1fe8fe863
|
||||
restart: always
|
||||
ports:
|
||||
- 127.0.0.1:6333:6333 # REST API
|
||||
|
||||
@@ -0,0 +1,420 @@
|
||||
# ADR-010: Webhook-Based Vector Database Synchronization
|
||||
|
||||
**Status**: Proposed
|
||||
**Date**: 2025-01-10
|
||||
**Depends On**: ADR-007 (Background Vector Sync)
|
||||
|
||||
## Context
|
||||
|
||||
ADR-007 established a background synchronization architecture for maintaining the vector database using periodic polling. The scanner task runs on a configurable interval (default 3600 seconds / 1 hour) to detect changed documents across Nextcloud apps. While this polling approach is simple and reliable, it introduces significant latency between content changes and vector database updates.
|
||||
|
||||
### Current Polling Architecture
|
||||
|
||||
The existing scanner implementation in `nextcloud_mcp_server/vector/scanner.py` operates as follows:
|
||||
|
||||
1. **Periodic Scanning**: The scanner task sleeps for `vector_sync_scan_interval` seconds between runs
|
||||
2. **Change Detection**: For each scan, it:
|
||||
- Fetches all documents from Nextcloud (notes, calendar events, etc.)
|
||||
- Queries Qdrant for the last indexed timestamp of each document
|
||||
- Compares modification timestamps to detect changes
|
||||
- Queues changed documents for processing
|
||||
3. **Document Processing**: Processor tasks pull from the queue, generate embeddings, and update Qdrant
|
||||
|
||||
This architecture works but has fundamental limitations:
|
||||
|
||||
**Latency**: With a 1-hour scan interval, content changes can take up to 1 hour to appear in semantic search results. For time-sensitive use cases (e.g., "What's on my calendar today?"), this delay is problematic.
|
||||
|
||||
**API Load**: Every scan fetches *all* documents for *all* enabled users, regardless of whether anything changed. For large deployments with thousands of documents, this generates significant unnecessary API traffic to Nextcloud.
|
||||
|
||||
**Resource Waste**: The scanner and processors consume compute resources even when no content has changed. During periods of low activity, the system performs wasteful polling.
|
||||
|
||||
**Scalability**: As the number of users and documents grows, the time required to complete a full scan increases. Eventually, the scan duration may exceed the scan interval, causing scans to run continuously without idle periods.
|
||||
|
||||
**Rate Limiting**: Fetching all documents for all users in rapid succession can trigger Nextcloud's rate limiting, especially on shared hosting environments with restrictive API quotas.
|
||||
|
||||
These limitations are inherent to any polling-based architecture. Reducing the scan interval (e.g., to 5 minutes) reduces latency but exacerbates API load, resource waste, and rate limiting issues. The fundamental problem is that the system has no way to know *when* content changes occur—it must repeatedly check to find out.
|
||||
|
||||
### Nextcloud Webhook Listeners
|
||||
|
||||
Nextcloud provides a webhook_listeners app (bundled with Nextcloud 30+) that enables push-based change notifications. Instead of polling for changes, external services can register webhook endpoints and receive HTTP POST requests when specific events occur. Administrators register these webhooks using Nextcloud's OCS API or occ commands.
|
||||
|
||||
The webhook_listeners app supports events for all Nextcloud apps relevant to this MCP server's vector database:
|
||||
|
||||
**Files/Notes Events** (notes are stored as files):
|
||||
- `OCP\Files\Events\Node\NodeCreatedEvent`
|
||||
- `OCP\Files\Events\Node\NodeWrittenEvent`
|
||||
- `OCP\Files\Events\Node\NodeDeletedEvent`
|
||||
- `OCP\Files\Events\Node\NodeRenamedEvent`
|
||||
- `OCP\Files\Events\Node\NodeCopiedEvent`
|
||||
|
||||
**Calendar Events**:
|
||||
- `OCP\Calendar\Events\CalendarObjectCreatedEvent`
|
||||
- `OCP\Calendar\Events\CalendarObjectUpdatedEvent`
|
||||
- `OCP\Calendar\Events\CalendarObjectDeletedEvent`
|
||||
- `OCP\Calendar\Events\CalendarObjectMovedEvent`
|
||||
|
||||
**Tables Events**:
|
||||
- `OCA\Tables\Event\RowAddedEvent`
|
||||
- `OCA\Tables\Event\RowUpdatedEvent`
|
||||
- `OCA\Tables\Event\RowDeletedEvent`
|
||||
|
||||
**Deck Events** (via file events since cards are stored as files in some configurations)
|
||||
|
||||
Each webhook notification includes rich metadata:
|
||||
- User ID who triggered the event
|
||||
- Timestamp of the event
|
||||
- Document ID and metadata
|
||||
- Operation type (create, update, delete)
|
||||
- Path information (for files)
|
||||
|
||||
Webhook notifications are dispatched via background jobs, with configurable delivery guarantees. Administrators can set up dedicated webhook worker processes to achieve near-real-time delivery (within seconds of the triggering event).
|
||||
|
||||
### Why Not Replace Polling Entirely?
|
||||
|
||||
While webhooks provide superior latency and efficiency, they cannot fully replace polling:
|
||||
|
||||
**Missed Events**: If the MCP server is down when a webhook fires, the notification is lost. Nextcloud's background job system processes webhooks asynchronously, but does not queue failed deliveries indefinitely.
|
||||
|
||||
**Administrator Setup**: Webhooks must be registered by Nextcloud administrators using the OCS API or occ commands. This is an optional optimization that administrators can enable when they want to reduce polling frequency.
|
||||
|
||||
**Filter Configuration**: Webhook filters must be carefully configured to avoid notification floods. A poorly configured filter could send thousands of notifications for bulk operations (e.g., importing a calendar with hundreds of events).
|
||||
|
||||
**Graceful Degradation**: In environments where webhooks are not configured, the system continues using polling without any degradation in functionality.
|
||||
|
||||
**Deletion Detection**: Nextcloud's webhook system does not guarantee delivery of deletion events if the user's account is removed or the app is uninstalled. Periodic polling provides a safety mechanism to detect orphaned documents.
|
||||
|
||||
A complementary architecture where webhooks supplement (but don't replace) polling provides low-latency updates when configured, with polling ensuring reliability.
|
||||
|
||||
### Design Considerations
|
||||
|
||||
**Push vs Pull Trade-offs**:
|
||||
Webhooks introduce new failure modes (network issues, endpoint unavailability, notification floods) that polling avoids. The webhook endpoint must handle failures gracefully without blocking semantic search functionality.
|
||||
|
||||
**Webhook Endpoint Security**:
|
||||
The MCP server exposes an HTTP endpoint to receive webhooks. Authentication is optional—in production deployments, administrators can configure Nextcloud to send an `Authorization` header that the MCP server validates. For local development, authentication can be disabled for simplicity.
|
||||
|
||||
**Idempotency**:
|
||||
The system may receive duplicate notifications (webhook + next scan) or out-of-order notifications (update fires before create completes). Document processing must be idempotent—processing the same document multiple times produces the same result.
|
||||
|
||||
**Asynchronous Processing**:
|
||||
Nextcloud processes webhooks via background jobs, introducing delivery latency (typically seconds to minutes depending on background job configuration). This affects testing strategies—integration tests cannot rely on immediate webhook delivery.
|
||||
|
||||
**Deployment Patterns**:
|
||||
The MCP server webhook endpoint is accessible at the same host/port as the MCP server itself. Administrators configure Nextcloud to POST to `https://<mcp-server-host>:<port>/webhooks/nextcloud` when registering webhook listeners.
|
||||
|
||||
## Decision
|
||||
|
||||
We will add a webhook endpoint to the MCP server that receives change notifications from Nextcloud and queues documents for vector database processing. This complements the existing polling architecture from ADR-007 without replacing it—webhooks provide low-latency updates when configured, while polling ensures reliability regardless of webhook availability.
|
||||
|
||||
The architecture is intentionally simple: the webhook endpoint is just another producer of `DocumentTask` objects that feed into the existing processor queue. The scanner task, processor pool, and queue management remain unchanged from ADR-007.
|
||||
|
||||
### Architecture Components
|
||||
|
||||
**1. Webhook Endpoint**
|
||||
|
||||
A new Starlette HTTP route will be added to receive webhook notifications from Nextcloud:
|
||||
|
||||
```python
|
||||
from starlette.requests import Request
|
||||
from starlette.responses import JSONResponse
|
||||
|
||||
@app.route("/webhooks/nextcloud", methods=["POST"])
|
||||
async def handle_nextcloud_webhook(request: Request) -> JSONResponse:
|
||||
"""
|
||||
Receive webhook notifications from Nextcloud.
|
||||
|
||||
Parses event payload, extracts document metadata, and queues
|
||||
changed documents for processing using the same queue as the scanner.
|
||||
"""
|
||||
# 1. Optional authentication validation
|
||||
if settings.webhook_secret:
|
||||
auth_header = request.headers.get("authorization", "")
|
||||
if not auth_header.startswith("Bearer ") or \
|
||||
auth_header[7:] != settings.webhook_secret:
|
||||
logger.warning("Webhook authentication failed")
|
||||
return JSONResponse(
|
||||
{"status": "error", "message": "Unauthorized"},
|
||||
status_code=401
|
||||
)
|
||||
|
||||
# 2. Parse webhook payload
|
||||
payload = await request.json()
|
||||
event_class = payload["event"]["class"]
|
||||
user_id = payload["user"]["uid"]
|
||||
|
||||
# 3. Extract document metadata from event
|
||||
doc_task = extract_document_task(event_class, payload)
|
||||
if not doc_task:
|
||||
return JSONResponse({"status": "ignored", "reason": "unsupported event"})
|
||||
|
||||
# 4. Send to processor queue (same queue as scanner)
|
||||
try:
|
||||
await webhook_send_stream.send(doc_task)
|
||||
logger.info(f"Queued document from webhook: {doc_task}")
|
||||
return JSONResponse({"status": "queued"})
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to queue webhook document: {e}")
|
||||
return JSONResponse(
|
||||
{"status": "error", "message": str(e)},
|
||||
status_code=500
|
||||
)
|
||||
```
|
||||
|
||||
The endpoint:
|
||||
- Validates optional authentication via `Authorization: Bearer <secret>` header
|
||||
- Parses various event types (calendar, files, tables) into `DocumentTask` objects
|
||||
- Sends to the same processing queue that the scanner uses
|
||||
- Returns quickly (<50ms) to avoid blocking Nextcloud's webhook workers
|
||||
- Handles errors gracefully (invalid payload, queue full, etc.)
|
||||
|
||||
**2. Webhook Registration Helper (Development Only)**
|
||||
|
||||
For development and testing purposes, a helper method will be added to `NextcloudClient` for registering webhooks via the OCS API. This is NOT exposed as an MCP tool—administrators register webhooks manually using Nextcloud's admin interface or the OCS API directly.
|
||||
|
||||
```python
|
||||
class NextcloudClient:
|
||||
async def register_webhook(
|
||||
self,
|
||||
event_type: str,
|
||||
uri: str,
|
||||
http_method: str = "POST",
|
||||
auth_method: str = "none",
|
||||
headers: dict[str, str] | None = None,
|
||||
) -> dict:
|
||||
"""
|
||||
Register a webhook with Nextcloud (requires admin credentials).
|
||||
|
||||
Used for development/testing. Production admins should register
|
||||
webhooks using Nextcloud's admin UI or occ commands.
|
||||
"""
|
||||
# Implementation uses OCS API: POST /ocs/v2.php/apps/webhook_listeners/api/v1/webhooks
|
||||
...
|
||||
```
|
||||
|
||||
This keeps webhook registration out of the MCP tool surface while providing a convenient API for integration tests.
|
||||
|
||||
**3. Event Parsing**
|
||||
|
||||
A helper function extracts `DocumentTask` from various Nextcloud event types:
|
||||
|
||||
```python
|
||||
def extract_document_task(event_class: str, payload: dict) -> DocumentTask | None:
|
||||
"""Extract DocumentTask from webhook event payload."""
|
||||
user_id = payload["user"]["uid"]
|
||||
event_data = payload["event"]
|
||||
|
||||
# File/Note events
|
||||
if "NodeCreatedEvent" in event_class or "NodeWrittenEvent" in event_class:
|
||||
# Only process markdown files (notes)
|
||||
path = event_data["node"]["path"]
|
||||
if not path.endswith(".md"):
|
||||
return None
|
||||
return DocumentTask(
|
||||
user_id=user_id,
|
||||
doc_id=event_data["node"]["id"],
|
||||
doc_type="note",
|
||||
operation="index",
|
||||
modified_at=payload["time"],
|
||||
)
|
||||
|
||||
# Calendar events
|
||||
elif "CalendarObjectCreatedEvent" in event_class or \
|
||||
"CalendarObjectUpdatedEvent" in event_class:
|
||||
return DocumentTask(
|
||||
user_id=user_id,
|
||||
doc_id=str(event_data["objectData"]["id"]),
|
||||
doc_type="calendar_event",
|
||||
operation="index",
|
||||
modified_at=event_data["objectData"]["lastmodified"],
|
||||
)
|
||||
|
||||
# Deletion events
|
||||
elif "NodeDeletedEvent" in event_class or \
|
||||
"CalendarObjectDeletedEvent" in event_class:
|
||||
# Similar logic for delete operations
|
||||
...
|
||||
|
||||
return None # Unsupported event type
|
||||
```
|
||||
|
||||
**4. No Changes to Scanner or Processors**
|
||||
|
||||
The existing scanner task from ADR-007 continues operating unchanged. It polls Nextcloud on its configured interval (`VECTOR_SYNC_SCAN_INTERVAL`), discovers changed documents, and queues them for processing. The scanner is unaware of webhooks—it simply adds `DocumentTask` objects to the queue.
|
||||
|
||||
Similarly, the processor pool continues pulling `DocumentTask` objects from the queue, generating embeddings, and updating Qdrant. Processors don't know or care whether a task came from the scanner or a webhook.
|
||||
|
||||
This design keeps concerns separated: webhooks and scanner are independent producers, processors are independent consumers, and the queue mediates between them.
|
||||
|
||||
### Configuration
|
||||
|
||||
A new optional environment variable controls webhook authentication:
|
||||
|
||||
```bash
|
||||
# Optional: Shared secret for webhook authentication
|
||||
# If set, webhooks must include "Authorization: Bearer <secret>" header
|
||||
# If unset, no authentication is required (useful for local development)
|
||||
WEBHOOK_SECRET=<generate-random-secret>
|
||||
```
|
||||
|
||||
The webhook endpoint is automatically available at `/webhooks/nextcloud` when the MCP server starts. No feature flags or additional configuration needed—if Nextcloud sends webhooks to this endpoint, they will be processed.
|
||||
|
||||
**Reducing Polling Frequency**: Administrators who configure webhooks may want to reduce polling frequency to minimize API load while maintaining safety reconciliation scans:
|
||||
|
||||
```bash
|
||||
# Increase scan interval from 1 hour (default) to 24 hours
|
||||
VECTOR_SYNC_SCAN_INTERVAL=86400
|
||||
```
|
||||
|
||||
This is a manual configuration decision, not automatic—the scanner doesn't adapt based on webhook availability.
|
||||
|
||||
### Webhook Event Mapping
|
||||
|
||||
The webhook handler maps Nextcloud events to document types:
|
||||
|
||||
| Nextcloud Event | Document Type | Operation |
|
||||
|----------------|---------------|-----------|
|
||||
| `NodeCreatedEvent` (path: `*/files/*.md`) | `note` | `index` |
|
||||
| `NodeWrittenEvent` (path: `*/files/*.md`) | `note` | `index` |
|
||||
| `NodeDeletedEvent` (path: `*/files/*.md`) | `note` | `delete` |
|
||||
| `CalendarObjectCreatedEvent` | `calendar_event` | `index` |
|
||||
| `CalendarObjectUpdatedEvent` | `calendar_event` | `index` |
|
||||
| `CalendarObjectDeletedEvent` | `calendar_event` | `delete` |
|
||||
| `RowAddedEvent` | `table_row` | `index` |
|
||||
| `RowUpdatedEvent` | `table_row` | `index` |
|
||||
| `RowDeletedEvent` | `table_row` | `delete` |
|
||||
|
||||
Path filters in webhook registration ensure only relevant files trigger notifications (e.g., exclude `.jpg`, `.mp4` for file events).
|
||||
|
||||
### Administrator Setup
|
||||
|
||||
Administrators who want to enable webhooks:
|
||||
|
||||
1. **Enable webhook_listeners app** in Nextcloud: `occ app:enable webhook_listeners`
|
||||
2. **Register webhook endpoints** using Nextcloud's OCS API or admin UI:
|
||||
- Endpoint: `https://<mcp-server-host>:<port>/webhooks/nextcloud`
|
||||
- Events: File created/updated/deleted, Calendar object events, Table row events
|
||||
- Filters: Exclude non-content files (images, videos), system directories
|
||||
- Optional: Configure `Authorization: Bearer <WEBHOOK_SECRET>` header
|
||||
3. **Optionally reduce scanner frequency**: Set `VECTOR_SYNC_SCAN_INTERVAL=86400` (24 hours)
|
||||
4. **Set up webhook workers** (optional): Configure dedicated background job workers for low-latency delivery
|
||||
|
||||
Existing deployments continue using polling without any changes. Webhooks are purely additive.
|
||||
|
||||
## Consequences
|
||||
|
||||
### Benefits
|
||||
|
||||
**Reduced Latency**: With webhooks configured, content changes appear in semantic search within seconds to minutes (depending on Nextcloud background job configuration) instead of up to 1 hour. Queries like "What meetings do I have today?" reflect recent calendar updates.
|
||||
|
||||
**Lower API Load**: Administrators who configure webhooks can reduce scanner frequency (e.g., 24-hour intervals), eliminating most polling API calls while maintaining safety reconciliation scans. This significantly reduces load on Nextcloud servers.
|
||||
|
||||
**Better Scalability**: Webhooks scale better than polling as content volume grows. The system only processes changed documents instead of checking all documents every hour.
|
||||
|
||||
**Simple Architecture**: The webhook endpoint is just another producer feeding the existing processor queue. No changes to scanner, processors, or queue management—webhooks integrate cleanly into the existing architecture.
|
||||
|
||||
**Improved User Experience**: Lower-latency semantic search feels more responsive and accurate, especially for time-sensitive queries about recent changes.
|
||||
|
||||
### Drawbacks
|
||||
|
||||
**Manual Configuration**: Administrators must configure webhooks outside the MCP server using Nextcloud's admin tools. This adds setup complexity compared to the zero-configuration polling approach.
|
||||
|
||||
**Deployment Requirements**: Webhooks require the MCP server to be reachable from Nextcloud via HTTP(S). Deployments behind NAT or with restrictive firewalls may not support webhooks without additional networking configuration.
|
||||
|
||||
**Asynchronous Delivery**: Nextcloud processes webhooks via background jobs, introducing delivery latency (typically seconds to minutes). The exact latency depends on background job worker configuration and system load.
|
||||
|
||||
**Testing Complexity**: Integration tests cannot rely on immediate webhook delivery due to asynchronous background job processing. Tests must either poll for results or mock webhook delivery directly.
|
||||
|
||||
**New Failure Modes**: Webhook endpoint downtime, network issues between Nextcloud and MCP server, webhook notification floods from bulk operations. The system must handle these gracefully.
|
||||
|
||||
**Version Dependencies**: The webhook_listeners app requires Nextcloud 30+. Older versions continue using polling exclusively.
|
||||
|
||||
### Monitoring and Observability
|
||||
|
||||
New metrics track webhook performance:
|
||||
|
||||
- `webhook_notifications_received_total{event_type}`: Count of webhook notifications by event type
|
||||
- `webhook_processing_duration_seconds{event_type}`: Webhook handler latency
|
||||
- `webhook_errors_total{error_type}`: Failed webhook processing by error type (auth failure, parse error, queue full)
|
||||
|
||||
Logs include:
|
||||
- Successful webhook processing: `Queued document from webhook: DocumentTask(...)`
|
||||
- Webhook authentication failures: `Webhook authentication failed`
|
||||
- Parse errors: `Failed to parse webhook payload: ...`
|
||||
- Unsupported events: `Ignoring webhook for unsupported event: ...`
|
||||
|
||||
### Security Considerations
|
||||
|
||||
**Optional Authentication**: When `WEBHOOK_SECRET` is configured, webhook requests must include `Authorization: Bearer <WEBHOOK_SECRET>` header. The server validates this before processing to prevent unauthorized document queueing. For local development, authentication can be disabled by leaving `WEBHOOK_SECRET` unset.
|
||||
|
||||
**Payload Validation**: Webhook payloads are parsed and validated against expected schemas. Malformed payloads are rejected with 400 Bad Request responses.
|
||||
|
||||
**No Scope Enforcement**: Unlike MCP tools, webhooks do not enforce progressive consent or check if users have enabled semantic search. Webhooks queue all document changes—administrators control which events trigger webhooks via Nextcloud filters. This keeps the webhook endpoint simple and stateless.
|
||||
|
||||
### Testing Strategy
|
||||
|
||||
**Unit Tests**: Test webhook handler logic, event parsing, and authentication validation using mocked payloads:
|
||||
|
||||
```python
|
||||
async def test_webhook_endpoint_parses_note_created_event():
|
||||
"""Unit test: webhook endpoint extracts DocumentTask from note created event."""
|
||||
payload = {
|
||||
"user": {"uid": "alice"},
|
||||
"time": 1704067200,
|
||||
"event": {
|
||||
"class": "OCP\\Files\\Events\\Node\\NodeCreatedEvent",
|
||||
"node": {"id": "123", "path": "/alice/files/test.md"}
|
||||
}
|
||||
}
|
||||
# Mock send_stream and verify DocumentTask is queued
|
||||
...
|
||||
```
|
||||
|
||||
**Integration Tests (Without Real Webhooks)**: Since Nextcloud processes webhooks asynchronously via background jobs, integration tests should NOT rely on triggering real Nextcloud events and waiting for webhook delivery. Instead, tests should:
|
||||
|
||||
1. **Mock webhook delivery**: POST webhook payloads directly to the `/webhooks/nextcloud` endpoint
|
||||
2. **Verify processing**: Check that documents are queued and eventually appear in Qdrant
|
||||
3. **Test authentication**: Verify requests without valid auth header are rejected (when `WEBHOOK_SECRET` is set)
|
||||
|
||||
```python
|
||||
async def test_webhook_integration_mocked_delivery():
|
||||
"""Integration test: webhook handler queues document for processing."""
|
||||
# POST webhook payload directly to endpoint (bypass Nextcloud)
|
||||
response = await client.post("/webhooks/nextcloud", json=note_created_payload)
|
||||
assert response.status_code == 200
|
||||
|
||||
# Wait for processor to handle document
|
||||
await asyncio.sleep(2)
|
||||
|
||||
# Verify document appears in Qdrant
|
||||
results = await qdrant_client.scroll(...)
|
||||
assert len(results[0]) > 0
|
||||
```
|
||||
|
||||
**Manual Testing (Real Webhooks)**: For end-to-end validation with real Nextcloud webhook delivery:
|
||||
|
||||
1. Register webhook via OCS API or `NextcloudClient.register_webhook()` helper
|
||||
2. Configure webhook background job workers for low-latency delivery
|
||||
3. Trigger Nextcloud events (create note, add calendar event)
|
||||
4. Monitor MCP server logs for webhook delivery
|
||||
5. Verify documents appear in Qdrant after background job processing
|
||||
|
||||
**Failure Mode Tests**:
|
||||
- Invalid authentication: Verify 401 response when auth header is missing/incorrect
|
||||
- Malformed payload: Verify 400 response for invalid JSON or missing required fields
|
||||
- Unsupported event types: Verify graceful handling (ignored, not error)
|
||||
- Queue full: Verify 500 response with appropriate error message
|
||||
|
||||
### Future Enhancements
|
||||
|
||||
**Batch Processing**: Group multiple webhook notifications within a short time window (e.g., 5 seconds) into a single batch before queueing. This reduces processor overhead during bulk operations like importing calendars.
|
||||
|
||||
**Webhook Payload Optimization**: For large documents, Nextcloud could be configured to send minimal metadata in webhooks (just user_id, doc_id, doc_type), with processors fetching full content lazily. This reduces webhook payload size and network bandwidth.
|
||||
|
||||
**Deduplication Window**: Track recently processed documents (last 5 minutes) to avoid redundant work when webhooks and scanner both detect the same change. The processor can check a simple in-memory cache before fetching document content.
|
||||
|
||||
## References
|
||||
|
||||
- ADR-007: Background Vector Database Synchronization (polling architecture)
|
||||
- Nextcloud Documentation: `~/Software/documentation/admin_manual/webhook_listeners/index.rst`
|
||||
- Nextcloud OCS API: Webhook registration endpoint
|
||||
- Current scanner implementation: `nextcloud_mcp_server/vector/scanner.py:37`
|
||||
@@ -16,8 +16,7 @@ The Nextcloud MCP Server includes comprehensive observability features for produ
|
||||
export METRICS_ENABLED=true
|
||||
export METRICS_PORT=9090
|
||||
|
||||
# Enable tracing (optional)
|
||||
export OTEL_ENABLED=true
|
||||
# Enable tracing (optional - tracing is enabled when OTEL_EXPORTER_OTLP_ENDPOINT is set)
|
||||
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
|
||||
|
||||
# Start the server
|
||||
@@ -46,8 +45,7 @@ helm install nextcloud-mcp charts/nextcloud-mcp-server \
|
||||
|----------|---------|-------------|
|
||||
| `METRICS_ENABLED` | `true` | Enable Prometheus metrics |
|
||||
| `METRICS_PORT` | `9090` | Port for metrics endpoint |
|
||||
| `OTEL_ENABLED` | `false` | Enable OpenTelemetry tracing |
|
||||
| `OTEL_EXPORTER_OTLP_ENDPOINT` | - | OTLP gRPC endpoint (e.g., `http://otel-collector:4317`) |
|
||||
| `OTEL_EXPORTER_OTLP_ENDPOINT` | - | OTLP gRPC endpoint (e.g., `http://otel-collector:4317`). Tracing is enabled when this is set. |
|
||||
| `OTEL_SERVICE_NAME` | `nextcloud-mcp-server` | Service name in traces |
|
||||
| `OTEL_TRACES_SAMPLER` | `always_on` | Trace sampling strategy |
|
||||
| `OTEL_TRACES_SAMPLER_ARG` | `1.0` | Sampling rate (0.0-1.0) |
|
||||
|
||||
@@ -5,9 +5,12 @@ from contextlib import AsyncExitStack, asynccontextmanager
|
||||
from dataclasses import dataclass
|
||||
from typing import TYPE_CHECKING, Optional
|
||||
|
||||
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from nextcloud_mcp_server.auth.refresh_token_storage import RefreshTokenStorage
|
||||
|
||||
|
||||
import anyio
|
||||
import click
|
||||
import httpx
|
||||
@@ -58,6 +61,7 @@ from nextcloud_mcp_server.server.oauth_tools import register_oauth_tools
|
||||
from nextcloud_mcp_server.vector import processor_task, scanner_task
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
HTTPXClientInstrumentor().instrument()
|
||||
|
||||
|
||||
def initialize_document_processors():
|
||||
@@ -791,17 +795,20 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
|
||||
)
|
||||
|
||||
# Setup OpenTelemetry tracing (optional)
|
||||
if settings.tracing_enabled:
|
||||
if settings.otel_exporter_otlp_endpoint:
|
||||
setup_tracing(
|
||||
service_name=settings.otel_service_name,
|
||||
otlp_endpoint=settings.otel_exporter_otlp_endpoint,
|
||||
otlp_verify_ssl=settings.otel_exporter_verify_ssl,
|
||||
sampling_rate=settings.otel_traces_sampler_arg,
|
||||
)
|
||||
logger.info(
|
||||
f"OpenTelemetry tracing enabled (endpoint: {settings.otel_exporter_otlp_endpoint})"
|
||||
)
|
||||
else:
|
||||
logger.info("OpenTelemetry tracing disabled (set OTEL_ENABLED=true to enable)")
|
||||
logger.info(
|
||||
"OpenTelemetry tracing disabled (set OTEL_EXPORTER_OTLP_ENDPOINT to enable)"
|
||||
)
|
||||
|
||||
# Determine authentication mode
|
||||
oauth_enabled = is_oauth_mode()
|
||||
@@ -1391,9 +1398,12 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
|
||||
)
|
||||
logger.info(f"🔑 /mcp request with Authorization: {token_preview}")
|
||||
else:
|
||||
logger.warning(
|
||||
f"⚠️ /mcp request WITHOUT Authorization header from {request.client}"
|
||||
)
|
||||
# Only warn about missing Authorization in OAuth mode
|
||||
# In BasicAuth mode, /mcp requests without Authorization are expected
|
||||
if oauth_enabled:
|
||||
logger.warning(
|
||||
f"⚠️ /mcp request WITHOUT Authorization header from {request.client}"
|
||||
)
|
||||
|
||||
# Log client capabilities on initialize request
|
||||
if request.method == "POST":
|
||||
@@ -1454,7 +1464,7 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
|
||||
)
|
||||
|
||||
# Add observability middleware (metrics + tracing)
|
||||
if settings.metrics_enabled or settings.tracing_enabled:
|
||||
if settings.metrics_enabled or settings.otel_exporter_otlp_endpoint:
|
||||
app.add_middleware(ObservabilityMiddleware)
|
||||
logger.info("Observability middleware enabled (metrics and/or tracing)")
|
||||
|
||||
|
||||
@@ -231,17 +231,21 @@ class UnifiedTokenVerifier(TokenVerifier):
|
||||
token,
|
||||
signing_key.key,
|
||||
algorithms=["RS256"],
|
||||
issuer=self.settings.oidc_issuer
|
||||
if hasattr(self.settings, "oidc_issuer")
|
||||
else None,
|
||||
issuer=(
|
||||
self.settings.oidc_issuer
|
||||
if hasattr(self.settings, "oidc_issuer")
|
||||
else None
|
||||
),
|
||||
options={
|
||||
"verify_signature": True,
|
||||
"verify_exp": True,
|
||||
"verify_iat": True,
|
||||
"verify_iss": True
|
||||
if hasattr(self.settings, "oidc_issuer")
|
||||
and self.settings.oidc_issuer
|
||||
else False,
|
||||
"verify_iss": (
|
||||
True
|
||||
if hasattr(self.settings, "oidc_issuer")
|
||||
and self.settings.oidc_issuer
|
||||
else False
|
||||
),
|
||||
"verify_aud": False, # We handle audience validation separately
|
||||
},
|
||||
)
|
||||
|
||||
@@ -9,6 +9,7 @@ from httpx import (
|
||||
BasicAuth,
|
||||
Request,
|
||||
Response,
|
||||
Timeout,
|
||||
)
|
||||
|
||||
from ..controllers.notes_search import NotesSearchController
|
||||
@@ -66,6 +67,7 @@ class NextcloudClient:
|
||||
auth=auth,
|
||||
transport=AsyncDisableCookieTransport(AsyncHTTPTransport()),
|
||||
event_hooks={"request": [log_request], "response": [log_response]},
|
||||
timeout=Timeout(timeout=30, connect=5),
|
||||
)
|
||||
|
||||
# Initialize app clients
|
||||
|
||||
@@ -18,18 +18,57 @@ class NotesClient(BaseNextcloudClient):
|
||||
response = await self._make_request("GET", "/apps/notes/api/v1/settings")
|
||||
return response.json()
|
||||
|
||||
async def get_all_notes(self) -> AsyncIterator[Dict[str, Any]]:
|
||||
"""Get all notes, yielding them one at a time."""
|
||||
async def get_all_notes(
|
||||
self, prune_before: Optional[int] = None
|
||||
) -> AsyncIterator[Dict[str, Any]]:
|
||||
"""Get all notes, yielding them one at a time.
|
||||
|
||||
The Notes API returns changed notes with full data in chunks, and ALL note IDs
|
||||
(with only 'id' field) in the last chunk for deletion detection. This causes
|
||||
duplicates which we handle by tracking seen IDs (first occurrence with full
|
||||
data is kept, later pruned duplicates are skipped).
|
||||
|
||||
Args:
|
||||
prune_before: Optional Unix timestamp. Notes unchanged since this time
|
||||
are pruned (only 'id' field returned in last chunk).
|
||||
Reduces data transfer for large note collections.
|
||||
|
||||
Yields:
|
||||
Note dictionaries with full data (deduplicated).
|
||||
"""
|
||||
cursor = ""
|
||||
seen_ids: set[int] = set()
|
||||
|
||||
while True:
|
||||
params: Dict[str, Any] = {"chunkSize": 10}
|
||||
if cursor:
|
||||
params["chunkCursor"] = cursor
|
||||
if prune_before is not None:
|
||||
params["pruneBefore"] = prune_before
|
||||
|
||||
response = await self._make_request(
|
||||
"GET",
|
||||
"/apps/notes/api/v1/notes",
|
||||
params={"chunkSize": 10, "chunkCursor": cursor},
|
||||
params=params,
|
||||
)
|
||||
for note in response.json():
|
||||
response_data = response.json()
|
||||
|
||||
for note in response_data:
|
||||
note_id = note.get("id")
|
||||
if note_id is None:
|
||||
logger.warning(f"Skipping note without ID: {note}")
|
||||
continue
|
||||
|
||||
# Skip duplicates (API returns all IDs in last chunk for deletion detection)
|
||||
if note_id in seen_ids:
|
||||
logger.debug(
|
||||
f"Skipping duplicate note {note_id} (pruned version in last chunk)"
|
||||
)
|
||||
continue
|
||||
|
||||
seen_ids.add(note_id)
|
||||
yield note
|
||||
|
||||
if "X-Notes-Chunk-Cursor" not in response.headers:
|
||||
break
|
||||
cursor = response.headers["X-Notes-Chunk-Cursor"]
|
||||
|
||||
@@ -181,8 +181,8 @@ class Settings:
|
||||
# Observability settings
|
||||
metrics_enabled: bool = True
|
||||
metrics_port: int = 9090
|
||||
tracing_enabled: bool = False
|
||||
otel_exporter_otlp_endpoint: Optional[str] = None
|
||||
otel_exporter_verify_ssl: bool = False
|
||||
otel_service_name: str = "nextcloud-mcp-server"
|
||||
otel_traces_sampler: str = "always_on"
|
||||
otel_traces_sampler_arg: float = 1.0
|
||||
@@ -334,8 +334,9 @@ def get_settings() -> Settings:
|
||||
# Observability settings
|
||||
metrics_enabled=os.getenv("METRICS_ENABLED", "true").lower() == "true",
|
||||
metrics_port=int(os.getenv("METRICS_PORT", "9090")),
|
||||
tracing_enabled=os.getenv("OTEL_ENABLED", "false").lower() == "true",
|
||||
otel_exporter_otlp_endpoint=os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT"),
|
||||
otel_exporter_verify_ssl=os.getenv("OTEL_EXPORTER_VERIFY_SSL", "false").lower()
|
||||
== "true",
|
||||
otel_service_name=os.getenv("OTEL_SERVICE_NAME", "nextcloud-mcp-server"),
|
||||
otel_traces_sampler=os.getenv("OTEL_TRACES_SAMPLER", "always_on"),
|
||||
otel_traces_sampler_arg=float(os.getenv("OTEL_TRACES_SAMPLER_ARG", "1.0")),
|
||||
|
||||
@@ -66,22 +66,40 @@ class ObservabilityMiddleware(BaseHTTPMiddleware):
|
||||
# Record start time
|
||||
start_time = time.time()
|
||||
|
||||
try:
|
||||
# Create span for request (OpenTelemetry auto-instrumentation will create parent span)
|
||||
with trace_operation(
|
||||
f"HTTP {method} {endpoint}",
|
||||
attributes={
|
||||
"http.method": method,
|
||||
"http.path": path,
|
||||
"http.scheme": request.url.scheme,
|
||||
"http.host": request.url.hostname,
|
||||
},
|
||||
):
|
||||
# Process request
|
||||
response = await call_next(request)
|
||||
# Skip tracing for health/metrics endpoints to reduce noise
|
||||
should_trace = not (path.startswith("/health/") or path == "/metrics")
|
||||
|
||||
# Add response status to span
|
||||
add_span_attribute("http.status_code", response.status_code)
|
||||
try:
|
||||
if should_trace:
|
||||
# Create span for request (OpenTelemetry auto-instrumentation will create parent span)
|
||||
with trace_operation(
|
||||
f"HTTP {method} {endpoint}",
|
||||
attributes={
|
||||
"http.method": method,
|
||||
"http.path": path,
|
||||
"http.scheme": request.url.scheme,
|
||||
"http.host": request.url.hostname,
|
||||
},
|
||||
):
|
||||
# Process request
|
||||
response = await call_next(request)
|
||||
|
||||
# Add response status to span
|
||||
add_span_attribute("http.status_code", response.status_code)
|
||||
|
||||
# Record metrics
|
||||
duration = time.time() - start_time
|
||||
self._record_request_metrics(
|
||||
method=method,
|
||||
endpoint=endpoint,
|
||||
status_code=response.status_code,
|
||||
duration=duration,
|
||||
)
|
||||
|
||||
return response
|
||||
else:
|
||||
# No tracing for health/metrics endpoints, but still record metrics
|
||||
response = await call_next(request)
|
||||
|
||||
# Record metrics
|
||||
duration = time.time() - start_time
|
||||
|
||||
@@ -13,9 +13,9 @@ import logging
|
||||
from contextlib import contextmanager
|
||||
from typing import Any
|
||||
|
||||
from importlib_metadata import version
|
||||
from opentelemetry import trace
|
||||
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
|
||||
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
|
||||
from opentelemetry.instrumentation.logging import LoggingInstrumentor
|
||||
from opentelemetry.sdk.resources import Resource
|
||||
from opentelemetry.sdk.trace import TracerProvider
|
||||
@@ -27,10 +27,13 @@ logger = logging.getLogger(__name__)
|
||||
# Global tracer instance (initialized in setup_tracing)
|
||||
_tracer: Tracer | None = None
|
||||
|
||||
# Auto-instrument httpx for Nextcloud API calls
|
||||
|
||||
|
||||
def setup_tracing(
|
||||
service_name: str = "nextcloud-mcp-server",
|
||||
otlp_endpoint: str | None = None,
|
||||
otlp_verify_ssl: bool = False,
|
||||
sampling_rate: float = 1.0,
|
||||
) -> Tracer:
|
||||
"""
|
||||
@@ -40,6 +43,8 @@ def setup_tracing(
|
||||
service_name: Service name for traces (default: "nextcloud-mcp-server")
|
||||
otlp_endpoint: OTLP gRPC endpoint (e.g., "http://otel-collector:4317")
|
||||
If None, tracing is initialized but no exporter is configured
|
||||
otlp_verify_ssl: Enable TLS verification for otlp_endpoint. If True,
|
||||
`insecure` will eval to False
|
||||
sampling_rate: Sampling rate (0.0-1.0). Default 1.0 (100% sampling)
|
||||
|
||||
Returns:
|
||||
@@ -51,7 +56,7 @@ def setup_tracing(
|
||||
resource = Resource.create(
|
||||
{
|
||||
"service.name": service_name,
|
||||
"service.version": "0.27.2", # TODO: Extract from pyproject.toml
|
||||
"service.version": version(__package__.split(".")[0]),
|
||||
}
|
||||
)
|
||||
|
||||
@@ -61,7 +66,9 @@ def setup_tracing(
|
||||
# Configure OTLP exporter if endpoint is provided
|
||||
if otlp_endpoint:
|
||||
try:
|
||||
otlp_exporter = OTLPSpanExporter(endpoint=otlp_endpoint, insecure=True)
|
||||
otlp_exporter = OTLPSpanExporter(
|
||||
endpoint=otlp_endpoint, insecure=not otlp_verify_ssl
|
||||
)
|
||||
span_processor = BatchSpanProcessor(otlp_exporter)
|
||||
provider.add_span_processor(span_processor)
|
||||
logger.info(
|
||||
@@ -79,9 +86,6 @@ def setup_tracing(
|
||||
# Set global tracer provider
|
||||
trace.set_tracer_provider(provider)
|
||||
|
||||
# Auto-instrument httpx for Nextcloud API calls
|
||||
HTTPXClientInstrumentor().instrument()
|
||||
|
||||
# Auto-instrument logging to inject trace context
|
||||
LoggingInstrumentor().instrument(set_logging_format=True)
|
||||
|
||||
|
||||
@@ -15,6 +15,7 @@ from qdrant_client.models import FieldCondition, Filter, MatchValue, PointStruct
|
||||
from nextcloud_mcp_server.client import NextcloudClient
|
||||
from nextcloud_mcp_server.config import get_settings
|
||||
from nextcloud_mcp_server.embedding import get_embedding_service
|
||||
from nextcloud_mcp_server.observability.tracing import trace_operation
|
||||
from nextcloud_mcp_server.vector.document_chunker import DocumentChunker
|
||||
from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
|
||||
from nextcloud_mcp_server.vector.scanner import DocumentTask
|
||||
@@ -94,58 +95,68 @@ async def process_document(doc_task: DocumentTask, nc_client: NextcloudClient):
|
||||
f"for {doc_task.user_id} ({doc_task.operation})"
|
||||
)
|
||||
|
||||
qdrant_client = await get_qdrant_client()
|
||||
settings = get_settings()
|
||||
with trace_operation(
|
||||
"vector_sync.process_document",
|
||||
attributes={
|
||||
"vector_sync.operation": "process",
|
||||
"vector_sync.user_id": doc_task.user_id,
|
||||
"vector_sync.doc_id": doc_task.doc_id,
|
||||
"vector_sync.doc_type": doc_task.doc_type,
|
||||
"vector_sync.doc_operation": doc_task.operation,
|
||||
},
|
||||
):
|
||||
qdrant_client = await get_qdrant_client()
|
||||
settings = get_settings()
|
||||
|
||||
# Handle deletion
|
||||
if doc_task.operation == "delete":
|
||||
await qdrant_client.delete(
|
||||
collection_name=settings.get_collection_name(),
|
||||
points_selector=Filter(
|
||||
must=[
|
||||
FieldCondition(
|
||||
key="user_id",
|
||||
match=MatchValue(value=doc_task.user_id),
|
||||
),
|
||||
FieldCondition(
|
||||
key="doc_id",
|
||||
match=MatchValue(value=doc_task.doc_id),
|
||||
),
|
||||
FieldCondition(
|
||||
key="doc_type",
|
||||
match=MatchValue(value=doc_task.doc_type),
|
||||
),
|
||||
]
|
||||
),
|
||||
)
|
||||
logger.info(
|
||||
f"Deleted {doc_task.doc_type}_{doc_task.doc_id} for {doc_task.user_id}"
|
||||
)
|
||||
return
|
||||
# Handle deletion
|
||||
if doc_task.operation == "delete":
|
||||
await qdrant_client.delete(
|
||||
collection_name=settings.get_collection_name(),
|
||||
points_selector=Filter(
|
||||
must=[
|
||||
FieldCondition(
|
||||
key="user_id",
|
||||
match=MatchValue(value=doc_task.user_id),
|
||||
),
|
||||
FieldCondition(
|
||||
key="doc_id",
|
||||
match=MatchValue(value=doc_task.doc_id),
|
||||
),
|
||||
FieldCondition(
|
||||
key="doc_type",
|
||||
match=MatchValue(value=doc_task.doc_type),
|
||||
),
|
||||
]
|
||||
),
|
||||
)
|
||||
logger.info(
|
||||
f"Deleted {doc_task.doc_type}_{doc_task.doc_id} for {doc_task.user_id}"
|
||||
)
|
||||
return
|
||||
|
||||
# Handle indexing with retry
|
||||
max_retries = 3
|
||||
retry_delay = 1.0
|
||||
# Handle indexing with retry
|
||||
max_retries = 3
|
||||
retry_delay = 1.0
|
||||
|
||||
for attempt in range(max_retries):
|
||||
try:
|
||||
await _index_document(doc_task, nc_client, qdrant_client)
|
||||
return # Success
|
||||
for attempt in range(max_retries):
|
||||
try:
|
||||
await _index_document(doc_task, nc_client, qdrant_client)
|
||||
return # Success
|
||||
|
||||
except (HTTPStatusError, Exception) as e:
|
||||
if attempt < max_retries - 1:
|
||||
logger.warning(
|
||||
f"Retry {attempt + 1}/{max_retries} for "
|
||||
f"{doc_task.doc_type}_{doc_task.doc_id}: {e}"
|
||||
)
|
||||
await anyio.sleep(retry_delay)
|
||||
retry_delay *= 2 # Exponential backoff
|
||||
else:
|
||||
logger.error(
|
||||
f"Failed to index {doc_task.doc_type}_{doc_task.doc_id} "
|
||||
f"after {max_retries} retries: {e}"
|
||||
)
|
||||
raise
|
||||
except (HTTPStatusError, Exception) as e:
|
||||
if attempt < max_retries - 1:
|
||||
logger.warning(
|
||||
f"Retry {attempt + 1}/{max_retries} for "
|
||||
f"{doc_task.doc_type}_{doc_task.doc_id}: {e}"
|
||||
)
|
||||
await anyio.sleep(retry_delay)
|
||||
retry_delay *= 2 # Exponential backoff
|
||||
else:
|
||||
logger.error(
|
||||
f"Failed to index {doc_task.doc_type}_{doc_task.doc_id} "
|
||||
f"after {max_retries} retries: {e}"
|
||||
)
|
||||
raise
|
||||
|
||||
|
||||
async def _index_document(
|
||||
|
||||
@@ -13,6 +13,7 @@ from qdrant_client.models import FieldCondition, Filter, MatchValue
|
||||
|
||||
from nextcloud_mcp_server.client import NextcloudClient
|
||||
from nextcloud_mcp_server.config import get_settings
|
||||
from nextcloud_mcp_server.observability.tracing import trace_operation
|
||||
from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
@@ -34,6 +35,57 @@ class DocumentTask:
|
||||
_potentially_deleted: dict[tuple[str, str], float] = {}
|
||||
|
||||
|
||||
async def get_last_indexed_timestamp(user_id: str) -> int | None:
|
||||
"""Get the most recent indexed_at timestamp for user's notes in Qdrant.
|
||||
|
||||
This timestamp can be used as pruneBefore parameter to optimize data transfer
|
||||
when fetching notes - only notes modified after this timestamp will be sent
|
||||
with full data.
|
||||
|
||||
Args:
|
||||
user_id: User to query
|
||||
|
||||
Returns:
|
||||
Unix timestamp of most recently indexed note, or None if no notes indexed yet
|
||||
"""
|
||||
try:
|
||||
qdrant_client = await get_qdrant_client()
|
||||
|
||||
# Query for user's notes, ordered by indexed_at descending, limit 1
|
||||
scroll_result = await qdrant_client.scroll(
|
||||
collection_name=get_settings().get_collection_name(),
|
||||
scroll_filter=Filter(
|
||||
must=[
|
||||
FieldCondition(key="user_id", match=MatchValue(value=user_id)),
|
||||
FieldCondition(key="doc_type", match=MatchValue(value="note")),
|
||||
]
|
||||
),
|
||||
with_payload=["indexed_at"],
|
||||
with_vectors=False,
|
||||
limit=10000, # Get all to find max
|
||||
)
|
||||
|
||||
# Find max indexed_at across all results
|
||||
num_points = len(scroll_result[0]) if scroll_result[0] else 0
|
||||
logger.info(f"Found {num_points} indexed notes in Qdrant for user {user_id}")
|
||||
|
||||
if scroll_result[0]:
|
||||
timestamps = [
|
||||
point.payload.get("indexed_at", 0) for point in scroll_result[0]
|
||||
]
|
||||
max_timestamp = max(timestamps)
|
||||
logger.info(
|
||||
f"Max indexed_at: {max_timestamp}, timestamps sample: {timestamps[:3]}"
|
||||
)
|
||||
return int(max_timestamp) if max_timestamp > 0 else None
|
||||
|
||||
logger.info(f"No indexed notes found for user {user_id}")
|
||||
return None
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to get last indexed timestamp: {e}", exc_info=True)
|
||||
return None
|
||||
|
||||
|
||||
async def scanner_task(
|
||||
send_stream: MemoryObjectSendStream[DocumentTask],
|
||||
shutdown_event: anyio.Event,
|
||||
@@ -96,138 +148,155 @@ async def scan_user_documents(
|
||||
nc_client: Authenticated Nextcloud client
|
||||
initial_sync: If True, send all documents (first-time sync)
|
||||
"""
|
||||
logger.debug(f"Scanning documents for user: {user_id}")
|
||||
import random
|
||||
|
||||
# Fetch all notes from Nextcloud
|
||||
notes = [note async for note in nc_client.notes.get_all_notes()]
|
||||
logger.debug(f"Found {len(notes)} notes for {user_id}")
|
||||
|
||||
if initial_sync:
|
||||
# Send everything on first sync
|
||||
for note in notes:
|
||||
# Handle missing 'modified' field (use 0 as fallback)
|
||||
modified_at = note.get("modified", 0)
|
||||
if modified_at == 0:
|
||||
logger.warning(
|
||||
f"Note {note['id']} missing 'modified' field, using 0 as fallback"
|
||||
)
|
||||
|
||||
await send_stream.send(
|
||||
DocumentTask(
|
||||
user_id=user_id,
|
||||
doc_id=str(note["id"]),
|
||||
doc_type="note",
|
||||
operation="index",
|
||||
modified_at=modified_at,
|
||||
)
|
||||
)
|
||||
logger.info(f"Sent {len(notes)} documents for initial sync: {user_id}")
|
||||
return
|
||||
|
||||
# Get indexed state from Qdrant
|
||||
qdrant_client = await get_qdrant_client()
|
||||
scroll_result = await qdrant_client.scroll(
|
||||
collection_name=get_settings().get_collection_name(),
|
||||
scroll_filter=Filter(
|
||||
must=[
|
||||
FieldCondition(key="user_id", match=MatchValue(value=user_id)),
|
||||
FieldCondition(key="doc_type", match=MatchValue(value="note")),
|
||||
]
|
||||
),
|
||||
with_payload=["doc_id", "indexed_at"],
|
||||
with_vectors=False,
|
||||
limit=10000,
|
||||
scan_id = random.randint(1000, 9999)
|
||||
logger.info(
|
||||
f"[SCAN-{scan_id}] Starting scan for user: {user_id}, initial_sync={initial_sync}"
|
||||
)
|
||||
|
||||
indexed_docs = {
|
||||
point.payload["doc_id"]: point.payload["indexed_at"]
|
||||
for point in scroll_result[0]
|
||||
}
|
||||
|
||||
logger.debug(f"Found {len(indexed_docs)} indexed documents in Qdrant")
|
||||
|
||||
# Compare and queue changes
|
||||
queued = 0
|
||||
nextcloud_doc_ids = {str(note["id"]) for note in notes}
|
||||
|
||||
for note in notes:
|
||||
doc_id = str(note["id"])
|
||||
indexed_at = indexed_docs.get(doc_id)
|
||||
|
||||
# Handle missing 'modified' field (use 0 as fallback)
|
||||
modified_at = note.get("modified", 0)
|
||||
if modified_at == 0:
|
||||
logger.warning(
|
||||
f"Note {doc_id} missing 'modified' field, using 0 as fallback"
|
||||
with trace_operation(
|
||||
"vector_sync.scan_user_documents",
|
||||
attributes={
|
||||
"vector_sync.operation": "scan",
|
||||
"vector_sync.user_id": user_id,
|
||||
"vector_sync.initial_sync": initial_sync,
|
||||
"vector_sync.scan_id": scan_id,
|
||||
},
|
||||
):
|
||||
# Calculate prune timestamp for optimized data transfer
|
||||
# Only notes modified after this will be sent with full data
|
||||
prune_before = (
|
||||
None if initial_sync else await get_last_indexed_timestamp(user_id)
|
||||
)
|
||||
if prune_before:
|
||||
logger.info(
|
||||
f"[SCAN-{scan_id}] Using pruneBefore={prune_before} to optimize data transfer"
|
||||
)
|
||||
|
||||
# If document reappeared, remove from potentially_deleted
|
||||
doc_key = (user_id, doc_id)
|
||||
if doc_key in _potentially_deleted:
|
||||
logger.debug(
|
||||
f"Document {doc_id} reappeared, removing from deletion grace period"
|
||||
)
|
||||
del _potentially_deleted[doc_key]
|
||||
# Fetch all notes from Nextcloud
|
||||
notes = [
|
||||
note
|
||||
async for note in nc_client.notes.get_all_notes(prune_before=prune_before)
|
||||
]
|
||||
logger.info(f"[SCAN-{scan_id}] Found {len(notes)} notes for {user_id}")
|
||||
|
||||
# Send if never indexed or modified since last index
|
||||
if indexed_at is None or modified_at > indexed_at:
|
||||
await send_stream.send(
|
||||
DocumentTask(
|
||||
user_id=user_id,
|
||||
doc_id=doc_id,
|
||||
doc_type="note",
|
||||
operation="index",
|
||||
modified_at=modified_at,
|
||||
if initial_sync:
|
||||
# Send everything on first sync
|
||||
for note in notes:
|
||||
modified_at = note.get("modified", 0)
|
||||
await send_stream.send(
|
||||
DocumentTask(
|
||||
user_id=user_id,
|
||||
doc_id=str(note["id"]),
|
||||
doc_type="note",
|
||||
operation="index",
|
||||
modified_at=modified_at,
|
||||
)
|
||||
)
|
||||
)
|
||||
queued += 1
|
||||
logger.info(f"Sent {len(notes)} documents for initial sync: {user_id}")
|
||||
return
|
||||
|
||||
# Check for deleted documents (in Qdrant but not in Nextcloud)
|
||||
# Use grace period: only delete after 2 consecutive scans confirm absence
|
||||
settings = get_settings()
|
||||
grace_period = settings.vector_sync_scan_interval * 1.5 # Allow 1.5 scan intervals
|
||||
current_time = time.time()
|
||||
# Get indexed state from Qdrant
|
||||
qdrant_client = await get_qdrant_client()
|
||||
scroll_result = await qdrant_client.scroll(
|
||||
collection_name=get_settings().get_collection_name(),
|
||||
scroll_filter=Filter(
|
||||
must=[
|
||||
FieldCondition(key="user_id", match=MatchValue(value=user_id)),
|
||||
FieldCondition(key="doc_type", match=MatchValue(value="note")),
|
||||
]
|
||||
),
|
||||
with_payload=["doc_id", "indexed_at"],
|
||||
with_vectors=False,
|
||||
limit=10000,
|
||||
)
|
||||
|
||||
for doc_id in indexed_docs:
|
||||
if doc_id not in nextcloud_doc_ids:
|
||||
indexed_docs = {
|
||||
point.payload["doc_id"]: point.payload["indexed_at"]
|
||||
for point in scroll_result[0]
|
||||
}
|
||||
|
||||
logger.debug(f"Found {len(indexed_docs)} indexed documents in Qdrant")
|
||||
|
||||
# Compare and queue changes
|
||||
queued = 0
|
||||
nextcloud_doc_ids = {str(note["id"]) for note in notes}
|
||||
|
||||
for note in notes:
|
||||
doc_id = str(note["id"])
|
||||
indexed_at = indexed_docs.get(doc_id)
|
||||
modified_at = note.get("modified", 0)
|
||||
|
||||
# If document reappeared, remove from potentially_deleted
|
||||
doc_key = (user_id, doc_id)
|
||||
|
||||
if doc_key in _potentially_deleted:
|
||||
# Already marked as potentially deleted, check if grace period elapsed
|
||||
first_missing_time = _potentially_deleted[doc_key]
|
||||
time_missing = current_time - first_missing_time
|
||||
|
||||
if time_missing >= grace_period:
|
||||
# Grace period elapsed, send for deletion
|
||||
logger.info(
|
||||
f"Document {doc_id} missing for {time_missing:.1f}s "
|
||||
f"(>{grace_period:.1f}s grace period), sending deletion"
|
||||
)
|
||||
await send_stream.send(
|
||||
DocumentTask(
|
||||
user_id=user_id,
|
||||
doc_id=doc_id,
|
||||
doc_type="note",
|
||||
operation="delete",
|
||||
modified_at=0,
|
||||
)
|
||||
)
|
||||
queued += 1
|
||||
# Remove from tracking after sending deletion
|
||||
del _potentially_deleted[doc_key]
|
||||
else:
|
||||
logger.debug(
|
||||
f"Document {doc_id} still missing "
|
||||
f"({time_missing:.1f}s/{grace_period:.1f}s grace period)"
|
||||
)
|
||||
else:
|
||||
# First time missing, add to grace period tracking
|
||||
logger.debug(
|
||||
f"Document {doc_id} missing for first time, starting grace period"
|
||||
f"Document {doc_id} reappeared, removing from deletion grace period"
|
||||
)
|
||||
_potentially_deleted[doc_key] = current_time
|
||||
del _potentially_deleted[doc_key]
|
||||
|
||||
if queued > 0:
|
||||
logger.info(f"Sent {queued} documents for incremental sync: {user_id}")
|
||||
else:
|
||||
logger.debug(f"No changes detected for {user_id}")
|
||||
# Send if never indexed or modified since last index
|
||||
if indexed_at is None or modified_at > indexed_at:
|
||||
await send_stream.send(
|
||||
DocumentTask(
|
||||
user_id=user_id,
|
||||
doc_id=doc_id,
|
||||
doc_type="note",
|
||||
operation="index",
|
||||
modified_at=modified_at,
|
||||
)
|
||||
)
|
||||
queued += 1
|
||||
|
||||
# Check for deleted documents (in Qdrant but not in Nextcloud)
|
||||
# Use grace period: only delete after 2 consecutive scans confirm absence
|
||||
settings = get_settings()
|
||||
grace_period = (
|
||||
settings.vector_sync_scan_interval * 1.5
|
||||
) # Allow 1.5 scan intervals
|
||||
current_time = time.time()
|
||||
|
||||
for doc_id in indexed_docs:
|
||||
if doc_id not in nextcloud_doc_ids:
|
||||
doc_key = (user_id, doc_id)
|
||||
|
||||
if doc_key in _potentially_deleted:
|
||||
# Already marked as potentially deleted, check if grace period elapsed
|
||||
first_missing_time = _potentially_deleted[doc_key]
|
||||
time_missing = current_time - first_missing_time
|
||||
|
||||
if time_missing >= grace_period:
|
||||
# Grace period elapsed, send for deletion
|
||||
logger.info(
|
||||
f"Document {doc_id} missing for {time_missing:.1f}s "
|
||||
f"(>{grace_period:.1f}s grace period), sending deletion"
|
||||
)
|
||||
await send_stream.send(
|
||||
DocumentTask(
|
||||
user_id=user_id,
|
||||
doc_id=doc_id,
|
||||
doc_type="note",
|
||||
operation="delete",
|
||||
modified_at=0,
|
||||
)
|
||||
)
|
||||
queued += 1
|
||||
# Remove from tracking after sending deletion
|
||||
del _potentially_deleted[doc_key]
|
||||
else:
|
||||
logger.debug(
|
||||
f"Document {doc_id} still missing "
|
||||
f"({time_missing:.1f}s/{grace_period:.1f}s grace period)"
|
||||
)
|
||||
else:
|
||||
# First time missing, add to grace period tracking
|
||||
logger.debug(
|
||||
f"Document {doc_id} missing for first time, starting grace period"
|
||||
)
|
||||
_potentially_deleted[doc_key] = current_time
|
||||
|
||||
if queued > 0:
|
||||
logger.info(f"Sent {queued} documents for incremental sync: {user_id}")
|
||||
else:
|
||||
logger.debug(f"No changes detected for {user_id}")
|
||||
|
||||
+9
-9
@@ -1,6 +1,6 @@
|
||||
[project]
|
||||
name = "nextcloud-mcp-server"
|
||||
version = "0.30.0"
|
||||
version = "0.31.1"
|
||||
description = "Model Context Protocol (MCP) server for Nextcloud integration - enables AI assistants to interact with Nextcloud data"
|
||||
authors = [
|
||||
{name = "Chris Coutinho", email = "chris@coutinho.io"}
|
||||
@@ -23,14 +23,14 @@ dependencies = [
|
||||
"authlib>=1.6.5",
|
||||
"qdrant-client>=1.7.0",
|
||||
# Observability dependencies
|
||||
"prometheus-client>=0.21.0", # Prometheus metrics
|
||||
"opentelemetry-api>=1.28.2", # OpenTelemetry API
|
||||
"opentelemetry-sdk>=1.28.2", # OpenTelemetry SDK
|
||||
"opentelemetry-instrumentation-asgi>=0.49b2", # Auto-instrument ASGI/Starlette
|
||||
"opentelemetry-instrumentation-httpx>=0.49b2", # Auto-instrument httpx client
|
||||
"opentelemetry-instrumentation-logging>=0.49b2", # Logging integration
|
||||
"opentelemetry-exporter-otlp-proto-grpc>=1.28.2", # OTLP gRPC exporter
|
||||
"python-json-logger>=3.2.0", # Structured JSON logging
|
||||
"prometheus-client>=0.21.0", # Prometheus metrics
|
||||
"opentelemetry-api>=1.28.2", # OpenTelemetry API
|
||||
"opentelemetry-sdk>=1.28.2", # OpenTelemetry SDK
|
||||
"opentelemetry-instrumentation-asgi>=0.49b2", # Auto-instrument ASGI/Starlette
|
||||
"opentelemetry-instrumentation-httpx>=0.49b2", # Auto-instrument httpx client
|
||||
"opentelemetry-instrumentation-logging>=0.49b2", # Logging integration
|
||||
"opentelemetry-exporter-otlp-proto-grpc>=1.28.2", # OTLP gRPC exporter
|
||||
"python-json-logger>=3.2.0", # Structured JSON logging
|
||||
]
|
||||
classifiers = [
|
||||
"Development Status :: 4 - Beta",
|
||||
|
||||
@@ -239,23 +239,46 @@ async def test_attachments_category_change_handling(nc_client: NextcloudClient):
|
||||
assert retrieved_content1 == attachment_content
|
||||
logger.info("Attachment retrieved successfully from initial category.")
|
||||
|
||||
# 4. Update note category
|
||||
# 4. Update note category (with retry for ETag conflicts from background scanner)
|
||||
logger.info(
|
||||
f"Updating note {note_id} category from '{initial_category}' to '{new_category}'"
|
||||
)
|
||||
# Need to fetch the latest etag after attachment add (WebDAV ops don't update note etag)
|
||||
current_note_data = await nc_client.notes.get_note(note_id=note_id)
|
||||
current_etag = current_note_data["etag"]
|
||||
updated_note = await nc_client.notes.update(
|
||||
note_id=note_id,
|
||||
etag=current_etag,
|
||||
category=new_category,
|
||||
title=note_title,
|
||||
content="Updated content", # Pass required fields
|
||||
)
|
||||
etag3 = updated_note["etag"]
|
||||
assert updated_note["category"] == new_category
|
||||
logger.info(f"Note category updated successfully. New Etag: {etag3}")
|
||||
# Retry logic for 412 Precondition Failed (ETag conflict)
|
||||
# This can happen if the background vector scanner touches the note
|
||||
max_update_attempts = 3
|
||||
for attempt in range(max_update_attempts):
|
||||
try:
|
||||
# Fetch the latest etag
|
||||
current_note_data = await nc_client.notes.get_note(note_id=note_id)
|
||||
current_etag = current_note_data["etag"]
|
||||
logger.info(
|
||||
f"Update attempt {attempt + 1}/{max_update_attempts}, current etag: {current_etag}"
|
||||
)
|
||||
|
||||
updated_note = await nc_client.notes.update(
|
||||
note_id=note_id,
|
||||
etag=current_etag,
|
||||
category=new_category,
|
||||
title=note_title,
|
||||
content="Updated content", # Pass required fields
|
||||
)
|
||||
etag3 = updated_note["etag"]
|
||||
assert updated_note["category"] == new_category
|
||||
logger.info(f"Note category updated successfully. New Etag: {etag3}")
|
||||
break # Success, exit retry loop
|
||||
|
||||
except HTTPStatusError as e:
|
||||
if e.response.status_code == 412 and attempt < max_update_attempts - 1:
|
||||
# ETag conflict (likely from background scanner), retry
|
||||
logger.warning(
|
||||
f"ETag conflict (412) on attempt {attempt + 1}, retrying..."
|
||||
)
|
||||
time.sleep(1) # Brief delay before retry
|
||||
continue
|
||||
else:
|
||||
# Not a 412 or out of retries, re-raise
|
||||
raise
|
||||
|
||||
time.sleep(1)
|
||||
|
||||
# 5. Verify attachment retrieval from *new* category (passing new category)
|
||||
|
||||
Reference in New Issue
Block a user