18 Commits

Author SHA1 Message Date
Chris Coutinho a11ae9c027 refactor: enforce PLC0415 (import-outside-top-level) for source code
Enable ruff PLC0415 rule for all source files (tests excluded via
per-file-ignores). Move 136 inline imports to top-level across 33 files.
8 imports suppressed with noqa for legitimate reasons: circular
dependencies (client/__init__.py, context.py), optional dependency
guards (app.py document processors, auth/userinfo_routes.py), and
post-env-setup imports (smithery_main.py).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 08:04:50 +01:00
Chris Coutinho 056414752e fix(mcp): Move all imports to the top of modules 2025-12-26 10:05:27 -06:00
Chris Coutinho e4f3beee01 fix: resolve type checking warnings for CI
- Add type casts for Starlette app state access
- Add assertions for cipher, card, board, stack after initialization
- Add None checks for XML element text attributes
- Handle __package__ being None in tracing setup
- Fix TokenBrokerService initialization to use storage credentials

Resolves 42 type warnings from ty-check, enabling CI linting to pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-18 00:44:58 +01:00
Chris Coutinho 3f06e2ee77 fix: resolve all type checking errors (8 errors fixed)
Fixed 8 type checker errors across the codebase:

- vector/scanner.py: Handle None scroll results with null-safe iteration
- search/{bm25_hybrid,semantic}.py: Add None checks for result.payload
- auth/{unified_verifier,webhook_routes}.py: Assert non-None auth credentials
- client/webdav.py: Add None checks before int() conversions
- providers/openai.py: Assert embedding_model is not None
- search/algorithms.py: Explicitly type doc_types set and cast values
- observability/logging_config.py: Match parent class signature (log_data)

Also fixed test_create_tag_creates_system_tag to match WebDAV implementation
(was testing OCS API endpoint, now tests correct WebDAV endpoint with
Content-Location header).

Type checker: 0 errors (down from 8), 20 warnings (ignored)
Tests: All 192 unit tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-08 01:09:02 +01:00
Chris Coutinho 5a251a99e6 fix: Set is_placeholder=False in processor to fix search filtering
The processor was not setting is_placeholder field when writing real
document chunks to Qdrant. This caused the placeholder filter to exclude
all documents (since None != False), resulting in 0 search results.

Now explicitly sets is_placeholder: False in payload when writing real
indexed chunks, allowing search filters to correctly distinguish between
placeholders and real documents.
2025-11-20 17:15:19 +01:00
Chris Coutinho ec2c274cd9 fix: Increase placeholder staleness threshold to 5x scan interval
- Changed from 2x (120s) to 5x (300s) scan interval
- Large PDFs take 3-4 minutes to process, need longer threshold
- Prevents premature requeuing of in-flight documents
2025-11-20 15:36:49 +01:00
Chris Coutinho 8799450c7d Merge pull request #306 from cbcoutinho/rag-evaluation
feat: RAG evaluation framework with performance improvements
2025-11-16 11:17:41 +01:00
Chris Coutinho c4bf077050 feat: Add OpenTelemetry tracing to @instrument_tool decorator
Enhances the @instrument_tool decorator to create distributed traces
for all MCP tool executions, improving observability and debugging.

Changes:
- Modified @instrument_tool to wrap tool execution in trace_operation
- Added automatic span creation with mcp.tool.* span names
- Sanitized tool arguments before adding to span attributes
  (excludes password, token, secret, api_key, etag, ctx)
- Limited argument strings to 500 characters to prevent huge spans
- Maintained existing Prometheus metrics functionality
- Updated docs/observability.md to reflect correct decorator name
- Added comprehensive unit tests

All ~50+ MCP tools now emit traces automatically without code changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 11:16:05 +01:00
Chris Coutinho 529daf2b48 ci: temp disable sse in ci 2025-11-16 07:03:18 +01:00
Chris Coutinho e3153822f7 perf: Exclude vector-sync status polling from distributed tracing
Skip tracing for /app/vector-sync/status to reduce noise from HTMX polling.
Metrics collection continues for this endpoint.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 05:19:35 +01:00
Chris Coutinho 6253faee19 feat: Add instrumentation decorator and apply to notes tools (Phase 5)
Created @instrument_tool decorator for automatic MCP tool metrics collection.
Applied to all 7 tools in notes.py.

Changes:
- observability/metrics.py:
  * New instrument_tool() decorator for automatic timing and error tracking
  * Compatible with @mcp.tool() and @require_scopes() decorators
  * Records tool_name, duration, and success/error status

- server/notes.py:
  * Applied @instrument_tool to all 7 tool functions
  * nc_notes_create_note, nc_notes_update_note, nc_notes_append_content
  * nc_notes_search_notes, nc_notes_get_note, nc_notes_get_attachment
  * nc_notes_delete_note

These metrics will populate the MCP Tool Calls dashboard panels.

Part of PR #295 - Complete metrics instrumentation (Phase 5)
Remaining: 86 tools across 8 server files
2025-11-13 16:40:56 +01:00
Chris Coutinho 4ea5ed72d4 feat: Add Grafana dashboard and vector sync metric instrumentation
Implement comprehensive observability for vector database synchronization
with Grafana dashboard and Prometheus metrics.

## Part 1: Grafana Dashboard

Created all-in-one operations dashboard with 7 rows and 34 panels:

### Dashboard Structure:
- **Overview Row**: Request rate, error rate, P95 latency, active requests
- **HTTP Metrics (RED)**: Request/error rates by endpoint, latency percentiles
- **MCP Tools**: Call volume, error rates, execution duration by tool
- **Nextcloud API**: API calls/latency by app, retry patterns
- **OAuth & Authentication**: Token validations, exchanges, cache hit rate
- **Dependencies & Health**: Status for Nextcloud/Qdrant/Keycloak/Unstructured
- **Vector Sync**: Processing throughput, queue depth, Qdrant operations

### Helm Chart Integration:
- Added dashboard-configmap.yaml template for automatic provisioning
- Configured Grafana sidecar auto-discovery (label: grafana_dashboard="1")
- Added dashboards configuration section in values.yaml (opt-in)
- Updated Chart.yaml with dashboard annotations
- Enhanced NOTES.txt with dashboard deployment instructions
- Comprehensive documentation in dashboards/README.md

Dashboard supports dynamic filtering via variables:
- datasource: Prometheus data source selection
- namespace: Filter by Kubernetes namespace
- pod: Multi-select pod filtering
- interval: Query interval (1m/5m/10m/30m/1h)

## Part 2: Vector Sync Metric Instrumentation

Implemented metric recording throughout vector sync pipeline:

### metrics.py:
Added convenience functions:
- record_vector_sync_scan() - Track documents per scan
- record_vector_sync_processing() - Track processing duration/status
- record_qdrant_operation() - Track database operations
- update_vector_sync_queue_size() - Track queue depth

### scanner.py:
- Record number of documents found in each scan
- Enables monitoring of scan throughput

### processor.py:
- Record processing duration for each document
- Track success/failure status with timing
- Record Qdrant upsert/delete operations
- Handle all code paths (success, deletion, error)

### semantic.py:
- Wrap Qdrant query_points with try/except
- Record search operation success/failure

## Metrics Exposed:

- mcp_vector_sync_documents_scanned_total
- mcp_vector_sync_documents_processed_total{status}
- mcp_vector_sync_processing_duration_seconds (histogram)
- mcp_vector_sync_queue_size (gauge)
- mcp_qdrant_operations_total{operation,status}

This enables monitoring of:
- Scan and processing throughput
- Processing latency (P50/P95/P99)
- Error rates for processing and Qdrant operations
- Queue depth trends
- Complete observability of vector sync pipeline

## Testing:

Verified locally that metrics are recorded correctly:
- 36 documents scanned
- 3 documents processed (avg 7.5s each)
- 3 successful Qdrant upsert operations
- Search operations tracked

## Deployment:

Enable dashboard provisioning in Helm values:
```yaml
dashboards:
  enabled: true
  grafanaFolder: "Nextcloud MCP"
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 11:49:20 +01:00
Chris Coutinho 0eae33a918 ci: Fix logging warning and cli mock 2025-11-11 23:42:00 +01:00
Chris Coutinho a6e5f3d8ff refactor: simplify OpenTelemetry tracing configuration
Simplifies the OpenTelemetry tracing setup by removing the redundant
OTEL_ENABLED flag and using the presence of OTEL_EXPORTER_OTLP_ENDPOINT
to determine if tracing should be enabled. This follows the standard
OpenTelemetry environment variable conventions more closely.

Changes:
- Remove OTEL_ENABLED/tracing_enabled flag in favor of checking if
  OTEL_EXPORTER_OTLP_ENDPOINT is set
- Add OTEL_EXPORTER_VERIFY_SSL configuration option for OTLP endpoints
  with self-signed certificates (defaults to false for development)
- Move HTTPXClientInstrumentor initialization to module level to ensure
  httpx calls are traced across all Nextcloud API requests
- Add tracing spans to vector sync operations (scan_user_documents)
- Fix authorization header logging to only warn about missing headers
  in OAuth mode (BasicAuth mode doesn't use Authorization headers)
- Update observability documentation to reflect simplified configuration
- Refactor Dockerfile to use --no-editable flag for uv sync

Breaking changes:
- OTEL_ENABLED environment variable is removed
- Tracing is now automatically enabled when OTEL_EXPORTER_OTLP_ENDPOINT
  is set

Migration guide:
- Remove OTEL_ENABLED=true from environment configuration
- Tracing will be enabled automatically if OTEL_EXPORTER_OTLP_ENDPOINT
  is configured

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-10 22:48:37 +01:00
Chris Coutinho b32324cb76 feat: skip tracing for health and metrics endpoints
Health check and metrics endpoints are frequently polled and don't
provide meaningful trace data. This change skips OpenTelemetry span
creation for:
- /health/* (liveness, readiness checks)
- /metrics (Prometheus metrics)

These endpoints still record Prometheus metrics (request count, latency,
in-flight requests) but no longer create trace spans, reducing tracing
noise and storage costs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-10 07:24:27 +01:00
Chris Coutinho f3050e9b45 chore: Remove /health and /metrics endpoints from logging 2025-11-10 02:07:45 +01:00
Chris Coutinho 4e89e92b65 fix(observability): isolate metrics endpoint to dedicated port
Security fix: Move Prometheus metrics endpoint from main HTTP port to
dedicated port 9090 to prevent external exposure of metrics data.

Changes:
- Use prometheus_client.start_http_server() for dedicated metrics server
- Remove /metrics route from main application routes
- Metrics now only accessible on port 9090 (configurable via METRICS_PORT)
- Main application port no longer serves /metrics endpoint

This follows security best practice of isolating monitoring endpoints
from application traffic.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-09 09:53:36 +01:00
Chris Coutinho 578de4d7d6 feat(observability): Add comprehensive monitoring with Prometheus and OpenTelemetry
- Add Prometheus metrics for HTTP, MCP tools, Nextcloud API, OAuth, vector sync, and DB operations
- Add OpenTelemetry distributed tracing with OTLP export
- Add structured JSON logging with trace context correlation
- Add ObservabilityMiddleware for automatic HTTP instrumentation
- Add app_name attribute to all client classes for per-app metrics
- Add configuration for metrics, tracing, and logging via environment variables
- Add documentation in docs/observability.md
- Fix graceful degradation when tracing is disabled (default state)
- Fix uvicorn logging configuration to use observability formatters

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-09 08:54:04 +01:00