Commit Graph

16 Commits

Author SHA1 Message Date
Chris Coutinho f43343356e fix: address review feedback — security, caching, CI 429 retry
- Add 429 retry with exponential backoff to register_client() (fixes CI
  oauth matrix failures from parallel DCR requests)
- Make client_id, redirect_uri, and PKCE mandatory at token endpoint
- Add null-checks for discovery_url and OAuth credentials in proxy flows
- Add OIDC discovery document caching with 5-min TTL
- Add per-IP rate limiting on /oauth/register DCR proxy
- Discover DCR endpoint from OIDC discovery instead of hardcoding
- Extract extract_user_id_from_token to auth/token_utils.py (breaks
  circular imports between server/ and auth/ layers)
- Add TTL scope cache in scope_authorization.py (avoids DB hit per tool)
- Add defense-in-depth scope validation in storage layer
- Broaden elicitation exception handling with graceful fallback
- Add idempotentHint to nc_auth_check_status, return "pending" status
  after accepted elicitation, add polling interval to description
- Change ALL_SUPPORTED_SCOPES from tuple to frozenset for O(1) lookups
- Replace Optional[str] with str | None throughout config.py
- Use default_factory for ProxyCodeEntry/ASProxySession dataclasses
- Add proxy code/session cleanup to background loop
- Fix OIDC verification CI step to only run for oauth/login-flow modes
- Add unit tests for access.py REST endpoints (10 tests)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 17:22:23 +01:00
Chris Coutinho 0d14c75eb1 fix: address remaining PR #589 review findings
- Consolidate MCP session + login flow cleanup into _mcp_session_with_login_flow() helper,
  replacing 4 duplicated AsyncExitStack sites in app.py
- Fix get_shared_storage() race condition by using module-level anyio.Lock() init
  (reverts regression from ba59763)
- Collapse cosmetic if/else branching in scope_authorization.py
- Consolidate dual password storage paths into single store_app_password_with_scopes() call
- Mark unused request param as _ in list_supported_scopes
- Make ALL_SUPPORTED_SCOPES an immutable tuple; use list() instead of .copy()
- Add hasattr(ctx, "elicit") guard in elicitation.py, narrow except to NotImplementedError
- Add YAML comment explaining --oauth flag for mcp-login-flow service

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 09:59:56 +01:00
Chris Coutinho ba597634bd fix: address PR #589 review findings
- Fix anyio.Lock() created at module import time; use lazy init in
  get_shared_storage() to avoid instantiation before event loop exists
- Stop get_login_flow_session from silently swallowing DB exceptions;
  re-raise and handle in caller with proper error response
- Update ProvisionAccessResponse and UpdateScopesResponse status field
  docs to include all actual values (declined, cancelled, unchanged)
- Narrow except clause in present_login_url to (AttributeError,
  NotImplementedError) instead of bare Exception
- Add KeyError handling in LoginFlowV2Client.initiate() and poll() for
  clear errors on malformed Nextcloud responses
- Simplify redundant env-var bypass branches in scope_authorization.py
- Extract _maybe_login_flow_cleanup() context manager to replace 4
  inline cleanup loop registrations in app.py; move sleep to end of
  loop body so cleanup runs once at startup
- Replace fragile string replacement in _rewrite_login_flow_url with
  proper urllib.parse URL handling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 09:10:57 +01:00
Chris Coutinho 1a6ce0fa7d fix: address PR review issues for Login Flow v2
- Fix circular dependency in scope_authorization: auth tools requiring
  only identity scopes (openid/profile/email) now bypass the login flow
  provisioning check, so unprovisioned users can call provisioning tools
- Fix no-op detection in nc_auth_update_scopes: NULL scopes (legacy "all")
  now correctly map to ALL_SUPPORTED_SCOPES instead of empty set
- Fix get_app_password_with_scopes swallowing exceptions: re-raise instead
  of returning None, matching sibling methods
- Add missing audit logging to update_app_password_scopes,
  delete_login_flow_session, and delete_expired_login_flow_sessions
- Pin setup-uv to v7.3.1 in CI unit-test job (was v7.3.0)
- Add FastMCP type annotation to register_auth_tools parameter
- Log warning when user accepts elicitation without checking acknowledged box

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 19:02:30 +01:00
Chris Coutinho db1e0606ad fix: address PR #589 review feedback (round 2)
Consolidate three independent RefreshTokenStorage lazy singletons into a
single lock-protected get_shared_storage() function, eliminating race
conditions on concurrent first-access. Remove blanket try/except in
_get_stored_scopes so storage errors propagate as proper MCP errors
instead of silently triggering "please provision" messages. Handle
declined/cancelled elicitation results in Login Flow tools by cleaning up
sessions and returning clear status. Add update_app_password_scopes() to
avoid unnecessary decrypt/re-encrypt when only scopes change. Add
unprovisioned-user early exit and no-op detection to nc_auth_update_scopes.
Remove four dead config fields and misleading NEXTCLOUD_PASSWORD deprecation
warning. Add periodic login flow session cleanup task. Generate separate
Fernet keys per service. Add board cleanup in deck integration test. Gate
CI unit tests on linting and skip Astrolabe build for single-user profile.
Fix test markers from oauth to multi_user_basic for astrolabe integration
tests. Update login_flow.py docstrings to document outbound HTTP calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 16:35:31 +01:00
Chris Coutinho 8b5c2395b5 feat: add Docker Compose profiles and Login Flow v2 service
Add selective service startup via Docker Compose profiles so each MCP
deployment mode runs independently. Also add the new mcp-login-flow
service (port 8004) for Login Flow v2 authentication (ADR-022).

Profile assignments:
- single-user: mcp (port 8000)
- multi-user-basic: mcp-multi-user-basic (port 8003)
- oauth: mcp-oauth (port 8001)
- keycloak: keycloak + mcp-keycloak (port 8002)
- login-flow: mcp-login-flow (port 8004)

Infrastructure services (db, redis, app, recipes) always start.

Integration tests cover the full Login Flow v2 provisioning flow:
OAuth → browser login → app password → Nextcloud API access for
notes, calendar, contacts, files, deck, and cookbook operations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 20:33:54 +01:00
Chris Coutinho 81efa6e263 fix: address PR #571 review comments
- Move httpx import to top-level and use anyio task group for concurrent
  validation in cleanup_invalid_app_passwords (storage.py)
- Respect Retry-After header for 429 responses, capped at 300s (oauth_sync.py)
- Soften pre-validation exceptions so transient failures don't crash the
  background sync task (oauth_sync.py)
- Replace f-string SQL with blanket DELETE and add returncode checks (conftest.py)
- Extract clear_stale_test_state() helper to deduplicate cleanup logic
  in astrolabe background sync tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 08:03:55 +01:00
Chris Coutinho 3779ec3e17 fix: resolve stale credentials causing astrolabe background sync test failures
The revoke test failed because it only completed Step 2 (app password) but
not Step 1 (OAuth authorization). In hybrid mode, Astrolabe requires both
steps for $isFullyConfigured=true, which gates the "Revoke Access" button.

Changes:
- Use complete_astrolabe_authorization() in revoke test for full two-step flow
- Add stale state cleanup (app passwords, bruteforce entries, Astrolabe prefs)
  to both enablement and revoke tests
- Add startup cleanup of invalid app passwords in BasicAuth mode
- Pre-validate credentials before entering scanner loop to fail fast
- Handle 401/403/429 in scanner with proper backoff and circuit breaking
- Clean up app passwords in test_users_setup fixture teardown

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 15:55:58 +01:00
Chris Coutinho e486e92f91 fix(auth): Store app passwords locally for multi-user BasicAuth background sync
Previously, the multi-user BasicAuth mode attempted to retrieve app passwords
via OAuth client_credentials grant, which Nextcloud OIDC doesn't support.

This fix implements local storage for app passwords:
- Add app_passwords table via Alembic migration (002)
- Add store/get/delete methods to RefreshTokenStorage
- Add management API endpoints for app password provisioning:
  - POST /api/v1/users/{user_id}/app-password
  - GET /api/v1/users/{user_id}/app-password
  - DELETE /api/v1/users/{user_id}/app-password
- Update oauth_sync.py to read from local storage
- Update Astrolabe to send app passwords to MCP server after validation
- Add app-hook to configure mcp_server_url in Nextcloud

The flow is now:
1. User creates app password in Nextcloud Security settings
2. User enters it in Astrolabe Personal Settings
3. Astrolabe validates against Nextcloud, then sends to MCP server
4. MCP server stores encrypted app password locally
5. Background sync uses locally stored password

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 15:44:11 +01:00
Chris Coutinho 056414752e fix(mcp): Move all imports to the top of modules 2025-12-26 10:05:27 -06:00
Chris Coutinho e4f3beee01 fix: resolve type checking warnings for CI
- Add type casts for Starlette app state access
- Add assertions for cipher, card, board, stack after initialization
- Add None checks for XML element text attributes
- Handle __package__ being None in tracing setup
- Fix TokenBrokerService initialization to use storage credentials

Resolves 42 type warnings from ty-check, enabling CI linting to pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-18 00:44:58 +01:00
Chris Coutinho 3fa376905c feat: add Alembic database migration system
Implements Alembic for managing token storage database schema versions.
Migrations run automatically on startup with full backward compatibility.

**Changes:**
- Add Alembic dependency (1.14.0+) and SQLAlchemy (auto-installed)
- Create migration infrastructure in alembic/ directory
- Add initial migration (001) capturing current schema
- Modify RefreshTokenStorage.initialize() to run migrations via anyio
- Add CLI commands: db upgrade, current, history, downgrade, migrate
- Add comprehensive migration documentation

**Backward Compatibility:**
- Pre-Alembic databases automatically stamped with revision 001
- No schema changes for existing databases
- Automatic upgrade on first startup after update

**Migration Strategy:**
Three scenarios handled:
1. New database → Run migrations from scratch
2. Pre-Alembic database → Stamp with 001 (no changes)
3. Alembic-managed → Upgrade to latest

**Architecture:**
- Uses anyio.to_thread.run_sync() for structured concurrency
- Alembic env.py runs with anyio.run() in worker thread
- SQLite-friendly migration patterns documented
- No ThreadPoolExecutor needed (anyio handles it)

**CLI Usage:**
```bash
nextcloud-mcp-server db upgrade    # Upgrade to latest
nextcloud-mcp-server db current    # Show version
nextcloud-mcp-server db history    # View changelog
nextcloud-mcp-server db downgrade  # Rollback (with confirmation)
nextcloud-mcp-server db migrate "description"  # Create migration
```

**Testing:**
- All 13 webhook storage tests pass
- New/pre-Alembic database scenarios validated
- anyio integration tested

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-18 00:02:09 +01:00
Chris Coutinho c8d9cc24e0 refactor: migrate asyncio to anyio for consistent structured concurrency
Replace asyncio primitives with anyio equivalents throughout the codebase
to establish a single async pattern. This provides better structured
concurrency with automatic cancellation on errors and aligns with the
pytest anyio configuration.

Changes:
- hybrid.py: Replace asyncio.gather() with anyio task groups
- token_broker.py: Replace asyncio.Lock() with anyio.Lock()
- storage.py: Replace asyncio.run() with anyio.run()
- app.py: Replace tg.start_soon() with await tg.start() for task status
- processor.py: Add task_status parameter for structured startup
- scanner.py: Add task_status parameter for structured startup
- CLAUDE.md: Update async/await patterns guidance

The change from start_soon() to await tg.start() enables proper task
initialization signaling, ensuring background tasks are ready before
proceeding. This follows anyio best practices for structured concurrency.

All 118 unit tests pass with the new implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 03:51:45 +01:00
Chris Coutinho c97f12d47e feat: Add OAuth token and database metrics (Phases 3-4)
Complete Prometheus instrumentation for OAuth token operations
and additional database operations to populate empty dashboard panels.

OAuth Token Metrics (Phase 4):
- unified_verifier.py:
  * Token validation cache hits/misses
  * JWT verification success/failure/error
  * Introspection validation results
  * Audience validation failures
- context_helper.py:
  * Token exchange cache hits/misses
  * RFC 8693 exchange success/error

Database Metrics (Phase 3 completion):
- storage.py:
  * get_refresh_token() with timing
  * delete_refresh_token() with timing
  * All operations record duration and success/error status

These metrics populate the following dashboard panels:
- Token Validations (by method and result)
- Token Cache Hit Rate
- Token Exchange Operations
- Database Operations (refresh token CRUD)
- Database Operation Duration

Part of PR #295 - Complete metrics instrumentation
2025-11-13 16:23:00 +01:00
Chris Coutinho a667d7c59c feat: Add metrics instrumentation for queue, health, and database operations
Implement Prometheus metrics to populate empty Grafana dashboard panels.

## Phase 1: Queue Size Metrics 
**File**: `processor.py`
- Track vector sync queue depth in real-time
- Update metric after receiving and processing each document
- Update metric during timeout (empty queue)
- Enables: "Processing Queue Depth" panel

## Phase 2: Health Check Metrics 
**File**: `app.py`
- Add Nextcloud connectivity check with timing
- Add Qdrant health check with timing
- Record dependency health status (up/down)
- Record health check duration
- Enables: 4 health status panels + health check duration panel

## Phase 3: Database Operation Metrics (Partial) 
**File**: `storage.py`
- Instrument `store_refresh_token()` method
- Track SQLite INSERT operation timing and success/error status
- Enables: Partial data for database operation latency panel

## Metrics Now Exposed

### Queue Metrics:
- `mcp_vector_sync_queue_size` - Real-time queue depth

### Health Metrics:
- `mcp_dependency_health{dependency="nextcloud"}` - UP/DOWN status
- `mcp_dependency_health{dependency="qdrant"}` - UP/DOWN status
- `mcp_dependency_check_duration_seconds{dependency}` - Health check latency

### Database Metrics:
- `mcp_db_operations_total{db="sqlite",operation="insert"}` - Operation count
- `mcp_db_operation_duration_seconds{db="sqlite",operation="insert"}` - Operation latency

## Dashboard Impact

**Panels Now Populated** (7/34 panels):
-  Processing Queue Depth
-  Nextcloud Health
-  Qdrant Health
-  Health Check Duration
-  Database Operation Latency (partial)
-  Vector sync panels (already working from PR #292)

**Panels Still Empty** (remaining work):
-  OAuth panels (4): Token validations, exchanges, cache hit rate, refresh ops
-  MCP tool panels (3): Call volume, error rates, execution duration
-  Database panel: Needs more SQLite operations instrumented (~29 remaining)

## Testing

Verified metric definitions exist and will be recorded on next deployment.

## Next Steps

Phase 4: OAuth token metrics (unified_verifier.py, context_helper.py, storage.py)
Phase 5: MCP tool metrics (all server/*.py files with @mcp.tool())
Phase 3 completion: Remaining 29 database operations in storage.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 16:14:38 +01:00
Chris Coutinho 1bced88c97 refactor: consolidate database storage for webhooks and OAuth tokens
Refactored the storage system to use a unified SQLite database for both
webhook tracking and OAuth token storage, available in both BasicAuth
and OAuth modes.

Changes:
- Renamed refresh_token_storage.py → storage.py
- Made TOKEN_ENCRYPTION_KEY optional (only required for OAuth token ops)
- Added registered_webhooks table with schema versioning
- Added webhook storage methods (store, get, delete, list, clear)
- Initialize storage in both BasicAuth and OAuth modes
- Updated webhook routes to persist registrations in database
- Database-first pattern for webhook status checks (performance)
- Updated all imports across codebase

Storage Behavior:
- Database created automatically at startup if needed
- Existing databases detected and reused
- Server fails fast if database initialization fails
- No migrations needed (OAuth feature is experimental)

Testing:
- Added 13 comprehensive unit tests for webhook storage
- All 118 unit tests pass
- All 5 smoke tests pass
- Verified fail-fast behavior on initialization errors

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 20:01:49 +01:00