nextcloud-mcp-server

Author	SHA1	Message	Date
Chris Coutinho	47fb562326	fix: replace assert with proper guard and invalidate scope cache after provisioning Replace `assert entry.code_challenge` with a proper if-guard returning a 500 JSON error in the token endpoint, since Python's -O flag strips asserts and would silently disable PKCE enforcement. Invalidate the scope cache immediately after Login Flow v2 provisioning completes, so users no longer hit ProvisioningRequiredError for up to 5 minutes after successfully authenticating. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 09:31:36 +01:00
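The `-O` pitfall in the entry above is easy to reproduce. Below is a minimal sketch of a token-endpoint PKCE check written with explicit guards instead of `assert`. `CodeEntry`, `check_pkce`, and the `(status, body)` return shape are hypothetical names, not the project's own, and only the S256 method from RFC 7636 is handled.

```python
from __future__ import annotations

import base64
import hashlib
from dataclasses import dataclass


@dataclass
class CodeEntry:
    """Hypothetical stand-in for a stored authorization-code record."""

    code_challenge: str | None
    code_challenge_method: str = "S256"


def s256_challenge(code_verifier: str) -> str:
    """RFC 7636 S256: BASE64URL(SHA256(ASCII(code_verifier))), unpadded."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def check_pkce(entry: CodeEntry, code_verifier: str | None) -> tuple[int, dict] | None:
    """Return (HTTP status, JSON body) on failure, or None when PKCE passes.

    These are ordinary `if` checks, so they still run under `python -O`,
    whereas `assert entry.code_challenge` would be stripped entirely.
    """
    if not entry.code_challenge:
        # A stored code without a challenge is a server-side bug, not a client error.
        return 500, {"error": "server_error", "error_description": "missing code_challenge"}
    if not code_verifier:
        return 400, {"error": "invalid_request", "error_description": "code_verifier is required"}
    if entry.code_challenge_method != "S256":
        return 400, {"error": "invalid_request", "error_description": "unsupported code_challenge_method"}
    if s256_challenge(code_verifier) != entry.code_challenge:
        return 400, {"error": "invalid_grant", "error_description": "PKCE verification failed"}
    return None
```

The RFC 7636 Appendix B test vector is a quick way to check `s256_challenge` against a known answer.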
Chris Coutinho	1fae6920be	fix: disable NC rate limiting in dev/CI and add token endpoint diagnostics Disable Nextcloud's bruteforce protection and rate limiting via a new post-installation hook, preventing 429 errors during repeated DCR calls in CI. Add warning-level logging to all 8 error paths in the AS proxy token endpoint to make login-flow 400 errors diagnosable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 08:57:02 +01:00
Chris Coutinho	f43343356e	fix: address review feedback — security, caching, CI 429 retry - Add 429 retry with exponential backoff to register_client() (fixes CI oauth matrix failures from parallel DCR requests) - Make client_id, redirect_uri, and PKCE mandatory at token endpoint - Add null-checks for discovery_url and OAuth credentials in proxy flows - Add OIDC discovery document caching with 5-min TTL - Add per-IP rate limiting on /oauth/register DCR proxy - Discover DCR endpoint from OIDC discovery instead of hardcoding - Extract extract_user_id_from_token to auth/token_utils.py (breaks circular imports between server/ and auth/ layers) - Add TTL scope cache in scope_authorization.py (avoids DB hit per tool) - Add defense-in-depth scope validation in storage layer - Broaden elicitation exception handling with graceful fallback - Add idempotentHint to nc_auth_check_status, return "pending" status after accepted elicitation, add polling interval to description - Change ALL_SUPPORTED_SCOPES from tuple to frozenset for O(1) lookups - Replace Optional[str] with str | None throughout config.py - Use default_factory for ProxyCodeEntry/ASProxySession dataclasses - Add proxy code/session cleanup to background loop - Fix OIDC verification CI step to only run for oauth/login-flow modes - Add unit tests for access.py REST endpoints (10 tests) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 17:22:23 +01:00
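The scope cache and the OIDC discovery cache described above both follow a time-to-live pattern. The following is a minimal sketch of such a cache, assuming a simple in-process dict with lazy eviction. The class name and the injectable `clock` are illustrative and not taken from `scope_authorization.py`. Its `invalidate()` method corresponds to the kind of post-provisioning invalidation described in 47fb562326.

```python
from __future__ import annotations

import time
from collections.abc import Callable, Hashable
from typing import Generic, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """In-process cache whose entries expire `ttl` seconds after they are set."""

    def __init__(self, ttl: float = 300.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._data: dict[Hashable, tuple[float, V]] = {}

    def get(self, key: Hashable) -> V | None:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]  # evict lazily on read
            return None
        return value

    def set(self, key: Hashable, value: V) -> None:
        self._data[key] = (self._clock() + self._ttl, value)

    def invalidate(self, key: Hashable) -> None:
        """Drop one entry, e.g. right after a user's scopes change."""
        self._data.pop(key, None)
```

A plain dict like this lives in one process only. With several workers, each one keeps its own copy and expires it independently.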
Chris Coutinho	9d1a84af5a	feat(auth): implement OAuth AS proxy to fix audience mismatch (ADR-023) MCP clients like Claude Code were unable to use tools because tokens obtained directly from Nextcloud had the wrong audience claim. The MCP server now acts as its own OAuth Authorization Server, proxying auth to Nextcloud with its own client_id so tokens have the correct audience. New endpoints: /.well-known/oauth-authorization-server, /oauth/token, /oauth/register. Modified /oauth/authorize from pass-through to intermediary pattern. PRM now points authorization_servers to the MCP server instead of Nextcloud. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 11:25:54 +01:00
Chris Coutinho	0d14c75eb1	fix: address remaining PR #589 review findings - Consolidate MCP session + login flow cleanup into _mcp_session_with_login_flow() helper, replacing 4 duplicated AsyncExitStack sites in app.py - Fix get_shared_storage() race condition by using module-level anyio.Lock() init (reverts regression from `ba59763`) - Collapse cosmetic if/else branching in scope_authorization.py - Consolidate dual password storage paths into single store_app_password_with_scopes() call - Mark unused request param as _ in list_supported_scopes - Make ALL_SUPPORTED_SCOPES an immutable tuple; use list() instead of .copy() - Add hasattr(ctx, "elicit") guard in elicitation.py, narrow except to NotImplementedError - Add YAML comment explaining --oauth flag for mcp-login-flow service Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 09:59:56 +01:00
Chris Coutinho	ba597634bd	fix: address PR #589 review findings - Fix anyio.Lock() created at module import time; use lazy init in get_shared_storage() to avoid instantiation before event loop exists - Stop get_login_flow_session from silently swallowing DB exceptions; re-raise and handle in caller with proper error response - Update ProvisionAccessResponse and UpdateScopesResponse status field docs to include all actual values (declined, cancelled, unchanged) - Narrow except clause in present_login_url to (AttributeError, NotImplementedError) instead of bare Exception - Add KeyError handling in LoginFlowV2Client.initiate() and poll() for clear errors on malformed Nextcloud responses - Simplify redundant env-var bypass branches in scope_authorization.py - Extract _maybe_login_flow_cleanup() context manager to replace 4 inline cleanup loop registrations in app.py; move sleep to end of loop body so cleanup runs once at startup - Replace fragile string replacement in _rewrite_login_flow_url with proper urllib.parse URL handling Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-02 09:10:57 +01:00
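Component-wise rewriting is safer than string replacement because it only swaps the URL's scheme and host. The following is a minimal sketch, assuming the goal is to move an internal Login Flow URL onto a public base URL. `rewrite_origin` is a hypothetical helper, not the project's `_rewrite_login_flow_url`.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_origin(url: str, public_base: str) -> str:
    """Move `url` onto the scheme/host (and optional path prefix) of `public_base`.

    Only the URL components are swapped, so a host string that happens to
    appear in the path or query is never touched, unlike a blind str.replace().
    """
    src = urlsplit(url)
    dst = urlsplit(public_base)
    path = dst.path.rstrip("/") + src.path
    return urlunsplit((dst.scheme, dst.netloc, path, src.query, src.fragment))
```

For example, `rewrite_origin("http://app/a?next=http://app/b", "https://pub")` rewrites only the origin and leaves the `next` query value as it was.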
Chris Coutinho	1a6ce0fa7d	fix: address PR review issues for Login Flow v2 - Fix circular dependency in scope_authorization: auth tools requiring only identity scopes (openid/profile/email) now bypass the login flow provisioning check, so unprovisioned users can call provisioning tools - Fix no-op detection in nc_auth_update_scopes: NULL scopes (legacy "all") now correctly map to ALL_SUPPORTED_SCOPES instead of empty set - Fix get_app_password_with_scopes swallowing exceptions: re-raise instead of returning None, matching sibling methods - Add missing audit logging to update_app_password_scopes, delete_login_flow_session, and delete_expired_login_flow_sessions - Pin setup-uv to v7.3.1 in CI unit-test job (was v7.3.0) - Add FastMCP type annotation to register_auth_tools parameter - Log warning when user accepts elicitation without checking acknowledged box Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 19:02:30 +01:00
Chris Coutinho	db1e0606ad	fix: address PR #589 review feedback (round 2) Consolidate three independent RefreshTokenStorage lazy singletons into a single lock-protected get_shared_storage() function, eliminating race conditions on concurrent first-access. Remove blanket try/except in _get_stored_scopes so storage errors propagate as proper MCP errors instead of silently triggering "please provision" messages. Handle declined/cancelled elicitation results in Login Flow tools by cleaning up sessions and returning clear status. Add update_app_password_scopes() to avoid unnecessary decrypt/re-encrypt when only scopes change. Add unprovisioned-user early exit and no-op detection to nc_auth_update_scopes. Remove four dead config fields and misleading NEXTCLOUD_PASSWORD deprecation warning. Add periodic login flow session cleanup task. Generate separate Fernet keys per service. Add board cleanup in deck integration test. Gate CI unit tests on linting and skip Astrolabe build for single-user profile. Fix test markers from oauth to multi_user_basic for astrolabe integration tests. Update login_flow.py docstrings to document outbound HTTP calls. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 16:35:31 +01:00
Chris Coutinho	e28af5453b	fix: address PR #589 review feedback for Login Flow v2 - Fix data loss in nc_auth_update_scopes: remove premature delete_app_password call; old password stays valid until upsert replaces it on successful re-provisioning - Replace assert with proper error return in nc_auth_check_status - Add lazy singleton for RefreshTokenStorage in auth_tools, scope_authorization, and context to avoid per-call re-initialization - Centralize _is_login_flow_mode() to get_settings().enable_login_flow and remove duplicate definitions and per-call os.getenv reads - Add dev-only comment to TOKEN_ENCRYPTION_KEY in docker-compose.yml - Gate OIDC build steps in CI behind matrix.needs-playwright - Add diagnostic step reporting Playwright skip count in CI Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-28 10:08:55 +01:00
Chris Coutinho	8b5c2395b5	feat: add Docker Compose profiles and Login Flow v2 service Add selective service startup via Docker Compose profiles so each MCP deployment mode runs independently. Also add the new mcp-login-flow service (port 8004) for Login Flow v2 authentication (ADR-022). Profile assignments: - single-user: mcp (port 8000) - multi-user-basic: mcp-multi-user-basic (port 8003) - oauth: mcp-oauth (port 8001) - keycloak: keycloak + mcp-keycloak (port 8002) - login-flow: mcp-login-flow (port 8004) Infrastructure services (db, redis, app, recipes) always start. Integration tests cover the full Login Flow v2 provisioning flow: OAuth → browser login → app password → Nextcloud API access for notes, calendar, contacts, files, deck, and cookbook operations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 20:33:54 +01:00
Chris Coutinho	a11ae9c027	refactor: enforce PLC0415 (import-outside-top-level) for source code Enable ruff PLC0415 rule for all source files (tests excluded via per-file-ignores). Move 136 inline imports to top-level across 33 files. 8 imports suppressed with noqa for legitimate reasons: circular dependencies (client/__init__.py, context.py), optional dependency guards (app.py document processors, auth/userinfo_routes.py), and post-env-setup imports (smithery_main.py). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 08:04:50 +01:00
Chris Coutinho	81efa6e263	fix: address PR #571 review comments - Move httpx import to top-level and use anyio task group for concurrent validation in cleanup_invalid_app_passwords (storage.py) - Respect Retry-After header for 429 responses, capped at 300s (oauth_sync.py) - Soften pre-validation exceptions so transient failures don't crash the background sync task (oauth_sync.py) - Replace f-string SQL with blanket DELETE and add returncode checks (conftest.py) - Extract clear_stale_test_state() helper to deduplicate cleanup logic in astrolabe background sync tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 08:03:55 +01:00
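The delay logic can be written as one pure function: honour `Retry-After` with a cap, and otherwise fall back to exponential backoff. This is a sketch. `retry_delay`, its `base` parameter, and the absence of jitter are assumptions. Only the 300-second cap comes from the commit message.

```python
from __future__ import annotations

from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(attempt: int, retry_after: str | None, base: float = 1.0, cap: float = 300.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429.

    Honours a Retry-After header (delta-seconds or HTTP-date) when present,
    otherwise falls back to exponential backoff; either way capped at `cap`.
    """
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():
            return min(float(value), cap)
        try:
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None  # malformed header: ignore it and back off normally
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            wait = (when - datetime.now(timezone.utc)).total_seconds()
            return min(max(wait, 0.0), cap)
    return min(base * (2 ** attempt), cap)
```

Adding random jitter to the fallback would spread out parallel clients further, such as the concurrent DCR requests from the CI matrix.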
Chris Coutinho	3779ec3e17	fix: resolve stale credentials causing astrolabe background sync test failures The revoke test failed because it only completed Step 2 (app password) but not Step 1 (OAuth authorization). In hybrid mode, Astrolabe requires both steps for $isFullyConfigured=true, which gates the "Revoke Access" button. Changes: - Use complete_astrolabe_authorization() in revoke test for full two-step flow - Add stale state cleanup (app passwords, bruteforce entries, Astrolabe prefs) to both enablement and revoke tests - Add startup cleanup of invalid app passwords in BasicAuth mode - Pre-validate credentials before entering scanner loop to fail fast - Handle 401/403/429 in scanner with proper backoff and circuit breaking - Clean up app passwords in test_users_setup fixture teardown Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 15:55:58 +01:00
Chris Coutinho	1707b2e6e1	feat: add self-signed SSL certificate support for Nextcloud connections Add NEXTCLOUD_VERIFY_SSL and NEXTCLOUD_CA_BUNDLE env vars to configure TLS certificate verification for all outbound Nextcloud connections. Centralizes SSL config via a new HTTP client factory (http.py) used by all 27 Nextcloud-bound call sites, including API clients, OIDC endpoints, OAuth flows, and health checks. Closes #560 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 09:21:21 +01:00
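Centralizing TLS configuration means every call site asks one helper for the same decision. The following is a minimal sketch of that decision, assuming both variables arrive as strings. The precedence order (disabled verification wins, then a custom CA bundle, then the system trust store) and the function name are assumptions, not the exact logic in `http.py`. httpx's `verify` option accepts a bool or an `ssl.SSLContext`, and a CA path can be turned into a context with `ssl.create_default_context(cafile=...)`.

```python
from __future__ import annotations

from collections.abc import Mapping

_FALSY = {"0", "false", "no", "off"}


def resolve_verify(env: Mapping[str, str]) -> bool | str:
    """Turn NEXTCLOUD_VERIFY_SSL / NEXTCLOUD_CA_BUNDLE into one `verify` value.

    Assumed precedence: verification disabled wins, then a custom CA bundle
    path, otherwise True (use the system trust store).
    """
    if env.get("NEXTCLOUD_VERIFY_SSL", "true").strip().lower() in _FALSY:
        return False
    ca_bundle = env.get("NEXTCLOUD_CA_BUNDLE", "").strip()
    return ca_bundle or True
```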
Chris Coutinho	01ad2b3d21	refactor: Use get_settings() for vector sync enabled check Replace direct os.getenv() calls with get_settings().vector_sync_enabled to ensure consistent behavior with both VECTOR_SYNC_ENABLED (deprecated) and ENABLE_SEMANTIC_SEARCH environment variables. Also add webhook management documentation guide. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-14 20:30:51 +01:00
Chris Coutinho	e486e92f91	fix(auth): Store app passwords locally for multi-user BasicAuth background sync Previously, the multi-user BasicAuth mode attempted to retrieve app passwords via OAuth client_credentials grant, which Nextcloud OIDC doesn't support. This fix implements local storage for app passwords: - Add app_passwords table via Alembic migration (002) - Add store/get/delete methods to RefreshTokenStorage - Add management API endpoints for app password provisioning: - POST /api/v1/users/{user_id}/app-password - GET /api/v1/users/{user_id}/app-password - DELETE /api/v1/users/{user_id}/app-password - Update oauth_sync.py to read from local storage - Update Astrolabe to send app passwords to MCP server after validation - Add app-hook to configure mcp_server_url in Nextcloud The flow is now: 1. User creates app password in Nextcloud Security settings 2. User enters it in Astrolabe Personal Settings 3. Astrolabe validates against Nextcloud, then sends to MCP server 4. MCP server stores encrypted app password locally 5. Background sync uses locally stored password Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-13 15:44:11 +01:00
Chris Coutinho	056414752e	fix(mcp): Move all imports to the top of modules	2025-12-26 10:05:27 -06:00
Chris Coutinho	0a23e484e9	docs(auth): Update docstrings of management api auth handling	2025-12-26 09:05:04 -06:00
Chris Coutinho	804480836e	fix(auth): Skip issuer validation for management API tokens Fixes NC PHP app (Astrolabe) OAuth integration by making token validation more lenient for management API access. Problem: - Astrolabe calls Nextcloud OIDC token endpoint via internal URL (http://localhost) - Tokens are issued with iss: http://localhost (internal) - MCP server expects iss: http://localhost:8080 (external) - Token validation failed with "Invalid issuer" Solution: - Add skip_issuer_check parameter to _verify_jwt_signature() - verify_token_for_management_api() now skips both audience and issuer checks - Security maintained: signature still verified, authorization checked by API Also includes related fixes from previous session: - Update test selectors for Vue 3 UI ("Enable Semantic Search") - Fix OIDC discovery URL transformation in OAuthController.php - Add overwrite.cli.url to setup hook for proper external URLs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-24 17:25:48 -06:00
Chris Coutinho	a51376fd5a	fix: Use settings.enable_offline_access for env var consolidation Migrate all direct ENABLE_OFFLINE_ACCESS environment variable checks to use settings.enable_offline_access, which handles both the new ENABLE_BACKGROUND_OPERATIONS and deprecated ENABLE_OFFLINE_ACCESS vars. Also fixes JWT issuer validation in Docker by using NEXTCLOUD_PUBLIC_ISSUER_URL when set, resolving 401 errors caused by internal/external URL mismatch. Changes: - app.py: Use settings for offline access checks in setup_oauth_config, register_oauth_client, and tool registration - oauth_tools.py: Use settings in provision_nextcloud_access and check_logged_in - management.py: Use settings in get_user_session - scope_authorization.py: Use settings in require_scopes decorator - Remove unused os imports after migration 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-24 09:10:01 -06:00
Chris Coutinho	5e76ddc60d	feat: Remove URL rewriting in favor of proper nextcloud config Remove URL rewriting logic from MCP server that was converting public URLs to internal Docker URLs. This was a workaround for Nextcloud's overwritehost setting forcing URLs to localhost:8080. Changes: - Remove OIDC endpoint rewriting in app.py (setup_oauth_config) - Remove OIDC_JWKS_URI override support (no longer needed) - Remove URL rewriting in browser_oauth_routes.py - Remove URL rewriting in token_broker.py - Update Helm chart values and README - Add hybrid auth setup unit tests - Update Astrolabe admin UI for Vue 3 The proper fix is in the previous commit which removes the overwritehost setting from Nextcloud, allowing it to respect the Host header from incoming requests.	2025-12-23 11:34:57 -07:00
Chris Coutinho	286a3eb20f	feat(auth): add multi-user BasicAuth pass-through mode Implement multi-user BasicAuth pass-through mode (ADR-020) where each request includes BasicAuth credentials that are forwarded to Nextcloud APIs without persistent storage. Changes: - Add _get_client_from_basic_auth() in context.py to extract credentials from Authorization header (set by BasicAuthMiddleware) - Add AstrolabeClient for app password provisioning via Astrolabe API - Update oauth_sync.py with dual credential support (app passwords first, then refresh tokens as fallback) - Simplify oauth_tools.py provisioning logic - Add integration tests for app password provisioning and multi-user BasicAuth Features: - Stateless multi-user mode: credentials passed per-request - Optional background sync via app passwords (stored in Astrolabe) - Falls back to refresh tokens if app password not available - Test coverage for provisioning flow and pass-through mode Related: ADR-019 (Multi-user BasicAuth), ADR-020 (Deployment Modes) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-20 20:55:31 +01:00
Chris Coutinho	daabd90359	fix(security): address critical security issues from PR #401 code review Implemented 6 critical security fixes identified during PR #401 review: 1. Token Rotation Race Condition (Issue 1) - Added in-progress marker pattern to prevent concurrent refresh - Prevents token invalidation when multiple requests refresh simultaneously - File: token_broker.py:324, 343-390 2. Hardcoded Localhost URL (Issue 2) - Added getNextcloudBaseUrl() with fallback chain - Supports overwrite.cli.url, trusted_domains, and localhost fallback - File: IdpTokenRefresher.php:38-61, 116 3. Error Information Leakage (Issue 3) - Replaced 13 instances of str(e) with sanitized errors - Prevents exposure of stack traces, paths, and tokens - File: management.py:368, 444, 492, 510, 546, 571, 625, 643, 695, 750, 919, 956, 1121 4. Input Validation Gaps (Issue 4) - Added validation helpers: _parse_int_param, _parse_float_param, _validate_query_string - Applied bounds checking to get_chunk_context and unified_search - File: management.py:119-164, 807-835, 1197-1212 5. PHP Refresh Token Validation (Issue 5) - Added explicit refresh_token presence check - Prevents silent token rotation failures - File: IdpTokenRefresher.php:122-132 6. Cookie Security Configuration (Issue 6) - Added _should_use_secure_cookies() with auto-detection - Supports explicit COOKIE_SECURE env var or auto-detect from NEXTCLOUD_HOST - Files: browser_oauth_routes.py:27-44, 470; env.sample:54-57 Testing: - Unit tests: 195 passed - Integration tests: 102 passed, 4 skipped - OAuth tests: 9 passed - All linting and type checks passed Follow-up work tracked in issues #408-#417 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-19 13:57:33 +01:00
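The cookie security fix (issue 6 above) reduces to a small precedence rule. The following is a sketch, assuming environment values arrive as strings. `should_use_secure_cookies` and the accepted truthy spellings are illustrative, not the exact logic in `browser_oauth_routes.py`.

```python
from __future__ import annotations

from collections.abc import Mapping
from urllib.parse import urlsplit

_TRUTHY = {"1", "true", "yes", "on"}


def should_use_secure_cookies(env: Mapping[str, str]) -> bool:
    """Explicit COOKIE_SECURE wins; otherwise infer from NEXTCLOUD_HOST's scheme."""
    explicit = env.get("COOKIE_SECURE", "").strip().lower()
    if explicit:
        return explicit in _TRUTHY
    return urlsplit(env.get("NEXTCLOUD_HOST", "")).scheme.lower() == "https"
```

Auto-detection keeps local `http://localhost` setups working, while an explicit override covers deployments behind a TLS-terminating proxy.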
Chris Coutinho	e4f3beee01	fix: resolve type checking warnings for CI - Add type casts for Starlette app state access - Add assertions for cipher, card, board, stack after initialization - Add None checks for XML element text attributes - Handle __package__ being None in tracing setup - Fix TokenBrokerService initialization to use storage credentials Resolves 42 type warnings from ty-check, enabling CI linting to pass. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-18 00:44:58 +01:00
Chris Coutinho	3fa376905c	feat: add Alembic database migration system Implements Alembic for managing token storage database schema versions. Migrations run automatically on startup with full backward compatibility. Changes: - Add Alembic dependency (1.14.0+) and SQLAlchemy (auto-installed) - Create migration infrastructure in alembic/ directory - Add initial migration (001) capturing current schema - Modify RefreshTokenStorage.initialize() to run migrations via anyio - Add CLI commands: db upgrade, current, history, downgrade, migrate - Add comprehensive migration documentation Backward Compatibility: - Pre-Alembic databases automatically stamped with revision 001 - No schema changes for existing databases - Automatic upgrade on first startup after update Migration Strategy: Three scenarios handled: 1. New database → Run migrations from scratch 2. Pre-Alembic database → Stamp with 001 (no changes) 3. Alembic-managed → Upgrade to latest Architecture: - Uses anyio.to_thread.run_sync() for structured concurrency - Alembic env.py runs with anyio.run() in worker thread - SQLite-friendly migration patterns documented - No ThreadPoolExecutor needed (anyio handles it) CLI Usage:
```bash
nextcloud-mcp-server db upgrade                 # Upgrade to latest
nextcloud-mcp-server db current                 # Show version
nextcloud-mcp-server db history                 # View changelog
nextcloud-mcp-server db downgrade               # Rollback (with confirmation)
nextcloud-mcp-server db migrate "description"   # Create migration
```
Testing: - All 13 webhook storage tests pass - New/pre-Alembic database scenarios validated - anyio integration tested 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-18 00:02:09 +01:00
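The three migration scenarios can be told apart by checking which tables already exist. The following is a minimal sketch for SQLite. `migration_action` and the `refresh_tokens` marker table are hypothetical, and the project's real detection logic may differ.

```python
import sqlite3


def migration_action(conn: sqlite3.Connection, legacy_table: str = "refresh_tokens") -> str:
    """Classify a SQLite database into one of the three startup scenarios.

    `legacy_table` is a hypothetical table name that only pre-Alembic
    databases would contain; the real schema may differ.
    """
    tables = {name for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}
    if "alembic_version" in tables:
        return "upgrade"  # 3. Alembic-managed: upgrade to head
    if legacy_table in tables:
        return "stamp"  # 2. Pre-Alembic: stamp revision 001, no DDL
    return "fresh"  # 1. New database: run every migration from scratch
```

In Alembic terms, `"stamp"` would map to `alembic.command.stamp(cfg, "001")`, and the other two cases to `alembic.command.upgrade(cfg, "head")`.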
Chris Coutinho	44391d3d1d	fix: address critical code review issues (4 fixes) This commit addresses 4 critical issues identified in code review: 1. Token Rotation Race Condition (token_broker.py) - Added per-user locking mechanism to prevent concurrent refresh token corruption - Implemented double-check pattern for cache after acquiring lock - Users can now safely refresh concurrently without token desync 2. Hardcoded OAuth Client ID (PHP files) - Made client ID configurable via `astroglobe_client_id` in system config - Updated McpServerClient to provide getClientId() method - Injected McpServerClient into IdpTokenRefresher and OAuthController - Updated admin settings UI to display client ID configuration status - App gracefully handles missing client ID with warnings in admin UI 3. Missing Cache Invalidation (management.py:revoke_user_access) - Added cache.invalidate() call when revoking user access - Ensures both storage AND cache are cleared atomically - Prevents stale cached tokens from being used after revocation 4. Error Message Exposure (management.py) - Created _sanitize_error_for_client() helper function - Updated all error handlers to log detailed errors internally - Returns generic messages to clients to prevent information leakage - Protects against exposing database paths, API URLs, tokens, etc. All changes are backward compatible and preserve existing functionality. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-18 00:01:54 +01:00
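Fix 1 above (per-user locking with a double-checked cache) can be sketched with `asyncio` primitives. The project itself uses anyio, but the structure is the same. `PerUserTokenCache` and the `refresh` callback are hypothetical, and the sketch omits token expiry.

```python
from __future__ import annotations

import asyncio
from collections import defaultdict
from collections.abc import Awaitable, Callable


class PerUserTokenCache:
    """Serialise refreshes per user so a one-time refresh token is spent once."""

    def __init__(self, refresh: Callable[[str], Awaitable[str]]) -> None:
        self._refresh = refresh
        self._tokens: dict[str, str] = {}
        self._locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def get_token(self, user_id: str) -> str:
        token = self._tokens.get(user_id)
        if token is not None:
            return token  # fast path: no lock needed on a cache hit
        async with self._locks[user_id]:
            # Double-check: another task may have refreshed while we waited.
            token = self._tokens.get(user_id)
            if token is None:
                token = await self._refresh(user_id)
                self._tokens[user_id] = token
            return token
```

Even with many concurrent requests for the same user, only one refresh call is made. This matters when each refresh token can be used only once. Different users never block each other.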
Chris Coutinho	5eec34c17e	feat(auth): implement refresh token rotation for Nextcloud OIDC Add support for one-time use refresh tokens with automatic rotation to align with Nextcloud OIDC security model. Changes: - TokenBrokerService improvements: - Add user_id parameter to refresh methods - Detect and store rotated refresh tokens - Add offline_access scope to token requests - Handle refresh token rotation on every use - Add management API endpoints: - /api/v1/webhooks (GET/POST) - List/create webhooks - /api/v1/webhooks/{id} (DELETE) - Delete webhook - /api/v1/search (POST) - Unified search - /api/v1/chunk-context (GET) - Get chunk context - /api/v1/apps (GET) - List installed apps - Update tests for refresh token rotation - Add --headed flag to pytest for Playwright debugging Benefits: - Aligns with Nextcloud OIDC one-time refresh token model - Prevents refresh token invalidation after first use - Enables long-lived background operations - Provides full webhook lifecycle management 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-18 00:01:53 +01:00
Chris Coutinho	a58a14111b	feat(vector-sync): enable background sync in OAuth mode Add multi-user background vector synchronization when running in OAuth mode with ENABLE_OFFLINE_ACCESS=true. Key changes: Architecture (oauth_sync.py): - User Manager task polls RefreshTokenStorage for provisioned users - Per-user scanner tasks fetch documents using OAuth tokens - Shared processor pool indexes documents from all users Token Broker improvements: - Accept client_id/client_secret instead of encryption_key - Remove redundant token audience pre-validation (Nextcloud validates) - Add _rewrite_token_endpoint for Docker internal URL routing - Remove double-decryption (storage handles encryption internally) Browser OAuth flow fixes: - Add 'resource' parameter to request Nextcloud-scoped tokens - Store and retrieve next_url for proper redirect after consent - Rewrite token endpoint URLs for internal Docker access Configuration: - Add vector_sync_user_poll_interval setting (default: 60s) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-14 20:00:41 +01:00
Chris Coutinho	20404cf3f2	feat(vector): add Deck card vector search with visualization support Adds comprehensive vector search support for Nextcloud Deck cards, including semantic search indexing, chunk preview in the vector viz UI, and proper deep linking to cards. Vector Search Indexing - Add deck_card scanning in scanner.py (scan_deck_cards function) - Index cards from non-archived, non-deleted boards - Store metadata: board_id, board_title, stack_id, stack_title, card_type, duedate, owner - Content structure: title + "\n\n" + description (matches indexing format) - Incremental sync based on lastModified timestamp - Deletion tracking with grace period Vector Visualization Support - Add deck_card handler in context.py for chunk preview expansion - Include board_id in search result metadata (bm25_hybrid.py, semantic.py) - Expose metadata in viz_routes.py JSON responses - Update vector-viz.js to construct proper Deck URLs: /apps/deck/board/{board_id}/card/{card_id} - Update vector_viz.html filter label from "Deck" to "Deck Cards" Bug Fixes - Skip soft-deleted boards (deletedAt > 0) to prevent 403 Forbidden errors - Applies to scanner, processor, and context expansion code paths - Deck API returns deleted boards but rejects stack access with 403 Testing - Add integration tests in test_deck_vector_search.py: - test_deck_card_semantic_search: Filtered search with doc_type="deck_card" - test_deck_card_appears_in_cross_app_search: Cross-app search includes deck cards - test_deck_card_chunk_context: Chunk context fetching for viz preview Documentation - Update README.md: Add Deck cards to semantic search feature list - Update semantic-search-architecture.md: Document deck_card support - Update nc_semantic_search tool documentation Type Safety - Fix type narrowing for page_boundaries (could be None) using cast() - Fix scanner.py payload None check for type safety Resolves vector search for Deck cards across indexing, search, and visualization. 
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-13 23:51:18 +01:00
Chris Coutinho	9d0a993c2a	feat(vector-viz): add news_item support for links and chunk expansion Add support for news_item document type in the vector visualization page: - Add "News" checkbox to document type filter options - Add URL handler to link news items to /apps/news/item/{id} - Add content fetching for news items in chunk context expansion This enables users to search and view news articles in the vector visualization, with clickable links back to Nextcloud News and the ability to expand chunks to see full article context. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-13 21:34:47 +01:00
Chris Coutinho	3f06e2ee77	fix: resolve all type checking errors (8 errors fixed) Fixed 8 type checker errors across the codebase: - vector/scanner.py: Handle None scroll results with null-safe iteration - search/{bm25_hybrid,semantic}.py: Add None checks for result.payload - auth/{unified_verifier,webhook_routes}.py: Assert non-None auth credentials - client/webdav.py: Add None checks before int() conversions - providers/openai.py: Assert embedding_model is not None - search/algorithms.py: Explicitly type doc_types set and cast values - observability/logging_config.py: Match parent class signature (log_data) Also fixed test_create_tag_creates_system_tag to match WebDAV implementation (was testing OCS API endpoint, now tests correct WebDAV endpoint with Content-Location header). Type checker: 0 errors (down from 8), 20 warnings (ignored) Tests: All 192 unit tests passing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2025-12-08 01:09:02 +01:00
Chris Coutinho	fafeaf3d83	refactor: Move background tasks to server lifespan and deprecate SSE transport - Move scanner/processor tasks from FastMCP session lifespan to Starlette server lifespan (correct architecture: background tasks run once at server level, not per-session) - Change default CLI transport from SSE to streamable-http - Remove SSE transport option from CLI (SSE is deprecated) - Remove SSE client session factory from test fixtures - Add tracing instrumentation to BM25 hybrid search operations for better observability 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-23 04:02:30 +01:00
Chris Coutinho	b0612cfa0f	perf: Optimize vector viz search performance - Replace sequential Qdrant scroll calls with batch retrieve (50 HTTP requests → 1 request, ~50x faster vector fetch) - Add point_id to SearchResult to enable batch retrieval by Qdrant point ID - Reuse query embedding from search algorithm in viz_routes (eliminates redundant embedding call, saves ~30ms) - Make BM25 encode() async with thread pool to avoid blocking event loop (~4.4s was blocking, now properly async) - Run PCA computation in thread pool to avoid blocking event loop (~1.2s was blocking, now properly async) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-22 19:47:43 +01:00
Chris Coutinho	fffe483c02	fix: Centralize PDF processing and generate separate images per chunk Previously, pymupdf4llm.to_markdown() was called twice - once in PyMuPDFProcessor during indexing and again in PDFHighlighter during visualization. Different image path lengths caused different character offsets, leading to highlighted pages not matching their chunks. Also fixed issue where all chunks on the same page showed all highlights instead of just their own highlight. Now restores original page contents between chunks using xref stream caching. Changes: - Add PDFHighlighter class requiring pre-computed page_boundaries and full_text from document processor (no fallback extraction) - Pass pre-computed data from processor to highlighter - Extract page-relative portion of chunk text for cross-page chunks - Add bounding box highlighting using text anchor search - Run highlight generation in parallel with embedding/BM25 - Cache and restore page contents to isolate highlights per chunk Results: Highlighting success rate improved from 51% to 95% (121/128). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-22 02:46:30 +01:00
Chris Coutinho	a62a007c87	feat: Add context expansion to semantic search with chunk overlap removal Implements optional context expansion for semantic search results that fetches adjacent chunks (N-1 and N+1) from Qdrant to provide before/after context. Removes configurable chunk overlap (default 200 chars) to avoid duplicate text appearing in both context and excerpt. Key changes: - Add include_context and context_chars parameters to nc_semantic_search and nc_semantic_search_answer tools - Implement Qdrant cache fast path for chunk retrieval (avoids re-fetching and re-parsing documents, especially important for PDFs) - Add _get_chunk_by_index_from_qdrant() to fetch adjacent chunks - Remove chunk overlap from before_context (last N chars) and after_context (first N chars) to prevent duplicate text - Fetch context in parallel with anyio.Semaphore (max 20 concurrent) - Pass through page_number from SearchResult to SemanticSearchResult - Remove document-level deduplication (keep chunk-level dedup from algorithm) Context expansion is opt-in via include_context=true parameter. When enabled: - Populates has_context_expansion, marked_text, before_context, after_context - Adds truncation flags when context exceeds context_chars limit - Falls back to document fetch for legacy data with truncated excerpts Related: nextcloud_mcp_server/search/context.py:87-382, nextcloud_mcp_server/server/semantic.py:161-255	2025-11-21 01:02:22 +01:00
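A toy example makes the overlap-removal step easy to check. The following is a minimal sketch, assuming fixed-size character chunks whose neighbours share exactly `overlap` characters with the current chunk. `build_context` and `ChunkContext` are hypothetical names. Only the 200-character default overlap comes from the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ChunkContext:
    before_context: str
    after_context: str
    before_truncated: bool
    after_truncated: bool


def build_context(prev_chunk: str | None, next_chunk: str | None, context_chars: int, overlap: int = 200) -> ChunkContext:
    """Build before/after context around a chunk whose neighbours overlap it.

    The last `overlap` chars of the previous chunk and the first `overlap`
    chars of the next chunk repeat text already in the current chunk, so they
    are dropped before trimming to `context_chars`.
    """
    before = prev_chunk[: max(len(prev_chunk) - overlap, 0)] if prev_chunk else ""
    after = next_chunk[overlap:] if next_chunk else ""
    return ChunkContext(
        before_context=before[max(len(before) - context_chars, 0):],  # keep the tail nearest the chunk
        after_context=after[:context_chars],  # keep the head nearest the chunk
        before_truncated=len(before) > context_chars,
        after_truncated=len(after) > context_chars,
    )
```

With 10-character chunks overlapping by 3, `before_context + chunk + after_context` reproduces the original text with no repeated characters.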
Chris Coutinho	13b2d0048c	feat: Implement Qdrant placeholder state management Introduces a placeholder-based state tracking system to prevent duplicate document processing during the gap between scanner queuing and processor completion. Key Changes: 1. Placeholder Helper Functions (`vector/placeholder.py`): - `write_placeholder_point()` - Creates zero-vector placeholder when queuing - `query_document_metadata()` - Queries for existing entry (placeholder or real) - `delete_placeholder_point()` - Removes placeholder before writing real vectors - `get_placeholder_filter()` - Filters placeholders from user-facing queries 2. Scanner Updates (`vector/scanner.py`): - Replace `indexed_at` comparison with `modified_at` comparison - Write placeholder before queuing each document - Query per-document metadata instead of bulk-querying indexed_at - Fixes bug where files were resubmitted every scan cycle 3. Processor Updates (`vector/processor.py`): - Delete placeholder before upserting real vectors - Ensures no duplicate points in Qdrant 4. Query Filters (all search files): - Add `get_placeholder_filter()` to all user-facing queries - Ensures placeholders never appear in search results or visualizations - Applied to: bm25_hybrid.py, semantic.py, viz_routes.py, algorithms.py Architecture: - Placeholders use zero vectors with dimension from embedding service - Payload includes `is_placeholder: True` flag for filtering - Status field tracks: "pending", "processing", "completed", "failed" - Deterministic UUIDs using uuid5 for consistent point IDs Impact: - Eliminates duplicate processing of same documents - Fixes race condition where long-running documents get queued multiple times - Prevents scanner from resubmitting files every scan cycle - Maintains clean separation between in-flight and indexed documents 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 15:04:00 +01:00
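The placeholder scheme in 13b2d0048c can be sketched in plain Python without a Qdrant client. The commit says only that uuid5 provides deterministic point IDs. The namespace, the name string, the payload keys other than `is_placeholder` and `status`, and every function name below are hypothetical. The sketch also shows why comparing `modified_at` stops the resubmission loop: once a placeholder stores the document's `modified_at`, a later scan of the same unchanged document no longer queues it.

```python
from __future__ import annotations

import uuid
from typing import Any

# Hypothetical namespace; the one used by vector/placeholder.py is not shown.
POINT_NAMESPACE = uuid.NAMESPACE_URL


def placeholder_point_id(user_id: str, doc_type: str, doc_id: int | str) -> str:
    """Deterministic ID: the same document always maps to the same UUID."""
    return str(uuid.uuid5(POINT_NAMESPACE, f"{user_id}:{doc_type}:{doc_id}"))


def placeholder_point(
    user_id: str, doc_type: str, doc_id: int | str, dim: int, modified_at: int
) -> dict[str, Any]:
    """Zero-vector point written when the scanner queues a document."""
    return {
        "id": placeholder_point_id(user_id, doc_type, doc_id),
        "vector": [0.0] * dim,  # dimension comes from the embedding service
        "payload": {
            "user_id": user_id,
            "doc_type": doc_type,
            "doc_id": doc_id,
            "modified_at": modified_at,
            "is_placeholder": True,
            "status": "pending",  # later "processing", "completed" or "failed"
        },
    }


def needs_indexing(existing_payload: dict[str, Any] | None, modified_at: int) -> bool:
    """Scanner check: queue only if nothing is stored or the document changed."""
    if existing_payload is None:
        return True
    return existing_payload.get("modified_at", -1) < modified_at


def is_user_visible(payload: dict[str, Any]) -> bool:
    """Pure-Python equivalent of excluding placeholders from user-facing queries."""
    return not payload.get("is_placeholder", False)
```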
Chris Coutinho	944dd760ca	fix: Return empty array instead of null for query_coords when no results When vector visualization search returns zero results, the code was returning query_coords: null, which caused JavaScript error "can't access property 0, queryCoords is null" when the frontend tried to access the array. Changed to return empty array [] to match expected type and prevent crash. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 14:18:02 +01:00
Chris Coutinho	f1a5fac1b9	fix: Update models and viz to use int-only doc_id - algorithms.py: Revert SearchResult.id to int (all docs use int IDs now) - semantic.py: Revert SemanticSearchResult.id to int, remove Union import - viz_routes.py: Remove str() conversion when querying doc_id from Qdrant - viz_routes.py: Convert doc_id from query param to int in chunk context Fixes vector visualization which was collapsing all chunks to a single point because Qdrant queries were failing to match doc_id (string vs int).	2025-11-20 12:32:27 +01:00
Chris Coutinho	327d843f64	feat: Implement per-chunk vector visualization with context expansion Major improvements to vector visualization page: - Refactor PCA to display individual chunks instead of averaged documents - Add context expansion module for fetching surrounding text from notes and PDFs - Update deduplication to use (doc_id, doc_type, chunk_start, chunk_end) keys - Fix Alpine.js rendering with chunk-specific keys including offsets - Refactor authentication helper to return NextcloudClient for better reuse - Add async context manager support to NextcloudClient Technical details: - viz_routes.py: Fetch specific chunk vectors instead of averaging per document - context.py: New module supporting both notes and PDF text extraction via PyMuPDF - search algorithms: Extract page_number, chunk_index, total_chunks from Qdrant - vector-viz.js/html: Use chunk positions in expansion tracking keys This enables users to see which specific chunks match their query and view them with surrounding context in the PCA visualization. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 11:22:20 +01:00
Chris Coutinho	b8010270c1	fix: Add async/await, PDF metadata, and type safety fixes This commit addresses multiple issues with async operations, PDF metadata extraction, and type safety in document processing and search. ## Async/Await Fixes - processor.py:259 - Added await for chunker.chunk_text(content) - processor.py:270 - Added await for bm25_service.encode_batch(chunk_texts) - tests/unit/test_document_chunker.py - Converted all 12 test methods to async ## PDF Metadata Enhancement - pymupdf.py:143 - Added file_size metadata extraction - pymupdf.py:145-206 - Refactored to extract text page-by-page - Manually loop through pages instead of using page_chunks=True - Generate page_boundaries metadata for precise page tracking - Works around pymupdf.layout.activate() breaking page_chunks=True - processor.py:32-66 - Added assign_page_numbers() helper function - Assigns page numbers to chunks based on overlap with page boundaries - Handles chunks spanning multiple pages - processor.py:298-300 - Call assign_page_numbers() for PDF files ## Type Safety Fixes - bm25_hybrid.py:184 - Removed int() conversion of doc_id - semantic.py:131 - Removed int() conversion of doc_id - viz_routes.py:275 - Removed int() conversion of doc_id - Added comments documenting that doc_id can be int (notes) or str (file paths) ## Testing - All 18 tests passing (12 unit + 6 integration) - No type errors in modified files - Container logs show successful processing - Vector viz searches working correctly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-20 02:37:07 +01:00
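A minimal sketch of how page numbers could be assigned from page boundaries, as in the `assign_page_numbers()` helper from b8010270c1. The `PageBoundary` fields and both function names are assumptions. The commit does not say how a chunk that spans pages is resolved, so this version picks the page with the largest character overlap and gives ties to the earlier page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PageBoundary:
    """Character span of one page inside the extracted full text (assumed shape)."""

    page_number: int
    start: int  # inclusive offset
    end: int  # exclusive offset


def page_for_chunk(
    chunk_start: int, chunk_end: int, pages: list[PageBoundary]
) -> int | None:
    """Return the page that overlaps the chunk the most (earliest page wins ties)."""
    best_page: int | None = None
    best_overlap = 0
    for page in pages:
        overlap = min(chunk_end, page.end) - max(chunk_start, page.start)
        if overlap > best_overlap:  # strict ">" keeps the earlier page on ties
            best_page, best_overlap = page.page_number, overlap
    return best_page


def assign_pages(
    chunk_spans: list[tuple[int, int]], pages: list[PageBoundary]
) -> list[int | None]:
    """Apply page_for_chunk to every (start, end) chunk span."""
    return [page_for_chunk(start, end, pages) for start, end in chunk_spans]
```

A chunk covering offsets 90 to 130 overlaps page 1 (0 to 100) by 10 characters and page 2 (100 to 250) by 30, so it is assigned to page 2.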
Chris Coutinho	c4ce28f05d	fix: Improve 3D plot rendering with explicit dimensions and window resize support - Get container dimensions before creating Plotly layout to render at correct size immediately - Add init() method with window resize listener for responsive plot sizing - Remove post-render resize call (no longer needed with explicit dimensions) - Improve colorbar positioning and scene domain configuration This eliminates the visual "jump" during initial render and ensures the plot resizes smoothly when the browser window changes size. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 19:43:20 +01:00
Chris Coutinho	c126c3ec03	fix: Preserve 3D plot camera and improve documentation This commit addresses PR feedback and fixes plot camera behavior. ## JavaScript Fix - Camera Preservation - Changed plot update strategy from recreating layout to using Plotly.restyle() - Query point visibility now toggles via restyle() which only modifies trace visibility - Camera position/zoom naturally preserved since layout remains untouched - Resolves jumpy plot behavior when toggling "Show Query Point" checkbox Related: nextcloud_mcp_server/auth/static/vector-viz.js:58-73 ## Documentation Improvements - Condensed vector-sync-ui.md from 316 to 94 lines (~70% reduction) - Removed redundant FAQ section (content merged into main sections) - Simplified use cases from 4 detailed sections to 3 focused paragraphs - Streamlined troubleshooting to 3 common issues - Merged technical details into overview section - Retained all essential information while improving readability ## Screenshot Updates Removed old/outdated images (5 files): - rag-workflow-bidirectional-final.png - rag-workflow-prominent-llm.png - rag-workflow-simple-final.png - vector-viz-interface.png - welcome-page.png Replaced with current screenshots (3 files): - vector-viz-document-types-2col.png - Now shows plot + results - vector-viz-chunk-context.png - Centered content view - vector-viz-results.png - Updated results list 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 14:10:53 +01:00
Chris Coutinho	9bd02d7ef7	fix: Preserve 3D plot camera position and fix CSS loading Two fixes for the vector visualization page: 1. CSS Loading Fix: Moved CSS <link> from vector_viz.html fragment to user_info.html <head> block. HTMX fragments don't process <link> tags in <head>, causing unstyled page. Now CSS loads correctly. 2. Camera Preservation: Modified renderPlot() to preserve camera position when toggling query point visibility. Previously, toggling the "Show Query Point" checkbox would reset zoom/rotation to default. Now reads existing camera settings from plot before updating. Related: nextcloud_mcp_server/auth/static/vector-viz.js:123-130 Related: nextcloud_mcp_server/auth/templates/user_info.html:12 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 13:51:08 +01:00
Chris Coutinho	53689d076b	feat: Improve vector visualization with static assets and fixes - Extract CSS and JavaScript into separate static files - Created nextcloud_mcp_server/auth/static/vector-viz.css - Created nextcloud_mcp_server/auth/static/vector-viz.js - Updated templates to reference external assets - Fix vector visualization issues: - Normalize vectors before PCA to match Qdrant's cosine distance - Add zero-norm and NaN detection/handling for large datasets - Enable responsive Plotly sizing (autosize + responsive config) - Widen plot area to full viewport width with minimized margins - Improve visualization accuracy: - Query point now positioned correctly relative to documents - Handles 200+ points without JSON serialization errors - Full-width plot maximizes screen space utilization 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 04:10:44 +01:00
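Normalizing before PCA, as described in 53689d076b, works because unit vectors satisfy ||u - v||^2 = 2 - 2 cos(u, v). The Euclidean geometry that PCA preserves then follows the cosine distance that Qdrant uses. The commit does not specify how zero-norm or NaN vectors are handled, so this sketch simply skips them and returns the indices it kept. The function name is hypothetical.

```python
import math


def normalize_for_cosine(
    vectors: list[list[float]],
) -> tuple[list[int], list[list[float]]]:
    """L2-normalize vectors, skipping any that are zero-norm or contain NaN/inf.

    Returns (kept_indices, unit_vectors) so callers can map rows back to chunks.
    """
    kept: list[int] = []
    unit: list[list[float]] = []
    for i, vec in enumerate(vectors):
        if not all(math.isfinite(x) for x in vec):
            continue  # NaN or inf would poison the PCA input
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0:
            continue  # a zero vector has no direction to compare
        kept.append(i)
        unit.append([x / norm for x in vec])
    return kept, unit
```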
Chris Coutinho	9db20a4d01	feat: Redesign UI to match Nextcloud ecosystem aesthetic This commit updates the web interface to better align with Nextcloud's design system and improve the Vector Viz layout. Changes: - Replace emoji icons with Material Design SVG icons for better consistency with Nextcloud apps - Simplify navigation styling with minimal padding and subtle active states (250px width) - Update CSS variables to match Nextcloud design system - Restructure Vector Viz from two-column to single-column vertical layout for better plot visibility - Move search controls to compact horizontal grid at top - Make navigation toggle always visible (not just on mobile) - Fix plot container sizing with overflow:visible to prevent colorbar clipping - Remove heavy shadows and custom card styling for cleaner aesthetic - Add error and success page templates with consistent styling Technical details: - Preserve Alpine.js for reactive functionality - Use CSS Grid for responsive horizontal controls layout - Add smooth transitions for navigation collapse/expand - Maintain HTMX for dynamic content loading 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 00:45:19 +01:00
Chris Coutinho	d374bfa1e5	feat(viz): Add dual-score display and improve UI controls This commit enhances the vector visualization interface with better score transparency and improved UX: Dual-Score Display: - Store original algorithm scores before normalization (viz_routes.py:203) - Display both raw and normalized scores: "Raw Score: 0.842 (89% relative)" - Update plot hover text with dual scores (userinfo_routes.py:740) - Fixes issue where all queries showed at least one 100% match regardless of actual relevance (normalization artifact) UI Improvements: 1. Fusion Method dropdown: Changed from x-show to :disabled - Prevents jarring layout shift when switching algorithms - Dropdown stays visible but grayed out when Semantic is selected - Better UX with opacity: 0.5 and cursor: not-allowed 2. Score Threshold: Changed step from 0.1 to "any" - Allows arbitrary float precision (0.7, 0.85, 0.123) - Users can now fine-tune threshold values 3. Document Types: Converted multi-select to checkbox grid - Replaced clunky Ctrl/Cmd multi-select listbox - Checkbox grid with cleaner layout - Positioned left of Score Threshold and Result Limit inputs - More intuitive UX Technical Details: - Raw score ranges vary by algorithm: - Semantic: 0.0-1.0 (cosine similarity) - BM25 RRF: ~0.001-0.033 (Reciprocal Rank Fusion) - BM25 DBSF: Can exceed 1.0 (Distribution-Based Score Fusion) - Normalized scores (0-1) used for visual encoding (marker size, color) - Original scores preserved in API response via getattr fallback Files modified: - nextcloud_mcp_server/auth/viz_routes.py (store original_score) - nextcloud_mcp_server/auth/templates/vector_viz.html (UI controls) - nextcloud_mcp_server/auth/userinfo_routes.py (plot hover text) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-17 08:05:49 +01:00
Chris Coutinho	3464b21845	fix: Relax SearchResult validation to support DBSF fusion scores > 1.0 Fix false-positive validation error where DBSF (Distribution-Based Score Fusion) correctly produces scores > 1.0 but SearchResult validation incorrectly rejected them. Root Cause: SearchResult.__post_init__() enforced scores in [0.0, 1.0] range, but DBSF sums normalized scores from multiple retrieval systems (dense semantic + sparse BM25), resulting in scores like 1.55 when both systems strongly agree a document is relevant. Changes: - Relaxed validation to allow any score ≥ 0.0 (algorithms.py:147-157) - Updated SearchResult and SemanticSearchResult documentation to explain score ranges for RRF ([0.0, 1.0]) vs DBSF (unbounded) - Added comprehensive test coverage for both fusion methods - Added DBSF fusion option to vector visualization UI - Updated viz routes and vizApp() to support fusion parameter selection Testing: All 157 unit tests pass, type checking passes, ruff passes Fixes error: "Configuration error: Score must be between 0.0 and 1.0, got 1.1528953" 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-17 06:32:30 +01:00
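The following sketch shows why the score ranges in d374bfa1e5 and 3464b21845 differ so much. Reciprocal Rank Fusion sums 1 / (k + rank) over the retrievers. These commits do not state the k that Qdrant applies. k = 60 is assumed here because it reproduces the ~0.033 ceiling for two retrievers (2/61 ≈ 0.0328). The DBSF function is a simplified version that follows Qdrant's documented description: each retriever's scores are rescaled using mean ± 3 standard deviations as limits, and the results are summed. Qdrant's edge-case handling may differ. `ScoredResult` is a hypothetical stand-in for `SearchResult` with the relaxed `score >= 0.0` check.

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass


def rrf(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Reciprocal Rank Fusion: sum of 1 / (k + rank), ranks starting at 1."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return fused


def dbsf(score_maps: list[dict[str, float]]) -> dict[str, float]:
    """Simplified Distribution-Based Score Fusion.

    Each retriever's scores are rescaled to [0, 1] using mean - 3*std and
    mean + 3*std as limits, then summed, so two retrievers can reach 2.0.
    """
    fused: dict[str, float] = {}
    for scores in score_maps:
        values = list(scores.values())
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        low, high = mean - 3 * std, mean + 3 * std
        for doc, score in scores.items():
            # Assumption: identical scores (std == 0) map to the midpoint.
            norm = 0.5 if high == low else (score - low) / (high - low)
            fused[doc] = fused.get(doc, 0.0) + min(max(norm, 0.0), 1.0)
    return fused


@dataclass
class ScoredResult:
    """Hypothetical stand-in for SearchResult after the relaxed validation."""

    doc_id: int
    score: float

    def __post_init__(self) -> None:
        # Only negative scores are rejected; DBSF sums may exceed 1.0.
        if self.score < 0.0:
            raise ValueError(f"Score must be >= 0.0, got {self.score}")
```

With a dense retriever scoring documents a, b, c as 0.9, 0.5, 0.1 and a sparse retriever scoring them as 12.0, 3.0, 1.0, document a fuses to roughly 1.44 under DBSF. The old `[0.0, 1.0]` check would have rejected that result.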
Chris Coutinho	c28fc955ca	Merge origin/master into feature/bm25 Resolved conflicts: - viz_routes.py: Kept bm25's extract_dense_vector() function for robust vector handling - hybrid.py: Removed (bm25 uses native Qdrant RRF fusion instead) - uv.lock: Regenerated after accepting master's dependencies This merge brings in: - RAG evaluation framework (ADR-013) - Performance optimizations (double-fetch elimination) - Migration from asyncio to anyio - OpenTelemetry tracing improvements - Notes app enhancements 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 11:52:40 +01:00
Chris Coutinho	944b6dcf5a	fix: Handle named vectors in visualization and semantic search - viz_routes.py: Extract "dense" vector from named vector dict - semantic.py: Specify using="dense" for BM25 hybrid collections - Fixes "X must be 2D array" error in hybrid search - Fixes "Dense vector is not found" error in semantic search 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 08:16:35 +01:00
Chris Coutinho	fc6a2f14e4	fix: Update vizApp to use bm25_hybrid algorithm and remove deprecated weights The visualization UI was still using the old 'hybrid' algorithm name and weight parameters that were replaced by the BM25 hybrid search refactor. This caused "Unknown algorithm: hybrid" errors when using the search & visualize feature. Changes: - Update default algorithm from 'hybrid' to 'bm25_hybrid' - Update default scoreThreshold from 0.7 to 0.0 to match backend - Remove deprecated semanticWeight, keywordWeight, fuzzyWeight parameters - Remove weight parameters from search request Fixes the visualization search functionality after BM25 hybrid refactor. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-16 07:54:20 +01:00


119 Commits