feat(helm): Add document chunking configuration

Add support for configurable document chunking parameters to Helm chart to
match docker-compose and application capabilities.

Changes:

1. values.yaml:
   - Add documentChunking section with chunkSize (512) and chunkOverlap (50)
   - Include comprehensive comments explaining chunking strategies
   - Positioned between vectorSync and qdrant sections

2. templates/deployment.yaml:
   - Add DOCUMENT_CHUNK_SIZE and DOCUMENT_CHUNK_OVERLAP env vars
   - Always set (not conditional), used by vector sync processor
   - Environment variables follow same pattern as config.py defaults

3. README.md:
   - Add documentChunking parameter table in Vector Search section
   - Document chunking strategies (small/medium/large chunks)
   - Explain overlap recommendations (10-20% of chunk size)

Validation:
- helm lint: Passes
- helm template: Environment variables correctly generated
- Custom values: Work as expected (tested with chunkSize=1024)
- Always present: Not conditional on vectorSync.enabled

This maintains feature parity between Helm and docker-compose deployments,
allowing users to tune chunking for their embedding models and use cases.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
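
For reviewers unfamiliar with chunking, the interaction between chunk size and overlap can be illustrated with a minimal Python sketch. This is not the server's implementation: `chunk_words` is a hypothetical helper, and it assumes sizes are counted in whitespace-separated words, which may differ from the unit the vector sync processor actually uses. Only the `DOCUMENT_CHUNK_SIZE` / `DOCUMENT_CHUNK_OVERLAP` variable names and the 512/50 defaults are taken from the commit message.

```python
import os


def chunk_words(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of whitespace-separated words.

    Hypothetical helper for illustration only; counting in words is an assumption.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks


if __name__ == "__main__":
    # Read the same variables the Helm chart sets, falling back to the defaults.
    size = int(os.environ.get("DOCUMENT_CHUNK_SIZE", "512"))
    overlap = int(os.environ.get("DOCUMENT_CHUNK_OVERLAP", "50"))
    sample = " ".join(f"w{i}" for i in range(1200))
    for i, chunk in enumerate(chunk_words(sample, size, overlap)):
        print(i, len(chunk.split()))
```

With the defaults, consecutive chunks share 50 of 512 units, so the overlap is just under 10% of the chunk size, at the low end of the 10-20% range mentioned in the README change.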
fix: Support in-memory Qdrant for CI testing
2025-11-10 03:34:16 +01:00 · 2025-11-10 03:21:27 +01:00 · 2025-11-10 03:09:50 +01:00 · 2025-11-10 02:47:57 +01:00 · 2025-11-10 02:07:45 +01:00 · 2025-11-10 01:18:30 +01:00
75 changed files with 11300 additions and 313 deletions
@@ -25,7 +25,7 @@ jobs:
          github_token: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          changelog_increment_filename: body.md
      - name: Release
-        uses: softprops/action-gh-release@6da8fa9354ddfdc4aeace5fc48d7f679b5214090 # v2.4.1
+        uses: softprops/action-gh-release@5be0e66d93ac7ed76da52eca8bb058f665c3a5fe # v2.4.2
        with:
          body_path: "body.md"
          tag_name: v${{ env.REVISION }}
@@ -24,6 +24,18 @@ jobs:
          git config user.name "$GITHUB_ACTOR"
          git config user.email "$GITHUB_ACTOR@users.noreply.github.com"

+      - name: Install Helm
+        uses: azure/setup-helm@1a275c3b69536ee54be43f2070a358922e12c8d4 # v4.3.1
+        with:
+          version: v3.16.0
+
+      - name: Add Helm repositories and update dependencies
+        run: |
+          helm repo add qdrant https://qdrant.github.io/qdrant-helm
+          helm repo add ollama https://otwld.github.io/ollama-helm
+          helm repo update
+          helm dependency build charts/nextcloud-mcp-server
+
      - name: Run chart-releaser
        uses: helm/chart-releaser-action@cae68fefc6b5f367a0275617c9f83181ba54714f # v1.7.0
        env:
@@ -52,6 +52,7 @@ jobs:
        uses: hoverkraft-tech/compose-action@3846bcd61da338e9eaaf83e7ed0234a12b099b72 # v2.4.1
        with:
          compose-file: "./docker-compose.yml"
+          #compose-flags: "--profile qdrant"
          up-flags: "--build"

      - name: Install the latest version of uv
@@ -1,3 +1,71 @@
+## v0.29.1 (2025-11-09)
+
+### Fix
+
+- **observability**: isolate metrics endpoint to dedicated port
+
+## v0.29.0 (2025-11-09)
+
+### Feat
+
+- **helm**: Add observability support with ServiceMonitor and Grafana dashboard
+
+### Fix
+
+- **readiness**: Only check external Qdrant in network mode
+
+## v0.28.0 (2025-11-09)
+
+### Feat
+
+- **observability**: Add comprehensive monitoring with Prometheus and OpenTelemetry
+
+### Fix
+
+- **vector**: Handle missing 'modified' field in notes gracefully
+
+## v0.27.3 (2025-11-09)
+
+### Fix
+
+- **ci**: Use helm dependency build instead of update to use Chart.lock
+
+## v0.27.2 (2025-11-09)
+
+### Fix
+
+- **helm**: update Qdrant dependency condition to match new mode structure
+
+## v0.27.1 (2025-11-09)
+
+### Fix
+
+- **ci**: add Helm repository setup to chart release workflow
+
+## v0.27.0 (2025-11-09)
+
+### Feat
+
+- **helm**: add Qdrant local mode support with three deployment options [skip ci]
+- add Qdrant local mode support with in-memory and persistent storage
+- implement ADR-009 - refactor semantic search to use generic semantic:read scope
+- implement MCP sampling for semantic search RAG (ADR-008)
+- add optional vector database and semantic search to helm chart
+- add vector sync processing status to /user/page endpoint
+- implement semantic search tool and fix vector sync issues (ADR-007 Phase 3)
+- implement vector sync scanner and processor (ADR-007 Phase 2)
+
+### Fix
+
+- implement deletion grace period and vector sync status tool
+- remove unnecessary urllib3<2.0 constraint
+- integrate vector sync tasks with Starlette lifespan for streamable-http
+
+### Refactor
+
+- migrate vector sync from asyncio.Queue to anyio memory object streams
+- update to Qdrant query_points API and fix Playwright Keycloak login
+
 ## v0.26.1 (2025-11-08)

 ### Fix
@@ -224,6 +224,82 @@ docker compose exec db mariadb -u root -ppassword nextcloud -e \

 **Testing**: Extract `data["results"]` from MCP responses, not `data` directly.

+## MCP Sampling for RAG (ADR-008)
+
+**What is MCP Sampling?**
+MCP sampling allows servers to request LLM completions from their clients. This enables Retrieval-Augmented Generation (RAG) patterns where the server retrieves context and the client's LLM generates answers.
+
+**When to use sampling:**
+- Generating natural language answers from retrieved documents
+- Synthesizing information from multiple sources
+- Creating summaries with citations
+
+**Implementation Pattern** (see ADR-008 for details):
+
+```python
+from mcp.types import ModelHint, ModelPreferences, SamplingMessage, TextContent
+
+@mcp.tool()
+@require_scopes("notes:read")
+async def nc_notes_semantic_search_answer(
+    query: str, ctx: Context, limit: int = 5, max_answer_tokens: int = 500
+) -> SamplingSearchResponse:
+    # 1. Retrieve documents
+    search_response = await nc_notes_semantic_search(query, ctx, limit)
+
+    # 2. Check for no results (don't waste sampling call)
+    if not search_response.results:
+        return SamplingSearchResponse(
+            query=query,
+            generated_answer="No relevant documents found.",
+            sources=[], total_found=0, success=True
+        )
+
+    # 3. Construct prompt with retrieved context
+    prompt = f"{query}\n\nDocuments:\n{format_sources(search_response.results)}\n\nProvide answer with citations."
+
+    # 4. Request LLM completion via sampling
+    try:
+        result = await ctx.session.create_message(
+            messages=[SamplingMessage(role="user", content=TextContent(type="text", text=prompt))],
+            max_tokens=max_answer_tokens,
+            temperature=0.7,
+            model_preferences=ModelPreferences(
+                hints=[ModelHint(name="claude-3-5-sonnet")],
+                intelligencePriority=0.8,
+                speedPriority=0.5,
+            ),
+            include_context="thisServer",
+        )
+
+        return SamplingSearchResponse(
+            query=query,
+            generated_answer=result.content.text,
+            sources=search_response.results,
+            model_used=result.model,
+            stop_reason=result.stopReason,
+            success=True
+        )
+    except Exception as e:
+        # Fallback: Return documents without generated answer
+        return SamplingSearchResponse(
+            query=query,
+            generated_answer=f"[Sampling unavailable: {e}]\n\nFound {len(search_response.results)} documents.",
+            sources=search_response.results,
+            search_method="semantic_sampling_fallback",
+            success=True
+        )
+```
+
+**Key Points**:
+- **No server-side LLM**: Server has no API keys, client controls which model is used
+- **Graceful degradation**: Tool always returns useful results even if sampling fails
+- **User control**: MCP clients SHOULD prompt users to approve sampling requests
+- **No results optimization**: Skip sampling call when no documents found
+- **Fixed prompts**: Prompts are not user-configurable to avoid injection risks
+
+**Reference**: See `nc_notes_semantic_search_answer` in `nextcloud_mcp_server/server/notes.py:517` and ADR-008 for complete implementation.
+
 ## Testing Best Practices (MANDATORY)

 ### Always Run Tests
@@ -315,3 +391,7 @@ docker compose exec app php occ user_oidc:provider keycloak
 - `docs/configuration.md` - Configuration options
 - `docs/authentication.md` - Authentication modes
 - `docs/running.md` - Running the server
+
+**For additional information regarding MCP during development, see**:
+- `../../Software/modelcontextprotocol/` - MCP spec
+- `../../Software/python-sdk/` - Python MCP SDK
@@ -12,5 +12,6 @@ COPY . .
 RUN uv sync --locked --no-dev

 ENV PYTHONUNBUFFERED=1
+ENV VIRTUAL_ENV=/app/.venv

 ENTRYPOINT ["/app/.venv/bin/nextcloud-mcp-server", "--host", "0.0.0.0"]
@@ -2,284 +2,134 @@

 [![Docker Image](https://img.shields.io/badge/docker-ghcr.io/cbcoutinho/nextcloud--mcp--server-blue)](https://github.com/cbcoutinho/nextcloud-mcp-server/pkgs/container/nextcloud-mcp-server)

-**Enable AI assistants to interact with your Nextcloud instance.**
+**A production-ready MCP server that connects AI assistants to your Nextcloud instance.**

-The Nextcloud MCP (Model Context Protocol) server allows Large Language Models like Claude, GPT, and Gemini to interact with your Nextcloud data through a secure API. Create notes, manage calendars, organize contacts, work with files, and more - all through natural language.
+Enable Large Language Models like Claude, GPT, and Gemini to interact with your Nextcloud data through a secure API. Create notes, manage calendars, organize contacts, work with files, and more - all through natural language conversations.
+
+This is a **dedicated standalone MCP server** designed for external MCP clients like Claude Code and IDEs. It runs independently of Nextcloud (Docker, VM, Kubernetes, or local) and provides deep CRUD operations across Nextcloud apps.

 > [!NOTE]
-> **Nextcloud has two ways to enable AI access:** Nextcloud provides [Context Agent](https://github.com/nextcloud/context_agent), an AI agent backend that powers the [Assistant](https://github.com/nextcloud/assistant) app and allows AI to interact with Nextcloud apps like Calendar, Talk, and Contacts. Context Agent runs as an ExApp inside Nextcloud and also _[exposes an MCP server](https://docs.nextcloud.com/server/stable/admin_manual/ai/app_context_agent.html#using-nextcloud-mcp-server)_ for external MCP clients.
->
-> This project (Nextcloud MCP Server) is a **dedicated standalone MCP server** designed specifically for external MCP clients like Claude Code and IDEs, with deep CRUD operations and OAuth support. It does not require any additional AI-features to be enabled in Nextcloud beyond the apps that you intend to interact with.
-
-### High-level Comparison: Nextcloud MCP Server vs. Nextcloud AI Stack
-
-| Aspect | **Nextcloud MCP Server**<br/>(This Project) | **Nextcloud AI Stack**<br/>(Assistant + Context Agent) |
-|--------|---------------------------------------------|--------------------------------------------------------|
-| **Purpose** | External MCP client access to Nextcloud | AI assistance within Nextcloud UI |
-| **Deployment** | Standalone (Docker, VM, K8s) | Inside Nextcloud (ExApp via AppAPI) |
-| **Primary Users** | Claude Code, IDEs, external developers | Nextcloud end users via Assistant app |
-| **Authentication** | OAuth2/OIDC or Basic Auth | Session-based (integrated) |
-| **Notes Support** | ✅ Full CRUD + search (7 tools) | ❌ Not implemented |
-| **Calendar** | ✅ Full CalDAV + tasks (20+ tools) | ✅ Events, free/busy, tasks (4 tools) |
-| **Contacts** | ✅ Full CardDAV (8 tools) | ✅ Find person, current user (2 tools) |
-| **Files (WebDAV)** | ✅ Full filesystem access (12 tools) | ✅ Read, folder tree, sharing (3 tools) |
-| **Document Processing** | ✅ OCR with progress (PDF, DOCX, images) | ❌ Not implemented |
-| **Deck** | ✅ Full project management (15 tools) | ✅ Basic board/card ops (2 tools) |
-| **Tables** | ✅ Row operations (5 tools) | ❌ Not implemented |
-| **Cookbook** | ✅ Full recipe management (13 tools) | ❌ Not implemented |
-| **Talk** | ❌ Not implemented | ✅ Messages, conversations (4 tools) |
-| **Mail** | ❌ Not implemented | ✅ Send email (2 tools) |
-| **AI Features** | ❌ Not implemented | ✅ Image gen, transcription, doc gen (4 tools) |
-| **Web/Maps** | ❌ Not implemented | ✅ Search, weather, transit (5 tools) |
-| **MCP Resources** | ✅ Structured data URIs | ❌ Not supported |
-| **External MCP** | ❌ Pure server | ✅ Consumes external MCP servers |
-| **Safety Model** | Client-controlled | Built-in safe/dangerous distinction |
-| **Best For** | • Deep CRUD operations<br/>• External integrations<br/>• OAuth security<br/>• IDE/editor integration | • AI-driven actions in Nextcloud UI<br/>• Multi-service orchestration<br/>• User task automation<br/>• MCP aggregation hub |
-
-See our [detailed comparison](docs/comparison-context-agent.md) for architecture diagrams, workflow examples, and guidance on when to use each approach.
-
-Want to see another Nextcloud app supported? [Open an issue](https://github.com/cbcoutinho/nextcloud-mcp-server/issues) or contribute a pull request!
-
-### Authentication
-
-| Mode | Security | Best For |
-|------|----------|----------|
-| **OAuth2/OIDC** ⚠️ **Experimental** | 🔒 High | Testing, evaluation (requires patch for app-specific APIs) |
-| **Basic Auth** ✅ | Lower | Development, testing, production |
-
-> [!IMPORTANT]
-> **OAuth is experimental** and requires a manual patch to the `user_oidc` app for full functionality:
-> - **Required patch**: `user_oidc` app needs modifications for Bearer token support ([issue #1221](https://github.com/nextcloud/user_oidc/issues/1221))
-> - **Impact**: Without the patch, most app-specific APIs (Notes, Calendar, Contacts, Deck, etc.) will fail with 401 errors
-> - **What works without patches**: OAuth flow, PKCE support (with `oidc` v1.10.0+), OCS APIs
-> - **Production use**: Wait for upstream patch to be merged into official releases
->
-> See [OAuth Upstream Status](docs/oauth-upstream-status.md) for detailed information on required patches and workarounds.
-
-OAuth2/OIDC provides secure, per-user authentication with access tokens. See [Authentication Guide](docs/authentication.md) for details.
+> **Looking for AI features inside Nextcloud?** Nextcloud also provides [Context Agent](https://github.com/nextcloud/context_agent), which powers the Assistant app and runs as an ExApp inside Nextcloud. See [docs/comparison-context-agent.md](docs/comparison-context-agent.md) for a detailed comparison of use cases.

 ## Quick Start

-### 1. Install
+Get up and running in 60 seconds using Docker:

 ```bash
-# Clone the repository
-git clone https://github.com/cbcoutinho/nextcloud-mcp-server.git
-cd nextcloud-mcp-server
-
-# Install with uv (recommended)
-uv sync
-
-# Or using Docker
-docker pull ghcr.io/cbcoutinho/nextcloud-mcp-server:latest
-
-# Or deploy to Kubernetes with Helm
-helm repo add nextcloud-mcp https://cbcoutinho.github.io/nextcloud-mcp-server
-helm repo update
-helm install nextcloud-mcp nextcloud-mcp/nextcloud-mcp-server \
-  --set nextcloud.host=https://cloud.example.com \
-  --set auth.basic.username=myuser \
-  --set auth.basic.password=mypassword
-```
-
-See [Installation Guide](docs/installation.md) for detailed instructions, or [Helm Chart README](charts/nextcloud-mcp-server/README.md) for Kubernetes deployment.
-
-### 2. Configure
-
-Create a `.env` file:
-
-```bash
-# Copy the sample
-cp env.sample .env
-```
-
-**For Basic Auth (recommended for most users):**
-```dotenv
+# 1. Create a minimal configuration
+cat > .env << EOF
 NEXTCLOUD_HOST=https://your.nextcloud.instance.com
 NEXTCLOUD_USERNAME=your_username
 NEXTCLOUD_PASSWORD=your_app_password
-```
+EOF

-**For OAuth (experimental - requires patches):**
-```dotenv
-NEXTCLOUD_HOST=https://your.nextcloud.instance.com
-```
-
-See [Configuration Guide](docs/configuration.md) for all options.
-
-### 3. Set Up Authentication
-
-**Basic Auth Setup (recommended):**
-1. Create an app password in Nextcloud (Settings → Security → Devices & sessions)
-2. Add credentials to `.env` file
-3. Start the server
-
-**OAuth Setup (experimental):**
-1. Install Nextcloud OIDC apps (`oidc` v1.10.0+ + `user_oidc`)
-2. **Apply required patch** to `user_oidc` app for Bearer token support (see [OAuth Upstream Status](docs/oauth-upstream-status.md))
-3. Enable dynamic client registration or create an OIDC client with id & secret
-4. Configure Bearer token validation in `user_oidc`
-5. Start the server
-
-See [OAuth Quick Start](docs/quickstart-oauth.md) for 5-minute setup or [OAuth Setup Guide](docs/oauth-setup.md) for detailed instructions.
-
-### 4. Run the Server
-
-```bash
-# Load environment variables
-export $(grep -v '^#' .env | xargs)
-
-# Start with Basic Auth (default)
-uv run nextcloud-mcp-server
-
-# Or start with OAuth (experimental - requires patches)
-uv run nextcloud-mcp-server --oauth
-
-# Or with Docker
+# 2. Start the server
 docker run -p 127.0.0.1:8000:8000 --env-file .env --rm \
  ghcr.io/cbcoutinho/nextcloud-mcp-server:latest
+
+# 3. Test the connection
+curl http://127.0.0.1:8000/health/ready
 ```

-The server starts on `http://127.0.0.1:8000` by default.
+**Next Steps:**
+- Create an app password in Nextcloud: Settings → Security → Devices & sessions
+- Connect your MCP client (Claude Desktop, IDEs, `mcp dev`, etc.)
+- See [docs/installation.md](docs/installation.md) for other deployment options (local, Kubernetes)

-See [Running the Server](docs/running.md) for more options.
+## Key Features

-### 5. Connect an MCP Client
+- **90+ MCP Tools** - Comprehensive API coverage across 8 Nextcloud apps
+- **MCP Resources** - Structured data URIs for browsing Nextcloud data
+- **Semantic Search (Experimental)** - Optional vector-powered search for Notes (requires Qdrant + Ollama)
+- **Document Processing** - OCR and text extraction from PDFs, DOCX, images with progress notifications
+- **Flexible Deployment** - Docker, Kubernetes (Helm), VM, or local installation
+- **Production-Ready Auth** - Basic Auth with app passwords (recommended) or OAuth2/OIDC (experimental)
+- **Multiple Transports** - SSE, HTTP, and streamable-http support

-Test with MCP Inspector:
+## Supported Apps

-```bash
-uv run mcp dev
-```
+| App | Tools | Capabilities |
+|-----|-------|--------------|
+| **Notes** | 7 | Full CRUD, keyword search, semantic search |
+| **Calendar** | 20+ | Events, todos (tasks), recurring events, attendees, availability |
+| **Contacts** | 8 | Full CardDAV support, address books |
+| **Files (WebDAV)** | 12 | Filesystem access, OCR/document processing |
+| **Deck** | 15 | Boards, stacks, cards, labels, assignments |
+| **Cookbook** | 13 | Recipe management, URL import (schema.org) |
+| **Tables** | 5 | Row operations on Nextcloud Tables |
+| **Sharing** | 10+ | Create and manage shares |
+| **Semantic Search** | 2+ | Vector search for Notes (experimental, opt-in, requires infrastructure) |

-Or connect from:
-- Claude Desktop
-- Any MCP-compatible client
+Want to see another Nextcloud app supported? [Open an issue](https://github.com/cbcoutinho/nextcloud-mcp-server/issues) or contribute a pull request!
+
+## Authentication
+
+> [!IMPORTANT]
+> **OAuth2/OIDC is experimental** and requires a manual patch to the `user_oidc` app:
+> - **Required patch**: Bearer token support ([issue #1221](https://github.com/nextcloud/user_oidc/issues/1221))
+> - **Impact**: Without the patch, most app-specific APIs fail with 401 errors
+> - **Recommendation**: Use Basic Auth for production until upstream patches are merged
+>
+> See [docs/oauth-upstream-status.md](docs/oauth-upstream-status.md) for patch status and workarounds.
+
+**Recommended:** Basic Auth with app-specific passwords provides secure, production-ready authentication. See [docs/authentication.md](docs/authentication.md) for setup details and OAuth configuration.
+
+### Authentication Modes
+
+The server supports two authentication modes:
+
+**Single-User Mode (BasicAuth):**
+- One set of credentials shared by all MCP clients
+- Simple setup: username + app password in environment variables
+- All clients access Nextcloud as the same user
+- Best for: Personal use, development, single-user deployments
+
+**Multi-User Mode (OAuth):**
+- Each MCP client authenticates separately with their own Nextcloud account
+- Per-user scopes and permissions (clients only see tools they're authorized for)
+- More secure: tokens expire, credentials never shared with server
+- Best for: Teams, multi-user deployments, production environments with multiple users
+
+See [docs/authentication.md](docs/authentication.md) for detailed setup instructions.
+
+## Semantic Search
+
+The server provides an experimental RAG pipeline for _Semantic Search_, which lets MCP clients find information in Nextcloud based on **meaning** rather than just keywords. Instead of matching "machine learning" only when those exact words appear, it understands that "neural networks," "AI models," and "deep learning" are semantically related concepts.
+
+**Example:**
+- **Keyword search**: Query "car" only finds notes containing "car"
+- **Semantic search**: Query "car" also finds notes about "automobile," "vehicle," "sedan," "transportation"
+
+This enables natural language queries and helps discover related content across your Nextcloud notes.
+
+> [!NOTE]
+> **Semantic Search is experimental and opt-in:**
+> - Disabled by default (`VECTOR_SYNC_ENABLED=false`)
+> - Currently supports Notes app only (multi-app support planned)
+> - Requires additional infrastructure: vector database + embedding service
+> - Answer generation (`nc_semantic_search_answer`) requires MCP client sampling support
+>
+> See [docs/semantic-search-architecture.md](docs/semantic-search-architecture.md) for architecture details and [docs/configuration.md](docs/configuration.md) for setup instructions.

 ## Documentation

 ### Getting Started
-- **[Installation](docs/installation.md)** - Install the server
-- **[Configuration](docs/configuration.md)** - Environment variables and settings
-- **[Authentication](docs/authentication.md)** - OAuth vs BasicAuth
-- **[Running the Server](docs/running.md)** - Start and manage the server
+- **[Installation](docs/installation.md)** - Docker, Kubernetes, local, or VM deployment
+- **[Configuration](docs/configuration.md)** - Environment variables and advanced options
+- **[Authentication](docs/authentication.md)** - Basic Auth vs OAuth2/OIDC setup
+- **[Running the Server](docs/running.md)** - Start, manage, and troubleshoot

-### Architecture
-- **[Comparison with Context Agent](docs/comparison-context-agent.md)** - How this MCP server differs from Nextcloud's Context Agent
+### Features
+- **[App Documentation](docs/)** - Notes, Calendar, Contacts, WebDAV, Deck, Cookbook, Tables
+- **[Document Processing](docs/configuration.md#document-processing)** - OCR and text extraction setup
+- **[Semantic Search Architecture](docs/semantic-search-architecture.md)** - Experimental vector search (Notes only, opt-in)

-### OAuth Documentation (Experimental)
-- **[OAuth Quick Start](docs/quickstart-oauth.md)** - 5-minute setup guide
-- **[OAuth Setup Guide](docs/oauth-setup.md)** - Detailed setup instructions
-- **[OAuth Architecture](docs/oauth-architecture.md)** - How OAuth works
-- **[OAuth Troubleshooting](docs/oauth-troubleshooting.md)** - OAuth-specific issues
-- **[Upstream Status](docs/oauth-upstream-status.md)** - **Required patches and PRs** ⚠️
-
-### Reference
+### Advanced Topics
+- **[OAuth Architecture](docs/oauth-architecture.md)** - How OAuth works (experimental)
+- **[OAuth Quick Start](docs/quickstart-oauth.md)** - 5-minute OAuth setup
+- **[OAuth Setup Guide](docs/oauth-setup.md)** - Detailed OAuth configuration
 - **[Troubleshooting](docs/troubleshooting.md)** - Common issues and solutions
-
-### App-Specific Documentation
-- [Notes API](docs/notes.md)
-- [Calendar (CalDAV)](docs/calendar.md)
-- [Contacts (CardDAV)](docs/contacts.md)
-- [Cookbook](docs/cookbook.md)
-- [Deck](docs/deck.md)
-- [Tables](docs/table.md)
-- [WebDAV](docs/webdav.md)
-
-## MCP Tools & Resources
-
-The server exposes Nextcloud functionality through MCP tools (for actions) and resources (for data browsing).
-
-### Tools
-
-The server provides 90+ tools across 8 Nextcloud apps. When using OAuth, tools are dynamically filtered based on your granted scopes.
-
-For a complete list of all supported OAuth scopes and their descriptions, see [OAuth Scopes Documentation](docs/oauth-architecture.md#oauth-scopes).
-
-#### Available Tool Categories
-
-| App | Tools | Read Scope | Write Scope | Operations |
-|-----|-------|-----------|-------------|------------|
-| **Notes** | 7 | `notes:read` | `notes:write` | Create, read, update, delete, search notes |
-| **Calendar** | 20+ | `calendar:read` `todo:read`  | `calendar:write` `todo:write`   | Events, todos (tasks), calendars, recurring events, attendees |
-| **Contacts** | 8 | `contacts:read` | `contacts:write` | Create, read, update, delete contacts and address books |
-| **Files (WebDAV)** | 12 | `files:read` | `files:write` | List, read, upload, delete, move files; **OCR/document processing** |
-| **Deck** | 15 | `deck:read` | `deck:write` | Boards, stacks, cards, labels, assignments |
-| **Cookbook** | 13 | `cookbook:read` | `cookbook:write` | Recipes, import from URLs, search, categories |
-| **Tables** | 5 | `tables:read` | `tables:write` | Row operations on Nextcloud Tables |
-| **Sharing** | 10+ | `sharing:read` | `sharing:write` | Create, manage, delete shares |
-
-#### Document Processing (Optional)
-
-The WebDAV file reading tool (`nc_webdav_read_file`) supports **automatic text extraction** from documents and images:
-
-**Supported Formats:**
-- **Documents**: PDF, DOCX, PPTX, XLSX, RTF, ODT, EPUB
-- **Images**: PNG, JPEG, TIFF, BMP (with OCR)
-- **Email**: EML, MSG files
-
-**Features:**
-- **Progress Notifications**: Long-running OCR operations (up to 120s) send progress updates every 10 seconds to prevent client timeouts
-- **Pluggable Architecture**: Multiple processor backends (Unstructured.io, Tesseract, custom HTTP APIs)
-- **Automatic Detection**: Files are processed based on MIME type
-- **Graceful Fallback**: Returns base64-encoded content if processing fails
-
-**Configuration:**
-```dotenv
-# Enable document processing (optional)
-ENABLE_DOCUMENT_PROCESSING=true
-
-# Unstructured.io processor (cloud/API-based, supports many formats)
-ENABLE_UNSTRUCTURED=true
-UNSTRUCTURED_API_URL=http://localhost:8002
-UNSTRUCTURED_STRATEGY=auto  # auto, fast, or hi_res
-UNSTRUCTURED_LANGUAGES=eng,deu
-PROGRESS_INTERVAL=10  # Progress update interval in seconds
-
-# Tesseract processor (local OCR, images only)
-ENABLE_TESSERACT=false
-TESSERACT_LANG=eng
-
-# Custom HTTP processor
-ENABLE_CUSTOM_PROCESSOR=false
-CUSTOM_PROCESSOR_URL=http://localhost:9000/process
-CUSTOM_PROCESSOR_TYPES=application/pdf,image/jpeg
-```
-
-**Example Usage:**
-```
-AI: "Read the contents of Documents/report.pdf"
-→ Uses nc_webdav_read_file tool with automatic OCR processing
-→ Returns extracted text with parsing metadata
-→ Sends progress updates during long operations
-```
-
-See [env.sample](env.sample) for complete configuration options.
-
-**Example Tools:**
-- `nc_notes_create_note` - Create a new note
-- `nc_cookbook_import_recipe` - Import recipes from URLs with schema.org metadata
-- `deck_create_card` - Create a Deck card
-- `nc_calendar_create_event` - Create a calendar event
-- `nc_calendar_create_todo` - Create a CalDAV task/todo
-- `nc_contacts_create_contact` - Create a contact
-- `nc_webdav_upload_file` - Upload a file to Nextcloud
-- And 80+ more...
-
-> [!TIP]
-> **OAuth Scope Filtering**: When connecting via OAuth, MCP clients will only see tools for which you've granted access. For example, granting only `notes:read` and `notes:write` will show 7 Notes tools instead of all 90+ tools. See [OAuth Scopes Documentation](docs/oauth-architecture.md#oauth-scopes) for the complete scope reference, or [OAuth Troubleshooting - Limited Scopes](docs/oauth-troubleshooting.md#limited-scopes---only-seeing-notes-tools) if you're only seeing a subset of tools.
->
-> **Known Issue**: Claude Code and some other MCP clients may only request/grant Notes scopes during initial connection. Track progress at [#234](https://github.com/cbcoutinho/nextcloud-mcp-server/issues/234).
-
-### Resources
-Resources provide read-only access to Nextcloud data:
-- `nc://capabilities` - Server capabilities
-- `cookbook://version` - Cookbook app version info
-- `nc://Deck/boards/{board_id}` - Deck board data
-- `notes://settings` - Notes app settings
-- And more...
-
-Run `uv run nextcloud-mcp-server --help` to see all available options.
+- **[Comparison with Context Agent](docs/comparison-context-agent.md)** - When to use each approach

 ## Examples

@@ -289,45 +139,31 @@ AI: "Create a note called 'Meeting Notes' with today's agenda"
 → Uses nc_notes_create_note tool
 ```

-### Manage Recipes
+### Import Recipes
 ```
-AI: "Import the recipe from this URL: https://www.example.com/recipe/chocolate-cake"
-→ Uses nc_cookbook_import_recipe tool to extract schema.org metadata
+AI: "Import the recipe from https://www.example.com/recipe/chocolate-cake"
+→ Uses nc_cookbook_import_recipe tool with schema.org metadata extraction
 ```

-### Manage Calendar
+### Schedule Meetings
 ```
 AI: "Schedule a team meeting for next Tuesday at 2pm"
 → Uses nc_calendar_create_event tool
 ```

-### Organize Files
+### Manage Files
 ```
 AI: "Create a folder called 'Project X' and move all PDFs there"
-→ Uses WebDAV tools (nc_webdav_create_directory, nc_webdav_move)
+→ Uses nc_webdav_create_directory and nc_webdav_move tools
 ```

-### Project Management
+### Semantic Search (Experimental, Opt-in)
 ```
-AI: "Create a new Deck board for Q1 planning with Todo, In Progress, and Done stacks"
-→ Uses deck_create_board and deck_create_stack tools
+AI: "Find notes related to machine learning concepts"
+→ Uses nc_semantic_search to find semantically similar notes (requires Qdrant + Ollama setup)
 ```

-## Transport Protocols
-
-The server supports multiple MCP transport protocols:
-
-- **streamable-http** (recommended) - Modern streaming protocol
-- **sse** (default, deprecated) - Server-Sent Events for backward compatibility
-- **http** - Standard HTTP protocol
-
-```bash
-# Use streamable-http (recommended)
-uv run nextcloud-mcp-server --transport streamable-http
-```
-
-> [!WARNING]
-> SSE transport is deprecated and will be removed in a future MCP specification version. Please migrate to `streamable-http`.
+**Note:** For AI-generated answers with citations, use `nc_semantic_search_answer` (requires MCP client with sampling support).

 ## Contributing

@@ -335,17 +171,17 @@ Contributions are welcome!

 - Report bugs or request features: [GitHub Issues](https://github.com/cbcoutinho/nextcloud-mcp-server/issues)
 - Submit improvements: [Pull Requests](https://github.com/cbcoutinho/nextcloud-mcp-server/pulls)
-- Read [CLAUDE.md](CLAUDE.md) for development guidelines
+- Development guidelines: [CLAUDE.md](CLAUDE.md)

 ## Security

 [![MseeP.ai Security Assessment](https://mseep.net/pr/cbcoutinho-nextcloud-mcp-server-badge.png)](https://mseep.ai/app/cbcoutinho-nextcloud-mcp-server)

 This project takes security seriously:
-- OAuth2/OIDC support (experimental - requires upstream patches)
-- Basic Auth with app-specific passwords (recommended)
-- No credential storage with OAuth mode
+- Production-ready Basic Auth with app-specific passwords
+- OAuth2/OIDC support (experimental, requires upstream patches)
 - Per-user access tokens
+- No credential storage in OAuth mode
 - Regular security assessments

 Found a security issue? Please report it privately to the maintainers.
@@ -0,0 +1 @@
+charts/
@@ -0,0 +1,9 @@
+dependencies:
+- name: qdrant
+  repository: https://qdrant.github.io/qdrant-helm
+  version: 1.15.5
+- name: ollama
+  repository: https://otwld.github.io/ollama-helm
+  version: 1.34.0
+digest: sha256:d51c97d05be2614b751c0dd7267ef7dc959eff5ebef859c5f895c5c554b7a874
+generated: "2025-11-09T17:08:02.86648061Z"
@@ -2,8 +2,8 @@ apiVersion: v2
 name: nextcloud-mcp-server
 description: A Helm chart for Nextcloud MCP Server - enables AI assistants to interact with Nextcloud
 type: application
-version: 0.26.1
-appVersion: "0.26.1"
+version: 0.29.1
+appVersion: "0.29.1"
 keywords:
  - nextcloud
  - mcp
@@ -21,3 +21,12 @@ home: https://github.com/cbcoutinho/nextcloud-mcp-server
 sources:
  - https://github.com/cbcoutinho/nextcloud-mcp-server
 icon: https://raw.githubusercontent.com/nextcloud/server/master/core/img/logo/logo.svg
+dependencies:
+  - name: qdrant
+    version: "1.15.5"
+    repository: https://qdrant.github.io/qdrant-helm
+    condition: qdrant.networkMode.deploySubchart
+  - name: ollama
+    version: "1.34.0"
+    repository: https://otwld.github.io/ollama-helm
+    condition: ollama.enabled
@@ -14,8 +14,12 @@ This Helm chart deploys the Nextcloud MCP (Model Context Protocol) Server on a K
 ### Quick Start with Basic Authentication

 ```bash
+# Add the Helm repository
+helm repo add nextcloud-mcp https://cbcoutinho.github.io/nextcloud-mcp-server
+helm repo update
+
 # Install with basic auth (recommended for most users)
-helm install nextcloud-mcp ./helm/nextcloud-mcp-server \
+helm install nextcloud-mcp nextcloud-mcp/nextcloud-mcp-server \
  --set nextcloud.host=https://cloud.example.com \
  --set auth.basic.username=myuser \
  --set auth.basic.password=mypassword
@@ -47,7 +51,7 @@ resources:
 Install with your custom values:

 ```bash
-helm install nextcloud-mcp ./helm/nextcloud-mcp-server -f custom-values.yaml
+helm install nextcloud-mcp nextcloud-mcp/nextcloud-mcp-server -f custom-values.yaml
 ```

 ### OAuth Authentication Mode (Experimental)
@@ -202,6 +206,80 @@ The application exposes HTTP health check endpoints:
 | `documentProcessing.unstructured.apiUrl` | Unstructured API URL | `http://unstructured:8000` |
 | `documentProcessing.tesseract.enabled` | Enable Tesseract OCR | `false` |

+#### Vector Search & Semantic Capabilities (Optional)
+
+Enable semantic search by deploying a vector database (Qdrant) and an embedding service (Ollama or OpenAI).
+
+**Vector Sync Configuration:**
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `vectorSync.enabled` | Enable background vector synchronization | `false` |
+| `vectorSync.scanInterval` | Scan interval in seconds | `3600` |
+| `vectorSync.processorWorkers` | Number of concurrent processor workers | `3` |
+| `vectorSync.queueMaxSize` | Maximum queue size for pending documents | `10000` |
+
+**Document Chunking Configuration:**
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `documentChunking.chunkSize` | Number of words per chunk for embedding | `512` |
+| `documentChunking.chunkOverlap` | Number of overlapping words between chunks | `50` |
+
+**Chunking Strategy:**
+- **Small chunks (256-384 words)**: More precise search matches, at the cost of more vectors to store
+- **Medium chunks (512-768 words)**: Balanced approach (recommended for most use cases)
+- **Large chunks (1024+ words)**: Better context preservation, less precise matching
+- **Overlap**: Aim for 10-20% of the chunk size to preserve context across chunk boundaries (the default of 50 words is roughly 10% of 512)
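How `chunkSize` and `chunkOverlap` interact can be sketched in a few lines of Python. This is a minimal illustration that assumes plain whitespace word splitting. The names `chunk_words` and `overlap_ratio` are hypothetical, and the server's actual chunker may differ.

```python
def chunk_words(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into word-based chunks that share `chunk_overlap` words.

    Hypothetical helper for illustration; not taken from the server code.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def overlap_ratio(chunk_size: int, chunk_overlap: int) -> float:
    """Overlap as a fraction of chunk size (the guidance above suggests 0.10-0.20)."""
    return chunk_overlap / chunk_size
```

With the defaults, each window advances 462 words, so a 1,000-word document yields three chunks, and consecutive chunks share 50 words.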
+
+**Qdrant Vector Database:**
+
+Qdrant is deployed as a subchart when `qdrant.enabled` is `true`. All configuration values are passed through to the [qdrant/qdrant](https://github.com/qdrant/qdrant-helm) chart.
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `qdrant.enabled` | Deploy Qdrant as a subchart | `false` |
+| `qdrant.replicaCount` | Number of Qdrant replicas | `1` |
+| `qdrant.image.tag` | Qdrant version | `v1.12.5` |
+| `qdrant.apiKey` | Optional API key for authentication | `""` |
+| `qdrant.persistence.size` | Storage size for vector data | `10Gi` |
+| `qdrant.persistence.storageClass` | Storage class | `""` |
+| `qdrant.resources.requests.cpu` | CPU request | `200m` |
+| `qdrant.resources.requests.memory` | Memory request | `512Mi` |
+| `qdrant.resources.limits.cpu` | CPU limit | `1000m` |
+| `qdrant.resources.limits.memory` | Memory limit | `2Gi` |
+
+**Ollama Embedding Service:**
+
+Ollama is deployed as a subchart when `ollama.enabled` is `true`. All configuration values are passed through to the [otwld/ollama-helm](https://github.com/otwld/ollama-helm) chart. Alternatively, set `ollama.url` to use an external Ollama instance.
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `ollama.enabled` | Deploy Ollama as a subchart | `false` |
+| `ollama.url` | External Ollama URL (use with `enabled: false`) | `""` |
+| `ollama.embeddingModel` | Embedding model to use | `nomic-embed-text` |
+| `ollama.verifySsl` | Verify SSL certificates | `true` |
+| `ollama.replicaCount` | Number of Ollama replicas | `1` |
+| `ollama.ollama.models.pull` | Models to pull on startup | `["nomic-embed-text"]` |
+| `ollama.persistentVolume.enabled` | Enable persistent storage | `true` |
+| `ollama.persistentVolume.size` | Storage size for models | `20Gi` |
+| `ollama.resources.requests.cpu` | CPU request | `500m` |
+| `ollama.resources.requests.memory` | Memory request | `1Gi` |
+| `ollama.resources.limits.cpu` | CPU limit | `2000m` |
+| `ollama.resources.limits.memory` | Memory limit | `4Gi` |
+
+**OpenAI Embedding Provider (Alternative):**
+
+Use OpenAI or any OpenAI-compatible API instead of Ollama.
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `openai.enabled` | Enable OpenAI embedding provider | `false` |
+| `openai.apiKey` | OpenAI API key | `""` |
+| `openai.existingSecret` | Use existing secret for API key | `""` |
+| `openai.secretKey` | Key in secret containing API key | `api-key` |
+| `openai.baseUrl` | Custom API endpoint (optional) | `""` |
+
 ## Examples

 ### Example 1: Basic Auth with Ingress
@@ -379,18 +457,106 @@ affinity:
          topologyKey: kubernetes.io/hostname
 ```

+### Example 5: Semantic Search with Qdrant and Ollama
+
+Deploy with vector search capabilities, with Qdrant and Ollama deployed as subcharts:
+
+```yaml
+nextcloud:
+  host: https://cloud.example.com
+
+auth:
+  mode: basic
+  basic:
+    username: admin
+    password: secure-password
+
+# Enable vector sync
+vectorSync:
+  enabled: true
+  scanInterval: 1800  # Scan every 30 minutes
+  processorWorkers: 5
+
+# Deploy Qdrant as a subchart
+qdrant:
+  enabled: true
+  persistence:
+    size: 20Gi
+    storageClass: fast-ssd
+  resources:
+    requests:
+      cpu: 500m
+      memory: 1Gi
+    limits:
+      cpu: 2000m
+      memory: 4Gi
+
+# Deploy Ollama as a subchart
+ollama:
+  enabled: true
+  embeddingModel: nomic-embed-text
+  persistentVolume:
+    size: 30Gi
+    storageClass: standard
+  resources:
+    requests:
+      cpu: 1000m
+      memory: 2Gi
+    limits:
+      cpu: 4000m
+      memory: 8Gi
+```
+
+Or use an external Ollama instance:
+
+```yaml
+vectorSync:
+  enabled: true
+
+qdrant:
+  enabled: true
+
+# Use external Ollama instead of deploying subchart
+ollama:
+  enabled: false
+  url: "http://ollama.ai-services.svc.cluster.local:11434"
+  embeddingModel: nomic-embed-text
+```
+
+Or use OpenAI for embeddings:
+
+```yaml
+vectorSync:
+  enabled: true
+
+qdrant:
+  enabled: true
+
+# Use OpenAI instead of Ollama
+openai:
+  enabled: true
+  apiKey: "sk-..."
+  # Or use existing secret:
+  # existingSecret: openai-api-key
+  # secretKey: api-key
+```
+
 ## Upgrading

 ### To upgrade an existing deployment:

 ```bash
-helm upgrade nextcloud-mcp ./helm/nextcloud-mcp-server -f custom-values.yaml
+# Update the repository
+helm repo update
+
+# Upgrade with your custom values
+helm upgrade nextcloud-mcp nextcloud-mcp/nextcloud-mcp-server -f custom-values.yaml
 ```

 ### To upgrade with new values:

 ```bash
-helm upgrade nextcloud-mcp ./helm/nextcloud-mcp-server \
+helm upgrade nextcloud-mcp nextcloud-mcp/nextcloud-mcp-server \
  --set resources.limits.memory=1Gi
 ```

@@ -0,0 +1,90 @@
+# Grafana Dashboards
+
+This directory contains example Grafana dashboards for monitoring the Nextcloud MCP Server.
+
+## Dashboards
+
+### nextcloud-mcp-server.json
+
+Comprehensive dashboard with the following panels:
+
+- **Request Rate**: HTTP requests per second by method and endpoint
+- **Error Rate**: Percentage of 5xx errors
+- **Request Latency**: P50 and P95 latency by endpoint
+- **Top MCP Tools**: Most frequently called tools
+- **Nextcloud API Latency**: API call latency by app (notes, calendar, etc.)
+- **Vector Sync Queue**: Queue size for background document processing
+
+## Importing to Grafana
+
+### Manual Import
+
+1. Open Grafana UI
+2. Navigate to Dashboards → Import
+3. Upload `nextcloud-mcp-server.json`
+4. Select your Prometheus data source
+5. Click "Import"
+
+### Automated Import (Kubernetes)
+
+If you use the Grafana Operator or kube-prometheus-stack, create a ConfigMap from the dashboard file (the label added below is what the Grafana sidecar watches for):
+
+```bash
+kubectl create configmap nextcloud-mcp-dashboards \
+  --from-file=nextcloud-mcp-server.json \
+  -n monitoring
+
+# Add label for Grafana sidecar to discover
+kubectl label configmap nextcloud-mcp-dashboards \
+  grafana_dashboard=1 \
+  -n monitoring
+```
+
+Or add to your Helm values:
+
+```yaml
+# values.yaml for kube-prometheus-stack
+grafana:
+  dashboardProviders:
+    dashboardproviders.yaml:
+      apiVersion: 1
+      providers:
+        - name: 'nextcloud-mcp'
+          orgId: 1
+          folder: 'Nextcloud MCP'
+          type: file
+          disableDeletion: false
+          editable: true
+          options:
+            path: /var/lib/grafana/dashboards/nextcloud-mcp
+
+  dashboardsConfigMaps:
+    nextcloud-mcp: nextcloud-mcp-dashboards
+```
+
+## Dashboard Variables
+
+The dashboard includes two variables:
+
+- **Data Source**: Select your Prometheus data source
+- **Namespace**: Filter metrics by Kubernetes namespace
+
+## Customization
+
+You can customize the dashboard by:
+
+1. Adjusting refresh rate (default: 30s)
+2. Modifying time range (default: last 6 hours)
+3. Adding new panels for specific metrics
+4. Adjusting thresholds in existing panels
+
+## Metrics Reference
+
+All metrics are documented in `/docs/observability.md`. Key metric prefixes:
+
+- `mcp_http_*` - HTTP server metrics
+- `mcp_tool_*` - MCP tool invocation metrics
+- `mcp_nextcloud_api_*` - Nextcloud API call metrics
+- `mcp_oauth_*` - OAuth token validation metrics
+- `mcp_vector_sync_*` - Vector database sync metrics
+- `mcp_db_*` - Database operation metrics
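To spot-check one of these metrics outside Grafana, an expression can be sent to the Prometheus HTTP API (`/api/v1/query`). The sketch below only builds the request URL and does not contact a server. The Prometheus address and the `default` namespace are assumptions, the helper names are hypothetical, and the expression mirrors the one in the dashboard's Error Rate panel.

```python
from urllib.parse import urlencode

# Assumed in-cluster Prometheus address; replace with your own.
PROMETHEUS_URL = "http://prometheus.monitoring.svc:9090"


def error_rate_expr(namespace: str, window: str = "5m") -> str:
    """PromQL for the percentage of 5xx responses, as in the Error Rate panel."""
    selector = f'namespace="{namespace}"'
    errors = f'sum(rate(mcp_http_requests_total{{status_code=~"5..", {selector}}}[{window}]))'
    total = f"sum(rate(mcp_http_requests_total{{{selector}}}[{window}]))"
    return f"{errors} / {total} * 100"


def instant_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """URL for a Prometheus instant query, with the expression URL-encoded."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


if __name__ == "__main__":
    print(instant_query_url(error_rate_expr("default")))
```

The printed URL can be fetched with any HTTP client that can reach Prometheus.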
@@ -0,0 +1,630 @@
+{
+  "annotations": {
+    "list": [
+      {
+        "builtIn": 1,
+        "datasource": {
+          "type": "grafana",
+          "uid": "-- Grafana --"
+        },
+        "enable": true,
+        "hide": true,
+        "iconColor": "rgba(0, 211, 255, 1)",
+        "name": "Annotations & Alerts",
+        "type": "dashboard"
+      }
+    ]
+  },
+  "editable": true,
+  "fiscalYearStartMonth": 0,
+  "graphTooltip": 0,
+  "id": null,
+  "links": [],
+  "liveNow": false,
+  "panels": [
+    {
+      "datasource": {
+        "type": "prometheus",
+        "uid": "${datasource}"
+      },
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "mode": "palette-classic"
+          },
+          "custom": {
+            "axisCenteredZero": false,
+            "axisColorMode": "text",
+            "axisLabel": "",
+            "axisPlacement": "auto",
+            "barAlignment": 0,
+            "drawStyle": "line",
+            "fillOpacity": 10,
+            "gradientMode": "none",
+            "hideFrom": {
+              "tooltip": false,
+              "viz": false,
+              "legend": false
+            },
+            "lineInterpolation": "linear",
+            "lineWidth": 1,
+            "pointSize": 5,
+            "scaleDistribution": {
+              "type": "linear"
+            },
+            "showPoints": "never",
+            "spanNulls": false,
+            "stacking": {
+              "group": "A",
+              "mode": "none"
+            },
+            "thresholdsStyle": {
+              "mode": "off"
+            }
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "green",
+                "value": null
+              }
+            ]
+          },
+          "unit": "reqps"
+        },
+        "overrides": []
+      },
+      "gridPos": {
+        "h": 8,
+        "w": 12,
+        "x": 0,
+        "y": 0
+      },
+      "id": 1,
+      "options": {
+        "legend": {
+          "calcs": ["mean", "max"],
+          "displayMode": "table",
+          "placement": "bottom",
+          "showLegend": true
+        },
+        "tooltip": {
+          "mode": "multi",
+          "sort": "none"
+        }
+      },
+      "targets": [
+        {
+          "datasource": {
+            "type": "prometheus",
+            "uid": "${datasource}"
+          },
+          "expr": "sum(rate(mcp_http_requests_total{namespace=\"$namespace\"}[5m])) by (method, endpoint)",
+          "legendFormat": "{{method}} {{endpoint}}",
+          "refId": "A"
+        }
+      ],
+      "title": "Request Rate",
+      "type": "timeseries"
+    },
+    {
+      "datasource": {
+        "type": "prometheus",
+        "uid": "${datasource}"
+      },
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "mode": "thresholds"
+          },
+          "custom": {
+            "axisCenteredZero": false,
+            "axisColorMode": "text",
+            "axisLabel": "",
+            "axisPlacement": "auto",
+            "barAlignment": 0,
+            "drawStyle": "line",
+            "fillOpacity": 10,
+            "gradientMode": "none",
+            "hideFrom": {
+              "tooltip": false,
+              "viz": false,
+              "legend": false
+            },
+            "lineInterpolation": "linear",
+            "lineWidth": 1,
+            "pointSize": 5,
+            "scaleDistribution": {
+              "type": "linear"
+            },
+            "showPoints": "never",
+            "spanNulls": false,
+            "stacking": {
+              "group": "A",
+              "mode": "none"
+            },
+            "thresholdsStyle": {
+              "mode": "line"
+            }
+          },
+          "mappings": [],
+          "max": 100,
+          "min": 0,
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "green",
+                "value": null
+              },
+              {
+                "color": "yellow",
+                "value": 1
+              },
+              {
+                "color": "red",
+                "value": 5
+              }
+            ]
+          },
+          "unit": "percent"
+        },
+        "overrides": []
+      },
+      "gridPos": {
+        "h": 8,
+        "w": 12,
+        "x": 12,
+        "y": 0
+      },
+      "id": 2,
+      "options": {
+        "legend": {
+          "calcs": ["mean", "max"],
+          "displayMode": "table",
+          "placement": "bottom",
+          "showLegend": true
+        },
+        "tooltip": {
+          "mode": "multi",
+          "sort": "none"
+        }
+      },
+      "targets": [
+        {
+          "datasource": {
+            "type": "prometheus",
+            "uid": "${datasource}"
+          },
+          "expr": "sum(rate(mcp_http_requests_total{status_code=~\"5..\", namespace=\"$namespace\"}[5m])) / sum(rate(mcp_http_requests_total{namespace=\"$namespace\"}[5m])) * 100",
+          "legendFormat": "Error Rate",
+          "refId": "A"
+        }
+      ],
+      "title": "Error Rate (%)",
+      "type": "timeseries"
+    },
+    {
+      "datasource": {
+        "type": "prometheus",
+        "uid": "${datasource}"
+      },
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "mode": "palette-classic"
+          },
+          "custom": {
+            "axisCenteredZero": false,
+            "axisColorMode": "text",
+            "axisLabel": "",
+            "axisPlacement": "auto",
+            "barAlignment": 0,
+            "drawStyle": "line",
+            "fillOpacity": 10,
+            "gradientMode": "none",
+            "hideFrom": {
+              "tooltip": false,
+              "viz": false,
+              "legend": false
+            },
+            "lineInterpolation": "linear",
+            "lineWidth": 1,
+            "pointSize": 5,
+            "scaleDistribution": {
+              "type": "linear"
+            },
+            "showPoints": "never",
+            "spanNulls": false,
+            "stacking": {
+              "group": "A",
+              "mode": "none"
+            },
+            "thresholdsStyle": {
+              "mode": "off"
+            }
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "green",
+                "value": null
+              }
+            ]
+          },
+          "unit": "s"
+        },
+        "overrides": []
+      },
+      "gridPos": {
+        "h": 8,
+        "w": 12,
+        "x": 0,
+        "y": 8
+      },
+      "id": 3,
+      "options": {
+        "legend": {
+          "calcs": ["mean", "max"],
+          "displayMode": "table",
+          "placement": "bottom",
+          "showLegend": true
+        },
+        "tooltip": {
+          "mode": "multi",
+          "sort": "none"
+        }
+      },
+      "targets": [
+        {
+          "datasource": {
+            "type": "prometheus",
+            "uid": "${datasource}"
+          },
+          "expr": "histogram_quantile(0.95, sum(rate(mcp_http_request_duration_seconds_bucket{namespace=\"$namespace\"}[5m])) by (le, endpoint))",
+          "legendFormat": "{{endpoint}} (p95)",
+          "refId": "A"
+        },
+        {
+          "datasource": {
+            "type": "prometheus",
+            "uid": "${datasource}"
+          },
+          "expr": "histogram_quantile(0.50, sum(rate(mcp_http_request_duration_seconds_bucket{namespace=\"$namespace\"}[5m])) by (le, endpoint))",
+          "legendFormat": "{{endpoint}} (p50)",
+          "refId": "B"
+        }
+      ],
+      "title": "Request Latency (P50/P95)",
+      "type": "timeseries"
+    },
+    {
+      "datasource": {
+        "type": "prometheus",
+        "uid": "${datasource}"
+      },
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "mode": "palette-classic"
+          },
+          "custom": {
+            "axisCenteredZero": false,
+            "axisColorMode": "text",
+            "axisLabel": "",
+            "axisPlacement": "auto",
+            "barAlignment": 0,
+            "drawStyle": "line",
+            "fillOpacity": 10,
+            "gradientMode": "none",
+            "hideFrom": {
+              "tooltip": false,
+              "viz": false,
+              "legend": false
+            },
+            "lineInterpolation": "linear",
+            "lineWidth": 1,
+            "pointSize": 5,
+            "scaleDistribution": {
+              "type": "linear"
+            },
+            "showPoints": "never",
+            "spanNulls": false,
+            "stacking": {
+              "group": "A",
+              "mode": "none"
+            },
+            "thresholdsStyle": {
+              "mode": "off"
+            }
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "green",
+                "value": null
+              }
+            ]
+          },
+          "unit": "short"
+        },
+        "overrides": []
+      },
+      "gridPos": {
+        "h": 8,
+        "w": 12,
+        "x": 12,
+        "y": 8
+      },
+      "id": 4,
+      "options": {
+        "legend": {
+          "calcs": ["mean", "max"],
+          "displayMode": "table",
+          "placement": "bottom",
+          "showLegend": true
+        },
+        "tooltip": {
+          "mode": "multi",
+          "sort": "none"
+        }
+      },
+      "targets": [
+        {
+          "datasource": {
+            "type": "prometheus",
+            "uid": "${datasource}"
+          },
+          "expr": "topk(10, sum(rate(mcp_tool_calls_total{namespace=\"$namespace\"}[5m])) by (tool_name))",
+          "legendFormat": "{{tool_name}}",
+          "refId": "A"
+        }
+      ],
+      "title": "Top MCP Tools by Volume",
+      "type": "timeseries"
+    },
+    {
+      "datasource": {
+        "type": "prometheus",
+        "uid": "${datasource}"
+      },
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "mode": "palette-classic"
+          },
+          "custom": {
+            "axisCenteredZero": false,
+            "axisColorMode": "text",
+            "axisLabel": "",
+            "axisPlacement": "auto",
+            "barAlignment": 0,
+            "drawStyle": "line",
+            "fillOpacity": 10,
+            "gradientMode": "none",
+            "hideFrom": {
+              "tooltip": false,
+              "viz": false,
+              "legend": false
+            },
+            "lineInterpolation": "linear",
+            "lineWidth": 1,
+            "pointSize": 5,
+            "scaleDistribution": {
+              "type": "linear"
+            },
+            "showPoints": "never",
+            "spanNulls": false,
+            "stacking": {
+              "group": "A",
+              "mode": "none"
+            },
+            "thresholdsStyle": {
+              "mode": "off"
+            }
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "green",
+                "value": null
+              }
+            ]
+          },
+          "unit": "s"
+        },
+        "overrides": []
+      },
+      "gridPos": {
+        "h": 8,
+        "w": 12,
+        "x": 0,
+        "y": 16
+      },
+      "id": 5,
+      "options": {
+        "legend": {
+          "calcs": ["mean", "max"],
+          "displayMode": "table",
+          "placement": "bottom",
+          "showLegend": true
+        },
+        "tooltip": {
+          "mode": "multi",
+          "sort": "none"
+        }
+      },
+      "targets": [
+        {
+          "datasource": {
+            "type": "prometheus",
+            "uid": "${datasource}"
+          },
+          "expr": "histogram_quantile(0.95, sum(rate(mcp_nextcloud_api_duration_seconds_bucket{namespace=\"$namespace\"}[5m])) by (le, app))",
+          "legendFormat": "{{app}} (p95)",
+          "refId": "A"
+        }
+      ],
+      "title": "Nextcloud API Latency by App",
+      "type": "timeseries"
+    },
+    {
+      "datasource": {
+        "type": "prometheus",
+        "uid": "${datasource}"
+      },
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "mode": "palette-classic"
+          },
+          "custom": {
+            "axisCenteredZero": false,
+            "axisColorMode": "text",
+            "axisLabel": "",
+            "axisPlacement": "auto",
+            "barAlignment": 0,
+            "drawStyle": "line",
+            "fillOpacity": 10,
+            "gradientMode": "none",
+            "hideFrom": {
+              "tooltip": false,
+              "viz": false,
+              "legend": false
+            },
+            "lineInterpolation": "linear",
+            "lineWidth": 1,
+            "pointSize": 5,
+            "scaleDistribution": {
+              "type": "linear"
+            },
+            "showPoints": "never",
+            "spanNulls": false,
+            "stacking": {
+              "group": "A",
+              "mode": "none"
+            },
+            "thresholdsStyle": {
+              "mode": "off"
+            }
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "green",
+                "value": null
+              }
+            ]
+          },
+          "unit": "short"
+        },
+        "overrides": []
+      },
+      "gridPos": {
+        "h": 8,
+        "w": 12,
+        "x": 12,
+        "y": 16
+      },
+      "id": 6,
+      "options": {
+        "legend": {
+          "calcs": ["mean", "lastNotNull"],
+          "displayMode": "table",
+          "placement": "bottom",
+          "showLegend": true
+        },
+        "tooltip": {
+          "mode": "multi",
+          "sort": "none"
+        }
+      },
+      "targets": [
+        {
+          "datasource": {
+            "type": "prometheus",
+            "uid": "${datasource}"
+          },
+          "expr": "mcp_vector_sync_queue_size{namespace=\"$namespace\"}",
+          "legendFormat": "Queue Size",
+          "refId": "A"
+        }
+      ],
+      "title": "Vector Sync Queue Size",
+      "type": "timeseries"
+    }
+  ],
+  "refresh": "30s",
+  "schemaVersion": 38,
+  "style": "dark",
+  "tags": ["nextcloud", "mcp", "observability"],
+  "templating": {
+    "list": [
+      {
+        "current": {
+          "selected": false,
+          "text": "Prometheus",
+          "value": "Prometheus"
+        },
+        "hide": 0,
+        "includeAll": false,
+        "label": "Data Source",
+        "multi": false,
+        "name": "datasource",
+        "options": [],
+        "query": "prometheus",
+        "refresh": 1,
+        "regex": "",
+        "skipUrlSync": false,
+        "type": "datasource"
+      },
+      {
+        "current": {
+          "selected": false,
+          "text": "default",
+          "value": "default"
+        },
+        "datasource": {
+          "type": "prometheus",
+          "uid": "${datasource}"
+        },
+        "definition": "label_values(mcp_http_requests_total, namespace)",
+        "hide": 0,
+        "includeAll": false,
+        "label": "Namespace",
+        "multi": false,
+        "name": "namespace",
+        "options": [],
+        "query": {
+          "query": "label_values(mcp_http_requests_total, namespace)",
+          "refId": "PrometheusVariableQueryEditor-VariableQuery"
+        },
+        "refresh": 1,
+        "regex": "",
+        "skipUrlSync": false,
+        "sort": 0,
+        "type": "query"
+      }
+    ]
+  },
+  "time": {
+    "from": "now-6h",
+    "to": "now"
+  },
+  "timepicker": {},
+  "timezone": "",
+  "title": "Nextcloud MCP Server",
+  "uid": "nextcloud-mcp-server",
+  "version": 1,
+  "weekStart": ""
+}
@@ -69,6 +69,33 @@ Your Nextcloud MCP Server has been deployed in {{ .Values.auth.mode }} authentic
   {{- end }}
 {{- end }}

+{{- if .Values.vectorSync.enabled }}
+
+5. Vector Search & Semantic Capabilities:
+   - Vector Sync: Enabled
+   - Scan Interval: {{ .Values.vectorSync.scanInterval }}s
+   - Processor Workers: {{ .Values.vectorSync.processorWorkers }}
+   {{- if .Values.qdrant.enabled }}
+   - Qdrant: Deployed as subchart ({{ .Release.Name }}-qdrant:6333)
+   {{- else }}
+   - Qdrant: Not deployed (configure external instance)
+   {{- end }}
+   {{- if .Values.ollama.enabled }}
+   - Ollama: Deployed as subchart ({{ .Release.Name }}-ollama:11434)
+   - Embedding Model: {{ .Values.ollama.embeddingModel }}
+   {{- else if .Values.ollama.url }}
+   - Ollama: Using external instance at {{ .Values.ollama.url }}
+   - Embedding Model: {{ .Values.ollama.embeddingModel }}
+   {{- else if .Values.openai.enabled }}
+   - OpenAI: Enabled for embeddings
+   {{- else }}
+   - WARNING: No embedding provider configured (Ollama or OpenAI required)
+   {{- end }}
+
+   Check vector sync status:
+   kubectl --namespace {{ .Release.Namespace }} exec -it deploy/{{ include "nextcloud-mcp-server.fullname" . }} -- curl -s http://localhost:{{ include "nextcloud-mcp-server.port" . }}/user/page | grep "Vector Sync"
+{{- end }}
+
 For more information and documentation:
 - GitHub: https://github.com/cbcoutinho/nextcloud-mcp-server
 - Documentation: https://github.com/cbcoutinho/nextcloud-mcp-server#readme
@@ -94,6 +94,17 @@ Create the name of the PVC to use for OAuth storage
 {{- end }}
 {{- end }}

+{{/*
+Create the name of the PVC to use for Qdrant local persistent storage
+*/}}
+{{- define "nextcloud-mcp-server.qdrantPvcName" -}}
+{{- if .Values.qdrant.localPersistence.existingClaim }}
+{{- .Values.qdrant.localPersistence.existingClaim }}
+{{- else }}
+{{- include "nextcloud-mcp-server.fullname" . }}-qdrant-data
+{{- end }}
+{{- end }}
+
 {{/*
 Return the MCP server port
 */}}
@@ -5,6 +5,8 @@ metadata:
  labels:
    {{- include "nextcloud-mcp-server.labels" . | nindent 4 }}
 spec:
+  strategy:
+    type: Recreate
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
@@ -56,6 +58,11 @@ spec:
            - name: http
              containerPort: {{ include "nextcloud-mcp-server.port" . }}
              protocol: TCP
+            {{- if .Values.observability.metrics.enabled }}
+            - name: metrics
+              containerPort: {{ .Values.observability.metrics.port }}
+              protocol: TCP
+            {{- end }}
          env:
            # Nextcloud connection
            - name: NEXTCLOUD_HOST
@@ -140,6 +147,92 @@ spec:
              value: {{ .Values.documentProcessing.custom.types | quote }}
            {{- end }}
            {{- end }}
+            # Vector Sync
+            - name: VECTOR_SYNC_ENABLED
+              value: {{ .Values.vectorSync.enabled | quote }}
+            {{- if .Values.vectorSync.enabled }}
+            - name: VECTOR_SYNC_SCAN_INTERVAL
+              value: {{ .Values.vectorSync.scanInterval | quote }}
+            - name: VECTOR_SYNC_PROCESSOR_WORKERS
+              value: {{ .Values.vectorSync.processorWorkers | quote }}
+            - name: VECTOR_SYNC_QUEUE_MAX_SIZE
+              value: {{ .Values.vectorSync.queueMaxSize | quote }}
+            {{- end }}
+            # Document Chunking (always set, used by vector sync processor)
+            - name: DOCUMENT_CHUNK_SIZE
+              value: {{ .Values.documentChunking.chunkSize | quote }}
+            - name: DOCUMENT_CHUNK_OVERLAP
+              value: {{ .Values.documentChunking.chunkOverlap | quote }}
+            # Qdrant Vector Database
+            {{- if eq .Values.qdrant.mode "network" }}
+            # Network mode: Use dedicated Qdrant service
+            {{- if .Values.qdrant.networkMode.deploySubchart }}
+            - name: QDRANT_URL
+              value: "http://{{ .Release.Name }}-qdrant:6333"
+            {{- else if .Values.qdrant.networkMode.externalUrl }}
+            - name: QDRANT_URL
+              value: {{ .Values.qdrant.networkMode.externalUrl | quote }}
+            {{- end }}
+            {{- if or .Values.qdrant.networkMode.apiKey .Values.qdrant.networkMode.existingSecret }}
+            - name: QDRANT_API_KEY
+              valueFrom:
+                secretKeyRef:
+                  name: {{ .Values.qdrant.networkMode.existingSecret | default (printf "%s-qdrant" .Release.Name) }}
+                  key: {{ .Values.qdrant.networkMode.secretKey }}
+            {{- end }}
+            {{- else if eq .Values.qdrant.mode "persistent" }}
+            # Persistent local mode: File-based storage
+            - name: QDRANT_LOCATION
+              value: {{ .Values.qdrant.localPersistence.dataPath | quote }}
+            {{- else }}
+            # In-memory mode (default): Ephemeral storage
+            - name: QDRANT_LOCATION
+              value: ":memory:"
+            {{- end }}
+            - name: QDRANT_COLLECTION
+              value: {{ .Values.qdrant.collection | quote }}
+            # Ollama Embedding Service
+            {{- if or .Values.ollama.enabled .Values.ollama.url }}
+            - name: OLLAMA_BASE_URL
+              value: {{ .Values.ollama.url | default (printf "http://%s-ollama:11434" .Release.Name) | quote }}
+            - name: OLLAMA_EMBEDDING_MODEL
+              value: {{ .Values.ollama.embeddingModel | quote }}
+            - name: OLLAMA_VERIFY_SSL
+              value: {{ .Values.ollama.verifySsl | quote }}
+            {{- end }}
+            # OpenAI Embedding Provider (alternative to Ollama)
+            {{- if .Values.openai.enabled }}
+            - name: OPENAI_API_KEY
+              valueFrom:
+                secretKeyRef:
+                  name: {{ .Values.openai.existingSecret | default (printf "%s-openai" (include "nextcloud-mcp-server.fullname" .)) }}
+                  key: {{ .Values.openai.secretKey }}
+            {{- if .Values.openai.baseUrl }}
+            - name: OPENAI_BASE_URL
+              value: {{ .Values.openai.baseUrl | quote }}
+            {{- end }}
+            {{- end }}
+            # Observability
+            - name: METRICS_ENABLED
+              value: {{ .Values.observability.metrics.enabled | quote }}
+            - name: METRICS_PORT
+              value: {{ .Values.observability.metrics.port | quote }}
+            {{- if .Values.observability.tracing.enabled }}
+            - name: OTEL_ENABLED
+              value: "true"
+            - name: OTEL_EXPORTER_OTLP_ENDPOINT
+              value: {{ .Values.observability.tracing.endpoint | quote }}
+            - name: OTEL_SERVICE_NAME
+              value: {{ .Values.observability.tracing.serviceName | quote }}
+            - name: OTEL_TRACES_SAMPLER_ARG
+              value: {{ .Values.observability.tracing.samplingRate | quote }}
+            {{- end }}
+            - name: LOG_FORMAT
+              value: {{ .Values.observability.logging.format | quote }}
+            - name: LOG_LEVEL
+              value: {{ .Values.observability.logging.level | quote }}
+            - name: LOG_INCLUDE_TRACE_CONTEXT
+              value: {{ .Values.observability.logging.includeTraceContext | quote }}
            {{- with .Values.extraEnv }}
            {{- toYaml . | nindent 12 }}
            {{- end }}
@@ -160,6 +253,10 @@ spec:
            - name: oauth-storage
              mountPath: /app/.oauth
            {{- end }}
+            {{- if and (eq .Values.qdrant.mode "persistent") .Values.qdrant.localPersistence.enabled }}
+            - name: qdrant-data
+              mountPath: /app/data
+            {{- end }}
            {{- with .Values.volumeMounts }}
            {{- toYaml . | nindent 12 }}
            {{- end }}
@@ -171,6 +268,11 @@ spec:
          persistentVolumeClaim:
            claimName: {{ include "nextcloud-mcp-server.oauthPvcName" . }}
        {{- end }}
+        {{- if and (eq .Values.qdrant.mode "persistent") .Values.qdrant.localPersistence.enabled }}
+        - name: qdrant-data
+          persistentVolumeClaim:
+            claimName: {{ include "nextcloud-mcp-server.qdrantPvcName" . }}
+        {{- end }}
        {{- with .Values.volumes }}
        {{- toYaml . | nindent 8 }}
        {{- end }}
@@ -0,0 +1,11 @@
+{{- if and .Values.openai.enabled (not .Values.openai.existingSecret) }}
+apiVersion: v1
+kind: Secret
+metadata:
+  name: {{ include "nextcloud-mcp-server.fullname" . }}-openai
+  labels:
+    {{- include "nextcloud-mcp-server.labels" . | nindent 4 }}
+type: Opaque
+data:
+  {{ .Values.openai.secretKey }}: {{ .Values.openai.apiKey | b64enc | quote }}
+{{- end }}
@@ -0,0 +1,92 @@
+{{- if and .Values.observability.metrics.enabled .Values.prometheusRule.enabled }}
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: {{ include "nextcloud-mcp-server.fullname" . }}
+  namespace: {{ .Release.Namespace }}
+  labels:
+    {{- include "nextcloud-mcp-server.labels" . | nindent 4 }}
+    {{- with .Values.prometheusRule.labels }}
+    {{- toYaml . | nindent 4 }}
+    {{- end }}
+spec:
+  groups:
+    - name: nextcloud-mcp-server.critical
+      interval: 30s
+      rules:
+        - alert: NextcloudMCPServerDown
+          expr: up{job="{{ include "nextcloud-mcp-server.fullname" . }}"} == 0
+          for: 5m
+          labels:
+            severity: critical
+          annotations:
+            summary: "Nextcloud MCP Server is down"
+            description: "{{ `{{` }} $labels.pod {{ `}}` }} has been down for more than 5 minutes."
+
+        - alert: NextcloudMCPHighErrorRate
+          expr: |
+            sum(rate(mcp_http_requests_total{status_code=~"5..", job="{{ include "nextcloud-mcp-server.fullname" . }}"}[5m]))
+            / sum(rate(mcp_http_requests_total{job="{{ include "nextcloud-mcp-server.fullname" . }}"}[5m])) > 0.05
+          for: 5m
+          labels:
+            severity: critical
+          annotations:
+            summary: "High error rate on Nextcloud MCP Server"
+            description: "Error rate is {{ `{{` }} $value | humanizePercentage {{ `}}` }} (threshold: 5%)"
+
+        - alert: NextcloudMCPHighLatency
+          expr: |
+            histogram_quantile(0.95,
+              sum(rate(mcp_http_request_duration_seconds_bucket{job="{{ include "nextcloud-mcp-server.fullname" . }}"}[5m])) by (le, endpoint)
+            ) > 1
+          for: 5m
+          labels:
+            severity: critical
+          annotations:
+            summary: "High latency on Nextcloud MCP Server"
+            description: "P95 latency is {{ `{{` }} printf \"%.2fs\" $value {{ `}}` }} on {{ `{{` }} $labels.endpoint {{ `}}` }} (threshold: 1s)"
+
+        - alert: NextcloudMCPDependencyDown
+          expr: mcp_dependency_health{job="{{ include "nextcloud-mcp-server.fullname" . }}"} == 0
+          for: 2m
+          labels:
+            severity: critical
+          annotations:
+            summary: "Nextcloud MCP dependency is down"
+            description: "Dependency {{ `{{` }} $labels.dependency {{ `}}` }} has been down for more than 2 minutes."
+
+    - name: nextcloud-mcp-server.warning
+      interval: 30s
+      rules:
+        - alert: NextcloudMCPTokenValidationErrors
+          expr: |
+            sum(rate(mcp_oauth_token_validations_total{result="error", job="{{ include "nextcloud-mcp-server.fullname" . }}"}[10m]))
+            / sum(rate(mcp_oauth_token_validations_total{job="{{ include "nextcloud-mcp-server.fullname" . }}"}[10m])) > 0.01
+          for: 10m
+          labels:
+            severity: warning
+          annotations:
+            summary: "High token validation error rate"
+            description: "Token validation error rate is {{ `{{` }} $value | humanizePercentage {{ `}}` }} (threshold: 1%)"
+
+        - alert: NextcloudMCPVectorSyncQueueHigh
+          expr: mcp_vector_sync_queue_size{job="{{ include "nextcloud-mcp-server.fullname" . }}"} > 100
+          for: 15m
+          labels:
+            severity: warning
+          annotations:
+            summary: "Vector sync queue is high"
+            description: "Vector sync queue size is {{ `{{` }} $value {{ `}}` }} (threshold: 100)"
+
+        - alert: NextcloudMCPQdrantSlowQueries
+          expr: |
+            histogram_quantile(0.95,
+              sum(rate(mcp_db_operation_duration_seconds_bucket{db="qdrant", job="{{ include "nextcloud-mcp-server.fullname" . }}"}[10m])) by (le)
+            ) > 0.5
+          for: 10m
+          labels:
+            severity: warning
+          annotations:
+            summary: "Qdrant queries are slow"
+            description: "P95 Qdrant query latency is {{ `{{` }} printf \"%.2fs\" $value {{ `}}` }} (threshold: 0.5s)"
+{{- end }}
@@ -15,3 +15,21 @@ spec:
    requests:
      storage: {{ .Values.auth.oauth.persistence.size }}
 {{- end }}
+---
+{{- if and (eq .Values.qdrant.mode "persistent") .Values.qdrant.localPersistence.enabled (not .Values.qdrant.localPersistence.existingClaim) }}
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: {{ include "nextcloud-mcp-server.fullname" . }}-qdrant-data
+  labels:
+    {{- include "nextcloud-mcp-server.labels" . | nindent 4 }}
+spec:
+  accessModes:
+    - {{ .Values.qdrant.localPersistence.accessMode }}
+  {{- if .Values.qdrant.localPersistence.storageClass }}
+  storageClassName: {{ .Values.qdrant.localPersistence.storageClass }}
+  {{- end }}
+  resources:
+    requests:
+      storage: {{ .Values.qdrant.localPersistence.size }}
+{{- end }}
@@ -15,5 +15,11 @@ spec:
      targetPort: http
      protocol: TCP
      name: http
+    {{- if .Values.observability.metrics.enabled }}
+    - port: {{ .Values.observability.metrics.port }}
+      targetPort: metrics
+      protocol: TCP
+      name: metrics
+    {{- end }}
  selector:
    {{- include "nextcloud-mcp-server.selectorLabels" . | nindent 4 }}
@@ -0,0 +1,32 @@
+{{- if and .Values.observability.metrics.enabled .Values.serviceMonitor.enabled }}
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  name: {{ include "nextcloud-mcp-server.fullname" . }}
+  namespace: {{ .Release.Namespace }}
+  labels:
+    {{- include "nextcloud-mcp-server.labels" . | nindent 4 }}
+    {{- with .Values.serviceMonitor.labels }}
+    {{- toYaml . | nindent 4 }}
+    {{- end }}
+spec:
+  selector:
+    matchLabels:
+      {{- include "nextcloud-mcp-server.selectorLabels" . | nindent 6 }}
+  endpoints:
+    - port: metrics
+      path: {{ .Values.observability.metrics.path }}
+      interval: {{ .Values.serviceMonitor.interval }}
+      scrapeTimeout: {{ .Values.serviceMonitor.scrapeTimeout }}
+      scheme: http
+      relabelings:
+        # Add namespace label
+        - sourceLabels: [__meta_kubernetes_namespace]
+          targetLabel: namespace
+        # Add pod label
+        - sourceLabels: [__meta_kubernetes_pod_name]
+          targetLabel: pod
+        # Add service label
+        - sourceLabels: [__meta_kubernetes_service_name]
+          targetLabel: service
+{{- end }}
@@ -168,6 +168,43 @@ securityContext:
  runAsNonRoot: true
  runAsUser: 1000

+# Observability Configuration
+observability:
+  # Prometheus metrics
+  metrics:
+    enabled: true
+    port: 9090
+    path: /metrics
+
+  # OpenTelemetry tracing
+  tracing:
+    enabled: false
+    endpoint: ""  # e.g., "http://opentelemetry-collector:4317"
+    serviceName: "nextcloud-mcp-server"
+    samplingRate: 1.0
+
+  # Logging configuration
+  logging:
+    format: json  # "json" or "text"
+    level: INFO
+    includeTraceContext: true
+
+# Prometheus ServiceMonitor (requires Prometheus Operator)
+serviceMonitor:
+  enabled: false
+  interval: 30s
+  scrapeTimeout: 10s
+  labels: {}
+  # Additional labels for ServiceMonitor (e.g., for Prometheus selector)
+  # Example: { prometheus: kube-prometheus }
+
+# Prometheus alert rules (requires Prometheus Operator)
+prometheusRule:
+  enabled: false
+  labels: {}
+  # Additional labels for PrometheusRule (e.g., for Prometheus selector)
+  # Example: { prometheus: kube-prometheus }
+
 service:
  type: ClusterIP
  port: 8000
@@ -264,3 +301,151 @@ extraEnvFrom: []
 #     name: my-configmap
 # - secretRef:
 #     name: my-secret
+
+# Vector Sync Configuration
+# Background synchronization of Nextcloud content into vector database for semantic search
+vectorSync:
+  # Enable background vector synchronization
+  enabled: false
+  # Scan interval in seconds (how often to check for changes)
+  scanInterval: 3600
+  # Number of concurrent processor workers
+  processorWorkers: 3
+  # Maximum queue size for documents pending indexing
+  queueMaxSize: 10000
+
+# Document Chunking Configuration
+# Controls how documents are split into chunks before embedding
+# Only relevant when vectorSync.enabled is true
+documentChunking:
+  # Number of words per chunk (default: 512)
+  # Smaller chunks (256-384): Better for precise searches, more chunks to store
+  # Medium chunks (512-768): Balanced approach (recommended for most use cases)
+  # Larger chunks (1024+): Better for context, less precise matching
+  chunkSize: 512
+  # Number of overlapping words between chunks (default: 50)
+  # Recommended: 10-20% of chunkSize for context preservation across boundaries
+  # Must be less than chunkSize
+  chunkOverlap: 50
+
+# Qdrant Vector Database Configuration
+# Three deployment modes available:
+# 1. Local In-Memory: Fast, ephemeral, zero-config (mode: "memory")
+# 2. Local Persistent: File-based, survives restarts (mode: "persistent")
+# 3. Network: Dedicated Qdrant service, production-ready (mode: "network")
+qdrant:
+  # Qdrant mode: "memory", "persistent", or "network"
+  # - memory: In-memory storage (:memory:) - default, zero config, data lost on restart
+  # - persistent: Local file storage - data persists across restarts, suitable for small/medium deployments
+  # - network: Dedicated Qdrant service (see networkMode below)
+  mode: "memory"
+
+  # Collection name for vector data
+  collection: "nextcloud_content"
+
+  # Local persistent mode configuration (only used when mode: "persistent")
+  localPersistence:
+    # Enable persistent volume for local Qdrant data
+    enabled: true
+    # Storage class (leave empty for default)
+    storageClass: ""
+    accessMode: ReadWriteOnce
+    # Size for local Qdrant storage
+    size: 1Gi
+    # Absolute path where Qdrant data is stored (must be under /app/data, where the volume is mounted)
+    # Default: /app/data/qdrant
+    dataPath: "/app/data/qdrant"
+    # Use existing PVC
+    existingClaim: ""
+
+  # Network mode configuration (only used when mode: "network")
+  networkMode:
+    # Deploy Qdrant as a subchart (if true) or use external Qdrant (if false)
+    deploySubchart: false
+    # External Qdrant URL (used when deploySubchart: false)
+    # Example: "http://qdrant.default.svc.cluster.local:6333"
+    externalUrl: ""
+    # Optional API key for Qdrant authentication
+    apiKey: ""
+    # Use existing secret for API key
+    existingSecret: ""
+    secretKey: "api-key"
+
+  # Qdrant subchart configuration (only used when mode: "network" and networkMode.deploySubchart: true)
+  # All values are passed through to the qdrant/qdrant chart.
+  # See https://github.com/qdrant/qdrant-helm for full configuration options.
+  subchart:
+    # Number of Qdrant replicas
+    replicaCount: 1
+    image:
+      # Qdrant version
+      tag: v1.12.5
+    config:
+      cluster:
+        # Enable distributed cluster mode
+        enabled: false
+    # Persistent storage for vector data
+    persistence:
+      size: 10Gi
+      storageClass: ""
+      accessModes:
+        - ReadWriteOnce
+    # Resource limits and requests
+    resources:
+      requests:
+        cpu: 200m
+        memory: 512Mi
+      limits:
+        cpu: 1000m
+        memory: 2Gi
+
+# Ollama Embedding Service
+# Deployed as a subchart when enabled. All values are passed through to the ollama/ollama chart.
+# See https://github.com/otwld/ollama-helm for full configuration options.
+ollama:
+  # Enable Ollama subchart deployment
+  # Set to true to deploy Ollama as a subchart, or false to use an external Ollama instance
+  enabled: false
+  # External Ollama URL (use this if you have Ollama deployed elsewhere)
+  # When set, use enabled: false to prevent deploying the subchart
+  # Example: "http://ollama.default.svc.cluster.local:11434"
+  url: ""
+  # Embedding model to use
+  embeddingModel: "nomic-embed-text"
+  # Verify SSL certificates when connecting to Ollama
+  verifySsl: true
+  # Number of Ollama replicas (only used when subchart is deployed)
+  replicaCount: 1
+  # Ollama configuration (only used when subchart is deployed)
+  ollama:
+    # Models to automatically pull on startup
+    models:
+      pull:
+        - nomic-embed-text
+  # Persistent storage for models (only used when subchart is deployed)
+  persistentVolume:
+    enabled: true
+    size: 20Gi
+    storageClass: ""
+  # Resource limits and requests (only used when subchart is deployed)
+  resources:
+    requests:
+      cpu: 500m
+      memory: 1Gi
+    limits:
+      cpu: 2000m
+      memory: 4Gi
+
+# OpenAI-compatible Embedding Provider
+# Alternative to Ollama for embedding generation. Can be used with OpenAI or any compatible API.
+openai:
+  # Enable OpenAI embedding provider
+  enabled: false
+  # OpenAI API key (only used if existingSecret is not set)
+  apiKey: ""
+  # Name of existing secret containing the API key
+  existingSecret: ""
+  # Key in the secret that contains the API key
+  secretKey: "api-key"
+  # Optional custom API endpoint (e.g., for Azure OpenAI or local compatible services)
+  baseUrl: ""
@@ -58,7 +58,7 @@ services:
      - ./tests/fixtures/nginx.conf:/etc/nginx/nginx.conf:ro

  unstructured:
-    image: downloads.unstructured.io/unstructured-io/unstructured-api:latest@sha256:a43ab55898599157fb0e0e097dabb8ecdd1d8e3df1ae5b67c6e15a136b171a6c
+    image: downloads.unstructured.io/unstructured-io/unstructured-api:latest@sha256:54282d3a25f33fd6cf69bc45b3d37770f213593f58b6dfe5e85fe546376b2807
    restart: always
    ports:
      - 127.0.0.1:8002:8000
@@ -76,11 +76,46 @@ services:
        condition: service_healthy
    ports:
      - 127.0.0.1:8000:8000
+    volumes:
+      - mcp-data:/app/data
    environment:
      - NEXTCLOUD_HOST=http://app:80
      - NEXTCLOUD_USERNAME=admin
      - NEXTCLOUD_PASSWORD=admin

+      # Vector sync configuration (ADR-007)
+      - VECTOR_SYNC_ENABLED=true
+      - VECTOR_SYNC_SCAN_INTERVAL=10
+      - VECTOR_SYNC_PROCESSOR_WORKERS=1
+
+      - LOG_FORMAT=text
+
+      # Qdrant configuration (three modes):
+      # 1. Network mode: Set QDRANT_URL=http://qdrant:6333 (requires qdrant service)
+      # 2. In-memory mode: Set QDRANT_LOCATION=:memory: (default if nothing set)
+      # 3. Persistent local: Set QDRANT_LOCATION=/app/data/qdrant (stored in mcp-data volume)
+      - "QDRANT_LOCATION=:memory:"  # In-memory mode for CI/testing (no external service required)
+      #- QDRANT_URL=http://qdrant:6333  # Uncomment for network mode
+      #- QDRANT_API_KEY=${QDRANT_API_KEY:-my_secret_api_key}  # Only for network mode
+
+      # Collection naming: Auto-generated as {deployment-id}-{model-name}
+      # - Deployment ID: OTEL_SERVICE_NAME (if set) or hostname (fallback)
+      # - Model name: OLLAMA_EMBEDDING_MODEL
+      # - Example: "nextcloud-mcp-server-nomic-embed-text"
+      # - Changing models creates new collection (requires re-embedding)
+      # - Set QDRANT_COLLECTION to override auto-generation:
+      - QDRANT_COLLECTION=nextcloud_content
+
+      # Ollama configuration (optional - uses SimpleEmbeddingProvider if not set)
+      # - OLLAMA_BASE_URL=https://ollama.internal.coutinho.io:443
+      # - OLLAMA_EMBEDDING_MODEL=nomic-embed-text  # Changing this creates new collection
+      # - OLLAMA_VERIFY_SSL=false
+
+      # Document chunking configuration (for vector embeddings)
+      # Tune these based on your embedding model and content type
+      # - DOCUMENT_CHUNK_SIZE=512      # Words per chunk (default: 512)
+      # - DOCUMENT_CHUNK_OVERLAP=50    # Overlapping words (default: 50, recommended: 10-20% of chunk size)
+
  mcp-oauth:
    build: .
    command: ["--transport", "streamable-http", "--oauth", "--port", "8001", "--oauth-token-type", "jwt"]
@@ -183,6 +218,24 @@ services:
      - keycloak-tokens:/app/data
      - keycloak-oauth-storage:/app/.oauth

+  qdrant:
+    image: qdrant/qdrant:v1.15.5
+    restart: always
+    ports:
+      - 127.0.0.1:6333:6333  # REST API
+      - 127.0.0.1:6334:6334  # gRPC (optional)
+    volumes:
+      - qdrant-data:/qdrant/storage
+    environment:
+      - QDRANT__SERVICE__API_KEY=${QDRANT_API_KEY:-my_secret_api_key}
+    healthcheck:
+      test: ["CMD-SHELL", "test -f /qdrant/.qdrant-initialized"]
+      interval: 10s
+      timeout: 5s
+      retries: 10
+    profiles:
+      - qdrant
+
 volumes:
  nextcloud:
  db:
@@ -190,3 +243,5 @@ volumes:
  oauth-tokens:
  keycloak-tokens:
  keycloak-oauth-storage:
+  qdrant-data:
+  mcp-data:
@@ -1,7 +1,9 @@
 # ADR-003: Vector Database and Semantic Search Architecture

 ## Status
-Proposed
+Superseded by ADR-007
+
+**Note**: This ADR was never implemented. The core technical decisions (Qdrant, embeddings, hybrid search) remain valid and are incorporated into ADR-007, which adds user-controlled background job management, task queuing, multi-user scheduling, and web UI integration. See [ADR-007: Background Vector Sync with User-Controlled Job Management](./ADR-007-background-vector-sync-job-management.md) for the implemented architecture.

 ## Context

@@ -0,0 +1,647 @@
+# ADR-008: MCP Sampling for Multi-App Semantic Search with RAG
+
+**Status**: Proposed
+**Date**: 2025-01-11
+**Depends On**: ADR-007 (Background Vector Sync)
+
+## Context
+
+ADR-007 established a background synchronization architecture that maintains a vector database of Nextcloud content across multiple apps (notes, calendar, deck, files, contacts), enabling semantic search via the `nc_semantic_search` tool. This tool returns a list of relevant documents with excerpts, similarity scores, and metadata—providing the raw materials for answering user questions.
+
+However, users typically don't want a list of documents—they want answers to their questions. When a user asks "What are my project goals?" or "When is my next dentist appointment?", they expect a natural language response that synthesizes information from multiple sources and document types, not a ranked list of excerpts. This is the pattern of Retrieval-Augmented Generation (RAG): retrieve relevant context from all Nextcloud apps, then generate a cohesive answer.
+
+The challenge is: who should generate the answer, and how?
+
+**Option 1: Server-side LLM**
+The MCP server could maintain its own LLM connection (OpenAI API, Ollama, etc.), construct prompts from retrieved documents, and return generated answers directly. This approach has significant drawbacks:
+
+- **Duplicate infrastructure**: MCP clients (like Claude Desktop) already have LLM capabilities. The server would duplicate this with its own LLM integration, API keys, and configuration.
+- **Cost and billing**: The server operator bears LLM costs for all users, creating billing and quota management challenges.
+- **Limited model choice**: Users are locked into whatever LLM the server configures. They cannot choose their preferred model or provider.
+- **Privacy concerns**: User queries and document contents flow through a server-controlled LLM, creating a potential privacy boundary.
+- **Configuration complexity**: Server operators must configure embedding services (for search) AND generation models (for answers), each with different API keys, rate limits, and failure modes.
+
+**Option 2: Return documents, let client generate**
+The server could simply return retrieved documents and rely on the MCP client's existing LLM to generate answers. The user would call `nc_semantic_search`, receive documents, and then the client would include those documents in its context when responding to the user's original question. This approach also has limitations:
+
+- **Context window waste**: The client must include all document content in its context window, even if only small excerpts are relevant. For 5-10 documents, this can consume significant context space.
+- **Inconsistent behavior**: Whether the client synthesizes an answer or just displays documents depends on the client's implementation and the user's conversational style. There's no guaranteed answer generation.
+- **Poor citations**: The client may generate an answer but fail to cite which specific documents were used, making it hard to verify claims.
+- **User confusion**: Users see a tool that returns "search results" rather than "answers", requiring them to explicitly ask for synthesis.
+
+**Option 3: MCP Sampling**
+The Model Context Protocol specification includes a **sampling** capability that allows MCP servers to request LLM completions from their clients. The server constructs a prompt with retrieved context, sends it to the client via `sampling/createMessage`, and the client's LLM generates a response that the server can return as a tool result.
+
+This approach combines the best of both options:
+
+- **No server-side LLM**: The server has no API keys, no LLM configuration, no billing concerns.
+- **User choice**: The MCP client controls which LLM is used (Claude, GPT-4, local Ollama) and who pays for it.
+- **User transparency**: MCP clients SHOULD present sampling requests to users for approval, making it clear when the server is requesting an LLM call.
+- **Consistent citations**: The server constructs a prompt that explicitly includes document references, ensuring generated answers cite sources.
+- **Single tool call**: Users call one tool (`nc_semantic_search_answer`) and receive a complete answer with citations, with no multi-turn conversation needed.
+
+The sampling approach shifts responsibility appropriately: the MCP server is responsible for information retrieval and context construction (its expertise), while the MCP client is responsible for LLM access and user preferences (its expertise). This follows the MCP design philosophy of separating concerns between servers (data access) and clients (user interaction).
+
+However, sampling introduces new considerations:
+
+**Client compatibility**: Not all MCP clients implement sampling. The server must gracefully degrade when sampling is unavailable, falling back to returning documents without generated answers.
+
+**Latency**: Sampling adds a full round-trip to the client and back, plus LLM generation time. A typical flow involves: (1) client calls tool, (2) server retrieves documents, (3) server requests sampling from client, (4) client generates answer, (5) server returns answer to client. This can take 2-5 seconds depending on LLM speed, compared to 100-500ms for document retrieval alone.
+
+**User approval**: MCP clients SHOULD prompt users to approve sampling requests, allowing users to review the prompt before sending it to their LLM. This is a privacy and security feature (prevents servers from making arbitrary LLM requests) but adds interaction friction.
+
+**Prompt engineering**: The server must construct effective prompts that guide the LLM to generate useful, well-cited answers. Unlike Option 1 where the server controls the LLM directly, the server has less control over how the prompt is interpreted.
+
+Despite these considerations, MCP sampling provides the most principled solution for RAG-enhanced semantic search. It respects the client-server boundary, avoids duplicate infrastructure, and delivers the experience users expect from semantic search tools.
+
+This ADR proposes adding a new tool, `nc_semantic_search_answer`, that uses MCP sampling to generate natural language answers from retrieved Nextcloud content across all indexed apps (notes, calendar, deck, files, contacts).
+
+## Decision
+
+We will implement a new MCP tool `nc_semantic_search_answer` that retrieves relevant documents via vector similarity search across all indexed Nextcloud apps and uses MCP sampling to generate natural language answers. The tool will construct a prompt that includes the user's original query and excerpts from retrieved documents (notes, calendar events, deck cards, files, contacts), request an LLM completion via `ctx.session.create_message()`, and return the generated answer along with source citations.
+
+The existing `nc_semantic_search` tool will remain unchanged, providing users with a choice: call the original tool for raw document results, or call the new sampling-enhanced tool for generated answers. This dual-tool approach respects different use cases—some users want to browse documents, others want direct answers.
+
+### API Design
+
+**Tool Signature**:
+```python
+@mcp.tool()
+@require_scopes("semantic:read")
+async def nc_semantic_search_answer(
+    query: str,
+    ctx: Context,
+    limit: int = 5,
+    score_threshold: float = 0.7,
+    max_answer_tokens: int = 500,
+) -> SamplingSearchResponse: ...
+```
+
+**Parameters**:
+- `query`: The user's natural language question
+- `ctx`: MCP context for session access
+- `limit`: Maximum documents to retrieve (default 5)
+- `score_threshold`: Minimum similarity score 0-1 (default 0.7)
+- `max_answer_tokens`: Maximum tokens for generated answer (default 500)
+
+**Response Model**:
+```python
+class SamplingSearchResponse(BaseResponse):
+    query: str                              # Original user query
+    generated_answer: str                   # LLM-generated answer
+    sources: list[SemanticSearchResult]     # Supporting documents
+    total_found: int                        # Total matching documents
+    search_method: str = "semantic_sampling"
+    model_used: str | None = None           # Model that generated answer
+    stop_reason: str | None = None          # Why generation stopped
+```
+
+The response includes both the generated answer (for direct user consumption) and the source documents (for verification and citation). The `model_used` field records which LLM generated the answer, allowing users to understand which model provided the response.
+
+### Sampling API Usage
+
+The tool uses the MCP Python SDK's `ServerSession.create_message()` API:
+
+```python
+from mcp.types import SamplingMessage, TextContent, ModelPreferences, ModelHint
+
+# Construct prompt with retrieved context
+prompt = (
+    f"{query}\n\n"
+    f"Here are relevant documents from Nextcloud (notes, calendar events, deck cards, files, contacts):\n\n"
+    f"{context}\n\n"
+    f"Based on the documents above, please provide a comprehensive answer. "
+    f"Cite the document numbers when referencing specific information."
+)
+
+# Request LLM completion via MCP sampling
+sampling_result = await ctx.session.create_message(
+    messages=[
+        SamplingMessage(
+            role="user",
+            content=TextContent(type="text", text=prompt),
+        )
+    ],
+    max_tokens=max_answer_tokens,
+    temperature=0.7,
+    model_preferences=ModelPreferences(
+        hints=[ModelHint(name="claude-3-5-sonnet")],
+        intelligencePriority=0.8,
+        speedPriority=0.5,
+    ),
+    include_context="thisServer",
+)
+
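+# Extract answer from response (sampling content is not guaranteed to be text)
+if sampling_result.content.type == "text":
+    generated_answer = sampling_result.content.text
+else:
+    generated_answer = "[Sampling returned non-text content]"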
+```
+
+**Key parameters**:
+- `messages`: Chat-style messages with role ("user" or "assistant") and content
+- `max_tokens`: Limits response length to control costs and latency
+- `temperature`: 0.7 balances creativity with consistency for factual answers
+- `model_preferences`: Hints suggest Claude Sonnet for balanced intelligence/speed
+- `include_context`: "thisServer" includes MCP server context in client's LLM call
+
+The `include_context` parameter is particularly important. When set to "thisServer", the MCP client provides its LLM with context about the server's capabilities, tools, and resources. This allows the LLM to reference the Nextcloud MCP server when generating answers, creating more contextually appropriate responses. For example, the LLM might say "Based on your Nextcloud Notes..." rather than generic phrasing.
+
+### Prompt Construction
+
+The prompt construction follows a structured template:
+
+```
+[User's original query]
+
+Here are relevant documents from Nextcloud (notes, calendar events, deck cards, files, contacts):
+
+[Document 1]
+Type: note
+Title: Project Kickoff Notes
+Category: Work
+Excerpt: The primary goal for Q1 2025 is to improve semantic search...
+Relevance Score: 0.92
+
+[Document 2]
+Type: calendar_event
+Title: Team Planning Meeting
+Location: Conference Room A
+Excerpt: Scheduled for Jan 15 at 2pm. Agenda: Discuss Q1 objectives and timeline...
+Relevance Score: 0.88
+
+[Document 3]
+Type: deck_card
+Title: Implement semantic search
+Labels: feature, high-priority
+Excerpt: This card tracks the semantic search implementation. Due: Jan 30...
+Relevance Score: 0.85
+
+Based on the documents above, please provide a comprehensive answer.
+Cite the document numbers when referencing specific information.
+```
+
+This structure ensures:
+- The user's original query is preserved verbatim
+- Documents are clearly delineated and numbered for citation
+- Metadata (title, category, score) provides context
+- Explicit instruction to cite sources encourages proper attribution
+
+The prompt is intentionally simple and fixed (not configurable). Allowing users to customize the prompt would complicate the API and introduce prompt injection risks. The fixed structure ensures consistent, well-cited answers across all users.
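The template above can be assembled with a small helper. This is a minimal sketch, not the project's code: the `Doc` dataclass, its field names, and `build_prompt` are hypothetical stand-ins, and the per-type metadata (Category, Location, Labels) is assumed to arrive as a plain dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a semantic search result."""

    doc_type: str
    title: str
    excerpt: str
    score: float
    # Per-type metadata, e.g. {"Category": "Work"} or {"Location": "Conference Room A"}
    metadata: dict[str, str] = field(default_factory=dict)


def build_prompt(query: str, docs: list[Doc]) -> str:
    """Assemble the fixed prompt: query, numbered documents, citation instruction."""
    blocks = []
    for idx, doc in enumerate(docs, start=1):
        lines = [f"[Document {idx}]", f"Type: {doc.doc_type}", f"Title: {doc.title}"]
        lines += [f"{key}: {value}" for key, value in doc.metadata.items()]
        lines += [f"Excerpt: {doc.excerpt}", f"Relevance Score: {doc.score:.2f}"]
        blocks.append("\n".join(lines))

    return (
        f"{query}\n\n"
        "Here are relevant documents from Nextcloud "
        "(notes, calendar events, deck cards, files, contacts):\n\n"
        + "\n\n".join(blocks)
        + "\n\nBased on the documents above, please provide a comprehensive answer.\n"
        "Cite the document numbers when referencing specific information."
    )
```

Numbering the documents in the same loop that renders them keeps the citation numbers in the answer aligned with the order of the returned sources.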
+
+### Fallback Behavior
+
+Sampling may fail for several reasons:
+- Client doesn't support sampling (e.g., MCP Inspector without callbacks)
+- User declines the sampling request
+- Network errors during sampling round-trip
+- LLM generation errors
+
+The tool handles all failures gracefully by falling back to returning documents without a generated answer:
+
+```python
+try:
+    sampling_result = await ctx.session.create_message(...)
+    generated_answer = sampling_result.content.text
+except Exception as e:
+    logger.warning(f"Sampling failed: {e}, returning search results only")
+    generated_answer = (
+        f"[Sampling unavailable: {str(e)}]\n\n"
+        f"Found {search_response.total_found} relevant documents. Please review the sources below."
+    )
+```
+
+This ensures the tool always returns useful information—either a generated answer or the underlying documents—rather than failing completely. The user knows sampling was attempted (via the `[Sampling unavailable]` prefix) and can still access the retrieved context.
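The fallback path can be exercised without a real MCP client. The sketch below is an illustration only: `NoSamplingSession` and `answer_or_fallback` are hypothetical names, and the stub's `create_message` takes simplified arguments rather than the `SamplingMessage` objects the real call expects.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class NoSamplingSession:
    """Hypothetical stub for a client session that cannot sample."""

    async def create_message(self, **kwargs):
        raise RuntimeError("Sampling not supported")


async def answer_or_fallback(session, prompt: str, total_found: int) -> tuple[str, str]:
    """Return (answer, search_method); a sampling failure never propagates."""
    try:
        # Simplified call: the real API takes SamplingMessage objects.
        result = await session.create_message(messages=[prompt], max_tokens=500)
        return result.content.text, "semantic_sampling"
    except Exception as exc:  # broad on purpose: any failure degrades to documents
        logger.warning("Sampling failed (%s: %s)", type(exc).__name__, exc)
        return (
            f"[Sampling unavailable: {exc}]\n\n"
            f"Found {total_found} relevant documents. Please review the sources below.",
            "semantic_sampling_fallback",
        )


answer, method = asyncio.run(
    answer_or_fallback(NoSamplingSession(), "What are my goals?", 3)
)
```

Catching `Exception` broadly is a deliberate trade-off here: the documents are already retrieved, so any sampling failure is treated as recoverable.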
+
+### No Results Handling
+
+When semantic search finds no relevant documents (all below `score_threshold`), the tool returns a clear message without attempting sampling:
+
+```python
+if not search_response.results:
+    return SamplingSearchResponse(
+        query=query,
+        generated_answer="No relevant documents found in your Nextcloud content for this query.",
+        sources=[],
+        total_found=0,
+        search_method="semantic_sampling",
+        success=True,
+    )
+```
+
+This avoids wasting a sampling call (and user approval) when there's no content to base an answer on.
+
+### User Experience Flow
+
+**Typical successful flow**:
+1. User calls `nc_semantic_search_answer` with query "What are my Q1 2025 objectives?"
+2. Server retrieves 5 relevant documents via vector search (2 notes, 2 calendar events, 1 deck card)
+3. Server constructs prompt with document excerpts showing mixed content types
+4. Server sends `sampling/createMessage` request to client
+5. Client prompts user: "MCP server wants to generate an answer using these documents. Allow?"
+6. User approves (or client auto-approves based on configuration)
+7. Client sends prompt to LLM (Claude, GPT-4, etc.)
+8. LLM generates answer with citations: "Based on Document 1 (note: Project Kickoff), Document 2 (calendar: Team Planning Meeting), and Document 3 (deck card: Implement semantic search)..."
+9. Client returns answer to server
+10. Server returns `SamplingSearchResponse` with answer and sources
+11. User sees complete answer with citations across multiple Nextcloud apps
+
+**Fallback flow** (sampling unavailable):
+1. Steps 1-3 are the same as in the successful flow
+2. Server attempts `ctx.session.create_message()`
+3. Client rejects the request (e.g., "Sampling not supported"), which raises an exception on the server
+4. Server catches the exception and logs a warning
+5. Server returns `SamplingSearchResponse` with documents and a "[Sampling unavailable]" message
+6. User sees raw documents instead of a generated answer
+
+**No results flow**:
+1. Steps 1-2 are the same as in the successful flow, except that no documents meet `score_threshold`
+2. Server returns `SamplingSearchResponse` with a "No relevant documents" message
+3. No sampling is attempted (no prompt is sent)
+4. User sees a clear "not found" message
+
+This three-tier approach (answer → documents → error message) ensures users always receive useful feedback appropriate to the situation.
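A caller can tell the three outcomes apart from fields the response already carries. The helper below is a hypothetical sketch (`classify_outcome` is not part of the proposed API); it assumes the `search_method` values used in this ADR and that the no-results case returns an empty `sources` list.

```python
def classify_outcome(search_method: str, source_count: int) -> str:
    """Map response fields to one of the three flows described above."""
    if source_count == 0:
        return "no_results"  # no sampling was attempted
    if search_method == "semantic_sampling_fallback":
        return "documents_only"  # sampling failed; sources are still returned
    return "answer"  # generated answer plus sources
```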
+
+## Implementation
+
+### Response Model
+
+Add to `nextcloud_mcp_server/models/semantic.py` (new file for semantic search models):
+
+```python
+from pydantic import Field
+
+class SamplingSearchResponse(BaseResponse):
+    """Response from semantic search with LLM-generated answer via MCP sampling.
+
+    This response includes both a generated natural language answer (created by
+    the MCP client's LLM via sampling) and the source documents used to generate
+    that answer. Users can read the answer for quick information and review
+    sources for verification and deeper exploration.
+
+    Attributes:
+        query: The original user query
+        generated_answer: Natural language answer generated by client's LLM
+        sources: List of semantic search results used as context
+        total_found: Total number of matching documents found
+        search_method: Always "semantic_sampling" for this response type
+        model_used: Name of model that generated the answer (e.g., "claude-3-5-sonnet")
+        stop_reason: Why generation stopped ("endTurn", "maxTokens", etc.)
+    """
+
+    query: str = Field(..., description="Original user query")
+    generated_answer: str = Field(
+        ...,
+        description="LLM-generated answer based on retrieved documents"
+    )
+    sources: list[SemanticSearchResult] = Field(
+        default_factory=list,
+        description="Source documents with excerpts and relevance scores"
+    )
+    total_found: int = Field(..., description="Total matching documents")
+    search_method: str = Field(
+        default="semantic_sampling",
+        description="Search method used"
+    )
+    model_used: str | None = Field(
+        default=None,
+        description="Model that generated the answer"
+    )
+    stop_reason: str | None = Field(
+        default=None,
+        description="Reason generation stopped"
+    )
+```
+
+### Tool Implementation
+
+Add to `nextcloud_mcp_server/server/semantic.py` (new file for semantic search tools):
+
+```python
+import logging
+from mcp.types import ModelHint, ModelPreferences, SamplingMessage, TextContent
+
+logger = logging.getLogger(__name__)
+
+
+@mcp.tool()
+@require_scopes("semantic:read")
+async def nc_semantic_search_answer(
+    query: str,
+    ctx: Context,
+    limit: int = 5,
+    score_threshold: float = 0.7,
+    max_answer_tokens: int = 500,
+) -> SamplingSearchResponse:
+    """
+    Semantic search with LLM-generated answer using MCP sampling.
+
+    Retrieves relevant documents from Nextcloud across all indexed apps (notes,
+    calendar, deck, files, contacts) using vector similarity search, then uses
+    MCP sampling to request the client's LLM to generate a natural language
+    answer based on the retrieved context.
+
+    This tool combines the power of semantic search (finding relevant content
+    across all your Nextcloud apps) with LLM generation (synthesizing that
+    content into coherent answers). The generated answer includes citations
+    to specific documents with their types, allowing users to verify claims
+    and explore sources.
+
+    The LLM generation happens client-side via MCP sampling. The MCP client
+    controls which model is used, who pays for it, and whether to prompt the
+    user for approval. This keeps the server simple (no LLM API keys needed)
+    while giving users full control over their LLM interactions.
+
+    Args:
+        query: Natural language question to answer (e.g., "What are my Q1 objectives?" or "When is my next dentist appointment?")
+        ctx: MCP context for session access
+        limit: Maximum number of documents to retrieve (default: 5)
+        score_threshold: Minimum similarity score 0-1 (default: 0.7)
+        max_answer_tokens: Maximum tokens for generated answer (default: 500)
+
+    Returns:
+        SamplingSearchResponse containing:
+        - generated_answer: Natural language answer with citations
+        - sources: List of documents with excerpts and relevance scores
+        - model_used: Which model generated the answer
+        - stop_reason: Why generation stopped
+
+    Note: Requires MCP client to support sampling. If sampling is unavailable,
+    the tool gracefully degrades to returning documents with an explanation.
+    The client may prompt the user to approve the sampling request.
+
+    Examples:
+        >>> # Query about objectives across multiple apps
+        >>> result = await nc_semantic_search_answer(
+        ...     query="What are my Q1 2025 project goals?",
+        ...     ctx=ctx
+        ... )
+        >>> print(result.generated_answer)
+        "Based on Document 1 (note: Project Kickoff), Document 2 (calendar event:
+        Q1 Planning Meeting), and Document 3 (deck card: Implement semantic search),
+        your main goals are: 1) Improve semantic search accuracy by 20%,
+        2) Deploy new embedding model, 3) Reduce indexing latency..."
+
+        >>> # Query about appointments
+        >>> result = await nc_semantic_search_answer(
+        ...     query="When is my next dentist appointment?",
+        ...     ctx=ctx,
+        ...     limit=10
+        ... )
+        >>> len(result.sources)  # Calendar events and related notes
+        3
+    """
+    # 1. Retrieve relevant documents via existing semantic search
+    search_response = await nc_semantic_search(
+        query=query,
+        ctx=ctx,
+        limit=limit,
+        score_threshold=score_threshold,
+    )
+
+    # 2. Handle no results case - don't waste a sampling call
+    if not search_response.results:
+        logger.debug(f"No documents found for query: {query}")
+        return SamplingSearchResponse(
+            query=query,
+            generated_answer="No relevant documents found in your Nextcloud content for this query.",
+            sources=[],
+            total_found=0,
+            search_method="semantic_sampling",
+            success=True,
+        )
+
+    # 3. Construct context from retrieved documents
+    context_parts = []
+    for idx, result in enumerate(search_response.results, 1):
+        context_parts.append(
+            f"[Document {idx}]\n"
+            f"Title: {result.title}\n"
+            f"Category: {result.category}\n"
+            f"Excerpt: {result.excerpt}\n"
+            f"Relevance Score: {result.score:.2f}\n"
+        )
+
+    context = "\n".join(context_parts)
+
+    # 4. Construct prompt - reuse user's query, add context and instructions
+    prompt = (
+        f"{query}\n\n"
+        f"Here are relevant documents from Nextcloud (notes, calendar events, deck cards, files, contacts):\n\n"
+        f"{context}\n\n"
+        f"Based on the documents above, please provide a comprehensive answer. "
+        f"Cite the document numbers when referencing specific information."
+    )
+
+    logger.debug(
+        f"Requesting sampling for query: {query} "
+        f"({len(search_response.results)} documents retrieved)"
+    )
+
+    # 5. Request LLM completion via MCP sampling
+    try:
+        sampling_result = await ctx.session.create_message(
+            messages=[
+                SamplingMessage(
+                    role="user",
+                    content=TextContent(type="text", text=prompt),
+                )
+            ],
+            max_tokens=max_answer_tokens,
+            temperature=0.7,
+            model_preferences=ModelPreferences(
+                hints=[ModelHint(name="claude-3-5-sonnet")],
+                intelligencePriority=0.8,
+                speedPriority=0.5,
+            ),
+            include_context="thisServer",
+        )
+
+        # 6. Extract answer from sampling response
+        if sampling_result.content.type == "text":
+            generated_answer = sampling_result.content.text
+        else:
+            # Handle non-text responses (shouldn't happen for text prompts)
+            generated_answer = (
+                f"Received non-text response of type: {sampling_result.content.type}"
+            )
+            logger.warning(
+                f"Unexpected content type from sampling: {sampling_result.content.type}"
+            )
+
+        logger.info(
+            f"Sampling successful: model={sampling_result.model}, "
+            f"stop_reason={sampling_result.stopReason}"
+        )
+
+        return SamplingSearchResponse(
+            query=query,
+            generated_answer=generated_answer,
+            sources=search_response.results,
+            total_found=search_response.total_found,
+            search_method="semantic_sampling",
+            model_used=sampling_result.model,
+            stop_reason=sampling_result.stopReason,
+            success=True,
+        )
+
+    except Exception as e:
+        # Fallback: Return documents without generated answer
+        logger.warning(
+            f"Sampling failed ({type(e).__name__}: {e}), "
+            f"returning search results only"
+        )
+
+        return SamplingSearchResponse(
+            query=query,
+            generated_answer=(
+                f"[Sampling unavailable: {str(e)}]\n\n"
+                f"Found {search_response.total_found} relevant documents. "
+                f"Please review the sources below."
+            ),
+            sources=search_response.results,
+            total_found=search_response.total_found,
+            search_method="semantic_sampling_fallback",
+            success=True,
+        )
+```
+
+### Import Updates
+
+Add to top of `nextcloud_mcp_server/server/semantic.py`:
+
+```python
+from mcp.types import ModelHint, ModelPreferences, SamplingMessage, TextContent
+```
+
+Add to `nextcloud_mcp_server/models/semantic.py` exports:
+
+```python
+__all__ = [
+    "SemanticSearchResult",
+    "SemanticSearchResponse",
+    "SamplingSearchResponse",
+]
+```
+
+## Consequences
+
+### Benefits
+
+**Improved User Experience**: Users receive direct answers to questions rather than lists of documents, matching expectations from modern AI interfaces.
+
+**Proper Attribution**: Generated answers include citations to source documents, allowing users to verify claims and explore deeper.
+
+**No Server-Side LLM**: The server has no LLM dependencies, API keys, or billing concerns. All LLM interactions happen client-side.
+
+**User Control**: MCP clients control which model is used and may prompt users to approve sampling requests, maintaining transparency and user agency.
+
+**Graceful Degradation**: The tool works even when sampling is unavailable, falling back to returning documents. Existing clients continue working without changes.
+
+**Consistent Architecture**: Follows MCP's client-server separation: servers provide data access, clients provide user interaction and LLM capabilities.
+
+### Limitations
+
+**Sampling Support Required**: Not all MCP clients implement sampling. Users with basic clients see fallback behavior (documents without answers).
+
+**Added Latency**: Sampling adds 2-5 seconds to tool execution due to client round-trip and LLM generation time. Users must wait longer for answers than for raw search results.
+
+**User Approval Friction**: MCP clients SHOULD prompt users to approve sampling requests. This adds an extra interaction step before answers are generated.
+
+**Limited Prompt Control**: The server cannot fully control how the client's LLM interprets the prompt. Different models may generate different quality answers.
+
+**No Caching**: Each query requires a new sampling call. The server doesn't cache generated answers (clients may cache if they choose).
+
+**Token Costs**: LLM generation consumes tokens from the user's or client's quota. Heavy users may incur costs or hit rate limits.
+
+### Performance Characteristics
+
+**Typical latency**:
+- Document retrieval (vector search): 100-300ms
+- Sampling round-trip (client communication): 50-200ms
+- LLM generation (client-side): 1-4 seconds
+- **Total**: 2-5 seconds end-to-end
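As a quick sanity check on this budget, the component ranges can be summed directly. The figures are the ones listed above; the result is the raw sum only and excludes any time a user spends on an approval prompt, which the list does not quantify.

```python
# Component latency ranges from the list above, in milliseconds (min, max).
components = {
    "vector_search": (100, 300),
    "sampling_round_trip": (50, 200),
    "llm_generation": (1000, 4000),
}

total_min_ms = sum(low for low, _ in components.values())
total_max_ms = sum(high for _, high in components.values())

print(f"{total_min_ms / 1000:.2f}s to {total_max_ms / 1000:.2f}s")  # prints "1.15s to 4.50s"
```

LLM generation dominates both ends of the range, so the choice of model on the client side matters far more for perceived latency than vector search tuning.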
+
+**Throughput**: Sampling is fully async. The server can handle multiple concurrent sampling requests (limited by MCP client's concurrency, not server capacity).
+
+**Resource usage**: Minimal server-side. No GPU, no LLM model loading, no large memory requirements. Sampling happens entirely client-side.
+
+### Security Considerations
+
+**Prompt Injection Risk**: If user queries contain adversarial text designed to manipulate LLM behavior, those queries are included verbatim in the sampling prompt. Mitigation: The structured prompt format and explicit instructions ("based on documents above") constrain LLM behavior.
+
+**Data Privacy**: User queries and document excerpts are sent to the client's LLM. For cloud LLMs (OpenAI, Anthropic), this means data leaves the server's control. Mitigation: MCP clients SHOULD present sampling requests to users for approval, making data flows transparent. Users choose their LLM provider.
+
+**Sampling Abuse**: A malicious server could spam sampling requests to drain user quotas. Mitigation: MCP clients control approval and can rate-limit or block sampling from misbehaving servers.
+
+## Alternatives Considered
+
+### Server-Side LLM Integration
+
+**Approach**: Configure the MCP server with OpenAI API key or local Ollama instance. Generate answers server-side.
+
+**Rejected Because**:
+- Duplicates LLM infrastructure that MCP clients already have
+- Creates billing and API key management burden for server operators
+- Locks users into server-configured models
+- Violates MCP's client-server separation principle
+
+### Multi-Turn Conversation Pattern
+
+**Approach**: `nc_semantic_search` returns documents. The user asks a follow-up question, and the client's LLM uses the previous tool results as context.
+
+**Rejected Because**:
+- Requires users to know to ask follow-up questions
+- Consumes context window with full document content
+- Inconsistent behavior across clients
+- Poor citation (LLM may not reference which documents it used)
+
+### Pre-Generated Summaries
+
+**Approach**: Generate and cache summaries during indexing. Return summaries instead of excerpts.
+
+**Rejected Because**:
+- Summaries become stale as documents change
+- Summary quality depends on server-side LLM (same problems as server-side generation)
+- Summaries are generic, not tailored to specific queries
+
+### Streaming Responses
+
+**Approach**: Use MCP sampling with streaming to return incremental answer chunks.
+
+**Deferred Because**:
+- MCP sampling streaming support unclear in current specification
+- Adds significant implementation complexity
+- Tool responses in MCP are typically atomic
+- Can be added later without breaking changes
+
+## Related Decisions
+
+**ADR-007**: Background Vector Sync provides the semantic search infrastructure that this ADR enhances with LLM generation.
+
+**ADR-004**: Progressive Consent architecture applies to sampling—users consent to sampling requests via MCP client approval prompts.
+
+## References
+
+- [MCP Specification - Sampling](https://modelcontextprotocol.io/docs/specification/2025-06-18/client/sampling)
+- [MCP Python SDK - ServerSession.create_message](https://github.com/modelcontextprotocol/python-sdk/blob/main/src/mcp/server/session.py#L215)
+- [MCP Python SDK - Sampling Example](https://github.com/modelcontextprotocol/python-sdk/blob/main/examples/snippets/servers/sampling.py)
+- [MCP Types - SamplingMessage](https://github.com/modelcontextprotocol/python-sdk/blob/main/src/mcp/types.py#L1038)
+- [MCP Types - CreateMessageResult](https://github.com/modelcontextprotocol/python-sdk/blob/main/src/mcp/types.py#L1073)
+- [Retrieval-Augmented Generation (RAG) - Lewis et al. 2020](https://arxiv.org/abs/2005.11401)
+
+## Implementation Checklist
+
+- [ ] Create ADR-008 document (this file)
+- [ ] Create `nextcloud_mcp_server/models/semantic.py` for semantic search models
+- [ ] Add `SamplingSearchResponse` model to `nextcloud_mcp_server/models/semantic.py`
+- [ ] Create `nextcloud_mcp_server/server/semantic.py` for semantic search tools
+- [ ] Implement `nc_semantic_search_answer` tool in `nextcloud_mcp_server/server/semantic.py`
+- [ ] Add MCP sampling type imports (`SamplingMessage`, `TextContent`, etc.)
+- [ ] Write unit tests with mocked sampling (`tests/unit/server/test_semantic.py`)
+- [ ] Create integration tests (`tests/integration/test_sampling.py`)
+- [ ] Update `README.md` with new tool documentation in dedicated Semantic Search section
+- [ ] Update `CLAUDE.md` with sampling pattern guidance
+- [ ] Test with MCP client supporting sampling (Claude Desktop, MCP Inspector with callbacks)
+- [ ] Document client requirements and fallback behavior
+- [ ] Update oauth-architecture.md to add semantic:read scope
+- [ ] Create ADR-009 to document semantic:read scope decision
@@ -0,0 +1,268 @@
+# ADR-009: Generic `semantic:read` OAuth Scope for Multi-App Vector Search
+
+**Status**: Proposed
+**Date**: 2025-01-11
+**Depends On**: ADR-007 (Background Vector Sync), ADR-008 (MCP Sampling for Semantic Search)
+
+## Context
+
+ADR-007 established a background vector synchronization architecture that indexes content from multiple Nextcloud apps (notes, calendar events, deck cards, files, contacts) into a unified vector database. ADR-008 introduced semantic search tools (`nc_semantic_search`, `nc_semantic_search_answer`) that query this vector database and use MCP sampling to generate natural language answers.
+
+The question is: **What OAuth scopes should protect semantic search operations?**
+
+### Option 1: App-Specific Scopes
+
+Require users to have scopes for each app they want to search:
+
+```python
+@mcp.tool()
+@require_scopes("notes:read", "calendar:read", "deck:read", "files:read", "contacts:read")
+async def nc_semantic_search(query: str, ctx: Context) -> SemanticSearchResponse:
+    """Search across all indexed apps"""
+```
+
+**Advantages**:
+- Granular control - users explicitly consent to searching each app
+- Aligns with app-specific authorization model
+- Clear security boundary - can only search apps you can access
+
+**Disadvantages**:
+- **Brittle user experience**: If a user grants only `notes:read` but the tool requires all 5 scopes, the tool becomes invisible/unusable
+- **All-or-nothing enforcement**: Can't search notes alone - must grant all scopes or none
+- **Poor progressive consent**: User can't start with notes search and later add calendar
+- **Scope inflation**: Every new app adds another required scope
+- **Mismatched semantics**: User thinks "I want to search my notes" but must grant calendar, deck, files, contacts just to make the tool appear
+
+### Option 2: Single Generic Scope (Chosen)
+
+Introduce a new semantic search-specific scope:
+
+```python
+@mcp.tool()
+@require_scopes("semantic:read")
+async def nc_semantic_search(query: str, ctx: Context) -> SemanticSearchResponse:
+    """Search across all indexed apps"""
+```
+
+**Advantages**:
+- **Simple authorization**: One scope grants semantic search capability
+- **Progressive enablement**: User grants `semantic:read`, searches notes initially, then enables calendar indexing later
+- **Logical grouping**: Semantic search is a cross-app feature, deserving its own scope
+- **Future-proof**: New apps can be added to vector sync without changing OAuth scopes
+- **Matches user mental model**: "I want semantic search" → grant `semantic:read` (not "I want semantic search" → grant 5 unrelated app scopes)
+
+**Considerations**:
+- User could search apps they can't directly access via app-specific tools
+  - **Mitigation**: Dual-phase authorization (Phase 1: scope check passes with `semantic:read`, Phase 2: verify user can access each returned document via app-specific permissions)
+- Less granular than app-specific scopes
+  - **Counterpoint**: Semantic search is inherently cross-app - forcing per-app authorization defeats its purpose
+
+### Option 3: Hybrid Approach (Rejected)
+
+Support both: semantic search works with either `semantic:read` OR all app-specific scopes:
+
+```python
+@mcp.tool()
+@require_scopes("semantic:read", alternative_scopes=["notes:read", "calendar:read", ...])
+async def nc_semantic_search(query: str, ctx: Context) -> SemanticSearchResponse:
+    """Search across all indexed apps"""
+```
+
+**Rejected Because**:
+- Adds complexity to scope validation logic
+- Unclear to users which scopes they should grant
+- Alternative scopes still suffer from all-or-nothing problem
+- No significant benefit over Option 2 with dual-phase authorization
+
+## Decision
+
+We will introduce two new OAuth scopes specifically for semantic search operations:
+
+- **`semantic:read`**: Query vector database, perform semantic search, generate answers
+- **`semantic:write`**: Enable/disable background vector synchronization, manage indexing settings
+
+These scopes are **independent** of app-specific scopes (notes:read, calendar:read, etc.).
+
+### Tool Scope Assignments
+
+**Read Operations**:
+```python
+@mcp.tool()
+@require_scopes("semantic:read")
+async def nc_semantic_search(query: str, ctx: Context, limit: int = 10, score_threshold: float = 0.7) -> SemanticSearchResponse:
+    """Semantic search across all indexed Nextcloud apps"""
+
+@mcp.tool()
+@require_scopes("semantic:read")
+async def nc_semantic_search_answer(query: str, ctx: Context, limit: int = 5, max_answer_tokens: int = 500) -> SamplingSearchResponse:
+    """Semantic search with LLM-generated answer via MCP sampling"""
+
+@mcp.tool()
+@require_scopes("semantic:read")
+async def nc_get_vector_sync_status(ctx: Context) -> VectorSyncStatusResponse:
+    """Get current vector synchronization status (indexed count, pending count, status)"""
+```
+
+**Write Operations**:
+```python
+@mcp.tool()
+@require_scopes("semantic:write")
+async def nc_enable_vector_sync(ctx: Context) -> VectorSyncResponse:
+    """Enable background vector synchronization for this user"""
+
+@mcp.tool()
+@require_scopes("semantic:write")
+async def nc_disable_vector_sync(ctx: Context) -> VectorSyncResponse:
+    """Disable background vector synchronization"""
+```
+
+### Dual-Phase Authorization
+
+To ensure users can only access documents they have permission to view, semantic search implements **dual-phase authorization**:
+
+**Phase 1: Scope Check** (MCP Server)
+- User must have `semantic:read` scope to call semantic search tools
+- This grants permission to query the vector database
+
+**Phase 2: Document Verification** (Per-Result Filtering)
+- For each returned document, verify user has access via app-specific permissions
+- Uses `DocumentVerifier` interface per app:
+  - Notes: Call `/apps/notes/api/v1/notes/{id}` - if 404/403, exclude from results
+  - Calendar: Call `/remote.php/dav/calendars/{username}/{calendar}/{event}.ics` - if 404/403, exclude
+  - Deck: Call `/apps/deck/api/v1.0/boards/{board_id}/stacks/{stack_id}/cards/{card_id}` - if 404/403, exclude
+  - Files: Call `/remote.php/dav/files/{username}/{path}` with PROPFIND - if 404/403, exclude
+  - Contacts: Call `/remote.php/dav/addressbooks/users/{username}/{addressbook}/{contact}.vcf` - if 404/403, exclude
+
+This two-phase approach ensures:
+1. Semantic search is a **distinct capability** (like "global search") requiring explicit consent
+2. Results are **filtered** to only include documents the user can access
+3. No privilege escalation - users can't discover content they shouldn't see
+
+**Implementation**: See ADR-007 Phase 3 (Document Verification) and `DocumentVerifier` interface.
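Phase 2 can be sketched as a filter over search hits. This is an illustration under stated assumptions, not the ADR-007 interface: the `Hit` type, the `can_access` method name, and the stub `AllowListVerifier` are hypothetical, and dropping hits from apps with no registered verifier (failing closed) is a design choice made for this sketch.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Hit:
    """Hypothetical search hit: the app it came from and its ID there."""

    app: str
    doc_id: str


class DocumentVerifier(Protocol):
    async def can_access(self, hit: Hit) -> bool:
        """Return True only if the app's own API lets the user read the document."""
        ...


class AllowListVerifier:
    """Stub verifier; a real one would call the app API and treat 403/404 as False."""

    def __init__(self, allowed_ids: set[str]) -> None:
        self.allowed_ids = allowed_ids

    async def can_access(self, hit: Hit) -> bool:
        return hit.doc_id in self.allowed_ids


async def filter_accessible(
    hits: list[Hit], verifiers: dict[str, DocumentVerifier]
) -> list[Hit]:
    """Phase 2: keep only hits whose app-specific verifier confirms access."""

    async def check(hit: Hit) -> bool:
        verifier = verifiers.get(hit.app)
        # Fail closed: no verifier registered for the app means the hit is dropped.
        return verifier is not None and await verifier.can_access(hit)

    # Verify all hits concurrently; gather preserves input order.
    verdicts = await asyncio.gather(*(check(hit) for hit in hits))
    return [hit for hit, allowed in zip(hits, verdicts) if allowed]


hits = [Hit("notes", "1"), Hit("notes", "2"), Hit("deck", "7")]
visible = asyncio.run(filter_accessible(hits, {"notes": AllowListVerifier({"1"})}))
```

Running the checks concurrently keeps the added latency close to that of the slowest single verification rather than the sum of all of them.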
+
+### Scope Discovery
+
+The new scopes will be:
+- **Advertised** via PRM endpoint (`/.well-known/oauth-protected-resource/mcp`)
+- **Dynamically discovered** from `@require_scopes` decorators on semantic search tools
+- **Documented** in OAuth architecture (oauth-architecture.md)
+- **Included** in default client registration scopes
+
+## Consequences
+
+### Benefits
+
+**User Experience**:
+- Simple authorization: one scope for semantic search capability
+- Progressive enablement: grant `semantic:read`, enable indexing for apps later
+- Natural mental model: "semantic search" is a distinct feature deserving its own scope
+
+**Security**:
+- Dual-phase authorization prevents privilege escalation
+- Users explicitly consent to cross-app search capability
+- Per-document verification ensures users only see accessible content
+
+**Maintainability**:
+- Adding new apps to vector sync doesn't require OAuth scope changes
+- Clear separation between app access (notes:read) and search capability (semantic:read)
+- Logical grouping of related operations (search, sync status, enable/disable)
+
+**Future-Proof**:
+- Can add new document types without breaking existing OAuth flows
+- Supports future semantic features (recommendations, clustering) under same scope
+- Aligns with potential future Nextcloud semantic capabilities
+
+### Trade-offs
+
+**Less Granular Than App-Specific Scopes**:
+- User can't grant "semantic search notes only"
+- Semantic search is all-or-nothing across enabled apps
+- **Mitigation**: Dual-phase verification ensures users only see documents they can access
+
+**New Scope to Learn**:
+- Users must understand `semantic:read` is distinct from app scopes
+- MCP clients must present scope clearly during consent
+- **Mitigation**: Clear scope descriptions in OAuth consent UI and documentation
+
+**Backend Complexity**:
+- Requires dual-phase authorization implementation
+- DocumentVerifier interface needed for each app
+- **Benefit**: Enforces proper security regardless of scope model
+
+### Migration Impact
+
+**Breaking Change**: Existing deployments using notes-specific semantic search will break.
+
+**Before (old scope, removed by this change)**:
+```python
+@mcp.tool()
+@require_scopes("notes:read")
+async def nc_notes_semantic_search(query: str, ctx: Context) -> SemanticSearchResponse:
+    """Semantic search notes"""
+```
+
+**After (NEW)**:
+```python
+@mcp.tool()
+@require_scopes("semantic:read")
+async def nc_semantic_search(query: str, ctx: Context) -> SemanticSearchResponse:
+    """Semantic search across all apps"""
+```
+
+**Migration Path**:
+1. Deploy server with new `semantic:read` scope
+2. Users re-authenticate, granting `semantic:read` scope
+3. Semantic search tools become visible/usable again
+4. **No data loss**: Vector database and indexed documents remain unchanged
+
+**Backward Compatibility**: None. This is an intentional breaking change to correct the scope model before broader adoption.
+
+## Alternatives Considered
+
+### Keep Notes-Specific Scopes
+
+**Approach**: Continue using `notes:read` for semantic search, even when searching other apps.
+
+**Rejected Because**:
+- Semantically incorrect - searching calendar events is not "reading notes"
+- Confuses users - why does searching calendar require notes:read?
+- Doesn't scale - what scope for multi-app search?
+
+### Create Per-App Semantic Scopes
+
+**Approach**: Introduce `notes:semantic`, `calendar:semantic`, `deck:semantic`, etc.
+
+**Rejected Because**:
+- Scope proliferation - doubles the number of scopes
+- Defeats purpose of unified vector search
+- Users would need to grant 5+ scopes for cross-app search
+- No clear benefit over dual-phase authorization with `semantic:read`
+
+### Require All App Scopes (Already Rejected in Option 1)
+
+**Approach**: Require `notes:read AND calendar:read AND deck:read AND files:read AND contacts:read`
+
+**Rejected Because**: Unusable UX (see Option 1 disadvantages above)
+
+## Related Decisions
+
+**ADR-007**: Background Vector Sync provides the indexing architecture that semantic scopes protect. The DocumentVerifier interface from ADR-007 Phase 3 implements dual-phase authorization.
+
+**ADR-008**: MCP Sampling for semantic search uses `semantic:read` to protect the sampling-enhanced search tool.
+
+**ADR-004**: Progressive Consent architecture supports users granting `semantic:read` initially, then enabling per-app indexing via `semantic:write` (enable_vector_sync with app selection).
+
+## Implementation Checklist
+
+- [ ] Create ADR-009 document (this file)
+- [x] Update `oauth-architecture.md` to document `semantic:read` and `semantic:write` scopes
+- [x] Update `README.md` to show Semantic Search as separate tool category
+- [x] Update ADR-007 to reference `semantic:*` scopes instead of `sync:*`
+- [x] Update ADR-008 to use `semantic:read` instead of `notes:read`
+- [ ] Implement DocumentVerifier interface for all apps (notes, calendar, deck, files, contacts)
+- [ ] Update semantic search tools to use `@require_scopes("semantic:read")`
+- [ ] Update vector sync tools to use `@require_scopes("semantic:write")`
+- [ ] Add dual-phase authorization to semantic search implementation
+- [ ] Test OAuth flow with `semantic:read` scope
+- [ ] Update scope discovery in PRM endpoint
+- [ ] Document migration path for existing deployments
@@ -108,6 +108,317 @@ NEXTCLOUD_PASSWORD=your_app_password_or_password

 ---

+## Semantic Search Configuration (Optional)
+
+The MCP server includes semantic search capabilities powered by vector embeddings. This feature requires a vector database (Qdrant) and an embedding service.
+
+### Qdrant Vector Database Modes
+
+The server supports three Qdrant deployment modes:
+
+1. **In-Memory Mode** (Default) - Simplest for development and testing
+2. **Persistent Local Mode** - For single-instance deployments with persistence
+3. **Network Mode** - For production with dedicated Qdrant service
+
+#### 1. In-Memory Mode (Default)
+
+No configuration needed! If neither `QDRANT_URL` nor `QDRANT_LOCATION` is set, the server defaults to in-memory mode:
+
+```dotenv
+# No Qdrant configuration needed - defaults to :memory:
+VECTOR_SYNC_ENABLED=true
+```
+
+**Pros:**
+- Zero configuration
+- Fast startup
+- Perfect for testing
+
+**Cons:**
+- Data lost on restart
+- Limited to available RAM
+
+#### 2. Persistent Local Mode
+
+For single-instance deployments that need persistence without a separate Qdrant service:
+
+```dotenv
+# Local persistent storage
+QDRANT_LOCATION=/app/data/qdrant  # Or any writable path
+VECTOR_SYNC_ENABLED=true
+```
+
+**Pros:**
+- Data persists across restarts
+- No separate service needed
+- Suitable for small/medium deployments
+
+**Cons:**
+- Limited to single instance
+- Shares resources with MCP server
+
+#### 3. Network Mode
+
+For production deployments with a dedicated Qdrant service:
+
+```dotenv
+# Network mode configuration
+QDRANT_URL=http://qdrant:6333
+QDRANT_API_KEY=your-secret-api-key  # Optional
+QDRANT_COLLECTION=nextcloud_content  # Optional; overrides the auto-generated name
+VECTOR_SYNC_ENABLED=true
+```
+
+**Pros:**
+- Scalable and performant
+- Can be shared across multiple MCP instances
+- Supports clustering and replication
+
+**Cons:**
+- Requires separate Qdrant service
+- More complex deployment
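
The three modes can be summarized as a small selection function. This is a minimal sketch, not the server's actual code: `qdrant_client_kwargs` is a hypothetical helper, and rejecting a configuration that sets both variables is an assumption based on the variables being documented as mutually exclusive. The returned keys (`url`, `api_key`, `path`, `location`) mirror keyword arguments accepted by `qdrant_client.QdrantClient`.

```python
def qdrant_client_kwargs(env: dict[str, str]) -> dict[str, str]:
    """Pick Qdrant connection settings from environment-style variables."""
    url = env.get("QDRANT_URL")
    location = env.get("QDRANT_LOCATION")

    if url and location:
        # Assumption: the two variables are treated as mutually exclusive.
        raise ValueError("Set only one of QDRANT_URL or QDRANT_LOCATION")

    if url:
        # Network mode: dedicated Qdrant service, optional API key.
        kwargs = {"url": url}
        if env.get("QDRANT_API_KEY"):
            kwargs["api_key"] = env["QDRANT_API_KEY"]
        return kwargs

    if location and location != ":memory:":
        # Persistent local mode: on-disk storage at the given path.
        return {"path": location}

    # In-memory mode: the default when nothing is configured.
    return {"location": ":memory:"}
```

With the `qdrant-client` package installed, the result could be passed on as `QdrantClient(**qdrant_client_kwargs(dict(os.environ)))`.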
+
+### Qdrant Collection Naming
+
+Collection names are automatically generated to include the embedding model, ensuring safe model switching and preventing dimension mismatches.
+
+#### Auto-Generated Naming (Default)
+
+**Format:** `{deployment-id}-{model-name}`
+
+**Components:**
+- **Deployment ID:** `OTEL_SERVICE_NAME` (if configured) or `hostname` (fallback)
+- **Model name:** `OLLAMA_EMBEDDING_MODEL`
+
+**Examples:**
+
+```bash
+# With OTEL service name configured
+OTEL_SERVICE_NAME=my-mcp-server
+OLLAMA_EMBEDDING_MODEL=nomic-embed-text
+# → Collection: "my-mcp-server-nomic-embed-text"
+
+# Simple Docker deployment (OTEL not configured)
+# hostname=mcp-container
+OLLAMA_EMBEDDING_MODEL=all-minilm
+# → Collection: "mcp-container-all-minilm"
+```
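
The naming rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the server's implementation: `collection_name` is a hypothetical helper, and any sanitizing the real code may apply to model names is not shown.

```python
import socket


def collection_name(env: dict[str, str]) -> str:
    """Build the collection name as {deployment-id}-{model-name}."""
    explicit = env.get("QDRANT_COLLECTION")
    if explicit:
        # An explicitly configured name bypasses auto-generation.
        return explicit
    # Deployment ID: OTEL_SERVICE_NAME if set, otherwise the hostname.
    deployment_id = env.get("OTEL_SERVICE_NAME") or socket.gethostname()
    model = env.get("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text")
    return f"{deployment_id}-{model}"
```

Because the model name is part of the result, changing `OLLAMA_EMBEDDING_MODEL` yields a different collection name without any further configuration.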
+
+#### Switching Embedding Models
+
+When you change `OLLAMA_EMBEDDING_MODEL`, a new collection is automatically created:
+
+```bash
+# Initial setup
+OLLAMA_EMBEDDING_MODEL=nomic-embed-text
+# Collection: "my-server-nomic-embed-text" (768 dimensions)
+
+# Change model
+OLLAMA_EMBEDDING_MODEL=all-minilm
+# Collection: "my-server-all-minilm" (384 dimensions)
+# → New collection created, full re-embedding occurs
+```
+
+**Important:**
+- **Collections are mutually exclusive** - vectors cannot be shared between different embedding models
+- **Switching models requires re-embedding** all documents (may take time for large note collections)
+- **Old collection remains** in Qdrant and can be deleted manually if no longer needed
+
+#### Explicit Override
+
+Set `QDRANT_COLLECTION` to use a specific collection name:
+
+```bash
+QDRANT_COLLECTION=my-custom-collection  # Bypasses auto-generation
+```
+
+**Use cases:**
+- Backward compatibility with existing deployments
+- Custom naming schemes
+- Sharing a collection across deployments (advanced)
+
+#### Multi-Server Deployments
+
+Each server should have a unique deployment ID to avoid collection collisions:
+
+```bash
+# Server 1 (Production)
+OTEL_SERVICE_NAME=mcp-prod
+OLLAMA_EMBEDDING_MODEL=nomic-embed-text
+# → Collection: "mcp-prod-nomic-embed-text"
+
+# Server 2 (Staging)
+OTEL_SERVICE_NAME=mcp-staging
+OLLAMA_EMBEDDING_MODEL=nomic-embed-text
+# → Collection: "mcp-staging-nomic-embed-text"
+
+# Server 3 (Different model)
+OTEL_SERVICE_NAME=mcp-experimental
+OLLAMA_EMBEDDING_MODEL=bge-large
+# → Collection: "mcp-experimental-bge-large"
+```
+
+**Benefits:**
+- Multiple MCP servers can share one Qdrant instance safely
+- No naming collisions between deployments
+- Clear collection ownership (can see which deployment and model)
+
+#### Dimension Validation
+
+The server validates collection dimensions on startup:
+
+```
+Dimension mismatch for collection 'my-server-nomic-embed-text':
+  Expected: 384 (from embedding model 'all-minilm')
+  Found: 768
+This usually means you changed the embedding model.
+Solutions:
+  1. Delete the old collection: Collection will be recreated with new dimensions
+  2. Set QDRANT_COLLECTION to use a different collection name
+  3. Revert OLLAMA_EMBEDDING_MODEL to the original model
+```
+
+**What this prevents:**
+- Runtime errors from dimension mismatches
+- Data corruption in Qdrant
+- Confusing error messages during indexing
+
+### Vector Sync Configuration
+
+Control background indexing behavior:
+
+```dotenv
+# Vector sync settings (ADR-007)
+VECTOR_SYNC_ENABLED=true              # Enable background indexing
+VECTOR_SYNC_SCAN_INTERVAL=300         # Scan interval in seconds (default: 5 minutes)
+VECTOR_SYNC_PROCESSOR_WORKERS=3       # Concurrent indexing workers (default: 3)
+VECTOR_SYNC_QUEUE_MAX_SIZE=10000      # Max queued documents (default: 10000)
+
+# Document chunking settings (for vector embeddings)
+DOCUMENT_CHUNK_SIZE=512               # Words per chunk (default: 512)
+DOCUMENT_CHUNK_OVERLAP=50             # Overlapping words between chunks (default: 50)
+```
+
+### Embedding Service Configuration
+
+The server uses an embedding service to generate vector representations. Two options are available:
+
+#### Ollama (Recommended)
+
+Use a local Ollama instance for embeddings:
+
+```dotenv
+OLLAMA_BASE_URL=http://ollama:11434
+OLLAMA_EMBEDDING_MODEL=nomic-embed-text  # Default model
+OLLAMA_VERIFY_SSL=true                   # Verify SSL certificates
+```
+
+#### Simple Embedding Provider (Fallback)
+
+If `OLLAMA_BASE_URL` is not set, the server uses a simple random embedding provider for testing. This is **not suitable for production** as it generates random embeddings with no semantic meaning.
+
+### Document Chunking Configuration
+
+The server chunks documents before embedding to handle documents larger than the embedding model's context window. Chunk size and overlap can be tuned based on your embedding model and content type.
+
+#### Choosing Chunk Size
+
+**Smaller chunks (256-384 words)**:
+- More precise matching
+- Less context per chunk
+- Better for finding specific information
+- Higher storage requirements (more vectors)
+
+**Larger chunks (768-1024 words)**:
+- More context per chunk
+- Less precise matching
+- Better for understanding broader topics
+- Lower storage requirements (fewer vectors)
+
+**Default (512 words)**:
+- Balanced approach suitable for most use cases
+- Works well with typical note lengths
+- Good compromise between precision and context
+
+#### Choosing Overlap
+
+Overlap preserves context across chunk boundaries. Recommended settings:
+
+- **Roughly 10-20% of chunk size** (e.g., 50-100 words for 512-word chunks; the default of 50 sits at the low end, about 10%)
+- **Too small** (well under 10%): May lose context at boundaries
+- **Too large** (>20%): Redundant storage, diminishing returns
+
+**Examples**:
+```dotenv
+# Precise matching for short notes
+DOCUMENT_CHUNK_SIZE=256
+DOCUMENT_CHUNK_OVERLAP=25
+
+# Default balanced configuration
+DOCUMENT_CHUNK_SIZE=512
+DOCUMENT_CHUNK_OVERLAP=50
+
+# More context for long documents
+DOCUMENT_CHUNK_SIZE=1024
+DOCUMENT_CHUNK_OVERLAP=100
+```
+
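+**Important**: Changing chunk size or overlap requires re-embedding all documents. Auto-generated collection names (see "Qdrant Collection Naming" above) include only the deployment ID and embedding model, so a chunking change alone does not produce a new collection; set `QDRANT_COLLECTION` to a new name if you want to keep differently chunked indexes separate.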
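
A minimal sketch of reading and validating these two settings follows. The helper names `load_chunk_settings` and `overlap_ratio` are hypothetical, and the validation rules are assumptions derived from the documented defaults and the constraint that overlap must be smaller than chunk size.

```python
def load_chunk_settings(env: dict[str, str]) -> tuple[int, int]:
    """Return (chunk_size, chunk_overlap) in words, validated."""
    size = int(env.get("DOCUMENT_CHUNK_SIZE", "512"))
    overlap = int(env.get("DOCUMENT_CHUNK_OVERLAP", "50"))
    if size <= 0:
        raise ValueError("DOCUMENT_CHUNK_SIZE must be positive")
    if not 0 <= overlap < size:
        raise ValueError("DOCUMENT_CHUNK_OVERLAP must be >= 0 and < DOCUMENT_CHUNK_SIZE")
    return size, overlap


def overlap_ratio(size: int, overlap: int) -> float:
    """Overlap as a fraction of chunk size, for comparison with the 10-20% guidance."""
    return overlap / size
```

For the defaults, `overlap_ratio(512, 50)` is just under 0.10, which is the low end of the recommended range.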
+
+### Environment Variables Reference
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `QDRANT_URL` | ⚠️ Optional | - | Qdrant service URL (network mode) - mutually exclusive with `QDRANT_LOCATION` |
+| `QDRANT_LOCATION` | ⚠️ Optional | `:memory:` | Local Qdrant path (`:memory:` or `/path/to/data`) - mutually exclusive with `QDRANT_URL` |
+| `QDRANT_API_KEY` | ⚠️ Optional | - | Qdrant API key (network mode only) |
+| `QDRANT_COLLECTION` | ⚠️ Optional | auto-generated | Qdrant collection name; overrides the auto-generated `{deployment-id}-{model-name}` |
+| `VECTOR_SYNC_ENABLED` | ⚠️ Optional | `false` | Enable background vector indexing |
+| `VECTOR_SYNC_SCAN_INTERVAL` | ⚠️ Optional | `300` | Document scan interval (seconds) |
+| `VECTOR_SYNC_PROCESSOR_WORKERS` | ⚠️ Optional | `3` | Concurrent indexing workers |
+| `VECTOR_SYNC_QUEUE_MAX_SIZE` | ⚠️ Optional | `10000` | Max queued documents |
+| `OLLAMA_BASE_URL` | ⚠️ Optional | - | Ollama API endpoint for embeddings |
+| `OLLAMA_EMBEDDING_MODEL` | ⚠️ Optional | `nomic-embed-text` | Embedding model to use |
+| `OLLAMA_VERIFY_SSL` | ⚠️ Optional | `true` | Verify SSL certificates |
+| `DOCUMENT_CHUNK_SIZE` | ⚠️ Optional | `512` | Words per chunk for document embedding |
+| `DOCUMENT_CHUNK_OVERLAP` | ⚠️ Optional | `50` | Overlapping words between chunks (must be < chunk size) |
+
+### Docker Compose Example
+
+Enable network mode Qdrant with docker-compose:
+
+```yaml
+services:
+  mcp:
+    environment:
+      - QDRANT_URL=http://qdrant:6333
+      - VECTOR_SYNC_ENABLED=true
+
+  qdrant:
+    image: qdrant/qdrant:latest
+    ports:
+      - 127.0.0.1:6333:6333
+    volumes:
+      - qdrant-data:/qdrant/storage
+    profiles:
+      - qdrant  # Optional service
+
+volumes:
+  qdrant-data:
+```
+
+Start with Qdrant service:
+```bash
+docker-compose --profile qdrant up
+```
+
+Or use default in-memory mode (no `--profile` needed):
+```bash
+docker-compose up
+```
+
+---
+
 ## Loading Environment Variables

 After creating your `.env` file, load the environment variables:
@@ -8,7 +8,9 @@
 | `nc_notes_update_note` | Update an existing note by ID |
 | `nc_notes_append_content` | Append content to an existing note with a clear separator |
 | `nc_notes_delete_note` | Delete a note by ID |
-| `nc_notes_search_notes` | Search notes by title or content |
+| `nc_notes_search_notes` | Search notes by title or content (keyword search) |
+| `nc_notes_semantic_search` | Search notes by meaning using vector embeddings (requires vector sync) |
+| `nc_notes_semantic_search_answer` | Search notes semantically and generate a natural language answer via MCP sampling (requires vector sync and sampling-capable MCP client) |

 ### Note Attachments

@@ -634,6 +634,12 @@ The server supports the following OAuth scopes, organized by Nextcloud app:
 - `sharing:read` - List shares and read share information
 - `sharing:write` - Create, update, and delete shares

+#### Semantic Search (Multi-App Vector Database)
+- `semantic:read` - Query vector database, perform semantic search across all indexed Nextcloud apps (notes, calendar, deck, files, contacts)
+- `semantic:write` - Enable/disable background vector synchronization, manage indexing settings
+
+> **Note**: Semantic search scopes provide access to the vector database that indexes content across **all** Nextcloud apps. Unlike app-specific scopes (e.g., `notes:read`), semantic scopes grant cross-app search capabilities powered by background vector synchronization (ADR-007).
+
 ### Scope Discovery

 The MCP server provides scope discovery through two mechanisms:
@@ -0,0 +1,260 @@
+# Observability and Monitoring
+
+The Nextcloud MCP Server includes comprehensive observability features for production deployments:
+
+- **Prometheus metrics** for monitoring performance and health
+- **OpenTelemetry distributed tracing** for debugging request flows
+- **Structured JSON logging** with trace correlation
+- **Kubernetes integration** via ServiceMonitor and PrometheusRule
+
+## Quick Start
+
+### Local Development with Prometheus
+
+```bash
+# Enable metrics (enabled by default)
+export METRICS_ENABLED=true
+export METRICS_PORT=9090
+
+# Enable tracing (optional)
+export OTEL_ENABLED=true
+export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
+
+# Start the server
+docker-compose up -d mcp
+```
+
+Access metrics at: `http://localhost:9090/metrics`
+
+### Kubernetes Deployment
+
+With the Prometheus Operator installed, enabling the ServiceMonitor lets Prometheus discover and scrape the metrics endpoint automatically:
+
+```bash
+helm install nextcloud-mcp charts/nextcloud-mcp-server \
+  --set observability.metrics.enabled=true \
+  --set observability.tracing.enabled=true \
+  --set observability.tracing.endpoint=http://opentelemetry-collector:4317 \
+  --set serviceMonitor.enabled=true
+```
+
+## Configuration
+
+### Environment Variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `METRICS_ENABLED` | `true` | Enable Prometheus metrics |
+| `METRICS_PORT` | `9090` | Port for metrics endpoint |
+| `OTEL_ENABLED` | `false` | Enable OpenTelemetry tracing |
+| `OTEL_EXPORTER_OTLP_ENDPOINT` | - | OTLP gRPC endpoint (e.g., `http://otel-collector:4317`) |
+| `OTEL_SERVICE_NAME` | `nextcloud-mcp-server` | Service name in traces |
+| `OTEL_TRACES_SAMPLER` | `always_on` | Trace sampling strategy |
+| `OTEL_TRACES_SAMPLER_ARG` | `1.0` | Sampling rate (0.0-1.0) |
+| `LOG_FORMAT` | `json` | Log format (`json` or `text`) |
+| `LOG_LEVEL` | `INFO` | Minimum log level |
+| `LOG_INCLUDE_TRACE_CONTEXT` | `true` | Include trace IDs in logs |
+
+### Helm Chart Configuration
+
+```yaml
+observability:
+  metrics:
+    enabled: true
+    port: 9090
+    path: /metrics
+
+  tracing:
+    enabled: true
+    endpoint: "http://opentelemetry-collector:4317"
+    samplingRate: 1.0
+
+  logging:
+    format: json
+    level: INFO
+    includeTraceContext: true
+
+serviceMonitor:
+  enabled: true
+  interval: 30s
+  scrapeTimeout: 10s
+```
+
+## Metrics
+
+### HTTP Server Metrics (RED)
+
+- `mcp_http_requests_total` - Total HTTP requests
+- `mcp_http_request_duration_seconds` - Request latency histogram
+- `mcp_http_requests_in_progress` - In-flight requests gauge
+
+### MCP Tool Metrics
+
+- `mcp_tool_calls_total` - Tool invocation count by status
+- `mcp_tool_duration_seconds` - Tool execution latency
+- `mcp_tool_errors_total` - Tool errors by type
+
+### Nextcloud API Metrics
+
+- `mcp_nextcloud_api_requests_total` - API calls by app and status
+- `mcp_nextcloud_api_duration_seconds` - API latency by app
+- `mcp_nextcloud_api_retries_total` - Retry count (429, timeout, etc.)
+
+### OAuth Flow Metrics
+
+- `mcp_oauth_token_validations_total` - Token validation count
+- `mcp_oauth_token_exchange_total` - Token exchange operations
+- `mcp_oauth_token_cache_hits_total` - Cache hit/miss rate
+- `mcp_oauth_refresh_token_operations_total` - Refresh token storage ops
+
+### Vector Sync Metrics (when enabled)
+
+- `mcp_vector_sync_documents_scanned_total` - Documents discovered
+- `mcp_vector_sync_documents_processed_total` - Processing results
+- `mcp_vector_sync_processing_duration_seconds` - Processing latency
+- `mcp_vector_sync_queue_size` - Current queue depth
+- `mcp_qdrant_operations_total` - Qdrant DB operations
+
+### Database Metrics
+
+- `mcp_db_operations_total` - DB operations (SQLite, Qdrant)
+- `mcp_db_operation_duration_seconds` - DB latency
+
+### Dependency Health
+
+- `mcp_dependency_health` - External dependency status (1=up, 0=down)
+- `mcp_dependency_check_duration_seconds` - Health check latency
+
+## Distributed Tracing
+
+### Span Hierarchy
+
+```
+HTTP POST /messages
+├── mcp.tool.nc_notes_create_note
+│   └── nextcloud.api.notes.POST
+│       └── httpx request (auto-instrumented)
+└── oauth.token.validate (if OAuth mode)
+    └── httpx request to IdP
+```
+
+### Span Attributes
+
+- **MCP tools**: `mcp.tool.name`, `mcp.tool.args` (sanitized)
+- **Nextcloud API**: `nextcloud.app`, `http.method`, `http.status_code`
+- **OAuth**: `oauth.operation`, `oauth.method`
+- **Vector sync**: `vector_sync.operation`, `vector_sync.document_count`
+
+### Trace Context in Logs
+
+When tracing is enabled, all logs include `trace_id` and `span_id`:
+
+```json
+{
+  "timestamp": "2025-01-09T12:34:56.789Z",
+  "level": "INFO",
+  "logger": "nextcloud_mcp_server.server.notes",
+  "message": "Note created successfully",
+  "trace_id": "a1b2c3d4e5f6...",
+  "span_id": "123456789abc...",
+  "note_id": 42
+}
+```
+
+## Dashboards
+
+### Prometheus Queries
+
+**Request Rate (req/s)**:
+```promql
+sum(rate(mcp_http_requests_total[5m])) by (method, endpoint)
+```
+
+**Error Rate (%)**:
+```promql
+sum(rate(mcp_http_requests_total{status_code=~"5.."}[5m]))
+  / sum(rate(mcp_http_requests_total[5m])) * 100
+```
+
+**P95 Latency**:
+```promql
+histogram_quantile(0.95,
+  sum(rate(mcp_http_request_duration_seconds_bucket[5m])) by (le, endpoint)
+)
+```
+
+**Top Tools by Volume**:
+```promql
+topk(10, sum(rate(mcp_tool_calls_total[5m])) by (tool_name))
+```
+
+**Nextcloud API Health**:
+```promql
+sum(rate(mcp_nextcloud_api_requests_total{status_code!~"2.."}[5m])) by (app)
+```
+
+## Alerts
+
+### Recommended Alert Rules
+
+**Critical**:
+- Server down for >5min
+- Error rate >5% for >5min
+- P95 latency >1s for >5min
+- Dependency down for >2min
+
+**Warning**:
+- Token validation errors >1% for >10min
+- Vector sync queue >100 for >15min
+- Qdrant slow (p95 >500ms) for >10min
+
+See `charts/nextcloud-mcp-server/templates/prometheusrule.yaml` for complete definitions.
+
+## Troubleshooting
+
+### Metrics Not Appearing
+
+1. Check metrics are enabled: `curl http://localhost:9090/metrics`
+2. Verify ServiceMonitor labels match Prometheus selector
+3. Check Prometheus target status: `http://prometheus:9090/targets`
+
+### Traces Not Appearing
+
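+1. Verify the OTLP endpoint is reachable: `curl http://otel-collector:4317` (4317 is a gRPC port, so a protocol error from `curl` still confirms the port is open; a refused connection or timeout means it is unreachable)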
+2. Check collector logs for errors
+3. Verify sampling rate is not 0.0
+4. Check trace backend (Jaeger/Tempo) connectivity
+
+### High Cardinality Metrics
+
+If you see cardinality warnings, note that the server already limits label cardinality in the following ways:
+- Middleware normalizes endpoints (e.g., `/user/123` → `/user/*`)
+- OAuth tokens are never included in metric labels
+- User IDs are not tracked (use tracing for per-user debugging)
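
The endpoint normalization mentioned above could look roughly like the following. This is a sketch, not the middleware's actual code: `normalize_endpoint` is a hypothetical name, and treating numeric IDs and UUIDs as the only dynamic segments is an assumption.

```python
import re

# Assumption: numeric IDs and UUIDs are the dynamic segments worth collapsing.
_DYNAMIC_SEGMENT = re.compile(
    r"\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def normalize_endpoint(path: str) -> str:
    """Replace ID-like path segments with '*' to bound label cardinality."""
    segments = path.split("/")
    return "/".join("*" if _DYNAMIC_SEGMENT.fullmatch(s) else s for s in segments)
```

Using the normalized path as the `endpoint` label keeps the number of time series proportional to the number of routes rather than the number of distinct IDs.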
+
+## Performance Impact
+
+- **Metrics**: <1% overhead (counters/histograms are very fast)
+- **Tracing**: ~2-5% overhead at 100% sampling
+- **JSON logging**: <1% overhead vs text logging
+
+**Recommendation**: Always enable metrics. Enable tracing in staging/production with 10-50% sampling.
+
+## Architecture
+
+The observability stack integrates at multiple layers:
+
+1. **HTTP Layer**: `ObservabilityMiddleware` tracks all HTTP requests
+2. **MCP Layer**: Tools use `@trace_mcp_tool` for span creation
+3. **Client Layer**: `BaseNextcloudClient` tracks all API calls
+4. **OAuth Layer**: Token operations are traced and metered
+5. **Background Tasks**: Vector sync operations emit metrics/traces
+
+All components share a single Prometheus `Registry` and OpenTelemetry `TracerProvider`.
+
+## References
+
+- [Prometheus Best Practices](https://prometheus.io/docs/practices/)
+- [OpenTelemetry Python SDK](https://opentelemetry.io/docs/languages/python/)
+- [Prometheus Operator](https://prometheus-operator.dev/)
+- [Grafana Dashboards](https://grafana.com/docs/grafana/latest/dashboards/)
@@ -0,0 +1,921 @@
+# Semantic Search Architecture
+
+This document explains the architecture of the semantic search feature in the Nextcloud MCP Server, including background synchronization, vector search, and optional AI-generated answers via MCP sampling.
+
+> [!IMPORTANT]
+> **Status: Experimental**
+> - Disabled by default (`VECTOR_SYNC_ENABLED=false`)
+> - Currently supports **Notes app only** (multi-app architecture ready, additional apps planned)
+> - Requires additional infrastructure (Qdrant vector database + Ollama embedding service)
+> - RAG answer generation requires MCP client sampling support
+
+## Overview
+
+### What is Semantic Search?
+
+**Semantic search** finds information based on **meaning** rather than exact keyword matches. It uses vector embeddings to understand that "car" and "automobile" are similar, or that "bread recipe" matches "how to bake bread."
+
+**Traditional keyword search:**
+```
+Query: "machine learning"
+Matches: Only notes containing "machine learning" exactly
+Misses: Notes with "neural networks", "AI models", "deep learning"
+```
+
+**Semantic search:**
+```
+Query: "machine learning"
+Matches: Notes about machine learning, neural networks, AI, deep learning, etc.
+Understanding: Semantic similarity via vector embeddings
+```
+
+### Why It Matters
+
+Semantic search enables:
+- **Natural language queries** - Ask questions in plain language
+- **Conceptual discovery** - Find related content even with different terminology
+- **Cross-reference insights** - Connect ideas across your knowledge base
+- **AI-powered answers** - Generate summaries with citations (optional, requires MCP sampling)
+
+### Current Support
+
+- **Supported Apps**: Notes (fully implemented)
+- **Planned Apps**: Calendar events, Calendar tasks, Deck cards, Files (with text extraction), Contacts
+- **Architecture**: Multi-app plugin system is in place; scanners for the planned apps are not yet implemented
+
+## System Components
+
+```mermaid
+graph TB
+    subgraph "MCP Client"
+        Client[Claude Desktop, IDEs, etc.]
+    end
+
+    subgraph "Nextcloud MCP Server"
+        MCP[MCP Server]
+        Scanner[Background Scanner<br/>Periodic Change Detection]
+        Queue[Document Queue]
+        Processor[Embedding Processors<br/>Concurrent Workers]
+    end
+
+    subgraph "Infrastructure"
+        Qdrant[(Qdrant<br/>Vector Database)]
+        Ollama[Ollama<br/>Embedding Service]
+        NC[Nextcloud<br/>Notes API, CalDAV, etc.]
+    end
+
+    Client <-->|MCP Protocol| MCP
+    Scanner -->|Fetch Changes| NC
+    Scanner -->|Enqueue Documents| Queue
+    Queue -->|Process Batch| Processor
+    Processor -->|Generate Embeddings| Ollama
+    Processor -->|Store Vectors| Qdrant
+    MCP -->|Search Queries| Qdrant
+    MCP -->|Verify Access| NC
+```
+
+**Component Roles:**
+
+- **MCP Server**: Exposes semantic search tools (`nc_semantic_search`, `nc_semantic_search_answer`, `nc_get_vector_sync_status`)
+- **Background Scanner**: Discovers changed documents at a fixed scan interval (`VECTOR_SYNC_SCAN_INTERVAL`, default 300 seconds) using ETag-based change detection
+- **Document Queue**: Holds pending documents for embedding generation
+- **Embedding Processors**: Generate vector embeddings via Ollama (concurrent workers)
+- **Qdrant Vector Database**: Stores document vectors with metadata and user_id filtering
+- **Ollama Embedding Service**: Converts text to 768-dimensional vectors (default: `nomic-embed-text` model)
+- **Nextcloud APIs**: Source of truth for documents and access control verification
+
+## How It Works: Background Synchronization
+
+Background synchronization runs automatically when `VECTOR_SYNC_ENABLED=true`, discovering changes and indexing documents without user intervention.
+
+```mermaid
+sequenceDiagram
+    participant Timer
+    participant Scanner
+    participant NC as Nextcloud API
+    participant Queue
+    participant Processor
+    participant Ollama
+    participant Qdrant
+
+    Timer->>Scanner: Trigger (every scan interval)
+    Scanner->>NC: Fetch all notes<br/>(Notes API)
+    NC-->>Scanner: Notes with ETags
+    Scanner->>Qdrant: Check indexed documents
+    Qdrant-->>Scanner: Existing ETags
+    Scanner->>Scanner: Identify changes<br/>(new/modified/deleted)
+    Scanner->>Queue: Enqueue changed docs
+
+    loop Continuous Processing
+        Processor->>Queue: Fetch batch
+        Queue-->>Processor: Documents
+        Processor->>Ollama: Generate embeddings
+        Ollama-->>Processor: 768-dim vectors
+        Processor->>Qdrant: Upsert vectors<br/>(with user_id, doc_type)
+    end
+```
+
+### Scanner Behavior
+
+**Periodic Trigger:**
+- Runs every `VECTOR_SYNC_SCAN_INTERVAL` seconds (default: 300, i.e. every 5 minutes)
+- Fetches all notes from Nextcloud Notes API
+- Compares ETags with Qdrant's indexed state
+- Enqueues new/modified documents
+
+**Change Detection:**
+- **New documents**: No entry in Qdrant → enqueue for indexing
+- **Modified documents**: ETag mismatch → enqueue for re-indexing
+- **Deleted documents**: In Qdrant but not in Nextcloud → delete from Qdrant
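
The three change-detection rules above amount to a comparison of two ETag maps. The sketch below assumes both sides can be represented as `doc_id -> ETag` dictionaries; `detect_changes` is a hypothetical helper, not a function from the codebase.

```python
def detect_changes(
    current: dict[str, str], indexed: dict[str, str]
) -> tuple[list[str], list[str], list[str]]:
    """Compare ETags keyed by document ID.

    current: doc_id -> ETag as reported by Nextcloud
    indexed: doc_id -> ETag stored with the vectors in Qdrant
    Returns (new, modified, deleted) lists of document IDs.
    """
    new = sorted(doc_id for doc_id in current if doc_id not in indexed)
    modified = sorted(
        doc_id
        for doc_id, etag in current.items()
        if doc_id in indexed and indexed[doc_id] != etag
    )
    deleted = sorted(doc_id for doc_id in indexed if doc_id not in current)
    return new, modified, deleted
```

Documents in `new` and `modified` would be enqueued for embedding, while those in `deleted` would be removed from Qdrant.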
+
+**Multi-App Plugin Architecture:**
+```python
+# Each app implements the DocumentScanner interface
+class NotesScanner(DocumentScanner):
+    async def scan(self) -> list[Document]:
+        # Fetch notes, detect changes, return documents
+        ...
+```
+
+Currently only `NotesScanner` is implemented. Future: `CalendarScanner`, `DeckScanner`, `FilesScanner`, etc.
+
+### Queue Processing
+
+**Document Queue:**
+- In-memory FIFO queue (not persistent across restarts)
+- Holds documents pending embedding generation
+- Batch processing for efficiency
+
+**Processor Pool:**
+- Concurrent workers running in an `anyio` task group
+- Process documents in parallel (default: 3 workers, set by `VECTOR_SYNC_PROCESSOR_WORKERS`)
+- Each worker: fetch document → generate embedding → store in Qdrant
+
+**Backpressure Handling:**
+- Queue size limits prevent memory exhaustion
+- Slow consumers (Ollama) naturally pace the system
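
The interaction between a bounded queue and a worker pool can be illustrated as follows. This sketch uses the standard library's `asyncio` rather than `anyio`, so that it runs without extra dependencies; `embed_and_store` and `run_pipeline` are hypothetical stand-ins for the real embedding and storage steps.

```python
import asyncio


async def embed_and_store(doc: str) -> str:
    """Stand-in for 'generate embedding, upsert into Qdrant' (hypothetical)."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return doc.upper()


async def run_pipeline(docs: list[str], workers: int = 3, max_queue: int = 2) -> list[str]:
    queue = asyncio.Queue(maxsize=max_queue)  # bounded: put() waits when full
    results = []

    async def worker() -> None:
        while True:
            doc = await queue.get()
            try:
                results.append(await embed_and_store(doc))
            finally:
                queue.task_done()

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    for doc in docs:
        await queue.put(doc)  # backpressure: blocks while the queue is full
    await queue.join()  # wait until every queued document is processed
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return results
```

The producer can never run more than `max_queue` documents ahead of the workers, which is the memory bound the queue size limit provides.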
+
+### Vector Storage
+
+**Qdrant Collection Schema:**
+```
+{
+  "id": "note_123",
+  "vector": [768 dimensions],
+  "payload": {
+    "user_id": "alice",
+    "doc_type": "note",
+    "doc_id": "123",
+    "title": "Machine Learning Notes",
+    "content": "Neural networks are...",
+    "etag": "abc123",
+    "last_modified": "2025-01-15T10:30:00Z",
+    "chunk_index": 0,
+    "total_chunks": 1,
+    "excerpt": "Neural networks are..."
+  }
+}
+```
+
+**Key Fields:**
+- `user_id`: Multi-tenancy filtering (each user's vectors isolated)
+- `doc_type`: App identifier ("note", "event", "card", etc.)
+- `etag`: Change detection for incremental updates
+- `chunk_index`: Position of this chunk within the document (0-indexed)
+- `total_chunks`: Total number of chunks for this document
+- `excerpt`: First 200 characters of chunk (for display)
+
+### Document Chunking Strategy
+
+Documents are chunked before embedding to handle content larger than the embedding model's context window and to improve search precision.
+
+**Configuration:**
+```dotenv
+DOCUMENT_CHUNK_SIZE=512       # Words per chunk (default)
+DOCUMENT_CHUNK_OVERLAP=50     # Overlapping words between chunks (default)
+```
+
+**Chunking Process:**
+1. **Text combination**: Document title + content (e.g., `"Note Title\n\nNote content..."`)
+2. **Word-based splitting**: Simple whitespace tokenization
+3. **Sliding window**: Create overlapping chunks
+4. **Individual embedding**: Each chunk gets its own vector
+5. **Separate storage**: Each chunk stored as distinct point in Qdrant
+
+**Example:**
+```
+Document (1000 words):
+→ Chunk 0: words 0-511
+→ Chunk 1: words 462-973 (overlaps by 50 words)
+→ Chunk 2: words 924-999 (last chunk, partial)
+
+Each chunk stored as separate vector with metadata:
+- chunk_index: 0, 1, 2
+- total_chunks: 3
+- excerpt: First 200 chars of each chunk
+```
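
The chunking process above can be sketched as a sliding window over whitespace-separated words. `chunk_words` is a hypothetical helper, and stopping once a chunk reaches the final word is an assumption chosen to reproduce the 1000-word example; the server's handling of edge cases may differ.

```python
def chunk_words(text: str, chunk_size: int = 512, overlap: int = 50) -> list[dict]:
    """Split text into overlapping word-based chunks."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()  # simple whitespace tokenization
    step = chunk_size - overlap
    spans = []
    start = 0
    while start < len(words):
        end = min(start + chunk_size, len(words))
        spans.append((start, end))
        if end == len(words):
            break  # the last chunk reached the end of the document
        start += step
    return [
        {
            "chunk_index": i,
            "total_chunks": len(spans),
            "text": " ".join(words[s:e]),
            "excerpt": " ".join(words[s:e])[:200],
            "first_word": s,
            "last_word": e - 1,
        }
        for i, (s, e) in enumerate(spans)
    ]
```

With the defaults, the window advances by 512 - 50 = 462 words per chunk, which is where the start offsets 0, 462, and 924 in the example come from.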
+
+**Search Behavior:**
+- **Vector search** operates on chunks (not whole documents)
+- **Deduplication** collapses multiple matching chunks from same document
+- **Best match** returns highest-scoring chunk's excerpt
+- **Access verification** still performed at document level
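
Deduplication of chunk-level hits might be implemented along these lines. The hit dictionaries and the `deduplicate_hits` name are assumptions for illustration; only the payload field names (`doc_type`, `doc_id`, `chunk_index`) come from the schema described earlier.

```python
def deduplicate_hits(hits: list[dict]) -> list[dict]:
    """Keep the highest-scoring chunk per (doc_type, doc_id), sorted by score."""
    best = {}
    for hit in hits:
        key = (hit["doc_type"], hit["doc_id"])
        if key not in best or hit["score"] > best[key]["score"]:
            best[key] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)
```

Each document then appears once in the results, represented by its best-matching chunk.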
+
+**Tuning Recommendations:**
+- **Small chunks (256-384 words)**: More precise, less context, more storage
+- **Large chunks (768-1024 words)**: More context, less precise, less storage
+- **Overlap (10-20% of chunk size)**: Preserves context across boundaries
+- **Match to embedding model**: Consider model's context window when sizing
+
+**Important**: Changing chunk size requires re-embedding all documents. Auto-generated collection names include only the deployment ID and model name, so set `QDRANT_COLLECTION` explicitly to keep differently chunked indexes in separate collections.
+
+### Collection Naming and Model Switching
+
+**Auto-generated collection names:**
+- **Format:** `{deployment-id}-{model-name}`
+- **Deployment ID:** `OTEL_SERVICE_NAME` (if configured) or `hostname` (fallback)
+- **Model name:** `OLLAMA_EMBEDDING_MODEL`
+- **Example:** `"my-mcp-server-nomic-embed-text"`, `"mcp-container-all-minilm"`
+
+**Why model-based naming:**
+- Ensures each embedding model gets its own collection
+- Prevents dimension mismatches when switching models
+- Enables safe model experimentation (new model = new collection)
+- Supports multi-server deployments (different deployment IDs)
+
+**Switching embedding models:**
+
+Collections are **mutually exclusive** - vectors from one embedding model cannot be used with another. When you change the embedding model:
+
+1. **New collection is created** with the new model's dimensions
+2. **Full re-embedding occurs** - scanner processes all documents again
+3. **Old collection remains** - can be deleted manually if no longer needed
+4. **Dimension validation** - server fails fast if collection dimension doesn't match model
+
+**Example workflow:**
+```bash
+# Start with nomic-embed-text (768 dimensions)
+OLLAMA_EMBEDDING_MODEL=nomic-embed-text
+# Collection: "my-server-nomic-embed-text"
+# → Scanner indexes 1000 notes → at least 1000 vectors in collection (one per chunk)
+
+# Switch to all-minilm (384 dimensions)
+OLLAMA_EMBEDDING_MODEL=all-minilm
+# Collection: "my-server-all-minilm"
+# → Scanner detects 0 indexed documents → re-embeds 1000 notes
+# → Old collection "my-server-nomic-embed-text" still exists in Qdrant
+```
+
+**Re-embedding performance:**
+- CPU-only: 1-5 notes/second
+- With GPU: 50-200 notes/second
+- 1000 notes: roughly 3-17 minutes (CPU) or 5-20 seconds (GPU)
+
+**Multi-server deployments:**
+
+Multiple MCP servers can share one Qdrant instance safely:
+
+```bash
+# Server 1 (Production)
+OTEL_SERVICE_NAME=mcp-prod
+OLLAMA_EMBEDDING_MODEL=nomic-embed-text
+# → Collection: "mcp-prod-nomic-embed-text"
+
+# Server 2 (Staging with different model)
+OTEL_SERVICE_NAME=mcp-staging
+OLLAMA_EMBEDDING_MODEL=all-minilm
+# → Collection: "mcp-staging-all-minilm"
+```
+
+Each deployment gets its own collection - no naming collisions or dimension conflicts.
+
+## How It Works: Semantic Search
+
+Semantic search converts user queries into vectors and finds similar documents using cosine similarity.
+
+```mermaid
+sequenceDiagram
+    participant User
+    participant MCP as MCP Server
+    participant Ollama
+    participant Qdrant
+    participant NC as Nextcloud API
+
+    User->>MCP: nc_semantic_search("machine learning")
+    MCP->>MCP: Check OAuth scope<br/>(semantic:read)
+    MCP->>Ollama: Generate query embedding
+    Ollama-->>MCP: Query vector (768-dim)
+    MCP->>Qdrant: Search similar vectors<br/>(filter: user_id=alice)
+    Qdrant-->>MCP: Top K results<br/>(with similarity scores)
+
+    loop For each result
+        MCP->>NC: Verify access<br/>(fetch note by ID)
+        alt Access granted
+            NC-->>MCP: Note metadata
+        else Access denied (404/401)
+            MCP->>MCP: Filter out result
+        end
+    end
+
+    MCP-->>User: Search results<br/>(with scores, excerpts)
+```
+
+### Dual-Phase Authorization
+
+**Phase 1: OAuth Scope Check**
+- Verify user has `semantic:read` scope
+- Rejects unauthorized users immediately
+
+**Phase 2: Per-Document Verification**
+- For each search result, fetch document via app API (Notes, Calendar, etc.)
+- If fetch succeeds (200 OK), user has access
+- If fetch fails (404 Not Found, 401 Unauthorized), filter out result
+- **Security**: Prevents information leakage from vector search alone
+
+**Rationale:**
+- The vector database doesn't know about sharing, permission changes, or deleted documents
+- App APIs are source of truth for access control
+- Verification ensures users only see documents they can access
+
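The two phases can be sketched as follows. This is a simplified illustration, not the server's implementation: `AccessDenied` and the `fetch_document` callable are hypothetical stand-ins for an app API call that fails with 404 or 401.

```python
import asyncio
from typing import Awaitable, Callable


class AccessDenied(Exception):
    """Hypothetical error standing in for a 404/401 from an app API."""


async def authorize_results(
    scopes: set[str],
    doc_ids: list[int],
    fetch_document: Callable[[int], Awaitable[dict]],
) -> list[dict]:
    # Phase 1: reject the request outright if the scope is missing.
    if "semantic:read" not in scopes:
        raise PermissionError("missing semantic:read scope")

    # Phase 2: keep only the documents the user can fetch right now.
    verified = []
    for doc_id in doc_ids:
        try:
            verified.append(await fetch_document(doc_id))
        except AccessDenied:
            continue  # Filtered out; nothing about the document is returned
    return verified


async def _demo() -> list[dict]:
    accessible = {1: {"id": 1, "title": "Q1 goals"}}

    async def fetch(doc_id: int) -> dict:
        if doc_id not in accessible:
            raise AccessDenied(doc_id)
        return accessible[doc_id]

    return await authorize_results({"semantic:read"}, [1, 2], fetch)


demo_results = asyncio.run(_demo())
```

In the demo, document 2 is dropped silently, so the caller cannot tell a deleted document from one it has no access to.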
+### Search Flow
+
+1. **Query Embedding**: Convert user query to 768-dimensional vector via Ollama
+2. **Vector Search**: Find the top K most similar vectors in Qdrant (cosine similarity)
+3. **User Filtering**: Qdrant applies the `user_id` filter as part of that search, so only the requesting user's vectors are candidates (multi-tenancy)
+4. **Access Verification**: Fetch each document via app API to verify current access
+5. **Result Ranking**: Return results sorted by similarity score
+6. **Response**: Include document excerpts, metadata, and similarity scores
+
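Steps 2, 3, and 5 can be illustrated with a brute-force, in-memory version of the search. Qdrant uses an approximate index instead of a linear scan, so this sketch only shows the scoring and filtering semantics; the point layout (`vector`, `user_id`, `doc_id`) is an assumption made for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_vec: list[float], points: list[dict], user_id: str, limit: int = 10
) -> list[tuple[float, str]]:
    # Restrict candidates to the requesting user's points before scoring.
    candidates = [p for p in points if p["user_id"] == user_id]
    scored = [
        (cosine_similarity(query_vec, p["vector"]), p["doc_id"]) for p in candidates
    ]
    # Highest similarity first, then keep the top K.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


points = [
    {"vector": [1.0, 0.0], "user_id": "alice", "doc_id": "a1"},
    {"vector": [0.0, 1.0], "user_id": "alice", "doc_id": "a2"},
    {"vector": [1.0, 0.0], "user_id": "bob", "doc_id": "b1"},
]
alice_hits = search([1.0, 0.0], points, user_id="alice", limit=2)
```

Bob's point has the same vector as the query, yet it never appears in Alice's results because it is excluded before scoring.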
+### Performance
+
+- **Query latency**: 100-300ms typical (embedding + vector search + verification)
+- **Accuracy**: Depends on embedding model quality (`nomic-embed-text` recommended)
+- **Scalability**: Qdrant handles millions of vectors efficiently
+
+## How It Works: RAG with MCP Sampling (Optional)
+
+The `nc_semantic_search_answer` tool generates AI-powered answers with citations using **MCP sampling** - requesting the MCP client's LLM to generate text.
+
+```mermaid
+sequenceDiagram
+    participant User
+    participant MCP as MCP Server
+    participant Client as MCP Client<br/>(Claude Desktop)
+    participant LLM as Client's LLM<br/>(Claude, GPT, etc.)
+
+    User->>MCP: nc_semantic_search_answer("What are my Q1 goals?")
+    MCP->>MCP: Semantic search<br/>(find relevant notes)
+    MCP->>MCP: Construct prompt<br/>(query + documents + instructions)
+    MCP->>Client: Sampling request<br/>(MCP Protocol)
+    Client->>User: Prompt for approval<br/>(optional, client-controlled)
+    User-->>Client: Approve
+    Client->>LLM: Generate answer<br/>(with context)
+    LLM-->>Client: Answer with citations
+    Client-->>MCP: Sampling response
+    MCP-->>User: Generated answer<br/>(with source documents)
+```
+
+### MCP Sampling Architecture
+
+**Why MCP Sampling?**
+- **No server-side LLM**: MCP server has no API keys, doesn't call LLMs directly
+- **Client controls everything**: Which model, who pays, user approval prompts
+- **Privacy**: Documents are sent only to the LLM provider the client already uses, not to an additional third party
+- **Flexibility**: Works with any MCP client that supports sampling (Claude Desktop, future clients)
+
+**Prompt Construction:**
+```
+User Query: {query}
+
+Relevant Documents:
+1. Document: {title} (Note)
+   Content: {excerpt}
+
+2. Document: {title} (Note)
+   Content: {excerpt}
+
+Instructions:
+- Provide a comprehensive answer to the user's query
+- Use the documents above as context
+- Include citations: "According to Document 1 (title)..."
+- If documents don't contain enough information, say so
+```
+
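A minimal sketch of how a prompt in this shape could be assembled. The function name and the `title`/`type`/`excerpt` keys are assumptions for illustration; the real field names may differ.

```python
def build_prompt(query: str, documents: list[dict]) -> str:
    """Render the query and retrieved documents into the template above."""
    lines = [f"User Query: {query}", "", "Relevant Documents:"]
    for index, doc in enumerate(documents, start=1):
        # Number the documents so the answer can cite "Document N"
        lines.append(f"{index}. Document: {doc['title']} ({doc['type']})")
        lines.append(f"   Content: {doc['excerpt']}")
        lines.append("")
    lines += [
        "Instructions:",
        "- Provide a comprehensive answer to the user's query",
        "- Use the documents above as context",
        '- Include citations: "According to Document 1 (title)..."',
        "- If documents don't contain enough information, say so",
    ]
    return "\n".join(lines)


prompt = build_prompt(
    "What are my Q1 goals?",
    [{"title": "Goals", "type": "Note", "excerpt": "Ship v2"}],
)
```

Numbering the documents in the prompt is what lets the generated answer refer back to specific sources.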
+**Graceful Fallback:**
+```python
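+try:
+    # Ask the MCP client's LLM to generate the answer (arguments elided)
+    result = await ctx.session.create_message(...)
+    # Assumes the client returned text content
+    return SearchResponse(
+        generated_answer=result.content.text,
+        sources=search_results,
+    )
+except Exception as e:
+    # Fallback: return the documents without a generated answer
+    return SearchResponse(
+        generated_answer=f"[Sampling unavailable: {e}]",
+        sources=search_results,
+    )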
+
+**Client Support:**
+- **Requires**: MCP client with sampling capability
+- **Known support**: Claude Desktop (as of Claude 3.5+)
+- **Graceful degradation**: Returns raw documents if sampling unavailable
+
+## Authentication & Security
+
+### OAuth Scopes
+
+**`semantic:read`** - Search permission
+- Allows using `nc_semantic_search` and `nc_semantic_search_answer` tools
+- Does NOT grant access to documents (verified via app APIs)
+- Required for any semantic search operation
+
+**`semantic:write`** - Sync control permission
+- Allows enabling/disabling background sync (`provision_vector_sync`, `deprovision_vector_sync`)
+- Controls whether user's documents are indexed
+- Currently not implemented in OAuth mode (BasicAuth only)
+
+### Dual-Phase Authorization Pattern
+
+**Phase 1: Scope Check** (semantic:read)
+- Verifies user authorized to search
+- Prevents unauthorized vector database access
+
+**Phase 2: Document Verification** (app-specific APIs)
+- For each search result, fetch via Notes API, CalDAV, etc.
+- If user can fetch → include in results
+- If user cannot fetch (404/401) → filter out
+- **Security**: Vector search cannot leak documents user shouldn't see
+
+**Example Scenario:**
+1. Alice creates note "Secret Project X"
+2. Background sync indexes note with `user_id=alice`
+3. Bob searches for "project"
+4. Alice's note is semantically close to the query, so it would match on vector similarity alone
+5. Qdrant filters by `user_id=bob`, so Alice's note is excluded from the results
+6. Even if Bob somehow got the doc_id, Phase 2 verification would fail (404 Not Found)
+
+### Offline Access for Background Sync
+
+**Why needed:**
+- Background scanner runs hourly without user interaction
+- Requires valid access tokens to fetch documents from Nextcloud APIs
+- User's session token expires after hours/days
+
+**OAuth Mode (ADR-004 Flow 2):**
+- User explicitly provisions offline access via `provision_nextcloud_access` tool
+- Server requests `offline_access` scope → receives refresh token
+- Refresh token stored securely (database, encrypted)
+- Background sync uses refresh tokens to obtain access tokens
+
+**BasicAuth Mode:**
+- Username/password stored in environment variables
+- Always available for background operations
+- Simpler but less secure (credentials never expire)
+
+## Deployment Modes
+
+### Authentication Modes
+
+| Mode | Security | Offline Access | Background Sync | Best For |
+|------|----------|----------------|-----------------|----------|
+| **BasicAuth** | Lower (credentials in env) | Always available | ✅ Works immediately | Single-user, development, testing |
+| **OAuth** | Higher (tokens, scopes) | User must provision | ⚠️ Not yet implemented | Multi-user, production |
+
+**BasicAuth:**
+- Set `NEXTCLOUD_USERNAME` and `NEXTCLOUD_PASSWORD`
+- Background sync works immediately when `VECTOR_SYNC_ENABLED=true`
+- Credentials stored in `.env` file (secure server access required)
+
+**OAuth:**
+- Client authenticates with `semantic:read` scope
+- User must explicitly provision offline access (future: `provision_vector_sync` tool)
+- Background sync only works for users who provisioned access
+- More secure: tokens expire, user controls access
+
+### Qdrant Deployment Modes
+
+| Mode | Configuration | Persistence | Scalability | Best For |
+|------|---------------|-------------|-------------|----------|
+| **In-Memory** (default) | `QDRANT_LOCATION=:memory:` | ❌ Lost on restart | Single instance | Testing, development |
+| **Persistent Local** | `QDRANT_LOCATION=/data/qdrant` | ✅ Survives restarts | Single instance | Small deployments |
+| **Network** | `QDRANT_URL=http://qdrant:6333` | ✅ Dedicated service | ✅ Horizontal scaling | Production |
+
+**In-Memory Mode:**
+```bash
+VECTOR_SYNC_ENABLED=true
+# QDRANT_LOCATION not set → defaults to :memory:
+```
+- Fastest startup
+- No disk I/O
+- **Warning**: All vectors lost when server restarts (must re-index)
+
+**Persistent Local Mode:**
+```bash
+VECTOR_SYNC_ENABLED=true
+QDRANT_LOCATION=/var/lib/qdrant
+```
+- Vectors survive restarts
+- Single server only (no distributed setup)
+- Disk I/O for durability
+
+**Network Mode (Recommended for Production):**
+```bash
+VECTOR_SYNC_ENABLED=true
+QDRANT_URL=http://qdrant:6333
+QDRANT_API_KEY=secret  # optional
+```
+- Dedicated Qdrant service (Docker, Kubernetes)
+- Horizontal scaling (multiple MCP servers → one Qdrant)
+- High availability options
+
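The choice between the three modes can be sketched as a pure function over the environment. The function name and the rule that `QDRANT_URL` wins when both variables are set are assumptions; the server's actual precedence may differ.

```python
def select_qdrant_mode(env: dict[str, str]) -> dict[str, str]:
    """Map the Qdrant environment variables to one of the three modes."""
    url = env.get("QDRANT_URL")
    if url:
        # Network mode; the API key is optional
        settings = {"mode": "network", "url": url}
        if env.get("QDRANT_API_KEY"):
            settings["api_key"] = env["QDRANT_API_KEY"]
        return settings

    # Neither variable set: default to in-memory
    location = env.get("QDRANT_LOCATION", ":memory:")
    if location == ":memory:":
        return {"mode": "memory", "location": ":memory:"}
    return {"mode": "local", "path": location}


default_mode = select_qdrant_mode({})
local_mode = select_qdrant_mode({"QDRANT_LOCATION": "/var/lib/qdrant"})
network_mode = select_qdrant_mode(
    {"QDRANT_URL": "http://qdrant:6333", "QDRANT_API_KEY": "secret"}
)
```

The resulting settings would plausibly map onto the `location`, `path`, and `url`/`api_key` arguments of `qdrant_client.QdrantClient`, though the sketch deliberately avoids importing the client.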
+### Embedding Service Options
+
+| Service | Configuration | Cost | Performance | Best For |
+|---------|---------------|------|-------------|----------|
+| **Ollama** (recommended) | `OLLAMA_BASE_URL=http://ollama:11434` | Free (self-hosted) | Fast (local GPU) | Production, development |
+| **OpenAI** (future) | `OPENAI_API_KEY=sk-...` | Paid (API) | Fast (cloud) | Cloud deployments |
+| **Fallback** | No config | Free | Slow (random) | Testing only (not production) |
+
+**Ollama Setup (Recommended):**
+```yaml
+# docker-compose.yml
+services:
+  ollama:
+    image: ollama/ollama
+    volumes:
+      - ollama-data:/root/.ollama
+    ports:
+      - "11434:11434"
+
+volumes:
+  ollama-data:
+```
+
+```bash
+# Pull embedding model
+docker compose exec ollama ollama pull nomic-embed-text
+```
+
+**Environment Configuration:**
+```bash
+OLLAMA_BASE_URL=http://ollama:11434
+OLLAMA_EMBEDDING_MODEL=nomic-embed-text  # 768-dimensional vectors
+```
+
+**Model Options:**
+- `nomic-embed-text` (default): 768-dim, optimized for semantic search
+- `all-minilm`: 384-dim; smaller, faster, slightly less accurate
+- `mxbai-embed-large`: 1024-dim; larger, more accurate, slower
+
+## Configuration Overview
+
+### Key Environment Variables
+
+**Enable Semantic Search:**
+```bash
+VECTOR_SYNC_ENABLED=true  # Default: false (opt-in)
+```
+
+**Qdrant Vector Database:**
+```bash
+# In-memory mode (default if VECTOR_SYNC_ENABLED=true)
+# QDRANT_LOCATION not set → uses :memory:
+
+# Persistent local mode
+QDRANT_LOCATION=/var/lib/qdrant
+
+# Network mode (production)
+QDRANT_URL=http://qdrant:6333
+QDRANT_API_KEY=secret  # optional
+```
+
+**Ollama Embedding Service:**
+```bash
+OLLAMA_BASE_URL=http://ollama:11434
+OLLAMA_EMBEDDING_MODEL=nomic-embed-text  # Default
+```
+
+**Scanner Configuration:**
+```bash
+VECTOR_SYNC_INTERVAL=3600  # Scan interval in seconds (default: 1 hour)
+```
+
+### Resource Requirements
+
+**Qdrant:**
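+- **Memory**: ~100-200 MB base + ~3 KB per 768-dimensional float32 vector (768 × 4 bytes), before index and payload overhead (1M vectors ≈ 3 GB)
+- **Disk**: Persistent mode only; expect at least the raw vector size per point, plus payload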
+- **CPU**: Low (indexing) to moderate (search)
+
+**Ollama:**
+- **Memory**: 2-4 GB for `nomic-embed-text` model
+- **CPU**: High during embedding generation, idle otherwise
+- **GPU**: Optional but recommended (10-100x faster)
+
+**MCP Server:**
+- **Memory**: +50-100 MB for background sync workers
+- **CPU**: Moderate during scanning/processing, low otherwise
+
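As a rough cross-check of the Qdrant per-vector figure, raw vector storage can be computed from the embedding dimension. The sketch assumes uncompressed float32 components and ignores index and payload overhead, so real usage will be higher; the helper names are hypothetical.

```python
def raw_vector_bytes(dimensions: int, bytes_per_component: int = 4) -> int:
    """Bytes needed for one uncompressed vector (float32 by default)."""
    return dimensions * bytes_per_component


def raw_collection_gib(vector_count: int, dimensions: int) -> float:
    """Raw vector storage for a whole collection, in GiB."""
    return vector_count * raw_vector_bytes(dimensions) / 1024**3


nomic_bytes = raw_vector_bytes(768)    # nomic-embed-text
minilm_bytes = raw_vector_bytes(384)   # all-minilm
million_nomic_gib = raw_collection_gib(1_000_000, 768)
```

Halving the dimension halves the raw footprint, which is one reason a smaller embedding model helps when memory is tight.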
+### Trade-offs
+
+| Consideration | In-Memory Qdrant | Persistent Qdrant | Network Qdrant |
+|---------------|------------------|-------------------|----------------|
+| Setup complexity | ✅ Minimal | ✅ Easy | ⚠️ Requires separate service |
+| Durability | ❌ Lost on restart | ✅ Survives restarts | ✅ Survives restarts |
+| Scalability | ❌ Single instance | ❌ Single instance | ✅ Horizontal scaling |
+| Performance | ✅ Fastest | ✅ Fast | ⚠️ Network latency |
+
+## Operational Behavior
+
+### What Happens When VECTOR_SYNC_ENABLED=true
+
+**Immediate (Server Startup):**
+1. MCP server connects to Qdrant (creates collection if needed)
+2. MCP server connects to Ollama (verifies embedding model available)
+3. Background scanner starts (schedules hourly runs)
+4. Document queue and processors initialize
+
+**First Scan (Within 1 hour):**
+1. Scanner fetches all notes from Nextcloud
+2. Compares with Qdrant (likely empty on first run)
+3. Enqueues all notes for indexing
+4. Processors generate embeddings (may take minutes for large note collections)
+5. Vectors stored in Qdrant with user_id filtering
+
+**Hourly Thereafter:**
+1. Scanner fetches all notes
+2. Identifies new/modified/deleted notes (ETag comparison)
+3. Enqueues changes only
+4. Incremental updates processed
+
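The change detection in the hourly scan can be sketched as a comparison of two ETag maps. This is a simplified model with a hypothetical function name; it assumes the scanner has the current ETag for every remote note and the ETag recorded for every indexed note.

```python
def diff_etags(
    remote: dict[str, str], indexed: dict[str, str]
) -> dict[str, list[str]]:
    """Classify document IDs as new, modified, or deleted by comparing ETags."""
    new = sorted(doc_id for doc_id in remote if doc_id not in indexed)
    modified = sorted(
        doc_id
        for doc_id in remote
        if doc_id in indexed and remote[doc_id] != indexed[doc_id]
    )
    deleted = sorted(doc_id for doc_id in indexed if doc_id not in remote)
    return {"new": new, "modified": modified, "deleted": deleted}


changes = diff_etags(
    remote={"n1": "etag-a", "n2": "etag-b2", "n4": "etag-d"},
    indexed={"n1": "etag-a", "n2": "etag-b", "n3": "etag-c"},
)
```

Only the IDs under `new` and `modified` need new embeddings; IDs under `deleted` only require removing vectors, and unchanged notes such as `n1` are skipped.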
+### Performance Expectations
+
+**Embedding Generation:**
+- **Without GPU**: 1-5 notes/second (CPU-bound)
+- **With GPU**: 50-200 notes/second (highly parallel)
+- **Initial indexing**: 100 notes ≈ 20-100 seconds (CPU), 0.5-2 seconds (GPU)
+
+**Search Query:**
+- **Embedding generation**: 50-100ms
+- **Vector search**: 10-50ms (depends on collection size)
+- **Access verification**: 20-100ms per document (Nextcloud API calls)
+- **Total latency**: 100-300ms typical
+
+**Resource Usage:**
+- **Idle**: Minimal (background scanner sleeps)
+- **Scanning**: Moderate CPU (ETag checks, API calls)
+- **Processing**: High CPU/GPU (embedding generation)
+- **Searching**: Low to moderate (depends on query frequency)
+
+### Background Sync Behavior
+
+**Scanner Triggers:**
+- Hourly (configurable via `VECTOR_SYNC_INTERVAL`)
+- Manual trigger via `nc_trigger_vector_sync` (future)
+
+**Queue Processing:**
+- Continuous (workers always running)
+- Batch processing (fetch 10 documents at a time)
+- Concurrent workers (3 by default, set via `VECTOR_SYNC_PROCESSOR_WORKERS`)
+
+**Error Handling:**
+- Individual document failures logged but don't stop scanning
+- Retries for transient errors (network timeouts, rate limits)
+- Failed documents skipped, re-attempted on next scan
+
+**What Gets Indexed:**
+- **Notes**: All notes accessible to the authenticated user
+- **Future**: Calendar events, tasks, deck cards, files with text extraction, contacts
+
+## Monitoring & Observability
+
+### MCP Tools
+
+**`nc_get_vector_sync_status`** - Check sync status
+```json
+{
+  "total_documents": 1234,
+  "indexed_documents": 1200,
+  "pending_documents": 34,
+  "sync_enabled": true,
+  "last_scan": "2025-01-15T14:30:00Z",
+  "status": "syncing"
+}
+```
+
+**Interpreting Status:**
+- `idle`: No pending work, last scan completed successfully
+- `syncing`: Currently processing documents
+- `error`: Last scan failed (check logs)
+
+### Logs to Check
+
+**Scanner Logs:**
+```
+[INFO] Vector sync scanner started (interval: 3600s)
+[INFO] Scanning notes: found 150 documents
+[INFO] Changes detected: 5 new, 2 modified, 1 deleted
+[INFO] Enqueued 7 documents for processing
+```
+
+**Processor Logs:**
+```
+[INFO] Processing document: note_123
+[DEBUG] Generated embedding (768 dimensions)
+[INFO] Stored vector in Qdrant: note_123
+```
+
+**Error Logs:**
+```
+[ERROR] Failed to generate embedding for note_123: Connection timeout
+[WARN] Qdrant connection lost, retrying...
+[ERROR] Ollama embedding failed: Model not found
+```
+
+**Log Locations:**
+- **Docker**: `docker compose logs mcp`
+- **Local**: stdout (redirect to file if needed)
+- **Kubernetes**: `kubectl logs -f deployment/nextcloud-mcp-server`
+
+### Metrics to Monitor
+
+**Indexing Progress:**
+- Total documents vs indexed documents
+- Pending queue size
+- Processing rate (docs/second)
+
+**Search Performance:**
+- Query latency (p50, p95, p99)
+- Results per query
+- Verification overhead (API calls per query)
+
+**Resource Usage:**
+- Qdrant memory/disk usage
+- Ollama CPU/GPU usage
+- MCP server memory
+
+For detailed observability setup, see [docs/observability.md](observability.md).
+
+## Troubleshooting from Architecture Perspective
+
+### Documents Not Appearing in Search
+
+**Diagnosis Flow:**
+1. Check sync status: `nc_get_vector_sync_status`
+   - `sync_enabled: false` → Enable with `VECTOR_SYNC_ENABLED=true`
+   - `status: error` → Check scanner logs for failures
+2. Check queue size:
+   - `pending_documents > 0` → Processing in progress, wait
+   - `pending_documents == 0` but `indexed_documents` low → Scan hasn't run yet (wait up to 1 hour)
+3. Check Qdrant:
+   - Connection errors in logs → Verify `QDRANT_URL` or `QDRANT_LOCATION`
+   - Collection empty → First scan hasn't completed
+4. Check Ollama:
+   - Embedding errors in logs → Verify `OLLAMA_BASE_URL`
+   - Model not found → Pull model: `ollama pull nomic-embed-text`
+
+**Common Causes:**
+- Sync disabled (default): Enable `VECTOR_SYNC_ENABLED=true`
+- Ollama not running: Start Ollama service
+- Qdrant not accessible: Check network/URL
+- First scan in progress: Wait up to 1 hour + processing time
+
+### Slow Search Performance
+
+**Diagnosis:**
+1. **Query embedding slow (>500ms)**:
+   - Ollama overloaded or CPU-bound
+   - Solution: Use GPU, upgrade CPU, or reduce concurrent requests
+2. **Vector search slow (>200ms)**:
+   - Large collection (millions of vectors)
+   - Solution: Use network Qdrant with SSDs, tune the HNSW index, and add a payload index on filtered fields such as `user_id`
+3. **Verification slow (>500ms)**:
+   - Many results to verify (10+ documents)
+   - Nextcloud API slow or overloaded
+   - Solution: Reduce `limit` parameter, optimize Nextcloud
+
+**Performance Tuning:**
+- Reduce search `limit` (default: 10 results)
+- Use network Qdrant for large collections
+- Enable Ollama GPU acceleration
+- Check Nextcloud API response times
+
+### Background Sync Stopped
+
+**Diagnosis:**
+1. Check logs for errors:
+   - Authentication failures (401/403) → Token expired (OAuth) or credentials invalid (BasicAuth)
+   - Connection timeouts → Network issues with Nextcloud/Qdrant/Ollama
+   - Rate limiting (429) → Reduce scan frequency
+2. Check `nc_get_vector_sync_status`:
+   - `status: error` → See logs for details
+   - `last_scan` timestamp old (>2 hours) → Scanner may have crashed
+3. Verify services:
+   - Qdrant accessible: `curl http://qdrant:6333/`
+   - Ollama accessible: `curl http://ollama:11434/api/tags`
+   - Nextcloud accessible: Check API health
+
+**OAuth Mode (Future):**
+- Offline access token expired → Re-provision via `provision_vector_sync`
+- User deprovisioned access → Sync stops intentionally
+
+### Out of Memory
+
+**Diagnosis:**
+1. Check Qdrant mode:
+   - In-memory mode with large collection → Switch to persistent or network mode
+2. Check embedding batch size:
+   - Too many documents processed simultaneously → Reduce worker count
+3. Check Ollama memory:
+   - Large models loaded → Use smaller embedding model
+
+**Solutions:**
+- Use persistent or network Qdrant (frees server memory)
+- Reduce concurrent processor workers
+- Use smaller embedding model (`all-minilm` instead of `nomic-embed-text`)
+- Increase server memory allocation
+
+## Limitations & Future Work
+
+### Current Limitations
+
+1. **Notes App Only**
+   - Architecture supports multiple apps (plugin system ready)
+   - Only `NotesScanner` and `NotesProcessor` implemented
+   - Future: Calendar, Deck, Files, Contacts
+
+2. **MCP Sampling Support**
+   - `nc_semantic_search_answer` requires client sampling capability
+   - Not all MCP clients support sampling yet
+   - Graceful fallback: Returns documents without generated answer
+
+3. **OAuth Background Sync**
+   - User-controlled background jobs not yet implemented
+   - Currently works in BasicAuth mode only
+   - Future: Users opt-in via `provision_vector_sync` tool
+
+4. **No Incremental Updates Within a Document**
+   - Any change to a document triggers re-embedding of the whole document
+   - Cannot update just the modified paragraphs
+   - Future: Paragraph-level chunking and incremental updates
+
+5. **No Query Caching**
+   - Each search generates new query embedding
+   - Repeated queries re-search Qdrant
+   - Future: Cache recent query embeddings and results
+
+6. **Single Embedding Model**
+   - Uses one model for all documents and queries
+   - Cannot customize per app or user
+   - Future: App-specific or user-selected models
+
+### Future Enhancements
+
+**Multi-App Support** (In Progress):
+- Scanner plugins for Calendar, Deck, Files, Contacts
+- Unified vector search across all apps
+- App-specific metadata in vector payloads
+
+**User-Controlled Sync (OAuth Mode)**:
+- `provision_vector_sync` and `deprovision_vector_sync` tools
+- Per-user background job scheduling
+- User dashboard for sync status and controls
+
+**Advanced Search Features**:
+- Hybrid search (vector + keyword combined)
+- Filtering by date range, app type, tags
+- Aggregations and faceted search
+- Search result explanations (why this matched)
+
+**Performance Optimizations**:
+- Query caching for repeated searches
+- Incremental document updates (paragraph-level)
+- Batch query processing
+- Qdrant HNSW indexing tuning
+
+**Embedding Improvements**:
+- Support for OpenAI embeddings (ada-002, text-embedding-3)
+- Multi-language embedding models
+- Fine-tuned models for Nextcloud content
+- Paragraph-level chunking for long documents
+
+## References
+
+### Architecture Decision Records (ADRs)
+
+- **[ADR-003: Vector Database Semantic Search](ADR-003-vector-database-semantic-search.md)** - Qdrant selection rationale, embedding strategy, hybrid search (superseded by ADR-007 but technical decisions remain valid)
+- **[ADR-007: Background Vector Sync Job Management](ADR-007-background-vector-sync-job-management.md)** - Current implementation, Scanner-Queue-Processor architecture, plugin system
+- **[ADR-008: MCP Sampling for Semantic Search](ADR-008-mcp-sampling-for-semantic-search.md)** - RAG with MCP sampling, client-server separation, prompt construction
+- **[ADR-009: Semantic Search OAuth Scope](ADR-009-semantic-search-oauth-scope.md)** - OAuth scope model, dual-phase authorization, security rationale
+
+### Configuration & Setup
+
+- **[Configuration Guide](configuration.md)** - Environment variables, Qdrant setup, Ollama setup, detailed configuration options
+- **[Installation Guide](installation.md)** - Deployment options (Docker, Kubernetes, local)
+- **[Running the Server](running.md)** - Starting the server, transport options, testing
+
+### Monitoring & Troubleshooting
+
+- **[Observability Guide](observability.md)** - Logging, metrics, tracing, debugging
+- **[Troubleshooting](troubleshooting.md)** - General issues and solutions
+
+### Related Documentation
+
+- **[OAuth Architecture](oauth-architecture.md)** - OAuth flows, scopes, token management
+- **[Comparison with Context Agent](comparison-context-agent.md)** - When to use Nextcloud MCP Server vs Context Agent
+
+---
+
+**Questions or Issues?**
+- [Open an issue](https://github.com/cbcoutinho/nextcloud-mcp-server/issues)
+- [Contribute improvements](https://github.com/cbcoutinho/nextcloud-mcp-server/pulls)
@@ -124,3 +124,75 @@ ENABLE_CUSTOM_PROCESSOR=false

 # Comma-separated MIME types your processor supports
 #CUSTOM_PROCESSOR_TYPES=application/pdf,image/jpeg,image/png
+
+# ============================================
+# Semantic Search & Vector Sync Configuration
+# ============================================
+# EXPERIMENTAL: Semantic search for Notes app (multi-app support planned)
+# Requires: Qdrant vector database + Ollama embedding service
+# Disabled by default
+
+# Enable background vector indexing
+VECTOR_SYNC_ENABLED=false
+
+# Document scan interval in seconds (default: 300 = 5 minutes)
+# How often to check for new/updated documents
+#VECTOR_SYNC_SCAN_INTERVAL=300
+
+# Concurrent indexing workers (default: 3)
+# Number of parallel workers for embedding generation
+#VECTOR_SYNC_PROCESSOR_WORKERS=3
+
+# Max queued documents (default: 10000)
+# Maximum documents waiting to be processed
+#VECTOR_SYNC_QUEUE_MAX_SIZE=10000
+
+# ============================================
+# Qdrant Vector Database Configuration
+# ============================================
+# Choose ONE of three modes:
+# 1. In-memory mode (default): Set neither QDRANT_URL nor QDRANT_LOCATION
+# 2. Persistent local: Set QDRANT_LOCATION=/path/to/data
+# 3. Network mode: Set QDRANT_URL=http://qdrant:6333
+
+# Network mode: URL to Qdrant service
+#QDRANT_URL=http://qdrant:6333
+
+# Local mode: Path to store vectors (use :memory: for in-memory)
+#QDRANT_LOCATION=:memory:
+
+# API key for network mode (optional)
+#QDRANT_API_KEY=
+
+# Collection name (optional - auto-generated if not set)
+# Auto-generation format: {deployment-id}-{model-name}
+# Allows safe model switching and multi-server deployments
+#QDRANT_COLLECTION=nextcloud_content
+
+# ============================================
+# Ollama Embedding Service Configuration
+# ============================================
+# Ollama endpoint for embeddings (if not set, uses SimpleEmbeddingProvider fallback)
+#OLLAMA_BASE_URL=http://ollama:11434
+
+# Embedding model to use (default: nomic-embed-text, 768 dimensions)
+# Changing this creates a new collection (requires re-embedding all documents)
+#OLLAMA_EMBEDDING_MODEL=nomic-embed-text
+
+# Verify SSL certificates (default: true)
+#OLLAMA_VERIFY_SSL=true
+
+# ============================================
+# Document Chunking Configuration
+# ============================================
+# Configure how documents are split before embedding
+
+# Words per chunk (default: 512)
+# Smaller chunks (256-384): More precise, less context, more storage
+# Larger chunks (768-1024): More context, less precise, less storage
+#DOCUMENT_CHUNK_SIZE=512
+
+# Overlapping words between chunks (default: 50)
+# Recommended: 10-20% of chunk size
+# Preserves context across chunk boundaries
+#DOCUMENT_CHUNK_OVERLAP=50
@@ -8,9 +8,11 @@ from typing import TYPE_CHECKING, Optional
 if TYPE_CHECKING:
    from nextcloud_mcp_server.auth.refresh_token_storage import RefreshTokenStorage

+import anyio
 import click
 import httpx
 import uvicorn
+from anyio.streams.memory import MemoryObjectReceiveStream, MemoryObjectSendStream
 from mcp.server.auth.settings import AuthSettings
 from mcp.server.fastmcp import Context, FastMCP
 from pydantic import AnyHttpUrl
@@ -30,23 +32,30 @@ from nextcloud_mcp_server.auth import (
 from nextcloud_mcp_server.auth.unified_verifier import UnifiedTokenVerifier
 from nextcloud_mcp_server.client import NextcloudClient
 from nextcloud_mcp_server.config import (
-    LOGGING_CONFIG,
    get_document_processor_config,
-    setup_logging,
+    get_settings,
 )
 from nextcloud_mcp_server.context import get_client as get_nextcloud_client
 from nextcloud_mcp_server.document_processors import get_registry
+from nextcloud_mcp_server.observability import (
+    ObservabilityMiddleware,
+    get_uvicorn_logging_config,
+    setup_metrics,
+    setup_tracing,
+)
 from nextcloud_mcp_server.server import (
    configure_calendar_tools,
    configure_contacts_tools,
    configure_cookbook_tools,
    configure_deck_tools,
    configure_notes_tools,
+    configure_semantic_tools,
    configure_sharing_tools,
    configure_tables_tools,
    configure_webdav_tools,
 )
 from nextcloud_mcp_server.server.oauth_tools import register_oauth_tools
+from nextcloud_mcp_server.vector import processor_task, scanner_task

 logger = logging.getLogger(__name__)

@@ -206,6 +215,10 @@ class AppContext:
    """Application context for BasicAuth mode."""

    client: NextcloudClient
+    document_send_stream: Optional[MemoryObjectSendStream] = None
+    document_receive_stream: Optional[MemoryObjectReceiveStream] = None
+    shutdown_event: Optional[anyio.Event] = None
+    scanner_wake_event: Optional[anyio.Event] = None


@dataclass
@@ -369,6 +382,9 @@ async def app_lifespan_basic(server: FastMCP) -> AsyncIterator[AppContext]:

    Creates a single Nextcloud client with basic authentication
    that is shared across all requests.
+
+    If vector sync is enabled (VECTOR_SYNC_ENABLED=true), also starts
+    background tasks for automatic document indexing (ADR-007).
    """
    logger.info("Starting MCP server in BasicAuth mode")
    logger.info("Creating Nextcloud client with BasicAuth")
@@ -379,11 +395,77 @@ async def app_lifespan_basic(server: FastMCP) -> AsyncIterator[AppContext]:
    # Initialize document processors
    initialize_document_processors()

-    try:
-        yield AppContext(client=client)
-    finally:
-        logger.info("Shutting down BasicAuth mode")
-        await client.close()
+    settings = get_settings()
+
+    # Check if vector sync is enabled
+    if settings.vector_sync_enabled:
+        logger.info("Vector sync enabled - starting background tasks")
+
+        # Get username from environment for BasicAuth mode
+        username = os.getenv("NEXTCLOUD_USERNAME")
+        if not username:
+            raise ValueError(
+                "NEXTCLOUD_USERNAME is required for vector sync in BasicAuth mode"
+            )
+
+        # Initialize shared state
+        send_stream, receive_stream = anyio.create_memory_object_stream(
+            max_buffer_size=settings.vector_sync_queue_max_size
+        )
+        shutdown_event = anyio.Event()
+        scanner_wake_event = anyio.Event()
+
+        # Start background tasks using anyio TaskGroup
+        async with anyio.create_task_group() as tg:
+            # Start scanner task
+            tg.start_soon(
+                scanner_task,
+                send_stream,
+                shutdown_event,
+                scanner_wake_event,
+                client,
+                username,
+            )
+
+            # Start processor pool (each gets a cloned receive stream)
+            for i in range(settings.vector_sync_processor_workers):
+                tg.start_soon(
+                    processor_task,
+                    i,
+                    receive_stream.clone(),
+                    shutdown_event,
+                    client,
+                    username,
+                )
+
+            logger.info(
+                f"Background sync tasks started: 1 scanner + {settings.vector_sync_processor_workers} processors"
+            )
+
+            # Yield with background tasks running
+            try:
+                yield AppContext(
+                    client=client,
+                    document_send_stream=send_stream,
+                    document_receive_stream=receive_stream,
+                    shutdown_event=shutdown_event,
+                    scanner_wake_event=scanner_wake_event,
+                )
+            finally:
+                # Shutdown signal
+                logger.info("Shutting down background sync tasks")
+                shutdown_event.set()
+
+                # An anyio task group waits for its tasks on a normal exit rather
+                # than cancelling them, so tasks must stop once they observe
+                # shutdown_event
+                logger.info("Background sync tasks stopped")
+                await client.close()
+    else:
+        # No vector sync - simple lifecycle
+        try:
+            yield AppContext(client=client)
+        finally:
+            logger.info("Shutting down BasicAuth mode")
+            await client.close()


 async def setup_oauth_config():
@@ -698,7 +780,28 @@ async def setup_oauth_config():


 def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
-    setup_logging()
+    # Initialize observability (logging will be configured by uvicorn)
+    settings = get_settings()
+
+    # Set up Prometheus metrics (enabled by default)
+    if settings.metrics_enabled:
+        setup_metrics(port=settings.metrics_port)
+        logger.info(
+            f"Prometheus metrics enabled on dedicated port {settings.metrics_port}"
+        )
+
+    # Setup OpenTelemetry tracing (optional)
+    if settings.tracing_enabled:
+        setup_tracing(
+            service_name=settings.otel_service_name,
+            otlp_endpoint=settings.otel_exporter_otlp_endpoint,
+            sampling_rate=settings.otel_traces_sampler_arg,
+        )
+        logger.info(
+            f"OpenTelemetry tracing enabled (endpoint: {settings.otel_exporter_otlp_endpoint})"
+        )
+    else:
+        logger.info("OpenTelemetry tracing disabled (set OTEL_ENABLED=true to enable)")

    # Determine authentication mode
    oauth_enabled = is_oauth_mode()
@@ -798,6 +901,14 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
                f"Unknown app: {app_name}. Available apps: {list(available_apps.keys())}"
            )

+    # Register semantic search tools (cross-app feature)
+    settings = get_settings()
+    if settings.vector_sync_enabled:
+        logger.info("Configuring semantic search tools (vector sync enabled)")
+        configure_semantic_tools(mcp)
+    else:
+        logger.info("Skipping semantic search tools (VECTOR_SYNC_ENABLED not set)")
+
    # Register OAuth provisioning tools (only when offline access is enabled)
    # With token exchange enabled (external IdP), provisioning is not needed for MCP operations
    enable_token_exchange = (
@@ -924,9 +1035,95 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
                    f"OAuth context initialized for login routes (client_id={client_id[:16]}...)"
                )

-            async with AsyncExitStack() as stack:
-                await stack.enter_async_context(mcp.session_manager.run())
-                yield
+            # Start background vector sync tasks for BasicAuth mode (ADR-007)
+            # For streamable-http transport, FastMCP lifespan isn't automatically triggered
+            # so we manually start background tasks here if vector sync is enabled
+            import anyio as anyio_module
+
+            settings = get_settings()
+            if not oauth_enabled and settings.vector_sync_enabled:
+                logger.info("Starting background vector sync tasks for BasicAuth mode")
+
+                # Get username from environment
+                username = os.getenv("NEXTCLOUD_USERNAME")
+                if not username:
+                    raise ValueError(
+                        "NEXTCLOUD_USERNAME required for vector sync in BasicAuth mode"
+                    )
+
+                # Create a dedicated Nextcloud client from environment variables:
+                # we're outside the FastMCP lifespan, so no MCP app context is available here
+                client = NextcloudClient.from_env()
+
+                # Initialize shared state
+                send_stream, receive_stream = anyio_module.create_memory_object_stream(
+                    max_buffer_size=settings.vector_sync_queue_max_size
+                )
+                shutdown_event = anyio_module.Event()
+                scanner_wake_event = anyio_module.Event()
+
+                # Store in app state for access from routes (ADR-007)
+                app.state.document_send_stream = send_stream
+                app.state.document_receive_stream = receive_stream
+                app.state.shutdown_event = shutdown_event
+                app.state.scanner_wake_event = scanner_wake_event
+
+                # Also share with browser_app for /user/page route
+                for route in app.routes:
+                    if isinstance(route, Mount) and route.path == "/user":
+                        route.app.state.document_send_stream = send_stream
+                        route.app.state.document_receive_stream = receive_stream
+                        route.app.state.shutdown_event = shutdown_event
+                        route.app.state.scanner_wake_event = scanner_wake_event
+                        logger.info(
+                            "Vector sync state shared with browser_app for /user/page"
+                        )
+                        break
+
+                # Start background tasks using anyio TaskGroup
+                async with anyio_module.create_task_group() as tg:
+                    # Start scanner task
+                    tg.start_soon(
+                        scanner_task,
+                        send_stream,
+                        shutdown_event,
+                        scanner_wake_event,
+                        client,
+                        username,
+                    )
+
+                    # Start processor pool (each gets a cloned receive stream)
+                    for i in range(settings.vector_sync_processor_workers):
+                        tg.start_soon(
+                            processor_task,
+                            i,
+                            receive_stream.clone(),
+                            shutdown_event,
+                            client,
+                            username,
+                        )
+
+                    logger.info(
+                        f"Background sync tasks started: 1 scanner + "
+                        f"{settings.vector_sync_processor_workers} processors"
+                    )
+
+                    # Run MCP session manager and yield
+                    async with AsyncExitStack() as stack:
+                        await stack.enter_async_context(mcp.session_manager.run())
+                        try:
+                            yield
+                        finally:
+                            # Shutdown signal
+                            logger.info("Shutting down background sync tasks")
+                            shutdown_event.set()
+                            await client.close()
+                            # TaskGroup exit waits for tasks; they are expected to stop once shutdown_event is set
+            else:
+                # No vector sync - just run MCP session manager
+                async with AsyncExitStack() as stack:
+                    await stack.enter_async_context(mcp.session_manager.run())
+                    yield

    # Health check endpoints for Kubernetes probes
    def health_live(request):
@@ -946,7 +1143,7 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
        """Readiness probe endpoint.

        Returns 200 OK if the application is ready to serve traffic.
-        Checks that required configuration is present.
+        Checks that required configuration is present and Qdrant if vector sync enabled.
        """
        checks = {}
        is_ready = True
@@ -976,6 +1173,29 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
                checks["auth_configured"] = "error: credentials not set"
                is_ready = False

+        # Check Qdrant status if using network mode (external Qdrant service)
+        # In-memory and persistent modes use embedded Qdrant, no external service to check
+        vector_sync_enabled = (
+            os.getenv("VECTOR_SYNC_ENABLED", "false").lower() == "true"
+        )
+        qdrant_url = os.getenv("QDRANT_URL")  # Only set in network mode
+
+        if vector_sync_enabled and qdrant_url:
+            try:
+                async with httpx.AsyncClient(timeout=2.0) as client:
+                    response = await client.get(f"{qdrant_url}/readyz")
+                    if response.status_code == 200:
+                        checks["qdrant"] = "ok"
+                    else:
+                        checks["qdrant"] = f"error: status {response.status_code}"
+                        is_ready = False
+            except Exception as e:
+                checks["qdrant"] = f"error: {e}"
+                is_ready = False
+        elif vector_sync_enabled:
+            # Using embedded Qdrant (memory or persistent mode)
+            checks["qdrant"] = "embedded"
+
        status_code = 200 if is_ready else 503
        return JSONResponse(
            {
@@ -993,6 +1213,9 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
    routes.append(Route("/health/ready", health_ready, methods=["GET"]))
    logger.info("Health check endpoints enabled: /health/live, /health/ready")

+    # Note: Metrics endpoint is NOT exposed on main HTTP port for security reasons.
+    # Metrics are served on dedicated port via setup_metrics() (default: 9090)
+
    if oauth_enabled:
        # Import OAuth routes (ADR-004 Progressive Consent)
        from nextcloud_mcp_server.auth.oauth_routes import oauth_authorize
@@ -1156,7 +1379,7 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
        "Routes: /user/* with SessionAuth, /mcp with FastMCP OAuth Bearer tokens"
    )

-    # Add debugging middleware to log Authorization headers
+    # Add debugging middleware to log Authorization headers and client capabilities
    @app.middleware("http")
    async def log_auth_headers(request, call_next):
        auth_header = request.headers.get("authorization")
@@ -1171,6 +1394,52 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
                logger.warning(
                    f"⚠️  /mcp request WITHOUT Authorization header from {request.client}"
                )
+
+            # Log client capabilities on initialize request
+            if request.method == "POST":
+                # Read body to check for initialize request
+                # Starlette caches the body internally, so it's safe to read here
+                body = await request.body()
+                try:
+                    import json
+
+                    data = json.loads(body)
+                    # Check if this is an initialize request
+                    if data.get("method") == "initialize":
+                        params = data.get("params", {})
+                        capabilities = params.get("capabilities", {})
+                        client_info = params.get("clientInfo", {})
+
+                        logger.info(
+                            f"🔌 MCP client connected: {client_info.get('name', 'unknown')} "
+                            f"v{client_info.get('version', 'unknown')}"
+                        )
+
+                        # Log capabilities in a structured way
+                        cap_summary = []
+                        # Check for presence using 'in' not truthiness (empty dict {} counts as having capability)
+                        if "roots" in capabilities:
+                            cap_summary.append("roots")
+                        if "sampling" in capabilities:
+                            cap_summary.append("sampling")
+                        if "experimental" in capabilities:
+                            cap_summary.append(
+                                f"experimental({len(capabilities['experimental'])} features)"
+                            )
+
+                        logger.info(
+                            f"📋 Client capabilities: {', '.join(cap_summary) if cap_summary else 'none'}"
+                        )
+                        # Log full capabilities at INFO level to diagnose capability issues
+                        logger.info(
+                            f"Full capabilities JSON: {json.dumps(capabilities)}"
+                        )
+                except Exception as e:
+                    # Don't fail the request if logging fails
+                    logger.debug(
+                        f"Failed to parse MCP request for capability logging: {e}"
+                    )
+
        response = await call_next(request)
        return response

@@ -1184,6 +1453,11 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
        expose_headers=["*"],
    )

+    # Add observability middleware (metrics + tracing)
+    if settings.metrics_enabled or settings.tracing_enabled:
+        app.add_middleware(ObservabilityMiddleware)
+        logger.info("Observability middleware enabled (metrics and/or tracing)")
+
    # Add exception handler for scope challenges (OAuth mode only)
    if oauth_enabled:

@@ -1440,8 +1714,20 @@ def run(

    app = get_app(transport=transport, enabled_apps=enabled_apps)

+    # Get observability settings and create uvicorn logging config
+    settings = get_settings()
+    uvicorn_log_config = get_uvicorn_logging_config(
+        log_format=settings.log_format,
+        log_level=settings.log_level,
+        include_trace_context=settings.log_include_trace_context,
+    )
+
    uvicorn.run(
-        app=app, host=host, port=port, log_level=log_level, log_config=LOGGING_CONFIG
+        app=app,
+        host=host,
+        port=port,
+        log_level=log_level,
+        log_config=uvicorn_log_config,
    )


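Note on the BasicAuth lifespan hunk above: it wires one scanner and several processors to a bounded in-memory stream. The sketch below shows the same producer/consumer shape using only the standard library. It is an illustration, not the code in this change: `asyncio.Queue` stands in for the anyio memory object stream, and `scanner`, `processor`, `run_pipeline` and the `_DONE` sentinel are hypothetical names.

```python
import asyncio

_DONE = object()  # hypothetical sentinel telling a processor to stop


async def scanner(queue, shutdown_event, items, workers):
    # Producer: enqueue items until exhausted or shutdown is requested.
    for item in items:
        if shutdown_event.is_set():
            break
        await queue.put(item)  # waits while the bounded queue is full
    # One sentinel per worker so every processor terminates.
    for _ in range(workers):
        await queue.put(_DONE)


async def processor(worker_id, queue, results):
    # Consumer: pull items until the sentinel arrives.
    while True:
        item = await queue.get()
        if item is _DONE:
            break
        results.append((worker_id, item))


async def run_pipeline(items, workers=3, queue_size=10):
    queue = asyncio.Queue(maxsize=queue_size)  # bounded, like the sync queue
    shutdown_event = asyncio.Event()
    results = []
    await asyncio.gather(
        scanner(queue, shutdown_event, items, workers),
        *(processor(i, queue, results) for i in range(workers)),
    )
    return results


if __name__ == "__main__":
    processed = asyncio.run(run_pipeline(list(range(25))))
    print(f"processed {len(processed)} items")
```

The bounded queue gives back-pressure: the scanner pauses when processors fall behind, which is presumably the purpose of `vector_sync_queue_max_size` in the real code.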
@@ -79,19 +79,22 @@ async def register_client(
    client_name: str = "Nextcloud MCP Server",
    redirect_uris: list[str] | None = None,
    scopes: str = "openid profile email",
-    token_type: str = "Bearer",
+    token_type: str | None = "Bearer",
    resource_url: str | None = None,
 ) -> ClientInfo:
    """
-    Register a new OAuth client with Nextcloud OIDC using dynamic client registration.
+    Register a new OAuth client using RFC 7591 Dynamic Client Registration.
+
+    This function supports both Nextcloud OIDC and standard OIDC providers like Keycloak.

    Args:
-        nextcloud_url: Base URL of the Nextcloud instance
+        nextcloud_url: Base URL of the OIDC provider
        registration_endpoint: Full URL to the registration endpoint
        client_name: Name of the client application
        redirect_uris: List of redirect URIs (default: http://localhost:8000/oauth/callback)
        scopes: Space-separated list of scopes to request
-        token_type: Type of access tokens to issue (default: "Bearer", also supports "JWT")
+        token_type: Type of access tokens (default: "Bearer", supports "JWT" for Nextcloud).
+                    Set to None to omit this field (required for Keycloak and other standard providers).
        resource_url: OAuth 2.0 Protected Resource URL (RFC 9728) - used for token introspection authorization

    Returns:
@@ -100,6 +103,11 @@ async def register_client(
    Raises:
        httpx.HTTPStatusError: If registration fails
        ValueError: If response is invalid
+
+    Note:
+        The token_type parameter is a Nextcloud-specific extension and is not part of RFC 7591.
+        Standard OIDC providers like Keycloak do not accept this field and will return a 400 error
+        if it's included. Set token_type=None when registering with Keycloak or other standard providers.
    """
    if redirect_uris is None:
        redirect_uris = ["http://localhost:8000/oauth/callback"]
@@ -111,9 +119,12 @@ async def register_client(
        "grant_types": ["authorization_code", "refresh_token"],
        "response_types": ["code"],
        "scope": scopes,
-        "token_type": token_type,
    }

+    # Add token_type if provided (Nextcloud-specific, not RFC 7591 standard)
+    if token_type is not None:
+        client_metadata["token_type"] = token_type
+
    # Add resource_url if provided (RFC 9728)
    if resource_url:
        client_metadata["resource_url"] = resource_url
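Note on the `register_client` change above: the effect of `token_type=None` is easiest to see by comparing the two request bodies. This is a minimal sketch, assuming the metadata also carries the RFC 7591 fields `client_name` and `redirect_uris` (not visible in this hunk); `build_client_metadata` is a hypothetical helper, not a function from this change.

```python
def build_client_metadata(
    client_name,
    redirect_uris,
    scopes="openid profile email",
    token_type="Bearer",
    resource_url=None,
):
    """Hypothetical helper mirroring the payload logic shown in the hunk."""
    metadata = {
        "client_name": client_name,  # assumed field name (RFC 7591)
        "redirect_uris": redirect_uris,  # assumed field name (RFC 7591)
        "grant_types": ["authorization_code", "refresh_token"],
        "response_types": ["code"],
        "scope": scopes,
    }
    # Nextcloud-specific extension: include only when explicitly requested.
    if token_type is not None:
        metadata["token_type"] = token_type
    if resource_url:
        metadata["resource_url"] = resource_url
    return metadata


if __name__ == "__main__":
    uris = ["http://localhost:8000/oauth/callback"]
    print(build_client_metadata("demo", uris))  # Nextcloud-style payload
    print(build_client_metadata("demo", uris, token_type=None))  # standard payload
```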
@@ -19,6 +19,75 @@ from starlette.responses import HTMLResponse, JSONResponse
 logger = logging.getLogger(__name__)


+async def _get_processing_status(request: Request) -> dict[str, Any] | None:
+    """Get vector sync processing status.
+
+    Returns processing status information including indexed count, pending count,
+    and sync status. Only available when VECTOR_SYNC_ENABLED=true.
+
+    Args:
+        request: Starlette request object
+
+    Returns:
+        Dictionary with processing status, or None if vector sync is disabled
+        or components are unavailable:
+        {
+            "indexed_count": int,  # Number of documents in Qdrant
+            "pending_count": int,  # Number of documents in queue
+            "status": str,  # "syncing" or "idle"
+        }
+    """
+    # Check if vector sync is enabled
+    vector_sync_enabled = os.getenv("VECTOR_SYNC_ENABLED", "false").lower() == "true"
+    if not vector_sync_enabled:
+        return None
+
+    try:
+        # Get document receive stream from app state
+        document_receive_stream = getattr(
+            request.app.state, "document_receive_stream", None
+        )
+        if document_receive_stream is None:
+            logger.debug("document_receive_stream not available in app state")
+            return None
+
+        # Get pending count from stream statistics
+        stats = document_receive_stream.statistics()
+        pending_count = stats.current_buffer_used
+
+        # Get Qdrant client and query indexed count
+        indexed_count = 0
+        try:
+            from nextcloud_mcp_server.config import get_settings
+            from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
+
+            settings = get_settings()
+            qdrant_client = await get_qdrant_client()
+
+            # Count documents in collection
+            count_result = await qdrant_client.count(
+                collection_name=settings.get_collection_name()
+            )
+            indexed_count = count_result.count
+
+        except Exception as e:
+            logger.warning(f"Failed to query Qdrant for indexed count: {e}")
+            # Continue with indexed_count = 0
+
+        # Determine status
+        status = "syncing" if pending_count > 0 else "idle"
+
+        return {
+            "indexed_count": indexed_count,
+            "pending_count": pending_count,
+            "status": status,
+        }
+
+    except Exception as e:
+        logger.error(f"Error getting processing status: {e}")
+        return None
+
+
 async def _get_userinfo_endpoint(oauth_ctx: dict[str, Any]) -> str | None:
    """Get the correct userinfo endpoint based on OAuth mode.

@@ -224,6 +293,9 @@ async def user_info_html(request: Request) -> HTMLResponse:
    """
    user_context = await _get_user_info(request)

+    # Get vector sync processing status
+    processing_status = await _get_processing_status(request)
+
    # Check for error
    if "error" in user_context and user_context["error"] != "":
        # Get login URL dynamically
@@ -371,6 +443,45 @@ async def user_info_html(request: Request) -> HTMLResponse:
            </div>
            """

+    # Build vector sync status HTML
+    vector_status_html = ""
+    if processing_status:
+        indexed_count = processing_status["indexed_count"]
+        pending_count = processing_status["pending_count"]
+        status = processing_status["status"]
+
+        # Format numbers with commas for readability
+        indexed_count_str = f"{indexed_count:,}"
+        pending_count_str = f"{pending_count:,}"
+
+        # Status badge color and text
+        if status == "syncing":
+            status_badge = (
+                '<span style="color: #ff9800; font-weight: bold;">⟳ Syncing</span>'
+            )
+        else:
+            status_badge = (
+                '<span style="color: #4caf50; font-weight: bold;">✓ Idle</span>'
+            )
+
+        vector_status_html = f"""
+        <h2>Vector Sync Status</h2>
+        <table>
+            <tr>
+                <td><strong>Indexed Documents</strong></td>
+                <td>{indexed_count_str}</td>
+            </tr>
+            <tr>
+                <td><strong>Pending Documents</strong></td>
+                <td>{pending_count_str}</td>
+            </tr>
+            <tr>
+                <td><strong>Status</strong></td>
+                <td>{status_badge}</td>
+            </tr>
+        </table>
+        """
+
    # Build IdP profile HTML
    idp_profile_html = ""
    if "idp_profile" in user_context:
@@ -507,6 +618,7 @@ async def user_info_html(request: Request) -> HTMLResponse:

            {host_info_html}
            {session_info_html}
+            {vector_status_html}
            {idp_profile_html}

            {f'<div class="logout"><a href="{logout_url}" class="button">Logout</a></div>' if auth_mode == "oauth" else ""}
@@ -7,6 +7,12 @@ from functools import wraps

 from httpx import AsyncClient, HTTPStatusError, RequestError, codes

+from nextcloud_mcp_server.observability.metrics import (
+    record_nextcloud_api_call,
+    record_nextcloud_api_retry,
+)
+from nextcloud_mcp_server.observability.tracing import trace_nextcloud_api_call
+
 logger = logging.getLogger(__name__)


@@ -38,6 +44,9 @@ def retry_on_429(func):
                    logger.warning(
                        f"429 Client Error: Too Many Requests, Number of attempts: {retries}"
                    )
+                    # Record retry metric (extract app name from args if available)
+                    if len(args) > 0 and hasattr(args[0], "app_name"):
+                        record_nextcloud_api_retry(app=args[0].app_name, reason="429")
                    time.sleep(5)
                elif e.response.status_code == 404:
                    # 404 errors are often expected (e.g., checking if attachments exist)
@@ -72,6 +81,9 @@ def retry_on_429(func):
 class BaseNextcloudClient(ABC):
    """Base class for all Nextcloud app clients."""

+    # Subclasses should set this to identify the app for metrics/tracing
+    app_name: str = "unknown"
+
    def __init__(self, http_client: AsyncClient, username: str):
        """Initialize with shared HTTP client and username.

@@ -88,7 +100,7 @@ class BaseNextcloudClient(ABC):

    @retry_on_429
    async def _make_request(self, method: str, url: str, **kwargs):
-        """Common request wrapper with logging and error handling.
+        """Common request wrapper with logging, tracing, and error handling.

        Args:
            method: HTTP method
@@ -99,6 +111,47 @@ class BaseNextcloudClient(ABC):
            Response object
        """
        logger.debug(f"Making {method} request to {url}")
-        response = await self._client.request(method, url, **kwargs)
-        response.raise_for_status()
-        return response
+
+        # Start timer for metrics
+        start_time = time.time()
+        status_code = 0
+
+        try:
+            # Wrap request in trace span
+            with trace_nextcloud_api_call(
+                app=self.app_name,
+                method=method,
+                path=url,
+            ):
+                response = await self._client.request(method, url, **kwargs)
+                status_code = response.status_code
+                response.raise_for_status()
+
+                # Record successful API call metrics
+                duration = time.time() - start_time
+                record_nextcloud_api_call(
+                    app=self.app_name,
+                    method=method,
+                    status_code=status_code,
+                    duration=duration,
+                )
+
+                return response
+
+        except (HTTPStatusError, RequestError) as e:
+            # Record error metrics
+            if isinstance(e, HTTPStatusError):
+                status_code = e.response.status_code
+            else:
+                status_code = 0  # Connection error, no status code
+
+            duration = time.time() - start_time
+            record_nextcloud_api_call(
+                app=self.app_name,
+                method=method,
+                status_code=status_code,
+                duration=duration,
+            )
+
+            # Re-raise the exception
+            raise
@@ -13,6 +13,8 @@ logger = logging.getLogger(__name__)
 class ContactsClient(BaseNextcloudClient):
    """Client for NextCloud CardDAV contact operations."""

+    app_name = "contacts"
+
    def _get_carddav_base_path(self) -> str:
        """Helper to get the base CardDAV path for contacts."""
        return f"/remote.php/dav/addressbooks/users/{self.username}"
@@ -13,6 +13,8 @@ logger = logging.getLogger(__name__)
 class CookbookClient(BaseNextcloudClient):
    """Client for Nextcloud Cookbook app operations."""

+    app_name = "cookbook"
+
    async def get_version(self) -> Dict[str, Any]:
        """Get Cookbook app and API version."""
        response = await self._make_request("GET", "/apps/cookbook/api/version")
@@ -17,6 +17,8 @@ from nextcloud_mcp_server.models.deck import (
 class DeckClient(BaseNextcloudClient):
    """Client for Nextcloud Deck app operations."""

+    app_name = "deck"
+
    def _get_deck_headers(
        self, additional_headers: Optional[Dict[str, str]] = None
    ) -> Dict[str, str]:
@@ -11,6 +11,8 @@ logger = logging.getLogger(__name__)
 class GroupsClient(BaseNextcloudClient):
    """Client for Nextcloud Groups API operations."""

+    app_name = "groups"
+
    @retry_on_429
    async def search_groups(
        self,
@@ -11,6 +11,8 @@ logger = logging.getLogger(__name__)
 class NotesClient(BaseNextcloudClient):
    """Client for Nextcloud Notes app operations."""

+    app_name = "notes"
+
    async def get_settings(self) -> Dict[str, Any]:
        """Get Notes app settings."""
        response = await self._make_request("GET", "/apps/notes/api/v1/settings")
@@ -11,6 +11,8 @@ logger = logging.getLogger(__name__)
 class SharingClient(BaseNextcloudClient):
    """Client for Nextcloud OCS Sharing API operations."""

+    app_name = "sharing"
+
    @retry_on_429
    async def create_share(
        self,
@@ -11,6 +11,8 @@ logger = logging.getLogger(__name__)
 class TablesClient(BaseNextcloudClient):
    """Client for Nextcloud Tables app operations."""

+    app_name = "tables"
+
    async def list_tables(self) -> List[Dict[str, Any]]:
        """List all tables available to the user."""
        response = await self._make_request(
@@ -7,6 +7,8 @@ from nextcloud_mcp_server.models.users import UserDetails
 class UsersClient(BaseNextcloudClient):
    """Client for Nextcloud User API operations."""

+    app_name = "users"
+
    def _get_user_headers(
        self, additional_headers: Optional[Dict[str, str]] = None
    ) -> Dict[str, str]:
@@ -15,6 +15,8 @@ logger = logging.getLogger(__name__)
 class WebDAVClient(BaseNextcloudClient):
    """Client for Nextcloud WebDAV operations."""

+    app_name = "webdav"
+
    async def delete_resource(self, path: str) -> Dict[str, Any]:
        """Delete a resource (file or directory) via WebDAV DELETE."""
        # Ensure path ends with a slash if it's a directory
@@ -1,3 +1,4 @@
+import logging
 import logging.config
 import os
 from dataclasses import dataclass
@@ -156,6 +157,121 @@ class Settings:
    token_encryption_key: Optional[str] = None
    token_storage_db: Optional[str] = None

+    # Vector sync settings (ADR-007)
+    vector_sync_enabled: bool = False
+    vector_sync_scan_interval: int = 300  # seconds (5 minutes)
+    vector_sync_processor_workers: int = 3
+    vector_sync_queue_max_size: int = 10000
+
+    # Qdrant settings (mutually exclusive modes)
+    qdrant_url: Optional[str] = None  # Network mode: http://qdrant:6333
+    qdrant_location: Optional[str] = None  # Local mode: :memory: or /path/to/data
+    qdrant_api_key: Optional[str] = None
+    qdrant_collection: str = "nextcloud_content"
+
+    # Ollama settings (for embeddings)
+    ollama_base_url: Optional[str] = None
+    ollama_embedding_model: str = "nomic-embed-text"
+    ollama_verify_ssl: bool = True
+
+    # Document chunking settings (for vector embeddings)
+    document_chunk_size: int = 512  # Words per chunk
+    document_chunk_overlap: int = 50  # Overlapping words between chunks
+
+    # Observability settings
+    metrics_enabled: bool = True
+    metrics_port: int = 9090
+    tracing_enabled: bool = False
+    otel_exporter_otlp_endpoint: Optional[str] = None
+    otel_service_name: str = "nextcloud-mcp-server"
+    otel_traces_sampler: str = "always_on"
+    otel_traces_sampler_arg: float = 1.0
+    log_format: str = "json"  # "json" or "text"
+    log_level: str = "INFO"
+    log_include_trace_context: bool = True
+
+    def __post_init__(self):
+        """Validate Qdrant and chunking configuration and set defaults."""
+        logger = logging.getLogger(__name__)
+
+        # Ensure mutual exclusivity
+        if self.qdrant_url and self.qdrant_location:
+            raise ValueError(
+                "Cannot set both QDRANT_URL and QDRANT_LOCATION. "
+                "Use QDRANT_URL for network mode or QDRANT_LOCATION for local mode."
+            )
+
+        # Default to :memory: if neither set
+        if not self.qdrant_url and not self.qdrant_location:
+            self.qdrant_location = ":memory:"
+            logger.info("Using default Qdrant mode: in-memory (:memory:)")
+
+        # Warn if API key set in local mode
+        if self.qdrant_location and self.qdrant_api_key:
+            logger.warning(
+                "QDRANT_API_KEY is set but QDRANT_LOCATION is used (local mode). "
+                "API key is only relevant for network mode and will be ignored."
+            )
+
+        # Validate chunking configuration
+        if self.document_chunk_overlap >= self.document_chunk_size:
+            raise ValueError(
+                f"DOCUMENT_CHUNK_OVERLAP ({self.document_chunk_overlap}) must be less than "
+                f"DOCUMENT_CHUNK_SIZE ({self.document_chunk_size}). "
+                f"Overlap should be 10-20% of chunk size for optimal results."
+            )
+
+        if self.document_chunk_size < 100:
+            logger.warning(
+                f"DOCUMENT_CHUNK_SIZE is set to {self.document_chunk_size} words, which is quite small. "
+                f"Smaller chunks may lose context. Consider using at least 256 words."
+            )
+
+        if self.document_chunk_overlap < 0:
+            raise ValueError(
+                f"DOCUMENT_CHUNK_OVERLAP ({self.document_chunk_overlap}) cannot be negative."
+            )
+
+    def get_collection_name(self) -> str:
+        """
+        Get Qdrant collection name.
+
+        Auto-generates from deployment ID + model name unless explicitly set.
+        Deployment ID uses OTEL_SERVICE_NAME if configured, otherwise hostname.
+
+        This enables:
+        - Safe embedding model switching (new model → new collection)
+        - Multi-server deployments (unique deployment IDs)
+        - Clear collection naming (shows deployment and model)
+
+        Format: {deployment-id}-{model-name}
+
+        Examples:
+            - "my-deployment-nomic-embed-text" (OTEL_SERVICE_NAME set)
+            - "mcp-container-all-minilm" (hostname fallback)
+
+        Returns:
+            Collection name string
+        """
+        import socket
+
+        # Use explicit override if user configured non-default value
+        if self.qdrant_collection != "nextcloud_content":
+            return self.qdrant_collection
+
+        # Determine deployment ID (OTEL service name or hostname fallback)
+        if self.otel_service_name != "nextcloud-mcp-server":  # Non-default
+            deployment_id = self.otel_service_name
+        else:
+            # Fallback to hostname for simple Docker deployments without OTEL config
+            deployment_id = socket.gethostname()
+
+        # Sanitize deployment ID and model name
+        deployment_id = deployment_id.lower().replace(" ", "-").replace("_", "-")
+        model_name = self.ollama_embedding_model.replace("/", "-").replace(":", "-")
+
+        return f"{deployment_id}-{model_name}"
+

 def get_settings() -> Settings:
    """Get application settings from environment variables.
@@ -192,4 +308,39 @@ def get_settings() -> Settings:
        # Token settings
        token_encryption_key=os.getenv("TOKEN_ENCRYPTION_KEY"),
        token_storage_db=os.getenv("TOKEN_STORAGE_DB", "/tmp/tokens.db"),
+        # Vector sync settings (ADR-007)
+        vector_sync_enabled=(
+            os.getenv("VECTOR_SYNC_ENABLED", "false").lower() == "true"
+        ),
+        vector_sync_scan_interval=int(os.getenv("VECTOR_SYNC_SCAN_INTERVAL", "300")),
+        vector_sync_processor_workers=int(
+            os.getenv("VECTOR_SYNC_PROCESSOR_WORKERS", "3")
+        ),
+        vector_sync_queue_max_size=int(
+            os.getenv("VECTOR_SYNC_QUEUE_MAX_SIZE", "10000")
+        ),
+        # Qdrant settings
+        qdrant_url=os.getenv("QDRANT_URL"),
+        qdrant_location=os.getenv("QDRANT_LOCATION"),
+        qdrant_api_key=os.getenv("QDRANT_API_KEY"),
+        qdrant_collection=os.getenv("QDRANT_COLLECTION", "nextcloud_content"),
+        # Ollama settings
+        ollama_base_url=os.getenv("OLLAMA_BASE_URL"),
+        ollama_embedding_model=os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text"),
+        ollama_verify_ssl=os.getenv("OLLAMA_VERIFY_SSL", "true").lower() == "true",
+        # Document chunking settings
+        document_chunk_size=int(os.getenv("DOCUMENT_CHUNK_SIZE", "512")),
+        document_chunk_overlap=int(os.getenv("DOCUMENT_CHUNK_OVERLAP", "50")),
+        # Observability settings
+        metrics_enabled=os.getenv("METRICS_ENABLED", "true").lower() == "true",
+        metrics_port=int(os.getenv("METRICS_PORT", "9090")),
+        tracing_enabled=os.getenv("OTEL_ENABLED", "false").lower() == "true",
+        otel_exporter_otlp_endpoint=os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT"),
+        otel_service_name=os.getenv("OTEL_SERVICE_NAME", "nextcloud-mcp-server"),
+        otel_traces_sampler=os.getenv("OTEL_TRACES_SAMPLER", "always_on"),
+        otel_traces_sampler_arg=float(os.getenv("OTEL_TRACES_SAMPLER_ARG", "1.0")),
+        log_format=os.getenv("LOG_FORMAT", "json"),
+        log_level=os.getenv("LOG_LEVEL", "INFO"),
+        log_include_trace_context=os.getenv("LOG_INCLUDE_TRACE_CONTEXT", "true").lower()
+        == "true",
    )
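The settings above all follow one pattern: read a string from the environment, fall back to a default, then coerce it. A minimal sketch of that pattern follows. The helper names `env_bool`, `env_int`, and `load_chunking` are hypothetical and not part of this change, and the overlap check is an assumption about what a sensible configuration looks like rather than something the diff enforces.

```python
import os


def env_bool(name: str, default: str = "false") -> bool:
    """Mirror the diff's convention: only the string "true" (any case) is truthy."""
    return os.getenv(name, default).lower() == "true"


def env_int(name: str, default: str) -> int:
    """Read an integer setting; int() raises ValueError on malformed input."""
    return int(os.getenv(name, default))


def load_chunking() -> tuple[int, int]:
    """Load chunk size and overlap using the same defaults as the diff (512 / 50)."""
    size = env_int("DOCUMENT_CHUNK_SIZE", "512")
    overlap = env_int("DOCUMENT_CHUNK_OVERLAP", "50")
    # Assumed sanity check (not in the diff): overlap must be smaller than the chunk.
    if not 0 <= overlap < size:
        raise ValueError(f"overlap {overlap} must be in [0, {size})")
    return size, overlap
```

Note that under this convention values such as `1` or `yes` are treated as false, which matches how `VECTOR_SYNC_ENABLED` and `METRICS_ENABLED` are parsed in the diff.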
@@ -0,0 +1,6 @@
+"""Embedding service package for generating vector embeddings."""
+
+from .service import EmbeddingService, get_embedding_service
+from .simple_provider import SimpleEmbeddingProvider
+
+__all__ = ["EmbeddingService", "get_embedding_service", "SimpleEmbeddingProvider"]
@@ -0,0 +1,43 @@
+"""Abstract base class for embedding providers."""
+
+from abc import ABC, abstractmethod
+
+
+class EmbeddingProvider(ABC):
+    """Base class for embedding providers."""
+
+    @abstractmethod
+    async def embed(self, text: str) -> list[float]:
+        """
+        Generate embedding vector for text.
+
+        Args:
+            text: Input text to embed
+
+        Returns:
+            Vector embedding as list of floats
+        """
+        pass
+
+    @abstractmethod
+    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
+        """
+        Generate embeddings for multiple texts (optimized).
+
+        Args:
+            texts: List of texts to embed
+
+        Returns:
+            List of vector embeddings
+        """
+        pass
+
+    @abstractmethod
+    def get_dimension(self) -> int:
+        """
+        Get embedding dimension for this provider.
+
+        Returns:
+            Vector dimension (e.g., 768 for nomic-embed-text)
+        """
+        pass
@@ -0,0 +1,85 @@
+"""Ollama embedding provider."""
+
+import logging
+
+import httpx
+
+from .base import EmbeddingProvider
+
+logger = logging.getLogger(__name__)
+
+
+class OllamaEmbeddingProvider(EmbeddingProvider):
+    """Ollama embedding provider with TLS support."""
+
+    def __init__(
+        self,
+        base_url: str,
+        model: str = "nomic-embed-text",
+        verify_ssl: bool = True,
+    ):
+        """
+        Initialize Ollama embedding provider.
+
+        Args:
+            base_url: Ollama API base URL (e.g., https://ollama.internal.coutinho.io:443)
+            model: Embedding model name (default: nomic-embed-text)
+            verify_ssl: Verify SSL certificates (default: True)
+        """
+        self.base_url = base_url.rstrip("/")
+        self.model = model
+        self.verify_ssl = verify_ssl
+        self.client = httpx.AsyncClient(verify=verify_ssl, timeout=30.0)
+        # Hardcoded for nomic-embed-text (768 dimensions); other models may differ.
+        self._dimension = 768
+        logger.info(
+            f"Initialized Ollama provider: {base_url} (model={model}, verify_ssl={verify_ssl})"
+        )
+
+    async def embed(self, text: str) -> list[float]:
+        """
+        Generate embedding vector for text.
+
+        Args:
+            text: Input text to embed
+
+        Returns:
+            Vector embedding as list of floats
+        """
+        response = await self.client.post(
+            f"{self.base_url}/api/embeddings",
+            json={"model": self.model, "prompt": text},
+        )
+        response.raise_for_status()
+        return response.json()["embedding"]
+
+    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
+        """
+        Generate embeddings for multiple texts (one request per text).
+
+        Note: The /api/embeddings endpoint used here accepts a single prompt, so
+        requests are sent sequentially. For better performance with large batches,
+        consider using asyncio.gather().
+
+        Args:
+            texts: List of texts to embed
+
+        Returns:
+            List of vector embeddings
+        """
+        embeddings = []
+        for text in texts:
+            embedding = await self.embed(text)
+            embeddings.append(embedding)
+        return embeddings
+
+    def get_dimension(self) -> int:
+        """
+        Get embedding dimension.
+
+        Returns:
+            Vector dimension (768 for nomic-embed-text)
+        """
+        return self._dimension
+
+    async def close(self):
+        """Close HTTP client."""
+        await self.client.aclose()
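The `embed_batch` docstring suggests `asyncio.gather()` for large batches. A minimal sketch of that idea follows, assuming a bounded number of in-flight requests is wanted so the embedding server is not flooded. The function `embed_concurrently`, its `embed_one` callback, and the `max_concurrency` parameter are hypothetical names for illustration and are not part of this change.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def embed_concurrently(
    texts: list[str],
    embed_one: Callable[[str], Awaitable[list[float]]],
    max_concurrency: int = 4,
) -> list[list[float]]:
    """Embed texts concurrently while capping the number of in-flight calls."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(text: str) -> list[float]:
        # At most max_concurrency coroutines run embed_one at the same time.
        async with semaphore:
            return await embed_one(text)

    # gather() returns results in the order the awaitables were passed in,
    # so embeddings stay aligned with the input texts.
    return await asyncio.gather(*(bounded(text) for text in texts))
```

With the provider above, `embed_one` could be the bound method `provider.embed`. If any single request raises, `gather()` propagates the first exception, so callers that need partial results would have to handle errors per item.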
@@ -0,0 +1,111 @@
+"""Embedding service with provider detection."""
+
+import logging
+import os
+
+from .base import EmbeddingProvider
+from .ollama_provider import OllamaEmbeddingProvider
+from .simple_provider import SimpleEmbeddingProvider
+
+logger = logging.getLogger(__name__)
+
+
+class EmbeddingService:
+    """Unified embedding service with automatic provider detection."""
+
+    def __init__(self):
+        """Initialize embedding service with auto-detected provider."""
+        self.provider = self._detect_provider()
+
+    def _detect_provider(self) -> EmbeddingProvider:
+        """
+        Auto-detect available embedding provider.
+
+        Checks environment variables in order:
+        1. OLLAMA_BASE_URL - Use Ollama provider (production)
+        2. OPENAI_API_KEY - OpenAI provider (not yet implemented; currently ignored)
+        3. Fallback to SimpleEmbeddingProvider (testing/development)
+
+        Returns:
+            Configured embedding provider
+        """
+        # Ollama provider (production)
+        ollama_url = os.getenv("OLLAMA_BASE_URL")
+        if ollama_url:
+            logger.info(f"Using Ollama embedding provider: {ollama_url}")
+            return OllamaEmbeddingProvider(
+                base_url=ollama_url,
+                model=os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text"),
+                verify_ssl=os.getenv("OLLAMA_VERIFY_SSL", "true").lower() == "true",
+            )
+
+        # OpenAI provider (future implementation)
+        # openai_key = os.getenv("OPENAI_API_KEY")
+        # if openai_key:
+        #     return OpenAIEmbeddingProvider(api_key=openai_key)
+
+        # Fallback to simple provider for development/testing
+        logger.warning(
+            "No embedding provider configured (OLLAMA_BASE_URL not set). "
+            "Using SimpleEmbeddingProvider for testing/development. "
+            "For production, configure an external embedding service."
+        )
+        return SimpleEmbeddingProvider(dimension=384)
+
+    async def embed(self, text: str) -> list[float]:
+        """
+        Generate embedding vector for text.
+
+        Args:
+            text: Input text to embed
+
+        Returns:
+            Vector embedding as list of floats
+        """
+        return await self.provider.embed(text)
+
+    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
+        """
+        Generate embeddings for multiple texts.
+
+        Args:
+            texts: List of texts to embed
+
+        Returns:
+            List of vector embeddings
+        """
+        return await self.provider.embed_batch(texts)
+
+    def get_dimension(self) -> int:
+        """
+        Get embedding dimension.
+
+        Returns:
+            Vector dimension
+        """
+        return self.provider.get_dimension()
+
+    async def close(self):
+        """Close provider resources."""
+        close_method = getattr(self.provider, "close", None)
+        if callable(close_method):
+            await close_method()
+
+
+# Singleton instance
+_embedding_service: EmbeddingService | None = None
+
+
+def get_embedding_service() -> EmbeddingService:
+    """
+    Get singleton embedding service instance.
+
+    Returns:
+        Global EmbeddingService instance
+    """
+    global _embedding_service
+    if _embedding_service is None:
+        _embedding_service = EmbeddingService()
+    return _embedding_service
@@ -0,0 +1,123 @@
+"""Simple in-process embedding provider for testing.
+
+This provider uses log-scaled term-frequency (TF) weighting with feature hashing to
+generate deterministic embeddings without requiring external services. Suitable for
+testing but not for production use.
+"""
+
+import hashlib
+import math
+import re
+from collections import Counter
+
+from .base import EmbeddingProvider
+
+
+class SimpleEmbeddingProvider(EmbeddingProvider):
+    """Simple deterministic embedding provider using feature hashing.
+
+    This implementation:
+    - Tokenizes text into words
+    - Uses feature hashing to map words to fixed-size vectors
+    - Applies log-scaled term-frequency weighting (no IDF component)
+    - Normalizes vectors to unit length
+
+    Not suitable for production but good for testing semantic search infrastructure.
+    """
+
+    def __init__(self, dimension: int = 384):
+        """Initialize simple embedding provider.
+
+        Args:
+            dimension: Embedding dimension (default: 384)
+        """
+        self.dimension = dimension
+
+    def _tokenize(self, text: str) -> list[str]:
+        """Tokenize text into lowercase words.
+
+        Args:
+            text: Input text
+
+        Returns:
+            List of lowercase word tokens
+        """
+        # Simple word tokenization
+        text = text.lower()
+        words = re.findall(r"\b\w+\b", text)
+        return words
+
+    def _hash_word(self, word: str) -> int:
+        """Hash word to dimension index.
+
+        Args:
+            word: Word to hash
+
+        Returns:
+            Index in range [0, dimension)
+        """
+        hash_bytes = hashlib.md5(word.encode()).digest()
+        hash_int = int.from_bytes(hash_bytes[:4], byteorder="big")
+        return hash_int % self.dimension
+
+    def _embed_single(self, text: str) -> list[float]:
+        """Generate embedding for single text.
+
+        Args:
+            text: Input text
+
+        Returns:
+            Normalized embedding vector
+        """
+        tokens = self._tokenize(text)
+        if not tokens:
+            return [0.0] * self.dimension
+
+        # Count term frequencies
+        term_freq = Counter(tokens)
+
+        # Initialize vector
+        vector = [0.0] * self.dimension
+
+        # Apply TF weighting with feature hashing
+        for word, count in term_freq.items():
+            idx = self._hash_word(word)
+            # Simple TF weighting: log(1 + count)
+            vector[idx] += math.log1p(count)
+
+        # Normalize to unit length
+        norm = math.sqrt(sum(x * x for x in vector))
+        if norm > 0:
+            vector = [x / norm for x in vector]
+
+        return vector
+
+    async def embed(self, text: str) -> list[float]:
+        """Generate embedding vector for text.
+
+        Args:
+            text: Input text to embed
+
+        Returns:
+            Vector embedding as list of floats
+        """
+        return self._embed_single(text)
+
+    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
+        """Generate embeddings for multiple texts.
+
+        Args:
+            texts: List of texts to embed
+
+        Returns:
+            List of vector embeddings
+        """
+        return [self._embed_single(text) for text in texts]
+
+    def get_dimension(self) -> int:
+        """Get embedding dimension.
+
+        Returns:
+            Vector dimension
+        """
+        return self.dimension
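The feature-hashing scheme in this provider can be condensed into one standalone function to make its properties easier to see. The sketch below follows the same steps as `_embed_single` (tokenize, hash each word to an index, add `log1p(count)`, normalize). The names `hashed_embedding` and `cosine` are hypothetical helpers for illustration only.

```python
import hashlib
import math
import re
from collections import Counter


def hashed_embedding(text: str, dimension: int = 384) -> list[float]:
    """Map text to a unit-length vector via feature hashing of its words."""
    tokens = re.findall(r"\b\w+\b", text.lower())
    vector = [0.0] * dimension
    for word, count in Counter(tokens).items():
        digest = hashlib.md5(word.encode()).digest()
        index = int.from_bytes(digest[:4], byteorder="big") % dimension
        # Colliding words share an index, so their weights accumulate.
        vector[index] += math.log1p(count)
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else vector


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))
```

Because all weights are non-negative, two texts that share at least one word always have a strictly positive similarity, and texts with no shared words score zero unless their words happen to collide in the hash. That is enough to exercise search plumbing in tests, but it carries no semantic meaning.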
@@ -0,0 +1,109 @@
+"""Pydantic models for semantic search responses."""
+
+from typing import List, Optional
+
+from pydantic import BaseModel, Field
+
+from .base import BaseResponse
+
+
+class SemanticSearchResult(BaseModel):
+    """Model for semantic search results with additional metadata."""
+
+    id: int = Field(description="Document ID")
+    doc_type: str = Field(
+        description="Document type (note, calendar_event, deck_card, etc.)"
+    )
+    title: str = Field(description="Document title")
+    category: str = Field(
+        default="", description="Document category (notes) or location (calendar)"
+    )
+    excerpt: str = Field(description="Excerpt from matching chunk")
+    score: float = Field(description="Semantic similarity score (0-1)")
+    chunk_index: int = Field(description="Index of matching chunk in document")
+    total_chunks: int = Field(description="Total number of chunks in document")
+
+
+class SemanticSearchResponse(BaseResponse):
+    """Response model for semantic search across all indexed Nextcloud apps."""
+
+    results: List[SemanticSearchResult] = Field(
+        description="Semantic search results with similarity scores"
+    )
+    query: str = Field(description="The search query used")
+    total_found: int = Field(description="Total number of documents found")
+    search_method: str = Field(
+        default="semantic", description="Search method used (semantic or hybrid)"
+    )
+
+
+class SamplingSearchResponse(BaseResponse):
+    """Response from semantic search with LLM-generated answer via MCP sampling.
+
+    This response includes both a generated natural language answer (created by
+    the MCP client's LLM via sampling) and the source documents used to generate
+    that answer. Users can read the answer for quick information and review
+    sources for verification and deeper exploration.
+
+    Attributes:
+        query: The original user query
+        generated_answer: Natural language answer generated by client's LLM
+        sources: List of semantic search results used as context
+        total_found: Total number of matching documents found
+        search_method: Always "semantic_sampling" for this response type
+        model_used: Name of model that generated the answer (e.g., "claude-3-5-sonnet")
+        stop_reason: Why generation stopped ("endTurn", "maxTokens", etc.)
+    """
+
+    query: str = Field(..., description="Original user query")
+    generated_answer: str = Field(
+        ..., description="LLM-generated answer based on retrieved documents"
+    )
+    sources: List[SemanticSearchResult] = Field(
+        default_factory=list,
+        description="Source documents with excerpts and relevance scores",
+    )
+    total_found: int = Field(..., description="Total matching documents")
+    search_method: str = Field(
+        default="semantic_sampling", description="Search method used"
+    )
+    model_used: Optional[str] = Field(
+        default=None, description="Model that generated the answer"
+    )
+    stop_reason: Optional[str] = Field(
+        default=None, description="Reason generation stopped"
+    )
+
+
+class VectorSyncStatusResponse(BaseResponse):
+    """Response for vector sync status.
+
+    Provides information about the current state of vector sync,
+    including how many documents are indexed and how many are pending.
+
+    Attributes:
+        indexed_count: Number of documents in Qdrant vector database
+        pending_count: Number of documents in processing queue
+        status: Current sync status ("idle", "syncing", or "disabled")
+        enabled: Whether vector sync is enabled
+    """
+
+    indexed_count: int = Field(
+        default=0, description="Number of documents indexed in vector database"
+    )
+    pending_count: int = Field(
+        default=0, description="Number of documents pending processing"
+    )
+    status: str = Field(
+        default="disabled",
+        description='Sync status: "idle", "syncing", or "disabled"',
+    )
+    enabled: bool = Field(default=False, description="Whether vector sync is enabled")
+
+
+__all__ = [
+    "SemanticSearchResult",
+    "SemanticSearchResponse",
+    "SamplingSearchResponse",
+    "VectorSyncStatusResponse",
+]
@@ -0,0 +1,31 @@
+"""
+Observability module for the Nextcloud MCP Server.
+
+This module provides:
+- Prometheus metrics collection
+- OpenTelemetry distributed tracing
+- Enhanced structured logging with trace correlation
+- Monitoring middleware for Starlette/FastAPI
+
+Usage:
+    from nextcloud_mcp_server.observability import setup_logging
+
+    # During application startup
+    setup_logging(log_format="json", log_level="INFO")
+"""
+
+from nextcloud_mcp_server.observability.logging_config import (
+    get_uvicorn_logging_config,
+    setup_logging,
+)
+from nextcloud_mcp_server.observability.metrics import setup_metrics
+from nextcloud_mcp_server.observability.middleware import ObservabilityMiddleware
+from nextcloud_mcp_server.observability.tracing import setup_tracing
+
+__all__ = [
+    "setup_logging",
+    "get_uvicorn_logging_config",
+    "setup_metrics",
+    "setup_tracing",
+    "ObservabilityMiddleware",
+]
@@ -0,0 +1,327 @@
+"""
+Enhanced logging configuration for the Nextcloud MCP Server.
+
+This module provides:
+- Structured JSON logging with python-json-logger
+- Trace context injection (trace_id, span_id) for correlation with distributed traces
+- Configurable log formats (JSON or text)
+- Log level configuration per component
+"""
+
+import logging
+import sys
+from typing import Any
+
+from pythonjsonlogger import jsonlogger
+
+from nextcloud_mcp_server.observability.tracing import get_trace_context
+
+
+class HealthCheckFilter(logging.Filter):
+    """
+    Logging filter that excludes health check endpoint requests.
+
+    This prevents health check polls from cluttering logs while keeping
+    access logs for all other endpoints.
+    """
+
+    def filter(self, record: logging.LogRecord) -> bool:
+        """
+        Filter out health check requests from uvicorn access logs.
+
+        Args:
+            record: LogRecord instance
+
+        Returns:
+            False if this is a health check request, True otherwise
+        """
+        # Check if the log message contains health check endpoints
+        message = record.getMessage()
+        return not any(
+            endpoint in message
+            for endpoint in ["/health/live", "/health/ready", "/metrics"]
+        )
+
+
+class TraceContextFormatter(jsonlogger.JsonFormatter):
+    """
+    JSON formatter that injects OpenTelemetry trace context into log records.
+
+    This allows logs to be correlated with distributed traces by including
+    trace_id and span_id in each log entry.
+    """
+
+    def add_fields(
+        self,
+        log_record: dict[str, Any],
+        record: logging.LogRecord,
+        message_dict: dict[str, Any],
+    ) -> None:
+        """
+        Add custom fields to the log record, including trace context.
+
+        Args:
+            log_record: Dictionary to be serialized as JSON
+            record: LogRecord instance
+            message_dict: Dictionary of extra fields from log call
+        """
+        # Call parent to add standard fields
+        super().add_fields(log_record, record, message_dict)
+
+        # Add trace context if available
+        trace_context = get_trace_context()
+        if trace_context:
+            log_record["trace_id"] = trace_context.get("trace_id")
+            log_record["span_id"] = trace_context.get("span_id")
+
+        # Add standard fields with consistent naming
+        log_record["timestamp"] = self.formatTime(record, self.datefmt)
+        log_record["level"] = record.levelname
+        log_record["logger"] = record.name
+        log_record["message"] = record.getMessage()
+
+        # Include exception info if present
+        if record.exc_info:
+            log_record["exception"] = self.formatException(record.exc_info)
+
+
+class TraceContextTextFormatter(logging.Formatter):
+    """
+    Text formatter that includes OpenTelemetry trace context.
+
+    Format: LEVEL [timestamp] logger - message [trace_id=xxx span_id=yyy]
+    """
+
+    def format(self, record: logging.LogRecord) -> str:
+        """
+        Format log record with trace context.
+
+        Args:
+            record: LogRecord instance
+
+        Returns:
+            Formatted log string
+        """
+        # Format base message
+        base_message = super().format(record)
+
+        # Add trace context if available
+        trace_context = get_trace_context()
+        if trace_context:
+            trace_id = trace_context.get("trace_id", "")
+            span_id = trace_context.get("span_id", "")
+            return f"{base_message} [trace_id={trace_id} span_id={span_id}]"
+
+        return base_message
+
+
+def setup_logging(
+    log_format: str = "json",
+    log_level: str = "INFO",
+    include_trace_context: bool = True,
+) -> None:
+    """
+    Configure logging for the Nextcloud MCP Server.
+
+    Args:
+        log_format: "json" for JSON logging, "text" for human-readable text (default: "json")
+        log_level: Minimum log level (DEBUG, INFO, WARNING, ERROR, CRITICAL) (default: "INFO")
+        include_trace_context: Whether to include trace context in logs (default: True)
+    """
+    # Get root logger
+    root_logger = logging.getLogger()
+    root_logger.setLevel(getattr(logging, log_level.upper(), logging.INFO))
+
+    # Remove existing handlers
+    root_logger.handlers.clear()
+
+    # Create console handler
+    console_handler = logging.StreamHandler(sys.stdout)
+    console_handler.setLevel(getattr(logging, log_level.upper(), logging.INFO))
+
+    # Configure formatter based on format preference
+    if log_format.lower() == "json":
+        if include_trace_context:
+            formatter = TraceContextFormatter(
+                "%(timestamp)s %(level)s %(name)s %(message)s",
+                datefmt="%Y-%m-%dT%H:%M:%S",
+            )
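+        else:
+            # Plain JsonFormatter only resolves real LogRecord attributes, so use
+            # asctime/levelname instead of the custom timestamp/level keys.
+            formatter = jsonlogger.JsonFormatter(
+                "%(asctime)s %(levelname)s %(name)s %(message)s",
+                datefmt="%Y-%m-%dT%H:%M:%S",
+            )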
+    else:  # text format
+        if include_trace_context:
+            formatter = TraceContextTextFormatter(
+                "%(levelname)s [%(asctime)s] %(name)s - %(message)s",
+                datefmt="%Y-%m-%d %H:%M:%S",
+            )
+        else:
+            formatter = logging.Formatter(
+                "%(levelname)s [%(asctime)s] %(name)s - %(message)s",
+                datefmt="%Y-%m-%d %H:%M:%S",
+            )
+
+    console_handler.setFormatter(formatter)
+    root_logger.addHandler(console_handler)
+
+    # Configure specific logger levels
+    configure_component_loggers(log_level)
+
+    root_logger.info(
+        f"Logging configured: format={log_format}, level={log_level}, "
+        f"trace_context={include_trace_context}"
+    )
+
+
+def configure_component_loggers(default_level: str = "INFO") -> None:
+    """
+    Configure log levels for specific components.
+
+    This allows fine-grained control over logging verbosity for different
+    parts of the application.
+
+    Args:
+        default_level: Default log level for most components
+    """
+    # Map of logger names to log levels
+    logger_levels = {
+        # Application loggers
+        "nextcloud_mcp_server": default_level,
+        "nextcloud_mcp_server.server": default_level,
+        "nextcloud_mcp_server.client": default_level,
+        "nextcloud_mcp_server.auth": default_level,
+        "nextcloud_mcp_server.observability": default_level,
+        # HTTP client loggers (less verbose by default)
+        "httpx": "WARNING",
+        "httpcore": "WARNING",
+        # Server loggers
+        "uvicorn": "INFO",
+        "uvicorn.access": "INFO",
+        "uvicorn.error": "INFO",
+        # MCP framework
+        "mcp": "INFO",
+        # OpenTelemetry (less verbose)
+        "opentelemetry": "WARNING",
+    }
+
+    for logger_name, level in logger_levels.items():
+        logger = logging.getLogger(logger_name)
+        logger.setLevel(getattr(logging, level.upper(), logging.INFO))
+
+
+def get_logger(name: str) -> logging.Logger:
+    """
+    Get a logger instance for a specific module.
+
+    This is a convenience function that wraps logging.getLogger()
+    to ensure consistent logger configuration.
+
+    Args:
+        name: Logger name (typically __name__)
+
+    Returns:
+        Logger instance
+    """
+    return logging.getLogger(name)
+
+
+def get_uvicorn_logging_config(
+    log_format: str = "json",
+    log_level: str = "INFO",
+    include_trace_context: bool = True,
+) -> dict:
+    """
+    Get uvicorn-compatible logging configuration.
+
+    This creates a logging config dict that uvicorn can use while maintaining
+    our observability setup (JSON format, trace context, etc.).
+
+    Args:
+        log_format: "json" or "text"
+        log_level: Minimum log level
+        include_trace_context: Whether to include trace IDs in logs
+
+    Returns:
+        Logging config dict compatible with uvicorn's log_config parameter
+    """
+    # Determine formatter class based on format and trace context
+    if log_format.lower() == "json":
+        if include_trace_context:
+            formatter_class = "nextcloud_mcp_server.observability.logging_config.TraceContextFormatter"
+            format_string = "%(timestamp)s %(level)s %(name)s %(message)s"
+        else:
+            formatter_class = "pythonjsonlogger.jsonlogger.JsonFormatter"
+            # Plain JsonFormatter only resolves real LogRecord attributes.
+            format_string = "%(asctime)s %(levelname)s %(name)s %(message)s"
+    else:
+        if include_trace_context:
+            formatter_class = "nextcloud_mcp_server.observability.logging_config.TraceContextTextFormatter"
+        else:
+            formatter_class = "logging.Formatter"
+        format_string = "%(levelname)s [%(asctime)s] %(name)s - %(message)s"
+
+    return {
+        "version": 1,
+        "disable_existing_loggers": False,
+        "formatters": {
+            "default": {
+                "()": formatter_class,
+                "format": format_string,
+                "datefmt": "%Y-%m-%d %H:%M:%S",
+            },
+        },
+        "filters": {
+            "health_check_filter": {
+                "()": "nextcloud_mcp_server.observability.logging_config.HealthCheckFilter",
+            },
+        },
+        "handlers": {
+            "default": {
+                "formatter": "default",
+                "class": "logging.StreamHandler",
+                "stream": "ext://sys.stdout",
+            },
+            "access": {
+                "formatter": "default",
+                "class": "logging.StreamHandler",
+                "stream": "ext://sys.stdout",
+                "filters": ["health_check_filter"],
+            },
+        },
+        "loggers": {
+            "": {
+                "handlers": ["default"],
+                "level": log_level.upper(),
+            },
+            "uvicorn": {
+                "handlers": ["default"],
+                "level": "INFO",
+                "propagate": False,
+            },
+            "uvicorn.access": {
+                "handlers": ["access"],
+                "level": "INFO",
+                "propagate": False,
+            },
+            "uvicorn.error": {
+                "handlers": ["default"],
+                "level": "INFO",
+                "propagate": False,
+            },
+            "httpx": {
+                "handlers": ["default"],
+                "level": "WARNING",
+                "propagate": False,
+            },
+            "httpcore": {
+                "handlers": ["default"],
+                "level": "WARNING",
+                "propagate": False,
+            },
+            "opentelemetry": {
+                "handlers": ["default"],
+                "level": "WARNING",
+                "propagate": False,
+            },
+        },
+    }
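The health check filter attached to the `access` handler can be exercised in isolation using only the standard library. The sketch below re-declares an equivalent filter so it runs on its own; the `make_record` helper and the sample log messages are hypothetical and only approximate what uvicorn's access log emits.

```python
import logging


class HealthCheckFilter(logging.Filter):
    """Drop access-log records that mention health or metrics endpoints."""

    NOISY_ENDPOINTS = ("/health/live", "/health/ready", "/metrics")

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        # Returning False tells the handler to discard the record.
        return not any(endpoint in message for endpoint in self.NOISY_ENDPOINTS)


def make_record(message: str) -> logging.LogRecord:
    """Build a bare LogRecord for testing (hypothetical helper)."""
    return logging.LogRecord(
        name="uvicorn.access",
        level=logging.INFO,
        pathname="example.py",
        lineno=0,
        msg=message,
        args=None,
        exc_info=None,
    )
```

The match is a plain substring test, so any message that merely contains one of these paths (for example a request to `/metrics-report`) is dropped as well. If that matters, a stricter check on the parsed request path would be needed.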
@@ -0,0 +1,354 @@
+"""
+Prometheus metrics for the Nextcloud MCP Server.
+
+This module defines all Prometheus metrics for monitoring server health, performance,
+and resource usage. Metrics are organized by category:
+
+- HTTP Server Metrics (RED: Rate, Errors, Duration)
+- MCP Tool Metrics (per-tool invocation tracking)
+- MCP Resource Metrics
+- Nextcloud API Client Metrics
+- OAuth Flow Metrics
+- Vector Sync Metrics (conditional on feature flag)
+- Database Operation Metrics
+- External Dependency Health Metrics
+"""
+
+import logging
+
+from prometheus_client import (
+    Counter,
+    Gauge,
+    Histogram,
+    start_http_server,
+)
+
+logger = logging.getLogger(__name__)
+
+# =============================================================================
+# HTTP Server Metrics (RED + System)
+# =============================================================================
+
+http_requests_total = Counter(
+    "mcp_http_requests_total",
+    "Total HTTP requests received",
+    ["method", "endpoint", "status_code"],
+)
+
+http_request_duration_seconds = Histogram(
+    "mcp_http_request_duration_seconds",
+    "HTTP request latency in seconds",
+    ["method", "endpoint"],
+    buckets=(0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0),
+)
+
+http_requests_in_progress = Gauge(
+    "mcp_http_requests_in_progress",
+    "Number of HTTP requests currently being processed",
+    ["method", "endpoint"],
+)
+
+# =============================================================================
+# MCP Tool Metrics
+# =============================================================================
+
+mcp_tool_calls_total = Counter(
+    "mcp_tool_calls_total",
+    "Total MCP tool invocations",
+    ["tool_name", "status"],  # status: success | error
+)
+
+mcp_tool_duration_seconds = Histogram(
+    "mcp_tool_duration_seconds",
+    "MCP tool execution duration in seconds",
+    ["tool_name"],
+    buckets=(0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0),
+)
+
+mcp_tool_errors_total = Counter(
+    "mcp_tool_errors_total",
+    "Total MCP tool errors by type",
+    ["tool_name", "error_type"],
+)
+
+# =============================================================================
+# MCP Resource Metrics
+# =============================================================================
+
+mcp_resource_requests_total = Counter(
+    "mcp_resource_requests_total",
+    "Total MCP resource requests",
+    ["resource_uri", "status"],
+)
+
+mcp_resource_duration_seconds = Histogram(
+    "mcp_resource_duration_seconds",
+    "MCP resource request duration in seconds",
+    ["resource_uri"],
+    buckets=(0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5),
+)
+
+# =============================================================================
+# Nextcloud API Client Metrics
+# =============================================================================
+
+nextcloud_api_requests_total = Counter(
+    "mcp_nextcloud_api_requests_total",
+    "Total Nextcloud API requests",
+    ["app", "method", "status_code"],  # app: notes, calendar, contacts, etc.
+)
+
+nextcloud_api_duration_seconds = Histogram(
+    "mcp_nextcloud_api_duration_seconds",
+    "Nextcloud API request duration in seconds",
+    ["app", "method"],
+    buckets=(0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0),
+)
+
+nextcloud_api_retries_total = Counter(
+    "mcp_nextcloud_api_retries_total",
+    "Total Nextcloud API retries",
+    ["app", "reason"],  # reason: 429 | timeout | connection_error
+)
+
+# =============================================================================
+# OAuth Flow Metrics
+# =============================================================================
+
+oauth_token_validations_total = Counter(
+    "mcp_oauth_token_validations_total",
+    "Total OAuth token validation attempts",
+    ["method", "result"],  # method: introspect | jwt; result: valid | invalid | error
+)
+
+oauth_token_exchange_total = Counter(
+    "mcp_oauth_token_exchange_total",
+    "Total OAuth token exchange operations (RFC 8693)",
+    ["status"],  # status: success | error
+)
+
+oauth_token_cache_hits_total = Counter(
+    "mcp_oauth_token_cache_hits_total",
+    "Total OAuth token cache lookups",
+    ["hit"],  # hit: true | false
+)
+
+oauth_refresh_token_operations_total = Counter(
+    "mcp_oauth_refresh_token_operations_total",
+    "Total refresh token storage operations",
+    [
+        "operation",
+        "status",
+    ],  # operation: store | retrieve | delete; status: success | error
+)
+
+# =============================================================================
+# Vector Sync Metrics (optional feature)
+# =============================================================================
+
+vector_sync_documents_scanned_total = Counter(
+    "mcp_vector_sync_documents_scanned_total",
+    "Total documents scanned for vector sync",
+)
+
+vector_sync_documents_processed_total = Counter(
+    "mcp_vector_sync_documents_processed_total",
+    "Total documents processed for vector sync",
+    ["status"],  # status: success | error
+)
+
+vector_sync_processing_duration_seconds = Histogram(
+    "mcp_vector_sync_processing_duration_seconds",
+    "Document processing duration in seconds",
+    buckets=(0.1, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0),
+)
+
+vector_sync_queue_size = Gauge(
+    "mcp_vector_sync_queue_size",
+    "Current number of documents in processing queue",
+)
+
+qdrant_operations_total = Counter(
+    "mcp_qdrant_operations_total",
+    "Total Qdrant vector database operations",
+    [
+        "operation",
+        "status",
+    ],  # operation: upsert | search | delete; status: success | error
+)
+
+# =============================================================================
+# Database Metrics
+# =============================================================================
+
+db_operations_total = Counter(
+    "mcp_db_operations_total",
+    "Total database operations",
+    ["db", "operation", "status"],  # db: sqlite | qdrant; operation varies
+)
+
+db_operation_duration_seconds = Histogram(
+    "mcp_db_operation_duration_seconds",
+    "Database operation duration in seconds",
+    ["db", "operation"],
+    buckets=(0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0),
+)
+
+# =============================================================================
+# External Dependency Health Metrics
+# =============================================================================
+
+dependency_health = Gauge(
+    "mcp_dependency_health",
+    "External dependency health status (1=up, 0=down)",
+    ["dependency"],  # dependency: nextcloud | keycloak | qdrant | unstructured
+)
+
+dependency_check_duration_seconds = Histogram(
+    "mcp_dependency_check_duration_seconds",
+    "Dependency health check duration in seconds",
+    ["dependency"],
+    buckets=(0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5),
+)
+
+# =============================================================================
+# Metrics Setup and HTTP Handler
+# =============================================================================
+
+
+def setup_metrics(port: int = 9090) -> None:
+    """
+    Initialize Prometheus metrics collection and start HTTP server.
+
+    Starts a dedicated HTTP server on the specified port to serve metrics.
+    This server runs in a separate thread and is isolated from the main application.
+
+    Args:
+        port: Port to serve metrics on (default: 9090)
+
+    Note:
+        Metrics endpoint (/metrics) is ONLY accessible on this dedicated port,
+        not on the main application HTTP port. This is a security best practice
+        to prevent external exposure of metrics.
+    """
+    try:
+        start_http_server(port)
+        logger.info(f"Prometheus metrics server started on port {port}")
+    except OSError as e:
+        if "Address already in use" in str(e):
+            logger.warning(
+                f"Metrics port {port} already in use (metrics server likely already running)"
+            )
+        else:
+            logger.error(f"Failed to start metrics server on port {port}: {e}")
+            raise
+
+
+# =============================================================================
+# Convenience Functions for Common Metric Updates
+# =============================================================================
+
+
+def record_tool_call(tool_name: str, duration: float, status: str = "success") -> None:
+    """
+    Record metrics for an MCP tool call.
+
+    Args:
+        tool_name: Name of the MCP tool
+        duration: Execution duration in seconds
+        status: "success" or "error"
+    """
+    mcp_tool_calls_total.labels(tool_name=tool_name, status=status).inc()
+    mcp_tool_duration_seconds.labels(tool_name=tool_name).observe(duration)
+
+
+def record_tool_error(tool_name: str, error_type: str) -> None:
+    """
+    Record an MCP tool error.
+
+    Args:
+        tool_name: Name of the MCP tool
+        error_type: Type of error (e.g., "HTTPStatusError", "ValueError")
+    """
+    mcp_tool_errors_total.labels(tool_name=tool_name, error_type=error_type).inc()
+
+
+def record_nextcloud_api_call(
+    app: str,
+    method: str,
+    status_code: int,
+    duration: float,
+) -> None:
+    """
+    Record metrics for a Nextcloud API call.
+
+    Args:
+        app: Nextcloud app name (notes, calendar, contacts, etc.)
+        method: HTTP method (GET, POST, PUT, DELETE, PROPFIND, etc.)
+        status_code: HTTP status code
+        duration: Request duration in seconds
+    """
+    nextcloud_api_requests_total.labels(
+        app=app, method=method, status_code=str(status_code)
+    ).inc()
+    nextcloud_api_duration_seconds.labels(app=app, method=method).observe(duration)
+
+
+def record_nextcloud_api_retry(app: str, reason: str) -> None:
+    """
+    Record a Nextcloud API retry.
+
+    Args:
+        app: Nextcloud app name
+        reason: Retry reason (429, timeout, connection_error)
+    """
+    nextcloud_api_retries_total.labels(app=app, reason=reason).inc()
+
+
+def record_oauth_token_validation(method: str, result: str) -> None:
+    """
+    Record an OAuth token validation.
+
+    Args:
+        method: Validation method ("introspect" or "jwt")
+        result: Validation result ("valid", "invalid", or "error")
+    """
+    oauth_token_validations_total.labels(method=method, result=result).inc()
+
+
+def record_db_operation(
+    db: str, operation: str, duration: float, status: str = "success"
+) -> None:
+    """
+    Record a database operation.
+
+    Args:
+        db: Database type ("sqlite" or "qdrant")
+        operation: Operation type (e.g., "insert", "select", "upsert", "search")
+        duration: Operation duration in seconds
+        status: "success" or "error"
+    """
+    db_operations_total.labels(db=db, operation=operation, status=status).inc()
+    db_operation_duration_seconds.labels(db=db, operation=operation).observe(duration)
+
+
+def set_dependency_health(dependency: str, is_healthy: bool) -> None:
+    """
+    Update external dependency health status.
+
+    Args:
+        dependency: Dependency name (nextcloud, keycloak, qdrant, unstructured)
+        is_healthy: True if dependency is healthy, False otherwise
+    """
+    dependency_health.labels(dependency=dependency).set(1 if is_healthy else 0)
+
+
+def record_dependency_check(dependency: str, duration: float) -> None:
+    """
+    Record a dependency health check duration.
+
+    Args:
+        dependency: Dependency name
+        duration: Check duration in seconds
+    """
+    dependency_check_duration_seconds.labels(dependency=dependency).observe(duration)
@@ -0,0 +1,200 @@
+"""
+Observability middleware for the Nextcloud MCP Server.
+
+This module provides Starlette middleware that automatically instruments
+HTTP requests with:
+- Prometheus metrics (request count, latency, in-flight requests)
+- OpenTelemetry distributed tracing
+- Request/response timing and error tracking
+"""
+
+import logging
+import time
+from typing import Callable
+
+from starlette.middleware.base import BaseHTTPMiddleware
+from starlette.requests import Request
+from starlette.responses import Response
+
+from nextcloud_mcp_server.observability.metrics import (
+    http_request_duration_seconds,
+    http_requests_in_progress,
+    http_requests_total,
+)
+from nextcloud_mcp_server.observability.tracing import (
+    add_span_attribute,
+    trace_operation,
+)
+
+logger = logging.getLogger(__name__)
+
+
+class ObservabilityMiddleware(BaseHTTPMiddleware):
+    """
+    Starlette middleware for automatic HTTP request instrumentation.
+
+    This middleware:
+    - Records Prometheus metrics for each request (RED metrics)
+    - Creates OpenTelemetry spans for distributed tracing
+    - Tracks request timing and errors
+    - Handles in-flight request counting
+    """
+
+    async def dispatch(
+        self,
+        request: Request,
+        call_next: Callable,
+    ) -> Response:
+        """
+        Process HTTP request with observability instrumentation.
+
+        Args:
+            request: Starlette request object
+            call_next: Next middleware or route handler
+
+        Returns:
+            Response from downstream handler
+        """
+        # Extract request details
+        method = request.method
+        path = request.url.path
+        endpoint = self._get_endpoint_label(path)
+
+        # Increment in-flight requests counter
+        http_requests_in_progress.labels(method=method, endpoint=endpoint).inc()
+
+        # Record start time (monotonic clock, so durations are immune to wall-clock jumps)
+        start_time = time.perf_counter()
+
+        try:
+            # Create span for request (OpenTelemetry auto-instrumentation will create parent span)
+            with trace_operation(
+                f"HTTP {method} {endpoint}",
+                attributes={
+                    "http.method": method,
+                    "http.path": path,
+                    "http.scheme": request.url.scheme,
+                    "http.host": request.url.hostname or "",
+                },
+            ):
+                # Process request
+                response = await call_next(request)
+
+                # Add response status to span
+                add_span_attribute("http.status_code", response.status_code)
+
+                # Record metrics
+                duration = time.perf_counter() - start_time
+                self._record_request_metrics(
+                    method=method,
+                    endpoint=endpoint,
+                    status_code=response.status_code,
+                    duration=duration,
+                )
+
+                return response
+
+        except Exception:
+            # Record error metrics
+            duration = time.perf_counter() - start_time
+            self._record_request_metrics(
+                method=method,
+                endpoint=endpoint,
+                status_code=500,  # Internal server error
+                duration=duration,
+            )
+
+            logger.error(
+                f"Request failed: {method} {path}",
+                exc_info=True,
+                extra={
+                    "method": method,
+                    "path": path,
+                    "duration_seconds": duration,
+                },
+            )
+
+            # Re-raise exception to be handled by error middleware
+            raise
+
+        finally:
+            # Decrement in-flight requests counter
+            http_requests_in_progress.labels(method=method, endpoint=endpoint).dec()
+
+    def _get_endpoint_label(self, path: str) -> str:
+        """
+        Get endpoint label for metrics, normalizing dynamic path segments.
+
+        This prevents metric cardinality explosion by grouping similar paths.
+
+        Args:
+            path: Request path
+
+        Returns:
+            Normalized endpoint label
+        """
+        # Health check endpoints
+        if path.startswith("/health/"):
+            return "/health/*"
+
+        # Metrics endpoint
+        if path == "/metrics":
+            return "/metrics"
+
+        # MCP protocol endpoints
+        if path == "/sse" or path.startswith("/sse/"):
+            return "/sse"
+
+        if path == "/messages" or path.startswith("/messages/"):
+            return "/messages"
+
+        # OAuth/OIDC endpoints
+        if path.startswith("/oauth/"):
+            return "/oauth/*"
+
+        if path.startswith("/oidc/"):
+            return "/oidc/*"
+
+        # Catch-all: unmatched paths are returned as-is, so their cardinality is not bounded
+        return path
+
+    def _record_request_metrics(
+        self,
+        method: str,
+        endpoint: str,
+        status_code: int,
+        duration: float,
+    ) -> None:
+        """
+        Record Prometheus metrics for an HTTP request.
+
+        Args:
+            method: HTTP method
+            endpoint: Normalized endpoint label
+            status_code: HTTP status code
+            duration: Request duration in seconds
+        """
+        # Record request count
+        http_requests_total.labels(
+            method=method,
+            endpoint=endpoint,
+            status_code=str(status_code),
+        ).inc()
+
+        # Record request duration
+        http_request_duration_seconds.labels(
+            method=method,
+            endpoint=endpoint,
+        ).observe(duration)
+
+        # Log slow requests (>1 second)
+        if duration > 1.0:
+            logger.warning(
+                f"Slow request: {method} {endpoint} took {duration:.3f}s",
+                extra={
+                    "method": method,
+                    "endpoint": endpoint,
+                    "status_code": status_code,
+                    "duration_seconds": duration,
+                },
+            )
@@ -0,0 +1,363 @@
+"""
+OpenTelemetry distributed tracing for the Nextcloud MCP Server.
+
+This module provides:
+- OpenTelemetry SDK initialization with OTLP exporter
+- Auto-instrumentation for ASGI (Starlette/FastAPI) and httpx
+- Helper functions for creating custom spans
+- Context propagation utilities
+- Span attribute standardization
+"""
+
+import logging
+from contextlib import contextmanager
+from typing import Any
+
+from opentelemetry import trace
+from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
+from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
+from opentelemetry.instrumentation.logging import LoggingInstrumentor
+from opentelemetry.sdk.resources import Resource
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import BatchSpanProcessor
+from opentelemetry.trace import Status, StatusCode, Tracer
+
+logger = logging.getLogger(__name__)
+
+# Global tracer instance (initialized in setup_tracing)
+_tracer: Tracer | None = None
+
+
+def setup_tracing(
+    service_name: str = "nextcloud-mcp-server",
+    otlp_endpoint: str | None = None,
+    sampling_rate: float = 1.0,
+) -> Tracer:
+    """
+    Initialize OpenTelemetry tracing with OTLP exporter.
+
+    Args:
+        service_name: Service name for traces (default: "nextcloud-mcp-server")
+        otlp_endpoint: OTLP gRPC endpoint (e.g., "http://otel-collector:4317")
+                      If None, tracing is initialized but no exporter is configured
+        sampling_rate: Sampling rate (0.0-1.0). Default 1.0 (100% sampling)
+
+    Returns:
+        Tracer instance for creating custom spans
+    """
+    global _tracer
+
+    # Create resource with service name
+    resource = Resource.create(
+        {
+            "service.name": service_name,
+            "service.version": "0.27.2",  # TODO: Extract from pyproject.toml
+        }
+    )
+
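+    from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased  # honors sampling_rate
+    provider = TracerProvider(resource=resource, sampler=ParentBased(TraceIdRatioBased(sampling_rate)))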
+
+    # Configure OTLP exporter if endpoint is provided
+    if otlp_endpoint:
+        try:
+            otlp_exporter = OTLPSpanExporter(endpoint=otlp_endpoint, insecure=True)
+            span_processor = BatchSpanProcessor(otlp_exporter)
+            provider.add_span_processor(span_processor)
+            logger.info(
+                f"OpenTelemetry tracing enabled with OTLP endpoint: {otlp_endpoint}"
+            )
+        except Exception as e:
+            logger.warning(
+                f"Failed to initialize OTLP exporter: {e}. Continuing without trace export."
+            )
+    else:
+        logger.info(
+            "OpenTelemetry tracing initialized without OTLP exporter (traces will be generated but not exported)"
+        )
+
+    # Set global tracer provider
+    trace.set_tracer_provider(provider)
+
+    # Auto-instrument httpx for Nextcloud API calls
+    HTTPXClientInstrumentor().instrument()
+
+    # Auto-instrument logging to inject trace context
+    LoggingInstrumentor().instrument(set_logging_format=True)
+
+    # Get and store tracer
+    _tracer = trace.get_tracer(__name__)
+
+    logger.info(f"OpenTelemetry tracing initialized for service: {service_name}")
+    return _tracer
+
+
+def get_tracer() -> Tracer | None:
+    """
+    Get the global tracer instance.
+
+    Returns:
+        Tracer instance for creating custom spans, or None if tracing is not enabled
+
+    Note:
+        Returns None if setup_tracing() was never called (tracing disabled).
+        Calling code should handle None gracefully.
+    """
+    return _tracer
+
+
+@contextmanager
+def trace_operation(
+    operation_name: str,
+    attributes: dict[str, Any] | None = None,
+    record_exception: bool = True,
+):
+    """
+    Context manager for tracing an operation with automatic error handling.
+
+    Usage:
+        with trace_operation("mcp.tool.nc_notes_create_note", {"note.title": "My Note"}):
+            # Your code here
+            pass
+
+    Args:
+        operation_name: Name of the operation (span name)
+        attributes: Optional attributes to add to the span
+        record_exception: Whether to record exceptions in the span (default: True)
+
+    Yields:
+        Span instance for adding additional attributes (or None if tracing disabled)
+    """
+    tracer = get_tracer()
+
+    # If tracing is not enabled, just yield without creating a span
+    if tracer is None:
+        yield None
+        return
+
+    # Exceptions are recorded below; disable the SDK's own recording to avoid duplicates
+    with tracer.start_as_current_span(
+        operation_name, record_exception=False, set_status_on_exception=False
+    ) as span:
+        span.set_attributes(attributes or {})
+
+        try:
+            yield span
+            span.set_status(Status(StatusCode.OK))
+        except Exception as e:
+            if record_exception:
+                span.record_exception(e)
+            span.set_status(Status(StatusCode.ERROR, str(e)))
+            raise
+
+
+def trace_mcp_tool(tool_name: str, tool_args: dict[str, Any] | None = None):
+    """
+    Create a span for an MCP tool invocation.
+
+    Usage:
+        with trace_mcp_tool("nc_notes_create_note", {"title": "My Note"}):
+            # Tool implementation
+            pass
+
+    Args:
+        tool_name: Name of the MCP tool
+        tool_args: Optional tool arguments (sensitive data will be sanitized)
+
+    Returns:
+        Context manager for the span
+    """
+    attributes = {
+        "mcp.tool.name": tool_name,
+    }
+
+    # Add sanitized tool args (avoid logging sensitive data)
+    if tool_args:
+        # Only include non-sensitive arguments
+        safe_args = {
+            k: v
+            for k, v in tool_args.items()
+            if k not in ("password", "token", "secret", "api_key", "etag")
+        }
+        if safe_args:
+            attributes["mcp.tool.args"] = str(safe_args)
+
+    return trace_operation(f"mcp.tool.{tool_name}", attributes)
+
+
+def trace_nextcloud_api_call(
+    app: str,
+    method: str,
+    path: str | None = None,
+):
+    """
+    Create a span for a Nextcloud API call.
+
+    Usage:
+        with trace_nextcloud_api_call("notes", "POST", "/apps/notes/api/v1/notes"):
+            # API call implementation
+            pass
+
+    Args:
+        app: Nextcloud app name (notes, calendar, contacts, etc.)
+        method: HTTP method (GET, POST, PUT, DELETE, etc.)
+        path: Optional API path
+
+    Returns:
+        Context manager for the span
+    """
+    attributes = {
+        "nextcloud.app": app,
+        "http.method": method,
+    }
+
+    if path:
+        attributes["http.path"] = path
+
+    return trace_operation(f"nextcloud.api.{app}.{method}", attributes)
+
+
+def trace_oauth_operation(operation: str, details: dict[str, Any] | None = None):
+    """
+    Create a span for an OAuth operation.
+
+    Usage:
+        with trace_oauth_operation("token.validate", {"method": "jwt"}):
+            # OAuth validation logic
+            pass
+
+    Args:
+        operation: OAuth operation name (e.g., "token.validate", "token.exchange")
+        details: Optional operation details (sensitive data will be sanitized)
+
+    Returns:
+        Context manager for the span
+    """
+    attributes = {"oauth.operation": operation}
+
+    if details:
+        # Only include non-sensitive details
+        safe_details = {
+            k: v
+            for k, v in details.items()
+            if k not in ("token", "refresh_token", "access_token", "client_secret")
+        }
+        if safe_details:
+            attributes.update(safe_details)
+
+    return trace_operation(f"oauth.{operation}", attributes)
+
+
+def trace_vector_sync_operation(
+    operation: str,
+    document_count: int | None = None,
+):
+    """
+    Create a span for a vector sync operation.
+
+    Usage:
+        with trace_vector_sync_operation("scan", document_count=10):
+            # Vector sync logic
+            pass
+
+    Args:
+        operation: Operation name (scan, process, embed, upsert)
+        document_count: Optional number of documents being processed
+
+    Returns:
+        Context manager for the span
+    """
+    attributes = {"vector_sync.operation": operation}
+
+    if document_count is not None:
+        attributes["vector_sync.document_count"] = document_count
+
+    return trace_operation(f"vector_sync.{operation}", attributes)
+
+
+def trace_db_operation(
+    db: str,
+    operation: str,
+    table: str | None = None,
+):
+    """
+    Create a span for a database operation.
+
+    Usage:
+        with trace_db_operation("sqlite", "insert", "refresh_tokens"):
+            # Database operation
+            pass
+
+    Args:
+        db: Database type (sqlite, qdrant)
+        operation: Operation type (insert, select, update, delete, upsert, search)
+        table: Optional table/collection name
+
+    Returns:
+        Context manager for the span
+    """
+    attributes = {
+        "db.system": db,
+        "db.operation": operation,
+    }
+
+    if table:
+        attributes["db.table"] = table
+
+    return trace_operation(f"db.{db}.{operation}", attributes)
+
+
+def add_span_attribute(key: str, value: Any) -> None:
+    """
+    Add an attribute to the current span (if any).
+
+    Args:
+        key: Attribute key
+        value: Attribute value
+
+    Note:
+        This is a no-op if tracing is not enabled or there's no active span.
+    """
+    if _tracer is None:
+        return  # Tracing not enabled
+    span = trace.get_current_span()
+    if span.is_recording():
+        span.set_attribute(key, value)
+
+
+def add_span_event(name: str, attributes: dict[str, Any] | None = None) -> None:
+    """
+    Add an event to the current span (if any).
+
+    Args:
+        name: Event name
+        attributes: Optional event attributes
+
+    Note:
+        This is a no-op if tracing is not enabled or there's no active span.
+    """
+    if _tracer is None:
+        return  # Tracing not enabled
+    span = trace.get_current_span()
+    if span.is_recording():
+        span.add_event(name, attributes=attributes or {})
+
+
+def get_trace_context() -> dict[str, str]:
+    """
+    Get current trace context as a dictionary.
+
+    Returns:
+        Dictionary with trace_id and span_id (or empty dict if tracing disabled or no active span)
+    """
+    if _tracer is None:
+        return {}  # Tracing not enabled
+
+    span = trace.get_current_span()
+    if span.is_recording():
+        span_context = span.get_span_context()
+        return {
+            "trace_id": format(span_context.trace_id, "032x"),
+            "span_id": format(span_context.span_id, "016x"),
+        }
+    return {}
@@ -3,6 +3,7 @@ from .contacts import configure_contacts_tools
 from .cookbook import configure_cookbook_tools
 from .deck import configure_deck_tools
 from .notes import configure_notes_tools
+from .semantic import configure_semantic_tools
 from .sharing import configure_sharing_tools
 from .tables import configure_tables_tools
 from .webdav import configure_webdav_tools
@@ -13,6 +14,7 @@ __all__ = [
    "configure_cookbook_tools",
    "configure_deck_tools",
    "configure_notes_tools",
+    "configure_semantic_tools",
    "configure_sharing_tools",
    "configure_tables_tools",
    "configure_webdav_tools",
@@ -0,0 +1,573 @@
+"""Semantic search MCP tools using vector database."""
+
+import logging
+
+from httpx import HTTPStatusError, RequestError
+from mcp.server.fastmcp import Context, FastMCP
+from mcp.shared.exceptions import McpError
+from mcp.types import (
+    ErrorData,
+    ModelHint,
+    ModelPreferences,
+    SamplingMessage,
+    TextContent,
+)
+
+from nextcloud_mcp_server.auth import require_scopes
+from nextcloud_mcp_server.context import get_client
+from nextcloud_mcp_server.models.semantic import (
+    SamplingSearchResponse,
+    SemanticSearchResponse,
+    SemanticSearchResult,
+    VectorSyncStatusResponse,
+)
+
+logger = logging.getLogger(__name__)
+
+
+def configure_semantic_tools(mcp: FastMCP):
+    """Configure semantic search tools for MCP server."""
+
+    @mcp.tool()
+    @require_scopes("semantic:read")
+    async def nc_semantic_search(
+        query: str, ctx: Context, limit: int = 10, score_threshold: float = 0.7
+    ) -> SemanticSearchResponse:
+        """
+        Semantic search over indexed Nextcloud content using vector embeddings.
+
+        Searches documents by meaning rather than exact keywords. Only notes are
+        searched at present; other apps (calendar, deck, files, contacts) are planned.
+        Requires vector database synchronization to be enabled (VECTOR_SYNC_ENABLED=true).
+
+        Args:
+            query: Natural language search query
+            limit: Maximum number of results to return (default: 10)
+            score_threshold: Minimum similarity score (0-1, default: 0.7)
+
+        Returns:
+            SemanticSearchResponse with matching documents and similarity scores
+        """
+        from qdrant_client.models import FieldCondition, Filter, MatchValue
+
+        from nextcloud_mcp_server.config import get_settings
+        from nextcloud_mcp_server.embedding import get_embedding_service
+        from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
+
+        settings = get_settings()
+
+        # Check if vector sync is enabled
+        if not settings.vector_sync_enabled:
+            raise McpError(
+                ErrorData(
+                    code=-1,
+                    message="Semantic search is not enabled. Set VECTOR_SYNC_ENABLED=true and ensure vector database is configured.",
+                )
+            )
+
+        client = await get_client(ctx)
+        username = client.username
+
+        logger.info(
+            f"Semantic search: query='{query}', user={username}, "
+            f"limit={limit}, score_threshold={score_threshold}"
+        )
+
+        try:
+            # Generate embedding for query
+            embedding_service = get_embedding_service()
+            query_embedding = await embedding_service.embed(query)
+            logger.debug(
+                f"Generated embedding for query (dimension={len(query_embedding)})"
+            )
+
+            # Search Qdrant with user filtering
+            # Note: Currently only searching notes (doc_type="note")
+            # Future: Remove doc_type filter to search all apps
+            qdrant_client = await get_qdrant_client()
+            search_response = await qdrant_client.query_points(
+                collection_name=settings.get_collection_name(),
+                query=query_embedding,
+                query_filter=Filter(
+                    must=[
+                        FieldCondition(
+                            key="user_id",
+                            match=MatchValue(value=username),
+                        ),
+                        FieldCondition(
+                            key="doc_type",
+                            match=MatchValue(value="note"),
+                        ),
+                    ]
+                ),
+                limit=limit * 2,  # Get extra for filtering
+                score_threshold=score_threshold,
+                with_payload=True,
+                with_vectors=False,  # Don't return vectors to save bandwidth
+            )
+
+            logger.info(
+                f"Qdrant returned {len(search_response.points)} results "
+                f"(before deduplication and access verification)"
+            )
+            if search_response.points:
+                # Log top 3 scores to help with threshold tuning
+                top_scores = [p.score for p in search_response.points[:3]]
+                logger.debug(f"Top 3 similarity scores: {top_scores}")
+
+            # Deduplicate by document ID (multiple chunks per document)
+            seen_doc_ids = set()
+            results = []
+
+            for result in search_response.points:
+                doc_id = int(result.payload["doc_id"])
+                doc_type = result.payload.get("doc_type", "note")
+
+                # Skip if we've already seen this document
+                if doc_id in seen_doc_ids:
+                    continue
+
+                seen_doc_ids.add(doc_id)
+
+                # Verify access via Nextcloud API (dual-phase authorization)
+                # Currently only supports notes, will be extended to other apps
+                if doc_type == "note":
+                    try:
+                        note = await client.notes.get_note(doc_id)
+
+                        results.append(
+                            SemanticSearchResult(
+                                id=doc_id,
+                                doc_type="note",
+                                title=result.payload["title"],
+                                category=note.get("category", ""),
+                                excerpt=result.payload["excerpt"],
+                                score=result.score,
+                                chunk_index=result.payload["chunk_index"],
+                                total_chunks=result.payload["total_chunks"],
+                            )
+                        )
+
+                        if len(results) >= limit:
+                            break
+
+                    except HTTPStatusError as e:
+                        if e.response.status_code == 403:
+                            # User lost access, skip this document
+                            logger.debug(f"Skipping note {doc_id}: access denied (403)")
+                            continue
+                        elif e.response.status_code == 404:
+                            # Document was deleted but not yet removed from vector DB
+                            logger.debug(
+                                f"Skipping note {doc_id}: not found (404), "
+                                f"likely deleted after indexing"
+                            )
+                            continue
+                        else:
+                            # Log other errors but continue processing
+                            logger.warning(
+                                f"Error verifying access to note {doc_id}: {e.response.status_code}"
+                            )
+                            continue
+
+            logger.info(
+                f"Returning {len(results)} results after deduplication and access verification"
+            )
+            if results:
+                result_details = [
+                    f"note_{r.id} (score={r.score:.3f}, title='{r.title}')"
+                    for r in results[:5]  # Show top 5
+                ]
+                logger.debug(f"Top results: {', '.join(result_details)}")
+
+            return SemanticSearchResponse(
+                results=results,
+                query=query,
+                total_found=len(results),
+                search_method="semantic",
+            )
+
+        except ValueError as e:
+            if "No embedding provider configured" in str(e):
+                raise McpError(
+                    ErrorData(
+                        code=-1,
+                        message="Embedding service not configured. Set OLLAMA_BASE_URL environment variable.",
+                    )
+                )
+            raise McpError(ErrorData(code=-1, message=f"Configuration error: {str(e)}"))
+        except RequestError as e:
+            raise McpError(
+                ErrorData(code=-1, message=f"Network error during search: {str(e)}")
+            )
+        except Exception as e:
+            logger.error(f"Semantic search error: {e}", exc_info=True)
+            raise McpError(
+                ErrorData(code=-1, message=f"Semantic search failed: {str(e)}")
+            )
+
+    @mcp.tool()
+    @require_scopes("semantic:read")
+    async def nc_semantic_search_answer(
+        query: str,
+        ctx: Context,
+        limit: int = 5,
+        score_threshold: float = 0.7,
+        max_answer_tokens: int = 500,
+    ) -> SamplingSearchResponse:
+        """
+        Semantic search with LLM-generated answer using MCP sampling.
+
+        Retrieves relevant documents from indexed Nextcloud apps (notes, calendar, deck,
+        files, contacts) using vector similarity search, then uses MCP sampling to request
+        the client's LLM to generate a natural language answer based on the retrieved context.
+
+        This tool combines the power of semantic search (finding relevant content across
+        all your Nextcloud apps) with LLM generation (synthesizing that content into
+        coherent answers). The generated answer includes citations to specific documents
+        with their types, allowing users to verify claims and explore sources.
+
+        The LLM generation happens client-side via MCP sampling. The MCP client
+        controls which model is used, who pays for it, and whether to prompt the
+        user for approval. This keeps the server simple (no LLM API keys needed)
+        while giving users full control over their LLM interactions.
+
+        Args:
+            query: Natural language question to answer (e.g., "What are my Q1 objectives?" or "When is my next dentist appointment?")
+            ctx: MCP context for session access
+            limit: Maximum number of documents to retrieve (default: 5)
+            score_threshold: Minimum similarity score 0-1 (default: 0.7)
+            max_answer_tokens: Maximum tokens for generated answer (default: 500)
+
+        Returns:
+            SamplingSearchResponse containing:
+            - generated_answer: Natural language answer with citations
+            - sources: List of documents with excerpts and relevance scores
+            - model_used: Which model generated the answer
+            - stop_reason: Why generation stopped
+
+        Note: Requires MCP client to support sampling. If sampling is unavailable,
+        the tool gracefully degrades to returning documents with an explanation.
+        The client may prompt the user to approve the sampling request.
+
+        Examples:
+            >>> # Query about objectives across multiple apps
+            >>> result = await nc_semantic_search_answer(
+            ...     query="What are my Q1 2025 project goals?",
+            ...     ctx=ctx
+            ... )
+            >>> print(result.generated_answer)
+            "Based on Document 1 (note: Project Kickoff), Document 2 (calendar event:
+            Q1 Planning Meeting), and Document 3 (deck card: Implement semantic search),
+            your main goals are: 1) Improve semantic search accuracy by 20%,
+            2) Deploy new embedding model, 3) Reduce indexing latency..."
+
+            >>> # Query about appointments
+            >>> result = await nc_semantic_search_answer(
+            ...     query="When is my next dentist appointment?",
+            ...     ctx=ctx,
+            ...     limit=10
+            ... )
+            >>> len(result.sources)  # Calendar events and related notes
+            3
+        """
+        # 1. Retrieve relevant documents via existing semantic search
+        search_response = await nc_semantic_search(
+            query=query,
+            ctx=ctx,
+            limit=limit,
+            score_threshold=score_threshold,
+        )
+
+        # 2. Handle no results case - don't waste a sampling call
+        if not search_response.results:
+            logger.debug(f"No documents found for query: {query}")
+            return SamplingSearchResponse(
+                query=query,
+                generated_answer="No relevant documents found in your Nextcloud content for this query.",
+                sources=[],
+                total_found=0,
+                search_method="semantic_sampling",
+                success=True,
+            )
+
+        # 3. Check if client supports sampling
+        from mcp.types import ClientCapabilities, SamplingCapability
+
+        client_has_sampling = ctx.session.check_client_capability(
+            ClientCapabilities(sampling=SamplingCapability())
+        )
+
+        # Log capability check result for debugging
+        logger.info(
+            f"Sampling capability check: client_has_sampling={client_has_sampling}, "
+            f"query='{query}'"
+        )
+        if hasattr(ctx.session, "_client_params") and ctx.session._client_params:
+            client_caps = ctx.session._client_params.capabilities
+            logger.debug(
+                f"Client advertised capabilities: "
+                f"roots={client_caps.roots is not None}, "
+                f"sampling={client_caps.sampling is not None}, "
+                f"experimental={client_caps.experimental is not None}"
+            )
+
+        if not client_has_sampling:
+            logger.info(
+                f"Client does not support sampling (query: '{query}'), "
+                f"returning {len(search_response.results)} documents"
+            )
+            return SamplingSearchResponse(
+                query=query,
+                generated_answer=(
+                    f"[Sampling not supported by client]\n\n"
+                    f"Your MCP client doesn't support answer generation. "
+                    f"Found {search_response.total_found} relevant documents. "
+                    f"Please review the sources below."
+                ),
+                sources=search_response.results,
+                total_found=search_response.total_found,
+                search_method="semantic_sampling_unsupported",
+                success=True,
+            )
+
+        # 4. Construct context from retrieved documents
+        context_parts = []
+        for idx, result in enumerate(search_response.results, 1):
+            context_parts.append(
+                f"[Document {idx}]\n"
+                f"Type: {result.doc_type}\n"
+                f"Title: {result.title}\n"
+                f"Category: {result.category}\n"
+                f"Excerpt: {result.excerpt}\n"
+                f"Relevance Score: {result.score:.2f}\n"
+            )
+
+        context = "\n".join(context_parts)
+
+        # 5. Construct prompt - reuse user's query, add context and instructions
+        prompt = (
+            f"{query}\n\n"
+            f"Here are relevant documents from Nextcloud (notes, calendar events, deck cards, files, contacts):\n\n"
+            f"{context}\n\n"
+            f"Based on the documents above, please provide a comprehensive answer. "
+            f"Cite the document numbers when referencing specific information."
+        )
+
+        logger.info(
+            f"Initiating sampling request: query_length={len(query)}, "
+            f"documents={len(search_response.results)}, "
+            f"prompt_length={len(prompt)}, max_tokens={max_answer_tokens}"
+        )
+
+        # 6. Request LLM completion via MCP sampling with timeout
+        import anyio
+
+        try:
+            with anyio.fail_after(30):
+                sampling_result = await ctx.session.create_message(
+                    messages=[
+                        SamplingMessage(
+                            role="user",
+                            content=TextContent(type="text", text=prompt),
+                        )
+                    ],
+                    max_tokens=max_answer_tokens,
+                    temperature=0.7,
+                    model_preferences=ModelPreferences(
+                        hints=[ModelHint(name="claude-3-5-sonnet")],
+                        intelligencePriority=0.8,
+                        speedPriority=0.5,
+                    ),
+                    include_context="thisServer",
+                )
+
+            # 7. Extract answer from sampling response
+            if sampling_result.content.type == "text":
+                generated_answer = sampling_result.content.text
+            else:
+                # Handle non-text responses (shouldn't happen for text prompts)
+                generated_answer = f"Received non-text response of type: {sampling_result.content.type}"
+                logger.warning(
+                    f"Unexpected content type from sampling: {sampling_result.content.type}"
+                )
+
+            logger.info(
+                f"Sampling successful: model={sampling_result.model}, "
+                f"stop_reason={sampling_result.stopReason}, "
+                f"answer_length={len(generated_answer)}"
+            )
+
+            return SamplingSearchResponse(
+                query=query,
+                generated_answer=generated_answer,
+                sources=search_response.results,
+                total_found=search_response.total_found,
+                search_method="semantic_sampling",
+                model_used=sampling_result.model,
+                stop_reason=sampling_result.stopReason,
+                success=True,
+            )
+
+        except TimeoutError:
+            logger.warning(
+                f"Sampling request timed out after 30 seconds for query: '{query}', "
+                f"returning search results only"
+            )
+            return SamplingSearchResponse(
+                query=query,
+                generated_answer=(
+                    f"[Sampling request timed out]\n\n"
+                    f"The answer generation took too long (>30s). "
+                    f"Found {search_response.total_found} relevant documents. "
+                    f"Please review the sources below or try a simpler query."
+                ),
+                sources=search_response.results,
+                total_found=search_response.total_found,
+                search_method="semantic_sampling_timeout",
+                success=True,
+            )
+
+        except McpError as e:
+            # Expected MCP protocol errors (user rejection, unsupported, etc.)
+            error_msg = str(e)
+
+            if "rejected" in error_msg.lower() or "denied" in error_msg.lower():
+                # User explicitly declined - this is normal, not an error
+                logger.info(f"User declined sampling request for query: '{query}'")
+                search_method = "semantic_sampling_user_declined"
+                user_message = "User declined to generate an answer"
+            elif "not supported" in error_msg.lower():
+                # Client doesn't support sampling - also normal
+                logger.info(f"Sampling not supported by client for query: '{query}'")
+                search_method = "semantic_sampling_unsupported"
+                user_message = "Sampling not supported by this client"
+            else:
+                # Other MCP protocol errors
+                logger.warning(
+                    f"MCP error during sampling for query '{query}': {error_msg}"
+                )
+                search_method = "semantic_sampling_mcp_error"
+                user_message = f"Sampling unavailable: {error_msg}"
+
+            return SamplingSearchResponse(
+                query=query,
+                generated_answer=(
+                    f"[{user_message}]\n\n"
+                    f"Found {search_response.total_found} relevant documents. "
+                    f"Please review the sources below."
+                ),
+                sources=search_response.results,
+                total_found=search_response.total_found,
+                search_method=search_method,
+                success=True,
+            )
+
+        except Exception as e:
+            # Truly unexpected errors - these SHOULD have tracebacks
+            logger.error(
+                f"Unexpected error during sampling for query '{query}': "
+                f"{type(e).__name__}: {e}",
+                exc_info=True,
+            )
+
+            return SamplingSearchResponse(
+                query=query,
+                generated_answer=(
+                    f"[Unexpected error during sampling]\n\n"
+                    f"Found {search_response.total_found} relevant documents. "
+                    f"Please review the sources below."
+                ),
+                sources=search_response.results,
+                total_found=search_response.total_found,
+                search_method="semantic_sampling_error",
+                success=True,
+            )
+
+    @mcp.tool()
+    @require_scopes("semantic:read")
+    async def nc_get_vector_sync_status(ctx: Context) -> VectorSyncStatusResponse:
+        """Get the current vector sync status.
+
+        Returns information about the vector sync process, including:
+        - Number of documents indexed in the vector database
+        - Number of documents pending processing
+        - Current sync status (idle, syncing, or disabled)
+
+        This is useful for determining when vector indexing is complete
+        after creating or updating content across all indexed apps.
+        """
+        import os
+
+        # Check if vector sync is enabled
+        vector_sync_enabled = (
+            os.getenv("VECTOR_SYNC_ENABLED", "false").lower() == "true"
+        )
+
+        if not vector_sync_enabled:
+            return VectorSyncStatusResponse(
+                indexed_count=0,
+                pending_count=0,
+                status="disabled",
+                enabled=False,
+            )
+
+        try:
+            # Get document receive stream from lifespan context
+            lifespan_ctx = ctx.request_context.lifespan_context
+            document_receive_stream = getattr(
+                lifespan_ctx, "document_receive_stream", None
+            )
+
+            if document_receive_stream is None:
+                logger.debug(
+                    "document_receive_stream not available in lifespan context"
+                )
+                return VectorSyncStatusResponse(
+                    indexed_count=0,
+                    pending_count=0,
+                    status="unknown",
+                    enabled=True,
+                )
+
+            # Get pending count from stream statistics
+            stream_stats = document_receive_stream.statistics()
+            pending_count = stream_stats.current_buffer_used
+
+            # Get Qdrant client and query indexed count
+            indexed_count = 0
+            try:
+                from nextcloud_mcp_server.config import get_settings
+                from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
+
+                settings = get_settings()
+                qdrant_client = await get_qdrant_client()
+
+                # Count documents in collection
+                count_result = await qdrant_client.count(
+                    collection_name=settings.get_collection_name()
+                )
+                indexed_count = count_result.count
+
+            except Exception as e:
+                logger.warning(f"Failed to query Qdrant for indexed count: {e}")
+                # Continue with indexed_count = 0
+
+            # Determine status
+            status = "syncing" if pending_count > 0 else "idle"
+
+            return VectorSyncStatusResponse(
+                indexed_count=indexed_count,
+                pending_count=pending_count,
+                status=status,
+                enabled=True,
+            )
+
+        except Exception as e:
+            logger.error(f"Error getting vector sync status: {e}")
+            raise McpError(
+                ErrorData(
+                    code=-1,
+                    message=f"Failed to retrieve vector sync status: {str(e)}",
+                )
+            )
@@ -0,0 +1,16 @@
+"""Vector database and background sync package."""
+
+from .document_chunker import DocumentChunker
+from .processor import process_document, processor_task
+from .qdrant_client import get_qdrant_client
+from .scanner import DocumentTask, scan_user_documents, scanner_task
+
+__all__ = [
+    "get_qdrant_client",
+    "DocumentChunker",
+    "scanner_task",
+    "scan_user_documents",
+    "DocumentTask",
+    "processor_task",
+    "process_document",
+]
@@ -0,0 +1,51 @@
+"""Document chunking for large texts."""
+
+import logging
+
+logger = logging.getLogger(__name__)
+
+
+class DocumentChunker:
+    """Chunk large documents for optimal embedding."""
+
+    def __init__(self, chunk_size: int = 512, overlap: int = 50):
+        """
+        Initialize document chunker.
+
+        Args:
+            chunk_size: Number of words per chunk (default: 512)
+            overlap: Number of overlapping words between chunks (default: 50)
+        """
+        self.chunk_size = chunk_size
+        self.overlap = overlap
+
+    def chunk_text(self, content: str) -> list[str]:
+        """
+        Split text into overlapping chunks.
+
+        Uses simple word-based chunking with configurable overlap to preserve
+        context across chunk boundaries.
+
+        Args:
+            content: Text content to chunk
+
+        Returns:
+            List of text chunks (may be single item if content is small)
+        """
+        # Simple word-based chunking
+        words = content.split()
+
+        if len(words) <= self.chunk_size:
+            return [content]
+
+        chunks = []
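+        # Guard against overlap >= chunk_size, which would never advance
+        step = max(1, self.chunk_size - self.overlap)
+
+        for start in range(0, len(words), step):
+            chunks.append(" ".join(words[start : start + self.chunk_size]))
+            if start + self.chunk_size >= len(words):
+                break  # Final window reached the end of the text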
+
+        logger.debug(f"Chunked document into {len(chunks)} chunks ({len(words)} words)")
+        return chunks
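The word-window chunking described in `chunk_text` can be sketched as a standalone function. This is a minimal illustration, not the code from this commit: `chunk_words` is a hypothetical name, and the argument validation and the early exit after the final window are assumptions added for the sketch.

```python
def chunk_words(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size words that share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    if len(words) <= chunk_size:
        # Small inputs are returned unchanged as a single chunk
        return [text]

    # Each window starts `step` words after the previous one
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # This window already covers the end of the text
    return chunks
```

With `chunk_size=4` and `overlap=1`, ten words produce three chunks, and each chunk repeats the last word of the previous one. Rejecting `overlap >= chunk_size` up front is one way to rule out a window that never advances.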
@@ -0,0 +1,223 @@
+"""Processor task for vector database synchronization.
+
+Processes documents from stream: fetches content, generates embeddings, stores in Qdrant.
+"""
+
+import logging
+import time
+import uuid
+
+import anyio
+from anyio.streams.memory import MemoryObjectReceiveStream
+from httpx import HTTPStatusError
+from qdrant_client.models import FieldCondition, Filter, MatchValue, PointStruct
+
+from nextcloud_mcp_server.client import NextcloudClient
+from nextcloud_mcp_server.config import get_settings
+from nextcloud_mcp_server.embedding import get_embedding_service
+from nextcloud_mcp_server.vector.document_chunker import DocumentChunker
+from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
+from nextcloud_mcp_server.vector.scanner import DocumentTask
+
+logger = logging.getLogger(__name__)
+
+
+async def processor_task(
+    worker_id: int,
+    receive_stream: MemoryObjectReceiveStream[DocumentTask],
+    shutdown_event: anyio.Event,
+    nc_client: NextcloudClient,
+    user_id: str,
+):
+    """
+    Process documents from stream concurrently.
+
+    Each processor task runs in a loop:
+    1. Receive document from stream (with timeout)
+    2. Fetch content from Nextcloud
+    3. Tokenize and chunk text
+    4. Generate embeddings (I/O bound - external API)
+    5. Upload vectors to Qdrant
+
+    Multiple processors run concurrently for I/O parallelism.
+
+    Args:
+        worker_id: Worker identifier for logging
+        receive_stream: Stream to receive documents from
+        shutdown_event: Event signaling shutdown
+        nc_client: Authenticated Nextcloud client
+        user_id: User being processed
+    """
+    logger.info(f"Processor {worker_id} started")
+
+    while not shutdown_event.is_set():
+        try:
+            # Get document with timeout (allows checking shutdown)
+            with anyio.fail_after(1.0):
+                doc_task = await receive_stream.receive()
+
+            # Process document
+            await process_document(doc_task, nc_client)
+
+        except TimeoutError:
+            # No documents available, continue
+            continue
+
+        except anyio.EndOfStream:
+            # Scanner finished and closed stream, exit gracefully
+            logger.info(f"Processor {worker_id}: Scanner finished, exiting")
+            break
+
+        except Exception as e:
+            logger.error(
+                f"Processor {worker_id} error processing "
+                f"{doc_task.doc_type}_{doc_task.doc_id}: {e}",
+                exc_info=True,
+            )
+            # Continue to next document (no task_done() needed with streams)
+
+    logger.info(f"Processor {worker_id} stopped")
+
+
+async def process_document(doc_task: DocumentTask, nc_client: NextcloudClient):
+    """
+    Process a single document: fetch, tokenize, embed, store in Qdrant.
+
+    Indexing is retried with exponential backoff; deletions are not retried.
+
+    Args:
+        doc_task: Document task to process
+        nc_client: Authenticated Nextcloud client
+    """
+    logger.debug(
+        f"Processing {doc_task.doc_type}_{doc_task.doc_id} "
+        f"for {doc_task.user_id} ({doc_task.operation})"
+    )
+
+    qdrant_client = await get_qdrant_client()
+    settings = get_settings()
+
+    # Handle deletion
+    if doc_task.operation == "delete":
+        await qdrant_client.delete(
+            collection_name=settings.get_collection_name(),
+            points_selector=Filter(
+                must=[
+                    FieldCondition(
+                        key="user_id",
+                        match=MatchValue(value=doc_task.user_id),
+                    ),
+                    FieldCondition(
+                        key="doc_id",
+                        match=MatchValue(value=doc_task.doc_id),
+                    ),
+                    FieldCondition(
+                        key="doc_type",
+                        match=MatchValue(value=doc_task.doc_type),
+                    ),
+                ]
+            ),
+        )
+        logger.info(
+            f"Deleted {doc_task.doc_type}_{doc_task.doc_id} for {doc_task.user_id}"
+        )
+        return
+
+    # Handle indexing with retry
+    max_retries = 3
+    retry_delay = 1.0
+
+    for attempt in range(max_retries):
+        try:
+            await _index_document(doc_task, nc_client, qdrant_client)
+            return  # Success
+
+        except (HTTPStatusError, Exception) as e:
+            if attempt < max_retries - 1:
+                logger.warning(
+                    f"Retry {attempt + 1}/{max_retries} for "
+                    f"{doc_task.doc_type}_{doc_task.doc_id}: {e}"
+                )
+                await anyio.sleep(retry_delay)
+                retry_delay *= 2  # Exponential backoff
+            else:
+                logger.error(
+                    f"Failed to index {doc_task.doc_type}_{doc_task.doc_id} "
+                    f"after {max_retries} attempts: {e}"
+                )
+                raise
+
+
+async def _index_document(
+    doc_task: DocumentTask, nc_client: NextcloudClient, qdrant_client
+):
+    """
+    Index a single document (called by process_document with retry).
+
+    Args:
+        doc_task: Document task to index
+        nc_client: Authenticated Nextcloud client
+        qdrant_client: Qdrant client instance
+    """
+    settings = get_settings()
+
+    # Fetch document content
+    if doc_task.doc_type == "note":
+        document = await nc_client.notes.get_note(int(doc_task.doc_id))
+        content = f"{document['title']}\n\n{document['content']}"
+        title = document["title"]
+        etag = document.get("etag", "")
+    else:
+        raise ValueError(f"Unsupported doc_type: {doc_task.doc_type}")
+
+    # Tokenize and chunk (using configured chunk size and overlap)
+    chunker = DocumentChunker(
+        chunk_size=settings.document_chunk_size,
+        overlap=settings.document_chunk_overlap,
+    )
+    chunks = chunker.chunk_text(content)
+
+    # Generate embeddings (I/O bound - external API call)
+    embedding_service = get_embedding_service()
+    embeddings = await embedding_service.embed_batch(chunks)
+
+    # Prepare Qdrant points
+    indexed_at = int(time.time())
+    points = []
+
+    for i, (chunk, embedding) in enumerate(zip(chunks, embeddings)):
+        # Generate deterministic UUID for point ID
+        # Using uuid5 with DNS namespace and combining doc info
+        point_name = f"{doc_task.doc_type}:{doc_task.doc_id}:chunk:{i}"
+        point_id = str(uuid.uuid5(uuid.NAMESPACE_DNS, point_name))
+
+        points.append(
+            PointStruct(
+                id=point_id,
+                vector=embedding,
+                payload={
+                    "user_id": doc_task.user_id,
+                    "doc_id": doc_task.doc_id,
+                    "doc_type": doc_task.doc_type,
+                    "title": title,
+                    "excerpt": chunk[:200],
+                    "indexed_at": indexed_at,
+                    "modified_at": doc_task.modified_at,
+                    "etag": etag,
+                    "chunk_index": i,
+                    "total_chunks": len(chunks),
+                },
+            )
+        )
+
+    # Upsert to Qdrant
+    await qdrant_client.upsert(
+        collection_name=settings.get_collection_name(),
+        points=points,
+        wait=True,
+    )
+
+    logger.info(
+        f"Indexed {doc_task.doc_type}_{doc_task.doc_id} for {doc_task.user_id} "
+        f"({len(chunks)} chunks)"
+    )
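The retry loop in `process_document` follows a common exponential backoff pattern, which can be sketched in isolation. This is a hedged illustration rather than the commit's implementation: `retry_with_backoff` and its parameters are hypothetical names, and it uses the standard library's `asyncio` in place of `anyio` so that it runs without extra dependencies.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    initial_delay: float = 1.0,
    retryable: tuple[type[Exception], ...] = (Exception,),
) -> T:
    """Await operation(), retrying on `retryable` errors with doubling delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except retryable:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller
            await asyncio.sleep(delay)
            delay *= 2  # Exponential backoff
    raise AssertionError("unreachable")
```

Passing a narrower `retryable` tuple (for example, only network errors) would avoid retrying failures that cannot succeed on a second attempt, such as an unsupported document type.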
@@ -0,0 +1,115 @@
+"""Qdrant client wrapper."""
+
+import logging
+
+from qdrant_client import AsyncQdrantClient
+from qdrant_client.models import Distance, VectorParams
+
+from nextcloud_mcp_server.config import get_settings
+
+logger = logging.getLogger(__name__)
+
+
+# Singleton instance
+_qdrant_client: AsyncQdrantClient | None = None
+
+
+async def get_qdrant_client() -> AsyncQdrantClient:
+    """
+    Get singleton Qdrant client instance.
+
+    Automatically creates collection on first use if it doesn't exist.
+
+    Supports three Qdrant modes:
+    - Network mode: QDRANT_URL set (e.g., http://qdrant:6333)
+    - In-memory mode: QDRANT_LOCATION=:memory: (default if nothing configured)
+    - Persistent local mode: QDRANT_LOCATION=/path/to/data
+
+    Returns:
+        Configured AsyncQdrantClient instance
+
+    Raises:
+        Exception: If Qdrant connection fails or collection creation fails
+    """
+    global _qdrant_client
+
+    if _qdrant_client is None:
+        settings = get_settings()
+
+        # Detect mode and initialize client accordingly
+        if settings.qdrant_url:
+            # Network mode
+            logger.info(f"Using Qdrant network mode: {settings.qdrant_url}")
+            _qdrant_client = AsyncQdrantClient(
+                url=settings.qdrant_url,
+                api_key=settings.qdrant_api_key,
+                timeout=30,
+            )
+        elif settings.qdrant_location:
+            # Local mode (either :memory: or persistent path)
+            if settings.qdrant_location == ":memory:":
+                logger.info("Using Qdrant in-memory mode: :memory:")
+                _qdrant_client = AsyncQdrantClient(":memory:")
+            else:
+                # Persistent local mode - use path parameter
+                logger.info(f"Using Qdrant persistent mode: {settings.qdrant_location}")
+                _qdrant_client = AsyncQdrantClient(path=settings.qdrant_location)
+        else:
+            # Should not happen due to __post_init__ validation, but handle gracefully
+            logger.warning("No Qdrant mode configured, defaulting to :memory:")
+            _qdrant_client = AsyncQdrantClient(":memory:")
+
+        # Get collection name (auto-generated from deployment ID + model)
+        collection_name = settings.get_collection_name()
+
+        # Import here to avoid circular dependency
+        from nextcloud_mcp_server.embedding import get_embedding_service
+
+        embedding_service = get_embedding_service()
+        expected_dimension = embedding_service.get_dimension()
+
+        try:
+            # Get existing collection
+            collection_info = await _qdrant_client.get_collection(collection_name)
+            actual_dimension = collection_info.config.params.vectors.size
+
+            # Validate dimension matches
+            if actual_dimension != expected_dimension:
+                raise ValueError(
+                    f"Dimension mismatch for collection '{collection_name}':\n"
+                    f"  Expected: {expected_dimension} (from embedding model '{settings.ollama_embedding_model}')\n"
+                    f"  Found: {actual_dimension}\n"
+                    f"This usually means you changed the embedding model.\n"
+                    f"Solutions:\n"
+                    f"  1. Delete the old collection: Collection will be recreated with new dimensions\n"
+                    f"  2. Set QDRANT_COLLECTION to use a different collection name\n"
+                    f"  3. Revert OLLAMA_EMBEDDING_MODEL to the original model"
+                )
+
+            logger.info(
+                f"Using existing Qdrant collection: {collection_name} "
+                f"(dimension={actual_dimension}, model={settings.ollama_embedding_model})"
+            )
+
+        except Exception as e:
+            # Check if it's a dimension mismatch error (re-raise it)
+            if isinstance(e, ValueError) and "Dimension mismatch" in str(e):
+                raise
+
+            # Collection doesn't exist or other error, create it
+            await _qdrant_client.create_collection(
+                collection_name=collection_name,
+                vectors_config=VectorParams(
+                    size=expected_dimension,
+                    distance=Distance.COSINE,
+                ),
+            )
+            logger.info(
+                f"Created Qdrant collection: {collection_name}\n"
+                f"  Dimension: {expected_dimension}\n"
+                f"  Model: {settings.ollama_embedding_model}\n"
+                f"  Distance: COSINE\n"
+                f"Background sync will index all documents with this embedding model."
+            )
+
+    return _qdrant_client
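The three-way mode selection in `get_qdrant_client` can be expressed as a pure function that only builds constructor keyword arguments, which makes the precedence easy to test without a running Qdrant. This is a sketch under assumptions: `resolve_qdrant_kwargs` is a hypothetical helper that does not exist in this commit, and it passes the in-memory case as the `location` keyword rather than positionally.

```python
from typing import Any, Optional


def resolve_qdrant_kwargs(
    url: Optional[str],
    location: Optional[str],
    api_key: Optional[str] = None,
) -> dict[str, Any]:
    """Pick client keyword arguments: network URL first, then local path, else memory."""
    if url:
        # Network mode takes precedence when a URL is configured
        return {"url": url, "api_key": api_key, "timeout": 30}
    if location and location != ":memory:":
        # Persistent local mode stores data under the given path
        return {"path": location}
    # In-memory mode, also used as the fallback when nothing is configured
    return {"location": ":memory:"}
```

The result would then be unpacked into the client constructor, for example `AsyncQdrantClient(**resolve_qdrant_kwargs(settings.qdrant_url, settings.qdrant_location, settings.qdrant_api_key))`.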
@@ -0,0 +1,233 @@
+"""Scanner task for vector database synchronization.
+
+Periodically scans enabled users' content and queues changed documents for processing.
+"""
+
+import logging
+import time
+from dataclasses import dataclass
+
+import anyio
+from anyio.streams.memory import MemoryObjectSendStream
+from qdrant_client.models import FieldCondition, Filter, MatchValue
+
+from nextcloud_mcp_server.client import NextcloudClient
+from nextcloud_mcp_server.config import get_settings
+from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
+
+logger = logging.getLogger(__name__)
+
+
+@dataclass
+class DocumentTask:
+    """Document task for processing queue."""
+
+    user_id: str
+    doc_id: str
+    doc_type: str  # "note", "file", "calendar"
+    operation: str  # "index" or "delete"
+    modified_at: int
+
+
+# Track documents potentially deleted (grace period before actual deletion)
+# Format: {(user_id, doc_id): first_missing_timestamp}
+_potentially_deleted: dict[tuple[str, str], float] = {}
+
+
+async def scanner_task(
+    send_stream: MemoryObjectSendStream[DocumentTask],
+    shutdown_event: anyio.Event,
+    wake_event: anyio.Event,
+    nc_client: NextcloudClient,
+    user_id: str,
+):
+    """
+    Periodic scanner that detects changed documents for the enabled user.
+
+    For BasicAuth mode, scans a single user with credentials available at runtime.
+
+    Args:
+        send_stream: Stream to send changed documents to processors
+        shutdown_event: Event signaling shutdown
+        wake_event: Event to trigger immediate scan
+        nc_client: Authenticated Nextcloud client
+        user_id: User to scan
+    """
+    logger.info(f"Scanner task started for user: {user_id}")
+    settings = get_settings()
+
+    async with send_stream:
+        while not shutdown_event.is_set():
+            try:
+                # Scan user documents
+                await scan_user_documents(
+                    user_id=user_id,
+                    send_stream=send_stream,
+                    nc_client=nc_client,
+                )
+
+            except Exception as e:
+                logger.error(f"Scanner error: {e}", exc_info=True)
+
+            # Sleep until next interval or wake event
+            try:
+                with anyio.move_on_after(settings.vector_sync_scan_interval):
+                    # Wait for wake event; move_on_after caps the wait at one interval
+                    await wake_event.wait()
+            except anyio.get_cancelled_exc_class():
+                # Cancelled during shutdown: re-raise so anyio can unwind cleanly
+                raise
+
+    logger.info("Scanner task stopped - stream closed")
+
+
+async def scan_user_documents(
+    user_id: str,
+    send_stream: MemoryObjectSendStream[DocumentTask],
+    nc_client: NextcloudClient,
+    initial_sync: bool = False,
+):
+    """
+    Scan a single user's documents and send changes to processor stream.
+
+    Args:
+        user_id: User to scan
+        send_stream: Stream to send changed documents to processors
+        nc_client: Authenticated Nextcloud client
+        initial_sync: If True, send all documents (first-time sync)
+    """
+    logger.debug(f"Scanning documents for user: {user_id}")
+
+    # Fetch all notes from Nextcloud
+    notes = [note async for note in nc_client.notes.get_all_notes()]
+    logger.debug(f"Found {len(notes)} notes for {user_id}")
+
+    if initial_sync:
+        # Send everything on first sync
+        for note in notes:
+            # Handle missing 'modified' field (use 0 as fallback)
+            modified_at = note.get("modified", 0)
+            if modified_at == 0:
+                logger.warning(
+                    f"Note {note['id']} missing 'modified' field, using 0 as fallback"
+                )
+
+            await send_stream.send(
+                DocumentTask(
+                    user_id=user_id,
+                    doc_id=str(note["id"]),
+                    doc_type="note",
+                    operation="index",
+                    modified_at=modified_at,
+                )
+            )
+        logger.info(f"Sent {len(notes)} documents for initial sync: {user_id}")
+        return
+
+    # Get indexed state from Qdrant
+    qdrant_client = await get_qdrant_client()
+    scroll_result = await qdrant_client.scroll(
+        collection_name=get_settings().get_collection_name(),
+        scroll_filter=Filter(
+            must=[
+                FieldCondition(key="user_id", match=MatchValue(value=user_id)),
+                FieldCondition(key="doc_type", match=MatchValue(value="note")),
+            ]
+        ),
+        with_payload=["doc_id", "indexed_at"],
+        with_vectors=False,
+        limit=10000,
+    )
+
+    indexed_docs = {
+        point.payload["doc_id"]: point.payload["indexed_at"]
+        for point in scroll_result[0]
+    }
+
+    logger.debug(f"Found {len(indexed_docs)} indexed documents in Qdrant")
+
+    # Compare and queue changes
+    queued = 0
+    nextcloud_doc_ids = {str(note["id"]) for note in notes}
+
+    for note in notes:
+        doc_id = str(note["id"])
+        indexed_at = indexed_docs.get(doc_id)
+
+        # Handle missing 'modified' field (use 0 as fallback)
+        modified_at = note.get("modified", 0)
+        if modified_at == 0:
+            logger.warning(
+                f"Note {doc_id} missing 'modified' field, using 0 as fallback"
+            )
+
+        # If document reappeared, remove from potentially_deleted
+        doc_key = (user_id, doc_id)
+        if doc_key in _potentially_deleted:
+            logger.debug(
+                f"Document {doc_id} reappeared, removing from deletion grace period"
+            )
+            del _potentially_deleted[doc_key]
+
+        # Send if never indexed or modified since last index
+        if indexed_at is None or modified_at > indexed_at:
+            await send_stream.send(
+                DocumentTask(
+                    user_id=user_id,
+                    doc_id=doc_id,
+                    doc_type="note",
+                    operation="index",
+                    modified_at=modified_at,
+                )
+            )
+            queued += 1
+
+    # Check for deleted documents (in Qdrant but not in Nextcloud)
+    # Grace period: delete only once a doc has been missing for 1.5 scan intervals
+    settings = get_settings()
+    grace_period = settings.vector_sync_scan_interval * 1.5  # Allow 1.5 scan intervals
+    current_time = time.time()
+
+    for doc_id in indexed_docs:
+        if doc_id not in nextcloud_doc_ids:
+            doc_key = (user_id, doc_id)
+
+            if doc_key in _potentially_deleted:
+                # Already marked as potentially deleted, check if grace period elapsed
+                first_missing_time = _potentially_deleted[doc_key]
+                time_missing = current_time - first_missing_time
+
+                if time_missing >= grace_period:
+                    # Grace period elapsed, send for deletion
+                    logger.info(
+                        f"Document {doc_id} missing for {time_missing:.1f}s "
+                        f"(>{grace_period:.1f}s grace period), sending deletion"
+                    )
+                    await send_stream.send(
+                        DocumentTask(
+                            user_id=user_id,
+                            doc_id=doc_id,
+                            doc_type="note",
+                            operation="delete",
+                            modified_at=0,
+                        )
+                    )
+                    queued += 1
+                    # Remove from tracking after sending deletion
+                    del _potentially_deleted[doc_key]
+                else:
+                    logger.debug(
+                        f"Document {doc_id} still missing "
+                        f"({time_missing:.1f}s/{grace_period:.1f}s grace period)"
+                    )
+            else:
+                # First time missing, add to grace period tracking
+                logger.debug(
+                    f"Document {doc_id} missing for first time, starting grace period"
+                )
+                _potentially_deleted[doc_key] = current_time
+
+    if queued > 0:
+        logger.info(f"Sent {queued} documents for incremental sync: {user_id}")
+    else:
+        logger.debug(f"No changes detected for {user_id}")
@@ -1,6 +1,6 @@
 [project]
 name = "nextcloud-mcp-server"
-version = "0.26.1"
+version = "0.29.1"
 description = "Model Context Protocol (MCP) server for Nextcloud integration - enables AI assistants to interact with Nextcloud data"
 authors = [
    {name = "Chris Coutinho", email = "chris@coutinho.io"}
@@ -21,6 +21,16 @@ dependencies = [
    "pyjwt[crypto]>=2.8.0",
    "aiosqlite>=0.20.0", # Async SQLite for refresh token storage
    "authlib>=1.6.5",
+    "qdrant-client>=1.7.0",
+    # Observability dependencies
+    "prometheus-client>=0.21.0",  # Prometheus metrics
+    "opentelemetry-api>=1.28.2",  # OpenTelemetry API
+    "opentelemetry-sdk>=1.28.2",  # OpenTelemetry SDK
+    "opentelemetry-instrumentation-asgi>=0.49b2",  # Auto-instrument ASGI/Starlette
+    "opentelemetry-instrumentation-httpx>=0.49b2",  # Auto-instrument httpx client
+    "opentelemetry-instrumentation-logging>=0.49b2",  # Logging integration
+    "opentelemetry-exporter-otlp-proto-grpc>=1.28.2",  # OTLP gRPC exporter
+    "python-json-logger>=3.2.0",  # Structured JSON logging
 ]
 classifiers = [
    "Development Status :: 4 - Beta",
@@ -550,6 +550,43 @@ async def temporary_note(nc_client: NextcloudClient):
                logger.error(f"Unexpected error deleting temporary note {note_id}: {e}")


+@pytest.fixture
+async def temporary_note_factory(nc_client: NextcloudClient):
+    """
+    Factory fixture to create multiple temporary notes with custom parameters.
+    Returns a callable that creates notes and tracks them for automatic cleanup.
+    """
+    created_notes = []
+
+    async def _create_note(title: str, content: str, category: str = ""):
+        """Create a temporary note with custom title, content, and category."""
+        logger.info(f"Creating temporary note via factory: {title}")
+        note_data = await nc_client.notes.create_note(
+            title=title, content=content, category=category
+        )
+        note_id = note_data.get("id")
+        if note_id:
+            created_notes.append(note_id)
+            logger.info(f"Factory created note ID: {note_id}")
+        return note_data
+
+    yield _create_note
+
+    # Cleanup all created notes
+    for note_id in created_notes:
+        logger.info(f"Cleaning up factory-created note ID: {note_id}")
+        try:
+            await nc_client.notes.delete_note(note_id=note_id)
+            logger.info(f"Successfully deleted factory note ID: {note_id}")
+        except HTTPStatusError as e:
+            if e.response.status_code != 404:
+                logger.error(f"HTTP error deleting factory note {note_id}: {e}")
+            else:
+                logger.warning(f"Factory note {note_id} already deleted (404).")
+        except Exception as e:
+            logger.error(f"Unexpected error deleting factory note {note_id}: {e}")
+
+
 @pytest.fixture
 async def temporary_note_with_attachment(
    nc_client: NextcloudClient, temporary_note: dict
@@ -0,0 +1,407 @@
+"""Integration tests for MCP sampling with semantic search.
+
+These tests validate the nc_semantic_search_answer tool which combines:
+1. Semantic search to retrieve relevant documents
+2. MCP sampling to generate natural language answers
+
+Tests cover three scenarios:
+- Successful sampling (LLM generates answer)
+- Sampling fallback (client doesn't support sampling)
+- No results (no relevant documents found)
+
+Note: These tests require VECTOR_SYNC_ENABLED=true and a configured
+vector database with indexed test data.
+"""
+
+from unittest.mock import MagicMock
+
+import pytest
+from mcp.types import CreateMessageResult, TextContent
+
+pytestmark = pytest.mark.integration
+
+
+@pytest.fixture
+def mock_sampling_result():
+    """Mock successful sampling result from MCP client."""
+    result = MagicMock(spec=CreateMessageResult)
+    result.content = TextContent(
+        type="text",
+        text=(
+            "Based on Document 1 (Python Async Programming) and Document 2 "
+            "(Best Practices), you should use async/await for asynchronous "
+            "programming and always use async context managers for resources."
+        ),
+    )
+    result.model = "claude-3-5-sonnet"
+    result.stopReason = "endTurn"
+    return result
+
+
+async def test_semantic_search_answer_successful_sampling(
+    nc_mcp_client, temporary_note_factory
+):
+    """Test semantic search with successful LLM answer generation.
+
+    Prerequisites:
+    - VECTOR_SYNC_ENABLED=true
+    - Qdrant running and indexed
+    - Test note indexed in vector database
+
+    Flow:
+    1. Create test note with searchable content
+    2. Wait for vector sync to complete using nc_get_vector_sync_status
+    3. Call nc_semantic_search_answer
+    4. Mock ctx.session.create_message to return answer
+    5. Verify response contains generated answer and sources
+    """
+    # Get initial indexed count before creating note
+    import asyncio
+
+    initial_sync = await nc_mcp_client.call_tool(
+        "nc_get_vector_sync_status", arguments={}
+    )
+    initial_indexed_count = initial_sync.structuredContent["indexed_count"]
+    print(f"Initial indexed count: {initial_indexed_count}")
+
+    # Create a note with content about Python async
+    _note = await temporary_note_factory(
+        title="Python Async Guide",
+        content="""# Python Async Programming
+
+## Key Concepts
+- Use async def for coroutines
+- Use await for async operations
+- asyncio.gather() for parallel execution
+
+## Best Practices
+Always use async context managers for resources.
+Avoid blocking operations in async code.""",
+        category="Development",
+    )
+    print(f"Created note ID: {_note['id']}")
+
+    # Wait for vector indexing to complete
+    max_wait = 30  # Maximum 30 seconds
+    wait_interval = 1  # Check every 1 second
+    waited = 0
+
+    while waited < max_wait:
+        sync_status = await nc_mcp_client.call_tool(
+            "nc_get_vector_sync_status", arguments={}
+        )
+        status_data = sync_status.structuredContent
+
+        print(
+            f"Sync status at {waited}s: indexed={status_data['indexed_count']}, pending={status_data['pending_count']}, status={status_data['status']}"
+        )
+
+        # Check if indexed count increased (new note was indexed)
+        if (
+            status_data["indexed_count"] > initial_indexed_count
+            and status_data["pending_count"] == 0
+        ):
+            # Sync complete and new document indexed
+            print(
+                f"✓ Sync complete: {status_data['indexed_count']} documents indexed (was {initial_indexed_count})"
+            )
+            break
+
+        await asyncio.sleep(wait_interval)
+        waited += wait_interval
+
+    # Verify sync completed
+    assert waited < max_wait, (
+        f"Vector sync did not complete within {max_wait} seconds. Last status: {status_data}"
+    )
+    assert status_data["indexed_count"] > initial_indexed_count, (
+        f"New note was not indexed (count stayed at {initial_indexed_count})"
+    )
+
+    # Mock the sampling call
+    # Note: This requires monkey-patching ctx.session.create_message
+    # In a real integration test with MCP Inspector, this would be actual sampling
+
+    call_result = await nc_mcp_client.call_tool(
+        "nc_semantic_search_answer",
+        arguments={
+            "query": "How do I use async in Python?",
+            "limit": 5,
+            "score_threshold": 0.0,  # Use 0.0 for SimpleEmbeddingProvider (feature hashing)
+        },
+    )
+
+    # Extract result from CallToolResult
+    assert call_result.isError is False, (
+        f"Tool call failed: {call_result.content[0].text if call_result.isError else ''}"
+    )
+    result = call_result.structuredContent
+
+    # Verify response structure
+    assert result is not None
+    assert "query" in result
+    assert "generated_answer" in result
+    assert "sources" in result
+    assert "total_found" in result
+    assert "search_method" in result
+
+    # For this test, sampling might fail (no real LLM client)
+    # So we check for either success or various fallback states
+    unsupported_methods = {
+        "semantic_sampling_unsupported",
+        "semantic_sampling_user_declined",
+        "semantic_sampling_timeout",
+        "semantic_sampling_mcp_error",
+        "semantic_sampling_fallback",
+    }
+
+    if result["search_method"] in unsupported_methods:
+        # Fallback/unsupported mode - should still have sources
+        assert len(result["sources"]) > 0
+        assert result["total_found"] > 0
+        pytest.skip(
+            f"Sampling not available (method: {result['search_method']}), "
+            f"but search results returned successfully"
+        )
+    else:
+        # Successful sampling
+        assert result["search_method"] == "semantic_sampling"
+        assert "async" in result["generated_answer"].lower()
+        assert len(result["sources"]) > 0
+        assert result["model_used"] is not None
+
+
+async def test_semantic_search_answer_no_results(nc_mcp_client):
+    """Test semantic search answer when no documents match.
+
+    Flow:
+    1. Query for completely unrelated topic
+    2. Verify response indicates no documents found
+    3. Verify no sampling call was made (no sources to base answer on)
+    """
+    call_result = await nc_mcp_client.call_tool(
+        "nc_semantic_search_answer",
+        arguments={
+            "query": "quantum chromodynamics lattice QCD gluon propagator",
+            "limit": 5,
+            "score_threshold": 0.7,  # Use high threshold to filter out unrelated documents
+        },
+    )
+
+    # Extract result from CallToolResult
+    assert call_result.isError is False, (
+        f"Tool call failed: {call_result.content[0].text if call_result.isError else ''}"
+    )
+    result = call_result.structuredContent
+
+    # Should get "no documents found" message
+    assert result is not None
+    assert result["total_found"] == 0
+    assert len(result["sources"]) == 0
+    assert "No relevant documents" in result["generated_answer"]
+    assert result["search_method"] == "semantic_sampling"
+    # No sampling should have occurred
+    assert result["model_used"] is None
+    assert result["stop_reason"] is None
+
+
+async def test_semantic_search_answer_with_limit(nc_mcp_client, temporary_note_factory):
+    """Test semantic search answer respects limit parameter.
+
+    Flow:
+    1. Create multiple related notes
+    2. Wait for vector sync to complete
+    3. Query with limit=2
+    4. Verify at most 2 sources in response
+    """
+    # Create multiple related notes
+    _note1 = await temporary_note_factory(
+        title="Python Async Part 1",
+        content="Use async/await for asynchronous operations",
+        category="Development",
+    )
+    _note2 = await temporary_note_factory(
+        title="Python Async Part 2",
+        content="Use asyncio.gather() for parallel execution",
+        category="Development",
+    )
+    _note3 = await temporary_note_factory(
+        title="Python Async Part 3",
+        content="Always use async context managers",
+        category="Development",
+    )
+
+    # Wait for vector indexing to complete
+    import asyncio
+
+    max_wait = 30
+    wait_interval = 1
+    waited = 0
+
+    while waited < max_wait:
+        sync_status = await nc_mcp_client.call_tool(
+            "nc_get_vector_sync_status", arguments={}
+        )
+        status_data = sync_status.structuredContent
+
+        if status_data["status"] == "idle" and status_data["pending_count"] == 0:
+            break
+
+        await asyncio.sleep(wait_interval)
+        waited += wait_interval
+
+    assert waited < max_wait, f"Vector sync did not complete within {max_wait} seconds"
+
+    call_result = await nc_mcp_client.call_tool(
+        "nc_semantic_search_answer",
+        arguments={
+            "query": "async programming in Python",
+            "limit": 2,
+            "score_threshold": 0.0,  # Use 0.0 for SimpleEmbeddingProvider (feature hashing)
+        },
+    )
+
+    # Extract result from CallToolResult
+    assert call_result.isError is False, (
+        f"Tool call failed: {call_result.content[0].text if call_result.isError else ''}"
+    )
+    result = call_result.structuredContent
+
+    # Should respect limit
+    assert len(result["sources"]) <= 2
+
+
+async def test_semantic_search_answer_score_threshold(
+    nc_mcp_client, temporary_note_factory
+):
+    """Test semantic search answer respects score threshold.
+
+    Flow:
+    1. Create note with specific content
+    2. Wait for vector sync to complete
+    3. Query with score_threshold=0.0 (scores depend on the embedding model)
+    4. Verify the parameter is accepted and each source carries a score
+    """
+    _note = await temporary_note_factory(
+        title="Exact Match Test",
+        content="This is a very specific test document about widget manufacturing",
+        category="Test",
+    )
+
+    # Wait for vector indexing to complete
+    import asyncio
+
+    max_wait = 30
+    wait_interval = 1
+    waited = 0
+
+    while waited < max_wait:
+        sync_status = await nc_mcp_client.call_tool(
+            "nc_get_vector_sync_status", arguments={}
+        )
+        status_data = sync_status.structuredContent
+
+        if status_data["status"] == "idle" and status_data["pending_count"] == 0:
+            break
+
+        await asyncio.sleep(wait_interval)
+        waited += wait_interval
+
+    assert waited < max_wait, f"Vector sync did not complete within {max_wait} seconds"
+
+    # Query with exact match
+    call_result = await nc_mcp_client.call_tool(
+        "nc_semantic_search_answer",
+        arguments={
+            "query": "widget manufacturing",
+            "limit": 5,
+            "score_threshold": 0.0,  # Use 0.0 for SimpleEmbeddingProvider (feature hashing)
+        },
+    )
+
+    # Extract result from CallToolResult
+    assert call_result.isError is False, (
+        f"Tool call failed: {call_result.content[0].text if call_result.isError else ''}"
+    )
+    result = call_result.structuredContent
+
+    # Note: Semantic search scores depend on embedding model
+    # We just verify the tool accepts the parameter
+    assert "score_threshold" not in result  # Not exposed in response
+    if result["total_found"] > 0:
+        # If results found, verify they're in sources
+        assert all("score" in source for source in result["sources"])
+
+
+async def test_semantic_search_answer_max_tokens(nc_mcp_client, temporary_note_factory):
+    """Test semantic search answer respects max_answer_tokens parameter.
+
+    Flow:
+    1. Create note with content
+    2. Wait for vector sync to complete
+    3. Call with a small max_answer_tokens (100)
+    4. Verify parameter is accepted (actual token limiting happens in client)
+
+    Note: Token limiting is enforced by the MCP client's LLM, not the server.
+    This test just verifies the parameter is correctly passed.
+    """
+    _note = await temporary_note_factory(
+        title="Long Document",
+        content="This is a document with lots of content. " * 50,
+        category="Test",
+    )
+
+    # Wait for vector indexing to complete
+    import asyncio
+
+    max_wait = 30
+    wait_interval = 1
+    waited = 0
+
+    while waited < max_wait:
+        sync_status = await nc_mcp_client.call_tool(
+            "nc_get_vector_sync_status", arguments={}
+        )
+        status_data = sync_status.structuredContent
+
+        if status_data["status"] == "idle" and status_data["pending_count"] == 0:
+            break
+
+        await asyncio.sleep(wait_interval)
+        waited += wait_interval
+
+    assert waited < max_wait, f"Vector sync did not complete within {max_wait} seconds"
+
+    call_result = await nc_mcp_client.call_tool(
+        "nc_semantic_search_answer",
+        arguments={
+            "query": "document content",
+            "limit": 5,
+            "score_threshold": 0.0,  # Use 0.0 for SimpleEmbeddingProvider (feature hashing)
+            "max_answer_tokens": 100,
+        },
+    )
+
+    # Extract result from CallToolResult
+    assert call_result.isError is False, (
+        f"Tool call failed: {call_result.content[0].text if call_result.isError else ''}"
+    )
+    result = call_result.structuredContent
+
+    # Should not error, even if sampling fails
+    assert result is not None
+    assert "generated_answer" in result
+
+
+async def test_semantic_search_answer_requires_vector_sync():
+    """Test that semantic search answer fails when VECTOR_SYNC_ENABLED=false.
+
+    This test validates the tool properly checks for vector sync being enabled.
+
+    Note: This test requires a separate test client with VECTOR_SYNC_ENABLED=false,
+    which may not be available in the current test environment. Skipping for now.
+    """
+    pytest.skip(
+        "Requires test environment with VECTOR_SYNC_ENABLED=false, "
+        "which would break other semantic search tests"
+    )
@@ -0,0 +1,432 @@
+"""Integration tests for semantic search with vector database.
+
+These tests validate the complete semantic search flow:
+1. Initialize Qdrant collection with simple in-process embeddings
+2. Index sample notes into vector database
+3. Perform semantic search queries
+4. Verify relevant results are returned
+
+Uses SimpleEmbeddingProvider for deterministic, in-process embeddings
+without requiring external services like Ollama.
+"""
+
+import tempfile
+from pathlib import Path
+
+import pytest
+from qdrant_client import AsyncQdrantClient
+from qdrant_client.models import Distance, PointStruct, VectorParams
+
+from nextcloud_mcp_server.embedding import SimpleEmbeddingProvider
+
+pytestmark = pytest.mark.integration
+
+
+@pytest.fixture
+async def simple_embedding_provider():
+    """Simple in-process embedding provider for testing."""
+    return SimpleEmbeddingProvider(dimension=384)
+
+
+@pytest.fixture
+async def qdrant_test_client():
+    """Qdrant client for testing (in-memory)."""
+    client = AsyncQdrantClient(":memory:")
+    yield client
+    await client.close()
+
+
+@pytest.fixture
+async def test_collection(qdrant_test_client: AsyncQdrantClient):
+    """Create test collection in Qdrant."""
+    collection_name = "test_semantic_search"
+
+    # Create collection
+    await qdrant_test_client.create_collection(
+        collection_name=collection_name,
+        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
+    )
+
+    yield collection_name
+
+    # Cleanup
+    try:
+        await qdrant_test_client.delete_collection(collection_name)
+    except Exception:
+        pass
+
+
+@pytest.fixture
+def sample_notes():
+    """Sample notes for testing semantic search."""
+    return [
+        {
+            "id": 1,
+            "title": "Python Async Programming",
+            "content": """# Python Async/Await Patterns
+
+## Key Concepts
+- Use async def for coroutines
+- Use await for async operations
+- asyncio.gather() for parallel execution
+
+## Best Practices
+Always use async context managers for resources.
+Avoid blocking operations in async code.""",
+            "category": "Development",
+        },
+        {
+            "id": 2,
+            "title": "Book Recommendations 2025",
+            "content": """# Books to Read
+
+## Fiction
+- The Midnight Library by Matt Haig
+- Project Hail Mary by Andy Weir
+
+## Non-Fiction
+- Atomic Habits by James Clear
+- Deep Work by Cal Newport
+
+## Technical
+- Designing Data-Intensive Applications by Martin Kleppmann""",
+            "category": "Personal",
+        },
+        {
+            "id": 3,
+            "title": "Chocolate Chip Cookie Recipe",
+            "content": """# Classic Cookies
+
+## Ingredients
+- 2 cups flour
+- 1 cup butter
+- 1 cup sugar
+- 2 eggs
+- 2 cups chocolate chips
+
+## Instructions
+1. Preheat oven to 375°F
+2. Mix butter and sugar
+3. Add eggs and vanilla
+4. Mix in flour
+5. Fold in chocolate chips
+6. Bake 10-12 minutes""",
+            "category": "Recipes",
+        },
+        {
+            "id": 4,
+            "title": "Team Meeting Notes",
+            "content": """# Q1 Planning Meeting
+
+## Attendees
+- Alice, Bob, Charlie
+
+## Discussion
+- Review Q4 deliverables
+- Plan Q1 sprints
+- Resource allocation
+
+## Action Items
+- Alice: Draft timeline
+- Bob: Infrastructure review""",
+            "category": "Work",
+        },
+    ]
+
+
+async def test_simple_embedding_provider_deterministic(simple_embedding_provider):
+    """Test that SimpleEmbeddingProvider generates deterministic embeddings."""
+    text = "Hello world this is a test"
+
+    # Generate embedding twice
+    embedding1 = await simple_embedding_provider.embed(text)
+    embedding2 = await simple_embedding_provider.embed(text)
+
+    # Should be identical
+    assert embedding1 == embedding2
+    assert len(embedding1) == 384
+
+    # Should be normalized (unit length)
+    import math
+
+    norm = math.sqrt(sum(x * x for x in embedding1))
+    assert abs(norm - 1.0) < 1e-6
+
+
+async def test_simple_embedding_provider_similarity(simple_embedding_provider):
+    """Test that similar texts have higher cosine similarity."""
+
+    async def cosine_similarity(text1: str, text2: str) -> float:
+        emb1 = await simple_embedding_provider.embed(text1)
+        emb2 = await simple_embedding_provider.embed(text2)
+        return sum(a * b for a, b in zip(emb1, emb2))
+
+    # Similar texts
+    python_text1 = "Python async programming with asyncio"
+    python_text2 = "Using async and await in Python"
+    unrelated_text = "Chocolate chip cookie recipe"
+
+    # Similar texts should have higher similarity
+    similar_score = await cosine_similarity(python_text1, python_text2)
+    unrelated_score = await cosine_similarity(python_text1, unrelated_text)
+
+    assert similar_score > unrelated_score
+    assert similar_score > 0.3  # Some semantic overlap
+    assert unrelated_score < similar_score
+
+
+async def test_semantic_search_with_qdrant(
+    qdrant_test_client: AsyncQdrantClient,
+    test_collection: str,
+    simple_embedding_provider: SimpleEmbeddingProvider,
+    sample_notes: list[dict],
+):
+    """Test full semantic search flow with Qdrant."""
+
+    # Index all sample notes
+    points = []
+    for note in sample_notes:
+        content = f"{note['title']}\n\n{note['content']}"
+        embedding = await simple_embedding_provider.embed(content)
+
+        points.append(
+            PointStruct(
+                id=note["id"],  # Use integer ID for in-memory Qdrant
+                vector=embedding,
+                payload={
+                    "note_id": note["id"],
+                    "title": note["title"],
+                    "category": note["category"],
+                    "excerpt": content[:200],
+                },
+            )
+        )
+
+    await qdrant_test_client.upsert(
+        collection_name=test_collection, points=points, wait=True
+    )
+
+    # Test Query 1: Search for Python programming
+    query = "async programming patterns in Python"
+    query_embedding = await simple_embedding_provider.embed(query)
+
+    response = await qdrant_test_client.query_points(
+        collection_name=test_collection,
+        query=query_embedding,
+        limit=3,
+        score_threshold=0.0,
+    )
+
+    # Should find Python note as top result
+    assert len(response.points) > 0
+    assert response.points[0].payload["note_id"] == 1
+    assert "Python" in response.points[0].payload["title"]
+
+    # Test Query 2: Search for books
+    query = "good books to read recommendations"
+    query_embedding = await simple_embedding_provider.embed(query)
+
+    response = await qdrant_test_client.query_points(
+        collection_name=test_collection,
+        query=query_embedding,
+        limit=3,
+        score_threshold=0.0,
+    )
+
+    # Should find book recommendations note
+    assert len(response.points) > 0
+    top_result = response.points[0]
+    assert top_result.payload["note_id"] == 2
+    assert "Book" in top_result.payload["title"]
+
+    # Test Query 3: Search for recipes
+    query = "how to bake cookies dessert"
+    query_embedding = await simple_embedding_provider.embed(query)
+
+    response = await qdrant_test_client.query_points(
+        collection_name=test_collection,
+        query=query_embedding,
+        limit=3,
+        score_threshold=0.0,
+    )
+
+    # Should find recipe note
+    assert len(response.points) > 0
+    # Recipe should be in top 2 results
+    top_note_ids = [r.payload["note_id"] for r in response.points[:2]]
+    assert 3 in top_note_ids
+
+
+async def test_semantic_search_with_filters(
+    qdrant_test_client: AsyncQdrantClient,
+    test_collection: str,
+    simple_embedding_provider: SimpleEmbeddingProvider,
+    sample_notes: list[dict],
+):
+    """Test semantic search with category filtering."""
+    from qdrant_client.models import FieldCondition, Filter, MatchValue
+
+    # Index notes
+    points = []
+    for note in sample_notes:
+        content = f"{note['title']}\n\n{note['content']}"
+        embedding = await simple_embedding_provider.embed(content)
+
+        points.append(
+            PointStruct(
+                id=note["id"],  # Use integer ID for in-memory Qdrant
+                vector=embedding,
+                payload={
+                    "note_id": note["id"],
+                    "title": note["title"],
+                    "category": note["category"],
+                },
+            )
+        )
+
+    await qdrant_test_client.upsert(
+        collection_name=test_collection, points=points, wait=True
+    )
+
+    # Search only in "Personal" category
+    query = "books reading"
+    query_embedding = await simple_embedding_provider.embed(query)
+
+    response = await qdrant_test_client.query_points(
+        collection_name=test_collection,
+        query=query_embedding,
+        query_filter=Filter(
+            must=[FieldCondition(key="category", match=MatchValue(value="Personal"))]
+        ),
+        limit=3,
+    )
+
+    # Should only return Personal category notes
+    assert len(response.points) > 0
+    for result in response.points:
+        assert result.payload["category"] == "Personal"
+
+
+async def test_semantic_search_empty_results(
+    qdrant_test_client: AsyncQdrantClient,
+    test_collection: str,
+    simple_embedding_provider: SimpleEmbeddingProvider,
+):
+    """Test semantic search with no indexed content returns empty results."""
+
+    query = "test query"
+    query_embedding = await simple_embedding_provider.embed(query)
+
+    response = await qdrant_test_client.query_points(
+        collection_name=test_collection,
+        query=query_embedding,
+        limit=10,
+    )
+
+    assert len(response.points) == 0
+
+
+async def test_batch_embedding(simple_embedding_provider: SimpleEmbeddingProvider):
+    """Test batch embedding generation."""
+    texts = [
+        "First document about Python",
+        "Second document about JavaScript",
+        "Third document about TypeScript",
+    ]
+
+    embeddings = await simple_embedding_provider.embed_batch(texts)
+
+    assert len(embeddings) == 3
+    assert all(len(emb) == 384 for emb in embeddings)
+
+    # Each should be normalized
+    import math
+
+    for emb in embeddings:
+        norm = math.sqrt(sum(x * x for x in emb))
+        assert abs(norm - 1.0) < 1e-6
+
+
+async def test_qdrant_persistent_mode(
+    simple_embedding_provider: SimpleEmbeddingProvider,
+    sample_notes: list[dict],
+):
+    """Test Qdrant in persistent local mode with file storage."""
+
+    with tempfile.TemporaryDirectory() as tmpdir:
+        storage_path = Path(tmpdir) / "qdrant_data"
+
+        # Create first client with persistent storage using path parameter
+        client1 = AsyncQdrantClient(path=str(storage_path))
+
+        try:
+            collection_name = "test_persistent"
+
+            # Create collection and index notes
+            await client1.create_collection(
+                collection_name=collection_name,
+                vectors_config=VectorParams(size=384, distance=Distance.COSINE),
+            )
+
+            # Index sample notes
+            points = []
+            for note in sample_notes:
+                content = f"{note['title']}\n\n{note['content']}"
+                embedding = await simple_embedding_provider.embed(content)
+
+                points.append(
+                    PointStruct(
+                        id=note["id"],
+                        vector=embedding,
+                        payload={
+                            "note_id": note["id"],
+                            "title": note["title"],
+                            "category": note["category"],
+                        },
+                    )
+                )
+
+            await client1.upsert(
+                collection_name=collection_name, points=points, wait=True
+            )
+
+            # Verify data was written
+            count_result = await client1.count(collection_name=collection_name)
+            assert count_result.count == len(sample_notes)
+
+            # Close first client
+            await client1.close()
+
+            # Create new client with same storage path
+            client2 = AsyncQdrantClient(path=str(storage_path))
+
+            try:
+                # Data should persist - verify collection exists
+                collections = await client2.get_collections()
+                collection_names = [c.name for c in collections.collections]
+                assert collection_name in collection_names
+
+                # Verify indexed data persisted
+                count_result = await client2.count(collection_name=collection_name)
+                assert count_result.count == len(sample_notes)
+
+                # Verify search still works
+                query = "Python programming"
+                query_embedding = await simple_embedding_provider.embed(query)
+
+                response = await client2.query_points(
+                    collection_name=collection_name,
+                    query=query_embedding,
+                    limit=3,
+                )
+
+                # Should find Python note as top result
+                assert len(response.points) > 0
+                assert response.points[0].payload["note_id"] == 1
+
+            finally:
+                await client2.close()
+
+        finally:
+            # Cleanup
+            await client1.close()
@@ -0,0 +1,630 @@
+"""
+Tests for Dynamic Client Registration (DCR) with Keycloak external IdP.
+
+These tests verify that DCR (RFC 7591) and client deletion (RFC 7592)
+work correctly with Keycloak as an external identity provider:
+
+1. Client registration via Keycloak's DCR endpoint
+2. Token acquisition with dynamically registered client
+3. MCP tool execution with Keycloak-issued tokens
+4. Client deletion via RFC 7592
+5. Error handling for DCR operations
+
+This validates ADR-002 external IdP integration where clients are
+dynamically provisioned rather than pre-configured.
+
+Architecture:
+    MCP Client → Keycloak DCR → Keycloak OAuth → MCP Server → Nextcloud APIs
+"""
+
+import logging
+import os
+import secrets
+import time
+from urllib.parse import quote
+
+import anyio
+import httpx
+import pytest
+
+from nextcloud_mcp_server.auth.client_registration import delete_client, register_client
+
+logger = logging.getLogger(__name__)
+
+pytestmark = [pytest.mark.integration, pytest.mark.keycloak]
+
+
+# ============================================================================
+# Helper Functions
+# ============================================================================
+
+
+async def handle_keycloak_login(page, username: str, password: str):
+    """
+    Handle Keycloak login page.
+
+    Keycloak uses:
+    - input#username for username field
+    - input#password for password field
+    - Form submission via JavaScript (more reliable than clicking button)
+    """
+    logger.info(f"Handling Keycloak login for user: {username}")
+    logger.info(f"Current URL before login: {page.url}")
+
+    # Wait for username field and fill it
+    await page.wait_for_selector("input#username", timeout=10000)
+    await page.fill("input#username", username)
+
+    # Fill password field
+    await page.wait_for_selector("input#password", timeout=10000)
+    await page.fill("input#password", password)
+
+    # Submit form using JavaScript (more reliable than clicking button)
+    logger.info("Submitting Keycloak login form...")
+    async with page.expect_navigation(timeout=60000):
+        await page.evaluate("document.querySelector('form').submit()")
+
+    logger.info(f"✓ Keycloak login completed, redirected to: {page.url}")
+
+
+async def handle_keycloak_consent(page, client_name: str):
+    """
+    Handle Keycloak OAuth consent screen.
+
+    Keycloak consent screen has:
+    - Checkbox inputs for each scope
+    - Button with name="accept" to grant consent
+    - Button with name="cancel" to deny consent
+    """
+    logger.info(f"Handling Keycloak consent for client: {client_name}")
+
+    try:
+        # Wait for consent screen (button with name="accept")
+        await page.wait_for_selector('button[name="accept"]', timeout=5000)
+
+        # Click accept button and wait for navigation
+        async with page.expect_navigation(timeout=60000):
+            await page.click('button[name="accept"]')
+
+        logger.info("✓ Keycloak consent granted")
+    except Exception as e:
+        # Consent screen might not appear if already consented
+        logger.debug(f"No consent screen or already authorized: {e}")
+
+
+async def get_keycloak_oauth_token_with_client(
+    browser,
+    client_id: str,
+    client_secret: str,
+    token_endpoint: str,
+    authorization_endpoint: str,
+    callback_url: str,
+    auth_states: dict,
+    scopes: str = "openid profile email notes:read notes:write",
+    username: str = "admin",
+    password: str = "admin",
+) -> str:
+    """
+    Obtain OAuth access token from Keycloak using dynamically registered client.
+
+    Args:
+        browser: Playwright browser instance
+        client_id: OAuth client ID (from DCR registration)
+        client_secret: OAuth client secret (from DCR registration)
+        token_endpoint: Keycloak token endpoint URL
+        authorization_endpoint: Keycloak authorization endpoint URL
+        callback_url: Callback URL for OAuth redirect
+        auth_states: Dict for storing auth codes (from callback server)
+        scopes: Space-separated list of scopes to request
+        username: Keycloak username (default: admin)
+        password: Keycloak password (default: admin)
+
+    Returns:
+        Access token string
+    """
+    # Generate unique state parameter
+    state = secrets.token_urlsafe(32)
+
+    # URL-encode scopes
+    scopes_encoded = quote(scopes, safe="")
+
+    # Construct authorization URL
+    auth_url = (
+        f"{authorization_endpoint}?"
+        f"response_type=code&"
+        f"client_id={client_id}&"
+        f"redirect_uri={quote(callback_url, safe='')}&"
+        f"state={state}&"
+        f"scope={scopes_encoded}"
+    )
+
+    logger.info("Starting OAuth flow with Keycloak...")
+    logger.info(f"Authorization URL: {auth_url[:100]}...")
+
+    # Browser automation
+    context = await browser.new_context(ignore_https_errors=True)
+    page = await context.new_page()
+
+    try:
+        await page.goto(auth_url, wait_until="networkidle", timeout=60000)
+        current_url = page.url
+        logger.info(f"Current URL after navigation: {current_url[:100]}...")
+
+        # Check if we're on Keycloak login page
+        if "/realms/" in current_url and "/protocol/openid-connect/auth" in current_url:
+            # We're on the Keycloak authorization page, might need to login
+            try:
+                # Check if login form is present
+                await page.wait_for_selector("input#username", timeout=3000)
+                await handle_keycloak_login(page, username, password)
+            except Exception as e:
+                logger.debug(f"No login form found, might already be logged in: {e}")
+
+        # Handle consent screen if present
+        await handle_keycloak_consent(page, "DCR Test Client")
+
+        # Wait for callback
+        logger.info("Waiting for OAuth callback...")
+        timeout_seconds = 30
+        start_time = time.monotonic()
+        while state not in auth_states:
+            if time.monotonic() - start_time > timeout_seconds:
+                raise TimeoutError(
+                    f"Timeout waiting for OAuth callback (state={state[:16]}...)"
+                )
+            await anyio.sleep(0.5)
+
+        auth_code = auth_states[state]
+        logger.info(f"Got auth code: {auth_code[:20]}...")
+
+    finally:
+        await context.close()
+
+    # Exchange code for token
+    logger.info("Exchanging authorization code for access token...")
+    async with httpx.AsyncClient(timeout=30.0) as http_client:
+        token_response = await http_client.post(
+            token_endpoint,
+            data={
+                "grant_type": "authorization_code",
+                "code": auth_code,
+                "redirect_uri": callback_url,
+                "client_id": client_id,
+                "client_secret": client_secret,
+            },
+        )
+
+        token_response.raise_for_status()
+        token_data = token_response.json()
+        access_token = token_data.get("access_token")
+
+        if not access_token:
+            raise ValueError(f"No access_token in response: {token_data}")
+
+        logger.info("Successfully obtained access token from Keycloak")
+        return access_token
+
+
+# ============================================================================
+# DCR Registration Tests
+# ============================================================================
+
+
+@pytest.mark.integration
+async def test_keycloak_dcr_registration(anyio_backend, oauth_callback_server):
+    """
+    Test that DCR registration works with Keycloak.
+
+    Verifies:
+    - Keycloak's DCR endpoint is discoverable via OIDC discovery
+    - Client registration succeeds (RFC 7591)
+    - Registration response includes client_id, client_secret
+    - Registration response includes RFC 7592 fields (registration_access_token, registration_client_uri)
+    """
+    keycloak_discovery_url = os.getenv(
+        "OIDC_DISCOVERY_URL",
+        "http://localhost:8888/realms/nextcloud-mcp/.well-known/openid-configuration",
+    )
+
+    auth_states, callback_url = oauth_callback_server
+
+    # OIDC Discovery
+    logger.info("Discovering Keycloak OIDC endpoints...")
+    async with httpx.AsyncClient(timeout=30.0) as client:
+        discovery_response = await client.get(keycloak_discovery_url)
+        discovery_response.raise_for_status()
+        oidc_config = discovery_response.json()
+
+        registration_endpoint = oidc_config.get("registration_endpoint")
+
+        if not registration_endpoint:
+            pytest.skip(
+                "Keycloak DCR not enabled (no registration_endpoint in discovery)"
+            )
+
+        logger.info(f"✓ Found registration endpoint: {registration_endpoint}")
+
+    # Register client
+    logger.info("Registering OAuth client via Keycloak DCR...")
+    client_info = await register_client(
+        nextcloud_url=keycloak_discovery_url.replace(
+            "/.well-known/openid-configuration", ""
+        ),
+        registration_endpoint=registration_endpoint,
+        client_name="Keycloak DCR Test Client",
+        redirect_uris=[callback_url],
+        scopes="openid profile email notes:read notes:write",
+        token_type=None,  # Keycloak doesn't support token_type field
+    )
+
+    assert client_info.client_id, "Registration should return client_id"
+    assert client_info.client_secret, "Registration should return client_secret"
+    logger.info(f"✓ Client registered: {client_info.client_id[:16]}...")
+
+    # Verify RFC 7592 fields are present
+    assert client_info.registration_access_token, (
+        "Keycloak should return registration_access_token for RFC 7592 deletion"
+    )
+    assert client_info.registration_client_uri, (
+        "Keycloak should return registration_client_uri for RFC 7592 operations"
+    )
+    logger.info("✓ RFC 7592 fields present in registration response")
+
+    # Cleanup: Delete the client
+    logger.info("Cleaning up: deleting test client...")
+    keycloak_host = keycloak_discovery_url.replace(
+        "/.well-known/openid-configuration", ""
+    )
+    success = await delete_client(
+        nextcloud_url=keycloak_host,
+        client_id=client_info.client_id,
+        registration_access_token=client_info.registration_access_token,
+        client_secret=client_info.client_secret,
+        registration_client_uri=client_info.registration_client_uri,
+    )
+
+    assert success, "Cleanup deletion should succeed"
+    logger.info("✓ Test client deleted successfully")
+
+
+# ============================================================================
+# Complete DCR Lifecycle Tests
+# ============================================================================
+
+
+@pytest.mark.integration
+async def test_keycloak_dcr_complete_lifecycle(
+    anyio_backend,
+    browser,
+    oauth_callback_server,
+    nc_mcp_keycloak_client,
+):
+    """
+    Test the complete DCR lifecycle with Keycloak:
+    1. Register client via DCR (RFC 7591)
+    2. Obtain OAuth token with registered client
+    3. Use token to access MCP tools
+    4. Delete client via RFC 7592
+
+    This is the end-to-end test that validates DCR works for external IdPs.
+    """
+    keycloak_discovery_url = os.getenv(
+        "OIDC_DISCOVERY_URL",
+        "http://localhost:8888/realms/nextcloud-mcp/.well-known/openid-configuration",
+    )
+
+    auth_states, callback_url = oauth_callback_server
+
+    # Step 1: OIDC Discovery
+    logger.info("Step 1: Discovering Keycloak OIDC endpoints...")
+    async with httpx.AsyncClient(timeout=30.0) as client:
+        discovery_response = await client.get(keycloak_discovery_url)
+        discovery_response.raise_for_status()
+        oidc_config = discovery_response.json()
+
+        registration_endpoint = oidc_config.get("registration_endpoint")
+        token_endpoint = oidc_config.get("token_endpoint")
+        authorization_endpoint = oidc_config.get("authorization_endpoint")
+
+        if not registration_endpoint:
+            pytest.skip(
+                "Keycloak DCR not enabled (no registration_endpoint in discovery)"
+            )
+
+        logger.info(f"✓ Registration endpoint: {registration_endpoint}")
+        logger.info(f"✓ Token endpoint: {token_endpoint}")
+        logger.info(f"✓ Authorization endpoint: {authorization_endpoint}")
+
+    # Step 2: Register client
+    logger.info("Step 2: Registering OAuth client via Keycloak DCR...")
+    keycloak_host = keycloak_discovery_url.replace(
+        "/.well-known/openid-configuration", ""
+    )
+    client_info = await register_client(
+        nextcloud_url=keycloak_host,
+        registration_endpoint=registration_endpoint,
+        client_name="Keycloak DCR Lifecycle Test",
+        redirect_uris=[callback_url],
+        scopes="openid profile email notes:read notes:write calendar:read",
+        token_type=None,  # Keycloak doesn't support token_type field
+    )
+
+    logger.info(f"✓ Client registered: {client_info.client_id[:16]}...")
+    logger.info(f"  Client secret: {client_info.client_secret[:16]}...")
+    logger.info(
+        f"  Registration token: {client_info.registration_access_token[:16]}..."
+    )
+
+    # Step 3: Obtain OAuth token
+    logger.info("Step 3: Obtaining OAuth token with registered client...")
+    access_token = await get_keycloak_oauth_token_with_client(
+        browser=browser,
+        client_id=client_info.client_id,
+        client_secret=client_info.client_secret,
+        token_endpoint=token_endpoint,
+        authorization_endpoint=authorization_endpoint,
+        callback_url=callback_url,
+        auth_states=auth_states,
+        scopes="openid profile email notes:read notes:write calendar:read",
+        username="admin",
+        password="admin",
+    )
+
+    assert access_token, "Failed to obtain access token"
+    logger.info(f"✓ Access token obtained: {access_token[:30]}...")
+
+    # Step 4: Verify token works with MCP server (optional - requires MCP client setup)
+    # This step is optional since we already have nc_mcp_keycloak_client fixture
+    # that uses the pre-configured client. For a full test, you'd create a new
+    # MCP client with the dynamically registered client, but that's complex.
+    logger.info("Step 4: Skipping MCP tool check (covered by other tests)")
+
+    # Step 5: Delete client
+    logger.info("Step 5: Deleting OAuth client via RFC 7592...")
+    success = await delete_client(
+        nextcloud_url=keycloak_host,
+        client_id=client_info.client_id,
+        registration_access_token=client_info.registration_access_token,
+        client_secret=client_info.client_secret,
+        registration_client_uri=client_info.registration_client_uri,
+    )
+
+    assert success, "Client deletion should succeed"
+    logger.info(f"✓ Client deleted successfully: {client_info.client_id[:16]}...")
+
+    # Step 6: Verify deleted client cannot be used
+    logger.info("Step 6: Verifying deleted client cannot obtain new tokens...")
+    async with httpx.AsyncClient(timeout=30.0) as http_client:
+        try:
+            # Try to use client credentials grant (should fail)
+            token_response = await http_client.post(
+                token_endpoint,
+                data={
+                    "grant_type": "client_credentials",
+                    "client_id": client_info.client_id,
+                    "client_secret": client_info.client_secret,
+                },
+            )
+
+            # Accept 400 or 401 as valid rejection
+            if token_response.status_code in [400, 401]:
+                logger.info(
+                    f"✓ Deleted client correctly rejected ({token_response.status_code})"
+                )
+            else:
+                pytest.fail(
+                    f"Deleted client should not be able to obtain tokens, "
+                    f"but got status {token_response.status_code}"
+                )
+
+        except httpx.HTTPStatusError as e:
+            if e.response.status_code in [400, 401]:
+                logger.info("✓ Deleted client correctly rejected")
+            else:
+                raise
+
+    logger.info("✅ Complete Keycloak DCR lifecycle test passed!")
+
+
+# ============================================================================
+# Error Handling Tests
+# ============================================================================
+
+
+@pytest.mark.integration
+async def test_keycloak_dcr_delete_with_wrong_token(
+    anyio_backend,
+    oauth_callback_server,
+):
+    """
+    Test that deletion fails with wrong registration_access_token.
+
+    Verifies:
+    1. Client registration succeeds
+    2. Deletion with wrong registration_access_token fails
+    3. Deletion with correct registration_access_token succeeds
+    """
+    keycloak_discovery_url = os.getenv(
+        "OIDC_DISCOVERY_URL",
+        "http://localhost:8888/realms/nextcloud-mcp/.well-known/openid-configuration",
+    )
+
+    auth_states, callback_url = oauth_callback_server
+
+    # OIDC Discovery
+    async with httpx.AsyncClient(timeout=30.0) as client:
+        discovery_response = await client.get(keycloak_discovery_url)
+        discovery_response.raise_for_status()
+        oidc_config = discovery_response.json()
+
+        registration_endpoint = oidc_config.get("registration_endpoint")
+
+        if not registration_endpoint:
+            pytest.skip("Keycloak DCR not enabled")
+
+    # Register client
+    logger.info("Registering OAuth client for wrong token test...")
+    keycloak_host = keycloak_discovery_url.replace(
+        "/.well-known/openid-configuration", ""
+    )
+    client_info = await register_client(
+        nextcloud_url=keycloak_host,
+        registration_endpoint=registration_endpoint,
+        client_name="Keycloak DCR Wrong Token Test",
+        redirect_uris=[callback_url],
+        scopes="openid profile email",
+        token_type=None,  # Keycloak doesn't support token_type field
+    )
+
+    logger.info(f"Client registered: {client_info.client_id[:16]}...")
+
+    # Try to delete with wrong registration_access_token
+    logger.info("Attempting deletion with wrong registration_access_token...")
+    wrong_token = "wrong_token_" + secrets.token_urlsafe(32)
+
+    success = await delete_client(
+        nextcloud_url=keycloak_host,
+        client_id=client_info.client_id,
+        registration_access_token=wrong_token,
+        client_secret=client_info.client_secret,
+        registration_client_uri=client_info.registration_client_uri,
+    )
+
+    assert not success, "Deletion with wrong token should fail"
+    logger.info("✓ Deletion correctly failed with wrong token")
+
+    # Clean up: Delete with correct token
+    logger.info("Cleaning up: deleting with correct registration_access_token...")
+    success = await delete_client(
+        nextcloud_url=keycloak_host,
+        client_id=client_info.client_id,
+        registration_access_token=client_info.registration_access_token,
+        client_secret=client_info.client_secret,
+        registration_client_uri=client_info.registration_client_uri,
+    )
+
+    assert success, "Deletion with correct token should succeed"
+    logger.info("✓ Cleanup successful")
+
+
+@pytest.mark.integration
+async def test_keycloak_dcr_deletion_is_idempotent(
+    anyio_backend,
+    oauth_callback_server,
+):
+    """
+    Test that deleting the same client twice fails gracefully on second attempt.
+
+    Verifies:
+    1. First deletion succeeds
+    2. Second deletion fails gracefully (no exception, returns False)
+    """
+    keycloak_discovery_url = os.getenv(
+        "OIDC_DISCOVERY_URL",
+        "http://localhost:8888/realms/nextcloud-mcp/.well-known/openid-configuration",
+    )
+
+    auth_states, callback_url = oauth_callback_server
+
+    # OIDC Discovery
+    async with httpx.AsyncClient(timeout=30.0) as client:
+        discovery_response = await client.get(keycloak_discovery_url)
+        discovery_response.raise_for_status()
+        oidc_config = discovery_response.json()
+
+        registration_endpoint = oidc_config.get("registration_endpoint")
+
+        if not registration_endpoint:
+            pytest.skip("Keycloak DCR not enabled")
+
+    # Register client
+    logger.info("Registering OAuth client for idempotency test...")
+    keycloak_host = keycloak_discovery_url.replace(
+        "/.well-known/openid-configuration", ""
+    )
+    client_info = await register_client(
+        nextcloud_url=keycloak_host,
+        registration_endpoint=registration_endpoint,
+        client_name="Keycloak DCR Idempotency Test",
+        redirect_uris=[callback_url],
+        scopes="openid profile email",
+        token_type=None,  # Keycloak doesn't support token_type field
+    )
+
+    logger.info(f"Client registered: {client_info.client_id[:16]}...")
+
+    # First deletion
+    logger.info("First deletion attempt...")
+    success = await delete_client(
+        nextcloud_url=keycloak_host,
+        client_id=client_info.client_id,
+        registration_access_token=client_info.registration_access_token,
+        client_secret=client_info.client_secret,
+        registration_client_uri=client_info.registration_client_uri,
+    )
+
+    assert success, "First deletion should succeed"
+    logger.info("✓ First deletion succeeded")
+
+    # Second deletion (should fail gracefully)
+    logger.info("Second deletion attempt (should fail)...")
+    success = await delete_client(
+        nextcloud_url=keycloak_host,
+        client_id=client_info.client_id,
+        registration_access_token=client_info.registration_access_token,
+        client_secret=client_info.client_secret,
+        registration_client_uri=client_info.registration_client_uri,
+    )
+
+    assert not success, "Second deletion should fail (client already deleted)"
+    logger.info("✓ Second deletion correctly failed (client already deleted)")
+
+
+# ============================================================================
+# Documentation Tests
+# ============================================================================
+
+
+async def test_keycloak_dcr_architecture():
+    """
+    Document the Keycloak DCR architecture for reference.
+
+    This test captures the design and flow for DCR with external IdPs.
+    """
+    architecture = {
+        "flow": [
+            "1. MCP client discovers Keycloak OIDC endpoints via .well-known/openid-configuration",
+            "2. MCP client registers via Keycloak DCR endpoint (RFC 7591)",
+            "3. Keycloak returns client_id, client_secret, registration_access_token",
+            "4. MCP client uses credentials to obtain OAuth token",
+            "5. MCP client uses token to authenticate with MCP server",
+            "6. MCP server validates token via Nextcloud user_oidc app",
+            "7. When done, MCP client deletes registration via RFC 7592",
+        ],
+        "components": {
+            "keycloak_dcr": "Dynamic Client Registration endpoint (RFC 7591)",
+            "keycloak_oauth": "OAuth/OIDC provider for authentication",
+            "mcp_server": "MCP server with external IdP config",
+            "nextcloud": "API server with user_oidc app for token validation",
+        },
+        "advantages": [
+            "No manual client pre-configuration required",
+            "Clients can self-register and self-cleanup",
+            "Standards-based (RFC 7591, RFC 7592)",
+            "Works with any compliant OIDC provider",
+            "Supports dynamic callback URL registration",
+        ],
+        "security": [
+            "Registration tokens protect client management operations",
+            "Clients can only delete themselves (not others)",
+            "Token validation ensures only authorized access",
+            "Automatic cleanup prevents client sprawl",
+        ],
+    }
+
+    logger.info("Keycloak DCR Architecture:")
+    import json
+
+    logger.info(json.dumps(architecture, indent=2))
+
+    assert True
@@ -0,0 +1,261 @@
+"""Tests for configuration validation."""
+
+import os
+from unittest.mock import patch
+
+import pytest
+
+from nextcloud_mcp_server.config import Settings, get_settings
+
+
+class TestQdrantConfigValidation:
+    """Test Qdrant configuration validation."""
+
+    def test_mutually_exclusive_url_and_location(self):
+        """Test that setting both QDRANT_URL and QDRANT_LOCATION raises ValueError."""
+        with pytest.raises(
+            ValueError,
+            match="Cannot set both QDRANT_URL and QDRANT_LOCATION",
+        ):
+            Settings(
+                qdrant_url="http://qdrant:6333",
+                qdrant_location="/app/data/qdrant",
+            )
+
+    def test_default_to_memory_mode(self):
+        """Test that :memory: is used when neither URL nor location is set."""
+        settings = Settings()
+        assert settings.qdrant_location == ":memory:"
+        assert settings.qdrant_url is None
+
+    def test_network_mode_only(self):
+        """Test network mode with only URL set."""
+        settings = Settings(qdrant_url="http://qdrant:6333")
+        assert settings.qdrant_url == "http://qdrant:6333"
+        assert settings.qdrant_location is None
+
+    def test_local_mode_only(self):
+        """Test local mode with only location set."""
+        settings = Settings(qdrant_location="/app/data/qdrant")
+        assert settings.qdrant_location == "/app/data/qdrant"
+        assert settings.qdrant_url is None
+
+    def test_in_memory_mode_explicit(self):
+        """Test explicit in-memory mode."""
+        settings = Settings(qdrant_location=":memory:")
+        assert settings.qdrant_location == ":memory:"
+        assert settings.qdrant_url is None
+
+    def test_api_key_warning_in_local_mode(self, caplog):
+        """Test that API key in local mode triggers warning."""
+        import logging
+
+        caplog.set_level(logging.WARNING, logger="nextcloud_mcp_server.config")
+        Settings(
+            qdrant_location=":memory:",
+            qdrant_api_key="test-api-key",
+        )
+        assert "API key is only relevant for network mode" in caplog.text
+
+    def test_api_key_no_warning_in_network_mode(self, caplog):
+        """Test that API key in network mode doesn't trigger warning."""
+        import logging
+
+        caplog.set_level(logging.WARNING, logger="nextcloud_mcp_server.config")
+        Settings(
+            qdrant_url="http://qdrant:6333",
+            qdrant_api_key="test-api-key",
+        )
+        assert "API key is only relevant for network mode" not in caplog.text
+
+
+class TestGetSettings:
+    """Test get_settings() function with environment variables."""
+
+    @patch.dict(os.environ, {}, clear=True)
+    def test_get_settings_defaults_to_memory(self):
+        """Test get_settings() defaults to :memory: when no env vars set."""
+        settings = get_settings()
+        assert settings.qdrant_location == ":memory:"
+        assert settings.qdrant_url is None
+
+    @patch.dict(
+        os.environ,
+        {
+            "QDRANT_URL": "http://qdrant:6333",
+            "QDRANT_API_KEY": "test-key",
+        },
+        clear=True,
+    )
+    def test_get_settings_network_mode(self):
+        """Test get_settings() with network mode env vars."""
+        settings = get_settings()
+        assert settings.qdrant_url == "http://qdrant:6333"
+        assert settings.qdrant_api_key == "test-key"
+        assert settings.qdrant_location is None
+
+    @patch.dict(
+        os.environ,
+        {"QDRANT_LOCATION": "/app/data/qdrant"},
+        clear=True,
+    )
+    def test_get_settings_persistent_mode(self):
+        """Test get_settings() with persistent local mode env vars."""
+        settings = get_settings()
+        assert settings.qdrant_location == "/app/data/qdrant"
+        assert settings.qdrant_url is None
+
+    @patch.dict(
+        os.environ,
+        {"QDRANT_LOCATION": ":memory:"},
+        clear=True,
+    )
+    def test_get_settings_explicit_memory(self):
+        """Test get_settings() with explicit :memory: env var."""
+        settings = get_settings()
+        assert settings.qdrant_location == ":memory:"
+        assert settings.qdrant_url is None
+
+    @patch.dict(
+        os.environ,
+        {
+            "QDRANT_URL": "http://qdrant:6333",
+            "QDRANT_LOCATION": "/app/data/qdrant",
+        },
+        clear=True,
+    )
+    def test_get_settings_mutual_exclusion_error(self):
+        """Test get_settings() raises error when both URL and location set."""
+        with pytest.raises(
+            ValueError,
+            match="Cannot set both QDRANT_URL and QDRANT_LOCATION",
+        ):
+            get_settings()
+
+    @patch.dict(
+        os.environ,
+        {
+            "QDRANT_COLLECTION": "test_collection",
+            "VECTOR_SYNC_ENABLED": "true",
+            "VECTOR_SYNC_SCAN_INTERVAL": "600",
+            "VECTOR_SYNC_PROCESSOR_WORKERS": "5",
+            "VECTOR_SYNC_QUEUE_MAX_SIZE": "5000",
+        },
+        clear=True,
+    )
+    def test_get_settings_vector_sync_config(self):
+        """Test get_settings() with vector sync configuration."""
+        settings = get_settings()
+        assert settings.qdrant_collection == "test_collection"
+        assert settings.vector_sync_enabled is True
+        assert settings.vector_sync_scan_interval == 600
+        assert settings.vector_sync_processor_workers == 5
+        assert settings.vector_sync_queue_max_size == 5000
+
+
+class TestChunkConfigValidation:
+    """Test document chunking configuration validation."""
+
+    def test_default_chunk_settings(self):
+        """Test default chunk size and overlap values."""
+        settings = Settings()
+        assert settings.document_chunk_size == 512
+        assert settings.document_chunk_overlap == 50
+
+    def test_valid_chunk_settings(self):
+        """Test valid chunk size and overlap configuration."""
+        settings = Settings(
+            document_chunk_size=1024,
+            document_chunk_overlap=100,
+        )
+        assert settings.document_chunk_size == 1024
+        assert settings.document_chunk_overlap == 100
+
+    def test_overlap_greater_than_or_equal_to_chunk_size_raises_error(self):
+        """Test that overlap equal to chunk size (the boundary case) raises ValueError."""
+        with pytest.raises(
+            ValueError,
+            match="DOCUMENT_CHUNK_OVERLAP .* must be less than DOCUMENT_CHUNK_SIZE",
+        ):
+            Settings(
+                document_chunk_size=512,
+                document_chunk_overlap=512,
+            )
+
+    def test_overlap_larger_than_chunk_size_raises_error(self):
+        """Test that overlap > chunk size raises ValueError."""
+        with pytest.raises(
+            ValueError,
+            match="DOCUMENT_CHUNK_OVERLAP .* must be less than DOCUMENT_CHUNK_SIZE",
+        ):
+            Settings(
+                document_chunk_size=256,
+                document_chunk_overlap=300,
+            )
+
+    def test_negative_overlap_raises_error(self):
+        """Test that negative overlap raises ValueError."""
+        with pytest.raises(
+            ValueError,
+            match="DOCUMENT_CHUNK_OVERLAP .* cannot be negative",
+        ):
+            Settings(
+                document_chunk_size=512,
+                document_chunk_overlap=-10,
+            )
+
+    def test_small_chunk_size_warning(self, caplog):
+        """Test that chunk size < 100 triggers warning."""
+        import logging
+
+        caplog.set_level(logging.WARNING, logger="nextcloud_mcp_server.config")
+        Settings(
+            document_chunk_size=64,
+            document_chunk_overlap=10,
+        )
+        assert (
+            "DOCUMENT_CHUNK_SIZE is set to 64 words, which is quite small"
+            in caplog.text
+        )
+        assert "Consider using at least 256 words" in caplog.text
+
+    def test_reasonable_chunk_size_no_warning(self, caplog):
+        """Test that chunk size >= 100 doesn't trigger warning."""
+        import logging
+
+        caplog.set_level(logging.WARNING, logger="nextcloud_mcp_server.config")
+        Settings(
+            document_chunk_size=256,
+            document_chunk_overlap=25,
+        )
+        assert "DOCUMENT_CHUNK_SIZE" not in caplog.text
+
+    @patch.dict(
+        os.environ,
+        {
+            "DOCUMENT_CHUNK_SIZE": "1024",
+            "DOCUMENT_CHUNK_OVERLAP": "102",
+        },
+        clear=True,
+    )
+    def test_get_settings_chunk_config(self):
+        """Test get_settings() with chunk configuration."""
+        settings = get_settings()
+        assert settings.document_chunk_size == 1024
+        assert settings.document_chunk_overlap == 102
+
+    @patch.dict(
+        os.environ,
+        {
+            "DOCUMENT_CHUNK_SIZE": "256",
+            "DOCUMENT_CHUNK_OVERLAP": "256",
+        },
+        clear=True,
+    )
+    def test_get_settings_invalid_chunk_config_raises_error(self):
+        """Test get_settings() raises error for invalid chunk config."""
+        with pytest.raises(
+            ValueError,
+            match="DOCUMENT_CHUNK_OVERLAP .* must be less than DOCUMENT_CHUNK_SIZE",
+        ):
+            get_settings()
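
Review note: the tests above pin down the validation contract for `Settings` without showing it. The sketch below is one way that contract could be satisfied. It is an assumption, not the code in `nextcloud_mcp_server/config.py`: the class name `SettingsSketch`, the logger name `config_sketch`, and the exact message wording beyond the fragments matched in the tests are hypothetical.

```python
import logging
from dataclasses import dataclass
from typing import Optional

# Hypothetical logger name; the tests capture "nextcloud_mcp_server.config".
logger = logging.getLogger("config_sketch")


@dataclass
class SettingsSketch:
    """Stand-in for Settings, limited to the fields the tests exercise."""

    qdrant_url: Optional[str] = None
    qdrant_location: Optional[str] = None
    qdrant_api_key: Optional[str] = None
    document_chunk_size: int = 512
    document_chunk_overlap: int = 50

    def __post_init__(self) -> None:
        # Network mode (URL) and local mode (location) are mutually exclusive.
        if self.qdrant_url and self.qdrant_location:
            raise ValueError("Cannot set both QDRANT_URL and QDRANT_LOCATION")
        # With neither set, fall back to in-memory mode.
        if not self.qdrant_url and not self.qdrant_location:
            self.qdrant_location = ":memory:"
        # An API key has no effect outside network mode, so only warn.
        if self.qdrant_api_key and not self.qdrant_url:
            logger.warning("API key is only relevant for network mode")

        size, overlap = self.document_chunk_size, self.document_chunk_overlap
        if overlap < 0:
            raise ValueError(f"DOCUMENT_CHUNK_OVERLAP ({overlap}) cannot be negative")
        # Each chunk advances by (size - overlap) words, so that step must be positive.
        if overlap >= size:
            raise ValueError(
                f"DOCUMENT_CHUNK_OVERLAP ({overlap}) must be less than "
                f"DOCUMENT_CHUNK_SIZE ({size})"
            )
        if size < 100:
            logger.warning(
                "DOCUMENT_CHUNK_SIZE is set to %d words, which is quite small. "
                "Consider using at least 256 words.",
                size,
            )
```

Rejecting `overlap == size` along with `overlap > size` is what the boundary test checks: a chunker that advances by `size - overlap` words would otherwise have a step of zero.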
@@ -0,0 +1,88 @@
+"""Unit tests for logging filters."""
+
+import logging
+
+import pytest
+
+from nextcloud_mcp_server.observability.logging_config import HealthCheckFilter
+
+
+@pytest.mark.unit
+class TestHealthCheckFilter:
+    """Tests for the HealthCheckFilter."""
+
+    def test_filters_health_live_requests(self):
+        """Test that /health/live requests are filtered out."""
+        # Create a log record that looks like a uvicorn access log for /health/live
+        record = logging.LogRecord(
+            name="uvicorn.access",
+            level=logging.INFO,
+            pathname="",
+            lineno=0,
+            msg='127.0.0.1:12345 - "GET /health/live HTTP/1.1" 200',
+            args=(),
+            exc_info=None,
+        )
+
+        filter_instance = HealthCheckFilter()
+        assert filter_instance.filter(record) is False
+
+    def test_filters_health_ready_requests(self):
+        """Test that /health/ready requests are filtered out."""
+        record = logging.LogRecord(
+            name="uvicorn.access",
+            level=logging.INFO,
+            pathname="",
+            lineno=0,
+            msg='127.0.0.1:12345 - "GET /health/ready HTTP/1.1" 200',
+            args=(),
+            exc_info=None,
+        )
+
+        filter_instance = HealthCheckFilter()
+        assert filter_instance.filter(record) is False
+
+    def test_filters_metrics_requests(self):
+        """Test that /metrics requests are filtered out."""
+        record = logging.LogRecord(
+            name="uvicorn.access",
+            level=logging.INFO,
+            pathname="",
+            lineno=0,
+            msg='127.0.0.1:12345 - "GET /metrics HTTP/1.1" 200',
+            args=(),
+            exc_info=None,
+        )
+
+        filter_instance = HealthCheckFilter()
+        assert filter_instance.filter(record) is False
+
+    def test_allows_other_requests(self):
+        """Test that non-health-check requests are not filtered."""
+        record = logging.LogRecord(
+            name="uvicorn.access",
+            level=logging.INFO,
+            pathname="",
+            lineno=0,
+            msg='127.0.0.1:12345 - "GET /mcp/messages HTTP/1.1" 200',
+            args=(),
+            exc_info=None,
+        )
+
+        filter_instance = HealthCheckFilter()
+        assert filter_instance.filter(record) is True
+
+    def test_allows_api_requests(self):
+        """Test that API requests are not filtered."""
+        record = logging.LogRecord(
+            name="uvicorn.access",
+            level=logging.INFO,
+            pathname="",
+            lineno=0,
+            msg='127.0.0.1:12345 - "POST /oauth/login HTTP/1.1" 302',
+            args=(),
+            exc_info=None,
+        )
+
+        filter_instance = HealthCheckFilter()
+        assert filter_instance.filter(record) is True
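
Review note: these tests only require that access-log records for `/health/live`, `/health/ready`, and `/metrics` are dropped and everything else passes. A minimal filter with that behaviour might look like the sketch below. It is an assumption about the approach, not the `HealthCheckFilter` in `observability/logging_config.py`; the class name `HealthCheckFilterSketch` and the `QUIET_PATHS` attribute are hypothetical.

```python
import logging


class HealthCheckFilterSketch(logging.Filter):
    """Drop access-log records for probe and metrics endpoints."""

    # Paths taken from the tests above; matched with surrounding spaces so
    # that e.g. "/metrics-export" would not be suppressed by accident.
    QUIET_PATHS = ("/health/live", "/health/ready", "/metrics")

    def filter(self, record: logging.LogRecord) -> bool:
        # getMessage() applies %-style args, so this works whether the
        # request line is in msg directly or supplied via args.
        message = record.getMessage()
        return not any(f" {path} " in message for path in self.QUIET_PATHS)
```

Such a filter would typically be attached with `logging.getLogger("uvicorn.access").addFilter(HealthCheckFilterSketch())`. Note that this simplified match does not handle query strings such as `/metrics?format=text`.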
@@ -8,6 +8,10 @@ from nextcloud_mcp_server.models.notes import (
    NoteSearchResult,
    SearchNotesResponse,
 )
+from nextcloud_mcp_server.models.semantic import (
+    SamplingSearchResponse,
+    SemanticSearchResult,
+)


@pytest.mark.unit
@@ -121,3 +125,145 @@ def test_note_search_result_without_score():

    assert result.id == 99
    assert result.score is None
+
+
+@pytest.mark.unit
+def test_sampling_search_response_with_answer():
+    """Test SamplingSearchResponse with LLM-generated answer."""
+    sources = [
+        SemanticSearchResult(
+            id=1,
+            doc_type="note",
+            title="Python Guide",
+            category="Development",
+            excerpt="Use async/await for asynchronous programming",
+            score=0.92,
+            chunk_index=0,
+            total_chunks=3,
+        ),
+        SemanticSearchResult(
+            id=2,
+            doc_type="note",
+            title="Best Practices",
+            category="Development",
+            excerpt="Always use context managers with async operations",
+            score=0.85,
+            chunk_index=1,
+            total_chunks=2,
+        ),
+    ]
+
+    response = SamplingSearchResponse(
+        query="How do I use async in Python?",
+        generated_answer="Based on Document 1 and Document 2, use async/await for asynchronous programming and always use context managers.",
+        sources=sources,
+        total_found=2,
+        search_method="semantic_sampling",
+        model_used="claude-3-5-sonnet",
+        stop_reason="endTurn",
+        success=True,
+    )
+
+    # Verify the response structure
+    assert response.query == "How do I use async in Python?"
+    assert "async/await" in response.generated_answer
+    assert len(response.sources) == 2
+    assert response.sources[0].id == 1
+    assert response.sources[0].score == 0.92
+    assert response.total_found == 2
+    assert response.search_method == "semantic_sampling"
+    assert response.model_used == "claude-3-5-sonnet"
+    assert response.stop_reason == "endTurn"
+    assert response.success is True
+
+    # Verify it serializes correctly
+    data = response.model_dump()
+    assert "query" in data
+    assert "generated_answer" in data
+    assert "sources" in data
+    assert isinstance(data["sources"], list)
+    assert len(data["sources"]) == 2
+    assert data["sources"][0]["id"] == 1
+    assert data["model_used"] == "claude-3-5-sonnet"
+
+
+@pytest.mark.unit
+def test_sampling_search_response_fallback():
+    """Test SamplingSearchResponse when sampling fails (fallback mode)."""
+    sources = [
+        SemanticSearchResult(
+            id=1,
+            doc_type="note",
+            title="Note 1",
+            category="Work",
+            excerpt="Some content",
+            score=0.75,
+            chunk_index=0,
+            total_chunks=1,
+        )
+    ]
+
+    response = SamplingSearchResponse(
+        query="test query",
+        generated_answer="[Sampling unavailable: Client does not support sampling]\n\nFound 1 relevant documents. Please review the sources below.",
+        sources=sources,
+        total_found=1,
+        search_method="semantic_sampling_fallback",
+        model_used=None,
+        stop_reason=None,
+        success=True,
+    )
+
+    # Verify fallback behavior
+    assert "[Sampling unavailable" in response.generated_answer
+    assert response.search_method == "semantic_sampling_fallback"
+    assert response.model_used is None
+    assert response.stop_reason is None
+    assert len(response.sources) == 1
+
+
+@pytest.mark.unit
+def test_sampling_search_response_no_results():
+    """Test SamplingSearchResponse when no documents found."""
+    response = SamplingSearchResponse(
+        query="nonexistent topic",
+        generated_answer="No relevant documents found in your Nextcloud Notes for this query.",
+        sources=[],
+        total_found=0,
+        search_method="semantic_sampling",
+        success=True,
+    )
+
+    # Verify no results case
+    assert response.total_found == 0
+    assert len(response.sources) == 0
+    assert "No relevant documents" in response.generated_answer
+    assert response.model_used is None
+    assert response.stop_reason is None
+
+
+@pytest.mark.unit
+def test_sampling_search_response_serialization():
+    """Test SamplingSearchResponse.model_dump() returns every field in a plain dict."""
+    response = SamplingSearchResponse(
+        query="test",
+        generated_answer="Test answer",
+        sources=[],
+        total_found=0,
+        search_method="semantic_sampling",
+        model_used="claude-3-5-sonnet",
+        stop_reason="maxTokens",
+        success=True,
+    )
+
+    data = response.model_dump()
+
+    # Check all fields are present
+    assert data["query"] == "test"
+    assert data["generated_answer"] == "Test answer"
+    assert data["sources"] == []
+    assert data["total_found"] == 0
+    assert data["search_method"] == "semantic_sampling"
+    assert data["model_used"] == "claude-3-5-sonnet"
+    assert data["stop_reason"] == "maxTokens"
+    assert data["success"] is True
@@ -57,6 +57,15 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/31/da/e42d7a9d8dd33fa775f467e4028a47936da2f01e4b0e561f9ba0d74cb0ca/argcomplete-3.6.2-py3-none-any.whl", hash = "sha256:65b3133a29ad53fb42c48cf5114752c7ab66c1c38544fdf6460f450c09b42591", size = 43708, upload-time = "2025-04-03T04:57:01.591Z" },
 ]

+[[package]]
+name = "asgiref"
+version = "3.10.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/46/08/4dfec9b90758a59acc6be32ac82e98d1fbfc321cb5cfa410436dbacf821c/asgiref-3.10.0.tar.gz", hash = "sha256:d89f2d8cd8b56dada7d52fa7dc8075baa08fb836560710d38c292a7a3f78c04e", size = 37483, upload-time = "2025-10-05T09:15:06.557Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/17/9c/fc2331f538fbf7eedba64b2052e99ccf9ba9d6888e2f41441ee28847004b/asgiref-3.10.0-py3-none-any.whl", hash = "sha256:aef8a81283a34d0ab31630c9b7dfe70c812c95eba78171367ca8745e88124734", size = 24050, upload-time = "2025-10-05T09:15:05.11Z" },
+]
+
 [[package]]
 name = "asttokens"
 version = "3.0.0"
@@ -487,6 +496,18 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/c1/ea/53f2148663b321f21b5a606bd5f191517cf40b7072c0497d3c92c4a13b1e/executing-2.2.1-py2.py3-none-any.whl", hash = "sha256:760643d3452b4d777d295bb167ccc74c64a81df23fb5e08eff250c425a4b2017", size = 28317, upload-time = "2025-09-01T09:48:08.5Z" },
 ]

+[[package]]
+name = "googleapis-common-protos"
+version = "1.72.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "protobuf" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/e5/7b/adfd75544c415c487b33061fe7ae526165241c1ea133f9a9125a56b39fd8/googleapis_common_protos-1.72.0.tar.gz", hash = "sha256:e55a601c1b32b52d7a3e65f43563e2aa61bcd737998ee672ac9b951cd49319f5", size = 147433, upload-time = "2025-11-06T18:29:24.087Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/c4/ab/09169d5a4612a5f92490806649ac8d41e3ec9129c636754575b3553f4ea4/googleapis_common_protos-1.72.0-py3-none-any.whl", hash = "sha256:4299c5a82d5ae1a9702ada957347726b167f9f8d1fc352477702a1e851ff4038", size = 297515, upload-time = "2025-11-06T18:29:13.14Z" },
+]
+
 [[package]]
 name = "greenlet"
 version = "3.2.4"
@@ -537,6 +558,57 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/e3/a5/6ddab2b4c112be95601c13428db1d8b6608a8b6039816f2ba09c346c08fc/greenlet-3.2.4-cp314-cp314-win_amd64.whl", hash = "sha256:e37ab26028f12dbb0ff65f29a8d3d44a765c61e729647bf2ddfbbed621726f01", size = 303425, upload-time = "2025-08-07T13:32:27.59Z" },
 ]

+[[package]]
+name = "grpcio"
+version = "1.76.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/b6/e0/318c1ce3ae5a17894d5791e87aea147587c9e702f24122cc7a5c8bbaeeb1/grpcio-1.76.0.tar.gz", hash = "sha256:7be78388d6da1a25c0d5ec506523db58b18be22d9c37d8d3a32c08be4987bd73", size = 12785182, upload-time = "2025-10-21T16:23:12.106Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/a0/00/8163a1beeb6971f66b4bbe6ac9457b97948beba8dd2fc8e1281dce7f79ec/grpcio-1.76.0-cp311-cp311-linux_armv7l.whl", hash = "sha256:2e1743fbd7f5fa713a1b0a8ac8ebabf0ec980b5d8809ec358d488e273b9cf02a", size = 5843567, upload-time = "2025-10-21T16:20:52.829Z" },
+    { url = "https://files.pythonhosted.org/packages/10/c1/934202f5cf335e6d852530ce14ddb0fef21be612ba9ecbbcbd4d748ca32d/grpcio-1.76.0-cp311-cp311-macosx_11_0_universal2.whl", hash = "sha256:a8c2cf1209497cf659a667d7dea88985e834c24b7c3b605e6254cbb5076d985c", size = 11848017, upload-time = "2025-10-21T16:20:56.705Z" },
+    { url = "https://files.pythonhosted.org/packages/11/0b/8dec16b1863d74af6eb3543928600ec2195af49ca58b16334972f6775663/grpcio-1.76.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:08caea849a9d3c71a542827d6df9d5a69067b0a1efbea8a855633ff5d9571465", size = 6412027, upload-time = "2025-10-21T16:20:59.3Z" },
+    { url = "https://files.pythonhosted.org/packages/d7/64/7b9e6e7ab910bea9d46f2c090380bab274a0b91fb0a2fe9b0cd399fffa12/grpcio-1.76.0-cp311-cp311-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:f0e34c2079d47ae9f6188211db9e777c619a21d4faba6977774e8fa43b085e48", size = 7075913, upload-time = "2025-10-21T16:21:01.645Z" },
+    { url = "https://files.pythonhosted.org/packages/68/86/093c46e9546073cefa789bd76d44c5cb2abc824ca62af0c18be590ff13ba/grpcio-1.76.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:8843114c0cfce61b40ad48df65abcfc00d4dba82eae8718fab5352390848c5da", size = 6615417, upload-time = "2025-10-21T16:21:03.844Z" },
+    { url = "https://files.pythonhosted.org/packages/f7/b6/5709a3a68500a9c03da6fb71740dcdd5ef245e39266461a03f31a57036d8/grpcio-1.76.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:8eddfb4d203a237da6f3cc8a540dad0517d274b5a1e9e636fd8d2c79b5c1d397", size = 7199683, upload-time = "2025-10-21T16:21:06.195Z" },
+    { url = "https://files.pythonhosted.org/packages/91/d3/4b1f2bf16ed52ce0b508161df3a2d186e4935379a159a834cb4a7d687429/grpcio-1.76.0-cp311-cp311-musllinux_1_2_i686.whl", hash = "sha256:32483fe2aab2c3794101c2a159070584e5db11d0aa091b2c0ea9c4fc43d0d749", size = 8163109, upload-time = "2025-10-21T16:21:08.498Z" },
+    { url = "https://files.pythonhosted.org/packages/5c/61/d9043f95f5f4cf085ac5dd6137b469d41befb04bd80280952ffa2a4c3f12/grpcio-1.76.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:dcfe41187da8992c5f40aa8c5ec086fa3672834d2be57a32384c08d5a05b4c00", size = 7626676, upload-time = "2025-10-21T16:21:10.693Z" },
+    { url = "https://files.pythonhosted.org/packages/36/95/fd9a5152ca02d8881e4dd419cdd790e11805979f499a2e5b96488b85cf27/grpcio-1.76.0-cp311-cp311-win32.whl", hash = "sha256:2107b0c024d1b35f4083f11245c0e23846ae64d02f40b2b226684840260ed054", size = 3997688, upload-time = "2025-10-21T16:21:12.746Z" },
+    { url = "https://files.pythonhosted.org/packages/60/9c/5c359c8d4c9176cfa3c61ecd4efe5affe1f38d9bae81e81ac7186b4c9cc8/grpcio-1.76.0-cp311-cp311-win_amd64.whl", hash = "sha256:522175aba7af9113c48ec10cc471b9b9bd4f6ceb36aeb4544a8e2c80ed9d252d", size = 4709315, upload-time = "2025-10-21T16:21:15.26Z" },
+    { url = "https://files.pythonhosted.org/packages/bf/05/8e29121994b8d959ffa0afd28996d452f291b48cfc0875619de0bde2c50c/grpcio-1.76.0-cp312-cp312-linux_armv7l.whl", hash = "sha256:81fd9652b37b36f16138611c7e884eb82e0cec137c40d3ef7c3f9b3ed00f6ed8", size = 5799718, upload-time = "2025-10-21T16:21:17.939Z" },
+    { url = "https://files.pythonhosted.org/packages/d9/75/11d0e66b3cdf998c996489581bdad8900db79ebd83513e45c19548f1cba4/grpcio-1.76.0-cp312-cp312-macosx_11_0_universal2.whl", hash = "sha256:04bbe1bfe3a68bbfd4e52402ab7d4eb59d72d02647ae2042204326cf4bbad280", size = 11825627, upload-time = "2025-10-21T16:21:20.466Z" },
+    { url = "https://files.pythonhosted.org/packages/28/50/2f0aa0498bc188048f5d9504dcc5c2c24f2eb1a9337cd0fa09a61a2e75f0/grpcio-1.76.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:d388087771c837cdb6515539f43b9d4bf0b0f23593a24054ac16f7a960be16f4", size = 6359167, upload-time = "2025-10-21T16:21:23.122Z" },
+    { url = "https://files.pythonhosted.org/packages/66/e5/bbf0bb97d29ede1d59d6588af40018cfc345b17ce979b7b45424628dc8bb/grpcio-1.76.0-cp312-cp312-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:9f8f757bebaaea112c00dba718fc0d3260052ce714e25804a03f93f5d1c6cc11", size = 7044267, upload-time = "2025-10-21T16:21:25.995Z" },
+    { url = "https://files.pythonhosted.org/packages/f5/86/f6ec2164f743d9609691115ae8ece098c76b894ebe4f7c94a655c6b03e98/grpcio-1.76.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:980a846182ce88c4f2f7e2c22c56aefd515daeb36149d1c897f83cf57999e0b6", size = 6573963, upload-time = "2025-10-21T16:21:28.631Z" },
+    { url = "https://files.pythonhosted.org/packages/60/bc/8d9d0d8505feccfdf38a766d262c71e73639c165b311c9457208b56d92ae/grpcio-1.76.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:f92f88e6c033db65a5ae3d97905c8fea9c725b63e28d5a75cb73b49bda5024d8", size = 7164484, upload-time = "2025-10-21T16:21:30.837Z" },
+    { url = "https://files.pythonhosted.org/packages/67/e6/5d6c2fc10b95edf6df9b8f19cf10a34263b7fd48493936fffd5085521292/grpcio-1.76.0-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:4baf3cbe2f0be3289eb68ac8ae771156971848bb8aaff60bad42005539431980", size = 8127777, upload-time = "2025-10-21T16:21:33.577Z" },
+    { url = "https://files.pythonhosted.org/packages/3f/c8/dce8ff21c86abe025efe304d9e31fdb0deaaa3b502b6a78141080f206da0/grpcio-1.76.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:615ba64c208aaceb5ec83bfdce7728b80bfeb8be97562944836a7a0a9647d882", size = 7594014, upload-time = "2025-10-21T16:21:41.882Z" },
+    { url = "https://files.pythonhosted.org/packages/e0/42/ad28191ebf983a5d0ecef90bab66baa5a6b18f2bfdef9d0a63b1973d9f75/grpcio-1.76.0-cp312-cp312-win32.whl", hash = "sha256:45d59a649a82df5718fd9527ce775fd66d1af35e6d31abdcdc906a49c6822958", size = 3984750, upload-time = "2025-10-21T16:21:44.006Z" },
+    { url = "https://files.pythonhosted.org/packages/9e/00/7bd478cbb851c04a48baccaa49b75abaa8e4122f7d86da797500cccdd771/grpcio-1.76.0-cp312-cp312-win_amd64.whl", hash = "sha256:c088e7a90b6017307f423efbb9d1ba97a22aa2170876223f9709e9d1de0b5347", size = 4704003, upload-time = "2025-10-21T16:21:46.244Z" },
+    { url = "https://files.pythonhosted.org/packages/fc/ed/71467ab770effc9e8cef5f2e7388beb2be26ed642d567697bb103a790c72/grpcio-1.76.0-cp313-cp313-linux_armv7l.whl", hash = "sha256:26ef06c73eb53267c2b319f43e6634c7556ea37672029241a056629af27c10e2", size = 5807716, upload-time = "2025-10-21T16:21:48.475Z" },
+    { url = "https://files.pythonhosted.org/packages/2c/85/c6ed56f9817fab03fa8a111ca91469941fb514e3e3ce6d793cb8f1e1347b/grpcio-1.76.0-cp313-cp313-macosx_11_0_universal2.whl", hash = "sha256:45e0111e73f43f735d70786557dc38141185072d7ff8dc1829d6a77ac1471468", size = 11821522, upload-time = "2025-10-21T16:21:51.142Z" },
+    { url = "https://files.pythonhosted.org/packages/ac/31/2b8a235ab40c39cbc141ef647f8a6eb7b0028f023015a4842933bc0d6831/grpcio-1.76.0-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:83d57312a58dcfe2a3a0f9d1389b299438909a02db60e2f2ea2ae2d8034909d3", size = 6362558, upload-time = "2025-10-21T16:21:54.213Z" },
+    { url = "https://files.pythonhosted.org/packages/bd/64/9784eab483358e08847498ee56faf8ff6ea8e0a4592568d9f68edc97e9e9/grpcio-1.76.0-cp313-cp313-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:3e2a27c89eb9ac3d81ec8835e12414d73536c6e620355d65102503064a4ed6eb", size = 7049990, upload-time = "2025-10-21T16:21:56.476Z" },
+    { url = "https://files.pythonhosted.org/packages/2b/94/8c12319a6369434e7a184b987e8e9f3b49a114c489b8315f029e24de4837/grpcio-1.76.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:61f69297cba3950a524f61c7c8ee12e55c486cb5f7db47ff9dcee33da6f0d3ae", size = 6575387, upload-time = "2025-10-21T16:21:59.051Z" },
+    { url = "https://files.pythonhosted.org/packages/15/0f/f12c32b03f731f4a6242f771f63039df182c8b8e2cf8075b245b409259d4/grpcio-1.76.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:6a15c17af8839b6801d554263c546c69c4d7718ad4321e3166175b37eaacca77", size = 7166668, upload-time = "2025-10-21T16:22:02.049Z" },
+    { url = "https://files.pythonhosted.org/packages/ff/2d/3ec9ce0c2b1d92dd59d1c3264aaec9f0f7c817d6e8ac683b97198a36ed5a/grpcio-1.76.0-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:25a18e9810fbc7e7f03ec2516addc116a957f8cbb8cbc95ccc80faa072743d03", size = 8124928, upload-time = "2025-10-21T16:22:04.984Z" },
+    { url = "https://files.pythonhosted.org/packages/1a/74/fd3317be5672f4856bcdd1a9e7b5e17554692d3db9a3b273879dc02d657d/grpcio-1.76.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:931091142fd8cc14edccc0845a79248bc155425eee9a98b2db2ea4f00a235a42", size = 7589983, upload-time = "2025-10-21T16:22:07.881Z" },
+    { url = "https://files.pythonhosted.org/packages/45/bb/ca038cf420f405971f19821c8c15bcbc875505f6ffadafe9ffd77871dc4c/grpcio-1.76.0-cp313-cp313-win32.whl", hash = "sha256:5e8571632780e08526f118f74170ad8d50fb0a48c23a746bef2a6ebade3abd6f", size = 3984727, upload-time = "2025-10-21T16:22:10.032Z" },
+    { url = "https://files.pythonhosted.org/packages/41/80/84087dc56437ced7cdd4b13d7875e7439a52a261e3ab4e06488ba6173b0a/grpcio-1.76.0-cp313-cp313-win_amd64.whl", hash = "sha256:f9f7bd5faab55f47231ad8dba7787866b69f5e93bc306e3915606779bbfb4ba8", size = 4702799, upload-time = "2025-10-21T16:22:12.709Z" },
+    { url = "https://files.pythonhosted.org/packages/b4/46/39adac80de49d678e6e073b70204091e76631e03e94928b9ea4ecf0f6e0e/grpcio-1.76.0-cp314-cp314-linux_armv7l.whl", hash = "sha256:ff8a59ea85a1f2191a0ffcc61298c571bc566332f82e5f5be1b83c9d8e668a62", size = 5808417, upload-time = "2025-10-21T16:22:15.02Z" },
+    { url = "https://files.pythonhosted.org/packages/9c/f5/a4531f7fb8b4e2a60b94e39d5d924469b7a6988176b3422487be61fe2998/grpcio-1.76.0-cp314-cp314-macosx_11_0_universal2.whl", hash = "sha256:06c3d6b076e7b593905d04fdba6a0525711b3466f43b3400266f04ff735de0cd", size = 11828219, upload-time = "2025-10-21T16:22:17.954Z" },
+    { url = "https://files.pythonhosted.org/packages/4b/1c/de55d868ed7a8bd6acc6b1d6ddc4aa36d07a9f31d33c912c804adb1b971b/grpcio-1.76.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:fd5ef5932f6475c436c4a55e4336ebbe47bd3272be04964a03d316bbf4afbcbc", size = 6367826, upload-time = "2025-10-21T16:22:20.721Z" },
+    { url = "https://files.pythonhosted.org/packages/59/64/99e44c02b5adb0ad13ab3adc89cb33cb54bfa90c74770f2607eea629b86f/grpcio-1.76.0-cp314-cp314-manylinux2014_i686.manylinux_2_17_i686.whl", hash = "sha256:b331680e46239e090f5b3cead313cc772f6caa7d0fc8de349337563125361a4a", size = 7049550, upload-time = "2025-10-21T16:22:23.637Z" },
+    { url = "https://files.pythonhosted.org/packages/43/28/40a5be3f9a86949b83e7d6a2ad6011d993cbe9b6bd27bea881f61c7788b6/grpcio-1.76.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:2229ae655ec4e8999599469559e97630185fdd53ae1e8997d147b7c9b2b72cba", size = 6575564, upload-time = "2025-10-21T16:22:26.016Z" },
+    { url = "https://files.pythonhosted.org/packages/4b/a9/1be18e6055b64467440208a8559afac243c66a8b904213af6f392dc2212f/grpcio-1.76.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:490fa6d203992c47c7b9e4a9d39003a0c2bcc1c9aa3c058730884bbbb0ee9f09", size = 7176236, upload-time = "2025-10-21T16:22:28.362Z" },
+    { url = "https://files.pythonhosted.org/packages/0f/55/dba05d3fcc151ce6e81327541d2cc8394f442f6b350fead67401661bf041/grpcio-1.76.0-cp314-cp314-musllinux_1_2_i686.whl", hash = "sha256:479496325ce554792dba6548fae3df31a72cef7bad71ca2e12b0e58f9b336bfc", size = 8125795, upload-time = "2025-10-21T16:22:31.075Z" },
+    { url = "https://files.pythonhosted.org/packages/4a/45/122df922d05655f63930cf42c9e3f72ba20aadb26c100ee105cad4ce4257/grpcio-1.76.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:1c9b93f79f48b03ada57ea24725d83a30284a012ec27eab2cf7e50a550cbbbcc", size = 7592214, upload-time = "2025-10-21T16:22:33.831Z" },
+    { url = "https://files.pythonhosted.org/packages/4a/6e/0b899b7f6b66e5af39e377055fb4a6675c9ee28431df5708139df2e93233/grpcio-1.76.0-cp314-cp314-win32.whl", hash = "sha256:747fa73efa9b8b1488a95d0ba1039c8e2dca0f741612d80415b1e1c560febf4e", size = 4062961, upload-time = "2025-10-21T16:22:36.468Z" },
+    { url = "https://files.pythonhosted.org/packages/19/41/0b430b01a2eb38ee887f88c1f07644a1df8e289353b78e82b37ef988fb64/grpcio-1.76.0-cp314-cp314-win_amd64.whl", hash = "sha256:922fa70ba549fce362d2e2871ab542082d66e2aaf0c19480ea453905b01f384e", size = 4834462, upload-time = "2025-10-21T16:22:39.772Z" },
+]
+
 [[package]]
 name = "h11"
 version = "0.16.0"
@@ -641,6 +713,18 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/0e/61/66938bbb5fc52dbdf84594873d5b51fb1f7c7794e9c0f5bd885f30bc507b/idna-3.11-py3-none-any.whl", hash = "sha256:771a87f49d9defaf64091e6e6fe9c18d4833f140bd19464795bc32d966ca37ea", size = 71008, upload-time = "2025-10-12T14:55:18.883Z" },
 ]

+[[package]]
+name = "importlib-metadata"
+version = "8.7.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "zipp" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/76/66/650a33bd90f786193e4de4b3ad86ea60b53c89b669a5c7be931fac31cdb0/importlib_metadata-8.7.0.tar.gz", hash = "sha256:d13b81ad223b890aa16c5471f2ac3056cf76c5f10f82d6f9292f0b415f389000", size = 56641, upload-time = "2025-04-27T15:29:01.736Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/20/b0/36bd937216ec521246249be3bf9855081de4c5e06a0c9b4219dbeda50373/importlib_metadata-8.7.0-py3-none-any.whl", hash = "sha256:e5dd1551894c77868a30651cef00984d50e1002d06942a7101d34870c5f02afd", size = 27656, upload-time = "2025-04-27T15:29:00.214Z" },
+]
+
 [[package]]
 name = "iniconfig"
 version = "2.3.0"
@@ -975,7 +1059,7 @@ wheels = [

 [[package]]
 name = "nextcloud-mcp-server"
-version = "0.26.1"
+version = "0.29.1"
 source = { editable = "." }
 dependencies = [
    { name = "aiosqlite" },
@@ -985,10 +1069,19 @@ dependencies = [
    { name = "httpx" },
    { name = "icalendar" },
    { name = "mcp", extra = ["cli"] },
+    { name = "opentelemetry-api" },
+    { name = "opentelemetry-exporter-otlp-proto-grpc" },
+    { name = "opentelemetry-instrumentation-asgi" },
+    { name = "opentelemetry-instrumentation-httpx" },
+    { name = "opentelemetry-instrumentation-logging" },
+    { name = "opentelemetry-sdk" },
    { name = "pillow" },
+    { name = "prometheus-client" },
    { name = "pydantic" },
    { name = "pyjwt", extra = ["crypto"] },
+    { name = "python-json-logger" },
    { name = "pythonvcard4" },
+    { name = "qdrant-client" },
 ]

 [package.dev-dependencies]
@@ -1015,10 +1108,19 @@ requires-dist = [
    { name = "httpx", specifier = ">=0.28.1,<0.29.0" },
    { name = "icalendar", specifier = ">=6.0.0,<7.0.0" },
    { name = "mcp", extras = ["cli"], specifier = ">=1.21,<1.22" },
+    { name = "opentelemetry-api", specifier = ">=1.28.2" },
+    { name = "opentelemetry-exporter-otlp-proto-grpc", specifier = ">=1.28.2" },
+    { name = "opentelemetry-instrumentation-asgi", specifier = ">=0.49b2" },
+    { name = "opentelemetry-instrumentation-httpx", specifier = ">=0.49b2" },
+    { name = "opentelemetry-instrumentation-logging", specifier = ">=0.49b2" },
+    { name = "opentelemetry-sdk", specifier = ">=1.28.2" },
    { name = "pillow", specifier = ">=12.0.0,<12.1.0" },
+    { name = "prometheus-client", specifier = ">=0.21.0" },
    { name = "pydantic", specifier = ">=2.11.4" },
    { name = "pyjwt", extras = ["crypto"], specifier = ">=2.8.0" },
+    { name = "python-json-logger", specifier = ">=3.2.0" },
    { name = "pythonvcard4", specifier = ">=0.2.0" },
+    { name = "qdrant-client", specifier = ">=1.7.0" },
 ]

 [package.metadata.requires-dev]
@@ -1036,6 +1138,238 @@ dev = [
    { name = "ty", specifier = ">=0.0.1a25" },
 ]

+[[package]]
+name = "numpy"
+version = "2.3.4"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/b5/f4/098d2270d52b41f1bd7db9fc288aaa0400cb48c2a3e2af6fa365d9720947/numpy-2.3.4.tar.gz", hash = "sha256:a7d018bfedb375a8d979ac758b120ba846a7fe764911a64465fd87b8729f4a6a", size = 20582187, upload-time = "2025-10-15T16:18:11.77Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/60/e7/0e07379944aa8afb49a556a2b54587b828eb41dc9adc56fb7615b678ca53/numpy-2.3.4-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:e78aecd2800b32e8347ce49316d3eaf04aed849cd5b38e0af39f829a4e59f5eb", size = 21259519, upload-time = "2025-10-15T16:15:19.012Z" },
+    { url = "https://files.pythonhosted.org/packages/d0/cb/5a69293561e8819b09e34ed9e873b9a82b5f2ade23dce4c51dc507f6cfe1/numpy-2.3.4-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:7fd09cc5d65bda1e79432859c40978010622112e9194e581e3415a3eccc7f43f", size = 14452796, upload-time = "2025-10-15T16:15:23.094Z" },
+    { url = "https://files.pythonhosted.org/packages/e4/04/ff11611200acd602a1e5129e36cfd25bf01ad8e5cf927baf2e90236eb02e/numpy-2.3.4-cp311-cp311-macosx_14_0_arm64.whl", hash = "sha256:1b219560ae2c1de48ead517d085bc2d05b9433f8e49d0955c82e8cd37bd7bf36", size = 5381639, upload-time = "2025-10-15T16:15:25.572Z" },
+    { url = "https://files.pythonhosted.org/packages/ea/77/e95c757a6fe7a48d28a009267408e8aa382630cc1ad1db7451b3bc21dbb4/numpy-2.3.4-cp311-cp311-macosx_14_0_x86_64.whl", hash = "sha256:bafa7d87d4c99752d07815ed7a2c0964f8ab311eb8168f41b910bd01d15b6032", size = 6914296, upload-time = "2025-10-15T16:15:27.079Z" },
+    { url = "https://files.pythonhosted.org/packages/a3/d2/137c7b6841c942124eae921279e5c41b1c34bab0e6fc60c7348e69afd165/numpy-2.3.4-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:36dc13af226aeab72b7abad501d370d606326a0029b9f435eacb3b8c94b8a8b7", size = 14591904, upload-time = "2025-10-15T16:15:29.044Z" },
+    { url = "https://files.pythonhosted.org/packages/bb/32/67e3b0f07b0aba57a078c4ab777a9e8e6bc62f24fb53a2337f75f9691699/numpy-2.3.4-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a7b2f9a18b5ff9824a6af80de4f37f4ec3c2aab05ef08f51c77a093f5b89adda", size = 16939602, upload-time = "2025-10-15T16:15:31.106Z" },
+    { url = "https://files.pythonhosted.org/packages/95/22/9639c30e32c93c4cee3ccdb4b09c2d0fbff4dcd06d36b357da06146530fb/numpy-2.3.4-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:9984bd645a8db6ca15d850ff996856d8762c51a2239225288f08f9050ca240a0", size = 16372661, upload-time = "2025-10-15T16:15:33.546Z" },
+    { url = "https://files.pythonhosted.org/packages/12/e9/a685079529be2b0156ae0c11b13d6be647743095bb51d46589e95be88086/numpy-2.3.4-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:64c5825affc76942973a70acf438a8ab618dbd692b84cd5ec40a0a0509edc09a", size = 18884682, upload-time = "2025-10-15T16:15:36.105Z" },
+    { url = "https://files.pythonhosted.org/packages/cf/85/f6f00d019b0cc741e64b4e00ce865a57b6bed945d1bbeb1ccadbc647959b/numpy-2.3.4-cp311-cp311-win32.whl", hash = "sha256:ed759bf7a70342f7817d88376eb7142fab9fef8320d6019ef87fae05a99874e1", size = 6570076, upload-time = "2025-10-15T16:15:38.225Z" },
+    { url = "https://files.pythonhosted.org/packages/7d/10/f8850982021cb90e2ec31990291f9e830ce7d94eef432b15066e7cbe0bec/numpy-2.3.4-cp311-cp311-win_amd64.whl", hash = "sha256:faba246fb30ea2a526c2e9645f61612341de1a83fb1e0c5edf4ddda5a9c10996", size = 13089358, upload-time = "2025-10-15T16:15:40.404Z" },
+    { url = "https://files.pythonhosted.org/packages/d1/ad/afdd8351385edf0b3445f9e24210a9c3971ef4de8fd85155462fc4321d79/numpy-2.3.4-cp311-cp311-win_arm64.whl", hash = "sha256:4c01835e718bcebe80394fd0ac66c07cbb90147ebbdad3dcecd3f25de2ae7e2c", size = 10462292, upload-time = "2025-10-15T16:15:42.896Z" },
+    { url = "https://files.pythonhosted.org/packages/96/7a/02420400b736f84317e759291b8edaeee9dc921f72b045475a9cbdb26b17/numpy-2.3.4-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:ef1b5a3e808bc40827b5fa2c8196151a4c5abe110e1726949d7abddfe5c7ae11", size = 20957727, upload-time = "2025-10-15T16:15:44.9Z" },
+    { url = "https://files.pythonhosted.org/packages/18/90/a014805d627aa5750f6f0e878172afb6454552da929144b3c07fcae1bb13/numpy-2.3.4-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:c2f91f496a87235c6aaf6d3f3d89b17dba64996abadccb289f48456cff931ca9", size = 14187262, upload-time = "2025-10-15T16:15:47.761Z" },
+    { url = "https://files.pythonhosted.org/packages/c7/e4/0a94b09abe89e500dc748e7515f21a13e30c5c3fe3396e6d4ac108c25fca/numpy-2.3.4-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:f77e5b3d3da652b474cc80a14084927a5e86a5eccf54ca8ca5cbd697bf7f2667", size = 5115992, upload-time = "2025-10-15T16:15:50.144Z" },
+    { url = "https://files.pythonhosted.org/packages/88/dd/db77c75b055c6157cbd4f9c92c4458daef0dd9cbe6d8d2fe7f803cb64c37/numpy-2.3.4-cp312-cp312-macosx_14_0_x86_64.whl", hash = "sha256:8ab1c5f5ee40d6e01cbe96de5863e39b215a4d24e7d007cad56c7184fdf4aeef", size = 6648672, upload-time = "2025-10-15T16:15:52.442Z" },
+    { url = "https://files.pythonhosted.org/packages/e1/e6/e31b0d713719610e406c0ea3ae0d90760465b086da8783e2fd835ad59027/numpy-2.3.4-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:77b84453f3adcb994ddbd0d1c5d11db2d6bda1a2b7fd5ac5bd4649d6f5dc682e", size = 14284156, upload-time = "2025-10-15T16:15:54.351Z" },
+    { url = "https://files.pythonhosted.org/packages/f9/58/30a85127bfee6f108282107caf8e06a1f0cc997cb6b52cdee699276fcce4/numpy-2.3.4-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:4121c5beb58a7f9e6dfdee612cb24f4df5cd4db6e8261d7f4d7450a997a65d6a", size = 16641271, upload-time = "2025-10-15T16:15:56.67Z" },
+    { url = "https://files.pythonhosted.org/packages/06/f2/2e06a0f2adf23e3ae29283ad96959267938d0efd20a2e25353b70065bfec/numpy-2.3.4-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:65611ecbb00ac9846efe04db15cbe6186f562f6bb7e5e05f077e53a599225d16", size = 16059531, upload-time = "2025-10-15T16:15:59.412Z" },
+    { url = "https://files.pythonhosted.org/packages/b0/e7/b106253c7c0d5dc352b9c8fab91afd76a93950998167fa3e5afe4ef3a18f/numpy-2.3.4-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:dabc42f9c6577bcc13001b8810d300fe814b4cfbe8a92c873f269484594f9786", size = 18578983, upload-time = "2025-10-15T16:16:01.804Z" },
+    { url = "https://files.pythonhosted.org/packages/73/e3/04ecc41e71462276ee867ccbef26a4448638eadecf1bc56772c9ed6d0255/numpy-2.3.4-cp312-cp312-win32.whl", hash = "sha256:a49d797192a8d950ca59ee2d0337a4d804f713bb5c3c50e8db26d49666e351dc", size = 6291380, upload-time = "2025-10-15T16:16:03.938Z" },
+    { url = "https://files.pythonhosted.org/packages/3d/a8/566578b10d8d0e9955b1b6cd5db4e9d4592dd0026a941ff7994cedda030a/numpy-2.3.4-cp312-cp312-win_amd64.whl", hash = "sha256:985f1e46358f06c2a09921e8921e2c98168ed4ae12ccd6e5e87a4f1857923f32", size = 12787999, upload-time = "2025-10-15T16:16:05.801Z" },
+    { url = "https://files.pythonhosted.org/packages/58/22/9c903a957d0a8071b607f5b1bff0761d6e608b9a965945411f867d515db1/numpy-2.3.4-cp312-cp312-win_arm64.whl", hash = "sha256:4635239814149e06e2cb9db3dd584b2fa64316c96f10656983b8026a82e6e4db", size = 10197412, upload-time = "2025-10-15T16:16:07.854Z" },
+    { url = "https://files.pythonhosted.org/packages/57/7e/b72610cc91edf138bc588df5150957a4937221ca6058b825b4725c27be62/numpy-2.3.4-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:c090d4860032b857d94144d1a9976b8e36709e40386db289aaf6672de2a81966", size = 20950335, upload-time = "2025-10-15T16:16:10.304Z" },
+    { url = "https://files.pythonhosted.org/packages/3e/46/bdd3370dcea2f95ef14af79dbf81e6927102ddf1cc54adc0024d61252fd9/numpy-2.3.4-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:a13fc473b6db0be619e45f11f9e81260f7302f8d180c49a22b6e6120022596b3", size = 14179878, upload-time = "2025-10-15T16:16:12.595Z" },
+    { url = "https://files.pythonhosted.org/packages/ac/01/5a67cb785bda60f45415d09c2bc245433f1c68dd82eef9c9002c508b5a65/numpy-2.3.4-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:3634093d0b428e6c32c3a69b78e554f0cd20ee420dcad5a9f3b2a63762ce4197", size = 5108673, upload-time = "2025-10-15T16:16:14.877Z" },
+    { url = "https://files.pythonhosted.org/packages/c2/cd/8428e23a9fcebd33988f4cb61208fda832800ca03781f471f3727a820704/numpy-2.3.4-cp313-cp313-macosx_14_0_x86_64.whl", hash = "sha256:043885b4f7e6e232d7df4f51ffdef8c36320ee9d5f227b380ea636722c7ed12e", size = 6641438, upload-time = "2025-10-15T16:16:16.805Z" },
+    { url = "https://files.pythonhosted.org/packages/3e/d1/913fe563820f3c6b079f992458f7331278dcd7ba8427e8e745af37ddb44f/numpy-2.3.4-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4ee6a571d1e4f0ea6d5f22d6e5fbd6ed1dc2b18542848e1e7301bd190500c9d7", size = 14281290, upload-time = "2025-10-15T16:16:18.764Z" },
+    { url = "https://files.pythonhosted.org/packages/9e/7e/7d306ff7cb143e6d975cfa7eb98a93e73495c4deabb7d1b5ecf09ea0fd69/numpy-2.3.4-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:fc8a63918b04b8571789688b2780ab2b4a33ab44bfe8ccea36d3eba51228c953", size = 16636543, upload-time = "2025-10-15T16:16:21.072Z" },
+    { url = "https://files.pythonhosted.org/packages/47/6a/8cfc486237e56ccfb0db234945552a557ca266f022d281a2f577b98e955c/numpy-2.3.4-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:40cc556d5abbc54aabe2b1ae287042d7bdb80c08edede19f0c0afb36ae586f37", size = 16056117, upload-time = "2025-10-15T16:16:23.369Z" },
+    { url = "https://files.pythonhosted.org/packages/b1/0e/42cb5e69ea901e06ce24bfcc4b5664a56f950a70efdcf221f30d9615f3f3/numpy-2.3.4-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:ecb63014bb7f4ce653f8be7f1df8cbc6093a5a2811211770f6606cc92b5a78fd", size = 18577788, upload-time = "2025-10-15T16:16:27.496Z" },
+    { url = "https://files.pythonhosted.org/packages/86/92/41c3d5157d3177559ef0a35da50f0cda7fa071f4ba2306dd36818591a5bc/numpy-2.3.4-cp313-cp313-win32.whl", hash = "sha256:e8370eb6925bb8c1c4264fec52b0384b44f675f191df91cbe0140ec9f0955646", size = 6282620, upload-time = "2025-10-15T16:16:29.811Z" },
+    { url = "https://files.pythonhosted.org/packages/09/97/fd421e8bc50766665ad35536c2bb4ef916533ba1fdd053a62d96cc7c8b95/numpy-2.3.4-cp313-cp313-win_amd64.whl", hash = "sha256:56209416e81a7893036eea03abcb91c130643eb14233b2515c90dcac963fe99d", size = 12784672, upload-time = "2025-10-15T16:16:31.589Z" },
+    { url = "https://files.pythonhosted.org/packages/ad/df/5474fb2f74970ca8eb978093969b125a84cc3d30e47f82191f981f13a8a0/numpy-2.3.4-cp313-cp313-win_arm64.whl", hash = "sha256:a700a4031bc0fd6936e78a752eefb79092cecad2599ea9c8039c548bc097f9bc", size = 10196702, upload-time = "2025-10-15T16:16:33.902Z" },
+    { url = "https://files.pythonhosted.org/packages/11/83/66ac031464ec1767ea3ed48ce40f615eb441072945e98693bec0bcd056cc/numpy-2.3.4-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:86966db35c4040fdca64f0816a1c1dd8dbd027d90fca5a57e00e1ca4cd41b879", size = 21049003, upload-time = "2025-10-15T16:16:36.101Z" },
+    { url = "https://files.pythonhosted.org/packages/5f/99/5b14e0e686e61371659a1d5bebd04596b1d72227ce36eed121bb0aeab798/numpy-2.3.4-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:838f045478638b26c375ee96ea89464d38428c69170360b23a1a50fa4baa3562", size = 14302980, upload-time = "2025-10-15T16:16:39.124Z" },
+    { url = "https://files.pythonhosted.org/packages/2c/44/e9486649cd087d9fc6920e3fc3ac2aba10838d10804b1e179fb7cbc4e634/numpy-2.3.4-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:d7315ed1dab0286adca467377c8381cd748f3dc92235f22a7dfc42745644a96a", size = 5231472, upload-time = "2025-10-15T16:16:41.168Z" },
+    { url = "https://files.pythonhosted.org/packages/3e/51/902b24fa8887e5fe2063fd61b1895a476d0bbf46811ab0c7fdf4bd127345/numpy-2.3.4-cp313-cp313t-macosx_14_0_x86_64.whl", hash = "sha256:84f01a4d18b2cc4ade1814a08e5f3c907b079c847051d720fad15ce37aa930b6", size = 6739342, upload-time = "2025-10-15T16:16:43.777Z" },
+    { url = "https://files.pythonhosted.org/packages/34/f1/4de9586d05b1962acdcdb1dc4af6646361a643f8c864cef7c852bf509740/numpy-2.3.4-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:817e719a868f0dacde4abdfc5c1910b301877970195db9ab6a5e2c4bd5b121f7", size = 14354338, upload-time = "2025-10-15T16:16:46.081Z" },
+    { url = "https://files.pythonhosted.org/packages/1f/06/1c16103b425de7969d5a76bdf5ada0804b476fed05d5f9e17b777f1cbefd/numpy-2.3.4-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:85e071da78d92a214212cacea81c6da557cab307f2c34b5f85b628e94803f9c0", size = 16702392, upload-time = "2025-10-15T16:16:48.455Z" },
+    { url = "https://files.pythonhosted.org/packages/34/b2/65f4dc1b89b5322093572b6e55161bb42e3e0487067af73627f795cc9d47/numpy-2.3.4-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:2ec646892819370cf3558f518797f16597b4e4669894a2ba712caccc9da53f1f", size = 16134998, upload-time = "2025-10-15T16:16:51.114Z" },
+    { url = "https://files.pythonhosted.org/packages/d4/11/94ec578896cdb973aaf56425d6c7f2aff4186a5c00fac15ff2ec46998b46/numpy-2.3.4-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:035796aaaddfe2f9664b9a9372f089cfc88bd795a67bd1bfe15e6e770934cf64", size = 18651574, upload-time = "2025-10-15T16:16:53.429Z" },
+    { url = "https://files.pythonhosted.org/packages/62/b7/7efa763ab33dbccf56dade36938a77345ce8e8192d6b39e470ca25ff3cd0/numpy-2.3.4-cp313-cp313t-win32.whl", hash = "sha256:fea80f4f4cf83b54c3a051f2f727870ee51e22f0248d3114b8e755d160b38cfb", size = 6413135, upload-time = "2025-10-15T16:16:55.992Z" },
+    { url = "https://files.pythonhosted.org/packages/43/70/aba4c38e8400abcc2f345e13d972fb36c26409b3e644366db7649015f291/numpy-2.3.4-cp313-cp313t-win_amd64.whl", hash = "sha256:15eea9f306b98e0be91eb344a94c0e630689ef302e10c2ce5f7e11905c704f9c", size = 12928582, upload-time = "2025-10-15T16:16:57.943Z" },
+    { url = "https://files.pythonhosted.org/packages/67/63/871fad5f0073fc00fbbdd7232962ea1ac40eeaae2bba66c76214f7954236/numpy-2.3.4-cp313-cp313t-win_arm64.whl", hash = "sha256:b6c231c9c2fadbae4011ca5e7e83e12dc4a5072f1a1d85a0a7b3ed754d145a40", size = 10266691, upload-time = "2025-10-15T16:17:00.048Z" },
+    { url = "https://files.pythonhosted.org/packages/72/71/ae6170143c115732470ae3a2d01512870dd16e0953f8a6dc89525696069b/numpy-2.3.4-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:81c3e6d8c97295a7360d367f9f8553973651b76907988bb6066376bc2252f24e", size = 20955580, upload-time = "2025-10-15T16:17:02.509Z" },
+    { url = "https://files.pythonhosted.org/packages/af/39/4be9222ffd6ca8a30eda033d5f753276a9c3426c397bb137d8e19dedd200/numpy-2.3.4-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:7c26b0b2bf58009ed1f38a641f3db4be8d960a417ca96d14e5b06df1506d41ff", size = 14188056, upload-time = "2025-10-15T16:17:04.873Z" },
+    { url = "https://files.pythonhosted.org/packages/6c/3d/d85f6700d0a4aa4f9491030e1021c2b2b7421b2b38d01acd16734a2bfdc7/numpy-2.3.4-cp314-cp314-macosx_14_0_arm64.whl", hash = "sha256:62b2198c438058a20b6704351b35a1d7db881812d8512d67a69c9de1f18ca05f", size = 5116555, upload-time = "2025-10-15T16:17:07.499Z" },
+    { url = "https://files.pythonhosted.org/packages/bf/04/82c1467d86f47eee8a19a464c92f90a9bb68ccf14a54c5224d7031241ffb/numpy-2.3.4-cp314-cp314-macosx_14_0_x86_64.whl", hash = "sha256:9d729d60f8d53a7361707f4b68a9663c968882dd4f09e0d58c044c8bf5faee7b", size = 6643581, upload-time = "2025-10-15T16:17:09.774Z" },
+    { url = "https://files.pythonhosted.org/packages/0c/d3/c79841741b837e293f48bd7db89d0ac7a4f2503b382b78a790ef1dc778a5/numpy-2.3.4-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:bd0c630cf256b0a7fd9d0a11c9413b42fef5101219ce6ed5a09624f5a65392c7", size = 14299186, upload-time = "2025-10-15T16:17:11.937Z" },
+    { url = "https://files.pythonhosted.org/packages/e8/7e/4a14a769741fbf237eec5a12a2cbc7a4c4e061852b6533bcb9e9a796c908/numpy-2.3.4-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:d5e081bc082825f8b139f9e9fe42942cb4054524598aaeb177ff476cc76d09d2", size = 16638601, upload-time = "2025-10-15T16:17:14.391Z" },
+    { url = "https://files.pythonhosted.org/packages/93/87/1c1de269f002ff0a41173fe01dcc925f4ecff59264cd8f96cf3b60d12c9b/numpy-2.3.4-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:15fb27364ed84114438fff8aaf998c9e19adbeba08c0b75409f8c452a8692c52", size = 16074219, upload-time = "2025-10-15T16:17:17.058Z" },
+    { url = "https://files.pythonhosted.org/packages/cd/28/18f72ee77408e40a76d691001ae599e712ca2a47ddd2c4f695b16c65f077/numpy-2.3.4-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:85d9fb2d8cd998c84d13a79a09cc0c1091648e848e4e6249b0ccd7f6b487fa26", size = 18576702, upload-time = "2025-10-15T16:17:19.379Z" },
+    { url = "https://files.pythonhosted.org/packages/c3/76/95650169b465ececa8cf4b2e8f6df255d4bf662775e797ade2025cc51ae6/numpy-2.3.4-cp314-cp314-win32.whl", hash = "sha256:e73d63fd04e3a9d6bc187f5455d81abfad05660b212c8804bf3b407e984cd2bc", size = 6337136, upload-time = "2025-10-15T16:17:22.886Z" },
+    { url = "https://files.pythonhosted.org/packages/dc/89/a231a5c43ede5d6f77ba4a91e915a87dea4aeea76560ba4d2bf185c683f0/numpy-2.3.4-cp314-cp314-win_amd64.whl", hash = "sha256:3da3491cee49cf16157e70f607c03a217ea6647b1cea4819c4f48e53d49139b9", size = 12920542, upload-time = "2025-10-15T16:17:24.783Z" },
+    { url = "https://files.pythonhosted.org/packages/0d/0c/ae9434a888f717c5ed2ff2393b3f344f0ff6f1c793519fa0c540461dc530/numpy-2.3.4-cp314-cp314-win_arm64.whl", hash = "sha256:6d9cd732068e8288dbe2717177320723ccec4fb064123f0caf9bbd90ab5be868", size = 10480213, upload-time = "2025-10-15T16:17:26.935Z" },
+    { url = "https://files.pythonhosted.org/packages/83/4b/c4a5f0841f92536f6b9592694a5b5f68c9ab37b775ff342649eadf9055d3/numpy-2.3.4-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:22758999b256b595cf0b1d102b133bb61866ba5ceecf15f759623b64c020c9ec", size = 21052280, upload-time = "2025-10-15T16:17:29.638Z" },
+    { url = "https://files.pythonhosted.org/packages/3e/80/90308845fc93b984d2cc96d83e2324ce8ad1fd6efea81b324cba4b673854/numpy-2.3.4-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:9cb177bc55b010b19798dc5497d540dea67fd13a8d9e882b2dae71de0cf09eb3", size = 14302930, upload-time = "2025-10-15T16:17:32.384Z" },
+    { url = "https://files.pythonhosted.org/packages/3d/4e/07439f22f2a3b247cec4d63a713faae55e1141a36e77fb212881f7cda3fb/numpy-2.3.4-cp314-cp314t-macosx_14_0_arm64.whl", hash = "sha256:0f2bcc76f1e05e5ab58893407c63d90b2029908fa41f9f1cc51eecce936c3365", size = 5231504, upload-time = "2025-10-15T16:17:34.515Z" },
+    { url = "https://files.pythonhosted.org/packages/ab/de/1e11f2547e2fe3d00482b19721855348b94ada8359aef5d40dd57bfae9df/numpy-2.3.4-cp314-cp314t-macosx_14_0_x86_64.whl", hash = "sha256:8dc20bde86802df2ed8397a08d793da0ad7a5fd4ea3ac85d757bf5dd4ad7c252", size = 6739405, upload-time = "2025-10-15T16:17:36.128Z" },
+    { url = "https://files.pythonhosted.org/packages/3b/40/8cd57393a26cebe2e923005db5134a946c62fa56a1087dc7c478f3e30837/numpy-2.3.4-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5e199c087e2aa71c8f9ce1cb7a8e10677dc12457e7cc1be4798632da37c3e86e", size = 14354866, upload-time = "2025-10-15T16:17:38.884Z" },
+    { url = "https://files.pythonhosted.org/packages/93/39/5b3510f023f96874ee6fea2e40dfa99313a00bf3ab779f3c92978f34aace/numpy-2.3.4-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:85597b2d25ddf655495e2363fe044b0ae999b75bc4d630dc0d886484b03a5eb0", size = 16703296, upload-time = "2025-10-15T16:17:41.564Z" },
+    { url = "https://files.pythonhosted.org/packages/41/0d/19bb163617c8045209c1996c4e427bccbc4bbff1e2c711f39203c8ddbb4a/numpy-2.3.4-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:04a69abe45b49c5955923cf2c407843d1c85013b424ae8a560bba16c92fe44a0", size = 16136046, upload-time = "2025-10-15T16:17:43.901Z" },
+    { url = "https://files.pythonhosted.org/packages/e2/c1/6dba12fdf68b02a21ac411c9df19afa66bed2540f467150ca64d246b463d/numpy-2.3.4-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:e1708fac43ef8b419c975926ce1eaf793b0c13b7356cfab6ab0dc34c0a02ac0f", size = 18652691, upload-time = "2025-10-15T16:17:46.247Z" },
+    { url = "https://files.pythonhosted.org/packages/f8/73/f85056701dbbbb910c51d846c58d29fd46b30eecd2b6ba760fc8b8a1641b/numpy-2.3.4-cp314-cp314t-win32.whl", hash = "sha256:863e3b5f4d9915aaf1b8ec79ae560ad21f0b8d5e3adc31e73126491bb86dee1d", size = 6485782, upload-time = "2025-10-15T16:17:48.872Z" },
+    { url = "https://files.pythonhosted.org/packages/17/90/28fa6f9865181cb817c2471ee65678afa8a7e2a1fb16141473d5fa6bacc3/numpy-2.3.4-cp314-cp314t-win_amd64.whl", hash = "sha256:962064de37b9aef801d33bc579690f8bfe6c5e70e29b61783f60bcba838a14d6", size = 13113301, upload-time = "2025-10-15T16:17:50.938Z" },
+    { url = "https://files.pythonhosted.org/packages/54/23/08c002201a8e7e1f9afba93b97deceb813252d9cfd0d3351caed123dcf97/numpy-2.3.4-cp314-cp314t-win_arm64.whl", hash = "sha256:8b5a9a39c45d852b62693d9b3f3e0fe052541f804296ff401a72a1b60edafb29", size = 10547532, upload-time = "2025-10-15T16:17:53.48Z" },
+    { url = "https://files.pythonhosted.org/packages/b1/b6/64898f51a86ec88ca1257a59c1d7fd077b60082a119affefcdf1dd0df8ca/numpy-2.3.4-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:6e274603039f924c0fe5cb73438fa9246699c78a6df1bd3decef9ae592ae1c05", size = 21131552, upload-time = "2025-10-15T16:17:55.845Z" },
+    { url = "https://files.pythonhosted.org/packages/ce/4c/f135dc6ebe2b6a3c77f4e4838fa63d350f85c99462012306ada1bd4bc460/numpy-2.3.4-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:d149aee5c72176d9ddbc6803aef9c0f6d2ceeea7626574fc68518da5476fa346", size = 14377796, upload-time = "2025-10-15T16:17:58.308Z" },
+    { url = "https://files.pythonhosted.org/packages/d0/a4/f33f9c23fcc13dd8412fc8614559b5b797e0aba9d8e01dfa8bae10c84004/numpy-2.3.4-pp311-pypy311_pp73-macosx_14_0_arm64.whl", hash = "sha256:6d34ed9db9e6395bb6cd33286035f73a59b058169733a9db9f85e650b88df37e", size = 5306904, upload-time = "2025-10-15T16:18:00.596Z" },
+    { url = "https://files.pythonhosted.org/packages/28/af/c44097f25f834360f9fb960fa082863e0bad14a42f36527b2a121abdec56/numpy-2.3.4-pp311-pypy311_pp73-macosx_14_0_x86_64.whl", hash = "sha256:fdebe771ca06bb8d6abce84e51dca9f7921fe6ad34a0c914541b063e9a68928b", size = 6819682, upload-time = "2025-10-15T16:18:02.32Z" },
+    { url = "https://files.pythonhosted.org/packages/c5/8c/cd283b54c3c2b77e188f63e23039844f56b23bba1712318288c13fe86baf/numpy-2.3.4-pp311-pypy311_pp73-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:957e92defe6c08211eb77902253b14fe5b480ebc5112bc741fd5e9cd0608f847", size = 14422300, upload-time = "2025-10-15T16:18:04.271Z" },
+    { url = "https://files.pythonhosted.org/packages/b0/f0/8404db5098d92446b3e3695cf41c6f0ecb703d701cb0b7566ee2177f2eee/numpy-2.3.4-pp311-pypy311_pp73-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:13b9062e4f5c7ee5c7e5be96f29ba71bc5a37fed3d1d77c37390ae00724d296d", size = 16760806, upload-time = "2025-10-15T16:18:06.668Z" },
+    { url = "https://files.pythonhosted.org/packages/95/8e/2844c3959ce9a63acc7c8e50881133d86666f0420bcde695e115ced0920f/numpy-2.3.4-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:81b3a59793523e552c4a96109dde028aa4448ae06ccac5a76ff6532a85558a7f", size = 12973130, upload-time = "2025-10-15T16:18:09.397Z" },
+]
+
+[[package]]
+name = "opentelemetry-api"
+version = "1.38.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "importlib-metadata" },
+    { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/08/d8/0f354c375628e048bd0570645b310797299754730079853095bf000fba69/opentelemetry_api-1.38.0.tar.gz", hash = "sha256:f4c193b5e8acb0912b06ac5b16321908dd0843d75049c091487322284a3eea12", size = 65242, upload-time = "2025-10-16T08:35:50.25Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/ae/a2/d86e01c28300bd41bab8f18afd613676e2bd63515417b77636fc1add426f/opentelemetry_api-1.38.0-py3-none-any.whl", hash = "sha256:2891b0197f47124454ab9f0cf58f3be33faca394457ac3e09daba13ff50aa582", size = 65947, upload-time = "2025-10-16T08:35:30.23Z" },
+]
+
+[[package]]
+name = "opentelemetry-exporter-otlp-proto-common"
+version = "1.38.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "opentelemetry-proto" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/19/83/dd4660f2956ff88ed071e9e0e36e830df14b8c5dc06722dbde1841accbe8/opentelemetry_exporter_otlp_proto_common-1.38.0.tar.gz", hash = "sha256:e333278afab4695aa8114eeb7bf4e44e65c6607d54968271a249c180b2cb605c", size = 20431, upload-time = "2025-10-16T08:35:53.285Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/a7/9e/55a41c9601191e8cd8eb626b54ee6827b9c9d4a46d736f32abc80d8039fc/opentelemetry_exporter_otlp_proto_common-1.38.0-py3-none-any.whl", hash = "sha256:03cb76ab213300fe4f4c62b7d8f17d97fcfd21b89f0b5ce38ea156327ddda74a", size = 18359, upload-time = "2025-10-16T08:35:34.099Z" },
+]
+
+[[package]]
+name = "opentelemetry-exporter-otlp-proto-grpc"
+version = "1.38.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "googleapis-common-protos" },
+    { name = "grpcio" },
+    { name = "opentelemetry-api" },
+    { name = "opentelemetry-exporter-otlp-proto-common" },
+    { name = "opentelemetry-proto" },
+    { name = "opentelemetry-sdk" },
+    { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/a2/c0/43222f5b97dc10812bc4f0abc5dc7cd0a2525a91b5151d26c9e2e958f52e/opentelemetry_exporter_otlp_proto_grpc-1.38.0.tar.gz", hash = "sha256:2473935e9eac71f401de6101d37d6f3f0f1831db92b953c7dcc912536158ebd6", size = 24676, upload-time = "2025-10-16T08:35:53.83Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/28/f0/bd831afbdba74ca2ce3982142a2fad707f8c487e8a3b6fef01f1d5945d1b/opentelemetry_exporter_otlp_proto_grpc-1.38.0-py3-none-any.whl", hash = "sha256:7c49fd9b4bd0dbe9ba13d91f764c2d20b0025649a6e4ac35792fb8d84d764bc7", size = 19695, upload-time = "2025-10-16T08:35:35.053Z" },
+]
+
+[[package]]
+name = "opentelemetry-instrumentation"
+version = "0.59b0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "opentelemetry-api" },
+    { name = "opentelemetry-semantic-conventions" },
+    { name = "packaging" },
+    { name = "wrapt" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/04/ed/9c65cd209407fd807fa05be03ee30f159bdac8d59e7ea16a8fe5a1601222/opentelemetry_instrumentation-0.59b0.tar.gz", hash = "sha256:6010f0faaacdaf7c4dff8aac84e226d23437b331dcda7e70367f6d73a7db1adc", size = 31544, upload-time = "2025-10-16T08:39:31.959Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/10/f5/7a40ff3f62bfe715dad2f633d7f1174ba1a7dd74254c15b2558b3401262a/opentelemetry_instrumentation-0.59b0-py3-none-any.whl", hash = "sha256:44082cc8fe56b0186e87ee8f7c17c327c4c2ce93bdbe86496e600985d74368ee", size = 33020, upload-time = "2025-10-16T08:38:31.463Z" },
+]
+
+[[package]]
+name = "opentelemetry-instrumentation-asgi"
+version = "0.59b0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "asgiref" },
+    { name = "opentelemetry-api" },
+    { name = "opentelemetry-instrumentation" },
+    { name = "opentelemetry-semantic-conventions" },
+    { name = "opentelemetry-util-http" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/b7/a4/cfbb6fc1ec0aa9bf5a93f548e6a11ab3ac1956272f17e0d399aa2c1f85bc/opentelemetry_instrumentation_asgi-0.59b0.tar.gz", hash = "sha256:2509d6fe9fd829399ce3536e3a00426c7e3aa359fc1ed9ceee1628b56da40e7a", size = 25116, upload-time = "2025-10-16T08:39:36.092Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/f3/88/fe02d809963b182aafbf5588685d7a05af8861379b0ec203d48e360d4502/opentelemetry_instrumentation_asgi-0.59b0-py3-none-any.whl", hash = "sha256:ba9703e09d2c33c52fa798171f344c8123488fcd45017887981df088452d3c53", size = 16797, upload-time = "2025-10-16T08:38:37.214Z" },
+]
+
+[[package]]
+name = "opentelemetry-instrumentation-httpx"
+version = "0.59b0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "opentelemetry-api" },
+    { name = "opentelemetry-instrumentation" },
+    { name = "opentelemetry-semantic-conventions" },
+    { name = "opentelemetry-util-http" },
+    { name = "wrapt" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/18/6b/1bdf36b68cace9b4eae3cbbade4150c71c90aa392b127dda5bb5c2a49307/opentelemetry_instrumentation_httpx-0.59b0.tar.gz", hash = "sha256:a1cb9b89d9f05a82701cc9ab9cfa3db54fd76932489449778b350bc1b9f0e872", size = 19886, upload-time = "2025-10-16T08:39:48.428Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/58/16/c1e0745d20af392ec9060693531d7f01239deb2d81e460d0c379719691b8/opentelemetry_instrumentation_httpx-0.59b0-py3-none-any.whl", hash = "sha256:7dc9f66aef4ca3904d877f459a70c78eafd06131dc64d713b9b1b5a7d0a48f05", size = 15197, upload-time = "2025-10-16T08:38:55.507Z" },
+]
+
+[[package]]
+name = "opentelemetry-instrumentation-logging"
+version = "0.59b0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "opentelemetry-api" },
+    { name = "opentelemetry-instrumentation" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/be/88/9c5f70fa8b8d96d30be378fc6eb1776e13aea456db15009f4eaef4928847/opentelemetry_instrumentation_logging-0.59b0.tar.gz", hash = "sha256:1b51116444edc74f699daf9002ded61529397100c9bc903c8b9aaa75a5218c76", size = 9969, upload-time = "2025-10-16T08:39:51.653Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/2c/a0/340cc45d71437c2f7e27f13c1d2e335b18bbc7a24fd7d174018500b3c7ba/opentelemetry_instrumentation_logging-0.59b0-py3-none-any.whl", hash = "sha256:fdd4eddbd093fc421df8f7d356ecb15b320a1f3396b56bce5543048a5c457eea", size = 12577, upload-time = "2025-10-16T08:38:58.064Z" },
+]
+
+[[package]]
+name = "opentelemetry-proto"
+version = "1.38.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "protobuf" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/51/14/f0c4f0f6371b9cb7f9fa9ee8918bfd59ac7040c7791f1e6da32a1839780d/opentelemetry_proto-1.38.0.tar.gz", hash = "sha256:88b161e89d9d372ce723da289b7da74c3a8354a8e5359992be813942969ed468", size = 46152, upload-time = "2025-10-16T08:36:01.612Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/b6/6a/82b68b14efca5150b2632f3692d627afa76b77378c4999f2648979409528/opentelemetry_proto-1.38.0-py3-none-any.whl", hash = "sha256:b6ebe54d3217c42e45462e2a1ae28c3e2bf2ec5a5645236a490f55f45f1a0a18", size = 72535, upload-time = "2025-10-16T08:35:45.749Z" },
+]
+
+[[package]]
+name = "opentelemetry-sdk"
+version = "1.38.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "opentelemetry-api" },
+    { name = "opentelemetry-semantic-conventions" },
+    { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/85/cb/f0eee1445161faf4c9af3ba7b848cc22a50a3d3e2515051ad8628c35ff80/opentelemetry_sdk-1.38.0.tar.gz", hash = "sha256:93df5d4d871ed09cb4272305be4d996236eedb232253e3ab864c8620f051cebe", size = 171942, upload-time = "2025-10-16T08:36:02.257Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/2f/2e/e93777a95d7d9c40d270a371392b6d6f1ff170c2a3cb32d6176741b5b723/opentelemetry_sdk-1.38.0-py3-none-any.whl", hash = "sha256:1c66af6564ecc1553d72d811a01df063ff097cdc82ce188da9951f93b8d10f6b", size = 132349, upload-time = "2025-10-16T08:35:46.995Z" },
+]
+
+[[package]]
+name = "opentelemetry-semantic-conventions"
+version = "0.59b0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "opentelemetry-api" },
+    { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/40/bc/8b9ad3802cd8ac6583a4eb7de7e5d7db004e89cb7efe7008f9c8a537ee75/opentelemetry_semantic_conventions-0.59b0.tar.gz", hash = "sha256:7a6db3f30d70202d5bf9fa4b69bc866ca6a30437287de6c510fb594878aed6b0", size = 129861, upload-time = "2025-10-16T08:36:03.346Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/24/7d/c88d7b15ba8fe5c6b8f93be50fc11795e9fc05386c44afaf6b76fe191f9b/opentelemetry_semantic_conventions-0.59b0-py3-none-any.whl", hash = "sha256:35d3b8833ef97d614136e253c1da9342b4c3c083bbaf29ce31d572a1c3825eed", size = 207954, upload-time = "2025-10-16T08:35:48.054Z" },
+]
+
+[[package]]
+name = "opentelemetry-util-http"
+version = "0.59b0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/34/f7/13cd081e7851c42520ab0e96efb17ffbd901111a50b8252ec1e240664020/opentelemetry_util_http-0.59b0.tar.gz", hash = "sha256:ae66ee91be31938d832f3b4bc4eb8a911f6eddd38969c4a871b1230db2a0a560", size = 9412, upload-time = "2025-10-16T08:40:11.335Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/20/56/62282d1d4482061360449dacc990c89cad0fc810a2ed937b636300f55023/opentelemetry_util_http-0.59b0-py3-none-any.whl", hash = "sha256:6d036a07563bce87bf521839c0671b507a02a0d39d7ea61b88efa14c6e25355d", size = 7648, upload-time = "2025-10-16T08:39:25.706Z" },
+]
+
 [[package]]
 name = "packaging"
 version = "25.0"
@@ -1181,6 +1515,27 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/54/20/4d324d65cc6d9205fabedc306948156824eb9f0ee1633355a8f7ec5c66bf/pluggy-1.6.0-py3-none-any.whl", hash = "sha256:e920276dd6813095e9377c0bc5566d94c932c33b27a3e3945d8389c374dd4746", size = 20538, upload-time = "2025-05-15T12:30:06.134Z" },
 ]

+[[package]]
+name = "portalocker"
+version = "3.2.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "pywin32", marker = "sys_platform == 'win32'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/5e/77/65b857a69ed876e1951e88aaba60f5ce6120c33703f7cb61a3c894b8c1b6/portalocker-3.2.0.tar.gz", hash = "sha256:1f3002956a54a8c3730586c5c77bf18fae4149e07eaf1c29fc3faf4d5a3f89ac", size = 95644, upload-time = "2025-06-14T13:20:40.03Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/4b/a6/38c8e2f318bf67d338f4d629e93b0b4b9af331f455f0390ea8ce4a099b26/portalocker-3.2.0-py3-none-any.whl", hash = "sha256:3cdc5f565312224bc570c49337bd21428bba0ef363bbcf58b9ef4a9f11779968", size = 22424, upload-time = "2025-06-14T13:20:38.083Z" },
+]
+
+[[package]]
+name = "prometheus-client"
+version = "0.23.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/23/53/3edb5d68ecf6b38fcbcc1ad28391117d2a322d9a1a3eff04bfdb184d8c3b/prometheus_client-0.23.1.tar.gz", hash = "sha256:6ae8f9081eaaaf153a2e959d2e6c4f4fb57b12ef76c8c7980202f1e57b48b2ce", size = 80481, upload-time = "2025-09-18T20:47:25.043Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/b8/db/14bafcb4af2139e046d03fd00dea7873e48eafe18b7d2797e73d6681f210/prometheus_client-0.23.1-py3-none-any.whl", hash = "sha256:dd1913e6e76b59cfe44e7a4b83e01afc9873c1bdfd2ed8739f1e76aeca115f99", size = 61145, upload-time = "2025-09-18T20:47:23.875Z" },
+]
+
 [[package]]
 name = "prompt-toolkit"
 version = "3.0.51"
@@ -1193,6 +1548,21 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/ce/4f/5249960887b1fbe561d9ff265496d170b55a735b76724f10ef19f9e40716/prompt_toolkit-3.0.51-py3-none-any.whl", hash = "sha256:52742911fde84e2d423e2f9a4cf1de7d7ac4e51958f648d9540e0fb8db077b07", size = 387810, upload-time = "2025-04-15T09:18:44.753Z" },
 ]

+[[package]]
+name = "protobuf"
+version = "6.33.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/19/ff/64a6c8f420818bb873713988ca5492cba3a7946be57e027ac63495157d97/protobuf-6.33.0.tar.gz", hash = "sha256:140303d5c8d2037730c548f8c7b93b20bb1dc301be280c378b82b8894589c954", size = 443463, upload-time = "2025-10-15T20:39:52.159Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/7e/ee/52b3fa8feb6db4a833dfea4943e175ce645144532e8a90f72571ad85df4e/protobuf-6.33.0-cp310-abi3-win32.whl", hash = "sha256:d6101ded078042a8f17959eccd9236fb7a9ca20d3b0098bbcb91533a5680d035", size = 425593, upload-time = "2025-10-15T20:39:40.29Z" },
+    { url = "https://files.pythonhosted.org/packages/7b/c6/7a465f1825872c55e0341ff4a80198743f73b69ce5d43ab18043699d1d81/protobuf-6.33.0-cp310-abi3-win_amd64.whl", hash = "sha256:9a031d10f703f03768f2743a1c403af050b6ae1f3480e9c140f39c45f81b13ee", size = 436882, upload-time = "2025-10-15T20:39:42.841Z" },
+    { url = "https://files.pythonhosted.org/packages/e1/a9/b6eee662a6951b9c3640e8e452ab3e09f117d99fc10baa32d1581a0d4099/protobuf-6.33.0-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:905b07a65f1a4b72412314082c7dbfae91a9e8b68a0cc1577515f8df58ecf455", size = 427521, upload-time = "2025-10-15T20:39:43.803Z" },
+    { url = "https://files.pythonhosted.org/packages/10/35/16d31e0f92c6d2f0e77c2a3ba93185130ea13053dd16200a57434c882f2b/protobuf-6.33.0-cp39-abi3-manylinux2014_aarch64.whl", hash = "sha256:e0697ece353e6239b90ee43a9231318302ad8353c70e6e45499fa52396debf90", size = 324445, upload-time = "2025-10-15T20:39:44.932Z" },
+    { url = "https://files.pythonhosted.org/packages/e6/eb/2a981a13e35cda8b75b5585aaffae2eb904f8f351bdd3870769692acbd8a/protobuf-6.33.0-cp39-abi3-manylinux2014_s390x.whl", hash = "sha256:e0a1715e4f27355afd9570f3ea369735afc853a6c3951a6afe1f80d8569ad298", size = 339159, upload-time = "2025-10-15T20:39:46.186Z" },
+    { url = "https://files.pythonhosted.org/packages/21/51/0b1cbad62074439b867b4e04cc09b93f6699d78fd191bed2bbb44562e077/protobuf-6.33.0-cp39-abi3-manylinux2014_x86_64.whl", hash = "sha256:35be49fd3f4fefa4e6e2aacc35e8b837d6703c37a2168a55ac21e9b1bc7559ef", size = 323172, upload-time = "2025-10-15T20:39:47.465Z" },
+    { url = "https://files.pythonhosted.org/packages/07/d1/0a28c21707807c6aacd5dc9c3704b2aa1effbf37adebd8caeaf68b17a636/protobuf-6.33.0-py3-none-any.whl", hash = "sha256:25c9e1963c6734448ea2d308cfa610e692b801304ba0908d7bfa564ac5132995", size = 170477, upload-time = "2025-10-15T20:39:51.311Z" },
+]
+
 [[package]]
 name = "ptyprocess"
 version = "0.7.0"
@@ -1494,6 +1864,15 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/5f/ed/539768cf28c661b5b068d66d96a2f155c4971a5d55684a514c1a0e0dec2f/python_dotenv-1.1.1-py3-none-any.whl", hash = "sha256:31f23644fe2602f88ff55e1f5c79ba497e01224ee7737937930c448e4d0e24dc", size = 20556, upload-time = "2025-06-24T04:21:06.073Z" },
 ]

+[[package]]
+name = "python-json-logger"
+version = "4.0.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/29/bf/eca6a3d43db1dae7070f70e160ab20b807627ba953663ba07928cdd3dc58/python_json_logger-4.0.0.tar.gz", hash = "sha256:f58e68eb46e1faed27e0f574a55a0455eecd7b8a5b88b85a784519ba3cff047f", size = 17683, upload-time = "2025-10-06T04:15:18.984Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/51/e5/fecf13f06e5e5f67e8837d777d1bc43fac0ed2b77a676804df5c34744727/python_json_logger-4.0.0-py3-none-any.whl", hash = "sha256:af09c9daf6a813aa4cc7180395f50f2a9e5fa056034c9953aec92e381c5ba1e2", size = 15548, upload-time = "2025-10-06T04:15:17.553Z" },
+]
+
 [[package]]
 name = "python-multipart"
 version = "0.0.20"
@@ -1598,6 +1977,24 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/f1/12/de94a39c2ef588c7e6455cfbe7343d3b2dc9d6b6b2f40c4c6565744c873d/pyyaml-6.0.3-cp314-cp314t-win_arm64.whl", hash = "sha256:ebc55a14a21cb14062aa4162f906cd962b28e2e9ea38f9b4391244cd8de4ae0b", size = 149341, upload-time = "2025-09-25T21:32:56.828Z" },
 ]

+[[package]]
+name = "qdrant-client"
+version = "1.15.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "grpcio" },
+    { name = "httpx", extra = ["http2"] },
+    { name = "numpy" },
+    { name = "portalocker" },
+    { name = "protobuf" },
+    { name = "pydantic" },
+    { name = "urllib3" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/79/8b/76c7d325e11d97cb8eb5e261c3759e9ed6664735afbf32fdded5b580690c/qdrant_client-1.15.1.tar.gz", hash = "sha256:631f1f3caebfad0fd0c1fba98f41be81d9962b7bf3ca653bed3b727c0e0cbe0e", size = 295297, upload-time = "2025-07-31T19:35:19.627Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/ef/33/d8df6a2b214ffbe4138db9a1efe3248f67dc3c671f82308bea1582ecbbb7/qdrant_client-1.15.1-py3-none-any.whl", hash = "sha256:2b975099b378382f6ca1cfb43f0d59e541be6e16a5892f282a4b8de7eff5cb63", size = 337331, upload-time = "2025-07-31T19:35:17.539Z" },
+]
+
 [[package]]
 name = "questionary"
 version = "2.1.1"
@@ -2138,3 +2535,12 @@ sdist = { url = "https://files.pythonhosted.org/packages/79/2b/8ae5f59ab852c8fe3
 wheels = [
    { url = "https://files.pythonhosted.org/packages/0f/b7/4bac35b4079b76c07d8faddf89467e9891b1610cfe8d03b0ebb5610e4423/x_wr_timezone-2.0.1-py3-none-any.whl", hash = "sha256:e74a53b9f4f7def8138455c240e65e47c224778bce3c024fcd6da2cbe91ca038", size = 11102, upload-time = "2025-02-06T17:10:39.192Z" },
 ]
+
+[[package]]
+name = "zipp"
+version = "3.23.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/e3/02/0f2892c661036d50ede074e376733dca2ae7c6eb617489437771209d4180/zipp-3.23.0.tar.gz", hash = "sha256:a07157588a12518c9d4034df3fbbee09c814741a33ff63c05fa29d26a2404166", size = 25547, upload-time = "2025-06-08T17:06:39.4Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/2e/54/647ade08bf0db230bfea292f893923872fd20be6ac6f53b2b936ba839d75/zipp-3.23.0-py3-none-any.whl", hash = "sha256:071652d6115ed432f5ce1d34c336c0adfd6a884660d1e9712a256d3d3bd4b14e", size = 10276, upload-time = "2025-06-08T17:06:38.034Z" },
+]