bump: version 0.30.0 → 0.31.0

Merge pull request #283 from cbcoutinho/feat/adr-010-webhook-vector-sync
docs: Add ADR-010 for webhook-based vector sync
2025-11-10 07:02:49 +00:00 · 2025-11-10 08:02:22 +01:00 · 2025-11-10 07:41:02 +01:00 · 2025-11-10 07:24:27 +01:00 · 2025-11-10 07:19:26 +01:00 · 2025-11-10 06:48:01 +01:00
50 changed files with 5751 additions and 724 deletions
@@ -25,7 +25,7 @@ jobs:
          github_token: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          changelog_increment_filename: body.md
      - name: Release
-        uses: softprops/action-gh-release@6da8fa9354ddfdc4aeace5fc48d7f679b5214090 # v2.4.1
+        uses: softprops/action-gh-release@5be0e66d93ac7ed76da52eca8bb058f665c3a5fe # v2.4.2
        with:
          body_path: "body.md"
          tag_name: v${{ env.REVISION }}
@@ -24,6 +24,18 @@ jobs:
          git config user.name "$GITHUB_ACTOR"
          git config user.email "$GITHUB_ACTOR@users.noreply.github.com"

+      - name: Install Helm
+        uses: azure/setup-helm@1a275c3b69536ee54be43f2070a358922e12c8d4 # v4.3.1
+        with:
+          version: v3.16.0
+
+      - name: Add Helm repositories and update dependencies
+        run: |
+          helm repo add qdrant https://qdrant.github.io/qdrant-helm
+          helm repo add ollama https://otwld.github.io/ollama-helm
+          helm repo update
+          helm dependency build charts/nextcloud-mcp-server
+
      - name: Run chart-releaser
        uses: helm/chart-releaser-action@cae68fefc6b5f367a0275617c9f83181ba54714f # v1.7.0
        env:
@@ -52,6 +52,7 @@ jobs:
        uses: hoverkraft-tech/compose-action@3846bcd61da338e9eaaf83e7ed0234a12b099b72 # v2.4.1
        with:
          compose-file: "./docker-compose.yml"
+          #compose-flags: "--profile qdrant"
          up-flags: "--build"

      - name: Install the latest version of uv
@@ -1,3 +1,100 @@
+## v0.31.0 (2025-11-10)
+
+### Feat
+
+- skip tracing for health and metrics endpoints
+
+### Fix
+
+- add retry logic for ETag conflicts in category change test
+- optimize Notes API pagination with pruneBefore parameter
+
+## v0.30.0 (2025-11-10)
+
+### Feat
+
+- **helm**: Add document chunking configuration
+- **vector**: Add configurable chunk size and overlap for document embedding
+- **vector**: Support multiple embedding models with auto-generated collection names
+
+### Fix
+
+- Support in-memory Qdrant for CI testing
+
+## v0.29.2 (2025-11-09)
+
+### Fix
+
+- **helm**: Set default strategy to Recreate
+
+## v0.29.1 (2025-11-09)
+
+### Fix
+
+- **observability**: isolate metrics endpoint to dedicated port
+
+## v0.29.0 (2025-11-09)
+
+### Feat
+
+- **helm**: Add observability support with ServiceMonitor and Grafana dashboard
+
+### Fix
+
+- **readiness**: Only check external Qdrant in network mode
+
+## v0.28.0 (2025-11-09)
+
+### Feat
+
+- **observability**: Add comprehensive monitoring with Prometheus and OpenTelemetry
+
+### Fix
+
+- **vector**: Handle missing 'modified' field in notes gracefully
+
+## v0.27.3 (2025-11-09)
+
+### Fix
+
+- **ci**: Use helm dependency build instead of update to use Chart.lock
+
+## v0.27.2 (2025-11-09)
+
+### Fix
+
+- **helm**: update Qdrant dependency condition to match new mode structure
+
+## v0.27.1 (2025-11-09)
+
+### Fix
+
+- **ci**: add Helm repository setup to chart release workflow
+
+## v0.27.0 (2025-11-09)
+
+### Feat
+
+- **helm**: add Qdrant local mode support with three deployment options [skip ci]
+- add Qdrant local mode support with in-memory and persistent storage
+- implement ADR-009 - refactor semantic search to use generic semantic:read scope
+- implement MCP sampling for semantic search RAG (ADR-008)
+- add optional vector database and semantic search to helm chart
+- add vector sync processing status to /user/page endpoint
+- implement semantic search tool and fix vector sync issues (ADR-007 Phase 3)
+- implement vector sync scanner and processor (ADR-007 Phase 2)
+
+### Fix
+
+- implement deletion grace period and vector sync status tool
+- remove unnecessary urllib3<2.0 constraint
+- integrate vector sync tasks with Starlette lifespan for streamable-http
+
+### Refactor
+
+- migrate vector sync from asyncio.Queue to anyio memory object streams
+- update to Qdrant query_points API and fix Playwright Keycloak login
+
 ## v0.26.1 (2025-11-08)

 ### Fix
@@ -391,3 +391,7 @@ docker compose exec app php occ user_oidc:provider keycloak
 - `docs/configuration.md` - Configuration options
 - `docs/authentication.md` - Authentication modes
 - `docs/running.md` - Running the server
+
+**For additional MCP references during development, see:**
+- `../../Software/modelcontextprotocol/` - MCP spec
+- `../../Software/python-sdk/` - Python MCP SDK
@@ -2,286 +2,134 @@

 [![Docker Image](https://img.shields.io/badge/docker-ghcr.io/cbcoutinho/nextcloud--mcp--server-blue)](https://github.com/cbcoutinho/nextcloud-mcp-server/pkgs/container/nextcloud-mcp-server)

-**Enable AI assistants to interact with your Nextcloud instance.**
+**A production-ready MCP server that connects AI assistants to your Nextcloud instance.**

-The Nextcloud MCP (Model Context Protocol) server allows Large Language Models like Claude, GPT, and Gemini to interact with your Nextcloud data through a secure API. Create notes, manage calendars, organize contacts, work with files, and more - all through natural language.
+Enable Large Language Models like Claude, GPT, and Gemini to interact with your Nextcloud data through a secure API. Create notes, manage calendars, organize contacts, work with files, and more - all through natural language conversations.
+
+This is a **dedicated standalone MCP server** designed for external MCP clients like Claude Code and IDEs. It runs independently of Nextcloud (Docker, VM, Kubernetes, or local) and provides deep CRUD operations across Nextcloud apps.

 > [!NOTE]
-> **Nextcloud has two ways to enable AI access:** Nextcloud provides [Context Agent](https://github.com/nextcloud/context_agent), an AI agent backend that powers the [Assistant](https://github.com/nextcloud/assistant) app and allows AI to interact with Nextcloud apps like Calendar, Talk, and Contacts. Context Agent runs as an ExApp inside Nextcloud and also _[exposes an MCP server](https://docs.nextcloud.com/server/stable/admin_manual/ai/app_context_agent.html#using-nextcloud-mcp-server)_ for external MCP clients.
->
-> This project (Nextcloud MCP Server) is a **dedicated standalone MCP server** designed specifically for external MCP clients like Claude Code and IDEs, with deep CRUD operations and OAuth support. It does not require any additional AI-features to be enabled in Nextcloud beyond the apps that you intend to interact with.
-
-### High-level Comparison: Nextcloud MCP Server vs. Nextcloud AI Stack
-
-| Aspect | **Nextcloud MCP Server**<br/>(This Project) | **Nextcloud AI Stack**<br/>(Assistant + Context Agent) |
-|--------|---------------------------------------------|--------------------------------------------------------|
-| **Purpose** | External MCP client access to Nextcloud | AI assistance within Nextcloud UI |
-| **Deployment** | Standalone (Docker, VM, K8s) | Inside Nextcloud (ExApp via AppAPI) |
-| **Primary Users** | Claude Code, IDEs, external developers | Nextcloud end users via Assistant app |
-| **Authentication** | OAuth2/OIDC or Basic Auth | Session-based (integrated) |
-| **Notes Support** | ✅ Full CRUD + keyword search (7 tools) | ❌ Not implemented |
-| **Semantic Search** | ✅ Multi-app vector search (2+ tools) | ❌ Not implemented |
-| **Calendar** | ✅ Full CalDAV + tasks (20+ tools) | ✅ Events, free/busy, tasks (4 tools) |
-| **Contacts** | ✅ Full CardDAV (8 tools) | ✅ Find person, current user (2 tools) |
-| **Files (WebDAV)** | ✅ Full filesystem access (12 tools) | ✅ Read, folder tree, sharing (3 tools) |
-| **Document Processing** | ✅ OCR with progress (PDF, DOCX, images) | ❌ Not implemented |
-| **Deck** | ✅ Full project management (15 tools) | ✅ Basic board/card ops (2 tools) |
-| **Tables** | ✅ Row operations (5 tools) | ❌ Not implemented |
-| **Cookbook** | ✅ Full recipe management (13 tools) | ❌ Not implemented |
-| **Talk** | ❌ Not implemented | ✅ Messages, conversations (4 tools) |
-| **Mail** | ❌ Not implemented | ✅ Send email (2 tools) |
-| **AI Features** | ❌ Not implemented | ✅ Image gen, transcription, doc gen (4 tools) |
-| **Web/Maps** | ❌ Not implemented | ✅ Search, weather, transit (5 tools) |
-| **MCP Resources** | ✅ Structured data URIs | ❌ Not supported |
-| **External MCP** | ❌ Pure server | ✅ Consumes external MCP servers |
-| **Safety Model** | Client-controlled | Built-in safe/dangerous distinction |
-| **Best For** | • Deep CRUD operations<br/>• External integrations<br/>• OAuth security<br/>• IDE/editor integration | • AI-driven actions in Nextcloud UI<br/>• Multi-service orchestration<br/>• User task automation<br/>• MCP aggregation hub |
-
-See our [detailed comparison](docs/comparison-context-agent.md) for architecture diagrams, workflow examples, and guidance on when to use each approach.
-
-Want to see another Nextcloud app supported? [Open an issue](https://github.com/cbcoutinho/nextcloud-mcp-server/issues) or contribute a pull request!
-
-### Authentication
-
-| Mode | Security | Best For |
-|------|----------|----------|
-| **OAuth2/OIDC** ⚠️ **Experimental** | 🔒 High | Testing, evaluation (requires patch for app-specific APIs) |
-| **Basic Auth** ✅ | Lower | Development, testing, production |
-
-> [!IMPORTANT]
-> **OAuth is experimental** and requires a manual patch to the `user_oidc` app for full functionality:
-> - **Required patch**: `user_oidc` app needs modifications for Bearer token support ([issue #1221](https://github.com/nextcloud/user_oidc/issues/1221))
-> - **Impact**: Without the patch, most app-specific APIs (Notes, Calendar, Contacts, Deck, etc.) will fail with 401 errors
-> - **What works without patches**: OAuth flow, PKCE support (with `oidc` v1.10.0+), OCS APIs
-> - **Production use**: Wait for upstream patch to be merged into official releases
->
-> See [OAuth Upstream Status](docs/oauth-upstream-status.md) for detailed information on required patches and workarounds.
-
-OAuth2/OIDC provides secure, per-user authentication with access tokens. See [Authentication Guide](docs/authentication.md) for details.
+> **Looking for AI features inside Nextcloud?** Nextcloud also provides [Context Agent](https://github.com/nextcloud/context_agent), which powers the Assistant app and runs as an ExApp inside Nextcloud. See [docs/comparison-context-agent.md](docs/comparison-context-agent.md) for a detailed comparison of use cases.

 ## Quick Start

-### 1. Install
+Get up and running in 60 seconds using Docker:

 ```bash
-# Clone the repository
-git clone https://github.com/cbcoutinho/nextcloud-mcp-server.git
-cd nextcloud-mcp-server
-
-# Install with uv (recommended)
-uv sync
-
-# Or using Docker
-docker pull ghcr.io/cbcoutinho/nextcloud-mcp-server:latest
-
-# Or deploy to Kubernetes with Helm
-helm repo add nextcloud-mcp https://cbcoutinho.github.io/nextcloud-mcp-server
-helm repo update
-helm install nextcloud-mcp nextcloud-mcp/nextcloud-mcp-server \
-  --set nextcloud.host=https://cloud.example.com \
-  --set auth.basic.username=myuser \
-  --set auth.basic.password=mypassword
-```
-
-See [Installation Guide](docs/installation.md) for detailed instructions, or [Helm Chart README](charts/nextcloud-mcp-server/README.md) for Kubernetes deployment.
-
-### 2. Configure
-
-Create a `.env` file:
-
-```bash
-# Copy the sample
-cp env.sample .env
-```
-
-**For Basic Auth (recommended for most users):**
-```dotenv
+# 1. Create a minimal configuration
+cat > .env << EOF
 NEXTCLOUD_HOST=https://your.nextcloud.instance.com
 NEXTCLOUD_USERNAME=your_username
 NEXTCLOUD_PASSWORD=your_app_password
-```
+EOF

-**For OAuth (experimental - requires patches):**
-```dotenv
-NEXTCLOUD_HOST=https://your.nextcloud.instance.com
-```
-
-See [Configuration Guide](docs/configuration.md) for all options.
-
-### 3. Set Up Authentication
-
-**Basic Auth Setup (recommended):**
-1. Create an app password in Nextcloud (Settings → Security → Devices & sessions)
-2. Add credentials to `.env` file
-3. Start the server
-
-**OAuth Setup (experimental):**
-1. Install Nextcloud OIDC apps (`oidc` v1.10.0+ + `user_oidc`)
-2. **Apply required patch** to `user_oidc` app for Bearer token support (see [OAuth Upstream Status](docs/oauth-upstream-status.md))
-3. Enable dynamic client registration or create an OIDC client with id & secret
-4. Configure Bearer token validation in `user_oidc`
-5. Start the server
-
-See [OAuth Quick Start](docs/quickstart-oauth.md) for 5-minute setup or [OAuth Setup Guide](docs/oauth-setup.md) for detailed instructions.
-
-### 4. Run the Server
-
-```bash
-# Load environment variables
-export $(grep -v '^#' .env | xargs)
-
-# Start with Basic Auth (default)
-uv run nextcloud-mcp-server
-
-# Or start with OAuth (experimental - requires patches)
-uv run nextcloud-mcp-server --oauth
-
-# Or with Docker
+# 2. Start the server
 docker run -p 127.0.0.1:8000:8000 --env-file .env --rm \
  ghcr.io/cbcoutinho/nextcloud-mcp-server:latest
+
+# 3. Test the connection
+curl http://127.0.0.1:8000/health/ready
 ```

-The server starts on `http://127.0.0.1:8000` by default.
+**Next Steps:**
+- Create an app password in Nextcloud: Settings → Security → Devices & sessions
+- Connect your MCP client (Claude Desktop, IDEs, `mcp dev`, etc.)
+- See [docs/installation.md](docs/installation.md) for other deployment options (local, Kubernetes)

-See [Running the Server](docs/running.md) for more options.
+## Key Features

-### 5. Connect an MCP Client
+- **90+ MCP Tools** - Comprehensive API coverage across 8 Nextcloud apps
+- **MCP Resources** - Structured data URIs for browsing Nextcloud data
+- **Semantic Search (Experimental)** - Optional vector-powered search for Notes (requires Qdrant + Ollama)
+- **Document Processing** - OCR and text extraction from PDFs, DOCX, images with progress notifications
+- **Flexible Deployment** - Docker, Kubernetes (Helm), VM, or local installation
+- **Production-Ready Auth** - Basic Auth with app passwords (recommended) or OAuth2/OIDC (experimental)
+- **Multiple Transports** - SSE, HTTP, and streamable-http support

-Test with MCP Inspector:
+## Supported Apps

-```bash
-uv run mcp dev
-```
+| App | Tools | Capabilities |
+|-----|-------|--------------|
+| **Notes** | 7 | Full CRUD, keyword search, semantic search |
+| **Calendar** | 20+ | Events, todos (tasks), recurring events, attendees, availability |
+| **Contacts** | 8 | Full CardDAV support, address books |
+| **Files (WebDAV)** | 12 | Filesystem access, OCR/document processing |
+| **Deck** | 15 | Boards, stacks, cards, labels, assignments |
+| **Cookbook** | 13 | Recipe management, URL import (schema.org) |
+| **Tables** | 5 | Row operations on Nextcloud Tables |
+| **Sharing** | 10+ | Create and manage shares |
+| **Semantic Search** | 2+ | Vector search for Notes (experimental, opt-in, requires infrastructure) |

-Or connect from:
-- Claude Desktop
-- Any MCP-compatible client
+Want to see another Nextcloud app supported? [Open an issue](https://github.com/cbcoutinho/nextcloud-mcp-server/issues) or contribute a pull request!
+
+## Authentication
+
+> [!IMPORTANT]
+> **OAuth2/OIDC is experimental** and requires a manual patch to the `user_oidc` app:
+> - **Required patch**: Bearer token support ([issue #1221](https://github.com/nextcloud/user_oidc/issues/1221))
+> - **Impact**: Without the patch, most app-specific APIs fail with 401 errors
+> - **Recommendation**: Use Basic Auth for production until upstream patches are merged
+>
+> See [docs/oauth-upstream-status.md](docs/oauth-upstream-status.md) for patch status and workarounds.
+
+**Recommended:** Basic Auth with app-specific passwords provides secure, production-ready authentication. See [docs/authentication.md](docs/authentication.md) for setup details and OAuth configuration.
+
+### Authentication Modes
+
+The server supports two authentication modes:
+
+**Single-User Mode (BasicAuth):**
+- One set of credentials shared by all MCP clients
+- Simple setup: username + app password in environment variables
+- All clients access Nextcloud as the same user
+- Best for: Personal use, development, single-user deployments
+
+**Multi-User Mode (OAuth):**
+- Each MCP client authenticates separately with their own Nextcloud account
+- Per-user scopes and permissions (clients only see tools they're authorized for)
+- More secure: tokens expire, and credentials are never shared with the server
+- Best for: Teams, multi-user deployments, production environments with multiple users
+
+See [docs/authentication.md](docs/authentication.md) for detailed setup instructions.
+
+## Semantic Search
+
+The server provides an experimental RAG pipeline for _Semantic Search_, which lets MCP clients find information in Nextcloud based on **meaning** rather than just keywords. Instead of matching "machine learning" only when those exact words appear, it recognizes that "neural networks," "AI models," and "deep learning" are semantically related concepts.
+
+**Example:**
+- **Keyword search**: Query "car" only finds notes containing "car"
+- **Semantic search**: Query "car" also finds notes about "automobile," "vehicle," "sedan," "transportation"
+
+This enables natural language queries and helps discover related content across your Nextcloud notes.
+
+> [!NOTE]
+> **Semantic Search is experimental and opt-in:**
+> - Disabled by default (`VECTOR_SYNC_ENABLED=false`)
+> - Currently supports the Notes app only (multi-app support planned)
+> - Requires additional infrastructure: vector database + embedding service
+> - Answer generation (`nc_semantic_search_answer`) requires MCP client sampling support
+>
+> See [docs/semantic-search-architecture.md](docs/semantic-search-architecture.md) for architecture details and [docs/configuration.md](docs/configuration.md) for setup instructions.

 ## Documentation

 ### Getting Started
-- **[Installation](docs/installation.md)** - Install the server
-- **[Configuration](docs/configuration.md)** - Environment variables and settings
-- **[Authentication](docs/authentication.md)** - OAuth vs BasicAuth
-- **[Running the Server](docs/running.md)** - Start and manage the server
+- **[Installation](docs/installation.md)** - Docker, Kubernetes, local, or VM deployment
+- **[Configuration](docs/configuration.md)** - Environment variables and advanced options
+- **[Authentication](docs/authentication.md)** - Basic Auth vs OAuth2/OIDC setup
+- **[Running the Server](docs/running.md)** - Start, manage, and troubleshoot

-### Architecture
-- **[Comparison with Context Agent](docs/comparison-context-agent.md)** - How this MCP server differs from Nextcloud's Context Agent
+### Features
+- **[App Documentation](docs/)** - Notes, Calendar, Contacts, WebDAV, Deck, Cookbook, Tables
+- **[Document Processing](docs/configuration.md#document-processing)** - OCR and text extraction setup
+- **[Semantic Search Architecture](docs/semantic-search-architecture.md)** - Experimental vector search (Notes only, opt-in)

-### OAuth Documentation (Experimental)
-- **[OAuth Quick Start](docs/quickstart-oauth.md)** - 5-minute setup guide
-- **[OAuth Setup Guide](docs/oauth-setup.md)** - Detailed setup instructions
-- **[OAuth Architecture](docs/oauth-architecture.md)** - How OAuth works
-- **[OAuth Troubleshooting](docs/oauth-troubleshooting.md)** - OAuth-specific issues
-- **[Upstream Status](docs/oauth-upstream-status.md)** - **Required patches and PRs** ⚠️
-
-### Reference
+### Advanced Topics
+- **[OAuth Architecture](docs/oauth-architecture.md)** - How OAuth works (experimental)
+- **[OAuth Quick Start](docs/quickstart-oauth.md)** - 5-minute OAuth setup
+- **[OAuth Setup Guide](docs/oauth-setup.md)** - Detailed OAuth configuration
 - **[Troubleshooting](docs/troubleshooting.md)** - Common issues and solutions
-
-### App-Specific Documentation
-- [Notes API](docs/notes.md)
-- [Calendar (CalDAV)](docs/calendar.md)
-- [Contacts (CardDAV)](docs/contacts.md)
-- [Cookbook](docs/cookbook.md)
-- [Deck](docs/deck.md)
-- [Tables](docs/table.md)
-- [WebDAV](docs/webdav.md)
-
-## MCP Tools & Resources
-
-The server exposes Nextcloud functionality through MCP tools (for actions) and resources (for data browsing).
-
-### Tools
-
-The server provides 90+ tools across 8 Nextcloud apps. When using OAuth, tools are dynamically filtered based on your granted scopes.
-
-For a complete list of all supported OAuth scopes and their descriptions, see [OAuth Scopes Documentation](docs/oauth-architecture.md#oauth-scopes).
-
-#### Available Tool Categories
-
-| App | Tools | Read Scope | Write Scope | Operations |
-|-----|-------|-----------|-------------|------------|
-| **Notes** | 7 | `notes:read` | `notes:write` | Create, read, update, delete, search notes (keyword search) |
-| **Calendar** | 20+ | `calendar:read` `todo:read`  | `calendar:write` `todo:write`   | Events, todos (tasks), calendars, recurring events, attendees |
-| **Contacts** | 8 | `contacts:read` | `contacts:write` | Create, read, update, delete contacts and address books |
-| **Files (WebDAV)** | 12 | `files:read` | `files:write` | List, read, upload, delete, move files; **OCR/document processing** |
-| **Deck** | 15 | `deck:read` | `deck:write` | Boards, stacks, cards, labels, assignments |
-| **Cookbook** | 13 | `cookbook:read` | `cookbook:write` | Recipes, import from URLs, search, categories |
-| **Tables** | 5 | `tables:read` | `tables:write` | Row operations on Nextcloud Tables |
-| **Sharing** | 10+ | `sharing:read` | `sharing:write` | Create, manage, delete shares |
-| **Semantic Search** | 2+ | `semantic:read` | `semantic:write` | Vector-powered semantic search across **all apps** (notes, calendar, deck, files, contacts), background indexing |
-
-#### Document Processing (Optional)
-
-The WebDAV file reading tool (`nc_webdav_read_file`) supports **automatic text extraction** from documents and images:
-
-**Supported Formats:**
-- **Documents**: PDF, DOCX, PPTX, XLSX, RTF, ODT, EPUB
-- **Images**: PNG, JPEG, TIFF, BMP (with OCR)
-- **Email**: EML, MSG files
-
-**Features:**
-- **Progress Notifications**: Long-running OCR operations (up to 120s) send progress updates every 10 seconds to prevent client timeouts
-- **Pluggable Architecture**: Multiple processor backends (Unstructured.io, Tesseract, custom HTTP APIs)
-- **Automatic Detection**: Files are processed based on MIME type
-- **Graceful Fallback**: Returns base64-encoded content if processing fails
-
-**Configuration:**
-```dotenv
-# Enable document processing (optional)
-ENABLE_DOCUMENT_PROCESSING=true
-
-# Unstructured.io processor (cloud/API-based, supports many formats)
-ENABLE_UNSTRUCTURED=true
-UNSTRUCTURED_API_URL=http://localhost:8002
-UNSTRUCTURED_STRATEGY=auto  # auto, fast, or hi_res
-UNSTRUCTURED_LANGUAGES=eng,deu
-PROGRESS_INTERVAL=10  # Progress update interval in seconds
-
-# Tesseract processor (local OCR, images only)
-ENABLE_TESSERACT=false
-TESSERACT_LANG=eng
-
-# Custom HTTP processor
-ENABLE_CUSTOM_PROCESSOR=false
-CUSTOM_PROCESSOR_URL=http://localhost:9000/process
-CUSTOM_PROCESSOR_TYPES=application/pdf,image/jpeg
-```
-
-**Example Usage:**
-```
-AI: "Read the contents of Documents/report.pdf"
-→ Uses nc_webdav_read_file tool with automatic OCR processing
-→ Returns extracted text with parsing metadata
-→ Sends progress updates during long operations
-```
-
-See [env.sample](env.sample) for complete configuration options.
-
-**Example Tools:**
-- `nc_notes_create_note` - Create a new note
-- `nc_cookbook_import_recipe` - Import recipes from URLs with schema.org metadata
-- `deck_create_card` - Create a Deck card
-- `nc_calendar_create_event` - Create a calendar event
-- `nc_calendar_create_todo` - Create a CalDAV task/todo
-- `nc_contacts_create_contact` - Create a contact
-- `nc_webdav_upload_file` - Upload a file to Nextcloud
-- And 80+ more...
-
-> [!TIP]
-> **OAuth Scope Filtering**: When connecting via OAuth, MCP clients will only see tools for which you've granted access. For example, granting only `notes:read` and `notes:write` will show 7 Notes tools instead of all 90+ tools. See [OAuth Scopes Documentation](docs/oauth-architecture.md#oauth-scopes) for the complete scope reference, or [OAuth Troubleshooting - Limited Scopes](docs/oauth-troubleshooting.md#limited-scopes---only-seeing-notes-tools) if you're only seeing a subset of tools.
->
-> **Known Issue**: Claude Code and some other MCP clients may only request/grant Notes scopes during initial connection. Track progress at [#234](https://github.com/cbcoutinho/nextcloud-mcp-server/issues/234).
-
-### Resources
-Resources provide read-only access to Nextcloud data:
-- `nc://capabilities` - Server capabilities
-- `cookbook://version` - Cookbook app version info
-- `nc://Deck/boards/{board_id}` - Deck board data
-- `notes://settings` - Notes app settings
-- And more...
-
-Run `uv run nextcloud-mcp-server --help` to see all available options.
+- **[Comparison with Context Agent](docs/comparison-context-agent.md)** - When to use each approach

 ## Examples

@@ -291,45 +139,31 @@ AI: "Create a note called 'Meeting Notes' with today's agenda"
 → Uses nc_notes_create_note tool
 ```

-### Manage Recipes
+### Import Recipes
 ```
-AI: "Import the recipe from this URL: https://www.example.com/recipe/chocolate-cake"
-→ Uses nc_cookbook_import_recipe tool to extract schema.org metadata
+AI: "Import the recipe from https://www.example.com/recipe/chocolate-cake"
+→ Uses nc_cookbook_import_recipe tool with schema.org metadata extraction
 ```

-### Manage Calendar
+### Schedule Meetings
 ```
 AI: "Schedule a team meeting for next Tuesday at 2pm"
 → Uses nc_calendar_create_event tool
 ```

-### Organize Files
+### Manage Files
 ```
 AI: "Create a folder called 'Project X' and move all PDFs there"
-→ Uses WebDAV tools (nc_webdav_create_directory, nc_webdav_move)
+→ Uses nc_webdav_create_directory and nc_webdav_move tools
 ```

-### Project Management
+### Semantic Search (Experimental, Opt-in)
 ```
-AI: "Create a new Deck board for Q1 planning with Todo, In Progress, and Done stacks"
-→ Uses deck_create_board and deck_create_stack tools
+AI: "Find notes related to machine learning concepts"
+→ Uses nc_semantic_search to find semantically similar notes (requires Qdrant + Ollama setup)
 ```

-## Transport Protocols
-
-The server supports multiple MCP transport protocols:
-
-- **streamable-http** (recommended) - Modern streaming protocol
-- **sse** (default, deprecated) - Server-Sent Events for backward compatibility
-- **http** - Standard HTTP protocol
-
-```bash
-# Use streamable-http (recommended)
-uv run nextcloud-mcp-server --transport streamable-http
-```
-
-> [!WARNING]
-> SSE transport is deprecated and will be removed in a future MCP specification version. Please migrate to `streamable-http`.
+**Note:** For AI-generated answers with citations, use `nc_semantic_search_answer` (requires an MCP client with sampling support).

 ## Contributing

@@ -337,17 +171,17 @@ Contributions are welcome!

 - Report bugs or request features: [GitHub Issues](https://github.com/cbcoutinho/nextcloud-mcp-server/issues)
 - Submit improvements: [Pull Requests](https://github.com/cbcoutinho/nextcloud-mcp-server/pulls)
-- Read [CLAUDE.md](CLAUDE.md) for development guidelines
+- Development guidelines: [CLAUDE.md](CLAUDE.md)

 ## Security

 [![MseeP.ai Security Assessment](https://mseep.net/pr/cbcoutinho-nextcloud-mcp-server-badge.png)](https://mseep.ai/app/cbcoutinho-nextcloud-mcp-server)

 This project takes security seriously:
-- OAuth2/OIDC support (experimental - requires upstream patches)
-- Basic Auth with app-specific passwords (recommended)
-- No credential storage with OAuth mode
+- Production-ready Basic Auth with app-specific passwords
+- OAuth2/OIDC support (experimental, requires upstream patches)
 - Per-user access tokens
+- No credential storage in OAuth mode
 - Regular security assessments

 Found a security issue? Please report it privately to the maintainers.
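
The README's Semantic Search section describes matching notes by meaning rather than by keywords. A minimal sketch of that idea, assuming notes and queries have already been converted to embedding vectors and are compared by cosine similarity (a common choice; the server's actual distance metric and embedding model are configured elsewhere and not shown here). The three-dimensional vectors and the helpers `cosine_similarity` and `rank_notes` are hypothetical and for illustration only; real embedding models produce vectors with hundreds of dimensions, and Qdrant performs the nearest-neighbor search instead of this linear scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_notes(query_vec: list[float], note_vecs: dict[str, list[float]], top_k: int = 3):
    """Return the top_k (title, score) pairs, most similar first."""
    scored = [(title, cosine_similarity(query_vec, vec)) for title, vec in note_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy "embeddings"; none of these titles contain the word "car".
notes = {
    "Buying a new sedan": [0.9, 0.1, 0.0],
    "Vehicle maintenance log": [0.8, 0.2, 0.1],
    "Sourdough recipe": [0.0, 0.1, 0.95],
}
query = [0.85, 0.15, 0.05]  # stands in for the embedding of the query "car"

if __name__ == "__main__":
    for title, score in rank_notes(query, notes, top_k=2):
        print(f"{score:.3f}  {title}")
```

The two vehicle-related notes rank above the recipe even though neither contains the literal query term, which is the behavior keyword search cannot provide.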
@@ -1,9 +1,9 @@
 dependencies:
 - name: qdrant
  repository: https://qdrant.github.io/qdrant-helm
-  version: 0.9.0
+  version: 1.15.5
 - name: ollama
  repository: https://otwld.github.io/ollama-helm
-  version: 1.33.0
-digest: sha256:c53b7a604d202460f60408a62025ae837cad8d4da970b1e5bb404e2b41289f94
-generated: "2025-11-08T23:44:59.709689907+01:00"
+  version: 1.34.0
+digest: sha256:d51c97d05be2614b751c0dd7267ef7dc959eff5ebef859c5f895c5c554b7a874
+generated: "2025-11-09T17:08:02.86648061Z"
@@ -2,8 +2,8 @@ apiVersion: v2
 name: nextcloud-mcp-server
 description: A Helm chart for Nextcloud MCP Server - enables AI assistants to interact with Nextcloud
 type: application
-version: 0.26.1
-appVersion: "0.26.1"
+version: 0.31.0
+appVersion: "0.31.0"
 keywords:
  - nextcloud
  - mcp
@@ -23,10 +23,10 @@ sources:
 icon: https://raw.githubusercontent.com/nextcloud/server/master/core/img/logo/logo.svg
 dependencies:
  - name: qdrant
-    version: "0.9.0"
+    version: "1.15.5"
    repository: https://qdrant.github.io/qdrant-helm
-    condition: qdrant.enabled
+    condition: qdrant.networkMode.deploySubchart
  - name: ollama
-    version: "1.33.0"
+    version: "1.34.0"
    repository: https://otwld.github.io/ollama-helm
    condition: ollama.enabled
@@ -219,6 +219,19 @@ Enable semantic search capabilities by deploying a vector database (Qdrant) and
 | `vectorSync.processorWorkers` | Number of concurrent processor workers | `3` |
 | `vectorSync.queueMaxSize` | Maximum queue size for pending documents | `10000` |

+**Document Chunking Configuration:**
+
+| Parameter | Description | Default |
+|-----------|-------------|---------|
+| `documentChunking.chunkSize` | Number of words per chunk for embedding | `512` |
+| `documentChunking.chunkOverlap` | Number of overlapping words between chunks | `50` |
+
+**Chunking Strategy:**
+- **Small chunks (256-384 words)**: Better precision for searches, more storage overhead
+- **Medium chunks (512-768 words)**: Balanced approach (recommended for most use cases)
+- **Large chunks (1024+ words)**: Better context preservation, less precise matching
+- **Overlap**: Should be 10-20% of chunk size to preserve context across boundaries
+
 **Qdrant Vector Database:**

 Qdrant is deployed as a subchart when `qdrant.enabled` is `true`. All configuration values are passed through to the [qdrant/qdrant](https://github.com/qdrant/qdrant-helm) chart.
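
The `documentChunking.chunkSize` and `documentChunking.chunkOverlap` parameters in the chart README hunk above can be illustrated with a minimal sliding-window sketch. It assumes "words" are whitespace-separated tokens and that each chunk starts `chunkSize - chunkOverlap` words after the previous one; the server's real chunker may split text differently, and `chunk_words` is a hypothetical helper, not part of the chart or the server API.

```python
def chunk_words(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size words that share chunk_overlap words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    words = text.split()  # assumption: whitespace-delimited "words"
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    doc = " ".join(f"w{i}" for i in range(1000))
    chunks = chunk_words(doc)  # chart defaults: 512 words, 50 overlap
    print(len(chunks), [len(c.split()) for c in chunks])
    print(f"overlap ratio: {50 / 512:.1%}")
```

With the chart defaults, a 1,000-word note yields three chunks of 512, 512, and 76 words. The default overlap is 50/512 ≈ 9.8% of the chunk size, just under the 10-20% range suggested in the chunking strategy.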
@@ -0,0 +1,90 @@
+# Grafana Dashboards
+
+This directory contains example Grafana dashboards for monitoring the Nextcloud MCP Server.
+
+## Dashboards
+
+### nextcloud-mcp-server.json
+
+Comprehensive dashboard with the following panels:
+
+- **Request Rate**: HTTP requests per second by method and endpoint
+- **Error Rate**: Percentage of 5xx errors
+- **Request Latency**: P50 and P95 latency by endpoint
+- **Top MCP Tools**: Most frequently called tools
+- **Nextcloud API Latency**: API call latency by app (notes, calendar, etc.)
+- **Vector Sync Queue**: Queue size for background document processing
+
+## Importing to Grafana
+
+### Manual Import
+
+1. Open Grafana UI
+2. Navigate to Dashboards → Import
+3. Upload `nextcloud-mcp-server.json`
+4. Select your Prometheus data source
+5. Click "Import"
+
+### Automated Import (Kubernetes)
+
+If you use kube-prometheus-stack (or another Grafana deployment whose dashboard sidecar watches for labeled ConfigMaps), you can create a ConfigMap:
+
+```bash
+kubectl create configmap nextcloud-mcp-dashboards \
+  --from-file=nextcloud-mcp-server.json \
+  -n monitoring
+
+# Label the ConfigMap so the Grafana dashboard sidecar discovers it
+kubectl label configmap nextcloud-mcp-dashboards \
+  grafana_dashboard=1 \
+  -n monitoring
+```
+
+Alternatively, reference the ConfigMap from your Helm values:
+
+```yaml
+# values.yaml for kube-prometheus-stack
+grafana:
+  dashboardProviders:
+    dashboardproviders.yaml:
+      apiVersion: 1
+      providers:
+        - name: 'nextcloud-mcp'
+          orgId: 1
+          folder: 'Nextcloud MCP'
+          type: file
+          disableDeletion: false
+          editable: true
+          options:
+            path: /var/lib/grafana/dashboards/nextcloud-mcp
+
+  dashboardsConfigMaps:
+    nextcloud-mcp: nextcloud-mcp-dashboards
+```
+
+## Dashboard Variables
+
+The dashboard includes two variables:
+
+- **Data Source**: Select your Prometheus data source
+- **Namespace**: Filter metrics by Kubernetes namespace (values come from the `namespace` label on `mcp_http_requests_total`)
+
+## Customization
+
+You can customize the dashboard by:
+
+1. Adjusting refresh rate (default: 30s)
+2. Modifying time range (default: last 6 hours)
+3. Adding new panels for specific metrics
+4. Adjusting thresholds in existing panels
+
+## Metrics Reference
+
+All metrics are documented in `/docs/observability.md`. Key metric prefixes:
+
+- `mcp_http_*` - HTTP server metrics
+- `mcp_tool_*` - MCP tool invocation metrics
+- `mcp_nextcloud_api_*` - Nextcloud API call metrics
+- `mcp_oauth_*` - OAuth token validation metrics
+- `mcp_vector_sync_*` - Vector database sync metrics
+- `mcp_db_*` - Database operation metrics
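
Both the Error Rate panel and the `NextcloudMCPHighErrorRate` alert divide the per-second rate of 5xx responses by the total request rate. Because both rates use the same window, the window length cancels out and the ratio equals the ratio of counter increases. A rough Python sketch of that arithmetic from two scrapes of `mcp_http_requests_total` (sample values are hypothetical; the sketch ignores counter resets, which PromQL's `rate()` corrects for):

```python
import re

# PromQL regex matchers are fully anchored, so status_code=~"5.." means fullmatch.
FIVE_XX = re.compile(r"5..")


def error_ratio(before: dict[str, float], after: dict[str, float]) -> float:
    """Fraction of requests with a 5xx status between two counter scrapes.

    `before` and `after` map a status_code label value to the cumulative
    value of mcp_http_requests_total at each scrape. Assumes no counter
    reset happened between the scrapes.
    """
    deltas = {code: after[code] - before.get(code, 0.0) for code in after}
    total = sum(deltas.values())
    if total == 0:
        return float("nan")  # no traffic: PromQL also yields NaN (0/0)
    errors = sum(d for code, d in deltas.items() if FIVE_XX.fullmatch(code))
    return errors / total
```

For example, going from 1,000 to 1,950 successful requests and from 10 to 60 server errors gives 50 / 1,000 = 0.05, which sits exactly at the alert threshold; the rule uses `> 0.05`, so it would not fire.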
@@ -0,0 +1,630 @@
+{
+  "annotations": {
+    "list": [
+      {
+        "builtIn": 1,
+        "datasource": {
+          "type": "grafana",
+          "uid": "-- Grafana --"
+        },
+        "enable": true,
+        "hide": true,
+        "iconColor": "rgba(0, 211, 255, 1)",
+        "name": "Annotations & Alerts",
+        "type": "dashboard"
+      }
+    ]
+  },
+  "editable": true,
+  "fiscalYearStartMonth": 0,
+  "graphTooltip": 0,
+  "id": null,
+  "links": [],
+  "liveNow": false,
+  "panels": [
+    {
+      "datasource": {
+        "type": "prometheus",
+        "uid": "${datasource}"
+      },
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "mode": "palette-classic"
+          },
+          "custom": {
+            "axisCenteredZero": false,
+            "axisColorMode": "text",
+            "axisLabel": "",
+            "axisPlacement": "auto",
+            "barAlignment": 0,
+            "drawStyle": "line",
+            "fillOpacity": 10,
+            "gradientMode": "none",
+            "hideFrom": {
+              "tooltip": false,
+              "viz": false,
+              "legend": false
+            },
+            "lineInterpolation": "linear",
+            "lineWidth": 1,
+            "pointSize": 5,
+            "scaleDistribution": {
+              "type": "linear"
+            },
+            "showPoints": "never",
+            "spanNulls": false,
+            "stacking": {
+              "group": "A",
+              "mode": "none"
+            },
+            "thresholdsStyle": {
+              "mode": "off"
+            }
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "green",
+                "value": null
+              }
+            ]
+          },
+          "unit": "reqps"
+        },
+        "overrides": []
+      },
+      "gridPos": {
+        "h": 8,
+        "w": 12,
+        "x": 0,
+        "y": 0
+      },
+      "id": 1,
+      "options": {
+        "legend": {
+          "calcs": ["mean", "max"],
+          "displayMode": "table",
+          "placement": "bottom",
+          "showLegend": true
+        },
+        "tooltip": {
+          "mode": "multi",
+          "sort": "none"
+        }
+      },
+      "targets": [
+        {
+          "datasource": {
+            "type": "prometheus",
+            "uid": "${datasource}"
+          },
+          "expr": "sum(rate(mcp_http_requests_total{namespace=\"$namespace\"}[5m])) by (method, endpoint)",
+          "legendFormat": "{{method}} {{endpoint}}",
+          "refId": "A"
+        }
+      ],
+      "title": "Request Rate",
+      "type": "timeseries"
+    },
+    {
+      "datasource": {
+        "type": "prometheus",
+        "uid": "${datasource}"
+      },
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "mode": "thresholds"
+          },
+          "custom": {
+            "axisCenteredZero": false,
+            "axisColorMode": "text",
+            "axisLabel": "",
+            "axisPlacement": "auto",
+            "barAlignment": 0,
+            "drawStyle": "line",
+            "fillOpacity": 10,
+            "gradientMode": "none",
+            "hideFrom": {
+              "tooltip": false,
+              "viz": false,
+              "legend": false
+            },
+            "lineInterpolation": "linear",
+            "lineWidth": 1,
+            "pointSize": 5,
+            "scaleDistribution": {
+              "type": "linear"
+            },
+            "showPoints": "never",
+            "spanNulls": false,
+            "stacking": {
+              "group": "A",
+              "mode": "none"
+            },
+            "thresholdsStyle": {
+              "mode": "line"
+            }
+          },
+          "mappings": [],
+          "max": 100,
+          "min": 0,
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "green",
+                "value": null
+              },
+              {
+                "color": "yellow",
+                "value": 1
+              },
+              {
+                "color": "red",
+                "value": 5
+              }
+            ]
+          },
+          "unit": "percent"
+        },
+        "overrides": []
+      },
+      "gridPos": {
+        "h": 8,
+        "w": 12,
+        "x": 12,
+        "y": 0
+      },
+      "id": 2,
+      "options": {
+        "legend": {
+          "calcs": ["mean", "max"],
+          "displayMode": "table",
+          "placement": "bottom",
+          "showLegend": true
+        },
+        "tooltip": {
+          "mode": "multi",
+          "sort": "none"
+        }
+      },
+      "targets": [
+        {
+          "datasource": {
+            "type": "prometheus",
+            "uid": "${datasource}"
+          },
+          "expr": "sum(rate(mcp_http_requests_total{status_code=~\"5..\", namespace=\"$namespace\"}[5m])) / sum(rate(mcp_http_requests_total{namespace=\"$namespace\"}[5m])) * 100",
+          "legendFormat": "Error Rate",
+          "refId": "A"
+        }
+      ],
+      "title": "Error Rate (%)",
+      "type": "timeseries"
+    },
+    {
+      "datasource": {
+        "type": "prometheus",
+        "uid": "${datasource}"
+      },
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "mode": "palette-classic"
+          },
+          "custom": {
+            "axisCenteredZero": false,
+            "axisColorMode": "text",
+            "axisLabel": "",
+            "axisPlacement": "auto",
+            "barAlignment": 0,
+            "drawStyle": "line",
+            "fillOpacity": 10,
+            "gradientMode": "none",
+            "hideFrom": {
+              "tooltip": false,
+              "viz": false,
+              "legend": false
+            },
+            "lineInterpolation": "linear",
+            "lineWidth": 1,
+            "pointSize": 5,
+            "scaleDistribution": {
+              "type": "linear"
+            },
+            "showPoints": "never",
+            "spanNulls": false,
+            "stacking": {
+              "group": "A",
+              "mode": "none"
+            },
+            "thresholdsStyle": {
+              "mode": "off"
+            }
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "green",
+                "value": null
+              }
+            ]
+          },
+          "unit": "s"
+        },
+        "overrides": []
+      },
+      "gridPos": {
+        "h": 8,
+        "w": 12,
+        "x": 0,
+        "y": 8
+      },
+      "id": 3,
+      "options": {
+        "legend": {
+          "calcs": ["mean", "max"],
+          "displayMode": "table",
+          "placement": "bottom",
+          "showLegend": true
+        },
+        "tooltip": {
+          "mode": "multi",
+          "sort": "none"
+        }
+      },
+      "targets": [
+        {
+          "datasource": {
+            "type": "prometheus",
+            "uid": "${datasource}"
+          },
+          "expr": "histogram_quantile(0.95, sum(rate(mcp_http_request_duration_seconds_bucket{namespace=\"$namespace\"}[5m])) by (le, endpoint))",
+          "legendFormat": "{{endpoint}} (p95)",
+          "refId": "A"
+        },
+        {
+          "datasource": {
+            "type": "prometheus",
+            "uid": "${datasource}"
+          },
+          "expr": "histogram_quantile(0.50, sum(rate(mcp_http_request_duration_seconds_bucket{namespace=\"$namespace\"}[5m])) by (le, endpoint))",
+          "legendFormat": "{{endpoint}} (p50)",
+          "refId": "B"
+        }
+      ],
+      "title": "Request Latency (P50/P95)",
+      "type": "timeseries"
+    },
+    {
+      "datasource": {
+        "type": "prometheus",
+        "uid": "${datasource}"
+      },
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "mode": "palette-classic"
+          },
+          "custom": {
+            "axisCenteredZero": false,
+            "axisColorMode": "text",
+            "axisLabel": "",
+            "axisPlacement": "auto",
+            "barAlignment": 0,
+            "drawStyle": "line",
+            "fillOpacity": 10,
+            "gradientMode": "none",
+            "hideFrom": {
+              "tooltip": false,
+              "viz": false,
+              "legend": false
+            },
+            "lineInterpolation": "linear",
+            "lineWidth": 1,
+            "pointSize": 5,
+            "scaleDistribution": {
+              "type": "linear"
+            },
+            "showPoints": "never",
+            "spanNulls": false,
+            "stacking": {
+              "group": "A",
+              "mode": "none"
+            },
+            "thresholdsStyle": {
+              "mode": "off"
+            }
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "green",
+                "value": null
+              }
+            ]
+          },
+          "unit": "short"
+        },
+        "overrides": []
+      },
+      "gridPos": {
+        "h": 8,
+        "w": 12,
+        "x": 12,
+        "y": 8
+      },
+      "id": 4,
+      "options": {
+        "legend": {
+          "calcs": ["mean", "max"],
+          "displayMode": "table",
+          "placement": "bottom",
+          "showLegend": true
+        },
+        "tooltip": {
+          "mode": "multi",
+          "sort": "none"
+        }
+      },
+      "targets": [
+        {
+          "datasource": {
+            "type": "prometheus",
+            "uid": "${datasource}"
+          },
+          "expr": "topk(10, sum(rate(mcp_tool_calls_total{namespace=\"$namespace\"}[5m])) by (tool_name))",
+          "legendFormat": "{{tool_name}}",
+          "refId": "A"
+        }
+      ],
+      "title": "Top MCP Tools by Volume",
+      "type": "timeseries"
+    },
+    {
+      "datasource": {
+        "type": "prometheus",
+        "uid": "${datasource}"
+      },
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "mode": "palette-classic"
+          },
+          "custom": {
+            "axisCenteredZero": false,
+            "axisColorMode": "text",
+            "axisLabel": "",
+            "axisPlacement": "auto",
+            "barAlignment": 0,
+            "drawStyle": "line",
+            "fillOpacity": 10,
+            "gradientMode": "none",
+            "hideFrom": {
+              "tooltip": false,
+              "viz": false,
+              "legend": false
+            },
+            "lineInterpolation": "linear",
+            "lineWidth": 1,
+            "pointSize": 5,
+            "scaleDistribution": {
+              "type": "linear"
+            },
+            "showPoints": "never",
+            "spanNulls": false,
+            "stacking": {
+              "group": "A",
+              "mode": "none"
+            },
+            "thresholdsStyle": {
+              "mode": "off"
+            }
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "green",
+                "value": null
+              }
+            ]
+          },
+          "unit": "s"
+        },
+        "overrides": []
+      },
+      "gridPos": {
+        "h": 8,
+        "w": 12,
+        "x": 0,
+        "y": 16
+      },
+      "id": 5,
+      "options": {
+        "legend": {
+          "calcs": ["mean", "max"],
+          "displayMode": "table",
+          "placement": "bottom",
+          "showLegend": true
+        },
+        "tooltip": {
+          "mode": "multi",
+          "sort": "none"
+        }
+      },
+      "targets": [
+        {
+          "datasource": {
+            "type": "prometheus",
+            "uid": "${datasource}"
+          },
+          "expr": "histogram_quantile(0.95, sum(rate(mcp_nextcloud_api_duration_seconds_bucket{namespace=\"$namespace\"}[5m])) by (le, app))",
+          "legendFormat": "{{app}} (p95)",
+          "refId": "A"
+        }
+      ],
+      "title": "Nextcloud API Latency by App",
+      "type": "timeseries"
+    },
+    {
+      "datasource": {
+        "type": "prometheus",
+        "uid": "${datasource}"
+      },
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "mode": "palette-classic"
+          },
+          "custom": {
+            "axisCenteredZero": false,
+            "axisColorMode": "text",
+            "axisLabel": "",
+            "axisPlacement": "auto",
+            "barAlignment": 0,
+            "drawStyle": "line",
+            "fillOpacity": 10,
+            "gradientMode": "none",
+            "hideFrom": {
+              "tooltip": false,
+              "viz": false,
+              "legend": false
+            },
+            "lineInterpolation": "linear",
+            "lineWidth": 1,
+            "pointSize": 5,
+            "scaleDistribution": {
+              "type": "linear"
+            },
+            "showPoints": "never",
+            "spanNulls": false,
+            "stacking": {
+              "group": "A",
+              "mode": "none"
+            },
+            "thresholdsStyle": {
+              "mode": "off"
+            }
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "green",
+                "value": null
+              }
+            ]
+          },
+          "unit": "short"
+        },
+        "overrides": []
+      },
+      "gridPos": {
+        "h": 8,
+        "w": 12,
+        "x": 12,
+        "y": 16
+      },
+      "id": 6,
+      "options": {
+        "legend": {
+          "calcs": ["mean", "lastNotNull"],
+          "displayMode": "table",
+          "placement": "bottom",
+          "showLegend": true
+        },
+        "tooltip": {
+          "mode": "multi",
+          "sort": "none"
+        }
+      },
+      "targets": [
+        {
+          "datasource": {
+            "type": "prometheus",
+            "uid": "${datasource}"
+          },
+          "expr": "mcp_vector_sync_queue_size{namespace=\"$namespace\"}",
+          "legendFormat": "Queue Size",
+          "refId": "A"
+        }
+      ],
+      "title": "Vector Sync Queue Size",
+      "type": "timeseries"
+    }
+  ],
+  "refresh": "30s",
+  "schemaVersion": 38,
+  "style": "dark",
+  "tags": ["nextcloud", "mcp", "observability"],
+  "templating": {
+    "list": [
+      {
+        "current": {
+          "selected": false,
+          "text": "Prometheus",
+          "value": "Prometheus"
+        },
+        "hide": 0,
+        "includeAll": false,
+        "label": "Data Source",
+        "multi": false,
+        "name": "datasource",
+        "options": [],
+        "query": "prometheus",
+        "refresh": 1,
+        "regex": "",
+        "skipUrlSync": false,
+        "type": "datasource"
+      },
+      {
+        "current": {
+          "selected": false,
+          "text": "default",
+          "value": "default"
+        },
+        "datasource": {
+          "type": "prometheus",
+          "uid": "${datasource}"
+        },
+        "definition": "label_values(mcp_http_requests_total, namespace)",
+        "hide": 0,
+        "includeAll": false,
+        "label": "Namespace",
+        "multi": false,
+        "name": "namespace",
+        "options": [],
+        "query": {
+          "query": "label_values(mcp_http_requests_total, namespace)",
+          "refId": "PrometheusVariableQueryEditor-VariableQuery"
+        },
+        "refresh": 1,
+        "regex": "",
+        "skipUrlSync": false,
+        "sort": 0,
+        "type": "query"
+      }
+    ]
+  },
+  "time": {
+    "from": "now-6h",
+    "to": "now"
+  },
+  "timepicker": {},
+  "timezone": "",
+  "title": "Nextcloud MCP Server",
+  "uid": "nextcloud-mcp-server",
+  "version": 1,
+  "weekStart": ""
+}
@@ -5,6 +5,8 @@ metadata:
  labels:
    {{- include "nextcloud-mcp-server.labels" . | nindent 4 }}
 spec:
+  strategy:
+    type: Recreate
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
@@ -56,6 +58,11 @@ spec:
            - name: http
              containerPort: {{ include "nextcloud-mcp-server.port" . }}
              protocol: TCP
+            {{- if .Values.observability.metrics.enabled }}
+            - name: metrics
+              containerPort: {{ .Values.observability.metrics.port }}
+              protocol: TCP
+            {{- end }}
          env:
            # Nextcloud connection
            - name: NEXTCLOUD_HOST
@@ -151,6 +158,11 @@ spec:
            - name: VECTOR_SYNC_QUEUE_MAX_SIZE
              value: {{ .Values.vectorSync.queueMaxSize | quote }}
            {{- end }}
+            # Document Chunking (always set, used by vector sync processor)
+            - name: DOCUMENT_CHUNK_SIZE
+              value: {{ .Values.documentChunking.chunkSize | quote }}
+            - name: DOCUMENT_CHUNK_OVERLAP
+              value: {{ .Values.documentChunking.chunkOverlap | quote }}
            # Qdrant Vector Database
            {{- if eq .Values.qdrant.mode "network" }}
            # Network mode: Use dedicated Qdrant service
@@ -200,6 +212,27 @@ spec:
              value: {{ .Values.openai.baseUrl | quote }}
            {{- end }}
            {{- end }}
+            # Observability
+            - name: METRICS_ENABLED
+              value: {{ .Values.observability.metrics.enabled | quote }}
+            - name: METRICS_PORT
+              value: {{ .Values.observability.metrics.port | quote }}
+            {{- if .Values.observability.tracing.enabled }}
+            - name: OTEL_ENABLED
+              value: "true"
+            - name: OTEL_EXPORTER_OTLP_ENDPOINT
+              value: {{ .Values.observability.tracing.endpoint | quote }}
+            - name: OTEL_SERVICE_NAME
+              value: {{ .Values.observability.tracing.serviceName | quote }}
+            - name: OTEL_TRACES_SAMPLER_ARG
+              value: {{ .Values.observability.tracing.samplingRate | quote }}
+            {{- end }}
+            - name: LOG_FORMAT
+              value: {{ .Values.observability.logging.format | quote }}
+            - name: LOG_LEVEL
+              value: {{ .Values.observability.logging.level | quote }}
+            - name: LOG_INCLUDE_TRACE_CONTEXT
+              value: {{ .Values.observability.logging.includeTraceContext | quote }}
            {{- with .Values.extraEnv }}
            {{- toYaml . | nindent 12 }}
            {{- end }}
@@ -0,0 +1,92 @@
+{{- if and .Values.observability.metrics.enabled .Values.prometheusRule.enabled }}
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: {{ include "nextcloud-mcp-server.fullname" . }}
+  namespace: {{ .Release.Namespace }}
+  labels:
+    {{- include "nextcloud-mcp-server.labels" . | nindent 4 }}
+    {{- with .Values.prometheusRule.labels }}
+    {{- toYaml . | nindent 4 }}
+    {{- end }}
+spec:
+  groups:
+    - name: nextcloud-mcp-server.critical
+      interval: 30s
+      rules:
+        - alert: NextcloudMCPServerDown
+          expr: up{job="{{ include "nextcloud-mcp-server.fullname" . }}"} == 0
+          for: 5m
+          labels:
+            severity: critical
+          annotations:
+            summary: "Nextcloud MCP Server is down"
+            description: "{{ `{{` }} $labels.pod {{ `}}` }} has been down for more than 5 minutes."
+
+        - alert: NextcloudMCPHighErrorRate
+          expr: |
+            sum(rate(mcp_http_requests_total{status_code=~"5..", job="{{ include "nextcloud-mcp-server.fullname" . }}"}[5m]))
+            / sum(rate(mcp_http_requests_total{job="{{ include "nextcloud-mcp-server.fullname" . }}"}[5m])) > 0.05
+          for: 5m
+          labels:
+            severity: critical
+          annotations:
+            summary: "High error rate on Nextcloud MCP Server"
+            description: "Error rate is {{ `{{` }} $value | humanizePercentage {{ `}}` }} (threshold: 5%)"
+
+        - alert: NextcloudMCPHighLatency
+          expr: |
+            histogram_quantile(0.95,
+              sum(rate(mcp_http_request_duration_seconds_bucket{job="{{ include "nextcloud-mcp-server.fullname" . }}"}[5m])) by (le, endpoint)
+            ) > 1
+          for: 5m
+          labels:
+            severity: critical
+          annotations:
+            summary: "High latency on Nextcloud MCP Server"
+            description: "P95 latency is {{ `{{` }} printf \"%.2fs\" $value {{ `}}` }} on {{ `{{` }} $labels.endpoint {{ `}}` }} (threshold: 1s)"
+
+        - alert: NextcloudMCPDependencyDown
+          expr: mcp_dependency_health{job="{{ include "nextcloud-mcp-server.fullname" . }}"} == 0
+          for: 2m
+          labels:
+            severity: critical
+          annotations:
+            summary: "Nextcloud MCP dependency is down"
+            description: "Dependency {{ `{{` }} $labels.dependency {{ `}}` }} has been down for more than 2 minutes."
+
+    - name: nextcloud-mcp-server.warning
+      interval: 30s
+      rules:
+        - alert: NextcloudMCPTokenValidationErrors
+          expr: |
+            sum(rate(mcp_oauth_token_validations_total{result="error", job="{{ include "nextcloud-mcp-server.fullname" . }}"}[10m]))
+            / sum(rate(mcp_oauth_token_validations_total{job="{{ include "nextcloud-mcp-server.fullname" . }}"}[10m])) > 0.01
+          for: 10m
+          labels:
+            severity: warning
+          annotations:
+            summary: "High token validation error rate"
+            description: "Token validation error rate is {{ `{{` }} $value | humanizePercentage {{ `}}` }} (threshold: 1%)"
+
+        - alert: NextcloudMCPVectorSyncQueueHigh
+          expr: mcp_vector_sync_queue_size{job="{{ include "nextcloud-mcp-server.fullname" . }}"} > 100
+          for: 15m
+          labels:
+            severity: warning
+          annotations:
+            summary: "Vector sync queue is high"
+            description: "Vector sync queue size is {{ `{{` }} $value {{ `}}` }} (threshold: 100)"
+
+        - alert: NextcloudMCPQdrantSlowQueries
+          expr: |
+            histogram_quantile(0.95,
+              sum(rate(mcp_db_operation_duration_seconds_bucket{db="qdrant", job="{{ include "nextcloud-mcp-server.fullname" . }}"}[10m])) by (le)
+            ) > 0.5
+          for: 10m
+          labels:
+            severity: warning
+          annotations:
+            summary: "Qdrant queries are slow"
+            description: "P95 Qdrant query latency is {{ `{{` }} printf \"%.2fs\" $value {{ `}}` }} (threshold: 0.5s)"
+{{- end }}
@@ -15,5 +15,11 @@ spec:
      targetPort: http
      protocol: TCP
      name: http
+    {{- if .Values.observability.metrics.enabled }}
+    - port: {{ .Values.observability.metrics.port }}
+      targetPort: metrics
+      protocol: TCP
+      name: metrics
+    {{- end }}
  selector:
    {{- include "nextcloud-mcp-server.selectorLabels" . | nindent 4 }}
@@ -0,0 +1,32 @@
+{{- if and .Values.observability.metrics.enabled .Values.serviceMonitor.enabled }}
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  name: {{ include "nextcloud-mcp-server.fullname" . }}
+  namespace: {{ .Release.Namespace }}
+  labels:
+    {{- include "nextcloud-mcp-server.labels" . | nindent 4 }}
+    {{- with .Values.serviceMonitor.labels }}
+    {{- toYaml . | nindent 4 }}
+    {{- end }}
+spec:
+  selector:
+    matchLabels:
+      {{- include "nextcloud-mcp-server.selectorLabels" . | nindent 6 }}
+  endpoints:
+    - port: metrics
+      path: {{ .Values.observability.metrics.path }}
+      interval: {{ .Values.serviceMonitor.interval }}
+      scrapeTimeout: {{ .Values.serviceMonitor.scrapeTimeout }}
+      scheme: http
+      relabelings:
+        # Add namespace label
+        - sourceLabels: [__meta_kubernetes_namespace]
+          targetLabel: namespace
+        # Add pod label
+        - sourceLabels: [__meta_kubernetes_pod_name]
+          targetLabel: pod
+        # Add service label
+        - sourceLabels: [__meta_kubernetes_service_name]
+          targetLabel: service
+{{- end }}
@@ -168,6 +168,43 @@ securityContext:
  runAsNonRoot: true
  runAsUser: 1000

+# Observability Configuration
+observability:
+  # Prometheus metrics
+  metrics:
+    enabled: true
+    port: 9090
+    path: /metrics
+
+  # OpenTelemetry tracing
+  tracing:
+    enabled: false
+    endpoint: ""  # e.g., "http://opentelemetry-collector:4317"
+    serviceName: "nextcloud-mcp-server"
+    samplingRate: 1.0
+
+  # Logging configuration
+  logging:
+    format: json  # "json" or "text"
+    level: INFO
+    includeTraceContext: true
+
+# Prometheus ServiceMonitor (requires Prometheus Operator)
+serviceMonitor:
+  enabled: false
+  interval: 30s
+  scrapeTimeout: 10s
+  # Additional labels for ServiceMonitor (e.g., for Prometheus selector)
+  # Example: { prometheus: kube-prometheus }
+  labels: {}
+
+# Prometheus alert rules (requires Prometheus Operator)
+prometheusRule:
+  enabled: false
+  # Additional labels for PrometheusRule (e.g., for Prometheus selector)
+  # Example: { prometheus: kube-prometheus }
+  labels: {}
+
 service:
  type: ClusterIP
  port: 8000
@@ -277,6 +314,20 @@ vectorSync:
  # Maximum queue size for documents pending indexing
  queueMaxSize: 10000

+# Document Chunking Configuration
+# Controls how documents are split into chunks before embedding
+# Only relevant when vectorSync.enabled is true
+documentChunking:
+  # Number of words per chunk (default: 512)
+  # Smaller chunks (256-384): Better for precise searches, more chunks to store
+  # Medium chunks (512-768): Balanced approach (recommended for most use cases)
+  # Larger chunks (1024+): Better for context, less precise matching
+  chunkSize: 512
+  # Number of overlapping words between chunks (default: 50)
+  # Recommended: 10-20% of chunkSize for context preservation across boundaries
+  # Must be less than chunkSize
+  chunkOverlap: 50
+
 # Qdrant Vector Database Configuration
 # Three deployment modes available:
 # 1. Local In-Memory: Fast, ephemeral, zero-config (mode: "memory")
@@ -88,20 +88,34 @@ services:
      - VECTOR_SYNC_SCAN_INTERVAL=10
      - VECTOR_SYNC_PROCESSOR_WORKERS=1

+      - LOG_FORMAT=text
+
      # Qdrant configuration (three modes):
      # 1. Network mode: Set QDRANT_URL=http://qdrant:6333 (requires qdrant service)
      # 2. In-memory mode: Set QDRANT_LOCATION=:memory: (default if nothing set)
      # 3. Persistent local: Set QDRANT_LOCATION=/app/data/qdrant (stored in mcp-data volume)
-      - QDRANT_LOCATION=:memory:
-      # - QDRANT_URL=http://qdrant:6333  # Uncomment for network mode
-      # - QDRANT_API_KEY=${QDRANT_API_KEY:-my_secret_api_key}  # Only for network mode
+      - "QDRANT_LOCATION=:memory:"  # In-memory mode for CI/testing (no external service required)
+      #- QDRANT_URL=http://qdrant:6333  # Uncomment for network mode
+      #- QDRANT_API_KEY=${QDRANT_API_KEY:-my_secret_api_key}  # Only for network mode
+
+      # Collection naming: Auto-generated as {deployment-id}-{model-name}
+      # - Deployment ID: OTEL_SERVICE_NAME (if set) or hostname (fallback)
+      # - Model name: OLLAMA_EMBEDDING_MODEL
+      # - Example: "nextcloud-mcp-server-nomic-embed-text"
+      # - Changing models creates new collection (requires re-embedding)
+      # - Set QDRANT_COLLECTION to override auto-generation:
      - QDRANT_COLLECTION=nextcloud_content

      # Ollama configuration (optional - uses SimpleEmbeddingProvider if not set)
-      # - OLLAMA_BASE_URL=http://your-ollama-endpoint:port
-      # - OLLAMA_EMBEDDING_MODEL=nomic-embed-text
+      # - OLLAMA_BASE_URL=https://ollama.internal.coutinho.io:443
+      # - OLLAMA_EMBEDDING_MODEL=nomic-embed-text  # Changing this creates new collection
      # - OLLAMA_VERIFY_SSL=false

+      # Document chunking configuration (for vector embeddings)
+      # Tune these based on your embedding model and content type
+      # - DOCUMENT_CHUNK_SIZE=512      # Words per chunk (default: 512)
+      # - DOCUMENT_CHUNK_OVERLAP=50    # Overlapping words (default: 50, recommended: 10-20% of chunk size)
+
  mcp-oauth:
    build: .
    command: ["--transport", "streamable-http", "--oauth", "--port", "8001", "--oauth-token-type", "jwt"]
@@ -205,7 +219,7 @@ services:
      - keycloak-oauth-storage:/app/.oauth

  qdrant:
-    image: qdrant/qdrant:latest
+    image: qdrant/qdrant:v1.15.5@sha256:0fb8897412abc81d1c0430a899b9a81eb8328aa634e7242d1bc804c1fe8fe863
    restart: always
    ports:
      - 127.0.0.1:6333:6333  # REST API
@@ -0,0 +1,420 @@
+# ADR-010: Webhook-Based Vector Database Synchronization
+
+**Status**: Proposed
+**Date**: 2025-01-10
+**Depends On**: ADR-007 (Background Vector Sync)
+
+## Context
+
+ADR-007 established a background synchronization architecture for maintaining the vector database using periodic polling. The scanner task runs on a configurable interval (default 3600 seconds / 1 hour) to detect changed documents across Nextcloud apps. While this polling approach is simple and reliable, it introduces significant latency between content changes and vector database updates.
+
+### Current Polling Architecture
+
+The existing scanner implementation in `nextcloud_mcp_server/vector/scanner.py` operates as follows:
+
+1. **Periodic Scanning**: The scanner task sleeps for `vector_sync_scan_interval` seconds between runs
+2. **Change Detection**: For each scan, it:
+   - Fetches all documents from Nextcloud (notes, calendar events, etc.)
+   - Queries Qdrant for the last indexed timestamp of each document
+   - Compares modification timestamps to detect changes
+   - Queues changed documents for processing
+3. **Document Processing**: Processor tasks pull from the queue, generate embeddings, and update Qdrant
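
A condensed Python sketch of one scan pass as described above, to show why every run touches every document. The names (`Doc`, `detect_changes`, `scan_once`, `fetch_docs`) are hypothetical and do not mirror the actual code in `scanner.py`; the `indexed` dictionary stands in for the Qdrant timestamp lookup.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    modified: int  # last-modified Unix timestamp reported by Nextcloud


def detect_changes(docs: list[Doc], indexed: dict[str, int]) -> list[Doc]:
    """Documents never indexed, or modified after their last indexing."""
    return [d for d in docs if d.doc_id not in indexed or d.modified > indexed[d.doc_id]]


async def scan_once(
    fetch_docs: Callable[[], Awaitable[list[Doc]]],
    indexed: dict[str, int],
    queue: "asyncio.Queue[Doc]",
) -> int:
    """One polling pass: fetch everything, compare timestamps, queue changes."""
    docs = await fetch_docs()  # full fetch, even when nothing changed
    changed = detect_changes(docs, indexed)
    for doc in changed:
        await queue.put(doc)  # processor tasks consume from this queue
    return len(changed)
```

In the polling design this pass repeats indefinitely, sleeping `vector_sync_scan_interval` seconds between runs, and the cost of `fetch_docs` is paid on every pass regardless of how many documents changed.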
+
+This architecture works but has fundamental limitations:
+
+**Latency**: With a 1-hour scan interval, content changes can take up to 1 hour to appear in semantic search results. For time-sensitive use cases (e.g., "What's on my calendar today?"), this delay is problematic.
+
+**API Load**: Every scan fetches *all* documents for *all* enabled users, regardless of whether anything changed. For large deployments with thousands of documents, this generates significant unnecessary API traffic to Nextcloud.
+
+**Resource Waste**: The scanner consumes compute resources and Nextcloud API capacity on every run, even when no content has changed. During periods of low activity, nearly all of this polling work is wasted.
+
+**Scalability**: As the number of users and documents grows, the time required to complete a full scan increases. Eventually, the scan duration may exceed the scan interval, causing scans to run continuously without idle periods.
+
+**Rate Limiting**: Fetching all documents for all users in rapid succession can trigger Nextcloud's rate limiting, especially on shared hosting environments with restrictive API quotas.
+
+These limitations are inherent to any polling-based architecture. Reducing the scan interval (e.g., to 5 minutes) reduces latency but exacerbates API load, resource waste, and rate limiting issues. The fundamental problem is that the system has no way to know *when* content changes occur—it must repeatedly check to find out.
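+
+A rough cost model makes the trade-off concrete. This sketch assumes every scan issues a fixed number of API requests per user. `requests_per_user_scan` is a hypothetical parameter; real costs depend on each user's document count:
+
+```python
+SECONDS_PER_DAY = 86_400
+
+
+def polling_cost(scan_interval_s: int, users: int, requests_per_user_scan: int) -> dict[str, float]:
+    """Estimate polling overhead for a given scan interval."""
+    scans_per_day = SECONDS_PER_DAY / scan_interval_s
+    return {
+        # A change made just after a scan waits a full interval (plus scan time)
+        "worst_case_latency_s": float(scan_interval_s),
+        "scans_per_day": scans_per_day,
+        "api_requests_per_day": scans_per_day * users * requests_per_user_scan,
+    }
+
+
+hourly = polling_cost(3600, users=100, requests_per_user_scan=10)
+five_min = polling_cost(300, users=100, requests_per_user_scan=10)
+print(hourly["api_requests_per_day"], five_min["api_requests_per_day"])  # 24000.0 288000.0
+```
+
+Cutting the interval by 12x cuts worst-case latency by 12x, but it multiplies daily API requests by the same factor, even when nothing has changed.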
+
+### Nextcloud Webhook Listeners
+
+Nextcloud provides a webhook_listeners app (bundled with Nextcloud 30+) that enables push-based change notifications. Instead of polling for changes, external services can register webhook endpoints and receive HTTP POST requests when specific events occur. Administrators register these webhooks using Nextcloud's OCS API or occ commands.
+
+The webhook_listeners app supports events for all Nextcloud apps relevant to this MCP server's vector database:
+
+**Files/Notes Events** (notes are stored as files):
+- `OCP\Files\Events\Node\NodeCreatedEvent`
+- `OCP\Files\Events\Node\NodeWrittenEvent`
+- `OCP\Files\Events\Node\NodeDeletedEvent`
+- `OCP\Files\Events\Node\NodeRenamedEvent`
+- `OCP\Files\Events\Node\NodeCopiedEvent`
+
+**Calendar Events**:
+- `OCP\Calendar\Events\CalendarObjectCreatedEvent`
+- `OCP\Calendar\Events\CalendarObjectUpdatedEvent`
+- `OCP\Calendar\Events\CalendarObjectDeletedEvent`
+- `OCP\Calendar\Events\CalendarObjectMovedEvent`
+
+**Tables Events**:
+- `OCA\Tables\Event\RowAddedEvent`
+- `OCA\Tables\Event\RowUpdatedEvent`
+- `OCA\Tables\Event\RowDeletedEvent`
+
+**Deck Events**: no dedicated Deck events are listed; Deck changes are covered via file events, since cards are stored as files in some configurations.
+
+Each webhook notification includes rich metadata:
+- User ID who triggered the event
+- Timestamp of the event
+- Document ID and metadata
+- Operation type (create, update, delete)
+- Path information (for files)
+
+Webhook notifications are dispatched asynchronously via Nextcloud background jobs, so delivery latency depends on how background jobs are run. Administrators can set up dedicated webhook worker processes to achieve near-real-time delivery (within seconds of the triggering event).
+
+### Why Not Replace Polling Entirely?
+
+While webhooks provide superior latency and efficiency, they cannot fully replace polling:
+
+**Missed Events**: If the MCP server is down when a webhook fires, the notification is lost. Nextcloud's background job system processes webhooks asynchronously, but does not queue failed deliveries indefinitely.
+
+**Administrator Setup**: Webhooks must be registered by Nextcloud administrators using the OCS API or occ commands. This is an optional optimization that administrators can enable when they want to reduce polling frequency.
+
+**Filter Configuration**: Webhook filters must be carefully configured to avoid notification floods. A poorly configured filter could send thousands of notifications for bulk operations (e.g., importing a calendar with hundreds of events).
+
+**Graceful Degradation**: In environments where webhooks are not configured, the system continues using polling without any degradation in functionality.
+
+**Deletion Detection**: Nextcloud's webhook system does not guarantee delivery of deletion events if the user's account is removed or the app is uninstalled. Periodic polling provides a safety mechanism to detect orphaned documents.
+
+A complementary architecture where webhooks supplement (but don't replace) polling provides low-latency updates when configured, with polling ensuring reliability.
+
+### Design Considerations
+
+**Push vs Pull Trade-offs**:
+Webhooks introduce new failure modes (network issues, endpoint unavailability, notification floods) that polling avoids. The webhook endpoint must handle failures gracefully without blocking semantic search functionality.
+
+**Webhook Endpoint Security**:
+The MCP server exposes an HTTP endpoint to receive webhooks. Authentication is optional—in production deployments, administrators can configure Nextcloud to send an `Authorization` header that the MCP server validates. For local development, authentication can be disabled for simplicity.
+
+**Idempotency**:
+The system may receive duplicate notifications (webhook + next scan) or out-of-order notifications (update fires before create completes). Document processing must be idempotent—processing the same document multiple times produces the same result.
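+
+One way to get idempotent writes is to derive Qdrant point IDs deterministically from the document identity. Re-processing a document then upserts over the same points instead of adding duplicates. This is a sketch under assumptions: the key format and the namespace UUID below are hypothetical, and the approach relies on Qdrant accepting UUID strings as point IDs:
+
+```python
+import uuid
+
+# Hypothetical fixed namespace; any constant UUID works as long as it never changes
+POINT_NAMESPACE = uuid.UUID("6f1c1d2e-9a4b-4c3d-8e2f-1a2b3c4d5e6f")
+
+
+def point_id(user_id: str, doc_type: str, doc_id: str, chunk_index: int) -> str:
+    """Return the same UUID every time for the same document chunk."""
+    name = f"{user_id}:{doc_type}:{doc_id}:{chunk_index}"
+    return str(uuid.uuid5(POINT_NAMESPACE, name))
+
+
+# A webhook and a later scan that both process note 123 produce identical IDs
+assert point_id("alice", "note", "123", 0) == point_id("alice", "note", "123", 0)
+assert point_id("alice", "note", "123", 0) != point_id("alice", "note", "123", 1)
+```
+
+If an edit makes a document shorter, its old higher-index chunks would survive an upsert. A processor using this scheme would therefore also need to delete the document's existing points (for example, via a payload filter on user and document ID) before writing the new ones.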
+
+**Asynchronous Processing**:
+Nextcloud processes webhooks via background jobs, introducing delivery latency (typically seconds to minutes depending on background job configuration). This affects testing strategies—integration tests cannot rely on immediate webhook delivery.
+
+**Deployment Patterns**:
+The MCP server webhook endpoint is accessible at the same host/port as the MCP server itself. Administrators configure Nextcloud to POST to `https://<mcp-server-host>:<port>/webhooks/nextcloud` when registering webhook listeners.
+
+## Decision
+
+We will add a webhook endpoint to the MCP server that receives change notifications from Nextcloud and queues documents for vector database processing. This complements the existing polling architecture from ADR-007 without replacing it—webhooks provide low-latency updates when configured, while polling ensures reliability regardless of webhook availability.
+
+The architecture is intentionally simple: the webhook endpoint is just another producer of `DocumentTask` objects that feed into the existing processor queue. The scanner task, processor pool, and queue management remain unchanged from ADR-007.
+
+### Architecture Components
+
+**1. Webhook Endpoint**
+
+A new Starlette HTTP route will be added to receive webhook notifications from Nextcloud:
+
+```python
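+import hmac
+import logging
+
+import anyio
+from starlette.requests import Request
+from starlette.responses import JSONResponse
+from starlette.routing import Route
+
+# Provided elsewhere in the server: `settings` (with `webhook_secret`),
+# `webhook_send_stream` (assumed to be the anyio MemoryObjectSendStream that
+# feeds the processors), and `extract_document_task` (see Event Parsing).
+logger = logging.getLogger(__name__)
+
+
+async def handle_nextcloud_webhook(request: Request) -> JSONResponse:
+    """
+    Receive webhook notifications from Nextcloud.
+
+    Parses event payload, extracts document metadata, and queues
+    changed documents for processing using the same queue as the scanner.
+    """
+    # 1. Optional authentication (constant-time comparison avoids timing leaks)
+    if settings.webhook_secret:
+        auth_header = request.headers.get("authorization", "")
+        expected = f"Bearer {settings.webhook_secret}"
+        if not hmac.compare_digest(auth_header.encode(), expected.encode()):
+            logger.warning("Webhook authentication failed")
+            return JSONResponse(
+                {"status": "error", "message": "Unauthorized"},
+                status_code=401,
+            )
+
+    # 2. Parse the payload; invalid JSON or missing fields -> 400
+    try:
+        payload = await request.json()
+        event_class = payload["event"]["class"]
+        doc_task = extract_document_task(event_class, payload)
+    except (ValueError, KeyError, TypeError) as e:
+        logger.warning("Failed to parse webhook payload: %s", e)
+        return JSONResponse(
+            {"status": "error", "message": "Invalid payload"},
+            status_code=400,
+        )
+
+    # 3. Events we don't index are acknowledged, not treated as errors
+    if doc_task is None:
+        logger.info("Ignoring webhook for unsupported event: %s", event_class)
+        return JSONResponse({"status": "ignored", "reason": "unsupported event"})
+
+    # 4. Hand off to the processor queue (same queue as the scanner) without
+    #    waiting: send_nowait raises WouldBlock instead of blocking when full
+    try:
+        webhook_send_stream.send_nowait(doc_task)
+    except anyio.WouldBlock:
+        logger.error("Failed to queue webhook document: queue full")
+        return JSONResponse(
+            {"status": "error", "message": "Queue full"},
+            status_code=500,
+        )
+    logger.info("Queued document from webhook: %s", doc_task)
+    return JSONResponse({"status": "queued"})
+
+
+# Starlette's @app.route decorator is deprecated; register the route explicitly
+webhook_routes = [
+    Route("/webhooks/nextcloud", handle_nextcloud_webhook, methods=["POST"]),
+]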
+```
+
+The endpoint:
+- Validates optional authentication via `Authorization: Bearer <secret>` header
+- Parses various event types (calendar, files, tables) into `DocumentTask` objects
+- Sends to the same processing queue that the scanner uses
+- Returns quickly (<50ms) to avoid blocking Nextcloud's webhook workers
+- Handles errors gracefully (invalid payload, queue full, etc.)
+
+**2. Webhook Registration Helper (Development Only)**
+
+For development and testing purposes, a helper method will be added to `NextcloudClient` for registering webhooks via the OCS API. This is NOT exposed as an MCP tool—administrators register webhooks manually using Nextcloud's admin interface or the OCS API directly.
+
+```python
+class NextcloudClient:
+    async def register_webhook(
+        self,
+        event_type: str,
+        uri: str,
+        http_method: str = "POST",
+        auth_method: str = "none",
+        headers: dict[str, str] | None = None,
+    ) -> dict:
+        """
+        Register a webhook with Nextcloud (requires admin credentials).
+
+        Used for development/testing. Production admins should register
+        webhooks using Nextcloud's admin UI or occ commands.
+        """
+        # Implementation uses OCS API: POST /ocs/v2.php/apps/webhook_listeners/api/v1/webhooks
+        ...
+```
+
+This keeps webhook registration out of the MCP tool surface while providing a convenient API for integration tests.
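+
+The body of such a registration request might be assembled as below. This is a sketch, not the helper's implementation: the JSON field names (`httpMethod`, `uri`, `event`, `authMethod`, `headers`, `eventFilter`) are assumptions to verify against the webhook_listeners OCS API documentation, and `build_webhook_registration` is a hypothetical name:
+
+```python
+OCS_WEBHOOKS_PATH = "/ocs/v2.php/apps/webhook_listeners/api/v1/webhooks"
+
+
+def build_webhook_registration(
+    event_type: str,
+    uri: str,
+    http_method: str = "POST",
+    auth_method: str = "none",
+    headers: dict[str, str] | None = None,
+    event_filter: dict | None = None,
+) -> dict:
+    """Build the JSON body for registering one webhook listener (assumed field names)."""
+    body: dict = {
+        "httpMethod": http_method,
+        "uri": uri,
+        "event": event_type,
+        "authMethod": auth_method,
+    }
+    # Optional fields are only sent when provided
+    if headers:
+        body["headers"] = headers
+    if event_filter:
+        body["eventFilter"] = event_filter
+    return body
+
+
+body = build_webhook_registration(
+    "OCP\\Files\\Events\\Node\\NodeWrittenEvent",
+    "https://mcp.example.com/webhooks/nextcloud",
+    headers={"Authorization": "Bearer <WEBHOOK_SECRET>"},
+)
+```
+
+The body would then be POSTed to `OCS_WEBHOOKS_PATH` on the Nextcloud host with admin credentials and the `OCS-APIRequest: true` header that OCS endpoints expect. Because each body names a single event class, every event in the lists above needs its own registration.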
+
+**3. Event Parsing**
+
+A helper function extracts `DocumentTask` from various Nextcloud event types:
+
+```python
+# `DocumentTask` is the task type from the ADR-007 vector sync code.
+def extract_document_task(event_class: str, payload: dict) -> DocumentTask | None:
+    """Extract DocumentTask from webhook event payload."""
+    user_id = payload["user"]["uid"]
+    event_data = payload["event"]
+
+    # File/Note events: only markdown files (notes) are indexed
+    if event_class.endswith(("NodeCreatedEvent", "NodeWrittenEvent", "NodeDeletedEvent")):
+        path = event_data["node"]["path"]
+        if not path.endswith(".md"):
+            return None
+        return DocumentTask(
+            user_id=user_id,
+            doc_id=str(event_data["node"]["id"]),
+            doc_type="note",
+            operation="delete" if event_class.endswith("NodeDeletedEvent") else "index",
+            modified_at=payload["time"],
+        )
+
+    # Calendar create/update events carry the object's last-modified time
+    if event_class.endswith(("CalendarObjectCreatedEvent", "CalendarObjectUpdatedEvent")):
+        return DocumentTask(
+            user_id=user_id,
+            doc_id=str(event_data["objectData"]["id"]),
+            doc_type="calendar_event",
+            operation="index",
+            modified_at=event_data["objectData"]["lastmodified"],
+        )
+
+    # Calendar deletion: the object is gone, so use the event time
+    if event_class.endswith("CalendarObjectDeletedEvent"):
+        return DocumentTask(
+            user_id=user_id,
+            doc_id=str(event_data["objectData"]["id"]),
+            doc_type="calendar_event",
+            operation="delete",
+            modified_at=payload["time"],
+        )
+
+    # Tables row events would be handled the same way (see the mapping table)
+    return None  # Unsupported event type
+```
+
+**4. No Changes to Scanner or Processors**
+
+The existing scanner task from ADR-007 continues operating unchanged. It polls Nextcloud on its configured interval (`VECTOR_SYNC_SCAN_INTERVAL`), discovers changed documents, and queues them for processing. The scanner is unaware of webhooks—it simply adds `DocumentTask` objects to the queue.
+
+Similarly, the processor pool continues pulling `DocumentTask` objects from the queue, generating embeddings, and updating Qdrant. Processors don't know or care whether a task came from the scanner or a webhook.
+
+This design keeps concerns separated: webhooks and scanner are independent producers, processors are independent consumers, and the queue mediates between them.
+
+### Configuration
+
+A new optional environment variable controls webhook authentication:
+
+```bash
+# Optional: Shared secret for webhook authentication
+# If set, webhooks must include "Authorization: Bearer <secret>" header
+# If unset, no authentication is required (useful for local development)
+WEBHOOK_SECRET=<generate-random-secret>
+```
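+
+Any sufficiently long random string works as the secret. A minimal way to generate one with Python's standard library:
+
+```python
+import secrets
+
+
+def generate_webhook_secret(num_bytes: int = 32) -> str:
+    """Return a URL-safe random string (43 characters for 32 bytes)."""
+    return secrets.token_urlsafe(num_bytes)
+
+
+if __name__ == "__main__":
+    print(f"WEBHOOK_SECRET={generate_webhook_secret()}")
+```
+
+The same value must be configured both in the MCP server environment and in the `Authorization` header of the Nextcloud webhook registration.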
+
+The webhook endpoint is automatically available at `/webhooks/nextcloud` when the MCP server starts. No feature flags or additional configuration needed—if Nextcloud sends webhooks to this endpoint, they will be processed.
+
+**Reducing Polling Frequency**: Administrators who configure webhooks may want to reduce polling frequency to minimize API load while maintaining safety reconciliation scans:
+
+```bash
+# Increase scan interval from 1 hour (default) to 24 hours
+VECTOR_SYNC_SCAN_INTERVAL=86400
+```
+
+This is a manual configuration decision, not automatic—the scanner doesn't adapt based on webhook availability.
+
+### Webhook Event Mapping
+
+The webhook handler maps Nextcloud events to document types:
+
+| Nextcloud Event | Document Type | Operation |
+|----------------|---------------|-----------|
+| `NodeCreatedEvent` (path: `*/files/*.md`) | `note` | `index` |
+| `NodeWrittenEvent` (path: `*/files/*.md`) | `note` | `index` |
+| `NodeDeletedEvent` (path: `*/files/*.md`) | `note` | `delete` |
+| `CalendarObjectCreatedEvent` | `calendar_event` | `index` |
+| `CalendarObjectUpdatedEvent` | `calendar_event` | `index` |
+| `CalendarObjectDeletedEvent` | `calendar_event` | `delete` |
+| `RowAddedEvent` | `table_row` | `index` |
+| `RowUpdatedEvent` | `table_row` | `index` |
+| `RowDeletedEvent` | `table_row` | `delete` |
+
+Path filters in webhook registration ensure only relevant files trigger notifications (e.g., exclude `.jpg`, `.mp4` for file events).
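+
+Because the mapping is a fixed table, a handler can also encode it as data rather than as a chain of conditionals, and re-check the note path on its side. This is a sketch under two assumptions: events are matched on the last segment of the PHP class name, and `fnmatchcase` glob semantics are acceptable for the path filter (its `*` also matches `/`, so nested folders under `files/` are included). `classify_event` and `EVENT_MAP` are hypothetical names:
+
+```python
+from fnmatch import fnmatchcase
+
+# Short event class name -> (document type, operation), mirroring the table above
+EVENT_MAP: dict[str, tuple[str, str]] = {
+    "NodeCreatedEvent": ("note", "index"),
+    "NodeWrittenEvent": ("note", "index"),
+    "NodeDeletedEvent": ("note", "delete"),
+    "CalendarObjectCreatedEvent": ("calendar_event", "index"),
+    "CalendarObjectUpdatedEvent": ("calendar_event", "index"),
+    "CalendarObjectDeletedEvent": ("calendar_event", "delete"),
+    "RowAddedEvent": ("table_row", "index"),
+    "RowUpdatedEvent": ("table_row", "index"),
+    "RowDeletedEvent": ("table_row", "delete"),
+}
+NOTE_PATH_PATTERN = "*/files/*.md"
+
+
+def classify_event(event_class: str, path: str | None = None) -> tuple[str, str] | None:
+    """Return (doc_type, operation), or None if the event should be ignored."""
+    short_name = event_class.rsplit("\\", 1)[-1]  # keep the part after the last backslash
+    mapping = EVENT_MAP.get(short_name)
+    if mapping is None:
+        return None
+    # File events only count as notes when the path matches the note pattern
+    if mapping[0] == "note" and (path is None or not fnmatchcase(path, NOTE_PATH_PATTERN)):
+        return None
+    return mapping
+
+
+print(classify_event("OCP\\Files\\Events\\Node\\NodeCreatedEvent", "/alice/files/test.md"))
+# ('note', 'index')
+```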
+
+### Administrator Setup
+
+Administrators who want to enable webhooks:
+
+1. **Enable webhook_listeners app** in Nextcloud: `occ app:enable webhook_listeners`
+2. **Register webhook endpoints** using Nextcloud's OCS API or admin UI:
+   - Endpoint: `https://<mcp-server-host>:<port>/webhooks/nextcloud`
+   - Events: File created/updated/deleted, Calendar object events, Table row events
+   - Filters: Exclude non-content files (images, videos), system directories
+   - Optional: Configure `Authorization: Bearer <WEBHOOK_SECRET>` header
+3. **Optionally reduce scanner frequency**: Set `VECTOR_SYNC_SCAN_INTERVAL=86400` (24 hours)
+4. **Set up webhook workers** (optional): Configure dedicated background job workers for low-latency delivery
+
+Existing deployments continue using polling without any changes. Webhooks are purely additive.
+
+## Consequences
+
+### Benefits
+
+**Reduced Latency**: With webhooks configured, content changes appear in semantic search within seconds to minutes (depending on Nextcloud background job configuration) instead of up to 1 hour. Queries like "What meetings do I have today?" reflect recent calendar updates.
+
+**Lower API Load**: Administrators who configure webhooks can reduce scanner frequency (e.g., 24-hour intervals), eliminating most polling API calls while maintaining safety reconciliation scans. This significantly reduces load on Nextcloud servers.
+
+**Better Scalability**: Webhooks scale better than polling as content volume grows. The system only processes changed documents instead of checking all documents every hour.
+
+**Simple Architecture**: The webhook endpoint is just another producer feeding the existing processor queue. No changes to scanner, processors, or queue management—webhooks integrate cleanly into the existing architecture.
+
+**Improved User Experience**: Lower-latency semantic search feels more responsive and accurate, especially for time-sensitive queries about recent changes.
+
+### Drawbacks
+
+**Manual Configuration**: Administrators must configure webhooks outside the MCP server using Nextcloud's admin tools. This adds setup complexity compared to the zero-configuration polling approach.
+
+**Deployment Requirements**: Webhooks require the MCP server to be reachable from Nextcloud via HTTP(S). Deployments behind NAT or with restrictive firewalls may not support webhooks without additional networking configuration.
+
+**Asynchronous Delivery**: Nextcloud processes webhooks via background jobs, introducing delivery latency (typically seconds to minutes). The exact latency depends on background job worker configuration and system load.
+
+**Testing Complexity**: Integration tests cannot rely on immediate webhook delivery due to asynchronous background job processing. Tests must either poll for results or mock webhook delivery directly.
+
+**New Failure Modes**: Webhook endpoint downtime, network issues between Nextcloud and MCP server, webhook notification floods from bulk operations. The system must handle these gracefully.
+
+**Version Dependencies**: The webhook_listeners app requires Nextcloud 30+. Older versions continue using polling exclusively.
+
+### Monitoring and Observability
+
+New metrics track webhook performance:
+
+- `webhook_notifications_received_total{event_type}`: Count of webhook notifications by event type
+- `webhook_processing_duration_seconds{event_type}`: Webhook handler latency
+- `webhook_errors_total{error_type}`: Failed webhook processing by error type (auth failure, parse error, queue full)
+
+Logs include:
+- Successful webhook processing: `Queued document from webhook: DocumentTask(...)`
+- Webhook authentication failures: `Webhook authentication failed`
+- Parse errors: `Failed to parse webhook payload: ...`
+- Unsupported events: `Ignoring webhook for unsupported event: ...`
+
+### Security Considerations
+
+**Optional Authentication**: When `WEBHOOK_SECRET` is configured, webhook requests must include `Authorization: Bearer <WEBHOOK_SECRET>` header. The server validates this before processing to prevent unauthorized document queueing. For local development, authentication can be disabled by leaving `WEBHOOK_SECRET` unset.
+
+**Payload Validation**: Webhook payloads are parsed and validated against expected schemas. Malformed payloads are rejected with 400 Bad Request responses.
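+
+A minimal sketch of such validation, covering only the fields the handler reads (`user.uid`, `event.class`, and an integer `time` when present). The function name and the exact rules are assumptions; a real schema would also check event-specific fields:
+
+```python
+def validate_webhook_payload(payload: object) -> list[str]:
+    """Return a list of problems; an empty list means the payload is usable."""
+    if not isinstance(payload, dict):
+        return ["payload must be a JSON object"]
+    errors: list[str] = []
+    user = payload.get("user")
+    if not isinstance(user, dict) or not isinstance(user.get("uid"), str):
+        errors.append("user.uid must be a string")
+    event = payload.get("event")
+    if not isinstance(event, dict) or not isinstance(event.get("class"), str):
+        errors.append("event.class must be a string")
+    # bool is a subclass of int, so exclude it explicitly
+    if "time" in payload and (
+        not isinstance(payload["time"], int) or isinstance(payload["time"], bool)
+    ):
+        errors.append("time must be an integer Unix timestamp")
+    return errors
+```
+
+A handler would return 400 Bad Request with these messages whenever the list is non-empty, before attempting to build a `DocumentTask`.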
+
+**No Scope Enforcement**: Unlike MCP tools, webhooks do not enforce progressive consent or check if users have enabled semantic search. Webhooks queue all document changes—administrators control which events trigger webhooks via Nextcloud filters. This keeps the webhook endpoint simple and stateless.
+
+### Testing Strategy
+
+**Unit Tests**: Test webhook handler logic, event parsing, and authentication validation using mocked payloads:
+
+```python
+async def test_webhook_endpoint_parses_note_created_event():
+    """Unit test: webhook endpoint extracts DocumentTask from note created event."""
+    payload = {
+        "user": {"uid": "alice"},
+        "time": 1704067200,
+        "event": {
+            "class": "OCP\\Files\\Events\\Node\\NodeCreatedEvent",
+            "node": {"id": "123", "path": "/alice/files/test.md"}
+        }
+    }
+    # Mock send_stream and verify DocumentTask is queued
+    ...
+```
+
+**Integration Tests (Without Real Webhooks)**: Since Nextcloud processes webhooks asynchronously via background jobs, integration tests should NOT rely on triggering real Nextcloud events and waiting for webhook delivery. Instead, tests should:
+
+1. **Mock webhook delivery**: POST webhook payloads directly to the `/webhooks/nextcloud` endpoint
+2. **Verify processing**: Check that documents are queued and eventually appear in Qdrant
+3. **Test authentication**: Verify requests without valid auth header are rejected (when `WEBHOOK_SECRET` is set)
+
+```python
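+import asyncio
+
+
+async def test_webhook_integration_mocked_delivery(client, qdrant_client, note_created_payload):
+    """Integration test: webhook handler queues document for processing.
+
+    `client`, `qdrant_client` and `note_created_payload` are assumed to be
+    pytest fixtures (an HTTP client for the app, an async Qdrant client, and
+    a payload like the one in the unit test above).
+    """
+    # POST webhook payload directly to endpoint (bypass Nextcloud)
+    response = await client.post("/webhooks/nextcloud", json=note_created_payload)
+    assert response.status_code == 200
+
+    # Poll until the processor has indexed the document instead of sleeping a
+    # fixed time, which is flaky under load (asyncio.timeout needs Python 3.11+)
+    async with asyncio.timeout(30):
+        while True:
+            points, _next_offset = await qdrant_client.scroll(...)
+            if points:
+                break
+            await asyncio.sleep(0.5)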
+```
+
+**Manual Testing (Real Webhooks)**: For end-to-end validation with real Nextcloud webhook delivery:
+
+1. Register webhook via OCS API or `NextcloudClient.register_webhook()` helper
+2. Configure webhook background job workers for low-latency delivery
+3. Trigger Nextcloud events (create note, add calendar event)
+4. Monitor MCP server logs for webhook delivery
+5. Verify documents appear in Qdrant after background job processing
+
+**Failure Mode Tests**:
+- Invalid authentication: Verify 401 response when auth header is missing/incorrect
+- Malformed payload: Verify 400 response for invalid JSON or missing required fields
+- Unsupported event types: Verify graceful handling (ignored, not error)
+- Queue full: Verify 500 response with appropriate error message
+
+### Future Enhancements
+
+**Batch Processing**: Group multiple webhook notifications within a short time window (e.g., 5 seconds) into a single batch before queueing. This reduces processor overhead during bulk operations like importing calendars.
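+
+A sketch of the collection step follows. It assumes an `asyncio.Queue` as the intake (the server's own queue primitives from ADR-007 may differ) and uses a hypothetical `collect_batch` helper: wait for the first notification, then keep collecting until the window closes or the batch is full.
+
+```python
+import asyncio
+
+
+async def collect_batch(queue: asyncio.Queue, window_s: float = 5.0, max_items: int = 100) -> list:
+    """Block for one item, then gather more until `window_s` elapses or `max_items` is reached."""
+    batch = [await queue.get()]
+    loop = asyncio.get_running_loop()
+    deadline = loop.time() + window_s
+    while len(batch) < max_items:
+        remaining = deadline - loop.time()
+        if remaining <= 0:
+            break
+        try:
+            batch.append(await asyncio.wait_for(queue.get(), timeout=remaining))
+        except asyncio.TimeoutError:
+            break  # window closed with no further notifications
+    return batch
+```
+
+Capping the batch size keeps a bulk import from holding one oversized batch open. Collapsing duplicate `(user_id, doc_type, doc_id)` entries within a batch would further reduce work.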
+
+**Webhook Payload Optimization**: For large documents, Nextcloud could be configured to send minimal metadata in webhooks (just user_id, doc_id, doc_type), with processors fetching full content lazily. This reduces webhook payload size and network bandwidth.
+
+**Deduplication Window**: Track recently processed documents (last 5 minutes) to avoid redundant work when webhooks and scanner both detect the same change. The processor can check a simple in-memory cache before fetching document content.
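+
+A minimal sketch of that cache, using a hypothetical `RecentlyProcessed` class with an injectable clock for testing. Including the modification timestamp in the key means a genuinely new edit is never skipped. Entries are recorded only after successful processing, so failed attempts can still be retried:
+
+```python
+import time
+from collections.abc import Callable
+
+Key = tuple[str, str, str, int]  # (user_id, doc_type, doc_id, modified_at)
+
+
+class RecentlyProcessed:
+    """In-memory record of documents processed within the last `ttl_s` seconds."""
+
+    def __init__(self, ttl_s: float = 300.0, clock: Callable[[], float] = time.monotonic) -> None:
+        self._ttl_s = ttl_s
+        self._clock = clock
+        self._seen: dict[Key, float] = {}
+
+    def _prune(self, now: float) -> None:
+        # Drop expired entries so memory is bounded by recent activity
+        self._seen = {k: t for k, t in self._seen.items() if now - t < self._ttl_s}
+
+    def should_skip(self, key: Key) -> bool:
+        """True if this exact document version was processed within the TTL."""
+        self._prune(self._clock())
+        return key in self._seen
+
+    def mark_processed(self, key: Key) -> None:
+        self._seen[key] = self._clock()
+```
+
+Because the cache lives in one process, it only deduplicates work within a single server instance.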
+
+## References
+
+- ADR-007: Background Vector Database Synchronization (polling architecture)
+- Nextcloud Administration Manual: Webhook Listeners (`admin_manual/webhook_listeners/index.rst` in the nextcloud/documentation repository)
+- Nextcloud OCS API: Webhook registration endpoint
+- Current scanner implementation: `nextcloud_mcp_server/vector/scanner.py:37`
@@ -178,6 +178,111 @@ VECTOR_SYNC_ENABLED=true
 - Requires separate Qdrant service
 - More complex deployment

+### Qdrant Collection Naming
+
+Collection names are automatically generated to include the embedding model, ensuring safe model switching and preventing dimension mismatches.
+
+#### Auto-Generated Naming (Default)
+
+**Format:** `{deployment-id}-{model-name}`
+
+**Components:**
+- **Deployment ID:** `OTEL_SERVICE_NAME` (if configured) or `hostname` (fallback)
+- **Model name:** `OLLAMA_EMBEDDING_MODEL`
+
+**Examples:**
+
+```bash
+# With OTEL service name configured
+OTEL_SERVICE_NAME=my-mcp-server
+OLLAMA_EMBEDDING_MODEL=nomic-embed-text
+# → Collection: "my-mcp-server-nomic-embed-text"
+
+# Simple Docker deployment (OTEL not configured)
+# hostname=mcp-container
+OLLAMA_EMBEDDING_MODEL=all-minilm
+# → Collection: "mcp-container-all-minilm"
+```
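+
+The naming rule can be expressed as a small function. This sketch mirrors the precedence described in this section: an explicit `QDRANT_COLLECTION` wins, then `OTEL_SERVICE_NAME`, then the hostname. The function name is hypothetical, and any character sanitization the server performs is not shown:
+
+```python
+import os
+import socket
+from collections.abc import Mapping
+
+
+def collection_name(env: Mapping[str, str] | None = None) -> str:
+    """Resolve the Qdrant collection name from environment variables."""
+    env = os.environ if env is None else env
+    if env.get("QDRANT_COLLECTION"):
+        return env["QDRANT_COLLECTION"]  # explicit override bypasses auto-generation
+    deployment_id = env.get("OTEL_SERVICE_NAME") or socket.gethostname()
+    model = env.get("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text")
+    return f"{deployment_id}-{model}"
+
+
+print(collection_name({"OTEL_SERVICE_NAME": "my-mcp-server",
+                       "OLLAMA_EMBEDDING_MODEL": "nomic-embed-text"}))
+# my-mcp-server-nomic-embed-text
+```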
+
+#### Switching Embedding Models
+
+When you change `OLLAMA_EMBEDDING_MODEL`, a new collection is automatically created:
+
+```bash
+# Initial setup
+OLLAMA_EMBEDDING_MODEL=nomic-embed-text
+# Collection: "my-server-nomic-embed-text" (768 dimensions)
+
+# Change model
+OLLAMA_EMBEDDING_MODEL=all-minilm
+# Collection: "my-server-all-minilm" (384 dimensions)
+# → New collection created, full re-embedding occurs
+```
+
+**Important:**
+- **Collections are mutually exclusive** - vectors cannot be shared between different embedding models
+- **Switching models requires re-embedding** all documents (may take time for large note collections)
+- **Old collection remains** in Qdrant and can be deleted manually if no longer needed
+
+#### Explicit Override
+
+Set `QDRANT_COLLECTION` to use a specific collection name:
+
+```bash
+QDRANT_COLLECTION=my-custom-collection  # Bypasses auto-generation
+```
+
+**Use cases:**
+- Backward compatibility with existing deployments
+- Custom naming schemes
+- Sharing a collection across deployments (advanced)
+
+#### Multi-Server Deployments
+
+Each server should have a unique deployment ID to avoid collection collisions:
+
+```bash
+# Server 1 (Production)
+OTEL_SERVICE_NAME=mcp-prod
+OLLAMA_EMBEDDING_MODEL=nomic-embed-text
+# → Collection: "mcp-prod-nomic-embed-text"
+
+# Server 2 (Staging)
+OTEL_SERVICE_NAME=mcp-staging
+OLLAMA_EMBEDDING_MODEL=nomic-embed-text
+# → Collection: "mcp-staging-nomic-embed-text"
+
+# Server 3 (Different model)
+OTEL_SERVICE_NAME=mcp-experimental
+OLLAMA_EMBEDDING_MODEL=bge-large
+# → Collection: "mcp-experimental-bge-large"
+```
+
+**Benefits:**
+- Multiple MCP servers can share one Qdrant instance safely
+- No naming collisions between deployments
+- Clear collection ownership (can see which deployment and model)
+
+#### Dimension Validation
+
+The server validates collection dimensions on startup:
+
+```
+Dimension mismatch for collection 'my-server-nomic-embed-text':
+  Expected: 384 (from embedding model 'all-minilm')
+  Found: 768
+This usually means you changed the embedding model.
+Solutions:
+  1. Delete the old collection: Collection will be recreated with new dimensions
+  2. Set QDRANT_COLLECTION to use a different collection name
+  3. Revert OLLAMA_EMBEDDING_MODEL to the original model
+```
+
+**What this prevents:**
+- Runtime errors from dimension mismatches
+- Data corruption in Qdrant
+- Confusing error messages during indexing
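+
+The check itself amounts to comparing two integers before any indexing starts. In this sketch with hypothetical names, `expected` is the embedding model's output size (for example, the length of a probe embedding). `found` is the existing collection's vector size. With qdrant-client, that size could be read from `get_collection(name).config.params.vectors.size` when the collection uses a single unnamed vector:
+
+```python
+class DimensionMismatchError(RuntimeError):
+    """Raised at startup when a collection cannot hold the model's vectors."""
+
+
+def check_collection_dimension(collection: str, model: str, expected: int, found: int | None) -> None:
+    """Fail fast on a mismatch; `found` is None when the collection does not exist yet."""
+    if found is None or found == expected:
+        return  # new collection will be created with `expected`, or sizes agree
+    raise DimensionMismatchError(
+        f"Dimension mismatch for collection '{collection}':\n"
+        f"  Expected: {expected} (from embedding model '{model}')\n"
+        f"  Found: {found}"
+    )
+
+
+try:
+    check_collection_dimension("my-server-nomic-embed-text", "all-minilm", 384, 768)
+except DimensionMismatchError as exc:
+    print(exc)
+```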
+
 ### Vector Sync Configuration

 Control background indexing behavior:
@@ -188,6 +293,10 @@ VECTOR_SYNC_ENABLED=true              # Enable background indexing
 VECTOR_SYNC_SCAN_INTERVAL=300         # Scan interval in seconds (default: 5 minutes)
 VECTOR_SYNC_PROCESSOR_WORKERS=3       # Concurrent indexing workers (default: 3)
 VECTOR_SYNC_QUEUE_MAX_SIZE=10000      # Max queued documents (default: 10000)
+
+# Document chunking settings (for vector embeddings)
+DOCUMENT_CHUNK_SIZE=512               # Words per chunk (default: 512)
+DOCUMENT_CHUNK_OVERLAP=50             # Overlapping words between chunks (default: 50)
 ```

 ### Embedding Service Configuration
@@ -208,6 +317,54 @@ OLLAMA_VERIFY_SSL=true                   # Verify SSL certificates

 If `OLLAMA_BASE_URL` is not set, the server uses a simple random embedding provider for testing. This is **not suitable for production** as it generates random embeddings with no semantic meaning.

+### Document Chunking Configuration
+
+The server chunks documents before embedding to handle documents larger than the embedding model's context window. Chunk size and overlap can be tuned based on your embedding model and content type.
+
+#### Choosing Chunk Size
+
+**Smaller chunks (256-384 words)**:
+- More precise matching
+- Less context per chunk
+- Better for finding specific information
+- Higher storage requirements (more vectors)
+
+**Larger chunks (768-1024 words)**:
+- More context per chunk
+- Less precise matching
+- Better for understanding broader topics
+- Lower storage requirements (fewer vectors)
+
+**Default (512 words)**:
+- Balanced approach suitable for most use cases
+- Works well with typical note lengths
+- Good compromise between precision and context
+
+#### Choosing Overlap
+
+Overlap preserves context across chunk boundaries. Recommended settings:
+
+- **10-20% of chunk size** (e.g., 50-100 words for 512-word chunks)
+- **Too small** (<10%): May lose context at boundaries
+- **Too large** (>20%): Redundant storage, diminishing returns
+
+**Examples**:
+```dotenv
+# Precise matching for short notes
+DOCUMENT_CHUNK_SIZE=256
+DOCUMENT_CHUNK_OVERLAP=25
+
+# Default balanced configuration
+DOCUMENT_CHUNK_SIZE=512
+DOCUMENT_CHUNK_OVERLAP=50
+
+# More context for long documents
+DOCUMENT_CHUNK_SIZE=1024
+DOCUMENT_CHUNK_OVERLAP=100
+```
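+
+To see how the two settings interact, here is a minimal word-based chunker. It sketches the sliding-window technique and is not necessarily the server's splitter, which may count tokens or respect paragraph boundaries. Each chunk starts `chunk_size - overlap` words after the previous one:
+
+```python
+def chunk_words(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
+    """Split text into windows of `chunk_size` words sharing `overlap` words."""
+    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
+        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
+    words = text.split()
+    step = chunk_size - overlap
+    chunks = []
+    for start in range(0, len(words), step):
+        chunks.append(" ".join(words[start:start + chunk_size]))
+        if start + chunk_size >= len(words):
+            break  # this window already reaches the end of the text
+    return chunks
+
+
+doc = " ".join(f"w{i}" for i in range(1000))
+chunks = chunk_words(doc)  # defaults: 512 words, 50 overlap
+print(len(chunks))  # 3 (windows start at words 0, 462 and 924)
+```
+
+With the defaults, a 1,000-word note produces 3 vectors. Halving the chunk size to 256 with 25 words of overlap would produce 5, which is the storage cost of the finer granularity.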
+
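+**Important**: Changing chunk size requires re-embedding all documents. The auto-generated collection name (see "Qdrant Collection Naming" above) includes only the deployment ID and embedding model, not the chunk settings, so it does not separate chunking configurations on its own. To keep old and new chunkings apart, set `QDRANT_COLLECTION` to a new name.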
+
 ### Environment Variables Reference

 | Variable | Required | Default | Description |
@@ -223,6 +380,8 @@ If `OLLAMA_BASE_URL` is not set, the server uses a simple random embedding provi
 | `OLLAMA_BASE_URL` | ⚠️ Optional | - | Ollama API endpoint for embeddings |
 | `OLLAMA_EMBEDDING_MODEL` | ⚠️ Optional | `nomic-embed-text` | Embedding model to use |
 | `OLLAMA_VERIFY_SSL` | ⚠️ Optional | `true` | Verify SSL certificates |
+| `DOCUMENT_CHUNK_SIZE` | ⚠️ Optional | `512` | Words per chunk for document embedding |
+| `DOCUMENT_CHUNK_OVERLAP` | ⚠️ Optional | `50` | Overlapping words between chunks (must be < chunk size) |

 ### Docker Compose Example

@@ -0,0 +1,260 @@
+# Observability and Monitoring
+
+The Nextcloud MCP Server includes comprehensive observability features for production deployments:
+
+- **Prometheus metrics** for monitoring performance and health
+- **OpenTelemetry distributed tracing** for debugging request flows
+- **Structured JSON logging** with trace correlation
+- **Kubernetes integration** via ServiceMonitor and PrometheusRule
+
+## Quick Start
+
+### Local Development with Prometheus
+
+```bash
+# Enable metrics (enabled by default)
+export METRICS_ENABLED=true
+export METRICS_PORT=9090
+
+# Enable tracing (optional)
+export OTEL_ENABLED=true
+export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
+
+# Start the server
+docker-compose up -d mcp
+```
+
+Access metrics at: `http://localhost:9090/metrics`
+
+### Kubernetes Deployment
+
+When the Prometheus Operator is installed, enabling the ServiceMonitor (`serviceMonitor.enabled=true`) lets Prometheus discover and scrape the metrics endpoint automatically:
+
+```bash
+helm install nextcloud-mcp charts/nextcloud-mcp-server \
+  --set observability.metrics.enabled=true \
+  --set observability.tracing.enabled=true \
+  --set observability.tracing.endpoint=http://opentelemetry-collector:4317 \
+  --set serviceMonitor.enabled=true
+```
+
+## Configuration
+
+### Environment Variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `METRICS_ENABLED` | `true` | Enable Prometheus metrics |
+| `METRICS_PORT` | `9090` | Port for metrics endpoint |
+| `OTEL_ENABLED` | `false` | Enable OpenTelemetry tracing |
+| `OTEL_EXPORTER_OTLP_ENDPOINT` | - | OTLP gRPC endpoint (e.g., `http://otel-collector:4317`) |
+| `OTEL_SERVICE_NAME` | `nextcloud-mcp-server` | Service name in traces |
+| `OTEL_TRACES_SAMPLER` | `always_on` | Trace sampling strategy |
+| `OTEL_TRACES_SAMPLER_ARG` | `1.0` | Sampling ratio (0.0-1.0); only used by ratio-based samplers such as `traceidratio` |
+| `LOG_FORMAT` | `json` | Log format (`json` or `text`) |
+| `LOG_LEVEL` | `INFO` | Minimum log level |
+| `LOG_INCLUDE_TRACE_CONTEXT` | `true` | Include trace IDs in logs |
+
+### Helm Chart Configuration
+
+```yaml
+observability:
+  metrics:
+    enabled: true
+    port: 9090
+    path: /metrics
+
+  tracing:
+    enabled: true
+    endpoint: "http://opentelemetry-collector:4317"
+    samplingRate: 1.0
+
+  logging:
+    format: json
+    level: INFO
+    includeTraceContext: true
+
+serviceMonitor:
+  enabled: true
+  interval: 30s
+  scrapeTimeout: 10s
+```
+
+## Metrics
+
+### HTTP Server Metrics (RED)
+
+- `mcp_http_requests_total` - Total HTTP requests
+- `mcp_http_request_duration_seconds` - Request latency histogram
+- `mcp_http_requests_in_progress` - In-flight requests gauge
+
+### MCP Tool Metrics
+
+- `mcp_tool_calls_total` - Tool invocation count by status
+- `mcp_tool_duration_seconds` - Tool execution latency
+- `mcp_tool_errors_total` - Tool errors by type
+
+### Nextcloud API Metrics
+
+- `mcp_nextcloud_api_requests_total` - API calls by app and status
+- `mcp_nextcloud_api_duration_seconds` - API latency by app
+- `mcp_nextcloud_api_retries_total` - Retry count (429, timeout, etc.)
+
+### OAuth Flow Metrics
+
+- `mcp_oauth_token_validations_total` - Token validation count
+- `mcp_oauth_token_exchange_total` - Token exchange operations
+- `mcp_oauth_token_cache_hits_total` - Cache hit/miss rate
+- `mcp_oauth_refresh_token_operations_total` - Refresh token storage ops
+
+### Vector Sync Metrics (when enabled)
+
+- `mcp_vector_sync_documents_scanned_total` - Documents discovered
+- `mcp_vector_sync_documents_processed_total` - Processing results
+- `mcp_vector_sync_processing_duration_seconds` - Processing latency
+- `mcp_vector_sync_queue_size` - Current queue depth
+- `mcp_qdrant_operations_total` - Qdrant DB operations
+
+### Database Metrics
+
+- `mcp_db_operations_total` - DB operations (SQLite, Qdrant)
+- `mcp_db_operation_duration_seconds` - DB latency
+
+### Dependency Health
+
+- `mcp_dependency_health` - External dependency status (1=up, 0=down)
+- `mcp_dependency_check_duration_seconds` - Health check latency
+
+## Distributed Tracing
+
+### Span Hierarchy
+
+```
+HTTP POST /messages
+├── mcp.tool.nc_notes_create_note
+│   └── nextcloud.api.notes.POST
+│       └── httpx request (auto-instrumented)
+└── oauth.token.validate (if OAuth mode)
+    └── httpx request to IdP
+```
+
+### Span Attributes
+
+- **MCP tools**: `mcp.tool.name`, `mcp.tool.args` (sanitized)
+- **Nextcloud API**: `nextcloud.app`, `http.method`, `http.status_code`
+- **OAuth**: `oauth.operation`, `oauth.method`
+- **Vector sync**: `vector_sync.operation`, `vector_sync.document_count`
+
+### Trace Context in Logs
+
+When tracing is enabled, all logs include `trace_id` and `span_id`:
+
+```json
+{
+  "timestamp": "2025-01-09T12:34:56.789Z",
+  "level": "INFO",
+  "logger": "nextcloud_mcp_server.server.notes",
+  "message": "Note created successfully",
+  "trace_id": "a1b2c3d4e5f6...",
+  "span_id": "123456789abc...",
+  "note_id": 42
+}
+```
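+
+A minimal stdlib sketch of a formatter that produces records shaped like the example above. It assumes the trace and span IDs are already attached to each record as `trace_id` / `span_id` attributes (for example via `extra=`). The server's real formatter may read them from the active OpenTelemetry span instead. `JsonTraceFormatter` is a hypothetical name.
+
+```python
+import json
+import logging
+from datetime import datetime, timezone
+
+
+class JsonTraceFormatter(logging.Formatter):
+    """Hypothetical formatter: one JSON object per record, including trace context if present."""
+
+    # Attributes every LogRecord has; anything else was supplied by the caller
+    _RESERVED = set(vars(logging.LogRecord("", 0, "", 0, "", None, None))) | {"message"}
+
+    def format(self, record: logging.LogRecord) -> str:
+        entry = {
+            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc)
+            .isoformat(timespec="milliseconds")
+            .replace("+00:00", "Z"),
+            "level": record.levelname,
+            "logger": record.name,
+            "message": record.getMessage(),
+        }
+        # Copy caller-supplied fields such as trace_id, span_id, note_id
+        for key, value in vars(record).items():
+            if key not in self._RESERVED:
+                entry[key] = value
+        return json.dumps(entry)
+```
+
+Usage: attach it to a handler, then log with `extra={"trace_id": ..., "span_id": ..., "note_id": 42}`.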
+
+## Dashboards
+
+### Prometheus Queries
+
+**Request Rate (req/s)**:
+```promql
+sum(rate(mcp_http_requests_total[5m])) by (method, endpoint)
+```
+
+**Error Rate (%)**:
+```promql
+sum(rate(mcp_http_requests_total{status_code=~"5.."}[5m]))
+  / sum(rate(mcp_http_requests_total[5m])) * 100
+```
+
+**P95 Latency**:
+```promql
+histogram_quantile(0.95,
+  sum(rate(mcp_http_request_duration_seconds_bucket[5m])) by (le, endpoint)
+)
+```
+
+**Top Tools by Volume**:
+```promql
+topk(10, sum(rate(mcp_tool_calls_total[5m])) by (tool_name))
+```
+
+**Nextcloud API Errors (non-2xx req/s by app)**:
+```promql
+sum(rate(mcp_nextcloud_api_requests_total{status_code!~"2.."}[5m])) by (app)
+```
+
+## Alerts
+
+### Recommended Alert Rules
+
+**Critical**:
+- Server down for >5min
+- Error rate >5% for >5min
+- P95 latency >1s for >5min
+- Dependency down for >2min
+
+**Warning**:
+- Token validation errors >1% for >10min
+- Vector sync queue >100 for >15min
+- Qdrant slow (p95 >500ms) for >10min
+
+See `charts/nextcloud-mcp-server/templates/prometheusrule.yaml` for complete definitions.
+
+## Troubleshooting
+
+### Metrics Not Appearing
+
+1. Check metrics are enabled: `curl http://localhost:9090/metrics`
+2. Verify ServiceMonitor labels match Prometheus selector
+3. Check Prometheus target status: `http://prometheus:9090/targets`
+
+### Traces Not Appearing
+
+1. Verify the OTLP endpoint is reachable. Port 4317 is OTLP/gRPC, so a plain `curl` proves little; check TCP connectivity instead: `nc -zv otel-collector 4317`
+2. Check collector logs for errors
+3. Verify sampling rate is not 0.0
+4. Check trace backend (Jaeger/Tempo) connectivity
+
+### High Cardinality Metrics
+
+If you see cardinality warnings:
+- Middleware normalizes endpoints (e.g., `/user/123` → `/user/*`)
+- OAuth tokens are never included in metric labels
+- User IDs are not tracked (use tracing for per-user debugging)
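+
+One way to implement the endpoint normalization described above, as a hedged sketch. It replaces purely numeric path segments with `*`, and also treats UUID-shaped segments as IDs (an added assumption). The middleware's actual rules may differ, and `normalize_endpoint` is a hypothetical name.
+
+```python
+import re
+
+# Path segments that look like identifiers: all digits, or a canonical UUID
+_ID_SEGMENT = re.compile(
+    r"\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
+)
+
+
+def normalize_endpoint(path: str) -> str:
+    """Collapse ID-like segments so each route produces a single label value."""
+    # Drop the query string; it must never become part of a label
+    path = path.split("?", 1)[0]
+    segments = ["*" if _ID_SEGMENT.fullmatch(seg) else seg for seg in path.split("/")]
+    return "/".join(segments)
+```
+
+Because label values come only from route shapes, the number of time series stays bounded no matter how many users or objects exist.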
+
+## Performance Impact
+
+- **Metrics**: <1% overhead (counters/histograms are very fast)
+- **Tracing**: ~2-5% overhead at 100% sampling
+- **JSON logging**: <1% overhead vs text logging
+
+**Recommendation**: Always enable metrics. Enable tracing in staging/production with 10-50% sampling.
+
+## Architecture
+
+The observability stack integrates at multiple layers:
+
+1. **HTTP Layer**: `ObservabilityMiddleware` tracks all HTTP requests
+2. **MCP Layer**: Tools use `@trace_mcp_tool` for span creation
+3. **Client Layer**: `BaseNextcloudClient` tracks all API calls
+4. **OAuth Layer**: Token operations are traced and metered
+5. **Background Tasks**: Vector sync operations emit metrics/traces
+
+All components share a single Prometheus registry and a single OpenTelemetry `TracerProvider`.
+
+## References
+
+- [Prometheus Best Practices](https://prometheus.io/docs/practices/)
+- [OpenTelemetry Python SDK](https://opentelemetry.io/docs/languages/python/)
+- [Prometheus Operator](https://prometheus-operator.dev/)
+- [Grafana Dashboards](https://grafana.com/docs/grafana/latest/dashboards/)
@@ -0,0 +1,921 @@
+# Semantic Search Architecture
+
+This document explains the architecture of the semantic search feature in the Nextcloud MCP Server, including background synchronization, vector search, and optional AI-generated answers via MCP sampling.
+
+> [!IMPORTANT]
+> **Status: Experimental**
+> - Disabled by default (`VECTOR_SYNC_ENABLED=false`)
+> - Currently supports **Notes app only** (multi-app architecture ready, additional apps planned)
+> - Requires additional infrastructure (Qdrant vector database + Ollama embedding service)
+> - RAG answer generation requires MCP client sampling support
+
+## Overview
+
+### What is Semantic Search?
+
+**Semantic search** finds information based on **meaning** rather than exact keyword matches. It uses vector embeddings to understand that "car" and "automobile" are similar, or that "bread recipe" matches "how to bake bread."
+
+**Traditional keyword search:**
+```
+Query: "machine learning"
+Matches: Only notes containing "machine learning" exactly
+Misses: Notes with "neural networks", "AI models", "deep learning"
+```
+
+**Semantic search:**
+```
+Query: "machine learning"
+Matches: Notes about machine learning, neural networks, AI, deep learning, etc.
+Understanding: Semantic similarity via vector embeddings
+```
+
+### Why It Matters
+
+Semantic search enables:
+- **Natural language queries** - Ask questions in plain language
+- **Conceptual discovery** - Find related content even with different terminology
+- **Cross-reference insights** - Connect ideas across your knowledge base
+- **AI-powered answers** - Generate summaries with citations (optional, requires MCP sampling)
+
+### Current Support
+
+- **Supported Apps**: Notes (fully implemented)
+- **Planned Apps**: Calendar events, Calendar tasks, Deck cards, Files (with text extraction), Contacts
+- **Architecture**: Multi-app plugin system ready, awaiting implementation
+
+## System Components
+
+```mermaid
+graph TB
+    subgraph "MCP Client"
+        Client[Claude Desktop, IDEs, etc.]
+    end
+
+    subgraph "Nextcloud MCP Server"
+        MCP[MCP Server]
+        Scanner[Background Scanner<br/>Hourly Change Detection]
+        Queue[Document Queue]
+        Processor[Embedding Processors<br/>Concurrent Workers]
+    end
+
+    subgraph "Infrastructure"
+        Qdrant[(Qdrant<br/>Vector Database)]
+        Ollama[Ollama<br/>Embedding Service]
+        NC[Nextcloud<br/>Notes API, CalDAV, etc.]
+    end
+
+    Client <-->|MCP Protocol| MCP
+    Scanner -->|Fetch Changes| NC
+    Scanner -->|Enqueue Documents| Queue
+    Queue -->|Process Batch| Processor
+    Processor -->|Generate Embeddings| Ollama
+    Processor -->|Store Vectors| Qdrant
+    MCP -->|Search Queries| Qdrant
+    MCP -->|Verify Access| NC
+```
+
+**Component Roles:**
+
+- **MCP Server**: Exposes semantic search tools (`nc_semantic_search`, `nc_semantic_search_answer`, `nc_get_vector_sync_status`)
+- **Background Scanner**: Discovers changed documents every hour using ETag-based change detection
+- **Document Queue**: Holds pending documents for embedding generation
+- **Embedding Processors**: Generate vector embeddings via Ollama (concurrent workers)
+- **Qdrant Vector Database**: Stores document vectors with metadata and user_id filtering
+- **Ollama Embedding Service**: Converts text to 768-dimensional vectors (default: `nomic-embed-text` model)
+- **Nextcloud APIs**: Source of truth for documents and access control verification
+
+## How It Works: Background Synchronization
+
+Background synchronization runs automatically when `VECTOR_SYNC_ENABLED=true`, discovering changes and indexing documents without user intervention.
+
+```mermaid
+sequenceDiagram
+    participant Timer
+    participant Scanner
+    participant NC as Nextcloud API
+    participant Queue
+    participant Processor
+    participant Ollama
+    participant Qdrant
+
+    Timer->>Scanner: Trigger (hourly)
+    Scanner->>NC: Fetch all notes<br/>(Notes API)
+    NC-->>Scanner: Notes with ETags
+    Scanner->>Qdrant: Check indexed documents
+    Qdrant-->>Scanner: Existing ETags
+    Scanner->>Scanner: Identify changes<br/>(new/modified/deleted)
+    Scanner->>Queue: Enqueue changed docs
+
+    loop Continuous Processing
+        Processor->>Queue: Fetch batch
+        Queue-->>Processor: Documents
+        Processor->>Ollama: Generate embeddings
+        Ollama-->>Processor: 768-dim vectors
+        Processor->>Qdrant: Upsert vectors<br/>(with user_id, doc_type)
+    end
+```
+
+### Scanner Behavior
+
+**Hourly Trigger:**
+- Runs every hour (configurable via `VECTOR_SYNC_INTERVAL`)
+- Fetches all notes from Nextcloud Notes API
+- Compares ETags with Qdrant's indexed state
+- Enqueues new/modified documents
+
+**Change Detection:**
+- **New documents**: No entry in Qdrant → enqueue for indexing
+- **Modified documents**: ETag mismatch → enqueue for re-indexing
+- **Deleted documents**: In Qdrant but not in Nextcloud → delete from Qdrant
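+
+The three rules above reduce to comparing two `doc_id → ETag` maps: one built from the Nextcloud response, one from the payloads already in Qdrant. A minimal sketch; `detect_changes` and `ChangeSet` are hypothetical names, not the scanner's actual API.
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class ChangeSet:
+    new: list[str] = field(default_factory=list)
+    modified: list[str] = field(default_factory=list)
+    deleted: list[str] = field(default_factory=list)
+
+
+def detect_changes(remote: dict[str, str], indexed: dict[str, str]) -> ChangeSet:
+    """Compare doc_id -> ETag maps from Nextcloud (remote) and Qdrant (indexed)."""
+    changes = ChangeSet()
+    for doc_id, etag in remote.items():
+        if doc_id not in indexed:
+            changes.new.append(doc_id)       # never indexed: enqueue
+        elif indexed[doc_id] != etag:
+            changes.modified.append(doc_id)  # ETag mismatch: re-index
+    # Indexed but no longer returned by Nextcloud: delete from Qdrant
+    changes.deleted = [doc_id for doc_id in indexed if doc_id not in remote]
+    return changes
+```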
+
+**Multi-App Plugin Architecture:**
+```python
+# Each app implements DocumentScanner interface
+class NotesScanner(DocumentScanner):
+    async def scan(self) -> list[Document]:
+        # Fetch notes, detect changes, return documents
+        ...
+```
+
+Currently only `NotesScanner` is implemented. Future: `CalendarScanner`, `DeckScanner`, `FilesScanner`, etc.
+
+### Queue Processing
+
+**Document Queue:**
+- In-memory FIFO queue (not persistent across restarts)
+- Holds documents pending embedding generation
+- Batch processing for efficiency
+
+**Processor Pool:**
+- Concurrent workers using `anyio.TaskGroup`
+- Process documents in parallel (default: 4 workers)
+- Each worker: fetch document → generate embedding → store in Qdrant
+
+**Backpressure Handling:**
+- Queue size limits prevent memory exhaustion
+- Slow consumers (Ollama) naturally pace the system
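+
+A stdlib sketch of this bounded-queue pattern. It uses `asyncio.Queue(maxsize=...)` instead of the anyio primitives the server uses. `put()` waits while the queue is full, so a slow embedding call naturally throttles the producer. `run_pipeline`, `embed_and_store`, and the queue size of 100 are illustrative assumptions. Only the default of 4 workers comes from this document.
+
+```python
+import asyncio
+import logging
+from collections.abc import Awaitable, Callable
+
+logger = logging.getLogger(__name__)
+
+
+async def run_pipeline(
+    doc_ids: list[str],
+    embed_and_store: Callable[[str], Awaitable[None]],
+    workers: int = 4,
+    max_queue: int = 100,
+) -> None:
+    """Feed documents through a bounded FIFO queue to a fixed pool of workers."""
+    queue: asyncio.Queue[str] = asyncio.Queue(maxsize=max_queue)
+
+    async def worker() -> None:
+        while True:
+            doc_id = await queue.get()
+            try:
+                await embed_and_store(doc_id)
+            except Exception:
+                # A failed document is logged and skipped; the worker keeps running
+                logger.exception("Failed to process %s", doc_id)
+            finally:
+                queue.task_done()
+
+    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
+    for doc_id in doc_ids:
+        await queue.put(doc_id)  # waits while the queue is full (backpressure)
+    await queue.join()           # returns once every document has been handled
+    for task in tasks:
+        task.cancel()
+    await asyncio.gather(*tasks, return_exceptions=True)
+```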
+
+### Vector Storage
+
+**Qdrant Collection Schema:**
+```
+{
+  "id": "<UUID or unsigned integer>",
+  "vector": [768 dimensions],
+  "payload": {
+    "user_id": "alice",
+    "doc_type": "note",
+    "doc_id": "123",
+    "title": "Machine Learning Notes",
+    "content": "Neural networks are...",
+    "excerpt": "Neural networks are...",
+    "etag": "abc123",
+    "last_modified": "2025-01-15T10:30:00Z",
+    "chunk_index": 0,
+    "total_chunks": 1
+  }
+}
+```
+
+**Key Fields:**
+- `user_id`: Multi-tenancy filtering (each user's vectors isolated)
+- `doc_type`: App identifier ("note", "event", "card", etc.)
+- `etag`: Change detection for incremental updates
+- `chunk_index`: Position of this chunk within the document (0-indexed)
+- `total_chunks`: Total number of chunks for this document
+- `excerpt`: First 200 characters of chunk (for display)
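+
+Qdrant point IDs must be unsigned integers or UUIDs, so a string like `note_123` is rejected. The sketch below shows one common approach, which is not necessarily what the server does. It derives a deterministic UUIDv5 from `doc_type`, `doc_id`, and `chunk_index`, so re-indexing a chunk overwrites its previous point instead of adding a duplicate. `point_id`, `chunk_payload`, and the choice of `uuid.NAMESPACE_URL` are hypothetical.
+
+```python
+import uuid
+
+
+def point_id(doc_type: str, doc_id: str, chunk_index: int) -> str:
+    """Hypothetical deterministic UUIDv5: the same chunk always maps to the same point."""
+    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{doc_type}:{doc_id}:{chunk_index}"))
+
+
+def chunk_payload(
+    user_id: str, doc_type: str, doc_id: str, etag: str,
+    chunk_text: str, chunk_index: int, total_chunks: int,
+) -> dict:
+    """Hypothetical payload builder using the key fields listed above."""
+    return {
+        "user_id": user_id,
+        "doc_type": doc_type,
+        "doc_id": doc_id,
+        "etag": etag,
+        "chunk_index": chunk_index,
+        "total_chunks": total_chunks,
+        "excerpt": chunk_text[:200],  # first 200 characters, for display
+    }
+```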
+
+### Document Chunking Strategy
+
+Documents are chunked before embedding to handle content larger than the embedding model's context window and to improve search precision.
+
+**Configuration:**
+```dotenv
+DOCUMENT_CHUNK_SIZE=512       # Words per chunk (default)
+DOCUMENT_CHUNK_OVERLAP=50     # Overlapping words between chunks (default)
+```
+
+**Chunking Process:**
+1. **Text combination**: Document title + content (e.g., `"Note Title\n\nNote content..."`)
+2. **Word-based splitting**: Simple whitespace tokenization
+3. **Sliding window**: Create overlapping chunks
+4. **Individual embedding**: Each chunk gets its own vector
+5. **Separate storage**: Each chunk stored as distinct point in Qdrant
+
+**Example:**
+```
+Document (1000 words):
+→ Chunk 0: words 0-511
+→ Chunk 1: words 462-973 (overlaps by 50 words)
+→ Chunk 2: words 924-999 (last chunk, partial)
+
+Each chunk stored as separate vector with metadata:
+- chunk_index: 0, 1, 2
+- total_chunks: 3
+- excerpt: First 200 chars of each chunk
+```
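+
+A minimal sketch of the word-based sliding window, assuming plain whitespace tokenization as described. The input would be the combined `"title\n\ncontent"` text. `chunk_words` is a hypothetical name. With the defaults it reproduces the 1000-word example: windows start at words 0, 462, and 924.
+
+```python
+def chunk_words(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
+    """Split text into overlapping windows of whitespace-separated words."""
+    if not 0 <= overlap < chunk_size:
+        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
+    words = text.split()
+    chunks: list[str] = []
+    step = chunk_size - overlap  # each window starts this many words after the previous one
+    start = 0
+    while start < len(words):
+        end = min(start + chunk_size, len(words))
+        chunks.append(" ".join(words[start:end]))
+        if end == len(words):
+            break  # last (possibly partial) chunk
+        start += step
+    return chunks
+```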
+
+**Search Behavior:**
+- **Vector search** operates on chunks (not whole documents)
+- **Deduplication** collapses multiple matching chunks from same document
+- **Best match** returns highest-scoring chunk's excerpt
+- **Access verification** still performed at document level
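+
+The deduplication step can be sketched as "keep the highest-scoring chunk per document". `ChunkHit` and `best_chunk_per_document` are hypothetical names. In a multi-app index the dictionary key would also need to include `doc_type`, since IDs from different apps can collide.
+
+```python
+from typing import NamedTuple
+
+
+class ChunkHit(NamedTuple):
+    doc_id: str
+    chunk_index: int
+    score: float
+    excerpt: str
+
+
+def best_chunk_per_document(hits: list[ChunkHit]) -> list[ChunkHit]:
+    """Collapse chunk hits to one per document (the best one), sorted by score."""
+    best: dict[str, ChunkHit] = {}
+    for hit in hits:
+        current = best.get(hit.doc_id)
+        if current is None or hit.score > current.score:
+            best[hit.doc_id] = hit
+    return sorted(best.values(), key=lambda h: h.score, reverse=True)
+```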
+
+**Tuning Recommendations:**
+- **Small chunks (256-384 words)**: More precise, less context, more storage
+- **Large chunks (768-1024 words)**: More context, less precise, less storage
+- **Overlap (10-20% of chunk size)**: Preserves context across boundaries
+- **Match to embedding model**: Consider model's context window when sizing
+
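+**Important**: Changing chunk size requires re-embedding all documents. Collection names encode only the deployment ID and embedding model, not the chunk settings, so changing `DOCUMENT_CHUNK_SIZE` alone does not create a new collection. To keep a differently chunked index in its own collection, use a distinct deployment ID (`OTEL_SERVICE_NAME`).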
+
+### Collection Naming and Model Switching
+
+**Auto-generated collection names:**
+- **Format:** `{deployment-id}-{model-name}`
+- **Deployment ID:** `OTEL_SERVICE_NAME` (if configured) or `hostname` (fallback)
+- **Model name:** `OLLAMA_EMBEDDING_MODEL`
+- **Example:** `"my-mcp-server-nomic-embed-text"`, `"mcp-container-all-minilm"`
+
+**Why model-based naming:**
+- Ensures each embedding model gets its own collection
+- Prevents dimension mismatches when switching models
+- Enables safe model experimentation (new model = new collection)
+- Supports multi-server deployments (different deployment IDs)
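+
+The naming rule can be sketched as follows. The fallback model `nomic-embed-text` is the documented default. `collection_name` is a hypothetical helper. The real implementation may also normalize characters (for example the `:` in tags such as `nomic-embed-text:latest`).
+
+```python
+import os
+import socket
+from collections.abc import Mapping
+
+
+def collection_name(env: Mapping[str, str] = os.environ) -> str:
+    """Build '{deployment-id}-{model-name}' from environment settings."""
+    # OTEL_SERVICE_NAME if set, otherwise the host name
+    deployment_id = env.get("OTEL_SERVICE_NAME") or socket.gethostname()
+    model = env.get("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text")
+    return f"{deployment_id}-{model}"
+```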
+
+**Switching embedding models:**
+
+Collections are **mutually exclusive** - vectors from one embedding model cannot be used with another. When you change the embedding model:
+
+1. **New collection is created** with the new model's dimensions
+2. **Full re-embedding occurs** - scanner processes all documents again
+3. **Old collection remains** - can be deleted manually if no longer needed
+4. **Dimension validation** - server fails fast if collection dimension doesn't match model
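+
+A minimal sketch of the fail-fast check, assuming the model's dimension is discovered by embedding a probe string once. The collection's stored size would come from Qdrant. With `qdrant-client` and a single unnamed vector, that is `client.get_collection(name).config.params.vectors.size`. `check_embedding_dimension` is a hypothetical name, and `embed` stands in for the Ollama call.
+
+```python
+from collections.abc import Callable, Sequence
+
+
+def check_embedding_dimension(
+    collection: str,
+    collection_dim: int,
+    embed: Callable[[str], Sequence[float]],
+) -> int:
+    """Probe the model once and refuse to start if its output size differs."""
+    model_dim = len(embed("dimension probe"))
+    if model_dim != collection_dim:
+        raise RuntimeError(
+            f"Collection {collection!r} expects {collection_dim}-dim vectors, "
+            f"but the embedding model returns {model_dim}-dim vectors"
+        )
+    return model_dim
+```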
+
+**Example workflow:**
+```bash
+# Start with nomic-embed-text (768 dimensions)
+OLLAMA_EMBEDDING_MODEL=nomic-embed-text
+# Collection: "my-server-nomic-embed-text"
+# → Scanner indexes 1000 notes → 1000 vectors in collection
+
+# Switch to all-minilm (384 dimensions)
+OLLAMA_EMBEDDING_MODEL=all-minilm
+# Collection: "my-server-all-minilm"
+# → Scanner detects 0 indexed documents → re-embeds 1000 notes
+# → Old collection "my-server-nomic-embed-text" still exists in Qdrant
+```
+
+**Re-embedding performance:**
+- CPU-only: 1-5 notes/second
+- With GPU: 50-200 notes/second
+- 1000 notes: 3-16 minutes (CPU) or 5-20 seconds (GPU)
+
+**Multi-server deployments:**
+
+Multiple MCP servers can share one Qdrant instance safely:
+
+```bash
+# Server 1 (Production)
+OTEL_SERVICE_NAME=mcp-prod
+OLLAMA_EMBEDDING_MODEL=nomic-embed-text
+# → Collection: "mcp-prod-nomic-embed-text"
+
+# Server 2 (Staging with different model)
+OTEL_SERVICE_NAME=mcp-staging
+OLLAMA_EMBEDDING_MODEL=all-minilm
+# → Collection: "mcp-staging-all-minilm"
+```
+
+Each deployment gets its own collection - no naming collisions or dimension conflicts.
+
+## How It Works: Semantic Search
+
+Semantic search converts user queries into vectors and finds similar documents using cosine similarity.
+
+```mermaid
+sequenceDiagram
+    participant User
+    participant MCP as MCP Server
+    participant Ollama
+    participant Qdrant
+    participant NC as Nextcloud API
+
+    User->>MCP: nc_semantic_search("machine learning")
+    MCP->>MCP: Check OAuth scope<br/>(semantic:read)
+    MCP->>Ollama: Generate query embedding
+    Ollama-->>MCP: Query vector (768-dim)
+    MCP->>Qdrant: Search similar vectors<br/>(filter: user_id=alice)
+    Qdrant-->>MCP: Top K results<br/>(with similarity scores)
+
+    loop For each result
+        MCP->>NC: Verify access<br/>(fetch note by ID)
+        alt Access granted
+            NC-->>MCP: Note metadata
+        else Access denied (404/401)
+            MCP->>MCP: Filter out result
+        end
+    end
+
+    MCP-->>User: Search results<br/>(with scores, excerpts)
+```
+
+### Dual-Phase Authorization
+
+**Phase 1: OAuth Scope Check**
+- Verify user has `semantic:read` scope
+- Rejects unauthorized users immediately
+
+**Phase 2: Per-Document Verification**
+- For each search result, fetch document via app API (Notes, Calendar, etc.)
+- If fetch succeeds (200 OK), user has access
+- If fetch fails (404 Not Found, 401 Unauthorized), filter out result
+- **Security**: Prevents information leakage from vector search alone
+
+**Rationale:**
+- Vector database doesn't know about sharing, permissions changes, or deleted documents
+- App APIs are source of truth for access control
+- Verification ensures users only see documents they can access
+
+### Search Flow
+
+1. **Query Embedding**: Convert user query to 768-dimensional vector via Ollama
+2. **Vector Search**: Find top K similar vectors in Qdrant (cosine similarity)
+3. **User Filtering**: The `user_id` filter is applied inside the Qdrant query, so only the caller's vectors are ranked (multi-tenancy)
+4. **Access Verification**: Fetch each document via app API to verify current access
+5. **Result Ranking**: Return results sorted by similarity score
+6. **Response**: Include document excerpts, metadata, and similarity scores
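+
+The flow above, including both authorization phases after the scope check, can be sketched with injected callables. `embed`, `vector_search`, and `can_access` stand in for Ollama, the user-filtered Qdrant query, and the app API fetch (200 → `True`; 401/404 → `False`). All names and signatures here are hypothetical. Access checks run concurrently in this sketch, which is a design choice rather than documented behavior.
+
+```python
+import asyncio
+from collections.abc import Awaitable, Callable
+from dataclasses import dataclass
+
+
+@dataclass
+class Hit:
+    doc_id: str
+    score: float
+    excerpt: str
+
+
+async def semantic_search(
+    query: str,
+    user_id: str,
+    embed: Callable[[str], Awaitable[list[float]]],
+    vector_search: Callable[[list[float], str, int], Awaitable[list[Hit]]],
+    can_access: Callable[[str], Awaitable[bool]],
+    limit: int = 10,
+) -> list[Hit]:
+    """Embed the query, run a user-filtered vector search, then verify each hit."""
+    query_vector = await embed(query)                         # 1. query embedding
+    hits = await vector_search(query_vector, user_id, limit)  # 2-3. filtered search
+    allowed = await asyncio.gather(*(can_access(h.doc_id) for h in hits))  # 4. verify
+    verified = [h for h, ok in zip(hits, allowed) if ok]
+    return sorted(verified, key=lambda h: h.score, reverse=True)  # 5. rank
+```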
+
+### Performance
+
+- **Query latency**: ~100-300ms typical (query embedding + vector search + access verification; see Performance Expectations for the breakdown)
+- **Accuracy**: Depends on embedding model quality (`nomic-embed-text` recommended)
+- **Scalability**: Qdrant handles millions of vectors efficiently
+
+## How It Works: RAG with MCP Sampling (Optional)
+
+The `nc_semantic_search_answer` tool generates AI-powered answers with citations using **MCP sampling** - requesting the MCP client's LLM to generate text.
+
+```mermaid
+sequenceDiagram
+    participant User
+    participant MCP as MCP Server
+    participant Client as MCP Client<br/>(Claude Desktop)
+    participant LLM as Client's LLM<br/>(Claude, GPT, etc.)
+
+    User->>MCP: nc_semantic_search_answer("What are my Q1 goals?")
+    MCP->>MCP: Semantic search<br/>(find relevant notes)
+    MCP->>MCP: Construct prompt<br/>(query + documents + instructions)
+    MCP->>Client: Sampling request<br/>(MCP Protocol)
+    Client->>User: Prompt for approval<br/>(optional, client-controlled)
+    User-->>Client: Approve
+    Client->>LLM: Generate answer<br/>(with context)
+    LLM-->>Client: Answer with citations
+    Client-->>MCP: Sampling response
+    MCP-->>User: Generated answer<br/>(with source documents)
+```
+
+### MCP Sampling Architecture
+
+**Why MCP Sampling?**
+- **No server-side LLM**: MCP server has no API keys, doesn't call LLMs directly
+- **Client controls everything**: Which model, who pays, user approval prompts
+- **Privacy**: Documents go only to the LLM the client already uses; the server adds no LLM provider of its own
+- **Flexibility**: Works with any MCP client that supports sampling
+
+**Prompt Construction:**
+```
+User Query: {query}
+
+Relevant Documents:
+1. Document: {title} (Note)
+   Content: {excerpt}
+
+2. Document: {title} (Note)
+   Content: {excerpt}
+
+Instructions:
+- Provide a comprehensive answer to the user's query
+- Use the documents above as context
+- Include citations: "According to Document 1 (title)..."
+- If documents don't contain enough information, say so
+```
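+
+A minimal sketch of assembling that template from search results. `Source` and `build_rag_prompt` are hypothetical names. The wording of the instructions is copied from the template above.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class Source:
+    title: str
+    doc_type: str  # e.g. "Note"
+    excerpt: str
+
+
+def build_rag_prompt(query: str, sources: list[Source]) -> str:
+    """Assemble the sampling prompt in the layout shown above."""
+    lines = [f"User Query: {query}", "", "Relevant Documents:"]
+    for i, src in enumerate(sources, start=1):
+        lines.append(f"{i}. Document: {src.title} ({src.doc_type})")
+        lines.append(f"   Content: {src.excerpt}")
+        lines.append("")
+    lines += [
+        "Instructions:",
+        "- Provide a comprehensive answer to the user's query",
+        "- Use the documents above as context",
+        '- Include citations: "According to Document 1 (title)..."',
+        "- If documents don't contain enough information, say so",
+    ]
+    return "\n".join(lines)
+```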
+
+**Graceful Fallback:**
+```python
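+from mcp.types import SamplingMessage, TextContent
+
+# generate_answer is an illustrative wrapper; SearchResponse is the server's response model
+async def generate_answer(ctx, prompt: str, search_results):
+    try:
+        # Ask the client's LLM for an answer via MCP sampling
+        result = await ctx.session.create_message(
+            messages=[SamplingMessage(role="user", content=TextContent(type="text", text=prompt))],
+            max_tokens=1000,  # illustrative limit
+        )
+        text = result.content.text if isinstance(result.content, TextContent) else ""
+        return SearchResponse(generated_answer=text, sources=search_results)
+    except Exception as e:
+        # Fallback: return documents without a generated answer
+        return SearchResponse(
+            generated_answer=f"[Sampling unavailable: {e}]",
+            sources=search_results,
+        )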
+```
+
+**Client Support:**
+- **Requires**: MCP client with sampling capability
+- **Known support**: Varies by client and version; check that your client advertises the `sampling` capability during initialization
+- **Graceful degradation**: Returns raw documents if sampling unavailable
+
+## Authentication & Security
+
+### OAuth Scopes
+
+**`semantic:read`** - Search permission
+- Allows using `nc_semantic_search` and `nc_semantic_search_answer` tools
+- Does NOT grant access to documents (verified via app APIs)
+- Required for any semantic search operation
+
+**`semantic:write`** - Sync control permission
+- Allows enabling/disabling background sync (`provision_vector_sync`, `deprovision_vector_sync`)
+- Controls whether user's documents are indexed
+- Currently not implemented in OAuth mode (BasicAuth only)
+
+### Dual-Phase Authorization Pattern
+
+**Phase 1: Scope Check** (semantic:read)
+- Verifies user authorized to search
+- Prevents unauthorized vector database access
+
+**Phase 2: Document Verification** (app-specific APIs)
+- For each search result, fetch via Notes API, CalDAV, etc.
+- If user can fetch → include in results
+- If user cannot fetch (404/401) → filter out
+- **Security**: Vector search cannot leak documents user shouldn't see
+
+**Example Scenario:**
+1. Alice creates note "Secret Project X"
+2. Background sync indexes note with `user_id=alice`
+3. Bob searches for "project"
+4. "Secret Project X" is semantically similar, but Qdrant applies the `user_id=bob` filter inside the search, so Alice's note is never returned
+5. Even if Bob somehow obtained the doc_id, Phase 2 verification would fail (404 Not Found)
+
+### Offline Access for Background Sync
+
+**Why needed:**
+- Background scanner runs hourly without user interaction
+- Requires valid access tokens to fetch documents from Nextcloud APIs
+- User's session token expires after hours/days
+
+**OAuth Mode (ADR-004 Flow 2):**
+- User explicitly provisions offline access via `provision_nextcloud_access` tool
+- Server requests `offline_access` scope → receives refresh token
+- Refresh token stored securely (database, encrypted)
+- Background sync uses refresh tokens to obtain access tokens
+
+**BasicAuth Mode:**
+- Username/password stored in environment variables
+- Always available for background operations
+- Simpler but less secure (credentials never expire)
+
+## Deployment Modes
+
+### Authentication Modes
+
+| Mode | Security | Offline Access | Background Sync | Best For |
+|------|----------|----------------|-----------------|----------|
+| **BasicAuth** | Lower (credentials in env) | Always available | ✅ Works immediately | Single-user, development, testing |
+| **OAuth** | Higher (tokens, scopes) | User must provision | ⚠️ Not yet implemented | Multi-user, production |
+
+**BasicAuth:**
+- Set `NEXTCLOUD_USERNAME` and `NEXTCLOUD_PASSWORD`
+- Background sync works immediately when `VECTOR_SYNC_ENABLED=true`
+- Credentials stored in `.env` file (secure server access required)
+
+**OAuth:**
+- Client authenticates with `semantic:read` scope
+- User must explicitly provision offline access (future: `provision_vector_sync` tool)
+- Background sync only works for users who provisioned access
+- More secure: tokens expire, user controls access
+
+### Qdrant Deployment Modes
+
+| Mode | Configuration | Persistence | Scalability | Best For |
+|------|---------------|-------------|-------------|----------|
+| **In-Memory** (default) | `QDRANT_LOCATION=:memory:` | ❌ Lost on restart | Single instance | Testing, development |
+| **Persistent Local** | `QDRANT_LOCATION=/data/qdrant` | ✅ Survives restarts | Single instance | Small deployments |
+| **Network** | `QDRANT_URL=http://qdrant:6333` | ✅ Dedicated service | ✅ Horizontal scaling | Production |
+
+**In-Memory Mode:**
+```bash
+VECTOR_SYNC_ENABLED=true
+# QDRANT_LOCATION not set → defaults to :memory:
+```
+- Fastest startup
+- No disk I/O
+- **Warning**: All vectors lost when server restarts (must re-index)
+
+**Persistent Local Mode:**
+```bash
+VECTOR_SYNC_ENABLED=true
+QDRANT_LOCATION=/var/lib/qdrant
+```
+- Vectors survive restarts
+- Single server only (no distributed setup)
+- Disk I/O for durability
+
+**Network Mode (Recommended for Production):**
+```bash
+VECTOR_SYNC_ENABLED=true
+QDRANT_URL=http://qdrant:6333
+QDRANT_API_KEY=secret  # optional
+```
+- Dedicated Qdrant service (Docker, Kubernetes)
+- Horizontal scaling (multiple MCP servers → one Qdrant)
+- High availability options
+
+### Embedding Service Options
+
+| Service | Configuration | Cost | Performance | Best For |
+|---------|---------------|------|-------------|----------|
+| **Ollama** (recommended) | `OLLAMA_BASE_URL=http://ollama:11434` | Free (self-hosted) | Fast (local GPU) | Production, development |
+| **OpenAI** (future) | `OPENAI_API_KEY=sk-...` | Paid (API) | Fast (cloud) | Cloud deployments |
+| **Fallback** | No config | Free | N/A (random vectors, no semantic meaning) | Testing only (not production) |
+
+**Ollama Setup (Recommended):**
+```yaml
+# docker-compose.yml
+services:
+  ollama:
+    image: ollama/ollama
+    volumes:
+      - ollama-data:/root/.ollama
+    ports:
+      - "11434:11434"
+
+volumes:
+  ollama-data:
+```
+
+```bash
+# Pull embedding model
+docker compose exec ollama ollama pull nomic-embed-text
+```
+
+**Environment Configuration:**
+```bash
+OLLAMA_BASE_URL=http://ollama:11434
+OLLAMA_EMBEDDING_MODEL=nomic-embed-text  # 768-dimensional vectors
+```
+
+**Model Options:**
+- `nomic-embed-text` (default): 768-dim, optimized for semantic search
+- `all-minilm`: 384-dim, smaller and faster, slightly less accurate
+- `mxbai-embed-large`: 1024-dim, larger and more accurate, slower
+
+## Configuration Overview
+
+### Key Environment Variables
+
+**Enable Semantic Search:**
+```bash
+VECTOR_SYNC_ENABLED=true  # Default: false (opt-in)
+```
+
+**Qdrant Vector Database:**
+```bash
+# In-memory mode (default if VECTOR_SYNC_ENABLED=true)
+# QDRANT_LOCATION not set → uses :memory:
+
+# Persistent local mode
+QDRANT_LOCATION=/var/lib/qdrant
+
+# Network mode (production)
+QDRANT_URL=http://qdrant:6333
+QDRANT_API_KEY=secret  # optional
+```
+
+**Ollama Embedding Service:**
+```bash
+OLLAMA_BASE_URL=http://ollama:11434
+OLLAMA_EMBEDDING_MODEL=nomic-embed-text  # Default
+```
+
+**Scanner Configuration:**
+```bash
+VECTOR_SYNC_INTERVAL=3600  # Scan interval in seconds (default: 1 hour)
+```
+
+### Resource Requirements
+
+**Qdrant:**
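+- **Memory**: ~100-200 MB base + ~3 KB per 768-dim float32 vector (768 × 4 bytes), plus payload and index overhead (1M vectors ≈ 3 GB of raw vectors before overhead)
+- **Disk**: Persistent and network modes; at least the raw vector size (~3 KB per 768-dim vector) plus payloads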
+- **CPU**: Low (indexing) to moderate (search)
+
+**Ollama:**
+- **Memory**: 2-4 GB for `nomic-embed-text` model
+- **CPU**: High during embedding generation, idle otherwise
+- **GPU**: Optional but recommended (10-100x faster)
+
+**MCP Server:**
+- **Memory**: +50-100 MB for background sync workers
+- **CPU**: Moderate during scanning/processing, low otherwise
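+
+A worked sizing estimate, assuming the rule of thumb from Qdrant's capacity-planning guidance: `vectors × dimensions × 4 bytes × 1.5`, where 1.5 approximates index and bookkeeping overhead. Payloads are not included. Because each chunk is stored as its own point, count chunks rather than documents. `estimate_qdrant_memory_bytes` is a hypothetical helper.
+
+```python
+def estimate_qdrant_memory_bytes(num_vectors: int, dim: int = 768, overhead: float = 1.5) -> int:
+    """Rule-of-thumb RAM estimate: raw float32 vectors times an overhead factor."""
+    raw = num_vectors * dim * 4  # 4 bytes per float32 component
+    return int(raw * overhead)
+```
+
+For 1M 768-dim vectors this gives 3.072 GB raw, or about 4.6 GB with the 1.5 factor.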
+
+### Trade-offs
+
+| Consideration | In-Memory Qdrant | Persistent Qdrant | Network Qdrant |
+|---------------|------------------|-------------------|----------------|
+| Setup complexity | ✅ Minimal | ✅ Easy | ⚠️ Requires separate service |
+| Durability | ❌ Lost on restart | ✅ Survives restarts | ✅ Survives restarts |
+| Scalability | ❌ Single instance | ❌ Single instance | ✅ Horizontal scaling |
+| Performance | ✅ Fastest | ✅ Fast | ⚠️ Network latency |
+
+## Operational Behavior
+
+### What Happens When VECTOR_SYNC_ENABLED=true
+
+**Immediate (Server Startup):**
+1. MCP server connects to Qdrant (creates collection if needed)
+2. MCP server connects to Ollama (verifies embedding model available)
+3. Background scanner starts (schedules hourly runs)
+4. Document queue and processors initialize
+
+**First Scan (Within 1 hour):**
+1. Scanner fetches all notes from Nextcloud
+2. Compares with Qdrant (likely empty on first run)
+3. Enqueues all notes for indexing
+4. Processors generate embeddings (may take minutes for large note collections)
+5. Vectors stored in Qdrant with user_id filtering
+
+**Hourly Thereafter:**
+1. Scanner fetches all notes
+2. Identifies new/modified/deleted notes (ETag comparison)
+3. Enqueues changes only
+4. Incremental updates processed
+
+### Performance Expectations
+
+**Embedding Generation:**
+- **Without GPU**: 1-5 notes/second (CPU-bound)
+- **With GPU**: 50-200 notes/second (highly parallel)
+- **Initial indexing**: 100 notes ≈ 20-100 seconds (CPU), 0.5-2 seconds (GPU)
+
+**Search Query:**
+- **Embedding generation**: 50-100ms
+- **Vector search**: 10-50ms (depends on collection size)
+- **Access verification**: 20-100ms per document (Nextcloud API calls)
+- **Total latency**: 100-300ms typical
+
+**Resource Usage:**
+- **Idle**: Minimal (background scanner sleeps)
+- **Scanning**: Moderate CPU (ETag checks, API calls)
+- **Processing**: High CPU/GPU (embedding generation)
+- **Searching**: Low to moderate (depends on query frequency)
+
+### Background Sync Behavior
+
+**Scanner Triggers:**
+- Hourly (configurable via `VECTOR_SYNC_INTERVAL`)
+- Manual trigger via `nc_trigger_vector_sync` (future)
+
+**Queue Processing:**
+- Continuous (workers always running)
+- Batch processing (fetch 10 documents at a time)
+- Concurrent workers (4 by default)
+
+**Error Handling:**
+- Individual document failures logged but don't stop scanning
+- Retries for transient errors (network timeouts, rate limits)
+- Failed documents skipped, re-attempted on next scan
+
+**What Gets Indexed:**
+- **Notes**: All notes accessible to the authenticated user
+- **Future**: Calendar events, tasks, deck cards, files with text extraction, contacts
+
+## Monitoring & Observability
+
+### MCP Tools
+
+**`nc_get_vector_sync_status`** - Check sync status
+```json
+{
+  "total_documents": 1234,
+  "indexed_documents": 1200,
+  "pending_documents": 34,
+  "sync_enabled": true,
+  "last_scan": "2025-01-15T14:30:00Z",
+  "status": "syncing"
+}
+```
+
+**Interpreting Status:**
+- `idle`: No pending work, last scan completed successfully
+- `syncing`: Currently processing documents
+- `error`: Last scan failed (check logs)
+
+### Logs to Check
+
+**Scanner Logs:**
+```
+[INFO] Vector sync scanner started (interval: 3600s)
+[INFO] Scanning notes: found 150 documents
+[INFO] Changes detected: 5 new, 2 modified, 1 deleted
+[INFO] Enqueued 7 documents for processing
+```
+
+**Processor Logs:**
+```
+[INFO] Processing document: note_123
+[DEBUG] Generated embedding (768 dimensions)
+[INFO] Stored vector in Qdrant: note_123
+```
+
+**Error Logs:**
+```
+[ERROR] Failed to generate embedding for note_123: Connection timeout
+[WARN] Qdrant connection lost, retrying...
+[ERROR] Ollama embedding failed: Model not found
+```
+
+**Log Locations:**
+- **Docker**: `docker compose logs mcp`
+- **Local**: stdout (redirect to file if needed)
+- **Kubernetes**: `kubectl logs -f deployment/nextcloud-mcp-server`
+
+### Metrics to Monitor
+
+**Indexing Progress:**
+- Total documents vs indexed documents
+- Pending queue size
+- Processing rate (docs/second)
+
+**Search Performance:**
+- Query latency (p50, p95, p99)
+- Results per query
+- Verification overhead (API calls per query)
+
+**Resource Usage:**
+- Qdrant memory/disk usage
+- Ollama CPU/GPU usage
+- MCP server memory
+
+For detailed observability setup, see [docs/observability.md](observability.md).
+
+## Troubleshooting from Architecture Perspective
+
+### Documents Not Appearing in Search
+
+**Diagnosis Flow:**
+1. Check sync status: `nc_get_vector_sync_status`
+   - `sync_enabled: false` → Enable with `VECTOR_SYNC_ENABLED=true`
+   - `status: error` → Check scanner logs for failures
+2. Check queue size:
+   - `pending_documents > 0` → Processing in progress, wait
+   - `pending_documents == 0` but `indexed_documents` low → Scan hasn't run yet (wait up to 1 hour)
+3. Check Qdrant:
+   - Connection errors in logs → Verify `QDRANT_URL` or `QDRANT_LOCATION`
+   - Collection empty → First scan hasn't completed
+4. Check Ollama:
+   - Embedding errors in logs → Verify `OLLAMA_BASE_URL`
+   - Model not found → Pull model: `ollama pull nomic-embed-text`
+
+**Common Causes:**
+- Sync disabled (default): Enable `VECTOR_SYNC_ENABLED=true`
+- Ollama not running: Start Ollama service
+- Qdrant not accessible: Check network/URL
+- First scan in progress: Wait up to 1 hour + processing time
+
+### Slow Search Performance
+
+**Diagnosis:**
+1. **Query embedding slow (>500ms)**:
+   - Ollama overloaded or CPU-bound
+   - Solution: Use GPU, upgrade CPU, or reduce concurrent requests
+2. **Vector search slow (>200ms)**:
+   - Large collection (millions of vectors)
+   - Solution: Use network Qdrant on SSD storage, and make sure the `user_id` field has a payload index so filtered searches stay fast
+3. **Verification slow (>500ms)**:
+   - Many results to verify (10+ documents)
+   - Nextcloud API slow or overloaded
+   - Solution: Reduce `limit` parameter, optimize Nextcloud
+
+**Performance Tuning:**
+- Reduce search `limit` (default: 10 results)
+- Use network Qdrant for large collections
+- Enable Ollama GPU acceleration
+- Check Nextcloud API response times
+
+### Background Sync Stopped
+
+**Diagnosis:**
+1. Check logs for errors:
+   - Authentication failures (401/403) → Token expired (OAuth) or credentials invalid (BasicAuth)
+   - Connection timeouts → Network issues with Nextcloud/Qdrant/Ollama
+   - Rate limiting (429) → Reduce scan frequency (increase `VECTOR_SYNC_INTERVAL`)
+2. Check `nc_get_vector_sync_status`:
+   - `status: error` → See logs for details
+   - `last_scan` timestamp old (>2 hours) → Scanner may have crashed
+3. Verify services:
+   - Qdrant accessible: `curl http://qdrant:6333/`
+   - Ollama accessible: `curl http://ollama:11434/api/tags`
+   - Nextcloud accessible: `curl https://<your-nextcloud>/status.php` (should return JSON with `"installed":true` and `"maintenance":false`)
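The same reachability checks can be scripted with the Python standard library. This sketch assumes the Docker Compose hostnames `qdrant` and `ollama` used above. It probes Qdrant's `/readyz`, which is the endpoint the server's readiness check calls in network mode. `probe`, `check_all` and `SERVICES` are illustrative names.

```python
import urllib.error
import urllib.request

# Assumed Compose service URLs; adjust to your deployment
SERVICES = {
    "qdrant": "http://qdrant:6333/readyz",
    "ollama": "http://ollama:11434/api/tags",
}


def probe(url: str, timeout: float = 2.0) -> tuple[bool, str]:
    """Return (True, detail) only if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200, f"HTTP {resp.status}"
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status (e.g. 503 while starting)
        return False, f"HTTP {e.code}"
    except OSError as e:
        # DNS failure, connection refused or timeout (URLError subclasses OSError)
        return False, f"unreachable: {e}"


def check_all(services: dict[str, str] = SERVICES) -> dict[str, tuple[bool, str]]:
    """Probe every configured service and collect the results by name."""
    return {name: probe(url) for name, url in services.items()}
```

Run `check_all()` from a container on the same network as the services. It returns one `(ok, detail)` pair per service, so a failure shows whether the service was unreachable or answered with an error status.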
+
+**OAuth Mode (Future):**
+- Offline access token expired → Re-provision via `provision_vector_sync`
+- User deprovisioned access → Sync stops intentionally
+
+### Out of Memory
+
+**Diagnosis:**
+1. Check Qdrant mode:
+   - In-memory mode with large collection → Switch to network mode (embedded persistent mode writes to disk but may still keep the collection in server memory)
+2. Check processing concurrency:
+   - Too many documents processed simultaneously → Reduce `VECTOR_SYNC_PROCESSOR_WORKERS`
+3. Check Ollama memory:
+   - Large models loaded → Use smaller embedding model
+
+**Solutions:**
+- Use network Qdrant (moves vector storage out of the server process; embedded persistent mode may not reduce memory use)
+- Reduce concurrent processor workers (`VECTOR_SYNC_PROCESSOR_WORKERS`)
+- Use a smaller embedding model (`all-minilm` instead of `nomic-embed-text`); changing `OLLAMA_EMBEDDING_MODEL` creates a new collection, so all documents are re-embedded
+- Increase server memory allocation
+
+## Limitations & Future Work
+
+### Current Limitations
+
+1. **Notes App Only**
+   - Architecture supports multiple apps (plugin system ready)
+   - Only `NotesScanner` and `NotesProcessor` implemented
+   - Future: Calendar, Deck, Files, Contacts
+
+2. **MCP Sampling Support**
+   - `nc_semantic_search_answer` requires client sampling capability
+   - Not all MCP clients support sampling yet
+   - Graceful fallback: Returns documents without generated answer
+
+3. **OAuth Background Sync**
+   - User-controlled background jobs not yet implemented
+   - Currently works in BasicAuth mode only
+   - Future: Users opt-in via `provision_vector_sync` tool
+
+4. **No Incremental Updates**
+   - Document changes trigger full re-embedding
+   - Cannot update just modified paragraphs
+   - Future: Paragraph-level chunking and incremental updates
+
+5. **No Query Caching**
+   - Each search generates new query embedding
+   - Repeated queries re-search Qdrant
+   - Future: Cache recent query embeddings and results
+
+6. **Single Embedding Model**
+   - Uses one model for all documents and queries
+   - Cannot customize per app or user
+   - Future: App-specific or user-selected models
+
+### Future Enhancements
+
+**Multi-App Support** (In Progress):
+- Scanner plugins for Calendar, Deck, Files, Contacts
+- Unified vector search across all apps
+- App-specific metadata in vector payloads
+
+**User-Controlled Sync (OAuth Mode)**:
+- `provision_vector_sync` and `deprovision_vector_sync` tools
+- Per-user background job scheduling
+- User dashboard for sync status and controls
+
+**Advanced Search Features**:
+- Hybrid search (vector + keyword combined)
+- Filtering by date range, app type, tags
+- Aggregations and faceted search
+- Search result explanations (why this matched)
+
+**Performance Optimizations**:
+- Query caching for repeated searches
+- Incremental document updates (paragraph-level)
+- Batch query processing
+- Qdrant HNSW indexing tuning
+
+**Embedding Improvements**:
+- Support for OpenAI embeddings (`text-embedding-ada-002`, `text-embedding-3-small`/`-large`)
+- Multi-language embedding models
+- Fine-tuned models for Nextcloud content
+- Paragraph-level chunking for long documents
+
+## References
+
+### Architecture Decision Records (ADRs)
+
+- **[ADR-003: Vector Database Semantic Search](ADR-003-vector-database-semantic-search.md)** - Qdrant selection rationale, embedding strategy, hybrid search (superseded by ADR-007 but technical decisions remain valid)
+- **[ADR-007: Background Vector Sync Job Management](ADR-007-background-vector-sync-job-management.md)** - Current implementation, Scanner-Queue-Processor architecture, plugin system
+- **[ADR-008: MCP Sampling for Semantic Search](ADR-008-mcp-sampling-for-semantic-search.md)** - RAG with MCP sampling, client-server separation, prompt construction
+- **[ADR-009: Semantic Search OAuth Scope](ADR-009-semantic-search-oauth-scope.md)** - OAuth scope model, dual-phase authorization, security rationale
+
+### Configuration & Setup
+
+- **[Configuration Guide](configuration.md)** - Environment variables, Qdrant setup, Ollama setup, detailed configuration options
+- **[Installation Guide](installation.md)** - Deployment options (Docker, Kubernetes, local)
+- **[Running the Server](running.md)** - Starting the server, transport options, testing
+
+### Monitoring & Troubleshooting
+
+- **[Observability Guide](observability.md)** - Logging, metrics, tracing, debugging
+- **[Troubleshooting](troubleshooting.md)** - General issues and solutions
+
+### Related Documentation
+
+- **[OAuth Architecture](oauth-architecture.md)** - OAuth flows, scopes, token management
+- **[Comparison with Context Agent](comparison-context-agent.md)** - When to use Nextcloud MCP Server vs Context Agent
+
+---
+
+**Questions or Issues?**
+- [Open an issue](https://github.com/cbcoutinho/nextcloud-mcp-server/issues)
+- [Contribute improvements](https://github.com/cbcoutinho/nextcloud-mcp-server/pulls)
@@ -124,3 +124,75 @@ ENABLE_CUSTOM_PROCESSOR=false

 # Comma-separated MIME types your processor supports
 #CUSTOM_PROCESSOR_TYPES=application/pdf,image/jpeg,image/png
+
+# ============================================
+# Semantic Search & Vector Sync Configuration
+# ============================================
+# EXPERIMENTAL: Semantic search for Notes app (multi-app support planned)
+# Requires: Qdrant vector database + Ollama embedding service
+# Disabled by default
+
+# Enable background vector indexing
+VECTOR_SYNC_ENABLED=false
+
+# Document scan interval in seconds (default: 300 = 5 minutes)
+# How often to check for new/updated documents
+#VECTOR_SYNC_SCAN_INTERVAL=300
+
+# Concurrent indexing workers (default: 3)
+# Number of parallel workers for embedding generation
+#VECTOR_SYNC_PROCESSOR_WORKERS=3
+
+# Max queued documents (default: 10000)
+# Maximum documents waiting to be processed
+#VECTOR_SYNC_QUEUE_MAX_SIZE=10000
+
+# ============================================
+# Qdrant Vector Database Configuration
+# ============================================
+# Choose ONE of three modes:
+# 1. In-memory mode (default): Set neither QDRANT_URL nor QDRANT_LOCATION
+# 2. Persistent local: Set QDRANT_LOCATION=/path/to/data
+# 3. Network mode: Set QDRANT_URL=http://qdrant:6333
+
+# Network mode: URL to Qdrant service
+#QDRANT_URL=http://qdrant:6333
+
+# Local mode: Path to store vectors (use :memory: for in-memory)
+#QDRANT_LOCATION=:memory:
+
+# API key for network mode (optional)
+#QDRANT_API_KEY=
+
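+# Collection name (optional - auto-generated if unset)
+# Auto-generation format: {deployment-id}-{model-name}, where deployment-id is
+# OTEL_SERVICE_NAME if set to a non-default value, otherwise the hostname
+# Allows safe model switching and multi-server deployments
+# Note: the value "nextcloud_content" is treated as unset and still triggers
+# auto-generation; choose a different name to pin the collection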
+#QDRANT_COLLECTION=nextcloud_content
+
+# ============================================
+# Ollama Embedding Service Configuration
+# ============================================
+# Ollama endpoint for embeddings (if not set, uses SimpleEmbeddingProvider fallback)
+#OLLAMA_BASE_URL=http://ollama:11434
+
+# Embedding model to use (default: nomic-embed-text, 768 dimensions)
+# Changing this creates a new collection (requires re-embedding all documents)
+#OLLAMA_EMBEDDING_MODEL=nomic-embed-text
+
+# Verify SSL certificates (default: true)
+#OLLAMA_VERIFY_SSL=true
+
+# ============================================
+# Document Chunking Configuration
+# ============================================
+# Configure how documents are split before embedding
+
+# Words per chunk (default: 512)
+# Smaller chunks (256-384): More precise, less context, more storage
+# Larger chunks (768-1024): More context, less precise, less storage
+#DOCUMENT_CHUNK_SIZE=512
+
+# Overlapping words between chunks (default: 50)
+# Recommended: 10-20% of chunk size
+# Preserves context across chunk boundaries
+#DOCUMENT_CHUNK_OVERLAP=50
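Here is a worked example of the size/overlap trade-off. With the defaults (512 words, 50 overlap), each window starts 462 words after the previous one. A 1,000-word note therefore yields three chunks, starting at words 0, 462 and 924. The sketch below assumes a plain whitespace-split sliding window, and `chunk_words` is a hypothetical name; the server's actual chunker is not part of this diff. It applies the same validation as `Settings.__post_init__`.

```python
def chunk_words(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping word windows (sketch, not the server's chunker)."""
    # Same constraints that Settings.__post_init__ enforces
    if overlap < 0:
        raise ValueError(f"overlap ({overlap}) cannot be negative")
    if overlap >= chunk_size:
        raise ValueError(f"overlap ({overlap}) must be less than chunk_size ({chunk_size})")

    words = text.split()
    if not words:
        return []

    step = chunk_size - overlap  # stride between window starts
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the last word
    return chunks
```

A larger overlap shortens the stride. That produces more chunks, and therefore more stored vectors, for the same document.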
@@ -32,13 +32,17 @@ from nextcloud_mcp_server.auth import (
 from nextcloud_mcp_server.auth.unified_verifier import UnifiedTokenVerifier
 from nextcloud_mcp_server.client import NextcloudClient
 from nextcloud_mcp_server.config import (
-    LOGGING_CONFIG,
    get_document_processor_config,
    get_settings,
-    setup_logging,
 )
 from nextcloud_mcp_server.context import get_client as get_nextcloud_client
 from nextcloud_mcp_server.document_processors import get_registry
+from nextcloud_mcp_server.observability import (
+    ObservabilityMiddleware,
+    get_uvicorn_logging_config,
+    setup_metrics,
+    setup_tracing,
+)
 from nextcloud_mcp_server.server import (
    configure_calendar_tools,
    configure_contacts_tools,
@@ -776,7 +780,28 @@ async def setup_oauth_config():


 def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
-    setup_logging()
+    # Initialize observability (logging will be configured by uvicorn)
+    settings = get_settings()
+
+    # Setup Prometheus metrics (enabled by default; set METRICS_ENABLED=false to disable)
+    if settings.metrics_enabled:
+        setup_metrics(port=settings.metrics_port)
+        logger.info(
+            f"Prometheus metrics enabled on dedicated port {settings.metrics_port}"
+        )
+
+    # Setup OpenTelemetry tracing (optional)
+    if settings.tracing_enabled:
+        setup_tracing(
+            service_name=settings.otel_service_name,
+            otlp_endpoint=settings.otel_exporter_otlp_endpoint,
+            sampling_rate=settings.otel_traces_sampler_arg,
+        )
+        logger.info(
+            f"OpenTelemetry tracing enabled (endpoint: {settings.otel_exporter_otlp_endpoint})"
+        )
+    else:
+        logger.info("OpenTelemetry tracing disabled (set OTEL_ENABLED=true to enable)")

    # Determine authentication mode
    oauth_enabled = is_oauth_mode()
@@ -1148,13 +1173,15 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
                checks["auth_configured"] = "error: credentials not set"
                is_ready = False

-        # Check Qdrant status if vector sync is enabled
+        # Check Qdrant status if using network mode (external Qdrant service)
+        # In-memory and persistent modes use embedded Qdrant, no external service to check
        vector_sync_enabled = (
            os.getenv("VECTOR_SYNC_ENABLED", "false").lower() == "true"
        )
-        if vector_sync_enabled:
+        qdrant_url = os.getenv("QDRANT_URL")  # Only set in network mode
+
+        if vector_sync_enabled and qdrant_url:
            try:
-                qdrant_url = os.getenv("QDRANT_URL", "http://qdrant:6333")
                async with httpx.AsyncClient(timeout=2.0) as client:
                    response = await client.get(f"{qdrant_url}/readyz")
                    if response.status_code == 200:
@@ -1165,6 +1192,9 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
            except Exception as e:
                checks["qdrant"] = f"error: {str(e)}"
                is_ready = False
+        elif vector_sync_enabled:
+            # Using embedded Qdrant (memory or persistent mode)
+            checks["qdrant"] = "embedded"

        status_code = 200 if is_ready else 503
        return JSONResponse(
@@ -1183,6 +1213,9 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
    routes.append(Route("/health/ready", health_ready, methods=["GET"]))
    logger.info("Health check endpoints enabled: /health/live, /health/ready")

+    # Note: Metrics endpoint is NOT exposed on main HTTP port for security reasons.
+    # Metrics are served on dedicated port via setup_metrics() (default: 9090)
+
    if oauth_enabled:
        # Import OAuth routes (ADR-004 Progressive Consent)
        from nextcloud_mcp_server.auth.oauth_routes import oauth_authorize
@@ -1346,7 +1379,7 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
        "Routes: /user/* with SessionAuth, /mcp with FastMCP OAuth Bearer tokens"
    )

-    # Add debugging middleware to log Authorization headers
+    # Add debugging middleware to log Authorization headers and client capabilities
    @app.middleware("http")
    async def log_auth_headers(request, call_next):
        auth_header = request.headers.get("authorization")
@@ -1361,6 +1394,52 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
                logger.warning(
                    f"⚠️  /mcp request WITHOUT Authorization header from {request.client}"
                )
+
+            # Log client capabilities on initialize request
+            if request.method == "POST":
+                # Read body to check for initialize request
+                # Starlette caches the body internally, so it's safe to read here
+                body = await request.body()
+                try:
+                    import json
+
+                    data = json.loads(body)
+                    # Check if this is an initialize request
+                    if data.get("method") == "initialize":
+                        params = data.get("params", {})
+                        capabilities = params.get("capabilities", {})
+                        client_info = params.get("clientInfo", {})
+
+                        logger.info(
+                            f"🔌 MCP client connected: {client_info.get('name', 'unknown')} "
+                            f"v{client_info.get('version', 'unknown')}"
+                        )
+
+                        # Log capabilities in a structured way
+                        cap_summary = []
+                        # Check for presence using 'in' not truthiness (empty dict {} counts as having capability)
+                        if "roots" in capabilities:
+                            cap_summary.append("roots")
+                        if "sampling" in capabilities:
+                            cap_summary.append("sampling")
+                        if "experimental" in capabilities:
+                            cap_summary.append(
+                                f"experimental({len(capabilities['experimental'])} features)"
+                            )
+
+                        logger.info(
+                            f"📋 Client capabilities: {', '.join(cap_summary) if cap_summary else 'none'}"
+                        )
+                        # Log full capabilities at INFO level to diagnose capability issues
+                        logger.info(
+                            f"Full capabilities JSON: {json.dumps(capabilities)}"
+                        )
+                except Exception as e:
+                    # Don't fail the request if logging fails
+                    logger.debug(
+                        f"Failed to parse MCP request for capability logging: {e}"
+                    )
+
        response = await call_next(request)
        return response
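The middleware's capability parsing is easier to test as a pure function. This is a sketch: `summarize_capabilities` is a hypothetical name, and it reproduces the summary format logged above. It also returns `None` for non-initialize messages, invalid JSON and JSON-RPC batches (lists), instead of relying on the broad `except`.

```python
import json
from typing import Any, Optional


def summarize_capabilities(body: bytes) -> Optional[str]:
    """Return the capability summary for an MCP initialize request, else None."""
    try:
        data: Any = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return None
    if not isinstance(data, dict) or data.get("method") != "initialize":
        return None

    params = data.get("params") or {}
    capabilities = params.get("capabilities") if isinstance(params, dict) else None
    if not isinstance(capabilities, dict):
        return "none"

    # Presence matters, not truthiness: {"sampling": {}} still declares sampling
    summary = [name for name in ("roots", "sampling") if name in capabilities]
    if "experimental" in capabilities:
        summary.append(f"experimental({len(capabilities['experimental'])} features)")
    return ", ".join(summary) if summary else "none"
```

With this function, the middleware body reduces to one call and a log line.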

@@ -1374,6 +1453,11 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
        expose_headers=["*"],
    )

+    # Add observability middleware (metrics + tracing)
+    if settings.metrics_enabled or settings.tracing_enabled:
+        app.add_middleware(ObservabilityMiddleware)
+        logger.info("Observability middleware enabled (metrics and/or tracing)")
+
    # Add exception handler for scope challenges (OAuth mode only)
    if oauth_enabled:

@@ -1630,8 +1714,20 @@ def run(

    app = get_app(transport=transport, enabled_apps=enabled_apps)

+    # Get observability settings and create uvicorn logging config
+    settings = get_settings()
+    uvicorn_log_config = get_uvicorn_logging_config(
+        log_format=settings.log_format,
+        log_level=settings.log_level,
+        include_trace_context=settings.log_include_trace_context,
+    )
+
    uvicorn.run(
-        app=app, host=host, port=port, log_level=log_level, log_config=LOGGING_CONFIG
+        app=app,
+        host=host,
+        port=port,
+        log_level=log_level,
+        log_config=uvicorn_log_config,
    )


@@ -43,14 +43,17 @@ async def _get_processing_status(request: Request) -> dict[str, Any] | None:
        return None

    try:
-        # Get document queue from app state
-        document_queue = getattr(request.app.state, "document_queue", None)
-        if document_queue is None:
-            logger.debug("document_queue not available in app state")
+        # Get document receive stream from app state
+        document_receive_stream = getattr(
+            request.app.state, "document_receive_stream", None
+        )
+        if document_receive_stream is None:
+            logger.debug("document_receive_stream not available in app state")
            return None

-        # Get pending count from queue
-        pending_count = document_queue.qsize()
+        # Get pending count from stream statistics
+        stats = document_receive_stream.statistics()
+        pending_count = stats.current_buffer_used

        # Get Qdrant client and query indexed count
        indexed_count = 0
@@ -63,7 +66,7 @@ async def _get_processing_status(request: Request) -> dict[str, Any] | None:

            # Count documents in collection
            count_result = await qdrant_client.count(
-                collection_name=settings.qdrant_collection
+                collection_name=settings.get_collection_name()
            )
            indexed_count = count_result.count

@@ -7,6 +7,12 @@ from functools import wraps

 from httpx import AsyncClient, HTTPStatusError, RequestError, codes

+from nextcloud_mcp_server.observability.metrics import (
+    record_nextcloud_api_call,
+    record_nextcloud_api_retry,
+)
+from nextcloud_mcp_server.observability.tracing import trace_nextcloud_api_call
+
 logger = logging.getLogger(__name__)


@@ -38,6 +44,9 @@ def retry_on_429(func):
                    logger.warning(
                        f"429 Client Error: Too Many Requests, Number of attempts: {retries}"
                    )
+                    # Record retry metric (extract app name from args if available)
+                    if len(args) > 0 and hasattr(args[0], "app_name"):
+                        record_nextcloud_api_retry(app=args[0].app_name, reason="429")
                    time.sleep(5)
                elif e.response.status_code == 404:
                    # 404 errors are often expected (e.g., checking if attachments exist)
@@ -72,6 +81,9 @@ def retry_on_429(func):
 class BaseNextcloudClient(ABC):
    """Base class for all Nextcloud app clients."""

+    # Subclasses should set this to identify the app for metrics/tracing
+    app_name: str = "unknown"
+
    def __init__(self, http_client: AsyncClient, username: str):
        """Initialize with shared HTTP client and username.

@@ -88,7 +100,7 @@ class BaseNextcloudClient(ABC):

    @retry_on_429
    async def _make_request(self, method: str, url: str, **kwargs):
-        """Common request wrapper with logging and error handling.
+        """Common request wrapper with logging, tracing, and error handling.

        Args:
            method: HTTP method
@@ -99,6 +111,47 @@ class BaseNextcloudClient(ABC):
            Response object
        """
        logger.debug(f"Making {method} request to {url}")
-        response = await self._client.request(method, url, **kwargs)
-        response.raise_for_status()
-        return response
+
+        # Start a monotonic timer for metrics (unaffected by wall-clock changes)
+        start_time = time.perf_counter()
+        status_code = 0
+
+        try:
+            # Wrap request in trace span
+            with trace_nextcloud_api_call(
+                app=self.app_name,
+                method=method,
+                path=url,
+            ):
+                response = await self._client.request(method, url, **kwargs)
+                status_code = response.status_code
+                response.raise_for_status()
+
+                # Record successful API call metrics
+                duration = time.perf_counter() - start_time
+                record_nextcloud_api_call(
+                    app=self.app_name,
+                    method=method,
+                    status_code=status_code,
+                    duration=duration,
+                )
+
+                return response
+
+        except (HTTPStatusError, RequestError) as e:
+            # Record error metrics
+            if isinstance(e, HTTPStatusError):
+                status_code = e.response.status_code
+            else:
+                status_code = 0  # Connection error, no status code
+
+            duration = time.perf_counter() - start_time
+            record_nextcloud_api_call(
+                app=self.app_name,
+                method=method,
+                status_code=status_code,
+                duration=duration,
+            )
+
+            # Re-raise the exception
+            raise
@@ -13,6 +13,8 @@ logger = logging.getLogger(__name__)
 class ContactsClient(BaseNextcloudClient):
    """Client for NextCloud CardDAV contact operations."""

+    app_name = "contacts"
+
    def _get_carddav_base_path(self) -> str:
        """Helper to get the base CardDAV path for contacts."""
        return f"/remote.php/dav/addressbooks/users/{self.username}"
@@ -13,6 +13,8 @@ logger = logging.getLogger(__name__)
 class CookbookClient(BaseNextcloudClient):
    """Client for Nextcloud Cookbook app operations."""

+    app_name = "cookbook"
+
    async def get_version(self) -> Dict[str, Any]:
        """Get Cookbook app and API version."""
        response = await self._make_request("GET", "/apps/cookbook/api/version")
@@ -17,6 +17,8 @@ from nextcloud_mcp_server.models.deck import (
 class DeckClient(BaseNextcloudClient):
    """Client for Nextcloud Deck app operations."""

+    app_name = "deck"
+
    def _get_deck_headers(
        self, additional_headers: Optional[Dict[str, str]] = None
    ) -> Dict[str, str]:
@@ -11,6 +11,8 @@ logger = logging.getLogger(__name__)
 class GroupsClient(BaseNextcloudClient):
    """Client for Nextcloud Groups API operations."""

+    app_name = "groups"
+
    @retry_on_429
    async def search_groups(
        self,
@@ -11,23 +11,64 @@ logger = logging.getLogger(__name__)
 class NotesClient(BaseNextcloudClient):
    """Client for Nextcloud Notes app operations."""

+    app_name = "notes"
+
    async def get_settings(self) -> Dict[str, Any]:
        """Get Notes app settings."""
        response = await self._make_request("GET", "/apps/notes/api/v1/settings")
        return response.json()

-    async def get_all_notes(self) -> AsyncIterator[Dict[str, Any]]:
-        """Get all notes, yielding them one at a time."""
+    async def get_all_notes(
+        self, prune_before: Optional[int] = None
+    ) -> AsyncIterator[Dict[str, Any]]:
+        """Get all notes, yielding them one at a time.
+
+        The Notes API returns changed notes with full data in chunks, and ALL note IDs
+        (with only 'id' field) in the last chunk for deletion detection. This causes
+        duplicates which we handle by tracking seen IDs (first occurrence with full
+        data is kept, later pruned duplicates are skipped).
+
+        Args:
+            prune_before: Optional Unix timestamp. Notes unchanged since this time
+                         are pruned (only 'id' field returned in last chunk).
+                         Reduces data transfer for large note collections.
+
+        Yields:
+            Note dictionaries, deduplicated by ID. When prune_before is set, notes
+            unchanged since that time may carry only the 'id' field.
+        """
        cursor = ""
+        seen_ids: set[int] = set()

        while True:
+            params: Dict[str, Any] = {"chunkSize": 10}
+            if cursor:
+                params["chunkCursor"] = cursor
+            if prune_before is not None:
+                params["pruneBefore"] = prune_before
+
            response = await self._make_request(
                "GET",
                "/apps/notes/api/v1/notes",
-                params={"chunkSize": 10, "chunkCursor": cursor},
+                params=params,
            )
-            for note in response.json():
+            response_data = response.json()
+
+            for note in response_data:
+                note_id = note.get("id")
+                if note_id is None:
+                    logger.warning(f"Skipping note without ID: {note}")
+                    continue
+
+                # Skip duplicates (API returns all IDs in last chunk for deletion detection)
+                if note_id in seen_ids:
+                    logger.debug(
+                        f"Skipping duplicate note {note_id} (pruned version in last chunk)"
+                    )
+                    continue
+
+                seen_ids.add(note_id)
                yield note
+
            if "X-Notes-Chunk-Cursor" not in response.headers:
                break
            cursor = response.headers["X-Notes-Chunk-Cursor"]
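The deduplication logic is easier to check in isolation. This synchronous sketch replaces `_make_request` and the `X-Notes-Chunk-Cursor` header with a hypothetical `fetch_page(cursor)` callable. The callable returns one chunk plus the next cursor, or `None` when there are no more chunks. The real method is async. The chunk layout in the test follows the docstring above: full notes first, then id-only entries in the last chunk.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Set, Tuple

# A page is (notes in this chunk, next cursor or None when there are no more chunks)
Page = Tuple[List[Dict[str, Any]], Optional[str]]


def iter_notes(fetch_page: Callable[[str], Page]) -> Iterator[Dict[str, Any]]:
    """Yield each note once, keeping the first occurrence of every ID."""
    cursor = ""
    seen_ids: Set[Any] = set()
    while True:
        notes, next_cursor = fetch_page(cursor)
        for note in notes:
            note_id = note.get("id")
            if note_id is None or note_id in seen_ids:
                continue  # malformed entry, or an id-only repeat from the last chunk
            seen_ids.add(note_id)
            yield note
        if next_cursor is None:  # stands in for a missing X-Notes-Chunk-Cursor header
            break
        cursor = next_cursor
```

A note that appears only in the last chunk (unchanged since `prune_before`) is still yielded once, but only with its `id`. Callers that need full content for such notes must fetch them separately.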
@@ -11,6 +11,8 @@ logger = logging.getLogger(__name__)
 class SharingClient(BaseNextcloudClient):
    """Client for Nextcloud OCS Sharing API operations."""

+    app_name = "sharing"
+
    @retry_on_429
    async def create_share(
        self,
@@ -11,6 +11,8 @@ logger = logging.getLogger(__name__)
 class TablesClient(BaseNextcloudClient):
    """Client for Nextcloud Tables app operations."""

+    app_name = "tables"
+
    async def list_tables(self) -> List[Dict[str, Any]]:
        """List all tables available to the user."""
        response = await self._make_request(
@@ -7,6 +7,8 @@ from nextcloud_mcp_server.models.users import UserDetails
 class UsersClient(BaseNextcloudClient):
    """Client for Nextcloud User API operations."""

+    app_name = "users"
+
    def _get_user_headers(
        self, additional_headers: Optional[Dict[str, str]] = None
    ) -> Dict[str, str]:
@@ -15,6 +15,8 @@ logger = logging.getLogger(__name__)
 class WebDAVClient(BaseNextcloudClient):
    """Client for Nextcloud WebDAV operations."""

+    app_name = "webdav"
+
    async def delete_resource(self, path: str) -> Dict[str, Any]:
        """Delete a resource (file or directory) via WebDAV DELETE."""
        # Ensure path ends with a slash if it's a directory
@@ -174,6 +174,22 @@ class Settings:
    ollama_embedding_model: str = "nomic-embed-text"
    ollama_verify_ssl: bool = True

+    # Document chunking settings (for vector embeddings)
+    document_chunk_size: int = 512  # Words per chunk
+    document_chunk_overlap: int = 50  # Overlapping words between chunks
+
+    # Observability settings
+    metrics_enabled: bool = True
+    metrics_port: int = 9090
+    tracing_enabled: bool = False
+    otel_exporter_otlp_endpoint: Optional[str] = None
+    otel_service_name: str = "nextcloud-mcp-server"
+    otel_traces_sampler: str = "always_on"
+    otel_traces_sampler_arg: float = 1.0
+    log_format: str = "json"  # "json" or "text"
+    log_level: str = "INFO"
+    log_include_trace_context: bool = True
+
    def __post_init__(self):
        """Validate Qdrant configuration and set defaults."""
        logger = logging.getLogger(__name__)
@@ -197,6 +213,65 @@ class Settings:
                "API key is only relevant for network mode and will be ignored."
            )

+        # Validate chunking configuration
+        if self.document_chunk_overlap >= self.document_chunk_size:
+            raise ValueError(
+                f"DOCUMENT_CHUNK_OVERLAP ({self.document_chunk_overlap}) must be less than "
+                f"DOCUMENT_CHUNK_SIZE ({self.document_chunk_size}). "
+                f"Overlap should be 10-20% of chunk size for optimal results."
+            )
+
+        if self.document_chunk_size < 100:
+            logger.warning(
+                f"DOCUMENT_CHUNK_SIZE is set to {self.document_chunk_size} words, which is quite small. "
+                f"Smaller chunks may lose context. Consider using at least 256 words."
+            )
+
+        if self.document_chunk_overlap < 0:
+            raise ValueError(
+                f"DOCUMENT_CHUNK_OVERLAP ({self.document_chunk_overlap}) cannot be negative."
+            )
+
+    def get_collection_name(self) -> str:
+        """
+        Get Qdrant collection name.
+
+        Auto-generates from deployment ID + model name unless QDRANT_COLLECTION is set
+        to something other than the default "nextcloud_content".
+        Deployment ID uses OTEL_SERVICE_NAME if configured, otherwise hostname.
+
+        This enables:
+        - Safe embedding model switching (new model → new collection)
+        - Multi-server deployments (unique deployment IDs)
+        - Clear collection naming (shows deployment and model)
+
+        Format: {deployment-id}-{model-name}
+
+        Examples:
+            - "my-deployment-nomic-embed-text" (OTEL_SERVICE_NAME set)
+            - "mcp-container-all-minilm" (hostname fallback)
+
+        Returns:
+            Collection name string
+        """
+        import socket
+
+        # Use explicit override if user configured non-default value
+        if self.qdrant_collection != "nextcloud_content":
+            return self.qdrant_collection
+
+        # Determine deployment ID (OTEL service name or hostname fallback)
+        if self.otel_service_name != "nextcloud-mcp-server":  # Non-default
+            deployment_id = self.otel_service_name
+        else:
+            # Fallback to hostname for simple Docker deployments without OTEL config
+            deployment_id = socket.gethostname()
+
+        # Sanitize deployment ID and model name
+        deployment_id = deployment_id.lower().replace(" ", "-").replace("_", "-")
+        model_name = self.ollama_embedding_model.replace("/", "-").replace(":", "-")
+
+        return f"{deployment_id}-{model_name}"
+

 def get_settings() -> Settings:
    """Get application settings from environment variables.
@@ -253,4 +328,19 @@ def get_settings() -> Settings:
        ollama_base_url=os.getenv("OLLAMA_BASE_URL"),
        ollama_embedding_model=os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text"),
        ollama_verify_ssl=os.getenv("OLLAMA_VERIFY_SSL", "true").lower() == "true",
+        # Document chunking settings
+        document_chunk_size=int(os.getenv("DOCUMENT_CHUNK_SIZE", "512")),
+        document_chunk_overlap=int(os.getenv("DOCUMENT_CHUNK_OVERLAP", "50")),
+        # Observability settings
+        metrics_enabled=os.getenv("METRICS_ENABLED", "true").lower() == "true",
+        metrics_port=int(os.getenv("METRICS_PORT", "9090")),
+        tracing_enabled=os.getenv("OTEL_ENABLED", "false").lower() == "true",
+        otel_exporter_otlp_endpoint=os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT"),
+        otel_service_name=os.getenv("OTEL_SERVICE_NAME", "nextcloud-mcp-server"),
+        otel_traces_sampler=os.getenv("OTEL_TRACES_SAMPLER", "always_on"),
+        otel_traces_sampler_arg=float(os.getenv("OTEL_TRACES_SAMPLER_ARG", "1.0")),
+        log_format=os.getenv("LOG_FORMAT", "json"),
+        log_level=os.getenv("LOG_LEVEL", "INFO"),
+        log_include_trace_context=os.getenv("LOG_INCLUDE_TRACE_CONTEXT", "true").lower()
+        == "true",
    )
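The naming rule in `Settings.get_collection_name()` is restated below as a standalone function so it can be checked directly. The hostname can be passed in for testing. `collection_name` is a hypothetical helper, not part of the server.

```python
import socket
from typing import Optional

DEFAULT_COLLECTION = "nextcloud_content"
DEFAULT_SERVICE_NAME = "nextcloud-mcp-server"


def collection_name(
    qdrant_collection: str = DEFAULT_COLLECTION,
    otel_service_name: str = DEFAULT_SERVICE_NAME,
    embedding_model: str = "nomic-embed-text",
    hostname: Optional[str] = None,
) -> str:
    """Mirror of the get_collection_name() rule, with an injectable hostname."""
    # Any non-default collection name is used verbatim
    if qdrant_collection != DEFAULT_COLLECTION:
        return qdrant_collection

    # Deployment ID: custom OTEL service name, else the machine's hostname
    if otel_service_name != DEFAULT_SERVICE_NAME:
        deployment_id = otel_service_name
    else:
        deployment_id = hostname if hostname is not None else socket.gethostname()

    # Same sanitization as the server: lowercase ID, dash-separated model tag
    deployment_id = deployment_id.lower().replace(" ", "-").replace("_", "-")
    model = embedding_model.replace("/", "-").replace(":", "-")
    return f"{deployment_id}-{model}"
```

Only the deployment ID is lowercased; the model name keeps its case. Explicitly setting `QDRANT_COLLECTION=nextcloud_content` still produces an auto-generated name.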
@@ -0,0 +1,31 @@
+"""
+Observability module for the Nextcloud MCP Server.
+
+This module provides:
+- Prometheus metrics collection
+- OpenTelemetry distributed tracing
+- Enhanced structured logging with trace correlation
+- Monitoring middleware for Starlette/FastAPI
+
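+Usage:
+    from nextcloud_mcp_server.observability import (
+        ObservabilityMiddleware,
+        setup_metrics,
+        setup_tracing,
+    )
+
+    # In get_app()
+    setup_metrics(port=settings.metrics_port)
+    setup_tracing(
+        service_name=settings.otel_service_name,
+        otlp_endpoint=settings.otel_exporter_otlp_endpoint,
+        sampling_rate=settings.otel_traces_sampler_arg,
+    )
+    app.add_middleware(ObservabilityMiddleware)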
+"""
+
+from nextcloud_mcp_server.observability.logging_config import (
+    get_uvicorn_logging_config,
+    setup_logging,
+)
+from nextcloud_mcp_server.observability.metrics import setup_metrics
+from nextcloud_mcp_server.observability.middleware import ObservabilityMiddleware
+from nextcloud_mcp_server.observability.tracing import setup_tracing
+
+__all__ = [
+    "setup_logging",
+    "get_uvicorn_logging_config",
+    "setup_metrics",
+    "setup_tracing",
+    "ObservabilityMiddleware",
+]
@@ -0,0 +1,327 @@
+"""
+Enhanced logging configuration for the Nextcloud MCP Server.
+
+This module provides:
+- Structured JSON logging with python-json-logger
+- Trace context injection (trace_id, span_id) for correlation with distributed traces
+- Configurable log formats (JSON or text)
+- Log level configuration per component
+"""
+
+import logging
+import sys
+from typing import Any
+
+from pythonjsonlogger import jsonlogger
+
+from nextcloud_mcp_server.observability.tracing import get_trace_context
+
+
+class HealthCheckFilter(logging.Filter):
+    """
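+    Logging filter that drops access-log records for health check and metrics endpoints.
+
+    This keeps frequent liveness/readiness probes and Prometheus scrapes
+    (/health/live, /health/ready, /metrics) from cluttering the logs while
+    retaining access logs for all other endpoints.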
+    """
+
+    def filter(self, record: logging.LogRecord) -> bool:
+        """
+        Filter out health check and metrics requests from uvicorn access logs.
+
+        Args:
+            record: LogRecord instance
+
+        Returns:
+            False if the message mentions a filtered endpoint, True otherwise
+        """
+        # Substring match on the rendered access-log message
+        message = record.getMessage()
+        return not any(
+            endpoint in message
+            for endpoint in ["/health/live", "/health/ready", "/metrics"]
+        )
+
+
+class TraceContextFormatter(jsonlogger.JsonFormatter):
+    """
+    JSON formatter that injects OpenTelemetry trace context into log records.
+
+    This allows logs to be correlated with distributed traces by including
+    trace_id and span_id in each log entry.
+    """
+
+    def add_fields(
+        self,
+        log_record: dict[str, Any],
+        record: logging.LogRecord,
+        message_dict: dict[str, Any],
+    ) -> None:
+        """
+        Add custom fields to the log record, including trace context.
+
+        Args:
+            log_record: Dictionary to be serialized as JSON
+            record: LogRecord instance
+            message_dict: Dictionary of extra fields from log call
+        """
+        # Call parent to add standard fields
+        super().add_fields(log_record, record, message_dict)
+
+        # Add trace context if available
+        trace_context = get_trace_context()
+        if trace_context:
+            log_record["trace_id"] = trace_context.get("trace_id")
+            log_record["span_id"] = trace_context.get("span_id")
+
+        # Add standard fields with consistent naming
+        log_record["timestamp"] = self.formatTime(record, self.datefmt)
+        log_record["level"] = record.levelname
+        log_record["logger"] = record.name
+        log_record["message"] = record.getMessage()
+
+        # Include exception info if present
+        if record.exc_info:
+            log_record["exception"] = self.formatException(record.exc_info)
+
+
+class TraceContextTextFormatter(logging.Formatter):
+    """
+    Text formatter that includes OpenTelemetry trace context.
+
+    Format: LEVEL [timestamp] logger - message [trace_id=xxx span_id=yyy]
+    (the trace suffix is appended only when a trace context is available)
+    """
+
+    def format(self, record: logging.LogRecord) -> str:
+        """
+        Format log record with trace context.
+
+        Args:
+            record: LogRecord instance
+
+        Returns:
+            Formatted log string
+        """
+        # Format base message
+        base_message = super().format(record)
+
+        # Add trace context if available
+        trace_context = get_trace_context()
+        if trace_context:
+            trace_id = trace_context.get("trace_id", "")
+            span_id = trace_context.get("span_id", "")
+            return f"{base_message} [trace_id={trace_id} span_id={span_id}]"
+
+        return base_message
+
+
+def setup_logging(
+    log_format: str = "json",
+    log_level: str = "INFO",
+    include_trace_context: bool = True,
+) -> None:
+    """
+    Configure logging for the Nextcloud MCP Server.
+
+    Args:
+        log_format: "json" for JSON logging, "text" for human-readable text (default: "json")
+        log_level: Minimum log level (DEBUG, INFO, WARNING, ERROR, CRITICAL) (default: "INFO")
+        include_trace_context: Whether to include trace context in logs (default: True)
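+
+    Example:
+        # Human-readable logs without trace IDs, e.g. for local development
+        # (an illustrative choice; the defaults remain JSON with trace context)
+        setup_logging(log_format="text", log_level="DEBUG", include_trace_context=False)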
+    """
+    # Get root logger
+    root_logger = logging.getLogger()
+    root_logger.setLevel(getattr(logging, log_level.upper(), logging.INFO))
+
+    # Remove existing handlers
+    root_logger.handlers.clear()
+
+    # Create console handler
+    console_handler = logging.StreamHandler(sys.stdout)
+    console_handler.setLevel(getattr(logging, log_level.upper(), logging.INFO))
+
+    # Configure formatter based on format preference
+    if log_format.lower() == "json":
+        if include_trace_context:
+            formatter = TraceContextFormatter(
+                "%(timestamp)s %(level)s %(name)s %(message)s",
+                datefmt="%Y-%m-%dT%H:%M:%S",
+            )
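+        else:
+            # The plain JsonFormatter copies only real LogRecord attributes, so
+            # "%(timestamp)s"/"%(level)s" would serialize as null; use asctime/levelname
+            formatter = jsonlogger.JsonFormatter(
+                "%(asctime)s %(levelname)s %(name)s %(message)s",
+                datefmt="%Y-%m-%dT%H:%M:%S",
+            )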
+    else:  # text format
+        if include_trace_context:
+            formatter = TraceContextTextFormatter(
+                "%(levelname)s [%(asctime)s] %(name)s - %(message)s",
+                datefmt="%Y-%m-%d %H:%M:%S",
+            )
+        else:
+            formatter = logging.Formatter(
+                "%(levelname)s [%(asctime)s] %(name)s - %(message)s",
+                datefmt="%Y-%m-%d %H:%M:%S",
+            )
+
+    console_handler.setFormatter(formatter)
+    root_logger.addHandler(console_handler)
+
+    # Configure specific logger levels
+    configure_component_loggers(log_level)
+
+    root_logger.info(
+        f"Logging configured: format={log_format}, level={log_level}, "
+        f"trace_context={include_trace_context}"
+    )
+
+
+def configure_component_loggers(default_level: str = "INFO") -> None:
+    """
+    Configure log levels for specific components.
+
+    This allows fine-grained control over logging verbosity for different
+    parts of the application.
+
+    Args:
+        default_level: Default log level for most components
+    """
+    # Map of logger names to log levels
+    logger_levels = {
+        # Application loggers
+        "nextcloud_mcp_server": default_level,
+        "nextcloud_mcp_server.server": default_level,
+        "nextcloud_mcp_server.client": default_level,
+        "nextcloud_mcp_server.auth": default_level,
+        "nextcloud_mcp_server.observability": default_level,
+        # HTTP client loggers (less verbose by default)
+        "httpx": "WARNING",
+        "httpcore": "WARNING",
+        # Server loggers
+        "uvicorn": "INFO",
+        "uvicorn.access": "INFO",
+        "uvicorn.error": "INFO",
+        # MCP framework
+        "mcp": "INFO",
+        # OpenTelemetry (less verbose)
+        "opentelemetry": "WARNING",
+    }
+
+    for logger_name, level in logger_levels.items():
+        logger = logging.getLogger(logger_name)
+        logger.setLevel(getattr(logging, level.upper(), logging.INFO))
+
+
+def get_logger(name: str) -> logging.Logger:
+    """
+    Get a logger instance for a specific module.
+
+    This is a convenience function that wraps logging.getLogger()
+    to ensure consistent logger configuration.
+
+    Args:
+        name: Logger name (typically __name__)
+
+    Returns:
+        Logger instance
+    """
+    return logging.getLogger(name)
+
+
+def get_uvicorn_logging_config(
+    log_format: str = "json",
+    log_level: str = "INFO",
+    include_trace_context: bool = True,
+) -> dict:
+    """
+    Get uvicorn-compatible logging configuration.
+
+    This creates a logging config dict that uvicorn can use while maintaining
+    our observability setup (JSON format, trace context, etc.).
+
+    Args:
+        log_format: "json" or "text"
+        log_level: Minimum log level
+        include_trace_context: Whether to include trace IDs in logs
+
+    Returns:
+        Logging config dict compatible with uvicorn's log_config parameter
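+
+    Example (sketch; ``app`` stands for the ASGI application object):
+        import uvicorn
+
+        uvicorn.run(
+            app,
+            log_config=get_uvicorn_logging_config(log_format="json", log_level="INFO"),
+        )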
+    """
+    # Determine formatter class based on format and trace context
+    if log_format.lower() == "json":
+        if include_trace_context:
+            formatter_class = "nextcloud_mcp_server.observability.logging_config.TraceContextFormatter"
+            format_string = "%(timestamp)s %(level)s %(name)s %(message)s"
+        else:
+            # Plain JsonFormatter only fills real LogRecord attributes
+            formatter_class = "pythonjsonlogger.jsonlogger.JsonFormatter"
+            format_string = "%(asctime)s %(levelname)s %(name)s %(message)s"
+    else:
+        if include_trace_context:
+            formatter_class = "nextcloud_mcp_server.observability.logging_config.TraceContextTextFormatter"
+        else:
+            formatter_class = "logging.Formatter"
+        format_string = "%(levelname)s [%(asctime)s] %(name)s - %(message)s"
+
+    return {
+        "version": 1,
+        "disable_existing_loggers": False,
+        "formatters": {
+            "default": {
+                "()": formatter_class,
+                "format": format_string,
+                "datefmt": "%Y-%m-%d %H:%M:%S",
+            },
+        },
+        "filters": {
+            "health_check_filter": {
+                "()": "nextcloud_mcp_server.observability.logging_config.HealthCheckFilter",
+            },
+        },
+        "handlers": {
+            "default": {
+                "formatter": "default",
+                "class": "logging.StreamHandler",
+                "stream": "ext://sys.stdout",
+            },
+            "access": {
+                "formatter": "default",
+                "class": "logging.StreamHandler",
+                "stream": "ext://sys.stdout",
+                "filters": ["health_check_filter"],
+            },
+        },
+        "loggers": {
+            "": {
+                "handlers": ["default"],
+                "level": log_level.upper(),
+            },
+            "uvicorn": {
+                "handlers": ["default"],
+                "level": "INFO",
+                "propagate": False,
+            },
+            "uvicorn.access": {
+                "handlers": ["access"],
+                "level": "INFO",
+                "propagate": False,
+            },
+            "uvicorn.error": {
+                "handlers": ["default"],
+                "level": "INFO",
+                "propagate": False,
+            },
+            "httpx": {
+                "handlers": ["default"],
+                "level": "WARNING",
+                "propagate": False,
+            },
+            "httpcore": {
+                "handlers": ["default"],
+                "level": "WARNING",
+                "propagate": False,
+            },
+            "opentelemetry": {
+                "handlers": ["default"],
+                "level": "WARNING",
+                "propagate": False,
+            },
+        },
+    }
@@ -0,0 +1,354 @@
+"""
+Prometheus metrics for the Nextcloud MCP Server.
+
+This module defines all Prometheus metrics for monitoring server health, performance,
+and resource usage. Metrics are organized by category:
+
+- HTTP Server Metrics (RED: Rate, Errors, Duration)
+- MCP Tool Metrics (per-tool invocation tracking)
+- MCP Resource Metrics
+- Nextcloud API Client Metrics
+- OAuth Flow Metrics
+- Vector Sync Metrics (only updated when the optional vector sync feature is enabled)
+- Database Operation Metrics
+- External Dependency Health Metrics
+"""
+
+import logging
+
+from prometheus_client import (
+    Counter,
+    Gauge,
+    Histogram,
+    start_http_server,
+)
+
+logger = logging.getLogger(__name__)
+
+# =============================================================================
+# HTTP Server Metrics (RED + System)
+# =============================================================================
+
+http_requests_total = Counter(
+    "mcp_http_requests_total",
+    "Total HTTP requests received",
+    ["method", "endpoint", "status_code"],
+)
+
+http_request_duration_seconds = Histogram(
+    "mcp_http_request_duration_seconds",
+    "HTTP request latency in seconds",
+    ["method", "endpoint"],
+    buckets=(0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0),
+)
+
+http_requests_in_progress = Gauge(
+    "mcp_http_requests_in_progress",
+    "Number of HTTP requests currently being processed",
+    ["method", "endpoint"],
+)
+
+# =============================================================================
+# MCP Tool Metrics
+# =============================================================================
+
+mcp_tool_calls_total = Counter(
+    "mcp_tool_calls_total",
+    "Total MCP tool invocations",
+    ["tool_name", "status"],  # status: success | error
+)
+
+mcp_tool_duration_seconds = Histogram(
+    "mcp_tool_duration_seconds",
+    "MCP tool execution duration in seconds",
+    ["tool_name"],
+    buckets=(0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0),
+)
+
+mcp_tool_errors_total = Counter(
+    "mcp_tool_errors_total",
+    "Total MCP tool errors by type",
+    ["tool_name", "error_type"],
+)
+
+# =============================================================================
+# MCP Resource Metrics
+# =============================================================================
+
+mcp_resource_requests_total = Counter(
+    "mcp_resource_requests_total",
+    "Total MCP resource requests",
+    ["resource_uri", "status"],
+)
+
+mcp_resource_duration_seconds = Histogram(
+    "mcp_resource_duration_seconds",
+    "MCP resource request duration in seconds",
+    ["resource_uri"],
+    buckets=(0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5),
+)
+
+# =============================================================================
+# Nextcloud API Client Metrics
+# =============================================================================
+
+nextcloud_api_requests_total = Counter(
+    "mcp_nextcloud_api_requests_total",
+    "Total Nextcloud API requests",
+    ["app", "method", "status_code"],  # app: notes, calendar, contacts, etc.
+)
+
+nextcloud_api_duration_seconds = Histogram(
+    "mcp_nextcloud_api_duration_seconds",
+    "Nextcloud API request duration in seconds",
+    ["app", "method"],
+    buckets=(0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0),
+)
+
+nextcloud_api_retries_total = Counter(
+    "mcp_nextcloud_api_retries_total",
+    "Total Nextcloud API retries",
+    ["app", "reason"],  # reason: 429 | timeout | connection_error
+)
+
+# =============================================================================
+# OAuth Flow Metrics
+# =============================================================================
+
+oauth_token_validations_total = Counter(
+    "mcp_oauth_token_validations_total",
+    "Total OAuth token validation attempts",
+    ["method", "result"],  # method: introspect | jwt; result: valid | invalid | error
+)
+
+oauth_token_exchange_total = Counter(
+    "mcp_oauth_token_exchange_total",
+    "Total OAuth token exchange operations (RFC 8693)",
+    ["status"],  # status: success | error
+)
+
+oauth_token_cache_hits_total = Counter(
+    "mcp_oauth_token_cache_hits_total",
+    "Total OAuth token cache lookups",
+    ["hit"],  # hit: true | false
+)
+
+oauth_refresh_token_operations_total = Counter(
+    "mcp_oauth_refresh_token_operations_total",
+    "Total refresh token storage operations",
+    [
+        "operation",
+        "status",
+    ],  # operation: store | retrieve | delete; status: success | error
+)
+
+# =============================================================================
+# Vector Sync Metrics (optional feature)
+# =============================================================================
+
+vector_sync_documents_scanned_total = Counter(
+    "mcp_vector_sync_documents_scanned_total",
+    "Total documents scanned for vector sync",
+)
+
+vector_sync_documents_processed_total = Counter(
+    "mcp_vector_sync_documents_processed_total",
+    "Total documents processed for vector sync",
+    ["status"],  # status: success | error
+)
+
+vector_sync_processing_duration_seconds = Histogram(
+    "mcp_vector_sync_processing_duration_seconds",
+    "Document processing duration in seconds",
+    buckets=(0.1, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0, 60.0),
+)
+
+vector_sync_queue_size = Gauge(
+    "mcp_vector_sync_queue_size",
+    "Current number of documents in processing queue",
+)
+
+qdrant_operations_total = Counter(
+    "mcp_qdrant_operations_total",
+    "Total Qdrant vector database operations",
+    [
+        "operation",
+        "status",
+    ],  # operation: upsert | search | delete; status: success | error
+)
+
+# =============================================================================
+# Database Metrics
+# =============================================================================
+
+db_operations_total = Counter(
+    "mcp_db_operations_total",
+    "Total database operations",
+    ["db", "operation", "status"],  # db: sqlite | qdrant; operation varies
+)
+
+db_operation_duration_seconds = Histogram(
+    "mcp_db_operation_duration_seconds",
+    "Database operation duration in seconds",
+    ["db", "operation"],
+    buckets=(0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0),
+)
+
+# =============================================================================
+# External Dependency Health Metrics
+# =============================================================================
+
+dependency_health = Gauge(
+    "mcp_dependency_health",
+    "External dependency health status (1=up, 0=down)",
+    ["dependency"],  # dependency: nextcloud | keycloak | qdrant | unstructured
+)
+
+dependency_check_duration_seconds = Histogram(
+    "mcp_dependency_check_duration_seconds",
+    "Dependency health check duration in seconds",
+    ["dependency"],
+    buckets=(0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5),
+)
+
+# =============================================================================
+# Metrics Setup and HTTP Handler
+# =============================================================================
+
+
+def setup_metrics(port: int = 9090) -> None:
+    """
+    Initialize Prometheus metrics collection and start HTTP server.
+
+    Starts a dedicated HTTP server on the specified port to serve metrics.
+    This server runs in a separate thread and is isolated from the main application.
+
+    Args:
+        port: Port to serve metrics on (default: 9090)
+
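+    Note:
+        Metrics are served ONLY on this dedicated port, not on the main
+        application HTTP port, so deployments can keep them internal (e.g. by
+        not publishing the port). start_http_server binds to all interfaces
+        (0.0.0.0) by default, so access must still be restricted at the
+        network level.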
+    """
+    try:
+        start_http_server(port)
+        logger.info(f"Prometheus metrics server started on port {port}")
+    except OSError as e:
+        if "Address already in use" in str(e):
+            logger.warning(
+                f"Metrics port {port} already in use (metrics server likely already running)"
+            )
+        else:
+            logger.error(f"Failed to start metrics server on port {port}: {e}")
+            raise
+
+
+# =============================================================================
+# Convenience Functions for Common Metric Updates
+# =============================================================================
+
+
+def record_tool_call(tool_name: str, duration: float, status: str = "success") -> None:
+    """
+    Record metrics for an MCP tool call.
+
+    Args:
+        tool_name: Name of the MCP tool
+        duration: Execution duration in seconds
+        status: "success" or "error"
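+
+    Example (sketch; ``create_note`` is a hypothetical coroutine wrapping the tool):
+        start = time.perf_counter()
+        try:
+            note = await create_note(title="My Note")
+        except Exception as e:
+            duration = time.perf_counter() - start
+            record_tool_error("nc_notes_create_note", type(e).__name__)
+            record_tool_call("nc_notes_create_note", duration, status="error")
+            raise
+        record_tool_call("nc_notes_create_note", time.perf_counter() - start)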
+    """
+    mcp_tool_calls_total.labels(tool_name=tool_name, status=status).inc()
+    mcp_tool_duration_seconds.labels(tool_name=tool_name).observe(duration)
+
+
+def record_tool_error(tool_name: str, error_type: str) -> None:
+    """
+    Record an MCP tool error.
+
+    Args:
+        tool_name: Name of the MCP tool
+        error_type: Type of error (e.g., "HTTPStatusError", "ValueError")
+    """
+    mcp_tool_errors_total.labels(tool_name=tool_name, error_type=error_type).inc()
+
+
+def record_nextcloud_api_call(
+    app: str,
+    method: str,
+    status_code: int,
+    duration: float,
+) -> None:
+    """
+    Record metrics for a Nextcloud API call.
+
+    Args:
+        app: Nextcloud app name (notes, calendar, contacts, etc.)
+        method: HTTP method (GET, POST, PUT, DELETE, PROPFIND, etc.)
+        status_code: HTTP status code
+        duration: Request duration in seconds
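+
+    Example (sketch; ``client`` is assumed to be an ``httpx.AsyncClient`` with
+    the Nextcloud base URL and credentials already configured):
+        start = time.perf_counter()
+        response = await client.get("/index.php/apps/notes/api/v1/notes")
+        record_nextcloud_api_call(
+            app="notes",
+            method="GET",
+            status_code=response.status_code,
+            duration=time.perf_counter() - start,
+        )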
+    """
+    nextcloud_api_requests_total.labels(
+        app=app, method=method, status_code=str(status_code)
+    ).inc()
+    nextcloud_api_duration_seconds.labels(app=app, method=method).observe(duration)
+
+
+def record_nextcloud_api_retry(app: str, reason: str) -> None:
+    """
+    Record a Nextcloud API retry.
+
+    Args:
+        app: Nextcloud app name
+        reason: Retry reason (429, timeout, connection_error)
+    """
+    nextcloud_api_retries_total.labels(app=app, reason=reason).inc()
+
+
+def record_oauth_token_validation(method: str, result: str) -> None:
+    """
+    Record an OAuth token validation.
+
+    Args:
+        method: Validation method ("introspect" or "jwt")
+        result: Validation result ("valid", "invalid", or "error")
+    """
+    oauth_token_validations_total.labels(method=method, result=result).inc()
+
+
+def record_db_operation(
+    db: str, operation: str, duration: float, status: str = "success"
+) -> None:
+    """
+    Record a database operation.
+
+    Args:
+        db: Database type ("sqlite" or "qdrant")
+        operation: Operation type (e.g., "insert", "select", "upsert", "search")
+        duration: Operation duration in seconds
+        status: "success" or "error"
+    """
+    db_operations_total.labels(db=db, operation=operation, status=status).inc()
+    db_operation_duration_seconds.labels(db=db, operation=operation).observe(duration)
+
+
+def set_dependency_health(dependency: str, is_healthy: bool) -> None:
+    """
+    Update external dependency health status.
+
+    Args:
+        dependency: Dependency name (nextcloud, keycloak, qdrant, unstructured)
+        is_healthy: True if dependency is healthy, False otherwise
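+
+    Example (sketch; ``check_qdrant`` is a hypothetical async probe returning a bool):
+        start = time.perf_counter()
+        healthy = await check_qdrant()
+        record_dependency_check("qdrant", time.perf_counter() - start)
+        set_dependency_health("qdrant", healthy)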
+    """
+    dependency_health.labels(dependency=dependency).set(1 if is_healthy else 0)
+
+
+def record_dependency_check(dependency: str, duration: float) -> None:
+    """
+    Record a dependency health check duration.
+
+    Args:
+        dependency: Dependency name
+        duration: Check duration in seconds
+    """
+    dependency_check_duration_seconds.labels(dependency=dependency).observe(duration)
@@ -0,0 +1,218 @@
+"""
+Observability middleware for the Nextcloud MCP Server.
+
+This module provides Starlette middleware that automatically instruments
+HTTP requests with:
+- Prometheus metrics (request count, latency, in-flight requests)
+- OpenTelemetry distributed tracing
+- Request/response timing and error tracking
+"""
+
+import logging
+import time
+from typing import Callable
+
+from starlette.middleware.base import BaseHTTPMiddleware
+from starlette.requests import Request
+from starlette.responses import Response
+
+from nextcloud_mcp_server.observability.metrics import (
+    http_request_duration_seconds,
+    http_requests_in_progress,
+    http_requests_total,
+)
+from nextcloud_mcp_server.observability.tracing import (
+    add_span_attribute,
+    trace_operation,
+)
+
+logger = logging.getLogger(__name__)
+
+
+class ObservabilityMiddleware(BaseHTTPMiddleware):
+    """
+    Starlette middleware for automatic HTTP request instrumentation.
+
+    This middleware:
+    - Records Prometheus metrics for each request (RED metrics)
+    - Creates OpenTelemetry spans for distributed tracing
+    - Tracks request timing and errors
+    - Handles in-flight request counting
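+
+    Example (sketch; the route list is elided):
+        from starlette.applications import Starlette
+        from starlette.middleware import Middleware
+
+        app = Starlette(routes=[...], middleware=[Middleware(ObservabilityMiddleware)])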
+    """
+
+    async def dispatch(
+        self,
+        request: Request,
+        call_next: Callable,
+    ) -> Response:
+        """
+        Process HTTP request with observability instrumentation.
+
+        Args:
+            request: Starlette request object
+            call_next: Next middleware or route handler
+
+        Returns:
+            Response from downstream handler
+        """
+        # Extract request details
+        method = request.method
+        path = request.url.path
+        endpoint = self._get_endpoint_label(path)
+
+        # Increment in-flight requests counter
+        http_requests_in_progress.labels(method=method, endpoint=endpoint).inc()
+
+        # Record start time. perf_counter is monotonic, so durations are not
+        # skewed by wall-clock changes. Note: with BaseHTTPMiddleware, call_next
+        # returns once response headers are ready, so for streaming responses
+        # (e.g. /sse) the measured duration ends at the first byte.
+        start_time = time.perf_counter()
+
+        # Skip tracing for health/metrics endpoints to reduce noise
+        should_trace = not (path.startswith("/health/") or path == "/metrics")
+
+        try:
+            if should_trace:
+                # Create span for request (it nests under any server span created
+                # by ASGI auto-instrumentation, if that is enabled)
+                with trace_operation(
+                    f"HTTP {method} {endpoint}",
+                    attributes={
+                        "http.method": method,
+                        "http.path": path,
+                        "http.scheme": request.url.scheme,
+                        "http.host": request.url.hostname,
+                    },
+                ):
+                    # Process request
+                    response = await call_next(request)
+
+                    # Add response status to span
+                    add_span_attribute("http.status_code", response.status_code)
+
+                    # Record metrics
+                    duration = time.perf_counter() - start_time
+                    self._record_request_metrics(
+                        method=method,
+                        endpoint=endpoint,
+                        status_code=response.status_code,
+                        duration=duration,
+                    )
+
+                    return response
+            else:
+                # No tracing for health/metrics endpoints, but still record metrics
+                response = await call_next(request)
+
+                # Record metrics
+                duration = time.perf_counter() - start_time
+                self._record_request_metrics(
+                    method=method,
+                    endpoint=endpoint,
+                    status_code=response.status_code,
+                    duration=duration,
+                )
+
+                return response
+
+        except Exception:
+            # Record error metrics
+            duration = time.perf_counter() - start_time
+            self._record_request_metrics(
+                method=method,
+                endpoint=endpoint,
+                status_code=500,  # Internal server error
+                duration=duration,
+            )
+
+            logger.error(
+                f"Request failed: {method} {path}",
+                exc_info=True,
+                extra={
+                    "method": method,
+                    "path": path,
+                    "duration_seconds": duration,
+                },
+            )
+
+            # Re-raise exception to be handled by error middleware
+            raise
+
+        finally:
+            # Decrement in-flight requests counter
+            http_requests_in_progress.labels(method=method, endpoint=endpoint).dec()
+
+    def _get_endpoint_label(self, path: str) -> str:
+        """
+        Get endpoint label for metrics, normalizing dynamic path segments.
+
+        This prevents metric cardinality explosion by grouping similar paths.
+
+        Args:
+            path: Request path
+
+        Returns:
+            Normalized endpoint label
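+
+        Example (illustrative input paths and the labels this method returns):
+            "/health/ready"   -> "/health/*"
+            "/sse/stream"     -> "/sse"
+            "/messages/"      -> "/messages"
+            "/oauth/callback" -> "/oauth/*"
+            "/oidc/login"     -> "/oidc/*"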
+        """
+        # Health check endpoints
+        if path.startswith("/health/"):
+            return "/health/*"
+
+        # Metrics endpoint
+        if path == "/metrics":
+            return "/metrics"
+
+        # MCP protocol endpoints
+        if path == "/sse" or path.startswith("/sse/"):
+            return "/sse"
+
+        if path == "/messages" or path.startswith("/messages/"):
+            return "/messages"
+
+        # OAuth/OIDC endpoints
+        if path.startswith("/oauth/"):
+            return "/oauth/*"
+
+        if path.startswith("/oidc/"):
+            return "/oidc/*"
+
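+        # Catch-all: other paths are used verbatim, so an unbounded set of
+        # paths (e.g. probes for random URLs) can still grow label cardinality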
+        return path
+
+    def _record_request_metrics(
+        self,
+        method: str,
+        endpoint: str,
+        status_code: int,
+        duration: float,
+    ) -> None:
+        """
+        Record Prometheus metrics for an HTTP request.
+
+        Args:
+            method: HTTP method
+            endpoint: Normalized endpoint label
+            status_code: HTTP status code
+            duration: Request duration in seconds
+        """
+        # Record request count
+        http_requests_total.labels(
+            method=method,
+            endpoint=endpoint,
+            status_code=str(status_code),
+        ).inc()
+
+        # Record request duration
+        http_request_duration_seconds.labels(
+            method=method,
+            endpoint=endpoint,
+        ).observe(duration)
+
+        # Log slow requests (>1 second)
+        if duration > 1.0:
+            logger.warning(
+                f"Slow request: {method} {endpoint} took {duration:.3f}s",
+                extra={
+                    "method": method,
+                    "endpoint": endpoint,
+                    "status_code": status_code,
+                    "duration_seconds": duration,
+                },
+            )
@@ -0,0 +1,363 @@
+"""
+OpenTelemetry distributed tracing for the Nextcloud MCP Server.
+
+This module provides:
+- OpenTelemetry SDK initialization with OTLP exporter
+- Auto-instrumentation for httpx (outgoing Nextcloud API calls) and logging
+- Helper functions for creating custom spans
+- Context propagation utilities
+- Span attribute standardization
+"""
+
+import logging
+from contextlib import contextmanager
+from typing import Any
+
+from opentelemetry import trace
+from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
+from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
+from opentelemetry.instrumentation.logging import LoggingInstrumentor
+from opentelemetry.sdk.resources import Resource
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import BatchSpanProcessor
+from opentelemetry.trace import Status, StatusCode, Tracer
+
+logger = logging.getLogger(__name__)
+
+# Global tracer instance (initialized in setup_tracing)
+_tracer: Tracer | None = None
+
+
+def setup_tracing(
+    service_name: str = "nextcloud-mcp-server",
+    otlp_endpoint: str | None = None,
+    sampling_rate: float = 1.0,
+) -> Tracer:
+    """
+    Initialize OpenTelemetry tracing with OTLP exporter.
+
+    Args:
+        service_name: Service name for traces (default: "nextcloud-mcp-server")
+        otlp_endpoint: OTLP gRPC endpoint (e.g., "http://otel-collector:4317")
+                      If None, tracing is initialized but no exporter is configured
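+        sampling_rate: Sampling rate (0.0-1.0). Default 1.0 (100% sampling).
+                       Note: not currently passed to the TracerProvider; without an
+                       explicit sampler, the SDK reads OTEL_TRACES_SAMPLER and
+                       OTEL_TRACES_SAMPLER_ARG from the environment.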
+
+    Returns:
+        Tracer instance for creating custom spans
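+
+    Example:
+        tracer = setup_tracing(
+            service_name="nextcloud-mcp-server",
+            otlp_endpoint="http://otel-collector:4317",
+        )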
+    """
+    global _tracer
+
+    # Create resource with service name
+    resource = Resource.create(
+        {
+            "service.name": service_name,
+            "service.version": "0.31.0",  # TODO: Extract from pyproject.toml
+        }
+    )
+
+    # Create tracer provider
+    provider = TracerProvider(resource=resource)
+
+    # Configure OTLP exporter if endpoint is provided
+    if otlp_endpoint:
+        try:
+            otlp_exporter = OTLPSpanExporter(endpoint=otlp_endpoint, insecure=True)
+            span_processor = BatchSpanProcessor(otlp_exporter)
+            provider.add_span_processor(span_processor)
+            logger.info(
+                f"OpenTelemetry tracing enabled with OTLP endpoint: {otlp_endpoint}"
+            )
+        except Exception as e:
+            logger.warning(
+                f"Failed to initialize OTLP exporter: {e}. Continuing without trace export."
+            )
+    else:
+        logger.info(
+            "OpenTelemetry tracing initialized without OTLP exporter (traces will be generated but not exported)"
+        )
+
+    # Set global tracer provider
+    trace.set_tracer_provider(provider)
+
+    # Auto-instrument httpx for Nextcloud API calls
+    HTTPXClientInstrumentor().instrument()
+
+    # Auto-instrument logging to inject trace context
+    LoggingInstrumentor().instrument(set_logging_format=True)
+
+    # Get and store tracer
+    _tracer = trace.get_tracer(__name__)
+
+    logger.info(f"OpenTelemetry tracing initialized for service: {service_name}")
+    return _tracer
+
+
+def get_tracer() -> Tracer | None:
+    """
+    Get the global tracer instance.
+
+    Returns:
+        Tracer instance for creating custom spans, or None if tracing is not enabled
+
+    Note:
+        Returns None if setup_tracing() was never called (tracing disabled).
+        Calling code should handle None gracefully.
+    """
+    return _tracer
+
+
+@contextmanager
+def trace_operation(
+    operation_name: str,
+    attributes: dict[str, Any] | None = None,
+    record_exception: bool = True,
+):
+    """
+    Context manager for tracing an operation with automatic error handling.
+
+    Usage:
+        with trace_operation("mcp.tool.nc_notes_create_note", {"note.title": "My Note"}):
+            # Your code here
+            pass
+
+    Args:
+        operation_name: Name of the operation (span name)
+        attributes: Optional attributes to add to the span
+        record_exception: Whether to record exceptions in the span (default: True)
+
+    Yields:
+        Span instance for adding additional attributes (or None if tracing disabled)
+    """
+    tracer = get_tracer()
+
+    # If tracing is not enabled, just yield without creating a span
+    if tracer is None:
+        yield None
+        return
+
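+    with tracer.start_as_current_span(
+        operation_name,
+        # Exceptions and ERROR status are handled explicitly below; disable the
+        # SDK's defaults so errors are not recorded twice.
+        record_exception=False,
+        set_status_on_exception=False,
+    ) as span: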
+        # Set initial attributes
+        if attributes:
+            for key, value in attributes.items():
+                span.set_attribute(key, value)
+
+        try:
+            yield span
+            span.set_status(Status(StatusCode.OK))
+        except Exception as e:
+            if record_exception:
+                span.record_exception(e)
+            span.set_status(Status(StatusCode.ERROR, str(e)))
+            raise
+
+
+def trace_mcp_tool(tool_name: str, tool_args: dict[str, Any] | None = None):
+    """
+    Create a span for an MCP tool invocation.
+
+    Usage:
+        with trace_mcp_tool("nc_notes_create_note", {"title": "My Note"}):
+            # Tool implementation
+            pass
+
+    Args:
+        tool_name: Name of the MCP tool
+        tool_args: Optional tool arguments (sensitive data will be sanitized)
+
+    Returns:
+        Context manager for the span
+    """
+    attributes = {
+        "mcp.tool.name": tool_name,
+    }
+
+    # Add sanitized tool args (avoid logging sensitive data)
+    if tool_args:
+        # Only include non-sensitive arguments
+        safe_args = {
+            k: v
+            for k, v in tool_args.items()
+            if k not in ("password", "token", "secret", "api_key", "etag")
+        }
+        if safe_args:
+            attributes["mcp.tool.args"] = str(safe_args)
+
+    return trace_operation(f"mcp.tool.{tool_name}", attributes)
+
+
+def trace_nextcloud_api_call(
+    app: str,
+    method: str,
+    path: str | None = None,
+):
+    """
+    Create a span for a Nextcloud API call.
+
+    Usage:
+        with trace_nextcloud_api_call("notes", "POST", "/apps/notes/api/v1/notes"):
+            # API call implementation
+            pass
+
+    Args:
+        app: Nextcloud app name (notes, calendar, contacts, etc.)
+        method: HTTP method (GET, POST, PUT, DELETE, etc.)
+        path: Optional API path
+
+    Returns:
+        Context manager for the span
+    """
+    attributes = {
+        "nextcloud.app": app,
+        "http.method": method,
+    }
+
+    if path:
+        attributes["http.path"] = path
+
+    return trace_operation(f"nextcloud.api.{app}.{method}", attributes)
+
+
+def trace_oauth_operation(operation: str, details: dict[str, Any] | None = None):
+    """
+    Create a span for an OAuth operation.
+
+    Usage:
+        with trace_oauth_operation("token.validate", {"method": "jwt"}):
+            # OAuth validation logic
+            pass
+
+    Args:
+        operation: OAuth operation name (e.g., "token.validate", "token.exchange")
+        details: Optional operation details (sensitive data will be sanitized)
+
+    Returns:
+        Context manager for the span
+    """
+    attributes = {"oauth.operation": operation}
+
+    if details:
+        # Only include non-sensitive details
+        safe_details = {
+            k: v
+            for k, v in details.items()
+            if k not in ("token", "refresh_token", "access_token", "client_secret")
+        }
+        if safe_details:
+            attributes.update(safe_details)
+
+    return trace_operation(f"oauth.{operation}", attributes)
+
+
+def trace_vector_sync_operation(
+    operation: str,
+    document_count: int | None = None,
+):
+    """
+    Create a span for a vector sync operation.
+
+    Usage:
+        with trace_vector_sync_operation("scan", document_count=10):
+            # Vector sync logic
+            pass
+
+    Args:
+        operation: Operation name (scan, process, embed, upsert)
+        document_count: Optional number of documents being processed
+
+    Returns:
+        Context manager for the span
+    """
+    attributes = {"vector_sync.operation": operation}
+
+    if document_count is not None:
+        attributes["vector_sync.document_count"] = document_count
+
+    return trace_operation(f"vector_sync.{operation}", attributes)
+
+
+def trace_db_operation(
+    db: str,
+    operation: str,
+    table: str | None = None,
+):
+    """
+    Create a span for a database operation.
+
+    Usage:
+        with trace_db_operation("sqlite", "insert", "refresh_tokens"):
+            # Database operation
+            pass
+
+    Args:
+        db: Database type (sqlite, qdrant)
+        operation: Operation type (insert, select, update, delete, upsert, search)
+        table: Optional table/collection name
+
+    Returns:
+        Context manager for the span
+    """
+    attributes = {
+        "db.system": db,
+        "db.operation": operation,
+    }
+
+    if table:
+        attributes["db.table"] = table
+
+    return trace_operation(f"db.{db}.{operation}", attributes)
+
+
+def add_span_attribute(key: str, value: Any) -> None:
+    """
+    Add an attribute to the current span (if any).
+
+    Args:
+        key: Attribute key
+        value: Attribute value
+
+    Note:
+        This is a no-op if tracing is not enabled or there's no active span.
+    """
+    if _tracer is None:
+        return  # Tracing not enabled
+    span = trace.get_current_span()
+    if span.is_recording():
+        span.set_attribute(key, value)
+
+
+def add_span_event(name: str, attributes: dict[str, Any] | None = None) -> None:
+    """
+    Add an event to the current span (if any).
+
+    Args:
+        name: Event name
+        attributes: Optional event attributes
+
+    Note:
+        This is a no-op if tracing is not enabled or there's no active span.
+    """
+    if _tracer is None:
+        return  # Tracing not enabled
+    span = trace.get_current_span()
+    if span.is_recording():
+        span.add_event(name, attributes=attributes or {})
+
+
+def get_trace_context() -> dict[str, str]:
+    """
+    Get current trace context as a dictionary.
+
+    Returns:
+        Dictionary with trace_id and span_id (or empty dict if tracing disabled or no active span)
+    """
+    if _tracer is None:
+        return {}  # Tracing not enabled
+
+    span = trace.get_current_span()
+    if span.is_recording():
+        span_context = span.get_span_context()
+        return {
+            "trace_id": format(span_context.trace_id, "032x"),
+            "span_id": format(span_context.span_id, "016x"),
+        }
+    return {}
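The enabled/disabled split in `trace_operation` can be illustrated without the OpenTelemetry SDK. The sketch below uses a hypothetical `FakeSpan` dataclass and a hypothetical `optional_span` helper (neither is part of this codebase or of OpenTelemetry) to show the intended contract: yield `None` when tracing is off, mark the span OK on success, and on failure record the exception once, mark the span ERROR, and re-raise.

```python
from __future__ import annotations

from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Iterator, Optional


@dataclass
class FakeSpan:
    """Hypothetical stand-in for an OpenTelemetry span (illustration only)."""

    name: str
    attributes: dict[str, Any] = field(default_factory=dict)
    status: str = "UNSET"
    exceptions: list[BaseException] = field(default_factory=list)


@contextmanager
def optional_span(
    name: str, enabled: bool, attributes: Optional[dict[str, Any]] = None
) -> Iterator[Optional[FakeSpan]]:
    # Tracing disabled: yield None so callers can guard with `if span:`
    if not enabled:
        yield None
        return
    span = FakeSpan(name, dict(attributes or {}))
    try:
        yield span
        span.status = "OK"
    except Exception as exc:
        # Record the exception exactly once, mark the span ERROR, re-raise
        span.exceptions.append(exc)
        span.status = "ERROR"
        raise
```

Because the disabled path yields `None` rather than a dummy object, call sites that want to add attributes must check the yielded value first, which is the same obligation `trace_operation` places on its callers.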
@@ -68,17 +68,25 @@ def configure_semantic_tools(mcp: FastMCP):
        client = await get_client(ctx)
        username = client.username

+        logger.info(
+            f"Semantic search: query='{query}', user={username}, "
+            f"limit={limit}, score_threshold={score_threshold}"
+        )
+
        try:
            # Generate embedding for query
            embedding_service = get_embedding_service()
            query_embedding = await embedding_service.embed(query)
+            logger.debug(
+                f"Generated embedding for query (dimension={len(query_embedding)})"
+            )

            # Search Qdrant with user filtering
            # Note: Currently only searching notes (doc_type="note")
            # Future: Remove doc_type filter to search all apps
            qdrant_client = await get_qdrant_client()
            search_response = await qdrant_client.query_points(
-                collection_name=settings.qdrant_collection,
+                collection_name=settings.get_collection_name(),
                query=query_embedding,
                query_filter=Filter(
                    must=[
@@ -98,6 +106,15 @@ def configure_semantic_tools(mcp: FastMCP):
                with_vectors=False,  # Don't return vectors to save bandwidth
            )

+            logger.info(
+                f"Qdrant returned {len(search_response.points)} results "
+                f"(before deduplication and access verification)"
+            )
+            if search_response.points:
+                # Log top 3 scores to help with threshold tuning
+                top_scores = [p.score for p in search_response.points[:3]]
+                logger.debug(f"Top 3 similarity scores: {top_scores}")
+
            # Deduplicate by document ID (multiple chunks per document)
            seen_doc_ids = set()
            results = []
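Because every document is stored as several chunk points, the raw Qdrant hits can contain the same document more than once. A minimal sketch of the deduplication step, assuming hits arrive sorted by descending score (as a cosine-distance query returns them) so the first hit per document is its best chunk; `ChunkHit` and `dedupe_by_document` are hypothetical names, and the `limit` cap is an illustrative assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ChunkHit:
    """Hypothetical search hit: one vector point per document chunk."""

    doc_id: str
    score: float


def dedupe_by_document(hits: list[ChunkHit], limit: int) -> list[ChunkHit]:
    # Assumes `hits` is sorted by score, highest first, so the first chunk
    # seen for a document is that document's best-scoring chunk.
    seen: set[str] = set()
    results: list[ChunkHit] = []
    for hit in hits:
        if hit.doc_id in seen:
            continue
        seen.add(hit.doc_id)
        results.append(hit)
        if len(results) >= limit:
            break
    return results
```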
@@ -137,9 +154,14 @@ def configure_semantic_tools(mcp: FastMCP):
                    except HTTPStatusError as e:
                        if e.response.status_code == 403:
                            # User lost access, skip this document
+                            logger.debug(f"Skipping note {doc_id}: access denied (403)")
                            continue
                        elif e.response.status_code == 404:
                            # Document was deleted but not yet removed from vector DB
+                            logger.debug(
+                                f"Skipping note {doc_id}: not found (404), "
+                                f"likely deleted after indexing"
+                            )
                            continue
                        else:
                            # Log other errors but continue processing
@@ -148,6 +170,16 @@ def configure_semantic_tools(mcp: FastMCP):
                            )
                            continue

+            logger.info(
+                f"Returning {len(results)} results after deduplication and access verification"
+            )
+            if results:
+                result_details = [
+                    f"note_{r.id} (score={r.score:.3f}, title='{r.title}')"
+                    for r in results[:5]  # Show top 5
+                ]
+                logger.debug(f"Top results: {', '.join(result_details)}")
+
            return SemanticSearchResponse(
                results=results,
                query=query,
@@ -259,7 +291,47 @@ def configure_semantic_tools(mcp: FastMCP):
                success=True,
            )

-        # 3. Construct context from retrieved documents
+        # 3. Check if client supports sampling
+        from mcp.types import ClientCapabilities, SamplingCapability
+
+        client_has_sampling = ctx.session.check_client_capability(
+            ClientCapabilities(sampling=SamplingCapability())
+        )
+
+        # Log capability check result for debugging
+        logger.info(
+            f"Sampling capability check: client_has_sampling={client_has_sampling}, "
+            f"query='{query}'"
+        )
+        if hasattr(ctx.session, "_client_params") and ctx.session._client_params:
+            client_caps = ctx.session._client_params.capabilities
+            logger.debug(
+                f"Client advertised capabilities: "
+                f"roots={client_caps.roots is not None}, "
+                f"sampling={client_caps.sampling is not None}, "
+                f"experimental={client_caps.experimental is not None}"
+            )
+
+        if not client_has_sampling:
+            logger.info(
+                f"Client does not support sampling (query: '{query}'), "
+                f"returning {len(search_response.results)} documents"
+            )
+            return SamplingSearchResponse(
+                query=query,
+                generated_answer=(
+                    f"[Sampling not supported by client]\n\n"
+                    f"Your MCP client doesn't support answer generation. "
+                    f"Found {search_response.total_found} relevant documents. "
+                    f"Please review the sources below."
+                ),
+                sources=search_response.results,
+                total_found=search_response.total_found,
+                search_method="semantic_sampling_unsupported",
+                success=True,
+            )
+
+        # 4. Construct context from retrieved documents
        context_parts = []
        for idx, result in enumerate(search_response.results, 1):
            context_parts.append(
@@ -273,7 +345,7 @@ def configure_semantic_tools(mcp: FastMCP):

        context = "\n".join(context_parts)

-        # 4. Construct prompt - reuse user's query, add context and instructions
+        # 5. Construct prompt - reuse user's query, add context and instructions
        prompt = (
            f"{query}\n\n"
            f"Here are relevant documents from Nextcloud (notes, calendar events, deck cards, files, contacts):\n\n"
@@ -282,31 +354,35 @@ def configure_semantic_tools(mcp: FastMCP):
            f"Cite the document numbers when referencing specific information."
        )

-        logger.debug(
-            f"Requesting sampling for query: {query} "
-            f"({len(search_response.results)} documents retrieved)"
+        logger.info(
+            f"Initiating sampling request: query_length={len(query)}, "
+            f"documents={len(search_response.results)}, "
+            f"prompt_length={len(prompt)}, max_tokens={max_answer_tokens}"
        )

-        # 5. Request LLM completion via MCP sampling
-        try:
-            sampling_result = await ctx.session.create_message(
-                messages=[
-                    SamplingMessage(
-                        role="user",
-                        content=TextContent(type="text", text=prompt),
-                    )
-                ],
-                max_tokens=max_answer_tokens,
-                temperature=0.7,
-                model_preferences=ModelPreferences(
-                    hints=[ModelHint(name="claude-3-5-sonnet")],
-                    intelligencePriority=0.8,
-                    speedPriority=0.5,
-                ),
-                include_context="thisServer",
-            )
+        # 6. Request LLM completion via MCP sampling with timeout
+        import anyio

-            # 6. Extract answer from sampling response
+        try:
+            with anyio.fail_after(30):
+                sampling_result = await ctx.session.create_message(
+                    messages=[
+                        SamplingMessage(
+                            role="user",
+                            content=TextContent(type="text", text=prompt),
+                        )
+                    ],
+                    max_tokens=max_answer_tokens,
+                    temperature=0.7,
+                    model_preferences=ModelPreferences(
+                        hints=[ModelHint(name="claude-3-5-sonnet")],
+                        intelligencePriority=0.8,
+                        speedPriority=0.5,
+                    ),
+                    include_context="thisServer",
+                )
+
+            # 7. Extract answer from sampling response
            if sampling_result.content.type == "text":
                generated_answer = sampling_result.content.text
            else:
@@ -318,7 +394,8 @@ def configure_semantic_tools(mcp: FastMCP):

            logger.info(
                f"Sampling successful: model={sampling_result.model}, "
-                f"stop_reason={sampling_result.stopReason}"
+                f"stop_reason={sampling_result.stopReason}, "
+                f"answer_length={len(generated_answer)}"
            )

            return SamplingSearchResponse(
@@ -332,23 +409,78 @@ def configure_semantic_tools(mcp: FastMCP):
                success=True,
            )

-        except Exception as e:
-            # Fallback: Return documents without generated answer
+        except TimeoutError:
            logger.warning(
-                f"Sampling failed ({type(e).__name__}: {e}), "
+                f"Sampling request timed out after 30 seconds for query: '{query}', "
                f"returning search results only"
            )
+            return SamplingSearchResponse(
+                query=query,
+                generated_answer=(
+                    f"[Sampling request timed out]\n\n"
+                    f"The answer generation took too long (>30s). "
+                    f"Found {search_response.total_found} relevant documents. "
+                    f"Please review the sources below or try a simpler query."
+                ),
+                sources=search_response.results,
+                total_found=search_response.total_found,
+                search_method="semantic_sampling_timeout",
+                success=True,
+            )
+
+        except McpError as e:
+            # Expected MCP protocol errors (user rejection, unsupported, etc.)
+            error_msg = str(e)
+
+            if "rejected" in error_msg.lower() or "denied" in error_msg.lower():
+                # User explicitly declined - this is normal, not an error
+                logger.info(f"User declined sampling request for query: '{query}'")
+                search_method = "semantic_sampling_user_declined"
+                user_message = "User declined to generate an answer"
+            elif "not supported" in error_msg.lower():
+                # Client doesn't support sampling - also normal
+                logger.info(f"Sampling not supported by client for query: '{query}'")
+                search_method = "semantic_sampling_unsupported"
+                user_message = "Sampling not supported by this client"
+            else:
+                # Other MCP protocol errors
+                logger.warning(
+                    f"MCP error during sampling for query '{query}': {error_msg}"
+                )
+                search_method = "semantic_sampling_mcp_error"
+                user_message = f"Sampling unavailable: {error_msg}"

            return SamplingSearchResponse(
                query=query,
                generated_answer=(
-                    f"[Sampling unavailable: {str(e)}]\n\n"
+                    f"[{user_message}]\n\n"
                    f"Found {search_response.total_found} relevant documents. "
                    f"Please review the sources below."
                ),
                sources=search_response.results,
                total_found=search_response.total_found,
-                search_method="semantic_sampling_fallback",
+                search_method=search_method,
+                success=True,
+            )
+
+        except Exception as e:
+            # Truly unexpected errors - these SHOULD have tracebacks
+            logger.error(
+                f"Unexpected error during sampling for query '{query}': "
+                f"{type(e).__name__}: {e}",
+                exc_info=True,
+            )
+
+            return SamplingSearchResponse(
+                query=query,
+                generated_answer=(
+                    f"[Unexpected error during sampling]\n\n"
+                    f"Found {search_response.total_found} relevant documents. "
+                    f"Please review the sources below."
+                ),
+                sources=search_response.results,
+                total_found=search_response.total_found,
+                search_method="semantic_sampling_error",
                success=True,
            )
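The sampling path above bounds the request with `anyio.fail_after(30)` and turns timeouts and errors into a sources-only response instead of failing the tool call. The sketch below shows the same degrade-gracefully pattern using the standard library's `asyncio.wait_for` as a swapped-in stand-in for anyio; `answer_with_fallback` is a hypothetical helper, the `"semantic_sampling"` success label is an assumption, and the separate `McpError` classification is omitted.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable


async def answer_with_fallback(
    generate: Callable[[], Awaitable[str]], timeout_s: float
) -> tuple[str, str]:
    """Return (answer, search_method) without raising on timeouts or failures."""
    try:
        answer = await asyncio.wait_for(generate(), timeout=timeout_s)
        return answer, "semantic_sampling"  # hypothetical success label
    except asyncio.TimeoutError:
        # wait_for cancels the pending call; degrade to a sources-only reply
        return "[Sampling request timed out]", "semantic_sampling_timeout"
    except Exception as exc:
        # Unexpected failure: still degrade gracefully instead of raising
        return f"[Unexpected error: {type(exc).__name__}]", "semantic_sampling_error"
```

Returning a distinct `search_method` per failure mode keeps the response successful for the client while still letting logs and callers tell a timeout apart from a genuine error.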

@@ -413,7 +545,7 @@ def configure_semantic_tools(mcp: FastMCP):

                # Count documents in collection
                count_result = await qdrant_client.count(
-                    collection_name=settings.qdrant_collection
+                    collection_name=settings.get_collection_name()
                )
                indexed_count = count_result.count

@@ -100,7 +100,7 @@ async def process_document(doc_task: DocumentTask, nc_client: NextcloudClient):
    # Handle deletion
    if doc_task.operation == "delete":
        await qdrant_client.delete(
-            collection_name=settings.qdrant_collection,
+            collection_name=settings.get_collection_name(),
            points_selector=Filter(
                must=[
                    FieldCondition(
@@ -170,8 +170,11 @@ async def _index_document(
    else:
        raise ValueError(f"Unsupported doc_type: {doc_task.doc_type}")

-    # Tokenize and chunk
-    chunker = DocumentChunker(chunk_size=512, overlap=50)
+    # Tokenize and chunk (using configured chunk size and overlap)
+    chunker = DocumentChunker(
+        chunk_size=settings.document_chunk_size,
+        overlap=settings.document_chunk_overlap,
+    )
    chunks = chunker.chunk_text(content)

    # Generate embeddings (I/O bound - external API call)
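Chunk size and overlap are now read from settings instead of being hard-coded to 512/50. The internals of `DocumentChunker` are not shown here, so the following is only a generic sliding-window sketch of what the two parameters mean, assuming both are measured in tokens; `chunk_tokens` is a hypothetical function and a pre-split token list stands in for real tokenization.

```python
from __future__ import annotations


def chunk_tokens(tokens: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Split tokens into windows of `chunk_size`, each sharing `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start : start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last window already reaches the end
    return chunks
```

An overlap equal to or larger than the chunk size would never advance the window, which is why a sketch like this rejects it up front; a configurable chunker likely needs the same guard on its settings.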
@@ -209,7 +212,7 @@ async def _index_document(

    # Upsert to Qdrant
    await qdrant_client.upsert(
-        collection_name=settings.qdrant_collection,
+        collection_name=settings.get_collection_name(),
        points=points,
        wait=True,
    )
@@ -59,30 +59,57 @@ async def get_qdrant_client() -> AsyncQdrantClient:
            logger.warning("No Qdrant mode configured, defaulting to :memory:")
            _qdrant_client = AsyncQdrantClient(":memory:")

-        # Ensure collection exists
-        collection_name = settings.qdrant_collection
+        # Get collection name (auto-generated from deployment ID + model)
+        collection_name = settings.get_collection_name()

        # Import here to avoid circular dependency
        from nextcloud_mcp_server.embedding import get_embedding_service

        embedding_service = get_embedding_service()
-        dimension = embedding_service.get_dimension()
+        expected_dimension = embedding_service.get_dimension()

        try:
-            await _qdrant_client.get_collection(collection_name)
-            logger.info(f"Using existing Qdrant collection: {collection_name}")
-        except Exception:
-            # Collection doesn't exist, create it
+            # Get existing collection
+            collection_info = await _qdrant_client.get_collection(collection_name)
+            actual_dimension = collection_info.config.params.vectors.size
+
+            # Validate dimension matches
+            if actual_dimension != expected_dimension:
+                raise ValueError(
+                    f"Dimension mismatch for collection '{collection_name}':\n"
+                    f"  Expected: {expected_dimension} (from embedding model '{settings.ollama_embedding_model}')\n"
+                    f"  Found: {actual_dimension}\n"
+                    f"This usually means you changed the embedding model.\n"
+                    f"Solutions:\n"
+                    f"  1. Delete the old collection so it is recreated with the new dimension\n"
+                    f"  2. Set QDRANT_COLLECTION to use a different collection name\n"
+                    f"  3. Revert OLLAMA_EMBEDDING_MODEL to the original model"
+                )
+
+            logger.info(
+                f"Using existing Qdrant collection: {collection_name} "
+                f"(dimension={actual_dimension}, model={settings.ollama_embedding_model})"
+            )
+
+        except Exception as e:
+            # Check if it's a dimension mismatch error (re-raise it)
+            if isinstance(e, ValueError) and "Dimension mismatch" in str(e):
+                raise
+
+            # Collection doesn't exist or other error, create it
            await _qdrant_client.create_collection(
                collection_name=collection_name,
                vectors_config=VectorParams(
-                    size=dimension,
+                    size=expected_dimension,
                    distance=Distance.COSINE,
                ),
            )
            logger.info(
-                f"Created Qdrant collection: {collection_name} "
-                f"(dimension={dimension}, distance=COSINE)"
+                f"Created Qdrant collection: {collection_name}\n"
+                f"  Dimension: {expected_dimension}\n"
+                f"  Model: {settings.ollama_embedding_model}\n"
+                f"  Distance: COSINE\n"
+                f"Background sync will index all documents with this embedding model."
            )

    return _qdrant_client
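The startup check above either reuses a collection whose vector size matches the embedding model or creates a new one, and refuses to continue on a mismatch. A minimal get-or-create sketch of that decision, using a plain dict of collection name to dimension as a hypothetical stand-in for Qdrant; `ensure_collection` is not part of this codebase.

```python
from __future__ import annotations


def ensure_collection(store: dict[str, int], name: str, expected_dim: int) -> str:
    """Return 'created' or 'existing'; raise ValueError on a dimension mismatch."""
    actual = store.get(name)
    if actual is None:
        # Collection is missing: create it with the model's dimension
        store[name] = expected_dim
        return "created"
    if actual != expected_dim:
        # Reusing it would mix incompatible vectors, so fail loudly instead
        raise ValueError(
            f"Dimension mismatch for collection '{name}': "
            f"expected {expected_dim}, found {actual}"
        )
    return "existing"
```

Checking for absence explicitly, rather than treating any lookup failure as "missing", is what keeps a mismatch from being silently converted into a create.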
@@ -34,6 +34,57 @@ class DocumentTask:
 _potentially_deleted: dict[tuple[str, str], float] = {}


+async def get_last_indexed_timestamp(user_id: str) -> int | None:
+    """Get the most recent indexed_at timestamp for user's notes in Qdrant.
+
+    This timestamp can be passed as the pruneBefore parameter when fetching notes
+    to reduce data transfer: only notes modified after this timestamp are returned
+    with full data.
+
+    Args:
+        user_id: User to query
+
+    Returns:
+        Unix timestamp of most recently indexed note, or None if no notes indexed yet
+    """
+    try:
+        qdrant_client = await get_qdrant_client()
+
+        # Scroll the user's note chunks; scroll is not ordered by indexed_at, so take the max below
+        scroll_result = await qdrant_client.scroll(
+            collection_name=get_settings().get_collection_name(),
+            scroll_filter=Filter(
+                must=[
+                    FieldCondition(key="user_id", match=MatchValue(value=user_id)),
+                    FieldCondition(key="doc_type", match=MatchValue(value="note")),
+                ]
+            ),
+            with_payload=["indexed_at"],
+            with_vectors=False,
+            limit=10000,  # Single page only: chunks beyond this limit are not considered
+        )
+
+        # Find max indexed_at across all results
+        num_points = len(scroll_result[0])
+        logger.info(f"Found {num_points} indexed chunks in Qdrant for user {user_id}")
+
+        if scroll_result[0]:
+            timestamps = [
+                point.payload.get("indexed_at", 0) for point in scroll_result[0]
+            ]
+            max_timestamp = max(timestamps)
+            logger.info(
+                f"Max indexed_at: {max_timestamp}, timestamps sample: {timestamps[:3]}"
+            )
+            return int(max_timestamp) if max_timestamp > 0 else None
+
+        logger.info(f"No indexed notes found for user {user_id}")
+        return None
+    except Exception as e:
+        logger.warning(f"Failed to get last indexed timestamp: {e}", exc_info=True)
+        return None
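`get_last_indexed_timestamp` reads a single scroll page, so for users with more than 10000 chunk points the maximum can be computed from an incomplete set. Qdrant's `scroll` returns a `(records, next_page_offset)` tuple, and following the offset until it is `None` covers every point. The sketch below shows that loop with a hypothetical `scroll` callable standing in for `AsyncQdrantClient.scroll`, where each "point" is reduced to its payload dict; `max_indexed_at` is not part of this codebase.

```python
from __future__ import annotations

from typing import Any, Callable, Optional


def max_indexed_at(
    scroll: Callable[[Optional[Any]], tuple[list[dict[str, Any]], Optional[Any]]],
) -> Optional[int]:
    """Follow next_page_offset until exhausted and return the max indexed_at."""
    best = 0
    offset: Optional[Any] = None
    while True:
        payloads, offset = scroll(offset)  # (page of points, next offset)
        for payload in payloads:
            best = max(best, int(payload.get("indexed_at", 0)))
        if offset is None:
            break  # no more pages
    return best if best > 0 else None
```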
+
+
 async def scanner_task(
    send_stream: MemoryObjectSendStream[DocumentTask],
    shutdown_event: anyio.Event,
@@ -96,22 +147,38 @@ async def scan_user_documents(
        nc_client: Authenticated Nextcloud client
        initial_sync: If True, send all documents (first-time sync)
    """
-    logger.info(f"Scanning documents for user: {user_id}")
+    import random
+
+    scan_id = random.randint(1000, 9999)
+    logger.info(
+        f"[SCAN-{scan_id}] Starting scan for user: {user_id}, initial_sync={initial_sync}"
+    )
+
+    # Calculate prune timestamp for optimized data transfer
+    # Only notes modified after this will be sent with full data
+    prune_before = None if initial_sync else await get_last_indexed_timestamp(user_id)
+    if prune_before:
+        logger.info(
+            f"[SCAN-{scan_id}] Using pruneBefore={prune_before} to optimize data transfer"
+        )

    # Fetch all notes from Nextcloud
-    notes = [note async for note in nc_client.notes.get_all_notes()]
-    logger.debug(f"Found {len(notes)} notes for {user_id}")
+    notes = [
+        note async for note in nc_client.notes.get_all_notes(prune_before=prune_before)
+    ]
+    logger.info(f"[SCAN-{scan_id}] Found {len(notes)} notes for {user_id}")

    if initial_sync:
        # Send everything on first sync
        for note in notes:
+            modified_at = note.get("modified", 0)
            await send_stream.send(
                DocumentTask(
                    user_id=user_id,
                    doc_id=str(note["id"]),
                    doc_type="note",
                    operation="index",
-                    modified_at=note["modified"],
+                    modified_at=modified_at,
                )
            )
        logger.info(f"Sent {len(notes)} documents for initial sync: {user_id}")
@@ -120,7 +187,7 @@ async def scan_user_documents(
    # Get indexed state from Qdrant
    qdrant_client = await get_qdrant_client()
    scroll_result = await qdrant_client.scroll(
-        collection_name=get_settings().qdrant_collection,
+        collection_name=get_settings().get_collection_name(),
        scroll_filter=Filter(
            must=[
                FieldCondition(key="user_id", match=MatchValue(value=user_id)),
@@ -146,6 +213,7 @@ async def scan_user_documents(
    for note in notes:
        doc_id = str(note["id"])
        indexed_at = indexed_docs.get(doc_id)
+        modified_at = note.get("modified", 0)

        # If document reappeared, remove from potentially_deleted
        doc_key = (user_id, doc_id)
@@ -156,14 +224,14 @@ async def scan_user_documents(
            del _potentially_deleted[doc_key]

        # Send if never indexed or modified since last index
-        if indexed_at is None or note["modified"] > indexed_at:
+        if indexed_at is None or modified_at > indexed_at:
            await send_stream.send(
                DocumentTask(
                    user_id=user_id,
                    doc_id=doc_id,
                    doc_type="note",
                    operation="index",
-                    modified_at=note["modified"],
+                    modified_at=modified_at,
                )
            )
            queued += 1
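With `pruneBefore`, the Notes API returns only the `id` for notes unchanged since the timestamp, which is why the scanner now reads `note.get("modified", 0)` instead of `note["modified"]`. A pruned note then compares as "not newer" than its index time and is skipped, while a note that was never indexed is still queued. A minimal sketch of that decision, where `notes_to_index` is a hypothetical helper and `indexed` maps document IDs to their `indexed_at` values:

```python
from __future__ import annotations

from typing import Any


def notes_to_index(notes: list[dict[str, Any]], indexed: dict[str, int]) -> list[str]:
    """Return IDs of notes that were never indexed or changed since indexing."""
    queued: list[str] = []
    for note in notes:
        doc_id = str(note["id"])
        modified_at = note.get("modified", 0)  # pruned notes carry only "id"
        indexed_at = indexed.get(doc_id)
        if indexed_at is None or modified_at > indexed_at:
            queued.append(doc_id)
    return queued
```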
@@ -1,6 +1,6 @@
 [project]
 name = "nextcloud-mcp-server"
-version = "0.26.1"
+version = "0.31.0"
 description = "Model Context Protocol (MCP) server for Nextcloud integration - enables AI assistants to interact with Nextcloud data"
 authors = [
    {name = "Chris Coutinho", email = "chris@coutinho.io"}
@@ -22,6 +22,15 @@ dependencies = [
    "aiosqlite>=0.20.0", # Async SQLite for refresh token storage
    "authlib>=1.6.5",
    "qdrant-client>=1.7.0",
+    # Observability dependencies
+    "prometheus-client>=0.21.0",  # Prometheus metrics
+    "opentelemetry-api>=1.28.2",  # OpenTelemetry API
+    "opentelemetry-sdk>=1.28.2",  # OpenTelemetry SDK
+    "opentelemetry-instrumentation-asgi>=0.49b2",  # Auto-instrument ASGI/Starlette
+    "opentelemetry-instrumentation-httpx>=0.49b2",  # Auto-instrument httpx client
+    "opentelemetry-instrumentation-logging>=0.49b2",  # Logging integration
+    "opentelemetry-exporter-otlp-proto-grpc>=1.28.2",  # OTLP gRPC exporter
+    "python-json-logger>=3.2.0",  # Structured JSON logging
 ]
 classifiers = [
    "Development Status :: 4 - Beta",
@@ -239,23 +239,46 @@ async def test_attachments_category_change_handling(nc_client: NextcloudClient):
        assert retrieved_content1 == attachment_content
        logger.info("Attachment retrieved successfully from initial category.")

-        # 4. Update note category
+        # 4. Update note category (with retry for ETag conflicts from background scanner)
        logger.info(
            f"Updating note {note_id} category from '{initial_category}' to '{new_category}'"
        )
-        # Need to fetch the latest etag after attachment add (WebDAV ops don't update note etag)
-        current_note_data = await nc_client.notes.get_note(note_id=note_id)
-        current_etag = current_note_data["etag"]
-        updated_note = await nc_client.notes.update(
-            note_id=note_id,
-            etag=current_etag,
-            category=new_category,
-            title=note_title,
-            content="Updated content",  # Pass required fields
-        )
-        etag3 = updated_note["etag"]
-        assert updated_note["category"] == new_category
-        logger.info(f"Note category updated successfully. New Etag: {etag3}")
+        # Retry logic for 412 Precondition Failed (ETag conflict)
+        # This can happen if the background vector scanner touches the note
+        max_update_attempts = 3
+        for attempt in range(max_update_attempts):
+            try:
+                # Fetch the latest etag
+                current_note_data = await nc_client.notes.get_note(note_id=note_id)
+                current_etag = current_note_data["etag"]
+                logger.info(
+                    f"Update attempt {attempt + 1}/{max_update_attempts}, current etag: {current_etag}"
+                )
+
+                updated_note = await nc_client.notes.update(
+                    note_id=note_id,
+                    etag=current_etag,
+                    category=new_category,
+                    title=note_title,
+                    content="Updated content",  # Pass required fields
+                )
+                etag3 = updated_note["etag"]
+                assert updated_note["category"] == new_category
+                logger.info(f"Note category updated successfully. New Etag: {etag3}")
+                break  # Success, exit retry loop
+
+            except HTTPStatusError as e:
+                if e.response.status_code == 412 and attempt < max_update_attempts - 1:
+                    # ETag conflict (likely from background scanner), retry
+                    logger.warning(
+                        f"ETag conflict (412) on attempt {attempt + 1}, retrying..."
+                    )
+                    time.sleep(1)  # Brief delay before retry
+                    continue
+                else:
+                    # Not a 412 or out of retries, re-raise
+                    raise
+
        time.sleep(1)

        # 5. Verify attachment retrieval from *new* category (passing new category)
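The ETag retry loop in the hunk above could be factored into a reusable helper. This is a minimal sketch, not project code. It assumes the conflict surfaces as an exception the caller can recognise; with httpx that would be an `HTTPStatusError` whose `response.status_code` is 412. `retry_on_conflict` and its parameters are hypothetical names. The helper uses `await asyncio.sleep` instead of `time.sleep`, so the pause does not block the event loop.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_on_conflict(
    operation: Callable[[], Awaitable[T]],
    is_conflict: Callable[[BaseException], bool],
    max_attempts: int = 3,
    delay_s: float = 1.0,
) -> T:
    """Await `operation()`, retrying only when `is_conflict(exc)` is true.

    Hypothetical helper. `operation` should re-fetch the current ETag itself,
    so each attempt sends a fresh precondition instead of replaying a stale one.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except Exception as exc:
            # Re-raise non-conflict errors, and the conflict from the final attempt
            if not is_conflict(exc) or attempt == max_attempts:
                raise
            # Non-blocking pause, so other tasks on the event loop keep running
            await asyncio.sleep(delay_s)
    raise AssertionError("unreachable")  # the loop always returns or raises


# Example predicate, assuming the client raises httpx.HTTPStatusError:
#   lambda e: isinstance(e, httpx.HTTPStatusError) and e.response.status_code == 412
```

In the test, `operation` would be a small coroutine that calls `nc_client.notes.get_note(...)` to get the current ETag and then passes it to `nc_client.notes.update(...)`. Every retry then re-reads the ETag, just as the inline loop does.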
@@ -146,12 +146,23 @@ Avoid blocking operations in async code.""",
    assert "search_method" in result

    # For this test, sampling might fail (no real LLM client)
-    # So we check for either success or fallback
-    if "[Sampling unavailable" in result["generated_answer"]:
-        # Fallback mode - should still have sources
-        assert result["search_method"] == "semantic_sampling_fallback"
+    # So we check for either success or various fallback states
+    unsupported_methods = {
+        "semantic_sampling_unsupported",
+        "semantic_sampling_user_declined",
+        "semantic_sampling_timeout",
+        "semantic_sampling_mcp_error",
+        "semantic_sampling_fallback",
+    }
+
+    if result["search_method"] in unsupported_methods:
+        # Fallback/unsupported mode - should still have sources
        assert len(result["sources"]) > 0
-        pytest.skip("Sampling not supported by test client (expected fallback)")
+        assert result["total_found"] > 0
+        pytest.skip(
+            f"Sampling not available (method: {result['search_method']}), "
+            "but search results returned successfully"
+        )
    else:
        # Successful sampling
        assert result["search_method"] == "semantic_sampling"
@@ -151,3 +151,111 @@ class TestGetSettings:
        assert settings.vector_sync_scan_interval == 600
        assert settings.vector_sync_processor_workers == 5
        assert settings.vector_sync_queue_max_size == 5000
+
+
+class TestChunkConfigValidation:
+    """Test document chunking configuration validation."""
+
+    def test_default_chunk_settings(self):
+        """Test default chunk size and overlap values."""
+        settings = Settings()
+        assert settings.document_chunk_size == 512
+        assert settings.document_chunk_overlap == 50
+
+    def test_valid_chunk_settings(self):
+        """Test valid chunk size and overlap configuration."""
+        settings = Settings(
+            document_chunk_size=1024,
+            document_chunk_overlap=100,
+        )
+        assert settings.document_chunk_size == 1024
+        assert settings.document_chunk_overlap == 100
+
+    def test_overlap_greater_than_or_equal_to_chunk_size_raises_error(self):
+        """Test that overlap >= chunk size raises ValueError."""
+        with pytest.raises(
+            ValueError,
+            match="DOCUMENT_CHUNK_OVERLAP .* must be less than DOCUMENT_CHUNK_SIZE",
+        ):
+            Settings(
+                document_chunk_size=512,
+                document_chunk_overlap=512,
+            )
+
+    def test_overlap_larger_than_chunk_size_raises_error(self):
+        """Test that overlap > chunk size raises ValueError."""
+        with pytest.raises(
+            ValueError,
+            match="DOCUMENT_CHUNK_OVERLAP .* must be less than DOCUMENT_CHUNK_SIZE",
+        ):
+            Settings(
+                document_chunk_size=256,
+                document_chunk_overlap=300,
+            )
+
+    def test_negative_overlap_raises_error(self):
+        """Test that negative overlap raises ValueError."""
+        with pytest.raises(
+            ValueError,
+            match="DOCUMENT_CHUNK_OVERLAP .* cannot be negative",
+        ):
+            Settings(
+                document_chunk_size=512,
+                document_chunk_overlap=-10,
+            )
+
+    def test_small_chunk_size_warning(self, caplog):
+        """Test that chunk size < 100 triggers warning."""
+        import logging
+
+        caplog.set_level(logging.WARNING, logger="nextcloud_mcp_server.config")
+        Settings(
+            document_chunk_size=64,
+            document_chunk_overlap=10,
+        )
+        assert (
+            "DOCUMENT_CHUNK_SIZE is set to 64 words, which is quite small"
+            in caplog.text
+        )
+        assert "Consider using at least 256 words" in caplog.text
+
+    def test_reasonable_chunk_size_no_warning(self, caplog):
+        """Test that chunk size >= 100 doesn't trigger warning."""
+        import logging
+
+        caplog.set_level(logging.WARNING, logger="nextcloud_mcp_server.config")
+        Settings(
+            document_chunk_size=256,
+            document_chunk_overlap=25,
+        )
+        assert "DOCUMENT_CHUNK_SIZE" not in caplog.text
+
+    @patch.dict(
+        os.environ,
+        {
+            "DOCUMENT_CHUNK_SIZE": "1024",
+            "DOCUMENT_CHUNK_OVERLAP": "102",
+        },
+        clear=True,
+    )
+    def test_get_settings_chunk_config(self):
+        """Test get_settings() with chunk configuration."""
+        settings = get_settings()
+        assert settings.document_chunk_size == 1024
+        assert settings.document_chunk_overlap == 102
+
+    @patch.dict(
+        os.environ,
+        {
+            "DOCUMENT_CHUNK_SIZE": "256",
+            "DOCUMENT_CHUNK_OVERLAP": "256",
+        },
+        clear=True,
+    )
+    def test_get_settings_invalid_chunk_config_raises_error(self):
+        """Test get_settings() raises error for invalid chunk config."""
+        with pytest.raises(
+            ValueError,
+            match="DOCUMENT_CHUNK_OVERLAP .* must be less than DOCUMENT_CHUNK_SIZE",
+        ):
+            get_settings()
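This diff does not include the `Settings` class itself. The sketch below shows one way to implement the validation these tests describe. It uses a plain dataclass with `__post_init__` so that it stays self-contained, and `ChunkSettings` is a hypothetical stand-in for the chunking fields. The defaults (512 / 50), the error wording, the warning below 100 words, and the logger name `nextcloud_mcp_server.config` all come from the test assertions. The real class may validate differently, for example with a pydantic validator.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("nextcloud_mcp_server.config")

# Below this size a warning is logged (threshold taken from the tests above)
SMALL_CHUNK_WARNING_THRESHOLD = 100


@dataclass
class ChunkSettings:
    """Hypothetical stand-in for the chunking fields of Settings."""

    document_chunk_size: int = 512  # words per chunk
    document_chunk_overlap: int = 50  # words shared by consecutive chunks

    def __post_init__(self) -> None:
        size, overlap = self.document_chunk_size, self.document_chunk_overlap
        if overlap < 0:
            raise ValueError(f"DOCUMENT_CHUNK_OVERLAP ({overlap}) cannot be negative")
        # Overlap must leave room for new words, or chunking never advances
        if overlap >= size:
            raise ValueError(
                f"DOCUMENT_CHUNK_OVERLAP ({overlap}) must be less than "
                f"DOCUMENT_CHUNK_SIZE ({size})"
            )
        if size < SMALL_CHUNK_WARNING_THRESHOLD:
            logger.warning(
                "DOCUMENT_CHUNK_SIZE is set to %d words, which is quite small. "
                "Consider using at least 256 words.",
                size,
            )
```

Requiring overlap to be strictly smaller than size ensures each new chunk advances by at least one word (stride = size - overlap).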
@@ -0,0 +1,88 @@
+"""Unit tests for logging filters."""
+
+import logging
+
+import pytest
+
+from nextcloud_mcp_server.observability.logging_config import HealthCheckFilter
+
+
+@pytest.mark.unit
+class TestHealthCheckFilter:
+    """Tests for the HealthCheckFilter."""
+
+    def test_filters_health_live_requests(self):
+        """Test that /health/live requests are filtered out."""
+        # Create a log record that looks like a uvicorn access log for /health/live
+        record = logging.LogRecord(
+            name="uvicorn.access",
+            level=logging.INFO,
+            pathname="",
+            lineno=0,
+            msg='127.0.0.1:12345 - "GET /health/live HTTP/1.1" 200',
+            args=(),
+            exc_info=None,
+        )
+
+        filter_instance = HealthCheckFilter()
+        assert filter_instance.filter(record) is False
+
+    def test_filters_health_ready_requests(self):
+        """Test that /health/ready requests are filtered out."""
+        record = logging.LogRecord(
+            name="uvicorn.access",
+            level=logging.INFO,
+            pathname="",
+            lineno=0,
+            msg='127.0.0.1:12345 - "GET /health/ready HTTP/1.1" 200',
+            args=(),
+            exc_info=None,
+        )
+
+        filter_instance = HealthCheckFilter()
+        assert filter_instance.filter(record) is False
+
+    def test_filters_metrics_requests(self):
+        """Test that /metrics requests are filtered out."""
+        record = logging.LogRecord(
+            name="uvicorn.access",
+            level=logging.INFO,
+            pathname="",
+            lineno=0,
+            msg='127.0.0.1:12345 - "GET /metrics HTTP/1.1" 200',
+            args=(),
+            exc_info=None,
+        )
+
+        filter_instance = HealthCheckFilter()
+        assert filter_instance.filter(record) is False
+
+    def test_allows_other_requests(self):
+        """Test that non-health-check requests are not filtered."""
+        record = logging.LogRecord(
+            name="uvicorn.access",
+            level=logging.INFO,
+            pathname="",
+            lineno=0,
+            msg='127.0.0.1:12345 - "GET /mcp/messages HTTP/1.1" 200',
+            args=(),
+            exc_info=None,
+        )
+
+        filter_instance = HealthCheckFilter()
+        assert filter_instance.filter(record) is True
+
+    def test_allows_api_requests(self):
+        """Test that API requests are not filtered."""
+        record = logging.LogRecord(
+            name="uvicorn.access",
+            level=logging.INFO,
+            pathname="",
+            lineno=0,
+            msg='127.0.0.1:12345 - "POST /oauth/login HTTP/1.1" 302',
+            args=(),
+            exc_info=None,
+        )
+
+        filter_instance = HealthCheckFilter()
+        assert filter_instance.filter(record) is True
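The tests above treat `HealthCheckFilter` as a `logging.Filter` that drops access-log lines for `/health/live`, `/health/ready` and `/metrics`. The real implementation in `nextcloud_mcp_server.observability.logging_config` is not part of this diff. The sketch below is a hypothetical `HealthPathFilter` that assumes uvicorn's access-log format, in which the request path sits between the HTTP method and the `HTTP/` version, with a space on each side.

```python
import logging

# Paths whose access-log lines are dropped (taken from the tests above)
FILTERED_PATHS = ("/health/live", "/health/ready", "/metrics")


class HealthPathFilter(logging.Filter):
    """Hypothetical filter that drops access-log records for probe and metrics paths."""

    def filter(self, record: logging.LogRecord) -> bool:
        # getMessage() applies record.args, so this also works when uvicorn
        # passes the client, method and path as format arguments
        message = record.getMessage()
        # Returning False tells the logging framework to discard the record
        return not any(f" {path} " in message for path in FILTERED_PATHS)
```

You would attach it with `logging.getLogger("uvicorn.access").addFilter(HealthPathFilter())`. Because the match requires a space on both sides of the path, `/metrics` is dropped but `/metrics?x=1` is still logged. Whether that behaviour is wanted is a design choice.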