bump: version 0.34.0 → 0.34.1

2025-11-13 21:10:16 +00:00
72 changed files with 549 additions and 11035 deletions
@@ -5,4 +5,3 @@
 !uv.lock

 !nextcloud_mcp_server/**/*.py
-!nextcloud_mcp_server/**/*.html
@@ -20,7 +20,7 @@ jobs:
          fetch-depth: 0
          token: "${{ secrets.PERSONAL_ACCESS_TOKEN }}"
      - name: Create bump and changelog
-        uses: commitizen-tools/commitizen-action@9615e7be1cf341393c52e865ebbdaa0712176d81 # 0.25.0
+        uses: commitizen-tools/commitizen-action@5b0848cd060263e24602d1eba03710e056ef7711 # 0.24.0
        with:
          github_token: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          changelog_increment_filename: body.md
@@ -9,7 +9,7 @@ jobs:
  linting:
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd # v5.0.1
+      - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
      - name: Install the latest version of uv
        uses: astral-sh/setup-uv@5a7eac68fb9809dea845d802897dc5c723910fa3 # v7.1.3
      - name: Check format
@@ -27,7 +27,7 @@ jobs:
    runs-on: ubuntu-latest

    steps:
-      - uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd # v5.0.1
+      - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
        with:
          submodules: 'true'

@@ -85,4 +85,4 @@ jobs:
          NEXTCLOUD_USERNAME: "admin"
          NEXTCLOUD_PASSWORD: "admin"
        run: |
-          uv run pytest -v --log-cli-level=WARN -m unit -m smoke
+          uv run pytest -v --log-cli-level=WARN --ignore=tests/manual
@@ -13,6 +13,3 @@ docker-compose.override.yml
 # Generated by pytest used to login users
 .nextcloud_oauth_*.json
 .playwright-mcp/
-
-# RAG Evaluation
-tests/rag_evaluation/fixtures/
@@ -1,6 +1,6 @@
+[submodule "oidc"]
+	path = third_party/oidc
+	url = https://github.com/cbcoutinho/oidc
 [submodule "third_party/oidc"]
 	path = third_party/oidc
 	url = https://github.com/cbcoutinho/oidc
-[submodule "third_party/notes"]
-	path = third_party/notes
-	url = https://github.com/cbcoutinho/notes
@@ -1,115 +1,3 @@
-## v0.42.0 (2025-11-17)
-
-### Feat
-
- **viz**: Add dual-score display and improve UI controls
-
-## v0.41.0 (2025-11-17)
-
-### Feat
-
- add configurable fusion algorithms for BM25 hybrid search
- add chunk position tracking to vector indexing and search
- add vector viz template and chunk context endpoint
-
-### Fix
-
- prevent infinite loop in DocumentChunker with position tracking
- Relax SearchResult validation to support DBSF fusion scores > 1.0
-
-## v0.40.0 (2025-11-16)
-
-### Feat
-
- add unified provider architecture with Amazon Bedrock support
-
-### Fix
-
- suppress Starlette middleware type warnings in ty checker
-
-## v0.39.0 (2025-11-16)
-
-### Feat
-
- Implement BM25 hybrid search with native Qdrant RRF fusion
-
-### Fix
-
- Handle named vectors in visualization and semantic search
- Update vizApp to use bm25_hybrid algorithm and remove deprecated weights
- Update viz routes to use BM25 hybrid search after refactor
-
-## v0.38.0 (2025-11-16)
-
-### Feat
-
- add concurrent uploads and --force flag to upload command
- implement RAG evaluation framework with CLI tooling
-
-### Fix
-
- download qrels from BEIR ZIP instead of HuggingFace
-
-### Refactor
-
- migrate asyncio to anyio for consistent structured concurrency
- replace httpx client with NextcloudClient in upload command
-
-### Perf
-
- Eliminate double-fetching in semantic search sampling
- fix vector viz search performance and visual encoding
- make note deletion concurrent in upload --force
-
-## v0.37.0 (2025-11-16)
-
-### Feat
-
- Add OpenTelemetry tracing to @instrument_tool decorator
-
-## v0.36.0 (2025-11-15)
-
-### BREAKING CHANGE
-
- Search algorithms now require Qdrant to be populated.
-Vector sync must be enabled and documents indexed for search to work.
-
-### Feat
-
- Normalize hybrid search RRF scores to 0-1 range
- Enhance vector visualization UI and parallelize search verification
- Add Vector Viz tab to app home page
- Add vector visualization pane with multi-select document types
- Implement custom PCA to remove sklearn dependency
- Add multi-document Protocol with cross-app search support
- Update nc_semantic_search tool with algorithm selection
- Implement unified search algorithm module
-
-### Fix
-
- Reorder tabs and fix viz pane session access
-
-### Refactor
-
- Optimize Nextcloud access verification with centralized filtering
- Make all search algorithms query Qdrant payload, not Nextcloud
-
-### Perf
-
- Exclude vector-sync status polling from distributed tracing
-
-## v0.35.0 (2025-11-15)
-
-### Feat
-
- Enable SSE transport for mcp service and update test fixtures
-
-## v0.34.2 (2025-11-13)
-
-### Fix
-
- Use NEXTCLOUD_OIDC_CLIENT_ID/SECRET env vars consistently
-
 ## v0.34.1 (2025-11-13)

 ### Fix
@@ -5,29 +5,23 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 ## Coding Conventions

 ### async/await Patterns
-- **Use anyio for all async operations** - Provides structured concurrency
+- **Use anyio + asyncio hybrid** - Both libraries are available
  - pytest runs in `anyio` mode (`anyio_mode = "auto"` in pyproject.toml)
-  - Use `anyio.create_task_group()` for concurrent execution (NOT `asyncio.gather()`)
-  - Use `anyio.Lock()` for synchronization primitives (NOT `asyncio.Lock()`)
-  - Use `anyio.run()` for entry points (NOT `asyncio.run()`)
+  - asyncio used in auth modules (refresh_token_storage.py, token_exchange.py, token_broker.py)
+  - anyio used in calendar.py, client_registration.py, app.py
  - Prefer standard async/await syntax without explicit library imports when possible
-  - Examples: app.py, search/hybrid.py, search/verification.py, auth/token_broker.py

 ### Type Hints
 - **Use Python 3.10+ union syntax**: `str | None` instead of `Optional[str]`
 - **Use lowercase generics**: `dict[str, Any]` instead of `Dict[str, Any]`
 - **Type all function signatures** - Parameters and return types
-- **Type checker**: `ty` is configured for static type checking
-  ```bash
-  uv run ty check -- nextcloud_mcp_server
-  ```
+- **No explicit type checker configured** - Ruff handles linting only

 ### Code Quality
-- **Run ruff and ty before committing**:
+- **Run ruff before committing**:
  ```bash
  uv run ruff check
  uv run ruff format
-  uv run ty check -- nextcloud_mcp_server
  ```
 - **Ruff configuration** in pyproject.toml (extends select: ["I"] for import sorting)
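
Reviewer note: the revised guidance allows plain asyncio again in the auth modules. A minimal sketch of that style follows, assuming nothing about the repository's real code: `fetch` and the names passed to it are hypothetical stand-ins. The removed bullets would have expressed the same fan-out with `anyio.create_task_group()`.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    """Hypothetical stand-in for an I/O-bound call such as a token refresh."""
    await asyncio.sleep(delay)
    return name


async def fetch_all() -> list[str]:
    # asyncio.gather runs the awaitables concurrently and returns their
    # results in argument order, regardless of which one finishes first.
    # The anyio-only convention would use a task group here instead.
    return list(await asyncio.gather(fetch("notes", 0.02), fetch("calendar", 0.01)))


results = asyncio.run(fetch_all())
```

The result order follows the argument order, which is one reason `gather` is convenient when each result has to be matched back to its request.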

@@ -61,60 +55,8 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 - `nextcloud_mcp_server/server/` - MCP tool/resource definitions
 - `nextcloud_mcp_server/auth/` - OAuth/OIDC authentication
 - `nextcloud_mcp_server/models/` - Pydantic response models
-- `nextcloud_mcp_server/providers/` - Unified LLM provider infrastructure (embeddings + generation)
 - `tests/` - Layered test suite (unit, smoke, integration, load)

-### Provider Architecture (ADR-015)
-
-**Unified Provider System** for embeddings and text generation:
-
-**Location:** `nextcloud_mcp_server/providers/`
- `base.py` - `Provider` ABC with optional capabilities
- `registry.py` - Auto-detection and factory pattern
- `ollama.py` - Ollama provider (embeddings + generation)
- `anthropic.py` - Anthropic provider (generation only)
- `bedrock.py` - Amazon Bedrock provider (embeddings + generation)
- `simple.py` - Simple in-memory provider (embeddings only, fallback)
-
-**Usage:**
-```python
-from nextcloud_mcp_server.providers import get_provider
-
-provider = get_provider()  # Auto-detects from environment
-
-# Check capabilities
-if provider.supports_embeddings:
-    embeddings = await provider.embed_batch(texts)
-
-if provider.supports_generation:
-    text = await provider.generate("prompt", max_tokens=500)
-```
-
-**Environment Variables:**
-
-Bedrock:
- `AWS_REGION` - AWS region (e.g., "us-east-1")
- `BEDROCK_EMBEDDING_MODEL` - Embedding model ID (e.g., "amazon.titan-embed-text-v2:0")
- `BEDROCK_GENERATION_MODEL` - Generation model ID (e.g., "anthropic.claude-3-sonnet-20240229-v1:0")
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` - Optional, uses AWS credential chain
-
-Ollama:
- `OLLAMA_BASE_URL` - API URL (e.g., "http://localhost:11434")
- `OLLAMA_EMBEDDING_MODEL` - Embedding model (default: "nomic-embed-text")
- `OLLAMA_GENERATION_MODEL` - Generation model (e.g., "llama3.2:1b")
- `OLLAMA_VERIFY_SSL` - SSL verification (default: "true")
-
-Simple (fallback, no config needed):
- `SIMPLE_EMBEDDING_DIMENSION` - Dimension (default: 384)
-
-**Auto-Detection Priority:** Bedrock → Ollama → Simple
-
-**Backward Compatibility:**
- Old code using `nextcloud_mcp_server.embedding.get_embedding_service()` still works
- `EmbeddingService` now wraps `get_provider()` internally
-
-**For Details:** See `docs/ADR-015-unified-provider-architecture.md`
-
 ## Development Commands (Quick Reference)

 ### Testing
@@ -1,19 +1,15 @@
-FROM docker.io/library/python:3.12-slim-trixie@sha256:d86b4c74b936c438cd4cc3a9f7256b9a7c27ad68c7caf8c205e18d9845af0164
-
-COPY --from=ghcr.io/astral-sh/uv:0.9.10 /uv /uvx /bin/
+FROM ghcr.io/astral-sh/uv:0.9.9-python3.11-alpine@sha256:0faa7934fac1db7f5056f159c1224d144bab864fd2677a4066d25a686ae32edd

 # Install dependencies
 # 1. git (required for caldav dependency from git)
 # 2. sqlite for development with token db
-RUN apt update && apt install --no-install-recommends --no-install-suggests -y \
-    git \
-    sqlite3 && apt clean
+RUN apk add --no-cache git sqlite

 WORKDIR /app

 COPY . .

-RUN uv sync --locked --no-dev --no-editable --no-cache
+RUN uv sync --locked --no-dev --no-editable

 ENV PYTHONUNBUFFERED=1
 ENV VIRTUAL_ENV=/app/.venv
@@ -2,30 +2,4 @@

 set -euox pipefail

-echo "Installing and configuring notes app for testing..."
-
-# Check if development notes app is mounted at /opt/apps/notes
-if [ -d /opt/apps/notes ]; then
-    echo "Development notes app found at /opt/apps/notes"
-
-    # Remove any existing notes app in apps (from app store or old symlink)
-    if [ -e /var/www/html/custom_apps/notes ]; then
-        echo "Removing existing notes in apps..."
-        rm -rf /var/www/html/custom_apps/notes
-    fi
-
-    # Create symlink from apps to the mounted development version
-    # Per Nextcloud docs: apps outside server root need symlinks in server root
-    echo "Creating symlink: custom_apps/notes -> /opt/apps/notes"
-    ln -sf /opt/apps/notes /var/www/html/custom_apps/notes
-
-    echo "Enabling notes app from /opt/apps (development mode via symlink)"
-    php /var/www/html/occ app:enable notes
-elif [ -d /var/www/html/custom_apps/notes ]; then
-    echo "notes app directory found in apps (already installed)"
-    php /var/www/html/occ app:enable notes
-else
-    echo "notes app not found, installing from app store..."
-    php /var/www/html/occ app:install notes
-    php /var/www/html/occ app:enable notes
-fi
+php /var/www/html/occ app:enable notes
@@ -1,9 +1,9 @@
 dependencies:
 - name: qdrant
  repository: https://qdrant.github.io/qdrant-helm
-  version: 1.16.0
+  version: 1.15.5
 - name: ollama
  repository: https://otwld.github.io/ollama-helm
  version: 1.34.0
-digest: sha256:9dfb8d6e3d5488f669d4c37f3a766213b598ff3de2aead2c734789736c7835b4
-generated: "2025-11-17T17:08:48.055530019Z"
+digest: sha256:d51c97d05be2614b751c0dd7267ef7dc959eff5ebef859c5f895c5c554b7a874
+generated: "2025-11-09T17:08:02.86648061Z"
@@ -2,8 +2,8 @@ apiVersion: v2
 name: nextcloud-mcp-server
 description: A Helm chart for Nextcloud MCP Server - enables AI assistants to interact with Nextcloud
 type: application
-version: 0.42.0
-appVersion: "0.42.0"
+version: 0.34.1
+appVersion: "0.34.1"
 keywords:
  - nextcloud
  - mcp
@@ -27,7 +27,7 @@ annotations:
  grafana_dashboard_folder: "Nextcloud MCP"
 dependencies:
  - name: qdrant
-    version: "1.16.0"
+    version: "1.15.5"
    repository: https://qdrant.github.io/qdrant-helm
    condition: qdrant.networkMode.deploySubchart
  - name: ollama
@@ -3,7 +3,7 @@ services:
  # https://hub.docker.com/_/mariadb
  db:
    # Note: Check the recommended version here: https://docs.nextcloud.com/server/latest/admin_manual/installation/system_requirements.html#server
-    image: docker.io/library/mariadb:lts@sha256:6b848cb24fbbd87429917f6c4422ac53c343e85692eb0fef86553e99e4f422f3
+    image: docker.io/library/mariadb:lts@sha256:404ebf26ed7a56fbab05c29f6f1e70188e5eadb51bba8cee8d355775776deb08
    restart: always
    command: --transaction-isolation=READ-COMMITTED
    volumes:
@@ -69,25 +69,23 @@ services:

  mcp:
    build: .
-    restart: always
    command: ["--transport", "streamable-http"]
+    restart: always
    depends_on:
      app:
        condition: service_healthy
    ports:
      - 127.0.0.1:8000:8000
-      - 127.0.0.1:9090:9090
    volumes:
      - mcp-data:/app/data
    environment:
      - NEXTCLOUD_HOST=http://app:80
      - NEXTCLOUD_USERNAME=admin
      - NEXTCLOUD_PASSWORD=admin
-      - NEXTCLOUD_PUBLIC_ISSUER_URL=http://localhost:8080

      # Vector sync configuration (ADR-007)
      - VECTOR_SYNC_ENABLED=true
-      - VECTOR_SYNC_SCAN_INTERVAL=60
+      - VECTOR_SYNC_SCAN_INTERVAL=10
      - VECTOR_SYNC_PROCESSOR_WORKERS=1

      #- LOG_FORMAT=json
@@ -158,7 +156,7 @@ services:
      - oauth-tokens:/app/data

  keycloak:
-    image: quay.io/keycloak/keycloak:26.4.5@sha256:653852bfdea2be6e958b9e90a976eff1c6de34edd55f2f679bdc48ef16bc528e
+    image: quay.io/keycloak/keycloak:26.4.4@sha256:c6459d5fae1b759f5d667ebdc6237ab3121379c3494e213898569014ede1846d
    command:
      - "start-dev"
      - "--import-realm"
@@ -195,8 +193,8 @@ services:
      # Provider auto-detected from OIDC_DISCOVERY_URL issuer
      # Using internal Docker hostname for discovery to get consistent issuer
      - OIDC_DISCOVERY_URL=http://keycloak:8080/realms/nextcloud-mcp/.well-known/openid-configuration
-      - NEXTCLOUD_OIDC_CLIENT_ID=nextcloud-mcp-server
-      - NEXTCLOUD_OIDC_CLIENT_SECRET=mcp-secret-change-in-production
+      - OIDC_CLIENT_ID=nextcloud-mcp-server
+      - OIDC_CLIENT_SECRET=mcp-secret-change-in-production
      - OIDC_JWKS_URI=http://keycloak:8080/realms/nextcloud-mcp/protocol/openid-connect/certs

      # Nextcloud API endpoint (for accessing APIs with validated token)
@@ -225,7 +223,7 @@ services:
      - keycloak-oauth-storage:/app/.oauth

  qdrant:
-    image: qdrant/qdrant:v1.16.0@sha256:1005201498cf927d835383d0f918b17d8c9da7db58550f169f694455e42d78f4
+    image: qdrant/qdrant:v1.15.5@sha256:0fb8897412abc81d1c0430a899b9a81eb8328aa634e7242d1bc804c1fe8fe863
    restart: always
    ports:
      - 127.0.0.1:6333:6333  # REST API
@@ -1,8 +1,7 @@
 # ADR-011: Improving Semantic Search Quality Through Better Chunking and Embeddings

-**Status**: Partially Implemented (Chunking Complete, Embeddings Pending)
+**Status**: Proposed
 **Date**: 2025-11-12
-**Implementation Date**: 2025-11-18 (Chunking)
 **Authors**: Development Team
 **Related**: ADR-003 (Vector Database Architecture), ADR-008 (MCP Sampling for RAG)

@@ -894,50 +893,3 @@ This ADR addresses the root causes of poor semantic search recall:
 - No new infrastructure or ongoing costs

 **Next Steps**: Approve ADR → Implement changes → Reindex → Validate → Production rollout
-
-## Implementation Status
-
-### Completed (2025-11-18)
-
-**✅ Semantic Markdown-Aware Chunking (Option C1 + C3 Hybrid)**
-
-Implementation details:
- Replaced custom word-based chunking with `MarkdownTextSplitter` from LangChain
- Optimized for Nextcloud Notes markdown content with special handling for:
-  - Headers (`#`, `##`, `###`, etc.)
-  - Code blocks (` ``` `)
-  - Lists (`-`, `*`, `1.`)
-  - Horizontal rules (`---`)
-  - Paragraphs and sentences
- Maintained `ChunkWithPosition` interface for backward compatibility
- Updated configuration defaults:
-  - `DOCUMENT_CHUNK_SIZE`: 512 words → 2048 characters
-  - `DOCUMENT_CHUNK_OVERLAP`: 50 words → 200 characters
- Updated unit tests to verify position tracking and boundary preservation
- All tests passing with markdown-aware character-based chunking
-
-**Files Modified**:
- `nextcloud_mcp_server/vector/document_chunker.py` - LangChain integration
- `nextcloud_mcp_server/config.py` - Character-based defaults
- `tests/unit/test_document_chunker.py` - Updated test suite
-
-**Dependencies Added**:
- `langchain-text-splitters>=1.0.0` (already present in `pyproject.toml`)
-
-**Migration Required**:
- ⚠️ Full reindex required to apply new chunking strategy
- Existing documents in vector database use old word-based chunks
- See "Migration Strategy" section above for reindexing process
-
-### Pending
-
-**⏳ Embedding Model Upgrade (Option E1)**
-
-Still to be implemented:
- Switch from `nomic-embed-text` (768-dim) to `mxbai-embed-large-v1` (1024-dim)
- Implement dynamic dimension detection in `ollama_provider.py`
- Create migration script for collection reindexing
- Run benchmarking to validate improvement
- Deploy to production with atomic collection swap
-
-**Estimated Timeline**: 1-2 weeks for implementation and validation
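
Reviewer note: the removed status section describes character-based chunks of 2048 characters with a 200-character overlap and position tracking. The sketch below illustrates only that sizing and position arithmetic with a plain sliding window. It is not the markdown-aware `MarkdownTextSplitter` approach the section names, and `Chunk` and `chunk_text` are hypothetical names, not the repository's `ChunkWithPosition` interface.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical chunk record carrying character offsets into the source text."""
    text: str
    start: int
    end: int


def chunk_text(text: str, chunk_size: int = 2048, overlap: int = 200) -> list[Chunk]:
    # The overlap must be smaller than the chunk size; otherwise the window
    # would never advance and the loop below would not terminate.
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: list[Chunk] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(Chunk(text[start:end], start, end))
        if end == len(text):
            break  # final chunk reached the end of the document
        start += step
    return chunks


chunks = chunk_text("a" * 5000)
```

With the defaults, each chunk after the first starts 1848 characters after the previous one, so consecutive chunks share 200 characters.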
@@ -1,619 +0,0 @@
-# ADR-012: Unified Multi-Algorithm Search with Client-Configurable Weighting
-
-## Status
-Proposed
-
-## Context
-
-### Current State
-
-The Nextcloud MCP server currently provides semantic search via vector similarity (Qdrant), as designed in ADR-003 and implemented through ADR-007. However, users and MCP clients have limited control over search behavior:
-
-1. **Single algorithm only**: Only pure vector similarity search is available
-2. **No algorithm selection**: MCP clients cannot choose between semantic, keyword, or fuzzy approaches
-3. **No weighting control**: Clients cannot adjust the balance between different search methods
-4. **Disconnected implementations**: Viz pane uses different search algorithms than MCP tools
-5. **Limited flexibility**: No way to optimize search for different use cases (exact match vs. conceptual similarity)
-
-### User Needs
-
-Different search scenarios require different algorithms:
-
- **Exact match queries**: "Find note titled 'Q1 Budget'" → keyword search preferred
- **Conceptual queries**: "What are my goals for next quarter?" → semantic search preferred
- **Typo-tolerant queries**: "Find note about kuberntes" → fuzzy search needed
- **Balanced queries**: "Find documentation about API endpoints" → hybrid search optimal
-
-Additionally, users need a **testing interface** (viz pane) to:
- Experiment with different search algorithms on their own documents
- Visualize search results and algorithm behavior
- Tune weights for optimal results
- Understand which algorithm works best for their queries
-
-### Technical Requirements
-
-1. **Unified interface**: Single MCP tool supporting multiple algorithms
-2. **Client control**: MCP clients specify algorithm and weights via tool parameters
-3. **Backward compatibility**: Existing `nc_semantic_search()` behavior preserved
-4. **Shared implementation**: Viz pane and MCP tools use identical search algorithms
-5. **User accessibility**: Viz pane available to all logged-in users with vector sync enabled
-6. **Performance**: Minimal overhead for algorithm selection
-
-## Decision
-
-We will implement a **unified multi-algorithm search architecture** with the following components:
-
-### Architecture Diagram
-
-```
-┌─────────────────────────────────────────────────────────────────────────────┐
-│                         MCP Client / User Browser                            │
-│                                                                               │
-│  ┌──────────────────────────┐         ┌──────────────────────────────────┐  │
-│  │   MCP Tool Call          │         │   Viz Pane (Browser UI)          │  │
-│  │                          │         │                                  │  │
-│  │ nc_semantic_search(      │         │ - Algorithm selector dropdown    │  │
-│  │   query="kubernetes",    │         │ - Weight adjustment sliders      │  │
-│  │   algorithm="hybrid",    │         │ - Interactive 2D scatter plot    │  │
-│  │   semantic_weight=0.5,   │         │ - Side-by-side comparison        │  │
-│  │   keyword_weight=0.3,    │         │ - Real-time search testing       │  │
-│  │   fuzzy_weight=0.2       │         │                                  │  │
-│  │ )                        │         │                                  │  │
-│  └───────────┬──────────────┘         └────────────┬─────────────────────┘  │
-└──────────────┼─────────────────────────────────────┼────────────────────────┘
-               │                                      │
-               │ MCP Protocol                         │ HTTPS (htmx)
-               │                                      │
-┌──────────────▼──────────────────────────────────────▼────────────────────────┐
-│                        MCP Server (/app endpoint)                             │
-│                                                                               │
-│  ┌─────────────────────────────────────────────────────────────────────────┐ │
-│  │              Unified Search Interface (server/semantic.py)              │ │
-│  │                                                                         │ │
-│  │  @mcp.tool() nc_semantic_search(algorithm, weights...)                 │ │
-│  │  ├─ Validate parameters (weights sum ≤1.0)                             │ │
-│  │  ├─ Dispatch to algorithm selector                                     │ │
-│  │  └─ Return ranked SearchResponse                                       │ │
-│  └────────────────────────────┬────────────────────────────────────────────┘ │
-│                                │                                              │
-│  ┌────────────────────────────▼────────────────────────────────────────────┐ │
-│  │              Algorithm Dispatcher (search/algorithms.py)                │ │
-│  │                                                                         │ │
-│  │  if algorithm == "semantic":    → semantic.py                          │ │
-│  │  if algorithm == "keyword":     → keyword.py                           │ │
-│  │  if algorithm == "fuzzy":       → fuzzy.py                             │ │
-│  │  if algorithm == "hybrid":      → hybrid.py (RRF fusion)               │ │
-│  └─────────────────────────────────────────────────────────────────────────┘ │
-│                                                                               │
-│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐           │
-│  │  semantic.py     │  │  keyword.py      │  │  fuzzy.py        │           │
-│  │                  │  │                  │  │                  │           │
-│  │ • Query Qdrant   │  │ • Token matching │  │ • Char overlap   │           │
-│  │ • Cosine dist    │  │ • Title weight   │  │ • 70% threshold  │           │
-│  │ • Score ≥0.7     │  │ • ADR-001 logic  │  │ • Simple impl    │           │
-│  └────────┬─────────┘  └────────┬─────────┘  └────────┬─────────┘           │
-│           │                     │                      │                     │
-│           └─────────────────────┼──────────────────────┘                     │
-│                                 │                                            │
-│  ┌──────────────────────────────▼──────────────────────────────────────────┐ │
-│  │                    hybrid.py (Reciprocal Rank Fusion)                   │ │
-│  │                                                                         │ │
-│  │  1. Run algorithms in parallel (semantic, keyword, fuzzy)              │ │
-│  │  2. Collect ranked results from each                                   │ │
-│  │  3. Apply RRF formula: score = weight / (k + rank)                     │ │
-│  │  4. Combine scores across algorithms                                   │ │
-│  │  5. Re-rank by combined score                                          │ │
-│  └─────────────────────────────────────────────────────────────────────────┘ │
-└───────────────────────────────────┬───────────────────────────────────────────┘
-                                    │
-                    ┌───────────────┴───────────────┐
-                    │                               │
-         ┌──────────▼──────────┐         ┌─────────▼────────────┐
-         │ Qdrant Vector DB    │         │ Nextcloud APIs       │
-         │                     │         │                      │
-         │ • Vector search     │         │ • Access verification│
-         │ • user_id filter    │         │ • Full metadata fetch│
-         │ • Score threshold   │         │ • Permission checks  │
-         │ • 768-dim embeddings│         │                      │
-         └─────────────────────┘         └──────────────────────┘
-```
-
-### Data Flow
-
-#### MCP Tool Request
-```
-1. Client calls nc_semantic_search(query, algorithm="hybrid", weights...)
-2. Server validates parameters (weights sum ≤1.0)
-3. Dispatcher routes to hybrid.py
-4. Hybrid search runs semantic, keyword, fuzzy in parallel
-5. RRF combines results with weighted scores
-6. Access verification via Nextcloud API
-7. Return ranked SearchResponse to client
-```
-
-#### Viz Pane Request (Server-Side Processing)
-```
-1. User navigates to /app (Vector Visualization tab)
-2. Browser loads vector-viz fragment via htmx
-3. User enters query and adjusts algorithm/weights
-4. htmx sends request to /app/vector-viz endpoint
-5. Server executes search via search/algorithms.py:
-   - Filters by user_id (multi-tenant security)
-   - Applies selected algorithm (semantic/keyword/fuzzy/hybrid)
-   - Filters by document type (notes/files/calendar/contacts)
-   - Retrieves matching results + metadata
-6. Server performs PCA reduction (768-dim → 2D):
-   - Converts matching results to 2D coordinates
-   - Only sends coordinates + metadata (not full vectors)
-   - Dramatically reduces bandwidth (e.g., 768 floats → 2 floats per doc)
-7. Server returns JSON: {results: [...], coordinates_2d: [...], stats: {...}}
-8. Browser receives lightweight response
-9. Plotly.js renders interactive scatter plot
-10. Matching results highlighted (blue), non-matches grayed (40% opacity)
-```
-
-**Performance Benefits of Server-Side Processing**:
- **Bandwidth reduction**: ~384x less data (2 floats vs 768 floats per document)
- **Client efficiency**: Browser only handles visualization, not computation
- **Scalability**: Can visualize 10,000+ documents without client-side lag
- **Security**: Raw vectors never leave server
- **Consistency**: Same search logic as MCP tool (no drift)
-
-### 1. Core Search Algorithms
-
-Four search algorithms will be available:
-
-#### a) Semantic Search (Vector Similarity)
- **Method**: Cosine distance in 768-dimensional embedding space
- **Implementation**: Qdrant `query_points` with user_id filtering
- **Use case**: Conceptual queries, finding related content
- **Current status**: Implemented in `nextcloud_mcp_server/server/semantic.py`
-
-#### b) Keyword Search (Token-Based)
- **Method**: Token matching with weighted scoring (from ADR-001)
- **Implementation**: Title matches weighted 3x higher than content
- **Use case**: Exact phrase matching, known titles
- **Current status**: Designed in ADR-001, not implemented
-
-#### c) Fuzzy Search (Character Overlap)
- **Method**: Simple character-based similarity (70% threshold)
- **Implementation**: Character set comparison (current viz pane approach)
- **Use case**: Typo tolerance, approximate matching
- **Current status**: Implemented in viz pane only
-
-#### d) Hybrid Search (Multi-Algorithm Fusion)
- **Method**: Reciprocal Rank Fusion (RRF) from ADR-003
- **Implementation**: Parallel execution + score combination
- **Use case**: Balanced queries, general-purpose search
- **Current status**: Designed in ADR-003, not implemented
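
Reviewer note: the weighted Reciprocal Rank Fusion that the removed ADR describes (`score = weight / (k + rank)`, summed across algorithms) can be sketched as follows. The function name, the input shape, and the constant `k = 60` are assumptions for illustration; the ADR text shown here does not fix a value for `k`.

```python
from collections import defaultdict


def rrf_fuse(
    rankings: dict[str, list[str]],
    weights: dict[str, float],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse per-algorithm rankings (best first) into one ranked list."""
    scores: defaultdict[str, float] = defaultdict(float)
    for algorithm, ranked_ids in rankings.items():
        weight = weights.get(algorithm, 0.0)
        # Ranks are 1-based: the top hit of each algorithm contributes weight / (k + 1).
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weight / (k + rank)
    # Re-rank by the combined score, highest first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


fused = rrf_fuse(
    rankings={"semantic": ["a", "b"], "keyword": ["b", "c"], "fuzzy": ["c"]},
    weights={"semantic": 0.5, "keyword": 0.3, "fuzzy": 0.2},
)
```

In this example document `b` ranks first because two algorithms return it, even though neither the semantic nor the fuzzy ranking places it at the top.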
-
-### 2. Unified MCP Tool Interface
-
-```python
-@mcp.tool()
-@require_scopes("semantic:read")
-async def nc_semantic_search(
-    query: str,
-    ctx: Context,
-    limit: int = 10,
-    score_threshold: float = 0.7,
-    algorithm: Literal["semantic", "keyword", "fuzzy", "hybrid"] = "hybrid",
-    semantic_weight: float = 0.5,
-    keyword_weight: float = 0.3,
-    fuzzy_weight: float = 0.2,
-) -> SearchResponse:
-    """
-    Search Nextcloud content using configurable algorithms.
-
-    Args:
-        query: Natural language search query
-        ctx: MCP context for authentication
-        limit: Maximum results to return
-        score_threshold: Minimum similarity score (semantic/hybrid only)
-        algorithm: Search algorithm to use
-        semantic_weight: Weight for semantic results (hybrid only, default: 0.5)
-        keyword_weight: Weight for keyword results (hybrid only, default: 0.3)
-        fuzzy_weight: Weight for fuzzy results (hybrid only, default: 0.2)
-
-    Returns:
-        Ranked search results with scores and excerpts
-    """
-```
-
-**Key decisions**:
- **Single tool name**: Keep `nc_semantic_search` for backward compatibility
- **Algorithm parameter**: Explicit selection via enum
- **Weight parameters**: Client-configurable, only apply to hybrid mode
- **Validation**: Weights must sum to ≤1.0, enforced server-side
- **Defaults**: Hybrid mode with balanced weights (semantic 50%, keyword 30%, fuzzy 20%)
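
Reviewer note: the server-side rule "weights must sum to ≤1.0" could be enforced along these lines. This is a sketch under assumptions: `validate_weights` is a hypothetical helper, the non-negativity check is my addition, and the tolerance only guards against floating-point rounding in sums such as `0.1 + 0.2 + 0.7`.

```python
import math


def validate_weights(
    semantic: float = 0.5, keyword: float = 0.3, fuzzy: float = 0.2
) -> tuple[float, float, float]:
    weights = (semantic, keyword, fuzzy)
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    # Accept totals that exceed 1.0 only by floating-point rounding error.
    if total > 1.0 and not math.isclose(total, 1.0):
        raise ValueError(f"weights sum to {total}, expected at most 1.0")
    return weights


default_weights = validate_weights()
```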
-
-### 3. Shared Algorithm Implementation
-
-Extract search algorithms into reusable module:
-
-```
-nextcloud_mcp_server/
-├── search/
-│   ├── __init__.py
-│   ├── algorithms.py          # Core search implementations
-│   ├── semantic.py             # Vector similarity search
-│   ├── keyword.py              # Token-based search (ADR-001)
-│   ├── fuzzy.py                # Character overlap search
-│   └── hybrid.py               # RRF fusion (ADR-003)
-└── server/
-    └── semantic.py             # MCP tool wrapper
-```
-
-**Benefits**:
- Viz pane and MCP tools share identical implementations
- Testable in isolation
- Easy to add new algorithms (e.g., BM25, neural reranking)
- Clear separation of concerns
-
-### 4. Viz Pane Integration
-
-Update viz pane (`nextcloud_mcp_server/auth/userinfo_routes.py`) to:
-
-1. **Use shared algorithms**: Import from `search/algorithms.py`
-2. **Server-side filtering**: All search and filtering operations happen server-side
-   - Query execution via shared search backend
-   - Document type filtering (notes, files, calendar, contacts)
-   - User ID filtering for multi-tenant security
-   - Only matching results + metadata sent to client
-   - Reduces bandwidth and improves performance
-3. **PCA reduction**: Server performs dimensionality reduction (768-dim → 2D)
-   - Only 2D coordinates sent to browser for visualization
-   - Dramatically reduces data transfer vs sending full vectors
-   - Enables visualization of large document collections
-4. **User accessibility**: Available to all users with vector sync enabled
-5. **Security**: Filter results by `user_id` (only show user's own documents)
-6. **Interactive testing**: Allow users to:
-   - Select algorithm type
-   - Adjust weights (hybrid mode)
-   - Compare results across algorithms
-   - Visualize result distribution in 2D space
-
-#### Viz Pane UI Components
-
-```
-┌────────────────────────────────────────────────────────────────────────┐
-│ Vector Visualization                                          [Status] │
-├────────────────────────────────────────────────────────────────────────┤
-│                                                                        │
-│ ┌──────────────────────────────────────────────────────────────────┐  │
-│ │ Search Configuration                                             │  │
-│ │                                                                  │  │
-│ │ Query: [_______________________________________________] [Search]│  │
-│ │                                                                  │  │
-│ │ Algorithm: [Hybrid ▼]  [Semantic] [Keyword] [Fuzzy]             │  │
-│ │                                                                  │  │
-│ │ Weights (Hybrid Mode):                                           │  │
-│ │   Semantic: [========50========] 0.5                             │  │
-│ │   Keyword:  [======30======    ] 0.3                             │  │
-│ │   Fuzzy:    [====20====        ] 0.2                             │  │
-│ │                                                                  │  │
-│ │ Document Types: ☑ Notes  ☑ Files  ☑ Calendar  ☑ Contacts        │  │
-│ └──────────────────────────────────────────────────────────────────┘  │
-│                                                                        │
-│ ┌──────────────────────────────────────────────────────────────────┐  │
-│ │ Vector Space Visualization (PCA 2D Projection)                   │  │
-│ │                                                                  │  │
-│ │        ▲                                                         │  │
-│ │    PC2 │     ●  ● ●      🔵 Matching results (full opacity)     │  │
-│ │        │  ●     ●  ●     ⚪ Non-matching results (40% opacity)   │  │
-│ │        │    🔵  ● ●                                              │  │
-│ │        │  ●  🔵  ●       Hover: Show document title + excerpt    │  │
-│ │        │  ● ●  🔵 ●      Click: Open document in Nextcloud       │  │
-│ │    ────┼──●─🔵──●─●────► PC1                                     │  │
-│ │        │   ● ●  ●                                                │  │
-│ │        │    🔵 ●   ●     Explained Variance:                     │  │
-│ │        │  ●    ●  ●      PC1: 23.4% | PC2: 18.7%                 │  │
-│ │        │     ● ●                                                 │  │
-│ │                                                                  │  │
-│ └──────────────────────────────────────────────────────────────────┘  │
-│                                                                        │
-│ ┌──────────────────────────────────────────────────────────────────┐  │
-│ │ Search Results (12 matching documents)                           │  │
-│ │                                                                  │  │
-│ │ 🔵 Kubernetes Setup Guide                        Score: 0.87     │  │
-│ │    "...configure kubectl to connect to cluster..."              │  │
-│ │    [Open in Nextcloud]                                           │  │
-│ │                                                                  │  │
-│ │ 🔵 Container Orchestration Notes                 Score: 0.82     │  │
-│ │    "...deployment strategies for kubernetes..."                 │  │
-│ │    [Open in Nextcloud]                                           │  │
-│ │                                                                  │  │
-│ │ 🔵 K8s Troubleshooting                           Score: 0.79     │  │
-│ │    "...common kubernetes errors and solutions..."               │  │
-│ │    [Open in Nextcloud]                                           │  │
-│ │                                                                  │  │
-│ │ [Show More Results...]                                           │  │
-│ └──────────────────────────────────────────────────────────────────┘  │
-│                                                                        │
-│ ┌──────────────────────────────────────────────────────────────────┐  │
-│ │ Algorithm Performance Comparison                                 │  │
-│ │                                                                  │  │
-│ │ Algorithm    │ Results │ Avg Score │ Time (ms) │ Precision     │  │
-│ │ ─────────────┼─────────┼───────────┼───────────┼───────────     │  │
-│ │ Semantic     │   45    │   0.78    │   145ms   │  ████░ 0.82   │  │
-│ │ Keyword      │   23    │   0.91    │    42ms   │  ███░░ 0.67   │  │
-│ │ Fuzzy        │   67    │   0.72    │    89ms   │  ██░░░ 0.45   │  │
-│ │ Hybrid (RRF) │   52    │   0.84    │   198ms   │  █████ 0.89   │  │
-│ └──────────────────────────────────────────────────────────────────┘  │
-└────────────────────────────────────────────────────────────────────────┘
-```
-
-**Key UI Features**:
-
-1. **Search Input**: Real-time query testing with instant visualization
-2. **Algorithm Selector**: Dropdown + quick-select buttons
-3. **Weight Sliders**: Visual adjustment with live preview (hybrid mode only)
-4. **Document Type Filters**: Checkboxes for notes, files, calendar, contacts
-5. **2D Scatter Plot**: Interactive Plotly.js visualization
-   - Blue dots = matching documents (full opacity)
-   - Gray dots = non-matching documents (40% opacity)
-   - Hover = show title + excerpt tooltip
-   - Click = open document in Nextcloud
-   - Zoom/pan controls for exploration
-6. **Results Panel**: Ranked list with scores and excerpts
-7. **Performance Table**: Compare algorithm speed and accuracy
-8. **Explained Variance**: Show how much information PCA preserves
-
-**Technology Stack**:
- **Frontend**: htmx for dynamic loading, Alpine.js for reactivity
- **Visualization**: Plotly.js for interactive scatter plots
- **Styling**: Tailwind CSS (consistent with existing /app UI)
- **Backend**: Shared `search/algorithms.py` implementation
-
-### 5. Reciprocal Rank Fusion (RRF) for Hybrid Search
-
-Following ADR-003's design:
-
-```python
-from collections import defaultdict
-
-
-def reciprocal_rank_fusion(
-    results: dict[str, list[SearchResult]],
-    weights: dict[str, float],
-    k: int = 60
-) -> list[SearchResult]:
-    """
-    Combine multiple ranked result lists using weighted RRF.
-
-    Args:
-        results: Dict of algorithm_name -> ranked results
-        weights: Dict of algorithm_name -> weight (0-1)
-        k: RRF constant (default: 60, standard value)
-
-    Returns:
-        Combined and re-ranked results, best first
-    """
-    scores: dict[str, float] = defaultdict(float)
-    by_id: dict[str, SearchResult] = {}
-
-    for algo_name, algo_results in results.items():
-        weight = weights.get(algo_name, 0.0)
-        for rank, result in enumerate(algo_results, start=1):
-            # Weighted RRF contribution: weight / (k + rank)
-            scores[result.doc_id] += weight / (k + rank)
-            # Keep the first result object seen for each document
-            by_id.setdefault(result.doc_id, result)
-
-    # Sort document IDs by combined score and return the result objects
-    ranked_ids = sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
-    return [by_id[doc_id] for doc_id in ranked_ids]
-```
-
-**RRF properties**:
- **Rank-based**: Uses position, not raw scores (handles score scale differences)
- **Proven effective**: Standard approach in information retrieval
- **Configurable**: `k` parameter controls rank decay (default: 60)
- **Weight support**: Allows algorithm-specific importance
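
To make the rank-based arithmetic in the properties above concrete, here is a minimal worked sketch. It is not the project's implementation: the function name `weighted_rrf`, the document IDs, and the weights are made up for illustration, and results are reduced to plain ID strings.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Return (doc_id, score) pairs sorted by weighted RRF score, best first."""
    scores = defaultdict(float)
    for name, ranked_ids in rankings.items():
        weight = weights.get(name, 0.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            # Each list contributes weight / (k + rank) for the document
            scores[doc_id] += weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings from two algorithms (IDs are illustrative only)
rankings = {
    "semantic": ["note-1", "note-2", "note-3"],
    "keyword": ["note-2", "note-1"],
}
weights = {"semantic": 0.5, "keyword": 0.3}

fused = weighted_rrf(rankings, weights)
for doc_id, score in fused:
    print(f"{doc_id}: {score:.6f}")
```

With `k=60`, adjacent ranks differ only slightly in score (`1/61` versus `1/62`), so under these assumptions the final order is driven mainly by how many lists a document appears in and by the per-algorithm weights.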
-
-## Implementation Plan
-
-### Phase 1: Extract and Unify Algorithms (Week 1)
-
-1. Create `nextcloud_mcp_server/search/` module
-2. Implement `algorithms.py` with base interface
-3. Extract semantic search logic from `server/semantic.py`
-4. Implement keyword search from ADR-001 design
-5. Extract fuzzy search from viz pane
-6. Implement RRF hybrid search from ADR-003
-7. Add comprehensive unit tests for each algorithm
-
-### Phase 2: Update MCP Tool (Week 1-2)
-
-1. Add `algorithm` parameter to `nc_semantic_search()`
-2. Add weight parameters (`semantic_weight`, etc.)
-3. Implement algorithm dispatcher
-4. Add parameter validation (weights sum ≤1.0)
-5. Update response model to include algorithm metadata
-6. Maintain backward compatibility (default: hybrid)
-7. Add integration tests for all algorithm modes
-
-### Phase 3: Update Viz Pane (Week 2)
-
-**Critical: All processing must happen server-side**
-
-1. **Remove client-side search filtering**
-   - Delete JavaScript-based keyword/fuzzy matching
-   - Remove client-side document type filtering
-   - No search logic in browser
-2. **Implement server-side endpoint** (`/app/vector-viz`)
-   - Accept query, algorithm, weights, doc_type filters
-   - Execute search via `search/algorithms.py`
-   - Filter results by user_id (security)
-   - Perform PCA reduction (768-dim → 2D)
-   - Return JSON with 2D coordinates + metadata only
-3. **Update frontend**
-   - htmx form submission to `/app/vector-viz`
-   - Algorithm selector dropdown
-   - Weight adjustment sliders (htmx updates on change)
-   - Document type checkboxes
-   - Plotly.js visualization of server response
-4. **Performance optimization**
-   - Limit results to user's documents only
-   - Cache PCA transformation (invalidate on new vectors)
-   - Stream large result sets if needed
-   - Add loading indicators for server processing
-
-### Phase 4: Documentation and Testing (Week 2-3)
-
-1. Update MCP tool documentation
-2. Add algorithm selection guide
-3. Document weight tuning recommendations
-4. Add end-to-end tests (MCP + viz pane)
-5. Performance benchmarks for each algorithm
-6. Update CLAUDE.md with search patterns
-
-## Consequences
-
-### Positive
-
-1. **Flexibility**: MCP clients can optimize search for their use case
-2. **Unified implementation**: Single source of truth for search algorithms
-3. **User empowerment**: Viz pane enables query testing and tuning
-4. **Backward compatible**: Existing tool name and signature structure preserved
-5. **Extensible**: Easy to add new algorithms (BM25, neural reranking)
-6. **Testable**: Each algorithm can be unit tested independently
-7. **Standards-based**: RRF is proven in production systems
-
-### Negative
-
-1. **Complexity**: More parameters for clients to understand
-2. **API surface**: Larger tool signature (8 parameters)
-3. **Performance**: Hybrid search requires multiple queries
-4. **Validation overhead**: Weight validation adds processing
-5. **Documentation burden**: Need to explain when to use each algorithm
-
-### Neutral
-
-1. **Weight defaults**: May need tuning based on user feedback
-2. **Algorithm performance**: Will vary by content type and query
-3. **Viz pane adoption**: Unknown if users will utilize testing interface
-
-## Alternatives Considered
-
-### Alternative 1: Separate Tools Per Algorithm
-
-```python
-@mcp.tool()
-async def nc_semantic_search(query: str, ctx: Context, ...) -> SearchResponse:
-    """Pure vector similarity search."""
-
-@mcp.tool()
-async def nc_keyword_search(query: str, ctx: Context, ...) -> SearchResponse:
-    """Pure keyword matching."""
-
-@mcp.tool()
-async def nc_hybrid_search(query: str, ctx: Context, weights: dict, ...) -> SearchResponse:
-    """Hybrid search with weights."""
-```
-
-**Rejected because**:
- API proliferation (3+ tools instead of 1)
- Harder to discover capabilities
- Backward compatibility issues
- DRY violation (repeated parameters)
-
-### Alternative 2: Server-Wide Configuration Only
-
-```python
-# .env configuration
-SEARCH_ALGORITHM=hybrid
-SEMANTIC_WEIGHT=0.5
-KEYWORD_WEIGHT=0.3
-FUZZY_WEIGHT=0.2
-```
-
-**Rejected because**:
- No per-query flexibility
- MCP clients cannot optimize for different tasks
- Requires server restart for changes
- User's requirement: "expose a way for users to override the default weights"
-
-### Alternative 3: Production-Grade Fuzzy (Levenshtein/RapidFuzz)
-
-**Rejected because**:
- Adds external dependency
- Simple character overlap performs adequately
- Can always upgrade later if needed
- User's preference: "Keep simple character overlap"
-
-## Related ADRs
-
- **ADR-001**: Enhanced Note Search (keyword algorithm design)
- **ADR-003**: Vector Database and Semantic Search (hybrid search + RRF design)
- **ADR-007**: Background Vector Sync (semantic search implementation)
- **ADR-008**: MCP Sampling for RAG (uses semantic search results)
- **ADR-009**: Semantic Search OAuth Scope (security model)
- **ADR-011**: Improving Semantic Search Quality (mentions future "ADR-013" for hybrid search)
-
-**This ADR supersedes**:
- ADR-011's placeholder for "ADR-013: Hybrid Search"
-
-**This ADR implements**:
- ADR-003's hybrid search design (previously unimplemented)
- ADR-001's keyword search design (previously unimplemented)
-
-## References
-
- **Reciprocal Rank Fusion**: Cormack, G. V., Clarke, C. L., & Buettcher, S. (2009). "Reciprocal rank fusion outperforms condorcet and individual rank learning methods." SIGIR '09.
- **Vector Search**: Malkov, Y. A., & Yashunin, D. A. (2018). "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs." TPAMI.
- **Hybrid Search Best Practices**: Qdrant documentation on hybrid search patterns
- **MCP Protocol**: Model Context Protocol specification for tool design
-
-## Implementation Notes
-
-### Weight Validation
-
-```python
-def validate_weights(
-    semantic_weight: float,
-    keyword_weight: float,
-    fuzzy_weight: float
-) -> None:
-    """Validate hybrid search weights."""
-    if semantic_weight < 0 or keyword_weight < 0 or fuzzy_weight < 0:
-        raise ValueError("Weights must be non-negative")
-
-    total = semantic_weight + keyword_weight + fuzzy_weight
-    if total > 1.0:
-        raise ValueError(f"Weights sum to {total:.2f}, must be ≤1.0")
-
-    if total == 0.0:
-        raise ValueError("At least one weight must be > 0")
-```
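
One caveat with the strict `total > 1.0` comparison above is floating-point rounding: three weights that are meant to sum to 1.0 can land a few ULPs above it. A possible variant, sketched here under the assumption that a small tolerance is acceptable (the name `validate_weights_tolerant` and the `tol` parameter are hypothetical, not part of the ADR):

```python
import math


def validate_weights_tolerant(semantic_weight, keyword_weight, fuzzy_weight, tol=1e-9):
    """Validate hybrid weights, allowing for rounding error in the sum."""
    weights = (semantic_weight, keyword_weight, fuzzy_weight)
    if any(w < 0 for w in weights):
        raise ValueError("Weights must be non-negative")

    # fsum reduces accumulated rounding error compared with chained +
    total = math.fsum(weights)
    if total > 1.0 + tol:
        raise ValueError(f"Weights sum to {total:.2f}, must be ≤1.0")
    if total <= tol:
        raise ValueError("At least one weight must be > 0")
    return total


total = validate_weights_tolerant(0.1, 0.2, 0.7)
```

The trade-off is that sums exceeding 1.0 by less than `tol` are accepted; whether that is acceptable depends on how the weights are used downstream.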
-
-### Backward Compatibility
-
-The default behavior (`algorithm="hybrid"` with balanced weights) is expected to provide better results than the current pure semantic search while keeping the same tool name and signature structure. Existing clients pick up hybrid search without code changes.
-
-### Performance Considerations
-
- **Semantic search**: ~50-200ms (vector DB query)
- **Keyword search**: ~10-50ms (in-memory token matching)
- **Fuzzy search**: ~20-100ms (character comparison)
- **Hybrid search**: ~100-300ms (parallel execution + fusion)
-
-Parallel execution of algorithms minimizes hybrid search latency.
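
As a rough illustration of the parallel execution mentioned above, the individual algorithms could be awaited together with `asyncio.gather`, so hybrid latency approaches that of the slowest algorithm rather than the sum of all of them. This is a sketch only: `run_algorithms_in_parallel` and the two `fake_*` coroutines are hypothetical stand-ins, not the server's actual search backends.

```python
import asyncio


async def run_algorithms_in_parallel(searches, query):
    """Run every search coroutine concurrently; map algorithm name -> results."""
    names = list(searches)
    result_lists = await asyncio.gather(*(searches[name](query) for name in names))
    return dict(zip(names, result_lists))


# Hypothetical stand-ins that simulate backend latency with sleep()
async def fake_semantic(query):
    await asyncio.sleep(0.02)
    return ["note-1", "note-2"]


async def fake_keyword(query):
    await asyncio.sleep(0.01)
    return ["note-2"]


results = asyncio.run(
    run_algorithms_in_parallel(
        {"semantic": fake_semantic, "keyword": fake_keyword},
        "kubernetes",
    )
)
```

The resulting dict has the same shape as the `results` argument of `reciprocal_rank_fusion`, so the fusion step can consume it directly.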
-
-### Security Model
-
-All algorithms respect the same security boundaries:
-1. **User filtering**: Qdrant queries filter by `user_id`
-2. **Access verification**: Results verified via Nextcloud API
-3. **OAuth scope**: `semantic:read` required for all algorithms
-4. **Viz pane**: Shows only current user's documents
-
-## Success Metrics
-
-1. **Adoption**: % of MCP clients using algorithm parameter
-2. **Performance**: Search latency percentiles (p50, p95, p99)
-3. **Quality**: User satisfaction with result relevance
-4. **Viz pane usage**: % of users accessing testing interface
-5. **Weight distribution**: Most common weight configurations
-
-## Future Enhancements
-
-1. **Additional algorithms**: BM25, TF-IDF, neural reranking
-2. **Auto-tuning**: Learn optimal weights per user
-3. **Query analysis**: Automatic algorithm selection based on query
-4. **Cross-app search**: Extend beyond notes to calendar, files, etc.
-5. **Feedback loop**: Use click-through rate to improve weights
@@ -1,254 +0,0 @@
-## ADR-013: RAG Evaluation Testing Framework
-
-**Status:** Proposed
-
-**Date:** 2025-11-15
-
-### Context
-
-The `nc_semantic_search_answer` tool implements a Retrieval-Augmented Generation (RAG) system where:
-1. **Retrieval**: Vector sync pipeline indexes Nextcloud documents (notes, calendar, contacts, etc.) into a vector database
-2. **Generation**: MCP client's LLM synthesizes answers from retrieved documents via MCP sampling (ADR-008)
-
-We need a testing framework to evaluate RAG system performance and identify whether failures occur in retrieval (wrong documents found) or generation (poor answer quality). This framework must use industry-standard evaluation methodologies while remaining practical to implement and maintain.
-
-To establish a baseline, we will use the **BeIR/nfcorpus** dataset (medical/biomedical corpus), which provides 3,633 documents together with established queries and relevance judgments.
-
-Homepage: https://www.cl.uni-heidelberg.de/statnlpgroup/nfcorpus/
-Download: https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/nfcorpus.zip
-
-### Decision
-
-We will implement a **two-part evaluation framework** that independently tests retrieval and generation quality using pytest fixtures.
-
-#### In Scope
-
-**1. Retrieval Evaluation**
-Tests the vector sync/embedding pipeline's ability to find relevant documents.
-
- **Metric: Context Recall** (Did we retrieve documents containing the answer?)
-  - **Evaluation method**: Heuristic - Check if ground-truth document IDs appear in top-k retrieval results
-  - **Test**: Query → Semantic search → Assert expected doc IDs present
-
-**2. Generation Evaluation**
-Tests the MCP client LLM's ability to synthesize correct answers from retrieved context.
-
- **Metric: Answer Correctness** (Is the generated answer factually correct?)
-  - **Evaluation method**: LLM-as-judge - Compare RAG answer against ground-truth answer
-  - **Test**: Query → `nc_semantic_search_answer` → LLM evaluates answer vs. ground truth (binary true/false)
-
-#### Out of Scope (Initial Implementation)
-
- **Context Relevance/Precision**: Measuring irrelevant documents in retrieval results
- **Faithfulness/Groundedness**: Detecting hallucinations not supported by retrieved context
- **Answer Relevance**: Whether answer addresses the specific question asked
- **Out-of-Scope Handling**: Testing "I don't know" responses when answer isn't in context
- **Continuous benchmarking**: Automated tracking of metric trends over time
- **Custom domain datasets**: Production-specific test data (medical corpus used initially)
-
-These remain valuable for future iterations but add complexity beyond our initial goals.
-
-#### Implementation
-
-**Test Structure**
-
-Location: `tests/rag_evaluation/`
- `test_retrieval_quality.py` - Retrieval evaluation tests
- `test_generation_quality.py` - Generation evaluation tests
- `conftest.py` - Fixtures for test data, MCP clients, and evaluation LLMs
-
-**Required Pytest Fixtures**
-
-1. **`nfcorpus_test_data`** (session-scoped)
-   - Downloads/caches BeIR nfcorpus dataset at runtime
-   - Loads 5 pre-selected test queries with:
-     - Query text
-     - Pre-generated ground-truth answer (from `tests/rag_evaluation/fixtures/ground_truth.json`)
-     - Expected document IDs (from qrels with score=2)
-   - Uploads all corpus documents as notes in test Nextcloud instance
-   - Triggers vector sync to index documents
-   - Waits for indexing completion
-   - Returns test case data structure
-
-2. **`mcp_sampling_client`** (session-scoped)
-   - Creates MCP client that supports sampling
-   - Configurable LLM provider (ollama or anthropic) via environment:
-     - `RAG_EVAL_PROVIDER=ollama` (default) or `anthropic`
-     - `RAG_EVAL_OLLAMA_BASE_URL=http://localhost:11434`
-     - `RAG_EVAL_OLLAMA_MODEL=llama3.1:8b`
-     - `RAG_EVAL_ANTHROPIC_API_KEY=sk-...`
-     - `RAG_EVAL_ANTHROPIC_MODEL=claude-3-5-sonnet-20241022`
-   - Returns configured MCP client fixture
-
-3. **`evaluation_llm`** (session-scoped)
-   - Separate LLM instance for evaluation (independent from MCP client)
-   - Same provider configuration as `mcp_sampling_client`
-   - Returns callable: `async def evaluate(prompt: str) -> str`
-
-**Test Implementation Examples**
-
-```python
-# tests/rag_evaluation/test_retrieval_quality.py
-async def test_retrieval_recall(nc_client, nfcorpus_test_data):
-    """Test that semantic search retrieves documents containing the answer."""
-    for test_case in nfcorpus_test_data:
-        # Perform semantic search (retrieval only, no generation)
-        results = await nc_client.notes.semantic_search(
-            query=test_case.query,
-            limit=10
-        )
-
-        retrieved_doc_ids = {r.document_id for r in results}
-        expected_doc_ids = set(test_case.expected_document_ids)
-
-        # Context Recall: Are expected documents in top-k results?
-        recall = len(expected_doc_ids & retrieved_doc_ids) / len(expected_doc_ids)
-        assert recall >= 0.8, f"Recall {recall} below threshold for query: {test_case.query}"
-
-
-# tests/rag_evaluation/test_generation_quality.py
-async def test_answer_correctness(mcp_sampling_client, evaluation_llm, nfcorpus_test_data):
-    """Test that RAG system generates factually correct answers."""
-    for test_case in nfcorpus_test_data:
-        # Execute full RAG pipeline (retrieval + generation)
-        result = await mcp_sampling_client.call_tool(
-            "nc_semantic_search_answer",
-            arguments={"query": test_case.query, "limit": 5}
-        )
-
-        rag_answer = result["generated_answer"]
-
-        # LLM-as-judge evaluation
-        evaluation_prompt = f"""Compare these two answers and respond with only TRUE or FALSE.
-
-Question: {test_case.query}
-
-Generated Answer: {rag_answer}
-
-Ground Truth Answer: {test_case.ground_truth}
-
-Are these answers semantically equivalent (do they convey the same factual information)?
-Respond with only: TRUE or FALSE"""
-
-        evaluation_result = await evaluation_llm(evaluation_prompt)
-
-        assert evaluation_result.strip().upper() == "TRUE", \
-            f"Answer mismatch for query: {test_case.query}\nGot: {rag_answer}\nExpected: {test_case.ground_truth}"
-```
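
The exact-match check on `"TRUE"` in the example above fails whenever the judge model adds punctuation or Markdown emphasis around its verdict. One possible way to make the parsing less brittle is sketched below; `parse_verdict` is a hypothetical helper, and taking the first `TRUE`/`FALSE` token found is an assumption about how judge replies are shaped.

```python
import re


def parse_verdict(raw: str) -> bool:
    """Map a judge reply to True/False; raise if no clear verdict is present."""
    match = re.search(r"\b(TRUE|FALSE)\b", raw.upper())
    if match is None:
        raise ValueError(f"No TRUE/FALSE verdict in judge reply: {raw!r}")
    # Assumption: the first verdict token in the reply is the intended one
    return match.group(1) == "TRUE"


verdicts = [parse_verdict(reply) for reply in ["TRUE", " true.\n", "**FALSE**"]]
```

Raising on an unparseable reply keeps a malformed judge response from being silently counted as an incorrect answer.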
-
-**Dataset Integration**
-
-The BeIR nfcorpus dataset structure:
- **corpus.jsonl**: 3,633 medical/biomedical documents (articles from PubMed)
- **queries.jsonl**: 3,237 queries (questions)
- **qrels/*.tsv**: Relevance judgments mapping query IDs to document IDs with scores (2=highly relevant, 1=somewhat relevant)
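
The expected document IDs for each query can be derived from the qrels files described above. The sketch below assumes the usual BEIR layout of a tab-separated file with a `query-id`, `corpus-id`, `score` header row; the helper name `load_highly_relevant` and the sample rows are made up for illustration and do not reflect real judgments.

```python
import csv
import io
from collections import defaultdict


def load_highly_relevant(qrels_file, min_score=2):
    """Map query ID -> set of document IDs judged at or above min_score."""
    relevant = defaultdict(set)
    for row in csv.DictReader(qrels_file, delimiter="\t"):
        if int(row["score"]) >= min_score:
            relevant[row["query-id"]].add(row["corpus-id"])
    return dict(relevant)


# Tiny made-up sample in the assumed layout; real rows come from the dataset
sample = io.StringIO(
    "query-id\tcorpus-id\tscore\n"
    "PLAIN-2510\tMED-1\t2\n"
    "PLAIN-2510\tMED-2\t1\n"
    "PLAIN-2430\tMED-3\t2\n"
)
expected_ids = load_highly_relevant(sample)
```

For a real run, an open file handle on one of the `qrels/*.tsv` files would replace the in-memory sample.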
-
-**Important**: The dataset provides relevance judgments (which documents answer which queries) but does NOT include ground truth answers. We must generate synthetic ground truth offline.
-
-**Selected Test Queries** (5 diverse candidates):
-
-1. **PLAIN-2630**: "Alkylphenol Endocrine Disruptors and Allergies" (5 words, 21 highly relevant docs)
-2. **PLAIN-2660**: "How Long to Detox From Fish Before Pregnancy?" (8 words, 20 highly relevant docs)
-3. **PLAIN-2510**: "Coffee and Artery Function" (4 words, 16 highly relevant docs)
-4. **PLAIN-2430**: "Preventing Brain Loss with B Vitamins?" (6 words, 15 highly relevant docs)
-5. **PLAIN-2690**: "Chronic Headaches and Pork Tapeworms" (5 words, 14 highly relevant docs)
-
-**Ground Truth Generation** (offline, pre-test):
-
-Ground truth answers will be generated offline using a script that:
-1. Loads nfcorpus dataset
-2. For each selected query, extracts top 3-5 highly relevant documents
-3. Uses an LLM (ollama/anthropic) to synthesize a reference answer
-4. Stores ground truth in `tests/rag_evaluation/fixtures/ground_truth.json`
-
-```python
-# tools/generate_rag_ground_truth.py
-async def generate_ground_truth(query: str, relevant_docs: list[dict], llm: LLMProvider) -> str:
-    """Generate synthetic ground truth answer from highly relevant documents."""
-    context = "\n\n".join([
-        f"Document {i+1}:\nTitle: {doc['title']}\n{doc['text']}"
-        for i, doc in enumerate(relevant_docs[:5])
-    ])
-
-    prompt = f"""Based on the following documents, provide a comprehensive answer to this question:
-
-Question: {query}
-
-{context}
-
-Provide a factual, well-structured answer that synthesizes information from the documents.
-Focus on accuracy and completeness."""
-
-    return await llm.generate(prompt, max_tokens=500)
-```
-
-**Dataset Loading at Test Runtime** (in `nfcorpus_test_data` fixture):
-
-1. Download nfcorpus dataset (cached in pytest temp directory)
-2. Load corpus, queries, and qrels (relevance judgments)
-3. Load pre-generated ground truth from `tests/rag_evaluation/fixtures/ground_truth.json`
-4. Upload all corpus documents as Nextcloud notes
-5. Trigger vector sync to index documents
-6. Wait for indexing completion
-7. Return test cases with query, ground truth, and expected doc IDs
-
-**LLM Provider Abstraction**
-
-```python
-# tests/rag_evaluation/llm_providers.py
-class LLMProvider(Protocol):
-    async def generate(self, prompt: str, max_tokens: int = 100) -> str: ...
-
-class OllamaProvider:
-    def __init__(self, base_url: str, model: str):
-        self.base_url = base_url
-        self.model = model
-
-    async def generate(self, prompt: str, max_tokens: int = 100) -> str:
-        # Use httpx to call Ollama API
-        ...
-
-class AnthropicProvider:
-    def __init__(self, api_key: str, model: str):
-        self.client = anthropic.AsyncAnthropic(api_key=api_key)
-        self.model = model
-
-    async def generate(self, prompt: str, max_tokens: int = 100) -> str:
-        message = await self.client.messages.create(
-            model=self.model,
-            max_tokens=max_tokens,
-            messages=[{"role": "user", "content": prompt}]
-        )
-        return message.content[0].text
-```
-
-### Consequences
-
-**Positive:**
-
-* **Actionable debugging**: Separate retrieval/generation tests pinpoint failure location
-* **Industry-standard metrics**: Context Recall and Answer Correctness are recognized RAG evaluation metrics
-* **Simple initial implementation**: Binary LLM evaluation (true/false) is straightforward to implement and interpret
-* **Extensible framework**: Easy to add more metrics (faithfulness, relevance) later
-* **Standardized benchmark**: nfcorpus provides objective comparison against published RAG systems
-* **Hybrid evaluation**: Combines efficiency (heuristics for retrieval) with quality (LLM-as-judge for generation)
-* **Provider flexibility**: Supports both local (Ollama) and cloud (Anthropic) LLM evaluation
-
-**Negative:**
-
-* **Medical domain bias**: nfcorpus is medical/biomedical content, may not represent production use cases (personal notes, calendar events, etc.)
-* **Manual test execution**: Tests require external LLM access and are not integrated into CI pipeline
-* **Limited initial coverage**: Starting with only 5 queries provides limited statistical confidence
-* **Evaluation cost**: LLM-as-judge for generation evaluation incurs API costs (Anthropic) or requires local inference (Ollama)
-* **Single metric per component**: Initial scope tests only one metric per component, missing other important quality dimensions
-* **Synthetic ground truth**: Ground truth answers are LLM-generated, not human-validated, which may introduce evaluation bias
-* **Large corpus upload**: Uploading 3,633 documents at test runtime may be slow; caching strategy needed
-
-**Future Work:**
-
-* Expand to 50-100 queries for statistical significance
-* Add custom test dataset with production-representative documents (meeting notes, task lists, etc.)
-* Implement additional metrics (faithfulness, context relevance, answer relevance)
-* Create automated benchmarking dashboard to track metric trends
-* Test multi-hop reasoning (synthesis questions requiring multiple documents)
-* Evaluate out-of-scope handling ("I don't know" responses)
@@ -1,241 +0,0 @@
-# ADR-014: Replace Custom Keyword Search with BM25 Hybrid Search via Qdrant
-
-**Date:** 2025-11-16
-
-**Status:** Implemented
-
---
-
-### 1. Context
-
-Our RAG application currently employs two separate retrieval mechanisms:
-1.  **Dense (Semantic) Search:** Using vector embeddings stored in our Qdrant database to find semantically similar context.
-2.  **Keyword Search:** A custom-built fuzzy/character-based search to match specific keywords, acronyms, and product codes that semantic search often misses.
-
-This dual-system approach has several drawbacks:
-* **Poor Relevance:** Our current keyword search is basic (e.g., `LIKE` queries or simple fuzzy matching). It is not as effective as modern full-text search algorithms like BM25.
-* **Clunky Fusion:** We lack a robust, principled method to combine the results from the two systems. This leads to disjointed logic in the application layer and suboptimal context being passed to the LLM.
-* **Architectural Complexity:** We must maintain two separate search pathways (one to Qdrant, one to the keyword search mechanism), increasing code complexity and maintenance overhead.
-
-Our vector database, **Qdrant**, natively supports **hybrid search** by combining dense vectors with BM25-based **sparse vectors** in a single collection.
-
-### 2. Decision
-
-We will **deprecate and remove** the existing custom keyword/fuzzy search functionality.
-
-We will **replace it by implementing native hybrid search within Qdrant**. This involves:
-1.  **Modifying the Qdrant Collection:** Updating our collection to support a named sparse vector index configured for BM25.
-2.  **Updating the Ingestion Pipeline:** For every document chunk, we will generate and upsert *both*:
-    * Its **dense vector** (from our existing embedding model).
-    * Its **sparse vector** (generated using a BM25-compatible model, e.g., `Qdrant/bm25` from `fastembed`).
-3.  **Refactoring Retrieval Logic:** All retrieval calls will be consolidated into a single Qdrant query using the `query_points` endpoint. This query will use the `prefetch` parameter to execute both dense and sparse searches, and Qdrant's built-in **Reciprocal Rank Fusion (RRF)** to automatically merge the results into a single, relevance-ranked list.
-4.  **Backfilling:** A one-time migration script will be created to generate and add sparse vectors for all existing documents in the Qdrant collection.
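
For readers unfamiliar with what the sparse side of this decision computes, the following is a minimal pure-Python sketch of the classic Okapi BM25 scoring formula. It illustrates the ranking idea only and is not how Qdrant or `fastembed` implement it; the function name `bm25_scores`, the whitespace tokenization, and the `k1`/`b` values are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: number of documents containing each term
    doc_freq = Counter(term for doc in docs for term in set(doc))

    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            if term not in term_freq:
                continue
            n_t = doc_freq[term]
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
            tf = term_freq[term]
            # Length normalization: longer documents are penalized via b
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "configure kubectl to connect to the cluster".split(),
    "weekly grocery list and meal plan".split(),
    "kubectl errors and cluster troubleshooting".split(),
]
scores = bm25_scores("kubectl cluster".split(), docs)
```

Unlike the character-overlap approach being removed, this scoring rewards exact term matches, discounts terms that appear in many documents, and normalizes for document length, which is why the shorter third document outranks the first here.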
-
---
-
-### 3. Considered Options
-
-#### Option 1: Native Qdrant Hybrid Search (Chosen)
-* Use Qdrant's built-in sparse vector and RRF capabilities.
-* **Pros:**
-    * **Consolidated Architecture:** Manages both dense and sparse indexes in one database.
-    * **No Data Sync Issues:** Updates are atomic. A single `upsert` updates both representations.
-    * **Built-in Fusion:** RRF is handled natively and efficiently by the database.
-    * **Superior Relevance:** Replaces our brittle custom search with the industry-standard BM25.
-* **Cons:**
-    * Requires a one-time data backfill which may be time-consuming.
-    * Adds a new step (sparse vector generation) to the ingestion pipeline.
-
-#### Option 2: External Full-Text Search (e.g., Elasticsearch)
-* Keep Qdrant for dense search and add a separate Elasticsearch/OpenSearch cluster for BM25.
-* **Pros:**
-    * Provides a very powerful, dedicated full-text search engine.
-* **Cons:**
-    * **High Complexity:** Introduces a new, stateful service to deploy, manage, and scale.
-    * **Data Sync Nightmare:** We would be responsible for ensuring that the document IDs and content in Qdrant and Elasticsearch are always perfectly synchronized. This is a major source of bugs.
-    * **Manual Fusion:** The application would have to query both systems and perform RRF manually.
-
-#### Option 3: Keep Current System
-* Make no changes.
-* **Pros:**
-    * No engineering effort required.
-* **Cons:**
-    * Fails to address the known relevance and architectural problems.
-    * Our RAG application's performance will remain suboptimal, especially for keyword-sensitive queries.
-
---
-
-### 4. Rationale
-
-**Option 1 is the clear winner.** It directly solves our primary problem (poor keyword matching) by adopting the industry-standard BM25.
-
-Critically, it achieves this while **simplifying** our overall architecture, not complicating it. By leveraging features already present in our existing database (Qdrant), we avoid the massive operational and synchronization overhead of adding a second search system (Option 2).
-
-This decision consolidates our retrieval logic, eliminates the data consistency problem, and moves the complex fusion logic (RRF) from the application layer into the database, where it can be performed more efficiently.
-
-### 5. Consequences
-
-**New Work:**
-* **Ingestion:** The data ingestion pipeline must be updated to add the `fastembed` library (or similar), generate sparse vectors, and upsert them to the new named vector field in Qdrant.
-* **Retrieval:** The application's retrieval service must be refactored to use the `query_points` endpoint with `prefetch` and `fusion=models.Fusion.RRF`.
-* **Migration:** A one-time backfill script must be written and executed to add sparse vectors for all existing documents.
-* **Infrastructure:** The Qdrant collection schema must be updated (or re-created) to add the `sparse_vectors_config`.
-
-**Positive:**
-* **Improved Accuracy:** Retrieval will be significantly more accurate, handling both semantic and keyword queries robustly.
-* **Simplified Code:** The application's retrieval logic will be cleaner and simpler, with one endpoint instead of two.
-* **Reduced Maintenance:** We will remove the custom fuzzy-search code, which is brittle and difficult to maintain.
-
-**Negative:**
-* The data backfill process will require careful management to avoid downtime.
-* Ingestion time will slightly increase due to the extra step of sparse vector generation. This is considered a negligible trade-off for the gains in relevance.
-
----
-
-### 6. Implementation Notes
-
-**Implementation completed on 2025-11-16**
-
-**Key Changes:**
-
-1. **Dependencies** (pyproject.toml:25):
-   - Added `fastembed>=0.4.2` for BM25 sparse vector embeddings
-   - Adjusted `pillow` version constraint to be compatible with fastembed
-
-2. **Qdrant Collection Schema** (nextcloud_mcp_server/vector/qdrant_client.py:113-128):
-   - Updated to named vectors: `{"dense": VectorParams(...), "sparse": SparseVectorParams(...)}`
-   - Added sparse vector configuration with BM25 index
-   - Maintains backward compatibility with existing collections (detects legacy schema)
-
-3. **BM25 Embedding Provider** (nextcloud_mcp_server/embedding/bm25_provider.py):
-   - Created `BM25SparseEmbeddingProvider` using FastEmbed's `Qdrant/bm25` model
-   - Implements `encode()` and `encode_batch()` methods
-   - Returns sparse vectors as `{indices: list[int], values: list[float]}` format
-
-4. **Document Indexing Pipeline** (nextcloud_mcp_server/vector/processor.py:229-255):
-   - Generates both dense (semantic) and sparse (BM25) embeddings for each document chunk
-   - Updates `PointStruct` to use named vectors: `vector={"dense": ..., "sparse": ...}`
-   - Maintains same chunking strategy (512 words, 50-word overlap)
-
-5. **BM25 Hybrid Search Algorithm** (nextcloud_mcp_server/search/bm25_hybrid.py):
-   - Implements `BM25HybridSearchAlgorithm` using Qdrant's native RRF fusion
-   - Uses `prefetch` parameter for parallel dense + sparse search
-   - Applies `fusion=models.Fusion.RRF` for automatic result merging
-   - Maintains same deduplication and filtering logic as semantic search
-
-6. **MCP Tool Updates** (nextcloud_mcp_server/server/semantic.py:39-68):
-   - Simplified `nc_semantic_search()` to use BM25 hybrid only
-   - Removed `algorithm`, `semantic_weight`, `keyword_weight`, `fuzzy_weight` parameters
-   - Updated default `score_threshold=0.0` for RRF scoring
-   - Returns `search_method="bm25_hybrid"` in responses
-
-7. **Legacy Algorithm Removal**:
-   - Deleted `nextcloud_mcp_server/search/keyword.py` (278 lines)
-   - Deleted `nextcloud_mcp_server/search/fuzzy.py` (220 lines)
-   - Deleted `nextcloud_mcp_server/search/hybrid.py` (238 lines - custom RRF)
-   - Updated `nextcloud_mcp_server/search/__init__.py` to export only BM25 hybrid
-
-**Migration Strategy:**
-- No migration required (vector sync feature is experimental)
-- New documents automatically indexed with both dense + sparse vectors
-- Collection re-creation on first startup with updated schema
-
-**Test Results:**
-- All unit tests passing (118 passed)
-- All integration tests passing (7 semantic search tests)
-- Code formatting verified with ruff
-
-**Benefits Realized:**
-- ✅ Consolidated architecture (single Qdrant database for both dense + sparse)
-- ✅ Native fusion algorithms (database-level, more efficient)
-- ✅ Industry-standard BM25 (replaces custom keyword search)
-- ✅ Simplified codebase (removed 736 lines of legacy code)
-- ✅ Better relevance (handles both semantic and keyword queries)
-- ✅ Configurable fusion methods (RRF and DBSF)
-
----
-
-### 7. Fusion Algorithm Options
-
-**Update: 2025-11-16**
-
-The BM25 hybrid search now supports two fusion algorithms for combining dense (semantic) and sparse (BM25) search results:
-
-#### Reciprocal Rank Fusion (RRF)
-
-**Default fusion method.** RRF is a widely used, well-established algorithm that combines rankings from multiple retrieval systems using the reciprocal rank formula:
-
-```
-RRF(doc) = Σ 1/(k + rank_i(doc))
-```
-
-where `k` is a constant (typically 60) and `rank_i(doc)` is the rank of the document in retrieval system `i`.
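As a rough illustration of the formula above, the following minimal sketch fuses two ranked lists in plain Python. The function name `rrf_fuse` and the sample document IDs are hypothetical, and ranks are assumed to be 1-based; Qdrant performs this fusion internally and its exact implementation may differ.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Assumption: ranks are 1-based, so the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense_hits = ["note-7", "note-2", "note-9"]   # hypothetical semantic results
sparse_hits = ["note-2", "note-5", "note-7"]  # hypothetical BM25 results
fused = rrf_fuse([dense_hits, sparse_hits])
```

Documents that appear in both lists accumulate two terms, which is why a document ranked moderately well by both systems can outrank one that only a single system returned.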
-
-**Characteristics:**
-- ✅ **General-purpose**: Works well across diverse query types and document collections
-- ✅ **Rank-based**: Focuses on relative rankings rather than absolute scores
-- ✅ **Established**: Well-tested, documented, and understood in IR literature
-- ✅ **Robust**: Less sensitive to score distribution differences between systems
-
-**When to use RRF:**
-- Default choice for most use cases
-- When you have mixed query types (semantic + keyword)
-- When retrieval systems have very different score ranges
-- When you want predictable, well-understood behavior
-
-#### Distribution-Based Score Fusion (DBSF)
-
-**Alternative fusion method.** DBSF normalizes scores from each retrieval system using distribution statistics before combining them:
-
-1. **Normalization**: For each query, calculates mean (μ) and standard deviation (σ) of scores
-2. **Outlier handling**: Uses μ ± 3σ as normalization bounds
-3. **Fusion**: Sums normalized scores across systems
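The three steps above can be sketched in plain Python as follows. This is only an approximation under stated assumptions: the function names are hypothetical, the population standard deviation is used, values outside the μ ± 3σ bounds are left unclamped, and a system whose scores are all identical is mapped to 0.5. Qdrant's internal implementation may differ on each of these points.

```python
from statistics import mean, pstdev


def dbsf_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one system's scores using mean ± 3 standard deviations as bounds."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # Assumption: identical scores carry no ranking signal, so use the midpoint.
        return {doc_id: 0.5 for doc_id in scores}
    low, high = mu - 3 * sigma, mu + 3 * sigma
    return {doc_id: (value - low) / (high - low) for doc_id, value in scores.items()}


def dbsf_fuse(systems: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Sum the normalized scores of each document across all systems."""
    fused: dict[str, float] = {}
    for scores in systems:
        for doc_id, value in dbsf_normalize(scores).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + value
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


dense_scores = {"a": 0.9, "b": 0.7, "c": 0.5}     # hypothetical cosine similarities
sparse_scores = {"b": 12.0, "c": 8.0, "d": 4.0}   # hypothetical BM25 scores
fused = dbsf_fuse([dense_scores, sparse_scores])
```

Unlike RRF, the size of the gap between scores matters here: a document that one system scores far above the rest keeps that advantage after normalization.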
-
-**Characteristics:**
-- ✅ **Score-aware**: Uses actual relevance scores, not just rankings
-- ✅ **Statistical**: Normalizes based on score distribution properties
-- ⚠️ **Experimental**: Newer algorithm, less battle-tested than RRF
-- ⚠️ **Sensitive**: May behave differently depending on score distributions
-
-**When to use DBSF:**
-- When retrieval systems have vastly different score ranges that RRF doesn't balance well
-- When you want to experiment with score-based (vs rank-based) fusion
-- When statistical normalization better matches your use case
-- For A/B testing against RRF to measure retrieval quality improvements
-
-#### Configuration
-
-Both fusion algorithms are exposed via the `fusion` parameter in MCP tools:
-
-```python
-# Use RRF (default)
-response = await nc_semantic_search(
-    query="async programming",
-    fusion="rrf"  # Can be omitted, RRF is default
-)
-
-# Use DBSF
-response = await nc_semantic_search(
-    query="async programming",
-    fusion="dbsf"
-)
-```
-
-The `nc_semantic_search_answer` tool also supports the `fusion` parameter and passes it through to the underlying search.
-
-#### Future: Configurable Weights
-
-**Current limitation**: Neither RRF nor DBSF currently supports per-system weights (e.g., 0.8 for semantic, 0.2 for BM25). This is a Qdrant platform limitation tracked in [qdrant/qdrant#6067](https://github.com/qdrant/qdrant/issues/6067).
-
-When Qdrant adds weight support, the `fusion` parameter can be extended to accept weight configurations:
-
-```python
-# Hypothetical future API
-response = await nc_semantic_search(
-    query="async programming",
-    fusion="rrf",
-    fusion_weights={"dense": 0.7, "sparse": 0.3}  # Not yet implemented
-)
-```
-
-**Recommendation**: Start with RRF (default). If you encounter cases where keyword matches are under- or over-weighted, experiment with DBSF. Monitor [qdrant/qdrant#6067](https://github.com/qdrant/qdrant/issues/6067) for configurable weight support.
@@ -1,380 +0,0 @@
-# ADR-015: Unified Provider Architecture for Embeddings and Text Generation
-
-**Status:** Accepted
-**Date:** 2025-01-16
-**Deciders:** Development Team
-**Related:** ADR-003 (Vector Database), ADR-008 (MCP Sampling), ADR-013 (RAG Evaluation)
-
-## Context
-
-Prior to this refactoring, the codebase had two separate provider systems:
-
-1. **Embedding Providers** (`nextcloud_mcp_server/embedding/`)
-   - Used `EmbeddingProvider` ABC with methods: `embed()`, `embed_batch()`, `get_dimension()`
-   - Had auto-detection via `EmbeddingService._detect_provider()`
-   - Used for semantic search and vector indexing (production)
-
-2. **LLM Providers** (`tests/rag_evaluation/llm_providers.py`)
-   - Used `LLMProvider` Protocol with method: `generate()`
-   - Had separate factory function `create_llm_provider()`
-   - Used only for RAG evaluation tests (not production)
-
-This fragmentation created several problems:
-
-### Problems with Dual Provider Systems
-
-1. **Code Duplication**
-   - Ollama configuration appeared in both `embedding/service.py` and `tests/rag_evaluation/llm_providers.py`
-   - Similar provider detection logic in multiple places
-   - Separate singleton patterns for each system
-
-2. **Limited Extensibility**
-   - Hard-coded provider detection in `EmbeddingService._detect_provider()`
-   - No support for providers that offer both capabilities (like Bedrock)
-   - Adding new providers required modifying multiple files
-
-3. **Inconsistent Patterns**
-   - BM25 provider didn't follow `EmbeddingProvider` ABC
-   - Different method names across providers (`embed` vs `encode`)
-   - ABC vs Protocol for type checking
-
-4. **Difficult Scaling**
-   - Adding Amazon Bedrock (our third provider) would exacerbate all issues
-   - No clear path for future providers (OpenAI, Cohere, etc.)
-
-### Amazon Bedrock Requirements
-
-Bedrock naturally supports **both** embeddings and text generation:
-- **Embeddings**: `amazon.titan-embed-text-v1/v2`, `cohere.embed-*`
-- **Text Generation**: `anthropic.claude-*`, `meta.llama3-*`, `amazon.titan-text-*`
-- **Unified API**: Single `invoke_model()` method via bedrock-runtime
-
-This made it the perfect opportunity to establish a unified provider architecture.
-
-## Decision
-
-We refactored the provider infrastructure to use a **unified Provider ABC** with optional capabilities:
-
-### 1. Unified Provider Interface
-
-**New Structure:**
-```
-nextcloud_mcp_server/providers/
-├── __init__.py
-├── base.py              # Provider ABC with optional capabilities
-├── registry.py          # Auto-detection and factory
-├── ollama.py            # Supports both embedding + generation
-├── anthropic.py         # Generation only
-├── bedrock.py           # Supports both embedding + generation
-└── simple.py            # Embedding only (testing fallback)
-```
-
-**Base Class (`providers/base.py`):**
-```python
-from abc import ABC, abstractmethod
-
-
-class Provider(ABC):
-    @property
-    @abstractmethod
-    def supports_embeddings(self) -> bool:
-        """Whether this provider supports embedding generation."""
-        pass
-
-    @property
-    @abstractmethod
-    def supports_generation(self) -> bool:
-        """Whether this provider supports text generation."""
-        pass
-
-    @abstractmethod
-    async def embed(self, text: str) -> list[float]:
-        """Generate embedding (raises NotImplementedError if not supported)."""
-        pass
-
-    @abstractmethod
-    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
-        """Generate batch embeddings (raises NotImplementedError if not supported)."""
-        pass
-
-    @abstractmethod
-    def get_dimension(self) -> int:
-        """Get embedding dimension (raises NotImplementedError if not supported)."""
-        pass
-
-    @abstractmethod
-    async def generate(self, prompt: str, max_tokens: int = 500) -> str:
-        """Generate text (raises NotImplementedError if not supported)."""
-        pass
-
-    @abstractmethod
-    async def close(self) -> None:
-        """Close provider and release resources."""
-        pass
-```
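To make the optional-capability pattern concrete, here is a minimal sketch of an embedding-only provider. The `Provider` class below is a condensed stand-in for the interface above (only four of its members), and `HashEmbeddingProvider` and `describe` are hypothetical names that do not exist in the codebase.

```python
import asyncio
import hashlib
from abc import ABC, abstractmethod


class Provider(ABC):
    """Condensed stand-in for the interface above, reduced to four members."""

    @property
    @abstractmethod
    def supports_embeddings(self) -> bool: ...

    @property
    @abstractmethod
    def supports_generation(self) -> bool: ...

    @abstractmethod
    async def embed(self, text: str) -> list[float]: ...

    @abstractmethod
    async def generate(self, prompt: str, max_tokens: int = 500) -> str: ...


class HashEmbeddingProvider(Provider):
    """Hypothetical embedding-only provider used purely for illustration."""

    def __init__(self, dimension: int = 8) -> None:
        self._dimension = dimension

    @property
    def supports_embeddings(self) -> bool:
        return True

    @property
    def supports_generation(self) -> bool:
        return False

    async def embed(self, text: str) -> list[float]:
        # Deterministic toy embedding: scale SHA-256 digest bytes into [0, 1].
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [byte / 255 for byte in digest[: self._dimension]]

    async def generate(self, prompt: str, max_tokens: int = 500) -> str:
        # Unsupported capability: fail loudly instead of returning a placeholder.
        raise NotImplementedError("HashEmbeddingProvider does not support text generation")


async def describe(provider: Provider, text: str) -> str:
    # Callers check the capability flag before using an optional method.
    if provider.supports_generation:
        return await provider.generate(text)
    vector = await provider.embed(text)
    return f"embedding with {len(vector)} dimensions"


result = asyncio.run(describe(HashEmbeddingProvider(), "hello"))
```

Checking `supports_generation` first keeps `NotImplementedError` as a last line of defense rather than part of normal control flow.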
-
-### 2. Provider Registry
-
-**Auto-Detection Priority** (`providers/registry.py`):
-```python
-class ProviderRegistry:
-    @staticmethod
-    def create_provider() -> Provider:
-        # 1. Bedrock (AWS_REGION or BEDROCK_*_MODEL)
-        # 2. Ollama (OLLAMA_BASE_URL)
-        # 3. Simple (fallback)
-```
-
-**Environment Variables:**
-
-**Bedrock:**
-- `AWS_REGION`: AWS region (e.g., "us-east-1")
-- `AWS_ACCESS_KEY_ID`: AWS access key (optional, uses credential chain)
-- `AWS_SECRET_ACCESS_KEY`: AWS secret key (optional)
-- `BEDROCK_EMBEDDING_MODEL`: Model ID for embeddings (e.g., "amazon.titan-embed-text-v2:0")
-- `BEDROCK_GENERATION_MODEL`: Model ID for text generation (e.g., "anthropic.claude-3-sonnet-20240229-v1:0")
-
-**Ollama:**
-- `OLLAMA_BASE_URL`: Ollama API base URL (e.g., "http://localhost:11434")
-- `OLLAMA_EMBEDDING_MODEL`: Model for embeddings (default: "nomic-embed-text")
-- `OLLAMA_GENERATION_MODEL`: Model for text generation (e.g., "llama3.2:1b")
-- `OLLAMA_VERIFY_SSL`: Verify SSL certificates (default: "true")
-
-**Simple (fallback, no configuration required):**
-- `SIMPLE_EMBEDDING_DIMENSION`: Optional embedding dimension (default: 384)
-
-### 3. Backward Compatibility
-
-**Old Code Continues to Work:**
-```python
-# Old way (still works)
-from nextcloud_mcp_server.embedding import get_embedding_service
-
-service = get_embedding_service()  # Returns singleton Provider
-embeddings = await service.embed_batch(texts)
-```
-
-**New Way (recommended):**
-```python
-# New way (cleaner)
-from nextcloud_mcp_server.providers import get_provider
-
-provider = get_provider()  # Returns singleton Provider
-embeddings = await provider.embed_batch(texts)
-
-# Can also use generation if provider supports it
-if provider.supports_generation:
-    text = await provider.generate("prompt")
-```
-
-**Migration Path:**
-- `embedding/service.py` now wraps `providers.get_provider()` for compatibility
-- `tests/rag_evaluation/llm_providers.py` now uses unified providers
-- Old imports still work, marked as deprecated in docstrings
-
-### 4. Amazon Bedrock Implementation
-
-**Features:**
-- Supports both embeddings and text generation
-- Model-specific request/response handling for:
-  - Titan Embed (amazon.titan-embed-text-*)
-  - Cohere Embed (cohere.embed-*)
-  - Claude (anthropic.claude-*)
-  - Llama (meta.llama3-*)
-  - Titan Text (amazon.titan-text-*)
-  - Mistral (mistral.*)
-- Uses boto3 bedrock-runtime client
-- Graceful degradation if boto3 not installed
-- Async implementation matching existing patterns
-
-**Model-Specific Handling:**
-```python
-# Bedrock embedding request (Titan)
-{"inputText": text}
-
-# Bedrock generation request (Claude)
-{
-    "anthropic_version": "bedrock-2023-05-31",
-    "max_tokens": max_tokens,
-    "temperature": 0.7,
-    "messages": [{"role": "user", "content": prompt}]
-}
-```
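A dispatcher for the two request shapes shown above might look like the sketch below. `build_request_body` is a hypothetical helper, not the provider's actual code, and it deliberately covers only the Titan Embed and Claude formats documented here; the other model families are left out rather than guessed.

```python
import json


def build_request_body(model_id: str, text: str, max_tokens: int = 500) -> str:
    """Serialize the JSON request body for the given Bedrock model ID."""
    if model_id.startswith("amazon.titan-embed-text"):
        # Titan Embed takes the raw input text.
        body = {"inputText": text}
    elif model_id.startswith("anthropic.claude"):
        # Claude uses the Messages format with a Bedrock-specific version tag.
        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "temperature": 0.7,
            "messages": [{"role": "user", "content": text}],
        }
    else:
        raise ValueError(f"No request format defined in this sketch for {model_id!r}")
    return json.dumps(body)
```

The returned string is what would be passed as the `body` argument of the bedrock-runtime client's `invoke_model()` call, alongside `modelId`.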
-
-## Consequences
-
-### Positive
-
-1. **Sustainable Provider Additions**
-   - New providers only need to implement `Provider` ABC
-   - Auto-detection via environment variables
-   - No modifications to existing code required
-
-2. **Code Consolidation**
-   - Single provider interface instead of two
-   - Unified configuration pattern
-   - Eliminated duplication
-
-3. **Better Extensibility**
-   - Providers can support one or both capabilities
-   - Clear capability detection via properties
-   - Registry pattern simplifies auto-detection
-
-4. **Improved Testing**
-   - RAG evaluation can use any provider (Ollama, Anthropic, Bedrock)
-   - Comprehensive unit tests for all providers
-   - Mocked boto3 tests for Bedrock
-
-5. **Production-Ready Bedrock Support**
-   - Full embedding and generation support
-   - Multiple model families supported
-   - AWS credential chain integration
-
-### Neutral
-
-1. **Optional Boto3 Dependency**
-   - boto3 is a dev dependency only (not required for core functionality)
-   - Bedrock provider fails gracefully if boto3 is not installed
-   - Users who want Bedrock must `pip install boto3`
-
-2. **Capability Properties**
-   - All providers must implement capability properties
-   - Methods raise `NotImplementedError` if capability not supported
-   - Clear error messages guide users to alternatives
-
-### Negative
-
-1. **Migration Effort**
-   - Existing code must be migrated to new imports (optional, backward compatible)
-   - Documentation needs updating
-   - Users must learn new environment variables
-
-2. **Increased Complexity**
-   - Provider base class has more methods (embedding + generation)
-   - More environment variables to configure
-   - Capability detection adds runtime checks
-
-## Implementation
-
-### Files Created
-
-**New Provider Infrastructure:**
-- `nextcloud_mcp_server/providers/__init__.py`
-- `nextcloud_mcp_server/providers/base.py`
-- `nextcloud_mcp_server/providers/registry.py`
-- `nextcloud_mcp_server/providers/ollama.py`
-- `nextcloud_mcp_server/providers/anthropic.py`
-- `nextcloud_mcp_server/providers/bedrock.py`
-- `nextcloud_mcp_server/providers/simple.py`
-
-**Tests:**
-- `tests/unit/providers/__init__.py`
-- `tests/unit/providers/test_bedrock.py` (9 unit tests)
-
-**Documentation:**
-- `docs/ADR-015-unified-provider-architecture.md` (this file)
-
-### Files Modified
-
-**Backward Compatibility:**
-- `nextcloud_mcp_server/embedding/service.py` - Now wraps `get_provider()`
-- `tests/rag_evaluation/llm_providers.py` - Uses unified providers
-
-**Dependencies:**
-- `pyproject.toml` - Added `boto3>=1.35.0` to dev dependencies
-
-### Testing Results
-
-**Unit Tests:** 127 passed (including 9 new Bedrock tests)
-**Type Checking:** All checks passed (ty)
-**Linting:** All checks passed (ruff)
-**Backward Compatibility:** Verified - existing embedding tests work
-
-## Alternatives Considered
-
-### Alternative 1: Keep Separate Provider Systems
-
-**Pros:**
-- No refactoring needed
-- Simpler short-term
-
-**Cons:**
-- Bedrock would need to be implemented twice
-- Continued code duplication
-- No long-term scalability
-
-**Decision:** Rejected - technical debt would continue to grow
-
-### Alternative 2: Separate Embedding and Generation Providers
-
-Use composition instead of unified interface:
-```python
-class CombinedProvider:
-    def __init__(self, embedding: EmbeddingProvider, generation: LLMProvider):
-        self.embedding = embedding
-        self.generation = generation
-```
-
-**Pros:**
-- Clearer separation of concerns
-- Simpler individual providers
-
-**Cons:**
-- Bedrock and Ollama naturally do both - artificial separation
-- More complex configuration (two providers to configure)
-- More boilerplate code
-
-**Decision:** Rejected - unified interface better matches provider capabilities
-
-### Alternative 3: Plugin System
-
-Dynamic provider registration via entry points:
-```python
-# setup.py
-entry_points={
-    'nextcloud_mcp.providers': [
-        'ollama = nextcloud_mcp_server.providers.ollama:OllamaProvider',
-        'bedrock = nextcloud_mcp_server.providers.bedrock:BedrockProvider',
-    ]
-}
-```
-
-**Pros:**
-- Most extensible
-- Third-party providers possible
-
-**Cons:**
-- Over-engineered for current needs
-- Added complexity
-- No immediate benefit
-
-**Decision:** Deferred - can add later if needed
-
-## Future Work
-
-1. **Additional Providers**
-   - OpenAI (embeddings + generation)
-   - Cohere (embeddings + generation)
-   - Google Vertex AI
-   - Azure OpenAI
-
-2. **Provider Features**
-   - Streaming generation support
-   - Batch API optimization (when available)
-   - Model-specific optimizations
-   - Cost tracking and metrics
-
-3. **Configuration Improvements**
-   - Provider profiles (development, production)
-   - Model aliasing (e.g., "small", "large")
-   - Fallback provider chains
-
-4. **Testing**
-   - Integration tests with real Bedrock endpoints
-   - Performance benchmarking across providers
-   - Cost comparison analysis
-
-## References
-
-- [boto3 Bedrock Runtime Documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime.html)
-- [Amazon Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html)
-- ADR-003: Vector Database and Semantic Search
-- ADR-008: MCP Sampling for Semantic Search
-- ADR-013: RAG Evaluation Framework
@@ -1,338 +0,0 @@
-# Amazon Bedrock Setup Guide
-
-This guide covers how to configure the Nextcloud MCP Server to use Amazon Bedrock for embeddings and text generation.
-
-## Prerequisites
-
-1. **AWS Account** with access to Amazon Bedrock
-2. **boto3 library** installed: `pip install boto3` or `uv sync --group dev`
-3. **Model Access** - Request access to models in AWS Bedrock console
-
-## Required AWS Permissions
-
-### IAM Policy for Bedrock Access
-
-The AWS IAM user or role needs the following permissions:
-
-```json
-{
-  "Version": "2012-10-17",
-  "Statement": [
-    {
-      "Sid": "BedrockInvokeModels",
-      "Effect": "Allow",
-      "Action": [
-        "bedrock:InvokeModel",
-        "bedrock:InvokeModelWithResponseStream"
-      ],
-      "Resource": [
-        "arn:aws:bedrock:*::foundation-model/*"
-      ]
-    }
-  ]
-}
-```
-
-### Minimal Permissions (Production)
-
-For production deployments, restrict to specific models:
-
-```json
-{
-  "Version": "2012-10-17",
-  "Statement": [
-    {
-      "Sid": "BedrockEmbeddings",
-      "Effect": "Allow",
-      "Action": [
-        "bedrock:InvokeModel"
-      ],
-      "Resource": [
-        "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
-      ]
-    },
-    {
-      "Sid": "BedrockGeneration",
-      "Effect": "Allow",
-      "Action": [
-        "bedrock:InvokeModel"
-      ],
-      "Resource": [
-        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
-      ]
-    }
-  ]
-}
-```
-
-### Additional Permissions (Optional)
-
-For advanced use cases:
-
-```json
-{
-  "Version": "2012-10-17",
-  "Statement": [
-    {
-      "Sid": "BedrockListModels",
-      "Effect": "Allow",
-      "Action": [
-        "bedrock:ListFoundationModels",
-        "bedrock:GetFoundationModel"
-      ],
-      "Resource": "*"
-    },
-    {
-      "Sid": "BedrockAsyncInvoke",
-      "Effect": "Allow",
-      "Action": [
-        "bedrock:InvokeModelAsync",
-        "bedrock:GetAsyncInvoke",
-        "bedrock:ListAsyncInvokes"
-      ],
-      "Resource": [
-        "arn:aws:bedrock:*::foundation-model/*"
-      ]
-    }
-  ]
-}
-```
-
-## Model Access
-
-Before using Bedrock models, you must request access in the AWS Console:
-
-1. Navigate to **Amazon Bedrock** → **Model access**
-2. Click **Manage model access**
-3. Select models you want to use:
-   - **Embeddings:** Amazon Titan Embed Text, Cohere Embed
-   - **Text Generation:** Anthropic Claude, Meta Llama, Amazon Titan Text
-4. Click **Request model access**
-5. Wait for approval (usually instant for most models)
-
-## Supported Models
-
-### Embedding Models
-
-| Provider | Model ID | Dimensions | Best For |
-|----------|----------|------------|----------|
-| Amazon Titan | `amazon.titan-embed-text-v1` | 1,536 | General purpose |
-| Amazon Titan | `amazon.titan-embed-text-v2:0` | 1,024 | Latest, improved quality |
-| Cohere | `cohere.embed-english-v3` | 1,024 | English text |
-| Cohere | `cohere.embed-multilingual-v3` | 1,024 | Multilingual |
-
-### Text Generation Models
-
-| Provider | Model ID | Context | Best For |
-|----------|----------|---------|----------|
-| Anthropic | `anthropic.claude-3-sonnet-20240229-v1:0` | 200K | Balanced performance |
-| Anthropic | `anthropic.claude-3-haiku-20240307-v1:0` | 200K | Fast, cost-effective |
-| Anthropic | `anthropic.claude-3-opus-20240229-v1:0` | 200K | Highest quality |
-| Meta | `meta.llama3-8b-instruct-v1:0` | 8K | Fast, open-source |
-| Meta | `meta.llama3-70b-instruct-v1:0` | 8K | High quality |
-| Amazon | `amazon.titan-text-express-v1` | 8K | Fast, low cost |
-| Mistral | `mistral.mistral-7b-instruct-v0:2` | 32K | Efficient |
-
-## Configuration
-
-### Environment Variables
-
-**Required:**
-```bash
-AWS_REGION=us-east-1
-```
-
-**Model selection (set at least one):**
-```bash
-# For embeddings
-BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
-
-# For text generation (RAG evaluation)
-BEDROCK_GENERATION_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
-```
-
-**AWS Credentials (choose one method):**
-
-**Method 1: Environment Variables**
-```bash
-AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
-AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
-```
-
-**Method 2: AWS Credentials File** (`~/.aws/credentials`)
-```ini
-[default]
-aws_access_key_id = AKIAIOSFODNN7EXAMPLE
-aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
-```
-
-**Method 3: IAM Role** (when running on AWS EC2/ECS/Lambda)
-- No credentials needed; the instance, task, or execution role is used automatically
-
-### Docker Configuration
-
-Add to your `docker-compose.yml`:
-
-```yaml
-services:
-  mcp:
-    environment:
-      - AWS_REGION=us-east-1
-      - BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
-      - BEDROCK_GENERATION_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
-      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
-      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
-```
-
-Or use AWS credentials file volume mount:
-
-```yaml
-services:
-  mcp:
-    volumes:
-      - ~/.aws:/root/.aws:ro
-    environment:
-      - AWS_REGION=us-east-1
-      - BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
-```
-
-## Usage Examples
-
-### Embeddings Only
-
-```bash
-export AWS_REGION=us-east-1
-export BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
-export AWS_ACCESS_KEY_ID=your-key
-export AWS_SECRET_ACCESS_KEY=your-secret
-
-uv run nextcloud-mcp-server
-```
-
-### Both Embeddings and Generation
-
-```bash
-export AWS_REGION=us-east-1
-export BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
-export BEDROCK_GENERATION_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
-
-# For RAG evaluation with Bedrock
-export RAG_EVAL_PROVIDER=bedrock
-export RAG_EVAL_BEDROCK_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
-
-uv run python -m tests.rag_evaluation.evaluate
-```
-
-### Programmatic Usage
-
-```python
-from nextcloud_mcp_server.providers import BedrockProvider
-
-# Embeddings only
-provider = BedrockProvider(
-    region_name="us-east-1",
-    embedding_model="amazon.titan-embed-text-v2:0",
-)
-
-embeddings = await provider.embed_batch(["text1", "text2"])
-
-# Both capabilities
-provider = BedrockProvider(
-    region_name="us-east-1",
-    embedding_model="amazon.titan-embed-text-v2:0",
-    generation_model="anthropic.claude-3-sonnet-20240229-v1:0",
-)
-
-# Generate embeddings
-embedding = await provider.embed("query text")
-
-# Generate text
-response = await provider.generate("Write a summary", max_tokens=500)
-```
-
-## Cost Considerations
-
-### Embedding Costs (as of Jan 2025)
-
-| Model | Price per 1K tokens |
-|-------|---------------------|
-| Titan Embed Text v2 | $0.0001 |
-| Cohere Embed English v3 | $0.0001 |
-
-### Generation Costs (as of Jan 2025)
-
-| Model | Input (per 1K tokens) | Output (per 1K tokens) |
-|-------|----------------------|------------------------|
-| Claude 3 Haiku | $0.00025 | $0.00125 |
-| Claude 3 Sonnet | $0.003 | $0.015 |
-| Claude 3 Opus | $0.015 | $0.075 |
-| Llama 3 8B | $0.0003 | $0.0006 |
-| Titan Text Express | $0.0002 | $0.0006 |
-
-**Note:** Prices vary by region. Check [AWS Bedrock Pricing](https://aws.amazon.com/bedrock/pricing/) for current rates.
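For a quick back-of-the-envelope estimate, the per-1K-token prices in the tables above can be plugged into a small calculator. The dictionary keys and function names below are hypothetical labels chosen for this sketch, and the prices are the point-in-time figures from the tables, not live rates.

```python
# Prices per 1K tokens, copied from the tables above (USD).
EMBEDDING_PRICE_PER_1K = {
    "titan-embed-text-v2": 0.0001,
    "cohere-embed-english-v3": 0.0001,
}
GENERATION_PRICE_PER_1K = {  # (input price, output price)
    "claude-3-haiku": (0.00025, 0.00125),
    "claude-3-sonnet": (0.003, 0.015),
    "claude-3-opus": (0.015, 0.075),
}


def embedding_cost(model: str, tokens: int) -> float:
    """Estimated cost in USD of embedding the given number of tokens."""
    return tokens / 1000 * EMBEDDING_PRICE_PER_1K[model]


def generation_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD of one generation request."""
    input_price, output_price = GENERATION_PRICE_PER_1K[model]
    return input_tokens / 1000 * input_price + output_tokens / 1000 * output_price


# Embedding one million tokens with Titan Embed Text v2.
index_cost = embedding_cost("titan-embed-text-v2", 1_000_000)
# One Claude 3 Sonnet call with 2,000 input tokens and 500 output tokens.
answer_cost = generation_cost("claude-3-sonnet", 2_000, 500)
```

At these rates, embedding a million tokens costs about $0.10, while the single Sonnet call costs about $0.0135, so generation tends to dominate the bill once query volume grows.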
-
-## Troubleshooting
-
-### Error: "Executable doesn't exist" or boto3 not found
-
-**Solution:**
-```bash
-uv sync --group dev  # Installs boto3
-```
-
-### Error: "AccessDeniedException"
-
-**Causes:**
-1. IAM permissions missing
-2. Model access not requested
-3. Wrong AWS region
-
-**Solution:**
-1. Verify IAM policy includes `bedrock:InvokeModel`
-2. Request model access in Bedrock console
-3. Check model is available in your region
-
-### Error: "ResourceNotFoundException"
-
-**Cause:** Invalid model ID or model not available in region
-
-**Solution:**
-- Verify model ID matches exactly (case-sensitive)
-- Check model availability in your AWS region
-- Use `aws bedrock list-foundation-models` to see available models
-
-### Error: "ThrottlingException"
-
-**Cause:** Rate limit exceeded
-
-**Solution:**
-- Reduce request rate
-- Request quota increase via AWS Support
-- Use batch operations where possible
-
-## Security Best Practices
-
-1. **Use IAM Roles** when running on AWS infrastructure
-2. **Rotate Access Keys** regularly if using IAM users
-3. **Restrict Permissions** to only required models
-4. **Enable CloudTrail** for audit logging
-5. **Use AWS Secrets Manager** for credential management
-6. **Monitor Costs** with AWS Cost Explorer and Budgets
-
-## Regional Availability
-
-Amazon Bedrock is available in:
-- **US East (N. Virginia)**: `us-east-1` ✅ Most models
-- **US West (Oregon)**: `us-west-2` ✅ Most models
-- **Asia Pacific (Singapore)**: `ap-southeast-1`
-- **Asia Pacific (Tokyo)**: `ap-northeast-1`
-- **Europe (Frankfurt)**: `eu-central-1`
-
-**Note:** Model availability varies by region. Check the [AWS Bedrock documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html) for current availability.
-
-## References
-
-- [AWS Bedrock Documentation](https://docs.aws.amazon.com/bedrock/)
-- [AWS Bedrock Pricing](https://aws.amazon.com/bedrock/pricing/)
-- [boto3 Bedrock Runtime API](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime.html)
-- [Provider Architecture ADR](./ADR-015-unified-provider-architecture.md)
@@ -243,7 +243,7 @@ If you see cardinality warnings:
 The observability stack integrates at multiple layers:

 1. **HTTP Layer**: `ObservabilityMiddleware` tracks all HTTP requests
-2. **MCP Layer**: Tools use `@instrument_tool` for automatic metrics and trace span creation
+2. **MCP Layer**: Tools use `@trace_mcp_tool` for span creation
 3. **Client Layer**: `BaseNextcloudClient` tracks all API calls
 4. **OAuth Layer**: Token operations are traced and metered
 5. **Background Tasks**: Vector sync operations emit metrics/traces
@@ -446,7 +446,7 @@ async def app_lifespan_basic(server: FastMCP) -> AsyncIterator[AppContext]:
        # Start background tasks using anyio TaskGroup
        async with anyio.create_task_group() as tg:
            # Start scanner task
-            await tg.start(
+            tg.start_soon(
                scanner_task,
                send_stream,
                shutdown_event,
@@ -457,7 +457,7 @@ async def app_lifespan_basic(server: FastMCP) -> AsyncIterator[AppContext]:

            # Start processor pool (each gets a cloned receive stream)
            for i in range(settings.vector_sync_processor_workers):
-                await tg.start(
+                tg.start_soon(
                    processor_task,
                    i,
                    receive_stream.clone(),
@@ -507,9 +507,9 @@ async def setup_oauth_config():
    - External IdP mode: OIDC_DISCOVERY_URL points to external provider
      → External IdP for OAuth, Nextcloud user_oidc validates tokens and provides API access

-    Uses OIDC environment variables:
+    Uses generic OIDC environment variables:
    - OIDC_DISCOVERY_URL: OIDC discovery endpoint (optional, defaults to NEXTCLOUD_HOST)
-    - NEXTCLOUD_OIDC_CLIENT_ID / NEXTCLOUD_OIDC_CLIENT_SECRET: Static credentials (optional, uses DCR if not provided)
+    - OIDC_CLIENT_ID / OIDC_CLIENT_SECRET: Static credentials (optional, uses DCR if not provided)
    - NEXTCLOUD_OIDC_SCOPES: Requested OAuth scopes

    This is done synchronously before FastMCP initialization because FastMCP
@@ -633,21 +633,19 @@ async def setup_oauth_config():
            )

    # Load client credentials (static or dynamic registration)
-    client_id = os.getenv("NEXTCLOUD_OIDC_CLIENT_ID")
-    client_secret = os.getenv("NEXTCLOUD_OIDC_CLIENT_SECRET")
+    client_id = os.getenv("OIDC_CLIENT_ID")
+    client_secret = os.getenv("OIDC_CLIENT_SECRET")

    if client_id and client_secret:
        logger.info(f"Using static OIDC client credentials: {client_id}")
    elif registration_endpoint:
-        logger.info(
-            "NEXTCLOUD_OIDC_CLIENT_ID not set, attempting Dynamic Client Registration"
-        )
+        logger.info("OIDC_CLIENT_ID not set, attempting Dynamic Client Registration")
        client_id, client_secret = await load_oauth_client_credentials(
            nextcloud_host=nextcloud_host, registration_endpoint=registration_endpoint
        )
    else:
        raise ValueError(
-            "NEXTCLOUD_OIDC_CLIENT_ID and NEXTCLOUD_OIDC_CLIENT_SECRET environment variables are required "
+            "OIDC_CLIENT_ID and OIDC_CLIENT_SECRET environment variables are required "
            "when the OIDC provider does not support Dynamic Client Registration. "
            f"Discovery URL: {discovery_url}"
        )
@@ -1147,7 +1145,7 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
                # Start background tasks using anyio TaskGroup
                async with anyio_module.create_task_group() as tg:
                    # Start scanner task
-                    await tg.start(
+                    tg.start_soon(
                        scanner_task,
                        send_stream,
                        shutdown_event,
@@ -1158,7 +1156,7 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):

                    # Start processor pool (each gets a cloned receive stream)
                    for i in range(settings.vector_sync_processor_workers):
-                        await tg.start(
+                        tg.start_soon(
                            processor_task,
                            i,
                            receive_stream.clone(),
@@ -1477,11 +1475,6 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
        user_info_html,
        vector_sync_status_fragment,
    )
-    from nextcloud_mcp_server.auth.viz_routes import (
-        chunk_context_endpoint,
-        vector_visualization_html,
-        vector_visualization_search,
-    )
    from nextcloud_mcp_server.auth.webhook_routes import (
        disable_webhook_preset,
        enable_webhook_preset,
@@ -1501,20 +1494,6 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):
            vector_sync_status_fragment,
            methods=["GET"],
        ),  # /app/vector-sync/status
-        # Vector visualization routes
-        Route(
-            "/vector-viz", vector_visualization_html, methods=["GET"]
-        ),  # /app/vector-viz
-        Route(
-            "/vector-viz/search",
-            vector_visualization_search,
-            methods=["GET"],
-        ),  # /app/vector-viz/search
-        Route(
-            "/chunk-context",
-            chunk_context_endpoint,
-            methods=["GET"],
-        ),  # /app/chunk-context
        # Webhook management routes (admin-only)
        Route("/webhooks", webhook_management_pane, methods=["GET"]),  # /app/webhooks
        Route(
@@ -1529,7 +1508,7 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):

    browser_app = Starlette(routes=browser_routes)
    browser_app.add_middleware(
-        AuthenticationMiddleware,  # type: ignore[invalid-argument-type]
+        AuthenticationMiddleware,
        backend=SessionAuthBackend(oauth_enabled=oauth_enabled),
    )

@@ -1619,7 +1598,7 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):

    # Add CORS middleware to allow browser-based clients like MCP Inspector
    app.add_middleware(
-        CORSMiddleware,  # type: ignore[invalid-argument-type]
+        CORSMiddleware,
        allow_origins=["*"],  # Allow all origins for development
        allow_credentials=True,
        allow_methods=["*"],
@@ -1629,7 +1608,7 @@ def get_app(transport: str = "sse", enabled_apps: list[str] | None = None):

    # Add observability middleware (metrics + tracing)
    if settings.metrics_enabled or settings.otel_exporter_otlp_endpoint:
-        app.add_middleware(ObservabilityMiddleware)  # type: ignore[invalid-argument-type]
+        app.add_middleware(ObservabilityMiddleware)
        logger.info("Observability middleware enabled (metrics and/or tracing)")

    # Add exception handler for scope challenges (OAuth mode only)
@@ -1310,7 +1310,7 @@ async def generate_encryption_key() -> str:

 # Example usage
 if __name__ == "__main__":
-    import anyio
+    import asyncio

    async def main():
        # Generate a key for testing
@@ -1318,4 +1318,4 @@ if __name__ == "__main__":
        print(f"Generated encryption key: {key}")
        print(f"Set this in your environment: export TOKEN_ENCRYPTION_KEY='{key}'")

-    anyio.run(main)
+    asyncio.run(main())
@@ -1,339 +0,0 @@
-<style>
-    .viz-card {
-        background: white;
-        border-radius: 8px;
-        padding: 20px;
-        margin-bottom: 20px;
-        box-shadow: 0 2px 4px rgba(0,0,0,0.1);
-    }
-    .viz-controls {
-        margin-bottom: 20px;
-    }
-    .viz-control-row {
-        display: grid;
-        grid-template-columns: 2fr 1fr auto;
-        gap: 12px;
-        margin-bottom: 12px;
-        align-items: end;
-    }
-    .viz-control-group {
-        margin-bottom: 15px;
-    }
-    .viz-control-group label {
-        display: block;
-        margin-bottom: 5px;
-        font-weight: 500;
-        color: #333;
-    }
-    .viz-control-group input[type="text"],
-    .viz-control-group input[type="number"],
-    .viz-control-group select {
-        width: 100%;
-        padding: 8px 12px;
-        border: 1px solid #ddd;
-        border-radius: 4px;
-        font-size: 14px;
-    }
-    .viz-control-group input[type="range"] {
-        width: 100%;
-    }
-    .viz-control-group select[multiple] {
-        min-height: 100px;
-    }
-    .viz-weight-display {
-        display: inline-block;
-        min-width: 40px;
-        text-align: right;
-        color: #666;
-    }
-    .viz-btn {
-        background: #0066cc;
-        color: white;
-        border: none;
-        padding: 10px 20px;
-        border-radius: 4px;
-        cursor: pointer;
-        font-size: 14px;
-        font-weight: 500;
-    }
-    .viz-btn:hover {
-        background: #0052a3;
-    }
-    .viz-btn-secondary {
-        background: #6c757d;
-        color: white;
-        border: none;
-        padding: 6px 12px;
-        border-radius: 4px;
-        cursor: pointer;
-        font-size: 13px;
-        margin-bottom: 12px;
-    }
-    .viz-btn-secondary:hover {
-        background: #5a6268;
-    }
-    #viz-plot-container {
-        width: 100%;
-        height: 600px;
-        position: relative;
-    }
-    #viz-plot {
-        width: 100%;
-        height: 100%;
-    }
-    .viz-loading {
-        text-align: center;
-        padding: 40px;
-        color: #666;
-    }
-    .viz-loading-overlay {
-        position: absolute;
-        inset: 0;
-        display: flex;
-        align-items: center;
-        justify-content: center;
-        background: white;
-        color: #666;
-    }
-    .viz-no-results {
-        text-align: center;
-        padding: 40px;
-        color: #666;
-        font-style: italic;
-    }
-    .viz-advanced-section {
-        margin-top: 16px;
-        padding: 16px;
-        background: #f8f9fa;
-        border-radius: 4px;
-        border: 1px solid #dee2e6;
-    }
-    .viz-advanced-grid {
-        display: grid;
-        grid-template-columns: 1fr 1fr;
-        gap: 20px;
-    }
-    .viz-info-box {
-        background: #e3f2fd;
-        border-left: 4px solid #2196f3;
-        padding: 12px;
-        margin-bottom: 20px;
-        font-size: 14px;
-    }
-    .chunk-toggle-btn {
-        background: #6c757d;
-        color: white;
-        border: none;
-        padding: 4px 10px;
-        border-radius: 3px;
-        cursor: pointer;
-        font-size: 12px;
-        margin-top: 6px;
-    }
-    .chunk-toggle-btn:hover {
-        background: #5a6268;
-    }
-    .chunk-context {
-        background: #f8f9fa;
-        border: 1px solid #dee2e6;
-        border-radius: 4px;
-        padding: 12px;
-        margin-top: 8px;
-        font-family: monospace;
-        font-size: 13px;
-        line-height: 1.6;
-        white-space: pre-wrap;
-        word-wrap: break-word;
-    }
-    .chunk-text {
-        color: #666;
-    }
-    .chunk-matched {
-        background: #fff3cd;
-        border: 1px solid #ffc107;
-        padding: 2px 4px;
-        border-radius: 2px;
-        font-weight: 500;
-        color: #333;
-    }
-    .chunk-ellipsis {
-        color: #999;
-        font-style: italic;
-    }
-</style>
-
-<div x-data="vizApp()">
-    <div class="viz-card">
-        <h2>Vector Visualization</h2>
-        <div class="viz-info-box">
-            Testing search algorithms on your indexed documents. User: <strong>{{ username }}</strong>
-        </div>
-
-        <form @submit.prevent="executeSearch">
-            <div class="viz-controls">
-                <!-- Main Controls -->
-                <div class="viz-control-group">
-                    <label>Search Query</label>
-                    <input type="text" x-model="query" placeholder="Enter search query..." required />
-                </div>
-
-                <div class="viz-control-row">
-                    <div class="viz-control-group" style="margin-bottom: 0;">
-                        <label>Algorithm</label>
-                        <select x-model="algorithm">
-                            <option value="semantic">Semantic (Dense Vectors)</option>
-                            <option value="bm25_hybrid" selected>BM25 Hybrid (Dense + Sparse)</option>
-                        </select>
-                    </div>
-
-                    <div class="viz-control-group" style="margin-bottom: 0;">
-                        <label>Fusion Method</label>
-                        <select x-model="fusion" :disabled="algorithm !== 'bm25_hybrid'" :style="algorithm !== 'bm25_hybrid' ? 'opacity: 0.5; cursor: not-allowed;' : ''">
-                            <option value="rrf" selected>RRF (Reciprocal Rank Fusion)</option>
-                            <option value="dbsf">DBSF (Distribution-Based Score Fusion)</option>
-                        </select>
-                    </div>
-
-                    <div style="display: flex; align-items: flex-end;">
-                        <button type="submit" class="viz-btn" style="width: 100%;">Search & Visualize</button>
-                    </div>
-
-                    <div style="display: flex; align-items: flex-end;">
-                        <button type="button" class="viz-btn-secondary" @click="showAdvanced = !showAdvanced" style="white-space: nowrap;">
-                            <span x-text="showAdvanced ? 'Hide Advanced' : 'Advanced'"></span>
-                        </button>
-                    </div>
-                </div>
-
-                <!-- Advanced Options (Collapsible) -->
-                <div class="viz-advanced-section" x-show="showAdvanced" x-transition.opacity.duration.200ms>
-                    <h3 style="margin-top: 0; margin-bottom: 16px; font-size: 16px;">Advanced Options</h3>
-
-                    <div class="viz-advanced-grid">
-                        <div class="viz-control-group">
-                            <label style="display: block; margin-bottom: 8px;">Document Types</label>
-                            <div style="display: grid; grid-template-columns: 1fr; gap: 6px;">
-                                <label style="display: flex; align-items: center; cursor: pointer; font-weight: normal;">
-                                    <input type="checkbox" x-model="docTypes" value="" style="margin-right: 8px;">
-                                    <span>All Types</span>
-                                </label>
-                                <label style="display: flex; align-items: center; cursor: pointer; font-weight: normal;">
-                                    <input type="checkbox" x-model="docTypes" value="note" style="margin-right: 8px;">
-                                    <span>Notes</span>
-                                </label>
-                                <label style="display: flex; align-items: center; cursor: pointer; font-weight: normal;">
-                                    <input type="checkbox" x-model="docTypes" value="file" style="margin-right: 8px;">
-                                    <span>Files</span>
-                                </label>
-                                <label style="display: flex; align-items: center; cursor: pointer; font-weight: normal;">
-                                    <input type="checkbox" x-model="docTypes" value="calendar" style="margin-right: 8px;">
-                                    <span>Calendar Events</span>
-                                </label>
-                                <label style="display: flex; align-items: center; cursor: pointer; font-weight: normal;">
-                                    <input type="checkbox" x-model="docTypes" value="contact" style="margin-right: 8px;">
-                                    <span>Contacts</span>
-                                </label>
-                                <label style="display: flex; align-items: center; cursor: pointer; font-weight: normal;">
-                                    <input type="checkbox" x-model="docTypes" value="deck" style="margin-right: 8px;">
-                                    <span>Deck Cards</span>
-                                </label>
-                            </div>
-                        </div>
-
-                        <div>
-                            <div class="viz-control-group">
-                                <label>Score Threshold (Semantic/Hybrid)</label>
-                                <input type="number" x-model.number="scoreThreshold" min="0" max="1" step="any" />
-                            </div>
-
-                            <div class="viz-control-group">
-                                <label>Result Limit</label>
-                                <input type="number" x-model.number="limit" min="1" max="100" />
-                            </div>
-                        </div>
-                    </div>
-
-                    <!-- Info: BM25 Hybrid fusion methods -->
-                    <div x-show="algorithm === 'bm25_hybrid'" style="margin-top: 16px; padding: 12px; background: #e9ecef; border-radius: 4px;">
-                        <p style="margin: 0; font-size: 14px; color: #666;">
-                            <strong>BM25 Hybrid Search:</strong> Combines dense semantic vectors with sparse BM25 keyword vectors.
-                        </p>
-                        <p style="margin: 8px 0 0 0; font-size: 13px; color: #666;">
-                            <strong>RRF:</strong> Reciprocal Rank Fusion - Rank-based fusion producing scores in [0.0, 1.0]
-                        </p>
-                        <p style="margin: 4px 0 0 0; font-size: 13px; color: #666;">
-                            <strong>DBSF:</strong> Distribution-Based Score Fusion - Sums normalized scores (can exceed 1.0)
-                        </p>
-                    </div>
-                </div>
-            </div>
-        </form>
-    </div>
-
-    <div class="viz-card">
-        <div id="viz-plot-container">
-            <div x-show="loading" class="viz-loading-overlay" x-transition.opacity.duration.200ms>
-                Executing search and computing PCA projection...
-            </div>
-            <div id="viz-plot" x-show="!loading" x-transition.opacity.duration.200ms></div>
-        </div>
-    </div>
-
-    <div class="viz-card">
-        <h3>Search Results (<span x-text="loading ? '...' : results.length"></span>)</h3>
-
-        <div x-show="loading" class="viz-loading" x-transition.opacity.duration.200ms>
-            Loading results...
-        </div>
-
-        <div x-show="!loading && results.length === 0" class="viz-no-results" x-transition.opacity.duration.200ms>
-            No results found. Try a different query or adjust your search parameters.
-        </div>
-
-        <template x-if="!loading && results.length > 0">
-            <div x-transition.opacity.duration.200ms>
-                <template x-for="result in results" :key="result.id">
-                    <div style="padding: 12px; border-bottom: 1px solid #eee;">
-                        <a :href="getNextcloudUrl(result)" target="_blank" style="font-weight: 500; color: #0066cc; text-decoration: none;">
-                            <span x-text="result.title"></span>
-                        </a>
-                        <div style="font-size: 14px; color: #666; margin-top: 4px;" x-text="result.excerpt"></div>
-                        <div style="font-size: 12px; color: #999; margin-top: 4px;">
-                            Raw Score: <span x-text="result.original_score.toFixed(3)"></span>
-                            (<span x-text="(result.score * 100).toFixed(0)"></span>% relative) |
-                            Type: <span x-text="result.doc_type"></span>
-                        </div>
-
-                        <!-- Show Chunk button (only if chunk position is available) -->
-                        <template x-if="hasChunkPosition(result)">
-                            <button
-                                class="chunk-toggle-btn"
-                                @click="toggleChunk(result)"
-                                x-text="isChunkExpanded(`${result.doc_type}_${result.id}`) ? 'Hide Chunk' : 'Show Chunk'"
-                            ></button>
-                        </template>
-
-                        <!-- Chunk context (expanded inline) -->
-                        <template x-if="isChunkExpanded(`${result.doc_type}_${result.id}`)">
-                            <div class="chunk-context" x-transition.opacity.duration.200ms>
-                                <template x-if="chunkLoading[`${result.doc_type}_${result.id}`]">
-                                    <div style="color: #666; font-style: italic;">Loading chunk...</div>
-                                </template>
-                                <template x-if="!chunkLoading[`${result.doc_type}_${result.id}`]">
-                                    <div>
-                                        <template x-if="expandedChunks[`${result.doc_type}_${result.id}`]?.has_more_before">
-                                            <span class="chunk-ellipsis">...</span>
-                                        </template>
-                                        <span class="chunk-text" x-text="expandedChunks[`${result.doc_type}_${result.id}`]?.before_context"></span><span class="chunk-matched" x-text="expandedChunks[`${result.doc_type}_${result.id}`]?.chunk_text"></span><span class="chunk-text" x-text="expandedChunks[`${result.doc_type}_${result.id}`]?.after_context"></span><template x-if="expandedChunks[`${result.doc_type}_${result.id}`]?.has_more_after">
-                                            <span class="chunk-ellipsis">...</span>
-                                        </template>
-                                    </div>
-                                </template>
-                            </div>
-                        </template>
-                    </div>
-                </template>
-            </div>
-        </template>
-    </div>
-</div>
@@ -14,11 +14,11 @@ The Token Broker provides:
 - Session vs background token separation (RFC 8693)
 """

+import asyncio
 import logging
 from datetime import datetime, timedelta, timezone
 from typing import Dict, Optional, Tuple

-import anyio
 import httpx
 import jwt
 from cryptography.fernet import Fernet
@@ -43,7 +43,7 @@ class TokenCache:
        self._cache: Dict[str, Tuple[str, datetime]] = {}
        self._ttl = timedelta(seconds=ttl_seconds)
        self._early_refresh = timedelta(seconds=early_refresh_seconds)
-        self._lock = anyio.Lock()
+        self._lock = asyncio.Lock()

    async def get(self, user_id: str) -> Optional[str]:
        """Get cached token if valid."""
@@ -489,16 +489,6 @@ async def user_info_html(request: Request) -> HTMLResponse:
            str(request.url_for("oauth_logout")) if oauth_ctx else "/oauth/logout"
        )

-    # Get Nextcloud host for generating links to apps (used by viz tab)
-    # Use public issuer URL if available (for browser-accessible links),
-    # otherwise fall back to NEXTCLOUD_HOST from settings
-    from nextcloud_mcp_server.config import get_settings
-
-    settings = get_settings()
-    nextcloud_host_for_links = (
-        os.getenv("NEXTCLOUD_PUBLIC_ISSUER_URL") or settings.nextcloud_host
-    )
-
    # Build host info HTML (BasicAuth only)
    host_info_html = ""
    if auth_mode == "basic":
@@ -668,174 +658,6 @@ async def user_info_html(request: Request) -> HTMLResponse:
        <!-- Alpine.js for tab state management -->
        <script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"></script>

-        <!-- Plotly.js for vector visualization -->
-        <script src="https://cdn.plot.ly/plotly-2.27.0.min.js"></script>
-
-        <!-- Vector visualization app (Alpine.js component) -->
-        <script>
-            function vizApp() {{
-                return {{
-                    query: '',
-                    algorithm: 'bm25_hybrid',
-                    fusion: 'rrf',  // Default fusion method for BM25 Hybrid
-                    showAdvanced: false,
-                    docTypes: [''],  // Default to "All Types"
-                    limit: 50,
-                    scoreThreshold: 0.0,
-                    loading: false,
-                    results: [],
-                    expandedChunks: {{}},  // Track which chunks are expanded (result_id -> chunk data)
-                    chunkLoading: {{}},    // Track loading state per result
-
-                    async executeSearch() {{
-                        this.loading = true;
-                        this.results = [];
-
-                        try {{
-                            const params = new URLSearchParams({{
-                                query: this.query,
-                                algorithm: this.algorithm,
-                                limit: this.limit,
-                                score_threshold: this.scoreThreshold,
-                            }});
-
-                            // Add fusion parameter for BM25 Hybrid
-                            if (this.algorithm === 'bm25_hybrid') {{
-                                params.append('fusion', this.fusion);
-                            }}
-
-                            // Add doc_types parameter (filter out empty string for "All Types")
-                            const selectedTypes = this.docTypes.filter(t => t !== '');
-                            if (selectedTypes.length > 0) {{
-                                params.append('doc_types', selectedTypes.join(','));
-                            }}
-
-                            const response = await fetch(`/app/vector-viz/search?${{params}}`);
-                            const data = await response.json();
-
-                            if (data.success) {{
-                                this.results = data.results;
-                                this.renderPlot(data.coordinates_2d, data.results);
-                            }} else {{
-                                alert('Search failed: ' + data.error);
-                            }}
-                        }} catch (error) {{
-                            alert('Error: ' + error.message);
-                        }} finally {{
-                            this.loading = false;
-                        }}
-                    }},
-
-                    renderPlot(coordinates, results) {{
-                        // Calculate score range for auto-scaling
-                        const scores = results.map(r => r.score);
-                        const minScore = Math.min(...scores);
-                        const maxScore = Math.max(...scores);
-
-                        const trace = {{
-                            x: coordinates.map(c => c[0]),
-                            y: coordinates.map(c => c[1]),
-                            mode: 'markers',
-                            type: 'scatter',
-                            text: results.map(r => `${{r.title}}<br>Raw Score: ${{r.original_score.toFixed(3)}} (${{(r.score * 100).toFixed(0)}}% relative)`),
-                            marker: {{
-                                // Multi-channel encoding: size + opacity + color for visual hierarchy
-                                // Power scaling (score^2) amplifies visual differences dramatically
-                                // score=0.0 → 6px, score=0.5 → 9.5px, score=1.0 → 20px
-                                size: results.map(r => 6 + (Math.pow(r.score, 2) * 14)),
-                                // Linear opacity scaling (0.2-1.0 range keeps all points visible)
-                                opacity: results.map(r => 0.2 + (r.score * 0.8)),
-                                // Color gradient shows score
-                                color: scores,
-                                colorscale: 'Viridis',
-                                showscale: true,
-                                colorbar: {{ title: 'Relative Score' }},
-                                // Scores are normalized 0-1 within result set
-                                cmin: 0,
-                                cmax: 1
-                            }}
-                        }};
-
-                        const layout = {{
-                            title: `Vector Space (PCA 2D) - ${{results.length}} results`,
-                            xaxis: {{ title: 'PC1' }},
-                            yaxis: {{ title: 'PC2' }},
-                            hovermode: 'closest',
-                            height: 600
-                        }};
-
-                        Plotly.newPlot('viz-plot', [trace], layout);
-                    }},
-
-                    getNextcloudUrl(result) {{
-                        // Generate Nextcloud URL based on document type
-                        // Use the actual Nextcloud host (port 8080), not the MCP server
-                        const baseUrl = '{nextcloud_host_for_links}';
-
-                        switch (result.doc_type) {{
-                            case 'note':
-                                return `${{baseUrl}}/apps/notes/note/${{result.id}}`;
-                            case 'file':
-                                return `${{baseUrl}}/apps/files/?fileId=${{result.id}}`;
-                            case 'calendar':
-                                return `${{baseUrl}}/apps/calendar`;
-                            case 'contact':
-                                return `${{baseUrl}}/apps/contacts`;
-                            case 'deck':
-                                return `${{baseUrl}}/apps/deck`;
-                            default:
-                                return `${{baseUrl}}`;
-                        }}
-                    }},
-
-                    hasChunkPosition(result) {{
-                        // Check if result has position metadata
-                        return result.chunk_start_offset != null && result.chunk_end_offset != null;
-                    }},
-
-                    isChunkExpanded(resultKey) {{
-                        return this.expandedChunks[resultKey] !== undefined;
-                    }},
-
-                    async toggleChunk(result) {{
-                        const resultKey = `${{result.doc_type}}_${{result.id}}`;
-
-                        // If already expanded, collapse
-                        if (this.isChunkExpanded(resultKey)) {{
-                            delete this.expandedChunks[resultKey];
-                            return;
-                        }}
-
-                        // Otherwise, fetch and expand
-                        this.chunkLoading[resultKey] = true;
-
-                        try {{
-                            const params = new URLSearchParams({{
-                                doc_type: result.doc_type,
-                                doc_id: result.id,
-                                start: result.chunk_start_offset,
-                                end: result.chunk_end_offset,
-                                context: 500  // 500 chars before/after
-                            }});
-
-                            const response = await fetch(`/app/chunk-context?${{params}}`);
-                            const data = await response.json();
-
-                            if (data.success) {{
-                                this.expandedChunks[resultKey] = data;
-                            }} else {{
-                                alert('Failed to load chunk: ' + data.error);
-                            }}
-                        }} catch (error) {{
-                            alert('Error loading chunk: ' + error.message);
-                        }} finally {{
-                            delete this.chunkLoading[resultKey];
-                        }}
-                    }}
-                }}
-            }}
-        </script>
-
        <style>
            body {{
                font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif;
@@ -1024,18 +846,6 @@ async def user_info_html(request: Request) -> HTMLResponse:
                    Vector Sync
                </button>
                '''
-    }
-                {
-        ""
-        if not show_vector_sync_tab
-        else '''
-                <button
-                    class="tab"
-                    :class="activeTab === 'vector-viz' ? 'active' : ''"
-                    @click="activeTab = 'vector-viz'">
-                    Vector Viz
-                </button>
-                '''
    }
                {
        ""
@@ -1071,19 +881,6 @@ async def user_info_html(request: Request) -> HTMLResponse:

                {
        ""
-        if not show_vector_sync_tab
-        else '''
-                <!-- Vector Viz Tab -->
-                <div class="tab-pane" x-show="activeTab === 'vector-viz'" x-transition.opacity.duration.150ms>
-                    <div hx-get="/app/vector-viz" hx-trigger="load" hx-swap="outerHTML">
-                        <p style="color: #999;">Loading vector visualization...</p>
-                    </div>
-                </div>
-                '''
-    }
-
-                {
-        ""
        if not show_webhooks_tab
        else f'''
                <!-- Webhooks Tab (admin-only, loaded dynamically) -->
@@ -1,492 +0,0 @@
-"""Vector visualization routes for testing search algorithms.
-
-Provides a web UI for users to test different search algorithms on their own
-indexed documents and visualize results in 2D space using PCA.
-
-All processing happens server-side following ADR-012:
-- Search execution via shared search/algorithms.py
-- PCA dimensionality reduction (768-dim → 2D)
-- Only 2D coordinates + metadata sent to client
-- Bandwidth-efficient (2 floats per doc vs 768)
-"""
-
-import logging
-import time
-from pathlib import Path
-
-import numpy as np
-from jinja2 import Environment, FileSystemLoader
-from starlette.authentication import requires
-from starlette.requests import Request
-from starlette.responses import HTMLResponse, JSONResponse
-
-from nextcloud_mcp_server.config import get_settings
-from nextcloud_mcp_server.search import (
-    BM25HybridSearchAlgorithm,
-    SemanticSearchAlgorithm,
-)
-from nextcloud_mcp_server.vector.pca import PCA
-from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
-
-logger = logging.getLogger(__name__)
-
-# Setup Jinja2 environment for templates
-_template_dir = Path(__file__).parent / "templates"
-_jinja_env = Environment(loader=FileSystemLoader(_template_dir))
-
-
-@requires("authenticated", redirect="oauth_login")
-async def vector_visualization_html(request: Request) -> HTMLResponse:
-    """Vector visualization page with search controls and interactive plot.
-
-    Provides UI for testing search algorithms with real-time visualization.
-    Requires vector sync to be enabled.
-
-    Args:
-        request: Starlette request object
-
-    Returns:
-        HTML page with search interface
-    """
-    settings = get_settings()
-
-    if not settings.vector_sync_enabled:
-        return HTMLResponse(
-            """
-            <div>
-                <h2>Vector Visualization</h2>
-                <div style="padding: 20px; background: #fff3cd; border: 1px solid #ffc107; border-radius: 4px;">
-                    Vector sync is not enabled. Set VECTOR_SYNC_ENABLED=true to use this feature.
-                </div>
-            </div>
-            """
-        )
-
-    # Get user info from auth context
-    username = (
-        request.user.display_name
-        if hasattr(request.user, "display_name")
-        else "unknown"
-    )
-
-    # Load and render template
-    template = _jinja_env.get_template("vector_viz.html")
-    html_content = template.render(username=username)
-    return HTMLResponse(content=html_content)
-
-
-@requires("authenticated", redirect="oauth_login")
-async def vector_visualization_search(request: Request) -> JSONResponse:
-    """Execute server-side search and return 2D coordinates + results.
-
-    All processing happens server-side:
-    1. Execute search via shared algorithm module
-    2. Fetch matching vectors from Qdrant
-    3. Apply PCA reduction (768-dim → 2D)
-    4. Return coordinates + metadata only
-
-    Args:
-        request: Starlette request with query parameters
-
-    Returns:
-        JSON response with coordinates_2d and results
-    """
-    settings = get_settings()
-
-    if not settings.vector_sync_enabled:
-        return JSONResponse(
-            {"success": False, "error": "Vector sync not enabled"},
-            status_code=400,
-        )
-
-    # Get user info from auth context
-    username = (
-        request.user.display_name if hasattr(request.user, "display_name") else None
-    )
-
-    if not username:
-        return JSONResponse(
-            {"success": False, "error": "User not authenticated"},
-            status_code=401,
-        )
-
-    # Parse query parameters
-    query = request.query_params.get("query", "")
-    algorithm = request.query_params.get("algorithm", "bm25_hybrid")
-    limit = int(request.query_params.get("limit", "50"))
-    score_threshold = float(request.query_params.get("score_threshold", "0.0"))
-    fusion = request.query_params.get("fusion", "rrf")  # Default to RRF
-
-    # Parse doc_types (comma-separated list, None = all types)
-    doc_types_param = request.query_params.get("doc_types", "")
-    doc_types = doc_types_param.split(",") if doc_types_param else None
-
-    logger.info(
-        f"Viz search: user={username}, query='{query}', "
-        f"algorithm={algorithm}, fusion={fusion}, limit={limit}, doc_types={doc_types}"
-    )
-
-    try:
-        # Start total request timer
-        request_start = time.perf_counter()
-        # Get authenticated HTTP client from session
-        # In BasicAuth mode: uses username/password from session
-        # In OAuth mode: uses access token from session
-        from nextcloud_mcp_server.auth.userinfo_routes import (
-            _get_authenticated_client_for_userinfo,
-        )
-
-        async with await _get_authenticated_client_for_userinfo(request) as http_client:  # noqa: F841
-            # Create search algorithm (no client needed - verification removed)
-            if algorithm == "semantic":
-                search_algo = SemanticSearchAlgorithm(score_threshold=score_threshold)
-            elif algorithm == "bm25_hybrid":
-                search_algo = BM25HybridSearchAlgorithm(
-                    score_threshold=score_threshold, fusion=fusion
-                )
-            else:
-                return JSONResponse(
-                    {"success": False, "error": f"Unknown algorithm: {algorithm}"},
-                    status_code=400,
-                )
-
-            # Execute search (supports cross-app when doc_types=None)
-            # Get unverified results with buffer for filtering
-            search_start = time.perf_counter()
-            all_results = []
-            if doc_types is None or len(doc_types) == 0:
-                # Cross-app search - search all indexed types
-                unverified_results = await search_algo.search(
-                    query=query,
-                    user_id=username,
-                    limit=limit * 2,  # Buffer for verification filtering
-                    doc_type=None,  # Search all types
-                    score_threshold=score_threshold,
-                )
-                all_results.extend(unverified_results)
-            else:
-                # Search each document type and combine
-                for doc_type in doc_types:
-                    unverified_results = await search_algo.search(
-                        query=query,
-                        user_id=username,
-                        limit=limit * 2,  # Buffer for verification filtering
-                        doc_type=doc_type,
-                        score_threshold=score_threshold,
-                    )
-                    all_results.extend(unverified_results)
-                # Sort by score before verification
-                all_results.sort(key=lambda r: r.score, reverse=True)
-
-            # No verification needed for visualization - we only need Qdrant metadata
-            # (title, excerpt, doc_type) which is already in search results.
-            # Verification is only needed for sampling (LLM needs full content).
-            search_results = all_results[:limit]
-            search_duration = time.perf_counter() - search_start
-
-        # Store original scores and normalize for visualization
-        # (best result = 1.0, worst result = 0.0 within THIS result set)
-        # This makes visual encoding meaningful regardless of RRF normalization
-        if search_results:
-            scores = [r.score for r in search_results]
-            min_score, max_score = min(scores), max(scores)
-            score_range = max_score - min_score if max_score > min_score else 1.0
-
-            logger.info(
-                f"Normalizing scores for viz: original range [{min_score:.3f}, {max_score:.3f}] "
-                f"→ [0.0, 1.0]"
-            )
-
-            # Store original score and rescale to 0-1 for visualization
-            for r in search_results:
-                # Store original score before normalization
-                r.original_score = r.score
-                # Rescale for visual encoding
-                r.score = (r.score - min_score) / score_range
-
-        if not search_results:
-            return JSONResponse(
-                {
-                    "success": True,
-                    "results": [],
-                    "coordinates_2d": [],
-                    "message": "No results found",
-                }
-            )
-
-        # Fetch vectors for matching results from Qdrant
-        vector_fetch_start = time.perf_counter()
-        qdrant_client = await get_qdrant_client()
-        doc_ids = [r.id for r in search_results]
-
-        # Retrieve vectors for the matching documents
-        from qdrant_client.models import FieldCondition, Filter, MatchAny
-
-        points_response = await qdrant_client.scroll(
-            collection_name=settings.get_collection_name(),
-            scroll_filter=Filter(
-                must=[
-                    FieldCondition(
-                        key="doc_id",
-                        match=MatchAny(any=[str(doc_id) for doc_id in doc_ids]),
-                    ),
-                    FieldCondition(
-                        key="user_id",
-                        match={"value": username},
-                    ),
-                ]
-            ),
-            limit=len(doc_ids) * 2,  # Account for multiple chunks per doc
-            with_vectors=["dense"],  # Only fetch dense vectors for visualization
-            with_payload=["doc_id"],  # Need doc_id to map vectors to results
-        )
-
-        points = points_response[0]
-
-        if not points:
-            return JSONResponse(
-                {
-                    "success": True,
-                    "results": [],
-                    "coordinates_2d": [],
-                    "message": "No vectors found for results",
-                }
-            )
-
-        # Extract dense vectors (handle both named and unnamed vectors)
-        def extract_dense_vector(point):
-            if point.vector is None:
-                return None
-            # If named vectors (dict), extract "dense"
-            if isinstance(point.vector, dict):
-                return point.vector.get("dense")
-            # If unnamed vector (array), use directly
-            return point.vector
-
-        vectors = np.array(
-            [v for v in (extract_dense_vector(p) for p in points) if v is not None]
-        )
-        vector_fetch_duration = time.perf_counter() - vector_fetch_start
-
-        if len(vectors) < 2:
-            # Not enough points for PCA
-            return JSONResponse(
-                {
-                    "success": True,
-                    "results": [
-                        {
-                            "id": r.id,
-                            "doc_type": r.doc_type,
-                            "title": r.title,
-                            "excerpt": r.excerpt,
-                            "score": r.score,
-                        }
-                        for r in search_results
-                    ],
-                    "coordinates_2d": [[0, 0]] * len(search_results),
-                    "message": "Not enough vectors for PCA",
-                }
-            )
-
-        # Apply PCA dimensionality reduction (768-dim → 2D)
-        pca_start = time.perf_counter()
-        pca = PCA(n_components=2)
-        coords_2d = pca.fit_transform(vectors)
-        pca_duration = time.perf_counter() - pca_start
-
-        # After fit, these attributes are guaranteed to be set
-        assert pca.explained_variance_ratio_ is not None
-
-        logger.info(
-            f"PCA explained variance: PC1={pca.explained_variance_ratio_[0]:.3f}, "
-            f"PC2={pca.explained_variance_ratio_[1]:.3f}"
-        )
-
-        # Map results to coordinates (use first chunk per document)
-        result_coords = []
-        seen_doc_ids = set()
-
-        for point, coord in zip(points, coords_2d):
-            if point.payload:
-                doc_id = int(point.payload.get("doc_id", 0))
-                if doc_id not in seen_doc_ids and doc_id in doc_ids:
-                    seen_doc_ids.add(doc_id)
-                    result_coords.append(coord.tolist())
-
-        # Build response
-        response_results = [
-            {
-                "id": r.id,
-                "doc_type": r.doc_type,
-                "title": r.title,
-                "excerpt": r.excerpt,
-                "score": r.score,  # Normalized score for visual encoding (0-1)
-                "original_score": getattr(
-                    r, "original_score", r.score
-                ),  # Raw score from algorithm
-                "chunk_start_offset": r.chunk_start_offset,
-                "chunk_end_offset": r.chunk_end_offset,
-            }
-            for r in search_results
-        ]
-
-        # Calculate total request duration
-        total_duration = time.perf_counter() - request_start
-
-        # Log comprehensive timing metrics
-        logger.info(
-            f"Viz search timing: total={total_duration * 1000:.1f}ms, "
-            f"search={search_duration * 1000:.1f}ms ({search_duration / total_duration * 100:.1f}%), "
-            f"vector_fetch={vector_fetch_duration * 1000:.1f}ms ({vector_fetch_duration / total_duration * 100:.1f}%), "
-            f"pca={pca_duration * 1000:.1f}ms ({pca_duration / total_duration * 100:.1f}%), "
-            f"results={len(search_results)}, vectors={len(vectors)}"
-        )
-
-        return JSONResponse(
-            {
-                "success": True,
-                "results": response_results,
-                "coordinates_2d": result_coords[: len(search_results)],
-                "pca_variance": {
-                    "pc1": float(pca.explained_variance_ratio_[0]),
-                    "pc2": float(pca.explained_variance_ratio_[1]),
-                },
-                "timing": {
-                    "total_ms": round(total_duration * 1000, 2),
-                    "search_ms": round(search_duration * 1000, 2),
-                    "vector_fetch_ms": round(vector_fetch_duration * 1000, 2),
-                    "pca_ms": round(pca_duration * 1000, 2),
-                    "num_results": len(search_results),
-                    "num_vectors": len(vectors),
-                },
-            }
-        )
-
-    except Exception as e:
-        logger.error(f"Viz search error: {e}", exc_info=True)
-        return JSONResponse(
-            {"success": False, "error": str(e)},
-            status_code=500,
-        )
-
-
-@requires("authenticated", redirect="oauth_login")
-async def chunk_context_endpoint(request: Request) -> JSONResponse:
-    """Fetch chunk text with surrounding context for visualization.
-
-    This endpoint retrieves the matched chunk along with surrounding text
-    to provide context for the search result. Used by the viz pane to
-    display chunks inline.
-
-    Query parameters:
-        doc_type: Document type (e.g., "note")
-        doc_id: Document ID
-        start: Chunk start offset (character position)
-        end: Chunk end offset (character position)
-        context: Characters of context before/after (default: 500)
-
-    Returns:
-        JSON with chunk_text, before_context, after_context, and flags
-    """
-    try:
-        # Get query parameters
-        doc_type = request.query_params.get("doc_type")
-        doc_id = request.query_params.get("doc_id")
-        start_str = request.query_params.get("start")
-        end_str = request.query_params.get("end")
-        context_chars = int(request.query_params.get("context", "500"))
-
-        # Validate required parameters
-        if not all([doc_type, doc_id, start_str, end_str]):
-            return JSONResponse(
-                {
-                    "success": False,
-                    "error": "Missing required parameters: doc_type, doc_id, start, end",
-                },
-                status_code=400,
-            )
-
-        start = int(start_str)
-        end = int(end_str)
-
-        # Currently only support notes
-        if doc_type != "note":
-            return JSONResponse(
-                {"success": False, "error": f"Unsupported doc_type: {doc_type}"},
-                status_code=400,
-            )
-
-        # Get authenticated HTTP client and fetch note
-        from nextcloud_mcp_server.auth.userinfo_routes import (
-            _get_authenticated_client_for_userinfo,
-        )
-        from nextcloud_mcp_server.client.notes import NotesClient
-
-        # Get username from request auth
-        username = (
-            request.user.display_name
-            if hasattr(request.user, "display_name")
-            else "unknown"
-        )
-
-        # Create notes client with authenticated HTTP client
-        http_client = await _get_authenticated_client_for_userinfo(request)
-        notes_client = NotesClient(http_client, username)
-
-        # Fetch full note content
-        note = await notes_client.get_note(int(doc_id))
-        full_content = f"{note['title']}\n\n{note['content']}"
-
-        # Validate offsets
-        if start < 0 or end > len(full_content) or start >= end:
-            return JSONResponse(
-                {
-                    "success": False,
-                    "error": f"Invalid offsets: start={start}, end={end}, content_length={len(full_content)}",
-                },
-                status_code=400,
-            )
-
-        # Extract chunk
-        chunk_text = full_content[start:end]
-
-        # Extract context before and after
-        before_start = max(0, start - context_chars)
-        before_context = full_content[before_start:start]
-
-        after_end = min(len(full_content), end + context_chars)
-        after_context = full_content[end:after_end]
-
-        # Determine if there's more content
-        has_more_before = before_start > 0
-        has_more_after = after_end < len(full_content)
-
-        logger.info(
-            f"Fetched chunk context for {doc_type}_{doc_id}: "
-            f"chunk_len={len(chunk_text)}, before_len={len(before_context)}, "
-            f"after_len={len(after_context)}"
-        )
-
-        return JSONResponse(
-            {
-                "success": True,
-                "chunk_text": chunk_text,
-                "before_context": before_context,
-                "after_context": after_context,
-                "has_more_before": has_more_before,
-                "has_more_after": has_more_after,
-            }
-        )
-
-    except ValueError as e:
-        logger.error(f"Invalid parameter format: {e}")
-        return JSONResponse(
-            {"success": False, "error": f"Invalid parameter format: {e}"},
-            status_code=400,
-        )
-    except Exception as e:
-        logger.error(f"Chunk context error: {e}", exc_info=True)
-        return JSONResponse(
-            {"success": False, "error": str(e)},
-            status_code=500,
-        )
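Review note: the offset handling in the removed `chunk_context_endpoint` can be read apart from the HTTP and Nextcloud plumbing. The sketch below restates it as a pure function so the window arithmetic is easy to check. The function name `chunk_with_context` is hypothetical; the default of 500 context characters and the returned keys mirror the removed handler.

```python
def chunk_with_context(
    full_content: str, start: int, end: int, context_chars: int = 500
) -> dict:
    """Return a chunk plus up to `context_chars` of text on either side.

    Hypothetical helper: mirrors the offset logic of the removed endpoint,
    without the request parsing or the note fetch.
    """
    # Same validation as the removed handler: offsets must describe a
    # non-empty slice that lies inside the document.
    if start < 0 or end > len(full_content) or start >= end:
        raise ValueError(
            f"Invalid offsets: start={start}, end={end}, "
            f"content_length={len(full_content)}"
        )

    # Clamp the context window to the document boundaries.
    before_start = max(0, start - context_chars)
    after_end = min(len(full_content), end + context_chars)

    return {
        "chunk_text": full_content[start:end],
        "before_context": full_content[before_start:start],
        "after_context": full_content[end:after_end],
        # True when the window was cut short of the document edge.
        "has_more_before": before_start > 0,
        "has_more_after": after_end < len(full_content),
    }
```

Raising `ValueError` for bad offsets is a choice made for this sketch; the removed handler returned a 400 JSON response instead.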
@@ -5,7 +5,6 @@ import time
 from abc import ABC
 from functools import wraps

-import anyio
 from httpx import AsyncClient, HTTPStatusError, RequestError, codes

 from nextcloud_mcp_server.observability.metrics import (
@@ -48,7 +47,7 @@ def retry_on_429(func):
                    # Record retry metric (extract app name from args if available)
                    if len(args) > 0 and hasattr(args[0], "app_name"):
                        record_nextcloud_api_retry(app=args[0].app_name, reason="429")
-                    await anyio.sleep(5)
+                    time.sleep(5)
                elif e.response.status_code == 404:
                    # 404 errors are often expected (e.g., checking if attachments exist)
                    # Log as debug instead of warning
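Review note: the hunk above replaces `await anyio.sleep(5)` with `time.sleep(5)` inside an async wrapper. A blocking `time.sleep` inside a coroutine holds the event loop for the full delay, so no other task can run while the retry waits. The sketch below shows the non-blocking pattern using the standard library's `asyncio.sleep` rather than `anyio`, purely to stay self-contained. `TooManyRequests`, `retry_on_rate_limit`, and its parameters are hypothetical stand-ins, not the project's `retry_on_429`.

```python
import asyncio
import functools


class TooManyRequests(Exception):
    """Hypothetical stand-in for an HTTP 429 response error."""


def retry_on_rate_limit(max_retries: int = 3, delay: float = 5.0):
    """Retry an async callable when it raises TooManyRequests.

    Assumption: a fixed delay between attempts, as in the diff above.
    """

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except TooManyRequests:
                    if attempt == max_retries:
                        raise  # Out of retries: surface the error.
                    # Yield to the event loop while waiting, so other
                    # coroutines keep running during the back-off.
                    await asyncio.sleep(delay)

        return wrapper

    return decorator
```

With `time.sleep(delay)` in place of the `await`, the wrapper would still retry, but every concurrent request on the same loop would stall for the duration of each back-off.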
@@ -40,7 +40,7 @@ class NotesClient(BaseNextcloudClient):
        seen_ids: set[int] = set()

        while True:
-            params: Dict[str, Any] = {"chunkSize": 100}
+            params: Dict[str, Any] = {"chunkSize": 10}
            if cursor:
                params["chunkCursor"] = cursor
            if prune_before is not None:
@@ -181,8 +181,8 @@ class Settings:
    ollama_verify_ssl: bool = True

    # Document chunking settings (for vector embeddings)
-    document_chunk_size: int = 2048  # Characters per chunk
-    document_chunk_overlap: int = 200  # Overlapping characters between chunks
+    document_chunk_size: int = 512  # Words per chunk
+    document_chunk_overlap: int = 50  # Overlapping words between chunks

    # Observability settings
    metrics_enabled: bool = True
@@ -227,10 +227,10 @@ class Settings:
                f"Overlap should be 10-20% of chunk size for optimal results."
            )

-        if self.document_chunk_size < 512:
+        if self.document_chunk_size < 100:
            logger.warning(
-                f"DOCUMENT_CHUNK_SIZE is set to {self.document_chunk_size} characters, which is quite small. "
-                f"Smaller chunks may lose context. Consider using at least 1024 characters."
+                f"DOCUMENT_CHUNK_SIZE is set to {self.document_chunk_size} words, which is quite small. "
+                f"Smaller chunks may lose context. Consider using at least 256 words."
            )

        if self.document_chunk_overlap < 0:
@@ -288,8 +288,8 @@ def get_settings() -> Settings:
    return Settings(
        # OAuth/OIDC settings
        oidc_discovery_url=os.getenv("OIDC_DISCOVERY_URL"),
-        oidc_client_id=os.getenv("NEXTCLOUD_OIDC_CLIENT_ID"),
-        oidc_client_secret=os.getenv("NEXTCLOUD_OIDC_CLIENT_SECRET"),
+        oidc_client_id=os.getenv("OIDC_CLIENT_ID"),
+        oidc_client_secret=os.getenv("OIDC_CLIENT_SECRET"),
        oidc_issuer=os.getenv("OIDC_ISSUER"),
        # Nextcloud settings
        nextcloud_host=os.getenv("NEXTCLOUD_HOST"),
@@ -335,8 +335,8 @@ def get_settings() -> Settings:
        ollama_embedding_model=os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text"),
        ollama_verify_ssl=os.getenv("OLLAMA_VERIFY_SSL", "true").lower() == "true",
        # Document chunking settings
-        document_chunk_size=int(os.getenv("DOCUMENT_CHUNK_SIZE", "2048")),
-        document_chunk_overlap=int(os.getenv("DOCUMENT_CHUNK_OVERLAP", "200")),
+        document_chunk_size=int(os.getenv("DOCUMENT_CHUNK_SIZE", "512")),
+        document_chunk_overlap=int(os.getenv("DOCUMENT_CHUNK_OVERLAP", "50")),
        # Observability settings
        metrics_enabled=os.getenv("METRICS_ENABLED", "true").lower() == "true",
        metrics_port=int(os.getenv("METRICS_PORT", "9090")),
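Review note: these hunks change the unit of `DOCUMENT_CHUNK_SIZE` and `DOCUMENT_CHUNK_OVERLAP` from characters to words (512 and 50 by default). The chunker itself is not part of this section, so the following is only a minimal sketch of what word-based chunking with overlap could look like, assuming whitespace tokenization. The function name `chunk_words` is hypothetical.

```python
def chunk_words(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into chunks of `chunk_size` words sharing `overlap` words.

    Assumption: words are whitespace-separated; the real chunker may
    tokenize differently.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # Each chunk starts `step` words after the last.
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + chunk_size]))
        # Stop once a chunk reaches the end, to avoid a trailing chunk
        # that only repeats the overlap.
        if start + chunk_size >= len(words):
            break
    return chunks
```

With the defaults, consecutive chunks share 50 words, which is just under the 10-20% overlap range that the settings validation message recommends.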
@@ -1,13 +1,6 @@
 """Embedding service package for generating vector embeddings."""

-from .bm25_provider import BM25SparseEmbeddingProvider
-from .service import EmbeddingService, get_bm25_service, get_embedding_service
+from .service import EmbeddingService, get_embedding_service
 from .simple_provider import SimpleEmbeddingProvider

-__all__ = [
-    "EmbeddingService",
-    "get_embedding_service",
-    "BM25SparseEmbeddingProvider",
-    "get_bm25_service",
-    "SimpleEmbeddingProvider",
-]
+__all__ = ["EmbeddingService", "get_embedding_service", "SimpleEmbeddingProvider"]
@@ -1,74 +0,0 @@
-"""BM25 sparse embedding provider using FastEmbed."""
-
-import logging
-from typing import Any
-
-from fastembed import SparseTextEmbedding
-
-logger = logging.getLogger(__name__)
-
-
-class BM25SparseEmbeddingProvider:
-    """
-    BM25 sparse embedding provider for hybrid search.
-
-    Uses FastEmbed's BM25 model to generate sparse vectors for keyword-based
-    retrieval. These sparse vectors are combined with dense semantic vectors
-    in Qdrant using Reciprocal Rank Fusion (RRF) for hybrid search.
-
-    Unlike dense embeddings which have fixed dimensions, sparse embeddings
-    have variable-length vectors with (index, value) pairs representing
-    term frequencies in the BM25 vocabulary.
-    """
-
-    def __init__(self, model_name: str = "Qdrant/bm25"):
-        """
-        Initialize BM25 sparse embedding provider.
-
-        Args:
-            model_name: FastEmbed BM25 model name (default: Qdrant/bm25)
-        """
-        self.model_name = model_name
-        logger.info(f"Initializing BM25 sparse embedding provider: {model_name}")
-
-        # Initialize FastEmbed sparse embedding model
-        self.model = SparseTextEmbedding(model_name=model_name)
-        logger.info(f"BM25 sparse embedding model loaded: {model_name}")
-
-    def encode(self, text: str) -> dict[str, Any]:
-        """
-        Generate BM25 sparse embedding for a single text.
-
-        Args:
-            text: Input text to encode
-
-        Returns:
-            Dictionary with 'indices' and 'values' keys for Qdrant sparse vector
-        """
-        # FastEmbed returns a generator, take first result
-        sparse_embedding = next(iter(self.model.embed([text])))
-
-        return {
-            "indices": sparse_embedding.indices.tolist(),
-            "values": sparse_embedding.values.tolist(),
-        }
-
-    def encode_batch(self, texts: list[str]) -> list[dict[str, Any]]:
-        """
-        Generate BM25 sparse embeddings for multiple texts (batched).
-
-        Args:
-            texts: List of texts to encode
-
-        Returns:
-            List of dictionaries with 'indices' and 'values' for each text
-        """
-        sparse_embeddings = list(self.model.embed(texts))
-
-        return [
-            {
-                "indices": emb.indices.tolist(),
-                "values": emb.values.tolist(),
-            }
-            for emb in sparse_embeddings
-        ]
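Review note: the removed provider's docstring says its sparse vectors are combined with dense vectors using Reciprocal Rank Fusion (RRF). The fusion in the removed code runs inside Qdrant and is not shown here, so the sketch below is only the textbook RRF formula, where each document scores the sum of `1 / (k + rank)` over the rankings it appears in. The function name and the constant `k = 60` are assumptions for illustration.

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with textbook RRF.

    Assumption: k=60 is a commonly used constant, not a value taken
    from this repository or from Qdrant.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        # Ranks are 1-based: the top result contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because RRF only uses rank positions, it needs no score calibration between the dense (semantic) and sparse (BM25) retrievers; documents that appear in both lists accumulate two contributions and rise to the top.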
@@ -1,30 +1,56 @@
-"""Embedding service with provider detection.
-
-DEPRECATED: This module is maintained for backward compatibility.
-New code should use nextcloud_mcp_server.providers.get_provider() directly.
-"""
+"""Embedding service with provider detection."""

 import logging
+import os

-from nextcloud_mcp_server.providers import get_provider
-
-from .bm25_provider import BM25SparseEmbeddingProvider
+from .base import EmbeddingProvider
+from .ollama_provider import OllamaEmbeddingProvider
+from .simple_provider import SimpleEmbeddingProvider

 logger = logging.getLogger(__name__)


 class EmbeddingService:
-    """
-    Unified embedding service with automatic provider detection.
-
-    DEPRECATED: This class wraps the new unified provider infrastructure
-    for backward compatibility. New code should use
-    nextcloud_mcp_server.providers.get_provider() directly.
-    """
+    """Unified embedding service with automatic provider detection."""

    def __init__(self):
        """Initialize embedding service with auto-detected provider."""
-        self.provider = get_provider()
+        self.provider = self._detect_provider()
+
+    def _detect_provider(self) -> EmbeddingProvider:
+        """
+        Auto-detect available embedding provider.
+
+        Checks environment variables in order:
+        1. OLLAMA_BASE_URL - Use Ollama provider (production)
+        2. OPENAI_API_KEY - Use OpenAI provider (future)
+        3. Fallback to SimpleEmbeddingProvider (testing/development)
+
+        Returns:
+            Configured embedding provider
+        """
+        # Ollama provider (production)
+        ollama_url = os.getenv("OLLAMA_BASE_URL")
+        if ollama_url:
+            logger.info(f"Using Ollama embedding provider: {ollama_url}")
+            return OllamaEmbeddingProvider(
+                base_url=ollama_url,
+                model=os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text"),
+                verify_ssl=os.getenv("OLLAMA_VERIFY_SSL", "true").lower() == "true",
+            )
+
+        # OpenAI provider (future implementation)
+        # openai_key = os.getenv("OPENAI_API_KEY")
+        # if openai_key:
+        #     return OpenAIEmbeddingProvider(api_key=openai_key)
+
+        # Fallback to simple provider for development/testing
+        logger.warning(
+            "No embedding provider configured (OLLAMA_BASE_URL or OPENAI_API_KEY not set). "
+            "Using SimpleEmbeddingProvider for testing/development. "
+            "For production, configure an external embedding service."
+        )
+        return SimpleEmbeddingProvider(dimension=384)

    async def embed(self, text: str) -> list[float]:
        """
@@ -83,20 +109,3 @@ def get_embedding_service() -> EmbeddingService:
    if _embedding_service is None:
        _embedding_service = EmbeddingService()
    return _embedding_service
-
-
-# BM25 sparse embedding singleton
-_bm25_service: BM25SparseEmbeddingProvider | None = None
-
-
-def get_bm25_service() -> BM25SparseEmbeddingProvider:
-    """
-    Get singleton BM25 sparse embedding service instance.
-
-    Returns:
-        Global BM25SparseEmbeddingProvider instance
-    """
-    global _bm25_service
-    if _bm25_service is None:
-        _bm25_service = BM25SparseEmbeddingProvider()
-    return _bm25_service
@@ -19,22 +19,9 @@ class SemanticSearchResult(BaseModel):
        default="", description="Document category (notes) or location (calendar)"
    )
    excerpt: str = Field(description="Excerpt from matching chunk")
-    score: float = Field(
-        description=(
-            "Relevance score (≥ 0.0, higher is better). "
-            "Score range depends on fusion method: "
-            "RRF produces scores in [0.0, 1.0], "
-            "DBSF can exceed 1.0 (sum of normalized scores from multiple systems)"
-        )
-    )
+    score: float = Field(description="Semantic similarity score (0-1)")
    chunk_index: int = Field(description="Index of matching chunk in document")
    total_chunks: int = Field(description="Total number of chunks in document")
-    chunk_start_offset: Optional[int] = Field(
-        default=None, description="Character position where chunk starts in document"
-    )
-    chunk_end_offset: Optional[int] = Field(
-        default=None, description="Character position where chunk ends in document"
-    )


 class SemanticSearchResponse(BaseResponse):
@@ -39,12 +39,7 @@ class HealthCheckFilter(logging.Filter):
        message = record.getMessage()
        return not any(
            endpoint in message
-            for endpoint in [
-                "/health/live",
-                "/health/ready",
-                "/metrics",
-                "/app/vector-sync/status",
-            ]
+            for endpoint in ["/health/live", "/health/ready", "/metrics"]
        )


@@ -404,11 +404,10 @@ def update_vector_sync_queue_size(size: int) -> None:

 def instrument_tool(func):
    """
-    Decorator to automatically instrument MCP tool functions with metrics and tracing.
+    Decorator to automatically instrument MCP tool functions with metrics.

-    Wraps async tool functions to record execution time, success/error status, and
-    create OpenTelemetry trace spans. Compatible with @mcp.tool() and @require_scopes()
-    decorators.
+    Wraps async tool functions to record execution time and success/error status.
+    Compatible with @mcp.tool() and @require_scopes() decorators.

    Usage:
        @mcp.tool()
@@ -421,46 +420,24 @@ def instrument_tool(func):
        func: The async function to instrument

    Returns:
-        Wrapped function with metrics and tracing instrumentation
+        Wrapped function with metrics instrumentation
    """
    import functools
    import time

-    from nextcloud_mcp_server.observability.tracing import trace_operation
-
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        tool_name = func.__name__
        start_time = time.time()
-
-        # Extract tool arguments for tracing (sanitize sensitive fields)
-        # kwargs contains the actual arguments passed to the tool
-        tool_args = {
-            k: v
-            for k, v in kwargs.items()
-            if k not in ("password", "token", "secret", "api_key", "etag", "ctx")
-        }
-
-        # Create trace span with metrics collection
-        with trace_operation(
-            f"mcp.tool.{tool_name}",
-            attributes={
-                "mcp.tool.name": tool_name,
-                "mcp.tool.args": str(tool_args)[:500]
-                if tool_args
-                else None,  # Limit to 500 chars
-            },
-            record_exception=True,
-        ):
-            try:
-                result = await func(*args, **kwargs)
-                duration = time.time() - start_time
-                record_tool_call(tool_name, duration, "success")
-                return result
-            except Exception as e:
-                duration = time.time() - start_time
-                record_tool_call(tool_name, duration, "error")
-                record_tool_error(tool_name, type(e).__name__)
-                raise
+        try:
+            result = await func(*args, **kwargs)
+            duration = time.time() - start_time
+            record_tool_call(tool_name, duration, "success")
+            return result
+        except Exception as e:
+            duration = time.time() - start_time
+            record_tool_call(tool_name, duration, "error")
+            record_tool_error(tool_name, type(e).__name__)
+            raise

    return wrapper
@@ -66,12 +66,8 @@ class ObservabilityMiddleware(BaseHTTPMiddleware):
        # Record start time
        start_time = time.time()

-        # Skip tracing for health/metrics/polling endpoints to reduce noise
-        should_trace = not (
-            path.startswith("/health/")
-            or path == "/metrics"
-            or path == "/app/vector-sync/status"
-        )
+        # Skip tracing for health/metrics endpoints to reduce noise
+        should_trace = not (path.startswith("/health/") or path == "/metrics")

        try:
            if should_trace:
@@ -1,18 +0,0 @@
-"""Unified provider infrastructure for embeddings and text generation."""
-
-from .anthropic import AnthropicProvider
-from .base import Provider
-from .bedrock import BedrockProvider
-from .ollama import OllamaProvider
-from .registry import get_provider, reset_provider
-from .simple import SimpleProvider
-
-__all__ = [
-    "Provider",
-    "OllamaProvider",
-    "AnthropicProvider",
-    "SimpleProvider",
-    "BedrockProvider",
-    "get_provider",
-    "reset_provider",
-]
@@ -1,97 +0,0 @@
-"""Unified Anthropic provider for text generation."""
-
-import logging
-
-from anthropic import AsyncAnthropic
-
-from .base import Provider
-
-logger = logging.getLogger(__name__)
-
-
-class AnthropicProvider(Provider):
-    """
-    Anthropic provider for text generation.
-
-    Supports Claude models via the Anthropic API.
-    Note: Anthropic doesn't provide embedding models, only text generation.
-    """
-
-    def __init__(self, api_key: str, model: str = "claude-3-5-sonnet-20241022"):
-        """
-        Initialize Anthropic provider.
-
-        Args:
-            api_key: Anthropic API key
-            model: Model name (e.g., "claude-3-5-sonnet-20241022")
-        """
-        self.client = AsyncAnthropic(api_key=api_key)
-        self.model = model
-
-        logger.info(f"Initialized Anthropic provider (model={model})")
-
-    @property
-    def supports_embeddings(self) -> bool:
-        """Whether this provider supports embedding generation."""
-        return False
-
-    @property
-    def supports_generation(self) -> bool:
-        """Whether this provider supports text generation."""
-        return True
-
-    async def embed(self, text: str) -> list[float]:
-        """
-        Generate embedding vector for text.
-
-        Raises:
-            NotImplementedError: Anthropic doesn't provide embedding models
-        """
-        raise NotImplementedError(
-            "Embedding not supported by Anthropic - use Ollama or Bedrock for embeddings"
-        )
-
-    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
-        """
-        Generate embeddings for multiple texts.
-
-        Raises:
-            NotImplementedError: Anthropic doesn't provide embedding models
-        """
-        raise NotImplementedError(
-            "Embedding not supported by Anthropic - use Ollama or Bedrock for embeddings"
-        )
-
-    def get_dimension(self) -> int:
-        """
-        Get embedding dimension.
-
-        Raises:
-            NotImplementedError: Anthropic doesn't provide embedding models
-        """
-        raise NotImplementedError(
-            "Embedding not supported by Anthropic - use Ollama or Bedrock for embeddings"
-        )
-
-    async def generate(self, prompt: str, max_tokens: int = 500) -> str:
-        """
-        Generate text using Anthropic API.
-
-        Args:
-            prompt: The prompt to generate from
-            max_tokens: Maximum tokens to generate
-
-        Returns:
-            Generated text
-        """
-        message = await self.client.messages.create(
-            model=self.model,
-            max_tokens=max_tokens,
-            temperature=0.7,
-            messages=[{"role": "user", "content": prompt}],
-        )
-        return message.content[0].text
-
-    async def close(self) -> None:
-        """Close the client (no-op for Anthropic SDK)."""
-        pass
@@ -1,91 +0,0 @@
-"""Unified provider interface for embeddings and text generation."""
-
-from abc import ABC, abstractmethod
-
-
-class Provider(ABC):
-    """
-    Unified base class for LLM providers.
-
-    Providers can support embeddings, text generation, or both.
-    Use capability properties to determine what features are available.
-    """
-
-    @property
-    @abstractmethod
-    def supports_embeddings(self) -> bool:
-        """Whether this provider supports embedding generation."""
-        pass
-
-    @property
-    @abstractmethod
-    def supports_generation(self) -> bool:
-        """Whether this provider supports text generation."""
-        pass
-
-    @abstractmethod
-    async def embed(self, text: str) -> list[float]:
-        """
-        Generate embedding vector for text.
-
-        Args:
-            text: Input text to embed
-
-        Returns:
-            Vector embedding as list of floats
-
-        Raises:
-            NotImplementedError: If provider doesn't support embeddings
-        """
-        pass
-
-    @abstractmethod
-    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
-        """
-        Generate embeddings for multiple texts (optimized).
-
-        Args:
-            texts: List of texts to embed
-
-        Returns:
-            List of vector embeddings
-
-        Raises:
-            NotImplementedError: If provider doesn't support embeddings
-        """
-        pass
-
-    @abstractmethod
-    def get_dimension(self) -> int:
-        """
-        Get embedding dimension for this provider.
-
-        Returns:
-            Vector dimension (e.g., 768 for nomic-embed-text)
-
-        Raises:
-            NotImplementedError: If provider doesn't support embeddings
-        """
-        pass
-
-    @abstractmethod
-    async def generate(self, prompt: str, max_tokens: int = 500) -> str:
-        """
-        Generate text from a prompt.
-
-        Args:
-            prompt: The prompt to generate from
-            max_tokens: Maximum tokens to generate
-
-        Returns:
-            Generated text
-
-        Raises:
-            NotImplementedError: If provider doesn't support generation
-        """
-        pass
-
-    @abstractmethod
-    async def close(self) -> None:
-        """Close the provider and release resources."""
-        pass
@@ -1,397 +0,0 @@
-"""Amazon Bedrock provider for embeddings and text generation."""
-
-import json
-import logging
-from typing import Any
-
-try:
-    import boto3
-    from botocore.exceptions import BotoCoreError, ClientError
-
-    BOTO3_AVAILABLE = True
-except ImportError:
-    BOTO3_AVAILABLE = False
-
-from .base import Provider
-
-logger = logging.getLogger(__name__)
-
-
-class BedrockProvider(Provider):
-    """
-    Amazon Bedrock provider supporting both embeddings and text generation.
-
-    Uses AWS Bedrock Runtime API with boto3. Supports various model families:
-    - Embeddings: amazon.titan-embed-text-v1, amazon.titan-embed-text-v2, cohere.embed-*
-    - Text Generation: anthropic.claude-*, meta.llama3-*, amazon.titan-text-*, mistral.*, etc.
-
-    Requires AWS credentials configured via:
-    - Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION)
-    - AWS credentials file (~/.aws/credentials)
-    - IAM role (when running on AWS)
-    """
-
-    def __init__(
-        self,
-        region_name: str | None = None,
-        embedding_model: str | None = None,
-        generation_model: str | None = None,
-        aws_access_key_id: str | None = None,
-        aws_secret_access_key: str | None = None,
-    ):
-        """
-        Initialize Bedrock provider.
-
-        Args:
-            region_name: AWS region (e.g., "us-east-1"). Defaults to AWS_REGION env var.
-            embedding_model: Model ID for embeddings (e.g., "amazon.titan-embed-text-v2:0").
-                None disables embeddings.
-            generation_model: Model ID for text generation (e.g., "anthropic.claude-3-sonnet-20240229-v1:0").
-                None disables generation.
-            aws_access_key_id: AWS access key (optional, uses default credential chain if not provided)
-            aws_secret_access_key: AWS secret key (optional, uses default credential chain if not provided)
-
-        Raises:
-            ImportError: If boto3 is not installed
-        """
-        if not BOTO3_AVAILABLE:
-            raise ImportError(
-                "boto3 is required for Bedrock provider. Install with: pip install boto3"
-            )
-
-        self.embedding_model = embedding_model
-        self.generation_model = generation_model
-        self._dimension: int | None = None  # Detected dynamically
-
-        # Initialize bedrock-runtime client
-        client_kwargs: dict[str, Any] = {}
-        if region_name:
-            client_kwargs["region_name"] = region_name
-        if aws_access_key_id:
-            client_kwargs["aws_access_key_id"] = aws_access_key_id
-        if aws_secret_access_key:
-            client_kwargs["aws_secret_access_key"] = aws_secret_access_key
-
-        self.client = boto3.client("bedrock-runtime", **client_kwargs)
-
-        logger.info(
-            f"Initialized Bedrock provider in region {region_name or 'default'} "
-            f"(embedding_model={embedding_model}, generation_model={generation_model})"
-        )
-
-    @property
-    def supports_embeddings(self) -> bool:
-        """Whether this provider supports embedding generation."""
-        return self.embedding_model is not None
-
-    @property
-    def supports_generation(self) -> bool:
-        """Whether this provider supports text generation."""
-        return self.generation_model is not None
-
-    def _create_embedding_request(self, text: str) -> dict[str, Any]:
-        """
-        Create model-specific embedding request payload.
-
-        Args:
-            text: Input text to embed
-
-        Returns:
-            Request payload dict for the embedding model
-        """
-        if not self.embedding_model:
-            raise NotImplementedError(
-                "Embedding not supported - no embedding_model configured"
-            )
-
-        # Titan Embed models
-        if self.embedding_model.startswith("amazon.titan-embed"):
-            return {"inputText": text}
-
-        # Cohere Embed models
-        elif self.embedding_model.startswith("cohere.embed"):
-            return {"texts": [text], "input_type": "search_document"}
-
-        # Unknown model - try Titan format as default
-        else:
-            logger.warning(
-                f"Unknown embedding model format for {self.embedding_model}, "
-                "using Titan format as default"
-            )
-            return {"inputText": text}
-
-    def _parse_embedding_response(self, response: dict[str, Any]) -> list[float]:
-        """
-        Parse model-specific embedding response.
-
-        Args:
-            response: Raw response from Bedrock
-
-        Returns:
-            Embedding vector as list of floats
-        """
-        # Titan Embed models
-        if self.embedding_model and self.embedding_model.startswith(
-            "amazon.titan-embed"
-        ):
-            return response["embedding"]
-
-        # Cohere Embed models
-        elif self.embedding_model and self.embedding_model.startswith("cohere.embed"):
-            return response["embeddings"][0]
-
-        # Unknown model - try Titan format as default
-        else:
-            logger.warning(
-                f"Unknown embedding response format for {self.embedding_model}, "
-                "trying Titan format"
-            )
-            return response.get("embedding", response.get("embeddings", [None])[0])
-
-    async def embed(self, text: str) -> list[float]:
-        """
-        Generate embedding vector for text.
-
-        Args:
-            text: Input text to embed
-
-        Returns:
-            Vector embedding as list of floats
-
-        Raises:
-            NotImplementedError: If embeddings not enabled (no embedding_model)
-            ClientError: If Bedrock API call fails
-        """
-        if not self.supports_embeddings:
-            raise NotImplementedError(
-                "Embedding not supported - no embedding_model configured"
-            )
-
-        try:
-            request_body = self._create_embedding_request(text)
-
-            response = self.client.invoke_model(
-                modelId=self.embedding_model,
-                body=json.dumps(request_body),
-                accept="application/json",
-                contentType="application/json",
-            )
-
-            response_body = json.loads(response["body"].read())
-            embedding = self._parse_embedding_response(response_body)
-
-            return embedding
-
-        except (BotoCoreError, ClientError) as e:
-            logger.error(f"Bedrock embedding error: {e}")
-            raise
-
-    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
-        """
-        Generate embeddings for multiple texts.
-
-        Note: Current implementation sends requests sequentially.
-        Future optimization could use asyncio for concurrent requests.
-
-        Args:
-            texts: List of texts to embed
-
-        Returns:
-            List of vector embeddings
-
-        Raises:
-            NotImplementedError: If embeddings not enabled (no embedding_model)
-            ClientError: If Bedrock API call fails
-        """
-        if not self.supports_embeddings:
-            raise NotImplementedError(
-                "Embedding not supported - no embedding_model configured"
-            )
-
-        embeddings = []
-        for text in texts:
-            embedding = await self.embed(text)
-            embeddings.append(embedding)
-        return embeddings
-
-    async def _detect_dimension(self):
-        """
-        Detect embedding dimension by generating a test embedding.
-        """
-        if self._dimension is None and self.supports_embeddings:
-            logger.debug(
-                f"Detecting embedding dimension for model {self.embedding_model}..."
-            )
-            test_embedding = await self.embed("test")
-            self._dimension = len(test_embedding)
-            logger.info(
-                f"Detected embedding dimension: {self._dimension} "
-                f"for model {self.embedding_model}"
-            )
-
-    def get_dimension(self) -> int:
-        """
-        Get embedding dimension.
-
-        Returns:
-            Vector dimension for the configured embedding model
-
-        Raises:
-            NotImplementedError: If embeddings not enabled (no embedding_model)
-            RuntimeError: If dimension not detected yet (call _detect_dimension first)
-        """
-        if not self.supports_embeddings:
-            raise NotImplementedError(
-                "Embedding not supported - no embedding_model configured"
-            )
-
-        if self._dimension is None:
-            raise RuntimeError(
-                f"Embedding dimension not detected yet for model {self.embedding_model}. "
-                "Call _detect_dimension() first or generate an embedding."
-            )
-        return self._dimension
-
-    def _create_generation_request(
-        self, prompt: str, max_tokens: int
-    ) -> dict[str, Any]:
-        """
-        Create model-specific text generation request payload.
-
-        Args:
-            prompt: The prompt to generate from
-            max_tokens: Maximum tokens to generate
-
-        Returns:
-            Request payload dict for the generation model
-        """
-        if not self.generation_model:
-            raise NotImplementedError(
-                "Text generation not supported - no generation_model configured"
-            )
-
-        # Anthropic Claude models
-        if self.generation_model.startswith("anthropic.claude"):
-            return {
-                "anthropic_version": "bedrock-2023-05-31",
-                "max_tokens": max_tokens,
-                "temperature": 0.7,
-                "messages": [{"role": "user", "content": prompt}],
-            }
-
-        # Meta Llama models
-        elif self.generation_model.startswith("meta.llama"):
-            return {"prompt": prompt, "max_gen_len": max_tokens, "temperature": 0.7}
-
-        # Amazon Titan Text models
-        elif self.generation_model.startswith("amazon.titan-text"):
-            return {
-                "inputText": prompt,
-                "textGenerationConfig": {
-                    "maxTokenCount": max_tokens,
-                    "temperature": 0.7,
-                },
-            }
-
-        # Mistral models
-        elif self.generation_model.startswith("mistral"):
-            return {"prompt": prompt, "max_tokens": max_tokens, "temperature": 0.7}
-
-        # Unknown model - try Claude format as default
-        else:
-            logger.warning(
-                f"Unknown generation model format for {self.generation_model}, "
-                "using Claude format as default"
-            )
-            return {
-                "anthropic_version": "bedrock-2023-05-31",
-                "max_tokens": max_tokens,
-                "temperature": 0.7,
-                "messages": [{"role": "user", "content": prompt}],
-            }
-
-    def _parse_generation_response(self, response: dict[str, Any]) -> str:
-        """
-        Parse model-specific text generation response.
-
-        Args:
-            response: Raw response from Bedrock
-
-        Returns:
-            Generated text
-        """
-        # Anthropic Claude models
-        if self.generation_model and self.generation_model.startswith(
-            "anthropic.claude"
-        ):
-            return response["content"][0]["text"]
-
-        # Meta Llama models
-        elif self.generation_model and self.generation_model.startswith("meta.llama"):
-            return response["generation"]
-
-        # Amazon Titan Text models
-        elif self.generation_model and self.generation_model.startswith(
-            "amazon.titan-text"
-        ):
-            return response["results"][0]["outputText"]
-
-        # Mistral models
-        elif self.generation_model and self.generation_model.startswith("mistral"):
-            return response["outputs"][0]["text"]
-
-        # Unknown model - try common response fields
-        else:
-            logger.warning(
-                f"Unknown generation response format for {self.generation_model}, "
-                "trying common fields"
-            )
-            # Try common response field names
-            for field in ["text", "generation", "outputText", "completion"]:
-                if field in response:
-                    return response[field]
-            # Last resort: return JSON string
-            return json.dumps(response)
-
-    async def generate(self, prompt: str, max_tokens: int = 500) -> str:
-        """
-        Generate text from a prompt.
-
-        Args:
-            prompt: The prompt to generate from
-            max_tokens: Maximum tokens to generate
-
-        Returns:
-            Generated text
-
-        Raises:
-            NotImplementedError: If generation not enabled (no generation_model)
-            ClientError: If Bedrock API call fails
-        """
-        if not self.supports_generation:
-            raise NotImplementedError(
-                "Text generation not supported - no generation_model configured"
-            )
-
-        try:
-            request_body = self._create_generation_request(prompt, max_tokens)
-
-            response = self.client.invoke_model(
-                modelId=self.generation_model,
-                body=json.dumps(request_body),
-                accept="application/json",
-                contentType="application/json",
-            )
-
-            response_body = json.loads(response["body"].read())
-            text = self._parse_generation_response(response_body)
-
-            return text
-
-        except (BotoCoreError, ClientError) as e:
-            logger.error(f"Bedrock generation error: {e}")
-            raise
-
-    async def close(self) -> None:
-        """Close the client (no-op for boto3 clients)."""
-        pass
@@ -1,221 +0,0 @@
-"""Unified Ollama provider for embeddings and text generation."""
-
-import logging
-
-import httpx
-
-from .base import Provider
-
-logger = logging.getLogger(__name__)
-
-
-class OllamaProvider(Provider):
-    """
-    Ollama provider supporting both embeddings and text generation.
-
-    Supports TLS, SSL verification, and automatic model loading.
-    """
-
-    def __init__(
-        self,
-        base_url: str,
-        embedding_model: str | None = None,
-        generation_model: str | None = None,
-        verify_ssl: bool = True,
-        timeout: httpx.Timeout | None = None,
-    ):
-        """
-        Initialize Ollama provider.
-
-        Args:
-            base_url: Ollama API base URL (e.g., https://ollama.internal.example.com:443)
-            embedding_model: Model for embeddings (e.g., "nomic-embed-text"). None disables embeddings.
-            generation_model: Model for text generation (e.g., "llama3.2:1b"). None disables generation.
-            verify_ssl: Verify SSL certificates (default: True)
-            timeout: HTTP timeout configuration
-        """
-        self.base_url = base_url.rstrip("/")
-        self.embedding_model = embedding_model
-        self.generation_model = generation_model
-        self.verify_ssl = verify_ssl
-
-        if timeout is None:
-            timeout = httpx.Timeout(timeout=120, connect=5)
-
-        self.client = httpx.AsyncClient(verify=verify_ssl, timeout=timeout)
-        self._dimension: int | None = None  # Detected dynamically for embeddings
-
-        logger.info(
-            f"Initialized Ollama provider: {base_url} "
-            f"(embedding_model={embedding_model}, generation_model={generation_model}, "
-            f"verify_ssl={verify_ssl})"
-        )
-
-        # Pre-check and auto-load models
-        if embedding_model:
-            self._check_model_is_loaded(embedding_model, autoload=True)
-        if generation_model:
-            self._check_model_is_loaded(generation_model, autoload=True)
-
-    @property
-    def supports_embeddings(self) -> bool:
-        """Whether this provider supports embedding generation."""
-        return self.embedding_model is not None
-
-    @property
-    def supports_generation(self) -> bool:
-        """Whether this provider supports text generation."""
-        return self.generation_model is not None
-
-    async def embed(self, text: str) -> list[float]:
-        """
-        Generate embedding vector for text.
-
-        Args:
-            text: Input text to embed
-
-        Returns:
-            Vector embedding as list of floats
-
-        Raises:
-            NotImplementedError: If embeddings not enabled (no embedding_model)
-        """
-        if not self.supports_embeddings:
-            raise NotImplementedError(
-                "Embedding not supported - no embedding_model configured"
-            )
-
-        response = await self.client.post(
-            f"{self.base_url}/api/embeddings",
-            json={"model": self.embedding_model, "prompt": text},
-        )
-        response.raise_for_status()
-        return response.json()["embedding"]
-
-    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
-        """
-        Generate embeddings for multiple texts.
-
-        Note: The /api/embeddings endpoint used here takes one prompt per request, so requests are sent sequentially.
-
-        Args:
-            texts: List of texts to embed
-
-        Returns:
-            List of vector embeddings
-
-        Raises:
-            NotImplementedError: If embeddings not enabled (no embedding_model)
-        """
-        if not self.supports_embeddings:
-            raise NotImplementedError(
-                "Embedding not supported - no embedding_model configured"
-            )
-
-        embeddings = []
-        for text in texts:
-            embedding = await self.embed(text)
-            embeddings.append(embedding)
-        return embeddings
-
-    async def _detect_dimension(self):
-        """
-        Detect embedding dimension by generating a test embedding.
-
-        This method queries the model to determine the actual dimension
-        instead of relying on hardcoded values.
-        """
-        if self._dimension is None and self.supports_embeddings:
-            logger.debug(
-                f"Detecting embedding dimension for model {self.embedding_model}..."
-            )
-            test_embedding = await self.embed("test")
-            self._dimension = len(test_embedding)
-            logger.info(
-                f"Detected embedding dimension: {self._dimension} "
-                f"for model {self.embedding_model}"
-            )
-
-    def get_dimension(self) -> int:
-        """
-        Get embedding dimension.
-
-        Returns:
-            Vector dimension for the configured embedding model
-
-        Raises:
-            NotImplementedError: If embeddings not enabled (no embedding_model)
-            RuntimeError: If dimension not detected yet (call _detect_dimension first)
-        """
-        if not self.supports_embeddings:
-            raise NotImplementedError(
-                "Embedding not supported - no embedding_model configured"
-            )
-
-        if self._dimension is None:
-            raise RuntimeError(
-                f"Embedding dimension not detected yet for model {self.embedding_model}. "
-                "Call _detect_dimension() first or generate an embedding."
-            )
-        return self._dimension
-
-    async def generate(self, prompt: str, max_tokens: int = 500) -> str:
-        """
-        Generate text from a prompt.
-
-        Args:
-            prompt: The prompt to generate from
-            max_tokens: Maximum tokens to generate
-
-        Returns:
-            Generated text
-
-        Raises:
-            NotImplementedError: If generation not enabled (no generation_model)
-        """
-        if not self.supports_generation:
-            raise NotImplementedError(
-                "Text generation not supported - no generation_model configured"
-            )
-
-        response = await self.client.post(
-            f"{self.base_url}/api/generate",
-            json={
-                "model": self.generation_model,
-                "prompt": prompt,
-                "stream": False,
-                "options": {
-                    "num_predict": max_tokens,
-                    "temperature": 0.7,
-                },
-            },
-        )
-        response.raise_for_status()
-        data = response.json()
-        return data["response"]
-
-    def _check_model_is_loaded(self, model: str, autoload: bool = True):
-        """
-        Check if model is loaded in Ollama, optionally auto-loading it.
-
-        Args:
-            model: Model name to check
-            autoload: Whether to automatically pull the model if not loaded
-        """
-        response = httpx.get(f"{self.base_url}/api/tags", verify=self.verify_ssl)
-        response.raise_for_status()
-
-        models = [m["name"] for m in response.json().get("models", [])]
-        logger.info("Ollama has the following models pre-loaded: %s", models)
-
-        if (model not in models) and autoload:
-            logger.warning(
-                "Model '%s' not yet available in ollama, attempting to pull now...",
-                model,
-            )
-            response = httpx.post(f"{self.base_url}/api/pull", json={"model": model}, verify=self.verify_ssl)
-            response.raise_for_status()
-
-    async def close(self) -> None:
-        """Close HTTP client."""
-        await self.client.aclose()
@@ -1,126 +0,0 @@
-"""Provider registry and factory for auto-detection and instantiation."""
-
-import logging
-import os
-
-from .base import Provider
-from .bedrock import BedrockProvider
-from .ollama import OllamaProvider
-from .simple import SimpleProvider
-
-logger = logging.getLogger(__name__)
-
-
-class ProviderRegistry:
-    """
-    Registry for provider auto-detection and instantiation.
-
-    Checks environment variables in priority order and creates appropriate provider:
-    1. Bedrock (AWS_REGION or any BEDROCK_*_MODEL)
-    2. Ollama (OLLAMA_BASE_URL)
-    3. Simple (fallback for testing/development)
-    """
-
-    @staticmethod
-    def create_provider() -> Provider:
-        """
-        Auto-detect and create provider based on environment variables.
-
-        Priority order:
-        1. Bedrock - if AWS_REGION, BEDROCK_EMBEDDING_MODEL, or BEDROCK_GENERATION_MODEL is set
-        2. Ollama - if OLLAMA_BASE_URL is set
-        3. Simple - fallback for testing/development
-
-        Returns:
-            Provider instance
-
-        Environment Variables:
-            Bedrock:
-                - AWS_REGION: AWS region (e.g., "us-east-1")
-                - AWS_ACCESS_KEY_ID: AWS access key (optional, uses credential chain)
-                - AWS_SECRET_ACCESS_KEY: AWS secret key (optional)
-                - BEDROCK_EMBEDDING_MODEL: Model ID for embeddings (e.g., "amazon.titan-embed-text-v2:0")
-                - BEDROCK_GENERATION_MODEL: Model ID for text generation (e.g., "anthropic.claude-3-sonnet-20240229-v1:0")
-
-            Ollama:
-                - OLLAMA_BASE_URL: Ollama API base URL (e.g., "http://localhost:11434")
-                - OLLAMA_EMBEDDING_MODEL: Model for embeddings (default: "nomic-embed-text")
-                - OLLAMA_GENERATION_MODEL: Model for text generation (e.g., "llama3.2:1b")
-                - OLLAMA_VERIFY_SSL: Verify SSL certificates (default: "true")
-
-            Simple (no configuration needed, fallback):
-                - SIMPLE_EMBEDDING_DIMENSION: Embedding dimension (default: 384)
-        """
-        # 1. Check for Bedrock
-        aws_region = os.getenv("AWS_REGION")
-        bedrock_embedding_model = os.getenv("BEDROCK_EMBEDDING_MODEL")
-        bedrock_generation_model = os.getenv("BEDROCK_GENERATION_MODEL")
-
-        if aws_region or bedrock_embedding_model or bedrock_generation_model:
-            logger.info(
-                f"Using Bedrock provider: region={aws_region}, "
-                f"embedding_model={bedrock_embedding_model}, "
-                f"generation_model={bedrock_generation_model}"
-            )
-            return BedrockProvider(
-                region_name=aws_region,
-                embedding_model=bedrock_embedding_model,
-                generation_model=bedrock_generation_model,
-                aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
-                aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
-            )
-
-        # 2. Check for Ollama
-        ollama_url = os.getenv("OLLAMA_BASE_URL")
-        if ollama_url:
-            embedding_model = os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text")
-            generation_model = os.getenv("OLLAMA_GENERATION_MODEL")
-            verify_ssl = os.getenv("OLLAMA_VERIFY_SSL", "true").lower() == "true"
-
-            logger.info(
-                f"Using Ollama provider: {ollama_url}, "
-                f"embedding_model={embedding_model}, "
-                f"generation_model={generation_model}"
-            )
-            return OllamaProvider(
-                base_url=ollama_url,
-                embedding_model=embedding_model,
-                generation_model=generation_model,
-                verify_ssl=verify_ssl,
-            )
-
-        # 3. Fallback to Simple provider for development/testing
-        dimension = int(os.getenv("SIMPLE_EMBEDDING_DIMENSION", "384"))
-        logger.warning(
-            "No provider configured (AWS_REGION, OLLAMA_BASE_URL not set). "
-            "Using SimpleProvider for testing/development. "
-            "For production, configure Bedrock or Ollama."
-        )
-        return SimpleProvider(dimension=dimension)
-
-
-# Singleton instance
-_provider: Provider | None = None
-
-
-def get_provider() -> Provider:
-    """
-    Get singleton provider instance.
-
-    Returns:
-        Global Provider instance (auto-detected on first call)
-    """
-    global _provider
-    if _provider is None:
-        _provider = ProviderRegistry.create_provider()
-    return _provider
-
-
-def reset_provider():
-    """
-    Reset singleton provider instance.
-
-    Useful for testing or reconfiguration.
-    """
-    global _provider
-    _provider = None
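Review note: the priority order in the removed `ProviderRegistry.create_provider` can be summarised as a small pure function. This is a sketch only: `select_provider_name` is a hypothetical helper that does not exist in the removed module, and it returns a label instead of constructing a provider so that it runs without boto3 or a reachable Ollama server.

```python
from collections.abc import Mapping


def select_provider_name(env: Mapping[str, str]) -> str:
    """Mirror the removed registry's priority order: Bedrock, then Ollama, then Simple."""
    bedrock_keys = (
        "AWS_REGION",
        "BEDROCK_EMBEDDING_MODEL",
        "BEDROCK_GENERATION_MODEL",
    )
    # Any one non-empty Bedrock variable is enough to select Bedrock.
    if any(env.get(key) for key in bedrock_keys):
        return "bedrock"
    if env.get("OLLAMA_BASE_URL"):
        return "ollama"
    # Fallback intended for testing and development only.
    return "simple"


if __name__ == "__main__":
    print(select_provider_name({"OLLAMA_BASE_URL": "http://localhost:11434"}))
```

One consequence worth noting: because `AWS_REGION` alone selects Bedrock, an environment that sets both `AWS_REGION` and `OLLAMA_BASE_URL` resolves to Bedrock, even when no Bedrock model is configured.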
@@ -1,149 +0,0 @@
-"""Simple in-process embedding provider for testing.
-
-This provider uses a basic TF-IDF-like approach with feature hashing to generate
-deterministic embeddings without requiring external services. Suitable for testing
-but not for production use.
-"""
-
-import hashlib
-import math
-import re
-from collections import Counter
-
-from .base import Provider
-
-
-class SimpleProvider(Provider):
-    """Simple deterministic embedding provider using feature hashing.
-
-    This implementation:
-    - Tokenizes text into words
-    - Uses feature hashing to map words to fixed-size vectors
-    - Applies log-scaled term-frequency weighting (no IDF component)
-    - Normalizes vectors to unit length
-
-    Not suitable for production but good for testing semantic search infrastructure.
-    Only supports embeddings, not text generation.
-    """
-
-    def __init__(self, dimension: int = 384):
-        """Initialize simple embedding provider.
-
-        Args:
-            dimension: Embedding dimension (default: 384)
-        """
-        self.dimension = dimension
-
-    @property
-    def supports_embeddings(self) -> bool:
-        """Whether this provider supports embedding generation."""
-        return True
-
-    @property
-    def supports_generation(self) -> bool:
-        """Whether this provider supports text generation."""
-        return False
-
-    def _tokenize(self, text: str) -> list[str]:
-        """Tokenize text into lowercase words.
-
-        Args:
-            text: Input text
-
-        Returns:
-            List of lowercase word tokens
-        """
-        # Simple word tokenization
-        text = text.lower()
-        words = re.findall(r"\b\w+\b", text)
-        return words
-
-    def _hash_word(self, word: str) -> int:
-        """Hash word to dimension index.
-
-        Args:
-            word: Word to hash
-
-        Returns:
-            Index in range [0, dimension)
-        """
-        hash_bytes = hashlib.md5(word.encode()).digest()
-        hash_int = int.from_bytes(hash_bytes[:4], byteorder="big")
-        return hash_int % self.dimension
-
-    def _embed_single(self, text: str) -> list[float]:
-        """Generate embedding for single text.
-
-        Args:
-            text: Input text
-
-        Returns:
-            Normalized embedding vector
-        """
-        tokens = self._tokenize(text)
-        if not tokens:
-            return [0.0] * self.dimension
-
-        # Count term frequencies
-        term_freq = Counter(tokens)
-
-        # Initialize vector
-        vector = [0.0] * self.dimension
-
-        # Apply TF weighting with feature hashing
-        for word, count in term_freq.items():
-            idx = self._hash_word(word)
-            # Simple TF weighting: log(1 + count)
-            vector[idx] += math.log1p(count)
-
-        # Normalize to unit length
-        norm = math.sqrt(sum(x * x for x in vector))
-        if norm > 0:
-            vector = [x / norm for x in vector]
-
-        return vector
-
-    async def embed(self, text: str) -> list[float]:
-        """Generate embedding vector for text.
-
-        Args:
-            text: Input text to embed
-
-        Returns:
-            Vector embedding as list of floats
-        """
-        return self._embed_single(text)
-
-    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
-        """Generate embeddings for multiple texts.
-
-        Args:
-            texts: List of texts to embed
-
-        Returns:
-            List of vector embeddings
-        """
-        return [self._embed_single(text) for text in texts]
-
-    def get_dimension(self) -> int:
-        """Get embedding dimension.
-
-        Returns:
-            Vector dimension
-        """
-        return self.dimension
-
-    async def generate(self, prompt: str, max_tokens: int = 500) -> str:
-        """
-        Generate text from a prompt.
-
-        Raises:
-            NotImplementedError: Simple provider doesn't support text generation
-        """
-        raise NotImplementedError(
-            "Text generation not supported by Simple provider - use Ollama, Anthropic, or Bedrock"
-        )
-
-    async def close(self) -> None:
-        """Close the provider (no-op for simple provider)."""
-        pass
@@ -1,27 +0,0 @@
-"""Search algorithms module for BM25 hybrid search.
-
-This module provides BM25 hybrid search combining:
-- Dense semantic vectors (vector similarity via embeddings)
-- Sparse BM25 vectors (keyword-based retrieval)
-
-Results are fused using Qdrant's native Reciprocal Rank Fusion (RRF) for
-optimal relevance across both semantic and keyword queries.
-"""
-
-from nextcloud_mcp_server.search.algorithms import (
-    NextcloudClientProtocol,
-    SearchAlgorithm,
-    SearchResult,
-    get_indexed_doc_types,
-)
-from nextcloud_mcp_server.search.bm25_hybrid import BM25HybridSearchAlgorithm
-from nextcloud_mcp_server.search.semantic import SemanticSearchAlgorithm
-
-__all__ = [
-    "NextcloudClientProtocol",
-    "SearchAlgorithm",
-    "SearchResult",
-    "get_indexed_doc_types",
-    "SemanticSearchAlgorithm",
-    "BM25HybridSearchAlgorithm",
-]
@@ -1,213 +0,0 @@
-"""Base interfaces and data structures for search algorithms."""
-
-from abc import ABC, abstractmethod
-from dataclasses import dataclass
-from typing import Any, Protocol, runtime_checkable
-
-
-@runtime_checkable
-class NextcloudClientProtocol(Protocol):
-    """Protocol for Nextcloud client supporting multi-document search.
-
-    This protocol defines the interface that search algorithms need from a
-    Nextcloud client to access documents across different apps (Notes, Files,
-    Calendar, etc.). The client provides access to app-specific sub-clients
-    that handle the actual API calls.
-
-    Document types (e.g., "note", "file", "calendar") are NOT 1:1 with apps.
-    For example, the Notes app specializes in markdown files, while Files/WebDAV
-    handles multiple file types. The abstraction is at the document type level.
-
-    Search algorithms query Qdrant to determine which document types are actually
-    indexed before attempting to access them, enabling graceful cross-app search.
-    """
-
-    username: str
-
-    # App-specific clients that search algorithms dispatch to
-    @property
-    def notes(self) -> Any:
-        """Notes client for accessing note documents."""
-        ...
-
-    @property
-    def webdav(self) -> Any:
-        """WebDAV client for accessing file documents."""
-        ...
-
-    @property
-    def calendar(self) -> Any:
-        """Calendar client for accessing event/task documents."""
-        ...
-
-    @property
-    def contacts(self) -> Any:
-        """Contacts client for accessing contact card documents."""
-        ...
-
-    @property
-    def deck(self) -> Any:
-        """Deck client for accessing deck card documents."""
-        ...
-
-    @property
-    def cookbook(self) -> Any:
-        """Cookbook client for accessing recipe documents."""
-        ...
-
-    @property
-    def tables(self) -> Any:
-        """Tables client for accessing table row documents."""
-        ...
-
-
-async def get_indexed_doc_types(user_id: str) -> set[str]:
-    """Query Qdrant to get actually-indexed document types for a user.
-
-    This enables search algorithms to check which document types are available
-    before attempting to search/verify them, allowing graceful cross-app search.
-
-    Args:
-        user_id: User ID to filter by
-
-    Returns:
-        Set of document type strings (e.g., {"note", "file", "calendar"})
-
-    Example:
-        >>> types = await get_indexed_doc_types("alice")
-        >>> if "note" in types:
-        ...     # Search notes
-    """
-    import logging
-
-    from qdrant_client.models import FieldCondition, Filter, MatchValue
-
-    from nextcloud_mcp_server.config import get_settings
-    from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
-
-    logger = logging.getLogger(__name__)
-    settings = get_settings()
-
-    qdrant_client = await get_qdrant_client()
-    collection = settings.get_collection_name()
-
-    # Use scroll to sample documents and extract doc_types
-    # Note: This could be optimized with a facet/aggregation query if Qdrant adds support
-    try:
-        scroll_results, _next_offset = await qdrant_client.scroll(
-            collection_name=collection,
-            scroll_filter=Filter(
-                must=[FieldCondition(key="user_id", match=MatchValue(value=user_id))]
-            ),
-            limit=1000,  # Sample size to discover types
-            with_payload=["doc_type"],
-            with_vectors=False,  # Don't need vectors for type discovery
-        )
-
-        doc_types = {
-            point.payload.get("doc_type")
-            for point in scroll_results
-            if point.payload.get("doc_type")
-        }
-
-        logger.debug(f"Found indexed document types for user {user_id}: {doc_types}")
-        return doc_types
-
-    except Exception as e:
-        logger.warning(f"Failed to query Qdrant for doc_types: {e}")
-        return set()
-
-
-@dataclass
-class SearchResult:
-    """A single search result with metadata and score.
-
-    Attributes:
-        id: Document ID
-        doc_type: Document type (note, file, calendar, contact, etc.)
-        title: Document title
-        excerpt: Content excerpt showing match context
-        score: Relevance score (≥ 0.0, higher is better)
-            - RRF fusion: scores in [0.0, 1.0]
-            - DBSF fusion: scores can exceed 1.0 (sum of normalized scores)
-        metadata: Additional algorithm-specific metadata
-        chunk_start_offset: Character position where chunk starts (None if not available)
-        chunk_end_offset: Character position where chunk ends (None if not available)
-    """
-
-    id: int
-    doc_type: str
-    title: str
-    excerpt: str
-    score: float
-    metadata: dict[str, Any] | None = None
-    chunk_start_offset: int | None = None
-    chunk_end_offset: int | None = None
-
-    def __post_init__(self):
-        """Validate score is non-negative.
-
-        Note: Different fusion methods produce different score ranges:
-        - RRF (Reciprocal Rank Fusion): Bounded to [0.0, 1.0]
-        - DBSF (Distribution-Based Score Fusion): Unbounded (can exceed 1.0)
-          DBSF sums normalized scores from multiple systems, so scores can be
-          1.5, 2.0, etc. when multiple systems agree a document is highly relevant.
-        """
-        if self.score < 0.0:
-            raise ValueError(f"Score must be non-negative, got {self.score}")
-
-
-class SearchAlgorithm(ABC):
-    """Abstract base class for search algorithms.
-
-    All search algorithms must implement the search() method with consistent
-    interface, allowing them to be used interchangeably.
-    """
-
-    @abstractmethod
-    async def search(
-        self,
-        query: str,
-        user_id: str,
-        limit: int = 10,
-        doc_type: str | None = None,
-        **kwargs: Any,
-    ) -> list[SearchResult]:
-        """Execute search with the given parameters.
-
-        Args:
-            query: Search query string
-            user_id: User ID for multi-tenant filtering
-            limit: Maximum number of results to return
-            doc_type: Optional document type filter (note, file, calendar, etc.)
-            **kwargs: Algorithm-specific parameters
-
-        Returns:
-            List of SearchResult objects ranked by relevance
-
-        Raises:
-            McpError: If search fails or configuration is invalid
-        """
-        pass
-
-    @property
-    @abstractmethod
-    def name(self) -> str:
-        """Return algorithm name for identification."""
-        pass
-
-    @property
-    def supports_scoring(self) -> bool:
-        """Whether this algorithm provides meaningful relevance scores.
-
-        Default: True. Override if algorithm doesn't support scoring.
-        """
-        return True
-
-    @property
-    def requires_vector_db(self) -> bool:
-        """Whether this algorithm requires vector database.
-
-        Default: False. Override for semantic search.
-        """
-        return False
@@ -1,223 +0,0 @@
-"""BM25 hybrid search algorithm using Qdrant native RRF fusion."""
-
-import logging
-from typing import Any
-
-from qdrant_client import models
-from qdrant_client.models import FieldCondition, Filter, MatchValue
-
-from nextcloud_mcp_server.config import get_settings
-from nextcloud_mcp_server.embedding import get_bm25_service, get_embedding_service
-from nextcloud_mcp_server.observability.metrics import record_qdrant_operation
-from nextcloud_mcp_server.search.algorithms import SearchAlgorithm, SearchResult
-from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
-
-logger = logging.getLogger(__name__)
-
-
-class BM25HybridSearchAlgorithm(SearchAlgorithm):
-    """
-    Hybrid search combining dense semantic vectors with BM25 sparse vectors.
-
-    Uses Qdrant's native Reciprocal Rank Fusion (RRF) to automatically merge
-    results from both dense (semantic) and sparse (BM25 keyword) searches.
-    This provides the best of both worlds: semantic understanding for conceptual
-    queries and precise keyword matching for specific terms, acronyms, and codes.
-
-    The fusion happens efficiently in the database using the prefetch mechanism,
-    eliminating the need for application-layer result merging.
-    """
-
-    def __init__(self, score_threshold: float = 0.0, fusion: str = "rrf"):
-        """
-        Initialize BM25 hybrid search algorithm.
-
-        Args:
-            score_threshold: Minimum fusion score (default: 0.0, i.e. no filtering)
-                           Note: RRF scores fall in [0, 1]; DBSF scores can exceed 1.0
-            fusion: Fusion algorithm to use: "rrf" (Reciprocal Rank Fusion, default)
-                   or "dbsf" (Distribution-Based Score Fusion)
-
-        Raises:
-            ValueError: If fusion is not "rrf" or "dbsf"
-        """
-        if fusion not in ("rrf", "dbsf"):
-            raise ValueError(
-                f"Invalid fusion algorithm '{fusion}'. Must be 'rrf' or 'dbsf'"
-            )
-
-        self.score_threshold = score_threshold
-        self.fusion = models.Fusion.RRF if fusion == "rrf" else models.Fusion.DBSF
-        self.fusion_name = fusion
-
-    @property
-    def name(self) -> str:
-        return "bm25_hybrid"
-
-    @property
-    def requires_vector_db(self) -> bool:
-        return True
-
-    async def search(
-        self,
-        query: str,
-        user_id: str,
-        limit: int = 10,
-        doc_type: str | None = None,
-        **kwargs: Any,
-    ) -> list[SearchResult]:
-        """
-        Execute hybrid search over dense + sparse vectors with native RRF or DBSF fusion.
-
-        Returns unverified results from Qdrant. Access verification should be
-        performed separately at the final output stage using verify_search_results().
-
-        Args:
-            query: Natural language or keyword search query
-            user_id: User ID for filtering
-            limit: Maximum results to return
-            doc_type: Optional document type filter
-            **kwargs: Additional parameters (score_threshold override)
-
-        Returns:
-            List of unverified SearchResult objects ranked by fusion score (RRF or DBSF)
-
-        Raises:
-            McpError: If vector sync is not enabled or search fails
-        """
-        settings = get_settings()
-        score_threshold = kwargs.get("score_threshold", self.score_threshold)
-
-        logger.info(
-            f"BM25 hybrid search: query='{query}', user={user_id}, "
-            f"limit={limit}, score_threshold={score_threshold}, doc_type={doc_type}, "
-            f"fusion={self.fusion_name}"
-        )
-
-        # Generate dense embedding for semantic search
-        embedding_service = get_embedding_service()
-        dense_embedding = await embedding_service.embed(query)
-        logger.debug(f"Generated dense embedding (dimension={len(dense_embedding)})")
-
-        # Generate sparse embedding for BM25 keyword search
-        bm25_service = get_bm25_service()
-        sparse_embedding = bm25_service.encode(query)
-        logger.debug(
-            f"Generated sparse embedding "
-            f"({len(sparse_embedding['indices'])} non-zero terms)"
-        )
-
-        # Build Qdrant filter
-        filter_conditions = [
-            FieldCondition(
-                key="user_id",
-                match=MatchValue(value=user_id),
-            )
-        ]
-
-        # Add doc_type filter if specified
-        if doc_type:
-            filter_conditions.append(
-                FieldCondition(
-                    key="doc_type",
-                    match=MatchValue(value=doc_type),
-                )
-            )
-
-        query_filter = Filter(must=filter_conditions)
-
-        # Execute hybrid search with Qdrant native RRF fusion
-        qdrant_client = await get_qdrant_client()
-        try:
-            # Use prefetch to run both dense and sparse searches
-            # Qdrant merges the results with the configured fusion (RRF or DBSF)
-            search_response = await qdrant_client.query_points(
-                collection_name=settings.get_collection_name(),
-                prefetch=[
-                    # Dense semantic search
-                    models.Prefetch(
-                        query=dense_embedding,
-                        using="dense",
-                        limit=limit * 2,  # Get extra for deduplication
-                        filter=query_filter,
-                    ),
-                    # Sparse BM25 search
-                    models.Prefetch(
-                        query=models.SparseVector(
-                            indices=sparse_embedding["indices"],
-                            values=sparse_embedding["values"],
-                        ),
-                        using="sparse",
-                        limit=limit * 2,  # Get extra for deduplication
-                        filter=query_filter,
-                    ),
-                ],
-                # Fusion query (RRF or DBSF based on initialization)
-                query=models.FusionQuery(fusion=self.fusion),
-                limit=limit * 2,  # Get extra for deduplication
-                score_threshold=score_threshold,
-                with_payload=True,
-                with_vectors=False,  # Don't return vectors to save bandwidth
-            )
-            record_qdrant_operation("search", "success")
-        except Exception:
-            record_qdrant_operation("search", "error")
-            raise
-
-        logger.info(
-            f"Qdrant {self.fusion_name.upper()} fusion returned {len(search_response.points)} results "
-            f"(before deduplication)"
-        )
-
-        if search_response.points:
-            # Log top 3 fusion scores to help with threshold tuning
-            top_scores = [p.score for p in search_response.points[:3]]
-            logger.debug(
-                f"Top 3 {self.fusion_name.upper()} fusion scores: {top_scores}"
-            )
-
-        # Deduplicate by (doc_id, doc_type) - multiple chunks per document
-        seen_docs = set()
-        results = []
-
-        for result in search_response.points:
-            doc_id = int(result.payload["doc_id"])
-            doc_type = result.payload.get("doc_type", "note")
-            doc_key = (doc_id, doc_type)
-
-            # Skip if we've already seen this document
-            if doc_key in seen_docs:
-                continue
-
-            seen_docs.add(doc_key)
-
-            # Return unverified results (verification happens at output stage)
-            results.append(
-                SearchResult(
-                    id=doc_id,
-                    doc_type=doc_type,
-                    title=result.payload.get("title", "Untitled"),
-                    excerpt=result.payload.get("excerpt", ""),
-                    score=result.score,  # Fusion score (RRF or DBSF)
-                    metadata={
-                        "chunk_index": result.payload.get("chunk_index"),
-                        "total_chunks": result.payload.get("total_chunks"),
-                        "search_method": f"bm25_hybrid_{self.fusion_name}",
-                    },
-                    chunk_start_offset=result.payload.get("chunk_start_offset"),
-                    chunk_end_offset=result.payload.get("chunk_end_offset"),
-                )
-            )
-
-            if len(results) >= limit:
-                break
-
-        logger.info(f"Returning {len(results)} unverified results after deduplication")
-        if results:
-            result_details = [
-                f"{r.doc_type}_{r.id} (score={r.score:.3f}, title='{r.title}')"
-                for r in results[:5]  # Show top 5
-            ]
-            logger.debug(f"Top results: {', '.join(result_details)}")
-
-        return results
@@ -1,169 +0,0 @@
-"""Semantic search algorithm using vector similarity (Qdrant)."""
-
-import logging
-from typing import Any
-
-from qdrant_client.models import FieldCondition, Filter, MatchValue
-
-from nextcloud_mcp_server.config import get_settings
-from nextcloud_mcp_server.embedding import get_embedding_service
-from nextcloud_mcp_server.observability.metrics import record_qdrant_operation
-from nextcloud_mcp_server.search.algorithms import SearchAlgorithm, SearchResult
-from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client
-
-logger = logging.getLogger(__name__)
-
-
-class SemanticSearchAlgorithm(SearchAlgorithm):
-    """Semantic search using vector similarity in Qdrant.
-
-    Searches documents by meaning rather than exact keywords using dense
-    embeddings (dimension set by the embedding provider) and cosine distance.
-    """
-
-    def __init__(self, score_threshold: float = 0.7):
-        """Initialize semantic search algorithm.
-
-        Args:
-            score_threshold: Minimum similarity score (0-1, default: 0.7)
-        """
-        self.score_threshold = score_threshold
-
-    @property
-    def name(self) -> str:
-        return "semantic"
-
-    @property
-    def requires_vector_db(self) -> bool:
-        return True
-
-    async def search(
-        self,
-        query: str,
-        user_id: str,
-        limit: int = 10,
-        doc_type: str | None = None,
-        **kwargs: Any,
-    ) -> list[SearchResult]:
-        """Execute semantic search using vector similarity.
-
-        Returns unverified results from Qdrant. Access verification should be
-        performed separately at the final output stage using verify_search_results().
-
-        Args:
-            query: Natural language search query
-            user_id: User ID for filtering
-            limit: Maximum results to return
-            doc_type: Optional document type filter
-            **kwargs: Additional parameters (score_threshold override)
-
-        Returns:
-            List of unverified SearchResult objects ranked by similarity score
-
-        Raises:
-            McpError: If vector sync is not enabled or search fails
-        """
-        settings = get_settings()
-        score_threshold = kwargs.get("score_threshold", self.score_threshold)
-
-        logger.info(
-            f"Semantic search: query='{query}', user={user_id}, "
-            f"limit={limit}, score_threshold={score_threshold}, doc_type={doc_type}"
-        )
-
-        # Generate embedding for query
-        embedding_service = get_embedding_service()
-        query_embedding = await embedding_service.embed(query)
-        logger.debug(
-            f"Generated embedding for query (dimension={len(query_embedding)})"
-        )
-
-        # Build Qdrant filter
-        filter_conditions = [
-            FieldCondition(
-                key="user_id",
-                match=MatchValue(value=user_id),
-            )
-        ]
-
-        # Add doc_type filter if specified
-        if doc_type:
-            filter_conditions.append(
-                FieldCondition(
-                    key="doc_type",
-                    match=MatchValue(value=doc_type),
-                )
-            )
-
-        # Search Qdrant
-        qdrant_client = await get_qdrant_client()
-        try:
-            search_response = await qdrant_client.query_points(
-                collection_name=settings.get_collection_name(),
-                query=query_embedding,
-                using="dense",  # Use named dense vector (BM25 hybrid collections)
-                query_filter=Filter(must=filter_conditions),
-                limit=limit * 2,  # Get extra for deduplication
-                score_threshold=score_threshold,
-                with_payload=True,
-                with_vectors=False,  # Don't return vectors to save bandwidth
-            )
-            record_qdrant_operation("search", "success")
-        except Exception:
-            record_qdrant_operation("search", "error")
-            raise
-
-        logger.info(
-            f"Qdrant returned {len(search_response.points)} results "
-            f"(before deduplication)"
-        )
-
-        if search_response.points:
-            # Log top 3 scores to help with threshold tuning
-            top_scores = [p.score for p in search_response.points[:3]]
-            logger.debug(f"Top 3 similarity scores: {top_scores}")
-
-        # Deduplicate by (doc_id, doc_type) - multiple chunks per document
-        seen_docs = set()
-        results = []
-
-        for result in search_response.points:
-            doc_id = int(result.payload["doc_id"])
-            doc_type = result.payload.get("doc_type", "note")
-            doc_key = (doc_id, doc_type)
-
-            # Skip if we've already seen this document
-            if doc_key in seen_docs:
-                continue
-
-            seen_docs.add(doc_key)
-
-            # Return unverified results (verification happens at output stage)
-            results.append(
-                SearchResult(
-                    id=doc_id,
-                    doc_type=doc_type,
-                    title=result.payload.get("title", "Untitled"),
-                    excerpt=result.payload.get("excerpt", ""),
-                    score=result.score,
-                    metadata={
-                        "chunk_index": result.payload.get("chunk_index"),
-                        "total_chunks": result.payload.get("total_chunks"),
-                    },
-                    chunk_start_offset=result.payload.get("chunk_start_offset"),
-                    chunk_end_offset=result.payload.get("chunk_end_offset"),
-                )
-            )
-
-            if len(results) >= limit:
-                break
-
-        logger.info(f"Returning {len(results)} unverified results after deduplication")
-        if results:
-            result_details = [
-                f"{r.doc_type}_{r.id} (score={r.score:.3f}, title='{r.title}')"
-                for r in results[:5]  # Show top 5
-            ]
-            logger.debug(f"Top results: {', '.join(result_details)}")
-
-        return results
@@ -2,8 +2,7 @@

 import logging

-import anyio
-from httpx import RequestError
+from httpx import HTTPStatusError, RequestError
 from mcp.server.fastmcp import Context, FastMCP
 from mcp.shared.exceptions import McpError
 from mcp.types import (
@@ -24,8 +23,8 @@ from nextcloud_mcp_server.models.semantic import (
 )
 from nextcloud_mcp_server.observability.metrics import (
    instrument_tool,
+    record_qdrant_operation,
 )
-from nextcloud_mcp_server.search.bm25_hybrid import BM25HybridSearchAlgorithm

 logger = logging.getLogger(__name__)

@@ -37,160 +36,187 @@ def configure_semantic_tools(mcp: FastMCP):
    @require_scopes("semantic:read")
    @instrument_tool
    async def nc_semantic_search(
-        query: str,
-        ctx: Context,
-        limit: int = 10,
-        doc_types: list[str] | None = None,
-        score_threshold: float = 0.0,
-        fusion: str = "rrf",
+        query: str, ctx: Context, limit: int = 10, score_threshold: float = 0.7
    ) -> SemanticSearchResponse:
        """
-        Search Nextcloud content using BM25 hybrid search with cross-app support.
+        Semantic search across all indexed Nextcloud apps using vector embeddings.

-        Uses Qdrant's native hybrid search combining:
-        - Dense semantic vectors: For conceptual similarity and natural language queries
-        - BM25 sparse vectors: For precise keyword matching, acronyms, and specific terms
-
-        Results are automatically fused using the selected fusion algorithm in the
-        database for optimal relevance. This provides the best of both semantic
-        understanding and keyword precision.
-
-        Requires VECTOR_SYNC_ENABLED=true. Currently only "note" documents are
-        fully supported for indexing.
+        Searches documents by meaning rather than exact keywords. Currently only notes
+        (doc_type="note") are searched; other apps are not yet included. Requires vector
+        database synchronization to be enabled (VECTOR_SYNC_ENABLED=true).

        Args:
-            query: Natural language or keyword search query
+            query: Natural language search query
            limit: Maximum number of results to return (default: 10)
-            doc_types: Document types to search (e.g., ["note", "file"]). None = search all indexed types (default)
-            score_threshold: Minimum fusion score (0-1, default: 0.0)
-            fusion: Fusion algorithm: "rrf" (Reciprocal Rank Fusion, default) or "dbsf" (Distribution-Based Score Fusion)
-                   RRF: Good general-purpose fusion using reciprocal ranks
-                   DBSF: Uses distribution-based normalization, may better balance different score ranges
+            score_threshold: Minimum similarity score (0-1, default: 0.7)

        Returns:
-            SemanticSearchResponse with matching documents ranked by fusion scores
+            SemanticSearchResponse with matching documents and similarity scores
        """
+        from qdrant_client.models import FieldCondition, Filter, MatchValue
+
        from nextcloud_mcp_server.config import get_settings
+        from nextcloud_mcp_server.embedding import get_embedding_service
+        from nextcloud_mcp_server.vector.qdrant_client import get_qdrant_client

        settings = get_settings()
-        client = await get_client(ctx)
-        username = client.username

-        logger.info(
-            f"BM25 hybrid search: query='{query}', user={username}, "
-            f"limit={limit}, score_threshold={score_threshold}, fusion={fusion}"
-        )
-
-        # Check that vector sync is enabled
+        # Check if vector sync is enabled
        if not settings.vector_sync_enabled:
            raise McpError(
                ErrorData(
                    code=-1,
-                    message="BM25 hybrid search requires VECTOR_SYNC_ENABLED=true",
+                    message="Semantic search is not enabled. Set VECTOR_SYNC_ENABLED=true and ensure vector database is configured.",
                )
            )

+        client = await get_client(ctx)
+        username = client.username
+
+        logger.info(
+            f"Semantic search: query='{query}', user={username}, "
+            f"limit={limit}, score_threshold={score_threshold}"
+        )
+
        try:
-            # Create BM25 hybrid search algorithm with specified fusion
-            search_algo = BM25HybridSearchAlgorithm(
-                score_threshold=score_threshold, fusion=fusion
+            # Generate embedding for query
+            embedding_service = get_embedding_service()
+            query_embedding = await embedding_service.embed(query)
+            logger.debug(
+                f"Generated embedding for query (dimension={len(query_embedding)})"
            )

-            # Execute search across requested document types
-            # If doc_types is None, search all indexed types (cross-app search)
-            # If doc_types is a list, search only those types
-            all_results = []
-
-            if doc_types is None:
-                # Cross-app search: search all indexed types
-                # Get unverified results from Qdrant
-                unverified_results = await search_algo.search(
-                    query=query,
-                    user_id=username,
-                    limit=limit * 2,  # Get extra for access filtering
-                    doc_type=None,  # Signal to search all types
+            # Search Qdrant with user filtering
+            # Note: Currently only searching notes (doc_type="note")
+            # Future: Remove doc_type filter to search all apps
+            qdrant_client = await get_qdrant_client()
+            try:
+                search_response = await qdrant_client.query_points(
+                    collection_name=settings.get_collection_name(),
+                    query=query_embedding,
+                    query_filter=Filter(
+                        must=[
+                            FieldCondition(
+                                key="user_id",
+                                match=MatchValue(value=username),
+                            ),
+                            FieldCondition(
+                                key="doc_type",
+                                match=MatchValue(value="note"),
+                            ),
+                        ]
+                    ),
+                    limit=limit * 2,  # Get extra for filtering
                    score_threshold=score_threshold,
+                    with_payload=True,
+                    with_vectors=False,  # Don't return vectors to save bandwidth
                )
-                all_results.extend(unverified_results)
-            else:
-                # Search specific document types
-                # For each requested type, execute search and combine results
-                for dtype in doc_types:
-                    unverified_results = await search_algo.search(
-                        query=query,
-                        user_id=username,
-                        limit=limit * 2,  # Get extra for combining and filtering
-                        doc_type=dtype,
-                        score_threshold=score_threshold,
-                    )
-                    all_results.extend(unverified_results)
+                # Record successful search operation
+                record_qdrant_operation("search", "success")
+            except Exception:
+                # Record failed search operation
+                record_qdrant_operation("search", "error")
+                raise

-                # Sort combined results by score
-                all_results.sort(key=lambda r: r.score, reverse=True)
+            logger.info(
+                f"Qdrant returned {len(search_response.points)} results "
+                f"(before deduplication and access verification)"
+            )
+            if search_response.points:
+                # Log top 3 scores to help with threshold tuning
+                top_scores = [p.score for p in search_response.points[:3]]
+                logger.debug(f"Top 3 similarity scores: {top_scores}")

-            # Deduplicate results (hybrid search may return same doc from dense + sparse)
-            # Qdrant already filters by user_id for multi-tenant isolation
-            # Sampling tool will verify access when fetching full content
-            seen = set()
-            unique_results = []
-            for result in all_results:
-                key = (result.id, result.doc_type)
-                if key not in seen:
-                    seen.add(key)
-                    unique_results.append(result)
-
-            search_results = unique_results[:limit]  # Final limit after deduplication
-
-            # Convert SearchResult objects to SemanticSearchResult for response
+            # Deduplicate by document ID (multiple chunks per document)
+            seen_doc_ids = set()
            results = []
-            for r in search_results:
-                results.append(
-                    SemanticSearchResult(
-                        id=r.id,
-                        doc_type=r.doc_type,
-                        title=r.title,
-                        category=r.metadata.get("category", "") if r.metadata else "",
-                        excerpt=r.excerpt,
-                        score=r.score,
-                        chunk_index=r.metadata.get("chunk_index", 0)
-                        if r.metadata
-                        else 0,
-                        total_chunks=r.metadata.get("total_chunks", 1)
-                        if r.metadata
-                        else 1,
-                        chunk_start_offset=r.chunk_start_offset,
-                        chunk_end_offset=r.chunk_end_offset,
-                    )
-                )

-            logger.info(f"Returning {len(results)} results from BM25 hybrid search")
+            for result in search_response.points:
+                doc_id = int(result.payload["doc_id"])
+                doc_type = result.payload.get("doc_type", "note")
+
+                # Skip if we've already seen this document
+                if doc_id in seen_doc_ids:
+                    continue
+
+                seen_doc_ids.add(doc_id)
+
+                # Verify access via Nextcloud API (dual-phase authorization)
+                # Only notes are supported for now; other apps will be added later
+                if doc_type == "note":
+                    try:
+                        note = await client.notes.get_note(doc_id)
+
+                        results.append(
+                            SemanticSearchResult(
+                                id=doc_id,
+                                doc_type="note",
+                                title=result.payload["title"],
+                                category=note.get("category", ""),
+                                excerpt=result.payload["excerpt"],
+                                score=result.score,
+                                chunk_index=result.payload["chunk_index"],
+                                total_chunks=result.payload["total_chunks"],
+                            )
+                        )
+
+                        if len(results) >= limit:
+                            break
+
+                    except HTTPStatusError as e:
+                        if e.response.status_code == 403:
+                            # User lost access, skip this document
+                            logger.debug(f"Skipping note {doc_id}: access denied (403)")
+                            continue
+                        elif e.response.status_code == 404:
+                            # Document was deleted but not yet removed from vector DB
+                            logger.debug(
+                                f"Skipping note {doc_id}: not found (404), "
+                                f"likely deleted after indexing"
+                            )
+                            continue
+                        else:
+                            # Log other errors but continue processing
+                            logger.warning(
+                                f"Error verifying access to note {doc_id}: {e.response.status_code}"
+                            )
+                            continue
+
+            logger.info(
+                f"Returning {len(results)} results after deduplication and access verification"
+            )
+            if results:
+                result_details = [
+                    f"note_{r.id} (score={r.score:.3f}, title='{r.title}')"
+                    for r in results[:5]  # Show top 5
+                ]
+                logger.debug(f"Top results: {', '.join(result_details)}")

            return SemanticSearchResponse(
                results=results,
                query=query,
                total_found=len(results),
-                search_method=f"bm25_hybrid_{fusion}",
+                search_method="semantic",
            )

        except ValueError as e:
-            error_msg = str(e)
-            if "No embedding provider configured" in error_msg:
+            if "No embedding provider configured" in str(e):
                raise McpError(
                    ErrorData(
                        code=-1,
                        message="Embedding service not configured. Set OLLAMA_BASE_URL environment variable.",
                    )
                )
-            raise McpError(
-                ErrorData(code=-1, message=f"Configuration error: {error_msg}")
-            )
+            raise McpError(ErrorData(code=-1, message=f"Configuration error: {str(e)}"))
        except RequestError as e:
            raise McpError(
                ErrorData(code=-1, message=f"Network error during search: {str(e)}")
            )
        except Exception as e:
-            logger.error(f"Search error: {e}", exc_info=True)
-            raise McpError(ErrorData(code=-1, message=f"Search failed: {str(e)}"))
+            logger.error(f"Semantic search error: {e}", exc_info=True)
+            raise McpError(
+                ErrorData(code=-1, message=f"Semantic search failed: {str(e)}")
+            )

    @mcp.tool()
    @require_scopes("semantic:read")
@@ -201,7 +227,6 @@ def configure_semantic_tools(mcp: FastMCP):
        limit: int = 5,
        score_threshold: float = 0.7,
        max_answer_tokens: int = 500,
-        fusion: str = "rrf",
    ) -> SamplingSearchResponse:
        """
        Semantic search with LLM-generated answer using MCP sampling.
@@ -226,7 +251,6 @@ def configure_semantic_tools(mcp: FastMCP):
            limit: Maximum number of documents to retrieve (default: 5)
            score_threshold: Minimum similarity score 0-1 (default: 0.7)
            max_answer_tokens: Maximum tokens for generated answer (default: 500)
-            fusion: Fusion algorithm: "rrf" (Reciprocal Rank Fusion, default) or "dbsf" (Distribution-Based Score Fusion)

        Returns:
            SamplingSearchResponse containing:
@@ -266,7 +290,6 @@ def configure_semantic_tools(mcp: FastMCP):
            ctx=ctx,
            limit=limit,
            score_threshold=score_threshold,
-            fusion=fusion,
        )

        # 2. Handle no results case - don't waste a sampling call
@@ -321,55 +344,35 @@ def configure_semantic_tools(mcp: FastMCP):
                success=True,
            )

-        # 4. Fetch full content for notes in parallel (also verifies access)
-        # Use anyio task group for concurrent fetching with semaphore to prevent
-        # connection pool exhaustion
+        # 4. Fetch full content for notes to provide complete context to LLM
+        # Filter out inaccessible notes (deleted or permissions changed)
        client = await get_client(ctx)
-        accessible_results = [None] * len(search_response.results)
-        full_contents = [None] * len(search_response.results)
+        accessible_results = []
+        full_contents = []  # Full content for accessible notes

-        # Limit concurrent requests to prevent connection pool exhaustion
-        max_concurrent = 20
-        semaphore = anyio.Semaphore(max_concurrent)
-
-        async def fetch_content(index: int, result: SemanticSearchResult):
-            """Fetch full content for a single document (parallel with semaphore)."""
-            async with semaphore:
-                if result.doc_type == "note":
-                    try:
-                        note = await client.notes.get_note(result.id)
-                        # Note is accessible, store result and full content
-                        content = note.get("content", "")
-                        accessible_results[index] = result
-                        full_contents[index] = content
-                        logger.debug(
-                            f"Fetched full content for note {result.id} "
-                            f"(length: {len(content)} chars)"
-                        )
-                    except Exception as e:
-                        # Note might have been deleted or permissions changed
-                        # Leave as None to filter out later
-                        logger.debug(
-                            f"Note {result.id} not accessible: {e}. "
-                            f"Excluding from results."
-                        )
-                else:
-                    # Non-note document types (future: calendar, deck, files)
-                    # For now, keep them with excerpts
-                    accessible_results[index] = result
-                    # full_contents[index] remains None (will use excerpt)
-
-        # Run all fetches in parallel using anyio task group
-        async with anyio.create_task_group() as tg:
-            for idx, result in enumerate(search_response.results):
-                tg.start_soon(fetch_content, idx, result)
-
-        # Filter out None (inaccessible notes) while preserving order
-        final_pairs = [
-            (r, c) for r, c in zip(accessible_results, full_contents) if r is not None
-        ]
-        accessible_results = [r for r, c in final_pairs]
-        full_contents = [c for r, c in final_pairs]
+        for result in search_response.results:
+            if result.doc_type == "note":
+                try:
+                    note = await client.notes.get_note(result.id)
+                    # Note is accessible, store full content
+                    accessible_results.append(result)
+                    full_contents.append(note.get("content", ""))
+                    logger.debug(
+                        f"Fetched full content for note {result.id} "
+                        f"(length: {len(full_contents[-1])} chars)"
+                    )
+                except Exception as e:
+                    # Note might have been deleted or permissions changed
+                    # Exclude it so the LLM is not given stale or inaccessible data
+                    logger.warning(
+                        f"Failed to fetch full content for note {result.id}: {e}. "
+                        f"Excluding from results."
+                    )
+            else:
+                # Non-note document types (future: calendar, deck, files)
+                # For now, keep them with excerpts
+                accessible_results.append(result)
+                full_contents.append(None)

        # Check if we filtered out all results
        if not accessible_results:
@@ -421,6 +424,7 @@ def configure_semantic_tools(mcp: FastMCP):
        )

        # 6. Request LLM completion via MCP sampling with timeout
+        import anyio

        try:
            with anyio.fail_after(30):
@@ -1,91 +1,51 @@
-"""Document chunking for large texts using LangChain text splitters."""
+"""Document chunking for large texts."""

 import logging
-from dataclasses import dataclass
-
-from langchain_text_splitters import MarkdownTextSplitter

 logger = logging.getLogger(__name__)


-@dataclass
-class ChunkWithPosition:
-    """A text chunk with its character position in the original document."""
-
-    text: str
-    start_offset: int  # Character position where chunk starts
-    end_offset: int  # Character position where chunk ends (exclusive)
-
-
 class DocumentChunker:
-    """Chunk large documents for optimal embedding using LangChain text splitters.
+    """Chunk large documents for optimal embedding."""

-    Uses MarkdownTextSplitter which is optimized for Markdown content like
-    Nextcloud Notes. Respects markdown structure (headers, code blocks, lists)
-    while maintaining semantic boundaries.
-    """
-
-    def __init__(self, chunk_size: int = 2048, overlap: int = 200):
+    def __init__(self, chunk_size: int = 512, overlap: int = 50):
        """
        Initialize document chunker.

        Args:
-            chunk_size: Number of characters per chunk (default: 2048)
-            overlap: Number of overlapping characters between chunks (default: 200)
+            chunk_size: Number of words per chunk (default: 512)
+            overlap: Number of overlapping words between chunks (default: 50)
        """
        self.chunk_size = chunk_size
        self.overlap = overlap

-        # Initialize LangChain MarkdownTextSplitter
-        # Optimized for Markdown content with special handling for:
-        # - Headers (# ## ###)
-        # - Code blocks (``` ```)
-        # - Lists (- * 1.)
-        # - Horizontal rules (---)
-        # - Paragraphs and sentences
-        # This preserves both markdown structure and semantic boundaries
-        self.splitter = MarkdownTextSplitter(
-            chunk_size=chunk_size,
-            chunk_overlap=overlap,
-            add_start_index=True,  # Enable position tracking
-            strip_whitespace=True,
-        )
-
-    def chunk_text(self, content: str) -> list[ChunkWithPosition]:
+    def chunk_text(self, content: str) -> list[str]:
        """
-        Split text into overlapping chunks with position tracking.
+        Split text into overlapping chunks.

-        Uses LangChain's MarkdownTextSplitter to create chunks that respect
-        both markdown structure and semantic boundaries. Optimized for Nextcloud
-        Notes content with special handling for headers, code blocks, lists, etc.
-        Preserves character positions for each chunk to enable precise document
-        retrieval.
+        Uses simple word-based chunking with configurable overlap to preserve
+        context across chunk boundaries.

        Args:
-            content: Markdown text content to chunk
+            content: Text content to chunk

        Returns:
-            List of chunks with their character positions in the original content
+            List of text chunks (may be single item if content is small)
        """
-        # Handle empty content - return single empty chunk for backward compatibility
-        if not content:
-            return [ChunkWithPosition(text="", start_offset=0, end_offset=0)]
+        # Simple word-based chunking
+        words = content.split()

-        # Use LangChain to create documents with position tracking
-        docs = self.splitter.create_documents([content])
+        if len(words) <= self.chunk_size:
+            return [content]

-        # Convert LangChain Documents to ChunkWithPosition objects
-        chunks = [
-            ChunkWithPosition(
-                text=doc.page_content,
-                start_offset=doc.metadata.get("start_index", 0),
-                end_offset=doc.metadata.get("start_index", 0) + len(doc.page_content),
-            )
-            for doc in docs
-        ]
+        chunks = []
+        start = 0

-        logger.debug(
-            f"Chunked document into {len(chunks)} chunks "
-            f"(chunk_size={self.chunk_size}, overlap={self.overlap})"
-        )
+        # Stop once a chunk has reached the last word (avoids a redundant tail chunk)
+        while start < len(words) - self.overlap:
+            end = start + self.chunk_size
+            chunks.append(" ".join(words[start:end]))
+            start = end - self.overlap
+
+        logger.debug(f"Chunked document into {len(chunks)} chunks ({len(words)} words)")
        return chunks
@@ -1,140 +0,0 @@
-"""Custom PCA implementation for dimensionality reduction.
-
-Implements Principal Component Analysis without scikit-learn dependency.
-Used for reducing high-dimensional embeddings (768-dim) to 2D for visualization.
-"""
-
-import logging
-
-import numpy as np
-
-logger = logging.getLogger(__name__)
-
-
-class PCA:
-    """Principal Component Analysis for dimensionality reduction.
-
-    Simple implementation that finds principal components via eigendecomposition
-    of the covariance matrix. Suitable for small-to-medium datasets.
-
-    Attributes:
-        n_components: Number of principal components to keep
-        mean_: Mean of training data (set during fit)
-        components_: Principal components (eigenvectors)
-        explained_variance_: Variance explained by each component
-        explained_variance_ratio_: Fraction of total variance explained
-    """
-
-    def __init__(self, n_components: int = 2):
-        """Initialize PCA.
-
-        Args:
-            n_components: Number of components to keep (default: 2)
-        """
-        if n_components < 1:
-            raise ValueError(f"n_components must be >= 1, got {n_components}")
-
-        self.n_components = n_components
-        self.mean_: np.ndarray | None = None
-        self.components_: np.ndarray | None = None
-        self.explained_variance_: np.ndarray | None = None
-        self.explained_variance_ratio_: np.ndarray | None = None
-
-    def fit(self, X: np.ndarray) -> "PCA":
-        """Fit PCA model to data.
-
-        Args:
-            X: Training data of shape (n_samples, n_features)
-
-        Returns:
-            self (for method chaining)
-
-        Raises:
-            ValueError: If X has fewer features than n_components
-        """
-        X = np.asarray(X)
-
-        if X.ndim != 2:
-            raise ValueError(f"X must be 2D array, got shape {X.shape}")
-
-        n_samples, n_features = X.shape
-
-        if n_features < self.n_components:
-            raise ValueError(
-                f"n_components={self.n_components} > n_features={n_features}"
-            )
-
-        # Center data
-        self.mean_ = np.mean(X, axis=0)
-        X_centered = X - self.mean_
-
-        # Compute covariance matrix
-        # Use (X^T X) / (n-1) for numerical stability with high-dim data
-        cov = np.cov(X_centered.T)
-
-        # Eigendecomposition
-        eigenvalues, eigenvectors = np.linalg.eigh(cov)
-
-        # Sort by eigenvalue (descending)
-        idx = np.argsort(eigenvalues)[::-1]
-        eigenvalues = eigenvalues[idx]
-        eigenvectors = eigenvectors[:, idx]
-
-        # Keep top n_components
-        self.components_ = eigenvectors[:, : self.n_components].T
-        self.explained_variance_ = eigenvalues[: self.n_components]
-
-        # Calculate explained variance ratio
-        total_variance = np.sum(eigenvalues)
-        if total_variance > 0:
-            self.explained_variance_ratio_ = self.explained_variance_ / total_variance
-        else:
-            self.explained_variance_ratio_ = np.zeros(self.n_components)
-
-        logger.debug(
-            f"PCA fit: {n_samples} samples, {n_features} features → "
-            f"{self.n_components} components, "
-            f"explained variance: {self.explained_variance_ratio_}"
-        )
-
-        return self
-
-    def transform(self, X: np.ndarray) -> np.ndarray:
-        """Transform data to principal component space.
-
-        Args:
-            X: Data to transform of shape (n_samples, n_features)
-
-        Returns:
-            Transformed data of shape (n_samples, n_components)
-
-        Raises:
-            ValueError: If PCA not fitted yet
-        """
-        if self.mean_ is None or self.components_ is None:
-            raise ValueError("PCA not fitted yet. Call fit() first.")
-
-        X = np.asarray(X)
-
-        if X.ndim != 2:
-            raise ValueError(f"X must be 2D array, got shape {X.shape}")
-
-        # Center using training mean
-        X_centered = X - self.mean_
-
-        # Project onto principal components
-        X_transformed = np.dot(X_centered, self.components_.T)
-
-        return X_transformed
-
-    def fit_transform(self, X: np.ndarray) -> np.ndarray:
-        """Fit PCA model and transform data in one step.
-
-        Args:
-            X: Training data of shape (n_samples, n_features)
-
-        Returns:
-            Transformed data of shape (n_samples, n_components)
-        """
-        self.fit(X)
-        return self.transform(X)
@@ -8,14 +8,13 @@ import time
 import uuid

 import anyio
-from anyio.abc import TaskStatus
 from anyio.streams.memory import MemoryObjectReceiveStream
 from httpx import HTTPStatusError
 from qdrant_client.models import FieldCondition, Filter, MatchValue, PointStruct

 from nextcloud_mcp_server.client import NextcloudClient
 from nextcloud_mcp_server.config import get_settings
-from nextcloud_mcp_server.embedding import get_bm25_service, get_embedding_service
+from nextcloud_mcp_server.embedding import get_embedding_service
 from nextcloud_mcp_server.observability.metrics import (
    record_qdrant_operation,
    record_vector_sync_processing,
@@ -35,8 +34,6 @@ async def processor_task(
    shutdown_event: anyio.Event,
    nc_client: NextcloudClient,
    user_id: str,
-    *,
-    task_status: TaskStatus = anyio.TASK_STATUS_IGNORED,
 ):
    """
    Process documents from stream concurrently.
@@ -56,13 +53,9 @@ async def processor_task(
        shutdown_event: Event signaling shutdown
        nc_client: Authenticated Nextcloud client
        user_id: User being processed
-        task_status: Status object for signaling task readiness
    """
    logger.info(f"Processor {worker_id} started")

-    # Signal that the task has started and is ready
-    task_status.started()
-
    while not shutdown_event.is_set():
        try:
            # Get document with timeout (allows checking shutdown)
@@ -233,24 +226,15 @@ async def _index_document(
    )
    chunks = chunker.chunk_text(content)

-    # Extract chunk texts for embedding
-    chunk_texts = [chunk.text for chunk in chunks]
-
-    # Generate dense embeddings (I/O bound - external API call)
+    # Generate embeddings (I/O bound - external API call)
    embedding_service = get_embedding_service()
-    dense_embeddings = await embedding_service.embed_batch(chunk_texts)
-
-    # Generate sparse embeddings (BM25 for keyword matching)
-    bm25_service = get_bm25_service()
-    sparse_embeddings = bm25_service.encode_batch(chunk_texts)
+    embeddings = await embedding_service.embed_batch(chunks)

    # Prepare Qdrant points
    indexed_at = int(time.time())
    points = []

-    for i, (chunk, dense_emb, sparse_emb) in enumerate(
-        zip(chunks, dense_embeddings, sparse_embeddings)
-    ):
+    for i, (chunk, embedding) in enumerate(zip(chunks, embeddings)):
        # Generate deterministic UUID for point ID
        # Using uuid5 with DNS namespace and combining doc info
        point_name = f"{doc_task.doc_type}:{doc_task.doc_id}:chunk:{i}"
@@ -259,24 +243,18 @@ async def _index_document(
        points.append(
            PointStruct(
                id=point_id,
-                vector={
-                    "dense": dense_emb,
-                    "sparse": sparse_emb,
-                },
+                vector=embedding,
                payload={
                    "user_id": doc_task.user_id,
                    "doc_id": doc_task.doc_id,
                    "doc_type": doc_task.doc_type,
                    "title": title,
-                    "excerpt": chunk.text[:200],
+                    "excerpt": chunk[:200],
                    "indexed_at": indexed_at,
                    "modified_at": doc_task.modified_at,
                    "etag": etag,
                    "chunk_index": i,
                    "total_chunks": len(chunks),
-                    "chunk_start_offset": chunk.start_offset,
-                    "chunk_end_offset": chunk.end_offset,
-                    "metadata_version": 2,  # v2 includes position metadata
                },
            )
        )
@@ -2,7 +2,7 @@

 import logging

-from qdrant_client import AsyncQdrantClient, models
+from qdrant_client import AsyncQdrantClient
 from qdrant_client.models import Distance, VectorParams

 from nextcloud_mcp_server.config import get_settings
@@ -84,12 +84,7 @@ async def get_qdrant_client() -> AsyncQdrantClient:
                f"Collection '{collection_name}' found, validating dimensions..."
            )
            collection_info = await _qdrant_client.get_collection(collection_name)
-            # Handle both named vectors (dict) and legacy single vector
-            vectors = collection_info.config.params.vectors
-            if isinstance(vectors, dict):
-                actual_dimension = vectors["dense"].size
-            else:
-                actual_dimension = vectors.size
+            actual_dimension = collection_info.config.params.vectors.size

            # Validate dimension matches
            if actual_dimension != expected_dimension:
@@ -117,27 +112,17 @@ async def get_qdrant_client() -> AsyncQdrantClient:
            )
            await _qdrant_client.create_collection(
                collection_name=collection_name,
-                vectors_config={
-                    "dense": VectorParams(
-                        size=expected_dimension,
-                        distance=Distance.COSINE,
-                    ),
-                },
-                sparse_vectors_config={
-                    "sparse": models.SparseVectorParams(
-                        index=models.SparseIndexParams(
-                            on_disk=False,
-                        )
-                    ),
-                },
+                vectors_config=VectorParams(
+                    size=expected_dimension,
+                    distance=Distance.COSINE,
+                ),
            )
            logger.info(
                f"Created Qdrant collection: {collection_name}\n"
-                f"  Dense vector dimension: {expected_dimension}\n"
-                f"  Dense embedding model: {settings.ollama_embedding_model}\n"
-                f"  Sparse vectors: BM25 (for hybrid search)\n"
+                f"  Dimension: {expected_dimension}\n"
+                f"  Model: {settings.ollama_embedding_model}\n"
                f"  Distance: COSINE\n"
-                f"Background sync will index all documents with dense + sparse vectors."
+                f"Background sync will index all documents with this embedding model."
            )

    return _qdrant_client
@@ -8,7 +8,6 @@ import time
 from dataclasses import dataclass

 import anyio
-from anyio.abc import TaskStatus
 from anyio.streams.memory import MemoryObjectSendStream
 from qdrant_client.models import FieldCondition, Filter, MatchValue

@@ -94,8 +93,6 @@ async def scanner_task(
    wake_event: anyio.Event,
    nc_client: NextcloudClient,
    user_id: str,
-    *,
-    task_status: TaskStatus = anyio.TASK_STATUS_IGNORED,
 ):
    """
    Periodic scanner that detects changed documents for enabled user.
@@ -108,14 +105,10 @@ async def scanner_task(
        wake_event: Event to trigger immediate scan
        nc_client: Authenticated Nextcloud client
        user_id: User to scan
-        task_status: Status object for signaling task readiness
    """
    logger.info(f"Scanner task started for user: {user_id}")
    settings = get_settings()

-    # Signal that the task has started and is ready
-    task_status.started()
-
    async with send_stream:
        while not shutdown_event.is_set():
            try:
@@ -182,43 +175,73 @@ async def scan_user_documents(
                f"[SCAN-{scan_id}] Using pruneBefore={prune_before} to optimize data transfer"
            )

-        # Get indexed state from Qdrant first (for incremental sync)
-        indexed_docs = {}
-        if not initial_sync:
-            qdrant_client = await get_qdrant_client()
-            scroll_result = await qdrant_client.scroll(
-                collection_name=get_settings().get_collection_name(),
-                scroll_filter=Filter(
-                    must=[
-                        FieldCondition(key="user_id", match=MatchValue(value=user_id)),
-                        FieldCondition(key="doc_type", match=MatchValue(value="note")),
-                    ]
-                ),
-                with_payload=["doc_id", "indexed_at"],
-                with_vectors=False,
-                limit=10000,
-            )
+        # Fetch all notes from Nextcloud
+        notes = [
+            note
+            async for note in nc_client.notes.get_all_notes(prune_before=prune_before)
+        ]
+        logger.info(f"[SCAN-{scan_id}] Found {len(notes)} notes for {user_id}")

-            indexed_docs = {
-                point.payload["doc_id"]: point.payload["indexed_at"]
-                for point in scroll_result[0]
-            }
+        # Record documents scanned
+        record_vector_sync_scan(len(notes))

-            logger.debug(f"Found {len(indexed_docs)} indexed documents in Qdrant")
+        if initial_sync:
+            # Send everything on first sync
+            for note in notes:
+                modified_at = note.get("modified", 0)
+                await send_stream.send(
+                    DocumentTask(
+                        user_id=user_id,
+                        doc_id=str(note["id"]),
+                        doc_type="note",
+                        operation="index",
+                        modified_at=modified_at,
+                    )
+                )
+            logger.info(f"Sent {len(notes)} documents for initial sync: {user_id}")
+            return

-        # Stream notes from Nextcloud and process immediately
-        note_count = 0
+        # Get indexed state from Qdrant
+        qdrant_client = await get_qdrant_client()
+        scroll_result = await qdrant_client.scroll(
+            collection_name=get_settings().get_collection_name(),
+            scroll_filter=Filter(
+                must=[
+                    FieldCondition(key="user_id", match=MatchValue(value=user_id)),
+                    FieldCondition(key="doc_type", match=MatchValue(value="note")),
+                ]
+            ),
+            with_payload=["doc_id", "indexed_at"],
+            with_vectors=False,
+            limit=10000,
+        )
+
+        indexed_docs = {
+            point.payload["doc_id"]: point.payload["indexed_at"]
+            for point in scroll_result[0]
+        }
+
+        logger.debug(f"Found {len(indexed_docs)} indexed documents in Qdrant")
+
+        # Compare and queue changes
        queued = 0
-        nextcloud_doc_ids = set()
+        nextcloud_doc_ids = {str(note["id"]) for note in notes}

-        async for note in nc_client.notes.get_all_notes(prune_before=prune_before):
-            note_count += 1
+        for note in notes:
            doc_id = str(note["id"])
-            nextcloud_doc_ids.add(doc_id)
+            indexed_at = indexed_docs.get(doc_id)
            modified_at = note.get("modified", 0)

-            if initial_sync:
-                # Send everything on first sync
+            # If document reappeared, remove from potentially_deleted
+            doc_key = (user_id, doc_id)
+            if doc_key in _potentially_deleted:
+                logger.debug(
+                    f"Document {doc_id} reappeared, removing from deletion grace period"
+                )
+                del _potentially_deleted[doc_key]
+
+            # Send if never indexed or modified since last index
+            if indexed_at is None or modified_at > indexed_at:
                await send_stream.send(
                    DocumentTask(
                        user_id=user_id,
@@ -229,38 +252,6 @@ async def scan_user_documents(
                    )
                )
                queued += 1
-            else:
-                # Incremental sync: compare with indexed state
-                indexed_at = indexed_docs.get(doc_id)
-
-                # If document reappeared, remove from potentially_deleted
-                doc_key = (user_id, doc_id)
-                if doc_key in _potentially_deleted:
-                    logger.debug(
-                        f"Document {doc_id} reappeared, removing from deletion grace period"
-                    )
-                    del _potentially_deleted[doc_key]
-
-                # Send if never indexed or modified since last index
-                if indexed_at is None or modified_at > indexed_at:
-                    await send_stream.send(
-                        DocumentTask(
-                            user_id=user_id,
-                            doc_id=doc_id,
-                            doc_type="note",
-                            operation="index",
-                            modified_at=modified_at,
-                        )
-                    )
-                    queued += 1
-
-        # Log and record metrics after streaming
-        logger.info(f"[SCAN-{scan_id}] Found {note_count} notes for {user_id}")
-        record_vector_sync_scan(note_count)
-
-        if initial_sync:
-            logger.info(f"Sent {queued} documents for initial sync: {user_id}")
-            return

        # Check for deleted documents (in Qdrant but not in Nextcloud)
        # Use grace period: only delete after 2 consecutive scans confirm absence
@@ -1,6 +1,6 @@
 [project]
 name = "nextcloud-mcp-server"
-version = "0.42.0"
+version = "0.34.1"
 description = "Model Context Protocol (MCP) server for Nextcloud integration - enables AI assistants to interact with Nextcloud data"
 authors = [
    {name = "Chris Coutinho", email = "chris@coutinho.io"}
@@ -12,7 +12,7 @@ keywords = ["nextcloud", "mcp", "model-context-protocol", "llm", "ai", "claude",
 dependencies = [
    "mcp[cli] (>=1.21,<1.22)",
    "httpx (>=0.28.1,<0.29.0)",
-    "pillow (>=10.3.0,<12.0.0)", # Compatible with fastembed
+    "pillow (>=12.0.0,<12.1.0)",
    "icalendar (>=6.0.0,<7.0.0)",
    "pythonvcard4>=0.2.0",
    "pydantic>=2.11.4",
@@ -22,9 +22,6 @@ dependencies = [
    "aiosqlite>=0.20.0", # Async SQLite for refresh token storage
    "authlib>=1.6.5",
    "qdrant-client>=1.7.0",
-    "fastembed>=0.7.3", # BM25 sparse vector embeddings for hybrid search
-    "anthropic>=0.42.0", # For RAG evaluation with Anthropic LLMs
-    "boto3>=1.35.0", # For Amazon Bedrock provider (optional)
    # Observability dependencies
    "prometheus-client>=0.21.0", # Prometheus metrics
    "opentelemetry-api>=1.28.2", # OpenTelemetry API
@@ -34,8 +31,6 @@ dependencies = [
    "opentelemetry-instrumentation-logging>=0.49b2", # Logging integration
    "opentelemetry-exporter-otlp-proto-grpc>=1.28.2", # OTLP gRPC exporter
    "python-json-logger>=3.2.0", # Structured JSON logging
-    "jinja2>=3.1.6",
-    "langchain-text-splitters>=1.0.0",
 ]
 classifiers = [
    "Development Status :: 4 - Beta",
@@ -108,7 +103,6 @@ module-root = ""
 [dependency-groups]
 dev = [
    "commitizen>=4.8.2",
-    "datasets>=3.3.0", # For BeIR nfcorpus dataset loading
    "ipython>=9.2.0",
    "playwright>=1.49.1",
    "pytest>=8.3.5",
@@ -9,7 +9,6 @@ import pytest
 from httpx import HTTPStatusError
 from mcp import ClientSession
 from mcp.client.session import RequestContext
-from mcp.client.sse import sse_client
 from mcp.client.streamable_http import streamablehttp_client
 from mcp.types import ElicitRequestParams, ElicitResult, ErrorData

@@ -166,51 +165,6 @@ async def create_mcp_client_session(
    logger.debug(f"{client_name} client session cleaned up successfully")


-async def create_mcp_client_session_sse(
-    url: str,
-    token: str | None = None,
-    client_name: str = "MCP",
-    elicitation_callback: Any = None,
-) -> AsyncGenerator[ClientSession, Any]:
-    """
-    Factory function to create an MCP client session using SSE transport.
-
-    Similar to create_mcp_client_session but uses SSE transport instead of streamable-http.
-    Uses native async context managers to ensure correct LIFO cleanup order.
-
-    Args:
-        url: MCP server URL (e.g., "http://localhost:8000/sse")
-        token: Optional OAuth access token for Bearer authentication
-        client_name: Client name for logging (e.g., "Basic MCP (SSE)")
-        elicitation_callback: Optional callback for handling elicitation requests
-
-    Yields:
-        Initialized MCP ClientSession
-
-    Note:
-        SSE transport is being deprecated in favor of streamable-http.
-        This function exists for compatibility testing only.
-    """
-    logger.info(f"Creating SSE client for {client_name}")
-
-    # Prepare headers with OAuth token if provided
-    headers = {"Authorization": f"Bearer {token}"} if token else None
-
-    # Use native async with - Python ensures LIFO cleanup
-    # Cleanup order will be: ClientSession.__aexit__ -> sse_client.__aexit__
-    # Note: sse_client yields only (read_stream, write_stream), not 3 values like streamablehttp_client
-    async with sse_client(url, headers=headers) as (read_stream, write_stream):
-        async with ClientSession(
-            read_stream, write_stream, elicitation_callback=elicitation_callback
-        ) as session:
-            await session.initialize()
-            logger.info(f"{client_name} client session initialized successfully")
-            yield session
-
-    # Cleanup happens automatically in LIFO order - no exception suppression needed
-    logger.debug(f"{client_name} client session cleaned up successfully")
-
-
 @pytest.fixture(scope="session")
 async def nc_client(anyio_backend) -> AsyncGenerator[NextcloudClient, Any]:
    """
@@ -249,21 +203,12 @@ async def nc_client(anyio_backend) -> AsyncGenerator[NextcloudClient, Any]:
 @pytest.fixture(scope="session")
 async def nc_mcp_client(anyio_backend) -> AsyncGenerator[ClientSession, Any]:
    """
-    Fixture to create an MCP client session for integration tests using SSE transport.
+    Fixture to create an MCP client session for integration tests using streamable-http.

    Uses anyio pytest plugin for proper async fixture handling.
-
-    Note: SSE transport is being deprecated. This fixture uses SSE for compatibility testing.
    """
-
-    # async for session in create_mcp_client_session_sse(
-    # url="http://localhost:8000/sse", client_name="Basic MCP (SSE)"
-    # ):
-    # yield session
-
    async for session in create_mcp_client_session(
-        url="http://localhost:8000/mcp",
-        client_name="Basic MCP (HTTP)",
+        url="http://localhost:8000/mcp", client_name="Basic MCP"
    ):
        yield session

@@ -1,278 +0,0 @@
-# RAG Evaluation Tests
-
-This directory contains tests for evaluating the Retrieval-Augmented Generation (RAG) system in the Nextcloud MCP server, specifically the `nc_semantic_search_answer` tool.
-
-## Architecture
-
-The RAG system has two components that are tested independently:
-
-1. **Retrieval** - Vector sync/embedding pipeline (indexed Nextcloud documents → vector database)
-2. **Generation** - MCP client LLM synthesis (retrieved context → natural language answer)
-
-See [ADR-013](../../docs/ADR-013-rag-evaluation.md) for full architectural details.
-
-## Test Structure
-
-```
-tests/rag_evaluation/
-├── README.md                       # This file
-├── conftest.py                     # Pytest fixtures
-├── llm_providers.py                # LLM provider abstraction (Ollama/Anthropic)
-├── fixtures/
-│   └── ground_truth.json           # Pre-generated reference answers
-├── test_retrieval_quality.py       # Retrieval evaluation (Context Recall)
-└── test_generation_quality.py      # Generation evaluation (Answer Correctness)
-```
-
-## Metrics
-
-### Retrieval Evaluation
-- **Metric**: Context Recall
-- **Method**: Heuristic - Check if ground-truth document IDs appear in top-k results
-- **Target**: ≥80% recall
-
-### Generation Evaluation
-- **Metric**: Answer Correctness
-- **Method**: LLM-as-judge - Compare RAG answer vs ground truth (binary true/false)
-- **Evaluation**: External LLM evaluates semantic equivalence
-
-## Dataset
-
-**BeIR/nfcorpus** - Medical/biomedical corpus with ~3,600 documents
-
-**Test Queries** (5 selected):
-1. PLAIN-2630: "Alkylphenol Endocrine Disruptors and Allergies" (21 relevant docs)
-2. PLAIN-2660: "How Long to Detox From Fish Before Pregnancy?" (20 relevant docs)
-3. PLAIN-2510: "Coffee and Artery Function" (16 relevant docs)
-4. PLAIN-2430: "Preventing Brain Loss with B Vitamins?" (15 relevant docs)
-5. PLAIN-2690: "Chronic Headaches and Pork Tapeworms" (14 relevant docs)
-
-## Setup
-
-### 1. Install Dependencies
-
-```bash
-uv sync --group dev
-```
-
-This installs:
-- `anthropic>=0.42.0` - For Anthropic LLM evaluation
-- `click>=8.1.8` - For CLI interface
-- `datasets>=3.3.0` - For BeIR nfcorpus dataset loading
-
-### 2. Configure LLM Provider
-
-Set environment variables for your LLM provider:
-
-**Option A: Ollama (default, local/remote)**
-```bash
-export RAG_EVAL_PROVIDER=ollama
-export OLLAMA_HOST=https://ollama.example.com  # or RAG_EVAL_OLLAMA_BASE_URL
-export RAG_EVAL_OLLAMA_MODEL=llama3.2:1b
-```
-
-**Option B: Anthropic (cloud)**
-```bash
-export RAG_EVAL_PROVIDER=anthropic
-export RAG_EVAL_ANTHROPIC_API_KEY=sk-ant-...
-export RAG_EVAL_ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
-```
-
-### 3. One-Time Setup: Generate Ground Truth
-
-Generate synthetic reference answers for the 5 test queries:
-
-```bash
-uv run python tools/rag_eval_cli.py generate
-```
-
-**What this does:**
-- Downloads nfcorpus dataset to `tests/rag_evaluation/fixtures/nfcorpus/` (cached locally)
-- For each of the 5 selected queries, extracts highly relevant documents
-- Uses configured LLM to synthesize a reference answer
-- Saves to `tests/rag_evaluation/fixtures/ground_truth.json`
-
-**Optional flags:**
-- `--provider ollama|anthropic` - Override LLM provider
-- `--model MODEL_NAME` - Override model name
-- `--force-download` - Re-download nfcorpus dataset
-
-### 4. One-Time Setup: Upload Corpus to Nextcloud
-
-Upload all 3,633 nfcorpus documents as Nextcloud notes:
-
-```bash
-uv run python tools/rag_eval_cli.py upload \
-    --nextcloud-url http://localhost:8000 \
-    --username admin \
-    --password admin
-```
-
-**What this does:**
-- Downloads nfcorpus dataset (if not already cached)
-- Uploads all documents as notes in Nextcloud
-- Saves document ID → note ID mapping to `tests/rag_evaluation/fixtures/note_mapping.json`
-
-**Optional flags:**
-- `--category CATEGORY` - Custom category for notes (default: `nfcorpus_rag_eval`)
-- `--force-download` - Re-download nfcorpus dataset
-- `--force` - Delete all existing notes in the target category before uploading (efficient corpus refresh)
-
-**Important:** This step requires:
-- A running Nextcloud instance with vector sync enabled
-- Notes app installed
-- Valid credentials
-
-**Duration:** ~10-15 minutes to upload 3,633 documents
-
-## Running Tests
-
-### Run All RAG Evaluation Tests
-
-```bash
-uv run pytest tests/rag_evaluation/ -v
-```
-
-### Run Specific Test Suites
-
-**Retrieval Quality Only:**
-```bash
-uv run pytest tests/rag_evaluation/test_retrieval_quality.py -v
-```
-
-**Generation Quality Only:**
-```bash
-uv run pytest tests/rag_evaluation/test_generation_quality.py -v
-```
-
-### Run Individual Tests
-
-```bash
-uv run pytest tests/rag_evaluation/test_retrieval_quality.py::test_retrieval_context_recall -v
-uv run pytest tests/rag_evaluation/test_generation_quality.py::test_answer_correctness -v
-```
-
-## Test Execution Flow
-
-**Prerequisites** (one-time setup):
-1. Generated ground truth (`tools/rag_eval_cli.py generate`)
-2. Uploaded corpus to Nextcloud (`tools/rag_eval_cli.py upload`)
-
-### Retrieval Quality Tests
-
-1. **Setup** (`nfcorpus_test_data` fixture):
-   - Loads pre-generated ground truth from `fixtures/ground_truth.json`
-   - Loads note mapping from `fixtures/note_mapping.json`
-   - Returns test cases with expected note IDs
-
-2. **Test** (`test_retrieval_context_recall`):
-   - For each query: Perform semantic search (top-10)
-   - Extract retrieved note IDs
-   - Calculate Context Recall = (expected ∩ retrieved) / expected
-   - Assert recall ≥ 80%
-
-3. **Cleanup**:
-   - None required (notes persist in Nextcloud for reuse)
-
-### Generation Quality Tests
-
-1. **Setup**:
-   - Same as retrieval tests (reuses `nfcorpus_test_data` fixture)
-   - Creates evaluation LLM provider
-
-2. **Test** (`test_answer_correctness`):
-   - For each query: Call `nc_semantic_search_answer` MCP tool
-   - Extract generated answer
-   - Use LLM-as-judge to compare vs ground truth
-   - Assert semantic equivalence (TRUE/FALSE)
-
-3. **Cleanup**:
-   - LLM provider closed
-
-## Expected Test Duration
-
-**One-time setup:**
-- **Generate ground truth**: ~5-10 minutes (5 queries with LLM generation)
-- **Upload corpus**: ~10-15 minutes (3,633 documents)
-- **Total setup**: ~15-25 minutes
-
-**Test execution** (after setup):
-- **Retrieval tests**: ~1-2 minutes (5 queries, no upload/cleanup)
-- **Generation tests**: ~5-10 minutes (RAG generation + LLM evaluation)
-- **Total per run**: ~6-12 minutes
-
-**Note**: These are NOT smoke tests and are NOT run in CI.
-
-## Limitations & Future Work
-
-**Current Limitations:**
-- Only 5 test queries (limited statistical confidence)
-- Medical domain bias (may not represent production use cases)
-- Synthetic ground truth (LLM-generated, not human-validated)
-- Manual test execution (requires external LLM access)
-
-**Future Enhancements:**
-- Expand to 50-100 queries for statistical significance
-- Add custom test dataset with production-representative documents
-- Implement additional metrics (faithfulness, context relevance, answer relevance)
-- Create automated benchmarking dashboard
-- Test multi-hop reasoning (synthesis questions)
-- Evaluate out-of-scope handling ("I don't know" responses)
-
-## Troubleshooting
-
-### Tests Fail with "Ground truth file not found"
-
-Run the generate command first:
-```bash
-uv run python tools/rag_eval_cli.py generate
-```
-
-### Tests Fail with "Note mapping file not found"
-
-Run the upload command first:
-```bash
-uv run python tools/rag_eval_cli.py upload --nextcloud-url http://localhost:8000 --username admin --password admin
-```
-
-### Tests Fail with "MCP sampling client not yet implemented"
-
-The `mcp_sampling_client` fixture is a placeholder. You need to implement MCP client creation with sampling support. See the TODO in `conftest.py`.
-
-### Upload Command Fails
-
-Common issues:
-1. **Nextcloud not running**: Ensure Nextcloud is accessible at the URL
-2. **Invalid credentials**: Verify username/password
-3. **Notes app not installed**: Install Notes app in Nextcloud
-4. **Network timeout**: Increase timeout in CLI (currently 60s)
-
-### LLM Timeout
-
-If ground truth generation times out:
-1. Increase timeout in `llm_providers.py` (currently 10 min)
-2. Use a faster model: `--model llama3.2:1b`
-3. Check Ollama/Anthropic service availability
-
-### Dataset Download Fails
-
-The nfcorpus dataset is downloaded automatically. If download fails:
-1. Check internet connection
-2. Manually download from: https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/nfcorpus.zip
-3. Extract to `tests/rag_evaluation/fixtures/nfcorpus/`
-4. Or use HuggingFace datasets cache: `~/.cache/huggingface/datasets/BeIR___nfcorpus/`
-
-### Vector Sync Not Indexing Documents
-
-After uploading, vector sync must index the documents:
-1. Check vector sync is enabled in Nextcloud
-2. Trigger manual sync if needed
-3. Wait for background job to process all documents
-4. Verify in Qdrant that vectors exist for uploaded notes
-
-## References
-
-- [ADR-013: RAG Evaluation Testing Framework](../../docs/ADR-013-rag-evaluation.md)
-- [ADR-008: MCP Sampling for Semantic Search](../../docs/ADR-008-mcp-sampling-for-semantic-search.md)
-- [BeIR Benchmark](https://github.com/beir-cellar/beir)
-- [NFCorpus Dataset](https://www.cl.uni-heidelberg.de/statnlpgroup/nfcorpus/)
@@ -1 +0,0 @@
-"""RAG evaluation tests for the Nextcloud MCP semantic search system."""
@@ -1,145 +0,0 @@
-"""Pytest fixtures for RAG evaluation tests.
-
-IMPORTANT: Before running these tests, you must:
-1. Generate ground truth: uv run python tools/rag_eval_cli.py generate
-2. Upload corpus: uv run python tools/rag_eval_cli.py upload --nextcloud-url http://localhost:8000 --username admin --password admin
-
-This ensures that the ground truth and note mappings are available.
-"""
-
-import json
-from pathlib import Path
-from typing import Any
-
-import pytest
-
-from tests.rag_evaluation.llm_providers import create_llm_provider
-
-# Paths
-FIXTURES_DIR = Path(__file__).parent / "fixtures"
-GROUND_TRUTH_FILE = FIXTURES_DIR / "ground_truth.json"
-NOTE_MAPPING_FILE = FIXTURES_DIR / "note_mapping.json"
-
-
-@pytest.fixture(scope="session")
-def ground_truth_data() -> list[dict[str, Any]]:
-    """Load pre-generated ground truth data.
-
-    Returns:
-        List of test cases with query, ground truth answer, and expected doc IDs
-
-    Raises:
-        FileNotFoundError: If ground_truth.json doesn't exist
-    """
-    if not GROUND_TRUTH_FILE.exists():
-        raise FileNotFoundError(
-            f"Ground truth file not found: {GROUND_TRUTH_FILE}\n"
-            "Run: uv run python tools/rag_eval_cli.py generate"
-        )
-
-    with open(GROUND_TRUTH_FILE) as f:
-        return json.load(f)
-
-
-@pytest.fixture(scope="session")
-def note_mapping() -> dict[str, int]:
-    """Load document ID → note ID mapping.
-
-    Returns:
-        Dict mapping nfcorpus document ID to Nextcloud note ID
-
-    Raises:
-        FileNotFoundError: If note_mapping.json doesn't exist
-    """
-    if not NOTE_MAPPING_FILE.exists():
-        raise FileNotFoundError(
-            f"Note mapping file not found: {NOTE_MAPPING_FILE}\n"
-            "Run: uv run python tools/rag_eval_cli.py upload --nextcloud-url ... --username ... --password ..."
-        )
-
-    with open(NOTE_MAPPING_FILE) as f:
-        return json.load(f)
-
-
-@pytest.fixture(scope="session")
-def nfcorpus_test_data(
-    ground_truth_data: list[dict[str, Any]],
-    note_mapping: dict[str, int],
-):
-    """Prepare nfcorpus test data for evaluation.
-
-    This fixture combines ground truth answers with note mappings to create
-    test cases ready for retrieval and generation quality tests.
-
-    Args:
-        ground_truth_data: Pre-generated ground truth answers
-        note_mapping: Document ID → note ID mapping
-
-    Returns:
-        List of test cases with query, ground truth, expected doc IDs, and note IDs
-    """
-    test_cases = []
-
-    for gt in ground_truth_data:
-        # Map expected document IDs to note IDs
-        expected_note_ids = [
-            note_mapping.get(doc_id)
-            for doc_id in gt["expected_document_ids"]
-            if doc_id in note_mapping
-        ]
-
-        # Filter out None values (docs that weren't uploaded)
-        expected_note_ids = [nid for nid in expected_note_ids if nid is not None]
-
-        test_cases.append(
-            {
-                "query_id": gt["query_id"],
-                "query_text": gt["query_text"],
-                "ground_truth_answer": gt["ground_truth_answer"],
-                "expected_document_ids": gt["expected_document_ids"],
-                "expected_note_ids": expected_note_ids,
-                "highly_relevant_count": gt["highly_relevant_count"],
-            }
-        )
-
-    return test_cases
-
-
-@pytest.fixture(scope="session")
-async def evaluation_llm():
-    """Create LLM provider for evaluation (separate from MCP client).
-
-    Environment variables:
-      RAG_EVAL_PROVIDER: Provider type (ollama or anthropic)
-      RAG_EVAL_OLLAMA_BASE_URL: Ollama base URL (or OLLAMA_HOST)
-      RAG_EVAL_OLLAMA_MODEL: Ollama model name
-      RAG_EVAL_ANTHROPIC_API_KEY: Anthropic API key
-      RAG_EVAL_ANTHROPIC_MODEL: Anthropic model name
-
-    Returns:
-        LLM provider instance (OllamaProvider or AnthropicProvider)
-    """
-    llm = create_llm_provider()
-    yield llm
-    await llm.close()
-
-
-@pytest.fixture(scope="session")
-async def mcp_sampling_client():
-    """Create MCP client that supports sampling for RAG generation.
-
-    This fixture creates an MCP client configured to support sampling,
-    which is required for testing the nc_semantic_search_answer tool.
-
-    TODO: Implement MCP client with sampling support
-    For now, this is a placeholder.
-
-    Returns:
-        MCP client instance with sampling enabled
-    """
-    # TODO: Implement MCP client creation with sampling support
-    # This will require:
-    # 1. Creating an MCP client configured for sampling
-    # 2. Authenticating with Nextcloud
-    # 3. Ensuring sampling is enabled
-    pytest.skip("MCP sampling client not yet implemented")
@@ -1,89 +0,0 @@
-"""LLM provider abstraction for RAG evaluation.
-
-DEPRECATED: This module is maintained for backward compatibility with RAG evaluation tests.
-New code should use nextcloud_mcp_server.providers directly.
-
-Supports Ollama (local), Anthropic (cloud), and Bedrock (AWS) providers for both ground truth
-generation and evaluation.
-"""
-
-import os
-
-from nextcloud_mcp_server.providers import (
-    AnthropicProvider,
-    BedrockProvider,
-    OllamaProvider,
-    Provider,
-)
-
-
-def create_llm_provider(
-    provider: str | None = None,
-    ollama_base_url: str | None = None,
-    ollama_model: str | None = None,
-    anthropic_api_key: str | None = None,
-    anthropic_model: str | None = None,
-    bedrock_region: str | None = None,
-    bedrock_model: str | None = None,
-) -> Provider:
-    """Create an LLM provider from environment variables or arguments.
-
-    Args:
-        provider: Provider type ('ollama', 'anthropic', or 'bedrock').
-            Defaults to RAG_EVAL_PROVIDER env var or 'ollama'
-        ollama_base_url: Ollama base URL. Defaults to RAG_EVAL_OLLAMA_BASE_URL or 'http://localhost:11434'
-        ollama_model: Ollama model. Defaults to RAG_EVAL_OLLAMA_MODEL or 'llama3.2:1b'
-        anthropic_api_key: Anthropic API key. Defaults to RAG_EVAL_ANTHROPIC_API_KEY env var
-        anthropic_model: Anthropic model. Defaults to RAG_EVAL_ANTHROPIC_MODEL or 'claude-3-5-sonnet-20241022'
-        bedrock_region: AWS region. Defaults to RAG_EVAL_BEDROCK_REGION or AWS_REGION env var
-        bedrock_model: Bedrock model ID. Defaults to RAG_EVAL_BEDROCK_MODEL or
-            'anthropic.claude-3-sonnet-20240229-v1:0'
-
-    Returns:
-        Provider instance
-
-    Raises:
-        ValueError: If provider is invalid or required credentials are missing
-    """
-    # Get provider from args or env
-    provider = provider or os.environ.get("RAG_EVAL_PROVIDER", "ollama")
-
-    if provider == "ollama":
-        # Try RAG_EVAL_OLLAMA_BASE_URL, then OLLAMA_HOST, then default
-        base_url = (
-            ollama_base_url
-            or os.environ.get("RAG_EVAL_OLLAMA_BASE_URL")
-            or os.environ.get("OLLAMA_HOST")
-            or "http://localhost:11434"
-        )
-        model = ollama_model or os.environ.get("RAG_EVAL_OLLAMA_MODEL", "llama3.2:1b")
-        return OllamaProvider(
-            base_url=base_url, embedding_model=None, generation_model=model
-        )
-
-    elif provider == "anthropic":
-        api_key = anthropic_api_key or os.environ.get("RAG_EVAL_ANTHROPIC_API_KEY")
-        if not api_key:
-            raise ValueError(
-                "Anthropic API key required. Set RAG_EVAL_ANTHROPIC_API_KEY environment variable."
-            )
-        model = anthropic_model or os.environ.get(
-            "RAG_EVAL_ANTHROPIC_MODEL", "claude-3-5-sonnet-20241022"
-        )
-        return AnthropicProvider(api_key=api_key, model=model)
-
-    elif provider == "bedrock":
-        region = bedrock_region or os.environ.get(
-            "RAG_EVAL_BEDROCK_REGION", os.environ.get("AWS_REGION", "us-east-1")
-        )
-        model = bedrock_model or os.environ.get(
-            "RAG_EVAL_BEDROCK_MODEL", "anthropic.claude-3-sonnet-20240229-v1:0"
-        )
-        return BedrockProvider(
-            region_name=region, embedding_model=None, generation_model=model
-        )
-
-    else:
-        raise ValueError(
-            f"Invalid provider: {provider}. Must be 'ollama', 'anthropic', or 'bedrock'."
-        )
@@ -1,139 +0,0 @@
-"""Tests for RAG generation quality (Answer Correctness metric).
-
-These tests evaluate whether the MCP client LLM generates factually correct
-answers from retrieved context using the nc_semantic_search_answer tool.
-
-Metric: Answer Correctness
-- Measures: Is the generated answer factually correct?
-- Method: LLM-as-judge - Compare RAG answer vs ground truth (binary true/false)
-- Evaluation: External LLM evaluates semantic equivalence
-"""
-
-import pytest
-
-
-@pytest.mark.integration
-async def test_answer_correctness(
-    mcp_sampling_client,
-    evaluation_llm,
-    nfcorpus_test_data,
-):
-    """Test that RAG system generates factually correct answers.
-
-    For each test query:
-    1. Execute full RAG pipeline via nc_semantic_search_answer MCP tool
-    2. Extract generated answer from RAG response
-    3. Use LLM-as-judge to compare against ground truth (binary true/false)
-    4. Assert answer is semantically equivalent to ground truth
-
-    This tests the quality of the generation component (MCP client LLM).
-    """
-    results_summary = []
-
-    for test_case in nfcorpus_test_data:
-        query = test_case["query_text"]
-        ground_truth = test_case["ground_truth_answer"]
-
-        print(f"\n{'=' * 80}")
-        print(f"Query: {query}")
-
-        # Execute full RAG pipeline
-        print("Executing RAG pipeline...")
-        rag_result = await mcp_sampling_client.call_tool(
-            "nc_semantic_search_answer",
-            arguments={"query": query, "limit": 5},
-        )
-
-        rag_answer = rag_result["generated_answer"]
-
-        print(f"RAG Answer preview: {rag_answer[:200]}...")
-        print(f"Ground Truth preview: {ground_truth[:200]}...")
-
-        # LLM-as-judge evaluation
-        evaluation_prompt = f"""Compare these two answers and respond with only TRUE or FALSE.
-
-Question: {query}
-
-Generated Answer: {rag_answer}
-
-Ground Truth Answer: {ground_truth}
-
-Are these answers semantically equivalent (do they convey the same factual information)?
-Respond with only: TRUE or FALSE"""
-
-        print("Evaluating answer correctness...")
-        evaluation_result = await evaluation_llm.generate(
-            evaluation_prompt,
-            max_tokens=10,
-        )
-
-        is_correct = evaluation_result.strip().upper() == "TRUE"
-
-        result = {
-            "query_id": test_case["query_id"],
-            "query": query,
-            "rag_answer_length": len(rag_answer),
-            "ground_truth_length": len(ground_truth),
-            "is_correct": is_correct,
-            "evaluation_result": evaluation_result.strip(),
-        }
-        results_summary.append(result)
-
-        print(f"  Evaluation: {evaluation_result.strip()}")
-        print(f"  Status: {'✓ CORRECT' if is_correct else '✗ INCORRECT'}")
-
-        # Assert answer correctness
-        assert is_correct, (
-            f"Answer mismatch for query: {query}\n\n"
-            f"Generated Answer:\n{rag_answer}\n\n"
-            f"Ground Truth:\n{ground_truth}\n\n"
-            f"Evaluation: {evaluation_result.strip()}"
-        )
-
-    # Print summary
-    print(f"\n{'=' * 80}")
-    print("Answer Correctness Summary:")
-    print(f"  Total queries: {len(results_summary)}")
-    print(f"  Correct: {sum(r['is_correct'] for r in results_summary)}")
-    print(f"  Incorrect: {sum(not r['is_correct'] for r in results_summary)}")
-    accuracy = sum(r["is_correct"] for r in results_summary) / len(results_summary)
-    print(f"  Accuracy: {accuracy:.2%}")
-    print(f"{'=' * 80}")
-
-
-@pytest.mark.integration
-async def test_answer_contains_sources(mcp_sampling_client, nfcorpus_test_data):
-    """Test that RAG answers include source citations.
-
-    This is a basic quality check - we verify that the nc_semantic_search_answer
-    tool returns both a generated answer and source documents.
-    """
-    for test_case in nfcorpus_test_data:
-        query = test_case["query_text"]
-
-        # Execute RAG pipeline
-        rag_result = await mcp_sampling_client.call_tool(
-            "nc_semantic_search_answer",
-            arguments={"query": query, "limit": 5},
-        )
-
-        # Check response structure
-        assert "generated_answer" in rag_result, "Response missing 'generated_answer'"
-        assert "sources" in rag_result, "Response missing 'sources'"
-
-        # Check sources are provided
-        sources = rag_result["sources"]
-        assert len(sources) > 0, f"No sources returned for query: {query}"
-
-        # Check each source has required fields
-        for i, source in enumerate(sources):
-            assert "document_id" in source or "id" in source, (
-                f"Source {i} missing document ID"
-            )
-            assert "excerpt" in source or "content" in source or "text" in source, (
-                f"Source {i} missing content"
-            )
-
-        print(f"Query: {query}")
-        print(f"  Sources provided: {len(sources)}")
-        print("  Status: ✓ PASS")
@@ -1,143 +0,0 @@
-"""Tests for RAG retrieval quality (Context Recall metric).
-
-These tests evaluate whether the vector sync/embedding pipeline successfully
-retrieves documents containing the answer to a query.
-
-Metric: Context Recall
- Measures: Did we retrieve documents containing the answer?
- Method: Heuristic - Check if ground-truth document IDs appear in top-k results
- Target: ≥80% recall (at least 80% of expected docs in top-10 results)
-"""
-
-import pytest
-
-
-@pytest.mark.integration
-async def test_retrieval_context_recall(nc_client, nfcorpus_test_data):
-    """Test that semantic search retrieves documents containing the answer.
-
-    For each test query:
-    1. Perform semantic search (retrieval only, no generation)
-    2. Extract retrieved document IDs from top-k results
-    3. Calculate Context Recall: |retrieved ∩ expected| / |expected|
-    4. Assert recall meets threshold (≥80%)
-
-    This tests the quality of the vector sync/embedding pipeline.
-    """
-    # Top-k documents to retrieve
-    k = 10
-
-    # Minimum acceptable recall
-    min_recall = 0.8
-
-    results_summary = []
-
-    for test_case in nfcorpus_test_data:
-        query = test_case["query_text"]
-        expected_note_ids = set(test_case["expected_note_ids"])
-
-        # Perform semantic search (retrieval only)
-        search_results = await nc_client.notes.semantic_search(
-            query=query,
-            limit=k,
-        )
-
-        # Extract retrieved note IDs
-        retrieved_note_ids = {result["id"] for result in search_results}
-
-        # Calculate Context Recall
-        intersection = expected_note_ids & retrieved_note_ids
-        recall = len(intersection) / len(expected_note_ids) if expected_note_ids else 0
-
-        # Store results
-        result = {
-            "query_id": test_case["query_id"],
-            "query": query,
-            "expected_count": len(expected_note_ids),
-            "retrieved_count": len(retrieved_note_ids),
-            "intersection_count": len(intersection),
-            "recall": recall,
-            "passed": recall >= min_recall,
-        }
-        results_summary.append(result)
-
-        # Print detailed result for this query
-        print(f"\n{'=' * 80}")
-        print(f"Query: {query}")
-        print(f"  Expected docs: {len(expected_note_ids)}")
-        print(f"  Retrieved (top-{k}): {len(retrieved_note_ids)}")
-        print(f"  Intersection: {len(intersection)}")
-        print(f"  Context Recall: {recall:.2%}")
-        print(f"  Status: {'✓ PASS' if result['passed'] else '✗ FAIL'}")
-
-        # Assert recall meets threshold
-        assert recall >= min_recall, (
-            f"Context Recall {recall:.2%} below threshold {min_recall:.2%} "
-            f"for query: {query}\n"
-            f"Expected {len(expected_note_ids)} docs, found {len(intersection)} in top-{k}"
-        )
-
-    # Print summary
-    print(f"\n{'=' * 80}")
-    print("Context Recall Summary:")
-    print(f"  Total queries: {len(results_summary)}")
-    print(f"  Passed: {sum(r['passed'] for r in results_summary)}")
-    print(f"  Failed: {sum(not r['passed'] for r in results_summary)}")
-    print(
-        f"  Average recall: {sum(r['recall'] for r in results_summary) / len(results_summary):.2%}"
-    )
-    print(f"{'=' * 80}")
-
-
-@pytest.mark.integration
-async def test_retrieval_top1_precision(nc_client, nfcorpus_test_data):
-    """Test that the top-1 retrieved document is highly relevant.
-
-    This is a stricter check than context recall - it reports whether
-    the single top-ranked document (rank 1) is in the expected set.
-
-    This tests whether the ranking is good, not just retrieval.
-    """
-    results_summary = []
-
-    for test_case in nfcorpus_test_data:
-        query = test_case["query_text"]
-        expected_note_ids = set(test_case["expected_note_ids"])
-
-        # Perform semantic search
-        search_results = await nc_client.notes.semantic_search(
-            query=query,
-            limit=1,  # Only top-1
-        )
-
-        # Check if top result is in expected set
-        if search_results:
-            top_result_id = search_results[0]["id"]
-            is_relevant = top_result_id in expected_note_ids
-        else:
-            is_relevant = False
-
-        result = {
-            "query_id": test_case["query_id"],
-            "query": query,
-            "top_result_id": search_results[0]["id"] if search_results else None,
-            "is_relevant": is_relevant,
-        }
-        results_summary.append(result)
-
-        print(f"\nQuery: {query}")
-        print(f"  Top-1 relevant: {'✓ YES' if is_relevant else '✗ NO'}")
-
-        # This is informational - we don't assert here
-        # Some queries may have multiple valid top results
-
-    # Print summary
-    precision_at_1 = sum(r["is_relevant"] for r in results_summary) / len(
-        results_summary
-    )
-    print(f"\n{'=' * 80}")
-    print(f"Precision@1: {precision_at_1:.2%}")
-    print(
-        f"  ({sum(r['is_relevant'] for r in results_summary)}/{len(results_summary)} queries)"
-    )
-    print(f"{'=' * 80}")
@@ -1 +0,0 @@
-"""Unit tests for provider infrastructure."""
@@ -1,280 +0,0 @@
-"""Unit tests for Bedrock provider."""
-
-import json
-from unittest.mock import MagicMock
-
-import pytest
-
-from nextcloud_mcp_server.providers.bedrock import BOTO3_AVAILABLE, BedrockProvider
-
-
-@pytest.fixture
-def mock_bedrock_client(mocker):
-    """Mock boto3 bedrock-runtime client."""
-    if not BOTO3_AVAILABLE:
-        pytest.skip("boto3 not installed")
-
-    mock_client = MagicMock()
-    mocker.patch("boto3.client", return_value=mock_client)
-    return mock_client
-
-
-@pytest.mark.unit
-async def test_bedrock_embedding_titan(mock_bedrock_client):
-    """Test Bedrock embedding with Titan model."""
-    # Mock response
-    mock_response = {
-        "body": MagicMock(
-            read=MagicMock(
-                return_value=json.dumps({"embedding": [0.1, 0.2, 0.3]}).encode()
-            )
-        )
-    }
-    mock_bedrock_client.invoke_model.return_value = mock_response
-
-    # Create provider
-    provider = BedrockProvider(
-        region_name="us-east-1",
-        embedding_model="amazon.titan-embed-text-v2:0",
-        generation_model=None,
-    )
-
-    # Test embedding
-    embedding = await provider.embed("test text")
-
-    assert embedding == [0.1, 0.2, 0.3]
-    mock_bedrock_client.invoke_model.assert_called_once()
-    call_args = mock_bedrock_client.invoke_model.call_args
-
-    assert call_args.kwargs["modelId"] == "amazon.titan-embed-text-v2:0"
-    body = json.loads(call_args.kwargs["body"])
-    assert body == {"inputText": "test text"}
-
-
-@pytest.mark.unit
-async def test_bedrock_embedding_batch(mock_bedrock_client):
-    """Test Bedrock batch embedding."""
-    # Mock response
-    mock_response = {
-        "body": MagicMock(
-            read=MagicMock(
-                return_value=json.dumps({"embedding": [0.1, 0.2, 0.3]}).encode()
-            )
-        )
-    }
-    mock_bedrock_client.invoke_model.return_value = mock_response
-
-    # Create provider
-    provider = BedrockProvider(
-        region_name="us-east-1",
-        embedding_model="amazon.titan-embed-text-v2:0",
-        generation_model=None,
-    )
-
-    # Test batch embedding
-    embeddings = await provider.embed_batch(["text1", "text2"])
-
-    assert len(embeddings) == 2
-    assert embeddings[0] == [0.1, 0.2, 0.3]
-    assert embeddings[1] == [0.1, 0.2, 0.3]
-    assert mock_bedrock_client.invoke_model.call_count == 2
-
-
-@pytest.mark.unit
-async def test_bedrock_generation_claude(mock_bedrock_client):
-    """Test Bedrock text generation with Claude model."""
-    # Mock response
-    mock_response = {
-        "body": MagicMock(
-            read=MagicMock(
-                return_value=json.dumps(
-                    {"content": [{"text": "Generated response"}]}
-                ).encode()
-            )
-        )
-    }
-    mock_bedrock_client.invoke_model.return_value = mock_response
-
-    # Create provider
-    provider = BedrockProvider(
-        region_name="us-east-1",
-        embedding_model=None,
-        generation_model="anthropic.claude-3-sonnet-20240229-v1:0",
-    )
-
-    # Test generation
-    text = await provider.generate("test prompt", max_tokens=100)
-
-    assert text == "Generated response"
-    mock_bedrock_client.invoke_model.assert_called_once()
-    call_args = mock_bedrock_client.invoke_model.call_args
-
-    assert call_args.kwargs["modelId"] == "anthropic.claude-3-sonnet-20240229-v1:0"
-    body = json.loads(call_args.kwargs["body"])
-    assert body["messages"][0]["content"] == "test prompt"
-    assert body["max_tokens"] == 100
-
-
-@pytest.mark.unit
-async def test_bedrock_generation_llama(mock_bedrock_client):
-    """Test Bedrock text generation with Llama model."""
-    # Mock response
-    mock_response = {
-        "body": MagicMock(
-            read=MagicMock(
-                return_value=json.dumps({"generation": "Llama response"}).encode()
-            )
-        )
-    }
-    mock_bedrock_client.invoke_model.return_value = mock_response
-
-    # Create provider
-    provider = BedrockProvider(
-        region_name="us-east-1",
-        embedding_model=None,
-        generation_model="meta.llama3-8b-instruct-v1:0",
-    )
-
-    # Test generation
-    text = await provider.generate("test prompt")
-
-    assert text == "Llama response"
-    body = json.loads(mock_bedrock_client.invoke_model.call_args.kwargs["body"])
-    assert body["prompt"] == "test prompt"
-    assert "max_gen_len" in body
-
-
-@pytest.mark.unit
-async def test_bedrock_both_capabilities(mock_bedrock_client):
-    """Test Bedrock with both embedding and generation models."""
-    # Mock responses
-    embed_response = {
-        "body": MagicMock(
-            read=MagicMock(return_value=json.dumps({"embedding": [0.1, 0.2]}).encode())
-        )
-    }
-    gen_response = {
-        "body": MagicMock(
-            read=MagicMock(
-                return_value=json.dumps({"content": [{"text": "Response"}]}).encode()
-            )
-        )
-    }
-
-    # Mock to return different responses based on modelId
-    def mock_invoke(modelId, body, **kwargs):
-        if "embed" in modelId:
-            return embed_response
-        else:
-            return gen_response
-
-    mock_bedrock_client.invoke_model.side_effect = mock_invoke
-
-    # Create provider with both models
-    provider = BedrockProvider(
-        region_name="us-east-1",
-        embedding_model="amazon.titan-embed-text-v2:0",
-        generation_model="anthropic.claude-3-sonnet-20240229-v1:0",
-    )
-
-    assert provider.supports_embeddings is True
-    assert provider.supports_generation is True
-
-    # Test both capabilities
-    embedding = await provider.embed("test")
-    assert embedding == [0.1, 0.2]
-
-    text = await provider.generate("test")
-    assert text == "Response"
-
-
-@pytest.mark.unit
-async def test_bedrock_no_embeddings():
-    """Test Bedrock provider with no embedding model raises error."""
-    provider = BedrockProvider(
-        region_name="us-east-1",
-        embedding_model=None,
-        generation_model="anthropic.claude-3-sonnet-20240229-v1:0",
-    )
-
-    assert provider.supports_embeddings is False
-
-    with pytest.raises(NotImplementedError, match="no embedding_model configured"):
-        await provider.embed("test")
-
-    with pytest.raises(NotImplementedError, match="no embedding_model configured"):
-        await provider.embed_batch(["test"])
-
-    with pytest.raises(NotImplementedError, match="no embedding_model configured"):
-        provider.get_dimension()
-
-
-@pytest.mark.unit
-async def test_bedrock_no_generation():
-    """Test Bedrock provider with no generation model raises error."""
-    provider = BedrockProvider(
-        region_name="us-east-1",
-        embedding_model="amazon.titan-embed-text-v2:0",
-        generation_model=None,
-    )
-
-    assert provider.supports_generation is False
-
-    with pytest.raises(NotImplementedError, match="no generation_model configured"):
-        await provider.generate("test")
-
-
-@pytest.mark.unit
-async def test_bedrock_dimension_detection(mock_bedrock_client):
-    """Test dimension detection for Bedrock embeddings."""
-    # Mock response with specific dimension
-    mock_response = {
-        "body": MagicMock(
-            read=MagicMock(
-                return_value=json.dumps(
-                    {"embedding": [0.1] * 1536}  # 1536-dim embedding
-                ).encode()
-            )
-        )
-    }
-    mock_bedrock_client.invoke_model.return_value = mock_response
-
-    provider = BedrockProvider(
-        region_name="us-east-1",
-        embedding_model="amazon.titan-embed-text-v2:0",
-    )
-
-    # Dimension not detected yet
-    with pytest.raises(RuntimeError, match="not detected yet"):
-        provider.get_dimension()
-
-    # Detect dimension
-    await provider._detect_dimension()
-
-    # Now dimension should be available
-    assert provider.get_dimension() == 1536
-
-
-@pytest.mark.unit
-async def test_bedrock_cohere_embedding(mock_bedrock_client):
-    """Test Bedrock with Cohere embedding model."""
-    # Mock response
-    mock_response = {
-        "body": MagicMock(
-            read=MagicMock(
-                return_value=json.dumps({"embeddings": [[0.1, 0.2, 0.3]]}).encode()
-            )
-        )
-    }
-    mock_bedrock_client.invoke_model.return_value = mock_response
-
-    provider = BedrockProvider(
-        region_name="us-east-1",
-        embedding_model="cohere.embed-english-v3",
-    )
-
-    embedding = await provider.embed("test text")
-
-    assert embedding == [0.1, 0.2, 0.3]
-    body = json.loads(mock_bedrock_client.invoke_model.call_args.kwargs["body"])
-    assert body == {"texts": ["test text"], "input_type": "search_document"}
@@ -1 +0,0 @@
-"""Unit tests for search algorithms."""
@@ -1,54 +0,0 @@
-"""Unit tests for BM25 hybrid search algorithm."""
-
-import pytest
-from qdrant_client import models
-
-from nextcloud_mcp_server.search.bm25_hybrid import BM25HybridSearchAlgorithm
-
-
-@pytest.mark.unit
-def test_bm25_hybrid_initialization_default():
-    """Test BM25HybridSearchAlgorithm initializes with default RRF fusion."""
-    algo = BM25HybridSearchAlgorithm()
-
-    assert algo.score_threshold == 0.0
-    assert algo.fusion == models.Fusion.RRF
-    assert algo.fusion_name == "rrf"
-    assert algo.name == "bm25_hybrid"
-
-
-@pytest.mark.unit
-def test_bm25_hybrid_initialization_with_rrf():
-    """Test BM25HybridSearchAlgorithm initializes with explicit RRF fusion."""
-    algo = BM25HybridSearchAlgorithm(score_threshold=0.5, fusion="rrf")
-
-    assert algo.score_threshold == 0.5
-    assert algo.fusion == models.Fusion.RRF
-    assert algo.fusion_name == "rrf"
-
-
-@pytest.mark.unit
-def test_bm25_hybrid_initialization_with_dbsf():
-    """Test BM25HybridSearchAlgorithm initializes with DBSF fusion."""
-    algo = BM25HybridSearchAlgorithm(score_threshold=0.7, fusion="dbsf")
-
-    assert algo.score_threshold == 0.7
-    assert algo.fusion == models.Fusion.DBSF
-    assert algo.fusion_name == "dbsf"
-
-
-@pytest.mark.unit
-def test_bm25_hybrid_invalid_fusion_raises_error():
-    """Test BM25HybridSearchAlgorithm raises ValueError for invalid fusion."""
-    with pytest.raises(ValueError) as exc_info:
-        BM25HybridSearchAlgorithm(fusion="invalid")
-
-    assert "Invalid fusion algorithm 'invalid'" in str(exc_info.value)
-    assert "Must be 'rrf' or 'dbsf'" in str(exc_info.value)
-
-
-@pytest.mark.unit
-def test_bm25_hybrid_requires_vector_db():
-    """Test BM25HybridSearchAlgorithm reports it requires vector database."""
-    algo = BM25HybridSearchAlgorithm()
-    assert algo.requires_vector_db is True
@@ -1,135 +0,0 @@
-"""Unit tests for SearchResult validation."""
-
-import pytest
-
-from nextcloud_mcp_server.search.algorithms import SearchResult
-
-
-@pytest.mark.unit
-def test_search_result_rrf_score_in_range():
-    """Test SearchResult accepts RRF scores in [0.0, 1.0] range."""
-    result = SearchResult(
-        id=1,
-        doc_type="note",
-        title="Test Note",
-        excerpt="Test excerpt",
-        score=0.85,
-    )
-
-    assert result.score == 0.85
-
-
-@pytest.mark.unit
-def test_search_result_rrf_score_at_lower_bound():
-    """Test SearchResult accepts RRF score at lower bound (0.0)."""
-    result = SearchResult(
-        id=1,
-        doc_type="note",
-        title="Test Note",
-        excerpt="Test excerpt",
-        score=0.0,
-    )
-
-    assert result.score == 0.0
-
-
-@pytest.mark.unit
-def test_search_result_rrf_score_at_upper_bound():
-    """Test SearchResult accepts RRF score at upper bound (1.0)."""
-    result = SearchResult(
-        id=1,
-        doc_type="note",
-        title="Test Note",
-        excerpt="Test excerpt",
-        score=1.0,
-    )
-
-    assert result.score == 1.0
-
-
-@pytest.mark.unit
-def test_search_result_dbsf_score_above_one():
-    """Test SearchResult accepts DBSF scores > 1.0.
-
-    DBSF (Distribution-Based Score Fusion) sums normalized scores from multiple
-    systems (dense semantic + sparse BM25), so scores can exceed 1.0 when both
-    systems strongly agree a document is relevant.
-    """
-    # Typical DBSF score when both systems agree
-    result = SearchResult(
-        id=1,
-        doc_type="note",
-        title="Highly Relevant Note",
-        excerpt="Contains keywords and is semantically similar",
-        score=1.55,
-    )
-
-    assert result.score == 1.55
-
-
-@pytest.mark.unit
-def test_search_result_dbsf_score_edge_case():
-    """Test SearchResult accepts DBSF maximum theoretical score (2.0).
-
-    Maximum DBSF score with 2 systems: 1.0 (dense) + 1.0 (sparse) = 2.0
-    """
-    result = SearchResult(
-        id=1,
-        doc_type="note",
-        title="Perfect Match",
-        excerpt="Perfect semantic and keyword match",
-        score=2.0,
-    )
-
-    assert result.score == 2.0
-
-
-@pytest.mark.unit
-def test_search_result_negative_score_raises_error():
-    """Test SearchResult rejects negative scores."""
-    with pytest.raises(ValueError) as exc_info:
-        SearchResult(
-            id=1,
-            doc_type="note",
-            title="Test Note",
-            excerpt="Test excerpt",
-            score=-0.1,
-        )
-
-    assert "Score must be non-negative" in str(exc_info.value)
-    assert "got -0.1" in str(exc_info.value)
-
-
-@pytest.mark.unit
-def test_search_result_with_metadata():
-    """Test SearchResult with optional metadata field."""
-    result = SearchResult(
-        id=1,
-        doc_type="note",
-        title="Test Note",
-        excerpt="Test excerpt",
-        score=1.25,
-        metadata={"fusion_method": "dbsf", "dense_score": 0.8, "sparse_score": 0.45},
-    )
-
-    assert result.score == 1.25
-    assert result.metadata["fusion_method"] == "dbsf"
-    assert result.metadata["dense_score"] == 0.8
-    assert result.metadata["sparse_score"] == 0.45
-
-
-@pytest.mark.unit
-def test_search_result_with_chunk_offsets():
-    """Test SearchResult with chunk offset information."""
-    result = SearchResult(
-        id=1,
-        doc_type="note",
-        title="Test Note",
-        excerpt="matching chunk text",
-        score=0.9,
-        chunk_start_offset=100,
-        chunk_end_offset=500,
-    )
-
-    assert result.chunk_start_offset == 100
-    assert result.chunk_end_offset == 500
@@ -1,288 +0,0 @@
-"""Unit tests for DocumentChunker with LangChain text splitters."""
-
-from nextcloud_mcp_server.vector.document_chunker import (
-    ChunkWithPosition,
-    DocumentChunker,
-)
-
-
-class TestDocumentChunkerPositions:
-    """Test suite for DocumentChunker position tracking functionality."""
-
-    def test_single_chunk_simple_text(self):
-        """Test that single-chunk documents return correct positions."""
-        chunker = DocumentChunker(chunk_size=2048, overlap=200)
-        content = "This is a short document."
-
-        chunks = chunker.chunk_text(content)
-
-        assert len(chunks) == 1
-        assert isinstance(chunks[0], ChunkWithPosition)
-        assert chunks[0].text == content
-        assert chunks[0].start_offset == 0
-        assert chunks[0].end_offset == len(content)
-
-    def test_multiple_chunks_positions(self):
-        """Test that multi-chunk documents have correct positions."""
-        # Use small chunk size to force multiple chunks
-        chunker = DocumentChunker(chunk_size=50, overlap=10)
-        # Create content longer than chunk size
-        content = (
-            "This is the first sentence with some important content. "
-            "This is the second sentence with more details. "
-            "This is the third sentence continuing the discussion. "
-            "This is the fourth sentence adding more context."
-        )
-
-        chunks = chunker.chunk_text(content)
-
-        # Verify we got multiple chunks
-        assert len(chunks) > 1
-
-        # Verify all chunks are ChunkWithPosition
-        for chunk in chunks:
-            assert isinstance(chunk, ChunkWithPosition)
-
-        # Verify first chunk starts at 0
-        assert chunks[0].start_offset == 0
-
-        # Verify last chunk ends at content length
-        assert chunks[-1].end_offset == len(content)
-
-        # Verify chunks are contiguous or overlap (minimal gaps allowed)
-        for i in range(len(chunks) - 1):
-            # Next chunk should start at or near current chunk end
-            # Allow small gaps (1-2 chars) for whitespace/punctuation at boundaries
-            gap = chunks[i + 1].start_offset - chunks[i].end_offset
-            assert gap <= 2, f"Gap too large between chunks: {gap} characters"
-
-        # Verify we can reconstruct the content using positions
-        for chunk in chunks:
-            extracted = content[chunk.start_offset : chunk.end_offset]
-            assert extracted == chunk.text
-
-    def test_chunk_positions_with_whitespace(self):
-        """Test position tracking with various whitespace."""
-        chunker = DocumentChunker(chunk_size=30, overlap=5)
-        content = "First sentence here.  Second sentence.\n\nThird sentence.\tFourth sentence."
-
-        chunks = chunker.chunk_text(content)
-
-        # Verify positions correctly handle whitespace
-        for chunk in chunks:
-            extracted = content[chunk.start_offset : chunk.end_offset]
-            assert extracted == chunk.text
-            # LangChain strips whitespace by default
-            assert len(chunk.text.strip()) > 0
-
-    def test_empty_content(self):
-        """Test that empty content returns a single empty chunk."""
-        chunker = DocumentChunker(chunk_size=2048, overlap=200)
-        content = ""
-
-        chunks = chunker.chunk_text(content)
-
-        assert len(chunks) == 1
-        assert chunks[0].text == ""
-        assert chunks[0].start_offset == 0
-        assert chunks[0].end_offset == 0
-
-    def test_chunk_overlap_positions(self):
-        """Test that overlapping chunks have correct positions."""
-        chunker = DocumentChunker(chunk_size=50, overlap=15)
-        content = (
-            "This is sentence one with content. "
-            "This is sentence two with more. "
-            "This is sentence three continuing. "
-            "This is sentence four adding details."
-        )
-
-        chunks = chunker.chunk_text(content)
-
-        # Verify overlap exists if we have multiple chunks
-        if len(chunks) > 1:
-            for i in range(len(chunks) - 1):
-                current_chunk = chunks[i]
-                next_chunk = chunks[i + 1]
-
-                # Verify positions are valid
-                assert next_chunk.start_offset >= 0
-                assert current_chunk.end_offset <= len(content)
-
-                # With overlap, next chunk may start before current ends
-                assert next_chunk.start_offset <= current_chunk.end_offset
-
-    def test_unicode_content_positions(self):
-        """Test position tracking with Unicode characters."""
-        chunker = DocumentChunker(chunk_size=50, overlap=10)
-        content = (
-            "Hello 世界. こんにちは there. мир Привет world. שלום مرحبا 你好 friend."
-        )
-
-        chunks = chunker.chunk_text(content)
-
-        # Verify all chunks extract correctly
-        for chunk in chunks:
-            extracted = content[chunk.start_offset : chunk.end_offset]
-            assert extracted == chunk.text
-
-        # Verify full coverage
-        if len(chunks) == 1:
-            assert chunks[0].start_offset == 0
-            assert chunks[0].end_offset == len(content)
-
-    def test_realistic_note_content(self):
-        """Test with realistic note content similar to Nextcloud Notes."""
-        chunker = DocumentChunker(chunk_size=200, overlap=50)
-        content = """My Project Notes
-
-This is a note about my project. It contains several paragraphs of text
-that should be chunked appropriately for embedding.
-
-## Key Points
-
- First important point with some details
- Second point that needs to be remembered
- Third point for future reference
-
-The document continues with more content here. We want to make sure that
-the chunking preserves context across boundaries while maintaining proper
-position tracking for each chunk.
-
-This allows us to highlight the exact chunk that matched a search query,
-which builds trust in the RAG system."""
-
-        chunks = chunker.chunk_text(content)
-
-        # Should have multiple chunks
-        assert len(chunks) > 1
-
-        # Verify all chunks
-        for chunk in chunks:
-            assert isinstance(chunk, ChunkWithPosition)
-            # Verify extraction
-            extracted = content[chunk.start_offset : chunk.end_offset]
-            assert extracted == chunk.text
-            # Verify positions are valid
-            assert chunk.start_offset >= 0
-            assert chunk.end_offset <= len(content)
-            assert chunk.start_offset < chunk.end_offset
-
-    def test_semantic_boundary_preservation(self):
-        """Test that LangChain creates semantically coherent chunks."""
-        chunker = DocumentChunker(chunk_size=100, overlap=20)
-        content = (
-            "First sentence is here. "
-            "Second sentence follows. "
-            "Third sentence continues. "
-            "Fourth sentence ends."
-        )
-
-        chunks = chunker.chunk_text(content)
-
-        # Verify all chunks are extractable using their positions
-        for chunk in chunks:
-            extracted = content[chunk.start_offset : chunk.end_offset]
-            assert extracted == chunk.text
-
-            # Verify chunk text is meaningful (not empty or just whitespace)
-            assert len(chunk.text.strip()) > 0
-
-            # Verify positions are valid
-            assert chunk.start_offset >= 0
-            assert chunk.end_offset <= len(content)
-            assert chunk.start_offset < chunk.end_offset
-
-    def test_paragraph_boundary_preservation(self):
-        """Test that LangChain preserves paragraph boundaries."""
-        chunker = DocumentChunker(chunk_size=80, overlap=15)
-        content = """First paragraph here.
-
-Second paragraph here.
-
-Third paragraph here.
-
-Fourth paragraph here."""
-
-        chunks = chunker.chunk_text(content)
-
-        # LangChain should prefer splitting at paragraph boundaries (\n\n)
-        # Verify we got at least one chunk
-        assert len(chunks) >= 1
-
-        # Verify all positions work correctly
-        for chunk in chunks:
-            extracted = content[chunk.start_offset : chunk.end_offset]
-            assert extracted == chunk.text
-
-    def test_default_parameters(self):
-        """Test that default parameters work correctly."""
-        chunker = DocumentChunker()  # Use defaults: 2048 chars, 200 overlap
-
-        # Create content that's smaller than default chunk size
-        content = (
-            "This is a short note with a few sentences. It should fit in one chunk."
-        )
-
-        chunks = chunker.chunk_text(content)
-
-        assert len(chunks) == 1
-        assert chunks[0].text == content
-        assert chunks[0].start_offset == 0
-        assert chunks[0].end_offset == len(content)
-
-    def test_large_document_chunking(self):
-        """Test chunking of a large document."""
-        chunker = DocumentChunker(chunk_size=100, overlap=20)
-
-        # Create a large document with multiple paragraphs
-        paragraphs = [
-            f"This is paragraph {i} with some meaningful content about topic {i}. "
-            f"It contains multiple sentences to make it realistic. "
-            f"The content should be properly chunked."
-            for i in range(10)
-        ]
-        content = "\n\n".join(paragraphs)
-
-        chunks = chunker.chunk_text(content)
-
-        # Should create multiple chunks
-        assert len(chunks) > 1
-
-        # Verify all chunks are valid
-        for chunk in chunks:
-            assert isinstance(chunk, ChunkWithPosition)
-            assert len(chunk.text) > 0
-            # Verify extraction
-            extracted = content[chunk.start_offset : chunk.end_offset]
-            assert extracted == chunk.text
-
-        # Verify first and last positions
-        assert chunks[0].start_offset == 0
-        assert chunks[-1].end_offset == len(content)
-
-    def test_position_tracking_with_overlap(self):
-        """Test that position tracking works correctly with overlap."""
-        chunker = DocumentChunker(chunk_size=50, overlap=15)
-        content = "A" * 25 + ". " + "B" * 25 + ". " + "C" * 25 + ". " + "D" * 25 + "."
-
-        chunks = chunker.chunk_text(content)
-
-        if len(chunks) > 1:
-            # Verify overlap creates correct positions
-            for i in range(len(chunks) - 1):
-                # Each chunk should be extractable
-                assert (
-                    content[chunks[i].start_offset : chunks[i].end_offset]
-                    == chunks[i].text
-                )
-
-                # Next chunk should overlap with current
-                # (start before current ends)
-                if chunks[i + 1].start_offset < chunks[i].end_offset:
-                    # There is overlap - verify content matches
-                    overlap_start = chunks[i + 1].start_offset
-                    overlap_end = chunks[i].end_offset
-                    overlap_text = content[overlap_start:overlap_end]
-                    assert overlap_text in chunks[i].text
-                    assert overlap_text in chunks[i + 1].text
@@ -1,217 +0,0 @@
-"""
-Unit tests for @instrument_tool decorator.
-
-Tests that the decorator correctly instruments MCP tools with both
-Prometheus metrics and OpenTelemetry tracing.
-"""
-
-from unittest.mock import MagicMock, patch
-
-import pytest
-
-from nextcloud_mcp_server.observability.metrics import instrument_tool
-
-pytestmark = pytest.mark.unit
-
-
-@pytest.fixture
-def mock_metrics():
-    """Mock Prometheus metrics."""
-    with (
-        patch(
-            "nextcloud_mcp_server.observability.metrics.record_tool_call"
-        ) as mock_record,
-        patch(
-            "nextcloud_mcp_server.observability.metrics.record_tool_error"
-        ) as mock_error,
-    ):
-        yield {"record_tool_call": mock_record, "record_tool_error": mock_error}
-
-
-@pytest.fixture
-def mock_tracer():
-    """Mock OpenTelemetry tracer."""
-    with patch(
-        "nextcloud_mcp_server.observability.tracing.trace_operation"
-    ) as mock_trace:
-        # Configure mock to act as a context manager that allows exceptions to propagate
-        mock_trace.return_value.__enter__ = MagicMock(return_value=None)
-        mock_trace.return_value.__exit__ = MagicMock(
-            return_value=False
-        )  # Return False to allow exceptions to propagate
-        yield mock_trace
-
-
-class TestInstrumentToolDecorator:
-    """Test the @instrument_tool decorator."""
-
-    async def test_decorator_creates_trace_span(self, mock_tracer, mock_metrics):
-        """Test that decorator creates OpenTelemetry span with correct attributes."""
-
-        @instrument_tool
-        async def example_tool(query: str, limit: int = 10):
-            return {"results": []}
-
-        # Call the tool
-        await example_tool(query="test query", limit=5)
-
-        # Verify trace_operation was called with correct parameters
-        mock_tracer.assert_called_once()
-        call_args = mock_tracer.call_args
-
-        # Check span name
-        assert call_args[0][0] == "mcp.tool.example_tool"
-
-        # Check span attributes
-        attributes = call_args[1]["attributes"]
-        assert attributes["mcp.tool.name"] == "example_tool"
-        assert "query" in attributes["mcp.tool.args"]
-        assert "test query" in attributes["mcp.tool.args"]
-        assert "limit" in attributes["mcp.tool.args"]
-
-        # Verify record_exception parameter
-        assert call_args[1]["record_exception"] is True
-
-    async def test_decorator_sanitizes_sensitive_arguments(
-        self, mock_tracer, mock_metrics
-    ):
-        """Test that sensitive arguments are excluded from span attributes."""
-
-        @instrument_tool
-        async def example_tool(
-            query: str, password: str, token: str, api_key: str, ctx: object
-        ):
-            return {"success": True}
-
-        # Call with sensitive parameters
-        await example_tool(
-            query="test",
-            password="secret123",
-            token="bearer_token",
-            api_key="api_key_123",
-            ctx=MagicMock(),
-        )
-
-        # Verify trace was created
-        mock_tracer.assert_called_once()
-        attributes = mock_tracer.call_args[1]["attributes"]
-
-        # Check that sensitive fields are NOT in attributes
-        tool_args = attributes["mcp.tool.args"]
-        assert "password" not in tool_args
-        assert "secret123" not in tool_args
-        assert "token" not in tool_args
-        assert "bearer_token" not in tool_args
-        assert "api_key" not in tool_args
-        assert "api_key_123" not in tool_args
-        assert "ctx" not in tool_args
-
-        # Check that non-sensitive field IS included
-        assert "query" in tool_args
-        assert "test" in tool_args
-
-    async def test_decorator_limits_argument_string_length(
-        self, mock_tracer, mock_metrics
-    ):
-        """Test that tool arguments are limited to 500 characters."""
-
-        @instrument_tool
-        async def example_tool(query: str):
-            return {"results": []}
-
-        # Create a very long query string (>500 chars)
-        long_query = "x" * 1000
-
-        await example_tool(query=long_query)
-
-        # Verify arguments were truncated
-        mock_tracer.assert_called_once()
-        attributes = mock_tracer.call_args[1]["attributes"]
-        tool_args = attributes["mcp.tool.args"]
-
-        assert len(tool_args) <= 500
-
-    async def test_decorator_records_success_metrics(self, mock_tracer, mock_metrics):
-        """Test that successful tool execution records metrics."""
-
-        @instrument_tool
-        async def example_tool():
-            return {"success": True}
-
-        # Call the tool
-        await example_tool()
-
-        # Verify success metrics were recorded
-        mock_metrics["record_tool_call"].assert_called_once()
-        call_args = mock_metrics["record_tool_call"].call_args
-        assert call_args[0][0] == "example_tool"  # tool_name
-        assert isinstance(call_args[0][1], float)  # duration
-        assert call_args[0][2] == "success"  # status
-
-    async def test_decorator_records_error_metrics(self, mock_tracer, mock_metrics):
-        """Test that tool errors are recorded in metrics."""
-
-        @instrument_tool
-        async def failing_tool():
-            raise ValueError("Test error")
-
-        # Call the tool and expect exception
-        with pytest.raises(ValueError, match="Test error"):
-            await failing_tool()
-
-        # Verify error metrics were recorded
-        mock_metrics["record_tool_call"].assert_called_once()
-        call_args = mock_metrics["record_tool_call"].call_args
-        assert call_args[0][0] == "failing_tool"  # tool_name
-        assert isinstance(call_args[0][1], float)  # duration
-        assert call_args[0][2] == "error"  # status
-
-        # Verify error type was recorded
-        mock_metrics["record_tool_error"].assert_called_once()
-        error_args = mock_metrics["record_tool_error"].call_args
-        assert error_args[0][0] == "failing_tool"  # tool_name
-        assert error_args[0][1] == "ValueError"  # error_type
-
-    async def test_decorator_preserves_function_metadata(
-        self, mock_tracer, mock_metrics
-    ):
-        """Test that decorator preserves function name and docstring."""
-
-        @instrument_tool
-        async def example_tool():
-            """This is a test tool."""
-            return {"success": True}
-
-        # Verify function metadata is preserved
-        assert example_tool.__name__ == "example_tool"
-        assert example_tool.__doc__ == "This is a test tool."
-
-    async def test_decorator_preserves_return_value(self, mock_tracer, mock_metrics):
-        """Test that decorator returns the original function's return value."""
-
-        @instrument_tool
-        async def example_tool(value: int):
-            return {"result": value * 2}
-
-        # Call the tool
-        result = await example_tool(value=5)
-
-        # Verify return value is unchanged
-        assert result == {"result": 10}
-
-    async def test_decorator_with_no_arguments(self, mock_tracer, mock_metrics):
-        """Test decorator with tool that takes no arguments."""
-
-        @instrument_tool
-        async def no_args_tool():
-            return {"status": "ok"}
-
-        # Call the tool
-        await no_args_tool()
-
-        # Verify tracing works with no arguments
-        mock_tracer.assert_called_once()
-        attributes = mock_tracer.call_args[1]["attributes"]
-
-        # tool_args should be None when there are no kwargs
-        assert attributes["mcp.tool.args"] is None
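Note on the removed `@instrument_tool` tests: taken together they describe a decorator that opens a span named `mcp.tool.<name>`, drops sensitive keyword arguments, caps the argument string at 500 characters, and records a call metric (plus an error metric on failure). The sketch below is one way to satisfy that contract and is not the removed implementation. The `trace_operation`, `record_tool_call`, and `record_tool_error` functions here are in-memory stand-ins that append to a `CALLS` list, and the `SENSITIVE` set is an assumption inferred from the test arguments.

```python
import asyncio
import functools
import time
from contextlib import contextmanager

# Assumed deny-list, inferred from the arguments the removed tests exclude.
SENSITIVE = {"password", "token", "api_key", "ctx"}

# In-memory sink standing in for the real Prometheus/OpenTelemetry backends.
CALLS: list[tuple] = []


@contextmanager
def trace_operation(name, attributes=None, record_exception=True):
    """Stand-in span: records the span name and attributes, then yields."""
    CALLS.append(("span", name, attributes))
    yield


def record_tool_call(tool_name, duration, status):
    CALLS.append(("call", tool_name, status))


def record_tool_error(tool_name, error_type):
    CALLS.append(("error", tool_name, error_type))


def instrument_tool(func):
    """Wrap an async tool with a span and call/error metrics."""

    @functools.wraps(func)  # preserves __name__ and __doc__
    async def wrapper(*args, **kwargs):
        name = func.__name__
        safe = {k: v for k, v in kwargs.items() if k not in SENSITIVE}
        # None when nothing is left to report, else a string capped at 500 chars.
        tool_args = str(safe)[:500] if safe else None
        attributes = {"mcp.tool.name": name, "mcp.tool.args": tool_args}

        start = time.monotonic()
        with trace_operation(
            f"mcp.tool.{name}", attributes=attributes, record_exception=True
        ):
            try:
                result = await func(*args, **kwargs)
            except Exception as exc:
                record_tool_call(name, time.monotonic() - start, "error")
                record_tool_error(name, type(exc).__name__)
                raise  # the caller still sees the original exception
            record_tool_call(name, time.monotonic() - start, "success")
            return result

    return wrapper
```

Filtering by key name before stringifying keeps both the parameter name and its value out of the span, which is what the sanitization test checked.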
@@ -1,587 +0,0 @@
-#!/usr/bin/env python3
-"""RAG Evaluation Management CLI.
-
-Commands:
-  generate - Generate ground truth answers from nfcorpus dataset
-  upload   - Upload nfcorpus documents as Nextcloud notes
-
-Usage:
-    # Generate ground truth
-    uv run python tools/rag_eval_cli.py generate
-
-    # Upload corpus to Nextcloud
-    uv run python tools/rag_eval_cli.py upload --nextcloud-url http://localhost:8000 --username admin --password admin
-"""
-
-import io
-import json
-import sys
-import zipfile
-from pathlib import Path
-from typing import Any
-
-import anyio
-import click
-import httpx
-from datasets import load_dataset
-from httpx import BasicAuth
-
-# Add parent directory to path to import from tests/
-sys.path.insert(0, str(Path(__file__).parent.parent))
-
-from nextcloud_mcp_server.client import NextcloudClient
-from tests.rag_evaluation.llm_providers import create_llm_provider
-
-# Paths
-FIXTURES_DIR = Path(__file__).parent.parent / "tests" / "rag_evaluation" / "fixtures"
-CORPUS_DIR = FIXTURES_DIR / "nfcorpus"
-GROUND_TRUTH_FILE = FIXTURES_DIR / "ground_truth.json"
-NOTE_MAPPING_FILE = FIXTURES_DIR / "note_mapping.json"
-
-# Dataset URL
-NFCORPUS_URL = (
-    "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/nfcorpus.zip"
-)
-
-# Selected test queries (from ADR-013)
-SELECTED_QUERIES = [
-    "PLAIN-2630",  # Alkylphenol Endocrine Disruptors and Allergies
-    "PLAIN-2660",  # How Long to Detox From Fish Before Pregnancy?
-    "PLAIN-2510",  # Coffee and Artery Function
-    "PLAIN-2430",  # Preventing Brain Loss with B Vitamins?
-    "PLAIN-2690",  # Chronic Headaches and Pork Tapeworms
-]
-
-
-def ensure_corpus_downloaded(force_download: bool = False) -> Path:
-    """Ensure nfcorpus dataset is downloaded to fixtures directory.
-
-    Args:
-        force_download: Force re-download even if corpus exists
-
-    Returns:
-        Path to corpus directory
-
-    Raises:
-        RuntimeError: If download fails
-    """
-    if CORPUS_DIR.exists() and not force_download:
-        click.echo(f"Corpus already exists at {CORPUS_DIR}")
-        return CORPUS_DIR
-
-    click.echo(f"Downloading nfcorpus dataset to {CORPUS_DIR}...")
-
-    # Create fixtures directory
-    FIXTURES_DIR.mkdir(parents=True, exist_ok=True)
-
-    # Download using HuggingFace datasets library (handles caching)
-    try:
-        # Download corpus
-        click.echo("  Downloading corpus...")
-        corpus_dataset = load_dataset(
-            "BeIR/nfcorpus",
-            "corpus",
-            split="corpus",
-        )
-
-        # Download queries
-        click.echo("  Downloading queries...")
-        queries_dataset = load_dataset(
-            "BeIR/nfcorpus",
-            "queries",
-            split="queries",
-        )
-
-        # Save to local fixtures directory as JSONL
-        CORPUS_DIR.mkdir(parents=True, exist_ok=True)
-
-        # Save corpus
-        with open(CORPUS_DIR / "corpus.jsonl", "w") as f:
-            for doc in corpus_dataset:
-                f.write(json.dumps(doc) + "\n")
-
-        # Save queries
-        with open(CORPUS_DIR / "queries.jsonl", "w") as f:
-            for query in queries_dataset:
-                f.write(json.dumps(query) + "\n")
-
-        # Download qrels from BEIR directly (not available via HuggingFace)
-        click.echo("  Downloading qrels from BEIR ZIP...")
-        with httpx.Client(timeout=300.0) as client:
-            response = client.get(NFCORPUS_URL)
-            response.raise_for_status()
-
-            # Extract qrels from ZIP
-            with zipfile.ZipFile(io.BytesIO(response.content)) as zf:
-                # The qrels are in nfcorpus/qrels/test.tsv within the ZIP
-                qrels_path = "nfcorpus/qrels/test.tsv"
-                qrels_dir = CORPUS_DIR / "qrels"
-                qrels_dir.mkdir(parents=True, exist_ok=True)
-
-                qrels_content = zf.read(qrels_path).decode("utf-8")
-                with open(qrels_dir / "test.tsv", "w") as f:
-                    f.write(qrels_content)
-
-        click.echo(f"Dataset downloaded to {CORPUS_DIR}")
-        return CORPUS_DIR
-
-    except Exception as e:
-        raise RuntimeError(f"Failed to download nfcorpus dataset: {e}") from e
-
-
-def load_corpus(corpus_dir: Path) -> dict[str, dict]:
-    """Load corpus documents from local directory.
-
-    Args:
-        corpus_dir: Path to corpus directory
-
-    Returns:
-        Dict mapping document ID to document data
-    """
-    corpus = {}
-    with open(corpus_dir / "corpus.jsonl") as f:
-        for line in f:
-            doc = json.loads(line)
-            corpus[doc["_id"]] = doc
-    return corpus
-
-
-def load_queries(corpus_dir: Path) -> dict[str, dict]:
-    """Load queries from local directory.
-
-    Args:
-        corpus_dir: Path to corpus directory
-
-    Returns:
-        Dict mapping query ID to query data
-    """
-    queries = {}
-    with open(corpus_dir / "queries.jsonl") as f:
-        for line in f:
-            query = json.loads(line)
-            queries[query["_id"]] = query
-    return queries
-
-
-def load_qrels(corpus_dir: Path) -> dict[str, list[tuple[str, int]]]:
-    """Load query relevance judgments from local directory.
-
-    Args:
-        corpus_dir: Path to corpus directory
-
-    Returns:
-        Dict mapping query ID to list of (doc_id, score) tuples
-    """
-    qrels: dict[str, list[tuple[str, int]]] = {}
-    with open(corpus_dir / "qrels" / "test.tsv") as f:
-        next(f)  # Skip header
-        for line in f:
-            query_id, corpus_id, score = line.strip().split("\t")
-            if query_id not in qrels:
-                qrels[query_id] = []
-            qrels[query_id].append((corpus_id, int(score)))
-
-    # Sort by score descending
-    for query_id in qrels:
-        qrels[query_id].sort(key=lambda x: x[1], reverse=True)
-
-    return qrels
-
-
-async def generate_ground_truth_answer(
-    query_text: str, relevant_docs: list[dict[str, Any]], llm
-) -> str:
-    """Generate ground truth answer from highly relevant documents.
-
-    Args:
-        query_text: The query/question
-        relevant_docs: List of highly relevant documents (top 5)
-        llm: LLM provider instance
-
-    Returns:
-        Generated ground truth answer
-    """
-    # Construct context from documents
-    context_parts = []
-    for i, doc in enumerate(relevant_docs, 1):
-        context_parts.append(
-            f"Document {i}:\nTitle: {doc['title']}\nText: {doc['text']}\n"
-        )
-    context = "\n".join(context_parts)
-
-    # Generate ground truth
-    prompt = f"""Based on the following medical/biomedical documents, provide a comprehensive, factual answer to this question.
-
-Question: {query_text}
-
-{context}
-
-Instructions:
-- Provide a clear, well-structured answer that synthesizes information from the documents
-- Focus on accuracy and completeness
-- Use specific facts and findings from the documents
-- Keep the answer concise but informative (2-4 paragraphs)
-- Do not make up information not present in the documents
-
-Answer:"""
-
-    click.echo(f"  Generating answer for: {query_text}")
-    answer = await llm.generate(prompt, max_tokens=500)
-    click.echo(f"  Generated {len(answer)} characters")
-    return answer.strip()
-
-
-@click.group()
-def cli():
-    """RAG Evaluation Management CLI.
-
-    Manage ground truth generation and corpus upload for RAG evaluation tests.
-    """
-    pass
-
-
-@cli.command()
-@click.option(
-    "--provider",
-    type=click.Choice(["ollama", "anthropic"]),
-    default="ollama",
-    help="LLM provider to use for generation",
-)
-@click.option(
-    "--model",
-    help="Model name (default: llama3.2:1b for Ollama, claude-3-5-sonnet-20241022 for Anthropic)",
-)
-@click.option(
-    "--force-download",
-    is_flag=True,
-    help="Force re-download of nfcorpus dataset",
-)
-def generate(provider: str, model: str | None, force_download: bool):
-    """Generate ground truth answers for RAG evaluation.
-
-    This command:
-    1. Downloads nfcorpus dataset (if not already cached)
-    2. For each selected query, extracts highly relevant documents
-    3. Uses an LLM to synthesize a reference answer
-    4. Saves ground truth to fixtures/ground_truth.json
-
-    Environment variables:
-      RAG_EVAL_PROVIDER: Provider type (ollama or anthropic)
-      RAG_EVAL_OLLAMA_BASE_URL: Ollama base URL
-      RAG_EVAL_OLLAMA_MODEL: Ollama model name
-      RAG_EVAL_ANTHROPIC_API_KEY: Anthropic API key
-      RAG_EVAL_ANTHROPIC_MODEL: Anthropic model name
-    """
-
-    async def _generate():
-        click.echo("=" * 80)
-        click.echo("RAG Ground Truth Generation")
-        click.echo("=" * 80)
-
-        # Ensure corpus is downloaded
-        corpus_dir = ensure_corpus_downloaded(force_download)
-
-        # Load dataset
-        click.echo("\nLoading nfcorpus dataset...")
-        corpus = load_corpus(corpus_dir)
-        queries = load_queries(corpus_dir)
-        qrels = load_qrels(corpus_dir)
-        click.echo(f"Loaded {len(corpus)} documents, {len(queries)} queries")
-
-        # Create LLM provider
-        click.echo("\nInitializing LLM provider...")
-        try:
-            llm = create_llm_provider(
-                provider=provider,
-                ollama_model=model if provider == "ollama" else None,
-                anthropic_model=model if provider == "anthropic" else None,
-            )
-            provider_type = type(llm).__name__
-            click.echo(f"Using provider: {provider_type}")
-        except ValueError as e:
-            click.echo(f"\nError: {e}", err=True)
-            return 1
-
-        # Generate ground truth for each selected query
-        ground_truth_data = []
-
-        try:
-            for query_id in SELECTED_QUERIES:
-                if query_id not in queries:
-                    click.echo(
-                        f"\nWarning: Query {query_id} not found in dataset", err=True
-                    )
-                    continue
-
-                query = queries[query_id]
-                query_text = query["text"]
-
-                # Get highly relevant documents (score=2)
-                if query_id not in qrels:
-                    click.echo(
-                        f"\nWarning: No relevance judgments for {query_id}", err=True
-                    )
-                    continue
-
-                highly_relevant_doc_ids = [
-                    doc_id for doc_id, score in qrels[query_id] if score == 2
-                ]
-
-                if not highly_relevant_doc_ids:
-                    click.echo(
-                        f"\nWarning: No highly relevant docs for {query_id}", err=True
-                    )
-                    continue
-
-                # Get top 5 highly relevant documents
-                relevant_docs = []
-                for doc_id in highly_relevant_doc_ids[:5]:
-                    if doc_id in corpus:
-                        relevant_docs.append(corpus[doc_id])
-
-                if not relevant_docs:
-                    click.echo(
-                        f"\nWarning: Could not load documents for {query_id}", err=True
-                    )
-                    continue
-
-                # Generate ground truth answer
-                click.echo(f"\n{'-' * 80}")
-                ground_truth_answer = await generate_ground_truth_answer(
-                    query_text, relevant_docs, llm
-                )
-
-                # Store result
-                ground_truth_data.append(
-                    {
-                        "query_id": query_id,
-                        "query_text": query_text,
-                        "ground_truth_answer": ground_truth_answer,
-                        "expected_document_ids": highly_relevant_doc_ids,
-                        "highly_relevant_count": len(highly_relevant_doc_ids),
-                    }
-                )
-
-                click.echo(f"  Preview: {ground_truth_answer[:200]}...")
-
-        finally:
-            await llm.close()
-
-        # Save ground truth
-        GROUND_TRUTH_FILE.parent.mkdir(parents=True, exist_ok=True)
-        with open(GROUND_TRUTH_FILE, "w") as f:
-            json.dump(ground_truth_data, f, indent=2)
-
-        click.echo(f"\n{'=' * 80}")
-        click.echo(f"Generated {len(ground_truth_data)} ground truth answers")
-        click.echo(f"Saved to: {GROUND_TRUTH_FILE}")
-        click.echo("=" * 80)
-
-        return 0
-
-    sys.exit(anyio.run(_generate))
-
-
-@cli.command()
-@click.option(
-    "--nextcloud-url",
-    envvar="NEXTCLOUD_HOST",
-    required=True,
-    help="Nextcloud base URL (e.g., http://localhost:8000)",
-)
-@click.option(
-    "--username",
-    envvar="NEXTCLOUD_USERNAME",
-    required=True,
-    help="Nextcloud username",
-)
-@click.option(
-    "--password",
-    envvar="NEXTCLOUD_PASSWORD",
-    required=True,
-    help="Nextcloud password",
-)
-@click.option(
-    "--category",
-    default="nfcorpus_rag_eval",
-    help="Category/folder for uploaded notes",
-)
-@click.option(
-    "--force-download",
-    is_flag=True,
-    help="Force re-download of nfcorpus dataset",
-)
-@click.option(
-    "--force",
-    is_flag=True,
-    help="Delete all existing notes in the target category before uploading",
-)
-def upload(
-    nextcloud_url: str,
-    username: str,
-    password: str,
-    category: str,
-    force_download: bool,
-    force: bool,
-):
-    """Upload nfcorpus corpus documents as Nextcloud notes.
-
-    This command:
-    1. Downloads nfcorpus dataset (if not already cached)
-    2. Optionally deletes existing notes in target category (--force)
-    3. Uploads all corpus documents as Nextcloud notes
-    4. Saves document ID → note ID mapping to fixtures/note_mapping.json
-
-    The note mapping file is used by pytest tests to map expected document IDs
-    to actual note IDs in Nextcloud.
-    """
-
-    async def _upload():
-        click.echo("=" * 80)
-        click.echo("Upload nfcorpus Corpus to Nextcloud")
-        click.echo("=" * 80)
-
-        # Ensure corpus is downloaded
-        corpus_dir = ensure_corpus_downloaded(force_download)
-
-        # Load corpus
-        click.echo("\nLoading corpus...")
-        corpus = load_corpus(corpus_dir)
-        click.echo(f"Loaded {len(corpus)} documents")
-
-        # Create Nextcloud client
-        click.echo(f"\nConnecting to Nextcloud at {nextcloud_url}...")
-        nc_client = NextcloudClient(
-            base_url=nextcloud_url,
-            username=username,
-            auth=BasicAuth(username, password),
-        )
-
-        try:
-            # Delete existing notes in category if force is specified
-            if force:
-                click.echo(
-                    f"\n--force specified: Deleting existing notes in category '{category}'..."
-                )
-
-                # Collect notes to delete
-                notes_to_delete = []
-                async for note in nc_client.notes.get_all_notes():
-                    if note.get("category") == category:
-                        notes_to_delete.append(note["id"])
-
-                if not notes_to_delete:
-                    click.echo(f"No existing notes found in category '{category}'")
-                else:
-                    click.echo(f"Found {len(notes_to_delete)} notes to delete")
-
-                    deleted_count = 0
-                    delete_errors = []
-                    delete_semaphore = anyio.Semaphore(20)
-
-                    async def delete_note(note_id: int):
-                        """Delete a single note."""
-                        nonlocal deleted_count
-
-                        async with delete_semaphore:
-                            try:
-                                await nc_client.notes.delete_note(note_id)
-                                deleted_count += 1
-                                if deleted_count % 100 == 0:
-                                    click.echo(f"  Deleted {deleted_count} notes...")
-                            except Exception as e:
-                                error_msg = f"Error deleting note {note_id}: {e}"
-                                delete_errors.append(error_msg)
-                                click.echo(f"  {error_msg}", err=True)
-
-                    # Delete all notes concurrently
-                    async with anyio.create_task_group() as tg:
-                        for note_id in notes_to_delete:
-                            tg.start_soon(delete_note, note_id)
-
-                    click.echo(
-                        f"Deleted {deleted_count} existing notes in category '{category}'"
-                    )
-                    if delete_errors:
-                        click.echo(
-                            f"Encountered {len(delete_errors)} errors during deletion",
-                            err=True,
-                        )
-
-            # Upload documents concurrently
-            click.echo(f"\nUploading {len(corpus)} documents as notes (concurrent)...")
-            click.echo(f"Category: {category}")
-
-            note_mapping = {}
-            uploaded_count = 0
-            upload_errors = []
-
-            # Semaphore to limit concurrent uploads (avoid overwhelming server)
-            max_concurrent = 20
-            semaphore = anyio.Semaphore(max_concurrent)
-
-            async def upload_document(doc_id: str, doc: dict[str, Any]):
-                """Upload a single document as a note."""
-                nonlocal uploaded_count
-
-                async with semaphore:
-                    title = f"[{doc_id}] {doc['title'][:100]}"  # Truncate long titles
-                    content = doc["text"]
-
-                    try:
-                        note_data = await nc_client.notes.create_note(
-                            title=title,
-                            content=content,
-                            category=category,
-                        )
-
-                        # Store mapping
-                        note_id = note_data["id"]
-                        note_mapping[doc_id] = note_id
-
-                        uploaded_count += 1
-
-                        # Progress indicator every 100 docs
-                        if uploaded_count % 100 == 0:
-                            click.echo(
-                                f"  Uploaded {uploaded_count}/{len(corpus)} documents..."
-                            )
-
-                    except Exception as e:
-                        error_msg = f"Error uploading {doc_id}: {e}"
-                        upload_errors.append(error_msg)
-                        click.echo(f"  {error_msg}", err=True)
-
-            # Upload all documents concurrently using task group
-            async with anyio.create_task_group() as tg:
-                for doc_id, doc in corpus.items():
-                    tg.start_soon(upload_document, doc_id, doc)
-
-            click.echo(f"\nUploaded {uploaded_count} documents successfully")
-            if upload_errors:
-                click.echo(
-                    f"Encountered {len(upload_errors)} errors during upload", err=True
-                )
-
-            # Save note mapping
-            with open(NOTE_MAPPING_FILE, "w") as f:
-                json.dump(note_mapping, f, indent=2)
-
-            click.echo(f"Saved note mapping to: {NOTE_MAPPING_FILE}")
-            click.echo(f"  Mapped {len(note_mapping)} document IDs to note IDs")
-
-        finally:
-            # Close the Nextcloud client
-            await nc_client.close()
-
-        click.echo("=" * 80)
-        click.echo("Upload complete!")
-        click.echo("=" * 80)
-
-        return 0
-
-    sys.exit(anyio.run(_upload))
-
-
-if __name__ == "__main__":
-    cli()
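Note on the removed `upload` command: both the delete and upload phases use the same pattern, a semaphore that caps in-flight requests at 20 while every item is scheduled at once, with per-item errors collected rather than aborting the batch. The sketch below shows that pattern in isolation. It uses `asyncio` from the standard library instead of the `anyio` task group in the removed code, and `run_bounded` and `worker` are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, Callable, Hashable, Iterable


async def run_bounded(
    items: Iterable[Hashable],
    worker: Callable[[Any], Awaitable[Any]],
    max_concurrent: int = 20,
) -> tuple[dict, list[str]]:
    """Run worker(item) for every item with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)
    results: dict = {}
    errors: list[str] = []

    async def guarded(item):
        # All tasks are created up front; the semaphore decides how many
        # are inside the worker at any moment.
        async with semaphore:
            try:
                results[item] = await worker(item)
            except Exception as exc:
                # One failed item is recorded and does not cancel the others.
                errors.append(f"Error processing {item}: {exc}")

    await asyncio.gather(*(guarded(item) for item in items))
    return results, errors
```

Catching exceptions inside each task matters more with a task group than with `gather`: an exception that escapes a task-group child cancels its siblings, so the removed code's per-task `try`/`except` is what let a partial upload finish and still write the mapping file.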
@@ -1 +0,0 @@
-"""RAG evaluation tests for the Nextcloud MCP semantic search system."""
@@ -1 +0,0 @@
-"""Unit tests for provider infrastructure."""