diff --git a/docs/ADR-002-vector-sync-authentication.md b/docs/ADR-002-vector-sync-authentication.md
new file mode 100644
index 0000000..5322651
--- /dev/null
+++ b/docs/ADR-002-vector-sync-authentication.md
@@ -0,0 +1,795 @@
+# ADR-002: Vector Database Background Sync Authentication
+
+## Status
+Proposed
+
+## Context
+
+To enable semantic search capabilities, the MCP server needs to index user content (notes, files, calendar events) into a vector database. This requires a background sync worker that:
+
+1. **Runs independently** of user requests (periodic or continuous operation)
+2. **Accesses multiple users' content** to build a comprehensive search index
+3. **Respects user permissions** - indexes only content each user has access to
+4. **Operates in OAuth mode** - where the MCP server doesn't have traditional admin credentials
+
+### Current OAuth Architecture
+
+The MCP server currently operates in two authentication modes:
+
+1. **BasicAuth Mode**: Uses username/password credentials (typically admin account)
+2. **OAuth Mode**: Single OAuth client, multiple user tokens
+   - Users authenticate via OAuth flow
+   - Each request includes user's access token
+   - Server creates per-request `NextcloudClient` with user's bearer token
+   - No tokens are stored server-side
+
+### The Challenge
+
+Background workers need long-lived authentication to:
+- Index content continuously/periodically
+- Process multiple users' data in batch operations
+- Operate when users are not actively making requests
+
+However, in OAuth mode:
+- User access tokens are ephemeral (exist only during request)
+- MCP server doesn't store user credentials
+- Admin credentials defeat the purpose of OAuth
+
+We need an OAuth-native solution that maintains security while enabling background operations.
+
+## Decision
+
+We will implement a **tiered authentication strategy** that leverages OAuth standards with graceful fallback:
+
+### Primary Strategy: OAuth-Based Authentication
+
+**Tier 1: Offline Access with Refresh Tokens** (Preferred)
+- Register the `offline_access` scope with the OAuth client and request it in each authorization request
+- Receive and securely store user refresh tokens
+- Background worker exchanges refresh tokens for access tokens as needed
+- Respects per-user permissions and provides full audit trail
+
+**Tier 2: Token Exchange (RFC 8693)** (If supported)
+- Service account exchanges its token for user-scoped tokens on-demand
+- No token storage required
+- Only available if OIDC provider implements RFC 8693
+
+### Fallback Strategy: Admin Credentials
+
+**Tier 3: Admin BasicAuth** (Development/Simple Deployments)
+- Dedicated sync account with read-only permissions
+- Clear documentation of security implications
+- Recommended only for trusted environments
+
+### Key Architectural Principles
+
+1. **Capability Detection**: Automatically detect which OAuth methods are supported
+2. **Dual-Phase Authorization**:
+   - Sync worker indexes with service credentials
+   - User requests verify access with user's OAuth token
+3. **Defense in Depth**: The vector database is a search accelerator, not a security boundary
+4. **Separation of Concerns**: Sync credentials ≠ Request credentials
+
+## Implementation Details
+
+### 1. Offline Access Flow (Tier 1)
+
+#### 1.1 Client Registration
+```python
+# During OAuth client registration
+client_metadata = {
+    "client_name": "Nextcloud MCP Server",
+    "redirect_uris": ["http://localhost:8000/oauth/callback"],
+    "grant_types": ["authorization_code", "refresh_token"],
+    "scope": "openid profile email offline_access notes:read files:read ...",
+    "token_type": "Bearer"  # or "jwt"
+}
+```
+
+#### 1.2 Token Storage
+```python
+# Encrypted token storage
+class RefreshTokenStorage:
+    """Securely store and manage user refresh tokens"""
+
+    def __init__(self, db_path: str, encryption_key: bytes):
+        self.db = Database(db_path)
+        self.cipher = Fernet(encryption_key)
+
+    async def store_refresh_token(
+        self,
+        user_id: str,
+        refresh_token: str,
+        expires_at: int | None = None
+    ):
+        """Store encrypted refresh token for user"""
+        encrypted_token = self.cipher.encrypt(refresh_token.encode())
+        await self.db.execute(
+            "INSERT OR REPLACE INTO refresh_tokens VALUES (?, ?, ?, ?)",
+            (user_id, encrypted_token, expires_at, int(time.time()))
+        )
+
+    async def get_refresh_token(self, user_id: str) -> str | None:
+        """Retrieve and decrypt refresh token"""
+        row = await self.db.fetch_one(
+            "SELECT encrypted_token FROM refresh_tokens WHERE user_id = ?",
+            (user_id,)
+        )
+        if row:
+            return self.cipher.decrypt(row[0]).decode()
+        return None
+```
+
+#### 1.3 Token Refresh Flow
+```python
+async def get_user_access_token(user_id: str) -> str:
+    """Exchange refresh token for fresh access token"""
+
+    # Retrieve stored refresh token
+    refresh_token = await token_storage.get_refresh_token(user_id)
+    if not refresh_token:
+        raise ValueError(f"No refresh token for user {user_id}")
+
+    # Exchange for access token
+    async with httpx.AsyncClient() as client:
+        response = await client.post(
+            token_endpoint,
+            data={
+                "grant_type": "refresh_token",
+                "refresh_token": refresh_token
+            },
+            auth=(client_id, client_secret)
+        )
+        response.raise_for_status()
+        token_data = response.json()
+
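+        # Store the rotated refresh token; refresh_expires_in is a lifetime in seconds
+        if "refresh_token" in token_data:
+            ttl = token_data.get("refresh_expires_in")  # provider-specific, may be absent
+            await token_storage.store_refresh_token(
+                user_id, token_data["refresh_token"],
+                expires_at=(int(time.time()) + ttl) if ttl is not None else None
+            )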
+
+        return token_data["access_token"]
+```
+
+#### 1.4 Capturing Refresh Tokens
+
+**Challenge**: The MCP protocol doesn't expose refresh tokens to the server
+
+**Solution**: Intercept the OAuth callback on the server
+```python
+# Add route to MCP server
+@app.route("/oauth/callback")
+async def oauth_callback(request):
+    """Capture OAuth callback and store refresh token"""
+
+    code = request.query_params.get("code")
+    state = request.query_params.get("state")
+
+    # Exchange authorization code for tokens
+    token_response = await exchange_authorization_code(code)
+
+    # Extract user info
+    userinfo = await get_userinfo(token_response["access_token"])
+    user_id = userinfo["sub"]
+
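+    # Store refresh token (if present); refresh_expires_in is a lifetime in seconds
+    if "refresh_token" in token_response:
+        ttl = token_response.get("refresh_expires_in")  # provider-specific, may be absent
+        await token_storage.store_refresh_token(
+            user_id, token_response["refresh_token"],
+            expires_at=(int(time.time()) + ttl) if ttl is not None else None
+        )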
+        logger.info(f"Stored refresh token for user: {user_id}")
+
+    # Continue MCP OAuth flow
+    return redirect_to_mcp_client(state, token_response)
+```
+
+### 2. Token Exchange Flow (Tier 2)
+
+#### 2.1 Capability Detection
+```python
+async def check_token_exchange_support(discovery_url: str) -> bool:
+    """Check if OIDC provider supports RFC 8693 token exchange"""
+
+    async with httpx.AsyncClient() as client:
+        response = await client.get(discovery_url)
+        response.raise_for_status()  # treat a failed discovery fetch as an error
+        discovery = response.json()
+        # Check for token exchange grant type
+        grant_types = discovery.get("grant_types_supported", [])
+        return "urn:ietf:params:oauth:grant-type:token-exchange" in grant_types
+```
+
+#### 2.2 Token Exchange Implementation
+```python
+async def exchange_for_user_token(
+    service_token: str,
+    user_id: str,
+    scopes: list[str]
+) -> str:
+    """Exchange service token for user-scoped token"""
+
+    async with httpx.AsyncClient() as client:
+        response = await client.post(
+            token_endpoint,
+            data={
+                "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
+                "subject_token": service_token,
+                "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
+                "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
+                "resource": f"user:{user_id}",  # provider-specific; RFC 8693 "resource" is the target service URI
+                "scope": " ".join(scopes)
+            },
+            auth=(client_id, client_secret)
+        )
+
+        if response.status_code != 200:
+            logger.warning(f"Token exchange failed: {response.status_code}")
+            raise TokenExchangeNotSupportedError()
+
+        return response.json()["access_token"]
+```
+
+#### 2.3 Service Account Token
+```python
+async def get_service_token() -> str:
+    """Get token for MCP server's service account"""
+
+    async with httpx.AsyncClient() as client:
+        response = await client.post(
+            token_endpoint,
+            data={
+                "grant_type": "client_credentials",
+                "scope": "notes:read files:read calendar:read"
+            },
+            auth=(client_id, client_secret)
+        )
+        response.raise_for_status()
+        return response.json()["access_token"]
+```
+
+### 3. Sync Worker with Tiered Authentication
+
+```python
+# nextcloud_mcp_server/sync_worker.py
+class VectorSyncWorker:
+    """Background worker for indexing content into vector database"""
+
+    def __init__(self):
+        self.auth_method = None
+        self.token_storage = None
+        self.vector_service = None
+
+    async def initialize(self):
+        """Detect and configure authentication method"""
+
+        # Try Tier 1: Offline Access
+        if os.getenv("ENABLE_OFFLINE_ACCESS") == "true":
+            try:
+                encryption_key = os.environ["TOKEN_ENCRYPTION_KEY"]
+                self.token_storage = RefreshTokenStorage(
+                    db_path=os.getenv("TOKEN_STORAGE_DB", "tokens.db"),
+                    encryption_key=encryption_key.encode()  # Fernet takes the base64 key as-is, not decoded bytes
+                )
+                self.auth_method = "offline_access"
+                logger.info("✓ Using offline_access authentication")
+                return
+            except Exception as e:
+                logger.warning(f"Offline access unavailable: {e}")
+
+        # Try Tier 2: Token Exchange
+        try:
+            if await check_token_exchange_support(discovery_url):
+                self.auth_method = "token_exchange"
+                logger.info("✓ Using token exchange authentication (RFC 8693)")
+                return
+        except Exception as e:
+            logger.warning(f"Token exchange unavailable: {e}")
+
+        # Fallback: Admin Credentials
+        if os.getenv("NEXTCLOUD_USERNAME") and os.getenv("NEXTCLOUD_PASSWORD"):
+            self.auth_method = "admin_basic"
+            logger.warning(
+                "⚠ Using admin BasicAuth authentication. "
+                "Consider enabling offline_access for production."
+            )
+            return
+
+        raise RuntimeError("No authentication method available for sync worker")
+
+    async def get_user_client(self, user_id: str) -> NextcloudClient:
+        """Get authenticated client for user based on auth method"""
+
+        if self.auth_method == "offline_access":
+            # Exchange refresh token for access token
+            access_token = await get_user_access_token(user_id)
+            return NextcloudClient.from_token(
+                base_url=nextcloud_host,
+                token=access_token,
+                username=user_id
+            )
+
+        elif self.auth_method == "token_exchange":
+            # Get service token and exchange for user token
+            service_token = await get_service_token()
+            user_token = await exchange_for_user_token(
+                service_token,
+                user_id,
+                scopes=["notes:read", "files:read"]
+            )
+            return NextcloudClient.from_token(
+                base_url=nextcloud_host,
+                token=user_token,
+                username=user_id
+            )
+
+        elif self.auth_method == "admin_basic":
+            # Use admin credentials (fallback)
+            return NextcloudClient.from_env()
+
+        raise RuntimeError(f"Unknown auth method: {self.auth_method}")
+
+    async def sync_user_content(self, user_id: str):
+        """Index a user's content into vector database"""
+
+        try:
+            # Get authenticated client for this user
+            client = await self.get_user_client(user_id)
+
+            # Sync notes
+            notes = await client.notes.list_notes()
+            for note in notes:
+                embedding = await self.vector_service.embed(note.content)
+                await self.vector_service.upsert(
+                    collection="nextcloud_content",
+                    id=f"note_{note.id}",
+                    vector=embedding,
+                    metadata={
+                        "user_id": user_id,
+                        "content_type": "note",
+                        "note_id": note.id,
+                        "title": note.title,
+                        "category": note.category
+                    }
+                )
+
+            logger.info(f"Synced {len(notes)} notes for user: {user_id}")
+
+        except Exception as e:
+            logger.error(f"Failed to sync user {user_id}: {e}")
+
+    async def run(self):
+        """Main sync loop"""
+
+        await self.initialize()
+
+        while True:
+            try:
+                # Get list of users to sync
+                if self.auth_method == "admin_basic":
+                    # Admin can list all users
+                    admin_client = NextcloudClient.from_env()
+                    users = await admin_client.users.list_users()
+                    user_ids = [u.id for u in users]
+                else:
+                    # OAuth methods: only sync users with stored tokens
+                    user_ids = await self.token_storage.get_all_user_ids()
+
+                logger.info(f"Syncing content for {len(user_ids)} users")
+
+                for user_id in user_ids:
+                    await self.sync_user_content(user_id)
+
+                logger.info("Sync complete, sleeping...")
+                await asyncio.sleep(int(os.getenv("SYNC_INTERVAL_SECONDS", "300")))  # default: 5 minutes
+
+            except Exception as e:
+                logger.error(f"Sync failed: {e}")
+                await asyncio.sleep(60)  # Retry after 1 minute
+```
+
+### 4. User Request Verification (Dual-Phase Authorization)
+
+```python
+@mcp.tool()
+@require_scopes("notes:read")
+async def nc_notes_semantic_search(
+    query: str,
+    ctx: Context,
+    limit: int = 10
+) -> SemanticSearchResponse:
+    """Semantic search with permission verification"""
+
+    # Get user's OAuth client (uses their access token from request)
+    user_client = get_client(ctx)
+    username = user_client.username
+
+    # Phase 1: Vector search (fast, may include false positives)
+    embedding = await vector_service.embed(query)
+    candidate_results = await qdrant.search(
+        collection_name="nextcloud_content",
+        query_vector=embedding,
+        query_filter={
+            "must": [
+                {
+                    "should": [
+                        {"key": "user_id", "match": {"value": username}},
+                        {"key": "shared_with", "match": {"any": [username]}}
+                    ]
+                },
+                {"key": "content_type", "match": {"value": "note"}}
+            ]
+        },
+        limit=limit * 2  # Get extra candidates
+    )
+
+    # Phase 2: Verify access via Nextcloud API (authoritative)
+    verified_results = []
+    for candidate in candidate_results:
+        note_id = candidate.payload["note_id"]
+        try:
+            # This uses user's OAuth token - will fail if no access
+            note = await user_client.notes.get_note(note_id)
+            verified_results.append({
+                "note": note,
+                "score": candidate.score
+            })
+            if len(verified_results) >= limit:
+                break
+        except HTTPStatusError as e:
+            if e.response.status_code in (403, 404):
+                # No access, or note deleted since it was indexed - skip silently
+                logger.debug(f"Filtered out note {note_id} for {username}")
+                continue
+            raise
+
+    return SemanticSearchResponse(results=verified_results)
+```
+
+### 5. Security Implementation
+
+#### 5.1 Token Encryption
+```python
+# Generate encryption key (store securely)
+from cryptography.fernet import Fernet
+
+# On first setup
+encryption_key = Fernet.generate_key()
+# Store in environment or secrets manager
+# NEVER commit to source control
+
+# In production
+encryption_key = os.getenv("TOKEN_ENCRYPTION_KEY")  # Base64-encoded Fernet key
+```
+
+#### 5.2 Token Rotation
+```python
+async def rotate_refresh_token(user_id: str):
+    """Handle refresh token rotation"""
+
+    old_refresh_token = await token_storage.get_refresh_token(user_id)
+
+    # Exchange for new tokens
+    response = await exchange_refresh_token(old_refresh_token)
+
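+    if "refresh_token" in response:
+        # Store new refresh token; refresh_expires_in is a lifetime in seconds
+        ttl = response.get("refresh_expires_in")  # provider-specific, may be absent
+        await token_storage.store_refresh_token(
+            user_id,
+            response["refresh_token"],
+            expires_at=(int(time.time()) + ttl) if ttl is not None else None
+        )
+        # No separate delete: INSERT OR REPLACE already overwrote the old row (assuming
+        # user_id is the primary key); deleting by user_id here would drop the new token.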
+```
+
+#### 5.3 Audit Logging
+```python
+async def audit_log(
+    event: str,
+    user_id: str,
+    resource_type: str,
+    resource_id: str,
+    auth_method: str
+):
+    """Log sync operations for audit trail"""
+
+    await audit_db.execute(
+        "INSERT INTO audit_logs VALUES (?, ?, ?, ?, ?, ?, ?)",
+        (
+            int(time.time()),
+            event,  # "index_note", "index_file"
+            user_id,
+            resource_type,
+            resource_id,
+            auth_method,
+            socket.gethostname()
+        )
+    )
+```
+
+### 6. Configuration
+
+#### 6.1 Environment Variables
+```bash
+# Tier 1: Offline Access
+ENABLE_OFFLINE_ACCESS=true
+TOKEN_ENCRYPTION_KEY=<base64-encoded-fernet-key>
+TOKEN_STORAGE_DB=/app/data/tokens.db
+
+# Tier 2: Token Exchange (auto-detected)
+# No configuration needed - detected via OIDC discovery
+
+# Tier 3: Admin Fallback
+NEXTCLOUD_USERNAME=sync-bot
+NEXTCLOUD_PASSWORD=<secure-password>
+
+# Vector Database
+QDRANT_URL=http://qdrant:6333
+QDRANT_API_KEY=<api-key>
+
+# Sync Configuration
+SYNC_INTERVAL_SECONDS=300
+SYNC_BATCH_SIZE=100
+```
+
+#### 6.2 Docker Compose
+```yaml
+services:
+  mcp-sync:
+    build: .
+    command: ["python", "-m", "nextcloud_mcp_server.sync_worker"]
+    environment:
+      - NEXTCLOUD_HOST=http://app:80
+      - ENABLE_OFFLINE_ACCESS=true
+      - TOKEN_ENCRYPTION_KEY=${TOKEN_ENCRYPTION_KEY}
+      - QDRANT_URL=http://qdrant:6333
+      - TOKEN_STORAGE_DB=/app/data/tokens.db
+      - NEXTCLOUD_OIDC_CLIENT_ID=${NEXTCLOUD_OIDC_CLIENT_ID}  # OAuth client credentials
+      - NEXTCLOUD_OIDC_CLIENT_SECRET=${NEXTCLOUD_OIDC_CLIENT_SECRET}  # (for token refresh)
+    volumes:
+      - sync-tokens:/app/data
+    depends_on:
+      - app
+      - qdrant
+
+volumes:
+  sync-tokens:  # Persistent storage for encrypted tokens
+```
+
+## Consequences
+
+### Benefits
+
+1. **OAuth-Native Authentication**
+   - Leverages standard OAuth flows (offline_access, token exchange)
+   - No reliance on admin passwords in production
+   - Compatible with enterprise OIDC providers
+
+2. **User-Level Permissions**
+   - Each user's content indexed with their own credentials
+   - Respects sharing, permissions, and access controls
+   - Full audit trail of which user's token was used
+
+3. **Security**
+   - Tokens encrypted at rest
+   - Short-lived access tokens (refreshed as needed)
+   - Token rotation support
+   - Defense in depth with dual-phase authorization
+
+4. **Flexibility**
+   - Automatic capability detection
+   - Graceful degradation through authentication tiers
+   - Works with varying OIDC provider capabilities
+
+5. **Operational**
+   - Background sync independent of user activity
+   - Efficient batch processing
+   - Clear separation of sync vs request credentials
+
+### Limitations
+
+1. **Complexity**
+   - Multiple authentication paths to maintain
+   - Token storage and encryption infrastructure
+   - More moving parts than simple admin auth
+
+2. **User Experience**
+   - `offline_access` scope may require additional consent
+   - Users must authenticate at least once for indexing
+   - New users not automatically indexed
+
+3. **OIDC Provider Dependency**
+   - Token exchange requires RFC 8693 support (rare)
+   - Refresh token rotation varies by provider
+   - Some providers may not support offline_access
+
+4. **Operational Overhead**
+   - Token database maintenance
+   - Monitoring token expiration
+   - Handling revoked tokens gracefully
+
+### Security Considerations
+
+#### Threat Model
+
+**Threat 1: Token Storage Breach**
+- **Mitigation**: Encryption at rest using Fernet
+- **Mitigation**: Secure key management (secrets manager)
+- **Mitigation**: Minimal token lifetime
+- **Detection**: Audit logs for unusual access patterns
+
+**Threat 2: Token Replay**
+- **Mitigation**: Short-lived access tokens (refreshed frequently)
+- **Mitigation**: Token rotation on each refresh
+- **Mitigation**: Revocation support
+
+**Threat 3: Privilege Escalation**
+- **Mitigation**: Dual-phase authorization (vector DB + Nextcloud API)
+- **Mitigation**: Sync worker uses same scopes as user requests
+- **Mitigation**: Per-user token isolation
+
+**Threat 4: Vector Database Poisoning**
+- **Mitigation**: User requests always verify via Nextcloud API
+- **Mitigation**: Vector DB is cache/accelerator, not source of truth
+- **Mitigation**: Sync operations audited per user
+
+#### Security Best Practices
+
+1. **Token Encryption Key Management**
+   ```bash
+   # Generate secure key
+   python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
+
+   # Store in secrets manager (Vault, AWS Secrets Manager, etc.)
+   # Or use environment variable with restricted permissions
+   ```
+
+2. **Token Storage Permissions**
+   ```bash
+   # Restrict database file permissions
+   chmod 600 /app/data/tokens.db
+   chown mcp-server:mcp-server /app/data/tokens.db
+   ```
+
+3. **Token Rotation Schedule**
+   - Refresh access tokens on each sync cycle (every 5 minutes) or at expiry, whichever comes first
+   - Rotate refresh tokens on each use (if the provider supports rotation)
+   - Revoke tokens on user logout/deauthorization
+
+4. **Monitoring and Alerting**
+   - Alert on token refresh failures
+   - Monitor for unusual access patterns
+   - Track token age and rotation
+   - Audit sync operations per user
+
+### Future Enhancements
+
+1. **Token Revocation Handling**
+   - Webhook endpoint for token revocation events
+   - Periodic validation of stored tokens
+   - Graceful handling of revoked tokens
+
+2. **Selective Sync**
+   - Allow users to opt-in/opt-out of indexing
+   - Per-content-type sync preferences
+   - Privacy controls for sensitive content
+
+3. **Multi-Tenant Token Storage**
+   - Separate token databases per tenant
+   - Key rotation per tenant
+   - Tenant isolation
+
+4. **Token Lifecycle Management**
+   - Automatic cleanup of expired tokens
+   - Token usage analytics
+   - Token health dashboard
+
+5. **Alternative OAuth Flows**
+   - Device flow for headless sync
+   - Resource owner password credentials (ROPC) as fallback (discouraged by current OAuth security guidance)
+   - SAML assertion grants
+
+## Alternatives Considered
+
+### Alternative 1: Admin BasicAuth Only
+
+**Approach**: Background worker always uses admin credentials
+
+**Pros**:
+- Simple implementation
+- No token storage complexity
+- Works with any authentication backend
+
+**Cons**:
+- Violates principle of least privilege
+- Single powerful credential
+- No per-user audit trail
+- Bypasses OAuth entirely
+
+**Decision**: Rejected for production use; kept as fallback only
+
+### Alternative 2: Client Credentials Grant Only
+
+**Approach**: Service account with broad read permissions
+
+**Pros**:
+- OAuth-native pattern
+- No user token storage
+- Standard OAuth flow
+
+**Cons**:
+- Requires client_credentials support (may not be available)
+- Still needs broad cross-user permissions
+- Not well-suited for multi-user indexing
+
+**Decision**: Rejected; token exchange is a better fit for the multi-user scenario
+
+### Alternative 3: Per-User Access Token Storage
+
+**Approach**: Store user access tokens (not refresh tokens)
+
+**Pros**:
+- Simpler than refresh token flow
+- No token refresh logic needed
+
+**Cons**:
+- Access tokens are short-lived (typically minutes to hours, depending on the provider)
+- Requires frequent re-authentication
+- Poor user experience
+- Sync gaps when tokens expire
+
+**Decision**: Rejected; refresh tokens provide better UX
+
+### Alternative 4: On-Demand Indexing Only
+
+**Approach**: Index content when user searches (no background worker)
+
+**Pros**:
+- Uses user's request token
+- No background auth needed
+- Simpler architecture
+
+**Cons**:
+- Very slow first search
+- Poor user experience
+- Incomplete index
+- Can't pre-compute embeddings
+
+**Decision**: Rejected; background indexing is essential for semantic search
+
+### Alternative 5: Nextcloud App Tokens
+
+**Approach**: Generate app-specific passwords for each user
+
+**Pros**:
+- Nextcloud-native feature
+- User-controlled revocation
+- Scoped per-application
+
+**Cons**:
+- Requires user interaction to create
+- May not support programmatic creation
+- Still requires secure storage
+- Not standard OAuth
+
+**Decision**: Rejected; not automatable for background worker
+
+## Related Decisions
+
+- ADR-001: Enhanced Note Search (establishes need for vector search)
+- ADR-003: Vector Database and Semantic Search Architecture
+- [Future] ADR-004: Embedding Model Strategy
+
+## References
+
+- [RFC 8693: OAuth 2.0 Token Exchange](https://datatracker.ietf.org/doc/html/rfc8693)
+- [RFC 6749: OAuth 2.0 - Refresh Tokens](https://datatracker.ietf.org/doc/html/rfc6749#section-1.5)
+- [OpenID Connect Core - Offline Access](https://openid.net/specs/openid-connect-core-1_0.html#OfflineAccess)
+- [OWASP: OAuth Security Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/OAuth2_Cheat_Sheet.html)
+- [RFC 8707: Resource Indicators for OAuth 2.0](https://datatracker.ietf.org/doc/html/rfc8707)
diff --git a/docs/ADR-003-vector-database-semantic-search.md b/docs/ADR-003-vector-database-semantic-search.md
new file mode 100644
index 0000000..f1d1bbe
--- /dev/null
+++ b/docs/ADR-003-vector-database-semantic-search.md
@@ -0,0 +1,1116 @@
+# ADR-003: Vector Database and Semantic Search Architecture
+
+## Status
+Proposed
+
+## Context
+
+### Current State
+
+ADR-001 introduced token-based keyword search with relevance ranking, which improved upon simple substring matching. However, this approach still has fundamental limitations:
+
+1. **Lexical Matching Only**: Requires exact word matches (e.g., "automobile" won't match "car")
+2. **No Semantic Understanding**: Cannot understand intent or context (e.g., "how to bake bread" won't match "bread recipe")
+3. **Language Barriers**: Poor support for synonyms, related terms, or multilingual content
+4. **No Cross-Content Search**: Cannot find related content across different apps (notes, files, calendar)
+5. **Scaling Issues**: Performance degrades with large content collections
+
+### User Needs
+
+LLM-powered applications (Claude via MCP) benefit significantly from semantic search capabilities:
+
+- **Context Discovery**: Find relevant information based on meaning, not just keywords
+- **Knowledge Retrieval**: Retrieve contextually relevant notes/files for task completion
+- **Cross-Referencing**: Connect related information across different content types
+- **Natural Language Queries**: Support conversational search patterns
+
+### Technical Requirements
+
+1. **Multi-User Environment**: OAuth-based with per-user isolation and permissions
+2. **Multi-Tenant**: Single deployment serving multiple users with strict data isolation
+3. **Real-Time Search**: Sub-second query latency for good UX
+4. **Large Content**: Support for documents, PDFs, images with text extraction
+5. **Privacy**: No external API calls for sensitive content (optionally self-hosted)
+6. **Hybrid Search**: Combine semantic and keyword search for best results
+
+## Decision
+
+We will implement **semantic search using a vector database** with the following architecture:
+
+### Core Components
+
+1. **Vector Database**: Qdrant as external sidecar service
+2. **Embedding Strategy**: Configurable (OpenAI API / local models / self-hosted)
+3. **Search Pattern**: Hybrid search (semantic + keyword fusion)
+4. **Multi-Tenancy**: Single collection with user_id filtering
+5. **Authorization**: Dual-phase (vector search + Nextcloud API verification)
+6. **Sync Strategy**: Background worker with incremental updates (see ADR-002)
+
+### Architecture Diagram
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                    User Request (OAuth)                     │
+│                    "find notes about baking"                │
+└───────────────────────────┬─────────────────────────────────┘
+                            │
+                            ▼
+┌─────────────────────────────────────────────────────────────┐
+│               MCP Server (Semantic Search Tool)             │
+│                                                             │
+│  1. Generate query embedding                                │
+│  2. Search vector DB (user_id filter)                       │
+│  3. Verify permissions via Nextcloud API                    │
+│  4. Return ranked results                                   │
+└──────────┬─────────────────────────────┬────────────────────┘
+           │                             │
+           ▼                             ▼
+┌──────────────────────┐      ┌───────────────────────────────┐
+│ Embedding Service    │      │ Qdrant Vector Database        │
+│ - OpenAI API         │      │                               │
+│ - Local Model        │      │ Collection: nextcloud_content │
+│ - Self-hosted        │      │ - User-filtered vectors       │
+└──────────────────────┘      │ - Metadata for auth           │
+                              │ - HNSW index                  │
+                              └───────────────────────────────┘
+                                         ▲
+                                         │
+                                         │ Indexing
+                                         │
+                              ┌──────────┴────────────────────┐
+                              │ Background Sync Worker        │
+                              │ (see ADR-002 for auth)        │
+                              │                               │
+                              │ 1. Fetch user content         │
+                              │ 2. Generate embeddings        │
+                              │ 3. Upsert to Qdrant           │
+                              │ 4. Update metadata            │
+                              └───────────────────────────────┘
+```
+
+## Implementation Details
+
+### 1. Vector Database Selection: Qdrant
+
+After evaluating multiple options, we select **Qdrant** for the following reasons:
+
+**Qdrant Advantages:**
+- ✅ Native async Python client (`qdrant-client`)
+- ✅ Efficient multi-tenancy via filtered search (no collection-per-user needed)
+- ✅ Built-in hybrid search support (dense + sparse vectors)
+- ✅ HNSW index with excellent performance
+- ✅ Lightweight Docker deployment
+- ✅ Persistent storage with snapshots
+- ✅ API key authentication
+- ✅ Active development and documentation
+
+**Comparison with Alternatives:**
+
+| Feature | Qdrant | Chroma | Weaviate | pgvector |
+|---------|--------|--------|----------|----------|
+| Async Python | ✅ | ⚠️ Sync | ✅ | ✅ |
+| Multi-tenant filtering | ✅ | ⚠️ Limited | ✅ | ✅ |
+| Hybrid search | ✅ | ❌ | ✅ | ⚠️ Manual |
+| Docker deployment | ✅ Easy | ✅ Easy | ⚠️ Complex | ⚠️ Requires Postgres |
+| Memory usage | ✅ Low | ⚠️ Medium | ⚠️ High | ✅ Low |
+| Maturity | ✅ Production | ⚠️ Young | ✅ Production | ✅ Mature |
+
+**Decision**: Qdrant provides the best balance of features, performance, and ease of deployment.
+
+### 2. Embedding Strategy: Tiered Approach
+
+Support multiple embedding backends with automatic fallback:
+
+```python
+class EmbeddingService:
+    """Unified interface for embedding generation"""
+
+    def __init__(self):
+        self.provider = self._detect_provider()
+
+    def _detect_provider(self) -> EmbeddingProvider:
+        """Auto-detect available embedding provider"""
+
+        # Tier 1: OpenAI API (best quality, requires API key)
+        if os.getenv("OPENAI_API_KEY"):
+            return OpenAIEmbedding(
+                model=os.getenv("OPENAI_EMBEDDING_MODEL", "text-embedding-3-small"),
+                api_key=os.getenv("OPENAI_API_KEY")
+            )
+
+        # Tier 2: Self-hosted embedding service (good quality, privacy-preserving)
+        if os.getenv("EMBEDDING_SERVICE_URL"):
+            return HTTPEmbedding(
+                url=os.getenv("EMBEDDING_SERVICE_URL"),
+                model=os.getenv("EMBEDDING_MODEL", "BAAI/bge-small-en-v1.5")
+            )
+
+        # Tier 3: Local model (fallback, CPU-only)
+        logger.warning("No cloud/hosted embeddings available, using local model")
+        return LocalEmbedding(
+            model=os.getenv("LOCAL_EMBEDDING_MODEL", "all-MiniLM-L6-v2")
+        )
+
+    async def embed(self, text: str) -> list[float]:
+        """Generate embedding vector for text"""
+        return await self.provider.embed(text)
+
+    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
+        """Generate embeddings for multiple texts (optimized)"""
+        return await self.provider.embed_batch(texts)
+```
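+
+The provider classes in this section subclass `EmbeddingProvider`, which this ADR references but does not define. A minimal sketch of what that base class could look like, assuming each provider exposes a `dimension` attribute and that `embed_batch` may fall back to concurrent `embed` calls (both are assumptions, as is the `FakeEmbedding` stand-in used to exercise it):
+
+```python
+import asyncio
+from abc import ABC, abstractmethod
+
+
+class EmbeddingProvider(ABC):
+    """Hypothetical base interface shared by all embedding backends."""
+
+    dimension: int  # Length of the vectors this provider returns
+
+    @abstractmethod
+    async def embed(self, text: str) -> list[float]:
+        """Return the embedding vector for a single text."""
+
+    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
+        """Default batch path: embed each text concurrently, preserving order."""
+        return list(await asyncio.gather(*(self.embed(t) for t in texts)))
+
+
+class FakeEmbedding(EmbeddingProvider):
+    """Illustrative stand-in that needs no model or network access."""
+
+    dimension = 3
+
+    async def embed(self, text: str) -> list[float]:
+        # Toy features: character count, word count, vowel count
+        vowels = sum(c in "aeiou" for c in text)
+        return [float(len(text)), float(len(text.split())), float(vowels)]
+
+
+vectors = asyncio.run(FakeEmbedding().embed_batch(["hello world", "qdrant"]))
+print(vectors)
+```
+
+Providers with a native batch endpoint (such as the OpenAI one) would override `embed_batch`; the default keeps simpler providers usable from `EmbeddingService.embed_batch` without extra code.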
+
+#### 2.1 OpenAI Embeddings (Tier 1)
+
+```python
+class OpenAIEmbedding(EmbeddingProvider):
+    """OpenAI embedding API"""
+
+    def __init__(self, model: str, api_key: str):
+        self.client = AsyncOpenAI(api_key=api_key)
+        self.model = model
+        # Output dimension is model-dependent: 3-small and ada-002 emit 1536, 3-large emits 3072
+        self.dimension = 3072 if "3-large" in model else 1536
+
+    async def embed(self, text: str) -> list[float]:
+        response = await self.client.embeddings.create(
+            model=self.model,
+            input=text
+        )
+        return response.data[0].embedding
+
+    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
+        # OpenAI supports batch up to 2048 inputs
+        response = await self.client.embeddings.create(
+            model=self.model,
+            input=texts
+        )
+        return [item.embedding for item in response.data]
+```
+
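+**Costs**: text-embedding-3-small: $0.02 per 1M tokens (~4M characters)
+- 10,000 notes × 500 words avg ≈ 6.7M tokens ≈ $0.10-$0.15 to index
+- Searches are cheaper still: a query is usually a few dozen tokens, so $0.00002 per query (the price of a 1,000-token input) is a generous upper bound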
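+
+The arithmetic behind these figures can be checked with a short sketch. The price is the one quoted above; the tokens-per-word ratio is an assumed rule of thumb for English text, and the function name is illustrative:
+
+```python
+PRICE_PER_MILLION_TOKENS = 0.02  # USD, text-embedding-3-small (figure quoted above)
+TOKENS_PER_WORD = 4 / 3          # Assumed rule of thumb for English text
+
+
+def embedding_cost(num_texts: int, words_per_text: int) -> float:
+    """Estimated USD cost of embedding num_texts texts of a given average length."""
+    tokens = num_texts * words_per_text * TOKENS_PER_WORD
+    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS
+
+
+index_cost = embedding_cost(num_texts=10_000, words_per_text=500)
+query_cost = embedding_cost(num_texts=1, words_per_text=15)
+print(f"index 10k notes: ${index_cost:.2f}")
+print(f"one 15-word query: ${query_cost:.7f}")
+```
+
+Under these assumptions the indexing estimate lands in the same ballpark as the figure above, and a single short query costs a small fraction of a cent.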
+
+#### 2.2 Self-Hosted Embeddings (Tier 2)
+
+```python
+class HTTPEmbedding(EmbeddingProvider):
+    """Self-hosted embedding service (Infinity, TEI, Ollama)"""
+
+    def __init__(self, url: str, model: str):
+        self.client = httpx.AsyncClient()
+        self.url = url
+        self.model = model
+        self.dimension = 384  # Model-dependent (bge-small: 384, bge-base: 768)
+
+    async def embed(self, text: str) -> list[float]:
+        response = await self.client.post(
+            f"{self.url}/embeddings",
+            json={"input": text, "model": self.model}
+        )
+        response.raise_for_status()
+        return response.json()["data"][0]["embedding"]
+```
+
+**Self-Hosted Options**:
+- **Infinity**: Lightweight, OpenAI-compatible API, GPU support
+- **Text Embeddings Inference (TEI)**: HuggingFace official, optimized, Rust-based
+- **Ollama**: Easy setup, multi-model support, CPU/GPU
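+
+`HTTPEmbedding.embed` reads `data[0].embedding`, which implies an OpenAI-style response body. A batch variant would need to map a multi-item response back to the input order. A minimal sketch of that parsing step, assuming the service returns `{"data": [{"index": i, "embedding": [...]}, ...]}`; the helper name is hypothetical, and whether a given service follows this shape should be verified against its documentation:
+
+```python
+def parse_embeddings_response(payload: dict, expected: int) -> list[list[float]]:
+    """Extract vectors from an OpenAI-style embeddings response, in input order."""
+    items = payload["data"]
+    if len(items) != expected:
+        raise ValueError(f"expected {expected} embeddings, got {len(items)}")
+    # Sort by the per-item index so vectors line up with the submitted texts
+    ordered = sorted(items, key=lambda item: item["index"])
+    return [item["embedding"] for item in ordered]
+
+
+sample = {
+    "data": [
+        {"index": 1, "embedding": [0.3, 0.4]},
+        {"index": 0, "embedding": [0.1, 0.2]},
+    ]
+}
+vectors = parse_embeddings_response(sample, expected=2)
+print(vectors)
+```
+
+Checking the item count up front turns a silently truncated batch into an explicit error rather than a misaligned index.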
+
+#### 2.3 Local Embeddings (Tier 3)
+
+```python
+class LocalEmbedding(EmbeddingProvider):
+    """Local embedding using sentence-transformers (CPU fallback)"""
+
+    def __init__(self, model: str):
+        from sentence_transformers import SentenceTransformer
+        self.model = SentenceTransformer(model)
+        self.dimension = self.model.get_sentence_embedding_dimension()
+
+    async def embed(self, text: str) -> list[float]:
+        # Run in the default thread pool so encoding does not block the event loop
+        loop = asyncio.get_running_loop()
+        embedding = await loop.run_in_executor(
+            None,
+            self.model.encode,
+            text
+        )
+        return embedding.tolist()
+```
+
+**Recommended Local Models**:
+- `all-MiniLM-L6-v2`: 384 dims, fast, good quality
+- `all-mpnet-base-v2`: 768 dims, slower, better quality
+- `paraphrase-multilingual-MiniLM-L12-v2`: Multilingual support
+
+### 3. Vector Database Schema
+
+```python
+# Qdrant collection configuration
+collection_config = {
+    "collection_name": "nextcloud_content",
+    "vectors_config": {
+        "size": 384,  # Embedding dimension (model-dependent)
+        "distance": "Cosine"  # Cosine similarity for semantic search
+    },
+    "optimizers_config": {
+        "indexing_threshold": 10000  # Segment size (in KB) above which the HNSW index is built
+    },
+    "hnsw_config": {
+        "m": 16,  # Number of edges per node (balance speed/accuracy)
+        "ef_construct": 100  # Quality of index construction
+    }
+}
+
+# Payload schema (metadata)
+payload_schema = {
+    "user_id": str,           # Required: owner of content
+    "content_type": str,      # "note", "file", "calendar_event"
+    "content_id": str,        # Source ID (note_id, file_path, event_id)
+    "title": str,             # Searchable title
+    "excerpt": str,           # First 200 chars for preview
+    "category": str,          # Optional: category/folder
+    "mime_type": str,         # Optional: file MIME type
+    "shared_with": list[str], # Optional: list of user_ids with access
+    "tags": list[str],        # Optional: user tags
+    "created_at": int,        # Unix timestamp
+    "modified_at": int,       # Unix timestamp
+    "indexed_at": int         # Unix timestamp (for sync tracking)
+}
+```
+
+#### 3.1 Multi-Tenancy via Filtering
+
+```python
+# User-specific search with filtering
+search_results = await qdrant_client.search(
+    collection_name="nextcloud_content",
+    query_vector=query_embedding,
+    query_filter=models.Filter(
+        must=[
+            # User owns the content OR it's shared with them
+            models.Filter(
+                should=[
+                    models.FieldCondition(
+                        key="user_id",
+                        match=models.MatchValue(value=current_user_id)
+                    ),
+                    models.FieldCondition(
+                        key="shared_with",
+                        match=models.MatchAny(any=[current_user_id])
+                    )
+                ]
+            ),
+            # Optional: filter by content type
+            models.FieldCondition(
+                key="content_type",
+                match=models.MatchValue(value="note")
+            )
+        ]
+    ),
+    limit=20,
+    score_threshold=0.7  # Only return confident matches
+)
+```
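+
+The filter's boolean structure is (owner OR shared-with) AND optional content type. A pure-Python mirror of that predicate can be handy for unit-testing the access rule without a running Qdrant instance. This is a sketch only; the function name is hypothetical and it is not part of the Qdrant API:
+
+```python
+from typing import Optional
+
+
+def can_see(payload: dict, user_id: str, content_type: Optional[str] = None) -> bool:
+    """Mirror of the search filter: (owner OR shared_with) AND optional type."""
+    owns = payload.get("user_id") == user_id
+    shared = user_id in payload.get("shared_with", [])
+    type_ok = content_type is None or payload.get("content_type") == content_type
+    return (owns or shared) and type_ok
+
+
+note = {"user_id": "alice", "shared_with": ["bob"], "content_type": "note"}
+print(can_see(note, "alice", "note"))  # owner
+print(can_see(note, "bob", "note"))    # shared with bob
+print(can_see(note, "carol", "note"))  # neither owner nor shared
+print(can_see(note, "bob", "file"))    # access is fine, but wrong content type
+```
+
+Because the vector index is a cache, this check only narrows candidates; the Nextcloud API remains the final authority on access.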
+
+### 4. Hybrid Search Implementation
+
+Combine semantic and keyword search for best results:
+
+```python
+@mcp.tool()
+@require_scopes("notes:read")
+async def nc_notes_hybrid_search(
+    query: str,
+    ctx: Context,
+    limit: int = 10,
+    semantic_weight: float = 0.7,
+    keyword_weight: float = 0.3
+) -> SearchNotesResponse:
+    """
+    Hybrid search combining semantic understanding with keyword precision.
+
+    Args:
+        query: Natural language search query
+        limit: Maximum results to return
+        semantic_weight: Weight for semantic similarity (0-1)
+        keyword_weight: Weight for keyword matching (0-1)
+    """
+
+    client = get_client(ctx)
+    username = client.username
+
+    # Run searches in parallel
+    semantic_task = asyncio.create_task(
+        semantic_search(query, username, limit=limit * 2)
+    )
+    keyword_task = asyncio.create_task(
+        keyword_search(query, username, limit=limit * 2)
+    )
+
+    semantic_results, keyword_results = await asyncio.gather(
+        semantic_task, keyword_task
+    )
+
+    # Fusion: Combine and rerank results
+    fused_results = reciprocal_rank_fusion(
+        semantic_results,
+        keyword_results,
+        semantic_weight=semantic_weight,
+        keyword_weight=keyword_weight
+    )
+
+    # Verify permissions via Nextcloud API (dual-phase authorization)
+    verified_results = []
+    for result in fused_results[:limit * 2]:  # Get extra for filtering
+        try:
+            note = await client.notes.get_note(result["note_id"])
+            verified_results.append({
+                "note": note,
+                "score": result["score"],
+                "match_type": result["match_type"]  # "semantic", "keyword", or "semantic+keyword"
+            })
+            if len(verified_results) >= limit:
+                break
+        except HTTPStatusError as e:
+            if e.response.status_code in (403, 404):
+                continue  # User lost access, or the note was deleted after indexing
+            raise
+
+    return SearchNotesResponse(
+        results=verified_results,
+        query=query,
+        total_found=len(verified_results),
+        search_method="hybrid"
+    )
+
+def reciprocal_rank_fusion(
+    semantic_results: list[dict],
+    keyword_results: list[dict],
+    semantic_weight: float = 0.7,
+    keyword_weight: float = 0.3,
+    k: int = 60  # RRF constant
+) -> list[dict]:
+    """
+    Reciprocal Rank Fusion for combining search results.
+
+    RRF is more robust than score normalization because it only
+    depends on ranks, not absolute scores.
+    """
+
+    # Build rank maps (RRF ranks are 1-based: the top result has rank 1)
+    semantic_ranks = {r["note_id"]: i for i, r in enumerate(semantic_results, start=1)}
+    keyword_ranks = {r["note_id"]: i for i, r in enumerate(keyword_results, start=1)}
+
+    # Get all unique note IDs
+    all_note_ids = set(semantic_ranks.keys()) | set(keyword_ranks.keys())
+
+    # Calculate fused scores
+    fused = []
+    for note_id in all_note_ids:
+        # RRF formula: score = sum(weight_i / (k + rank_i))
+        semantic_score = 0
+        keyword_score = 0
+        match_type = []
+
+        if note_id in semantic_ranks:
+            semantic_score = semantic_weight / (k + semantic_ranks[note_id])
+            match_type.append("semantic")
+
+        if note_id in keyword_ranks:
+            keyword_score = keyword_weight / (k + keyword_ranks[note_id])
+            match_type.append("keyword")
+
+        fused.append({
+            "note_id": note_id,
+            "score": semantic_score + keyword_score,
+            "match_type": "+".join(match_type)
+        })
+
+    # Sort by fused score
+    fused.sort(key=lambda x: x["score"], reverse=True)
+    return fused
+```
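+
+A small worked example makes the fusion behaviour concrete. The sketch below is a standalone, simplified version of weighted RRF (1-based ranks, `k = 60`); the function name and the note IDs are illustrative only:
+
+```python
+def rrf_scores(
+    rankings: dict[str, list[str]],
+    weights: dict[str, float],
+    k: int = 60,
+) -> dict[str, float]:
+    """Weighted RRF: score(d) = sum over lists of weight / (k + rank), rank from 1."""
+    scores: dict[str, float] = {}
+    for name, ranked_ids in rankings.items():
+        for rank, doc_id in enumerate(ranked_ids, start=1):
+            scores[doc_id] = scores.get(doc_id, 0.0) + weights[name] / (k + rank)
+    return scores
+
+
+rankings = {
+    "semantic": ["n1", "n2", "n3"],
+    "keyword": ["n4", "n2"],
+}
+scores = rrf_scores(rankings, weights={"semantic": 0.7, "keyword": 0.3})
+ordered = sorted(scores, key=scores.get, reverse=True)
+print(ordered)
+```
+
+Here `n2` is second in both lists and scores 0.7/62 + 0.3/62 = 1/62 ≈ 0.0161, ahead of `n1` at 0.7/61 ≈ 0.0115, so a note found by both methods can outrank the top hit of either one.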
+
+### 5. Document Chunking Strategy
+
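+For large documents, split the content into overlapping fixed-size chunks (512 tokens with a 50-token overlap by default; whitespace-separated words stand in for tokens in this sketch):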
+
+```python
+class DocumentChunker:
+    """Chunk large documents for optimal embedding"""
+
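+    def __init__(self, chunk_size: int = 512, overlap: int = 50):
+        if not 0 <= overlap < chunk_size:
+            raise ValueError("overlap must be non-negative and smaller than chunk_size")
+        self.chunk_size = chunk_size  # tokens
+        self.overlap = overlap  # overlapping tokens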
+
+    def chunk_document(
+        self,
+        content: str,
+        metadata: dict
+    ) -> list[tuple[str, dict]]:
+        """
+        Split document into overlapping chunks with metadata.
+
+        Returns list of (chunk_text, chunk_metadata) tuples.
+        """
+
+        # Tokenize (approximate with words for simplicity)
+        tokens = content.split()
+
+        if len(tokens) <= self.chunk_size:
+            # Document fits in single chunk
+            return [(content, metadata)]
+
+        chunks = []
+        start = 0
+
+        while start < len(tokens):
+            end = min(start + self.chunk_size, len(tokens))
+            chunk_tokens = tokens[start:end]
+            chunk_text = " ".join(chunk_tokens)
+
+            # Add chunk metadata
+            chunk_metadata = {
+                **metadata,
+                "chunk_index": len(chunks),
+                "chunk_start": start,
+                "chunk_end": end,
+                "is_chunk": True
+            }
+
+            chunks.append((chunk_text, chunk_metadata))
+
+            if end == len(tokens):
+                break  # Last chunk reached; avoid emitting an overlap-only tail
+
+            # Move to next chunk with overlap
+            start = end - self.overlap
+
+        return chunks
+
+# Usage in sync worker
+async def index_document(doc: Document, user_id: str):
+    """Index a document with chunking"""
+
+    chunker = DocumentChunker(chunk_size=512, overlap=50)
+    chunks = chunker.chunk_document(
+        content=doc.content,
+        metadata={
+            "user_id": user_id,
+            "content_type": "file",
+            "content_id": doc.path,
+            "title": doc.title,
+            "mime_type": doc.mime_type
+        }
+    )
+
+    # Generate embeddings in batch
+    chunk_texts = [chunk[0] for chunk in chunks]
+    embeddings = await embedding_service.embed_batch(chunk_texts)
+
+    # Upsert all chunks
+    points = []
+    for (chunk_text, chunk_metadata), embedding in zip(chunks, embeddings):
+        points.append(
+            models.PointStruct(
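+                # Deterministic ID so re-indexing overwrites chunks instead of duplicating them
+                id=str(uuid.uuid5(
+                    uuid.NAMESPACE_URL,
+                    f"{user_id}:{doc.path}:{chunk_metadata.get('chunk_index', 0)}"
+                )),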
+                vector=embedding,
+                payload={
+                    **chunk_metadata,
+                    "excerpt": chunk_text[:200]  # Preview
+                }
+            )
+        )
+
+    await qdrant_client.upsert(
+        collection_name="nextcloud_content",
+        points=points
+    )
+```
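+
+To see where the chunk boundaries fall, the offset arithmetic can be isolated from the text handling. A minimal sketch, assuming the same defaults (512-token chunks, 50-token overlap) and a hypothetical helper name:
+
+```python
+def chunk_spans(n_tokens: int, chunk_size: int = 512, overlap: int = 50) -> list[tuple[int, int]]:
+    """Return (start, end) token offsets for overlapping fixed-size chunks."""
+    if not 0 <= overlap < chunk_size:
+        raise ValueError("overlap must be non-negative and smaller than chunk_size")
+    spans = []
+    start = 0
+    while start < n_tokens:
+        end = min(start + chunk_size, n_tokens)
+        spans.append((start, end))
+        if end == n_tokens:
+            break  # Final chunk; stop so no overlap-only tail is produced
+        start = end - overlap  # Step forward, re-reading `overlap` tokens
+    return spans
+
+
+print(chunk_spans(1200))
+```
+
+For a 1,200-token document this yields three spans, each starting 462 tokens after the previous one (512 minus the 50-token overlap), with the last span clamped to the document length.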
+
+### 6. Background Sync Worker
+
+```python
+# nextcloud_mcp_server/sync/vector_indexer.py
+class VectorIndexer:
+    """Indexes content into vector database"""
+
+    def __init__(
+        self,
+        qdrant_client: AsyncQdrantClient,
+        embedding_service: EmbeddingService,
+        auth_provider: SyncAuthProvider  # From ADR-002
+    ):
+        self.qdrant = qdrant_client
+        self.embeddings = embedding_service
+        self.auth = auth_provider
+
+    async def sync_user_notes(self, user_id: str):
+        """Sync all notes for a user"""
+
+        # Get authenticated client for user
+        client = await self.auth.get_user_client(user_id)
+
+        # Fetch all notes
+        notes = await client.notes.list_notes()
+        logger.info(f"Syncing {len(notes)} notes for {user_id}")
+
+        # Check which notes need updating (the index map is keyed by content_id, i.e. str(note.id))
+        existing_ids = await self._get_indexed_note_ids(user_id)
+        notes_to_update = [
+            n for n in notes
+            if str(n.id) not in existing_ids
+            or n.modified > existing_ids[str(n.id)]
+        ]
+
+        if not notes_to_update:
+            logger.info(f"All notes up-to-date for {user_id}")
+            return
+
+        # Generate embeddings in batch
+        contents = [f"{n.title}\n\n{n.content}" for n in notes_to_update]
+        embeddings = await self.embeddings.embed_batch(contents)
+
+        # Prepare points for upsert
+        points = []
+        for note, embedding in zip(notes_to_update, embeddings):
+            points.append(
+                models.PointStruct(
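+                    # Qdrant point IDs must be unsigned integers or UUIDs, so derive a stable UUID
+                    id=str(uuid.uuid5(uuid.NAMESPACE_URL, f"note:{user_id}:{note.id}")),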
+                    vector=embedding,
+                    payload={
+                        "user_id": user_id,
+                        "content_type": "note",
+                        "content_id": str(note.id),
+                        "note_id": note.id,
+                        "title": note.title,
+                        "excerpt": note.content[:200],
+                        "category": note.category,
+                        "created_at": note.created,
+                        "modified_at": note.modified,
+                        "indexed_at": int(time.time())
+                    }
+                )
+            )
+
+        # Upsert to Qdrant
+        await self.qdrant.upsert(
+            collection_name="nextcloud_content",
+            points=points
+        )
+
+        logger.info(f"Indexed {len(points)} notes for {user_id}")
+
+    async def _get_indexed_note_ids(self, user_id: str) -> dict[str, int]:
+        """Get map of content_id -> modified_at for indexed notes"""
+
+        note_filter = models.Filter(
+            must=[
+                models.FieldCondition(
+                    key="user_id",
+                    match=models.MatchValue(value=user_id)
+                ),
+                models.FieldCondition(
+                    key="content_type",
+                    match=models.MatchValue(value="note")
+                )
+            ]
+        )
+
+        # scroll() returns (points, next_offset); page until next_offset is None
+        indexed: dict[str, int] = {}
+        offset = None
+        while True:
+            points, offset = await self.qdrant.scroll(
+                collection_name="nextcloud_content",
+                scroll_filter=note_filter,
+                with_payload=["content_id", "modified_at"],
+                limit=1000,
+                offset=offset
+            )
+            for point in points:
+                indexed[point.payload["content_id"]] = point.payload["modified_at"]
+            if offset is None:
+                return indexed
+
+    async def delete_note(self, user_id: str, note_id: int):
+        """Remove deleted note from index"""
+
+        await self.qdrant.delete(
+            collection_name="nextcloud_content",
+            points_selector=models.FilterSelector(
+                filter=models.Filter(
+                    must=[
+                        models.FieldCondition(
+                            key="user_id",
+                            match=models.MatchValue(value=user_id)
+                        ),
+                        models.FieldCondition(
+                            key="note_id",
+                            match=models.MatchValue(value=note_id)
+                        )
+                    ]
+                )
+            )
+        )
+```
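+
+`sync_user_notes` handles new and modified notes, while `delete_note` has to be invoked separately for notes that no longer exist. Both sets can be derived from the same comparison. A minimal sketch, assuming the source state and the index state are each available as a `content_id -> modified_at` map (the function name is hypothetical):
+
+```python
+def plan_sync(current: dict[str, int], indexed: dict[str, int]) -> tuple[list[str], list[str]]:
+    """Compare source and index state, both as content_id -> modified_at maps.
+
+    Returns (ids_to_upsert, ids_to_delete), each sorted for stable output.
+    """
+    to_upsert = sorted(
+        cid for cid, modified in current.items()
+        if cid not in indexed or modified > indexed[cid]
+    )
+    to_delete = sorted(cid for cid in indexed if cid not in current)
+    return to_upsert, to_delete
+
+
+current = {"1": 100, "2": 250, "4": 300}  # Notes Nextcloud reports now
+indexed = {"1": 100, "2": 200, "3": 150}  # What the index currently holds
+to_upsert, to_delete = plan_sync(current, indexed)
+print(to_upsert, to_delete)
+```
+
+In this example note `2` was modified and note `4` is new, so both are re-embedded, while note `3` has disappeared from Nextcloud and would be passed to `delete_note`.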
+
+### 7. Configuration
+
+#### 7.1 Environment Variables
+```bash
+# Vector Database
+QDRANT_URL=http://qdrant:6333
+QDRANT_API_KEY=<secure-api-key>
+QDRANT_COLLECTION=nextcloud_content
+
+# Embedding Strategy (choose one)
+# Option 1: OpenAI
+OPENAI_API_KEY=sk-...
+OPENAI_EMBEDDING_MODEL=text-embedding-3-small  # or text-embedding-3-large
+
+# Option 2: Self-hosted
+EMBEDDING_SERVICE_URL=http://embeddings:7997
+EMBEDDING_MODEL=BAAI/bge-small-en-v1.5
+
+# Option 3: Local (fallback, no config needed)
+
+# Search Configuration
+SEMANTIC_SEARCH_ENABLED=true
+HYBRID_SEARCH_DEFAULT_SEMANTIC_WEIGHT=0.7
+HYBRID_SEARCH_DEFAULT_KEYWORD_WEIGHT=0.3
+SEARCH_SCORE_THRESHOLD=0.7
+
+# Sync Configuration
+VECTOR_SYNC_INTERVAL=300  # seconds
+VECTOR_SYNC_BATCH_SIZE=100
+```
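+
+Environment variables arrive as strings, so the server and the sync worker both need to parse and validate them at startup. A minimal sketch of such a loader, assuming the variable names above and using the example values as defaults; the `VectorSearchSettings` class and `load_settings` function are hypothetical names:
+
+```python
+from dataclasses import dataclass
+from typing import Mapping
+
+
+@dataclass(frozen=True)
+class VectorSearchSettings:
+    qdrant_url: str
+    collection: str
+    semantic_weight: float
+    keyword_weight: float
+    score_threshold: float
+    sync_interval_seconds: int
+    sync_batch_size: int
+
+
+def load_settings(env: Mapping[str, str]) -> VectorSearchSettings:
+    """Parse the variables above, falling back to the documented example values."""
+    settings = VectorSearchSettings(
+        qdrant_url=env.get("QDRANT_URL", "http://qdrant:6333"),
+        collection=env.get("QDRANT_COLLECTION", "nextcloud_content"),
+        semantic_weight=float(env.get("HYBRID_SEARCH_DEFAULT_SEMANTIC_WEIGHT", "0.7")),
+        keyword_weight=float(env.get("HYBRID_SEARCH_DEFAULT_KEYWORD_WEIGHT", "0.3")),
+        score_threshold=float(env.get("SEARCH_SCORE_THRESHOLD", "0.7")),
+        sync_interval_seconds=int(env.get("VECTOR_SYNC_INTERVAL", "300")),
+        sync_batch_size=int(env.get("VECTOR_SYNC_BATCH_SIZE", "100")),
+    )
+    if not 0.0 <= settings.score_threshold <= 1.0:
+        raise ValueError("SEARCH_SCORE_THRESHOLD must be between 0 and 1")
+    if settings.sync_interval_seconds <= 0 or settings.sync_batch_size <= 0:
+        raise ValueError("sync interval and batch size must be positive")
+    return settings
+
+
+settings = load_settings({"VECTOR_SYNC_INTERVAL": "60"})
+print(settings)
+```
+
+In a real process the argument would be `os.environ`; accepting any mapping keeps the loader easy to test.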
+
+#### 7.2 Docker Compose
+
+```yaml
+services:
+  # Vector Database
+  qdrant:
+    image: qdrant/qdrant:latest
+    restart: always
+    ports:
+      - 127.0.0.1:6333:6333  # REST API
+      - 127.0.0.1:6334:6334  # gRPC
+    volumes:
+      - qdrant_storage:/qdrant/storage
+    environment:
+      - QDRANT__SERVICE__API_KEY=${QDRANT_API_KEY}
+      - QDRANT__SERVICE__HTTP_PORT=6333
+      - QDRANT__SERVICE__GRPC_PORT=6334
+
+  # Embedding Service (optional - for self-hosted)
+  embeddings:
+    image: michaelf34/infinity:latest
+    restart: always
+    ports:
+      - 127.0.0.1:7997:7997
+    volumes:
+      - embedding_models:/app/.cache
+    environment:
+      - MODEL_ID=BAAI/bge-small-en-v1.5
+      - BATCH_SIZE=32
+      - ENGINE=torch  # or optimum for better CPU performance
+    # Optional: GPU support
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: 1
+              capabilities: [gpu]
+
+  # MCP Server with vector search
+  mcp:
+    build: .
+    command: ["--transport", "streamable-http"]
+    depends_on:
+      - app
+      - qdrant
+      - embeddings  # optional
+    environment:
+      # ... existing env vars ...
+      - SEMANTIC_SEARCH_ENABLED=true
+      - QDRANT_URL=http://qdrant:6333
+      - QDRANT_API_KEY=${QDRANT_API_KEY}
+      # Choose embedding strategy
+      - EMBEDDING_SERVICE_URL=http://embeddings:7997
+      # OR
+      # - OPENAI_API_KEY=${OPENAI_API_KEY}
+
+  # Vector Sync Worker
+  mcp-vector-sync:
+    build: .
+    command: ["python", "-m", "nextcloud_mcp_server.sync.vector_indexer"]
+    depends_on:
+      - app
+      - qdrant
+      - embeddings  # optional
+    environment:
+      # Nextcloud + Auth (from ADR-002)
+      - NEXTCLOUD_HOST=http://app:80
+      - ENABLE_OFFLINE_ACCESS=true
+      - TOKEN_ENCRYPTION_KEY=${TOKEN_ENCRYPTION_KEY}
+      # Vector Database
+      - QDRANT_URL=http://qdrant:6333
+      - QDRANT_API_KEY=${QDRANT_API_KEY}
+      # Embeddings
+      - EMBEDDING_SERVICE_URL=http://embeddings:7997
+    volumes:
+      - sync-tokens:/app/data
+
+volumes:
+  qdrant_storage:
+  embedding_models:
+  sync-tokens:
+```
+
+### 8. Performance Optimization
+
+#### 8.1 Indexing Performance
+
+```python
+# Batch embedding generation
+async def embed_batch_chunked(
+    texts: list[str],
+    batch_size: int = 100
+) -> list[list[float]]:
+    """Generate embeddings in chunks to avoid memory issues"""
+
+    embeddings = []
+    for i in range(0, len(texts), batch_size):
+        batch = texts[i:i + batch_size]
+        batch_embeddings = await embedding_service.embed_batch(batch)
+        embeddings.extend(batch_embeddings)
+        await asyncio.sleep(0.1)  # Rate limiting
+
+    return embeddings
+
+# Sequential upsert in batches
+async def upsert_points_batched(
+    points: list[models.PointStruct],
+    batch_size: int = 100
+):
+    """Upsert points in batches"""
+
+    for i in range(0, len(points), batch_size):
+        batch = points[i:i + batch_size]
+        await qdrant_client.upsert(
+            collection_name="nextcloud_content",
+            points=batch,
+            wait=False  # Return once the request is acknowledged, without waiting for it to be applied
+        )
+```
+
+#### 8.2 Search Performance
+
+```python
+# Search with tuned HNSW parameters for better accuracy
+search_results = await qdrant_client.search(
+    collection_name="nextcloud_content",
+    query_vector=query_embedding,
+    query_filter=user_filter,
+    limit=20,
+    with_payload=True,
+    with_vectors=False,  # Don't return vectors (saves bandwidth)
+    search_params=models.SearchParams(
+        hnsw_ef=128,  # Higher = more accurate but slower
+        exact=False   # Use HNSW index
+    )
+)
+```
+
+#### 8.3 Caching
+
+```python
+# Cache embeddings for common queries
+import hashlib
+import json
+
+def cache_key(text: str) -> str:
+    """Stable Redis key component derived from the query text"""
+    return hashlib.sha256(text.encode()).hexdigest()
+
+async def embed_with_cache(text: str) -> list[float]:
+    """Generate embedding with caching"""
+
+    key = cache_key(text)
+
+    # Check Redis cache
+    cached = await redis.get(f"embedding:{key}")
+    if cached:
+        return json.loads(cached)
+
+    # Generate embedding
+    embedding = await embedding_service.embed(text)
+
+    # Cache for 1 hour
+    await redis.setex(
+        f"embedding:{key}",
+        3600,
+        json.dumps(embedding)
+    )
+
+    return embedding
+```
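+
+Vectors from different embedding models are not interchangeable, so a cache keyed on text alone could serve stale vectors after a model change. One option is to fold the model name into the key. A minimal sketch, with a dict-backed stand-in for Redis; both the key layout and the class are assumptions for illustration:
+
+```python
+import hashlib
+from typing import Optional
+
+
+def embedding_cache_key(model: str, text: str) -> str:
+    """Key that changes when either the model or the text changes."""
+    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
+    return f"embedding:{model}:{digest}"
+
+
+class InMemoryEmbeddingCache:
+    """Dict-backed stand-in for Redis, for illustration only (no TTL)."""
+
+    def __init__(self) -> None:
+        self._store: dict[str, list[float]] = {}
+
+    def get(self, model: str, text: str) -> Optional[list[float]]:
+        return self._store.get(embedding_cache_key(model, text))
+
+    def put(self, model: str, text: str, vector: list[float]) -> None:
+        self._store[embedding_cache_key(model, text)] = vector
+
+
+cache = InMemoryEmbeddingCache()
+cache.put("all-MiniLM-L6-v2", "quarterly budget", [0.1, 0.2])
+hit = cache.get("all-MiniLM-L6-v2", "quarterly budget")
+miss = cache.get("BAAI/bge-small-en-v1.5", "quarterly budget")
+print(hit, miss)
+```
+
+The same text under a different model name misses the cache, which forces a fresh embedding instead of mixing vector spaces.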
+
+### 9. Monitoring and Metrics
+
+```python
+# Prometheus metrics
+from prometheus_client import Counter, Histogram, Gauge
+
+# Search metrics
+semantic_search_count = Counter(
+    'semantic_search_total',
+    'Total semantic searches',
+    ['user_id', 'content_type']
+)
+
+semantic_search_latency = Histogram(
+    'semantic_search_duration_seconds',
+    'Semantic search latency',
+    ['phase']  # 'embedding', 'vector_search', 'verification'
+)
+
+# Indexing metrics
+documents_indexed = Counter(
+    'documents_indexed_total',
+    'Total documents indexed',
+    ['user_id', 'content_type']
+)
+
+index_queue_size = Gauge(
+    'index_queue_size',
+    'Number of documents waiting to be indexed'
+)
+
+# Usage
+async def semantic_search(query: str, user_id: str):
+    semantic_search_count.labels(user_id=user_id, content_type='note').inc()
+
+    with semantic_search_latency.labels(phase='embedding').time():
+        embedding = await embed(query)
+
+    with semantic_search_latency.labels(phase='vector_search').time():
+        results = await qdrant.search(...)
+
+    with semantic_search_latency.labels(phase='verification').time():
+        verified = await verify_access(results)
+
+    return verified
+```
+
+## Consequences
+
+### Benefits
+
+1. **Semantic Understanding**
+   - Find content by meaning, not just keywords
+   - Support for natural language queries
+   - Cross-lingual search potential
+   - Better context discovery for LLMs
+
+2. **User Experience**
+   - More relevant search results
+   - Discover related content across apps
+   - Fast sub-second query latency
+   - Hybrid search combines best of both worlds
+
+3. **Architecture**
+   - External sidecar (doesn't bloat MCP server)
+   - Configurable embedding backend (cloud/self-hosted/local)
+   - Multi-tenant with strict isolation
+   - Scales horizontally (Qdrant cluster)
+
+4. **Privacy & Security**
+   - Self-hosted option available
+   - Dual-phase authorization enforces permissions
+   - Vector DB is cache, not source of truth
+   - Per-user audit trail
+
+5. **Developer Experience**
+   - Simple async Python API
+   - Comprehensive monitoring
+   - Clear upgrade path (better embeddings, reranking)
+
+### Limitations
+
+1. **Complexity**
+   - Additional infrastructure (Qdrant + embeddings)
+   - More monitoring required
+   - Embedding generation latency
+   - Initial indexing time for large collections
+
+2. **Cost**
+   - Storage: ~4KB per document (embedding + metadata)
+   - Compute: Embedding generation (API costs or GPU)
+   - Memory: Qdrant keeps vectors in RAM for speed
+
+3. **Operational**
+   - Index maintenance and updates
+   - Embedding model versioning
+   - Handling deleted/moved content
+   - Cold start indexing for new users
+
+4. **Search Accuracy**
+   - Quality depends on embedding model
+   - May miss exact keyword matches (mitigated by hybrid search)
+   - Cultural/domain-specific terms may not embed well
+   - Requires tuning score thresholds
+
+### Performance Characteristics
+
+| Metric | Target | Notes |
+|--------|--------|-------|
+| Search latency | <200ms | Embedding + vector search + verification |
+| Indexing throughput | >100 docs/sec | With batch embeddings |
+| Memory per 10k docs | ~40MB | Qdrant vectors + metadata |
+| Disk per 10k docs | ~40MB | Persistent storage |
+| Search accuracy | >90% | At score_threshold=0.7 |
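+
+The memory and disk targets depend heavily on the embedding dimension. A rough sketch of the raw vector footprint, assuming unquantized float32 vectors and one vector per document; HNSW links and payload add overhead on top, so these numbers are a lower bound:
+
+```python
+BYTES_PER_FLOAT32 = 4
+
+
+def vector_memory_mb(num_vectors: int, dimension: int) -> float:
+    """Raw float32 vector storage in MB (10^6 bytes), excluding index and payload."""
+    return num_vectors * dimension * BYTES_PER_FLOAT32 / 1_000_000
+
+
+for dim in (384, 768, 1536):
+    print(dim, vector_memory_mb(10_000, dim))
+```
+
+For 10,000 vectors this gives about 15 MB at 384 dimensions and about 61 MB at 1536, so the ~40MB target sits between the local-model and OpenAI cases.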
+
+### Cost Estimates
+
+**Small Deployment** (10 users, 1000 notes each):
+- Initial indexing: 10,000 notes × $0.00002 = $0.20 (OpenAI)
+- Monthly searches: 1000 queries × $0.00002 = $0.02
+- Infrastructure: Qdrant (40MB RAM), Embeddings (optional)
+- **Total**: ~$0.22 in the first month (indexing is a one-time cost), then ~$0.02/month via the API; negligible if self-hosted
+
+**Medium Deployment** (100 users, 500 notes each):
+- Initial indexing: 50,000 notes × $0.00002 = $1.00
+- Monthly searches: 10,000 queries × $0.00002 = $0.20
+- Infrastructure: Qdrant (200MB RAM)
+- **Total**: ~$1.20 in the first month (indexing is a one-time cost), then ~$0.20/month, or self-hosted
+
+**Self-Hosted** (any size):
+- GPU instance: ~$0.50/hour (~$360/month for 24/7)
+- Or CPU-only: negligible cost, slower embeddings
+
+### Future Enhancements
+
+1. **Multimodal Search**
+   - Image embeddings (CLIP)
+   - PDF/document layout understanding
+   - Audio transcription + embedding
+
+2. **Advanced Ranking**
+   - Cross-encoder reranking
+   - Learning-to-rank models
+   - User feedback signals
+
+3. **Query Understanding**
+   - Query expansion
+   - Spell correction
+   - Entity extraction
+
+4. **Performance**
+   - Query result caching
+   - Approximate nearest neighbor improvements
+   - Quantization for reduced memory
+
+5. **Features**
+   - Saved searches
+   - Search analytics
+   - Recommended content
+
+## Alternatives Considered
+
+### Alternative 1: Elasticsearch/OpenSearch
+
+**Approach**: Use traditional full-text search engine with vector plugin
+
+**Pros**:
+- Mature ecosystem
+- Excellent keyword search
+- Rich query DSL
+
+**Cons**:
+- Heavy infrastructure (JVM-based)
+- Complex setup and tuning
+- Vector search is plugin/add-on (not native)
+- Higher resource usage
+
+**Decision**: Rejected; Qdrant is purpose-built for vectors
+
+### Alternative 2: ChromaDB
+
+**Approach**: Embedded or client-server vector database
+
+**Pros**:
+- Simple Python API
+- Easy to get started
+- Good for prototyping
+
+**Cons**:
+- Sync-only Python client (no async)
+- Limited multi-tenancy features
+- Less mature than Qdrant
+- Scaling concerns
+
+**Decision**: Rejected; async and multi-tenancy are critical
+
+### Alternative 3: Weaviate
+
+**Approach**: Full-featured vector database with GraphQL
+
+**Pros**:
+- Very feature-rich
+- Built-in vectorization
+- Good documentation
+
+**Cons**:
+- More complex architecture
+- Higher resource usage
+- GraphQL adds complexity
+- Overkill for our use case
+
+**Decision**: Rejected; Qdrant provides better balance
+
+### Alternative 4: pgvector (PostgreSQL Extension)
+
+**Approach**: Add vector search to existing PostgreSQL
+
+**Pros**:
+- Leverages existing PostgreSQL expertise
+- Transactional consistency
+- Mature database ecosystem
+
+**Cons**:
+- This deployment uses MariaDB (would need PostgreSQL)
+- Performance not as optimized as purpose-built vector DB
+- Manual hybrid search implementation
+- HNSW index limitations
+
+**Decision**: Rejected; dedicated vector DB is better fit
+
+### Alternative 5: Pinecone / Vertex AI Vector Search
+
+**Approach**: Managed cloud vector database
+
+**Pros**:
+- Fully managed
+- Excellent performance
+- No infrastructure management
+
+**Cons**:
+- Cloud-only (no self-hosting)
+- Recurring costs
+- Vendor lock-in
+- Data leaves premises
+
+**Decision**: Rejected; self-hosting is important for privacy
+
+## Related Decisions
+
+- ADR-001: Enhanced Note Search (establishes need for better search)
+- ADR-002: Vector Sync Authentication (defines how sync workers authenticate)
+- [Future] ADR-004: Content Extraction and Document Processing
+- [Future] ADR-005: Cross-App Semantic Search
+
+## References
+
+- [Qdrant Documentation](https://qdrant.tech/documentation/)
+- [Sentence Transformers](https://www.sbert.net/)
+- [OpenAI Embeddings Guide](https://platform.openai.com/docs/guides/embeddings)
+- [Hybrid Search with RRF](https://qdrant.tech/articles/hybrid-search/)
+- [HNSW Algorithm](https://arxiv.org/abs/1603.09320)
+- [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf)