Files

T

Chris Coutinho bb5d4f464f feat: implement MCP sampling for semantic search RAG (ADR-008)

Add nc_notes_semantic_search_answer tool that combines semantic search
with MCP sampling to generate natural language answers from retrieved
Nextcloud Notes. This enables Retrieval-Augmented Generation (RAG)
patterns without requiring a server-side LLM.

Key features:
- Client-side LLM generation via ctx.session.create_message()
- Graceful fallback when sampling unavailable
- Proper source citations in generated answers
- No results optimization (skips sampling when no docs found)
- Comprehensive unit and integration tests

Implementation details:
- SamplingSearchResponse model with generated_answer and sources
- Fixed prompt template with document context and citation instructions
- Model preferences hint Claude Sonnet for balanced performance
- Falls back to returning documents without answer on sampling failure

Updates:
- Add ADR-008 documenting sampling architecture decision
- Add MCP sampling pattern guidance to CLAUDE.md
- Update README.md and docs/notes.md (7 → 9 tools)
- Add 4 unit tests and 6 integration tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-09 01:00:18 +01:00

16 KiB

Raw Blame History

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Coding Conventions

async/await Patterns

Use anyio + asyncio hybrid - Both libraries are available
- pytest runs in anyio mode (anyio_mode = "auto" in pyproject.toml)
- asyncio used in auth modules (refresh_token_storage.py, token_exchange.py, token_broker.py)
- anyio used in calendar.py, client_registration.py, app.py
- Prefer standard async/await syntax without explicit library imports when possible

Type Hints

Use Python 3.10+ union syntax: str | None instead of Optional[str]
Use lowercase generics: dict[str, Any] instead of Dict[str, Any]
Type all function signatures - Parameters and return types
No explicit type checker configured - Ruff handles linting only

Code Quality

Run ruff before committing:
```
uv run ruff check
uv run ruff format
```
Ruff configuration in pyproject.toml (extends select: ["I"] for import sorting)

Error Handling

Use custom decorators: @retry_on_429 for rate limiting (see base_client.py)
Standard exceptions: HTTPStatusError from httpx, McpError for MCP-specific errors
Logging patterns:
- logger.debug() for expected 404s and normal operations
- logger.warning() for retries and non-critical issues
- logger.error() for actual errors

Testing Patterns

Use existing fixtures from tests/conftest.py (2888 lines of test infrastructure)
Session-scoped fixtures handle anyio/pytest-asyncio incompatibility
Mocked unit tests use mocker.AsyncMock(spec=httpx.AsyncClient)
pytest-timeout: 180s default per test
Mark tests appropriately: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.oauth, @pytest.mark.smoke

Architectural Patterns

Base classes: BaseNextcloudClient for all API clients
Pydantic responses: All MCP tools return Pydantic models inheriting from BaseResponse
Decorators: @require_scopes, @require_provisioning for access control
Context pattern: await get_client(ctx) to access authenticated NextcloudClient (async!)
FastMCP decorators: @mcp.tool(), @mcp.resource()
Token acquisition: get_client() handles both pass-through and token exchange modes
- Pass-through (default): Simple, stateless (ENABLE_TOKEN_EXCHANGE=false)
- Token exchange (opt-in): RFC 8693 delegation (ENABLE_TOKEN_EXCHANGE=true)

Project Structure

nextcloud_mcp_server/client/ - HTTP clients for Nextcloud APIs
nextcloud_mcp_server/server/ - MCP tool/resource definitions
nextcloud_mcp_server/auth/ - OAuth/OIDC authentication
nextcloud_mcp_server/models/ - Pydantic response models
tests/ - Layered test suite (unit, smoke, integration, load)

Development Commands (Quick Reference)

Testing

# Fast feedback (recommended)
uv run pytest tests/unit/ -v                    # Unit tests (~5s)
uv run pytest -m smoke -v                       # Smoke tests (~30-60s)

# Integration tests
uv run pytest -m "integration and not oauth" -v # Without OAuth (~2-3min)
uv run pytest -m oauth -v                       # OAuth only (~3min)
uv run pytest                                   # Full suite (~4-5min)

# Coverage
uv run pytest --cov

# Specific tests after changes
uv run pytest tests/server/test_mcp.py -k "notes" -v
uv run pytest tests/client/notes/test_notes_api.py -v

Important: After code changes, rebuild the correct container:

Single-user tests: docker-compose up --build -d mcp
OAuth tests: docker-compose up --build -d mcp-oauth
Keycloak tests: docker-compose up --build -d mcp-keycloak

Running the Server

# Local development
export $(grep -v '^#' .env | xargs)
mcp run --transport sse nextcloud_mcp_server.app:mcp

# Docker development (rebuilds after code changes)
docker-compose up --build -d mcp        # Single-user (port 8000)
docker-compose up --build -d mcp-oauth  # Nextcloud OAuth (port 8001)
docker-compose up --build -d mcp-keycloak  # Keycloak OAuth (port 8002)

Environment Setup

uv sync                # Install dependencies
uv sync --group dev    # Install with dev dependencies

Load Testing

# Quick test (default: 10 workers, 30 seconds)
uv run python -m tests.load.benchmark

# Custom concurrency and duration
uv run python -m tests.load.benchmark -c 20 -d 60

# Export results for analysis
uv run python -m tests.load.benchmark --output results.json --verbose

Expected Performance: 50-200 RPS for mixed workload, p50 <100ms, p95 <500ms, p99 <1000ms.

Database Inspection

Credentials: root/password, nextcloud/password, database: nextcloud

# Connect to database
docker compose exec db mariadb -u root -ppassword nextcloud

# Check OAuth clients
docker compose exec db mariadb -u root -ppassword nextcloud -e \
  "SELECT id, name, token_type FROM oc_oidc_clients ORDER BY id DESC LIMIT 10;"

# Check OAuth client scopes
docker compose exec db mariadb -u root -ppassword nextcloud -e \
  "SELECT c.id, c.name, s.scope FROM oc_oidc_clients c LEFT JOIN oc_oidc_client_scopes s ON c.id = s.client_id WHERE c.name LIKE '%MCP%';"

# Check OAuth access tokens
docker compose exec db mariadb -u root -ppassword nextcloud -e \
  "SELECT id, client_id, user_id, created_at FROM oc_oidc_access_tokens ORDER BY created_at DESC LIMIT 10;"

Important Tables:

oc_oidc_clients - OAuth client registrations (DCR)
oc_oidc_client_scopes - Client allowed scopes
oc_oidc_access_tokens - Issued access tokens
oc_oidc_authorization_codes - Authorization codes
oc_oidc_registration_tokens - RFC 7592 registration tokens
oc_oidc_redirect_uris - Redirect URIs

Architecture Quick Reference

For detailed architecture, see:

docs/comparison-context-agent.md - Overall architecture
docs/oauth-architecture.md - OAuth integration patterns
docs/ADR-004-progressive-consent.md - Progressive consent implementation

Core Components:

nextcloud_mcp_server/app.py - FastMCP server entry point
nextcloud_mcp_server/client/ - HTTP clients (Notes, Calendar, Contacts, Tables, WebDAV)
nextcloud_mcp_server/server/ - MCP tool/resource definitions
nextcloud_mcp_server/auth/ - OAuth/OIDC authentication

Supported Apps: Notes, Calendar (CalDAV + VTODO tasks), Contacts (CardDAV), Tables, WebDAV, Deck, Cookbook

Key Patterns:

NextcloudClient orchestrates all app-specific clients
BaseNextcloudClient provides common HTTP functionality + retry logic
MCP tools use context pattern: get_client(ctx) → NextcloudClient
All operations are async using httpx

Important: Progressive consent is a mechanism for granting access, not a feature flag. The architecture is always present in OAuth mode. Whether provisioning tools are available is controlled by ENABLE_OFFLINE_ACCESS.

What is Progressive Consent?

Dual OAuth flow architecture that separates client authentication (Flow 1) from resource provisioning (Flow 2)
Flow 1: MCP client authenticates directly to IdP with resource scopes (notes:, calendar:, etc.)
- Token audience: "mcp-server"
- Client receives resource-scoped token for MCP session
Flow 2: Server explicitly provisions Nextcloud access via separate login (only when ENABLE_OFFLINE_ACCESS=true)
- Server requests: openid, profile, email, offline_access
- Token audience: "nextcloud"
- Server receives refresh token for offline access
- Client never sees this token
Provides clear separation between session tokens and offline access tokens

Modes:

Pass-through mode (ENABLE_OFFLINE_ACCESS=false, default):
- No Flow 2 provisioning
- Server uses client's token to access Nextcloud (pass-through)
- No provisioning tools available
- Suitable for stateless, client-driven operations
Offline access mode (ENABLE_OFFLINE_ACCESS=true):
- Flow 2 provisioning available
- Server stores refresh tokens for background operations
- Provisioning tools available: provision_nextcloud_access, check_logged_in
- Suitable for background jobs and server-initiated operations

When to use OAuth mode:

Multi-user deployments
Background jobs requiring offline access (with ENABLE_OFFLINE_ACCESS=true)
Enhanced security with separate authorization contexts
Explicit user control over resource access

When to use BasicAuth instead:

Simple single-user deployments
Local development and testing

Key features:

No scope escalation - client gets exactly what it requests
User explicitly authorizes via provision_nextcloud_access tool
Clear security boundaries between MCP session and Nextcloud access

MCP Response Patterns (CRITICAL)

Never return raw List[Dict] from MCP tools - FastMCP mangles them into dicts with numeric string keys.

Correct Pattern:

Client methods return List[Dict] (raw data)
MCP tools convert to Pydantic models and wrap in response object
Response models inherit from BaseResponse, include results field + metadata

Reference implementations:

nextcloud_mcp_server/models/notes.py:80 - SearchNotesResponse
nextcloud_mcp_server/models/webdav.py:113 - SearchFilesResponse
nextcloud_mcp_server/server/{notes,webdav}.py - Tool examples

Testing: Extract data["results"] from MCP responses, not data directly.

MCP Sampling for RAG (ADR-008)

What is MCP Sampling? MCP sampling allows servers to request LLM completions from their clients. This enables Retrieval-Augmented Generation (RAG) patterns where the server retrieves context and the client's LLM generates answers.

When to use sampling:

Generating natural language answers from retrieved documents
Synthesizing information from multiple sources
Creating summaries with citations

Implementation Pattern (see ADR-008 for details):

from mcp.types import ModelHint, ModelPreferences, SamplingMessage, TextContent

@mcp.tool()
@require_scopes("notes:read")
async def nc_notes_semantic_search_answer(
    query: str, ctx: Context, limit: int = 5, max_answer_tokens: int = 500
) -> SamplingSearchResponse:
    # 1. Retrieve documents
    search_response = await nc_notes_semantic_search(query, ctx, limit)

    # 2. Check for no results (don't waste sampling call)
    if not search_response.results:
        return SamplingSearchResponse(
            query=query,
            generated_answer="No relevant documents found.",
            sources=[], total_found=0, success=True
        )

    # 3. Construct prompt with retrieved context
    prompt = f"{query}\n\nDocuments:\n{format_sources(search_response.results)}\n\nProvide answer with citations."

    # 4. Request LLM completion via sampling
    try:
        result = await ctx.session.create_message(
            messages=[SamplingMessage(role="user", content=TextContent(type="text", text=prompt))],
            max_tokens=max_answer_tokens,
            temperature=0.7,
            model_preferences=ModelPreferences(
                hints=[ModelHint(name="claude-3-5-sonnet")],
                intelligencePriority=0.8,
                speedPriority=0.5,
            ),
            include_context="thisServer",
        )

        return SamplingSearchResponse(
            query=query,
            generated_answer=result.content.text,
            sources=search_response.results,
            model_used=result.model,
            stop_reason=result.stopReason,
            success=True
        )
    except Exception as e:
        # Fallback: Return documents without generated answer
        return SamplingSearchResponse(
            query=query,
            generated_answer=f"[Sampling unavailable: {e}]\n\nFound {len(search_response.results)} documents.",
            sources=search_response.results,
            search_method="semantic_sampling_fallback",
            success=True
        )

Key Points:

No server-side LLM: Server has no API keys, client controls which model is used
Graceful degradation: Tool always returns useful results even if sampling fails
User control: MCP clients SHOULD prompt users to approve sampling requests
No results optimization: Skip sampling call when no documents found
Fixed prompts: Prompts are not user-configurable to avoid injection risks

Reference: See nc_notes_semantic_search_answer in nextcloud_mcp_server/server/notes.py:517 and ADR-008 for complete implementation.

Testing Best Practices (MANDATORY)

Always Run Tests

Run tests to completion before considering any task complete
Rebuild the correct container after code changes (see Development Commands above)
If tests require modifications, ask for permission before proceeding

Use Existing Fixtures

See tests/conftest.py for 2888 lines of test infrastructure:

nc_mcp_client - MCP client for tool/resource testing (uses mcp container)
nc_mcp_oauth_client - MCP client for OAuth testing (uses mcp-oauth container)
nc_client - Direct NextcloudClient for setup/cleanup
temporary_note, temporary_addressbook, temporary_contact - Auto-cleanup

Writing Mocked Unit Tests

For client-layer response parsing tests, use mocked HTTP responses:

async def test_notes_api_get_note(mocker):
    """Test that get_note correctly parses the API response."""
    mock_response = create_mock_note_response(
        note_id=123, title="Test Note", content="Test content",
        category="Test", etag="abc123"
    )

    mock_make_request = mocker.patch.object(
        NotesClient, "_make_request", return_value=mock_response
    )

    client = NotesClient(mocker.AsyncMock(spec=httpx.AsyncClient), "testuser")
    note = await client.get_note(note_id=123)

    assert note["id"] == 123
    mock_make_request.assert_called_once_with("GET", "/apps/notes/api/v1/notes/123")

Mock helpers in tests/conftest.py: create_mock_response(), create_mock_note_response(), create_mock_error_response()

When to use: Response parsing, error handling, request parameter building When NOT to use: CalDAV/CardDAV/WebDAV protocols, OAuth flows, end-to-end MCP testing

OAuth Testing

OAuth tests use Playwright browser automation to complete flows programmatically.

Test Environment:

Three MCP containers: mcp (single-user), mcp-oauth (Nextcloud OIDC), mcp-keycloak (external IdP)
OAuth tests require NEXTCLOUD_HOST, NEXTCLOUD_USERNAME, NEXTCLOUD_PASSWORD environment variables
Playwright configuration: --browser firefox --headed for debugging
Install browsers: uv run playwright install firefox

OAuth fixtures: nc_oauth_client, nc_mcp_oauth_client, alice_oauth_token, bob_oauth_token, etc.

Shared OAuth Client: All test users authenticate using a single OAuth client (created via DCR, deleted at session end via RFC 7592). Matches production behavior.

Run OAuth tests:

uv run pytest -m oauth -v                        # All OAuth tests
uv run pytest tests/server/oauth/ --browser firefox -v
uv run pytest tests/server/oauth/test_oauth_core.py --browser firefox --headed -v

Keycloak OAuth Testing

Validates ADR-002 architecture for external identity providers and offline access patterns.

Architecture: MCP Client → Keycloak (OAuth) → MCP Server → Nextcloud user_oidc (validates token) → APIs

Setup:

docker-compose up -d keycloak app mcp-keycloak
curl http://localhost:8888/realms/nextcloud-mcp/.well-known/openid-configuration
docker compose exec app php occ user_oidc:provider keycloak

Credentials: admin/admin (Keycloak realm: nextcloud-mcp)

For detailed Keycloak setup, see:

docs/oauth-setup.md - OAuth configuration
docs/ADR-002-vector-sync-authentication.md - Offline access architecture
docs/audience-validation-setup.md - Token audience validation
docs/keycloak-multi-client-validation.md - Realm-level validation

Integration Testing with Docker

Nextcloud: docker compose exec app php occ ... for occ commands MariaDB: docker compose exec db mariadb -u [user] -p [password] [database] for queries

For detailed setup, see:

docs/installation.md - Installation guide
docs/configuration.md - Configuration options
docs/authentication.md - Authentication modes
docs/running.md - Running the server

16 KiB Raw Blame History