Files
nextcloud-mcp-server/docs/ADR-015-unified-provider-architecture.md
T
Chris Coutinho 5b484c9226 feat: add unified provider architecture with Amazon Bedrock support
Refactored LLM provider infrastructure to support sustainable additions of new providers with both embedding and text generation capabilities.

## Major Changes

### Unified Provider Architecture (ADR-015)
- Created `nextcloud_mcp_server/providers/` with unified Provider ABC
- Providers now support optional capabilities (embeddings and/or generation)
- Auto-detection registry with priority: Bedrock → Ollama → Simple
- Backward compatible - existing code continues to work

### New Providers
- **BedrockProvider**: Full Amazon Bedrock integration
  - Embeddings: Titan Embed, Cohere Embed models
  - Generation: Claude, Llama, Titan Text, Mistral models
  - Model-specific request/response handling
  - AWS credential chain integration
- **OllamaProvider**: Migrated with both capabilities support
- **AnthropicProvider**: Moved from test code to production providers
- **SimpleProvider**: Migrated in-memory fallback provider

### Breaking Changes
None - full backward compatibility maintained:
- `embedding.get_embedding_service()` still works
- RAG evaluation tests updated to use unified providers
- All existing tests pass (127 unit tests)

### Testing
- Added 9 comprehensive Bedrock unit tests with mocked boto3
- All existing unit tests pass
- Type checking (ty) and linting (ruff) pass
- Verified backward compatibility

### Documentation
- `docs/ADR-015-unified-provider-architecture.md`: Comprehensive ADR
- `docs/bedrock-setup.md`: AWS setup guide with IAM permissions
- `CLAUDE.md`: Updated with provider architecture section

### Dependencies
- Added `boto3>=1.35.0` to dev dependencies (optional)

## Environment Variables

### Bedrock
- `AWS_REGION`: AWS region (e.g., "us-east-1")
- `BEDROCK_EMBEDDING_MODEL`: Model ID for embeddings
- `BEDROCK_GENERATION_MODEL`: Model ID for generation
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`: Optional credentials

### Ollama
- `OLLAMA_BASE_URL`: API URL
- `OLLAMA_EMBEDDING_MODEL`: Embedding model (default: "nomic-embed-text")
- `OLLAMA_GENERATION_MODEL`: Generation model

## AWS Bedrock Permissions Required

Minimal IAM policy:
```json
{
  "Effect": "Allow",
  "Action": ["bedrock:InvokeModel"],
  "Resource": ["arn:aws:bedrock:*::foundation-model/*"]
}
```

See `docs/bedrock-setup.md` for detailed setup instructions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 11:36:58 +01:00

12 KiB

ADR-015: Unified Provider Architecture for Embeddings and Text Generation

Status: Accepted Date: 2025-01-16 Deciders: Development Team Related: ADR-003 (Vector Database), ADR-008 (MCP Sampling), ADR-013 (RAG Evaluation)

Context

Prior to this refactoring, the codebase had two separate provider systems:

  1. Embedding Providers (nextcloud_mcp_server/embedding/)

    • Used EmbeddingProvider ABC with methods: embed(), embed_batch(), get_dimension()
    • Had auto-detection via EmbeddingService._detect_provider()
    • Used for semantic search and vector indexing (production)
  2. LLM Providers (tests/rag_evaluation/llm_providers.py)

    • Used LLMProvider Protocol with method: generate()
    • Had separate factory function create_llm_provider()
    • Used only for RAG evaluation tests (not production)

This fragmentation created several problems:

Problems with Dual Provider Systems

  1. Code Duplication

    • Ollama configuration appeared in both embedding/service.py and tests/rag_evaluation/llm_providers.py
    • Similar provider detection logic in multiple places
    • Separate singleton patterns for each system
  2. Limited Extensibility

    • Hard-coded provider detection in EmbeddingService._detect_provider()
    • No support for providers that offer both capabilities (like Bedrock)
    • Adding new providers required modifying multiple files
  3. Inconsistent Patterns

    • BM25 provider didn't follow EmbeddingProvider ABC
    • Different method names across providers (embed vs encode)
    • ABC vs Protocol for type checking
  4. Difficult Scaling

    • Adding Amazon Bedrock (our third provider) would exacerbate all issues
    • No clear path for future providers (OpenAI, Cohere, etc.)

Amazon Bedrock Requirements

Bedrock naturally supports both embeddings and text generation:

  • Embeddings: amazon.titan-embed-text-v1/v2, cohere.embed-*
  • Text Generation: anthropic.claude-*, meta.llama3-*, amazon.titan-text-*
  • Unified API: Single invoke_model() method via bedrock-runtime

This made it the perfect opportunity to establish a unified provider architecture.

Decision

We refactored the provider infrastructure to use a unified Provider ABC with optional capabilities:

1. Unified Provider Interface

New Structure:

nextcloud_mcp_server/providers/
├── __init__.py
├── base.py              # Provider ABC with optional capabilities
├── registry.py          # Auto-detection and factory
├── ollama.py            # Supports both embedding + generation
├── anthropic.py         # Generation only
├── bedrock.py           # Supports both embedding + generation
└── simple.py            # Embedding only (testing fallback)

Base Class (providers/base.py):

class Provider(ABC):
    @property
    @abstractmethod
    def supports_embeddings(self) -> bool:
        """Whether this provider supports embedding generation."""
        pass

    @property
    @abstractmethod
    def supports_generation(self) -> bool:
        """Whether this provider supports text generation."""
        pass

    @abstractmethod
    async def embed(self, text: str) -> list[float]:
        """Generate embedding (raises NotImplementedError if not supported)."""
        pass

    @abstractmethod
    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        """Generate batch embeddings (raises NotImplementedError if not supported)."""
        pass

    @abstractmethod
    def get_dimension(self) -> int:
        """Get embedding dimension (raises NotImplementedError if not supported)."""
        pass

    @abstractmethod
    async def generate(self, prompt: str, max_tokens: int = 500) -> str:
        """Generate text (raises NotImplementedError if not supported)."""
        pass

    @abstractmethod
    async def close(self) -> None:
        """Close provider and release resources."""
        pass

2. Provider Registry

Auto-Detection Priority (providers/registry.py):

class ProviderRegistry:
    @staticmethod
    def create_provider() -> Provider:
        # 1. Bedrock (AWS_REGION or BEDROCK_*_MODEL)
        # 2. Ollama (OLLAMA_BASE_URL)
        # 3. Simple (fallback)

Environment Variables:

Bedrock:

  • AWS_REGION: AWS region (e.g., "us-east-1")
  • AWS_ACCESS_KEY_ID: AWS access key (optional, uses credential chain)
  • AWS_SECRET_ACCESS_KEY: AWS secret key (optional)
  • BEDROCK_EMBEDDING_MODEL: Model ID for embeddings (e.g., "amazon.titan-embed-text-v2:0")
  • BEDROCK_GENERATION_MODEL: Model ID for text generation (e.g., "anthropic.claude-3-sonnet-20240229-v1:0")

Ollama:

  • OLLAMA_BASE_URL: Ollama API base URL (e.g., "http://localhost:11434")
  • OLLAMA_EMBEDDING_MODEL: Model for embeddings (default: "nomic-embed-text")
  • OLLAMA_GENERATION_MODEL: Model for text generation (e.g., "llama3.2:1b")
  • OLLAMA_VERIFY_SSL: Verify SSL certificates (default: "true")

Simple (no configuration, fallback):

  • SIMPLE_EMBEDDING_DIMENSION: Embedding dimension (default: 384)

3. Backward Compatibility

Old Code Continues to Work:

# Old way (still works)
from nextcloud_mcp_server.embedding import get_embedding_service

service = get_embedding_service()  # Returns singleton Provider
embeddings = await service.embed_batch(texts)

New Way (recommended):

# New way (cleaner)
from nextcloud_mcp_server.providers import get_provider

provider = get_provider()  # Returns singleton Provider
embeddings = await provider.embed_batch(texts)

# Can also use generation if provider supports it
if provider.supports_generation:
    text = await provider.generate("prompt")

Migration Path:

  • embedding/service.py now wraps providers.get_provider() for compatibility
  • tests/rag_evaluation/llm_providers.py now uses unified providers
  • Old imports still work, marked as deprecated in docstrings

4. Amazon Bedrock Implementation

Features:

  • Supports both embeddings and text generation
  • Model-specific request/response handling for:
    • Titan Embed (amazon.titan-embed-text-*)
    • Cohere Embed (cohere.embed-*)
    • Claude (anthropic.claude-*)
    • Llama (meta.llama3-*)
    • Titan Text (amazon.titan-text-*)
    • Mistral (mistral.*)
  • Uses boto3 bedrock-runtime client
  • Graceful degradation if boto3 not installed
  • Async implementation matching existing patterns

Model-Specific Handling:

# Bedrock embedding request (Titan)
{"inputText": text}

# Bedrock generation request (Claude)
{
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": max_tokens,
    "temperature": 0.7,
    "messages": [{"role": "user", "content": prompt}]
}

Consequences

Positive

  1. Sustainable Provider Additions

    • New providers only need to implement Provider ABC
    • Auto-detection via environment variables
    • No modifications to existing code required
  2. Code Consolidation

    • Single provider interface instead of two
    • Unified configuration pattern
    • Eliminated duplication
  3. Better Extensibility

    • Providers can support one or both capabilities
    • Clear capability detection via properties
    • Registry pattern simplifies auto-detection
  4. Improved Testing

    • RAG evaluation can use any provider (Ollama, Anthropic, Bedrock)
    • Comprehensive unit tests for all providers
    • Mocked boto3 tests for Bedrock
  5. Production-Ready Bedrock Support

    • Full embedding and generation support
    • Multiple model families supported
    • AWS credential chain integration

Neutral

  1. Optional Boto3 Dependency

    • boto3 is dev dependency only (not required for core functionality)
    • Bedrock provider gracefully fails if boto3 not installed
    • Users who want Bedrock must pip install boto3
  2. Capability Properties

    • All providers must implement capability properties
    • Methods raise NotImplementedError if capability not supported
    • Clear error messages guide users to alternatives

Negative

  1. Migration Effort

    • Existing code must be migrated to new imports (optional, backward compatible)
    • Documentation needs updating
    • Users must learn new environment variables
  2. Increased Complexity

    • Provider base class has more methods (embedding + generation)
    • More environment variables to configure
    • Capability detection adds runtime checks

Implementation

Files Created

New Provider Infrastructure:

  • nextcloud_mcp_server/providers/__init__.py
  • nextcloud_mcp_server/providers/base.py
  • nextcloud_mcp_server/providers/registry.py
  • nextcloud_mcp_server/providers/ollama.py
  • nextcloud_mcp_server/providers/anthropic.py
  • nextcloud_mcp_server/providers/bedrock.py
  • nextcloud_mcp_server/providers/simple.py

Tests:

  • tests/unit/providers/__init__.py
  • tests/unit/providers/test_bedrock.py (9 unit tests)

Documentation:

  • docs/ADR-015-unified-provider-architecture.md (this file)

Files Modified

Backward Compatibility:

  • nextcloud_mcp_server/embedding/service.py - Now wraps get_provider()
  • tests/rag_evaluation/llm_providers.py - Uses unified providers

Dependencies:

  • pyproject.toml - Added boto3>=1.35.0 to dev dependencies

Testing Results

Unit Tests: 127 passed (including 9 new Bedrock tests) Type Checking: All checks passed (ty) Linting: All checks passed (ruff) Backward Compatibility: Verified - existing embedding tests work

Alternatives Considered

Alternative 1: Keep Separate Provider Systems

Pros:

  • No refactoring needed
  • Simpler short-term

Cons:

  • Bedrock would need to be implemented twice
  • Continued code duplication
  • No long-term scalability

Decision: Rejected - technical debt would continue to grow

Alternative 2: Separate Embedding and Generation Providers

Use composition instead of unified interface:

class CombinedProvider:
    def __init__(self, embedding: EmbeddingProvider, generation: LLMProvider):
        self.embedding = embedding
        self.generation = generation

Pros:

  • Clearer separation of concerns
  • Simpler individual providers

Cons:

  • Bedrock and Ollama naturally do both - artificial separation
  • More complex configuration (two providers to configure)
  • More boilerplate code

Decision: Rejected - unified interface better matches provider capabilities

Alternative 3: Plugin System

Dynamic provider registration via entry points:

# setup.py
entry_points={
    'nextcloud_mcp.providers': [
        'ollama = nextcloud_mcp_server.providers.ollama:OllamaProvider',
        'bedrock = nextcloud_mcp_server.providers.bedrock:BedrockProvider',
    ]
}

Pros:

  • Most extensible
  • Third-party providers possible

Cons:

  • Over-engineered for current needs
  • Added complexity
  • No immediate benefit

Decision: Deferred - can add later if needed

Future Work

  1. Additional Providers

    • OpenAI (embeddings + generation)
    • Cohere (embeddings + generation)
    • Google Vertex AI
    • Azure OpenAI
  2. Provider Features

    • Streaming generation support
    • Batch API optimization (when available)
    • Model-specific optimizations
    • Cost tracking and metrics
  3. Configuration Improvements

    • Provider profiles (development, production)
    • Model aliasing (e.g., "small", "large")
    • Fallback provider chains
  4. Testing

    • Integration tests with real Bedrock endpoints
    • Performance benchmarking across providers
    • Cost comparison analysis

References