Files

T

Chris Coutinho 5b484c9226 feat: add unified provider architecture with Amazon Bedrock support

Refactored LLM provider infrastructure to support sustainable additions of new providers with both embedding and text generation capabilities.

## Major Changes

### Unified Provider Architecture (ADR-015)
- Created `nextcloud_mcp_server/providers/` with unified Provider ABC
- Providers now support optional capabilities (embeddings and/or generation)
- Auto-detection registry with priority: Bedrock → Ollama → Simple
- Backward compatible - existing code continues to work

### New Providers
- **BedrockProvider**: Full Amazon Bedrock integration
  - Embeddings: Titan Embed, Cohere Embed models
  - Generation: Claude, Llama, Titan Text, Mistral models
  - Model-specific request/response handling
  - AWS credential chain integration
- **OllamaProvider**: Migrated with both capabilities support
- **AnthropicProvider**: Moved from test code to production providers
- **SimpleProvider**: Migrated in-memory fallback provider

### Breaking Changes
None - full backward compatibility maintained:
- `embedding.get_embedding_service()` still works
- RAG evaluation tests updated to use unified providers
- All existing tests pass (127 unit tests)

### Testing
- Added 9 comprehensive Bedrock unit tests with mocked boto3
- All existing unit tests pass
- Type checking (ty) and linting (ruff) pass
- Verified backward compatibility

### Documentation
- `docs/ADR-015-unified-provider-architecture.md`: Comprehensive ADR
- `docs/bedrock-setup.md`: AWS setup guide with IAM permissions
- `CLAUDE.md`: Updated with provider architecture section

### Dependencies
- Added `boto3>=1.35.0` to dev dependencies (optional)

## Environment Variables

### Bedrock
- `AWS_REGION`: AWS region (e.g., "us-east-1")
- `BEDROCK_EMBEDDING_MODEL`: Model ID for embeddings
- `BEDROCK_GENERATION_MODEL`: Model ID for generation
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`: Optional credentials

### Ollama
- `OLLAMA_BASE_URL`: API URL
- `OLLAMA_EMBEDDING_MODEL`: Embedding model (default: "nomic-embed-text")
- `OLLAMA_GENERATION_MODEL`: Generation model

## AWS Bedrock Permissions Required

Minimal IAM policy:
```json
{
  "Effect": "Allow",
  "Action": ["bedrock:InvokeModel"],
  "Resource": ["arn:aws:bedrock:*::foundation-model/*"]
}
```

See `docs/bedrock-setup.md` for detailed setup instructions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-16 11:36:58 +01:00

12 KiB

Raw Blame History

ADR-015: Unified Provider Architecture for Embeddings and Text Generation

Status: Accepted Date: 2025-01-16 Deciders: Development Team Related: ADR-003 (Vector Database), ADR-008 (MCP Sampling), ADR-013 (RAG Evaluation)

Context

Prior to this refactoring, the codebase had two separate provider systems:

Embedding Providers (nextcloud_mcp_server/embedding/)
- Used EmbeddingProvider ABC with methods: embed(), embed_batch(), get_dimension()
- Had auto-detection via EmbeddingService._detect_provider()
- Used for semantic search and vector indexing (production)
LLM Providers (tests/rag_evaluation/llm_providers.py)
- Used LLMProvider Protocol with method: generate()
- Had separate factory function create_llm_provider()
- Used only for RAG evaluation tests (not production)

This fragmentation created several problems:

Problems with Dual Provider Systems

Code Duplication
- Ollama configuration appeared in both embedding/service.py and tests/rag_evaluation/llm_providers.py
- Similar provider detection logic in multiple places
- Separate singleton patterns for each system
Limited Extensibility
- Hard-coded provider detection in EmbeddingService._detect_provider()
- No support for providers that offer both capabilities (like Bedrock)
- Adding new providers required modifying multiple files
Inconsistent Patterns
- BM25 provider didn't follow EmbeddingProvider ABC
- Different method names across providers (embed vs encode)
- ABC vs Protocol for type checking
Difficult Scaling
- Adding Amazon Bedrock (our third provider) would exacerbate all issues
- No clear path for future providers (OpenAI, Cohere, etc.)

Amazon Bedrock Requirements

Bedrock naturally supports both embeddings and text generation:

Embeddings: amazon.titan-embed-text-v1/v2, cohere.embed-*
Text Generation: anthropic.claude-*, meta.llama3-*, amazon.titan-text-*
Unified API: Single invoke_model() method via bedrock-runtime

This made it the perfect opportunity to establish a unified provider architecture.

Decision

We refactored the provider infrastructure to use a unified Provider ABC with optional capabilities:

1. Unified Provider Interface

New Structure:

nextcloud_mcp_server/providers/
├── __init__.py
├── base.py              # Provider ABC with optional capabilities
├── registry.py          # Auto-detection and factory
├── ollama.py            # Supports both embedding + generation
├── anthropic.py         # Generation only
├── bedrock.py           # Supports both embedding + generation
└── simple.py            # Embedding only (testing fallback)

Base Class (providers/base.py):

class Provider(ABC):
    @property
    @abstractmethod
    def supports_embeddings(self) -> bool:
        """Whether this provider supports embedding generation."""
        pass

    @property
    @abstractmethod
    def supports_generation(self) -> bool:
        """Whether this provider supports text generation."""
        pass

    @abstractmethod
    async def embed(self, text: str) -> list[float]:
        """Generate embedding (raises NotImplementedError if not supported)."""
        pass

    @abstractmethod
    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        """Generate batch embeddings (raises NotImplementedError if not supported)."""
        pass

    @abstractmethod
    def get_dimension(self) -> int:
        """Get embedding dimension (raises NotImplementedError if not supported)."""
        pass

    @abstractmethod
    async def generate(self, prompt: str, max_tokens: int = 500) -> str:
        """Generate text (raises NotImplementedError if not supported)."""
        pass

    @abstractmethod
    async def close(self) -> None:
        """Close provider and release resources."""
        pass

2. Provider Registry

Auto-Detection Priority (providers/registry.py):

class ProviderRegistry:
    @staticmethod
    def create_provider() -> Provider:
        # 1. Bedrock (AWS_REGION or BEDROCK_*_MODEL)
        # 2. Ollama (OLLAMA_BASE_URL)
        # 3. Simple (fallback)

Environment Variables:

Bedrock:

AWS_REGION: AWS region (e.g., "us-east-1")
AWS_ACCESS_KEY_ID: AWS access key (optional, uses credential chain)
AWS_SECRET_ACCESS_KEY: AWS secret key (optional)
BEDROCK_EMBEDDING_MODEL: Model ID for embeddings (e.g., "amazon.titan-embed-text-v2:0")
BEDROCK_GENERATION_MODEL: Model ID for text generation (e.g., "anthropic.claude-3-sonnet-20240229-v1:0")

Ollama:

OLLAMA_BASE_URL: Ollama API base URL (e.g., "http://localhost:11434")
OLLAMA_EMBEDDING_MODEL: Model for embeddings (default: "nomic-embed-text")
OLLAMA_GENERATION_MODEL: Model for text generation (e.g., "llama3.2:1b")
OLLAMA_VERIFY_SSL: Verify SSL certificates (default: "true")

Simple (no configuration, fallback):

SIMPLE_EMBEDDING_DIMENSION: Embedding dimension (default: 384)

3. Backward Compatibility

Old Code Continues to Work:

# Old way (still works)
from nextcloud_mcp_server.embedding import get_embedding_service

service = get_embedding_service()  # Returns singleton Provider
embeddings = await service.embed_batch(texts)

New Way (recommended):

# New way (cleaner)
from nextcloud_mcp_server.providers import get_provider

provider = get_provider()  # Returns singleton Provider
embeddings = await provider.embed_batch(texts)

# Can also use generation if provider supports it
if provider.supports_generation:
    text = await provider.generate("prompt")

Migration Path:

embedding/service.py now wraps providers.get_provider() for compatibility
tests/rag_evaluation/llm_providers.py now uses unified providers
Old imports still work, marked as deprecated in docstrings

4. Amazon Bedrock Implementation

Features:

Supports both embeddings and text generation
Model-specific request/response handling for:
- Titan Embed (amazon.titan-embed-text-*)
- Cohere Embed (cohere.embed-*)
- Claude (anthropic.claude-*)
- Llama (meta.llama3-*)
- Titan Text (amazon.titan-text-*)
- Mistral (mistral.*)
Uses boto3 bedrock-runtime client
Graceful degradation if boto3 not installed
Async implementation matching existing patterns

Model-Specific Handling:

# Bedrock embedding request (Titan)
{"inputText": text}

# Bedrock generation request (Claude)
{
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": max_tokens,
    "temperature": 0.7,
    "messages": [{"role": "user", "content": prompt}]
}

Consequences

Positive

Sustainable Provider Additions
- New providers only need to implement Provider ABC
- Auto-detection via environment variables
- No modifications to existing code required
Code Consolidation
- Single provider interface instead of two
- Unified configuration pattern
- Eliminated duplication
Better Extensibility
- Providers can support one or both capabilities
- Clear capability detection via properties
- Registry pattern simplifies auto-detection
Improved Testing
- RAG evaluation can use any provider (Ollama, Anthropic, Bedrock)
- Comprehensive unit tests for all providers
- Mocked boto3 tests for Bedrock
Production-Ready Bedrock Support
- Full embedding and generation support
- Multiple model families supported
- AWS credential chain integration

Neutral

Optional Boto3 Dependency
- boto3 is dev dependency only (not required for core functionality)
- Bedrock provider gracefully fails if boto3 not installed
- Users who want Bedrock must pip install boto3
Capability Properties
- All providers must implement capability properties
- Methods raise NotImplementedError if capability not supported
- Clear error messages guide users to alternatives

Negative

Migration Effort
- Existing code must be migrated to new imports (optional, backward compatible)
- Documentation needs updating
- Users must learn new environment variables
Increased Complexity
- Provider base class has more methods (embedding + generation)
- More environment variables to configure
- Capability detection adds runtime checks

Implementation

Files Created

New Provider Infrastructure:

nextcloud_mcp_server/providers/__init__.py
nextcloud_mcp_server/providers/base.py
nextcloud_mcp_server/providers/registry.py
nextcloud_mcp_server/providers/ollama.py
nextcloud_mcp_server/providers/anthropic.py
nextcloud_mcp_server/providers/bedrock.py
nextcloud_mcp_server/providers/simple.py

Tests:

tests/unit/providers/__init__.py
tests/unit/providers/test_bedrock.py (9 unit tests)

Documentation:

docs/ADR-015-unified-provider-architecture.md (this file)

Files Modified

Backward Compatibility:

nextcloud_mcp_server/embedding/service.py - Now wraps get_provider()
tests/rag_evaluation/llm_providers.py - Uses unified providers

Dependencies:

pyproject.toml - Added boto3>=1.35.0 to dev dependencies

Testing Results

Unit Tests: 127 passed (including 9 new Bedrock tests) Type Checking: All checks passed (ty) Linting: All checks passed (ruff) Backward Compatibility: Verified - existing embedding tests work

Alternatives Considered

Alternative 1: Keep Separate Provider Systems

Pros:

No refactoring needed
Simpler short-term

Cons:

Bedrock would need to be implemented twice
Continued code duplication
No long-term scalability

Decision: Rejected - technical debt would continue to grow

Alternative 2: Separate Embedding and Generation Providers

Use composition instead of unified interface:

class CombinedProvider:
    def __init__(self, embedding: EmbeddingProvider, generation: LLMProvider):
        self.embedding = embedding
        self.generation = generation

Pros:

Clearer separation of concerns
Simpler individual providers

Cons:

Bedrock and Ollama naturally do both - artificial separation
More complex configuration (two providers to configure)
More boilerplate code

Decision: Rejected - unified interface better matches provider capabilities

Alternative 3: Plugin System

Dynamic provider registration via entry points:

# setup.py
entry_points={
    'nextcloud_mcp.providers': [
        'ollama = nextcloud_mcp_server.providers.ollama:OllamaProvider',
        'bedrock = nextcloud_mcp_server.providers.bedrock:BedrockProvider',
    ]
}

Pros:

Most extensible
Third-party providers possible

Cons:

Over-engineered for current needs
Added complexity
No immediate benefit

Decision: Deferred - can add later if needed

Future Work

Additional Providers
- OpenAI (embeddings + generation)
- Cohere (embeddings + generation)
- Google Vertex AI
- Azure OpenAI
Provider Features
- Streaming generation support
- Batch API optimization (when available)
- Model-specific optimizations
- Cost tracking and metrics
Configuration Improvements
- Provider profiles (development, production)
- Model aliasing (e.g., "small", "large")
- Fallback provider chains
Testing
- Integration tests with real Bedrock endpoints
- Performance benchmarking across providers
- Cost comparison analysis

References

boto3 Bedrock Runtime Documentation
Amazon Bedrock User Guide
ADR-003: Vector Database and Semantic Search
ADR-008: MCP Sampling for Semantic Search
ADR-013: RAG Evaluation Framework

12 KiB Raw Blame History

ADR-015: Unified Provider Architecture for Embeddings and Text Generation

Context

Problems with Dual Provider Systems

Amazon Bedrock Requirements

Decision

1. Unified Provider Interface

2. Provider Registry

3. Backward Compatibility

4. Amazon Bedrock Implementation

Consequences

Positive

Neutral

Negative

Implementation

Files Created

Files Modified

Testing Results

Alternatives Considered

Alternative 1: Keep Separate Provider Systems

Alternative 2: Separate Embedding and Generation Providers

Alternative 3: Plugin System

Future Work

References

12 KiB

Raw Blame History