# ADR-015: Unified Provider Architecture for Embeddings and Text Generation

**Status:** Accepted
**Date:** 2025-01-16
**Deciders:** Development Team
**Related:** ADR-003 (Vector Database), ADR-008 (MCP Sampling), ADR-013 (RAG Evaluation)

## Context

Prior to this refactoring, the codebase had two separate provider systems:

1. **Embedding Providers** (`nextcloud_mcp_server/embedding/`)
   - Used `EmbeddingProvider` ABC with methods: `embed()`, `embed_batch()`, `get_dimension()`
   - Had auto-detection via `EmbeddingService._detect_provider()`
   - Used for semantic search and vector indexing (production)

2. **LLM Providers** (`tests/rag_evaluation/llm_providers.py`)
   - Used `LLMProvider` Protocol with method: `generate()`
   - Had separate factory function `create_llm_provider()`
   - Used only for RAG evaluation tests (not production)

This fragmentation created several problems:

### Problems with Dual Provider Systems

1. **Code Duplication**
   - Ollama configuration appeared in both `embedding/service.py` and `tests/rag_evaluation/llm_providers.py`
   - Similar provider detection logic in multiple places
   - Separate singleton patterns for each system

2. **Limited Extensibility**
   - Hard-coded provider detection in `EmbeddingService._detect_provider()`
   - No support for providers that offer both capabilities (like Bedrock)
   - Adding new providers required modifying multiple files

3. **Inconsistent Patterns**
   - BM25 provider didn't follow `EmbeddingProvider` ABC
   - Different method names across providers (`embed` vs `encode`)
   - ABC vs Protocol for type checking

4. **Difficult Scaling**
   - Adding Amazon Bedrock (our third provider) would exacerbate all issues
   - No clear path for future providers (OpenAI, Cohere, etc.)

### Amazon Bedrock Requirements

Bedrock naturally supports **both** embeddings and text generation:
- **Embeddings**: `amazon.titan-embed-text-v1/v2`, `cohere.embed-*`
- **Text Generation**: `anthropic.claude-*`, `meta.llama3-*`, `amazon.titan-text-*`
- **Unified API**: Single `invoke_model()` method via bedrock-runtime

Because a single Bedrock integration has to serve both capabilities, adding it was a natural point to establish a unified provider architecture.

## Decision

We refactored the provider infrastructure to use a **unified Provider ABC** with optional capabilities:

### 1. Unified Provider Interface

**New Structure:**
```
nextcloud_mcp_server/providers/
├── __init__.py
├── base.py              # Provider ABC with optional capabilities
├── registry.py          # Auto-detection and factory
├── ollama.py            # Supports both embedding + generation
├── anthropic.py         # Generation only
├── bedrock.py           # Supports both embedding + generation
└── simple.py            # Embedding only (testing fallback)
```

**Base Class (`providers/base.py`):**
```python
from abc import ABC, abstractmethod


class Provider(ABC):
    """Unified interface; a provider may support embeddings, generation, or both."""

    @property
    @abstractmethod
    def supports_embeddings(self) -> bool:
        """Whether this provider supports embedding generation."""

    @property
    @abstractmethod
    def supports_generation(self) -> bool:
        """Whether this provider supports text generation."""

    @abstractmethod
    async def embed(self, text: str) -> list[float]:
        """Generate embedding (raises NotImplementedError if not supported)."""

    @abstractmethod
    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        """Generate batch embeddings (raises NotImplementedError if not supported)."""

    @abstractmethod
    def get_dimension(self) -> int:
        """Get embedding dimension (raises NotImplementedError if not supported)."""

    @abstractmethod
    async def generate(self, prompt: str, max_tokens: int = 500) -> str:
        """Generate text (raises NotImplementedError if not supported)."""

    @abstractmethod
    async def close(self) -> None:
        """Close provider and release resources."""
```
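
To illustrate the optional-capability contract, here is a minimal sketch of an embedding-only provider. `Provider` is restated in compact form so the block runs on its own. `HashEmbeddingProvider` and its hashing scheme are hypothetical and are not the project's `simple.py` implementation.

```python
import asyncio
import hashlib
from abc import ABC, abstractmethod


class Provider(ABC):
    """Compact restatement of the ABC above, so this sketch is self-contained."""

    @property
    @abstractmethod
    def supports_embeddings(self) -> bool: ...

    @property
    @abstractmethod
    def supports_generation(self) -> bool: ...

    @abstractmethod
    async def embed(self, text: str) -> list[float]: ...

    @abstractmethod
    async def embed_batch(self, texts: list[str]) -> list[list[float]]: ...

    @abstractmethod
    def get_dimension(self) -> int: ...

    @abstractmethod
    async def generate(self, prompt: str, max_tokens: int = 500) -> str: ...

    @abstractmethod
    async def close(self) -> None: ...


class HashEmbeddingProvider(Provider):
    """Hypothetical embedding-only provider (deterministic, not semantic)."""

    def __init__(self, dimension: int = 8) -> None:
        self._dimension = dimension

    @property
    def supports_embeddings(self) -> bool:
        return True

    @property
    def supports_generation(self) -> bool:
        return False

    async def embed(self, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Cycle through the digest bytes and scale each one to [0, 1].
        return [digest[i % len(digest)] / 255 for i in range(self._dimension)]

    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        return [await self.embed(text) for text in texts]

    def get_dimension(self) -> int:
        return self._dimension

    async def generate(self, prompt: str, max_tokens: int = 500) -> str:
        # Unsupported capability: fail loudly instead of returning a dummy value.
        raise NotImplementedError(
            "HashEmbeddingProvider does not support text generation"
        )

    async def close(self) -> None:
        return None  # nothing to release in this sketch


async def main() -> None:
    provider = HashEmbeddingProvider(dimension=8)
    vectors = await provider.embed_batch(["hello", "world"])
    print(len(vectors), provider.get_dimension())
    # Callers check the capability flag before using an optional method.
    if provider.supports_generation:
        print(await provider.generate("Summarize this note"))
    await provider.close()


if __name__ == "__main__":
    asyncio.run(main())
```

Checking `supports_generation` before calling `generate()` keeps the `NotImplementedError` path for programming mistakes rather than normal control flow.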

### 2. Provider Registry

**Auto-Detection Priority** (`providers/registry.py`):
```python
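from nextcloud_mcp_server.providers.base import Provider


class ProviderRegistry:
    @staticmethod
    def create_provider() -> Provider:
        """Pick a provider by checking the environment in priority order."""
        # 1. Bedrock (AWS_REGION or BEDROCK_*_MODEL)
        # 2. Ollama (OLLAMA_BASE_URL)
        # 3. Simple (fallback)
        ...  # construction logic elided in this ADR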
```
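
The priority order above can be sketched as a pure function over an environment mapping. This is an illustration only: `detect_provider_name` is a hypothetical helper, and reading `BEDROCK_*_MODEL` as the two documented Bedrock model variables is an assumption.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Assumption: "BEDROCK_*_MODEL" means the two documented model variables.
BEDROCK_VARS = ("AWS_REGION", "BEDROCK_EMBEDDING_MODEL", "BEDROCK_GENERATION_MODEL")


def detect_provider_name(env: Mapping[str, str] | None = None) -> str:
    """Return "bedrock", "ollama", or "simple" following the priority order."""
    env = os.environ if env is None else env
    # 1. Bedrock wins if any of its variables is set to a non-empty value.
    if any(env.get(name) for name in BEDROCK_VARS):
        return "bedrock"
    # 2. Ollama is selected by its base URL.
    if env.get("OLLAMA_BASE_URL"):
        return "ollama"
    # 3. Otherwise fall back to the simple provider.
    return "simple"


if __name__ == "__main__":
    print(detect_provider_name({"OLLAMA_BASE_URL": "http://localhost:11434"}))
```

Taking the environment as a parameter keeps the detection logic testable without mutating `os.environ`.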

**Environment Variables:**

**Bedrock:**
- `AWS_REGION`: AWS region (e.g., "us-east-1")
- `AWS_ACCESS_KEY_ID`: AWS access key (optional, uses credential chain)
- `AWS_SECRET_ACCESS_KEY`: AWS secret key (optional)
- `BEDROCK_EMBEDDING_MODEL`: Model ID for embeddings (e.g., "amazon.titan-embed-text-v2:0")
- `BEDROCK_GENERATION_MODEL`: Model ID for text generation (e.g., "anthropic.claude-3-sonnet-20240229-v1:0")

**Ollama:**
- `OLLAMA_BASE_URL`: Ollama API base URL (e.g., "http://localhost:11434")
- `OLLAMA_EMBEDDING_MODEL`: Model for embeddings (default: "nomic-embed-text")
- `OLLAMA_GENERATION_MODEL`: Model for text generation (e.g., "llama3.2:1b")
- `OLLAMA_VERIFY_SSL`: Verify SSL certificates (default: "true")

**Simple (fallback, no configuration required):**
- `SIMPLE_EMBEDDING_DIMENSION`: Embedding dimension (default: 384)
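
As a sketch of how these variables and their documented defaults might be read into typed settings: `OllamaSettings`, `load_ollama_settings`, and `load_simple_dimension` are hypothetical names, and the set of strings accepted as "true" for `OLLAMA_VERIFY_SSL` is an assumption.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


def _parse_bool(raw: str) -> bool:
    # Assumption: these spellings count as true; anything else is false.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class OllamaSettings:
    base_url: str
    embedding_model: str
    generation_model: str | None
    verify_ssl: bool


def load_ollama_settings(env: Mapping[str, str] | None = None) -> OllamaSettings | None:
    """Return Ollama settings, or None when OLLAMA_BASE_URL is not set."""
    env = os.environ if env is None else env
    base_url = env.get("OLLAMA_BASE_URL")
    if not base_url:
        return None
    return OllamaSettings(
        base_url=base_url.rstrip("/"),
        embedding_model=env.get("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text"),
        generation_model=env.get("OLLAMA_GENERATION_MODEL") or None,
        verify_ssl=_parse_bool(env.get("OLLAMA_VERIFY_SSL", "true")),
    )


def load_simple_dimension(env: Mapping[str, str] | None = None) -> int:
    """Read SIMPLE_EMBEDDING_DIMENSION, defaulting to 384."""
    env = os.environ if env is None else env
    return int(env.get("SIMPLE_EMBEDDING_DIMENSION", "384"))


if __name__ == "__main__":
    print(load_ollama_settings({"OLLAMA_BASE_URL": "http://localhost:11434"}))
    print(load_simple_dimension({}))
```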

### 3. Backward Compatibility

**Old Code Continues to Work:**
```python
# Old way (still works)
from nextcloud_mcp_server.embedding import get_embedding_service

service = get_embedding_service()  # Returns singleton Provider
embeddings = await service.embed_batch(texts)
```

**New Way (recommended):**
```python
# New way (cleaner)
from nextcloud_mcp_server.providers import get_provider

provider = get_provider()  # Returns singleton Provider
embeddings = await provider.embed_batch(texts)

# Can also use generation if provider supports it
if provider.supports_generation:
    text = await provider.generate("prompt")
```

**Migration Path:**
- `embedding/service.py` now wraps `providers.get_provider()` for compatibility
- `tests/rag_evaluation/llm_providers.py` now uses unified providers
- Old imports still work, marked as deprecated in docstrings
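
The compatibility shim described in the migration path could look roughly like the following. This is a sketch under stated assumptions: `_StubProvider` stands in for whatever the registry would build, and the module-level singleton is one plausible way to implement "returns singleton Provider", not necessarily the project's actual code.

```python
from __future__ import annotations

import asyncio


class _StubProvider:
    """Hypothetical stand-in for the provider the registry would create."""

    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        return [[0.0, 0.0, 0.0] for _ in texts]


_provider: _StubProvider | None = None


def get_provider() -> _StubProvider:
    """Return the process-wide provider, creating it on first use."""
    global _provider
    if _provider is None:
        _provider = _StubProvider()
    return _provider


def get_embedding_service() -> _StubProvider:
    """Deprecated: use get_provider(). Kept so old imports keep working."""
    return get_provider()


if __name__ == "__main__":
    old_style = get_embedding_service()
    new_style = get_provider()
    print(old_style is new_style)
    print(len(asyncio.run(old_style.embed_batch(["a", "b"]))))
```

Because the old entry point only delegates, both call sites share one provider instance and any resources it holds.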

### 4. Amazon Bedrock Implementation

**Features:**
- Supports both embeddings and text generation
- Model-specific request/response handling for:
  - Titan Embed (amazon.titan-embed-text-*)
  - Cohere Embed (cohere.embed-*)
  - Claude (anthropic.claude-*)
  - Llama (meta.llama3-*)
  - Titan Text (amazon.titan-text-*)
  - Mistral (mistral.*)
- Uses boto3 bedrock-runtime client
- Graceful degradation if boto3 is not installed
- Async implementation matching existing patterns
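
Two of these points, the optional boto3 import and the async wrapper around a blocking client, can be sketched as follows. The helper names and `FakeBedrockClient` are hypothetical, and running the call through `asyncio.to_thread` is an assumption about how the async behavior could be achieved, not a description of the actual provider.

```python
import asyncio
import io
import json

try:
    import boto3  # optional; only needed when Bedrock is actually used
except ImportError:
    boto3 = None


def make_bedrock_client(region: str):
    """Create a bedrock-runtime client, or fail with an actionable message."""
    if boto3 is None:
        raise RuntimeError(
            "Bedrock support requires boto3; install it with `pip install boto3`."
        )
    return boto3.client("bedrock-runtime", region_name=region)


async def invoke_model_async(client, model_id: str, body: dict) -> dict:
    """Run the blocking invoke_model() call in a worker thread."""
    response = await asyncio.to_thread(
        client.invoke_model, modelId=model_id, body=json.dumps(body)
    )
    # The response body is a stream; read it fully and decode the JSON payload.
    return json.loads(response["body"].read())


class FakeBedrockClient:
    """Hypothetical test double mimicking the shape of an invoke_model() response."""

    def invoke_model(self, *, modelId: str, body: str) -> dict:
        payload = {"echo": json.loads(body), "modelId": modelId}
        return {"body": io.BytesIO(json.dumps(payload).encode("utf-8"))}


if __name__ == "__main__":
    result = asyncio.run(
        invoke_model_async(
            FakeBedrockClient(), "amazon.titan-embed-text-v2:0", {"inputText": "hello"}
        )
    )
    print(result["echo"])
```

Deferring the boto3 check to client creation means importing the module never fails; only users who select Bedrock without boto3 see the error.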

**Model-Specific Handling:**
```python
# Bedrock embedding request (Titan)
{"inputText": text}

# Bedrock generation request (Claude)
{
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": max_tokens,
    "temperature": 0.7,
    "messages": [{"role": "user", "content": prompt}]
}
```
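
A minimal sketch of dispatching on the model ID prefix to build the two request shapes shown above. The function names are hypothetical, only the Titan Embed and Claude families are covered, and the response parsing assumes the vector sits under `embedding` for Titan and that Claude returns a list of content blocks.

```python
import json


def build_embedding_body(model_id: str, text: str) -> str:
    """Serialize an embedding request for the given model family."""
    if model_id.startswith("amazon.titan-embed-text"):
        return json.dumps({"inputText": text})
    raise ValueError(f"no embedding request format sketched for {model_id!r}")


def build_generation_body(model_id: str, prompt: str, max_tokens: int = 500) -> str:
    """Serialize a text generation request for the given model family."""
    if model_id.startswith("anthropic.claude"):
        return json.dumps(
            {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": max_tokens,
                "temperature": 0.7,
                "messages": [{"role": "user", "content": prompt}],
            }
        )
    raise ValueError(f"no generation request format sketched for {model_id!r}")


def parse_titan_embedding(payload: dict) -> list[float]:
    # Assumption: Titan embedding responses carry the vector under "embedding".
    return [float(x) for x in payload["embedding"]]


def parse_claude_text(payload: dict) -> str:
    # Assumption: Claude responses hold a list of content blocks; join the text ones.
    return "".join(
        block["text"]
        for block in payload.get("content", [])
        if block.get("type") == "text"
    )


if __name__ == "__main__":
    print(build_embedding_body("amazon.titan-embed-text-v2:0", "hello"))
    print(parse_claude_text({"content": [{"type": "text", "text": "hi"}]}))
```

Raising `ValueError` for unknown prefixes makes an unsupported model ID fail at request-building time rather than as an opaque service error.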

## Consequences

### Positive

1. **Sustainable Provider Additions**
   - New providers only need to implement `Provider` ABC
   - Auto-detection via environment variables
   - No modifications to existing code required

2. **Code Consolidation**
   - Single provider interface instead of two
   - Unified configuration pattern
   - Eliminated duplication

3. **Better Extensibility**
   - Providers can support one or both capabilities
   - Clear capability detection via properties
   - Registry pattern simplifies auto-detection

4. **Improved Testing**
   - RAG evaluation can use any provider (Ollama, Anthropic, Bedrock)
   - Comprehensive unit tests for all providers
   - Mocked boto3 tests for Bedrock

5. **Production-Ready Bedrock Support**
   - Full embedding and generation support
   - Multiple model families supported
   - AWS credential chain integration

### Neutral

1. **Optional Boto3 Dependency**
   - boto3 is a dev dependency only (not required for core functionality)
   - Bedrock provider fails gracefully if boto3 is not installed
   - Users who want Bedrock must `pip install boto3`

2. **Capability Properties**
   - All providers must implement capability properties
   - Methods raise `NotImplementedError` if capability not supported
   - Clear error messages guide users to alternatives

### Negative

1. **Migration Effort**
   - Existing code should eventually migrate to the new imports (optional for now; the old imports remain backward compatible)
   - Documentation needs updating
   - Users must learn new environment variables

2. **Increased Complexity**
   - Provider base class has more methods (embedding + generation)
   - More environment variables to configure
   - Capability detection adds runtime checks

## Implementation

### Files Created

**New Provider Infrastructure:**
- `nextcloud_mcp_server/providers/__init__.py`
- `nextcloud_mcp_server/providers/base.py`
- `nextcloud_mcp_server/providers/registry.py`
- `nextcloud_mcp_server/providers/ollama.py`
- `nextcloud_mcp_server/providers/anthropic.py`
- `nextcloud_mcp_server/providers/bedrock.py`
- `nextcloud_mcp_server/providers/simple.py`

**Tests:**
- `tests/unit/providers/__init__.py`
- `tests/unit/providers/test_bedrock.py` (9 unit tests)

**Documentation:**
- `docs/ADR-015-unified-provider-architecture.md` (this file)

### Files Modified

**Backward Compatibility:**
- `nextcloud_mcp_server/embedding/service.py` - Now wraps `get_provider()`
- `tests/rag_evaluation/llm_providers.py` - Uses unified providers

**Dependencies:**
- `pyproject.toml` - Added `boto3>=1.35.0` to dev dependencies

### Testing Results

- **Unit Tests:** 127 passed (including 9 new Bedrock tests)
- **Type Checking:** All checks passed (ty)
- **Linting:** All checks passed (ruff)
- **Backward Compatibility:** Verified - existing embedding tests work

## Alternatives Considered

### Alternative 1: Keep Separate Provider Systems

**Pros:**
- No refactoring needed
- Simpler short-term

**Cons:**
- Bedrock would need to be implemented twice
- Continued code duplication
- No long-term scalability

**Decision:** Rejected - technical debt would continue to grow

### Alternative 2: Separate Embedding and Generation Providers

Use composition instead of unified interface:
```python
class CombinedProvider:
    def __init__(self, embedding: EmbeddingProvider, generation: LLMProvider):
        self.embedding = embedding
        self.generation = generation
```

**Pros:**
- Clearer separation of concerns
- Simpler individual providers

**Cons:**
- Bedrock and Ollama naturally do both, so splitting them would be an artificial separation
- More complex configuration (two providers to configure)
- More boilerplate code

**Decision:** Rejected - unified interface better matches provider capabilities

### Alternative 3: Plugin System

Dynamic provider registration via entry points:
```python
# setup.py
entry_points={
    'nextcloud_mcp.providers': [
        'ollama = nextcloud_mcp_server.providers.ollama:OllamaProvider',
        'bedrock = nextcloud_mcp_server.providers.bedrock:BedrockProvider',
    ]
}
```

**Pros:**
- Most extensible
- Third-party providers possible

**Cons:**
- Over-engineered for current needs
- Added complexity
- No immediate benefit

**Decision:** Deferred - can add later if needed

## Future Work

1. **Additional Providers**
   - OpenAI (embeddings + generation)
   - Cohere (embeddings + generation)
   - Google Vertex AI
   - Azure OpenAI

2. **Provider Features**
   - Streaming generation support
   - Batch API optimization (when available)
   - Model-specific optimizations
   - Cost tracking and metrics

3. **Configuration Improvements**
   - Provider profiles (development, production)
   - Model aliasing (e.g., "small", "large")
   - Fallback provider chains

4. **Testing**
   - Integration tests with real Bedrock endpoints
   - Performance benchmarking across providers
   - Cost comparison analysis

## References

- [boto3 Bedrock Runtime Documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime.html)
- [Amazon Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html)
- ADR-003: Vector Database and Semantic Search
- ADR-008: MCP Sampling for Semantic Search
- ADR-013: RAG Evaluation Framework