Refactored LLM provider infrastructure to support sustainable additions of new providers with both embedding and text generation capabilities.
## Major Changes
### Unified Provider Architecture (ADR-015)
- Created `nextcloud_mcp_server/providers/` with unified Provider ABC
- Providers now support optional capabilities (embeddings and/or generation)
- Auto-detection registry with priority: Bedrock → Ollama → Simple
- Backward compatible - existing code continues to work
### New Providers
- **BedrockProvider**: Full Amazon Bedrock integration
- Embeddings: Titan Embed, Cohere Embed models
- Generation: Claude, Llama, Titan Text, Mistral models
- Model-specific request/response handling
- AWS credential chain integration
- **OllamaProvider**: Migrated with both capabilities support
- **AnthropicProvider**: Moved from test code to production providers
- **SimpleProvider**: Migrated in-memory fallback provider
### Breaking Changes
None - full backward compatibility maintained:
- `embedding.get_embedding_service()` still works
- RAG evaluation tests updated to use unified providers
- All existing tests pass (127 unit tests)
### Testing
- Added 9 comprehensive Bedrock unit tests with mocked boto3
- All existing unit tests pass
- Type checking (ty) and linting (ruff) pass
- Verified backward compatibility
### Documentation
- `docs/ADR-015-unified-provider-architecture.md`: Comprehensive ADR
- `docs/bedrock-setup.md`: AWS setup guide with IAM permissions
- `CLAUDE.md`: Updated with provider architecture section
### Dependencies
- Added `boto3>=1.35.0` to dev dependencies (optional)
## Environment Variables
### Bedrock
- `AWS_REGION`: AWS region (e.g., "us-east-1")
- `BEDROCK_EMBEDDING_MODEL`: Model ID for embeddings
- `BEDROCK_GENERATION_MODEL`: Model ID for generation
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`: Optional credentials
### Ollama
- `OLLAMA_BASE_URL`: API URL
- `OLLAMA_EMBEDDING_MODEL`: Embedding model (default: "nomic-embed-text")
- `OLLAMA_GENERATION_MODEL`: Generation model
## AWS Bedrock Permissions Required
Minimal IAM policy:
```json
{
"Effect": "Allow",
"Action": ["bedrock:InvokeModel"],
"Resource": ["arn:aws:bedrock:*::foundation-model/*"]
}
```
See `docs/bedrock-setup.md` for detailed setup instructions.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
12 KiB
ADR-015: Unified Provider Architecture for Embeddings and Text Generation
Status: Accepted Date: 2025-01-16 Deciders: Development Team Related: ADR-003 (Vector Database), ADR-008 (MCP Sampling), ADR-013 (RAG Evaluation)
Context
Prior to this refactoring, the codebase had two separate provider systems:
-
Embedding Providers (
nextcloud_mcp_server/embedding/)- Used
EmbeddingProviderABC with methods:embed(),embed_batch(),get_dimension() - Had auto-detection via
EmbeddingService._detect_provider() - Used for semantic search and vector indexing (production)
- Used
-
LLM Providers (
tests/rag_evaluation/llm_providers.py)- Used
LLMProviderProtocol with method:generate() - Had separate factory function
create_llm_provider() - Used only for RAG evaluation tests (not production)
- Used
This fragmentation created several problems:
Problems with Dual Provider Systems
-
Code Duplication
- Ollama configuration appeared in both
embedding/service.pyandtests/rag_evaluation/llm_providers.py - Similar provider detection logic in multiple places
- Separate singleton patterns for each system
- Ollama configuration appeared in both
-
Limited Extensibility
- Hard-coded provider detection in
EmbeddingService._detect_provider() - No support for providers that offer both capabilities (like Bedrock)
- Adding new providers required modifying multiple files
- Hard-coded provider detection in
-
Inconsistent Patterns
- BM25 provider didn't follow
EmbeddingProviderABC - Different method names across providers (
embedvsencode) - ABC vs Protocol for type checking
- BM25 provider didn't follow
-
Difficult Scaling
- Adding Amazon Bedrock (our third provider) would exacerbate all issues
- No clear path for future providers (OpenAI, Cohere, etc.)
Amazon Bedrock Requirements
Bedrock naturally supports both embeddings and text generation:
- Embeddings:
amazon.titan-embed-text-v1/v2,cohere.embed-* - Text Generation:
anthropic.claude-*,meta.llama3-*,amazon.titan-text-* - Unified API: Single
invoke_model()method via bedrock-runtime
This made it the perfect opportunity to establish a unified provider architecture.
Decision
We refactored the provider infrastructure to use a unified Provider ABC with optional capabilities:
1. Unified Provider Interface
New Structure:
nextcloud_mcp_server/providers/
├── __init__.py
├── base.py # Provider ABC with optional capabilities
├── registry.py # Auto-detection and factory
├── ollama.py # Supports both embedding + generation
├── anthropic.py # Generation only
├── bedrock.py # Supports both embedding + generation
└── simple.py # Embedding only (testing fallback)
Base Class (providers/base.py):
class Provider(ABC):
@property
@abstractmethod
def supports_embeddings(self) -> bool:
"""Whether this provider supports embedding generation."""
pass
@property
@abstractmethod
def supports_generation(self) -> bool:
"""Whether this provider supports text generation."""
pass
@abstractmethod
async def embed(self, text: str) -> list[float]:
"""Generate embedding (raises NotImplementedError if not supported)."""
pass
@abstractmethod
async def embed_batch(self, texts: list[str]) -> list[list[float]]:
"""Generate batch embeddings (raises NotImplementedError if not supported)."""
pass
@abstractmethod
def get_dimension(self) -> int:
"""Get embedding dimension (raises NotImplementedError if not supported)."""
pass
@abstractmethod
async def generate(self, prompt: str, max_tokens: int = 500) -> str:
"""Generate text (raises NotImplementedError if not supported)."""
pass
@abstractmethod
async def close(self) -> None:
"""Close provider and release resources."""
pass
2. Provider Registry
Auto-Detection Priority (providers/registry.py):
class ProviderRegistry:
@staticmethod
def create_provider() -> Provider:
# 1. Bedrock (AWS_REGION or BEDROCK_*_MODEL)
# 2. Ollama (OLLAMA_BASE_URL)
# 3. Simple (fallback)
Environment Variables:
Bedrock:
AWS_REGION: AWS region (e.g., "us-east-1")AWS_ACCESS_KEY_ID: AWS access key (optional, uses credential chain)AWS_SECRET_ACCESS_KEY: AWS secret key (optional)BEDROCK_EMBEDDING_MODEL: Model ID for embeddings (e.g., "amazon.titan-embed-text-v2:0")BEDROCK_GENERATION_MODEL: Model ID for text generation (e.g., "anthropic.claude-3-sonnet-20240229-v1:0")
Ollama:
OLLAMA_BASE_URL: Ollama API base URL (e.g., "http://localhost:11434")OLLAMA_EMBEDDING_MODEL: Model for embeddings (default: "nomic-embed-text")OLLAMA_GENERATION_MODEL: Model for text generation (e.g., "llama3.2:1b")OLLAMA_VERIFY_SSL: Verify SSL certificates (default: "true")
Simple (no configuration, fallback):
SIMPLE_EMBEDDING_DIMENSION: Embedding dimension (default: 384)
3. Backward Compatibility
Old Code Continues to Work:
# Old way (still works)
from nextcloud_mcp_server.embedding import get_embedding_service
service = get_embedding_service() # Returns singleton Provider
embeddings = await service.embed_batch(texts)
New Way (recommended):
# New way (cleaner)
from nextcloud_mcp_server.providers import get_provider
provider = get_provider() # Returns singleton Provider
embeddings = await provider.embed_batch(texts)
# Can also use generation if provider supports it
if provider.supports_generation:
text = await provider.generate("prompt")
Migration Path:
embedding/service.pynow wrapsproviders.get_provider()for compatibilitytests/rag_evaluation/llm_providers.pynow uses unified providers- Old imports still work, marked as deprecated in docstrings
4. Amazon Bedrock Implementation
Features:
- Supports both embeddings and text generation
- Model-specific request/response handling for:
- Titan Embed (amazon.titan-embed-text-*)
- Cohere Embed (cohere.embed-*)
- Claude (anthropic.claude-*)
- Llama (meta.llama3-*)
- Titan Text (amazon.titan-text-*)
- Mistral (mistral.*)
- Uses boto3 bedrock-runtime client
- Graceful degradation if boto3 not installed
- Async implementation matching existing patterns
Model-Specific Handling:
# Bedrock embedding request (Titan)
{"inputText": text}
# Bedrock generation request (Claude)
{
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": max_tokens,
"temperature": 0.7,
"messages": [{"role": "user", "content": prompt}]
}
Consequences
Positive
-
Sustainable Provider Additions
- New providers only need to implement
ProviderABC - Auto-detection via environment variables
- No modifications to existing code required
- New providers only need to implement
-
Code Consolidation
- Single provider interface instead of two
- Unified configuration pattern
- Eliminated duplication
-
Better Extensibility
- Providers can support one or both capabilities
- Clear capability detection via properties
- Registry pattern simplifies auto-detection
-
Improved Testing
- RAG evaluation can use any provider (Ollama, Anthropic, Bedrock)
- Comprehensive unit tests for all providers
- Mocked boto3 tests for Bedrock
-
Production-Ready Bedrock Support
- Full embedding and generation support
- Multiple model families supported
- AWS credential chain integration
Neutral
-
Optional Boto3 Dependency
- boto3 is dev dependency only (not required for core functionality)
- Bedrock provider gracefully fails if boto3 not installed
- Users who want Bedrock must
pip install boto3
-
Capability Properties
- All providers must implement capability properties
- Methods raise
NotImplementedErrorif capability not supported - Clear error messages guide users to alternatives
Negative
-
Migration Effort
- Existing code must be migrated to new imports (optional, backward compatible)
- Documentation needs updating
- Users must learn new environment variables
-
Increased Complexity
- Provider base class has more methods (embedding + generation)
- More environment variables to configure
- Capability detection adds runtime checks
Implementation
Files Created
New Provider Infrastructure:
nextcloud_mcp_server/providers/__init__.pynextcloud_mcp_server/providers/base.pynextcloud_mcp_server/providers/registry.pynextcloud_mcp_server/providers/ollama.pynextcloud_mcp_server/providers/anthropic.pynextcloud_mcp_server/providers/bedrock.pynextcloud_mcp_server/providers/simple.py
Tests:
tests/unit/providers/__init__.pytests/unit/providers/test_bedrock.py(9 unit tests)
Documentation:
docs/ADR-015-unified-provider-architecture.md(this file)
Files Modified
Backward Compatibility:
nextcloud_mcp_server/embedding/service.py- Now wrapsget_provider()tests/rag_evaluation/llm_providers.py- Uses unified providers
Dependencies:
pyproject.toml- Addedboto3>=1.35.0to dev dependencies
Testing Results
Unit Tests: 127 passed (including 9 new Bedrock tests) Type Checking: All checks passed (ty) Linting: All checks passed (ruff) Backward Compatibility: Verified - existing embedding tests work
Alternatives Considered
Alternative 1: Keep Separate Provider Systems
Pros:
- No refactoring needed
- Simpler short-term
Cons:
- Bedrock would need to be implemented twice
- Continued code duplication
- No long-term scalability
Decision: Rejected - technical debt would continue to grow
Alternative 2: Separate Embedding and Generation Providers
Use composition instead of unified interface:
class CombinedProvider:
def __init__(self, embedding: EmbeddingProvider, generation: LLMProvider):
self.embedding = embedding
self.generation = generation
Pros:
- Clearer separation of concerns
- Simpler individual providers
Cons:
- Bedrock and Ollama naturally do both - artificial separation
- More complex configuration (two providers to configure)
- More boilerplate code
Decision: Rejected - unified interface better matches provider capabilities
Alternative 3: Plugin System
Dynamic provider registration via entry points:
# setup.py
entry_points={
'nextcloud_mcp.providers': [
'ollama = nextcloud_mcp_server.providers.ollama:OllamaProvider',
'bedrock = nextcloud_mcp_server.providers.bedrock:BedrockProvider',
]
}
Pros:
- Most extensible
- Third-party providers possible
Cons:
- Over-engineered for current needs
- Added complexity
- No immediate benefit
Decision: Deferred - can add later if needed
Future Work
-
Additional Providers
- OpenAI (embeddings + generation)
- Cohere (embeddings + generation)
- Google Vertex AI
- Azure OpenAI
-
Provider Features
- Streaming generation support
- Batch API optimization (when available)
- Model-specific optimizations
- Cost tracking and metrics
-
Configuration Improvements
- Provider profiles (development, production)
- Model aliasing (e.g., "small", "large")
- Fallback provider chains
-
Testing
- Integration tests with real Bedrock endpoints
- Performance benchmarking across providers
- Cost comparison analysis
References
- boto3 Bedrock Runtime Documentation
- Amazon Bedrock User Guide
- ADR-003: Vector Database and Semantic Search
- ADR-008: MCP Sampling for Semantic Search
- ADR-013: RAG Evaluation Framework