feat: add unified provider architecture with Amazon Bedrock support
Refactored LLM provider infrastructure to support sustainable additions of new providers with both embedding and text generation capabilities.
## Major Changes
### Unified Provider Architecture (ADR-015)
- Created `nextcloud_mcp_server/providers/` with unified Provider ABC
- Providers now support optional capabilities (embeddings and/or generation)
- Auto-detection registry with priority: Bedrock → Ollama → Simple
- Backward compatible - existing code continues to work
### New Providers
- **BedrockProvider**: Full Amazon Bedrock integration
- Embeddings: Titan Embed, Cohere Embed models
- Generation: Claude, Llama, Titan Text, Mistral models
- Model-specific request/response handling
- AWS credential chain integration
- **OllamaProvider**: Migrated with both capabilities support
- **AnthropicProvider**: Moved from test code to production providers
- **SimpleProvider**: Migrated in-memory fallback provider
### Breaking Changes
None - full backward compatibility maintained:
- `embedding.get_embedding_service()` still works
- RAG evaluation tests updated to use unified providers
- All existing tests pass (127 unit tests)
### Testing
- Added 9 comprehensive Bedrock unit tests with mocked boto3
- All existing unit tests pass
- Type checking (ty) and linting (ruff) pass
- Verified backward compatibility
### Documentation
- `docs/ADR-015-unified-provider-architecture.md`: Comprehensive ADR
- `docs/bedrock-setup.md`: AWS setup guide with IAM permissions
- `CLAUDE.md`: Updated with provider architecture section
### Dependencies
- Added `boto3>=1.35.0` to dev dependencies (optional)
## Environment Variables
### Bedrock
- `AWS_REGION`: AWS region (e.g., "us-east-1")
- `BEDROCK_EMBEDDING_MODEL`: Model ID for embeddings
- `BEDROCK_GENERATION_MODEL`: Model ID for generation
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`: Optional credentials
### Ollama
- `OLLAMA_BASE_URL`: API URL
- `OLLAMA_EMBEDDING_MODEL`: Embedding model (default: "nomic-embed-text")
- `OLLAMA_GENERATION_MODEL`: Generation model
## AWS Bedrock Permissions Required
Minimal IAM policy:
```json
{
"Effect": "Allow",
"Action": ["bedrock:InvokeModel"],
"Resource": ["arn:aws:bedrock:*::foundation-model/*"]
}
```
See `docs/bedrock-setup.md` for detailed setup instructions.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,380 @@
|
||||
# ADR-015: Unified Provider Architecture for Embeddings and Text Generation
|
||||
|
||||
**Status:** Accepted
|
||||
**Date:** 2025-01-16
|
||||
**Deciders:** Development Team
|
||||
**Related:** ADR-003 (Vector Database), ADR-008 (MCP Sampling), ADR-013 (RAG Evaluation)
|
||||
|
||||
## Context
|
||||
|
||||
Prior to this refactoring, the codebase had two separate provider systems:
|
||||
|
||||
1. **Embedding Providers** (`nextcloud_mcp_server/embedding/`)
|
||||
- Used `EmbeddingProvider` ABC with methods: `embed()`, `embed_batch()`, `get_dimension()`
|
||||
- Had auto-detection via `EmbeddingService._detect_provider()`
|
||||
- Used for semantic search and vector indexing (production)
|
||||
|
||||
2. **LLM Providers** (`tests/rag_evaluation/llm_providers.py`)
|
||||
- Used `LLMProvider` Protocol with method: `generate()`
|
||||
- Had separate factory function `create_llm_provider()`
|
||||
- Used only for RAG evaluation tests (not production)
|
||||
|
||||
This fragmentation created several problems:
|
||||
|
||||
### Problems with Dual Provider Systems
|
||||
|
||||
1. **Code Duplication**
|
||||
- Ollama configuration appeared in both `embedding/service.py` and `tests/rag_evaluation/llm_providers.py`
|
||||
- Similar provider detection logic in multiple places
|
||||
- Separate singleton patterns for each system
|
||||
|
||||
2. **Limited Extensibility**
|
||||
- Hard-coded provider detection in `EmbeddingService._detect_provider()`
|
||||
- No support for providers that offer both capabilities (like Bedrock)
|
||||
- Adding new providers required modifying multiple files
|
||||
|
||||
3. **Inconsistent Patterns**
|
||||
- BM25 provider didn't follow `EmbeddingProvider` ABC
|
||||
- Different method names across providers (`embed` vs `encode`)
|
||||
- ABC vs Protocol for type checking
|
||||
|
||||
4. **Difficult Scaling**
|
||||
- Adding Amazon Bedrock (our third provider) would exacerbate all issues
|
||||
- No clear path for future providers (OpenAI, Cohere, etc.)
|
||||
|
||||
### Amazon Bedrock Requirements
|
||||
|
||||
Bedrock naturally supports **both** embeddings and text generation:
|
||||
- **Embeddings**: `amazon.titan-embed-text-v1/v2`, `cohere.embed-*`
|
||||
- **Text Generation**: `anthropic.claude-*`, `meta.llama3-*`, `amazon.titan-text-*`
|
||||
- **Unified API**: Single `invoke_model()` method via bedrock-runtime
|
||||
|
||||
This made it the perfect opportunity to establish a unified provider architecture.
|
||||
|
||||
## Decision
|
||||
|
||||
We refactored the provider infrastructure to use a **unified Provider ABC** with optional capabilities:
|
||||
|
||||
### 1. Unified Provider Interface
|
||||
|
||||
**New Structure:**
|
||||
```
|
||||
nextcloud_mcp_server/providers/
|
||||
├── __init__.py
|
||||
├── base.py # Provider ABC with optional capabilities
|
||||
├── registry.py # Auto-detection and factory
|
||||
├── ollama.py # Supports both embedding + generation
|
||||
├── anthropic.py # Generation only
|
||||
├── bedrock.py # Supports both embedding + generation
|
||||
└── simple.py # Embedding only (testing fallback)
|
||||
```
|
||||
|
||||
**Base Class (`providers/base.py`):**
|
||||
```python
|
||||
class Provider(ABC):
|
||||
@property
|
||||
@abstractmethod
|
||||
def supports_embeddings(self) -> bool:
|
||||
"""Whether this provider supports embedding generation."""
|
||||
pass
|
||||
|
||||
@property
|
||||
@abstractmethod
|
||||
def supports_generation(self) -> bool:
|
||||
"""Whether this provider supports text generation."""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
async def embed(self, text: str) -> list[float]:
|
||||
"""Generate embedding (raises NotImplementedError if not supported)."""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
async def embed_batch(self, texts: list[str]) -> list[list[float]]:
|
||||
"""Generate batch embeddings (raises NotImplementedError if not supported)."""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def get_dimension(self) -> int:
|
||||
"""Get embedding dimension (raises NotImplementedError if not supported)."""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
async def generate(self, prompt: str, max_tokens: int = 500) -> str:
|
||||
"""Generate text (raises NotImplementedError if not supported)."""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
async def close(self) -> None:
|
||||
"""Close provider and release resources."""
|
||||
pass
|
||||
```
|
||||
|
||||
### 2. Provider Registry
|
||||
|
||||
**Auto-Detection Priority** (`providers/registry.py`):
|
||||
```python
|
||||
class ProviderRegistry:
|
||||
@staticmethod
|
||||
def create_provider() -> Provider:
|
||||
# 1. Bedrock (AWS_REGION or BEDROCK_*_MODEL)
|
||||
# 2. Ollama (OLLAMA_BASE_URL)
|
||||
# 3. Simple (fallback)
|
||||
```
|
||||
|
||||
**Environment Variables:**
|
||||
|
||||
**Bedrock:**
|
||||
- `AWS_REGION`: AWS region (e.g., "us-east-1")
|
||||
- `AWS_ACCESS_KEY_ID`: AWS access key (optional, uses credential chain)
|
||||
- `AWS_SECRET_ACCESS_KEY`: AWS secret key (optional)
|
||||
- `BEDROCK_EMBEDDING_MODEL`: Model ID for embeddings (e.g., "amazon.titan-embed-text-v2:0")
|
||||
- `BEDROCK_GENERATION_MODEL`: Model ID for text generation (e.g., "anthropic.claude-3-sonnet-20240229-v1:0")
|
||||
|
||||
**Ollama:**
|
||||
- `OLLAMA_BASE_URL`: Ollama API base URL (e.g., "http://localhost:11434")
|
||||
- `OLLAMA_EMBEDDING_MODEL`: Model for embeddings (default: "nomic-embed-text")
|
||||
- `OLLAMA_GENERATION_MODEL`: Model for text generation (e.g., "llama3.2:1b")
|
||||
- `OLLAMA_VERIFY_SSL`: Verify SSL certificates (default: "true")
|
||||
|
||||
**Simple (no configuration, fallback):**
|
||||
- `SIMPLE_EMBEDDING_DIMENSION`: Embedding dimension (default: 384)
|
||||
|
||||
### 3. Backward Compatibility
|
||||
|
||||
**Old Code Continues to Work:**
|
||||
```python
|
||||
# Old way (still works)
|
||||
from nextcloud_mcp_server.embedding import get_embedding_service
|
||||
|
||||
service = get_embedding_service() # Returns singleton Provider
|
||||
embeddings = await service.embed_batch(texts)
|
||||
```
|
||||
|
||||
**New Way (recommended):**
|
||||
```python
|
||||
# New way (cleaner)
|
||||
from nextcloud_mcp_server.providers import get_provider
|
||||
|
||||
provider = get_provider() # Returns singleton Provider
|
||||
embeddings = await provider.embed_batch(texts)
|
||||
|
||||
# Can also use generation if provider supports it
|
||||
if provider.supports_generation:
|
||||
text = await provider.generate("prompt")
|
||||
```
|
||||
|
||||
**Migration Path:**
|
||||
- `embedding/service.py` now wraps `providers.get_provider()` for compatibility
|
||||
- `tests/rag_evaluation/llm_providers.py` now uses unified providers
|
||||
- Old imports still work, marked as deprecated in docstrings
|
||||
|
||||
### 4. Amazon Bedrock Implementation
|
||||
|
||||
**Features:**
|
||||
- Supports both embeddings and text generation
|
||||
- Model-specific request/response handling for:
|
||||
- Titan Embed (amazon.titan-embed-text-*)
|
||||
- Cohere Embed (cohere.embed-*)
|
||||
- Claude (anthropic.claude-*)
|
||||
- Llama (meta.llama3-*)
|
||||
- Titan Text (amazon.titan-text-*)
|
||||
- Mistral (mistral.*)
|
||||
- Uses boto3 bedrock-runtime client
|
||||
- Graceful degradation if boto3 not installed
|
||||
- Async implementation matching existing patterns
|
||||
|
||||
**Model-Specific Handling:**
|
||||
```python
|
||||
# Bedrock embedding request (Titan)
|
||||
{"inputText": text}
|
||||
|
||||
# Bedrock generation request (Claude)
|
||||
{
|
||||
"anthropic_version": "bedrock-2023-05-31",
|
||||
"max_tokens": max_tokens,
|
||||
"temperature": 0.7,
|
||||
"messages": [{"role": "user", "content": prompt}]
|
||||
}
|
||||
```
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
|
||||
1. **Sustainable Provider Additions**
|
||||
- New providers only need to implement `Provider` ABC
|
||||
- Auto-detection via environment variables
|
||||
- No modifications to existing code required
|
||||
|
||||
2. **Code Consolidation**
|
||||
- Single provider interface instead of two
|
||||
- Unified configuration pattern
|
||||
- Eliminated duplication
|
||||
|
||||
3. **Better Extensibility**
|
||||
- Providers can support one or both capabilities
|
||||
- Clear capability detection via properties
|
||||
- Registry pattern simplifies auto-detection
|
||||
|
||||
4. **Improved Testing**
|
||||
- RAG evaluation can use any provider (Ollama, Anthropic, Bedrock)
|
||||
- Comprehensive unit tests for all providers
|
||||
- Mocked boto3 tests for Bedrock
|
||||
|
||||
5. **Production-Ready Bedrock Support**
|
||||
- Full embedding and generation support
|
||||
- Multiple model families supported
|
||||
- AWS credential chain integration
|
||||
|
||||
### Neutral
|
||||
|
||||
1. **Optional Boto3 Dependency**
|
||||
- boto3 is dev dependency only (not required for core functionality)
|
||||
- Bedrock provider gracefully fails if boto3 not installed
|
||||
- Users who want Bedrock must `pip install boto3`
|
||||
|
||||
2. **Capability Properties**
|
||||
- All providers must implement capability properties
|
||||
- Methods raise `NotImplementedError` if capability not supported
|
||||
- Clear error messages guide users to alternatives
|
||||
|
||||
### Negative
|
||||
|
||||
1. **Migration Effort**
|
||||
- Existing code must be migrated to new imports (optional, backward compatible)
|
||||
- Documentation needs updating
|
||||
- Users must learn new environment variables
|
||||
|
||||
2. **Increased Complexity**
|
||||
- Provider base class has more methods (embedding + generation)
|
||||
- More environment variables to configure
|
||||
- Capability detection adds runtime checks
|
||||
|
||||
## Implementation
|
||||
|
||||
### Files Created
|
||||
|
||||
**New Provider Infrastructure:**
|
||||
- `nextcloud_mcp_server/providers/__init__.py`
|
||||
- `nextcloud_mcp_server/providers/base.py`
|
||||
- `nextcloud_mcp_server/providers/registry.py`
|
||||
- `nextcloud_mcp_server/providers/ollama.py`
|
||||
- `nextcloud_mcp_server/providers/anthropic.py`
|
||||
- `nextcloud_mcp_server/providers/bedrock.py`
|
||||
- `nextcloud_mcp_server/providers/simple.py`
|
||||
|
||||
**Tests:**
|
||||
- `tests/unit/providers/__init__.py`
|
||||
- `tests/unit/providers/test_bedrock.py` (9 unit tests)
|
||||
|
||||
**Documentation:**
|
||||
- `docs/ADR-015-unified-provider-architecture.md` (this file)
|
||||
|
||||
### Files Modified
|
||||
|
||||
**Backward Compatibility:**
|
||||
- `nextcloud_mcp_server/embedding/service.py` - Now wraps `get_provider()`
|
||||
- `tests/rag_evaluation/llm_providers.py` - Uses unified providers
|
||||
|
||||
**Dependencies:**
|
||||
- `pyproject.toml` - Added `boto3>=1.35.0` to dev dependencies
|
||||
|
||||
### Testing Results
|
||||
|
||||
**Unit Tests:** 127 passed (including 9 new Bedrock tests)
|
||||
**Type Checking:** All checks passed (ty)
|
||||
**Linting:** All checks passed (ruff)
|
||||
**Backward Compatibility:** Verified - existing embedding tests work
|
||||
|
||||
## Alternatives Considered
|
||||
|
||||
### Alternative 1: Keep Separate Provider Systems
|
||||
|
||||
**Pros:**
|
||||
- No refactoring needed
|
||||
- Simpler short-term
|
||||
|
||||
**Cons:**
|
||||
- Bedrock would need to be implemented twice
|
||||
- Continued code duplication
|
||||
- No long-term scalability
|
||||
|
||||
**Decision:** Rejected - technical debt would continue to grow
|
||||
|
||||
### Alternative 2: Separate Embedding and Generation Providers
|
||||
|
||||
Use composition instead of unified interface:
|
||||
```python
|
||||
class CombinedProvider:
|
||||
def __init__(self, embedding: EmbeddingProvider, generation: LLMProvider):
|
||||
self.embedding = embedding
|
||||
self.generation = generation
|
||||
```
|
||||
|
||||
**Pros:**
|
||||
- Clearer separation of concerns
|
||||
- Simpler individual providers
|
||||
|
||||
**Cons:**
|
||||
- Bedrock and Ollama naturally do both - artificial separation
|
||||
- More complex configuration (two providers to configure)
|
||||
- More boilerplate code
|
||||
|
||||
**Decision:** Rejected - unified interface better matches provider capabilities
|
||||
|
||||
### Alternative 3: Plugin System
|
||||
|
||||
Dynamic provider registration via entry points:
|
||||
```python
|
||||
# setup.py
|
||||
entry_points={
|
||||
'nextcloud_mcp.providers': [
|
||||
'ollama = nextcloud_mcp_server.providers.ollama:OllamaProvider',
|
||||
'bedrock = nextcloud_mcp_server.providers.bedrock:BedrockProvider',
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Pros:**
|
||||
- Most extensible
|
||||
- Third-party providers possible
|
||||
|
||||
**Cons:**
|
||||
- Over-engineered for current needs
|
||||
- Added complexity
|
||||
- No immediate benefit
|
||||
|
||||
**Decision:** Deferred - can add later if needed
|
||||
|
||||
## Future Work
|
||||
|
||||
1. **Additional Providers**
|
||||
- OpenAI (embeddings + generation)
|
||||
- Cohere (embeddings + generation)
|
||||
- Google Vertex AI
|
||||
- Azure OpenAI
|
||||
|
||||
2. **Provider Features**
|
||||
- Streaming generation support
|
||||
- Batch API optimization (when available)
|
||||
- Model-specific optimizations
|
||||
- Cost tracking and metrics
|
||||
|
||||
3. **Configuration Improvements**
|
||||
- Provider profiles (development, production)
|
||||
- Model aliasing (e.g., "small", "large")
|
||||
- Fallback provider chains
|
||||
|
||||
4. **Testing**
|
||||
- Integration tests with real Bedrock endpoints
|
||||
- Performance benchmarking across providers
|
||||
- Cost comparison analysis
|
||||
|
||||
## References
|
||||
|
||||
- [boto3 Bedrock Runtime Documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime.html)
|
||||
- [Amazon Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html)
|
||||
- ADR-003: Vector Database and Semantic Search
|
||||
- ADR-008: MCP Sampling for Semantic Search
|
||||
- ADR-013: RAG Evaluation Framework
|
||||
@@ -0,0 +1,338 @@
|
||||
# Amazon Bedrock Setup Guide
|
||||
|
||||
This guide covers how to configure the Nextcloud MCP Server to use Amazon Bedrock for embeddings and text generation.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
1. **AWS Account** with access to Amazon Bedrock
|
||||
2. **boto3 library** installed: `pip install boto3` or `uv sync --group dev`
|
||||
3. **Model Access** - Request access to models in AWS Bedrock console
|
||||
|
||||
## Required AWS Permissions
|
||||
|
||||
### IAM Policy for Bedrock Access
|
||||
|
||||
The AWS IAM user or role needs the following permissions:
|
||||
|
||||
```json
|
||||
{
|
||||
"Version": "2012-10-17",
|
||||
"Statement": [
|
||||
{
|
||||
"Sid": "BedrockInvokeModels",
|
||||
"Effect": "Allow",
|
||||
"Action": [
|
||||
"bedrock:InvokeModel",
|
||||
"bedrock:InvokeModelWithResponseStream"
|
||||
],
|
||||
"Resource": [
|
||||
"arn:aws:bedrock:*::foundation-model/*"
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Minimal Permissions (Production)
|
||||
|
||||
For production deployments, restrict to specific models:
|
||||
|
||||
```json
|
||||
{
|
||||
"Version": "2012-10-17",
|
||||
"Statement": [
|
||||
{
|
||||
"Sid": "BedrockEmbeddings",
|
||||
"Effect": "Allow",
|
||||
"Action": [
|
||||
"bedrock:InvokeModel"
|
||||
],
|
||||
"Resource": [
|
||||
"arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
|
||||
]
|
||||
},
|
||||
{
|
||||
"Sid": "BedrockGeneration",
|
||||
"Effect": "Allow",
|
||||
"Action": [
|
||||
"bedrock:InvokeModel"
|
||||
],
|
||||
"Resource": [
|
||||
"arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Additional Permissions (Optional)
|
||||
|
||||
For advanced use cases:
|
||||
|
||||
```json
|
||||
{
|
||||
"Version": "2012-10-17",
|
||||
"Statement": [
|
||||
{
|
||||
"Sid": "BedrockListModels",
|
||||
"Effect": "Allow",
|
||||
"Action": [
|
||||
"bedrock:ListFoundationModels",
|
||||
"bedrock:GetFoundationModel"
|
||||
],
|
||||
"Resource": "*"
|
||||
},
|
||||
{
|
||||
"Sid": "BedrockAsyncInvoke",
|
||||
"Effect": "Allow",
|
||||
"Action": [
|
||||
"bedrock:InvokeModelAsync",
|
||||
"bedrock:GetAsyncInvoke",
|
||||
"bedrock:ListAsyncInvokes"
|
||||
],
|
||||
"Resource": [
|
||||
"arn:aws:bedrock:*::foundation-model/*"
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Model Access
|
||||
|
||||
Before using Bedrock models, you must request access in the AWS Console:
|
||||
|
||||
1. Navigate to **Amazon Bedrock** → **Model access**
|
||||
2. Click **Manage model access**
|
||||
3. Select models you want to use:
|
||||
- **Embeddings:** Amazon Titan Embed Text, Cohere Embed
|
||||
- **Text Generation:** Anthropic Claude, Meta Llama, Amazon Titan Text
|
||||
4. Click **Request model access**
|
||||
5. Wait for approval (usually instant for most models)
|
||||
|
||||
## Supported Models
|
||||
|
||||
### Embedding Models
|
||||
|
||||
| Provider | Model ID | Dimensions | Best For |
|
||||
|----------|----------|------------|----------|
|
||||
| Amazon Titan | `amazon.titan-embed-text-v1` | 1,536 | General purpose |
|
||||
| Amazon Titan | `amazon.titan-embed-text-v2:0` | 1,024 | Latest, improved quality |
|
||||
| Cohere | `cohere.embed-english-v3` | 1,024 | English text |
|
||||
| Cohere | `cohere.embed-multilingual-v3` | 1,024 | Multilingual |
|
||||
|
||||
### Text Generation Models
|
||||
|
||||
| Provider | Model ID | Context | Best For |
|
||||
|----------|----------|---------|----------|
|
||||
| Anthropic | `anthropic.claude-3-sonnet-20240229-v1:0` | 200K | Balanced performance |
|
||||
| Anthropic | `anthropic.claude-3-haiku-20240307-v1:0` | 200K | Fast, cost-effective |
|
||||
| Anthropic | `anthropic.claude-3-opus-20240229-v1:0` | 200K | Highest quality |
|
||||
| Meta | `meta.llama3-8b-instruct-v1:0` | 8K | Fast, open-source |
|
||||
| Meta | `meta.llama3-70b-instruct-v1:0` | 8K | High quality |
|
||||
| Amazon | `amazon.titan-text-express-v1` | 8K | Fast, low cost |
|
||||
| Mistral | `mistral.mistral-7b-instruct-v0:2` | 32K | Efficient |
|
||||
|
||||
## Configuration
|
||||
|
||||
### Environment Variables
|
||||
|
||||
**Required:**
|
||||
```bash
|
||||
AWS_REGION=us-east-1
|
||||
```
|
||||
|
||||
**Optional (at least one model required):**
|
||||
```bash
|
||||
# For embeddings
|
||||
BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
|
||||
|
||||
# For text generation (RAG evaluation)
|
||||
BEDROCK_GENERATION_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
|
||||
```
|
||||
|
||||
**AWS Credentials (choose one method):**
|
||||
|
||||
**Method 1: Environment Variables**
|
||||
```bash
|
||||
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
|
||||
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
|
||||
```
|
||||
|
||||
**Method 2: AWS Credentials File** (`~/.aws/credentials`)
|
||||
```ini
|
||||
[default]
|
||||
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
|
||||
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
|
||||
```
|
||||
|
||||
**Method 3: IAM Role** (when running on AWS EC2/ECS/Lambda)
|
||||
- No credentials needed, uses instance/task role automatically
|
||||
|
||||
### Docker Configuration
|
||||
|
||||
Add to your `docker-compose.yml`:
|
||||
|
||||
```yaml
|
||||
services:
|
||||
mcp:
|
||||
environment:
|
||||
- AWS_REGION=us-east-1
|
||||
- BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
|
||||
- BEDROCK_GENERATION_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
|
||||
- AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
|
||||
- AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
|
||||
```
|
||||
|
||||
Or use AWS credentials file volume mount:
|
||||
|
||||
```yaml
|
||||
services:
|
||||
mcp:
|
||||
volumes:
|
||||
- ~/.aws:/root/.aws:ro
|
||||
environment:
|
||||
- AWS_REGION=us-east-1
|
||||
- BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
|
||||
```
|
||||
|
||||
## Usage Examples
|
||||
|
||||
### Embeddings Only
|
||||
|
||||
```bash
|
||||
export AWS_REGION=us-east-1
|
||||
export BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
|
||||
export AWS_ACCESS_KEY_ID=your-key
|
||||
export AWS_SECRET_ACCESS_KEY=your-secret
|
||||
|
||||
uv run nextcloud-mcp-server
|
||||
```
|
||||
|
||||
### Both Embeddings and Generation
|
||||
|
||||
```bash
|
||||
export AWS_REGION=us-east-1
|
||||
export BEDROCK_EMBEDDING_MODEL=amazon.titan-embed-text-v2:0
|
||||
export BEDROCK_GENERATION_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
|
||||
|
||||
# For RAG evaluation with Bedrock
|
||||
export RAG_EVAL_PROVIDER=bedrock
|
||||
export RAG_EVAL_BEDROCK_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
|
||||
|
||||
uv run python -m tests.rag_evaluation.evaluate
|
||||
```
|
||||
|
||||
### Programmatic Usage
|
||||
|
||||
```python
|
||||
from nextcloud_mcp_server.providers import BedrockProvider
|
||||
|
||||
# Embeddings only
|
||||
provider = BedrockProvider(
|
||||
region_name="us-east-1",
|
||||
embedding_model="amazon.titan-embed-text-v2:0",
|
||||
)
|
||||
|
||||
embeddings = await provider.embed_batch(["text1", "text2"])
|
||||
|
||||
# Both capabilities
|
||||
provider = BedrockProvider(
|
||||
region_name="us-east-1",
|
||||
embedding_model="amazon.titan-embed-text-v2:0",
|
||||
generation_model="anthropic.claude-3-sonnet-20240229-v1:0",
|
||||
)
|
||||
|
||||
# Generate embeddings
|
||||
embedding = await provider.embed("query text")
|
||||
|
||||
# Generate text
|
||||
response = await provider.generate("Write a summary", max_tokens=500)
|
||||
```
|
||||
|
||||
## Cost Considerations
|
||||
|
||||
### Embedding Costs (as of Jan 2025)
|
||||
|
||||
| Model | Price per 1K tokens |
|
||||
|-------|---------------------|
|
||||
| Titan Embed Text v2 | $0.0001 |
|
||||
| Cohere Embed English v3 | $0.0001 |
|
||||
|
||||
### Generation Costs (as of Jan 2025)
|
||||
|
||||
| Model | Input (per 1K tokens) | Output (per 1K tokens) |
|
||||
|-------|----------------------|------------------------|
|
||||
| Claude 3 Haiku | $0.00025 | $0.00125 |
|
||||
| Claude 3 Sonnet | $0.003 | $0.015 |
|
||||
| Claude 3 Opus | $0.015 | $0.075 |
|
||||
| Llama 3 8B | $0.0003 | $0.0006 |
|
||||
| Titan Text Express | $0.0002 | $0.0006 |
|
||||
|
||||
**Note:** Prices vary by region. Check [AWS Bedrock Pricing](https://aws.amazon.com/bedrock/pricing/) for current rates.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Error: "Executable doesn't exist" or boto3 not found
|
||||
|
||||
**Solution:**
|
||||
```bash
|
||||
uv sync --group dev # Installs boto3
|
||||
```
|
||||
|
||||
### Error: "AccessDeniedException"
|
||||
|
||||
**Causes:**
|
||||
1. IAM permissions missing
|
||||
2. Model access not requested
|
||||
3. Wrong AWS region
|
||||
|
||||
**Solution:**
|
||||
1. Verify IAM policy includes `bedrock:InvokeModel`
|
||||
2. Request model access in Bedrock console
|
||||
3. Check model is available in your region
|
||||
|
||||
### Error: "ResourceNotFoundException"
|
||||
|
||||
**Cause:** Invalid model ID or model not available in region
|
||||
|
||||
**Solution:**
|
||||
- Verify model ID matches exactly (case-sensitive)
|
||||
- Check model availability in your AWS region
|
||||
- Use `aws bedrock list-foundation-models` to see available models
|
||||
|
||||
### Error: "ThrottlingException"
|
||||
|
||||
**Cause:** Rate limit exceeded
|
||||
|
||||
**Solution:**
|
||||
- Reduce request rate
|
||||
- Request quota increase via AWS Support
|
||||
- Use batch operations where possible
|
||||
|
||||
## Security Best Practices
|
||||
|
||||
1. **Use IAM Roles** when running on AWS infrastructure
|
||||
2. **Rotate Access Keys** regularly if using IAM users
|
||||
3. **Restrict Permissions** to only required models
|
||||
4. **Enable CloudTrail** for audit logging
|
||||
5. **Use AWS Secrets Manager** for credential management
|
||||
6. **Monitor Costs** with AWS Cost Explorer and Budgets
|
||||
|
||||
## Regional Availability
|
||||
|
||||
Amazon Bedrock is available in:
|
||||
- **US East (N. Virginia)**: `us-east-1` ✅ Most models
|
||||
- **US West (Oregon)**: `us-west-2` ✅ Most models
|
||||
- **Asia Pacific (Singapore)**: `ap-southeast-1`
|
||||
- **Asia Pacific (Tokyo)**: `ap-northeast-1`
|
||||
- **Europe (Frankfurt)**: `eu-central-1`
|
||||
|
||||
**Note:** Model availability varies by region. Check the [AWS Bedrock documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html) for current availability.
|
||||
|
||||
## References
|
||||
|
||||
- [AWS Bedrock Documentation](https://docs.aws.amazon.com/bedrock/)
|
||||
- [AWS Bedrock Pricing](https://aws.amazon.com/bedrock/pricing/)
|
||||
- [boto3 Bedrock Runtime API](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime.html)
|
||||
- [Provider Architecture ADR](./ADR-015-unified-provider-architecture.md)
|
||||
Reference in New Issue
Block a user