Compare commits
2 Commits
| Author | SHA1 | Date |
|---|---|---|
| | 4a3b80cb98 | |
| | fc3ab8d0ac | |
@@ -0,0 +1,441 @@
|
||||
# Ollama Capacity Analysis: ollama.internal.coutinho.io
|
||||
|
||||
**Date**: 2025-10-30
|
||||
**Model**: nomic-embed-text:latest
|
||||
**Test Location**: From nextcloud-mcp-server host
|
||||
|
||||
## Summary
|
||||
|
||||
✅ **Ollama instance is operational and performing well**
|
||||
- Embedding generation working correctly
|
||||
- Reasonable latency for small-medium workloads
|
||||
- Good parallelism support
|
||||
- Suitable for development and small production deployments
|
||||
|
||||
## Test Results
|
||||
|
||||
### Model Configuration
|
||||
|
||||
```json
{
  "model": "nomic-embed-text",
  "dimensions": 768,
  "status": "operational"
}
```
|
||||
|
||||
### Performance Metrics
|
||||
|
||||
#### 1. Single Embedding Latency
|
||||
|
||||
**Result**: ~553ms per embedding
|
||||
- **Total time**: 0.553 seconds
|
||||
- **Includes**: Network + processing + model inference
|
||||
- **Quality**: Full 768-dimensional vector
|
||||
|
||||
**Analysis**:
|
||||
- Higher than bare-metal benchmarks (~100ms) due to network latency
|
||||
- Acceptable for interactive search queries
|
||||
- Within expected range for remote Ollama instance
|
||||
|
||||
#### 2. Batch Processing (5 items)
|
||||
|
||||
**Result**: ~1.02 seconds for 5 embeddings
|
||||
- **Per-item average**: 204ms
|
||||
- **Throughput**: ~4.9 embeddings/sec
|
||||
- **Batch efficiency**: 2.7x faster than sequential
|
||||
|
||||
**Analysis**:
|
||||
- Good batching efficiency (2.7x speedup vs 5x theoretical)
|
||||
- Optimal for background indexing
|
||||
- Network overhead amortized across batch
|
||||
|
||||
#### 3. Batch Processing (20 items)
|
||||
|
||||
**Result**: ~6.71 seconds for 20 embeddings
|
||||
- **Per-item average**: 336ms
|
||||
- **Throughput**: ~3.0 embeddings/sec
|
||||
- **Batch efficiency**: 1.65x faster than sequential
|
||||
|
||||
**Analysis**:
|
||||
- Per-item latency rises from 204ms (batch of 5) to 336ms (batch of 20), about 65% higher
- Still faster than sequential processing
- Matches reported Ollama behavior (quality issues at batch >16)
- **Recommendation**: Keep batch size ≤16 for best quality
|
||||
|
||||
#### 4. Concurrent Requests (5 parallel)
|
||||
|
||||
**Result**: ~1.27 seconds for 5 parallel requests
|
||||
- **Effective parallelism**: ~2.2x speedup (1.27s vs ~2.77s for 5 sequential requests)
|
||||
- **Per-request average**: 254ms
|
||||
- **Throughput**: ~3.9 requests/sec
|
||||
|
||||
**Analysis**:
|
||||
- Excellent parallelism support
|
||||
- Server handles concurrent requests efficiently
|
||||
- Network and compute overlap effectively
|
||||
- Good for multi-user scenarios
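
The derived figures in the four tests above (per-item average, throughput, speedup over sequential calls) follow directly from the raw timings. A minimal sketch that recomputes them, assuming the 0.553s single-request latency as the sequential baseline; the helper name `summarize` is illustrative and not part of the test harness:

```python
# Recompute the derived metrics from the raw timings reported above.
# The helper name and dict layout are illustrative assumptions.

SINGLE_LATENCY_S = 0.553  # measured single-embedding latency


def summarize(items: int, total_s: float) -> dict:
    """Return per-item latency, throughput, and speedup vs. sequential calls."""
    sequential_s = items * SINGLE_LATENCY_S
    return {
        "per_item_ms": round(total_s / items * 1000),
        "throughput_per_s": round(items / total_s, 1),
        "speedup": round(sequential_s / total_s, 2),
    }


if __name__ == "__main__":
    for label, items, total_s in [
        ("batch of 5", 5, 1.02),
        ("batch of 20", 20, 6.71),
        ("5 parallel", 5, 1.27),
    ]:
        print(label, summarize(items, total_s))
```

Running the same calculation after each new test run makes it easy to see whether batching or concurrency gives the larger gain on this instance.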
|
||||
|
||||
## Capacity Planning
|
||||
|
||||
### Current Performance Profile
|
||||
|
||||
| Metric | Value | Rating |
|--------|-------|--------|
| Single embedding latency | 553ms | ⚠️ Moderate |
| Batch (5) throughput | 4.9/sec | ✅ Good |
| Batch (20) throughput | 3.0/sec | ⚠️ Moderate |
| Concurrent throughput | 3.9/sec | ✅ Good |
| Network latency | ~300-400ms | ⚠️ Significant |
|
||||
|
||||
### Bottleneck Analysis
|
||||
|
||||
**Primary Bottleneck**: Network latency (~300-400ms per request)
|
||||
- Model inference: ~100-200ms (estimated)
|
||||
- Network round-trip: ~300-400ms (measured overhead)
|
||||
- **Impact**: 60-70% of total latency is network
|
||||
|
||||
**Secondary Bottleneck**: CPU/GPU capacity (unknown hardware)
|
||||
- Batch performance degrades at >16 items
|
||||
- Suggests resource constraints
|
||||
- Likely CPU-only (no GPU metrics available)
|
||||
|
||||
### Recommended Usage Patterns
|
||||
|
||||
#### ✅ **Excellent For:**
|
||||
|
||||
**1. Background Indexing**
|
||||
- Use batch size of 10-15 items
|
||||
- Expected throughput: 3-5 embeddings/sec
|
||||
- **10,000 notes**: ~30-55 minutes to index
|
||||
- **1,000 notes**: ~3-5 minutes to index
|
||||
|
||||
**2. Interactive Search**
|
||||
- Single query embedding: ~550ms
|
||||
- Acceptable for user-facing search
|
||||
- Add 100-200ms for vector search + verification
|
||||
- **Total search time**: ~650-750ms (reasonable UX)
|
||||
|
||||
**3. Multi-User Development**
|
||||
- 5-10 concurrent users: Comfortable
|
||||
- Good parallelism support
|
||||
- Network latency dominates (shared)
|
||||
|
||||
#### ⚠️ **Consider Alternatives For:**
|
||||
|
||||
**1. Real-Time Applications**
|
||||
- Sub-100ms latency requirements
|
||||
- High-frequency queries (>10/sec sustained)
|
||||
- Consider: Local embeddings or Infinity
|
||||
|
||||
**2. Large-Scale Batch Processing**
|
||||
- >100,000 documents to index
|
||||
- >10 embeddings/sec sustained
|
||||
- Consider: GPU-accelerated TEI
|
||||
|
||||
**3. Production with >50 Users**
|
||||
- High concurrent load
|
||||
- Latency sensitivity
|
||||
- Consider: Dedicated embedding service
|
||||
|
||||
### Deployment Scenarios
|
||||
|
||||
#### Scenario 1: Development Environment
|
||||
|
||||
**Profile**:
|
||||
- 1-3 developers
|
||||
- 1,000-5,000 notes total
|
||||
- Occasional searches/indexing
|
||||
|
||||
**Verdict**: ✅ **Perfect fit**
|
||||
- Initial index: ~4-25 minutes depending on corpus size (one-time)
|
||||
- Incremental updates: <1 minute
|
||||
- Search latency: Acceptable
|
||||
- No infrastructure changes needed
|
||||
|
||||
**Configuration**:
|
||||
```bash
OLLAMA_URL=https://ollama.internal.coutinho.io
OLLAMA_MODEL=nomic-embed-text
VECTOR_SYNC_INTERVAL=600  # 10 minutes
VECTOR_SYNC_BATCH_SIZE=10
```
|
||||
|
||||
#### Scenario 2: Small Production (10-20 users)
|
||||
|
||||
**Profile**:
|
||||
- 10-20 active users
|
||||
- 10,000-50,000 notes total
|
||||
- 50-200 searches/day
|
||||
- Nightly incremental indexing
|
||||
|
||||
**Verdict**: ✅ **Suitable with optimizations**
|
||||
- Initial index: ~40 minutes (10k notes) to ~4.5 hours (50k notes); run overnight
|
||||
- Incremental: 5-15 minutes/night
|
||||
- Search: Acceptable for most users
|
||||
- Monitor network latency
|
||||
|
||||
**Configuration**:
|
||||
```bash
OLLAMA_URL=https://ollama.internal.coutinho.io
OLLAMA_MODEL=nomic-embed-text
VECTOR_SYNC_INTERVAL=86400  # Every 24 hours (schedule the run for off-hours)
VECTOR_SYNC_BATCH_SIZE=12   # Conservative for quality
SEARCH_TIMEOUT_MS=1000      # Account for 550ms latency
```
|
||||
|
||||
**Optimizations**:
|
||||
- Run sync during off-hours
|
||||
- Cache query embeddings (common searches)
|
||||
- Use hybrid search (keyword + semantic)
|
||||
|
||||
#### Scenario 3: Medium Production (50-100 users)
|
||||
|
||||
**Profile**:
|
||||
- 50-100 active users
|
||||
- 100,000+ notes
|
||||
- 500-1000 searches/day
|
||||
- Real-time indexing desired
|
||||
|
||||
**Verdict**: ⚠️ **Marginal - monitor closely**
|
||||
- Initial index: 5-10 hours
|
||||
- Search latency: May feel slow for some users
|
||||
- Concurrent load: Approaching limits
|
||||
- **Recommendation**: Plan migration to Infinity
|
||||
|
||||
**Configuration**:
|
||||
```bash
OLLAMA_URL=https://ollama.internal.coutinho.io
OLLAMA_MODEL=nomic-embed-text
VECTOR_SYNC_INTERVAL=3600  # Hourly
VECTOR_SYNC_BATCH_SIZE=10
SEMANTIC_WEIGHT=0.5        # Rely more on keyword search
SEARCH_TIMEOUT_MS=2000     # Generous timeout
```
|
||||
|
||||
**Migration Path**:
|
||||
- Start with Ollama
|
||||
- Monitor latency metrics
|
||||
- When p95 latency >1s, migrate to Infinity
|
||||
- Keep Ollama as fallback
|
||||
|
||||
#### Scenario 4: Large Production (>100 users)
|
||||
|
||||
**Profile**:
|
||||
- >100 active users
|
||||
- >500,000 notes
|
||||
- >1000 searches/day
|
||||
- Real-time expectations
|
||||
|
||||
**Verdict**: ❌ **Not recommended**
|
||||
- Latency too high for scale
|
||||
- Throughput insufficient
|
||||
- Network becomes bottleneck
|
||||
- **Recommendation**: Use Infinity or TEI from start
|
||||
|
||||
## Network Latency Optimization
|
||||
|
||||
### Current Overhead: ~300-400ms
|
||||
|
||||
**If MCP server runs closer to Ollama**:
|
||||
```
Same VPC/network: ~1-5ms (300-400ms savings!)
Same host:        <1ms   (300-400ms savings!)
```
|
||||
|
||||
### Recommendation
|
||||
|
||||
**Option A: Co-locate MCP server with Ollama**
|
||||
- Reduces latency from ~550ms to ~150-250ms (removing the 300-400ms network overhead)
- Roughly 2.2-3.7x improvement
|
||||
- Makes Ollama competitive with cloud APIs
|
||||
|
||||
**Option B: Keep separate (current)**
|
||||
- Simpler deployment
|
||||
- Better security isolation
|
||||
- Accept 550ms latency
|
||||
|
||||
**Option C: Add Infinity container to MCP server**
|
||||
- Best of both worlds
|
||||
- Use Infinity for speed (local)
|
||||
- Fallback to Ollama if needed
|
||||
|
||||
## Capacity Estimates
|
||||
|
||||
### Indexing Capacity
|
||||
|
||||
**Sustained Throughput**: 3-4 embeddings/sec (conservative)
|
||||
|
||||
| Document Count | Index Time | Notes |
|----------------|------------|-------|
| 1,000 | 4-5 min | Quick |
| 5,000 | 20-25 min | Reasonable |
| 10,000 | 40-50 min | Acceptable |
| 50,000 | 3.5-4.5 hours | Overnight job |
| 100,000 | 7-9 hours | Long batch |
| 500,000 | 35-45 hours | Not recommended |
|
||||
|
||||
**Incremental Updates** (10% change daily):
|
||||
- 1,000 docs: ~30 sec
|
||||
- 10,000 docs: ~5 min
|
||||
- 50,000 docs: ~25 min
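
The index times above are simply document count divided by sustained throughput. A small sketch of that arithmetic, assuming one embedding per document (documents split into several chunks would scale the time accordingly); the function names are illustrative:

```python
# Estimate indexing time from sustained throughput.
# Assumption (not stated in the measurements): one embedding per document.


def index_time_minutes(doc_count: int, embeddings_per_s: float) -> float:
    """Minutes needed to embed doc_count documents at a sustained rate."""
    if embeddings_per_s <= 0:
        raise ValueError("embeddings_per_s must be positive")
    return doc_count / embeddings_per_s / 60


def index_time_range(
    doc_count: int, slow: float = 3.0, fast: float = 4.0
) -> tuple[float, float]:
    """(best case, worst case) in minutes for the 3-4 embeddings/sec range."""
    return index_time_minutes(doc_count, fast), index_time_minutes(doc_count, slow)


if __name__ == "__main__":
    for docs in (1_000, 10_000, 100_000):
        best, worst = index_time_range(docs)
        print(f"{docs:>7} docs: {best:.0f}-{worst:.0f} min")
```

For a daily incremental run, pass the number of changed documents (for example 10% of the corpus) instead of the full count.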
|
||||
|
||||
### Search Capacity
|
||||
|
||||
**Query Latency Budget**:
|
||||
- Embedding: 550ms
|
||||
- Vector search: 50-100ms
|
||||
- Permission verification: 50-100ms
|
||||
- **Total**: 650-750ms
|
||||
|
||||
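**Concurrent Users** (assuming 1 search every 5 minutes):
- 10 users: 2 queries/min → Comfortable
- 50 users: 10 queries/min → Near limit during bursts
- 100 users: 20 queries/min → Over capacity during bursts

These average rates are well below the measured ~3.9 requests/sec (~234/min); the ratings reflect bursts of simultaneous searches, which queue as described under Peak Load.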
|
||||
|
||||
**Peak Load** (all users search at once, extrapolated from the 5-parallel test):
- Measured throughput: ~3.9 requests/sec with 5 requests in flight
- Queue time: Proportional to position
- 10 simultaneous: ~2.5 sec for last user
- 50 simultaneous: ~13 sec for last user
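
The search-capacity estimates can be checked with a back-of-the-envelope model. This sketch assumes the ~3.9 requests/sec measured with 5 parallel requests also holds at higher concurrency, which the tests above did not verify; all names are illustrative:

```python
# Back-of-the-envelope search capacity from the measured concurrent throughput.
# Assumption: ~3.9 requests/sec (measured with 5 parallel requests) still holds
# at higher concurrency. This was not tested.

MEASURED_THROUGHPUT_PER_S = 3.9


def average_query_rate_per_s(users: int, minutes_between_searches: float = 5.0) -> float:
    """Average queries per second if each user searches at a fixed interval."""
    return users / (minutes_between_searches * 60)


def utilization(users: int, minutes_between_searches: float = 5.0) -> float:
    """Fraction of measured throughput consumed by the average query rate."""
    return average_query_rate_per_s(users, minutes_between_searches) / MEASURED_THROUGHPUT_PER_S


def burst_drain_time_s(simultaneous_requests: int) -> float:
    """Seconds until the last request of a simultaneous burst completes."""
    return simultaneous_requests / MEASURED_THROUGHPUT_PER_S


if __name__ == "__main__":
    for users in (10, 50, 100):
        print(
            f"{users:>3} users: utilization {utilization(users):.1%}, "
            f"worst-case burst {burst_drain_time_s(users):.1f}s"
        )
```

Under these assumptions the average load stays a small fraction of measured throughput even at 100 users; it is the simultaneous bursts that produce multi-second waits.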
|
||||
|
||||
## Recommendations
|
||||
|
||||
### Immediate Actions (Development)
|
||||
|
||||
1. **✅ Use Ollama as-is**
|
||||
- Current setup is perfect for dev/testing
|
||||
- No changes needed
|
||||
- Start building semantic search
|
||||
|
||||
2. **Configuration**:
|
||||
   ```bash
   OLLAMA_URL=https://ollama.internal.coutinho.io
   OLLAMA_MODEL=nomic-embed-text
   VECTOR_SYNC_BATCH_SIZE=10
   ```
|
||||
|
||||
3. **Add Monitoring**:
|
||||
   ```python
   # Track these metrics:
   #   embedding_latency_seconds (histogram)
   #   embedding_batch_size (gauge)
   #   embedding_errors_total (counter)
   ```
|
||||
|
||||
### Short-Term (Small Production)
|
||||
|
||||
1. **Optimize Batching**:
|
||||
- Use batch size 10-12 (quality sweet spot)
|
||||
- Process during off-hours
|
||||
- Implement incremental sync
|
||||
|
||||
2. **Add Caching**:
|
||||
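   ```python
   # Cache common query embeddings.
   # functools.lru_cache does not work on coroutines (it would cache the
   # coroutine object, which can only be awaited once), so use a dict.
   _query_cache: dict[str, list[float]] = {}

   async def embed_with_cache(query: str) -> list[float]:
       if query not in _query_cache:
           _query_cache[query] = await ollama.embed(query)
       return _query_cache[query]
   ```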
|
||||
|
||||
3. **Monitor Metrics**:
|
||||
- P50, P95, P99 latency
|
||||
- Throughput (embeddings/sec)
|
||||
- Error rates
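
For the query-embedding cache recommended above, a size-bounded variant can be sketched with an `OrderedDict`. This is an illustration only: the class name `AsyncLRUCache` and the `embed_fn` parameter are hypothetical, and any coroutine that maps a string to a vector can be plugged in.

```python
# A small bounded LRU cache for async embedding calls.
# Hypothetical helper: the class name and the embed_fn parameter are
# illustrative and not part of the MCP server code.
import asyncio
from collections import OrderedDict
from typing import Awaitable, Callable, List


class AsyncLRUCache:
    def __init__(
        self,
        embed_fn: Callable[[str], Awaitable[List[float]]],
        maxsize: int = 1000,
    ):
        self._embed_fn = embed_fn
        self._maxsize = maxsize
        self._cache: "OrderedDict[str, List[float]]" = OrderedDict()

    async def embed(self, query: str) -> List[float]:
        if query in self._cache:
            # Mark as most recently used and return the stored vector.
            self._cache.move_to_end(query)
            return self._cache[query]
        vector = await self._embed_fn(query)
        self._cache[query] = vector
        if len(self._cache) > self._maxsize:
            # Evict the least recently used entry.
            self._cache.popitem(last=False)
        return vector


if __name__ == "__main__":
    async def fake_embed(text: str) -> List[float]:
        # Stand-in for a real embedding call.
        return [float(len(text))]

    async def demo() -> None:
        cache = AsyncLRUCache(fake_embed, maxsize=2)
        print(await cache.embed("hello"))
        print(await cache.embed("hello"))  # served from the cache

    asyncio.run(demo())
```

Two concurrent misses for the same query would both call the backend; that is harmless here but could be deduplicated with a per-key lock if it matters.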
|
||||
|
||||
### Medium-Term (If Scaling Up)
|
||||
|
||||
1. **Add Infinity Container** (when >50 users or latency issues):
|
||||
   ```yaml
   services:
     infinity:
       image: michaelf34/infinity:latest
       # Local to MCP server - ~10-20ms latency
   ```
|
||||
|
||||
2. **Implement Tiered Fallback**:
|
||||
   ```
   Infinity (local, fast) → Ollama (remote, slower) → Local model
   ```
|
||||
|
||||
3. **Load Testing**:
|
||||
- Simulate 50-100 concurrent users
|
||||
- Measure actual throughput limits
|
||||
- Identify breaking points
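
The tiered fallback in item 2 can be sketched as an ordered list of providers tried in turn. This is a minimal illustration, not the server's implementation: `embed_with_fallback` and the `(name, coroutine)` provider pairs are hypothetical.

```python
# Try embedding providers in priority order and return the first success.
# Hypothetical sketch: the function name and provider tuples are illustrative.
import asyncio
from typing import Awaitable, Callable, List, Sequence, Tuple

EmbedFn = Callable[[str], Awaitable[List[float]]]


async def embed_with_fallback(
    text: str, providers: Sequence[Tuple[str, EmbedFn]]
) -> Tuple[str, List[float]]:
    """Return (provider_name, embedding) from the first provider that works."""
    errors = []
    for name, embed in providers:
        try:
            return name, await embed(text)
        except Exception as exc:  # any provider failure triggers the next tier
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All embedding providers failed: " + "; ".join(errors))


if __name__ == "__main__":
    async def infinity_down(text: str) -> List[float]:
        raise ConnectionError("connection refused")

    async def ollama_ok(text: str) -> List[float]:
        return [0.1, 0.2, 0.3]

    print(asyncio.run(
        embed_with_fallback("hello", [("infinity", infinity_down), ("ollama", ollama_ok)])
    ))
```

One caveat: vectors from different models are not comparable, so a fallback that switches models (for example from a 384-dimension model to 768-dimension `nomic-embed-text`) needs a separate collection or a re-index rather than a silent switch.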
|
||||
|
||||
### Long-Term (Enterprise Scale)
|
||||
|
||||
1. **Migrate to TEI Cluster** (when >100 users):
|
||||
- GPU-accelerated
|
||||
- Horizontal scaling
|
||||
- <20ms latency
|
||||
|
||||
2. **Consider Managed Services**:
|
||||
- Pinecone, Qdrant Cloud
|
||||
- Removes operational burden
|
||||
- Better SLAs
|
||||
|
||||
## Testing Recommendations
|
||||
|
||||
### Load Testing Script
|
||||
|
||||
```bash
# Test sustained load: 100 requests in waves of 5, printing each request's total time
for i in {1..100}; do
  curl -s -o /dev/null -w '%{time_total}\n' \
    https://ollama.internal.coutinho.io/api/embed \
    -d "{\"model\": \"nomic-embed-text\", \"input\": \"Test $i\"}" &

  # Rate limit: 5 concurrent
  if [ $((i % 5)) -eq 0 ]; then
    wait
    sleep 1
  fi
done
wait  # Catch any requests still in flight
```
|
||||
|
||||
### Metrics to Collect
|
||||
|
||||
1. **Latency Distribution**:
|
||||
- P50 (median)
|
||||
- P95 (acceptable)
|
||||
- P99 (outliers)
|
||||
|
||||
2. **Throughput**:
|
||||
- Embeddings/second
|
||||
- Peak vs sustained
|
||||
|
||||
3. **Error Rates**:
|
||||
- Timeouts
|
||||
- Server errors
|
||||
- Quality issues
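
The latency percentiles listed above can be computed from recorded request times with the standard library. A minimal sketch; the function name and the sample values are illustrative, and the "inclusive" quantile method is one reasonable choice among several:

```python
# Compute P50/P95/P99 from a list of request latencies (in seconds).
# The function name and sample data are illustrative.
import statistics
from typing import Dict, Sequence


def latency_percentiles(samples_s: Sequence[float]) -> Dict[str, float]:
    """Return the 50th, 95th, and 99th percentile of the given samples."""
    if len(samples_s) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(samples_s, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


if __name__ == "__main__":
    samples = [0.51, 0.55, 0.53, 0.60, 0.58, 1.20, 0.54, 0.56, 0.52, 0.57]
    print(latency_percentiles(samples))
```

The per-request times printed by a load test can be fed directly into this function.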
|
||||
|
||||
## Conclusion
|
||||
|
||||
**Your Ollama instance is ready for development and small production use!**
|
||||
|
||||
**Current Capacity**:
|
||||
- ✅ Development (1-3 developers, up to 5k docs): Comfortable
- ✅ Small prod (10-20 users, 10k-50k docs): Comfortable
- ⚠️ Medium prod (50-100 users, 100k+ docs): Monitoring needed
- ❌ Large prod (>100 users): Migrate to Infinity/TEI
|
||||
|
||||
**Key Strengths**:
|
||||
- Fully operational
|
||||
- Good parallelism
|
||||
- Acceptable latency for most use cases
|
||||
- Easy to integrate
|
||||
|
||||
**Key Limitations**:
|
||||
- Network latency adds 300-400ms overhead
|
||||
- Batch quality issues at >16 items
|
||||
- Limited scalability beyond 50 users
|
||||
|
||||
**Recommendation**:
|
||||
Start using Ollama immediately for development. Add monitoring and plan for Infinity when you approach 50 users or experience latency issues. The abstraction layer in ADR-003 makes migration seamless.
|
||||
|
||||
**Next Steps**:
|
||||
1. Configure MCP server with Ollama URL
|
||||
2. Implement semantic search tools
|
||||
3. Add basic monitoring
|
||||
4. Test with real workload
|
||||
5. Scale up as needed
|
||||
@@ -0,0 +1,796 @@
|
||||
# Ollama Embeddings Investigation
|
||||
|
||||
**Date**: 2025-10-30
|
||||
**Status**: Recommendation for Integration
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Ollama provides a **local, self-hosted embedding solution** that is excellent for **development and small-scale deployments** but has **performance limitations** compared to specialized embedding inference engines (TEI, Infinity).
|
||||
|
||||
**Recommendation**: Include Ollama as **Tier 2 fallback** in our embedding strategy (after cloud APIs, before local sentence-transformers), prioritizing ease of setup over maximum performance.
|
||||
|
||||
## Overview
|
||||
|
||||
Ollama is primarily known as a local LLM runner but added embedding model support in version 0.1.26, making it a convenient option for generating vector embeddings without external API dependencies.
|
||||
|
||||
### Key Characteristics
|
||||
|
||||
- **Local & Self-Hosted**: No external API calls, full privacy
|
||||
- **Easy Setup**: Single binary, simple model downloads (`ollama pull nomic-embed-text`)
|
||||
- **Unified Platform**: Same tool for both LLMs and embeddings
|
||||
- **OpenAI Compatible**: `/v1/embeddings` endpoint for drop-in replacement
|
||||
- **Multi-Platform**: Linux, macOS, Windows support
|
||||
- **GPU Support**: CUDA, ROCm, Metal acceleration
|
||||
|
||||
## API Details
|
||||
|
||||
### Endpoint Structure
|
||||
|
||||
**New API** (recommended):
|
||||
```bash
|
||||
POST http://localhost:11434/api/embed
|
||||
```
|
||||
|
||||
**OpenAI Compatible**:
|
||||
```bash
|
||||
POST http://localhost:11434/v1/embeddings
|
||||
```
|
||||
|
||||
**Legacy API** (deprecated):
|
||||
```bash
|
||||
POST http://localhost:11434/api/embeddings
|
||||
```
|
||||
|
||||
### Request Format
|
||||
|
||||
**Single Text Embedding**:
|
||||
```json
{
  "model": "nomic-embed-text",
  "input": "Text to embed"
}
```
|
||||
|
||||
**Batch Embedding** (since v0.2.0):
|
||||
```json
{
  "model": "nomic-embed-text",
  "input": [
    "First text to embed",
    "Second text to embed",
    "Third text to embed"
  ]
}
```
|
||||
|
||||
### Response Format
|
||||
|
||||
```json
{
  "model": "nomic-embed-text",
  "embeddings": [
    [0.123, -0.456, 0.789, ...],  // 768 dimensions for nomic-embed-text
    [0.234, -0.567, 0.890, ...]
  ]
}
```
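
A caller will usually want to validate this response before storing vectors. A minimal sketch, assuming the `embeddings` field shown above and the 768-dimension output of `nomic-embed-text`; the function name `extract_embeddings` is illustrative:

```python
# Validate an /api/embed style response before using its vectors.
# The function name is illustrative; the "embeddings" field and the default
# dimension of 768 come from the response format shown above.
from typing import Any, Dict, List


def extract_embeddings(payload: Dict[str, Any], expected_dim: int = 768) -> List[List[float]]:
    """Return the embeddings list, raising ValueError on malformed payloads."""
    embeddings = payload.get("embeddings")
    if not isinstance(embeddings, list) or not embeddings:
        raise ValueError("response has no 'embeddings' list")
    for index, vector in enumerate(embeddings):
        if len(vector) != expected_dim:
            raise ValueError(
                f"embedding {index} has {len(vector)} dimensions, expected {expected_dim}"
            )
    return embeddings


if __name__ == "__main__":
    sample = {"model": "nomic-embed-text", "embeddings": [[0.0] * 768, [0.1] * 768]}
    print(len(extract_embeddings(sample)), "embeddings validated")
```

Checking the dimension early catches a misconfigured model name before mismatched vectors reach the vector store.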
|
||||
|
||||
### Python Integration
|
||||
|
||||
```python
import ollama

# Single embedding
response = ollama.embed(
    model='nomic-embed-text',
    input='Text to embed'
)
embedding = response['embeddings'][0]

# Batch embeddings (more efficient)
response = ollama.embed(
    model='nomic-embed-text',
    input=[
        'First text',
        'Second text',
        'Third text'
    ]
)
embeddings = response['embeddings']
```
|
||||
|
||||
## Available Models
|
||||
|
||||
### 1. nomic-embed-text (Recommended)
|
||||
|
||||
**Specifications**:
|
||||
- **Parameters**: 137M
|
||||
- **Dimensions**: 768
|
||||
- **Context Length**: 8,192 tokens (2K effective)
|
||||
- **Size**: 274MB
|
||||
- **Architecture**: BERT-based
|
||||
|
||||
**Performance**:
|
||||
- Outperforms OpenAI `text-embedding-ada-002` and `text-embedding-3-small`
|
||||
- Excellent for long-context tasks
|
||||
- Strong general-purpose performance
|
||||
|
||||
**Use Cases**:
|
||||
- General RAG applications
|
||||
- Long document processing
|
||||
- Semantic search
|
||||
- Document clustering
|
||||
|
||||
**Pull Command**:
|
||||
```bash
|
||||
ollama pull nomic-embed-text
|
||||
```
|
||||
|
||||
### 2. mxbai-embed-large
|
||||
|
||||
**Specifications**:
|
||||
- **Parameters**: 334M
|
||||
- **Dimensions**: 1,024
|
||||
- **Context Length**: 512 tokens
|
||||
- **Architecture**: BERT-large optimized
|
||||
|
||||
**Performance**:
|
||||
- Claims to outperform commercial models
|
||||
- Higher precision for complex queries
|
||||
- Best quality but slower
|
||||
|
||||
**Use Cases**:
|
||||
- High-precision semantic search
|
||||
- Enterprise knowledge bases
|
||||
- Multilingual content
|
||||
|
||||
**Pull Command**:
|
||||
```bash
|
||||
ollama pull mxbai-embed-large
|
||||
```
|
||||
|
||||
### 3. all-minilm
|
||||
|
||||
**Specifications**:
|
||||
- **Parameters**: 23M
|
||||
- **Dimensions**: 384
|
||||
- **Context Length**: 256 tokens
|
||||
- **Size**: Smallest footprint
|
||||
|
||||
**Performance**:
|
||||
- Fastest processing speed
|
||||
- Good for sentence-level tasks
|
||||
- Limited context window
|
||||
|
||||
**Use Cases**:
|
||||
- Real-time applications
|
||||
- Resource-constrained environments
|
||||
- High-throughput scenarios
|
||||
- Development/testing
|
||||
|
||||
**Pull Command**:
|
||||
```bash
|
||||
ollama pull all-minilm
|
||||
```
|
||||
|
||||
## Performance Benchmarks
|
||||
|
||||
### Throughput Comparison
|
||||
|
||||
| Hardware | Model | Batch Size | Throughput | Notes |
|----------|-------|------------|------------|-------|
| RTX 4090 (24GB) | nomic-embed-text | 256 | 12,450 tok/sec | GPU-accelerated |
| RTX 4090 (24GB) | mxbai-embed-large | 128 | 8,920 tok/sec | GPU-accelerated |
| Intel i9-13900K (CPU) | nomic-embed-text | 32 | 3,250 tok/sec | CPU-only |
| Intel i9-13900K (CPU) | mxbai-embed-large | 16 | 2,180 tok/sec | CPU-only |
|
||||
|
||||
### Latency Comparison
|
||||
|
||||
**Single Request Latency** (RTX 4060):
|
||||
- Ollama: ~99ms
|
||||
- TEI: ~20ms (5x faster)
|
||||
- Infinity: ~30-40ms (2.5-3x faster)
|
||||
|
||||
**Batch Processing**:
- Throughput peaks at batch sizes of 32-64 (model dependent)
- Embedding quality issues have been reported with batches >16, so smaller batches are the safer default
- 2x slower than direct sentence-transformers usage
|
||||
|
||||
### Engine Comparison
|
||||
|
||||
Based on benchmarks from Baseten (2024):
|
||||
|
||||
| Engine | Relative Throughput | Notes |
|--------|---------------------|-------|
| BEI | 9.0x | Fastest (proprietary) |
| TEI | 4.5x | Open source, Rust-based |
| Infinity | 3.5x | PyTorch/ONNX optimized |
| vLLM | 3.0x | General LLM inference |
| **Ollama** | **1.0x (baseline)** | Slowest for embeddings |
|
||||
|
||||
**Key Insight**: Ollama is **3-9x slower** than the other engines in this comparison (3.5x vs Infinity, 4.5x vs TEI, 9x vs BEI) but trades performance for ease of use and a unified platform.
|
||||
|
||||
## Integration Implementation
|
||||
|
||||
### Python Client Wrapper
|
||||
|
||||
```python
# nextcloud_mcp_server/embeddings/ollama.py
import httpx
from typing import List


class OllamaEmbedding:
    """Ollama embedding provider"""

    def __init__(
        self,
        base_url: str = "http://localhost:11434",
        model: str = "nomic-embed-text"
    ):
        self.base_url = base_url.rstrip("/")
        self.model = model
        self.client = httpx.AsyncClient(timeout=60.0)

        # Model dimension mapping (keyed by model name without a ":tag" suffix)
        self.dimensions = {
            "nomic-embed-text": 768,
            "mxbai-embed-large": 1024,
            "all-minilm": 384
        }
        self.dimension = self.dimensions.get(model.split(":")[0], 768)

    async def embed(self, text: str) -> List[float]:
        """Generate embedding for single text"""
        response = await self.client.post(
            f"{self.base_url}/api/embed",
            json={
                "model": self.model,
                "input": text
            }
        )
        response.raise_for_status()
        data = response.json()
        return data["embeddings"][0]

    async def embed_batch(
        self,
        texts: List[str],
        batch_size: int = 16
    ) -> List[List[float]]:
        """
        Generate embeddings for multiple texts in batches.

        Note: Ollama has reported quality issues with batch sizes >16,
        so the default is 16; larger values remain configurable.
        """
        all_embeddings = []

        # Process in chunks to avoid batch size issues
        for i in range(0, len(texts), batch_size):
            batch = texts[i:i + batch_size]

            response = await self.client.post(
                f"{self.base_url}/api/embed",
                json={
                    "model": self.model,
                    "input": batch
                }
            )
            response.raise_for_status()
            data = response.json()
            all_embeddings.extend(data["embeddings"])

        return all_embeddings

    async def check_health(self) -> bool:
        """Check if Ollama server is running and model is available"""
        try:
            # Check if server is up
            response = await self.client.get(f"{self.base_url}/api/tags")
            response.raise_for_status()

            # Check if model is pulled. /api/tags lists names with a tag
            # (e.g. "nomic-embed-text:latest"), so accept both forms.
            models = response.json().get("models", [])
            model_names = {m["name"] for m in models}
            base_names = {name.split(":")[0] for name in model_names}

            if self.model not in model_names | base_names:
                raise ValueError(
                    f"Model '{self.model}' not found. "
                    f"Run: ollama pull {self.model}"
                )

            return True

        except Exception as e:
            raise ConnectionError(f"Ollama health check failed: {e}") from e

    async def close(self):
        """Close HTTP client"""
        await self.client.aclose()
```
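
The chunking step inside `embed_batch` can be isolated and tested without a running Ollama server. A small sketch of that slicing logic; the helper name `chunked` is illustrative and not part of the wrapper above:

```python
# Split a list of texts into batches of at most `size` items,
# mirroring the range/slice loop used in embed_batch.
# The helper name is illustrative.
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> List[List[T]]:
    """Return consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


if __name__ == "__main__":
    texts = [f"note {n}" for n in range(20)]
    for batch in chunked(texts, 16):
        print(len(batch), "texts in this request")
```

With 20 texts and a batch size of 16, this yields one request of 16 and one of 4, keeping every request at or below the size where quality issues were reported.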
|
||||
|
||||
### Auto-Detection in Embedding Service
|
||||
|
||||
```python
# nextcloud_mcp_server/embeddings/service.py
import asyncio
import logging
import os

logger = logging.getLogger(__name__)


class EmbeddingService:
    """Unified embedding service with automatic provider detection"""

    def __init__(self):
        self.provider = None
        self._detect_provider()

    def _detect_provider(self):
        """Auto-detect available embedding provider"""

        # Tier 1: OpenAI API (best quality)
        if os.getenv("OPENAI_API_KEY"):
            from .openai import OpenAIEmbedding
            self.provider = OpenAIEmbedding(
                model=os.getenv("OPENAI_EMBEDDING_MODEL", "text-embedding-3-small"),
                api_key=os.getenv("OPENAI_API_KEY")
            )
            logger.info("✓ Using OpenAI embeddings")
            return

        # Tier 2a: Infinity (optimized self-hosted)
        if os.getenv("INFINITY_URL"):
            from .infinity import InfinityEmbedding
            try:
                self.provider = InfinityEmbedding(
                    url=os.getenv("INFINITY_URL"),
                    model=os.getenv("EMBEDDING_MODEL", "BAAI/bge-small-en-v1.5")
                )
                logger.info("✓ Using Infinity embeddings (optimized)")
                return
            except Exception as e:
                logger.warning(f"Infinity unavailable: {e}")

        # Tier 2b: Ollama (easy self-hosted)
        if os.getenv("OLLAMA_URL"):
            from .ollama import OllamaEmbedding
            try:
                self.provider = OllamaEmbedding(
                    base_url=os.getenv("OLLAMA_URL", "http://localhost:11434"),
                    model=os.getenv("OLLAMA_MODEL", "nomic-embed-text")
                )
                # Verify Ollama is running and model is available.
                # asyncio.run() raises RuntimeError if an event loop is already
                # running, so construct this service from synchronous startup code.
                asyncio.run(self.provider.check_health())
                logger.info("✓ Using Ollama embeddings (easy setup)")
                return
            except Exception as e:
                logger.warning(f"Ollama unavailable: {e}")

        # Tier 3: Local model (fallback)
        logger.warning("No cloud/hosted embeddings available, using local model")
        from .local import LocalEmbedding
        self.provider = LocalEmbedding(
            model=os.getenv("LOCAL_EMBEDDING_MODEL", "all-MiniLM-L6-v2")
        )
        logger.info("✓ Using local embeddings (CPU fallback)")

    async def embed(self, text: str):
        """Generate embedding for text"""
        return await self.provider.embed(text)

    async def embed_batch(self, texts: list[str]):
        """Generate embeddings for multiple texts"""
        return await self.provider.embed_batch(texts)

    @property
    def dimension(self) -> int:
        """Get embedding dimension"""
        return self.provider.dimension
```
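
The tier order used by `_detect_provider` can be expressed as a pure function of the environment, which makes the precedence easy to unit-test. A minimal sketch; `select_provider` is a hypothetical helper, and unlike the service above it performs no health checks:

```python
# Decide which embedding tier the environment selects, without side effects.
# Hypothetical helper: it mirrors the precedence in _detect_provider but does
# not construct providers or verify that a server is reachable.
from typing import Mapping


def select_provider(env: Mapping[str, str]) -> str:
    """Return 'openai', 'infinity', 'ollama', or 'local' based on env vars."""
    if env.get("OPENAI_API_KEY"):
        return "openai"    # Tier 1
    if env.get("INFINITY_URL"):
        return "infinity"  # Tier 2a
    if env.get("OLLAMA_URL"):
        return "ollama"    # Tier 2b
    return "local"         # Tier 3


if __name__ == "__main__":
    import os

    print("Selected tier:", select_provider(os.environ))
```

Passing a plain dict instead of `os.environ` lets tests cover every combination of variables without touching the real environment.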
|
||||
|
||||
### Docker Compose Configuration
|
||||
|
||||
```yaml
services:
  # Ollama embedding service
  ollama:
    image: ollama/ollama:latest
    restart: always
    ports:
      - 127.0.0.1:11434:11434
    volumes:
      - ollama_models:/root/.ollama
    # Optional: GPU support
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    # Pull models on startup
    entrypoint: ["/bin/sh", "-c"]
    command:
      - |
        ollama serve &
        sleep 5
        ollama pull nomic-embed-text
        wait

  # MCP Server with Ollama embeddings
  mcp:
    build: .
    depends_on:
      - ollama
    environment:
      # ... other vars ...
      - OLLAMA_URL=http://ollama:11434
      - OLLAMA_MODEL=nomic-embed-text

  # Vector sync worker
  mcp-vector-sync:
    build: .
    command: ["python", "-m", "nextcloud_mcp_server.sync.vector_indexer"]
    depends_on:
      - ollama
      - qdrant  # qdrant service definition not shown in this excerpt
    environment:
      # ... other vars ...
      - OLLAMA_URL=http://ollama:11434
      - OLLAMA_MODEL=nomic-embed-text

volumes:
  ollama_models:
```
|
||||
|
||||
## Advantages of Ollama
|
||||
|
||||
### 1. **Ease of Setup**
|
||||
|
||||
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull embedding model
ollama pull nomic-embed-text

# Done! API available at localhost:11434
```
|
||||
|
||||
No complex configuration, no Docker registries, no model conversion.
|
||||
|
||||
### 2. **Privacy & Data Sovereignty**
|
||||
|
||||
- All processing happens locally
|
||||
- No data leaves your infrastructure
|
||||
- No API keys or external dependencies
|
||||
- Ideal for sensitive content (medical, legal, financial)
|
||||
|
||||
### 3. **Unified Platform**
|
||||
|
||||
- Same tool for LLMs and embeddings
|
||||
- Consistent API across model types
|
||||
- Single point of management
|
||||
- Simplified operations
|
||||
|
||||
### 4. **Developer Experience**
|
||||
|
||||
- Simple API (similar to OpenAI)
|
||||
- Good documentation
|
||||
- Active community
|
||||
- Framework integrations (LangChain, LlamaIndex)
|
||||
|
||||
### 5. **Cost**
|
||||
|
||||
- Free and open source
|
||||
- No per-token API costs
|
||||
- Only infrastructure costs (compute)
|
||||
|
||||
### 6. **Model Variety**
|
||||
|
||||
Growing library of embedding models:
|
||||
- nomic-embed-text (general purpose)
|
||||
- mxbai-embed-large (high quality)
|
||||
- all-minilm (fast)
|
||||
- More models added regularly
|
||||
|
||||
## Limitations of Ollama
|
||||
|
||||
### 1. **Performance**
|
||||
|
||||
- **3.5-9x slower** than specialized engines (3.5x vs Infinity, 4.5x vs TEI, 9x vs BEI)
|
||||
- Not optimized specifically for embedding inference
|
||||
- Batch processing issues at larger batch sizes (>16)
|
||||
- Higher latency compared to alternatives
|
||||
|
||||
### 2. **Scalability**
|
||||
|
||||
- Single-instance deployment (no native clustering)
|
||||
- Limited concurrent request handling
|
||||
- Not designed for high-throughput production
|
||||
- Resource usage per request is higher
|
||||
|
||||
### 3. **Batch Processing Issues**
|
||||
|
||||
- Quality degradation reported with large batches (>16)
- Throughput-optimal batch size is 32-64, so quality and speed pull in opposite directions
- Less efficient than specialized engines
- GitHub issues tracking batch problems (#6262)
|
||||
|
||||
### 4. **Resource Usage**
|
||||
|
||||
- Models stay loaded in memory (VRAM/RAM)
|
||||
- Higher memory footprint per model
|
||||
- GPU context switching overhead
|
||||
- Not as memory-efficient as specialized engines
|
||||
|
||||
### 5. **Production Features**
|
||||
|
||||
- No built-in load balancing
|
||||
- Limited monitoring/metrics
|
||||
- No automatic scaling
|
||||
- Basic error handling
|
||||
|
||||
## Use Case Recommendations
|
||||
|
||||
### ✅ **Excellent For:**
|
||||
|
||||
1. **Development & Testing**
|
||||
- Quick setup for prototyping
|
||||
- Local development environments
|
||||
- Testing embedding pipelines
|
||||
|
||||
2. **Small Deployments**
|
||||
- <10 users
|
||||
- <10,000 documents
|
||||
- Infrequent searches (<100/day)
|
||||
- Hobbyist/personal projects
|
||||
|
||||
3. **Privacy-Critical Applications**
|
||||
- Medical/healthcare records
|
||||
- Legal documents
|
||||
- Financial data
|
||||
- Air-gapped environments
|
||||
|
||||
4. **Unified LLM Stack**
|
||||
- Projects already using Ollama for LLMs
|
||||
- Simplified operations
|
||||
- Consistent tooling
|
||||
|
||||
5. **Educational/Learning**
|
||||
- Teaching RAG concepts
|
||||
- Learning embeddings
|
||||
- Hackathons/workshops
|
||||
|
||||
### ⚠️ **Consider Alternatives For:**
|
||||
|
||||
1. **Production at Scale**
|
||||
- >100 users
|
||||
- >100,000 documents
|
||||
- High query volume (>1000/day)
|
||||
- Use: TEI or Infinity
|
||||
|
||||
2. **Performance-Critical**
|
||||
- Real-time search (<50ms latency)
|
||||
- High-throughput batch processing
|
||||
- Use: TEI with GPU
|
||||
|
||||
3. **Enterprise Deployments**
|
||||
- Need for high availability
|
||||
- Load balancing requirements
|
||||
- Advanced monitoring
|
||||
- Use: Managed services or TEI cluster
|
||||
|
||||
4. **Large-Scale Indexing**
|
||||
- Millions of documents
|
||||
- Continuous high-volume ingestion
|
||||
- Use: Infinity or commercial solutions
|
||||
|
||||
## Integration Strategy
|
||||
|
||||
### Recommended Tier Placement
|
||||
|
||||
**Update ADR-003 embedding strategy:**
|
||||
|
||||
```
Tier 1: OpenAI API (best quality, requires API key)
  ↓ fallback
Tier 2a: Infinity (optimized self-hosted, complex setup)
  ↓ fallback
Tier 2b: Ollama (easy self-hosted, moderate performance) ← NEW
  ↓ fallback
Tier 3: Local sentence-transformers (CPU fallback, simplest)
```

### Configuration

```bash
# Option 1: Use Infinity (if available)
INFINITY_URL=http://infinity:7997
EMBEDDING_MODEL=BAAI/bge-small-en-v1.5

# Option 2: Use Ollama (if Infinity unavailable)
OLLAMA_URL=http://ollama:11434
OLLAMA_MODEL=nomic-embed-text

# Option 3: Use local model (automatic fallback)
# No configuration needed
```
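
One way the three options could be resolved at startup is sketched below. `select_provider` is a hypothetical helper, and the default model names are assumptions copied from the example values above; the OpenAI tier is left out because this configuration block does not cover it.

```python
import os
from typing import Mapping


def select_provider(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Pick an embedding backend from environment variables, in tier order."""
    if env.get("INFINITY_URL"):
        return {
            "provider": "infinity",
            "url": env["INFINITY_URL"],
            # Assumed default, taken from the example configuration.
            "model": env.get("EMBEDDING_MODEL", "BAAI/bge-small-en-v1.5"),
        }
    if env.get("OLLAMA_URL"):
        return {
            "provider": "ollama",
            "url": env["OLLAMA_URL"],
            # Assumed default, taken from the example configuration.
            "model": env.get("OLLAMA_MODEL", "nomic-embed-text"),
        }
    # Nothing configured: fall back to the in-process model.
    return {"provider": "local"}
```

Passing the environment as a parameter keeps the function easy to test without mutating `os.environ`.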

### When to Choose Ollama

**Choose Ollama if**:
- You're already using Ollama for LLMs
- You need privacy/data sovereignty
- You have <10k documents and <100 users
- Ease of setup is more important than max performance
- You're in development/testing phase

**Choose Infinity/TEI if**:
- You need maximum throughput (>1000 embeddings/sec)
- You have >100k documents
- Latency is critical (<50ms)
- You're in production with >100 users

**Choose OpenAI API if**:
- You're okay with cloud dependencies
- You need best-in-class quality
- Cost is not a concern (~$0.02 per 1M tokens)

## Production Deployment Guidance

### Small Production (Ollama Acceptable)

**Profile**:
- 5-20 users
- 1,000-10,000 documents
- 50-200 searches/day
- <2 sec acceptable latency

**Configuration**:
```yaml
ollama:
  image: ollama/ollama:latest
  deploy:
    resources:
      limits:
        memory: 4GB
        cpus: "2.0"
      reservations:
        devices:
          - driver: nvidia  # GPU if available
            count: 1
            capabilities: [gpu]
  environment:
    - OLLAMA_NUM_PARALLEL=2  # Concurrent requests
```

**Expected Performance**:
- Embedding latency: 100-200ms
- Throughput: 5-10 embeddings/sec
- Memory: 2-3GB (model loaded)

### Medium Production (Use Infinity/TEI)

**Profile**:
- 20-200 users
- 10,000-1M documents
- 500-5,000 searches/day
- <500ms acceptable latency

**Recommendation**: Migrate to Infinity or TEI
```yaml
infinity:
  image: michaelf34/infinity:latest
  # Better throughput and latency
```

### Large Production (Use Specialized Solution)

**Profile**:
- >200 users
- >1M documents
- >5,000 searches/day
- <100ms required latency

**Recommendation**: Use TEI cluster or commercial service
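
The three profiles can be read as a simple threshold check. The sketch below is one possible encoding, with `recommend_backend` as a hypothetical helper. The profiles leave a gap between 200 and 500 searches/day; treating that range as medium is an assumption made here, as is letting the largest single dimension decide the tier.

```python
def recommend_backend(users: int, documents: int, searches_per_day: int) -> str:
    """Map a deployment profile to the backend suggested by the tiers above.

    Any single dimension exceeding a tier's upper bound pushes the
    deployment into the next tier (an assumption, not stated in the profiles).
    """
    if users > 200 or documents > 1_000_000 or searches_per_day > 5_000:
        return "TEI cluster or commercial service"
    # Assumption: the unspecified 200-500 searches/day range counts as medium.
    if users > 20 or documents > 10_000 or searches_per_day > 200:
        return "Infinity or TEI"
    return "Ollama"


if __name__ == "__main__":
    print(recommend_backend(users=10, documents=5_000, searches_per_day=100))
```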

## Monitoring Considerations

### Key Metrics to Track

```python
# Add Ollama-specific metrics
from prometheus_client import Histogram, Counter, Gauge

ollama_embedding_latency = Histogram(
    'ollama_embedding_duration_seconds',
    'Ollama embedding generation time',
    ['model', 'batch_size']
)

ollama_batch_size = Gauge(
    'ollama_batch_size',
    'Current batch size being processed'
)

ollama_errors = Counter(
    'ollama_errors_total',
    'Ollama embedding errors',
    ['error_type']
)
```
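
A possible way to feed the latency histogram is a small timing wrapper. The sketch below uses only the standard library so it runs on its own; `timed` and `bucket_batch_size` are hypothetical helpers, and the bucket boundaries are arbitrary assumptions. Bucketing matters because each distinct label value creates a separate time series in Prometheus.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def timed(observe: Callable[[float], None]) -> Iterator[None]:
    """Measure the wall-clock duration of the enclosed block, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Report even when the block raises, so failed calls are still timed.
        observe(time.perf_counter() - start)


def bucket_batch_size(n: int) -> str:
    """Collapse batch sizes into a few label values (boundaries are assumptions)."""
    if n <= 1:
        return "1"
    if n <= 8:
        return "2-8"
    if n <= 32:
        return "9-32"
    return "33+"


# Intended use with the histogram defined above (not executed here):
#
#   labels = ollama_embedding_latency.labels(
#       model="nomic-embed-text", batch_size=bucket_batch_size(len(texts))
#   )
#   with timed(labels.observe):
#       embeddings = await embedding_service.embed_batch(texts)
```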

### Health Checks

```python
import httpx


async def ollama_health_check() -> tuple[bool, str]:
    """Check that the Ollama server is reachable and the embedding model is pulled."""
    try:
        async with httpx.AsyncClient() as client:
            # Check server
            response = await client.get("http://ollama:11434/api/tags")
            response.raise_for_status()

            # Verify the model has been pulled. /api/tags reports names with
            # a tag suffix (e.g. "nomic-embed-text:latest"), so compare on
            # the part before the colon.
            models = response.json().get("models", [])
            names = {m["name"].split(":")[0] for m in models}
            if "nomic-embed-text" not in names:
                return False, "Model not pulled"

            return True, "OK"
    except Exception as e:
        return False, str(e)
```

## Migration Path

### Starting with Ollama

**Phase 1: Development** (Ollama)
- Use Ollama for initial development
- Validate embedding pipeline
- Test search quality

**Phase 2: Growth** (Ollama → Infinity)
- Monitor performance metrics
- When >50 users or >10k docs, migrate to Infinity
- Simple config change, no code changes

**Phase 3: Scale** (Infinity → TEI/Commercial)
- When >200 users or performance issues
- Consider TEI cluster or managed services

### Code Compatibility

All embedding providers use the same interface:
```python
# Works with Ollama, Infinity, OpenAI, Local
embedding = await embedding_service.embed(text)
embeddings = await embedding_service.embed_batch(texts)
```

**Migration is a configuration change only** - no code rewrite needed.
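
The shared interface could be expressed as a structural type, as in the sketch below. `EmbeddingProvider`, `FakeProvider`, and `index_documents` are hypothetical names; the real `embedding_service` may be organised differently. The fake backend stands in for an Ollama or Infinity client so the example runs without a server.

```python
import asyncio
from typing import Protocol


class EmbeddingProvider(Protocol):
    """Minimal interface that every backend is assumed to satisfy."""

    async def embed(self, text: str) -> list[float]: ...

    async def embed_batch(self, texts: list[str]) -> list[list[float]]: ...


class FakeProvider:
    """Deterministic stand-in used here instead of a real Ollama or Infinity client."""

    def __init__(self, dimensions: int) -> None:
        self.dimensions = dimensions

    async def embed(self, text: str) -> list[float]:
        return [float(len(text))] * self.dimensions

    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        return [await self.embed(t) for t in texts]


async def index_documents(provider: EmbeddingProvider, texts: list[str]) -> int:
    """Caller code depends only on the protocol, not on a concrete backend."""
    vectors = await provider.embed_batch(texts)
    return len(vectors)


if __name__ == "__main__":
    print(asyncio.run(index_documents(FakeProvider(768), ["a", "bb"])))
```

One caveat worth keeping in mind: the example configurations pair each backend with a different model (`nomic-embed-text` yields 768-dimensional vectors, `BAAI/bge-small-en-v1.5` yields 384), so switching backends this way would generally also require re-embedding the existing index.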

## Conclusion

**Ollama is a solid choice for:**
- Early-stage projects
- Development/testing
- Privacy-critical applications
- Small deployments (<10 users, <10k docs)
- Unified LLM + embedding stack

**But recognize its limitations:**
- 5-9x slower than specialized engines
- Not designed for high-throughput production
- Batch processing can be problematic
- Limited scalability

**Recommendation**:
✅ **Include Ollama as Tier 2b** (after Infinity, before local models) in the embedding strategy. It provides a good balance of ease-of-use and privacy for small-to-medium deployments while allowing seamless migration to more performant engines as needs grow.

The key is designing the abstraction layer (as done in ADR-003) so migration between engines requires only configuration changes, not code rewrites.

@@ -3,8 +3,8 @@ Tests for Dynamic Client Registration (DCR) token_type parameter.
 
 These tests verify that the Nextcloud OIDC server properly honors the token_type
 parameter during client registration, issuing the correct type of access tokens:
-- token_type="JWT" → JWT-formatted tokens (RFC 9068)
-- token_type="Bearer" → Opaque tokens (standard OAuth2)
+- token_type="jwt" → JWT-formatted tokens (RFC 9068)
+- token_type="opaque" → Opaque tokens (standard OAuth2)
 
 This is critical for ensuring:
 1. Client choice is respected by the OIDC server
@@ -208,12 +208,14 @@ async def test_dcr_respects_jwt_token_type(
     oauth_callback_server,
 ):
     """
-    Test that DCR honors token_type=JWT and issues JWT-formatted tokens.
+    Test that DCR honors token_type=jwt and issues JWT-formatted tokens.
 
     This verifies:
-    1. Client registration with token_type="JWT" succeeds
+    1. Client registration with token_type="jwt" succeeds
     2. Tokens obtained via this client are JWT format (base64.base64.signature)
     3. JWT payload contains expected claims (sub, iss, scope, etc.)
+
+    Note: The OIDC app uses lowercase 'jwt' (not 'JWT').
     """
     nextcloud_host = os.getenv("NEXTCLOUD_HOST")
     if not nextcloud_host:
@@ -232,15 +234,15 @@ async def test_dcr_respects_jwt_token_type(
     token_endpoint = oidc_config.get("token_endpoint")
     authorization_endpoint = oidc_config.get("authorization_endpoint")
 
-    # Register client with token_type="JWT"
-    logger.info("Registering OAuth client with token_type=JWT...")
+    # Register client with token_type="jwt"
+    logger.info("Registering OAuth client with token_type=jwt...")
     client_info = await register_client(
         nextcloud_url=nextcloud_host,
         registration_endpoint=registration_endpoint,
         client_name="DCR Test - JWT Token Type",
         redirect_uris=[callback_url],
         scopes="openid profile email notes:read notes:write",
-        token_type="JWT",
+        token_type="jwt",
     )
 
     logger.info(f"Registered JWT client: {client_info.client_id[:16]}...")
@@ -278,7 +280,7 @@ async def test_dcr_respects_jwt_token_type(
     assert "notes:write" in scopes, "JWT scope claim missing notes:write"
 
     logger.info(
-        f"✅ DCR with token_type=JWT works correctly! "
+        f"✅ DCR with token_type=jwt works correctly! "
         f"Token is JWT format with scope claim: {payload['scope']}"
     )
 
@@ -290,12 +292,14 @@ async def test_dcr_respects_bearer_token_type(
     oauth_callback_server,
 ):
     """
-    Test that DCR honors token_type=Bearer and issues opaque tokens.
+    Test that DCR honors token_type=opaque and issues opaque tokens.
 
     This verifies:
-    1. Client registration with token_type="Bearer" succeeds
+    1. Client registration with token_type="opaque" succeeds
     2. Tokens obtained via this client are opaque (NOT JWT format)
     3. Opaque tokens are simple strings, not base64-encoded structures
+
+    Note: The OIDC app uses 'opaque' or 'jwt' as token_type values (not 'Bearer').
     """
     nextcloud_host = os.getenv("NEXTCLOUD_HOST")
     if not nextcloud_host:
@@ -314,18 +318,18 @@ async def test_dcr_respects_bearer_token_type(
     token_endpoint = oidc_config.get("token_endpoint")
     authorization_endpoint = oidc_config.get("authorization_endpoint")
 
-    # Register client with token_type="Bearer" (opaque tokens)
-    logger.info("Registering OAuth client with token_type=Bearer...")
+    # Register client with token_type="opaque" (opaque tokens)
+    logger.info("Registering OAuth client with token_type=opaque...")
     client_info = await register_client(
         nextcloud_url=nextcloud_host,
         registration_endpoint=registration_endpoint,
-        client_name="DCR Test - Bearer Token Type",
+        client_name="DCR Test - Opaque Token Type",
         redirect_uris=[callback_url],
         scopes="openid profile email notes:read notes:write",
-        token_type="Bearer",
+        token_type="opaque",
     )
 
-    logger.info(f"Registered Bearer client: {client_info.client_id[:16]}...")
+    logger.info(f"Registered Opaque token client: {client_info.client_id[:16]}...")
 
     # Obtain token via OAuth flow
     access_token = await get_oauth_token_with_client(
@@ -353,7 +357,7 @@ async def test_dcr_respects_bearer_token_type(
         pass
 
     logger.info(
-        f"✅ DCR with token_type=Bearer works correctly! "
+        f"✅ DCR with token_type=opaque works correctly! "
         f"Token is opaque (not JWT format): {access_token[:30]}..."
     )
 