docs: Add ADRs
# ADR-002: Vector Database Background Sync Authentication

## Status

Proposed

## Context

To enable semantic search capabilities, the MCP server needs to index user content (notes, files, calendar events) into a vector database. This requires a background sync worker that:

1. **Runs independently** of user requests (periodic or continuous operation)
2. **Accesses multiple users' content** to build a comprehensive search index
3. **Respects user permissions** - only index content users have access to
4. **Operates in OAuth mode** - where the MCP server doesn't have traditional admin credentials

### Current OAuth Architecture

The MCP server currently operates in two authentication modes:

1. **BasicAuth Mode**: Uses username/password credentials (typically admin account)
2. **OAuth Mode**: Single OAuth client, multiple user tokens
   - Users authenticate via OAuth flow
   - Each request includes user's access token
   - Server creates per-request `NextcloudClient` with user's bearer token
   - No tokens are stored server-side

### The Challenge

Background workers need long-lived authentication to:

- Index content continuously/periodically
- Process multiple users' data in batch operations
- Operate when users are not actively making requests

However, in OAuth mode:

- User access tokens are ephemeral (exist only during request)
- MCP server doesn't store user credentials
- Admin credentials defeat the purpose of OAuth

We need an OAuth-native solution that maintains security while enabling background operations.

## Decision

We will implement a **tiered authentication strategy** that leverages OAuth standards with graceful fallback:

### Primary Strategy: OAuth-Based Authentication

**Tier 1: Offline Access with Refresh Tokens** (Preferred)
- Request the `offline_access` scope during OAuth client registration and authorization
- Receive and securely store user refresh tokens
- Background worker exchanges refresh tokens for access tokens as needed
- Respects per-user permissions and provides a full audit trail

**Tier 2: Token Exchange (RFC 8693)** (If supported)
- Service account exchanges its token for user-scoped tokens on demand
- No token storage required
- Only available if the OIDC provider implements RFC 8693

### Fallback Strategy: Admin Credentials

**Tier 3: Admin BasicAuth** (Development/Simple Deployments)
- Dedicated sync account with read-only permissions
- Clear documentation of security implications
- Recommended only for trusted environments

### Key Architectural Principles

1. **Capability Detection**: Automatically detect which OAuth methods are supported
2. **Dual-Phase Authorization**:
   - Sync worker indexes with service credentials
   - User requests verify access with the user's OAuth token
3. **Defense in Depth**: The vector database is a search accelerator, not a security boundary
4. **Separation of Concerns**: Sync credentials ≠ Request credentials

## Implementation Details

### 1. Offline Access Flow (Tier 1)

#### 1.1 Client Registration
```python
# During OAuth client registration
client_metadata = {
    "client_name": "Nextcloud MCP Server",
    "redirect_uris": ["http://localhost:8000/oauth/callback"],
    "grant_types": ["authorization_code", "refresh_token"],
    "scope": "openid profile email offline_access notes:read files:read ...",
    "token_type": "Bearer",  # or "jwt"
}
```

#### 1.2 Token Storage
```python
import time

from cryptography.fernet import Fernet


# Encrypted token storage
class RefreshTokenStorage:
    """Securely store and manage user refresh tokens"""

    def __init__(self, db_path: str, encryption_key: bytes):
        # `Database` is assumed to be an async SQLite wrapper exposing
        # `execute` and `fetch_one`; it is not defined in this ADR.
        self.db = Database(db_path)
        # Fernet expects the url-safe base64-encoded key exactly as produced
        # by Fernet.generate_key(), not the decoded raw bytes.
        self.cipher = Fernet(encryption_key)

    async def store_refresh_token(
        self,
        user_id: str,
        refresh_token: str,
        expires_at: int | None = None,
    ):
        """Store encrypted refresh token for user"""
        encrypted_token = self.cipher.encrypt(refresh_token.encode())
        # Values: user id, ciphertext, absolute expiry, write timestamp
        await self.db.execute(
            "INSERT OR REPLACE INTO refresh_tokens VALUES (?, ?, ?, ?)",
            (user_id, encrypted_token, expires_at, int(time.time())),
        )

    async def get_refresh_token(self, user_id: str) -> str | None:
        """Retrieve and decrypt refresh token"""
        row = await self.db.fetch_one(
            "SELECT encrypted_token FROM refresh_tokens WHERE user_id = ?",
            (user_id,),
        )
        if row:
            return self.cipher.decrypt(row[0]).decode()
        return None
```
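
The `INSERT OR REPLACE` statement above implies a four-column table keyed on `user_id`. A minimal sketch of such a schema, using the standard-library `sqlite3` module, might look like the following. The column names `expires_at` and `updated_at` are assumptions inferred from the values being inserted, and `open_token_db`, `upsert_token`, `list_user_ids`, and `delete_token` are hypothetical helpers standing in for the storage methods this ADR refers to (`get_all_user_ids`, `delete_refresh_token`) but does not define.

```python
import sqlite3
import time

# Assumed schema: one row per user, so INSERT OR REPLACE overwrites in place.
SCHEMA = """
CREATE TABLE IF NOT EXISTS refresh_tokens (
    user_id         TEXT PRIMARY KEY,
    encrypted_token BLOB NOT NULL,
    expires_at      INTEGER,
    updated_at      INTEGER NOT NULL
)
"""


def open_token_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the token database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def upsert_token(conn, user_id: str, encrypted_token: bytes, expires_at=None) -> None:
    """Insert or overwrite the single token row for a user."""
    conn.execute(
        "INSERT OR REPLACE INTO refresh_tokens VALUES (?, ?, ?, ?)",
        (user_id, encrypted_token, expires_at, int(time.time())),
    )
    conn.commit()


def list_user_ids(conn) -> list[str]:
    """Return every user that currently has a stored token."""
    rows = conn.execute("SELECT user_id FROM refresh_tokens ORDER BY user_id")
    return [row[0] for row in rows]


def delete_token(conn, user_id: str) -> None:
    """Remove a user's token, for example after revocation."""
    conn.execute("DELETE FROM refresh_tokens WHERE user_id = ?", (user_id,))
    conn.commit()
```

Keying the table on `user_id` means a rotated token replaces its predecessor in the same statement, so no separate cleanup of old tokens is needed.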

#### 1.3 Token Refresh Flow
```python
import time

import httpx


async def get_user_access_token(user_id: str) -> str:
    """Exchange refresh token for fresh access token"""

    # Retrieve stored refresh token
    refresh_token = await token_storage.get_refresh_token(user_id)
    if not refresh_token:
        raise ValueError(f"No refresh token for user {user_id}")

    # Exchange for access token
    async with httpx.AsyncClient() as client:
        response = await client.post(
            token_endpoint,
            data={
                "grant_type": "refresh_token",
                "refresh_token": refresh_token,
            },
            auth=(client_id, client_secret),
        )
        response.raise_for_status()
        token_data = response.json()

    # Store new refresh token if rotated
    if "refresh_token" in token_data:
        # `refresh_expires_in` is a provider-specific lifetime in seconds,
        # so convert it to an absolute timestamp before storing it.
        lifetime = token_data.get("refresh_expires_in")
        expires_at = int(time.time()) + lifetime if lifetime else None
        await token_storage.store_refresh_token(
            user_id,
            token_data["refresh_token"],
            expires_at,
        )

    return token_data["access_token"]
```

#### 1.4 Capturing Refresh Tokens

**Challenge**: The MCP protocol does not expose refresh tokens to the server.

**Solution**: Intercept the OAuth callback and store the refresh token before returning control to the MCP client.
```python
import time


# Add route to MCP server
@app.route("/oauth/callback")
async def oauth_callback(request):
    """Capture OAuth callback and store refresh token"""

    code = request.query_params.get("code")
    state = request.query_params.get("state")
    # NOTE: `state` should be checked against the value issued when the flow
    # started, before the code is exchanged; that check is omitted here.

    # Exchange authorization code for tokens
    token_response = await exchange_authorization_code(code)

    # Extract user info
    userinfo = await get_userinfo(token_response["access_token"])
    user_id = userinfo["sub"]

    # Store refresh token (if present)
    if "refresh_token" in token_response:
        # Convert the relative lifetime (seconds) to an absolute timestamp
        lifetime = token_response.get("refresh_expires_in")
        await token_storage.store_refresh_token(
            user_id,
            token_response["refresh_token"],
            expires_at=int(time.time()) + lifetime if lifetime else None,
        )
        logger.info(f"Stored refresh token for user: {user_id}")

    # Continue MCP OAuth flow
    return redirect_to_mcp_client(state, token_response)
```

### 2. Token Exchange Flow (Tier 2)

#### 2.1 Capability Detection
```python
import httpx


async def check_token_exchange_support(discovery_url: str) -> bool:
    """Check if OIDC provider supports RFC 8693 token exchange"""

    async with httpx.AsyncClient() as client:
        response = await client.get(discovery_url)
        # Fail loudly on an unreachable or misconfigured discovery endpoint
        response.raise_for_status()
        discovery = response.json()

    # Check for token exchange grant type
    grant_types = discovery.get("grant_types_supported", [])
    return "urn:ietf:params:oauth:grant-type:token-exchange" in grant_types
```
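
The tier order from the Decision section can be expressed as a pure function over an already-fetched discovery document, which keeps the selection logic testable without network access. This is a sketch under assumptions: `select_auth_tier` is a hypothetical name, and because `scopes_supported` is optional in discovery metadata, its absence is treated here as "not ruled out" rather than as "unsupported".

```python
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"


def select_auth_tier(
    discovery: dict,
    offline_enabled: bool,
    has_basic_credentials: bool,
) -> str:
    """Pick the highest-preference tier the deployment can support."""
    grants = discovery.get("grant_types_supported", [])
    scopes = discovery.get("scopes_supported")  # optional metadata field

    # Tier 1: operator opted in and the provider does not rule it out
    offline_advertised = scopes is None or "offline_access" in scopes
    if offline_enabled and offline_advertised:
        return "offline_access"

    # Tier 2: provider advertises the RFC 8693 grant type
    if TOKEN_EXCHANGE_GRANT in grants:
        return "token_exchange"

    # Tier 3: fall back to the dedicated sync account
    if has_basic_credentials:
        return "admin_basic"

    raise RuntimeError("No authentication method available for sync worker")
```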

#### 2.2 Token Exchange Implementation
```python
import httpx


async def exchange_for_user_token(
    service_token: str,
    user_id: str,
    scopes: list[str],
) -> str:
    """Exchange service token for user-scoped token"""

    async with httpx.AsyncClient() as client:
        response = await client.post(
            token_endpoint,
            data={
                "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
                "subject_token": service_token,
                "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
                "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
                # RFC 8693 defines `resource` as the URI of the target
                # service. Naming the target user this way is an assumption;
                # providers that allow impersonation use their own parameter.
                "resource": f"user:{user_id}",
                "scope": " ".join(scopes),
            },
            auth=(client_id, client_secret),
        )

    if response.status_code != 200:
        logger.warning(f"Token exchange failed: {response.status_code}")
        raise TokenExchangeNotSupportedError()

    return response.json()["access_token"]
```

#### 2.3 Service Account Token
```python
import httpx


async def get_service_token() -> str:
    """Get token for MCP server's service account"""

    async with httpx.AsyncClient() as client:
        response = await client.post(
            token_endpoint,
            data={
                "grant_type": "client_credentials",
                "scope": "notes:read files:read calendar:read",
            },
            auth=(client_id, client_secret),
        )
        response.raise_for_status()
        return response.json()["access_token"]
```

### 3. Sync Worker with Tiered Authentication

```python
# nextcloud_mcp_server/sync_worker.py
import asyncio
import logging
import os

logger = logging.getLogger(__name__)


class VectorSyncWorker:
    """Background worker for indexing content into vector database"""

    def __init__(self):
        self.auth_method = None
        self.token_storage = None
        self.vector_service = None

    async def initialize(self):
        """Detect and configure authentication method"""

        # Try Tier 1: Offline Access
        if os.getenv("ENABLE_OFFLINE_ACCESS") == "true":
            try:
                # Fernet takes the base64-encoded key as-is; decoding it
                # first would make Fernet() reject it. A missing variable
                # raises KeyError and falls through to the next tier.
                encryption_key = os.environ["TOKEN_ENCRYPTION_KEY"]
                self.token_storage = RefreshTokenStorage(
                    db_path=os.getenv("TOKEN_STORAGE_DB", "tokens.db"),
                    encryption_key=encryption_key.encode(),
                )
                self.auth_method = "offline_access"
                logger.info("✓ Using offline_access authentication")
                return
            except Exception as e:
                logger.warning(f"Offline access unavailable: {e}")

        # Try Tier 2: Token Exchange
        try:
            if await check_token_exchange_support(discovery_url):
                self.auth_method = "token_exchange"
                logger.info("✓ Using token exchange authentication (RFC 8693)")
                return
        except Exception as e:
            logger.warning(f"Token exchange unavailable: {e}")

        # Fallback: Admin Credentials
        if os.getenv("NEXTCLOUD_USERNAME") and os.getenv("NEXTCLOUD_PASSWORD"):
            self.auth_method = "admin_basic"
            logger.warning(
                "⚠ Using admin BasicAuth authentication. "
                "Consider enabling offline_access for production."
            )
            return

        raise RuntimeError("No authentication method available for sync worker")

    async def get_user_client(self, user_id: str) -> NextcloudClient:
        """Get authenticated client for user based on auth method"""

        if self.auth_method == "offline_access":
            # Exchange refresh token for access token
            access_token = await get_user_access_token(user_id)
            return NextcloudClient.from_token(
                base_url=nextcloud_host,
                token=access_token,
                username=user_id,
            )

        elif self.auth_method == "token_exchange":
            # Get service token and exchange for user token
            service_token = await get_service_token()
            user_token = await exchange_for_user_token(
                service_token,
                user_id,
                scopes=["notes:read", "files:read"],
            )
            return NextcloudClient.from_token(
                base_url=nextcloud_host,
                token=user_token,
                username=user_id,
            )

        elif self.auth_method == "admin_basic":
            # Use admin credentials (fallback). NOTE: this client acts as the
            # sync account, so `user_id` does not narrow what it can read.
            return NextcloudClient.from_env()

        raise RuntimeError(f"Unknown auth method: {self.auth_method}")

    async def sync_user_content(self, user_id: str):
        """Index a user's content into vector database"""

        try:
            # Get authenticated client for this user
            client = await self.get_user_client(user_id)

            # Sync notes
            notes = await client.notes.list_notes()
            for note in notes:
                embedding = await self.vector_service.embed(note.content)
                # Qdrant point IDs must be unsigned integers or UUIDs, so
                # `vector_service` is assumed to map this string key.
                await self.vector_service.upsert(
                    collection="nextcloud_content",
                    id=f"note_{note.id}",
                    vector=embedding,
                    metadata={
                        "user_id": user_id,
                        "content_type": "note",
                        "note_id": note.id,
                        "title": note.title,
                        "category": note.category,
                    },
                )

            logger.info(f"Synced {len(notes)} notes for user: {user_id}")

        except Exception as e:
            # One user's failure must not stop the rest of the sync run
            logger.error(f"Failed to sync user {user_id}: {e}")

    async def run(self):
        """Main sync loop"""

        await self.initialize()
        interval = int(os.getenv("SYNC_INTERVAL_SECONDS", "300"))

        while True:
            try:
                # Get list of users to sync
                if self.auth_method == "admin_basic":
                    # Admin can list all users
                    admin_client = NextcloudClient.from_env()
                    users = await admin_client.users.list_users()
                    user_ids = [u.id for u in users]
                elif self.token_storage is not None:
                    # Offline access: only sync users with stored tokens
                    user_ids = await self.token_storage.get_all_user_ids()
                else:
                    # Token exchange stores no tokens, so the user list must
                    # come from another source; this ADR leaves that open.
                    user_ids = []

                logger.info(f"Syncing content for {len(user_ids)} users")

                for user_id in user_ids:
                    await self.sync_user_content(user_id)

                logger.info("Sync complete, sleeping...")
                await asyncio.sleep(interval)  # 5 minutes by default

            except Exception as e:
                logger.error(f"Sync failed: {e}")
                await asyncio.sleep(60)  # Retry after 1 minute
```
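
The configuration section lists a `SYNC_BATCH_SIZE` setting, while the loop above embeds and upserts one note at a time. A minimal sketch of the chunking step is shown below; `batched` is a hypothetical helper, and whether the vector service offers a bulk upsert to consume each batch is an assumption this ADR does not confirm.

```python
from collections.abc import Iterable, Iterator
from itertools import islice


def batched(items: Iterable, batch_size: int) -> Iterator[list]:
    """Yield successive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # islice pulls the next chunk; an empty list means the input is exhausted
    while batch := list(islice(iterator, batch_size)):
        yield batch
```

Inside `sync_user_content`, iterating over `batched(notes, batch_size)` would let each group be embedded and written together, which bounds memory use for users with many notes.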

### 4. User Request Verification (Dual-Phase Authorization)

```python
from httpx import HTTPStatusError


@mcp.tool()
@require_scopes("notes:read")
async def nc_notes_semantic_search(
    query: str,
    ctx: Context,
    limit: int = 10,
) -> SemanticSearchResponse:
    """Semantic search with permission verification"""

    # Get user's OAuth client (uses their access token from request)
    user_client = get_client(ctx)
    username = user_client.username

    # Phase 1: Vector search (fast, may include false positives)
    embedding = await vector_service.embed(query)
    candidate_results = await qdrant.search(
        collection_name="nextcloud_content",
        query_vector=embedding,
        query_filter={
            "must": [
                {
                    "should": [
                        {"key": "user_id", "match": {"value": username}},
                        {"key": "shared_with", "match": {"any": [username]}},
                    ]
                },
                {"key": "content_type", "match": {"value": "note"}},
            ]
        },
        limit=limit * 2,  # Get extra candidates
    )

    # Phase 2: Verify access via Nextcloud API (authoritative)
    verified_results = []
    for candidate in candidate_results:
        note_id = candidate.payload["note_id"]
        try:
            # This uses user's OAuth token - will fail if no access
            note = await user_client.notes.get_note(note_id)
        except HTTPStatusError as e:
            if e.response.status_code in (403, 404):
                # No access, or the note was deleted after indexing:
                # skip silently instead of failing the whole search
                logger.debug(f"Filtered out note {note_id} for {username}")
                continue
            raise

        verified_results.append({
            "note": note,
            "score": candidate.score,
        })
        if len(verified_results) >= limit:
            break

    return SemanticSearchResponse(results=verified_results)
```

### 5. Security Implementation

#### 5.1 Token Encryption
```python
import os

from cryptography.fernet import Fernet

# On first setup: generate encryption key (store securely)
encryption_key = Fernet.generate_key()
# Store in environment or secrets manager
# NEVER commit to source control

# In production
encryption_key = os.getenv("TOKEN_ENCRYPTION_KEY")  # Base64-encoded Fernet key
```

#### 5.2 Token Rotation
```python
import time


async def rotate_refresh_token(user_id: str):
    """Handle refresh token rotation"""

    old_refresh_token = await token_storage.get_refresh_token(user_id)
    if old_refresh_token is None:
        raise ValueError(f"No refresh token for user {user_id}")

    # Exchange for new tokens
    response = await exchange_refresh_token(old_refresh_token)

    if "refresh_token" in response:
        # Store new refresh token. Storage holds one row per user, so this
        # overwrites the old token; a separate delete keyed on user_id
        # would remove the token that was just stored.
        lifetime = response.get("refresh_expires_in")
        await token_storage.store_refresh_token(
            user_id,
            response["refresh_token"],
            expires_at=int(time.time()) + lifetime if lifetime else None,
        )
```

#### 5.3 Audit Logging
```python
import socket
import time


async def audit_log(
    event: str,
    user_id: str,
    resource_type: str,
    resource_id: str,
    auth_method: str,
):
    """Log sync operations for audit trail"""

    await audit_db.execute(
        "INSERT INTO audit_logs VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            int(time.time()),
            event,  # "index_note", "index_file"
            user_id,
            resource_type,
            resource_id,
            auth_method,
            socket.gethostname(),
        ),
    )
```
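
The seven-value `INSERT` above implies a seven-column audit table. A minimal sketch with the standard-library `sqlite3` module follows; the column names, the index, and the helpers `open_audit_db`, `record_event`, and `events_for_user` are all assumptions chosen to match the order of the values being inserted.

```python
import socket
import sqlite3
import time

# Assumed layout, in the same order as the INSERT values
AUDIT_SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_logs (
    timestamp     INTEGER NOT NULL,
    event         TEXT NOT NULL,
    user_id       TEXT NOT NULL,
    resource_type TEXT NOT NULL,
    resource_id   TEXT NOT NULL,
    auth_method   TEXT NOT NULL,
    hostname      TEXT NOT NULL
)
"""


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the audit database and ensure the table and index exist."""
    conn = sqlite3.connect(path)
    conn.execute(AUDIT_SCHEMA)
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_audit_user ON audit_logs (user_id, timestamp)"
    )
    return conn


def record_event(conn, event, user_id, resource_type, resource_id, auth_method) -> None:
    """Append one audit row; rows are never updated in place."""
    conn.execute(
        "INSERT INTO audit_logs VALUES (?, ?, ?, ?, ?, ?, ?)",
        (int(time.time()), event, user_id, resource_type, resource_id,
         auth_method, socket.gethostname()),
    )
    conn.commit()


def events_for_user(conn, user_id: str) -> list[tuple]:
    """Return (event, resource_id, auth_method) rows for one user, oldest first."""
    rows = conn.execute(
        "SELECT event, resource_id, auth_method FROM audit_logs "
        "WHERE user_id = ? ORDER BY rowid",
        (user_id,),
    )
    return rows.fetchall()
```

Indexing on `(user_id, timestamp)` supports the per-user audit queries that the monitoring recommendations call for.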

### 6. Configuration

#### 6.1 Environment Variables
```bash
# Tier 1: Offline Access
ENABLE_OFFLINE_ACCESS=true
TOKEN_ENCRYPTION_KEY=<base64-encoded-fernet-key>
TOKEN_STORAGE_DB=/app/data/tokens.db

# Tier 2: Token Exchange (auto-detected)
# No configuration needed - detected via OIDC discovery

# Tier 3: Admin Fallback
NEXTCLOUD_USERNAME=sync-bot
NEXTCLOUD_PASSWORD=<secure-password>

# Vector Database
QDRANT_URL=http://qdrant:6333
QDRANT_API_KEY=<api-key>

# Sync Configuration
SYNC_INTERVAL_SECONDS=300
SYNC_BATCH_SIZE=100
```

#### 6.2 Docker Compose
```yaml
services:
  mcp-sync:
    build: .
    command: ["python", "-m", "nextcloud_mcp_server.sync_worker"]
    environment:
      - NEXTCLOUD_HOST=http://app:80
      - ENABLE_OFFLINE_ACCESS=true
      - TOKEN_ENCRYPTION_KEY=${TOKEN_ENCRYPTION_KEY}
      # Keep the token database on the persistent volume mounted below
      - TOKEN_STORAGE_DB=/app/data/tokens.db
      - QDRANT_URL=http://qdrant:6333
      # OAuth client credentials (for token refresh)
      - NEXTCLOUD_OIDC_CLIENT_ID=${NEXTCLOUD_OIDC_CLIENT_ID}
      - NEXTCLOUD_OIDC_CLIENT_SECRET=${NEXTCLOUD_OIDC_CLIENT_SECRET}
    volumes:
      - sync-tokens:/app/data
    depends_on:
      - app
      - qdrant

volumes:
  sync-tokens:  # Persistent storage for encrypted tokens
```

## Consequences

### Benefits

1. **OAuth-Native Authentication**
   - Leverages standard OAuth flows (offline_access, token exchange)
   - No reliance on admin passwords in production
   - Compatible with enterprise OIDC providers

2. **User-Level Permissions**
   - Each user's content indexed with their own credentials
   - Respects sharing, permissions, and access controls
   - Full audit trail of which user's token was used

3. **Security**
   - Tokens encrypted at rest
   - Short-lived access tokens (refreshed as needed)
   - Token rotation support
   - Defense in depth with dual-phase authorization

4. **Flexibility**
   - Automatic capability detection
   - Graceful degradation through authentication tiers
   - Works with varying OIDC provider capabilities

5. **Operational**
   - Background sync independent of user activity
   - Efficient batch processing
   - Clear separation of sync vs request credentials

### Limitations

1. **Complexity**
   - Multiple authentication paths to maintain
   - Token storage and encryption infrastructure
   - More moving parts than simple admin auth

2. **User Experience**
   - `offline_access` scope may require additional consent
   - Users must authenticate at least once for indexing
   - New users not automatically indexed

3. **OIDC Provider Dependency**
   - Token exchange requires RFC 8693 support (rare)
   - Refresh token rotation varies by provider
   - Some providers may not support offline_access

4. **Operational Overhead**
   - Token database maintenance
   - Monitoring token expiration
   - Handling revoked tokens gracefully

### Security Considerations

#### Threat Model

**Threat 1: Token Storage Breach**
- **Mitigation**: Encryption at rest using Fernet
- **Mitigation**: Secure key management (secrets manager)
- **Mitigation**: Minimal token lifetime
- **Detection**: Audit logs for unusual access patterns

**Threat 2: Token Replay**
- **Mitigation**: Short-lived access tokens (refreshed frequently)
- **Mitigation**: Token rotation on each refresh
- **Mitigation**: Revocation support

**Threat 3: Privilege Escalation**
- **Mitigation**: Dual-phase authorization (vector DB + Nextcloud API)
- **Mitigation**: Sync worker uses same scopes as user requests
- **Mitigation**: Per-user token isolation

**Threat 4: Vector Database Poisoning**
- **Mitigation**: User requests always verify via Nextcloud API
- **Mitigation**: Vector DB is cache/accelerator, not source of truth
- **Mitigation**: Sync operations audited per user

#### Security Best Practices

1. **Token Encryption Key Management**
   ```bash
   # Generate secure key
   python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"

   # Store in secrets manager (Vault, AWS Secrets Manager, etc.)
   # Or use environment variable with restricted permissions
   ```

2. **Token Storage Permissions**
   ```bash
   # Restrict database file permissions
   chmod 600 /app/data/tokens.db
   chown mcp-server:mcp-server /app/data/tokens.db
   ```

3. **Token Rotation Schedule**
   - Refresh access tokens every 5 minutes (or at token expiry, whichever comes first)
   - Rotate refresh tokens on each use (if the provider supports it)
   - Revoke tokens on user logout/deauthorization

4. **Monitoring and Alerting**
   - Alert on token refresh failures
   - Monitor for unusual access patterns
   - Track token age and rotation
   - Audit sync operations per user
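
The file-permission recommendation in item 2 can also be enforced by the worker itself at startup, so a misconfigured deployment does not silently leave the token database readable by other accounts. This is a sketch for POSIX systems only; `ensure_private_file` and `is_private_file` are hypothetical helpers, not part of the existing codebase.

```python
import os
import stat


def ensure_private_file(path: str) -> None:
    """Create the file if needed and restrict it to owner read/write (0600)."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    os.close(fd)
    # os.open's mode only applies to new files, so tighten existing ones too
    os.chmod(path, 0o600)


def is_private_file(path: str) -> bool:
    """Return True when group and other have no permissions on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0
```

Calling `ensure_private_file` on the `TOKEN_STORAGE_DB` path before opening the database would apply the same restriction as the `chmod 600` command.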

### Future Enhancements

1. **Token Revocation Handling**
   - Webhook endpoint for token revocation events
   - Periodic validation of stored tokens
   - Graceful handling of revoked tokens

2. **Selective Sync**
   - Allow users to opt-in/opt-out of indexing
   - Per-content-type sync preferences
   - Privacy controls for sensitive content

3. **Multi-Tenant Token Storage**
   - Separate token databases per tenant
   - Key rotation per tenant
   - Tenant isolation

4. **Token Lifecycle Management**
   - Automatic cleanup of expired tokens
   - Token usage analytics
   - Token health dashboard

5. **Alternative OAuth Flows**
   - Device flow for headless sync
   - Resource owner password credentials (ROPC) as fallback
   - SAML assertion grants

## Alternatives Considered

### Alternative 1: Admin BasicAuth Only

**Approach**: Background worker always uses admin credentials

**Pros**:
- Simple implementation
- No token storage complexity
- Works with any authentication backend

**Cons**:
- Violates principle of least privilege
- Single powerful credential
- No per-user audit trail
- Bypasses OAuth entirely

**Decision**: Rejected for production use; kept as fallback only

### Alternative 2: Client Credentials Grant Only

**Approach**: Service account with broad read permissions

**Pros**:
- OAuth-native pattern
- No user token storage
- Standard OAuth flow

**Cons**:
- Requires client_credentials support (may not be available)
- Still needs broad cross-user permissions
- Not well-suited for multi-user indexing

**Decision**: Rejected; token exchange is a better fit for the multi-user scenario

### Alternative 3: Per-User Access Token Storage

**Approach**: Store user access tokens (not refresh tokens)

**Pros**:
- Simpler than refresh token flow
- No token refresh logic needed

**Cons**:
- Access tokens are short-lived (1-24 hours)
- Requires frequent re-authentication
- Poor user experience
- Sync gaps when tokens expire

**Decision**: Rejected; refresh tokens provide better UX

### Alternative 4: On-Demand Indexing Only

**Approach**: Index content when user searches (no background worker)

**Pros**:
- Uses user's request token
- No background auth needed
- Simpler architecture

**Cons**:
- Very slow first search
- Poor user experience
- Incomplete index
- Can't pre-compute embeddings

**Decision**: Rejected; background indexing is essential for semantic search

### Alternative 5: Nextcloud App Tokens

**Approach**: Generate app-specific passwords for each user

**Pros**:
- Nextcloud-native feature
- User-controlled revocation
- Scoped per-application

**Cons**:
- Requires user interaction to create
- May not support programmatic creation
- Still requires secure storage
- Not standard OAuth

**Decision**: Rejected; not automatable for background worker

## Related Decisions

- ADR-001: Enhanced Note Search (establishes need for vector search)
- [Future] ADR-003: Vector Database Selection
- [Future] ADR-004: Embedding Model Strategy

## References

- [RFC 8693: OAuth 2.0 Token Exchange](https://datatracker.ietf.org/doc/html/rfc8693)
- [RFC 6749: OAuth 2.0 - Refresh Tokens](https://datatracker.ietf.org/doc/html/rfc6749#section-1.5)
- [OpenID Connect Core - Offline Access](https://openid.net/specs/openid-connect-core-1_0.html#OfflineAccess)
- [OWASP: OAuth Security Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/OAuth2_Cheat_Sheet.html)
- [RFC 8707: Resource Indicators for OAuth 2.0](https://datatracker.ietf.org/doc/html/rfc8707)