# OAuth Multi-User Load Testing Framework

A multi-user benchmarking system for testing the OAuth-authenticated Nextcloud MCP server with realistic collaborative workflows.

## Quick Start

```bash
# 1. Ensure docker-compose is running
docker-compose up -d

# 2. Run a benchmark with 2 users for 30 seconds
uv run python -m tests.load.oauth_benchmark --users 2 --duration 30

# 3. Clean up test users (IMPORTANT - always run after benchmark)
uv run python -m tests.load.cleanup_loadtest_users

# Optional: Verify cleanup
uv run python -m tests.load.cleanup_loadtest_users --dry-run
```

## Overview

This framework extends the basic load testing infrastructure to support:
- **Multiple OAuth-authenticated users** running concurrently
- **Coordinated workflows** spanning multiple users (sharing, collaboration, permissions)
- **Per-user metrics** tracking individual user performance
- **Workflow-specific metrics** measuring cross-user operation latencies
- **Realistic scenarios** mimicking actual user collaboration patterns
- **Concurrent user creation**: all users are created and authenticated in parallel for fast setup

## Architecture

### Components

```
tests/load/
├── oauth_pool.py          # OAuth user pool management
├── oauth_workloads.py     # Multi-user workflow definitions
├── oauth_metrics.py       # Enhanced metrics collection
├── oauth_benchmark.py     # Main CLI entry point
└── README_OAUTH.md        # This file
```

### Key Classes

**OAuthUserPool** (`oauth_pool.py`)
- Manages N OAuth-authenticated users
- Handles token acquisition and storage
- Creates and manages MCP sessions per user
- Tracks per-user operation statistics

**UserSessionWrapper** (`oauth_pool.py`)
- Wraps an MCP `ClientSession` for a specific user
- Tracks operations automatically
- Provides convenient tool/resource access methods

**Workflow** (`oauth_workloads.py`)
- Base class for multi-user coordinated workflows
- Executes steps one at a time and times each step
- Handles and reports errors per step

**OAuthBenchmarkMetrics** (`oauth_metrics.py`)
- Per-user operation counts and latencies
- Workflow completion rates and timings
- Baseline operation statistics
- Detailed reporting and JSON export

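The step-by-step timing attributed to `Workflow` can be pictured with a small wrapper. This is a minimal sketch under assumptions, not the framework's code: `StepResult` and `timed_step` are hypothetical names, and the real `_execute_step` may record more (for example the acting user).

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional


@dataclass
class StepResult:
    """Outcome of one workflow step (hypothetical structure)."""

    name: str
    success: bool
    duration: float
    value: Any = None
    error: Optional[str] = None


async def timed_step(name: str, operation: Callable[[], Awaitable[Any]]) -> StepResult:
    """Await `operation`, recording its wall-clock duration and any error."""
    start = time.perf_counter()
    try:
        value = await operation()
        return StepResult(name, True, time.perf_counter() - start, value)
    except Exception as exc:
        # A failed step is reported, not raised, so the caller can decide what to do.
        return StepResult(name, False, time.perf_counter() - start, error=str(exc))


async def _demo() -> list:
    async def create_note() -> str:
        await asyncio.sleep(0.01)  # stands in for an MCP tool call
        return "note-1"

    async def share_note() -> None:
        raise PermissionError("share denied")

    return [
        await timed_step("create_note", create_note),
        await timed_step("share_note", share_note),
    ]


demo_results = asyncio.run(_demo())
for result in demo_results:
    print(f"{result.name}: success={result.success} duration={result.duration:.4f}s")
```

Returning a result object instead of raising lets a workflow stop at the first failed step while still reporting the timings of the steps that ran.
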
## Available Workflows

### 1. NoteShareWorkflow
**Scenario**: Alice creates a note and shares it with Bob, who then reads it.

**Steps**:
1. User A creates a note
2. User A shares note with User B (read-only permissions)
3. User B lists their shared notes (measures propagation delay)
4. User B reads the shared note

**Metrics**: Creation latency, share propagation time, read latency

### 2. CollaborativeEditWorkflow
**Scenario**: Multiple users concurrently edit the same note.

**Steps**:
1. Owner creates a note
2. All users read the note simultaneously
3. All users append content concurrently
4. Owner verifies final state

**Metrics**: Concurrent read latency, concurrent write conflicts, final state consistency

### 3. FileShareAndDownloadWorkflow
**Scenario**: Alice uploads a file, shares it with Bob, who then downloads it.

**Steps**:
1. User A creates a file via WebDAV
2. User A shares file with User B (read-only)
3. User B lists their shares
4. User B downloads the file

**Metrics**: Upload latency, share creation, download latency

### 4. MixedOAuthWorkload
**Distribution**:
- 50% Baseline operations (individual user CRUD)
- 30% Note sharing workflows
- 15% Collaborative editing workflows
- 5% File sharing workflows

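The distribution above can be read as a weighted random choice on each iteration. The sketch below illustrates that reading with the standard library only. How `MixedOAuthWorkload` actually selects operations is not shown in this README, and `WORKLOAD_WEIGHTS` and `pick_operation` are hypothetical names.

```python
import random
from collections import Counter

# Weights copied from the distribution listed above; the keys are illustrative.
WORKLOAD_WEIGHTS = {
    "baseline": 50,
    "note_share": 30,
    "collaborative_edit": 15,
    "file_share": 5,
}


def pick_operation(rng: random.Random) -> str:
    """Pick the next operation type in proportion to its weight."""
    names = list(WORKLOAD_WEIGHTS)
    weights = list(WORKLOAD_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(42)  # seeded so that a run is repeatable
counts = Counter(pick_operation(rng) for _ in range(10_000))
for name, count in counts.most_common():
    print(f"{name:20s} {count / 10_000:.1%}")
```

Over a long run the observed shares approach the configured percentages. Short runs (for example 30 seconds with 2 users) can deviate noticeably, which is worth keeping in mind when a rare workflow shows few executions.
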
## Usage

### Basic Usage

```bash
# 4 users, 60-second test with mixed workload
uv run python -m tests.load.oauth_benchmark --users 4 --duration 60

# 10 users, 5-minute test
uv run python -m tests.load.oauth_benchmark -u 10 -d 300

# Export results to JSON
uv run python -m tests.load.oauth_benchmark -u 5 -d 120 --output results.json
```

### Advanced Options

```bash
# Sharing-focused workload
uv run python -m tests.load.oauth_benchmark --workload sharing -u 8 -d 180

# Collaborative editing workload
uv run python -m tests.load.oauth_benchmark --workload collaboration -u 6 -d 120

# Baseline operations only (no workflows)
uv run python -m tests.load.oauth_benchmark --workload baseline -u 10 -d 60

# Verbose logging for debugging
uv run python -m tests.load.oauth_benchmark -u 2 -d 30 --verbose
```

### CLI Options

| Option | Short | Default | Description |
|--------|-------|---------|-------------|
| `--users` | `-u` | 2 | Number of concurrent users (dynamically created) |
| `--duration` | `-d` | 30.0 | Test duration in seconds |
| `--warmup` | `-w` | 5.0 | Warmup period before metrics collection (seconds) |
| `--url` | | `http://localhost:8001/mcp` | MCP OAuth server URL |
| `--output` | `-o` | None | JSON output file path |
| `--workload` | | `mixed` | Workload type: `mixed`, `sharing`, `collaboration`, `baseline` |
| `--user-prefix` | | `loadtest` | Prefix for dynamically created usernames |
| `--cleanup/--no-cleanup` | | `--cleanup` | Delete created users after benchmark |
| `--browser` | | `chromium` | Playwright browser: `firefox`, `chromium`, `webkit` |
| `--headed` | | False | Run browser in headed mode (visible window) |
| `--verbose` | `-v` | False | Enable verbose logging |

## Test User Creation

The framework **dynamically creates test users** on demand with OAuth authentication:

- **Naming**: Users are created with the pattern `{prefix}_user_{n}` (default: `loadtest_user_1`, `loadtest_user_2`, etc.)
- **Customization**: Use `--user-prefix` to change the prefix (e.g., `--user-prefix mytest` → `mytest_user_1`)
- **Scalability**: No limit on user count; create as many concurrent users as your system can handle
- **Credentials**: Each user gets a randomly generated secure password
- **OAuth Tokens**: All users authenticate via automated OAuth flow using Playwright
- **Cleanup**: Users are deleted automatically after the benchmark (disable with `--no-cleanup`); because automatic cleanup may not always run, use the cleanup utility afterwards to remove leftovers

**Example**: Running `--users 5` creates:
- `loadtest_user_1` (Display: Load Test User 1, Email: loadtest_user_1@benchmark.local)
- `loadtest_user_2` (Display: Load Test User 2, Email: loadtest_user_2@benchmark.local)
- `loadtest_user_3` (Display: Load Test User 3, Email: loadtest_user_3@benchmark.local)
- `loadtest_user_4` (Display: Load Test User 4, Email: loadtest_user_4@benchmark.local)
- `loadtest_user_5` (Display: Load Test User 5, Email: loadtest_user_5@benchmark.local)

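The naming scheme and random passwords described above can be sketched with the standard library. This is an illustration only: `LoadTestUser`, `generate_password`, and `build_users` are hypothetical names, the display-name format for custom prefixes is an assumption, and the actual account creation through the Nextcloud Users API is not shown.

```python
import secrets
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadTestUser:
    """Account details for one generated test user (hypothetical structure)."""

    username: str
    display_name: str
    email: str
    password: str


def generate_password(length: int = 24) -> str:
    """Build a random password from a cryptographically secure source."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def build_users(count: int, prefix: str = "loadtest") -> list:
    """Return `count` users named {prefix}_user_1 .. {prefix}_user_{count}."""
    users = []
    for n in range(1, count + 1):
        username = f"{prefix}_user_{n}"
        users.append(
            LoadTestUser(
                username=username,
                display_name=f"Load Test User {n}",  # assumed format
                email=f"{username}@benchmark.local",
                password=generate_password(),
            )
        )
    return users


generated = build_users(3)
for user in generated:
    print(user.username, user.email)
```
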
## Metrics Output

### Console Report

```
================================================================================
OAUTH MULTI-USER BENCHMARK RESULTS
================================================================================

Duration: 120.45s
Total Users: 5
Total Workflows Executed: 312
Total Baseline Operations: 678

--------------------------------------------------------------------------------
WORKFLOW STATISTICS
--------------------------------------------------------------------------------
Workflow                Total   Success     Rate        P50        P95
--------------------------------------------------------------------------------
note_share                112       109    97.3%    0.2341s    0.4782s
collaborative_edit         65        61    93.8%    0.5123s    0.9234s
file_share                 29        29   100.0%    0.3456s    0.6123s

--------------------------------------------------------------------------------
PER-USER STATISTICS
--------------------------------------------------------------------------------
User                   Total Ops   Success   Errors     Rate        P50
--------------------------------------------------------------------------------
loadtest_user_1              289       283        6    97.9%    0.2456s
loadtest_user_2              245       241        4    98.4%    0.2123s
loadtest_user_3              231       226        5    97.8%    0.2345s
loadtest_user_4              198       195        3    98.5%    0.2234s
loadtest_user_5              187       184        3    98.4%    0.2189s

--------------------------------------------------------------------------------
BASELINE OPERATIONS
--------------------------------------------------------------------------------
Total Operations: 678
Success Rate: 98.2%
Latency: min=0.0234s, p50=0.1234s, p95=0.3456s, max=0.8123s
================================================================================
```

### JSON Export

```json
{
  "summary": {
    "duration": 120.45,
    "total_workflows": 312,
    "total_baseline_ops": 678,
    "total_users": 5
  },
  "workflows": {
    "note_share": {
      "total_executions": 112,
      "successful_executions": 109,
      "failed_executions": 3,
      "success_rate": 97.3,
      "latency": {
        "min": 0.1234,
        "max": 0.8765,
        "mean": 0.2891,
        "median": 0.2341,
        "p90": 0.4123,
        "p95": 0.4782,
        "p99": 0.7234
      },
      "step_latencies": {
        "create_note": {...},
        "share_note": {...},
        "list_shared_with_me": {...},
        "read_shared_note": {...}
      }
    }
  },
  "users": {
    "loadtest_user_1": {
      "total_operations": 289,
      "successful_operations": 283,
      "failed_operations": 6,
      "success_rate": 97.9,
      "latency": {...},
      "operations_breakdown": {...},
      "errors_breakdown": {...}
    },
    "loadtest_user_2": {...},
    "loadtest_user_3": {...},
    "loadtest_user_4": {...},
    "loadtest_user_5": {...}
  },
  "baseline": {...}
}
```

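The `latency` object in the export holds seven summary fields. As a rough illustration of how such a block can be derived from raw samples, the sketch below uses Python's `statistics` module. The function name `summarize_latencies` is hypothetical, and the interpolation method the framework itself uses for percentiles is not documented here, so individual values may differ slightly from its output.

```python
import statistics


def summarize_latencies(samples: list) -> dict:
    """Return min/max/mean/median/p90/p95/p99 for a list of latencies in seconds."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute percentiles")
    # quantiles(n=100) returns the 99 cut points p1..p99 (index 0 is p1).
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.fmean(samples),
        "median": statistics.median(samples),
        "p90": cuts[89],
        "p95": cuts[94],
        "p99": cuts[98],
    }


# Synthetic samples: 1 ms to 100 ms in 1 ms steps.
latencies = [i / 1000 for i in range(1, 101)]
summary = summarize_latencies(latencies)
print({key: round(value, 4) for key, value in summary.items()})
```
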
## Implementation Status

### ✅ Completed Components

**Framework:**
- OAuth user pool management with dynamic user creation
- User session wrappers with automatic tracking
- Workflow base classes and framework
- 3 example workflows (note share, collaborative edit, file share)
- Enhanced metrics with per-user and workflow tracking
- CLI interface with multiple workload options
- Comprehensive reporting (console + JSON)

**OAuth Integration:**
- ✅ Playwright browser automation for OAuth login
- ✅ OAuth callback server for auth code capture
- ✅ Token exchange with OIDC provider
- ✅ OAuth token injection into MCP sessions via Authorization headers
- ✅ Cancel scope error handling for reliable cleanup
- ✅ Dynamic user creation and deletion via Nextcloud Users API

**Implementation Details:**
The benchmark now successfully:
1. Creates Nextcloud users dynamically with unique passwords
2. Acquires OAuth tokens via automated Playwright browser flows
3. Creates MCP client sessions with proper `Authorization: Bearer {token}` headers
4. Executes coordinated multi-user workflows
5. Tracks per-user and per-workflow metrics
6. Provides standalone cleanup utility for test users

**Key Fix (oauth_pool.py:163-164)**:
```python
# Pass OAuth token as Authorization header
headers = {"Authorization": f"Bearer {profile.token}"}
streamable_context = streamablehttp_client(mcp_url, headers=headers)
```

## Creating Custom Workflows

### Example: Permission Escalation Workflow

```python
import time

# Workflow and UserSessionWrapper are the framework classes described under
# Key Classes (oauth_workloads.py and oauth_pool.py).


class PermissionEscalationWorkflow(Workflow):
    """Test sharing permission changes."""

    def __init__(self):
        super().__init__("permission_escalation")

    async def execute(self, users: list[UserSessionWrapper]) -> WorkflowResult:
        self.start_time = time.time()

        if len(users) < 2:
            return self._finish(False, error="Requires 2+ users")

        owner, collaborator = users[0], users[1]

        # Step 1: Owner creates note
        create_result = await self._execute_step(
            "create_note",
            owner,
            lambda: owner.call_tool("nc_notes_create_note", {...})
        )

        # Step 2: Share read-only
        await self._execute_step(
            "share_readonly",
            owner,
            lambda: owner.call_tool("nc_share_create", {
                "permissions": 1  # Read-only
            })
        )

        # Step 3: Upgrade to edit permissions
        await self._execute_step(
            "upgrade_permissions",
            owner,
            lambda: owner.call_tool("nc_share_update", {
                "permissions": 15  # Read+update+create+delete
            })
        )

        # Step 4: Collaborator edits
        await self._execute_step(
            "collaborator_edit",
            collaborator,
            lambda: collaborator.call_tool("nc_notes_update_note", {...})
        )

        return self._finish(success=True)
```

### Registering Custom Workflows

```python
# In oauth_workloads.py
class MixedOAuthWorkload:
    def __init__(self, users: list[UserSessionWrapper]):
        self.users = users
        self.workflows = {
            "note_share": NoteShareWorkflow(),
            "collaborative_edit": CollaborativeEditWorkflow(),
            "file_share": FileShareAndDownloadWorkflow(),
            "permission_escalation": PermissionEscalationWorkflow(),  # Add your workflow
        }
```

## Performance Expectations

### Baseline Performance (basic auth, from existing benchmarks)
- **Throughput**: 50-200 RPS for mixed workload
- **Latency**: p50 <100ms, p95 <500ms, p99 <1000ms

### OAuth Multi-User Expectations
- **Lower throughput**: ~30-60% of baseline (roughly 15-120 RPS given the range above) due to:
  - OAuth token validation overhead
  - Cross-user synchronization delays
  - Workflow coordination overhead
- **Higher p99 latency**: Due to workflow step dependencies
- **Focus**: End-to-end workflow completion time matters more than raw RPS

### Common Bottlenecks
1. **OAuth token validation**: Per-request overhead
2. **Share propagation**: Time for shares to become visible to recipients
3. **Concurrent edit conflicts**: ETags and conflict resolution
4. **Permission checks**: Cross-user access validation

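Share propagation, the second bottleneck above, is usually measured by polling until the recipient can see the share. The following is a minimal sketch of that idea under assumptions: `wait_until_visible` is a hypothetical helper, and the `share_is_listed` stub stands in for a real "list shares" call made through a user session.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until_visible(
    check: Callable[[], Awaitable[bool]],
    timeout: float = 5.0,
    interval: float = 0.05,
) -> float:
    """Poll `check` until it returns True and return the elapsed seconds.

    Raises TimeoutError if the condition is not met within `timeout`.
    """
    start = time.perf_counter()
    while True:
        if await check():
            return time.perf_counter() - start
        if time.perf_counter() - start >= timeout:
            raise TimeoutError(f"share not visible after {timeout}s")
        await asyncio.sleep(interval)


async def _demo() -> float:
    calls = 0

    async def share_is_listed() -> bool:
        # Stub: the share becomes visible on the third poll.
        nonlocal calls
        calls += 1
        return calls >= 3

    return await wait_until_visible(share_is_listed, timeout=1.0, interval=0.01)


propagation_delay = asyncio.run(_demo())
print(f"share visible after {propagation_delay:.3f}s")
```

The polling interval bounds the measurement resolution, so an interval well below the expected propagation time gives more useful numbers at the cost of extra requests.
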
## Best Practices

1. **Start Small**: Begin with 2-3 users to validate workflows
2. **Monitor Errors**: Watch for permission errors and conflicts
3. **Adjust Delays**: Tune sleep delays between operations based on server response times
4. **Profile Workflows**: Use step latencies to identify bottlenecks
5. **Export Results**: Always export to JSON for historical comparison

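Exported results lend themselves to a simple run-to-run comparison. The sketch below flags workflows whose p95 latency grew by more than a tolerance, using the `workflows` → `latency` → `p95` layout from the JSON export. `find_regressions` is a hypothetical helper, not part of the framework, and the `latest_run` numbers are made up for the example.

```python
def find_regressions(baseline: dict, current: dict, tolerance: float = 0.10) -> dict:
    """Return {workflow: (old_p95, new_p95)} where p95 grew by more than `tolerance`."""
    regressions = {}
    for name, stats in current.get("workflows", {}).items():
        old = baseline.get("workflows", {}).get(name)
        if old is None:
            continue  # new workflow, nothing to compare against
        old_p95 = old["latency"]["p95"]
        new_p95 = stats["latency"]["p95"]
        if new_p95 > old_p95 * (1 + tolerance):
            regressions[name] = (old_p95, new_p95)
    return regressions


# In practice both dicts would come from json.load() on two --output files.
previous_run = {
    "workflows": {
        "note_share": {"latency": {"p95": 0.4782}},
        "file_share": {"latency": {"p95": 0.6123}},
    }
}
latest_run = {
    "workflows": {
        "note_share": {"latency": {"p95": 0.6100}},  # illustrative value
        "file_share": {"latency": {"p95": 0.6200}},  # illustrative value
    }
}

regressions = find_regressions(previous_run, latest_run)
for name, (old_p95, new_p95) in regressions.items():
    print(f"{name}: p95 {old_p95:.4f}s -> {new_p95:.4f}s")
```
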
## Performance Optimizations

### Concurrent User Creation

The benchmark creates and authenticates users **concurrently** to shorten setup time:

**Step 5: User Creation & OAuth Authentication**
- All N users are created in parallel using `asyncio.gather()`
- Each user runs through the full OAuth flow simultaneously
- Multiple Playwright browser contexts operate independently

**Step 6: MCP Session Creation**
- All user sessions are created concurrently
- OAuth tokens are passed as Authorization headers to each session

**Performance Impact:**
- **Sequential** (old): ~10-12s per user → 40-48s for 4 users
- **Concurrent** (new): ~12-15s total for 4 users (3-4x speedup)

Example output showing concurrent execution:
```
Step 5/6: Creating 4 users and acquiring OAuth tokens...
(Running concurrently for faster setup)

[1/4] Creating user 'loadtest_user_1'...
[2/4] Creating user 'loadtest_user_2'...
[3/4] Creating user 'loadtest_user_3'...
[4/4] Creating user 'loadtest_user_4'...
✓ User 'loadtest_user_4' authenticated
✓ User 'loadtest_user_2' authenticated
✓ User 'loadtest_user_1' authenticated
✓ User 'loadtest_user_3' authenticated

✓ Successfully created and authenticated 4 users
```

**Implementation** (oauth_benchmark.py:402-437):
```python
# Create tasks for all users
tasks = [
    create_user_task(i, browser, callback_server.auth_states)
    for i in range(num_users)
]
# Run all concurrently
results = await asyncio.gather(*tasks, return_exceptions=True)
```

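Because the excerpt above passes `return_exceptions=True`, `asyncio.gather()` returns exceptions as ordinary items in the result list instead of raising the first one. A caller therefore has to separate successes from failures. The sketch below shows one way to do that. `create_user` is a stand-in for the real creation and login task, and the failure of user 2 is simulated.

```python
import asyncio


async def create_user(index: int) -> str:
    """Stand-in for user creation plus OAuth login (hypothetical)."""
    await asyncio.sleep(0.01)
    if index == 2:
        raise RuntimeError(f"login failed for user {index}")
    return f"loadtest_user_{index}"


async def create_all(num_users: int) -> tuple:
    """Run all creations concurrently and split results from exceptions."""
    results = await asyncio.gather(
        *(create_user(i) for i in range(1, num_users + 1)),
        return_exceptions=True,
    )
    # gather() preserves input order, so positions still map to user indices.
    created = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return created, failed


created, failed = asyncio.run(create_all(4))
print(f"created={created}")
print(f"failed={[str(exc) for exc in failed]}")
```

Checking against `BaseException` rather than `Exception` also catches a cancelled task, whose `CancelledError` would otherwise be mistaken for a successful result.
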
## Cleanup

**Important**: Due to asyncio scoping issues with the MCP client library, automatic cleanup in the benchmark's `finally` block may not execute reliably. Always run the cleanup utility after running benchmarks.

### Cleanup Utility (Recommended)

Use the cleanup utility to remove test users:

```bash
# Dry run - see what would be deleted
uv run python -m tests.load.cleanup_loadtest_users --dry-run

# Delete all loadtest users
uv run python -m tests.load.cleanup_loadtest_users

# Delete users with custom prefix
uv run python -m tests.load.cleanup_loadtest_users --prefix mytest
```

### Disable Automatic Cleanup

To keep test users after the benchmark for inspection:

```bash
uv run python -m tests.load.oauth_benchmark --users 2 --no-cleanup
```

## Troubleshooting

### Leftover Test Users
**Symptom**: Test users remain in Nextcloud after benchmark crashes

**Solution**: Run the cleanup utility:
```bash
uv run python -m tests.load.cleanup_loadtest_users
```

### "User X not in pool" Error
- Ensure user count doesn't exceed configured limits
- Check that user creation succeeded in previous steps

### CancelledError During Benchmark
**Symptom**: An error message like `'CancelledError' object has no attribute 'username'` appears in the logs

**Cause**: Async task cancellation during benchmark shutdown or after errors can cause race conditions in error handling

**Solution**: This has been mitigated with defensive error handling. The worker now:
- Catches `asyncio.CancelledError` specifically before general exceptions
- Logs cancellation gracefully without attempting to access potentially invalid state
- Re-raises the exception so that the cleanup chain can run

If you still see this error, it most likely occurs during shutdown and is harmless. The benchmark results should still be valid.

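The handling order described above can be sketched as follows. This illustrates the pattern rather than reproducing the benchmark's worker: the `worker` body and the logger name are hypothetical. Since Python 3.8, `asyncio.CancelledError` derives from `BaseException`, so the explicit clause mainly documents intent and guards any code that catches broader exception types.

```python
import asyncio
import logging

logger = logging.getLogger("oauth_benchmark_sketch")


async def worker(name: str, errors: list) -> None:
    """Run operations until cancelled; record ordinary failures only."""
    try:
        while True:
            await asyncio.sleep(0.01)  # stands in for one benchmark operation
    except asyncio.CancelledError:
        # Log using local data only, then re-raise so cancellation propagates.
        logger.info("worker %s cancelled", name)
        raise
    except Exception as exc:
        errors.append(f"{name}: {exc}")


async def _demo() -> tuple:
    errors = []
    task = asyncio.create_task(worker("loadtest_user_1", errors))
    await asyncio.sleep(0.03)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: the worker re-raised the cancellation
    return task.cancelled(), errors


was_cancelled, recorded_errors = asyncio.run(_demo())
print(f"cancelled={was_cancelled} errors={recorded_errors}")
```

Swallowing the cancellation instead of re-raising would leave the task reported as finished normally, which can hide shutdown problems from the code awaiting it.
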
### High Error Rates
- Increase delay between operations (`await asyncio.sleep()` in worker)
- Check OAuth token validity
- Verify MCP OAuth server is running and accessible (port 8001)
- Rebuild mcp-oauth container after code changes: `docker-compose up --build -d mcp-oauth`

### Workflows Failing
- Check step-by-step latencies to identify failing steps
- Verify users have correct permissions
- Review server logs for errors

### MCP Session Creation Fails (401 Unauthorized)
**Solution**: This issue has been fixed. OAuth tokens are now passed as Authorization headers when creating MCP sessions.

If you still see 401 errors:
- Rebuild the mcp-oauth container: `docker-compose up --build -d mcp-oauth`
- Verify OAuth tokens are being acquired successfully in verbose mode
- Check that the token hasn't expired (use shorter test durations during troubleshooting)

## Future Enhancements

- [x] Dynamic user creation (beyond 4 default users) - **COMPLETED**
- [x] OAuth token injection for MCP sessions - **COMPLETED**
- [x] Cancel scope error handling - **COMPLETED**
- [x] Concurrent user creation and authentication - **COMPLETED** (3-4x speedup)
- [ ] Workflow templates for common patterns
- [ ] Real-time dashboard for live monitoring
- [ ] Historical comparison and regression detection
- [ ] Load ramping (gradual user increase)
- [ ] Geographic distribution simulation (latency injection)
- [ ] Improve cleanup reliability in finally block