OAuth Multi-User Load Testing Framework
Comprehensive multi-user benchmarking system for testing an OAuth-authenticated Nextcloud MCP server with realistic collaborative workflows.
Quick Start
# 1. Ensure docker-compose is running
docker-compose up -d
# 2. Run a benchmark with 2 users for 30 seconds
uv run python -m tests.load.oauth_benchmark --users 2 --duration 30
# 3. Clean up test users (IMPORTANT - always run after benchmark)
uv run python -m tests.load.cleanup_loadtest_users
# Optional: Verify cleanup
uv run python -m tests.load.cleanup_loadtest_users --dry-run
Overview
This framework extends the basic load testing infrastructure to support:
- Multiple OAuth-authenticated users running concurrently
- Coordinated workflows spanning multiple users (sharing, collaboration, permissions)
- Per-user metrics tracking individual user performance
- Workflow-specific metrics measuring cross-user operation latencies
- Realistic scenarios mimicking actual user collaboration patterns
- Concurrent user creation - all users created and authenticated in parallel for fast setup
Architecture
Components
tests/load/
├── oauth_pool.py # OAuth user pool management
├── oauth_workloads.py # Multi-user workflow definitions
├── oauth_metrics.py # Enhanced metrics collection
├── oauth_benchmark.py # Main CLI entry point
└── README_OAUTH.md # This file
Key Classes
OAuthUserPool (oauth_pool.py)
- Manages N OAuth-authenticated users
- Handles token acquisition and storage
- Creates and manages MCP sessions per user
- Tracks per-user operation statistics
UserSessionWrapper (oauth_pool.py)
- Wraps MCP ClientSession for a specific user
- Automatic operation tracking
- Convenient tool/resource access methods
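The automatic tracking described above can be pictured as a thin wrapper that times every call and records whether it succeeded. This is a minimal sketch, not the code in oauth_pool.py: `TrackedSession`, `OperationStats` and `FakeSession` are hypothetical names, and the wrapped object only needs an async `call_tool` method.

```python
import asyncio
import time
from dataclasses import dataclass, field


@dataclass
class OperationStats:
    """Per-user counters (hypothetical structure, for illustration only)."""
    total: int = 0
    errors: int = 0
    latencies: list[float] = field(default_factory=list)


class TrackedSession:
    """Wraps any object exposing `async call_tool(name, arguments)` and records timing."""

    def __init__(self, username, session):
        self.username = username
        self._session = session
        self.stats = OperationStats()

    async def call_tool(self, name, arguments):
        start = time.perf_counter()
        self.stats.total += 1
        try:
            return await self._session.call_tool(name, arguments)
        except Exception:
            self.stats.errors += 1
            raise
        finally:
            # Latency is recorded for failed calls as well as successful ones.
            self.stats.latencies.append(time.perf_counter() - start)


class FakeSession:
    """Stand-in for an MCP ClientSession so the sketch runs without a server."""

    async def call_tool(self, name, arguments):
        if name == "fail":
            raise RuntimeError("simulated tool error")
        return {"tool": name, "arguments": arguments}


async def demo():
    user = TrackedSession("loadtest_user_1", FakeSession())
    await user.call_tool("nc_notes_create_note", {"title": "hello"})
    try:
        await user.call_tool("fail", {})
    except RuntimeError:
        pass
    return user.stats


stats = asyncio.run(demo())
```

Keeping the bookkeeping in the wrapper means workflow code never has to time its own calls.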
Workflow (oauth_workloads.py)
- Base class for multi-user coordinated workflows
- Step-by-step execution with timing
- Comprehensive error handling and reporting
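Step-by-step execution with timing might look roughly like the sketch below. The method names `_execute_step` and `_finish` appear in the custom workflow example later in this document, but their bodies, the `StepResult` type and the fields of `WorkflowResult` are assumptions made for illustration.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StepResult:
    name: str
    success: bool
    duration: float
    error: Optional[str] = None


@dataclass
class WorkflowResult:
    workflow: str
    success: bool
    duration: float
    steps: list = field(default_factory=list)
    error: Optional[str] = None


class Workflow:
    """Hypothetical base class: times each step and collects the outcomes."""

    def __init__(self, name):
        self.name = name
        self.start_time = 0.0
        self.steps = []

    async def _execute_step(self, step_name, user, operation):
        # Returns the operation's value, or None if the step raised.
        start = time.perf_counter()
        try:
            value = await operation()
        except Exception as exc:
            elapsed = time.perf_counter() - start
            self.steps.append(StepResult(step_name, False, elapsed, str(exc)))
            return None
        self.steps.append(StepResult(step_name, True, time.perf_counter() - start))
        return value

    def _finish(self, success, error=None):
        duration = time.time() - self.start_time
        return WorkflowResult(self.name, success, duration, list(self.steps), error)


class TwoStepWorkflow(Workflow):
    """Demo workflow whose second step always fails."""

    async def execute(self, users):
        self.start_time = time.time()

        async def create():
            return "created"

        async def share():
            raise RuntimeError("share failed")

        await self._execute_step("create_note", users[0], create)
        shared = await self._execute_step("share_note", users[0], share)
        if shared is None:
            return self._finish(False, error="share_note failed")
        return self._finish(True)


result = asyncio.run(TwoStepWorkflow("demo").execute(["alice", "bob"]))
```

Recording a result for failed steps too is what makes it possible to see which step of a workflow broke, not just that the workflow failed.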
OAuthBenchmarkMetrics (oauth_metrics.py)
- Per-user operation counts and latencies
- Workflow completion rates and timings
- Baseline operation statistics
- Detailed reporting and JSON export
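As a rough illustration of the latency statistics reported per user and per workflow, the sketch below computes the same fields that appear in the JSON export (min, max, mean, median, p90, p95, p99). The function name and the nearest-rank percentile method are assumptions; the actual oauth_metrics.py may calculate percentiles differently.

```python
import math
import statistics


def latency_summary(samples):
    """Summarize latencies (in seconds) using nearest-rank percentiles."""
    if not samples:
        return {}
    ordered = sorted(samples)

    def percentile(p):
        # Nearest rank: the smallest sample with at least p% of samples at or below it.
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    return {
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p90": percentile(90),
        "p95": percentile(95),
        "p99": percentile(99),
    }


summary = latency_summary([float(i) for i in range(1, 11)])
```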
Available Workflows
1. NoteShareWorkflow
Scenario: Alice creates a note and shares it with Bob, who then reads it.
Steps:
- User A creates a note
- User A shares note with User B (read-only permissions)
- User B lists their shared notes (measures propagation delay)
- User B reads the shared note
Metrics: Creation latency, share propagation time, read latency
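Share propagation time can be measured by polling the recipient's share list until the new item shows up. The following sketch assumes a polling approach; `wait_for_share` and `FakeServer` are hypothetical names, and the fake server simply makes the share visible after a fixed delay so the example runs without Nextcloud.

```python
import asyncio
import time


async def wait_for_share(list_shares, note_id, timeout=5.0, interval=0.05):
    """Poll `list_shares()` until `note_id` appears and return the elapsed seconds.

    Raises TimeoutError if the share is not visible within `timeout` seconds.
    """
    start = time.monotonic()
    while True:
        if note_id in await list_shares():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            raise TimeoutError(f"share {note_id} not visible after {timeout}s")
        await asyncio.sleep(interval)


class FakeServer:
    """Simulated server where a share becomes visible after `delay` seconds."""

    def __init__(self, delay):
        self._visible_at = time.monotonic() + delay

    async def list_shared_with_me(self):
        return [42] if time.monotonic() >= self._visible_at else []


async def demo():
    server = FakeServer(delay=0.2)
    return await wait_for_share(server.list_shared_with_me, note_id=42)


propagation = asyncio.run(demo())
```

The polling interval bounds the measurement resolution, so a shorter interval gives a tighter estimate at the cost of more requests.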
2. CollaborativeEditWorkflow
Scenario: Multiple users concurrently edit the same note.
Steps:
- Owner creates a note
- All users read the note simultaneously
- All users append content concurrently
- Owner verifies final state
Metrics: Concurrent read latency, concurrent write conflicts, final state consistency
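To show what a concurrent write conflict looks like, the sketch below runs several appenders against an in-memory note guarded by a version counter that plays the role of an ETag. Everything here (`InMemoryNote`, `ConflictError`, `append_with_retry`) is hypothetical and no Nextcloud API is involved; it only illustrates the read-modify-write retry pattern.

```python
import asyncio


class ConflictError(Exception):
    """Raised when an update is based on a stale version."""


class InMemoryNote:
    """Toy note store with optimistic concurrency control."""

    def __init__(self, content=""):
        self.content = content
        self.etag = 0

    async def read(self):
        await asyncio.sleep(0)  # yield so other tasks can interleave
        return self.content, self.etag

    async def update(self, new_content, expected_etag):
        await asyncio.sleep(0)
        if expected_etag != self.etag:
            raise ConflictError("stale etag")
        self.content = new_content
        self.etag += 1


async def append_with_retry(note, text, max_attempts=10):
    """Read-modify-write loop; returns how many conflicts were hit."""
    conflicts = 0
    for _ in range(max_attempts):
        content, etag = await note.read()
        try:
            await note.update(content + text, etag)
            return conflicts
        except ConflictError:
            conflicts += 1
    raise RuntimeError("gave up after repeated conflicts")


async def demo(num_users=4):
    note = InMemoryNote("start")
    conflicts = await asyncio.gather(
        *(append_with_retry(note, f"|user{i}") for i in range(num_users))
    )
    return note, sum(conflicts)


note, total_conflicts = asyncio.run(demo())
```

Final state consistency then means that every user's append is present exactly once, regardless of how many retries were needed.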
3. FileShareAndDownloadWorkflow
Scenario: Alice uploads a file, shares it with Bob, who then downloads it.
Steps:
- User A creates a file via WebDAV
- User A shares file with User B (read-only)
- User B lists their shares
- User B downloads the file
Metrics: Upload latency, share creation, download latency
4. MixedOAuthWorkload
Distribution:
- 50% Baseline operations (individual user CRUD)
- 30% Note sharing workflows
- 15% Collaborative editing workflows
- 5% File sharing workflows
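One simple way to realize such a distribution is weighted random selection. The sketch below is an assumption about how the mix could be implemented, not a copy of MixedOAuthWorkload; `WORKLOAD_MIX` and `pick_operation` are hypothetical names.

```python
import random
from collections import Counter

# Weights mirror the percentages listed above.
WORKLOAD_MIX = {
    "baseline": 50,
    "note_share": 30,
    "collaborative_edit": 15,
    "file_share": 5,
}


def pick_operation(rng):
    """Choose the next operation type according to WORKLOAD_MIX."""
    names = list(WORKLOAD_MIX)
    weights = list(WORKLOAD_MIX.values())
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(1234)  # seeded so repeated runs pick the same sequence
counts = Counter(pick_operation(rng) for _ in range(10_000))
```

Over a long run the observed counts approach the configured percentages, while short runs can deviate noticeably, which is worth remembering when comparing brief benchmarks.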
Usage
Basic Usage
# 4 users, 60-second test with mixed workload
uv run python -m tests.load.oauth_benchmark --users 4 --duration 60
# 10 users, 5-minute test
uv run python -m tests.load.oauth_benchmark -u 10 -d 300
# Export results to JSON
uv run python -m tests.load.oauth_benchmark -u 5 -d 120 --output results.json
Advanced Options
# Sharing-focused workload
uv run python -m tests.load.oauth_benchmark --workload sharing -u 8 -d 180
# Collaborative editing workload
uv run python -m tests.load.oauth_benchmark --workload collaboration -u 6 -d 120
# Baseline operations only (no workflows)
uv run python -m tests.load.oauth_benchmark --workload baseline -u 10 -d 60
# Verbose logging for debugging
uv run python -m tests.load.oauth_benchmark -u 2 -d 30 --verbose
CLI Options
| Option | Short | Default | Description |
|---|---|---|---|
| --users | -u | 2 | Number of concurrent users (dynamically created) |
| --duration | -d | 30.0 | Test duration in seconds |
| --warmup | -w | 5.0 | Warmup period before metrics collection (seconds) |
| --url | | http://localhost:8001/mcp | MCP OAuth server URL |
| --output | -o | None | JSON output file path |
| --workload | | mixed | Workload type: mixed, sharing, collaboration, baseline |
| --user-prefix | | loadtest | Prefix for dynamically created usernames |
| --cleanup/--no-cleanup | | cleanup | Delete created users after benchmark |
| --browser | | chromium | Playwright browser: firefox, chromium, webkit |
| --headed | | False | Run browser in headed mode (visible window) |
| --verbose | -v | False | Enable verbose logging |
Test User Creation
The framework dynamically creates test users on-demand with OAuth authentication:
- Naming: Users are created with the pattern {prefix}_user_{n} (default: loadtest_user_1, loadtest_user_2, etc.)
- Customization: Use --user-prefix to change the prefix (e.g., --user-prefix mytest → mytest_user_1)
- Scalability: No limit on user count - create as many concurrent users as your system can handle
- Credentials: Each user gets a randomly generated secure password
- OAuth Tokens: All users authenticate via automated OAuth flow using Playwright
- Cleanup: Users are deleted automatically after the benchmark (disable with --no-cleanup). Automatic deletion can fail, so also run the cleanup utility described under Cleanup
Example: Running --users 5 creates:
- loadtest_user_1 (Display: Load Test User 1, Email: loadtest_user_1@benchmark.local)
- loadtest_user_2 (Display: Load Test User 2, Email: loadtest_user_2@benchmark.local)
- loadtest_user_3 (Display: Load Test User 3, Email: loadtest_user_3@benchmark.local)
- loadtest_user_4 (Display: Load Test User 4, Email: loadtest_user_4@benchmark.local)
- loadtest_user_5 (Display: Load Test User 5, Email: loadtest_user_5@benchmark.local)
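The naming scheme and random passwords can be sketched as follows. `LoadTestUser` and `build_test_users` are hypothetical names, the password length is an arbitrary choice, and the display name for a custom prefix is assumed to stay "Load Test User N". Actual user creation goes through the Nextcloud Users API and is not shown.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadTestUser:
    username: str
    password: str
    display_name: str
    email: str


def build_test_users(count, prefix="loadtest"):
    """Build user profiles following the {prefix}_user_{n} pattern (n starts at 1)."""
    users = []
    for n in range(1, count + 1):
        username = f"{prefix}_user_{n}"
        users.append(
            LoadTestUser(
                username=username,
                # token_urlsafe draws from the OS cryptographic random source.
                password=secrets.token_urlsafe(24),
                display_name=f"Load Test User {n}",
                email=f"{username}@benchmark.local",
            )
        )
    return users


users = build_test_users(5)
```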
Metrics Output
Console Report
================================================================================
OAUTH MULTI-USER BENCHMARK RESULTS
================================================================================
Duration: 120.45s
Total Users: 5
Total Workflows Executed: 312
Total Baseline Operations: 678
--------------------------------------------------------------------------------
WORKFLOW STATISTICS
--------------------------------------------------------------------------------
Workflow                       Total    Success       Rate        P50        P95
--------------------------------------------------------------------------------
note_share                       112        109      97.3%    0.2341s    0.4782s
collaborative_edit                65         61      93.8%    0.5123s    0.9234s
file_share                        29         29     100.0%    0.3456s    0.6123s
--------------------------------------------------------------------------------
PER-USER STATISTICS
--------------------------------------------------------------------------------
User                       Total Ops    Success     Errors       Rate        P50
--------------------------------------------------------------------------------
loadtest_user_1                  289        283          6      97.9%    0.2456s
loadtest_user_2                  245        241          4      98.4%    0.2123s
loadtest_user_3                  231        226          5      97.8%    0.2345s
loadtest_user_4                  198        195          3      98.5%    0.2234s
loadtest_user_5                  187        184          3      98.4%    0.2189s
--------------------------------------------------------------------------------
BASELINE OPERATIONS
--------------------------------------------------------------------------------
Total Operations: 678
Success Rate: 98.2%
Latency: min=0.0234s, p50=0.1234s, p95=0.3456s, max=0.8123s
================================================================================
JSON Export
{
  "summary": {
    "duration": 120.45,
    "total_workflows": 312,
    "total_baseline_ops": 678,
    "total_users": 5
  },
  "workflows": {
    "note_share": {
      "total_executions": 112,
      "successful_executions": 109,
      "failed_executions": 3,
      "success_rate": 97.3,
      "latency": {
        "min": 0.1234,
        "max": 0.8765,
        "mean": 0.2891,
        "median": 0.2341,
        "p90": 0.4123,
        "p95": 0.4782,
        "p99": 0.7234
      },
      "step_latencies": {
        "create_note": {...},
        "share_note": {...},
        "list_shared_with_me": {...},
        "read_shared_note": {...}
      }
    }
  },
  "users": {
    "loadtest_user_1": {
      "total_operations": 289,
      "successful_operations": 283,
      "failed_operations": 6,
      "success_rate": 97.9,
      "latency": {...},
      "operations_breakdown": {...},
      "errors_breakdown": {...}
    },
    "loadtest_user_2": {...},
    "loadtest_user_3": {...},
    "loadtest_user_4": {...},
    "loadtest_user_5": {...}
  },
  "baseline": {...}
}
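An exported report can be post-processed to find the slowest step of a workflow. The sketch below assumes that each entry under step_latencies has the same shape as the latency object (the export above elides those entries, so this is a guess), and the step numbers in `SAMPLE` are invented purely for illustration. `slowest_steps` is a hypothetical helper.

```python
import json

# Invented sample data; only the key names follow the export format shown above.
SAMPLE = json.loads("""
{
  "workflows": {
    "note_share": {
      "step_latencies": {
        "create_note": {"median": 0.05, "p95": 0.09},
        "share_note": {"median": 0.06, "p95": 0.12},
        "list_shared_with_me": {"median": 0.08, "p95": 0.21},
        "read_shared_note": {"median": 0.04, "p95": 0.07}
      }
    }
  }
}
""")


def slowest_steps(report, workflow, metric="p95"):
    """Return (step, value) pairs for one workflow, slowest first."""
    steps = report["workflows"][workflow].get("step_latencies", {})
    ranked = sorted(
        steps.items(),
        key=lambda item: item[1].get(metric, 0.0),
        reverse=True,
    )
    return [(name, stats.get(metric)) for name, stats in ranked]


ranking = slowest_steps(SAMPLE, "note_share")
```

With a real export, load the file written by --output with `json.load` instead of using `SAMPLE`.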
Implementation Status
✅ Completed Components
Framework:
- OAuth user pool management with dynamic user creation
- User session wrappers with automatic tracking
- Workflow base classes and framework
- 3 example workflows (note share, collaborative edit, file share)
- Enhanced metrics with per-user and workflow tracking
- CLI interface with multiple workload options
- Comprehensive reporting (console + JSON)
OAuth Integration:
- ✅ Playwright browser automation for OAuth login
- ✅ OAuth callback server for auth code capture
- ✅ Token exchange with OIDC provider
- ✅ OAuth token injection into MCP sessions via Authorization headers
- ✅ Cancel scope error handling for reliable cleanup
- ✅ Dynamic user creation and deletion via Nextcloud Users API
Implementation Details: The benchmark now successfully:
- Creates Nextcloud users dynamically with unique passwords
- Acquires OAuth tokens via automated Playwright browser flows
- Creates MCP client sessions with proper Authorization: Bearer {token} headers
- Executes coordinated multi-user workflows
- Tracks per-user and per-workflow metrics
- Provides standalone cleanup utility for test users
Key Fix (oauth_pool.py:163-164):
# Pass OAuth token as Authorization header
headers = {"Authorization": f"Bearer {profile.token}"}
streamable_context = streamablehttp_client(mcp_url, headers=headers)
Creating Custom Workflows
Example: Permission Escalation Workflow
import time

# Workflow is defined in oauth_workloads.py and UserSessionWrapper in oauth_pool.py.


class PermissionEscalationWorkflow(Workflow):
    """Test sharing permission changes."""

    def __init__(self):
        super().__init__("permission_escalation")

    async def execute(self, users: list[UserSessionWrapper]) -> WorkflowResult:
        self.start_time = time.time()
        if len(users) < 2:
            return self._finish(False, error="Requires 2+ users")
        owner, collaborator = users[0], users[1]

        # Step 1: Owner creates note
        create_result = await self._execute_step(
            "create_note",
            owner,
            lambda: owner.call_tool("nc_notes_create_note", {...})
        )

        # Step 2: Share read-only
        await self._execute_step(
            "share_readonly",
            owner,
            lambda: owner.call_tool("nc_share_create", {
                "permissions": 1  # Read-only
            })
        )

        # Step 3: Upgrade to edit permissions
        await self._execute_step(
            "upgrade_permissions",
            owner,
            lambda: owner.call_tool("nc_share_update", {
                "permissions": 15  # Read+update+create+delete
            })
        )

        # Step 4: Collaborator edits
        await self._execute_step(
            "collaborator_edit",
            collaborator,
            lambda: collaborator.call_tool("nc_notes_update_note", {...})
        )

        return self._finish(success=True)
Registering Custom Workflows
# In oauth_workloads.py
class MixedOAuthWorkload:
    def __init__(self, users: list[UserSessionWrapper]):
        self.users = users
        self.workflows = {
            "note_share": NoteShareWorkflow(),
            "collaborative_edit": CollaborativeEditWorkflow(),
            "file_share": FileShareAndDownloadWorkflow(),
            "permission_escalation": PermissionEscalationWorkflow(),  # Add your workflow
        }
Performance Expectations
Baseline Performance (basic auth, from existing benchmarks)
- Throughput: 50-200 RPS for mixed workload
- Latency: p50 <100ms, p95 <500ms, p99 <1000ms
OAuth Multi-User Expectations
- Lower throughput: ~30-60% of baseline due to:
- OAuth token validation overhead
- Cross-user synchronization delays
- Workflow coordination overhead
- Higher p99 latency: Due to workflow step dependencies
- Focus: End-to-end workflow completion time more important than raw RPS
Common Bottlenecks
- OAuth token validation: Per-request overhead
- Share propagation: Time for shares to become visible to recipients
- Concurrent edit conflicts: ETags and conflict resolution
- Permission checks: Cross-user access validation
Best Practices
- Start Small: Begin with 2-3 users to validate workflows
- Monitor Errors: Watch for permission errors and conflicts
- Adjust Delays: Tune sleep delays between operations based on server response
- Profile Workflows: Use step latencies to identify bottlenecks
- Export Results: Always export to JSON for historical comparison
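For historical comparison, two exported reports can be diffed on a chosen latency metric. The following is a minimal sketch assuming the export format shown under JSON Export; `find_regressions`, the 10% tolerance and the sample values are all hypothetical.

```python
def find_regressions(baseline, current, metric="p95", tolerance=0.10):
    """Return {workflow: percent increase} where `metric` grew by more than `tolerance`."""
    regressions = {}
    for name, cur in current.get("workflows", {}).items():
        base = baseline.get("workflows", {}).get(name)
        if base is None:
            continue  # workflow did not exist in the older run
        old = base["latency"][metric]
        new = cur["latency"][metric]
        if old > 0 and (new - old) / old > tolerance:
            regressions[name] = round((new - old) / old * 100, 1)
    return regressions


# Invented numbers standing in for two exported reports.
previous = {
    "workflows": {
        "note_share": {"latency": {"p95": 0.40}},
        "file_share": {"latency": {"p95": 0.60}},
    }
}
latest = {
    "workflows": {
        "note_share": {"latency": {"p95": 0.50}},
        "file_share": {"latency": {"p95": 0.61}},
    }
}

regressions = find_regressions(previous, latest)
```

Comparisons are only meaningful between runs with the same user count, duration and workload type.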
Performance Optimizations
Concurrent User Creation
The benchmark creates and authenticates users concurrently for maximum performance:
Step 5: User Creation & OAuth Authentication
- All N users are created in parallel using asyncio.gather()
- Each user runs through the full OAuth flow simultaneously
- Multiple Playwright browser contexts operate independently
Step 6: MCP Session Creation
- All user sessions are created concurrently
- OAuth tokens passed as Authorization headers to each session
Performance Impact:
- Sequential (old): ~10-12s per user → 40-48s for 4 users
- Concurrent (new): ~12-15s total for 4 users (3-4x speedup!)
Example output showing concurrent execution:
Step 5/6: Creating 4 users and acquiring OAuth tokens...
(Running concurrently for faster setup)
[1/4] Creating user 'loadtest_user_1'...
[2/4] Creating user 'loadtest_user_2'...
[3/4] Creating user 'loadtest_user_3'...
[4/4] Creating user 'loadtest_user_4'...
✓ User 'loadtest_user_4' authenticated
✓ User 'loadtest_user_2' authenticated
✓ User 'loadtest_user_1' authenticated
✓ User 'loadtest_user_3' authenticated
✓ Successfully created and authenticated 4 users
Implementation (oauth_benchmark.py:402-437):
# Create tasks for all users
tasks = [
    create_user_task(i, browser, callback_server.auth_states)
    for i in range(num_users)
]
# Run all concurrently
results = await asyncio.gather(*tasks, return_exceptions=True)
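Because `return_exceptions=True` places exception objects in the result list instead of raising them, the caller has to separate failures from successes afterwards. The sketch below shows one way to do that. The simplified `create_user_task` here is a simulated stand-in with a different signature from the real one, and `create_all` is a hypothetical name.

```python
import asyncio


async def create_user_task(index):
    """Simulated create-and-authenticate step; the third user fails on purpose."""
    await asyncio.sleep(0.05)
    if index == 2:
        raise RuntimeError(f"OAuth login failed for user {index + 1}")
    return f"loadtest_user_{index + 1}"


async def create_all(num_users):
    tasks = [create_user_task(i) for i in range(num_users)]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    created, failures = [], []
    for index, result in enumerate(results):
        # BaseException also covers asyncio.CancelledError,
        # which is not a subclass of Exception on Python 3.8+.
        if isinstance(result, BaseException):
            failures.append((index, result))
        else:
            created.append(result)
    return created, failures


created, failures = asyncio.run(create_all(4))
```

Results come back in task order, so the index of a failure identifies which user could not be created.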
Cleanup
Important: Due to asyncio scoping issues with the MCP client library, automatic cleanup in the benchmark's finally block may not execute reliably. Always use the cleanup utility after running benchmarks.
Cleanup Utility (Recommended)
Use the cleanup utility to remove test users:
# Dry run - see what would be deleted
uv run python -m tests.load.cleanup_loadtest_users --dry-run
# Delete all loadtest users
uv run python -m tests.load.cleanup_loadtest_users
# Delete users with custom prefix
uv run python -m tests.load.cleanup_loadtest_users --prefix mytest
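The prefix matching and dry-run behaviour can be pictured with the sketch below. It is an assumption about how such a utility might select users, not the actual cleanup_loadtest_users code: `select_test_users` and `cleanup` are hypothetical, and deletion is injected as a callback so the example runs without Nextcloud.

```python
import re


def select_test_users(all_users, prefix="loadtest"):
    """Return usernames that match {prefix}_user_{n} exactly."""
    pattern = re.compile(rf"^{re.escape(prefix)}_user_\d+$")
    return sorted(name for name in all_users if pattern.match(name))


def cleanup(all_users, delete, prefix="loadtest", dry_run=False):
    """Delete matching users via `delete(username)`; a dry run only reports them."""
    targets = select_test_users(all_users, prefix)
    if not dry_run:
        for username in targets:
            delete(username)
    return targets


existing = ["admin", "loadtest_user_1", "loadtest_user_2", "mytest_user_1", "loadtester"]
deleted = []

would_delete = cleanup(existing, deleted.append, dry_run=True)
after_dry_run = list(deleted)
cleanup(existing, deleted.append)
```

Anchoring the pattern at both ends keeps accounts such as `loadtester` or `admin` out of the deletion list.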
Disable Automatic Cleanup
To keep test users after the benchmark for inspection:
uv run python -m tests.load.oauth_benchmark --users 2 --no-cleanup
Troubleshooting
Leftover Test Users
Symptom: Test users remain in Nextcloud after benchmark crashes
Solution: Run the cleanup utility:
uv run python -m tests.load.cleanup_loadtest_users
"User X not in pool" Error
- Ensure user count doesn't exceed configured limits
- Check that user creation succeeded in previous steps
CancelledError During Benchmark
Symptom: Error message like 'CancelledError' object has no attribute 'username' appears in logs
Cause: Async task cancellation during benchmark shutdown or errors can cause race conditions in error handling
Solution: This has been mitigated with defensive error handling. The worker now:
- Catches asyncio.CancelledError specifically before general exceptions
- Logs cancellation gracefully without attempting to access potentially invalid state
- Re-raises the exception to allow proper cleanup chain
If you still see this error, it's likely harmless and occurs during shutdown. The benchmark results should still be valid.
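The defensive pattern described above can be sketched as follows. This is an illustration under assumptions, not the benchmark's worker: `worker`, `demo` and the `log` list are hypothetical. It also shows why the original error could occur, since `asyncio.gather(..., return_exceptions=True)` returns `CancelledError` objects in place of results.

```python
import asyncio

log = []


async def worker(username):
    try:
        while True:
            await asyncio.sleep(0.01)  # stands in for one benchmark operation
    except asyncio.CancelledError:
        # Handle cancellation first and touch as little state as possible.
        log.append(f"{username}: cancelled")
        raise  # re-raise so the cancellation keeps propagating
    except Exception as exc:
        log.append(f"{username}: failed with {exc!r}")


async def demo():
    tasks = [asyncio.create_task(worker(f"loadtest_user_{i}")) for i in (1, 2)]
    await asyncio.sleep(0.05)
    for task in tasks:
        task.cancel()
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Entries may be exception objects, so check the type before reading attributes.
    return [type(result).__name__ for result in results]


outcome = asyncio.run(demo())
```

Code that assumes every entry in `results` is a user or result object will fail with an attribute error on these entries, which matches the symptom above.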
High Error Rates
- Increase delay between operations (await asyncio.sleep() in worker)
- Check OAuth token validity
- Verify MCP OAuth server is running and accessible (port 8001)
- Rebuild mcp-oauth container after code changes:
docker-compose up --build -d mcp-oauth
Workflows Failing
- Check step-by-step latencies to identify failing steps
- Verify users have correct permissions
- Review server logs for errors
MCP Session Creation Fails (401 Unauthorized)
Solution: This issue has been fixed! OAuth tokens are now properly passed as Authorization headers when creating MCP sessions.
If you still see 401 errors:
- Rebuild the mcp-oauth container: docker-compose up --build -d mcp-oauth
- Verify OAuth tokens are being acquired successfully in verbose mode
- Check that the token hasn't expired (use shorter test durations during troubleshooting)
Future Enhancements
- Dynamic user creation (beyond 4 default users) - COMPLETED
- OAuth token injection for MCP sessions - COMPLETED
- Cancel scope error handling - COMPLETED
- Concurrent user creation and authentication - COMPLETED (3-4x speedup!)
- Workflow templates for common patterns
- Real-time dashboard for live monitoring
- Historical comparison and regression detection
- Load ramping (gradual user increase)
- Geographic distribution simulation (latency injection)
- Improve cleanup reliability in finally block