Files

18 KiB

OAuth Multi-User Load Testing Framework

Comprehensive multi-user benchmarking system for testing OAuth-authenticated Nextcloud MCP server with realistic collaborative workflows.

Quick Start

# 1. Ensure docker-compose is running
docker-compose up -d

# 2. Run a benchmark with 2 users for 30 seconds
uv run python -m tests.load.oauth_benchmark --users 2 --duration 30

# 3. Clean up test users (IMPORTANT - always run after benchmark)
uv run python -m tests.load.cleanup_loadtest_users

# Optional: Verify cleanup
uv run python -m tests.load.cleanup_loadtest_users --dry-run

Overview

This framework extends the basic load testing infrastructure to support:

  • Multiple OAuth-authenticated users running concurrently
  • Coordinated workflows spanning multiple users (sharing, collaboration, permissions)
  • Per-user metrics tracking individual user performance
  • Workflow-specific metrics measuring cross-user operation latencies
  • Realistic scenarios mimicking actual user collaboration patterns
  • Concurrent user creation - all users created and authenticated in parallel for fast setup

Architecture

Components

tests/load/
├── oauth_pool.py          # OAuth user pool management
├── oauth_workloads.py     # Multi-user workflow definitions
├── oauth_metrics.py       # Enhanced metrics collection
├── oauth_benchmark.py     # Main CLI entry point
└── README_OAUTH.md        # This file

Key Classes

OAuthUserPool (oauth_pool.py)

  • Manages N OAuth-authenticated users
  • Handles token acquisition and storage
  • Creates and manages MCP sessions per user
  • Tracks per-user operation statistics

UserSessionWrapper (oauth_pool.py)

  • Wraps MCP ClientSession for a specific user
  • Automatic operation tracking
  • Convenient tool/resource access methods

Workflow (oauth_workloads.py)

  • Base class for multi-user coordinated workflows
  • Step-by-step execution with timing
  • Comprehensive error handling and reporting

OAuthBenchmarkMetrics (oauth_metrics.py)

  • Per-user operation counts and latencies
  • Workflow completion rates and timings
  • Baseline operation statistics
  • Detailed reporting and JSON export

Available Workflows

1. NoteShareWorkflow

Scenario: Alice creates a note and shares it with Bob, who then reads it.

Steps:

  1. User A creates a note
  2. User A shares note with User B (read-only permissions)
  3. User B lists their shared notes (measures propagation delay)
  4. User B reads the shared note

Metrics: Creation latency, share propagation time, read latency

2. CollaborativeEditWorkflow

Scenario: Multiple users concurrently edit the same note.

Steps:

  1. Owner creates a note
  2. All users read the note simultaneously
  3. All users append content concurrently
  4. Owner verifies final state

Metrics: Concurrent read latency, concurrent write conflicts, final state consistency

3. FileShareAndDownloadWorkflow

Scenario: Alice uploads a file, shares it with Bob, who then downloads it.

Steps:

  1. User A creates a file via WebDAV
  2. User A shares file with User B (read-only)
  3. User B lists their shares
  4. User B downloads the file

Metrics: Upload latency, share creation, download latency

4. MixedOAuthWorkload

Distribution:

  • 50% Baseline operations (individual user CRUD)
  • 30% Note sharing workflows
  • 15% Collaborative editing workflows
  • 5% File sharing workflows

Usage

Basic Usage

# 4 users, 60-second test with mixed workload
uv run python -m tests.load.oauth_benchmark --users 4 --duration 60

# 10 users, 5-minute test
uv run python -m tests.load.oauth_benchmark -u 10 -d 300

# Export results to JSON
uv run python -m tests.load.oauth_benchmark -u 5 -d 120 --output results.json

Advanced Options

# Sharing-focused workload
uv run python -m tests.load.oauth_benchmark --workload sharing -u 8 -d 180

# Collaborative editing workload
uv run python -m tests.load.oauth_benchmark --workload collaboration -u 6 -d 120

# Baseline operations only (no workflows)
uv run python -m tests.load.oauth_benchmark --workload baseline -u 10 -d 60

# Verbose logging for debugging
uv run python -m tests.load.oauth_benchmark -u 2 -d 30 --verbose

CLI Options

Option Short Default Description
--users -u 2 Number of concurrent users (dynamically created)
--duration -d 30.0 Test duration in seconds
--warmup -w 5.0 Warmup period before metrics collection (seconds)
--url http://localhost:8001/mcp MCP OAuth server URL
--output -o None JSON output file path
--workload mixed Workload type: mixed, sharing, collaboration, baseline
--user-prefix loadtest Prefix for dynamically created usernames
--cleanup/--no-cleanup cleanup Delete created users after benchmark
--browser chromium Playwright browser: firefox, chromium, webkit
--headed False Run browser in headed mode (visible window)
--verbose -v False Enable verbose logging

Test User Creation

The framework dynamically creates test users on-demand with OAuth authentication:

  • Naming: Users are created with the pattern {prefix}_user_{n} (default: loadtest_user_1, loadtest_user_2, etc.)
  • Customization: Use --user-prefix to change the prefix (e.g., --user-prefix mytestmytest_user_1)
  • Scalability: No limit on user count - create as many concurrent users as your system can handle
  • Credentials: Each user gets a randomly generated secure password
  • OAuth Tokens: All users authenticate via automated OAuth flow using Playwright
  • Cleanup: Users are automatically deleted after the benchmark (disable with --no-cleanup)

Example: Running --users 5 creates:

Metrics Output

Console Report

================================================================================
OAUTH MULTI-USER BENCHMARK RESULTS
================================================================================

Duration: 120.45s
Total Users: 5
Total Workflows Executed: 312
Total Baseline Operations: 678

--------------------------------------------------------------------------------
WORKFLOW STATISTICS
--------------------------------------------------------------------------------
Workflow                         Total  Success     Rate        P50        P95
--------------------------------------------------------------------------------
note_share                         112      109    97.3%   0.2341s   0.4782s
collaborative_edit                  65       61    93.8%   0.5123s   0.9234s
file_share                          29       29   100.0%   0.3456s   0.6123s

--------------------------------------------------------------------------------
PER-USER STATISTICS
--------------------------------------------------------------------------------
User                  Total Ops    Success   Errors     Rate        P50
--------------------------------------------------------------------------------
loadtest_user_1              289        283        6    97.9%   0.2456s
loadtest_user_2              245        241        4    98.4%   0.2123s
loadtest_user_3              231        226        5    97.8%   0.2345s
loadtest_user_4              198        195        3    98.5%   0.2234s
loadtest_user_5              187        184        3    98.4%   0.2189s

--------------------------------------------------------------------------------
BASELINE OPERATIONS
--------------------------------------------------------------------------------
Total Operations: 678
Success Rate: 98.2%
Latency: min=0.0234s, p50=0.1234s, p95=0.3456s, max=0.8123s
================================================================================

JSON Export

{
  "summary": {
    "duration": 120.45,
    "total_workflows": 312,
    "total_baseline_ops": 678,
    "total_users": 5
  },
  "workflows": {
    "note_share": {
      "total_executions": 112,
      "successful_executions": 109,
      "failed_executions": 3,
      "success_rate": 97.3,
      "latency": {
        "min": 0.1234,
        "max": 0.8765,
        "mean": 0.2891,
        "median": 0.2341,
        "p90": 0.4123,
        "p95": 0.4782,
        "p99": 0.7234
      },
      "step_latencies": {
        "create_note": {...},
        "share_note": {...},
        "list_shared_with_me": {...},
        "read_shared_note": {...}
      }
    }
  },
  "users": {
    "loadtest_user_1": {
      "total_operations": 289,
      "successful_operations": 283,
      "failed_operations": 6,
      "success_rate": 97.9,
      "latency": {...},
      "operations_breakdown": {...},
      "errors_breakdown": {...}
    },
    "loadtest_user_2": {...},
    "loadtest_user_3": {...},
    "loadtest_user_4": {...},
    "loadtest_user_5": {...}
  },
  "baseline": {...}
}

Implementation Status

Completed Components

Framework:

  • OAuth user pool management with dynamic user creation
  • User session wrappers with automatic tracking
  • Workflow base classes and framework
  • 3 example workflows (note share, collaborative edit, file share)
  • Enhanced metrics with per-user and workflow tracking
  • CLI interface with multiple workload options
  • Comprehensive reporting (console + JSON)

OAuth Integration:

  • Playwright browser automation for OAuth login
  • OAuth callback server for auth code capture
  • Token exchange with OIDC provider
  • OAuth token injection into MCP sessions via Authorization headers
  • Cancel scope error handling for reliable cleanup
  • Dynamic user creation and deletion via Nextcloud Users API

Implementation Details: The benchmark now successfully:

  1. Creates Nextcloud users dynamically with unique passwords
  2. Acquires OAuth tokens via automated Playwright browser flows
  3. Creates MCP client sessions with proper Authorization: Bearer {token} headers
  4. Executes coordinated multi-user workflows
  5. Tracks per-user and per-workflow metrics
  6. Provides standalone cleanup utility for test users

Key Fix (oauth_pool.py:163-164):

# Pass OAuth token as Authorization header
headers = {"Authorization": f"Bearer {profile.token}"}
streamable_context = streamablehttp_client(mcp_url, headers=headers)

Creating Custom Workflows

Example: Permission Escalation Workflow

class PermissionEscalationWorkflow(Workflow):
    """Test sharing permission changes."""

    def __init__(self):
        super().__init__("permission_escalation")

    async def execute(self, users: list[UserSessionWrapper]) -> WorkflowResult:
        self.start_time = time.time()

        if len(users) < 2:
            return self._finish(False, error="Requires 2+ users")

        owner, collaborator = users[0], users[1]

        # Step 1: Owner creates note
        create_result = await self._execute_step(
            "create_note",
            owner,
            lambda: owner.call_tool("nc_notes_create_note", {...})
        )

        # Step 2: Share read-only
        await self._execute_step(
            "share_readonly",
            owner,
            lambda: owner.call_tool("nc_share_create", {
                "permissions": 1  # Read-only
            })
        )

        # Step 3: Upgrade to edit permissions
        await self._execute_step(
            "upgrade_permissions",
            owner,
            lambda: owner.call_tool("nc_share_update", {
                "permissions": 15  # Read+update+create+delete
            })
        )

        # Step 4: Collaborator edits
        await self._execute_step(
            "collaborator_edit",
            collaborator,
            lambda: collaborator.call_tool("nc_notes_update_note", {...})
        )

        return self._finish(success=True)

Registering Custom Workflows

# In oauth_workloads.py
class MixedOAuthWorkload:
    def __init__(self, users: list[UserSessionWrapper]):
        self.users = users
        self.workflows = {
            "note_share": NoteShareWorkflow(),
            "collaborative_edit": CollaborativeEditWorkflow(),
            "file_share": FileShareAndDownloadWorkflow(),
            "permission_escalation": PermissionEscalationWorkflow(),  # Add your workflow
        }

Performance Expectations

Baseline Performance (basic auth, from existing benchmarks)

  • Throughput: 50-200 RPS for mixed workload
  • Latency: p50 <100ms, p95 <500ms, p99 <1000ms

OAuth Multi-User Expectations

  • Lower throughput: ~30-60% of baseline due to:
    • OAuth token validation overhead
    • Cross-user synchronization delays
    • Workflow coordination overhead
  • Higher p99 latency: Due to workflow step dependencies
  • Focus: End-to-end workflow completion time more important than raw RPS

Common Bottlenecks

  1. OAuth token validation: Per-request overhead
  2. Share propagation: Time for shares to become visible to recipients
  3. Concurrent edit conflicts: ETags and conflict resolution
  4. Permission checks: Cross-user access validation

Best Practices

  1. Start Small: Begin with 2-3 users to validate workflows
  2. Monitor Errors: Watch for permission errors and conflicts
  3. Adjust Delays: Tune sleep delays between operations based on server response
  4. Profile Workflows: Use step latencies to identify bottlenecks
  5. Export Results: Always export to JSON for historical comparison

Performance Optimizations

Concurrent User Creation

The benchmark creates and authenticates users concurrently for maximum performance:

Step 5: User Creation & OAuth Authentication

  • All N users are created in parallel using asyncio.gather()
  • Each user runs through the full OAuth flow simultaneously
  • Multiple Playwright browser contexts operate independently

Step 6: MCP Session Creation

  • All user sessions are created concurrently
  • OAuth tokens passed as Authorization headers to each session

Performance Impact:

  • Sequential (old): ~10-12s per user → 40-48s for 4 users
  • Concurrent (new): ~12-15s total for 4 users (3-4x speedup!)

Example output showing concurrent execution:

Step 5/6: Creating 4 users and acquiring OAuth tokens...
(Running concurrently for faster setup)

  [1/4] Creating user 'loadtest_user_1'...
  [2/4] Creating user 'loadtest_user_2'...
  [3/4] Creating user 'loadtest_user_3'...
  [4/4] Creating user 'loadtest_user_4'...
  ✓ User 'loadtest_user_4' authenticated
  ✓ User 'loadtest_user_2' authenticated
  ✓ User 'loadtest_user_1' authenticated
  ✓ User 'loadtest_user_3' authenticated

✓ Successfully created and authenticated 4 users

Implementation (oauth_benchmark.py:402-437):

# Create tasks for all users
tasks = [
    create_user_task(i, browser, callback_server.auth_states)
    for i in range(num_users)
]
# Run all concurrently
results = await asyncio.gather(*tasks, return_exceptions=True)

Cleanup

Important: Due to asyncio scoping issues with the MCP client library, automatic cleanup in the benchmark's finally block may not execute reliably. Always use the cleanup utility after running benchmarks.

Use the cleanup utility to remove test users:

# Dry run - see what would be deleted
uv run python -m tests.load.cleanup_loadtest_users --dry-run

# Delete all loadtest users
uv run python -m tests.load.cleanup_loadtest_users

# Delete users with custom prefix
uv run python -m tests.load.cleanup_loadtest_users --prefix mytest

Disable Automatic Cleanup

To keep test users after the benchmark for inspection:

uv run python -m tests.load.oauth_benchmark --users 2 --no-cleanup

Troubleshooting

Leftover Test Users

Symptom: Test users remain in Nextcloud after benchmark crashes

Solution: Run the cleanup utility:

uv run python -m tests.load.cleanup_loadtest_users

"User X not in pool" Error

  • Ensure user count doesn't exceed configured limits
  • Check that user creation succeeded in previous steps

CancelledError During Benchmark

Symptom: Error message like 'CancelledError' object has no attribute 'username' appears in logs

Cause: Async task cancellation during benchmark shutdown or errors can cause race conditions in error handling

Solution: This has been mitigated with defensive error handling. The worker now:

  • Catches asyncio.CancelledError specifically before general exceptions
  • Logs cancellation gracefully without attempting to access potentially invalid state
  • Re-raises the exception to allow proper cleanup chain

If you still see this error, it's likely harmless and occurs during shutdown. The benchmark results should still be valid.

High Error Rates

  • Increase delay between operations (await asyncio.sleep() in worker)
  • Check OAuth token validity
  • Verify MCP OAuth server is running and accessible (port 8001)
  • Rebuild mcp-oauth container after code changes: docker-compose up --build -d mcp-oauth

Workflows Failing

  • Check step-by-step latencies to identify failing steps
  • Verify users have correct permissions
  • Review server logs for errors

MCP Session Creation Fails (401 Unauthorized)

Solution: This issue has been fixed! OAuth tokens are now properly passed as Authorization headers when creating MCP sessions.

If you still see 401 errors:

  • Rebuild the mcp-oauth container: docker-compose up --build -d mcp-oauth
  • Verify OAuth tokens are being acquired successfully in verbose mode
  • Check that the token hasn't expired (use shorter test durations during troubleshooting)

Future Enhancements

  • Dynamic user creation (beyond 4 default users) - COMPLETED
  • OAuth token injection for MCP sessions - COMPLETED
  • Cancel scope error handling - COMPLETED
  • Concurrent user creation and authentication - COMPLETED (3-4x speedup!)
  • Workflow templates for common patterns
  • Real-time dashboard for live monitoring
  • Historical comparison and regression detection
  • Load ramping (gradual user increase)
  • Geographic distribution simulation (latency injection)
  • Improve cleanup reliability in finally block