Files

T

Chris Coutinho 54e975198f test: Update all test network hosts to respect iss claims from JWTs

2025-10-23 11:09:51 +02:00

18 KiB

Raw Permalink Blame History

OAuth Multi-User Load Testing Framework

Comprehensive multi-user benchmarking system for testing OAuth-authenticated Nextcloud MCP server with realistic collaborative workflows.

Quick Start

# 1. Ensure docker-compose is running
docker-compose up -d

# 2. Run a benchmark with 2 users for 30 seconds
uv run python -m tests.load.oauth_benchmark --users 2 --duration 30

# 3. Clean up test users (IMPORTANT - always run after benchmark)
uv run python -m tests.load.cleanup_loadtest_users

# Optional: Verify cleanup
uv run python -m tests.load.cleanup_loadtest_users --dry-run

Overview

This framework extends the basic load testing infrastructure to support:

Multiple OAuth-authenticated users running concurrently
Coordinated workflows spanning multiple users (sharing, collaboration, permissions)
Per-user metrics tracking individual user performance
Workflow-specific metrics measuring cross-user operation latencies
Realistic scenarios mimicking actual user collaboration patterns
Concurrent user creation - all users created and authenticated in parallel for fast setup

Architecture

Components

tests/load/
├── oauth_pool.py          # OAuth user pool management
├── oauth_workloads.py     # Multi-user workflow definitions
├── oauth_metrics.py       # Enhanced metrics collection
├── oauth_benchmark.py     # Main CLI entry point
└── README_OAUTH.md        # This file

Key Classes

OAuthUserPool (oauth_pool.py)

Manages N OAuth-authenticated users
Handles token acquisition and storage
Creates and manages MCP sessions per user
Tracks per-user operation statistics

UserSessionWrapper (oauth_pool.py)

Wraps MCP ClientSession for a specific user
Automatic operation tracking
Convenient tool/resource access methods

Workflow (oauth_workloads.py)

Base class for multi-user coordinated workflows
Step-by-step execution with timing
Comprehensive error handling and reporting

OAuthBenchmarkMetrics (oauth_metrics.py)

Per-user operation counts and latencies
Workflow completion rates and timings
Baseline operation statistics
Detailed reporting and JSON export

Available Workflows

1. NoteShareWorkflow

Scenario: Alice creates a note and shares it with Bob, who then reads it.

Steps:

User A creates a note
User A shares note with User B (read-only permissions)
User B lists their shared notes (measures propagation delay)
User B reads the shared note

Metrics: Creation latency, share propagation time, read latency

2. CollaborativeEditWorkflow

Scenario: Multiple users concurrently edit the same note.

Steps:

Owner creates a note
All users read the note simultaneously
All users append content concurrently
Owner verifies final state

Metrics: Concurrent read latency, concurrent write conflicts, final state consistency

3. FileShareAndDownloadWorkflow

Scenario: Alice uploads a file, shares it with Bob, who then downloads it.

Steps:

User A creates a file via WebDAV
User A shares file with User B (read-only)
User B lists their shares
User B downloads the file

Metrics: Upload latency, share creation, download latency

4. MixedOAuthWorkload

Distribution:

50% Baseline operations (individual user CRUD)
30% Note sharing workflows
15% Collaborative editing workflows
5% File sharing workflows

Usage

Basic Usage

# 4 users, 60-second test with mixed workload
uv run python -m tests.load.oauth_benchmark --users 4 --duration 60

# 10 users, 5-minute test
uv run python -m tests.load.oauth_benchmark -u 10 -d 300

# Export results to JSON
uv run python -m tests.load.oauth_benchmark -u 5 -d 120 --output results.json

Advanced Options

# Sharing-focused workload
uv run python -m tests.load.oauth_benchmark --workload sharing -u 8 -d 180

# Collaborative editing workload
uv run python -m tests.load.oauth_benchmark --workload collaboration -u 6 -d 120

# Baseline operations only (no workflows)
uv run python -m tests.load.oauth_benchmark --workload baseline -u 10 -d 60

# Verbose logging for debugging
uv run python -m tests.load.oauth_benchmark -u 2 -d 30 --verbose

CLI Options

Option	Short	Default	Description
`--users`	`-u`	2	Number of concurrent users (dynamically created)
`--duration`	`-d`	30.0	Test duration in seconds
`--warmup`	`-w`	5.0	Warmup period before metrics collection (seconds)
`--url`		`http://localhost:8001/mcp`	MCP OAuth server URL
`--output`	`-o`	None	JSON output file path
`--workload`		`mixed`	Workload type: mixed, sharing, collaboration, baseline
`--user-prefix`		`loadtest`	Prefix for dynamically created usernames
`--cleanup/--no-cleanup`		`cleanup`	Delete created users after benchmark
`--browser`		`chromium`	Playwright browser: firefox, chromium, webkit
`--headed`		False	Run browser in headed mode (visible window)
`--verbose`	`-v`	False	Enable verbose logging

Test User Creation

The framework dynamically creates test users on-demand with OAuth authentication:

Naming: Users are created with the pattern {prefix}_user_{n} (default: loadtest_user_1, loadtest_user_2, etc.)
Customization: Use --user-prefix to change the prefix (e.g., --user-prefix mytest → mytest_user_1)
Scalability: No limit on user count - create as many concurrent users as your system can handle
Credentials: Each user gets a randomly generated secure password
OAuth Tokens: All users authenticate via automated OAuth flow using Playwright
Cleanup: Users are automatically deleted after the benchmark (disable with --no-cleanup)

Example: Running --users 5 creates:

loadtest_user_1 (Display: Load Test User 1, Email: loadtest_user_1@benchmark.local)
loadtest_user_2 (Display: Load Test User 2, Email: loadtest_user_2@benchmark.local)
loadtest_user_3 (Display: Load Test User 3, Email: loadtest_user_3@benchmark.local)
loadtest_user_4 (Display: Load Test User 4, Email: loadtest_user_4@benchmark.local)
loadtest_user_5 (Display: Load Test User 5, Email: loadtest_user_5@benchmark.local)

Metrics Output

Console Report

================================================================================
OAUTH MULTI-USER BENCHMARK RESULTS
================================================================================

Duration: 120.45s
Total Users: 5
Total Workflows Executed: 312
Total Baseline Operations: 678

--------------------------------------------------------------------------------
WORKFLOW STATISTICS
--------------------------------------------------------------------------------
Workflow                         Total  Success     Rate        P50        P95
--------------------------------------------------------------------------------
note_share                         112      109    97.3%   0.2341s   0.4782s
collaborative_edit                  65       61    93.8%   0.5123s   0.9234s
file_share                          29       29   100.0%   0.3456s   0.6123s

--------------------------------------------------------------------------------
PER-USER STATISTICS
--------------------------------------------------------------------------------
User                  Total Ops    Success   Errors     Rate        P50
--------------------------------------------------------------------------------
loadtest_user_1              289        283        6    97.9%   0.2456s
loadtest_user_2              245        241        4    98.4%   0.2123s
loadtest_user_3              231        226        5    97.8%   0.2345s
loadtest_user_4              198        195        3    98.5%   0.2234s
loadtest_user_5              187        184        3    98.4%   0.2189s

--------------------------------------------------------------------------------
BASELINE OPERATIONS
--------------------------------------------------------------------------------
Total Operations: 678
Success Rate: 98.2%
Latency: min=0.0234s, p50=0.1234s, p95=0.3456s, max=0.8123s
================================================================================

JSON Export

{
  "summary": {
    "duration": 120.45,
    "total_workflows": 312,
    "total_baseline_ops": 678,
    "total_users": 5
  },
  "workflows": {
    "note_share": {
      "total_executions": 112,
      "successful_executions": 109,
      "failed_executions": 3,
      "success_rate": 97.3,
      "latency": {
        "min": 0.1234,
        "max": 0.8765,
        "mean": 0.2891,
        "median": 0.2341,
        "p90": 0.4123,
        "p95": 0.4782,
        "p99": 0.7234
      },
      "step_latencies": {
        "create_note": {...},
        "share_note": {...},
        "list_shared_with_me": {...},
        "read_shared_note": {...}
      }
    }
  },
  "users": {
    "loadtest_user_1": {
      "total_operations": 289,
      "successful_operations": 283,
      "failed_operations": 6,
      "success_rate": 97.9,
      "latency": {...},
      "operations_breakdown": {...},
      "errors_breakdown": {...}
    },
    "loadtest_user_2": {...},
    "loadtest_user_3": {...},
    "loadtest_user_4": {...},
    "loadtest_user_5": {...}
  },
  "baseline": {...}
}

Implementation Status

✅ Completed Components

Framework:

OAuth user pool management with dynamic user creation
User session wrappers with automatic tracking
Workflow base classes and framework
3 example workflows (note share, collaborative edit, file share)
Enhanced metrics with per-user and workflow tracking
CLI interface with multiple workload options
Comprehensive reporting (console + JSON)

OAuth Integration:

✅ Playwright browser automation for OAuth login
✅ OAuth callback server for auth code capture
✅ Token exchange with OIDC provider
✅ OAuth token injection into MCP sessions via Authorization headers
✅ Cancel scope error handling for reliable cleanup
✅ Dynamic user creation and deletion via Nextcloud Users API

Implementation Details: The benchmark now successfully:

Creates Nextcloud users dynamically with unique passwords
Acquires OAuth tokens via automated Playwright browser flows
Creates MCP client sessions with proper Authorization: Bearer {token} headers
Executes coordinated multi-user workflows
Tracks per-user and per-workflow metrics
Provides standalone cleanup utility for test users

Key Fix (oauth_pool.py:163-164):

# Pass OAuth token as Authorization header
headers = {"Authorization": f"Bearer {profile.token}"}
streamable_context = streamablehttp_client(mcp_url, headers=headers)

Creating Custom Workflows

Example: Permission Escalation Workflow

class PermissionEscalationWorkflow(Workflow):
    """Test sharing permission changes."""

    def __init__(self):
        super().__init__("permission_escalation")

    async def execute(self, users: list[UserSessionWrapper]) -> WorkflowResult:
        self.start_time = time.time()

        if len(users) < 2:
            return self._finish(False, error="Requires 2+ users")

        owner, collaborator = users[0], users[1]

        # Step 1: Owner creates note
        create_result = await self._execute_step(
            "create_note",
            owner,
            lambda: owner.call_tool("nc_notes_create_note", {...})
        )

        # Step 2: Share read-only
        await self._execute_step(
            "share_readonly",
            owner,
            lambda: owner.call_tool("nc_share_create", {
                "permissions": 1  # Read-only
            })
        )

        # Step 3: Upgrade to edit permissions
        await self._execute_step(
            "upgrade_permissions",
            owner,
            lambda: owner.call_tool("nc_share_update", {
                "permissions": 15  # Read+update+create+delete
            })
        )

        # Step 4: Collaborator edits
        await self._execute_step(
            "collaborator_edit",
            collaborator,
            lambda: collaborator.call_tool("nc_notes_update_note", {...})
        )

        return self._finish(success=True)

Registering Custom Workflows

# In oauth_workloads.py
class MixedOAuthWorkload:
    def __init__(self, users: list[UserSessionWrapper]):
        self.users = users
        self.workflows = {
            "note_share": NoteShareWorkflow(),
            "collaborative_edit": CollaborativeEditWorkflow(),
            "file_share": FileShareAndDownloadWorkflow(),
            "permission_escalation": PermissionEscalationWorkflow(),  # Add your workflow
        }

Performance Expectations

Baseline Performance (basic auth, from existing benchmarks)

Throughput: 50-200 RPS for mixed workload
Latency: p50 <100ms, p95 <500ms, p99 <1000ms

OAuth Multi-User Expectations

Lower throughput: ~30-60% of baseline due to:
- OAuth token validation overhead
- Cross-user synchronization delays
- Workflow coordination overhead
Higher p99 latency: Due to workflow step dependencies
Focus: End-to-end workflow completion time more important than raw RPS

Common Bottlenecks

OAuth token validation: Per-request overhead
Share propagation: Time for shares to become visible to recipients
Concurrent edit conflicts: ETags and conflict resolution
Permission checks: Cross-user access validation

Best Practices

Start Small: Begin with 2-3 users to validate workflows
Monitor Errors: Watch for permission errors and conflicts
Adjust Delays: Tune sleep delays between operations based on server response
Profile Workflows: Use step latencies to identify bottlenecks
Export Results: Always export to JSON for historical comparison

Performance Optimizations

Concurrent User Creation

The benchmark creates and authenticates users concurrently for maximum performance:

Step 5: User Creation & OAuth Authentication

All N users are created in parallel using asyncio.gather()
Each user runs through the full OAuth flow simultaneously
Multiple Playwright browser contexts operate independently

Step 6: MCP Session Creation

All user sessions are created concurrently
OAuth tokens passed as Authorization headers to each session

Performance Impact:

Sequential (old): ~10-12s per user → 40-48s for 4 users
Concurrent (new): ~12-15s total for 4 users (3-4x speedup!)

Example output showing concurrent execution:

Step 5/6: Creating 4 users and acquiring OAuth tokens...
(Running concurrently for faster setup)

  [1/4] Creating user 'loadtest_user_1'...
  [2/4] Creating user 'loadtest_user_2'...
  [3/4] Creating user 'loadtest_user_3'...
  [4/4] Creating user 'loadtest_user_4'...
  ✓ User 'loadtest_user_4' authenticated
  ✓ User 'loadtest_user_2' authenticated
  ✓ User 'loadtest_user_1' authenticated
  ✓ User 'loadtest_user_3' authenticated

✓ Successfully created and authenticated 4 users

Implementation (oauth_benchmark.py:402-437):

# Create tasks for all users
tasks = [
    create_user_task(i, browser, callback_server.auth_states)
    for i in range(num_users)
]
# Run all concurrently
results = await asyncio.gather(*tasks, return_exceptions=True)

Cleanup

Important: Due to asyncio scoping issues with the MCP client library, automatic cleanup in the benchmark's finally block may not execute reliably. Always use the cleanup utility after running benchmarks.

Cleanup Utility (Recommended)

Use the cleanup utility to remove test users:

# Dry run - see what would be deleted
uv run python -m tests.load.cleanup_loadtest_users --dry-run

# Delete all loadtest users
uv run python -m tests.load.cleanup_loadtest_users

# Delete users with custom prefix
uv run python -m tests.load.cleanup_loadtest_users --prefix mytest

Disable Automatic Cleanup

To keep test users after the benchmark for inspection:

uv run python -m tests.load.oauth_benchmark --users 2 --no-cleanup

Troubleshooting

Leftover Test Users

Symptom: Test users remain in Nextcloud after benchmark crashes

Solution: Run the cleanup utility:

uv run python -m tests.load.cleanup_loadtest_users

"User X not in pool" Error

Ensure user count doesn't exceed configured limits
Check that user creation succeeded in previous steps

CancelledError During Benchmark

Symptom: Error message like 'CancelledError' object has no attribute 'username' appears in logs

Cause: Async task cancellation during benchmark shutdown or errors can cause race conditions in error handling

Solution: This has been mitigated with defensive error handling. The worker now:

Catches asyncio.CancelledError specifically before general exceptions
Logs cancellation gracefully without attempting to access potentially invalid state
Re-raises the exception to allow proper cleanup chain

If you still see this error, it's likely harmless and occurs during shutdown. The benchmark results should still be valid.

High Error Rates

Increase delay between operations (await asyncio.sleep() in worker)
Check OAuth token validity
Verify MCP OAuth server is running and accessible (port 8001)
Rebuild mcp-oauth container after code changes: docker-compose up --build -d mcp-oauth

Workflows Failing

Check step-by-step latencies to identify failing steps
Verify users have correct permissions
Review server logs for errors

MCP Session Creation Fails (401 Unauthorized)

Solution: This issue has been fixed! OAuth tokens are now properly passed as Authorization headers when creating MCP sessions.

If you still see 401 errors:

Rebuild the mcp-oauth container: docker-compose up --build -d mcp-oauth
Verify OAuth tokens are being acquired successfully in verbose mode
Check that the token hasn't expired (use shorter test durations during troubleshooting)

Future Enhancements

Dynamic user creation (beyond 4 default users) - COMPLETED
OAuth token injection for MCP sessions - COMPLETED
Cancel scope error handling - COMPLETED
Concurrent user creation and authentication - COMPLETED (3-4x speedup!)
Workflow templates for common patterns
Real-time dashboard for live monitoring
Historical comparison and regression detection
Load ramping (gradual user increase)
Geographic distribution simulation (latency injection)
Improve cleanup reliability in finally block

18 KiB Raw Permalink Blame History

OAuth Multi-User Load Testing Framework

Quick Start

Overview

Architecture

Components

Key Classes

Available Workflows

1. NoteShareWorkflow

2. CollaborativeEditWorkflow

3. FileShareAndDownloadWorkflow

4. MixedOAuthWorkload

Usage

Basic Usage

Advanced Options

CLI Options

Test User Creation

Metrics Output

Console Report

JSON Export

Implementation Status

✅ Completed Components

Creating Custom Workflows

Example: Permission Escalation Workflow

Registering Custom Workflows

Performance Expectations

Baseline Performance (basic auth, from existing benchmarks)

OAuth Multi-User Expectations

Common Bottlenecks

Best Practices

Performance Optimizations

Concurrent User Creation

Cleanup

Cleanup Utility (Recommended)

Disable Automatic Cleanup

Troubleshooting

Leftover Test Users

"User X not in pool" Error

CancelledError During Benchmark

High Error Rates

Workflows Failing

MCP Session Creation Fails (401 Unauthorized)

Future Enhancements

18 KiB

Raw Permalink Blame History