# OAuth Multi-User Load Testing Framework

A multi-user benchmarking framework for testing the OAuth-authenticated Nextcloud MCP server under realistic collaborative workflows.

## Quick Start

```bash
# 1. Start the docker-compose services
docker-compose up -d

# 2. Run a benchmark with 2 users for 30 seconds
uv run python -m tests.load.oauth_benchmark --users 2 --duration 30

# 3. Clean up test users (IMPORTANT - always run after benchmark)
uv run python -m tests.load.cleanup_loadtest_users

# Optional: verify cleanup (a dry run shows what would still be deleted)
uv run python -m tests.load.cleanup_loadtest_users --dry-run
```

## Overview

This framework extends the basic load testing infrastructure to support:
- **Multiple OAuth-authenticated users** running concurrently
- **Coordinated workflows** spanning multiple users (sharing, collaboration, permissions)
- **Per-user metrics** tracking individual user performance
- **Workflow-specific metrics** measuring cross-user operation latencies
- **Realistic scenarios** mimicking actual user collaboration patterns
- **Concurrent user creation** - all users created and authenticated in parallel for fast setup

## Architecture

### Components

```
tests/load/
├── oauth_pool.py          # OAuth user pool management
├── oauth_workloads.py     # Multi-user workflow definitions
├── oauth_metrics.py       # Enhanced metrics collection
├── oauth_benchmark.py     # Main CLI entry point
└── README_OAUTH.md        # This file
```

### Key Classes

**OAuthUserPool** (`oauth_pool.py`)
- Manages N OAuth-authenticated users
- Handles token acquisition and storage
- Creates and manages MCP sessions per user
- Tracks per-user operation statistics

**UserSessionWrapper** (`oauth_pool.py`)
- Wraps MCP ClientSession for a specific user
- Automatic operation tracking
- Convenient tool/resource access methods

**Workflow** (`oauth_workloads.py`)
- Base class for multi-user coordinated workflows
- Step-by-step execution with timing
- Comprehensive error handling and reporting

**OAuthBenchmarkMetrics** (`oauth_metrics.py`)
- Per-user operation counts and latencies
- Workflow completion rates and timings
- Baseline operation statistics
- Detailed reporting and JSON export
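
The step-by-step timing described for `Workflow` can be pictured with a minimal sketch. This is not the code in `oauth_workloads.py`: `StepTiming` and `timed_step` are hypothetical names used only for illustration, and the real base class may record more detail.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional


@dataclass
class StepTiming:
    """Outcome of one workflow step (hypothetical type, for illustration)."""
    name: str
    duration: float
    success: bool
    result: Any = None
    error: Optional[str] = None


async def timed_step(name: str, action: Callable[[], Awaitable[Any]]) -> StepTiming:
    """Await `action`, recording its wall-clock duration and any error."""
    start = time.perf_counter()
    try:
        result = await action()
    except Exception as exc:  # record the failure instead of aborting the run
        return StepTiming(name, time.perf_counter() - start, False, error=str(exc))
    return StepTiming(name, time.perf_counter() - start, True, result=result)


async def _demo() -> list[StepTiming]:
    async def create_note() -> dict:
        await asyncio.sleep(0.01)  # stands in for an MCP tool call
        return {"id": 1}

    async def failing_share() -> None:
        raise PermissionError("share denied")

    return [
        await timed_step("create_note", create_note),
        await timed_step("share_note", failing_share),
    ]


timings = asyncio.run(_demo())
for t in timings:
    print(f"{t.name}: success={t.success} duration={t.duration:.4f}s error={t.error}")
```

Keeping the failure inside the returned record, rather than letting it propagate, is what allows a report to show which step of a workflow failed.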

## Available Workflows

### 1. NoteShareWorkflow
**Scenario**: Alice creates a note and shares it with Bob, who then reads it.

**Steps**:
1. User A creates a note
2. User A shares note with User B (read-only permissions)
3. User B lists their shared notes (measures propagation delay)
4. User B reads the shared note

**Metrics**: Creation latency, share propagation time, read latency
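
Share propagation time can be measured by polling the recipient's view until the shared item appears. The sketch below assumes a hypothetical async callable, `list_shared_ids`, that returns the IDs currently visible to User B; the actual workflow may measure this differently.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional


async def measure_propagation(
    list_shared_ids: Callable[[], Awaitable[set]],
    note_id: int,
    timeout: float = 5.0,
    interval: float = 0.05,
) -> Optional[float]:
    """Poll until `note_id` is visible to the recipient.

    Returns the elapsed seconds, or None if the share did not appear in time.
    """
    start = time.perf_counter()
    while time.perf_counter() - start < timeout:
        if note_id in await list_shared_ids():
            return time.perf_counter() - start
        await asyncio.sleep(interval)
    return None


async def _demo():
    calls = 0

    async def fake_list_shared_ids() -> set:
        # Simulated recipient view: the share shows up on the third poll.
        nonlocal calls
        calls += 1
        return {42} if calls >= 3 else set()

    delay = await measure_propagation(fake_list_shared_ids, note_id=42, interval=0.01)
    missing = await measure_propagation(
        fake_list_shared_ids, note_id=7, timeout=0.05, interval=0.01
    )
    return delay, missing


propagation_delay, never_visible = asyncio.run(_demo())
print(f"propagation delay: {propagation_delay:.4f}s, missing share: {never_visible}")
```

The polling interval bounds the measurement resolution, so a smaller interval gives finer timings at the cost of more requests.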

### 2. CollaborativeEditWorkflow
**Scenario**: Multiple users concurrently edit the same note.

**Steps**:
1. Owner creates a note
2. All users read the note simultaneously
3. All users append content concurrently
4. Owner verifies final state

**Metrics**: Concurrent read latency, concurrent write conflicts, final state consistency

### 3. FileShareAndDownloadWorkflow
**Scenario**: Alice uploads a file, shares it with Bob, who then downloads it.

**Steps**:
1. User A creates a file via WebDAV
2. User A shares file with User B (read-only)
3. User B lists their shares
4. User B downloads the file

**Metrics**: Upload latency, share creation, download latency

### 4. MixedOAuthWorkload
**Distribution**:
- 50% Baseline operations (individual user CRUD)
- 30% Note sharing workflows
- 15% Collaborative editing workflows
- 5% File sharing workflows
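
A distribution like this can be reproduced with weighted random selection. The sketch below is an illustration only: `WORKLOAD_WEIGHTS` and `pick_operation` are hypothetical names, and `MixedOAuthWorkload` may choose operations in a different way.

```python
import random
from collections import Counter

# Weights mirror the distribution listed above.
WORKLOAD_WEIGHTS = {
    "baseline": 50,
    "note_share": 30,
    "collaborative_edit": 15,
    "file_share": 5,
}


def pick_operation(rng: random.Random) -> str:
    """Pick one operation type with probability proportional to its weight."""
    names = list(WORKLOAD_WEIGHTS)
    weights = list(WORKLOAD_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so the demo is repeatable
counts = Counter(pick_operation(rng) for _ in range(10_000))
for name, weight in WORKLOAD_WEIGHTS.items():
    print(f"{name:<20} target={weight:>3}%  observed={counts[name] / 100:.1f}%")
```

Over a short run the observed mix can drift from the targets, which is worth remembering when comparing workflow counts between brief benchmarks.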

## Usage

### Basic Usage

```bash
# 4 users, 60-second test with mixed workload
uv run python -m tests.load.oauth_benchmark --users 4 --duration 60

# 10 users, 5-minute test
uv run python -m tests.load.oauth_benchmark -u 10 -d 300

# Export results to JSON
uv run python -m tests.load.oauth_benchmark -u 5 -d 120 --output results.json
```

### Advanced Options

```bash
# Sharing-focused workload
uv run python -m tests.load.oauth_benchmark --workload sharing -u 8 -d 180

# Collaborative editing workload
uv run python -m tests.load.oauth_benchmark --workload collaboration -u 6 -d 120

# Baseline operations only (no workflows)
uv run python -m tests.load.oauth_benchmark --workload baseline -u 10 -d 60

# Verbose logging for debugging
uv run python -m tests.load.oauth_benchmark -u 2 -d 30 --verbose
```

### CLI Options

| Option | Short | Default | Description |
|--------|-------|---------|-------------|
| `--users` | `-u` | 2 | Number of concurrent users (dynamically created) |
| `--duration` | `-d` | 30.0 | Test duration in seconds |
| `--warmup` | `-w` | 5.0 | Warmup period before metrics collection (seconds) |
| `--url` | | `http://localhost:8001/mcp` | MCP OAuth server URL |
| `--output` | `-o` | None | JSON output file path |
| `--workload` | | `mixed` | Workload type: mixed, sharing, collaboration, baseline |
| `--user-prefix` | | `loadtest` | Prefix for dynamically created usernames |
| `--cleanup/--no-cleanup` | | `cleanup` | Delete created users after benchmark |
| `--browser` | | `chromium` | Playwright browser: firefox, chromium, webkit |
| `--headed` | | False | Run browser in headed mode (visible window) |
| `--verbose` | `-v` | False | Enable verbose logging |

## Test User Creation

The framework **dynamically creates test users** on-demand with OAuth authentication:

- **Naming**: Users are created with the pattern `{prefix}_user_{n}` (default: `loadtest_user_1`, `loadtest_user_2`, etc.)
- **Customization**: Use `--user-prefix` to change the prefix (e.g., `--user-prefix mytest` → `mytest_user_1`)
- **Scalability**: No limit on user count - create as many concurrent users as your system can handle
- **Credentials**: Each user gets a randomly generated secure password
- **OAuth Tokens**: All users authenticate via automated OAuth flow using Playwright
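- **Cleanup**: The benchmark attempts to delete users automatically when it finishes (disable with `--no-cleanup`); this is not always reliable, so also run the cleanup utility described in the Cleanup section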

**Example**: Running `--users 5` creates:
- `loadtest_user_1` (Display: Load Test User 1, Email: loadtest_user_1@benchmark.local)
- `loadtest_user_2` (Display: Load Test User 2, Email: loadtest_user_2@benchmark.local)
- `loadtest_user_3` (Display: Load Test User 3, Email: loadtest_user_3@benchmark.local)
- `loadtest_user_4` (Display: Load Test User 4, Email: loadtest_user_4@benchmark.local)
- `loadtest_user_5` (Display: Load Test User 5, Email: loadtest_user_5@benchmark.local)
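
The naming scheme can be sketched as follows. `build_test_users` is a hypothetical helper, and using `secrets.token_urlsafe` for the password is an assumption; the framework only states that passwords are randomly generated.

```python
import secrets


def build_test_users(count: int, prefix: str = "loadtest") -> list[dict]:
    """Build user records following the `{prefix}_user_{n}` pattern (n starts at 1)."""
    users = []
    for n in range(1, count + 1):
        username = f"{prefix}_user_{n}"
        users.append({
            "username": username,
            "display_name": f"Load Test User {n}",
            "email": f"{username}@benchmark.local",
            # Assumption: any cryptographically secure generator would do here.
            "password": secrets.token_urlsafe(24),
        })
    return users


users = build_test_users(5)
custom = build_test_users(1, prefix="mytest")
for user in users:
    print(user["username"], user["email"])
```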

## Metrics Output

### Console Report

```
================================================================================
OAUTH MULTI-USER BENCHMARK RESULTS
================================================================================

Duration: 120.45s
Total Users: 5
Total Workflows Executed: 312
Total Baseline Operations: 678

--------------------------------------------------------------------------------
WORKFLOW STATISTICS
--------------------------------------------------------------------------------
Workflow                         Total  Success     Rate        P50        P95
--------------------------------------------------------------------------------
note_share                         112      109    97.3%   0.2341s   0.4782s
collaborative_edit                  65       61    93.8%   0.5123s   0.9234s
file_share                          29       29   100.0%   0.3456s   0.6123s

--------------------------------------------------------------------------------
PER-USER STATISTICS
--------------------------------------------------------------------------------
User                  Total Ops    Success   Errors     Rate        P50
--------------------------------------------------------------------------------
loadtest_user_1              289        283        6    97.9%   0.2456s
loadtest_user_2              245        241        4    98.4%   0.2123s
loadtest_user_3              231        226        5    97.8%   0.2345s
loadtest_user_4              198        195        3    98.5%   0.2234s
loadtest_user_5              187        184        3    98.4%   0.2189s

--------------------------------------------------------------------------------
BASELINE OPERATIONS
--------------------------------------------------------------------------------
Total Operations: 678
Success Rate: 98.2%
Latency: min=0.0234s, p50=0.1234s, p95=0.3456s, max=0.8123s
================================================================================
```

### JSON Export

```json
{
  "summary": {
    "duration": 120.45,
    "total_workflows": 312,
    "total_baseline_ops": 678,
    "total_users": 5
  },
  "workflows": {
    "note_share": {
      "total_executions": 112,
      "successful_executions": 109,
      "failed_executions": 3,
      "success_rate": 97.3,
      "latency": {
        "min": 0.1234,
        "max": 0.8765,
        "mean": 0.2891,
        "median": 0.2341,
        "p90": 0.4123,
        "p95": 0.4782,
        "p99": 0.7234
      },
      "step_latencies": {
        "create_note": {...},
        "share_note": {...},
        "list_shared_with_me": {...},
        "read_shared_note": {...}
      }
    }
  },
  "users": {
    "loadtest_user_1": {
      "total_operations": 289,
      "successful_operations": 283,
      "failed_operations": 6,
      "success_rate": 97.9,
      "latency": {...},
      "operations_breakdown": {...},
      "errors_breakdown": {...}
    },
    "loadtest_user_2": {...},
    "loadtest_user_3": {...},
    "loadtest_user_4": {...},
    "loadtest_user_5": {...}
  },
  "baseline": {...}
}
```
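
Because the export is plain JSON, it is easy to post-process. The following sketch flags workflows whose success rate falls below a threshold. It relies only on the `workflows` and `success_rate` keys shown above; `workflows_below` is a hypothetical helper, and the sample values are taken from the console report.

```python
def workflows_below(report: dict, min_success_rate: float) -> dict:
    """Return {workflow_name: success_rate} for workflows under the threshold."""
    return {
        name: stats["success_rate"]
        for name, stats in report.get("workflows", {}).items()
        if stats["success_rate"] < min_success_rate
    }


# Trimmed-down report using the success rates from the example output.
sample_report = {
    "workflows": {
        "note_share": {"success_rate": 97.3},
        "collaborative_edit": {"success_rate": 93.8},
        "file_share": {"success_rate": 100.0},
    }
}

flagged = workflows_below(sample_report, min_success_rate=95.0)
print(flagged)
```

For a real run, load the file passed to `--output` with `json.load` and hand the resulting dictionary to the same function.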

## Implementation Status

### ✅ Completed Components

**Framework:**
- OAuth user pool management with dynamic user creation
- User session wrappers with automatic tracking
- Workflow base classes and framework
- 3 example workflows (note share, collaborative edit, file share)
- Enhanced metrics with per-user and workflow tracking
- CLI interface with multiple workload options
- Comprehensive reporting (console + JSON)

**OAuth Integration:**
- Playwright browser automation for OAuth login
- OAuth callback server for auth code capture
- Token exchange with OIDC provider
- OAuth token injection into MCP sessions via Authorization headers
- Cancel scope error handling for reliable cleanup
- Dynamic user creation and deletion via Nextcloud Users API

**Implementation Details:**
The benchmark now successfully:
1. Creates Nextcloud users dynamically with unique passwords
2. Acquires OAuth tokens via automated Playwright browser flows
3. Creates MCP client sessions with proper `Authorization: Bearer {token}` headers
4. Executes coordinated multi-user workflows
5. Tracks per-user and per-workflow metrics
6. Provides standalone cleanup utility for test users

**Key Fix (oauth_pool.py:163-164)**:
```python
# Pass OAuth token as Authorization header
headers = {"Authorization": f"Bearer {profile.token}"}
streamable_context = streamablehttp_client(mcp_url, headers=headers)
```

## Creating Custom Workflows

### Example: Permission Escalation Workflow

```python
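import time

# Assumed import paths, based on the module layout described above;
# adjust if WorkflowResult lives elsewhere.
from tests.load.oauth_pool import UserSessionWrapper
from tests.load.oauth_workloads import Workflow, WorkflowResult


class PermissionEscalationWorkflow(Workflow):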
    """Test sharing permission changes."""

    def __init__(self):
        super().__init__("permission_escalation")

    async def execute(self, users: list[UserSessionWrapper]) -> WorkflowResult:
        self.start_time = time.time()

        if len(users) < 2:
            return self._finish(False, error="Requires 2+ users")

        owner, collaborator = users[0], users[1]

        # Step 1: Owner creates note
        create_result = await self._execute_step(
            "create_note",
            owner,
            lambda: owner.call_tool("nc_notes_create_note", {...})
        )

        # Step 2: Share read-only
        await self._execute_step(
            "share_readonly",
            owner,
            lambda: owner.call_tool("nc_share_create", {
                "permissions": 1  # Read-only
            })
        )

        # Step 3: Upgrade to edit permissions
        await self._execute_step(
            "upgrade_permissions",
            owner,
            lambda: owner.call_tool("nc_share_update", {
                "permissions": 15  # Read+update+create+delete
            })
        )

        # Step 4: Collaborator edits
        await self._execute_step(
            "collaborator_edit",
            collaborator,
            lambda: collaborator.call_tool("nc_notes_update_note", {...})
        )

        return self._finish(success=True)
```

### Registering Custom Workflows

```python
# In oauth_workloads.py
class MixedOAuthWorkload:
    def __init__(self, users: list[UserSessionWrapper]):
        self.users = users
        self.workflows = {
            "note_share": NoteShareWorkflow(),
            "collaborative_edit": CollaborativeEditWorkflow(),
            "file_share": FileShareAndDownloadWorkflow(),
            "permission_escalation": PermissionEscalationWorkflow(),  # Add your workflow
        }
```

## Performance Expectations

### Baseline Performance (basic auth, from existing benchmarks)
- **Throughput**: 50-200 RPS for mixed workload
- **Latency**: p50 <100ms, p95 <500ms, p99 <1000ms

### OAuth Multi-User Expectations
- **Lower throughput**: ~30-60% of baseline due to:
  - OAuth token validation overhead
  - Cross-user synchronization delays
  - Workflow coordination overhead
- **Higher p99 latency**: Due to workflow step dependencies
- **Focus**: End-to-end workflow completion time matters more than raw RPS
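
As a rough worked example, the percentage range can be applied to the baseline figures. Pairing the lowest percentage with the lowest baseline (and the highest with the highest) is one interpretation of the estimate, not a measured result.

```python
# Figures from the sections above.
baseline_rps_range = (50, 200)    # basic-auth throughput, mixed workload
oauth_percent_range = (30, 60)    # expected share of baseline under OAuth

expected_low = baseline_rps_range[0] * oauth_percent_range[0] / 100
expected_high = baseline_rps_range[1] * oauth_percent_range[1] / 100

print(f"Expected OAuth throughput: {expected_low:.0f}-{expected_high:.0f} RPS")
```

This gives an envelope of about 15-120 RPS, which is wide enough that workflow completion times are usually the more useful signal.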

### Common Bottlenecks
1. **OAuth token validation**: Per-request overhead
2. **Share propagation**: Time for shares to become visible to recipients
3. **Concurrent edit conflicts**: ETags and conflict resolution
4. **Permission checks**: Cross-user access validation

## Best Practices

1. **Start Small**: Begin with 2-3 users to validate workflows
2. **Monitor Errors**: Watch for permission errors and conflicts
3. **Adjust Delays**: Tune sleep delays between operations based on server response
4. **Profile Workflows**: Use step latencies to identify bottlenecks
5. **Export Results**: Always export to JSON for historical comparison
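
Exported results from two runs can be compared to spot latency regressions. This sketch assumes both reports follow the JSON layout shown under Metrics Output; `p95_regressions`, the 10% tolerance, and the "current" values are illustrative assumptions.

```python
def p95_regressions(previous: dict, current: dict, tolerance: float = 0.10) -> dict:
    """Return workflows whose p95 latency grew by more than `tolerance`.

    `tolerance` is fractional, so 0.10 means a 10% increase is allowed.
    """
    regressions = {}
    for name, stats in current.get("workflows", {}).items():
        old = previous.get("workflows", {}).get(name)
        if old is None:
            continue  # workflow did not exist in the earlier run
        old_p95 = old["latency"]["p95"]
        new_p95 = stats["latency"]["p95"]
        if old_p95 > 0 and (new_p95 - old_p95) / old_p95 > tolerance:
            regressions[name] = (old_p95, new_p95)
    return regressions


previous_run = {
    "workflows": {
        "note_share": {"latency": {"p95": 0.4782}},
        "file_share": {"latency": {"p95": 0.6123}},
    }
}
# Hypothetical later run: note_share is slower, file_share is about the same.
current_run = {
    "workflows": {
        "note_share": {"latency": {"p95": 0.60}},
        "file_share": {"latency": {"p95": 0.62}},
    }
}

regressions = p95_regressions(previous_run, current_run)
print(regressions)
```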

## Performance Optimizations

### Concurrent User Creation

The benchmark creates and authenticates users **concurrently** to shorten setup time. The step numbers below refer to the setup steps the benchmark prints (for example `Step 5/6`):

**Step 5: User Creation & OAuth Authentication**
- All N users are created in parallel using `asyncio.gather()`
- Each user runs through the full OAuth flow simultaneously
- Multiple Playwright browser contexts operate independently

**Step 6: MCP Session Creation**
- All user sessions are created concurrently
- OAuth tokens passed as Authorization headers to each session

**Performance Impact:**
- **Sequential** (old): ~10-12s per user → 40-48s for 4 users
- **Concurrent** (new): ~12-15s total for 4 users (roughly 3-4x faster)

Example output showing concurrent execution:
```
Step 5/6: Creating 4 users and acquiring OAuth tokens...
(Running concurrently for faster setup)

  [1/4] Creating user 'loadtest_user_1'...
  [2/4] Creating user 'loadtest_user_2'...
  [3/4] Creating user 'loadtest_user_3'...
  [4/4] Creating user 'loadtest_user_4'...
  ✓ User 'loadtest_user_4' authenticated
  ✓ User 'loadtest_user_2' authenticated
  ✓ User 'loadtest_user_1' authenticated
  ✓ User 'loadtest_user_3' authenticated

✓ Successfully created and authenticated 4 users
```

**Implementation** (oauth_benchmark.py:402-437):
```python
# Create tasks for all users
tasks = [
    create_user_task(i, browser, callback_server.auth_states)
    for i in range(num_users)
]
# Run all concurrently
results = await asyncio.gather(*tasks, return_exceptions=True)
```
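
With `return_exceptions=True`, failures come back as exception objects mixed into the results list, so they need to be separated from successful results before use. The sketch below is self-contained; `create_user_stub` is a hypothetical stand-in for the real user creation task.

```python
import asyncio


async def create_user_stub(index: int) -> str:
    """Stand-in for user creation plus OAuth login (hypothetical)."""
    await asyncio.sleep(0.01)
    if index == 2:
        raise RuntimeError(f"login failed for user {index + 1}")
    return f"loadtest_user_{index + 1}"


async def create_all(num_users: int):
    tasks = [create_user_stub(i) for i in range(num_users)]
    # Results are returned in the same order as the tasks were passed in.
    results = await asyncio.gather(*tasks, return_exceptions=True)

    created, failures = [], []
    for index, result in enumerate(results):
        # CancelledError derives from BaseException (Python 3.8+), so test
        # against BaseException rather than Exception.
        if isinstance(result, BaseException):
            failures.append((index, result))
        else:
            created.append(result)
    return created, failures


created, failures = asyncio.run(create_all(4))
print("created:", created)
print("failed:", [(i, repr(exc)) for i, exc in failures])
```

Checking each result before accessing attributes on it avoids treating an exception object as a user profile.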

## Cleanup

**Important**: Due to async cancel-scope issues with the MCP client library, the automatic cleanup in the benchmark's `finally` block may not run reliably. Always run the cleanup utility after a benchmark.

### Cleanup Utility (Recommended)

Use the cleanup utility to remove test users:

```bash
# Dry run - see what would be deleted
uv run python -m tests.load.cleanup_loadtest_users --dry-run

# Delete all loadtest users
uv run python -m tests.load.cleanup_loadtest_users

# Delete users with custom prefix
uv run python -m tests.load.cleanup_loadtest_users --prefix mytest
```
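
Prefix-based selection can be sketched as a strict match against the `{prefix}_user_{n}` naming pattern. How the cleanup utility actually selects users is not shown here, so `select_test_users` is a hypothetical helper that only illustrates the idea.

```python
import re


def select_test_users(usernames: list[str], prefix: str = "loadtest") -> list[str]:
    """Return only usernames that match `{prefix}_user_{n}` exactly."""
    pattern = re.compile(rf"{re.escape(prefix)}_user_\d+")
    return [name for name in usernames if pattern.fullmatch(name)]


all_users = ["admin", "loadtest_user_1", "loadtest_user_12", "loadtest_userx", "mytest_user_1"]
to_delete = select_test_users(all_users)
custom_to_delete = select_test_users(all_users, prefix="mytest")
print(to_delete)
print(custom_to_delete)
```

Matching the whole name, rather than just the prefix, keeps regular accounts that happen to start with the same text out of the deletion list.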

### Disable Automatic Cleanup

To keep test users after the benchmark for inspection:

```bash
uv run python -m tests.load.oauth_benchmark --users 2 --no-cleanup
```

## Troubleshooting

### Leftover Test Users
**Symptom**: Test users remain in Nextcloud after the benchmark finishes or crashes

**Solution**: Run the cleanup utility:
```bash
uv run python -m tests.load.cleanup_loadtest_users
```

### "User X not in pool" Error
- Ensure user count doesn't exceed configured limits
- Check that user creation succeeded in previous steps

### CancelledError During Benchmark
**Symptom**: Error message like `'CancelledError' object has no attribute 'username'` appears in logs

**Cause**: Async task cancellation during benchmark shutdown or errors can cause race conditions in error handling

**Solution**: This has been mitigated with defensive error handling. The worker now:
- Catches `asyncio.CancelledError` specifically before general exceptions
- Logs cancellation gracefully without attempting to access potentially invalid state
- Re-raises the exception to allow proper cleanup chain

If you still see this error, it's likely harmless and occurs during shutdown. The benchmark results should still be valid.
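
The handling pattern described above looks roughly like the following sketch. It is not the benchmark's worker; `worker` and the `events` list are hypothetical and exist only to make the control flow visible.

```python
import asyncio
import logging

logger = logging.getLogger("oauth_benchmark_sketch")


async def worker(username: str, events: list) -> None:
    """Run operations until cancelled (hypothetical worker loop)."""
    try:
        while True:
            await asyncio.sleep(0.01)  # stands in for one benchmark operation
            events.append("op")
    except asyncio.CancelledError:
        # Handle cancellation first and touch only local state.
        logger.info("worker for %s cancelled", username)
        events.append("cancelled")
        raise  # re-raise so the task is marked as cancelled
    except Exception:
        logger.exception("worker for %s failed", username)
        events.append("error")


async def _demo():
    events = []
    task = asyncio.create_task(worker("loadtest_user_1", events))
    await asyncio.sleep(0.05)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: the worker re-raised the cancellation
    return task.cancelled(), events


was_cancelled, events = asyncio.run(_demo())
print("cancelled:", was_cancelled, "last event:", events[-1])
```

Since Python 3.8 `asyncio.CancelledError` derives from `BaseException`, so a bare `except Exception` would not catch it; the explicit clause keeps the intent clear and gives a place to log the shutdown.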

### High Error Rates
- Increase delay between operations (`await asyncio.sleep()` in worker)
- Check OAuth token validity
- Verify MCP OAuth server is running and accessible (port 8001)
- Rebuild mcp-oauth container after code changes: `docker-compose up --build -d mcp-oauth`

### Workflows Failing
- Check step-by-step latencies to identify failing steps
- Verify users have correct permissions
- Review server logs for errors

### MCP Session Creation Fails (401 Unauthorized)
**Solution**: This issue has been fixed! OAuth tokens are now properly passed as Authorization headers when creating MCP sessions.

If you still see 401 errors:
- Rebuild the mcp-oauth container: `docker-compose up --build -d mcp-oauth`
- Verify OAuth tokens are being acquired successfully in verbose mode
- Check that the token hasn't expired (use shorter test durations during troubleshooting)

## Future Enhancements

- [x] Dynamic user creation (beyond 4 default users) - **COMPLETED**
- [x] OAuth token injection for MCP sessions - **COMPLETED**
- [x] Cancel scope error handling - **COMPLETED**
- [x] Concurrent user creation and authentication - **COMPLETED** (3-4x speedup!)
- [ ] Workflow templates for common patterns
- [ ] Real-time dashboard for live monitoring
- [ ] Historical comparison and regression detection
- [ ] Load ramping (gradual user increase)
- [ ] Geographic distribution simulation (latency injection)
- [ ] Improve cleanup reliability in finally block