2147fc1696
Refactors PR #190's hardcoded Unstructured.io integration into a flexible, extensible plugin system supporting multiple text extraction engines.

- **`DocumentProcessor` ABC**: Abstract interface for all processors
- **`ProcessorRegistry`**: Central registry for discovery and routing
- **`ProcessingResult`**: Standardized output format across processors

- **`UnstructuredProcessor`**: Refactored from `UnstructuredClient`
- **`TesseractProcessor`**: Local OCR for images (lightweight alternative)
- **`CustomHTTPProcessor`**: Generic wrapper for custom HTTP APIs

- New `get_document_processor_config()` returns structured config
- Supports enabling/disabling individual processors
- Per-processor configuration via environment variables
- **Breaking Change**: `ENABLE_UNSTRUCTURED_PARSING` replaced with:
  - `ENABLE_DOCUMENT_PROCESSING=true/false` (master switch)
  - `ENABLE_UNSTRUCTURED=true/false` (per-processor)
  - `ENABLE_TESSERACT=true/false`
  - `ENABLE_CUSTOM_PROCESSOR=true/false`

- `parse_document()` now uses `ProcessorRegistry`
- Auto-selects appropriate processor based on MIME type
- Processor priority system (Unstructured=10, Tesseract=5, Custom=1)

- `initialize_document_processors()` registers processors at startup
- Integrated into both BasicAuth and OAuth lifespans
- Graceful degradation if processors fail to initialize

```env
ENABLE_DOCUMENT_PROCESSING=false
ENABLE_UNSTRUCTURED=false
UNSTRUCTURED_API_URL=http://unstructured:8000
UNSTRUCTURED_STRATEGY=auto # auto|fast|hi_res
UNSTRUCTURED_LANGUAGES=eng,deu
ENABLE_TESSERACT=false
TESSERACT_LANG=eng
ENABLE_CUSTOM_PROCESSOR=false
CUSTOM_PROCESSOR_URL=http://localhost:9000/process
CUSTOM_PROCESSOR_TYPES=application/pdf,image/jpeg
```

- **Removed**: `tests/test_unstructured_config.py` (legacy tests)
- **Added**: `tests/unit/test_document_processor_config.py`
  - 7 unit tests for new config system
  - Tests individual and multi-processor configurations

- **Added**:
  - `nextcloud_mcp_server/document_processors/__init__.py`
  - `nextcloud_mcp_server/document_processors/base.py`
  - `nextcloud_mcp_server/document_processors/registry.py`
  - `nextcloud_mcp_server/document_processors/unstructured.py`
  - `nextcloud_mcp_server/document_processors/tesseract.py`
  - `nextcloud_mcp_server/document_processors/custom_http.py`
  - `tests/unit/test_document_processor_config.py`
- **Modified**:
  - `nextcloud_mcp_server/config.py` - New plugin config system
  - `nextcloud_mcp_server/app.py` - Processor initialization
  - `nextcloud_mcp_server/utils/document_parser.py` - Uses registry
  - `nextcloud_mcp_server/server/webdav.py` - Import updates
  - `env.sample` - New configuration format
  - `docker-compose.yml` - (profile changes from previous work)
- **Removed**:
  - `nextcloud_mcp_server/client/unstructured_client.py` - Replaced by UnstructuredProcessor
  - `tests/test_unstructured_config.py` - Replaced with new tests

✅ **Extensible**: Add processors without modifying core code
✅ **Testable**: Mock processors for unit tests
✅ **Configurable**: Enable only needed processors
✅ **Flexible**: Choose fast (Tesseract) vs accurate (Unstructured)
✅ **Opt-in**: Disabled by default, no mandatory dependencies

Users upgrading from PR #190 need to update environment variables:

```bash
ENABLE_UNSTRUCTURED_PARSING=true

ENABLE_DOCUMENT_PROCESSING=true
ENABLE_UNSTRUCTURED=true
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Test Suite Reorganization Summary
Completed: 2025-10-24
Changes Implemented
1. Added Test Layer Markers
File: pyproject.toml
Added four test markers to enable selective test execution:
- `@pytest.mark.unit` - Fast unit tests with mocked dependencies
- `@pytest.mark.integration` - Integration tests requiring Docker containers
- `@pytest.mark.oauth` - OAuth tests requiring Playwright (slowest)
- `@pytest.mark.smoke` - Critical path smoke tests
2. Created Unit Test Suite
Directory: tests/unit/
Added fast unit tests (~5 seconds total):
- `test_scope_decorator.py` (5 tests) - Scope decorator metadata logic
- `test_response_models.py` (6 tests) - Pydantic model serialization
Total: 11 unit tests
3. Reorganized OAuth Tests
Directory: tests/server/oauth/
Moved all OAuth-related tests to dedicated subdirectory:
- Created `test_oauth_core.py` to consolidate the basic OAuth connectivity tests
- Moved 7 OAuth test files to the `oauth/` subdirectory
- Fixed relative imports (`..conftest` → `...conftest`)
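The extra dot is needed because each leading dot climbs one package level, and the moved files now sit one level deeper. A minimal sketch using the standard-library `importlib.util.resolve_name` shows the effect; the package names assume that `tests/` is a package with `conftest.py` at its top level, which is inferred from the import change rather than stated in this summary.

```python
from importlib.util import resolve_name

# Assumed package names: tests/ is taken to be a package whose top level
# holds conftest.py.
OLD_PACKAGE = "tests.server"        # module lived in tests/server/
NEW_PACKAGE = "tests.server.oauth"  # module now lives in tests/server/oauth/

# Before the move, two dots reached the top-level conftest.
before = resolve_name("..conftest", OLD_PACKAGE)

# After the move, the same import stops one level short.
stale = resolve_name("..conftest", NEW_PACKAGE)

# Three dots restore the original target.
after = resolve_name("...conftest", NEW_PACKAGE)

print(before)  # tests.conftest
print(stale)   # tests.server.conftest
print(after)   # tests.conftest
```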
Files:
- `test_oauth_core.py` - Basic OAuth connectivity & JWT operations (8 tests)
- `test_scope_authorization.py` - Scope filtering & enforcement (16 tests)
- `test_introspection_authorization.py` - Token introspection auth (5 tests)
- `test_dcr_token_type.py` - Dynamic client registration (3 tests)
- `test_oauth_notes_permissions.py` - Notes app permissions (4 tests)
- `test_oauth_deck_permissions.py` - Deck app permissions (4 tests)
- `test_oauth_file_permissions.py` - Files app permissions (4 tests)
Total: ~48 OAuth tests (the per-file counts listed above sum to 44)
4. Created Smoke Test Suite
Directory: tests/smoke/
Added critical path validation tests (~30-60 seconds):
- `test_smoke.py` (5 tests) - Essential functionality validation
  - MCP connectivity
  - Notes CRUD
  - Calendar basic operations
  - WebDAV basic operations
  - OAuth connectivity
5. Updated Documentation
File: CLAUDE.md
Added comprehensive test execution guide:
```bash
# Fast feedback (unit tests) - ~5 seconds
uv run pytest tests/unit/ -v

# Smoke tests - ~30-60 seconds
uv run pytest -m smoke -v

# Integration without OAuth - ~2-3 minutes
uv run pytest -m "integration and not oauth" -v

# Full suite - ~4-5 minutes
uv run pytest

# OAuth only - ~3 minutes
uv run pytest -m oauth -v
```
Added test structure diagram and marker documentation.
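To make the `-m` selections in the commands above concrete, here is a small sketch of how a marker expression such as `integration and not oauth` filters tests. It is not pytest's own implementation; it reuses Python's `ast` module because the `and`/`or`/`not` syntax happens to coincide. The test names and their marker sets are hypothetical.

```python
import ast


def matches(expression: str, markers: set[str]) -> bool:
    """Evaluate a marker expression against the markers on one test."""

    def evaluate(node: ast.AST) -> bool:
        if isinstance(node, ast.Name):
            # A bare name is true when the test carries that marker.
            return node.id in markers
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not evaluate(node.operand)
        if isinstance(node, ast.BoolOp):
            results = [evaluate(value) for value in node.values]
            return all(results) if isinstance(node.op, ast.And) else any(results)
        raise ValueError(f"unsupported expression element: {ast.dump(node)}")

    return evaluate(ast.parse(expression, mode="eval").body)


# Hypothetical tests and marker sets, for illustration only.
SAMPLE_TESTS = {
    "test_response_models": {"unit"},
    "test_notes_crud": {"integration"},
    "test_oauth_core": {"integration", "oauth"},
    "test_smoke": {"integration", "smoke"},
}


def select(expression: str) -> list[str]:
    """Return the sample tests that the expression would select."""
    return sorted(
        name for name, markers in SAMPLE_TESTS.items() if matches(expression, markers)
    )


print(select("integration and not oauth"))  # ['test_notes_crud', 'test_smoke']
print(select("oauth"))                      # ['test_oauth_core']
```

Under these assumptions, OAuth tests carry both `integration` and `oauth`, which is why the third command needs `and not oauth` to exclude them.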
Test Suite Metrics
Before Reorganization:
- ~235 tests, all integration
- No fast feedback loop
- All tests take ~5-7 minutes
- OAuth tests scattered across 9 files
After Reorganization:
- 234 tests total (11 unit + 5 smoke + ~218 integration)
- Fast feedback: unit tests in ~5 seconds
- Quick validation: smoke tests in ~30-60 seconds
- Focused testing: integration without OAuth in ~2-3 minutes
- Full suite: ~4-5 minutes
- OAuth tests consolidated in dedicated directory
Feedback Time Improvements
| Test Type | Count | Time | Use Case |
|---|---|---|---|
| Unit only | 11 | ~5s | Logic changes, model updates |
| Smoke only | 5 | ~30-60s | Critical path validation |
| Integration (no OAuth) | ~172 | ~2-3min | API/MCP changes |
| OAuth only | 48 | ~3min | OAuth feature work |
| Full suite | 234 | ~4-5min | Pre-commit validation |
Key Benefits
- Fast Development Feedback
  - Unit tests run in ~5 seconds vs. 5+ minutes
  - Immediate validation for logic changes
- Efficient CI/CD
  - Can run unit tests on every commit
  - Run smoke tests for pull requests
  - Full suite for merge to main
- Better Organization
  - OAuth tests grouped together
  - Clear test purpose from directory structure
  - Easier to navigate and maintain
- Selective Execution
  - Skip slow OAuth tests during development
  - Run only relevant test layer
  - Faster iteration cycles
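The CI/CD layering described above (unit tests on every commit, smoke tests for pull requests, full suite on merge to main) could be captured as a small lookup, sketched below. The stage names and the `pytest_command` helper are hypothetical; only the `uv run pytest` invocations come from this document.

```python
import shlex

# Hypothetical CI stage names mapped to the extra pytest arguments used
# elsewhere in this summary.
PYTEST_ARGS_BY_STAGE = {
    "commit": ["tests/unit/", "-v"],      # fast unit layer
    "pull_request": ["-m", "smoke", "-v"],  # critical-path smoke layer
    "merge_to_main": [],                  # full suite, no filter
}


def pytest_command(stage: str) -> list[str]:
    """Return the command line for a CI stage as an argument list."""
    try:
        extra = PYTEST_ARGS_BY_STAGE[stage]
    except KeyError:
        raise ValueError(f"unknown CI stage: {stage!r}") from None
    return ["uv", "run", "pytest", *extra]


for stage in PYTEST_ARGS_BY_STAGE:
    print(f"{stage}: {shlex.join(pytest_command(stage))}")
```

Keeping the mapping in one place means a pipeline definition only needs to pass the stage name, and a new layer can be added without touching the callers.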
Migration Notes
- No breaking changes to existing tests
- All tests continue to work as before
- Legacy commands still supported (`-m integration`, etc.)
- OAuth tests moved to subdirectory, imports updated
- Duplicate tests removed and consolidated into `test_oauth_core.py`
Next Steps (Optional Future Work)
- Further Consolidation: Merge remaining OAuth permission tests
- More Unit Tests: Add unit tests for client initialization, search logic
- Client/Server Deduplication: Reduce overlap between client and server tests
- CI Pipeline: Configure GitHub Actions to run test layers separately
- Performance: Optimize fixtures to reduce setup time
Commands Reference
```bash
# Development workflow
uv run pytest tests/unit/ -v                          # Check logic changes
uv run pytest -m smoke -v                             # Quick validation
uv run pytest -m "integration and not oauth" -v       # Full validation without slow tests

# Before committing
uv run pytest                                         # Run everything

# Working on OAuth features
uv run pytest tests/server/oauth/ -v                  # OAuth tests only
uv run pytest -m oauth --browser firefox --headed -v  # Debug OAuth with visible browser
```