Commit Graph

56 Commits

Author SHA1 Message Date
Chris Coutinho 2147fc1696 refactor: Transform document parsing into pluggable processor architecture
Refactors PR #190's hardcoded Unstructured.io integration into a flexible,
extensible plugin system supporting multiple text extraction engines.

- **`DocumentProcessor` ABC**: Abstract interface for all processors
- **`ProcessorRegistry`**: Central registry for discovery and routing
- **`ProcessingResult`**: Standardized output format across processors

- **`UnstructuredProcessor`**: Refactored from `UnstructuredClient`
- **`TesseractProcessor`**: Local OCR for images (lightweight alternative)
- **`CustomHTTPProcessor`**: Generic wrapper for custom HTTP APIs

- New `get_document_processor_config()` returns structured config
- Supports enabling/disabling individual processors
- Per-processor configuration via environment variables
- **Breaking Change**: `ENABLE_UNSTRUCTURED_PARSING` replaced with:
  - `ENABLE_DOCUMENT_PROCESSING=true/false` (master switch)
  - `ENABLE_UNSTRUCTURED=true/false` (per-processor)
  - `ENABLE_TESSERACT=true/false`
  - `ENABLE_CUSTOM_PROCESSOR=true/false`

- `parse_document()` now uses `ProcessorRegistry`
- Auto-selects appropriate processor based on MIME type
- Processor priority system (Unstructured=10, Tesseract=5, Custom=1)

- `initialize_document_processors()` registers processors at startup
- Integrated into both BasicAuth and OAuth lifespans
- Graceful degradation if processors fail to initialize

```env
ENABLE_DOCUMENT_PROCESSING=false

ENABLE_UNSTRUCTURED=false
UNSTRUCTURED_API_URL=http://unstructured:8000
UNSTRUCTURED_STRATEGY=auto  # auto|fast|hi_res
UNSTRUCTURED_LANGUAGES=eng,deu

ENABLE_TESSERACT=false
TESSERACT_LANG=eng

ENABLE_CUSTOM_PROCESSOR=false
CUSTOM_PROCESSOR_URL=http://localhost:9000/process
CUSTOM_PROCESSOR_TYPES=application/pdf,image/jpeg
```

- **Removed**: `tests/test_unstructured_config.py` (legacy tests)
- **Added**: `tests/unit/test_document_processor_config.py`
  - 7 unit tests for new config system
  - Tests individual and multi-processor configurations

- **Added**:
  - `nextcloud_mcp_server/document_processors/__init__.py`
  - `nextcloud_mcp_server/document_processors/base.py`
  - `nextcloud_mcp_server/document_processors/registry.py`
  - `nextcloud_mcp_server/document_processors/unstructured.py`
  - `nextcloud_mcp_server/document_processors/tesseract.py`
  - `nextcloud_mcp_server/document_processors/custom_http.py`
  - `tests/unit/test_document_processor_config.py`

- **Modified**:
  - `nextcloud_mcp_server/config.py` - New plugin config system
  - `nextcloud_mcp_server/app.py` - Processor initialization
  - `nextcloud_mcp_server/utils/document_parser.py` - Uses registry
  - `nextcloud_mcp_server/server/webdav.py` - Import updates
  - `env.sample` - New configuration format
  - `docker-compose.yml` - (profile changes from previous work)

- **Removed**:
  - `nextcloud_mcp_server/client/unstructured_client.py` - Replaced by UnstructuredProcessor
  - `tests/test_unstructured_config.py` - Replaced with new tests

 **Extensible**: Add processors without modifying core code
 **Testable**: Mock processors for unit tests
 **Configurable**: Enable only needed processors
 **Flexible**: Choose fast (Tesseract) vs accurate (Unstructured)
 **Opt-in**: Disabled by default, no mandatory dependencies

Users upgrading from PR #190 need to update environment variables:
```bash
ENABLE_UNSTRUCTURED_PARSING=true

ENABLE_DOCUMENT_PROCESSING=true
ENABLE_UNSTRUCTURED=true
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-25 19:28:35 +02:00
yuisheaven 64649c902d Merge branch 'master' into feature/introduce_files_parsing_with_unstructured_service_for_webdav_files_retrieval 2025-10-21 20:37:00 +02:00
Chris Coutinho e8f1340133 fix(caldav): Fix caldav search() due to missing todos 2025-10-20 22:18:46 +02:00
Chris Coutinho 71f09a47ca docs: Update CalendarClient docstrings [skip ci] 2025-10-20 00:54:35 +02:00
Chris Coutinho f4dd68735c test: Fix how categories are handled in calendar 2025-10-20 00:04:38 +02:00
Chris Coutinho a143123acc fix(caldav): Check that calendar exists after creation to avoid race condition
Verify that field preservation tests still operate
2025-10-19 23:44:39 +02:00
Chris Coutinho 1dc2ddfdb7 fix(caldav): Properly parse datetimes as vDDDTypes 2025-10-19 20:13:05 +02:00
Chris Coutinho 92e18825bc feat(caldav): Add support for tasks 2025-10-19 18:02:43 +02:00
Chris Coutinho c2f6c6ce0d ci: Set cookbook recipe import timeout to 5min 2025-10-19 01:49:21 +02:00
Chris Coutinho 31ffeba69b chore: Move timeout to recipe import 2025-10-18 23:12:31 +02:00
Chris Coutinho 6158a890af feat(webdav): Add search and list favorite response tools 2025-10-18 22:02:26 +02:00
Chris Coutinho 37164dbdbc chore: sort imports 2025-10-18 22:02:25 +02:00
Chris Coutinho 83917b3786 perf(notes): Improve notes search performance using async iterators 2025-10-18 22:02:19 +02:00
Chris Coutinho 8e7191e0ea fix: Increase HTTP client timeout to 30s
The default 5s timeout was too short for Nextcloud Cookbook app to fetch and process recipes from external URLs, causing intermittent test failures with ReadTimeout errors.

Fixes intermittent CI failures in cookbook import tests.
2025-10-17 04:41:28 +02:00
Chris Coutinho 9de59db718 feat(cookbook): Add full Cookbook app support with 13 tools and 2 resources
- Import recipes from URLs using schema.org metadata
- Full CRUD operations for recipes
- Search, categorize, and organize recipes
- Manage keywords/tags and categories
- Configure app settings and trigger reindexing
2025-10-17 03:08:16 +02:00
Chris Coutinho 5db02313a1 test: Update share client to fix test, update passwords 2025-10-15 10:35:22 +02:00
Chris Coutinho 85f8522085 feat: Add Groups API client 2025-10-15 03:43:25 +02:00
Chris Coutinho a38c795124 feat: add sharing API client and server tools 2025-10-15 02:59:26 +02:00
Chris Coutinho 7004104873 test: Fix multi-user tests 2025-10-15 02:11:17 +02:00
Chris Coutinho 7a4a31b52d fix: Update user/groups API to OCS v2 2025-10-15 00:05:22 +02:00
Chris Coutinho 898c2e72ae Merge remote-tracking branch 'origin/master' into feature/user-api 2025-10-14 23:43:03 +02:00
Chris Coutinho 13e4915e38 test: Remove unused pytest fixtures 2025-10-14 01:23:39 +02:00
Chris Coutinho 2b11718c43 test: continue working on oauth client 2025-10-14 01:23:30 +02:00
Chris Coutinho 33b962a7fc test: Setup interactive browser test 2025-10-14 01:23:30 +02:00
Chris Coutinho 4d7e4b9a4b feat(server): Experimental support for OAuth2/OIDC authentication 2025-10-14 01:22:15 +02:00
yuisheaven 3ff6346c03 ran ruff format via uv 2025-10-05 02:16:42 +02:00
yuisheaven c9a687171a added envs for unstructured to control OCR quality and OCR languages 2025-10-04 05:21:02 +02:00
yuisheaven 76dce41ed9 added first versoin of the new document_parser utility and added it to the webdav file retrieval logic 2025-10-04 04:28:24 +02:00
Chris Coutinho 961f23b5ea feat(users): Initialize user API client 2025-09-11 09:42:42 +02:00
Chris Coutinho e7a5caa0d6 Merge remote-tracking branch 'origin/master' into feature/deck 2025-09-11 00:37:58 +02:00
Chris Coutinho 167053578d feat(deck): Initialize Deck app client/server 2025-09-11 00:10:25 +02:00
Pedro Ruiz 5d4902a73e feat: Add WebDAV resource copy functionality 2025-09-10 22:15:16 +02:00
Pedro Ruiz b55b9640c6 feat: Add WebDAV resource move/rename functionality 2025-09-10 22:12:17 +02:00
Chris Coutinho 4cf5f2a95a feat(client): Preserve fields when modifying contacts/calendar resources 2025-08-30 19:19:20 +02:00
Chris Coutinho 0484167a22 refactor: Use _make_request where available 2025-08-30 14:27:53 +02:00
Chris Coutinho 84ad1958af chore: Remove unnecessary logging
Migrate pre-commit tasks to local
2025-08-30 14:25:16 +02:00
Rémi Nivet 4f7023a16e fix(client): Use paging to fetch all notes 2025-08-29 23:46:58 +02:00
Chris Coutinho 3836534205 fix(client): Strip cookies from responses to avoid falsely raising CSRF errors 2025-08-08 21:03:16 +02:00
Chris Coutinho 72cb62a101 test(contacts): Add unit/integration tests for a few tools 2025-08-03 14:36:16 +02:00
Chris Coutinho 70f01bf40a Add files 2025-08-03 14:16:55 +02:00
Chris Coutinho 37b1057d2a feat(contacts): Initialize Contacts App 2025-08-03 14:15:37 +02:00
Chris Coutinho 8956945e9d chore: sort imports 2025-08-01 12:21:32 +02:00
Chris Coutinho 69fccb496a Use self._make_request 2025-08-01 11:05:28 +02:00
Chris Coutinho 6bdbb6ea6c Create sample calendar 2025-08-01 10:26:56 +02:00
Chris Coutinho 0b8a3aa646 Prepare calendar before running tests 2025-08-01 09:29:15 +02:00
Chris Coutinho 2bcfd3d7ee fix(calendar): Fix iCalendar date vs datetime format 2025-08-01 08:34:51 +02:00
Chris Coutinho 75235d6013 Refactor datetime 2025-07-31 14:51:33 +02:00
Chris Coutinho b81fe6dfa0 fix(calendar): Remove try/except in calendar API 2025-07-30 11:03:01 +02:00
Neovasky 83748a27da fix: apply ruff formatting to pass CI checks
- Fixed line length issues in logger.warning calls
- Removed trailing spaces in docstrings
- Applied consistent formatting across all files
2025-07-28 11:52:10 -04:00
Neovasky 3ddeeab67f fix(calendar): address PR feedback from maintainer
- Remove CHANGELOG.md changes (auto-generated from commits)
- Move all parameter descriptions into function docstrings for LLM context
- Remove unused caldav dependency (using httpx for CalDAV implementation)
- Move datetime imports to top of modules
- Remove load_dotenv from tests/conftest.py
- Clarify Event vs Meeting distinction in docstrings
- Handle 401 auth errors gracefully in calendar tests

Addresses all feedback from PR #95 review
2025-07-28 11:44:53 -04:00