Chris Coutinho
2147fc1696
refactor: Transform document parsing into pluggable processor architecture
Refactors PR #190's hardcoded Unstructured.io integration into a flexible,
extensible plugin system supporting multiple text extraction engines.
- **`DocumentProcessor` ABC**: Abstract interface for all processors
- **`ProcessorRegistry`**: Central registry for discovery and routing
- **`ProcessingResult`**: Standardized output format across processors
- **`UnstructuredProcessor`**: Refactored from `UnstructuredClient`
- **`TesseractProcessor`**: Local OCR for images (lightweight alternative)
- **`CustomHTTPProcessor`**: Generic wrapper for custom HTTP APIs
- New `get_document_processor_config()` returns structured config
- Supports enabling/disabling individual processors
- Per-processor configuration via environment variables
- **Breaking Change**: `ENABLE_UNSTRUCTURED_PARSING` replaced with:
  - `ENABLE_DOCUMENT_PROCESSING=true/false` (master switch)
  - `ENABLE_UNSTRUCTURED=true/false` (per-processor)
  - `ENABLE_TESSERACT=true/false`
  - `ENABLE_CUSTOM_PROCESSOR=true/false`
- `parse_document()` now uses `ProcessorRegistry`
- Auto-selects appropriate processor based on MIME type
- Processor priority system (Unstructured=10, Tesseract=5, Custom=1)
- `initialize_document_processors()` registers processors at startup
- Integrated into both BasicAuth and OAuth lifespans
- Graceful degradation if processors fail to initialize
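The processor and registry design described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from this commit: only `DocumentProcessor`, `ProcessorRegistry`, `ProcessingResult`, MIME-type routing, graceful degradation, and the priority values (Unstructured=10, Tesseract=5) come from the message. The attribute and method names (`priority`, `supported_mime_types`, `process`, `register`, `get_processor`), the hypothetical `initialize` helper, and the `Fake*` stand-in processors are assumptions. Real processors would presumably be async and call out to Unstructured or Tesseract.

```python
from __future__ import annotations

import logging
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable

logger = logging.getLogger(__name__)


@dataclass
class ProcessingResult:
    # Hypothetical fields for the standardized output shared by all processors.
    text: str
    processor: str
    metadata: dict[str, Any] = field(default_factory=dict)


class DocumentProcessor(ABC):
    name: str = "base"
    priority: int = 0  # higher value wins when several processors match
    supported_mime_types: frozenset[str] = frozenset()

    def supports(self, mime_type: str) -> bool:
        return mime_type in self.supported_mime_types

    @abstractmethod
    def process(self, content: bytes, mime_type: str) -> ProcessingResult: ...


class ProcessorRegistry:
    def __init__(self) -> None:
        self._processors: list[DocumentProcessor] = []

    def register(self, processor: DocumentProcessor) -> None:
        self._processors.append(processor)

    def get_processor(self, mime_type: str) -> DocumentProcessor | None:
        # Route by MIME type, preferring the highest-priority match.
        matches = [p for p in self._processors if p.supports(mime_type)]
        return max(matches, key=lambda p: p.priority, default=None)


def initialize(
    registry: ProcessorRegistry,
    factories: Iterable[Callable[[], DocumentProcessor]],
) -> None:
    # Hypothetical startup hook: a processor that fails to build is logged
    # and skipped (graceful degradation) instead of aborting startup.
    for factory in factories:
        try:
            registry.register(factory())
        except Exception:
            logger.warning("Skipping document processor", exc_info=True)


# Stand-in processors for illustration only (no real OCR or HTTP calls).
class FakeUnstructured(DocumentProcessor):
    name = "unstructured"
    priority = 10
    supported_mime_types = frozenset({"application/pdf", "image/jpeg"})

    def process(self, content: bytes, mime_type: str) -> ProcessingResult:
        return ProcessingResult(text=content.decode(), processor=self.name)


class FakeTesseract(DocumentProcessor):
    name = "tesseract"
    priority = 5
    supported_mime_types = frozenset({"image/jpeg", "image/png"})

    def process(self, content: bytes, mime_type: str) -> ProcessingResult:
        return ProcessingResult(text=content.decode(), processor=self.name)


def broken_factory() -> DocumentProcessor:
    raise RuntimeError("service unreachable")
```

Passing `FakeUnstructured`, `FakeTesseract`, and `broken_factory` to `initialize` registers two processors and only logs the failing one; a JPEG then routes to `unstructured` (priority 10 beats 5), a PNG to `tesseract`, and `text/plain` to `None`.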
```env
ENABLE_DOCUMENT_PROCESSING=false
ENABLE_UNSTRUCTURED=false
UNSTRUCTURED_API_URL=http://unstructured:8000
UNSTRUCTURED_STRATEGY=auto # auto|fast|hi_res
UNSTRUCTURED_LANGUAGES=eng,deu
ENABLE_TESSERACT=false
TESSERACT_LANG=eng
ENABLE_CUSTOM_PROCESSOR=false
CUSTOM_PROCESSOR_URL=http://localhost:9000/process
CUSTOM_PROCESSOR_TYPES=application/pdf,image/jpeg
```
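A loader for the variables above might look like the following sketch. The commit only says that `get_document_processor_config()` "returns structured config"; the dictionary shape, the optional `env` parameter (added here for testability), the rule that a processor is active only when the master switch is also on, and the fallback defaults (copied from the sample values) are all assumptions.

```python
from __future__ import annotations

import os
from typing import Any, Mapping

_TRUTHY = {"1", "true", "yes", "on"}


def _flag(env: Mapping[str, str], name: str) -> bool:
    # Every switch in the sample defaults to false (opt-in).
    return env.get(name, "false").strip().lower() in _TRUTHY


def _csv(env: Mapping[str, str], name: str, default: str) -> list[str]:
    return [item.strip() for item in env.get(name, default).split(",") if item.strip()]


def get_document_processor_config(env: Mapping[str, str] | None = None) -> dict[str, Any]:
    env = os.environ if env is None else env
    master = _flag(env, "ENABLE_DOCUMENT_PROCESSING")
    return {
        "enabled": master,
        "processors": {
            # Assumption: a processor is active only if the master switch is on too.
            "unstructured": {
                "enabled": master and _flag(env, "ENABLE_UNSTRUCTURED"),
                "api_url": env.get("UNSTRUCTURED_API_URL", "http://unstructured:8000"),
                "strategy": env.get("UNSTRUCTURED_STRATEGY", "auto"),
                "languages": _csv(env, "UNSTRUCTURED_LANGUAGES", "eng,deu"),
            },
            "tesseract": {
                "enabled": master and _flag(env, "ENABLE_TESSERACT"),
                "lang": env.get("TESSERACT_LANG", "eng"),
            },
            "custom": {
                "enabled": master and _flag(env, "ENABLE_CUSTOM_PROCESSOR"),
                "url": env.get("CUSTOM_PROCESSOR_URL"),
                "supported_types": _csv(env, "CUSTOM_PROCESSOR_TYPES", ""),
            },
        },
    }
```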
- **Removed**: `tests/test_unstructured_config.py` (legacy tests)
- **Added**: `tests/unit/test_document_processor_config.py`
  - 7 unit tests for new config system
  - Tests individual and multi-processor configurations
- **Added**:
  - `nextcloud_mcp_server/document_processors/__init__.py`
  - `nextcloud_mcp_server/document_processors/base.py`
  - `nextcloud_mcp_server/document_processors/registry.py`
  - `nextcloud_mcp_server/document_processors/unstructured.py`
  - `nextcloud_mcp_server/document_processors/tesseract.py`
  - `nextcloud_mcp_server/document_processors/custom_http.py`
  - `tests/unit/test_document_processor_config.py`
- **Modified**:
  - `nextcloud_mcp_server/config.py` - New plugin config system
  - `nextcloud_mcp_server/app.py` - Processor initialization
  - `nextcloud_mcp_server/utils/document_parser.py` - Uses registry
  - `nextcloud_mcp_server/server/webdav.py` - Import updates
  - `env.sample` - New configuration format
  - `docker-compose.yml` - (profile changes from previous work)
- **Removed**:
  - `nextcloud_mcp_server/client/unstructured_client.py` - Replaced by UnstructuredProcessor
  - `tests/test_unstructured_config.py` - Replaced with new tests
✅ **Extensible**: Add processors without modifying core code
✅ **Testable**: Mock processors for unit tests
✅ **Configurable**: Enable only needed processors
✅ **Flexible**: Choose fast (Tesseract) vs accurate (Unstructured)
✅ **Opt-in**: Disabled by default, no mandatory dependencies
Users upgrading from PR #190 need to update their environment variables:
```bash
# Before (PR #190)
ENABLE_UNSTRUCTURED_PARSING=true

# After (this change)
ENABLE_DOCUMENT_PROCESSING=true
ENABLE_UNSTRUCTURED=true
```
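A pre-upgrade check for a deployment's environment can be sketched as follows. `legacy_env_warnings` is a hypothetical helper, not part of this commit; it only encodes the mapping stated above (the old flag is replaced by the master switch plus `ENABLE_UNSTRUCTURED`).

```python
from __future__ import annotations

import os
from typing import Mapping


def legacy_env_warnings(env: Mapping[str, str] | None = None) -> list[str]:
    """Hypothetical check for the replaced ENABLE_UNSTRUCTURED_PARSING flag."""
    env = os.environ if env is None else env
    if "ENABLE_UNSTRUCTURED_PARSING" not in env:
        return []
    warnings = ["ENABLE_UNSTRUCTURED_PARSING was replaced; remove it."]
    if env["ENABLE_UNSTRUCTURED_PARSING"].strip().lower() == "true":
        # The old single flag maps onto the new master + per-processor switches.
        for new_var in ("ENABLE_DOCUMENT_PROCESSING", "ENABLE_UNSTRUCTURED"):
            if env.get(new_var, "").strip().lower() != "true":
                warnings.append(f"Set {new_var}=true to keep Unstructured parsing enabled.")
    return warnings
```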
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-25 19:28:35 +02:00
yuisheaven
64649c902d
Merge branch 'master' into feature/introduce_files_parsing_with_unstructured_service_for_webdav_files_retrieval
2025-10-21 20:37:00 +02:00
Chris Coutinho
e8f1340133
fix(caldav): Fix caldav search() due to missing todos
2025-10-20 22:18:46 +02:00
Chris Coutinho
71f09a47ca
docs: Update CalendarClient docstrings [skip ci]
2025-10-20 00:54:35 +02:00
Chris Coutinho
f4dd68735c
test: Fix how categories are handled in calendar
2025-10-20 00:04:38 +02:00
Chris Coutinho
a143123acc
fix(caldav): Check that calendar exists after creation to avoid race condition
Verify that field preservation tests still operate
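The message does not show how the existence check is implemented. One common way to avoid this kind of race is to poll until the newly created calendar becomes visible; the sketch below assumes that approach. `wait_until` and the `check` coroutine (for example, a lookup of the new calendar) are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_until(
    check: Callable[[], Awaitable[bool]],
    timeout: float = 5.0,
    interval: float = 0.1,
) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        if await check():
            return True
        if loop.time() >= deadline:
            return False
        await asyncio.sleep(interval)
```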
2025-10-19 23:44:39 +02:00
Chris Coutinho
1dc2ddfdb7
fix(caldav): Properly parse datetimes as vDDDTypes
2025-10-19 20:13:05 +02:00
Chris Coutinho
92e18825bc
feat(caldav): Add support for tasks
2025-10-19 18:02:43 +02:00
Chris Coutinho
c2f6c6ce0d
ci: Set cookbook recipe import timeout to 5min
2025-10-19 01:49:21 +02:00
Chris Coutinho
31ffeba69b
chore: Move timeout to recipe import
2025-10-18 23:12:31 +02:00
Chris Coutinho
6158a890af
feat(webdav): Add search and list favorite response tools
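Nextcloud's WebDAV endpoint lists favorites through a `REPORT` request to `/remote.php/dav/files/<user>/` whose body is an `oc:filter-files` document containing an `oc:favorite` rule. The commit does not show its request body; `favorites_report_body` is a hypothetical helper that builds such a body with the standard library.

```python
import xml.etree.ElementTree as ET

OC_NS = "http://owncloud.org/ns"
ET.register_namespace("oc", OC_NS)  # serialize as oc:... instead of ns0:...


def favorites_report_body() -> bytes:
    """Build a REPORT body asking Nextcloud WebDAV for favorited files only."""
    root = ET.Element(f"{{{OC_NS}}}filter-files")
    rules = ET.SubElement(root, f"{{{OC_NS}}}filter-rules")
    ET.SubElement(rules, f"{{{OC_NS}}}favorite").text = "1"
    return ET.tostring(root, xml_declaration=True, encoding="utf-8")
```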
2025-10-18 22:02:26 +02:00
Chris Coutinho
37164dbdbc
chore: sort imports
2025-10-18 22:02:25 +02:00
Chris Coutinho
83917b3786
perf(notes): Improve notes search performance using async iterators
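The message does not describe the implementation. The general idea behind async-iterator search can be sketched as follows: notes are consumed lazily, so a search can stop as soon as it has enough matches without fetching the remaining pages. `fetch_notes`, `search_notes`, and the `title`/`content` keys are assumptions for illustration.

```python
from __future__ import annotations

import asyncio
from typing import AsyncIterator, Iterable


async def fetch_notes(pages: Iterable[list[dict]]) -> AsyncIterator[dict]:
    # Stand-in for paged API calls: yields notes one at a time as pages "arrive".
    for page in pages:
        await asyncio.sleep(0)  # placeholder for an awaited HTTP request
        for note in page:
            yield note


async def search_notes(notes: AsyncIterator[dict], query: str, limit: int) -> list[dict]:
    query = query.lower()
    results: list[dict] = []
    async for note in notes:
        if query in note["title"].lower() or query in note["content"].lower():
            results.append(note)
            if len(results) >= limit:
                break  # stop consuming: later pages are never requested
    return results
```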
2025-10-18 22:02:19 +02:00
Chris Coutinho
8e7191e0ea
fix: Increase HTTP client timeout to 30s
The default 5s timeout was too short for the Nextcloud Cookbook app to fetch and process recipes from external URLs, causing intermittent test failures with ReadTimeout errors.
Fixes intermittent CI failures in cookbook import tests.
2025-10-17 04:41:28 +02:00
Chris Coutinho
9de59db718
feat(cookbook): Add full Cookbook app support with 13 tools and 2 resources
- Import recipes from URLs using schema.org metadata
- Full CRUD operations for recipes
- Search, categorize, and organize recipes
- Manage keywords/tags and categories
- Configure app settings and trigger reindexing
2025-10-17 03:08:16 +02:00
Chris Coutinho
5db02313a1
test: Update share client to fix test, update passwords
2025-10-15 10:35:22 +02:00
Chris Coutinho
85f8522085
feat: Add Groups API client
2025-10-15 03:43:25 +02:00
Chris Coutinho
a38c795124
feat: add sharing API client and server tools
2025-10-15 02:59:26 +02:00
Chris Coutinho
7004104873
test: Fix multi-user tests
2025-10-15 02:11:17 +02:00
Chris Coutinho
7a4a31b52d
fix: Update user/groups API to OCS v2
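Nextcloud's OCS API comes in a v1 (`/ocs/v1.php`) and a v2 (`/ocs/v2.php`) variant. v1 reports success as meta `statuscode` 100 and generally answers HTTP 200 even on errors, while v2 uses 200 and returns the matching HTTP status. The helpers below are a hypothetical sketch of building a v2 users request and checking the response; they are not the client code from this commit.

```python
from __future__ import annotations

from typing import Any


def ocs_users_request(base_url: str, version: int = 2) -> tuple[str, dict[str, str]]:
    """Build the URL and headers for listing users via the OCS Provisioning API."""
    url = f"{base_url.rstrip('/')}/ocs/v{version}.php/cloud/users?format=json"
    # Nextcloud's OCS documentation requires this header on every call.
    headers = {"OCS-APIRequest": "true", "Accept": "application/json"}
    return url, headers


def ocs_ok(payload: dict[str, Any], version: int = 2) -> bool:
    # v1 signals success with meta statuscode 100, v2 with 200.
    status = payload["ocs"]["meta"]["statuscode"]
    return status == (200 if version == 2 else 100)
```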
2025-10-15 00:05:22 +02:00
Chris Coutinho
898c2e72ae
Merge remote-tracking branch 'origin/master' into feature/user-api
2025-10-14 23:43:03 +02:00
Chris Coutinho
13e4915e38
test: Remove unused pytest fixtures
2025-10-14 01:23:39 +02:00
Chris Coutinho
2b11718c43
test: continue working on oauth client
2025-10-14 01:23:30 +02:00
Chris Coutinho
33b962a7fc
test: Setup interactive browser test
2025-10-14 01:23:30 +02:00
Chris Coutinho
4d7e4b9a4b
feat(server): Experimental support for OAuth2/OIDC authentication
2025-10-14 01:22:15 +02:00
yuisheaven
3ff6346c03
ran ruff format via uv
2025-10-05 02:16:42 +02:00
yuisheaven
c9a687171a
added envs for unstructured to control OCR quality and OCR languages
2025-10-04 05:21:02 +02:00
yuisheaven
76dce41ed9
added first version of the new document_parser utility and added it to the webdav file retrieval logic
2025-10-04 04:28:24 +02:00
Chris Coutinho
961f23b5ea
feat(users): Initialize user API client
2025-09-11 09:42:42 +02:00
Chris Coutinho
e7a5caa0d6
Merge remote-tracking branch 'origin/master' into feature/deck
2025-09-11 00:37:58 +02:00
Chris Coutinho
167053578d
feat(deck): Initialize Deck app client/server
2025-09-11 00:10:25 +02:00
Pedro Ruiz
5d4902a73e
feat: Add WebDAV resource copy functionality
2025-09-10 22:15:16 +02:00
Pedro Ruiz
b55b9640c6
feat: Add WebDAV resource move/rename functionality
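Both this commit and the copy commit listed just above rely on WebDAV's `MOVE` and `COPY` methods (RFC 4918), which take the target as an absolute URL in a `Destination` header and an optional `Overwrite` header (`T` or `F`). The helpers below are a hypothetical sketch, not the project's client methods; defaulting to `Overwrite: F` is a design choice here (RFC 4918 treats a missing header as `T`). The resulting method, URL, and headers can be sent with any HTTP client that supports custom methods.

```python
from __future__ import annotations

from urllib.parse import quote


def dav_file_url(base_url: str, user: str, path: str) -> str:
    # Nextcloud exposes each user's files under /remote.php/dav/files/<user>/.
    return f"{base_url.rstrip('/')}/remote.php/dav/files/{quote(user)}/{quote(path.lstrip('/'))}"


def move_or_copy_request(
    base_url: str,
    user: str,
    src: str,
    dest: str,
    *,
    copy: bool = False,
    overwrite: bool = False,
) -> tuple[str, str, dict[str, str]]:
    """Return (method, url, headers) for a WebDAV MOVE or COPY."""
    method = "COPY" if copy else "MOVE"
    headers = {
        "Destination": dav_file_url(base_url, user, dest),
        # "F" makes the server answer 412 instead of replacing an existing target.
        "Overwrite": "T" if overwrite else "F",
    }
    return method, dav_file_url(base_url, user, src), headers
```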
2025-09-10 22:12:17 +02:00
Chris Coutinho
4cf5f2a95a
feat(client): Preserve fields when modifying contacts/calendar resources
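The message does not detail the mechanism. A common read-modify-write approach is to fetch the existing resource, overlay only the fields the caller supplied, and write the result back. `merge_fields` is a hypothetical helper; treating `None` as "not provided" and the iCalendar-style keys are assumptions for illustration.

```python
from __future__ import annotations

from typing import Any


def merge_fields(existing: dict[str, Any], updates: dict[str, Any]) -> dict[str, Any]:
    """Apply only the fields the caller actually supplied; keep everything else."""
    merged = dict(existing)  # never mutate the fetched original
    for key, value in updates.items():
        if value is not None:  # None means "not provided", not "clear this field"
            merged[key] = value
    return merged
```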
2025-08-30 19:19:20 +02:00
Chris Coutinho
0484167a22
refactor: Use _make_request where available
2025-08-30 14:27:53 +02:00
Chris Coutinho
84ad1958af
chore: Remove unnecessary logging
Migrate pre-commit tasks to local
2025-08-30 14:25:16 +02:00
Rémi Nivet
4f7023a16e
fix(client): Use paging to fetch all notes
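The message does not say which paging parameters the Notes API call uses. The loop below is a generic cursor-based sketch: `fetch_all` and the `fetch_page` callable (returning a page of notes plus the next cursor, or `None` on the last page) are hypothetical.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


async def fetch_all(
    fetch_page: Callable[[str | None], Awaitable[tuple[list[dict], str | None]]],
) -> list[dict]:
    """Keep requesting pages until the server stops returning a cursor."""
    items: list[dict] = []
    cursor: str | None = None
    while True:
        page, cursor = await fetch_page(cursor)
        items.extend(page)
        if cursor is None:  # no cursor: this was the last page
            return items
```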
2025-08-29 23:46:58 +02:00
Chris Coutinho
3836534205
fix(client): Strip cookies from responses to avoid falsely raising CSRF errors
2025-08-08 21:03:16 +02:00
Chris Coutinho
72cb62a101
test(contacts): Add unit/integration tests for a few tools
2025-08-03 14:36:16 +02:00
Chris Coutinho
70f01bf40a
Add files
2025-08-03 14:16:55 +02:00
Chris Coutinho
37b1057d2a
feat(contacts): Initialize Contacts App
2025-08-03 14:15:37 +02:00
Chris Coutinho
8956945e9d
chore: sort imports
2025-08-01 12:21:32 +02:00
Chris Coutinho
69fccb496a
Use self._make_request
2025-08-01 11:05:28 +02:00
Chris Coutinho
6bdbb6ea6c
Create sample calendar
2025-08-01 10:26:56 +02:00
Chris Coutinho
0b8a3aa646
Prepare calendar before running tests
2025-08-01 09:29:15 +02:00
Chris Coutinho
2bcfd3d7ee
fix(calendar): Fix iCalendar date vs datetime format
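RFC 5545 distinguishes `DATE` values (`20250801`, which require `VALUE=DATE` on the property) from `DATE-TIME` values (`20250801T093000Z` in UTC, or without `Z` for floating local time). The commit does not show its fix; `ical_dtstart` is a hypothetical helper illustrating the distinction.

```python
from __future__ import annotations

from datetime import date, datetime, timezone


def ical_dtstart(value: date | datetime) -> str:
    """Format a DTSTART line as an all-day DATE or a DATE-TIME."""
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Naive datetimes become RFC 5545 "floating" local times (no Z).
            return "DTSTART:" + value.strftime("%Y%m%dT%H%M%S")
        return "DTSTART:" + value.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return "DTSTART;VALUE=DATE:" + value.strftime("%Y%m%d")
```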
2025-08-01 08:34:51 +02:00
Chris Coutinho
75235d6013
Refactor datetime
2025-07-31 14:51:33 +02:00
Chris Coutinho
b81fe6dfa0
fix(calendar): Remove try/except in calendar API
2025-07-30 11:03:01 +02:00
Neovasky
83748a27da
fix: apply ruff formatting to pass CI checks
- Fixed line length issues in logger.warning calls
- Removed trailing spaces in docstrings
- Applied consistent formatting across all files
2025-07-28 11:52:10 -04:00
Neovasky
3ddeeab67f
fix(calendar): address PR feedback from maintainer
- Remove CHANGELOG.md changes (auto-generated from commits)
- Move all parameter descriptions into function docstrings for LLM context
- Remove unused caldav dependency (using httpx for CalDAV implementation)
- Move datetime imports to top of modules
- Remove load_dotenv from tests/conftest.py
- Clarify Event vs Meeting distinction in docstrings
- Handle 401 auth errors gracefully in calendar tests
Addresses all feedback from PR #95 review
2025-07-28 11:44:53 -04:00