brandon/nextcloud-mcp-server

Commit Graph

Author

SHA1

Message

Date

Chris Coutinho

2896fa1dc9

feat: Add tag management methods to WebDAV client

- Add get_file_info() to get file info including file ID via PROPFIND
- Add create_tag() to create system tags via OCS API
- Add get_or_create_tag() for idempotent tag creation
- Add assign_tag_to_file() to assign tags to files via WebDAV
- Add remove_tag_from_file() to remove tags from files
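
The idempotent behaviour described for get_or_create_tag() can be illustrated with a minimal sketch. This is not the code from the commit: `InMemoryTagStore`, `find_tag()`, and the integer tag IDs are hypothetical stand-ins for the server-side tag list, used only so the example runs without a Nextcloud instance.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class InMemoryTagStore:
    """Hypothetical stand-in for the server's system tag list."""

    tags: Dict[str, int] = field(default_factory=dict)
    next_id: int = 1

    def find_tag(self, name: str) -> Optional[int]:
        # Look up an existing tag ID by name, or None if it does not exist.
        return self.tags.get(name)

    def create_tag(self, name: str) -> int:
        # Creating a duplicate is an error, which is why callers want
        # an idempotent wrapper around this method.
        if name in self.tags:
            raise ValueError(f"tag {name!r} already exists")
        tag_id = self.next_id
        self.tags[name] = tag_id
        self.next_id += 1
        return tag_id


def get_or_create_tag(store: InMemoryTagStore, name: str) -> int:
    """Return the ID of `name`, creating the tag only if it is missing."""
    existing = store.find_tag(name)
    if existing is not None:
        return existing
    return store.create_tag(name)
```

Calling `get_or_create_tag()` twice with the same name returns the same ID and creates the tag only once, which is what lets a test fixture run repeatedly against the same server.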

Also refactors RAG evaluation:
- Add indexed_manual_pdf fixture using existing nc_client/nc_mcp_client
- Remove manual tag creation steps from workflow (now handled by fixture)
- Add comprehensive unit tests for new WebDAV methods

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-23 01:51:42 +01:00

Chris Coutinho

d67aa6ae5c

fix: Align PDF text extraction between indexing and context expansion

This commit fixes two PDF processing bugs (items 1 and 3) and adds diagnostic logging (item 2):

1. **Text extraction mismatch (context expansion bug)**:
   - Indexing used pymupdf4llm.to_markdown() producing markdown text
   - Context expansion used page.get_text() producing plain text
   - Different text formats caused character offset misalignment
   - Search would find correct chunk, but expansion showed wrong section
   - Fixed by making context.py use pymupdf4llm.to_markdown() consistently

2. **Diagnostic logging for page number assignment**:
   - Added logging to verify page_boundaries exist in metadata
   - Added logging to verify assign_page_numbers() assigns values
   - Helps diagnose why page numbers show as null in search results

3. **mime_type storage bug**:
   - Fixed incorrect field reference in processor.py:405
   - Was using file_metadata.get("content_type", "")
   - Should use content_type from WebDAV response
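
The offset misalignment in item 1 can be reproduced without any PDF library. The sketch below is an illustration only: the two strings are made-up stand-ins for what a markdown extractor and a plain-text extractor might return for the same page, and `chunk_offsets()` is a hypothetical helper, not a function from the repository.

```python
from typing import Tuple

# Made-up extraction results for the same page: one with markdown
# markup (as at indexing time), one as plain text (as at expansion time).
markdown_text = (
    "# Setup\n\n**Install** the server.\n\n## Usage\n\nRun the sync command."
)
plain_text = "Setup\nInstall the server.\nUsage\nRun the sync command."


def chunk_offsets(text: str, chunk: str) -> Tuple[int, int]:
    """Return (start, end) character offsets of `chunk` inside `text`."""
    start = text.index(chunk)
    return start, start + len(chunk)


chunk = "Run the sync command."

# Offsets are computed against the markdown text during indexing.
start, end = chunk_offsets(markdown_text, chunk)

# Slicing the same representation recovers the chunk exactly.
same_format = markdown_text[start:end]

# Slicing a different representation with those offsets does not,
# because the markup characters shifted every later position.
other_format = plain_text[start:end]

print(repr(same_format))
print(repr(other_format))
```

Because the stored offsets only make sense for the text they were computed on, both code paths have to use the same extraction method, which is the fix the commit describes.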

Changes:
- nextcloud_mcp_server/search/context.py: Use pymupdf4llm.to_markdown()
  for PDF text extraction to match indexing method
- nextcloud_mcp_server/vector/processor.py: Add diagnostic logging for
  page boundaries and assignment, fix mime_type storage
- tests/unit/client/test_webdav.py: Fix import sorting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-20 13:57:50 +01:00

2 Commits