nextcloud-mcp-server

Files

Chris Coutinho 31fade9730 perf: Optimize PDF processing with parallel extraction and single-render highlights

Phase 1 - PDF Highlighting Optimization:
- Render each page ONCE instead of once per chunk (a page with N chunks needs 1 render, not N)
- Use PIL to draw bounding boxes on copied base images (fast) instead of
  re-rendering page via pymupdf (slow)
- Add _find_chunk_bbox() to extract bbox without modifying page

Phase 2 - Parallel Page Extraction:
- Extract pages in parallel using an anyio task group
- Run each page's extraction in a separate worker thread via anyio.to_thread.run_sync()
- Event loop stays responsive during extraction
- Remove obsolete _process_sync() method
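
A minimal sketch of the Phase 2 idea, under stated assumptions: it uses the standard library's `asyncio.gather()` and `asyncio.to_thread()` as stand-ins for the anyio task group and `anyio.to_thread.run_sync()` named above, so that it runs without extra dependencies. `extract_page` and `extract_all_pages` are hypothetical names, and the `time.sleep()` call merely simulates blocking extraction work.

```python
import asyncio
import time


def extract_page(page_number: int) -> str:
    """Hypothetical blocking extraction of a single page."""
    time.sleep(0.05)  # simulate slow, synchronous work
    return f"text of page {page_number}"


async def extract_all_pages(page_count: int) -> list[str]:
    # Each page is handed to a worker thread, so the event loop
    # stays free to serve other tasks while extraction runs.
    tasks = [asyncio.to_thread(extract_page, i) for i in range(page_count)]
    # gather() returns results in the order the tasks were passed in,
    # so page order is preserved even if threads finish out of order.
    return list(await asyncio.gather(*tasks))
```

Calling `asyncio.run(extract_all_pages(4))` returns the four page texts in page order. With anyio, the same structure would presumably use `anyio.create_task_group()` with `start_soon()` and collect results into a pre-sized list.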

Expected improvement: 30-50% reduction in total PDF processing time.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-22 03:11:56 +01:00

__init__.py

feat: Implement BM25 hybrid search with native Qdrant RRF fusion

2025-11-16 06:59:44 +01:00

algorithms.py

feat: Implement Qdrant placeholder state management

2025-11-20 15:04:00 +01:00

bm25_hybrid.py

feat: Implement Qdrant placeholder state management

2025-11-20 15:04:00 +01:00

context.py

feat: Add context expansion to semantic search with chunk overlap removal