5 Commits

Author SHA1 Message Date
Chris Coutinho 5a251a99e6 fix: Set is_placeholder=False in processor to fix search filtering
The processor was not setting is_placeholder field when writing real
document chunks to Qdrant. This caused the placeholder filter to exclude
all documents (since None != False), resulting in 0 search results.

Now explicitly sets is_placeholder: False in payload when writing real
indexed chunks, allowing search filters to correctly distinguish between
placeholders and real documents.
2025-11-20 17:15:19 +01:00
Chris Coutinho 98d1c2de8e perf: make note deletion concurrent in upload --force
- Collect all notes to delete first, then delete concurrently
- Use anyio task group with semaphore (20 concurrent deletions)
- Add progress reporting and error tracking for deletions
- Show count of notes found before deletion starts

This significantly improves --force performance when refreshing large
corpuses (e.g., 3,633 notes now delete in ~1 minute instead of ~5 minutes).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 00:55:27 +01:00
Chris Coutinho 30a4d84458 feat: add concurrent uploads and --force flag to upload command
- Add --force flag to delete all existing notes in target category before upload
- Implement concurrent uploads using anyio task groups (20 concurrent max)
- Add semaphore to limit concurrent requests and avoid overwhelming server
- Improve progress reporting with upload count and error tracking
- Update README with --force flag documentation

Performance improvement: Concurrent uploads significantly reduce upload time
from ~10-15 minutes to ~2-3 minutes for 3,633 documents.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 00:41:00 +01:00
Chris Coutinho defa8db18e fix: download qrels from BEIR ZIP instead of HuggingFace
- HuggingFace BeIR/nfcorpus only has 'corpus' and 'queries' configs
- Download qrels from original BEIR ZIP file (nfcorpus.zip)
- Use synchronous httpx.Client for download (simpler than async)
- Remove deprecated trust_remote_code parameter

Tested with successful corpus download and qrels extraction.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 00:02:15 +01:00
Chris Coutinho c9506da2d2 refactor: replace httpx client with NextcloudClient in upload command
- Use NextcloudClient with BasicAuth instead of raw httpx
- Replace direct HTTP POST with notes.create_note() method
- Add close() method to LLMProvider Protocol for proper cleanup
- Fix type annotations for dataset iteration

This improves code reuse and consistency with the rest of the codebase.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 23:26:07 +01:00