Qdrant validation rejects None for sparse vectors in named vector dicts.
Use models.SparseVector(indices=[], values=[]) instead to create valid
empty sparse vectors for placeholder points.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Introduces a placeholder-based state tracking system to prevent duplicate
document processing during the gap between scanner queuing and processor
completion.
**Key Changes:**
1. **Placeholder Helper Functions** (`vector/placeholder.py`):
- `write_placeholder_point()` - Creates zero-vector placeholder when queuing
- `query_document_metadata()` - Queries for existing entry (placeholder or real)
- `delete_placeholder_point()` - Removes placeholder before writing real vectors
- `get_placeholder_filter()` - Filters placeholders from user-facing queries
2. **Scanner Updates** (`vector/scanner.py`):
- Replace `indexed_at` comparison with `modified_at` comparison
- Write placeholder before queuing each document
- Query per-document metadata instead of bulk-querying indexed_at
- Fixes bug where files were resubmitted every scan cycle
3. **Processor Updates** (`vector/processor.py`):
- Delete placeholder before upserting real vectors
- Ensures no duplicate points in Qdrant
4. **Query Filters** (all search files):
- Add `get_placeholder_filter()` to all user-facing queries
- Ensures placeholders never appear in search results or visualizations
- Applied to: bm25_hybrid.py, semantic.py, viz_routes.py, algorithms.py
**Architecture:**
- Placeholders use zero vectors with dimension from embedding service
- Payload includes `is_placeholder: True` flag for filtering
- Status field tracks: "pending", "processing", "completed", "failed"
- Deterministic UUIDs using uuid5 for consistent point IDs
**Impact:**
- Eliminates duplicate processing of same documents
- Fixes race condition where long-running documents get queued multiple times
- Prevents scanner from resubmitting files every scan cycle
- Maintains clean separation between in-flight and indexed documents
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>