feat(search): add file_path metadata and chunk offsets to search results

Changes:
- Add file_path to metadata in semantic and BM25 hybrid search algorithms
  for PDF viewer integration (search/semantic.py:161-163, search/bm25_hybrid.py:230-232)
- Include chunk_start_offset, chunk_end_offset, page_number, and page_count
  in search results for rich chunk display (api/management.py:981-1004)
- Add point_id field to SearchResult for batch retrieval (models/semantic.py)
- Fix type narrowing for chunk context API parameters (api/management.py:1102-1111)
- Fix None-safety in doc_types discovery (search/algorithms.py:114)

This enables the Astroglobe UI to display PDF pages at the correct
location for matched chunks.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Chris Coutinho
2025-12-15 21:31:10 +01:00
parent a026f2eddb
commit 85db90a2df
5 changed files with 504 additions and 35 deletions
+3
@@ -38,6 +38,9 @@ class SemanticSearchResult(BaseModel):
     page_number: Optional[int] = Field(
         default=None, description="Page number for PDF documents"
     )
+    page_count: Optional[int] = Field(
+        default=None, description="Total number of pages in PDF document"
+    )
     # Context expansion fields (optional, populated when include_context=True)
     has_context_expansion: bool = Field(
         default=False, description="Whether context expansion was performed"
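
For illustration only, here is a minimal sketch of how a client might consume the location fields named in the commit message (`file_path`, `chunk_start_offset`, `chunk_end_offset`, `page_number`, `page_count`). It is not code from this commit. `ChunkLocation`, `page_label`, and `highlight` are hypothetical names, a stdlib dataclass stands in for the Pydantic model, and the offsets are assumed to be character indices with an exclusive end.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChunkLocation:
    """Hypothetical stand-in for the location fields on a search result."""

    file_path: str
    chunk_start_offset: int  # assumed: character index, inclusive
    chunk_end_offset: int    # assumed: character index, exclusive
    page_number: Optional[int] = None  # None for non-PDF documents
    page_count: Optional[int] = None   # None when the total is unknown


def page_label(loc: ChunkLocation) -> str:
    """Build a display label that tolerates missing page metadata."""
    if loc.page_number is None:
        return loc.file_path
    if loc.page_count is None:
        return f"{loc.file_path} (page {loc.page_number})"
    return f"{loc.file_path} (page {loc.page_number} of {loc.page_count})"


def highlight(text: str, loc: ChunkLocation) -> str:
    """Return the matched chunk by slicing the document text with the offsets."""
    return text[loc.chunk_start_offset:loc.chunk_end_offset]


if __name__ == "__main__":
    loc = ChunkLocation("docs/report.pdf", 6, 11, page_number=3, page_count=12)
    print(page_label(loc))
    print(highlight("Hello world!", loc))
```

Because both page fields default to `None`, a result for a non-PDF document still produces a usable label from `file_path` alone.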