nextcloud-mcp-server

brandon/nextcloud-mcp-server

Fork 0

Commit Graph

Author	SHA1	Message	Date
Chris Coutinho	fffe483c02	fix: Centralize PDF processing and generate separate images per chunk Previously, pymupdf4llm.to_markdown() was called twice - once in PyMuPDFProcessor during indexing and again in PDFHighlighter during visualization. Different image path lengths caused different character offsets, leading to highlighted pages not matching their chunks. Also fixed issue where all chunks on the same page showed all highlights instead of just their own highlight. Now restores original page contents between chunks using xref stream caching. Changes: - Add PDFHighlighter class requiring pre-computed page_boundaries and full_text from document processor (no fallback extraction) - Pass pre-computed data from processor to highlighter - Extract page-relative portion of chunk text for cross-page chunks - Add bounding box highlighting using text anchor search - Run highlight generation in parallel with embedding/BM25 - Cache and restore page contents to isolate highlights per chunk Results: Highlighting success rate improved from 51% to 95% (121/128). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-22 02:46:30 +01:00
Chris Coutinho	53689d076b	feat: Improve vector visualization with static assets and fixes - Extract CSS and JavaScript into separate static files - Created nextcloud_mcp_server/auth/static/vector-viz.css - Created nextcloud_mcp_server/auth/static/vector-viz.js - Updated templates to reference external assets - Fix vector visualization issues: - Normalize vectors before PCA to match Qdrant's cosine distance - Add zero-norm and NaN detection/handling for large datasets - Enable responsive Plotly sizing (autosize + responsive config) - Widen plot area to full viewport width with minimized margins - Improve visualization accuracy: - Query point now positioned correctly relative to documents - Handles 200+ points without JSON serialization errors - Full-width plot maximizes screen space utilization 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 04:10:44 +01:00

Author

SHA1

Message

Date

Chris Coutinho

fffe483c02

fix: Centralize PDF processing and generate separate images per chunk

Previously, pymupdf4llm.to_markdown() was called twice - once in
PyMuPDFProcessor during indexing and again in PDFHighlighter during
visualization. Different image path lengths caused different character
offsets, leading to highlighted pages not matching their chunks.

Also fixed issue where all chunks on the same page showed all highlights
instead of just their own highlight. Now restores original page contents
between chunks using xref stream caching.

Changes:
- Add PDFHighlighter class requiring pre-computed page_boundaries and
  full_text from document processor (no fallback extraction)
- Pass pre-computed data from processor to highlighter
- Extract page-relative portion of chunk text for cross-page chunks
- Add bounding box highlighting using text anchor search
- Run highlight generation in parallel with embedding/BM25
- Cache and restore page contents to isolate highlights per chunk

Results: Highlighting success rate improved from 51% to 95% (121/128).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-22 02:46:30 +01:00

Chris Coutinho

53689d076b

feat: Improve vector visualization with static assets and fixes

- Extract CSS and JavaScript into separate static files
  - Created nextcloud_mcp_server/auth/static/vector-viz.css
  - Created nextcloud_mcp_server/auth/static/vector-viz.js
  - Updated templates to reference external assets

- Fix vector visualization issues:
  - Normalize vectors before PCA to match Qdrant's cosine distance
  - Add zero-norm and NaN detection/handling for large datasets
  - Enable responsive Plotly sizing (autosize + responsive config)
  - Widen plot area to full viewport width with minimized margins

- Improve visualization accuracy:
  - Query point now positioned correctly relative to documents
  - Handles 200+ points without JSON serialization errors
  - Full-width plot maximizes screen space utilization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-19 04:10:44 +01:00

2 Commits