nextcloud-mcp-server/tests/rag_evaluation/__init__.py at master

brandon/nextcloud-mcp-server

Chris Coutinho c272ddd82d feat: implement RAG evaluation framework with CLI tooling

- Add ADR-013 documenting RAG evaluation architecture
- Implement two-part evaluation: Context Recall (retrieval) + Answer Correctness (generation)
- Create Click CLI for ground truth generation and corpus upload
- Add pytest fixtures and tests for retrieval/generation quality
- Use BeIR/nfcorpus dataset with 5 selected test queries
- Support Ollama and Anthropic LLM providers
- Generate synthetic ground truth answers offline
- Add comprehensive documentation in tests/rag_evaluation/README.md

The framework separates one-time setup (generate/upload) from test execution,
which shortens each test run to ~6-12 min from ~15-25 min.

Tests are manual only (not in CI) and require external LLM access.
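
The split between one-time setup and test execution can be sketched as follows. This is a minimal illustration and not the framework's actual code: the function names, the JSON layout, and the `fake_llm` stand-in are all hypothetical. The idea is that the slow LLM call happens once and is cached on disk, so later test runs only read the file.

```python
import json
import tempfile
from pathlib import Path


def generate_ground_truth(queries, answer_fn, cache_path):
    """One-time setup: compute an answer per query and persist the result as JSON."""
    records = [{"query": q, "answer": answer_fn(q)} for q in queries]
    Path(cache_path).write_text(json.dumps(records, indent=2), encoding="utf-8")
    return records


def load_ground_truth(cache_path):
    """Test-time step: read the cached records without calling the LLM again."""
    return json.loads(Path(cache_path).read_text(encoding="utf-8"))


# Hypothetical stand-in for a slow LLM call; it records how often it is invoked.
calls = []


def fake_llm(query):
    calls.append(query)
    return f"answer to: {query}"


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "ground_truth.json"
    generate_ground_truth(["q1", "q2"], fake_llm, cache)  # setup, run once
    loaded = load_ground_truth(cache)                     # reused by every test run
```

In a pytest setup, `load_ground_truth` would typically sit behind a session-scoped fixture so the file is read once per run.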

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-15 23:11:21 +01:00

2 lines

73 B

Python


"""RAG evaluation tests for the Nextcloud MCP semantic search system."""