Files
Chris Coutinho c272ddd82d feat: implement RAG evaluation framework with CLI tooling
- Add ADR-013 documenting RAG evaluation architecture
- Implement two-part evaluation: Context Recall (retrieval) + Answer Correctness (generation)
- Create Click CLI for ground truth generation and corpus upload
- Add pytest fixtures and tests for retrieval/generation quality
- Use BeIR/nfcorpus dataset with 5 selected test queries
- Support Ollama and Anthropic LLM providers
- Generate synthetic ground truth answers offline
- Add comprehensive documentation in tests/rag_evaluation/README.md

The framework separates one-time setup (generate/upload) from test execution,
making tests much faster (~6-12 min vs ~15-25 min per run).

Tests are manual only (not in CI) and require external LLM access.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 23:11:21 +01:00

2 lines
73 B
Python

"""RAG evaluation tests for the Nextcloud MCP semantic search system."""