c272ddd82d
- Add ADR-013 documenting RAG evaluation architecture
- Implement two-part evaluation: Context Recall (retrieval) + Answer Correctness (generation)
- Create Click CLI for ground truth generation and corpus upload
- Add pytest fixtures and tests for retrieval/generation quality
- Use BeIR/nfcorpus dataset with 5 selected test queries
- Support Ollama and Anthropic LLM providers
- Generate synthetic ground truth answers offline
- Add comprehensive documentation in tests/rag_evaluation/README.md

The framework separates one-time setup (generate/upload) from test execution, making tests much faster (~6-12 min vs ~15-25 min per run). Tests are manual only (not in CI) and require external LLM access.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
19 lines
238 B
Plaintext
__pycache__/
.coverage
.env
*.env
.env.local
.env.*.local

# Git
worktrees/

docker-compose.override.yml

# Generated by pytest, used to log in users
.nextcloud_oauth_*.json
.playwright-mcp/

# RAG Evaluation
tests/rag_evaluation/fixtures/
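
To see roughly which file names the wildcard entries in this ignore file cover, here is a minimal Python sketch. It is an approximation only: it uses the standard-library `fnmatch` module rather than Git's own matcher, it checks bare file names (no directories), and the `is_ignored` helper and the sample file names are hypothetical, not part of the repository.

```python
from fnmatch import fnmatchcase

# File-name patterns copied from the ignore file; directory entries
# (those ending in "/") are left out because this sketch only checks
# bare file names.
PATTERNS = [
    ".coverage",
    ".env",
    "*.env",
    ".env.local",
    ".env.*.local",
    "docker-compose.override.yml",
    ".nextcloud_oauth_*.json",
]


def is_ignored(filename: str) -> bool:
    """Return True if a bare file name matches any pattern (approximation)."""
    return any(fnmatchcase(filename, pattern) for pattern in PATTERNS)


if __name__ == "__main__":
    # Sample names are made up for illustration.
    for name in [".env", "prod.env", ".env.dev.local", ".env.example"]:
        print(name, is_ignored(name))
```

For an authoritative answer inside a real checkout, `git check-ignore -v <path>` reports which rule, if any, matches a given path.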