veritail supports pluggable storage backends for evaluation results. You choose a backend with the `--backend` CLI flag (default: `file`).
The file backend requires no external services. All evaluation artifacts are written to a local directory tree under `eval-results/`:

```
eval-results/
  <experiment-name>/
    config.json
    judgments.jsonl
    metrics.json
    report.html
```
| File | Contents |
|---|---|
| `config.json` | Experiment configuration (model, adapter, checks, etc.) |
| `judgments.jsonl` | One JSON object per LLM judgment |
| `metrics.json` | Computed IR metrics (NDCG, MRR, MAP, etc.) |
| `report.html` | Interactive HTML report |
No extra installation or configuration is needed, because the file backend is included with the base package.
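All artifacts except the HTML report are plain JSON, so you can inspect them with the Python standard library alone. The sketch below relies only on the directory layout listed above. It makes no assumptions about the keys inside each judgment. `load_experiment` is a hypothetical helper and is not part of veritail.

```python
import json
from pathlib import Path


def load_experiment(root: str, experiment: str) -> dict:
    """Load the JSON artifacts of one experiment written by the file backend.

    Hypothetical helper: it relies only on the layout
    <root>/<experiment-name>/{config.json, judgments.jsonl, metrics.json}.
    """
    exp_dir = Path(root) / experiment
    config = json.loads((exp_dir / "config.json").read_text(encoding="utf-8"))
    metrics = json.loads((exp_dir / "metrics.json").read_text(encoding="utf-8"))

    # judgments.jsonl holds one JSON object per line; blank lines are skipped.
    judgments = []
    with (exp_dir / "judgments.jsonl").open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                judgments.append(json.loads(line))

    return {"config": config, "metrics": metrics, "judgments": judgments}
```

For example, `load_experiment("eval-results", "my-experiment")["metrics"]` returns the parsed contents of `metrics.json`. The experiment name here is a placeholder.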
Tip: Add `eval-results/` (or your custom `--output-dir`) to `.gitignore` to avoid accidentally committing catalog data to version control.
Langfuse provides a richer experience with trace-level visibility, built-in annotation queues, and experiment versioning.
```bash
pip install veritail[langfuse]
```

Set the required environment variables:
```bash
export LANGFUSE_PUBLIC_KEY=pk-...
export LANGFUSE_SECRET_KEY=sk-...
# Optional: point at a self-hosted instance
# export LANGFUSE_HOST=https://your-langfuse.example.com
```

Pass `--backend langfuse` when running an evaluation:
```bash
veritail run \
  --queries queries.csv \
  --adapter my_adapter.py \
  --backend langfuse \
  --llm-model gpt-4o
```

Each LLM judgment is stored as a Langfuse trace with full prompt/response details and a relevance score. This makes it straightforward to review, annotate, and compare experiments in the Langfuse UI.
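If you drive evaluations from a Python script, a small pre-flight check can catch missing credentials before a run starts. The sketch below is hypothetical wrapper code and is not part of veritail. It checks only the two required environment variables listed above. It also builds the same `veritail run` argument list as the command shown, with placeholder file names.

```python
import os
from typing import List, Mapping, Optional

# The two variables the Langfuse backend requires; LANGFUSE_HOST is optional.
REQUIRED_LANGFUSE_VARS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


def missing_langfuse_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required Langfuse variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_LANGFUSE_VARS if not env.get(name)]


def build_run_command(queries: str, adapter: str, llm_model: str) -> List[str]:
    """Build argv for the `veritail run` invocation above, using the Langfuse backend."""
    return [
        "veritail", "run",
        "--queries", queries,
        "--adapter", adapter,
        "--backend", "langfuse",
        "--llm-model", llm_model,
    ]
```

For example, call `missing_langfuse_vars()` first. If it returns an empty list, pass `build_run_command("queries.csv", "my_adapter.py", "gpt-4o")` to `subprocess.run(..., check=True)`. `LANGFUSE_HOST` is optional, so the check deliberately ignores it.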
The Langfuse backend is write-only: it sends judgments, scores, and traces to Langfuse but cannot read them back. This means:

- `--resume` is not supported with `--backend langfuse`. Use the file backend for resumable evaluation runs.
- Metrics, reports, and `judgments.jsonl` are not persisted locally. The file backend handles these automatically.
If you need both local persistence and Langfuse observability, run with the file backend and export results to Langfuse separately via the Langfuse REST API.
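One possible export is sketched below. It is not part of veritail, and it rests on several assumptions that you should verify against the Langfuse API reference:

- Batch ingestion uses the endpoint `POST /api/public/ingestion`.
- Requests are authenticated with HTTP Basic auth, with the public key as username and the secret key as password.
- The endpoint accepts `trace-create` and `score-create` events.

The judgment field names (`query`, `product_id`, `score`) and the score name `relevance` are hypothetical. Map them to whatever your `judgments.jsonl` actually contains.

```python
import base64
import json
import os
import urllib.request
import uuid
from datetime import datetime, timezone
from pathlib import Path


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def build_ingestion_batch(judgments: list) -> dict:
    """Turn judgment dicts into an (assumed) Langfuse ingestion batch: one trace + one score each.

    ASSUMPTION: each judgment has hypothetical keys "query", "product_id", and "score".
    """
    events = []
    for judgment in judgments:
        trace_id = str(uuid.uuid4())
        events.append({
            "id": str(uuid.uuid4()),
            "timestamp": _now(),
            "type": "trace-create",
            "body": {
                "id": trace_id,
                "name": "veritail-judgment",
                "input": {"query": judgment.get("query"), "product_id": judgment.get("product_id")},
                "output": judgment,
            },
        })
        events.append({
            "id": str(uuid.uuid4()),
            "timestamp": _now(),
            "type": "score-create",
            "body": {
                "id": str(uuid.uuid4()),
                "traceId": trace_id,  # attach the score to the trace created above
                "name": "relevance",  # hypothetical score name
                "value": judgment["score"],  # fail loudly if the assumed key is absent
            },
        })
    return {"batch": events}


def basic_auth_header(public_key: str, secret_key: str) -> str:
    """HTTP Basic auth header built from the Langfuse key pair."""
    token = base64.b64encode(f"{public_key}:{secret_key}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def export_judgments(jsonl_path: str) -> None:
    """Read judgments.jsonl and POST it to the assumed Langfuse ingestion endpoint."""
    with Path(jsonl_path).open(encoding="utf-8") as fh:
        judgments = [json.loads(line) for line in fh if line.strip()]
    host = os.environ.get("LANGFUSE_HOST", "https://cloud.langfuse.com").rstrip("/")
    request = urllib.request.Request(
        f"{host}/api/public/ingestion",
        data=json.dumps(build_ingestion_batch(judgments)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(
                os.environ["LANGFUSE_PUBLIC_KEY"], os.environ["LANGFUSE_SECRET_KEY"]
            ),
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        print(response.status, response.read().decode("utf-8"))
```

For example, `export_judgments("eval-results/my-experiment/judgments.jsonl")` sends one run, where the experiment name is a placeholder. Large files may need to be split into several smaller batches.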
- Supported LLM Providers: provider setup and model guidance
- Development: contributing and running tests