Contributing

Thanks for contributing to Awesome Agentic Evaluation.

What To Add

Add resources that are directly relevant to agentic evaluation. Resources that cover only generic LLM evaluation do not qualify.

Strong additions usually fit at least one of these categories:

  • benchmarking
  • environment simulation
  • trajectory or process evaluation
  • observability and tracing
  • benchmark rigor and methodology
  • production testing and regression workflows

Curation Rules

  • Prefer primary sources: official repository, official paper, or official documentation.
  • Keep each entry to one link and one concise sentence whenever possible.
  • Mark archived, outdated, or historically important resources clearly.
  • Do not add generic "awesome AI" projects that are not evaluation-centric.
  • Place entries in the most specific section that fits.
  • Keep entries alphabetized within a section when practical.

Suggested Entry Format

- [**Project / Paper Name**](link) - One-line explanation of what it evaluates and why it matters.
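
If you want to sanity-check an entry before opening a pull request, the format above can be tested with a short script. This is a minimal sketch, not tooling that ships with this repository: the names `ENTRY_RE`, `check_entry`, and `is_alphabetized` are hypothetical, the regular expression is one assumed reading of the suggested format, and the sample entries use placeholder names and `example.com` links.

```python
import re

# Assumed reading of the suggested format:
#   - [**Name**](link) - One-line description.
ENTRY_RE = re.compile(
    r"^- \[\*\*(?P<name>.+?)\*\*\]\((?P<url>\S+?)\) - (?P<desc>\S.*)$"
)


def check_entry(line):
    """Return a list of problems found in one entry line (empty if none)."""
    line = line.rstrip()
    match = ENTRY_RE.match(line)
    if match is None:
        return ["does not match '- [**Name**](link) - description'"]
    problems = []
    if not match.group("url").startswith(("http://", "https://")):
        problems.append("link is not an http(s) URL")
    # "](" marks a markdown link; more than one means extra links in the entry.
    if line.count("](") > 1:
        problems.append("more than one link in the entry")
    return problems


def is_alphabetized(lines):
    """Check that well-formed entries are sorted by name, ignoring case."""
    names = []
    for line in lines:
        match = ENTRY_RE.match(line.rstrip())
        if match:
            names.append(match.group("name").casefold())
    return names == sorted(names)


if __name__ == "__main__":
    # Placeholder entries, for illustration only.
    section = [
        "- [**Alpha Bench**](https://example.com/alpha) - Evaluates tool use in a simulated environment.",
        "- [**Zeta Trace**](https://example.com/zeta) - Records agent trajectories for regression testing.",
    ]
    for entry in section:
        print(check_entry(entry) or "ok")
    print("alphabetized:", is_alphabetized(section))
```

The check is deliberately strict about the leading `- ` and the ` - ` separator, so an entry that deviates from the suggested format is flagged rather than silently accepted. It cannot judge whether a link is a primary source or whether the description is factual; those remain a matter for review.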

Pull Request Checklist

  • I added a primary source link.
  • I placed the item in the most specific section available.
  • I kept the description concise and factual.
  • I checked for obvious duplicates.
  • I noted archival or historical status when relevant.

Scope Notes

This repository focuses on evaluations where agents interact with tools, environments, users, or production systems. Static model-only benchmarks are usually out of scope unless they are directly reused for agent evaluation.