PyRank

Agent Benchmark Python Packages

Python packages with the GitHub topic agent-benchmark. Sorted by relevance, with stars and monthly downloads.
hidai25
evalview

Regression testing for AI agents. Snapshot behavior, diff tool calls, catch regressions in CI. Works with LangGraph, CrewAI, OpenAI, Anthropic.

3K 105 20
NoesisVision
nasde-toolkit

CLI for benchmarks & evals of AI coding agents — on tasks you already understand, using your Claude / Codex / Gemini individual subscriptions or API keys.

1K 9 0
he-yufeng
codejoust

Pit AI coding agents against the same bug. Score them on tests, diff, cost, and time — pick the winning patch.

934 4 0
justindobbs
tracecore

Deterministic runtime for agent evaluation

514 8 0
Data from PyPI, GitHub, ClickHouse, and BigQuery