tcot
The cost-and-failure-mode benchmark for LLM agents: a methodology and Python package for honest, reproducible cross-provider agent evaluation.