eval-harness
A boring, config-driven harness for evaluating AI systems. One YAML drives the run, the trace is the source of truth. Offline, backtesting, and online-eval modes — works with any agent, RAG, or code-modifying system.