performance-evaluation
The official VOT Challenge evaluation and analysis toolkit
model-confidence-set provides a Python implementation of the Model Confidence Set (MCS) procedure (Hansen, Lunde, and Nason, 2011), a statistical method for comparing and selecting models based on their performance.
StatLine — Advanced weighted player scoring and analytics, with modular tools for awards, performance evaluation, and real-time integrations.
model-compare evaluates AI models side-by-side on user tasks, rating each on accuracy, creativity, and efficiency to guide model choice.