Performance Evaluation Python Packages

vot-toolkit

The official VOT Challenge evaluation and analysis toolkit

2K 204 53

model-confidence-set

Model Confidence Set (MCS) implementation in Python

1K 20 2

statline

StatLine: weighted player scoring, efficiency modeling, and CLI and adapter tooling

910 2 0

model-compare

model-compare evaluates AI models side by side on user tasks, rating each for accuracy, creativity, and efficiency to guide model choice.

490 1 0