PyRank

Evaluation Python Packages

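Python packages tagged with the GitHub topic "evaluation", sorted by relevance. Each entry gives the GitHub owner, the package name, a description, and a line of counts that starts with monthly downloads followed by GitHub stars.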
langchain-ai
langsmith

LangSmith Client SDK Implementations

85.6M 889 233
mlflow
mlflow-skinny

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.

38.2M 26K 6K
mlflow
mlflow

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.

37.1M 26K 6K
mlflow
mlflow-tracing

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.

17M 26K 6K
danthedeckie
simpleeval

Simple Safe Sandboxed Extensible Expression Evaluator for Python

13.9M 599 93
huggingface
evaluate

🤗 Evaluate: A library for easily evaluating machine learning models and datasets.

6.6M 2K 320
comet-ml
opik

Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

6.2M 19K 1K
vibrantlabsai
ragas

Supercharge Your LLM Application Evaluations 🚀

1.4M 14K 1K
MiXaiLL76
faster-coco-eval

Continuation of an abandoned project fast-coco-eval

561K 141 11
MichaelGrupp
evo

Python package for the evaluation of odometry and SLAM

213K 4K 792
AmenRa
ranx

⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍

121K 677 31
cvangysel
pytrec-eval

pytrec_eval is an Information Retrieval evaluation tool for Python, based on the popular trec_eval.

84K 346 36
ibm
unitxt

🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data for end-to-end AI benchmarking

77K 212 67
yaroslaff
evalidate

Safe and fast evaluation of untrusted user-supplied python expressions

58K 40 4
agenta-ai
agenta

The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.

51K 4K 520
comet-ml
opik-optimizer

Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

42K 19K 1K
huggingface
lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

42K 2K 463
modelscope
evalscope

A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.

38K 3K 331
run-house
kubetorch

Distribute and run AI workloads on Kubernetes magically in Python, like PyTorch for ML infra.

34K 1K 58
run-house
runhouse

Distribute and run AI workloads on Kubernetes magically in Python, like PyTorch for ML infra.

29K 1K 58
zouharvi
subset2evaluate

Find informative examples to efficiently (human)-evaluate NLG models.

24K 18 3
dustalov
evalica

Evalica, your favourite evaluation toolkit

21K 62 5
foreai-co
fore

The fore client package

19K 13 1
sepandhaghighi
pycm

Multi-class confusion matrix library in Python

17K 2K 125
Data from PyPI, GitHub, ClickHouse, and BigQuery.