PyRank

Agent Evaluation Python Packages

Python packages with the GitHub topic agent-evaluation. Sorted by relevance, with stars and monthly downloads.
truera
trulens-core

Evaluation and Tracking for LLM Experiments and AI Agents

147K 3K 280
truera
trulens

Evaluation and Tracking for LLM Experiments and AI Agents

123K 3K 280
truera
trulens-providers-litellm

Evaluation and Tracking for LLM Experiments and AI Agents

73K 3K 280
truera
trulens-dashboard

Evaluation and Tracking for LLM Experiments and AI Agents

62K 3K 280
truera
trulens-feedback

Evaluation and Tracking for LLM Experiments and AI Agents

61K 3K 280
truera
trulens-otel-semconv

Evaluation and Tracking for LLM Experiments and AI Agents

60K 3K 280
truera
trulens-eval

Evaluation and Tracking for LLM Experiments and AI Agents

53K 3K 280
Giskard-AI
giskard

🐢 Open-Source Evaluation & Testing library for LLM Agents

36K 5K 458
truera
trulens-connectors-snowflake

Evaluation and Tracking for LLM Experiments and AI Agents

35K 3K 280
yohanpoul
etzchaim

A diagnosable brain for your LLM. Cognitive architecture in the SOAR/ACT-R/CLARION/LIDA lineage, for the LLM era. Apache 2.0.

28K 1 0
truera
trulens-providers-cortex

Evaluation and Tracking for LLM Experiments and AI Agents

21K 3K 280
truera
trulens-providers-openai

Evaluation and Tracking for LLM Experiments and AI Agents

16K 3K 280
truera
trulens-apps-langchain

Evaluation and Tracking for LLM Experiments and AI Agents

14K 3K 280
truera
trulens-apps-llamaindex

Evaluation and Tracking for LLM Experiments and AI Agents

10K 3K 280
mozilla-ai
any-agent

A single interface to use and evaluate different agent frameworks

7K 1K 93
truera
trulens-providers-langchain

Evaluation and Tracking for LLM Experiments and AI Agents

6K 3K 280
truera
trulens-providers-bedrock

Evaluation and Tracking for LLM Experiments and AI Agents

5K 3K 280
truera
trulens-apps-langgraph

Evaluation and Tracking for LLM Experiments and AI Agents

4K 3K 280
truera
trulens-providers-huggingface

Evaluation and Tracking for LLM Experiments and AI Agents

4K 3K 280
hidai25
evalview

Regression testing for AI agents. Snapshot behavior, diff tool calls, catch regressions in CI. Works with LangGraph, CrewAI, OpenAI, Anthropic.

3K 105 20
truera
trulens-benchmark

Evaluation and Tracking for LLM Experiments and AI Agents

3K 3K 280
truera
trulens-apps-nemo

Evaluation and Tracking for LLM Experiments and AI Agents

3K 3K 280
truera
trulens-hotspots

Evaluation and Tracking for LLM Experiments and AI Agents

3K 3K 280
reacher-z
clawbench-eval

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

2K 314 19
    • Data from PyPI, GitHub, ClickHouse, and BigQuery