PyRank

LLM Evaluation Framework Python Packages

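Python packages tagged with the GitHub topic llm-evaluation-framework, sorted by relevance. Each entry gives the GitHub owner, the package name, a short description, and a line of counts that includes monthly downloads and GitHub stars.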
confident-ai
deepeval

The LLM Evaluation Framework

3.3M 15K 1K
zli12321
qa-metrics

An easy-to-use Python package for running quick, basic QA evaluations. It includes standardized QA evaluation metrics and semantic evaluation metrics: black-box and open-source large language model prompting and evaluation, exact match, F1 score, PEDANT semantic match, and transformer match. The package also supports prompting the OpenAI and Anthropic APIs.

8K 61 6
parea-ai
parea-ai

Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)

3K 82 11
rhesis-ai
rhesis-sdk

The testing platform for AI teams. Bring engineers, PMs, and domain experts together to generate tests, simulate (adversarial) conversations, and trace every failure to its root cause.

2K 346 24
msoedov
agentic-security

Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪

2K 2K 252
cvs-health
langfair

LangFair is a Python library for conducting use-case level LLM bias and fairness assessments

2K 257 44
rhesis-ai
rhesis

The testing platform for AI teams. Bring engineers, PMs, and domain experts together to generate tests, simulate (adversarial) conversations, and trace every failure to its root cause.

2K 346 24
gmitt98
fieldtest

LLM evaluation framework — define what correct, well-formed, and safe means before you measure

869 0 0
nhsengland
evalsense

Tools for systematic large language model evaluations

442 4 2
multinear
multinear

Develop reliable AI apps

401 45 1
Addepto
ccheck

A human-friendly framework for testing and evaluating LLMs, RAGs, and chatbots.

345 95 11
mr-gpt
llmevals

The LLM Evaluation Framework

315 16K 1K
mr-gpt
deepevals

The LLM Evaluation Framework

232 16K 1K
vero-labs-ai
vero-eval

Open source framework for evaluating AI Agents

215 29 2
msoedov
mseep-agentic-security

Agentic LLM vulnerability scanner

185 2K 252
mr-gpt
testllm

The LLM Evaluation Framework

165 16K 1K
msoedov
langalf

Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪

110 2K 254
Data from PyPI, GitHub, ClickHouse, and BigQuery