LLM Evaluation Framework Python Packages

deepeval

The LLM Evaluation Framework

7.6M 17K 2K

qa-metrics

An easy-to-use Python package for running quick, basic QA evaluations. It includes standardized QA evaluation metrics and semantic evaluation metrics: prompting and evaluation with black-box and open-source large language models, exact match, F1 score, PEDANT semantic match, and transformer match. The package also supports prompting through the OpenAI and Anthropic APIs.

6K 61 6
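
For readers unfamiliar with the exact match and F1 metrics named in the qa-metrics description, the sketch below shows how they are commonly computed for short-answer QA. It is a standalone illustration and does not use or reproduce the qa-metrics API. The function names (`normalize`, `exact_match`, `token_f1`) and the normalization rules (lowercasing, dropping punctuation and English articles) are assumptions chosen for this example.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace.

    These normalization rules are an assumption for this sketch.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> bool:
    """True when both answers are identical after normalization."""
    return normalize(prediction) == normalize(reference)


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Two empty answers count as a match; one empty answer scores zero.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower.", "eiffel tower"))
    print(token_f1("Paris, France", "Paris"))
```

Exact match is strict: it rewards only answers that agree in full after normalization. Token-level F1 gives partial credit when a prediction contains the reference answer alongside extra words.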

parea-ai

Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)

3K 82 11

rhesis-sdk

The testing platform for AI teams. Bring engineers, PMs, and domain experts together to generate tests, simulate (adversarial) conversations, and trace every failure to its root cause.

2K 375 26

langfair

LangFair is a Python library for conducting use-case-level LLM bias and fairness assessments.

2K 259 46

rhesis

The testing platform for AI teams. Bring engineers, PMs, and domain experts together to generate tests, simulate (adversarial) conversations, and trace every failure to its root cause.

2K 375 26