LLM Evaluation Metrics Python Packages

deepeval

The LLM Evaluation Framework

7.6M 17K 2K

langfair

LangFair is a Python library for conducting use-case-level LLM bias and fairness assessments

2K 259 46

evalsense

Tools for systematic large language model evaluations

484 4 2

deepevals

The LLM Evaluation Framework

413 17K 2K

llmevals

The LLM Evaluation Framework

330 17K 2K

testllm

The LLM Evaluation Framework

190 17K 2K

voigt-kampff

Voigt-Kampff — a behavioral safety scoring engine for AI systems. Part of the SAPIEN Framework.

168 2 0