LLM Judge Python Packages

structai

StructAI offers a toolkit for LLM interaction, including structured outputs, context management, and parallel execution.

2K 7 2

verdict

Inference-time scaling for LLMs-as-a-judge.

996 345 28

fieldtest

LLM evaluation framework: define what correct, well-formed, and safe mean before you measure.

549 0 0

eval-harness

A boring, config-driven harness for evaluating AI systems. One YAML file drives the run, and the trace is the source of truth. It offers offline, backtesting, and online-eval modes, and works with any agent, RAG, or code-modifying system.

381 0 0

hermes-rubric

Evidence-first structured scoring. Class-aware rubric templates for deterministic dimension sets across runs.

139 1 0