PyRank

LLM-as-Judge Python Packages

Python packages with the GitHub topic llm-as-judge. Sorted by relevance, with stars and monthly downloads.
regokan
eval-harness

A boring, config-driven harness for evaluating AI systems. One YAML drives the run, the trace is the source of truth. Offline, backtesting, and online-eval modes — works with any agent, RAG, or code-modifying system.

1K 0 0
Ahmad8864
autosynth

Agentic synthetic-data generation framework inspired by Meta FAIR's Autodata / Agentic Self-Instruct.

717 0 0
auraoneai
judge-bench

Bias probes and reproducible diagnostics for LLM-as-judge evaluation workflows.

340 0 0
auraoneai
judge-card

A disclosure format for judge prompts, calibration results, known bias, and recommended use envelopes.

340 0 0
WesleyPeng
pyxtaf

Agentic Extensible Test Automation Framework

269 7 2
TECHKNOWMAD-LABS
cortex-research-suite

[Legacy] AI Research OS — 27 self-evolving skills. Decomposed into 20 repos at github.com/TECHKNOWMAD-LABS

151 0 0
dariero
ragaliq

LLM & RAG evaluation testing framework — hallucination detection, faithfulness metrics, answer relevance scoring, and retrieval pipeline testing with pytest integration

88 1 0
Data from PyPI, GitHub, ClickHouse, and BigQuery