PyRank

LLM Benchmark Python Packages

Python packages with the GitHub topic llm-benchmark. Sorted by relevance, with stars and monthly downloads.
Basaltlabs-app
gauntlet-cli

Community-driven behavioral reliability benchmark for LLMs. 231 probes across 19 modules, deterministic scoring, perplexity correlation, layer sensitivity mapping, quant method capture, hardware-stratified community rankings. Every test contributes to the community dataset.

2K 6 0
npow
context-bench

Benchmark any system that transforms LLM context: compressors, RAG rerankers, memory managers, and more.

383 0 0
AmadeusITGroup
pickyourllm

Pick Your LLM: Intelligent, Use-Case Aware LLM Model advisor for Optimal Performance and Cost

258 1 0
bluet
arguslm

ArgusLM — Open-source LLM monitoring & benchmarking SDK

220 1 1
Data from PyPI, GitHub, ClickHouse, and BigQuery