AI Benchmarks Python Packages

grandjury

GrandJury SDK: submit LLM traces for human evaluation, plus an analytics client

701 2 1

nvidia-scicode

A benchmark that challenges language models to code solutions for scientific problems

410 212 35

agi-pragma

AI Action Firewall: a seven-stage Decision Intelligence Core for safe agentic AI

376 0 0

guardex

Guardex: AI Control Plane for autonomous agents (closed source)

130 0 0