AI Testing Python Packages

langwatch-scenario

Agentic testing for agentic codebases

92K 911 67

giskard

🐢 Open-Source Evaluation & Testing library for LLM Agents

32K 5K 479

prompture

Prompture is an API-first library for requesting structured JSON output from LLMs (or any structure), validating it against a schema, and running comparative tests between models.

9K 11 0

aitest-kit

An AI-assisted automated testing toolchain

3K 12 0

langtest

Deliver safe & effective language models

3K 562 50

nlptest

Deliver safe & effective language models

3K 562 50

seleniumboot-mcp

A Python MCP server that brings Selenium WebDriver automation to Claude and other AI assistants — with built-in code generation for Python (pytest), Java TestNG, and JUnit 5 test scripts.

2K 1 1

openvals

Open-source AI model evaluation and benchmarking framework for LLMs (OpenAI, Ollama, Claude, Gemini)

2K 25 21

aat-devqa

AI-powered automated E2E testing. Just enter a URL — AI generates and runs test scenarios.

2K 7 1

evalcraft

Generate deterministic pytest tests for your AI agents from one real run, then replay them in CI for $0. Fast, flake-free agent testing.

2K 4 2

softgnn-advisor

Graph-guided, runtime-proven, LLM-assisted PR test generation with explicit scan-plan-apply workflow

2K 1 0

chatbot-tracer

An automated approach for exploring and testing conversational agents using large language models. TRACER discovers chatbot functionalities, generates user profiles, and creates comprehensive test suites for conversational AI systems.

2K 2 0

llm-behave

Behavioral testing for LLM applications. pytest plugin with semantic assertions, multi-turn conversation testing, and drift detection. No LLM judge needed.

818 1 0

pytest-self-healer

Stop fixing stale selectors at 3 AM. This Playwright framework uses a local LLM to self-heal broken test locators before they block your CI pipeline.

808 1 0

alignmenter

Check if your AI sounds like your brand, stays safe, and behaves consistently. Works with your custom GPTs, hosted APIs, and local models. Get detailed reports in minutes, not days.

741 6 0

maia-test-framework

A pytest-based framework for testing multi-agent AI systems. It provides a flexible and extensible platform for creating and running complex multi-agent simulations and capturing the results.

720 1 0

agentrial

Statistical evaluation framework for AI agents

704 17 2

intentguard

A Python library for verifying code properties using natural language assertions.

566 35 0

py-toolguard

The "Cloudflare for AI Agents". 7-layer security interceptor, real-time observability dashboard, and automated reliability testing for MCP and AI tool chains. Prevent hallucinations, prompt injection, and destructive tool calls.

560 14 3

aetherlab

Official AetherLab Python SDK - AI guardrails, LLM safety, and content moderation via the AetherLab compliance API.

480 1 0

tenro

Open-source simulation harness for testing AI agents. Simulate LLM and tool calls to test edge cases, failure paths, and agent logic without live API calls.

477 6 0

ragverdict

pytest for RAG agents — behavioral audits with PASS/FAIL/WEAK verdicts

450 0 0

ai-flow-architect

AI proposes. You decide. — Adversarial AI workflow engine with built-in quality arbitration

402 38 3

ccheck

A human-friendly framework for testing and evaluating LLMs, RAGs, and chatbots.

384 95 11