Jailbreak Detection Python Packages

pyaigis

Deterministic, zero-dependency Python firewall for AI agents, covering MCP rug-pull, memory poisoning, indirect injection, and exfiltration channels. Includes 44 compliance templates (US/CN/JP/EU).

2K 51 7

uptrain

UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, and embedding use cases), perform root cause analysis on failure cases, and give insights on how to resolve them.

2K 2K 202

llm-injection-guard

Drop-in prompt injection defense for LLM apps and AI agents — detect, block, and audit injection attacks in real time

2K 0 0

shadowshield

Unified open-source security shield for agentic AI systems — defense-in-depth prompt-injection protection (canary tokens, agent-trace alignment audit, tool-call guarding, PII/secret scanning).

1K 0 0

sovereign-shield

Production-grade AI defense — hybrid deterministic filters + optional LLM veto + HITL approval + file validation + hallucination detection. OS-enforced immutability.

1K 19 7

llm-thorn

🌵 Runtime semantic security layer for LLM apps — catches jailbreaks, prompt injection & data exfiltration before they reach your model. Fully local. WAF for the AI era.

718 1 0

intentshield

Pre-execution intent verification for AI agents

621 19 5

reasongate

Explainable security gate for LLM apps — blocks prompt injection with an auditable reason for every decision.

573 0 0

lockllm

Official Python SDK for LockLLM

541 0 0

rlm-toolkit

AI Security Platform: Defense (61 Rust engines + Micro-Model Swarm) + Offense (39K+ payloads)

492 106 17

blackwall-llm-shield-python

Blackwall LLM Shield is an open-source AI security toolkit for JavaScript and Python that protects LLM apps from prompt injection, sensitive data leaks, unsafe tool calls, and hostile RAG content with prompt sanitization, PII masking, output inspection, policy enforcement, and audit trails.

456 1 0

sentinel-llm-security

AI Security Platform: Defense (61 Rust engines + Micro-Model Swarm) + Offense (39K+ payloads)

456 106 17

injectionguard

Prompt injection detection for LLM applications and MCP servers

408 1 0

aisafeguard

Open-source LLM safety guardrails: prompt injection protection, PII redaction, toxicity filtering, and OpenAI-compatible AI proxy

406 0 0

p1-hybrid-jailbreak-detector

Defense-in-depth input safety for LLMs — perplexity gate + FAISS + ModernBERT + LoRA + Llama Guard 3, behind a deterministic policy gate. 99.88% accuracy, 99.47% jailbreak recall, calibrated confidence, ONNX-optimized. Live demo on HF Spaces.

399 0 0

soweak

OWASP LLM Top 10 security middleware framework for Python.

369 1 0

little-canary

Sacrificial LLM instances as behavioral probes for prompt injection detection

361 22 4

pydefend

AI security guardrails for LLM applications — scan inputs and check outputs with Claude, OpenAI, Gemini, Azure, or Ollama.

343 0 0

jailguard

Pure-Rust prompt-injection detector with 1.5MB embedded MLP classifier. 98.40% accuracy, p50 14ms CPU inference, bindings for Python/JS/Go. Apache-2.0/MIT alternative to Rebuff (archived) and Lakera Guard.

303 5 1

sovereign-shield-adaptive

Self-improving security filter for AI applications. Reports missed attacks, sandbox-tests new rules, auto-deploys validated filters.

291 19 7

hermes-jailbreak-bench

Automated jailbreak testing CLI — run a battery of known attack patterns against any LLM endpoint

221 1 0

hermes-jailbench

Automated jailbreak testing CLI — run a battery of known attack patterns against any LLM endpoint

181 1 0

promptscreen

Protect your LLMs from prompt injection and jailbreak attacks. Easy-to-use Python package with multiple detection methods, CLI tool, and FastAPI integration.

175 13 8

vsos-guard

Border Guard — AI-native territory sovereignty & self-evolving security OS. 6-module D-S fusion engine, trajectory detection, anchor detection, fission engine, territory adjudicator. Full design + Python implementation.

127 2 0