Sparse Autoencoder Python Packages

interp-lab

An open LLM interpretability tool that helps researchers and AI agents identify LLM activations and features, and build sparse autoencoders as needed.
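
The sparse autoencoders that interp-lab builds can be illustrated with a minimal NumPy sketch of a standard ReLU SAE: a forward pass that encodes activations into non-negative features and reconstructs them, plus a loss that combines reconstruction error with an L1 sparsity penalty. The sketch is written independently of interp-lab. sae_forward, sae_loss, and l1_coeff are hypothetical names, not the package's API, and the training loop is omitted.

```python
import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Hypothetical helper (not interp-lab's API): encode x to sparse features, then reconstruct."""
    # ReLU SAE: subtract the decoder bias, project into the dictionary, rectify.
    f = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)
    x_hat = f @ W_dec + b_dec
    return f, x_hat

def sae_loss(x, f, x_hat, l1_coeff=1e-3):
    """Hypothetical helper: squared reconstruction error plus an L1 penalty on feature activations."""
    mse = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    l1 = np.mean(np.sum(np.abs(f), axis=-1))
    return mse + l1_coeff * l1

rng = np.random.default_rng(0)
d_model, d_sae, n = 16, 64, 8          # dictionary is 4x wider than the activations
x = rng.normal(size=(n, d_model))      # stand-in for captured LLM activations
W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
W_dec = W_enc.T.copy()                 # tied initialization, a common starting point
b_enc = np.zeros(d_sae)
b_dec = np.zeros(d_model)

f, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
print(f.shape, x_hat.shape)  # (8, 64) (8, 16)
loss = sae_loss(x, f, x_hat)
```

Training would minimize this loss with gradient descent; a larger l1_coeff trades reconstruction quality for fewer active features.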

sansa

SANSA: sparse EASE (Embarrassingly Shallow Autoencoder) for millions of items.
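
For context on what SANSA makes sparse, here is a sketch of dense EASE (Steck, 2019), which learns an item-item weight matrix in closed form: P = (XᵀX + λI)⁻¹ and B[i, j] = -P[i, j] / P[j, j] with a zero diagonal. This is not SANSA's algorithm; the dense inverse below is exactly what becomes infeasible at millions of items. The function name ease and the toy data are illustrative assumptions.

```python
import numpy as np

def ease(X, lam=100.0):
    """Dense closed-form EASE item-item weights (illustrative, not SANSA's sparse method)."""
    G = X.T @ X + lam * np.eye(X.shape[1])  # L2-regularized item Gram matrix
    P = np.linalg.inv(G)                     # dense inverse: O(items^2) memory
    B = -P / np.diag(P)                      # column j divided by P[j, j]
    np.fill_diagonal(B, 0.0)                 # enforce the zero-diagonal constraint
    return B

# Tiny binary user-item interaction matrix (rows: users, columns: items).
X = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [1, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
B = ease(X, lam=0.5)
scores = X @ B  # per-user item scores; rank the items each user has not seen
```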

tiny-dashboard

A tiny, easily hackable implementation of a feature dashboard.
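
A central panel in most SAE feature dashboards lists a feature's top-activating tokens with surrounding context. The sketch below computes that table from a token-by-feature activation matrix. It is a generic illustration, not tiny-dashboard's code, and top_activating, k, and window are hypothetical names.

```python
import numpy as np

def top_activating(acts, tokens, feature, k=3, window=2):
    """Hypothetical helper: the k highest-activating positions for one feature, with context.

    acts has shape (n_tokens, n_features); tokens is a list of n_tokens strings.
    """
    col = acts[:, feature]
    idx = np.argsort(col)[::-1][:k]  # positions in descending order of activation
    rows = []
    for i in idx:
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        rows.append((float(col[i]), tokens[lo:hi]))  # (activation, context tokens)
    return rows

tokens = ["the", "cat", "sat", "on", "the", "mat"]
acts = np.zeros((6, 2))
acts[1, 0], acts[2, 0], acts[5, 0] = 2.0, 0.5, 3.0
rows = top_activating(acts, tokens, feature=0, k=2, window=1)
# rows == [(3.0, ['the', 'mat']), (2.0, ['the', 'cat', 'sat'])]
```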

warden-interp

Circuit-level regression testing for AI systems. Catch mechanistic drift that behavioral evals miss.

gavagai

Quantify translation indeterminacy between sparse autoencoder feature dictionaries (Quine × Mechanistic Interpretability).
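
One common baseline for comparing two SAE feature dictionaries is to match each feature in one dictionary to its most similar decoder direction in the other by cosine similarity, then average those best scores. This is not necessarily gavagai's measure of translation indeterminacy. It is a simple reference point, and max_cosine_matches is a hypothetical name.

```python
import numpy as np

def max_cosine_matches(D_a, D_b):
    """Hypothetical helper: best match in D_b (by cosine) for each row/feature of D_a."""
    A = D_a / np.linalg.norm(D_a, axis=1, keepdims=True)  # unit-normalize decoder rows
    B = D_b / np.linalg.norm(D_b, axis=1, keepdims=True)
    sims = A @ B.T                                        # (n_a, n_b) cosine similarities
    return sims.argmax(axis=1), sims.max(axis=1)

D_a = np.array([[1.0, 0.0], [0.0, 1.0]])
D_b = np.array([[0.0, 2.0], [3.0, 0.0], [1.0, 1.0]])
best, score = max_cosine_matches(D_a, D_b)
mmcs = score.mean()  # mean max cosine similarity across D_a's features
# best -> [1, 0], score -> [1.0, 1.0], mmcs -> 1.0
```

A low best-match score flags a feature with no clear counterpart in the other dictionary. Ties between several near-equal matches hint that the mapping between dictionaries is not unique.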

openinterp

openinterp: Python SDK and CLI. FabricationGuard hallucination probe, ProbeBench leaderboard, Atlas search, and Trace generation. Install with: pip install openinterp

mlx-lens

Mechanistic interpretability on Apple Silicon: steering vectors, residual capture, and SAE analysis for MLX models
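
A common way to build the steering vectors mentioned above is difference of means. You average residual-stream activations over prompts that show a behavior, subtract the average over prompts that do not, and add a scaled copy of the result back into the residual stream. The sketch uses NumPy rather than MLX and is not mlx-lens's API. steering_vector, steer, and alpha are hypothetical names.

```python
import numpy as np

def steering_vector(pos_acts, neg_acts):
    """Hypothetical helper: difference-of-means direction between two activation sets."""
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def steer(resid, vec, alpha=4.0):
    """Hypothetical helper: add alpha * vec to every position of a residual-stream tensor."""
    return resid + alpha * vec

pos = np.array([[1.0, 0.0], [3.0, 0.0]])  # activations on "behavior present" prompts
neg = np.array([[0.0, 1.0], [0.0, 3.0]])  # activations on "behavior absent" prompts
vec = steering_vector(pos, neg)           # -> [2.0, -2.0]
steered = steer(np.zeros((3, 2)), vec, alpha=0.5)
# steered -> [[1.0, -1.0], [1.0, -1.0], [1.0, -1.0]]
```

The sign and magnitude of alpha set the direction and strength of the intervention. In practice the vector is added at a single chosen layer during the forward pass.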

recurrentlens

Mechanistic interpretability for State-Space Models: SAEs, feature visualization, and a Hub registry for Mamba/Mamba-2.

gated-sae-tf

A TensorFlow/Keras gated sparse autoencoder (Gated SAE) for dictionary learning and interpretability.
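
A Gated SAE splits the encoder into two paths. A gating path decides which features are active through a step function on its pre-activations. A magnitude path, which shares weights with the gate up to a per-feature scale exp(r_mag), sets how strongly each active feature fires. The sketch below shows only this encoder forward pass in NumPy. It is not gated-sae-tf's Keras code, the parameter names are assumptions, and the training losses are omitted.

```python
import numpy as np

def gated_sae_encode(x, W_gate, b_gate, r_mag, b_mag, b_dec):
    """Gated SAE encoder (illustrative): a binary gate selects features, a magnitude path scales them."""
    x_c = x - b_dec                               # center on the decoder bias
    pi_gate = x_c @ W_gate + b_gate               # gating pre-activations
    f_gate = (pi_gate > 0).astype(x.dtype)        # Heaviside step: which features fire
    W_mag = W_gate * np.exp(r_mag)                # weight sharing: rescale each feature's column
    f_mag = np.maximum(0.0, x_c @ W_mag + b_mag)  # ReLU magnitude path
    return f_gate * f_mag

x = np.array([[1.0, -1.0]])
W_gate = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, -1.0]])             # d_model=2, d_sae=3
b_gate = np.array([0.0, 0.0, -5.0])               # strong negative bias closes feature 2's gate
f = gated_sae_encode(x, W_gate, b_gate, r_mag=np.zeros(3), b_mag=np.zeros(3), b_dec=np.zeros(2))
# f -> [[1.0, 0.0, 0.0]]: feature 2's magnitude path would give 2.0, but its gate is closed
```

Separating "whether" from "how much" is the design point. In a plain ReLU SAE, one bias has to serve both decisions.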

mechreward

Mechanistic interpretability as reward signal for RL training of LLMs