PyRank

Mechanistic Interpretability Python Packages

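Python packages tagged with the GitHub topic mechanistic-interpretability, sorted by relevance. Each entry lists the GitHub owner, the package name, a one-line description, and a row of counts that includes monthly downloads and stars.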
stanfordnlp
pyvene

Stanford NLP Python library for understanding and improving PyTorch models via interventions

9K 876 105
a9lim
llmoji

Making agents cuter

5K 2 0
butanium
nnterp

Unified access to Large Language Model modules using NNsight

3K 109 11
OpenInterpretability
openinterp

openinterp — Python SDK + CLI. FabricationGuard hallucination probe + ProbeBench leaderboard + Atlas search + Trace generation. pip install openinterp

2K 0 1
wheattoast11
carl-studio

Coherence-Aware Reinforcement Learning (CARL) - breakthrough LLM post-training and test-time training paradigm. carl builds the world's most advanced and intelligent agent systems that are a step change over current gen agents

2K 2 1
steering-vectors
steering-vectors

Steering vectors for transformer language models in PyTorch / Hugging Face

2K 149 18
designer-coderajay
glassbox-mech-interp

Open-source EU AI Act Annex IV compliance toolkit. Mechanistic interpretability + circuit discovery for transformers. One function call generates a court-ready evidence package

1K 1 0
dthinkr
mlx-lens

Mechanistic interpretability on Apple Silicon: steering vectors, residual capture, and SAE analysis for MLX models

1K 1 0
rashomon-gh
softmax-linear-unit

An implementation of the softmax linear unit (SoLU) in PyTorch

1K 0 0
wheattoast11
carl-core

Coherence-Aware Reinforcement Learning (CARL) - breakthrough LLM post-training and test-time training paradigm. carl builds the world's most advanced and intelligent agent systems that are a step change over current gen agents

975 2 1
arturoornelasb
reptimeline

Track how discrete representations evolve during neural network training — lifecycle events, phase transitions, ontology discovery, and causal verification

911 0 0
taufeeque9
codebook-features

Sparse and discrete interpretability tool for neural networks

314 64 5
scouzi1966
mlxlmprobe

Universal probing and interpretability tool for MLX language models on Apple Silicon

281 3 0
evan-lloyd
graphpatch

graphpatch is a library for activation patching on PyTorch neural network models.

279 21 0
XXO47OXX
neuro-scan

LLM Neuroanatomy Explorer — map what each transformer layer does

271 0 0
designer-coderajay
glassbox-mcp

Open-source EU AI Act Annex IV compliance toolkit. Mechanistic interpretability + circuit discovery for transformers. One function call generates a court-ready evidence package

132 1 0
caiovicentino
mechreward

Mechanistic interpretability as reward signal for RL training of LLMs — SAE features + GRPO + anti-Goodhart framework

126 5 0
yash-srivastava19
arrakis-mi

Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments.

124 31 4
levashi
reprobe

Linear probes and activation steering for transformer models

84 2 0
aiexplorations
todacomm

Topological Data Analysis Comparison of Multiple Models - A framework for characterizing transformer representations using persistent homology

70 0 0
Data from PyPI, GitHub, ClickHouse, and BigQuery