sparse-autoencoder
openinterp: Python SDK + CLI. FabricationGuard hallucination probe, ProbeBench leaderboard, Atlas search, and Trace generation. Install: pip install openinterp
Mechanistic interpretability on Apple Silicon: steering vectors, residual capture, and SAE analysis for MLX models
SANSA - sparse EASE for millions of items
A tiny, easily hackable implementation of a feature dashboard.
Mechanistic interpretability as reward signal for RL training of LLMs — SAE features + GRPO + anti-Goodhart framework