activation-patching
Stanford NLP Python library for understanding and improving PyTorch models via interventions