activation-intervention
Stanford NLP Python library for understanding and improving PyTorch models via interventions
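
To make "intervention" concrete, here is a minimal sketch of one common kind, swapping an intermediate activation from one input into the forward pass of another. It uses only core PyTorch forward hooks and does not use or represent this library's own API. The toy model, the input tensors, and the hook names are all hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy network standing in for any PyTorch model (hypothetical, for illustration).
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

base = torch.randn(1, 4)    # input whose forward pass we intervene on
source = torch.randn(1, 4)  # input that supplies the replacement activation

captured = {}

def capture_hook(module, inputs, output):
    # Store a copy of the activation produced on the source input.
    captured["act"] = output.detach().clone()

def patch_hook(module, inputs, output):
    # Returning a tensor from a forward hook replaces the module's output.
    return captured["act"]

layer = model[1]  # intervene on the hidden activation after the ReLU

with torch.no_grad():
    # 1) Run the source input and record the hidden activation.
    handle = layer.register_forward_hook(capture_hook)
    source_out = model(source)
    handle.remove()

    # 2) Run the base input without any intervention.
    base_out = model(base)

    # 3) Run the base input again, overwriting the hidden activation.
    handle = layer.register_forward_hook(patch_hook)
    patched_out = model(base)
    handle.remove()

    # 4) With the hook removed, the model behaves as before.
    restored_out = model(base)
```

Because the whole hidden activation is replaced here, the patched output matches the source output exactly. Patching only a slice of the activation, or a single position in a sequence model, would isolate the effect of that component instead.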