Use the minimal-implementation branch for an easy-to-use version of edge attribution patching! All code in the minimal_implementation branch was created by Oscar Balcells.
This repository is currently under development. It is built on top of TransformerLens (https://github.com/neelnanda-io/TransformerLens), into which it may eventually be merged.
Please cite this work as:
@inproceedings{
syed2023attribution,
title={Attribution Patching Outperforms Automated Circuit Discovery},
author={Aaquib Syed and Can Rager and Arthur Conmy},
booktitle={NeurIPS Workshop on Attributing Model Behavior at Scale},
year={2023},
url={https://openreview.net/forum?id=tiLbFR4bJW}
}