
Code for my NeurIPS 2023 ATTRIB workshop paper, "Attribution Patching Outperforms Automated Circuit Discovery".


Edge Attribution Patching

For an easy-to-use version of edge attribution patching, use the minimal-implementation branch. All code in that branch was written by Oscar Balcells.

This repository is under active development. It is built on top of TransformerLens (https://github.com/neelnanda-io/TransformerLens), which we may eventually merge into.
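As a rough intuition for what edge attribution patching computes, the toy sketch below uses the first-order approximation behind attribution patching. Activation patching would rerun the model once per edge. Instead, the score of an edge u -> v is taken to be (z_u on the corrupted input minus z_u on the clean input) dotted with the gradient of the metric with respect to v's input on the clean run. With this approximation, one clean forward/backward pass plus one corrupted forward pass scores every edge at once. Everything in the sketch is hypothetical and is not code from this repository or from TransformerLens: the edge names, the two-dimensional activations, the weights `W`, the quadratic loss, and the helper functions. It compares the approximate scores with exact one-edge-at-a-time patching on the same toy graph.

```python
# Toy illustration of an edge attribution patching (EAP) style score.
# Hypothetical setup: edge names, weights W, and the loss are made up and are
# NOT taken from this repository or from TransformerLens.
from typing import Dict, List

Vector = List[float]

# Hypothetical readout weights for the single downstream node v.
W: Vector = [0.5, -1.0]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sub(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]


def add_all(vectors: List[Vector]) -> Vector:
    # v's input is the sum of what its incoming edges write (residual-stream style).
    return [sum(column) for column in zip(*vectors)]


def loss(v_input: Vector) -> float:
    # Toy metric: squared readout of v's input.
    return dot(W, v_input) ** 2


def loss_grad(v_input: Vector) -> Vector:
    # Analytic gradient: d/dx (W . x)^2 = 2 (W . x) W.
    s = dot(W, v_input)
    return [2.0 * s * w for w in W]


def eap_scores(clean: Dict[str, Vector], corrupt: Dict[str, Vector]) -> Dict[str, float]:
    # Gradient at v's input on the clean run; each edge's score is
    # (corrupt activation - clean activation) . gradient.
    grad = loss_grad(add_all(list(clean.values())))
    return {edge: dot(sub(corrupt[edge], clean[edge]), grad) for edge in clean}


def patch_effects(clean: Dict[str, Vector], corrupt: Dict[str, Vector]) -> Dict[str, float]:
    # Exact activation patching, one edge at a time, for comparison.
    base = loss(add_all(list(clean.values())))
    effects = {}
    for edge in clean:
        patched = dict(clean)
        patched[edge] = corrupt[edge]
        effects[edge] = loss(add_all(list(patched.values()))) - base
    return effects


def top_edges(scores: Dict[str, float], k: int) -> List[str]:
    # Keep the k edges with the largest absolute score.
    return sorted(scores, key=lambda e: abs(scores[e]), reverse=True)[:k]


clean = {"a->v": [1.0, 0.0], "b->v": [0.0, 1.0], "c->v": [2.0, 2.0]}
corrupt = {"a->v": [1.0, 0.0], "b->v": [0.0, 0.0], "c->v": [2.0, 2.1]}

scores = eap_scores(clean, corrupt)
effects = patch_effects(clean, corrupt)
for edge in clean:
    print(f"{edge}: EAP score {scores[edge]:+.3f}, exact patch effect {effects[edge]:+.3f}")
print("top-2 edges:", top_edges(scores, 2))
```

In this toy the approximation matches exact patching for the unchanged edge a->v, is close for the small change on c->v (0.30 vs 0.31), and overestimates the larger change on b->v (-3.0 vs -2.0) because the loss is quadratic. A linear approximation is least reliable when the clean and corrupted activations differ a lot.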

Please cite this work as:

@inproceedings{
  syed2023attribution,
  title={Attribution Patching Outperforms Automated Circuit Discovery},
  author={Aaquib Syed and Can Rager and Arthur Conmy},
  booktitle={NeurIPS Workshop on Attributing Model Behavior at Scale},
  year={2023},
  url={https://openreview.net/forum?id=tiLbFR4bJW}
}
