To view our Streamlit app, visit https://copy-suppression.streamlit.app/
(If you're interested in our research, please reach out! Our emails are {cal.s.mcdougall, arthurconmy, thisiscodyr}@gmail.com.)
This repo serves two purposes:
- An edited version of TransformerLens with a couple of extra features (see below).
- Hosting Streamlit pages from https://github.com/callummcdougall/SERI-MATS-2023-Streamlit-pages/blob/main/transformer_lens/rs/callum2/st_page/Home.py
See `transformer_lens/rs/arthurs_notebooks/example_notebook.py` for example usage.
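For reference, here is a minimal sketch of the kind of TransformerLens usage that notebook covers (loading GPT-2 Small and caching activations); the prompt and the specific hook name below are placeholders, not necessarily what `example_notebook.py` actually does:

```python
# Minimal TransformerLens usage: load GPT-2 Small, run a prompt, and cache activations.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # GPT-2 Small, the model studied in the paper
tokens = model.to_tokens("Every activation in the forward pass can be cached.")
logits, cache = model.run_with_cache(tokens)

print(logits.shape)                                 # [batch, seq, d_vocab]
print(cache["blocks.10.attn.hook_pattern"].shape)   # layer-10 attention patterns: [batch, head, seq, seq]
```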
This setup assumes you're using an SSH key to access GitHub. See here, and the associated links on that page, if you don't have an SSH key to begin with.
$ git clone [email protected]:callummcdougall/SERI-MATS-2023-Streamlit-pages.git
$ cd SERI-MATS-2023-Streamlit-pages
$ poetry install
`pip install -e .` also works, though it won't install the exact package versions pinned in `poetry.lock`, which could plausibly cause problems.
You need to have Poetry installed; to do this, run
curl -sSL https://install.python-poetry.org | python3 -
and then either edit your PATH manually, or (on a Linux machine) run
echo -e "$(cat ~/.bashrc)\nexport PATH=\"$HOME/.local/bin:\$PATH\"\n" > ~/.bashrc; source ~/.bashrc
so that the `poetry` binary ends up on your PATH.
You should add requirements (e.g. `einops`) by running `poetry add einops`.
We stored some large files in git history and need to clean them up; try `git clone --depth 1 [email protected]:callummcdougall/SERI-MATS-2023-Streamlit-pages.git` if the full clone is lagging.
If you want to launch the Streamlit pages locally, run
pip install streamlit
cd transformer_lens/rs/callum2/st_page
streamlit run Home.py
Differences from the main branch of TransformerLens
- We set the `ACCELERATE_DISABLE_RICH` environment variable in `transformer_lens/__init__.py` to `"1"`, to stop an annoying reformatting of notebook error messages.
- We add `qkv_normalized_input` hooks that can optionally be added to models (see the hook sketch after this list).
- Surveying the direct effects of individual attention heads: `transformer_lens/rs/arthurs_notebooks/direct_effect_survey.py`
- (TODO: scan through the paper, ideally clean up the repo too)
- (TODO: write a better implementation of the learnable scale and bias vectors)
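For reference, this is how forward hooks are attached in TransformerLens. The sketch below uses the standard `blocks.10.ln1.hook_normalized` hook point as a stand-in, since the exact names of the new `qkv_normalized_input` hook points aren't listed in this README (check the fork's attention component for the real strings):

```python
# Sketch of attaching a forward hook in TransformerLens. "blocks.10.ln1.hook_normalized"
# is a standard hook point (the LayerNormed input to attention layer 10); the fork's
# qkv_normalized_input hooks would be attached the same way, under their own names.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

def print_shape(activation, hook):
    print(hook.name, tuple(activation.shape))
    return activation

tokens = model.to_tokens("Hooks let you read or edit intermediate activations.")
model.run_with_hooks(
    tokens,
    fwd_hooks=[("blocks.10.ln1.hook_normalized", print_shape)],
)
```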
(Written by Callum) These are the directories which I use to structure my own work.
This directory is for 2 small investigations:
- How does the head manage to attend to BOS by default?
Conclusions: when you look at the cosine similarity of the "residual stream vector before attn layer 10" and the "query bias for head 10.7", it's very positive and in a very tight range for all tokens (between 0.45 and 0.47) whenever the position is zero, and similarly tight but very negative for all tokens whenever the position isn't zero. So this isn't a function of BOS, it's a function of position. This has implications for how CSPA works; the query-side prediction has to overcome some threshold to actually activate the copy suppression mechanism. (See the sketch after this list for one way this comparison could be computed.)
- What's the perpendicular component of the query, in IOI?
Conclusions:
  - Adding semantically similar tokens (`"Mary"`, `"mary"` rather than just `" Mary"`) doesn't seem to help.
  - Found weak evidence that there's some kind of "indirect prediction": when you take the perpendicular component and put it through the MLPs, it does favour IO over S1 (but the MLPs don't have much impact in IOI, so this effect isn't large anyway).
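Here is a hedged sketch of how the cosine-similarity check from the first investigation could be computed. Since `b_Q` lives in head space, the sketch maps it back into the residual stream via `W_Q`; that mapping, and the arbitrary prompt, are assumptions on my part, so don't expect it to reproduce the exact 0.45-0.47 numbers quoted above:

```python
# Hedged sketch: cosine similarity between the residual stream before attention layer 10
# and head 10.7's query-bias direction, with b_Q pulled back into residual space via W_Q.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
LAYER, HEAD = 10, 7

tokens = model.to_tokens("An arbitrary prompt; only position 0 is the BOS token.")
_, cache = model.run_with_cache(tokens)

resid = cache["resid_pre", LAYER][0]                          # [seq, d_model]
bias_dir = model.W_Q[LAYER, HEAD] @ model.b_Q[LAYER, HEAD]    # [d_model]

cos = torch.nn.functional.cosine_similarity(resid, bias_dir.unsqueeze(0), dim=-1)
print(cos)  # expect position 0 to stand apart from the later positions
```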
Hosting all of the Streamlit pages. This isn't for generating any plots (at least I don't use it for that); it's exclusively for hosting pages & storing media files.
The pages are:
- OV and QK circuits - you get to see what tokens are most attended to (QK circuit, prediction-attention) and what tokens are most suppressed (OV circuit). It's a nice way to highlight semantic similarity, and build intuition for how it works.
- Browse Examples - the most important page. You get to investigate OWT examples, and see how all parts of the copy suppression mechanism work. You can:
- See the loss change per token when you ablate, i.e. find the MIDS (tokens which the head is most helpful for).
- See the logits pre and post-ablation, as well as the direct logit attribution for this head. You can confirm that the head is pushing down tokens which appear in context, for most of the MIDS examples.
- Look at the attention patterns. You can confirm that the head is attending to the tokens which it pushes down.
- Look at the logit lens before head 10.7. You can confirm that the head is predicting precisely the words which it is attending to.
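For concreteness, here is a hedged sketch of the kind of per-token ablation-loss computation the "Browse Examples" page is built on; the page itself may use a different ablation (e.g. mean ablation over a dataset) and different prompts, so treat this as an illustration rather than the exact pipeline:

```python
# Sketch: zero-ablate head 10.7's output and measure the per-token change in loss.
# Positive values mark tokens where the head was helping (candidate MIDS examples).
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
LAYER, HEAD = 10, 7

tokens = model.to_tokens("Some OpenWebText-style passage would go here.")
clean_loss = model(tokens, return_type="loss", loss_per_token=True)

def zero_head(z, hook):
    # z: [batch, pos, head_index, d_head]; knock out head 10.7's contribution
    z[:, :, HEAD] = 0.0
    return z

ablated_loss = model.run_with_hooks(
    tokens,
    return_type="loss",
    loss_per_token=True,
    fwd_hooks=[(f"blocks.{LAYER}.attn.hook_z", zero_head)],
)
print(ablated_loss - clean_loss)  # per-token loss increase from ablating the head
```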
This is where I get the copy suppression-preserving ablation results. In other words, the stuff that's gonna be in section 3.3 of the paper (and that makes up one of the Streamlit pages).
It also adds to the HTML plots dictionary, for the "Browse Examples" Streamlit page.
This contains the code for section 3.1, and generates the data for the following Streamlit pages:
- OV and QK circuits
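As a rough illustration of what that OV-circuit data looks like, here is a hedged sketch that pushes a single token's embedding through head 10.7's OV circuit and unembeds it; the paper uses a more careful "effective embedding" (including MLP0), so plain `W_E` here is only a simplifying assumption:

```python
# Sketch of the OV circuit for head 10.7: embed " Mary", apply W_V then W_O, unembed,
# and look at which tokens are pushed down the most. (The paper's analysis uses an
# effective embedding rather than raw W_E; this is just a rough proxy.)
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
LAYER, HEAD = 10, 7

tok = model.to_single_token(" Mary")
emb = model.W_E[tok]                                              # [d_model]
ov_out = emb @ model.W_V[LAYER, HEAD] @ model.W_O[LAYER, HEAD]    # [d_model]
logit_effect = ov_out @ model.W_U                                 # [d_vocab]

most_suppressed = logit_effect.topk(10, largest=False).indices
print(model.to_str_tokens(most_suppressed))  # with the effective embedding, " Mary"-like tokens dominate here
```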
This is exclusively for generating the HTML figures that will be on the following Streamlit pages:
- Browse Examples
- Test Your Own Examples
Optionally, if you want JupyterLab you can run `poetry run pip install jupyterlab` (to install it in the same virtual environment), and then launch it with `poetry run jupyter lab`.
Then the library can be imported as `import transformer_lens`.
If adding a feature, please add unit tests for it to the tests folder, and check that it hasn't broken anything major using the existing tests (install pytest and run it in the root TransformerLens/ directory).
- All tests via `make test`
- Unit tests only via `make unit-test`
- Acceptance tests only via `make acceptance-test`
This project uses `pycln`, `isort` and `black` for formatting; pull requests are checked in GitHub Actions.
- Format all files via `make format`
- Only check the formatting via `make check-format`
If adding a feature, please add it to the demo notebook in the `demos` folder, and check that it works in the demo format. This can be tested by replacing `pip install git+https://github.com/neelnanda-io/TransformerLens.git` with `pip install git+https://github.com/<YOUR_USERNAME_HERE>/TransformerLens.git` in the demo notebook, and running it in a fresh environment.
Please cite us with:
@article{copy_suppression,
title={Copy Suppression: Comprehensively Understanding an Attention Head},
author={McDougall, Callum and Conmy, Arthur and Rushing, Cody and McGrath, Thomas and Nanda, Neel},
journal={arXiv preprint},
year={2023},
}
(arXiv should be out soon!)