16 changes: 9 additions & 7 deletions demos/Exploratory_Analysis_Demo.ipynb
@@ -100,7 +100,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 2,
+"execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -118,7 +118,8 @@
 "from jaxtyping import Float\n",
 "\n",
 "import transformer_lens.utils as utils\n",
-"from transformer_lens import ActivationCache, HookedTransformer"
+"from transformer_lens import ActivationCache\n",
+"from transformer_lens.model_bridge import TransformerBridge"
 ]
 },
 {
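Since this hunk swaps the import path, the cell will fail with an `ImportError` on any transformer_lens release that does not yet ship `transformer_lens.model_bridge`. A hypothetical guard (not part of this PR, purely a sketch) could keep the notebook importable on both sides of the change:

```python
# Hypothetical fallback, assuming only that TransformerBridge lives in
# transformer_lens.model_bridge on newer releases; older releases fall
# back to the legacy HookedTransformer class.
from transformer_lens import ActivationCache

try:
    from transformer_lens.model_bridge import TransformerBridge as ModelClass
except ImportError:  # older transformer_lens without the bridge
    from transformer_lens import HookedTransformer as ModelClass
```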
@@ -245,12 +246,12 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The first step is to load in our model, GPT-2 Small, a 12 layer and 80M parameter transformer with `HookedTransformer.from_pretrained`. The various flags are simplifications that preserve the model's output but simplify its internals."
+"The first step is to load our model, GPT-2 Small, a 12-layer, 80M-parameter transformer, with `TransformerBridge.boot_transformers`. The various flags apply simplifications that preserve the model's output while making its internals easier to interpret."
 ]
 },
 {
 "cell_type": "code",
-"execution_count": 5,
+"execution_count": null,
 "metadata": {},
 "outputs": [
 {
@@ -270,13 +271,14 @@
 ],
 "source": [
 "# NBVAL_IGNORE_OUTPUT\n",
-"model = HookedTransformer.from_pretrained(\n",
-"    \"gpt2-small\",\n",
+"model = TransformerBridge.boot_transformers(\n",
+"    \"gpt2\",\n",
 "    center_unembed=True,\n",
 "    center_writing_weights=True,\n",
 "    fold_ln=True,\n",
 "    refactor_factored_attn_matrices=True,\n",
 ")\n",
+"model.enable_compatibility_mode()\n",
 "\n",
 "# Get the default device used\n",
 "device: torch.device = utils.get_device()"
@@ -372,7 +374,7 @@
 "\n",
 "We want models that can take in arbitrary text, but models need to have a fixed vocabulary. So the solution is to define a vocabulary of **tokens** and to deterministically break up arbitrary text into tokens. Tokens are, essentially, subwords, and are determined by finding the most frequent substrings - this means that tokens vary a lot in length and frequency! \n",
 "\n",
-"Tokens are a *massive* headache and are one of the most annoying things about reverse engineering language models... Different names will be different numbers of tokens, different prompts will have the relevant tokens at different positions, different prompts will have different total numbers of tokens, etc. Language models often devote significant amounts of parameters in early layers to convert inputs from tokens to a more sensible internal format (and do the reverse in later layers). You really, really want to avoid needing to think about tokenization wherever possible when doing exploratory analysis (though, of course, it's relevant later when trying to flesh out your analysis and make it rigorous!). HookedTransformer comes with several helper methods to deal with tokens: `to_tokens, to_string, to_str_tokens, to_single_token, get_token_position`\n",
+"Tokens are a *massive* headache and are one of the most annoying things about reverse engineering language models... Different names will be different numbers of tokens, different prompts will have the relevant tokens at different positions, different prompts will have different total numbers of tokens, etc. Language models often devote significant amounts of parameters in early layers to convert inputs from tokens to a more sensible internal format (and do the reverse in later layers). You really, really want to avoid needing to think about tokenization wherever possible when doing exploratory analysis (though, of course, it's relevant later when trying to flesh out your analysis and make it rigorous!). TransformerBridge comes with several helper methods to deal with tokens: `to_tokens`, `to_string`, `to_str_tokens`, `to_single_token`, and `get_token_position`.\n",
 "\n",
 "**Exercise:** I recommend using `model.to_str_tokens` to explore how the model tokenizes different strings. In particular, try adding or removing spaces at the start, or changing capitalization - these change tokenization!</details>"
 ]
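Since the exercise above asks readers to poke at tokenization, here is a short sketch of the listed helpers. The commented splits are typical for GPT-2's BPE tokenizer but are assumptions worth verifying by running the cell:

```python
# Illustrative use of the token helpers named above, continuing from the
# booted model; commented outputs are what GPT-2's tokenizer typically gives.
print(model.to_str_tokens("gpt2"))    # e.g. ['<|endoftext|>', 'g', 'pt', '2']
print(model.to_str_tokens(" gpt2"))   # a leading space changes the split
print(model.to_str_tokens("GPT2"))    # so does capitalization

tokens = model.to_tokens("Hello world")  # tensor of token ids, shape [1, n_tokens]
print(model.to_string(tokens[0]))        # round-trips back to the text (plus BOS)
print(model.to_single_token(" world"))   # id of a string that is exactly one token
```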