Describe the bug
Currently HookedTransformer.generate() does not mask padding tokens when generating from a batch of inputs, so generating for a batch of prompts gives different results from generating from each prompt one by one.
Code example
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
input_prompts = ["Hello, my dog is cute", "This is a much longer text. Hello, my cat is cute"]

# Generate for each prompt individually
orig_outputs = []
for prompt in input_prompts:
    out = model.generate(prompt, verbose=False, do_sample=False)
    orig_outputs.append(out)

# Generate for the whole batch at once
batched_outputs = model.generate(input_prompts, verbose=False, do_sample=False)

# The batched outputs should match the individual ones, but the shorter prompt's does not
for i in range(len(orig_outputs)):
    assert orig_outputs[i] == batched_outputs[i]
In this example the output for the shorter prompt changes when batched, while the longer prompt returns the same output. This is because attention to the padding tokens is not masked. After adding masking for the padding tokens as in the linked PR, the shorter prompt also returns the same output when batched.
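For reference, here is a minimal sketch of the masking idea (not the PR's actual code), assuming this transformer_lens version exposes utils.get_attention_mask and that forward() accepts an attention_mask argument:

from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")
prompts = ["Hello, my dog is cute", "This is a much longer text. Hello, my cat is cute"]

# Left-pad the batch so every prompt's last real token sits at the final position.
tokens = model.to_tokens(prompts, prepend_bos=True, padding_side="left")

# 1 for real tokens, 0 for padding (also handles GPT-2's BOS == pad-token case).
attention_mask = utils.get_attention_mask(model.tokenizer, tokens, prepend_bos=True)

# Passing the mask stops the real tokens from attending to the padding positions.
logits = model(tokens, attention_mask=attention_mask)
next_tokens = logits[:, -1, :].argmax(dim=-1)  # greedy next token for each prompt

With the mask in place, the left-padded shorter prompt gets the same next-token prediction as it does when run on its own.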
System Info
Using transformer_lens 4.55.0 (installed from pip) on a Linux environment, Python 3.10.18.
Additional context
I have added a test case and a fix for generating with padding="left" in the linked PR, but I am not sure how this can be fixed for right padding (see the sketch below).
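As context, a rough, purely illustrative sketch (made-up token ids, not taken from the PR) of why right padding is harder for the generation loop than left padding:

import torch

PAD = 0  # illustrative pad token id, not GPT-2's real one
tokens = torch.tensor([
    [5, 6, 7, PAD, PAD],  # short prompt, right-padded
    [5, 6, 7, 8, 9],      # longer prompt, no padding
])
lengths = (tokens != PAD).sum(dim=-1)  # true lengths: tensor([3, 5])

# With left padding, every row's last real token sits at index -1, so the loop
# can sample from logits[:, -1] and append a new column to the batch. With
# right padding the sampling position differs per row (lengths - 1), and the
# sampled token has to be written into each row's first pad slot rather than
# appended, which also shifts where the padding mask must be applied on later steps.
next_token_positions = lengths - 1  # tensor([2, 4])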
Checklist
- [x] I have checked that there is no similar issue in the repo (required)