Fixed Masking in HookedTransformer.generate #999
Description
Currently, calling `HookedTransformer.generate()` with a batch of inputs (e.g. a list of two prompts) does not correctly mask the padding tokens, so the batched outputs differ from the outputs of generating one by one for the shorter sequences. The current code does not mask attention at all, so I implemented a roughly five-line change that masks attention correctly using existing functions. The fix only works when `padding_side="left"`, so I also changed the default padding side for `generate()` to `"left"`; I am not sure how to fix this for right padding. After the fix, a batched `.generate()` call gives the same output as generating one by one (with `do_sample=False`).
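For context, here is a minimal sketch of the masking approach from a user's perspective, assuming the `utils.get_attention_mask` helper and the `attention_mask` argument to `forward()` as they exist in current TransformerLens (the actual change inside `generate()` may differ in detail):

```python
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")

# Left-pad a batch of prompts of different lengths; the fix only
# handles padding_side="left".
prompts = ["The capital of France is", "Hi"]
tokens = model.to_tokens(prompts, padding_side="left")

# Build a mask that is 0 on padding positions and 1 elsewhere, and pass
# it through to forward() so attention cannot look at padding tokens.
attention_mask = utils.get_attention_mask(
    model.tokenizer, tokens, prepend_bos=model.cfg.default_prepend_bos
)
logits = model(tokens, attention_mask=attention_mask)
```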
Fixes # (issue)
Type of change
Changing the default padding side and fixing the masking behavior will change some outputs, but only ones that were already broken.
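As a sanity check, something like the following can confirm the batched and one-by-one outputs now agree (reusing `model`, `prompts`, and `tokens` from the sketch above; `stop_at_eos=False` keeps lengths comparable, and the exact `generate()` keyword arguments are assumptions based on current TransformerLens):

```python
import torch

# With greedy decoding (do_sample=False) the check is deterministic: the
# batched call should now reproduce the one-by-one output for each prompt,
# including the shorter, previously broken one.
n_new = 10
batch_out = model.generate(
    tokens, max_new_tokens=n_new, do_sample=False,
    stop_at_eos=False, padding_side="left",
)
for i, prompt in enumerate(prompts):
    single_out = model.generate(
        model.to_tokens(prompt), max_new_tokens=n_new,
        do_sample=False, stop_at_eos=False,
    )
    # Compare only the freshly generated tokens, skipping the padding prefix.
    assert torch.equal(batch_out[i, -n_new:], single_out[0, -n_new:])
```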