Pad prompts to the right in T5 examples and add EOS token to seq2seq prompts #422
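The default T5 tokenizer generates right-padded `input_ids` tensors in which each prompt ends with the EOS token followed by PAD tokens (0 is the PAD token, which also serves as the BOS/decoder start token; 1 is the EOS token `</s>`).

I think there were two issues that might make the inputs to T5 slightly different from what the model was trained on: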
1. `padding_side` was not set for the tokenizer, so it defaulted to `"left"`, which caused prompts to be left-padded instead of right-padded.
2. The `PromptPipeline` class applied `add_special_tokens=False` when tokenizing prompts, so the EOS token `</s>` was not added to the prompt for seq2seq models.
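To make the difference between the two layouts concrete, here is a minimal pure-Python sketch. It is not the trlx `PromptPipeline` code; the helper `pad_batch` and the token ids are hypothetical, and it only assumes T5's ids of 0 for PAD and 1 for EOS. In Hugging Face `transformers`, the two knobs it mimics correspond to the tokenizer's `padding_side` attribute and the `add_special_tokens` argument of the tokenizer call.

```python
from typing import List

PAD_ID = 0  # T5 pad token id (also used as the decoder start token)
EOS_ID = 1  # T5 EOS token id, i.e. "</s>"


def pad_batch(
    batch: List[List[int]],
    padding_side: str = "right",
    add_eos: bool = True,
) -> List[List[int]]:
    """Optionally append EOS, then pad every sequence to the batch max length."""
    if padding_side not in ("left", "right"):
        raise ValueError(f"padding_side must be 'left' or 'right', got {padding_side!r}")
    # Mimics add_special_tokens=True for T5: EOS goes at the end of each prompt.
    seqs = [ids + [EOS_ID] if add_eos else list(ids) for ids in batch]
    max_len = max(len(s) for s in seqs)
    padded = []
    for s in seqs:
        pad = [PAD_ID] * (max_len - len(s))
        # Right padding keeps the prompt tokens at the start of the row.
        padded.append(s + pad if padding_side == "right" else pad + s)
    return padded


prompts = [[100, 101, 102], [200, 201]]  # hypothetical token ids
print(pad_batch(prompts))  # [[100, 101, 102, 1], [200, 201, 1, 0]]
print(pad_batch(prompts, padding_side="left", add_eos=False))  # [[100, 101, 102], [0, 200, 201]]
```

The first call shows the layout this PR aims for (EOS, then right padding); the second shows what the two issues above produced together: left padding and no EOS.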