[ProcessingIdefics] Attention mask bug with padding
#29449
Conversation
Thanks for the PR! Reviewing asap
amyeroberts
left a comment
Hi @bri25yu, thanks for working on this!
Making the padding logic be consistent within the processor is a good idea, but I don't think we should change the default behaviour.
```diff
     self,
     prompts: Union[List[TextInput], List[List[TextInput]]],
-    padding: Union[bool, str, PaddingStrategy] = False,
+    padding: Union[bool, str, PaddingStrategy] = "longest",
```
I don't think we should change the default here for two reasons:
- It doesn't match the default behaviour for most processing classes
- It changes the default behaviour, which can be considered a breaking change
byi8220
You may have tagged the wrong person 🙃
> It doesn't match the default behaviour for most processing classes

Does the idefics model support non-padded inputs? From my understanding of the original issue, it seems they expect some padding even when the argument is not passed.

> It changes the default behaviour, which can be considered a breaking change

I'm not sure if this is a bug, but the default behavior appears to have been inaccurate to begin with. Even if the user passes in padding=False, lines 347-354 seem to be forcibly padding the text input to the maximum sequence length anyway.
> You may have tagged the wrong person 🙃

Whoops, yes, sorry about that

> Does the idefics model support non-padded inputs? From my understanding of the original issue, it seems they expect some padding even when the argument is not passed.

No. But no model supports non-padded inputs if batch_size > 1 and the input sequences are of different lengths; even so, all processors and tokenizers leave inputs unpadded by default.

> I'm not sure if this is a bug, but the default behavior appears to have been inaccurate to begin with. Even if the user passes in padding=False, lines 347-354 seem to be forcibly padding the text input to the maximum sequence length anyway.

In this case, the forcible padding when padding=False should be removed.
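To illustrate the point about batching: sequences of different lengths must be brought to a common length before they can form one rectangular batch. A minimal sketch in plain Python, assuming a hypothetical `pad_to_longest` helper (this is not the actual tokenizer code):

```python
# Hypothetical sketch of "longest" padding with an attention mask.
# pad_to_longest is an illustrative helper, not a transformers API.

def pad_to_longest(batch, pad_id=0, padding_side="right"):
    """Pad token-id lists to the longest sequence and build attention masks."""
    longest = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        pad = [pad_id] * (longest - len(seq))
        mask = [1] * len(seq)
        if padding_side == "right":
            input_ids.append(seq + pad)
            attention_mask.append(mask + [0] * len(pad))
        else:  # left padding, the tokenizer default discussed in this PR
            input_ids.append(pad + seq)
            attention_mask.append([0] * len(pad) + mask)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

out = pad_to_longest([[5, 6, 7], [8]])
print(out["input_ids"])       # [[5, 6, 7], [8, 0, 0]]
print(out["attention_mask"])  # [[1, 1, 1], [1, 0, 0]]
```

With `padding_side="left"` the pad tokens and mask zeros move to the front instead, which is what the left-padding test added in this PR exercises.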
> In this case, the forcible padding when padding=False should be removed

Yes, this behavior was removed as part of this PR. After this PR, padding=False or leaving padding unset will not pad the input.

I have modified the PR to default to padding=False, and the unit tests (and one of the integration tests) to explicitly specify padding='longest'. My only concern is that by changing this behavior, a user who was relying on the default behavior would find their code broken overnight.
> I guess my only concern was that by changing this behavior, a user who was relying on the default behavior would find their code broken overnight.

@byi8220 Hmm, yes, this is tricky and that's a good point. OK, in this case, I think your original solution of setting padding='longest' as the default is best, ideally with a comment linking to this issue to explain why the default is different, and with a description of the False option added to the docstring.
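A hedged sketch of what that resolution could look like. The `process` function below is a simplified stand-in for `IdeficsProcessor.__call__`, and the exact comment and docstring wording are assumptions, not the merged code:

```python
# Simplified stand-in for IdeficsProcessor.__call__ showing the agreed
# default; the real processor operates on token ids, not raw strings.

def process(prompts, padding="longest"):
    """Non-standard default: most processors default to padding=False,
    but Idefics stacks per-prompt tensors, so unpadded batches of
    unequal length would fail. padding=False disables padding entirely
    and leaves batching to the caller."""
    if padding == "longest":
        longest = max(len(p) for p in prompts)
        # pad every prompt to the longest one (spaces stand in for pad tokens)
        return [p + " " * (longest - len(p)) for p in prompts]
    return list(prompts)

padded = process(["a b", "a"])
print([len(p) for p in padded])  # [3, 3]
```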
Done
amyeroberts
left a comment
Thanks for working on this and adding tests!
What does this PR do?
Fixes #28591
This PR does the following:

- Changes `IdeficsProcessor.__call__()`'s `padding` param to `'longest'` instead of `False`.
- Updates `IdeficsProcessorTest.test_tokenizer_padding`.
- Adds `IdeficsProcessorTest.test_tokenizer_left_padding`, which is a copy of `IdeficsProcessorTest` except using the default tokenizer (which left pads). This could have been parameterized, but it felt fine to just duplicate it.

Context
IIUC, it seems like this code currently always hand-performs an additional padding pass after the tokenizer performs its padding, according to this code: https://github.com/huggingface/transformers/blob/main/src/transformers/models/idefics/processing_idefics.py#L348-L354
And since the output tensors are stacked together in this code, they all have to be of the same size, i.e. padded or otherwise normalized.
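The stacking constraint can be mimicked in plain Python. The `stack` helper below is an illustrative stand-in for `torch.stack`'s shape check, not the actual implementation:

```python
# Illustrative sketch: stacking per-sample rows into one batch only works
# if every row has the same length, which is why the processor must pad
# (or otherwise normalize) the text inputs first.

def stack(rows):
    """Mimic torch.stack's equal-shape requirement for 1-D inputs."""
    lengths = {len(r) for r in rows}
    if len(lengths) != 1:
        raise ValueError(f"stack expects each row to be equal size, got {sorted(lengths)}")
    return rows

stack([[1, 2, 3], [4, 5, 6]])  # fine: both rows have length 3
try:
    stack([[1, 2, 3], [4]])    # ragged: mirrors the unpadded batch case
except ValueError as e:
    print(e)
```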
Note: Since this changes the default padding, anyone who calls this function with an explicit value of `padding=False` will likely encounter an error.

Testing

Unit tests run with `pytest ./tests/models/idefics/` pass (129 passed, 115 skipped, 8 warnings in 19.22s). Did not run the slow tests, since running all tests with `RUN_SLOW=yes` enabled crashes my workstation.

Before submitting
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker
@amyeroberts