BLOOM - modifying slow tests #17963
Conversation
- changed to fp32
- reduced sequence length
- removed padding test
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Hi, @younesbelkada Could you explain a bit more on Thanks!
Hi @ydshieh !
@younesbelkada - IMO we should not expect generation to ever be flaky; why is this the case here?
Hi @younesbelkada : I have 3 questions 🙏
  path_350m = "bigscience/bloom-350m"
- model = BloomForCausalLM.from_pretrained(path_350m, torch_dtype="auto", use_cache=True).cuda()
+ model = BloomForCausalLM.from_pretrained(
+     path_350m, torch_dtype=torch.float32, use_cache=False, revision="gs555750"
Why not use the most up-to-date model here?
+ )
- @slow
- def test_right_left_batched_input(self):
Why would we delete this?
Because we always use padding_side=left for batched generation with autoregressive models, there is no point in keeping this test, in my opinion.
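For illustration, here is a minimal sketch of the left-padding setup described above, assuming the bigscience/bloom-350m checkpoint from the diffs in this PR; the prompts, generation length, and decoding flags are placeholders rather than the actual test values:

```python
# Minimal sketch of left-padded batched generation with a decoder-only model.
# The prompts and max_new_tokens below are illustrative, not the test's values.
import torch
from transformers import AutoTokenizer, BloomForCausalLM

path_350m = "bigscience/bloom-350m"
tokenizer = AutoTokenizer.from_pretrained(path_350m, padding_side="left")
model = BloomForCausalLM.from_pretrained(path_350m, torch_dtype=torch.float32)

prompts = ["Hello, my dog is a little", "Today, I am"]
# With left padding, the last real token of each prompt sits directly before
# the tokens that will be generated, which is what autoregressive decoding expects.
inputs = tokenizer(prompts, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```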
  path_350m = "bigscience/bloom-350m"
- model = BloomForCausalLM.from_pretrained(path_350m, torch_dtype="auto", use_cache=True).cuda()
+ model = BloomForCausalLM.from_pretrained(
+     path_350m, torch_dtype="auto", use_cache=True, revision="gs555750"
Why do we add a revision here?
I think we don't need it there either, you are right.
I'm still a bit confused by this PR. Generations are normally not flaky for pretrained models, and it's a bit weird to me that all this PR does is modify slow generation tests.
Closing as it has been fixed by #18344
What does this PR do?
All these matters have been discussed on Slack, but mainly:
1- Generation tests were not passing because the linear layers do not give the same results between torch 1.11 and torch 1.12.
2- Batched generation can sometimes be flaky in half-precision mode; this should be expected. Therefore we reduce the sequence length of the generated output.
3- One should always use padding_side=left when doing batched generation.

cc @ydshieh @patrickvonplaten
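As a rough sketch of what the adjusted slow-test setup looks like after these changes (the prompt, generation length, and decoding flags below are assumptions for illustration, not the PR's exact diff):

```python
# Hedged sketch: load the checkpoint in full fp32 precision and generate only a
# short continuation, so small numerical differences (e.g. between torch 1.11
# and torch 1.12 linear layers) are less likely to change the decoded text.
import torch
from transformers import AutoTokenizer, BloomForCausalLM

path_350m = "bigscience/bloom-350m"
model = BloomForCausalLM.from_pretrained(path_350m, torch_dtype=torch.float32, use_cache=True)
tokenizer = AutoTokenizer.from_pretrained(path_350m)

inputs = tokenizer("I enjoy walking with my cute dog", return_tensors="pt")
# Greedy decoding with a short max_new_tokens keeps the expected output stable.
output = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```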