
Conversation

@younesbelkada
Contributor

What does this PR do?

  • changed the non-passing tests to fp32
  • reduced the sequence length
  • removed the padding test

All of these points have been discussed on Slack, but in summary:

1- The generation tests were not passing because the linear layers do not give the same results between torch 1.11 and torch 1.12.
2- Batched generation can sometimes be flaky in half-precision mode, and this should be expected; we therefore reduce the sequence length of the generated output (a short sketch of this kind of check follows below).
3- One should always use padding_side=left when doing batched generation.
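
As a rough illustration of points 1-2, here is a minimal sketch (not the test code from this PR) of loading the checkpoint in fp32 and running a short, deterministic generation; the prompt and max_new_tokens are illustrative assumptions:

```python
# Hedged sketch only: a short greedy generation in fp32, so the comparison does
# not depend on fp16 numerics that differ between torch 1.11 and torch 1.12.
import torch
from transformers import AutoTokenizer, BloomForCausalLM

path_350m = "bigscience/bloom-350m"
tokenizer = AutoTokenizer.from_pretrained(path_350m)
model = BloomForCausalLM.from_pretrained(path_350m, torch_dtype=torch.float32)

inputs = tokenizer("I enjoy walking with my cute dog", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)  # short and deterministic
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```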

cc @ydshieh @patrickvonplaten

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@younesbelkada changed the title from "modifying tests" to "BLOOM - modifying slow tests" on Jun 30, 2022
@younesbelkada marked this pull request as ready for review on June 30, 2022 at 15:41
@ydshieh
Collaborator

ydshieh commented Jun 30, 2022

Hi, @younesbelkada

Could you explain a bit more about "One should always use padding_side=left when doing batched generation"?
And what are examples of test failures when using padding_side=right? I couldn't find you mentioning this on Slack.

Thanks!

@younesbelkada
Contributor Author

Hi @ydshieh!
From the internal discussions, here is a summary of why one should always use padding_side=left (cc @patrickvonplaten):

  • Imagine: ["hello my name is", "hey <pad> <pad> <pad>"]
    For the first input the correct token will be sampled from "is" - however for the second input, generate would incorrectly sample from "<pad>" where as it should sample from "hey". Making sure everything is batched on the left circumvents this problem !
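
A minimal sketch of what left padding looks like in practice (the prompts, generation length, and pad-token fallback are illustrative assumptions, not taken from the tests):

```python
# Hedged sketch: left padding for batched generation. With padding_side="left",
# the last position of every row is a real token, so generate continues from
# "is" and "hey" rather than from a pad token.
import torch
from transformers import AutoTokenizer, BloomForCausalLM

path_350m = "bigscience/bloom-350m"
tokenizer = AutoTokenizer.from_pretrained(path_350m, padding_side="left")
if tokenizer.pad_token is None:  # fall back to EOS if no pad token is defined
    tokenizer.pad_token = tokenizer.eos_token
model = BloomForCausalLM.from_pretrained(path_350m, torch_dtype=torch.float32)

batch = ["hello my name is", "hey"]
inputs = tokenizer(batch, return_tensors="pt", padding=True)  # shorter prompt is padded on the left
output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
```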

@younesbelkada requested a review from sgugger on July 4, 2022 at 09:10
@patrickvonplaten
Contributor

@younesbelkada - IMO we should never expect generation to be flaky; why is this the case here?

@ydshieh
Collaborator

ydshieh commented Jul 5, 2022

Hi @younesbelkada: I have 3 questions 🙏

  • with fp16:
    • do we get stable results in a specific torch version (i.e. the same result across many runs)?
  • after changing to fp32 (without reducing the seq length):
    • do we get the same results across torch 1.11 and 1.12?
    • do we get stable results in a specific torch version (i.e. the same result across many runs)?

@younesbelkada
Contributor Author

Hi @ydshieh!
After merging PR #17866, the slow tests are now passing. Our conclusion is that:
1- In half-precision mode we might not get the same results for batched generation, and this should be expected (see the sketch below).
2- This behavior is observed ONLY on small models!
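
For reference, a minimal sketch of one way to observe point 1, assuming it refers to batched versus per-prompt generation in fp16 (that reading, the prompts, and the generation length are my assumptions):

```python
# Hedged sketch: compare batched generation against per-prompt generation in
# half precision; on small models the two may differ slightly, which is the
# expected flakiness described above. Requires a CUDA device.
import torch
from transformers import AutoTokenizer, BloomForCausalLM

path_350m = "bigscience/bloom-350m"
tokenizer = AutoTokenizer.from_pretrained(path_350m, padding_side="left")
model = BloomForCausalLM.from_pretrained(path_350m, torch_dtype=torch.float16).cuda()

prompts = ["I enjoy walking with my cute dog", "Hello my name is"]
batched_inputs = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")
batched_texts = tokenizer.batch_decode(
    model.generate(**batched_inputs, max_new_tokens=10, do_sample=False),
    skip_special_tokens=True,
)

single_texts = []
for prompt in prompts:
    ids = tokenizer(prompt, return_tensors="pt").to("cuda")
    single_texts.append(
        tokenizer.decode(model.generate(**ids, max_new_tokens=10, do_sample=False)[0], skip_special_tokens=True)
    )

print(batched_texts == single_texts)  # may be False in fp16 on small models
```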

path_350m = "bigscience/bloom-350m"
model = BloomForCausalLM.from_pretrained(path_350m, torch_dtype="auto", use_cache=True).cuda()
model = BloomForCausalLM.from_pretrained(
    path_350m, torch_dtype=torch.float32, use_cache=False, revision="gs555750"
Contributor

Why not use the most up-to-date model here?

)

@slow
def test_right_left_batched_input(self):
Contributor

Why would we delete this?

Contributor Author

Because we always use padding_side=left for batched generation with autoregressive models, so in my view there is no point in keeping this test.

path_350m = "bigscience/bloom-350m"
model = BloomForCausalLM.from_pretrained(path_350m, torch_dtype="auto", use_cache=True).cuda()
model = BloomForCausalLM.from_pretrained(
path_350m, torch_dtype="auto", use_cache=True, revision="gs555750"
Contributor

Why do we add a revision here?

Contributor Author

I think we don't need it there either, you are right.

@patrickvonplaten
Contributor

I'm still a bit confused by this PR - generations are normally not flaky for pretrained models, and it's a bit weird to me that all this PR does is modify the slow generation tests.

@younesbelkada
Contributor Author

Closing as it has been fixed by #18344
