
Conversation

@younesbelkada
Contributor

What does this PR do?

  • changed the non-passing tests to fp32
  • reduced the sequence length
  • removed the padding test

All of these points have been discussed on Slack, but in summary:

1- The generation tests were not passing because the linear layers do not give the same results between torch 1.11 and torch 1.12.
2- Batched generation can sometimes be flaky in half-precision mode, and this should be expected; we therefore reduce the sequence length of the generated output (a short sketch of this kind of check follows below).
3- One should always use padding_side=left when doing batched generation.
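
As a rough illustration of points 1-2, here is a minimal sketch (not the test code from this PR) of loading the checkpoint in fp32 and running a short, deterministic generation; the prompt and max_new_tokens are illustrative assumptions:

```python
# Hedged sketch only: a short greedy generation in fp32, so the comparison does
# not depend on fp16 numerics that differ between torch 1.11 and torch 1.12.
import torch
from transformers import AutoTokenizer, BloomForCausalLM

path_350m = "bigscience/bloom-350m"
tokenizer = AutoTokenizer.from_pretrained(path_350m)
model = BloomForCausalLM.from_pretrained(path_350m, torch_dtype=torch.float32)

inputs = tokenizer("I enjoy walking with my cute dog", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)  # short and deterministic
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```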

cc @ydshieh @patrickvonplaten

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@younesbelkada changed the title from "modifying tests" to "BLOOM - modifying slow tests" on Jun 30, 2022
@younesbelkada marked this pull request as ready for review on June 30, 2022 at 15:41
@ydshieh
Collaborator

ydshieh commented Jun 30, 2022

Hi, @younesbelkada

Could you explain a bit more about "One should always use padding_side=left when doing batched generation"?
And what are examples of test failures when using padding_side=right? I couldn't find you mentioning this on Slack.

Thanks!

@younesbelkada
Contributor Author

Hi @ydshieh!
From the internal discussions, here is a summary of why one should always use padding_side=left (cc @patrickvonplaten):

  • Imagine: ["hello my name is", "hey <pad> <pad> <pad>"]
    For the first input the correct token will be sampled from "is" - however for the second input, generate would incorrectly sample from "<pad>" where as it should sample from "hey". Making sure everything is batched on the left circumvents this problem !
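
A minimal sketch of what left padding looks like in practice (the prompts, generation length, and pad-token fallback are illustrative assumptions, not taken from the tests):

```python
# Hedged sketch: left padding for batched generation. With padding_side="left",
# the last position of every row is a real token, so generate continues from
# "is" and "hey" rather than from a pad token.
import torch
from transformers import AutoTokenizer, BloomForCausalLM

path_350m = "bigscience/bloom-350m"
tokenizer = AutoTokenizer.from_pretrained(path_350m, padding_side="left")
if tokenizer.pad_token is None:  # fall back to EOS if no pad token is defined
    tokenizer.pad_token = tokenizer.eos_token
model = BloomForCausalLM.from_pretrained(path_350m, torch_dtype=torch.float32)

batch = ["hello my name is", "hey"]
inputs = tokenizer(batch, return_tensors="pt", padding=True)  # shorter prompt is padded on the left
output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
```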

@younesbelkada requested a review from sgugger on July 4, 2022 at 09:10
@patrickvonplaten
Contributor

@younesbelkada - IMO we should never expect generation to be flaky; why is this the case here?

@ydshieh
Collaborator

ydshieh commented Jul 5, 2022

Hi @younesbelkada: I have 3 questions 🙏

  • with fp16:
    • do we get stable results in a specific torch version (i.e. the same result across many runs)?
  • after changing to fp32 (without reducing the seq length):
    • do we get the same results across torch 1.11 and 1.12?
    • do we get stable results in a specific torch version (i.e. the same result across many runs)?

@younesbelkada
Contributor Author

Hi @ydshieh!
After merging PR #17866, the slow tests are now passing. Our conclusion is that:
1- In half-precision mode we might not get the same results for batched generation, and this should be expected (see the sketch below).
2- This behavior is observed ONLY on small models!
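
For reference, a minimal sketch of one way to observe point 1, assuming it refers to batched versus per-prompt generation in fp16 (that reading, the prompts, and the generation length are my assumptions):

```python
# Hedged sketch: compare batched generation against per-prompt generation in
# half precision; on small models the two may differ slightly, which is the
# expected flakiness described above. Requires a CUDA device.
import torch
from transformers import AutoTokenizer, BloomForCausalLM

path_350m = "bigscience/bloom-350m"
tokenizer = AutoTokenizer.from_pretrained(path_350m, padding_side="left")
model = BloomForCausalLM.from_pretrained(path_350m, torch_dtype=torch.float16).cuda()

prompts = ["I enjoy walking with my cute dog", "Hello my name is"]
batched_inputs = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")
batched_texts = tokenizer.batch_decode(
    model.generate(**batched_inputs, max_new_tokens=10, do_sample=False),
    skip_special_tokens=True,
)

single_texts = []
for prompt in prompts:
    ids = tokenizer(prompt, return_tensors="pt").to("cuda")
    single_texts.append(
        tokenizer.decode(model.generate(**ids, max_new_tokens=10, do_sample=False)[0], skip_special_tokens=True)
    )

print(batched_texts == single_texts)  # may be False in fp16 on small models
```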

path_350m = "bigscience/bloom-350m"
model = BloomForCausalLM.from_pretrained(path_350m, torch_dtype="auto", use_cache=True).cuda()
model = BloomForCausalLM.from_pretrained(
    path_350m, torch_dtype=torch.float32, use_cache=False, revision="gs555750"
Contributor

Why not use the most up-to-date model here?

)

@slow
def test_right_left_batched_input(self):
Contributor

Why would we delete this?

Contributor Author

Because we always use padding_side=left for batched generation with autoregressive models, so in my view there is no point in keeping this test.

path_350m = "bigscience/bloom-350m"
model = BloomForCausalLM.from_pretrained(path_350m, torch_dtype="auto", use_cache=True).cuda()
model = BloomForCausalLM.from_pretrained(
path_350m, torch_dtype="auto", use_cache=True, revision="gs555750"
Contributor

Why do we add a revision here?

Contributor Author

I think we don't need it there either, you are right.

@patrickvonplaten
Contributor

I'm still a bit confused by this PR - generations are normally not flaky for pretrained models, and it's a bit weird to me that all this PR does is modify the slow generation tests.

@younesbelkada
Contributor Author

Closing as it has been fixed by #18344
