Fix GPT-NeoX-20B past handling, attention computation #17811
sgugger merged 2 commits into huggingface:main
Conversation
Force-pushed from f6c9561 to a84811d.
Thanks for the fix! I can confirm that with this PR, I get the same generations in float32 and float16 for EleutherAI/gpt-neox-20b (whereas before, I got either a garbled generation or NaNs in float16).
The cleanup in the config LGTM. Thanks for making the docstrings match the defaults; the two attributes you removed are not used anywhere.
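A minimal sketch of that float32/float16 comparison, for anyone who wants to reproduce it; the prompt and generation settings are illustrative assumptions, not taken from the PR:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-neox-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("GPT-NeoX-20B is a", return_tensors="pt")  # prompt is an assumption

for dtype in (torch.float32, torch.float16):
    # Load the model in each precision and decode greedily so the two runs
    # are directly comparable.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype)
    output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    print(dtype, tokenizer.decode(output[0]))
```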
There are a few equivalence tests failing with this PR; could you dive into them? Let us know if you need any help!
Out of curiosity, why is the second condition needed here, the `past[0] is not None` part?
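For context, a hedged sketch of the kind of guard being asked about; the helper name and cache layout below are illustrative assumptions, not the exact code from this PR:

```python
def get_past_length(past):
    # On the first generation step, `past` can be a tuple whose entries are
    # None, so `past is not None` alone is not enough: indexing into the
    # cached key tensor would fail. Hence the second condition.
    if past is not None and past[0] is not None:
        # past[0][0] is assumed to hold the cached keys, shaped
        # (batch, num_heads, seq_len, head_dim).
        return past[0][0].size(-2)
    return 0
```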
Thanks for cleaning this up!
patrickvonplaten left a comment:
Thanks for fixing @zphang!
Force-pushed from ce7c60e to a84811d.
I've run the tests locally and they pass, so I can't reproduce the test errors. Can someone else give them a try?
The tests pass on GPU but not on CPU on my side, so running them on CPU reproduces the failure.
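For anyone following along, one way to force the run onto CPU; the test path and filter here are assumptions based on the usual transformers layout, not taken from this thread:

```python
import os
import subprocess

# Hide all GPUs so PyTorch falls back to CPU, then run the model's tests.
env = {**os.environ, "CUDA_VISIBLE_DEVICES": ""}
subprocess.run(
    ["python", "-m", "pytest", "tests/models/gpt_neox", "-k", "equivalence"],
    env=env,
    check=False,
)
```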
…ly avoid NaN, update docs (force-pushed from c946908 to d2e9de9)
Thanks again! Nice to be able to use GPT-NeoX in float16 for generations :-)
What does this PR do?
Fixes #17632, and hopefully #17452.
Who can review?
@sgugger