Fix GPT-NeoX-20B past handling, attention computation #17811
sgugger merged 2 commits into huggingface:main
Conversation
Force-pushed from f6c9561 to a84811d.
Thanks for the fix! I can confirm that with this PR, I get the same generations in float32 and float16 for EleutherAI/gpt-neox-20b (whereas before, I got either a garbled generation or NaNs in float16).
The cleanup in the config LGTM. Thanks for making the docstrings match the defaults; the two attributes you removed are not used anywhere.
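A minimal sketch of that float32/float16 comparison, for anyone who wants to reproduce it; the prompt and generation settings are illustrative assumptions, not taken from the PR:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-neox-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("GPT-NeoX-20B is a", return_tensors="pt")  # prompt is an assumption

for dtype in (torch.float32, torch.float16):
    # Load the model in each precision and decode greedily so the two runs
    # are directly comparable.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype)
    output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    print(dtype, tokenizer.decode(output[0]))
```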
There are a few equivalence tests failing with this PR; could you dive into them? Let us know if you need any help!
Out of curiosity, why is the second condition needed here, the `past[0] is not None` part?
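For context, a hedged sketch of the kind of guard being asked about; the helper name and cache layout below are illustrative assumptions, not the exact code from this PR:

```python
def get_past_length(past):
    # On the first generation step, `past` can be a tuple whose entries are
    # None, so `past is not None` alone is not enough: indexing into the
    # cached key tensor would fail. Hence the second condition.
    if past is not None and past[0] is not None:
        # past[0][0] is assumed to hold the cached keys, shaped
        # (batch, num_heads, seq_len, head_dim).
        return past[0][0].size(-2)
    return 0
```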
Thanks for cleaning this up!
patrickvonplaten left a comment:
Thanks for fixing @zphang!
Force-pushed from ce7c60e to a84811d.
I've run the tests locally and they pass, so I can't reproduce the test errors. Can someone else give them a try?
The tests pass on GPU but not on CPU on my side, so running them on CPU reproduces the failure.
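For anyone following along, one way to force the run onto CPU; the test path and filter here are assumptions based on the usual transformers layout, not taken from this thread:

```python
import os
import subprocess

# Hide all GPUs so PyTorch falls back to CPU, then run the model's tests.
env = {**os.environ, "CUDA_VISIBLE_DEVICES": ""}
subprocess.run(
    ["python", "-m", "pytest", "tests/models/gpt_neox", "-k", "equivalence"],
    env=env,
    check=False,
)
```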
…ly avoid NaN, update docs (force-pushed from c946908 to d2e9de9)
Thanks again! Nice to be able to use GPT-NeoX in float16 for generations :-)
What does this PR do?
Fixes #17632, and hopefully #17452.
Who can review?
@sgugger