Fix torchscript tests for GPT-NeoX #18012
Conversation
The documentation is not available anymore as the PR was closed or merged.
sgugger left a comment
LGTM, thanks for fixing!
```diff
             beta=1.0,
-            alpha=(1.0 / self.norm_factor),
+            alpha=(torch.tensor(1.0, dtype=self.norm_factor.dtype, device=self.norm_factor.device) / self.norm_factor),
+            # alpha=(1.0 / self.norm_factor),
```
Should be cleaned up.
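For context, here is a minimal sketch of how the patched expression is used. The shapes, variable names, and surrounding `baddbmm` call are illustrative, not the exact `_attn` code: the point is that `alpha` is built from a tensor with `norm_factor`'s dtype and device, so the scaling stays consistent with the module's dtype instead of relying on Python-float/tensor type promotion.

```python
import torch

# Illustrative shapes; the real _attn also handles masking and head reshaping.
batch_heads, q_len, k_len, head_size = 2, 4, 4, 64
query = torch.randn(batch_heads, q_len, head_size)
key = torch.randn(batch_heads, k_len, head_size)

# norm_factor is kept as a tensor (sqrt of the head size), as in the modeling code.
norm_factor = torch.sqrt(torch.tensor(head_size, dtype=torch.float32))

attn_scores = torch.zeros(batch_heads, q_len, k_len, dtype=query.dtype, device=query.device)
attn_scores = torch.baddbmm(
    attn_scores,
    query,
    key.transpose(1, 2),
    beta=1.0,
    # Built with norm_factor's dtype/device rather than as the Python float 1.0,
    # so the division never promotes to a mismatched dtype when the model is cast.
    alpha=(torch.tensor(1.0, dtype=norm_factor.dtype, device=norm_factor.device) / norm_factor),
)
```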
patrickvonplaten left a comment
LGTM!
However, could we add the failing test for reference or do we need to add a new test here?
I updated the PR description to include the current failing test. Regarding new tests, I don't think any are necessary, as we just build the necessary tensors in `__init__` (however, let me know if you have ideas for new test cases!)
Perfect, thanks!
LysandreJik left a comment
LGTM
* fix dtype issue in _attn
* fix RotaryEmbedding
* fix RotaryEmbedding 2
* clean up

Co-authored-by: ydshieh <[email protected]>
What does this PR do?
Fix torchscript tests for GPT-NeoX. The main issue comes from the fact that the current `RotaryEmbedding` changes the model structure in `forward`. This PR creates the necessary embeddings in `__init__`, which basically makes the (embedding) cache mechanism useless. Furthermore, the attribute names seem a bit confusing now. We could probably add an attribute (e.g. `init_sin_cos_cache_seq_len`) to the config with a value `<= max_position_embeddings`, but I think that's way too much. Not certain if it is worth it; however, with a PR opened, we have a reference.
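To illustrate the idea, here is a hedged sketch with assumed names, not the exact code in this PR: the sin/cos tables are built once in `__init__` up to `max_position_embeddings`, so `forward` only slices existing buffers and `torch.jit.trace` sees a static module structure.

```python
import torch
from torch import nn

class RotaryEmbeddingSketch(nn.Module):
    """Sketch of a rotary embedding whose cache is created in __init__, not in forward."""

    def __init__(self, dim, max_position_embeddings, base=10000):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        t = torch.arange(max_position_embeddings, dtype=inv_freq.dtype)
        freqs = torch.einsum("i,j->ij", t, inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        # Precomputed once here; nothing is (re)built inside forward.
        self.register_buffer("cos_cached", emb.cos()[None, None, :, :], persistent=False)
        self.register_buffer("sin_cached", emb.sin()[None, None, :, :], persistent=False)

    def forward(self, x, seq_len):
        # Only slicing happens at runtime, which keeps the module traceable.
        return (
            self.cos_cached[:, :, :seq_len, ...].to(x.dtype),
            self.sin_cached[:, :, :seq_len, ...].to(x.dtype),
        )
```

The trade-off mentioned above is that the lazy cache growth is gone: sequences longer than `max_position_embeddings` would need the tables to be rebuilt.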
The current failing test is:
https://github.com/huggingface/transformers/runs/7216768053?check_suite_focus=true
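For anyone who wants to reproduce the failure locally, here is a rough sketch along the lines of the common torchscript tests; the config values below are arbitrary small ones (assumptions, not what the CI uses):

```python
import torch
from transformers import GPTNeoXConfig, GPTNeoXModel

config = GPTNeoXConfig(
    vocab_size=1024,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=128,
    max_position_embeddings=128,
    torchscript=True,  # make the model return tuples so it can be traced
)
model = GPTNeoXModel(config).eval()
input_ids = torch.randint(0, config.vocab_size, (1, 16))

# Before this PR, torch.jit.trace failed because RotaryEmbedding built new tensors
# inside forward; with the fix, tracing goes through.
traced = torch.jit.trace(model, (input_ids,))
```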