Drop RoPE when filling KV cache #3346
@GD06 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary: This PR provides CUDA kernels to fill KV cache without applying ROPE.
Reviewed By: jianyuh
Differential Revision: D66307820
Pulled By: GD06
This pull request was exported from Phabricator. Differential Revision: D66307820
Summary:
X-link: facebookresearch/FBGEMM#488
This PR provides CUDA kernels to fill KV cache without applying ROPE.
Reviewed By: jianyuh
Differential Revision: D66307820
Pulled By: GD06
This PR adds CUDA kernels that fill the KV cache without applying RoPE (rotary position embedding).
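
For readers unfamiliar with the distinction, here is a minimal CPU reference sketch in plain Python of what "filling the KV cache without RoPE" could mean. It is not the kernel code from this PR and does not mirror any FBGEMM signature: the names `rope_rotate` and `fill_kv_cache`, the single-head nested-list cache layout, the interleaved pair convention, and the base `theta=10000.0` are all assumptions made for illustration.

```python
import math


def rope_rotate(vec, pos, theta=10000.0):
    """Rotate consecutive pairs (vec[i], vec[i+1]) by a position-dependent angle.

    Hypothetical helper: uses the interleaved-pair RoPE convention, which is
    an assumption and may differ from the layout the real kernels use.
    """
    d = len(vec)
    out = list(vec)
    for i in range(0, d, 2):
        angle = pos * theta ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = vec[i], vec[i + 1]
        out[i] = x0 * c - x1 * s
        out[i + 1] = x0 * s + x1 * c
    return out


def fill_kv_cache(cache_k, cache_v, new_k, new_v, batch_idx, positions,
                  apply_rope=True):
    """Write each new token's K and V vectors into cache[batch][position].

    With apply_rope=False the K vector is copied unchanged, which is the
    behaviour this sketch attributes to a "no RoPE" fill. V is never rotated.
    """
    for k, v, b, pos in zip(new_k, new_v, batch_idx, positions):
        cache_k[b][pos] = rope_rotate(k, pos) if apply_rope else list(k)
        cache_v[b][pos] = list(v)


# Usage: one sequence, three cache slots, head dimension 4.
cache_k = [[[0.0] * 4 for _ in range(3)]]
cache_v = [[[0.0] * 4 for _ in range(3)]]
fill_kv_cache(cache_k, cache_v,
              new_k=[[1.0, 0.0, 1.0, 0.0]], new_v=[[5.0, 6.0, 7.0, 8.0]],
              batch_idx=[0], positions=[2], apply_rope=False)
```

In this sketch the only difference between the two paths is whether K is rotated before the write; the cache indexing and the V write are identical in both.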