[PyTorch] Add sink attention support from cuDNN #2148
Conversation
Signed-off-by: Charlene Yang <[email protected]>
/te-ci L1
@cyanguwa Thank you for the excellent work! I have merged the feature into my own Megatron fork, but every attention backend reports "not supported" when qkv_format is set to 'thd'. Are there any workarounds? At the moment, packed mode seems to require the 'thd' qkv format.
/te-ci L1
cuichenx left a comment:
LGTM. Tested this branch with nemo training and convergence looks good.
Tested with cuDNN 9.13.1 in pipeline 35266475. All looks good except for
Sorry about the late reply. You may have already seen that this PR updates the cudnn-frontend to 1.14.1; the FE commit is now 1a7b4b7.
We will add support for 'thd' in our next PR. Due to the release schedule, we didn't have time to push more changes in, so 'bshd' and 'sbhd' are supported for now. Thanks!
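For readers unfamiliar with the layout names in this thread: 'bshd' is (batch, seq, heads, dim), 'sbhd' is (seq, batch, heads, dim), and 'thd' is the packed variable-length layout (total_tokens, heads, dim) described by cumulative sequence lengths. A rough NumPy illustration (shapes and names are chosen for this sketch; this is not TE's internal code):

```python
import numpy as np

# Illustrative dimensions: batch, max seq, heads, head dim.
b, s, h, d = 2, 4, 3, 8
x_bshd = np.random.rand(b, s, h, d)

# Actual per-sequence lengths (<= s); padding beyond these is dropped
# when packing into 'thd'.
seqlens = np.array([4, 2])
cu_seqlens = np.concatenate([[0], np.cumsum(seqlens)])  # [0, 4, 6]

# 'thd': concatenate only the valid tokens of each sequence.
x_thd = np.concatenate([x_bshd[i, :seqlens[i]] for i in range(b)], axis=0)
assert x_thd.shape == (seqlens.sum(), h, d)

# 'sbhd' is just a transpose of 'bshd'.
x_sbhd = x_bshd.transpose(1, 0, 2, 3)
assert x_sbhd.shape == (s, b, h, d)
```

With fixed-length batches the three layouts carry the same data; 'thd' only pays off when sequence lengths vary, since it stores no padding tokens.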
Squashed commit history (per-commit trailers deduplicated):
* first draft; debug plan failure
* debug uid error
* tweak params
* add grad in output
* clean up prints
* fix prints in test
* Apply 1 suggestion(s) to 1 file(s)
* address review comments
* fix unfused grad; add softmax_type; add sink to bwd
* Apply 1 suggestion(s) to 1 file(s)
* fix padding mask; add swa tests; remove requires_grad for off-by-one
* update FE
* Apply 1 suggestion(s) to 1 file(s) (repeated for several review suggestions, co-authored by Chen Cui)
* fix indent
* fix non-determinism and shapes
* clean up prints
* add GQA
* add CP A2A; dq/dk mismatches
* fix CP A2A; need cleaner solution
* fix CP A2A; pending cudnn kernel change
* minor fixes
* fix world size in unit test; avoid thd format
* fix kernel_backend, dtype in unit test; fix head_dim for FP8 Hopper
* fix thd logic
* fix fp8 context
* tweak CP logging
* allow no_mask/padding for SWA(left,0)
* Revert "allow no_mask/padding for SWA(left,0)" (reverts commit 08b4ccc67a08b6882080b06aa715f541bb832aca)
* add softmax_type to Jax
* add cuDNN version control
* prettify tests
* skip 9.13 for MLA, non 192/128
* rename compare_with_error
* small cleanups and improvements
* fix minor CI failures
* force sink/dsink to be float32
* switch FE to GH FE
* return to GH TE main FE commit
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* update FE to 1.14.1
* clean up before CI
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix lint
* bump up cudnn version
* add backend selection guard for unit tests
* add docstring for softmax type enums in C

Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Description
This PR adds sink attention support (fwd + bwd) to TE-PyTorch:
- FusedAttention backend for FP16/BF16 and BSHD/SBHD (requires cuDNN 9.13.1+ and cudnn-frontend 1.14.1)
- UnfusedDotProductAttention backend for FP32/FP16/BF16 and BSHD/SBHD
- cp_comm_type=a2a with FusedAttention

Type of change
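For context, "sink attention" adds a learnable per-head sink logit to the softmax denominator, so attention weights can sum to less than 1 and surplus probability mass is absorbed by the sink. A minimal NumPy sketch of the idea (illustrative only; the name sink_logit is hypothetical, and this is not TE's or cuDNN's implementation):

```python
import numpy as np

def sink_softmax(scores, sink_logit):
    """Softmax over attention scores with an extra sink logit.

    scores: (seq_q, seq_k) raw attention logits for one head.
    sink_logit: scalar logit of the implicit "sink" position.
    The sink contributes to the denominator but produces no output
    column, so each row sums to less than 1.
    """
    # Stabilize against the row max, including the sink logit.
    m = np.maximum(scores.max(axis=-1, keepdims=True), sink_logit)
    e = np.exp(scores - m)
    denom = e.sum(axis=-1, keepdims=True) + np.exp(sink_logit - m)
    return e / denom

scores = np.array([[2.0, 1.0, 0.5]])
w = sink_softmax(scores, sink_logit=0.0)
# Each row sums to < 1; the remainder is the mass absorbed by the sink.
```

With sink_logit = 0 this reduces to the "off-by-one" softmax (an extra 1 in the denominator) that the commit history above alludes to; in the fused backend the sink is a trainable float32 tensor with its own gradient (dsink).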
Changes
Please list the changes introduced in this PR:
- Added cp_comm_type and softmax_type to AttentionParams
- Changes to pytorch/csrc/extensions/attention.cpp and the tensor indexing for Aux_CTX_Tensors in fused_attn_f16_arbitrary_seqlen.cu

Checklist: