
Conversation

@cyanguwa (Collaborator) commented Sep 2, 2025

Description

This PR adds sink attention support (forward + backward) to TE-PyTorch; a reference sketch of the sink softmax follows the list below.

  • FusedAttention backend for FP16/BF16 and BSHD/SBHD formats (requires cuDNN 9.13.1+ and cudnn-frontend 1.14.1)
  • UnfusedDotProductAttention backend for FP32/FP16/BF16 and BSHD/SBHD formats
  • Context-parallel support for cp_comm_type="a2a" with the FusedAttention backend
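For reference, a minimal sketch of the sink ("off-by-one") softmax that this feature implements, in plain PyTorch rather than TE's actual kernels; the function name, BHSD layout, and per-head sink parameter are illustrative assumptions, not TE's API:

```python
import torch

def sink_attention_reference(q, k, v, sink):
    """Attention with a learnable per-head sink logit (illustrative only).

    q, k, v: [batch, heads, seq, dim]; sink: [heads] learnable logits.
    The sink joins the softmax as an extra column and absorbs probability
    mass, but contributes no value vector to the output.
    """
    scale = q.shape[-1] ** -0.5
    logits = torch.matmul(q, k.transpose(-2, -1)) * scale   # [b, h, sq, sk]
    b, h, sq, _ = logits.shape
    sink_col = sink.view(1, h, 1, 1).expand(b, h, sq, 1)    # one sink per row
    probs = torch.softmax(torch.cat([logits, sink_col], dim=-1), dim=-1)
    return torch.matmul(probs[..., :-1], v)                 # drop the sink column
```

Because the sink column is dropped after the softmax, each row of attention probabilities sums to less than one, which is what lets the model park attention mass on no token at all.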

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Added sink attention support via cuDNN
  • Added cp_comm_type and softmax_type to AttentionParams (a sketch of the a2a layout exchange follows this list)
  • Improved tensor allocation in pytorch/csrc/extensions/attention.cpp and tensor indexing in Aux_CTX_Tensors in fused_attn_f16_arbitrary_seqlen.cu
  • Improved CP unit tests
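For context on the cp_comm_type="a2a" path: all-to-all context parallelism trades sequence sharding for head sharding before attention and trades back afterwards. Below is a single-process simulation of that layout exchange under assumed BSHD-like shapes; the real implementation uses torch.distributed collectives, and the function name is hypothetical:

```python
import torch

def a2a_seq_to_head(shards, cp_size):
    """Simulate the all-to-all reshard used by cp_comm_type="a2a".

    shards: [cp_size, b, s_local, h, d] -- each "rank" holds a sequence shard.
    returns: [cp_size, b, s_local * cp_size, h // cp_size, d] -- each "rank"
             now holds the full sequence but only h // cp_size heads.
    """
    cp, b, s_local, h, d = shards.shape
    x = shards.view(cp, b, s_local, cp, h // cp, d)  # split heads into cp groups
    x = x.permute(3, 1, 0, 2, 4, 5)                  # head-group i collects every seq chunk
    return x.reshape(cp, b, cp * s_local, h // cp, d)
```

With whole heads and the full sequence on each rank, the fused attention kernel (including the sink softmax) runs unmodified on the local head group; a second all-to-all restores the sequence sharding afterwards.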

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

cyanguwa and others added 30 commits September 2, 2025 15:00
@cyanguwa cyanguwa added the 2.8.0 label Sep 8, 2025
@cyanguwa (Collaborator, Author)

/te-ci L1

@jeremyyx

@cyanguwa Thank you for the excellent work! I have merged the feature into my own Megatron fork, but I found that every attention backend reports "not supported" when qkv_format is set to 'thd'. Are there any workarounds? At the moment, packing mode seems to require qkv_format to be 'thd'.

@cyanguwa (Collaborator, Author)

/te-ci L1

@cuichenx (Contributor) left a comment

LGTM. Tested this branch with NeMo training and convergence looks good.

@cyanguwa (Collaborator, Author) commented Sep 21, 2025

Tested with cuDNN 9.13.1 in pipeline 35266475. Everything looks good except the cp_4_0 tests, which I have reported to cuDNN as bug 5522629. Based on @cuichenx's review and convergence testing, I'm merging the PR.

@cyanguwa cyanguwa requested a review from ptrendx September 21, 2025 06:03
@cyanguwa (Collaborator, Author)

> Nice work! How could I find cudnn-frontend 1.14.1? I can only install 1.14.0, and even the latest cudnn-frontend clone still does not have "set_dsink_token".

Sorry about the late reply. You have probably found the frontend update to 1.14.1 in this PR; the FE commit is now 1a7b4b7.

@cyanguwa (Collaborator, Author)

> @cyanguwa Thank you for the excellent work! I have merged the feature into my own Megatron fork, but I found that every attention backend reports "not supported" when qkv_format is set to 'thd'. Are there any workarounds? At the moment, packing mode seems to require qkv_format to be 'thd'.

We will add support for 'thd' in our next PR. Due to the release schedule, we didn't have time to include more changes, so 'bshd' and 'sbhd' are supported for now. Thanks!
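Until 'thd' lands, one possible workaround is to unpack thd tensors into padded bshd and run with a padding mask. A rough sketch, assuming cu_seqlens in the usual cumulative-lengths convention; the helper name is illustrative, not a TE utility:

```python
import torch

def thd_to_bshd(packed, cu_seqlens, max_seqlen):
    """Unpack a thd tensor [total_tokens, h, d] into padded bshd [b, s, h, d]."""
    b = cu_seqlens.numel() - 1
    out = packed.new_zeros(b, max_seqlen, packed.shape[1], packed.shape[2])
    for i in range(b):
        start, end = cu_seqlens[i].item(), cu_seqlens[i + 1].item()
        out[i, : end - start] = packed[start:end]
    return out  # run attention with a padding mask over the padded positions
```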

@cyanguwa cyanguwa merged commit 5e4e0b2 into NVIDIA:main Sep 22, 2025
12 checks passed
KshitijLakhani pushed a commit that referenced this pull request Sep 25, 2025
* first draft; debug plan failure
* debug uid error
* tweak params
* add grad in output
* clean up prints
* fix prints in test
* Apply 1 suggestion(s) to 1 file(s)
* address review comments
* fix unfused grad; add softmax_type; add sink to bwd
* Apply 1 suggestion(s) to 1 file(s)
* fix padding mask; add swa tests; remove requires_grad for off-by-one
* update FE
* Apply 1 suggestion(s) to 1 file(s) (×9)
* fix indent
* fix non-determinism and shapes
* clean up prints
* add GQA
* add CP A2A; dq/dk mismatches
* fix CP A2A; need cleaner solution
* fix CP A2A; pending cudnn kernel change
* minor fixes
* fix world size in unit test; avoid thd format
* fix kernel_backend, dtype in unit test; fix head_dim for FP8 Hopper
* fix thd logic
* fix fp8 context
* tweak CP logging
* allow no_mask/padding for SWA(left,0)
* Revert "allow no_mask/padding for SWA(left,0)" (reverts commit 08b4ccc67a08b6882080b06aa715f541bb832aca)
* add softmax_type to Jax
* add cuDNN version control
* prettify tests
* skip 9.13 for MLA, non 192/128
* rename compare_with_error
* small cleanups and improvements
* fix minor CI failures
* force sink/dsink to be float32
* switch FE to GH FE
* return to GH TE main FE commit
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* update FE to 1.14.1
* clean up before CI
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix lint
* bump up cudnn version
* add backend selection guard for unit tests
* add docstring for softmax type enums in C
---------

Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Successfully merging this pull request may close these issues: Add attention sink to flash attention

4 participants