
[NVIDIA] Enable TMA gather4 on sm_120 and sm_121#8498

Merged
masahi merged 10 commits into triton-lang:main from ita9naiwa:sm120tma
Oct 24, 2025
Conversation

@ita9naiwa
Contributor

@ita9naiwa ita9naiwa commented Oct 21, 2025

  • Enable cp.async.bulk.tensor.2d.tile::gather4.shared on sm_120 and sm_121.
  • Skip TMA scatter4 test on sm_120 since it is unsupported by hardware.

Note:
All other TMA features except for cluster-related ones are supported on sm_120.
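
The support matrix described above can be sketched as a few predicates (an illustrative sketch only, not Triton's actual capability checks; the helper names and the `(major, minor)` tuple convention are assumptions):

```python
def is_sm12x(capability):
    """True for sm_120 / sm_121, given a (major, minor) capability tuple."""
    major, minor = capability
    return major == 12 and minor in (0, 1)

def supports_tma_gather4(capability):
    # Assumed here: gather4 is a Blackwell-class (sm_100+) feature,
    # which this PR extends to sm_120 / sm_121.
    major, _ = capability
    return major >= 10

def supports_tma_scatter4(capability):
    # scatter4 remains unsupported by sm_120 / sm_121 hardware.
    return supports_tma_gather4(capability) and not is_sm12x(capability)
```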

@ita9naiwa ita9naiwa requested a review from ptillet as a code owner October 21, 2025 09:26
@ita9naiwa
Contributor Author

ita9naiwa commented Oct 21, 2025

I intended to open this against my local fork, but I made a mistake. Sorry.

@ita9naiwa ita9naiwa closed this Oct 21, 2025
@ita9naiwa ita9naiwa changed the title Support tma scatter, disable gather test [NVIDIA] Enable TMA gather4 on sm_120 and sm_121 Oct 21, 2025
@ita9naiwa
Contributor Author

Ready for review!

@ita9naiwa ita9naiwa reopened this Oct 21, 2025
@masahi masahi requested review from Mogball and ThomasRaoux and removed request for ptillet October 21, 2025 20:59
Comment thread test/Conversion/tma_to_llvm.mlir Outdated
Comment thread third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/LoadStoreOpToLLVM.cpp Outdated
Comment on lines +1699 to +1705
AsyncTMAGatherOpConversion(LLVMTypeConverter &converter,
PatternBenefit benefit, int computeCapability)
: ConvertOpToLLVMPattern<triton::nvidia_gpu::AsyncTMAGatherOp>(converter,
benefit),
computeCapability(computeCapability) {}

int computeCapability;
Collaborator


we can revert that too?

Contributor Author


Thanks for catching what I missed! Fixed!

@masahi masahi merged commit 4d85824 into triton-lang:main Oct 24, 2025
9 checks passed
@ita9naiwa ita9naiwa deleted the sm120tma branch October 24, 2025 05:55
masahi pushed a commit to masahi/triton that referenced this pull request Oct 24, 2025
@mobicham

How much does TMA speed things up vs non-TMA on sm_120, btw?

tmoreau89 pushed a commit to tmoreau89/triton that referenced this pull request Dec 1, 2025
janreges added a commit to janreges/vllm that referenced this pull request Dec 21, 2025
- Add SM120 to triton_kernels_supported condition in both backend
  selection functions (get_mxfp4_backend, get_mxfp4_backend_with_lora)
- Use StridedLayout for SM120 to avoid "Must use persistent kernel"
  error caused by unsupported cluster TMA operations
- Configure SM120-specific constraints: is_persistent=False, num_stages=1

Tested on NVIDIA RTX PRO 6000 Blackwell (compute capability 12.0).
Requires Triton fix: triton-lang/triton#8498
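
The SM120-specific constraints listed in that commit could be expressed roughly like this (a hypothetical sketch; the function name, config keys, and layout strings are assumptions, not vLLM's real API):

```python
def mxfp4_kernel_config(capability):
    """Pick MXFP4 kernel constraints by compute capability (illustrative)."""
    major, minor = capability
    # Default path for datacenter parts (assumed values for illustration).
    cfg = {"is_persistent": True, "num_stages": 3, "layout": "Swizzled"}
    if major == 12:
        # sm_120 / sm_121 lack cluster TMA operations, so avoid the
        # "Must use persistent kernel" path: strided layout, no
        # persistence, a single pipeline stage.
        cfg.update(is_persistent=False, num_stages=1, layout="Strided")
    return cfg
```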
meta-codesync Bot pushed a commit to facebookexperimental/triton that referenced this pull request Mar 28, 2026
…n sm_120 and sm_121 (#8498)'

Summary:
This is a cherry-pick of an upstream PR: triton-lang/triton#8498

Upstream commit message:
```
> [NVIDIA] Enable TMA gather4 on sm_120 and sm_121 (#8498)

> - Enable cp.async.bulk.tensor.2d.tile::gather4.shared on sm_120 and
> sm_121.
> - Skip TMA scatter4 test on sm_120 since it is unsupported by hardware.

> Note:
> All other TMA features except for cluster-related ones are supported on
> sm_120.
```

Conflict Resolution:
- File: python/test/unit/language/test_tensor_descriptor.py
  Action: Added the is_sm12x() skipif decorator from upstream; kept the local function signature without the 'device' param (the body uses a hardcoded 'cuda')
  Reason: Local version intentionally omits device fixture for this test; upstream's intent was to add the sm120 skip guard
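
The skip guard mentioned in the resolution might look like the following (a sketch; `device_capability` is a stand-in for a real query such as `torch.cuda.get_device_capability()`, and the test name is hypothetical):

```python
import pytest

def device_capability():
    # Stub for illustration; a real helper would query the active GPU.
    return (12, 0)

def is_sm12x():
    major, _ = device_capability()
    return major == 12

@pytest.mark.skipif(is_sm12x(), reason="TMA scatter4 is unsupported on sm_120/121")
def test_tma_scatter4():
    ...  # would exercise the scatter4 path on supported hardware
```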

Raw Conflicts: https://www.internalfb.com/intern/paste/P2251271000/
Resolution Diff: https://www.internalfb.com/intern/paste/P2251271369/

***Do not remove the following line from this commit***
Reactor Cherry-pick Revision: 4d85824

Reviewed By: dshi7

Differential Revision: D98272343

fbshipit-source-id: 8578ef3a83f2a4120369c969a58ed6e34adb6deb
