[Reland][NVIDIA] Support swizzle 0 TMA + MMA for Hopper and Blackwell by masahi · Pull Request #10148 · triton-lang/triton

masahi · 2026-04-27T22:06:03Z

Compared to #9931:

When looking for an equivalent LL in the loop https://github.com/masahi/triton/blob/62b3a08c1b205522991148b4d0b6d761e0ecb369/lib/Dialect/TritonNvidiaGPU/Transforms/OptimizeDescriptorEncoding.cpp#L69-L79, we now guard against LL creation failure. Previously, I was using ttg::areLayoutsEquivalent(shape, sharedLinear, candidate), but that check can fail with a non-recoverable error when the shape and the nvmma_shared attributes are incompatible. I added tryNvmmaSharedToLinearLayout as a safe way to create an LL; layout equivalence is tested only if that creation succeeds. An alternative would be to decide whether LL creation is guaranteed to succeed before calling ttg::areLayoutsEquivalent. I didn't investigate the feasibility and completeness of that path in depth.
The new rewrite in OptimizeDotOperands always updates the operand encoding with #shared_linear whenever view operations are present. The premise of this rewrite is that it preserves the operand encoding and propagates the #shared_linear encoding upward. It should therefore not fire when the operand encoding is #nvmma_shared, because in that case the encoding gets replaced with an equivalent #shared_linear encoding. Although such a replacement is benign in principle, I decided to limit the rewrite to MMA ops whose operand encoding is already #shared_linear, since that was the original use case it was intended for. This change is unrelated to the regression, but I added it for additional safety.
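
The guarded candidate search described in the first point can be sketched in standalone C++. This is only an illustration of the control flow, not the code in OptimizeDescriptorEncoding.cpp: `ToyLayout`, `ToyCandidate`, `tryBuildLayout`, `findEquivalentCandidate`, and the compatibility rule are all hypothetical stand-ins, and `std::optional` is used here in place of whatever failure type the real helper returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a linear layout: only the number of output
// elements it covers. The real LinearLayout carries far more structure.
struct ToyLayout {
  int64_t totalOutSize;
  bool operator==(const ToyLayout &other) const {
    return totalOutSize == other.totalOutSize;
  }
};

// Hypothetical stand-in for a candidate nvmma_shared encoding.
struct ToyCandidate {
  int64_t minElems; // smallest number of elements this candidate can describe
};

int64_t product(const std::vector<int64_t> &shape) {
  int64_t p = 1;
  for (int64_t d : shape)
    p *= d;
  return p;
}

// Fallible construction: returns std::nullopt instead of aborting when the
// candidate is incompatible with the shape (made-up compatibility rule).
std::optional<ToyLayout> tryBuildLayout(const std::vector<int64_t> &shape,
                                        const ToyCandidate &candidate) {
  assert(!shape.empty() && "shape must have at least one dimension");
  int64_t numElems = product(shape);
  if (candidate.minElems > numElems)
    return std::nullopt;
  return ToyLayout{numElems};
}

// Returns the index of the first candidate whose layout can be built and
// equals `target`, or -1 if there is none.
int findEquivalentCandidate(const std::vector<int64_t> &shape,
                            const ToyLayout &target,
                            const std::vector<ToyCandidate> &candidates) {
  for (std::size_t i = 0; i < candidates.size(); ++i) {
    std::optional<ToyLayout> layout = tryBuildLayout(shape, candidates[i]);
    if (!layout)
      continue; // creation failed: skip this candidate, do not abort
    if (*layout == target)
      return static_cast<int>(i);
  }
  return -1;
}
```

The point of the sketch is the ordering: the equivalence comparison runs only after construction has succeeded, so an incompatible candidate is skipped rather than triggering a fatal error.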

lezcano · 2026-04-28T12:19:03Z

I asked codex to put together the diff:
3130b82...compare/pr10148-on-pr9931

lezcano · 2026-04-28T12:21:36Z

+    // Restrict this rewrite to an operand which already uses a shared-linear
+    // encoding. Backward propagation through tensor reshape/trans is not
+    // encoding-stable for NVMMAShared.
+    if (!isa<SharedLinearEncodingAttr>(operandTy.getEncoding()))
+      return failure();


MMAs accept both shared-linear layouts and nvmma_shared ones, so this change is really benign. Do you have any particular concerns?

I don't have an actual example that would be broken if an mma operand with #nvmma_shared gets replaced by #shared_linear. I added this skip as an extra safety measure, since I wouldn't be surprised if some pass depends on having #nvmma_shared in an mma operand.

I think this is also more in line with the original purpose of this rewrite: some earlier pass such as AccelerateMatmul fixes the operand encoding to #shared_linear when it identifies a need to do so, and this rewrite is supposed to propagate that encoding up through the preceding chain of view ops. So it doesn't make sense for the operand encoding to differ before and after this pass.

lezcano · 2026-04-28T12:23:46Z

+  // This condition can fail if a layout is speculatively constructed for
+  // equivalence checking.
+  if (layout.getTotalOutDimSize() != product(maybeTransposedTmaShape))
+    return failure();


Where can this happen exactly? It feels like quite a big issue.

A concrete test case that fails this condition is this one: https://github.com/masahi/triton/blob/58c3b956958f572e1f6bfa3ddbd865c9cac40763/test/TritonNvidiaGPU/optimize_descriptor_encoding.mlir#L180-L188

We call buildNvmmaSharedLinearLayout with shape [1, 16, 1, 16] and various candidate nvmma_shared encodings. For some candidates, it seems ensureLayoutNotSmallerThan can return a layout that covers more output elements than the shape [1, 16, 1, 16] contains.
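
To make the size mismatch concrete, here is a minimal standalone sketch of the arithmetic behind the guard. It is a toy model under stated assumptions: `padToMinTile` is a hypothetical shape-level stand-in for what ensureLayoutNotSmallerThan does to a layout, the minimum tile `[1, 8, 1, 32]` is a made-up value, and `coversExactly` only mirrors the form of the `getTotalOutDimSize() != product(...)` comparison in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

int64_t product(const std::vector<int64_t> &shape) {
  int64_t p = 1;
  for (int64_t d : shape)
    p *= d;
  return p;
}

// Hypothetical toy model: grow each dimension until it is at least as large
// as the corresponding dimension of a minimum tile.
std::vector<int64_t> padToMinTile(const std::vector<int64_t> &shape,
                                  const std::vector<int64_t> &minTile) {
  assert(shape.size() == minTile.size());
  std::vector<int64_t> padded(shape.size());
  for (std::size_t i = 0; i < shape.size(); ++i)
    padded[i] = std::max(shape[i], minTile[i]);
  return padded;
}

// Same form as the guard in the diff: the layout must cover exactly as many
// elements as the block shape, no more and no fewer.
bool coversExactly(const std::vector<int64_t> &layoutShape,
                   const std::vector<int64_t> &blockShape) {
  return product(layoutShape) == product(blockShape);
}
```

With the block shape `[1, 16, 1, 16]` (256 elements), padding against the assumed tile `[1, 8, 1, 32]` yields `[1, 16, 1, 32]` (512 elements), so `coversExactly` is false and a speculative construction of this kind would be rejected.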

ThomasRaoux · 2026-04-28T22:51:43Z

+  if (failed(layout))
+    llvm::report_fatal_error("Illegal shared layout");


I think it is fine to keep that along with the emitError

I hope the current code after 58c3b95 has addressed this comment.

masahi · 2026-04-28T23:06:12Z

Also removed tryNvmmaSharedToLinearLayout, since it is now possible, and cleaner, to add a safe version of nvmmaSharedToLinearLayout and reimplement the existing unsafe version in terms of the safe one.
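
The "unsafe version in terms of the safe one" structure can be sketched as follows. This is a standalone illustration, not the actual Triton code: `tryToyLayout`, `toyLayoutOrDie`, `ToyLayout`, and the row-size rule are hypothetical, `std::optional` stands in for the real failure type, and `std::abort` stands in for `llvm::report_fatal_error`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <optional>
#include <vector>

// Hypothetical stand-in for a linear layout.
struct ToyLayout {
  int64_t totalOutSize;
};

// Safe version: reports failure through the return value so that callers
// which construct layouts speculatively can recover.
// Made-up rule: with a nonzero swizzle width, the innermost row must span at
// least `swizzleBytes` bytes.
std::optional<ToyLayout> tryToyLayout(const std::vector<int64_t> &shape,
                                      int64_t elemBytes,
                                      int64_t swizzleBytes) {
  assert(elemBytes > 0);
  if (shape.empty())
    return std::nullopt;
  int64_t rowBytes = shape.back() * elemBytes;
  if (swizzleBytes > 0 && rowBytes < swizzleBytes)
    return std::nullopt;
  int64_t total = 1;
  for (int64_t d : shape)
    total *= d;
  return ToyLayout{total};
}

// Unsafe version: a thin wrapper that turns failure into a fatal error, for
// callers that already know the layout must be legal.
ToyLayout toyLayoutOrDie(const std::vector<int64_t> &shape, int64_t elemBytes,
                         int64_t swizzleBytes) {
  std::optional<ToyLayout> layout = tryToyLayout(shape, elemBytes, swizzleBytes);
  if (!layout) {
    std::fprintf(stderr, "Illegal shared layout\n");
    std::abort();
  }
  return *layout;
}
```

Keeping the validation in one place means the speculative caller and the asserting caller cannot drift apart: both go through the same checks and differ only in how a failure is surfaced.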

lezcano

New diff for review: 3130b82...compare/pr10148-vs-pr9931

lezcano · 2026-04-30T09:31:10Z

+  // This condition can fail if a layout is speculatively constructed for
+  // equivalence checking.
+  if (layout.getTotalOutDimSize() != product(maybeTransposedTmaShape))
+    return failure();


I still get the feeling that there's a better place to catch this one than this late, but sure.

masahi and others added 30 commits March 25, 2026 04:02

Add swizzle=0 TCGen5 operand-view memdesc rewrite and lit test

8317097

cmake fix

1939857

works

7d1e42c

make it work for other dot ops

a86d083

fix

d2955e7

fix

28d35fa

[TritonGPU] Match swizzle0 operand-view rewrite from local_load sourc…

638c3b0

…e on Hopper

[TritonGPU] Use source shared encoding for swizzle0 operand-view rewrite

3375a12

fix

9f559e9

clean

390b118

simplify

3782068

remove pattern matching against desc load

8707f6d

upd lit test

5ea9724

fix

12cb8e0

fix for bw

07119d3

update bw lit

746c28a

update for hop

1d02e00

upd

be6eb93

upd

0fa2e71

clean test

5e45dac

refactoring operand update

e7d54f8

wip

3291122

more

6637c0d

refactor

9dcce40

wip

9144860

fix

da8d60c

more clean

a41052a

add comment

d3eee96

remove stale include

b9b6eb4

Merge branch 'main' into tma-mma-swizzle-0

0699532

masahi added 2 commits April 17, 2026 07:46

[TritonGPU] Restrict operand-view rewrite to shared_linear

102193e

Merge branch 'main' into swizzle-0-fix

30f12e1

masahi requested review from lezcano and ptillet as code owners April 27, 2026 22:06

format

62b3a08

masahi marked this pull request as draft April 27, 2026 22:28

lezcano reviewed Apr 28, 2026

Comment thread lib/Dialect/TritonGPU/IR/Dialect.cpp Outdated

masahi added 3 commits April 28, 2026 20:01

[TritonGPU] Simplify TMA block shape diagnostics

de8af7a

[TritonGPU] Simplify TMA block shape error helper

2bb6ccd

[TritonGPU] Drop stale TMA helper suffixes

6963198

lezcano reviewed Apr 28, 2026

Comment thread lib/Dialect/TritonGPU/IR/Dialect.cpp Outdated

masahi added 2 commits April 28, 2026 20:39

minor change in LinearLayoutConversions.cpp

ed11c36

inline error emit

ff09a8a

masahi marked this pull request as ready for review April 28, 2026 11:46

masahi requested review from ThomasRaoux and lezcano April 28, 2026 11:46

lezcano reviewed Apr 28, 2026

tolleybot mentioned this pull request Apr 28, 2026

Mamba3 backward pass (mamba3_siso_bwd_kernel_dqkv) 38.7x slower on GB200 (SM100) due to ptxas C7907 eliminating autotuner configs state-spaces/mamba#904

Open

ThomasRaoux reviewed Apr 28, 2026

Comment thread lib/Dialect/TritonGPU/IR/Dialect.cpp Outdated

masahi added 2 commits April 29, 2026 07:38

more inline error msg

b7f1acd

remove tryGetTMABlockShape

0171037

ThomasRaoux reviewed Apr 28, 2026

removed tryNvmmaSharedToLinearLayout by adding a safe version of nvmm…

58c3b95

…aSharedToLinearLayout

Merge branch 'main' into swizzle-0-fix

157f580

masahi requested review from ThomasRaoux and lezcano April 30, 2026 08:04

lezcano approved these changes Apr 30, 2026

masahi merged commit 0f2a3b1 into triton-lang:main Apr 30, 2026
23 of 27 checks passed

3 participants
