[Reland][NVIDIA] Support swizzle 0 TMA + MMA for Hopper and Blackwell#10148

Merged: masahi merged 72 commits into triton-lang:main from masahi:swizzle-0-fix on Apr 30, 2026

masahi (Collaborator) commented Apr 27, 2026

Compared to #9931:

  • When looking for an equivalent LL (linear layout) in the loop at https://github.com/masahi/triton/blob/62b3a08c1b205522991148b4d0b6d761e0ecb369/lib/Dialect/TritonNvidiaGPU/Transforms/OptimizeDescriptorEncoding.cpp#L69-L79, we now guard against LL creation failure. Previously, I used ttg::areLayoutsEquivalent(shape, sharedLinear, candidate), but that check can fail with a non-recoverable error when the shape is incompatible with the candidate nvmma_shared attributes. I added tryNvmmaSharedToLinearLayout as a safe way to create an LL; the layouts are tested for equivalence only if creation succeeds. An alternative would be to decide up front whether LL creation is guaranteed to succeed before calling ttg::areLayoutsEquivalent, but I didn't investigate the feasibility or completeness of that approach in depth.

  • The new rewrite in OptimizeDotOperands always updated the operand encoding to #shared_linear whenever view operations were present. The premise of this rewrite is that it preserves the operand encoding and propagates the #shared_linear encoding upward, so it should not fire when the operand encoding is #nvmma_shared; in that case, the encoding was being replaced with an equivalent #shared_linear encoding. Although this replacement is benign in principle, I decided to restrict the rewrite to MMA ops whose operand encoding is already #shared_linear, since that was the original use case it was intended for. This change is unrelated to the regression, but I added it as an additional safety measure.
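The guarded equivalence check from the first bullet can be sketched as a small, self-contained model. This is not Triton's code: `Layout`, `Candidate`, `tryBuildLayout`, and `findEquivalentCandidate` are hypothetical stand-ins for the LL, the candidate nvmma_shared encodings, the safe LL constructor, and the search loop in OptimizeDescriptorEncoding.cpp, and the incompatibility rule inside `tryBuildLayout` is an assumption made only for illustration. The point it shows is that a failed construction disqualifies a candidate instead of aborting the pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a linear layout (LL): the shape it covers plus
// the swizzle width that determines its element ordering.
struct Layout {
  std::vector<int64_t> shape;
  int swizzleBytes;
  bool operator==(const Layout &o) const {
    return shape == o.shape && swizzleBytes == o.swizzleBytes;
  }
};

// Hypothetical stand-in for a candidate nvmma_shared encoding.
struct Candidate {
  int swizzleBytes; // 0, 32, 64 or 128
  int elemBytes;    // element size in bytes
};

// Safe constructor: returns std::nullopt instead of aborting when the
// candidate does not fit the shape. Assumed rule for this sketch only:
// a swizzled row must be at least `swizzleBytes` wide.
std::optional<Layout> tryBuildLayout(const std::vector<int64_t> &shape,
                                     const Candidate &c) {
  if (shape.empty())
    return std::nullopt;
  int64_t rowBytes = shape.back() * c.elemBytes;
  if (c.swizzleBytes > 0 && rowBytes < c.swizzleBytes)
    return std::nullopt;
  return Layout{shape, c.swizzleBytes};
}

// Return the index of the first candidate whose layout can be built and is
// equivalent to `target`, or -1 if none qualifies.
int findEquivalentCandidate(const std::vector<int64_t> &shape,
                            const std::vector<Candidate> &candidates,
                            const Layout &target) {
  for (std::size_t i = 0; i < candidates.size(); ++i) {
    std::optional<Layout> ll = tryBuildLayout(shape, candidates[i]);
    if (!ll)
      continue; // construction failed: skip this candidate, don't abort
    if (*ll == target)
      return static_cast<int>(i);
  }
  return -1;
}
```

With shape {64, 16} and 2-byte elements (32-byte rows), the 128-byte candidate is skipped, the 32-byte candidate builds but is not equivalent to an unswizzled target, and the swizzle-0 candidate at index 2 is returned.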

masahi requested review from lezcano and ptillet as code owners on April 27, 2026 22:06
masahi marked this pull request as draft on April 27, 2026 22:28
Review thread on lib/Dialect/TritonGPU/IR/Dialect.cpp (outdated)
Review thread on lib/Dialect/TritonGPU/IR/Dialect.cpp (outdated)
masahi marked this pull request as ready for review on April 28, 2026 11:46
masahi requested review from ThomasRaoux and lezcano on April 28, 2026 11:46
lezcano (Contributor) commented Apr 28, 2026

I asked Codex to put together the diff:
3130b82...compare/pr10148-on-pr9931

Comment on lines +221 to +225
// Restrict this rewrite to an operand which already uses a shared-linear
// encoding. Backward propagation through tensor reshape/trans is not
// encoding-stable for NVMMAShared.
if (!isa<SharedLinearEncodingAttr>(operandTy.getEncoding()))
  return failure();
Contributor:

MMAs accept both shared linear layouts and NVMMA shared layouts, so this change is really benign. Do you have any particular concerns?

Collaborator Author:

I don't have a concrete example that would break if an MMA operand with #nvmma_shared were replaced by #shared_linear. I added this check as an extra safety measure, since I wouldn't be surprised if some pass depends on an MMA operand having #nvmma_shared.

I think this is also more in line with the original purpose of this rewrite: an earlier pass such as AccelerateMatmul sets the operand encoding to #shared_linear when it identifies a need to do so, and this rewrite is supposed to propagate that encoding over the preceding chain of view ops. So it doesn't make sense for the operand encoding to change across this pass.
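The guard quoted at the start of this thread, together with the reasoning in this reply, can be modelled in a self-contained sketch. Everything here is hypothetical: `NvmmaShared`, `SharedLinear`, `ViewOp`, and `propagateSharedLinear` stand in for Triton's encoding attributes, the reshape/trans view ops, and the OptimizeDotOperands rewrite, and a layout is represented only as a string recording how it was derived. The sketch shows the intended contract: an operand that is already shared-linear has its encoding pushed backward through the view chain, while an nvmma_shared operand makes the rewrite fail and stay untouched.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical encodings for an MMA operand held in shared memory.
struct NvmmaShared {};
struct SharedLinear {
  std::string id; // stands in for the concrete linear layout
};
using Encoding = std::variant<NvmmaShared, SharedLinear>;

// Hypothetical view op (reshape / trans) preceding the MMA operand.
struct ViewOp {
  std::string name;
  Encoding srcEncoding; // encoding of the view op's input
};

// Propagate the operand's #shared_linear encoding backward over the chain of
// view ops (last op first). Returns std::nullopt ("failure") when the operand
// is not already shared-linear, so an #nvmma_shared operand is left as-is.
std::optional<std::vector<ViewOp>>
propagateSharedLinear(std::vector<ViewOp> chain, const Encoding &operandEnc) {
  const auto *linear = std::get_if<SharedLinear>(&operandEnc);
  if (!linear)
    return std::nullopt; // mirrors the isa<SharedLinearEncodingAttr> guard
  std::string current = linear->id;
  for (auto it = chain.rbegin(); it != chain.rend(); ++it) {
    // Each input layout is derived by inverting the view op applied to it.
    current = "inverse_" + it->name + "(" + current + ")";
    it->srcEncoding = SharedLinear{current};
  }
  return chain;
}
```

For a reshape followed by a trans, the trans input receives `inverse_trans(L)` and the reshape input receives `inverse_reshape(inverse_trans(L))`; passing an `NvmmaShared` operand returns no result, so the operand encoding is the same before and after the pass.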

Comment on lines +261 to +264
// This condition can fail if a layout is speculatively constructed for
// equivalence checking.
if (layout.getTotalOutDimSize() != product(maybeTransposedTmaShape))
  return failure();
Contributor:

Where exactly can this happen? It feels like quite a big issue.

Collaborator Author:

A concrete test case that fails this condition is this one: https://github.com/masahi/triton/blob/58c3b956958f572e1f6bfa3ddbd865c9cac40763/test/TritonNvidiaGPU/optimize_descriptor_encoding.mlir#L180-L188

We call buildNvmmaSharedLinearLayout with shape [1, 16, 1, 16] and various candidate nvmma_shared encodings. For some candidates, it seems that ensureLayoutNotSmallerThan can return a layout that covers more output elements than the shape [1, 16, 1, 16] has (1 × 16 × 1 × 16 = 256).
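The size guard under discussion compares the layout's total output size with the product of the TMA shape. A minimal sketch of that comparison, assuming a hypothetical `Layout` that records only its per-dimension output sizes and ignoring the transposition handled by `maybeTransposedTmaShape` in the real code; the oversized layout [1, 16, 1, 32] in the usage note is an illustrative value, not an output actually produced by ensureLayoutNotSmallerThan.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <optional>
#include <vector>

// Number of elements a shape covers, e.g. [1, 16, 1, 16] -> 256.
int64_t product(const std::vector<int64_t> &shape) {
  return std::accumulate(shape.begin(), shape.end(), int64_t{1},
                         std::multiplies<int64_t>());
}

// Hypothetical model of a layout that may have been grown to a minimum tile
// size: it covers `outDimSizes` elements per output dimension.
struct Layout {
  std::vector<int64_t> outDimSizes;
  int64_t totalOutDimSize() const { return product(outDimSizes); }
};

// Reject a speculatively built layout whose total size differs from the TMA
// block shape instead of using it (the analogue of returning failure()).
std::optional<Layout> checkCoversShape(const Layout &layout,
                                       const std::vector<int64_t> &tmaShape) {
  if (layout.totalOutDimSize() != product(tmaShape))
    return std::nullopt; // the layout covers more (or fewer) elements
  return layout;
}
```

A layout covering exactly [1, 16, 1, 16] passes the check, while a hypothetical layout grown to [1, 16, 1, 32] covers 512 elements instead of 256 and is rejected.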

Contributor:

I still get the feeling that there's a better place to catch this one than this late, but sure.

Review thread on lib/Dialect/TritonGPU/IR/Dialect.cpp (outdated)
Review thread on lib/Dialect/TritonGPU/Transforms/OptimizeDotOperands.cpp
Review thread on lib/Dialect/TritonGPU/IR/Dialect.cpp (outdated)
Comment on lines +290 to +291
if (failed(layout))
  llvm::report_fatal_error("Illegal shared layout");
Collaborator:

I think it is fine to keep that along with the emitError.

Collaborator Author:

I hope the current code, as of 58c3b95, addresses this comment.

masahi (Collaborator Author) commented Apr 28, 2026

I also removed tryNvmmaSharedToLinearLayout, since it is now possible, and cleaner, to add a safe version of nvmmaSharedToLinearLayout and reimplement the existing unsafe version in terms of the safe one.
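The "safe version plus unsafe wrapper" structure described in this comment can be sketched generically. The names `toLinearLayoutSafe` and `toLinearLayout` and the failure condition are hypothetical, and `std::optional` plays the role of the failure-returning result that the PR's code checks with `failed(layout)`; the fatal path mirrors the `llvm::report_fatal_error("Illegal shared layout")` call quoted earlier in the thread.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical result type standing in for a linear layout.
struct Layout {
  std::string description;
};

// Safe version: reports failure to the caller instead of aborting.
// Assumed failure condition for this sketch: a non-positive element count.
std::optional<Layout> toLinearLayoutSafe(int numElements) {
  if (numElements <= 0)
    return std::nullopt;
  return Layout{"layout over " + std::to_string(numElements) + " elements"};
}

// Unsafe version, reimplemented on top of the safe one: existing callers
// that assume valid input keep their behavior, and a failure is fatal.
Layout toLinearLayout(int numElements) {
  std::optional<Layout> layout = toLinearLayoutSafe(numElements);
  if (!layout) {
    std::fprintf(stderr, "Illegal shared layout\n");
    std::abort();
  }
  return *layout;
}
```

Speculative callers such as an equivalence search use the safe entry point and treat `std::nullopt` as "not a match", while the single unsafe wrapper keeps the fatal-error behavior in one place instead of duplicating the construction logic.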

masahi requested review from ThomasRaoux and lezcano on April 30, 2026 08:04
lezcano (Contributor) left a review comment.


masahi merged commit 0f2a3b1 into triton-lang:main on Apr 30, 2026
23 of 27 checks passed

3 participants