[Backend] Fix incorrect shared layout for dot operands rank==3 #4944
AlexAUT wants to merge 1 commit into triton-lang:main
Conversation
ff0d816 to 33941e2
lezcano left a comment
The changes to wmma were not intentional; it was a land race with #4538. Feel free to revert the change, making ReduceDataDuplication's condition apply only to Ampere.
That being said, would it make sense here to simply use dstOrder as the order, similar to how it's done in triton/lib/Analysis/Allocation.cpp (line 124 in 692143c)? This one already gives you the order you want for rank=3, while the current one could end up in a funny state where the order does not match that of the input or the output.
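To make that concrete, here is a minimal sketch of what using the destination order could look like, reusing the gpu::getThreadOrder helper and the dstEncoding/sharedOrder names from the snippet discussed below; this is an illustration, not the actual Allocation.cpp code:

// Illustrative sketch (not the actual patch): take the order from the
// destination (dot operand) encoding for every rank, so the shared layout
// always matches the output order instead of mixing source order (rank != 3)
// and destination order (rank == 3).
auto sharedOrder = gpu::getThreadOrder(dstEncoding);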
33941e2 to fab393d
Thanks for the feedback. I reverted the change in ReduceDataDuplication. I also switched to gpu::getThreadOrder(dstEncoding).
d6b00d2 to 2d4751d
if (rank == 3) {
  sharedOrder = gpu::getThreadOrder(dstEncoding);
} else {
  sharedOrder = srcOrder;
}
If you are going this route, you probably want to do it for all ranks; otherwise this heuristic would be incredibly counterintuitive.
But then we might change the shared layout order for rank != 3 with this change? I can also revert all changes except for the condition in ReduceDataDuplication to make it Ampere-specific. Then we would have the same behavior as before #4904.
I just find the current behaviour for rank==3 very weird. @Jokeren thoughts?
Actually, block to dot decomposition can be deprecated soon:
main...keren/dot-mma#diff-30fb59df648f6c4ec5db24c51ef728a8d56104e151bd52e8d482b6951c207aa7R95
I agree the special condition is weird. Can you attach a test case for us to take a look at?
Running python/test/unit/language/test_core.py::test_dot3d[1-1-64-64-64-32-32-float16-float32] triggers this assert. It happens when wmma is used for the dot. The shared -> dot layout conversion for wmma also expects the batch dim to be the slowest dimension.
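As a side note, here is a small self-contained sketch of what that expectation means, assuming Triton's convention that an order vector lists dimensions from fastest- to slowest-varying and that the batch dim of a 3D dot is dim 0 (both are assumptions spelled out here, not taken from this PR):

// Illustrative only: for a rank-3 dot operand the batch dim (dim 0) must be
// the slowest-varying dimension, i.e. the last entry of the order vector.
#include <cassert>
#include <vector>

int main() {
  std::vector<unsigned> batchSlowest = {2, 1, 0}; // satisfies the wmma expectation
  std::vector<unsigned> batchFastest = {0, 2, 1}; // would trip the assert described above
  assert(batchSlowest.back() == 0);
  assert(batchFastest.back() != 0);
  return 0;
}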
2d4751d to 2e796e6
#4904 moved the layout rewrite for wmma dot operands from blocked->mma to blocked->shared->mma, i.e. from ReduceDataDuplication to DecomposeUnsupportedConversion. However, DecomposeUnsupportedConversion was missing the special case for rank==3, which is copied over in this PR.
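For reference, the carried-over special case looks roughly like the snippet discussed in the review above (a sketch only; rank, srcOrder, and dstEncoding are assumed to be the surrounding variables in DecomposeUnsupportedConversion):

// Sketch of the rank==3 special case copied into the pass: take the order
// from the destination (dot operand) encoding so the batch dim stays the
// slowest dimension; keep the source order for all other ranks.
SmallVector<unsigned> sharedOrder;
if (rank == 3) {
  sharedOrder = gpu::getThreadOrder(dstEncoding);
} else {
  sharedOrder = srcOrder;
}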