[BACKEND] Fix the `divideRight` method in Linear Layout when eliminating input and output dimensions #4530

Jokeren · 2024-08-17T03:05:45Z

Before this patch, if a * b = c, c.divideRight(b) might return nullopt even if a' * b = c, where a' is the potential result of divideRight.
This PR addresses the issue by conservatively removing input and output dimensions, ensuring that the division returns a non-nullopt result when a valid solution exists.
However, it does not guarantee that a and a' will have identical input and output dimensions.

In addition, this PR also fixes a bug in TritonGPURemoveLayoutConversionsPass. The backward slice should be continued when encountering a free conversion. This includes cases where c.divideRight(b) results in a layout that only permutes register values within individual threads.

jlebar · 2024-08-23T17:00:13Z

> Before this patch, if a * b = c, c.divideRight(a) might return nullopt even if a' * b = c, where a' is the potential result of divideRight.

~~I'm confused... c.divideRight(a) should equal b, right? How does a' fit into it?~~

Ah, I think you meant c.divideRight(b).

Jokeren · 2024-08-23T17:09:26Z

Oh yes, typo fixed! Thank you!

jlebar · 2024-08-23T17:11:13Z

> However, it does not guarantee that a and a' will have identical input and output dimensions.

This seems pretty confusing to me, because now we do not have the invariant that divideRight is the inverse of *.

I haven't thought about this, but instead of removing the empty dimensions from the dividend, would it be possible to infer the additional dimensions and add them?

jlebar · 2024-08-23T17:02:10Z

include/triton/Tools/LinearLayout.h

+  // a' * b = c and a * b' = c.
+  //
+  // Note that a' and a may not have exactly the same input/output dimensions.
+  // a may contain additional empty input dimensions than a'. For example:


jlebar · 2024-08-23T17:05:36Z

include/triton/Tools/LinearLayout.h

  }

-  // divideLeft and divideRight are the inverses of operator*.
+  // divideLeft and divideRight are the inverses of operator *.


In C++ it's called operator*, not operator *

Ah, it's a GPT bug...

I didn't mean to change this line

jlebar · 2024-08-23T17:06:36Z

lib/Tools/LinearLayout.cpp

+template <typename T, typename U>
+void assertCommonDimsSameOrder(T &&outerDims, U &&innerDims) {
+  // Check that elements common to both outerDimsRange and innerDimsRange
+  // appear in the same relative order.


This comment describes the behavior of the function. Therefore move it outside the function?

jlebar · 2024-08-23T17:06:56Z

lib/Tools/LinearLayout.cpp

+
+  if (outerCommonDims != innerCommonDims) {
+    llvm::report_fatal_error(
+        "Cannot multiply layouts.  All in/out dimensions common to both "


Seems like this error is not correct anymore?

Yeah, I think we can just remove "Cannot multiply layouts"

jlebar · 2024-08-23T17:11:56Z

lib/Dialect/TritonGPU/Transforms/Utility.cpp

+        enqueue(definingOp->getOperand(0), encoding);
+        continue;
+      }
+      if (canFoldIntoConversion(definingOp, encoding))


I don't understand what this change is doing. Is it related to this PR?

Yes, there was a bug in layout removal. It's contributed by @ThomasRaoux. I should add his commit message also

I think it would help to put it in a separate PR. You can use one of these tools https://www.stacking.dev/?utm_source=stack-comment to stack your PRs, or you can use my very hacky tool https://github.com/jlebar/git-pr-chain

jlebar · 2024-08-23T17:13:33Z

lib/Tools/LinearLayout.cpp

+  //              out-dim0
+  //   in-dim0 |   size 1
+  //   in-dim1 |   size 1
+  //   in-dim2 |   size 1


I don't really know what you mean with these diagrams. Perhaps we could use something that matches the toString() output of LinearLayout, since at least that has a well-defined meaning?

Jokeren · 2024-08-23T17:22:40Z

> I haven't thought about this, but instead of removing the empty dimensions from the dividend, would it be possible to infer the additional dimensions and add them?

It's impossible I think for the reasons stated in the code comments

Jokeren · 2024-08-23T17:32:31Z

I think there could be multiple divideRight answers due to either empty output or empty input dimensions.

e.g., output
["out0", "out1", "out2", "out3"] * ["out1", "out3"] = ["out0", "out1", "out2"] * ["out1", "out3"], if out3 is an empty dimension

e.g., input
["in0", "in1"] * ["in2"] = ["in0", "in1"] * ["in1", "in2"], if in1 is an empty dimension
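
The output-dimension example above can be illustrated with a minimal sketch that models only the dimension bookkeeping of a product, not the bases. It assumes that the product keeps the left operand's dimensions first, appends any new dimensions from the right operand, and multiplies the sizes of shared dimensions. `Dim`, `Dims`, and `multiplyDims` are hypothetical names for illustration, not part of `LinearLayout`.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: a dimension is a (name, size) pair, and a layout is
// reduced to its ordered list of output dimensions.
using Dim = std::pair<std::string, int>;
using Dims = std::vector<Dim>;

// Assumed product rule for dimension lists: keep the dims of `a` in order,
// multiply sizes for names shared with `b`, and append names only in `b`.
Dims multiplyDims(const Dims &a, const Dims &b) {
  Dims result = a;
  for (const Dim &bd : b) {
    bool found = false;
    for (Dim &rd : result) {
      if (rd.first == bd.first) {
        rd.second *= bd.second;
        found = true;
        break;
      }
    }
    if (!found)
      result.push_back(bd);
  }
  return result;
}
```

With `a = [out0:2, out1:2, out2:2, out3:1]`, `a' = [out0:2, out1:2, out2:2]`, and `b = [out1:2, out3:4]`, both `multiplyDims(a, b)` and `multiplyDims(a', b)` give `[out0:2, out1:4, out2:2, out3:4]`, so the product alone cannot tell `a` and `a'` apart.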

jlebar · 2024-08-24T05:41:36Z

Ah, I understand this PR better now.

What I think we are saying is, we canonicalize the div output. There may be many values a for which a * b = c. We have to choose one. We choose the one with as few size-0 input- and output-dimensions as possible.

Our correctness property now becomes canonicalize(a) == (a*b).divRight(b) for all a and b.

Is that correct?

I think it might help me verify the correctness of this PR if we did two things.

1. Rewrite the PR description and comments in the code to talk about a canonicalization and this new correctness property.
2. Split out Thomas's change so that I don't have to figure out which test changes relate to that change versus which changes relate to this change.

If possible, I wonder if the new logic in LinearLayout::divRight can be simplified? If all we're doing is a canonicalization, could we use the same logic we had before, and then simply canonicalize the result?

jlebar · 2024-08-24T05:47:06Z

include/triton/Tools/LinearLayout.h

+  //   b = L("in2") -> ("out2")
+  //
+  // c = a * b = a' * b if "in1" is an empty dimension that maps everything
+  // to 0.


Maybe say something like the following instead.

Size-zero dimensions are effectively ignored by operator*: a*b == a*b' if (and only if) b and b' are the same ignoring any size-zero input- and output-dimensions that are present in a. Therefore if we want divLeft to be the inverse of operator*, there are many possible values that we could return for (a*b).divLeft(a) which would satisfy a * (a*b).divLeft(a) == a*b.

divideLeft and divideRight resolve this ambiguity by always returning the "canonical" quotient, namely the one with the fewest possible size-zero input- and output-dimensions.

jlebar · 2024-08-24T05:51:42Z

lib/Conversion/TritonGPUToLLVM/ConvertLayoutOpToLLVM.cpp

                                   ConversionPatternRewriter &rewriter) const {
    // TODO(jlebar): Implement me.
-    return failure();
+    return transferWithinBlockOrGroup(op, srcLayout, dstLayout, adaptor,


Remove the "implement me" comment above?

jlebar · 2024-08-24T05:52:33Z

lib/Conversion/TritonGPUToLLVM/ConvertLayoutOpToLLVM.cpp

-      auto srcIdx = dstToSrc->apply({{kRegister, i}});
+    outVals.resize(subLayout.getInDimSize(kRegister));
+    for (int i = 0; i < subLayout.getInDimSize(kRegister); i++) {
+      auto srcIdx = subLayout.apply({{kRegister, i}});


Can this change be put into a separate PR? It would make the LinearLayout change here easier to understand in isolation.

It couldn't because the following condition doesn't return true any more:

assert(ArrayRef(to_vector(dstToSrc->getInDimNames())) ==

We must get a subLayout first.

Perhaps worth writing a comment? "You might be tempted to do X, but it doesn't work because Y."?

jlebar · 2024-08-24T05:53:12Z

lib/Dialect/TritonGPU/Transforms/Utility.cpp

+        enqueue(definingOp->getOperand(0), encoding);
+        continue;
+      }
+      if (canFoldIntoConversion(definingOp, encoding))


I think it would help to put it in a separate PR. You can use one of these tools https://www.stacking.dev/?utm_source=stack-comment to stack your PRs, or you can use my very hacky tool https://github.com/jlebar/git-pr-chain

jlebar · 2024-08-24T05:54:08Z

lib/Tools/LinearLayout.cpp

+  if (outerCommonDims != innerCommonDims) {
+    llvm::report_fatal_error("All in/out dimensions common to both layouts "
+                             "must appear in the same relative order, but they "
+                             "don't.\n");


We had \n here because we used to be outputting the in/out dims. Now we're not outputting them anymore, so we should lose the \n (or output the dims again, I thought that was kind of helpful?)

Oh, sorry. It got removed accidentally

jlebar · 2024-08-24T05:54:58Z

lib/Tools/LinearLayout.cpp

+// Check that elements common to both outerDimsRange and innerDimsRange
+// appear in the same relative order.
+template <typename T, typename U>
+void assertCommonDimsSameOrder(T &&outerDims, U &&innerDims) {


I am not sure "outerDims" and "innerDims" are the correct names, based on how this function is used?

I agree it doesn't make sense anymore. Should I just call them dimsA and dimsB?
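
For reference, the check described in the comment can be sketched as follows. This is a minimal illustration on plain string vectors, assuming no name is repeated within a list. It returns a bool instead of calling `llvm::report_fatal_error`, and `commonDimsSameOrder` is a hypothetical name, not the function in this PR.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns true when the names present in both lists
// appear in the same relative order in each list.
bool commonDimsSameOrder(const std::vector<std::string> &dimsA,
                         const std::vector<std::string> &dimsB) {
  auto contains = [](const std::vector<std::string> &v, const std::string &s) {
    return std::find(v.begin(), v.end(), s) != v.end();
  };
  // Keep only the shared names, preserving each list's own order.
  std::vector<std::string> commonA, commonB;
  for (const std::string &d : dimsA)
    if (contains(dimsB, d))
      commonA.push_back(d);
  for (const std::string &d : dimsB)
    if (contains(dimsA, d))
      commonB.push_back(d);
  return commonA == commonB;
}
```

Under this sketch, `["in0", "in1"]` and `["in1", "in2"]` pass because their only shared name is `"in1"`, while `["in0", "in1"]` and `["in1", "in0"]` fail because the shared names are reversed.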

jlebar · 2024-08-24T05:58:50Z

lib/Tools/LinearLayout.cpp

                         divisor.getInDimSizeLog2(inDim));
  }

+  // Record size 1 out-dims caused by the division.


I think this loop now does two things.

1. Record size-1 out-dims caused by the division.
2. Check if newOutDims[outDim] > outDimSize (what the loop used to do).

I think this is confusing. For one thing, there's a comment above the loop which says "Record size 1 out-dims caused by the division." but actually that is misleading; it only mentions one of the two things done by the loop.

Perhaps we could simply have two loops?

jlebar · 2024-08-24T06:11:19Z

lib/Tools/LinearLayout.cpp

+  //
+  // If we remove "out1" from o, we get:
+  //
+  //   out-dims(l) = ["out0", "out2", "out3"]


I am having trouble following this example.

It seems to be explaining an important edge case in an algorithm, but the overall algorithm -- the thing we're trying to do -- is not clear to me. The example also has some assumptions that I don't understand. For example, I don't understand how o / r returns anything at all if we remove "out1" from o. Isn't that just an infeasible division? (Unless you're assuming out1 is a size-zero dim? But I don't see how I'm supposed to know that?)

I wonder if the following algorithm works.

Assume c = a*b.

We are given b and c, and we want to compute c.divRight(b). i.e. we want to find a (or some a' which is equivalent to a ignoring size-zero dims).

Let b' be b but with all size-zero in-dims and out-dims removed. Same for c'.

Compute our candidate quotient a' = c' / b', same as before.

Check if a' * b' == c', same as before.

If it matches, then let a'' be a' but with the minimum number of size-zero in- and out-dims added back to it so that a'' * b' = c as desired.

Return a''.

About the new algorithm, I think it would fix the same problem as the existing code in this PR

> For example, I don't understand how o / r returns anything at all if we remove "out1" from o. Isn't that just an infeasible division? (Unless you're assuming out1 is a size-zero dim? But I don't see how I'm supposed to know that?)

Yeah, we assume that "out1" is a size-zero dim. Why did you say "I don't see how I'm supposed to know that"? Is it because of lacking a comment, or you think there's no way to check it? If the former I'll add a comment; I thought it's clear because there's a variable emptyOutDimIndices.

The key point is that we cannot remove arbitrary empty output dimensions from the quotient.

The following code simulates quotient * divisor = result and enumerates the output dimensions of the result from right to left to check which ones can be removed. When we perform the multiplication, the output dimensions of the quotient are always placed before the output dimensions of the divisor in the result. So if this order breaks in the result, we should stop enumerating output dimensions.

Therefore, I believe the new algorithm you described solves the same problem as the existing code.

There is some confusion about size-1 or size-0 dimensions though. I call them sizeOneOutDimIndices since the out-dim maps everything to 0 and still has a size of 1.

> Why did you say "I don't see how I'm supposed to know that"? Is it because of lacking a comment, or you think there's no way to check it?

Lacking a comment. :)

> I thought it's clear because there's a variable emptyOutDimIndices.

...yeah I'm doing my best to understand what's going on, but I did not consider looking at the code below to understand the algorithm above (and anyway I'm not sure it would have helped me).

> There is some confusion about size-1 or size-0 dimensions though. I call them sizeOneOutDimIndices since the out-dim maps everything to 0 and still has a size of 1.

Ah yes, "size-1" dims is correct, I was calling it the wrong thing.

> Therefore, I believe the new algorithm you described solves the same problem as the existing code.

I think I would have an easier time understanding the new algorithm I proposed, but if you don't think that's the best approach (who knows, I'm not the one implementing it), it would help me if we could explain the algorithm we're using as a comment in the code. I think you're starting to get there with this:

> The following code simulates quotient * divisor = result and enumerates the output dimensions of the result from right to left to check which ones can be removed. When we perform the multiplication, the output dimensions of the quotient are always placed before the output dimensions of the divisor in the result. So if this order breaks in the result, we should stop enumerating output dimensions.

I promise I'm doing my best to understand things here and not just being obstinate

I've refactored the algorithm as follows:

Consider c = a * b and construct a candidate quotient a' first without removing any dimensions.

Check if a' * b == c. If yes, we start to remove empty dimensions.

First, we remove empty trailing output dimensions from a'.

Then, we remove empty trailing input dimensions from a'.

Finally, we return the final quotient a''.
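
The trailing-dimension removal in the steps above can be sketched in isolation. This is a simplified model, assuming a layout is reduced to an ordered list of (name, size) pairs and that an "empty" dimension is one of size 1. It leaves out the bases and the `a' * b == c` check, and `dropTrailingSizeOneDims` is a hypothetical name rather than code from this PR.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: a dimension is a (name, size) pair.
using Dim = std::pair<std::string, int>;

// Remove size-1 dimensions from the end of the list only. A size-1
// dimension followed by a larger one is kept, so the order of the
// remaining dimensions is unchanged.
std::vector<Dim> dropTrailingSizeOneDims(std::vector<Dim> dims) {
  while (!dims.empty() && dims.back().second == 1)
    dims.pop_back();
  return dims;
}
```

In this model the same helper would be applied once to the candidate quotient's output dimensions and once to its input dimensions. For `[out0:4, out1:1, out2:2, out3:1]` it returns `[out0:4, out1:1, out2:2]`: the trailing `out3` is dropped and the interior `out1` stays.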

Also, I found that we actually allow a linear layout with no input dimensions but with empty output dimensions. Therefore, the hacky len(input_dims) == len(output_dims) condition can be removed, allowing us to use the same code to check if data transfer can happen within each thread.

Aren't empty dims in the middle of a' (i.e. not at either end) also removable? They're only not removable if b does not contain the dimension, right?

Ah I see, I'm wrong, you explain it in the comment. Thanks. :)

Jokeren · 2024-08-24T11:35:37Z

I think it would help to put it in a separate PR. You can use one of these tools https://www.stacking.dev/?utm_source=stack-comment to stack your PRs, or you can use my very hacky tool https://github.com/jlebar/git-pr-chain

Sure, I'll probably just cherry pick the commit out of this PR.

jlebar · 2024-08-25T21:10:06Z

lib/Tools/LinearLayout.cpp

 }

+// Check that elements common to both outerDimsRange and innerDimsRange
+// appear in the same relative order.


Update the comment now that you updated the variable names.

jlebar · 2024-08-25T21:12:20Z

lib/Tools/LinearLayout.cpp

+  //
+  // If we remove "out1" from o, we get:
+  //
+  //   out-dims(l) = ["out0", "out2", "out3"]


Ah I see, I'm wrong, you explain it in the comment. Thanks. :)

jlebar · 2024-08-25T21:12:41Z

include/triton/Tools/LinearLayout.h

+  // input and output dimensions that are present in `b`.  Therefore, if we want
+  // divideRight to be the inverse of operator*, there are many possible values
+  // that we could return for `(a*b).divideRight(b)` which would satisfy
+  // `((a*b).divideRight(b))*b == a*b`.


This doesn't match what you do, which is only to remove empty dims at the end of the quotient. :)

Comments updated. Thanks for your kind suggestions!

…ing input and output dimensions (triton-lang#4530)

Before this patch, if `a * b = c`, `c.divideRight(b)` might return `nullopt` even if `a' * b = c`, where `a'` is the potential result of `divideRight`. This PR addresses the issue by conservatively removing input and output dimensions, ensuring that the division returns a non-nullopt result when a valid solution exists. However, it does not guarantee that `a` and `a'` will have identical input and output dimensions.

In addition, this PR also fixes a bug in `TritonGPURemoveLayoutConversionsPass`. The backward slice should be continued when encountering a free conversion. This includes cases where `c.divideRight(b)` results in a layout that only permutes register values within individual threads.

Co-authored-by: Thomas Raoux

Jokeren changed the title ~~[BACKEND][DRAFT] Fix Linear Layout's divideRight logic for eliminating input and output dimensions~~ [BACKEND][DRAFT] Fix Linear Layout's divideRight when eliminating input and output dimensions Aug 17, 2024

Jokeren marked this pull request as ready for review August 23, 2024 14:13

Jokeren requested a review from ptillet as a code owner August 23, 2024 14:13

Jokeren requested review from ThomasRaoux and jlebar August 23, 2024 14:14

Jokeren changed the title ~~[BACKEND][DRAFT] Fix Linear Layout's divideRight when eliminating input and output dimensions~~ [BACKEND] Fix divideRight method in Linear Layout when eliminating input and output dimensions Aug 23, 2024

Jokeren changed the title ~~[BACKEND] Fix divideRight method in Linear Layout when eliminating input and output dimensions~~ [BACKEND] Fix the divideRight method in Linear Layout when eliminating input and output dimensions Aug 23, 2024

Jokeren requested a review from zahimoud August 23, 2024 14:17

jlebar reviewed Aug 23, 2024

jlebar reviewed Aug 24, 2024

Jokeren force-pushed the keren/eliminate-dims branch from 5ef4326 to 081d226 Compare August 24, 2024 12:21

Jokeren requested review from antiagainst and zhanglx13 as code owners August 24, 2024 12:21

ThomasRaoux and others added 2 commits August 24, 2024 08:24

Continue the backward slice when finding free convert

2222be1

Update

238c9b4

Jokeren force-pushed the keren/eliminate-dims branch from 081d226 to 238c9b4 Compare August 24, 2024 12:25

Jokeren and others added 7 commits August 24, 2024 08:26

Merge branch 'main' into keren/eliminate-dims

5ab93ca

Address some comments

dfa5e4c

Merge branch 'keren/eliminate-dims' of github.com:openai/triton into …

ed936ab

…keren/eliminate-dims

Update

8d2f17c

Update comment

c09413d

Update comment

d4efbb4

Remove "the"

38ec0e1

Jokeren added 3 commits August 25, 2024 10:04

Update

9e3f273

Update comment

b067996

Use const

6160baf

jlebar approved these changes Aug 25, 2024

Jokeren and others added 2 commits August 26, 2024 09:31

Update comments

db0fd2f

Merge branch 'main' into keren/eliminate-dims

a9ad541

Jokeren merged commit 381ff67 into main Aug 26, 2024

Jokeren deleted the keren/eliminate-dims branch August 26, 2024 19:16

jlebar mentioned this pull request Sep 3, 2024

Build LLVMAarch64CodeGen if CMAKE_OSX_ARCHITECTURES is arm64. #4637

Merged
