[Triton] Make tl.cat deterministic by using permute+reshape+join and remove CatOp #8769
ThomasRaoux
left a comment
Yay, this is awesome!
very cool!
With Triton 3.3.1+git96394dff, when implementing the randn op, tl.pair_uniform_to_normal generates 2 * BLOCK random normal numbers, but in an interleaved memory layout. With tl.cat (the older implementation), these numbers could be reordered to be contiguous, and we didn't care whether CatOp would break determinism. This brought huge performance improvements. However, with Triton 3.5.0, the improvements disappear. Is this change the reason they disappeared?
@ThomasRaoux I think that we should have a
This change is not part of 3.5 so I don't think this is related.
Interesting idea. One thing that has been really annoying in the past is that if the value has multiple uses, they may want different layouts, and in that case it may or may not be correct to use different layouts for different uses. It feels like the cases where being able to reorder elements in a reasonable way matters are rare, so I think we should only enable it (or keep it enabled) if we find a strong use case. For random value generation, I think we should always be able to propagate the layout so that we get the output layout we want for free.
Even better if we don't find a use case for reorder, but if there is one, we could always just allow it while asserting that it has only one user.
It can be subtle though, as the user may get rematerialized by the compiler. But we can figure it out if there is a real-life use case. For now, except for the case of reshape + two-step reduction, I haven't seen a legit use case.
…d remove `CatOp` (triton-lang#8769) `tl.cat` is now implemented with permute, reshape, join, reshape, permute instead of using the builtin `CatOp`. This makes `tl.cat` deterministic, and thanks to linear layouts, it doesn't appear to affect performance much. The `can_reorder` option is deprecated and now does nothing. `tl.cat` also supports concatenating along any dim.
…+join and remove `CatOp`" (triton-lang#8777) Reverts triton-lang#8769 This depends on triton-lang#8776 but since it's a significant change, revert it so we can integrate them individually.
… using permute+reshape+join and remove `CatOp` (#8769)' Summary: This is a cherry-pick of an upstream PR: triton-lang/triton#8769 Upstream commit message: ``` > [Triton] Make `tl.cat` deterministic by using permute+reshape+join and remove `CatOp` (#8769) > `tl.cat` is now implemented with permute, reshape, join, reshape, > permute instead of using the builtin `CatOp`. This makes `tl.cat` > deterministic, and thanks to linear layouts, it doesn't appear to affect > performance much. The `can_reorder` option is deprecated and now does > nothing. > `tl.cat` also supports concatenating along any dim. ``` ***Do not remove the following line from this commit*** Reactor Cherry-pick Revision: bc4fc36 --- This diff was generated by running: ``` buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 1 --no-submit ``` Reviewed By: dshi7 Differential Revision: D99194850 fbshipit-source-id: a26108582af94f34d779c8b3270ebb7581b50caf
…istic by using permute+reshape+join and remove `CatOp`" (#8777)' Summary: This is a cherry-pick of an upstream PR: triton-lang/triton#8777 Upstream commit message: ``` > Revert "[Triton] Make `tl.cat` deterministic by using permute+reshape+join and remove `CatOp`" (#8777) > Reverts triton-lang/triton#8769 > This depends on triton-lang/triton#8776 but > since it's a significant change, revert it so we can integrate them > individually. ``` ***Do not remove the following line from this commit*** Reactor Cherry-pick Revision: e7fb841 --- This diff was generated by running: ``` buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 1 --no-submit ``` Reviewed By: dshi7 Differential Revision: D99343063 fbshipit-source-id: b7f68732787eaf200639f89adf5ac8b80d69a9b7
`tl.cat` is now implemented with permute, reshape, join, reshape, permute instead of using the builtin `CatOp`. This makes `tl.cat` deterministic, and thanks to linear layouts, it doesn't appear to affect performance much. The `can_reorder` option is deprecated and now does nothing. `tl.cat` also supports concatenating along any dim.
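To illustrate why a join followed by a permute and a reshape yields an ordered concatenation, here is a minimal pure-Python sketch for the 1D case. It is not the Triton implementation: the PR describes a longer sequence (permute, reshape, join, reshape, permute) that handles any dim, and the helper names `join`, `permute_2d`, `flatten`, and `cat_1d` below are hypothetical stand-ins operating on plain lists rather than Triton tensors.

```python
def join(a, b):
    """Pair elements along a new trailing axis: shape [N] -> [N, 2].

    Hypothetical list-based stand-in for a join-style op.
    """
    assert len(a) == len(b), "inputs must have the same length"
    return [[x, y] for x, y in zip(a, b)]


def permute_2d(m):
    """Swap the two axes of a nested list: shape [N, 2] -> [2, N]."""
    return [list(row) for row in zip(*m)]


def flatten(m):
    """Reshape [2, N] -> [2 * N] in row-major order."""
    return [x for row in m for x in row]


def cat_1d(a, b):
    """Concatenate two equal-length 1D lists via join -> permute -> reshape.

    Every step is a fixed index remapping, so the output order is fully
    determined by the input order (no reordering freedom).
    """
    return flatten(permute_2d(join(a, b)))


a = [0, 1, 2, 3]
b = [10, 11, 12, 13]
result = cat_1d(a, b)
print(result)
```

Because each step is a pure index permutation, the result matches ordinary list concatenation (`a + b`) element for element, which is the deterministic ordering the description above refers to.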