
[Triton] Make tl.cat deterministic by using permute+reshape+join and remove CatOp#8769

Merged: Mogball merged 4 commits into main from mogball/cat on Nov 19, 2025

@Mogball (Collaborator) commented Nov 19, 2025

tl.cat is now implemented with permute, reshape, join, reshape, permute instead of using the builtin CatOp. This makes tl.cat deterministic, and thanks to linear layouts, it doesn't appear to affect performance much. The can_reorder option is deprecated and now does nothing.

tl.cat also supports concatenating along any dim.
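
As a rough illustration of why a join followed by a permute and a reshape yields a concatenation, here is a pure-Python model of the 1-D case. It relies only on the documented behavior of `tl.join`, which joins two tensors along a new minor dimension. The helper names (`join`, `permute_2d`, `flatten`, `cat_1d`) are hypothetical stand-ins rather than Triton APIs, and the op order in the PR itself (permute, reshape, join, reshape, permute) handles arbitrary dims and may differ from this sketch.

```python
# Illustrative pure-Python model; these helpers are hypothetical stand-ins,
# not Triton APIs.

def join(a, b):
    """Pair up elements along a new minor dimension: shape [N] -> [N, 2]."""
    assert len(a) == len(b)
    return [[x, y] for x, y in zip(a, b)]

def permute_2d(m):
    """Swap the two dimensions: shape [N, 2] -> [2, N]."""
    return [list(col) for col in zip(*m)]

def flatten(m):
    """Row-major reshape: shape [R, C] -> [R * C]."""
    return [x for row in m for x in row]

def cat_1d(a, b):
    joined = join(a, b)            # [[a0, b0], [a1, b1], ...]
    swapped = permute_2d(joined)   # [[a0, a1, ...], [b0, b1, ...]]
    return flatten(swapped)        # [a0, a1, ..., b0, b1, ...]

a = [0, 1, 2, 3]
b = [10, 11, 12, 13]
print(flatten(join(a, b)))  # [0, 10, 1, 11, 2, 12, 3, 13] (interleaved)
print(cat_1d(a, b))         # [0, 1, 2, 3, 10, 11, 12, 13] (contiguous)
```

Every step is a fixed index mapping, so the output order is fully determined by the inputs, which is the property the description refers to as deterministic.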

@ThomasRaoux (Collaborator) left a comment

Yay, this is awesome!

@Mogball Mogball merged commit bc4fc36 into main Nov 19, 2025
17 of 18 checks passed
@Mogball Mogball deleted the mogball/cat branch November 19, 2025 06:22

@lezcano (Contributor) commented Nov 19, 2025

very cool!

Mogball added a commit that referenced this pull request Nov 20, 2025
…+join and remove `CatOp`" (#8777)

Reverts #8769

This depends on #8776 but
since it's a significant change, revert it so we can integrate them
individually.

@songdejun (Contributor) commented

With Triton 3.3.1+git96394dff, when implementing the randn op, tl.pair_uniform_to_normal generates 2 * BLOCK random normal numbers, but in an interleaved memory layout. With tl.cat (the older implementation), these numbers could be reordered to be contiguous, and we didn't care whether CatOp would break determinism. This method brought huge performance improvements.

However, with Triton 3.5.0, the improvements disappear.

Is this the reason for the disappearance?
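
For readers unfamiliar with the setup in this comment, here is a minimal pure-Python sketch of the ordering question. It assumes `tl.pair_uniform_to_normal` applies a Box-Muller-style transform, which is an assumption not confirmed in this thread; `box_muller_pair` and `BLOCK = 4` are hypothetical names and values chosen for illustration.

```python
import math
import random

def box_muller_pair(u1, u2):
    """Box-Muller transform: two uniforms in [0, 1) -> two standard normals.

    Hypothetical stand-in for what tl.pair_uniform_to_normal is assumed to do.
    """
    u1 = max(1e-7, u1)  # guard against log(0)
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)

BLOCK = 4
rng = random.Random(0)
pairs = [box_muller_pair(rng.random(), rng.random()) for _ in range(BLOCK)]

# Interleaved order: n1[0], n2[0], n1[1], n2[1], ...
interleaved = [x for pair in pairs for x in pair]

# Contiguous order: all first outputs, then all second outputs.
contiguous = [p[0] for p in pairs] + [p[1] for p in pairs]

print(len(interleaved), len(contiguous))
```

Both lists hold the same 2 * BLOCK samples in a different order. For independent random draws the order carries no meaning, which is why a reordering concatenation was acceptable in this use case.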

@lezcano (Contributor) commented Nov 26, 2025

@ThomasRaoux I think that we should have a tl.can_reorder for people to be able to do this wherever they want, rather than have it baked into our view ops now that our views are actual views.

@ThomasRaoux (Collaborator) commented

> With Triton 3.3.1+git96394dff, when implementing the randn op, tl.pair_uniform_to_normal generates 2 * BLOCK random normal numbers, but in an interleaved memory layout. With tl.cat (the older implementation), these numbers could be reordered to be contiguous, and we didn't care whether CatOp would break determinism. This method brought huge performance improvements.
>
> However, with Triton 3.5.0, the improvements disappear.
>
> Is this the reason for the disappearance?

This change is not part of 3.5 so I don't think this is related.

> @ThomasRaoux I think that we should have a tl.can_reorder for people to be able to do this wherever they want, rather than have it baked into our view ops now that our views are actual views.

Interesting idea. One thing that has been really annoying in the past is that if the value has multiple uses, they may want different layouts, and in that case it may or may not be correct to use different layouts for different uses. It feels like cases where being able to reorder elements is reasonable are rare, so I think we should only enable it (or keep it enabled) if we find a strong use case.

For random value generation I think we should always be able to propagate layout to be able to get the output layout we want for free.

@lezcano (Contributor) commented Nov 30, 2025

Even better if we don't find a use case for reorder, but if there is, we could always just allow it asserting that it only has one user.

@ThomasRaoux (Collaborator) commented

> we could always just allow it asserting that it only has one user.

It can be subtle though, as the user may get rematerialized by the compiler. But we can figure it out if there is a real-life use case. For now, except for the case of reshape + two-step reduction, I haven't seen a legit use case.
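
To make the reshape + two-step reduction case concrete, here is a small pure-Python sketch of why element order does not matter there. The function `two_step_sum` and the shuffle used as a stand-in for an arbitrary reorder are hypothetical, and integers are used so the comparison is exact; with floating-point values a different order can change rounding.

```python
import random

def two_step_sum(values, rows):
    """Reshape a flat list to [rows, cols], sum each row, then sum the partials."""
    assert len(values) % rows == 0
    cols = len(values) // rows
    partials = [sum(values[r * cols:(r + 1) * cols]) for r in range(rows)]
    return sum(partials)

data = list(range(16))
shuffled = data[:]
random.Random(0).shuffle(shuffled)  # stand-in for a reshape that reorders elements

print(two_step_sum(data, 4), two_step_sum(shuffled, 4))  # 120 120
```

Because the full reduction consumes every element exactly once, any permutation introduced by the reshape leaves the final result unchanged, so a reordering reshape would be safe in this pattern.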

tmoreau89 pushed a commit to tmoreau89/triton that referenced this pull request Dec 1, 2025
…d remove `CatOp` (triton-lang#8769)

tmoreau89 pushed a commit to tmoreau89/triton that referenced this pull request Dec 1, 2025
…+join and remove `CatOp`" (triton-lang#8777)

meta-codesync Bot pushed a commit to facebookexperimental/triton that referenced this pull request Apr 6, 2026
… using permute+reshape+join and remove `CatOp` (#8769)'

Summary:
This is a cherry-pick of an upstream PR: triton-lang/triton#8769

Upstream commit message:
```
> [Triton] Make `tl.cat` deterministic by using permute+reshape+join and remove `CatOp` (#8769)

> `tl.cat` is now implemented with permute, reshape, join, reshape,
> permute instead of using the builtin `CatOp`. This makes `tl.cat`
> deterministic, and thanks to linear layouts, it doesn't appear to affect
> performance much. The `can_reorder` option is deprecated and now does
> nothing.

> `tl.cat` also supports concatenating along any dim.
```

***Do not remove the following line from this commit***
Reactor Cherry-pick Revision: bc4fc36
 ---

This diff was generated by running:
```
buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 1 --no-submit
```

Reviewed By: dshi7

Differential Revision: D99194850

fbshipit-source-id: a26108582af94f34d779c8b3270ebb7581b50caf
meta-codesync Bot pushed a commit to facebookexperimental/triton that referenced this pull request Apr 6, 2026
…istic by using permute+reshape+join and remove `CatOp`" (#8777)'

Summary:
This is a cherry-pick of an upstream PR: triton-lang/triton#8777

Upstream commit message:
```
> Revert "[Triton] Make `tl.cat` deterministic by using permute+reshape+join and remove `CatOp`" (#8777)

> Reverts triton-lang/triton#8769

> This depends on triton-lang/triton#8776 but
> since it's a significant change, revert it so we can integrate them
> individually.
```

***Do not remove the following line from this commit***
Reactor Cherry-pick Revision: e7fb841
 ---

This diff was generated by running:
```
buck run fbcode//triton/tools/reactor:reactor -- cherrypick --num-commits 1 --no-submit
```

Reviewed By: dshi7

Differential Revision: D99343063

fbshipit-source-id: b7f68732787eaf200639f89adf5ac8b80d69a9b7