[BACKEND] Implement multiCTA support for TMA gather/scatter by lezcano · Pull Request #9977 · triton-lang/triton

lezcano · 2026-04-09T12:10:14Z

We also add tighter invariants for gather/scatter ops.

lezcano · 2026-04-09T12:10:44Z

@peterbell10 can you review this one?

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e63aee2ccb

Mogball

lgtm

Mogball · 2026-04-10T21:00:38Z

I might get a chance to try this out actually

lezcano · 2026-04-14T14:02:04Z

done, added these ops to the comprehensive TMA into mma test.

lezcano · 2026-04-14T14:05:31Z

ugh, the tests are failing. Let me look into those.

lezcano · 2026-04-14T15:18:23Z

found a real latent multiCTA issue and fixed it. Thank you for pushing for more comprehensive tests.

peterbell10 · 2026-04-15T15:31:37Z

+    mbarrier.init(bar, count=1)
+
+    gather_offsets = ttgl.load(gather_idx_ptr + ttgl.arange(0, BLOCK_M, layout=x_offsets_layout))
+    mbarrier.expect(bar, blackwell_tma.nbytes_per_cta_gather(in_desc, gather_offsets))


Maybe we should have smem.nbytes_per_cta instead?

lezcano · 2026-04-15

Yep, I was thinking about that the other day. Let me do that.

lezcano · 2026-04-15

Added. For SharedLinearEncoding we compute a pseudo cga_layout of sorts and divide the shape by it.
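
For readers following the thread, here is a minimal sketch of the "divide the shape by the CGA layout" idea in plain Python. It is not the code added in this PR: the function name `nbytes_per_cta_sketch`, the `cta_split` parameter (how many CTAs each dimension is split across), and the example shapes are all hypothetical and only illustrate the arithmetic behind a per-CTA byte count passed to `mbarrier.expect`.

```python
from math import prod


def nbytes_per_cta_sketch(shape, cta_split, elem_bits):
    """Bytes of a shared-memory tile that a single CTA owns.

    Hypothetical helper, not a Triton/Gluon API:
      shape     -- full tile shape across the whole CTA cluster
      cta_split -- number of CTAs the tile is split across, per dimension
      elem_bits -- element width in bits
    """
    if len(shape) != len(cta_split):
        raise ValueError("shape and cta_split must have the same rank")

    # Divide each dimension by the number of CTAs it is split across.
    per_cta_shape = []
    for dim, split in zip(shape, cta_split):
        if split <= 0 or dim % split != 0:
            raise ValueError(f"dimension {dim} is not divisible by split {split}")
        per_cta_shape.append(dim // split)

    nbits = prod(per_cta_shape) * elem_bits
    if nbits % 8 != 0:
        raise ValueError("per-CTA tile is not a whole number of bytes")
    return nbits // 8


# Assumed example: a 128x64 fp16 tile split across 2 CTAs along dim 0.
single_cta = nbytes_per_cta_sketch((128, 64), (1, 1), 16)
two_ctas = nbytes_per_cta_sketch((128, 64), (2, 1), 16)
print(single_cta, two_ctas)
```

Under these assumptions each CTA would expect half as many bytes on its barrier as in the single-CTA case, which is why the byte count needs to be computed per CTA rather than from the full tile shape.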


lezcano requested review from peterbell10 and ptillet as code owners April 9, 2026 12:10

chatgpt-codex-connector Bot reviewed Apr 9, 2026

Comment thread third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/LoadStoreOpToLLVM.cpp

lezcano force-pushed the scatter_gather_multicast branch from e63aee2 to 93b0fb1 Compare April 9, 2026 17:33

jeffniu-openai reviewed Apr 9, 2026

Comment thread third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/LoadStoreOpToLLVM.cpp

Mogball approved these changes Apr 10, 2026

lezcano force-pushed the scatter_gather_multicast branch from 9152ee1 to 52744e4 Compare April 13, 2026 09:41

lezcano changed the title from "[BACKEND] Implement multiCTA support for TMA gather" to "[BACKEND] Implement multiCTA support for TMA gather/scatter" April 13, 2026

lezcano mentioned this pull request Apr 13, 2026

[BACKEND] Model async TMA variants in ConSan #10015

Merged

peterbell10 requested changes Apr 13, 2026

lezcano force-pushed the scatter_gather_multicast branch from 43bbba8 to bfe7657 Compare April 14, 2026 08:28

lezcano requested a review from peterbell10 April 14, 2026 08:46

lezcano force-pushed the scatter_gather_multicast branch from 3967b38 to d55f753 Compare April 14, 2026 15:14

lezcano mentioned this pull request Apr 15, 2026

[Gluon] Expose TMA atomic ops #10040

Merged

peterbell10 reviewed Apr 15, 2026

[BACKEND] Implement multiCTA support for TMA gather

6a54bcc

We also add tighter invariants for the cga_layout part of all TMA ops

lezcano force-pushed the scatter_gather_multicast branch 2 times, most recently from c1148c1 to 6a54bcc Compare April 15, 2026 16:09

better computation

8523533

lezcano added a commit that referenced this pull request Apr 15, 2026

[Gluon] Expose TMA atomic ops (#10040)

ab1f012

tbh, I think this is also the way we should represent this in our IR. I will add an end-to-end test once #9977 is merged (I will extend the test in that PR to test this op)

lezcano requested a review from peterbell10 April 15, 2026 19:54

lezcano requested review from CRobeck, Jokeren and fywkevin as code owners April 16, 2026 12:07

lezcano force-pushed the scatter_gather_multicast branch from 89e495a to 8523533 Compare April 16, 2026 12:09

peterbell10 reviewed Apr 16, 2026

Comment thread python/triton/experimental/gluon/language/_core.py Outdated

more explicit

052a53e

peterbell10 approved these changes Apr 16, 2026

lezcano enabled auto-merge (squash) April 16, 2026 12:38

lezcano merged commit eb5efe2 into main Apr 16, 2026
23 of 27 checks passed

lezcano deleted the scatter_gather_multicast branch April 16, 2026 17:03

raymondtay pushed a commit to raymondtay/triton that referenced this pull request Apr 18, 2026

[BACKEND] Implement multiCTA support for TMA gather/scatter (triton-l…

117295e

…ang#9977) We also add tighter invariants for gather/scatter ops as well
