[Gluon] Add support for nv local_store_async by ThomasRaoux · Pull Request #10357 · triton-lang/triton

ThomasRaoux · 2026-05-22T15:10:03Z

No description provided.

lezcano · 2026-05-23T07:46:21Z

will review next week, but if we are adding store it'd be nice to add the load as well for symmetry

ThomasRaoux · 2026-05-25T16:00:32Z

> will review next week, but if we are adding store it'd be nice to add the load as well for symmetry

there is no load equivalent to st.async

lezcano

In a follow-up PR, we could check if the op could be lowered to cp.async.bulk.shared::cluster.shared::cta which should hopefully emit fewer instructions.

lezcano · 2026-05-26T07:36:41Z

+  if (bitwidth < 8 || bitwidth > 64 || !llvm::isPowerOf2_32(bitwidth))
+    return emitOpError("requires 8-, 16-, 32-, or 64-bit element types");


unnecessary

lezcano · 2026-05-26T07:42:25Z

+  if (failed(verifyCompletionBarrierLayout(getOperation(), getMbarrier())))
+    return failure();


this only allows a 1-CTA mbarrier, but we could be feeding a tcgen05 op, which would need a 2-CTA one. Let's remove it altogether.

lezcano · 2026-05-26T07:44:07Z


+@pytest.mark.skipif(not is_cuda() or torch.cuda.get_device_capability()[0] < 9, reason="Requires hopper or newer")
+@pytest.mark.parametrize("EXPECT_DELTA", [0, 4], ids=["match", "mismatch"])
+def test_async_shared_store_expect_bytes(EXPECT_DELTA, device, run_wrapper, monkeypatch, num_ctas):


we have a very similar test for TMA. Can you see if it's possible to merge them?

I don't see a way to cleanly merge those

lezcano · 2026-05-26T07:49:59Z

+    Value mbarrier =
+        mapSharedToCluster(storeLoc, mbarrierPtr, targetCTAId, rewriter);


This should use the mbarrierPtr associated with its peer CTA if it's in 2-CTA mode (once the verifier allows it). There is a helper to do that.

ThomasRaoux · 2026-05-26T19:12:04Z

> In a follow-up PR, we could check if the op could be lowered to cp.async.bulk.shared::cluster.shared::cta which should hopefully emit fewer instructions.

how can that be? That op copies data from shared to shared; the one here stores from registers to shared.

lezcano · 2026-05-26T20:41:44Z

ah, yes, sorry, nevermind

lezcano · 2026-05-26T20:42:25Z

also, looked alright to me, but ping @peterbell10 to review the gluon part

peterbell10 · 2026-05-27T16:30:42Z

+    bar = mbarrier.allocate_mbarrier()
+    mbarrier.init(bar, count=1)
+    mbarrier.expect(bar, smem.nbytes_per_cta)
+    hopper.async_store(smem, values, bar)


Do you know if there are any lifetime issues with the registers, similar to wgmma, or does the instruction completely finish reading the registers synchronously (via the usual SASS register dependency tracking)?

there are no lifetime issues for the registers in this case; it is fully handled by the scoreboard
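For readers unfamiliar with the `mbarrier.expect(bar, smem.nbytes_per_cta)` + `hopper.async_store(smem, values, bar)` pattern in the snippet above, the toy Python model below sketches why the "mismatch" case of `test_async_shared_store_expect_bytes` (`EXPECT_DELTA=4`) would leave a waiter stuck. It is an illustration under stated assumptions, not Triton or Gluon code: `MBarrierModel`, `arrive_expect_tx`, `complete_tx` and `run` are hypothetical names; it assumes `expect` behaves like an arrive that also raises the barrier's pending transaction-byte count, that the async store decrements that count by the bytes it writes, and that the per-CTA payload is 512 bytes (e.g. 128 fp32 elements), which is an arbitrary example value.

```python
from dataclasses import dataclass


@dataclass
class MBarrierModel:
    """Toy model of one mbarrier phase (hypothetical, not a Triton/Gluon API).

    A phase completes only once the pending arrival count AND the pending
    transaction-byte count have both reached zero.
    """
    arrival_count: int          # value passed to init(count=...)
    pending_arrivals: int = 0
    pending_tx_bytes: int = 0
    phase: int = 0

    def __post_init__(self) -> None:
        self.pending_arrivals = self.arrival_count

    def arrive_expect_tx(self, nbytes: int) -> None:
        # Assumed semantics of expect(): add nbytes to the tx count, then arrive once.
        self.pending_tx_bytes += nbytes
        self._arrive()

    def complete_tx(self, nbytes: int) -> None:
        # Assumed semantics of the async store: it signals the barrier with the
        # number of bytes it wrote into shared memory.
        self.pending_tx_bytes -= nbytes
        self._maybe_flip()

    def _arrive(self) -> None:
        self.pending_arrivals -= 1
        self._maybe_flip()

    def _maybe_flip(self) -> None:
        # The phase flips (releasing waiters) only when both counters are zero.
        if self.pending_arrivals == 0 and self.pending_tx_bytes == 0:
            self.phase ^= 1
            self.pending_arrivals = self.arrival_count


def run(expect_delta: int, nbytes_per_cta: int = 128 * 4) -> bool:
    """Return True if a waiter on phase 0 would be released in this model."""
    bar = MBarrierModel(arrival_count=1)
    bar.arrive_expect_tx(nbytes_per_cta + expect_delta)
    bar.complete_tx(nbytes_per_cta)  # the store writes exactly nbytes_per_cta bytes
    return bar.phase == 1
```

Under these assumptions, `run(0)` flips the phase, while `run(4)` leaves 4 bytes outstanding so a wait on phase 0 would never return. How ConSan actually reports the mismatch is defined by the PR's test, not by this sketch.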

ThomasRaoux added 2 commits May 21, 2026 19:02

Add Gluon async shared store support

f08a2b3

Support packed async shared stores

9ed9c47

Add async shared store ConSan test

c75874c

lezcano reviewed May 26, 2026


ThomasRaoux marked this pull request as ready for review May 27, 2026 02:36

ThomasRaoux requested review from peterbell10 and ptillet as code owners May 27, 2026 02:36

Address async shared store review comments

45f1d62

peterbell10 requested changes May 27, 2026


Address Gluon async store review comments

b4ea49e

ThomasRaoux requested a review from peterbell10 May 29, 2026 13:32

Drop async shared store PTX version check

d621f10

peterbell10 approved these changes May 29, 2026


ThomasRaoux merged commit af1bca5 into triton-lang:main May 29, 2026
10 checks passed


3 participants