[AMD] Enable ds_read_tr* lowering for PartitionedSharedEncodingAttr by plognjen · Pull Request #10062 · triton-lang/triton

plognjen · 2026-04-17T15:36:26Z

Extends the AMD ds_read_tr* local-load lowering to accept PartitionedSharedEncodingAttr as the source encoding.
Previously the pattern bailed out as soon as it saw a partitioned shared encoding, forcing a slower generic
local-load lowering for all WMMA dot-operand loads from partitioned LDS buffers.

plognjen · 2026-04-17T15:37:19Z

-      triton::gpu::LocalLoadOp op,
-      ::triton::AMD::TargetInfo::LDSTransLoadParams ldsParams, Location loc,
-      LinearLayout cvt,
-      SmallVector<Value> &vals, // Input for stmatrix, output for ldmatrix


Removed this comment on purpose; it looks like an accidental copy/paste from the NVIDIA path.

plognjen · 2026-04-17T15:38:13Z

+      // One ds_read_tr* instruction produces `fullTile.getInDimSize(kReg)`
+      // consecutive register values from a single LDS base pointer. We only
+      // select a partition once per instruction, so all of those register
+      // positions must map to the same partition. For a LinearLayout that holds
+      // iff the low log2(elemsPerInstr) register bases contribute 0 to
+      // kPartition. Bail out if not, so a generic lowering can take over.
+      const unsigned numInstrRegBits =
+          llvm::Log2_32(fullTile.getInDimSize(kReg));
+      for (unsigned pos = 0; pos < numInstrRegBits; ++pos) {
+        if (partitionLayout.getBasis(kReg, pos, kPartition) != 0)
+          return failure();
+      }


I should probably add the same check to the regular lowering path as well.
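For readers unfamiliar with the basis argument in the code comment above, here is a minimal toy model in Python. It is not Triton's LinearLayout API: the layout is reduced to a plain list where entry `pos` is the kPartition contribution of register bit `pos`, and all function names are hypothetical. Under that assumption, the sketch compares the "low bases are zero" check with a brute-force walk over every instruction's registers.

```python
def partition_of(reg_bases, reg):
    """Partition index of register `reg`: XOR of the bases of its set bits."""
    part = 0
    for pos, basis in enumerate(reg_bases):
        if (reg >> pos) & 1:
            part ^= basis
    return part


def low_bases_are_zero(reg_bases, elems_per_instr):
    """Toy version of the check in the patch (hypothetical name).

    Only the low log2(elems_per_instr) register bases are inspected.
    """
    num_instr_reg_bits = elems_per_instr.bit_length() - 1  # log2 of a power of two
    return all(basis == 0 for basis in reg_bases[:num_instr_reg_bits])


def each_instr_in_one_partition(reg_bases, elems_per_instr):
    """Brute force: every aligned group of registers maps to one partition."""
    num_regs = 1 << len(reg_bases)
    for start in range(0, num_regs, elems_per_instr):
        group = range(start, start + elems_per_instr)
        parts = {partition_of(reg_bases, r) for r in group}
        if len(parts) != 1:
            return False
    return True


# Assumed example layouts: 16 registers, 8 registers per instruction.
GOOD_BASES = [0, 0, 0, 1]  # only the repetition bit selects a partition
BAD_BASES = [0, 1, 0, 0]   # bit 1 inside an instruction selects a partition
```

In this model the two functions agree on both example layouts, which is the "holds iff" claim from the comment restated for a toy case.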

plognjen · 2026-04-17T15:39:10Z

-def test_runtime_partitioned_tdm_load(BLOCK_M, BLOCK_N, NUM_PARTITIONS, NUM_GROUPS, PARTITION_DIM, num_warps, M, N):
-    """Test TDM async_load with PartitionedSharedLayout (global -> LDS)."""
+@pytest.mark.parametrize("BLOCK_M,BLOCK_N,NUM_PARTITIONS,NUM_GROUPS,PARTITION_DIM", _PARTITIONED_TDM_PARAMS)
+def test_runtime_partitioned_tdm_load(BLOCK_M, BLOCK_N, NUM_PARTITIONS, NUM_GROUPS, PARTITION_DIM):


Repurposed this test so that it also checks end-to-end correctness of the ds_transpose path with a partitioned layout.

plognjen · 2026-04-17T15:40:51Z

@lezcano @nzaghen can you take a look please?

plognjen · 2026-04-17T15:48:29Z

-@pytest.mark.parametrize("num_warps", [4])
-@pytest.mark.parametrize("M,N", [(256, 256)])


No need to parametrize, since each of these takes a single value.

lezcano

I didn't read the details carefully, but the general structure looks reasonable to me. I'll let the AMD folks have a proper look at the semantics.

antiagainst

Overall LGTM; just a few impl nits.

antiagainst · 2026-04-18T22:48:23Z

                             f"partitionDim={PARTITION_DIM}, numPartitions={NUM_PARTITIONS}, numGroups={NUM_GROUPS}")


+@pytest.mark.skipif(not is_hip_gfx1250(), reason="Requires GFX1250")


Compilation-only tests don't need to be gated on is_hip_gfx1250.

antiagainst · 2026-04-18T23:08:12Z


    block_layout: ttgl.constexpr = ttgl.BlockedLayout([1, 8], [4, 8], [num_warps, 1], [1, 0])
+    WMMA_LAYOUT: ttgl.constexpr = ttgl.amd.AMDWMMALayout(3, True, [[0, 1], [1, 0]], [], [16, 16, 32])
+    OPERAND_LAYOUT: ttgl.constexpr = ttgl.DotOperandLayout(1, WMMA_LAYOUT, 8)


Nit: maybe use DOT_RHS_LAYOUT to be clearer.

antiagainst · 2026-04-18T23:09:32Z

    auto smemObj = LLVM::getSharedMemoryObjectFromStruct(loc, adaptor.getSrc(),
                                                         llvmElemTy, rewriter);
-    auto smemBase = smemObj.getBase();
+    SmallVector<Value> smemBases(smemObj.getBases().begin(),


Nit: llvm::to_vector(smemObj.getBases())?

antiagainst · 2026-04-18T23:19:22Z

+      // kPartition. Bail out if not, so a generic lowering can take over.
+      const unsigned numInstrRegBits =
+          llvm::Log2_32(fullTile.getInDimSize(kReg));
+      for (unsigned pos = 0; pos < numInstrRegBits; ++pos) {


This is just checking partitionLayout.sublayoutIsZero({kReg}, {kPartition})?

No, because that would check all of the partition layout's register bases, which include repetitions. We only want to check the first numInstrRegBits bases, which cover the registers of the fullTile layout, i.e. a single instruction. It's fine for different repetitions (instructions) to land in different partitions; what we need to rule out is registers from a single instruction landing in different partitions.

For reference, if you want to do it without looking at the bases, you can reshape kReg into two dimensions, one spanning the low numInstrRegBits bits and another for the rest, and check sublayoutIsZero there. But tbh I wouldn't rewrite it; the current solution seems alright.

Yeah, makes sense. Thanks for the explanation.
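The difference between the whole-sublayout check and the per-instruction check discussed above can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the real `sublayoutIsZero` or reshape API: the kReg to kPartition mapping is a plain list of bases, and `split_reg_dim` is a hypothetical stand-in for reshaping kReg into an in-instruction dimension and a repetition dimension.

```python
def sublayout_is_zero(bases):
    """Toy stand-in: true when no basis contributes to kPartition."""
    return all(basis == 0 for basis in bases)


def split_reg_dim(reg_bases, num_instr_reg_bits):
    """Hypothetical reshape of kReg into (in-instruction, repetition) bases."""
    return reg_bases[:num_instr_reg_bits], reg_bases[num_instr_reg_bits:]


# Assumed layout: 3 low bits index registers within one instruction,
# 2 high bits index repetitions, and only repetitions change the partition.
REG_BASES = [0, 0, 0, 1, 2]
NUM_INSTR_REG_BITS = 3

in_instr_bases, repetition_bases = split_reg_dim(REG_BASES, NUM_INSTR_REG_BITS)

# Checking all of kReg rejects this layout, because repetitions select partitions.
whole_check = sublayout_is_zero(REG_BASES)

# Checking only the in-instruction part accepts it, matching the loop in the patch.
per_instr_check = sublayout_is_zero(in_instr_bases)
```

In this example the whole-kReg check would bail out unnecessarily, while the per-instruction check accepts the layout, which is the distinction raised in the reply.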

plognjen requested review from antiagainst, ptillet and zhanglx13 as code owners April 17, 2026 15:36

plognjen commented Apr 17, 2026

lezcano reviewed Apr 17, 2026

antiagainst approved these changes Apr 18, 2026

oplavsic added 2 commits April 20, 2026 11:04

Enable LDS transpose path for tensors with PartitionedSharedEncodingAttr

a539692

Address review comments

7e89e33

plognjen force-pushed the ds_transpose_partitioned branch from 154d54f to 7e89e33 Compare April 20, 2026 11:04

Merge branch 'main' into ds_transpose_partitioned

eaea693

antiagainst approved these changes Apr 20, 2026

antiagainst enabled auto-merge (squash) April 20, 2026 17:43

antiagainst merged commit ee5bc26 into triton-lang:main Apr 20, 2026
15 of 18 checks passed