[Backend] Add a shared layout for padding #7212

Merged

antiagainst merged 26 commits into triton-lang:main from antiagainst:padded-shared
Jun 20, 2025
Conversation

@antiagainst
Member

This commit adds a new shared memory layout for padding.
Padding cannot be represented with a linear layout, so we need
to define it in parallel with the swizzled shared layout.

Intermediate lowering steps do not actually need to concern
themselves with the exact padding; only when making the
1-D physical allocation and creating pointers for indexing
do we need to factor the padding in. This means we can
leverage the existing linear layout facilities for reasoning
about the element mapping.

Comment thread lib/Tools/LinearLayout.cpp Outdated
@antiagainst antiagainst marked this pull request as ready for review June 18, 2025 16:39
return get(context, intervals, paddings, order, ctaLayout);
}]>,
AttrBuilder<(ins "ArrayRef<int64_t>":$shape, "ArrayRef<unsigned>":$order,
"unsigned":$dotKWidth, "unsigned":$elemBitWidth,
Collaborator

can we move those builders to helper functions instead? We have that for the other shared layouts and it's super annoying (and we have been meaning to clean it up)

Collaborator

for context, it's annoying because builders are not expected to contain logic that decides on the layout

Member Author

Yeah good point. Done with a66fa0d.

Collaborator

sorry, my comment should have been clearer. What I meant is that adding a builder that takes dotKWidth and other such parameters is confusing, as it contains logic for avoiding bank conflicts based on the register layout. That makes the code very hard to read, since builders are usually expected to be simple and not contain logic related to bank conflicts or other such considerations. My suggestion was to make this an explicit function that calls into the default builder.

Member Author

@antiagainst antiagainst Jun 20, 2025

Yeah, makes sense. I don't need this builder right away. (It was added because I also enabled the pipeliner on the AMD side to emit this layout, just to try out correctness with b622870, but I reverted that to keep this pull request focused on the core changes.) So I just dropped it with 25221f4 and can do it properly later when needed.

Comment thread lib/Tools/LinearLayout.cpp Outdated
Contributor

@Jokeren Jokeren left a comment


In which cases should we still use PaddedSharedEncoding but not the swizzled layout?

@antiagainst
Member Author

In which cases should we still use PaddedSharedEncoding but not the swizzled layout?

One example is CDNA4, where we have a global-load-direct-to-LDS (i.e. shared memory) instruction. However, that instruction does not support scattered writes into LDS--the whole warp uses one m0 register to describe the base offset (see 3.7. M0 Memory Descriptor in https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-cdna4-instruction-set-architecture.pdf), and the whole warp can only write consecutive banks. So we cannot do a swizzled write the normal way. Instead we need to "reverse"-swizzle the global pointers when loading from global memory. That "reverse" swizzle introduces overhead, because we need to do warp shuffles with ds_permute to exchange global pointers, etc., which is a source of performance issues. Using a padded layout avoids that cost.

antiagainst referenced this pull request in antiagainst/triton Jun 19, 2025
Contributor

@lezcano lezcano left a comment


Amazing! Just a minor point (and see Thomas' point) but otherwise looks great!

Comment on lines +265 to +268
if (auto paddedLayout =
dyn_cast<gpu::PaddedSharedEncodingAttr>(allocType.getEncoding())) {
SmallVector<int64_t> unpaddedShape = gpu::getShapePerCTA(allocType);
numElems = paddedLayout.getPaddedSize(unpaddedShape);
Contributor

It might be better to do it inside getAllocationShapePerCTA

Member Author

I actually was trying to do that. Then I realized it's not really compatible--getAllocationShapePerCTA assumes the original ranked shape, while after factoring in padding we fundamentally only have a 1-D size. Also, getAllocationShapePerCTA is used in quite a few places that assume the original rank. So I ended up doing it this way, given that we only care about the exact physical memory when doing the allocation or the final pointer indexing.

Contributor

actually makes sense.

Contributor

@lezcano lezcano left a comment

LGTM but let's wait for Thomas' ok


Collaborator

@ThomasRaoux ThomasRaoux left a comment

LGTM

@antiagainst antiagainst merged commit 526d168 into triton-lang:main Jun 20, 2025
9 checks passed
@antiagainst antiagainst deleted the padded-shared branch June 20, 2025 22:09
antiagainst pushed a commit that referenced this pull request Jul 7, 2025
For padded layouts introduced by
#7212 we need to add the
padding to the base ptr of the resulting subview.
dshi7 pushed a commit to facebookexperimental/triton that referenced this pull request Aug 8, 2025
For padded layouts introduced by
triton-lang/triton#7212 we need to add the
padding to the base ptr of the resulting subview.
tie-pilot-qxw pushed a commit to tie-pilot-qxw/triton that referenced this pull request Aug 30, 2025