[AMD][Backend] Add OptimizeDescriptorEncoding pass for AMDGPU #9792
Conversation
- Introduces the pass, re-using `AssignDescriptorMemoryLayouts`
- Better handling of descriptor lowerings
- Moves shared encoding assignment for descriptor loads with dot operands from LowerLoops to this pass
- Fixes rank-reducing descriptor loads
- Fixes descriptors passed as kernel args
```cpp
auto encoding = type.getEncoding();
assert(isPaddedEncoding(encoding) &&
       "expected padded encoding or partitioned wrapping padded");
if (auto padded = dyn_cast<PaddedSharedEncodingAttr>(encoding)) {
```
how can a tensor have a padded layout?
We introduced this function to handle rank-reducing loads with descriptors. When lowering the TDM copy in LoadStoreOpToLLVM, we need the layout of the descriptor's block type to properly configure the TDM copy operation. This matters especially for rank-reducing loads, since it is the block type, not the result tensor, that carries the layout information TDM copy requires, and this function extracts its linear component. The padded encoding comes from the descriptor's block type.
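For concreteness, here is a minimal sketch of the helper being discussed (the name `getBlockLinearLayout` is hypothetical; `isPaddedEncoding` is the local check from the diff above, and `toLinearLayout` is the conversion entry point in `LinearLayoutConversions.h`):

```cpp
#include "triton/Dialect/TritonGPU/IR/LinearLayoutConversions.h"

using namespace mlir;

// Hypothetical helper: recover the linear layout of a descriptor's block
// type so it can drive the TDM copy configuration. For rank-reducing loads
// it is the block type, not the result tensor, that carries the full-rank
// shape and the padded shared encoding.
triton::LinearLayout getBlockLinearLayout(triton::TensorDescType descTy) {
  RankedTensorType blockTy = descTy.getBlockType();
  Attribute encoding = blockTy.getEncoding();
  assert(isPaddedEncoding(encoding) &&
         "expected padded encoding or partitioned wrapping padded");
  return triton::gpu::toLinearLayout(blockTy.getShape(), encoding);
}
```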
a tensor should not have a shared layout. We should check this in the verifier.
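A sketch of what that check could look like (assuming shared layouts implement a common `SharedEncodingTrait` attribute interface):

```cpp
// Hypothetical verifier snippet: tensor *values* must not carry a shared
// layout; shared layouts belong on memory descriptors such as the
// descriptor's block type.
if (auto tensorTy = dyn_cast<RankedTensorType>(value.getType()))
  if (isa_and_nonnull<triton::gpu::SharedEncodingTrait>(
          tensorTy.getEncoding()))
    return op->emitOpError("tensor values may not use a shared layout");
```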
The descriptor can have a shared layout to represent the destination layout, but surely the tensor wouldn't. I'm confused how this works.
AFAICT, the descriptor type just wraps a RankedTensorType inside, and we attach the planned shared memory encoding to that inner ranked tensor type; so seemingly we have a shared layout on a tensor type here. This is how it's done on the NVIDIA side too, as shown in the tests. For AMD we are just using `#ttg.padded_shared` instead of `#ttg.nvmma_shared`.
@sriakrish is OOO right now. I changed the API here to take the shape and encoding separately to make this a bit less confusing, given `LinearLayoutConversions.h` is more generic.
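Roughly, the call site goes from passing the encoded tensor type to passing the pieces separately (the old single-type overload shown for contrast is an assumption):

```cpp
// Before (assumed): the block type, a RankedTensorType carrying the shared
// encoding, was passed in wholesale.
triton::LinearLayout before = triton::gpu::toLinearLayout(blockTy);

// After: shape and encoding are separate arguments, so nothing has to read
// as "a tensor with a shared layout".
triton::LinearLayout after =
    triton::gpu::toLinearLayout(blockTy.getShape(), blockTy.getEncoding());
```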
Right, I think we may want to move the shared encoding attribute to be directly attached to the descriptor type itself to better disambiguate? I can create another refactoring pull request if we agree that's better.
I think maybe there is a misunderstanding: the `RankedTensorType` with a shared encoding is never actually the type of an IR value; it's just a component of the `TensorDescType`. I did it this way simply because `RankedTensorType` already has all the attributes as well as the printing and parsing logic. We can clean this up if you guys prefer, but it's purely cosmetic IMO.
E.g. currently the IR looks like `!tt.tensordesc<tensor<16x128xf32, #shared>>`, but instead it could be `!tt.tensordesc<16x128xf32, #shared>` and not involve `RankedTensorType` at all.
+1, sounds much better. I didn't realize we had the ranked tensor in there right now.
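For reference, a minimal sketch of how the wrapping works today per the discussion (the builder calls follow standard MLIR conventions; the exact `TensorDescType::get` signature is an assumption):

```cpp
// The inner RankedTensorType never types an IR value; it only bundles shape,
// element type, and shared encoding so the descriptor type can reuse its
// storage and printing/parsing logic.
auto blockTy = RankedTensorType::get({16, 128}, builder.getF32Type(),
                                     paddedSharedEncoding);
auto descTy = triton::TensorDescType::get(builder.getContext(), blockTy);
```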
antiagainst left a comment:
The major concern is peeled out to #9851, which will be resolved separately. I have reviewed the rest internally earlier, so I'll land this for now to move forward. If there are any further comments we can address them post-commit. :)
…-lang#9792)

Co-authored-by: Lei Zhang <antiagainst@gmail.com>
New contributor declaration
- I am not making a trivial change, such as fixing a typo in a comment.
- I have written a PR description following these rules.
- I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - This PR does not need a test because `FILL THIS IN`.
- Select one of the following.
  - I have not added any `lit` tests.
  - The `lit` tests I have added follow these best practices, including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)