[AMD] [FA] Hoist convert_layout to dotOp for Q out of the loop#6017
Merged
zhanglx13 merged 3 commits into triton-lang:main on Feb 26, 2025
Conversation
sjw36
approved these changes
Feb 25, 2025
Contributor
sjw36
left a comment
Looks good and much simpler. Thanks!
This PR adds a new AMD pass that hoists the convert_layout to dotOperand layout for the Q tensor out of the loop. As a result, the Q tensor is kept in registers instead of being loaded from shared memory at every iteration of the loop. This PR achieves the same goal as triton-lang#4901. However, triton-lang#4901 does not hoist the local_load for Q in the epilogue, so the Q tensor stays live in shared memory for the whole kernel. This PR, on the other hand, applies the transformation before the stream-pipeline pass, so the liveness of the Q tensor in shared memory is limited to the prologue.
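The core idea is classic loop-invariant code motion: an op whose operands are all defined before the loop (such as the convert_layout on Q) can be moved into the prologue. The sketch below illustrates this on a toy op representation; the op names and the `hoist_invariant` helper are illustrative, not the actual MLIR pass or Triton IR.

```python
# Minimal sketch of the hoisting idea behind this pass (hypothetical op
# names; not the real MLIR/Triton IR or pass implementation).
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    operands: list = field(default_factory=list)  # names of values read
    result: str = ""                              # name of value produced


def hoist_invariant(prologue_defs, loop_body):
    """Split loop_body into (hoisted, remaining): ops whose operands are
    all defined outside the loop are moved to the prologue."""
    known = set(prologue_defs)
    hoisted, remaining = [], []
    for op in loop_body:
        if all(v in known for v in op.operands):
            hoisted.append(op)
            known.add(op.result)  # later ops may use the hoisted result
        else:
            remaining.append(op)
    return hoisted, remaining


# Q is loaded before the loop; convert_layout(Q) only reads Q, so it
# hoists into the prologue, while the per-iteration K load and dot stay.
body = [
    Op("convert_layout", ["q"], "q_dot"),  # loop-invariant
    Op("load_k", ["k_ptrs"], "k"),         # varies per iteration
    Op("dot", ["q_dot", "k"], "qk"),
]
hoisted, remaining = hoist_invariant({"q"}, body)
print([op.name for op in hoisted])    # ['convert_layout']
print([op.name for op in remaining])  # ['load_k', 'dot']
```

Doing this before the stream-pipeline pass is what limits Q's shared-memory liveness to the prologue: once the conversion is hoisted, the pipeliner never sees a per-iteration local_load of Q to schedule.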
antiagainst
requested changes
Feb 26, 2025
antiagainst
reviewed
Feb 26, 2025
antiagainst
approved these changes
Feb 26, 2025