[AMD] [FA] Hoist convert_layout to dotOp for Q out of the loop #6017

Merged
zhanglx13 merged 3 commits into triton-lang:main from ROCm:hoist_cvt
Feb 26, 2025

zhanglx13 (Collaborator) commented Feb 25, 2025

This PR adds a new AMD pass that hoists convert_layout to the dotOperand layout for the Q tensor out of the loop. As a result, the Q tensor is kept in registers instead of being loaded at every iteration of the loop.

This PR achieves the same thing as #4901. However, #4901 does not hoist the local_load for Q in the epilogue, so the Q tensor stays live in shared memory the whole time.
In contrast, this PR does the hoisting before the stream-pipeline pass, so the liveness of the Q tensor in shared memory is limited to the prologue.
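
The hoisting idea can be sketched with a toy model in Python. This is not the code of the pass added in this PR: the `Op` and `Loop` classes, the `hoist_invariant_converts` helper, and the value names (`q`, `q_dot`, `k`, `k_dot`) are all hypothetical and only illustrate the general technique of moving a loop-invariant `convert_layout` in front of the loop.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str              # operation kind, e.g. "convert_layout" (toy stand-in)
    result: str            # name of the value the op produces
    operands: tuple = ()   # names of the values the op consumes


@dataclass
class Loop:
    body: list             # ops that run on every iteration


def hoist_invariant_converts(preheader, loop):
    """Move convert_layout ops out of the loop when all operands are defined outside it."""
    defined_in_loop = {op.result for op in loop.body}
    kept = []
    for op in loop.body:
        invariant = all(v not in defined_in_loop for v in op.operands)
        if op.name == "convert_layout" and invariant:
            preheader.append(op)                # now runs once, before the loop
            defined_in_loop.discard(op.result)  # its result is defined outside now
        else:
            kept.append(op)
    loop.body = kept
    return preheader, loop


# Hypothetical attention-like loop: Q is loaded once, K is loaded per iteration.
preheader = [Op("load", "q", ("q_ptr",))]
loop = Loop(body=[
    Op("convert_layout", "q_dot", ("q",)),   # invariant: q is defined before the loop
    Op("load", "k", ("k_ptr",)),
    Op("convert_layout", "k_dot", ("k",)),   # not invariant: k is loaded in the loop
    Op("dot", "qk", ("q_dot", "k_dot")),
])

preheader, loop = hoist_invariant_converts(preheader, loop)
print([op.result for op in preheader])  # values computed once
print([op.result for op in loop.body])  # values recomputed every iteration
```

In this sketch only the conversion of Q moves, because its source is defined before the loop; the conversion of K stays in the body since K changes on every iteration.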

sjw36 (Contributor) left a comment

Looks good and much more simple. Thanks!

Comment thread third_party/amd/include/TritonAMDGPUTransforms/Passes.td Outdated
Comment thread third_party/amd/include/TritonAMDGPUTransforms/Passes.td
Comment thread third_party/amd/include/TritonAMDGPUTransforms/Passes.td Outdated
Comment thread third_party/amd/lib/TritonAMDGPUTransforms/HoistLayoutConversions.cpp Outdated
Comment thread third_party/amd/lib/TritonAMDGPUTransforms/HoistLayoutConversions.cpp Outdated
@antiagainst antiagainst marked this pull request as ready for review February 26, 2025 16:00
@antiagainst antiagainst requested a review from ptillet as a code owner February 26, 2025 16:00
@zhanglx13 zhanglx13 merged commit e24d693 into triton-lang:main Feb 26, 2025
@antiagainst antiagainst deleted the hoist_cvt branch February 28, 2025 19:06
3 participants