[AMD] Add basics to allow bypass LDS for dot RHS by plognjen · Pull Request #5350 · triton-lang/triton

plognjen · 2024-12-05T14:03:20Z

This pull request supersedes #4856.

The `AMDBypassLDSForDotOperandPass` implements a strategy for bypassing the
Local Data Share (LDS) for one of the operands of an MFMA dot operation.

Under certain conditions, the dot layout of one operand allows it to be loaded
directly from HBM into VGPRs in the MFMA dot layout, without losing global-load
vectorization and without increasing the number of global loads because of data
shared between threads. The required conditions are:

1. **K-Major Tensor Layout**: The operand we want to bypass LDS for must be
   K-major (i.e., row-major for operand 0 or column-major for operand 1). This
   enables vectorized global load instructions, because MFMA instructions
   require each thread to hold its operand elements contiguously along the K
   dimension.
2. **`kWidth` × element bit width == 128**: Using the maximum `kWidth` for a
   given data type (each thread holds 128 bits, i.e., 16 bytes, along K) ensures
   optimal global load vectorization (e.g., `global_load_dwordx4` instructions,
   which load 128 bits per thread).
3. **Single Warp per CTA Dimension**: Either `warpsPerCTA[nDim] == 1` for
   operand A bypass or `warpsPerCTA[mDim] == 1` for operand B bypass. This
   guarantees that each tensor element is handled by exactly one thread,
   keeping the number of global loads the same as in the blocked layout (i.e.,
   each element is loaded only once).
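The three conditions can be summarized as a single predicate. The sketch below is a hypothetical Python model, not the pass's C++ implementation or any Triton API: `DotOperandInfo`, `can_bypass_lds`, `loads_per_element` and `max_k_width` are illustrative names. It assumes Triton's `order` convention (`order[0]` is the fastest-varying dimension), a 2D `warpsPerCTA = (M, N)`, and that the dot-operand layout replicates operand A across warps along N and operand B across warps along M.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical model of the LDS-bypass conditions; illustrative only, not Triton's API.

GLOBAL_LOAD_BITS = 128  # widest per-thread load assumed: global_load_dwordx4 = 4 x 32 bits
M_DIM, N_DIM = 0, 1     # assumed indices into warps_per_cta = (warps along M, warps along N)


@dataclass(frozen=True)
class DotOperandInfo:
    op_idx: int                   # 0 = operand A (M x K), 1 = operand B (K x N)
    order: Sequence[int]          # order[0] is the fastest-varying (contiguous) dim
    k_width: int                  # elements each thread holds along K
    elem_bits: int                # element bit width, e.g. 16 for fp16
    warps_per_cta: Sequence[int]  # (warps along M, warps along N) of the MFMA layout


def k_dim(op_idx: int) -> int:
    """K is dim 1 of A ([M, K]) and dim 0 of B ([K, N])."""
    return 1 if op_idx == 0 else 0


def is_k_major(info: DotOperandInfo) -> bool:
    """Condition 1: K must be the contiguous dim (row-major A, column-major B)."""
    return info.order[0] == k_dim(info.op_idx)


def max_k_width(elem_bits: int) -> int:
    """Condition 2 solved for kWidth: kWidth * elem_bits == 128."""
    return GLOBAL_LOAD_BITS // elem_bits


def loads_per_element(info: DotOperandInfo) -> int:
    """Condition 3: A is replicated across warps along N, B across warps along M,
    so each element would be fetched once per replicating warp."""
    replicated_dim = N_DIM if info.op_idx == 0 else M_DIM
    return info.warps_per_cta[replicated_dim]


def can_bypass_lds(info: DotOperandInfo) -> bool:
    """True only if all three conditions hold."""
    return (
        is_k_major(info)
        and info.k_width * info.elem_bits == GLOBAL_LOAD_BITS
        and loads_per_element(info) == 1
    )


if __name__ == "__main__":
    # fp16 operand B, column-major, kWidth = 8, all 4 warps laid out along N.
    b = DotOperandInfo(op_idx=1, order=(0, 1), k_width=8, elem_bits=16, warps_per_cta=(1, 4))
    print(can_bypass_lds(b))  # True
    # Same operand with a 2 x 2 warp grid: each element would be loaded twice.
    b2 = DotOperandInfo(op_idx=1, order=(0, 1), k_width=8, elem_bits=16, warps_per_cta=(2, 2))
    print(can_bypass_lds(b2))  # False
    print(max_k_width(16), max_k_width(8))  # 8 16
```

Under these assumptions, fp16 gives a maximum `kWidth` of 8 and fp8 gives 16, and spreading warps along the replicated dimension is rejected because it would multiply the number of global loads per element.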


antiagainst requested changes Dec 5, 2024


Comment thread on `third_party/amd/python/triton_amd.cc` (outdated)

Comment thread on `include/triton/Tools/Sys/GetEnv.hpp` (outdated)

antiagainst reviewed Dec 5, 2024


Comment thread on `third_party/amd/lib/TritonAMDGPUTransforms/AMDBypassLDSForDotOperand.cpp` (outdated)

antiagainst mentioned this pull request Dec 5, 2024

[AMD] Add basics to allow bypass LDS for dot RHS #4856

Closed

Ognjen Plavsic and others added 11 commits January 11, 2025 23:32

initial commit

4d2203f

add test

e34e789

[AMD] Implement AMDBypassLDSForDotOperandPass pass

e0be1f0

Introduce workaround for getSizePerThreadForOperands and add some doc

072c0eb

Address review comments

06e08bf

Address second iteration of review

142807f

Address comments

cd86019

Address comments

2bb479d

resolve conflicts

2498b4a

Fix dot tests

52ba3a5

Merge remote-tracking branch 'origin/main' into bypass_lds_upstream_new

54ab236

plognjen force-pushed the bypass_lds_upstream_new branch from 036fe75 to e8369e6 on January 14, 2025 15:26

Resolve merge conflicts

e494441

plognjen force-pushed the bypass_lds_upstream_new branch from e8369e6 to e494441 on January 14, 2025 22:15

oplavsic added 2 commits January 19, 2025 18:32

Merge remote-tracking branch 'origin/main' into bypass_lds_upstream_new

137339c

Fix comments

3d36fc9

plognjen force-pushed the bypass_lds_upstream_new branch from a6bece0 to 3d36fc9 on January 19, 2025 20:32

antiagainst approved these changes Jan 22, 2025


antiagainst marked this pull request as ready for review January 22, 2025 05:51

antiagainst requested review from ptillet and zhanglx13 as code owners January 22, 2025 05:51

oplavsic added 2 commits January 22, 2025 22:41

Merge remote-tracking branch 'origin/main' into bypass_lds_upstream_new

8b790f5

Fix merge conflicts

cfd7dbd

antiagainst merged commit cea35da into triton-lang:main Jan 23, 2025

pawelszczerbuk added a commit to pawelszczerbuk/triton that referenced this pull request Jan 25, 2025

Revert "[AMD] Add basics to allow bypass LDS for dot RHS (triton-lang#5350)"

5a971fb

This reverts commit cea35da.

pawelszczerbuk added a commit to pawelszczerbuk/triton that referenced this pull request Jan 26, 2025

Revert "[AMD] Add basics to allow bypass LDS for dot RHS (triton-lang#5350)"

ed4efd5

This reverts commit cea35da.

pawelszczerbuk added a commit that referenced this pull request Jan 26, 2025

Revert "[AMD] Add basics to allow bypass LDS for dot RHS (#5350)" (#5708)

216385e

Reverting, as I have to revert cec1db5 (which this change relies on) due to regression in internal tests.

lezcano mentioned this pull request Jan 27, 2025

[LAYOUTS] Generalise HoistLayoutConversion to work with arbitrary layouts and chains of ops #5673

Merged

plognjen pushed a commit to plognjen/triton that referenced this pull request Mar 21, 2025

Revert "Revert "[AMD] Add basics to allow bypass LDS for dot RHS (triton-lang#5350)" (triton-lang#5708)"

167bdf5

This reverts commit 216385e.

plognjen pushed a commit to plognjen/triton that referenced this pull request Mar 24, 2025

Revert "Revert "[AMD] Add basics to allow bypass LDS for dot RHS (triton-lang#5350)" (triton-lang#5708)"

94fcd61

This reverts commit 216385e.

jtang10 pushed a commit to ROCm/triton that referenced this pull request Jun 27, 2025

Revert "Revert "[AMD] Add basics to allow bypass LDS for dot RHS (triton-lang#5350)" (triton-lang#5708)"

426d164

This reverts commit 216385e.

antiagainst merged 16 commits into triton-lang:main from plognjen:bypass_lds_upstream_new

plognjen commented Dec 5, 2024 (edited by antiagainst)


3 participants
