[AMD] f16_gemm gluon kernel improve pipeline by guacamoleo · Pull Request #10057 · triton-lang/triton

guacamoleo · 2026-04-16T18:54:11Z

Improve f16 gemm gfx1250-gluon performance. This improves gemm_tdm_pipelined_single_warp_per_simd_schedule_kernel by issuing tdm.load earlier: it moves from the top of the loop (which hides three quarters of a loop iteration's worth of cycles) to right after the wait (which hides a full loop iteration's worth of cycles).
This only fixes the kernel named above; other kernels need to be benchmarked and improved independently.
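
To make the 3/4 versus full-iteration claim concrete, here is a toy model in plain Python. It is only a sketch and does not use the Gluon API. It assumes the wait sits three quarters of the way into the loop body and that each load is consumed at the next wait; neither detail is stated in this PR, and `hidden_fraction` and `WAIT_POS` are hypothetical names used for illustration.

```python
from fractions import Fraction


def hidden_fraction(issue_pos, wait_pos):
    """Fraction of one loop iteration between issuing a load and the next wait.

    Positions are fractions of the loop body in [0, 1). A load issued at the
    wait position is treated as issued right after the wait, so it gets a
    full iteration before the next wait.
    """
    gap = (wait_pos - issue_pos) % 1
    return gap if gap != 0 else Fraction(1)


# Assumption: the wait sits 3/4 of the way into the loop body.
WAIT_POS = Fraction(3, 4)

# Before: load issued at the top of the loop (position 0).
top_of_loop = hidden_fraction(Fraction(0), WAIT_POS)

# After: load issued right after the wait.
after_wait = hidden_fraction(WAIT_POS, WAIT_POS)

print(top_of_loop, after_wait)
```

Under these assumptions the load issued at the top of the loop has three quarters of an iteration to complete before the wait, while the load issued right after the wait has a full iteration.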

guacamoleo · 2026-04-16T19:39:32Z

Converting to draft while I verify that gfx1250 tests pass.

guacamoleo · 2026-04-17T19:29:06Z

Ready for review; I've verified that the kernel tests pass, and that this change alone improves performance significantly.

guacamoleo requested review from antiagainst and zhanglx13 as code owners April 16, 2026 18:54

guacamoleo marked this pull request as draft April 16, 2026 19:38

guacamoleo marked this pull request as ready for review April 17, 2026 19:25

Moving tdm.load earlier; from the top of the loop to right after the wait

849ac59

guacamoleo force-pushed the dtanner/f16_gemm_pipeline branch from 9ab90c1 to 849ac59 Compare April 17, 2026 19:45

antiagainst approved these changes Apr 17, 2026

antiagainst merged commit 0ee2ec2 into triton-lang:main Apr 18, 2026
15 of 18 checks passed

antiagainst deleted the dtanner/f16_gemm_pipeline branch April 18, 2026 06:03

antiagainst merged 1 commit into triton-lang:main from ROCm:dtanner/f16_gemm_pipeline
