
Cherry picks for 3.0.x release #4159

Closed
amjames wants to merge 8 commits into triton-lang:release/3.0.x from amjames:cherry-pick-release

Conversation

amjames and others added 8 commits June 17, 2024 22:22
In the current implementation, when backward rematerialization encounters a
loop argument that has already been rematerialized, it short-circuits the
collection of yield operations but leaves the value in the slice. If another
loop argument is present in the same slice, the loop is collected again,
duplicating the first argument without generating the corresponding yield.

To address this, the fix removes skipped values from the slice so they are not
collected a second time, keeping the number of yield operands in sync with the
number of loop iter_args.
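The shape of the fix, sketched as Python pseudocode (the actual pass is C++; every name below is an illustrative stand-in, not Triton's real API):

```python
# Hypothetical sketch of the fixed collection loop; `slice_values` is a set of
# values in the backward slice, `yield_for` maps a value to its yield operand.
def collect_yields(slice_values, already_rematerialized, yield_for):
    yields = []
    for v in list(slice_values):
        if v in already_rematerialized:
            # The fix: a skipped value must also leave the slice; otherwise a
            # second loop argument in the same slice re-collects it without a
            # matching yield, desynchronizing yields and loop iter_args.
            slice_values.discard(v)
            continue
        yields.append(yield_for(v))
    return yields
```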
These math functions are used by PyTorch Inductor but are missing from the
current list of HIP libdevice functions.
… it is not needed (triton-lang#3790)

This PR:
- moves the shortcut check earlier, so the scratch buffer shape is not
computed when it is not needed (see the sketch after this list)
- raises the priority of AMD-specific conversions above the common ones,
removing any ambiguity about which pattern applies
- adds a regression test for the MFMA to Dot Op shortcut
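A minimal sketch of the reordering from the first item, with the check and emit steps passed in as callables (hypothetical names; the real change lives in Triton's AMD layout-conversion lowering):

```python
# Hypothetical sketch: run the cheap shortcut check before doing any work
# toward the general path.
def convert_layout(src, dst, is_shortcut, emit_shortcut, scratch_shape, emit_general):
    if is_shortcut(src, dst):
        # MFMA -> Dot Op shortcut: no scratch buffer, so its shape is never computed.
        return emit_shortcut(src, dst)
    shape = scratch_shape(src, dst)  # computed only when actually needed
    return emit_general(src, dst, shape)
```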
This PR enables denorm flushing for `tl.math.exp2` and preserves denorms for
`tl.math.exp`, matching their behavior on the NVIDIA backend.

More specifically,
- denorm flushing for tl.math.exp2 with f32 inputs is controlled by
`__CUDA_FTZ` or `__HIP_FTZ`, and the default is to flush denorms. These flags
can be set by developers, but they are not exposed as a kernel argument.

| tl.math.exp2(f32) | NV | NV | AMD | AMD |
| -- | -- | -- | -- | -- |
| control flag | __CUDA_FTZ=1 (default) | __CUDA_FTZ=0 | __HIP_FTZ=1 (default) | __HIP_FTZ=0 |
| device lib | __nv_exp2f | __nv_exp2f | | |
| llvm intrinsics | llvm.nvvm.ex2.approx.ftz.f | llvm.nvvm.ex2.approx.f | llvm.amdgcn.exp2.f32 | llvm.exp2.f32 |
| ptx | ex2.approx.ftz.f32 | ex2.approx.f32 | | |
| sass/amdgcn | MUFU.EX2 | MUFU.EX2, plus instructions to check and adjust for denorms | v_exp_f32 | v_exp_f32, plus instructions to check and adjust for denorms |
- denorms are preserved for tl.math.exp2 with f64 inputs

| tl.math.exp2(f64) | NV | AMD |
| -- | -- | -- |
| device lib | __nv_exp2 | __ocml_exp2_f64 |
- denorms are preserved for tl.math.exp with both f32 and f64 inputs. Note
that tl.math.exp(f32) on the NV path is lowered directly to inline PTX
without the `.ftz` flag.

| tl.math.exp(f32) | NV | AMD |
| -- | -- | -- |
| llvm intrinsics | | llvm.exp2.f32 |
| ptx | ex2.approx.f32 | |


| tl.math.exp(f64) | NV | AMD |
| -- | -- | -- |
| device lib | __nv_exp | __ocml_exp_f64 |
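For reference, a minimal kernel exercising both functions (standard Triton API; the comments restate the behavior documented in the tables above):

```python
import triton
import triton.language as tl

@triton.jit
def exp_kernel(x_ptr, out2_ptr, out_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    x = tl.load(x_ptr + offs, mask=mask)
    # f32 exp2 flushes denormal results to zero by default
    # (__CUDA_FTZ / __HIP_FTZ default to 1).
    tl.store(out2_ptr + offs, tl.math.exp2(x), mask=mask)
    # exp preserves denorms on both backends.
    tl.store(out_ptr + offs, tl.math.exp(x), mask=mask)
```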
The TritonGPUPipeline pass has unused pass options, and the
TritonGPUAccelerateMatmul pass option could instead be read from the module
attributes, where the data already exists. The goal is to reduce redundancy.

---------

Signed-off-by: Finlay Marno <finlay.marno@codeplay.com>
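A sketch of the direction, assuming a dict-like view of the module attributes (the attribute name below is an assumption modeled on TTGIR's module-level attributes; the real passes do this in C++):

```python
# Hypothetical sketch: read the value where it already lives on the module
# instead of duplicating it as a pass option.
def get_compute_capability(module_attrs, default=80):
    # The attribute name here is an assumption, not necessarily the one the
    # pass actually reads.
    return module_attrs.get("triton_gpu.compute-capability", default)
```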
This will enable prefetching for mma-v2 dots on H100.

---------

Co-authored-by: Manman Ren <mren@fb.com>
…patibility (triton-lang#4049)

The dictionary merge operator (`|`) was introduced in Python 3.9, and
unfortunately PyTorch still supports 3.8. I think for this use case there is
no downside to unpacking, other than that it's a bit ugly.
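For context, the two spellings side by side (the dict contents are just an example):

```python
defaults = {"num_warps": 4}
overrides = {"num_stages": 3}

merged = defaults | overrides        # PEP 584 dict union: Python 3.9+ only
merged = {**defaults, **overrides}   # equivalent unpacking: also works on 3.8
```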

amjames commented Jun 18, 2024

I will break this up into individual commits / dependent groups

amjames closed this Jun 18, 2024