[BACKEND] Refactor RemoveLayoutConversion pass by ThomasRaoux · Pull Request #2181 · triton-lang/triton

ThomasRaoux · 2023-08-25T04:49:51Z

Significant changes to the pass logic: move away from greedy rewrites and use more global analysis instead. The pass is now broken down into 2 main phases. The first phase is forward propagation of layouts, starting from ops that we don't want to change and propagating to all the nodes. If there is a single layout needed for an op then we can rewrite the op; if multiple layouts are required based on dependencies, we need a tie break.
The second phase is backward propagation that gets a backward slice of operations starting from the convert and, if all the operations in the slice can be rematerialized, rewrites the slice. This backward phase now supports going through loop arguments.

This will allow more complex logic in the future, such as adding a cost model to decide which converts to leave and which to fold.
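
To make the two phases described above concrete, here is a minimal Python sketch on a toy dependency graph. It is not the code from this PR: the real pass works on MLIR operations in C++, and every name here (`GRAPH`, `ANCHORS`, `forward_propagate`, `resolve`, `backward_slice`, the layout names, the tie-break order and the rematerialization rule) is an assumption made purely for illustration.

```python
from collections import defaultdict

# Toy IR, not the real MLIR structures: op name -> operand names,
# listed in topological order. All op and layout names are hypothetical.
GRAPH = {
    "load": [],
    "splat": [],
    "dot": [],
    "exp": ["splat"],
    "mul": ["exp", "load"],
    "add": ["mul", "dot"],
}
# Ops whose layout we do not want to change (the propagation seeds).
ANCHORS = {"load": "blocked", "dot": "mma"}


def forward_propagate(graph, anchors):
    """Phase 1: push every anchor layout forward to all transitive users."""
    layouts = defaultdict(set)
    for op, layout in anchors.items():
        layouts[op].add(layout)
    for op, operands in graph.items():  # relies on topological dict order
        if op in anchors:
            continue
        for src in operands:
            layouts[op] |= layouts[src]
    return layouts


def resolve(layouts, preference):
    """Pick one layout per op; `preference` is an assumed tie-break order."""
    return {op: min(ls, key=preference.index) for op, ls in layouts.items() if ls}


def backward_slice(graph, start, can_remat):
    """Phase 2: gather the ops feeding `start`; give up (None) if any of
    them cannot be rematerialized."""
    ops, seen, stack = [], set(), [start]
    while stack:
        op = stack.pop()
        if op in seen:
            continue
        seen.add(op)
        if not can_remat(op):
            return None
        ops.append(op)
        stack.extend(graph[op])
    return ops


layouts = forward_propagate(GRAPH, ANCHORS)
chosen = resolve(layouts, preference=["mma", "blocked"])
cheap = lambda op: op not in ("load", "dot")  # assumed rematerialization rule
print(sorted(layouts["add"]), chosen["add"])
print(backward_slice(GRAPH, "exp", cheap), backward_slice(GRAPH, "mul", cheap))
```

In this toy graph `add` receives both layouts and therefore needs the tie break, while the slice rooted at `mul` is rejected because it reaches `load`, which the assumed rule treats as not rematerializable. A cost model of the kind mentioned above would replace the fixed preference list and the boolean rule.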

Jokeren · 2023-08-25T13:50:24Z

Thanks for the great work! I'll make sure to take a pass over once all tests have succeeded.

github-actions · 2023-08-25T16:59:17Z

⚠️ This PR does not produce bitwise identical kernels as the branch it's merged against. Please check artifacts for details. Download the output file here.

ThomasRaoux · 2023-08-25T16:59:47Z

> Thanks for the great work! I'll make sure to take a pass over once all tests have succeeded.

Thanks! The tests are green now, please take a look when you get a chance.

Jokeren · 2023-08-25T17:57:22Z

Have you checked the kernel performance? I asked because it doesn't produce bitwise identical kernels, just wanted to make sure perf is OK

ThomasRaoux · 2023-08-28T06:20:59Z

> Have you checked the kernel performance? I asked because it doesn't produce bitwise identical kernels, just wanted to make sure perf is OK

Right, I'm running the internal benchmarks to compare performance. In general it is on par; there are a couple of cases that are worse, which I'm debugging.

github-actions · 2023-08-28T06:37:36Z

⚠️ This PR does not produce bitwise identical kernels as the branch it's merged against. Please check artifacts for details. Download the output file here.

…outs and chains of ops (#5673)

We generalise `HoistLayoutConversion` to lift a given `convert_layout dot_operand` above any chain of operations that do not require data movement. We could generalise this further in the future to lift it over other ops. We do this as a first step to keep the code somewhat similar to the previous version.

Regarding the previous limitations of `canHoistDotOpEncV2`, I did a bit of archaeology:
- The "don't hoist past select" restriction was added in issue #2857. I ran the repro and, with the recent layout fixes, it now passes.
- The TruncOps being skipped comes from #2181. I think this is related to the hack that was removed in #5044, so it should work now.
- The same goes for `UIToFpOp`, which is supported after #5044.
- The mixed dtype hack is not necessary either, as everything now works as expected with the `convert_layout` rework.

We also add proper support for `isPure` for `elementwise_inline_asm` ops.

On the location of the code, we leave it in `RemoveLayoutConversion.cpp` to take advantage of the rather generic implementation of `rewriteSlice`. We could move this pass outside of `remove-layout-conversion`, as it's probably enough to run it once. This code will go through further changes in the near future, so we'll assess this then.
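
As a rough illustration of lifting a conversion above a chain of ops that need no data movement, the following Python sketch moves a `convert` marker up a linear list of op names. It is a toy model under stated assumptions, not the `HoistLayoutConversion` implementation: the function `hoist_convert`, the set `NO_DATA_MOVEMENT`, and the op names in it are hypothetical, and the real code operates on MLIR slices through `rewriteSlice`.

```python
# Hypothetical set of op kinds assumed to need no data movement across threads.
NO_DATA_MOVEMENT = {"ext", "trunc", "uitofp", "select", "add"}


def hoist_convert(chain):
    """Move the trailing 'convert' above the longest run of ops directly
    before it that need no data movement.

    `chain` is a list of op names in program order ending in 'convert'.
    """
    if not chain or chain[-1] != "convert":
        raise ValueError("chain must end with 'convert'")
    body = chain[:-1]
    i = len(body)
    # Walk upward while the op just above the convert is safe to cross.
    while i > 0 and body[i - 1] in NO_DATA_MOVEMENT:
        i -= 1
    return body[:i] + ["convert"] + body[i:]


print(hoist_convert(["load", "ext", "trunc", "convert"]))
print(hoist_convert(["load", "reduce", "add", "convert"]))
```

In the first call the conversion ends up directly after `load`; in the second it stops below `reduce`, which is not in the assumed set.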

ThomasRaoux requested review from Jokeren and ptillet as code owners August 25, 2023 04:49

ThomasRaoux force-pushed the refactor_remove_convert2 branch from 3a62cc5 to 2986a00 Compare August 25, 2023 05:14

ThomasRaoux mentioned this pull request Aug 25, 2023

[Optimizer][Hopper] remove extra convert layout between two dots #2179

Closed

Jokeren reviewed Aug 25, 2023

Comment thread lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp

Comment thread lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp

Comment thread lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp Outdated

Comment thread lib/Analysis/Utility.cpp

Jokeren changed the title ~~[BACKEND] Refactore RemoveLayoutConversion pass~~ [BACKEND] Refactor RemoveLayoutConversion pass Aug 25, 2023

ThomasRaoux added 3 commits August 27, 2023 23:21

fix missing IfOp case

08e55ba

Add few missing cases

f498901

ThomasRaoux force-pushed the refactor_remove_convert2 branch from 30ba028 to f498901 Compare August 28, 2023 06:21

ThomasRaoux added 2 commits August 28, 2023 15:18

hoist convert above ext ops in order to reduce the cost

bcf8bd4

Add comment based on review feedback

61c6690

ptillet approved these changes Aug 29, 2023

ptillet merged commit d4644d6 into main Aug 29, 2023

ptillet deleted the refactor_remove_convert2 branch August 29, 2023 02:05

ThomasRaoux mentioned this pull request Aug 29, 2023

implementation of flash attention fwd hangs on V100 #1567

Closed

peterbell10 mentioned this pull request Oct 11, 2023

Accuracy failure in reduction kernel #2483

Closed

lezcano mentioned this pull request Jan 23, 2025

[LAYOUTS] Generalise HoistLayoutConversion to work with arbitrary layouts and chains of ops #5673

Merged

lezcano mentioned this pull request Feb 1, 2025

[LAYOUTS] Remove HoistLayoutConversion in favour of backwardsRemat #5788

Merged

ptillet merged 5 commits into main from refactor_remove_convert2
