Skip to content

[BACKEND] Refactor RemoveLayoutConversion pass#2181

Merged
ptillet merged 5 commits into
mainfrom
refactor_remove_convert2
Aug 29, 2023
Merged

[BACKEND] Refactor RemoveLayoutConversion pass#2181
ptillet merged 5 commits into
mainfrom
refactor_remove_convert2

Conversation

@ThomasRaoux
Copy link
Copy Markdown
Collaborator

Significant changes to the pass logic. Move away from greedy rewrites and use more global analysis instead. The pass is now bocken down into 2 main phases. First forward propagation of layout starting from ops that we don't want to change. Propagate to all the nodes. If there is a single layout needed for the op then we can rewrite the op, if there are multiple layout required based on dependency we need a tie break.
The second phase is backward propgation that gets a backward slice of operations starting from the convert and if all the operations in the slice can be rematerialized rewrite the slice. This backward phase now supports going through loop arguments.

This will allow more complex logic in the future to add a cost model to decide which convert to leave and which to fold

@Jokeren
Copy link
Copy Markdown
Contributor

Jokeren commented Aug 25, 2023

Thanks for the great work! I'll make sure to take a pass over once all tests have succeeded.

@github-actions
Copy link
Copy Markdown

⚠️ This PR does not produce bitwise identical kernels as the branch it's merged against. Please check artifacts for details. Download the output file here.

@ThomasRaoux
Copy link
Copy Markdown
Collaborator Author

Thanks for the great work! I'll make sure to take a pass over once all tests have succeeded.

Thanks! The tests are green now, please take a look when you get a chance.

Comment thread lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp
Comment thread lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp
Comment thread lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp Outdated
Comment thread lib/Analysis/Utility.cpp
@Jokeren
Copy link
Copy Markdown
Contributor

Jokeren commented Aug 25, 2023

Have you checked the kernel performance? I asked because it doesn't produce bitwise identical kernels, just wanted to make sure perf is OK

@Jokeren Jokeren changed the title [BACKEND] Refactore RemoveLayoutConversion pass [BACKEND] Refactor RemoveLayoutConversion pass Aug 25, 2023
@ThomasRaoux
Copy link
Copy Markdown
Collaborator Author

Have you checked the kernel performance? I asked because it doesn't produce bitwise identical kernels, just wanted to make sure perf is OK

Right, I'm running the internal benchmarks to compare performance, in general it is on par, there are couple cases that are worse that I'm debugging.

Significant changes to the pass logic. Move away from greedy rewrites
and use more global analysis instead. The pass is now bocken down into
2 main phases. First forward propagation of layout starting from ops
that we don't want to change. Propagate to all the nodes.
If there is a single layout needed for the op then we can rewrite the op,
if there are multiple layout required based on dependency we need a
tie break.
The second phase is backward propgation that gets a backward slice of operations
starting from the convert and if all the operations in the slice can be
rematerialized rewrite the slice. This backward phase now supports going through
loop arguments.

This will allow more complex logic in the future to add a cost model to
decide which convert to leave and which to fold
@ThomasRaoux ThomasRaoux force-pushed the refactor_remove_convert2 branch from 30ba028 to f498901 Compare August 28, 2023 06:21
@github-actions
Copy link
Copy Markdown

⚠️ This PR does not produce bitwise identical kernels as the branch it's merged against. Please check artifacts for details. Download the output file here.

@ptillet ptillet merged commit d4644d6 into main Aug 29, 2023
@ptillet ptillet deleted the refactor_remove_convert2 branch August 29, 2023 02:05
pingzhuu pushed a commit to siliconflow/triton that referenced this pull request Apr 2, 2024
Significant changes to the pass logic. Move away from greedy rewrites
and use more global analysis instead. The pass is now bocken down into 2
main phases. First forward propagation of layout starting from ops that
we don't want to change. Propagate to all the nodes. If there is a
single layout needed for the op then we can rewrite the op, if there are
multiple layout required based on dependency we need a tie break.
The second phase is backward propgation that gets a backward slice of
operations starting from the convert and if all the operations in the
slice can be rematerialized rewrite the slice. This backward phase now
supports going through loop arguments.

This will allow more complex logic in the future to add a cost model to
decide which convert to leave and which to fold
lezcano added a commit that referenced this pull request Jan 31, 2025
…outs and chains of ops (#5673)

We generalise `HoistLayoutConversion` to lift a given `convert_layout
dot_operand`
above any chain of operations that do not require data movement. We
could totally generalise this in the future to lift it over other ops.
We do
this as a first step to keep the code somewhat similar to the previous
one.

Regarding the previous limitations of `canHoistDotOpEncV2` I did a bit
of archeology:
- The "don't hoist past select" was added in this issue
#2857. I run the repro and
with the recent layout fixes, it now passes.
- The TruncOps being skipped comes from
#2181. I think this is
related with the hack that was removed in
#5044, so now it should work
- Same same for the `UIToFpOp`, this is now supported after #5044
- Mixed dtype hack is not necessary either as now everything works as
expected with the `convert_layout` rework.

We also add proper support for `isPure` for `elementwise_inline_asm` ops

On the location of the code, we just leave it in
`RemoveLayoutConversion.cpp` to
take advantage of the rather generic implementation of `rewriteSlice`.
We could totally
move this pass outside of `remove-layout-conversion`, as it's probably
enough to run
it once. This code will go through further changes in the near future,
so we'll assess this
then.
lezcano added a commit that referenced this pull request Feb 1, 2025
We generalise `HoistLayoutConversion` to lift a given `convert_layout dot_operand`
above any chain of operations that do not require data movement. We
could totally generalise this in the future to lift it over other ops. We do
this as a first step to keep the code somewhat similar to the previous
one.

Regarding the previous limitations of `canHoistDotOpEncV2` I did a bit of archeology:
- The "don't hoist past select" was added in this issue #2857. I run the repro and with the recent layout fixes, it now passes.
- The TruncOps being skipped comes from #2181. I think this is related with the hack that was removed in #5044, so now it should work
- Same same for the `UIToFpOp`, this is now supported after #5044
- Mixed dtype hack is not necessary either as now everything works as expected with the `convert_layout` rework.

We also add proper support for `isPure` for `elementwise_inline_asm` ops

On the location of the code, we just leave it in `RemoveLayoutConversion.cpp` to
take advantage of the rather generic implementation of `rewriteSlice`. We could totally
move this pass outside of `remove-layout-conversion`, as it's probably enough to run
it once. This code will go through further changes in the near future, so we'll assess this
then.
lezcano added a commit that referenced this pull request Feb 3, 2025
We generalise `HoistLayoutConversion` to lift a given `convert_layout dot_operand`
above any chain of operations that do not require data movement. We
could totally generalise this in the future to lift it over other ops. We do
this as a first step to keep the code somewhat similar to the previous
one.

Regarding the previous limitations of `canHoistDotOpEncV2` I did a bit of archeology:
- The "don't hoist past select" was added in this issue #2857. I run the repro and with the recent layout fixes, it now passes.
- The TruncOps being skipped comes from #2181. I think this is related with the hack that was removed in #5044, so now it should work
- Same same for the `UIToFpOp`, this is now supported after #5044
- Mixed dtype hack is not necessary either as now everything works as expected with the `convert_layout` rework.

We also add proper support for `isPure` for `elementwise_inline_asm` ops

On the location of the code, we just leave it in `RemoveLayoutConversion.cpp` to
take advantage of the rather generic implementation of `rewriteSlice`. We could totally
move this pass outside of `remove-layout-conversion`, as it's probably enough to run
it once. This code will go through further changes in the near future, so we'll assess this
then.
AlexAUT pushed a commit to AlexAUT/triton that referenced this pull request Feb 6, 2025
…outs and chains of ops (triton-lang#5673)

We generalise `HoistLayoutConversion` to lift a given `convert_layout
dot_operand`
above any chain of operations that do not require data movement. We
could totally generalise this in the future to lift it over other ops.
We do
this as a first step to keep the code somewhat similar to the previous
one.

Regarding the previous limitations of `canHoistDotOpEncV2` I did a bit
of archeology:
- The "don't hoist past select" was added in this issue
triton-lang#2857. I run the repro and
with the recent layout fixes, it now passes.
- The TruncOps being skipped comes from
triton-lang#2181. I think this is
related with the hack that was removed in
triton-lang#5044, so now it should work
- Same same for the `UIToFpOp`, this is now supported after triton-lang#5044
- Mixed dtype hack is not necessary either as now everything works as
expected with the `convert_layout` rework.

We also add proper support for `isPure` for `elementwise_inline_asm` ops

On the location of the code, we just leave it in
`RemoveLayoutConversion.cpp` to
take advantage of the rather generic implementation of `rewriteSlice`.
We could totally
move this pass outside of `remove-layout-conversion`, as it's probably
enough to run
it once. This code will go through further changes in the near future,
so we'll assess this
then.
makslevental pushed a commit to makslevental/triton that referenced this pull request Feb 19, 2025
…outs and chains of ops (triton-lang#5673)

We generalise `HoistLayoutConversion` to lift a given `convert_layout
dot_operand`
above any chain of operations that do not require data movement. We
could totally generalise this in the future to lift it over other ops.
We do
this as a first step to keep the code somewhat similar to the previous
one.

Regarding the previous limitations of `canHoistDotOpEncV2` I did a bit
of archeology:
- The "don't hoist past select" was added in this issue
triton-lang#2857. I run the repro and
with the recent layout fixes, it now passes.
- The TruncOps being skipped comes from
triton-lang#2181. I think this is
related with the hack that was removed in
triton-lang#5044, so now it should work
- Same same for the `UIToFpOp`, this is now supported after triton-lang#5044
- Mixed dtype hack is not necessary either as now everything works as
expected with the `convert_layout` rework.

We also add proper support for `isPure` for `elementwise_inline_asm` ops

On the location of the code, we just leave it in
`RemoveLayoutConversion.cpp` to
take advantage of the rather generic implementation of `rewriteSlice`.
We could totally
move this pass outside of `remove-layout-conversion`, as it's probably
enough to run
it once. This code will go through further changes in the near future,
so we'll assess this
then.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants