
[Dispatch Creation] Don't fuse uses from above#22708

Merged
IanWood1 merged 1 commit into iree-org:main from IanWood1:dispatch_creation_uses_from_above
Nov 20, 2025

Conversation


@IanWood1 IanWood1 commented Nov 20, 2025

When fusing a consumer operation with its producers, the consumer may also use values from the producer in its body (e.g., `tensor.extract`). This change prevents fusion in those cases because we only support fusion through operands.

This was discovered because `moveFollowingOpIntoDispatchRegion` was generating illegal IR: when moving a consumer into the region, it updated only the consumer's operands, not uses of the region's results nested inside the consumer's own regions. This PR also fixes `moveFollowingOpIntoDispatchRegion` to update all uses by the target op.
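To make the failure mode concrete, here is a minimal hypothetical sketch (simplified, with made-up value names — not the exact IR from CI): the dispatch region's result is used both as an operand of the consumer and from within the consumer's body.

```mlir
// Hypothetical sketch: %result feeds the consumer both as an operand
// and via a capture inside its body.
%result = flow.dispatch.region -> (tensor<?x?xf32>{%d0, %d1}) {
  %matmul = linalg.matmul ins(%a, %b : tensor<?x?xf32>, tensor<?x?xf32>)
      outs(%init : tensor<?x?xf32>) -> tensor<?x?xf32>
  flow.return %matmul : tensor<?x?xf32>
}
%consumer = linalg.generic {indexing_maps = [...], iterator_types = [...]}
    ins(%result : tensor<?x?xf32>) outs(%init2 : tensor<?x?xf32>) {
^bb0(%in: f32, %out: f32):
  // A "use from above": %result is captured by the consumer's body
  // rather than passed through an operand.
  %e = tensor.extract %result[%c0, %c1] : tensor<?x?xf32>
  %sum = arith.addf %in, %e : f32
  linalg.yield %sum : f32
} -> tensor<?x?xf32>
```

In this sketch, moving the consumer into the region and remapping only the `ins` operand to `%matmul` would leave the captured `tensor.extract %result` pointing at the region's own result — the kind of illegal IR described above.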

Progress on #22631

Signed-off-by: Ian Wood <[email protected]>
@MaheshRavishankar (Collaborator) left a comment


Nicely done!

auto returnOp = cast<IREE::Flow::ReturnOp>(
    regionOp.getBody().front().getTerminator());
rewriter.replaceOpUsesWithIf(
    regionOp, returnOp.getOperands(), [&](OpOperand &operand) {
      return clonedTarget->isAncestor(operand.getOwner());
    });
Collaborator

So just to confirm, `isAncestor` includes the operation itself as well?

Contributor Author

Yes!

^bb0(%in: f32, %out: f32):
%c0_idx = arith.constant 0 : index
%c1_idx = arith.constant 1 : index
%extracted = tensor.extract %matmul[%c0_idx, %c1_idx] : tensor<?x?xf32>
Collaborator

Interesting example.

First, I don't know why we are lowering it this way. This seems like an awfully strange lowering. Are we sure we know what's happening with the input lowering from torch-mlir? I don't think we want to end up with this lowering.

It does so happen that the SSA violation is being used to prevent this fusion, but we should prevent this fusion "structurally" as well, i.e., when we decide whether this is a fusable consumer, we should mark it as not fusable due to the `tensor.extract` usage in the consumer. That would also avoid the issue and seems like the more correct handling of this in my view.

Collaborator

EDIT: I see that you have checked that exactly below. Nice!

Contributor Author

This test might be a bit over-simplified, making it look stranger than it is. I've lost the exact IR from CI that was causing this issue, but I should be able to re-create it on #22341.

bool hasUseFromAbove = false;
mlir::visitUsedValuesDefinedAbove(
    op->getRegions(), [&](OpOperand *operand) {
      if (loopMaps.contains(operand->get().getDefiningOp())) {
        hasUseFromAbove = true;
      }
    });
Collaborator

What is `loopMaps`?

Contributor Author

It's a map that contains all of the operations currently in the "fusion group".

@IanWood1 IanWood1 merged commit 21c1d55 into iree-org:main Nov 20, 2025
44 of 47 checks passed
bangtianliu pushed a commit to bangtianliu/iree that referenced this pull request Nov 25, 2025
pstarkcdpr pushed a commit to pstarkcdpr/iree that referenced this pull request Nov 28, 2025
IanWood1 added a commit that referenced this pull request Dec 8, 2025
This change allows producers to try to fuse with all consumers.
Previously, fusing with multiple consumers was only allowed if the
consumers were all truncate ops; this restriction has been removed.

This has some side effects that require a few other accompanying
changes:
1. This PR can lead to dispatches with many ops and many operands to the
dispatch. To prevent forming dispatches with more operands than the
runtime can handle, `wouldExceedOperandLimit` was added to limit the
number of operands to 16.
2. The golden times for datatiling llama decode were slightly increased.
See #22841 for more details.
3. `options.numIterations = 32` was added to more aggressively fuse
multi-use elementwise ops in the same dispatch to prevent codegen
issues.
4. Changed the check used to prevent fusion from
`IREE::LinalgExt::isBitExtendOp()` to
`IREE::Flow::isClonableIntoDispatchOp()`, to avoid fusing with
scatter's index producer when it should be cloned instead.
5. Changed the error to a warning when
`IREE::Flow::moveFollowingOpIntoDispatchRegion` fails. This can occur
because `hasTransitiveDependencyOnFusionGroup` does not account for ops
already moved into dispatch regions. For example, suppose A and B have
no use-def relation, A has a transitive dependency on the fusion group,
and B doesn't. Once A and B are placed in the same dispatch, asking
whether B `hasTransitiveDependencyOnFusionGroup` must also consider all
the ops in that dispatch, which is currently unaccounted for. Side note:
this change was supposed to be made in #22708, but I think I merged
without actually making it.

Related: #22528
Closes: #22462

ci-extra: test_torch

---------

Signed-off-by: Ian Wood <[email protected]>
keshavvinayak01 pushed a commit that referenced this pull request Jan 27, 2026
