[Dispatch Creation] Don't fuse uses from above #22708
Conversation
Signed-off-by: Ian Wood <[email protected]>
```cpp
    cast<IREE::Flow::ReturnOp>(regionOp.getBody().front().getTerminator());
rewriter.replaceOpUsesWithIf(
    regionOp, returnOp.getOperands(), [&](OpOperand &operand) {
      return clonedTarget->isAncestor(operand.getOwner());
```
So just to confirm, `isAncestor` includes the operation itself as well.
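For what it's worth, MLIR's `Operation::isAncestor` does treat an operation as its own ancestor (`isProperAncestor` is the variant that excludes it). A minimal toy sketch of that semantics, using a hypothetical `Node` with a parent pointer rather than real MLIR types:

```cpp
#include <cassert>

// Toy stand-in for MLIR's nesting structure: each node has a parent pointer.
struct Node {
  Node *parent = nullptr;
};

// Mirrors Operation::isAncestor: true when node == other, or when node
// properly contains other (found by walking other's parent chain).
bool isAncestor(const Node *node, const Node *other) {
  for (; other != nullptr; other = other->parent)
    if (other == node)
      return true;
  return false;
}

// Mirrors Operation::isProperAncestor: excludes the node itself.
bool isProperAncestor(const Node *node, const Node *other) {
  return other != nullptr && isAncestor(node, other->parent);
}
```

So the `clonedTarget->isAncestor(operand.getOwner())` predicate above matches uses by the cloned target itself as well as uses nested anywhere inside it.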
```mlir
^bb0(%in: f32, %out: f32):
  %c0_idx = arith.constant 0 : index
  %c1_idx = arith.constant 1 : index
  %extracted = tensor.extract %matmul[%c0_idx, %c1_idx] : tensor<?x?xf32>
```
Interesting example.
First, I don't know why we are lowering it this way. This seems like an awfully strange lowering. Are we sure we know what's happening with the input lowering from torch-mlir? I don't think we want to end up with this lowering.
It does so happen that the SSA violation is being used to prevent this fusion, but we should prevent this fusion "structurally" as well, i.e. when we decide whether this is a fusable consumer, we should mark it as not fusable due to the `tensor.extract` usage in the consumer. That would also avoid the issue and seems like the more correct handling of this in my view.
EDIT: I see that you have checked that exactly below. Nice!
This test might be a bit over-simplified, making it look stranger than it is. I've lost the exact IR from CI that was causing this issue but I should be able to re-create it on #22341
```cpp
bool hasUseFromAbove = false;
mlir::visitUsedValuesDefinedAbove(
    op->getRegions(), [&](OpOperand *operand) {
      if (loopMaps.contains(operand->get().getDefiningOp())) {
```
What is `loopMaps`?
It's a map that contains all of the operations currently in the "fusion group".
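A minimal sketch of what the check above amounts to, with the fusion group modeled as a plain set of op ids rather than IREE's actual map (all names here are illustrative, not IREE's):

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy op model: an op reads producers either through explicit operands
// or through values captured from above inside its body
// (the tensor.extract case from the example).
struct Op {
  int id;
  std::vector<int> operands;  // producers used through operands
  std::vector<int> bodyUses;  // producers captured inside the op's body
};

// Mirrors the hasUseFromAbove check: reject a consumer whose *body*
// captures a value produced by an op already in the fusion group,
// since fusion is only supported through operands.
bool isFusableConsumer(const Op &op, const std::set<int> &fusionGroup) {
  for (int producer : op.bodyUses)
    if (fusionGroup.count(producer))
      return false;
  return true;
}
```

Uses through operands stay fusable; only captured body uses disqualify the consumer.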
When fusing with a consumer operation with producers, the consumer may also use values from the producer in its body (e.g., `tensor.extract`). This change prevents fusing in those cases because we only support fusion through operands.

This was discovered because `moveFollowingOpIntoDispatchRegion` was generating illegal IR: when moving a consumer into the region, it only updated the consumer's operand uses of the region, not uses nested inside the consumer's region. This PR also fixes `moveFollowingOpIntoDispatchRegion` to update all uses by the target op.

Progress on iree-org#22631

Signed-off-by: Ian Wood <[email protected]>
This change allows producers to try to fuse with all consumers. Previously, fusing with multiple consumers was only allowed if the consumers were all truncate ops; that restriction has been removed. This has some side effects that require a few other accompanying changes:

1. This PR can lead to dispatches with many ops and many operands. To prevent forming dispatches with more operands than the runtime can handle, `wouldExceedOperandLimit` was added to limit the number of operands to 16.
2. The golden times for data-tiling llama decode were slightly increased. See #22841 for more details.
3. `options.numIterations = 32` was added to more aggressively fuse multi-use elementwise ops into the same dispatch to prevent codegen issues.
4. Changed the fusion-prevention check from `IREE::LinalgExt::isBitExtendOp()` to `IREE::Flow::isClonableIntoDispatchOp()` to avoid fusing with scatter's index producer when it should be cloned instead.
5. Changed an error to a warning when `IREE::Flow::moveFollowingOpIntoDispatchRegion` fails. This can occur because `hasTransitiveDependencyOnFusionGroup` does not account for ops already moved into dispatch regions. For example, suppose A and B have no use-def relation, A has a transitive dependency on the fusion group, and B doesn't. Once A and B are placed in the same dispatch, asking whether B `hasTransitiveDependencyOnFusionGroup` must also consider all ops in that dispatch, which is currently unaccounted for.

Side note: this change was supposed to be made in #22708, but I think I merged without actually making it.

Related: #22528
Closes: #22462

ci-extra: test_torch

Signed-off-by: Ian Wood <[email protected]>
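The operand-limit guard from point 1 can be sketched as a simple count check; the name matches the PR's helper but the signature here is a guess, not IREE's actual API:

```cpp
#include <cassert>
#include <cstddef>

// The runtime can only bind a bounded number of operands per dispatch;
// the PR uses 16.
constexpr std::size_t kOperandLimit = 16;

// Hypothetical sketch of the wouldExceedOperandLimit guard: would pulling
// `newOperands` additional values into a dispatch that already captures
// `currentOperands` push it past the limit?
bool wouldExceedOperandLimit(std::size_t currentOperands,
                             std::size_t newOperands) {
  return currentOperands + newOperands > kOperandLimit;
}
```

When this predicate fires, fusion is skipped and the consumer forms (or stays in) its own dispatch instead.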