
Conversation

@IanWood1
Contributor

@IanWood1 commented Sep 16, 2025

This change allows producers to try to fuse with all consumers. Previously, fusing with multiple consumers was only allowed when the consumers were all truncate ops; that restriction has been removed.

This has some side effects that require a few other accompanying changes:

  1. This PR can produce dispatches with many ops and, consequently, many operands. To prevent forming dispatches with more operands than the runtime can handle, `wouldExceedOperandLimit` was added to cap the operand count at 16 (first sketch after this list).
  2. The golden times for datatiling llama decode were slightly increased. See #22841 ([GPU] VectorDistribute `iree_encoding.set_encoding` performance) for more details.
  3. `options.numIterations = 32` was added so that multi-use elementwise ops are fused more aggressively into the same dispatch, preventing codegen issues.
  4. Changed the check that blocks producer fusion from `IREE::LinalgExt::isBitExtendOp()` to `IREE::Flow::isClonableIntoDispatchOp()`, so that scatter's index producer is cloned into the dispatch rather than fused (second sketch below).
  5. Changed the error to a warning when `IREE::Flow::moveFollowingOpIntoDispatchRegion` fails. This can happen because `hasTransitiveDependencyOnFusionGroup` does not account for ops already moved into dispatch regions. For example, suppose A and B have no use-def relation, A has a transitive dependency on the fusion group, and B does not. Once A and B are placed in the same dispatch, asking whether B has a transitive dependency on the fusion group must also consider every op in that dispatch, which is currently unaccounted for (third sketch below). Side note: this change was supposed to be made in #22708 ([Dispatch Creation] Don't fuse uses from above), but I think I merged without actually making it.
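
To make item 1 concrete, here is a minimal sketch of what an operand-limit guard like `wouldExceedOperandLimit` can look like. The helper name (suffixed `Sketch`), the counting logic, and the exact cap handling are illustrative assumptions, not the actual IREE implementation:

```cpp
#include "llvm/ADT/SetVector.h"
#include "mlir/IR/Operation.h"

// Illustrative cap mirroring the limit of 16 described above.
constexpr unsigned kMaxOperandCount = 16;

// Returns true if fusing `producer` into `dispatchOp` would leave the
// merged dispatch capturing more external values (i.e. future dispatch
// operands) than the runtime can handle.
static bool wouldExceedOperandLimitSketch(mlir::Operation *dispatchOp,
                                          mlir::Operation *producer) {
  llvm::SetVector<mlir::Value> externalValues;
  auto collect = [&](mlir::Operation *root) {
    root->walk([&](mlir::Operation *nested) {
      for (mlir::Value operand : nested->getOperands()) {
        mlir::Operation *def = operand.getDefiningOp();
        // Block arguments and values defined outside both ops would
        // become operands of the merged dispatch.
        if (!def ||
            (!dispatchOp->isAncestor(def) && !producer->isAncestor(def)))
          externalValues.insert(operand);
      }
    });
  };
  collect(dispatchOp);
  collect(producer);
  return externalValues.size() > kMaxOperandCount;
}
```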
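Item 4 is conceptually a one-line predicate swap in the producer-fusion legality check. Only the two predicate names come from this change; the surrounding function is an assumed sketch and presumes the IREE DispatchCreation headers are available:

```cpp
// Assumed shape of the legality check; not the exact IREE code.
static bool canFuseWithConsumers(mlir::Operation *producer) {
  // Before: only bit-extend producers were kept out of fusion.
  //   if (IREE::LinalgExt::isBitExtendOp(producer)) return false;
  // After: any producer that should instead be cloned into the dispatch
  // (which covers scatter's index producer) is kept out of fusion.
  if (IREE::Flow::isClonableIntoDispatchOp(producer))
    return false;
  return true;
}
```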
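For item 5, the sketch below shows why a per-op dependency walk is not enough once ops have been grouped; `dependsOnFusionGroup` is an illustrative stand-in for `hasTransitiveDependencyOnFusionGroup`, not the real helper:

```cpp
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/SmallVector.h"
#include "mlir/IR/Operation.h"

// Walks use-def chains from `op` alone and reports whether it reaches
// any op in the fusion group.
static bool dependsOnFusionGroup(
    mlir::Operation *op, const llvm::DenseSet<mlir::Operation *> &group) {
  llvm::SmallVector<mlir::Operation *> worklist = {op};
  llvm::DenseSet<mlir::Operation *> visited;
  while (!worklist.empty()) {
    mlir::Operation *current = worklist.pop_back_val();
    if (!visited.insert(current).second)
      continue;
    if (group.contains(current))
      return true;
    for (mlir::Value operand : current->getOperands())
      if (mlir::Operation *def = operand.getDefiningOp())
        worklist.push_back(def);
  }
  return false;
}

// The pitfall: dependsOnFusionGroup(B, group) can be false while
// dependsOnFusionGroup(A, group) is true. Once A and B share a dispatch
// region, the region as a whole depends on the fusion group, so acting
// on B's answer alone can produce an invalid move -- hence downgrading
// the failure from an error to a warning.
```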

Related: #22528
Closes: #22462

ci-extra: test_torch

@IanWood1 force-pushed the enable_fuse_all_multi_use branch 2 times, most recently from 978db84 to 474562e (September 18, 2025 19:42)
@IanWood1 force-pushed the enable_fuse_all_multi_use branch from 9cecc04 to ebfe3c7 (October 6, 2025 20:59)
@IanWood1 force-pushed the enable_fuse_all_multi_use branch 2 times, most recently from d5cf1a2 to 34db4ca (November 3, 2025 20:11)
@IanWood1 force-pushed the enable_fuse_all_multi_use branch from 34db4ca to cbd1c3b (November 5, 2025 22:41)
@IanWood1 force-pushed the enable_fuse_all_multi_use branch 2 times, most recently from aba7406 to ec408a4 (December 1, 2025 20:09)
@IanWood1 force-pushed the enable_fuse_all_multi_use branch from a827a1e to d69930d (December 4, 2025 18:41)
@IanWood1
Contributor Author

IanWood1 commented Dec 4, 2025

#22799 didn't seem to fix the issue; the reduction dispatch gets much slower when an additional result is added. Here are the before and after (slow).

Update: I created #22841 and I'm just going to increase the golden times until that is resolved.

@IanWood1
Contributor Author

IanWood1 commented Dec 8, 2025

Ignore the failure on PkgCI / Test Torch / torch_models tests :: amdgpu_mi325_gfx942 (pull_request); it's being fixed by #22855.

@IanWood1 marked this pull request as ready for review December 8, 2025 18:55
Collaborator

@MaheshRavishankar left a comment


Long time coming!

@IanWood1 merged commit fd4ff2b into iree-org:main Dec 8, 2025
49 of 52 checks passed
IanWood1 added a commit that referenced this pull request Dec 9, 2025
Lowers golden dispatch counts to reflect the expected numbers after #22011

ci-extra: test_torch

Signed-off-by: Ian Wood <[email protected]>
keshavvinayak01 pushed a commit that referenced this pull request Jan 27, 2026
keshavvinayak01 pushed a commit that referenced this pull request Jan 27, 2026