[Dispatch Creation] Create more multi-use dispatches #22011
#22799 didn't seem to fix the issue. The reduction dispatch seems to get much slower when adding an additional result. Here is the before and after (slow).

Update: I created #22841 and I'm just going to increase the golden times until that is resolved.
Signed-off-by: Ian Wood <[email protected]>
Ignore the failure on `PkgCI / Test Torch / torch_models tests :: amdgpu_mi325_gfx942 (pull_request)`; it's getting fixed with #22855.
MaheshRavishankar
left a comment
Long time coming!
Lowers golden dispatch counts to reflect the expected numbers after #22011

ci-extra: test_torch

Signed-off-by: Ian Wood <[email protected]>
This change allows producers to try to fuse with all consumers. Previously, fusing with multiple consumers was only allowed if the consumers were all truncate ops. This has been removed.

This has some side effects that require a few other accompanying changes:

1. This PR can lead to dispatches with many ops and many operands to the dispatch. To prevent forming dispatches with more operands than the runtime can handle, `wouldExceedOperandLimit` was added to limit the number of operands to 16.
2. The golden times for datatiling llama decode were slightly increased. See #22841 for more details.
3. `options.numIterations = 32` was added to more aggressively fuse multi-use elementwise ops in the same dispatch to prevent codegen issues.
4. Changed check to prevent fusion from `IREE::LinalgExt::isBitExtendOp()` to `IREE::Flow::isClonableIntoDispatchOp()` to prevent fusing with scatter's index producer when it should be cloned.
5. Changed error to warning when `IREE::Flow::moveFollowingOpIntoDispatchRegion` fails. This can occur because `hasTransitiveDependencyOnFusionGroup` does not account for moving ops into dispatch regions. For example, A and B have no use-def relation and A does have a "transitive dep on the fusion group" but B doesn't. If you place A and B in the same dispatch, then asking if B `hasTransitiveDependencyOnFusionGroup` you must also consider all ops in the dispatch too. Which is currently unaccounted for. Side note: this change was supposed to be made in #22708, but I think I merged without actually making this change.

Related: #22528
Closes: #22462

ci-extra: test_torch

---------

Signed-off-by: Ian Wood <[email protected]>
Signed-off-by: Keshav Vinayak Jha <[email protected]>
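The operand cap described in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions: the graph encoding (a dict from op name to operand names), the helper names, and the choice to count only values produced outside the group are all hypothetical, and the real `wouldExceedOperandLimit` in IREE may count operands differently. Only the limit of 16 comes from the text.

```python
# Toy model of an operand cap on fused dispatches (hypothetical helpers,
# not IREE's implementation). Values are named after the op that produces
# them; names with no entry in `operands_of` are plain inputs.

MAX_DISPATCH_OPERANDS = 16  # the limit stated in the commit message


def external_operands(group, operands_of):
    """Values consumed by ops in `group` that are produced outside it."""
    used = set()
    for op in group:
        used.update(operands_of.get(op, ()))
    return used - set(group)


def would_exceed_operand_limit(group, candidate, operands_of,
                               limit=MAX_DISPATCH_OPERANDS):
    """True if fusing `candidate` into `group` needs more than `limit` operands."""
    fused = set(group) | {candidate}
    return len(external_operands(fused, operands_of)) > limit


# A producer with 10 inputs and two consumers that each bring extra inputs.
operands_of = {
    "p": [f"in{i}" for i in range(10)],
    "c1": ["p"] + [f"x{i}" for i in range(6)],
    "c2": ["p", "y0"],
}
print(would_exceed_operand_limit({"p"}, "c1", operands_of))
print(would_exceed_operand_limit({"p", "c1"}, "c2", operands_of))
```

In this model, fusing the producer with its first consumer lands exactly on the limit, so the second consumer is rejected even though it shares the producer's result.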
Lowers golden dispatch counts to reflect the expected numbers after #22011

ci-extra: test_torch

Signed-off-by: Ian Wood <[email protected]>
Signed-off-by: Keshav Vinayak Jha <[email protected]>
This change allows producers to try to fuse with all consumers. Previously, fusing with multiple consumers was only allowed if the consumers were all truncate ops. This has been removed.
This has some side effects that require a few other accompanying changes:
1. This PR can lead to dispatches with many ops and many operands to the dispatch. To prevent forming dispatches with more operands than the runtime can handle, `wouldExceedOperandLimit` was added to limit the number of operands to 16.
2. The golden times for datatiling llama decode were slightly increased. See #22841 for more details.
3. `options.numIterations = 32` was added to more aggressively fuse multi-use elementwise ops in the same dispatch to prevent codegen issues.
4. Changed the check that prevents fusion from `IREE::LinalgExt::isBitExtendOp()` to `IREE::Flow::isClonableIntoDispatchOp()` to prevent fusing with scatter's index producer when it should be cloned.
5. Changed the error to a warning when `IREE::Flow::moveFollowingOpIntoDispatchRegion` fails. This can occur because `hasTransitiveDependencyOnFusionGroup` does not account for moving ops into dispatch regions. For example, A and B have no use-def relation, and A does have a "transitive dep on the fusion group" but B doesn't. If you place A and B in the same dispatch, then asking whether B `hasTransitiveDependencyOnFusionGroup` must also consider all ops in the dispatch, which is currently unaccounted for. Side note: this change was supposed to be made in #22708 ([Dispatch Creation] Don't fuse uses from above), but I think I merged without actually making this change.

Related: #22528
Closes: #22462
ci-extra: test_torch
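The A/B scenario in the description can be sketched with a small dependency graph. This is a toy model, not IREE code: the graph encoding and all three helper functions are hypothetical, and "transitive dependency" is assumed here to mean that an operand outside the fusion group itself depends on the group. It shows why a per-op query on B can come back negative while the dispatch region containing both A and B still depends on the group.

```python
# Toy model of the A/B scenario (hypothetical helpers, not IREE's
# hasTransitiveDependencyOnFusionGroup). `operands_of` maps each op name to
# the names of the ops (or inputs) whose values it consumes.


def reaches_group(op, group, operands_of, seen=None):
    """True if `op` is in `group` or depends on it through any operand chain."""
    seen = set() if seen is None else seen
    if op in group:
        return True
    if op in seen:
        return False
    seen.add(op)
    return any(reaches_group(p, group, operands_of, seen)
               for p in operands_of.get(op, ()))


def has_transitive_dependency(op, group, operands_of):
    """True if an operand of `op` outside `group` itself depends on `group`."""
    return any(p not in group and reaches_group(p, group, operands_of)
               for p in operands_of.get(op, ()))


def region_has_transitive_dependency(region_ops, group, operands_of):
    """Same query, but treating every op in the dispatch region as one unit."""
    return any(
        p not in group and p not in region_ops
        and reaches_group(p, group, operands_of)
        for op in region_ops
        for p in operands_of.get(op, ())
    )


# root is the fusion group; X sits outside it; A uses X; B is unrelated to A.
operands_of = {"X": ["root"], "A": ["X"], "B": ["in"]}
group = {"root"}
print(has_transitive_dependency("A", group, operands_of))
print(has_transitive_dependency("B", group, operands_of))
print(region_has_transitive_dependency({"A", "B"}, group, operands_of))
```

Under these assumptions, the query on B alone misses the dependency that B inherits once it shares a region with A, which is the gap the description says is currently unaccounted for.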