-
Notifications
You must be signed in to change notification settings - Fork 789
[CPU] Switch CPUDoubleTilingExpert pipeline to use IREE::CPU::LoweringConfigAttr. #21354
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
It's ready for review; it depends on |
|
cc @banach-space @egebeysel since there are many SVE/SME changes. They are NFC in terms of e2e execution. It just switches to the new lowering config. |
|
I know what's happening. The tilingLevel from TilingConfig is within We can either drop the support of cache level tiling, which is performing dummy tiling. Or we can make |
Sorry for not responding earlier. I will review this properly tomorrow. In the meantime, +1 to keeping the cache level tiling for now. I am hoping to properly play with it this quarter. Thanks! |
I see, thanks!. I'll ping you when this is ready. I need to work on few more patches now. |
6d7e8c1 to
237939f
Compare
35f78de to
1e9fb9a
Compare
|
yay, integration tests are all happy. I can start splitting the changes out. Note that the final PR may involve many components, since I'll need to restructure the pipeline passes a bit and a few of the passes need to learn about |
cc6b5d5 to
3f71744
Compare
9663fb6 to
1c8163b
Compare
|
There is a huge compile-time regression in toy_llama (23 sec v.s. 3 sec), where the root cause is in pack consumer fusion. I'm looking at the issue. https://gist.github.com/hanhanW/1f125955fb4f2d23871b77a04615d0af |
2c44177 to
f31ac84
Compare
llvm/llvm-project#149600 fixes the issue. The compile-time is back to 3 seconds. There are other new issues, but I think we can fix them later: #21420 |
3879a82 to
06356af
Compare
3cf8c44 to
48a527f
Compare
|
This is ready for review, assuming that the upstream semantic change will be accepted. Please take a look at the last two commits:
|
banach-space
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hanhanW , great clean-up 🙏🏻
I've not touched this code for a while, so I've focused on the tests - the changes mostly makes sense, though I do have one question.
| return | ||
| } | ||
| // CHECK-DAG: #[[CONFIG:.+]] = #iree_codegen.lowering_config<tile_sizes = {{\[}}[64, 64, 0], [64, 64, 0], [0, 0, 0], [8, 16, 0], [0, 0, 1], [0, 0, 0]]> | ||
| // CHECK-DAG: #[[CONFIG:.+]] = #iree_cpu.lowering_config<cache_parallel = [64, 64, 0], cache_reduction = [0, 0, 0], distribution = [64, 64, 0], vector_common_parallel = [8, 16, 0], vector_inner_parallel = [0, 0, 0], vector_reduction = [0, 0, 1]> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BEFORE:
[0, 0, 1], [0, 0, 0]
AFTER:
vector_inner_parallel = [0, 0, 0], vector_reduction = [0, 0, 1]
Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is correct. The order in the old lowering config is [[distribution], [vector-common-parallel], [vector-reduction], [vector-inner-parallel]]. The print sorts the keys in alphabetical order, so the vector_inner_parallel is in front of vector_reduction. The values are correct in this case.
iree/compiler/src/iree/compiler/Codegen/Common/TileSizeSelection.h
Lines 23 to 31 in c682836
| /// We currently support the following scenarios, if | |
| /// IREE::Codegen::LoweringConfigAttr is used: | |
| /// 1. [[distribution]] | |
| /// 2. [[distribution], [vector-common-parallel]] | |
| /// 3. [[distribution], [vector-common-parallel], [vector-reduction]] | |
| /// 4. [[distribution], [vector-common-parallel], [vector-reduction], | |
| /// [vector-inner-parallel]] | |
| /// 5. [[distribution], [cache-parallel], [cache-reduction], | |
| /// [vector-parallel], [vector-reduction]] |
48a527f to
8206563
Compare
It also replaces `TileAndFuse` pass uses with `TileRootAndFuseProducerConsumer` pass that may impact other dispatches, if they use DoubleTilingExpert. E.g., generic ops dispatches. Signed-off-by: hanhanW <[email protected]>
…gConfigAttr. Signed-off-by: hanhanW <[email protected]>
8206563 to
14032b5
Compare
Signed-off-by: hanhanW <[email protected]>
hanhanW
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the upstream changes are in IREE now, finally. Please take a look, thanks!
jtuyls
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a few nits and a question.
compiler/src/iree/compiler/Codegen/LLVMCPU/LLVMCPU2DScalableTo1DScalable.cpp
Outdated
Show resolved
Hide resolved
| IREE::Codegen::LoweringConfigAttrInterface loweringConfig = | ||
| getLoweringConfig(op); | ||
| if (!loweringConfig) | ||
| if (!loweringConfig || !loweringConfig.hasWorkgroupTilingLevel()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe useful here to add a comment on why !loweringConfig.hasWorkgroupTilingLevel() is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, and I will update the PR description. The reason is that we used to require all the lowering config has four levels of tiling. With the new CPU lowering config, only the root op has the distribution tile sizes; we use such information to determine which op is the root op.
The pipeline verification expects the lowering config should have four levels of tiling -- which is no longer necessary, and it is legacy. To keep a decent verification, we only check the lowering config on root op now. It is also enough as most tiling starts from root op.
I'm going to remove the four levels requirement and refresh the verifier in a follow-up.
I already have a TODO comment here, and I'll refresh the verification logic soon. So I'm not going to update the TODO.
|
|
||
| // CHECK-DAG: #[[CONFIG1:.+]] = #iree_codegen.lowering_config<tile_sizes = {{\[}}[64, 64], [4, [16]], [0, 0], [0, 0]]> | ||
| // CHECK-DAG: #[[CONFIG2:.+]] = #iree_codegen.lowering_config<tile_sizes = {{\[}}[64, 64, 0], [4, [16], 0], [0, 0, 1], [0, 0, 0]]> | ||
| // CHECK-DAG: #[[CONFIG1:.+]] = #iree_cpu.lowering_config<vector_common_parallel = [4, [16]], vector_inner_parallel = [0, 0], vector_reduction = [0, 0]> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one loses the [64, 64] tile sizes information compared with earlier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, and see the other comment! The distribution tile sizes is only set on the root op, so we don't expect that in the propagated lowering configs.
Signed-off-by: hanhanW <[email protected]>
…gConfigAttr. (iree-org#21354) The revision switches all the dispatches that use `CPUDoubleTilingExpert` to `IREE::CPU::LoweringConfigAttr`, which is a root-based tiling approach. There are two commits in the revision: - [The switch for `linalg.matmul` dispatches](iree-org@e885e86): it mainly focuses on the lowering config changes. - [The switch for `linalg.generic` dispatches](iree-org@48a527f): - Changes for lowering configs. - Update the pipeline that enumerates all the tiling levels. - Update `LLVMCPU2DScalableTo1DScalable` pass to use `IREE::CPU::LoweringConfigAttr`. The pipeline and lowering config verification only applies on the root op because CPUDoubleTilingExpert expects exactly four levels of tiling. After the switch, only the root op has four levels of tiling. We will need to refresh the verification logic anyway, as they are legacy code and it is easier to have better implementation today. So it will be refreshed in a follow-up. New known issues: - Trailing vector unit dims are not folded away in the root-op based pipeline, which results in larger binary sizes. Because they are unrolled: iree-org#21420 --------- Signed-off-by: hanhanW <[email protected]>
…gConfigAttr. (iree-org#21354) The revision switches all the dispatches that use `CPUDoubleTilingExpert` to `IREE::CPU::LoweringConfigAttr`, which is a root-based tiling approach. There are two commits in the revision: - [The switch for `linalg.matmul` dispatches](iree-org@e885e86): it mainly focuses on the lowering config changes. - [The switch for `linalg.generic` dispatches](iree-org@48a527f): - Changes for lowering configs. - Update the pipeline that enumerates all the tiling levels. - Update `LLVMCPU2DScalableTo1DScalable` pass to use `IREE::CPU::LoweringConfigAttr`. The pipeline and lowering config verification only applies on the root op because CPUDoubleTilingExpert expects exactly four levels of tiling. After the switch, only the root op has four levels of tiling. We will need to refresh the verification logic anyway, as they are legacy code and it is easier to have better implementation today. So it will be refreshed in a follow-up. New known issues: - Trailing vector unit dims are not folded away in the root-op based pipeline, which results in larger binary sizes. Because they are unrolled: iree-org#21420 --------- Signed-off-by: hanhanW <[email protected]> Signed-off-by: keshavvinayak01 <[email protected]>
The revision switches all the dispatches that use
CPUDoubleTilingExperttoIREE::CPU::LoweringConfigAttr, which is a root-based tiling approach. There are two commits in the revision:linalg.matmuldispatches: it mainly focuses on the lowering config changes.linalg.genericdispatches:LLVMCPU2DScalableTo1DScalablepass to useIREE::CPU::LoweringConfigAttr.The pipeline and lowering config verification only applies on the root op because CPUDoubleTilingExpert expects exactly four levels of tiling. After the switch, only the root op has four levels of tiling. We will need to refresh the verification logic anyway, as they are legacy code and it is easier to have better implementation today. So it will be refreshed in a follow-up.
New known issues: