Skip to content

Conversation

@hanhanW
Copy link
Contributor

@hanhanW hanhanW commented Jul 12, 2025

The revision switches all the dispatches that use CPUDoubleTilingExpert to IREE::CPU::LoweringConfigAttr, which is a root-based tiling approach. There are two commits in the revision:

The pipeline and lowering config verification only applies on the root op because CPUDoubleTilingExpert expects exactly four levels of tiling. After the switch, only the root op has four levels of tiling. We will need to refresh the verification logic anyway, as they are legacy code and it is easier to have better implementation today. So it will be refreshed in a follow-up.

New known issues:

@hanhanW hanhanW marked this pull request as ready for review July 12, 2025 02:08
@hanhanW hanhanW requested a review from Max191 July 12, 2025 02:08
@hanhanW
Copy link
Contributor Author

hanhanW commented Jul 12, 2025

cc @banach-space @egebeysel since there are many SVE/SME changes. They are NFC in terms of e2e execution. It just switches to the new lowering config.

@hanhanW
Copy link
Contributor Author

hanhanW commented Jul 14, 2025

I know what's happening. The tilingLevel from TilingConfig is within [0, number_of_level), while we'd like to apply TilingLevel in the pass. E.g., if the cache tile sizes are not set, the 1 tilingLevel should map to TilingLevel::VectorCommonParallelTiles which is 3.

We can either drop the support of cache level tiling, which is performing dummy tiling. Or we can make TileRootOpFuseProducerConsumer takes TilingLevel instead of int64_t; switches all the lowering config to CPU one at the same time. The latter is more like lit tests changes, so it should be fine. I'll try the latter approach.

@hanhanW hanhanW marked this pull request as draft July 14, 2025 20:14
@banach-space
Copy link
Collaborator

We can either drop the support of cache level tiling, which is performing dummy tiling. Or we can make TileRootOpFuseProducerConsumer takes TilingLevel instead of int64_t; switches all the lowering config to CPU one at the same time. The latter is more like lit tests changes, so it should be fine. I'll try the latter approach.

Sorry for not responding earlier. I will review this properly tomorrow. In the meantime, +1 to keeping the cache level tiling for now. I am hoping to properly play with it this quarter. Thanks!

@hanhanW
Copy link
Contributor Author

hanhanW commented Jul 14, 2025

We can either drop the support of cache level tiling, which is performing dummy tiling. Or we can make TileRootOpFuseProducerConsumer takes TilingLevel instead of int64_t; switches all the lowering config to CPU one at the same time. The latter is more like lit tests changes, so it should be fine. I'll try the latter approach.

Sorry for not responding earlier. I will review this properly tomorrow. In the meantime, +1 to keeping the cache level tiling for now. I am hoping to properly play with it this quarter. Thanks!

I see, thanks!. I'll ping you when this is ready. I need to work on few more patches now.

@hanhanW hanhanW force-pushed the switch-multi-tiling branch from 6d7e8c1 to 237939f Compare July 15, 2025 00:23
@hanhanW hanhanW changed the title [CPU] Switch matmul dispatches to use IREE::CPU::LoweringConfigAttr. [CPU] Switch CPUDoubleTilingExpert pipeline to use IREE::CPU::LoweringConfigAttr. Jul 15, 2025
@hanhanW hanhanW changed the base branch from users/hanhanW/diff-for-switch-multi-tiling to main July 15, 2025 00:26
@hanhanW hanhanW force-pushed the switch-multi-tiling branch 3 times, most recently from 35f78de to 1e9fb9a Compare July 15, 2025 17:22
@hanhanW
Copy link
Contributor Author

hanhanW commented Jul 15, 2025

yay, integration tests are all happy. I can start splitting the changes out. Note that the final PR may involve many components, since I'll need to restructure the pipeline passes a bit and a few of the passes need to learn about IREE::CPU::LoweringConfigAttr.

@hanhanW hanhanW force-pushed the switch-multi-tiling branch from cc6b5d5 to 3f71744 Compare July 16, 2025 00:33
@hanhanW hanhanW force-pushed the switch-multi-tiling branch 4 times, most recently from 9663fb6 to 1c8163b Compare July 18, 2025 17:55
@hanhanW
Copy link
Contributor Author

hanhanW commented Jul 18, 2025

There is a huge compile-time regression in toy_llama (23 sec v.s. 3 sec), where the root cause is in pack consumer fusion. I'm looking at the issue.

https://gist.github.com/hanhanW/1f125955fb4f2d23871b77a04615d0af

@hanhanW
Copy link
Contributor Author

hanhanW commented Jul 18, 2025

There is a huge compile-time regression in toy_llama (23 sec v.s. 3 sec), where the root cause is in pack consumer fusion. I'm looking at the issue.

https://gist.github.com/hanhanW/1f125955fb4f2d23871b77a04615d0af

llvm/llvm-project#149600 fixes the issue. The compile-time is back to 3 seconds.

There are other new issues, but I think we can fix them later: #21420

@hanhanW hanhanW force-pushed the switch-multi-tiling branch 2 times, most recently from 3879a82 to 06356af Compare July 21, 2025 20:42
@hanhanW hanhanW force-pushed the switch-multi-tiling branch 3 times, most recently from 3cf8c44 to 48a527f Compare July 21, 2025 23:52
@hanhanW hanhanW requested a review from banach-space July 21, 2025 23:58
@hanhanW
Copy link
Contributor Author

hanhanW commented Jul 22, 2025

This is ready for review, assuming that the upstream semantic change will be accepted. Please take a look at the last two commits:

@hanhanW hanhanW marked this pull request as ready for review July 22, 2025 01:36
Copy link
Collaborator

@banach-space banach-space left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hanhanW , great clean-up 🙏🏻

I've not touched this code for a while, so I've focused on the tests - the changes mostly makes sense, though I do have one question.

return
}
// CHECK-DAG: #[[CONFIG:.+]] = #iree_codegen.lowering_config<tile_sizes = {{\[}}[64, 64, 0], [64, 64, 0], [0, 0, 0], [8, 16, 0], [0, 0, 1], [0, 0, 0]]>
// CHECK-DAG: #[[CONFIG:.+]] = #iree_cpu.lowering_config<cache_parallel = [64, 64, 0], cache_reduction = [0, 0, 0], distribution = [64, 64, 0], vector_common_parallel = [8, 16, 0], vector_inner_parallel = [0, 0, 0], vector_reduction = [0, 0, 1]>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BEFORE:

[0, 0, 1], [0, 0, 0]

AFTER:

vector_inner_parallel = [0, 0, 0], vector_reduction = [0, 0, 1]

Is this correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is correct. The order in the old lowering config is [[distribution], [vector-common-parallel], [vector-reduction], [vector-inner-parallel]]. The print sorts the keys in alphabetical order, so the vector_inner_parallel is in front of vector_reduction. The values are correct in this case.

/// We currently support the following scenarios, if
/// IREE::Codegen::LoweringConfigAttr is used:
/// 1. [[distribution]]
/// 2. [[distribution], [vector-common-parallel]]
/// 3. [[distribution], [vector-common-parallel], [vector-reduction]]
/// 4. [[distribution], [vector-common-parallel], [vector-reduction],
/// [vector-inner-parallel]]
/// 5. [[distribution], [cache-parallel], [cache-reduction],
/// [vector-parallel], [vector-reduction]]

hanhanW added 2 commits July 28, 2025 14:12
It also replaces `TileAndFuse` pass uses with
`TileRootAndFuseProducerConsumer` pass that may impact other dispatches,
if they use DoubleTilingExpert. E.g., generic ops dispatches.

Signed-off-by: hanhanW <[email protected]>
@hanhanW hanhanW force-pushed the switch-multi-tiling branch from 8206563 to 14032b5 Compare July 28, 2025 21:13
@hanhanW hanhanW requested review from banach-space and jtuyls July 28, 2025 23:14
Copy link
Contributor Author

@hanhanW hanhanW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the upstream changes are in IREE now, finally. Please take a look, thanks!

Copy link
Contributor

@jtuyls jtuyls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a few nits and a question.

IREE::Codegen::LoweringConfigAttrInterface loweringConfig =
getLoweringConfig(op);
if (!loweringConfig)
if (!loweringConfig || !loweringConfig.hasWorkgroupTilingLevel())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe useful here to add a comment on why !loweringConfig.hasWorkgroupTilingLevel() is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, and I will update the PR description. The reason is that we used to require all the lowering config has four levels of tiling. With the new CPU lowering config, only the root op has the distribution tile sizes; we use such information to determine which op is the root op.

The pipeline verification expects the lowering config should have four levels of tiling -- which is no longer necessary, and it is legacy. To keep a decent verification, we only check the lowering config on root op now. It is also enough as most tiling starts from root op.

I'm going to remove the four levels requirement and refresh the verifier in a follow-up.

I already have a TODO comment here, and I'll refresh the verification logic soon. So I'm not going to update the TODO.


// CHECK-DAG: #[[CONFIG1:.+]] = #iree_codegen.lowering_config<tile_sizes = {{\[}}[64, 64], [4, [16]], [0, 0], [0, 0]]>
// CHECK-DAG: #[[CONFIG2:.+]] = #iree_codegen.lowering_config<tile_sizes = {{\[}}[64, 64, 0], [4, [16], 0], [0, 0, 1], [0, 0, 0]]>
// CHECK-DAG: #[[CONFIG1:.+]] = #iree_cpu.lowering_config<vector_common_parallel = [4, [16]], vector_inner_parallel = [0, 0], vector_reduction = [0, 0]>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one loses the [64, 64] tile sizes information compared with earlier?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, and see the other comment! The distribution tile sizes is only set on the root op, so we don't expect that in the propagated lowering configs.

Signed-off-by: hanhanW <[email protected]>
@hanhanW hanhanW merged commit 382c4fa into iree-org:main Jul 29, 2025
44 checks passed
@hanhanW hanhanW deleted the switch-multi-tiling branch July 29, 2025 17:41
hhkit pushed a commit to opencompl/iree that referenced this pull request Aug 8, 2025
…gConfigAttr. (iree-org#21354)

The revision switches all the dispatches that use
`CPUDoubleTilingExpert` to `IREE::CPU::LoweringConfigAttr`, which is a
root-based tiling approach. There are two commits in the revision:

- [The switch for `linalg.matmul`
dispatches](iree-org@e885e86):
it mainly focuses on the lowering config changes.
- [The switch for `linalg.generic`
dispatches](iree-org@48a527f):
  - Changes for lowering configs.
  - Update the pipeline that enumerates all the tiling levels.
- Update `LLVMCPU2DScalableTo1DScalable` pass to use
`IREE::CPU::LoweringConfigAttr`.

The pipeline and lowering config verification only applies on the root
op because CPUDoubleTilingExpert expects exactly four levels of tiling.
After the switch, only the root op has four levels of tiling. We will
need to refresh the verification logic anyway, as they are legacy code
and it is easier to have better implementation today. So it will be
refreshed in a follow-up.

New known issues:
- Trailing vector unit dims are not folded away in the root-op based
pipeline, which results in larger binary sizes. Because they are
unrolled: iree-org#21420

---------

Signed-off-by: hanhanW <[email protected]>
keshavvinayak01 pushed a commit to keshavvinayak01/iree that referenced this pull request Sep 4, 2025
…gConfigAttr. (iree-org#21354)

The revision switches all the dispatches that use
`CPUDoubleTilingExpert` to `IREE::CPU::LoweringConfigAttr`, which is a
root-based tiling approach. There are two commits in the revision:

- [The switch for `linalg.matmul`
dispatches](iree-org@e885e86):
it mainly focuses on the lowering config changes.
- [The switch for `linalg.generic`
dispatches](iree-org@48a527f):
  - Changes for lowering configs.
  - Update the pipeline that enumerates all the tiling levels.
- Update `LLVMCPU2DScalableTo1DScalable` pass to use
`IREE::CPU::LoweringConfigAttr`.

The pipeline and lowering config verification only applies on the root
op because CPUDoubleTilingExpert expects exactly four levels of tiling.
After the switch, only the root op has four levels of tiling. We will
need to refresh the verification logic anyway, as they are legacy code
and it is easier to have better implementation today. So it will be
refreshed in a follow-up.

New known issues:
- Trailing vector unit dims are not folded away in the root-op based
pipeline, which results in larger binary sizes. Because they are
unrolled: iree-org#21420

---------

Signed-off-by: hanhanW <[email protected]>
Signed-off-by: keshavvinayak01 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants