[ROCm] Track latest composable_kernel #2038

Closed
a4lg wants to merge 1 commit into Dao-AILab:main from a4lg:update-ck-for-versatile-amd-support-1

@a4lg

@a4lg a4lg commented Dec 2, 2025

The latest version of composable_kernel supports a wider range of architectures and no longer assumes a wavefront size of 64 (the source of many compiler errors).

This commit updates the composable_kernel submodule along with necessary interface changes.

This is part of my attempt to support Flash Attention on Strix Halo (gfx1151), and it seems the CK portions used by Flash Attention already support this architecture (though not all of CK does). I believe my interface changes are fine, since they only add or remove defaults.

P.S.
If someone has AMD hardware already supported by Flash Attention (i.e. AMD Instinct), can you check the test results before and after this PR? In my Strix Halo environment, about half of the tests fail due to large arithmetic errors, and I'd like to see whether this behavior is specific to Strix Halo.
If it is not specific to Strix Halo, I'll submit a follow-up PR to setup.py that allows a wider range of AMD GPU architectures (possibly RDNA 2 or later?).
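The follow-up idea of accepting more AMD GPU architectures in setup.py could be sketched roughly as below. This is only an illustration under stated assumptions, not code from this PR or from Flash Attention: the names `INSTINCT_ARCHS`, `RDNA2_OR_LATER`, `is_allowed_arch`, and `validate_archs` are hypothetical, the Instinct list is an assumed starting allowlist, and the pattern encodes one possible reading of "RDNA 2 or later" (gfx103x, gfx11xx, gfx12xx).

```python
import re

# Assumed starting allowlist (hypothetical): Instinct targets plus "native".
INSTINCT_ARCHS = {"native", "gfx90a", "gfx942", "gfx950"}

# Assumed pattern for "RDNA 2 or later": gfx103x, gfx11xx, gfx12xx.
# RDNA 1 (gfx101x) is deliberately not matched.
RDNA2_OR_LATER = re.compile(r"^gfx(103|11[0-9]|12[0-9])[0-9a-f]$")


def is_allowed_arch(arch):
    """Return True if the architecture name passes the illustrative allowlist."""
    return arch in INSTINCT_ARCHS or bool(RDNA2_OR_LATER.match(arch))


def validate_archs(archs):
    """Return the list unchanged, or raise ValueError naming unsupported entries."""
    unsupported = [a for a in archs if not is_allowed_arch(a)]
    if unsupported:
        raise ValueError("Unsupported GPU architecture(s): " + ", ".join(unsupported))
    return list(archs)


if __name__ == "__main__":
    print(validate_archs(["gfx942", "gfx1151"]))
```

Whether RDNA targets should be accepted at all would depend on the test results requested above, so the pattern is a placeholder rather than a recommendation.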

The latest version of `composable_kernel` supports more versatile
architectures and no longer assumes wavefront size of 64.

This commit updates the `composable_kernel` submodule along with
necessary interface changes.
@tridao
Member

tridao commented Dec 2, 2025

@rocking5566 does this interface change affect any existing case?

@rocking5566
Contributor

rocking5566 commented Dec 6, 2025

@a4lg could you also change the C++ version in setup.py from C++17 to C++20?
The latest CK uses C++20 by default.
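As a rough illustration of that request, the change amounts to swapping the standard flag in whatever compiler flag list setup.py passes to the build. The sketch below is not taken from the actual setup.py: the helper name `bump_cxx_standard` and the example flag list are hypothetical, and only the `-std=c++17` and `-std=c++20` flags themselves are standard compiler options.

```python
def bump_cxx_standard(flags, old="-std=c++17", new="-std=c++20"):
    """Return a copy of the flag list with the C++ standard flag replaced."""
    return [new if flag == old else flag for flag in flags]


if __name__ == "__main__":
    # Hypothetical flag list, for illustration only.
    example_flags = ["-O3", "-std=c++17", "-DEXAMPLE_DEFINE=1"]
    print(bump_cxx_standard(example_flags))
```

In practice the flag is usually a literal in the build script, so editing the string directly has the same effect.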

@rocking5566
Contributor

rocking5566 commented Dec 6, 2025

Actually, we are doing a similar thing.
We just finished testing and are about to send a PR from this branch:
https://github.com/ROCm/flash-attention/tree/ck_improve_v0.1.8

@a4lg
Author

a4lg commented Dec 6, 2025

@rocking5566
As long as CK is correctly updated, I don't stick to my changes.
BTW, my updated branch with suggested changes will be ready tomorrow (because I'm on a trip).

@rocking5566
Contributor

> @rocking5566
> As long as CK is correctly updated, I don't stick to my changes.
> BTW, my updated branch with suggested changes will be ready tomorrow (because I'm on a trip).

I submitted the PR and changed the C++ version:
#2052

We also tested correctness and performance with this version (commit ID) of CK on both MI300 and MI350.

@a4lg
Author

a4lg commented Dec 7, 2025

Closing in favor of #2052.

@a4lg a4lg closed this Dec 7, 2025