Conversation
The latest version of `composable_kernel` supports more versatile architectures and no longer assumes wavefront size of 64. This commit updates the `composable_kernel` submodule along with necessary interface changes.
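The point about wavefront size can be sketched briefly: AMD GPUs are not uniformly 64-wide, so a hard-coded assumption of 64 breaks on RDNA parts. The mapping below is my own illustration of the common CDNA/RDNA split (it is not code from `composable_kernel`):

```python
# Illustrative only: wavefront width varies by AMD architecture family,
# so a kernel library cannot hard-code 64 lanes per wavefront.
# CDNA / GCN parts (e.g. gfx90a, gfx942) use 64-wide wavefronts;
# RDNA parts (e.g. gfx1151, Strix Halo) natively use 32-wide.

def wavefront_size(gcn_arch: str) -> int:
    """Return the wavefront width for a given gfx architecture string."""
    if gcn_arch.startswith("gfx9"):                  # CDNA / GCN: wave64
        return 64
    if gcn_arch.startswith(("gfx10", "gfx11")):      # RDNA 2/3/3.5: wave32
        return 32
    raise ValueError(f"unknown architecture: {gcn_arch}")
```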
@rocking5566 does this interface change affect any existing cases?
@a4lg could you also change the C++ version in `setup.py` from C++17 to C++20?
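The requested change might look like the sketch below, assuming the build passes per-compiler flag lists to the extension as `torch.utils.cpp_extension`-style setups commonly do; the `cxx_flags` helper and the exact flag layout are hypothetical, not the actual `setup.py`:

```python
# Hypothetical sketch: bumping the host/device C++ standard from C++17
# to C++20. The helper name and flag layout are illustrative; adapt to
# the real setup.py of the project.

def cxx_flags(std: str = "c++20") -> dict:
    """Return per-compiler flag lists with the requested C++ standard."""
    return {
        "cxx": [f"-std={std}", "-O3"],
        "nvcc": [f"-std={std}", "-O3"],  # hipcc accepts the same -std flag
    }

# These would then be passed as extra_compile_args to the extension:
extra_compile_args = cxx_flags()
```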
Actually, we are doing a similar thing.
@rocking5566 |
I submitted the PR and changed the C++ version. We also tested correctness and performance with this version (commit id) of CK on both MI300 and MI350.
Closing in favor of #2052. |
The latest version of `composable_kernel` supports more versatile architectures and no longer assumes a wavefront size of 64 (the source of many compiler errors). This commit updates the `composable_kernel` submodule along with the necessary interface changes.

This is part of my attempt to support Flash Attention on Strix Halo (`gfx1151`), and it seems the CK portions used by Flash Attention already support this architecture (not all of CK, though). I believe my interface changes are safe (they only add/remove defaults).

p.s.

If someone has AMD hardware already supported by Flash Attention (i.e. AMD Instinct), could you check the test results before and after this PR? In my Strix Halo environment, about half of the tests fail due to high arithmetic errors, and I'd like to see whether this behavior is Strix Halo-specific.

If it's not Strix Halo-specific, I'll submit a follow-up PR to allow more versatile AMD GPU architectures in `setup.py` (possibly RDNA 2 or later?).