
[AMD ROCm] Update ROCm/CK backend to align with latest ComposableKernel API changes#2363

Merged
tridao merged 15 commits into Dao-AILab:main from ROCm:ck_improve_v0.1.9
Mar 18, 2026
Conversation

@rocking5566
Contributor

@rocking5566 rocking5566 commented Mar 17, 2026

Summary

Update the AMD ROCm ComposableKernel (CK) backend for compatibility with the latest CK FMHA API changes, including new fields for MX (microscaling) FP8 support, attention sink, and an improved backward-pass dq_accum memory layout.

Changes

CK backend API alignment (csrc/flash_attn_ck/)

  • Update fmha_fwd_args, fmha_fwd_splitkv_traits, and fmha_bwd_traits to align with upstream CK
  • Change the dq_accum tensor layout in mha_bwd.cpp and mha_varlen_bwd.cpp to align with upstream CK
  • Query nsplits from fmha_bwd_launcher instead of hardcoding the split-count logic on the host side
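The last bullet is a design change worth spelling out: rather than the host code duplicating CK's split-count heuristic, it now asks the launcher, which owns the kernel tiling, for the value. A minimal Python sketch of that principle (the class, method name, and tiling numbers here are illustrative, not CK's actual API):

```python
# Hypothetical sketch of the design change: instead of recomputing the
# backward-pass split count on the host with a hardcoded heuristic,
# query the launcher, which owns the kernel tiling logic.

class FmhaBwdLauncher:
    """Stand-in for CK's fmha_bwd launcher; names are illustrative."""

    def __init__(self, seqlen_k: int, block_k: int = 128):
        self.seqlen_k = seqlen_k
        self.block_k = block_k

    def nsplits(self) -> int:
        # The launcher derives the split count from its own tiling,
        # so the host never has to mirror (and chase) this heuristic.
        return (self.seqlen_k + self.block_k - 1) // self.block_k

# Before: host code hardcoded the same ceiling-division heuristic,
# which silently breaks whenever upstream CK changes its tiling.
# After: the host simply queries the launcher.
launcher = FmhaBwdLauncher(seqlen_k=512)
nsplits = launcher.nsplits()  # 4 with these illustrative numbers
```

The benefit is that a future CK tiling change only has to be made in one place, and the host-side dq_accum allocation stays consistent with what the kernel actually writes.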

ComposableKernel submodule update

  • Bump csrc/composable_kernel from 13f6d635 to 574c1c12

Build system (setup.py)

  • Add -Wno-unknown-warning-option and -fbracket-depth=1024 compiler flags for ROCm CK builds
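For context, a minimal sketch of how such flags might be appended for ROCm CK builds; this is not the actual setup.py code, just an illustration of the two flags named above:

```python
# Minimal sketch (not the actual setup.py) of conditionally adding the
# two compiler flags mentioned above for ROCm CK builds.
def rocm_ck_cxx_flags(base_flags=None):
    flags = list(base_flags or ["-O3", "-std=c++17"])
    # Silence warnings about options that get forwarded to a compiler
    # that does not recognize them.
    flags.append("-Wno-unknown-warning-option")
    # CK's deeply nested template code can exceed clang's default
    # bracket-nesting limit (256), so raise it.
    flags.append("-fbracket-depth=1024")
    return flags
```

`-fbracket-depth` is a clang option, which is consistent with ROCm builds going through hipcc/clang.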

Testing

  • Validated with pytest tests/test_flash_attn_ck.py on MI300 and MI350

@eliasmagn

Hi @rocking5566, nice to see we converged on a very similar direction here.

When I saw the recent aiter-related merges, I decided to base this on the CK copy already pulled in through aiter, rather than keeping two overlapping CK copies in the project.

The main motivation on my side was to remove that duplication and rely on the CK path that is already exercised as part of the aiter project.

In any case, I understand the preference for the split you described.

I’m happy to split my PR (#2350) accordingly when the time comes, if that helps with review and integration.

@rocking5566
Contributor Author

> Hi @rocking5566, nice to see we converged on a very similar direction here.
>
> When I saw the recent aiter-related merges, I decided to base this on the CK copy already pulled in through aiter, rather than keeping two overlapping CK copies in the project.
>
> The main motivation on my side was to remove that duplication and rely on the CK path that is already exercised as part of the aiter project.
>
> In any case, I understand the preference for the split you described.
>
> I'm happy to split my PR (#2350) accordingly when the time comes, if that helps with review and integration.

Thanks @eliasmagn! That makes total sense — consolidating on the aiter-vendored CK to avoid duplication is definitely the right long-term direction.

Once this PR gets merged, you should be able to rebase #2350 on top of it, which should significantly reduce the diff and let your PR focus on the aiter migration part.

@tridao would you mind taking a look and merging this one when you get a chance? It should also unblock #2350 for a cleaner integration. Thanks!

@tridao tridao merged commit 8afc617 into Dao-AILab:main Mar 18, 2026
zhuochenKIDD pushed a commit to zhuochenKIDD/flash-attention that referenced this pull request Mar 25, 2026
…el API changes (Dao-AILab#2363)

* update ck

* update ck

* before gpt-oss sink

* gpt-oss sink

* Add missing parameter

* Fix typo

* Update to ROCm/composable_kernel@b09112b

* add -Wno-unknown-warning-option

* Update to ROCm/rocm-libraries#4368 (ROCm/rocm-libraries@17f7dfc)

* Update to ROCm/rocm-libraries@a358a21

---------

Co-authored-by: Ding, Yi <yi.ding@amd.com>
Co-authored-by: Yi DING <andy-ding@outlook.com>

4 participants