
[AMD ROCm] Update ROCm/CK backend to align with latest ComposableKernel API changes#2363

Merged
tridao merged 15 commits into Dao-AILab:main from ROCm:ck_improve_v0.1.9
Mar 18, 2026
Conversation

@rocking5566
Contributor

@rocking5566 rocking5566 commented Mar 17, 2026

Summary

Update the AMD ROCm ComposableKernel (CK) backend for compatibility with the latest CK FMHA API changes, including new fields for MX (microscaling) FP8 support, attention sink, and an improved backward-pass dq_accum memory layout.

Changes

CK backend API alignment (csrc/flash_attn_ck/)

  • Update fmha_fwd_args, fmha_fwd_splitkv_traits, and fmha_bwd_traits to align with upstream CK
  • Change the dq_accum tensor layout in mha_bwd.cpp and mha_varlen_bwd.cpp to align with upstream CK
  • Query nsplits from fmha_bwd_launcher instead of hardcoding the split-count logic on the host side
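The last bullet is a design change worth spelling out: rather than the host code duplicating CK's split-count heuristic, it now asks the launcher, which owns the kernel tiling, for the value. A minimal Python sketch of that principle (the class, method name, and tiling numbers here are illustrative, not CK's actual API):

```python
# Hypothetical sketch of the design change: instead of recomputing the
# backward-pass split count on the host with a hardcoded heuristic,
# query the launcher, which owns the kernel tiling logic.

class FmhaBwdLauncher:
    """Stand-in for CK's fmha_bwd launcher; names are illustrative."""

    def __init__(self, seqlen_k: int, block_k: int = 128):
        self.seqlen_k = seqlen_k
        self.block_k = block_k

    def nsplits(self) -> int:
        # The launcher derives the split count from its own tiling,
        # so the host never has to mirror (and chase) this heuristic.
        return (self.seqlen_k + self.block_k - 1) // self.block_k

# Before: host code hardcoded the same ceiling-division heuristic,
# which silently breaks whenever upstream CK changes its tiling.
# After: the host simply queries the launcher.
launcher = FmhaBwdLauncher(seqlen_k=512)
nsplits = launcher.nsplits()  # 4 with these illustrative numbers
```

The benefit is that a future CK tiling change only has to be made in one place, and the host-side dq_accum allocation stays consistent with what the kernel actually writes.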

ComposableKernel submodule update

  • Bump csrc/composable_kernel from 13f6d635 to 574c1c12

Build system (setup.py)

  • Add -Wno-unknown-warning-option and -fbracket-depth=1024 compiler flags for ROCm CK builds
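For context, a minimal sketch of how such flags might be appended for ROCm CK builds; this is not the actual setup.py code, just an illustration of the two flags named above:

```python
# Minimal sketch (not the actual setup.py) of conditionally adding the
# two compiler flags mentioned above for ROCm CK builds.
def rocm_ck_cxx_flags(base_flags=None):
    flags = list(base_flags or ["-O3", "-std=c++17"])
    # Silence warnings about options that get forwarded to a compiler
    # that does not recognize them.
    flags.append("-Wno-unknown-warning-option")
    # CK's deeply nested template code can exceed clang's default
    # bracket-nesting limit (256), so raise it.
    flags.append("-fbracket-depth=1024")
    return flags
```

`-fbracket-depth` is a clang option, which is consistent with ROCm builds going through hipcc/clang.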

Testing

  • Validated with pytest tests/test_flash_attn_ck.py on MI300 and MI350

@eliasmagn

Hi @rocking5566, nice to see we converged on a very similar direction here.

When I saw the recent aiter-related merges, I decided to base this on the CK copy already pulled in through aiter, rather than keeping two overlapping CK copies in the project.

The main motivation on my side was to remove that duplication and rely on the CK path that is already exercised as part of the aiter project.

In any case, I understand the preference for the split you described.

I’m happy to split my PR (#2350) accordingly when the time comes, if that helps with review and integration.

@rocking5566
Contributor Author

> Hi @rocking5566, nice to see we converged on a very similar direction here.
>
> When I saw the recent aiter-related merges, I decided to base this on the CK copy already pulled in through aiter, rather than keeping two overlapping CK copies in the project.
>
> The main motivation on my side was to remove that duplication and rely on the CK path that is already exercised as part of the aiter project.
>
> In any case, I understand the preference for the split you described.
>
> I'm happy to split my PR (#2350) accordingly when the time comes, if that helps with review and integration.

Thanks @eliasmagn! That makes total sense — consolidating on the aiter-vendored CK to avoid duplication is definitely the right long-term direction.

Once this PR gets merged, you should be able to rebase #2350 on top of it, which should significantly reduce the diff and let your PR focus on the aiter migration part.

@tridao would you mind taking a look and merging this one when you get a chance? It should also unblock #2350 for a cleaner integration. Thanks!

@tridao tridao merged commit 8afc617 into Dao-AILab:main Mar 18, 2026
zhuochenKIDD pushed a commit to zhuochenKIDD/flash-attention that referenced this pull request Mar 25, 2026
…el API changes (Dao-AILab#2363)

* update ck

* update ck

* before gpt-oss sink

* gpt-oss sink

* Add missing parameter

* Fix typo

* Update to ROCm/composable_kernel@b09112b

* add -Wno-unknown-warning-option

* Update to ROCm/rocm-libraries#4368 (ROCm/rocm-libraries@17f7dfc)

* Update to ROCm/rocm-libraries@a358a21

---------

Co-authored-by: Ding, Yi <yi.ding@amd.com>
Co-authored-by: Yi DING <andy-ding@outlook.com>

4 participants