
[CK_TILE] Update CK and add RDNA build support#178

Merged
rocking5566 merged 10 commits into ck_improve_main from hyoon1/fa-ck-rdna on Mar 26, 2026

hyoon1 commented on Mar 20, 2026

Motivation

Update composable_kernel (CK) to a revision with gfx1x FMHA support, and align the FlashAttention CK argument wiring with the updated interface.

Technical Details

  • Update composable_kernel to e5683e2
  • Align the FlashAttention CK integration with the updated CK interface by adjusting the CK FMHA call and trait argument wiring, so the argument mapping stays consistent with the new CK side
  • Add ROCm gfx11/gfx12 build support
  • Set CK_TILE_FLOAT_TO_BFLOAT16_DEFAULT=0 for gfx11 targets
  • Add backward guards for unstable gfx1x paths:
    • gfx11: disable CK backward
    • gfx12: disable deterministic CK backward
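The backward guards above amount to a small policy keyed on the target architecture. The following is a minimal Python sketch of that policy, assuming the architecture is available as a string such as gfx1100 (possibly with a feature suffix like :xnack-). The helper names are hypothetical; they are not the functions this PR adds to flash_common.hpp or to the tests.

```python
def arch_family(arch: str) -> str:
    """Strip feature suffixes and normalize case, e.g. 'gfx1100:xnack-' -> 'gfx1100'."""
    return arch.split(":", 1)[0].strip().lower()


def is_gfx11(arch: str) -> bool:
    """All gfx11 targets share the 'gfx11' prefix."""
    return arch_family(arch).startswith("gfx11")


def is_gfx12(arch: str) -> bool:
    """All gfx12 targets share the 'gfx12' prefix."""
    return arch_family(arch).startswith("gfx12")


def check_ck_backward(arch: str, deterministic: bool) -> None:
    """Raise RuntimeError when CK backward is disabled for this arch/mode (hypothetical helper)."""
    if is_gfx11(arch):
        # gfx11: CK backward is disabled entirely.
        raise RuntimeError(f"CK backward is not supported on {arch_family(arch)}")
    if is_gfx12(arch) and deterministic:
        # gfx12: only the deterministic backward path is disabled.
        raise RuntimeError(
            f"Deterministic CK backward is not supported on {arch_family(arch)}"
        )
    # Any other target (e.g. a gfx9 CDNA part) is allowed through unchanged.
```

In a test suite, the same predicate could drive a skip condition so that unsupported combinations are skipped rather than reported as failures.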

Test Plan

pytest tests/test_flash_attn_ck.py

Test Result

No failures on gfx11 or gfx12.


hyoon1 requested a review from rocking5566 on March 20, 2026 19:52
hyoon1 force-pushed the hyoon1/fa-ck-rdna branch from 78af732 to 4a318f7 on March 20, 2026 19:58
hyoon1 changed the title from "[CK_TILE] Update CK and add gfx1x build support" to "[CK_TILE] Update CK and add RDNA build support" on March 20, 2026
hyoon1 force-pushed the hyoon1/fa-ck-rdna branch 2 times, most recently from 376c625 to 3f99c2a, on March 23, 2026 22:34
rocking5566 requested a review from Copilot on March 25, 2026 05:38
Copilot AI left a comment

Pull request overview

Updates the ROCm CK backend to a newer composable_kernel revision and extends build/runtime support for RDNA (gfx11/gfx12), including updated FMHA argument wiring and targeted backward guards.

Changes:

  • Bump composable_kernel submodule and adjust CK FMHA args to match the updated interface.
  • Add gfx11/gfx12 build targeting and set CK_TILE_FLOAT_TO_BFLOAT16_DEFAULT=0 for gfx11 targets.
  • Gate/skip CK backward in tests and runtime for unsupported/unstable gfx1x backward paths; add optional “LLC Head Grouping” forward dispatch.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Summary per file:
  • tests/test_flash_attn_ck.py: Adds gfx11/gfx12 detection helpers and skips/guards for unsupported CK backward on gfx1x.
  • setup.py: Extends the supported GPU_ARCHS, adds target detection and forwarding to the CK kernel generator, and adjusts CK_TILE defaults for gfx11.
  • csrc/flash_attn_ck/mha_varlen_fwd.cpp: Updates CK args and adds optional head-grouped forward dispatch with logging.
  • csrc/flash_attn_ck/mha_varlen_bwd.cpp: Adds gfx1x backward support checks before any allocation or intermediate work.
  • csrc/flash_attn_ck/mha_fwd.cpp: Updates CK args and adds optional head-grouped forward dispatch with logging.
  • csrc/flash_attn_ck/mha_bwd.cpp: Adds gfx1x backward support checks before any allocation or intermediate work.
  • csrc/flash_attn_ck/flash_common.hpp: Adds ROCm arch detection helpers and a gfx1x deterministic/backward guard.
  • csrc/composable_kernel: Updates the submodule pointer to the newer CK commit.
  • README.md: Documents RDNA 3/4 support and current backward limitations.
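The setup.py change is described as extending GPU_ARCHS and adjusting the CK_TILE default for gfx11. A minimal sketch of that kind of flag derivation follows. It assumes a semicolon-separated arch list and that the CK_TILE setting is passed as a preprocessor define for the whole build whenever any gfx11 target is present. The function names and accepted prefixes are illustrative, not taken from the PR, and target auto-detection is omitted.

```python
# Illustrative arch families accepted by this sketch; the real allowlist in
# setup.py may differ.
SUPPORTED_PREFIXES = ("gfx9", "gfx11", "gfx12")


def parse_gpu_archs(value: str) -> list[str]:
    """Split a semicolon-separated arch list, dropping empty entries."""
    return [a.strip() for a in value.split(";") if a.strip()]


def ck_compile_flags(archs: list[str]) -> list[str]:
    """Build compiler flags: one --offload-arch per target plus the gfx11 define."""
    flags = []
    for arch in archs:
        if not arch.startswith(SUPPORTED_PREFIXES):
            raise ValueError(f"unsupported GPU arch: {arch}")
        flags.append(f"--offload-arch={arch}")
    # Assumption: the CK_TILE default is a preprocessor define applied to the
    # whole build as soon as any gfx11 target is requested.
    if any(a.startswith("gfx11") for a in archs):
        flags.append("-DCK_TILE_FLOAT_TO_BFLOAT16_DEFAULT=0")
    return flags
```

For example, ck_compile_flags(parse_gpu_archs("gfx942;gfx1100")) returns one --offload-arch flag per target followed by the gfx11 define. A build that mixes gfx11 with other targets would need a per-target define if those targets should keep a different default.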


Review threads: setup.py (outdated), setup.py, csrc/flash_attn_ck/flash_common.hpp (outdated), csrc/flash_attn_ck/mha_fwd.cpp (outdated)
rocking5566 (Collaborator) left a comment

Overall looks good: the RDNA support and backward guards are well-structured. A few items are worth addressing beyond what Copilot already flagged (I've replied on those threads separately).

Review threads: csrc/flash_attn_ck/mha_fwd.cpp, csrc/flash_attn_ck/mha_fwd.cpp (outdated), tests/test_flash_attn_ck.py (outdated), tests/test_flash_attn_ck.py (outdated), tests/test_flash_attn_ck.py, setup.py
hyoon1 force-pushed the hyoon1/fa-ck-rdna branch from 541d5a2 to 12b0f2a on March 26, 2026 04:26
hyoon1 force-pushed the hyoon1/fa-ck-rdna branch from 12b0f2a to d16038e on March 26, 2026 06:21
rocking5566 (Collaborator) left a comment

LGTM

rocking5566 merged commit ba957ff into ck_improve_main on Mar 26, 2026
rocking5566 deleted the hyoon1/fa-ck-rdna branch on April 8, 2026 07:20

3 participants