
(Continued from PR #86) Initial RDNA Windows bring-up for CK FMHA #87

Open: jammm wants to merge 10 commits into develop from users/jam/pr-86-rdna-fmha



jammm commented May 9, 2026

Re-posted from #86 (comment).

Reviews and the final PR merge should happen here.

This needs ROCm/rocm-libraries#7016 to be merged first.

cc @qianfengz @0xDELUXA

0xDELUXA and others added 6 commits April 27, 2026 17:44
Default RDNA CK FMHA builds to round-to-nearest bf16 conversion so CK forward payloads remain accurate when FlashAttention backward consumes them. This fixes the remaining bf16 RDNA failures without adding skips or loosening tolerances.
gfx11 WMMA duplicates Q data across subgroups, so the 128x256 Q tile can exceed static_for's 256-iteration limit. Use M=64 for the hdim-256 common forward tile on gfx11 while keeping the existing tile on gfx12 and other architectures.
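The first commit above changes how fp32 values are narrowed to bf16. As an illustration only (this is not CK's conversion code, and the helper names are hypothetical), the sketch below contrasts plain truncation, which drops the low 16 bits of the fp32 bit pattern and therefore always rounds toward zero, with round-to-nearest-even, which adds a rounding bias before dropping those bits. The latter keeps the error within half a bf16 ulp and breaks exact ties toward an even mantissa.

```python
import math
import struct


def f32_to_bits(x: float) -> int:
    """Reinterpret a float (rounded to fp32) as its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(b: int) -> float:
    """Reinterpret a 32-bit pattern as an fp32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def bf16_truncate(x: float) -> float:
    """fp32 -> bf16 by dropping the low 16 bits (rounds toward zero)."""
    return bits_to_f32(f32_to_bits(x) & 0xFFFF0000)


def bf16_round_nearest_even(x: float) -> float:
    """fp32 -> bf16 with round-to-nearest, ties-to-even."""
    b = f32_to_bits(x)
    if math.isnan(x):
        # Set the quiet bit so dropping low bits cannot turn a NaN into Inf.
        return bits_to_f32((b | 0x00400000) & 0xFFFF0000)
    # Bit 16 becomes the bf16 LSB; adding 0x7FFF plus that LSB rounds up
    # above the halfway point and sends exact ties to the even neighbour.
    lsb = (b >> 16) & 1
    return bits_to_f32((b + 0x7FFF + lsb) & 0xFFFF0000)


x = 1.0 + 2**-8 + 2**-9  # 0.75 of a bf16 ulp above 1.0
print(bf16_truncate(x))            # 1.0
print(bf16_round_nearest_even(x))  # 1.0078125
```

Truncation loses the 0.75-ulp fraction entirely, while round-to-nearest-even moves to the closer representable neighbour; errors of the first kind are always in the same direction, which is the sort of systematic drift that matters when another kernel (here, the FlashAttention backward pass) consumes the values.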
jammm added 2 commits May 9, 2026 22:01
Point the vendored CK submodule at the review-updated RDNA FMHA changes and keep the async infer pipeline out of gfx11/gfx12 xFormers builds. The async global-to-LDS CK path is not used for current RDNA3/4 FMHA builds and fails to compile when instantiated after the core CK fallback was removed.
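The architecture-specific rules stated in the commit messages above (M=64 for the hdim-256 common forward tile on gfx11, the existing 128x256 tile elsewhere, and no async infer pipeline on gfx11/gfx12) can be summarized as a small selection sketch. The function names are hypothetical and this is not xFormers' or CK's actual dispatch code; it assumes the "128x256" tile is (M, hdim) and that gfx targets are matched by prefix.

```python
def hdim256_fwd_tile(arch: str) -> tuple[int, int]:
    """Hypothetical: (M, hdim) of the hdim-256 common forward Q tile."""
    if arch.startswith("gfx11"):
        # gfx11 WMMA duplicates Q data across subgroups, so a 128x256 Q tile
        # can exceed static_for's 256-iteration limit; halve M instead.
        return (64, 256)
    return (128, 256)  # existing tile on gfx12 and other architectures


def async_infer_allowed(arch: str) -> bool:
    """Hypothetical: the async global-to-LDS infer pipeline stays out of RDNA3/4 builds."""
    return not arch.startswith(("gfx11", "gfx12"))


for arch in ("gfx1100", "gfx1201", "gfx942"):
    print(arch, hdim256_fwd_tile(arch), async_infer_allowed(arch))
```

Keeping both rules keyed on the gfx prefix mirrors how the commits describe them: the tile change is gfx11-only, while the async-pipeline exclusion covers both gfx11 and gfx12.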
0xDELUXA commented

Now that ROCm/rocm-libraries#7016 has been approved, theoretically, once it goes in, this one could too, right?

Could you please review the changes here, @qianfengz?

jammm commented May 13, 2026

> Now that ROCm/rocm-libraries#7016 has been approved, theoretically, once it goes in, this one could too, right?
>
> Could you please review the changes here, @qianfengz?

That PR needs to be merged first; then we update the submodule here, and only after that will this one be ready to merge.

…na-fmha

# Conflicts:
#	third_party/composable_kernel_tiled
#	xformers/csrc/attention/hip_fmha/ck_tiled_rand_uniform_kernel.h