[CuTe,Fwd,SM90] Enable head dim 512 for SM90#2422

Open
IwakuraRein wants to merge 2 commits into Dao-AILab:main from IwakuraRein:enable-hdim-512-sm90
Conversation

@IwakuraRein

Relax the head-dim checks for SM90; fix the N size of the MMA and the tensor shape of PagedKVManager.

Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
@tridao
Member

tridao commented Apr 1, 2026

For hdim 512 we probably want 2 warp groups where WG0 computes the Q @ K, softmax, then write to smem. Then both warp groups compute P @ V. That's how the FA3 implementation did it.
https://github.com/Dao-AILab/flash-attention/blob/main/hopper/mainloop_fwd_sm90_tma_gmma_ws.hpp

As written this PR would have 2 WGs both computing the same Q @ K and softmax so there's redundancy here?
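The FA3-style partition described above can be checked with a hedged toy sketch (plain Python on tiny matrices, not the actual CuTe kernel; all names are illustrative): splitting the P @ V GEMM column-wise between two warp groups reproduces the unsplit result, which is why only the softmax output needs to go through smem.

```python
# Hedged sketch of the FA3-style work split: WG0 owns Q @ K^T and softmax,
# then both warp groups compute disjoint column slices of P @ V.
# Tiny nested-list matrices stand in for tiles; not the real CUDA/CuTe code.

def matmul(a, b):
    """Naive matrix multiply on nested lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]

def split_cols(m, n_parts):
    """Split a matrix column-wise into n_parts equal slices (one per WG)."""
    width = len(m[0]) // n_parts
    return [[row[p * width:(p + 1) * width] for row in m] for p in range(n_parts)]

P = [[1, 2], [3, 4]]                # attention-probs tile (after WG0's softmax)
V = [[1, 0, 2, 0], [0, 1, 0, 2]]    # V tile, headdim_v = 4

V0, V1 = split_cols(V, 2)           # each warp group gets half of V's columns
out0 = matmul(P, V0)                # WG0's half of the output
out1 = matmul(P, V1)                # WG1's half of the output
stitched = [r0 + r1 for r0, r1 in zip(out0, out1)]
assert stitched == matmul(P, V)     # same result as the unsplit P @ V
```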

@IwakuraRein
Author

@tridao Thanks for the suggestion. Currently it's still using 1 WG, and decode performance is poor. I will optimize based on the LargeHeadDimV path in FA3.

Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
IwakuraRein added a commit to IwakuraRein/flash-attention that referenced this pull request Apr 2, 2026
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
@IwakuraRein
Author

Hi @tridao, thanks again for your reply.

Could you clarify why using 1 warp group for QK is preferred? Is it mainly to reduce register pressure? From my understanding, using 1 warp group for QK and 2 for PV introduces an extra shared-memory round trip. It also requires synchronization after QK, which seems to conflict with ping-pong scheduling. In my tests, using 2 warp groups for both QK and PV didn't cause register spilling.

I'd really appreciate any insight you can share. Thanks!

@tridao
Member

tridao commented Apr 3, 2026

How do you split the work when using 2 WGs to compute QK?
This is a 64 x 512 @ 64 x 512 MMA, right?
Are you splitting the work along the N (64) dimension, or do both WGs compute the same 64 x 64 output?

@IwakuraRein
Author

How do you split the work when using 2 WGs to compute QK? This is a 64 x 512 @ 64 x 512 MMA, right? Are you splitting the work along the N (64) dimension, or do both WGs compute the same 64 x 64 output?

@tridao I am splitting along the N dimension. I think the QK tile is (64, 128, 16) and the PV tile is (64, 512, 16). Therefore, the MNK of the MMA for each warp group is (64, 64, 16) for QK and (64, 256, 16) for PV. The generated SASS confirms this.
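As a sanity check on those numbers, a hedged sketch (plain Python arithmetic; `per_wg_mma_shape` is an illustrative name, not a CuTe API) of splitting a tile's N dimension evenly across two warp groups:

```python
# Illustrative only: split an (M, N, K) MMA tile along N across warp groups.
def per_wg_mma_shape(m, n, k, num_warp_groups=2):
    """Each warp group gets an equal slice of the N dimension."""
    assert n % num_warp_groups == 0, "N must divide evenly across warp groups"
    return (m, n // num_warp_groups, k)

print(per_wg_mma_shape(64, 128, 16))  # QK tile -> (64, 64, 16) per WG
print(per_wg_mma_shape(64, 512, 16))  # PV tile -> (64, 256, 16) per WG
```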

@tridao
Member

tridao commented Apr 3, 2026

What's the M, N, K dimension for the QK MMA?
Shouldn't K be 512 if Q and K have headdim 512?
