[Cute,Fwd,Sm90] Support SplitKV by imbr92 · Pull Request #2415 · Dao-AILab/flash-attention

imbr92 · 2026-03-31T13:16:21Z

Summary

Adds SplitKV support to the SM90 forward kernel in the CuTe implementation.
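For readers unfamiliar with the technique, here is a minimal pure-Python sketch of what split-KV computes for a single query. It is not taken from this PR's kernel: the function names `attend` and `split_kv_attend` are hypothetical, and the chunking scheme is an assumption made for illustration. Each split attends to its own slice of K/V and returns a partial output plus the log-sum-exp (LSE) of its scores; the partials are then merged with LSE-based weights so the result matches unsplit attention.

```python
import math


def attend(q, keys, values):
    """Softmax attention for one query over a K/V chunk.

    Returns (output vector, log-sum-exp of the scaled scores).
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom
           for d in range(dim)]
    return out, m + math.log(denom)


def split_kv_attend(q, keys, values, num_splits):
    """Hypothetical split-KV reference: attend per chunk, then combine."""
    chunk = math.ceil(len(keys) / num_splits)
    # Stepping by `chunk` never yields an empty slice, so no empty splits here.
    partials = [
        attend(q, keys[start:start + chunk], values[start:start + chunk])
        for start in range(0, len(keys), chunk)
    ]
    # Combine: weight each partial output by exp(lse_i - max_lse), normalize.
    lse_max = max(lse for _, lse in partials)
    scales = [math.exp(lse - lse_max) for _, lse in partials]
    total = sum(scales)
    dim = len(partials[0][0])
    out = [sum(s * o[d] for s, (o, _) in zip(scales, partials)) / total
           for d in range(dim)]
    return out, lse_max + math.log(total)
```

Because each split is independent until the combine step, the splits can run on separate thread blocks, which is where the extra parallelism for short-query, long-key workloads comes from.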

Testing

Ran all tests in tests/cute/test_flash_attn.py

Benchmarks (on H200; splits=0 lets the existing heuristic choose the number of splits, splits=1 is the unsplit baseline)

| Configuration | splits=1 | splits=0 | Speedup |
| --- | --- | --- | --- |
| b=1 sq=1 sk=4096 nh=32 nhkv=8 hd=128 | 0.05ms | 0.06ms | 0.80x |
| b=1 sq=1 sk=16384 nh=32 nhkv=8 hd=128 | 0.18ms | 0.07ms | 2.60x |
| b=1 sq=1 sk=65536 nh=32 nhkv=8 hd=128 | 0.69ms | 0.20ms | 3.45x |
| b=1 sq=1 sk=131072 nh=32 nhkv=8 hd=128 | 1.36ms | 0.37ms | 3.67x |
| b=2 sq=1 sk=65536 nh=32 nhkv=8 hd=128 | 0.69ms | 0.37ms | 1.86x |
| b=4 sq=1 sk=16384 nh=32 nhkv=8 hd=128 | 0.18ms | 0.20ms | 0.91x |
| b=1 sq=64 sk=16384 nh=32 nhkv=8 hd=128 | 0.18ms | 0.06ms | 2.85x |
| b=1 sq=128 sk=32768 nh=32 nhkv=8 hd=128 | 0.35ms | 0.11ms | 3.25x |
| b=1 sq=256 sk=65536 nh=32 nhkv=8 hd=128 | 0.69ms | 0.38ms | 1.82x |
| b=1 sq=1 sk=65536 nh=32 nhkv=8 hd=128 causal | 0.76ms | 0.21ms | 3.54x |
| b=1 sq=128 sk=32768 nh=32 nhkv=8 hd=128 causal | 0.39ms | 0.11ms | 3.39x |

Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>

tridao · 2026-04-06T01:54:31Z

Thanks for the contribution, this is great. I'll review it late this week

imbr92 added 7 commits March 30, 2026 16:53

Initial split kv impl

fd00014

Fix split kv with empty splits

c701fee

Enable split kv for sm90 in tests

44c3b0c

Get splitkv working with local attn

61a9620

Support splitkv + packgqa

cf0d5c8

cleanup

2351ca1

ruff on flash_fwd_sm90

698f05d

IwakuraRein mentioned this pull request Apr 2, 2026

[Attention] Allow using system FA4 vllm-project/vllm#38823

Closed

5 tasks

IwakuraRein added a commit to IwakuraRein/flash-attention that referenced this pull request Apr 2, 2026

Support SplitKV Dao-AILab#2415

aaacd74

IwakuraRein mentioned this pull request Apr 2, 2026

Enable hdim 512 sm90 vllm vllm-project/flash-attention#130

Merged

IwakuraRein added a commit to IwakuraRein/flash-attention that referenced this pull request Apr 2, 2026

Support SplitKV Dao-AILab#2415

ccaa385

Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>

IwakuraRein mentioned this pull request Apr 2, 2026

[Attention] relax the head dim 512 and paged kv for sm90+FA4 vllm-project/vllm#38835

Merged

5 tasks

imbr92 wants to merge 7 commits into Dao-AILab:main from adaptive-ml:sm90_splitkv
