[Cute,Fwd,Sm90] Support SplitKV#2415

Open
imbr92 wants to merge 7 commits into Dao-AILab:main from adaptive-ml:sm90_splitkv

imbr92 (Contributor) commented Mar 31, 2026

Summary

Adds SplitKV support to the SM90 forward pass.
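
For readers unfamiliar with the idea, here is a minimal pure-Python sketch of what split-KV attention computes for a single query vector. It is not the kernel code from this PR: the function names `attend` and `attend_split_kv` are hypothetical, and the sketch only illustrates the general technique, assuming the usual formulation in which each K/V chunk is attended independently and the partial outputs are merged using their log-sum-exp (LSE) values.

```python
import math

def attend(q, keys, values):
    """Reference softmax attention for one query; returns (output, lse)."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)                                # subtract max for stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom
           for d in range(dim)]
    return out, m + math.log(denom)                # lse = log(sum(exp(scores)))

def attend_split_kv(q, keys, values, num_splits):
    """Attend each K/V chunk independently, then combine via LSE weights."""
    n = len(keys)
    chunk = -(-n // num_splits)                    # ceil(n / num_splits)
    partials = [attend(q, keys[i:i + chunk], values[i:i + chunk])
                for i in range(0, n, chunk)]
    lse_max = max(lse for _, lse in partials)
    # Each partial output is weighted by its share of the softmax denominator.
    w = [math.exp(lse - lse_max) for _, lse in partials]
    total = sum(w)
    dim = len(values[0])
    out = [sum(wi * o[d] for wi, (o, _) in zip(w, partials)) / total
           for d in range(dim)]
    return out, lse_max + math.log(total)
```

Because the chunks are independent, they can run on different thread blocks, which is why splitting tends to help when the batch size and query length are too small to fill the GPU on their own. The combine step is extra work, so it can cost more than it saves on short sequences.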

Testing

Ran all tests in tests/cute/test_flash_attn.py
(three images attached)

Benchmarks (on H200; splits=0 means the number of splits is chosen by the existing heuristic)

| Configuration | splits=1 | splits=0 | Speedup |
| --- | --- | --- | --- |
| b=1 sq=1 sk=4096 nh=32 nhkv=8 hd=128 | 0.05ms | 0.06ms | 0.80x |
| b=1 sq=1 sk=16384 nh=32 nhkv=8 hd=128 | 0.18ms | 0.07ms | 2.60x |
| b=1 sq=1 sk=65536 nh=32 nhkv=8 hd=128 | 0.69ms | 0.20ms | 3.45x |
| b=1 sq=1 sk=131072 nh=32 nhkv=8 hd=128 | 1.36ms | 0.37ms | 3.67x |
| b=2 sq=1 sk=65536 nh=32 nhkv=8 hd=128 | 0.69ms | 0.37ms | 1.86x |
| b=4 sq=1 sk=16384 nh=32 nhkv=8 hd=128 | 0.18ms | 0.20ms | 0.91x |
| b=1 sq=64 sk=16384 nh=32 nhkv=8 hd=128 | 0.18ms | 0.06ms | 2.85x |
| b=1 sq=128 sk=32768 nh=32 nhkv=8 hd=128 | 0.35ms | 0.11ms | 3.25x |
| b=1 sq=256 sk=65536 nh=32 nhkv=8 hd=128 | 0.69ms | 0.38ms | 1.82x |
| b=1 sq=1 sk=65536 nh=32 nhkv=8 hd=128 causal | 0.76ms | 0.21ms | 3.54x |
| b=1 sq=128 sk=32768 nh=32 nhkv=8 hd=128 causal | 0.39ms | 0.11ms | 3.39x |

IwakuraRein added a commit to IwakuraRein/flash-attention that referenced this pull request Apr 2, 2026
IwakuraRein added a commit to IwakuraRein/flash-attention that referenced this pull request Apr 2, 2026
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
tridao (Member) commented Apr 6, 2026

Thanks for the contribution, this is great. I'll review it late this week
