
[Bugfix] Fix FA4 OOM with diff headdim #123

Merged
LucasWilkinson merged 4 commits into vllm-project:main from MatthewBonanni:disable_split_diffkv
Mar 6, 2026

@MatthewBonanni MatthewBonanni commented Mar 2, 2026

Using split KV with different QK and V head dimensions (e.g. 192/128, as in DeepSeek MLA prefill) exceeds the available shared memory (SMEM). This PR uses a heuristic to either disable splitting or adjust the tile size in this case.
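
For readers who want a feel for the shape of such a heuristic, here is a minimal sketch. It is not the code from this PR: the `smem_bytes` cost model, the `Plan` type, the `choose_plan` function, the default tile sizes, and the shrink-then-disable order are all hypothetical and chosen only for illustration. It assumes SMEM holds one Q tile, a few pipelined K and V tiles, and an output tile that is kept in fp32 when splitting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    num_splits: int  # 1 means split KV is disabled
    tile_n: int      # KV tile length


def smem_bytes(tile_m, tile_n, head_dim_qk, head_dim_v, stages, split, elem_bytes=2):
    """Rough SMEM estimate (hypothetical model, not FA4's real accounting)."""
    q = tile_m * head_dim_qk * elem_bytes
    k = stages * tile_n * head_dim_qk * elem_bytes
    v = stages * tile_n * head_dim_v * elem_bytes
    # Assumption: the split path stages fp32 partial outputs.
    o = tile_m * head_dim_v * (4 if split else elem_bytes)
    return q + k + v + o


def choose_plan(head_dim_qk, head_dim_v, requested_splits, smem_limit,
                tile_m=128, tile_n=128, min_tile_n=64, stages=2):
    """Keep splitting if some tile size fits in SMEM, otherwise disable it."""
    # Equal head dims (or no splitting requested): nothing to adjust.
    if head_dim_qk == head_dim_v or requested_splits <= 1:
        return Plan(requested_splits, tile_n)
    n = tile_n
    while n >= min_tile_n:
        if smem_bytes(tile_m, n, head_dim_qk, head_dim_v, stages, split=True) <= smem_limit:
            return Plan(requested_splits, n)  # adjusted tile size fits
        n //= 2
    # No tile size fits with splitting enabled, so fall back to no split.
    # This sketch does not check that the unsplit path itself fits.
    return Plan(1, tile_n)
```

Under this toy model, `choose_plan(192, 128, 4, smem_limit=200_000)` keeps the 4 splits but halves the KV tile, while a tighter limit disables splitting altogether. The limits here are arbitrary numbers for the example, not real hardware values.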

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Collaborator

@LucasWilkinson LucasWilkinson left a comment

Does it make more sense to adjust tile sizes instead?

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Author

MatthewBonanni commented Mar 6, 2026

Turns out the best approach is a heuristic that either disables splitting or adjusts the tile size. Benchmarked with the vLLM attention benchmarks using 16 query heads (DeepSeek V2):
[Image: comparison]

@MatthewBonanni MatthewBonanni changed the title [Bugfix] Disable split KV for diff headdim [Bugfix] Fix OOM with diff headdim Mar 6, 2026
@LucasWilkinson LucasWilkinson merged commit b61e94e into vllm-project:main Mar 6, 2026
1 check passed
@MatthewBonanni MatthewBonanni changed the title [Bugfix] Fix OOM with diff headdim [Bugfix] Fix FA4 OOM with diff headdim Mar 6, 2026