
[Attention] Make split_decodes_and_prefills(..., require_uniform=True) support padding#29644

Merged
LucasWilkinson merged 5 commits into vllm-project:main from
neuralmagic:lwilkinson/fix-requires-uniform-case
Dec 9, 2025

Conversation

@LucasWilkinson
Collaborator

@LucasWilkinson LucasWilkinson commented Nov 28, 2025

With #28579 we pad attention metadata before building; in the case of a uniform decode batch we want to make sure that num_decodes matches the cudagraph size, so that attention backends receive the same batch size the graph was captured with. Currently we work around this in all the existing attention backends (e.g. vllm-project/FlashMLA#3), since we used to pad for attention after building the attention metadata, so this was always the case. But this change is needed for #27532, since the FlashMLA FP8 sparse kernels do not have this workaround yet.

The | (query_lens == 0) is removed from:

        is_prefill = (query_lens > decode_threshold) | (query_lens == 0)

as a small cleanup, since it is not actually required: we do want to treat zero-length (padded) entries as decodes in the full-decode case, and if there is a prefill the split happens at the first prefill, so these trailing entries are ignored anyway.
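To illustrate the intent (this is a minimal NumPy sketch, not the actual vLLM implementation; the function name and signature mirror `split_decodes_and_prefills` but the body is simplified): under `require_uniform=True`, padded entries with `query_len == 0` are tolerated so a uniform decode batch padded up to the cudagraph size still reports `num_decodes` equal to the padded batch size.

```python
import numpy as np

def split_decodes_and_prefills(query_lens, decode_threshold=1, require_uniform=False):
    """Sketch: split a batch into leading decodes and trailing prefills.

    With require_uniform=True, padded tail entries (query_len == 0) are
    counted as decodes so num_decodes matches the padded cudagraph size.
    """
    query_lens = np.asarray(query_lens)
    if require_uniform:
        # A request is a prefill if its length differs from the first
        # request's length; zero-length padding is tolerated so a fully
        # padded uniform-decode batch stays all-decode.
        is_prefill = (query_lens != query_lens[0]) & (query_lens != 0)
    else:
        is_prefill = query_lens > decode_threshold
    # Split at the first prefill; anything after it (including padding)
    # falls into the prefill region and is handled there.
    first_prefill = int(np.argmax(is_prefill)) if is_prefill.any() else len(query_lens)
    return first_prefill, len(query_lens) - first_prefill

# A uniform decode batch of 3 padded to 4 requests is still all-decode:
# split_decodes_and_prefills([1, 1, 1, 0], require_uniform=True) -> (4, 0)
```

Note that in this sketch the `| (query_lens == 0)` term is unnecessary: zero-length entries either count as decodes (full-decode batch) or land past the first prefill.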

Test Plan:

CI

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
@mergify mergify bot added the v1 label Nov 28, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to add support for padded requests in split_decodes_and_prefills when require_uniform=True, which is important for cudagraph compatibility. The changes involve updating the splitting logic and adding a corresponding test case. While the intention is correct, I've found a critical issue in the implementation for detecting padded uniform batches. The current logic can fail if the first request is a padding request, leading to incorrect batch splitting. I've provided a detailed comment with a suggested fix to make the logic more robust. The rest of the changes look good.

@LucasWilkinson LucasWilkinson added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 28, 2025
Collaborator

@benchislett benchislett left a comment


LGTM

@LucasWilkinson LucasWilkinson enabled auto-merge (squash) December 9, 2025 05:06
@LucasWilkinson LucasWilkinson merged commit aed8469 into vllm-project:main Dec 9, 2025
50 checks passed
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…e)` support padding (vllm-project#29644)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
