[Attention] Refactor `check_and_update_config` by MatthewBonanni · Pull Request #33600 · vllm-project/vllm

MatthewBonanni · 2026-02-02T21:43:36Z

Purpose

check_and_update_config unnecessarily duplicates much of the logic from the attention selector in order to set an appropriate block size. This PR refactors check_and_update_config to use the selector, which will be simpler to maintain going forward.

- If the user specifies both --block-size and --attention-backend and the backend doesn't support the block size, we raise an error rather than overriding the user's selection.
- If the user specifies --block-size but not the backend, the backend selector respects that block size and tries to find a compatible backend, raising an error if no valid backend is found.
  - If the user-selected block size forces the selection of a non-optimal backend, the user is warned about this.
- If the user specifies only --attention-backend, an appropriate block size is selected.
- If the user specifies neither, an appropriate attention backend is selected, followed by an appropriate block size for that backend.
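The four cases above can be sketched as a small decision function. This is only an illustration of the described behavior, not the code in this PR: the `Backend` class, the `BACKENDS` table, the `resolve` function, and the backend names and block sizes are all hypothetical, and the list order is assumed to stand in for backend priority.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Backend:
    """Hypothetical backend descriptor (not a vLLM class)."""
    name: str
    supported_block_sizes: Tuple[int, ...]
    preferred_block_size: int


# Hypothetical table, ordered from most to least preferred.
BACKENDS = [
    Backend("BACKEND_A", (64,), 64),
    Backend("BACKEND_B", (16, 32, 64), 16),
]


def resolve(backend_name: Optional[str],
            block_size: Optional[int]) -> Tuple[str, int, List[str]]:
    """Return (backend name, block size, warnings) for the four cases."""
    warnings: List[str] = []
    if backend_name is not None:
        backend = next((b for b in BACKENDS if b.name == backend_name), None)
        if backend is None:
            raise ValueError(f"Unknown backend {backend_name}")
        if block_size is None:
            # Backend only: pick a block size that suits the backend.
            return backend.name, backend.preferred_block_size, warnings
        if block_size not in backend.supported_block_sizes:
            # Both given and incompatible: error out, never override the user.
            raise ValueError(
                f"block_size={block_size} is incompatible with {backend.name}")
        return backend.name, block_size, warnings
    if block_size is None:
        # Neither given: best backend, then its preferred block size.
        best = BACKENDS[0]
        return best.name, best.preferred_block_size, warnings
    # Block size only: first backend (in priority order) that supports it.
    for candidate in BACKENDS:
        if block_size in candidate.supported_block_sizes:
            if candidate is not BACKENDS[0]:
                warnings.append(
                    f"block_size={block_size} forces non-optimal backend "
                    f"{candidate.name}")
            return candidate.name, block_size, warnings
    raise ValueError(f"No backend supports block_size={block_size}")
```

In this sketch, `resolve(None, 16)` falls through to the second backend and returns a warning, while `resolve("BACKEND_A", 32)` raises instead of silently changing either user choice.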

Test Plan

Automatic selection

vllm serve deepseek-ai/DeepSeek-V2-Lite-Chat --attention-backend=FLASHMLA

yields

Setting kv cache block size to 64 for FLASHMLA backend.

Setting bad block size

vllm serve deepseek-ai/DeepSeek-V2-Lite-Chat --attention-backend=FLASHMLA --block-size 32

yields

(APIServer pid=1710084) pydantic_core._pydantic_core.ValidationError: 1 validation error for VllmConfig
(APIServer pid=1710084)   Value error, User-specified block_size=16 is incompatible with FLASH_ATTN_MLA backend (requires block_size=128). Either remove --block-size to auto-select, or use --block-size 128. [type=value_error, input_value=ArgsKwargs((), {'model_co...transfer_config': None}), input_type=ArgsKwargs]

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

mergify · 2026-02-02T21:44:21Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @MatthewBonanni.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

gemini-code-assist

Code Review

This pull request refactors the attention backend selection logic in check_and_update_config and get_attn_backend_cls. The changes significantly improve code clarity and maintainability by centralizing the backend selection logic into a new select_attention_backend method and introducing get_preferred_block_size for determining block sizes. This is a great improvement. I've found one issue with a type hint that should be addressed.

vllm/platforms/cuda.py

mergify · 2026-02-16T21:34:38Z

Hi @MatthewBonanni, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

pavanimajety

Thanks for the PR, @MatthewBonanni!
Looks good to me, pending clean CI and minor nits!

vllm/platforms/cuda.py

mergify · 2026-02-17T20:08:07Z

Hi @MatthewBonanni, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

mgoin

Great work, LGTM!

DarkLight1337 · 2026-02-18T07:31:18Z

I think this PR broke model initialization and V1 Others tests, checking in https://buildkite.com/vllm/ci/builds/52009/steps/canvas

The build for the previous PR for comparison: https://buildkite.com/vllm/ci/builds/52001/steps/canvas

njhill · 2026-02-18T16:47:29Z

Do we understand how this was merged with so many breakages? Have we given up on the rule that we don't force merge without certainty that CI failures are unrelated (even if it seems obvious that they are)? Has @vllm-bot gone rogue?

mgoin · 2026-02-18T16:53:26Z

I went rogue, sorry. I looked at the failures but thought they were not related

MatthewBonanni · 2026-02-18T17:00:36Z

My fault too, sorry. I asked @mgoin yesterday morning whether we should consider a force merge on this. I also assumed failures were unrelated.

Initial attempt

15a53c4

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

mergify bot added the nvidia and v1 labels Feb 2, 2026

mergify bot added the needs-rebase label Feb 2, 2026

github-project-automation bot added this to NVIDIA Feb 2, 2026

gemini-code-assist bot reviewed Feb 2, 2026

vllm/platforms/cuda.py (outdated, resolved)

MatthewBonanni added 2 commits February 2, 2026 16:47

Type hint

1f2161e

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Merge branch 'main' into refactor_block_size

0244607

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

mergify bot removed the needs-rebase label Feb 12, 2026

Bugfix + don't override user setting, just throw error

6e6ed2b

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

MatthewBonanni changed the title from "[WIP][Attention] Refactor check_and_update_config" to "[Attention] Refactor check_and_update_config" Feb 12, 2026

MatthewBonanni marked this pull request as ready for review February 13, 2026 02:08

MatthewBonanni requested review from WoosukKwon, alexm-redhat, njhill, youkaichao and zhuohan123 as code owners February 13, 2026 02:08

pavanimajety added the ready label (ONLY add when PR is ready to merge/full CI is needed) Feb 13, 2026

ElizaWszola reviewed Feb 13, 2026

vllm/v1/attention/backend.py (outdated, resolved)

MatthewBonanni added 5 commits February 16, 2026 11:04

Prefer the default block size if it's valid

3b4f7ae

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Refactor

a4392f8

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Warn when block size choice forces non-optimal backend

088c875

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Update warning

7812c4e

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Clean up

bc5957c

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Rename

e6106c5

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

pavanimajety approved these changes Feb 16, 2026

vllm/platforms/cuda.py (outdated, resolved)

vllm/platforms/cuda.py (outdated, resolved)

github-project-automation bot moved this to Ready in NVIDIA Feb 16, 2026

Rename selected_backend to user_specified_backend

49ed58a

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

MatthewBonanni added 4 commits February 17, 2026 10:10

Update error messages

6b2b22c

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Merge branch 'main' into refactor_block_size

b6f0a98

Fix hybrid and config context

aebc5ca

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Fix hybrid and clean up

718a05e

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

MatthewBonanni requested a review from heheda12345 as a code owner February 17, 2026 20:03

MatthewBonanni added 2 commits February 17, 2026 15:52

Fix pre-commit

4dc0992

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Fix pydantic

a310339

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

mgoin approved these changes Feb 18, 2026

vllm-bot merged commit 7743152 into vllm-project:main Feb 18, 2026
49 of 59 checks passed

github-project-automation bot moved this from Ready to Done in NVIDIA Feb 18, 2026

MatthewBonanni deleted the refactor_block_size branch February 18, 2026 01:08

ilmarkov mentioned this pull request Feb 18, 2026

[CI Failure]: models/test_initialization.py::test_can_initialize_large_subset[InternS1ProForConditionalGeneration] #34814

Closed

3 tasks

DarkLight1337 mentioned this pull request Feb 18, 2026

[Bugfix] Fix Basic Models Test #34818

Merged

5 tasks

wzhao18 pushed a commit to wzhao18/vllm that referenced this pull request Feb 18, 2026

[Attention] Refactor check_and_update_config (vllm-project#33600)

20d874a

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>

jasonozuzu-cohere pushed a commit to jasonozuzu-cohere/vllm that referenced this pull request Feb 18, 2026

[Attention] Refactor check_and_update_config (vllm-project#33600)

0c94119

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Jason Ozuzu <jasonozuzu@cohere.com>

This was referenced Feb 19, 2026

[ROCm][AITER] Fix aiter paged_attention_v1 decode for sliding window and head_size < 64 #34570

Merged

[Feature][Scheduler] Add split prefix caching feature to eliminate bf16 GEMM tiling divergence across cache-hit/miss paths #34046

Open

hl475 mentioned this pull request Feb 19, 2026

[perf] Avoid dtype promotion sync in mamba_get_block_table_tensor #34870

Merged

5 tasks

LucasWilkinson added a commit that referenced this pull request Feb 20, 2026

Revert "[Attention] Refactor check_and_update_config (#33600)"

8502fc3

This reverts commit 7743152.

LucasWilkinson mentioned this pull request Feb 20, 2026

[CI] Revert PRs 34818 and 33600 #34979

Merged

ZJY0516 pushed a commit to ZJY0516/vllm that referenced this pull request Feb 23, 2026

[Attention] Refactor check_and_update_config (vllm-project#33600)

92ca7e0

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

MatthewBonanni mentioned this pull request Feb 23, 2026

Reapply [Attention] Refactor check_and_update_config #35122

Merged

5 tasks

llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026

[Attention] Refactor check_and_update_config (vllm-project#33600)

d479562

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026

[Attention] Refactor check_and_update_config (vllm-project#33600)

2caca80

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

askliar pushed a commit to askliar/vllm that referenced this pull request Mar 9, 2026

[Attention] Refactor check_and_update_config (vllm-project#33600)

5d965e6

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Andrii Skliar <askliar@nvidia.com>
