Skip to content

[Attention] Refactor check_and_update_config#33600

Merged
vllm-bot merged 17 commits intovllm-project:mainfrom
MatthewBonanni:refactor_block_size
Feb 18, 2026
Merged

[Attention] Refactor check_and_update_config#33600
vllm-bot merged 17 commits intovllm-project:mainfrom
MatthewBonanni:refactor_block_size

Conversation

@MatthewBonanni
Copy link
Collaborator

@MatthewBonanni MatthewBonanni commented Feb 2, 2026

Purpose

check_and_update_config unnecessarily duplicates much of the logic from the attention selector in order to set an approproate block size. This PR refactors check_and_update_config to use the selector, which will be simpler to maintain going forward.

  • If the user specifies --block-size and --attention-backend and the backend doesn't support the block size, we raise an error rather than overriding a user selection
  • If the user specifies --block-size but not the backend, the backend selector respects that block size choice and tries to find a backend which is compatible, raising an error if no valid backends are found
    • If the user-selected block size forces the selection of a non-optimal backend, the user is warned about this
  • If the user specifies --attention-backend only, an appropriate block size is selected
  • If the user specifies neither, an appropriate attention backend is selected, and an appropriate block size for that backend is selected

Test Plan

Automatic selection

vllm serve deepseek-ai/DeepSeek-V2-Lite-Chat --attention-backend=FLASHMLA

yields

Setting kv cache block size to 64 for FLASHMLA backend.

Setting bad block size

vllm serve deepseek-ai/DeepSeek-V2-Lite-Chat --attention-backend=FLASHMLA --block-size 32

yields

(APIServer pid=1710084) pydantic_core._pydantic_core.ValidationError: 1 validation error for VllmConfig
(APIServer pid=1710084)   Value error, User-specified block_size=16 is incompatible with FLASH_ATTN_MLA backend (requires block_size=128). Either remove --block-size to auto-select, or use --block-size 128. [type=value_error, input_value=ArgsKwargs((), {'model_co...transfer_config': None}), input_type=ArgsKwargs]

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@mergify
Copy link

mergify bot commented Feb 2, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @MatthewBonanni.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the attention backend selection logic in check_and_update_config and get_attn_backend_cls. The changes significantly improve code clarity and maintainability by centralizing the backend selection logic into a new select_attention_backend method and introducing get_preferred_block_size for determining block sizes. This is a great improvement. I've found one issue with a type hint that should be addressed.

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@mergify mergify bot removed the needs-rebase label Feb 12, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@MatthewBonanni MatthewBonanni changed the title [WIP][Attention] Refactor check_and_update_config [Attention] Refactor check_and_update_config Feb 12, 2026
@MatthewBonanni MatthewBonanni marked this pull request as ready for review February 13, 2026 02:08
@pavanimajety pavanimajety added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 13, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@mergify
Copy link

mergify bot commented Feb 16, 2026

Hi @MatthewBonanni, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Copy link
Collaborator

@pavanimajety pavanimajety left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, @MatthewBonanni!
Looks good to me, pending clean CI and minor nits!

@github-project-automation github-project-automation bot moved this to Ready in NVIDIA Feb 16, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
@mergify
Copy link

mergify bot commented Feb 17, 2026

Hi @MatthewBonanni, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Copy link
Member

@mgoin mgoin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, LGTM!

@vllm-bot vllm-bot merged commit 7743152 into vllm-project:main Feb 18, 2026
49 of 59 checks passed
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Feb 18, 2026
@MatthewBonanni MatthewBonanni deleted the refactor_block_size branch February 18, 2026 01:08
@DarkLight1337
Copy link
Member

DarkLight1337 commented Feb 18, 2026

I think this PR broke model initialization and V1 Others tests, checking in https://buildkite.com/vllm/ci/builds/52009/steps/canvas

The build for the previous PR for comparison: https://buildkite.com/vllm/ci/builds/52001/steps/canvas

@njhill
Copy link
Member

njhill commented Feb 18, 2026

Do we understand how this was merged with so many breakages? Have we given up on the rule that we don't force merge without certainty that CI failures are unrelated (even if it seems obvious that they are)? Has @vllm-bot gone rogue?

@mgoin
Copy link
Member

mgoin commented Feb 18, 2026

I went rogue, sorry. I looked at the failures but thought they were not related

@MatthewBonanni
Copy link
Collaborator Author

MatthewBonanni commented Feb 18, 2026

My fault too, sorry. I asked @mgoin yesterday morning whether we should consider a force merge on this. I also assumed failures were unrelated.

wzhao18 pushed a commit to wzhao18/vllm that referenced this pull request Feb 18, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
jasonozuzu-cohere pushed a commit to jasonozuzu-cohere/vllm that referenced this pull request Feb 18, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Jason Ozuzu <jasonozuzu@cohere.com>
LucasWilkinson added a commit that referenced this pull request Feb 20, 2026
ZJY0516 pushed a commit to ZJY0516/vllm that referenced this pull request Feb 23, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
askliar pushed a commit to askliar/vllm that referenced this pull request Mar 9, 2026
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Andrii Skliar <askliar@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

nvidia ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

7 participants