
[Bugfix] Fix block_size mismatch for MLA models after #34818 (#34970)

Closed
mgoin wants to merge 2 commits into vllm-project:main from
neuralmagic:fix-block-size-mismatch

Conversation


@mgoin mgoin commented Feb 20, 2026

Purpose

#34818 deferred block_size selection until after model loading, but InputBatch is created before model loading with a placeholder block_size=16.
When update_block_size_for_backend later set the real block_size (e.g. 32 for FLASHINFER_MLA), may_reinitialize_input_batch compared the KV cache block sizes against the already-updated cache_config.block_size (both 32) and skipped re-initialization.
The InputBatch therefore kept using block_size=16 for slot mappings while the KV cache used block_size=32, producing garbage output.

The fix is to track the block sizes InputBatch was actually created with and compare against those. Post-condition asserts were added so this class of bug fails loudly at startup instead of silently producing wrong results.
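A minimal sketch of the fix pattern, for illustration only. The class shapes and helper signatures below are simplified stand-ins, not vLLM's actual definitions; only the names InputBatch, may_reinitialize_input_batch, and update_block_size_for_backend come from the PR description.

```python
class CacheConfig:
    def __init__(self, block_size: int):
        self.block_size = block_size


class InputBatch:
    def __init__(self, block_sizes: list[int]):
        # Record the block sizes this batch was actually built with,
        # instead of consulting the (mutable) cache_config later.
        self.block_sizes = list(block_sizes)


def may_reinitialize_input_batch(batch: InputBatch,
                                 kv_cache_block_sizes: list[int],
                                 cache_config: CacheConfig) -> InputBatch:
    # Buggy version: compared kv_cache_block_sizes against
    # cache_config.block_size, which had already been updated to the new
    # value, so the check passed and re-initialization was skipped.
    # Fixed version: compare against the sizes the batch was created with.
    if kv_cache_block_sizes != batch.block_sizes:
        batch = InputBatch(kv_cache_block_sizes)
    # Post-condition assert: any remaining mismatch fails loudly at
    # startup rather than silently corrupting slot mappings.
    assert batch.block_sizes == kv_cache_block_sizes
    return batch


# Placeholder block_size=16 at creation; the backend later selects 32
# (the update_block_size_for_backend step from the description).
cfg = CacheConfig(block_size=16)
batch = InputBatch([cfg.block_size])
cfg.block_size = 32
batch = may_reinitialize_input_batch(batch, [32], cfg)
print(batch.block_sizes)  # [32]
```

The key design point is that the comparison baseline must be immutable state captured at InputBatch creation time; comparing two values that are updated together can never detect the drift.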

Fixes #34969

Test Plan

Test Result

Manually ran the previously failing tests on B200; they now pass.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: mgoin <mgoin64@gmail.com>
@mgoin mgoin changed the title Fix block_size mismatch for MLA models after #34818 [Bugfix] Fix block_size mismatch for MLA models after #34818 Feb 20, 2026
@mergify mergify bot added v1 bug Something isn't working labels Feb 20, 2026
Signed-off-by: mgoin <mgoin64@gmail.com>
@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 20, 2026
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly addresses a bug where InputBatch could have a mismatched block_size compared to the KV cache, which is particularly relevant for MLA models. The issue arose because the check for re-initializing InputBatch used a block_size value that could be updated after InputBatch's initial creation. The fix introduces new state variables to track the block_size used during InputBatch's instantiation and uses these for the re-initialization check. This ensures InputBatch is correctly updated when the block_size changes. The addition of assertions to verify block size consistency at the end of the process is a good defensive programming practice that will help prevent similar regressions. The changes are logical, well-targeted, and I approve them.

@mergify

mergify bot commented Feb 21, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @mgoin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


Labels

bug (Something isn't working), needs-rebase, ready (ONLY add when PR is ready to merge/full CI is needed), v1


Development

Successfully merging this pull request may close these issues.

[CI Failure]: LM Eval Small Models (B200) - DeepSeek and Qwen3 Next
