
[Bugfix] Fix block_size mismatch for MLA models after #34818 (#34970)

Closed
mgoin wants to merge 2 commits into vllm-project:main from
neuralmagic:fix-block-size-mismatch

Conversation


@mgoin mgoin commented Feb 20, 2026

Purpose

#34818 deferred block_size selection until after model loading, but InputBatch is created before model loading with a placeholder block_size=16.
When update_block_size_for_backend later set the real block_size (e.g. 32 for FLASHINFER_MLA), may_reinitialize_input_batch compared the KV cache block sizes against the already-updated cache_config.block_size (both 32) and skipped re-initialization.
The InputBatch therefore kept using block_size=16 for slot mappings while the KV cache used block_size=32, producing garbage output.

The fix is to track the block sizes InputBatch was actually created with and compare against those. Post-condition asserts were added so this class of bug fails loudly at startup instead of silently producing wrong results.
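A minimal sketch of the fix pattern, for illustration only. The class shapes and helper signatures below are simplified stand-ins, not vLLM's actual definitions; only the names InputBatch, may_reinitialize_input_batch, and update_block_size_for_backend come from the PR description.

```python
class CacheConfig:
    def __init__(self, block_size: int):
        self.block_size = block_size


class InputBatch:
    def __init__(self, block_sizes: list[int]):
        # Record the block sizes this batch was actually built with,
        # instead of consulting the (mutable) cache_config later.
        self.block_sizes = list(block_sizes)


def may_reinitialize_input_batch(batch: InputBatch,
                                 kv_cache_block_sizes: list[int],
                                 cache_config: CacheConfig) -> InputBatch:
    # Buggy version: compared kv_cache_block_sizes against
    # cache_config.block_size, which had already been updated to the new
    # value, so the check passed and re-initialization was skipped.
    # Fixed version: compare against the sizes the batch was created with.
    if kv_cache_block_sizes != batch.block_sizes:
        batch = InputBatch(kv_cache_block_sizes)
    # Post-condition assert: any remaining mismatch fails loudly at
    # startup rather than silently corrupting slot mappings.
    assert batch.block_sizes == kv_cache_block_sizes
    return batch


# Placeholder block_size=16 at creation; the backend later selects 32
# (the update_block_size_for_backend step from the description).
cfg = CacheConfig(block_size=16)
batch = InputBatch([cfg.block_size])
cfg.block_size = 32
batch = may_reinitialize_input_batch(batch, [32], cfg)
print(batch.block_sizes)  # [32]
```

The key design point is that the comparison baseline must be immutable state captured at InputBatch creation time; comparing two values that are updated together can never detect the drift.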

Fixes #34969

Test Plan

Test Result

Manually ran the previously failing tests on B200; they now pass.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: mgoin <mgoin64@gmail.com>
@mgoin mgoin changed the title Fix block_size mismatch for MLA models after #34818 [Bugfix] Fix block_size mismatch for MLA models after #34818 Feb 20, 2026
@mergify mergify bot added v1 bug Something isn't working labels Feb 20, 2026
Signed-off-by: mgoin <mgoin64@gmail.com>
@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 20, 2026
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly addresses a bug where InputBatch could have a mismatched block_size compared to the KV cache, which is particularly relevant for MLA models. The issue arose because the check for re-initializing InputBatch used a block_size value that could be updated after InputBatch's initial creation. The fix introduces new state variables to track the block_size used during InputBatch's instantiation and uses these for the re-initialization check. This ensures InputBatch is correctly updated when the block_size changes. The addition of assertions to verify block size consistency at the end of the process is a good defensive programming practice that will help prevent similar regressions. The changes are logical, well-targeted, and I approve them.

@mergify

mergify bot commented Feb 21, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @mgoin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


Labels

bug (Something isn't working), needs-rebase, ready (ONLY add when PR is ready to merge/full CI is needed), v1


Development

Successfully merging this pull request may close these issues.

[CI Failure]: LM Eval Small Models (B200) - DeepSeek and Qwen3 Next
