[Bugfix] Fix block_size mismatch for MLA models after #34818 (#34970)
mgoin wants to merge 2 commits into vllm-project:main
Signed-off-by: mgoin <mgoin64@gmail.com>
Code Review
This pull request correctly addresses a bug where InputBatch could have a mismatched block_size compared to the KV cache, which is particularly relevant for MLA models. The issue arose because the check for re-initializing InputBatch used a block_size value that could be updated after InputBatch's initial creation. The fix introduces new state variables to track the block_size used during InputBatch's instantiation and uses these for the re-initialization check. This ensures InputBatch is correctly updated when the block_size changes. The addition of assertions to verify block size consistency at the end of the process is a good defensive programming practice that will help prevent similar regressions. The changes are logical, well-targeted, and I approve them.
This pull request has merge conflicts that must be resolved before it can be |
Purpose
#34818 deferred block_size selection to after model loading, but `InputBatch` is created before model loading with a placeholder `block_size=16`. When `update_block_size_for_backend` later sets the real block_size (e.g. 32 for `FLASHINFER_MLA`), `may_reinitialize_input_batch` compared the kv_cache sizes against the already-updated `cache_config.block_size` (both 32) and skipped re-initialization. The `InputBatch` kept using `block_size=16` for slot mappings while the KV cache used `block_size=32`, producing garbage output.

The fix is to track what block sizes `InputBatch` was actually created with and compare against those. Added post-condition asserts so this class of bug fails loudly at startup instead of silently producing wrong results.

Fixes #34969
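For readers unfamiliar with the code path, here is a minimal, self-contained sketch of the failure mode and the fix. It is a toy model, not the vLLM implementation: `ToyRunner`, `startup`, `slot_for`, `assert_block_sizes_match`, and the `track_created_sizes` flag are hypothetical names invented for illustration, and the slot formula is an assumed, simplified paged-KV layout.

```python
from dataclasses import dataclass


@dataclass
class CacheConfig:
    block_size: int = 16  # placeholder used before the backend is chosen


@dataclass
class InputBatch:
    block_sizes: list[int]  # block sizes used to build slot mappings


def slot_for(position: int, block_table: list[int], block_size: int) -> int:
    """Simplified (assumed) paged-KV slot index for a token position."""
    block_id = block_table[position // block_size]
    return block_id * block_size + position % block_size


def assert_block_sizes_match(batch: InputBatch, kv_sizes: list[int]) -> None:
    """Post-condition: fail loudly at startup if the sizes disagree."""
    assert batch.block_sizes == kv_sizes, (
        f"InputBatch block sizes {batch.block_sizes} != KV cache {kv_sizes}"
    )


class ToyRunner:
    """Hypothetical stand-in for the model runner; not the vLLM class."""

    def __init__(self, cache_config: CacheConfig, track_created_sizes: bool):
        self.cache_config = cache_config
        self.track_created_sizes = track_created_sizes
        self.input_batch = InputBatch([cache_config.block_size])
        # The fix: remember what InputBatch was actually created with.
        self.input_batch_block_sizes = [cache_config.block_size]

    def may_reinitialize_input_batch(self, kv_sizes: list[int]) -> None:
        if self.track_created_sizes:
            baseline = self.input_batch_block_sizes
        else:
            # Buggy path: the config was already updated, so in this
            # scenario the comparison below sees no difference.
            baseline = [self.cache_config.block_size]
        if kv_sizes != baseline:
            self.input_batch = InputBatch(list(kv_sizes))
            self.input_batch_block_sizes = list(kv_sizes)


def startup(track_created_sizes: bool) -> InputBatch:
    config = CacheConfig()
    runner = ToyRunner(config, track_created_sizes)
    config.block_size = 32  # backend picks the real size after model load
    runner.may_reinitialize_input_batch([config.block_size])
    return runner.input_batch
```

In this toy, `startup(False)` leaves the batch at block size 16 while the cache uses 32, and `startup(True)` rebuilds it at 32. With a block table of `[5, 9]`, position 20 maps to slot 148 under block size 16 but slot 180 under block size 32, which is the kind of disagreement that would corrupt KV reads and writes. Calling `assert_block_sizes_match(startup(False), [32])` raises an `AssertionError`, mirroring the intent of the post-condition asserts.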
Test Plan
Test Result
Manually ran the previously failing tests on B200; they now pass.