[Bugfix][Model Runner v2] Fix MRV2 KV cache kernel block sizing. by chfeng-cs · Pull Request #42872 · vllm-project/vllm

chfeng-cs · 2026-05-17T08:00:55Z

Purpose

Fix Model Runner V2 KV cache handling when the backend kernel block size differs from the KV manager block size.

For FlashInfer with --block-size 128, MRV2 was still constructing KV cache/block table state using the logical block size, while NIXL expected the physical/kernel block view. This caused NIXL KV cache registration to fail during startup.
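
To make the logical-versus-kernel distinction concrete, here is a minimal sketch of how one KV manager block could be viewed as several kernel blocks. The helper name `expand_to_kernel_blocks` and the kernel block size of 64 are assumptions for illustration only; neither is taken from the vLLM source or from this PR's diff.

```python
def expand_to_kernel_blocks(logical_block_ids, block_size, kernel_block_size):
    """Map KV-manager (logical) block IDs to kernel (physical) block IDs.

    Hypothetical helper: assumes block_size is a whole multiple of
    kernel_block_size, so each logical block splits into `ratio` kernel blocks.
    """
    if block_size % kernel_block_size != 0:
        raise ValueError("block_size must be a multiple of kernel_block_size")
    ratio = block_size // kernel_block_size
    kernel_ids = []
    for block_id in logical_block_ids:
        start = block_id * ratio
        # Logical block `block_id` covers kernel blocks [start, start + ratio).
        kernel_ids.extend(range(start, start + ratio))
    return kernel_ids


# --block-size 128 with an assumed kernel block size of 64
print(expand_to_kernel_blocks([0, 3], 128, 64))  # [0, 1, 6, 7]
```

If the block table is built from the logical IDs `[0, 3]` while another component indexes memory in kernel-sized units, the two sides disagree about both the number of blocks and their offsets, which is the kind of mismatch described above.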

Closes #42846

Test Plan

Added a focused MRV2 block table regression test.
Verified the Qwen3 + FlashInfer + NIXL single-GPU startup repro.

Test Result

> pytest tests/v1/worker/test_gpu_model_runner_v2.py -q
.                                                                        [100%]
1 passed in 0.71s


gemini-code-assist

Code Review

This pull request introduces kernel block expansion for the vLLM V1 worker, allowing the KV cache to be managed with larger logical blocks while being processed by kernels using smaller physical block sizes. It includes changes to BlockTables to handle the mapping between these block types and updates the attention backend initialization to propagate kernel_block_sizes. However, the current implementation has critical issues regarding the alignment of kernel_block_sizes with KV cache groups, as certain specifications are skipped during preparation, leading to incorrect indexing or missing caches. Additionally, there is a potential shape mismatch when storage_block_size differs from the logical block size, which could result in out-of-bounds memory access.

gemini-code-assist · 2026-05-17T08:03:06Z

+    kernel_block_sizes = prepare_kernel_block_sizes(kv_cache_config, attn_groups)
+    for kv_cache_group_id, groups in enumerate(attn_groups):
+        kernel_block_size = (
+            kernel_block_sizes[kv_cache_group_id]
+            if kv_cache_group_id < len(kernel_block_sizes)
+            else None
+        )


The indexing of kernel_block_sizes by kv_cache_group_id is incorrect because prepare_kernel_block_sizes (in vllm/v1/worker/utils.py) skips EncoderOnlyAttentionSpec groups. This results in a length mismatch and misaligned mapping between groups and their kernel block sizes. If an encoder-only group exists, subsequent groups will receive the wrong block size or None, causing them to fall back to logical block sizes and defeating the purpose of this fix. prepare_kernel_block_sizes should be updated to return a list of the same length as kv_cache_groups (e.g., by using None for skipped groups).

prepare_kernel_block_sizes() currently returns a compact list because EncoderOnlyAttentionSpec does not allocate KV cache. In the current KV cache config construction, encoder-only groups are appended after regular KV cache groups, so the compact list remains aligned for all non-encoder-only groups, and the trailing encoder-only group is skipped by the existing guard.

This PR keeps that existing behavior and focuses on the Qwen3 + FlashInfer + NIXL MRV2 regression. If MRV2 later allows encoder-only groups before regular attention groups, we should revisit this indexing contract separately.
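
The alignment argument in this thread can be illustrated with a small sketch. Everything in it is hypothetical: `Group`, `compact_sizes`, and `lookup` are stand-ins written for this example, not the real `prepare_kernel_block_sizes()` or the vLLM spec classes. It only shows why a compact list stays aligned when the skipped groups are trailing, and why it would misalign if a skipped group came first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Group:
    name: str
    # None models an encoder-only group that allocates no KV cache.
    kernel_block_size: Optional[int]


def compact_sizes(groups):
    """Return kernel block sizes with encoder-only groups dropped."""
    return [g.kernel_block_size for g in groups if g.kernel_block_size is not None]


def lookup(groups):
    """Index the compact list by group position, guarding out-of-range indices."""
    sizes = compact_sizes(groups)
    return {
        g.name: sizes[i] if i < len(sizes) else None
        for i, g in enumerate(groups)
    }


trailing = [Group("attn", 64), Group("encoder_only", None)]
leading = [Group("encoder_only", None), Group("attn", 64)]

print(lookup(trailing))  # {'attn': 64, 'encoder_only': None}
print(lookup(leading))   # {'encoder_only': 64, 'attn': None}
```

With the encoder-only group last, positional indexing gives every regular group its own size. With it first, the regular group silently receives `None`, which is the case the reviewer is warning about and the case the reply defers to a separate change.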

gemini-code-assist · 2026-05-17T08:03:06Z

+        if kv_cache_group_id >= len(kernel_block_sizes):
+            continue


In _reshape_kv_cache, the loop incorrectly assumes that kernel_block_sizes is 1-to-1 with kv_cache_config.kv_cache_groups. Because prepare_kernel_block_sizes skips certain specs, len(kernel_block_sizes) may be less than the number of groups. The continue on line 173 will cause the last groups in the configuration to be skipped entirely, leading to missing KV caches for those layers.

Same underlying concern as above. Under the current KV cache config construction, the groups skipped by this guard are trailing encoder-only groups, which do not allocate KV cache.

gemini-code-assist · 2026-05-17T08:03:06Z

+                num_blocks_per_kv_block = (
+                    kv_cache_spec.block_size // kernel_block_size
+                )
+                kernel_num_blocks = num_blocks * num_blocks_per_kv_block
+                if kv_cache_spec.storage_block_size != kv_cache_spec.block_size:
+                    shape_block_size = kv_cache_spec.storage_block_size
+                else:
+                    shape_block_size = kernel_block_size


There is a mismatch between kernel_num_blocks and shape_block_size when storage_block_size is used. kernel_num_blocks is currently calculated based on kernel_block_size, but shape_block_size might be set to storage_block_size. If these two differ (e.g., in MLA where storage_block_size is 1 but kernel_block_size might be larger), the resulting KV cache shape will be incorrect and likely too small, leading to out-of-bounds access. kernel_num_blocks should be calculated using the same block size used for the shape.

Suggested change

-                num_blocks_per_kv_block = (
-                    kv_cache_spec.block_size // kernel_block_size
-                )
-                kernel_num_blocks = num_blocks * num_blocks_per_kv_block
-                if kv_cache_spec.storage_block_size != kv_cache_spec.block_size:
-                    shape_block_size = kv_cache_spec.storage_block_size
-                else:
-                    shape_block_size = kernel_block_size
+                if kv_cache_spec.storage_block_size != kv_cache_spec.block_size:
+                    shape_block_size = kv_cache_spec.storage_block_size
+                else:
+                    shape_block_size = kernel_block_size
+                num_blocks_per_kv_block = (
+                    kv_cache_spec.block_size // shape_block_size
+                )
+                kernel_num_blocks = num_blocks * num_blocks_per_kv_block
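
The arithmetic behind this review comment can be checked with a short sketch. The two functions below copy the ordering of the original diff and of the suggested change. The numeric inputs are made up, and treating `kernel_num_blocks * shape_block_size` as the number of slots the reshaped cache covers is an assumption of this example, not something confirmed by the vLLM source. The sketch shows only that the two orderings disagree when `storage_block_size` differs from `block_size`; it does not establish which one matches the real memory layout.

```python
def cache_slots_original(num_blocks, block_size, kernel_block_size, storage_block_size):
    """Slots covered when the block count is derived from kernel_block_size."""
    num_blocks_per_kv_block = block_size // kernel_block_size
    kernel_num_blocks = num_blocks * num_blocks_per_kv_block
    if storage_block_size != block_size:
        shape_block_size = storage_block_size
    else:
        shape_block_size = kernel_block_size
    return kernel_num_blocks * shape_block_size


def cache_slots_suggested(num_blocks, block_size, kernel_block_size, storage_block_size):
    """Slots covered when the block count is derived from shape_block_size."""
    if storage_block_size != block_size:
        shape_block_size = storage_block_size
    else:
        shape_block_size = kernel_block_size
    num_blocks_per_kv_block = block_size // shape_block_size
    kernel_num_blocks = num_blocks * num_blocks_per_kv_block
    return kernel_num_blocks * shape_block_size


# Hypothetical values: 10 logical blocks of 128, kernel block size 64.
print(cache_slots_original(10, 128, 64, 128))   # 1280
print(cache_slots_suggested(10, 128, 64, 128))  # 1280
print(cache_slots_original(10, 128, 64, 1))     # 20
print(cache_slots_suggested(10, 128, 64, 1))    # 1280
```

When `storage_block_size` equals `block_size`, both orderings cover `num_blocks * block_size` slots. When it does not, only the suggested ordering keeps that product unchanged under the stated assumption.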

mergify · 2026-05-17T08:09:29Z

Hi @chfeng-cs, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10


Use backend kernel block sizes when initializing Model Runner V2 attention metadata, KV cache views, and block tables. This keeps FlashInfer's physical block view consistent with NIXL registration when the KV manager block size is larger than the kernel block size.

Add a focused regression test for MRV2 block table logical-to-kernel block expansion.

Signed-off-by: fengchuanheng <fengchuanheng@sjtu.edu.cn>

chfeng-cs · 2026-05-18T09:33:12Z

Closing in favor of #42766 and #42955. Thanks for the guidance @NickLucche.

njhill · 2026-05-19T14:26:53Z

Thanks @chfeng-cs

chfeng-cs requested review from WoosukKwon and njhill as code owners May 17, 2026 08:00

mergify Bot added the v1 and bug labels May 17, 2026

gemini-code-assist Bot reviewed May 17, 2026


This was referenced May 17, 2026

[Bug][CI] NIXL + FlashInfer fails with Qwen3 MRV2 and --block-size 128 #42846

Closed

[KV Transfer] Enable HMA by default for connectors that support it #41847

Merged

chfeng-cs force-pushed the fix-mrv2-flashinfer-kernel-block-size branch from eb7fe50 to 931f274 Compare May 17, 2026 13:42

chfeng-cs force-pushed the fix-mrv2-flashinfer-kernel-block-size branch from 931f274 to eb7fe50 Compare May 17, 2026 15:50

chfeng-cs mentioned this pull request May 18, 2026

[Model Runner v2] fix pd accuracy #42888

Closed


chfeng-cs force-pushed the fix-mrv2-flashinfer-kernel-block-size branch from eb7fe50 to 1f2b399 Compare May 18, 2026 05:16

chfeng-cs changed the title ~~[Bugfix][CI] Fix MRV2 KV cache kernel block sizing.~~ [Bugfix][[Model Runner v2]] Fix MRV2 KV cache kernel block sizing. May 18, 2026

chfeng-cs changed the title ~~[Bugfix][[Model Runner v2]] Fix MRV2 KV cache kernel block sizing.~~ [Bugfix][Model Runner v2] Fix MRV2 KV cache kernel block sizing. May 18, 2026

chfeng-cs force-pushed the fix-mrv2-flashinfer-kernel-block-size branch from 1f2b399 to 25cf3a2 Compare May 18, 2026 05:50

chfeng-cs closed this May 18, 2026

njhill added the v2 label May 20, 2026

chfeng-cs wants to merge 1 commit into vllm-project:main from chfeng-cs:fix-mrv2-flashinfer-kernel-block-size
