Revert "[BugFix] Correct max memory usage for multiple KV-cache groups" (#36030) #37584

zhewenl wants to merge 1 commit into vllm-project:main from
Conversation
Revert "[BugFix] Correct max memory usage for multiple KV-cache groups (vllm-project#36030)"

This reverts commit 45f526d.
Code Review
The pull request successfully reverts the changes introduced in PR #36030, addressing the CI failures related to KV cache memory calculation. While the revert restores a previously stable state, the logic for calculating blocks_needed in _max_memory_usage_bytes_from_groups in vllm/v1/core/kv_cache_utils.py appears to reintroduce a potential bug. It currently only considers the memory usage of the first KV cache group, which could lead to insufficient memory allocation if other groups have different or higher memory requirements. This is a critical issue that should be addressed to prevent future runtime errors, especially in hybrid models with diverse KV cache specifications.
```python
    for group in kv_cache_groups
)
any_spec = kv_cache_groups[0].kv_cache_spec
blocks_needed = cdiv(any_spec.max_memory_usage_bytes(vllm_config), page_size)
```
The reverted logic for calculating blocks_needed uses any_spec = kv_cache_groups[0].kv_cache_spec and then cdiv(any_spec.max_memory_usage_bytes(vllm_config), page_size). This assumes that the max_memory_usage_bytes is uniform across all kv_cache_spec objects within kv_cache_groups for the "General case" (i.e., when not UniformTypeKVCacheSpecs).
However, different KVCacheSpec types (e.g., FullAttentionSpec vs. SlidingWindowSpec) can have different max_memory_usage_bytes calculations, even if their page_size_bytes are unified. By only considering kv_cache_groups[0].kv_cache_spec, this calculation might underestimate the total blocks needed if subsequent groups have higher memory requirements. This could lead to insufficient memory allocation and runtime failures.
To correctly account for all groups, blocks_needed should be derived from the maximum memory usage among all individual kv_cache_spec objects in the groups.
```diff
- blocks_needed = cdiv(any_spec.max_memory_usage_bytes(vllm_config), page_size)
+ blocks_needed = cdiv(max(group.kv_cache_spec.max_memory_usage_bytes(vllm_config) for group in kv_cache_groups), page_size)
```
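To make the underestimation concrete, here is a minimal, self-contained sketch of the two calculations. The spec classes, byte counts, and page size below are invented for illustration and are not vLLM's actual `KVCacheSpec` API; only `cdiv` mirrors the ceiling-division helper used in the snippet above.

```python
# Hypothetical illustration of the bug: all names and numbers here are
# made up, not taken from vLLM's real KVCacheSpec / KVCacheGroup classes.
from dataclasses import dataclass


def cdiv(a: int, b: int) -> int:
    """Ceiling division, matching the cdiv helper used in the snippet."""
    return -(-a // b)


@dataclass
class FakeSpec:
    max_bytes: int

    def max_memory_usage_bytes(self, vllm_config=None) -> int:
        return self.max_bytes


@dataclass
class FakeGroup:
    kv_cache_spec: FakeSpec


page_size = 2 * 1024 * 1024  # assumed 2 MiB page size

# Group 0 (e.g. a sliding-window layer group) needs less memory than
# group 1 (e.g. a full-attention layer group).
kv_cache_groups = [
    FakeGroup(FakeSpec(10 * page_size)),
    FakeGroup(FakeSpec(25 * page_size)),
]

# Reverted logic: looks only at the first group.
first_only = cdiv(
    kv_cache_groups[0].kv_cache_spec.max_memory_usage_bytes(None), page_size
)

# Suggested logic: take the maximum over all groups.
max_over_groups = cdiv(
    max(g.kv_cache_spec.max_memory_usage_bytes(None) for g in kv_cache_groups),
    page_size,
)

print(first_only, max_over_groups)  # 10 25 -- first_only underestimates
```

With these (invented) numbers, the first-group-only calculation reserves 10 blocks while the second group actually needs 25, which is exactly the kind of shortfall that would surface as an insufficient-memory failure at runtime.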
Revert of PR #36030
This reverts #36030 (merge commit 45f526d).
Reason: CI failure in Distributed Torchrun + Examples (4 GPUs): the KV cache memory calculation change caused insufficient KV cache memory (0.44 GiB available vs 0.5 GiB needed) for microsoft/Phi-mini-MoE-instruct on L4 GPUs, breaking the test_torchrun_example_moe.py test with TP_SIZE=2 DP_SIZE=2 ENABLE_EP=1.

Linked build: https://buildkite.com/vllm/ci/builds/56956
New failures linked: 1
Auto-generated by CI failure analyzer.