[BugFix] kv_offloading: Fix bug in loading of partial cpu blocks by orozery · Pull Request #28951 · vllm-project/vllm

orozery · 2025-11-18T17:12:58Z

This PR fixes a bug that occurs when loading from the middle of a CPU block. This can happen if cpu_block_size > gpu_block_size and there is both a CPU hit and a GPU (prefix cache) hit, where the GPU hit ends in the middle of a CPU block. Before this commit, the code wrongly handled the opposite direction, storing to the middle of a CPU block. That case cannot occur, since the offloading connector always stores full CPU blocks.
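The scenario described above can be illustrated with a minimal sketch. It is not the code from this PR: the function name, its arguments, and the assumption that CPU block `i` holds the GPU-sized sub-blocks `i * factor` through `i * factor + factor - 1` are all hypothetical. It only shows the idea of skipping sub-blocks on the source (CPU) side when the GPU prefix hit ends partway through the first CPU block.

```python
def build_load_mapping(cpu_block_ids, gpu_block_ids, block_size_factor):
    """Pair CPU sub-blocks with destination GPU blocks for a CPU -> GPU load.

    All names are hypothetical. block_size_factor is assumed to be
    cpu_block_size // gpu_block_size.
    """
    # Expand every CPU block into its GPU-sized sub-blocks (assumed layout).
    src_sub_blocks = [
        cpu_id * block_size_factor + offset
        for cpu_id in cpu_block_ids
        for offset in range(block_size_factor)
    ]
    # Sub-blocks already covered by the GPU prefix-cache hit sit at the
    # start of the first CPU block, so they are skipped on the source side.
    skip = len(src_sub_blocks) - len(gpu_block_ids)
    if not 0 <= skip < block_size_factor:
        raise ValueError("GPU blocks must cover all but part of the first CPU block")
    return list(zip(src_sub_blocks[skip:], gpu_block_ids))


# Two CPU blocks of 4 sub-blocks each; the GPU hit already covers the
# first 2 sub-blocks of CPU block 7, so only 6 GPU blocks need loading.
mapping = build_load_mapping([7, 9], [100, 101, 102, 103, 104, 105], 4)
print(mapping)
```

In this example the load starts at sub-block 30, the third sub-block of CPU block 7, while every destination GPU block is written in full.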

gemini-code-assist

Code Review

This pull request addresses a bug in loading partial CPU blocks for KV cache offloading. The core logic is adjusted to correctly skip sub-blocks from the source (CPU) rather than the destination, and the test suite is updated to validate this scenario. However, this change introduces a critical issue where the block mapping array src_to_dst is allocated with an incorrect size. This can lead to reading uninitialized memory and subsequent incorrect data transfers. I have provided a specific code suggestion to rectify this allocation bug.
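The review does not reproduce the allocation it refers to, so the following is only a sketch of the kind of sizing problem it describes. It assumes a mapping table that is preallocated and then filled with one row per copied sub-block. `fill_preallocated` and the block IDs are hypothetical, and `None` stands in for what would be uninitialized memory in an array allocated without initialization.

```python
def fill_preallocated(num_rows, src_sub_blocks, dst_blocks):
    """Fill a preallocated (src, dst) table; unfilled rows stay None."""
    table = [None] * num_rows  # None models an uninitialized row
    for row, pair in enumerate(zip(src_sub_blocks, dst_blocks)):
        table[row] = pair
    return table


# Two CPU blocks with 4 sub-blocks each give 8 source sub-blocks, but 2 are
# skipped because the GPU prefix hit already covers them.
src_sub_blocks = [30, 31, 36, 37, 38, 39]
dst_blocks = [100, 101, 102, 103, 104, 105]

# Sized by the unskipped source count: the last 2 rows are never written.
oversized = fill_preallocated(8, src_sub_blocks, dst_blocks)
# Sized by the number of sub-blocks actually copied: every row is written.
exact = fill_preallocated(len(dst_blocks), src_sub_blocks, dst_blocks)
print(oversized.count(None), exact.count(None))
```

Under these assumptions, sizing the table by the number of destination blocks (equivalently, source sub-blocks minus the skipped ones) leaves no unwritten rows for the copy to consume.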

vllm/v1/kv_offload/worker/cpu_gpu.py

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.


vllm/v1/kv_offload/worker/cpu_gpu.py

This commit fixes a bug that occurs when loading from the middle of a CPU block. This can happen if cpu_block_size > gpu_block_size and there is both a CPU hit and a GPU (prefix cache) hit, where the GPU hit ends in the middle of a CPU block. Before this commit, the code wrongly handled the opposite direction, storing to the middle of a CPU block. That case cannot occur, since the offloading connector always stores full CPU blocks.

Signed-off-by: Or Ozeri <oro@il.ibm.com>

orozery · 2025-11-18T19:07:05Z

@ApostaC can you please have a look?
Thanks!

ApostaC

LGTM.

One quick question: is there any chance that the CPU block size is smaller than the GPU block size? In this case, we also need to skip dst (GPU) sub blocks when doing CPU -> GPU transfer.

orozery · 2025-11-19T04:16:12Z

> One quick question: is there any chance that the CPU block size is smaller than the GPU block size? In this case, we also need to skip dst (GPU) sub blocks when doing CPU -> GPU transfer.

That's impossible.
The offloading connector assumes that the offloaded block size is a multiple of the GPU block size.

See here:

vllm/vllm/v1/kv_offload/spec.py

Line 38 in 4c23690

assert self.offloaded_block_size % self.gpu_block_size == 0
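To see why this check rules out a CPU block smaller than a GPU block, here is a minimal sketch. `BlockSizes` and `block_size_factor` are hypothetical stand-ins, not the class in spec.py, and the sketch additionally assumes both sizes are positive. For positive sizes, a smaller offloaded size leaves a nonzero remainder, so the divisibility check rejects it. The sketch raises `ValueError` instead of using `assert` so the check still runs under `python -O`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockSizes:
    """Hypothetical stand-in for the size fields checked in spec.py."""

    gpu_block_size: int
    offloaded_block_size: int

    def __post_init__(self):
        # Assumption of this sketch: both sizes are positive.
        if self.gpu_block_size <= 0 or self.offloaded_block_size <= 0:
            raise ValueError("block sizes must be positive")
        # Same condition as the quoted assert, raised as an exception.
        if self.offloaded_block_size % self.gpu_block_size != 0:
            raise ValueError(
                "offloaded_block_size must be a multiple of gpu_block_size"
            )

    @property
    def block_size_factor(self) -> int:
        # Number of GPU-sized sub-blocks per offloaded (CPU) block; >= 1.
        return self.offloaded_block_size // self.gpu_block_size


sizes = BlockSizes(gpu_block_size=16, offloaded_block_size=64)
print(sizes.block_size_factor)
```

With these sizes the factor is 4. Constructing `BlockSizes(gpu_block_size=16, offloaded_block_size=8)` raises `ValueError`, because 8 % 16 is 8 rather than 0.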

…m-project#28951) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

…m-project#28951) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>

mergify bot added the v1 label Nov 18, 2025

gemini-code-assist bot reviewed Nov 18, 2025

vllm/v1/kv_offload/worker/cpu_gpu.py (outdated, resolved)

chatgpt-codex-connector bot reviewed Nov 18, 2025

vllm/v1/kv_offload/worker/cpu_gpu.py (outdated, resolved)

orozery force-pushed the cpu-offloading-partial-block-load-bugfix branch from b7ad215 to ee67691 Compare November 18, 2025 17:24

orozery force-pushed the cpu-offloading-partial-block-load-bugfix branch from ee67691 to b5a0482 Compare November 18, 2025 17:45

ApostaC approved these changes Nov 19, 2025


njhill changed the title ~~kv_offloading: Fix bug in loading of partial cpu blocks~~ [BugFix] kv_offloading: Fix bug in loading of partial cpu blocks Nov 19, 2025

njhill added the ready and bug labels Nov 19, 2025

Merge branch 'main' into cpu-offloading-partial-block-load-bugfix

62fc831

DarkLight1337 merged commit c0c2dd1 into vllm-project:main Nov 20, 2025
42 checks passed

devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

[BugFix] kv_offloading: Fix bug in loading of partial cpu blocks (vll…

3c18858

…m-project#28951) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request Dec 1, 2025

[BugFix] kv_offloading: Fix bug in loading of partial cpu blocks (vll…

4e07fb5

…m-project#28951) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

orozery mentioned this pull request Dec 2, 2025

[KVConnector] OffloadingConnector: Fix bug in handling of preemptions #29870

Merged

DarkLight1337 merged 2 commits into vllm-project:main from orozery:cpu-offloading-partial-block-load-bugfix
