OffloadingConnector: Fix GPU block tracking bug #25856
Conversation
Code Review
This pull request addresses a critical bug in the OffloadingConnector that could crash the engine. When preparing K/V cache blocks for offloading failed for one request, the connector stopped processing the remaining requests in the same batch, leaving GPU block tracking in an inconsistent state and eventually raising an IndexError. The fix replaces a break with a continue statement, so that even if offloading cannot be prepared for one request, the other requests in the batch are still processed correctly. The change is small, targeted, and effectively resolves the described crash. The logic is sound and I see no further issues.
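A minimal, self-contained sketch of the behavior the fix restores; all names below (TrackingConnector, prepare_store, process_batch, request_block_ids) are illustrative stand-ins, not the actual identifiers in offloading_connector.py:

```python
# Hypothetical sketch of the break -> continue fix described above.
# Names are illustrative stand-ins, not the actual vLLM identifiers.
class TrackingConnector:
    def __init__(self):
        self.request_block_ids: dict[str, list[int]] = {}

    def prepare_store(self, req_id: str, block_ids: list[int]) -> bool:
        # Pretend the offloading medium has no room for one request.
        return req_id != "req-1"

    def process_batch(self, scheduler_output: dict[str, list[int]]) -> None:
        for req_id, new_block_ids in scheduler_output.items():
            # Always record the new GPU block IDs, even when offloading
            # cannot be prepared for this request.
            self.request_block_ids.setdefault(req_id, []).extend(new_block_ids)
            if not self.prepare_store(req_id, new_block_ids):
                # Before the fix this was a `break`, which skipped the
                # remaining requests and left their GPU block tracking
                # stale; `continue` only skips offloading for this request.
                continue
            # ... enqueue the actual offload/store work here ...


connector = TrackingConnector()
connector.process_batch({"req-0": [0, 1], "req-1": [2], "req-2": [3, 4]})
# With `continue`, req-2's blocks are still tracked even though req-1 failed.
assert connector.request_block_ids["req-2"] == [3, 4]
```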
vllm/distributed/kv_transfer/kv_connector/v1/offloading_connector.py
This commit fixes a bug in the offloading connector that could result in incorrect per-request GPU block tracking. It occurs when blocks cannot be allocated on the offloaded medium (prepare_store fails) and the scheduler output contains multiple requests, some of them with new GPU block IDs. Before this commit, the connector simply returned without processing the remaining requests and their GPU block IDs. Signed-off-by: Or Ozeri <[email protected]>
This PR fixes a bug in the offloading connector that may result in incorrect per-request GPU block tracking.
It occurs when blocks cannot be allocated on the offloaded medium (prepare_store fails) and the scheduler output contains multiple requests, some of them with new GPU block IDs. Before this PR, the connector simply returned without processing the remaining requests and their GPU block IDs.
This resulted in a crash of the engine core:
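The original traceback is not reproduced here; the snippet below is only a hypothetical illustration, with made-up values, of how stale per-request block tracking can surface as an IndexError:

```python
# Hypothetical illustration (made-up values): if a request's new GPU block
# IDs were never recorded, a later lookup that indexes into the tracked
# list runs past its end and raises IndexError, crashing the engine core.
tracked_block_ids = [0, 1]      # only the blocks recorded before the early return
num_scheduled_blocks = 4        # blocks the scheduler actually allocated

try:
    last_block = tracked_block_ids[num_scheduled_blocks - 1]
except IndexError as exc:
    print(f"stale GPU block tracking: {exc}")  # list index out of range
```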