Commit 00f1347

OffloadingConnector: Fix GPU block tracking bug
This commit fixes a bug in the offloading connector that could leave per-request GPU block tracking incorrect. The bug occurs when blocks cannot be allocated on the offloaded medium (prepare_store fails) and the scheduler output contains multiple requests, some of them with new GPU block IDs. Before this commit, the connector simply returned without processing the remaining requests or their GPU block IDs.

Signed-off-by: Or Ozeri <[email protected]>
1 parent 70fbdb2 commit 00f1347

1 file changed, +3 -2 lines changed

vllm/distributed/kv_transfer/kv_connector/v1/offloading_connector.py

Lines changed: 3 additions & 2 deletions
@@ -278,8 +278,9 @@ def _get_reqs_to_store(self, scheduler_output: SchedulerOutput):
                 req, start_idx=start_block_idx, end_idx=num_blocks)
             store_output = self.manager.prepare_store(new_block_hashes)
             if store_output is None:
-                logger.warning("Cannot store %s blocks", num_new_blocks)
-                break
+                logger.warning("Request %s: cannot store %s blocks", req_id,
+                               num_new_blocks)
+                continue
 
             self._next_stored_block_idx[req_id] = num_blocks
 
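The effect of replacing `break` with `continue` can be illustrated with a small standalone sketch. It is not the vLLM code: `ToyManager`, `track_blocks`, and the `stop_on_failure` flag are hypothetical names introduced only for illustration, and the sketch assumes that per-request bookkeeping happens inside the same loop, before the store attempt.

```python
from typing import Optional


class ToyManager:
    """Hypothetical stand-in for the offloading manager (not the vLLM class)."""

    def __init__(self, free_blocks: int) -> None:
        self.free_blocks = free_blocks

    def prepare_store(self, num_blocks: int) -> Optional[int]:
        # Return None when the offload medium cannot hold the blocks.
        if num_blocks > self.free_blocks:
            return None
        self.free_blocks -= num_blocks
        return num_blocks


def track_blocks(
    new_blocks_by_req: dict[str, list[int]],
    manager: ToyManager,
    stop_on_failure: bool,
) -> dict[str, list[int]]:
    """Record GPU block IDs per request, then try to store them."""
    tracked: dict[str, list[int]] = {}
    for req_id, block_ids in new_blocks_by_req.items():
        # Assumption: bookkeeping for this request happens before the store.
        tracked[req_id] = list(block_ids)
        if manager.prepare_store(len(block_ids)) is None:
            if stop_on_failure:
                break  # old behavior: later requests are never visited
            continue  # fixed behavior: skip only this request's store
    return tracked


step = {"req-a": [0, 1, 2, 3], "req-b": [4]}
old = track_blocks(step, ToyManager(free_blocks=2), stop_on_failure=True)
new = track_blocks(step, ToyManager(free_blocks=2), stop_on_failure=False)
print(sorted(old))  # ['req-a']
print(sorted(new))  # ['req-a', 'req-b']
```

With `break`, the failed store for `req-a` ends the loop, so the block IDs of `req-b` are never recorded. With `continue`, only the failing request's store is skipped and the remaining requests are still processed.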