[KVConnector] Allow connector to protect GPU blocks from eviction #33353
orozery wants to merge 1 commit into vllm-project:main
Conversation
This commit introduces a new connector API allowing the scheduler-side connector to increase GPU block ref-counts, in order to prevent those blocks from being evicted. In particular, this is necessary for async offloading of sliding-window KV data, which is otherwise freed automatically by the KV cache manager as the window progresses. Signed-off-by: Or Ozeri <oro@il.ibm.com>
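The ref-count-based protection described above can be sketched as follows. This is a minimal illustration with hypothetical names (Block, BlockPool, lock, unlock), not the actual vLLM implementation: the connector pins blocks by bumping their ref-counts, and the eviction path only considers blocks with no outstanding references.

```python
# Hypothetical sketch of ref-count-based block protection; names and
# structure are assumptions for illustration, not vLLM's real classes.
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    ref_cnt: int = 0


class BlockPool:
    def __init__(self, num_blocks: int) -> None:
        self.blocks = {i: Block(i) for i in range(num_blocks)}

    def lock(self, block_ids: list[int]) -> None:
        # Connector-requested lock: bump the ref-count so the block stays
        # alive even after the KV cache manager would otherwise free it
        # (e.g. as a sliding window progresses).
        for bid in block_ids:
            self.blocks[bid].ref_cnt += 1

    def unlock(self, block_ids: list[int]) -> None:
        # Drop the extra reference once async offloading completes.
        for bid in block_ids:
            assert self.blocks[bid].ref_cnt > 0
            self.blocks[bid].ref_cnt -= 1

    def evictable(self) -> list[int]:
        # Only blocks with zero references are eviction candidates.
        return [b.block_id for b in self.blocks.values() if b.ref_cnt == 0]
```

Under this sketch, a block locked for in-flight offloading never appears in the evictable set until the connector unlocks it.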
Code Review
This pull request introduces a valuable new API for KV connectors to lock and unlock GPU blocks, preventing them from being evicted. This is a crucial feature for scenarios like asynchronous offloading. The changes are well-structured, with clear API definitions in base.py and a solid implementation in the scheduler. The addition of has_work() to the scheduler interface is a logical extension to support this new functionality. The tests are comprehensive and cover the new locking/unlocking mechanism thoroughly. I have one suggestion to improve the robustness of the MultiConnector implementation.
def report_to_scheduler(self) -> KVConnectorSchedulerOutput | None:
    block_ids_to_lock: list[int] | None = None
    block_ids_to_unlock: list[int] | None = None
    for c in self._connectors:
        output = c.report_to_scheduler()
        if output is None:
            continue

        if output.block_ids_to_lock:
            if not block_ids_to_lock:
                block_ids_to_lock = output.block_ids_to_lock
            else:
                block_ids_to_lock.extend(output.block_ids_to_lock)

        if output.block_ids_to_unlock:
            if not block_ids_to_unlock:
                block_ids_to_unlock = output.block_ids_to_unlock
            else:
                block_ids_to_unlock.extend(output.block_ids_to_unlock)

    if not block_ids_to_lock and not block_ids_to_unlock:
        return None

    return KVConnectorSchedulerOutput(
        block_ids_to_lock=block_ids_to_lock,
        block_ids_to_unlock=block_ids_to_unlock,
    )
The current implementation of report_to_scheduler can introduce unintended side effects. The line if not block_ids_to_lock: block_ids_to_lock = output.block_ids_to_lock assigns the list from a sub-connector's output by reference. If a subsequent sub-connector also returns blocks to lock, block_ids_to_lock.extend(...) will mutate the list inside the first sub-connector's output object. This can lead to unexpected behavior if a sub-connector reuses its output object across calls.
A safer approach is to initialize local lists and always use extend to aggregate the block IDs. This avoids modifying child connector outputs and makes the code more robust and easier to reason about.
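The aliasing hazard can be shown with plain Python (this is an illustration with stand-in classes, not vLLM code): adopting a child's list by assignment and then calling extend() silently mutates the child's own output.

```python
# Stand-in output type mirroring only the field used in this example.
class ChildOutput:
    def __init__(self, ids):
        self.block_ids_to_lock = ids


first = ChildOutput([1, 2])
second = ChildOutput([3])

# Buggy aggregation: the first non-empty list is adopted by reference.
agg = None
for out in (first, second):
    if out.block_ids_to_lock:
        if not agg:
            agg = out.block_ids_to_lock  # aliases first.block_ids_to_lock
        else:
            agg.extend(out.block_ids_to_lock)  # mutates first's list too!

# The first child's output was silently modified:
assert first.block_ids_to_lock == [1, 2, 3]

# Safe aggregation: always extend a locally owned list.
first_safe = ChildOutput([1, 2])
agg_safe: list[int] = []
for out in (first_safe, second):
    agg_safe.extend(out.block_ids_to_lock)

# Child outputs are untouched.
assert first_safe.block_ids_to_lock == [1, 2]
```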
def report_to_scheduler(self) -> KVConnectorSchedulerOutput | None:
    all_block_ids_to_lock: list[int] = []
    all_block_ids_to_unlock: list[int] = []
    for c in self._connectors:
        output = c.report_to_scheduler()
        if output is None:
            continue
        if output.block_ids_to_lock:
            all_block_ids_to_lock.extend(output.block_ids_to_lock)
        if output.block_ids_to_unlock:
            all_block_ids_to_unlock.extend(output.block_ids_to_unlock)
    if not all_block_ids_to_lock and not all_block_ids_to_unlock:
        return None
    return KVConnectorSchedulerOutput(
        block_ids_to_lock=all_block_ids_to_lock or None,
        block_ids_to_unlock=all_block_ids_to_unlock or None,
    )
This pull request has merge conflicts that must be resolved before it can be merged.
Moving on to another direction (for now): #34805