[KVConnector] Allow connector to protect GPU blocks from eviction #33353

Closed
orozery wants to merge 1 commit into vllm-project:main from orozery:kv-connector-hold-gpu-blocks

Conversation

Collaborator

@orozery orozery commented Jan 29, 2026

This PR introduces a new connector API that allows the scheduler-side connector to increase GPU block ref-counts, preventing those blocks from being evicted.
In particular, this is necessary for async offloading of sliding-window KV data, which is otherwise automatically freed by the KV cache manager as the window progresses.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable new API for KV connectors to lock and unlock GPU blocks, preventing them from being evicted. This is a crucial feature for scenarios like asynchronous offloading. The changes are well-structured, with clear API definitions in base.py and a solid implementation in the scheduler. The addition of has_work() to the scheduler interface is a logical extension to support this new functionality. The tests are comprehensive and cover the new locking/unlocking mechanism thoroughly. I have one suggestion to improve the robustness of the MultiConnector implementation.

Comment on lines +336 to +362
def report_to_scheduler(self) -> KVConnectorSchedulerOutput | None:
block_ids_to_lock: list[int] | None = None
block_ids_to_unlock: list[int] | None = None
for c in self._connectors:
output = c.report_to_scheduler()
if output is None:
continue

if output.block_ids_to_lock:
if not block_ids_to_lock:
block_ids_to_lock = output.block_ids_to_lock
else:
block_ids_to_lock.extend(output.block_ids_to_lock)

if output.block_ids_to_unlock:
if not block_ids_to_unlock:
block_ids_to_unlock = output.block_ids_to_unlock
else:
block_ids_to_unlock.extend(output.block_ids_to_unlock)

if not block_ids_to_lock and not block_ids_to_unlock:
return None

return KVConnectorSchedulerOutput(
block_ids_to_lock=block_ids_to_lock,
block_ids_to_unlock=block_ids_to_unlock,
)
Contributor

Severity: high

The current implementation of `report_to_scheduler` has a potential for side effects. The line `if not block_ids_to_lock: block_ids_to_lock = output.block_ids_to_lock` assigns a list from a sub-connector's output directly. If a subsequent sub-connector also returns blocks to lock, `block_ids_to_lock.extend(...)` will be called, modifying the list from the first sub-connector's output object. This can lead to unexpected behavior if a sub-connector reuses its output object across calls.

A safer approach is to initialize local lists and always use extend to aggregate the block IDs. This avoids modifying child connector outputs and makes the code more robust and easier to reason about.

    def report_to_scheduler(self) -> KVConnectorSchedulerOutput | None:
        all_block_ids_to_lock: list[int] = []
        all_block_ids_to_unlock: list[int] = []
        for c in self._connectors:
            output = c.report_to_scheduler()
            if output is None:
                continue

            if output.block_ids_to_lock:
                all_block_ids_to_lock.extend(output.block_ids_to_lock)

            if output.block_ids_to_unlock:
                all_block_ids_to_unlock.extend(output.block_ids_to_unlock)

        if not all_block_ids_to_lock and not all_block_ids_to_unlock:
            return None

        return KVConnectorSchedulerOutput(
            block_ids_to_lock=all_block_ids_to_lock or None,
            block_ids_to_unlock=all_block_ids_to_unlock or None,
        )

@mergify

mergify bot commented Jan 30, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @orozery.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 30, 2026
@orozery
Collaborator Author

orozery commented Feb 18, 2026

Moving on to another direction (for now): #34805

@orozery orozery closed this Feb 18, 2026
