[kv_offload+HMA][0/N]: Support block-level preemption handling #34805
orozery merged 3 commits into vllm-project:main
Conversation
Code Review
This pull request refactors the handle_preemptions API to use KVConnectorMetadata, which is a good change for flexibility. The implementation looks mostly correct, propagating the change through the connector hierarchy and updating tests. However, I found a redundant call to handle_preemptions in gpu_model_runner.py, which should be removed to avoid potential side effects and keep the logic clean. Please see my detailed comment.
```python
if has_kv_transfer_group():
    kv_connector_metadata = scheduler_output.kv_connector_metadata
    assert kv_connector_metadata is not None
    get_kv_transfer_group().handle_preemptions(kv_connector_metadata)
```
The handle_preemptions method is called here, but it's also called within ActiveKVConnector.pre_forward, which is invoked later in this execute_model function (via set_forward_context or kv_connector_no_forward). This results in handle_preemptions being called twice in each step.
While the current implementations appear to be idempotent, this redundancy can be confusing and might lead to bugs if a future connector's handle_preemptions is not idempotent. To centralize the logic, this call should be removed, relying on the one inside ActiveKVConnector.pre_forward.
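To illustrate the idempotency point, here is a minimal sketch (class and method names are hypothetical, not vLLM's actual types) of a handler written so that a redundant second call in the same step is a harmless no-op:

```python
class SketchConnector:
    """Hypothetical connector illustrating an idempotent handle_preemptions.

    The first call in a step drains the pending set; a redundant second
    call (e.g. one from execute_model and one from pre_forward) finds
    nothing left to process.
    """

    def __init__(self) -> None:
        self._pending_preemptions: set[str] = set()
        self.aborted: list[str] = []

    def note_preempted(self, req_id: str) -> None:
        self._pending_preemptions.add(req_id)

    def handle_preemptions(self, metadata: dict) -> None:
        # Drain-and-process: swap out the pending set atomically so that
        # calling this twice per step processes each request exactly once.
        pending, self._pending_preemptions = self._pending_preemptions, set()
        for req_id in sorted(pending):
            self.aborted.append(req_id)

conn = SketchConnector()
conn.note_preempted("req-1")
conn.handle_preemptions({})
conn.handle_preemptions({})  # redundant call: no-op
print(conn.aborted)  # -> ['req-1']
```

A connector whose handler is not written this way would double-process preemptions if both call sites fired, which is exactly the risk the comment raises.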
AFAIK pre_forward is only called in model runner v2.
@orozery Does the code make this obvious or enforced? Is it possible that it could be called twice?
All connector functions have 2 call locations, one for each model runner.
For a specific run, only one model runner will be used (either v1 or v2), so it's not possible for these functions to be called twice.
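The point being made can be shown with a toy sketch (runner and connector names are hypothetical): each connector hook has one call site per runner, but a given process instantiates exactly one runner, so each hook still fires once per step.

```python
class Connector:
    """Toy connector that counts how often its hook is invoked."""

    def __init__(self) -> None:
        self.calls = 0

    def handle_preemptions(self, metadata: dict) -> None:
        self.calls += 1

class RunnerV1:
    def __init__(self, conn: Connector) -> None:
        self.conn = conn

    def execute_model(self, metadata: dict) -> None:
        self.conn.handle_preemptions(metadata)  # v1 call site

class RunnerV2:
    def __init__(self, conn: Connector) -> None:
        self.conn = conn

    def execute_model(self, metadata: dict) -> None:
        self.conn.handle_preemptions(metadata)  # v2 call site (via pre_forward)

conn = Connector()
runner = RunnerV1(conn)  # a deployment picks exactly one runner, never both
runner.execute_model({})
print(conn.calls)  # -> 1
```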
```python
)
if has_kv_transfer_group():
    kv_connector_metadata = scheduler_output.kv_connector_metadata
    assert kv_connector_metadata is not None
```
Should this assert be here? Previously, it checked that the IDs were available.
Right, previously handle_preemptions only got a set of preempted request IDs, whereas now it gets KVConnectorMetadata.
has_kv_transfer_group() guarantees that kv_connector_metadata is not None (since build_connector_metadata is called on each step), so that assert is fine.
BTW the same assert also exists in the forward pass (at KVConnectorModelRunnerMixin._get_kv_connector_output).
vllm/distributed/kv_transfer/kv_connector/v1/offloading_connector.py
```diff
     self._register_handlers(kv_caches, attn_backends)

-    def handle_preemptions(self, preempted_req_ids: set[str]):
+    def handle_preemptions(self, kv_connector_metadata: OffloadingConnectorMetadata):
```
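A minimal sketch of what the signature change means for an implementation (the class and field names below are stand-ins, not the exact vLLM types; it assumes the per-step metadata object carries the preempted request IDs):

```python
from dataclasses import dataclass, field

@dataclass
class OffloadingConnectorMetadataSketch:
    # Hypothetical stand-in for OffloadingConnectorMetadata; assumes the
    # preempted request IDs now ride inside the per-step metadata object.
    preempted_req_ids: set[str] = field(default_factory=set)

class OffloadingConnectorSketch:
    def __init__(self) -> None:
        self.cancelled: list[str] = []

    # Old signature (pre-change), for comparison:
    #   def handle_preemptions(self, preempted_req_ids: set[str]) -> None: ...

    # New signature: the metadata object is passed instead of a bare set,
    # so future changes can attach arbitrary event payloads to it.
    def handle_preemptions(
        self, metadata: OffloadingConnectorMetadataSketch
    ) -> None:
        for req_id in sorted(metadata.preempted_req_ids):
            self.cancelled.append(req_id)

conn = OffloadingConnectorSketch()
conn.handle_preemptions(OffloadingConnectorMetadataSketch({"req-a", "req-b"}))
print(conn.cancelled)  # -> ['req-a', 'req-b']
```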
Should it be documented that this is a breaking API change?
This is a fresh API that I recently introduced. It's not used by any in-tree connector, and most likely not used at all.
I don't see any benefit from documenting it.
Force-pushed 43254e1 to dc73aae
This pull request has merge conflicts that must be resolved before it can be merged.
This commit changes the handle_preemptions connector API function to support handling of arbitrary events via KVConnectorMetadata. Specifically, this will allow handling of sliding-window layer blocks which can be evicted from the GPU KV cache while still being saved by a connector.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
Force-pushed dc73aae to cb1180e
NickLucche left a comment
@orozery I am not super thrilled about breaking interface but iiuc:
- this callback is quite niche to offloading and also quite new
- the info we're now passing is a "super-set" of what we were passing previously
It's not exactly a super-set, but it could be a super-set, since …
…project#34805) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
…project#34805) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
…project#34805) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Signed-off-by: Vinay Damodaran <vrdn@hey.com>
…project#34805) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Signed-off-by: EricccYang <yangyang4991@gmail.com>
…nge (#190)

## Summary
- vLLM [#34805](vllm-project/vllm#34805) changed `handle_preemptions` signature from `set[str]` to `KVConnectorMetadata`
- Add runtime type check in `__init__.py` to support both old and new vLLM versions
- Carry `preempted_req_ids` through `PegaConnectorMetadata` for new vLLM path

## Test plan
- [x] Deploy with new vLLM (v0.17.2rc1+) — verify preemption no longer crashes with `TypeError`
- [x] Deploy with old vLLM — verify preemption still works via `set[str]` path

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
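The dual-version approach described in that summary can be sketched roughly as follows. This is a guess at how such a shim might look, not the actual downstream code; `PegaConnectorMetadataSketch` is a hypothetical stand-in for the `PegaConnectorMetadata` named above, and the `isinstance` dispatch is an assumption:

```python
class PegaConnectorMetadataSketch:
    """Hypothetical metadata carrier mirroring the PR's PegaConnectorMetadata."""

    def __init__(self, preempted_req_ids: set[str]) -> None:
        self.preempted_req_ids = preempted_req_ids

def extract_preempted_ids(arg) -> set[str]:
    # Old vLLM passes a bare set[str]; newer vLLM (v0.17.2rc1+, per the
    # summary) passes a KVConnectorMetadata-like object carrying the IDs.
    # Dispatching on the runtime type keeps one handler working with both.
    if isinstance(arg, set):
        return arg
    return set(arg.preempted_req_ids)

# Works for both call conventions:
print(extract_preempted_ids({"r1"}))                               # -> {'r1'}
print(extract_preempted_ids(PegaConnectorMetadataSketch({"r2"})))  # -> {'r2'}
```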
This PR changes the `handle_preemptions` connector API function to support handling of arbitrary events via `KVConnectorMetadata`. Specifically, this will allow handling of sliding-window layer blocks which can be evicted from the GPU KV cache while still being saved by a connector.