[WIP][kv_offload] Decouple store policy and request lifecycle from the scheduler by hickeyma · Pull Request #40625 · vllm-project/vllm

hickeyma · 2026-04-22T15:07:04Z

Purpose

This commit helps separate three concerns that were previously conflated in OffloadingConnectorScheduler: transfer lifecycle, store policy, and request teardown.

It achieves this by:

Adding a request_finished() to OffloadingManager so the scheduler can ask the manager whether GPU blocks are safe to free, rather than maintaining that knowledge itself via an inline dict check.
Extracting the hardcoded store-on-compute logic into a pluggable interface OffloadPolicy.
StoreOnComputePolicy deomonstrates the ability to add future policies
(preemption-only, spillover) which can be injected at construction with no scheduler changes.
Moving the per-request store watermark out of the general-purpose RequestKVState struct and
into the policy that owns it.

Partial #33689

Tasks:

Add request_finished method to OffloadingManager.
Create OffloadPolicy(ABC).get_blocks_to_store. Move existing _get_reqs_to_store to StoreOnComputePolicy(OffloadPolicy). Rename RequestOffloadState to RequestKVState, and move next_stored_block_idx to StoreOnComputePolicy.

Note: ~~This is based on PR #40538. This PR should only be merged after #40538.~~ #40538 merged

Test Plan

Test Result

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

gemini-code-assist

Code Review

This pull request refactors the KV offloading architecture by consolidating core abstractions into a new base module and introducing a policy-driven approach for storage decisions via the OffloadPolicy interface. Key feedback identifies a critical race condition in the scheduler's request teardown logic where the removal of in-flight transfer checks could lead to premature GPU memory freeing. Additionally, the StoreOnComputePolicy incorrectly advances the store watermark for blocks filtered by the manager, which may prevent eligible blocks from being offloaded in subsequent steps.

gemini-code-assist · 2026-04-22T15:09:12Z


-        request_being_stored = req_id in self._reqs_being_stored
-        return request_being_stored, None
+        return self.manager.request_finished(req_id), None


The logic for checking in-flight transfers during request teardown has been removed. The previous implementation checked req_id in self._reqs_being_stored to ensure that GPU blocks are not freed while an asynchronous store operation is still reading from them. By delegating this entirely to self.manager.request_finished(req_id), and given that the current OffloadingManager implementations (like CPUOffloadingManager) do not track the scheduler's in-flight transfers, this creates a critical race condition where GPU memory could be freed and reused while a transfer is still in progress. The scheduler must maintain its own check for transfers it has initiated to prevent use-after-free or data corruption.

Suggested change

return self.manager.request_finished(req_id), None

return self.manager.request_finished(req_id) or req_id in self._reqs_being_stored, None

gemini-code-assist · 2026-04-22T15:09:12Z

+                )
+                continue
+
+            self._next_stored_block_idx[req_id][0] = num_blocks


The store watermark is advanced even if no blocks were actually submitted for storage (e.g., if the manager filtered them out via FilterReusedOffloadingManager). This means that blocks skipped due to a reuse threshold will never be reconsidered for offloading in subsequent steps, even after they have been reused enough times to pass the filter. The watermark should only be advanced for blocks that the manager has accepted for storage, or the policy should track skipped blocks to retry them in future steps.

Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Code review from gemini-code-assist to avoid potential circular dependency with `vllm.v1.kv_offload.cpu.manager` to add the manager imports to the relevant methods. vllm-project#40538 (comment) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

…eduler This commit helps separate three concerns that were previously conflated in `OffloadingConnectorScheduler`: transfer lifecycle, store policy, and request teardown. It achievees this by: - Adding a request_finished lifecycle hook to OffloadingManager so the scheduler can ask the manager whether GPU blocks are safe to free, rather than maintaining that knowledge itself via an inline dict check. - Extracting the hardcoded store-on-compute logic into a pluggable interface `OffloadPolicy`. `StoreOnComputePolicy` deomonstrates the ability to add future policies (preemption-only, spillover) which can be injected at construction with no scheduler changes. - Moving the per-request store watermark out of the general-purpose `RequestKVState` struct and into the policy that owns it. Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

mergify · 2026-05-07T13:23:36Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @hickeyma.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

hickeyma · 2026-05-08T09:37:43Z

Closing as superseded by #42050

hickeyma requested review from ApostaC, NickLucche, orozery and xuechendi as code owners April 22, 2026 15:07

claude Bot reviewed Apr 22, 2026

View reviewed changes

mergify Bot added v1 kv-connector labels Apr 22, 2026

gemini-code-assist Bot reviewed Apr 22, 2026

View reviewed changes

hickeyma added 3 commits April 30, 2026 06:05

Refactor KV Offloading

3d6d953

Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Fix fragile circular dependency

3133c68

Code review from gemini-code-assist to avoid potential circular dependency with `vllm.v1.kv_offload.cpu.manager` to add the manager imports to the relevant methods. vllm-project#40538 (comment) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma force-pushed the add-sched-side-policy-extr branch from 2775f46 to fec97b6 Compare April 30, 2026 05:35

hickeyma marked this pull request as draft April 30, 2026 05:35

hickeyma changed the title ~~[kv_offload] Decouple store policy and request lifecycle from the scheduler~~ [WIP][kv_offload] Decouple store policy and request lifecycle from the scheduler Apr 30, 2026

hickeyma added 2 commits April 30, 2026 08:10

Merge branch 'main' into add-sched-side-policy-extr

9d9b55e

Refactor after changes to scheduling in main

f6213ea

Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

mergify Bot added the needs-rebase label May 7, 2026

hickeyma mentioned this pull request May 8, 2026

[kv_offload]: Add request_finished method to OffloadingManager and decouple store policy #42050

Open

2 tasks

hickeyma closed this May 8, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[WIP][kv_offload] Decouple store policy and request lifecycle from the scheduler#40625

[WIP][kv_offload] Decouple store policy and request lifecycle from the scheduler#40625
hickeyma wants to merge 5 commits intovllm-project:mainfrom
hickeyma:add-sched-side-policy-extr

hickeyma commented Apr 22, 2026 •

edited

Loading

Uh oh!

claude Bot left a comment

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Apr 22, 2026

Uh oh!

gemini-code-assist Bot Apr 22, 2026

Uh oh!

mergify Bot commented May 7, 2026

Uh oh!

hickeyma commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	return self.manager.request_finished(req_id), None
	return self.manager.request_finished(req_id) or req_id in self._reqs_being_stored, None

Uh oh!

Conversation

hickeyma commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Plan

Test Result

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Claude Code Review

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

mergify Bot commented May 7, 2026

Uh oh!

hickeyma commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hickeyma commented Apr 22, 2026 •

edited

Loading