[kv_offload]: Add request_finished method to OffloadingManager and decouple store policy by hickeyma · Pull Request #42050 · vllm-project/vllm

hickeyma · 2026-05-08T09:35:45Z

Purpose

Adds a request_finished method to OffloadingManager so that implementations can react when a request ends, e.g. to flush a deferred transfer for the last partial block. The default implementation is a no-op, so existing subclasses remain compatible. FilterReusedOffloadingManager delegates it to its backing manager, and the connector scheduler calls it alongside the new policy hook.
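A minimal sketch of the pattern described above: a no-op default hook on the base class, a subclass that uses it to flush a deferred partial block, and a wrapper that forwards it. All class names here are hypothetical stand-ins (not vLLM's real classes), and `lookup`, `defer` and the string block keys are assumptions made only so the example runs.

```python
from abc import ABC, abstractmethod


class OffloadingManagerSketch(ABC):
    """Simplified, hypothetical stand-in for OffloadingManager."""

    @abstractmethod
    def lookup(self, block_keys: list[str]) -> int:
        """Hypothetical abstract method standing in for the existing interface."""

    def request_finished(self, req_id: str) -> None:
        """Hook invoked once when a request ends. The default does nothing,
        so subclasses written before the hook existed keep working."""
        return None


class LegacyManager(OffloadingManagerSketch):
    """A pre-existing subclass that never overrides request_finished."""

    def lookup(self, block_keys: list[str]) -> int:
        return 0


class DeferringManager(OffloadingManagerSketch):
    """Hypothetical manager that holds back the last partial block of each
    request and flushes it only when the request finishes."""

    def __init__(self) -> None:
        self.pending: dict[str, str] = {}
        self.flushed: list[str] = []

    def lookup(self, block_keys: list[str]) -> int:
        return 0

    def defer(self, req_id: str, block_key: str) -> None:
        self.pending[req_id] = block_key

    def request_finished(self, req_id: str) -> None:
        # pop() with a default makes repeated calls for the same request harmless.
        block_key = self.pending.pop(req_id, None)
        if block_key is not None:
            self.flushed.append(block_key)


class DelegatingManager(OffloadingManagerSketch):
    """Wrapper that forwards calls to a backing manager, mirroring how the PR
    describes FilterReusedOffloadingManager handling request_finished."""

    def __init__(self, backing: OffloadingManagerSketch) -> None:
        self._backing = backing

    def lookup(self, block_keys: list[str]) -> int:
        return self._backing.lookup(block_keys)

    def request_finished(self, req_id: str) -> None:
        self._backing.request_finished(req_id)
```

Because the base method is concrete rather than abstract, a subclass like `LegacyManager` needs no changes, while a wrapper must forward the call explicitly or the backing manager would never see it.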

The block-selection logic embedded in _build_store_jobs is pulled out into a new OffloadPolicy class. The existing behaviour becomes StoreOnComputePolicy, which owns the per-request, per-group progress index that previously lived as next_stored_block_idx on RequestGroupState. The renamed RequestKVState (was RequestOffloadState) now only tracks KV state (offload keys, block IDs, in-flight jobs).
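The split between a KV-only state object and a policy that owns its own progress index could look roughly like the sketch below. The `get_blocks_to_store` parameters (`req_id`, `state`, `num_computed_blocks`) and the reduced `RequestKVState` fields are assumptions for illustration; the PR's actual signatures are not shown here.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class RequestKVState:
    """Simplified KV-only state: no store-progress index lives here."""
    # Block IDs per KV cache group, in allocation order.
    block_ids: list[list[int]]
    in_flight_jobs: set[int] = field(default_factory=set)


class OffloadPolicy(ABC):
    @abstractmethod
    def get_blocks_to_store(
        self, req_id: str, state: RequestKVState, num_computed_blocks: list[int]
    ) -> list[list[int]]:
        """Return, per group, the block IDs that should be stored now."""

    def request_finished(self, req_id: str) -> None:
        """Drop any per-request policy state. Default: nothing to drop."""
        return None


class StoreOnComputePolicy(OffloadPolicy):
    """Store every block as soon as it is fully computed."""

    def __init__(self) -> None:
        # req_id -> per-group index of the next block not yet handed out for storing.
        self._next_stored_block_idx: dict[str, list[int]] = {}

    def get_blocks_to_store(self, req_id, state, num_computed_blocks):
        next_idx = self._next_stored_block_idx.get(req_id)
        if next_idx is None:
            # Allocate the index list only on first sight of the request.
            next_idx = [0] * len(state.block_ids)
            self._next_stored_block_idx[req_id] = next_idx
        to_store = []
        for group, blocks in enumerate(state.block_ids):
            end = min(num_computed_blocks[group], len(blocks))
            start = next_idx[group]
            to_store.append(blocks[start:end])  # empty if nothing new
            next_idx[group] = max(start, end)  # never move backwards
        return to_store

    def request_finished(self, req_id: str) -> None:
        self._next_stored_block_idx.pop(req_id, None)
```

Keeping the index inside the policy means an alternative policy (for example one that defers the last partial block) can track progress differently without touching the shared state type.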

The scheduler state types are moved to a new state.py so that policy.py can import them without creating a circular dependency back through scheduler.py.

Partially addresses #33689

Tasks:

Add request_finished method to OffloadingManager.
Create OffloadPolicy(ABC) with a get_blocks_to_store method. Move the existing _get_reqs_to_store logic into StoreOnComputePolicy(OffloadPolicy). Rename RequestOffloadState to RequestKVState, and move next_stored_block_idx to StoreOnComputePolicy.

Test Plan

VLLM_LOG_STATS_INTERVAL=0.01 vllm bench throughput --model Qwen/Qwen3-14B --kv-offloading-size 10 --disable-hybrid-kv-cache-manager --num-prompts 1000 --kv-events-config '{"enable_kv_cache_events": "True", "publisher": "zmq", "topic": "kv-events"}'

Test Result

[...]
Throughput: 3.43 requests/s, 3952.48 total tokens/s, 439.16 output tokens/s
Total num prompt tokens:  1024000
Total num output tokens:  128000

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

hickeyma · 2026-05-08T09:37:10Z

Supersedes #40625

gemini-code-assist

Code Review

This pull request introduces an OffloadPolicy abstraction to decouple KV block offloading logic from the scheduler and refactors state management into a new state.py file. It also adds a request_finished hook to both the OffloadingManager and OffloadPolicy for cleanup and deferred transfers. Review feedback suggests optimizing dictionary access in the hot path to reduce allocations, using try...finally blocks for robust cleanup to prevent memory leaks, and addressing inconsistencies in the request_finished API documentation regarding partial block flushing.
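The review's exact code suggestions are not reproduced here; the sketch below only illustrates the two general ideas in plain Python. `ConnectorSchedulerSketch`, `_get_state` and the `list[int]` per-request state are hypothetical names chosen for illustration, not the PR's code.

```python
class ConnectorSchedulerSketch:
    """Hypothetical scheduler-side holder of per-request state."""

    def __init__(self, manager, policy) -> None:
        self.manager = manager
        self.policy = policy
        # req_id -> list of in-flight job IDs (simplified per-request state).
        self.req_states: dict[str, list[int]] = {}

    def _get_state(self, req_id: str) -> list[int]:
        # Hot path: look up first and only build a new object on a miss.
        # dict.setdefault(req_id, []) would allocate a throwaway list on every call.
        state = self.req_states.get(req_id)
        if state is None:
            state = []
            self.req_states[req_id] = state
        return state

    def request_finished(self, req_id: str) -> None:
        # Nested try/finally: the manager hook still runs if the policy raises,
        # and the per-request entry is always dropped, so it cannot leak.
        try:
            try:
                self.policy.request_finished(req_id)
            finally:
                self.manager.request_finished(req_id)
        finally:
            self.req_states.pop(req_id, None)
```

Any exception still propagates to the caller; the `finally` blocks only guarantee that cleanup happens on the way out.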

mergify · 2026-05-08T09:45:11Z

Hi @hickeyma, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

…couple store policy

Adds a `request_finished` method to `OffloadingManager` so implementations can react when a request ends, e.g. to flush a deferred transfer for the last partial block. The method is a no-op default to keep existing subclasses compatible. `FilterReusedOffloadingManager` delegates it to its backing manager, and the connector scheduler calls it alongside the new policy hook.

The block-selection logic embedded in `_build_store_jobs` is pulled out into a new `OffloadPolicy` class. The existing behaviour becomes `StoreOnComputePolicy`, which owns the per-request, per-group progress index that previously lived as `next_stored_block_idx` on `RequestGroupState`. The renamed `RequestKVState` (was `RequestOffloadState`) now only tracks KV state: offload keys, block IDs, in-flight jobs.

The scheduler state types are moved to a new `state.py` so that `policy.py` can import them without creating a circular dependency back through `scheduler.py`.

Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Review comment: - vllm-project#42050 (comment) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma requested review from ApostaC, NickLucche, orozery and xuechendi as code owners May 8, 2026 09:35

claude Bot reviewed May 8, 2026


mergify Bot added v1 kv-connector labels May 8, 2026

hickeyma mentioned this pull request May 8, 2026: [WIP][kv_offload] Decouple store policy and request lifecycle from the scheduler #40625 (Closed)

gemini-code-assist Bot reviewed May 8, 2026


hickeyma force-pushed the decouple-store-policy branch from 05a0b68 to 53571c2 May 8, 2026 10:37

hickeyma and others added 4 commits May 8, 2026 11:59

Update vllm/distributed/kv_transfer/kv_connector/v1/offloading/policy.py

6c1c916

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Update vllm/distributed/kv_transfer/kv_connector/v1/offloading/policy.py

df08633

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Merge branch 'main' into decouple-store-policy

f7a95dd

Refine request_finished doc string

b04603d

Review comment: - vllm-project#42050 (comment) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma wants to merge 5 commits into vllm-project:main from hickeyma:decouple-store-policy
