[kv_offload+HMA][2/N]: Support multiple KV groups in GPULoadStoreSpec by orozery · Pull Request #36642 · vllm-project/vllm

orozery · 2026-03-10T11:20:00Z

This PR extends GPULoadStoreSpec to support multiple KV cache groups.
Specifically, the block IDs of all groups are concatenated, and an
auxiliary group_sizes list gives the number of blocks in each group.
Additionally, we add a block_indices parameter, which encodes
the logical block index of the first block in each group.
This information is required to support loading from offloaded blocks
that are larger than GPU blocks.
In such cases, the first GPU block of a group may not be aligned to the offloaded
block size, so knowing block_indices[i] allows the worker to correctly
skip part of the first matching offloaded block.
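
To make the layout described above concrete, here is a minimal sketch; it is not the code from this PR. The helper names `split_by_group` and `leading_gpu_blocks_to_skip`, the parameter `gpu_blocks_per_offloaded_block`, and the modulo rule for the skip are all assumptions made for illustration. The sketch only assumes that an offloaded block spans a whole number of GPU blocks.

```python
import numpy as np


def split_by_group(block_ids: np.ndarray, group_sizes: list[int]) -> list[np.ndarray]:
    """Slice the concatenated block IDs back into one array per KV group."""
    groups = []
    start = 0
    for size in group_sizes:
        groups.append(block_ids[start:start + size])
        start += size
    return groups


def leading_gpu_blocks_to_skip(block_index: int, gpu_blocks_per_offloaded_block: int) -> int:
    """Hypothetical rule: how many GPU-block slots to skip inside the first
    offloaded block when the group's first logical block is unaligned."""
    return block_index % gpu_blocks_per_offloaded_block


# Two KV groups: group 0 holds 3 GPU blocks, group 1 holds 2.
block_ids = np.array([10, 11, 12, 40, 41], dtype=np.int64)
group_sizes = [3, 2]
# Logical index of the first block in each group.
block_indices = [5, 4]
# Assumed ratio: one offloaded block covers 4 GPU blocks.
factor = 4

groups = split_by_group(block_ids, group_sizes)
skips = [leading_gpu_blocks_to_skip(i, factor) for i in block_indices]

for group, skip in zip(groups, skips):
    print(group.tolist(), "skip", skip)
```

In this example group 0 starts at logical block 5, which falls one GPU block into an offloaded block of four, so one slot is skipped; group 1 starts at logical block 4, which is aligned, so nothing is skipped.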

gemini-code-assist

Code Review

This pull request extends GPULoadStoreSpec to support multiple KV cache groups by adding group_sizes and block_indices parameters. The changes are well-implemented across the codebase, including updates to tests and connector logic. My main feedback is to improve the robustness of input validation in the GPULoadStoreSpec constructor by using ValueError instead of assert.
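
The point about ValueError versus assert can be illustrated with a small sketch. The function `validate_group_layout` and the particular checks it performs are hypothetical; the review does not list which conditions the real constructor validates. What is certain is that `assert` statements are removed when Python runs with `-O`, whereas an explicit `raise ValueError` always executes.

```python
def validate_group_layout(
    num_block_ids: int,
    group_sizes: list[int],
    block_indices: list[int],
) -> None:
    """Hypothetical input checks that raise ValueError instead of asserting.

    Unlike `assert`, these checks still run under `python -O`.
    """
    if any(size < 0 for size in group_sizes):
        raise ValueError(f"group_sizes must be non-negative, got {group_sizes}")
    if sum(group_sizes) != num_block_ids:
        raise ValueError(
            f"sum(group_sizes)={sum(group_sizes)} does not match "
            f"the number of block IDs ({num_block_ids})"
        )
    if len(block_indices) != len(group_sizes):
        raise ValueError(
            f"expected one block index per group, got {len(block_indices)} "
            f"indices for {len(group_sizes)} groups"
        )


# A consistent layout passes silently.
validate_group_layout(5, [3, 2], [0, 0])

# An inconsistent layout is rejected with a descriptive error.
try:
    validate_group_layout(5, [3, 1], [0, 0])
except ValueError as exc:
    print("rejected:", exc)
```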

vllm/v1/kv_offload/mediums.py

NickLucche

@orozery Just curious about the choice of using group_sizes + flattened_block_ids list instead of aligning to KV manager on tuple(list[int])

orozery · 2026-03-17T13:04:46Z

> @orozery Just curious about the choice of using group_sizes + flattened_block_ids list instead of aligning to KV manager on tuple(list[int])

Good question :)
First answer that comes to my mind:
GPULoadStoreSpec is IPC-transmitted over KVConnectorMetadata.
So we prefer to use a NumPy array instead of list[int] as it is more compact.

NickLucche

> So we prefer to use a NumPy array instead of list[int] as it is more compact

I'd say that's fine given it's a very internal interface. LGTM
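
The two representations discussed in this thread carry the same information, which the following sketch shows by converting between them. The helper names `flatten_groups` and `unflatten_groups` and the `int64` dtype are assumptions for illustration; the PR's actual dtype and conversion code are not shown in this conversation.

```python
import numpy as np


def flatten_groups(per_group: tuple[list[int], ...]) -> tuple[np.ndarray, list[int]]:
    """Concatenate per-group block IDs into one array plus a group_sizes list."""
    group_sizes = [len(ids) for ids in per_group]
    flat = np.fromiter(
        (block_id for ids in per_group for block_id in ids),
        dtype=np.int64,  # assumed dtype
        count=sum(group_sizes),
    )
    return flat, group_sizes


def unflatten_groups(flat: np.ndarray, group_sizes: list[int]) -> tuple[list[int], ...]:
    """Recover the tuple-of-lists form from the flat array and group_sizes."""
    out = []
    start = 0
    for size in group_sizes:
        out.append(flat[start:start + size].tolist())
        start += size
    return tuple(out)


per_group = ([1, 2, 3], [7], [])
flat, group_sizes = flatten_groups(per_group)
restored = unflatten_groups(flat, group_sizes)

print(flat.tolist(), group_sizes)
print(restored == per_group)
```

The flat form stores all block IDs in a single contiguous buffer of fixed-width integers, with group_sizes as the only extra bookkeeping needed to recover the per-group view.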

Signed-off-by: Or Ozeri <oro@il.ibm.com>

orozery requested review from ApostaC and NickLucche as code owners March 10, 2026 11:20

mergify bot added the v1 and kv-connector labels Mar 10, 2026

gemini-code-assist bot reviewed Mar 10, 2026

vllm/v1/kv_offload/mediums.py (review thread resolved)

NickLucche reviewed Mar 17, 2026

NickLucche added the ready label Mar 17, 2026

NickLucche approved these changes Mar 18, 2026

orozery force-pushed the kv-offload-gpuloadstorespec-multiple-groups branch from d720518 to 7d05848 March 18, 2026 14:16

orozery merged commit 5dd8df0 into vllm-project:main Mar 18, 2026
55 checks passed

fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026 (1a71c9a)

SouthWest7 pushed a commit to SouthWest7/vllm that referenced this pull request Mar 27, 2026 (b819edc)

khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026 (9496288)

JiantaoXu pushed a commit to JiantaoXu/vllm that referenced this pull request Mar 28, 2026 (f211f68)

vrdn-23 pushed a commit to vrdn-23/vllm that referenced this pull request Mar 30, 2026 (cfd3a52, additionally Signed-off-by: Vinay Damodaran <vrdn@hey.com>)

EricccYang pushed a commit to EricccYang/vllm that referenced this pull request Apr 1, 2026 (15e1111, additionally Signed-off-by: EricccYang <yangyang4991@gmail.com>)
