
[kv_offload+HMA][2/N]: Support multiple KV groups in GPULoadStoreSpec#36642

Merged
orozery merged 1 commit intovllm-project:mainfrom
orozery:kv-offload-gpuloadstorespec-multiple-groups
Mar 18, 2026

Conversation

@orozery (Collaborator) commented Mar 10, 2026

This PR extends GPULoadStoreSpec to support multiple KV cache groups.
Specifically, the block IDs of all groups are concatenated, and an auxiliary
group_sizes list determines the number of blocks in each group.
Additionally, we add a block_indices parameter that encodes
the logical block index of the first block in each group.
This information is required to support loading from offloaded blocks
that are larger than GPU blocks.
In such cases, the first GPU block of each group may not be aligned to the
offloaded block size, so knowing block_indices[i] allows the worker to
correctly skip part of the first matching offloaded block.
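The layout described above can be sketched as follows. This is an illustrative stand-in, not the actual vLLM class: the names MultiGroupSpec, group_block_ids, skip_blocks, and gpu_blocks_per_offloaded are assumptions made for the example.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class MultiGroupSpec:
    # Hypothetical stand-in for a multi-group GPULoadStoreSpec.
    block_ids: np.ndarray      # block IDs of all groups, concatenated
    group_sizes: np.ndarray    # number of blocks contributed by each group
    block_indices: np.ndarray  # logical index of the first block in each group

    def group_block_ids(self, group: int) -> np.ndarray:
        """Slice out the block IDs belonging to one group."""
        start = int(self.group_sizes[:group].sum())
        return self.block_ids[start:start + int(self.group_sizes[group])]


def skip_blocks(spec: MultiGroupSpec, group: int,
                gpu_blocks_per_offloaded: int) -> int:
    """GPU blocks to skip inside the first matching offloaded block.

    When one offloaded block spans several GPU blocks, a group's first GPU
    block may start mid-way into an offloaded block; block_indices[group]
    supplies the logical position needed to compute that offset.
    """
    return int(spec.block_indices[group]) % gpu_blocks_per_offloaded


# Two groups: 3 blocks, then 2 blocks, concatenated into one array.
spec = MultiGroupSpec(
    block_ids=np.array([10, 11, 12, 20, 21]),
    group_sizes=np.array([3, 2]),
    block_indices=np.array([5, 5]),
)
print(spec.group_block_ids(1))  # [20 21]
print(skip_blocks(spec, 0, 4))  # 1: logical block 5 starts 1 GPU block into an offloaded block of 4
```

With an offloaded block covering 4 GPU blocks and a group starting at logical block 5, the worker skips 5 % 4 = 1 GPU block's worth of data in the first matching offloaded block.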

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request extends GPULoadStoreSpec to support multiple KV cache groups by adding group_sizes and block_indices parameters. The changes are well-implemented across the codebase, including updates to tests and connector logic. My main feedback is to improve the robustness of input validation in the GPULoadStoreSpec constructor by using ValueError instead of assert.

@NickLucche (Collaborator) left a comment


@orozery Just curious about the choice of using group_sizes + flattened_block_ids list instead of aligning to KV manager on tuple(list[int])

@NickLucche added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) Mar 17, 2026
@orozery (Collaborator, Author) commented Mar 17, 2026

> @orozery Just curious about the choice of using group_sizes + flattened_block_ids list instead of aligning to KV manager on tuple(list[int])

Good question :)
The first answer that comes to mind:
GPULoadStoreSpec is transmitted over IPC as part of KVConnectorMetadata,
so we prefer a NumPy array over list[int] because it is more compact.
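The compactness argument can be checked with a quick comparison. This is an illustrative sketch only; vLLM's actual serialization path for KVConnectorMetadata may differ, and the int32 dtype here is an assumption for the example.

```python
import pickle

import numpy as np

n = 100_000
ids = list(range(1 << 20, (1 << 20) + n))  # block IDs as plain Python ints
arr = np.asarray(ids, dtype=np.int32)      # same IDs as a compact NumPy array

list_bytes = len(pickle.dumps(ids))
arr_bytes = len(pickle.dumps(arr))

# The array pickles to roughly its raw buffer (4 bytes per int32) plus a
# small header, while the list carries per-element pickle overhead.
print(f"list[int]: {list_bytes} bytes, np.int32 array: {arr_bytes} bytes")
```

For IDs in this range, each list element costs about 5 bytes in the pickle stream versus a flat 4 bytes per element in the int32 array, so the array representation is noticeably smaller even before considering decode cost.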

@NickLucche (Collaborator) left a comment


> So we prefer to use a NumPy array instead of list[int] as it is more compact

I'd say that's fine given it's a very internal interface. LGTM

@orozery orozery force-pushed the kv-offload-gpuloadstorespec-multiple-groups branch from d720518 to 7d05848 Compare March 18, 2026 14:16
@orozery orozery merged commit 5dd8df0 into vllm-project:main Mar 18, 2026
55 checks passed
fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026
SouthWest7 pushed a commit to SouthWest7/vllm that referenced this pull request Mar 27, 2026
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026
Monishver11 pushed a commit to Monishver11/vllm that referenced this pull request Mar 27, 2026
JiantaoXu pushed a commit to JiantaoXu/vllm that referenced this pull request Mar 28, 2026
vrdn-23 pushed a commit to vrdn-23/vllm that referenced this pull request Mar 30, 2026
EricccYang pushed a commit to EricccYang/vllm that referenced this pull request Apr 1, 2026

Labels

kv-connector · ready (ONLY add when PR is ready to merge/full CI is needed) · v1
