
[kv_offload+HMA][9/N]: Support lookup with multiple KV groups #39401

Open
orozery wants to merge 1 commit into vllm-project:main from orozery:kv-offload-lookup-multiple-groups



@orozery
Collaborator

@orozery orozery commented Apr 9, 2026

This PR extends the offloading connector to support lookups (get_num_new_matched_tokens) where KVCacheConfig contains multiple groups.

Currently, only full attention groups are supported.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the offloading scheduler in vllm/distributed/kv_transfer/kv_connector/v1/offloading/scheduler.py to support multiple KV cache groups. It initializes lookup_groups in the constructor and modifies the get_num_new_matched_tokens method to iterate through these groups, performing individual block lookups and handling request deferral or delay based on the state of blocks within each group. There are no review comments to address.
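The per-group lookup described in this summary can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: `LookupGroup`, its `lookup` callable, and `matched_tokens` are hypothetical names, and the combining rule (take the smallest hit prefix across groups, aligned to a common block boundary) is an assumption about how full attention groups could be reconciled. A `None` result stands in for the "defer or delay the request" outcome.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class LookupGroup:
    """Hypothetical stand-in for one KV cache group (not a vLLM class)."""

    block_size: int  # tokens per block in this group
    # Given the number of full blocks in the request, returns how many leading
    # blocks are available in the offloaded store, or None if the answer is
    # not ready yet and the request should be retried later.
    lookup: Callable[[int], Optional[int]]


def matched_tokens(
    groups: Sequence[LookupGroup],
    num_tokens: int,
    num_computed_tokens: int,
) -> Optional[int]:
    """Return the number of additional tokens loadable for every group.

    Returns None if any group cannot answer yet (assumed deferral signal).
    """
    hit_tokens = num_tokens
    for group in groups:
        num_blocks = num_tokens // group.block_size
        hits = group.lookup(num_blocks)
        if hits is None:
            return None  # one undecided group defers the whole request
        # Assumption: a prefix is usable only if every group has it.
        hit_tokens = min(hit_tokens, hits * group.block_size)

    # Align down so the prefix ends on a block boundary in every group.
    common = math.lcm(*(g.block_size for g in groups))
    hit_tokens -= hit_tokens % common
    return max(0, hit_tokens - num_computed_tokens)
```

For example, with one group of block size 16 reporting 3 hit blocks and another of block size 32 reporting 2, the shared prefix is 48 tokens, which aligns down to 32.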

This commit extends the offloading connector to support lookups (get_num_new_matched_tokens)
where KVCacheConfig contains multiple groups.

Signed-off-by: Or Ozeri <oro@il.ibm.com>
@orozery orozery force-pushed the kv-offload-lookup-multiple-groups branch from 587ba2a to 56f4df9 April 20, 2026 11:05
@orozery orozery added the ready label Apr 20, 2026

Labels

kv-connector, ready
