
[Misc][Hybrid allocator + kv connector] Optionally enable hybrid allocator + KV cache connector#29805

Merged
heheda12345 merged 8 commits into
vllm-project:mainfrom
NickLucche:enable-hma-kv-connector
Dec 15, 2025

Conversation

@NickLucche
Member

@NickLucche NickLucche commented Dec 1, 2025

This PR adds a simple toggle for enabling the experimental HMA + KVConnector integration, which is needed to build actual HMA support into existing connectors.
The feature is still under development, so only the LMCache connector works with it out of the box.

cc @KuntaiDu @heheda12345 @ivanium

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This PR introduces a toggle for enabling HMA with KV connectors. The implementation is straightforward. However, there is a risk of runtime crashes if the toggle is enabled for a connector that does not support HMA. I've suggested a safeguard to prevent this by checking if the configured connector is LMCache, which is currently the only supported one. This change would also require a small update to the new unit test.

Comment thread vllm/config/vllm.py Outdated
"your connector by making sure your connector is a subclass"
" of `SupportsHMA` defined in kv_connector/v1/base.py."
)
self.scheduler_config.disable_hybrid_kv_cache_manager = True
Collaborator

What about making disable_hybrid_kv_cache_manager a bool | None, so that we follow the user-specified config when it is not None and decide automatically when it is None? Like this:

num_gpu_blocks_override: int | None = None
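
A minimal sketch of the tri-state idea suggested above, not the code that was merged. `SchedulerConfigSketch` and `resolve_disable_hma` are hypothetical names used only for illustration; the only detail taken from the thread is that the field becomes `bool | None` with `None` meaning "no preference".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchedulerConfigSketch:
    """Hypothetical stand-in for the real scheduler config."""

    # None  -> the user expressed no preference, decide automatically
    # True  -> the user forced the hybrid KV cache manager off
    # False -> the user explicitly asked for the hybrid KV cache manager
    disable_hybrid_kv_cache_manager: Optional[bool] = None


def resolve_disable_hma(user_value: Optional[bool], auto_default: bool) -> bool:
    """Follow the user's choice when one was given, else the automatic default."""
    return auto_default if user_value is None else user_value
```

The point of the third state is that an explicit `False` can be told apart from "unset", which a plain `bool` defaulting to `False` cannot express.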

@NickLucche
Member Author

@heheda12345 I have addressed your comment.
I have split the handling into two cases, forcing HMA off and expressing no preference, so that the current logic stays unchanged for everything except the KV connector (the scope of this PR).
We can move options into the if disable_hybrid_kv_cache_manager is None branch more gracefully in separate PRs with better context.

However I am not sure this is simplifying things. Let me know what you think.

PS: --no-disable-hybrid-kv-cache-manager must be used to explicitly enable HMA.
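
For illustration, here is how a paired `--flag` / `--no-flag` option can map onto a `bool | None` value using the standard library. This is a sketch with plain `argparse`; it is an assumption that vLLM's own argument parser behaves the same way, and `build_parser` and `parse_flag` are hypothetical helpers.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # BooleanOptionalAction registers both --disable-hybrid-kv-cache-manager
    # and --no-disable-hybrid-kv-cache-manager for the same destination.
    parser.add_argument(
        "--disable-hybrid-kv-cache-manager",
        action=argparse.BooleanOptionalAction,
        default=None,  # None means the user did not pass either form
    )
    return parser


def parse_flag(argv):
    """Return True, False, or None depending on which form was passed."""
    return build_parser().parse_args(argv).disable_hybrid_kv_cache_manager
```

With this setup, passing neither form yields `None`, the positive form yields `True`, and the `--no-` form yields `False`, which is the explicit "enable HMA" request mentioned above.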

@NickLucche NickLucche added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 4, 2025
@NickLucche NickLucche force-pushed the enable-hma-kv-connector branch from 061f3e1 to f78e350 Compare December 4, 2025 16:43
Comment thread vllm/config/vllm.py Outdated
prev_disable_hma is False
and self.scheduler_config.disable_hybrid_kv_cache_manager is True
):
logger.info(
Collaborator

For this code path, maybe we can raise a NotImplementedError with some explanation instead of a warning? If the user intentionally sets --no-disable-hybrid-kv-cache-manager, they do not expect vLLM to fall back to the non-hybrid-allocator code path.

Explanation example:

Hybrid KV cache manager is explicitly enabled, but currently `--kv-events-config` is set and KV events code path is not compatible with hybrid kv cache manager.

Member Author

I see your point. The idea was that I didn't want this PR to affect existing behavior.
I felt that logging now, and switching to a NotImplementedError later once users have had time to drop --no-disable-hybrid-kv-cache-manager, would actually be the more desirable UX.

Collaborator

Would logger.warning be a better fit here?

Collaborator

Oh, I thought this code path would only be triggered when --no-disable-hybrid-kv-cache-manager is set. Let me double check.

Member Author

We chatted offline about this; we're raising an exception in this PR.

Comment thread vllm/config/vllm.py Outdated
Collaborator

@KuntaiDu KuntaiDu left a comment


Thanks for the contribution!
See comments.

@NickLucche
Member Author

Thanks for reviewing @KuntaiDu ! I've addressed your comments

Collaborator

@ivanium ivanium left a comment


Overall the PR looks good to me too! I just have one question. Right now we still disable the hybrid allocator whenever kv_transfer_config is not None. When we eventually support hybrid allocation per KV connector, what’s the best practice you envision? For example, if we want hybrid allocation enabled only for a specific subset of connectors, will there be a mechanism for developers to enforce “hybrid allocator + chosen connector” together?


@KuntaiDu
Collaborator

KuntaiDu commented Dec 7, 2025

> Overall the PR looks good to me too! I just have one question. Right now we still disable the hybrid allocator whenever kv_transfer_config is not None. When we eventually support hybrid allocation per KV connector, what’s the best practice you envision? For example, if we want hybrid allocation enabled only for a specific subset of connectors, will there be a mechanism for developers to enforce “hybrid allocator + chosen connector” together?

Good question. I need to think a bit about this. Let me get back to you later.

Signed-off-by: NickLucche <nlucches@redhat.com>
This reverts commit 250ea870740e99ced7b496d61d4c1056d6b98156.

Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
@NickLucche NickLucche force-pushed the enable-hma-kv-connector branch from cc6977a to 4d59565 Compare December 9, 2025 17:42
Comment thread vllm/config/vllm.py Outdated

# Runtime-dependent disable of hybrid kv cache manager logic.
if not self.scheduler_config.disable_hybrid_kv_cache_manager:
prev_disable_hma = self.scheduler_config.disable_hybrid_kv_cache_manager
Collaborator

What about:

need_disable_hybrid_kv_cache_manager = False
if not current_platform.support_hybrid_kv_cache():
    need_disable_hybrid_kv_cache_manager = True
if ***:
    need_disable_hybrid_kv_cache_manager = True
...
if self.scheduler_config.disable_hybrid_kv_cache_manager is None:
    if self.kv_transfer_config is not None:
        # Experimental feature. Default to disable but allow users to enable.
        need_disable_hybrid_kv_cache_manager = True
        logger.warning(***)
    self.scheduler_config.disable_hybrid_kv_cache_manager = need_disable_hybrid_kv_cache_manager
elif self.scheduler_config.disable_hybrid_kv_cache_manager is False:
    if need_disable_hybrid_kv_cache_manager:
        raise xxxx

I feel prev_disable_hma is a little bit hacky.
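
The control flow proposed above could look roughly like the following when written as a standalone function. This is a sketch under stated assumptions, not the merged implementation: `decide_disable_hma`, `incompatible_reasons`, and `has_kv_transfer_config` are hypothetical names, the elided checks are represented by a caller-supplied list of reasons, and the exception type is borrowed from the `NotImplementedError` suggested earlier in the review, since the proposal leaves it unspecified.

```python
import logging
from typing import Optional, Sequence

logger = logging.getLogger(__name__)


def decide_disable_hma(
    user_value: Optional[bool],
    incompatible_reasons: Sequence[str],
    has_kv_transfer_config: bool,
) -> bool:
    """Return the final value for disable_hybrid_kv_cache_manager.

    incompatible_reasons lists features known not to work with the hybrid
    manager (for example an unsupported platform); empty means none found.
    """
    need_disable = len(incompatible_reasons) > 0

    if user_value is None:
        # No user preference: decide automatically.
        if has_kv_transfer_config:
            # Experimental combination: off by default, opt-in allowed.
            need_disable = True
            logger.warning(
                "Hybrid KV cache manager disabled by default because a "
                "KV connector is configured."
            )
        return need_disable

    if user_value is False and need_disable:
        # The user explicitly asked for HMA but something cannot support it.
        raise NotImplementedError(
            "Hybrid KV cache manager is explicitly enabled but is not "
            "compatible with: " + ", ".join(incompatible_reasons)
        )
    return user_value
```

Note that an explicit opt-in together with a KV connector is allowed through here; only the hard incompatibilities raise.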

Member Author

@NickLucche NickLucche Dec 10, 2025


@heheda12345 I am sorry but I don't see how checking whether a bool was flipped here is hacky.
EDIT: I think I see what you mean now, you're referring to clarity. Don't have a strong opinion on that, will check it out when I have the time.

I've separated features that do NOT work with HMA from features that may work with HMA such as kv connector (a later check on supports_hma is carried out when attempting to create the actual connector).

Attempting to enable HMA explicitly on the former will crash the server. prev_disable_hma is just for checking whether the flag was set by the user without adding more lines such as need_disable_hybrid_kv_cache_manager.
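
The later connector-side check mentioned above can be pictured as a marker-class test at construction time. The thread only states that a connector should subclass `SupportsHMA` from kv_connector/v1/base.py; everything else in this sketch, including the `*Sketch` classes, `supports_hma`, `create_connector`, and the choice of `ValueError`, is a hypothetical illustration rather than vLLM's actual code.

```python
class KVConnectorBaseSketch:
    """Hypothetical stand-in for a KV connector base class."""


class SupportsHMASketch:
    """Hypothetical marker mixin: subclassing it declares HMA support."""


class PlainConnector(KVConnectorBaseSketch):
    """A connector that has not opted in to HMA."""


class HMAConnector(KVConnectorBaseSketch, SupportsHMASketch):
    """A connector that declares HMA support through the marker."""


def supports_hma(connector_cls: type) -> bool:
    """True when the connector class opts in via the marker mixin."""
    return issubclass(connector_cls, SupportsHMASketch)


def create_connector(connector_cls: type, hma_enabled: bool):
    """Instantiate a connector, refusing unsupported HMA combinations."""
    if hma_enabled and not supports_hma(connector_cls):
        raise ValueError(
            f"{connector_cls.__name__} does not declare HMA support; "
            "subclass the marker class or disable the hybrid manager."
        )
    return connector_cls()
```

Splitting the checks this way keeps config-time validation limited to features that can never work with HMA, while per-connector support is decided where the connector is built.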

thameem-abbas added a commit to thameem-abbas/vllm that referenced this pull request Dec 10, 2025
Signed-off-by: NickLucche <nlucches@redhat.com>
@NickLucche
Member Author

NickLucche commented Dec 11, 2025

@heheda12345 I have addressed your review, hopefully that improves clarity. Thanks for looking into it!

Collaborator

@heheda12345 heheda12345 left a comment


LGTM! Thank you very much.

@heheda12345 heheda12345 enabled auto-merge (squash) December 11, 2025 17:06
@heheda12345 heheda12345 merged commit 185c22b into vllm-project:main Dec 15, 2025
51 checks passed
Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Dec 15, 2025
…cator + KV cache connector (vllm-project#29805)

Signed-off-by: NickLucche <nlucches@redhat.com>
joa-stdn pushed a commit to joa-stdn/vllm that referenced this pull request Dec 15, 2025
…cator + KV cache connector (vllm-project#29805)

Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Joachim Studnia <joachim@mistral.ai>
Majid-Taheri pushed a commit to Majid-Taheri/vllm that referenced this pull request Dec 23, 2025
…cator + KV cache connector (vllm-project#29805)

Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Ubuntu <mjtaheri68@gmail.com>
mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026
…cator + KV cache connector (vllm-project#29805)

Signed-off-by: NickLucche <nlucches@redhat.com>
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
…cator + KV cache connector (vllm-project#29805)

Signed-off-by: NickLucche <nlucches@redhat.com>
0826joyce pushed a commit to 0826joyce/vllm-serving-optimization that referenced this pull request May 19, 2026
…cator + KV cache connector (vllm-project#29805)

Signed-off-by: NickLucche <nlucches@redhat.com>

Labels

kv-connector ready ONLY add when PR is ready to merge/full CI is needed v1


4 participants