[Misc][Hybrid allocator + kv connector] Optionally enable hybrid allocator + KV cache connector by NickLucche · Pull Request #29805 · vllm-project/vllm

NickLucche · 2025-12-01T19:00:44Z

This PR adds a simple toggle for enabling the experimental HMA (hybrid allocator) + KVConnector integration, which is needed to build actual HMA support into existing connectors.
This feature is still under development, so only LMCache will work with it out of the box.

cc @KuntaiDu @heheda12345 @ivanium

chatgpt-codex-connector · 2025-12-01T19:00:53Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

gemini-code-assist

Code Review

This PR introduces a toggle for enabling HMA with KV connectors. The implementation is straightforward. However, there is a risk of runtime crashes if the toggle is enabled for a connector that does not support HMA. I've suggested a safeguard to prevent this by checking if the configured connector is LMCache, which is currently the only supported one. This change would also require a small update to the new unit test.

heheda12345 · 2025-12-02T19:09:19Z

+                        "your connector by making sure your connector is a subclass"
+                        " of `SupportsHMA` defined in kv_connector/v1/base.py."
+                    )
+                    self.scheduler_config.disable_hybrid_kv_cache_manager = True


What about making `disable_hybrid_kv_cache_manager` a `bool | None`, so that we follow the user-specified config when it is not None and decide automatically when it is None? Like this:

num_gpu_blocks_override: int | None = None
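
The tri-state idea suggested above can be sketched as follows. This is a minimal illustration, not the code from the PR: `SchedulerConfigSketch` and `resolve_disable_hma` are hypothetical names, and `auto_default` stands in for whatever value the automatic checks would pick.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SchedulerConfigSketch:
    """Hypothetical stand-in for the real scheduler config."""

    # None  -> the user expressed no preference; decide automatically.
    # True  -> the user forced the hybrid KV cache manager off.
    # False -> the user explicitly asked for the hybrid KV cache manager.
    disable_hybrid_kv_cache_manager: bool | None = None


def resolve_disable_hma(config: SchedulerConfigSketch, auto_default: bool) -> bool:
    """Return the effective setting, honouring an explicit user choice."""
    if config.disable_hybrid_kv_cache_manager is None:
        # No preference given: fall back to the automatically chosen value.
        return auto_default
    return config.disable_hybrid_kv_cache_manager
```

With this shape, an unset option and an explicit `False` are distinguishable, which a plain `bool` with a default cannot express.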

NickLucche · 2025-12-04T13:06:07Z

@heheda12345 I have addressed your comment.
I have split the handling into two cases, forcing HMA off and not expressing a preference, so that the current logic stays unchanged for everything except the KV connector (the scope of this PR).
We can move options into the `if disable_hybrid_kv_cache_manager is None` branch more gracefully in separate PRs with better context.

However, I am not sure this simplifies things. Let me know what you think.

PS: --no-disable-hybrid-kv-cache-manager must be used to explicitly enable HMA.
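
As an illustration of how a paired `--flag` / `--no-flag` option can yield three states, here is a sketch using the standard library's `argparse.BooleanOptionalAction`. It is an assumption that this mirrors how vLLM's own argument parser behaves; `parse_flag` is a hypothetical helper.

```python
from __future__ import annotations

import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--disable-hybrid-kv-cache-manager",
    # Registers both --disable-... and --no-disable-... for the same option.
    action=argparse.BooleanOptionalAction,
    # None means the user passed neither form.
    default=None,
)


def parse_flag(argv: list[str]) -> bool | None:
    """Parse argv and return the tri-state value of the option."""
    return parser.parse_args(argv).disable_hybrid_kv_cache_manager
```

Passing no flag gives `None`, `--disable-hybrid-kv-cache-manager` gives `True`, and `--no-disable-hybrid-kv-cache-manager` gives `False`, which is the explicit-enable case.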

KuntaiDu · 2025-12-04T22:50:05Z

+                prev_disable_hma is False
+                and self.scheduler_config.disable_hybrid_kv_cache_manager is True
+            ):
+                logger.info(


For this code path, maybe we can raise a `NotImplementedError` with some explanation instead of a warning? If the user intentionally sets `--no-disable-hybrid-kv-cache-manager`, they do not expect vLLM to fall back to the non-hybrid-allocator code path.

Explanation example:

Hybrid KV cache manager is explicitly enabled, but currently `--kv-events-config` is set and KV events code path is not compatible with hybrid kv cache manager.

I see your point. The idea was that I didn't want this PR to affect existing behavior.
I felt that logging now, then swapping to a `NotImplementedError` later once users have had time to drop `--no-disable-hybrid-kv-cache-manager`, would be a more desirable UX.

Would logger.warning better fit here?

Oh, I thought this code path would only be triggered when `--no-disable-hybrid-kv-cache-manager` is set. Let me double-check.

Chatted offline about this; we're raising an exception in this PR.
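
The outcome of this thread (raise when HMA is explicitly enabled alongside an incompatible feature) can be sketched as below. This is an assumption-laden illustration, not the merged code: `effective_disable_hma` and the `blockers` list are hypothetical, and the message only follows the explanation example given earlier in the thread.

```python
from __future__ import annotations


def effective_disable_hma(user_choice: bool | None, blockers: list[str]) -> bool:
    """Return whether the hybrid KV cache manager ends up disabled.

    user_choice: None = no preference, True = forced off, False = explicitly on.
    blockers: names of enabled features assumed to be incompatible with HMA.
    """
    if user_choice is False and blockers:
        # Explicit enable plus an incompatible feature: fail loudly rather
        # than silently falling back to the non-hybrid code path.
        raise NotImplementedError(
            "Hybrid KV cache manager is explicitly enabled, but it is not "
            f"compatible with: {', '.join(blockers)}."
        )
    if user_choice is None:
        # No preference: disable only when something requires it.
        return bool(blockers)
    return user_choice
```

Users who never pass the flag keep the silent fallback, so existing behavior is unaffected; only the explicit-enable case becomes an error.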

KuntaiDu

Thanks for the contribution!
See comments.

NickLucche · 2025-12-05T16:37:08Z

Thanks for reviewing @KuntaiDu ! I've addressed your comments

ivanium

Overall the PR looks good to me too! I just have one question. Right now we still disable the hybrid allocator whenever kv_transfer_config is not None. When we eventually support hybrid allocation per KV connector, what’s the best practice you envision? For example, if we want hybrid allocation enabled only for a specific subset of connectors, will there be a mechanism for developers to enforce “hybrid allocator + chosen connector” together?

ivanium · 2025-12-06T00:10:05Z

+                prev_disable_hma is False
+                and self.scheduler_config.disable_hybrid_kv_cache_manager is True
+            ):
+                logger.info(


Would logger.warning better fit here?

KuntaiDu · 2025-12-07T07:14:03Z

> Overall the PR looks good to me too! I just have one question. Right now we still disable the hybrid allocator whenever kv_transfer_config is not None. When we eventually support hybrid allocation per KV connector, what’s the best practice you envision? For example, if we want hybrid allocation enabled only for a specific subset of connectors, will there be a mechanism for developers to enforce “hybrid allocator + chosen connector” together?

Good question. I need to think a bit about this. Let me get back to you later.

heheda12345 · 2025-12-09T19:55:59Z


+        # Runtime-dependent disable of hybrid kv cache manager logic.
        if not self.scheduler_config.disable_hybrid_kv_cache_manager:
+            prev_disable_hma = self.scheduler_config.disable_hybrid_kv_cache_manager


What about:

    need_disable_hybrid_kv_cache_manager = False
    if not current_platform.support_hybrid_kv_cache():
        need_disable_hybrid_kv_cache_manager = True
    if ***:
        need_disable_hybrid_kv_cache_manager = True
    ...
    if self.scheduler_config.disable_hybrid_kv_cache_manager is None:
        if self.kv_transfer_config is not None:
            # Experimental feature. Default to disable but allow users to enable.
            need_disable_hybrid_kv_cache_manager = True
            logger.warning(***)
        self.scheduler_config.disable_hybrid_kv_cache_manager = need_disable_hybrid_kv_cache_manager
    elif self.scheduler_config.disable_hybrid_kv_cache_manager == False:
        if need_disable_hybrid_kv_cache_manager:
            raise xxxx

I feel prev_disable_hma is a little bit hacky.

@heheda12345 I am sorry ~~but I don't see how checking whether a bool was flipped here is hacky~~.
EDIT: I think I see what you mean now, you're referring to clarity. Don't have a strong opinion on that, will check it out when I have the time.

I've separated features that do NOT work with HMA from features that may work with HMA such as kv connector (a later check on supports_hma is carried out when attempting to create the actual connector).

Attempting to enable HMA explicitly on the former will crash the server. prev_disable_hma is just for checking whether the flag was set by the user without adding more lines such as need_disable_hybrid_kv_cache_manager.
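
The later connector-level check mentioned above can be pictured roughly as follows. This is only a sketch under stated assumptions: the `SupportsHMA` class here is a hypothetical empty marker (the real one is defined in kv_connector/v1/base.py and may require more), and `create_connector`, both connector classes, and the choice of `ValueError` are made up for illustration.

```python
class SupportsHMA:
    """Hypothetical marker base class standing in for the real SupportsHMA."""


class BaseConnectorSketch:
    """Hypothetical base class for KV connectors."""


class HMAReadyConnector(BaseConnectorSketch, SupportsHMA):
    """A connector that opts in to HMA by subclassing the marker."""


class LegacyConnector(BaseConnectorSketch):
    """A connector that has not been updated for HMA."""


def create_connector(connector_cls: type, hma_enabled: bool) -> BaseConnectorSketch:
    """Instantiate a connector, rejecting non-HMA connectors when HMA is on."""
    if hma_enabled and not issubclass(connector_cls, SupportsHMA):
        # Error type is an assumption; the point is to fail at creation time.
        raise ValueError(
            f"{connector_cls.__name__} does not subclass SupportsHMA, "
            "so it cannot be used with the hybrid KV cache manager."
        )
    return connector_cls()
```

Keeping this check at connector creation means the config step only has to reject features that can never work with HMA, while connectors decide individually by opting in.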

NickLucche · 2025-12-11T16:26:51Z

@heheda12345 I have addressed your review, hopefully that improves clarity. Thanks for looking into it!

heheda12345

LGTM! Thank you very much.

NickLucche requested review from ApostaC, ProExpertProg, WoosukKwon, hmellor, houseroad, mgoin, robertgshaw2-redhat, tlrmchlsmth, yewentao256 and youkaichao as code owners December 1, 2025 19:00

mergify Bot added the v1 and kv-connector labels Dec 1, 2025

gemini-code-assist Bot reviewed Dec 1, 2025

Comment thread vllm/config/vllm.py Outdated

heheda12345 reviewed Dec 2, 2025

NickLucche added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) Dec 4, 2025

NickLucche force-pushed the enable-hma-kv-connector branch from 061f3e1 to f78e350 Compare December 4, 2025 16:43

KuntaiDu reviewed Dec 4, 2025

Comment thread vllm/config/vllm.py Outdated

KuntaiDu reviewed Dec 4, 2025

ivanium reviewed Dec 6, 2025

NickLucche added 6 commits December 9, 2025 17:42

enable hma+kv_conn

a376cf4

Signed-off-by: NickLucche <nlucches@redhat.com>

Revert "enable hma+kv_conn"

95d4971

This reverts commit 250ea870740e99ced7b496d61d4c1056d6b98156. Signed-off-by: NickLucche <nlucches@redhat.com>

optional bool flag

2142e46

Signed-off-by: NickLucche <nlucches@redhat.com>

comment

664db48

Signed-off-by: NickLucche <nlucches@redhat.com>

rasing

e854448

Signed-off-by: NickLucche <nlucches@redhat.com>

comment

4d59565

Signed-off-by: NickLucche <nlucches@redhat.com>

NickLucche force-pushed the enable-hma-kv-connector branch from cc6977a to 4d59565 Compare December 9, 2025 17:42

heheda12345 reviewed Dec 9, 2025

thameem-abbas added a commit to thameem-abbas/vllm that referenced this pull request Dec 10, 2025

Merge PR vllm-project#29805: enable-hma-kv-connector

442a590

chen review

c965044

Signed-off-by: NickLucche <nlucches@redhat.com>

heheda12345 approved these changes Dec 11, 2025

heheda12345 enabled auto-merge (squash) December 11, 2025 17:06

Merge branch 'main' into enable-hma-kv-connector

2d1535d

heheda12345 merged commit 185c22b into vllm-project:main Dec 15, 2025
51 checks passed

Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Dec 15, 2025

[Misc][Hybrid allocator + kv connector] Optionally enable hybrid allo…

b69b5d7

…cator + KV cache connector (vllm-project#29805) Signed-off-by: NickLucche <nlucches@redhat.com>

v1b3coder mentioned this pull request May 6, 2026

[Feature]: --kv-transfer-config unconditionally disables HMA, ignoring SupportsHMA on the connector #41830

Closed

chfeng-cs mentioned this pull request May 6, 2026

[KV Transfer] Enable HMA by default for connectors that support it #41847

Merged

mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026

[Misc][Hybrid allocator + kv connector] Optionally enable hybrid allo…

b0a3fe3

…cator + KV cache connector (vllm-project#29805) Signed-off-by: NickLucche <nlucches@redhat.com>

0826joyce pushed a commit to 0826joyce/vllm-serving-optimization that referenced this pull request May 19, 2026

[Misc][Hybrid allocator + kv connector] Optionally enable hybrid allo…

c345225

…cator + KV cache connector (vllm-project#29805) Signed-off-by: NickLucche <nlucches@redhat.com>
