[Misc][Hybrid allocator + kv connector] Optionally enable hybrid allocator + KV cache connector#29805
Conversation
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
There was a problem hiding this comment.
Code Review
This PR introduces a toggle for enabling HMA with KV connectors. The implementation is straightforward. However, there is a risk of runtime crashes if the toggle is enabled for a connector that does not support HMA. I've suggested a safeguard to prevent this by checking if the configured connector is LMCache, which is currently the only supported one. This change would also require a small update to the new unit test.
| "your connector by making sure your connector is a subclass" | ||
| " of `SupportsHMA` defined in kv_connector/v1/base.py." | ||
| ) | ||
| self.scheduler_config.disable_hybrid_kv_cache_manager = True |
There was a problem hiding this comment.
what about making disable_hybrid_kv_cache_manager a bool | None so that we can follow the user-specified config if it is not None and decide it automatically if it is None. Like this:
num_gpu_blocks_override: int | None = None
|
@heheda12345 I have addressed your comment. However I am not sure this is simplifying things. Let me know what you think. PS: |
061f3e1 to
f78e350
Compare
| prev_disable_hma is False | ||
| and self.scheduler_config.disable_hybrid_kv_cache_manager is True | ||
| ): | ||
| logger.info( |
There was a problem hiding this comment.
For this code path, maybe we can raise a NotImplementedError with some explanation instead of a warning? Because for me if the user intentionally set --no-disable-hybrid-manager they are not expecting vLLM falling back to non-hybrid-allocator code path.
Explanation example:
Hybrid KV cache manager is explicitly enabled, but currently `--kv-events-config` is set and KV events code path is not compatible with hybrid kv cache manager.
There was a problem hiding this comment.
I see your point, the idea was that I didn't want this PR to affect existing behavior.
I felt like logging now and then swapping to a NotImplementedError later on when we've given users time to omit --no--disable-hybrid-manager, would actually be a more desirable ux.
There was a problem hiding this comment.
Would logger.warning better fit here?
There was a problem hiding this comment.
Oh I thought this code path will only be triggered when --no-disable-hybrid-manager is set. Let me double check.
There was a problem hiding this comment.
chatted offline about this, we're raising exception on this PR
KuntaiDu
left a comment
There was a problem hiding this comment.
Thanks for the contribution!
See comments.
|
Thanks for reviewing @KuntaiDu ! I've addressed your comments |
ivanium
left a comment
There was a problem hiding this comment.
Overall the PR looks good to me too! I just have one question. Right now we still disable the hybrid allocator whenever kv_transfer_config is not None. When we eventually support hybrid allocation per KV connector, what’s the best practice you envision? For example, if we want hybrid allocation enabled only for a specific subset of connectors, will there be a mechanism for developers to enforce “hybrid allocator + chosen connector” together?
| prev_disable_hma is False | ||
| and self.scheduler_config.disable_hybrid_kv_cache_manager is True | ||
| ): | ||
| logger.info( |
There was a problem hiding this comment.
Would logger.warning better fit here?
Good question. I need to think a bit about this. Let me get back to you later. |
Signed-off-by: NickLucche <nlucches@redhat.com>
This reverts commit 250ea870740e99ced7b496d61d4c1056d6b98156. Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
cc6977a to
4d59565
Compare
|
|
||
| # Runtime-dependent disable of hybrid kv cache manager logic. | ||
| if not self.scheduler_config.disable_hybrid_kv_cache_manager: | ||
| prev_disable_hma = self.scheduler_config.disable_hybrid_kv_cache_manager |
There was a problem hiding this comment.
What about:
need_disable_hybrid_kv_cache_manager = False
if not current_platform.support_hybrid_kv_cache():
need_disable_hybrid_kv_cache_manager = True
if ***: need_disable_hybrid_kv_cache_manager = True
...
if self.scheduler_config.disable_hybrid_kv_cache_manager is None:
if self.kv_transfer_config is not None:
# Experimental feature. Default to disable but allow users to enable.
need_disable_hybrid_kv_cache_manager = True
logger.warning(***)
self.scheduler_config.disable_hybrid_kv_cache_manager = need_disable_hybrid_kv_cache_manager
elif self.scheduler_config.disable_hybrid_kv_cache_manager == False:
if need_disable_hybrid_kv_cache_manager: raise xxxx
I feel prev_disable_hma is a little bit hacky.
There was a problem hiding this comment.
@heheda12345 I am sorry but I don't see how checking whether a bool was flipped here is hacky.
EDIT: I think I see what you mean now, you're referring to clarity. Don't have a strong opinion on that, will check it out when I have the time.
I've separated features that do NOT work with HMA from features that may work with HMA such as kv connector (a later check on supports_hma is carried out when attempting to create the actual connector).
Attempting to enable HMA explicitly on the former will crash the server. prev_disable_hma is just for checking whether the flag was set by the user without adding more lines such as need_disable_hybrid_kv_cache_manager.
Signed-off-by: NickLucche <nlucches@redhat.com>
|
@heheda12345 I have addressed your review, hopefully that improves clarity. Thanks for looking into it! |
heheda12345
left a comment
There was a problem hiding this comment.
LGTM! Thank you very much.
…cator + KV cache connector (vllm-project#29805) Signed-off-by: NickLucche <nlucches@redhat.com>
…cator + KV cache connector (vllm-project#29805) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Joachim Studnia <joachim@mistral.ai>
…cator + KV cache connector (vllm-project#29805) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Ubuntu <mjtaheri68@gmail.com>
…cator + KV cache connector (vllm-project#29805) Signed-off-by: NickLucche <nlucches@redhat.com>
…cator + KV cache connector (vllm-project#29805) Signed-off-by: NickLucche <nlucches@redhat.com>
…cator + KV cache connector (vllm-project#29805) Signed-off-by: NickLucche <nlucches@redhat.com>
…cator + KV cache connector (vllm-project#29805) Signed-off-by: NickLucche <nlucches@redhat.com>
This PR contains a simple toggle for enabling the experimental HMA + KVConnector integration, needed to build actual HMA support into existing connectors.
This feature is still under development so only the LMCache will actually work with it out of the box.
cc @KuntaiDu @heheda12345 @ivanium