[Feature] Keep HMA enabled for supported KV connectors #41644
arpera wants to merge 1 commit into vllm-project:main
Conversation
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
@vadiklyutiy, please have a look.

cc @NickLucche

@orozery, could you please explain the issue with the multi connector a bit more?
See discussion in #39571. |
NickLucche left a comment
Thanks for contributing @arpera.
Let's chat before changing current hma opt-in policy.
Sure, I am open to discussion. If there are any concerns about this change, please feel free to discuss them here.
Motivation
vLLM currently disables the hybrid KV cache manager by default whenever `kv_transfer_config` is set, unless the user explicitly passes `--no-disable-hybrid-kv-cache-manager`. That is conservative for connectors that do not support HMA, but it also disables HMA for connectors like `NixlConnector` that already advertise `SupportsHMA`. This change preserves the conservative default for unsupported connectors while allowing HMA to stay enabled by default when the selected connector explicitly supports it.
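The decision described above can be sketched as follows. This is an illustrative sketch only, not vLLM's actual code: the helper `should_disable_hma`, the `KVTransferConfig` stand-in, and the `HMA_SUPPORTED_CONNECTORS` set are hypothetical names; the real implementation checks the connector class for the `SupportsHMA` interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KVTransferConfig:
    # Hypothetical stand-in for vLLM's KV transfer config.
    kv_connector: str


# Connectors assumed here to declare HMA support (e.g. NixlConnector,
# which advertises SupportsHMA). Illustrative set, not the real registry.
HMA_SUPPORTED_CONNECTORS = {"NixlConnector"}


def should_disable_hma(
    kv_transfer_config: Optional[KVTransferConfig],
    user_override: Optional[bool],
) -> bool:
    """Return True if the hybrid KV cache manager should be disabled."""
    if user_override is not None:
        # An explicit flag such as --no-disable-hybrid-kv-cache-manager
        # always wins over the default policy.
        return user_override
    if kv_transfer_config is None:
        # No KV transfer configured: HMA stays enabled.
        return False
    # New behavior: keep the conservative default (disable HMA) only for
    # connectors that do not declare HMA support.
    return kv_transfer_config.kv_connector not in HMA_SUPPORTED_CONNECTORS
```

With this policy, a supported connector keeps HMA enabled by default, an unsupported one still disables it, and the explicit CLI flag overrides both cases.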
Test Results
Hardware: 4xGB200
Functional

`python -m pytest tests/v1/core/test_kv_cache_utils.py::test_hma_not_disabled_for_supported_kv_connector tests/v1/core/test_kv_cache_utils.py::test_hma_disabled_for_unsupported_kv_connector -v` — passed.

Performance
Not measured. This only changes the default HMA decision for KV connectors that already declare HMA support.
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.