
[Feature] Keep HMA enabled for supported KV connectors#41644

Open
arpera wants to merge 1 commit into vllm-project:main from arpera:qwen35-hma-connector-default

Conversation

@arpera
Contributor

arpera commented May 4, 2026

Motivation

vLLM currently disables the hybrid KV cache manager by default whenever kv_transfer_config is set, unless the user explicitly passes --no-disable-hybrid-kv-cache-manager. That is conservative for connectors that do not support HMA, but it also disables HMA for connectors like NixlConnector that already advertise SupportsHMA.

This change preserves the conservative default for unsupported connectors while allowing HMA to stay enabled by default when the selected connector explicitly supports it.
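The decision described above can be sketched as follows. This is an illustrative sketch only, not vLLM's actual code path: the `SupportsHMA` interface is mentioned in this PR, but the function name, the `supports_hma` attribute, and the config shape here are hypothetical placeholders.

```python
# Hypothetical sketch of the proposed HMA default; only the SupportsHMA
# concept comes from the PR, every name here is illustrative.

def disable_hybrid_kv_cache_manager_by_default(
    kv_transfer_config, connector_cls
) -> bool:
    """Return True when HMA should be turned off by default."""
    if kv_transfer_config is None:
        # No KV connector configured: HMA stays enabled as usual.
        return False
    # Old behavior: unconditionally True here.
    # Proposed behavior: keep HMA enabled when the selected connector
    # explicitly advertises HMA support.
    return not getattr(connector_cls, "supports_hma", False)
```

Users who want the old behavior for a supporting connector can still disable HMA explicitly; the change only affects the default.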

Test Results

Hardware: 4xGB200

Functional

python -m pytest tests/v1/core/test_kv_cache_utils.py::test_hma_not_disabled_for_supported_kv_connector tests/v1/core/test_kv_cache_utils.py::test_hma_disabled_for_unsupported_kv_connector -v — passed.

Performance

Not measured. This only changes the default HMA decision for KV connectors that already declare HMA support.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>

@claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@gemini-code-assist
Contributor

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@arpera
Contributor Author

arpera commented May 4, 2026

/gemini review

@gemini-code-assist
Contributor

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@arpera
Contributor Author

arpera commented May 4, 2026

@vadiklyutiy, please have a look.

@orozery
Collaborator

orozery commented May 5, 2026

cc @NickLucche
This will break the multi connector.

@orozery orozery requested review from NickLucche and removed request for ywang96 May 5, 2026 08:32
@arpera
Contributor Author

arpera commented May 5, 2026

@orozery, could you please explain the issue with the multi connector in a bit more detail?

@orozery
Collaborator

orozery commented May 5, 2026

@orozery, could you please explain the issue with the multi connector in a bit more detail?

See discussion in #39571.
The multi connector currently assumes that HMA will be off by default if kv_transfer_config is present.
Breaking this assumption would lead to a possible case where HMA is enabled though one of the sub-connectors of the multi-connector does not support HMA.
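One way to reconcile the two behaviors would be for the multi connector to advertise HMA support only when every sub-connector does. This is a hedged sketch of that idea under stated assumptions, not vLLM's actual MultiConnector API; the function and attribute names are hypothetical.

```python
# Illustrative sketch only: a composite connector could claim HMA
# support as the conjunction of its members' flags, so a single
# unsupported sub-connector keeps HMA disabled for the whole group.

def multi_connector_supports_hma(sub_connectors) -> bool:
    return all(getattr(c, "supports_hma", False) for c in sub_connectors)
```

With an aggregation rule like this, the proposed default would never enable HMA for a multi connector containing an unsupported member.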

Collaborator

@NickLucche left a comment


Thanks for contributing @arpera.
Let's chat before changing current hma opt-in policy.

@arpera
Contributor Author

arpera commented May 5, 2026

Sure, I am open to discussion. If there are any concerns about this change, please feel free to raise them here.

