
[PD][Bugfix] Fix KV Cache sharing with HMA #44629

Open
NickLucche wants to merge 3 commits into `vllm-project:main` from `NickLucche:fix-pd-kv-sharing`


@NickLucche (Member)

As of #41847, HMA is the default way to serve models, even when a KV connector is provided.

Unfortunately, we missed covering an important feature: layers that share the same KV cache tensor. On main, this currently crashes when the connector attempts to fetch the corresponding `layer_spec` for such a layer.
The fix is straightforward, because at setup time we already ensure that layers that do not need their own KV cache are not present in `kv_cache_spec`:

```python
kv_cache_spec: dict[str, KVCacheSpec] = {}
layer_type = cast(type[Any], AttentionLayerBase)
attn_layers = get_layers_from_vllm_config(self.vllm_config, layer_type)
for layer_name, attn_module in attn_layers.items():
    if isinstance(attn_module, Attention) and (
        kv_tgt_layer := attn_module.kv_sharing_target_layer_name
    ):
        # The layer doesn't need its own KV cache and will use that of
        # the target layer. We skip creating a KVCacheSpec for it, so
        # that KV cache management logic will act as this layer does
        # not exist, and doesn't allocate KV cache for the layer. This
        # enables the memory saving of cross-layer kv sharing, allowing
        # a given amount of memory to accommodate longer context lengths
        # or enable more requests to be processed simultaneously.
        self.shared_kv_cache_layers[layer_name] = kv_tgt_layer
        continue
    # Skip modules that don't need KV cache (eg encoder-only attention)
    if spec := attn_module.get_kv_cache_spec(self.vllm_config):
        kv_cache_spec[layer_name] = spec
```
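To make the setup-time behavior concrete, here is a toy, self-contained version of that loop. `ToyAttention` and `build_kv_cache_spec` are hypothetical stand-ins, not vLLM APIs, and the spec is a plain string rather than a real `KVCacheSpec`. The point it illustrates is that a shared layer ends up only in the shared-layer map and never in the spec dict.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyAttention:
    """Hypothetical stand-in for an attention layer (not vLLM's Attention)."""
    kv_sharing_target_layer_name: Optional[str] = None
    needs_kv_cache: bool = True

    def get_kv_cache_spec(self) -> Optional[str]:
        # Placeholder spec, or None for layers without a KV cache
        # (e.g. encoder-only attention).
        return "full_attention" if self.needs_kv_cache else None


def build_kv_cache_spec(
    attn_layers: dict[str, ToyAttention],
) -> tuple[dict[str, str], dict[str, str]]:
    kv_cache_spec: dict[str, str] = {}
    shared_kv_cache_layers: dict[str, str] = {}
    for layer_name, attn_module in attn_layers.items():
        if kv_tgt_layer := attn_module.kv_sharing_target_layer_name:
            # Reuses the target layer's cache: no spec, no allocation.
            shared_kv_cache_layers[layer_name] = kv_tgt_layer
            continue
        if spec := attn_module.get_kv_cache_spec():
            kv_cache_spec[layer_name] = spec
    return kv_cache_spec, shared_kv_cache_layers


layers = {
    "layers.0.attn": ToyAttention(),
    "layers.1.attn": ToyAttention(),
    "layers.2.attn": ToyAttention(kv_sharing_target_layer_name="layers.0.attn"),
    "encoder.0.attn": ToyAttention(needs_kv_cache=False),
}
spec, shared = build_kv_cache_spec(layers)
print(sorted(spec))  # ['layers.0.attn', 'layers.1.attn']
print(shared)        # {'layers.2.attn': 'layers.0.attn'}
```

Any code that later indexes the spec dict by a shared layer's name (here `layers.2.attn`) would raise `KeyError`, which is the shape of the crash described above.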

All the PD connector has to do is skip registration for those layers, as it did before HMA support was added.
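A minimal sketch of that skip, under stated assumptions: the connector receives a dict of per-layer caches that also contains the shared layers (aliased to their target's tensor), plus the spec dict and the shared-layer map from setup. The function name `register_kv_caches` and its signature here are hypothetical and do not reproduce the actual connector code in this PR.

```python
from typing import Any


def register_kv_caches(
    kv_caches: dict[str, Any],
    kv_cache_spec: dict[str, str],
    shared_kv_cache_layers: dict[str, str],
) -> dict[str, Any]:
    """Hypothetical sketch: choose which layers a PD connector registers."""
    registered: dict[str, Any] = {}
    for layer_name, cache in kv_caches.items():
        if layer_name in shared_kv_cache_layers:
            # Shared layer: it aliases its target's tensor and has no entry
            # in kv_cache_spec, so looking up its spec would raise KeyError.
            # Skip it; the target layer is registered on its own.
            continue
        layer_spec = kv_cache_spec[layer_name]
        registered[layer_name] = (layer_spec, cache)
    return registered


target_cache = object()  # placeholder for a KV cache tensor
kv_caches = {"layers.0.attn": target_cache, "layers.2.attn": target_cache}
out = register_kv_caches(
    kv_caches,
    kv_cache_spec={"layers.0.attn": "full_attention"},
    shared_kv_cache_layers={"layers.2.attn": "layers.0.attn"},
)
print(list(out))  # ['layers.0.attn']
```

Skipping by membership in the shared-layer map keeps the transfer path unchanged for every other layer, and since a shared layer points at the same memory as its target, no KV data is lost by not registering it twice.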

Tested with a `google/gemma-4-E2B-it` PD deployment on GSM8K (5-shot):

`local-chat-completions ({'model': 'google/gemma-4-E2B-it', 'base_url': 'http://127.0.0.1:25068/v1/chat/completions', 'tokenizer_backend': 'huggingface', 'max_concurrency': 100}), gen_kwargs: ({}), limit: None, num_fewshot: 5, batch_size: 1`
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.6573|±  |0.0131|
|     |       |strict-match    |     5|exact_match|↑  |0.4860|±  |0.0138|

I also added a test case to our eval suite.

Signed-off-by: NickLucche <nlucches@redhat.com>

@claude (bot) left a comment


Claude Code Review

This pull request is from a fork, so automated review is disabled. A repository maintainer can comment `@claude review` to run a one-time review.

@mergify (bot) added the `v1`, `bug`, and `kv-connector` labels on Jun 5, 2026
@mergify (bot) commented on Jun 5, 2026

Hi @NickLucche, the pre-commit checks have failed. Please run:

```shell
# Quote the version specifier so the shell does not treat ">" as a redirect
uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip: Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

```shell
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
```

