Skip to content

Revert "[HMA] [KVEvent] Enable GPU-side KV events for HMA" (#37688)#39658

Draft
vllm-agent wants to merge 1 commit intovllm-project:mainfrom
vllm-agent:auto-revert/pr-37688
Draft

Revert "[HMA] [KVEvent] Enable GPU-side KV events for HMA" (#37688)#39658
vllm-agent wants to merge 1 commit intovllm-project:mainfrom
vllm-agent:auto-revert/pr-37688

Conversation

@vllm-agent
Copy link
Copy Markdown

Revert of #37688

This reverts commit cc07dad.

Reason: This PR is linked to 1 new CI failure in build #60970:

  • MultiConnector (Nixl+Offloading) PD edge cases (2 GPUs): test_full_decode_gpu_cache_hit_metrics — local_cache_hit off-by-one (expected 128, got 127.0)

Original PR: #37688


Auto-generated by CI failure analyzer

@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Apr 13, 2026

Documentation preview: https://vllm--39658.org.readthedocs.build/en/39658/

@mergify mergify Bot added documentation Improvements or additions to documentation v1 labels Apr 13, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request removes the group_idx field from KV cache events (BlockStored and BlockRemoved) and updates associated tests and logic across the codebase. It also introduces a change to disable the hybrid KV cache manager when KV events are configured, as they are currently incompatible. Feedback suggests refining the logic in vllm/config/vllm.py to only disable the hybrid KV cache manager if KV events are explicitly enabled, rather than just checking for the presence of the configuration object.

Comment thread vllm/config/vllm.py
Comment on lines +1242 to +1244
if self.kv_events_config is not None:
# Hybrid KV cache manager is not compatible with KV events.
need_disable_hybrid_kv_cache_manager = True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current logic disables the hybrid KV cache manager (HMA) whenever kv_events_config is present, regardless of whether KV events are actually enabled. This can lead to unnecessary performance regressions for hybrid models if the configuration object exists but enable_kv_cache_events is set to False. It is better to check the enable_kv_cache_events flag specifically.

Suggested change
if self.kv_events_config is not None:
# Hybrid KV cache manager is not compatible with KV events.
need_disable_hybrid_kv_cache_manager = True
if self.kv_events_config is not None and self.kv_events_config.enable_kv_cache_events:
# Hybrid KV cache manager is not compatible with KV events.
need_disable_hybrid_kv_cache_manager = True

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant