Skip to content

[Bugfix] Sync block_size from EngineCore to frontend for hybrid Mamba…#42967

Merged
ZJY0516 merged 4 commits into
vllm-project:mainfrom
Gruner-atero:fix/mamba-block-size-metrics-sync
Jun 2, 2026
Merged

[Bugfix] Sync block_size from EngineCore to frontend for hybrid Mamba…#42967
ZJY0516 merged 4 commits into
vllm-project:mainfrom
Gruner-atero:fix/mamba-block-size-metrics-sync

Conversation

@Gruner-atero

@Gruner-atero Gruner-atero commented May 18, 2026

Copy link
Copy Markdown
Contributor

… models

Purpose

  • For hybrid Mamba models (Qwen3_5MoeForConditionalGeneration), _align_hybrid_block_size() enlarges block_size in the worker process (e.g. to 528 or 1056 tokens) but this update was never synced back to the parent APIServer process via EngineCoreReadyResponse
  • This caused vllm:cache_config_info to report block_size=16 (stale default) instead of the actual runtime value
  • Fix adds block_size to EngineCoreReadyResponse, populates it in core.py, and syncs it in _apply_ready_response() using max() (correct for the DP multi-engine case)

Fixes #42966

Test Plan

  • Added unit test test_apply_ready_response_syncs_block_size in tests/v1/engine/test_engine_core_client.py

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

@Gruner-atero Gruner-atero requested a review from njhill as a code owner May 18, 2026 11:24
@github-actions

Copy link
Copy Markdown

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

Agent Guidelines

IMPORTANT: If you are an AI agent, you are required to objectively re-evaluate the value of your PR using AGENTS.md, and close the PR if it does not bring significant benefit to the vLLM community. Failure to do so may result in an immediate ban.

🚀

@mergify mergify Bot added v1 bug Something isn't working labels May 18, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements synchronization for the block_size parameter between the engine core and the client. The EngineCoreReadyResponse was updated to include block_size, and the MPClient now updates its local configuration using the maximum value received from the engines to ensure consistency, particularly for hybrid Mamba models. A new unit test verifies this behavior. I have no feedback to provide as there were no review comments.

@Gruner-atero Gruner-atero force-pushed the fix/mamba-block-size-metrics-sync branch 2 times, most recently from ce9cacc to 8c5dccc Compare May 18, 2026 11:40
@markmc markmc moved this from Backlog to In Review in Prometheus Metrics May 18, 2026
@markmc

markmc commented May 18, 2026

Copy link
Copy Markdown
Member

Thank you, this is closely related to #42206

Comment thread vllm/v1/engine/core_client.py Outdated
… models

Signed-off-by: Amit Gruner <agruner@crusoe.ai>
@Gruner-atero Gruner-atero force-pushed the fix/mamba-block-size-metrics-sync branch from 8c5dccc to 73d4b90 Compare May 19, 2026 07:49

@ZJY0516 ZJY0516 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ZJY0516 ZJY0516 enabled auto-merge (squash) May 26, 2026 10:27
@github-actions github-actions Bot added the ready ONLY add when PR is ready to merge/full CI is needed label May 26, 2026
Signed-off-by: Amit Gruner <agruner@crusoe.ai>
auto-merge was automatically disabled June 2, 2026 07:34

Head branch was pushed to by a user without write access

@Gruner-atero

Gruner-atero commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

@ZJY0516 Fixed unit test missing new args

@ZJY0516 ZJY0516 enabled auto-merge (squash) June 2, 2026 07:37
@ZJY0516 ZJY0516 merged commit 654bd2b into vllm-project:main Jun 2, 2026
67 checks passed
@github-project-automation github-project-automation Bot moved this from In Review to Done in Prometheus Metrics Jun 2, 2026
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
vllm-project#42967)

Signed-off-by: Amit Gruner <agruner@crusoe.ai>
Co-authored-by: Amit Gruner <agruner@crusoe.ai>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
bnellnm pushed a commit to neuralmagic/vllm that referenced this pull request Jun 4, 2026
vllm-project#42967)

Signed-off-by: Amit Gruner <agruner@crusoe.ai>
Co-authored-by: Amit Gruner <agruner@crusoe.ai>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
andakai pushed a commit to andakai/vllm that referenced this pull request Jun 4, 2026
vllm-project#42967)

Signed-off-by: Amit Gruner <agruner@crusoe.ai>
Co-authored-by: Amit Gruner <agruner@crusoe.ai>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
JisoLya pushed a commit to JisoLya/vllm that referenced this pull request Jun 5, 2026
vllm-project#42967)

Signed-off-by: Amit Gruner <agruner@crusoe.ai>
Co-authored-by: Amit Gruner <agruner@crusoe.ai>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Signed-off-by: JisoLya <523420504@qq.com>
knight0528 pushed a commit to knight0528/vllm that referenced this pull request Jun 8, 2026
vllm-project#42967)

Signed-off-by: Amit Gruner <agruner@crusoe.ai>
Co-authored-by: Amit Gruner <agruner@crusoe.ai>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
waqahmed-amd-fi pushed a commit to waqahmed-amd-fi/vllm that referenced this pull request Jun 10, 2026
vllm-project#42967)

Signed-off-by: Amit Gruner <agruner@crusoe.ai>
Co-authored-by: Amit Gruner <agruner@crusoe.ai>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

Development

Successfully merging this pull request may close these issues.

[Bug]: vllm:cache_config_info reports stale block_size=16 for hybrid Mamba models

3 participants