[0.18.0][BugFix] Update capture sizes after rounding operations#8380
Conversation
Summary of Changes: This pull request addresses a bug where capture sizes became mismatched following rounding operations in speculative or sp modes. By shifting from statically cached sizes to real-time descriptors provided by the dispatcher, it ensures that graph parameters stay aligned with the actual state of the model runner.
Signed-off-by: Zetong Li <slippersss@126.com>
Force-pushed 13602a4 to 959eddf (compare)
Code Review
Suggested PR Title:

```markdown
[Ops][Misc] Use dynamic capture sizes for ACL graph parameter initialization
```

Suggested PR Summary:

```markdown
### What this PR does / why we need it?
This pull request updates the `_check_and_update_cudagraph_mode` method to retrieve capture sizes from the `cudagraph_dispatcher`. This ensures that `set_graph_params` and `set_draft_graph_params` are initialized with the correct token counts for ACL graph execution instead of using static batch sizes. A review comment suggests simplifying the set comprehension used to extract these sizes for better readability.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with existing tests.
```
```python
capture_descs = self.cudagraph_dispatcher.get_capture_descs()
capture_sizes = sorted({
    desc.num_tokens
    for _, descs in capture_descs
    for desc in descs
})
```
It has a point, but it's not that important.
Merged b72ade9 into vllm-project:releases/v0.18.0
What this PR does / why we need it?
This PR is partially cherry-picked from #8172.
This PR aims to fix mismatched capture sizes after rounding operations when using sp or speculative decoding. The root cause is that the original `self.cudagraph_batch_sizes` is no longer updated and remains at its initial values. Now we use `self.cudagraph_dispatcher.get_capture_descs` to get the up-to-date sizes.
Does this PR introduce any user-facing change?
N/A
How was this patch tested?
By CI.
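The mismatch described above can be illustrated with a small hypothetical sketch: a size list cached at init time goes stale once a rounding step (here, a simple round-up-to-power-of-two, which is only an illustrative stand-in for the actual rounding) changes the sizes that graphs are captured for:

```python
def round_up_to_pow2(n: int) -> int:
    """Illustrative rounding step; the real operation may differ."""
    p = 1
    while p < n:
        p *= 2
    return p


initial_sizes = [1, 3, 6, 12]  # cached once at initialization

# Sizes actually captured after rounding: dedup and sort, as the PR does.
rounded = sorted({round_up_to_pow2(n) for n in initial_sizes})

print(rounded)  # [1, 4, 8, 16]
# A consumer still reading the stale cache would see [1, 3, 6, 12],
# mismatching the graphs that were actually captured for `rounded`.
# Re-deriving sizes from the dispatcher's descriptors avoids this.
```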