
[Misc][v0.13.0] Removes unnecessary graph size re-initialization #6281

Merged
wangxiyuan merged 2 commits into vllm-project:releases/v0.13.0 from Angazenn:bugfix_dev
Jan 27, 2026
Conversation

@Angazenn
Collaborator

@Angazenn Angazenn commented Jan 26, 2026

What this PR does / why we need it?

Cherry-pick from #6280.
This PR removes `update_default_aclgraph_sizes`. In earlier versions, we added this function to override the default `cudagraph_capture_sizes` because `_npu_paged_attention` degraded significantly on certain shapes that are included in vLLM's default `cudagraph_capture_sizes`. Now that FIA is the default attention op (which does not show this degradation), the override is no longer needed. Worse, it can conflict with a user-specified `cudagraph_capture_sizes` containing values < 20.
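For context, the removed helper amounted to rewriting the capture-size list before graph capture. The sketch below is a hypothetical, simplified reconstruction of that behavior; the function name is taken from the PR, but the size-20 floor and the list-filtering logic are illustrative assumptions, not the actual code in `vllm_ascend/utils.py` (which also special-cased Qwen3-MoE dp settings):

```python
# Illustrative sketch only: the real update_default_aclgraph_sizes in
# vllm_ascend/utils.py was more involved; the floor of 20 here stands in
# for the shapes on which _npu_paged_attention degraded.
DEFAULT_CAPTURE_SIZES = [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64]

def update_default_aclgraph_sizes(capture_sizes, min_size=20):
    """Drop capture sizes below min_size, where the old paged-attention
    kernel was slow on Ascend hardware."""
    return [s for s in capture_sizes if s >= min_size]

# Applied to the defaults, the override simply trims the small sizes:
print(update_default_aclgraph_sizes(DEFAULT_CAPTURE_SIZES))
# But a user who deliberately configures only small sizes (< 20) ends up
# with an empty list -- the conflict the PR description mentions:
print(update_default_aclgraph_sizes([1, 2, 4, 8, 16]))
```

With FIA as the default attention op the small sizes no longer need filtering, so deleting the override removes this failure mode entirely.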

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: Angazenn <supperccell@163.com>
@Angazenn Angazenn added the ready (ready for review) label Jan 26, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

The pull request removes the update_default_aclgraph_sizes function and its associated test patch. While the PR title suggests this re-initialization is unnecessary, the original comments within the removed code indicated that this function was crucial for optimizing ACL graph capture sizes for Ascend hardware and specifically addressed performance degradation for Qwen3-MoE models. If these underlying issues have been resolved through other mechanisms, it would be beneficial to explicitly state this to ensure that performance and correctness are not negatively impacted by this removal.


vllm_ascend/platform.py (219-222)

critical

The removal of update_default_aclgraph_sizes(vllm_config) is concerning given the original comment stating its purpose was to "improve default performance" and address cases where "default cudagraph_capture_sizes are not friendly to ascend ops && hardwares." Without this update, there's a high risk of performance degradation or incorrect behavior if the default sizes are indeed suboptimal for Ascend. Please confirm that the issues previously addressed by this function are now resolved by other means, or that the function was indeed redundant.

vllm_ascend/utils.py (432-485)

critical

The complete removal of update_default_aclgraph_sizes and _is_default_capture_sizes is a critical change. The update_default_aclgraph_sizes function explicitly aimed to make ACL graph sizes "more friendly to ascend ops && hardware" and specifically handled performance issues for "Qwen3-MoE models on dp settings." If the problems this function was designed to solve still exist, its removal could lead to severe performance regressions or functional errors. It's crucial to confirm that the issues previously addressed by this function are now resolved by other means, or that the function was indeed redundant.

@Angazenn Angazenn added the ready-for-test (start test by label for PR) label Jan 26, 2026
Signed-off-by: Angazenn <supperccell@163.com>
@wangxiyuan wangxiyuan merged commit 744d1e6 into vllm-project:releases/v0.13.0 Jan 27, 2026
12 checks passed
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
3 participants