[Misc][v0.13.0]Removes unnecessary graph size re-initialization#6281
[Misc][v0.13.0]Removes unnecessary graph size re-initialization#6281wangxiyuan merged 2 commits intovllm-project:releases/v0.13.0from
Conversation
There was a problem hiding this comment.
Code Review
The pull request removes the update_default_aclgraph_sizes function and its associated test patch. While the PR title suggests this re-initialization is unnecessary, the original comments within the removed code indicated that this function was crucial for optimizing ACL graph capture sizes for Ascend hardware and specifically addressed performance degradation for Qwen3-MoE models. If these underlying issues have been resolved through other mechanisms, it would be beneficial to explicitly state this to ensure that performance and correctness are not negatively impacted by this removal.
I am having trouble creating individual review comments. Click here to see my feedback.
vllm_ascend/platform.py (219-222)
The removal of update_default_aclgraph_sizes(vllm_config) is concerning given the original comment stating its purpose was to "improve default performance" and address cases where "default cudagraph_capture_sizes are not friendly to ascend ops && hardwares." Without this update, there's a high risk of performance degradation or incorrect behavior if the default sizes are indeed suboptimal for Ascend. Please confirm that the issues previously addressed by this function are now resolved by other means, or that the function was indeed redundant.
vllm_ascend/utils.py (432-485)
The complete removal of update_default_aclgraph_sizes and _is_default_capture_sizes is a critical change. The update_default_aclgraph_sizes function explicitly aimed to make ACL graph sizes "more friendly to ascend ops && hardware" and specifically handled performance issues for "Qwen3-MoE models on dp settings." If the problems this function was designed to solve still exist, its removal could lead to severe performance regressions or functional errors. It's crucial to confirm that the issues previously addressed by this function are now resolved by other means, or that the function was indeed redundant.
…-project#6281) ### What this PR does / why we need it? Cherry-pick from vllm-project#6280 . This PR removes `update_default_aclgraph_sizes`. In earlier versions, we add this function to change default `cudagraph_capture_sizes` because `_npu_paged_attention` degrades significantly on certain shapes (which is included in default `cudagraph_capture_sizes` of VLLM). Now since we use FIA as default attention op (which does not contain such performance degradation), there is no need to add this default change. Otherwise, it could cause some conflicts if we set a small `cudagraph_capture_sizes` that < 20 now. --------- Signed-off-by: Angazenn <supperccell@163.com>
What this PR does / why we need it?
Cherry-pick from #6280 .
This PR removes
update_default_aclgraph_sizes. In earlier versions, we add this function to change defaultcudagraph_capture_sizesbecause_npu_paged_attentiondegrades significantly on certain shapes (which is included in defaultcudagraph_capture_sizesof VLLM). Now since we use FIA as default attention op (which does not contain such performance degradation), there is no need to add this default change. Otherwise, it could cause some conflicts if we set a smallcudagraph_capture_sizesthat < 20 now.Does this PR introduce any user-facing change?
How was this patch tested?