[bugfix][npugraph_ex]add the extra check for allreduce rmsnorm fusion pass by ChenCangtao · Pull Request #6430 · vllm-project/vllm-ascend

ChenCangtao · 2026-01-30T09:09:29Z

What this PR does / why we need it?

Allreduce rmsnorm fusion pass has an additional check condition, which requires fusion of the Fx graph only when the start of compile_range is greater than 512. We previously overlooked this check.

Does this PR introduce any user-facing change?

How was this patch tested?

vLLM version: v0.14.1
vLLM main: vllm-project/vllm@dc917cc

gemini-code-assist

Code Review

This pull request adds a missing check to the allreduce rmsnorm fusion pass to ensure it only runs when compile_range.start is above a certain threshold. The implementation correctly adds this check to the extra_check function for the torchair pattern. However, I've identified a critical issue where an existing stream scope check is replaced instead of being augmented, which could lead to incorrect fusions. I've provided a suggestion to combine both checks. Additionally, I've noted a code duplication issue that should be addressed to improve maintainability.

gemini-code-assist · 2026-01-30T09:11:05Z

+def extra_check_for_allreduce_rmsnorm_fusion_pass(match: Match) -> bool:
+    compile_range = get_pass_context().compile_range
+    return compile_range.start > ALLREDUCE_NORM_FUSE_THREHOLD


This new function introduces the intended check, but there are a couple of concerns:

(Critical) The PR description mentions adding an additional check, but the current implementation replaces the existing extra_stream_scope_check. This check prevents fusion across different streams and is likely a necessary safety guardrail that should be preserved. It seems both checks are required. This function should probably also call extra_stream_scope_check, which would require re-adding its import.

(High) The condition compile_range.start > ALLREDUCE_NORM_FUSE_THREHOLD is also present in GraphEXMatmulAllReduceAddRMSNormPass.is_applicable_for_range at the end of this file. This code duplication can lead to maintenance issues. It would be best to extract this logic into a shared helper function to be used in both places.

My suggestion below addresses the critical issue. Please also consider refactoring to address the code duplication.

Suggested change

def extra_check_for_allreduce_rmsnorm_fusion_pass(match: Match) -> bool:

compile_range = get_pass_context().compile_range

return compile_range.start > ALLREDUCE_NORM_FUSE_THREHOLD

def extra_check_for_allreduce_rmsnorm_fusion_pass(match: Match) -> bool:

compile_range = get_pass_context().compile_range

return extra_stream_scope_check(match) and compile_range.start > ALLREDUCE_NORM_FUSE_THREHOLD

github-actions · 2026-01-30T09:13:25Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

A PR should do only one thing, smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

Signed-off-by: chencangtao <chencangtao@huawei.com>

…to qwen3next_rebase * 'main' of https://github.com/vllm-project/vllm-ascend: (59 commits) [Feat.]: 310p support MOE models (vllm-project#6530) [Doc] backport 0.13.0 release note (vllm-project#6584) [CI] Update UT CANN version to 8.5.0 for main branch (vllm-project#6564) [CI] Change A2 runner (vllm-project#6557) [Bugfix] Fix the incorrect use of the output parameter in _forward_fia_slidingwindow (vllm-project#6469) [main2main] upgrade vllm main 0202 (vllm-project#6560) [CI][npugraph_ex]Fix npugraph ex e2e test (vllm-project#6553) [Feature]KV pool supports sparse attention (vllm-project#6339) [bugfix]Fix accuracy issue in PCP/DCP with speculative decoding (vllm-project#6491) perf: adaptive block size selection in linear_persistent kernel (vllm-project#6537) [ModelRunner][Fix] Pads query_start_loc to satisfy FIA/TND constraint (vllm-project#6475) [Bugfix]Fix of Pooling Code and Update of Pooling Usage Guide (vllm-project#6126) [Fusion] Add rmsnorm dynamic quant fusion pass (vllm-project#6274) [Bugfix] Synchronize only the current stream to avoid device sync (vllm-project#6432) [CI] Add long and short prompt tests for DeepSeek-V3.2 (vllm-project#6499) [Refactor] MLP weight prefetch to consistency with MoE Model's prefetching in terms of code and usage (vllm-project#6442) [bugfix][npugraph_ex]duplicate pattern issue (vllm-project#6513) [bugfix][npugraph_ex]add the extra check for allreduce rmsnorm fusion pass (vllm-project#6430) [Quant] GLM4.7-Flash Support W8A8 (vllm-project#6492) [Nightly][BugFix] Remove kv_cache nz test case for test_mla_preprocess_nq.py (vllm-project#6505) ...

… pass (vllm-project#6430) ### What this PR does / why we need it? Allreduce rmsnorm fusion pass has an additional check condition, which requires fusion of the Fx graph only when the start of compile_range is greater than 512. We previously overlooked this check. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.14.1 - vLLM main: vllm-project/vllm@dc917cc --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com> Signed-off-by: momochenchuw <chenchuw@huawei.com>

… pass (vllm-project#6430) ### What this PR does / why we need it? Allreduce rmsnorm fusion pass has an additional check condition, which requires fusion of the Fx graph only when the start of compile_range is greater than 512. We previously overlooked this check. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.14.1 - vLLM main: vllm-project/vllm@dc917cc --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>

… pass (vllm-project#6430) ### What this PR does / why we need it? Allreduce rmsnorm fusion pass has an additional check condition, which requires fusion of the Fx graph only when the start of compile_range is greater than 512. We previously overlooked this check. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.14.1 - vLLM main: vllm-project/vllm@dc917cc --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com>

… pass (vllm-project#6430) ### What this PR does / why we need it? Allreduce rmsnorm fusion pass has an additional check condition, which requires fusion of the Fx graph only when the start of compile_range is greater than 512. We previously overlooked this check. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.14.1 - vLLM main: vllm-project/vllm@dc917cc --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>

… pass (vllm-project#6430) ### What this PR does / why we need it? Allreduce rmsnorm fusion pass has an additional check condition, which requires fusion of the Fx graph only when the start of compile_range is greater than 512. We previously overlooked this check. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.14.1 - vLLM main: vllm-project/vllm@dc917cc --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com>

ChenCangtao requested a review from yiz-liu as a code owner January 30, 2026 09:09

gemini-code-assist bot reviewed Jan 30, 2026

View reviewed changes

add the extra check for allreduce rmsnorm fusion pass

275ad33

Signed-off-by: chencangtao <chencangtao@huawei.com>

ChenCangtao force-pushed the check_allreduce_rmsnorm_fusion_pass branch from 6aa5a87 to 275ad33 Compare January 30, 2026 09:49

chencangtao added 2 commits January 31, 2026 17:33

Merge branch 'main' into check_allreduce_rmsnorm_fusion_pass

e7af89d

add extra_stream_scope_check back

c009e39

Signed-off-by: chencangtao <chencangtao@huawei.com>

wangxiyuan approved these changes Feb 4, 2026

View reviewed changes

wangxiyuan merged commit 7b3921c into vllm-project:main Feb 4, 2026
24 checks passed

wangxiyuan mentioned this pull request Feb 24, 2026

[Misc]: test #6787

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[bugfix][npugraph_ex]add the extra check for allreduce rmsnorm fusion pass#6430

[bugfix][npugraph_ex]add the extra check for allreduce rmsnorm fusion pass#6430
wangxiyuan merged 3 commits intovllm-project:mainfrom
ChenCangtao:check_allreduce_rmsnorm_fusion_pass

ChenCangtao commented Jan 30, 2026 •

edited by github-actions bot

Loading

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Jan 30, 2026

Uh oh!

github-actions bot commented Jan 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ChenCangtao commented Jan 30, 2026 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Jan 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ChenCangtao commented Jan 30, 2026 •

edited by github-actions bot

Loading