
[bugfix][npugraph_ex]add the extra check for allreduce rmsnorm fusion pass#6430

Merged
wangxiyuan merged 3 commits intovllm-project:mainfrom
ChenCangtao:check_allreduce_rmsnorm_fusion_pass
Feb 4, 2026


@ChenCangtao
Contributor

@ChenCangtao ChenCangtao commented Jan 30, 2026

What this PR does / why we need it?

The allreduce + RMSNorm fusion pass has an additional applicability condition: the FX graph should only be fused when the start of compile_range is greater than 512. This check was previously overlooked.
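As an illustration of the condition described above, here is a minimal sketch of the gate. It assumes only what the description states: a compile range with a `start` attribute and a strict threshold of 512. `CompileRange`, `FUSE_THRESHOLD`, and `should_fuse` are hypothetical names for this sketch, not the vllm-ascend API. The reviewed code reads the range via `get_pass_context().compile_range` and compares it against `ALLREDUCE_NORM_FUSE_THREHOLD`.

```python
from dataclasses import dataclass

# Hypothetical name for this sketch; the value 512 comes from the PR description.
FUSE_THRESHOLD = 512


@dataclass(frozen=True)
class CompileRange:
    """Hypothetical stand-in for the compile range exposed by the pass context."""
    start: int
    end: int


def should_fuse(compile_range: CompileRange) -> bool:
    # Strict comparison: a range that starts exactly at 512 is not fused.
    return compile_range.start > FUSE_THRESHOLD


for r in (CompileRange(1, 512), CompileRange(512, 1024), CompileRange(513, 8192)):
    print(r, "->", should_fuse(r))
```

The loop prints `False`, `False`, `True`. Because the comparison is strict, the boundary value 512 itself is excluded.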

Does this PR introduce any user-facing change?

How was this patch tested?

@ChenCangtao ChenCangtao requested a review from yiz-liu as a code owner January 30, 2026 09:09
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds a missing check to the allreduce rmsnorm fusion pass to ensure it only runs when compile_range.start is above a certain threshold. The implementation correctly adds this check to the extra_check function for the torchair pattern. However, I've identified a critical issue where an existing stream scope check is replaced instead of being augmented, which could lead to incorrect fusions. I've provided a suggestion to combine both checks. Additionally, I've noted a code duplication issue that should be addressed to improve maintainability.

Comment on lines +30 to +32
```python
def extra_check_for_allreduce_rmsnorm_fusion_pass(match: Match) -> bool:
    compile_range = get_pass_context().compile_range
    return compile_range.start > ALLREDUCE_NORM_FUSE_THREHOLD
```
Contributor


critical

This new function introduces the intended check, but there are a couple of concerns:

  1. (Critical) The PR description mentions adding an additional check, but the current implementation replaces the existing extra_stream_scope_check. This check prevents fusion across different streams and is likely a necessary safety guardrail that should be preserved. It seems both checks are required. This function should probably also call extra_stream_scope_check, which would require re-adding its import.

  2. (High) The condition compile_range.start > ALLREDUCE_NORM_FUSE_THREHOLD is also present in GraphEXMatmulAllReduceAddRMSNormPass.is_applicable_for_range at the end of this file. This code duplication can lead to maintenance issues. It would be best to extract this logic into a shared helper function to be used in both places.

My suggestion below addresses the critical issue. Please also consider refactoring to address the code duplication.

Suggested change
```diff
 def extra_check_for_allreduce_rmsnorm_fusion_pass(match: Match) -> bool:
     compile_range = get_pass_context().compile_range
-    return compile_range.start > ALLREDUCE_NORM_FUSE_THREHOLD
+    return extra_stream_scope_check(match) and compile_range.start > ALLREDUCE_NORM_FUSE_THREHOLD
```
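Both review points can be combined in one sketch: keep the stream-scope guard alongside the new range guard, and move the threshold comparison into a single helper that both the extra check and `is_applicable_for_range` call. Everything here is a hypothetical stand-in (`FakeMatch`, `stream_scope_ok`, `range_allows_allreduce_norm_fusion`, the pass class). In particular, `stream_scope_ok` only approximates the idea of `extra_stream_scope_check`, assuming it means "all matched nodes are on one stream"; it does not reproduce that function's implementation.

```python
from dataclasses import dataclass, field
from typing import List

FUSE_THRESHOLD = 512  # hypothetical name; the reviewed code uses ALLREDUCE_NORM_FUSE_THREHOLD


@dataclass(frozen=True)
class CompileRange:
    """Hypothetical stand-in for get_pass_context().compile_range."""
    start: int
    end: int


@dataclass
class FakeMatch:
    """Hypothetical stand-in for a pattern-matcher Match; records the
    stream each matched node was placed on."""
    node_streams: List[str] = field(default_factory=list)


def range_allows_allreduce_norm_fusion(compile_range: CompileRange) -> bool:
    # Shared helper (review point 2): the threshold lives in one place.
    return compile_range.start > FUSE_THRESHOLD


def stream_scope_ok(match: FakeMatch) -> bool:
    # Hypothetical stand-in for extra_stream_scope_check: every matched
    # node must sit on the same stream.
    return len(set(match.node_streams)) <= 1


def extra_check(match: FakeMatch, compile_range: CompileRange) -> bool:
    # Review point 1: keep the stream guard AND add the range guard.
    # `and` short-circuits, so the range is only consulted if streams agree.
    return stream_scope_ok(match) and range_allows_allreduce_norm_fusion(compile_range)


class MatmulAllReduceAddRMSNormPassSketch:
    """Hypothetical pass whose range check reuses the shared helper."""

    def is_applicable_for_range(self, compile_range: CompileRange) -> bool:
        return range_allows_allreduce_norm_fusion(compile_range)


big, small = CompileRange(1024, 2048), CompileRange(1, 512)
print(extra_check(FakeMatch(["s0", "s0"]), big))    # True
print(extra_check(FakeMatch(["s0", "s1"]), big))    # False: streams differ
print(extra_check(FakeMatch(["s0"]), small))        # False: range too small
print(MatmulAllReduceAddRMSNormPassSketch().is_applicable_for_range(small))  # False
```

The sketch passes the compile range in explicitly only to keep it testable. The code under review instead reads the range from the pass context inside the check.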

@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally by following the Contributing and Testing docs.
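To illustrate the unit-test point for a change like this one, a boundary test for the 512 threshold might look roughly like the following. It uses the standard-library `unittest` module and restates the gate as a hypothetical `should_fuse` function. It is not the project's test suite and assumes nothing about where tests live in the repository.

```python
import unittest

FUSE_THRESHOLD = 512  # hypothetical name; value from the PR description


def should_fuse(compile_range_start: int) -> bool:
    """Hypothetical restatement of the range gate added by this PR."""
    return compile_range_start > FUSE_THRESHOLD


class TestAllReduceRMSNormRangeGate(unittest.TestCase):
    def test_small_ranges_are_not_fused(self):
        for start in (1, 256, 511):
            self.assertFalse(should_fuse(start))

    def test_boundary_is_exclusive(self):
        # 512 itself must not trigger fusion because the check is strict.
        self.assertFalse(should_fuse(512))

    def test_large_ranges_are_fused(self):
        for start in (513, 1024, 8192):
            self.assertTrue(should_fuse(start))


if __name__ == "__main__":
    unittest.main(argv=["range_gate_test"], exit=False)
```

Testing the exact boundary value is the important case here, because it distinguishes `>` from `>=`.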

Signed-off-by: chencangtao <chencangtao@huawei.com>
@ChenCangtao ChenCangtao force-pushed the check_allreduce_rmsnorm_fusion_pass branch from 6aa5a87 to 275ad33 January 30, 2026 09:49
chencangtao added 2 commits January 31, 2026 17:33
@wangxiyuan wangxiyuan merged commit 7b3921c into vllm-project:main Feb 4, 2026
24 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Feb 6, 2026
…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (59 commits)
  [Feat.]: 310p support MOE models (vllm-project#6530)
  [Doc] backport 0.13.0 release note (vllm-project#6584)
  [CI] Update UT CANN version to 8.5.0 for main branch (vllm-project#6564)
  [CI] Change A2 runner (vllm-project#6557)
  [Bugfix] Fix the incorrect use of the output parameter in _forward_fia_slidingwindow (vllm-project#6469)
  [main2main] upgrade vllm main 0202 (vllm-project#6560)
  [CI][npugraph_ex]Fix npugraph ex e2e test (vllm-project#6553)
  [Feature]KV pool supports sparse attention (vllm-project#6339)
  [bugfix]Fix accuracy issue in PCP/DCP with speculative decoding (vllm-project#6491)
  perf: adaptive block size selection in linear_persistent kernel (vllm-project#6537)
  [ModelRunner][Fix] Pads query_start_loc to satisfy FIA/TND constraint (vllm-project#6475)
  [Bugfix]Fix of Pooling Code and Update of Pooling Usage Guide (vllm-project#6126)
  [Fusion] Add rmsnorm dynamic quant fusion pass (vllm-project#6274)
  [Bugfix] Synchronize only the current stream to avoid device sync (vllm-project#6432)
  [CI] Add long and short prompt tests for DeepSeek-V3.2 (vllm-project#6499)
  [Refactor] MLP weight prefetch to consistency with MoE Model's prefetching in terms of code and usage (vllm-project#6442)
  [bugfix][npugraph_ex]duplicate pattern issue (vllm-project#6513)
  [bugfix][npugraph_ex]add the extra check for allreduce rmsnorm fusion pass (vllm-project#6430)
  [Quant] GLM4.7-Flash Support W8A8 (vllm-project#6492)
  [Nightly][BugFix] Remove kv_cache nz test case for test_mla_preprocess_nq.py (vllm-project#6505)
  ...
chenchuw886 pushed a commit to chenchuw886/vllm-ascend that referenced this pull request Feb 12, 2026
… pass (vllm-project#6430)

### What this PR does / why we need it?
Allreduce rmsnorm fusion pass has an additional check condition, which
requires fusion of the Fx graph only when the start of compile_range is
greater than 512. We previously overlooked this check.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.14.1
- vLLM main:
vllm-project/vllm@dc917cc

---------

Signed-off-by: chencangtao <chencangtao@huawei.com>
Co-authored-by: chencangtao <chencangtao@huawei.com>
Signed-off-by: momochenchuw <chenchuw@huawei.com>
@wangxiyuan wangxiyuan mentioned this pull request Feb 24, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
jiangyunfan1 pushed a commit to jiangyunfan1/vllm-ascend that referenced this pull request Apr 9, 2026

2 participants