[Fusion] [Graph]Add Matmul Allreduce Rmsnorm fusion Pass #5034

Merged
realliujiaxu merged 19 commits into vllm-project:main from wxsIcey:matmul_allreduce_addrmsnorm
Jan 19, 2026

Conversation

@wxsIcey
Collaborator

@wxsIcey wxsIcey commented Dec 15, 2025

This PR adds the `MatmulAllreduceRmsnorm` operator and introduces a graph fusion pass for `matmul_allreduce_rmsnorm` operations. The implementation includes a new configuration flag and a pattern-matching pass built on `torch._inductor.pattern_matcher`.

Co-authored-by: Trunrain 270250579@qq.com
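
To make the idea of a pattern-matching fusion pass concrete, here is a minimal, framework-free sketch. It is not the PR's implementation and does not use `torch._inductor.pattern_matcher`; the `Node` class, the op names, and `fuse_matmul_allreduce_rmsnorm` are hypothetical stand-ins, and the `enabled` argument only mimics the role of a configuration flag. It scans a toy op list for a matmul -> all_reduce -> add_rms_norm chain and replaces it with a single fused node.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One op in a toy, topologically ordered graph (hypothetical structure)."""
    op: str
    inputs: tuple  # names of producer nodes or graph inputs
    name: str


PATTERN = ("matmul", "all_reduce", "add_rms_norm")
FUSED_OP = "matmul_allreduce_add_rms_norm"  # hypothetical op name


def use_count(nodes, name):
    """Count how many nodes consume the value called `name`."""
    return sum(name in n.inputs for n in nodes)


def fuse_matmul_allreduce_rmsnorm(nodes, enabled=True):
    """Replace each matmul -> all_reduce -> add_rms_norm chain with one node."""
    if not enabled:  # stands in for a configuration flag
        return list(nodes)
    fused, i = [], 0
    while i < len(nodes):
        window = nodes[i:i + 3]
        matches = (
            tuple(n.op for n in window) == PATTERN
            and window[1].inputs == (window[0].name,)
            and window[2].inputs[0] == window[1].name
            # Intermediates must have no other consumers, or fusing drops them.
            and use_count(nodes, window[0].name) == 1
            and use_count(nodes, window[1].name) == 1
        )
        if matches:
            mm, _, norm = window
            # Fused node: matmul inputs plus the norm's residual and weight.
            fused.append(Node(FUSED_OP, mm.inputs + norm.inputs[1:], norm.name))
            i += 3
        else:
            fused.append(nodes[i])
            i += 1
    return fused


graph = [
    Node("matmul", ("x", "w_o"), "mm"),
    Node("all_reduce", ("mm",), "ar"),
    Node("add_rms_norm", ("ar", "residual", "gamma"), "norm"),
    Node("matmul", ("norm", "w_up"), "up"),
]
fused_graph = fuse_matmul_allreduce_rmsnorm(graph)
```

The sketch only matches three adjacent nodes in a flat list; a real graph pass matches on dataflow edges rather than list position, which is the part a pattern-matcher library takes care of.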

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new fusion pass for Matmul -> AllReduce -> RMSNorm to optimize performance on Ascend hardware. The changes include a new configuration flag, the fusion pass implementation, and its integration into the compilation process. My review has identified a few issues: there are some leftover debugging print statements that should be removed. More critically, the new fusion pass contains a bug where tensor parallel rank and world size are hardcoded to 0, which will cause failures in distributed setups. There are also some logging statements with inappropriately high severity levels that could flood production logs. I've provided suggestions to fix these issues.
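
For readers unfamiliar with the chain this pass targets, the sketch below simulates what Matmul -> AllReduce -> RMSNorm computes under tensor parallelism, using plain Python lists instead of tensors. It assumes a row-parallel linear layer (each rank holds a slice of the input dimension) followed by a residual add and RMSNorm; the function names and the `eps` default are illustrative and not taken from the PR. Each simulated rank slices by its own rank index and the actual world size, which suggests why fixing those values to constants would break multi-rank runs.

```python
import math


def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def all_reduce(per_rank):
    """Element-wise sum of every rank's partial result."""
    rows, cols = len(per_rank[0]), len(per_rank[0][0])
    return [[sum(p[r][c] for p in per_rank) for c in range(cols)]
            for r in range(rows)]


def add_rms_norm(x, residual, gamma, eps=1e-6):
    """Add the residual, then RMS-normalize each row and scale by gamma."""
    out, new_residual = [], []
    for x_row, r_row in zip(x, residual):
        h = [a + b for a, b in zip(x_row, r_row)]
        rms = math.sqrt(sum(v * v for v in h) / len(h) + eps)
        out.append([v / rms * g for v, g in zip(h, gamma)])
        new_residual.append(h)
    return out, new_residual


def row_parallel_linear_add_rms_norm(x, w, residual, gamma, world_size):
    """Simulate every rank's partial matmul, then all-reduce and normalize."""
    k = len(w)
    assert k % world_size == 0, "input dim must divide evenly across ranks"
    shard = k // world_size
    partials = []
    for rank in range(world_size):
        lo, hi = rank * shard, (rank + 1) * shard
        x_shard = [row[lo:hi] for row in x]  # this rank's input columns
        w_shard = w[lo:hi]                   # this rank's weight rows
        partials.append(matmul(x_shard, w_shard))
    return add_rms_norm(all_reduce(partials), residual, gamma)


x = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]]
w = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [3.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
residual = [[0.5, 0.5, 0.5], [1.0, 0.0, -1.0]]
gamma = [1.0, 2.0, 0.5]

single = row_parallel_linear_add_rms_norm(x, w, residual, gamma, world_size=1)
sharded = row_parallel_linear_add_rms_norm(x, w, residual, gamma, world_size=2)
```

With the same inputs, the single-rank and two-rank runs should agree up to floating-point rounding, which is the property a fused kernel has to preserve.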

Comment thread vllm_ascend/compilation/passes/allreduce_rmsnorm_fusion_pass.py Outdated
Comment thread vllm_ascend/compilation/compiler_interface.py Outdated
Comment thread vllm_ascend/compilation/passes/allreduce_rmsnorm_fusion_pass.py Outdated
@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@wxsIcey wxsIcey force-pushed the matmul_allreduce_addrmsnorm branch from a81097e to b35845a Compare December 22, 2025 04:55
@wxsIcey wxsIcey force-pushed the matmul_allreduce_addrmsnorm branch from b35845a to e29f3c2 Compare December 26, 2025 09:45
@wxsIcey wxsIcey marked this pull request as ready for review December 27, 2025 07:04
@wxsIcey wxsIcey force-pushed the matmul_allreduce_addrmsnorm branch from 4f2cfd7 to 012317c Compare January 5, 2026 09:26
@Trunrain Trunrain force-pushed the matmul_allreduce_addrmsnorm branch from 2bbc518 to 33ebc40 Compare January 6, 2026 02:02
Comment thread vllm_ascend/worker/model_runner_v1.py Outdated
Comment thread vllm_ascend/worker/model_runner_v1.py Outdated
Comment thread vllm_ascend/patch/worker/patch_unquantized_gemm.py
Comment thread vllm_ascend/compilation/passes/allreduce_rmsnorm_fusion_pass.py Outdated
Comment thread vllm_ascend/patch/worker/patch_distributed.py
@github-actions
Contributor

github-actions Bot commented Jan 8, 2026

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@Trunrain Trunrain force-pushed the matmul_allreduce_addrmsnorm branch 2 times, most recently from e911814 to 4a38204 Compare January 12, 2026 11:10
Comment thread vllm_ascend/compilation/passes/allreduce_rmsnorm_fusion_pass.py Outdated
        super().__init__(vllm_config)
        self.pattern_match_passes: PatternMatcherPass = PatternMatcherPass(
            pass_name="allreduce_rmsnorm_fusion_pass")

Collaborator Author

Should the pass name change?

Collaborator

I suggest leaving the pass and pattern names as MatmulAllReduceAddRMSNormPass and xxMatmulAllReduceAddRMSNormPattern, respectively, and using comments to explain that the fused operator actually splits the allreduce into a reducescatter and an allgather.

Collaborator Author

Thanks. I have modified it.
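
The reviewer's note above, that the fused operator splits the allreduce into a reducescatter and an allgather, rests on a general identity: a reduce-scatter followed by an all-gather yields the same result as an all-reduce. A minimal sketch, with plain Python lists standing in for per-rank buffers (the helper names are illustrative and not an actual communication API):

```python
def all_reduce(per_rank):
    """Every rank ends up with the element-wise sum of all buffers."""
    total = [sum(vals) for vals in zip(*per_rank)]
    return [list(total) for _ in per_rank]


def reduce_scatter(per_rank):
    """Rank r ends up with only the r-th chunk of the element-wise sum."""
    world = len(per_rank)
    n = len(per_rank[0])
    assert n % world == 0, "buffer length must divide evenly across ranks"
    chunk = n // world
    return [
        [sum(buf[i] for buf in per_rank)
         for i in range(r * chunk, (r + 1) * chunk)]
        for r in range(world)
    ]


def all_gather(chunks):
    """Every rank ends up with the concatenation of all ranks' chunks."""
    full = [v for c in chunks for v in c]
    return [list(full) for _ in chunks]


buffers = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]  # two ranks
via_split = all_gather(reduce_scatter(buffers))
direct = all_reduce(buffers)
```

Here `via_split` and `direct` hold the same values on every simulated rank; the split form differs only in that each rank briefly owns just its chunk of the reduced result between the two steps.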

Comment thread vllm_ascend/compilation/passes/allreduce_rmsnorm_fusion_pass.py Outdated
Comment thread vllm_ascend/patch/worker/__init__.py Outdated
Comment thread vllm_ascend/compilation/passes/allreduce_rmsnorm_fusion_pass.py Outdated
Comment thread vllm_ascend/compilation/passes/allreduce_rmsnorm_fusion_pass.py Outdated
Collaborator

@realliujiaxu realliujiaxu left a comment

What model was used for the verification? It would be great if you could add the performance data. This will allow users to assess whether the performance benefit meets their expectations.

Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: tongrunze <t00574058@china.huawei.com>
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: wxsIcey <1790571317@qq.com>
@realliujiaxu realliujiaxu merged commit c929bd1 into vllm-project:main Jan 19, 2026
20 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Jan 19, 2026
…to FIA_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (110 commits)
  [Performance] Remove index opetation when VLLM_ASCEND_FLASHCOMM2_PARALLEL_SIZE=1 (vllm-project#5936)
  [main][bugfix] fix mooncake kv cache transfer when one P has multi nodes (vllm-project#5960)
  [Feature] Adapt DispathGmmCombineDecode opertor to align with weight scale dtype of small operators. [RFC: issue 5476] (vllm-project#5755)
  [Refactor] Move AttentionSpec initialization to Attention module (vllm-project#5834)
  [EPLB][Bugfix] policy_swift_balancer bugfix and renaming (vllm-project#5897)
  [CI]fix for lint CI (vllm-project#5982)
  [Fusion] [Graph]Add Matmul Allreduce Rmsnorm fusion Pass (vllm-project#5034)
  [Refactor] Migrate profiler config from env vars to explicit ProfilerConfig (vllm-project#5928)
  [EPLB][Bugfix] Dispatch Allgather use log2phy if enable eplb (vllm-project#5933)
  [EPLB][Nightly][Bugfix] Get expert from moe layer only (vllm-project#5908)
  [Bugfix][MM] Fix multi-modal inference OOM issues by setting `expandable_segments:True` (vllm-project#5855)
  [doc]Table split  (vllm-project#5929)
  [Doc] Upgrade outdated ut doc (vllm-project#5937)
  [Lint]Style: Convert `vllm-ascend/` to ruff format(Batch vllm-project#2) (vllm-project#5977)
  Eagle3 mm support, enablement on qwen3vl (vllm-project#4848)
  [Doc] Remove Chinese characters from the icons in the doc. (vllm-project#5959)
  [P/D]The issue of solving the force-free secondary release request, which causes the node to crash. (vllm-project#5968)
  [Feature] Support fine-grained shared expert overlap (vllm-project#5482)
  [Bugfix] fix cpu offload hang with tp=1 (vllm-project#5963)
  [Feature]: Support 310P device run qwen2.5/3 dense and qwen2.5vl models (vllm-project#5776)
  ...
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Jan 21, 2026
…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (637 commits)
  ...
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
…t#5034)

This PR add `MatmulAllreduceRmsnorm` operator and introduces a graph
fusion pass for `matmul_allreduce_rmsnorm` operations. The
implementation includes a new configuration flag, a pattern matching
pass using `torch._inductor.pattern_matcher`.

Co-authored-by: Trunrain [270250579@qq.com](mailto:270250579@qq.com)

- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

---------

Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: tongrunze <t00574058@china.huawei.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
…t#5034)

Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
…t#5034)

ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
…t#5034)

Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
shitianlong1994 pushed a commit to triomino/vllm-ascend that referenced this pull request Mar 5, 2026
…t#5034)

shitianlong1994 pushed a commit to triomino/vllm-ascend that referenced this pull request Mar 6, 2026
…t#5034)

[bugfix]nd to nz optimize
1. adapt to allreduce_rmsnorm_fusion_pass
2. nd 2 nz
shitianlong1994 pushed a commit to triomino/vllm-ascend that referenced this pull request Mar 6, 2026
…t#5034)

[bugfix]nd to nz optimize
1. adapt to allreduce_rmsnorm_fusion_pass
2. nd 2 nz
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
…t#5034)

triomino pushed a commit to triomino/vllm-ascend that referenced this pull request Mar 10, 2026
…t#5034)

[bugfix]nd to nz optimize
1. adapt to allreduce_rmsnorm_fusion_pass
2. nd 2 nz
Labels

module:core, ready (read for review), ready-for-test (start test by label for PR)

4 participants