[Fusion] [Graph] Add Matmul Allreduce Rmsnorm fusion Pass #5034
realliujiaxu merged 19 commits into vllm-project:main
Conversation
Code Review
This pull request introduces a new fusion pass for Matmul -> AllReduce -> RMSNorm to optimize performance on Ascend hardware. The changes include a new configuration flag, the fusion pass implementation, and its integration into the compilation process. My review identified a few issues: leftover debugging print statements that should be removed; more critically, the new fusion pass hardcodes the tensor parallel rank and world size to 0, which will cause failures in distributed setups; and some logging statements use inappropriately high severity levels that could flood production logs. I've provided suggestions to fix these issues.
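The hardcoded rank/world-size bug flagged above is avoided by querying the runtime instead of baking in 0. As a minimal stdlib sketch (the helper name is hypothetical; inside vLLM the tensor-parallel group utilities such as `get_tensor_model_parallel_rank()` should be used instead), this follows the `RANK`/`WORLD_SIZE` environment variables that `torchrun` exports for each worker:

```python
import os


def get_tp_rank_and_world_size() -> tuple[int, int]:
    # torchrun exports RANK and WORLD_SIZE for every worker; fall back to a
    # single-process default (rank 0 of 1) when they are absent.
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    return rank, world_size


rank, world_size = get_tp_rank_and_world_size()
```

The point is only that both values must come from the launch environment or the process group, never from constants.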
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to the Contributing and Testing guides.
This pull request has conflicts; please resolve them before we can evaluate the pull request.
```python
super().__init__(vllm_config)
self.pattern_match_passes: PatternMatcherPass = PatternMatcherPass(
    pass_name="allreduce_rmsnorm_fusion_pass")
```
pass name should change?
I suggest leaving the pass and pattern names as MatmulAllReduceAddRMSNormPass and xxMatmulAllReduceAddRMSNormPattern respectively, and using comments to explain that the fusion operator actually splits the allreduce into a reduce-scatter and an all-gather.
Thanks. I have modified it.
(vllm-project#5034) This PR adds a `MatmulAllreduceRmsnorm` operator and introduces a graph fusion pass for `matmul_allreduce_rmsnorm` operations. The implementation includes a new configuration flag and a pattern-matching pass using `torch._inductor.pattern_matcher`.
Co-authored-by: Trunrain <270250579@qq.com>
- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: tongrunze <t00574058@china.huawei.com>
This PR adds a `MatmulAllreduceRmsnorm` operator and introduces a graph fusion pass for `matmul_allreduce_rmsnorm` operations. The implementation includes a new configuration flag and a pattern-matching pass using `torch._inductor.pattern_matcher`.

Co-authored-by: Trunrain <270250579@qq.com>