[Fusion] [Graph] Add Matmul Allreduce Rmsnorm fusion Pass #5034
realliujiaxu merged 19 commits into vllm-project:main
Conversation
Code Review
This pull request introduces a new fusion pass for Matmul -> AllReduce -> RMSNorm to optimize performance on Ascend hardware. The changes include a new configuration flag, the fusion pass implementation, and its integration into the compilation process. My review identified a few issues: leftover debugging print statements that should be removed; more critically, the new fusion pass hardcodes the tensor parallel rank and world size to 0, which will cause failures in distributed setups; and some logging statements use inappropriately high severity levels that could flood production logs. I've provided suggestions to fix these issues.
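The hardcoded rank/world-size bug flagged above is avoided by querying the runtime instead of baking in 0. As a minimal stdlib sketch (the helper name is hypothetical; inside vLLM the tensor-parallel group utilities such as `get_tensor_model_parallel_rank()` should be used instead), this follows the `RANK`/`WORLD_SIZE` environment variables that `torchrun` exports for each worker:

```python
import os


def get_tp_rank_and_world_size() -> tuple[int, int]:
    # torchrun exports RANK and WORLD_SIZE for every worker; fall back to a
    # single-process default (rank 0 of 1) when they are absent.
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    return rank, world_size


rank, world_size = get_tp_rank_and_world_size()
```

The point is only that both values must come from the launch environment or the process group, never from constants.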
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to the Contributing and Testing guides.
This pull request has conflicts; please resolve them before we can evaluate the pull request.
```python
super().__init__(vllm_config)
self.pattern_match_passes: PatternMatcherPass = PatternMatcherPass(
    pass_name="allreduce_rmsnorm_fusion_pass")
```
pass name should change?
I suggest leaving the pass and pattern names as MatmulAllReduceAddRMSNormPass and xxMatmulAllReduceAddRMSNormPattern respectively, and using comments to explain that the fusion operator actually splits the allreduce into a reduce-scatter and an all-gather.
Thanks. I have modified it.
(vllm-project#5034) This PR adds a `MatmulAllreduceRmsnorm` operator and introduces a graph fusion pass for `matmul_allreduce_rmsnorm` operations. The implementation includes a new configuration flag and a pattern-matching pass using `torch._inductor.pattern_matcher`.
Co-authored-by: Trunrain <270250579@qq.com>
- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: tongrunze <t00574058@china.huawei.com>
This PR adds a `MatmulAllreduceRmsnorm` operator and introduces a graph fusion pass for `matmul_allreduce_rmsnorm` operations. The implementation includes a new configuration flag and a pattern-matching pass using `torch._inductor.pattern_matcher`.

Co-authored-by: Trunrain <270250579@qq.com>