[bugfix][npugraph_ex]add the extra check for allreduce rmsnorm fusion pass#6430
Conversation
There was a problem hiding this comment.
Code Review
This pull request adds a missing check to the allreduce rmsnorm fusion pass to ensure it only runs when compile_range.start is above a certain threshold. The implementation correctly adds this check to the extra_check function for the torchair pattern. However, I've identified a critical issue where an existing stream scope check is replaced instead of being augmented, which could lead to incorrect fusions. I've provided a suggestion to combine both checks. Additionally, I've noted a code duplication issue that should be addressed to improve maintainability.
| def extra_check_for_allreduce_rmsnorm_fusion_pass(match: Match) -> bool: | ||
| compile_range = get_pass_context().compile_range | ||
| return compile_range.start > ALLREDUCE_NORM_FUSE_THREHOLD |
There was a problem hiding this comment.
This new function introduces the intended check, but there are a couple of concerns:
-
(Critical) The PR description mentions adding an additional check, but the current implementation replaces the existing
extra_stream_scope_check. This check prevents fusion across different streams and is likely a necessary safety guardrail that should be preserved. It seems both checks are required. This function should probably also callextra_stream_scope_check, which would require re-adding its import. -
(High) The condition
compile_range.start > ALLREDUCE_NORM_FUSE_THREHOLDis also present inGraphEXMatmulAllReduceAddRMSNormPass.is_applicable_for_rangeat the end of this file. This code duplication can lead to maintenance issues. It would be best to extract this logic into a shared helper function to be used in both places.
My suggestion below addresses the critical issue. Please also consider refactoring to address the code duplication.
| def extra_check_for_allreduce_rmsnorm_fusion_pass(match: Match) -> bool: | |
| compile_range = get_pass_context().compile_range | |
| return compile_range.start > ALLREDUCE_NORM_FUSE_THREHOLD | |
| def extra_check_for_allreduce_rmsnorm_fusion_pass(match: Match) -> bool: | |
| compile_range = get_pass_context().compile_range | |
| return extra_stream_scope_check(match) and compile_range.start > ALLREDUCE_NORM_FUSE_THREHOLD |
|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
Signed-off-by: chencangtao <chencangtao@huawei.com>
6aa5a87 to
275ad33
Compare
Signed-off-by: chencangtao <chencangtao@huawei.com>
…to qwen3next_rebase * 'main' of https://github.com/vllm-project/vllm-ascend: (59 commits) [Feat.]: 310p support MOE models (vllm-project#6530) [Doc] backport 0.13.0 release note (vllm-project#6584) [CI] Update UT CANN version to 8.5.0 for main branch (vllm-project#6564) [CI] Change A2 runner (vllm-project#6557) [Bugfix] Fix the incorrect use of the output parameter in _forward_fia_slidingwindow (vllm-project#6469) [main2main] upgrade vllm main 0202 (vllm-project#6560) [CI][npugraph_ex]Fix npugraph ex e2e test (vllm-project#6553) [Feature]KV pool supports sparse attention (vllm-project#6339) [bugfix]Fix accuracy issue in PCP/DCP with speculative decoding (vllm-project#6491) perf: adaptive block size selection in linear_persistent kernel (vllm-project#6537) [ModelRunner][Fix] Pads query_start_loc to satisfy FIA/TND constraint (vllm-project#6475) [Bugfix]Fix of Pooling Code and Update of Pooling Usage Guide (vllm-project#6126) [Fusion] Add rmsnorm dynamic quant fusion pass (vllm-project#6274) [Bugfix] Synchronize only the current stream to avoid device sync (vllm-project#6432) [CI] Add long and short prompt tests for DeepSeek-V3.2 (vllm-project#6499) [Refactor] MLP weight prefetch to consistency with MoE Model's prefetching in terms of code and usage (vllm-project#6442) [bugfix][npugraph_ex]duplicate pattern issue (vllm-project#6513) [bugfix][npugraph_ex]add the extra check for allreduce rmsnorm fusion pass (vllm-project#6430) [Quant] GLM4.7-Flash Support W8A8 (vllm-project#6492) [Nightly][BugFix] Remove kv_cache nz test case for test_mla_preprocess_nq.py (vllm-project#6505) ...
… pass (vllm-project#6430) ### What this PR does / why we need it? Allreduce rmsnorm fusion pass has an additional check condition, which requires fusion of the Fx graph only when the start of compile_range is greater than 512. We previously overlooked this check. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.14.1 - vLLM main: vllm-project/vllm@dc917cc --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com> Signed-off-by: momochenchuw <chenchuw@huawei.com>
… pass (vllm-project#6430) ### What this PR does / why we need it? Allreduce rmsnorm fusion pass has an additional check condition, which requires fusion of the Fx graph only when the start of compile_range is greater than 512. We previously overlooked this check. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.14.1 - vLLM main: vllm-project/vllm@dc917cc --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
… pass (vllm-project#6430) ### What this PR does / why we need it? Allreduce rmsnorm fusion pass has an additional check condition, which requires fusion of the Fx graph only when the start of compile_range is greater than 512. We previously overlooked this check. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.14.1 - vLLM main: vllm-project/vllm@dc917cc --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com>
… pass (vllm-project#6430) ### What this PR does / why we need it? Allreduce rmsnorm fusion pass has an additional check condition, which requires fusion of the Fx graph only when the start of compile_range is greater than 512. We previously overlooked this check. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.14.1 - vLLM main: vllm-project/vllm@dc917cc --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
… pass (vllm-project#6430) ### What this PR does / why we need it? Allreduce rmsnorm fusion pass has an additional check condition, which requires fusion of the Fx graph only when the start of compile_range is greater than 512. We previously overlooked this check. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.14.1 - vLLM main: vllm-project/vllm@dc917cc --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com>
… pass (vllm-project#6430) ### What this PR does / why we need it? Allreduce rmsnorm fusion pass has an additional check condition, which requires fusion of the Fx graph only when the start of compile_range is greater than 512. We previously overlooked this check. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.14.1 - vLLM main: vllm-project/vllm@dc917cc --------- Signed-off-by: chencangtao <chencangtao@huawei.com> Co-authored-by: chencangtao <chencangtao@huawei.com>
What this PR does / why we need it?
Allreduce rmsnorm fusion pass has an additional check condition, which requires fusion of the Fx graph only when the start of compile_range is greater than 512. We previously overlooked this check.
Does this PR introduce any user-facing change?
How was this patch tested?