[Main] [Refactor] Enable MoECommMethod in Eager Mode #2791
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to the Contributing and Testing guides.
Code Review
This pull request refactors the Fused MoE implementation by restructuring the code into more modular and object-oriented components. The changes are extensive, moving logic from vllm_ascend/distributed/moe_comm_method.py and vllm_ascend/ops/common_fused_moe.py into new classes like MoECommMethod, FusedMoEPrepareAndFinalize, and various TokenDispatcher implementations under a new vllm_ascend/ops/moe/ directory. This improves code organization and separation of concerns. New unit tests have been added for the refactored components.
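For orientation, here is a minimal sketch of how these pieces could relate to each other. This is an illustration only: the class names match those listed above, but the method signatures and the `fused_experts` flow are assumptions, not code from this PR.

```python
from abc import ABC, abstractmethod

import torch


class MoETokenDispatcher(ABC):
    """Moves tokens to their experts and back (AllGather, All2All, MC2, ...)."""

    @abstractmethod
    def token_dispatch(self, hidden_states: torch.Tensor,
                       topk_ids: torch.Tensor,
                       topk_weights: torch.Tensor) -> torch.Tensor:
        ...

    @abstractmethod
    def token_combine(self, expert_output: torch.Tensor) -> torch.Tensor:
        ...


class MoECommMethod(ABC):
    """Bundles a prepare/finalize strategy with a token dispatcher."""

    def __init__(self, moe_config) -> None:
        self.moe_config = moe_config
        self.token_dispatcher = self._get_token_dispatcher()

    @abstractmethod
    def _get_token_dispatcher(self) -> MoETokenDispatcher:
        ...

    def fused_experts(self, hidden_states: torch.Tensor,
                      topk_ids: torch.Tensor,
                      topk_weights: torch.Tensor) -> torch.Tensor:
        # Dispatch -> expert MLPs -> combine; details omitted in this sketch.
        raise NotImplementedError
```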
My main concern is with NativeAllGatherCommImpl, which appears to be broken by the refactoring: it contains dead code and no longer provides the native PyTorch fallback it is intended to provide. This should be addressed to avoid confusion and future bugs.
```python
class NativeAllGatherCommImpl(AllGatherCommImpl):
    """This implementation should be compatible with all scenarios.

    Note that this implementation purely consists of native PyTorch ops
    and does not use any NPU-specific ops. So the performance may not be
    optimal. But it is a good fallback for scenarios where NPU-specific
    ops are not available.
    """

    def permute(
        self,
        hidden_states: torch.Tensor,
        topk_ids: torch.Tensor,
        topk_weights: torch.Tensor,
        expert_map: torch.Tensor,
        num_experts: int,
        apply_a8_quantization: bool,
    ) -> tuple[torch.Tensor, torch.Tensor, Optional[torch.Tensor], int]:
        num_tokens = hidden_states.shape[0]

        # Generate token indices and flatten
        token_indices = torch.arange(num_tokens,
                                     device=hidden_states.device,
                                     dtype=torch.int64)
        token_indices = (token_indices.unsqueeze(1).expand(
            -1, self.moe_config.experts_per_token).reshape(-1))

        # Flatten token-to-expert mappings and map to local experts
        weights_flat = topk_weights.view(-1)
        experts_flat = topk_ids.view(-1)
        local_experts_flat = (expert_map[experts_flat]
                              if expert_map is not None else experts_flat)

        # Filter valid token-expert pairs
        mask = local_experts_flat != -1
        # FIXME: npu_grouped_matmul outputs random values at
        # [num_valid_tokens:, ...], so we need to filter out invalid tokens
        # by zeroing their weights. This is a workaround and should be
        # removed after the issue is fixed.
        filtered_weights = torch.where(mask, weights_flat,
                                       torch.zeros_like(weights_flat)).to(
                                           topk_weights.dtype)
        filtered_experts = torch.where(
            mask,
            local_experts_flat,
            torch.full_like(local_experts_flat, num_experts),
        ).to(topk_ids.dtype)

        # Sort by local expert IDs
        sort_indices = torch.argsort(filtered_experts.view(torch.float32))
        self.sorted_token_indices = token_indices[sort_indices]
        self.sorted_weights = filtered_weights[sort_indices]

        # Compute token counts with minlength of num_experts.
        # This is equivalent to but faster than:
        # >>> token_counts = torch.bincount(filtered_experts, minlength=num_experts)[:-1]
        token_counts = torch.zeros(num_experts + 1,
                                   device=hidden_states.device,
                                   dtype=torch.int64)
        ones = torch.ones_like(filtered_experts, dtype=torch.int64)
        token_counts.scatter_add_(0, filtered_experts.to(torch.int64), ones)
        expert_tokens = token_counts[:num_experts]

        # Rearrange hidden_states
        permuted_hidden_states = hidden_states[self.sorted_token_indices]

        group_list_type = 1  # `count` mode

        return permuted_hidden_states, expert_tokens, None, group_list_type

    def unpermute(self, mlp_output: torch.Tensor,
                  hidden_states: torch.Tensor) -> None:
        mlp_output = mlp_output * self.sorted_weights.unsqueeze(1)

        final_hidden_states = torch.zeros_like(hidden_states)
        final_hidden_states.index_add_(0, self.sorted_token_indices,
                                       mlp_output)

        hidden_states[:] = final_hidden_states
```
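As a quick sanity check, the `scatter_add_`-based counting above can be compared against the `torch.bincount` formulation mentioned in the code comment. A small self-contained check (the tensor values here are made up for illustration):

```python
import torch

num_experts = 4
# Invalid (token, expert) pairs were remapped to `num_experts`, hence the
# extra slot at the end of the counts tensor.
filtered_experts = torch.tensor([0, 2, 2, 4, 1, 4], dtype=torch.int64)

token_counts = torch.zeros(num_experts + 1, dtype=torch.int64)
token_counts.scatter_add_(0, filtered_experts,
                          torch.ones_like(filtered_experts))
expert_tokens = token_counts[:num_experts]

reference = torch.bincount(filtered_experts,
                           minlength=num_experts + 1)[:num_experts]
assert torch.equal(expert_tokens, reference)
print(expert_tokens)  # tensor([1, 1, 2, 0])
```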
The permute and unpermute methods within NativeAllGatherCommImpl appear to be dead code after the refactoring. The fused_experts method from the base MoECommMethod is now used, and it calls a token_dispatcher instead of these methods.
Furthermore, NativeAllGatherCommImpl inherits its _get_token_dispatcher method from AllGatherCommImpl, which provides TokenDispatcherWithAllGather. That dispatcher uses NPU-specific operations, contradicting the docstring of NativeAllGatherCommImpl, which states the class should only use native PyTorch ops. This makes the class non-functional as a native fallback.
To fix this, these unused methods should be removed. A proper native fallback would require a new NativeTokenDispatcher that implements the logic from the old permute and unpermute methods, and NativeAllGatherCommImpl should be updated to use it.
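As a rough illustration of the suggested fix, such a dispatcher could reuse the pure-PyTorch permute/unpermute logic shown above. All names below (the class, the `token_dispatch`/`token_combine` interface) are assumptions, not code from this PR:

```python
from typing import Optional

import torch


class NativeTokenDispatcher:
    """Hypothetical pure-PyTorch dispatcher so that NativeAllGatherCommImpl
    could stay NPU-free; interface and names are assumptions."""

    def __init__(self, experts_per_token: int):
        self.experts_per_token = experts_per_token
        self.sorted_token_indices: Optional[torch.Tensor] = None
        self.sorted_weights: Optional[torch.Tensor] = None

    def token_dispatch(self, hidden_states, topk_ids, topk_weights,
                       expert_map, num_experts):
        num_tokens = hidden_states.shape[0]
        # One entry per (token, expert) pair: [0, 0, ..., 1, 1, ...].
        token_indices = torch.arange(
            num_tokens, device=hidden_states.device,
            dtype=torch.int64).repeat_interleave(self.experts_per_token)
        experts_flat = topk_ids.view(-1)
        local = (expert_map[experts_flat]
                 if expert_map is not None else experts_flat)
        mask = local != -1
        weights = torch.where(mask, topk_weights.view(-1),
                              torch.zeros_like(topk_weights.view(-1)))
        # Route invalid pairs to a trailing dummy expert so they sort last.
        experts = torch.where(mask, local,
                              torch.full_like(local, num_experts))
        order = torch.argsort(experts)
        self.sorted_token_indices = token_indices[order]
        self.sorted_weights = weights[order]
        expert_tokens = torch.bincount(
            experts, minlength=num_experts + 1)[:num_experts]
        return hidden_states[self.sorted_token_indices], expert_tokens

    def token_combine(self, mlp_output, hidden_states):
        out = torch.zeros_like(hidden_states)
        out.index_add_(0, self.sorted_token_indices,
                       mlp_output * self.sorted_weights.unsqueeze(1))
        return out
```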
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Codecov Report

❌ Patch coverage is … (additional details and impacted files below).

@@            Coverage Diff             @@
##             main    #2791      +/-   ##
==========================================
- Coverage   74.76%   74.75%   -0.02%
==========================================
  Files         150      153       +3
  Lines       20891    21080     +189
==========================================
+ Hits        15620    15758     +138
- Misses       5271     5322      +51

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
approved
Co-Authored-By: weijinqian0 <[email protected]>
Signed-off-by: Pr0Wh1teGivee <[email protected]>
```diff
  _output_dtype = w2_scale.dtype

- is_mc2 = get_forward_context().fused_moe_state == FusedMoEState.MC2
+ is_mc2 = get_forward_context().moe_comm_method_name == "mc2commimpl"
```
There are a lot of hardcoded `xxxcommimpl` strings; it would be good to use an Enum instead.
Fair point, this will be refactored in my next PR.
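A minimal sketch of what the suggested Enum could look like. The names here are assumptions; the actual refactor landed in the follow-up PR and may differ:

```python
from enum import Enum


class MoECommType(Enum):
    """Hypothetical Enum replacing the hardcoded comm-impl strings."""
    ALLGATHER = "allgathercommimpl"
    MC2 = "mc2commimpl"
    ALLTOALL = "alltoallcommimpl"


# Before: is_mc2 = get_forward_context().moe_comm_method_name == "mc2commimpl"
# After:  is_mc2 = get_forward_context().moe_comm_method_name == MoECommType.MC2.value
```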
```python
router_logits = self.naive_multicast(router_logits,
                                     cu_tokens_across_dp_cpu)
moe_comm_method_name = forward_context.moe_comm_method_name
forward_context.moe_comm_method = getattr(self, moe_comm_method_name)
```
Setting a new property on forward_context here is not suitable; it breaks the principle of forward_context and makes the code hard to debug. I notice this is already done in common fused moe, though, so let's refactor it in the next PR.
Fair point, this will be refactored in my next PR.
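One hypothetical shape of that refactor (all names are assumptions): resolve the comm method where it is needed instead of writing it back into the context, keeping forward_context read-only at this point.

```python
# Sketch of a forward() body that avoids mutating forward_context.
def forward(self, hidden_states, router_logits):
    forward_context = get_forward_context()
    # Read-only use of forward_context: look the impl up on `self`
    # without attaching a new attribute to the context object.
    moe_comm_method = getattr(self, forward_context.moe_comm_method_name)
    return moe_comm_method.fused_experts(hidden_states, router_logits)
```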
```python
elif soc_version in {AscendSocVersion.A3}:
    moe_comm_method = "mc2" if num_tokens <= self.mc2_tokens_capacity else "alltoall"
else:
    raise ValueError(f"Unsupported soc_version: {soc_version}")
```
It looks like 310P is missing.
FusedExpertsWithMoGE will be selected by checking `model_type == "PanguProMoE"` in the future, so there is no need to add 310P here.
```python
if not self.parallel_config.enable_expert_parallel:
    moe_comm_method = "allgather"
elif soc_version in {AscendSocVersion.A2}:
    if (num_tokens <= self.mc2_tokens_capacity
            and self.parallel_config.world_size_across_dp >= 16):
```
Previously, we went into allgather if VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP was enabled; now we go into allgather when `num_tokens <= self.mc2_tokens_capacity and self.parallel_config.world_size_across_dp >= 16` is False. Are these cases equivalent?
Confirmed with Ant; it is OK to use _select_comm_method.
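Piecing the two diff fragments above together, the selection logic reads roughly as follows. This is a reconstruction: the A2 fallback to `"allgather"` is inferred from the review discussion rather than copied from the diff, and the `get_ascend_soc_version()` helper name is assumed.

```python
def _select_moe_comm_method(self, num_tokens: int) -> str:
    soc_version = get_ascend_soc_version()  # helper name assumed
    if not self.parallel_config.enable_expert_parallel:
        moe_comm_method = "allgather"
    elif soc_version in {AscendSocVersion.A2}:
        if (num_tokens <= self.mc2_tokens_capacity
                and self.parallel_config.world_size_across_dp >= 16):
            moe_comm_method = "mc2"
        else:
            moe_comm_method = "allgather"  # assumed fallback branch
    elif soc_version in {AscendSocVersion.A3}:
        moe_comm_method = ("mc2" if num_tokens <= self.mc2_tokens_capacity
                           else "alltoall")
    else:
        raise ValueError(f"Unsupported soc_version: {soc_version}")
    return moe_comm_method
```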
wangxiyuan
left a comment
All the review comments can be fixed in the next PR.
Pangu is broken by the qwen3-next PR; the issue is tracked in #2949. We'll fix it later.
### What this PR does / why we need it?
Fix issues mentioned in #2791 and some minor refactoring.
1. Use Enum instead of string.
2. Avoid setting a new property to forward_context in AscendFusedMoE.forward().
3. Enabling TokenDispatcherWithMoge.
4. Remove redundant code.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Qwen3-30B-A3B/Qwen3-30B-A3B-W8A8/DeepSeek-V3-W4A8-Pruing/deepseek-mtp/pangu-pro-moe-pruing:
1. Enable/Disable EP
2. Aclgraph & eager

- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@9607d5e

Signed-off-by: Pr0Wh1teGivee <[email protected]>
Co-authored-by: weijinqian0 <[email protected]>
### What this PR does / why we need it?
1. Replace prepare/finalize operation in fused_moe.py by moe_comm_method.prepare()/finalize()
2. Replace unified_fused_experts by moe_comm_method.fused_experts() in fused_moe.py/w8a8_dynamic.py/w4a8_dynamic.py
3. Add calling _select_moe_comm_method in spec-decode proposers.
4. Currently, w4a8_dynamic does not support gatherep, use all2allv instead.
5. Remove redundant code.

### Does this PR introduce _any_ user-facing change?
AllgatherEP switch is disabled in aclgraph/eager mode, just follow the rules in modelrunner_v1._select_moe_comm_method()

### How was this patch tested?
e2e & ut

- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@7f6f2c1

Signed-off-by: Pr0Wh1teGivee <[email protected]>
Co-authored-by: weijinqian0 <[email protected]>
…nalinaly (#3406)

I'd like to nominate 4 new maintainers for vllm-ascend:

---- Yizhou Liu [@yiz-liu](https://github.com/yiz-liu) ----

**Review Quality:** He has completed [40+ reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Ayiz-liu) and provided solutions or guides for [10+ issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20commenter%3Ayiz-liu), including many quality reviews like [#issue-3428408401](#3002 (comment)), [#discussion_r2224572309](#1803 (comment)), [#issuecomment-2982470226](#1261 (comment)), [#issuecomment-2903621197](#836 (comment)), [#issuecomment-2857678691](#778 (comment)).

**Sustained and High-Quality Contributions:** He has contributed [30+ commits](https://github.com/vllm-project/vllm-ascend/commits?author=yiz-liu) since Mar. 2025; his aclgraph, DP, and EP related contributions are the main reason I nominated him. As the owner of aclgraph support, he continuously improves aclgraph stability and performance and fixes key bugs. He also laid the groundwork for EP-related functionality and delivered multiple foundational improvements.

**Community Involvement:** He has a very good habit of logging issues (#1649) and is very active in [many issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20state%3Aopen%20commenter%3Ayiz-liu%20-author%3Ayiz-liu), helping users resolve their problems.

---- Peng Yu [@paulyu12](https://github.com/paulyu12) ----

The main reasons for his nomination are his expertise and key contributions to LoRA, with sustained and major contributions (initial support/doc/bugfix) around LoRA.

**Sustained and Major Contributions:** @paulyu12 started his contributions with [LoRA and Multi-LoRA support](697908f) in Apr. 2025 and has contributed [10+ commits and bugfixes](697908f) to vllm-ascend.

**Review Quality and Community Involvement:** He has also helped more than 10 users address [LoRA related issues](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Apaulyu12+-author%3Apaulyu12+is%3Aclosed). I believe his addition will further improve vLLM Ascend LoRA support.

---- Jinqian Wei [@weijinqian0](https://github.com/weijinqian0) ----

The main reasons for his nomination are his key contributions to the RL scene and the high quality of his code reviews.

**Review Quality:** He has completed [60+ reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Aweijinqian0+is%3Aopen+-author%3Aweijinqian0) since June 2025, including high-quality reviews like [#comment-3284055430](#2791 (comment)), [discussion_r2332166704](#2817 (comment)), [discussion_r2343289692](#2846 (comment)).

**Sustained and Quality Contributions:** He has a deep understanding of the vLLM and vLLM Ascend codebases and solid contributions in the RL scene ([10+ PRs merged](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Aweijinqian0+is%3Amerged+) and 10+ PRs merged as co-author).
- Code Refactor: As a co-author, he participated in the refactoring of the MoE module: #2150 #2706 #2867
- Performance Enhancement for RL: As a co-author, he participated in the design and development of the solution, contributing to the planning of core capabilities: #1547 #2120 and so on.

So I think he's a great addition to the vLLM Ascend Maintainer team.

---- Chuanyu Qin [@nalinaly](https://github.com/nalinaly) ----

The main reason I nominated Chuanyu Qin is that he is the initial designer of aclgraph and torch-npu, two key components of vllm-ascend. Considering aclgraph will eventually become the main path for vllm-ascend's graph mode, I propose to nominate him.

**Sustained and Major Contributions:** Chuanyu has actively helped users/developers of vllm-ascend since Mar. 2025 ([vllm-discuss#162](https://discuss.vllm.ai/t/can-ascend-officially-draft-a-documentation-on-the-vllm-ascend-adaptation-for-graph-mode/162/5)) and helped early users of vllm-ascend understand aclgraph. He provided lots of help in the process of integrating aclgraph with vllm-ascend.

**Community Involvement:** As a speaker, he also helps users understand aclgraph and torch_npu: [《The design philosophy of torch_npu and the high performance principle of aclGraph》](https://github.com/PyTorch-China/pytorch-meetup/blob/main/beijing-2025/%E3%80%905%E3%80%91torch_npu%20%E7%9A%84%E8%AE%BE%E8%AE%A1%E5%93%B2%E5%AD%A6%E4%B8%8E%20aclGraph%20%E9%AB%98%E6%80%A7%E8%83%BD%E5%8E%9F%E7%90%86-%E7%A7%A6%E4%BC%A0%E7%91%9C-0920.pdf)

----

They have active contributions to vllm-ascend or rich experience with Ascend AI. Welcome!

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: wangxiyuan <[email protected]>
### What this PR does / why we need it?
1. Replace prepare/finalize operation in fused_moe.py by moe_comm_method.prepare()/finalize()
2. Replace unified_fused_experts by moe_comm_method.fused_experts() in fused_moe.py/w8a8_dynamic.py/w4a8_dynamic.py
3. Add calling _select_moe_comm_method in spec-decode proposers.
4. Currently, w4a8_dynamic does not support gatherep, use all2allv instead.
5. Remove redundant code.

### Does this PR introduce _any_ user-facing change?
AllgatherEP switch is disabled in aclgraph/eager mode, just follow the rules in modelrunner_v1._select_moe_comm_method()

### How was this patch tested?
e2e & ut