
Conversation

@Pr0Wh1teGivee (Contributor) commented Sep 6, 2025

What this PR does / why we need it?

  1. Replace the prepare/finalize operations in fused_moe.py with moe_comm_method.prepare()/finalize() (see the sketch after this list).
  2. Replace unified_fused_experts with moe_comm_method.fused_experts() in fused_moe.py, w8a8_dynamic.py, and w4a8_dynamic.py.
  3. Call _select_moe_comm_method in the spec-decode proposers.
  4. w4a8_dynamic does not currently support gatherep, so all2allv is used instead.
  5. Remove redundant code.
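
For orientation, here is a minimal sketch of the new call pattern, assuming a hypothetical shape for MoECommMethod; the real interface under vllm_ascend/ops/moe/ may differ in names and signatures:

```python
import torch


class MoECommMethod:
    """Hypothetical shape of the comm-method interface after this refactor."""

    def prepare(self, hidden_states: torch.Tensor,
                router_logits: torch.Tensor):
        # e.g. gather/dispatch tokens across DP/EP ranks before routing
        return hidden_states, router_logits

    def fused_experts(self, hidden_states: torch.Tensor,
                      topk_ids: torch.Tensor,
                      topk_weights: torch.Tensor) -> torch.Tensor:
        # dispatch -> grouped expert MLP -> combine, per comm strategy
        raise NotImplementedError

    def finalize(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # e.g. reduce-scatter the combined expert outputs
        return hidden_states


def moe_forward(comm: MoECommMethod, hidden_states: torch.Tensor,
                router_logits: torch.Tensor, topk_ids: torch.Tensor,
                topk_weights: torch.Tensor) -> torch.Tensor:
    # Call sites in fused_moe.py / w8a8_dynamic.py / w4a8_dynamic.py now
    # route through one object instead of free functions.
    hidden_states, router_logits = comm.prepare(hidden_states, router_logits)
    out = comm.fused_experts(hidden_states, topk_ids, topk_weights)
    return comm.finalize(out)
```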

Does this PR introduce any user-facing change?

The AllgatherEP switch is disabled in aclgraph/eager mode; the rules in modelrunner_v1._select_moe_comm_method() are followed instead.

How was this patch tested?

e2e & ut

github-actions bot commented Sep 6, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request refactors the Fused MoE implementation by restructuring the code into more modular and object-oriented components. The changes are extensive, moving logic from vllm_ascend/distributed/moe_comm_method.py and vllm_ascend/ops/common_fused_moe.py into new classes like MoECommMethod, FusedMoEPrepareAndFinalize, and various TokenDispatcher implementations under a new vllm_ascend/ops/moe/ directory. This improves code organization and separation of concerns. New unit tests have been added for the refactored components.

My main concern is with the NativeAllGatherCommImpl, which appears to be broken by the refactoring: it contains dead code and does not provide the native PyTorch fallback it is intended to. This should be addressed to avoid confusion and future bugs.
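
For reference, a sketch of the new layout under vllm_ascend/ops/moe/; the two file names below appear in the Codecov report later in this thread, while the placement of the MoECommMethod classes is an assumption:

```
vllm_ascend/ops/moe/
├── fused_moe_prepare_and_finalize.py   # FusedMoEPrepareAndFinalize
├── moe_comm_method.py                  # MoECommMethod and its subclasses (assumed)
└── token_dispatcher.py                 # TokenDispatcher implementations
```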

Comment on lines 164 to 255
```python
class NativeAllGatherCommImpl(AllGatherCommImpl):
    """This implementation should be compatible with all scenarios.

    Note that this implementation purely consists of native PyTorch ops
    and does not use any NPU-specific ops. So the performance may not be
    optimal. But it is a good fallback for scenarios where NPU-specific
    ops are not available.
    """

    def permute(
        self,
        hidden_states: torch.Tensor,
        topk_ids: torch.Tensor,
        topk_weights: torch.Tensor,
        expert_map: torch.Tensor,
        num_experts: int,
        apply_a8_quantization: bool,
    ) -> tuple[torch.Tensor, torch.Tensor, Optional[torch.Tensor], int]:
        num_tokens = hidden_states.shape[0]

        # Generate token indices and flatten
        token_indices = torch.arange(num_tokens,
                                     device=hidden_states.device,
                                     dtype=torch.int64)
        token_indices = (token_indices.unsqueeze(1).expand(
            -1, self.moe_config.experts_per_token).reshape(-1))

        # Flatten token-to-expert mappings and map to local experts
        weights_flat = topk_weights.view(-1)
        experts_flat = topk_ids.view(-1)
        local_experts_flat = (expert_map[experts_flat]
                              if expert_map is not None else experts_flat)

        # Filter valid token-expert pairs
        mask = local_experts_flat != -1
        # FIXME: npu_grouped_matmul output random values at [num_valid_tokens:, ...]
        # So we need to filter out invalid tokens by zeroing their weights.
        # This is a workaround and should be removed after the issue is fixed
        filtered_weights = torch.where(mask, weights_flat,
                                       torch.zeros_like(weights_flat)).to(
                                           topk_weights.dtype)
        filtered_experts = torch.where(
            mask,
            local_experts_flat,
            torch.full_like(local_experts_flat, num_experts),
        ).to(topk_ids.dtype)

        # Sort by local expert IDs
        sort_indices = torch.argsort(filtered_experts.view(torch.float32))
        self.sorted_token_indices = token_indices[sort_indices]
        self.sorted_weights = filtered_weights[sort_indices]

        # Compute token counts with minlength of num_experts
        # This is equivalent to but faster than:
        # >>> token_counts = torch.bincount(filtered_experts, minlength=num_experts)[:-1]
        token_counts = torch.zeros(num_experts + 1,
                                   device=hidden_states.device,
                                   dtype=torch.int64)
        ones = torch.ones_like(filtered_experts, dtype=torch.int64)
        token_counts.scatter_add_(0, filtered_experts.to(torch.int64), ones)
        expert_tokens = token_counts[:num_experts]

        # Rearrange hidden_states
        permuted_hidden_states = hidden_states[self.sorted_token_indices]

        group_list_type = 1  # `count` mode

        return permuted_hidden_states, expert_tokens, None, group_list_type

    def unpermute(self, mlp_output: torch.Tensor,
                  hidden_states: torch.Tensor) -> None:
        mlp_output = mlp_output * self.sorted_weights.unsqueeze(1)

        final_hidden_states = torch.zeros_like(hidden_states)
        final_hidden_states.index_add_(0, self.sorted_token_indices,
                                       mlp_output)

        hidden_states[:] = final_hidden_states
```

gemini-code-assist bot (Contributor) commented — severity: high
The permute and unpermute methods within NativeAllGatherCommImpl appear to be dead code after the refactoring. The fused_experts method from the base MoECommMethod is now used, which calls a token_dispatcher instead of these methods.

Furthermore, NativeAllGatherCommImpl inherits its _get_token_dispatcher method from AllGatherCommImpl, which provides TokenDispatcherWithAllGather. This dispatcher uses NPU-specific operations, contradicting the docstring of NativeAllGatherCommImpl which states it should only use native PyTorch ops. This makes the class non-functional as a native fallback.

To fix this, these unused methods should be removed. A proper native fallback would require a new NativeTokenDispatcher that implements the logic from the old permute and unpermute methods, and NativeAllGatherCommImpl should be updated to use it.
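
For illustration, a minimal sketch of what such a NativeTokenDispatcher could look like, reusing the logic of the old permute/unpermute methods; the class name and the token_dispatch/token_combine method names are assumptions, not the project's actual API:

```python
import torch


class NativeTokenDispatcher:
    """Hypothetical dispatcher built only from native PyTorch ops.

    Sorts token copies by (local) expert id so a grouped MLP can process
    them contiguously, then scatters the weighted results back.
    """

    def __init__(self, experts_per_token: int):
        self.experts_per_token = experts_per_token
        self._sorted_token_indices = None
        self._sorted_weights = None

    def token_dispatch(self, hidden_states, topk_ids, topk_weights,
                       expert_map, num_experts):
        num_tokens = hidden_states.shape[0]
        token_indices = torch.arange(
            num_tokens, device=hidden_states.device,
            dtype=torch.int64).repeat_interleave(self.experts_per_token)

        experts_flat = topk_ids.view(-1).to(torch.int64)
        local_experts = (expert_map[experts_flat]
                         if expert_map is not None else experts_flat)
        mask = local_experts != -1
        # Invalid pairs get weight 0 and a sentinel expert id so they sort last.
        weights_flat = topk_weights.view(-1)
        weights = torch.where(mask, weights_flat,
                              torch.zeros_like(weights_flat))
        experts = torch.where(mask, local_experts,
                              torch.full_like(local_experts, num_experts))

        sort_indices = torch.argsort(experts)
        self._sorted_token_indices = token_indices[sort_indices]
        self._sorted_weights = weights[sort_indices]

        # Per-expert token counts, excluding the sentinel bucket.
        expert_tokens = torch.bincount(
            experts, minlength=num_experts + 1)[:num_experts]
        return hidden_states[self._sorted_token_indices], expert_tokens

    def token_combine(self, mlp_output, hidden_states):
        # Weight each expert output, then scatter-add back to token order.
        mlp_output = mlp_output * self._sorted_weights.unsqueeze(1)
        out = torch.zeros_like(hidden_states)
        out.index_add_(0, self._sorted_token_indices, mlp_output.to(out.dtype))
        return out
```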

@Pr0Wh1teGivee changed the title refactor common_fused_moe.py → refactor fused_moe.py Sep 6, 2025
github-actions bot commented Sep 8, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@Pr0Wh1teGivee force-pushed the refactor_common_fused_moe branch from a6dc375 to 9ff7384 on September 8, 2025 06:56
@Pr0Wh1teGivee force-pushed the refactor_common_fused_moe branch from 278a4dc to 4e3a7fe on September 8, 2025 12:23
@Pr0Wh1teGivee changed the title refactor fused_moe.py → [Main] [Refactor] Enable MoECommMethod in Eager Mode Sep 9, 2025
github-actions bot commented Sep 9, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions bot commented
This pull request has conflicts, please resolve those before we can evaluate the pull request.

@Pr0Wh1teGivee force-pushed the refactor_common_fused_moe branch 4 times, most recently from 87afa53 to 461834a on September 11, 2025 09:12
codecov bot commented Sep 12, 2025

Codecov Report

❌ Patch coverage is 85.60000% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.75%. Comparing base (1bbb20e) to head (1df0ea7).
⚠️ Report is 31 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...m_ascend/ops/moe/fused_moe_prepare_and_finalize.py | 88.88% | 4 Missing ⚠️ |
| vllm_ascend/worker/model_runner_v1.py | 40.00% | 3 Missing ⚠️ |
| tests/ut/ops/test_fused_ops.py | 89.47% | 2 Missing ⚠️ |
| vllm_ascend/ops/moe/token_dispatcher.py | 33.33% | 2 Missing ⚠️ |
| vllm_ascend/quantization/w4a8_dynamic.py | 0.00% | 2 Missing ⚠️ |
| vllm_ascend/quantization/w8a8_dynamic.py | 0.00% | 2 Missing ⚠️ |
| vllm_ascend/spec_decode/mtp_proposer.py | 0.00% | 2 Missing ⚠️ |
| vllm_ascend/ops/common_fused_moe.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #2791      +/-   ##
==========================================
- Coverage   74.76%   74.75%   -0.02%
==========================================
  Files         150      153       +3
  Lines       20891    21080     +189
==========================================
+ Hits        15620    15758     +138
- Misses       5271     5322      +51
```

| Flag | Coverage Δ |
| --- | --- |
| unittests | 74.75% <85.60%> (-0.02%) ⬇️ |


@weijinqian0 (Collaborator) approved

Co-Authored-By: weijinqian0 <[email protected]>
Signed-off-by: Pr0Wh1teGivee <[email protected]>
@Pr0Wh1teGivee force-pushed the refactor_common_fused_moe branch from b31c231 to 93cd027 on September 16, 2025 01:28
```diff
 _output_dtype = w2_scale.dtype

-is_mc2 = get_forward_context().fused_moe_state == FusedMoEState.MC2
+is_mc2 = get_forward_context().moe_comm_method_name == "mc2commimpl"
```
Collaborator

There are a lot of hardcoded xxxcommimpl strings; it would be good to use an Enum instead.
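
For illustration, a minimal sketch of the Enum approach (the enum name, members, and helper are assumptions; the actual refactor landed in the follow-up PR #3001):

```python
from enum import Enum


class MoECommType(Enum):
    """Hypothetical enum replacing the hardcoded comm-impl strings."""
    ALLGATHER = "allgathercommimpl"
    MC2 = "mc2commimpl"
    ALLTOALL = "alltoallcommimpl"


def is_mc2(moe_comm_method_name: str) -> bool:
    # Parse the string once at the boundary, then compare enum members
    # instead of scattering "mc2commimpl" literals through the codebase.
    return MoECommType(moe_comm_method_name) is MoECommType.MC2


assert is_mc2("mc2commimpl")
```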

Contributor Author

Fair point; this will be refactored in my next PR.

```python
router_logits = self.naive_multicast(router_logits,
                                     cu_tokens_across_dp_cpu)
moe_comm_method_name = forward_context.moe_comm_method_name
forward_context.moe_comm_method = getattr(self, moe_comm_method_name)
```
Collaborator

Setting a new property on forward_context here is not suitable; it breaks the principle of forward_context and makes the code hard to debug. I notice this has already been done in the common fused MoE code, though. Let's refactor it in the next PR.
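
One possible shape of the fix, sketched here for illustration (the follow-up PR #3001 says it avoids setting the property in AscendFusedMoE.forward(), but the exact mechanism below is an assumption):

```python
class AscendFusedMoE:
    """Sketch: resolve the comm method locally instead of mutating
    forward_context inside forward()."""

    def forward(self, hidden_states, router_logits, forward_context):
        # forward_context stays read-only here, which keeps its contents
        # stable across the forward pass and easier to debug.
        moe_comm_method = getattr(self, forward_context.moe_comm_method_name)
        hidden_states, router_logits = moe_comm_method.prepare(
            hidden_states, router_logits)
        out = moe_comm_method.fused_experts(hidden_states, router_logits)
        return moe_comm_method.finalize(out)
```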

Contributor Author

Fair point; this will be refactored in my next PR.

```python
elif soc_version in {AscendSocVersion.A3}:
    moe_comm_method = "mc2" if num_tokens <= self.mc2_tokens_capacity else "alltoall"
else:
    raise ValueError(f"Unsupported soc_version: {soc_version}")
```
Collaborator

It looks like 310P is missing.

Contributor Author

FusedExpertsWithMoGE will be selected by checking model_type == "PanguProMoE" in the future, so there is no need to add 310P here.

```python
if not self.parallel_config.enable_expert_parallel:
    moe_comm_method = "allgather"
elif soc_version in {AscendSocVersion.A2}:
    if num_tokens <= self.mc2_tokens_capacity and self.parallel_config.world_size_across_dp >= 16:
```
Collaborator

Previously, we went into allgather if VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP was enabled. Now we go into allgather when num_tokens <= self.mc2_tokens_capacity and self.parallel_config.world_size_across_dp >= 16 is false.

Are the two cases equivalent?

Contributor Author

Confirmed with Ant; it is OK to use _select_moe_comm_method.
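
Piecing the two snippets above together, the selection logic reads roughly as the following sketch (reconstructed from the review fragments; the real method is model_runner_v1._select_moe_comm_method and may differ in detail):

```python
def _select_moe_comm_method(self, num_tokens: int) -> str:
    soc_version = get_ascend_soc_version()  # assumed helper
    if not self.parallel_config.enable_expert_parallel:
        moe_comm_method = "allgather"
    elif soc_version in {AscendSocVersion.A2}:
        # MC2 only pays off on A2 with enough devices and small batches;
        # otherwise fall back to allgather.
        if (num_tokens <= self.mc2_tokens_capacity
                and self.parallel_config.world_size_across_dp >= 16):
            moe_comm_method = "mc2"
        else:
            moe_comm_method = "allgather"
    elif soc_version in {AscendSocVersion.A3}:
        moe_comm_method = ("mc2" if num_tokens <= self.mc2_tokens_capacity
                           else "alltoall")
    else:
        raise ValueError(f"Unsupported soc_version: {soc_version}")
    return moe_comm_method
```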

@wangxiyuan (Collaborator) left a comment

All the review comments can be fixed in the next PR.

@wangxiyuan merged commit 18ca786 into vllm-project:main Sep 16, 2025
16 of 17 checks passed
@wangxiyuan (Collaborator) commented

Pangu is broken by the qwen3-next PR; the issue is here: #2949. We'll fix it later.

wangxiyuan pushed a commit that referenced this pull request Sep 22, 2025
### What this PR does / why we need it?
Fix issues mentioned in
#2791 and some minor
refactoring.
1. Use Enum instead of string.
2. Avoid setting a new property to forward_context in
AscendFusedMoE.forward().
3. Enabling TokenDispatcherWithMoge.
4. Remove redundant code.

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?

Qwen3-30B-A3B/Qwen3-30B-A3B-W8A8/DeepSeek-V3-W4A8-Pruing/deepseek-mtp/pangu-pro-moe-pruing:
1. Enable/Disable EP
2. Aclgraph & eager


- vLLM version: v0.10.2
- vLLM main:
vllm-project/vllm@9607d5e

Signed-off-by: Pr0Wh1teGivee <[email protected]>
Co-authored-by: weijinqian0 <[email protected]>
Mercykid-bash pushed a commit to Mercykid-bash/vllm-ascend that referenced this pull request Sep 22, 2025
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
wangxiyuan added a commit that referenced this pull request Oct 14, 2025
…nalinaly (#3406)

I'd like to nominate 4 new maintainers for vllm-ascend: 

----

Yizhou Liu [@yiz-liu](https://github.com/yiz-liu)
----

**Review Quality**: He has completed [40+
reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Ayiz-liu)
and provided solutions or guides for [10+
issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20commenter%3Ayiz-liu),
including many quality reviews like
[#issue-3428408401](#3002 (comment)),
[#discussion_r2224572309](#1803 (comment)),
[#issuecomment-2982470226](#1261 (comment)),
[#issuecomment-2903621197](#836 (comment)), and
[#issuecomment-2857678691](#778 (comment)).

**Sustained and High-Quality Contributions:** He has contributed [30+
commits](https://github.com/vllm-project/vllm-ascend/commits?author=yiz-liu)
since Mar 2025; in particular, his aclgraph-, DP-, and EP-related
contributions are the main reason I nominated him. As the owner of
aclgraph support, he continuously improves aclgraph stability and
performance and fixes key bugs. He also laid the groundwork for
EP-related functionality and delivered multiple foundational
improvements.

**Community involvement:** He has a very good habit of logging
issues: #1649. He is
also very active and involved in [many
issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20state%3Aopen%20commenter%3Ayiz-liu%20-author%3Ayiz-liu),
helping users resolve them.

----

Peng Yu  [@paulyu12](https://github.com/paulyu12)
---
The main reasons for his nomination are his expertise in LoRA and his
sustained, major contributions (initial support/docs/bugfixes) around
LoRA.

**Sustained and Major Contributions:** @paulyu12 started contributing
with [LoRA and Multi-LoRA
support](697908f)
in Apr 2025; he has contributed about [10+ commits and
bugfixes](697908f)
to vllm-ascend.
**Review Quality and Community Involvement:** He has also helped 10+
users address [LoRA-related
issues](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Apaulyu12+-author%3Apaulyu12+is%3Aclosed).

I believe his addition will further improve vLLM Ascend Lora support.

----

Jinqian Wei [@weijinqian0](https://github.com/weijinqian0)
---
The main reasons for his nomination are his key contributions to the RL
scene and the high quality of his code reviews.

**Review Quality:** He has completed [60+
reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Aweijinqian0+is%3Aopen+-author%3Aweijinqian0)
since June 2025, including high-quality reviews such as
[#comment-3284055430](#2791 (comment)),
[discussion_r2332166704](#2817 (comment)), and
[discussion_r2343289692](#2846 (comment)).

**Sustained and Quality Contributions:** He has a deep understanding of
the vLLM and vLLM Ascend codebases and solid contributions in the RL
scene (about [10+ PRs
merged](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Aweijinqian0+is%3Amerged+)
as author and 10+ PRs merged as co-author).

- Code Refactor: As a co-author, he participated in the refactoring of
the MOE module #2150
#2706
#2867
- Performance Enhancement for RL: Participated as a co-author in the
design and development of the solution, contributing to the planning of
core capabilities. #1547
#2120 and so on.

So I think he's a great addition to the vLLM Ascend Maintainer team.

----

Chuanyu Qin  [@nalinaly](https://github.com/nalinaly)
---
The main reason I nominated Chuanyu Qin is that he is the initial
designer of aclgraph and torch-npu, two key components of vllm-ascend.
Considering aclgraph will eventually become the main path for
vllm-ascend's graph mode, I propose to nominate him.

**Sustained and Major Contributions:** Chuanyu has actively helped
users and developers of vllm-ascend since Mar 2025
([vllm-discuss#162](https://discuss.vllm.ai/t/can-ascend-officially-draft-a-documentation-on-the-vllm-ascend-adaptation-for-graph-mode/162/5)),
and also helped early users of vllm-ascend understand aclgraph. He
provided lots of help in the process of integrating aclgraph with
vllm-ascend.

**Community Involvement:** As a speaker, he also presented [《The design
philosophy of torch_npu and the high performance principle of
aclGraph》](https://github.com/PyTorch-China/pytorch-meetup/blob/main/beijing-2025/%E3%80%905%E3%80%91torch_npu%20%E7%9A%84%E8%AE%BE%E8%AE%A1%E5%93%B2%E5%AD%A6%E4%B8%8E%20aclGraph%20%E9%AB%98%E6%80%A7%E8%83%BD%E5%8E%9F%E7%90%86-%E7%A7%A6%E4%BC%A0%E7%91%9C-0920.pdf)
to help users understand aclgraph and torch_npu.

----

They all have active contributions to vllm-ascend or rich experience
with Ascend AI.

Welcome!
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: wangxiyuan <[email protected]>
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
luolun pushed a commit to luolun/vllm-ascend that referenced this pull request Nov 19, 2025
hwhaokun pushed a commit to hwhaokun/vllm-ascend that referenced this pull request Nov 19, 2025
hwhaokun pushed a commit to hwhaokun/vllm-ascend that referenced this pull request Nov 19, 2025
NSDie pushed a commit to NSDie/vllm-ascend that referenced this pull request Nov 24, 2025
NSDie pushed a commit to NSDie/vllm-ascend that referenced this pull request Nov 24, 2025
NSDie pushed a commit to NSDie/vllm-ascend that referenced this pull request Nov 24, 2025