[Bugfix] bugfix for moe_mlp #4822

Merged
wangxiyuan merged 6 commits into vllm-project:main from Clorist33:bugfix_moe_mlp_new on Dec 12, 2025

@Clorist33 Clorist33 commented Dec 9, 2025

What this PR does / why we need it?

This PR fixes a bug in the moe_mlp module by correcting the arguments passed to the torch_npu.npu_dequant_swiglu_quant function. It converts group_list from a cumulative sum to per-group counts before passing it as the group_index parameter.

Does this PR introduce any user-facing change?

No

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a bug in the moe_mlp module by correctly converting the group_list argument from a cumulative sum to counts before passing it to torch_npu.npu_dequant_swiglu_quant. The logic for this conversion is sound. However, I've identified a potential edge case in the new code: it doesn't handle an empty group_list, which could lead to a runtime error. I have provided a suggestion to make the implementation more robust against this scenario.

Comment on lines +131 to +132
new_group = torch.cat([group_list[0].unsqueeze(0), group_diff],
dim=0)
high

Using group_list[0].unsqueeze(0) will raise an IndexError if group_list is an empty tensor. This can occur if no tokens are routed to experts on the current device. Using slicing group_list[:1] is more robust as it returns an empty tensor for an empty input, preventing a crash.

            new_group = torch.cat([group_list[:1], group_diff], dim=0)
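
The difference between indexing and slicing described above can be illustrated with a minimal plain-Python sketch. It uses lists rather than tensors, and the helper name `cumsum_to_count` is hypothetical (it is not part of vllm-ascend); it only mirrors the shape of the suggested `torch.cat([group_list[:1], group_diff])` expression.

```python
from typing import List


def cumsum_to_count(cumsum: List[int]) -> List[int]:
    """Convert cumulative per-expert token totals into per-expert counts.

    Hypothetical list-based helper mirroring the tensor expression
    torch.cat([group_list[:1], torch.diff(group_list)]).
    """
    # Slicing with [:1] yields [] for an empty input, whereas cumsum[0]
    # would raise IndexError.
    head = cumsum[:1]
    # Consecutive differences recover the counts for the remaining experts.
    diffs = [b - a for a, b in zip(cumsum, cumsum[1:])]
    return head + diffs


print(cumsum_to_count([0, 2, 3, 3]))  # [0, 2, 1, 0]
print(cumsum_to_count([]))            # []
```

Slicing keeps the empty case on the same code path as the non-empty one, so no separate guard is needed.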

github-actions bot commented Dec 9, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Comment on lines +130 to +132
group_diff = torch.diff(group_list, dim=0)
new_group = torch.cat([group_list[0].unsqueeze(0), group_diff],
dim=0)
@zhoux77899 zhoux77899 Dec 9, 2025

It’s better to first check if there are cases where group_list_type != 1. You can extract a function to convert other types of group_list into count format, taking group_list and group_list_type as parameters, and determine how to perform the conversion within the function (refer to the cumsum_group_list function).

import torch


def count_group_list(group_list: torch.Tensor,
                     group_list_type: int) -> torch.Tensor:
    if group_list_type not in [0, 1, 2]:
        raise ValueError(
            f"group_list_type should be in [0, 1, 2], but received {group_list_type}"
        )

    # Type 0 is a cumulative sum: keep the first entry, then take the
    # differences between consecutive entries.
    if group_list_type == 0:
        return torch.cat((group_list[:1], torch.diff(group_list)))
    # Type 1 is already in count format.
    if group_list_type == 1:
        return group_list

    # group_list_type == 2
    ...

Contributor Author

Thanks for your suggestions. We have incorporated your proposal into the cumsum_group_list function. Additionally, could you please clarify what scenario corresponds to group_list_type == 2?

@Clorist33 Clorist33 Dec 9, 2025

The changes are here:

import torch
# pad is assumed here to be torch.nn.functional.pad.
from torch.nn.functional import pad


def cumsum_group_list(group_list: torch.Tensor,
                      src_list_type: int,
                      dst_list_type: int,
                      active_num: int = 0,
                      expert_num: int = 0) -> torch.Tensor:
    if src_list_type not in [0, 1, 2]:
        raise ValueError(
            f"src_list_type should be in [0, 1, 2], but received {src_list_type}"
        )
    if src_list_type == dst_list_type:
        return group_list
    # Count -> cumsum.
    if src_list_type == 1 and dst_list_type == 0:
        return group_list.cumsum(dim=0)
    # Cumsum -> count: keep the first entry, then take consecutive
    # differences. The head must come from group_list, not group_diff.
    if src_list_type == 0 and dst_list_type == 1:
        group_diff = torch.diff(group_list)
        new_group = torch.cat([group_list[:1], group_diff], dim=0)
        return new_group

    # Fallthrough: treat group_list as key_value (type 2) and build a
    # cumsum list from its (expert, token count) rows.
    experts = pad(group_list[:, 0], (1, 0))
    tokens = pad(group_list[:, 1].cumsum(dim=0), (1, 0))
    cumsum_group_list = torch.full(size=(expert_num, ),
                                   fill_value=active_num,
                                   dtype=group_list.dtype,
                                   device=group_list.device)

    for i, (start, end) in enumerate(zip(experts[:-1], experts[1:])):
        if end > start:
            cumsum_group_list[start:end] = tokens[i]

    return cumsum_group_list
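
The key_value branch at the end of the function above can be traced with a plain-Python sketch. This is only an illustration under stated assumptions: the helper name `key_value_to_cumsum` is hypothetical, lists stand in for tensors, and `active_num` is assumed to equal the total number of routed tokens, which the thread does not define explicitly.

```python
from typing import List


def key_value_to_cumsum(kv: List[List[int]], active_num: int,
                        expert_num: int) -> List[int]:
    """List-based trace of the key_value -> cumsum fallthrough branch."""
    # Equivalent of pad(group_list[:, 0], (1, 0)): expert ids with a leading 0.
    experts = [0] + [expert for expert, _ in kv]
    # Equivalent of pad(group_list[:, 1].cumsum(dim=0), (1, 0)).
    tokens = [0]
    for _, count in kv:
        tokens.append(tokens[-1] + count)
    # Every slot starts at active_num (assumed: total routed tokens).
    out = [active_num] * expert_num
    for i, (start, end) in enumerate(zip(experts[:-1], experts[1:])):
        if end > start:
            # Experts in [start, end) have seen tokens[i] tokens so far.
            for j in range(start, end):
                out[j] = tokens[i]
    return out


# Experts 1 and 2 receive 2 and 1 tokens; rows are padded to 4 experts.
print(key_value_to_cumsum([[1, 2], [2, 1], [0, 0], [0, 0]], 3, 4))
# [0, 2, 3, 3]
```

Under that assumption the slots after the last active expert keep the fill value, which is what makes the result agree with a cumulative sum of the counts.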

  quant_scale=None,
  quant_offset=None,
- group_index=group_list,
+ group_index=new_group,
Contributor

And use the function here.

Suggested change
- group_index=new_group,
+ group_index=count_group_list(group_list, group_list_type)

  active_num: int = 0,
  expert_num: int = 0) -> torch.Tensor:
- if group_list_type not in [0, 1, 2]:
+ if src_list_type not in [0, 1, 2]:
Collaborator

What does src_list_type == 2 mean? How should src_list_type == 2 with dst_list_type == 0 be handled?

Contributor Author

We are also unsure about the scenario where src_list_type == 2 in ops/fused_moe/moe_mlp.py on the main branch of the vllm-ascend repository. @zhoux77899 Could you please clarify this point?

@zhoux77899 zhoux77899 Dec 9, 2025

Some ops like moe_init_routing_v2 can output the type 2 (key_value) group_list, but I’ve never seen this type of group_list actually used anywhere.

If there is a type 1 group_list like [0, 2, 1, 0]:

  • group_list_type = 0 means cumsum group_list, it will be [0, 2, 3, 3];
  • group_list_type = 1 means count group_list, it will be [0, 2, 1, 0];
  • group_list_type = 2 means key_value group_list, it will be [[1, 2], [2, 1], [0, 0], [0, 0]];
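
The three layouts listed above can be reproduced with a short plain-Python sketch. The helper names are hypothetical, lists stand in for tensors, and the key_value layout (rows of expert index and token count for active experts, zero-padded to the number of experts) is inferred only from the example in this comment, not from operator documentation.

```python
from itertools import accumulate
from typing import List


def count_to_cumsum(counts: List[int]) -> List[int]:
    """Type 1 (count) -> type 0 (cumsum)."""
    return list(accumulate(counts))


def count_to_key_value(counts: List[int]) -> List[List[int]]:
    """Type 1 (count) -> type 2 (key_value), as inferred from the example."""
    # One [expert_index, token_count] row per expert that received tokens.
    rows = [[expert, n] for expert, n in enumerate(counts) if n > 0]
    # Zero-pad so the result has one row per expert.
    padding = [[0, 0] for _ in range(len(counts) - len(rows))]
    return rows + padding


counts = [0, 2, 1, 0]
print(count_to_cumsum(counts))     # [0, 2, 3, 3]
print(count_to_key_value(counts))  # [[1, 2], [2, 1], [0, 0], [0, 0]]
```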

@Clorist33 Clorist33 Dec 10, 2025

Does group_list_type == 2 mean a key_value group_list, i.e. [[0, 0], [1, 2], [2, 1], [3, 0]]?

@zhoux77899 zhoux77899 Dec 11, 2025

Apologies, when group_list_type = 2, group_list will be [[1, 2], [2, 1], [0, 0], [0, 0]]. It only contains tokens of active_num but is padded to expert_num. Maybe you can compare their differences using the script below.

import torch
import torch_npu

from vllm_ascend.ops.fused_moe.moe_mlp import cumsum_group_list


class GroupListTypeTester:
    def __init__(
        self,
        batch_size: int = 1,
        hidden_size: int = 768,
        active_experts: int = 2,
        num_experts: int = 4,
    ) -> None:
        self.batch_size = batch_size
        self.hidden_size = hidden_size
        self.active_experts = active_experts
        self.num_experts = num_experts

        self.x = torch.randn(size=(self.batch_size, self.hidden_size), dtype=torch.bfloat16).npu()
        self.expert_idx = torch.randint(low=0, high=self.num_experts, size=(self.batch_size, self.active_experts), dtype=torch.int32).npu()
        self.scale = torch.randn(size=(self.batch_size, ), dtype=torch.float32).npu()
        self.offset = None

        self.init_routing_kwargs = {
            "x": self.x,
            "expert_idx": self.expert_idx,
            "scale": self.scale,
            "offset": self.offset,
            "active_num": self.active_experts,
            "expert_num": self.num_experts,
            "expert_tokens_num_flag": True,
            "quant_mode": -1,
            "active_expert_range": [0, self.num_experts],
            "row_idx_type": 0,
        }

    def __call__(self) -> None:
        count_group_list = self.output_count_group_list()
        kv_group_list = self.output_kv_group_list()
        print(f"{count_group_list=}, cumsum_group_list_from_count={cumsum_group_list(count_group_list, 1)}")
        print(f"{kv_group_list=}, cumsum_group_list_from_kv={cumsum_group_list(kv_group_list, 2, self.active_experts, self.num_experts)}")

    def output_count_group_list(self) -> torch.Tensor:
        _, _, group_list, _ = torch_npu.npu_moe_init_routing_v2(
            **self.init_routing_kwargs,
            expert_tokens_num_type=1,
        )
        return group_list

    def output_kv_group_list(self) -> torch.Tensor:
        _, _, group_list, _ = torch_npu.npu_moe_init_routing_v2(
            **self.init_routing_kwargs,
            expert_tokens_num_type=2,
        )
        return group_list


if __name__ == "__main__":
    tester = GroupListTypeTester()
    tester()

I think consolidating all types of group_list computations into a single function might be overly complex, as it requires handling 6 different cases, and many call sites would also need modifications. Splitting them into separate functions would be more reasonable.

You may also consider whether to include the group_list_type = 2 scenario. cumsum_group_list includes it as a safety redundancy, but I'm unsure what specific use case it was designed for, and I've never seen it actually used anywhere.

@github-actions
This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: tanqingshan (A) <50050625@china.huawei.com>
tanqingshan (A) added 3 commits December 11, 2025 15:18
@zzzzwwjj zzzzwwjj added the ready (read for review) and ready-for-test (start test by label for PR) labels Dec 11, 2025
  weight_scale=w1_scale,
  x_scale=pertoken_scale,
- group_list=cumsum_group_list(group_list, group_list_type),
+ group_list=group_list,
Collaborator

Need to check the range of group_list supported by the corresponding operator.

Contributor Author

Confirmed. Code updated.

tanqingshan (A) added 2 commits December 12, 2025 11:04
@wangxiyuan wangxiyuan merged commit 4984e8a into vllm-project:main Dec 12, 2025
25 checks passed
@wangxiyuan
This needs to be merged into the dev branch, so I merged it now.

wangxiyuan added a commit that referenced this pull request Dec 18, 2025
I'd like to nominate @zzzzwwjj @realliujiaxu @LCAIZJ to join vLLM Ascend
committer team.

@zzzzwwjj
---
- Review Quality:
He has completed 80+ reviews since April 2025, including
#3232 (comment),
#4822 (comment),
#4768 (comment),
all high-quality reviews.

- Sustained Contributions:
15+ valuable bug fixes and refactors.

https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Azzzzwwjj+is%3Aclosed+review%3Aapproved
Continuous optimization of code architecture

https://github.com/vllm-project/vllm-ascend/pulls?q=author%3Azzzzwwjj+is%3Amerged

- Quality Contribution:
#1229
#1979
#4359
#4878

- Community Involvement:
He led #1147, the first refactor of AscendFusedMoE.
He shared topics about large-scale distributed inference and
reinforcement learning at the vLLM-Ascend meetup on August 2nd.

@realliujiaxu
---
- Review Quality:
He has completed [40+
reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Arealliujiaxu+-author%3Arealliujiaxu+)
since September, including
#4868 (comment),
#2275 (comment).

- Sustained Contributions:
He has completed [17
commits](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Arealliujiaxu+is%3Amerged),
continuously optimizing the performance of the MoE model.

- Quality Contribution:

Contributed the Flash Comm1 feature to the community, supporting both
eager and aclgraph execution modes, while compatible with multiple MoE
models including DeepSeek and GLM4.5.
  - #3334
  - #3420
  - #3015
  
  co-author:
  - #3495
  - #4868

- Community Involvement:
1. Completed two major refactors, enabling vllm-ascend to evolve more
rapidly and robustly: [Linear
module](#2867) and
[rejection
sampler](#4975)
2. [fixed 8
bugs](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Arealliujiaxu+is%3Amerged+bugfix+)
in graph mode, spec decoding and async scheduling.

@LCAIZJ
---
- Review Quality: He's been the go-to reviewer for virtually all PD
disaggregation and KV Pool related PRs, having completed [30+
reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3ALCAIZJ+is%3Aopen+-author%3ALCAIZJ+)
since May 2025. Notable examples include
[discussion_r2553887360](#4345 (comment)),
[issuecomment-3540994801](#4161 (comment)),
and
[discussion_r2492593988](#3981 (comment)),
all demonstrating thorough and insightful feedback.
- Sustained and Quality Contributions: His contributions reflect a
strong grasp of both the vLLM and vLLM Ascend codebases, particularly in
prefill-decode disaggregation and KV pool areas ([7 PRs
merged](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3ALCAIZJ+is%3Amerged+)).
Prefill-Decode Disaggregation: Delivered KV transfer functionality using
Mooncake TransferEngine and enabled layerwise KV transfer
#1568
#2602
KV Pool: Developed the foundational KV Pool infrastructure and migrated
it to the latest ADXL stack
#2913
#3350
- Quality Contribution:
#1568
#2602
#2913
#3350
- Community Involvement:
He actively responds to [community
issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20commenter%3ALCAIZJ%20is%3Aopen%20-author%3ALCAIZJ),
continuously monitors functionality and accuracy issues related to PD
disaggregation and KV Pool, and proactively delivers [bug
fixes](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3ALCAIZJ+is%3Amerged+bugfix).
- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
chenaoxuan pushed a commit to chenaoxuan/vllm-ascend that referenced this pull request Dec 20, 2025
…t#5152)
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
…t#5152)

Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
…t#5152)

Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
Labels

module:ops, module:tests, ready (read for review), ready-for-test (start test by label for PR)
