[Bugfix] bugfix for moe_mlp#4822
Conversation
There was a problem hiding this comment.
Code Review
This pull request addresses a bug in the moe_mlp module by correctly converting the group_list argument from a cumulative sum to counts before passing it to torch_npu.npu_dequant_swiglu_quant. The logic for this conversion is sound. However, I've identified a potential edge case in the new code: it doesn't handle an empty group_list, which could lead to a runtime error. I have provided a suggestion to make the implementation more robust against this scenario.
vllm_ascend/ops/fused_moe/moe_mlp.py
Outdated
| new_group = torch.cat([group_list[0].unsqueeze(0), group_diff], | ||
| dim=0) |
There was a problem hiding this comment.
Using group_list[0].unsqueeze(0) will raise an IndexError if group_list is an empty tensor. This can occur if no tokens are routed to experts on the current device. Using slicing group_list[:1] is more robust as it returns an empty tensor for an empty input, preventing a crash.
new_group = torch.cat([group_list[:1], group_diff], dim=0)|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
vllm_ascend/ops/fused_moe/moe_mlp.py
Outdated
| group_diff = torch.diff(group_list, dim=0) | ||
| new_group = torch.cat([group_list[0].unsqueeze(0), group_diff], | ||
| dim=0) |
There was a problem hiding this comment.
It’s better to first check if there are cases where group_list_type != 1. You can extract a function to convert other types of group_list into count format, taking group_list and group_list_type as parameters, and determine how to perform the conversion within the function (refer to the cumsum_group_list function).
def count_group_list(group_list: torch.Tensor,
group_list_type: int) -> torch.Tensor
if group_list_type not in [0, 1, 2]:
raise ValueError(
f"group_list_type should be in [0, 1, 2], but received {group_list_type}"
)
if group_list_type == 0:
return torch.cat((group_list[:1], torch.diff(group_list)))
if group_list_type == 1:
return group_list
# group_list_type == 2
...There was a problem hiding this comment.
Thanks for your suggestions. We have incorporated your proposal into the cumsum_group_list function. Additionally, could you please clarify what scenario corresponds to group_list_type == 2?
There was a problem hiding this comment.
The changes are here:
def cumsum_group_list(group_list: torch.Tensor,
src_list_type: int,
dst_list_type: int,
active_num: int = 0,
expert_num: int = 0) -> torch.Tensor:
if src_list_type not in [0, 1, 2]:
raise ValueError(
f"group_list_type should be in [0, 1, 2], but received {src_list_type}"
)
if src_list_type == dst_list_type:
return group_list
if src_list_type == 1 and dst_list_type == 0:
return group_list.cumsum(dim=0)
if src_list_type == 0 and dst_list_type == 1:
group_diff = torch.diff(group_list)
new_group = torch.cat([group_diff[0].unsqueeze(0), group_diff], dim=0)
return new_group
experts = pad(group_list[:, 0], (1, 0))
tokens = pad(group_list[:, 1].cumsum(dim=0), (1, 0))
cumsum_group_list = torch.full(size=(expert_num, ),
fill_value=active_num,
dtype=group_list.dtype,
device=group_list.device)
for i, (start, end) in enumerate(zip(experts[:-1], experts[1:])):
if end > start:
cumsum_group_list[start:end] = tokens[i]
return cumsum_group_list
vllm_ascend/ops/fused_moe/moe_mlp.py
Outdated
| quant_scale=None, | ||
| quant_offset=None, | ||
| group_index=group_list, | ||
| group_index=new_group, |
There was a problem hiding this comment.
And use the function here.
| group_index=new_group, | |
| group_index=count_group_list(group_list, group_list_type) |
| active_num: int = 0, | ||
| expert_num: int = 0) -> torch.Tensor: | ||
| if group_list_type not in [0, 1, 2]: | ||
| if src_list_type not in [0, 1, 2]: |
There was a problem hiding this comment.
what meanings src_list_type==2? How to handle src_list_type=2 and dst_list_type=0?
There was a problem hiding this comment.
We also have the same confusion regarding the scenario where src_list_type == 2 in the file ops/fused_moe/moe_mlp.py on the main branch of the vllm-ascend repository. @zhoux77899 Would you please clarify this point ?
There was a problem hiding this comment.
Some ops like moe_init_routing_v2 can output the type 2 (key_value) group_list, but I’ve never seen this type of group_list actually used anywhere.
If there is a type 1 group_list like [0, 2, 1, 0]:
group_list_type = 0means cumsumgroup_list, it will be[0, 2, 3, 3];group_list_type = 1means countgroup_list, it will be[0, 2, 1, 0];group_list_type = 2means key_valuegroup_list, it will be[[1, 2], [2, 1], [0, 0], [0, 0]];
There was a problem hiding this comment.
group_list_type == 2 means key_value group_list, it will be [[0, 0], [1, 2], [2, 1], [3, 0]]?
There was a problem hiding this comment.
Apologies, when group_list_type = 2, group_list will be [[1, 2], [2, 1], [0, 0], [0, 0]]. It only contains tokens of active_num but is padded to expert_num. Maybe you can compare their differences using the script below.
import torch
import torch_npu
from vllm_ascend.ops.fused_moe.moe_mlp import cumsum_group_list
class GroupListTypeTester:
def __init__(
self,
batch_size: int = 1,
hidden_size: int = 768,
active_experts: int = 2,
num_experts: int = 4,
) -> None:
self.batch_size = batch_size
self.hidden_size = hidden_size
self.active_experts = active_experts
self.num_experts = num_experts
self.x = torch.randn(size=(self.batch_size, self.hidden_size), dtype=torch.bfloat16).npu()
self.expert_idx = torch.randint(low=0, high=self.num_experts, size=(self.batch_size, self.active_experts), dtype=torch.int32).npu()
self.scale = torch.randn(size=(self.batch_size, ), dtype=torch.float32).npu()
self.offset = None
self.init_routing_kwargs = {
"x": self.x,
"expert_idx": self.expert_idx,
"scale": self.scale,
"offset": self.offset,
"active_num": self.active_experts,
"expert_num": self.num_experts,
"expert_tokens_num_flag": True,
"quant_mode": -1,
"active_expert_range": [0, self.num_experts],
"row_idx_type": 0,
}
def __call__(self) -> None:
count_group_list = self.output_count_group_list()
kv_group_list = self.output_kv_group_list()
print(f"{count_group_list=}, cumsum_group_list_from_count={cumsum_group_list(count_group_list, 1)}")
print(f"{kv_group_list=}, cumsum_group_list_from_kv={cumsum_group_list(kv_group_list, 2, self.active_experts, self.num_experts)}")
def output_count_group_list(self) -> torch.Tensor:
_, _, group_list, _ = torch_npu.npu_moe_init_routing_v2(
**self.init_routing_kwargs,
expert_tokens_num_type=1,
)
return group_list
def output_kv_group_list(self) -> torch.Tensor:
_, _, group_list, _ = torch_npu.npu_moe_init_routing_v2(
**self.init_routing_kwargs,
expert_tokens_num_type=2,
)
return group_list
if __name__ == "__main__":
tester = GroupListTypeTester()
tester()I think consolidating all types of group_list computations into a single function might be overly complex, as it requires handling 6 different cases, and many call sites would also need modifications. Splitting them into separate functions would be more reasonable.
You may also consider whether to include the group_list_type = 2 scenario. The reason cumsum_group_list includes it is for safety redundancy, but I’m unsure about the specific use case it was designed for and I’ve never seen actually uses it at anywhere.
fb6a23c to
d4b6cc6
Compare
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
Signed-off-by: tanqingshan (A) <50050625@china.huawei.com> Signed-off-by: tanqingshan (A) <50050625@china.huawei.com>
760351f to
db29fda
Compare
Signed-off-by: tanqingshan (A) <50050625@china.huawei.com> Signed-off-by: tanqingshan (A) <50050625@china.huawei.com>
Signed-off-by: tanqingshan (A) <50050625@china.huawei.com> Signed-off-by: tanqingshan (A) <50050625@china.huawei.com>
…m-ascend into bugfix_moe_mlp_new
vllm_ascend/ops/fused_moe/moe_mlp.py
Outdated
| weight_scale=w1_scale, | ||
| x_scale=pertoken_scale, | ||
| group_list=cumsum_group_list(group_list, group_list_type), | ||
| group_list=group_list, |
There was a problem hiding this comment.
Need to check the range of group_list supported by the corresponding operator.
There was a problem hiding this comment.
Confirmed. Code updated.
Signed-off-by: tanqingshan (A) <50050625@china.huawei.com> Signed-off-by: tanqingshan (A) <50050625@china.huawei.com>
Signed-off-by: tanqingshan (A) <50050625@china.huawei.com> Signed-off-by: tanqingshan (A) <50050625@china.huawei.com>
|
This need be merged to dev branch. So I merged this now. |
I'd like to nominate @zzzzwwjj @realliujiaxu @LCAIZJ to join vLLM Ascend committer team. @zzzzwwjj --- - Review Quality: He has completed 80+reviews since April. 2025, include #3232 (comment), #4822 (comment), #4768 (comment) high quality review. - Sustained Contributions 15+ Valuable bug fix and refactor is very good. https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Azzzzwwjj+is%3Aclosed+review%3Aapproved Continuous optimization of code architecture https://github.com/vllm-project/vllm-ascend/pulls?q=author%3Azzzzwwjj+is%3Amerged - Quality Contribution: #1229 #1979 #4359 #4878 - Community Involvement: He lead the #1147, to refactor AscendFusedMoE at the first time. He shared topics about large-scale distributed inference and reinforcement learning on vLLM-Ascend meetup on August 2nd. @realliujiaxu --- - Review Quality: He has completed about [40+ reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Arealliujiaxu+-author%3Arealliujiaxu+) since September, include #4868 (comment), #2275 (comment). - Sustained Contributions He has completed (17 commits)[https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Arealliujiaxu+is%3Amerged], continuously optimizing the performance of the MoE model. - Quality Contribution: Contributed the Flash Comm1 feature to the community, supporting both eager and aclgraph execution modes, while compatible with multiple MoE models including DeepSeek and GLM4.5. - #3334 - #3420 - #3015 co-author: - #3495 - #4868 - Community Involvement: 1. Completed two major refactors, enabling vllm-ascend to evolve more rapidly and robustly: [Linear module](#2867) and [rejection sampler](#4975) 2. [fixed 8 bugs](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Arealliujiaxu+is%3Amerged+bugfix+) in graph mode, spec decoding and async scheduling. @LCAIZJ --- - Review Quality: He's been the go-to reviewer for virtually all PD disaggregation and KV Pool related PRs, having completed [30+ reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3ALCAIZJ+is%3Aopen+-author%3ALCAIZJ+) since May 2025. Notable examples include [discussion_r2553887360](#4345 (comment)), [issuecomment-3540994801](#4161 (comment)), and [discussion_r2492593988](#3981 (comment)), all demonstrating thorough and insightful feedback. - Sustained and Quality Contributions: His contributions reflect a strong grasp of both vLLM and vLLM Ascend codebases, particularly in prefill-decode disaggregation and KV pool areas ([7 PRs merged](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3ALCAIZJ+is%3Amerged+)). Prefill-Decode Disaggregation: Delivered KV transfer functionality using Mooncake TransferEngine and enabled layerwise KV transfer #1568 #2602 KV Pool: Developed the foundational KV Pool infrastructure and migrated it to the latest ADXL stack #2913 #3350 - Quality Contribution: #1568 #2602 #2913 #3350 - Community Involvement: He actively responds to [community issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20commenter%3ALCAIZJ%20is%3Aopen%20-author%3ALCAIZJ), continuously monitors functionality and accuracy issues related to PD disaggregation and KV Pool, and proactively delivers [bug fixes](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3ALCAIZJ+is%3Amerged+bugfix). - vLLM version: v0.12.0 - vLLM main: vllm-project/vllm@ad32e3e Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
…t#5152) I'd like to nominate @zzzzwwjj @realliujiaxu @LCAIZJ to join vLLM Ascend committer team. @zzzzwwjj --- - Review Quality: He has completed 80+reviews since April. 2025, include vllm-project#3232 (comment), vllm-project#4822 (comment), vllm-project#4768 (comment) high quality review. - Sustained Contributions 15+ Valuable bug fix and refactor is very good. https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Azzzzwwjj+is%3Aclosed+review%3Aapproved Continuous optimization of code architecture https://github.com/vllm-project/vllm-ascend/pulls?q=author%3Azzzzwwjj+is%3Amerged - Quality Contribution: vllm-project#1229 vllm-project#1979 vllm-project#4359 vllm-project#4878 - Community Involvement: He lead the vllm-project#1147, to refactor AscendFusedMoE at the first time. He shared topics about large-scale distributed inference and reinforcement learning on vLLM-Ascend meetup on August 2nd. @realliujiaxu --- - Review Quality: He has completed about [40+ reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Arealliujiaxu+-author%3Arealliujiaxu+) since September, include vllm-project#4868 (comment), vllm-project#2275 (comment). - Sustained Contributions He has completed (17 commits)[https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Arealliujiaxu+is%3Amerged], continuously optimizing the performance of the MoE model. - Quality Contribution: Contributed the Flash Comm1 feature to the community, supporting both eager and aclgraph execution modes, while compatible with multiple MoE models including DeepSeek and GLM4.5. - vllm-project#3334 - vllm-project#3420 - vllm-project#3015 co-author: - vllm-project#3495 - vllm-project#4868 - Community Involvement: 1. Completed two major refactors, enabling vllm-ascend to evolve more rapidly and robustly: [Linear module](vllm-project#2867) and [rejection sampler](vllm-project#4975) 2. [fixed 8 bugs](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Arealliujiaxu+is%3Amerged+bugfix+) in graph mode, spec decoding and async scheduling. @LCAIZJ --- - Review Quality: He's been the go-to reviewer for virtually all PD disaggregation and KV Pool related PRs, having completed [30+ reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3ALCAIZJ+is%3Aopen+-author%3ALCAIZJ+) since May 2025. Notable examples include [discussion_r2553887360](vllm-project#4345 (comment)), [issuecomment-3540994801](vllm-project#4161 (comment)), and [discussion_r2492593988](vllm-project#3981 (comment)), all demonstrating thorough and insightful feedback. - Sustained and Quality Contributions: His contributions reflect a strong grasp of both vLLM and vLLM Ascend codebases, particularly in prefill-decode disaggregation and KV pool areas ([7 PRs merged](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3ALCAIZJ+is%3Amerged+)). Prefill-Decode Disaggregation: Delivered KV transfer functionality using Mooncake TransferEngine and enabled layerwise KV transfer vllm-project#1568 vllm-project#2602 KV Pool: Developed the foundational KV Pool infrastructure and migrated it to the latest ADXL stack vllm-project#2913 vllm-project#3350 - Quality Contribution: vllm-project#1568 vllm-project#2602 vllm-project#2913 vllm-project#3350 - Community Involvement: He actively responds to [community issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20commenter%3ALCAIZJ%20is%3Aopen%20-author%3ALCAIZJ), continuously monitors functionality and accuracy issues related to PD disaggregation and KV Pool, and proactively delivers [bug fixes](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3ALCAIZJ+is%3Amerged+bugfix). - vLLM version: v0.12.0 - vLLM main: vllm-project/vllm@ad32e3e Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
…t#5152) I'd like to nominate @zzzzwwjj @realliujiaxu @LCAIZJ to join vLLM Ascend committer team. @zzzzwwjj --- - Review Quality: He has completed 80+reviews since April. 2025, include vllm-project#3232 (comment), vllm-project#4822 (comment), vllm-project#4768 (comment) high quality review. - Sustained Contributions 15+ Valuable bug fix and refactor is very good. https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Azzzzwwjj+is%3Aclosed+review%3Aapproved Continuous optimization of code architecture https://github.com/vllm-project/vllm-ascend/pulls?q=author%3Azzzzwwjj+is%3Amerged - Quality Contribution: vllm-project#1229 vllm-project#1979 vllm-project#4359 vllm-project#4878 - Community Involvement: He lead the vllm-project#1147, to refactor AscendFusedMoE at the first time. He shared topics about large-scale distributed inference and reinforcement learning on vLLM-Ascend meetup on August 2nd. @realliujiaxu --- - Review Quality: He has completed about [40+ reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Arealliujiaxu+-author%3Arealliujiaxu+) since September, include vllm-project#4868 (comment), vllm-project#2275 (comment). - Sustained Contributions He has completed (17 commits)[https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Arealliujiaxu+is%3Amerged], continuously optimizing the performance of the MoE model. - Quality Contribution: Contributed the Flash Comm1 feature to the community, supporting both eager and aclgraph execution modes, while compatible with multiple MoE models including DeepSeek and GLM4.5. - vllm-project#3334 - vllm-project#3420 - vllm-project#3015 co-author: - vllm-project#3495 - vllm-project#4868 - Community Involvement: 1. Completed two major refactors, enabling vllm-ascend to evolve more rapidly and robustly: [Linear module](vllm-project#2867) and [rejection sampler](vllm-project#4975) 2. [fixed 8 bugs](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Arealliujiaxu+is%3Amerged+bugfix+) in graph mode, spec decoding and async scheduling. @LCAIZJ --- - Review Quality: He's been the go-to reviewer for virtually all PD disaggregation and KV Pool related PRs, having completed [30+ reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3ALCAIZJ+is%3Aopen+-author%3ALCAIZJ+) since May 2025. Notable examples include [discussion_r2553887360](vllm-project#4345 (comment)), [issuecomment-3540994801](vllm-project#4161 (comment)), and [discussion_r2492593988](vllm-project#3981 (comment)), all demonstrating thorough and insightful feedback. - Sustained and Quality Contributions: His contributions reflect a strong grasp of both vLLM and vLLM Ascend codebases, particularly in prefill-decode disaggregation and KV pool areas ([7 PRs merged](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3ALCAIZJ+is%3Amerged+)). Prefill-Decode Disaggregation: Delivered KV transfer functionality using Mooncake TransferEngine and enabled layerwise KV transfer vllm-project#1568 vllm-project#2602 KV Pool: Developed the foundational KV Pool infrastructure and migrated it to the latest ADXL stack vllm-project#2913 vllm-project#3350 - Quality Contribution: vllm-project#1568 vllm-project#2602 vllm-project#2913 vllm-project#3350 - Community Involvement: He actively responds to [community issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20commenter%3ALCAIZJ%20is%3Aopen%20-author%3ALCAIZJ), continuously monitors functionality and accuracy issues related to PD disaggregation and KV Pool, and proactively delivers [bug fixes](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3ALCAIZJ+is%3Amerged+bugfix). - vLLM version: v0.12.0 - vLLM main: vllm-project/vllm@ad32e3e Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
…t#5152) I'd like to nominate @zzzzwwjj @realliujiaxu @LCAIZJ to join vLLM Ascend committer team. @zzzzwwjj --- - Review Quality: He has completed 80+reviews since April. 2025, include vllm-project#3232 (comment), vllm-project#4822 (comment), vllm-project#4768 (comment) high quality review. - Sustained Contributions 15+ Valuable bug fix and refactor is very good. https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Azzzzwwjj+is%3Aclosed+review%3Aapproved Continuous optimization of code architecture https://github.com/vllm-project/vllm-ascend/pulls?q=author%3Azzzzwwjj+is%3Amerged - Quality Contribution: vllm-project#1229 vllm-project#1979 vllm-project#4359 vllm-project#4878 - Community Involvement: He lead the vllm-project#1147, to refactor AscendFusedMoE at the first time. He shared topics about large-scale distributed inference and reinforcement learning on vLLM-Ascend meetup on August 2nd. @realliujiaxu --- - Review Quality: He has completed about [40+ reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Arealliujiaxu+-author%3Arealliujiaxu+) since September, include vllm-project#4868 (comment), vllm-project#2275 (comment). - Sustained Contributions He has completed (17 commits)[https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Arealliujiaxu+is%3Amerged], continuously optimizing the performance of the MoE model. - Quality Contribution: Contributed the Flash Comm1 feature to the community, supporting both eager and aclgraph execution modes, while compatible with multiple MoE models including DeepSeek and GLM4.5. - vllm-project#3334 - vllm-project#3420 - vllm-project#3015 co-author: - vllm-project#3495 - vllm-project#4868 - Community Involvement: 1. Completed two major refactors, enabling vllm-ascend to evolve more rapidly and robustly: [Linear module](vllm-project#2867) and [rejection sampler](vllm-project#4975) 2. [fixed 8 bugs](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Arealliujiaxu+is%3Amerged+bugfix+) in graph mode, spec decoding and async scheduling. @LCAIZJ --- - Review Quality: He's been the go-to reviewer for virtually all PD disaggregation and KV Pool related PRs, having completed [30+ reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3ALCAIZJ+is%3Aopen+-author%3ALCAIZJ+) since May 2025. Notable examples include [discussion_r2553887360](vllm-project#4345 (comment)), [issuecomment-3540994801](vllm-project#4161 (comment)), and [discussion_r2492593988](vllm-project#3981 (comment)), all demonstrating thorough and insightful feedback. - Sustained and Quality Contributions: His contributions reflect a strong grasp of both vLLM and vLLM Ascend codebases, particularly in prefill-decode disaggregation and KV pool areas ([7 PRs merged](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3ALCAIZJ+is%3Amerged+)). Prefill-Decode Disaggregation: Delivered KV transfer functionality using Mooncake TransferEngine and enabled layerwise KV transfer vllm-project#1568 vllm-project#2602 KV Pool: Developed the foundational KV Pool infrastructure and migrated it to the latest ADXL stack vllm-project#2913 vllm-project#3350 - Quality Contribution: vllm-project#1568 vllm-project#2602 vllm-project#2913 vllm-project#3350 - Community Involvement: He actively responds to [community issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20commenter%3ALCAIZJ%20is%3Aopen%20-author%3ALCAIZJ), continuously monitors functionality and accuracy issues related to PD disaggregation and KV Pool, and proactively delivers [bug fixes](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3ALCAIZJ+is%3Amerged+bugfix). - vLLM version: v0.12.0 - vLLM main: vllm-project/vllm@ad32e3e Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
What this PR does / why we need it?
This PR fixes a bug in the moe_mlp module by correcting the arguments passed to the torch_npu.npu_dequant_swiglu_quant function.It properly converts group_list from a cumulative sum to counts for the group_index parameter.
Does this PR introduce any user-facing change?
No