[OP] add custom op aclnnMoeInitRoutingCustom #5251

Merged: zzzzwwjj merged 2 commits into vllm-project:main from jiazhengyi:jia-init-routing on Dec 29, 2025
Conversation


@jiazhengyi (Contributor) commented on Dec 22, 2025

What this PR does / why we need it?

This pull request introduces a new custom operator aclnnMoeInitRoutingCustom for Mixture-of-Experts models.
It can be replaced by aclnnMoeInitRoutingV3 once CANN 8.5 becomes available.

Does this PR introduce any user-facing change?

No.

How was this patch tested?


@gemini-code-assist bot left a comment


Code Review

This pull request introduces a new custom operator moe_init_routing_custom for Mixture-of-Experts models. The changes are extensive, covering the operator's definition, implementation for both host and device, shape inference, tiling logic, and Torch bindings. I've identified a few critical issues related to attribute validation and shape inference that could lead to runtime errors or incorrect behavior. Additionally, there's a bug in the golden implementation within the test file that could affect test accuracy. My review provides suggestions to address these points to improve the robustness and correctness of the new operator.

Comment on lines +124 to +127
if (activeNum < 0) {
    OPS_LOG_E(context->GetNodeName(), "The active_num should be greater than or equal to 0. But it is %ld.", activeNum);
    return ge::GRAPH_FAILED;
}

critical

The validation for active_num is overly strict. The default value for this attribute is -1, which is a standard way to indicate a dynamic dimension. The current check activeNum < 0 incorrectly flags this default value as an error when dropPadMode is NO_DROP_PAD, preventing the operator from being used with dynamic shapes. Please adjust the check to permit the value -1, for instance, by changing it to activeNum < -1.

        if (activeNum < -1) {
            OPS_LOG_E(context->GetNodeName(), "The active_num should be greater than or equal to -1. But it is %ld.", activeNum);
            return ge::GRAPH_FAILED;
        }
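
The sentinel convention the review describes can be sketched in a few lines of Python (the helper name is hypothetical, not part of the operator code):

```python
# A minimal sketch of the suggested check. -1 is the attribute's default and
# acts as a sentinel for "dynamic / no cap", so only values below -1 are invalid.
def validate_active_num(active_num: int) -> None:
    if active_num < -1:
        raise ValueError(
            f"The active_num should be greater than or equal to -1. But it is {active_num}."
        )

validate_active_num(-1)   # default sentinel: accepted
validate_active_num(128)  # explicit cap: accepted
```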

Comment on lines +607 to +616
if (expertTokenNumFlag) {
    if (expertTokenNumType == ExpertTokenNumType::KEY_VALUE) {
        expertTokenCumsumOrCountShape->SetDimNum(DIM_TWO);
        expertTokenCumsumOrCountShape->SetDim(0U, experNum);
        expertTokenCumsumOrCountShape->SetDim(DIM_ONE, KEY_VALUE_MODE_DIM0_NUM);
    } else {
        expertTokenCumsumOrCountShape->SetDimNum(DIM_ONE);
        expertTokenCumsumOrCountShape->SetDim(0U, expertEnd - expertStart);
    }
}

high

The shape for the expert_tokens_count_or_cumsum output is only set when expertTokenNumFlag is true. As this is a required output, its shape must be defined in all scenarios to prevent graph compilation failures. When the flag is false, you should set a valid shape, such as [0], to represent an empty tensor.

    if(expertTokenNumFlag){
        if (expertTokenNumType == ExpertTokenNumType::KEY_VALUE) {
            expertTokenCumsumOrCountShape->SetDimNum(DIM_TWO);
            expertTokenCumsumOrCountShape->SetDim(0U, experNum);
            expertTokenCumsumOrCountShape->SetDim(DIM_ONE, KEY_VALUE_MODE_DIM0_NUM);
        } else {
            expertTokenCumsumOrCountShape->SetDimNum(DIM_ONE);
            expertTokenCumsumOrCountShape->SetDim(0U, expertEnd - expertStart);
        }
    } else {
        expertTokenCumsumOrCountShape->SetDimNum(DIM_ONE);
        expertTokenCumsumOrCountShape->SetDim(0U, 0);
    }
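
To make the three branches concrete, here is a small Python sketch of the shape rule the suggestion encodes (names and the `[0]` empty-shape fallback follow the review comment, not the actual kernel code):

```python
# Illustrative shape rule for the expert_tokens_count_or_cumsum output.
KEY_VALUE_MODE_DIM0_NUM = 2

def expert_tokens_output_shape(expert_token_num_flag: bool,
                               key_value_mode: bool,
                               expert_num: int,
                               expert_start: int,
                               expert_end: int) -> list:
    if not expert_token_num_flag:
        return [0]  # required output: empty but still a valid shape
    if key_value_mode:
        # one (expert, count) pair per expert
        return [expert_num, KEY_VALUE_MODE_DIM0_NUM]
    # cumsum/count over the local expert range
    return [expert_end - expert_start]
```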

Comment thread csrc/torch_binding.cpp
Comment on lines +1193 to +1199
if (expert_tokens_num_type >= CUMSUM && expert_tokens_num_type <= COUNT) {
    // expert_tokens_count_or_cumsum in [end-start, ]
    expert_tokens_count_or_cumsum = at::empty({expert_length}, x.options().dtype(at::kLong));
} else if (expert_tokens_num_type == KEY_VALUE) {
    // key_value in [2, end-start]
    expert_tokens_count_or_cumsum = at::empty({expert_num, 2}, x.options().dtype(at::kLong));
}

high

The expert_tokens_num_type parameter is not being validated. An invalid value could result in the expert_tokens_count_or_cumsum tensor not being initialized, leading to a runtime crash. It's important to add a TORCH_CHECK to confirm that expert_tokens_num_type is within the valid range of {0, 1, 2}, mirroring the validation in infershape.cpp.

    TORCH_CHECK(expert_tokens_num_type >= CUMSUM && expert_tokens_num_type <= KEY_VALUE, "expert_tokens_num_type must be one of {0, 1, 2}, but got ", expert_tokens_num_type);
    if (expert_tokens_num_type >= CUMSUM && expert_tokens_num_type <= COUNT) {
        // expert_tokens_count_or_cumsum in [end-start, ]
        expert_tokens_count_or_cumsum = at::empty({expert_length}, x.options().dtype(at::kLong));
    } else if (expert_tokens_num_type == KEY_VALUE) {
        // key_value in [2, end-start]
        expert_tokens_count_or_cumsum = at::empty({expert_num, 2}, x.options().dtype(at::kLong));
    }
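
The same guard can be sketched in plain Python as a stand-in for `TORCH_CHECK` (the enum values 0/1/2 are assumed from the "{0, 1, 2}" range cited in the review):

```python
# Plain-Python stand-in for the suggested TORCH_CHECK. CUMSUM=0, COUNT=1,
# KEY_VALUE=2 are assumed values matching the review's "{0, 1, 2}" range.
CUMSUM, COUNT, KEY_VALUE = 0, 1, 2

def check_expert_tokens_num_type(value: int) -> None:
    if not (CUMSUM <= value <= KEY_VALUE):
        raise ValueError(
            f"expert_tokens_num_type must be one of {{0, 1, 2}}, but got {value}"
        )

for valid in (CUMSUM, COUNT, KEY_VALUE):
    check_expert_tokens_num_type(valid)  # no exception inside the valid range
```

Validating before the if/else chain means an out-of-range value fails loudly instead of silently leaving the output tensor uninitialized.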

Comment on lines +357 to +363
if (expert_tokens_num_type >= CUMSUM && expert_tokens_num_type <= COUNT) {
    // expert_tokens_count_or_cumsum in [end-start, ]
    expert_tokens_count_or_cumsum = at::empty({expert_length}, x.options().dtype(at::kLong));
} else if (expert_tokens_num_type == KEY_VALUE) {
    // key_value in [2, end-start]
    expert_tokens_count_or_cumsum = at::empty({expert_num, 2}, x.options().dtype(at::kLong));
}

high

Similar to the main binding, this meta function is missing validation for expert_tokens_num_type. An invalid value will cause an uninitialized tensor to be returned, which can lead to issues during model tracing or exporting. Please add a TORCH_CHECK to ensure expert_tokens_num_type is valid.

    TORCH_CHECK(expert_tokens_num_type >= CUMSUM && expert_tokens_num_type <= KEY_VALUE, "expert_tokens_num_type must be one of {0, 1, 2}, but got ", expert_tokens_num_type);
    if (expert_tokens_num_type >= CUMSUM && expert_tokens_num_type <= COUNT) {
        // expert_tokens_count_or_cumsum in [end-start, ]
        expert_tokens_count_or_cumsum = at::empty({expert_length}, x.options().dtype(at::kLong));
    } else if (expert_tokens_num_type == KEY_VALUE) {
        // key_value in [2, end-start]
        expert_tokens_count_or_cumsum = at::empty({expert_num, 2}, x.options().dtype(at::kLong));
    }

expanded_x = x_final / expanded_scale
expanded_x = np.round(expanded_x).astype(np.int8)
if x.dtype == np.int8:
    expanded_scale == None

high

The expression expanded_scale == None is a comparison and does not perform an assignment. This appears to be a bug in the golden implementation, as the likely intent was to assign None to expanded_scale (expanded_scale = None). This could lead to incorrect test validations if expanded_scale retains a value from a previous execution path.

Suggested change
expanded_scale == None
expanded_scale = None
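
The difference is easy to demonstrate in isolation: `==` evaluates to a boolean that is immediately discarded, while `=` rebinds the name.

```python
# Minimal reproduction of the bug class: `==` compares and discards the result,
# so the variable silently keeps its previous value.
expanded_scale = [1.0, 2.0]

expanded_scale == None               # comparison: evaluates to False, discarded
assert expanded_scale == [1.0, 2.0]  # unchanged

expanded_scale = None                # assignment: actually rebinds the name
assert expanded_scale is None
```

Linters flag this pattern (comparison to `None` used as a statement), which is one more reason to run the standard checks on golden/test code as well.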

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by fulfilling the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@ChenxiQ ChenxiQ force-pushed the jia-init-routing branch 4 times, most recently from eaeafbe to 5c50197 Compare December 23, 2025 12:29
@jiazhengyi jiazhengyi changed the title [OPs] add custom op moe_init_routing_custom [OP] add custom op moe_init_routing_custom Dec 23, 2025
@jiazhengyi jiazhengyi changed the title [OP] add custom op moe_init_routing_custom [OP] add custom op aclnnMoeInitRoutingCustom Dec 23, 2025
@ChenxiQ ChenxiQ force-pushed the jia-init-routing branch 5 times, most recently from 3a0d47e to e1d0f70 Compare December 23, 2025 17:49
jiazhengyi and others added 2 commits December 29, 2025 19:29
Signed-off-by: jiazhengyi <jiazhengyi@huawei.com>
Signed-off-by: Chenxi Qian <chenxi.qian.cq@outlook.com>
@zzzzwwjj zzzzwwjj merged commit d5f7283 into vllm-project:main Dec 29, 2025
9 of 11 checks passed
@Yikun Yikun mentioned this pull request Dec 30, 2025
1 task
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Dec 31, 2025
shenchuxiaofugui pushed a commit to shenchuxiaofugui/vllm-ascend that referenced this pull request Dec 31, 2025
zzzzwwjj added a commit that referenced this pull request Jan 9, 2026
### What this PR does / why we need it?
This PR enables custom op `aclnnMoeInitRoutingCustom` introduced in PR
#5251

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- vLLM version: release/v0.13.0
- vLLM main:
vllm-project/vllm@bc0a5a0

---------

Signed-off-by: QianChenxi <chenxi.qian.cq@outlook.com>
Signed-off-by: zzzzwwjj <1183291235@qq.com>
Co-authored-by: zzzzwwjj <1183291235@qq.com>
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 10, 2026
guanguan0308 pushed a commit to guanguan0308/vllm-ascend that referenced this pull request Jan 13, 2026
guanguan0308 pushed a commit to guanguan0308/vllm-ascend that referenced this pull request Jan 13, 2026
aipaes pushed a commit to aipaes/vllm-ascend that referenced this pull request Jan 15, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026