[OP] add custom op aclnnMoeInitRoutingCustom #5251

Merged: zzzzwwjj merged 2 commits into vllm-project:main from jiazhengyi:jia-init-routing on Dec 29, 2025
Conversation


@jiazhengyi (Contributor) commented on Dec 22, 2025

What this PR does / why we need it?

This pull request introduces a new custom operator aclnnMoeInitRoutingCustom for Mixture-of-Experts models.
It can be replaced by aclnnMoeInitRoutingV3 once CANN 8.5 becomes available.

Does this PR introduce any user-facing change?

No.

How was this patch tested?


@gemini-code-assist bot left a comment


Code Review

This pull request introduces a new custom operator moe_init_routing_custom for Mixture-of-Experts models. The changes are extensive, covering the operator's definition, implementation for both host and device, shape inference, tiling logic, and Torch bindings. I've identified a few critical issues related to attribute validation and shape inference that could lead to runtime errors or incorrect behavior. Additionally, there's a bug in the golden implementation within the test file that could affect test accuracy. My review provides suggestions to address these points to improve the robustness and correctness of the new operator.

Comment on lines +124 to +127
if (activeNum < 0) {
    OPS_LOG_E(context->GetNodeName(), "The active_num should be greater than or equal to 0. But it is %ld.", activeNum);
    return ge::GRAPH_FAILED;
}

critical

The validation for active_num is overly strict. The default value for this attribute is -1, which is a standard way to indicate a dynamic dimension. The current check activeNum < 0 incorrectly flags this default value as an error when dropPadMode is NO_DROP_PAD, preventing the operator from being used with dynamic shapes. Please adjust the check to permit the value -1, for instance, by changing it to activeNum < -1.

        if (activeNum < -1) {
            OPS_LOG_E(context->GetNodeName(), "The active_num should be greater than or equal to -1. But it is %ld.", activeNum);
            return ge::GRAPH_FAILED;
        }
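
The sentinel convention the review describes can be sketched in a few lines of Python (the helper name is hypothetical, not part of the operator code):

```python
# A minimal sketch of the suggested check. -1 is the attribute's default and
# acts as a sentinel for "dynamic / no cap", so only values below -1 are invalid.
def validate_active_num(active_num: int) -> None:
    if active_num < -1:
        raise ValueError(
            f"The active_num should be greater than or equal to -1. But it is {active_num}."
        )

validate_active_num(-1)   # default sentinel: accepted
validate_active_num(128)  # explicit cap: accepted
```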

Comment on lines +607 to +616
if (expertTokenNumFlag) {
    if (expertTokenNumType == ExpertTokenNumType::KEY_VALUE) {
        expertTokenCumsumOrCountShape->SetDimNum(DIM_TWO);
        expertTokenCumsumOrCountShape->SetDim(0U, experNum);
        expertTokenCumsumOrCountShape->SetDim(DIM_ONE, KEY_VALUE_MODE_DIM0_NUM);
    } else {
        expertTokenCumsumOrCountShape->SetDimNum(DIM_ONE);
        expertTokenCumsumOrCountShape->SetDim(0U, expertEnd - expertStart);
    }
}

high

The shape for the expert_tokens_count_or_cumsum output is only set when expertTokenNumFlag is true. As this is a required output, its shape must be defined in all scenarios to prevent graph compilation failures. When the flag is false, you should set a valid shape, such as [0], to represent an empty tensor.

    if(expertTokenNumFlag){
        if (expertTokenNumType == ExpertTokenNumType::KEY_VALUE) {
            expertTokenCumsumOrCountShape->SetDimNum(DIM_TWO);
            expertTokenCumsumOrCountShape->SetDim(0U, experNum);
            expertTokenCumsumOrCountShape->SetDim(DIM_ONE, KEY_VALUE_MODE_DIM0_NUM);
        } else {
            expertTokenCumsumOrCountShape->SetDimNum(DIM_ONE);
            expertTokenCumsumOrCountShape->SetDim(0U, expertEnd - expertStart);
        }
    } else {
        expertTokenCumsumOrCountShape->SetDimNum(DIM_ONE);
        expertTokenCumsumOrCountShape->SetDim(0U, 0);
    }
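
To make the three branches concrete, here is a small Python sketch of the shape rule the suggestion encodes (names and the `[0]` empty-shape fallback follow the review comment, not the actual kernel code):

```python
# Illustrative shape rule for the expert_tokens_count_or_cumsum output.
KEY_VALUE_MODE_DIM0_NUM = 2

def expert_tokens_output_shape(expert_token_num_flag: bool,
                               key_value_mode: bool,
                               expert_num: int,
                               expert_start: int,
                               expert_end: int) -> list:
    if not expert_token_num_flag:
        return [0]  # required output: empty but still a valid shape
    if key_value_mode:
        # one (expert, count) pair per expert
        return [expert_num, KEY_VALUE_MODE_DIM0_NUM]
    # cumsum/count over the local expert range
    return [expert_end - expert_start]
```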

Comment thread csrc/torch_binding.cpp
Comment on lines +1193 to +1199
if (expert_tokens_num_type >= CUMSUM && expert_tokens_num_type <= COUNT) {
    // expert_tokens_count_or_cumsum in [end-start, ]
    expert_tokens_count_or_cumsum = at::empty({expert_length}, x.options().dtype(at::kLong));
} else if (expert_tokens_num_type == KEY_VALUE) {
    // key_value in [2, end-start]
    expert_tokens_count_or_cumsum = at::empty({expert_num, 2}, x.options().dtype(at::kLong));
}

high

The expert_tokens_num_type parameter is not being validated. An invalid value could result in the expert_tokens_count_or_cumsum tensor not being initialized, leading to a runtime crash. It's important to add a TORCH_CHECK to confirm that expert_tokens_num_type is within the valid range of {0, 1, 2}, mirroring the validation in infershape.cpp.

    TORCH_CHECK(expert_tokens_num_type >= CUMSUM && expert_tokens_num_type <= KEY_VALUE, "expert_tokens_num_type must be one of {0, 1, 2}, but got ", expert_tokens_num_type);
    if (expert_tokens_num_type >= CUMSUM && expert_tokens_num_type <= COUNT) {
        // expert_tokens_count_or_cumsum in [end-start, ]
        expert_tokens_count_or_cumsum = at::empty({expert_length}, x.options().dtype(at::kLong));
    } else if (expert_tokens_num_type == KEY_VALUE) {
        // key_value in [2, end-start]
        expert_tokens_count_or_cumsum = at::empty({expert_num, 2}, x.options().dtype(at::kLong));
    }
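
The same guard can be sketched in plain Python as a stand-in for `TORCH_CHECK` (the enum values 0/1/2 are assumed from the "{0, 1, 2}" range cited in the review):

```python
# Plain-Python stand-in for the suggested TORCH_CHECK. CUMSUM=0, COUNT=1,
# KEY_VALUE=2 are assumed values matching the review's "{0, 1, 2}" range.
CUMSUM, COUNT, KEY_VALUE = 0, 1, 2

def check_expert_tokens_num_type(value: int) -> None:
    if not (CUMSUM <= value <= KEY_VALUE):
        raise ValueError(
            f"expert_tokens_num_type must be one of {{0, 1, 2}}, but got {value}"
        )

for valid in (CUMSUM, COUNT, KEY_VALUE):
    check_expert_tokens_num_type(valid)  # no exception inside the valid range
```

Validating before the if/else chain means an out-of-range value fails loudly instead of silently leaving the output tensor uninitialized.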

Comment on lines +357 to +363
if (expert_tokens_num_type >= CUMSUM && expert_tokens_num_type <= COUNT) {
    // expert_tokens_count_or_cumsum in [end-start, ]
    expert_tokens_count_or_cumsum = at::empty({expert_length}, x.options().dtype(at::kLong));
} else if (expert_tokens_num_type == KEY_VALUE) {
    // key_value in [2, end-start]
    expert_tokens_count_or_cumsum = at::empty({expert_num, 2}, x.options().dtype(at::kLong));
}

high

Similar to the main binding, this meta function is missing validation for expert_tokens_num_type. An invalid value will cause an uninitialized tensor to be returned, which can lead to issues during model tracing or exporting. Please add a TORCH_CHECK to ensure expert_tokens_num_type is valid.

    TORCH_CHECK(expert_tokens_num_type >= CUMSUM && expert_tokens_num_type <= KEY_VALUE, "expert_tokens_num_type must be one of {0, 1, 2}, but got ", expert_tokens_num_type);
    if (expert_tokens_num_type >= CUMSUM && expert_tokens_num_type <= COUNT) {
        // expert_tokens_count_or_cumsum in [end-start, ]
        expert_tokens_count_or_cumsum = at::empty({expert_length}, x.options().dtype(at::kLong));
    } else if (expert_tokens_num_type == KEY_VALUE) {
        // key_value in [2, end-start]
        expert_tokens_count_or_cumsum = at::empty({expert_num, 2}, x.options().dtype(at::kLong));
    }

expanded_x = x_final / expanded_scale
expanded_x = np.round(expanded_x).astype(np.int8)
if x.dtype == np.int8:
    expanded_scale == None

high

The expression expanded_scale == None is a comparison and does not perform an assignment. This appears to be a bug in the golden implementation, as the likely intent was to assign None to expanded_scale (expanded_scale = None). This could lead to incorrect test validations if expanded_scale retains a value from a previous execution path.

Suggested change
expanded_scale == None
expanded_scale = None
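
The difference is easy to demonstrate in isolation: `==` evaluates to a boolean that is immediately discarded, while `=` rebinds the name.

```python
# Minimal reproduction of the bug class: `==` compares and discards the result,
# so the variable silently keeps its previous value.
expanded_scale = [1.0, 2.0]

expanded_scale == None               # comparison: evaluates to False, discarded
assert expanded_scale == [1.0, 2.0]  # unchanged

expanded_scale = None                # assignment: actually rebinds the name
assert expanded_scale is None
```

Linters flag this pattern (comparison to `None` used as a statement), which is one more reason to run the standard checks on golden/test code as well.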

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by fulfilling the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@ChenxiQ ChenxiQ force-pushed the jia-init-routing branch 4 times, most recently from eaeafbe to 5c50197 Compare December 23, 2025 12:29
@jiazhengyi jiazhengyi changed the title [OPs] add custom op moe_init_routing_custom [OP] add custom op moe_init_routing_custom Dec 23, 2025
@jiazhengyi jiazhengyi changed the title [OP] add custom op moe_init_routing_custom [OP] add custom op aclnnMoeInitRoutingCustom Dec 23, 2025
@ChenxiQ ChenxiQ force-pushed the jia-init-routing branch 5 times, most recently from 3a0d47e to e1d0f70 Compare December 23, 2025 17:49
jiazhengyi and others added 2 commits December 29, 2025 19:29
Signed-off-by: jiazhengyi <jiazhengyi@huawei.com>
Signed-off-by: Chenxi Qian <chenxi.qian.cq@outlook.com>
@zzzzwwjj zzzzwwjj merged commit d5f7283 into vllm-project:main Dec 29, 2025
9 of 11 checks passed
@Yikun Yikun mentioned this pull request Dec 30, 2025
1 task
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Dec 31, 2025
shenchuxiaofugui pushed a commit to shenchuxiaofugui/vllm-ascend that referenced this pull request Dec 31, 2025
zzzzwwjj added a commit that referenced this pull request Jan 9, 2026
### What this PR does / why we need it?
This PR enables custom op `aclnnMoeInitRoutingCustom` introduced in PR
#5251

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- vLLM version: release/v0.13.0
- vLLM main:
vllm-project/vllm@bc0a5a0

---------

Signed-off-by: QianChenxi <chenxi.qian.cq@outlook.com>
Signed-off-by: zzzzwwjj <1183291235@qq.com>
Co-authored-by: zzzzwwjj <1183291235@qq.com>
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 10, 2026
guanguan0308 pushed a commit to guanguan0308/vllm-ascend that referenced this pull request Jan 13, 2026
guanguan0308 pushed a commit to guanguan0308/vllm-ascend that referenced this pull request Jan 13, 2026
aipaes pushed a commit to aipaes/vllm-ascend that referenced this pull request Jan 15, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026