[OP] add custom op aclnnMoeInitRoutingCustom #5251
zzzzwwjj merged 2 commits into vllm-project:main
Conversation
Code Review
This pull request introduces a new custom operator moe_init_routing_custom for Mixture-of-Experts models. The changes are extensive, covering the operator's definition, implementation for both host and device, shape inference, tiling logic, and Torch bindings. I've identified a few critical issues related to attribute validation and shape inference that could lead to runtime errors or incorrect behavior. Additionally, there's a bug in the golden implementation within the test file that could affect test accuracy. My review provides suggestions to address these points to improve the robustness and correctness of the new operator.
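For readers unfamiliar with what a MoE routing-init operator computes, the following is a simplified NumPy reference, not the operator's actual device implementation: tokens are gathered in expert order, a row-index map back to the source tokens is produced, and per-expert token counts are collected. The function name and shapes are illustrative assumptions.

```python
import numpy as np

def moe_init_routing_reference(x, expert_idx):
    """Simplified reference for MoE token routing (illustration only).

    x:          [num_tokens, hidden] token features
    expert_idx: [num_tokens, top_k] selected expert id per token
    Returns tokens gathered in expert order, the gather indices,
    and the per-expert token counts.
    """
    num_tokens, top_k = expert_idx.shape
    flat = expert_idx.reshape(-1)            # [num_tokens * top_k]
    order = np.argsort(flat, kind="stable")  # stable sort groups rows by expert
    expanded_row_idx = order // top_k        # source token for each expanded row
    expanded_x = x[expanded_row_idx]         # tokens duplicated/sorted per expert
    counts = np.bincount(flat, minlength=flat.max() + 1)
    return expanded_x, expanded_row_idx, counts
```

The stable sort matters: tokens routed to the same expert keep their original relative order, which is what makes the gather indices reproducible.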
if (activeNum < 0) {
    OPS_LOG_E(context->GetNodeName(), "The active_num should be greater than or equal to 0. But it is %ld.", activeNum);
    return ge::GRAPH_FAILED;
}
The validation for active_num is overly strict. The default value for this attribute is -1, which is a standard way to indicate a dynamic dimension. The current check activeNum < 0 incorrectly flags this default value as an error when dropPadMode is NO_DROP_PAD, preventing the operator from being used with dynamic shapes. Please adjust the check to permit the value -1, for instance, by changing it to activeNum < -1.
if (activeNum < -1) {
    OPS_LOG_E(context->GetNodeName(), "The active_num should be greater than or equal to -1. But it is %ld.", activeNum);
    return ge::GRAPH_FAILED;
}

if (expertTokenNumFlag) {
    if (expertTokenNumType == ExpertTokenNumType::KEY_VALUE) {
        expertTokenCumsumOrCountShape->SetDimNum(DIM_TWO);
        expertTokenCumsumOrCountShape->SetDim(0U, experNum);
        expertTokenCumsumOrCountShape->SetDim(DIM_ONE, KEY_VALUE_MODE_DIM0_NUM);
    } else {
        expertTokenCumsumOrCountShape->SetDimNum(DIM_ONE);
        expertTokenCumsumOrCountShape->SetDim(0U, expertEnd - expertStart);
    }
}
The shape for the expert_tokens_count_or_cumsum output is only set when expertTokenNumFlag is true. As this is a required output, its shape must be defined in all scenarios to prevent graph compilation failures. When the flag is false, you should set a valid shape, such as [0], to represent an empty tensor.
if (expertTokenNumFlag) {
    if (expertTokenNumType == ExpertTokenNumType::KEY_VALUE) {
        expertTokenCumsumOrCountShape->SetDimNum(DIM_TWO);
        expertTokenCumsumOrCountShape->SetDim(0U, experNum);
        expertTokenCumsumOrCountShape->SetDim(DIM_ONE, KEY_VALUE_MODE_DIM0_NUM);
    } else {
        expertTokenCumsumOrCountShape->SetDimNum(DIM_ONE);
        expertTokenCumsumOrCountShape->SetDim(0U, expertEnd - expertStart);
    }
} else {
    expertTokenCumsumOrCountShape->SetDimNum(DIM_ONE);
    expertTokenCumsumOrCountShape->SetDim(0U, 0);
}

if (expert_tokens_num_type >= CUMSUM && expert_tokens_num_type <= COUNT) {
    // expert_tokens_count_or_cumsum in [end-start, ]
    expert_tokens_count_or_cumsum = at::empty({expert_length}, x.options().dtype(at::kLong));
} else if (expert_tokens_num_type == KEY_VALUE) {
    // key_value in [2, end-start]
    expert_tokens_count_or_cumsum = at::empty({expert_num, 2}, x.options().dtype(at::kLong));
}
The expert_tokens_num_type parameter is not being validated. An invalid value could result in the expert_tokens_count_or_cumsum tensor not being initialized, leading to a runtime crash. It's important to add a TORCH_CHECK to confirm that expert_tokens_num_type is within the valid range of {0, 1, 2}, mirroring the validation in infershape.cpp.
TORCH_CHECK(expert_tokens_num_type >= CUMSUM && expert_tokens_num_type <= KEY_VALUE, "expert_tokens_num_type must be one of {0, 1, 2}, but got ", expert_tokens_num_type);
if (expert_tokens_num_type >= CUMSUM && expert_tokens_num_type <= COUNT) {
    // expert_tokens_count_or_cumsum in [end-start, ]
    expert_tokens_count_or_cumsum = at::empty({expert_length}, x.options().dtype(at::kLong));
} else if (expert_tokens_num_type == KEY_VALUE) {
    // key_value in [2, end-start]
    expert_tokens_count_or_cumsum = at::empty({expert_num, 2}, x.options().dtype(at::kLong));
}

if (expert_tokens_num_type >= CUMSUM && expert_tokens_num_type <= COUNT) {
    // expert_tokens_count_or_cumsum in [end-start, ]
    expert_tokens_count_or_cumsum = at::empty({expert_length}, x.options().dtype(at::kLong));
} else if (expert_tokens_num_type == KEY_VALUE) {
    // key_value in [2, end-start]
    expert_tokens_count_or_cumsum = at::empty({expert_num, 2}, x.options().dtype(at::kLong));
}
Similar to the main binding, this meta function is missing validation for expert_tokens_num_type. An invalid value will cause an uninitialized tensor to be returned, which can lead to issues during model tracing or exporting. Please add a TORCH_CHECK to ensure expert_tokens_num_type is valid.
TORCH_CHECK(expert_tokens_num_type >= CUMSUM && expert_tokens_num_type <= KEY_VALUE, "expert_tokens_num_type must be one of {0, 1, 2}, but got ", expert_tokens_num_type);
if (expert_tokens_num_type >= CUMSUM && expert_tokens_num_type <= COUNT) {
    // expert_tokens_count_or_cumsum in [end-start, ]
    expert_tokens_count_or_cumsum = at::empty({expert_length}, x.options().dtype(at::kLong));
} else if (expert_tokens_num_type == KEY_VALUE) {
    // key_value in [2, end-start]
    expert_tokens_count_or_cumsum = at::empty({expert_num, 2}, x.options().dtype(at::kLong));
}

expanded_x = x_final / expanded_scale
expanded_x = np.round(expanded_x).astype(np.int8)
if x.dtype == np.int8:
    expanded_scale == None
The expression expanded_scale == None is a comparison and does not perform an assignment. This appears to be a bug in the golden implementation, as the likely intent was to assign None to expanded_scale (expanded_scale = None). This could lead to incorrect test validations if expanded_scale retains a value from a previous execution path.
-    expanded_scale == None
+    expanded_scale = None
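A minimal standalone repro of why this matters: in Python, `==` only evaluates a comparison whose result is discarded, so the variable keeps whatever value the earlier code path assigned.

```python
# `expanded_scale == None` compares and discards the result; `=` assigns.
expanded_scale = 1.5
expanded_scale == None          # no-op comparison: binds nothing
assert expanded_scale == 1.5    # stale scale survives into later checks

expanded_scale = None           # the intended assignment
assert expanded_scale is None
```

Most linters (e.g. flake8's "expression statement" warnings) would flag the original line, which is a cheap way to catch this class of bug in golden implementations.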
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to the Contributing and Testing guides.
Signed-off-by: jiazhengyi <jiazhengyi@huawei.com>
Signed-off-by: Chenxi Qian <chenxi.qian.cq@outlook.com>
### What this PR does / why we need it?
This pull request introduces a new custom operator `aclnnMoeInitRoutingCustom` for Mixture-of-Experts models. It can be replaced by `aclnnMoeInitRoutingV3` once CANN 8.5 becomes available.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

Signed-off-by: jiazhengyi <jiazhengyi@huawei.com>
Signed-off-by: Chenxi Qian <chenxi.qian.cq@outlook.com>
Co-authored-by: jiazhengyi <jiazhengyi@huawei.com>
Co-authored-by: Chenxi Qian <chenxi.qian.cq@outlook.com>
### What this PR does / why we need it?
This PR enables custom op `aclnnMoeInitRoutingCustom` introduced in PR #5251.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- vLLM version: release/v0.13.0
- vLLM main: vllm-project/vllm@bc0a5a0

Signed-off-by: QianChenxi <chenxi.qian.cq@outlook.com>
Signed-off-by: zzzzwwjj <1183291235@qq.com>
Co-authored-by: zzzzwwjj <1183291235@qq.com>
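The review comments above discuss three layouts for the `expert_tokens_count_or_cumsum` output, selected by `expert_tokens_num_type`. The sketch below illustrates them in NumPy, assuming the mapping `CUMSUM=0`, `COUNT=1`, `KEY_VALUE=2` implied by the suggested `TORCH_CHECK`; note the source itself is slightly inconsistent about the key/value layout (the comment says `[2, end-start]` while the allocation is `{expert_num, 2}`), so this sketch simply uses `(expert id, count)` pairs.

```python
import numpy as np

# Mode values assumed from the review's suggested range check {0, 1, 2}.
CUMSUM, COUNT, KEY_VALUE = 0, 1, 2

def expert_tokens_output(expert_ids, expert_start, expert_end, mode):
    """Sketch of expert_tokens_count_or_cumsum for each mode (illustration only).

    expert_ids: flat routed expert id per expanded token.
    """
    ids = np.asarray(expert_ids)
    in_range = np.arange(expert_start, expert_end)
    counts = np.array([(ids == e).sum() for e in in_range], dtype=np.int64)
    if mode == COUNT:
        return counts                        # tokens per expert, shape [end - start]
    if mode == CUMSUM:
        return np.cumsum(counts)             # running total, shape [end - start]
    if mode == KEY_VALUE:
        return np.stack([in_range, counts], axis=1)  # (expert id, count) pairs
    raise ValueError(f"invalid expert_tokens_num_type: {mode}")
```

This mirrors why the review insists on validating `expert_tokens_num_type` up front: any value outside the three modes leaves the output undefined.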