add dispatch_gmm_combine kernel #3532
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces a new dispatch_gmm_combine kernel and refactors some file paths for better organization. My review identified two critical issues with the new kernel implementation. First, in csrc/torch_binding.cpp, there's an unsafe use of c10::string_view when calling a C-style API, which could lead to buffer over-reads or crashes. Second, in csrc/torch_binding_meta.cpp, the meta function for the new operator has a signature mismatch with its schema, which will prevent the operator from being registered correctly. I've provided suggestions to fix both of these critical issues. The rest of the changes, which are mainly include path updates due to file moves, appear to be correct.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
@kiscad Is all the code related to this feature?
Yes, this is a complex kernel that includes GMM and HCCL communication.
@@ -2245,7 +2248,8 @@ def _select_moe_comm_method(self,
        elif soc_version in {AscendDeviceType._910_93}:
            moe_comm_type = (MoECommType.MC2
There is an accuracy problem with FUSED_MC2; we are working on it.
Signed-off-by: mojave2 <chenchen145@huawei.com>
Please remove the Chinese note in a follow-up PR.
What this PR does / why we need it?
This PR introduces the Ascend implementation of the dispatch_ffn_combine kernel and wires it into the vLLM-Ascend runtime, together with follow-up fixes to ensure the kernel builds and runs correctly in CI.
- New dispatch_ffn_combine kernel under csrc/dispatch_ffn_combine, including tiling logic, MOE routing helpers, and kernel utilities for quantized FFN dispatch.
- Updates to vllm_ascend/ops/fused_moe, adding methods/utilities needed by the new dispatch path.
- Updates to csrc/build_aclnn.sh, the CMake configuration, and include/namespace usage in the new kernel files.
- New test tests/e2e/nightly/ops/test_dispatch_ffn_combine.py and helper utilities in vllm_ascend/utils.py to validate the new kernel.

Does this PR introduce any user-facing change?
How was this patch tested?