add dispatch_gmm_combine kernel #3532
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces a new dispatch_gmm_combine kernel and refactors some file paths for better organization. My review identified two critical issues with the new kernel implementation. First, in csrc/torch_binding.cpp, there's an unsafe use of c10::string_view when calling a C-style API, which could lead to buffer over-reads or crashes. Second, in csrc/torch_binding_meta.cpp, the meta function for the new operator has a signature mismatch with its schema, which will prevent the operator from being registered correctly. I've provided suggestions to fix both of these critical issues. The rest of the changes, which are mainly include path updates due to file moves, appear to be correct.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
@kiscad Is all the code related to this feature?
Yes, this is a complex kernel that includes GMM and HCCL communication.
@@ -2245,7 +2248,8 @@ def _select_moe_comm_method(self,
        elif soc_version in {AscendDeviceType._910_93}:
            moe_comm_type = (MoECommType.MC2
There is an accuracy problem with FUSED_MC2; we are working on it.
Signed-off-by: mojave2 <chenchen145@huawei.com>
Please remove the Chinese note in a follow-up PR.
What this PR does / why we need it?
This PR introduces the Ascend implementation of the dispatch_ffn_combine kernel and wires it into the vLLM-Ascend runtime, together with follow-up fixes to ensure the kernel builds and runs correctly in CI.
- New dispatch_ffn_combine kernel under csrc/dispatch_ffn_combine, including tiling logic, MOE routing helpers, and kernel utilities for quantized FFN dispatch.
- Updates to vllm_ascend/ops/fused_moe, adding methods/utilities needed by the new dispatch path.
- Updates to csrc/build_aclnn.sh, the CMake configuration, and include/namespace usage in the new kernel files.
- New test tests/e2e/nightly/ops/test_dispatch_ffn_combine.py and helper utilities in vllm_ascend/utils.py to validate the new kernel.

Does this PR introduce any user-facing change?
How was this patch tested?