
add dispatch_gmm_combine kernel #3532

Merged
wangxiyuan merged 3 commits into vllm-project:main from kiscad:fused-mc2 on Dec 4, 2025


Conversation

@kiscad
Contributor

@kiscad kiscad commented Oct 18, 2025

What this PR does / why we need it?

This PR introduces the Ascend implementation of the dispatch_ffn_combine kernel and wires it into the vLLM-Ascend runtime, together with follow‑up fixes to ensure the kernel builds and runs correctly in CI.

  • Add full host and device implementation of the dispatch_ffn_combine kernel under csrc/dispatch_ffn_combine, including tiling logic, MOE routing helpers, and kernel utilities for quantized FFN dispatch.
  • Integrate the new kernel with the PyTorch binding (csrc/torch_binding.cpp, csrc/torch_binding_meta.cpp) and the Ascend runtime (vllm_ascend/ascend_forward_context.py, vllm_ascend/worker/model_runner_v1.py).
  • Extend fused MoE communication and token dispatch support in vllm_ascend/ops/fused_moe, adding methods/utilities needed by the new dispatch path.
  • Update quantization logic in vllm_ascend/quantization/w8a8_dynamic.py to support the new FFN dispatch flow.
  • Fix kernel build issues by adjusting csrc/build_aclnn.sh, CMake configuration, and include/namespace usage in the new kernel files.
  • Add an end‑to‑end nightly test tests/e2e/nightly/ops/test_dispatch_ffn_combine.py and helper utilities in vllm_ascend/utils.py to validate the new kernel.

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write a clear commit message and fill in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new dispatch_gmm_combine kernel and refactors some file paths for better organization. My review identified two critical issues with the new kernel implementation. First, in csrc/torch_binding.cpp, there's an unsafe use of c10::string_view when calling a C-style API, which could lead to buffer over-reads or crashes. Second, in csrc/torch_binding_meta.cpp, the meta function for the new operator has a signature mismatch with its schema, which will prevent the operator from being registered correctly. I've provided suggestions to fix both of these critical issues. The rest of the changes, which are mainly include path updates due to file moves, appear to be correct.

Comment thread csrc/torch_binding.cpp Outdated
Comment thread csrc/torch_binding_meta.cpp Outdated
@kiscad kiscad marked this pull request as draft October 18, 2025 13:00
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Oct 20, 2025
@kiscad kiscad force-pushed the fused-mc2 branch 2 times, most recently from d15a37b to a04f6d8 on October 20, 2025 03:28
@kiscad kiscad marked this pull request as ready for review October 20, 2025 03:46
@kiscad kiscad force-pushed the fused-mc2 branch 6 times, most recently from f350f81 to d0f648d on October 21, 2025 02:08
@kiscad kiscad force-pushed the fused-mc2 branch 4 times, most recently from 4632552 to 0c3328b on October 21, 2025 07:19
@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@jianzs
Collaborator

jianzs commented Oct 28, 2025

@kiscad Is all the code related to this feature?

@kiscad
Contributor Author

kiscad commented Oct 29, 2025

@kiscad Is all the code related to this feature?

Yes. This is a complex kernel that includes GMM and HCCL communication.

@kiscad kiscad force-pushed the fused-mc2 branch 3 times, most recently from 19f77ab to 2fd4bc9 on November 24, 2025 02:03
@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Comment thread vllm_ascend/ops/fused_moe/token_dispatcher.py Outdated
Comment thread vllm_ascend/quantization/w8a8_dynamic.py Outdated
@kiscad kiscad force-pushed the fused-mc2 branch 3 times, most recently from 95160a2 to e11c13d on December 1, 2025 08:24
@github-actions github-actions bot removed merge-conflicts documentation Improvements or additions to documentation module:tools labels Dec 1, 2025
@kiscad kiscad force-pushed the fused-mc2 branch 5 times, most recently from 72ab8ce to 919dc72 on December 2, 2025 14:10
@@ -2245,7 +2248,8 @@ def _select_moe_comm_method(self,
elif soc_version in {AscendDeviceType._910_93}:
moe_comm_type = (MoECommType.MC2
Collaborator

@Angazenn Angazenn Dec 2, 2025


Not using FUSED_MC2?

Contributor Author


There is an accuracy problem with the FUSED_MC2. We are working on it.

@github-actions
Contributor

github-actions bot commented Dec 3, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: mojave2 <chenchen145@huawei.com>
Signed-off-by: mojave2 <chenchen145@huawei.com>
@wangxiyuan
Collaborator

Please remove the Chinese note in a follow-up PR.


5 participants