[Refactor] Formatting output types related to FuseMoE #5481
jianzs merged 6 commits into vllm-project:main
Conversation
Code Review
This pull request refactors the Mixture of Experts (MoE) communication methods by introducing new dataclasses (FusedExpertsResult, TokenDispatchResult, TokenCombineResult) to standardize the return types of dispatch and combine operations. Specifically, the fused_experts function now returns a FusedExpertsResult object, and the token_dispatch and token_combine methods in the token dispatcher implementations (TokenDispatcherWithMC2, TokenDispatcherWithAllGather, TokenDispatcherWithAll2AllV) now return TokenDispatchResult and TokenCombineResult respectively.

However, a critical regression was introduced in TokenDispatcherWithMC2.token_combine: the logic for handling shared experts, including the use of shared_act and swiglu_out_scale from context_metadata and the return of shared_hidden_states, was completely removed. The current TokenCombineResult only holds routed_out, which is insufficient for shared-expert functionality; the review suggests restoring the shared-expert processing logic and updating TokenCombineResult to include shared_hidden_states.
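Based on the description above, the new return types can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual definitions: every field other than routed_out, and the use of typing.Any in place of concrete tensor types, are assumptions.

```python
# Hypothetical sketch of the dataclasses named in the review; field names
# other than routed_out are illustrative assumptions, not the PR's code.
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TokenDispatchResult:
    """Return type of token_dispatch() in the dispatcher implementations."""
    hidden_states: Any                       # tokens routed to local experts
    context_metadata: Optional[dict] = None  # state carried into token_combine


@dataclass
class TokenCombineResult:
    """Return type of token_combine(); per the review it only holds routed_out."""
    routed_out: Any


@dataclass
class FusedExpertsResult:
    """Return type of fused_experts()."""
    routed_out: Any
```

Grouping these return values under named fields is what replaces the previous dictionary/tuple outputs.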
      shared_hidden_states, _ = shared_experts.down_proj(shared_act)
  ...
-     return combined_output, shared_hidden_states
+     return TokenCombineResult(routed_out=combined_output)
The logic for handling shared experts in TokenDispatcherWithMC2.token_combine has been completely removed. This previously involved using shared_act and swiglu_out_scale from context_metadata to process shared experts and returning a tuple containing both combined_output and shared_hidden_states.
This is a critical regression that will break shared expert functionality when TokenDispatcherWithMC2 is active. The TokenCombineResult dataclass currently only holds routed_out, which is insufficient if shared experts produce a separate output that needs to be combined or returned.
To fix this, the shared expert processing logic must be restored, and the TokenCombineResult dataclass should be updated to accommodate the shared expert output, or an alternative mechanism for handling shared expert results must be implemented.
Suggested change:
-     return TokenCombineResult(routed_out=combined_output)
+     return TokenCombineResult(routed_out=combined_output, shared_hidden_states=shared_hidden_states)  # Example: if shared_hidden_states is needed
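A minimal sketch of the fix direction the review proposes: give TokenCombineResult an optional shared-expert field so call sites without shared experts are unaffected. The shared_hidden_states field name follows the review's suggestion; merge_outputs is a hypothetical caller-side helper, not code from the PR.

```python
# Sketch of the review's proposed fix; merge_outputs is a hypothetical
# caller-side helper for illustration only.
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TokenCombineResult:
    routed_out: Any
    shared_hidden_states: Optional[Any] = None  # None when no shared experts


def merge_outputs(result: TokenCombineResult) -> Any:
    # Without shared experts the routed output passes through unchanged;
    # with them, the two contributions are summed (element-wise for tensors).
    if result.shared_hidden_states is None:
        return result.routed_out
    return result.routed_out + result.shared_hidden_states
```

Defaulting the new field to None keeps the change backward compatible for the other dispatchers, which do not produce a shared-expert output.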
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to the Contributing and Testing guides.
Force-pushed from 1d1cc33 to b66e7e2
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Force-pushed from b66e7e2 to 6c0962e
Currently in the Fused MoE module, functions of classes like MoECommMethod and MoETokenDispatcher output data in dictionary or tuple format, which hampers code maintainability, readability, and extensibility. This PR introduces dataclasses for these key output types to address these issues.
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@5326c89
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>
What this PR does / why we need it?
Currently in the Fused MoE module, functions of classes like MoECommMethod and MoETokenDispatcher output data in dictionary or tuple format, which hampers code maintainability, readability, and extensibility. This PR introduces dataclasses for these key output types to address these issues.
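The motivation can be illustrated with a toy before/after comparison (all names here are hypothetical, not the module's real signatures): a positional tuple forces callers to remember what each index means, while a dataclass is self-documenting and can grow new fields without breaking existing call sites.

```python
from dataclasses import dataclass

# Before: a positional tuple -- callers must remember what element 1 means.
def token_dispatch_tuple(tokens):
    return tokens, len(tokens), {"scale": 1.0}

# After: a named dataclass -- fields are explicit, and adding a new optional
# field later does not break callers that access results by name.
@dataclass
class TokenDispatchResult:
    hidden_states: list
    num_tokens: int
    context_metadata: dict

def token_dispatch(tokens):
    return TokenDispatchResult(hidden_states=tokens,
                               num_tokens=len(tokens),
                               context_metadata={"scale": 1.0})
```

With the dataclass version, a call site reads `result.num_tokens` instead of `result[1]`, which is the maintainability and extensibility gain the PR describes.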
Does this PR introduce any user-facing change?
No
How was this patch tested?