
[Refactor] Formatting output types related to FuseMoE#5481

Merged
jianzs merged 6 commits into vllm-project:main from jianzs:refactor/fused-moe
Dec 31, 2025

Conversation

@jianzs
Collaborator

@jianzs jianzs commented Dec 29, 2025

What this PR does / why we need it?

In the Fused MoE module, methods of classes such as MoECommMethod and MoETokenDispatcher currently return data as dictionaries or tuples, which hampers code maintainability, readability, and extensibility. This PR introduces dataclasses for these key output types to address these issues.
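As a minimal sketch of the pattern described above (the field names below, other than routed_out, are illustrative assumptions rather than the actual vllm-ascend definitions; the real fields would hold torch.Tensor values and routing metadata):

```python
from dataclasses import dataclass
from typing import Any, Optional

# Sketch only: the real classes live in the Fused MoE module; only the
# shape of the refactor (dataclass instead of raw dict/tuple) is shown.

@dataclass
class TokenDispatchResult:
    hidden_states: Any                # tokens routed to local experts
    group_list: Optional[Any] = None  # per-expert token counts (assumed name)

@dataclass
class TokenCombineResult:
    routed_out: Any                   # combined output of the routed experts

@dataclass
class FusedExpertsResult:
    hidden_states: Any                # final fused-experts output

# Call sites read named fields instead of unpacking positional tuples:
combine = TokenCombineResult(routed_out="combined")
print(combine.routed_out)  # → combined
```

Reading result.routed_out instead of result[0] is what makes the return types self-documenting and extensible: a new field can be added without breaking existing unpacking at every call site.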

Does this PR introduce any user-facing change?

No

How was this patch tested?

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the Mixture of Experts (MoE) communication methods by introducing new dataclasses (FusedExpertsResult, TokenDispatchResult, TokenCombineResult) to standardize the return types of dispatch and combine operations. Specifically, the fused_experts function now returns a FusedExpertsResult object, and the token_dispatch and token_combine methods of the token dispatcher implementations (TokenDispatcherWithMC2, TokenDispatcherWithAllGather, TokenDispatcherWithAll2AllV) now return TokenDispatchResult and TokenCombineResult respectively.

However, a critical regression was introduced in TokenDispatcherWithMC2.token_combine: the logic for handling shared experts, including the use of shared_act and swiglu_out_scale from context_metadata and the return of shared_hidden_states, was completely removed. The current TokenCombineResult only holds routed_out, which is insufficient for shared-expert functionality; the review suggests restoring the shared-expert processing logic and updating TokenCombineResult to include shared_hidden_states.

Diff context (removed shared-expert handling vs. the new return):

-   shared_hidden_states, _ = shared_experts.down_proj(shared_act)
-   return combined_output, shared_hidden_states
+   return TokenCombineResult(routed_out=combined_output)
Contributor


critical

The logic for handling shared experts in TokenDispatcherWithMC2.token_combine has been completely removed. This previously involved using shared_act and swiglu_out_scale from context_metadata to process shared experts and returning a tuple containing both combined_output and shared_hidden_states.

This is a critical regression that will break shared expert functionality when TokenDispatcherWithMC2 is active. The TokenCombineResult dataclass currently only holds routed_out, which is insufficient if shared experts produce a separate output that needs to be combined or returned.

To fix this, the shared expert processing logic must be restored, and the TokenCombineResult dataclass should be updated to accommodate the shared expert output, or an alternative mechanism for handling shared expert results must be implemented.
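One way to express that suggestion, sketched under the assumption that an optional dataclass field is the chosen mechanism (the actual fix may instead restore a tuple return or use a different structure):

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class TokenCombineResult:
    routed_out: Any
    # Assumed addition: carries the shared-expert output produced by
    # TokenDispatcherWithMC2's shared-expert path; None when the active
    # dispatcher has no shared experts.
    shared_hidden_states: Optional[Any] = None

# Existing call sites keep working because the new field defaults to None:
result = TokenCombineResult(routed_out="combined")
print(result.shared_hidden_states)  # → None
```

A defaulted optional field would keep the other dispatchers (TokenDispatcherWithAllGather, TokenDispatcherWithAll2AllV) source-compatible while letting the MC2 path return both outputs.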

Suggested change:

-   return TokenCombineResult(routed_out=combined_output)
+   return TokenCombineResult(routed_out=combined_output, shared_hidden_states=shared_hidden_states)  # Example: if shared_hidden_states is needed

@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write a clear commit message and fill out the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.

@jianzs jianzs added the ready (read for review) and ready-for-test (start test by label for PR) labels Dec 29, 2025
@jianzs jianzs force-pushed the refactor/fused-moe branch 3 times, most recently from 1d1cc33 to b66e7e2 Compare December 30, 2025 11:04
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
@jianzs jianzs force-pushed the refactor/fused-moe branch from b66e7e2 to 6c0962e Compare December 30, 2025 12:54
@jianzs jianzs requested a review from zzzzwwjj December 30, 2025 17:04
@jianzs jianzs merged commit 7d5242f into vllm-project:main Dec 31, 2025
19 checks passed
wjunLu pushed a commit to wjunLu/vllm-ascend that referenced this pull request Jan 4, 2026

Currently in the Fused MoE module, functions of classes like
MoECommMethod and MoETokenDispatcher output data in dictionary or tuple
format, which hampers code maintainability, readability, and
extensibility. This PR introduces dataclasses for these key output types
to address these issues.

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@5326c89

---------

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>
Rozwel-dx pushed a commit to Rozwel-dx/vllm-ascend that referenced this pull request Jan 8, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026

Labels

module:ops, module:tests, ready (read for review), ready-for-test (start test by label for PR)


2 participants