[Refactor] Formatting output types related to FuseMoE #5481
jianzs merged 6 commits into vllm-project:main
Conversation
Code Review
This pull request refactors the Mixture of Experts (MoE) communication methods by introducing new dataclasses (FusedExpertsResult, TokenDispatchResult, TokenCombineResult) to standardize the return types of dispatch and combine operations. Specifically, the fused_experts function now returns a FusedExpertsResult object, and the token_dispatch and token_combine methods in the token dispatcher implementations (TokenDispatcherWithMC2, TokenDispatcherWithAllGather, TokenDispatcherWithAll2AllV) now return TokenDispatchResult and TokenCombineResult respectively.

However, a critical regression was introduced in TokenDispatcherWithMC2.token_combine: the logic for handling shared experts, including the use of shared_act and swiglu_out_scale from context_metadata and the return of shared_hidden_states, was completely removed. The current TokenCombineResult only holds routed_out, which is insufficient for shared-expert functionality; the review suggests restoring the shared-expert processing logic and updating TokenCombineResult to include shared_hidden_states.
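Based on the description above, the new return types can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual definitions: every field other than routed_out, and the use of typing.Any in place of concrete tensor types, are assumptions.

```python
# Hypothetical sketch of the dataclasses named in the review; field names
# other than routed_out are illustrative assumptions, not the PR's code.
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TokenDispatchResult:
    """Return type of token_dispatch() in the dispatcher implementations."""
    hidden_states: Any                       # tokens routed to local experts
    context_metadata: Optional[dict] = None  # state carried into token_combine


@dataclass
class TokenCombineResult:
    """Return type of token_combine(); per the review it only holds routed_out."""
    routed_out: Any


@dataclass
class FusedExpertsResult:
    """Return type of fused_experts()."""
    routed_out: Any
```

Grouping these return values under named fields is what replaces the previous dictionary/tuple outputs.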
      shared_hidden_states, _ = shared_experts.down_proj(shared_act)
  ...
-     return combined_output, shared_hidden_states
+     return TokenCombineResult(routed_out=combined_output)
The logic for handling shared experts in TokenDispatcherWithMC2.token_combine has been completely removed. This previously involved using shared_act and swiglu_out_scale from context_metadata to process shared experts and returning a tuple containing both combined_output and shared_hidden_states.
This is a critical regression that will break shared expert functionality when TokenDispatcherWithMC2 is active. The TokenCombineResult dataclass currently only holds routed_out, which is insufficient if shared experts produce a separate output that needs to be combined or returned.
To fix this, the shared expert processing logic must be restored, and the TokenCombineResult dataclass should be updated to accommodate the shared expert output, or an alternative mechanism for handling shared expert results must be implemented.
Suggested change:
-     return TokenCombineResult(routed_out=combined_output)
+     return TokenCombineResult(routed_out=combined_output, shared_hidden_states=shared_hidden_states)  # Example: if shared_hidden_states is needed
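A minimal sketch of the fix direction the review proposes: give TokenCombineResult an optional shared-expert field so call sites without shared experts are unaffected. The shared_hidden_states field name follows the review's suggestion; merge_outputs is a hypothetical caller-side helper, not code from the PR.

```python
# Sketch of the review's proposed fix; merge_outputs is a hypothetical
# caller-side helper for illustration only.
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TokenCombineResult:
    routed_out: Any
    shared_hidden_states: Optional[Any] = None  # None when no shared experts


def merge_outputs(result: TokenCombineResult) -> Any:
    # Without shared experts the routed output passes through unchanged;
    # with them, the two contributions are summed (element-wise for tensors).
    if result.shared_hidden_states is None:
        return result.routed_out
    return result.routed_out + result.shared_hidden_states
```

Defaulting the new field to None keeps the change backward compatible for the other dispatchers, which do not produce a shared-expert output.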
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to the Contributing and Testing guides.
Force-pushed from 1d1cc33 to b66e7e2
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Force-pushed from b66e7e2 to 6c0962e
Currently in the Fused MoE module, functions of classes like MoECommMethod and MoETokenDispatcher output data in dictionary or tuple format, which hampers code maintainability, readability, and extensibility. This PR introduces dataclasses for these key output types to address these issues.
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@5326c89
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: wjunLu <wjunlu217@gmail.com>
What this PR does / why we need it?
Currently in the Fused MoE module, functions of classes like MoECommMethod and MoETokenDispatcher output data in dictionary or tuple format, which hampers code maintainability, readability, and extensibility. This PR introduces dataclasses for these key output types to address these issues.
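The motivation can be illustrated with a toy before/after comparison (all names here are hypothetical, not the module's real signatures): a positional tuple forces callers to remember what each index means, while a dataclass is self-documenting and can grow new fields without breaking existing call sites.

```python
from dataclasses import dataclass

# Before: a positional tuple -- callers must remember what element 1 means.
def token_dispatch_tuple(tokens):
    return tokens, len(tokens), {"scale": 1.0}

# After: a named dataclass -- fields are explicit, and adding a new optional
# field later does not break callers that access results by name.
@dataclass
class TokenDispatchResult:
    hidden_states: list
    num_tokens: int
    context_metadata: dict

def token_dispatch(tokens):
    return TokenDispatchResult(hidden_states=tokens,
                               num_tokens=len(tokens),
                               context_metadata={"scale": 1.0})
```

With the dataclass version, a call site reads `result.num_tokens` instead of `result[1]`, which is the maintainability and extensibility gain the PR describes.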
Does this PR introduce any user-facing change?
No
How was this patch tested?