[MiniMax-M2] Remove reduce_results kwarg from FusedMoE init #1444
Merged
Pull request overview
This PR updates the Gaudi-specific MiniMax-M2 MoE implementation to stay compatible with upstream vLLM by removing a no-longer-supported reduce_results keyword argument when constructing FusedMoE, preventing worker startup failures.
Changes:
- Remove the deprecated reduce_results=False kwarg from FusedMoE(...) initialization in HpuMiniMaxM2MoE.
✅ CI Passed: all checks passed successfully against the following vllm commit:
iboiko-habana
approved these changes
May 13, 2026
Collaborator
iboiko-habana
left a comment
reduce_results was removed in vllm-project/vllm#35949. Thanks for fix
Removes the reduce_results=False argument passed to FusedMoE in HpuMiniMaxM2MoE, which is no longer accepted by upstream vLLM and causes worker startup to fail.
Upstream vLLM removed the reduce_results parameter from FusedMoE.__init__ (vllm/model_executor/layers/fused_moe/layer.py). The MoE output reduction is now decided internally based on TP/EP topology. The corresponding upstream model MiniMaxM2MoE (vllm/model_executor/models/minimax_m2.py) was updated accordingly, but the HPU port HpuMiniMaxM2MoE was not, so it still passes the now-unknown kwarg.
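The failure mode can be reproduced without vLLM: a constructor that no longer declares a keyword parameter rejects it with a TypeError. The following is a minimal sketch, not the real code. FusedMoEStandIn and try_construct are hypothetical names, and the num_experts and top_k parameters are illustrative assumptions rather than the actual FusedMoE signature.

```python
class FusedMoEStandIn:
    """Hypothetical stand-in for a layer whose __init__ dropped reduce_results."""

    def __init__(self, num_experts: int, top_k: int) -> None:
        self.num_experts = num_experts
        self.top_k = top_k


def try_construct(**kwargs):
    """Return (layer, None) on success or (None, error message) on failure."""
    try:
        return FusedMoEStandIn(**kwargs), None
    except TypeError as exc:
        return None, str(exc)


# Old call site: still passes the removed kwarg and fails at construction time.
_, old_error = try_construct(num_experts=8, top_k=2, reduce_results=False)

# Updated call site: the kwarg is dropped and construction succeeds.
layer, new_error = try_construct(num_experts=8, top_k=2)
```

Because the error is raised while the model is being built, it would surface during worker startup rather than on the first forward pass, which matches the symptom described above.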
Fix:
Drop the reduce_results=False kwarg from the FusedMoE construction in HpuMiniMaxM2MoE. Behavior is unchanged because upstream now governs MoE output reduction internally based on TP/EP configuration.
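As a rough illustration of why the call site no longer needs the flag, the sketch below shows a layer that derives the reduction decision from its own parallel configuration. Every name here (FusedMoEStandIn, HpuMoEStandIn, reduces_output, tp_size, ep_size) is hypothetical, and the rule used to decide reduction is an assumption for illustration only; it is not taken from upstream vLLM.

```python
class FusedMoEStandIn:
    """Hypothetical layer that decides output reduction from its own topology."""

    def __init__(self, num_experts: int, tp_size: int = 1, ep_size: int = 1) -> None:
        self.num_experts = num_experts
        # Assumed rule for illustration only: reduce whenever the experts'
        # outputs are spread across more than one rank.
        self.reduces_output = tp_size > 1 or ep_size > 1


class HpuMoEStandIn:
    """Hypothetical wrapper mirroring the shape of the fixed call site."""

    def __init__(self, num_experts: int, tp_size: int = 1, ep_size: int = 1) -> None:
        # No reduce_results kwarg is forwarded; the layer owns that decision.
        self.experts = FusedMoEStandIn(num_experts, tp_size=tp_size, ep_size=ep_size)


single_rank = HpuMoEStandIn(num_experts=8)
tensor_parallel = HpuMoEStandIn(num_experts=8, tp_size=4)
```

In this sketch the wrapper has nothing left to pass, so removing the kwarg leaves the decision entirely to the layer.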