Fix accuracy issue in minimax_m2 with TP > 1#1451
Open
skavulya wants to merge 2 commits into
Open
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Removes an explicit tensor-parallel all-reduce on the MoE output in the MiniMax-M2 model's forward pass.
Changes:
- Drop the
tp_size > 1branch that calledtensor_model_parallel_all_reduceonfinal_hidden_states.
| @@ -108,8 +108,6 @@ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor: | |||
| router_logits, _ = self.gate(hidden_states.to(torch.float32)) | |||
| final_hidden_states = self.experts(hidden_states=hidden_states, router_logits=router_logits) | |||
Contributor
Author
There was a problem hiding this comment.
54d471d to
c29bddd
Compare
Signed-off-by: Soila Kavulya <soila.p.kavulya@intel.com>
c29bddd to
e397ecd
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix accuracy of minimax m2 for tensor parallel size > 1. Reduce is handled in FusedMoE after #1377 and
reduce_results=Falsedropped #1444Output without this PR:
{"id":"chatcmpl-8eb68aec66d7f527","object":"chat.completion","created":1778891236,"prompt_routed_experts":null,"model":"/mnt/weka/data/llm-d-models-pv/MiniMaxAI-MiniMax-M2.7","choices":[{"index":0,"message":{"role":"assistant","content":"I hadnet me find a programme2/apto/c- 241?.o. no (the operation.yb-b\n> ыйо, not change this;~~ I think_colour =="light pink";}) in...\n**The These must be not} was\n and \n\n):\n\nI('key=ельблиматš micrac / 1)2rasm_0.2 → add__2dict_eagle/tabString/im不过是 \list-ofchf_one \nCompute_with_prt_init: (New Tool Pro)\n-Main%-day_ ** [B1] : {nb_z0'];\n--own-traor: with: =: use 0.096-10_l_`this col0: 26;```\n</t_lN-蔓音频四文アنتストu+002:htt 도 원책임.(↑): The thought_dirty_s","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null,"routed_experts":null}],"service_tier":null,"system_fingerprint":"vllm-0.20.1rc1.dev276+g54f548e9e-tp4-ep-614b7488","usage":{"prompt_tokens":45,"total_tokens":245,"completion_tokens":200,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
With PR
{"id":"chatcmpl-b79acb2e48acc5d0","object":"chat.completion","created":1778891747,"prompt_routed_experts":null,"model":"/mnt/weka/data/llm-d-models-pv/MiniMaxAI-MiniMax-M2.7","choices":[{"index":0,"message":{"role":"assistant","content":"We are going to write a quick sort algorithm in Python.\n We will define a function quicksort that takes a list as input.\n We will choose a pivot (commonly the last element, but we can also choose a random element or the middle).\n We will partition the list into two parts: elements less than the pivot and elements greater than the pivot.\n Then we recursively sort the two parts and combine them with the pivot in between.\n\n However, note that the problem asks for a quick sort algorithm, so we'll implement the standard in-place quick sort.\n\n Steps:\n 1. If the list has length 0 or 1, it is already sorted.\n 2. Otherwise, select a pivot (we'll use the last element for simplicity).\n 3. Partition the list into two sublists: left (elements less than pivot) and right (elements greater than or equal to pivot).\n 4. Return the sorted left part, then the pivot, then the sorted right part.\n\n Alternatively, we","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null,"routed_experts":null}],"service_tier":null,"system_fingerprint":"vllm-0.20.1rc1.dev276+g54f548e9e-tp4-ep-614b7488","usage":{"prompt_tokens":45,"total_tokens":245,"completion_tokens":200,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}