[v0.21.0] Fix accuracy issue in minimax_m2 with TP > 1 #1506

Closed
skavulya wants to merge 2 commits into
vllm-project:releases/v0.21.0 from
skavulya:skavulya/minimax2_accuracy_v0.21.0
@skavulya
Contributor

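Fix accuracy of MiniMax M2 for tensor parallel size > 1. The reduction is handled inside FusedMoE after #1377, and reduce_results=False was dropped in #1444, so the explicit all-reduce in the model's MoE forward path is redundant and is removed here.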

Output without this PR:

curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "/mnt/weka/data/llm-d-models-pv/MiniMaxAI-MiniMax-M2.7",
"messages": [
{"role": "user", "content": [{"type": "text", "text": "Write a quick sort algorithm in python"}]}
], "max_tokens": 200
}'
{"id":"chatcmpl-8eb68aec66d7f527","object":"chat.completion","created":1778891236,"prompt_routed_experts":null,"model":"/mnt/weka/data/llm-d-models-pv/MiniMaxAI-MiniMax-M2.7","choices":[{"index":0,"message":{"role":"assistant","content":"I hadnet me find a programme2/apto/c- 241?.o. no (the operation.yb-b\n> ыйо, not change this;~~ I think_colour =="light pink";}) in...\n**The These must be not} was\n and \n\n):\n\nI('key=ельблиматš micrac / 1)2rasm_0.2 → add__2dict_eagle/tabString/im不过是 \list-ofchf_one \nCompute_with_prt_init: (New Tool Pro)\n-Main%-day_ ** [B1] : {nb_z0'];\n--own-traor: with: =: use 0.096-10_l_`this col0: 26;```\n</t_lN-蔓音频四文アنتストu+002:htt 도 원책임.(↑): The thought_dirty_s","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null,"routed_experts":null}],"service_tier":null,"system_fingerprint":"vllm-0.20.1rc1.dev276+g54f548e9e-tp4-ep-614b7488","usage":{"prompt_tokens":45,"total_tokens":245,"completion_tokens":200,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}

Output with this PR:

{"id":"chatcmpl-b79acb2e48acc5d0","object":"chat.completion","created":1778891747,"prompt_routed_experts":null,"model":"/mnt/weka/data/llm-d-models-pv/MiniMaxAI-MiniMax-M2.7","choices":[{"index":0,"message":{"role":"assistant","content":"We are going to write a quick sort algorithm in Python.\n We will define a function quicksort that takes a list as input.\n We will choose a pivot (commonly the last element, but we can also choose a random element or the middle).\n We will partition the list into two parts: elements less than the pivot and elements greater than the pivot.\n Then we recursively sort the two parts and combine them with the pivot in between.\n\n However, note that the problem asks for a quick sort algorithm, so we'll implement the standard in-place quick sort.\n\n Steps:\n 1. If the list has length 0 or 1, it is already sorted.\n 2. Otherwise, select a pivot (we'll use the last element for simplicity).\n 3. Partition the list into two sublists: left (elements less than pivot) and right (elements greater than or equal to pivot).\n 4. Return the sorted left part, then the pivot, then the sorted right part.\n\n Alternatively, we","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null,"routed_experts":null}],"service_tier":null,"system_fingerprint":"vllm-0.20.1rc1.dev276+g54f548e9e-tp4-ep-614b7488","usage":{"prompt_tokens":45,"total_tokens":245,"completion_tokens":200,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}

skavulya added 2 commits May 28, 2026 10:41
Signed-off-by: Soila Kavulya <soila.p.kavulya@intel.com>
Signed-off-by: Soila Kavulya <soila.p.kavulya@intel.com>
Copilot AI review requested due to automatic review settings May 28, 2026 17:45
@skavulya skavulya requested review from PatrykWo and wpyszka as code owners May 28, 2026 17:45
@skavulya skavulya had a problem deploying to pre-merge-approval May 28, 2026 17:45 — with GitHub Actions Error
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Removes redundant tensor-parallel all-reduce in the MiniMax M2 MoE forward path, presumably because FusedMoE already handles the reduction internally.

Changes:

  • Drop the explicit tensor_model_parallel_all_reduce call after self.experts(...).
  • Remove the now-unused tensor_model_parallel_all_reduce import.
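
To illustrate why a redundant all-reduce would only hurt accuracy when TP > 1, here is a toy simulation in plain Python. It assumes, as the overview above presumes, that the fused MoE layer already reduces its output across ranks. The helpers `all_reduce`, `moe_forward`, and `make_partials` are hypothetical stand-ins for illustration and are not vLLM APIs.

```python
# Toy simulation of tensor-parallel all-reduce; these helpers are
# hypothetical and are not vLLM APIs.

def all_reduce(per_rank):
    """Sum elementwise across ranks and give every rank the total."""
    total = [sum(vals) for vals in zip(*per_rank)]
    return [list(total) for _ in per_rank]


def moe_forward(per_rank_partials, extra_reduce):
    # Stand-in for a fused MoE layer that already reduces internally.
    out = all_reduce(per_rank_partials)
    if extra_reduce:
        # Stand-in for the explicit all-reduce that this PR removes.
        out = all_reduce(out)
    return out[0]  # every rank now holds the same values


def make_partials(tp_size):
    # Each rank contributes an equal share of the true output [1.0, 2.0].
    return [[1.0 / tp_size, 2.0 / tp_size] for _ in range(tp_size)]


correct_tp4 = moe_forward(make_partials(4), extra_reduce=False)
doubled_tp4 = moe_forward(make_partials(4), extra_reduce=True)
doubled_tp1 = moe_forward(make_partials(1), extra_reduce=True)
```

In this sketch the second reduction scales the already-reduced output by the TP size (4x with four ranks), while with a single rank it is a no-op, which is consistent with the issue appearing only for TP > 1.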

@mgawarkiewicz-intel
Collaborator

It's too late to include this PR in this release.
