Fix accuracy issue in minimax_m2 with TP > 1 by skavulya · Pull Request #1451 · vllm-project/vllm-gaudi

skavulya · 2026-05-16T00:59:22Z

Fix the accuracy of MiniMax-M2 when the tensor parallel size is greater than 1. The reduce has been handled inside FusedMoE since #1377, and reduce_results=False was dropped in #1444.

Output without this PR:

curl http://localhost:8000/v1/chat/completions     -H "Content-Type: application/json"     -d '{
        "model": "/mnt/weka/data/llm-d-models-pv/MiniMaxAI-MiniMax-M2.7",
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": "Write a quick sort algorithm in python"}]}
        ], "max_tokens": 200
    }'

{"id":"chatcmpl-8eb68aec66d7f527","object":"chat.completion","created":1778891236,"prompt_routed_experts":null,"model":"/mnt/weka/data/llm-d-models-pv/MiniMaxAI-MiniMax-M2.7","choices":[{"index":0,"message":{"role":"assistant","content":"I hadnet me find a programme2/apto/c- 241?.o. no (the operation.yb-b\n> ыйо, not change this;~~ I think_colour =="light pink";}) in...\n**The These must be not} was\n and \n\n):\n\nI('key=ельблиматš micrac / 1)2rasm_0.2 → add__2dict_eagle/tabString/im不过是 \list-ofchf_one \nCompute_with_prt_init: (New Tool Pro)\n-Main%-day_ ** [B1] : {nb_z0'];\n--own-traor: with: =: use 0.096-10_l_`this col0: 26;```\n</t_lN-蔓音频四文アنتストu+002:htt 도 원책임.(↑): The thought_dirty_s","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null,"routed_experts":null}],"service_tier":null,"system_fingerprint":"vllm-0.20.1rc1.dev276+g54f548e9e-tp4-ep-614b7488","usage":{"prompt_tokens":45,"total_tokens":245,"completion_tokens":200,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}

Output with this PR:

{"id":"chatcmpl-b79acb2e48acc5d0","object":"chat.completion","created":1778891747,"prompt_routed_experts":null,"model":"/mnt/weka/data/llm-d-models-pv/MiniMaxAI-MiniMax-M2.7","choices":[{"index":0,"message":{"role":"assistant","content":"We are going to write a quick sort algorithm in Python.\n We will define a function quicksort that takes a list as input.\n We will choose a pivot (commonly the last element, but we can also choose a random element or the middle).\n We will partition the list into two parts: elements less than the pivot and elements greater than the pivot.\n Then we recursively sort the two parts and combine them with the pivot in between.\n\n However, note that the problem asks for a quick sort algorithm, so we'll implement the standard in-place quick sort.\n\n Steps:\n 1. If the list has length 0 or 1, it is already sorted.\n 2. Otherwise, select a pivot (we'll use the last element for simplicity).\n 3. Partition the list into two sublists: left (elements less than pivot) and right (elements greater than or equal to pivot).\n 4. Return the sorted left part, then the pivot, then the sorted right part.\n\n Alternatively, we","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null,"routed_experts":null}],"service_tier":null,"system_fingerprint":"vllm-0.20.1rc1.dev276+g54f548e9e-tp4-ep-614b7488","usage":{"prompt_tokens":45,"total_tokens":245,"completion_tokens":200,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Removes an explicit tensor-parallel all-reduce on the MoE output in the MiniMax-M2 model's forward pass.

Changes:

Drop the tp_size > 1 branch that called tensor_model_parallel_all_reduce on final_hidden_states.

skavulya · 2026-05-16T01:08:06Z

@@ -108,8 +108,6 @@ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        router_logits, _ = self.gate(hidden_states.to(torch.float32))
        final_hidden_states = self.experts(hidden_states=hidden_states, router_logits=router_logits)


The reduce is already handled here: https://github.com/vllm-project/vllm-gaudi/blame/f1abfec3e7ee4ecb7bb937624911ce112569ecca/vllm_gaudi/ops/hpu_fused_moe.py#L327
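As a toy illustration of why a second reduce would matter, the sketch below simulates a sum all-reduce over per-rank partial outputs in plain Python. It is not the vllm-gaudi code: `all_reduce_sum` is a hypothetical stand-in for a collective, and it assumes the accuracy problem came from the MoE output being summed across ranks twice, once inside FusedMoE and once in the model's forward pass.

```python
def all_reduce_sum(per_rank):
    """Simulate a sum all-reduce: every rank ends up with the elementwise total."""
    total = [sum(values) for values in zip(*per_rank)]
    return [list(total) for _ in per_rank]


tp_size = 4
expected = [1.0, 2.0, 3.0]  # the full, unsharded output (made-up numbers)

# Assumption for illustration: each rank holds an equal share of the output.
partials = [[x / tp_size for x in expected] for _ in range(tp_size)]

reduced_once = all_reduce_sum(partials)        # what a single reduce yields
reduced_twice = all_reduce_sum(reduced_once)   # an extra reduce on top

print(reduced_once[0])   # [1.0, 2.0, 3.0]
print(reduced_twice[0])  # [4.0, 8.0, 12.0]
```

In this toy setup the second reduce scales every activation by `tp_size`, which is consistent with the output degrading only when TP > 1.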

Signed-off-by: Soila Kavulya <soila.p.kavulya@intel.com>

Copilot AI review requested due to automatic review settings May 16, 2026 00:59

skavulya requested review from PatrykWo, adobrzyn, afierka-intel, iboiko-habana, jbyczkow, kamil-kaczor, ksmusz, mgawarkiewicz-intel, michalkuligowski and xuechendi as code owners May 16, 2026 00:59

Copilot AI reviewed May 16, 2026


skavulya force-pushed the skavulya/minimax2_accuracy branch 4 times, most recently from 54d471d to c29bddd May 16, 2026 01:13

Fix accuracy issue in minimax_m2 with TP > 1

e397ecd

Signed-off-by: Soila Kavulya <soila.p.kavulya@intel.com>

skavulya force-pushed the skavulya/minimax2_accuracy branch from c29bddd to e397ecd May 16, 2026 01:15

github-actions Bot mentioned this pull request May 16, 2026

🚦 Team Review Dashboard #701

Open

Merge branch 'main' into skavulya/minimax2_accuracy

91c9c37

skavulya wants to merge 2 commits into vllm-project:main from skavulya:skavulya/minimax2_accuracy


2 participants
