Add B200 GPU configuration for MiniMax-M2.7 (#334)
Conversation
Code Review
This pull request updates the documentation to include support for NVIDIA B200 GPUs and provides a recommended configuration for running the MiniMax-M2.7 model on them. Feedback suggests including the --compilation-config flag to enable specific optimizations and adjusting the reasoning-parser value to maintain consistency with other examples in the guide.
VLLM_FLOAT32_MATMUL_PRECISION="high" vllm serve MiniMaxAI/MiniMax-M2.7 \
  --trust-remote-code \
  --tensor-parallel-size 4 \
  --enable-auto-tool-choice \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2_append_think
The B200 configuration is missing the --compilation-config flag, which enables the fuse_minimax_qk_norm optimization recommended for this model series (as noted in line 184). Additionally, the reasoning-parser value minimax_m2_append_think is inconsistent with all other examples in this guide (e.g., lines 70, 126, 153), which use minimax_m2. It is recommended to maintain consistency across the documentation unless this specific parser is required for B200.
Suggested change, from:

VLLM_FLOAT32_MATMUL_PRECISION="high" vllm serve MiniMaxAI/MiniMax-M2.7 \
  --trust-remote-code \
  --tensor-parallel-size 4 \
  --enable-auto-tool-choice \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2_append_think

to:

VLLM_FLOAT32_MATMUL_PRECISION="high" vllm serve MiniMaxAI/MiniMax-M2.7 \
  --tensor-parallel-size 4 \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2 \
  --compilation-config '{"mode":3,"pass_config":{"fuse_minimax_qk_norm":true}}' \
  --enable-auto-tool-choice \
  --trust-remote-code
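To make the delta explicit, here is a small illustrative sketch (not part of the PR) that compares the flag sets of the two serve commands above; the flag names and values are copied from this thread:

```python
# Flags of the original B200 command and the suggested replacement,
# as maps of flag name -> value (None for boolean flags).
old_flags = {
    "--trust-remote-code": None,
    "--tensor-parallel-size": "4",
    "--enable-auto-tool-choice": None,
    "--tool-call-parser": "minimax_m2",
    "--reasoning-parser": "minimax_m2_append_think",
}
new_flags = {
    "--tensor-parallel-size": "4",
    "--tool-call-parser": "minimax_m2",
    "--reasoning-parser": "minimax_m2",
    "--compilation-config": '{"mode":3,"pass_config":{"fuse_minimax_qk_norm":true}}',
    "--enable-auto-tool-choice": None,
    "--trust-remote-code": None,
}

# Flags present only in the new command, and flags whose value changed.
added = sorted(set(new_flags) - set(old_flags))
changed = sorted(k for k in old_flags if k in new_flags and old_flags[k] != new_flags[k])

print("added:", added)      # only --compilation-config is new
print("changed:", changed)  # only --reasoning-parser differs
```

This shows the two commands differ only in the added --compilation-config flag and the --reasoning-parser value, which is exactly the pair of points raised in the review comment.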
  --tensor-parallel-size 4 \
  --enable-auto-tool-choice \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2_append_think
This is quite different from the previous configuration:
vllm serve MiniMaxAI/MiniMax-M2.7 \
--tensor-parallel-size 4 \
--tool-call-parser minimax_m2 \
--reasoning-parser minimax_m2 \
--compilation-config '{"mode":3,"pass_config":{"fuse_minimax_qk_norm":true}}' \
--enable-auto-tool-choice \
--trust-remote-code
Have you tested that:
- we need minimax_m2_append_think?
- we cannot use --compilation-config '{"mode":3,"pass_config":{"fuse_minimax_qk_norm":true}}' on Blackwell?
We can remove minimax_m2_append_think and add --compilation-config '{"mode":3,"pass_config":{"fuse_minimax_qk_norm":true}}'. Note, however, that fuse_minimax_qk_norm is not yet available in the latest vLLM release.
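Since the --compilation-config value is JSON embedded inside a shell command, quoting mistakes are easy to make. A minimal sketch that round-trips the exact string from this thread to confirm it parses as intended (key names are taken from this discussion, not from separate documentation):

```python
import json

# The exact string passed to --compilation-config in the commands above.
raw = '{"mode":3,"pass_config":{"fuse_minimax_qk_norm":true}}'

# json.loads raises ValueError if the quoting or JSON syntax is malformed.
cfg = json.loads(raw)
assert cfg["mode"] == 3
assert cfg["pass_config"]["fuse_minimax_qk_norm"] is True

# Re-serialize compactly to get a canonical single-quoted CLI value.
print(json.dumps(cfg, separators=(",", ":")))
```

Wrapping the JSON in single quotes on the shell side, as the examples above do, keeps the inner double quotes intact without escaping.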
My main concern is that we have different configurations for Hopper and Blackwell. I would like to educate users on the minimal difference between the two and explain the rationale.
Yes, we will keep them consistent. Thanks for flagging this.
Thanks. Please ping me on Slack when it's updated.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Kaihang Jiang <kaihangj@nvidia.com>
Force-pushed 53acbc2 to ec6b0f0.
Signed-off-by: Kaihang Jiang <kaihangj@nvidia.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: haic0 <haichzha@amd.com>