
Add B200 GPU configuration for MiniMax-M2.7 #334

Merged
simon-mo merged 3 commits into vllm-project:main from kjiang249:minimax-m2.7-update
Apr 16, 2026

Add B200 GPU configuration for MiniMax-M2.7#334
simon-mo merged 3 commits intovllm-project:mainfrom
kjiang249:minimax-m2.7-update

Conversation

@kjiang249
Contributor

No description provided.


@gemini-code-assist (Bot) left a comment


Code Review

This pull request updates the documentation to include support for NVIDIA B200 GPUs and provides a recommended configuration for running the MiniMax-M2.7 model on them. Feedback suggests including the --compilation-config flag to enable specific optimizations and adjusting the reasoning-parser value to maintain consistency with other examples in the guide.

Comment thread MiniMax/MiniMax-M2.md Outdated
Comment on lines +137 to +142
VLLM_FLOAT32_MATMUL_PRECISION="high" vllm serve MiniMaxAI/MiniMax-M2.7 \
--trust-remote-code \
--tensor-parallel-size 4 \
--enable-auto-tool-choice \
--tool-call-parser minimax_m2 \
--reasoning-parser minimax_m2_append_think


Severity: medium

The B200 configuration is missing the --compilation-config which includes the fuse_minimax_qk_norm optimization recommended for this model series (as noted in line 184). Additionally, the reasoning-parser value minimax_m2_append_think is inconsistent with all other examples in this guide (e.g., lines 70, 126, 153) which use minimax_m2. It is recommended to maintain consistency across the documentation unless this specific parser is required for B200.

Suggested change

Before:
VLLM_FLOAT32_MATMUL_PRECISION="high" vllm serve MiniMaxAI/MiniMax-M2.7 \
    --trust-remote-code \
    --tensor-parallel-size 4 \
    --enable-auto-tool-choice \
    --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2_append_think

After:
VLLM_FLOAT32_MATMUL_PRECISION="high" vllm serve MiniMaxAI/MiniMax-M2.7 \
    --tensor-parallel-size 4 \
    --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2 \
    --compilation-config '{"mode":3,"pass_config":{"fuse_minimax_qk_norm":true}}' \
    --enable-auto-tool-choice \
    --trust-remote-code
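As a quick sanity check (a throwaway sketch, not part of this PR), the inline JSON passed to --compilation-config can be validated before launching the server:

```python
import json

# The exact string passed to --compilation-config in the suggested command.
config = '{"mode":3,"pass_config":{"fuse_minimax_qk_norm":true}}'

parsed = json.loads(config)  # raises json.JSONDecodeError if the quoting is off
print(parsed["mode"])                                 # 3
print(parsed["pass_config"]["fuse_minimax_qk_norm"])  # True
```

Malformed shell quoting (e.g. losing the single quotes around the JSON) is a common reason this flag fails at startup, so a parse check catches it early.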

Comment thread MiniMax/MiniMax-M2.md Outdated
--tensor-parallel-size 4 \
--enable-auto-tool-choice \
--tool-call-parser minimax_m2 \
--reasoning-parser minimax_m2_append_think
Contributor


This is quite different from the previous configuration:

vllm serve MiniMaxAI/MiniMax-M2.7 \
  --tensor-parallel-size 4 \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2  \
  --compilation-config '{"mode":3,"pass_config":{"fuse_minimax_qk_norm":true}}' \
  --enable-auto-tool-choice \
  --trust-remote-code

Have you tested that:

  • we need minimax_m2_append_think, and
  • we cannot use --compilation-config '{"mode":3,"pass_config":{"fuse_minimax_qk_norm":true}}' on Blackwell?



We can remove minimax_m2_append_think and add --compilation-config '{"mode":3,"pass_config":{"fuse_minimax_qk_norm":true}}'. However, the fuse_minimax_qk_norm pass is not yet available in the latest vLLM release.
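Until the pass ships in a release, a deployment script could gate the flag on the installed vLLM version. A minimal sketch, where MIN_FUSE_VERSION is a placeholder and not a real release number:

```python
# Sketch: add --compilation-config only when the installed vLLM is new enough.
# MIN_FUSE_VERSION is hypothetical; check vLLM release notes for the real cutoff.
MIN_FUSE_VERSION = (0, 99, 0)

def supports_fuse(version: str) -> bool:
    """Compare a 'major.minor.patch' string against the hypothetical cutoff."""
    parts = tuple(int(p) for p in version.split(".")[:3])
    return parts >= MIN_FUSE_VERSION

def serve_flags(version: str) -> list[str]:
    flags = ["--tensor-parallel-size", "4",
             "--tool-call-parser", "minimax_m2",
             "--reasoning-parser", "minimax_m2",
             "--enable-auto-tool-choice",
             "--trust-remote-code"]
    if supports_fuse(version):
        flags += ["--compilation-config",
                  '{"mode":3,"pass_config":{"fuse_minimax_qk_norm":true}}']
    return flags

print("--compilation-config" in serve_flags("0.8.1"))  # False
print("--compilation-config" in serve_flags("1.0.0"))  # True
```

This keeps one recipe working across releases instead of maintaining two command lines.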

Contributor


My main concern is that we have different configurations for Hopper and Blackwell; I would like to educate users on the minimal difference between the two and explain the rationale.
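To make that difference concrete for readers, a throwaway sketch (flags copied from the two commands in this thread; the environment variable is omitted) that prints only what differs:

```python
# Flags from the B200 command in this PR vs the existing Hopper command.
b200 = {
    "--trust-remote-code",
    "--tensor-parallel-size 4",
    "--enable-auto-tool-choice",
    "--tool-call-parser minimax_m2",
    "--reasoning-parser minimax_m2_append_think",
}
hopper = {
    "--trust-remote-code",
    "--tensor-parallel-size 4",
    "--enable-auto-tool-choice",
    "--tool-call-parser minimax_m2",
    "--reasoning-parser minimax_m2",
    "--compilation-config '{\"mode\":3,\"pass_config\":{\"fuse_minimax_qk_norm\":true}}'",
}

# Set difference shows the two configs differ only in the reasoning parser
# and the presence of the compilation config.
print("only in B200:  ", sorted(b200 - hopper))
print("only in Hopper:", sorted(hopper - b200))
```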



Yes, we will keep them consistent. Thanks for flagging this.

Contributor


Thanks. Please ping me on Slack when it's updated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Kaihang Jiang <kaihangj@nvidia.com>
@kjiang249 kjiang249 force-pushed the minimax-m2.7-update branch from 53acbc2 to ec6b0f0 Compare April 16, 2026 16:54
@simon-mo simon-mo merged commit fbe13d6 into vllm-project:main Apr 16, 2026
2 checks passed
haic0 pushed a commit to haic0/recipes-AMD that referenced this pull request Apr 24, 2026
Signed-off-by: Kaihang Jiang <kaihangj@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: haic0 <haichzha@amd.com>
3 participants