Merged
4 changes: 3 additions & 1 deletion MiniMax/MiniMax-M2.md
@@ -115,7 +115,7 @@ uv pip install vllm \

### NVIDIA GPU

-You can use 4x H200/H20/H100 or 4x A100/A800 GPUs to launch this model.
+You can use 4x H200/H20/H100 or 4x A100/A800 or 4x B200 GPUs to launch this model.

Run it with tensor parallelism like this:

@@ -129,6 +129,8 @@ vllm serve MiniMaxAI/MiniMax-M2.7 \
--trust-remote-code
```
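Once the server is up, vLLM exposes an OpenAI-compatible HTTP API. A minimal usage sketch, assuming the default port 8000 and that the model is served under the name `MiniMaxAI/MiniMax-M2`:

```bash
# Query the OpenAI-compatible endpoint started by `vllm serve`
# (default port 8000; adjust host/port if you changed them).
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "MiniMaxAI/MiniMax-M2",
          "messages": [{"role": "user", "content": "Hello"}]
        }'
```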

+> **Note**: For improved performance, you may set `VLLM_FLOAT32_MATMUL_PRECISION="high"` to enable TF32 TensorCore acceleration for float32 matmuls. This deviates from the original implementation, which uses full FP32 precision for MoE gating, but evaluations show no observable differences on GSM8K, MMLU-Pro, and tool-calling benchmarks.
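For example, the environment variable can be exported before launching the server; a sketch assuming the 4-GPU tensor-parallel launch shown above:

```bash
# Sketch: enable TF32 TensorCore matmuls for the serving process,
# per the note above (trades full-FP32 MoE gating for speed).
export VLLM_FLOAT32_MATMUL_PRECISION="high"
vllm serve MiniMaxAI/MiniMax-M2 \
    --tensor-parallel-size 4 \
    --trust-remote-code
```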

Note that pure TP8 is not supported. To run the model with >4 GPUs, please use DP+EP or TP+EP:

- DP8+EP