Add MoE config for Super B200 TP2 #33510
Merged
mgoin merged 1 commit into vllm-project:main on Feb 1, 2026
Conversation
Signed-off-by: Shahar Mor <smor@nvidia.com>
Contributor
Code Review
This pull request introduces a new Mixture of Experts (MoE) configuration file for the NVIDIA B200 GPU with a tensor parallelism size of 2. The configuration is generated by the project's benchmarking script and aims to optimize performance for MoE models on this specific hardware. The provided performance metrics show a significant improvement with the new configuration. The change is straightforward and appears to be a valuable performance enhancement.
mgoin approved these changes on Feb 1, 2026
mgoin (Member) left a comment:
LGTM, thanks for including benchmarks!
PiratePai pushed a commit to PiratePai/epd_shm that referenced this pull request on Feb 3, 2026
Signed-off-by: Pai <416932041@qq.com>
When running Nemotron Super locally on a B200, the following warning appears:

```
Using default MoE config. Performance might be sub-optimal!
```

I used `benchmark_moe.py` to create a JSON config file for this use case:

```bash
python benchmarks/kernels/benchmark_moe.py \
    --model $MODEL_PATH \
    --trust-remote-code \
    --tp-size 2 \
    --tune \
    --batch-size 1 2 4 8 16 24 32 48 64 96 128 256 512 768 1024 1536 \
    --save-dir /.../vllm/model_executor/layers/fused_moe/configs/
```

Related PRs:
#27967
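For context, the tuned configs under `vllm/model_executor/layers/fused_moe/configs/` are JSON files named roughly `E=<num experts>,N=<shard intermediate size>,device_name=NVIDIA_B200.json`, keyed by batch size, with each entry holding Triton kernel tuning parameters. The sketch below only illustrates that shape; the parameter values are placeholders, not the numbers tuned in this PR:

```json
{
  "1":    { "BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 64,  "BLOCK_SIZE_K": 128, "GROUP_SIZE_M": 1,  "num_warps": 4, "num_stages": 3 },
  "16":   { "BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 128, "GROUP_SIZE_M": 16, "num_warps": 4, "num_stages": 4 },
  "1024": { "BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 64,  "GROUP_SIZE_M": 32, "num_warps": 8, "num_stages": 4 }
}
```

As far as I understand, the fused MoE kernel looks up the entry for the nearest batch size at runtime instead of falling back to the default config, which is what removes the "Using default MoE config" warning.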
Test Plan
Compare performance (`vllm bench serve`) with various batch sizes, with and without the JSON file. Performance should be equal or better when the JSON is available.
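A minimal sketch of one way to run the comparison, assuming the same model path and TP size as the tuning command above; the flag names follow the serving benchmark and may vary slightly between vLLM versions:

```bash
# Start the server twice: once with the tuned JSON present in
# fused_moe/configs/ and once with it removed (default config).
vllm serve $MODEL_PATH --tensor-parallel-size 2 --trust-remote-code

# In another shell, sweep a few concurrency levels and record output
# tokens/s for each run, then compare the two sets of numbers.
for concurrency in 1 8 64 256 1024; do
  vllm bench serve \
    --model $MODEL_PATH \
    --dataset-name random \
    --random-input-len 1024 \
    --random-output-len 256 \
    --num-prompts $((concurrency * 4)) \
    --max-concurrency $concurrency
done
```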
Test Result
Absolute output tokens per second cannot be disclosed at this stage.
Instead, we report the relative improvement.
Setup for all benchmarks: B200, TP2