
Update Deepseek V3 MXFP8 GB200 mapping #2215

Merged

ko3n1g merged 1 commit into NVIDIA-NeMo:main from dingqingy-nv:dsv3_gb200_mxfp8_update on Feb 13, 2026

Conversation


@dingqingy-nv dingqingy-nv commented Feb 4, 2026

What does this PR do ?

A small CUDA graph (CG) configuration change to reach 1040 TFLOPs with the DSV3 MXFP8 GB200 recipe.

Summary by CodeRabbit

  • Chores
    • Updated GPU performance configuration to optimize computation graph scope with additional execution components.

Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
@dingqingy-nv dingqingy-nv added this to the 26.02 milestone Feb 4, 2026
@dingqingy-nv dingqingy-nv added the r0.3.0 Cherry-pick label for r0.3.0 release branch label Feb 4, 2026

coderabbitai bot commented Feb 4, 2026

📝 Walkthrough


Updated the GB200 V1 pretrain configuration to expand cuda_graph_scope to include "attn" in addition to the existing "moe_router" and "moe_preprocess" entries. This is a single-line configuration change with no impact on control flow or error handling.

Changes

CUDA Graph Scope Configuration — scripts/performance/configs/deepseek/deepseek_workload_base_configs.py
  • Updated the GB200 V1 pretrain config to add "attn" to cuda_graph_scope alongside the existing "moe_router" and "moe_preprocess" entries.
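The cuda_graph_scope change described above can be sketched as follows. This is a hypothetical illustration, not the actual contents of deepseek_workload_base_configs.py; only the cuda_graph_scope field name and its three values ("attn", "moe_router", "moe_preprocess") come from this PR, and the surrounding dict structure is assumed.

```python
# Hypothetical sketch of the GB200 V1 pretrain config change; the real
# structure of deepseek_workload_base_configs.py may differ. Only the
# cuda_graph_scope key and its values are taken from this PR.

# Before this PR: CUDA graphs captured only the MoE router and preprocess steps.
cuda_graph_scope_before = ["moe_router", "moe_preprocess"]

# After this PR: attention is also captured in the CUDA graph scope, the
# single-line change credited with reaching 1040 TFLOPs on the
# DSV3 MXFP8 GB200 recipe.
cuda_graph_scope_after = ["attn", "moe_router", "moe_preprocess"]

gb200_v1_pretrain_config = {"cuda_graph_scope": cuda_graph_scope_after}

print(gb200_v1_pretrain_config["cuda_graph_scope"])
```

Capturing more of the step (here, attention plus the MoE router and preprocess stages) inside CUDA graphs reduces per-kernel launch overhead, which is the usual motivation for widening a cuda_graph_scope list like this.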

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Suggested reviewers

  • erhoo82
  • ko3n1g
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Test Results For Major Changes — ⚠️ Warning: the PR description states an intended performance goal of 1040 TFLOPs but lacks before-and-after metrics validating the CUDA graph scope change. Resolution: add performance test results showing baseline metrics, post-change metrics confirming the 1040 TFLOPs target, testing environment details, and regression-test results.
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title "Deepseek V3 MXFP8 GB200 mapping" aligns with the change to the GB200 V1 pretrain config, though it is less specific than the actual change (adding "attn" to cuda_graph_scope for performance optimization).
  • Docstring Coverage — ✅ Passed: no functions found in the changed files; docstring coverage check skipped.


@erhoo82 erhoo82 enabled auto-merge (squash) February 4, 2026 18:56
@ko3n1g ko3n1g disabled auto-merge February 13, 2026 22:20
@ko3n1g ko3n1g merged commit eff3c76 into NVIDIA-NeMo:main Feb 13, 2026
46 of 48 checks passed
ko3n1g pushed a commit that referenced this pull request Feb 13, 2026
Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
copy-pr-bot bot pushed a commit that referenced this pull request Mar 19, 2026
Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
