
Update Deepseek V3 MXFP8 GB200 mapping #2215

Merged

ko3n1g merged 1 commit into NVIDIA-NeMo:main from dingqingy-nv:dsv3_gb200_mxfp8_update on Feb 13, 2026

Conversation


@dingqingy-nv dingqingy-nv commented Feb 4, 2026

What does this PR do ?

A small CUDA graph (CG) configuration change to reach 1040 TFLOPs with the DSV3 MXFP8 GB200 recipe.

Summary by CodeRabbit

  • Chores
    • Updated GPU performance configuration to optimize computation graph scope with additional execution components.

Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
@dingqingy-nv dingqingy-nv added this to the 26.02 milestone Feb 4, 2026
@dingqingy-nv dingqingy-nv added the r0.3.0 Cherry-pick label for r0.3.0 release branch label Feb 4, 2026

coderabbitai bot commented Feb 4, 2026

📝 Walkthrough


Updated the GB200 V1 pretrain configuration to expand cuda_graph_scope to include "attn" in addition to the existing "moe_router" and "moe_preprocess" entries. This is a single-line configuration change with no impact on control flow or error handling.

Changes

CUDA Graph Scope Configuration — scripts/performance/configs/deepseek/deepseek_workload_base_configs.py
  • Updated the GB200 V1 pretrain config to add "attn" to cuda_graph_scope alongside the existing "moe_router" and "moe_preprocess" entries.
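The cuda_graph_scope change described above can be sketched as follows. This is a hypothetical illustration, not the actual contents of deepseek_workload_base_configs.py; only the cuda_graph_scope field name and its three values ("attn", "moe_router", "moe_preprocess") come from this PR, and the surrounding dict structure is assumed.

```python
# Hypothetical sketch of the GB200 V1 pretrain config change; the real
# structure of deepseek_workload_base_configs.py may differ. Only the
# cuda_graph_scope key and its values are taken from this PR.

# Before this PR: CUDA graphs captured only the MoE router and preprocess steps.
cuda_graph_scope_before = ["moe_router", "moe_preprocess"]

# After this PR: attention is also captured in the CUDA graph scope, the
# single-line change credited with reaching 1040 TFLOPs on the
# DSV3 MXFP8 GB200 recipe.
cuda_graph_scope_after = ["attn", "moe_router", "moe_preprocess"]

gb200_v1_pretrain_config = {"cuda_graph_scope": cuda_graph_scope_after}

print(gb200_v1_pretrain_config["cuda_graph_scope"])
```

Capturing more of the step (here, attention plus the MoE router and preprocess stages) inside CUDA graphs reduces per-kernel launch overhead, which is the usual motivation for widening a cuda_graph_scope list like this.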

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Suggested reviewers

  • erhoo82
  • ko3n1g
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Test Results For Major Changes — ⚠️ Warning: the PR description states an intended performance goal of 1040 TFLOPs but lacks before-and-after metrics validating the CUDA graph scope change. Resolution: add performance test results showing baseline metrics, post-change metrics confirming the 1040 TFLOPs target, testing environment details, and regression-test results.
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title "Deepseek V3 MXFP8 GB200 mapping" aligns with the change to the GB200 V1 pretrain config, though it is less specific than the actual change (adding "attn" to cuda_graph_scope for performance optimization).
  • Docstring Coverage — ✅ Passed: no functions found in the changed files; docstring coverage check skipped.


@erhoo82 erhoo82 enabled auto-merge (squash) February 4, 2026 18:56
@ko3n1g ko3n1g disabled auto-merge February 13, 2026 22:20
@ko3n1g ko3n1g merged commit eff3c76 into NVIDIA-NeMo:main Feb 13, 2026
46 of 48 checks passed
ko3n1g pushed a commit that referenced this pull request Feb 13, 2026
Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
copy-pr-bot bot pushed a commit that referenced this pull request Mar 19, 2026
Signed-off-by: Dingqing Yang <dingqingy@nvidia.com>
