
cp: Fix DeepSeek-V3 H100 large scale config (2401) into r0.3.0 #2483

Closed

svcnvidia-nemo-ci wants to merge 1 commit into r0.3.0 from cherry-pick-2401-r0.3.0

Conversation

@svcnvidia-nemo-ci
Contributor

@svcnvidia-nemo-ci svcnvidia-nemo-ci commented Feb 23, 2026

beep boop [🤖]: Hi @scsudhakaran 👋,

we've cherry picked #2401 into r0.3.0 for you! 🚀

Please review and approve this cherry-pick at your convenience!

Summary by CodeRabbit

  • Chores
    • Updated performance testing configurations to support enhanced model parallelization strategies for large-scale deployments.

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@svcnvidia-nemo-ci
Contributor Author

/ok to test 3b1fed8

@copy-pr-bot

copy-pr-bot bot commented Feb 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 23, 2026

📝 Walkthrough

This PR adds two configuration fields to the DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_LARGE_SCALE configuration: virtual_pipeline_model_parallel_size set to 2 and pp_layout set to None.

Changes

Cohort / File(s) | Summary

  • DeepSeek V3 Pretrain Config — scripts/performance/configs/deepseek/deepseek_workload_base_configs.py: add virtual_pipeline_model_parallel_size=2 and pp_layout=None fields to the DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_LARGE_SCALE configuration.
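The override described above can be sketched with Python's `dataclasses.replace`, which is the pattern the changed file uses. The field names below come from the PR; the `PretrainConfig` class and its default values are hypothetical stand-ins for the real workload config:

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical minimal stand-in for the real workload config dataclass;
# only the field names are taken from the PR, the defaults are invented.
@dataclass(frozen=True)
class PretrainConfig:
    global_batch_size: int = 512
    virtual_pipeline_model_parallel_size: int = 4
    pp_layout: Optional[str] = "Et|(tt|)*30mL"

SC_V1 = PretrainConfig()

# The PR adds the two explicit overrides so the large-scale config no longer
# inherits vpp=4 and the V1 pipeline layout from its base.
LARGE_SCALE = replace(
    SC_V1,
    global_batch_size=1024,
    virtual_pipeline_model_parallel_size=2,
    pp_layout=None,
)
```

`replace` returns a new instance with every unnamed field copied from the base, which is why forgetting an override silently inherits the base's value.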

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

Suggested labels

r0.3.0

Suggested reviewers

  • malay-nagda
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Test Results For Major Changes — ⚠️ Warning: the PR modifies critical parallelization parameters for the DeepSeek-V3 H100 large-scale config but provides no test results, performance metrics, or validation documentation. Resolution: add test results confirming the fix resolves the issue, and include convergence metrics or performance comparisons demonstrating no regression.

✅ Passed checks (3 passed)

  • Description Check — ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title clearly indicates this is a cherry-pick of fix #2401 for the DeepSeek-V3 H100 large scale config into the r0.3.0 branch, which aligns with the actual change to the configuration file.
  • Docstring Coverage — ✅ Passed: no functions found in the changed files to evaluate; docstring coverage check skipped.


Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
scripts/performance/configs/deepseek/deepseek_workload_base_configs.py (1)

228-233: Fix is correct and mirrors the SC-V2 parallelism settings.

DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_LARGE_SCALE was inheriting virtual_pipeline_model_parallel_size=4 and pp_layout="Et|(tt|)*30mL" from DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V1. The explicit overrides to vpp=2 and pp_layout=None correctly align the LARGE_SCALE config with the SC variant's intended pipeline settings, consistent with what DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V2 already does.

Optional: since DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V2 already carries both SC-specific overrides, the LARGE_SCALE config could derive from it directly to avoid re-stating them:

♻️ Optional refactor: derive from SC V2
```diff
 DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_LARGE_SCALE = replace(
-    DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V1,
+    DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V2,
     global_batch_size=1024,
-    virtual_pipeline_model_parallel_size=2,
-    pp_layout=None,
 )
```
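The refactor above relies on `dataclasses.replace` copying every field from its base, so a config derived from SC V2 inherits both SC-specific overrides automatically. A minimal sketch of why deriving from V1 caused the bug (the `PretrainConfig` class and its values are assumptions; only the field names and the derivation pattern come from the review):

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical stand-in for the real config class; field names match the review.
@dataclass(frozen=True)
class PretrainConfig:
    global_batch_size: int
    virtual_pipeline_model_parallel_size: int
    pp_layout: Optional[str]

V1 = PretrainConfig(512, 4, "Et|(tt|)*30mL")
V2 = replace(V1, virtual_pipeline_model_parallel_size=2, pp_layout=None)

# Before the fix: deriving from V1 silently inherited vpp=4 and the V1 layout.
buggy = replace(V1, global_batch_size=1024)

# Suggested refactor: derive from V2 so both SC overrides come along for free.
large_scale = replace(V2, global_batch_size=1024)

assert buggy.virtual_pipeline_model_parallel_size == 4      # the inherited bug
assert large_scale.virtual_pipeline_model_parallel_size == 2
assert large_scale.pp_layout is None
```

Either form is behaviorally equivalent for the large-scale config; deriving from V2 just avoids restating the two overrides in a second place.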
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/performance/configs/deepseek/deepseek_workload_base_configs.py`
around lines 228 - 233, The LARGE_SCALE config currently replaces
DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V1 then overrides
virtual_pipeline_model_parallel_size and pp_layout to match SC V2; to simplify
and avoid re-stating SC-specific overrides, change the base to
DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V2 (i.e., create
DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_LARGE_SCALE by calling
replace(DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V2, global_batch_size=1024,
virtual_pipeline_model_parallel_size=2, pp_layout=None) or remove redundant
overrides if SC_V2 already sets them) so LARGE_SCALE directly derives SC
settings from the SC V2 symbol.
ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99db9fc and 3b1fed8.

📒 Files selected for processing (1)
  • scripts/performance/configs/deepseek/deepseek_workload_base_configs.py

@scsudhakaran
Contributor

/ok to test 3b1fed8

@ko3n1g ko3n1g marked this pull request as draft February 24, 2026 21:23
@ko3n1g
Contributor

ko3n1g commented Mar 3, 2026

Merge via #2509

@ko3n1g ko3n1g closed this Mar 3, 2026


4 participants