
cp: Fix DeepSeek-V3 H100 large scale config (2401) into r0.3.0 #2483

Closed

svcnvidia-nemo-ci wants to merge 1 commit into r0.3.0 from cherry-pick-2401-r0.3.0

Conversation

@svcnvidia-nemo-ci
Contributor

@svcnvidia-nemo-ci svcnvidia-nemo-ci commented Feb 23, 2026

beep boop [🤖]: Hi @scsudhakaran 👋,

we've cherry picked #2401 into r0.3.0 for you! 🚀

Please review and approve this cherry-pick at your convenience!

Summary by CodeRabbit

  • Chores
    • Updated performance testing configurations to support enhanced model parallelization strategies for large-scale deployments.

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@svcnvidia-nemo-ci
Contributor Author

/ok to test 3b1fed8

@copy-pr-bot

copy-pr-bot bot commented Feb 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 23, 2026

📝 Walkthrough

This PR adds two configuration fields to the DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_LARGE_SCALE configuration: virtual_pipeline_model_parallel_size set to 2 and pp_layout set to None.

Changes

Cohort / File(s) | Summary

  • DeepSeek V3 Pretrain Config — scripts/performance/configs/deepseek/deepseek_workload_base_configs.py: add virtual_pipeline_model_parallel_size=2 and pp_layout=None fields to the DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_LARGE_SCALE configuration.
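The override described above can be sketched with Python's `dataclasses.replace`, which is the pattern the changed file uses. The field names below come from the PR; the `PretrainConfig` class and its default values are hypothetical stand-ins for the real workload config:

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical minimal stand-in for the real workload config dataclass;
# only the field names are taken from the PR, the defaults are invented.
@dataclass(frozen=True)
class PretrainConfig:
    global_batch_size: int = 512
    virtual_pipeline_model_parallel_size: int = 4
    pp_layout: Optional[str] = "Et|(tt|)*30mL"

SC_V1 = PretrainConfig()

# The PR adds the two explicit overrides so the large-scale config no longer
# inherits vpp=4 and the V1 pipeline layout from its base.
LARGE_SCALE = replace(
    SC_V1,
    global_batch_size=1024,
    virtual_pipeline_model_parallel_size=2,
    pp_layout=None,
)
```

`replace` returns a new instance with every unnamed field copied from the base, which is why forgetting an override silently inherits the base's value.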

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

Suggested labels

r0.3.0

Suggested reviewers

  • malay-nagda
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Test Results For Major Changes — ⚠️ Warning: the PR modifies critical parallelization parameters for the DeepSeek-V3 H100 large-scale config but provides no test results, performance metrics, or validation documentation. Resolution: add test results confirming the fix resolves the issue, and include convergence metrics or performance comparisons demonstrating no regression.

✅ Passed checks (3 passed)

  • Description Check — ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title clearly indicates this is a cherry-pick of fix #2401 for the DeepSeek-V3 H100 large scale config into the r0.3.0 branch, which aligns with the actual change to the configuration file.
  • Docstring Coverage — ✅ Passed: no functions found in the changed files to evaluate; docstring coverage check skipped.


Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
scripts/performance/configs/deepseek/deepseek_workload_base_configs.py (1)

228-233: Fix is correct and mirrors the SC-V2 parallelism settings.

DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_LARGE_SCALE was inheriting virtual_pipeline_model_parallel_size=4 and pp_layout="Et|(tt|)*30mL" from DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V1. The explicit overrides to vpp=2 and pp_layout=None correctly align the LARGE_SCALE config with the SC variant's intended pipeline settings, consistent with what DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V2 already does.

Optional: since DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V2 already carries both SC-specific overrides, the LARGE_SCALE config could derive from it directly to avoid re-stating them:

♻️ Optional refactor: derive from SC V2
```diff
 DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_LARGE_SCALE = replace(
-    DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V1,
+    DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V2,
     global_batch_size=1024,
-    virtual_pipeline_model_parallel_size=2,
-    pp_layout=None,
 )
```
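The refactor above relies on `dataclasses.replace` copying every field from its base, so a config derived from SC V2 inherits both SC-specific overrides automatically. A minimal sketch of why deriving from V1 caused the bug (the `PretrainConfig` class and its values are assumptions; only the field names and the derivation pattern come from the review):

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical stand-in for the real config class; field names match the review.
@dataclass(frozen=True)
class PretrainConfig:
    global_batch_size: int
    virtual_pipeline_model_parallel_size: int
    pp_layout: Optional[str]

V1 = PretrainConfig(512, 4, "Et|(tt|)*30mL")
V2 = replace(V1, virtual_pipeline_model_parallel_size=2, pp_layout=None)

# Before the fix: deriving from V1 silently inherited vpp=4 and the V1 layout.
buggy = replace(V1, global_batch_size=1024)

# Suggested refactor: derive from V2 so both SC overrides come along for free.
large_scale = replace(V2, global_batch_size=1024)

assert buggy.virtual_pipeline_model_parallel_size == 4      # the inherited bug
assert large_scale.virtual_pipeline_model_parallel_size == 2
assert large_scale.pp_layout is None
```

Either form is behaviorally equivalent for the large-scale config; deriving from V2 just avoids restating the two overrides in a second place.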
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/performance/configs/deepseek/deepseek_workload_base_configs.py`
around lines 228 - 233, The LARGE_SCALE config currently replaces
DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V1 then overrides
virtual_pipeline_model_parallel_size and pp_layout to match SC V2; to simplify
and avoid re-stating SC-specific overrides, change the base to
DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V2 (i.e., create
DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_LARGE_SCALE by calling
replace(DEEPSEEK_V3_PRETRAIN_CONFIG_H100_FP8_SC_V2, global_batch_size=1024,
virtual_pipeline_model_parallel_size=2, pp_layout=None) or remove redundant
overrides if SC_V2 already sets them) so LARGE_SCALE directly derives SC
settings from the SC V2 symbol.
ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99db9fc and 3b1fed8.

📒 Files selected for processing (1)
  • scripts/performance/configs/deepseek/deepseek_workload_base_configs.py

@scsudhakaran
Contributor

/ok to test 3b1fed8

@ko3n1g ko3n1g marked this pull request as draft February 24, 2026 21:23
@ko3n1g
Contributor

ko3n1g commented Mar 3, 2026

Merge via #2509

@ko3n1g ko3n1g closed this Mar 3, 2026


4 participants