
DeepSeek-V3 recipes for H100 #2197

Merged
scsudhakaran merged 1 commit into main from scsudhakaran/deepseekv3-h100
Feb 10, 2026

Conversation

@scsudhakaran
Contributor

@scsudhakaran scsudhakaran commented Feb 3, 2026

This PR updates the DeepSeek-V3 H100 recipes with a configuration that provides better performance numbers.

Summary by CodeRabbit

Release Notes

  • Configuration Updates

    • Updated DeepSeek V3 pretraining configuration parameters for H100 GPU environments with adjusted model parallelization strategy
    • Added pipeline layout parameters to optimize resource distribution across training nodes
    • Refined secondary pretraining configuration variant for improved performance characteristics
  • Performance Enhancements

    • Enabled expandable memory segment allocation for DeepSeek V3 pretraining on H100 hardware to optimize GPU memory utilization
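As a rough illustration of the last point (not the repository's actual plugin code; the helper name and gating conditions here are placeholders), enabling expandable segments amounts to exporting PyTorch's allocator setting before the first CUDA allocation:

```python
import os

def maybe_enable_expandable_segments(model_name: str, gpu: str) -> bool:
    """Hypothetical sketch of the gating added in perf_plugins.py.

    PYTORCH_CUDA_ALLOC_CONF must be set before the first CUDA
    allocation for the expandable-segments allocator to take effect.
    Returns True when the setting was applied.
    """
    if "deepseek_v3" in model_name and gpu.lower() == "h100":
        os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
        return True
    return False

# Applies only to the DeepSeek-V3 pretraining on H100 combination.
maybe_enable_expandable_segments("deepseek_v3_pretrain", "h100")
```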

@copy-pr-bot

copy-pr-bot bot commented Feb 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@scsudhakaran scsudhakaran added the r0.3.0 label (Cherry-pick label for r0.3.0 release branch) Feb 4, 2026
malay-nagda
malay-nagda previously approved these changes Feb 4, 2026
@malay-nagda malay-nagda self-requested a review February 4, 2026 11:52
@malay-nagda
Contributor

@scsudhakaran can you remove the draft tag?
Was this tested with the v2 configs (GBS=16384)?

@scsudhakaran scsudhakaran force-pushed the scsudhakaran/deepseekv3-h100 branch 2 times, most recently from e7667ec to 3bdd232 Compare February 10, 2026 11:57
@scsudhakaran scsudhakaran marked this pull request as ready for review February 10, 2026 13:09
malay-nagda
malay-nagda previously approved these changes Feb 10, 2026
@malay-nagda malay-nagda added the performance, performance/release (Performance items related with NeMo release), and performance/optimize (Performance optimization tracking) labels Feb 10, 2026
@malay-nagda malay-nagda added this to the 26.02 milestone Feb 10, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 10, 2026

📝 Walkthrough

Walkthrough

The changes modify DeepSeek V3 pretraining configurations for H100 GPUs by adjusting tensor and pipeline model parallelism settings, propagating pipeline layout configurations conditionally, and adding targeted environment variable configuration for specific hardware and model combinations.

Changes

  • Pipeline Layout Configuration: scripts/performance/configs/deepseek/deepseek_llm_pretrain.py
    Modified the pipeline_model_parallel_layout assignment to use base_cfg.pp_layout when provided; otherwise recompute it via set_deepseek_v3_pipeline_model_parallel_layout().
  • Workload Base Configurations: scripts/performance/configs/deepseek/deepseek_workload_base_configs.py
    Updated DEEPSEEK_V3_PRETRAIN_CONFIG_H100_V1: tensor_model_parallel_size reduced from 4 to 2 and added pp_layout="Et
  • Environment Configuration: scripts/performance/perf_plugins.py
    Added a conditional branch to set PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True" for DeepSeek V3 pretraining on H100 GPUs.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • #2186: Sets gb200 to use base_cfg.pp_layout in deepseek_llm_pretrain.py, parallel to base_cfg.pp_layout propagation in this PR.
  • #2076: Directly modifies deepseek LLM pretrain config to propagate base_cfg.pp_layout and introduces pp_layout field in workload configs.
  • #2060: Modifies model-specific environment variable gating in perf_plugins.py for different model cases.

Suggested labels

Run CICD

Suggested reviewers

  • malay-nagda
  • dingqingy-nv
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Test Results For Major Changes: ⚠️ Warning. The PR aims to improve DeepSeek-V3 H100 performance through configuration changes but provides no performance test results, benchmarks, or before-and-after metrics. Resolution: add documented performance test results comparing the old (TP=4) and new (TP=2) configurations, including throughput numbers, testing-environment details, and convergence metrics.

✅ Passed checks (3 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'DeepSeek-V3 recipes for H100' directly and clearly describes the main change: updating DeepSeek-V3 configuration recipes optimized for H100 hardware.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required 80.00% threshold.

No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
scripts/performance/perf_plugins.py (1)

270-276: Minor style nit: missing space after elif.

Line 270 uses elif( while the preceding if ( on line 262 has a space before the parenthesis. This is inconsistent and may be flagged by ruff.

Proposed fix
-        elif(
+        elif (


@scsudhakaran
Contributor Author

/ok to test 3bdd232

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
@scsudhakaran
Contributor Author

/ok to test 724e50c

@scsudhakaran scsudhakaran merged commit f36e5de into main Feb 10, 2026
87 of 90 checks passed
@scsudhakaran scsudhakaran deleted the scsudhakaran/deepseekv3-h100 branch February 10, 2026 15:38
sowmen pushed a commit to sowmen/Megatron-Bridge that referenced this pull request Feb 11, 2026
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@nvidia.com>
Signed-off-by: sowmen <sowmendipta@gmail.com>

Labels

  • performance
  • performance/optimize: Performance optimization tracking
  • performance/release: Performance items related with NeMo release
  • r0.3.0: Cherry-pick label for r0.3.0 release branch
