cp: Updating Configs for LLAMA3 70B LoRa (2292) into r0.3.0#2311

Merged
ko3n1g merged 1 commit into r0.3.0 from cherry-pick-2292-r0.3.0 on Feb 10, 2026
Conversation

@ko3n1g
Contributor

@ko3n1g ko3n1g commented Feb 10, 2026

beep boop [🤖]: Hi @rhmukundan 👋,

we've cherry-picked #2292 into r0.3.0 for you! 🚀

Please review and approve this cherry pick at your convenience!

Summary by CodeRabbit

  • Chores
    • Revised Llama3 70B model configuration settings across multiple hardware platforms (GB300, GB200, H100). Updated sequence length handling with dynamic precision-based configuration, adjusted parallelism and batch size parameters, and introduced new hardware-specific configuration variants with CUDA graph support.

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Feb 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ko3n1g
Contributor Author

ko3n1g commented Feb 10, 2026

/ok to test 81eddd8

@coderabbitai
Contributor

coderabbitai bot commented Feb 10, 2026

📝 Walkthrough

This pull request updates sequence length, padding, and parallelism configurations for Llama3 70B models across GB300, GB200, and H100 hardware variants. Sequence length increased to 4096 with padding support for some configs, while tensor and pipeline parallelism settings are reduced to 1 for improved efficiency.

Changes

Cohort / File(s) | Summary

Sequence Length & Padding Config
scripts/performance/configs/llama/llama3_llm_finetune.py
  • GB300 LoRA config: seq_length increased from 2048 to 4096, with padding configuration added (pad_cu_seqlens, pad_to_max_length).
  • GB200 LoRA config: dynamic seq_length based on precision (2048 for bf16, 4096 for others).

Parallelism & Batch Size Tuning
scripts/performance/configs/llama/llama3_workload_base_configs.py
  • GB300/GB200 LoRA: tensor/pipeline/context parallelism reduced to 1; GB300 global batch size reduced to 32.
  • GB200 FP8 CS: new explicit replace() variant with adjusted parallelism and batch settings.
  • H100 LoRA: tensor parallelism reduced to 1; context parallelism added.
  • H100 BF16: recompute_num_layers reduced from 2 to 1.
  • H100 FP8 CS: now uses replace() with tensor_model_parallel_size=2.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

r0.3.0, performance

Suggested reviewers

  • malay-nagda
🚥 Pre-merge checks | ✅ 3 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Test Results For Major Changes ⚠️ Warning — The PR contains significant LLAMA3 70B LoRa configuration changes (sequence length 2048→4096, batch size 64→32, parallelism adjustments), but the PR description lacks test results, performance benchmarks, or regression analysis. Resolution: update the PR description with test results, before-and-after performance metrics, hardware/precision context, and validation evidence that the configuration changes do not cause training regressions.

✅ Passed checks (3 passed)
  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title clearly identifies the main change as updating configurations for LLAMA3 70B LoRa, directly aligning with the raw summary, which shows configuration updates to both llama3_llm_finetune.py and llama3_workload_base_configs.py.
  • Docstring Coverage ✅ Passed — Docstring coverage is 100.00%, which meets the required threshold of 80.00%.


Contributor

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/performance/configs/llama/llama3_llm_finetune.py (1)

234-246: ⚠️ Potential issue | 🟠 Major

Add pad_cu_seqlens and pad_to_max_length settings to GB200 LoRA config for CUDA graph compatibility.

The GB200 LoRA config enables cuda_graph_impl="transformer_engine" with cuda_graph_scope="mlp" (identical to GB300), uses packed_sequence=True, and for FP8 variants uses seq_length=4096 (same as GB300). However, it lacks the padding settings that GB300 explicitly includes with a comment explaining they are "required for CUDA graphs and avoids NaN issues in attention kernels."

Add these lines before the return statement:

cfg.dataset.packed_sequence_specs.pad_cu_seqlens = True
cfg.dataset.dataset_kwargs["pad_to_max_length"] = True
🧹 Nitpick comments (1)
scripts/performance/configs/llama/llama3_llm_finetune.py (1)

234-235: Consider making the precision-dependent seq_length selection more explicit/extensible.

The `2048 if precision.lower() == "bf16" else 4096` pattern is concise but would silently assign 4096 to any future precision string (e.g., "nvfp4"). This is fine for now, since the workload base configs only define BF16, FP8_CS, and FP8_MX variants for GB200 LoRA, but worth noting if more precisions are added later.
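One way to address this nitpick is an explicit precision-to-length mapping that fails loudly on unknown precisions. A sketch under the same assumptions as the review comment; the names SEQ_LENGTH_BY_PRECISION and seq_length_for are hypothetical, not part of the actual config files.

```python
# Explicit mapping: adding a new precision (e.g. "nvfp4") without a deliberate
# seq_length choice becomes a hard error instead of silently getting 4096.
SEQ_LENGTH_BY_PRECISION = {
    "bf16": 2048,
    "fp8_cs": 4096,
    "fp8_mx": 4096,
}


def seq_length_for(precision: str) -> int:
    try:
        return SEQ_LENGTH_BY_PRECISION[precision.lower()]
    except KeyError:
        raise ValueError(
            f"No seq_length defined for precision {precision!r}; "
            "add it to SEQ_LENGTH_BY_PRECISION explicitly."
        )
```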

@ko3n1g
Contributor Author

ko3n1g commented Feb 10, 2026

Tested and updated golden values

