cp: Onboarding LLAMA3 70B LoRA to B300 and B200 chips (2397) into r0.3.0 #2581

Closed

svcnvidia-nemo-ci wants to merge 1 commit into r0.3.0 from cherry-pick-2397-r0.3.0

Conversation

@svcnvidia-nemo-ci
Contributor

@svcnvidia-nemo-ci svcnvidia-nemo-ci commented Feb 27, 2026

beep boop [🤖]: Hi @rhmukundan 👋,

we've cherry-picked #2397 into r0.3.0 for you! 🚀

Please review and approve this cherry-pick at your convenience!

Summary by CodeRabbit

  • New Features
    • Added Llama3 70B LoRA configurations for B200 and B300 GPU variants
    • Support for multiple precision options: BF16 and FP8 (CS and MX variants)
    • New performance-tuned configurations now available for fine-tuning tasks

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Feb 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@svcnvidia-nemo-ci
Contributor Author

/ok to test 84e8d6f

@coderabbitai
Contributor

coderabbitai bot commented Feb 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e1594e5 and 84e8d6f.

📒 Files selected for processing (3)
  • scripts/performance/configs/llama/__init__.py
  • scripts/performance/configs/llama/llama3_llm_finetune.py
  • scripts/performance/configs/llama/llama3_workload_base_configs.py

📝 Walkthrough

Walkthrough

The PR adds LLAMA3 70B LoRA configuration support for B200 and B300 GPUs. New configuration functions are introduced in llama3_llm_finetune.py, base configurations and public aliases are defined in llama3_workload_base_configs.py, and these configurations are exported through the module's public API in __init__.py.

Changes

  • Configuration Functions (scripts/performance/configs/llama/llama3_llm_finetune.py):
    Added two new public functions, llama3_70b_lora_config_b300() and llama3_70b_lora_config_b200(), that configure LoRA settings for the 70B model on B300 and B200 GPUs respectively, with peft="lora", seq_length=4096, comm_overlap enabled, and target_modules set to ["linear_qkv"].
  • Base Configurations and Aliases (scripts/performance/configs/llama/llama3_workload_base_configs.py):
    Added private base configurations _LLAMA3_70B_LORA_CONFIG_B300 and _LLAMA3_70B_LORA_CONFIG_B200 with GPU-specific settings, plus six public aliases covering BF16, FP8_CS, and FP8_MX variants for both B300 and B200; some variants share a configuration across precision types.
  • Public API Exports (scripts/performance/configs/llama/__init__.py):
    Added imports for llama3_70b_lora_config_b200 and llama3_70b_lora_config_b300 when the Megatron bridge is available, and updated the __all__ list with six new public configuration constants for the B200/B300 LoRA variants.
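The walkthrough only names the parameters involved, not the actual recipe objects. As a rough, self-contained sketch of the pattern it describes (a plain dataclass stands in for the real Megatron/NeMo bridge config type, which is not shown in this PR; everything except the function names, alias names, and the quoted parameters peft, seq_length, comm_overlap, and target_modules is an assumption for illustration):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LoraFinetuneConfig:
    """Hypothetical stand-in for the recipe object the real functions build."""
    peft: str = "lora"
    seq_length: int = 4096
    comm_overlap: bool = True
    target_modules: List[str] = field(default_factory=lambda: ["linear_qkv"])
    gpu: str = "b200"  # chip-specific base settings would differ in the real configs


def llama3_70b_lora_config_b300() -> LoraFinetuneConfig:
    # B300 variant: same LoRA knobs, GPU-specific base configuration.
    return LoraFinetuneConfig(gpu="b300")


def llama3_70b_lora_config_b200() -> LoraFinetuneConfig:
    # B200 variant.
    return LoraFinetuneConfig(gpu="b200")


# Precision aliases can share one base config per chip, mirroring how the
# summary says some BF16 / FP8_CS / FP8_MX variants share configurations.
_LLAMA3_70B_LORA_CONFIG_B300 = llama3_70b_lora_config_b300()
LLAMA3_70B_LORA_BF16_B300 = _LLAMA3_70B_LORA_CONFIG_B300
LLAMA3_70B_LORA_FP8_CS_B300 = _LLAMA3_70B_LORA_CONFIG_B300
```

The alias names here are illustrative only; the six actual public constants live in llama3_workload_base_configs.py in the PR.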

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

performance, r0.3.0

Suggested reviewers

  • rhmukundan
  • malay-nagda
  • erhoo82
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Test Results For Major Changes (⚠️ Warning): The PR adds hardware-specific configurations for B300/B200 GPUs without test results, performance metrics, or validation documentation in the auto-generated description. Resolution: update the PR description with the test results, performance metrics, regression testing, and configuration notes from the original PR demonstrating successful validation on the target hardware.

✅ Passed checks (3 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title clearly identifies the main change: onboarding LLAMA3 70B LoRA configurations for B300 and B200 GPU chips into the release branch r0.3.0.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch cherry-pick-2397-r0.3.0


@ko3n1g
Contributor

ko3n1g commented Mar 3, 2026

Merge via #2509

@ko3n1g ko3n1g closed this Mar 3, 2026