cp: Onboarding LLAMA3 70B LoRA to B300 and B200 chips (2397) into r0.3.0 #2581

Closed

svcnvidia-nemo-ci wants to merge 1 commit into r0.3.0 from cherry-pick-2397-r0.3.0

Conversation

@svcnvidia-nemo-ci
Contributor

@svcnvidia-nemo-ci svcnvidia-nemo-ci commented Feb 27, 2026

beep boop [🤖]: Hi @rhmukundan 👋,

we've cherry-picked #2397 into r0.3.0 for you! 🚀

Please review and approve this cherry-pick at your convenience!

Summary by CodeRabbit

  • New Features
    • Added Llama3 70B LoRA configurations for B200 and B300 GPU variants
    • Support for multiple precision options: BF16 and FP8 (CS and MX variants)
    • New performance-tuned configurations now available for fine-tuning tasks

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Feb 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@svcnvidia-nemo-ci
Contributor Author

/ok to test 84e8d6f

@coderabbitai
Contributor

coderabbitai bot commented Feb 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e1594e5 and 84e8d6f.

📒 Files selected for processing (3)
  • scripts/performance/configs/llama/__init__.py
  • scripts/performance/configs/llama/llama3_llm_finetune.py
  • scripts/performance/configs/llama/llama3_workload_base_configs.py

📝 Walkthrough

Walkthrough

The PR adds LLAMA3 70B LoRA configuration support for B200 and B300 GPUs. New configuration functions are introduced in llama3_llm_finetune.py, base configurations and public aliases are defined in llama3_workload_base_configs.py, and these configurations are exported through the module's public API in __init__.py.

Changes

  • Configuration Functions (scripts/performance/configs/llama/llama3_llm_finetune.py):
    Added two new public functions, llama3_70b_lora_config_b300() and llama3_70b_lora_config_b200(), that configure LoRA settings for the 70B model on B300 and B200 GPUs respectively, with peft="lora", seq_length=4096, comm_overlap enabled, and target_modules set to ["linear_qkv"].
  • Base Configurations and Aliases (scripts/performance/configs/llama/llama3_workload_base_configs.py):
    Added private base configurations _LLAMA3_70B_LORA_CONFIG_B300 and _LLAMA3_70B_LORA_CONFIG_B200 with GPU-specific settings, plus six public aliases covering BF16, FP8_CS, and FP8_MX variants for both B300 and B200; some variants share a configuration across precision types.
  • Public API Exports (scripts/performance/configs/llama/__init__.py):
    Added imports for llama3_70b_lora_config_b200 and llama3_70b_lora_config_b300 when the Megatron bridge is available, and updated the __all__ list with six new public configuration constants for the B200/B300 LoRA variants.
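The walkthrough only names the parameters involved, not the actual recipe objects. As a rough, self-contained sketch of the pattern it describes (a plain dataclass stands in for the real Megatron/NeMo bridge config type, which is not shown in this PR; everything except the function names, alias names, and the quoted parameters peft, seq_length, comm_overlap, and target_modules is an assumption for illustration):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LoraFinetuneConfig:
    """Hypothetical stand-in for the recipe object the real functions build."""
    peft: str = "lora"
    seq_length: int = 4096
    comm_overlap: bool = True
    target_modules: List[str] = field(default_factory=lambda: ["linear_qkv"])
    gpu: str = "b200"  # chip-specific base settings would differ in the real configs


def llama3_70b_lora_config_b300() -> LoraFinetuneConfig:
    # B300 variant: same LoRA knobs, GPU-specific base configuration.
    return LoraFinetuneConfig(gpu="b300")


def llama3_70b_lora_config_b200() -> LoraFinetuneConfig:
    # B200 variant.
    return LoraFinetuneConfig(gpu="b200")


# Precision aliases can share one base config per chip, mirroring how the
# summary says some BF16 / FP8_CS / FP8_MX variants share configurations.
_LLAMA3_70B_LORA_CONFIG_B300 = llama3_70b_lora_config_b300()
LLAMA3_70B_LORA_BF16_B300 = _LLAMA3_70B_LORA_CONFIG_B300
LLAMA3_70B_LORA_FP8_CS_B300 = _LLAMA3_70B_LORA_CONFIG_B300
```

The alias names here are illustrative only; the six actual public constants live in llama3_workload_base_configs.py in the PR.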

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

performance, r0.3.0

Suggested reviewers

  • rhmukundan
  • malay-nagda
  • erhoo82
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Test Results For Major Changes (⚠️ Warning): The PR adds hardware-specific configurations for B300/B200 GPUs without test results, performance metrics, or validation documentation in the auto-generated description. Resolution: update the PR description with the test results, performance metrics, regression testing, and configuration notes from the original PR demonstrating successful validation on the target hardware.

✅ Passed checks (3 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title clearly identifies the main change: onboarding LLAMA3 70B LoRA configurations for B300 and B200 GPU chips into the release branch r0.3.0.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch cherry-pick-2397-r0.3.0


@ko3n1g
Contributor

ko3n1g commented Mar 3, 2026

Merge via #2509

@ko3n1g ko3n1g closed this Mar 3, 2026