
Onboard LLAMA3 LoRa to B200 & B300 Chips#2396

Open
rhmukundan wants to merge 4 commits into r0.3.0 from rmukundan/onboard_llama3_lora_b200

Conversation

Contributor

rhmukundan commented Feb 16, 2026

Summary by CodeRabbit

Release Notes

  • New Features
    • Added Llama3 70B LoRA fine-tuning configurations for B200 and B300 GPU variants with BF16 and FP8 precision options.

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
rhmukundan self-assigned this on Feb 16, 2026
@copy-pr-bot

copy-pr-bot bot commented Feb 16, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

rhmukundan requested a review from erhoo82 on February 17, 2026 at 00:13
rhmukundan marked this pull request as ready for review on February 17, 2026 at 00:13
@coderabbitai
Contributor

coderabbitai bot commented Feb 17, 2026

📝 Walkthrough

This PR adds new LoRA configuration variants for Llama3 70B models targeting B200 and B300 GPUs. The changes span three files: base workload configurations, fine-tuning function definitions, and public module exports. All modifications are purely additive with no existing code alterations.

Changes

Cohort / File(s) — Summary

  • Base Workload Configurations — scripts/performance/configs/llama/llama3_workload_base_configs.py
    Introduces six new B300 and B200 LoRA configuration definitions with variants for BF16 and FP8 precision modes. The configurations specify 8 GPUs, LoRA PEFT, and varying parallelism settings (B300: 1/1/1, B200: 1/2/1 for tensor/pipeline/context parallelism). Updates __all__ to expose all new variants.
  • Fine-tune Configuration Functions — scripts/performance/configs/llama/llama3_llm_finetune.py
    Adds two new functions, llama3_70b_lora_config_b300() and llama3_70b_lora_config_b200(), that construct ConfigContainer objects with LoRA settings (seq_length 4096, QKV target modules, padding, and communicator-overlap features).
  • Module Exports — scripts/performance/configs/llama/__init__.py
    Adds imports and public exports for the B200 LoRA configurations, including three variants (BF16_V1, FP8_CS_V1, and FP8_MX_V1) along with the base configuration function.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

performance, r0.3.0

Suggested reviewers

  • erhoo82
  • malay-nagda
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Test Results For Major Changes — ⚠️ Warning
    Explanation: The PR adds new LLAMA3 LoRA configurations for B200 and B300 chips but provides no documented test results or performance validation.
    Resolution: Add test results or performance benchmarking data demonstrating that the new B200 and B300 LoRA configurations function correctly and achieve expected performance.
✅ Passed checks (4 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title clearly summarizes the main change, adding LLAMA3 LoRA configurations for B200 and B300 GPU chips, which aligns with the file modifications adding new configs for these hardware variants.
  • Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate; docstring coverage check skipped.
  • Merge Conflict Detection — ✅ Passed: No merge conflicts detected when merging into r0.3.0.


Contributor

coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
scripts/performance/configs/llama/__init__.py (2)

40-119: ⚠️ Potential issue | 🔴 Critical

Missing imports for LLAMA3_70B_LORA_CONFIG_B300_* base config constants.

The B300 LoRA base configs (LLAMA3_70B_LORA_CONFIG_B300_BF16_V1, LLAMA3_70B_LORA_CONFIG_B300_FP8_CS_V1, LLAMA3_70B_LORA_CONFIG_B300_FP8_MX_V1) are defined and exported in llama3_workload_base_configs.py (lines 616–621, 741–743) but are not imported here, unlike their B200 counterparts (lines 67–69).

Proposed fix — add B300 LoRA imports alongside the B200 ones
     LLAMA3_70B_LORA_CONFIG_B200_BF16_V1,
     LLAMA3_70B_LORA_CONFIG_B200_FP8_CS_V1,
     LLAMA3_70B_LORA_CONFIG_B200_FP8_MX_V1,
+    LLAMA3_70B_LORA_CONFIG_B300_BF16_V1,
+    LLAMA3_70B_LORA_CONFIG_B300_FP8_CS_V1,
+    LLAMA3_70B_LORA_CONFIG_B300_FP8_MX_V1,
     LLAMA3_70B_LORA_CONFIG_GB300_BF16_V1,

And add the corresponding entries to __all__:

     "LLAMA3_70B_LORA_CONFIG_B200_FP8_MX_V1",
+    "LLAMA3_70B_LORA_CONFIG_B300_BF16_V1",
+    "LLAMA3_70B_LORA_CONFIG_B300_FP8_CS_V1",
+    "LLAMA3_70B_LORA_CONFIG_B300_FP8_MX_V1",
     "LLAMA3_70B_LORA_CONFIG_GB300_BF16_V1",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/performance/configs/llama/__init__.py` around lines 40 - 119, The
file is missing imports and exports for the B300 LoRA base configs; add imports
for LLAMA3_70B_LORA_CONFIG_B300_BF16_V1, LLAMA3_70B_LORA_CONFIG_B300_FP8_CS_V1,
and LLAMA3_70B_LORA_CONFIG_B300_FP8_MX_V1 to the top import list (next to the
existing LLAMA3_70B_LORA_CONFIG_B200_* entries) and include those three symbols
in the module's __all__ export list so the B300 LoRA configs are available where
expected.

9-19: ⚠️ Potential issue | 🔴 Critical

Missing import for llama3_70b_lora_config_b300.

The B300 LoRA function is defined in llama3_llm_finetune.py (line 262) but is not imported here. Only llama3_70b_lora_config_b200 was added (line 12). Similarly, it's missing from the __all__ extension block (line 286).

Proposed fix
     if HAVE_MEGATRON_BRIDGE:
         from .llama3_llm_finetune import (
             llama3_8b_sft_config_gb200,
             llama3_8b_sft_config_h100,
             llama3_70b_lora_config_b200,
+            llama3_70b_lora_config_b300,
             llama3_70b_lora_config_gb200,
             llama3_70b_lora_config_gb300,

And in the __all__.extend block:

             "llama3_70b_lora_config_b200",
+            "llama3_70b_lora_config_b300",
             "llama3_70b_lora_config_gb200",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/performance/configs/llama/__init__.py` around lines 9 - 19, Add the
missing export for the B300 LoRA config: import llama3_70b_lora_config_b300
alongside the other imports from llama3_llm_finetune (so it appears with
llama3_70b_lora_config_b200 etc.), and also add "llama3_70b_lora_config_b300" to
the __all__.extend block so it is exported; locate the import list and the
__all__.extend block by the symbols llama3_70b_lora_config_b200 and
__all__.extend respectively and insert the new name in both places.
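Both findings above are instances of a single failure mode: a name declared in `__all__` (or expected there) with no matching definition in the module namespace. A lightweight consistency check of the kind that would catch this can be sketched as follows; the toy module and symbol names are hypothetical, chosen to mirror the missing-B300-import situation.

```python
import types


def find_broken_exports(module):
    """Return names declared in __all__ that the module does not define."""
    exported = getattr(module, "__all__", [])
    return [name for name in exported if not hasattr(module, name)]


# Build a toy module that declares an export it never defines,
# mimicking a B300 constant listed in __all__ but never imported.
mod = types.ModuleType("llama_configs_demo")
mod.LLAMA3_70B_LORA_CONFIG_B200_BF16_V1 = object()
mod.__all__ = [
    "LLAMA3_70B_LORA_CONFIG_B200_BF16_V1",
    "LLAMA3_70B_LORA_CONFIG_B300_BF16_V1",  # exported but never defined
]

print(find_broken_exports(mod))  # → ['LLAMA3_70B_LORA_CONFIG_B300_BF16_V1']
```

Run against the real package in a unit test, a check like this turns a silent `ImportError`-at-use-time into an immediate CI failure.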

Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>