Onboarding LLAMA3 70B LoRA to B300 and B200 chips #2397
Conversation
Signed-off-by: Raghav Hrishikeshan Mukundan <rmukundan@nvidia.com>
📝 Walkthrough

Adds B200 and B300 GPU variant support for Llama3 70B LoRA fine-tuning configurations. Introduces two new configuration builder functions, base configuration constants with multiple precision variants (BF16, FP8 CS, FP8 MX), and corresponding public API exports across the configuration module hierarchy.
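The pattern the walkthrough describes — per-precision base-config constants plus a builder function that selects among them — can be sketched as follows. This is a hypothetical illustration, not the actual Megatron-Bridge code: only the public names (e.g. `LLAMA3_70B_LORA_CONFIG_B200_BF16_V1`, `llama3_70b_lora_config_b200`) come from the PR; the fields and return type are invented for illustration.

```python
# Illustrative sketch of the config-constant + builder-function pattern.
# Field names and values are assumptions, not the real configuration schema.
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraBaseConfig:
    gpu: str          # target GPU variant, e.g. "b200" or "b300"
    precision: str    # "bf16", "fp8_cs" (current scaling), or "fp8_mx" (MX formats)
    micro_batch: int  # placeholder tuning knob


# One constant per precision variant, mirroring the names added in the PR.
LLAMA3_70B_LORA_CONFIG_B200_BF16_V1 = LoraBaseConfig("b200", "bf16", 1)
LLAMA3_70B_LORA_CONFIG_B200_FP8_CS_V1 = LoraBaseConfig("b200", "fp8_cs", 1)
LLAMA3_70B_LORA_CONFIG_B200_FP8_MX_V1 = LoraBaseConfig("b200", "fp8_mx", 1)

_VARIANTS = {
    "bf16": LLAMA3_70B_LORA_CONFIG_B200_BF16_V1,
    "fp8_cs": LLAMA3_70B_LORA_CONFIG_B200_FP8_CS_V1,
    "fp8_mx": LLAMA3_70B_LORA_CONFIG_B200_FP8_MX_V1,
}


def llama3_70b_lora_config_b200(precision: str = "bf16") -> LoraBaseConfig:
    """Builder returning the B200 base config for the requested precision."""
    return _VARIANTS[precision]
```

A `llama3_70b_lora_config_b300` builder would follow the same shape with B300 constants, which is why the review below focuses on keeping both sets of names exported consistently.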
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (warning)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
scripts/performance/configs/llama/__init__.py (3)
208-219: ⚠️ Potential issue | 🟠 Major

Missing B300 LoRA constants in `__all__`.

Only the B200 LoRA constants are added to `__all__` (lines 212-214). The B300 LoRA constants should also be listed here for a complete public API surface.

Proposed fix:

```diff
 "LLAMA3_70B_LORA_CONFIG_B200_BF16_V1",
 "LLAMA3_70B_LORA_CONFIG_B200_FP8_CS_V1",
 "LLAMA3_70B_LORA_CONFIG_B200_FP8_MX_V1",
+"LLAMA3_70B_LORA_CONFIG_B300_BF16_V1",
+"LLAMA3_70B_LORA_CONFIG_B300_FP8_CS_V1",
+"LLAMA3_70B_LORA_CONFIG_B300_FP8_MX_V1",
 "LLAMA3_70B_LORA_CONFIG_GB300_BF16_V1",
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/performance/configs/llama/__init__.py` around lines 208 - 219, The __all__ list is missing the B300 LoRA constant names; update the module's export list (the __all__ variable) to include the B300 LoRA symbols corresponding to the GB300 entries—add "LLAMA3_70B_LORA_CONFIG_B300_BF16_V1", "LLAMA3_70B_LORA_CONFIG_B300_FP8_CS_V1", and "LLAMA3_70B_LORA_CONFIG_B300_FP8_MX_V1" to __all__ alongside the existing B200 and GB300 entries so the B300 configurations are part of the public API.
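This class of bug — a symbol defined in the module but absent from `__all__` — is easy to catch generically. The helper below is a sketch of such a check, not part of the PR; the module and symbol names are stand-ins built with `types.ModuleType` for illustration.

```python
import types


def missing_from_all(module) -> set:
    """Return public symbols defined on `module` but absent from its __all__."""
    exported = set(getattr(module, "__all__", []))
    defined = {
        name
        for name, value in vars(module).items()
        if not name.startswith("_") and not isinstance(value, types.ModuleType)
    }
    return defined - exported


# Stand-in module reproducing the finding: the B300 constant is defined
# but was never added to __all__.
mod = types.ModuleType("configs")
mod.LLAMA3_70B_LORA_CONFIG_B200_BF16_V1 = object()
mod.LLAMA3_70B_LORA_CONFIG_B300_BF16_V1 = object()
mod.__all__ = ["LLAMA3_70B_LORA_CONFIG_B200_BF16_V1"]
```

Running `missing_from_all(mod)` flags the forgotten B300 constant; a one-line unit test of this shape would have caught all three findings in this review.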
269-296: ⚠️ Potential issue | 🟠 Major

Missing `llama3_70b_lora_config_b300` in the dynamic `__all__` extension.

`llama3_70b_lora_config_b200` is added at line 286, but `llama3_70b_lora_config_b300` is missing from this list.

Proposed fix:

```diff
 "llama3_70b_lora_config_b200",
+"llama3_70b_lora_config_b300",
 "llama3_70b_lora_config_gb200",
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/performance/configs/llama/__init__.py` around lines 269 - 296, The __all__ extension guarded by HAVE_MEGATRON_BRIDGE is missing the symbol "llama3_70b_lora_config_b300"; update the list in the block that extends __all__ (the array containing "llama3_70b_lora_config_b200", "llama3_70b_lora_config_gb200", "llama3_70b_lora_config_gb300", "llama3_70b_lora_config_h100") to include "llama3_70b_lora_config_b300" so the exported names match the defined configs.
9-19: ⚠️ Potential issue | 🟠 Major

Missing import and export of `llama3_70b_lora_config_b300`.

The B300 LoRA config function is defined in `llama3_llm_finetune.py` but is not imported here. Only `llama3_70b_lora_config_b200` was added (line 12), while `llama3_70b_lora_config_b300` was omitted. This means B300 LoRA won't be accessible via this package.

Proposed fix:

```diff
 from .llama3_llm_finetune import (
     llama3_8b_sft_config_gb200,
     llama3_8b_sft_config_h100,
     llama3_70b_lora_config_b200,
+    llama3_70b_lora_config_b300,
     llama3_70b_lora_config_gb200,
     llama3_70b_lora_config_gb300,
     llama3_70b_lora_config_h100,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/performance/configs/llama/__init__.py` around lines 9 - 19, The package export is missing the llama3_70b_lora_config_b300 symbol; add llama3_70b_lora_config_b300 to the import list in __init__.py (alongside llama3_70b_lora_config_b200) so it is exported by the package; if there is an __all__ or any exported names list in this module, include llama3_70b_lora_config_b300 there as well to make the B300 LoRA config accessible.
🤖 Fix all issues with AI agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/performance/configs/llama/__init__.py`:
- Around line 64-72: The file is missing imports for the B300 LoRA constants
used in the module; add imports for LLAMA3_70B_LORA_CONFIG_B300_BF16_V1,
LLAMA3_70B_LORA_CONFIG_B300_FP8_CS_V1, and LLAMA3_70B_LORA_CONFIG_B300_FP8_MX_V1
from .llama3_workload_base_configs so the constants referenced in the exported
list (e.g., LLAMA3_70B_LORA_CONFIG_GB300_BF16_V1,
LLAMA3_70B_LORA_CONFIG_B300_FP8_MX_V1) are actually defined; update the same
import statement that currently imports the B200 constants to include these
three B300 symbols.
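The finding above is the inverse failure mode: names referenced in the export list without a backing import. A generic check for it — every name in `__all__` must actually resolve on the module, or `from package import *` breaks — can be sketched like this (module contents are stand-ins, not the real package):

```python
import types


def unresolvable_exports(module) -> list:
    """Names declared in __all__ that the module does not actually define."""
    return [n for n in getattr(module, "__all__", []) if not hasattr(module, n)]


# Stand-in package reproducing the finding: the B300 builder is listed in
# __all__ but was never imported into the package namespace.
pkg = types.ModuleType("llama")
pkg.llama3_70b_lora_config_b200 = lambda: None
pkg.__all__ = ["llama3_70b_lora_config_b200", "llama3_70b_lora_config_b300"]
```

`unresolvable_exports(pkg)` reports the dangling B300 name; paired with the `__all__`-completeness check, the two directions cover all four findings in this review.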
/ok to test 13ee9f6
Force-pushed 13ee9f6 to dd660c4
/ok to test dd660c4
This merged yesterday (b162358), which removed `llama3_70b_finetune_config` from that file. This (#2397) merged today but was a stale branch that still assumed `llama3_70b_finetune_config` was imported in that file.
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
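The breakage described above can be reproduced in miniature: a stale branch keeps importing a symbol that a concurrently merged change removed. The snippet below mimics that with two throwaway module objects (all names are stand-ins; real `from module import name` raises `ImportError`, while `getattr` used here raises `AttributeError`):

```python
import types

# Module state before b162358: the symbol exists.
old = types.ModuleType("llama3_llm_finetune")
old.llama3_70b_finetune_config = lambda: "config"

# Module state after b162358: the symbol was removed.
new = types.ModuleType("llama3_llm_finetune")


def import_symbol(module, name):
    """Mimic looking up an imported name, returning the error on failure."""
    try:
        return getattr(module, name)
    except AttributeError as exc:
        return exc


# The stale branch's import succeeds against the old module but fails
# against the newly merged one.
err = import_symbol(new, "llama3_70b_finetune_config")
```

This is the classic merge-race: CI passed on both PRs independently, and only the combination on `main` fails — which is why the follow-up fix was needed.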