ci(fix): PerfPlugin for llama #2060

Merged: ko3n1g merged 2 commits into main from ko3n1g/ci/fix-perf-plugin on Jan 26, 2026
ko3n1g (Contributor) commented Jan 25, 2026

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Changelog

  • Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install (e.g., Numba, Pynini, Apex)?
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Refactor
    • Updated the model identification checks that control environment variable configuration during pretraining, so that Llama variants of different sizes and configurations are handled consistently.
    • Applied the same Llama classification to the hardware-specific performance optimizations for h100, gb200, and gb300.


Signed-off-by: oliver könig <okoenig@nvidia.com>
coderabbitai bot (Contributor) commented Jan 25, 2026

Walkthrough

The change refines model-specific environment variable gating conditions in the performance plugins module. It updates checks for Llama-based models by changing model_family_name comparisons from specific subversions (llama31, llama3) to a unified "llama" check, affecting when hardware-specific NCCL and cuDNN optimizations are applied during pretraining.
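
The gating change described above can be sketched roughly as follows. This is an illustrative reconstruction and not the code in scripts/performance/perf_plugins.py: only `model_family_name` and the family strings come from the walkthrough, while `TUNED_GPUS`, `old_gate`, `new_gate`, and the `gpu` and `task` parameters are hypothetical names, and treating `llama3*` as a prefix match is an assumption.

```python
# Hypothetical sketch of the gating predicate before and after this PR.
# Assumption: the hardware optimizations (NCCL_CTA_POLICY, del_cudnn_ln)
# are applied only when the predicate returns True.

TUNED_GPUS = {"h100", "gb200", "gb300"}  # systems named in the summary


def old_gate(model_family_name: str, gpu: str, task: str) -> bool:
    """Version-specific check: matches llama31 or any llama3* name."""
    is_llama = model_family_name == "llama31" or model_family_name.startswith("llama3")
    return task == "pretrain" and gpu in TUNED_GPUS and is_llama


def new_gate(model_family_name: str, gpu: str, task: str) -> bool:
    """Unified check: matches only the family name "llama"."""
    return task == "pretrain" and gpu in TUNED_GPUS and model_family_name == "llama"


if __name__ == "__main__":
    for family in ("llama", "llama3", "llama31"):
        print(family, old_gate(family, "h100", "pretrain"), new_gate(family, "h100", "pretrain"))
```

Under these assumptions, a family name of exactly "llama" passes only the unified check, while "llama3" and "llama31" pass only the version-specific one, so which optimizations fire depends on how the caller populates `model_family_name`.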

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Environment variable gating refinement: `scripts/performance/perf_plugins.py` | Updated model-specific condition checks to use unified `model_family_name == "llama"` instead of version-specific checks (`llama31`, `llama3*`), affecting applicability of gb200, h100, and gb300 hardware optimizations (`NCCL_CTA_POLICY`, `del_cudnn_ln`) during pretraining |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Test Results For Major Changes | ⚠️ Warning | PR alters performance-related gating logic for llama-based models without before/after benchmarks or testing details to demonstrate no regression. | Add detailed performance results comparing previous and updated behavior with hardware configuration, model, dataset, and measurement methodology. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'ci(fix): PerfPlugin for llama' is directly related to the changeset, which refines llama model-specific environment variable gating in the PerfPlugin by updating model checks from 'llama31' to 'llama' family patterns. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |


@ko3n1g ko3n1g merged commit 93d2395 into main Jan 26, 2026
45 of 47 checks passed
@ko3n1g ko3n1g deleted the ko3n1g/ci/fix-perf-plugin branch January 26, 2026 08:16
nv-mollys pushed a commit that referenced this pull request Jan 27, 2026
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: mollys <mollys@mollys.nvidia.com>
aroshanghias-nvd pushed a commit to aroshanghias-nvd/Megatron-Bridge that referenced this pull request Jan 29, 2026
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Ali Roshan Ghias <aroshanghias@nvidia.com>
aroshanghias-nvd pushed a commit to aroshanghias-nvd/Megatron-Bridge that referenced this pull request Jan 29, 2026
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Ali Roshan Ghias <aroshanghias@nvidia.com>

2 participants