
cp: fix: use local_rank (#2328) #2330

Merged
ko3n1g merged 1 commit into r0.3.0 from
ko3n1g/cp/0893618f27c5075f8ff104db3154cb8c1672cac9
Feb 11, 2026


@ko3n1g (Contributor) commented Feb 11, 2026

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Changelog

  • Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Bug Fixes
    • Improved distributed training initialization to properly handle dataset index compilation in multi-node environments, ensuring consistent behavior across all compute nodes.

Signed-off-by: oliver könig <okoenig@nvidia.com>
@ko3n1g changed the title from "fix: use local_rank" to "cp: fix: use local_rank (#2328)" Feb 11, 2026
@coderabbitai (bot, Contributor) commented Feb 11, 2026

Walkthrough

The change modifies dataset index builder compilation gating in initialize_megatron from using global rank zero to local pre-initialization rank zero, enabling per-node compilation in multi-node distributed training setups without shared filesystems.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Dataset compilation rank gating: src/megatron/bridge/training/initialize.py | Replaced the global rank condition (get_rank_safe() == 0) with the local pre-initialization rank condition (get_local_rank_preinit() == 0), so that dataset index compilation runs once per node (on each node's local rank zero) instead of only on global rank zero. This supports multi-node configurations without shared storage. |
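To make the effect of the gating change concrete, here is a minimal sketch. It does not reproduce the bodies of get_rank_safe() or get_local_rank_preinit(), which are not shown in this PR; it assumes a torchrun-style launcher that exports RANK and LOCAL_RANK, and every function name in it is hypothetical.

```python
import os


def global_rank(env=None):
    """Global rank from a torchrun-style environment (assumption: RANK is set)."""
    env = os.environ if env is None else env
    return int(env.get("RANK", "0"))


def local_rank(env=None):
    """Per-node rank from a torchrun-style environment (assumption: LOCAL_RANK is set)."""
    env = os.environ if env is None else env
    return int(env.get("LOCAL_RANK", "0"))


def should_compile_old(env=None):
    # Old gate: only the single process with global rank 0 compiles.
    return global_rank(env) == 0


def should_compile_new(env=None):
    # New gate: the first process on every node compiles.
    return local_rank(env) == 0


def simulate(num_nodes, gpus_per_node, gate):
    """Return the (node, local_rank) pairs that would run the compile step."""
    compiling = []
    for node in range(num_nodes):
        for lr in range(gpus_per_node):
            env = {
                "RANK": str(node * gpus_per_node + lr),
                "LOCAL_RANK": str(lr),
            }
            if gate(env):
                compiling.append((node, lr))
    return compiling


if __name__ == "__main__":
    print(simulate(2, 4, should_compile_old))  # [(0, 0)]
    print(simulate(2, 4, should_compile_new))  # [(0, 0), (1, 0)]
```

With two nodes of four GPUs each, the old gate compiles only on node 0, so node 1 would find no compiled helper unless both nodes share a filesystem. The new gate compiles once on each node.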

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • thomasdhc

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Test Results For Major Changes | ⚠️ Warning | PR makes a targeted rank-checking logic change for multi-node setups, but provides no testing documentation, verification results, or description of validation in the PR description template. | Add documentation of testing approach, multi-node scenario validation results, regression testing, and any performance/convergence impact to the PR description. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix: use local_rank' directly aligns with the main change: switching from global rank to local rank for dataset index builder compilation. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


No actionable comments were generated in the recent review. 🎉



@ko3n1g ko3n1g merged commit c9ccaf2 into r0.3.0 Feb 11, 2026
16 of 18 checks passed
@ko3n1g ko3n1g deleted the ko3n1g/cp/0893618f27c5075f8ff104db3154cb8c1672cac9 branch February 11, 2026 16:49

2 participants