cp: feat: Add dataset compile helper (#2236)#2249

Merged
ko3n1g merged 2 commits into r0.3.0 from
ko3n1g/cp/281f6a53a9eb89b6c55debc04b702fb834f66399
Feb 7, 2026

Conversation

@ko3n1g ko3n1g commented Feb 5, 2026

What does this PR do?

!!! Not for the February release, but for the patch release in March !!!

Changelog

  • Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Chores

    • Updated megatron-core package configuration to support editable installation.
  • Refactor

    • Added a dataset-helper compilation step to training initialization. It runs on rank 0 after distributed setup and is followed by a barrier so that all training processes stay synchronized.

Signed-off-by: oliver könig <okoenig@nvidia.com>

copy-pr-bot bot commented Feb 5, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@ko3n1g ko3n1g changed the title from "feat: Add dataset compile helper (#2236)" to "cp: feat: Add dataset compile helper (#2236)" Feb 5, 2026
…9eb89b6c55debc04b702fb834f66399

Signed-off-by: oliver könig <okoenig@nvidia.com>
@ko3n1g ko3n1g marked this pull request as ready for review February 7, 2026 12:21
@ko3n1g ko3n1g requested a review from a team as a code owner February 7, 2026 12:21
@ko3n1g ko3n1g merged commit 8ae972e into r0.3.0 Feb 7, 2026
1 check passed
@ko3n1g ko3n1g deleted the ko3n1g/cp/281f6a53a9eb89b6c55debc04b702fb834f66399 branch February 7, 2026 12:22

coderabbitai bot commented Feb 7, 2026

Caution

Review failed

The pull request is closed.

📝 Walkthrough

This pull request updates the build configuration to use editable installation mode for the local megatron-core dependency and introduces a post-distribution-initialization compilation step in the training initialization routine, executed only on rank 0 with barrier synchronization.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Configuration Updates: pyproject.toml | Adds editable = true flag to the megatron-core local source in [tool.uv.sources] to enable editable installation mode for development. |
| Initialization Enhancement: src/megatron/bridge/training/initialize.py | Adds imports for time and compile_helpers. Modifies initialize_megatron to capture the result of torch_dist_init and invoke compile_helpers with timing logs on rank 0 after distributed initialization, followed by a barrier synchronization before returning. |
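
The rank-0 guarded compile step summarized above can be sketched roughly as follows. This is a minimal illustration of the pattern, not the code from this PR: compile_on_rank_zero and simulate_ranks are hypothetical names, the log messages are made up, and threads with threading.Barrier stand in for distributed ranks and a distributed barrier so the sketch runs without a GPU or a process group.

```python
import threading
import time
from typing import Callable, List, Optional, Tuple


def compile_on_rank_zero(
    rank: int,
    compile_fn: Callable[[], object],
    barrier: Callable[[], object],
    log: Callable[[str], object] = print,
) -> Optional[float]:
    """Run compile_fn on rank 0 only, then make every rank wait at a barrier.

    Returns the elapsed compile time in seconds on rank 0 and None elsewhere.
    (Hypothetical helper; not the signature used in the PR.)
    """
    elapsed = None
    if rank == 0:
        start = time.time()
        log("compiling dataset helpers on rank 0 ...")
        compile_fn()
        elapsed = time.time() - start
        log(f"dataset helpers compiled in {elapsed:.3f} seconds")
    # Every rank, including rank 0, blocks here until all ranks have arrived,
    # so no rank continues before the compile step has finished.
    barrier()
    return elapsed


def simulate_ranks(world_size: int = 4) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Simulate ranks with threads; threading.Barrier stands in for a distributed barrier."""
    barrier = threading.Barrier(world_size)
    compiled: List[int] = []               # ranks that actually ran the compile step
    observed: List[Tuple[int, int]] = []   # (rank, compile count seen after the barrier)
    lock = threading.Lock()

    def worker(rank: int) -> None:
        compile_on_rank_zero(
            rank,
            compile_fn=lambda: compiled.append(rank),
            barrier=barrier.wait,
            log=lambda msg: None,  # silence logging in the simulation
        )
        with lock:
            observed.append((rank, len(compiled)))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return compiled, sorted(observed)
```

In a real torch.distributed job, the same helper could presumably be called with rank=torch.distributed.get_rank() and barrier=torch.distributed.barrier once the process group is initialized. The exact call site and the signature of compile_helpers are not shown in this conversation, so treat those details as assumptions.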

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • feat: Add dataset compile helper #2236: Directly related as it modifies the same file (src/megatron/bridge/training/initialize.py) to add the same imports and implement the identical rank-0 guarded dataset compilation step after distributed initialization.

Suggested reviewers

  • maanug-nv