
[data] switch pretrain from broadcast to replicated load#491

Merged
ananthsub merged 1 commit into NVIDIA-NeMo:main from ananthsub:tp-replicated-load
Aug 26, 2025

Conversation

@ananthsub (Contributor) commented Aug 26, 2025

Don't load the batch on one TP rank and broadcast it; instead, initialize the dataloader on all ranks and unconditionally call `get_batch_from_iterator`.

Broadcasting uses less file I/O and PCIe bandwidth, but costs GPU memcpy and NVLink bandwidth.

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>
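The control-flow difference can be sketched as follows. This is a minimal, illustrative model only: the function names and the in-process "ranks" are hypothetical, not the actual NeMo or torch.distributed APIs, and the real trade-off (file I/O and PCIe vs. GPU memcpy and NVLink) is only indicated in comments.

```python
# Hypothetical sketch of broadcast vs. replicated batch loading.
# "TP ranks" are simulated in-process; no real collectives are issued.

def make_iterator(dataset):
    # Stand-in for constructing a dataloader iterator on a rank.
    return iter(dataset)

def broadcast_load(dataset, tp_ranks):
    # Before this PR: only TP rank 0 reads the batch (file I/O + PCIe),
    # then broadcasts it to the other TP ranks (GPU memcpy + NVLink).
    batch = next(make_iterator(dataset))          # read on rank 0 only
    return [batch for _ in range(tp_ranks)]       # simulated broadcast

def replicated_load(dataset, tp_ranks):
    # After this PR: every rank owns an iterator and reads the batch
    # itself, so no broadcast is needed. Ranks must iterate identically
    # (same dataset, same order) for the batches to match.
    iterators = [make_iterator(dataset) for _ in range(tp_ranks)]
    return [next(it) for it in iterators]

dataset = [{"tokens": [1, 2, 3]}]
# Both strategies hand every TP rank the same batch.
assert broadcast_load(dataset, tp_ranks=4) == replicated_load(dataset, tp_ranks=4)
```

The key invariant, as in the real change, is that replicated loading only works because every TP rank sees an identical data stream; the PR relies on that to drop the broadcast entirely.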
@copy-pr-bot bot commented Aug 26, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ananthsub (Contributor, Author) commented:

/ok to test fe02f57

@sanandaraj5597 (Contributor) left a comment


LGTM.

@ananthsub ananthsub marked this pull request as ready for review August 26, 2025 21:01
@ananthsub ananthsub enabled auto-merge (squash) August 26, 2025 21:13
@ananthsub ananthsub merged commit 7612573 into NVIDIA-NeMo:main Aug 26, 2025
33 checks passed
ko3n1g pushed a commit that referenced this pull request Aug 26, 2025
Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>
ananthsub added a commit that referenced this pull request Aug 26, 2025
Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>
Co-authored-by: Ananth Subramaniam <ansubramania@nvidia.com>
@ananthsub ananthsub deleted the tp-replicated-load branch February 17, 2026 07:16
