
Deprecations fix TORCH_NCCL_BLOCKING_WAIT #9448

Merged: 2 commits, merged Mar 31, 2024


glenn-jocher (Member) commented Mar 31, 2024

@Laughing-q this should resolve the DDP warning.

πŸ› οΈ PR Summary

Made with ❀️ by Ultralytics Actions

🌟 Summary

Updated the DDP setup to use PyTorch's current name for the NCCL blocking-wait environment variable, removing a deprecation warning during distributed training.

📊 Key Changes

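  • Renamed the environment variable NCCL_BLOCKING_WAIT to TORCH_NCCL_BLOCKING_WAIT. PyTorch has deprecated the old name and warns when it is set. The new name enables the same blocking-wait behavior, in which a collective that exceeds the timeout passed to init_process_group raises an error in the calling process.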
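The fix replaces the variable name that the DDP setup exports before the process group is created. If a launch script or cluster environment still exports the old name, a small migration step can keep the warning away. The helper below is a minimal sketch, assuming a hypothetical `migrate_nccl_env` function that is not part of Ultralytics or PyTorch. It moves any NCCL_BLOCKING_WAIT value to TORCH_NCCL_BLOCKING_WAIT and lets an explicitly set new name take precedence.

```python
import os
from collections.abc import MutableMapping

# Deprecated name -> TORCH_-prefixed replacement. Only the blocking-wait pair
# comes from this PR; any further entries would be assumptions to verify
# against the installed PyTorch version.
RENAMED_ENV_VARS = {"NCCL_BLOCKING_WAIT": "TORCH_NCCL_BLOCKING_WAIT"}


def migrate_nccl_env(env: MutableMapping[str, str] = os.environ) -> list[str]:
    """Hypothetical helper: move deprecated NCCL env vars to their new names.

    Returns the deprecated names that were found (and removed). If the new
    name is already set, its value is kept and the old value is discarded.
    """
    migrated = []
    for old, new in RENAMED_ENV_VARS.items():
        if old in env:
            value = env.pop(old)  # remove the old name so it is no longer read
            env.setdefault(new, value)  # an explicitly set new name wins
            migrated.append(old)
    return migrated
```

In a DDP entry point this would run before torch.distributed.init_process_group(...), since PyTorch reads the blocking-wait setting when the NCCL process group is constructed. Code that controls the environment itself can simply set TORCH_NCCL_BLOCKING_WAIT directly, as this PR does.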

🎯 Purpose & Impact

  • Purpose: Use the variable name that current PyTorch versions expect, so DDP training no longer emits the NCCL_BLOCKING_WAIT deprecation warning.
  • Impact: Multi-GPU training keeps the same collective-timeout behavior with cleaner logs; training behavior is otherwise unchanged. 🚀


codecov bot commented Mar 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

❗ No coverage uploaded for pull request base (main@aa75606).

❗ Current head 5a5362b differs from pull request most recent head e40e8d4. Consider uploading reports for the commit e40e8d4 to get more accurate results.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9448   +/-   ##
=======================================
  Coverage        ?   38.35%           
=======================================
  Files           ?      120           
  Lines           ?    15168           
  Branches        ?        0           
=======================================
  Hits            ?     5818           
  Misses          ?     9350           
  Partials        ?        0           
Flag Coverage Δ
GPU 38.35% <100.00%> (?)

Flags with carried forward coverage won't be shown.


glenn-jocher (Member, Author) commented

@Laughing-q it works!! Warning is fixed :)

[Screenshot 2024-03-31 at 18 36 00]

glenn-jocher merged commit 7df821e into main on Mar 31, 2024
10 checks passed
glenn-jocher deleted the glenn-jocher-patch-1 branch on March 31, 2024 at 16:36
hmurari pushed a commit to hmurari/ultralytics that referenced this pull request Apr 17, 2024
gkinman pushed a commit to Octasic/ultralytics that referenced this pull request May 30, 2024
iamdgarcia pushed a commit to iamdgarcia/ultralytics_16U that referenced this pull request Nov 8, 2024