@chrisbraddock
Summary

  • simplify DDP initialization in llm_training.py
  • default master address/port and use torchrun-provided env vars
  • update README distributed training section
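The torchrun-driven setup described above can be sketched roughly as follows. This is a minimal illustration, not the actual `llm_training.py` code: the `ddp_env_config` helper and the `29500` default port are assumptions made for the example; torchrun injects `RANK`, `LOCAL_RANK`, `WORLD_SIZE`, `MASTER_ADDR`, and `MASTER_PORT` into the environment, and the defaults let a plain single-process `python llm_training.py` run still work.

```python
import os


def ddp_env_config(env: dict) -> dict:
    """Resolve DDP settings from torchrun-style env vars with single-node defaults.

    Hypothetical helper for illustration; the real script would pass the
    resolved values to torch.distributed.init_process_group(backend="nccl"
    or "gloo") and torch.cuda.set_device(local_rank).
    """
    return {
        # Defaults allow running without torchrun (single process, localhost).
        "master_addr": env.get("MASTER_ADDR", "127.0.0.1"),
        "master_port": int(env.get("MASTER_PORT", "29500")),  # assumed default
        "rank": int(env.get("RANK", "0")),
        "world_size": int(env.get("WORLD_SIZE", "1")),
        "local_rank": int(env.get("LOCAL_RANK", "0")),
    }


if __name__ == "__main__":
    # With torchrun, os.environ carries the rendezvous info; without it,
    # the defaults above produce a valid single-process configuration.
    print(ddp_env_config(dict(os.environ)))
```

Under `torchrun --nproc_per_node=4 llm_training.py`, each worker would see its own `RANK`/`LOCAL_RANK`, so no manual address/port plumbing is needed.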

Testing

  • python -m py_compile llm_training.py run_experiments.py
