
Conversation

@jeffra (Collaborator) commented Jun 25, 2020

Allow users to initialize torch distributed outside of deepspeed.initialize. This is specifically needed for MPI discovery (e.g., AML) and other models. It also removes the need for the --deepspeed_mpi flag: we will automatically attempt to discover MPI world info if we are not launched with a torch.distributed or deepspeed launcher.

TODO: update documentation, remove the --deepspeed_mpi flag, and add unit tests.
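
For reference, a minimal sketch of the usage pattern this change enables: the user initializes torch.distributed themselves (for example from environment variables exported by an MPI/AML job) before calling deepspeed.initialize. The env:// variables, the toy model, and the argument parsing shown here are illustrative assumptions, not the PR's code.

import argparse
import os

import torch
import torch.distributed as dist
import deepspeed

parser = argparse.ArgumentParser()
parser = deepspeed.add_config_arguments(parser)  # adds --deepspeed and --deepspeed_config
args = parser.parse_args()  # pass --deepspeed_config <path> on the command line

# Assumes RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT, and LOCAL_RANK have been
# exported by the scheduler (or derived from MPI) before this script runs.
dist.init_process_group(backend="nccl", init_method="env://")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = torch.nn.Linear(10, 10)  # toy model standing in for the real network
engine, optimizer, _, _ = deepspeed.initialize(
    args=args,  # args.deepspeed_config points at the DeepSpeed JSON config
    model=model,
    model_parameters=model.parameters(),
)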

@jeffra marked this pull request as ready for review June 25, 2020 22:02
self.device = torch.device("cuda", self.local_rank)
self.world_size = dist.get_world_size()
self.global_rank = dist.get_rank()
logger.info("Set device to local rank {} within node.".format(
Contributor commented on the diff excerpt above:
Can we move this message to where we set self.local_rank in line 125?
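
For illustration, the suggestion would look roughly like the following sketch; how local_rank is obtained here is an assumption, not the PR's actual code.

# Hypothetical sketch of the reviewer's suggestion: emit the log message right
# where self.local_rank is assigned instead of where the CUDA device is set.
self.local_rank = local_rank  # assumed: local_rank discovered from the env/launcher
logger.info("Set device to local rank {} within node.".format(self.local_rank))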

@jeffra (Collaborator, Author) commented Dec 17, 2020

Closing and moving to #608

@jeffra closed this Dec 17, 2020
@jeffra deleted the jeffra/ds_dist_init branch September 24, 2021 04:41