Move init_ddp_connection to DDP Plugin #4407
Conversation
Do we need the
@justusschock
```python
def init_ddp_connection(
    self,
    trainer,
    cluster_environment,
    global_rank: int,
    world_size: int,
    is_slurm_managing_tasks: bool = True,
):
```
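For context, the body of a hook with this signature roughly amounts to a `torch.distributed` rendezvous. The sketch below is an illustration, not the PR's actual code; the `master_address()`/`master_port()` accessors on `cluster_environment` are assumptions about the cluster-environment interface:

```python
import os
import torch
import torch.distributed as torch_distrib


def init_ddp_connection(
    self,
    trainer,
    cluster_environment,
    global_rank: int,
    world_size: int,
    is_slurm_managing_tasks: bool = True,
) -> None:
    # Sketch only: export the rendezvous info taken from the cluster
    # environment, then initialise the default process group if it is
    # not already up.
    os.environ["MASTER_ADDR"] = str(cluster_environment.master_address())
    os.environ["MASTER_PORT"] = str(cluster_environment.master_port())
    os.environ["WORLD_SIZE"] = str(world_size)
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    if not torch_distrib.is_initialized():
        torch_distrib.init_process_group(
            backend, rank=global_rank, world_size=world_size
        )
```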
I don't know if this is the right interface. Is this leaking too much? Should the plugin have a reference to the trainer?
IMO we already have too many things with trainer references. Besides that, I think the interface is fine.
As mentioned earlier, we're cleaning up trainer references for the next stage of refactoring, but I think that's for a different PR.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help see our docs: https://pytorch-lightning.readthedocs.io/en/latest/CONTRIBUTING.html#pull-request or ask the assistance of a core contributor here or on Slack. Thank you for your contributions.
This pull request is going to be closed. Please feel free to reopen it or create a new one from the current master.
Opening this because I'm running into a case where I need to define an additional torch RPC connection. This would allow the additional connection to exist within the plugin.
Codecov Report
```
@@           Coverage Diff           @@
##           master   #4407   +/-   ##
======================================
- Coverage      93%     93%     -0%
======================================
  Files         117     117
  Lines        8949    8954     +5
======================================
+ Hits         8319    8321     +2
- Misses        630     633     +3
```
* Move init_ddp_connection to DDP Plugin
* cluster-env
* trainer?
* imports
* Update ddp_plugin.py

Co-authored-by: Sean Naren <[email protected]>
What does this PR do?
This moves `init_ddp_connection` to the DDP Plugin, making it easily overridable in the same place as `configure_ddp` for the model. This means custom implementations can be shared easily across accelerator backends.

Some questions:
- `configure_ddp`

cc @williamFalcon @awaelchli
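With this change, project-specific connection setup (e.g. the extra RPC channel mentioned above) could live in a plugin subclass. A minimal sketch of the idea; the `CustomDDPPlugin` name and the plain base class are illustrative (in Lightning this would derive from the actual DDP Plugin class):

```python
import torch.distributed as torch_distrib


class CustomDDPPlugin:
    """Hypothetical plugin subclass; stands in for a DDPPlugin subclass."""

    def init_ddp_connection(
        self,
        trainer,
        cluster_environment,
        global_rank: int,
        world_size: int,
        is_slurm_managing_tasks: bool = True,
    ) -> None:
        # Override point: choose a different backend, or set up additional
        # connections (e.g. torch RPC) alongside the default process group.
        if not torch_distrib.is_initialized():
            torch_distrib.init_process_group(
                "gloo", rank=global_rank, world_size=world_size
            )
```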
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃