set find_unused_parameters=False in DDP as in pytorch #5435
Conversation
Thanks for splitting this out @awaelchli!
With this change we might see some models break that have unused parameters, which were previously being pruned out of the gradient sync automatically for them. I think @tchaton ran into a test breaking due to this as well! @blefaudeux spotted that iGPT (https://github.com/teddykoker/image-gpt) was only working because find_unused_parameters was set to True by default, I think. I think this is fine, as long as we make it clear that this change is being made and provide an easy way to turn it back on. Currently the user would need to instantiate the DDP plugin to enable this flag; could we potentially add this to the Trainer instead?
@@ -107,6 +108,7 @@ def test_sync_batchnorm_ddp(tmpdir):
         sync_batchnorm=True,
         num_sanity_val_steps=0,
         replace_sampler_ddp=False,
+        plugins=[DDPPlugin(find_unused_parameters=True)]
Had to set this to True here, as mentioned by @SeanNaren in the comment above.
Turns out this modification to the test is already in the master branch.
Yeah, for a test this is fine, but as a user you can't enable this unless you pass your own plugin, which is why I don't like it.
Yes, it's also not agnostic to command-line input / device selection without setting it directly. A new Trainer flag is probably best for the user.
Codecov Report

@@            Coverage Diff             @@
##        release/1.2-dev    #5435   +/-   ##
================================================
- Coverage            93%      93%     -0%
================================================
  Files               152      151      -1
  Lines             10737    10620    -117
================================================
- Hits               9950     9835    -115
+ Misses              787      785      -2
@SeanNaren sounds like the reason we had it set to True before was simply so that users don't run into error messages, despite it having a performance hit. I wonder how many users would have breaking code because of this. Perhaps we should print deprecation warnings for this change; the question is just how to show them only to the users who would need them and not to everybody.
This is going to be tricky. I think if it breaks, it's better to hard crash so the user knows what's happening, and then expose the parameter.
Sounds good. Then do we want to do it directly in this PR here (the Trainer flag)?
I don't think it is a large percentage. And in any case, wouldn't most of them want to fix their unused parameters? I don't like the idea of introducing another Trainer parameter for this; it is just noise for everybody else. I would focus on making it clear in the docs what is happening if the program fails due to the new behaviour and how to fix it. After all, it's not much harder to do:
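For example, a rough sketch (the DDPPlugin import path is assumed for the 1.2-dev branch, and the other Trainer args are just placeholders):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins import DDPPlugin  # import path assumed for 1.2-dev

# Opt back into the old behaviour by passing the plugin explicitly.
trainer = Trainer(
    gpus=2,
    accelerator="ddp",
    plugins=[DDPPlugin(find_unused_parameters=True)],
)
```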
This requires making the change within the code though; that's what I'm concerned about. If we feel confident that this won't break enough user code to justify adding it to the Trainer args for convenience, then I'm happy :) I'm more interested in getting this fix in ASAP. Regardless, let's do it in a separate PR; we can chat offline!
Alright, will open this for review then.
@awaelchli mind resolving conflicts...
Lightning currently forces find_unused_parameters=True in DDP, but PyTorch recommends False.

Context: brief discussion here: #5185, suggested by @ananthsub
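For reference, this is the same flag on torch.nn.parallel.DistributedDataParallel that Lightning passes through; a minimal sketch of the raw PyTorch side (process-group setup and device placement are assumed to have happened already):

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes torch.distributed.init_process_group(...) has already been called
# and this process owns one GPU.
model = nn.Linear(32, 2).cuda()

# PyTorch's default is find_unused_parameters=False. Setting it to True makes
# DDP traverse the autograd graph every iteration to mark parameters that did
# not receive gradients, which adds overhead to every backward pass.
ddp_model = DDP(
    model,
    device_ids=[torch.cuda.current_device()],
    find_unused_parameters=False,
)
```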