
set find_unused_parameters=False in DDP as in pytorch #5435

Merged

9 commits merged into release/1.2-dev on Jan 13, 2021

Conversation

@awaelchli (Contributor) commented Jan 9, 2021

Lightning currently forces find_unused_parameters=True in DDP, but PyTorch recommends False.
Context: brief discussion in #5185, suggested by @ananthsub
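For reference, this is the setting as it appears in raw PyTorch, where find_unused_parameters already defaults to False. A minimal sketch (the model and process-group setup are illustrative, not Lightning internals):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

# Assumes this script is launched with a distributed launcher (e.g. torchrun),
# which provides the rank/world-size environment variables.
if not dist.is_initialized():
    dist.init_process_group(backend="nccl")

model = torch.nn.Linear(32, 4).cuda()

# find_unused_parameters=False is PyTorch's default and recommendation.
# The True setting Lightning currently forces adds an extra traversal of the
# autograd graph on every backward pass to detect parameters without gradients.
ddp_model = DistributedDataParallel(
    model,
    device_ids=[torch.cuda.current_device()],
    find_unused_parameters=False,
)
```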

@awaelchli awaelchli added the feature Is an improvement or enhancement label Jan 9, 2021

@awaelchli awaelchli changed the base branch from master to release/1.2-dev January 9, 2021 06:14
@awaelchli awaelchli added this to the 1.2 milestone Jan 9, 2021
@ananthsub (Contributor) left a comment

thanks for splitting this out @awaelchli !

CHANGELOG.md (review thread: outdated, resolved)
@SeanNaren (Contributor)

With this change we might see some models break that had unused parameters which were being pruned out automatically for them. I think @tchaton ran into a test breaking due to this as well!

@blefaudeux spotted that iGPT was only working because find_unused_parameters was set to True by default, I think: https://github.com/teddykoker/image-gpt

I think this is fine, as long as we make it clear that this change is being made and provide an easy way to turn it back on. Currently the user would need to instantiate the DDP plugin to enable this flag; could we potentially add this to the Trainer?
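For context, the opt-in referred to above looks roughly like this. A minimal sketch; the import path and Trainer arguments are the ones used around the 1.2 release and may differ in other versions:

```python
import pytorch_lightning as pl
from pytorch_lightning.plugins import DDPPlugin

# Explicitly restore the old behaviour once the default flips to False:
# pass your own DDP plugin with find_unused_parameters turned back on.
trainer = pl.Trainer(
    gpus=2,
    accelerator="ddp",
    plugins=[DDPPlugin(find_unused_parameters=True)],
)
```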

@@ -107,6 +108,7 @@ def test_sync_batchnorm_ddp(tmpdir):
sync_batchnorm=True,
num_sanity_val_steps=0,
replace_sampler_ddp=False,
plugins=[DDPPlugin(find_unused_parameters=True)]
@awaelchli (Contributor, Author)

Had to set this to True here, as mentioned by @SeanNaren in the comment above.
It turns out this modification to the test is already on the master branch.

(Contributor)

Yeah, for a test this is fine, but the user can't enable this unless they pass their own plugin, which is why I don't like this.

@awaelchli (Contributor, Author)

Yes, it's also not agnostic to command-line input / device choice without changing code directly. A new Trainer flag is probably best for the user.

@codecov (bot) commented Jan 9, 2021

Codecov Report

Merging #5435 (465a15f) into release/1.2-dev (61f415f) will decrease coverage by 0%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           release/1.2-dev   #5435    +/-   ##
================================================
- Coverage               93%     93%    -0%     
================================================
  Files                  152     151     -1     
  Lines                10737   10620   -117     
================================================
- Hits                  9950    9835   -115     
+ Misses                 787     785     -2     

@awaelchli (Contributor, Author)

@SeanNaren sounds like the reason we had it set to True before was simply so that users don't run into error messages, despite the performance hit.

I wonder how many users would have breaking code because of this. Perhaps we should print deprecation warnings for this change; the question is just how to show them only to the users who need it and not to everybody.

@SeanNaren (Contributor)

> @SeanNaren sounds like the reason we had it set to True before was simply so that users don't run into error messages, despite the performance hit.
>
> I wonder how many users would have breaking code because of this. Perhaps we should print deprecation warnings for this change; the question is just how to show them only to the users who need it and not to everybody.

This is going to be tricky. I think if it breaks, it's better to hard crash so the user knows what's happening, and then expose the find_unused_parameters parameter on the Trainer for users to enable. The overhead from using it is, from what I know, quite high, so we should turn it off by default for users who do not need to worry about this.
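To make the failure mode concrete: with find_unused_parameters=False, DDP raises a RuntimeError when some parameters never receive gradients during backward. A hypothetical minimal LightningModule of the kind that would now hard-crash (class and attribute names are made up for illustration):

```python
import torch
import pytorch_lightning as pl


class PartiallyUsedModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.used = torch.nn.Linear(32, 4)
        # Defined but never called in forward(): its parameters get no gradients,
        # so DDP with find_unused_parameters=False errors during backward,
        # while find_unused_parameters=True silently tolerates it.
        self.never_used = torch.nn.Linear(32, 4)

    def forward(self, x):
        return self.used(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
```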

@awaelchli (Contributor, Author) commented Jan 9, 2021

Sounds good. Then do we want to do it directly in this PR (the Trainer flag)?

@awaelchli awaelchli added the design Includes a design discussion label Jan 9, 2021
@carmocca (Contributor) commented Jan 9, 2021

> I wonder how many users would have breaking code because of this

I don't think it is a large percentage. And in any case, wouldn't most of them want to fix their unused parameters?

I don't like the idea of introducing another Trainer parameter for this. It is just noise for everybody else. I would focus on making clear in the docs what is happening if the program fails due to the new behaviour and how to fix it. After all, it's not much harder to do

Trainer(plugins=[DDPPlugin(find_unused_parameters=True)]) than Trainer(find_ddp_unused_parameters=True)

@SeanNaren (Contributor)

> I don't think it is a large percentage. And in any case, wouldn't most of them want to fix their unused parameters?
>
> I don't like the idea of introducing another Trainer parameter for this. It is just noise for everybody else. I would focus on making clear in the docs what is happening if the program fails due to the new behaviour and how to fix it. After all, it's not much harder to do
>
> Trainer(plugins=[DDPPlugin(find_unused_parameters=True)]) than Trainer(find_ddp_unused_parameters=True)

This requires making the change within the code though, and that's what I'm concerned about. If we feel confident that this won't break enough user code to warrant adding it to the Trainer args for convenience, then I'm happy :) I'm more interested in getting this fix in ASAP.

Regardless, let's do it in a separate PR; we can chat offline!

@awaelchli (Contributor, Author)

Alright, will open this for review then.

@awaelchli awaelchli marked this pull request as ready for review January 10, 2021 00:30
@tchaton tchaton enabled auto-merge (squash) January 11, 2021 11:49
@SeanNaren SeanNaren added the ready PRs ready to be merged label Jan 11, 2021
@Borda (Member) commented Jan 13, 2021

@awaelchli mind resolving the conflicts?

@tchaton tchaton merged commit 6130813 into release/1.2-dev Jan 13, 2021
@tchaton tchaton deleted the feature/find_unused branch January 13, 2021 15:13
Labels: design (Includes a design discussion), feature (Is an improvement or enhancement), has conflicts, ready (PRs ready to be merged)
7 participants