
Skip replacing dataloader sampler if it's already a distributed sampler #4273

Merged · 14 commits merged into Lightning-AI:master on Oct 23, 2020

Conversation

@ananthsub (Contributor) commented on Oct 21, 2020

What does this PR do?

This PR skips replacing the dataloader's sampler if the sampler is already a DistributedSampler. Currently this raises a MisconfigurationException.

This causes confusion, since trainer.replace_sampler_ddp defaults to True. In these cases, we can silently skip replacing the sampler instead of failing.
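
Conceptually, the change amounts to a type check before the trainer swaps in its own sampler. A minimal sketch of that idea (not the actual Lightning internals; auto_add_sampler is a stand-in name, only a few DataLoader kwargs are carried over, and the fallback branch assumes torch.distributed is already initialized):

```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler


def auto_add_sampler(dataloader: DataLoader, shuffle: bool) -> DataLoader:
    """Sketch of the new behavior: keep a user-provided DistributedSampler as-is."""
    if isinstance(dataloader.sampler, DistributedSampler):
        # Previously this case raised MisconfigurationException; now the
        # dataloader is returned untouched.
        return dataloader

    # Otherwise build a DistributedSampler (infers num_replicas/rank from the
    # initialized process group) and rebuild the loader around it.
    sampler = DistributedSampler(dataloader.dataset, shuffle=shuffle)
    return DataLoader(
        dataloader.dataset,
        batch_size=dataloader.batch_size,
        sampler=sampler,
        num_workers=dataloader.num_workers,
        collate_fn=dataloader.collate_fn,
        drop_last=dataloader.drop_last,
    )
```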

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team October 21, 2020 05:53
@s-rog (Contributor) commented on Oct 21, 2020

Has this change already been approved by one of our team members?
Currently trainer.replace_sampler_ddp is supposed to be False in this case, and using the default True is considered user error.
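
For context, the pre-existing recommendation looks roughly like this (a sketch; the dataset, sampler arguments, and Trainer flags are placeholders):

```python
import torch
from pytorch_lightning import Trainer
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.randn(256, 32))  # placeholder dataset

# The user attaches their own DistributedSampler and opts out of the automatic
# replacement. Before this PR, keeping the default replace_sampler_ddp=True in
# this situation raised a MisconfigurationException.
sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=True)
train_loader = DataLoader(dataset, batch_size=32, sampler=sampler)

trainer = Trainer(gpus=2, replace_sampler_ddp=False)
```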

@codecov (bot) commented on Oct 21, 2020

Codecov Report

Merging #4273 into master will increase coverage by 0%.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #4273   +/-   ##
======================================
  Coverage      92%     93%           
======================================
  Files         111     111           
  Lines        8075    8021   -54     
======================================
- Hits         7466    7455   -11     
+ Misses        609     566   -43     

@Borda added the "feature (Is an improvement or enhancement)" label on Oct 21, 2020
@mergify mergify bot requested a review from a team October 21, 2020 09:16
@justusschock (Member) commented

@s-rog While that's true, we can still do the type check; the main point of the flag is being able to turn replacement off, which you still can.

@mergify mergify bot requested a review from a team October 21, 2020 09:48
@s-rog (Contributor) left a review comment
Do we already have a test that covers this new behavior? Otherwise LGTM!

@mergify mergify bot requested a review from a team October 21, 2020 09:49
@ananthsub (Contributor, Author) commented
@s-rog I'll add one for a dataloader configured with a DistributedSampler and replace_sampler_ddp=True, to confirm it doesn't throw.
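
A rough sketch of what such a test could look like (the real test added in this PR lives in tests/trainer/test_dataloaders.py; _prepare below is a hypothetical stand-in for the trainer's sampler-replacement step):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def _prepare(dataloader, shuffle):
    # Hypothetical stand-in for the trainer's sampler-replacement step.
    if isinstance(dataloader.sampler, DistributedSampler):
        return dataloader
    return DataLoader(
        dataloader.dataset,
        batch_size=dataloader.batch_size,
        sampler=DistributedSampler(dataloader.dataset, shuffle=shuffle),
    )


def test_user_distributed_sampler_is_kept():
    dataset = TensorDataset(torch.randn(64, 32))
    # Explicit num_replicas/rank avoid needing an initialized process group.
    sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=False)
    dataloader = DataLoader(dataset, batch_size=8, sampler=sampler)

    prepared = _prepare(dataloader, shuffle=True)
    assert prepared.sampler is sampler  # sampler kept, no exception raised
```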

@awaelchli (Contributor) commented

Be careful: with this change, the docs are now slightly misleading. They currently read:

Enables auto adding of distributed sampler. By default it will add shuffle=True for train sampler and shuffle=False for val/test sampler. If you want to customize it, you can set replace_sampler_ddp=False and add your own distributed sampler.

@mergify mergify bot requested a review from a team October 22, 2020 06:15
@carmocca (Contributor) commented
LGTM. Great idea!

@SeanNaren (Contributor) left a review comment

We need to fix the test; it seems like it's failing on multiple GPUs.

@rohitgr7 (Contributor) left a review comment

just trying.

(Two review threads on tests/trainer/test_dataloaders.py, now outdated and resolved.)
@rohitgr7 (Contributor) commented on Oct 23, 2020

I suggest that when adding a sampler we explicitly set shuffle=False here, to handle the case where the dataloader already has shuffle=True; otherwise it will raise an error.
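
The underlying constraint is PyTorch's own DataLoader check: passing both an explicit sampler and shuffle=True raises a ValueError, which is why the rebuilt dataloader should drop the shuffle flag or set it to False. A small illustration:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.randn(16, 4))
# num_replicas/rank given explicitly so no process group is needed for the demo.
sampler = DistributedSampler(dataset, num_replicas=2, rank=0)

try:
    # PyTorch refuses to combine an explicit sampler with shuffle=True,
    # roughly: "sampler option is mutually exclusive with shuffle".
    DataLoader(dataset, sampler=sampler, shuffle=True)
except ValueError as err:
    print(err)

# The rebuilt dataloader should therefore pass shuffle=False (or omit it) and
# let the DistributedSampler handle shuffling instead.
DataLoader(dataset, sampler=sampler, shuffle=False)
```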

@rohitgr7 rohitgr7 requested a review from SeanNaren October 23, 2020 14:09
@SeanNaren (Contributor) commented

@rohitgr7 do you think we should include that in this PR or handle it separately?

@rohitgr7 (Contributor) commented on Oct 23, 2020

@SeanNaren Either way is fine, or let's just do it separately. Also, in _get_distributed_sampler(self, dataloader, train), the train argument should be renamed to shuffle for better readability of the code.

EDIT: Let's change it here. It's not a big change.
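
A sketch of the suggested rename (only the parameter name changes from train to shuffle; self.world_size and self.global_rank are assumed stand-ins for the trainer's real distributed attributes):

```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler


def _get_distributed_sampler(self, dataloader: DataLoader, shuffle: bool) -> DistributedSampler:
    # Behavior unchanged; the boolean controls shuffling, so name it accordingly.
    return DistributedSampler(
        dataloader.dataset,
        num_replicas=self.world_size,
        rank=self.global_rank,
        shuffle=shuffle,
    )
```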

@rohitgr7 (Contributor) left a review comment
please wait

@SeanNaren merged commit f6efb71 into Lightning-AI:master on Oct 23, 2020
Labels: feature (Is an improvement or enhancement)
9 participants