Use callable object for patching dataloaders #971

Merged

shoarora
Contributor

@shoarora shoarora commented Feb 28, 2020

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typos or doc improvements)
  • Did you read the contributor guideline?
  • Did you make sure to update the docs?
    • not sure if there's anywhere I need to
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Fixes #968.

Changes how we patch dataloaders passed to trainer.fit() so that the patched result is picklable.
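A minimal sketch of why a callable object helps, assuming the earlier patch assigned a lambda or closure to the model (this thread does not show the old code). `PatchDataLoader`, `Model`, and the list standing in for a DataLoader are hypothetical names for illustration, not the code in this PR:

```python
import pickle


class PatchDataLoader:
    """Hypothetical callable wrapper: returns the stored dataloader when called."""

    def __init__(self, dataloader):
        self.dataloader = dataloader

    def __call__(self):
        return self.dataloader


class Model:
    """Stand-in for a LightningModule; not the real class."""


loader = [1, 2, 3]  # stand-in for a torch DataLoader

# Patching with a lambda: pickling the model fails, because a lambda
# has no importable name that pickle could use to find it again.
lambda_model = Model()
lambda_model.train_dataloader = lambda: loader
try:
    pickle.dumps(lambda_model)
    lambda_ok = True
except (pickle.PicklingError, AttributeError):
    lambda_ok = False

# Patching with a callable object: the instance is plain data plus a
# reference to a module-level class, so it survives a pickle round trip.
callable_model = Model()
callable_model.train_dataloader = PatchDataLoader(loader)
restored = pickle.loads(pickle.dumps(callable_model))
```

The model is still called the same way in both cases (`model.train_dataloader()`); only the second form can be sent to a spawned process.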

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@shoarora shoarora changed the title from "[WIP] Use callable object for patching dataloaders" to "Use callable object for patching dataloaders" Feb 28, 2020
@shoarora
Contributor Author

@williamFalcon @Borda

Do we need tests for this?
Not sure if this needs a CHANGELOG entry either. Would appreciate a review!

@shoarora shoarora changed the title from "Use callable object for patching dataloaders" to "[WIP] Use callable object for patching dataloaders" Feb 28, 2020
@shoarora
Contributor Author

Getting a very unhelpful error, will keep investigating

Exception                                 Traceback (most recent call last)
<ipython-input-15-bef80bcdc7b4> in <module>()
     16 # trainer = Trainer(num_tpu_cores=8)
     17 
---> 18 trainer.fit(model, loader)

2 frames
/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/spawn.py in join(self, timeout)
    110                 raise Exception(
    111                     "process %d terminated with exit code %d" %
--> 112                     (error_index, exitcode)
    113                 )
    114 

Exception: process 0 terminated with exit code 1

@shoarora
Contributor Author

Update: now if you pass your dataloaders to trainer.fit():

  • non-slurm ddp works in a VM/linux context
  • non-slurm ddp does not work in colab
    • the colab stack trace (my previous message) is unhelpful, and I'm having trouble figuring out how to get to the bottom of it
    • on the other hand, ddp in a colab context is an impractical scenario
  • colab TPU worked regardless of this PR
  • non-colab TPU works

Not sure what a change of this level warrants in terms of testing and documentation for pl. Please let me know!

I added a gpu-ddp test for passing all the loaders to fit() and confirmed that this branch allows the test to pass

@shoarora shoarora changed the title from "[WIP] Use callable object for patching dataloaders" to "Use callable object for patching dataloaders" Feb 29, 2020
@williamFalcon
Contributor

this is awesome! will pr review in a few hours!
thank u

@williamFalcon
Contributor

i think ddp in colab needs the spawn flag or something

@Borda Borda added the "feature" label (Is an improvement or enhancement) Feb 29, 2020
Member

@Borda Borda left a comment

LGTM, just curious if simple functools would help too which does very similar wrapping

from functools import partial

dataloader = partial(dataloader)

@shoarora
Contributor Author

> LGTM, just curious if simple functools would help too which does very similar wrapping
>
> from functools import partial
>
> dataloader = partial(dataloader)

partial is specifically for wrapping functions/callables and freezing sets of arguments, so you couldn't call it on a dataloader:

>>> from functools import partial
>>> x = partial(loader)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: the first argument must be callable
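
For comparison, a hedged sketch of how `partial` could be made to fit: it needs a callable as its first argument, so the dataloader would have to be passed as a frozen argument to a module-level function. `return_value` is a hypothetical helper and the list is a stand-in for a DataLoader; this is not what the PR does.

```python
import pickle
from functools import partial


def return_value(value):
    """Hypothetical module-level helper: hand back whatever it was given."""
    return value


loader = [1, 2, 3]  # stand-in for a torch DataLoader

# partial() rejects a non-callable first argument, as in the traceback above.
try:
    partial(loader)
    raised = False
except TypeError:
    raised = True

# Freezing the loader as an argument of a named function works, and the
# resulting partial object pickles because both parts are picklable.
get_loader = partial(return_value, loader)
restored = pickle.loads(pickle.dumps(get_loader))
```

This behaves much like a small callable class, at the cost of needing the extra helper function.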

@Borda Borda added the "ready" label (PRs ready to be merged) Mar 1, 2020
@williamFalcon williamFalcon merged commit a1fb3a4 into Lightning-AI:master Mar 2, 2020
Borda added a commit that referenced this pull request Mar 2, 2020
* Use callable object for patching dataloaders

* Add test for ddp with dataloaders passed to fit()

* Update pytorch_lightning/trainer/trainer.py

Co-Authored-By: Jirka Borovec <[email protected]>

* Update pytorch_lightning/trainer/trainer.py

Co-Authored-By: Jirka Borovec <[email protected]>

Co-authored-by: Jirka Borovec <[email protected]>
williamFalcon added a commit that referenced this pull request Mar 3, 2020
* Update README.md

* Update README.md

* Use callable object for patching dataloaders (#971)

* Use callable object for patching dataloaders

* Add test for ddp with dataloaders passed to fit()

* Update pytorch_lightning/trainer/trainer.py

Co-Authored-By: Jirka Borovec <[email protected]>

* Update pytorch_lightning/trainer/trainer.py

Co-Authored-By: Jirka Borovec <[email protected]>

Co-authored-by: Jirka Borovec <[email protected]>

* merge load functions

* update tests

* fix documentation warnings

* fix line too long

* fix line too long

* print deprecation warning

Co-Authored-By: Jirka Borovec <[email protected]>

* move tags_csv argument to end of signature

* fix typo, update version numbers

* fix line too long

* add typing as requested

* update changelog

Co-authored-by: William Falcon <[email protected]>
Co-authored-by: Sho Arora <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
@Borda Borda added this to the 0.7.0 milestone Mar 7, 2020
tullie pushed a commit to tullie/pytorch-lightning that referenced this pull request Apr 3, 2020
tullie pushed a commit to tullie/pytorch-lightning that referenced this pull request Apr 3, 2020
Labels
feature (Is an improvement or enhancement), ready (PRs ready to be merged)
Development

Successfully merging this pull request may close these issues.

Passing dataloader to trainer.fit() doesn't work with tpu (and maybe ddp)
3 participants