
Support returning python scalars in DP #1935

Merged · 16 commits merged into Lightning-AI:master on Aug 7, 2020

Conversation

@nsarang (Contributor) commented May 24, 2020

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

What does this PR do?

Fixes #1861 and #1904.
I'm not sure if this fix is the right approach. The docs state that training_step must return a Tensor (even for values in the progress_bar dict), but with this PR it will also work with Python scalars. Were there other reasons (besides the issues above) to force users to stick with tensors?
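
For illustration, a minimal sketch of what this enables under DP (the LitModel class and its internals are hypothetical; the dict return follows the format currently described in the docs):

```python
import torch
from torch.nn import functional as F
import pytorch_lightning as pl

class LitModel(pl.LightningModule):  # hypothetical example model
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.mse_loss(self.layer(x), y)
        # Previously, every value here had to be a Tensor when running
        # under DP; with this PR, plain Python scalars also work.
        return {"loss": loss, "progress_bar": {"batch_nb": float(batch_idx)}}
```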

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team May 24, 2020 20:13
@Borda Borda added the feature Is an improvement or enhancement label May 24, 2020
@Borda Borda added this to the 0.7.7 milestone May 24, 2020
codecov bot commented May 24, 2020

Codecov Report

Merging #1935 into master will increase coverage by 1%.
The diff coverage is 90%.

@@          Coverage Diff           @@
##           master   #1935   +/-   ##
======================================
+ Coverage      89%     90%   +1%     
======================================
  Files          79      79           
  Lines        7302    7214   -88     
======================================
- Hits         6514    6496   -18     
+ Misses        788     718   -70     

@nsarang nsarang changed the title [WIP] Support returning scalars in DP Support returning scalars in DP May 26, 2020
@Borda Borda modified the milestones: 0.7.7, 0.8.0 May 26, 2020
@williamFalcon (Contributor):
@nsarang good addition. Can you write a test for this?

@awaelchli (Contributor):
Could you then also update the docs, since you mention they contain incorrect information on the return type?

@Borda Borda added the waiting on author Waiting on user action, correction, or update label May 27, 2020
@Borda (Member) left a comment:

As mentioned by Will and Adrian, please:

  • add tests
  • update docs

@mergify mergify bot requested a review from a team May 27, 2020 08:23
@Borda (Member) commented Jun 8, 2020

@nsarang how is it going here? 🐰

@nsarang (Contributor, Author) commented Jun 9, 2020

@Borda Sorry I've been busy lately. I'll finish it today 🙂

@pep8speaks commented Jun 9, 2020

Hello @nsarang! Thanks for updating this PR.

Line 190:120: E501 line too long (121 > 119 characters)
Line 362:120: E501 line too long (121 > 119 characters)
Line 804:120: E501 line too long (122 > 119 characters)

Comment last updated at 2020-08-06 12:23:29 UTC

@nsarang nsarang changed the title Support returning scalars in DP [WIP] Support returning scalars in DP Jun 9, 2020
@Borda Borda modified the milestones: 0.8.0, 0.9.0 Jun 9, 2020
@nsarang nsarang changed the title [WIP] Support returning scalars in DP Support returning scalars in DP Jun 11, 2020
@nsarang (Contributor, Author) commented Jun 11, 2020

@Borda
I tested this on 2 GPUs using DP and it was OK. But it gave the following warning:
/nsarang/env-python3.6/lib/python3.6/site-packages/torch/nn/parallel/_functions.py:61: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.

This warning is supposed to be silenced in ignored_warnings.py, but it seems the text doesn't match 🤔
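
For context on why the filter might not fire: warnings.filterwarnings matches its message argument as a regex against the start of the emitted warning text, so if the pattern registered in ignored_warnings.py (not quoted here) has drifted from the message above, the filter won't apply. A minimal sketch:

```python
import warnings

# `message` is a regex matched against the *start* of the warning text;
# if the pattern registered in ignored_warnings.py diverges from the
# emitted message, the filter silently fails to apply.
warnings.filterwarnings(
    "ignore",
    message=r"Was asked to gather along dimension 0",
    category=UserWarning,
)
```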

I could add this line to the gather method to avoid this warning.
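
Roughly, something along these lines (only a sketch of the idea, not the actual diff; the helper name is hypothetical), applied to each output before torch's gather:

```python
import numbers
import torch

def _to_gatherable(value, device):
    # Hypothetical helper: turn Python scalars into 0-dim tensors, then
    # unsqueeze 0-dim tensors into 1-element vectors so torch's gather
    # can concatenate per-GPU outputs without raising the warning above.
    if isinstance(value, numbers.Number):
        value = torch.tensor(value, device=device)
    if isinstance(value, torch.Tensor) and value.dim() == 0:
        value = value.unsqueeze(0)
    return value
```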

mergify bot commented Jun 16, 2020

This pull request is now in conflict... :(

@Borda Borda removed the waiting on author Waiting on user action, correction, or update label Jun 21, 2020
@Borda Borda added the ready PRs ready to be merged label Jul 9, 2020
mergify bot commented Jul 20, 2020

This pull request is now in conflict... :(

@Borda Borda requested review from awaelchli and Borda July 28, 2020 08:19
@Borda (Member) commented Jul 31, 2020

@nsarang mind checking the tests, please?

@williamFalcon (Contributor):
mind checking the failing test? @nsarang

@nsarang (Contributor, Author) commented Aug 2, 2020

@Borda @williamFalcon
All tests seem to pass now except tests/core/test_datamodules.py::test_full_loop_ddp_spawn, where the final assertion fails.
I'm not sure how LightningDataParallel is related to ddp_spawn.
Is it because of the way I added alternating batches in TrainingStepVariations and ValidationEpochEndVariations?

@Borda (Member) commented Aug 3, 2020

@nateraw mind having a look? 🐰

@mergify mergify bot requested a review from a team August 6, 2020 12:23
@Borda Borda merged commit 793036d into Lightning-AI:master Aug 7, 2020
Successfully merging this pull request may close: distributed training crashes with dp (list comprehension issue from torch?)