Fix DP Logging Aggregation #4138
Conversation
Hello @justusschock! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-12-04 17:30:46 UTC
Codecov Report

@@          Coverage Diff          @@
##          master   #4138   +/-  ##
======================================
  Coverage     93%     93%
======================================
  Files        124     124
  Lines       9294    9299      +5
======================================
+ Hits        8608    8615      +7
+ Misses       686     684      -2
@Borda any clue on the failing Mac tests?
@justusschock It comes from the master branch.
It is related to the latest packages...
Yes, we can check if there was a package upgrade in the last days...
@justusschock fix failing tests?
@edenlightning they should not be related to this PR. But I will check again.
Can we get a test for this in as well? |
@justusschock is this fix finished, do you need help? |
It is not yet finished. I was planning to get this done today or tomorrow (my old fix isn't valid anymore, we talked about this offline). If you have some time, you can for sure give it a shot :)
Looks great! Great catch!
Any chance you could add a test?
Can I just say how frustrated I am about how long it took me to realize that BoringModel uses batch size 1, so when running in DP mode it does not actually do anything on gpu 1. For a long time I was just not able to make the test fail on master 😩 Now the test reproduces the reported bug.
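The pitfall described above can be sketched without any GPU: DataParallel scatters the batch along dimension 0, one contiguous chunk per device, so a batch of size 1 leaves every device after the first with nothing to do. The function below is an illustrative, pure-Python mimic of that scatter step (the name `scatter_batch` is hypothetical, not Lightning or PyTorch internals):

```python
def scatter_batch(batch, num_devices):
    """Mimic how DataParallel splits a batch along dim 0 across devices.

    Each device receives one contiguous chunk; when the batch has fewer
    samples than devices, the trailing devices receive no chunk at all,
    which is why a batch size of 1 never exercises gpu 1.
    """
    # Ceiling division: size of the chunk each device would receive.
    chunk = max(1, (len(batch) + num_devices - 1) // num_devices)
    return [batch[i:i + chunk] for i in range(0, len(batch), chunk)]


# With batch size 1 and 2 devices, only one chunk is produced:
print(scatter_batch(["sample"], 2))        # [['sample']] -> gpu 1 idles
print(scatter_batch([1, 2, 3, 4], 2))      # [[1, 2], [3, 4]] -> both busy
```

A test that is expected to fail in DP therefore needs a batch size of at least the number of devices, otherwise the second replica (and any per-device divergence) never materializes.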
@adityak2920 is this the issue you were facing?? Does it solve your use-case? |
Yes, I was facing this issue, and it will solve my use case.
@awaelchli @SeanNaren mind review? |
It also requires @tchaton's review because he had requested changes.
- move result
- stupid result object
- revert to master
- undo import
- add "to" method to result
- generalize to
- try a test
- try a test
- Revert "try a test" (this reverts commit 22e3c10)
- Revert "try a test" (this reverts commit 4d2d8fb)
- new test
- max epochs
- super epoch end
- log in test
- hanging test
- undo test
- initial test that fails on master
- step end
- pass step end
- step end
- epoch end
- print step
- check dev
- clean up test
- sanity check
- wtf is going on
- frustration
- debugging
- test (×9)
- unused import
bfe5395 to ec1f526
pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py
LGTM!
What does this PR do?
Fixes a bug in LoggerConnector where metric values that should be reduced are not all on the same device when running in DP mode.
Fixes #4073
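The core idea of the fix can be sketched in a few lines: in DP, per-replica metric tensors may live on different GPUs, and PyTorch refuses to combine tensors across devices, so the values must be moved onto one common device before they are reduced. The function below is a hedged, CPU-runnable sketch of that pattern; the name `reduce_metrics` and its signature are illustrative, not Lightning's actual API:

```python
import torch


def reduce_metrics(values, device=torch.device("cpu")):
    """Move metric values onto one device, then reduce them.

    Illustrative sketch: in DP each replica may log its metric on its own
    GPU, so stacking/averaging them directly raises a device-mismatch
    error. Moving everything to a single device first (as this PR does
    inside LoggerConnector via a `to`-style method on the result object)
    makes the reduction valid.
    """
    moved = [
        v.to(device) if torch.is_tensor(v)
        else torch.tensor(float(v), device=device)  # accept plain numbers too
        for v in values
    ]
    return torch.stack(moved).float().mean()


# Example: two per-replica losses averaged into one logged value.
print(reduce_metrics([torch.tensor(1.0), torch.tensor(3.0)]))  # tensor(2.)
```

On a multi-GPU machine the inputs would carry devices like `cuda:0` and `cuda:1`; the `.to(device)` call is what restores a common device before `torch.stack`.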
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃