Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

skip best_model_path if checkpoint_callback is None #2962

Merged
merged 2 commits into from
Oct 1, 2020

Conversation

lezwon
Copy link
Contributor

@lezwon lezwon commented Aug 13, 2020

What does this PR do?

Fixes training error when checkpoint_callback=False is set during training on TPU.

Fixes #2961

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@pep8speaks
Copy link

pep8speaks commented Aug 13, 2020

Hello @lezwon! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-10-01 00:51:34 UTC

@mergify mergify bot requested a review from a team August 13, 2020 19:27
@Borda Borda added bug Something isn't working accelerator: tpu Tensor Processing Unit labels Aug 13, 2020
@mergify
Copy link
Contributor

mergify bot commented Aug 13, 2020

This pull request is now in conflict... :(

@lezwon lezwon force-pushed the no_checkpoint_tpu_fix branch from 662d168 to 056bf5e Compare August 14, 2020 04:43
@lezwon lezwon marked this pull request as draft August 14, 2020 05:35
@Borda Borda added this to the 0.9.x milestone Aug 20, 2020
@Borda
Copy link
Member

Borda commented Sep 15, 2020

@lezwon how is it going? seems quite ready to go... 🐰

@lezwon
Copy link
Contributor Author

lezwon commented Sep 16, 2020

@Borda The checkpoint_callback issue is fixed. But I couldn't get the test to work due to another issue which saving TPU weights. I am working on that one here #3044 . Will sort this and the other PR by the weekend.

@lezwon lezwon force-pushed the no_checkpoint_tpu_fix branch from 6b27dc2 to 7173a4c Compare September 16, 2020 01:23
@Borda Borda changed the title skip best_model_path if checkpoint_callback is None [blocked by #3044] skip best_model_path if checkpoint_callback is None Sep 16, 2020
Comment on lines +70 to +71
if self.trainer.checkpoint_callback is not None:
self.trainer.checkpoint_callback.best_model_path = best_path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about we just merge this one line fix into master? This would close the issue before release and you can still figure out the hard testing part for TPU in the other PR that's blocking this.
cc @Borda

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good to me...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's sounds good 👍

@lezwon lezwon force-pushed the no_checkpoint_tpu_fix branch from 7173a4c to 51f495b Compare October 1, 2020 00:51
@lezwon lezwon marked this pull request as ready for review October 1, 2020 01:16
@lezwon lezwon changed the title [blocked by #3044] skip best_model_path if checkpoint_callback is None skip best_model_path if checkpoint_callback is None Oct 1, 2020
@mergify mergify bot requested a review from a team October 1, 2020 01:16
@codecov
Copy link

codecov bot commented Oct 1, 2020

Codecov Report

Merging #2962 into master will increase coverage by 5%.
The diff coverage is 0%.

@@           Coverage Diff           @@
##           master   #2962    +/-   ##
=======================================
+ Coverage      85%     90%    +5%     
=======================================
  Files         108     108            
  Lines        8433    8387    -46     
=======================================
+ Hits         7191    7584   +393     
+ Misses       1242     803   -439     

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accelerator: tpu Tensor Processing Unit bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AttributeError: 'NoneType' object has no attribute 'best_model_path' when checkpoint_callback = False
5 participants