add current_epoch to dumped_params #3261
Conversation
Codecov Report

```diff
@@           Coverage Diff           @@
##           master   #3261   +/-   ##
=======================================
+ Coverage      86%     87%      +1%
=======================================
  Files         117     117
  Lines        9353    9618     +265
=======================================
+ Hits         8074    8357     +283
+ Misses       1279    1261      -18
```
Nice!! @awaelchli will it support user-defined callbacks for
I'm not sure, but we recently added support for persisting the state of callbacks, so we should be able to dump their state too and restore it easily, if that's the problem. For further questions, better to ask directly here: #3160
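For context, a minimal sketch of what persisting callback state can look like (hook names follow the Callback checkpoint hooks referenced in #3160; exact signatures vary across Lightning versions, so treat this as illustrative):

```python
from pytorch_lightning.callbacks import Callback

class CounterCallback(Callback):
    """Toy callback whose internal state survives checkpointing."""

    def __init__(self):
        self.batches_seen = 0

    def on_train_batch_end(self, *args, **kwargs):
        self.batches_seen += 1

    def on_save_checkpoint(self, *args, **kwargs):
        # Return the state to be stored inside the checkpoint.
        return {"batches_seen": self.batches_seen}

    def on_load_checkpoint(self, checkpointed_state):
        # Restore the state that was stored above.
        self.batches_seen = checkpointed_state["batches_seen"]
```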
The Tuner class looks really promising!
Just to update, the
This pull request is now in conflict... :(
@maxjeblick mind adding a test to check that it is fixed...
LGTM
there are probably lots of other trainer attributes that need to be dumped, for example global_step?
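A rough sketch of the dump/restore pattern under discussion (the attribute list and helper names here are illustrative, not Lightning's actual internals):

```python
# Trainer attributes the tuner temporarily mutates and should restore afterwards.
ATTRS_TO_DUMP = ["max_steps", "current_epoch", "global_step"]

def dump_params(trainer):
    """Snapshot the trainer attributes before the tuner runs."""
    return {name: getattr(trainer, name) for name in ATTRS_TO_DUMP}

def restore_params(trainer, dumped):
    """Put the snapshotted values back once the tuner is done."""
    for name, value in dumped.items():
        setattr(trainer, name, value)
```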
@awaelchli you are probably right. I think the tuner algorithms need a refactoring after v1.0.
Shouldn't the tuner be refactored in such a way that it won't call .fit again? Then we won't need to dump and restore anything. What it should do is just suggest, and the user should change the values and rerun.
@rohitgr7 we use
@SkafteNicki I am not suggesting to rewrite anything. The reason we need to dump and restore is that these attributes are not re-initialized, and the current workflow is something like:

```python
trainer.tuner.lr_find()
# suggest
trainer.fit()
```

What I am suggesting is:

```python
trainer.tuner.lr_find()
# suggest
# and tell the user to reinitialize the trainer, LM, and LDM (if any) with the updated hparams from the suggestions.
```
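A sketch of how that suggested workflow could look from the user's side (`MyLightningModule` is hypothetical; `lr_find` returning an object with a `suggestion()` method matches Lightning's documented behavior):

```python
# Run the finder once, read the suggestion, then start from a clean slate.
lr_finder = trainer.tuner.lr_find(model)
new_lr = lr_finder.suggestion()

model = MyLightningModule(learning_rate=new_lr)  # re-initialize LM with the updated hparam
trainer = Trainer()                              # fresh trainer, nothing to restore
trainer.fit(model)
```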
This is a good alternative, but it will work only for the Trainer and not for the LM and LDM, since there is a chance that some attributes might be changed during
What does this PR do?
Fixes #3260 by restoring the current_epoch after finishing the batch size finder.
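In spirit, the fix looks like the following (the helper and attribute handling are paraphrased as a sketch, not the actual Lightning source):

```python
def scale_batch_size(trainer, model):
    # Snapshot state the finder will clobber; this PR adds current_epoch to the set.
    dumped = {"current_epoch": trainer.current_epoch, "max_steps": trainer.max_steps}
    try:
        ...  # run short fit() trials with growing batch sizes
    finally:
        # Restore so a subsequent trainer.fit() starts from the right epoch (#3260).
        trainer.current_epoch = dumped["current_epoch"]
        trainer.max_steps = dumped["max_steps"]
```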