Learning Rate finder #1347
Conversation
Very excited about this feature! 🤖
```python
        self.num_iter = num_iter
        super(_LinearLR, self).__init__(optimizer, last_epoch)

    def get_lr(self):
```
What about having it as a property `lr`?
```python
        # Max step set to number of iterations
        max_steps = self.max_steps
        self.max_steps = num_iters

        # Disable standard progress bar for fit
        show_progress_bar = self.show_progress_bar
        self.show_progress_bar = False

        # Accumulation of gradients
        accumulate_grad_batches = self.accumulate_grad_batches
        self.accumulate_grad_batches = num_accumulation_steps

        # Configure optimizer and scheduler
        optimizer, _, _ = self.init_optimizers(model.configure_optimizers())
        assert len(optimizer) == 1, 'cannot find lr for more than 1 optimizer'
        configure_optimizers = model.configure_optimizers
        model.configure_optimizers = lr_finder._get_new_optimizer(optimizer[0])

        # Fit, lr & loss logged in callback
        self.fit(model)

        # Prompt if we stopped early
        if self.global_step != num_iters:
            print('LR finder stopped early due to diverging loss.')

        # Transfer results from callback to lr finder object
        lr_finder.results.update({'lr': self.callbacks[0].lrs,
                                  'loss': self.callbacks[0].losses})
```
regarding the logger suppression, it would be nice to have this as a function/method with a wrapper...
Can you explain a bit more?
there is a single block of code which you want to execute "silently",
so make it a separate function and write a wrapper which disables the logger and later restores it
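A minimal sketch of what such a wrapper could look like (`show_progress_bar` appears in the snippet above; the `logger` attribute and the decorator itself are assumptions, not code from this PR):

```python
import functools

def run_silently(fn):
    """Hypothetical decorator: stash trainer state, run fn quietly, restore after."""
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        saved = (self.logger, self.show_progress_bar)  # assumed trainer attributes
        self.logger, self.show_progress_bar = None, False
        try:
            return fn(self, *args, **kwargs)
        finally:
            self.logger, self.show_progress_bar = saved
    return wrapper
```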
This is awesome! I'd prefer to put this into an argument in the trainer... not have an additional method the user has to call.
```python
Trainer(auto_find_lr=True)

def auto_find_lr(self, model):
    lr_finder = self.find_lr(model)
    # Plot results
    # Choose based on plot, or get a suggestion
    print(f'suggested lr {lr_finder.suggestion()}')
    # automatically update the optimizers
```
@justusschock @ethanwharris thoughts?
Some caveats:
- This won't work with dp/ddp. We should modify for that case.
- what about multiple optimizers?
I recommend not merging until we hash out the API.
Ok to merge a v1 of this that doesn't support multiple optimizers or dp/ddp but we need to work those out eventually. This should also be clearly written in the docs
@SkafteNicki if you call .fit internally then it should be fine! Overall though, take a look at my comments about collapsing this into a flag.
The idea of having this as a separate method is taken directly from fastai. I am fine with collapsing this into a flag; however, I think it removes the possibility for the user to interact with the results produced by the learning rate finder before fitting the model.
@williamFalcon I wouldn't migrate it to a trainer arg. What I'd really like is something like:

```python
# init your Trainer:
trainer = Trainer(...)

with trainer.find_best_lr:
    trainer.fit()
```

This is not much boilerplate, but I think we should not make too many implicit choices. And if you would pass the optimizer there, you could also do:

```python
with trainer.find_best_lr(optim1):
    with trainer.find_best_lr(optim2):
        trainer.fit()
```

which would be equal to:

```python
with trainer.find_best_lr(optim1, optim2):
    trainer.fit()
```

Not sure how realistic this is, but that's the API I'd like the most...
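For concreteness, a minimal sketch of how such a context manager might be wired up (`find_best_lr` is only the name proposed in this comment, and the internal flag is an assumption):

```python
from contextlib import contextmanager

class Trainer:
    def __init__(self):
        self._find_lr_on_fit = False  # hypothetical internal flag

    @contextmanager
    def find_best_lr(self, *optimizers):
        # while inside this block, fit() would first run the LR range test
        # for the given optimizers and apply the suggested lr
        self._find_lr_on_fit = True
        try:
            yield self
        finally:
            self._find_lr_on_fit = False
```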
I would then rather target the next release and have it in v0.8.0 with metrics.
great contribution! excited about this feature, just have a few comments to address.
yeah i agree with @justusschock i wouldn't use a Trainer arg here. i personally like the recommended usage as it stands (copied from @SkafteNicki's post above).
users can name their learning rate hparam however they please.
This is an amazing feature!
I have some suggestions for the docs :)
Great PR, I'm also looking for this feature. Thanks everyone.
I think the current usage is great for manipulating the lr finder programmatically, but it would also be nice if we had a flag for interactive training. Something like this:

or

This flag would take precedence over all other lr-related flags, and the default will of course be False.
good ideas. i think we don't want to set a trend for breaking out of the current API patterns. the trainer flags are there so that users don't have to think about all the tiny nuances of doing something.

the approach suggested by @justusschock means the user now has to learn more API and has to remember to do a bunch of things... ie: they need to go read docs, and also start pulling out optimizers. this completely breaks the lightning organization and abstraction and doesn't go with the principle of not making users think about things they don't need to think about.

the user just wants the best LR; they shouldn't have to remember to add a with, or pull out optimizers, etc... they should set a flag and get the LR. with the trainer flag the user doesn't have to think about it. in fact, the library can just automatically set the best LR using what it finds... ie: it just works. we can still support the graph approach in this case by showing the plot in the logger. then the user can decide to fix the LR once they feel confident.

the approach i suggest with a flag works as follows:

this approach has the advantages that:

So, i'm going to strongly suggest we use a trainer flag instead.
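For comparison, the flag-based flow being argued for here would look roughly like this (`auto_find_lr` is the name floated earlier in the thread; the final spelling may differ):

```python
# model: any LightningModule defined elsewhere
trainer = Trainer(auto_find_lr=True)  # hypothetical flag
trainer.fit(model)  # runs the LR range test first, then trains with the found lr
```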
Okay, but can we pass an object where the best lr should be stored (e.g. hparams.lr)? If you have multiple runs, you probably don't want to do the LR search every time.
I guess i assume the flow would be:

what would that object do?
let me elaborate on why i think this is challenging. suppose user A has a model like:

and user B has a model like:

how do we automatically set the learning rate? sure, we can update the pytorch optimizer's lr directly, but we can't know which hparam the user used to set it.

i would agree with the general flow:

i'm still not sure a flag is the best design for this. by keeping this functionality as a method, we're still only asking the user to call essentially one line of code.

minimal example

or the user can reach into the existing model and update an hparam, since we call `configure_optimizers` when fit begins.

power user
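The code blocks in the comment above were dropped during extraction; a plausible reconstruction of what it describes (the model classes, the hparam name `my_fancy_lr`, and the method name `find_lr` from the PR description are illustrative, not verbatim PR code):

```python
import torch
from argparse import Namespace
from pytorch_lightning import LightningModule, Trainer

class ModelA(LightningModule):
    def __init__(self, hparams):
        super().__init__()
        self.hparams = hparams
        self.layer = torch.nn.Linear(28 * 28, 10)

    def configure_optimizers(self):
        # user A stores the learning rate under hparams.lr
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)

class ModelB(ModelA):
    def configure_optimizers(self):
        # user B names it something else entirely
        return torch.optim.Adam(self.parameters(), lr=self.hparams.my_fancy_lr)

# method-based flow discussed in the thread
model = ModelA(Namespace(lr=1e-3))
trainer = Trainer()
lr_finder = trainer.find_lr(model)         # run the LR range test
model.hparams.lr = lr_finder.suggestion()  # user updates their own hparam
trainer.fit(model)                         # configure_optimizers reads the new lr
```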
great work on this! ⚡
@SkafteNicki this is an awesome feature!
```python
    lr_max: lr to stop search

    num_training: number of steps to take between lr_min and lr_max
```
rather `num_train_steps`
```python
        self.num_iter = num_iter
        super(_ExponentialLR, self).__init__(optimizer, last_epoch)

    def get_lr(self):
```
it shall be described what the difference is between this `get_lr` and just `lr` below, because intuitively (by the name) they shall return the same
`get_lr()` is the method called inside `lr_scheduler.step()` and is not meant to be called elsewhere. Since pytorch 1.4, the property `self._last_lr` was introduced to extract the last computed lr. However, since pytorch-lightning needs to be backwards compatible, I created the `self.lr` property that achieves the same. They therefore have slightly different purposes.
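A minimal sketch of that pattern, based on the `_LinearLR` snippet above (the `_lr` caching detail is an assumption, not verbatim PR code):

```python
import torch

class _LinearLR(torch.optim.lr_scheduler._LRScheduler):
    """Linearly increases the lr towards end_lr over num_iter steps."""

    def __init__(self, optimizer, end_lr, num_iter, last_epoch=-1):
        self.end_lr = end_lr
        self.num_iter = num_iter
        super(_LinearLR, self).__init__(optimizer, last_epoch)

    def get_lr(self):
        # called internally by lr_scheduler.step(); not meant for users
        r = (self.last_epoch + 1) / self.num_iter
        val = [base_lr + r * (self.end_lr - base_lr) for base_lr in self.base_lrs]
        self._lr = val  # cache so the property below works on any torch version
        return val

    @property
    def lr(self):
        # backwards-compatible stand-in for self._last_lr (added in pytorch 1.4)
        return self._lr
```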
* initial structure
* rebase
* incorporate suggestions
* update CHANGELOG.md
* initial docs
* fixes based on reviews
* added trainer arg
* update docs
* added saving/restore of model state
* initial tests
* fix styling
* added more tests
* fix docs, backward compatility and progressbar
* fix styling
* docs update
* updates based on review
* changed saving to standard functions
* consistent naming
* fix formatting
* improve docs, added support for nested fields, improve codecov
* update CHANGELOG.md
* Update lr_finder.rst
* Update pytorch_lightning/trainer/trainer.py
* Update trainer.py
* Update CHANGELOG.md
* Update path
* restoring
* test
* attribs
* docs
* doc typo

Co-authored-by: Nicki Skafte <[email protected]>
Co-authored-by: William Falcon <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: J. Borovec <[email protected]>
Before submitting
What does this PR do?
Fixes #624
This PR implements a new method on the trainer class, `lr_finder = Trainer.find_lr(model)`, that is similar to the feature found in fast.ai. It does a small fit of the model, where the `lr` is increased after each batch and the corresponding loss is logged. The output object `lr_finder` can then be used to investigate the connection between the choice of `lr` and the `loss` of the model. It can be used to reduce the amount of guesswork in choosing a good lr, and it can be used to choose good bounds for the CyclicLRScheduler.

The interface is simple from a user standpoint:
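The usage snippet referenced here was lost in extraction; a plausible reconstruction from the description above (the `plot()` helper is an assumption):

```python
trainer = Trainer()
lr_finder = trainer.find_lr(model)   # small fit with increasing lr, logging loss
lr_finder.plot()                     # assumed helper: inspect loss vs. lr
print(lr_finder.suggestion())        # the red point in the plot below
```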
Running the above code for the `pl_examples/basic_model/cpu_templatlightning_module_template.py` model produces the following plot (the red point corresponds to the suggested `lr` to use).

The feature seemed to gain much traction when it was proposed; however, lightning was at that time missing a step-wise scheduling feature. That was implemented in PR #941, so this feature could now be implemented using more or less standard lightning features (callbacks etc.).

This PR is currently missing a lot (documentation, tests etc.), but I wanted a bit of feedback on whether this is still a wanted feature in lightning or whether it should instead be part of lightning-bolts (when that is up and running).
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃