more early stopping options (convergence and divergence threshold) #6868
Conversation
check_finite: Stops training when the monitor becomes NaN or infinite. Set this argument to ``False``
    if this behavior is undesired.
do we want an option to turn this on/off at all?
I would say off. We will add support for occasional NaN loss training soon.
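The guard being discussed amounts to something like the following minimal sketch (a hypothetical standalone helper, not Lightning's actual implementation, which operates on tensors inside the callback):

```python
import math

def check_finite(current: float):
    """Sketch of the check_finite guard: stop as soon as the monitored
    value is NaN or infinite. Returns (should_stop, reason)."""
    if not math.isfinite(current):
        return True, f"Monitored metric = {current} is not finite. Stopping."
    return False, ""
```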
Codecov Report

@@            Coverage Diff           @@
##           master    #6868    +/-  ##
=======================================
- Coverage      92%      87%      -5%
=======================================
  Files         196      196
  Lines       12571    12597      +26
=======================================
- Hits        11594    10968     -626
- Misses        977     1629     +652
if should_stop:
    self.stopped_epoch = trainer.current_epoch
    if reason:
        log.info(f"[{trainer.global_rank}] {reason}")
This is suboptimal.
If we log on rank zero only, and the user has sync_dist=False for logging, then we might not see the reason being logged because it could be rank > 0 that decided to stop.
If we log on all ranks and the user has sync_dist=True for logging, we will show the same message N times.
Should we perhaps broadcast the message and log only on rank 0?
Good idea!
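The broadcast-then-log idea can be sketched with a toy stand-in for the collective (resolve_stop_reason is hypothetical; in real distributed training the broadcast would go through the trainer's distributed plugin rather than a list of per-rank values):

```python
def resolve_stop_reason(local_reasons):
    """Toy stand-in for a collective broadcast: every rank ends up with the
    reason from the lowest rank that produced one, so rank 0 can log it once."""
    winner = next((r for r in local_reasons if r), "")
    return [winner] * len(local_reasons)

def log_on_rank_zero(rank, reason):
    # Only rank 0 logs, so the message appears exactly once.
    if rank == 0 and reason:
        print(f"Early stopping: {reason}")
```

Here even if only rank 1 decided to stop, after the (simulated) broadcast every rank agrees on the reason, and rank 0 logs it a single time.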
stopping_threshold: Stop training immediately once the monitored quantity reaches this threshold.
divergence_threshold: Stop training as soon as the monitored quantity becomes worse than this threshold.
We could use stop_limit and stop_loss to follow common financial terms.
@jlperla what do you think of this name suggestion?
But it might not be a loss, and most people don't know finance. These are basically optimizer settings, which is more universal.
I think sticking with optimizer-style lingo is ideal. Divergence is safe and says what it means. Normally one would call the success criteria tolerances for optimizers, but that is because they are always comparing something (e.g. a value itself, changes in that value, or first-order conditions) to zero.
Since this could presumably compare against stopping values other than close to zero (especially if you are tracking something where a larger number is better), I think threshold is probably more general. But open-minded, of course.
Could also go super simple and go with min_threshold and max_threshold.
No because that implies a direction to them.
IMO, the names are good right now.
This is not clear to me: you have some converging sequence, and it stops when the sequence starts to diverge again? Should there be some patience, to account for noise?
Or could it natively observe the training and validation measures and stop overfitting, i.e. when these two start to diverge?
If something is diverging, it is because you are in some sort of local minimum or outside of an attractor, and it could only return with some massive jumps (e.g. in simulated annealing you are way off in the boonies relative to your optimum; in theory it could come back, but it might take months, and you are better off just restarting). So patience isn't the right thing to think of for that.
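For reference, both checks reduce to one comparison with the same monitor_op; negating both sides flips the direction for the divergence case. A minimal pure-Python sketch (assuming mode='min', i.e. the op is `<`; Lightning itself uses torch.lt/torch.gt on tensors, and this helper is illustrative, not the PR's code):

```python
import operator

def evaluate_stopping(current, monitor_op=operator.lt,
                      stopping_threshold=None, divergence_threshold=None):
    """Sketch of the two threshold checks. With monitor_op = `<` (mode='min'):
    - stopping_threshold fires when the metric gets small enough (good enough),
    - divergence_threshold fires when it gets too large, via sign negation."""
    if stopping_threshold is not None and monitor_op(current, stopping_threshold):
        return True, "stopping_threshold reached"
    # monitor_op(-current, -threshold) is equivalent to current > threshold here
    if divergence_threshold is not None and monitor_op(-current, -divergence_threshold):
        return True, "divergence_threshold crossed"
    return False, ""
```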
Co-authored-by: Carlos Mocholí <[email protected]>
Should patience and delta apply to these thresholds?
stopping_threshold: Stop training immediately once the monitored quantity reaches this threshold.
divergence_threshold: Stop training as soon as the monitored quantity becomes worse than this threshold.
LGTM! Great work! @jlperla, does it match what you had in mind?
elif self.divergence_threshold is not None and self.monitor_op(-current, -self.divergence_threshold):
    should_stop = True
    reason = (
        f"Divergence: {self.monitor} = {current} > {self.divergence_threshold}."
Minor point: whether it is
f"Divergence: {self.monitor} = {current} > {self.divergence_threshold}."
or
f"Divergence: {self.monitor} = {current} < {self.divergence_threshold}."
should depend on the monitor_op, right? Maybe have a
f"Divergence: {self.monitor} = {current} {op_to_string(self.monitor_op)} {self.divergence_threshold}."
or something like that, where you fill op_to_string with whatever you need to turn it into a >, etc.?
Similarly, I think you could do the same with the "successful" convergence below/above the target:
f"Below tolerance {self.monitor} = {current} {op_to_string(self.monitor_op)} {self.stopping_threshold}."
where you would have to ensure the order is correct, as it is going from the other direction.
As I said though, minor.
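The suggested op_to_string helper could be as small as a lookup. A sketch using Python's operator module (Lightning's EarlyStopping actually stores torch.lt/torch.gt, so the real keys would differ; the helper name comes from the comment above):

```python
import operator

def op_to_string(monitor_op) -> str:
    """Map a comparison op to its symbol for the 'reason' message (hypothetical)."""
    return {operator.lt: "<", operator.gt: ">"}.get(monitor_op, "?")
```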
@tchaton @awaelchli This all looks great to me. I put in one minor comment about the "reason" strings only being correct for one of the monitor_ops, but I also think that could wait and be done as a separate issue later. I personally am unlikely to use the other direction for the monitor_op anytime soon.
Love it!
What does this PR do?
Part of #6795
Adds two thresholds after which we stop training immediately (no patience).
Divergence threshold: the monitor has reached a value from which we believe it cannot recover -> stop training
Stopping threshold: the monitor has reached a target value that is close to optimal, and we do not care about further improvement -> stop training
Now that we have multiple stopping criteria, it's best we report the reason for stopping too.
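Put together, usage would look something like the following (illustrative values only; the Trainer wiring is elided and the pytorch_lightning import is left commented so the snippet stands alone):

```python
# Arguments for the EarlyStopping callback, including the two thresholds added here.
early_stopping_kwargs = dict(
    monitor="val_loss",
    mode="min",
    stopping_threshold=0.02,    # good enough: stop once val_loss drops below 0.02
    divergence_threshold=10.0,  # hopeless: stop as soon as val_loss exceeds 10
)
# from pytorch_lightning.callbacks import EarlyStopping
# trainer = Trainer(callbacks=[EarlyStopping(**early_stopping_kwargs)])
```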