Add reduce_on_plateau LR scheduler to contrib directory. #629
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.

Thanks for the contribution @vz415! There seem to be some minor issues regarding formatting that break the tests (see https://github.com/google-deepmind/optax/actions/runs/6830352294). Other than that this looks good to me. Green light to merge once the tests pass @mtthss?

@fabianp, fixed the spacing/formatting issue 🤦♂️ and everything should be ready to merge.
optax/contrib/prodigy.py (Outdated)

@@ -46,7 +46,7 @@ class ProdigyState(NamedTuple):
 def prodigy(
     learning_rate: base.ScalarOrSchedule = 0.1,
     betas: tuple[float, float] = (0.9, 0.999),

Suggested change:
-    beta3: float | None = None,
+    beta3: float = None,
Thanks for catching that! I fixed it, you'll need to merge with main.
Cool, pulled the most recent main commit to fix this and pushed.
Thanks @vz415 for the changes! A couple more minor things and then I think we're ready to merge:

Hi @fabianp, I've addressed the issues below.
docs/api.rst (Outdated)

@@ -619,6 +619,7 @@ Schedules
 .. autofunction:: piecewise_constant_schedule
 .. autofunction:: piecewise_interpolate_schedule
 .. autofunction:: polynomial_schedule
+.. autofunction:: optax.contrib.reduce_on_plateau.reduce_on_plateau
Suggested change:
-.. autofunction:: optax.contrib.reduce_on_plateau.reduce_on_plateau
+.. autofunction:: optax.contrib.reduce_on_plateau
optax/contrib/reduce_on_plateau.py (Outdated)

    min_improvement:float,
    cooldown:int
) -> base.GradientTransformationExtraArgs:
  """ Args:
Suggested change:
-  """ Args:
+  """Reduce learning rate when a metric has stopped improving.
+
+  Models often benefit from reducing the learning rate once learning stagnates.
+  This scheduler reads a metric quantity and, if no improvement is seen for a
+  'patience' number of epochs, the learning rate is reduced.
+
+  Args:
optax/contrib/reduce_on_plateau.py (Outdated)

def reduce_on_plateau(
    reduce_factor: float,
    patience: int,
    min_improvement:float,
Suggested change:
-    min_improvement:float,
+    min_improvement: float,
optax/contrib/reduce_on_plateau.py (Outdated)

    reduce_factor: float,
    patience: int,
    min_improvement:float,
    cooldown:int
Suggested change:
-    cooldown:int
+    cooldown: int
optax/contrib/reduce_on_plateau.py (Outdated)

  """ Args:
    reduce_factor: Factor by which the learning rate will be reduced.
      new_lr = lr * factor.
    patience: Number of epochs with no improvement after which learning
Why are these parameters stated in terms of number of epochs? Wouldn't it be more accurate to talk about number of iterations? It seems to me that the method has no knowledge of epochs, and instead keeps track of iterations.
Thanks for catching that. This seems to be a carryover from the PyTorch documentation. I agree that the current version should be stated in terms of iterations rather than epochs; the function itself would need to change to support epochs (I've only used it for full-batch training in my own research), unless I'm missing something.
Is there a reason why the parameters reduce_factor, patience, etc. need to be passed to the state ReduceLROnPlateauState?
Also, in its current state, I think this method is not straightforward to use (at least it was not clear to me). Would you be willing to add a docstring or create an example on how to combine this scaling with (say) SGD or Adam?
Thanks for the review and feedback! Answering your question: those parameters are passed to update the learning rate after every step. They could be externalized via functools.partial, but I think this format makes the scheduler easier for users to apply, although more implicit in how the updates are carried out. Let me know if this makes sense or if you have other concerns. I'll add a docstring with an example on how to use it; if that's not clear, I can create a full example.
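To make the mechanics being discussed concrete, here is a minimal, self-contained sketch of the plateau-tracking logic in plain Python. This is not the actual optax implementation; the state fields, function name, and default values are illustrative, following the reduce_factor / patience / min_improvement / cooldown parameters described above.

```python
from typing import NamedTuple


class PlateauState(NamedTuple):
    scale: float          # multiplier applied to the base learning rate
    best_loss: float      # best metric value seen so far
    plateau_count: int    # consecutive steps without sufficient improvement
    cooldown_count: int   # steps remaining before plateaus are counted again


def plateau_step(loss: float, state: PlateauState, *,
                 reduce_factor: float = 0.5, patience: int = 2,
                 min_improvement: float = 1e-4,
                 cooldown: int = 2) -> PlateauState:
    """One per-iteration update of the plateau tracker (illustrative only)."""
    if state.cooldown_count > 0:
        # During cooldown, only track the best value; don't count plateaus.
        return state._replace(cooldown_count=state.cooldown_count - 1,
                              best_loss=min(state.best_loss, loss))
    if loss < state.best_loss - min_improvement:
        # Sufficient improvement: record it and reset the patience counter.
        return state._replace(best_loss=loss, plateau_count=0)
    if state.plateau_count + 1 < patience:
        return state._replace(plateau_count=state.plateau_count + 1)
    # `patience` steps with no improvement: shrink the scale, start cooldown.
    return state._replace(scale=state.scale * reduce_factor,
                          plateau_count=0, cooldown_count=cooldown)
```

In a training loop one would multiply the optimizer's updates (or its base learning rate) by `state.scale` each step; the counters make clear that the tracker advances per iteration, not per epoch, which is the point raised above.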
This is now merged with some changes. Among other things, I changed some keyword arguments to match those of the PyTorch implementation. Other changes were made to fix some internal failures, since the internal pytype checks are a bit more stringent than the ones that run on the GitHub actions. If you feel that some of these changes are for the worse, please open another PR and propose modifications. Regarding documentation, as I mentioned earlier, I think it is important to have an example on using this function; I created issue #679 to track progress on this.
Thanks for the answer @vz415 (and for your contribution)! I saw your comment after I had submitted. In any case, please don't hesitate to open another PR to edit the submitted code.
Following pull request #505 and issue #221, with @mtthss's suggestions on changing where the scheduler is located (contrib) and on using the GradientTransformationExtraArgs API. Let me know if there's anything else that needs to be done to merge this. Cheers.
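For readers unfamiliar with the GradientTransformationExtraArgs pattern mentioned in the description, here is a rough, hypothetical sketch of the idea in plain Python: a transform whose `update` accepts extra keyword arguments (here the loss, as `value`) beyond the usual updates/state/params. The `Transform` NamedTuple and the toy `scale_on_high_loss` transform are stand-ins for illustration, not the real optax types.

```python
from typing import Any, Callable, NamedTuple


class Transform(NamedTuple):
    """Minimal stand-in for an extra-args gradient transformation:
    `update` takes extra keyword arguments in addition to updates/state."""
    init: Callable[[Any], Any]
    update: Callable[..., Any]


class ScaleState(NamedTuple):
    scale: float  # current multiplier applied to the updates


def scale_on_high_loss(threshold: float, factor: float) -> Transform:
    """Toy transform: shrink the step scale whenever the loss exceeds `threshold`."""
    def init(params):
        del params  # no per-parameter state needed for this toy example
        return ScaleState(scale=1.0)

    def update(updates, state, params=None, *, value):
        del params
        # `value` is the loss, threaded through as an extra keyword argument.
        scale = state.scale * factor if value > threshold else state.scale
        new_updates = [u * scale for u in updates]
        return new_updates, ScaleState(scale=scale)

    return Transform(init, update)
```

Usage mirrors the pattern discussed in the thread: the training loop calls `transform.update(updates, state, value=loss)` each step, so the scheduler can react to the metric without the optimizer itself knowing about it.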