
Support torch.optim.lr_scheduler.ReduceLROnPlateau #320

Merged

Conversation

@vikmary (Contributor) commented Oct 6, 2019

Adds support for torch.optim.lr_scheduler.ReduceLROnPlateau by implementing a pytorch_lightning.Callback.

Fixes #298.

ReduceLROnPlateau should be initialized like any other scheduler:

class MyModule(pl.LightningModule):
  def configure_optimizers(self):
    optimizer = optim.Adam(self.parameters(), lr=self.params.learning_rate)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer,
                                                     mode='min',
                                                     factor=0.1,
                                                     patience=10,
                                                     min_lr=1e-6,
                                                     verbose=True)
    return [optimizer], [scheduler]
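
For context: in plain PyTorch, ReduceLROnPlateau is the one scheduler whose step() needs the monitored value rather than the epoch. A minimal sketch of the manual pattern this PR automates (the training details here are illustrative, not taken from the PR):

# Plain-PyTorch sketch: unlike other schedulers, ReduceLROnPlateau.step()
# must receive the monitored metric (e.g. the validation loss).
from torch import nn, optim

model = nn.Linear(10, 1)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', patience=10)

for epoch in range(100):
    val_loss = 1.0 / (epoch + 1)  # stand-in for a real validation loss
    scheduler.step(val_loss)      # the plateau scheduler consumes the metric, not the epoch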


@williamFalcon (Contributor):

@vikmary looks like some of the old changes leaked into this PR. Could you rebase master into this and submit again? Thanks!

@vikmary (Contributor Author) commented Oct 9, 2019

> @vikmary looks like some of the old changes leaked into this PR. Could you rebase master into this and submit again? Thanks!

Merged origin master again.

@williamFalcon (Contributor):

ummm. still pulls in the wrong master.

can you pull your local master first then apply the rebase to your branch?

basically all the messaging and ddp stuff shouldn’t change

@vikmary (Contributor Author) commented Oct 9, 2019

> ummm. still pulls in the wrong master.
>
> can you pull your local master first then apply the rebase to your branch?
>
> basically all the messaging and ddp stuff shouldn’t change

That's what I did. Can you tell what's wrong?

cd my-fork
git checkout master
git pull https://github.com/williamFalcon/pytorch-lightning.git master
git checkout support-reduceonplateau-lr-scheduler
git merge master
git push

@vikmary (Contributor Author) commented Oct 9, 2019

> ummm. still pulls in the wrong master.
>
> can you pull your local master first then apply the rebase to your branch?
>
> basically all the messaging and ddp stuff shouldn’t change

The diff looks strange; I guess that's what you are talking about. I'll redo the merge, thanks.

@williamFalcon (Contributor):

@vikmary ah it looks great now haha. I added a comment in the review

@vikmary (Contributor Author) commented Oct 9, 2019

@williamFalcon Can't find the comment, did you submit the review?

@williamFalcon (Contributor):

ummm. ok.

My question was about why we need a custom implementation of ReduceLROnPlateau. We should be using the default PyTorch one.

@williamFalcon (Contributor):

(If you click on the "Files Changed" tab you'll see my original comment and links.)

@vikmary (Contributor Author) commented Oct 11, 2019

Sorry, my "Files Changed" tab shows no comments at all. Did you click "Add single comment" when submitting the comment, or "Start a review"? If you start a review, you have to submit the whole review for the comments to appear (as far as I can tell).

We ARE using the default PyTorch implementation of ReduceLROnPlateau. We could even get rid of my class implementation (it only makes the code cleaner).

The only problem with the default PyTorch implementation is that we have to pass the VALIDATION LOSS to the scheduler.step method. That is exactly what the pytorch_lightning/callbacks ReduceLROnPlateauScheduler class does.

Comment on lines 147 to 176
class ReduceLROnPlateauScheduler(Callback):
    """
    Reduce learning rate when the monitored metric has stopped improving.
    Wrapper for torch.optim.lr_scheduler.ReduceLROnPlateau learning rate
    schedulers.

    # Arguments
        schedulers: list of torch.optim.lr_scheduler.ReduceLROnPlateau
        monitor: quantity to be monitored.
    """

    def __init__(self, schedulers, monitor='val_loss'):
        super(ReduceLROnPlateauScheduler, self).__init__()

        self.monitor = monitor
        self.schedulers = schedulers

    def on_epoch_end(self, epoch, logs=None):
        # Look up the monitored metric; abort if it is not being logged.
        current = logs.get(self.monitor)
        if current is None:
            print('ReduceLROnPlateau conditioned on metric `%s` '
                  'which is not available. Available metrics are: %s' %
                  (self.monitor, ','.join(list(logs.keys()))))
            exit(-1)

        # Step every wrapped ReduceLROnPlateau scheduler with the monitored value.
        for scheduler in self.schedulers:
            scheduler.step(current, epoch=epoch)

Contributor:

Why do we need to create our own ReduceLROnPlateauScheduler?
We should be operating directly on the PyTorch one (https://pytorch.org/docs/stable/optim.html?highlight=reducelr#torch.optim.lr_scheduler.ReduceLROnPlateau)

Contributor Author:

ReduceLROnPlateauScheduler.schedulers is a list of original torch.optim.lr_scheduler.ReduceLROnPlateau instances; see the proof in a comment below.

custom_schedulers = []
i = 0
while i < len(schedulers):
    if isinstance(schedulers[i], torch.optim.lr_scheduler.ReduceLROnPlateau):
Contributor Author:

proof here

Comment on lines 807 to 810
while i < len(schedulers):
    if isinstance(schedulers[i], torch.optim.lr_scheduler.ReduceLROnPlateau):
        custom_schedulers.append(schedulers.pop(i))
    i += 1
Contributor:

There is a small issue with this snippet. When a ReduceLROnPlateau scheduler is popped, i should not be incremented; otherwise the element following the popped one moves into position schedulers[i], i is then immediately incremented, and that element never gets checked to see if it is another ReduceLROnPlateau.
This would not cause any problems when there is only one ReduceLROnPlateau scheduler.
While there is probably no reason to have more than one ReduceLROnPlateau scheduler, it would be nicer to change the code: either move the i += 1 into an else branch, or (if we assume that only one ReduceLROnPlateau scheduler is present) break out of the loop.

Suggested change:
     while i < len(schedulers):
         if isinstance(schedulers[i], torch.optim.lr_scheduler.ReduceLROnPlateau):
             custom_schedulers.append(schedulers.pop(i))
-        i += 1
+        else:
+            i += 1
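
As an aside (not part of the suggested change), the same split could be done without any index bookkeeping; a minimal sketch, assuming schedulers is the list collected from configure_optimizers:

# Sketch only: partition the schedulers list instead of popping in place.
import torch

custom_schedulers = [s for s in schedulers
                     if isinstance(s, torch.optim.lr_scheduler.ReduceLROnPlateau)]
schedulers = [s for s in schedulers
              if not isinstance(s, torch.optim.lr_scheduler.ReduceLROnPlateau)]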

@vikmary (Contributor Author) commented Oct 20, 2019

Hi, thank you for the fix. I decided to support only one ReduceLROnPlateau scheduler.

@@ -1,7 +1,8 @@
-from .pt_callbacks import EarlyStopping, ModelCheckpoint, GradientAccumulationScheduler
+from .pt_callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateauScheduler, GradientAccumulationScheduler
Member:

it looks like a relative import which we shall not use... :)

Contributor Author:

Relative imports of EarlyStopping, ModelCheckpoint etc. are taken from the original repository. Why is relative import of ReduceLROnPlateauScheduler inappropriate?


Contributor:

#402 never fixed relative imports in the callbacks __init__, nor in many other places. I'd say the above comment is out of scope for this PR. @Borda, it might be better to create a separate PR that properly fixes relative imports.

Member:

It was not that the PR was fixing relative imports; I did try to make them, but that was stopped...

Contributor:

let's do a separate PR for relative imports.

Member:

I have opened ticket #459

@williamFalcon (Contributor) commented Nov 5, 2019

@vikmary let's get this into the next release on nov 6.

  1. can you rebase master onto this?
  2. do we need that class you made? can't we just use the default pytorch one? maybe i'm missing something or reading this too quickly.

@Borda thoughts?

@vikmary (Contributor Author) commented Nov 5, 2019

> @vikmary let's get this into the next release on nov 6.
>
>   1. can you rebase master onto this?
>   2. do we need that class you made? can't we just use the default pytorch one? maybe i'm missing something or reading this too quickly.
>
> @Borda thoughts?

I'll try to rebase today.

@Borda (Member) commented Nov 5, 2019

There are 3 options, and I would try them in the following order:

(For all three options, it may be easier to squash all your commits into one.)

@vikmary (Contributor Author) commented Nov 5, 2019

The merge needs some refactoring; I will not be able to finish it today.

@vikmary (Contributor Author) commented Nov 7, 2019

Hi, I refactored so that:

  • there is no Callback class; we rely solely on torch.optim.lr_scheduler.ReduceLROnPlateau (see the sketch below)
  • the logging error in ModelCheckpoint.on_epoch_end is fixed

@Borda @williamFalcon
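
A minimal sketch of what the trainer-side handling could look like after this refactor; the attribute names follow the diff further down, but the exact step() call is my assumption, not necessarily the merged code:

# Sketch only: stepping schedulers at the end of an epoch inside the trainer.
# `reduce_lr_on_plateau_scheduler` and `callback_metrics` appear in the diff
# below; `lr_schedulers` and the step() arguments are assumptions.
for lr_scheduler in self.lr_schedulers:
    lr_scheduler.step(epoch=self.current_epoch)
if self.reduce_lr_on_plateau_scheduler is not None:
    val_loss = self.callback_metrics.get('val_loss')
    self.reduce_lr_on_plateau_scheduler.step(val_loss, epoch=self.current_epoch)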

@williamFalcon (Contributor) commented Nov 30, 2019

@Borda @Ir1d merging this, look good (looks fine to me)?

@Borda (Member) left a comment

I would not use exit; otherwise LGTM.

pytorch_lightning/trainer/train_loop_mixin.py (outdated review comments, resolved)
pytorch_lightning/trainer/trainer.py (outdated review comments, resolved)
-lr_scheduler.step(self.current_epoch)
+lr_scheduler.step(epoch=self.current_epoch)
+if self.reduce_lr_on_plateau_scheduler is not None:
+    val_loss = self.callback_metrics.get('val_loss')
Contributor:

this seems very specific. does it only need to work with val_loss?

Contributor:

I think it could be any validation metric in theory, but how do we let the user pass it in? Perhaps via a dedicated dict entry in the validation_end output, similar to "log" and "progress_bar"? It's one more thing the user needs to remember, but maybe it's fine, since this lr_scheduler is optional and it should be mentioned in the docs.
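
For illustration only, a sketch of how such an opt-in key could look; the key name "reduce_lr_metric" is hypothetical, not an actual pytorch-lightning API:

# Hypothetical sketch: the user picks the monitored metric via a dedicated key
# in the validation_end output, similar to "log" / "progress_bar".
# "reduce_lr_metric" is an invented name; assumes this is a LightningModule
# method and torch is imported.
def validation_end(self, outputs):
    # `outputs` is the list of dicts returned by validation_step
    avg_acc = torch.stack([x['val_acc'] for x in outputs]).mean()
    return {
        'log': {'val_acc': avg_acc},
        'reduce_lr_metric': avg_acc,  # value the plateau scheduler would monitor
    }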

Contributor:

yeah, good option. let’s do it in a separate PR?

@Ir1d (Contributor) commented Dec 1, 2019

I have one off-topic question: should this be epoch instead of epoch + 1?

https://github.com/williamFalcon/pytorch-lightning/pull/320/files#diff-f1ccb073775f3b2f9c294bd887086da3L202

@williamFalcon merged commit a6d64ac into Lightning-AI:master on Dec 3, 2019
@vikmary (Contributor Author) commented Dec 3, 2019

Thank you!
It was a long-awaited pull request.
I hope it has some value for the framework in spite of the non-trivial master integration.

@williamFalcon (Contributor):

haha. Thank you so much for contributing! feel free to keep helping out as we’re always growing our core team!

Successfully merging this pull request may close these issues:

Support of optim.lr_scheduler.ReduceLROnPlateau
6 participants