Transfer learning example #1564

Merged
7 commits merged into Lightning-AI:master from fine_tuning_example on May 2, 2020

Conversation

jbschiratti
Contributor

What does this PR do?

Addresses issue #514. Following up on that discussion, this PR adds a self-contained example showing how a pretrained network (such as ResNet50) can be fine-tuned within a LightningModule.
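For readers skimming this thread, a minimal sketch of the general idea (not the code added by this PR; the class name, arguments, and head below are illustrative only) could look like this:

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl
from torchvision import models


class TransferLearningModel(pl.LightningModule):
    """Illustrative sketch: frozen ResNet50 backbone + small trainable head."""

    def __init__(self, lr: float = 1e-3, num_classes: int = 2):
        super().__init__()
        backbone = models.resnet50(pretrained=True)
        # Keep every layer except the final fully-connected classifier.
        self.feature_extractor = nn.Sequential(*list(backbone.children())[:-1])
        for param in self.feature_extractor.parameters():
            param.requires_grad = False  # freeze the pretrained weights
        self.classifier = nn.Linear(backbone.fc.in_features, num_classes)
        self.lr = lr

    def forward(self, x):
        features = self.feature_extractor(x).flatten(1)
        return self.classifier(features)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self(x), y)

    def configure_optimizers(self):
        # Only the trainable head is optimized; the backbone stays frozen.
        return torch.optim.Adam(self.classifier.parameters(), lr=self.lr)
```

The train_bn flag and the train() override discussed later in this thread refine this basic setup.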

PR review

Anyone in the community is free to review the PR 🙂

@mergify mergify bot requested a review from a team April 22, 2020 17:15
@Borda Borda added the example label Apr 22, 2020
Member

@Borda Borda left a comment


pls add argparse so the example can be run with different params
pls use the Napoleon docstring style: https://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html

@mergify mergify bot requested review from a team April 22, 2020 21:04
@hcjghr

hcjghr commented Apr 23, 2020

Hi @jbschiratti

Thanks for such a nice example. I'm rather new to the field, so hopefully my question will not be too off base. Going over the code, I noticed that in your example the BatchNorm layers always remain in training mode (since train_bn is always set to self.hparams.train_bn when calling the freeze function), even when performing validation or evaluation. I understand the code allows the BN layers to be set to eval (if train_bn=False), but I'm wondering whether there is a specific reason why you always leave BN in train mode. Why not have them train during the training stage and eval during validation/testing?

Just to clarify: I'm not arguing it should be different, I'm just asking about the reasoning behind it.
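To make the question concrete, a freeze helper of the kind being discussed might look roughly like this (an illustrative sketch, not the PR's exact code); note that BatchNorm running statistics update whenever the layer is in train mode, independently of requires_grad:

```python
import torch.nn as nn

BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)


def freeze(module: nn.Module, train_bn: bool = True) -> None:
    """Freeze the parameters of `module`.

    If train_bn is True, BatchNorm layers are skipped: their affine
    parameters stay trainable and, as long as they are in train mode,
    their running statistics keep being updated on every forward pass.
    """
    for m in module.modules():
        if isinstance(m, BN_TYPES) and train_bn:
            continue  # leave BatchNorm layers trainable
        for param in m.parameters(recurse=False):
            param.requires_grad = False
```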

@Borda
Member

Borda commented Apr 23, 2020

Thanks for such a nice example. I'm rather new to the field, so hopefully my question will not be too off base. Going over the code, I noticed ...

Thank you for your interest and help with this addition. Could you please use the review tab of this PR to write your comments directly on the sections you are talking about... it will make the discussion clearer and a bit more concrete :]

@jbschiratti
Contributor Author

@hcjghr Thank you for spotting this. It was a bug! The way I see it, when model.eval() is called in the evaluation loop, the BatchNorm layers (like all the other layers) should be in eval mode (training=False). This is what I had in mind, and it is now fixed.

@Borda I fixed the docstrings and added argparse. Thank you for the comments!
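A quick illustration of the behaviour the BatchNorm fix above relies on (a sketch, independent of the PR's code): calling .eval() on a module recursively sets training=False on every submodule, BatchNorm layers included, so they switch to their running statistics:

```python
import torch.nn as nn
from torchvision import models

model = models.resnet50(pretrained=False)
model.eval()  # recursively sets training=False on every submodule
assert all(
    not m.training for m in model.modules() if isinstance(m, nn.BatchNorm2d)
)
```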

@Borda
Member

Borda commented Apr 23, 2020

pls add a note to the changelog 🐰

@Borda Borda added the "waiting on author" label Apr 25, 2020
@mergify
Contributor

mergify bot commented Apr 26, 2020

This pull request is now in conflict... :(

@codecov

codecov bot commented Apr 27, 2020

Codecov Report

Merging #1564 into master will not change coverage.
The diff coverage is n/a.

@@          Coverage Diff           @@
##           master   #1564   +/-   ##
======================================
  Coverage      88%     88%           
======================================
  Files          69      69           
  Lines        4133    4133           
======================================
  Hits         3656    3656           
  Misses        477     477           

@mergify mergify bot requested a review from a team April 28, 2020 23:13
@jbschiratti
Contributor Author

Thanks @awaelchli for the review and the comments!

@mergify mergify bot requested a review from a team April 29, 2020 11:38
@awaelchli
Contributor

awaelchli commented Apr 29, 2020

I noticed that the downloaded dataset is not ignored in version control. Could we maybe redirect it to a datasets subfolder and add a .gitignore in the domain templates folder?

@jbschiratti
Contributor Author

This is strange because the context manager

with TemporaryDirectory(dir=hparams.root_data_path) as tmp_dir:
    ...

should delete the temporary folder in which the data is downloaded.
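For context, a sketch of how the download can be scoped to such a temporary directory (the dataset URL, function name, and helper usage below are illustrative assumptions, not necessarily what the example does):

```python
from pathlib import Path
from tempfile import TemporaryDirectory

from torchvision.datasets.utils import download_and_extract_archive

# Example URL from the PyTorch transfer-learning tutorial (ants vs. bees).
DATA_URL = "https://download.pytorch.org/tutorial/hymenoptera_data.zip"


def main(hparams):
    # Everything downloaded below only lives for the duration of the block;
    # the directory is deleted when the `with` statement exits.
    with TemporaryDirectory(dir=hparams.root_data_path) as tmp_dir:
        download_and_extract_archive(url=DATA_URL, download_root=tmp_dir)
        data_dir = Path(tmp_dir) / "hymenoptera_data"
        # ... build the dataloaders from data_dir and run the Trainer ...
```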

@awaelchli
Contributor

ah ok, so it is also supposed to do that on keyboard interrupt? Maybe it's because I'm on Windows currently.

@jbschiratti
Contributor Author

I tried to stop the script with CTRL+C during the 1st epoch and the temporary folder was deleted (on Linux). But I cannot guarantee this always works.

@Borda Borda requested a review from awaelchli April 29, 2020 11:56
@awaelchli
Contributor

I can now also confirm it works fine on Linux, so it's just a Windows thing; I guess we can keep it like that.

Contributor

@awaelchli awaelchli left a comment


Nice, minimal and clean. I like it very much.

@mergify mergify bot requested a review from a team April 29, 2020 13:25
def loss(self, labels, logits):
    return self.loss_func(input=logits, target=labels)

def train(self, mode=True):
Member


what is the mode for? could it be more descriptive?

Contributor Author


See https://github.com/pytorch/pytorch/blob/d37a4861b8a5eed3d9a1340484d1efb0f48aa59e/torch/nn/modules/module.py#L1067. This line overrides the train method of the PyTorch nn.Module. I will add a docstring specifying what mode does.

Member


You are right, we may rename it...
Do you have a suggestion for a better name? @PyTorchLightning/core-contributors

Contributor Author

@jbschiratti jbschiratti Apr 29, 2020


I am not sure we can rename it. In the evaluation loop (L330), model.train() is called, and here model refers (if I am not mistaken) to the LightningModule. We want to override this train method so that, at the end of the evaluation loop when model.train() is called, some parameters (in specific layers) remain frozen (that is, with requires_grad=False) if needed.
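A sketch of what such an override might look like (illustrative only; it assumes the frozen backbone lives in self.feature_extractor and that a train_bn hyper-parameter exists, as in the hypothetical sketch earlier in this thread):

```python
import torch.nn as nn
import pytorch_lightning as pl


class TransferLearningModel(pl.LightningModule):
    # ... __init__, feature_extractor, hparams.train_bn as sketched above ...

    def train(self, mode: bool = True):
        """Override of nn.Module.train.

        The evaluation loop calls model.train() when it finishes; without
        this override, layers that were deliberately put in eval mode (e.g.
        BatchNorm layers of the frozen backbone when train_bn is False)
        would silently be switched back to train mode.
        """
        super().train(mode=mode)
        if mode and not self.hparams.train_bn:
            # Keep the BatchNorm layers of the frozen backbone in eval mode.
            for m in self.feature_extractor.modules():
                if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
                    m.eval()
        return self
```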

@staticmethod
def add_model_specific_args(parent_parser):
    parser = argparse.ArgumentParser(parents=[parent_parser])
    parser.add_argument('--backbone',
Member


use add_argparse_args so we limit duplication and add only the arguments that are new/needed for the model

Contributor Author


by "limit code duplication", you want me to remove this line?

Member


I mean remove the lines which are already generated from the Trainer arguments... does that make sense?
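To illustrate the suggestion (a sketch; Trainer.add_argparse_args existed in Lightning at the time, though exact helpers may vary across versions, and TransferLearningModel is the hypothetical module from the sketches above):

```python
import argparse

import pytorch_lightning as pl


def get_args():
    parent_parser = argparse.ArgumentParser(add_help=False)
    # Generate all Trainer flags (--gpus, --max_epochs, ...) automatically
    # instead of re-declaring them by hand in the example.
    parent_parser = pl.Trainer.add_argparse_args(parent_parser)
    # Only the model-specific arguments are declared manually
    # (TransferLearningModel here stands in for the example's module).
    parser = TransferLearningModel.add_model_specific_args(parent_parser)
    return parser.parse_args()


# A Trainer can then typically be built with pl.Trainer.from_argparse_args(get_args()).
```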

to a temporary directory.
"""

with TemporaryDirectory(dir=hparams.root_data_path) as tmp_dir:
Member


I guess we want to keep the output folder


Contributor Author


The folder in which the data was downloaded is deleted after the experiment. If you think we should leave the data untouched after the example has run, I can make another PR to fix this :-)

@mergify mergify bot requested a review from a team April 29, 2020 15:28
@williamFalcon
Contributor

@jbschiratti this is super cool.
Why don't we move this to https://github.com/PyTorchLightning/pytorch-lightning-bolts?

@Borda
Member

Borda commented Apr 30, 2020

I would keep it here in examples....

@mergify
Contributor

mergify bot commented May 1, 2020

This pull request is now in conflict... :(

@Borda Borda force-pushed the fine_tuning_example branch from 7a1e5e3 to 493296e May 1, 2020 19:05
@williamFalcon williamFalcon merged commit fafe5d6 into Lightning-AI:master May 2, 2020
@jbschiratti
Contributor Author

Thank you @williamFalcon 👍

@Borda Borda added this to the 0.7.6 milestone May 4, 2020
Labels: example, waiting on author
5 participants