Parameter Groups / Transfer Learning #514
Great questions.

def __init__(self, ...):
    self.pretrained_model = SomeModel.load_from_...()
    self.pretrained_model.freeze()   # sets requires_grad=False and switches the module to eval mode
    self.finetune_model = ...

def configure_optimizers(self):
    return Adam(self.pretrained_model.parameters(), ...)
Hey William. I am a bit confused by that response. Let me be more clear about the process.
I would like to train the full network with self.resnet frozen for a few epochs, then unfreeze it and train the whole model some more.
It seems to me that, in the example you have given, the model will be stuck in eval mode when you try to train, because of the call to model.eval(). I think you may have the idea that the pre-trained model is a LightningModule that has already been trained. Is that the case? If so, that means pre-trained networks need to be wrapped in a LightningModule, which I think goes against the idea of "just PyTorch."
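For concreteness, here is a minimal sketch of one way to do "freeze for a few epochs, then unfreeze" by toggling requires_grad in an epoch hook, so the train/eval state is never touched. The class name, the unfreeze epoch, and the hyperparameters are illustrative assumptions rather than code from this thread (older Lightning versions used on_epoch_start instead of on_train_epoch_start):

```python
import torch
import torchvision
import pytorch_lightning as pl

UNFREEZE_EPOCH = 5  # assumed number of epochs to keep the backbone frozen


class FineTuneModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.resnet = torchvision.models.resnet50(pretrained=True)
        self.head = torch.nn.Linear(1000, 10)       # small network on top
        for p in self.resnet.parameters():          # freeze without calling .eval()
            p.requires_grad = False

    def forward(self, x):
        return self.head(self.resnet(x))

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self(x), y)

    def on_train_epoch_start(self):
        if self.current_epoch == UNFREEZE_EPOCH:
            for p in self.resnet.parameters():      # unfreeze the backbone
                p.requires_grad = True

    def configure_optimizers(self):
        # every parameter is registered up front; frozen ones receive no .grad,
        # so the optimizer skips them until they are unfrozen
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```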
Hi, I too would be interested in a step-by-step 'tutorial' for doing transfer learning with PyTorch Lightning. Is this something that might be added to the docs?
@LucFrachon, would you be interested in creating such a tutorial?
I'm not sure I am fully qualified for this :-)
I can see how to do these things individually, but I struggle to see how to integrate them elegantly in a LightningModule...
@Borda @LucFrachon I was working on something very similar these last few days.
@jbschiratti Nice example, what dataset are you using?
Cool, mind sending a PR as a Lightning example?
Sure! But I think it would be more relevant with some real images (the dataset used in this example, for instance). Don't you think?
Well, real-world examples would be nice, but we still need to stay fairly minimal; we do not want a user to download a couple of GB of data just for an example... BUT your ants-bees dataset looks good.
Hi, I created a similar example using fastai and pytorch-lightning. Might be useful for someone: https://github.com/sairahul/mlexperiments/blob/master/pytorch-lightning/fine_tuning_example.py
@Borda I eventually made a PR with a slightly modified version of the example I proposed.
Thank you for the tutorial! This is much needed. I have a small question -- is there a reason why the frozen parameters cannot be added to the optimizer from the beginning, and have to be added at specific epochs? AFAIK, if requires_grad is False, the optimizer will not update them anyway.
@lizhitwo Sure! I added the parameters separately to emphasize the idea that these parameters were not trained before a given epoch. AFAIK, what you're proposing would work as well!
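For readers landing here, a minimal stand-alone sketch of the "add the parameters to the optimizer at a given epoch" idea being discussed (plain PyTorch with placeholder modules, not the actual example code; in a LightningModule this logic would typically live in an epoch hook):

```python
import torch

backbone = torch.nn.Linear(32, 512)    # stand-in for the pretrained part, frozen at first
head = torch.nn.Linear(512, 10)        # trained from the start
for p in backbone.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

# ...later, at the chosen epoch: unfreeze the backbone and register it as a
# new parameter group, optionally with its own (smaller) learning rate
for p in backbone.parameters():
    p.requires_grad = True
optimizer.add_param_group({"params": backbone.parameters(), "lr": 1e-5})
```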
@jbschiratti Thanks for the explanation!
@jbschiratti (and maybe someone else): is there a big advantage to either approach? I understand you did it this way to emphasize the approach, but in general, which would be more optimal? I came across a discussion on the PyTorch forum (https://discuss.pytorch.org/t/passing-a-subset-of-the-parameters-to-an-optimizer-equivalent-to-setting-requires-grad-of-subset-only-to-true/42866) where it is suggested that passing all the parameters to the optimizer and then marking the frozen ones with the requires_grad=False flag prevents the gradients from being computed, and consequently saves some memory. Not sure if this is still relevant, as the discussion is a year old...
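As a quick stand-alone illustration of the mechanism that forum thread describes (not code from this issue): parameters with requires_grad=False never receive a .grad tensor, so no gradient memory is allocated for them and the optimizer simply skips them.

```python
import torch

layer = torch.nn.Linear(4, 4)
layer.weight.requires_grad = False          # "frozen"

layer(torch.randn(2, 4)).sum().backward()

print(layer.weight.grad)   # None: no gradient was computed or stored
print(layer.bias.grad)     # an actual gradient tensor
```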
@jbschiratti I completely understand it now, and I definitely agree this is the best way to have it in the example to show different possibilities. Thanks for the quick explanation.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hey, I'm trying to do something similar with some categorical data, where I want to freeze the pre-trained model for the initial training and then train the full network slightly. I struggled to find the example, as it's not mentioned in the docs but is hidden away in the GitHub repo. The README.md for the parent directory doesn't mention this example either, so it's not very visible. That said, I think it is a super useful example for transfer learning!
Hi, I'm trying to fine-tune a subset of the parameters of a pre-defined model: only the bias parameters, as done in the BitFit paper. But I'm getting this error: "One of the differentiated Tensors does not require grad". I'm looping through all the parameters (named_parameters) and changing the requires_grad value if "bias" is in the name of the parameter. Did anyone face this issue? Thanks.
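A minimal sketch of the bias-only (BitFit-style) setup, with a toy model as a placeholder for the pre-defined network. Giving the optimizer only the parameters that still require grad avoids handing frozen tensors to code paths that try to differentiate with respect to every optimizer parameter, which is one common source of that error:

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 4),
)

for name, p in model.named_parameters():
    p.requires_grad = "bias" in name   # True for biases, False for weights

# pass only the trainable subset to the optimizer
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)

x, y = torch.randn(8, 16), torch.randn(8, 4)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
```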
I am trying to train a pre-trained resnet50 with a small network on top of it: first with the resnet frozen, then unfreezing it and training the whole network.
My questions are:
1 - Where is the most appropriate place in the framework to create parameter groups?
2 - Does it make sense to add options to freeze/unfreeze to support selectively freezing groups?
Regarding the current implementation of freeze/unfreeze: it has the side effect of setting the model's state to eval/train, which seems inappropriate. If this is of interest, I am happy to make a pull request.
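On question 1, one natural place for parameter groups is configure_optimizers. A minimal sketch, assuming the module holds the backbone and head as self.resnet and self.head (illustrative names, and the learning rates are placeholders):

```python
def configure_optimizers(self):
    return torch.optim.Adam(
        [
            {"params": self.resnet.parameters(), "lr": 1e-5},  # pretrained backbone
            {"params": self.head.parameters(), "lr": 1e-3},    # small network on top
        ]
    )
```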