Transfer Learning with Frozen Layers #1314
Comments
I noticed that there's a "--freeze-layer" argument in the yolov3 train.py code. What does it do? It states that it freezes all non-output layers. Could you please clarify this? Thank you. Omobayode |
@glenn-jocher Another dimension to this is generalization. I assume your results are shown for a test dataset. But for generalization to new datasets, freezing might also help prevent overfitting to the training data (and therefore improve robustness/generalization). |
@mphillips-valleyit interesting point, though hard to quantify beyond existing val/test metrics. |
It could be done with separate datasets--models pretrained on COCO, measure their generalization (freezing vs. non-freezing fine-tuning) to OpenImages for common categories. If I'm able to post results on this at some point, I will. |
@mphillips-valleyit how did your custom data training with transfer learning turn out? Could you attach your training log here? |
@glenn-jocher It might be interesting to add a final step: unfreeze and train the complete network again with differentiated learning rates. So the complete training process would be (the default method in fast.ai):
|
@ramonhollands that's an interesting idea, though I'm sure the devil is in the details, such as the epochs you take these actions at, the LRs used, dataset and model etc. I don't have time to investigate further, but you should be able to reproduce the above tutorial and apply the extra steps you propose to quantify differences. If you do please share your results with us. One point to mention is that classification and detection may not share a common set of optimal training steps, so what works for fast.ai may not correlate perfectly to detection architectures like YOLO. Would be very interested to see experimental results. |
I'll take that challenge in the coming weeks. I'm trying to wrap your amazing work in the fast.ai framework to be able to use the best of both worlds, including the fastai learning rate finder, discriminative learning rates, etc. The method should work for detection architectures as well (https://www.youtube.com/watch?v=0frKXR-2PBY). I'll keep you updated. |
I trained a model with an online dataset containing 5 categories, and now I'm trying to fine-tune it with my own images, which contain the same 5 categories plus an additional one. My images are similar to the ones from the online dataset, so I thought that transfer learning would work. However, this is what I obtain while fine-tuning:
When I visualize the labels everything looks correct, so I don't understand why Targets is 0. I also modified the dataset configuration yaml file to add the new category. The fine-tuning works when I remove my additional category and fine-tune with the same 5 categories. Does anybody know what I am doing wrong here? Thanks in advance! |
@aritzLizoain |
@ramonhollands LR finder sounds very cool, but be careful because sometimes LRs that work well for training can cause instabilities without a warmup to ramp the LR from 0 to its initial value. @aritzLizoain no targets found during testing means no labels are found for your images. Follow the Custom training tutorial to create a custom dataset: |
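One way to check the "no labels found" diagnosis is to confirm that every image maps to a label file and that every class id fits the class count in the dataset yaml. The sketch below assumes the usual YOLOv5 layout, where a label path is derived from the image path by swapping the last /images/ directory for /labels/ and the extension for .txt, and where each label row is "class x_center y_center width height". The function names are hypothetical helpers, not part of the repository, and POSIX-style paths are assumed.

```python
def img2label_path(img_path: str) -> str:
    """Map an image path to the label path implied by the images/ -> labels/ convention."""
    sa, sb = "/images/", "/labels/"
    # Swap only the last /images/ directory, then replace the file extension with .txt
    return sb.join(img_path.rsplit(sa, 1)).rsplit(".", 1)[0] + ".txt"


def out_of_range_classes(label_lines, nc: int):
    """Return class ids from YOLO-format rows that fall outside 0..nc-1."""
    bad = []
    for line in label_lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip rows that are not 'class x y w h' (blank or malformed)
        cls = int(float(parts[0]))
        if not 0 <= cls < nc:
            bad.append(cls)
    return bad


# Example: a sixth category (id 5) checked against a yaml that still says nc: 5
rows = ["0 0.5 0.5 0.2 0.2", "5 0.1 0.1 0.1 0.1"]
print(img2label_path("data/images/train/a.jpg"))
print(out_of_range_classes(rows, nc=5))
```

If the derived label paths do not exist on disk, or the class ids exceed nc - 1, the dataset loader has nothing valid to report as targets.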
When using transfer learning (both with and without frozen layers), the model "forgets" the data it was originally trained on (metrics on the original data get worse). I think the problem might be a learning rate that is too large. Could you please give a bit more detail on which hyperparameters should be changed when fine-tuning the model? (Maybe change lr0 to the last LR reached during the original training and remove the warmup epochs, or is that the wrong approach?) |
How do you set a different learning rate for the backbone? |
You have to split the backbone and head parameters and add separate param groups for each, with a different 'lr' argument (https://pytorch.org/docs/stable/optim.html). I wrote some initial code which I'll post later today. |
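A minimal sketch of the split described above, assuming parameter names follow the 'model.N.…' pattern and that layers 0-9 form the backbone. The helper names, the layer boundary, and both learning rates are illustrative assumptions, not code from this repository or from the linked one. The function works on any iterable of (name, parameter) pairs, so it is shown here with placeholder strings instead of real tensors.

```python
def layer_index(name: str) -> int:
    """Extract N from a parameter name shaped like 'model.N.<rest>'."""
    return int(name.split(".")[1])


def split_param_groups(named_params, backbone_layers=10, backbone_lr=1e-3, head_lr=1e-2):
    """Build optimizer param groups: one for backbone layers, one for everything after."""
    backbone, head = [], []
    for name, param in named_params:
        (backbone if layer_index(name) < backbone_layers else head).append(param)
    return [
        {"params": backbone, "lr": backbone_lr},  # earlier layers, smaller steps
        {"params": head, "lr": head_lr},          # later layers, larger steps
    ]


# Placeholder "parameters" so the sketch runs without any deep learning library
named = [
    ("model.0.conv.weight", "w0"),
    ("model.9.cv1.conv.weight", "w9"),
    ("model.10.conv.weight", "w10"),
    ("model.24.m.0.bias", "b24"),
]
for group in split_param_groups(named):
    print(group["lr"], group["params"])
```

With a real model, the returned list could be passed as the first argument to a PyTorch optimizer, for example torch.optim.SGD(split_param_groups(model.named_parameters()), lr=0.01, momentum=0.9), where the top-level lr only acts as the default for groups that do not set their own.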
See https://github.com/ramonhollands/different_learning_rates/blob/master/train.py I have added two parameters to experiment with:
I started some experiments which were encouraging, but I have not had enough time to finish them yet. |
@glenn-jocher I am curious about the result plots in "Accuracy Comparison": why does the mAP of exp9_freeze_all increase as training progresses? If all params are frozen, they won't be optimized, so shouldn't performance be a flat line? |
@laisimiao exp9_freeze_all freezes all layers except the output layer, which has an active gradient. |
@glenn-jocher sir, while I was doing transfer learning in yolov5, the electricity cut off and the training stopped. Can I still continue the training? Can I use --resume? How do I continue interrupted transfer learning? |
So have you figured out how to unfreeze the backbone after a given epoch? |
The most straightforward way would be to use a smaller existing model of You can further set the numbers to be smaller if you want a smaller model than the
https://github.com/ultralytics/yolov5/blob/master/models/yolov5n.yaml#L5-L6 |
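The two lines linked above are the model yaml's depth and width multiples. As a rough sketch of what shrinking them does, assuming the model parser rounds scaled channel counts up to a multiple of 8 and scales repeat counts only for blocks repeated more than once (the behaviour recent YOLOv5 versions appear to use), the effect on a layer can be computed like this. The function names are illustrative, and 0.33 / 0.25 are assumed to be the nano model's values.

```python
import math


def make_divisible(x: float, divisor: int = 8) -> int:
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_channels(channels: int, width_multiple: float) -> int:
    """Output channels of a layer after applying the width multiple."""
    return make_divisible(channels * width_multiple, 8)


def scale_repeats(n: int, depth_multiple: float) -> int:
    """Number of block repeats after applying the depth multiple (never below 1)."""
    return max(round(n * depth_multiple), 1) if n > 1 else n


# Channel and repeat counts as written in a model yaml, scaled with assumed nano multiples
for c in (64, 128, 1024):
    print(c, "->", scale_channels(c, 0.25))
for n in (3, 6, 9):
    print(n, "->", scale_repeats(n, 0.33))
```

Lowering either multiple further shrinks the parameter count, but the resulting architecture no longer matches any released checkpoint, so fewer pretrained weights can be reused.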
@bryanbocao hi there! It's great that you're looking to optimize the model for your specific use case. To reduce the number of parameters, you can consider using a smaller existing model such as |
@glenn-jocher Thanks for your reply!
That's what I did eventually :) |
@bryanbocao you're welcome! Great to hear that you found a solution by adjusting the |
I am not sure if this is the right place to ask it. But I have a Yolo5x6 model that I want to "convert" to a Yolo5n or Yolo5s model weights. Is there some technique to do that, without having to retrain the model from scratch? |
@skyprince999 hi there! The process you're referring to is known as model distillation or compression, where a larger model (teacher) is used to guide the training of a smaller model (student). However, directly converting weights from a larger model like YOLOv5x6 to a smaller architecture like YOLOv5n or YOLOv5s isn't straightforward because the architectures differ significantly in terms of layer depth and width. To achieve a smaller model with the knowledge of the larger one, you would typically perform knowledge distillation, which involves training the smaller model using the larger model's outputs as guidance. This process still requires training from scratch but can be faster and result in a more accurate small model than training it directly on the dataset. For now, YOLOv5 does not support direct weight conversion between different model sizes. You would need to train the smaller model using the standard training procedures, potentially using the larger model's weights for initializing the training process, or you could explore knowledge distillation techniques. If you're looking to maintain as much performance as possible without retraining from scratch, you might consider fine-tuning the smaller model on your dataset using the larger model's weights as a starting point. This would involve using the |
Thanks @glenn-jocher for the update. I am aware of the knowledge distillation process. But was wondering if Yolo had some inbuilt mechanism to work with it. Not sure how its done, but I like the idea of initializing the weights of the smaller model with those from the larger model and then training it on the dataset. I'll explore that option. |
@skyprince999 You're welcome! Indeed, YOLOv5 doesn't have an inbuilt mechanism for model distillation, but initializing the smaller model with weights from the larger one and then fine-tuning on your dataset is a practical approach. This method leverages the pre-trained knowledge and can lead to better performance than training from scratch. If you need further guidance on fine-tuning or have any other questions, feel free to reach out. Happy coding! 😊🚀 |
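A caveat on initializing a smaller model from a larger checkpoint: as far as can be told from train.py, checkpoint loading copies only tensors whose name and shape both match the target model. The sketch below mimics that filter on plain shape tuples. Every name and shape in it is made up for illustration, and the function is a stand-in, not the repository's own helper.

```python
def intersect_by_shape(src_shapes: dict, dst_shapes: dict) -> dict:
    """Keep only entries present in both models under the same name with identical shape."""
    return {k: s for k, s in src_shapes.items() if k in dst_shapes and s == dst_shapes[k]}


# Hypothetical tensors of a wide model and a narrow model
large = {
    "stem.conv.weight": (80, 3, 6, 6),
    "stem.bn.weight": (80,),
    "head.anchors": (3, 3, 2),
}
small = {
    "stem.conv.weight": (16, 3, 6, 6),
    "stem.bn.weight": (16,),
    "head.anchors": (3, 3, 2),
}

transferable = intersect_by_shape(large, small)
print(f"{len(transferable)}/{len(small)} tensors would transfer:", sorted(transferable))
```

Because the width multiple changes almost every convolution's shape, very little would carry over between architectures of different size under this rule, which is why distillation or ordinary training of the small model is the more realistic route.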
Hi @ramonhollands, I am doing the same thing. First, I train the model with a dataset by freezing the head (--freeze 24), then I do another training on the same data, initializing the weights with the previously trained weights (output from the training with freezing) and passing those weights as initial weights (--weights runs/exp/weights/last.pt). Additionally, I pass --hyp where lr1 and lr0 are reduced by 10 times. In this process, I am training twice on the same dataset. @glenn-jocher @ramonhollands, can you help me reduce this training time by making it the default, where the head will be trained with a low learning rate and the body will be trained with a high learning rate? |
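Hi @sriram-dsl! Your approach of using a differentiated learning rate after unfreezing the network is indeed a solid strategy, often leading to better fine-tuning of the model. To implement this in YOLOv5, you can adjust the learning rates directly in the hyp.yaml file used during training. Here's a quick example of how you might set this up:
1. Freeze the backbone and train:
python train.py --freeze 24 --weights yolov5s.pt --data yourdata.yaml --epochs 10
2. Unfreeze and train with differentiated learning rates:
python train.py --weights runs/train/exp/weights/last.pt --data yourdata.yaml --epochs 30 --hyp yourhyp.yaml
In your yourhyp.yaml, specify lower learning rates for earlier layers (backbone) and higher for later layers (head):
lr0: 0.001 # lower base learning rate for backbone
lr1: 0.01 # higher base learning rate for head
This setup should help streamline the process and potentially reduce total training time by more effectively leveraging the initial frozen training phase. 😊👍 |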
you mean `lr0` and `lrf` right in hyp.scratch-low.yaml
On Mon, 20 May 2024 at 17:57, Glenn Jocher wrote:
Hi @sriram-dsl! Your approach of using a differentiated learning rate after unfreezing the network is indeed a solid strategy, often leading to better fine-tuning of the model. To implement this in YOLOv5, you can adjust the learning rates directly in the hyp.yaml file used during training. Here's a quick example of how you might set this up:
1. Freeze the backbone and train:
python train.py --freeze 24 --weights yolov5s.pt --data yourdata.yaml --epochs 10
2. Unfreeze and train with differentiated learning rates:
python train.py --weights runs/train/exp/weights/last.pt --data yourdata.yaml --epochs 30 --hyp yourhyp.yaml
In your yourhyp.yaml, specify lower learning rates for earlier layers (backbone) and higher for later layers (head):
lr0: 0.001 # lower base learning rate for backbone
lr1: 0.01 # higher base learning rate for head
This setup should help streamline the process and potentially reduce total training time by more effectively leveraging the initial frozen training phase. 😊👍
|
@sriram-dsl hi there! Yes, you're correct. In the
lr0: 0.001 # lower base learning rate for backbone
lrf: 0.1 # learning rate multiplier for final layers
This configuration helps in fine-tuning the model by applying different learning rates to the backbone and the head. Thanks for pointing that out! 😊👍 |
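A note of caution on the two keys discussed above, offered as a reading of the stock hyperparameter files and not as the maintainers' statement: there, lr0 is described as the initial learning rate and lrf as the final learning rate fraction, so the schedule ends at lr0 * lrf for every parameter group, and neither key selects backbone or head. Setting a separate rate per part of the network would still need optimizer param groups. The sketch below assumes the default linear decay; the function name is illustrative.

```python
def linear_lr(epoch: int, epochs: int, lr0: float, lrf: float) -> float:
    """Learning rate at a given epoch for a linear decay from lr0 down to lr0 * lrf."""
    factor = (1 - epoch / epochs) * (1.0 - lrf) + lrf
    return lr0 * factor


# Walk through a 30-epoch run using the values from the comment above
for epoch in (0, 15, 30):
    print(epoch, linear_lr(epoch, epochs=30, lr0=0.001, lrf=0.1))
```

Under this reading, lowering lr0 for the second (unfrozen) stage slows all layers equally, which is a reasonable fine-tuning choice in its own right but is not the same as a differentiated learning rate.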
📚 This guide explains how to freeze YOLOv5 🚀 layers when transfer learning. Transfer learning is a useful way to quickly retrain a model on new data without having to retrain the entire network. Instead, part of the initial weights are frozen in place, and the rest of the weights are used to compute loss and are updated by the optimizer. This requires fewer resources than normal training and allows for faster training times, though it may also result in reductions to final trained accuracy. UPDATED 28 March 2023.
Before You Start
Clone repo and install requirements.txt in a Python>=3.7.0 environment, including PyTorch>=1.7. Models and datasets download automatically from the latest YOLOv5 release.
Freeze Backbone
All layers that match the freeze list in train.py will be frozen by disabling their gradients (requires_grad = False) before training starts.
yolov5/train.py, lines 119 to 126 in 771ac6c
To see a list of module names:
Looking at the model architecture we can see that the model backbone is layers 0-9:
yolov5/models/yolov5s.yaml
Lines 12 to 48 in 58f8ba7
so we can define the freeze list to contain all modules with 'model.0.' - 'model.9.' in their names:
Freeze All Layers
To freeze the full model except for the final output convolution layers in Detect(), we set freeze list to contain all modules with 'model.0.' - 'model.23.' in their names:
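Both freeze lists described above can be sketched as follows. This is a simplified illustration of name-based freezing and not the exact code from train.py: Param is a stand-in for a framework parameter (only its requires_grad flag matters), and apply_freeze and the sample names are hypothetical.

```python
class Param:
    """Stand-in for a trainable parameter; only the requires_grad flag is modelled."""

    def __init__(self):
        self.requires_grad = True


def apply_freeze(named_params, n_frozen_layers: int):
    """Disable gradients for parameters in layers 0..n_frozen_layers-1; return their names."""
    # The trailing dot stops 'model.1.' from also matching 'model.10.' and beyond
    freeze = [f"model.{i}." for i in range(n_frozen_layers)]
    frozen = []
    for name, param in named_params:
        param.requires_grad = True  # start from a fully trainable model
        if any(prefix in name for prefix in freeze):
            param.requires_grad = False
            frozen.append(name)
    return frozen


names = ["model.0.conv.weight", "model.9.cv2.conv.weight",
         "model.10.conv.weight", "model.24.m.0.weight"]
params = [(n, Param()) for n in names]

print("backbone only:", apply_freeze(params, 10))
print("all but output:", apply_freeze(params, 24))
```

With 24 frozen layers, the parameters under 'model.24.' keep their gradients, which is why a "freeze all" run can still improve during training.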
Results
We train YOLOv5m on VOC on both of the above scenarios, along with a default model (no freezing), starting from the official COCO pretrained --weights yolov5m.pt.
Accuracy Comparison
The results show that freezing speeds up training, but reduces final accuracy slightly.
GPU Utilization Comparison
Interestingly, the more modules are frozen the less GPU memory is required to train, and the lower GPU utilization. This indicates that larger models, or models trained at larger --image-size may benefit from freezing in order to train faster.
Environments
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
Status
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on MacOS, Windows, and Ubuntu every 24 hours and on every commit.