Adding pretrained ViT weights #5085
Conversation
💊 CI failures summary and remediations

As of commit 3aee34e (more details on the Dr. CI page):

1 failure not recognized by patterns.

🚧 3 ongoing upstream failures: these were probably caused by upstream breakages that are not fixed yet.

This comment was automatically generated by Dr. CI. Please report bugs/suggestions to the (internal) Dr. CI Users group.
Thanks @sallysyw. From #5086 I understand that some of the optimizations applied to all other models were not used here. Can you confirm that you tried them and found they were not beneficial? If you haven't used them, I would strongly recommend doing a few more runs prior to merging this, to confirm we pushed the accuracy as high as we can. The above tricks helped a multitude of models, including ResNets, RegNets (unpublished, coming soon), MobileNetV3, EfficientNet and ResNeXt.
Ok. Let me upload the checkpoint based on the epoch/job-id with the best EMA weights and re-run the testing using the EMA model...
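For reference, pulling the EMA weights out of a reference-training checkpoint might look like the minimal sketch below, assuming the checkpoint stores the averaged model under a "model_ema" key (as torchvision's references/classification script does when --model-ema is enabled); the file path and key handling are assumptions, not code from this PR.

```python
import torch
from torchvision.prototype import models  # ViT lives in the prototype namespace per #4594

# Load the training checkpoint (path is a placeholder).
checkpoint = torch.load("checkpoint.pth", map_location="cpu")

model = models.vit_b_16()

# The EMA wrapper's state dict typically carries an extra "n_averaged"
# buffer and a "module." prefix; strip both before loading (assumption
# based on the AveragedModel-style wrapper in the reference scripts).
ema_state = {
    k.replace("module.", "", 1): v
    for k, v in checkpoint["model_ema"].items()
    if k != "n_averaged"
}
model.load_state_dict(ema_state)
model.eval()
```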
Here is the updated version of our best results (all tested with batch-size=1):
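For context, the batch-size=1 evaluation can be reproduced with the reference classification script; the invocation below is an assumed example (flags taken from references/classification/train.py), not the exact command used here:

python train.py --model vit_b_16 --test-only --batch-size 1 --model-ema --resume checkpoint.pth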
@sallysyw Thanks for the clarifications and for making the necessary changes. If I understand correctly, […]

Moreover, there are a few more changes required in the code now that you are introducing weights. You need to replace the […]

With the above changes, we are getting close to being able to merge the PR. Please note that before doing so, we will need to finish some of the pending steps, such as deploying the models on Manifold and adding them to the torchvision/models folder on our AWS infra. Let's finish these prior to merging. I'll send you offline the guide that describes the model release process.
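For illustration: the later commit summary mentions fixing handle_legacy_interface and updating the recipe URL, so wiring a new weights entry into the prototype multi-weight API might look roughly like the sketch below. The enum and entry names, module paths, URL and meta fields are all assumptions for illustration, not the actual PR code.

```python
from functools import partial

from torchvision.prototype.models._api import Weights, WeightsEnum  # module path assumed
from torchvision.prototype.models._utils import handle_legacy_interface  # module path assumed
from torchvision.prototype.transforms import ImageNetEval  # preset name assumed


class ViT_B_16_Weights(WeightsEnum):
    ImageNet1K_V1 = Weights(
        # Placeholder URL; the real checkpoint is hosted on download.pytorch.org.
        url="https://download.pytorch.org/models/vit_b_16-<hash>.pth",
        transforms=partial(ImageNetEval, crop_size=224),
        meta={
            # "recipe" points at the training commands, per the commit summary;
            # the exact URL is an assumption.
            "recipe": "https://github.com/pytorch/vision/tree/main/references/classification#vit_b_16",
        },
    )


# Maps the legacy pretrained=True flag onto the default weights entry.
@handle_legacy_interface(weights=("pretrained", ViT_B_16_Weights.ImageNet1K_V1))
def vit_b_16(*, weights=None, progress=True, **kwargs):
    weights = ViT_B_16_Weights.verify(weights)
    ...
```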
LGTM, thanks a lot @sallysyw.
Summary:
* Adding pretrained ViT weights
* Adding recipe as part of meta
* update checkpoints using best ema results
* Fix handle_legacy_interface and update recipe url
* Update README

Reviewed By: datumbox
Differential Revision: D33426965
fbshipit-source-id: 753ce1d1318df3d47da181db06b35b770de26ffc
Differential Revision: D33426965
Original commit changeset: 753ce1d1318d
Original Phabricator Diff: D33426965
fbshipit-source-id: db9a9f51c5365b2dd9c002aa681da0be33b3cb7d
Summary:
* Adding pretrained ViT weights
* Adding recipe as part of meta
* update checkpoints using best ema results
* Fix handle_legacy_interface and update recipe url
* Update README

Reviewed By: sallysyw
Differential Revision: D33479262
fbshipit-source-id: 20d344db0961ed8ae12104c509ebddd17179d286
@sallysyw Is there a plan to add vit-tiny and vit-small (DeiT-Ti, DeiT-Small)?
In #4594, we added the ViT model architecture to the torchvision prototype.
In this PR, we add the pretrained weights for these ViT models :D
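Once the weights land, loading a pretrained ViT from the prototype namespace might look like this minimal sketch (the enum and entry names are assumptions for illustration):

```python
from torchvision.prototype.models import vit_b_16, ViT_B_16_Weights

weights = ViT_B_16_Weights.ImageNet1K_V1  # entry name assumed
model = vit_b_16(weights=weights)
model.eval()

# Each weights entry bundles its inference preprocessing preset.
preprocess = weights.transforms()
```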
Training commands:

vit_b_16:
python -u run_with_submitit.py --timeout 3000 --ngpus 8 --nodes 8 --partition train --model vit_b_16 --batch-size 64 --epochs 300 --opt adamw --lr 0.003 --wd 0.3 --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30 --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema

vit_b_32:
python -u ~/workspace/scripts/run_with_submitit.py --timeout 3000 --ngpus 8 --nodes 2 --partition train --model vit_b_32 --batch-size 256 --epochs 300 --opt adamw --lr 0.003 --wd 0.3 --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30 --lr-warmup-decay 0.033 --amp --label-smoothing 0.1 --mixup-alpha 0.2 --auto-augment imagenet --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema

vit_l_16:
python -u run_with_submitit.py --timeout 3000 --ngpus 8 --nodes 2 --model vit_l_16 --batch-size 64 --lr 0.5 --lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear --auto-augment ta_wide --epochs 600 --random-erase 0.1 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 --weight-decay 0.00002 --norm-weight-decay 0.0 --model-ema --val-resize-size 232 --clip-grad-norm 1 --ra-sampler

vit_l_32:
python -u run_with_submitit.py --timeout 3000 --ngpus 8 --nodes 8 --partition train --model vit_l_32 --batch-size 64 --epochs 300 --opt adamw --lr 0.003 --wd 0.3 --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30 --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema

cc @datumbox