Adding pretrained ViT weights #5085
Conversation
💊 CI failures summary and remediations

As of commit 3aee34e (more details on the Dr. CI page):

1 failure not recognized by patterns.

🚧 3 ongoing upstream failures: these were probably caused by upstream breakages that are not fixed yet.

This comment was automatically generated by Dr. CI. Please report bugs/suggestions to the (internal) Dr. CI Users group.
Thanks @sallysyw. From #5086 I understand that some of the optimizations applied to all other models were not used here. Can you confirm that you tried them and found they were not beneficial? If you haven't used them, I would strongly recommend doing a few more runs prior to merging this, to confirm we pushed the accuracy as high as we can. The above tricks helped a multitude of models, including ResNets, RegNets (unpublished, coming soon), MobileNetV3, EfficientNet and ResNeXt.
Ok. Let me upload the checkpoint based on the epoch/job-id with the best EMA weights and re-run the testing using the EMA model...
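For reference, pulling the EMA weights out of a reference-training checkpoint might look like the minimal sketch below, assuming the checkpoint stores the averaged model under a "model_ema" key (as torchvision's references/classification script does when --model-ema is enabled); the file path and key handling are assumptions, not code from this PR.

```python
import torch
from torchvision.prototype import models  # ViT lives in the prototype namespace per #4594

# Load the training checkpoint (path is a placeholder).
checkpoint = torch.load("checkpoint.pth", map_location="cpu")

model = models.vit_b_16()

# The EMA wrapper's state dict typically carries an extra "n_averaged"
# buffer and a "module." prefix; strip both before loading (assumption
# based on the AveragedModel-style wrapper in the reference scripts).
ema_state = {
    k.replace("module.", "", 1): v
    for k, v in checkpoint["model_ema"].items()
    if k != "n_averaged"
}
model.load_state_dict(ema_state)
model.eval()
```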
Here is the updated version of our best results (all tested with batch-size=1):
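For context, the batch-size=1 evaluation can be reproduced with the reference classification script; the invocation below is an assumed example (flags taken from references/classification/train.py), not the exact command used here:

python train.py --model vit_b_16 --test-only --batch-size 1 --model-ema --resume checkpoint.pth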
@sallysyw Thanks for the clarifications and for making the necessary changes. If I understand correctly, […]

Moreover, there are a few more changes required in the code now that you are introducing weights. You need to replace the […]

With the above changes, we are getting close to being able to merge the PR. Please note that before doing so, we will need to finish some of the pending steps, such as deploying the models on Manifold and adding them to the torchvision/models folder on our AWS infra. Let's finish these prior to merging. I'll send you offline the guide that describes the model release process.
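For illustration: the later commit summary mentions fixing handle_legacy_interface and updating the recipe URL, so wiring a new weights entry into the prototype multi-weight API might look roughly like the sketch below. The enum and entry names, module paths, URL and meta fields are all assumptions for illustration, not the actual PR code.

```python
from functools import partial

from torchvision.prototype.models._api import Weights, WeightsEnum  # module path assumed
from torchvision.prototype.models._utils import handle_legacy_interface  # module path assumed
from torchvision.prototype.transforms import ImageNetEval  # preset name assumed


class ViT_B_16_Weights(WeightsEnum):
    ImageNet1K_V1 = Weights(
        # Placeholder URL; the real checkpoint is hosted on download.pytorch.org.
        url="https://download.pytorch.org/models/vit_b_16-<hash>.pth",
        transforms=partial(ImageNetEval, crop_size=224),
        meta={
            # "recipe" points at the training commands, per the commit summary;
            # the exact URL is an assumption.
            "recipe": "https://github.com/pytorch/vision/tree/main/references/classification#vit_b_16",
        },
    )


# Maps the legacy pretrained=True flag onto the default weights entry.
@handle_legacy_interface(weights=("pretrained", ViT_B_16_Weights.ImageNet1K_V1))
def vit_b_16(*, weights=None, progress=True, **kwargs):
    weights = ViT_B_16_Weights.verify(weights)
    ...
```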
LGTM, thanks a lot @sallysyw.
Summary:
* Adding pretrained ViT weights
* Adding recipe as part of meta
* update checkpoints using best ema results
* Fix handle_legacy_interface and update recipe url
* Update README

Reviewed By: datumbox
Differential Revision: D33426965
fbshipit-source-id: 753ce1d1318df3d47da181db06b35b770de26ffc
Differential Revision: D33426965
Original commit changeset: 753ce1d1318d
Original Phabricator Diff: D33426965
fbshipit-source-id: db9a9f51c5365b2dd9c002aa681da0be33b3cb7d
Summary:
* Adding pretrained ViT weights
* Adding recipe as part of meta
* update checkpoints using best ema results
* Fix handle_legacy_interface and update recipe url
* Update README

Reviewed By: sallysyw
Differential Revision: D33479262
fbshipit-source-id: 20d344db0961ed8ae12104c509ebddd17179d286
@sallysyw Is there a plan to add vit-tiny and vit-small (DeiT-Ti, DeiT-Small)?
In #4594, we added the ViT model architecture to the torchvision prototype.
In this PR, we add the pretrained weights for these ViT models :D
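Once the weights land, loading a pretrained ViT from the prototype namespace might look like this minimal sketch (the enum and entry names are assumptions for illustration):

```python
from torchvision.prototype.models import vit_b_16, ViT_B_16_Weights

weights = ViT_B_16_Weights.ImageNet1K_V1  # entry name assumed
model = vit_b_16(weights=weights)
model.eval()

# Each weights entry bundles its inference preprocessing preset.
preprocess = weights.transforms()
```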
Training commands:

vit_b_16:
python -u run_with_submitit.py --timeout 3000 --ngpus 8 --nodes 8 --partition train --model vit_b_16 --batch-size 64 --epochs 300 --opt adamw --lr 0.003 --wd 0.3 --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30 --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema

vit_b_32:
python -u ~/workspace/scripts/run_with_submitit.py --timeout 3000 --ngpus 8 --nodes 2 --partition train --model vit_b_32 --batch-size 256 --epochs 300 --opt adamw --lr 0.003 --wd 0.3 --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30 --lr-warmup-decay 0.033 --amp --label-smoothing 0.1 --mixup-alpha 0.2 --auto-augment imagenet --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema

vit_l_16:
python -u run_with_submitit.py --timeout 3000 --ngpus 8 --nodes 2 --model vit_l_16 --batch-size 64 --lr 0.5 --lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear --auto-augment ta_wide --epochs 600 --random-erase 0.1 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 --weight-decay 0.00002 --norm-weight-decay 0.0 --model-ema --val-resize-size 232 --clip-grad-norm 1 --ra-sampler

vit_l_32:
python -u run_with_submitit.py --timeout 3000 --ngpus 8 --nodes 8 --partition train --model vit_l_32 --batch-size 64 --epochs 300 --opt adamw --lr 0.003 --wd 0.3 --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30 --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema

cc @datumbox