Commit 2767f81

v0.1
1 parent da9da7c commit 2767f81


49 files changed: +11282 -1 lines changed

README.md

+79-1
# DeepMIM

## Introduction
This repository is the official implementation of our paper

**DeepMIM: Deep Supervision for Masked Image Modeling**

[[arxiv](https://arxiv.org/abs/2303.08817)] [[code](https://github.com/OliverRensu/DeepMIM)]

*[Sucheng Ren](https://oliverrensu.github.io/), [Fangyun Wei](https://scholar.google.com/citations?user=-ncz2s8AAAAJ&hl=en), [Samuel Albanie](https://samuelalbanie.com/), [Zheng Zhang](https://stupidzz.github.io/), [Han Hu](https://ancientmooner.github.io/)*

> Deep supervision, which adds extra supervision on the intermediate features of a neural network, was widely used in image classification in the early deep learning era because it significantly reduces training difficulty and eases optimization, e.g., by mitigating vanishing gradients compared with vanilla training. Nevertheless, with the emergence of normalization techniques and residual connections, deep supervision in image classification was gradually phased out. In this paper, we revisit deep supervision for masked image modeling (MIM), which pre-trains a Vision Transformer (ViT) via a mask-and-predict scheme. Experimentally, we find that deep supervision drives the shallower layers to learn more meaningful representations, accelerates model convergence, and increases attention diversity. Our approach, called DeepMIM, significantly boosts the representation capability of each layer. In addition, DeepMIM is compatible with many MIM models across a range of reconstruction targets.
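
For intuition, the deep-supervision idea above can be sketched in a few lines of PyTorch: auxiliary prediction heads are attached to intermediate encoder blocks, each head regresses the reconstruction target on the masked patches, and the intermediate losses are summed with the final-layer loss. This is a minimal illustration only; the class name `ToyDeepMIM`, the choice of supervised layers, and the MSE target are assumptions, not the code actually used in this repository.

```python
# Minimal sketch of deep supervision for masked image modeling (illustration
# only, not this repository's implementation): auxiliary heads on intermediate
# blocks each predict the reconstruction target, and their masked-patch losses
# are summed with the final-layer loss.
import torch
import torch.nn as nn


class ToyDeepMIM(nn.Module):
    def __init__(self, dim=256, depth=12, target_dim=768, supervised_layers=(3, 6, 9)):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True) for _ in range(depth)]
        )
        self.supervised_layers = set(supervised_layers)
        # One lightweight head per supervised intermediate block (hypothetical design).
        self.aux_heads = nn.ModuleDict({str(i): nn.Linear(dim, target_dim) for i in supervised_layers})
        self.final_head = nn.Linear(dim, target_dim)

    def forward(self, tokens, target, mask):
        # tokens: (B, N, dim); target: (B, N, target_dim); mask: (B, N) bool,
        # True where a patch is masked and should be predicted.
        losses = []
        x = tokens
        for i, blk in enumerate(self.blocks):
            x = blk(x)
            if i in self.supervised_layers:
                pred = self.aux_heads[str(i)](x)
                losses.append(((pred - target) ** 2)[mask].mean())
        losses.append(((self.final_head(x) - target) ** 2)[mask].mean())
        return sum(losses)  # deep supervision: intermediate losses + final loss


if __name__ == "__main__":
    B, N = 2, 196
    tokens = torch.randn(B, N, 256)
    target = torch.randn(B, N, 768)   # e.g. pixel or CLIP-feature targets
    mask = torch.rand(B, N) < 0.75    # 75% of patches masked, matching --mask_ratio below
    ToyDeepMIM()(tokens, target, mask).backward()
```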

![method](figures/method.png)

## News
* Code and checkpoints are released!

## Installation
We build this repo based on [MAE](https://github.com/facebookresearch/mae).

## Pretraining
We pretrain DeepMIM on 32 V100 GPUs (4 nodes × 8 GPUs per node) with an overall batch size of 4096, identical to MAE.
```bash
python -m torch.distributed.launch \
    --nnodes 4 --node_rank $noderank \
    --nproc_per_node 8 --master_addr $ip --master_port $port \
    main_pretrain.py \
    --batch_size 128 \
    --model mae_vit_base_patch16 \
    --norm_pix_loss --clip_path /path/to/clip \
    --mask_ratio 0.75 \
    --epochs 1600 \
    --warmup_epochs 40 \
    --blr 1.5e-4 --weight_decay 0.05 \
    --data_path /path/to/imagenet/
```
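
As a sanity check on the launch flags above, the per-GPU batch size (`--batch_size 128`), GPUs per node (`--nproc_per_node 8`), and number of nodes (`--nnodes 4`) multiply to the stated overall batch size:

```python
# Effective batch size implied by the pretraining command above.
batch_size_per_gpu = 128  # --batch_size
gpus_per_node = 8         # --nproc_per_node
nodes = 4                 # --nnodes
print(batch_size_per_gpu * gpus_per_node * nodes)  # 4096, the overall batch size
```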

## Fine-tuning on ImageNet-1K (Classification)
Expected results: 85.6% Top-1 accuracy ([log](./log/FT-log.txt))
```bash
python -m torch.distributed.launch --nproc_per_node=8 main_finetune.py \
    --batch_size 128 \
    --model vit_base_patch16 \
    --finetune ./output_dir/checkpoint-1599.pth \
    --epochs 100 \
    --output_dir ./out_finetune/ \
    --blr 1e-4 --layer_decay 0.6 \
    --weight_decay 0.05 --drop_path 0.1 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
    --dist_eval --data_path /path/to/imagenet
```
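
For intuition on `--layer_decay 0.6`: MAE-style fine-tuning uses layer-wise learning-rate decay, so earlier transformer blocks receive geometrically smaller learning rates than later ones. The sketch below assumes the common MAE/BEiT scaling rule (`scale = layer_decay ** (num_layers + 1 - layer_id)`); see MAE's `util/lr_decay.py` for the rule that MAE-based codebases typically inherit.

```python
# Illustrative per-layer learning-rate scales for ViT-B (12 blocks) with
# layer_decay = 0.6. base_lr stands in for the absolute learning rate; the
# actual scripts derive it from --blr and the effective batch size.
base_lr = 1e-4
layer_decay = 0.6
num_layers = 12  # transformer blocks in vit_base_patch16

for layer_id in range(num_layers + 2):  # 0 = patch embed, 13 = head / final norm
    scale = layer_decay ** (num_layers + 1 - layer_id)
    print(f"layer {layer_id:2d}: lr scale = {scale:.4f}, lr = {base_lr * scale:.2e}")
```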

## Fine-tuning on ADE20K (Semantic Segmentation)
Please refer to [Segmentation/README.md](./Segmentation/README.md).

## Checkpoint
The pretrained and fine-tuned models on ImageNet-1K are available at
[[Google Drive](https://drive.google.com/drive/folders/1VLJX93RTnCLvIThLxmp71eBsm41HP0sw?usp=sharing)]
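
A downloaded checkpoint can be inspected with plain PyTorch before fine-tuning. A small sketch, assuming the MAE convention of storing weights under a `'model'` key (an assumption, not guaranteed by this repo):

```python
import torch

# Peek at the first few weight tensors of a downloaded checkpoint.
# The 'model' key follows the MAE convention and is an assumption here.
ckpt = torch.load("DeepMIM-CLIP-PT.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # fall back to a bare state dict
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```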

## Comparison
Performance comparison on ImageNet-1K classification and ADE20K semantic segmentation.
|Method|Model size|Top-1 Acc. (%)|mIoU (%)|
|---|:---:|:---:|:---:|
|MAE|ViT-B|83.6|48.1|
|DeepMIM-CLIP|ViT-B|85.6|53.1|

## Citation
If you have any questions, feel free to contact [Sucheng Ren]([email protected]) :)
```
@article{ren2023deepmim,
  title={DeepMIM: Deep Supervision for Masked Image Modeling},
  author={Sucheng Ren and Fangyun Wei and Samuel Albanie and Zheng Zhang and Han Hu},
  year={2023},
  eprint={2303.08817},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

Segmentation/README.md

+43

# ADE20k Semantic Segmentation with DeepMIM

## Getting started

1. Install the [mmsegmentation](https://github.com/open-mmlab/mmsegmentation) library and some required packages.

```bash
pip install mmcv-full==1.3.0 mmsegmentation==0.11.0
pip install scipy timm==0.3.2
```

2. Install [apex](https://github.com/NVIDIA/apex) for mixed-precision training.

```bash
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```

3. Follow the guide in [mmseg](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/dataset_prepare.md) to prepare the ADE20k dataset.

## Fine-tuning with DeepMIM-CLIP
Command:
```bash
bash tools/dist_train.sh \
    configs/mae/upernet_mae_base_12_512_slide_160k_ade20k.py 8 --seed 0 --work-dir ./ckpt/ \
    --options model.pretrained="/path/to/DeepMIM-CLIP-PT.pth"
```
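
The `--options model.pretrained=...` flag overrides the config value from the command line. For reference, the equivalent in Python with the mmcv `Config` API (a sketch; the checkpoint path is a placeholder):

```python
from mmcv import Config

# Load the segmentation config and point model.pretrained at the DeepMIM-CLIP
# checkpoint, mirroring the --options override in the command above.
cfg = Config.fromfile("configs/mae/upernet_mae_base_12_512_slide_160k_ade20k.py")
cfg.model.pretrained = "/path/to/DeepMIM-CLIP-PT.pth"
print(cfg.model.pretrained)
```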

Expected results ([log](./log/DeepMIM-Seg.log)):
```
+--------+-------+-------+-------+
| Scope  | mIoU  | mAcc  | aAcc  |
+--------+-------+-------+-------+
| global | 53.05 | 64.18 | 84.73 |
+--------+-------+-------+-------+
```

## Checkpoint
The checkpoint can be found in [Google Drive](https://drive.google.com/drive/folders/1VLJX93RTnCLvIThLxmp71eBsm41HP0sw?usp=sharing).

## Acknowledgement
This repository is built using [mae segmentation](https://github.com/implus/mae_segmentation) and [mmseg](https://github.com/open-mmlab/mmsegmentation).
