DropMAE

🌟 The code for our CVPR 2023 paper 'DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks'. [Link]

Project Page

[Link]

If you find our work useful in your research, please consider citing:

@inproceedings{dropmae2023,
  title={DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks},
  author={Qiangqiang Wu and Tianyu Yang and Ziquan Liu and Baoyuan Wu and Ying Shan and Antoni B. Chan},
  booktitle={CVPR},
  year={2023}
}

Overall Architecture

Frame Reconstruction Results.

DropMAE leverages more temporal cues for reconstruction.

Catalog

Environment setup

This repo is based on the MAE repo, and installation follows that repo. You can also check our requirements file.

Dataset Download

For DropMAE pre-training, we mainly use the Kinetics datasets, which can be downloaded from this Link. We use the raw training videos (*.mp4) for training. Detailed download instructions can also be found here.

DropMAE pre-training

To pre-train ViT-Base (the default configuration) with multi-node distributed training, run the following on 8 nodes with 8 GPUs each:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=8 \
--node_rank=$INDEX --master_addr=$CHIEF_IP --master_port=1234  main_pretrain_kinetics.py --batch_size 64 \
--model mae_vit_base_patch16 \
--norm_pix_loss \
--mask_ratio 0.75 \
--epochs 400 \
--warmup_epochs 40 \
--blr 1.5e-4 \
--weight_decay 0.05 \
--P 0.1 \
--frame_gap 50 \
--data_path $data_path_to_k400_training_videos \
--output_dir $output_dir \
--log_dir $log_dir

Here the effective batch size is 64 (batch_size per GPU) * 8 (nodes) * 8 (GPUs per node) = 4096. If memory or the number of GPUs is limited, use --accum_iter to maintain the effective batch size, as in MAE.
P is the spatial-attention dropout ratio for DropMAE.
data_path is the path to the Kinetics (e.g., K400 or K700) training video folder.
Our implementation uses exactly the same hyper-parameters and configs (initialization, augmentation, etc.) as MAE.
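
The batch-size arithmetic above can be checked with a small helper before launching on a smaller cluster. This is a minimal sketch, not code from this repo: the function names are hypothetical, and the learning-rate line assumes the linear scaling rule used in the MAE repo (lr = blr * effective batch size / 256) carries over unchanged.

```python
def effective_batch_size(batch_size_per_gpu, gpus_per_node, nodes, accum_iter=1):
    """Samples that contribute to one optimizer step."""
    return batch_size_per_gpu * gpus_per_node * nodes * accum_iter


def accum_iter_for(target_effective, batch_size_per_gpu, gpus_per_node, nodes):
    """Value of --accum_iter needed to reach the target effective batch size."""
    per_step = batch_size_per_gpu * gpus_per_node * nodes
    if target_effective % per_step != 0:
        raise ValueError("target is not a multiple of the per-step batch size")
    return target_effective // per_step


def absolute_lr(blr, effective):
    """Assumed MAE-style linear scaling: lr = blr * effective batch size / 256."""
    return blr * effective / 256


if __name__ == "__main__":
    eff = effective_batch_size(64, 8, 8)     # the 8-node command above
    print(eff, absolute_lr(1.5e-4, eff))
    print(accum_iter_for(4096, 64, 8, 1))    # a single node with 8 GPUs
```

For example, a single node with 8 GPUs and the same per-GPU batch size would need --accum_iter 8 to keep the effective batch size at 4096.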

Training logs

The pre-training logs of K400-1600E and K700-800E are provided.

Pre-trained Models

We also provide pre-trained ViT-Base models on the K400 and K700 datasets.
You can use them as initialization weights for your tracking model to improve downstream performance.

	K400-1600E	K700-800E
pre-trained checkpoint	download	download
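
When initializing a tracker backbone from one of these checkpoints, the decoder weights are usually not needed. The sketch below is an illustration under assumptions, not this repo's loading code: it assumes the checkpoint follows the MAE repo layout (weights stored under a "model" key, decoder parameters named with a "decoder_" prefix plus a "mask_token" entry), and the helper name encoder_weights is hypothetical.

```python
# Assumed parameter-name prefixes for the MAE-style decoder; verify them
# against the keys in the checkpoint you downloaded.
DECODER_PREFIXES = ("decoder_", "mask_token")


def encoder_weights(checkpoint):
    """Return only the encoder entries of an MAE-style checkpoint dict.

    Accepts either {"model": state_dict, ...} or a bare state_dict.
    """
    state = checkpoint.get("model", checkpoint)
    return {
        name: value
        for name, value in state.items()
        if not name.startswith(DECODER_PREFIXES)
    }


if __name__ == "__main__":
    # Toy stand-in for a real checkpoint; values would normally be tensors.
    toy = {"model": {"patch_embed.proj.weight": 1,
                     "blocks.0.attn.qkv.weight": 2,
                     "mask_token": 3,
                     "decoder_blocks.0.attn.qkv.weight": 4}}
    print(sorted(encoder_weights(toy)))
```

With PyTorch, the checkpoint dict would typically come from torch.load(path, map_location="cpu"), and the filtered weights can be passed to your backbone's load_state_dict(..., strict=False); inspect the returned missing and unexpected keys to confirm the names line up with your model.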

Fine-tuning on VOT

OSTrack initialized with our DropMAE pre-trained models achieves state-of-the-art performance on popular tracking benchmarks.

Tracker	GOT-10K (AO)	LaSOT (AUC)	LaSOT_ext (AUC)	TrackingNet (AUC)	TNL2K (AUC)
DropTrack-K700-ViTBase	75.9	71.8	52.7	84.1	56.9

The detailed fine-tuning code and models can be found in our DropTrack repository.

Fine-tuning on VOS

Details of VOS fine-tuning can be found in our DropSeg repository.
