This is a PyTorch/GPU re-implementation of the paper Masked Autoencoders As Spatiotemporal Learners:
```
@Article{MaskedAutoencodersSpatiotemporal2022,
  author  = {Christoph Feichtenhofer and Haoqi Fan and Yanghao Li and Kaiming He},
  journal = {arXiv:2205.09113},
  title   = {Masked Autoencoders As Spatiotemporal Learners},
  year    = {2022},
}
```
Another implementation that supports AVA and SSv2 downstream evaluation is available in PySlowFast.
This repo is a modification of the MAE repo. Installation and preparation follow INSTALL.md.
This repo is based on `timm==0.3.2`, for which a fix is needed to work with PyTorch 1.8.1+.
This repo provides:

- Visualization demo
- Pre-trained checkpoints + fine-tuning code + testing code
- Pre-training code
Visualization of MAE output with 95% (left) and 98% (right) mask rate on the same video.
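The pre-training objective masks the vast majority of spacetime patches (90% in the checkpoints below; 95–98% in the visualizations above) and reconstructs them. As a rough sketch of how such a mask can be sampled — this is a hypothetical NumPy helper, not the repo's implementation, which operates on token sequences inside the model — a uniform random mask over all T×H×W patch tokens looks like:

```python
import numpy as np

def random_spacetime_mask(t, h, w, mask_ratio, seed=0):
    """Sample a uniform random mask over t*h*w spacetime patches.

    Returns a boolean array of shape (t, h, w) where True = masked.
    Hypothetical helper for illustration; the actual repo masks token
    sequences via argsort of random noise inside the encoder.
    """
    rng = np.random.default_rng(seed)
    n = t * h * w
    n_keep = int(round(n * (1 - mask_ratio)))  # visible patches
    noise = rng.random(n)
    ids_shuffle = np.argsort(noise)            # random permutation of patch ids
    mask = np.ones(n, dtype=bool)              # start fully masked
    mask[ids_shuffle[:n_keep]] = False         # unmask the kept subset
    return mask.reshape(t, h, w)

# e.g. 8 temporal x 14x14 spatial patches at a 90% mask ratio:
mask = random_spacetime_mask(8, 14, 14, mask_ratio=0.9)
print(mask.mean())  # fraction masked, ~0.9
```

Only the unmasked ~10% of tokens are fed to the encoder, which is what makes the high mask ratio computationally attractive.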
Run our interactive visualization demo using a Colab notebook (no GPU needed):
The following table provides the pre-trained checkpoints used in the paper, pretrained with 90% mask ratio and 1600 effective epochs, converted from the PySlowFast codebase:
|  | ViT-Large | ViT-Huge |
| --- | --- | --- |
| pre-trained checkpoint on Kinetics-400 | download | download |
| md5 | edf3a5 | 3d7f64 |
|  | ViT-Large | ViT-Huge |
| --- | --- | --- |
| pre-trained checkpoint on Kinetics-600 | download | download |
| md5 | 9a9645 | 27495e |
|  | ViT-Large | ViT-Huge |
| --- | --- | --- |
| pre-trained checkpoint on Kinetics-700 | download | download |
| md5 | cdbada | 4c4e3c |
The fine-tuning instruction is in FINETUNE.md.
The pre-training instruction is in PRETRAIN.md.
This project is under the CC-BY-NC 4.0 license. See LICENSE for details.