Official Open Source code for "Masked Autoencoders As Spatiotemporal Learners"

facebookresearch/mae_st


Masked Autoencoders As Spatiotemporal Learners: A PyTorch Implementation

This is a PyTorch/GPU re-implementation of the paper Masked Autoencoders As Spatiotemporal Learners:

@Article{MaskedAutoencodersSpatiotemporal2022,
  author  = {Christoph Feichtenhofer and Haoqi Fan and Yanghao Li and Kaiming He},
  journal = {arXiv:2205.09113},
  title   = {Masked Autoencoders As Spatiotemporal Learners},
  year    = {2022},
}

Another implementation that supports AVA and SSv2 downstream evaluation is available in PySlowFast.

  • This repo is a modification of the MAE repo. For installation and preparation, follow INSTALL.md.

  • This repo is based on timm==0.3.2, for which a fix is needed to work with PyTorch 1.8.1+.

Catalog

  • Visualization demo
  • Pre-trained checkpoints + fine-tuning code + testing code
  • Pre-training code

Visualization demo

Visualization of MAE output with 95% (left) and 98% (right) mask rate on the same video.
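
To make these mask rates concrete, here is a minimal sketch of random masking over spacetime patch tokens, written in plain Python rather than with this repo's code. The clip size (16 frames of 224x224) and patch size (2x16x16) are assumptions used for illustration, and `random_spacetime_mask` is a hypothetical helper, not a function from this repository.

```python
import random


def random_spacetime_mask(num_tokens, mask_ratio, seed=0):
    """Randomly split token indices into (kept, masked) lists.

    Tokens are sampled uniformly, without regard to their position in
    space or time.
    """
    len_keep = int(num_tokens * (1 - mask_ratio))
    rng = random.Random(seed)
    order = list(range(num_tokens))
    rng.shuffle(order)
    kept = sorted(order[:len_keep])
    masked = sorted(order[len_keep:])
    return kept, masked


# Assumed input: 16 frames of 224x224, patch size 2x16x16 (time x height x width).
t, h, w = 16 // 2, 224 // 16, 224 // 16
num_tokens = t * h * w  # 8 * 14 * 14 = 1568 tokens per clip

for ratio in (0.90, 0.95, 0.98):
    kept, masked = random_spacetime_mask(num_tokens, ratio)
    print(f"mask ratio {ratio:.2f}: {len(kept)} visible, {len(masked)} masked")
```

Under these assumptions, only a small fraction of the 1568 tokens remains visible to the encoder at the mask rates shown in the figure.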

Run our interactive visualization demo using the Colab notebook (no GPU needed).

Fine-tuning with pre-trained checkpoints

The following table provides the pre-trained checkpoints used in the paper, pre-trained with a 90% mask ratio for 1600 effective epochs and converted from the PySlowFast codebase:

                                         ViT-Large   ViT-Huge
pre-trained checkpoint on Kinetics-400   download    download
md5                                      edf3a5      3d7f64
pre-trained checkpoint on Kinetics-600   download    download
md5                                      9a9645      27495e
pre-trained checkpoint on Kinetics-700   download    download
md5                                      cdbada      4c4e3c
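
The md5 values above are six characters long, so they appear to be shortened digests. Assuming each one is the leading portion of the file's full MD5 hex digest, a downloaded checkpoint could be checked with a sketch like the following. The function names and the example file name are hypothetical.

```python
import hashlib


def md5_hex(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed_prefix):
    """Return True if the file's MD5 digest starts with the listed value.

    Assumption: the listed value is a prefix of the full digest.
    """
    return md5_hex(path).startswith(listed_prefix.lower())


# Example (hypothetical local file name for the Kinetics-400 ViT-Large checkpoint):
# print(matches_listed_md5("vit_large_k400.pth", "edf3a5"))
```

A six-character prefix only catches accidental corruption or a wrong file; it is not a strong integrity guarantee.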

Fine-tuning instructions are in FINETUNE.md.

Pre-training

Pre-training instructions are in PRETRAIN.md.

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.
