A simple, unofficial implementation of MAE (Masked Autoencoders Are Scalable Vision Learners) using pytorch-lightning. A PyTorch implementation by the authors can be found here.
Currently implements training on CUB, StanfordCars, and STL-10, but is easily extensible to any other image dataset.
- Updated for compatibility with PyTorch 2.0 and PyTorch-Lightning 2.0. This probably breaks backwards compatibility; a release was created for the old version of the code.
- Modified parts of the training code for better conciseness and efficiency.
- Added additional features, including the option to save some validation reconstructions during training. Note: saving reconstructions during distributed training is currently broken; training freezes at the end of the validation epoch.
- Retrained CUB and Cars models with new code and a stronger decoder.
- Fixed a bug in the code for generating mask indices (a sketch of the masking scheme follows this list). Retrained and updated the reconstruction figures (see below); they aren't quite as pretty now, but they make more sense.
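The masking follows the per-sample random-shuffle scheme from the MAE paper. Below is a minimal sketch of that scheme; the function name and return convention are illustrative, not the repo's actual API.

```python
import torch

def random_mask_indices(batch_size: int, num_patches: int, mask_ratio: float = 0.75):
    """Per-sample random masking in the style of the MAE paper (illustrative sketch)."""
    num_keep = int(num_patches * (1 - mask_ratio))
    # Random score per patch; argsort yields an independent random permutation per sample.
    noise = torch.rand(batch_size, num_patches)
    shuffle = noise.argsort(dim=1)        # random permutation of patch indices
    restore = shuffle.argsort(dim=1)      # inverse permutation, used to unshuffle tokens
    keep = shuffle[:, :num_keep]          # indices of the visible (unmasked) patches
    # Binary mask in original patch order: 0 = visible, 1 = masked.
    mask = torch.ones(batch_size, num_patches)
    mask[:, :num_keep] = 0
    mask = mask.gather(dim=1, index=restore)
    return keep, mask, restore
```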
```bash
# Clone the repository
git clone https://github.com/catalys1/mae-pytorch.git
cd mae-pytorch

# Install required libraries (preferably inside a virtual environment)
pip install -r requirements.txt

# Set up .env with the path to your data
echo "DATADIR=/path/to/data" > .env
```
Training options are provided through configuration files, handled by LightningCLI. See `config/` for examples.
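A LightningCLI entry point typically looks like the sketch below; `MAE`, `MAEDataModule`, and their import paths are placeholders, not necessarily the repo's actual names. Depending on how Lightning is installed, the import may be `lightning.pytorch.cli` instead.

```python
from pytorch_lightning.cli import LightningCLI

# Placeholder imports: the actual module paths in this repo may differ.
from mae.model import MAE
from mae.data import MAEDataModule

def main():
    # LightningCLI wires together argparse, YAML configs (-c), the model,
    # the datamodule, and the Trainer; subcommands such as `fit` come for free.
    LightningCLI(MAE, MAEDataModule)

if __name__ == "__main__":
    main()
```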
Train an MAE model on the CUB dataset:
```bash
python train.py fit -c config/mae.yaml -c config/data/cub_mae.yaml
```
Using multiple GPUs:
```bash
python train.py fit -c config/mae.yaml -c config/data/cub_mae.yaml --trainer.devices 8
```
Evaluate the learned representations using a linear probe. First, pretrain the model on the 100,000 samples of the `unlabeled` split:
```bash
python train.py fit -c config/mae.yaml -c config/data/stl10_mae.yaml
```
Now, discard the decoder and append a linear probe to the last layer of the frozen encoder. The appended classifier is trained on 4,000 labeled samples of the `train` split (another 1,000 are used for validation) and evaluated on the `test` split. To do so, simply provide the path to the pretrained model checkpoint in the command below:
```bash
python linear_probe.py -c config/linear_probe.yaml -c config/data/stl10_linear_probe.yaml --model.init_args.ckpt_path <path to pretrained .ckpt>
```
This yields 77.96% accuracy on the test data.
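Conceptually, the probe discards the decoder, freezes the encoder, and trains only a linear classifier on the encoder's pooled features. A minimal sketch, with illustrative names and an assumed pooled feature dimension of 768 (ViT-Base):

```python
import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    """Frozen encoder + trainable linear head (illustrative sketch)."""

    def __init__(self, encoder: nn.Module, embed_dim: int = 768, num_classes: int = 10):
        super().__init__()
        self.encoder = encoder.eval()            # keep the encoder in eval mode
        for p in self.encoder.parameters():      # freeze all encoder weights
            p.requires_grad = False
        self.head = nn.Linear(embed_dim, num_classes)  # the only trainable part

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                    # no gradients through the encoder
            feats = self.encoder(x)              # assumed to return [B, embed_dim] features
        return self.head(feats)
```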
Not yet implemented.
The default model uses ViT-Base for the encoder, and a small ViT (depth=6, width=384) for the decoder. This is smaller than the model used in the paper.
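Using timm's `VisionTransformer`, that encoder/decoder pair could be constructed roughly as below. This is a sketch: `num_heads=6` for the decoder is an assumption (giving a head dimension of 64), and the real decoder operates on encoder tokens plus mask tokens rather than raw images.

```python
from timm.models.vision_transformer import VisionTransformer

# ViT-Base encoder: the standard 12-layer, 768-wide configuration.
encoder = VisionTransformer(
    img_size=224, patch_size=16,
    embed_dim=768, depth=12, num_heads=12,
    num_classes=0,   # no classification head; features only
)

# Small ViT decoder: depth=6, width=384 as described above.
decoder = VisionTransformer(
    img_size=224, patch_size=16,
    embed_dim=384, depth=6, num_heads=6,   # num_heads=6 is an assumption
    num_classes=0,
)
```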
- Configuration and training are handled completely by pytorch-lightning.
- The MAE model uses the VisionTransformer from timm.
- Interface to FGVC datasets through fgvcdata.
- Configurable environment variables through python-dotenv.
Image reconstructions of CUB validation set images after training with the following command:
```bash
python train.py fit -c config/mae.yaml -c config/data/cub_mae.yaml --data.init_args.batch_size 256 --data.init_args.num_workers 12
```
Image reconstructions of Cars validation set images after training with the following command:
```bash
python train.py fit -c config/mae.yaml -c config/data/cars_mae.yaml --data.init_args.batch_size 256 --data.init_args.num_workers 16
```
| Param | Setting |
|---|---|
| GPUs | 1×A100 |
| Batch size | 256 |
| Learning rate | 1.5e-4 |
| LR schedule | Cosine decay |
| Warmup | 10% of steps |
| Training steps | 78,000 |
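The schedule in the table (linear warmup over the first 10% of 78,000 steps, then cosine decay) can be written as a `LambdaLR`. A sketch using the table's values; the optimizer choice (AdamW) and the dummy parameter are assumptions made to keep the example self-contained:

```python
import math
import torch

total_steps = 78_000
warmup_steps = int(0.10 * total_steps)   # 10% of steps, per the table
base_lr = 1.5e-4

params = [torch.nn.Parameter(torch.zeros(1))]      # stand-in for model.parameters()
optimizer = torch.optim.AdamW(params, lr=base_lr)  # AdamW is an assumption

def lr_lambda(step: int) -> float:
    if step < warmup_steps:                        # linear warmup to base_lr
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay to 0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```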
Training and validation loss curves for CUB.
Validation image reconstructions over the course of training.