lstm-vision

Repository containing PyTorch code to train an LSTM on a computer vision (CV) dataset, e.g. MNIST. In principle, any image dataset can be used; to do so, change the lines that load the dataset and don't forget to adjust the flag channels_img, which is 1 by default.

The code can be run with AMP (automatic mixed precision) enabled and with torch.compile().

Note that this repository mainly serves to show that LSTMs can also be used for image classification. For the right inductive bias, a CNN/ResNet/DenseNet/etc. should be preferred, since an LSTM treats the image sequentially, i.e. pixel by pixel.
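To make "sequentially" concrete, here is a minimal pure-Python sketch of how an image can be unrolled into a sequence of LSTM time steps. The helper `image_to_sequence` and its `mode` argument are hypothetical and not taken from this repository. The "pixel" mode follows the pixel-by-pixel description above, while the "row" mode is a common alternative shown only for comparison.

```python
def image_to_sequence(image, mode="pixel"):
    """Turn an image given as nested lists [channel][row][column] into a
    list of time steps, each a list of input features for the LSTM."""
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    if mode == "pixel":
        # One time step per pixel; the features are that pixel's channel values.
        return [
            [image[c][y][x] for c in range(channels)]
            for y in range(height)
            for x in range(width)
        ]
    if mode == "row":
        # One time step per image row; the features are all values in that row.
        return [
            [image[c][y][x] for x in range(width) for c in range(channels)]
            for y in range(height)
        ]
    raise ValueError(f"unknown mode: {mode!r}")


# MNIST-sized dummy image: 1 channel (channels_img = 1), 28 x 28 pixels.
channels_img, height, width = 1, 28, 28
image = [[[0.0] * width for _ in range(height)] for _ in range(channels_img)]

for mode in ("pixel", "row"):
    sequence = image_to_sequence(image, mode)
    print(f"{mode}: {len(sequence)} time steps, {len(sequence[0])} features each")
```

For a 28 x 28 grayscale image, the pixel-wise unrolling gives 784 time steps with 1 feature each, whereas the row-wise unrolling gives 28 time steps with 28 features each. Either way, the model sees the image as a one-dimensional sequence rather than a 2D grid, which is why a convolutional architecture has the better-suited inductive bias.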

Run

Single-GPU

On a single-GPU machine, I ran the script run.py as follows:

docker build -f Dockerfile -t lstm-vision:1.4.0 .
docker run --shm-size 512m --rm -v $(pwd):/app --gpus all -it lstm-vision:1.4.0 python -B /app/lstm_vision/run.py training.saving_path=[...]

For all available config keys, see the file configs/conf.yaml.

If you only want to evaluate the model from a pre-existing checkpoint, add

model.loading_path=[...] training.num_epochs=0

Multiple GPUs

You can enable DistributedDataParallel (DDP) during training, which uses several GPUs if available:

docker run --shm-size 512m --rm -v $(pwd):/app --gpus all -it lstm-vision:1.4.0 torchrun --nproc_per_node=NUM_GPUS_YOU_HAVE /app/lstm_vision/run.py training.use_ddp=true training.master_addr='"<ip-address>"' training.master_port='"<port>"' training.saving_path=[...]

where the master address is the IP address that can be obtained via hostname -I. If you only want to evaluate the model from a pre-existing checkpoint, add

model.loading_path=[...] training.num_epochs=0

W&B

If you want to log some metrics to Weights & Biases, append the following to the docker run command:

training.wandb__api_key=<your_key>
# training.wandb__api_key=2fru...

Results

All results were obtained on a single GPU. For this small model, I do not recommend a DDP setup.

Training a bidirectional LSTM with roughly 3.9M params for 50 epochs results in:

Train data: Got 49839/50000 with accuracy 99.68 %
Test data: Got 9906/10000 with accuracy 99.06 %

On a machine with an NVIDIA RTX 4090 and an Intel i5-10400, training for 50 epochs takes about 232 s and requires about 12.61 GB of GPU memory in total. Note that without the --use_amp flag, which is specified in configs/conf.yaml, about twice the memory is required. If your GPU has less than 12.61 GB of VRAM, decrease the batch size.

I also tried compilation (training.compile_mode) with each of the modes "default", "reduce-overhead" and "max-autotune", and noticed that the total runtime slightly increases when using the MNIST dataset. This happens because the warmup phase takes a long time, and after the warmup phase the runtime per epoch is comparable to that without compilation. However, for other CV datasets (e.g. CIFAR100) and other model architectures, this might change! Also, please note that torch.compile(..., fullgraph=False) has to be used, since TorchDynamo does not allow fullgraph=True for RNNs/GRUs/LSTMs.
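The warmup trade-off can be sketched with a little arithmetic. In the snippet below, only the baseline of 232 s for 50 epochs comes from the results above (taken here as the uncompiled runtime, which is an assumption); the warmup time and the compiled per-epoch time are hypothetical placeholders, not measurements, and the helper names are made up for illustration.

```python
import math


def total_runtime(num_epochs, epoch_s, warmup_s=0.0):
    """Total training time: one-off warmup plus the time for all epochs."""
    return warmup_s + num_epochs * epoch_s


def break_even_epochs(warmup_s, eager_epoch_s, compiled_epoch_s):
    """Smallest number of epochs after which compilation is at least as fast
    as no compilation, or None if the compiled epochs are not faster at all."""
    saving_per_epoch = eager_epoch_s - compiled_epoch_s
    if saving_per_epoch <= 0:
        return None
    return math.ceil(warmup_s / saving_per_epoch)


eager_epoch_s = 232 / 50   # from the results above: about 4.64 s per epoch
warmup_s = 30.0            # HYPOTHETICAL warmup cost of torch.compile
compiled_epoch_s = 4.5     # HYPOTHETICAL per-epoch time after warmup

print("no compilation  :", total_runtime(50, eager_epoch_s), "s")
print("with compilation:", total_runtime(50, compiled_epoch_s, warmup_s), "s")
print("break-even after:", break_even_epochs(warmup_s, eager_epoch_s, compiled_epoch_s), "epochs")
```

When the per-epoch time after warmup is comparable to the uncompiled one, the saving per epoch is close to zero, so the warmup cost is never amortized over a 50-epoch run. Larger models or datasets may shift this balance.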

The above results were obtained with $10\%$ label smoothing. I varied the label smoothing between $0\%$ and $10\%$ in steps of $2\%$ and noticed that the greater the label smoothing, the higher the train and validation losses per epoch.
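One factor behind this trend can be illustrated in pure Python. The sketch assumes the standard label-smoothing formulation, in which the one-hot target is mixed with a uniform distribution over all classes (the convention of the label_smoothing argument of torch.nn.CrossEntropyLoss); whether the repository uses exactly this formulation is an assumption. Under that formulation, the cross-entropy can never fall below the entropy of the smoothed target, and this floor rises with the smoothing value.

```python
import math


def smoothed_target(true_class, num_classes, smoothing):
    """One-hot target mixed with a uniform distribution over all classes."""
    uniform = smoothing / num_classes
    return [
        (1.0 - smoothing) + uniform if k == true_class else uniform
        for k in range(num_classes)
    ]


def cross_entropy(target, probs):
    """Cross-entropy between a soft target and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)


def min_loss(num_classes, smoothing):
    """Lowest achievable loss: reached when the prediction equals the target."""
    target = smoothed_target(0, num_classes, smoothing)
    return cross_entropy(target, target)


# 10 classes as in MNIST; smoothing from 0 % to 10 % in steps of 2 %.
for pct in range(0, 12, 2):
    print(f"{pct:>2d} % label smoothing: minimum loss {min_loss(10, pct / 100):.4f}")
```

Without smoothing the floor is zero, while with smoothing even a perfect model keeps a strictly positive loss, so loss values obtained with different smoothing settings are not directly comparable.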

ImahnShekhzadeh/lstm_vision