Better Reconstruction Loss for VQ-VAE

Colab Project available here - https://drive.google.com/drive/folders/1W6LW7lZsHF8BogACkjLmm_h_K7UidKV0?usp=sharing

The paper I have chosen to work on is [VQ-VAE: Neural Discrete Representation Learning](https://arxiv.org/abs/1711.00937).

The Extension

VQ-VAE reconstructions tend to be blurry. A likely cause is that the reconstruction loss is plain mean squared error (MSE), which penalises per-pixel differences and therefore favours smooth, averaged outputs over sharp detail. The goal of my extension is to reduce this blurriness.
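
The link between MSE and blur can be made concrete with a short worked equation. This is a standard property of squared-error losses rather than a result from this project; the symbols ($z$ for a fixed latent code, $x$ for the images that map to it, $\hat{x}$ for the decoder output) are notation introduced here for illustration.

```latex
\hat{x}^{*}
  = \arg\min_{\hat{x}} \; \mathbb{E}\!\left[\, \lVert x - \hat{x} \rVert^{2} \mid z \,\right]
  = \mathbb{E}\left[\, x \mid z \,\right]

% Example: two equally likely sharp images x_1 and x_2 share the same code z.
\hat{x} = x_1
  \;\Rightarrow\; \mathbb{E}\,\lVert x - \hat{x} \rVert^{2} = \tfrac{1}{2}\,\lVert x_1 - x_2 \rVert^{2}
\qquad
\hat{x} = \tfrac{1}{2}(x_1 + x_2)
  \;\Rightarrow\; \mathbb{E}\,\lVert x - \hat{x} \rVert^{2} = \tfrac{1}{4}\,\lVert x_1 - x_2 \rVert^{2}
```

Under MSE the averaged (blurry) image scores better than either sharp candidate, which is why a loss that rewards local structure may help.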

I can think of a few ways to do that.

Firstly, the reconstruction loss term can be replaced with a GAN-like discriminator. VQGAN (https://arxiv.org/abs/2012.09841) does this quite successfully. However, it adds a fair bit of complexity, because an extra model has to be trained.

My proposed extension is to see whether blurriness, and reconstruction quality in general, can be improved using specialised loss functions alone, without training an extra model. This means replacing the MSE reconstruction loss with a combination of loss terms based on image reconstruction metrics, for example the Structural Similarity Index Measure (SSIM) and Peak Signal-to-Noise Ratio (PSNR). I will experiment with several metrics and combinations of them to find reasonably computationally cheap ways to improve image reconstruction.
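
As an illustration of what such a combined loss could look like, here is a minimal PyTorch sketch that blends MSE with an SSIM term. It is not the loss used in this project: the function names `ssim` and `reconstruction_loss`, the weight `alpha`, the box-filter window (the original SSIM definition uses a Gaussian window), and the assumption that images are scaled to [0, 1] are all choices made here for the example.

```python
import torch
import torch.nn.functional as F


def ssim(x, y, window=7, data_range=1.0):
    """Mean SSIM for image batches shaped (N, C, H, W), assumed to lie in [0, data_range]."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    # Local means from a box filter over "valid" windows (no padding).
    mu_x = F.avg_pool2d(x, window, stride=1)
    mu_y = F.avg_pool2d(y, window, stride=1)
    # Local variances and covariance, computed as E[ab] - E[a]E[b].
    var_x = F.avg_pool2d(x * x, window, stride=1) - mu_x * mu_x
    var_y = F.avg_pool2d(y * y, window, stride=1) - mu_y * mu_y
    cov_xy = F.avg_pool2d(x * y, window, stride=1) - mu_x * mu_y
    ssim_map = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    )
    return ssim_map.mean()


def reconstruction_loss(x_tilde, x, alpha=0.5):
    """Blend of MSE and (1 - SSIM); alpha=0 recovers the plain MSE loss."""
    mse = F.mse_loss(x_tilde, x)
    return (1 - alpha) * mse + alpha * (1 - ssim(x_tilde, x))
```

Both terms are differentiable, so the blend can stand in wherever the MSE reconstruction term is computed. PSNR is left out of the sketch because it is a monotonic function of MSE (10 log10(MAX^2 / MSE)), so as a training signal it carries the same ordering as MSE itself.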

I would build this on top of the VQ-VAE model (the forked project) and use that model as the baseline for comparison. If time permits, I would also compare it to a comparable implementation of VQGAN.

Better Reconstruction Loss for VQ-VAE - Results

OLD README

Reproducing Neural Discrete Representation Learning

Course Project for IFT 6135 - Representation Learning

Project Report link: final_project.pdf

Instructions

To train the VQVAE with default arguments as discussed in the report, execute:

python vqvae.py --data-folder /tmp/miniimagenet --output-folder models/vqvae

To train the PixelCNN prior on the latents, execute:

python pixelcnn_prior.py --data-folder /tmp/miniimagenet --model models/vqvae --output-folder models/pixelcnn_prior

Datasets Tested

Image

MNIST
FashionMNIST
CIFAR10
Mini-ImageNet

Video

Atari 2600 - Boxing (OpenAI Gym) code

Reconstructions from VQ-VAE

Top 4 rows are Original Images. Bottom 4 rows are Reconstructions.

MNIST

Fashion MNIST

Class-conditional samples from VQVAE with PixelCNN prior on the latents

MNIST

Fashion MNIST

Comments

We noticed that implementing our own VectorQuantization PyTorch function sped up training of VQ-VAE by nearly 3x. The slower but simpler code is in this commit.
We added some basic tests for the vector quantization functions (based on pytest). To run these tests, execute:

py.test . -vv
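
For readers unfamiliar with custom vector quantization functions like the one mentioned in the comments above, here is a minimal sketch of a `torch.autograd.Function` that does a nearest-codebook lookup with a straight-through gradient. It is an illustration only, not the code from `functions.py`; the class name `VQStraightThrough` and the flat `(N, D)` input layout are assumptions made for the example.

```python
import torch


class VQStraightThrough(torch.autograd.Function):
    """Nearest-codebook lookup with a straight-through gradient (illustrative sketch)."""

    @staticmethod
    def forward(ctx, inputs, codebook):
        # inputs: (N, D) encoder outputs; codebook: (K, D) embedding vectors.
        # Squared Euclidean distances via ||a||^2 - 2 a.b + ||b||^2, shape (N, K).
        distances = (
            inputs.pow(2).sum(dim=1, keepdim=True)
            - 2 * inputs @ codebook.t()
            + codebook.pow(2).sum(dim=1)
        )
        indices = distances.argmin(dim=1)
        ctx.save_for_backward(indices, codebook)
        ctx.mark_non_differentiable(indices)
        return codebook[indices], indices

    @staticmethod
    def backward(ctx, grad_output, grad_indices):
        indices, codebook = ctx.saved_tensors
        # Straight-through: the gradient of the quantized output is copied to the inputs.
        grad_inputs = grad_output.clone()
        # Each codebook row accumulates the gradients of the inputs assigned to it.
        grad_codebook = torch.zeros_like(codebook).index_add_(0, indices, grad_output)
        return grad_inputs, grad_codebook
```

Calling `VQStraightThrough.apply(inputs, codebook)` returns the quantized vectors and the chosen codebook indices. A pytest-style check for such a function would typically assert that the indices match a brute-force nearest-neighbour search and that the input gradient equals the output gradient.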

Authors

Rithesh Kumar
Tristan Deleu
Evan Racah

