
Dynamical Variational Autoencoders

This repository contains the code for this paper:

Dynamical Variational Autoencoders: A Comprehensive Review
Laurent Girin, Simon Leglaive, Xiaoyu Bie, Julien Diard, Thomas Hueber, Xavier Alameda-Pineda
[Paper]

More precisely, it is a re-implementation of the following models in PyTorch for the speech re-synthesis task: VAE, DKF, STORN, VRNN, SRNN, RVAE (causal and non-causal) and DSAE.

BibTeX

If you find this code useful, please star the project and consider citing:

@article{dvae2020,
  title={Dynamical Variational Autoencoders: A Comprehensive Review},
  author={Girin, Laurent and Leglaive, Simon and Bie, Xiaoyu and Diard, Julien and Hueber, Thomas and Alameda-Pineda, Xavier},
  journal={arXiv preprint arXiv:2008.12595},
  year={2020}
}

Installation instructions

This code has been tested on Ubuntu 16.04 with Python 3.7, PyTorch 1.3.1 and CUDA 9.2, on NVIDIA Titan X and NVIDIA Titan RTX GPUs. You can use it in either of the following ways:

Install as a package

The simplest way to use our code is to install it as a Python package:

# For Linux users
pip install git+https://github.com/XiaoyuBIE1994/DVAE-speech@code_release
# For Mac users
pip install git+https://github.com/XiaoyuBIE1994/DVAE-speech@code_release_mac

Then you can import it like any other package:

import dvae

For a more detailed usage tutorial, please see the Example section.
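As a quick post-install check, the following minimal sketch (not part of the package tutorial) verifies that the package and its main entry point are importable:

# Quick sanity check after installation: confirm the package resolves and
# exposes LearningAlgorithm, which is used in the Example section below.
import dvae
from dvae import LearningAlgorithm

print(dvae.__file__)  # location of the installed package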

Use container

It is highly recommended to use a Singularity container to reproduce the results or for further applications. We provide a Singularity definition file example_singularity/dvae.def to build the image:

# Download singularity
sudo apt-get install -y singularity-container

# Build singularity image
sudo singularity build dvae.sif example_singularity/dvae.def

# Shell into the image (no CUDA)
singularity shell --bind /path_to_your_dir/:/mnt/your_dir_name singularity/dvae.sif

# Execute a command with CUDA enabled (you need to define the data paths in the config file)
singularity exec --nv --bind /path_to_your_dir/:/mnt singularity/dvae.sif python train_model.py example_configuration/cfg_dkf.ini

For more information about Singularity, please read the Singularity User Guide.

Tips

  • For Mac users, speech evaluation is disabled because of compilation errors with pypesq on macOS
  • Python 3.8 support requires TensorFlow 2.2 or later (link), so it is recommended to use Python 3.7 since the speechmetrics package needs TensorFlow 2.0

Usage

dvae-speech has been designed to be easily used in a modular way. All you need to do is specify a configuration file path for model initialization; then you can either:

  • train(): train your model with the training/validation datasets specified in your configuration file
  • generate(): generate reconstructed audio from an input audio file
  • eval(): evaluate a reconstructed audio file with the specified metric (rmse, pesq, stoi, all)
  • test(): apply reconstruction to all audio files in the indicated test dataset directory

Example

# instantiate a model
from dvae import LearningAlgorithm
cfg_file = 'config.ini'
learning_algo = LearningAlgorithm(config_file=cfg_file)

# train your model
learning_algo.train()

# generate audio with model state in cache
# reconstructed audio will be saved next to the input audio file, named 'audio_001_recon.wav'
audio_ref = 'audio_001.wav'
learning_algo.generate(audio_orig=audio_ref, audio_recon=None, state_dict_file=None)

# generate audio with given model state
# reconstructed audio will be saved in the given path
audio_recon = 'recon/audio_001_recon.wav'
model_state = 'model_state.pt'
learning_algo.generate(audio_orig=audio_ref, audio_recon=audio_recon, state_dict_file=model_state)

# evaluate audio quality with model state in cache
score_rmse = learning_algo.eval(audio_ref=audio_ref, audio_est=audio_recon, metric='rmse') # only RMSE
score_rmse, score_pesq, score_stoi = learning_algo.eval(audio_ref=audio_ref, audio_est=audio_recon, metric='all') # RMSE, PESQ and STOI

# test model on test dataset with model state in cache
test_data_dir = 'data_to_test'
list_score_rmse, list_score_pesq, list_score_stoi = learning_algo.test(data_dir=test_data_dir, state_dict_file=None)

Config file

We provide configuration examples for all of the above models in example_configuration/. To reproduce the results, all you need to do is replace saved_root, train_data_dir and val_data_dir with your own directory paths.
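If you prefer to set these paths programmatically, here is a minimal sketch using Python's standard configparser. The key names come from this README, but the INI section they live in is not assumed, so the sketch simply searches every section; the directory values are placeholders:

# Minimal sketch (not part of the package): point an example config at your own
# directories. The keys saved_root, train_data_dir and val_data_dir are searched
# for in every section, since the exact section layout is not assumed here.
import configparser

cfg_path = 'example_configuration/cfg_dkf.ini'
cfg = configparser.ConfigParser()
cfg.read(cfg_path)

overrides = {
    'saved_root': '/path/to/saved_models',      # placeholder paths, replace with yours
    'train_data_dir': '/path/to/wsj0_si_tr_s',
    'val_data_dir': '/path/to/wsj0_si_dt_05',
}

for section in cfg.sections():
    for key, value in overrides.items():
        if cfg.has_option(section, key):
            cfg.set(section, key, value)

with open(cfg_path, 'w') as f:
    cfg.write(f)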

Main results

All models are trained with the Adam optimizer and a batch size of 32, using:

  • training dataset: wsj0_si_tr_s
  • validation dataset: wsj0_si_dt_05
  • test dataset: wsj0_si_et_05

For more details, please see Chapter 13 (Experiments) of our article.

We report the average evaluation scores in the table below, together with the training curves further down; the trained models can be found in saved_model.

DVAE            RMSE    PESQ  STOI
VAE             0.0510  2.05  0.86
DKF             0.0344  3.30  0.94
STORN           0.0338  3.05  0.93
VRNN            0.0267  3.60  0.96
SRNN            0.0248  3.64  0.97
RVAE-Causal     0.0499  2.27  0.89
RVAE-NonCausal  0.0479  2.37  0.89
DSAE            0.0469  2.32  0.90
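These averages can be obtained from the per-file scores returned by test(); a minimal sketch, where the config, data and checkpoint paths are placeholders for your own:

# Minimal sketch: average the per-file scores returned by test().
# 'config.ini', 'data_to_test' and 'model_state.pt' are placeholder paths.
import numpy as np
from dvae import LearningAlgorithm

learning_algo = LearningAlgorithm(config_file='config.ini')
list_rmse, list_pesq, list_stoi = learning_algo.test(data_dir='data_to_test', state_dict_file='model_state.pt')
print('RMSE: {:.4f}  PESQ: {:.2f}  STOI: {:.2f}'.format(np.mean(list_rmse), np.mean(list_pesq), np.mean(list_stoi)))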

Training curves (plots available in the repository): VAE, DKF, STORN, VRNN, SRNN, RVAE-Causal, RVAE-NonCausal, DSAE.
