
Dynamical Variational Autoencoders

This repository contains the code for this paper:

Dynamical Variational Autoencoders: A Comprehensive Review
Laurent Girin, Simon Leglaive, Xiaoyu Bie, Julien Diard, Thomas Hueber, Xavier Alameda-Pineda
[Paper]

More precisely, it is a re-implementation of the following models in PyTorch for the speech re-synthesis task: VAE, DKF, STORN, VRNN, SRNN, RVAE (causal and non-causal) and DSAE.

BibTeX

If you find this code useful, please star the project and consider citing:

@article{dvae2020,
  title={Dynamical Variational Autoencoders: A Comprehensive Review},
  author={Girin, Laurent and Leglaive, Simon and Bie, Xiaoyu and Diard, Julien and Hueber, Thomas and Alameda-Pineda, Xavier},
  journal={arXiv preprint arXiv:2008.12595},
  year={2020}
}

Installation instructions

This code has been tested on Ubuntu 16.04 with Python 3.7, PyTorch 1.3.1 and CUDA 9.2, on NVIDIA Titan X and NVIDIA Titan RTX GPUs. You can use it in either of the following ways:

Install as a package

The simplest way to use our code is to install it as a Python package:

# For Linux users
pip install git+https://github.com/XiaoyuBIE1994/DVAE-speech@code_release
# For Mac users
pip install git+https://github.com/XiaoyuBIE1994/DVAE-speech@code_release_mac

Then you can import it like any other package:

import dvae

For a more detailed usage tutorial, please see the Example section.
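As a quick post-install check, the following minimal sketch (not part of the package tutorial) verifies that the package and its main entry point are importable:

# Quick sanity check after installation: confirm the package resolves and
# exposes LearningAlgorithm, which is used in the Example section below.
import dvae
from dvae import LearningAlgorithm

print(dvae.__file__)  # location of the installed package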

Use container

It is highly recommended to use a Singularity container to reproduce the results or for further applications. We provide a Singularity definition file example_singularity/dvae.def to build the image:

# Download singularity
sudo apt-get install -y singularity-container

# Build singularity image
sudo singularity build dvae.sif example_singularity/dvae.def

# Shell into the image (no CUDA)
singularity shell --bind /path_to_your_dir/:/mnt/your_dir_name singularity/dvae.sif

# Execute a command with CUDA enabled (you need to define the data paths in the config file)
singularity exec --nv --bind /path_to_your_dir/:/mnt singularity/dvae.sif python train_model.py example_configuration/cfg_dkf.ini

For more information about Singularity, please read the Singularity User Guide.

Tips

  • For Mac users, speech evaluation is disabled because of compilation errors with pypesq on macOS
  • Python 3.8 support requires TensorFlow 2.2 or later (link), so it is recommended to use Python 3.7 since the speechmetrics package needs TensorFlow 2.0

Usage

dvae-speech has been designed to be easily used in a modular way. All you need to do is specify a configuration file path for model initialization; then you can either:

  • train(): train your model with the training/validation datasets specified in your configuration file
  • generate(): generate reconstructed audio from an input audio file
  • eval(): evaluate a reconstructed audio file with the specified metric (rmse, pesq, stoi, all)
  • test(): apply reconstruction to all audio files in the indicated test dataset directory

Example

# instantiate a model
from dvae import LearningAlgorithm
cfg_file = 'config.ini'
learning_algo = LearningAlgorithm(config_file=cfg_file)

# train your model
learning_algo.train()

# generate audio with model state in cache
# reconstructed audio will be saved next to the input audio file, named 'audio_001_recon.wav'
audio_ref = 'audio_001.wav'
learning_algo.generate(audio_orig=audio_ref, audio_recon=None, state_dict_file=None)

# generate audio with given model state
# reconstructed audio will be saved in the given path
audio_recon = 'recon/audio_001_recon.wav'
model_state = 'model_state.pt'
learning_algo.generate(audio_orig=audio_ref, audio_recon=audio_recon, state_dict_file=model_state)

# evaluate audio quality with model state in cache
score_rmse = learning_algo.eval(audio_ref=audio_ref, audio_est=audio_recon, metric='rmse') # only RMSE
score_rmse, score_pesq, score_stoi = learning_algo.eval(audio_ref=audio_ref, audio_est=audio_recon, metric='all') # RMSE, PESQ and STOI

# test model on test dataset with model state in cache
test_data_dir = 'data_to_test'
list_score_rmse, list_score_pesq, list_score_stoi = learning_algo.test(data_dir=test_data_dir, state_dict_file=None)

Config file

We provide configuration examples for all of the above models in example_configuration/. To reproduce the results, all you need to do is replace saved_root, train_data_dir and val_data_dir with your own directory paths.
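If you prefer to set these paths programmatically, here is a minimal sketch using Python's standard configparser. The key names come from this README, but the INI section they live in is not assumed, so the sketch simply searches every section; the directory values are placeholders:

# Minimal sketch (not part of the package): point an example config at your own
# directories. The keys saved_root, train_data_dir and val_data_dir are searched
# for in every section, since the exact section layout is not assumed here.
import configparser

cfg_path = 'example_configuration/cfg_dkf.ini'
cfg = configparser.ConfigParser()
cfg.read(cfg_path)

overrides = {
    'saved_root': '/path/to/saved_models',      # placeholder paths, replace with yours
    'train_data_dir': '/path/to/wsj0_si_tr_s',
    'val_data_dir': '/path/to/wsj0_si_dt_05',
}

for section in cfg.sections():
    for key, value in overrides.items():
        if cfg.has_option(section, key):
            cfg.set(section, key, value)

with open(cfg_path, 'w') as f:
    cfg.write(f)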

Main results

All models are trained with the Adam optimizer and a batch size of 32, using:

  • training dataset: wsj0_si_tr_s
  • validation dataset: wsj0_si_dt_05
  • test dataset: wsj0_si_et_05

For more details, please see Chapter 13 (Experiments) of our article.

We report the average evaluation scores in the table below, together with the training curves further down; the trained models can be found in saved_model.

DVAE            RMSE    PESQ  STOI
VAE             0.0510  2.05  0.86
DKF             0.0344  3.30  0.94
STORN           0.0338  3.05  0.93
VRNN            0.0267  3.60  0.96
SRNN            0.0248  3.64  0.97
RVAE-Causal     0.0499  2.27  0.89
RVAE-NonCausal  0.0479  2.37  0.89
DSAE            0.0469  2.32  0.90
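These averages can be obtained from the per-file scores returned by test(); a minimal sketch, where the config, data and checkpoint paths are placeholders for your own:

# Minimal sketch: average the per-file scores returned by test().
# 'config.ini', 'data_to_test' and 'model_state.pt' are placeholder paths.
import numpy as np
from dvae import LearningAlgorithm

learning_algo = LearningAlgorithm(config_file='config.ini')
list_rmse, list_pesq, list_stoi = learning_algo.test(data_dir='data_to_test', state_dict_file='model_state.pt')
print('RMSE: {:.4f}  PESQ: {:.2f}  STOI: {:.2f}'.format(np.mean(list_rmse), np.mean(list_pesq), np.mean(list_stoi)))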

Training curves (plots available in the repository): VAE, DKF, STORN, VRNN, SRNN, RVAE-Causal, RVAE-NonCausal, DSAE.
