Group Latent Embedding for Vector Quantized Variational Autoencoder in Non-Parallel Voice Conversion

Code for the paper "Group Latent Embedding for Vector Quantized Variational Autoencoder in Non-Parallel Voice Conversion"

Shaojin Ding, Ricardo Gutierrez-Osuna

In INTERSPEECH 2019

This is a PyTorch implementation, based on the VQ-VAE-WaveRNN implementation at https://github.com/mkotha/WaveRNN.

Dataset: VCTK

Preparation

The preparation is similar to that at https://github.com/mkotha/WaveRNN. We repeat it here for convenience.

Requirements

  • Python 3.6 or newer
  • PyTorch with CUDA enabled
  • librosa
  • apex if you want to use FP16 (it probably doesn't work that well).

Create config.py

cp config.py.example config.py

Preparing VCTK

You can skip this section if you don't need a multi-speaker dataset.

  1. Download and uncompress the VCTK dataset.
  2. python preprocess_multispeaker.py /path/to/dataset/VCTK-Corpus/wav48 /path/to/output/directory
  3. In config.py, set multi_speaker_data_path to point to the output directory (see the example below).
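
A minimal sketch of the corresponding config.py entry (the path is illustrative; use the output directory from step 2):

# config.py
multi_speaker_data_path = '/path/to/output/directory'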

Usage

To run Group Latent Embedding:

$ python wavernn.py -m vqvae_group --num-group 41 --num-sample 10

The -m option tells the script which model to train. By default, it trains a vanilla VQ-VAE model.

Trained models are saved under the model_checkpoints directory.

By default, the script picks up the latest snapshot and continues training from there. To train a new model from scratch, use the --scratch option.

Every 50k steps, the model is run to generate test audio outputs. The output goes under the model_outputs directory.

When the -g option is given, the script produces the output using the saved model, rather than training it.
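
For example, to generate with the group latent embedding model trained above (illustrative; the flags simply mirror the training command):

$ python wavernn.py -m vqvae_group --num-group 41 --num-sample 10 -g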

--num-group specifies the number of groups, and --num-sample specifies the number of atoms in each group. Note that num-group times num-sample must equal the total number of atoms in the embedding dictionary (n_classes in class VectorQuantGroup in vector_quant.py).
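
As a quick sanity check, here is a sketch of that constraint (the value 410 for n_classes is an assumption matching the example command above; check vector_quant.py in your checkout):

# Illustrative check: --num-group * --num-sample must equal the dictionary size.
num_group = 41    # --num-group
num_sample = 10   # --num-sample
n_classes = 410   # assumed n_classes in VectorQuantGroup (vector_quant.py)
assert num_group * num_sample == n_classes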

Acknowledgement

The code is based on mkotha/WaveRNN.

Cite the work

@inproceedings{Ding2019,
  author={Shaojin Ding and Ricardo Gutierrez-Osuna},
  title={{Group Latent Embedding for Vector Quantized Variational Autoencoder in Non-Parallel Voice Conversion}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={724--728},
  doi={10.21437/Interspeech.2019-1198},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1198}
}
