Unofficial implementation of the WaveNeXt neural vocoder (WIP)
WaveNeXt proposes replacing the final ISTFT layer of Vocos with a bias-free linear layer followed by a reshape operation. Because this is only a slight modification of Vocos, we reuse the official Vocos implementation and add the WaveNeXt head in wavenext_pytorch/vocos/heads.py.
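The idea of a bias-free linear layer followed by a reshape can be illustrated with a minimal PyTorch sketch. This is not the code in wavenext_pytorch/vocos/heads.py; the class name `LinearUpsampleHead`, the `(batch, frames, dim)` input layout, and the sizes `dim=512` and `hop_length=256` are assumptions chosen for illustration, and the actual head may contain additional layers or post-processing.

```python
import torch
from torch import nn


class LinearUpsampleHead(nn.Module):
    """Hypothetical WaveNeXt-style head: bias-free linear layer + reshape."""

    def __init__(self, dim: int, hop_length: int):
        super().__init__()
        # Each frame-level feature vector is projected to hop_length samples.
        self.proj = nn.Linear(dim, hop_length, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim) -> (batch, frames, hop_length)
        samples = self.proj(x)
        # Lay the per-frame chunks end to end: (batch, frames * hop_length)
        return samples.reshape(x.shape[0], -1)


# Illustrative sizes only (assumed, not taken from the repository configs).
head = LinearUpsampleHead(dim=512, hop_length=256)
features = torch.randn(2, 100, 512)
audio = head(features)
print(audio.shape)  # torch.Size([2, 25600])
```

Each frame produces exactly `hop_length` samples, so 100 frames with a hop of 256 yield 25,600 samples per utterance, with no inverse STFT or overlap-add involved.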
We also modified the feature extraction and the mel-spectrogram loss to make them compatible with HiFi-GAN features. However, you can still use the original Vocos features.
For inference only, install the base dependencies:
pip install -r requirements.txt
If you wish to train the model, install it with additional dependencies:
pip install -r requirements-train.txt
Prepare a filelist of audio files for the training and validation set:
find "$TRAIN_DATASET_DIR" -name "*.wav" > filelist.train
find "$VAL_DATASET_DIR" -name "*.wav" > filelist.val
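If `find` is not available (for example on Windows), the same filelists can be built in Python. The sketch below is an assumed equivalent, not part of the repository: `write_filelist` is a hypothetical helper that writes one `.wav` path per line, sorted for reproducibility (unlike `find`, whose output order is not guaranteed).

```python
from pathlib import Path


def write_filelist(dataset_dir: str, output_path: str) -> int:
    """Write every .wav path under dataset_dir to output_path, one per line.

    Returns the number of files listed.
    """
    # rglob searches dataset_dir recursively, like `find ... -name "*.wav"`.
    wav_paths = sorted(str(p) for p in Path(dataset_dir).rglob("*.wav"))
    Path(output_path).write_text("".join(f"{p}\n" for p in wav_paths))
    return len(wav_paths)
```

For example, `write_filelist("/path/to/train", "filelist.train")` and `write_filelist("/path/to/val", "filelist.val")`, where both dataset paths are placeholders for your own directories.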
Fill in a config file, e.g. wavenext.yaml, with your filelist paths, then start training with:
python train.py -c configs/wavenext.yaml
Refer to the PyTorch Lightning documentation for details about customizing the training pipeline.
Pre-trained models
Model Name | Dataset | Training Iterations | Parameters |
---|---|---|---|
BSC-LT/wavenext-mel | LibriTTS + LJSpeech + openslr69 + festcat | 1M | 13.68M |
- Add tensorboards.
- Add EnCodec config.
If this code contributes to your research, please cite the work:
@INPROCEEDINGS{10389765,
author={Okamoto, Takuma and Yamashita, Haruki and Ohtani, Yamato and Toda, Tomoki and Kawai, Hisashi},
booktitle={2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
title={WaveNeXt: ConvNeXt-Based Fast Neural Vocoder Without ISTFT layer},
year={2023},
volume={},
number={},
pages={1-8},
keywords={Fourier transforms;Vocoders;Conferences;Automatic speech recognition;ConvNext;end-to-end text-to-speech;linear layer-based upsampling;neural vocoder;Vocos},
doi={10.1109/ASRU57964.2023.10389765}}
The code in this repository is released under the MIT license as found in the LICENSE file.