PyTorch implementation of *Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention*, based partially on the following projects:
- https://github.com/Kyubyong/dc_tts (audio preprocessing)
- https://github.com/r9y9/deepvoice3_pytorch (data loader sampler)
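The "guided attention" in the title refers to a training-time penalty that pushes the text-to-frame attention matrix toward the diagonal: attention weights far from the diagonal are multiplied by a mask W[n, t] = 1 - exp(-((n/N - t/T)^2) / (2g^2)) and added to the loss. A minimal sketch of that mask in pure Python (the repo's actual implementation may differ in details such as the value of g):

```python
import math

def guided_attention_mask(N, T, g=0.2):
    """DC-TTS guided attention weights for N text positions and T frames:
    W[n, t] = 1 - exp(-((n/N - t/T)^2) / (2 * g^2)).
    Near the diagonal (n/N ~ t/T) the weight is ~0; far off it is ~1,
    so off-diagonal attention is penalized during training."""
    return [[1.0 - math.exp(-((n / N - t / T) ** 2) / (2 * g ** 2))
             for t in range(T)]
            for n in range(N)]

W = guided_attention_mask(4, 6)
# W[0][0] and W[2][3] sit on the diagonal (weight 0.0);
# corners such as W[0][5] are close to 1.0.
```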
The following notebooks are executable on https://colab.research.google.com:
For audio samples and pretrained models, see those notebooks.
The English TTS uses the LJ-Speech dataset.
- Download the dataset: `python dl_and_preprop_dataset.py --dataset=ljspeech`
- Train the Text2Mel model: `python train-text2mel.py --dataset=ljspeech`
- Train the SSRN model: `python train-ssrn.py --dataset=ljspeech`
- Synthesize sentences: `python synthesize.py --dataset=ljspeech`
- The WAV files are saved in the `samples` folder.
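Once synthesis finishes, the generated WAVs can be sanity-checked with the standard library's `wave` module. A sketch (the file in `samples` is stood in for here by an in-memory sine tone at LJ-Speech's 22050 Hz, 16-bit mono format, since the actual file names depend on your run):

```python
import io
import math
import struct
import wave

# Stand-in for a synthesized file from the samples folder: one second
# of a 440 Hz tone at 22050 Hz, 16-bit mono (LJ-Speech's audio format).
SR = 22050
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(SR)
    w.writeframes(b"".join(
        struct.pack("<h", int(0.1 * 32767 * math.sin(2 * math.pi * 440 * i / SR)))
        for i in range(SR)))

# Inspect it the same way you would a real output WAV
# (e.g. wave.open("samples/....wav", "rb")).
buf.seek(0)
with wave.open(buf, "rb") as w:
    duration = w.getnframes() / w.getframerate()
    channels = w.getnchannels()
print(duration, channels)  # 1.0 1
```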
The Mongolian text-to-speech uses 5 hours of audio from the Mongolian Bible.
- Download the dataset: `python dl_and_preprop_dataset.py --dataset=mbspeech`
- Train the Text2Mel model: `python train-text2mel.py --dataset=mbspeech`
- Train the SSRN model: `python train-ssrn.py --dataset=mbspeech`
- Synthesize sentences: `python synthesize.py --dataset=mbspeech`
- The WAV files are saved in the `samples` folder.
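The two training steps above correspond to the paper's two networks: Text2Mel predicts 80-bin mel frames at a reduced frame rate, and SSRN (spectrogram super-resolution network) upsamples time by 4x and expands the 80 mel bins to the full 1 + n_fft/2 STFT bins. A quick shape check using the paper's default hyperparameters (this repo's settings may differ):

```python
def ssrn_output_shape(mel_frames, n_fft=1024, time_upsample=4):
    """Shape (freq_bins, frames) of the full spectrogram SSRN produces
    from an (80, mel_frames) coarse mel input. Hyperparameters follow
    the DC-TTS paper; the repo may use different values."""
    freq_bins = 1 + n_fft // 2          # 513 for n_fft = 1024
    return freq_bins, mel_frames * time_upsample

print(ssrn_output_shape(50))  # (513, 200)
```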