- Python 3.7
- PyTorch 1.3
- librosa, scipy, tqdm, tensorboardX
- KSS, a Korean single-speaker (female) speech dataset.
-
Download the dataset above and set its path in config.py. Then run the command below.
python prepro.py
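
If preprocessing fails or produces nothing, the dataset path is a likely cause. Here is a minimal sketch for checking it, assuming the KSS audio is stored as .wav files somewhere under the directory you set in config.py. The helper name `count_wavs` is hypothetical and not part of this repo.

```python
from pathlib import Path


def count_wavs(dataset_dir):
    """Return the number of .wav files found under dataset_dir (searched recursively).

    Hypothetical helper: dataset_dir should be the same path you set in config.py.
    """
    root = Path(dataset_dir)
    if not root.is_dir():
        # Fail early with a readable message instead of a late error in preprocessing.
        raise FileNotFoundError(f"dataset directory not found: {root}")
    return sum(1 for _ in root.rglob("*.wav"))
```

Call it with your own dataset directory, for example `count_wavs("/path/to/kss")`. A count of zero suggests the path points at the wrong folder.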
-
The model needs to train for 100k+ steps.
python train.py <gpu_id>
-
After training, you can synthesize some speech from text.
python synthesize.py <gpu_id> <model_path>
-
To listen to your samples, you may need a mel2wav vocoder. This repo does not include one.
- I think the difference between baseline Tacotron and TPGST is small on the KSS dataset.
- I will be running more experiments soon.