How to train my own tts model with more vocoders #355

JRMeyer · 2021-03-07T09:19:25Z

JRMeyer
Mar 7, 2021
Maintainer

>>> cesm23
[February 15, 2021, 10:24pm]

I am quite new at this, and i have been reading tons of documentation
about this, including the faq and wikis from
https://github.com/mozilla/TTS

I was unsure if this is the right place to post this, or as a issue on
the github repository, but since this is more like doubts in how to do
things, probably it's better here.

So far, just to gain experience on this, i am trying to train the
dataset on the folder ' slash TTS slash tests slash data slash ljspeech' using the
TTS slash tests slash inputs slash test_train_config.json file using tacotron2, like
this :

python3 TTS/bin/train_tacotron.py slash --config_path
tests/inputs/test_train_config.json

Unfortunatley i have no money to buy a powerfull gpu to train, so my
only choice is to use CPU (INTEL CORE I9 9900KF non overclocked), which
isnt as bad as i thought, it's taking 10 seconds each step + 7 for the
evaluation (i am unable to disable it with the run_eval, because the
script throws an error which i think is related to using gradual
training), but this is still quite acceptable to me (better than not
doing training at all!), but it's strange, since i am not yet training a
vocoder, and only using CPU i expected this to be much longer.

The issue here, which i still haven't yet understood correctly, is that
we have to do two trainings, right? (i only want the same custom voice
from the wav files, not new ones) One for the tts model using Tacotron2
for example (i suppose it's the best one to choose), and then another
one for the vocoder.

But i cant understand how to train other vocoders than then ones in
TTS slash TTS slash bin :

train_vocoder_gan.py slash
train_vocoder_wavegrad.py slash
train_vocoder_wavernn.py

Were's the other ones, like ParallelWaveGAN, Multi-Band MelGAN,
Full-Band MelGAN and MelGAN ? Unless those dont need training and are
meant only for inference/speech synthesizing ?

And another thing, when i tested speech synthesizing with the tts model
'tts_models/en/ljspeech/tacotron2-DCA' :

tts slash --text 'Text for TTS' slash

and i checked the list in

tts slash --list_models

vocoder_models/universal/libri-tts/wavegrad slash
vocoder_models/universal/libri-tts/fullband-melgan slash
vocoder_models/en/ljspeech/mulitband-melgan

I noticed that for the same TTS model the voice sounded quite different
from vocoder to vocoder, it's almost like it was another womans voice
which made me confused, i thought the TTS models were supposed to have
the same voice from the datasets they were built upon, but each vocoder
i tried made the voice sound so different... What i want from all this
is to have the exact same voice from the datasets, but now i am afraid
of choosing a vocoder that could make the voice change almost to another
person's and only finding that out after training for days (still not
sure how much seconds each step will take with the vocoders training.)

Sorry for all this but before last weekend i knew almost nothing about
training tts voices!

[This is an archived TTS discussion thread from discourse.mozilla.org/t/how-to-train-my-own-tts-model-with-more-vocoders]

JRMeyer · 2021-03-07T09:19:28Z

JRMeyer
Mar 7, 2021
Maintainer Author

>>> erogol
[February 15, 2021, 10:50pm]

Happy to see that you landed the right place eventually.

1. without GPUs it is very time consuming to train models
unfortunately. I suggest you to use at least Google Colab to begin
with that provides some GPUs for limited usage.

2. All slash *GAN vocoders are trained with train_vocoder_gan.py. You need
to specify which one in the config.json file. Check some of the
example config files.

3. Not all vocoders are compatible with all the tts models. This is the
reason for the difference b/w vocoders. You need to use the
compatible ones. You either use the vocoder trained on the same
language and the dataset or the universal vocoders.

Hope these helps !

Note: Pls use some formatting the next time in you post for codes and
commands.

[Archived Post]

0 replies

JRMeyer · 2021-03-07T09:19:30Z

JRMeyer
Mar 7, 2021
Maintainer Author

>>> cesm23
[February 16, 2021, 10:41am]

> without GPUs it is very time consuming to train models unfortunately.
> I suggest you to use at least Google Colab to begin with that provides
> some GPUs for limited usage.

Unfortunately i probably wont have other choice if i can't set up gpu
encoding, because that's the problem : 'limited usage'... i never used
google colab before, but unfortunately i have a lot of custom datasets i
would like to train (even tough they are all small/medium sized, not as
big as the ljspeech dataset) so i doubt i can do heavy use of their
GPUS, but i will have to investigate that more.

But wait a minute... i DO have a nvida card, NVIDIA GEFORCE GTX 1060
SUPER 6GB, the problem is that i found it to be SUCH a hassle to install
nvidia toolkit and in top of that finding that it's not possible to use
it without registering windows insider program and installing insider
windows builds, which i really dont want to, but i am creating a
separate topic for this, either reply here or there in case you can
advise on that, i presume using this card is better than cpu, right? At
least i seen people that did thousands of steps training even using this
card, so it should be doable despite it's low vram and lack of tensor
cores.

> All slash *GAN vocoders are trained with train_vocoder_gan.py. You need to
> specify which one in the config.json file. Check some of the example
> config files.
>
> Not all vocoders are compatible with all the tts models. This is the
> reason for the difference b/w vocoders. You need to use the compatible
> ones. You either use the vocoder trained on the same language and the
> dataset or the universal vocoders.

Ok, i will see about that later after i solve the problem about using
the gpu i currently have.

[Archived Post]

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to train my own tts model with more vocoders #355

{{title}}

Replies: 2 comments

{{title}}

{{title}}

Select a reply

How to train my own tts model with more vocoders #355

JRMeyer Mar 7, 2021 Maintainer

Replies: 2 comments

JRMeyer Mar 7, 2021 Maintainer Author

JRMeyer Mar 7, 2021 Maintainer Author

JRMeyer
Mar 7, 2021
Maintainer

JRMeyer
Mar 7, 2021
Maintainer Author

JRMeyer
Mar 7, 2021
Maintainer Author