>>> cesm23
[February 15, 2021, 10:24pm]
I am quite new at this, and I have been reading tons of documentation about it, including the FAQ and wikis from
https://github.com/mozilla/TTS
I was unsure whether this is the right place to post, or whether it should be an issue on the GitHub repository, but since these are more questions about how to do things, it's probably better here.
So far, just to gain experience with this, I am trying to train on the dataset in the folder 'TTS/tests/data/ljspeech', using the TTS/tests/inputs/test_train_config.json file with Tacotron2, like this:
python3 TTS/bin/train_tacotron.py --config_path tests/inputs/test_train_config.json
Unfortunately I have no money to buy a powerful GPU to train on, so my only choice is the CPU (an Intel Core i9-9900KF, not overclocked), which isn't as bad as I thought: it takes about 10 seconds per step plus 7 seconds for the evaluation (I am unable to disable that with run_eval, because the script then throws an error which I think is related to using gradual training). This is still quite acceptable to me (better than not training at all!), but it's strange, since I am not yet training a vocoder, and on CPU alone I expected it to take much longer.
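For reference, this is the kind of change I was trying in tests/inputs/test_train_config.json when that error appeared; only run_eval is a key I actually found documented, and test_delay_epochs is just my guess at a related setting, so treat this as a sketch of what I edited rather than a confirmed fix:

    "run_eval": false,
    "test_delay_epochs": -1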
The issue here, which I still haven't understood correctly, is that we have to do two trainings, right? (I only want the same custom voice from the wav files, not new ones.) One for the TTS model, using Tacotron2 for example (I suppose it's the best one to choose), and then another one for the vocoder.
But I can't understand how to train vocoders other than the ones in TTS/TTS/bin:
train_vocoder_gan.py
train_vocoder_wavegrad.py
train_vocoder_wavernn.py
Where are the other ones, like ParallelWaveGAN, Multi-Band MelGAN, Full-Band MelGAN and MelGAN? Unless those don't need training and are meant only for inference/speech synthesis?
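My best guess so far, from looking at the vocoder configs in the repository, is that all of the GAN-based vocoders (MelGAN, Full-Band MelGAN, Multi-Band MelGAN, ParallelWaveGAN) share train_vocoder_gan.py, and the specific architecture is picked inside the vocoder config rather than by a separate script. Something like this, where my_vocoder_config.json is just a name I made up and the generator/discriminator values are what I think I saw in the example configs (please correct me if they are wrong):

    python3 TTS/bin/train_vocoder_gan.py --config_path my_vocoder_config.json

with my_vocoder_config.json containing, among the other keys:

    "generator_model": "multiband_melgan_generator",
    "discriminator_model": "melgan_multiscale_discriminator"

Is that the intended way, or am I missing dedicated scripts for those vocoders?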
And another thing: when I tested speech synthesis with the TTS model 'tts_models/en/ljspeech/tacotron2-DCA':
tts --text 'Text for TTS'
and I checked the list in
tts --list_models
vocoder_models/universal/libri-tts/wavegrad
vocoder_models/universal/libri-tts/fullband-melgan
vocoder_models/en/ljspeech/mulitband-melgan
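For completeness, this is roughly how I was calling it when swapping vocoders; I believe --model_name and --vocoder_name are the right flags, and out.wav is just the output path I picked, so correct me if the invocation is off:

    tts --text 'Text for TTS' --model_name 'tts_models/en/ljspeech/tacotron2-DCA' --vocoder_name 'vocoder_models/en/ljspeech/mulitband-melgan' --out_path out.wav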
I noticed that, for the same TTS model, the voice sounded quite different from vocoder to vocoder; it was almost like another woman's voice, which confused me. I thought the TTS models were supposed to keep the same voice as the datasets they were built on, but each vocoder I tried made the voice sound so different... What I want from all this is to get the exact same voice as in the dataset, but now I am afraid of choosing a vocoder that could change the voice almost to another person's, and only finding that out after training for days (I'm still not sure how many seconds each step will take with the vocoder training).
Sorry for all this, but before last weekend I knew almost nothing about training TTS voices!
[This is an archived TTS discussion thread from discourse.mozilla.org/t/how-to-train-my-own-tts-model-with-more-vocoders]