Training Multi-Language and Multi-Speaker Model #119
Comments
Hello! Yes, you can definitely do that, but you will need to devise a way to merge the two pieces of conditioning information. I suggest some naive ways that do not guarantee disentanglement between language and speaker IDs.
Then you should be able to train it as a multi-language, multi-speaker model! Hope this helps.
Thank you for the explanation @shivammehta25. I will summarize it as follows. Is this correct?
It would be great if this repository allowed training on a non-English language by changing only the config files, without touching the Python code. Although I have not reviewed the whole repository, it does not seem to be there yet: for example, if you use basic_cleaner in your config file, you end up training on raw text rather than on phonemes, which will certainly cause problems. @shivammehta25 I would indeed suggest changing global_phonemizer to an empty dict, then by using
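One way to support several languages without re-creating a backend on every call is a dictionary keyed by language code and filled lazily. The sketch below assumes that reading of the suggestion; `PhonemizerCache` and `make_espeak_backend` are hypothetical names that are not part of Matcha-TTS, and the backend settings are the ones quoted elsewhere in this thread.

```python
from typing import Any, Callable, Dict


def make_espeak_backend(language: str) -> Any:
    """Build an espeak backend with the settings quoted in this thread."""
    # Imported lazily so the cache can be exercised without espeak installed.
    from phonemizer.backend import EspeakBackend

    # The repository's `critical_logger` is left out here; pass it via
    # `logger=` if it is available in your code.
    return EspeakBackend(
        language=language,
        preserve_punctuation=True,
        with_stress=True,
        language_switch="remove-flags",
    )


class PhonemizerCache:
    """Hypothetical helper: one backend per language, created on first use."""

    def __init__(self, factory: Callable[[str], Any] = make_espeak_backend) -> None:
        self._factory = factory
        self._backends: Dict[str, Any] = {}  # starts out as an empty dict

    def get(self, language: str) -> Any:
        # The slow initialisation runs once per language, not once per call.
        if language not in self._backends:
            self._backends[language] = self._factory(language)
        return self._backends[language]
```

A cleaner function could then call something like `cache.get(lang).phonemize([text])`, with `lang` taken from the dataset entry. How the language reaches the cleaner is left open here.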
@see2run Hey, sorry for the delay in responding. Some thoughts and suggestions:
For point 3:
    global_phonemizer = phonemizer.backend.EspeakBackend(
        language=lang,
        preserve_punctuation=True,
        with_stress=True,
        language_switch="remove-flags",
        logger=critical_logger,
    )
Move this out to a global position, or create a singleton of it, because this initialisation takes some time and will slow the training down.
For point 5: the input now carries a language ID, which is an integer (an ordinal in this case), and we need to convert it into a vector. One of the easiest ways to do this is an nn.Embedding layer, so you would need to pass the integer to one in Matcha-TTS/matcha/models/matcha_tts.py (lines 52 to 53 in 108906c):
    if n_languages > 1:
        self.language_emb = nn.Embedding(n_languages, lang_emb_dim)
Then follow the code path: wherever the speaker ID changes the shapes of the layers and is added or concatenated, change the shapes there as well and add or concatenate language_emb too.
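A rough sketch of the embedding idea, assuming the two embeddings are simply concatenated (one of the naive options, with no guarantee of disentanglement). `SpeakerLanguageConditioning`, `append_conditioning`, and the default dimensions are hypothetical and not taken from the repository; `n_spks` and `spk_emb_dim` are assumed names for the speaker-side parameters.

```python
import torch
from torch import nn


class SpeakerLanguageConditioning(nn.Module):
    """Hypothetical module: one embedding table each for speakers and languages."""

    def __init__(self, n_spks: int, n_languages: int,
                 spk_emb_dim: int = 64, lang_emb_dim: int = 16) -> None:
        super().__init__()
        self.spk_emb = nn.Embedding(n_spks, spk_emb_dim)
        self.language_emb = nn.Embedding(n_languages, lang_emb_dim)
        # Layers that consumed spk_emb_dim extra channels now need this many.
        self.out_dim = spk_emb_dim + lang_emb_dim

    def forward(self, spks: torch.Tensor, langs: torch.Tensor) -> torch.Tensor:
        # spks, langs: (batch,) integer IDs -> (batch, out_dim)
        return torch.cat([self.spk_emb(spks), self.language_emb(langs)], dim=-1)


def append_conditioning(x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
    """Repeat cond over time and concatenate it to x along the channel axis."""
    # x: (batch, channels, frames), cond: (batch, out_dim)
    cond = cond.unsqueeze(-1).expand(-1, -1, x.shape[-1])
    return torch.cat([x, cond], dim=1)
```

With this layout, every layer that previously took `channels + spk_emb_dim` inputs would take `channels + out_dim` instead. Adding the two embeddings rather than concatenating them would avoid the shape change but requires equal dimensions.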
@tomschelsen That is a good suggestion. However, this was released as supplementary code for a research paper whose evaluations were run in English, so we did not add any native support for multilingual text. I do not have the bandwidth for it right now, but I do welcome PRs. :)
Hi @shivammehta25, what code changes are needed to train the Matcha model in Spanish? I don't mean multi-language TTS.
@anarucu Hello, I have information about training on other languages here: https://github.com/shivammehta25/Matcha-TTS/wiki/Training-%F0%9F%8D%B5-Matcha%E2%80%90TTS-with-different-dataset-&-languages As long as you have not changed the phonemizer, you can start from a pretrained checkpoint. However, further experimentation would be needed to measure its effectiveness.
Hi, I want to ask about multi-language and multi-speaker training. Can I do that? Maybe the dataset format could be like this:
file_wav|text|lang_id|speaker_id
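For reference, a line in that format could be parsed roughly as below. This is only a sketch: `FilelistEntry` and `parse_filelist_line` are hypothetical names, and it assumes both IDs are integers stored in the last two fields.

```python
from typing import NamedTuple


class FilelistEntry(NamedTuple):
    wav_path: str
    text: str
    lang_id: int
    speaker_id: int


def parse_filelist_line(line: str, sep: str = "|") -> FilelistEntry:
    """Parse one `file_wav|text|lang_id|speaker_id` line."""
    # Split the two IDs off from the right so that a stray separator
    # inside the text does not shift the fields.
    wav_and_text, lang_id, speaker_id = line.rstrip("\n").rsplit(sep, 2)
    wav_path, text = wav_and_text.split(sep, 1)
    return FilelistEntry(wav_path, text, int(lang_id), int(speaker_id))
```

For example, `parse_filelist_line("wavs/a.wav|Hola, mundo.|1|3")` gives an entry with `lang_id == 1` and `speaker_id == 3`.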