[Help] Share your TTS models #930

erogol · 2021-03-15T10:48:45Z

erogol
Mar 15, 2021
Maintainer

Please consider sharing your pre-trained models in any language (if the licences allow it).

We can include them in our model catalogue for public use by attributing your name (website, company etc.).

That would enable more people to experiment together and coordinate, instead of individual efforts to achieve similar goals.

That is also a chance to make your work more visible.

You can share in two ways:

Share the model files with us, and we will serve them with the next 🐸 TTS release.
Upload your models to GDrive and share the link.

Models are listed in the .models.json file, and every listed model is available through the tts CLI or the server endpoints. More details...
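For readers wondering how catalogue entries map to the model names used later in this thread (for example `vocoder_models/universal/libri-tts/wavegrad`), here is a minimal sketch. It assumes the catalogue is a nested mapping of type, language, dataset, and model; the `catalogue` dictionary and its `description` fields are illustrative placeholders, not the contents of the real file.

```python
# Illustrative stand-in for a .models.json-style catalogue.
# The nesting (type -> language -> dataset -> model) is an assumption,
# and the entries below are placeholders, not real catalogue data.
catalogue = {
    "tts_models": {
        "en": {"ljspeech": {"vits": {"description": "placeholder entry"}}},
    },
    "vocoder_models": {
        "universal": {"libri-tts": {"wavegrad": {"description": "placeholder entry"}}},
    },
}


def list_model_names(catalogue):
    """Flatten the nested catalogue into 'type/language/dataset/model' names."""
    names = []
    for model_type, languages in catalogue.items():
        for language, datasets in languages.items():
            for dataset, models in datasets.items():
                for model in models:
                    names.append(f"{model_type}/{language}/{dataset}/{model}")
    return sorted(names)


model_names = list_model_names(catalogue)
for name in model_names:
    print(name)
```

A shared model would then appear as one more leaf in such a mapping, under its own language and dataset keys.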

(previously mozilla/TTS#395)

enjikaka · 2021-04-15T08:51:34Z

enjikaka
Apr 15, 2021

Any ELI5 tutorial/doc for creating a dataset for your own language/dialect?

0 replies

erogol · 2021-04-15T14:32:25Z

erogol
Apr 15, 2021
Maintainer Author

Not sure if it is ELI5, but there is this link https://github.com/coqui-ai/TTS/wiki/What-makes-a-good-TTS-dataset

Also, @thorstenMueller has created a TTS dataset from the get-go, so he might have valuable comments if you have specific questions.

0 replies

thorstenMueller · 2021-04-15T20:09:22Z

thorstenMueller
Apr 15, 2021

Feel free to ask specific questions. I'd be happy to share my experience recording a new dataset here.

Find/Create a text corpus to record (one sentence = 1 recording)
Replace numbers with their spelled-out text
Create a csv file from the corpus
Check Mimic-Recording-Studio from Mycroft as a recording environment (https://github.com/MycroftAI/mimic-recording-studio)
Start recording
- Keep a constant speed while recording
- Speak all chars clearly
- Speak in a neutral voice
- Use good microphone equipment
- Find a recording place without random noise
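The corpus-to-csv steps above can be sketched roughly as follows. This is a simplified illustration, not the format any particular tool requires: the `utt_0001`-style file ids, the two-column `id|text` layout, and the helper names are all assumptions made for the example.

```python
def build_metadata_lines(sentences, prefix="utt"):
    """Turn sentences into pipe-delimited 'file_id|text' lines, one per recording."""
    lines = []
    for index, sentence in enumerate(sentences, start=1):
        text = " ".join(sentence.split())  # collapse stray whitespace and newlines
        if "|" in text:
            raise ValueError(f"sentence {index} contains the '|' delimiter")
        lines.append(f"{prefix}_{index:04d}|{text}")
    return lines


def find_unexpanded_numbers(lines):
    """Return the file ids whose text still contains digits that should be spelled out."""
    flagged = []
    for line in lines:
        file_id, text = line.split("|", 1)
        if any(ch.isdigit() for ch in text):
            flagged.append(file_id)
    return flagged


# Hypothetical corpus: one sentence per recording.
sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "She paid 23 dollars for the ticket.",
]

metadata_lines = build_metadata_lines(sentences)
needs_review = find_unexpanded_numbers(metadata_lines)

print("\n".join(metadata_lines))
print("still contains digits:", needs_review)
```

To save the result, the lines can be written to a file such as `metadata.csv` with `open("metadata.csv", "w", encoding="utf-8")`, once the flagged sentences have been rewritten with numbers spelled out.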

1 reply

hjschiller Sep 8, 2022

> Feel free to ask specific questions. I'd be happy to share my experience recording a new dataset here.
>
> Find/Create a text corpus to record (one sentence = 1 recording)
> Replace numbers with their spelled-out text
> Create a csv file from the corpus
> Check Mimic-Recording-Studio from Mycroft as a recording environment (https://github.com/MycroftAI/mimic-recording-studio)
> Start recording
> - Keep a constant speed while recording
> - Speak all chars clearly
> - Speak in a neutral voice
> - Use good microphone equipment
> - Find a recording place without random noise

I am a native English (American) speaker, male, but high pitched (so often mistaken for a woman on the phone). I am just starting out in making a model and am happy to share my model if that is helpful. So far I have only attempted a single recording session, about 700 sentences, in not the quietest location.

My worst recordings have a SNR of 20 and average around 34 (but I just realized none of the noise reduction I applied actually worked so I plan to work on that while recovering from COVID).

What is the most effective, least time-consuming way to test various dataset creation methods? For example, would reading 3 sentences with two different mics be sufficient for me (or someone) to judge which provides “better” results? What would the metric be? Signal-to-noise ratio?

For the dataset, what exactly do you mean? I assume this is .wav files and metadata.csv, but do you want the raw recording, or the recording with extra dead air removed, band-pass filtered to take out pops, noise reduced, compressed, and amplified to a maximum of -10 dB? Do you have a decent write-up (in English) of a good workflow for making a good dataset? (A YouTube video would be OK too.)

Also I’m not exactly sure what neutral voice is. I have been aiming for the voice I use in conversation, or when reading a book out loud (without doing different voices for when the characters speak). It is not necessarily flat, but it is without emotion such as excitement or anger.
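On the signal-to-noise question, one rough way to compare two microphones is to record the same sentence plus a stretch of silence with each, then compare the power of the speech segment to the power of the noise-only segment. The sketch below shows only that arithmetic, expressed in dB; the synthetic sine wave and alternating "room noise" are stand-ins for real recordings, and this is not necessarily how any tool in this thread computes SNR.

```python
import math


def mean_power(samples):
    """Average squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech_samples, noise_samples):
    """Rough SNR in dB: speech-segment power relative to noise-only power."""
    return 10.0 * math.log10(mean_power(speech_samples) / mean_power(noise_samples))


# Synthetic stand-ins for one second of audio at 16 kHz:
# a 220 Hz sine as the "voice" and a much quieter alternating signal as "room noise".
sample_rate = 16000
speech = [0.5 * math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(sample_rate)]
noise = [0.005 * (1 if n % 2 else -1) for n in range(sample_rate)]

estimate = snr_db(speech, noise)
print(f"estimated SNR: {estimate:.1f} dB")
```

With real recordings, the same two functions could be applied to sample lists read from the .wav files, using a silent stretch of the same session as the noise segment. A higher number for the same sentence suggests the cleaner setup, though it says nothing about tone or clarity.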

Sadam1195 · 2021-04-21T12:05:04Z

Sadam1195
Apr 21, 2021

Hi @erogol, thank you for the amazing work, from Mozilla TTS to coqui-ai. Mozilla seemed perfect to me because of its wider community reach, but I hope this project grows even wider and faster than Mozilla did. I am planning to share my models for Spanish and Italian (Tacotron 2 at 600k steps + WaveRNN). Audio quality seems good, but I need to train them a bit more and also ask the dataset providers whether it would be okay to make the models public.
Fingers crossed.

Let me know if I can contribute in any way; I have Google Colab Pro resources lying around unused.

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

0 replies

erogol · 2021-04-21T12:15:22Z

erogol
Apr 21, 2021
Maintainer Author

@Sadam1195 thx for the amazing work 🚀🚀.

I really hope we can include your models, of course with the right attribution going to you.

Just waiting for your signal.

For general contribution, this is a nice place to start https://github.com/coqui-ai/TTS/blob/main/CONTRIBUTING.md

If you would just like to train models, let me know; we can also find new datasets to attack.

0 replies

Sadam1195 · 2021-04-21T13:50:26Z

Sadam1195
Apr 21, 2021

> I really hope we can include your models, of course with the right attribution going to you.

I hope they allow me to; otherwise I would see it as a waste of my time and effort.

> Just waiting for your signal.

I will let you know when I get the confirmation.

> If you would just like to train models, let me know; we can also find new datasets to attack.

Training models on Colab can be a bit annoying, as sessions often get disconnected even with all the tricks in the book.

Nonetheless, I would love to train models on new datasets (if you have any), especially in languages for which TTS models haven't been made public yet.

0 replies

kaiidams · 2021-05-16T10:59:37Z

kaiidams
May 16, 2021

Hello,

I've just started training on a public domain Japanese dataset https://github.com/kaiidams/Kokoro-Speech-Dataset with Tacotron 2 from the latest master of https://github.com/mozilla/TTS on Google Colab Free. After 19K steps, I can make out what he says, although it sounds metallic.

To proceed, I'd like to know which branch and repo you recommend I use. https://github.com/erogol/TTS_recipes seems a bit old.

0 replies

Sadam1195 · 2021-05-16T11:40:18Z

Sadam1195
May 16, 2021

> To proceed, I'd like to know which branch and repo you recommend I use. https://github.com/erogol/TTS_recipes seems a bit old.

Please use this https://github.com/coqui-ai/TTS instead of https://github.com/mozilla/TTS and use the latest main branch. @kaiidams

0 replies

kaiidams · 2021-05-21T06:49:56Z

kaiidams
May 21, 2021

@Sadam1195 @erogol

I trained Tacotron 2 for 130K steps with this code https://github.com/kaiidams/TTS/tree/kaiidams/kokoro which was forked from the latest main.
https://drive.google.com/drive/folders/1-1_HB-ogmvD-qYaHm8D5Xp1pWq9HKhB_?usp=sharing
The included sample.wav was generated with vocoder_models/universal/libri-tts/wavegrad.

The model takes Romanized Japanese text as input. Converting ordinary Japanese text into that form requires some dependencies such as MeCab.
The dataset is in the public domain, and the reader is aware of the dataset. I think I can provide Python code for the text conversion.

0 replies

erogol · 2021-05-21T10:06:34Z

erogol
May 21, 2021
Maintainer Author

@kaiidams if you can send a PR for the text conversion, something similar to the Chinese API we have, together with the model, that would be a great contribution.

1 reply

Grant-Tao Oct 3, 2022

Sorry to bother you. Is it possible to add a Chinese dataset when training the YourTTS VITS model, so that YourTTS can speak Chinese? If yes, how?

zubairahmed-ai · 2021-06-09T05:38:58Z

zubairahmed-ai
Jun 9, 2021

> Feel free to ask specific questions. I'd be happy to share my experience recording a new dataset here.
>
> Find/Create a text corpus to record (one sentence = 1 recording)
> Replace numbers with their spelled-out text
> Create a csv file from the corpus
> Check Mimic-Recording-Studio from Mycroft as a recording environment (https://github.com/MycroftAI/mimic-recording-studio)
> Start recording
> - Keep a constant speed while recording
> - Speak all chars clearly
> - Speak in a neutral voice
> - Use good microphone equipment
> - Find a recording place without random noise

Any reason why this and this isn't in the readme?
I had to look up training to reach here

0 replies

thorstenMueller · 2021-06-09T11:31:37Z

thorstenMueller
Jun 9, 2021

Hi @zubairahmed-ai.
Here's a talk I gave on how to record a voice dataset, if that's helpful for you.

https://youtu.be/m-Uwb-Bg144

0 replies

zubairahmed-ai · 2021-06-09T11:33:47Z

zubairahmed-ai
Jun 9, 2021

@thorstenMueller Perfect timing, thank you

0 replies

zubairahmed-ai · 2021-06-09T11:35:18Z

zubairahmed-ai
Jun 9, 2021

Oh just realized this talk happened during recent Google I/O and I somehow didn't catch it while watching other videos :)

0 replies

zubairahmed-ai · 2021-06-10T05:35:17Z

zubairahmed-ai
Jun 10, 2021

@thorstenMueller Thanks so much for the great video explaining your process in detail with some tips. I'll make sure I follow that. Do you plan to try other models besides Tacotron-2, like Align-TTS?

0 replies

ZohaibSajjad · 2022-03-20T18:34:14Z

ZohaibSajjad
Mar 20, 2022

Hi all, I want to build a cross-lingual voice conversion system. For this purpose, I used the Voice Conversion Challenge 2020 architecture, but after 6 weeks of work I have not obtained acceptable results. I have now decided to use the YourTTS voice conversion architecture and want to train it on an English-Urdu dataset, but I don't know where to start. Can anyone guide me in this regard? Any help is appreciated.

0 replies

xettrisomeman · 2022-09-07T11:16:49Z

xettrisomeman
Sep 7, 2022

Hey, I would like to share my Nepali model trained on the OpenSLR dataset.

demo : vitsexamplex2_1015k.webm

Here is the drive link: https://drive.google.com/drive/folders/1Jwr7ITDA4hFKLMSVXUj3A8nQlpdxT6ql?usp=sharing

Thanks :)

1 reply

erogol Sep 8, 2022
Maintainer Author

Thanks for sharing. Does it work with the latest 🐸TTS?

Tarek-Hasan · 2022-11-21T05:15:24Z

Tarek-Hasan
Nov 21, 2022

Check out the voice models for Mimic 3 from Mycroft AI
https://github.com/MycroftAI/mimic3
These are the most natural-sounding open-source AI voices I have ever heard.

0 replies

erogol · 2022-11-21T11:04:04Z

erogol
Nov 21, 2022
Maintainer Author

@Tarek-Hasan looking at the released models, they pretty much used our code, but I don't see the released model binaries.

0 replies

Tarek-Hasan · 2022-11-21T16:12:49Z

Tarek-Hasan
Nov 21, 2022

Sorry, I somehow included the wrong link. Here's the link to the voice models:
https://github.com/MycroftAI/mimic3-voices

1 reply

erogol Dec 5, 2022
Maintainer Author

It looks like they only released ONNX files, which unfortunately are not compatible with TTS. Some manual work is needed.
Thanks for sharing the link though 👍

gullabi · 2022-12-21T10:13:28Z

gullabi
Dec 21, 2022

I would like to present a new VITS multispeaker model trained by @GerrySant for Catalan within the framework of @projecte-aina. It was trained from scratch on 101460 utterances from 257 speakers, approx. 138 hours of speech. We used three datasets: Festcat and Google Catalan TTS (both TTS datasets) and also a part of Common Voice 8. It was trained with TTS v0.8.0.

Here are two examples of a male and a female voice.

f_occ_de.mp4

m_occ_88.mp4

The model is uploaded to Hugging Face with its own space to generate voices. We would also like the models to be accessible as part of the Coqui models.

1 reply

erogol Dec 26, 2022
Maintainer Author

@gullabi added the model to the latest release https://github.com/coqui-ai/TTS/releases/tag/v0.10.1_models

Just one note. You need to rename speakers.pth to speaker_ids.pth to be compatible with the tts command.

speakers.pth is for models trained with d-vectors.
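For anyone applying the rename above to their own checkpoint folder, a small sketch follows. The helper `rename_speaker_file` is hypothetical (it is not part of 🐸TTS), and the target name `speaker_ids.pth` is an assumption about what the `tts` command expects, so adjust it if your version differs. The demo runs against a temporary folder with an empty placeholder file.

```python
import tempfile
from pathlib import Path


def rename_speaker_file(model_dir, old_name="speakers.pth", new_name="speaker_ids.pth"):
    """Rename the speaker file inside model_dir and return the new path.

    Refuses to overwrite an existing target so an earlier file is not lost.
    """
    source = Path(model_dir) / old_name
    target = Path(model_dir) / new_name
    if not source.exists():
        raise FileNotFoundError(source)
    if target.exists():
        raise FileExistsError(target)
    source.rename(target)
    return target


# Demo on a throwaway folder; the empty file stands in for a real speaker file.
with tempfile.TemporaryDirectory() as model_dir:
    (Path(model_dir) / "speakers.pth").write_bytes(b"")
    renamed = rename_speaker_file(model_dir)
    renamed_name = renamed.name
    old_file_gone = not (Path(model_dir) / "speakers.pth").exists()

print(renamed_name)
```

If the model config stores the speaker file path, that entry presumably needs to point at the renamed file as well.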

Thanks again for sharing the model.

erogol · 2022-12-21T10:38:08Z

erogol
Dec 21, 2022
Maintainer Author

@gullabi thanks for sharing. I'll add the model asap.

0 replies

karim23657 · 2022-12-25T08:26:45Z

karim23657
Dec 25, 2022

I trained a glow_tts model for Persian language.
https://huggingface.co/Kamtera/persian-tts-female-glow_tts
Also I have created a dataset:
https://www.kaggle.com/datasets/magnoliasis/persian-tts-dataset-famale

3 replies

erogol Dec 25, 2022
Maintainer Author

Thanks for sharing. A new language 👍

erogol Dec 26, 2022
Maintainer Author

@karim23657 the model's audio quality is not very good since there is no vocoder. To fix that, here are my suggestions:

Retrain/fine-tune the model with audio parameters that are compatible with one of the released universal vocoders.
Train a separate vocoder model compatible with the model.
Train VITS instead of GlowTTS; VITS has a vocoder inside.

Let me know if you are interested in following one of the options. It'd be nice to bump up the model quality to make it useful.

PS. I've added the model to the latest releases regardless.

https://github.com/coqui-ai/TTS/releases/tag/v0.10.1_models

karim23657 Feb 1, 2023

I trained these VITS models:
https://huggingface.co/Kamtera/persian-tts-male-vits
https://huggingface.co/Kamtera/persian-tts-female-vits

Demo:
https://huggingface.co/spaces/Kamtera/Persian-tts-CoquiTTS

Models trained on these datasets :
https://www.kaggle.com/datasets/magnoliasis/persian-tts-dataset
https://www.kaggle.com/datasets/magnoliasis/persian-tts-dataset-famale

Other models(not very good): https://huggingface.co/Kamtera

mobassir94 · 2023-01-01T08:47:45Z

mobassir94
Jan 1, 2023

@erogol
I would like to share two Bangla TTS models trained using the fantastic 🐸 TTS.
I trained VITS and GlowTTS. GlowTTS didn't perform well, but VITS performed really well. I used the IITM IndicSpeech dataset and converted it to LJSpeech format to train the models with the 🐸 TTS API.
Please have a look at this repo: https://github.com/mobassir94/comprehensive-bangla-tts
Everything is public, including the trained weight files and logs. Please check this Colab inference demo code: https://github.com/mobassir94/comprehensive-bangla-tts/blob/main/Comprehensive_Bangla_Text_to_Speech_(TTS).ipynb

To the best of our knowledge (based on extensive Google searches, research, and human validation), the Bangla VITS TTS (text-to-speech) system that we trained and used for reading various Bangla tafsir / hadith is the highest-performing state-of-the-art (SOTA) Bangla neural voice cloning system released publicly and for free to date (Thursday, December 29, 2022). It beats earlier TTS systems like gtts, silero-tts, and indic-tts by a large margin in terms of quality.
I would like to thank the 🐸 TTS team. Your API helped me train and deploy the best Bangla TTS models with ease.

4 replies

erogol Apr 16, 2023
Maintainer Author

It's been a while but I did not forget about it. Hopefully this week I'll give your model a try and hopefully add it to the model zoo.

mobassir94 Apr 17, 2023

@erogol Bangla is a very complex language, and there are many complex issues with it. If you check this notebook -> https://github.com/mobassir94/comprehensive-bangla-tts/blob/main/Comprehensive_Bangla_Text_to_Speech_(TTS).ipynb you can see that I used import bangla, from bnnumerizer import numerize,
and from bnunicodenormalizer import Normalizer

These packages need to be used to process the input text before feeding it to the TTS model for inference. These preprocessing steps are really necessary.

erogol Apr 17, 2023
Maintainer Author

it was actually quite easy. I am adding the female and male models to the model zoo. I'll ping you to try it out.

erogol Apr 21, 2023
Maintainer Author

The models have gone out with the release 🚀

neural-loop · 2023-05-19T22:50:48Z

neural-loop
May 19, 2023

Started a project for creating collections of voices: https://huggingface.co/voices/

The first addition is https://huggingface.co/voices/VCTK_European_English_Females (trained at 85000 steps, might go a bit further later) A nice feature is the ability to quickly preview each of the voices based on the wav samples / README markdown.

I plan to go through and include all of the VCTK voices. It would be kind of cool to do one big VCTK dataset with the 110 English voices, but it would be so many dataset files that it'd feel like a bit of a slog to get through. I also notice that the voices kind of pull each other in different directions (in the VCTK European English Females, for example, the original YourTTS male-en-2 has taken on the properties of the female voices), so I figured that segmenting them by similarity might produce better outputs.

A couple of other community members on the Discord have been helping: one is setting up a space where people can test the voices out, and another tested the model and helped me iron out some issues in the config.json (which I think may still need a little work).

These can be added to the model zoo or list if wanted. The VCTK set will be 100% from VCTK & YourTTS training data so it should all be 'for sure' CC-by-4.0

I may add some custom curated voices that sit more in the gray area to the voices space on huggingface but it will be labeled clearly which ones were trained off of cc-by-4.0 or if they have manually sourced training.

0 replies

kungfooman · 2023-05-28T15:49:23Z

kungfooman
May 28, 2023

Does anyone happen to have a Norwegian voice model?

0 replies

vislupus · 2023-06-14T09:54:07Z

vislupus
Jun 14, 2023

I prepared a small dataset in a format similar to LJSpeech for Bulgarian and English. I can also add more audio files for an additional speaker if that will be helpful.

0 replies

alex73 · 2023-07-08T21:41:03Z

alex73
Jul 8, 2023

Hi. I would like to share a model for the Belarusian language trained on the Mozilla Common Voice dataset. License: CC-BY-SA 4.0
Please share it from https://coqui.gateway.scarf.sh/
Files for GlowTTS and HiFiGAN : https://drive.google.com/drive/folders/19OmP81aeOd2Xp7aP8aVSxhuXnR4s43QO?usp=sharing

11 replies

erogol Aug 1, 2023
Maintainer Author

You should add it as a separate backend like espeak and, in the worst case, call it from the terminal, again like espeak. You can use the espeak implementation under TTS/TTS/tts/utils/text as an example.
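The "call it from the terminal" fallback can be sketched as a thin wrapper that pipes text into an external command and reads the result from its standard output. The `CommandLinePhonemizer` class below is hypothetical and does not follow the actual backend interface in the repository; the demo command is a Python one-liner that upper-cases its input, standing in for a real phonemizer binary.

```python
import subprocess
import sys


class CommandLinePhonemizer:
    """Hypothetical wrapper: sends text to an external command via stdin
    and returns whatever the command prints to stdout."""

    def __init__(self, command):
        self.command = list(command)

    def phonemize(self, text):
        result = subprocess.run(
            self.command,
            input=text,
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if the tool exits with an error
        )
        return result.stdout.strip()


# Stand-in "tool": a Python one-liner that upper-cases stdin.
# A real setup would pass the phonemizer's own command and arguments instead.
demo_command = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

phonemizer = CommandLinePhonemizer(demo_command)
output = phonemizer.phonemize("hello world")
print(output)
```

Spawning a process per sentence is slow, which is presumably why this is described as the worst case rather than the preferred route.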

erogol Aug 1, 2023
Maintainer Author

Model sounds really good btw :)

alex73 Aug 28, 2023

@erogol Hi Eren. Did you include the Belarusian model in an official list? I don't see it in the "--list_models" output.

erogol Sep 4, 2023
Maintainer Author

Not yet; I needed the phonemizer, and it has just been merged. I'll add the model asap. Sorry, I am slow at working on open source among the other things on my list.

erogol Sep 4, 2023
Maintainer Author

#2922

karim23657 · 2023-08-15T11:14:17Z

karim23657
Aug 15, 2023

The repo below contains my models, demos, and training code for Persian TTS

https://github.com/karim23657/Persian-tts-coqui

5 replies

erogol Aug 21, 2023
Maintainer Author

Thanks for sharing. I'll try to add them to the model zoo asap.

erogol Sep 4, 2023
Maintainer Author

@karim23657 among all the models do you have a favorite model? It'd be better to add the best model only so people do not need to jump between models.

karim23657 Sep 19, 2023

@erogol these are the best: vits female (best) and vits male1 (best)

erogol Sep 25, 2023
Maintainer Author

Thanks I'll give them a try asap

UncleBob2 Nov 9, 2023

https://huggingface.co/spaces/ntt123/Vietnam-male-voice-TTS

Could you please add this Vietnamese model on Hugging Face?

omega3 · 2023-11-29T13:06:18Z

omega3
Nov 29, 2023

Can you share a good British English voice model? US English voices like 17: tts_models/en/ljspeech/vits--neon are great, but the British voices are not as good. As a point of reference, I propose voices from a dictionary.

0 replies

Replies: 42 comments · 36 replies
