
🤖 AUTO TTS #696

Closed
wants to merge 132 commits into from

Conversation

loganhart02
Contributor

This is something I made on my personal fork for quickly testing different models on my datasets, and I thought the same style would work well as a sort of recipe API. All the model configs are based either on pre-trained models or on the current recipes; I haven't tuned them much, and I have only tested each model for one epoch since my GPU is busy training something else. I also added a data loader tool that loads a dataset with the proper audio configs, likewise based on the pre-trained model configs. It's pretty easy to add more recipes. Let me know what you think and I can work on adding more; I'm also planning on making a vocoder trainer and a speaker encoder trainer.
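
For context, a rough sketch of how a recipe API like this might be used. The Examples class name appears later in the diff; every constructor argument and method name below is a guess, not the PR's actual signature:

from TTS.auto_tts.complete_recipes import Examples

# All argument and method names below are illustrative assumptions.
trainer_factory = Examples(
    data_path="datasets/LJSpeech-1.1",  # assumed dataset layout
    batch_size=32,
    epochs=1,        # one-epoch smoke test, as described above
    output_path="output/",
)
trainer = trainer_factory.ljspeech_tacotron2()  # pick a pre-baked recipe
trainer.fit()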

erogol and others added 5 commits July 27, 2021 10:08
* Update distribute.py

Simple fix to make distributed training work with python runner file

* Update distribute.py

* Fix linter errors
@CLAassistant

CLAassistant commented Aug 3, 2021

CLA assistant check
All committers have signed the CLA.

@erogol
Member

erogol commented Aug 4, 2021

This is quite impressive 🚀

Do you think we can also create a console endpoint so people can run these pre-defined recipes in the terminal? We can even call it AutoTTS :)

TTS/recipe_api/complete_recipes.py
@loganhart02
Contributor Author

This is quite impressive 🚀

Do you think we can also create a console endpoint so people can run these pre-defined recipes in the terminal? We can even call it AutoTTS :)

AutoTTS sounds way cooler than recipe API! I'm currently working on a Python script that lets you define everything, from the dataset you want to train on to the model you want to use, as command-line args; I'm just debugging the last bit of it. I'm also adding a notebook for people who use Google Colab.

@loganhart02 loganhart02 changed the title recipe_api auto tts Aug 6, 2021
…r sam dataset, also added options so users can turn on forward and location attention when they want to.
@erogol erogol changed the title auto tts 🤖 AUTO TTS Aug 10, 2021
@erogol erogol added the feature implementation Implementation of a new feature label Aug 10, 2021
@loganhart02
Contributor Author

I'm pretty content with this being the first version of AutoTTS for now. Obviously, as this repo grows and more models get implemented I'll keep adding to it, but right now I want to focus on a script to export models to ONNX and get them running on ONNX Runtime and TensorRT (I need to do this for my own project, so I might as well make a script everyone can use), and on batch inference. So I don't know when I'll add more to this; let me know if anything needs to be changed before it can be pushed to main.

@erogol
Member

erogol commented Aug 23, 2021

@loganhart420 I guess you are ready for the review, right? Do you plan any other changes?

@loganhart02
Contributor Author

loganhart02 commented Aug 23, 2021 via email

@erogol
Member

erogol commented Aug 25, 2021

Just a heads up, I am going to start reviewing the PR next week, hopefully after solving a bunch of bugs.

epochs=self.epochs,
)

def ljspeechAutoTts(
Member

Better to use snake_case instead of CamelCase to comply with the rest of the code base.

"""This is trainer for calling complete recipes based off public datasets.
all configs are based off pretrained model configs or the model papers.

usage:
Member

Better to use "Examples:" instead of "usage:".

It'd be nice to add "Args:" and type annotations in the docstrings too.
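
A minimal sketch of what such a docstring could look like (the method and its parameters here are illustrative):

def ljspeech_tacotron2(self, dataset_path: str, epochs: int = 1000) -> Trainer:
    """Train a Tacotron2 recipe on the LJSpeech dataset.

    Args:
        dataset_path (str): Path to the extracted LJSpeech dataset.
        epochs (int): Number of training epochs. Defaults to 1000.

    Returns:
        Trainer: A trainer ready to call fit() on.

    Examples:
        >>> trainer = auto_tts.ljspeech_tacotron2("datasets/LJSpeech-1.1")
        >>> trainer.fit()
    """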


def SamAccentureAutoTts(self, model_name, tacotron2_model_type, forward_attention=False, location_attention=True):
"""Tacotron2 recipes for the sam dataset, based off the pre trained model."""
if model_name == "tacotrn2":
Member

tacotron2

trainer = Trainer(args, config, output_path, c_logger, tb_logger)
return trainer

def vctkAutoTts(self, model_name, speaker_file, glowtts_encoder):
Member

Maybe, rather than defining a different function for each dataset, you could make the dataset an argument to the function. AFAIS, the only difference between the functions is the choice of dataset.
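
For instance, a minimal sketch of that refactor (the two helper methods are hypothetical stand-ins for the per-dataset logic):

def auto_tts(self, dataset_name: str, model_name: str, **model_kwargs):
    """Single entry point: the dataset becomes an argument."""
    dataset_config = self._dataset_config(dataset_name)  # hypothetical helper
    config = self._model_config(model_name, dataset_config, **model_kwargs)  # hypothetical helper
    return Trainer(args, config, output_path, c_logger, tb_logger)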

@@ -0,0 +1,14 @@
from TTS.auto_tts.complete_recipes import Examples
Member

Maybe call this AutoTrainer or just Trainer

def single_speaker_tacotron2_base(
self, audio, dataset, dla=0.25, pla=0.25, ga=5.0, forward_attn=True, location_attn=True
):
config = Tacotron2Config(
Member

How about fetching these configs from the real recipes, when they exist, to reduce the duplication?
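
One way to do that, assuming the recipe or pretrained model ships a JSON config (load_config is part of 🐸TTS; the path below is illustrative):

from TTS.config import load_config

# Reuse an existing recipe/pretrained config instead of re-declaring its values.
config = load_config("recipes/ljspeech/tacotron2-DDC/tacotron2-DDC.json")  # illustrative path
config.batch_size = 32  # then override only what AutoTTS exposes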


def ljspeech_speedy_speech(self, audio, dataset):
"""Base speedy speech model for ljpseech dataset."""
model_args = SpeedySpeechArgs(
Member

SpeedySpeech is tricky to train since it needs precomputed character durations. You either compute them externally or train a Tacotron model first to compute the durations. Maybe SpeedySpeech training should start with the Tacotron training and then compute the durations, but that also sounds like a lot of clutter in the code.
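
A rough outline of that two-stage flow; all three helpers are hypothetical stand-ins, not 🐸TTS functions:

def train_tacotron_for_alignments(dataset_path: str):
    """Stage 1: train a Tacotron model to obtain attention alignments."""
    raise NotImplementedError  # would wrap the Tacotron2 recipe above

def extract_durations(tacotron_model, dataset_path: str):
    """Stage 2: convert the alignments into per-character durations."""
    raise NotImplementedError

def train_speedy_speech(dataset_path: str, durations):
    """Stage 3: train SpeedySpeech on the precomputed durations."""
    raise NotImplementedError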

Contributor Author

Yeah, I saw that when testing it out. For now I'm just going to leave it out until I create a stable and clean experiment function.

@erogol
Member

erogol commented Oct 12, 2021

The PR looks awesome. The abstraction you put on top of the training is great, especially for non-technical users. I've put some comments above.

@erogol
Member

erogol commented Oct 12, 2021

Also, what is the use-case for AutoTTS in your mind? My thinking is that it mainly targets non-technical users who want to train a new model on a custom dataset. That means we don't actually know whether the default values are the best values for their dataset. Say they train the first model with the default values; what should the next step be? Do you have an idea?

@loganhart02
Contributor Author

loganhart02 commented Oct 12, 2021 via email

@erogol
Member

erogol commented Oct 14, 2021

Are you on the Gitter/Matrix channel?

@loganhart02
Contributor Author

loganhart02 commented Oct 14, 2021 via email

@erogol
Member

erogol commented Oct 14, 2021

It'd be nice to get on there so that we can talk in detail :)

@loganhart02 loganhart02 reopened this Oct 27, 2021
Conflicts:
	docs/source/tutorial_for_nervous_beginners.md
	recipes/ljspeech/fast_pitch/train_fast_pitch.py
	recipes/ljspeech/glow_tts/train_glowtts.py
	recipes/ljspeech/hifigan/train_hifigan.py
	recipes/ljspeech/multiband_melgan/train_multiband_melgan.py
	recipes/ljspeech/univnet/train.py
	recipes/ljspeech/vits_tts/train_vits.py
	recipes/ljspeech/wavegrad/train_wavegrad.py
	recipes/ljspeech/wavernn/train_wavernn.py
…ill working on adding more this is what I got so far
@loganhart02
Contributor Author

I put dataset downloaders in this PR. They can be bundled with this PR, but I can also make another PR for them if you want to merge them separately from this one.

self.epochs = epochs
self.manager = ModelManager()

def _single_speaker_from_pretrained(self, model_name):
Member

It'd be easier to use the ModelManager to parse the models from .models.json so we don't need to add models manually.
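
A minimal sketch of that idea, assuming list_models() returns the model names from .models.json (in some versions it only prints them):

from TTS.utils.manage import ModelManager

manager = ModelManager()
# Filter the released models out of .models.json instead of hard-coding them;
# names are assumed to look like "tts_models/<lang>/<dataset>/<model>".
ljspeech_models = [m for m in manager.list_models() if "/ljspeech/" in m]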

self.data_path = data_path
self.dataset_name = dataset

def single_speaker_autotts( # im actually going to change this to autotts_recipes and i'm making a more generic
Member

Even for personal notes

trainer = Trainer(args, config, output_path, c_logger, tb_logger)
return trainer

def multi_speaker_autotts(
Member

You can add the ForwardTTS model too for multi-speaker.
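
For example, a multi-speaker ForwardTTS (FastPitch) setup might look roughly like this; the module path and field names are recalled from the ForwardTTS args and may differ by version:

from TTS.tts.configs.fast_pitch_config import FastPitchConfig  # path may differ by version
from TTS.tts.models.forward_tts import ForwardTTSArgs

model_args = ForwardTTSArgs(
    use_speaker_embedding=True,  # learn a speaker embedding table
    num_speakers=109,            # e.g. VCTK
)
config = FastPitchConfig(model_args=model_args)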



def main():
parser = argparse.ArgumentParser()
Member

For argparsing over default values you can use Coqpit: https://github.com/coqui-ai/coqpit
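
A minimal sketch of Coqpit-based arg parsing (the config fields are illustrative; see the Coqpit README for the exact API):

from dataclasses import dataclass
from coqpit import Coqpit

@dataclass
class AutoTtsArgs(Coqpit):
    dataset: str = "ljspeech"
    model: str = "tacotron2"
    epochs: int = 1000
    mixed_precision: bool = False

if __name__ == "__main__":
    args = AutoTtsArgs()
    args.parse_args()  # overrides the defaults from --dataset, --model, ... flags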

# with each users data so im thinking of a way to have users define their own audio params with this


def pick_glowtts_encoder(encoder_name: str):
Member

Better to parse it from the code to prevent manual editing in the future.
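
For instance, a minimal registry-based sketch, so adding an encoder is a one-line change (the name mapping below is illustrative):

# Map user-facing names to GlowTTS encoder types (mapping is illustrative).
GLOWTTS_ENCODERS = {
    "transformer": "rel_pos_transformer",
    "gated": "gated_conv",
}

def pick_glowtts_encoder(encoder_name: str) -> str:
    try:
        return GLOWTTS_ENCODERS[encoder_name]
    except KeyError:
        raise ValueError(f"Unknown encoder '{encoder_name}'. Choose from {sorted(GLOWTTS_ENCODERS)}")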

@@ -0,0 +1,256 @@
import logging
Member

@erogol erogol Nov 3, 2021

I don't think this needs to be a class. We can define different functions for each dataset.

You can maybe add the datasets here: https://github.com/coqui-ai/TTS/blob/main/TTS/utils/downloaders.py

You should also create separate PRs for changes under 🐸TTS, as we are moving AutoTTS to a new repo.

Contributor Author

Yeah, I made this before you guys added the downloaders, so I'm going to make a new PR just adding functions for the other datasets.

@erogol
Member

erogol commented Nov 3, 2021

I put some comments on your changes. Let me know if you have any questions.

@stale

stale bot commented Dec 3, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

@stale stale bot added the wontfix This will not be worked on but feel free to help. label Dec 3, 2021
@erogol
Member

erogol commented Dec 7, 2021

I'm closing this as it is going to be a separate repo.

@erogol erogol closed this Dec 7, 2021
@king-dahmanus

Hey, just wanted to come here and say that Auto TTS is an actual TTS engine for Android. It basically allows for multilingual TTS by switching between TTS voices, so you might need to change the name.
