🤖 AUTO TTS #696
Conversation
* Update distribute.py: simple fix to make distributed training work with the python runner file
* Update distribute.py
* Fix linter errors
Fix json formatting error
…y only made ljspeech recipes
This is quite impressive 🚀 Do you think we can also create a console end-point so people can run these pre-defined recipes on the terminal? We can even call it AutoTTS :)
AutoTTS sounds wayyyyyyyy cooler than recipe API! I'm currently working on a python script to let you define everything, from the dataset you want to train on to the model you want to use, as command line args; I'm just debugging the last bit of it currently. I'm also adding a notebook for people who use Google Colab.
…r sam dataset, also added options so users can turn on forward and location attention when they want to.
…tead of calling a function) added more ljspeech vocoder models (these are just baselines though; once I have a free GPU I'm going to test them and see what can be changed to make them better.)
I'm pretty content with this being the first version of the auto tts for now. Obviously, as this repo grows and more models get implemented I'll continue to add to it, but right now I want to focus on making a script to export models to ONNX and get them to run on onnxruntime and TensorRT (I need to do this for my own project, so I might as well make a script everyone can use), and on batch inference. So I don't know when I'll add more to this; let me know if anything needs to be changed before it can be pushed to main.
@loganhart420 I guess you are up for the review, right? Do you plan any other changes?
As of right now, no. I'm working on other features before I change anything to this.
Just a heads up, I'm going to start reviewing the PR next week, hopefully after solving a bunch of bugs.
TTS/auto_tts/complete_recipes.py (outdated)
```python
        epochs=self.epochs,
    )

def ljspeechAutoTts(
```
Better to use `_` (snake_case) notation instead of Camel notation, to comply with the rest of the code base.
TTS/auto_tts/complete_recipes.py (outdated)
```python
"""This is trainer for calling complete recipes based off public datasets.
all configs are based off pretrained model configs or the model papers.

usage:
```
Use `Examples:` instead of `usage:`. It'd be nice to add an `Args:` section and type annotations in the docstrings too.
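For instance, a minimal sketch of what the suggested layout could look like (the function name, args, and class here are illustrative, not the PR's actual API):

```python
def ljspeech_auto_tts(self, model_name: str, epochs: int = 1000):
    """Train one of the pre-defined LJSpeech recipes.

    Args:
        model_name (str): Recipe to train, e.g. "tacotron2" or "glow_tts".
        epochs (int): Number of training epochs. Defaults to 1000.

    Examples:
        >>> trainer = Examples(data_path="...").ljspeech_auto_tts("tacotron2")
        >>> trainer.fit()
    """
```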
TTS/auto_tts/complete_recipes.py (outdated)
```python
def SamAccentureAutoTts(self, model_name, tacotron2_model_type, forward_attention=False, location_attention=True):
    """Tacotron2 recipes for the sam dataset, based off the pre trained model."""
    if model_name == "tacotrn2":
```
Typo: `tacotrn2` should be `tacotron2`.
TTS/auto_tts/complete_recipes.py (outdated)
```python
    trainer = Trainer(args, config, output_path, c_logger, tb_logger)
    return trainer

def vctkAutoTts(self, model_name, speaker_file, glowtts_encoder):
```
Maybe rather than defining a different function for each dataset you can make the dataset an argument to the function. AFAIS, the only difference between the functions is the choice of dataset.
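A minimal sketch of that refactor (the helper names here are hypothetical, and the Trainer construction stays as in the PR):

```python
def auto_tts(self, model_name: str, dataset: str, **kwargs):
    """Single entry point: the dataset is an argument instead of part of the function name."""
    # hypothetical per-dataset config builders, factored out of the current methods
    config_builders = {
        "ljspeech": self._ljspeech_config,
        "vctk": self._vctk_config,
        "sam": self._sam_config,
    }
    if dataset not in config_builders:
        raise ValueError(f"No recipe defined for dataset: {dataset}")
    config = config_builders[dataset](model_name, **kwargs)
    return config  # build the Trainer from this config as the existing methods do
```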
TTS/auto_tts/example.py (outdated)
```python
from TTS.auto_tts.complete_recipes import Examples
```
Maybe call this `AutoTrainer` or just `Trainer`.
```python
def single_speaker_tacotron2_base(
    self, audio, dataset, dla=0.25, pla=0.25, ga=5.0, forward_attn=True, location_attn=True
):
    config = Tacotron2Config(
```
How about fetching these configs from the real recipes, where they exist, to reduce the duplication?
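One possible shape for this, as a sketch: `load_config` is the 🐸TTS config loader, but the recipe path here is hypothetical and assumes the recipe ships a JSON config:

```python
from TTS.config import load_config

# Load the config that ships with the existing recipe instead of
# re-declaring Tacotron2Config here, so the two cannot drift apart.
config = load_config("recipes/ljspeech/tacotron2-DDC/tacotron2-DDC.json")  # hypothetical path
config.epochs = self.epochs  # then apply AutoTTS-level overrides on top
```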
TTS/auto_tts/model_hub.py (outdated)
```python
def ljspeech_speedy_speech(self, audio, dataset):
    """Base speedy speech model for ljpseech dataset."""
    model_args = SpeedySpeechArgs(
```
SpeedySpeech is tricky to train since it needs character durations precomputed. You either compute them externally or train a Tacotron model first to compute the durations. Maybe SpeedySpeech training should start with the Tacotron training and compute the durations from it. But that also sounds like a lot of clutter in the code.
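For reference, one common way to derive the durations from a trained Tacotron is to count, for each input character, how many decoder frames attend to it most strongly. A minimal numpy sketch, assuming a (decoder_steps, encoder_steps) alignment matrix:

```python
import numpy as np

def alignment_to_durations(alignment: np.ndarray) -> np.ndarray:
    """Turn a Tacotron attention map of shape (decoder_steps, encoder_steps)
    into per-character durations measured in decoder frames."""
    winners = alignment.argmax(axis=1)  # the character each decoder frame attends to most
    return np.bincount(winners, minlength=alignment.shape[1])
```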
Yeah, I saw that when testing it out. For now I'm just going to leave it out until I create a stable and clean experiment function.
The PR looks awesome. The abstraction you put on the training is great, especially for non-technical users. I've put some comments above.
Also, what is the use-case for AutoTTS in your mind? My thinking is that it targets especially non-technical users who want to train a new model on a custom dataset. That means we don't actually know if the default values are the best values for their dataset. Let's say they train the first model with the default values; then what should be the next step? Do you have an idea?
Hey, I've read through the comments you made and plan on making the necessary changes within the week.

For the data formatters, I agree with just trying to get users to format their data into the LJSpeech format; I'll probably make some helper functions to make that process easier if necessary, since this is a higher-level abstraction for non-technical users. Making auto tts a separate repo is probably a good idea because, although I've only added small features in this PR, I've got a million ideas and features I want to add.

You are correct that this is a platform for non-technical users to fine-tune on their data. I've only had a computer for a little over a year, so I've only been learning programming and deep learning for that long, and I don't know how correct this is, but from what I've seen most models are pretty standard and you only have to tweak certain parameters to get them to train well on new data (correct me if I'm wrong about this). So my end goal is to have those parameters mutable, with defaults for all of them: a user can train using the defaults, and if the results are bad, tweak parameters until the model produces good audio. I also plan on making some tools to automatically find good parameters, for example a learning rate finder (what you see in PyTorch Lightning and fastai).

Obviously I have a million ideas for helper tools and features, but I'm only going to implement them as I progress through building the whole platform. I also like the idea of making it a separate repo because I've always wanted more people to contribute, so people can give feedback and add their own features that I otherwise wouldn't have thought of, if that makes sense.
Are you on the Gitter/Matrix channel?
I am not. If you'd like me to get on it, I'd be more than happy to.
It'd be nice to get on there so that we can talk in detail :)
…added functions to pick forward tts encoder and decoder
Conflicts:
* docs/source/tutorial_for_nervous_beginners.md
* recipes/ljspeech/fast_pitch/train_fast_pitch.py
* recipes/ljspeech/glow_tts/train_glowtts.py
* recipes/ljspeech/hifigan/train_hifigan.py
* recipes/ljspeech/multiband_melgan/train_multiband_melgan.py
* recipes/ljspeech/univnet/train.py
* recipes/ljspeech/vits_tts/train_vits.py
* recipes/ljspeech/wavegrad/train_wavegrad.py
* recipes/ljspeech/wavernn/train_wavernn.py
…ill working on adding more this is what I got so far
I put the dataset downloaders in this PR. They can be bundled with this PR, but I can also make another PR for them if you want to merge them separately from this one.
```python
    self.epochs = epochs
    self.manager = ModelManager()

def _single_speaker_from_pretrained(self, model_name):
```
It'd be easier to use the `ModelManager` to parse the models from `.models.json`, so we don't need to add models manually.
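A sketch of that idea; it assumes `ModelManager()` defaults to the packaged `.models.json` and that `list_models()` returns the parsed model names:

```python
from TTS.utils.manage import ModelManager

manager = ModelManager()  # assumed to default to the packaged .models.json
model_names = manager.list_models()  # e.g. "tts_models/en/ljspeech/tacotron2-DDC"
# filter, instead of maintaining a hand-written list of models
ljspeech_models = [name for name in model_names if "ljspeech" in name]
```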
```python
    self.data_path = data_path
    self.dataset_name = dataset

def single_speaker_autotts(  # im actually going to change this to autotts_recipes and i'm making a more generic
```
For docstrings, use this format: https://numpydoc.readthedocs.io/en/latest/format.html
Even for personal notes.
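In numpydoc format, that method header could look roughly like this (the signature is illustrative):

```python
def single_speaker_autotts(self, model_name, dataset):
    """Train a single-speaker model on a supported dataset.

    Parameters
    ----------
    model_name : str
        Name of the model recipe, e.g. ``"glow_tts"``.
    dataset : str
        Name of the dataset, e.g. ``"ljspeech"``.

    Returns
    -------
    Trainer
        A configured trainer, ready for ``fit()``.
    """
```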
```python
    trainer = Trainer(args, config, output_path, c_logger, tb_logger)
    return trainer

def multi_speaker_autotts(
```
You can add the ForwardTTS model for multi-speaker too.
```python
def main():
    parser = argparse.ArgumentParser()
```
For argparsing over default values you can use Coqpit: https://github.com/coqui-ai/coqpit
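A minimal sketch of that, assuming Coqpit's dataclass-style configs and its `parse_args()` method (the field names here are hypothetical):

```python
from dataclasses import dataclass

from coqpit import Coqpit

@dataclass
class AutoTtsArgs(Coqpit):
    model_name: str = "tacotron2"
    dataset: str = "ljspeech"
    epochs: int = 1000

def main():
    args = AutoTtsArgs()
    # CLI flags override the defaults above; the exact flag spelling
    # (e.g. --model_name vs --coqpit.model_name) depends on the Coqpit version
    args.parse_args()
    print(args.model_name, args.dataset, args.epochs)

if __name__ == "__main__":
    main()
```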
```python
# with each users data so im thinking of a way to have users define their own audio params with this

def pick_glowtts_encoder(encoder_name: str):
```
Better to parse it from the code again, to prevent manual editing in the future.
```python
import logging
```
I don't think this needs to be a class. We can define different functions for each dataset.
You can maybe add the datasets here: https://github.com/coqui-ai/TTS/blob/main/TTS/utils/downloaders.py
You should also create separate PRs for changes under 🐸TTS, as we are moving AutoTTS to a new repo.
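Following the pattern in `TTS/utils/downloaders.py`, a new dataset helper could be sketched like this; the dataset name and URL are placeholders, and the exact helper signatures in `TTS.utils.download` should be double-checked against the repo:

```python
import os

from TTS.utils.download import download_url, extract_archive

def download_my_dataset(path: str):
    """Download and extract a (hypothetical) dataset into `path`."""
    os.makedirs(path, exist_ok=True)
    url = "https://example.com/my_dataset.tar.gz"  # placeholder URL
    download_url(url, path)
    archive = os.path.join(path, os.path.basename(url))
    print(" > Extracting archive file...")
    extract_archive(archive)
```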
Yeah, I made this before you guys added the downloaders, so I'm going to make a new PR just adding functions for the other datasets.
I put some comments on your changes. Let me know if you have any questions.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also check out our discussion channels.
I'm closing this as it's going to be a separate repo.
Hey, just wanted to come here and say that AutoTTS is the name of an actual TTS engine for Android. It basically allows for multilingual TTS by switching between TTS voices. So you might need to change the name.
This is just something I made on my personal fork for really quick testing of different models on my datasets, and I thought this same style would be cool for a sort of recipe API. All the model configs are either based off pre-trained models or current recipes; I haven't really messed around with them, and I have only tested models for one epoch as my GPU is training something else. I also added a data loader tool that will load the dataset with the proper audio configs; that is also just based off pre-trained model configs. It's pretty easy to add more recipes. Let me know what you think and I can work on adding more recipes. I'm also planning on making a vocoder trainer and a speaker encoder trainer.