add 'handle multi-speaker and GST inference' in synthesizer class #3
Hi everyone!
This week, I worked on three things:
- training a better Mandarin Chinese model for 126k epochs (you can see the associated Google Colab here)
- handling multi-speaker and GST inference in the `Synthesizer` class (used in `server.py` and in the Chinese Google Colab mentioned in the first point). You can now pass two optional parameters to the `Synthesizer.tts()` method: `speaker_json_key` and `style_wav` (see the usage sketch after this list). `speaker_json_key` is the key of one of the speakers in the provided `speakers.json`. `style_wav` is either a path to a wav file for GST style transfer, or a dict of style-token weights such as `{"token1": 0.25, "token2": -0.1, ...}`. The next step is to also let the user directly provide an optional `speaker_embedding` parameter (as a numpy array or a list?) that will be passed to Tacotron at inference time.
- adding type annotations and refactoring some of the functions and methods used by the `Synthesizer` class. I've also added one abstract class for TTS models and one for vocoder models, to get better hinting from editors when working with models (an illustrative sketch is shown below).
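
Here is a minimal usage sketch of the two new `tts()` options. Only `Synthesizer.tts()`, `speaker_json_key`, and `style_wav` come from this PR; the import path, the constructor arguments, and all file paths and key names are assumptions standing in for a real setup.

```python
# Minimal usage sketch of the new options. Only Synthesizer.tts(),
# speaker_json_key and style_wav come from this PR; the import path and
# constructor arguments are assumptions modeled on how server.py loads
# models, and all file paths and the speaker key are placeholders.
from TTS.utils.synthesizer import Synthesizer

synthesizer = Synthesizer(
    tts_checkpoint="tts_model.pth.tar",
    tts_config="tts_config.json",
    vocoder_checkpoint="vocoder_model.pth.tar",
    vocoder_config="vocoder_config.json",
)

# Multi-speaker inference: select a speaker by its key in speakers.json.
wav = synthesizer.tts("你好，世界。", speaker_json_key="speaker_0")

# GST style transfer from a reference wav file...
wav = synthesizer.tts("你好，世界。", style_wav="reference_style.wav")

# ...or by weighting the style tokens directly with a dict.
wav = synthesizer.tts(
    "你好，世界。",
    speaker_json_key="speaker_0",
    style_wav={"token1": 0.25, "token2": -0.1},
)
```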
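
And a sketch of the abstract-model idea from the last point. The class and method names below are illustrative only (the PR doesn't show the actual interfaces); the point is that typed base classes let editors hint and check the objects the `Synthesizer` handles.

```python
# Illustrative sketch only: two abstract interfaces, one per model family,
# so code that mixes TTS and vocoder models gets proper editor hinting.
# These names and signatures are assumptions, not the PR's actual classes.
from abc import ABC, abstractmethod

import torch


class BaseTTSModel(ABC):
    """What the Synthesizer expects from any text-to-spectrogram model."""

    @abstractmethod
    def inference(self, text_ids: torch.Tensor) -> torch.Tensor:
        """Return a mel spectrogram for a batch of encoded characters."""


class BaseVocoderModel(ABC):
    """What the Synthesizer expects from any spectrogram-to-waveform model."""

    @abstractmethod
    def inference(self, mel: torch.Tensor) -> torch.Tensor:
        """Return a waveform for the given mel spectrogram."""


def synthesize(tts_model: BaseTTSModel, vocoder: BaseVocoderModel,
               text_ids: torch.Tensor) -> torch.Tensor:
    # Thanks to the annotations, editors can autocomplete .inference()
    # and flag calls that pass the models in the wrong order.
    return vocoder.inference(tts_model.inference(text_ids))
```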
The `Synthesizer` class is now easier to use, and this Google Colab shows that it reduces the number of lines needed to get working generation samples.
I look forward to your reviews :)