add 'handle multi-speaker and GST inference' in synthesizer class #3
Hi everyone!
This week, I worked on three things:
- training a better Mandarin Chinese model for 126k epochs (you can see the associated Google Colab here)
- handling multi-speaker and GST inference in the `Synthesizer` class (used in `server.py` and in the Chinese Google Colab mentioned in the first point). You can now pass two optional parameters to the `Synthesizer.tts()` method: `speaker_json_key` and `style_wav` (see the usage sketch after this list). `speaker_json_key` is the key of one of the speakers in the provided `speakers.json`. `style_wav` is either a path to a wav file for GST style transfer, or a dict of style-token weights such as `{"token1": 0.25, "token2": -0.1, ...}`. The next step is to also let the user directly provide an optional `speaker_embedding` parameter (as a numpy array or a list?) that will be passed to Tacotron at inference time.
- adding type annotations and refactoring some of the functions and methods used by the `Synthesizer` class. I've also added one abstract class for TTS models and one for vocoder models, to get better hinting from editors when working with models (an illustrative sketch is shown below).
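
Here is a minimal usage sketch of the two new `tts()` options. Only `Synthesizer.tts()`, `speaker_json_key`, and `style_wav` come from this PR; the import path, the constructor arguments, and all file paths and key names are assumptions standing in for a real setup.

```python
# Minimal usage sketch of the new options. Only Synthesizer.tts(),
# speaker_json_key and style_wav come from this PR; the import path and
# constructor arguments are assumptions modeled on how server.py loads
# models, and all file paths and the speaker key are placeholders.
from TTS.utils.synthesizer import Synthesizer

synthesizer = Synthesizer(
    tts_checkpoint="tts_model.pth.tar",
    tts_config="tts_config.json",
    vocoder_checkpoint="vocoder_model.pth.tar",
    vocoder_config="vocoder_config.json",
)

# Multi-speaker inference: select a speaker by its key in speakers.json.
wav = synthesizer.tts("你好，世界。", speaker_json_key="speaker_0")

# GST style transfer from a reference wav file...
wav = synthesizer.tts("你好，世界。", style_wav="reference_style.wav")

# ...or by weighting the style tokens directly with a dict.
wav = synthesizer.tts(
    "你好，世界。",
    speaker_json_key="speaker_0",
    style_wav={"token1": 0.25, "token2": -0.1},
)
```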
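
And a sketch of the abstract-model idea from the last point. The class and method names below are illustrative only (the PR doesn't show the actual interfaces); the point is that typed base classes let editors hint and check the objects the `Synthesizer` handles.

```python
# Illustrative sketch only: two abstract interfaces, one per model family,
# so code that mixes TTS and vocoder models gets proper editor hinting.
# These names and signatures are assumptions, not the PR's actual classes.
from abc import ABC, abstractmethod

import torch


class BaseTTSModel(ABC):
    """What the Synthesizer expects from any text-to-spectrogram model."""

    @abstractmethod
    def inference(self, text_ids: torch.Tensor) -> torch.Tensor:
        """Return a mel spectrogram for a batch of encoded characters."""


class BaseVocoderModel(ABC):
    """What the Synthesizer expects from any spectrogram-to-waveform model."""

    @abstractmethod
    def inference(self, mel: torch.Tensor) -> torch.Tensor:
        """Return a waveform for the given mel spectrogram."""


def synthesize(tts_model: BaseTTSModel, vocoder: BaseVocoderModel,
               text_ids: torch.Tensor) -> torch.Tensor:
    # Thanks to the annotations, editors can autocomplete .inference()
    # and flag calls that pass the models in the wrong order.
    return vocoder.inference(tts_model.inference(text_ids))
```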
The `Synthesizer` class is now easier to use, and this Google Colab shows that it reduces the number of lines needed to get working generation samples.
I look forward to your reviews :)