Dev pr2: handle multi-speaker and GST in synthesizer class #5
Sorry for being slow. I'll check the PR by tomorrow at the latest.
No problem! Just let me know if you have any requests for modifications :)
I think the only immediate requirement is writing some test code for the synthesizer. I'll write one for the current synthesizer in the dev branch; then you can rebase and add more for the multi-speaker and GST changes you made.
OK, we already have … Can you implement test cases for your changes (GST and multi-speaker)?
Yes! I will implement this and push the changes at the beginning of next week :)
@kirianguiller any updates? |
Yes, sorry, I've had some busy weeks. Thanks for the reminder, though. I will implement the tests for the code I added and resolve the new conflicts.
@kirianguiller I am also working on this PR. It may be better if you wait for me to push my updates. I also rebased onto the latest dev. I'll ping you.
Oh cool! Thank you. I'll wait for your changes then :)
I'm closing this in favor of #441.
Hi guys,
Here is the second split of the PR I opened earlier this week.
This new content handles multi-speaker and GST inference in the Synthesizer class (which is used in server.py and in the Google Colab for Chinese that I mentioned in the first point). You can now pass the following two optional parameters to the Synthesizer.tts() method: speaker_json_key and style_wav.
speaker_json_key is the key of one of the speakers in the provided speakers.json. style_wav is either a path to a wav file for GST style transfer, or a dict of GST token weights such as {"token1": 0.25, "token2": -0.1, ...}. The next step is to also let the user directly provide an optional speaker_embedding parameter (as a numpy array or a list?) that will be passed to Tacotron at inference time.
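For illustration, here is a minimal sketch of how these optional inputs could be validated and dispatched before reaching the model. It is not the code in this PR: `prepare_inputs`, `lookup_speaker_embedding` and `resolve_style` are hypothetical helpers, and the sketch assumes each speakers.json entry stores its vector under an `"embedding"` field (the real layout may differ). The `speaker_embedding` branch models the proposed next step; it accepts a list or a 1-D numpy array, since iterating either yields values that `float()` accepts.

```python
from typing import Dict, Optional, Sequence, Tuple, Union

# A GST style is either a reference wav path or a dict of token weights.
StyleInput = Union[str, Dict[str, float]]


def lookup_speaker_embedding(speakers: Dict[str, dict], speaker_json_key: str) -> list:
    """Return the embedding stored under speaker_json_key in a loaded speakers.json.

    Assumption (hypothetical layout): each entry looks like {"embedding": [...]}.
    """
    if speaker_json_key not in speakers:
        raise KeyError(f"unknown speaker key: {speaker_json_key!r}")
    return [float(x) for x in speakers[speaker_json_key]["embedding"]]


def resolve_style(style_wav: Optional[StyleInput]) -> Tuple[str, object]:
    """Tag style_wav as a reference wav path or as GST token weights."""
    if style_wav is None:
        return ("none", None)
    if isinstance(style_wav, str):
        return ("wav", style_wav)  # path to a reference wav for style transfer
    if isinstance(style_wav, dict):
        # Normalize token names to str and weights to float.
        return ("tokens", {str(k): float(v) for k, v in style_wav.items()})
    raise TypeError("style_wav must be a wav path (str) or a dict of token weights")


def prepare_inputs(
    speakers: Dict[str, dict],
    speaker_json_key: Optional[str] = None,
    style_wav: Optional[StyleInput] = None,
    speaker_embedding: Optional[Sequence[float]] = None,
) -> dict:
    """Hypothetical pre-processing step for the optional multi-speaker/GST inputs."""
    if speaker_json_key is not None and speaker_embedding is not None:
        raise ValueError("pass speaker_json_key or speaker_embedding, not both")
    embedding = None
    if speaker_json_key is not None:
        embedding = lookup_speaker_embedding(speakers, speaker_json_key)
    elif speaker_embedding is not None:
        # Works for a list or a 1-D numpy array alike.
        embedding = [float(x) for x in speaker_embedding]
    return {"speaker_embedding": embedding, "style": resolve_style(style_wav)}
```

For example, `prepare_inputs({"speaker_a": {"embedding": [0.1, 0.2]}}, speaker_json_key="speaker_a", style_wav={"token1": 0.25, "token2": -0.1})` yields the stored embedding plus a `("tokens", {...})` style. Making speaker_json_key and speaker_embedding mutually exclusive is a design choice assumed here, not something decided in this PR.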
The Synthesizer class is now simpler to use, and, as this Google Colab shows, it reduces the number of lines needed to generate working samples.
Thanks :)