Feedback: My experience using this (very impressive) project #360

plummet555 · 2020-06-11T17:01:12Z

Not an issue - just thought my experiences may be of use to others using this project.

I used it to build an Alexa skill that will read out Trump's latest tweet using his simulated voice. It's currently in certification - will be called 'Robo Trump AI Tweets' when it is published.

I think the results are pretty good. I had to do a few things though:

Used part of @pulsaleith's fork to allow the code to run without a GPU (as I host it on AWS EC2). I cherry-picked the relevant commit as I couldn't get the tip of that fork to run.
Ran export CUDA_VISIBLE_DEVICES="" as well, otherwise it still won't run without a GPU.
Set hardcoded seeds in Tacotron2, otherwise the results are very inconsistent.
I found that there would often be a harsh pop or other artifact at the start of the audio. I did a lot of experimenting with that. In the end, I added the word 'clip' to the start of every input sentence, then removed it from the output with silence detection (find the first gap in the output audio).
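
For anyone wanting to try the same workaround, here is a minimal sketch of the "find the first gap" idea. It is not the code used in the skill. The function name, the amplitude threshold, and the minimum gap length are all hypothetical and would need tuning, and it assumes the audio is a sequence of float samples in the range -1.0 to 1.0.

```python
from typing import List, Sequence


def trim_through_first_gap(
    samples: Sequence[float],
    sample_rate: int,
    threshold: float = 0.02,   # assumed: amplitudes below this count as silence
    min_gap_s: float = 0.08,   # assumed: shortest quiet run that counts as a gap
) -> List[float]:
    """Drop everything up to the end of the first silent gap that follows sound.

    The intent is to cut off a sacrificial leading word (such as 'clip'),
    together with any pop that precedes it.
    """
    min_gap = max(1, int(min_gap_s * sample_rate))
    heard_sound = False   # becomes True once the sacrificial word has started
    run_start = None      # index where the current quiet run began

    for i, sample in enumerate(samples):
        if abs(sample) < threshold:
            # Quiet sample: open a quiet run if one is not already open.
            if run_start is None:
                run_start = i
            continue
        # Loud sample: if it ends a long enough quiet run that came after
        # some sound, the real sentence starts here.
        if heard_sound and run_start is not None and i - run_start >= min_gap:
            return list(samples[i:])
        heard_sound = True
        run_start = None

    # No gap found: return the audio unchanged rather than guess.
    return list(samples)
```

Leading silence before the sacrificial word is ignored, because a gap only counts once some sound has been heard. If no gap is found, the audio comes back untouched, so a failed detection leaves the extra word in rather than cutting real speech.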

plummet555 · 2020-06-11T17:03:24Z

By the way, @CorentinJ, I'd be happy to credit your project or resemble.ai, whichever you prefer, in the skill description. It's non-commercial (i.e. free).

plummet555 · 2020-06-14T11:23:20Z

Skill store links for those who are interested in the output:

US store: https://www.amazon.com/dp/B08B59XJLY
UK store: https://www.amazon.co.uk/dp/B08B3ZQ6SP

I've posted about it in a few Reddit forums and referenced this project.

ghost · 2020-06-30T04:53:06Z

Thanks for sharing @plummet555 ! I have a question and a suggestion:

> Set hardcoded seeds in Tacotron2, otherwise the results are very inconsistent

Can you please share the changes? This would help me for #384 where I am trying to make the output more consistent.

> I found that often there would be a harsh pop or other artifact at the start of the audio.

You can try this vocoder model with additional training: #126 (comment). The speech quality is nearly identical, but I find it cuts down on these types of artifacts. If you try it out, I'd like to know whether it worked for you.

ghost · 2020-07-08T17:57:02Z

Closing this issue due to inactivity; feel free to reopen. I would still appreciate an answer on how to set the Tacotron2 hardcoded seeds for better repeatability.

plummet555 · 2020-07-08T18:34:18Z

Hi @blue-fish - sorry I forgot to reply earlier.

I've just shared my repo with you. I didn't make it public as it is a bit messy but hopefully it will help you.

Look at the changes to tacotron2 in the 'silence detection and seeding' commit.
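
Since that repo is not public, here is a rough illustration of what fixing seeds generally looks like. It is a sketch under assumptions, not the changes in that commit: the helper name `seed_everything` and the default seed are hypothetical, and which generators actually matter depends on how the synthesizer is built.

```python
import random


def seed_everything(seed: int = 1234) -> None:
    """Seed Python's RNG, plus NumPy and TensorFlow when they are installed."""
    random.seed(seed)

    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed, nothing to seed

    try:
        import tensorflow as tf
        if hasattr(tf.random, "set_seed"):
            # TensorFlow 2.x global seed
            tf.random.set_seed(seed)
        else:
            # TensorFlow 1.x graph-level seed for the current default graph
            tf.compat.v1.set_random_seed(seed)
    except ImportError:
        pass  # TensorFlow not installed, nothing to seed
```

With TensorFlow 1.x the seed applies to the default graph, so the call would need to happen after any graph reset and before the model's ops are created. Calling it again with the same value before each synthesis is one way to aim for repeatable output for the same input text.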

ghost · 2020-07-08T20:13:49Z

Thank you @plummet555 ! I found the changes you were describing. I'll add a note to #384 .

In case you missed my comment above about the updated vocoder model (#126 (comment)), you can try plugging it in (no change to hparams needed) and see whether the audio quality improves. I've noticed fewer artifacts but no difference in the voice.

ghost closed this as completed Jul 8, 2020

ghost mentioned this issue Jul 20, 2020

Questions about the toolbox from @mbdash #433

