Feedback: My experience using this (very impressive) project #360

Closed
plummet555 opened this issue Jun 11, 2020 · 6 comments

@plummet555

Not an issue - just thought my experiences may be of use to others using this project.

I used it to build an Alexa skill that will read out Trump's latest tweet using his simulated voice. It's currently in certification - will be called 'Robo Trump AI Tweets' when it is published.

I think the results are pretty good. I had to do a few things though:

  1. Used part of @pulsaleith's fork to allow the code to run without a GPU (as I host it on AWS EC2). I cherry-picked the relevant commit as I couldn't get the tip of that fork to run.
  2. Ran export CUDA_VISIBLE_DEVICES="" before starting; otherwise it still won't run without a GPU
  3. Set hardcoded seeds in Tacotron2; otherwise the results are very inconsistent
  4. I found that often there would be a harsh pop or other artifact at the start of the audio. I did a lot of experimenting with that. In the end, I added the word 'clip' to the start of every input sentence, then removed it from the output with silence detection (find the first gap in the output audio)
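
The 'clip' trick in step 4 can be sketched roughly as below. This is only an illustration of the idea (find the first silent gap after the first burst of speech and keep what follows), not the code used in the skill. The function name `trim_leading_word`, the amplitude threshold, and the minimum gap length are all assumptions chosen for the example; real audio would likely need tuning.

```python
from typing import List, Sequence


def trim_leading_word(samples: Sequence[float],
                      sample_rate: int,
                      threshold: float = 0.02,
                      min_gap_s: float = 0.05) -> List[float]:
    """Drop everything up to the end of the first silent gap that follows speech.

    Hypothetical helper: threshold and min_gap_s are illustrative guesses,
    not values taken from this thread.
    """
    min_gap = max(1, int(min_gap_s * sample_rate))  # gap length in samples
    seen_speech = False   # True once the throwaway word has started
    quiet_run = 0         # length of the current run of quiet samples
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if seen_speech and quiet_run >= min_gap:
                # First loud sample after a long enough gap:
                # the real sentence starts here.
                return list(samples[i:])
            seen_speech = True
            quiet_run = 0
        else:
            quiet_run += 1
    # No gap found: return the audio unchanged rather than guess.
    return list(samples)


if __name__ == "__main__":
    sr = 1000
    # leading silence, throwaway word, gap, then the real sentence
    audio = [0.0] * 10 + [0.5] * 100 + [0.0] * 80 + [0.3] * 200
    print(len(trim_leading_word(audio, sr)))
```

Returning the input untouched when no gap is found is a deliberate choice here: cutting at an arbitrary point would risk removing part of the real sentence.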

@plummet555

By the way, @CorentinJ, I'd be happy to credit your project or resemble.ai in the skill description, whichever you prefer. The skill is non-commercial (i.e. free).

@plummet555

Skill store links for those who are interested in the output:

US store: https://www.amazon.com/dp/B08B59XJLY
UK store: https://www.amazon.co.uk/dp/B08B3ZQ6SP

I've posted about it in a few Reddit forums and referenced this project.

@ghost

ghost commented Jun 30, 2020

Thanks for sharing @plummet555 ! I have a question and a suggestion:

  3. Set hardcoded seeds in Tacotron2, otherwise the results are very inconsistent

Can you please share the changes? This would help me for #384 where I am trying to make the output more consistent.

  4. I found that often there would be a harsh pop or other artifact at the start of the audio.

You can try this vocoder model with additional training: #126 (comment) The speech quality is nearly identical but I find it cuts down on these types of artifacts. If you try it out I'd like to know if it worked for you.

@ghost

ghost commented Jul 8, 2020

Closing this issue due to inactivity, feel free to reopen. I would appreciate an answer on how to set the tacotron2 hardcoded seeds for better repeatability.

@ghost ghost closed this as completed Jul 8, 2020

@plummet555

Hi @blue-fish - sorry I forgot to reply earlier.

I've just shared my repo with you. I didn't make it public as it is a bit messy but hopefully it will help you.

Look at the changes to tacotron2 in the 'silence detection and seeding' commit.
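
For readers without access to that repo, here is a generic sketch of what fixing seeds and hiding the GPU can look like in Python. It is not the code from the commit mentioned above: the function name `seed_everything` and the seed value are assumptions made for illustration. In TensorFlow 1.x the graph-level seed generally needs to be set before the model's ops are created, and seeding alone may not make every op fully deterministic.

```python
import os
import random

# Hide all GPUs. This has to be set before the deep-learning
# framework is imported for it to take effect.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

SEED = 1234  # arbitrary illustrative value, not the one used in the commit


def seed_everything(seed: int = SEED) -> None:
    """Seed every random number generator available in this environment."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed: nothing to seed
    try:
        import tensorflow as tf
    except ImportError:
        return  # TensorFlow not installed: nothing more to seed
    if hasattr(tf, "random") and hasattr(tf.random, "set_seed"):
        tf.random.set_seed(seed)   # TensorFlow 2.x
    elif hasattr(tf, "set_random_seed"):
        tf.set_random_seed(seed)   # TensorFlow 1.x graph-level seed


if __name__ == "__main__":
    seed_everything()
    first = random.random()
    seed_everything()
    second = random.random()
    print(first == second)
```

Calling `seed_everything()` once at startup, before the synthesizer is built, is the usual pattern; calling it again before each synthesis is an option if identical output for identical text is the goal.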

@ghost

ghost commented Jul 8, 2020

Thank you @plummet555 ! I found the changes you were describing. I'll add a note to #384 .

If you didn't notice my comment about the updated vocoder model above, you can try plugging it in (no change to hparams needed) and see if the audio quality gets better. I've noticed fewer artifacts but no difference in voice. #126 (comment)
