Issues with outputs of short and long texts #459

Closed
kkprabhu opened this issue Jul 30, 2020 · 2 comments

Comments

@kkprabhu

Hello, and thank you for this great work, @CorentinJ.
I have a few observations below that I wanted to bring to your notice, and I would like your advice on fixing them.

  1. For shorter input text (under about 25 characters), the generated audio has gaps or noise between words. This happens consistently. Is there any way to prevent it?
  2. For longer input text (between 25 and 120 characters), words are skipped partway through. This is not consistent but happens quite frequently. Is there a known reason for this, and any way to prevent it?
  3. For very long input text (more than 120 characters, with multiple sentences), the generated output starts at a normal speed and then speeds up until it becomes difficult to comprehend. Is there any solution to this?

I am using the pre-trained models published in this repo.
I would appreciate it if you could share your thoughts and advice.
Thanks in advance!


ghost commented Jul 30, 2020

  1. See #53 (Fixing the synthesizer's gaps in spectrograms). It gets a little better with a LibriTTS-trained model; see the comment in #449 (Training a new model based on LibriTTS).
    • As a workaround, the "enhance vocoder output" option in the toolbox will also use voice activity detection to trim out these gaps if you have the webrtcvad package installed.
  2. I have not seen this before. Can you find an input sequence and random seed that reproduce it, and provide the source audio file used for the embedding?
  3. This issue was also reported in #347 (Re: speed or rate of talking - generated audio speaking way too fast). There is no explanation for it yet.
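
To illustrate the gap-trimming idea behind the workaround in item 1, here is a minimal sketch. It is an assumption-laden stand-in, not the toolbox's code: it uses a plain RMS energy threshold instead of webrtcvad, and the function name `trim_long_gaps` and its parameters (`frame_len`, `threshold`, `max_gap_frames`) are hypothetical. It keeps short pauses and drops only the part of a silent run that exceeds a limit.

```python
import math


def trim_long_gaps(samples, frame_len=160, threshold=0.01, max_gap_frames=3):
    """Shorten long silent runs in a list of float samples (range -1.0 to 1.0).

    Hypothetical helper: a simple energy threshold replaces a real voice
    activity detector such as webrtcvad. At most `max_gap_frames` consecutive
    silent frames are kept; the rest of each silent run is dropped.
    """
    out = []
    silent_run = 0
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        # Root-mean-square energy of this frame.
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms < threshold:
            silent_run += 1
            if silent_run > max_gap_frames:
                continue  # silence beyond the allowed pause length: skip it
        else:
            silent_run = 0
        out.extend(frame)
    return out


if __name__ == "__main__":
    # Two voiced frames, ten silent frames, two voiced frames.
    signal = [0.5] * 320 + [0.0] * 1600 + [0.5] * 320
    trimmed = trim_long_gaps(signal)
    print(len(signal), len(trimmed))
```

With 16 kHz audio, `frame_len=160` corresponds to 10 ms frames, so `max_gap_frames=3` would cap pauses at roughly 30 ms; both values are illustrative and would need tuning on real synthesizer output.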

Also see #411 for discussion about some things that should be changed or improved.


ghost commented Aug 7, 2020

@kkprabhu Closing as a duplicate of #411. You are welcome to reopen this issue if you have audio samples and a reproducible test case for item 2 (skipped words).

@ghost ghost closed this as completed Aug 7, 2020