poor performance compared to the main paper? #411

Closed
amintavakol opened this issue Jul 9, 2020 · 6 comments

Comments

@amintavakol

Hi,
For those of you working with this repo to synthesize different voices: have you noticed a huge difference between the voices generated by this repo and the samples released with the main paper here? If so, let's discuss and try to find out the reason(s).

@ghost

ghost commented Jul 9, 2020

I would recommend reading #41 and the first few posts of #126 for some context. The main points as I understand them:

  • The SV2TTS authors trained their speaker encoder for 50M steps; this one has just over 1M.
  • The SV2TTS authors used an embedding size of 768; this one uses 256 (a sketch of where that parameter lives is at the end of this comment).
  • The SV2TTS authors used a larger, proprietary dataset for encoder training, which gives better results.

The speaker encoder was trained on a proprietary voice search corpus containing 36M utterances with median duration of 3.9 seconds from 18K English speakers in the United States. This dataset is not transcribed, but contains anonymized speaker identities. It is never used to train synthesis networks.

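For readers unfamiliar with where these numbers live, here is a minimal sketch (not the repo's actual code) of a GE2E-style speaker encoder, with hyperparameter values assumed to mirror this repo's documented defaults (40 mel channels, a 3-layer LSTM of width 256, a 256-dimensional embedding). The 256-vs-768 difference mentioned above comes in through the width/embedding parameters below.

```python
# Minimal sketch of a GE2E-style speaker encoder (illustrative, not the repo's code).
# Hyperparameter values are assumptions based on this repo's documented defaults.
import torch
import torch.nn as nn

class SpeakerEncoder(nn.Module):
    def __init__(self, mel_n_channels=40, hidden_size=256,
                 embedding_size=256, num_layers=3):
        super().__init__()
        # Stacked LSTM over mel-spectrogram frames
        self.lstm = nn.LSTM(mel_n_channels, hidden_size, num_layers, batch_first=True)
        # Projection to the utterance embedding: this is where embedding size is set
        self.linear = nn.Linear(hidden_size, embedding_size)
        self.relu = nn.ReLU()

    def forward(self, mels):
        # mels: (batch, frames, mel_n_channels)
        _, (hidden, _) = self.lstm(mels)
        embeds_raw = self.relu(self.linear(hidden[-1]))
        # L2-normalize so embeddings lie on the unit hypersphere, as in GE2E
        return embeds_raw / torch.norm(embeds_raw, dim=1, keepdim=True)
```

Note that changing the embedding size also changes the conditioning input the synthesizer expects, so the synthesizer would need to be retrained to match.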

@CorentinJ
Owner

CorentinJ commented Jul 10, 2020

Don't forget these too:

  • Use LibriTTS instead of LibriSpeech in order to have punctuation.
  • LibriTTS needs to be curated to remove speakers with bad prosody.
  • You can lower the upper bound I put on utterance duration; I suspect this removes long utterances that are more likely to contain pauses (I formally evaluated models trained this way and found they generate long pauses less often). It also trains faster and has no drawbacks: with a good attention paradigm, the model can generate longer sentences than it saw in training. (A rough sketch of such a duration filter is given after this list.)
  • The attention paradigm needs to be replaced; forward attention is poor. (A generic sketch of one common alternative also follows below.)
  • If the attention paradigm holds prosody-specific parameters, it may be complemented with a speaker embedding mechanism.

#364 (comment)
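To make the duration bullet concrete, here is a rough sketch of filtering a LibriTTS-style corpus by a maximum utterance length before preprocessing. The directory layout, the 7-second cap, and the use of soundfile are illustrative assumptions, not the repo's actual preprocessing code.

```python
# Hypothetical duration filter for a LibriTTS-style corpus (not the repo's code).
from pathlib import Path
import soundfile as sf

MAX_DURATION_S = 7.0  # assumed upper bound; tune it per the discussion above

def keep_utterance(wav_path: Path) -> bool:
    info = sf.info(str(wav_path))  # reads only the header, so this is cheap
    return info.frames / info.samplerate <= MAX_DURATION_S

dataset_root = Path("LibriTTS/train-clean-100")  # assumed location
kept = [p for p in dataset_root.rglob("*.wav") if keep_utterance(p)]
print(f"Keeping {len(kept)} utterances of at most {MAX_DURATION_S:.1f} s")
```

On the attention bullet, the comment does not say what to replace forward attention with. One widely used alternative in Tacotron-style synthesizers is location-sensitive attention (as in Tacotron 2); the generic PyTorch sketch below is only an illustration of that mechanism, with all names and dimensions assumed rather than taken from this repo.

```python
# Generic location-sensitive attention (Tacotron 2 style); an illustration only.
import torch
import torch.nn as nn

class LocationSensitiveAttention(nn.Module):
    def __init__(self, attn_dim=128, query_dim=1024, memory_dim=512,
                 n_filters=32, kernel_size=31):
        super().__init__()
        self.query_layer = nn.Linear(query_dim, attn_dim, bias=False)
        self.memory_layer = nn.Linear(memory_dim, attn_dim, bias=False)
        # Convolution over the previous + cumulative attention weights
        self.location_conv = nn.Conv1d(2, n_filters, kernel_size,
                                       padding=(kernel_size - 1) // 2, bias=False)
        self.location_dense = nn.Linear(n_filters, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, query, memory, attention_weights_cat):
        # query: (B, query_dim) decoder state
        # memory: (B, T, memory_dim) encoder outputs
        # attention_weights_cat: (B, 2, T) previous and cumulative weights
        processed = (self.query_layer(query).unsqueeze(1)
                     + self.memory_layer(memory)
                     + self.location_dense(
                         self.location_conv(attention_weights_cat).transpose(1, 2)))
        energies = self.v(torch.tanh(processed)).squeeze(-1)          # (B, T)
        weights = torch.softmax(energies, dim=1)                      # alignment
        context = torch.bmm(weights.unsqueeze(1), memory).squeeze(1)  # (B, memory_dim)
        return context, weights
```

Conditioning the alignment on the previous and cumulative attention weights discourages skipping and repetition, which is one reason this kind of mechanism tends to handle long utterances better than purely content-based schemes.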

@CorentinJ
Owner

At Resemble.AI we also have better results by using a new vocoder that my colleague @fatchord developed. I believe he's about to publish the paper he wrote about it.

@ghost

ghost commented Jul 12, 2020

We can reduce artifacts in the vocoder with additional training (#126 (comment)). However, it does not make a perceptible difference in the cloned voice. This result also suggests that, to the extent the vocoder affects output quality, we are reaching the limits of what is possible with WaveRNN.

@winterfate

> At Resemble.AI we also have better results by using a new vocoder that my colleague @fatchord developed. I believe he's about to publish the paper he wrote about it.

Absolutely astounding what you're all doing at Resemble. I saw the LTT videos done in cooperation with you lot as well; I was very happy to see some publicity in front of the average tech nerd.

@CorentinJ
Owner

Yeah, and the LTT video uses models dating from January; our sound quality has improved a lot since then.
