
WaveRNN generated samples sound strange #26

Closed
tugstugi opened this issue Sep 12, 2020 · 16 comments

@tugstugi
Contributor

Hello @begeekmyfriend,

I am trying to train a 4-speaker (2 male, 2 female) WaveRNN model. I have successfully trained Tacotron, and the wav files generated with Griffin-Lim sound good. After that, I generated GTA files and am now training WaveRNN, currently at 250k steps. But the WaveRNN samples sound really strange. I have two problems:

  • the target wavs have no sound at all
  • the generated wavs sound really strange: overlapping artifacts, and they are not intelligible

I have attached a sample target wav and a generated wav.

What could be the reason for that? If 250k steps are not enough to generate intelligible audio, why are the target wavs silent? Is that normal?

wavernn.zip

@tugstugi tugstugi changed the title from "WaveRNN generated samples sound strage" to "WaveRNN generated samples sound strange" on Sep 12, 2020
@begeekmyfriend
Owner

Please check whether the GTA mel frames align with the wav length. You can calculate the lengths of both and compare them.
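A minimal sketch of that check. The file paths are hypothetical, and the .npy mel layout of shape (n_mels, T) and the hop length of 275 samples are assumptions; adjust both to your hparams:

```python
# Sanity check: mel frame count * hop length should match the wav length.
# Paths are hypothetical; hop_length must match the value from preprocessing.
import numpy as np
import soundfile as sf

hop_length = 275  # assumption -- use your hparams value

mel = np.load("gta/sample-001.npy")       # assumed shape: (n_mels, T)
wav, sr = sf.read("wavs/sample-001.wav")

n_frames = mel.shape[-1]
expected = n_frames * hop_length

print(f"mel frames: {n_frames}, wav samples: {len(wav)}, expected ~{expected}")
# Lengths should agree up to one hop of padding.
assert abs(len(wav) - expected) <= hop_length, "GTA mel and wav lengths do not align"
```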

@tugstugi
Contributor Author

@begeekmyfriend

I have checked with #4 (comment)

All asserts are successful, and the min and max mel values are -4.0 and -0.21728516.

@tugstugi
Contributor Author

tugstugi commented Sep 12, 2020

OK, it seems the Tacotron training wasn't good enough, even though Griffin-Lim sounded good. Using GTA files generated from a slightly longer-trained Tacotron (130k steps), the audio generated by WaveRNN is now getting a little better.

I have some questions regarding your 4-speaker Tacotron:

  • How many iterations did you train?
  • What is your final training loss?

Thanks

@begeekmyfriend
Owner

200 epochs (not steps) for T2.

@tugstugi
Contributor Author

@begeekmyfriend I also have around 40k files (10k per speaker) and trained for 200 epochs. My final training loss is around 0.35, which is too high compared to the NVIDIA Tacotron2. Is that normal?

@begeekmyfriend
Owner

You might try training for 48 hours and see.

@tugstugi
Contributor Author

After training Tacotron2 for 400 epochs, the loss improved to 0.31. After applying more aggressive silence trimming, the loss is now around 0.27. @begeekmyfriend How did you trim your dataset? Could you share the trim_top_db value you used?
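For reference, this is the kind of trimming meant here, sketched with librosa.effects.trim. The top_db of 40, the path, and the sample rate are assumptions, not values from this repo; lower top_db values trim more aggressively:

```python
# Trim leading/trailing silence; top_db=40 is an assumed value, not the
# repository's actual setting. Path and sample rate are hypothetical.
import librosa

wav, sr = librosa.load("wavs/sample-001.wav", sr=22050)
trimmed, _ = librosa.effects.trim(wav, top_db=40, frame_length=2048, hop_length=512)
print(f"removed {len(wav) - len(trimmed)} samples of edge silence")
```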

@begeekmyfriend
Owner

trim_top_db does not matter much; in fact, the mel values are still clipped in preprocessing.

@tugstugi
Contributor Author

I made further experiments: the single-speaker T2 training loss is around 0.11 to 0.15, 2 speakers around 0.18, 3 speakers around 0.23.

@tugstugi
Contributor Author

@begeekmyfriend I think I have found the cause of the strange WaveRNN artifacts. I trained T2 without --load-mel-from-disk. In that case, the minimum mel values are around -12, while GTA/WaveRNN uses -4 as the clip/pad value, and this mismatch seems to cause the artifacts. How did you choose -4 as the clip/pad value?
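To illustrate the fix for the mismatch described above: clip the training mels to the same -4 floor that the GTA/WaveRNN side uses as its clip/pad value. A minimal sketch; the path is hypothetical, and -4.0 is the value discussed in this issue:

```python
# Clip log-mels (minima around -12) to the -4 floor used as the
# GTA/WaveRNN clip/pad value, so both sides see the same range.
import numpy as np

CLIP_FLOOR = -4.0

mel = np.load("mels/sample-001.npy")          # hypothetical path
mel_clipped = np.clip(mel, CLIP_FLOOR, None)  # floor at -4, keep the maximum

print(f"min before: {mel.min():.2f}, min after: {mel_clipped.min():.2f}")
```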

I have also opened pull request #27 to fix the error mentioned in #24.

@begeekmyfriend
Owner

begeekmyfriend commented Sep 16, 2020

@tugstugi
Contributor Author

tugstugi commented Sep 16, 2020

With mels clipped at -4, the 4-speaker T2 loss is now around 0.13, and I am now training WaveRNN. Hopefully that solves the artifacts :)

@begeekmyfriend
Owner

You'd better cut the edge data of the corpus, since there is noise there.

@tugstugi
Contributor Author

@begeekmyfriend you mean the trimming?

@begeekmyfriend
Owner

It just works with these hyperparameters, and all we need to do is follow them.

@tugstugi
Contributor Author

After 200k steps, WaveRNN now sounds OK:
wavernn.zip
