
WaveRNN generated samples sound strange #26

Closed
tugstugi opened this issue Sep 12, 2020 · 16 comments

@tugstugi
Contributor

Hello @begeekmyfriend,

I am trying to train a 4-speaker (2 male, 2 female) WaveRNN model. I have successfully trained Tacotron, and the wav files generated with Griffin-Lim sound good. After that, I generated GTA files and am now training WaveRNN, currently at 250k steps. But the WaveRNN samples sound really strange. I have two problems:

  • the target wavs have no sound at all
  • the generated wavs sound really strange: overlapping artifacts, and they are not intelligible

I have attached a sample target wav and a generated wav.

What could be the reason for that? If 250k steps are not enough to generate intelligible audio, why are the target wavs silent? Is that normal?

wavernn.zip

@tugstugi tugstugi changed the title from "WaveRNN generated samples sound strage" to "WaveRNN generated samples sound strange" on Sep 12, 2020
@begeekmyfriend
Owner

Please check whether the GTA mel frames align with the wav length. You can calculate the lengths of both and compare them.
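A minimal sketch of that check. The file paths are hypothetical, and the .npy mel layout of shape (n_mels, T) and the hop length of 275 samples are assumptions; adjust both to your hparams:

```python
# Sanity check: mel frame count * hop length should match the wav length.
# Paths are hypothetical; hop_length must match the value from preprocessing.
import numpy as np
import soundfile as sf

hop_length = 275  # assumption -- use your hparams value

mel = np.load("gta/sample-001.npy")       # assumed shape: (n_mels, T)
wav, sr = sf.read("wavs/sample-001.wav")

n_frames = mel.shape[-1]
expected = n_frames * hop_length

print(f"mel frames: {n_frames}, wav samples: {len(wav)}, expected ~{expected}")
# Lengths should agree up to one hop of padding.
assert abs(len(wav) - expected) <= hop_length, "GTA mel and wav lengths do not align"
```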

@tugstugi
Contributor Author

@begeekmyfriend

I have checked with #4 (comment)

All asserts are successful, and the min and max mel values are -4.0 and -0.21728516.

@tugstugi
Contributor Author

tugstugi commented Sep 12, 2020

OK, it seems the Tacotron training wasn't good enough, even though Griffin-Lim sounded good. Using GTA files generated from a slightly longer-trained Tacotron (130k steps), the audio generated by WaveRNN is now getting a little better.

I have some questions regarding your 4-speaker Tacotron:

  • How many iterations did you train?
  • What is your final training loss?

Thanks

@begeekmyfriend
Owner

200 epochs (not steps) for T2.

@tugstugi
Contributor Author

@begeekmyfriend I also have around 40k files (10k per speaker) and trained for 200 epochs. My final training loss is around 0.35, which is too high compared to the NVIDIA Tacotron2. Is that normal?

@begeekmyfriend
Owner

You might try training for 48 hours and see.

@tugstugi
Contributor Author

After training Tacotron2 for 400 epochs, the loss improved to 0.31. After applying more aggressive silence trimming, the loss is now around 0.27. @begeekmyfriend How did you trim your dataset? Could you share the trim_top_db value you used?
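For reference, this is the kind of trimming meant here, sketched with librosa.effects.trim. The top_db of 40, the path, and the sample rate are assumptions, not values from this repo; lower top_db values trim more aggressively:

```python
# Trim leading/trailing silence; top_db=40 is an assumed value, not the
# repository's actual setting. Path and sample rate are hypothetical.
import librosa

wav, sr = librosa.load("wavs/sample-001.wav", sr=22050)
trimmed, _ = librosa.effects.trim(wav, top_db=40, frame_length=2048, hop_length=512)
print(f"removed {len(wav) - len(trimmed)} samples of edge silence")
```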

@begeekmyfriend
Owner

trim_top_db does not matter much; in fact, the mel values are still clipped in preprocessing.

@tugstugi
Contributor Author

I made further experiments: the single-speaker T2 training loss is around 0.11 to 0.15, 2 speakers around 0.18, 3 speakers around 0.23.

@tugstugi
Contributor Author

@begeekmyfriend I think I have found the cause of the strange WaveRNN artifacts. I trained T2 without --load-mel-from-disk. In that case, the minimum mel values are around -12, while GTA/WaveRNN uses -4 as the clip/pad value, and this mismatch seems to cause the artifacts. How did you choose -4 as the clip/pad value?
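To illustrate the fix for the mismatch described above: clip the training mels to the same -4 floor that the GTA/WaveRNN side uses as its clip/pad value. A minimal sketch; the path is hypothetical, and -4.0 is the value discussed in this issue:

```python
# Clip log-mels (minima around -12) to the -4 floor used as the
# GTA/WaveRNN clip/pad value, so both sides see the same range.
import numpy as np

CLIP_FLOOR = -4.0

mel = np.load("mels/sample-001.npy")          # hypothetical path
mel_clipped = np.clip(mel, CLIP_FLOOR, None)  # floor at -4, keep the maximum

print(f"min before: {mel.min():.2f}, min after: {mel_clipped.min():.2f}")
```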

I have also opened pull request #27 to fix the error mentioned in #24.

@begeekmyfriend
Owner

begeekmyfriend commented Sep 16, 2020

@tugstugi
Contributor Author

tugstugi commented Sep 16, 2020

With mels clipped at -4, the 4-speaker T2 loss is now around 0.13, and I am now training WaveRNN. Hopefully that solves the artifacts :)

@begeekmyfriend
Owner

You'd better cut the edge data of the corpus, since there is noise there.

@tugstugi
Contributor Author

@begeekmyfriend you mean the trimming?

@begeekmyfriend
Owner

It just works with these hyperparameters, and all we need to do is follow them.

@tugstugi
Contributor Author

After 200k steps, WaveRNN now sounds OK:
wavernn.zip
