WaveRNN generated samples sound strange #26
Comments
Please check whether the GTA mel frames align with the wav length. You can calculate the lengths of both.
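The alignment check suggested here can be sketched as follows. This is a minimal illustration, not the repo's actual code; `hop_size=256` and the tolerance of one hop are assumptions:

```python
import wave

def check_alignment(mel_frames, wav_path, hop_size=256):
    """Verify that mel_frames * hop_size matches the wav's sample count.

    mel_frames: number of frames in the GTA mel for this utterance.
    wav_path:   path to the corresponding wav file.
    Returns True when the lengths agree within one hop of padding.
    """
    with wave.open(wav_path, "rb") as f:
        n_samples = f.getnframes()
    expected = mel_frames * hop_size
    return abs(n_samples - expected) <= hop_size
```

If this returns False for some utterances, the GTA mels and wavs were likely padded or trimmed inconsistently during preprocessing.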
I have checked with #4 (comment). All asserts are successful, and the min and max values are:
OK, it seems the Tacotron training wasn't good enough, even though the Griffin-Lim output sounded good. Using GTA mels generated from a slightly longer-trained Tacotron (130k steps), the audio generated by WaveRNN is now getting a little better. I have some questions regarding your 4-speaker Tacotron:
Thanks
200 epochs (not steps) for T2.
@begeekmyfriend I also have around 40k files (10k for each speaker) and trained for 200 epochs. My final training loss is around 0.35, which is too high compared to NVIDIA Tacotron2. Is that normal?
You might try training for 48 hours and see.
After training Tacotron2 for 400 epochs, the loss improved to 0.31. After applying more aggressive silence trimming, it is now around 0.27. @begeekmyfriend How did you trim your dataset? Could you share your
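The silence trimming discussed here is usually done with `librosa.effects.trim`; a dependency-free sketch of the same idea follows. The `top_db`, `frame_len`, and `hop` values are illustrative defaults, not the thread's actual settings:

```python
import math

def trim_silence(samples, top_db=40.0, frame_len=1024, hop=256):
    """Trim leading/trailing audio whose frame RMS is more than
    top_db below the loudest frame's RMS.

    samples: sequence of float samples in [-1, 1].
    Returns the trimmed sample sequence.
    """
    rms = []
    for start in range(0, max(1, len(samples) - frame_len + 1), hop):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    peak = max(rms) or 1e-10
    threshold = peak * 10.0 ** (-top_db / 20.0)  # top_db below the peak
    keep = [i for i, r in enumerate(rms) if r > threshold]
    if not keep:
        return samples[:0]
    start = keep[0] * hop
    end = min(len(samples), keep[-1] * hop + frame_len)
    return samples[start:end]
```

Being too aggressive here can cut word onsets, so it is worth listening to a few trimmed files before committing to a threshold.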
It does not matter with
I made further experiments: single-speaker T2 training loss is around 0.11 to 0.15, 2 speakers around 0.18, 3 speakers around 0.23.
@begeekmyfriend I think I have found why WaveRNN produces such strange artifacts. I have trained T2 without
I have also made pull request #27 to fix the error mentioned in #24.
With mels clipped at -4, the 4-speaker T2 loss is now around 0.13, and I am now training WaveRNN. Hopefully that solves the artifacts :)
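The "-4 clipped mels" referred to here are presumably the usual Tacotron-style normalization, which scales log-mel values into a symmetric range and clips outliers. A sketch under that assumption (`min_level_db=-100` and `max_abs_value=4` are common defaults, not confirmed from this repo):

```python
def normalize_mel(mel_db, min_level_db=-100.0, max_abs_value=4.0):
    """Scale log-mel values from [min_level_db, 0] dB into
    [-max_abs_value, max_abs_value], clipping anything outside."""
    scaled = [
        2.0 * max_abs_value * ((m - min_level_db) / -min_level_db) - max_abs_value
        for m in mel_db
    ]
    return [max(-max_abs_value, min(max_abs_value, s)) for s in scaled]
```

Training the vocoder on unclipped mels while synthesis produces clipped ones (or vice versa) is a classic source of exactly this kind of artifact, so the two sides must use the same normalization.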
You'd better cut the edge data of the corpus, since it contains noise.
@begeekmyfriend you mean the trimming? |
It just works with these hyperparameters, and all we need to do is follow them.
Now, after 200k steps, WaveRNN sounds OK:
Hello @begeekmyfriend,
I am trying to train a 4-speaker (2 male and 2 female) WaveRNN model. I have successfully trained Tacotron, and the wav files generated with Griffin-Lim sound good. After that, I generated GTA files, and I am now training WaveRNN, currently at 250k steps. But the WaveRNN samples sound really strange. I have 2 problems:
I have attached a sample target and generated wav files.
What could be the reason for that? If 250k steps are not enough to generate intelligible audio, why are the target wavs silent? Is that normal?
wavernn.zip