Quality #1
Comments
I haven't compared it with iSTFTNet or MB-MelGAN yet, but I'll try to listen to those models on the same sample. Here is a sample from a first model. I'll probably try increasing the alpha of the MRD loss to 1.0, as suggested in gemelo-ai/vocos#48.
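For readers unfamiliar with the "alpha" mentioned above, here is a minimal sketch of how such a coefficient typically weights the multi-resolution discriminator (MRD) term against the multi-period discriminator (MPD) term. The function and parameter names are hypothetical and chosen for illustration; they are not taken from this repo's code.

```python
def combined_discriminator_loss(mpd_loss: float, mrd_loss: float, mrd_alpha: float) -> float:
    """Weighted sum of two discriminator losses.

    Assumption: the MRD term is scaled by a scalar coefficient (the "alpha"
    from the comment above) and added to the MPD term. All names here are
    hypothetical, not the repo's actual API.
    """
    return mpd_loss + mrd_alpha * mrd_loss


# Raising alpha to 1.0 gives both discriminators equal weight.
low_alpha = combined_discriminator_loss(2.0, 4.0, mrd_alpha=0.5)
equal_weight = combined_discriminator_loss(2.0, 4.0, mrd_alpha=1.0)
```

With a larger alpha the MRD term contributes more to the total, so the gradients it drives weigh more heavily during training.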
@abylouw It would be interesting to compare outputs for Ukrainian as well. I have a RAD-TTS model with these vocoders:
I am training your vocos-matcha with mrd loss = 1.0 at 44100 Hz, so it is very slow. After almost 1M iterations it still sounds slightly worse than your https://huggingface.co/BSC-LT/vocos-mel-22khz, and the metrics are still worse too.
Hi @wetdog, what's the status of this implementation in terms of quality and speed? I have great expectations for this repo 🙂
Hi @mush42, I finished training the mel version this week. In terms of quality, it achieves better periodicity. I also fixed some things in the EnCodec experiment this week, and it is training now. For these trainings I used the mel features compatible with HiFi-GAN, but it is probably worth training a 24 kHz version using the same features as the original Vocos. Let me know if you have any questions.
@egorsmkv Great work. I would probably use your versions to run some metrics and compare the quality of those vocoders.
@wetdog I have added your WaveNeXt pretrained model to my Hugging Face app that runs the pflowtts model. Unfortunately, it does not sound very good. There are 4 vocoders that all generate waveforms from the same mel spectrogram produced by pflowtts, and WaveNeXt sounds similar to HiFi-GAN but slightly worse. There is also a 44100 Hz Vocos vocoder trained with your implementation, and it sounds the best. You can check it here: https://huggingface.co/spaces/patriotyk/pflowtts_ukr_demo
@patriotyk Thanks for the quick implementation. Do you think this could be due to the dataset it was trained on? I used LibriTTS for this run, but I would like to try a version with Common Phone https://arxiv.org/abs/2201.05912 to make it more "universal".
Hi, heavy TTS user here. Specifically, there is an audible hissing noise in the audio vocoded by Vocos, probably an ISTFT artifact. Here's a sample from an unseen speaker, where Matcha-TTS is used to generate the mel spectrogram. Best
@patriotyk |
Hi @wetdog! Could you please share the .ckpt checkpoint file in addition to the .bin checkpoint file that you provided? I want to fine-tune the model, but the .bin checkpoint contains only the generator.
@fd873630 I just uploaded the ckpt. You can find it here: https://huggingface.co/BSC-LT/wavenext-mel/blob/main/wavenext_2M_libritt_r.ckpt
@wetdog Best |
Hi,
Thank you for creating this repo and implementing the architecture in the paper. I have been looking at the paper and was going to try an implementation.
Do you have any preliminary results available? Do you think that it is better than, for example, iSTFTNet or MB-MelGAN?