One click installer + usage question #2
Hey, I've updated the Vocos API (#4) to make it easier to integrate with Bark. Take a look at the example notebook. Hope it helps!
Thank you! For now this is the initial UI, but it will grow from here.
Hey rsxdalv, could you add a training section that lets us train our own Vocos model at a higher sample rate?
It's possible. Do you have a sample of the command, dataset, and config?
Dataset: I'm imagining multiple 10-second audio files. The config I was making for 48 kHz is:

# pytorch_lightning==1.8.6
seed_everything: 4444

data:

model:

trainer:
  # Lightning calculates max_steps across all optimizer steps (rather than number of batches)
  # This equals 1M steps per generator and 1M per discriminator
  max_steps: 2000000
  # You might want to limit val batches when evaluating all the metrics, as they are time-consuming
  limit_val_batches: 100
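For the dataset side, the Vocos README prepares training data as two plain-text filelists (one audio path per line, e.g. built with `find ... -name *.wav > filelist.train`). A minimal sketch of that step for a folder of 48 kHz clips, using only the standard library. `write_filelists` and `wav_info` are hypothetical helper names, the 5% validation split is an arbitrary assumption, and files at another sample rate are skipped rather than resampled.

```python
import random
import wave
from pathlib import Path


def wav_info(path: Path) -> tuple[int, float]:
    """Return (sample_rate, duration_seconds) read from a WAV header."""
    with wave.open(str(path), "rb") as f:
        sr = f.getframerate()
        return sr, f.getnframes() / sr


def write_filelists(audio_dir, out_dir, target_sr=48_000, min_seconds=0.0,
                    val_fraction=0.05, seed=4444):
    """Hypothetical helper: write filelist.train / filelist.val, one path per line.

    WAVs at a different sample rate or shorter than min_seconds are skipped,
    not resampled. Returns (num_train, num_val).
    """
    candidates = sorted(p.resolve() for p in Path(audio_dir).rglob("*.wav"))
    keep = []
    for p in candidates:
        sr, seconds = wav_info(p)
        if sr == target_sr and seconds >= min_seconds:
            keep.append(p)
    # Sorting first, then shuffling with a fixed seed, makes the split reproducible.
    random.Random(seed).shuffle(keep)
    # Hold out at least one file for validation once there are two or more files.
    n_val = max(1, int(len(keep) * val_fraction)) if len(keep) > 1 else 0
    val, train = keep[:n_val], keep[n_val:]
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "filelist.train").write_text("".join(f"{p}\n" for p in train))
    (out / "filelist.val").write_text("".join(f"{p}\n" for p in val))
    return len(train), len(val)
```

For example, `write_filelists("my_48k_clips", "filelists", min_seconds=10.0)` keeps only clips of at least 10 s (480,000 samples at 48 kHz); the `data:` section of the config would then point at the two generated files. The default seed simply reuses `seed_everything: 4444` from the config.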
Hi, thank you for the great project you have made available!
I added it to my one-click installer package of AI-based audio generators.
Here's the notebook I quickly created:
https://github.com/rsxdalv/tts-generation-webui/blob/main/notebooks/vocos.ipynb
I wonder whether using this in a pipeline with SunoAI/Bark has a different impact than using it with something else. I couldn't manage to hook up the raw EnCodec codes, so I used the final WAV files.
I saw the best results with the 12 kbps bandwidth, although, if I remember correctly, the Bark model runs at 6 kbps.
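The bandwidth figures follow from EnCodec's token rate. A small sketch, assuming the 24 kHz EnCodec model used by both Bark and `charactr/vocos-encodec-24khz` (75 frames per second, 1024-entry codebooks, so 10 bits per code) and that Vocos's `bandwidth_id` indexes the list 1.5 / 3 / 6 / 12 kbps. The helper names are hypothetical.

```python
# 24 kHz EnCodec: 75 frames per second, 1024-entry codebooks (log2(1024) = 10 bits).
FRAME_RATE_HZ = 75
BITS_PER_CODE = 10

# Bandwidths of the pretrained Vocos EnCodec model, in bandwidth_id order.
VOCOS_BANDWIDTHS_KBPS = [1.5, 3.0, 6.0, 12.0]


def codebooks_to_kbps(n_codebooks: int) -> float:
    """Bitrate of n codebooks: each adds 75 frames/s * 10 bits = 750 bit/s."""
    bits_per_second = n_codebooks * FRAME_RATE_HZ * BITS_PER_CODE
    return bits_per_second / 1000


def vocos_bandwidth_id(kbps: float) -> int:
    """Index into Vocos's bandwidth list; raises ValueError if unsupported."""
    return VOCOS_BANDWIDTHS_KBPS.index(kbps)
```

Bark's fine stage emits 8 codebooks, i.e. `codebooks_to_kbps(8) == 6.0`, which corresponds to `bandwidth_id` 2. When a finished WAV is fed to Vocos instead, it is re-encoded with EnCodec at whichever bandwidth is chosen, so 12 kbps (16 codebooks) discards less of the already-decoded audio. That may explain why 12 kbps sounded best, though this sketch does not verify it.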
With my small sample size I didn't see an unsupervised improvement, although I found one example where it gives a sound sample more "quality" (I included it next to the notebook).
I would love to see how it would go if I could hook it up to the EnCodec tokens from Bark, and how best to go about using it.
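One way to skip the WAV round trip is to hand Bark's fine tokens straight to Vocos. A minimal sketch, assuming the `bark`, `vocos` and `torch` packages are installed; it uses Bark's `generate_text_semantic` / `generate_coarse` / `generate_fine` stages at their default temperatures and Vocos's `codes_to_features` and `decode`. `bark_to_vocos_audio` and `bandwidth_id_for_codebooks` are hypothetical names, not part of either library.

```python
# Codebook count -> Vocos bandwidth_id for charactr/vocos-encodec-24khz
# (2 / 4 / 8 / 16 codebooks = 1.5 / 3 / 6 / 12 kbps).
VOCOS_ID_BY_CODEBOOKS = {2: 0, 4: 1, 8: 2, 16: 3}


def bandwidth_id_for_codebooks(n_codebooks: int) -> int:
    """Map an EnCodec codebook count to Vocos's bandwidth_id."""
    if n_codebooks not in VOCOS_ID_BY_CODEBOOKS:
        raise ValueError(f"no Vocos bandwidth for {n_codebooks} codebooks")
    return VOCOS_ID_BY_CODEBOOKS[n_codebooks]


def bark_to_vocos_audio(text, history_prompt=None, device="cpu"):
    """Hypothetical helper: run Bark's token stages, then decode with Vocos.

    Heavy dependencies are imported lazily so this file loads without them.
    Returns a 24 kHz mono float array.
    """
    import torch
    from bark import preload_models
    from bark.generation import generate_coarse, generate_fine, generate_text_semantic
    from vocos import Vocos

    preload_models()
    semantic = generate_text_semantic(text, history_prompt=history_prompt, temp=0.7)
    coarse = generate_coarse(semantic, history_prompt=history_prompt, temp=0.7)
    fine = generate_fine(coarse, history_prompt=history_prompt, temp=0.5)  # (8, frames)

    vocos = Vocos.from_pretrained("charactr/vocos-encodec-24khz").to(device)
    codes = torch.from_numpy(fine).long().to(device)
    # Bark produces 8 codebooks, so this resolves to bandwidth_id 2 (6 kbps).
    bandwidth_id = torch.tensor([bandwidth_id_for_codebooks(codes.shape[0])], device=device)
    with torch.no_grad():
        features = vocos.codes_to_features(codes)
        audio = vocos.decode(features, bandwidth_id=bandwidth_id)
    return audio.squeeze(0).cpu().numpy()
```

Because the bandwidth is derived from the token shape, decoding stays matched to the 6 kbps codes Bark actually generates instead of re-quantizing the decoded audio.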