The Minion randomly intruded into my audio #783
Comments
This happens all the time for me. Generate a few takes of the same text and pick the one with the median length.
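A minimal sketch of that selection step, assuming you already have several generated takes of the same text in memory (as raw bytes, sample arrays, or anything else with a length). The helper name `pick_median_length` is hypothetical and is not part of fish-speech; generating the takes themselves is left to whatever client you already use.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def pick_median_length(
    candidates: Sequence[T],
    length: Callable[[T], float] = len,
) -> T:
    """Return the candidate whose length is the median of the group.

    Takes with an intruding extra voice tend to be longer than clean ones,
    so the median is less likely to be an outlier than a random pick.
    With an even number of candidates, the shorter of the two middle
    candidates is returned.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    ranked = sorted(candidates, key=length)
    return ranked[(len(ranked) - 1) // 2]


# Hypothetical usage: three takes of the same sentence as raw audio bytes.
takes = [b"\x00" * 4000, b"\x00" * 3100, b"\x00" * 3300]
best = pick_median_length(takes)
```

This only lowers the odds of a bad take. It does not detect the artifact, so a spot check by ear is still worthwhile.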
In my case, I need to preprocess the text to remove `*`, for example `*args` -> `args`. This prevents some of the erroneous sounds from being generated, but there may be other characters that need preprocessing that I haven't spotted yet.
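A rough sketch of that preprocessing, assuming the asterisk is the only character that needs to go. The function name `strip_asterisks` is hypothetical; extend the replacement if you find other characters that cause trouble.

```python
import re


def strip_asterisks(text: str) -> str:
    """Remove every '*' and tidy up the spacing left behind."""
    without_stars = text.replace("*", "")
    # Removing a standalone '*' can leave two spaces in a row; collapse them.
    return re.sub(r"[ \t]{2,}", " ", without_stars).strip()


cleaned = strip_asterisks("pass *args and **kwargs")
```

Run this on the text before sending it to the server, so the model never sees the symbol.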
@20km-shimakaze Which characters have you found that you need to remove? Or instead, which character sets do you keep?
I get the same problem when generating Chinese speech, and it seems to occur randomly: the same sentence can come out correctly when you test it again.
I haven't found a solution yet, but increasing the duration of the audio material seems to reduce the probability of it occurring.
Self Checks
Cloud or Self Hosted
Self Hosted (Source)
Environment Details
NVIDIA 3090, Python 3.10, torch==2.4.1, torchvision==0.19.1, torchaudio==2.4.1
Steps to Reproduce
/root/miniconda3/bin/python -m tools.api_server --listen 0.0.0.0:6006 --llama-checkpoint-path "/usr/github/fish-speech/checkpoints/fish-speech-1.5" --decoder-checkpoint-path "/usr/github/fish-speech/checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" --decoder-config-name firefly_gan_vq --compile
✔️ Expected Behavior
No response
❌ Actual Behavior
Please listen to the last few seconds of this audio, where a Minion's voice appears.
https://saysay-bucket1.s3.us-west-1.amazonaws.com/uploads/default/20241224/4f75658f38eb0c163acced94328a73b6e78275bb.mp3
Text: By the end of this century, we will have reached a technological singularity, where quantum computing leads to a paradigm shift in epistemology.
The issue is intermittent: 13 of my 100 generated audio files contain a similar intrusion.
Please help: how can I solve this problem?