Repetition in recordings #72
Comments
So I replaced the Capture Effect on the audio bus with a Record Effect. I used linear interpolation to resample the data I got from GetRecording() from 48000 Hz to 16000 Hz. This works with an astounding accuracy of ~95% (I am not a native English speaker): no repetition, and it even recognizes names correctly. While this approach works for me, I just couldn't get the sample's Capture Effect implementation to work.
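For anyone who wants to try the same thing, here is a minimal sketch of that kind of conversion. It is not the linked C# code. It assumes the recording's data is 16-bit little-endian interleaved stereo PCM (I believe that is AudioEffectRecord's default, but check your setup). The helper names `pcm16_stereo_to_mono` and `resample_linear` are hypothetical.

```python
import struct
from typing import List, Sequence


def pcm16_stereo_to_mono(data: bytes) -> List[float]:
    """Convert interleaved 16-bit LE stereo PCM to mono floats in [-1, 1).

    Assumption: 2 channels x 2 bytes per sample; trailing partial frames are dropped.
    """
    n_frames = len(data) // 4
    mono = []
    for left, right in struct.iter_unpack("<hh", data[: n_frames * 4]):
        # Average both channels, then scale int16 range to float.
        mono.append((left + right) / 2.0 / 32768.0)
    return mono


def resample_linear(samples: Sequence[float], src_rate: int = 48000,
                    dst_rate: int = 16000) -> List[float]:
    """Resample a mono signal by linear interpolation between neighbouring samples."""
    if not samples:
        return []
    ratio = src_rate / dst_rate          # input samples per output sample (3.0 for 48k -> 16k)
    n_out = int(len(samples) / ratio)
    out = []
    for i in range(n_out):
        pos = i * ratio                  # fractional read position in the input
        left = int(pos)
        frac = pos - left
        right = min(left + 1, len(samples) - 1)  # clamp at the last input sample
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

For 48000 to 16000 the ratio is exactly 3, so this reduces to keeping every third sample. There is no anti-aliasing low-pass filter, which is one way it differs from libsamplerate's sinc converters.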
Interesting, this sounds like it could be an issue with how I am doing the interpolation. This plugin currently uses libsamplerate for that, as seen here: https://github.com/V-Sekai/godot-whisper/blob/main/src/speech_to_text.cpp#L32. The resample function also exposes a
By default it's set to FASTEST.
You could also try setting it to BEST_QUALITY to see if there is a change. If not, the approach you took is pretty good as well. If you want, you can make a new scene with it and open a PR for others to try (if not, I might when I get some time).
@Ughuuu I have implemented this in C# here: https://github.com/gudatr/godot-ai-rpg/blob/main/scripts/SpeechRecognizer.cs, but it differs greatly from the project's examples. I tried writing the code in GDScript, but I must admit that I am too inexperienced with it, especially if the implementation needs to stay close to the samples, and I currently have no motivation to learn it, sorry.
No worries, thanks for this, it's great! If anything, it's a sample people can look at if they want to do the resampling manually. I'm also busy, but maybe in the future I'll take a stab at it.
So far everything has been working out of the box, so thank you for this great plugin!
Issue:
I'm having problems with repetition. Recognition is good, but the same sentence is repeated over and over.
What I have tried:
From what I can see in the whisper documentation, the entropy threshold should fix this.
But there seems to be no effect when I change the value.
Entropy threshold values tried: 2.8 (default), 5, and 0.
If anything, higher values make recognition less precise.
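For context, here is a sketch of how I understand whisper.cpp's entropy check. This assumes the plugin passes its value straight through to whisper.cpp's `entropy_thold`. After decoding a segment, whisper.cpp computes the Shannon entropy (natural log) of the token histogram over the last 32 tokens. If that entropy is below the threshold, the segment is treated as a failed, repetitive decode and retried at a higher temperature. The function names below are hypothetical, not the plugin's API.

```python
import math
from collections import Counter
from typing import Iterable


def token_entropy(tokens: Iterable[int], window: int = 32) -> float:
    """Shannon entropy (nats) of the token histogram over the last `window` tokens."""
    recent = list(tokens)[-window:]
    if not recent:
        return 0.0
    n = len(recent)
    counts = Counter(recent)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def looks_repetitive(tokens: Iterable[int], entropy_thold: float = 2.4,
                     window: int = 32) -> bool:
    """Flag a decode as repetitive when its token entropy falls below the threshold."""
    return token_entropy(tokens, window) < entropy_thold
```

If this matches the real check, a 32-token window can never exceed ln(32) ≈ 3.47. A threshold of 0 would then never flag anything, and 5 would flag every segment, which could explain the loss of precision. Also, as far as I know, the retry only happens when temperature fallback is enabled (`temperature_inc` > 0). With fallback off, the threshold would have no visible effect.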
Is this related to the other problem regarding Voice Activity Detection (VAD)?
I have tried changing the VAD threshold as well, but that seems to do nothing.
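I don't know exactly how the plugin's VAD works. For reference, whisper.cpp's streaming examples use a simple energy comparison (`vad_simple`). It compares the mean absolute amplitude of the last `last_ms` milliseconds with that of the whole buffer. Below is a simplified sketch of that idea, without the optional high-pass filter. `speech_ended` and its defaults are assumptions for illustration.

```python
from typing import Sequence


def speech_ended(pcm: Sequence[float], sample_rate: int,
                 last_ms: int = 1000, vad_thold: float = 0.6) -> bool:
    """Return True when the trailing window is quiet relative to the whole buffer."""
    n = len(pcm)
    n_last = sample_rate * last_ms // 1000
    if n_last <= 0 or n_last >= n:
        return False  # not enough audio to compare yet
    energy_all = sum(abs(x) for x in pcm) / n
    energy_last = sum(abs(x) for x in pcm[n - n_last:]) / n_last
    # Speech is considered finished once recent energy drops below a fraction of the average.
    return energy_last <= vad_thold * energy_all
```

In a detector like this, the threshold only decides when a chunk of audio is handed to whisper. It does not affect decoding itself, which may be why changing it had no visible effect on the repetition.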
I have also tried using a larger whisper model but that yields the same results, only slower.