Repetition in recordings #72

gudrob · 2024-05-10T11:08:08Z

So far everything has been working out of the box, so thank you for this great plugin!

Issue:
I'm having problems with repetition. Recognition is good, but the same sentence is repeated over and over.

What I have tried:
From what I can see in the whisper documentation, the entropy threshold should fix this.
But changing the value seems to have no effect.

entropy 2.8, default

entropy 5

entropy 0

If anything, higher values make recognition less precise.
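For context, here is a minimal sketch of what an entropy threshold of this kind measures. It assumes the plugin forwards the value to whisper.cpp's entropy_thold parameter, and that the decoder treats a decode as failed when the entropy of roughly the last 32 tokens falls below that threshold. Neither assumption is confirmed in this thread. The helper name token_entropy and the 32-token window are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper (not part of godot-whisper or whisper.cpp):
// Shannon entropy, in nats, of the token distribution within the
// last `window` tokens of a decoded sequence.
double token_entropy(const std::vector<int> &tokens, std::size_t window = 32) {
    if (tokens.empty() || window == 0) {
        return 0.0;
    }
    // Only the tail of the sequence is inspected (assumed window size).
    const std::size_t start = tokens.size() > window ? tokens.size() - window : 0;

    // Count how often each token id occurs in the window.
    std::map<int, int> counts;
    for (std::size_t i = start; i < tokens.size(); ++i) {
        ++counts[tokens[i]];
    }

    const double total = static_cast<double>(tokens.size() - start);
    double entropy = 0.0;
    for (const auto &entry : counts) {
        const double p = entry.second / total;
        entropy -= p * std::log(p);
    }
    return entropy;
}
```

Under these assumptions the entropy of a 32-token window can never exceed ln(32) ≈ 3.47, so a threshold of 5 would flag every long decode and a threshold of 0 would flag none. A short loop of four tokens scores ln(4) ≈ 1.39 and is caught easily, but a repeated sentence made of many distinct tokens can still score close to the maximum, which might be why changing the value does not remove sentence-level repetition.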

Is this related to the other problem regarding Voice Activity Detection (VAD)?
I have also tried changing the VAD threshold, but that seems to do nothing.

I have also tried using a larger whisper model, but that yields the same results, only slower.

gudrob · 2024-05-11T09:29:55Z

So I replaced the Capture Effect of the audio bus with a Record Effect. I used linear interpolation to resample the data I got from GetRecording() from 48000 Hz to 16000 Hz. This works with an astounding accuracy of ~95% (I am not a native English speaker). There is no repetition, and it even recognizes names correctly.

While this approach works for me, I just couldn't get the sample capture implementation to work.
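The linear-interpolation resampling described above can be sketched roughly as follows. This is a minimal illustration, not the code from this comment or from the plugin: it assumes the recording has already been converted to mono float samples, and resample_linear is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of godot-whisper): resamples mono float PCM
// from src_rate to dst_rate by linearly interpolating between neighbours.
std::vector<float> resample_linear(const std::vector<float> &input, int src_rate, int dst_rate) {
    if (input.empty() || src_rate <= 0 || dst_rate <= 0) {
        return {};
    }
    // Number of output samples that cover the same duration as the input.
    const std::size_t out_len = static_cast<std::size_t>(
            static_cast<double>(input.size()) * dst_rate / src_rate);
    // Distance, in input samples, between two consecutive output samples.
    const double step = static_cast<double>(src_rate) / dst_rate;

    std::vector<float> output(out_len);
    for (std::size_t i = 0; i < out_len; ++i) {
        const double pos = i * step;
        // Clamp both neighbours so the last output sample never reads past the end.
        const std::size_t left = std::min(static_cast<std::size_t>(pos), input.size() - 1);
        const std::size_t right = std::min(left + 1, input.size() - 1);
        const float frac = static_cast<float>(pos - static_cast<double>(left));
        output[i] = input[left] * (1.0f - frac) + input[right] * frac;
    }
    return output;
}
```

Calling resample_linear(samples, 48000, 16000) yields one output sample for every three input samples. Note that this sketch applies no low-pass filter before downsampling, so content above 8 kHz can alias into the result; whether that matters for speech recognition is not established in this thread.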

Ughuuu · 2024-05-13T11:27:39Z

Interesting, this sounds like it could be an issue with how I am doing the interpolation. This plugin currently uses libsamplerate for that, as seen here: https://github.com/V-Sekai/godot-whisper/blob/main/src/speech_to_text.cpp#L32

The resample function also exposes an InterpolatorType:

	enum InterpolatorType {
		SRC_SINC_BEST_QUALITY = 0,
		SRC_SINC_MEDIUM_QUALITY = 1,
		SRC_SINC_FASTEST = 2,
		SRC_ZERO_ORDER_HOLD = 3,
		SRC_LINEAR = 4,
	};

By default it's set to SRC_SINC_FASTEST, in godot-whisper/bin/addons/godot_whisper/capture_stream_to_text.gd (line 66 in c3682d7):

	var resampled = resample(_accumulated_frames, SpeechToText.SRC_SINC_FASTEST)

You could also try setting it to BEST_QUALITY to see if there is a change. If not, the approach you took is pretty good as well. If you want, you can make a new scene with it and open a PR for others to try (if not, I might if I get some time).

gudrob · 2024-05-14T09:56:25Z

@Ughuuu I have implemented this in C# (https://github.com/gudatr/godot-ai-rpg/blob/main/scripts/SpeechRecognizer.cs), but it differs greatly from the examples in the project. I tried writing the code in GDScript, but I must admit that I am too inexperienced with it, especially if the implementation needs to be close to the samples, and I currently have no motivation to learn it, sorry.

Ughuuu · 2024-05-14T11:16:49Z

No worries, thanks for this, it's great! If anything, it's a sample people can look at if they want to do the resampling manually. I'm also busy, but I might take a stab at it in the future.

gudrob changed the title ~~Repetition in recordings, e~~ Repetition in recordings May 10, 2024

fire added the bug Something isn't working label Nov 4, 2024
