ONNX model detects soft hum as speech #164

wciurzynski · 2022-01-26T13:25:35Z

🐛 Bug

Running get_speech_timestamps (from silero-vad) on the attached .wav file returns [{'start': 4128, 'end': 29664}], so the model detects it as speech.
possible_speech_but_noise.zip

Answered by snakers4


snakers4 · 2022-01-26T14:45:52Z

Maintainer

If you renormalize this audio (the model does this internally), you get this:

noise_fp.zip

I can hear some microphone or wind (?) artefacts; this is probably why the network gets triggered. During the white noise, however, it stops triggering.

The probability chart can be reproduced like this:

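import torch

SAMPLING_RATE = 16000
# load the ONNX model and helper functions via torch.hub
# (assumes a silero-vad release from the time of this thread, v3.x)
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad', onnx=True)
(get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks) = utils

wav = read_audio('possible_speech_but_noise.wav', sampling_rate=SAMPLING_RATE)
# peak-normalize so the loudest sample (positive or negative) has magnitude 1
wav = wav / wav.abs().max()
# get speech timestamps from full audio file and plot per-window probabilities
speech_timestamps = get_speech_timestamps(wav, model,
                                          sampling_rate=SAMPLING_RATE,
                                          visualize_probs=True,
                                          window_size_samples=1024,
                                          return_seconds=True,
                                          threshold=0.9)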

For noisier data it is generally a good idea to set threshold=0.9.
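To illustrate why a higher threshold helps here, below is a simplified sketch of turning per-window speech probabilities into segments. It is not the silero-vad implementation: the real get_speech_timestamps also applies minimum speech and silence durations and pads segments. The helper name probs_to_segments is hypothetical, and the probabilities are made-up illustrative values, not output from the attached file.

```python
from typing import Dict, List, Optional


def probs_to_segments(probs: List[float], window_size_samples: int,
                      threshold: float) -> List[Dict[str, int]]:
    """Merge consecutive windows with probability >= threshold into segments
    measured in samples (simplified: no padding, no minimum durations)."""
    segments: List[Dict[str, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i * window_size_samples  # speech begins at this window
        elif p < threshold and start is not None:
            segments.append({'start': start, 'end': i * window_size_samples})
            start = None
    if start is not None:  # speech continues until the end of the audio
        segments.append({'start': start, 'end': len(probs) * window_size_samples})
    return segments


# Hypothetical probabilities: noise hovering around 0.6-0.7, then one speech burst.
probs = [0.1, 0.65, 0.7, 0.6, 0.2, 0.95, 0.97, 0.3]
print(probs_to_segments(probs, 1024, threshold=0.5))
print(probs_to_segments(probs, 1024, threshold=0.9))
```

With threshold=0.5 the noisy stretch becomes a "speech" segment; with threshold=0.9 only the high-confidence burst survives.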

4 replies

snakers4 Jan 26, 2022
Maintainer

@wciurzynski
Moved this to silero-vad under discussions/qa, since this looks more like a hyperparameter settings question.

wciurzynski Jan 26, 2022
Author

This recording is a call made in speaker mode, with a server cabinet running in the background.
Why is the speech probability so high?
Could the model be trained not to recognize this as speech?

snakers4 Jan 27, 2022
Maintainer

This looks like a noisy environment combined with some third-party noise suppression (I assume so because of the "speaker mode"). Our network did not see such suppression during training, and it is probably the reason everything is very quiet before renormalization. I guess the easiest approach is just to set the threshold to 0.9 and compare what the VAD outputs when there is real speech.
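One way to do that comparison is sketched below, assuming you already have per-window probabilities for a noise-only stretch and for windows you know contain real speech on the same line. The helpers percentile and suggest_threshold, the choice of percentiles, and the sample values are all hypothetical, not part of silero-vad.

```python
import math
from typing import List, Optional


def percentile(values: List[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def suggest_threshold(noise_probs: List[float], speech_probs: List[float],
                      q_noise: float = 99.0, q_speech: float = 50.0) -> Optional[float]:
    """Place the threshold halfway between a high percentile of noise-only
    probabilities and the median probability of known speech windows.
    Returns None when the two overlap at these percentiles."""
    noise_level = percentile(noise_probs, q_noise)
    speech_level = percentile(speech_probs, q_speech)
    if noise_level >= speech_level:
        return None  # no single threshold separates noise from speech here
    return (noise_level + speech_level) / 2


# Hypothetical per-window probabilities (not measured from the attached files).
noise_only = [0.3, 0.5, 0.6, 0.7, 0.8]
real_speech = [0.9, 0.92, 0.95, 0.97, 0.99]
print(suggest_threshold(noise_only, real_speech))
```

A None result suggests that threshold tuning alone will not separate this noise from speech, and another filter is needed.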

While it is very easy to adapt the model to a particular case (e.g. 8 kHz, a specific language, or a specific noise), adapting it to a custom noise-filtering algorithm may be an issue.

Also, this may be dumb, but you can set up a custom energy filter or use WebRTC VAD together with our VAD. Both can work with a 30 ms window.
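A minimal sketch of such an energy filter, assuming mono float audio in [-1, 1]: compute the RMS level of each 30 ms frame and keep a VAD "speech" decision only when the frame is above an energy floor. The -45 dBFS floor, the helper names, and the synthetic signal are illustrative assumptions, not values from this thread. Since the recording is reported to be very quiet before renormalization, the gate would presumably be applied to the raw audio. The per-frame decisions could come from silero-vad or from WebRTC VAD (for example the py-webrtcvad package, whose Vad.is_speech accepts 10, 20 or 30 ms frames of 16-bit mono PCM).

```python
import math
from typing import List, Sequence


def frame_rms_dbfs(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS level in dBFS of each non-overlapping frame; samples are floats in [-1, 1]."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        levels.append(20 * math.log10(rms) if rms > 0 else float('-inf'))
    return levels


def gate_vad(vad_flags: List[bool], levels_dbfs: List[float],
             min_dbfs: float = -45.0) -> List[bool]:
    """Keep a VAD 'speech' decision only when the frame is also loud enough."""
    return [flag and level >= min_dbfs for flag, level in zip(vad_flags, levels_dbfs)]


SAMPLE_RATE = 16000
FRAME_LEN = SAMPLE_RATE * 30 // 1000  # 30 ms -> 480 samples at 16 kHz

# Synthetic signal: one frame of very quiet 100 Hz hum, then one louder frame.
hum = [0.001 * math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(FRAME_LEN)]
loud = [0.3 * math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(FRAME_LEN)]
levels = frame_rms_dbfs(hum + loud, FRAME_LEN)

# Pretend the VAD fired on both frames (hypothetical decisions).
print(gate_vad([True, True], levels))
```

The quiet hum frame (about -63 dBFS) is rejected even though the VAD flagged it, while the louder frame passes.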

wciurzynski May 26, 2022
Author

@snakers4
I understand. I cannot set the threshold to 0.9, because for phone calls even 0.5 is often too high.
I have another wave file, this time with wind, and the model detects speech in it for window_size_samples smaller than 768 with ONNX=False.
I processed this recording with window_size_samples=256 and threshold=0.2, which I use for phone calls. Maybe the model would be more accurate if I used a model for the Polish language :)

Could you check this wind recording and train the model so that wind is not detected as speech?
wind phone call.zip
