Audio optimization #276

KentDes · 2022-11-30T11:19:36Z

Hi,

I use voice recognition on my Raspberry Pi with a DSP, which I can configure (gain, pass filters, ...).
I want to use your VAD with my voice recognition to filter out false detections.
Do you have any advice or preferences for optimizing your VAD (such as setting a min and max dB, removing a specific frequency, ...)?
I was thinking of adding an AGC to improve the results when I am far away, but then there are more false detections...

Is there an order in which audio processing steps (AGC, VAD, reverb, noise reduction, ...) should be applied before your VAD?

Thanks

snakers4 (Maintainer) · 2022-11-30T11:27:21Z

> Do you have any advice or preferences for optimizing your VAD (such as setting a min and max dB, removing a specific frequency, ...)?

All of the knobs are here:

silero-vad/utils_vad.py, lines 161 to 170 in 82d199f

    def get_speech_timestamps(audio: torch.Tensor,
                              model,
                              threshold: float = 0.5,
                              sampling_rate: int = 16000,
                              min_speech_duration_ms: int = 250,
                              min_silence_duration_ms: int = 100,
                              window_size_samples: int = 512,
                              speech_pad_ms: int = 30,
                              return_seconds: bool = False,
                              visualize_probs: bool = False):
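For orientation, the millisecond defaults in that signature can be converted to sample counts at the default sampling rate. This is plain arithmetic on the values shown above, not code taken from the library; `ms_to_samples` is a hypothetical helper name used only for this illustration.

```python
def ms_to_samples(duration_ms: int, sampling_rate: int = 16000) -> int:
    """Convert a duration in milliseconds to a number of audio samples."""
    return sampling_rate * duration_ms // 1000


# Default values copied from the signature quoted above.
DEFAULTS_MS = {
    "min_speech_duration_ms": 250,
    "min_silence_duration_ms": 100,
    "speech_pad_ms": 30,
}

# The same defaults expressed in samples at 16 kHz.
defaults_in_samples = {name: ms_to_samples(ms) for name, ms in DEFAULTS_MS.items()}

# Duration of one analysis window (512 samples at 16 kHz), in milliseconds.
window_ms = 1000 * 512 / 16000

for name, samples in defaults_in_samples.items():
    print(f"{name}: {samples} samples")
print(f"one window: {window_ms} ms")
```

Seeing the durations as samples and windows makes it easier to judge how coarse or fine each knob is relative to the 512-sample window.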

Be sure to test on audio from your own domain first, i.e. plot the probability chart and then decide on the hyper-parameters.
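One way to turn that probability chart into a decision is to sweep a few candidate thresholds and see how much of the recording each one would mark as speech. The sketch below assumes you already have one speech probability per window for a recording from your domain; how you obtain them is outside the sketch. The function names and the example values are made up for illustration.

```python
from typing import Dict, Sequence


def speech_fraction_by_threshold(
    probs: Sequence[float], thresholds: Sequence[float]
) -> Dict[float, float]:
    """For each candidate threshold, return the fraction of windows whose
    speech probability is at or above it."""
    if not probs:
        raise ValueError("probs must contain at least one window")
    return {t: sum(p >= t for p in probs) / len(probs) for t in thresholds}


def text_chart(probs: Sequence[float], width: int = 40) -> str:
    """Render per-window probabilities as a crude text bar chart."""
    lines = []
    for index, p in enumerate(probs):
        bar = "#" * round(p * width)
        lines.append(f"{index:4d} {p:4.2f} {bar}")
    return "\n".join(lines)


# Made-up probabilities, standing in for real per-window model outputs.
example_probs = [0.05, 0.10, 0.35, 0.80, 0.92, 0.88, 0.40, 0.07]

print(text_chart(example_probs))
for threshold, fraction in speech_fraction_by_threshold(
    example_probs, [0.3, 0.5, 0.7]
).items():
    print(f"threshold {threshold}: {fraction:.0%} of windows flagged as speech")
```

Running the sweep on recordings that contain only noise from your environment shows which threshold keeps false detections acceptably low.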

> I was thinking of adding an AGC to improve the results when I am far away, but then there are more false detections...

You can combine our VAD with any other VAD or algorithm and use its features in your algorithm.
The VAD is trained to receive audio as-is without much external interference.
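As a rough illustration of combining the VAD with another check, the sketch below gates the VAD decision with a simple level test, along the lines of the min/max dB idea from the question. It is an assumption-laden example, not part of silero-vad: `rms_dbfs`, `is_speech`, and the `min_db`/`max_db` defaults are hypothetical, and the frame is assumed to hold float samples in the range [-1.0, 1.0]. Note that the level check runs alongside the VAD and does not alter the audio the VAD receives.

```python
import math
from typing import Sequence


def rms_dbfs(frame: Sequence[float]) -> float:
    """RMS level of a frame in dB relative to full scale (samples in [-1, 1])."""
    if not frame:
        return float("-inf")
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square)  # equals 20 * log10(rms)


def is_speech(
    vad_prob: float,
    frame: Sequence[float],
    threshold: float = 0.5,   # same default as the VAD threshold above
    min_db: float = -50.0,    # hypothetical lower level bound
    max_db: float = -3.0,     # hypothetical upper level bound
) -> bool:
    """Accept a frame only if the VAD probability passes the threshold
    and the frame level lies inside the [min_db, max_db] range."""
    level = rms_dbfs(frame)
    return vad_prob >= threshold and min_db <= level <= max_db
```

For example, `is_speech(0.9, frame)` would reject a very quiet frame even when the VAD is confident, which is one way to suppress distant or residual sounds; the bounds would need tuning on your own recordings.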

> Is there an order in which audio processing steps (AGC, VAD, reverb, noise reduction, ...) should be applied before your VAD?

Not sure why adding more noise or reverb before the VAD is helpful, but the VAD just receives audio.

1 reply

KentDes Nov 30, 2022
Author

Thank you @snakers4,

Ok, I'll take a look at the hyper params.

> Not sure why adding more noise or reverb before the VAD is helpful, but the VAD just receives audio.

Sorry, I meant noise cancellation and dereverberation, not adding reverb.
