Audio optimization #276

Answered by snakers4
KentDes asked this question in Q&A
Nov 30, 2022 · 1 comments · 1 reply
Do you have any advice or preferred settings for optimizing audio for your VAD (for example, setting min and max dB levels or removing specific frequencies)?

All of the knobs are here:

silero-vad/utils_vad.py

Lines 161 to 170 in 82d199f

def get_speech_timestamps(audio: torch.Tensor,
                          model,
                          threshold: float = 0.5,
                          sampling_rate: int = 16000,
                          min_speech_duration_ms: int = 250,
                          min_silence_duration_ms: int = 100,
                          window_size_samples: int = 512,
                          speech_pad_ms: int = 30,
                          return_seconds: bool = False,
                          visualize_probs: bool = False):
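
To get a feel for the defaults in the signature above, it can help to translate the millisecond knobs into sample counts (and the window size into milliseconds) at the default `sampling_rate`. This is only a sketch of the arithmetic; the helper names `ms_to_samples` and `samples_to_ms` are hypothetical and not part of silero-vad.

```python
def ms_to_samples(duration_ms: float, sampling_rate: int = 16000) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(sampling_rate * duration_ms / 1000)


def samples_to_ms(n_samples: int, sampling_rate: int = 16000) -> float:
    """Convert a sample count to a duration in milliseconds."""
    return 1000 * n_samples / sampling_rate


# Default values copied from the get_speech_timestamps signature.
defaults_ms = {
    "min_speech_duration_ms": 250,
    "min_silence_duration_ms": 100,
    "speech_pad_ms": 30,
}

# The same knobs expressed in samples at 16 kHz.
defaults_samples = {name: ms_to_samples(ms) for name, ms in defaults_ms.items()}

# Duration covered by one default window of 512 samples at 16 kHz.
window_ms = samples_to_ms(512)

if __name__ == "__main__":
    print(defaults_samples)
    print(window_ms)
```

If your audio uses a different sampling rate, pass it as the second argument to see how the same millisecond settings map to samples.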

Be sure to test on audio from your domain first, i.e. plot the probability chart and decide on the hyper-parameters.
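
One way to approach this is to collect a speech probability per window and check how much of the audio would count as speech at a few candidate `threshold` values. The sketch below is library-free and uses a toy scoring function; `window_probabilities`, `speech_fraction_by_threshold` and `toy_prob_fn` are hypothetical names, not silero-vad functions. With the real model you would swap in a scorer that returns the model's probability for each chunk (an assumption here: something along the lines of `model(chunk, sampling_rate).item()` on a tensor chunk).

```python
from typing import Callable, Dict, List, Sequence


def window_probabilities(samples: Sequence[float],
                         prob_fn: Callable[[List[float]], float],
                         window_size_samples: int = 512) -> List[float]:
    """Score consecutive fixed-size windows; the last window is zero-padded."""
    probs = []
    for start in range(0, len(samples), window_size_samples):
        chunk = list(samples[start:start + window_size_samples])
        chunk += [0.0] * (window_size_samples - len(chunk))
        probs.append(float(prob_fn(chunk)))
    return probs


def speech_fraction_by_threshold(probs: Sequence[float],
                                 thresholds: Sequence[float] = (0.3, 0.5, 0.7)
                                 ) -> Dict[float, float]:
    """For each threshold, the fraction of windows scored at or above it."""
    if not probs:
        return {t: 0.0 for t in thresholds}
    return {t: sum(p >= t for p in probs) / len(probs) for t in thresholds}


def toy_prob_fn(chunk: List[float]) -> float:
    """Stand-in scorer (NOT a VAD): mean absolute amplitude, capped at 1."""
    return min(1.0, sum(abs(x) for x in chunk) / len(chunk))


# Synthetic signal: two silent windows, two loud windows, one silent window.
audio = [0.0] * 1024 + [0.8] * 1024 + [0.0] * 512
probs = window_probabilities(audio, toy_prob_fn)
fractions = speech_fraction_by_threshold(probs)

if __name__ == "__main__":
    print(probs)
    print(fractions)
```

The `probs` list is what you would plot against time for your own recordings; the per-threshold fractions give a quick numeric check of how sensitive the result is to `threshold`.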

I would think …

Answer selected by KentDes
Category
Q&A
Labels
help wanted Extra attention is needed
2 participants
This discussion was converted from issue #275 on November 30, 2022 11:22.