
input and output #230

Answered by snakers4
JJun-Guo asked this question in Q&A
Sep 5, 2022 · 1 comment · 4 replies

Hi,

The VAD accepts only audio sampled at 16 kHz or 8 kHz.
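As a minimal sketch of what this constraint means for a caller, the snippet below checks the rate before any audio reaches the VAD and, if needed, converts it to 16 kHz. The names `SUPPORTED_RATES`, `ensure_supported_rate`, and `resample_linear` are hypothetical and not part of silero-vad. Linear interpolation is a crude stand-in for a real resampler with anti-alias filtering and is used here only to keep the example dependency-free.

```python
SUPPORTED_RATES = (8000, 16000)  # the two rates named in the answer


def ensure_supported_rate(sampling_rate: int) -> int:
    """Return the rate unchanged, or raise if the VAD would not accept it."""
    if sampling_rate not in SUPPORTED_RATES:
        raise ValueError(
            f"Unsupported sampling rate {sampling_rate} Hz; "
            f"expected one of {SUPPORTED_RATES}"
        )
    return sampling_rate


def resample_linear(samples, src_rate: int, dst_rate: int = 16000):
    """Naive linear-interpolation resampler (hypothetical helper, no filtering)."""
    if src_rate == dst_rate:
        return list(samples)
    if not samples:
        return []
    n_out = len(samples) * dst_rate // src_rate
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate   # fractional position in the source
        lo = min(int(pos), last)        # sample at or before that position
        hi = min(lo + 1, last)          # next sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


# Example: one second of 48 kHz silence becomes 16000 samples at 16 kHz.
audio_48k = [0.0] * 48000
audio_16k = resample_linear(audio_48k, 48000, 16000)
rate = ensure_supported_rate(16000)
```

For real audio, a dedicated resampling routine from an audio library is a better choice than this interpolation; the point is only that the conversion happens before the VAD call.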

silero-vad/utils_vad.py

Lines 119 to 171 in 7c671a7

def get_speech_timestamps(audio: torch.Tensor,
                          model,
                          threshold: float = 0.5,
                          sampling_rate: int = 16000,
                          min_speech_duration_ms: int = 250,
                          min_silence_duration_ms: int = 100,
                          window_size_samples: int = 1536,
                          speech_pad_ms: int = 30,
                          return_seconds: bool = False,
                          visualize_probs: bool = False):
    """
    This method is used for splitting long audios into speech chunks using silero VAD
    Parameters
    ----------
    audio: torch.Tensor, one dimensional
        One dimensional float torch.Tensor, other types are casted…
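The signature above mixes units: the duration parameters are in milliseconds, while `window_size_samples` is in samples, so the meaning of each depends on `sampling_rate`. The sketch below works through that arithmetic for the default values shown. The helper names `ms_to_samples` and `samples_to_ms` are hypothetical and this is not a quote of the library's internals.

```python
def ms_to_samples(duration_ms: int, sampling_rate: int = 16000) -> int:
    """Number of samples covering duration_ms at the given rate."""
    return sampling_rate * duration_ms // 1000


def samples_to_ms(n_samples: int, sampling_rate: int = 16000) -> float:
    """Duration in milliseconds of n_samples at the given rate."""
    return n_samples * 1000 / sampling_rate


# Defaults copied from the signature above.
min_speech_samples = ms_to_samples(250)    # 4000 samples at 16 kHz
min_silence_samples = ms_to_samples(100)   # 1600 samples at 16 kHz
speech_pad_samples = ms_to_samples(30)     # 480 samples at 16 kHz
window_ms_16k = samples_to_ms(1536)        # 96.0 ms at 16 kHz
window_ms_8k = samples_to_ms(1536, 8000)   # 192.0 ms at 8 kHz
```

Note that the same `window_size_samples` covers twice as much time at 8 kHz as at 16 kHz, which is worth keeping in mind when switching rates.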

Answer selected by snakers4
Category: Q&A
Labels: help wanted
Participants: 2
This discussion was converted from issue #229 on September 05, 2022 04:35.