input and output #230

JJun-Guo · 2022-09-05T02:52:28Z

JJun-Guo
Sep 5, 2022

Hi, I input a speech sample with a 44.1 kHz sample rate and a 16-bit sampling depth. The output is speech with a 16 kHz sample rate and a 32-bit sampling depth. How can I get a speech sample with the default sampling depth?

Answered by snakers4

snakers4 · 2022-09-05T04:36:31Z

snakers4
Sep 5, 2022
Maintainer

Hi,

The VAD accepts only 16 kHz or 8 kHz.

silero-vad/utils_vad.py

Lines 119 to 171 in 7c671a7

    
    def get_speech_timestamps(audio: torch.Tensor,
                              model,
                              threshold: float = 0.5,
                              sampling_rate: int = 16000,
                              min_speech_duration_ms: int = 250,
                              min_silence_duration_ms: int = 100,
                              window_size_samples: int = 1536,
                              speech_pad_ms: int = 30,
                              return_seconds: bool = False,
                              visualize_probs: bool = False):

        """
        This method is used for splitting long audios into speech chunks using silero VAD

        Parameters
        ----------
        audio: torch.Tensor, one dimensional
            One dimensional float torch.Tensor, other types are casted to torch if possible

        model: preloaded .jit silero VAD model

        threshold: float (default - 0.5)
            Speech threshold. Silero VAD outputs speech probabilities for each audio chunk, probabilities ABOVE this value are considered as SPEECH.
            It is better to tune this parameter for each dataset separately, but "lazy" 0.5 is pretty good for most datasets.

        sampling_rate: int (default - 16000)
            Currently silero VAD models support 8000 and 16000 sample rates

        min_speech_duration_ms: int (default - 250 milliseconds)
            Final speech chunks shorter min_speech_duration_ms are thrown out

        min_silence_duration_ms: int (default - 100 milliseconds)
            In the end of each speech chunk wait for min_silence_duration_ms before separating it

        window_size_samples: int (default - 1536 samples)
            Audio chunks of window_size_samples size are fed to the silero VAD model.
            WARNING! Silero VAD models were trained using 512, 1024, 1536 samples for 16000 sample rate and 256, 512, 768 samples for 8000 sample rate.
            Values other than these may affect model perfomance!!

        speech_pad_ms: int (default - 30 milliseconds)
            Final speech chunks are padded by speech_pad_ms each side

        return_seconds: bool (default - False)
            whether return timestamps in seconds (default - samples)

        visualize_probs: bool (default - False)
            whether draw prob hist or not

        Returns
        ----------
        speeches: list of dicts
            list containing ends and beginnings of speech chunks (samples or seconds based on return_seconds)
        """
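The docstring above lists which sample rates and window sizes the models were trained on. As a minimal sketch, those combinations can be checked before calling the VAD. The helper name `check_vad_params` and the table `SUPPORTED_WINDOWS` are hypothetical and not part of silero-vad; only the numbers come from the docstring.

```python
# Sample rate -> window sizes named in the get_speech_timestamps docstring.
# This table and the helper below are illustrative, not silero-vad API.
SUPPORTED_WINDOWS = {
    16000: (512, 1024, 1536),
    8000: (256, 512, 768),
}


def check_vad_params(sampling_rate: int, window_size_samples: int) -> bool:
    """Return True if the window size is one the docstring says was used in training.

    Raises ValueError for a sample rate other than 8000 or 16000, since the
    docstring states that only those two rates are supported.
    """
    if sampling_rate not in SUPPORTED_WINDOWS:
        raise ValueError(
            f"Unsupported sampling rate {sampling_rate}; resample to 8000 or 16000 first"
        )
    return window_size_samples in SUPPORTED_WINDOWS[sampling_rate]
```

With this sketch, `check_vad_params(16000, 1536)` accepts the defaults, while a 44.1 kHz input is rejected until it has been resampled.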

4 replies

JJun-Guo Sep 5, 2022
Author

Yes. So speech with a different sample rate will be resampled to 16 kHz? And can the sampling depth be chosen?

snakers4 Sep 5, 2022
Maintainer

You need to resample.

JJun-Guo Sep 5, 2022
Author

I found the following code, which includes the resampling step. Do I need to resample the speech manually?

    def read_audio(path: str,
                   sampling_rate: int = 16000):
        wav, sr = torchaudio.load(path)

        # Downmix multi-channel audio to mono by averaging the channels
        if wav.size(0) > 1:
            wav = wav.mean(dim=0, keepdim=True)

        # Resample only when the file's rate differs from the requested one
        if sr != sampling_rate:
            transform = torchaudio.transforms.Resample(orig_freq=sr,
                                                       new_freq=sampling_rate)
            wav = transform(wav)
            sr = sampling_rate

        assert sr == sampling_rate
        return wav.squeeze(0)
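On the sampling depth question: `torchaudio.load` returns a float32 tensor by default, so audio written straight from that tensor will probably end up as 32-bit. A minimal sketch of one way to get 16-bit output is to convert the float samples to 16-bit PCM before writing. It uses only the Python standard library and assumes the samples are a plain sequence of floats in [-1.0, 1.0] (for example `wav.tolist()`); the function names are hypothetical.

```python
import sys
import wave
from array import array


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to signed 16-bit integers, clipping outliers."""
    pcm = array("h")  # "h" = signed 16-bit integer
    for x in samples:
        x = max(-1.0, min(1.0, x))
        pcm.append(int(round(x * 32767)))
    return pcm


def write_pcm16_wav(path, samples, sampling_rate=16000):
    """Write mono float samples to `path` as a 16-bit PCM WAV file."""
    pcm = float_to_pcm16(samples)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16 bits
        f.setframerate(sampling_rate)
        f.writeframes(pcm.tobytes())
```

If torchaudio is used for writing instead, its `save` function has `encoding` and `bits_per_sample` arguments that may serve the same purpose; check the documentation for the installed version.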

snakers4 Sep 6, 2022
Maintainer

> do i need to resample the speech manually

You need to make sure that incoming samples are either 16 kHz or 8 kHz.
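One way to make sure of this is to inspect the file header before passing audio to the VAD. The sketch below reads the sample rate and bit depth of a PCM WAV file with the standard-library `wave` module. The helper names are hypothetical, and `wave` handles only uncompressed PCM WAV files, so other formats would need a different reader.

```python
import wave

# Sample rates the maintainer says the VAD accepts
VAD_SAMPLE_RATES = (8000, 16000)


def describe_wav(path):
    """Return the sample rate, bit depth, and channel count of a PCM WAV file."""
    with wave.open(path, "rb") as f:
        return {
            "sampling_rate": f.getframerate(),
            "bit_depth": f.getsampwidth() * 8,  # sample width is reported in bytes
            "channels": f.getnchannels(),
        }


def needs_resampling(path):
    """True if the file's sample rate is not one the VAD accepts."""
    return describe_wav(path)["sampling_rate"] not in VAD_SAMPLE_RATES
```

For the 44.1 kHz, 16-bit file from the question, `needs_resampling` would return `True`, so the audio has to be resampled (for example by `read_audio`) before it reaches the model.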
