❓ Question: How to load RAW audio instead of only WAV? #261

hobojoe · 2022-10-28T12:48:59Z

❓ Questions and Help

Hi,

First of all, congratulations on the VAD model, it is great!

I have a question, though: can I load RAW audio files instead of WAV?

I am currently processing streaming audio in RAW format (A-law, mono, 8 kHz). Because the data volume is high, I would like to avoid converting to WAV before running the model.

Thanks!

Answered by adamnsandle

Oct 28, 2022


adamnsandle (Collaborator) · 2022-10-28T13:07:38Z

Could you upload an example of a single alaw file?

hobojoe (Author) · 2022-10-28T13:30:15Z

> Could you upload an example of a single alaw file?

Sure:
https://www.dropbox.com/s/s9may9l58szafb0/01b084d5-e5e3-4348-b14e-beee32cb6909.raw?dl=1

adamnsandle (Collaborator) · 2022-10-28T14:05:10Z

I was able to load your example using the following code:

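import soundfile as sf
import torch

# RAW files have no header, so the sample rate, channel count, and encoding must be given explicitly
wav, sr = sf.read('files/01b084d5-e5e3-4348-b14e-beee32cb6909.raw', samplerate=8000, channels=1, format='RAW', subtype='ALAW', dtype='float32')
wav = torch.tensor(wav)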

You can then run the VAD model over this audio, window by window.
For example:

## just probabilities

speech_probs = []
window_size_samples = 256
for i in range(0, len(wav), window_size_samples):
    chunk = wav[i: i+window_size_samples]
    if len(chunk) < window_size_samples:
        break
    speech_prob = model(chunk, 8000).item()
    speech_probs.append(speech_prob)
model.reset_states() # reset model states after each audio file

print(speech_probs[:10]) # predictions for the first 10 chunks
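
The loop above drops a trailing chunk shorter than `window_size_samples`, which is fine for a complete file. In a streaming setup, where packets may arrive in arbitrary sizes, one option is to carry the leftover samples over to the next packet. The sketch below illustrates that idea; `WindowBuffer` is a hypothetical helper, not part of the VAD library, and it assumes samples arrive as plain Python sequences of floats.

```python
from typing import List, Sequence


class WindowBuffer:
    """Collects incoming samples and hands them back in fixed-size windows."""

    def __init__(self, window_size: int = 256) -> None:
        self.window_size = window_size
        self._pending: List[float] = []

    def push(self, samples: Sequence[float]) -> List[List[float]]:
        """Add samples; return every complete window now available."""
        self._pending.extend(samples)
        windows = []
        while len(self._pending) >= self.window_size:
            windows.append(self._pending[:self.window_size])
            del self._pending[:self.window_size]
        return windows  # leftover samples stay buffered for the next call


# Usage with the model call from the answer (assumes `model` and `torch` are available):
#   buf = WindowBuffer(window_size=256)
#   for window in buf.push(decoded_packet):
#       speech_prob = model(torch.tensor(window), 8000).item()
```

Resetting the model state would then happen once per stream rather than once per packet.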
