❓ Question: How to load RAW audio instead of only WAV? #261
-
❓ Questions and Help

Hi,

First of all, congratulations on the VAD model, it is great! I have a question though: can I load RAW audio files instead of WAV? I am currently processing streaming audio in RAW format (A-law, mono, 8 kHz). Since the volume of data is high, I would like to avoid converting to WAV before running the model.

Thanks!
Replies: 3 comments
-
Could you upload an example of a single A-law file?
-
Sure:
-
I was able to load your example using the following code:

import soundfile as sf
import torch

wav, sr = sf.read('files/01b084d5-e5e3-4348-b14e-beee32cb6909.raw', samplerate=8000, channels=1, subtype='ALAW', dtype='float32')
wav = torch.tensor(wav)

Then you can use the VAD model to process the audio chunk by chunk:

# just probabilities
speech_probs = []
window_size_samples = 256
for i in range(0, len(wav), window_size_samples):
    chunk = wav[i: i + window_size_samples]
    if len(chunk) < window_size_samples:
        break  # skip the trailing partial window
    speech_prob = model(chunk, 8000).item()
    speech_probs.append(speech_prob)
model.reset_states()  # reset model states after each audio
print(speech_probs[:10])  # predictions for the first 10 chunks
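
Since the question mentions streaming audio, the bytes may never exist as a file on disk. As a rough sketch of one alternative, the A-law bytes can be decoded in memory with the standard G.711 expansion. The names `alaw_byte_to_int16`, `ALAW_TABLE` and `alaw_to_float` are hypothetical helpers, not part of soundfile or the VAD model, and dividing by 32768 to get floats in [-1, 1) is an assumption about the scaling the model expects.

```python
def alaw_byte_to_int16(code: int) -> int:
    """Decode one G.711 A-law byte into a signed sample in 16-bit range."""
    code ^= 0x55                     # undo the bit inversion applied by the encoder
    magnitude = (code & 0x0F) << 4   # 4-bit mantissa
    segment = (code & 0x70) >> 4     # 3-bit segment (exponent)
    if segment == 0:
        magnitude += 8
    else:
        magnitude = (magnitude + 0x108) << (segment - 1)
    # In A-law, a set sign bit (after inversion) means a positive sample.
    return magnitude if code & 0x80 else -magnitude


# Precompute all 256 possible byte values once, scaled to floats
# (assumed scaling: divide by 32768 so values fall in [-1, 1)).
ALAW_TABLE = [alaw_byte_to_int16(b) / 32768.0 for b in range(256)]


def alaw_to_float(raw: bytes) -> list:
    """Decode a buffer of mono A-law bytes into a list of float samples."""
    return [ALAW_TABLE[b] for b in raw]
```

The resulting list could then be wrapped with `torch.tensor(..., dtype=torch.float32)` and split into 256-sample windows exactly as in the loop above; each 256-sample window at 8 kHz corresponds to 256 bytes of A-law input.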