The processed file sounds correct after the VAD, but the model still seems to know where the big silence gap was and stops at that place. #312

Answered by ghost
ghost asked this question in Q&A

I'm not sure I understand. Could you post a minimal example?

The idea is as follows: apply VAD to the original file audio.wav and use the collect_chunks() function to get a processed file, only_speech.wav, with the non-speech fragments removed. Then pass this only_speech.wav file as the input to the wav2vec model.
The problem is that if the original audio file contains longer pauses (about 2 s), VAD removes them successfully and the result sounds correct, but the wav2vec model still seems to know where the pause was and stops at that place, as if identifying it as the end of the file.
I can't figure out how the model could identify this pause in the only_speech.wav file. It is as if there were some marker...
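To illustrate the pipeline described above: conceptually, collect_chunks() just concatenates the sample ranges that the VAD marked as speech, so no samples from the silent gaps (and no hidden "marker") can survive in the output. The sketch below is a simplified stand-in for that step, not the silero-vad implementation; the timestamp dict format ({"start": ..., "end": ...} in samples) follows the silero-vad utilities, and the toy waveform is purely illustrative.

```python
import numpy as np

def collect_chunks_sketch(timestamps, wav):
    # Keep only the regions marked as speech and concatenate them;
    # every sample from the silent gaps is discarded outright.
    return np.concatenate([wav[ts["start"]:ts["end"]] for ts in timestamps])

# Toy waveform: "speech" (ones), a long "silence" (zeros), "speech" again.
wav = np.concatenate([np.ones(100), np.zeros(200), np.ones(100)])
timestamps = [{"start": 0, "end": 100}, {"start": 300, "end": 400}]

out = collect_chunks_sketch(timestamps, wav)
# The output holds exactly the two speech regions and no zero samples,
# so any pause-like behavior downstream must come from the speech
# samples themselves (e.g. trailing low-energy audio inside the chunks),
# not from leftover silence.
```

If the model still "hears" the gap, one thing worth checking is whether the speech timestamps include padding around each chunk that retains part of the pause.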

This discussion was converted from issue #311 on March 21, 2023 03:05.