-
❓Help - The processed file sounds correct after the VAD, but the model still seems to know where the big silence gap was and stops at that point. Hi, has anyone run into such a situation? I'm trying to apply silero-VAD before a wav2vec model. The processed file sounds correct after the VAD, but the model still seems to know where the big silence gap was and stops at that point.
Replies: 3 comments 2 replies
-
Not sure I understand.
-
The idea is as follows: apply VAD to the original file audio.wav, use the collect_chunks() function to get a processed file without the non-speech fragments (only_speech.wav), and then pass this only_speech.wav file as input to the wav2vec model. A sketch of the pipeline is below.
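A minimal sketch of that pipeline, assuming the torch.hub usage shown in the silero-vad README (the exact contents of the returned utils tuple can differ between silero-vad versions):

```python
import torch

# Load the VAD model and helper functions from torch.hub
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_vad')
(get_speech_timestamps, save_audio, read_audio,
 VADIterator, collect_chunks) = utils

SAMPLING_RATE = 16000

# Read the original file and detect speech segments
wav = read_audio('audio.wav', sampling_rate=SAMPLING_RATE)
speech_timestamps = get_speech_timestamps(wav, model,
                                          sampling_rate=SAMPLING_RATE)

# Concatenate only the speech chunks and save them as only_speech.wav,
# which is then fed to the wav2vec model
save_audio('only_speech.wav',
           collect_chunks(speech_timestamps, wav),
           sampling_rate=SAMPLING_RATE)
```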
-
audio_files.zip
But the thing is, if the original audio file contains longer pauses (about 2 seconds), VAD removes them successfully and the result sounds correct, yet the wav2vec model still seems to know where the pause was and stops at that point, as if it were the end of the file.
I can't figure out how the model identifies this pause in the only_speech.wav file. As if there is some marker...
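To see exactly where decoding stops, here is a minimal sketch that runs both files through a Hugging Face wav2vec2 CTC checkpoint and compares the transcripts; the checkpoint name is a placeholder, so substitute whichever wav2vec model is actually in use:

```python
import torch
import torchaudio
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Placeholder checkpoint for illustration only
MODEL_NAME = 'facebook/wav2vec2-base-960h'
processor = Wav2Vec2Processor.from_pretrained(MODEL_NAME)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_NAME)
model.eval()

def transcribe(path: str) -> str:
    # Load audio and resample to the 16 kHz expected by wav2vec2
    wav, sr = torchaudio.load(path)
    if sr != 16000:
        wav = torchaudio.functional.resample(wav, sr, 16000)
    inputs = processor(wav.squeeze(0).numpy(),
                       sampling_rate=16000,
                       return_tensors='pt')
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]

# Compare the transcript of the original file with the VAD-trimmed one
print(transcribe('audio.wav'))
print(transcribe('only_speech.wav'))
```

Comparing the two transcripts (and the sample position where they diverge) should show whether the model truly stops at the location of the removed pause or simply produces a shorter output.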