How does silero-vad compare to pyannote and NVIDIA NeMo? #152
-
Hi there @snakers4 https://github.com/pyannote/pyannote-audio
-
Hi, As for pyannote, we compared it previously: it was OK for whole audios, but not very fast. I am also not sure that it is an out-of-the-box solution (it was trained on limited academic data), and I believe it does not support streaming (or we did not read all of their code to find the streaming examples). If you find some checkpoints and streaming examples, please send a link.
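For reference, here is a minimal sketch of chunk-by-chunk streaming with silero-vad itself, using the VADIterator helper bundled with the torch.hub model; the file name example.wav and the 16 kHz / 512-sample window are placeholder assumptions:

```python
import torch

# Load the VAD model plus helper utilities from the silero-vad repo via torch.hub
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad')
(get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks) = utils

SAMPLING_RATE = 16000
wav = read_audio('example.wav', sampling_rate=SAMPLING_RATE)  # placeholder file name

vad_iterator = VADIterator(model, sampling_rate=SAMPLING_RATE)
window_size_samples = 512  # assumed chunk size for 16 kHz audio

# Feed the audio chunk by chunk, as a live stream would arrive
for i in range(0, len(wav), window_size_samples):
    chunk = wav[i:i + window_size_samples]
    if len(chunk) < window_size_samples:
        break
    speech_dict = vad_iterator(chunk, return_seconds=True)
    if speech_dict:
        print(speech_dict)  # e.g. {'start': 1.2} or {'end': 3.4}

vad_iterator.reset_states()  # reset the internal state between separate streams
```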
As for MarbleNet, I found some streaming examples here with pretrained models (but I am not sure whether they were pre-trained just on Google Speech Commands, which is limited). In any case, I am reluctant to invest time in adding this network to our benchmark for a couple of reasons:
-
Thanks for the detailed reply. The pyannote link that you referenced points to pyannote 1. Also, I am interested in hearing your thoughts about SpeechBrain's VAD.
-
Thank you for your detailed reply. Note that SpeechBrain still doesn't have streaming inference, nor do I see a deployment pipeline, but one of the things that caught my eye is that their model was able to correctly predict the part where both music and speech are active and classify it as speech in this sample. For SpeechBrain, here are the links so you can test and compare:
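For a quick comparison along those lines, SpeechBrain's pretrained VAD can be run roughly like this (a minimal sketch assuming the publicly available vad-crdnn-libriparty checkpoint is the model in question, with example.wav as a placeholder file):

```python
from speechbrain.pretrained import VAD

# Download the pretrained CRDNN VAD (trained on LibriParty) from the SpeechBrain hub
vad = VAD.from_hparams(
    source="speechbrain/vad-crdnn-libriparty",
    savedir="pretrained_models/vad-crdnn-libriparty",
)

# Whole-file (non-streaming) inference: returns speech boundaries in seconds
boundaries = vad.get_speech_segments("example.wav")  # placeholder file name
vad.save_boundaries(boundaries)  # print the detected speech segments
```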
-
Silero's performance is actually great. Note that adding an automatic post-processing step to the pipeline, in order to generate the time per sentence, would make it even better.
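A rough sketch of what such a post-processing step could look like, using the segment timestamps silero-vad already returns; note that the VAD yields speech segments rather than true sentence boundaries, and the file name and sample rate below are placeholder assumptions:

```python
import torch

# Load silero-vad and its utilities via torch.hub
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad', model='silero_vad')
(get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks) = utils

SAMPLING_RATE = 16000
wav = read_audio('example.wav', sampling_rate=SAMPLING_RATE)  # placeholder file name

# Whole-file inference: a list of {'start': ..., 'end': ...} dicts in samples
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=SAMPLING_RATE)

# Post-processing: convert sample indices to seconds, one line per detected segment
for seg in speech_timestamps:
    start_s = seg['start'] / SAMPLING_RATE
    end_s = seg['end'] / SAMPLING_RATE
    print(f"{start_s:7.2f}s -> {end_s:7.2f}s  ({end_s - start_s:.2f}s of speech)")
```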
-
Can you explain?