How does silero-vad compare to pyannote and NVIDIA NeMo? #152

Answered by snakers4
seekingdeep asked this question in Q&A

Hi,

As for pyannote, we compared it previously: it was OK on whole audio files, but not very fast. I am not sure that it is an out-of-the-box solution (it was trained on limited academic data), and I believe it does not support streaming (or we simply did not read enough of their code to find the streaming examples). If you find some checkpoints and streaming examples, please send a link.
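
To make the whole-audio versus streaming distinction concrete, here is a minimal sketch. It does not use the pyannote, NeMo, or silero-vad APIs: `chunk_is_speech` is a hypothetical energy-threshold stand-in for a real model, and the 16 kHz sample rate, 512-sample chunk size, and threshold are assumptions chosen only for illustration.

```python
import math
from typing import Iterable, Iterator, List

SAMPLE_RATE = 16000  # assumed sample rate
CHUNK = 512          # assumed chunk size in samples (32 ms at 16 kHz)


def chunk_is_speech(chunk: List[float], threshold: float = 0.01) -> bool:
    """Hypothetical stand-in for a VAD model: RMS energy above a threshold."""
    rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
    return rms > threshold


def offline_vad(audio: List[float]) -> List[bool]:
    """Whole-audio mode: the full signal must be available before any result."""
    return [
        chunk_is_speech(audio[i:i + CHUNK])
        for i in range(0, len(audio) - CHUNK + 1, CHUNK)
    ]


def streaming_vad(chunks: Iterable[List[float]]) -> Iterator[bool]:
    """Streaming mode: one decision is emitted as soon as each chunk arrives."""
    for chunk in chunks:
        yield chunk_is_speech(chunk)


def fake_microphone(audio: List[float]) -> Iterator[List[float]]:
    """Simulates a live source by handing out one chunk at a time."""
    for i in range(0, len(audio) - CHUNK + 1, CHUNK):
        yield audio[i:i + CHUNK]


# Synthetic test signal: 32 chunks of silence followed by 32 chunks of a 440 Hz tone.
silence = [0.0] * (32 * CHUNK)
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(32 * CHUNK)]
audio = silence + tone

offline = offline_vad(audio)
streamed = list(streaming_vad(fake_microphone(audio)))
```

With a stateless detector like this toy one, both modes give identical decisions; the practical difference is latency, since the streaming loop can act on each chunk without waiting for the recording to end. A real model that relies on context from the whole file would not necessarily behave the same way in both modes.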

Therefore, we use Google Speech Commands Dataset V2 [22] as speech data

As for MarbleNet, I found some streaming examples here with pretrained models (but I am not sure whether they were pretrained only on Google Speech Commands, which is limited).

In any case I am reluctant to invest time in adding this network into our…

Replies: 5 comments 8 replies

Category: Q&A
Labels: help wanted
3 participants

This discussion was converted from issue #151 on January 04, 2022 16:12.