The VAD works by applying a voting mechanism to the model output to decide where a speech segment starts and ends, so the result depends on both the implementation of the voting mechanism and the model output. For example, there is a control parameter, min_silence_duration_ms, whose default value may differ between the Python and C++ implementations, and it affects the length of the final output: a segment containing 80 ms of silence may be treated as a single segment by the Python voting but split into two segments by the C++ voting. Several other parameters of this kind also affect the length of the output or its start and end values. Of course, it is also possible that the C++ voting implementation differs from the Python one; I am not sure, since I am not familiar with C++. But as I see it, the differences are tolerable and acceptable: most segments are valid speech.
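To make the "voting" idea concrete, here is a minimal sketch (not the actual silero-vad code) of how per-window speech probabilities can be turned into segments. It assumes 512-sample windows at 16 kHz, a single probability threshold, and a min_silence_duration_ms rule; the function name `segments_from_probs` and its defaults are hypothetical. It shows how the same model output can yield one segment or two depending only on min_silence_duration_ms.

```python
def segments_from_probs(probs, window_samples=512, sampling_rate=16000,
                        threshold=0.5, min_silence_duration_ms=100):
    """Turn per-window speech probabilities into (start, end) sample ranges.

    Hypothetical simplified rule: a segment opens when a window's probability
    reaches `threshold` and closes only once the probability has stayed below
    it for at least `min_silence_duration_ms`.
    """
    min_silence_samples = sampling_rate * min_silence_duration_ms // 1000
    segments = []
    start = None          # start sample of the currently open segment
    silence_start = None  # first sample of the current silent run inside it
    for i, p in enumerate(probs):
        pos = i * window_samples
        if p >= threshold:
            if start is None:
                start = pos
            silence_start = None  # speech resumed: forget the pending pause
        elif start is not None:
            if silence_start is None:
                silence_start = pos
            # Close the segment once the pause is long enough.
            if pos + window_samples - silence_start >= min_silence_samples:
                segments.append((start, silence_start))
                start, silence_start = None, None
    if start is not None:
        # Audio ended inside a segment; cut it at the pending pause, if any.
        end = silence_start if silence_start is not None else len(probs) * window_samples
        segments.append((start, end))
    return segments


probs = [0.9, 0.9, 0.1, 0.1, 0.9, 0.9]  # 64 ms pause between two speech bursts
print(segments_from_probs(probs, min_silence_duration_ms=100))  # [(0, 3072)]
print(segments_from_probs(probs, min_silence_duration_ms=50))   # [(0, 1024), (2048, 3072)]
```

With a 64 ms pause (two silent 512-sample windows), min_silence_duration_ms=100 keeps one segment while min_silence_duration_ms=50 splits it into two, which is the kind of divergence described above if the two implementations used different defaults.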
The new VAD version was released just now - #2 (comment).
If the issue persists with the new version, could you please open a new issue referring to this one?
Many thanks!
❓ Questions and Help
I found that the speech timestamps obtained from PyTorch and from silero-vad-onnx.cpp are somewhat different. The input file is 'en_example.wav', downloaded from torch.hub.
The speech timestamps from PyTorch are as follows (with USE_ONNX set to either True or False):
[{'end': 31200, 'start': 1568},
{'end': 73696, 'start': 42528},
{'end': 108512, 'start': 79392},
{'end': 163808, 'start': 149024},
{'end': 181728, 'start': 166944},
{'end': 211936, 'start': 183328},
{'end': 227808, 'start': 216608},
{'end': 241120, 'start': 229920},
{'end': 252896, 'start': 245280},
{'end': 285664, 'start': 260640},
{'end': 301024, 'start': 294432},
{'end': 311776, 'start': 303648},
{'end': 420320, 'start': 325664},
{'end': 455136, 'start': 422432},
{'end': 490976, 'start': 458784},
{'end': 520160, 'start': 493088},
{'end': 566752, 'start': 523808},
{'end': 601056, 'start': 572448},
{'end': 621024, 'start': 607264},
{'end': 669152, 'start': 638496},
{'end': 691680, 'start': 671776},
{'end': 712672, 'start': 697888},
{'end': 748512, 'start': 720928},
{'end': 798688, 'start': 781856},
{'end': 853984, 'start': 817696},
{'end': 865248, 'start': 856608},
{'end': 903648, 'start': 871968},
{'end': 916960, 'start': 906272},
{'end': 952288, 'start': 920096}]
The number of timestamps is 29.
The speech timestamps from silero-vad-onnx.cpp are as follows:
{start:00002048,end:00031744}
{start:00043008,end:00074752}
{start:00079872,end:00108544}
{start:00149504,end:00164864}
{start:00166912,end:00182272}
{start:00183296,end:00195584}
{start:00195584,end:00212992}
{start:00217088,end:00228352}
{start:00230400,end:00241664}
{start:00245760,end:00252928}
{start:00261120,end:00286720}
{start:00294912,end:00302080}
{start:00304128,end:00312320}
{start:00325632,end:00352256}
{start:00352256,end:00373760}
{start:00373760,end:00419840}
{start:00422912,end:00455680}
{start:00458752,end:00491520}
{start:00493568,end:00521216}
{start:00524288,end:00555008}
{start:00555008,end:00567296}
{start:00572416,end:00602112}
{start:00607232,end:00621568}
{start:00638976,end:00669696}
{start:00671744,end:00680960}
{start:00680960,end:00692224}
{start:00698368,end:00713728}
{start:00720896,end:00739328}
{start:00739328,end:00744448}
{start:00745472,end:00749568}
{start:00782336,end:00798720}
{start:00818176,end:00854016}
{start:00857088,end:00866304}
{start:00872448,end:00904192}
{start:00906240,end:00917504}
{start:00920576,end:00941056}
{start:00941056,end:00949248}
{start:00949248,end:00952320}
{start:00958464,end:00960000}
The number of timestamps is 39.
I wonder whether the above differences are tolerable and acceptable.
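One way to judge whether the differences are tolerable is to line the two lists up. The sketch below uses hypothetical helper names; the timestamps are the sample offsets copied from this issue, rewritten as (start, end) tuples. It first merges C++ segments whose end equals the next segment's start, then reports Python segments covered by more than one C++ segment and C++ segments that overlap no Python segment.

```python
# Speech timestamps (in samples) copied from the issue, as (start, end).
PY_SEGMENTS = [
    (1568, 31200), (42528, 73696), (79392, 108512), (149024, 163808),
    (166944, 181728), (183328, 211936), (216608, 227808), (229920, 241120),
    (245280, 252896), (260640, 285664), (294432, 301024), (303648, 311776),
    (325664, 420320), (422432, 455136), (458784, 490976), (493088, 520160),
    (523808, 566752), (572448, 601056), (607264, 621024), (638496, 669152),
    (671776, 691680), (697888, 712672), (720928, 748512), (781856, 798688),
    (817696, 853984), (856608, 865248), (871968, 903648), (906272, 916960),
    (920096, 952288),
]
CPP_SEGMENTS = [
    (2048, 31744), (43008, 74752), (79872, 108544), (149504, 164864),
    (166912, 182272), (183296, 195584), (195584, 212992), (217088, 228352),
    (230400, 241664), (245760, 252928), (261120, 286720), (294912, 302080),
    (304128, 312320), (325632, 352256), (352256, 373760), (373760, 419840),
    (422912, 455680), (458752, 491520), (493568, 521216), (524288, 555008),
    (555008, 567296), (572416, 602112), (607232, 621568), (638976, 669696),
    (671744, 680960), (680960, 692224), (698368, 713728), (720896, 739328),
    (739328, 744448), (745472, 749568), (782336, 798720), (818176, 854016),
    (857088, 866304), (872448, 904192), (906240, 917504), (920576, 941056),
    (941056, 949248), (949248, 952320), (958464, 960000),
]


def merge_touching(segments, max_gap=0):
    """Merge segments that start at most `max_gap` samples after the previous end."""
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlaps(a, b):
    """True if half-open sample ranges a and b share at least one sample."""
    return min(a[1], b[1]) > max(a[0], b[0])


def compare(reference, candidate):
    """Return (reference segments split across several candidate segments,
    candidate segments that overlap no reference segment)."""
    split = [r for r in reference
             if sum(overlaps(r, c) for c in candidate) > 1]
    extra = [c for c in candidate
             if not any(overlaps(c, r) for r in reference)]
    return split, extra


if __name__ == "__main__":
    merged = merge_touching(CPP_SEGMENTS)
    split, extra = compare(PY_SEGMENTS, merged)
    print(len(CPP_SEGMENTS), "->", len(merged))
    print("split:", split)
    print("extra:", extra)
```

On these lists, merging touching segments reduces the C++ output from 39 to 31 entries. What remains is one Python segment (720928 to 748512) that the C++ version covers with two segments separated by a 1024-sample gap (64 ms, assuming 16 kHz audio), and one short C++ segment (958464 to 960000) after the last Python segment.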