v5.0
Finally, v5 is here: 3x faster and supporting 6,000+ languages!
Performance and Model Size
3x faster inference for TorchScript, 10% faster inference for ONNX;
TorchScript is now as fast as ONNX;
The model is 2x larger: 2 MB vs. 1 MB;
Quality
The VAD now supports more than 6,000 languages;
Significantly more robust on noisy data;
Overall 5-7% quality increase on clean data;
Quality difference for 8 kHz and 16 kHz is negligible now;
The quality difference between window sizes is negligible, so the window size setting was deprecated;
Added benchmarks on 9 unique datasets (2 private) and one holistic multi-domain dataset;
Changes and deprecations
The ONNX model now uses opset 16;
window_size_samples is deprecated: the VAD now only works with a fixed-size window;
The VAD now works with 8 kHz and 16 kHz sample rates, only with fixed windows of 256 and 512 samples, respectively;
Slightly changed internal logic: some context (part of the previous chunk) is now passed along with the current chunk;
Sample rates that are a multiple of 16 kHz are still supported;
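The input constraints listed above can be illustrated with a small plain-Python sketch: audio at a multiple of 16 kHz is brought down to 16 kHz, split into fixed windows (256 samples at 8 kHz, 512 at 16 kHz), and each window is prefixed with the tail of the previous one as context. Both window sizes cover the same 32 ms of audio (256 / 8000 = 512 / 16000 = 0.032 s). The helper names, the zero-padding of the last window, the naive decimation, and the default context length of 64 samples are assumptions for illustration; the release notes do not specify them, and this is not the VAD's actual API.

```python
# Fixed window sizes per sample rate, as stated in the v5.0 release notes.
WINDOW_SIZE = {8000: 256, 16000: 512}


def normalize_sample_rate(samples, sample_rate):
    """Return (samples, rate) at 8 kHz or 16 kHz (hypothetical helper).

    Rates that are a multiple of 16 kHz are reduced by naive decimation
    (keep every k-th sample, no anti-aliasing filter). This approach is an
    assumption for illustration only.
    """
    if sample_rate in WINDOW_SIZE:
        return list(samples), sample_rate  # natively supported rates
    if sample_rate % 16000 == 0:
        step = sample_rate // 16000
        return list(samples[::step]), 16000
    raise ValueError(f"unsupported sample rate: {sample_rate} Hz")


def iter_vad_inputs(samples, sample_rate, context_size=64):
    """Yield fixed-size windows, each prefixed with context from the previous one.

    context_size is a hypothetical parameter: the release notes say some
    context is passed along but not how many samples. Zero-padding the
    final partial window is also an assumption.
    """
    samples, sample_rate = normalize_sample_rate(samples, sample_rate)
    size = WINDOW_SIZE[sample_rate]
    context = [0.0] * context_size  # no previous chunk at stream start
    for start in range(0, len(samples), size):
        window = samples[start:start + size]
        window += [0.0] * (size - len(window))  # zero-pad the final window
        model_input = context + window
        # Carry the last context_size samples over to the next window.
        context = model_input[-context_size:] if context_size else []
        yield model_input
```

For example, one second of 48 kHz audio is decimated to 16,000 samples and yields 32 inputs of 512 + 64 samples each, the last window being partly zero-padded.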