Finally, V5 is here, 3x faster, supporting 6000+ languages!

snakers4 released this 27 Jun 20:07

Performance and Model Size

3x faster inference for TorchScript, 10% faster inference for ONNX;
Now TorchScript is as fast as ONNX;
Model size is 2x larger, 2MB vs. 1MB;

Quality

The VAD supports more than 6,000 languages now;
Significantly more robust on noisy data;
Overall 5-7% quality increase on clean data;
Quality difference for 8 kHz and 16 kHz is negligible now;
Quality difference for different window sizes is negligible now, so the window size parameter was deprecated;
Added benchmarks on 9 unique datasets (2 private) and one holistic multi-domain dataset;

Changes and deprecations

ONNX opset 16;
window_size_samples is deprecated: the VAD now only works with a fixed-size window;
VAD now works with 8 kHz and 16 kHz sample rates, only with fixed 256 and 512 sample windows respectively;
Slightly changed internal logic: some context (part of the previous chunk) is now passed along with the current chunk;
Sample rates that are a multiple of 16 kHz are still supported;
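
As an illustration of the fixed-window change above, here is a minimal sketch of splitting audio into 256-sample (8 kHz) or 512-sample (16 kHz) chunks before passing them to the VAD. The helper names `window_size` and `split_into_windows` are hypothetical and are not part of the silero-vad API. Zero-padding the trailing partial window is an assumption made for this example, and the handling of sample rates that are multiples of 16 kHz is not covered here.

```python
# Fixed window sizes stated in these release notes: sample rate (Hz) -> samples per chunk.
WINDOW_SAMPLES = {8000: 256, 16000: 512}


def window_size(sample_rate: int) -> int:
    """Return the fixed window size for a supported sample rate (hypothetical helper)."""
    if sample_rate not in WINDOW_SAMPLES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    return WINDOW_SAMPLES[sample_rate]


def split_into_windows(samples, sample_rate):
    """Split a sequence of samples into fixed-size windows.

    Assumption for this sketch: the trailing partial window is zero-padded.
    """
    size = window_size(sample_rate)
    windows = []
    for start in range(0, len(samples), size):
        chunk = list(samples[start:start + size])
        chunk.extend([0.0] * (size - len(chunk)))  # pad the last window if it is short
        windows.append(chunk)
    return windows


if __name__ == "__main__":
    # Both supported configurations cover the same duration per window.
    for rate in (8000, 16000):
        print(rate, window_size(rate), window_size(rate) / rate)
```

Both configurations give the same window duration (256 / 8000 = 512 / 16000 = 0.032 s), so a chunk covers 32 ms of audio at either sample rate.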
