Finally, V5 is here, 3x faster, supporting 6000+ languages!

@snakers4 snakers4 released this 27 Jun 20:07
· 72 commits to master since this release
5cd2ba5

Performance and Model Size

  • 3x faster inference for TorchScript, 10% faster inference for ONNX;
  • Now TorchScript is as fast as ONNX;
  • Model size is 2x larger, 2MB vs. 1MB;

Quality

  • The VAD supports more than 6,000 languages now;
  • Significantly more robust on noisy data;
  • Overall 5-7% quality increase on clean data;
  • Quality difference for 8 kHz and 16 kHz is negligible now;
  • Quality difference for different window sizes is negligible now, so the window size parameter was deprecated;
  • Added benchmarks on 9 unique datasets (2 private) and one holistic multi-domain dataset;

Changes and deprecations

  • ONNX opset 16;
  • window_size_samples is deprecated: the VAD now only works with a fixed-size window;
  • VAD now works with 8 kHz and 16 kHz sample rates, only with fixed 256 and 512 sample windows respectively;
  • Slightly changed internal logic: some context (part of the previous chunk) is now passed along with the current chunk;
  • Sample rates that are a multiple of 16 kHz are still supported;
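
The fixed-window and context changes above can be illustrated with a rough sketch. This is not the library's own code: `normalize_rate`, `iter_windows` and the 64-sample `CONTEXT_SAMPLES` value are hypothetical names and numbers chosen for illustration. Only the 256/512 window sizes come from these notes, and the plain strided decimation for multiples of 16 kHz is an assumption about how such rates could be handled.

```python
from typing import Iterator, List, Tuple

# Window sizes from the release notes: 256 samples at 8 kHz, 512 at 16 kHz.
WINDOW_SAMPLES = {8000: 256, 16000: 512}

# Hypothetical context length; the notes only say "some context" is passed.
CONTEXT_SAMPLES = 64


def normalize_rate(audio: List[float], sample_rate: int) -> Tuple[List[float], int]:
    """Bring multiples of 16 kHz down to 16 kHz by keeping every n-th sample.

    Assumption: naive decimation without an anti-aliasing filter.
    """
    if sample_rate > 16000 and sample_rate % 16000 == 0:
        step = sample_rate // 16000
        return audio[::step], 16000
    if sample_rate not in WINDOW_SAMPLES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    return audio, sample_rate


def iter_windows(
    audio: List[float], sample_rate: int, context: int = CONTEXT_SAMPLES
) -> Iterator[List[float]]:
    """Yield fixed-size windows, each prefixed with the tail of the previous one."""
    audio, sample_rate = normalize_rate(audio, sample_rate)
    size = WINDOW_SAMPLES[sample_rate]
    prev_tail = [0.0] * context  # the first window has no history, so use silence
    for start in range(0, len(audio), size):
        chunk = audio[start:start + size]
        chunk = chunk + [0.0] * (size - len(chunk))  # zero-pad the last window
        yield prev_tail + chunk
        prev_tail = chunk[-context:] if context else []


if __name__ == "__main__":
    one_second = [0.0] * 16000
    windows = list(iter_windows(one_second, 16000))
    print(len(windows), len(windows[0]))
```

Each yielded list would be what a model call receives for one step; callers no longer choose a window size, only a sample rate.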