Skip to content

v0.7.0

Compare
Choose a tag to compare
@ZachNagengast ZachNagengast released this 24 May 10:02
· 44 commits to main since this release
c829f9a

This is a very exciting release because we're seeing yet another massive speedup in offline throughput thanks to VAD based chunking πŸš€

Highlights

  • Energy VAD based chunking πŸ—£οΈ @jkrukowski
    • There is a new decoding option called chunkingStrategy which can significantly speed up your single file transcriptions with minimal WER downsides.
    • It works by finding a clip point in the middle of the longest silence (lowest audio energy) in the last 15s of a 30s window and uses that to split up all the audio ahead of time so it can be asynchronously decoded in parallel.
    • Heres a video of it in action, comparing .none chunking strategy with .vad
vad.chunking.mp4
  • Detect language helper:
    • You can now call detectLanguage with just an audio path as input from the main whisperKit object. This will return a simple language code and probability back as a tuple, and has minimal logging/timing.
    • Example:
let whisperKit = try await WhisperKit()
let (language, probs) = try await whisperKit.detectLanguage(audioPath: "your/audio/path/spanish.wav")
print(language) // "es"
  • WhisperKit via Expo @seb-sep
  • Bug fixes and enhancements:
    • @jiangdi0924 and @fengcunhan contributed some nice fixes in this release with #136 and #138 (see below)
    • Also moved the decoding progress callback to be fully async so that it doesn't block the decoder thread

What's Changed

New Contributors

Full Changelog: v0.6.1...v0.7.0