Example app VAD default + memory reduction #217

ZachNagengast · 2024-10-08T02:08:07Z

This PR sets the example app and CLI to use VAD as the default setting. VAD uses a lot of memory for async predictions, so this also includes some improvements to memory and thread handling in general. There is future work to improve this (see #209), but I'm including one of @keleftheriou's fixes here in 2770d84.

Memory issue detail

For very large files, there was a large memory spike as the audio was copied into a float array for the model to consume (peaks at 2 GB).

Now the audio is converted directly into a float array in chunks to mitigate this (never goes above 1 GB).

This comes with a speed reduction of about 20% when processing the full file, which can surely be improved.
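As a rough illustration of the chunked conversion described above, here is a minimal sketch in plain Swift. It is not the code from this PR: `SampleSource`, `loadAsFloats`, and the 16-bit sample format are hypothetical stand-ins for the real audio file reader, chosen so the example runs without AVFoundation. The point it shows is that only one chunk-sized temporary exists at a time, instead of a second full-size copy of the buffer.

```swift
/// Hypothetical stand-in for an audio file on disk (not a WhisperKit type).
/// A real reader would pull frames from the file instead of an in-memory array.
struct SampleSource {
    let samples: [Int16]

    /// Returns up to `count` samples starting at `offset`.
    func read(offset: Int, count: Int) -> ArraySlice<Int16> {
        let end = min(offset + count, samples.count)
        return samples[offset..<end]
    }
}

/// Converts the source to normalized floats one chunk at a time.
/// The only temporary allocation is the converted chunk, so the extra
/// memory needed during conversion is bounded by `chunkSize`.
func loadAsFloats(from source: SampleSource, chunkSize: Int) -> [Float] {
    precondition(chunkSize > 0, "chunkSize must be positive")

    var output = [Float]()
    // Reserve once up front so appending chunks does not trigger regrowth copies.
    output.reserveCapacity(source.samples.count)

    var offset = 0
    while offset < source.samples.count {
        let chunk = source.read(offset: offset, count: chunkSize)
        // Scale 16-bit PCM into the range [-1.0, 1.0).
        output.append(contentsOf: chunk.map { Float($0) / 32768.0 })
        offset += chunk.count
    }
    return output
}

// Usage: five samples converted two at a time.
let exampleSource = SampleSource(samples: [0, 16384, -32768, 32767, 8192])
let exampleFloats = loadAsFloats(from: exampleSource, chunkSize: 2)
print(exampleFloats.count)
```

A smaller `chunkSize` lowers the size of each temporary copy at the cost of more loop iterations, which is one plausible source of the slowdown noted above.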


kaiwen-wang · 2025-01-25T23:43:16Z

Based on this, would it be better not to use VAD if we want quality? Since the model handles speech start/end, isn't the only benefit of VAD essentially not processing silent audio and not splitting the text arbitrarily at periodic intervals?

ZachNagengast and others added 6 commits October 6, 2024 17:00

Release memory when transcribing single files

2770d84

Co-authored-by: keleftheriou <[email protected]>

Add method to load from file into float array iteratively

e3078a8

- Reduces peak memory by doing the array conversion while loading in chunks so the array copy size is lower
- Previously copied the entire buffer which spiked the memory 2x

Fix leak

baea188

Use vad by default in examples

33759ed

Merge branch 'main' into whisperax-faster-defaults

e673e71

Fix vad thread issue

37a4d4f

ZachNagengast requested a review from a2they October 8, 2024 02:08

ZachNagengast added 3 commits October 7, 2024 19:17

Fix unused warning

d99f7a6

Revert change to early stop callback

6da35b5

Fix warnings

23b8226

- Optional CLI commands are deprecated
- @_disfavoredOverload required @available to prevent infinite loop

a2they approved these changes Oct 8, 2024

Tests/WhisperKitTests/UnitTests.swift (outdated, resolved)

Sources/WhisperKit/Core/Audio/AudioProcessor.swift (outdated, resolved)

ZachNagengast and others added 2 commits October 8, 2024 08:39

PR review - simplify early stop test logic

14461c0

Co-authored-by: Andrey Leonov <[email protected]>

Cleanup from review

0a46e6f

ZachNagengast merged commit e3e21d4 into main Oct 8, 2024
9 checks passed
