Resample audio files in 10mb chunks #158

finnvoor · 2024-06-04T10:11:11Z

closes #16

Resampling audio files in 10mb chunks reduces the peak memory usage and fixes some niche issues with transcribing very long or very high sample rate / channel count audio files.

Before | After (comparison images not preserved)

10mb is a bit arbitrary, but I chose it to roughly match the peak memory usage of the rest of the pipeline.
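To make the 10mb figure concrete, here is a rough sketch of how a byte budget could translate into a frame count per chunk. The helper `framesPerChunk`, the 10 * 1024 * 1024 interpretation of "10mb", and the stereo Float32 at 48 kHz input are all assumptions for illustration, not taken from the PR's diff.

```swift
// Hypothetical helper (not from the PR): how many audio frames fit in a byte budget.
// One frame holds one sample per channel.
func framesPerChunk(chunkBytes: Int, channelCount: Int, bytesPerSample: Int) -> Int {
    let bytesPerFrame = channelCount * bytesPerSample
    precondition(bytesPerFrame > 0, "channelCount and bytesPerSample must be positive")
    // Always read at least one frame so a chunked loop makes progress.
    return max(1, chunkBytes / bytesPerFrame)
}

// Assumption: "10mb" is read as 10 MiB; the PR does not spell out the exact constant.
let chunkBytes = 10 * 1024 * 1024

// Assumption: stereo Float32 input at 48 kHz.
let stereoFrames = framesPerChunk(
    chunkBytes: chunkBytes,
    channelCount: 2,
    bytesPerSample: MemoryLayout<Float>.size
)
let secondsPerChunk = Double(stereoFrames) / 48_000.0

print("frames per chunk:", stereoFrames)
print("seconds per chunk:", secondsPerChunk)
```

Under these assumptions a chunk holds 1,310,720 frames, a little over 27 seconds of audio, so higher sample rates or channel counts shorten each chunk rather than raising peak memory.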

I expect this to slow resampling very slightly, but resampling is a small fraction of the total pipeline time, so given the memory savings it seems like a reasonable tradeoff.

ZachNagengast · 2024-06-04T18:11:34Z

Amazing! Was just about to look at this. Do you think there's any impact on audio quality at the breakpoints? I wouldn't expect much, just curious.

ZachNagengast

Very nice @finnvoor thanks for the contribution! Next step will be purging the already transcribed audio during decoding.

Sources/WhisperKit/Core/AudioProcessor.swift

Update Sources/WhisperKit/Core/AudioProcessor.swift Co-authored-by: Zach Nagengast <[email protected]> Update Sources/WhisperKit/Core/AudioProcessor.swift Co-authored-by: Zach Nagengast <[email protected]>

finnvoor · 2024-06-04T19:25:47Z

> Amazing! Was just about to look at this. Do you think there's any impact on audio quality at the breakpoints? I wouldn't expect much, just curious.

Hmm, not really sure, but I doubt it would be enough to notice. I didn't test much, but I got the same transcript when a file was split into ~16 chunks.

atiorh · 2024-06-04T22:46:40Z

Thanks for the contrib @finnvoor! We will run full evals for 1.0.0 on all this behavior and address regressions (if any). This looks to be low risk but we might need to couple this with VAD to be double sure.

ZachNagengast · 2024-06-30T15:59:18Z

FYI there appears to be an issue with this code that is placing audio in the wrong position in the outputBuffer. I am working on an approach that appends to the buffer every 10MB instead of writing directly to it.
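A minimal sketch of the append-per-chunk idea, assuming plain `[Float]` arrays in place of audio buffers. The functions `decimate` and `resampleInChunks` are hypothetical, and the naive decimation is only a stand-in to keep the example runnable; it is not how WhisperKit or AVAudioConverter resamples, and it does not reproduce the positioning bug described above.

```swift
// Stand-in for a real sample-rate converter: keeps every `factor`-th sample.
// Naive decimation, used only so the sketch runs without any audio framework.
func decimate(_ chunk: ArraySlice<Float>, factor: Int) -> [Float] {
    stride(from: chunk.startIndex, to: chunk.endIndex, by: factor).map { chunk[$0] }
}

// Processes the input one chunk at a time and appends each converted chunk,
// so the output position always follows from what has already been written.
func resampleInChunks(_ input: [Float], chunkFrames: Int, factor: Int) -> [Float] {
    precondition(chunkFrames > 0 && factor > 0, "chunkFrames and factor must be positive")
    var output: [Float] = []
    output.reserveCapacity(input.count / factor + 1)

    var start = 0
    while start < input.count {
        let end = min(start + chunkFrames, input.count)
        // Append rather than write at a precomputed offset.
        output.append(contentsOf: decimate(input[start..<end], factor: factor))
        start = end
    }
    return output
}

let input = (0..<20).map { Float($0) }
let chunked = resampleInChunks(input, chunkFrames: 6, factor: 3)
let whole = resampleInChunks(input, chunkFrames: input.count, factor: 3)

print(chunked)
print(chunked == whole)
```

The chunked and single-pass results match here because the chunk length is a multiple of the decimation factor. With a real converter, behavior at chunk boundaries depends on how the converter handles state between calls, which this sketch does not model.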

ZachNagengast approved these changes Jun 4, 2024


Resample audio files in 10mb chunks

44f8d0c


finnvoor force-pushed the main branch from a073163 to 44f8d0c Compare June 4, 2024 19:21

ZachNagengast merged commit 25a0749 into argmaxinc:main Jun 4, 2024
9 checks passed

iandundas mentioned this pull request Jun 11, 2024

Segment order regression since 10mb chunking #163

Closed
