Resample audio files in 10mb chunks #158

Merged: 1 commit, Jun 4, 2024

@finnvoor finnvoor commented Jun 4, 2024

closes #16

Resampling audio files in 10 MB chunks reduces peak memory usage and fixes some niche issues with transcribing very long audio files, or files with a very high sample rate or channel count.

[Images: before and after comparison]

10 MB is a bit arbitrary, but I chose it to roughly match the peak memory usage of the rest of the pipeline.

I expect this will have a very minor negative impact on resampling speed, but given that resampling is a small fraction of the total pipeline time, and given the memory savings, it seems like a reasonable tradeoff.
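
To make the chunking arithmetic concrete, here is a minimal sketch; it is not the code from this PR. It assumes "10 MB" means 10 × 1024 × 1024 bytes and that samples are 32-bit floats. The names `framesPerChunk` and `chunkRanges` are hypothetical, and the example input (one hour of 48 kHz stereo audio) is chosen purely for illustration.

```swift
/// Hypothetical helper: number of audio frames that fit in a byte budget.
/// One frame holds one sample per channel.
func framesPerChunk(byteBudget: Int, channelCount: Int, bytesPerSample: Int) -> Int {
    precondition(channelCount > 0 && bytesPerSample > 0, "invalid audio format")
    let bytesPerFrame = channelCount * bytesPerSample
    // Always make progress, even if a single frame exceeds the budget.
    return max(1, byteBudget / bytesPerFrame)
}

/// Hypothetical helper: splits `totalFrames` into consecutive half-open ranges
/// of at most `chunkFrames` frames each. The last range may be shorter.
func chunkRanges(totalFrames: Int, chunkFrames: Int) -> [Range<Int>] {
    precondition(chunkFrames > 0, "chunkFrames must be positive")
    return stride(from: 0, to: totalFrames, by: chunkFrames).map { start in
        start..<min(start + chunkFrames, totalFrames)
    }
}

// Assumption: 10 MB is interpreted as 10 MiB.
let byteBudget = 10 * 1024 * 1024

// Illustrative input: one hour of 48 kHz stereo Float32 audio,
// which would be about 1.38 GB if read into memory at once.
let totalFrames = 48_000 * 3_600
let frames = framesPerChunk(byteBudget: byteBudget, channelCount: 2, bytesPerSample: 4)
let ranges = chunkRanges(totalFrames: totalFrames, chunkFrames: frames)

print("\(ranges.count) chunks of up to \(frames) frames each")
```

Under these assumptions each chunk holds 1,310,720 frames and the hour-long file is processed in 132 chunks, so only one chunk's worth of source audio needs to be resident at a time.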

@ZachNagengast

Amazing! I was just about to look at this. Do you think there's any impact on audio quality at the breakpoints? I wouldn't expect much, just curious.

@ZachNagengast ZachNagengast left a comment

Very nice @finnvoor, thanks for the contribution! The next step will be purging already-transcribed audio during decoding.

Sources/WhisperKit/Core/AudioProcessor.swift (2 review threads, outdated and resolved)
Update Sources/WhisperKit/Core/AudioProcessor.swift

Co-authored-by: Zach Nagengast <[email protected]>

Update Sources/WhisperKit/Core/AudioProcessor.swift

Co-authored-by: Zach Nagengast <[email protected]>

finnvoor commented Jun 4, 2024

> Amazing! Was just about to look at this, do you think there's any impact on audio quality at the breakpoints? I wouldn't expect much, just curious

Hmm, not really sure, but I doubt it would be enough to notice. I didn't test much, but I got the same transcript when a file was split into ~16 chunks.

@ZachNagengast ZachNagengast merged commit 25a0749 into argmaxinc:main Jun 4, 2024
9 checks passed

atiorh commented Jun 4, 2024

Thanks for the contribution, @finnvoor! We will run full evals for 1.0.0 covering this behavior and address any regressions. This looks low risk, but we might need to couple it with VAD (voice activity detection) to be doubly sure.

@ZachNagengast

FYI, there appears to be an issue with this code that places audio at the wrong position in the outputBuffer. I am working on an approach that appends to the buffer every 10 MB instead of writing directly to it.
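
The append-based idea can be pictured with a small sketch. This is an illustration only, not the fix that was eventually written. `ChunkAccumulator` is a hypothetical type, and plain `[Float]` arrays stand in for audio buffers. It assumes each resampled chunk arrives in a fixed-capacity scratch buffer along with the number of frames that are actually valid. Appending only the valid prefix means no destination offset is ever computed, so a chunk cannot land at the wrong position.

```swift
/// Hypothetical accumulator: collects resampled chunks by appending,
/// rather than writing each chunk at a precomputed offset.
struct ChunkAccumulator {
    private(set) var samples: [Float] = []

    /// Appends the first `validFrames` samples of `chunk`.
    /// Anything past `validFrames` is treated as unused scratch space.
    mutating func append(chunk: [Float], validFrames: Int) {
        precondition(validFrames >= 0 && validFrames <= chunk.count,
                     "validFrames out of range")
        samples.append(contentsOf: chunk.prefix(validFrames))
    }
}

// Illustrative data: three scratch buffers of capacity 4,
// holding 3, 4, and 2 valid frames respectively.
let resampledChunks: [([Float], Int)] = [
    ([1, 2, 3, 0], 3),
    ([4, 5, 6, 7], 4),
    ([8, 9, 0, 0], 2),
]

var output = ChunkAccumulator()
for (buffer, valid) in resampledChunks {
    output.append(chunk: buffer, validFrames: valid)
}

print(output.samples)
```

The tradeoff in this sketch is that the output array may reallocate as it grows, whereas a preallocated buffer avoids that at the cost of tracking write offsets by hand.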

Linked issue: Resample audio file in chunks to reduce memory usage
3 participants