Optimal sample rate for input audio? #870

o0101 · 2023-01-21T05:42:04Z


Hello!

I'm sorry this will probably be a really dumb question, but I'm afraid I don't know where else to investigate the answer:

What would the optimal sample rate be for input to whisper?

It seems that too high a rate would slow it down with too much data, and too low a rate might reduce quality. I'm no expert, so I'm sure it will seem like I have no idea what I'm talking about!

Anyway, I'm sure you're all super busy, so no worries if you can't reply--just thank you for reading this far! :)

Have a good one! 😍

Answered by jongwook


jongwook (Maintainer) · 2023-01-21T05:47:55Z

Hi!

Regardless of the sampling rate of the original audio file, the audio signal gets resampled to 16 kHz (via ffmpeg). So it should work with the recordings you have (likely 44.1 or 48 kHz). If you're creating new recordings and have the option to record at 16 kHz, it may be marginally faster, since resampling can be skipped, and the files will take less space than at a higher sample rate. That said, you probably wouldn't want to do this if you care about keeping the recordings in higher audio quality.
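To put a rough number on the "less space" point, here is a small arithmetic sketch. It assumes uncompressed 16-bit mono PCM (the format Whisper's ffmpeg call produces) and ignores container headers; `pcm_bytes` is a hypothetical helper name, not part of Whisper.

```python
def pcm_bytes(sample_rate_hz: int, seconds: float,
              channels: int = 1, bytes_per_sample: int = 2) -> int:
    """Size in bytes of raw, uncompressed PCM audio (no container header)."""
    return int(sample_rate_hz * seconds * channels * bytes_per_sample)


one_minute_16k = pcm_bytes(16_000, 60)  # 1,920,000 bytes
one_minute_48k = pcm_bytes(48_000, 60)  # 5,760,000 bytes

# A 48 kHz recording carries three times the data of a 16 kHz one.
print(one_minute_48k / one_minute_16k)
```

Compressed formats (MP3, Opus, and so on) will not scale this cleanly, so treat the 3x figure as applying to raw PCM only.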

7 replies

ryanheise May 15, 2023

Whichever format and sample rate you use, ffmpeg is always used to first read your file and see what it actually is so that it can use the right decoder and resample if necessary. The result of that decoding/resampling step is an internal Python array of audio samples in the range -1 to 1. If you are trying to avoid using ffmpeg, you would have to use Python code where you can directly pass an array of audio samples into transcribe(), but I think it's easier to just let ffmpeg and Whisper take care of this for you.
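For anyone who does want to try the array route, here is a minimal sketch. It assumes the file is already 16 kHz, mono, 16-bit PCM WAV and uses Python's standard `wave` module plus NumPy; `wav_to_float32` and `speech.wav` are hypothetical names used only for illustration.

```python
import wave

import numpy as np

WHISPER_SAMPLE_RATE = 16000  # same value as SAMPLE_RATE in Whisper's audio.py


def wav_to_float32(path: str) -> np.ndarray:
    """Read a 16 kHz, mono, 16-bit PCM WAV file into float32 samples in [-1, 1)."""
    with wave.open(path, "rb") as wav:
        fmt = (wav.getframerate(), wav.getnchannels(), wav.getsampwidth())
        if fmt != (WHISPER_SAMPLE_RATE, 1, 2):
            # Anything else needs real decoding/resampling; leave that to ffmpeg.
            raise ValueError(f"expected 16 kHz mono 16-bit PCM, got {fmt}")
        raw = wav.readframes(wav.getnframes())
    # Little-endian int16 -> float32, scaled into the range [-1, 1).
    return np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0


# Usage (needs the openai-whisper package installed; not executed here):
#   import whisper
#   model = whisper.load_model("base")
#   result = model.transcribe(wav_to_float32("speech.wav"))
#   print(result["text"])
```

The format check is deliberately strict: if the file is not already in the target format, falling back to the normal file-path call is probably the simpler choice.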

x86Gr May 15, 2023

@ryanheise I meant "it's still transcoding". From audio.py, it looks like it launches the ffmpeg transcode directly without checking whether it is necessary. Maybe ffmpeg can avoid resampling when the input format equals the output format, but it still shows a lot of CPU usage. It's not a big deal in the total runtime, but skipping it could speed up potential real-time uses.

ryanheise May 15, 2023

What is the encoding of your audio file? Not just the sample rate, but what encoding is it?

x86Gr May 15, 2023

@ryanheise They are WAV (PCM) files encoded with ffmpeg using the command
ffmpeg -i -ar 16000 l -acodec pcm_s16le -y

ryanheise May 15, 2023

If the input and output sample rates are identical, ffmpeg will skip the resampling step. You can verify this by running ffmpeg on the command line with the same options, varying the sample rate, and measuring with time: ffmpeg runs faster when the in/out sample rates match. It's still worth taking some measurements yourself in case this does not happen in your case.

So the CPU activity might not be related to resampling; ffmpeg still needs the CPU simply to read the WAV file and write it to the pipe. I don't know how much faster it would be if you used the Python API mentioned above to pass your scaled samples directly into transcribe(), but you could try that too, since it is a way to bypass ffmpeg if you are trying to avoid it.
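One way to take those measurements from Python rather than the shell is sketched below. It reuses the ffmpeg options quoted from audio.py in this thread, but discards the output instead of reading it through a pipe, so the numbers are only indicative. `ffmpeg_cmd`, `time_decode`, and `speech.wav` are hypothetical names, and ffmpeg is assumed to be on the PATH.

```python
import subprocess
import time


def ffmpeg_cmd(path: str, sample_rate: int) -> list[str]:
    """Build the decode command with the options quoted from audio.py."""
    return [
        "ffmpeg", "-nostdin", "-threads", "0",
        "-i", path,
        "-f", "s16le", "-ac", "1", "-acodec", "pcm_s16le",
        "-ar", str(sample_rate),
        "-",
    ]


def time_decode(path: str, sample_rate: int, repeats: int = 5) -> float:
    """Return the best wall-clock time (seconds) over several ffmpeg runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            ffmpeg_cmd(path, sample_rate),
            stdout=subprocess.DEVNULL,  # discard the decoded samples
            stderr=subprocess.DEVNULL,
            check=True,
        )
        best = min(best, time.perf_counter() - start)
    return best


# Usage (requires ffmpeg and a real 16 kHz input file; not executed here):
#   same_rate = time_decode("speech.wav", 16000)   # input rate == output rate
#   resampled = time_decode("speech.wav", 22050)   # forces a resampling step
#   print(f"same rate: {same_rate:.3f}s, resampled: {resampled:.3f}s")
```

For short files, process start-up is likely to dominate both timings, so a longer recording should give a clearer comparison.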

jaggzh · 2023-05-13T20:57:17Z


Also, in case it's useful to anyone, the module's audio.py handles the conversion. Here's a chopped up snippet so you can see the format, sample rate, etc.:

SAMPLE_RATE = 16000
...
def load_audio(file: str, sr: int = SAMPLE_RATE):
    ...
    cmd = [
        "ffmpeg",
        "-nostdin",
        "-threads", "0",
        "-i", file,
        "-f", "s16le",
        "-ac", "1",
        "-acodec", "pcm_s16le",
        "-ar", str(sr),
        "-"
    ]

0 replies
