[Bugfix][Frontend] Fix audio transcription for MP4, M4A, and WebM formats by seanmamasde · Pull Request #35109 · vllm-project/vllm

seanmamasde · 2026-02-23T15:43:44Z

Purpose

Fix /v1/audio/transcriptions (and /v1/audio/translations) to correctly handle MP4, M4A, and WebM audio uploads. These three container formats are listed as supported in both the OpenAI API specification and the vLLM documentation, yet they have been broken since the transcription endpoint was first introduced.

Fixes #16335
Fixes #26808
Fixes #18385

This PR should supersede #18477, which is stale and only addressed WebM. It addresses all three broken formats and incorporates the reviewer feedback from #18477 (no tempfile, narrower exception handling, debug logging on the fallback path).

Cause

_preprocess_speech_to_text() wraps the uploaded bytes in a BytesIO and passes them to librosa.load(). Under the hood, librosa delegates to soundfile (libsndfile), which auto-detects the codec from the stream. This works for self-describing formats like WAV, FLAC, MP3, and OGG because their headers contain enough information for libsndfile to identify them.

MP4 and M4A (AAC) use the ISOBMFF container, and WebM (Opus/Vorbis) uses the Matroska container. Detection of these containers in libsndfile relies on a filename extension hint that BytesIO objects cannot provide. When libsndfile fails, librosa is supposed to fall back to audioread (which shells out to ffmpeg), but audioread cannot handle BytesIO objects either, because ffmpeg needs a seekable file path.

The result is "Error opening <_io.BytesIO object>: Format not recognised.", shown as HTTP 500 (v0.13) or HTTP 200 with an error body (v0.15+).

Critically, librosa.load(filepath_string) works perfectly for all nine documented formats. The bug is exclusively in the BytesIO code path.
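
To make the container distinction concrete, here is a minimal sketch that guesses the container family from the leading bytes of an upload. It is not part of this PR: `sniff_container` is a hypothetical helper, it checks only a few well-known signatures, and MP3 streams without an ID3 tag are deliberately not handled.

```python
def sniff_container(data: bytes) -> str:
    """Guess the container family from magic bytes (hypothetical helper)."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:4] == b"fLaC":
        return "flac"
    if data[:4] == b"OggS":
        return "ogg"
    if data[:3] == b"ID3":
        return "mp3"
    # ISOBMFF (MP4/M4A): a 4-byte box size followed by the box type "ftyp".
    if data[4:8] == b"ftyp":
        return "isobmff"
    # Matroska/WebM: the file starts with the EBML header ID 0x1A45DFA3.
    if data[:4] == b"\x1a\x45\xdf\xa3":
        return "matroska"
    return "unknown"
```

A check like this could, in principle, be used to skip the fast path for uploads that are known to need the fallback, at the cost of maintaining a signature list.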

Changes

load_audio_bytes() in vllm/entrypoints/openai/speech_to_text/utils.py first tries librosa.load(BytesIO(...)) (soundfile backend) and, on known libsndfile format detection failures (soundfile.LibsndfileError codes {1, 3, 4}), falls back to an in-process decode via torchaudio.load(BytesIO(...)) (torchcodec).

_preprocess_speech_to_text() is replaced with a call to load_audio_bytes().
Added torchcodec to vLLM requirements, since torchaudio>=2.9 uses it for decoding (optional in vllm[audio]).
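
The control flow described above can be sketched as follows. This is an illustration under stated assumptions, not the code in utils.py: the two decoders are passed in as callables so the example runs without librosa or torchaudio, and `DecodeError` is a stand-in for soundfile.LibsndfileError (which carries a numeric `code`). Only the code set {1, 3, 4} comes from the description.

```python
import io
import logging
from typing import Callable, Tuple

logger = logging.getLogger(__name__)

# Error codes the PR description lists as format-detection failures.
BAD_SF_CODES = frozenset({1, 3, 4})


class DecodeError(Exception):
    """Stand-in for soundfile.LibsndfileError (assumption for this sketch)."""

    def __init__(self, code: int, message: str = "") -> None:
        super().__init__(message)
        self.code = code


# A decoder takes a byte stream and a target sample rate.
Decoder = Callable[[io.BytesIO, int], Tuple[list, int]]


def load_audio_bytes(data: bytes, sr: int, fast: Decoder, fallback: Decoder):
    """Try the fast decoder; fall back only on known format-detection errors."""
    try:
        with io.BytesIO(data) as buf:
            return fast(buf, sr)
    except DecodeError as exc:
        if exc.code not in BAD_SF_CODES:
            raise  # unrelated failure: do not mask it
        logger.debug("Fast path failed (code %s); using fallback", exc.code)
    # Hand the fallback a fresh stream positioned at offset 0.
    with io.BytesIO(data) as buf:
        return fallback(buf, sr)
```

Re-raising on any other code keeps genuine decode errors visible instead of silently routing them to the slower path.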

Some more details

Avoids spawning an ffmpeg subprocess at request time, as earlier commits in this PR did (addressing the latency concern raised in review), while still supporting MP4/M4A/WebM container formats.
Current code tries BytesIO first and only falls back on failure. If a future libsndfile version adds native MP4 support (support matrix), the fast path will automatically start working for those formats.
The fallback path logs at DEBUG level, as suggested in the [Bugfix][Frontend] support webm with audioread fallback #18477 review.
torchaudio is already a vLLM dependency. Since torchaudio>=2.9 uses TorchCodec under the hood for decoding, torchcodec is added as a dependency.

Tests

Tested all 9 formats documented by the OpenAI API against a live vLLM server running openai/whisper-large-v3-turbo on an NVIDIA A30.

Test audio: short LibriSpeech test-clean sample (public domain / LibriVox-derived), converted from a WAV source to all formats via ffmpeg.

# Server startup
python -m vllm.entrypoints.openai.api_server \
  --model openai/whisper-large-v3-turbo \
  --max-model-len 448 --dtype auto

# Test each format
for fmt in wav flac mp3 mpga ogg mp4 mpeg m4a webm; do
  curl -s -w "\n[%{http_code}]" \
    -F "file=@test.${fmt}" \
    -F "model=openai/whisper-large-v3-turbo" \
    http://localhost:8000/v1/audio/transcriptions
done

Test Result

Before patch (baseline)

Tested on vLLM v0.13.0 and v0.15.1 (unpatched):

Format	v0.13.0	v0.15.1	Response
wav	200 OK	200 OK	Transcription text
flac	200 OK	200 OK	Transcription text
mp3	200 OK	200 OK	Transcription text
mpga	200 OK	200 OK	Transcription text
ogg	200 OK	200 OK	Transcription text
mpeg	200 OK	200 OK	Transcription text
mp4	500	200 (error in body)	`"Error opening <_io.BytesIO object>: Format not recognised."`
m4a	500	200 (error in body)	`"Error opening <_io.BytesIO object>: Format not recognised."`
webm	500	200 (error in body)	`"Error opening <_io.BytesIO object>: Format not recognised."`

After patch

All formats now pass. The 3 previously broken formats (mp4, m4a, webm) now work via the in-process torchaudio fallback. The other six formats continue to use the fast BytesIO/librosa path.

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

gemini-code-assist

Code Review

The pull request introduces a robust solution for handling various audio formats (MP4, M4A, WebM) in the speech-to-text transcription endpoint. The core change involves a new _decode_audio_bytes_ffmpeg function that leverages os.memfd_create and ffmpeg for in-memory decoding, avoiding disk I/O and permission issues. This is integrated into a _load_audio_bytes helper that attempts a fast librosa.load path first, falling back to the ffmpeg method if necessary. This approach directly addresses the root cause of previous failures with container formats and improves the overall reliability of the audio transcription service. The changes are well-documented and include a comprehensive test plan and results, demonstrating the effectiveness of the fix.

gemini-code-assist · 2026-02-23T15:47:07Z

vllm/entrypoints/openai/speech_to_text/speech_to_text.py

+    except Exception:
+        pass


Catching a generic Exception is too broad and can mask unexpected errors, making debugging difficult. It's better to catch specific exceptions that librosa.load is known to raise when it fails to decode certain formats, such as soundfile.LibsndfileError or audioread.exceptions.NoBackendError if audioread were directly involved. If the exact exceptions are not known, consider logging the exception type and message before falling back, or catching a more specific base class if one exists for audio decoding failures.

Suggested change

-    except Exception:
-        pass
+    except (soundfile.LibsndfileError, audioread.exceptions.NoBackendError) as e:
+        logger.debug("Librosa BytesIO decode failed: %s", e)

Copilot

Pull request overview

Fixes the OpenAI-compatible speech-to-text preprocessing path so MP4/M4A/WebM container uploads can be decoded (via an ffmpeg fallback) instead of failing when librosa.load(BytesIO(...)) can’t infer the format.

Changes:

Added _decode_audio_bytes_ffmpeg() to decode audio bytes to mono float32 PCM using ffmpeg with an in-memory FD.
Added _load_audio_bytes() to try librosa.load(BytesIO(...)) first and fall back to ffmpeg on failure.
Updated _preprocess_speech_to_text() to use _load_audio_bytes() instead of a direct BytesIO + librosa.load call.

Copilot · 2026-02-23T15:49:59Z

vllm/entrypoints/openai/speech_to_text/speech_to_text.py

+    sr = int(sr)
+    fd = os.memfd_create("vllm_audio")
+    try:
+        os.write(fd, audio_data)


os.write(fd, audio_data) is not guaranteed to write all bytes in one call. If it performs a partial write, ffmpeg will see a truncated container and decoding may fail or produce incorrect audio. Consider writing in a loop (or use os.fdopen(fd, 'wb', closefd=False) and .write()/.flush()) and verify the full length is written before running ffmpeg.

Suggested change

-        os.write(fd, audio_data)
+        # Ensure the full audio_data buffer is written to the memfd.
+        total_written = 0
+        data_len = len(audio_data)
+        while total_written < data_len:
+            written = os.write(fd, audio_data[total_written:])
+            if written == 0:
+                raise RuntimeError("Failed to write complete audio data to memfd")
+            total_written += written
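
As a variation on the loop suggested above, a minimal sketch that avoids re-copying the remaining bytes on every iteration by slicing a memoryview. `write_all` is a hypothetical helper name, not something from the PR.

```python
import os


def write_all(fd: int, data: bytes) -> None:
    """Write every byte of `data` to `fd`, retrying after partial writes."""
    view = memoryview(data)  # slicing a memoryview does not copy the buffer
    while len(view) > 0:
        written = os.write(fd, view)
        if written == 0:
            raise OSError("os.write made no progress")
        view = view[written:]
```

The other option mentioned in the comment, wrapping the descriptor with os.fdopen(fd, 'wb', closefd=False), gets the same guarantee from the buffered writer.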

Copilot · 2026-02-23T15:50:00Z

vllm/entrypoints/openai/speech_to_text/speech_to_text.py

+        result = subprocess.run(
+            cmd,
+            stdout=subprocess.PIPE,
+            stderr=subprocess.PIPE,
+            pass_fds=(fd,),  # inherit only this fd
+        )
+    finally:
+        os.close(fd)
+
+    if result.returncode != 0:
+        raise RuntimeError(
+            "ffmpeg failed to decode audio: "
+            + result.stderr.decode("utf-8", errors="replace").strip()
+        )
+
+    y = np.frombuffer(result.stdout, dtype=np.float32)
+    if y.size == 0:
+        raise RuntimeError(


_decode_audio_bytes_ffmpeg raises RuntimeError on decode failure / empty output, but _create_speech_to_text only catches ValueError from preprocessing. This means ffmpeg failures will likely bubble up as 500s instead of a structured OpenAI error response. Suggest raising ValueError (or VLLMValidationError) from this helper (possibly wrapping FileNotFoundError/ffmpeg stderr) or expanding the preprocessing except to handle these failure types explicitly.

Suggested change

-        result = subprocess.run(
-            cmd,
-            stdout=subprocess.PIPE,
-            stderr=subprocess.PIPE,
-            pass_fds=(fd,),  # inherit only this fd
-        )
-    finally:
-        os.close(fd)
-
-    if result.returncode != 0:
-        raise RuntimeError(
-            "ffmpeg failed to decode audio: "
-            + result.stderr.decode("utf-8", errors="replace").strip()
-        )
-
-    y = np.frombuffer(result.stdout, dtype=np.float32)
-    if y.size == 0:
-        raise RuntimeError(
+        try:
+            result = subprocess.run(
+                cmd,
+                stdout=subprocess.PIPE,
+                stderr=subprocess.PIPE,
+                pass_fds=(fd,),  # inherit only this fd
+            )
+        except OSError as exc:
+            # ffmpeg binary not found or failed to execute
+            raise VLLMValidationError(
+                f"ffmpeg invocation failed while decoding audio: {exc}"
+            ) from exc
+    finally:
+        os.close(fd)
+
+    if result.returncode != 0:
+        raise VLLMValidationError(
+            "ffmpeg failed to decode audio: "
+            + result.stderr.decode("utf-8", errors="replace").strip()
+        )
+
+    y = np.frombuffer(result.stdout, dtype=np.float32)
+    if y.size == 0:
+        raise VLLMValidationError(
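
The same idea in a self-contained form: a sketch that maps every failure mode of an external decoder onto ValueError, the exception type the comment says the preprocessing caller already handles. `run_decoder` is a hypothetical name, and plain ValueError is used here instead of VLLMValidationError so the example has no vLLM dependency.

```python
import subprocess


def run_decoder(cmd: list[str]) -> bytes:
    """Run a decoder command and report every failure mode as ValueError."""
    try:
        result = subprocess.run(cmd, capture_output=True)
    except OSError as exc:  # e.g. the binary is missing or not executable
        raise ValueError(f"decoder invocation failed: {exc}") from exc
    if result.returncode != 0:
        stderr = result.stderr.decode("utf-8", errors="replace").strip()
        raise ValueError(f"decoder failed: {stderr}")
    if not result.stdout:
        raise ValueError("decoder produced no output")
    return result.stdout
```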

Copilot · 2026-02-23T15:50:00Z

vllm/entrypoints/openai/speech_to_text/speech_to_text.py

+    sr = int(sr)
+    fd = os.memfd_create("vllm_audio")
+    try:
+        os.write(fd, audio_data)
+        os.lseek(fd, 0, os.SEEK_SET)
+
+        cmd = [
+            "ffmpeg",
+            "-hide_banner",
+            "-loglevel",
+            "error",
+            "-i",
+            f"/proc/self/fd/{fd}",
+            "-vn",  # discard video


This ffmpeg path is Linux-specific (os.memfd_create and /proc/self/fd/<N>). vLLM supports macOS (see setup.py / CI smoke tests), where memfd_create and /proc are unavailable; MP4/M4A/WebM uploads would still fail there (likely with AttributeError). Consider adding an OS check and a portable fallback (e.g., NamedTemporaryFile(suffix=...) or SpooledTemporaryFile) with clear error messaging when neither option is available.
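
A minimal sketch of the OS check and portable fallback described above. `audio_bytes_as_path` is a hypothetical helper: it exposes the bytes through a memfd where os.memfd_create and /proc are available, and through a named temporary file otherwise.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def audio_bytes_as_path(data: bytes, suffix: str = ""):
    """Yield a filesystem path through which `data` can be read."""
    if hasattr(os, "memfd_create") and os.path.isdir("/proc/self/fd"):
        # Linux: anonymous in-memory file, nothing touches the disk.
        fd = os.memfd_create("audio")
        try:
            with os.fdopen(fd, "wb", closefd=False) as f:
                f.write(data)  # the buffered writer handles partial writes
            yield f"/proc/self/fd/{fd}"
        finally:
            os.close(fd)
    else:
        # Portable fallback: a real temporary file, removed on exit.
        tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
        try:
            tmp.write(data)
            tmp.close()
            yield tmp.name
        finally:
            os.unlink(tmp.name)
```

In the memfd branch the path is only meaningful to this process, or to a child that inherits the descriptor (as with pass_fds in the snippet under review).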

Copilot · 2026-02-23T15:50:00Z

vllm/entrypoints/openai/speech_to_text/speech_to_text.py

+    try:
+        with io.BytesIO(audio_data) as buf:
+            return librosa.load(buf, sr=sr)  # type: ignore[return-value]
+    except Exception:
+        pass
+


The PR description mentions “narrower exception handling”, but _load_audio_bytes currently uses a broad except Exception: and silently discards the error. This both contradicts the description and makes it hard to diagnose why the fast path failed. Consider catching the expected decode exceptions and logging the exception details at DEBUG (with exc_info=True) before falling back.

Copilot · 2026-02-23T15:50:00Z

vllm/entrypoints/openai/speech_to_text/speech_to_text.py

+        # Decode audio bytes.  For container formats (MP4, M4A, WebM) that
+        # soundfile cannot detect from a BytesIO stream, _load_audio_bytes
+        # transparently falls back to ffmpeg via an in-memory fd.
+        # NOTE resample to model SR here for efficiency. This is also a
+        # pre-requisite for chunking, as it assumes Whisper SR.
+        y, sr = _load_audio_bytes(audio_data, sr=self.asr_config.sample_rate)


This change introduces a new ffmpeg-based fallback path for container formats (MP4/M4A/WebM), but the existing transcription tests appear to cover only WAV-like inputs. Adding an automated test that exercises the fallback (and validates it returns audio of the expected duration) would prevent regressions and ensure CI covers the previously broken formats.
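
One way such a test could obtain its input and check duration, sketched with the standard library only. Both helpers are hypothetical; converting the generated WAV to MP4/M4A/WebM and calling the real loader are left out, and the tolerance is an assumption to absorb codec padding.

```python
import io
import math
import struct
import wave


def make_sine_wav(duration_s: float, sr: int = 16000, freq: float = 440.0) -> bytes:
    """Return mono 16-bit PCM WAV bytes containing a sine tone."""
    n = int(duration_s * sr)
    pcm = b"".join(
        struct.pack("<h", int(16383 * math.sin(2 * math.pi * freq * i / sr)))
        for i in range(n)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sr)
        wav.writeframes(pcm)
    return buf.getvalue()


def assert_duration(num_samples: int, sr: int, expected_s: float,
                    tol_s: float = 0.05) -> None:
    """Fail if the decoded length differs from the expected duration."""
    actual = num_samples / sr
    assert abs(actual - expected_s) <= tol_s, (
        f"expected ~{expected_s}s, got {actual:.3f}s"
    )
```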

mergify · 2026-02-23T15:50:31Z

Hi @seanmamasde, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

NickLucche

Thanks a lot for the detailed breakdown and for contributing to vLLM @seanmamasde !

My only concern is the one I commented about, reporting it here to broaden discussion.

I am somewhat worried about the latency overhead we're introducing here by spawning a separate process at the API level to call ffmpeg. On one side I understand a generic fallback like this for all audio types can enhance flexibility. On the other I wouldn't want to penalize vllm perceived latency for an operation that could be carried out in front of vllm itself.

This may call at least for an optional flag which the user has to explicitly set to opt in and ack the suboptimal conversion (i.e. make this feature optional).

Alternatively, we should consider whether an in-process conversion solution could be adopted here.

Finally, can you provide more info about the mp4 file used for testing (feel free to reach out on slack), so I can add them to our set?

cc @alex-jw-brooks may also be interested

NickLucche · 2026-02-24T10:16:52Z

vllm/entrypoints/openai/speech_to_text/speech_to_text.py

+def _load_audio_bytes(
+    audio_data: bytes,
+    sr: int | float,
+) -> tuple[np.ndarray, int]:


can we move these two new functions into a new utils.py file here in the same submodule?

it's moved now! checkout 62f5ce5

NickLucche · 2026-02-24T10:19:50Z

vllm/entrypoints/openai/speech_to_text/speech_to_text.py

+        with io.BytesIO(audio_data) as buf:
+            return librosa.load(buf, sr=sr)  # type: ignore[return-value]
+    except Exception:
+        pass


we can move the rest of the code here instead of passing.
Also, could you check whether the sndfile exception introduced in #34715 can be used in place of the generic Exception catch-all?

Done. now utils.py catches sf.LibsndfileError with exc.code in _BAD_SF_CODE

NickLucche · 2026-02-24T10:29:29Z

vllm/entrypoints/openai/speech_to_text/speech_to_text.py

+        result = subprocess.run(
+            cmd,
+            capture_output=True,
+            pass_fds=(fd,),  # inherit only this fd
+        )


I am also somewhat worried about the latency overhead we're introducing here in spawning a separate process at the API level.
On one side I understand a generic fallback like this for all audio types can enhance flexibility.
On the other I wouldn't want to penalize vllm perceived latency for an operation that could be carried out in front of vllm itself.

This may call at least for an optional flag which the user has to explicitly set to opt in and ack the suboptimal conversion (i.e. make this feature optional).

Alternatively, we should consider whether an in-process conversion solution could be adopted here.

it's now done via torchaudio.load, which is in-process, so I guess no flag needed?

seanmamasde · 2026-02-24T14:14:59Z

Audio is generated from a short LibriSpeech (test-clean) speech clip (public domain / LibriVox-derived), downloaded as WAV and then trimmed/resampled to mono 16 kHz and converted to wav, flac, mp3, mpga, ogg, mp4, mpeg, m4a, webm w/ ffmpeg.

LibriSpeech sample mirror: https://www.voiptroubleshooter.com/open_speech/american/OSR_us_000_0010_8k.wav

alex-jw-brooks

Thanks for this, looks good! Some small suggestions

alex-jw-brooks · 2026-03-03T19:49:45Z

vllm/entrypoints/openai/speech_to_text/utils.py

+try:
+    import soundfile as sf
+except ImportError:
+    sf = None  # type: ignore[assignment]


Could you refactor this to also use a PlaceholderModule for soundfile?

alex-jw-brooks · 2026-03-03T20:06:47Z

vllm/entrypoints/openai/speech_to_text/utils.py

+    try:
+        with io.BytesIO(audio_data) as buf:
+            return librosa.load(buf, sr=sr)  # type: ignore[return-value]
+    except Exception as exc:


I think it would be better to avoid catching the exception generically like this and handle it more explicitly - For example, if we use the soundfile placeholder, I think we can just catch soundfile.LibsndfileError, inspect the code, and add a debug log + return decode_audio_bytes_torchaudio(audio_data, sr) if it's a _BAD_SF_CODES?

Using the placeholder would also be clearer for failure cases here, because soundfile is an explicitly listed optional dep of vLLM for audio too, so it'll raise "Please install vllm[audio] for audio support" if soundfile.LibsndfileError can't be resolved because soundfile is not installed.
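
To illustrate the behaviour being asked for, here is a minimal stand-in for a lazily failing optional import. It is not vLLM's PlaceholderModule; `MissingModule` and `optional_import` are hypothetical names used only for this sketch.

```python
import importlib


class MissingModule:
    """Placeholder that raises a helpful ImportError on first attribute use."""

    def __init__(self, name: str, extra: str) -> None:
        self._name = name
        self._extra = extra

    def __getattr__(self, attr: str):
        # Only called for attributes not found normally, i.e. anything
        # the caller expected to find on the real module.
        raise ImportError(
            f"{self._name}.{attr} is unavailable; please install {self._extra}"
        )


def optional_import(name: str, extra: str):
    """Return the real module if importable, else a MissingModule."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return MissingModule(name, extra)
```

With this pattern, an `except sf.LibsndfileError` clause is only evaluated when an exception actually reaches it, so a missing soundfile surfaces as the install hint at that point and not at import time.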

seanmamasde · 2026-03-06T09:22:53Z

Hi @alex-jw-brooks I have made the changes you suggested. Can you take a look when you have time? Huge thanks.

alex-jw-brooks

Nice, thanks! LGTM - @NickLucche will need to take one more look to merge I think :)

NickLucche

Looks good, thanks for the work @seanmamasde @alex-jw-brooks !
Just a comment on where the torchcodec dependency should live (and also torchaudio imo, although that may be work for a separate PR)

NickLucche · 2026-03-07T09:25:57Z

requirements/cuda.txt

 ray[cgraph]>=2.48.0
 torch==2.10.0
 torchaudio==2.10.0
+torchcodec==0.10.0 # Required by torchaudio>=2.9 for audio decoding (MP4/M4A/WebM)


I am not sure why torchaudio appears in every requirements file but not in common.txt? @DarkLight1337

Regardless, I think we should add torchcodec to the vllm[audio] extras

vllm/setup.py

Line 1054 in 755356b

"audio": [

yeah I should've just added it under the audio section. I removed all the occurrences in the requirements.txt files and put it in setup.py -> audio[] instead. it's fixed now!

NickLucche

LGTM

…mats

Add torchaudio-based fallback decoding for container formats that librosa/soundfile (libsndfile) cannot handle. When librosa.load() fails with a LibsndfileError on unsupported formats, fall back to torchaudio.load() which uses torchcodec/FFmpeg for decoding.

- Add utils.py with load_audio_bytes() and decode_audio_bytes_torchaudio()
- Narrow exception handling to catch sf.LibsndfileError specifically
- Use PlaceholderModule for soundfile import
- Add torchcodec to vllm[audio] extras in setup.py

Signed-off-by: seanmamasde <seanmamasde@gmail.com>

Isotr0py · 2026-03-14T17:00:04Z

setup.py

            "soundfile",
            "mistral_common[audio]",
            "av",
+            "torchcodec",


I'm a bit worried that torchcodec will break audio support on GB200 + aarch64 CPU, because it only distributes x86_64 manylinux wheels (https://pypi.org/project/torchcodec/#files).

I opened #37061 to revert this PR and use pyav for video fallback instead.

I actually investigated this a bit back:

| lib | in-process? | mp4/m4a/webm using bytesio | new dep |
| ------------- | ----------- | ----------------------------------------------------- | ------- |
| ffmpeg-python | no | using pipe, but still subprocess | yes |
| pydub | no | using pipe, but still subprocess | yes |
| soundfile | yes | libsndfile doesn't support mp4/m4a/webm | no |
| PyAV (av) | yes | av.open(BytesIO(...)) should work | yes |
| torchaudio | yes | torchaudio.load(BytesIO(...), format=...) should work | no |

At the time of implementation, torchaudio seemed like the best bet since it doesn't introduce extra deps and is an in-process conversion (as opposed to tempfile or subprocess w/ ffmpeg). But it seems that starting with torchaudio v2.9.0 it uses torchcodec for torchaudio.save() and torchaudio.load().

…mats (vllm-project#35109) Signed-off-by: seanmamasde <seanmamasde@gmail.com> Signed-off-by: Athrael Soju <athrael.soju@gmail.com>

…mats (vllm-project#35109) Signed-off-by: seanmamasde <seanmamasde@gmail.com>

Copilot AI review requested due to automatic review settings February 23, 2026 15:43

seanmamasde requested a review from NickLucche as a code owner February 23, 2026 15:43

mergify bot added frontend bug Something isn't working labels Feb 23, 2026

Copilot started reviewing on behalf of seanmamasde February 23, 2026 15:45 View session

seanmamasde force-pushed the fix/audio-transcription-mp4-m4a-webm branch from d67d5c1 to ef29f2e Compare February 23, 2026 15:45

gemini-code-assist bot reviewed Feb 23, 2026

Copilot AI reviewed Feb 23, 2026

seanmamasde force-pushed the fix/audio-transcription-mp4-m4a-webm branch 2 times, most recently from 20c55cb to b878842 Compare February 23, 2026 16:04

NickLucche reviewed Feb 24, 2026

seanmamasde force-pushed the fix/audio-transcription-mp4-m4a-webm branch 2 times, most recently from 4794f78 to 1258128 Compare February 24, 2026 13:01

seanmamasde force-pushed the fix/audio-transcription-mp4-m4a-webm branch from 259fca6 to bb8dbce Compare February 24, 2026 13:36

mergify bot added ci/build nvidia labels Feb 24, 2026

github-project-automation bot added this to NVIDIA Feb 24, 2026

seanmamasde force-pushed the fix/audio-transcription-mp4-m4a-webm branch from bb8dbce to 914066d Compare February 24, 2026 13:39

seanmamasde requested a review from tjtanaa as a code owner February 24, 2026 13:39

mergify bot added rocm Related to AMD ROCm cpu Related to CPU backends labels Feb 24, 2026

github-project-automation bot added this to AMD Feb 24, 2026

github-project-automation bot moved this to Todo in AMD Feb 24, 2026

seanmamasde force-pushed the fix/audio-transcription-mp4-m4a-webm branch from 914066d to 38e8209 Compare February 24, 2026 13:58

alex-jw-brooks suggested changes Mar 3, 2026

github-project-automation bot moved this to In review in NVIDIA Mar 3, 2026

seanmamasde force-pushed the fix/audio-transcription-mp4-m4a-webm branch from 38e8209 to 0cf7845 Compare March 4, 2026 14:59

mergify bot removed the needs-rebase label Mar 4, 2026

alex-jw-brooks approved these changes Mar 6, 2026

NickLucche reviewed Mar 7, 2026

seanmamasde force-pushed the fix/audio-transcription-mp4-m4a-webm branch from 0cf7845 to 9657424 Compare March 7, 2026 09:32

NickLucche added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 9, 2026

NickLucche approved these changes Mar 9, 2026

github-project-automation bot moved this from In review to Ready in NVIDIA Mar 9, 2026

NickLucche enabled auto-merge (squash) March 9, 2026 08:21

auto-merge was automatically disabled March 9, 2026 08:23
Head branch was pushed to by a user without write access

seanmamasde force-pushed the fix/audio-transcription-mp4-m4a-webm branch 5 times, most recently from cb8b2be to c6363fb Compare March 12, 2026 06:45

seanmamasde force-pushed the fix/audio-transcription-mp4-m4a-webm branch from c6363fb to 277488c Compare March 14, 2026 10:19

vllm-bot merged commit 84868e4 into vllm-project:main Mar 14, 2026
126 of 128 checks passed

github-project-automation bot moved this from Ready to Done in NVIDIA Mar 14, 2026

github-project-automation bot moved this from Todo to Done in AMD Mar 14, 2026

Isotr0py mentioned this pull request Mar 14, 2026

[Frontend] Remove torchcodec from audio dependency #37061

Merged

5 tasks

Isotr0py reviewed Mar 14, 2026

Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Mar 17, 2026

[Bugfix][Frontend] Fix audio transcription for MP4, M4A, and WebM for…

739987b

…mats (vllm-project#35109) Signed-off-by: seanmamasde <seanmamasde@gmail.com>

wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026

[Bugfix][Frontend] Fix audio transcription for MP4, M4A, and WebM for…

12522fd

…mats (vllm-project#35109) Signed-off-by: seanmamasde <seanmamasde@gmail.com>

fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026

[Bugfix][Frontend] Fix audio transcription for MP4, M4A, and WebM for…

bdf45a1

…mats (vllm-project#35109) Signed-off-by: seanmamasde <seanmamasde@gmail.com>
