Revert "[Frontend] Remove librosa from audio dependency" (#37058) by zhewenl · Pull Request #37785 · vllm-project/vllm

zhewenl · 2026-03-22T01:05:51Z

Revert of #37058

This reverts commit c7f98b4 (PR #37058).

Reason

CI build #57431 detected 1 new failure linked to this PR:

Entrypoints Integration (API Server 1): test_online_audio_in_video_interleaved fails with Error opening <_io.BytesIO object>: (Garbled error message from libsndfile) after librosa was removed from audio dependencies.

Auto-generated

This revert PR was auto-generated by the CI failure analyzer.

…#37058)" This reverts commit c7f98b4.

gemini-code-assist

Code Review

This pull request reverts a previous change that removed librosa, which caused a CI failure. The revert correctly re-introduces librosa and removes resampy. It also includes several improvements to audio handling and tests. My review identifies a critical bug in the audio chunking logic that could lead to a TypeError and provides a suggestion to improve the robustness of the audio loading mechanism to better handle various video formats.

gemini-code-assist · 2026-03-22T01:09:26Z

+        do_split_audio = (
+            self.asr_config.allow_audio_chunking
            and duration > self.asr_config.max_audio_clip_s
        )


The check for self.asr_config.max_audio_clip_s is not None has been removed. If self.asr_config.allow_audio_chunking is True and self.asr_config.max_audio_clip_s is None, the expression duration > self.asr_config.max_audio_clip_s will raise a TypeError. This check should be restored to prevent a potential crash.

Suggested change

do_split_audio = (

self.asr_config.allow_audio_chunking

and duration > self.asr_config.max_audio_clip_s

)

do_split_audio = (

self.asr_config.allow_audio_chunking

and self.asr_config.max_audio_clip_s is not None

and duration > self.asr_config.max_audio_clip_s

)

gemini-code-assist · 2026-03-22T01:09:26Z

    def load_bytes(self, data: bytes) -> tuple[npt.NDArray, float]:
-        return load_audio(BytesIO(data), sr=None)
+        if is_video(data):
+            return extract_audio_from_video_bytes(data)
+        return librosa.load(BytesIO(data), sr=None)


The is_video() check is not exhaustive and might fail for some video formats (e.g., .mkv, .mov). This would cause librosa.load to be called on video data, which may fail. A more robust approach would be to use a try...except block to handle librosa failures and then fall back to extract_audio_from_video_bytes, similar to the logic in speech_to_text.py. This would make the audio loading more resilient to different media formats.

def load_bytes(self, data: bytes) -> tuple[npt.NDArray, float]: try: return librosa.load(BytesIO(data), sr=None) except soundfile.LibsndfileError: # Fallback to pyav for video containers or other formats # that soundfile doesn't handle from a buffer. return extract_audio_from_video_bytes(data)

AndreasKaratzas · 2026-03-22T17:31:27Z

This can be closed I think :)

mergify · 2026-03-23T02:41:59Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zhewenl.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · 2026-04-29T17:34:42Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zhewenl.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Revert "[Frontend] Remove librosa from audio dependency (vllm-project…

e06e90a

…#37058)" This reverts commit c7f98b4.

mergify Bot added ci/build frontend multi-modality Related to multi-modality (#4194) performance Performance-related issues labels Mar 22, 2026

gemini-code-assist Bot reviewed Mar 22, 2026

View reviewed changes

mergify Bot added the needs-rebase label Mar 23, 2026

mergify Bot removed the needs-rebase label Apr 29, 2026

mergify Bot added the needs-rebase label Apr 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Revert "[Frontend] Remove librosa from audio dependency" (#37058)#37785

Revert "[Frontend] Remove librosa from audio dependency" (#37058)#37785
zhewenl wants to merge 1 commit intovllm-project:mainfrom
zhewenl:auto-revert/pr-37058

zhewenl commented Mar 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Mar 22, 2026

Uh oh!

gemini-code-assist Bot Mar 22, 2026

Uh oh!

AndreasKaratzas commented Mar 22, 2026

Uh oh!

mergify Bot commented Mar 23, 2026

Uh oh!

mergify Bot commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

zhewenl commented Mar 22, 2026

Revert of #37058

Reason

Auto-generated

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

AndreasKaratzas commented Mar 22, 2026

Uh oh!

mergify Bot commented Mar 23, 2026

Uh oh!

mergify Bot commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants