Skip to content

Revert "[Frontend] Remove librosa from audio dependency" (#37058)#37785

Draft
zhewenl wants to merge 1 commit intovllm-project:mainfrom
zhewenl:auto-revert/pr-37058
Draft

Revert "[Frontend] Remove librosa from audio dependency" (#37058)#37785
zhewenl wants to merge 1 commit intovllm-project:mainfrom
zhewenl:auto-revert/pr-37058

Conversation

@zhewenl
Copy link
Copy Markdown
Collaborator

@zhewenl zhewenl commented Mar 22, 2026

Revert of #37058

This reverts commit c7f98b4 (PR #37058).

Reason

CI build #57431 detected 1 new failure linked to this PR:

  • Entrypoints Integration (API Server 1): test_online_audio_in_video_interleaved fails with Error opening <_io.BytesIO object>: (Garbled error message from libsndfile) after librosa was removed from audio dependencies.

Auto-generated

This revert PR was auto-generated by the CI failure analyzer.

@mergify mergify Bot added ci/build frontend multi-modality Related to multi-modality (#4194) performance Performance-related issues labels Mar 22, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request reverts a previous change that removed librosa, which caused a CI failure. The revert correctly re-introduces librosa and removes resampy. It also includes several improvements to audio handling and tests. My review identifies a critical bug in the audio chunking logic that could lead to a TypeError and provides a suggestion to improve the robustness of the audio loading mechanism to better handle various video formats.

Comment on lines +241 to 244
do_split_audio = (
self.asr_config.allow_audio_chunking
and duration > self.asr_config.max_audio_clip_s
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The check for self.asr_config.max_audio_clip_s is not None has been removed. If self.asr_config.allow_audio_chunking is True and self.asr_config.max_audio_clip_s is None, the expression duration > self.asr_config.max_audio_clip_s will raise a TypeError. This check should be restored to prevent a potential crash.

Suggested change
do_split_audio = (
self.asr_config.allow_audio_chunking
and duration > self.asr_config.max_audio_clip_s
)
do_split_audio = (
self.asr_config.allow_audio_chunking
and self.asr_config.max_audio_clip_s is not None
and duration > self.asr_config.max_audio_clip_s
)

Comment on lines 130 to +133
def load_bytes(self, data: bytes) -> tuple[npt.NDArray, float]:
return load_audio(BytesIO(data), sr=None)
if is_video(data):
return extract_audio_from_video_bytes(data)
return librosa.load(BytesIO(data), sr=None)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The is_video() check is not exhaustive and might fail for some video formats (e.g., .mkv, .mov). This would cause librosa.load to be called on video data, which may fail. A more robust approach would be to use a try...except block to handle librosa failures and then fall back to extract_audio_from_video_bytes, similar to the logic in speech_to_text.py. This would make the audio loading more resilient to different media formats.

    def load_bytes(self, data: bytes) -> tuple[npt.NDArray, float]:
        try:
            return librosa.load(BytesIO(data), sr=None)
        except soundfile.LibsndfileError:
            # Fallback to pyav for video containers or other formats
            # that soundfile doesn't handle from a buffer.
            return extract_audio_from_video_bytes(data)

@AndreasKaratzas
Copy link
Copy Markdown
Collaborator

This can be closed I think :)

@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Mar 23, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zhewenl.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Mar 23, 2026
@mergify mergify Bot removed the needs-rebase label Apr 29, 2026
@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Apr 29, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zhewenl.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Apr 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci/build frontend multi-modality Related to multi-modality (#4194) needs-rebase performance Performance-related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants