[Bugfix][Frontend] support webm with audioread fallback by cpwan · Pull Request #18477 · vllm-project/vllm

cpwan · 2025-05-21T10:27:16Z

github-actions · 2025-05-21T10:33:04Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

cpwan · 2025-05-21T10:40:25Z

By the way, I tried to keep the change minimal, but the code on the main branch already fails the pre-commit checks, such as import sorting...

vllm/entrypoints/openai/serving_transcription.py

DarkLight1337 · 2025-05-21T12:36:27Z

@NickLucche does this look good to you?

cpwan · 2025-05-21T12:39:31Z

Let me tidy up the git history a bit.

Signed-off-by: cpwan <cpwan@connect.ust.hk>

NickLucche

Thanks for taking action on the issue.
To be frank, I am not familiar with audioread, but I see librosa already uses it as a fallback, so maybe it doesn't even need to go into the requirements list.

librosa uses soundfile and audioread for reading audio. As of v0.7, librosa uses soundfile by default, and falls back on audioread only when dealing with codecs unsupported by soundfile. For a list of codecs supported by soundfile, see the libsndfile documentation.

https://github.com/librosa/librosa/blob/e403272fc984bc4aeb316e5f15899042224bb9fe/docs/ioformats.rst#read-specific-formats

Also, I am totally uneducated on the potential security concerns of opening up to a Matroska-based container format like WebM (cc @russellb ).
For the sake of completeness, can we at least add a test that sends a WebM file containing video or image data?

NickLucche · 2025-05-21T12:38:46Z

vllm/entrypoints/openai/serving_transcription.py

This exception is way too generic.

NickLucche · 2025-05-21T12:39:44Z

vllm/entrypoints/openai/serving_transcription.py

We should write to an in-memory BytesIO buffer, not a temp file. A temp file may, among other things, trigger permission issues on deployments.
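As a rough illustration of the in-memory approach, the sketch below wraps raw bytes in `io.BytesIO` and hands the buffer to a decoder that accepts file-like objects. It uses the standard-library `wave` module only so the example runs on its own; `make_wav_bytes` and `read_wav_info` are hypothetical helper names, and whether a particular decoder (including the fallback proposed in this PR) accepts a file-like object rather than a path is an assumption to verify per library.

```python
import io
import wave


def make_wav_bytes(num_frames: int = 160, sample_rate: int = 16000) -> bytes:
    """Build a tiny silent mono 16-bit WAV entirely in memory (test input only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        writer.setframerate(sample_rate)
        writer.writeframes(b"\x00\x00" * num_frames)
    # wave does not close a file object it did not open, so the buffer is still usable.
    return buf.getvalue()


def read_wav_info(audio_bytes: bytes) -> tuple[int, int]:
    """Decode from an in-memory buffer; nothing is written to the filesystem."""
    with io.BytesIO(audio_bytes) as buf, wave.open(buf, "rb") as reader:
        return reader.getframerate(), reader.getnframes()


sample_rate, frames = read_wav_info(make_wav_bytes())
```

Because no file is created, this path does not depend on a writable temp directory in the deployment.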

NickLucche · 2025-05-21T12:40:34Z

vllm/entrypoints/openai/serving_transcription.py

We should probably log this path at debug or warning level.
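The review comments on this file ask for a narrower exception and for the fallback path to be logged. A minimal sketch of that shape follows, assuming a dedicated error type for "format not supported"; `UnsupportedFormatError`, `load_audio`, the logger name, and the `primary`/`fallback` callables are all hypothetical and are not taken from the PR or from any library.

```python
import logging

logger = logging.getLogger("audio_fallback_example")  # hypothetical logger name


class UnsupportedFormatError(Exception):
    """Hypothetical error a primary decoder raises for formats it cannot read."""


def load_audio(data: bytes, primary, fallback):
    """Try `primary` first; use `fallback` only when the format is unsupported."""
    try:
        return primary(data)
    except UnsupportedFormatError as exc:
        # Narrow except clause: unrelated failures (bugs, I/O errors) still propagate.
        logger.warning(
            "Primary decoder rejected the input (%s); using fallback decoder", exc
        )
        return fallback(data)
```

Catching only the expected failure keeps genuine bugs visible, and the log line makes it possible to tell from the server logs how often requests take the fallback path.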

NickLucche · 2025-05-21T12:50:13Z

It's also worth looking into why librosa isn't falling back to audioread as described in the docs. Is it an optional dependency, or..?
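One quick way to start on that question is to check whether the module can be imported at all in the serving environment. The sketch below uses only the standard library; `is_installed` is a hypothetical helper name, and a missing install is just one possible explanation that this thread does not confirm.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report whether the fallback library is present in this environment.
print("audioread installed:", is_installed("audioread"))
```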

github-actions · 2025-08-20T02:11:48Z

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

mergify · 2025-08-20T02:12:25Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @cpwan.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the frontend label May 21, 2025

cpwan force-pushed the add-webm branch 2 times, most recently from c667ae0 to 8529fe5 Compare May 21, 2025 10:31

cpwan mentioned this pull request May 21, 2025

[Bug]: Audio transcription does not support webm #18385

Closed

1 task

DarkLight1337 reviewed May 21, 2025

vllm/entrypoints/openai/serving_transcription.py (outdated)

mergify bot added the ci/build label May 21, 2025

fix: support webm with audioread fallback

691656d

Signed-off-by: cpwan <cpwan@connect.ust.hk>

cpwan force-pushed the add-webm branch from e814b6f to 691656d Compare May 21, 2025 12:44

NickLucche requested changes May 21, 2025

github-actions bot added the stale Over 90 days of inactivity label Aug 20, 2025

mergify bot added the needs-rebase label Aug 20, 2025

github-actions bot added unstale Recieved activity after being labelled stale and removed stale Over 90 days of inactivity labels Aug 21, 2025

github-actions bot added stale Over 90 days of inactivity and removed unstale Recieved activity after being labelled stale labels Nov 20, 2025

mergify bot removed the needs-rebase label Nov 20, 2025

github-actions bot added unstale Recieved activity after being labelled stale and removed stale Over 90 days of inactivity labels Nov 22, 2025

mergify bot added the needs-rebase label Nov 22, 2025

github-actions bot added the stale Over 90 days of inactivity label Feb 21, 2026

github-actions bot removed the unstale Recieved activity after being labelled stale label Feb 21, 2026

seanmamasde mentioned this pull request Feb 23, 2026

[Bugfix][Frontend] Fix audio transcription for MP4, M4A, and WebM formats #35109

Merged

5 tasks

hmellor closed this Mar 4, 2026
