[Refactor] Drop direct dependency on librosa #39079

Merged
ywang96 merged 6 commits into vllm-project:main from NickCao:drop-librosa
Apr 18, 2026

Conversation

@NickCao
Contributor

@NickCao NickCao commented Apr 6, 2026

Purpose

Drop dependency on librosa due to license concerns.

Test Plan

N/A. The load_audio/resample wrapper functions have already been validated in existing code, and the melscale_fbanks function from torchaudio is numerically equivalent to its librosa counterpart.

Test Result

N/A

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify
Contributor

mergify Bot commented Apr 6, 2026

Documentation preview: https://vllm--39079.org.readthedocs.build/en/39079/

@mergify mergify Bot added documentation Improvements or additions to documentation ci/build multi-modality Related to multi-modality (#4194) rocm Related to AMD ROCm labels Apr 6, 2026
@github-project-automation github-project-automation Bot moved this to Todo in AMD Apr 6, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request removes the librosa dependency from the codebase, replacing its functionality with internal utilities and torchaudio. Key changes include replacing librosa.load and librosa.get_duration with load_audio and get_audio_duration, as well as migrating mel-filterbank generation to torchaudio.functional.melscale_fbanks. Documentation, examples, and requirement files have been updated to reflect these changes and the shift toward soundfile and PyAV as the primary backends. I have no feedback to provide.

@NickCao NickCao force-pushed the drop-librosa branch 2 times, most recently from db5219a to 38068e7 Compare April 6, 2026 14:26
@robertgshaw2-redhat robertgshaw2-redhat added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 6, 2026
@robertgshaw2-redhat
Collaborator

Could you run a performance sanity check?

@robertgshaw2-redhat
Collaborator

Thanks for making this change, it's long overdue.

@DarkLight1337 DarkLight1337 requested a review from Isotr0py April 6, 2026 14:29
@DarkLight1337
Member

DarkLight1337 commented Apr 6, 2026

Actually for the main code we have already dropped the dependency: #37058

But it's nice to remove it from the example and testing code as well!

@NickCao
Contributor Author

NickCao commented Apr 6, 2026

Could you run a performance sanity check?

torchaudio.functional.melscale_fbanks is called in the __init__ function, not on the hot path. The remaining changes are in the tests/examples, and these wrapper functions (load_audio, etc.) are already used in the main code, so there should be no performance regressions.
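A toy sketch of the "built in __init__, reused on the hot path" argument. The names build_filterbank and FeatureExtractor are hypothetical stand-ins, not the vLLM classes touched by this PR, and the filterbank here is a dummy matrix used only to count how often the setup call runs.

```python
def build_filterbank(n_mels: int, n_freqs: int) -> list[list[float]]:
    """Hypothetical stand-in for a one-time setup call such as melscale_fbanks."""
    build_filterbank.calls += 1
    return [[1.0] * n_freqs for _ in range(n_mels)]


build_filterbank.calls = 0  # counts how many times the setup call runs


class FeatureExtractor:
    """Hypothetical processor: filters are built once, at construction time."""

    def __init__(self, n_mels: int = 4, n_freqs: int = 8) -> None:
        self.filters = build_filterbank(n_mels, n_freqs)

    def __call__(self, frame: list[float]) -> list[float]:
        # Hot path: only reuses the cached filters, never rebuilds them.
        return [sum(w * x for w, x in zip(row, frame)) for row in self.filters]


extractor = FeatureExtractor()
for _ in range(100):
    features = extractor([1.0] * 8)

print(build_filterbank.calls)
```

However many frames are processed, the setup function runs once per extractor instance, which is why swapping its implementation should not affect per-request latency.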

@NickCao
Contributor Author

NickCao commented Apr 6, 2026

Still pulled in by a third party dep....

__________________________________ test_wer_correctness[D4nt3/esb-datasets-earnings22-validation-tiny-filtered-model_config0] __________________________________
self = Audio(sampling_rate=16000, mono=True, decode=True, id=None)
value = {'bytes': b'RIFF&\xd1\x06\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x01\x00\x80>\x00\x00\x00}\x00\x00\x02\x00\x10\x00data\x0...x01H\x02\xa2\x01\xbe\x00\xe4\xffn\x00]\x01\x9e\x01j\xffS\xfd\t\xfd\xd9\xff\x1c\x02D\x03\xf3\x00\x04\xfd', 'path': None}
token_per_repo_id = None
    def decode_example(
        self, value: dict, token_per_repo_id: Optional[Dict[str, Union[str, bool, None]]] = None
    ) -> dict:
        """Decode example audio file into audio data.
        Args:
            value (`dict`):
                A dictionary with keys:
                - `path`: String with relative audio file path.
                - `bytes`: Bytes of the audio file.
            token_per_repo_id (`dict`, *optional*):
                To access and decode
                audio files from private repositories on the Hub, you can pass
                a dictionary repo_id (`str`) -> token (`bool` or `str`)
        Returns:
            `dict`
        """
        if not self.decode:
            raise RuntimeError("Decoding is disabled for this feature. Please use Audio(decode=True) instead.")
        path, file = (value["path"], BytesIO(value["bytes"])) if value["bytes"] is not None else (value["path"], None)
        if path is None and file is None:
            raise ValueError(f"An audio sample should have one of 'path' or 'bytes' but both are None in {value}.")
        try:
>           import librosa
E           ModuleNotFoundError: No module named 'librosa'
/usr/local/lib/python3.12/dist-packages/datasets/features/audio.py:153: ModuleNotFoundError
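One way to see which installed packages still declare librosa as a requirement is to scan distribution metadata with the standard library. This is a rough sketch: the helper names requirement_name and dependents_of are hypothetical, and it only inspects declared requirements (including optional extras), so a lazy import inside a function, as in the traceback above, shows up only if the package also lists librosa in its metadata.

```python
import re
from importlib import metadata


def requirement_name(requirement: str) -> str:
    """Extract the normalized project name from a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    name = match.group(1) if match else ""
    # Normalize runs of '-', '_' and '.' to '-' and lowercase the result.
    return re.sub(r"[-_.]+", "-", name).lower()


def dependents_of(target: str) -> list[str]:
    """List installed distributions that declare `target` as a requirement."""
    target = re.sub(r"[-_.]+", "-", target).lower()
    found = set()
    for dist in metadata.distributions():
        for req in dist.requires or []:
            if requirement_name(req) == target:
                name = dist.metadata.get("Name")
                if name:
                    found.add(name)
    return sorted(found)


print(dependents_of("librosa"))
```

The output depends entirely on the local environment; an empty list means no installed distribution declares librosa, not that nothing imports it.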

@NickCao
Contributor Author

NickCao commented Apr 6, 2026

Dropped the commit changing requirements; let's handle this later.

@NickCao NickCao changed the title [Refactor] Drop dependency on librosa [Refactor] Drop direct dependency on librosa Apr 6, 2026
Comment thread vllm/transformers_utils/processors/cohere_asr.py
@NickCao
Contributor Author

NickCao commented Apr 6, 2026

Still pulled in by a third party dep....

        try:
>           import librosa
E           ModuleNotFoundError: No module named 'librosa'
/usr/local/lib/python3.12/dist-packages/datasets/features/audio.py:153: ModuleNotFoundError

datasets drops the librosa dependency in favor of torchcodec in huggingface/datasets@161f99d, so we need to update it to 4.0.0+.

@Isotr0py
Member

Isotr0py commented Apr 6, 2026

datasets drops the librosa dependency in favor of torchcodec

But I think torchcodec is actually an optional requirement? https://github.com/huggingface/datasets/blob/161f99d94a1daf8380eabdb826048a0652510ee6/setup.py#L210-L212

@NickCao
Copy link
Copy Markdown
Contributor Author

NickCao commented Apr 6, 2026

datasets drops the librosa dependency in favor of torchcodec

But I think torchcodec is actually an optional requirement? https://github.com/huggingface/datasets/blob/161f99d94a1daf8380eabdb826048a0652510ee6/setup.py#L210-L212

We are pinning to datasets 3:

requirements/test.in
74:# Newer versions of datasets require torchcoded, that makes the tests fail in CI because of a missing library.
76:datasets>=3.3.0,<=3.6.0

Thus we are still using librosa.
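To make the conflict concrete, a small sketch that checks versions against the pin quoted above (datasets>=3.3.0,<=3.6.0). It is a simplification: parse_version and satisfies_pin are hypothetical helpers that handle only plain dotted numeric versions, not full PEP 440 specifiers.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Parse a plain dotted numeric version such as '3.6.0' into a tuple."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pin(version: str, lower: str = "3.3.0", upper: str = "3.6.0") -> bool:
    """Check lower <= version <= upper, mirroring the pin in requirements/test.in."""
    return parse_version(lower) <= parse_version(version) <= parse_version(upper)


for candidate in ("3.3.0", "3.6.0", "4.0.0"):
    print(candidate, satisfies_pin(candidate))
```

Any 4.x release falls outside the pinned range, so the test environment keeps resolving to a 3.x release of datasets, which is the version line that produced the librosa import in the traceback above.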

NickCao and others added 5 commits April 17, 2026 13:11
…_audio

Signed-off-by: Nick Cao <ncao@redhat.com>
…esampler

Signed-off-by: Nick Cao <ncao@redhat.com>
…t_audio_duration

Signed-off-by: Nick Cao <ncao@redhat.com>
…scale_fbanks

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
@ywang96 ywang96 enabled auto-merge (squash) April 18, 2026 05:22
@ywang96 ywang96 merged commit 153ba7f into vllm-project:main Apr 18, 2026
55 of 56 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in AMD Apr 18, 2026
bnellnm pushed a commit to neuralmagic/vllm that referenced this pull request Apr 20, 2026
Signed-off-by: Nick Cao <ncao@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
baonudesifeizhai pushed a commit to baonudesifeizhai/vllm that referenced this pull request Apr 23, 2026
Signed-off-by: Nick Cao <ncao@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
whk-lab pushed a commit to whk-lab/vllm that referenced this pull request Apr 23, 2026
Signed-off-by: Nick Cao <ncao@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Labels

ci/build documentation Improvements or additions to documentation multi-modality Related to multi-modality (#4194) ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm

Projects

Status: Done

5 participants