Process .opus files with torchaudio #3667

Closed
wants to merge 1 commit into from

polinaeterna
Contributor

@polinaeterna polinaeterna commented Feb 2, 2022

@anton-l suggested processing .opus files with torchaudio instead of soundfile, since it's faster:
(image: opus)

(Moreover, for some reason I couldn't load .opus files with soundfile / librosa locally on any of my machines, even with ffmpeg installed.)

For now, my changes work with a locally stored file:

# download sample opus file (from MultilingualSpokenWords dataset)
!wget https://huggingface.co/datasets/polinaeterna/test_opus/resolve/main/common_voice_tt_17737010.opus 

from datasets import Dataset, Audio

audio_path = "common_voice_tt_17737010.opus"
dataset = Dataset.from_dict({"audio": [audio_path]}).cast_column("audio", Audio(48000))
dataset[0]
# {'audio': {'path': 'common_voice_tt_17737010.opus',
#   'array': array([ 0.0000000e+00,  0.0000000e+00,  3.0517578e-05, ...,
#          -6.1035156e-05,  6.1035156e-05,  0.0000000e+00], dtype=float32),
#   'sampling_rate': 48000}}

But it doesn't work when loading from bytes inside a dataset (I checked on MultilingualSpokenWords; that PR is a draft for now, so maybe the bug is somewhere there):

import torchaudio
with open(audio_path, "rb") as b:
    print(torchaudio.load(b))
# RuntimeError: Error loading audio file: failed to open file <in memory buffer>
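
Since loading by path works and loading from an opened file does not, one possible workaround is to write the bytes to a temporary file and decode that by path. The following is only a sketch under that assumption, not what this PR implements: `load_audio_from_bytes` is a hypothetical helper, and `loader` stands for any callable that accepts a file path (for example `torchaudio.load`).

```python
import os
import tempfile
from typing import Any, Callable


def load_audio_from_bytes(data: bytes, loader: Callable[[str], Any], suffix: str = ".opus") -> Any:
    """Hypothetical helper: write `data` to a temporary file and decode it by path.

    `loader` is any callable that takes a file path, e.g. `torchaudio.load`.
    """
    # delete=False so the file can be reopened by path on every platform
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        tmp.write(data)
        tmp.close()  # make sure the bytes are on disk before the decoder opens the file
        return loader(tmp.name)
    finally:
        tmp.close()  # no-op if already closed
        os.remove(tmp.name)


# Usage (assumes torchaudio is installed with a backend that can read opus):
# import torchaudio
# with open("common_voice_tt_17737010.opus", "rb") as f:
#     waveform, sampling_rate = load_audio_from_bytes(f.read(), torchaudio.load)
```

The extra disk round trip costs some speed, so this would only make sense if path-based decoding remains noticeably faster than the alternatives.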

@polinaeterna polinaeterna marked this pull request as draft February 2, 2022 15:23
@polinaeterna polinaeterna self-assigned this Feb 3, 2022
Member

@lhoestq lhoestq left a comment

Sounds good to me :) thanks !

@@ -91,10 +91,11 @@ def decode_example(self, value: dict) -> dict:
raise RuntimeError("Decoding is disabled for this feature. Please use Audio(decode=True) instead.")

path, file = (value["path"], BytesIO(value["bytes"])) if value["bytes"] is not None else (value["path"], None)
extension = path.split(".")[-1]
Member

The path can be None here:

Suggested change
- extension = path.split(".")[-1]
+ extension = path.split(".")[-1] if path is not None else None
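
To illustrate the suggestion, here is the guarded expression wrapped in a small standalone function. The name `get_extension` is hypothetical and used only for this sketch; in the PR the expression sits inline in `decode_example`.

```python
from typing import Optional


def get_extension(path: Optional[str]) -> Optional[str]:
    """Return the text after the last dot in `path`, or None when there is no path."""
    return path.split(".")[-1] if path is not None else None


print(get_extension("common_voice_tt_17737010.opus"))  # opus
print(get_extension(None))                             # None
# Caveat: a path without any dot is returned unchanged.
print(get_extension("audiofile"))                      # audiofile
```

Without the guard, a `None` path (audio provided only as bytes) would raise `AttributeError` on `.split`.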

@lhoestq
Member

lhoestq commented Feb 3, 2022

Note that torchaudio is maybe less practical to use for TF or JAX users.
This is not in the scope of this PR, but in the future if we manage to find a way to let the user control the decoding it would be nice
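
One way user-controlled decoding could look is a mapping from file extension to a decoding callable, with a fallback. This is purely a sketch of the idea and not an existing `datasets` API: `pick_decoder`, the `decoders` mapping, and the `default` argument are all hypothetical names.

```python
from typing import Any, Callable, Dict, Optional

Decoder = Callable[[str], Any]  # takes a file path, returns decoded audio


def pick_decoder(path: Optional[str], decoders: Dict[str, Decoder], default: Decoder) -> Decoder:
    """Hypothetical helper: choose a decoder by file extension, else use `default`."""
    has_extension = path is not None and "." in path
    extension = path.rsplit(".", 1)[-1].lower() if has_extension else None
    # A missing path or an unregistered extension falls back to the default decoder.
    return decoders.get(extension, default)


# Usage (assumes both libraries are installed; TF or JAX users could register
# their own callables instead of torchaudio):
# import soundfile, torchaudio
# decode = pick_decoder("clip.opus", {"opus": torchaudio.load}, default=soundfile.read)
# audio = decode("clip.opus")
```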

@polinaeterna
Contributor Author

> Note that torchaudio is maybe less practical to use for TF or JAX users. This is not in the scope of this PR, but in the future if we manage to find a way to let the user control the decoding it would be nice

@lhoestq so maybe we shouldn't do this PR? :) It doesn't work with an opened file anyway, only with a path.

@lhoestq
Member

lhoestq commented Feb 4, 2022

Yes, as discussed offline, there seem to be issues with torchaudio on opened files. Feel free to close this PR if it's better to stick with soundfile because of that.

@mariosasko
Collaborator

We should soon be able to remove torchaudio, which has torch as a hard dependency, and use only soundfile for decoding: bastibe/python-soundfile#252 (comment) (opus + mp3 support is on the way).

3 participants