Audio backend refactoring and a workaround for FLAC reading from/writing to in-memory buffers #814

Merged: 10 commits merged into master on Sep 25, 2022

Conversation

pzelasko
Collaborator

@pzelasko pzelasko commented Sep 20, 2022

This PR went a little out of scope. I didn't originally intend the refactoring, but the audio errors had become too confusing even for me. I think I will follow up with a refactoring similar to the one @desh2608 did in #820 to improve code readability. I might also further refactor the "audio backend" logic in Lhotse: it has grown quite complicated because it tries to support multiple torchaudio versions, multiple torchaudio backends, libsndfile, audioread, custom hacks for things like OPUS and SPHERE, and now also custom hacks for in-memory buffers. I'll need to think about whether the solution can be made a bit more elegant.
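To make the "try several backends, report every failure" idea concrete, here is a minimal sketch that uses only the Python standard library. It is not Lhotse's actual code: the names `AudioLoadingError`, `load_audio`, `read_with_wave`, and `read_always_fails` are hypothetical, and the stdlib `wave` module stands in for real backends such as torchaudio or libsndfile.

```python
import io
import wave
from typing import BinaryIO, Callable, List, Tuple


class AudioLoadingError(Exception):
    """Hypothetical error raised when every candidate backend has failed."""


# A backend is a (name, reader) pair; the reader returns (raw_frames, sampling_rate).
Backend = Tuple[str, Callable[[BinaryIO], Tuple[bytes, int]]]


def read_with_wave(fileobj: BinaryIO) -> Tuple[bytes, int]:
    # Stand-in backend: the stdlib WAV reader.
    with wave.open(fileobj, "rb") as w:
        return w.readframes(w.getnframes()), w.getframerate()


def read_always_fails(fileobj: BinaryIO) -> Tuple[bytes, int]:
    # Stand-in for a backend that cannot handle the given input.
    raise RuntimeError("format not supported by this backend")


def load_audio(fileobj: BinaryIO, backends: List[Backend]) -> Tuple[bytes, int]:
    """Try each backend in order and collect errors instead of hiding them."""
    errors = []
    for name, reader in backends:
        try:
            fileobj.seek(0)  # every backend starts from the beginning of the buffer
            return reader(fileobj)
        except Exception as e:
            errors.append(f"{name}: {type(e).__name__}: {e}")
    raise AudioLoadingError("All audio backends failed:\n" + "\n".join(errors))


def make_wav_bytes(num_frames: int = 8, sampling_rate: int = 16000) -> bytes:
    """Build a tiny mono 16-bit WAV file in memory, used here as test input."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sampling_rate)
        w.writeframes(b"\x00\x00" * num_frames)
    return buf.getvalue()
```

With this shape, adding a backend means appending one `(name, reader)` pair, and a failed load reports what each backend complained about rather than only the last exception.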

BTW, I tried the suggested workaround for pytorch/audio#2662, which uses the new torchaudio ffmpeg streamer, but some (not all) in-memory buffers failed with an error saying that the seek operation is not permitted. I could not easily create a reproducible example yet. Maybe I'll revisit this in the future.
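The cause of the seek error was not pinned down in this thread. As a generic illustration only, one defensive pattern when a decoder needs `seek()` is to copy a non-seekable stream into an `io.BytesIO` first. The sketch below assumes nothing about torchaudio; `NonSeekableStream` and `ensure_seekable` are hypothetical names introduced for this example.

```python
import io
from typing import BinaryIO


class NonSeekableStream(io.RawIOBase):
    """Hypothetical stand-in for a pipe-like stream that cannot seek."""

    def __init__(self, data: bytes):
        super().__init__()
        self._inner = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return False

    def readinto(self, b) -> int:
        # Copy up to len(b) bytes into the caller's buffer; 0 signals EOF.
        chunk = self._inner.read(len(b))
        b[: len(chunk)] = chunk
        return len(chunk)


def ensure_seekable(stream: BinaryIO) -> BinaryIO:
    """Return the stream unchanged if it can seek, else an in-memory copy."""
    if stream.seekable():
        return stream
    # Reads the remaining bytes into memory, so this only suits small inputs.
    return io.BytesIO(stream.read())
```

The trade-off is memory: the whole payload is held in RAM, which is usually acceptable for single utterances but not for very long recordings.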

@pzelasko pzelasko marked this pull request as ready for review September 25, 2022 03:49
@pzelasko pzelasko changed the title Reading audio with torchaudio ffmpeg streamer Audio backend refactoring and a workaround for FLAC reading from/writing to in-memory buffers Sep 25, 2022
@desh2608
Collaborator

Nice! I think most users (me, for one) care more about what functionality a Recording or RecordingSet provides than about the nitty-gritty of how the audio is loaded in the backend. It's probably best to separate these two concepts.

@pzelasko pzelasko merged commit e08965e into master Sep 25, 2022
@pzelasko pzelasko added this to the v1.8 milestone Sep 25, 2022
desh2608 pushed a commit to desh2608/lhotse that referenced this pull request Sep 25, 2022
…ing to in-memory buffers (lhotse-speech#814)

* Reading audio with torchaudio ffmpeg streamer

* Workaround for broken FLAC BytesIO saving

* Resolve CI errors by checking min torchaudio version >= 0.9 before using soundfile for saving to BytesIO

* Refactor audio loading logic for better extensibility and error display

* Fix CI

* Fix CI

* Prefer libsndfile for in-memory buffer data

* Add a minimum amount of documentation
pzelasko added a commit that referenced this pull request Oct 5, 2022
…822)

* initial commit for multi-channel supervisions

* added base.py

* add mono.py

* add mixed.py

* add padding.py

* add set.py

* add init file

* fix isort

* initial commit for MultiCut

* add type hints for is_equal_or_contains

* fix flake8 issues

* fix flake8 in mono.py

* more changes for MultiCut

* added base.py

* add mono.py

* add mixed.py

* add padding.py

* add set.py

* add init file

* fix isort

* fix flake8 issues

* fix flake8 in mono.py

* Audio backend refactoring and a workaround for FLAC reading from/writing to in-memory buffers (#814)

* Reading audio with torchaudio ffmpeg streamer

* Workaround for broken FLAC BytesIO saving

* Resolve CI errors by checking min torchaudio version >= 0.9 before using soundfile for saving to BytesIO

* Refactor audio loading logic for better extensibility and error display

* Fix CI

* Fix CI

* Prefer libsndfile for in-memory buffer data

* Add a minimum amount of documentation

* initial commit for multi-channel supervisions

* initial commit for MultiCut

* add type hints for is_equal_or_contains

* more changes for MultiCut

* more changes for MultiCut

* more changes for MultiCut

* add type attribute to MixTrack to make from_dict work

* fix rir test case

* all old tests passing

* fix isort

* revert asdict_nonull

* more changes for MultiCut

* remove voxceleb changes

* fixed save_audio

* add tests for multi cut augmentation

* add tests for drop attributes

* more tests for multi cut

* more tests for multi cut

* fix isort

* fix test cases; all passing

* merge_supervisions implemented for each cut type

* add tests for mixing with multi cuts

* update feature mixing

* fix failing test

* incorporate suggestions from @pzelasko

* fix mixing test

* add serialization test for MultiCut

* add multi cut fixture

* added tests for audio mixer

* remove redundant cases in audio mixer

* test for mixing mixed cut with multi cut

Co-authored-by: Piotr Żelasko <[email protected]>