Initial support for video #1151
Merged
flyingleafe pushed a commit to flyingleafe/lhotse that referenced this pull request on Oct 11, 2023
* Tutorial materials in main readme page
* Initial crude video support in AudioSource and Recording
* Add downsized test fixture video
* Support for loading video + audio at the same time
* Enforce consistent video and audio duration
* Support for changing video resolution
* Basic video support for most cut types
* Support for padded video MixedCuts
* Enforce audio duration and video duration to be consistent when creating Recording, solving appending/padding issues
* Add missing assertion
* Stricter tests for padding and appending video cuts
* Minimal set of utilities for PyTorch video dataloading
* Grid audio-visual speech corpus recipe + support videos with missing num frames in their header
* Skip video test for PyTorch < 2.0
* Fix issue with torchaudio.info usage
This PR implements some features related to video dataloading support.
TL;DR
More or less detailed list of changes:
* `Recording.from_file()` supports video files (so does `lhotse.info()`)
* `Recording.load_video()` method that loads video + audio (and keeps them in sync duration-wise)
  * video shape is `(num_frames, color, height, width)`, dtype is `uint8`, and format is RGB; if another makes more sense we can change it later
* `recording.video`
* `recording.has_video`
* `recording = recording.with_video_resolution()`
* `cut.has_video` and `cut.video` (metadata)
* `collate_video` method that pads with black frames and packs video examples into a 5D tensor
* `UnsupervisedAudioVideoDataset` as a basic example for working with videos, returns video + audio mini-batches

For audio-only workflows, the code is otherwise practically unaffected. Dataloading videos with the torch DataLoader generally works without issues. At this point, though, the current code likely won't work with every video format and might not yet support some other standard Lhotse operations for video data. It also only works with recent PyTorch versions.
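
To make the "pads with black frames and packs video examples into a 5D tensor" step concrete, here is a minimal sketch of the idea. It is not the `collate_video` implementation from this PR: `pad_and_stack_videos` is a hypothetical helper, NumPy arrays stand in for torch tensors, and padding at the end of each clip (rather than the start) is an assumption. Only the `(num_frames, color, height, width)` layout and the `uint8` dtype come from the description above.

```python
import numpy as np


def pad_and_stack_videos(videos):
    """Hypothetical helper (not Lhotse's collate_video).

    Takes a list of uint8 arrays shaped (num_frames, color, height, width)
    and returns a 5D batch (batch, max_frames, color, height, width) plus
    the original frame count of each clip.
    """
    if not videos:
        raise ValueError("expected at least one video")

    # All clips must share the same per-frame shape (color, height, width).
    frame_shape = videos[0].shape[1:]
    for v in videos:
        if v.shape[1:] != frame_shape:
            raise ValueError("all videos must share color/height/width")

    lens = np.array([v.shape[0] for v in videos], dtype=np.int64)

    # Zeros in uint8 RGB are black pixels, so the untouched tail of each
    # shorter clip ends up as black frames (assumed right-side padding).
    batch = np.zeros((len(videos), int(lens.max()), *frame_shape), dtype=np.uint8)
    for i, v in enumerate(videos):
        batch[i, : v.shape[0]] = v
    return batch, lens


# Two toy clips with different lengths: 2 frames and 5 frames of 4x4 RGB.
short = np.full((2, 3, 4, 4), 255, dtype=np.uint8)
long = np.full((5, 3, 4, 4), 128, dtype=np.uint8)
batch, lens = pad_and_stack_videos([short, long])
```

Returning the per-clip frame counts alongside the batch lets downstream code mask out the padded frames, which is a common pattern for variable-length batches.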