Initial support for video #1151

pzelasko · 2023-09-16T03:41:05Z

This PR implements a set of features for video dataloading.

TL;DR

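```python
from lhotse import Recording

# Create a Recording from a video file and load both streams at once
recording = Recording.from_file("example.mp4")
video, audio = recording.load_video()

# Cuts work the same way, e.g. the first second of the recording
cut = recording.to_cut().truncate(duration=1.0)
video, audio = cut.load_video()
```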
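Because load_video() keeps video and audio in sync duration-wise, truncating a cut to one second should select the matching number of frames and samples. Below is a minimal sketch of that index arithmetic, assuming a constant frame rate; av_slices is a hypothetical helper written for illustration, not Lhotse's implementation.

```python
from typing import Tuple


def av_slices(
    offset: float, duration: float, fps: float, sampling_rate: int
) -> Tuple[slice, slice]:
    """Return (frame_slice, sample_slice) covering [offset, offset + duration) seconds.

    Hypothetical helper: both boundaries are rounded to the nearest frame/sample,
    so the video and audio segments describe the same time span.
    """
    end = offset + duration
    frames = slice(round(offset * fps), round(end * fps))
    samples = slice(round(offset * sampling_rate), round(end * sampling_rate))
    return frames, samples


# The first second of a 25 fps video with 16 kHz audio, as in cut.truncate(duration=1.0).
frames, samples = av_slices(offset=0.0, duration=1.0, fps=25, sampling_rate=16000)
print(frames, samples)  # slice(0, 25, None) slice(0, 16000, None)
```

Note that Python's round() rounds exact halves to the nearest even integer, so a boundary falling exactly between two frames may land on either side.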

A more or less detailed list of changes:

Recording
- Recording.from_file() supports video files (so does lhotse.info())
- new Recording.load_video() method that loads video and audio together and keeps their durations in sync
- the video shape is (num_frames, color, height, width), the dtype is uint8, and the color format is RGB; if another layout makes more sense, we can change it later
- only a single video stream is supported for now (no multi-video files)
- video metadata is stored in recording.video
- recording.has_video checks whether a recording contains video
- video resolution can be changed dynamically with recording = recording.with_video_resolution()
- video can't be loaded when speed or tempo perturbation is applied
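The description does not say how with_video_resolution() resizes frames. As an illustration only, this sketch resizes a (num_frames, color, height, width) uint8 array with nearest-neighbour sampling in NumPy; the function name resize_video_nearest and the interpolation method are assumptions, not Lhotse's implementation.

```python
import numpy as np


def resize_video_nearest(video: np.ndarray, height: int, width: int) -> np.ndarray:
    """Hypothetical nearest-neighbour resize of a (num_frames, color, height, width) video."""
    *_, in_h, in_w = video.shape
    # For each output row/column, pick an input row/column (top-left aligned mapping).
    rows = np.arange(height) * in_h // height
    cols = np.arange(width) * in_w // width
    # The two index arrays broadcast to (height, width) and replace the last two axes,
    # so the result keeps the (num_frames, color, height, width) layout and the dtype.
    return video[:, :, rows[:, None], cols[None, :]]


video = np.arange(16, dtype=np.uint8).reshape(1, 1, 4, 4)
small = resize_video_nearest(video, 2, 2)
print(small.shape, small.dtype)  # (1, 1, 2, 2) uint8
```

Nearest-neighbour keeps the uint8 dtype and needs no extra dependencies, but it aliases when downscaling; a production implementation may prefer bilinear or area interpolation.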
Cut
- supports all cut types (MonoCut, MultiCut, MixedCut)
- MixedCuts with video only support padding and appending other video cuts, not mixing (mixing an audio-only cut into a video cut is OK, give or take some edge cases)
- cut.has_video and cut.video (metadata)
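Padding and appending video cuts both come down to operations along the time (frame) axis. A minimal NumPy sketch, assuming both inputs share the same frame rate and resolution and that padding uses black (all-zero) frames; pad_video and append_videos are hypothetical helpers, not Lhotse functions.

```python
import numpy as np


def pad_video(video: np.ndarray, num_frames: int) -> np.ndarray:
    """Append black (all-zero) frames until the video has `num_frames` frames."""
    missing = num_frames - video.shape[0]
    if missing < 0:
        raise ValueError("video is already longer than the requested length")
    pad = np.zeros((missing, *video.shape[1:]), dtype=video.dtype)
    return np.concatenate([video, pad], axis=0)


def append_videos(first: np.ndarray, second: np.ndarray) -> np.ndarray:
    """Concatenate two videos in time; assumes they share the same frame rate."""
    if first.shape[1:] != second.shape[1:]:
        raise ValueError(f"incompatible frame shapes: {first.shape[1:]} vs {second.shape[1:]}")
    return np.concatenate([first, second], axis=0)


a = np.ones((2, 3, 4, 4), dtype=np.uint8)
b = np.ones((3, 3, 4, 4), dtype=np.uint8)
print(append_videos(a, b).shape, pad_video(a, 4).shape)  # (5, 3, 4, 4) (4, 3, 4, 4)
```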
Dataloading
- collate_video method that pads with black frames and packs video examples into a 5D tensor
- UnsupervisedAudioVideoDataset as a basic example of working with videos; it returns video + audio mini-batches
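collate_video is described as padding with black frames and packing examples into a 5D tensor. The sketch below mimics that behaviour with NumPy instead of PyTorch; the name collate_videos, the (batch, frames, color, height, width) axis order, and the returned lengths array are assumptions for illustration, not the signature of Lhotse's collate_video.

```python
from typing import List, Tuple

import numpy as np


def collate_videos(videos: List[np.ndarray]) -> Tuple[np.ndarray, np.ndarray]:
    """Pad (num_frames, color, height, width) videos with black frames and stack them.

    Returns a 5D array of shape (batch, max_frames, color, height, width)
    and the original number of frames of each example.
    """
    lengths = np.array([v.shape[0] for v in videos], dtype=np.int64)
    frame_shape = videos[0].shape[1:]
    if any(v.shape[1:] != frame_shape for v in videos):
        raise ValueError("all videos must share the same color/height/width")
    batch = np.zeros(
        (len(videos), int(lengths.max()), *frame_shape), dtype=videos[0].dtype
    )
    for i, v in enumerate(videos):
        batch[i, : v.shape[0]] = v  # frames past the end stay black (zeros)
    return batch, lengths


videos = [
    np.full((2, 3, 4, 5), 7, dtype=np.uint8),
    np.full((4, 3, 4, 5), 9, dtype=np.uint8),
]
batch, lengths = collate_videos(videos)
print(batch.shape, lengths.tolist())  # (2, 4, 3, 4, 5) [2, 4]
```

Returning the per-example lengths alongside the padded batch lets a model mask out the black padding frames.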
First supported audio-video corpus
- Grid audio-visual speech corpus

Audio-only workflows are otherwise practically unaffected. Loading videos with the PyTorch DataLoader generally works without issues. At this point, though, the current code likely won't work with every video format, and it might not yet support some other standard Lhotse operations for video data. It will also only work with recent PyTorch versions.

* Tutorial materials in main readme page
* Initial crude video support in AudioSource and Recording
* Add downsized test fixture video
* Support for loading video + audio at the same time
* Enforce consistent video and audio duration
* Support for changing video resolution
* Basic video support for most cut types
* Support for padded video MixedCuts
* Enforce audio duration and video duration to be consistent when creating Recording, solving appending/padding issues
* Add missing assertion
* Stricter tests for padding and appending video cuts
* Minimal set of utilities for PyTorch video dataloading
* Grid audio-visual speech corpus recipe + support videos with missing num frames in their header
* Skip video test for PyTorch < 2.0
* Fix issue with torchaudio.info usage

pzelasko added 19 commits August 29, 2023 10:21

Tutorial materials in main readme page

5b478b5

Merge branch 'master' of https://github.com/lhotse-speech/lhotse

fbc21b7

Merge branch 'master' of https://github.com/lhotse-speech/lhotse

3d8fccb

Merge branch 'master' of https://github.com/lhotse-speech/lhotse

144f541

Merge branch 'master' of https://github.com/lhotse-speech/lhotse

927b8f8

Merge branch 'master' of https://github.com/lhotse-speech/lhotse

477a430

Merge branch 'master' of https://github.com/lhotse-speech/lhotse

22c23b2

Initial crude video support in AudioSource and Recording

6dce054

Add downsized test fixture video

c9066f1

Support for loading video + audio at the same time

5b768dd

Enforce consistent video and audio duration

a8a7f03

Support for changing video resolution

02e394f

Basic video support for most cut types

014035c

Support for padded video MixedCuts

c74ca71

Enforce audio duration and video duration to be consistent when creating Recording, solving appending/padding issues

0c6d04d

Add missing assertion

0741675

Stricter tests for padding and appending video cuts

a266bfc

Minimal set of utilities for PyTorch video dataloading

17b3d73

Grid audio-visual speech corpus recipe + support videos with missing num frames in their header

d983c5a

pzelasko added this to the v1.17 milestone Sep 16, 2023

pzelasko added 3 commits September 21, 2023 11:07

Merge branch 'master' into feature/video-support

26f9be0

Skip video test for PyTorch < 2.0

82bb80a

Fix issue with torchaudio.info usage

eda630f

pzelasko merged commit 7b60f86 into master Sep 21, 2023

pzelasko deleted the feature/video-support branch September 21, 2023 21:07

kerolos mentioned this pull request Jun 23, 2024

Support for Video Features, for example How2Sign #1359

Open
