Add LRS3 data preparation #3421
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/audio/3421
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 new failures as of commit 6231cb1.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
The code looks fine, but IIUC, the preparation uses four different FFmpeg-related packages: torchvision, opencv, ffmpeg, and torchaudio.
This might cause subtle differences in the data processing.
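One rough way to gauge whether this matters in practice is to decode the same clip with two of the backends and compare the raw frames. A minimal sketch, assuming a local test clip at "sample.mp4" (a placeholder path, not a file from this PR):

```python
import cv2
import numpy as np
import torchvision

video_path = "sample.mp4"  # placeholder clip, not part of this PR

# torchvision: returns a uint8 tensor of shape (T, H, W, C) in RGB order.
tv_frames, _, _ = torchvision.io.read_video(video_path, pts_unit="sec")
tv_frames = tv_frames.numpy()

# OpenCV: decodes frame by frame in BGR order, so convert to RGB for comparison.
cap = cv2.VideoCapture(video_path)
cv_frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv_frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
cap.release()
cv_frames = np.stack(cv_frames)

n = min(len(tv_frames), len(cv_frames))
print("frame counts:", len(tv_frames), "vs", len(cv_frames))
print("max abs pixel diff:",
      np.abs(tv_frames[:n].astype(np.int16) - cv_frames[:n].astype(np.int16)).max())
```

Differences in frame counts or pixel values would indicate that the backends disagree on seeking, color conversion, or frame timestamps.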
@mthrok has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary: This PR adds a data preparation recipe that uses the ultra face detector to extract full-face video. The resulting video output is then used as input for training and evaluating RNNT-based models for automatic speech recognition (ASR), visual speech recognition (VSR), and audio-visual ASR (AV-ASR) on the LRS3 dataset. This PR also updates the word error rate (WER) for AV-ASR LRS3 models and improves code readability.
Pull Request resolved: pytorch#3421
Reviewed By: mpc001
Differential Revision: D46799748
Pulled By: mthrok
fbshipit-source-id: 28a4fc1251700c739411db216d156e75db5db4fa
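To make the described preparation step concrete, here is a rough sketch of a face-crop loop. It is illustrative only, not the code in this PR: the `detector` callable, the 96x96 output size, and the OpenCV-based I/O are all assumptions standing in for the actual ultra face detector interface and the video I/O used by the recipe.

```python
import cv2

def crop_faces(video_path, output_path, detector, size=96):
    """Crop every frame of a video to a detected face box and write the result.

    `detector` is assumed to map a BGR frame to an (x1, y1, x2, y2) bounding
    box; the real recipe uses an ultra face detector whose interface may differ.
    """
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    writer = cv2.VideoWriter(
        output_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (size, size)
    )
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        x1, y1, x2, y2 = detector(frame)
        face = cv2.resize(frame[y1:y2, x1:x2], (size, size))
        writer.write(face)
    cap.release()
    writer.release()
```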
This pull request was exported from Phabricator. Differential Revision: D46799748
Hey @mthrok. Some guidance: Use 'module: ops' for operations under 'torchaudio/{transforms, functional}', and for ML-related components under 'torchaudio/csrc' (e.g. RNN-T loss). Things in the "examples" directory:
Regarding examples in code documentation, please also use 'module: docs'. Please use the 'other' tag only when you are sure the changes are not really relevant to users, or when no other tag applies. Try not to use it often, to minimize the effort required when we prepare release notes. When preparing release notes, please make sure 'documentation' and 'tutorials' occur as the last sub-categories under each primary category, such as 'new feature', 'improvements', or 'prototype'. Things related to the build are excluded from the release notes by default, except when they impact users. For example:
This PR adds a data preparation recipe that uses the ultra face detector to extract full-face video. The resulting video output is then used as input for training and evaluating RNNT-based models for automatic speech recognition (ASR), visual speech recognition (VSR), and audio-visual ASR (AV-ASR) on the LRS3 dataset.
This PR also updates the word error rate (WER) for AV-ASR LRS3 models and improves code readability.
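For context on the WER numbers being updated: WER is the word-level edit distance (substitutions, deletions, insertions) between hypothesis and reference, normalized by the number of reference words. A minimal sketch using torchaudio's edit_distance utility (the example strings are made up):

```python
import torchaudio.functional as F

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    return F.edit_distance(ref_words, hyp_words) / len(ref_words)

print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 1/6 ≈ 0.167
```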