LibriLight dataset #1014

yfyeung · 2023-03-30T08:52:03Z

This recipe enables preparing LibriLight manifests.
Since LibriLight is unlabelled, no text field exists in SupervisionSegment.

Usage:

lhotse prepare librilight <librilight_dir> <output_dir> -j <num_jobs>

desh2608 · 2023-03-30T12:55:05Z

lhotse/recipes/librilight.py

+            recording_id=file_name,
+        )
+        recordings.append(recording)
+        segment = SupervisionSegment(


FWIW, the Recording class has a to_cut() method that creates a cut from the recording, so it may be okay to just return a RecordingSet from LibriLight (as we do with MUSAN). I guess we would lose the explicit speaker ID information though, so this is fine too.

pzelasko · Mar 30, 2023

I'm thinking it's OK to keep the supervisions for speaker and language info, but I would set the text to None (or just avoid setting it, which gives the same result).

yfyeung · Mar 31, 2023

OK, I have fixed it by not setting the text field.
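The equivalence mentioned above (leaving the text unset versus setting it to None) can be illustrated with a small stand-in. This sketch does not use lhotse itself: `Supervision` is a hypothetical dataclass that mimics only the relevant behaviour, an optional `text` field that defaults to `None`. The field names, IDs, and speaker label are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Supervision:
    """Hypothetical stand-in for a supervision record (not lhotse's class)."""

    id: str
    recording_id: str
    start: float
    duration: float
    language: Optional[str] = None
    speaker: Optional[str] = None
    text: Optional[str] = None  # unlabelled audio: no transcript available


# Omit `text` entirely ...
implicit = Supervision(
    id="utt-0001", recording_id="utt-0001", start=0.0, duration=12.5,
    language="English", speaker="spk-42",
)
# ... or pass None explicitly; both yield the same record.
explicit = Supervision(
    id="utt-0001", recording_id="utt-0001", start=0.0, duration=12.5,
    language="English", speaker="spk-42", text=None,
)
```

Speaker and language metadata survive either way, which is the reason given in the thread for keeping the supervisions rather than returning recordings only.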

desh2608 · 2023-03-30T12:55:51Z

lhotse/recipes/librilight.py

+
+def prepare_librilight(
+    corpus_dir: Pathlike,
+    output_dir: Optional[Pathlike] = None,


Since this is a large corpus, perhaps it would be useful to add a num_jobs option to speed up manifest creation? Check the LibriSpeech recipe for an example.

yfyeung · Mar 31, 2023

OK, I have implemented this.
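For readers unfamiliar with the pattern, a num_jobs option usually means fanning per-file work out to a worker pool. The following is a minimal sketch of that idea using only the standard library; it is not the code merged in this PR. `describe_file` and `build_entries` are hypothetical names, and the per-file work is a placeholder (file size instead of audio metadata).

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, List


def describe_file(path: Path) -> Dict[str, object]:
    """Hypothetical per-file worker: build a tiny manifest-like entry."""
    return {"id": path.stem, "path": str(path), "num_bytes": path.stat().st_size}


def build_entries(paths: List[Path], num_jobs: int = 1) -> List[Dict[str, object]]:
    """Process files with up to `num_jobs` workers, preserving input order."""
    with ThreadPoolExecutor(max_workers=num_jobs) as pool:
        # Executor.map yields results in the order of `paths`,
        # regardless of which worker finishes first.
        return list(pool.map(describe_file, paths))
```

Threads fit here because the work is dominated by file I/O; a process pool would be the alternative if the per-file step were CPU-bound.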

pzelasko · 2023-03-30T14:24:22Z

lhotse/recipes/librilight.py

+Libri-light is a benchmark for the training of automatic speech recognition (ASR) systems with limited or no supervision.
+It contains a large dataset of 60K hours of unlabelled speech from audiobooks in English and a small labelled dataset (10h, 1h, and 10 min) plus metrics, trainable baseline models, and pretrained models that use these datasets.
+It is covered in more detail at https://arxiv.org/abs/1912.07875.
+"""


If we're not adding download functions, could you at least provide a link to where this dataset can be obtained from?

desh2608 · 2023-03-31T15:39:26Z

lhotse/recipes/librilight.py

+    corpus_dir = Path(corpus_dir)
+    part_path = corpus_dir / subset
+    audio_paths = []
+    for root, dirs, files in os.walk(part_path):


Can this loop be turned into a one-liner with Path(part_path).rglob("*.flac")?
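To make the suggestion concrete, here is a side-by-side sketch of the two approaches. The function names are hypothetical and the loop body is an assumption, since the diff excerpt above shows only the first line of the original loop.

```python
import os
from pathlib import Path
from typing import List


def find_flac_walk(part_path: Path) -> List[Path]:
    """Collect .flac files with an explicit os.walk loop."""
    audio_paths = []
    for root, _dirs, files in os.walk(part_path):
        for name in files:
            if name.endswith(".flac"):
                audio_paths.append(Path(root) / name)
    return audio_paths


def find_flac_rglob(part_path: Path) -> List[Path]:
    """Same search as a one-liner; rglob recurses into subdirectories."""
    return list(Path(part_path).rglob("*.flac"))
```

Neither version guarantees a particular ordering, so sort the result if downstream code depends on a stable order.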

yfyeung · 2023-04-01T14:19:56Z

Please re-check the code, thx.

pzelasko

LGTM

yfy62 and others added 3 commits March 30, 2023 16:41

Add LibriLight corpus

7c60327

Fix for flake8

619fb33

Update librilight.py

ff1fdc8

desh2608 reviewed Mar 30, 2023

pzelasko reviewed Mar 30, 2023

yfy62 added 2 commits March 31, 2023 12:54

Add num_jobs

0a67c58

Fix for isort and black8

7218d34

desh2608 reviewed Mar 31, 2023

yfy62 added 2 commits April 1, 2023 22:02

Refactor loop

4363b4b

Fix for black

1ec76ac

pzelasko approved these changes Apr 1, 2023

pzelasko merged commit a4d9430 into lhotse-speech:master Apr 1, 2023

pzelasko added this to the v1.14 milestone Apr 1, 2023

yfyeung deleted the librilight branch April 2, 2023 06:15

lifeiteng mentioned this pull request Apr 14, 2023

full Libri-light dataset lifeiteng/vall-e#83

Open
