"This American Life" dataset recipe #1140

flyingleafe · 2023-09-12T10:13:57Z

From here:

"This dataset consists of transcripts for 663 podcasts from the This American Life radio program from 1995 to 2020, covering 637 hours of audio (57.7 minutes per conversation) and an average of 18 unique speakers per conversation.

We hope that this dataset can serve as a new benchmark for the difficult tasks of speech transcription, speaker diarization, and dialog modeling on long, open-domain, multi-speaker conversations."

The website has been updated since the transcripts were published, so I wrote a simple URL scraper that works with the new website.
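For readers unfamiliar with the idea, here is a minimal sketch of what a "simple URL scraper" for audio links can look like. It is an illustration only, not the code from this PR. It assumes the episode pages link their audio as `.mp3` files through `<a href>` or `<source src>` tags. The real website's markup is not described in this thread, so the class and function names (`AudioLinkParser`, `find_audio_urls`) and the `example.com` URLs are hypothetical. It uses only the standard library (`html.parser`, `urllib.parse`).

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlparse


class AudioLinkParser(HTMLParser):
    """Hypothetical helper: collects absolute URLs of .mp3 files linked from one HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.audio_urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Assumption: audio is linked via <a href="..."> or <source src="..."> tags.
        if tag not in ("a", "source"):
            return
        for name, value in attrs:
            if name not in ("href", "src") or not value:
                continue
            # Check the path only, so query strings like "?dl=1" don't hide the extension.
            if urlparse(value).path.lower().endswith(".mp3"):
                url = urljoin(self.base_url, value)  # resolve relative links
                if url not in self.audio_urls:  # keep first occurrence, drop duplicates
                    self.audio_urls.append(url)


def find_audio_urls(html: str, base_url: str) -> List[str]:
    """Return the de-duplicated, absolute .mp3 URLs found in `html`."""
    parser = AudioLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.audio_urls


if __name__ == "__main__":
    page = (
        '<a href="/audio/1.mp3">Episode 1</a>'
        '<audio><source src="https://cdn.example.com/2.mp3"></audio>'
        '<a href="/about">About</a>'
    )
    print(find_audio_urls(page, "https://www.example.com/episode/1"))
    # ['https://www.example.com/audio/1.mp3', 'https://cdn.example.com/2.mp3']
```

Parsing with `HTMLParser` rather than a regular expression keeps the sketch dependency-free while still handling attribute quoting and relative links correctly.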

I also mirrored the archive from Kaggle to IPFS, so that it can be downloaded automatically without logging in with a Kaggle account.
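A rough sketch of why an IPFS mirror removes the login step: content on IPFS is addressed by a content identifier (CID), and any public HTTP gateway serves it at a path of the form `/ipfs/<CID>`, so a plain HTTP GET is enough. This is not the recipe's actual download code. The CID of the mirrored archive is not given in this thread, so `bafyEXAMPLE` below is a placeholder. The gateway default (`https://ipfs.io`) and the helper names `ipfs_gateway_url` and `download_from_ipfs` are assumptions for illustration.

```python
import urllib.request
from pathlib import Path
from typing import Union


def ipfs_gateway_url(cid: str, filename: str = "", gateway: str = "https://ipfs.io") -> str:
    """Hypothetical helper: build a path-style gateway URL, e.g. https://ipfs.io/ipfs/<CID>/<file>."""
    url = f"{gateway.rstrip('/')}/ipfs/{cid}"
    if filename:
        url += "/" + filename.lstrip("/")
    return url


def download_from_ipfs(
    cid: str,
    target_path: Union[str, Path],
    filename: str = "",
    gateway: str = "https://ipfs.io",
    force: bool = False,
) -> Path:
    """Hypothetical helper: fetch an IPFS object over plain HTTP (no account or API key needed)."""
    target = Path(target_path)
    if target.exists() and not force:
        return target  # already downloaded; skip the network request
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(ipfs_gateway_url(cid, filename, gateway), str(target))
    return target


if __name__ == "__main__":
    # Placeholder CID; the real one for the mirrored archive is not stated here.
    print(ipfs_gateway_url("bafyEXAMPLE", "archive.zip"))
    # https://ipfs.io/ipfs/bafyEXAMPLE/archive.zip
```

Because the CID is derived from the content itself, the same archive can be fetched from any gateway, and a downloader could verify it was not altered; the `force` flag only mirrors the usual "skip if already present" behaviour of download helpers.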

flyingleafe · 2023-09-12T10:41:18Z

@pzelasko ^ weird that every single CI action is currently successful, but GitHub nevertheless says some are not...

pzelasko

Looks good! Don't worry about Codecov; it's just for information purposes and doesn't need to pass.

Before we merge, can you add an entry in docs/corpus.rst? Thanks!

pzelasko · 2023-09-13T00:34:11Z

Oh, you also need to import the recipe in lhotse/recipes/__init__.py

flyingleafe · 2023-09-13T05:40:50Z

@pzelasko done

pzelasko

Thanks!

flyingleafe added 3 commits September 12, 2023 10:09

Add This American Life recipe

2e7c353

run pre-commit

f0cf813

Fix tests for python 3.7

d1eb482

pzelasko reviewed Sep 12, 2023


pzelasko added this to the v1.17 milestone Sep 12, 2023

Docs, import, absolute paths for recordings

418f4ac

pzelasko approved these changes Sep 13, 2023


pzelasko merged commit de3f48e into lhotse-speech:master Sep 13, 2023
