13 Feb 19:57

pzelasko

769c273

v1.21 - Glaciology

What's Changed

This release patches lhotse to handle cases where libsox is not available for torchaudio. The audio backend code went through an additional round of refactoring, and libsndfile is now the preferred default because it decoded audio faster in our testing. Going forward, when LHOTSE_AUDIO_BACKEND is set, we will use the same backend for audio loading, audio saving, and reading audio metadata (if possible). This release also adds support for Python 3.12 and PyTorch 2.2.
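As a minimal sketch of the LHOTSE_AUDIO_BACKEND behavior described above, the variable can be set from Python before lhotse is imported. The variable name comes from these notes; the helper `force_audio_backend` is hypothetical, and the value "LibsndfileBackend" is an assumption that should be checked against the backend names your installed version accepts. Setting it before the import is a precaution, not a documented requirement.

```python
import os


def force_audio_backend(name: str) -> None:
    """Request a single audio backend through the environment (hypothetical helper)."""
    # Assumption: `name` is a backend name that the installed Lhotse recognizes.
    os.environ["LHOTSE_AUDIO_BACKEND"] = name


# Assumption: "LibsndfileBackend" is a valid value; verify it for your version.
force_audio_backend("LibsndfileBackend")

# Import lhotse only after this point so the setting is visible when it loads.
```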

Add VAD to Supervisions in LibriLight Recipe by @yfyeung in #1280
Fixes for manifest validation and fixing by @pzelasko in #1284
Handle error with cachedir creation gracefully by @pzelasko in #1287
AudioBackend specific save_audio and info, managing missing SoX in torchaudio, Python 3.12 / PyTorch 2.2 support, using libsndfile as preferred audio backend by @pzelasko in #1288

Full Changelog: v1.20...v1.21

Contributors

pzelasko and yfyeung

31 Jan 20:51

pzelasko

v1.20

455b20e

v1.20 - Pining for the Fjords

What's Changed

New features

Extended the subset of lhotse that works without installing torchaudio by @pzelasko in #1253 #1255
Ensure drop_last=False always returns an equal number of mini-batches by re-distributing and/or duplicating some data by @pzelasko in #1277
Improved CPU memory usage and shuffling + bucketing in DynamicBucketingSampler by @pzelasko in #1276
Enable seed randomization in dynamic samplers by @pzelasko in #1278
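The drop_last=False change above can be pictured with a small sketch. This is not Lhotse's implementation; it only illustrates one way to give every rank the same number of mini-batches by re-using a few batches from the start of the epoch. The function `equalize_batches` and its parameters are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def equalize_batches(batches: Sequence[T], world_size: int) -> List[List[T]]:
    """Distribute batches over ranks so that every rank gets the same count.

    Hypothetical illustration: when the batch count is not divisible by
    world_size, batches from the beginning are duplicated to fill the gap.
    """
    if world_size < 1:
        raise ValueError("world_size must be >= 1")
    if not batches:
        return [[] for _ in range(world_size)]
    per_rank = -(-len(batches) // world_size)  # ceiling division
    total = per_rank * world_size
    # Wrap around to the start to duplicate as many batches as needed.
    padded = [batches[i % len(batches)] for i in range(total)]
    # Round-robin assignment: rank r takes items r, r + world_size, ...
    return [padded[rank::world_size] for rank in range(world_size)]


per_rank_batches = equalize_batches(["b0", "b1", "b2", "b3", "b4"], world_size=2)
```

With five batches and two ranks, each rank receives three batches and one batch is seen twice, which keeps distributed training steps in lockstep at the cost of a small amount of duplicated data.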

Recipes

Fluent Speech Commands dataset, SLU task by @HSTEHSTEHSTE in #1272

Other improvements

Update docs with env vars used by Lhotse by @pzelasko in #1252
support whisper large v3; deepspeed launcher rank world_size setting by @yuekaizhang in #1260
Fix non-deterministic tests by @pzelasko in #1261
Fix duplication issues in CutSet.mix() by @pzelasko in #1268
Support controllable CutSet.mux weights in multiprocess dataloading by @pzelasko in #1266
Fix distributed sampler initialization and exceeded sampler warning false positives by @pzelasko in #1270
Install kaldi-native-io explicitly in the kaldi doc example. by @csukuangfj in #1275
Allow duplicate cut IDs in a CutSet (CutSet is list-like instead of dict-like) by @pzelasko in #1279

New Contributors

@HSTEHSTEHSTE made their first contribution in #1272

Full Changelog: v1.19...v1.20

Contributors

csukuangfj, pzelasko, and 2 other contributors

02 Jan 14:58

pzelasko

v1.19

3e53b68

v1.19 - The Iceberger

What's Changed

Features

Support for OPUS encoding in Lhotse Shar format by @pzelasko in #1238
Perform CutSet.mix() lazily by @pzelasko in #1244
CutSampler.map() for transforming CutSet mini-batches by @pzelasko in #1246
Support multiplexing with a limited number of open streams by @pzelasko in #1248
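To illustrate the idea behind multiplexing with a limited number of open streams, here is a generic sketch in plain Python. It is not Lhotse's code: `mux_limited`, `max_open_streams`, and `seed` are hypothetical names, and the uniform random choice between open streams is an assumption made for simplicity.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def mux_limited(
    sources: List[Iterable[T]], max_open_streams: int, seed: int = 0
) -> Iterator[T]:
    """Randomly interleave items while keeping at most N sources open at once."""
    if max_open_streams < 1:
        raise ValueError("max_open_streams must be >= 1")
    rng = random.Random(seed)
    pending = list(reversed(sources))  # pop() then opens sources in original order
    active: List[Iterator[T]] = []
    while pending or active:
        # Open new sources only while we are below the limit.
        while pending and len(active) < max_open_streams:
            active.append(iter(pending.pop()))
        idx = rng.randrange(len(active))
        try:
            yield next(active[idx])
        except StopIteration:
            # The chosen source is exhausted; close it to free a slot.
            active.pop(idx)


items = list(mux_limited([[1, 2], [3, 4], [5]], max_open_streams=2))
```

The third source is opened only after one of the first two has been exhausted, so memory and file handles stay bounded regardless of how many sources are listed.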

Recipes

support icmc eval track 1 by @yuekaizhang in #1235
updating the voxpopuli recipe by @vesis84 in #1243
Allowing downloading Edin. ver. of VCTK by @JinZr in #1247

Other improvements

Micro-optimization for LazyJsonlIterator len() by @pzelasko in #1237
Drop python3.7 support by @pzelasko in #1245
Fix normalize_loudness for MixedCuts with PaddingCuts by @pzelasko in #1249

Full Changelog: v1.18...v1.19

Contributors

KarelVesely84, pzelasko, and 2 other contributors

11 Dec 14:10

pzelasko

v1.18

78b3a12

v1.18 - The Ice Age

What's Changed

New features

MMS forced alignment backend by @flyingleafe in #1185
Two new options: CutSet.from_shar(seed="trng") and DynamicCutSampler(quadratic_duration=...) by @pzelasko in #1199
Faster initialization option in DynamicBucketingSampler + various fixes by @pzelasko in #1210
CLI to estimate and print bucket bins for a cut set by @pzelasko in #1214
More flexible setting of audio backends by @pzelasko in #1219

Recipes

Add recipe for Medical Corpus by @yfyeung in #1212
minor fix for the AMI recipe by @JinZr in #1178
fixes compatibility with Edin. ver. VCTK dataset by @JinZr in #1182
Minor bug fix for eval2000 recipe by @JinZr in #1127
support far field data for icmcasr challenge by @yuekaizhang in #1189
fixed text norm for tal_csasr by @JinZr in #1198 #1213

Other improvements

MixedCut.truncate: fix the case when only PaddingCuts are left by @flyingleafe in #1157
Fix some potential problems in OPUS file reading by @yangb05 in #1181
fix an issue where 404 exception leaves 0 byte placeholder by @JinZr in #1190
Prevent accidental renaming when using with_suffix by @chiiyeh in #1192
Fix shar export for num_jobs>1 and recordings with transforms by @pzelasko in #1196
fix speaker error by @yzmyyff in #1197
Fix for trim_to_alignments issue by @desh2608 in #1193
Add deterministic_rng to more flaky tests by @pzelasko in #1200
update_recipes by @vesis84 in #1208
SpeechSynthesisDataset returns speaker_ids by @JinZr in #1206
Fix audio backend selection by @pzelasko in #1216
save sdm files into a single mdm file to do gss by @yuekaizhang in #1221
Modify SpeechSynthesisDataset class, make it return text by @yaozengwei in #1205
Allow lhotse installation without torchaudio for a limited set of features by @pzelasko in #1231
Use attacut module for Thai word tokenization (in MMS forced alignment) by @flyingleafe in #1232

New Contributors

@yangb05 made their first contribution in #1181
@chiiyeh made their first contribution in #1192
@yzmyyff made their first contribution in #1197
@yaozengwei made their first contribution in #1205

Full Changelog: v1.17...v1.18

Contributors

flyingleafe, desh2608, and 9 other contributors

08 Oct 23:31

pzelasko

v1.17

9c80a1e

v1.17 - Swirling Ice Pick

What's Changed

New supported datasets

Speech to text translation utilizing 3-way data by @AmirHussein96 in #1099
"This American Life" dataset recipe by @flyingleafe in #1140
Add VoxConverse recipe by @flyingleafe in #1142
Add recipe for ICASSP2024 ICMC-ASR Grand Challenge by @yfyeung in #1172

New features

Initial support for video by @pzelasko in #1151
copy_data: copy CutSet + its data to a new location by @pzelasko in #1130
Add whisper feature extractor by @yuekaizhang in #1159
VAD workflow with Silero by @rilshok in #1160

Enhancements and fixes

Fix feature extraction for lhotse shar CLI by @pzelasko in #1123
Add m4a to special cases for num samples determination by @pzelasko in #1124
making the kaldi import more robust by @vesis84 in #1129
Tutorial materials in main readme page by @pzelasko in #1133
optimize save_audios() by @vesis84 in #1131
Fix bugs in resumable_download by @flyingleafe in #1135
Arxiv badge by @desh2608 in #1136
Fix docs build by @pzelasko in #1137
Fix failing tests after repairing docs build by @pzelasko in #1138
Remove deprecated code, make minor cleanups by @pzelasko in #1139
Enforce deterministic RNG behavior in repeatedly flaky tests by @pzelasko in #1143
Refactor audio.py into smaller modules by @pzelasko in #1144
Fix broken save_audio by @flyingleafe in #1147
Optimize cut_into_windows for long cuts by @flyingleafe in #1150
Fixes for #1152 #1153 and #1154 by @pzelasko in #1156
fix bugs in downloading voxpopuli corpus by @DongjiGao in #1165
Support export_to_kaldi on resampled recordings by @sih4sing5hong5 in #1162
Refactor CutSet.describe to enable parallel statistics computation by @pzelasko in #1168
Allow dashes in feat CLI by @desh2608 in #1169
Apply deterministic RNG to more unit tests by @pzelasko in #1173
Add fix_manifests in all recipes by @desh2608 in #1128
Fix small bug in eval2000 by @desh2608 in #1126
Fix download in LibriCSS recipe by @desh2608 in #1148

New Contributors

@sih4sing5hong5 made their first contribution in #1162
@rilshok made their first contribution in #1160

Full Changelog: v1.16...v1.17

Contributors

flyingleafe, desh2608, and 8 other contributors

11 Aug 19:36

pzelasko

v1.16

aa073f6

v1.16 - Mountain Warming

What's Changed

Recipes

New:

Add speech translation corpus MuST-C by @csukuangfj in #1079
Extend LibriTTS recipe to support LibriTTS-R by @pzelasko in #1082
SURT dataset by @desh2608 in #951
[Recipe] VoxPopuli by @desh2608 in #1089
Air Traffic Control (ATC) corpora - various improvements 2 by @rouseabout in #1090
Add Bengali.AI Speech corpus for Kaggle Research Code Competition by @yfyeung in #1108
Support AudioMNIST by @csukuangfj in #1093

Improvements:

Add multithread to peoples_speech by @yfyeung in #1078
Add multiprocess mechanism for Common Voice by @yfyeung in #1025
text normalization for aishell4 by @JinZr in #1103
updated text norm for aishell2 by @JinZr in #1104
updated text norm for magicdata by @JinZr in #1105
PR for the KeSpeech recipe by @JinZr in #1106
Small changes in some existing recipes by @desh2608 in #1110

New features

CutSet.from_files constructor for random order multi-file cutsets by @pzelasko in #1085
Infinite random-file random-line stateless sampler by @pzelasko in #1102
StatelessSampler: remove TRNG, make base_seed a mandatory parameter by @pzelasko in #1109

Other improvements

Loudness fix by @desh2608 in #1087
Update SpecAug custom state_dict to be compatible with PyTorch by @osadj in #1091
Adding log spectrogram by @Tomiinek in #1094
add user defined kaldi feature type by @ZihanLiao in #1101
Update cuts.rst by @taras-sereda in #1112
Add alignment scores from CTM by @desh2608 in #1116
Fix and enhance TIMIT recipe by @flyingleafe in #1072
Fixing shar reader assert by @Tomiinek in #1117
Support del (cut|supervision).custom_field by @pzelasko in #1113
Exposing tolerance for matching supervisions with features in Kaldi imported data dirs by @pzelasko in #1119
Some additional options for supervision-related methods by @desh2608 in #1115

New Contributors

@osadj made their first contribution in #1091
@ZihanLiao made their first contribution in #1101
@taras-sereda made their first contribution in #1112

Full Changelog: v1.15...v1.16

Contributors

flyingleafe, csukuangfj, and 9 other contributors

27 May 00:20

pzelasko

v1.15

3071ade

v1.15 - Magmatic Fractionation

What's Changed

Resumable dataset downloads by @pzelasko in #1045
AMI beamformed mic option by @desh2608 in #1048
Add options to prepare data according to CHiME-7 by @desh2608 in #1051
Support Shar export of multi-channel, multi-source recording and cuts with start>0 by @pzelasko in #1053
Add mono_downmix option for MultiCut.to_mono() by @desh2608 in #1052
Support audio duration mismatch tolerance in MixedCut.load_audio() by @pzelasko in #1054
AudioCache: caching for "command" type of audio files by @vesis84 in #1050
Small changes in some cut methods by @desh2608 in #1059
Enhancements and bug fixes for AMI and ICSI by @desh2608 in #1058
Add GigaST corpus by @yfyeung in #1062
Air Traffic Control (ATC) corpora by @rouseabout in #1061
Fix resumable_download for fully downloaded files by @flyingleafe in #1060
Fix for audio loading optimization to return the expected number of samples by @pzelasko in #1071
Support preparing almost all NSC data part except PART3_SameBoundaryMic by @trunglebka in #1066
Fix bugs in MixedCut logic by @flyingleafe in #1073
Air Traffic Control (ATC) corpora - various improvements by @rouseabout in #1070

New Contributors

@rouseabout made their first contribution in #1061
@flyingleafe made their first contribution in #1060

Full Changelog: v1.14...v1.15

Contributors

flyingleafe, desh2608, and 5 other contributors

27 Apr 01:00

pzelasko

v1.14

2eff8bc

v1.14 - Curiously Delicious Snowflakes

What's Changed

New features

Add CLIs for creating Lhotse Shar directories and computing features by @pzelasko in #1042
Integrate torchaudio's 2.0 ffmpeg backend for audio loading + add some optimizations by @pzelasko in #1043
- (note: with PyTorch 2.0 set the following env var: TORCHAUDIO_USE_BACKEND_DISPATCHER=1)
Loudness normalization with pyloudnorm by @desh2608 in #1016
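The note about TORCHAUDIO_USE_BACKEND_DISPATCHER above can be applied from Python as well as from the shell. The sketch below only sets the variable named in these notes; the helper `enable_torchaudio_dispatcher` is hypothetical, and setting the variable before importing torchaudio is a precaution rather than a documented requirement.

```python
import os


def enable_torchaudio_dispatcher() -> None:
    """Set the env var from the release note (hypothetical helper)."""
    os.environ["TORCHAUDIO_USE_BACKEND_DISPATCHER"] = "1"


enable_torchaudio_dispatcher()

# Import torchaudio and lhotse only after this point.
```

The shell equivalent is to export the same variable before launching the Python process.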

Recipes

New

LibriLight dataset by @yfyeung in #1014
EDACC recipe by @pzelasko in #1022
[Recipe] BUT Reverb DB by @desh2608 in #1028

Improvements

Aishell3 speaker, gender, and tone labels by @zjwang21 in #1027
Small fix for speechcommands.py by @yfyeung in #1012
Minor fix in LibriCSS recipe by @desh2608 in #1021
Add parts and num_jobs options for tedlium by @desh2608 in #1030

Other enhancements

Bump version to 1.14.0.dev and fix .dev version suffix handling by @pzelasko in #1010
Update recording id in the supervision after applying WPE by @desh2608 in #1015
Specify what formats are expected when using the pipe: prefix to load cuts by @srdecny in #1013
Function to drop alignments from cut by @desh2608 in #1019
Fix bug in lazy CutSet subset with last by @desh2608 in #1023
Fix LoudnessNormalization by @lifeiteng in #1029
API to enable/disable ffmpeg-torchaudio by @desh2608 in #1032
Ensure RIR has same sampling rate as audio by @desh2608 in #1037
Add transforms attribute for MixedCut by @desh2608 in #1035
Fix #1038 and #1039 by @pzelasko in #1040

New Contributors

@srdecny made their first contribution in #1013

Full Changelog: v1.13...v1.14

Contributors

lifeiteng, desh2608, and 4 other contributors

23 Mar 14:26

pzelasko

v1.13

23a7922

v1.13 - Local Freezing

What's Changed

New tutorials

Lhotse Shar tutorial notebook by @pzelasko in #1006

Recipes

New

CSJ: Faithful Manifest by @teowenshen in #940
himia dataset by @glynpu in #991
Speech Commands v0.01 & v0.02 dataset by @yfyeung in #996
Aishell3 by @zjwang21 in #998

Fixes

add dataset-parts argument to libritts by @lifeiteng in #956
Add option to create segments for LibriCSS by @desh2608 in #961
fix tal_csasr data pre-processing by @KajiMaCN in #975
Fix wrong Common Voice parsing by @trunglebka in #979
add the download function of commonvoice command line interface by @manbaaaa in #968
store LJSpeech normalized text by @lifeiteng in #988
LJSpeech strip normalized text by @lifeiteng in #992

New features

Optional quadratic duration correction for dynamic bucketing sampler by @pzelasko in #950
Tentative lhotse --> kaldi manifests conversion for multiple channels by @popcornell in #962
Add RecordingChunkIterableDataset by @pzelasko in #985
Python 3.11 support by @pzelasko in #866
(cut|recording).dereverb_wpe() API + more stable numpy version by @pzelasko in #1000
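The quadratic duration correction listed above can be sketched as a penalty that grows with the square of a cut's duration, so long cuts count for more than their raw length when batches are filled. The formula below, d + d^2 / q, is an assumption for illustration and may not match the sampler's implementation exactly; `effective_duration` is a hypothetical name.

```python
from typing import Optional


def effective_duration(
    duration: float, quadratic_duration: Optional[float] = None
) -> float:
    """Return the duration used for batch-size accounting.

    Assumed form: duration + duration**2 / quadratic_duration.
    With quadratic_duration=None, no correction is applied.
    """
    if quadratic_duration is None:
        return duration
    return duration + duration**2 / quadratic_duration


short_cut = effective_duration(2.0, quadratic_duration=20.0)
long_cut = effective_duration(10.0, quadratic_duration=20.0)
```

Under this assumed form, a smaller quadratic_duration penalizes long cuts more strongly, which is useful when a model's cost grows quadratically with input length.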

General improvements

Release v1.12, bump dev version to 1.13.0.dev by @pzelasko in #945
Batch extraction for kaldi features by @desh2608 in #947
Fix features_lens for rare failure cases by @pzelasko in #953
add 'decode_options' to annotate_with_whisper by @Joemgu7 in #954
Preserve custom field when converting MultiCut to MonoCuts by @pzelasko in #957
Fix libritts dataset-parts by @lifeiteng in #960
Handling of -1 in the segments file by @JinZr in #952
Bug in trim_to_supervision_groups + tolerance for overspans by @desh2608 in #963
Fix for OnTheFlyFeatures with batched inference by @pzelasko in #965
Allow dash in SequentialJsonlWriter by @desh2608 in #967
Support move_to_memory for MixedCut and PaddingCut by @pzelasko in #970
Add new method MixedCut.to_mono() by @pzelasko in #973
Add multiprocessing to meeting simulation workflow by @desh2608 in #972
Kaldi-import: floor wav duration to milliseconds by @vesis84 in #971
Fix bug in computing same speaker pause distribution by @desh2608 in #974
Add padding direction when using transform ExtraPadding by @marcoyang1998 in #980
make sure the ProcessPoolExecutor executor uses spawn context by @jtrmal in #982
Small fix for example in class SupervisionSegment by @yfyeung in #994
Changing devices in Fbank by @Tomiinek in #999
Fix for issues #1001 by @yfyeung in #1002
Fix typo in https://github.com/lhotse-speech/lhotse/blob/master/lhotse/cut/mixed.py line 76 by @yfyeung in #1003
Add cache for KaldiReader by @david20181 in #1004
Fix load_kaldi_data_dir not loading segments and feats.scp correctly by @yasumori in #1005
Fix minor bug in conversational meeting simulation algorithm by @desh2608 in #1007

New Contributors

@Joemgu7 made their first contribution in #954
@lifeiteng made their first contribution in #956
@vesis84 made their first contribution in #971
@KajiMaCN made their first contribution in #975
@marcoyang1998 made their first contribution in #980
@manbaaaa made their first contribution in #968
@yfyeung made their first contribution in #994
@zjwang21 made their first contribution in #998
@david20181 made their first contribution in #1004
@yasumori made their first contribution in #1005

Full Changelog: v1.12...v1.13

Contributors

lifeiteng, desh2608, and 17 other contributors

17 Jan 00:41

pzelasko

v1.12

c33345d

v1.12 - Spicy Yak

What's Changed

Fix TypeError triggered when downloading the AMI dataset using the CLI by @JinZr in #919
AliMeeting recipe enhancement by @desh2608 in #909
fixing alignments by @lubacien in #914
Add room and source RNG seeds option for reverb_rir by @desh2608 in #920
Fix serialization for reverb by @desh2608 in #927
Add trim_to_alignments() method by @desh2608 in #926
Remove negative duration segments from whisper by @desh2608 in #928
Add trim_to_supervision_groups method by @desh2608 in #930
Fix the use of deprecated np.float in numpy>=1.24 by @pzelasko in #936
Remove 'NonPositiveEnergyError' exception when mix audio by @drawfish in #922
minor update for recipes.utils.read_manifests_if_cached by @trunglebka in #932
small pad fix by @lubacien in #934
Unit test that Shar reader is working when shard tar files are named randomly by @pzelasko in #937
[workflow] Multi-talker meeting simulation by @desh2608 in #929
Safe extract for more recipes by @desh2608 in #941
Sampler.filter() preserves previous filters on multiple calls by @pzelasko in #944
Batched feature extraction for s3prl by @desh2608 in #942

New Contributors

@lubacien made their first contribution in #914
@trunglebka made their first contribution in #932

Full Changelog: v1.11...v1.12

Contributors

drawfish, desh2608, and 4 other contributors

Releases: lhotse-speech/lhotse
