Releases: lhotse-speech/lhotse
v1.21 - Glaciology
What's Changed
This release patches lhotse to handle cases where libsox is not available for torchaudio. The audio backend code went through an additional round of refactoring, and libsndfile is now preferred as the default since it showed faster audio decoding performance in our testing. Going forward, when LHOTSE_AUDIO_BACKEND is set, we will use the same backend for audio loading, audio saving, and reading audio metadata (if possible). This release also adds support for Python 3.12 and PyTorch 2.2.
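As a rough illustration of the environment-variable override described above, the sketch below resolves a single backend name from LHOTSE_AUDIO_BACKEND and reuses it for loading, saving, and metadata reading. The helper names, the "default" fallback, and the backend name used when calling it are hypothetical; this is not Lhotse's actual API or its list of accepted values.

```python
import os

ENV_VAR = "LHOTSE_AUDIO_BACKEND"  # variable name taken from the release notes


def resolve_backend_name(environ=None, fallback="default"):
    """Return the backend name requested via the env var, else the fallback.

    Hypothetical helper for illustration only; not part of Lhotse.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(ENV_VAR, "").strip()
    return value or fallback


def plan_operations(environ=None):
    """Use the same resolved backend name for all three audio operations."""
    name = resolve_backend_name(environ)
    return {"load": name, "save": name, "info": name}
```

The point of the sketch is only that one setting drives all three operations, instead of each operation picking its own backend.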
- Add VAD to Supervisions in LibriLight Recipe by @yfyeung in #1280
- Fixes for manifest validation and fixing by @pzelasko in #1284
- Handle error with cachedir creation gracefully by @pzelasko in #1287
- AudioBackend specific save_audio and info, managing missing SoX in torchaudio, Python 3.12 / PyTorch 2.2 support, using libsndfile as preferred audio backend by @pzelasko in #1288
Full Changelog: v1.20...v1.21
v1.20 - Pining for the Fjords
What's Changed
New features
- Extended the subset of lhotse that works without installing torchaudio by @pzelasko in #1253 #1255
- Ensure drop_last=False always returns an equal number of mini-batches by re-distributing and/or duplicating some data by @pzelasko in #1277
- Improved CPU memory usage and shuffling + bucketing in DynamicBucketingSampler by @pzelasko in #1276
- Enable seed randomization in dynamic samplers by @pzelasko in #1278
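The drop_last=False item above mentions duplicating data so that every rank sees the same number of mini-batches. The release notes do not describe the mechanism, so the following is only a simplified stand-in: a hypothetical equalize_batches helper that pads each rank's batch list by repeating its own batches. It is not Lhotse's implementation, which also re-distributes data.

```python
from itertools import cycle, islice


def equalize_batches(per_rank_batches):
    """Pad every rank's batch list to the longest length by repeating its own batches.

    Hypothetical illustration; each rank must hold at least one batch.
    """
    target = max(len(batches) for batches in per_rank_batches)
    equalized = []
    for batches in per_rank_batches:
        if not batches:
            raise ValueError("each rank needs at least one batch to duplicate")
        # cycle() restarts from the first batch once the list is exhausted.
        equalized.append(list(islice(cycle(batches), target)))
    return equalized
```

Keeping the per-rank batch count equal matters in distributed training because a rank that runs out of batches early can stall the others at the next collective operation.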
Recipes
- Fluent Speech Commands dataset, SLU task by @HSTEHSTEHSTE in #1272
Other improvements
- Update docs with env vars used by Lhotse by @pzelasko in #1252
- support whisper large v3; deepspeed launcher rank world_size setting by @yuekaizhang in #1260
- Fix non-deterministic tests by @pzelasko in #1261
- Fix duplication issues in CutSet.mix() by @pzelasko in #1268
- Support controllable CutSet.mux weights in multiprocess dataloading by @pzelasko in #1266
- Fix distributed sampler initialization and exceeded sampler warning false positives by @pzelasko in #1270
- Install kaldi-native-io explicitly in the kaldi doc example. by @csukuangfj in #1275
- Allow duplicate cut IDs in a CutSet (CutSet is list-like instead of dict-like) by @pzelasko in #1279
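For readers unfamiliar with weighted multiplexing, the idea behind the CutSet.mux item above can be sketched in plain Python. The weighted_mux function below is hypothetical and works on ordinary iterables; it is not the CutSet.mux implementation, and it assumes all weights are positive.

```python
import random


def weighted_mux(streams, weights, seed=0):
    """Yield items from several streams, picking the source stream by weight.

    Hypothetical sketch: a stream is dropped once it is exhausted, and
    iteration ends when every stream has been consumed.
    """
    rng = random.Random(seed)
    iterators = [iter(stream) for stream in streams]
    weights = list(weights)
    while iterators:
        idx = rng.choices(range(len(iterators)), weights=weights, k=1)[0]
        try:
            yield next(iterators[idx])
        except StopIteration:
            # Remove the exhausted stream together with its weight.
            del iterators[idx]
            del weights[idx]
```

A fixed seed makes the interleaving reproducible, which is the kind of control that matters when several dataloading workers each run their own copy of the multiplexer.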
New Contributors
- @HSTEHSTEHSTE made their first contribution in #1272
Full Changelog: v1.19...v1.20
v1.19 - The Iceberger
What's Changed
Features
- Support for OPUS encoding in Lhotse Shar format by @pzelasko in #1238
- Perform CutSet.mix() lazily by @pzelasko in #1244
- CutSampler.map() for transforming CutSet mini-batches by @pzelasko in #1246
- Support multiplexing with a limited number of open streams by @pzelasko in #1248
Recipes
- support icmc eval track 1 by @yuekaizhang in #1235
- updating the voxpopuli recipe by @vesis84 in #1243
- Allowing downloading Edin. ver. of VCTK by @JinZr in #1247
Other improvements
- Micro-optimization for LazyJsonlIterator len() by @pzelasko in #1237
- Drop python3.7 support by @pzelasko in #1245
- Fix normalize_loudness for MixedCuts with PaddingCuts by @pzelasko in #1249
Full Changelog: v1.18...v1.19
v1.18 - The Ice Age
What's Changed
New features
- MMS forced alignment backend by @flyingleafe in #1185
- Two new options: CutSet.from_shar(seed="trng") and DynamicCutSampler(quadratic_duration=...) by @pzelasko in #1199
- Faster initialization option in DynamicBucketingSampler + various fixes by @pzelasko in #1210
- CLI to estimate and print bucket bins for a cut set by @pzelasko in #1214
- More flexible setting of audio backends by @pzelasko in #1219
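The quadratic_duration option listed above penalizes long cuts when batch size is measured in total duration. The release notes do not give the formula, so the sketch below assumes one plausible form, duration + duration² / quadratic_duration; the effective_duration function is a hypothetical name used only for illustration.

```python
def effective_duration(duration, quadratic_duration=None):
    """Return the duration after an assumed quadratic length penalty.

    Assumed form: duration + duration**2 / quadratic_duration.
    With quadratic_duration=None the duration is returned unchanged.
    """
    if quadratic_duration is None:
        return duration
    return duration + duration ** 2 / quadratic_duration
```

Under this assumption, a cut whose length equals quadratic_duration counts as twice its real duration, so batches of long cuts hold fewer items than a purely linear budget would allow.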
Recipes
- Add recipe for Medical Corpus by @yfyeung in #1212
- minor fix for the AMI recipe by @JinZr in #1178
- fixes compatibility with Edin. ver. VCTK dataset by @JinZr in #1182
- Minor bug fix for eval2000 recipe by @JinZr in #1127
- support far field data for icmcasr challenge by @yuekaizhang in #1189
- fixed text norm for tal_csasr by @JinZr in #1198 #1213
Other improvements
- MixedCut.truncate: fix the case when only PaddingCuts are left by @flyingleafe in #1157
- Fix some potential problems in OPUS file reading by @yangb05 in #1181
- fix an issue where 404 exception leaves 0 byte placeholder by @JinZr in #1190
- Prevent accidental renaming when using with_suffix by @chiiyeh in #1192
- Fix shar export for num_jobs>1 and recordings with transforms by @pzelasko in #1196
- fix speaker error by @yzmyyff in #1197
- Fix for trim_to_alignments issue by @desh2608 in #1193
- Add deterministic_rng to more flaky tests by @pzelasko in #1200
- update_recipes by @vesis84 in #1208
- SpeechSynthesisDataset returns speaker_ids by @JinZr in #1206
- Fix audio backend selection by @pzelasko in #1216
- save sdm files into a single mdm file to do gss by @yuekaizhang in #1221
- Modify SpeechSynthesisDataset class, make it return text by @yaozengwei in #1205
- Allow lhotse installation without torchaudio for a limited set of features by @pzelasko in #1231
- Use attacut module for Thai word tokenization (in MMS forced alignment) by @flyingleafe in #1232
New Contributors
- @yangb05 made their first contribution in #1181
- @chiiyeh made their first contribution in #1192
- @yzmyyff made their first contribution in #1197
- @yaozengwei made their first contribution in #1205
Full Changelog: v1.17...v1.18
v1.17 - Swirling Ice Pick
What's Changed
New supported datasets
- Speech to text translation utilizing 3-way data by @AmirHussein96 in #1099
- "This American Life" dataset recipe by @flyingleafe in #1140
- Add VoxConverse recipe by @flyingleafe in #1142
- Add recipe for ICASSP2024 ICMC-ASR Grand Challenge by @yfyeung in #1172
New features
- Initial support for video by @pzelasko in #1151
- copy_data: copy CutSet + its data to a new location by @pzelasko in #1130
- Add whisper feature extractor by @yuekaizhang in #1159
- VAD workflow with Silero by @rilshok in #1160
Enhancements and fixes
- Fix feature extraction for lhotse shar CLI by @pzelasko in #1123
- Add m4a to special cases for num samples determination by @pzelasko in #1124
- making the kaldi import more robust by @vesis84 in #1129
- Tutorial materials in main readme page by @pzelasko in #1133
- optimize save_audios() by @vesis84 in #1131
- Fix bugs in resumable_download by @flyingleafe in #1135
- Arxiv badge by @desh2608 in #1136
- Fix docs build by @pzelasko in #1137
- Fix failing tests after repairing docs build by @pzelasko in #1138
- Remove deprecated code, make minor cleanups by @pzelasko in #1139
- Enforce deterministic RNG behavior in repeatedly flaky tests by @pzelasko in #1143
- Refactor audio.py into smaller modules by @pzelasko in #1144
- Fix broken save_audio by @flyingleafe in #1147
- Optimize cut_into_windows for long cuts by @flyingleafe in #1150
- Fixes for #1152 #1153 and #1154 by @pzelasko in #1156
- fix bugs in downloading voxpopuli corpus by @DongjiGao in #1165
- Support export_to_kaldi on resampled recordings by @sih4sing5hong5 in #1162
- Refactor CutSet.describe to enable parallel statistics computation by @pzelasko in #1168
- Allow dashes in feat CLI by @desh2608 in #1169
- Apply deterministic RNG to more unit tests by @pzelasko in #1173
- Add fix_manifests in all recipes by @desh2608 in #1128
- Fix small bug in eval2000 by @desh2608 in #1126
- Fix download in LibriCSS recipe by @desh2608 in #1148
New Contributors
- @sih4sing5hong5 made their first contribution in #1162
- @rilshok made their first contribution in #1160
Full Changelog: v1.16...v1.17
v1.16 - Mountain Warming
What's Changed
Recipes
New:
- Add speech translation corpus MuST-C by @csukuangfj in #1079
- Extend LibriTTS recipe to support LibriTTS-R by @pzelasko in #1082
- SURT dataset by @desh2608 in #951
- [Recipe] VoxPopuli by @desh2608 in #1089
- Air Traffic Control (ATC) corpora - various improvements 2 by @rouseabout in #1090
- Add Bengali.AI Speech corpus for Kaggle Research Code Competition by @yfyeung in #1108
- Support AudioMNIST by @csukuangfj in #1093
Improvements:
- Add multithread to peoples_speech by @yfyeung in #1078
- Add multiprocess mechanism for Common Voice by @yfyeung in #1025
- text normalization for aishell4 by @JinZr in #1103
- updated text norm for aishell2 by @JinZr in #1104
- updated text norm for magicdata by @JinZr in #1105
- PR for the KeSpeech recipe by @JinZr in #1106
- Small changes in some existing recipes by @desh2608 in #1110
New features
- CutSet.from_files constructor for random order multi-file cutsets by @pzelasko in #1085
- Infinite random-file random-line stateless sampler by @pzelasko in #1102
- StatelessSampler: remove TRNG, make base_seed a mandatory parameter by @pzelasko in #1109
Other improvements
- Loudness fix by @desh2608 in #1087
- Update SpecAug custom state_dict to be compatible with PyTorch by @osadj in #1091
- Adding log spectrogram by @Tomiinek in #1094
- add user defined kaldi feature type by @ZihanLiao in #1101
- Update cuts.rst by @taras-sereda in #1112
- Add alignment scores from CTM by @desh2608 in #1116
- Fix and enhance TIMIT recipe by @flyingleafe in #1072
- Fixing shar reader assert by @Tomiinek in #1117
- Support del (cut|supervision).custom_field by @pzelasko in #1113
- Exposing tolerance for matching supervisions with features in Kaldi imported data dirs by @pzelasko in #1119
- Some additional options for supervision-related methods by @desh2608 in #1115
New Contributors
- @osadj made their first contribution in #1091
- @ZihanLiao made their first contribution in #1101
- @taras-sereda made their first contribution in #1112
Full Changelog: v1.15...v1.16
v1.15 - Magmatic Fractionation
What's Changed
- Resumable dataset downloads by @pzelasko in #1045
- AMI beamformed mic option by @desh2608 in #1048
- Add options to prepare data according to CHiME-7 by @desh2608 in #1051
- Support Shar export of multi-channel, multi-source recording and cuts with start>0 by @pzelasko in #1053
- Add mono_downmix option for MultiCut.to_mono() by @desh2608 in #1052
- Support audio duration mismatch tolerance in MixedCut.load_audio() by @pzelasko in #1054
- AudioCache: caching for "command" type of audio files by @vesis84 in #1050
- Small changes in some cut methods by @desh2608 in #1059
- Enhancements and bug fixes for AMI and ICSI by @desh2608 in #1058
- Add GigaST corpus by @yfyeung in #1062
- Air Traffic Control (ATC) corpora by @rouseabout in #1061
- Fix resumable_download for fully downloaded files by @flyingleafe in #1060
- Fix for audio loading optimization to return the expected number of samples by @pzelasko in #1071
- Support preparing almost all NSC data part except PART3_SameBoundaryMic by @trunglebka in #1066
- Fix bugs in MixedCut logic by @flyingleafe in #1073
- Air Traffic Control (ATC) corpora - various improvements by @rouseabout in #1070
New Contributors
- @rouseabout made their first contribution in #1061
- @flyingleafe made their first contribution in #1060
Full Changelog: v1.14...v1.15
v1.14 - Curiously Delicious Snowflakes
What's Changed
New features
- Add CLIs for creating Lhotse Shar directories and computing features by @pzelasko in #1042
- Integrate torchaudio's 2.0 ffmpeg backend for audio loading + add some optimizations by @pzelasko in #1043
  - (note: with PyTorch 2.0 set the following env var: TORCHAUDIO_USE_BACKEND_DISPATCHER=1)
- Loudness normalization with pyloudnorm by @desh2608 in #1016
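The env var from the PyTorch 2.0 note above can also be set from Python instead of the shell. The sketch below assumes the flag has to be in place before torchaudio is imported, which is why the import is shown only as a comment; the dispatcher_enabled variable is just an illustrative name.

```python
import os

# Set the flag first so that it is already present when torchaudio is imported.
os.environ["TORCHAUDIO_USE_BACKEND_DISPATCHER"] = "1"

# import torchaudio  # in a real script, the import would follow here

# Illustrative check that the flag is visible to this process.
dispatcher_enabled = os.environ.get("TORCHAUDIO_USE_BACKEND_DISPATCHER") == "1"
```

Setting it in the launching shell has the same effect and also covers any subprocesses started before Python code runs.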
Recipes
New
- LibriLight dataset by @yfyeung in #1014
- EDACC recipe by @pzelasko in #1022
- [Recipe] BUT Reverb DB by @desh2608 in #1028
Improvements
- Aishell3 speaker, gender, and tone labels by @zjwang21 in #1027
- Small fix for speechcommands.py by @yfyeung in #1012
- Minor fix in LibriCSS recipe by @desh2608 in #1021
- Add parts and num_jobs options for tedlium by @desh2608 in #1030
Other enhancements
- Bump version to 1.14.0.dev and fix .dev version suffix handling by @pzelasko in #1010
- Update recording id in the supervision after applying WPE by @desh2608 in #1015
- Specify what formats are expected when using the pipe: prefix to load cuts by @srdecny in #1013
- Function to drop alignments from cut by @desh2608 in #1019
- Fix bug in lazy CutSet subset with last by @desh2608 in #1023
- Fix LoudnessNormalization by @lifeiteng in #1029
- API to enable/disable ffmpeg-torchaudio by @desh2608 in #1032
- Ensure RIR has same sampling rate as audio by @desh2608 in #1037
- Add transforms attribute for MixedCut by @desh2608 in #1035
- Fix #1038 and #1039 by @pzelasko in #1040
New Contributors
Full Changelog: v1.13...v1.14
v1.13 - Local Freezing
What's Changed
New tutorials
Recipes
New
- CSJ: Faithful Manifest by @teowenshen in #940
- himia dataset by @glynpu in #991
- Speech Commands v0.01 & v0.02 dataset by @yfyeung in #996
- Aishell3 by @zjwang21 in #998
Fixes
- add dataset-parts argument to libritts by @lifeiteng in #956
- Add option to create segments for LibriCSS by @desh2608 in #961
- fix tal_csasr data pre-processing by @KajiMaCN in #975
- Fix wrong Common Voice parsing by @trunglebka in #979
- add the download function of commonvoice command line interface by @manbaaaa in #968
- store LJSpeech normalized text by @lifeiteng in #988
- LJSpeech strip normalized text by @lifeiteng in #992
New features
- Optional quadratic duration correction for dynamic bucketing sampler by @pzelasko in #950
- Tentative lhotse --> kaldi manifests conversion for multiple channels by @popcornell in #962
- Add RecordingChunkIterableDataset by @pzelasko in #985
- Python 3.11 support by @pzelasko in #866
- (cut|recording).dereverb_wpe() API + more stable numpy version by @pzelasko in #1000
General improvements
- Release v1.12, bump dev version to 1.13.0.dev by @pzelasko in #945
- Batch extraction for kaldi features by @desh2608 in #947
- Fix features_lens for rare failure cases by @pzelasko in #953
- add 'decode_options' to annotate_with_whisper by @Joemgu7 in #954
- Preserve custom field when converting MultiCut to MonoCuts by @pzelasko in #957
- Fix libritts dataset-parts by @lifeiteng in #960
- Handling of -1 in the segments file by @JinZr in #952
- Bug in trim_to_supervision_groups + tolerance for overspans by @desh2608 in #963
- Fix for OnTheFlyFeatures with batched inference by @pzelasko in #965
- Allow dash in SequentialJsonlWriter by @desh2608 in #967
- Support move_to_memory for MixedCut and PaddingCut by @pzelasko in #970
- Add new method MixedCut.to_mono() by @pzelasko in #973
- Add multiprocessing to meeting simulation workflow by @desh2608 in #972
- Kaldi-import: floor wav duration to milliseconds by @vesis84 in #971
- Fix bug in computing same speaker pause distribution by @desh2608 in #974
- Add padding direction when using transform ExtraPadding by @marcoyang1998 in #980
- make sure the ProcessPoolExecutor executor uses spawn context by @jtrmal in #982
- Small fix for example in class SupervisionSegment by @yfyeung in #994
- Changing devices in Fbank by @Tomiinek in #999
- Fix for issues #1001 by @yfyeung in #1002
- Fix typo in https://github.com/lhotse-speech/lhotse/blob/master/lhotse/cut/mixed.py line 76 by @yfyeung in #1003
- Add cache for KaldiReader by @david20181 in #1004
- Fix load_kaldi_data_dir not loading segments and feats.scp correctly by @yasumori in #1005
- Fix minor bug in conversational meeting simulation algorithm by @desh2608 in #1007
New Contributors
- @Joemgu7 made their first contribution in #954
- @lifeiteng made their first contribution in #956
- @vesis84 made their first contribution in #971
- @KajiMaCN made their first contribution in #975
- @marcoyang1998 made their first contribution in #980
- @manbaaaa made their first contribution in #968
- @yfyeung made their first contribution in #994
- @zjwang21 made their first contribution in #998
- @david20181 made their first contribution in #1004
- @yasumori made their first contribution in #1005
Full Changelog: v1.12...v1.13
v1.12 - Spicy Yak
What's Changed
- Downloading the AMI dataset using CLI will trigger TypeError by @JinZr in #919
- AliMeeting recipe enhancement by @desh2608 in #909
- fixing alignments by @lubacien in #914
- Add room and source RNG seeds option for reverb_rir by @desh2608 in #920
- Fix serialization for reverb by @desh2608 in #927
- Add trim_to_alignments() method by @desh2608 in #926
- Remove negative duration segments from whisper by @desh2608 in #928
- Add trim_to_supervision_groups method by @desh2608 in #930
- Fix the use of deprecated np.float in numpy>=1.24 by @pzelasko in #936
- Remove 'NonPositiveEnergyError' exception when mix audio by @drawfish in #922
- minor update for recipes.utils.read_manifests_if_cached by @trunglebka in #932
- small pad fix by @lubacien in #934
- Unit test that Shar reader is working when shard tar files are named randomly by @pzelasko in #937
- [workflow] Multi-talker meeting simulation by @desh2608 in #929
- Safe extract for more recipes by @desh2608 in #941
- Sampler.filter() preserves previous filters on multiple calls by @pzelasko in #944
- Batched feature extraction for s3prl by @desh2608 in #942
New Contributors
- @lubacien made their first contribution in #914
- @trunglebka made their first contribution in #932
Full Changelog: v1.11...v1.12