Releases: lhotse-speech/lhotse
v1.29.0 - Potion of Everlasting Vigor
What's Changed
Recipes
- Recipe for the Chinese Dysarthric Speech Database by @JinZr in #1423
- Optimized ReazonSpeech download speed using hf datasets features by @yuta0306 in #1434
New features
- Option to save audio in the original format when exporting to shar by @anteju in #1422
- CutSet.from_huggingface_dataset() for importing HF datasets by @pzelasko in #1433
- Extend AIStore serialization backend to writing by @pzelasko in #1435
Other improvements
- change max_frames to max_duration in docs by @pengzhendong in #1419
- add opensmile url by @pengzhendong in #1424
- File reading IO refactoring into backends by @pzelasko in #1421
- Fix .m4a support in some setups (possibly for other formats not supported by libsndfile) by @racoiaws in #1427
- add to_dict for CustomFieldMixin class by @pengzhendong in #1426
- Fix consecutive same sampler selection in round robin sampler with num_workers>1 by @pzelasko in #1432
- Fixed copying MixedCut with custom attributes set by @pzelasko in #1436
New Contributors
Full Changelog: v1.28.0...v1.29.0
v1.28.0 - Lurking Lizard
New features
- Implement conversion from CutSet to HuggingFace dataset by @domklement in #1398
- Add workflow: annotate DNSMOS P.835 by @yfyeung in #1406
New recipes
- Add recipe for the Santa Barbara Corpus of Spoken American English (SBCSAE) by @mmaciej2 in #1395
- Adds radio data recipe by @m-wiesner in #1400
- Fleurs by @m-wiesner in #1402
- Add the Emilia corpus by @csukuangfj in #1404
What's Changed
- [spgispeech] Fix durations object is null issue by @frankyoujian in #1390
- Fix backend to None while ffmpeg is unavailable. by @pengzhendong in #1392
- Fix ksponspeech recipe by @yfyeung in #1394
- Fix cli for ksponspeech by @yfyeung in #1393
- [fix] fisher_english recipe by @pengzhendong in #1410
- downgrading sphinx version from 7.2.6 to 7.1.2 by @annapovey in #1409
- Update lhotse.py by @pengzhendong in #1414
- Make torchaudio an optional dependency by @pzelasko in #1382
- minor fix by @pengzhendong in #1418
- Support for AIStore ObjectFile resilient reading when AIStore SDK version >=1.9.1 is present
New Contributors
- @frankyoujian made their first contribution in #1390
- @pengzhendong made their first contribution in #1392
- @mmaciej2 made their first contribution in #1395
- @domklement made their first contribution in #1398
- @annapovey made their first contribution in #1409
Full Changelog: v1.27.0...v1.28.0
v1.27.0 - Crispy Momo
New recipes
- [Recipe] Wenetspeech4tts by @yuekaizhang in #1384
- [Recipe] Spatial LibriSpeech by @JinZr in #1386
Other enhancements
- Cap the 'trng' random seeds to 2**31 avoiding numpy error by @pzelasko in #1379
- CutSet.prefetch() for background cuts loading during iteration by @pzelasko in #1380
- Include a copyright NOTICE listing major copyright holders by @pzelasko in #1381
- Added has_custom to MixedCut by @anteju in #1383
- Fix to fixed batch size bucketing and audio loading network connectio… by @pzelasko in #1387
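The background loading mentioned for CutSet.prefetch() can be pictured with a generic read-ahead iterator. This is a minimal sketch of the general technique, not Lhotse's implementation: the function name `prefetch`, the `buffer_size` parameter, and the thread-plus-queue design are all assumptions made for illustration.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def prefetch(iterable: Iterable[T], buffer_size: int = 4) -> Iterator[T]:
    """Yield items from ``iterable`` while a background thread reads ahead.

    Hypothetical helper: errors raised inside the worker are not propagated,
    and a consumer that stops early leaves the daemon thread blocked.
    """
    buffer: queue.Queue = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks the end of the stream

    def worker() -> None:
        for item in iterable:
            buffer.put(item)  # blocks while the buffer is full
        buffer.put(sentinel)

    threading.Thread(target=worker, daemon=True).start()

    while True:
        item = buffer.get()  # blocks until the worker has produced something
        if item is sentinel:
            return
        yield item


# The consumer sees the same items in the same order as the source.
result = list(prefetch(range(10), buffer_size=2))
```

The bounded queue keeps memory use predictable: the worker can run at most `buffer_size` items ahead of the consumer.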
New Contributors
Full Changelog: v1.26.0...v1.27.0
v1.26.0 - Uranium Fever
v1.25.0 - Himalayan Cat
What's Changed
- [feature] Add .narrowband() effect (mulaw, lpc10 codecs) by @rouseabout in #1348
- [feature/optimization] Support for pre-determined batch sizes in DynamicBucketingSampler by @pzelasko in #1372
- [bug] Fix MixedCut transforms serialization by @pzelasko in #1370
Full Changelog: v1.24.2...v1.25.0
v1.24.2
New recipes
New features
Several new APIs for manifest classes added in #1361:
- cut.iter_data(), which iterates over (key, manifest) pairs of all data items attached to a given cut (e.g., ("recording", Recording(...)), ("custom_features", TemporalArray(...)))
- is_in_memory property for all manifest types to indicate if it contains data that is held in memory
- is_placeholder for non-cut manifests to indicate if a manifest is just a placeholder (has some metadata, but can't be used to load data)
- cut.drop_in_memory_data(), which converts manifests with in-memory data to placeholders (this is useful for manifests that live longer than just dataloading to avoid blowing up CPU memory and/or slowing down the program)
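The relationship between these APIs can be sketched with toy classes. Everything below is a hypothetical stand-in written for illustration (`ToyManifest`, `ToyCut`, and their fields are not Lhotse classes); only the method and property names mirror the list above, and their behavior here is an assumption based on the descriptions.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Iterator, Optional, Tuple


@dataclass(frozen=True)
class ToyManifest:
    """Hypothetical stand-in for a data manifest (e.g. a recording or an array)."""

    source: Optional[str] = None  # where the data could be loaded from, if anywhere
    data: Optional[bytes] = None  # payload held directly in memory, if any

    @property
    def is_in_memory(self) -> bool:
        return self.data is not None

    @property
    def is_placeholder(self) -> bool:
        # Metadata only: nothing in memory and nothing to load from.
        return self.data is None and self.source is None


@dataclass(frozen=True)
class ToyCut:
    """Hypothetical stand-in for a cut with attached data items."""

    recording: ToyManifest
    custom: Dict[str, ToyManifest] = field(default_factory=dict)

    def iter_data(self) -> Iterator[Tuple[str, ToyManifest]]:
        # Yield (key, manifest) pairs for every attached data item.
        yield "recording", self.recording
        yield from self.custom.items()

    def drop_in_memory_data(self) -> "ToyCut":
        # Turn in-memory items into placeholders; leave other items untouched.
        def drop(m: ToyManifest) -> ToyManifest:
            return replace(m, source=None, data=None) if m.is_in_memory else m

        return ToyCut(
            recording=drop(self.recording),
            custom={key: drop(m) for key, m in self.custom.items()},
        )


cut = ToyCut(
    recording=ToyManifest(data=b"\x00\x01"),  # audio bytes held in memory
    custom={"custom_features": ToyManifest(source="feats.npy")},  # loadable from disk
)
keys = [key for key, _ in cut.iter_data()]
light = cut.drop_in_memory_data()  # safe to keep around after dataloading
```

In this toy model the original `cut` is left unchanged, and `light` keeps the disk-backed item loadable while the in-memory recording becomes a placeholder.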
Bug fixes
- Restoring smart open for local files if available by @pzelasko in #1360
- Fix Recording.to_dict() when transforms are dicts and transform pickling issues by @pzelasko in #1355
- Utils for discovering attached data and dropping in-memory data by @pzelasko in #1361
- Numpy 2.0 compatibility by @pzelasko in #1362
New Contributors
Full Changelog: v1.24.1...v1.24.2
v1.24.1
v1.24 - The World's Highest Wingsuit Jump
What's Changed
New features
Notably, there's a new optimization for the dynamic bucketing sampler in multi-GPU training: it chooses the same (or the closest possible) bucket on each DDP rank to keep the training step times across ranks closer. The expected speedup depends on the model and the number of GPUs. We observed 8% and 13% speedups in two experiments compared to non-synchronized bucket selection. The new option is called sync_buckets and is enabled by default.
- Dynamic bucket selection RNG sync by @pzelasko in #1341
- Add new sampler: weighted sampler by @marcoyang1998 in #1344
- reverb_rir: support Cut input and in-memory data by @pzelasko in #1332
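The idea behind synchronized bucket selection can be illustrated with a small sketch. It is not Lhotse's implementation: `pick_buckets`, the seed values, and the bucket count are hypothetical, and the only assumption is that every DDP rank seeds its bucket-selection RNG identically so that all ranks draw the same bucket index at each step.

```python
import random
from typing import List


def pick_buckets(num_buckets: int, num_steps: int, seed: int) -> List[int]:
    """Return the bucket index chosen at each training step."""
    rng = random.Random(seed)  # dedicated RNG used only for bucket selection
    return [rng.randrange(num_buckets) for _ in range(num_steps)]


WORLD_SIZE = 4      # hypothetical number of DDP ranks
SHARED_SEED = 1234  # hypothetical seed shared by all ranks

# Synchronized: every rank seeds the selection RNG identically, so all ranks
# pick the same bucket (examples of similar duration) at each step.
synced = [pick_buckets(10, 5, SHARED_SEED) for _ in range(WORLD_SIZE)]

# Non-synchronized: rank-dependent seeds give independent choices, so one
# step may pair a short-utterance batch on one rank with a long one on another.
unsynced = [pick_buckets(10, 5, SHARED_SEED + rank) for rank in range(WORLD_SIZE)]

print(all(choices == synced[0] for choices in synced))
```

Because DDP ranks wait for each other at every gradient synchronization, the slowest rank sets the step time; matching buckets across ranks reduces the time the faster ranks spend idle.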
Recipes
Other improvements
- Missing 'subset' parameter by @daniel-dona in #1336
- Fix describe on cuts by @keeofkoo in #1340
- Use libsndfile in recording chunk dataset by @pzelasko in #1335
- Fix librispeech manifest caching by @haerski in #1343
- Fix one-off edge case in split_lazy by @pzelasko in #1347
- Increase the start diff tolerance for feature loading by @pzelasko in #1349
- More test coverage for lhotse subset by @pzelasko in #1345
New Contributors
- @keeofkoo made their first contribution in #1340
- @haerski made their first contribution in #1343
- @Triplecq made their first contribution in #1330
Full Changelog: v1.23...v1.24
v1.23 - Snowdrop
What's Changed
Recipes
- MDCC recipe by @JinZr in #1302
- Updated text_norm for aishell recipe by @JinZr in #1305
- Allow skipping missing files in AMI download by @pzelasko in #1318
- Add Chinese TTS dataset baker by @csukuangfj in #1304
- In CommonVoice corpus, use .tsv headers to parse and not column index by @daniel-dona in #1328
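Parsing by header rather than by column index, as in the CommonVoice change, makes a reader robust to columns being added or reordered between corpus versions. The following is a minimal sketch of that technique with Python's standard csv module, not the recipe's actual code; the sample rows and the `read_rows` helper are made up for illustration.

```python
import csv
import io
from typing import Dict, List

# Illustrative sample only; real CommonVoice .tsv files have more columns.
SAMPLE_TSV = (
    "client_id\tpath\tsentence\tup_votes\tdown_votes\n"
    "abc123\tclip_0001.mp3\tHello world\t2\t0\n"
)

# Same data with the columns in a different order.
REORDERED_TSV = (
    "path\tclient_id\tdown_votes\tup_votes\tsentence\n"
    "clip_0001.mp3\tabc123\t0\t2\tHello world\n"
)


def read_rows(tsv_text: str) -> List[Dict[str, str]]:
    """Parse TSV text into dicts keyed by the header row."""
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in sentences literally
    )
    return list(reader)


rows = read_rows(SAMPLE_TSV)
reordered_rows = read_rows(REORDERED_TSV)

# Fields are looked up by name, so column order does not matter.
print(rows[0]["path"], rows[0]["sentence"])
```

An index-based lookup such as `line.split("\t")[1]` would return the speaker ID instead of the clip path for the reordered file.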
Fixes to a regression in noise mixing augmentations
- Enhance CutSet.mix() randomness and data utilization by @pzelasko in #1315
- Fix randomness in CutMix transform by @pzelasko in #1316
- select a random sub-region of the noise based on the delta duration by @osadj in #1317
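Selecting a random sub-region of the noise based on the duration difference can be sketched as follows. This is an illustration of the general idea rather than the code from #1317: the function `random_noise_region` and its return convention (offset, duration) are assumptions.

```python
import random
from typing import Optional, Tuple


def random_noise_region(
    noise_duration: float,
    target_duration: float,
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    """Return (offset, duration) of the noise region to mix into the target."""
    rng = rng or random.Random()
    delta = noise_duration - target_duration
    if delta <= 0:
        # The noise is not longer than the target: use all of it from the start.
        return 0.0, noise_duration
    # The noise is longer: start anywhere that still leaves room for a full
    # target-length region, so every part of the noise can be selected.
    offset = rng.uniform(0.0, delta)
    return offset, target_duration


offset, duration = random_noise_region(10.0, 3.0, rng=random.Random(0))
print(offset, duration)
```

Always taking the region at offset 0 would reuse the same opening seconds of every long noise recording; a random offset uses the rest of the recording as well.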
Other improvements
- Add dataset for audio tagging by @marcoyang1998 in #1241
- Fix _get_strided_batch device by @lifeiteng in #1303
- Fix typo in README.md by @yfyeung in #1308
- Fix export of features/array to shar by @pzelasko in #1323
- Fix trim_to_supervision_groups by @pzelasko in #1322
New Contributors
- @daniel-dona made their first contribution in #1328
Full Changelog: v1.22...v1.23
v1.22 - Sherpa's Paradise
What's Changed
New features
As an experimental feature, we are extending the API of Lhotse samplers to enable key sampling features for non-audio data such as text. That means text (and other) data can be dynamically multiplexed and bucketed in the same way as audio data with some lightweight wrappers. Please refer to new documentation here: https://lhotse.readthedocs.io/en/latest/datasets.html#customizing-sampling-constraints
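As a rough illustration of what a sampling constraint for text could look like, the sketch below batches sentences under a padded token budget, analogous to limiting the total duration of an audio batch. It does not use Lhotse's API: `batch_by_token_budget`, whitespace tokenization, and the `max_tokens` parameter are assumptions made for the example; the linked documentation describes the actual interface.

```python
from typing import Iterable, Iterator, List


def batch_by_token_budget(texts: Iterable[str], max_tokens: int) -> Iterator[List[str]]:
    """Group texts so that (longest example * batch size) stays within max_tokens.

    A single example longer than max_tokens is still emitted, alone.
    """
    batch: List[str] = []
    longest = 0
    for text in texts:
        num_tokens = len(text.split())  # whitespace tokens, for illustration
        new_longest = max(longest, num_tokens)
        # Padded batch size if this example were added to the current batch.
        if batch and new_longest * (len(batch) + 1) > max_tokens:
            yield batch
            batch, new_longest = [], num_tokens
        batch.append(text)
        longest = new_longest
    if batch:
        yield batch


texts = ["a b", "c d e", "f", "g h i j", "k"]
batches = list(batch_by_token_budget(texts, max_tokens=6))
print(batches)
```

Counting padded tokens rather than raw tokens is one possible design choice: it tracks the size of the tensor that a padded batch would occupy.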
- Multi-channel support improvements
Lhotse MultiCuts:
- are now exportable into Lhotse Shar format
- gained a new method cut = cut.with_channels([0, 1, ...]) to modify the channels they refer to
- can have multi-channel custom Recordings with channels selectable via a special custom key (e.g., if defining cut.target_recording, audio can be read via cut.load_target_recording() and channels will be auto-selected by looking up cut.target_recording_channel_selector).
Recipes
- Add new recipe: speechio by @yuekaizhang in #1297
- tedlium2 recipe by @JinZr in #1296
Other improvements
- Use audio backends and export custom fields in Lhotse Shar by @pzelasko in #1290
- Documentation for random seeds in lhotse + extended support of lazy r… by @pzelasko in #1291
- Cutconcat fixed max duration by @swigls in #1292
- Fix feature_dim of Spectrogram extractors. by @csukuangfj in #1294
- fix whisper for multi-channel data by @yuekaizhang in #1289
- Xfail flaky SileroVAD tests by @pzelasko in #1300
New Contributors
Full Changelog: v1.21...v1.22