v1.2 - Winter in the South
New Recipes
- Adding lhotse recipe to prepare eval2000 data by @GoVivace in #679
- adding Earnings-21 dataset from rev-dot-com by @jtrmal in #709
- Adding the second revdotcom's earnings corpus by @jtrmal in #713
- MGB2 recipe by @AmirHussein96 in #718
What's Changed
- Fix import namespaces by @pzelasko in #698
- .repeat(..., preserve_id=...) option for repeating manifests by @pzelasko in #699
- Kaldi impex: remove invalid test by @jtrmal in #700
- Minor fix in base url for AliMeeting download by @desh2608 in #702
- [aidatatang_200zh] Avoid being converted to ASCII when preparing manifest by @luomingshuang in #703
- [ali_meeting] Fix some path errors for ali_meeting.py by @luomingshuang in #705
- [ali_meeting] Avoid being converted to ASCII by @luomingshuang in #704
- Test for webdataset data de-duplication across ranks by @pzelasko in #706
- Fixing data duplication with WebDataset in multi-node multi-worker training by @pzelasko in #707
- Fix epoch setting for WebDataset shard shuffling by @pzelasko in #708
- Full shard shuffling with webdataset by @pzelasko in #711
- Raise an error when BucketingSampler is used with a lazy CutSet by @pzelasko in #710
- Normalize output path names for recipes by @desh2608 in #712
- [webdataset] Add shard of origin to Cut.shard_origin custom field by @pzelasko in #714
- Update examples of combining datasets with RoundRobinSampler and add stop_early option. by @pzelasko in #716
- pre-commit, isort + CI checks + running it on all code by @pzelasko in #720
New Contributors
- @GoVivace made their first contribution in #679
- @AmirHussein96 made their first contribution in #718
Full Changelog: v1.1...v1.2