Releases: NVIDIA/NeMo
NVIDIA Neural Modules 2.0.0rc0
Highlights
LLM and MM
Models
- Megatron Core RETRO
  - Pre-training
  - Zero-shot Evaluation
- Pretraining, conversion, evaluation, SFT, and PEFT for:
  - Mixtral 8X22B
  - Llama 3
  - SpaceGemma
- Embedding Models Fine Tuning
  - Mistral
  - BERT
- BERT models
  - Context Parallel
  - Distributed checkpoint
- Video capabilities with NeVa
Performance
- Distributed Checkpointing
  - Torch native backend
  - Parallel read/write
  - Async write
- Multimodal LLM (LLAVA/NeVA)
  - Pipeline Parallelism support
  - Sequence packing support
Export
- Integration of Export & Deploy Modules into NeMo Framework container
- Upgrade to TRT-LLM 0.9
Speech (ASR & TTS)
Models
- AED Multi Task Models (Canary) - Multi-Task Multi-Lingual Speech Recognition / Speech Translation model
- Multimodal Domain - Speech LLM supporting SALM Model
- Parakeet-tdt_ctc-1.1b Model - RTFx of > 1500 (can transcribe 1500 seconds of audio in 1 second)
- Audio Codec 16kHz Small - NeMo Neural Audio Codec for discretizing speech for use in LLMs
- mel_codec_22khz_medium
- mel_codec_44khz_medium
Perf Improvements
- Transcribe() upgrade - Enables one line transcribe with files, tensors, data loaders
- Frame looping algorithm for RNNT faster decoding - Improves Real Time Factor (RTF) by 2-3x
- CUDA Graphs + Label-Looping algorithm for RNN-T and TDT Decoding - Transducer Greedy decoding at over 1500x RTFx, on par with CTC Non-Autoregressive models
- Semi Sorted Batching support - External User contribution that speeds up training by 15-30%.
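The RTF and RTFx figures quoted above can be read with a small worked example. This is a minimal sketch that assumes the usual definitions: RTF is processing time divided by audio duration (lower is faster), and RTFx is its inverse (higher is faster). The function names are illustrative and are not part of NeMo.

```python
def rtf(audio_seconds: float, processing_seconds: float) -> float:
    """Real-time factor: seconds of compute per second of audio (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio processed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# The example from the notes: 1500 seconds of audio transcribed in 1 second.
print(rtfx(audio_seconds=1500.0, processing_seconds=1.0))  # 1500.0

# Halving the processing time halves RTF and doubles RTFx.
print(rtf(10.0, 2.0) / rtf(10.0, 1.0))  # 2.0
```

Under these definitions, a "2-3x" RTF improvement means the same audio is decoded in one half to one third of the time.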
Customization
- Context biasing for CTC word stamping - Improve accuracy for custom vocabulary and pronunciation
- Longform Inference
  - Longform inference support for AED models
  - Transcription of multi-channel audio for AED models
Misc
- Upgraded webdataset - Speech and LLM / Multimodal unified container
Detailed Changelogs
ASR
Changelog
- Enable using hybrid asr models in CTC Segmentation tool by @erastorgueva-nv :: PR: #8828
- TDT confidence fix by @GNroy :: PR: #8982
- Fix union type annotations for autodoc+mock-import rendering by @pzelasko :: PR: #8956
- NeMo dev doc restructure by @yaoyu-33 :: PR: #8896
- Improved random seed configuration for Lhotse dataloaders with docs by @pzelasko :: PR: #9001
- Fix #8948, allow preprocessor to be stream captured to a cuda graph when doing per_feature normalization by @galv :: PR: #8964
- [ASR] Support for transcription of multi-channel audio for AED models by @anteju :: PR: #9007
- Add ASR latest news by @titu1994 :: PR: #9073
- Fix docs errors and most warnings by @erastorgueva-nv :: PR: #9006
- PyTorch CUDA allocator optimization for dynamic batch shape dataloading in ASR by @pzelasko :: PR: #9061
- RNN-T and TDT inference: use CUDA graphs by default by @artbataev :: PR: #8972
- Fix #8891 by supported GPU-side batched CTC Greedy Decoding by @galv :: PR: #9100
- Update branch for notebooks and ci in release by @ericharper :: PR: #9189
- Enable CUDA graphs by default only for transcription by @artbataev :: PR: #9196
- rename paths2audiofiles to audio by @nithinraok :: PR: #9209
- Fix ASR_Context_Biasing.ipynb contains FileNotFoundError by @andrusenkoau :: PR: #9233
- Cherrypick: Support dataloader as input to audio for transcription (#9201) by @titu1994 :: PR: #9235
- Update Online_Offline_Microphone_VAD_Demo.ipynb by @stevehuang52 :: PR: #9252
- Dgalvez/fix greedy batch strategy name r2.0.0rc0 by @galv :: PR: #9243
- Accept None as an argument to decoder_lengths in GreedyBatchedCTCInfer::forward by @galv :: PR: #9246
- Fix loading github raw images on notebook by @nithinraok :: PR: #9282
- typos by @nithinraok :: PR: #9314
- Re-enable cuda graphs in training modes. by @galv :: PR: #9338
- add large model stable training fix and contrastive loss update for variable seq by @nithinraok :: PR: #9259
- Fix conv1d package in r2.0.0rc0 by @pablo-garay :: PR: #9369
- Fix GreedyBatchedCTCInfer regression from GreedyCTCInfer. (#9347) by @titu1994 :: PR: #9350
- Make a backward compatibility for old MSDD configs in label models by @tango4j :: PR: #9377
- Force diarizer to use CUDA if cuda is available and if device=None. by @tango4j :: PR: #9380
TTS
Changelog
LLM and MM
Changelog
- Rachitg/dpa by @rachitgarg91 :: PR: #8911
- Remove precision args in trainer due to PTL update by @yaoyu-33 :: PR: #8908
- Huvu/mcore retro by @huvunvidia :: PR: #8861
- fsdp tp > 1 bug fix by @dimapihtar :: PR: #8947
- Fix memory leak at loss func by @minitu :: PR: #8868
- change the condition for get qkv tensor from linear_qkv output in mcoremixin by @HuiyingLi :: PR: #8965
- Add safety checks for 'data' key in MegatronGPTModel cfg by @HuiyingLi :: PR: #8991
- [NeMo-UX] Adding MegatronParallel by @cuichenx :: PR: #8987
- Skip top_p computations when set to 1.0 by @odelalleau :: PR: #8905
- Gemma bug by @cuichenx :: PR: #8962
- [NeMo-UX] Adding megatron strategy by @marcromeyn :: PR: #8995
- Quantized checkpoint support in export and deploy modules by @janekl :: PR: #8859
- add geglu to mlp swap by @JRD971000 :: PR: #8999
- add timeout for new_group by @acphile :: PR: #8998
- Zero-shot evaluation pipeline for mcore RETRO by @huvunvidia :: PR: #8941
- Added fusion for squared relu by @sanandaraj5597 :: PR: #8963
- Developer Documents for mcore RETRO by @huvunvidia :: PR: #9026
- [NeMo-UX] Adding GPTModel & MockDataModule by @marcromeyn :: PR: #9011
- Adding unit test for mcore RETRO model by @huvunvidia :: PR: #9022
- docs and simplification of cmd args by @arendu :: PR: #8979
- [NeMo-UX] Add checkpoint-io to MegatronStrategy by @marcromeyn :: PR: #9057
- Enable Sequence Packing and Pipeline Parallel in NeVA by @yaoyu-33 :: PR: #8957
- Mingyuanm/add back fp8 support to sd by @Victor49152 :: PR: #9070
- unfused lora by @arendu :: PR: #9004
- Handle case where num_query_groups is set to null for LoRA config setup by @vysarge :: PR: #9075
- Alit/griffin by @JRD971000 :: PR: #9021
- Implement DistributedCheckpointIO by @mikolajblaz :: PR: #9016
- Video Neva Pretraining + Inference Implementation by @paul-gibbons :: PR: #9095
- HF to .nemo for Mixtral-8x22B-instruct by @akoumpa :: PR: #9060
- mcore ds updates by @dimapihtar :: PR: #8951
- Alit/griffin perf by @JRD971000 :: PR: #9107
- Add assert for max_steps to be positive in MegatronGPTSFTModel by @athitten :: PR: #9110
- Extend sequence length padding for GPT SFT to account for context parallel by @vysarge :: PR: #8869
- Update gpt dataset config parameter for mock by @thomasdhc :: PR: #9118
- Add Mcore DistributedDataParallel and distributed optimizer into Nemo by @gdengk :: PR: #9034
- Revert "Add assert for max_steps to be positive in MegatronGPTSFTMode… by @pablo-garay :: PR: #9128
- scripts to convert HF lora to nemo by @arendu :: PR: #9102
- Prevent duplicated checkpoints by @mikolajblaz :: PR: #9015
- add TN/ITN link in speech tools list by @erastorgueva-nv :: PR: #9142
- Cleanup deprecated files and temporary changes by @cuichenx :: PR: #9088
- Use DP+CP groups as the FSDP sharding domain by @erhoo82 :: PR: #9145
- CUDA memory profile by @erhoo82 :: PR: #9096
- Fix missing func for T5 model by @gdengk :: PR: #9141
- Add knob for load_directly_on_device by @mikolajblaz :: PR: #9125
- Revert rope fusion defaults by @cuichenx :: PR: #9238
- Update nemo.export module for quantized models by @janekl :: PR: #9250
- Fix circular import for MM dataprep notebook by @cuichenx :: PR: #9287
- neva media_type + text generation default fix by @paul-gibbons :: PR: #9257
- fix lora and ptuning and isort/black by @oyilmaz-nvidia :: PR: #9290
- add check if num layers is divisible by pp size by @dimapihtar :: PR: #9208
- Fix P-tuning for Llama based models by @apanteleev :: PR: #9297
- add deprecation warnings by @pablo-garay :: PR: #9266
- move pooler under post_process by @dimapihtar :: PR: #9328
- add deprecation note for nmt by @dimapihtar :: PR: #9342
- Fix incorrect checkpoint removal logic (#9192) by @mikolajblaz :: PR: #9204
- fix fp16 precision issue by @dimapihtar :: PR: #9376
- Fix module.training for Neva in FusedAttn backward which causes nan by @yaoyu-33 :: PR: #8877
Export
Changelog
- Updates for TRT-LLM 0.9 by @oyilmaz-nvidia :: PR: #8873
- Mingyuanm/sdxl export by @Victor49152 :: PR: #8926
- Avoid unpacking NeMo checkpoints before exporting to TRT-LLM by @apanteleev :: PR: #8866
- Update gemma for trt-llm 0.9 by @oyilmaz-nvidia :: PR: #8974
- TRT-LLM export P-tuning related fixes by @apanteleev :: PR: #8863
General Improvements
Changelog
- Update package info by @ericharper :: PR: #8793
- [Nemo CICD] Update mcore 4.13.24 by @pablo-garay :: PR: #8917
- Akoumparouli/low mem mixtral ckpt converter by @akoumpa :: PR: #8895
- Adding RETRO tests to Action Tests (cicd-main.yml) by @huvunvidia :: PR: #8942
- Akoumparouli/fix sd train 2 by @akoumpa :: PR: #8883
- Update te install for jenkins by @ericharper :: PR: #8954
- [Nemo CICD] Add last job depending on others for blocking check by @pablo-garay :: PR: #8959
- Minor quantization...
NVIDIA Neural Modules 1.23.0
Highlights
Models
Nvidia Starcoder 2 - 15B
- Announcement - https://developer.nvidia.com/blog/unlock-your-llm-coding-potential-with-starcoder2/
- AI Foundation Model Inference - https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/starcoder2-15b
- https://huggingface.co/bigcode/starcoder2-15b
NeMo Canary
Announcement - https://nvidia.github.io/NeMo/blogs/2024/2024-02-canary/
NeMo LLM
- Falcon
- Code Llama
- StarCoder
- GPT perf improvements
- Context parallelism
- Mistral
- Mixtral (without expert parallelism)
- Mcore GPT Dataset integration
NeMo MM
- CLIP
- Stable Diffusion (supporting LoRA)
- Imagen
- ControlNet (for SD)
- Instruct pix2pix (for SD)
- LLAVA
- NeVA
- DreamFusion++
- NSFW filtering
NeMo ASR
- Lhotse Dataloading support #7880
- Canary: Multi task multi lingual ASR #8242
- LongForm Audio for Diarization #7737
- Faster algorithm for RNN-T Greedy #7926
- Cache-Aware streaming notebook #8296
NeMo TTS
NeMo Vision
Known Issues
ASR
RNNT WER calculation when fused batch size > 1 during validation / test step()
Previously, the RNNT metric was stateful while the CTC one was not (r1.22.0, r1.23.0), so this calculation in the RNNT joint for fused operation worked properly. However, with the unification of metrics in r1.23.0, a bug was introduced: only the last sub-batch computes scores, and the results are not accumulated across sub-batches. This is patched via #8587 and will be fixed in the next release.
Workaround: Explicitly disable fused batch size during inference using the following snippet:
from omegaconf import open_dict

model = ...  # an already-loaded RNNT model
decoding_cfg = model.cfg.decoding
# open_dict temporarily lifts struct mode so the key can be set
with open_dict(decoding_cfg):
    decoding_cfg.fused_batch_size = -1
model.change_decoding_strategy(decoding_cfg)
Note: This bug does not affect scores calculated via model.transcribe() (since it does not calculate metrics during inference, just text), or using transcribe_speech.py or speech_to_text_eval.py in examples/asr.
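To see why scoring only the last sub-batch distorts the reported WER, consider the toy illustration below. It is not NeMo's metric code: each sub-batch is reduced to a hypothetical (word_errors, reference_words) pair with made-up numbers, and the two functions contrast accumulating across sub-batches with keeping only the final one.

```python
from typing import List, Tuple

# Hypothetical per-sub-batch statistics: (word_errors, reference_words).
SubBatch = Tuple[int, int]


def wer_accumulated(sub_batches: List[SubBatch]) -> float:
    """WER over all sub-batches: total errors divided by total reference words."""
    errors = sum(e for e, _ in sub_batches)
    words = sum(w for _, w in sub_batches)
    return errors / words


def wer_last_only(sub_batches: List[SubBatch]) -> float:
    """WER when only the final sub-batch contributes (the buggy behaviour)."""
    errors, words = sub_batches[-1]
    return errors / words


# Made-up numbers: one fused batch split into three sub-batches.
sub_batches = [(6, 20), (1, 20), (0, 10)]
print(wer_accumulated(sub_batches))  # 7 errors / 50 words
print(wer_last_only(sub_batches))    # 0 errors / 10 words
```

With these numbers the accumulated WER is 7/50 while the last-sub-batch WER is 0/10, so the reported score can differ substantially from the true one in either direction.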
Two failing unit tests due to a change in expected results, caused by lhotse version update.
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:24.01.speech
Detailed Changelogs
ASR
Changelog
- Update link to yaml file in ASR_with_Transducers.ipynb by @Faith-Nchifor :: PR: #8014
- Use convert_hf_dataset_to_nemo by @karpnv :: PR: #8017
- Update asr_language_modeling.rst: Add a missing word by @martin0258 :: PR: #8007
- spelling mistake by @orena1 :: PR: #7903
- update asr eval by @stevehuang52 :: PR: #8045
- fix noise aug by @stevehuang52 :: PR: #8057
- Various fixes for typos and urls by @titu1994 :: PR: #8066
- [Fix] Increase length check tolerance to prevent test failing by @anteju :: PR: #8067
- Add text metrics to asr eval by @stevehuang52 :: PR: #8087
- fix device setting to allow using accelerator cpu by @orena1 :: PR: #8084
- .ctm in data simulator annotator compliant with RT-09 specification by @popcornell :: PR: #8004
- Fix AST eval by @stevehuang52 :: PR: #8112
- fix: numba.*_num_threads resets torch num_threads #8141 by @itzsimpl :: PR: #8145
- Update dependencies by @titu1994 :: PR: #8156
- NeMo + Lhotse integration by @pzelasko :: PR: #7880
- Speedup RNN-T greedy decoding by @artbataev :: PR: #7926
- [docker] Install k2 before NeMo for faster image rebuilding by @pzelasko :: PR: #8204
- [docs] Add --force_codec to tarred dataset creation examples by @pzelasko :: PR: #8227
- Temporarily use the previous RNN-T decoding algorithm as default by @artbataev :: PR: #8226
- Make TDT inference not require duration params by @hainan-xv :: PR: #8207
- Cache Aware Streaming tutorial notebook by @erastorgueva-nv :: PR: #8296
- fix path location and branch by @nithinraok :: PR: #8304
- Attention encoder-decoder models for multiple speech-to-text tasks … by @titu1994 :: PR: #8324
- Remove asr webapp by @titu1994 :: PR: #8347
- remove target at model level in aed model config [ASR] by @krishnacpuvvada :: PR: #8351
- Add change_vocabulary and save_tokenizers() support to Multitask ASR models by @titu1994 :: PR: #8357
- Change default beam size by @titu1994 :: PR: #8371
- adding jenkins test for speech_to_text_aed model by @krishnacpuvvada :: PR: #8368
- Add Finetuning tutorial with HF Datasets by @nithinraok :: PR: #8356
- wer fix by @tbartley94 :: PR: #8404
- add ensemble decoding fix by @nithinraok :: PR: #8427
- Update k2 by @artbataev :: PR: #8492
TTS
Changelog
- [TTS] Scale sampler steps by number of devices by @rlangman :: PR: #7947
- Add All Multimodal Source Code Part 2: Text to image, x to nerf by @yaoyu-33 :: PR: #7970
- [TTS] Add period discriminator and feature matching loss to codec recipe by @rlangman :: PR: #7884
- Added VectorQuantizer base class by @anteju :: PR: #8011
LLMS
Changelog
- Add interface to set NCCL options of each process group by @erhoo82 :: PR: #7923
- Support O2 training of PEFT and SFT by @cuichenx :: PR: #7971
- [NLP] Access scaler only in FP16 case by @janekl :: PR: #7916
- [NLP] Minor improvements in Llama conversion script by @janekl :: PR: #7978
- [NLP] Use helpers from utils_funcs.py in Llama conversion by @janekl :: PR: #7979
- [NLP] Remove replace_sampler_ddp (deprecated in Trainer) by @janekl :: PR: #7981
- Reworked MegatronPretrainingRandomBatchSampler to correctly handle epochs > 1 by @trias702 :: PR: #7920
- Remove deprecated arguments from TE's TransformerLayer by @jbaczek :: PR: #7917
- Add All Multimodal Source Code by @yaoyu-33 :: PR: #7791
- First draft of mcore bert model in NeMo by @shanmugamr1992 :: PR: #7814
- Support Falcon Variants (7B/40B/180B) in Mcore NeMo by @xuanzic :: PR: #7666
- FSDP + Tensor Parallelism by @erhoo82 :: PR: #7897
- Packed Sequence by @cuichenx :: PR: #7945
- Adding method back that was removed accidentally by @ericharper :: PR: #8038
- [NLP] ArtifactItem with init=True to make it debuggable by @janekl :: PR: #7980
- SFT patch: (1) enable sequence parallelism and (2) enable profile by @erhoo82 :: PR: #7963
- migration to PTL 2.0 for spellmapper model by @bene-ges :: PR: #7924
- Change the megatron config lr scheduler default and fix to change partitions script by @shan18 :: PR: #8094
- (1) Add SHARP interface to M-CORE, (2) use send/recv to send train loss to the first rank instead of b-cast by @erhoo82 :: PR: #7793
- Reconfigure limit_val_batches only for int by @athitten :: PR: #8099
- Fixing wrapper and moving it to base class by @shanmugamr1992 :: PR: #8055
- fix gated_linear_unit bug by @Agoniii :: PR: #8042
- Fix Adapter for MCore models by @cuichenx :: PR: #8124
- add war fix for sync issues by @gshennvm :: PR: #8130
- Improve PEFT UX by @cuichenx :: PR: #8131
- Enhance flexibility by passing callbacks as method argument by @michal2409 :: PR: #8015
- context parallelism by @xrennvidia :: PR: #7739
- Make pipelined TP comm overlap available with mcore by @erhoo82 :: PR: #8005
- remove deprecated scripts by @arendu :: PR: #8138
- adding OnlineSampleMapping by @arendu :: PR: #8137
- Add distopt support for FP8 params and BF16 optimizer state by @timmoon10 :: PR: #7909
- Revert adding OnlineSampleMapping by @pablo-garay :: PR: #8164
- Token count and sequence length logging for MegatronGPTSFTModel by @vysarge :: PR: #8136
- Use latest apex internal API by @jbaczek :: PR: #8129
- tune specific params in the base model by @arendu :: PR: #7745
- Virtual pipeline parallel support for MegatronGPTSFTModel by @vysarge :: PR: #7964
- removed deprecated peft model by @arendu :: PR: #8183
- remove more deprecated files by @arendu :: PR: #8169
- Pre-generate cu_seqlens argmin and max_seqlen to remove host-to-device sync by @erhoo82 :: PR: #8108
- Add the interface to use SHARP to FSDP strategy by @erhoo82 :: PR: #8202
- Multimodal required NLP base model changes by @yaoyu-33 :: PR: #8188
- [NLP] Improve and unify loading state_dict for community models by @janekl :: PR: #7977
- Rename Finetuning Scripts by @cuichenx :: PR: #8201
- Final multimodal PR with our recent developments on MM side by @yaoyu-33 :: PR: #8127
- Add include_text parameter to SFT dataloaders by @Kipok :: PR: #8198
- Add random_seed argument to generate by @Kipok :: PR: #8162
- Added support for neptune logger by @harishankar-gopalan :: PR: #8210
- Pre-compute max_seqlen and cu_seqlens_argmin in all model-parallel cases by @erhoo82 :: PR: #8222
- Use PackedSeqParams in accordance with changes in Megatron-LM by @cuichenx :: PR: #8205
- Fix to peft & virtual pipeline parallel unsupported check by @vysarge :: PR: #8216
- Fixed the tp overlap switch by @sanandaraj5597 :: PR: #8195
- add knobs for rope/swiglu fusion by @lhb8125 :: PR: #8184
- Added sample cpu_offloading switch to YAML by @sanandaraj5597 :: PR: #8148
- Syncing random seed between ranks in generate by @Kipok :: PR: #8230
- add first_val_step to mcore scheduler by @JimmyZhang12 :: PR: #8150
- Correct padding for SFT input data to account for sequence parallel + TE's fp8 op dimension requirements by @vysarge :: PR: #8240
- Mistral 7b conversion script by @akoumpa :: PR: #8052
- switch to mcore dataset [with FIM support] by @dimapihtar :: PR: #8149
- Mixtral to NeMo conversion script. by @akoumpa :: PR: #8155
- fixes to accomendate mcore changes by @HuiyingLi :: PR: #8261
- Allow MegatronPretrainingRandomSample...
NVIDIA Neural Modules 1.22.0
Highlights
Models
NeMo Parakeet
Announcement - https://nvidia.github.io/NeMo/blogs/2024/2024-01-parakeet/
- https://huggingface.co/nvidia/parakeet-rnnt-1.1b
- https://huggingface.co/nvidia/parakeet-ctc-1.1b
- https://huggingface.co/nvidia/parakeet-rnnt-0.6b
- https://huggingface.co/nvidia/parakeet-ctc-0.6b
NeMo Parakeet-TDT
Announcement - https://nvidia.github.io/NeMo/blogs/2024/2024-01-parakeet-tdt/
ASR
- stt_en_fastconformer_transducer_large_ls #7641
- stt_en_fastconformer_ctc_large_ls #7641
- stt_en_fastconformer_hybrid_large_streaming_multi
- stt_nl_fastconformer_hybrid_large_pc
- stt_fa_fastconformer_hybrid_large
NeMo ASR
- Multi-lookahead cache-aware streaming Conformer #6711
- Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) by @burchim #7330
- Speech enhancement tutorial #6492
- Support punctuation error rate #7538
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:23.10
Detailed Changelogs
ASR
Changelog
- Fix missing pip package 'einops' by @RobinDong :: PR: #7397
- Fix failure of installing pyaudio in Online_Offline_Speech_Commands_Demo.ipynb by @RobinDong :: PR: #7396
- [ASR] Confidence measure -> method renames by @GNroy :: PR: #7434
- RNN-T confidence and alignment bugfix by @GNroy :: PR: #7381
- Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) by @burchim :: PR: #7330
- [TTS] Read audio as int32 to avoid flac read errors by @rlangman :: PR: #7477
- Fix typos in confidence tutorial notebooks by @Kipok :: PR: #7581
- Safeguard nemo_text_processing installation on ARM by @blisc :: PR: #7485
- add fc large ls models by @nithinraok :: PR: #7641
- [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence by @GNroy :: PR: #7635
- Create per.py by @ssh-meister :: PR: #7538
- Update docs: readme, getting started, ASR intro by @erastorgueva-nv :: PR: #7679
- [ASR] Multichannel mask estimator with flex number of channels by @anteju :: PR: #7317
- Fix code block typo in docs by @erastorgueva-nv :: PR: #7717
- Replace gpus with devices by @athitten :: PR: #7743
- docs: fix typos by @shuoer86 :: PR: #7758
- Snake act by @nithinraok :: PR: #7736
- fix(clustering_diarizer.py): fix typo by @jqueguiner :: PR: #7772
- Add some docs and update scripts for ASR by @titu1994 :: PR: #7790
- remove TN from ctc_segm tut by @ekmb :: PR: #7807
- Add support for finetuning with huggingface datasets by @stevehuang52 :: PR: #7834
- Adding long-form audio speaker diarization (clustering) class and functions by @tango4j :: PR: #7737
- Fix k2 installation: update for latest PyTorch, move script to dir by @artbataev :: PR: #7887
- [ASR] GSS-based mask estimator by @anteju :: PR: #7849
- add Dutch P&C FC model info by @zhehuaichen :: PR: #7892
- Add checks for unit tests that are looking for data from CI machine by @ericharper :: PR: #7943
- update branch name by @nithinraok :: PR: #7990
- fix librosa display issue by @nithinraok :: PR: #7991
- Fixes Notebooks for ASR by @titu1994 :: PR: #7994
- cherry pick bug 4405781 by @karpnv :: PR: #8044
- fix noise augmentation by @stevehuang52 :: PR: #8056
- Fix various issues with broken links and bugs by @titu1994 :: PR: #8064
- run with non-dev option by @nithinraok :: PR: #8077
- update broken links by @nithinraok :: PR: #8079
- langid bug fix by @karpnv :: PR: #8134
TTS
Changelog
- Add steps for document of getting dataset 'SF Bilingual Speech' by @RobinDong :: PR: #7378
- Fix checking of cuda/cpu device for inputs of Decoder by @RobinDong :: PR: #7444
- Fix failure of ljspeech's get_data.py by @RobinDong :: PR: #7430
- [TTS] Fix audio codec type checks by @rlangman :: PR: #7373
- [TTS] Add dataset to path of logged artifacts by @rlangman :: PR: #7462
- Fix adding positional embeddings in-place in FFTransformerDecoder by @The0nix :: PR: #7440
- Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS by @RobinDong :: PR: #7409
- [TTS] Fix FastPitch data prep tutorial by @rlangman :: PR: #7524
- add italian tokenization by @GiacomoLeoneMaria :: PR: #7486
- Remap speakers to continuous range of speaker_id for dataset AISHELL3 by @RobinDong :: PR: #7536
- add ItalianPhonemesTokenizer by @GiacomoLeoneMaria :: PR: #7587
- [TTS] Add STFT and SI-SDR loss to audio codec recipe by @rlangman :: PR: #7468
- Fix typo in audio codec config, encoder target by @anteju :: PR: #7697
- Group-residual vector quantizer by @anteju :: PR: #7643
- French g2p with pronunciation dictionary by @mgrafu :: PR: #7601
- add pleasefixme marker for potential failed nightly tests. by @XuesongYang :: PR: #7678
- Add new text segmentation library for better TTS quality by @RobinDong :: PR: #7645
- ConditionalInput: cat along the feature dim, not the batch dim by @anferico :: PR: #7785
- Add selection criteria for reference audios in the submodule by @anferico :: PR: #7788
- [Codec] Update codec checkpoint config by @anteju :: PR: #7835
- [Codec] Finite scalar quantizer by @anteju :: PR: #7886
- Tar codec by @nithinraok :: PR: #7867
LLM
Changelog
- Allow disabling sanity checking when num_sanity_val_steps=0 by @athitten :: PR: #7413
- Add comprehensive error messages by @PeganovAnton :: PR: #7261
- layer selection for ia3 by @arendu :: PR: #7417
- Add rope dynamic linear scaling by @hsiehjackson :: PR: #7437
- Fix sft dataset truncation by @hsiehjackson :: PR: #7464
- fix bug when loading dist ckpt in peft by @lhb8125 :: PR: #7452
- Fix sft chat dataset truncation by @hsiehjackson :: PR: #7478
- SFT model parallel fix for dist ckpt by @aklife97 :: PR: #7511
- remove auto generated examples by @arendu :: PR: #7510
- Add the argument to by @odelalleau :: PR: #7264
- PEFT GPT & T5 Refactor by @meatybobby :: PR: #7308
- fix a typo by @BestJuly :: PR: #7496
- StarCoder SFT test + bump PyT NGC image to 23.09 by @janekl :: PR: #7540
- fix llama2 70b lora tuning bug by @cuichenx :: PR: #7622
- generalized chat sft prompt by @yidong72 :: PR: #7655
- Set base frequency from config by @shan18 :: PR: #7734
- Megatron LLM documentation updates by @ssh-meister :: PR: #7400
- Remove incorrect extra argument of load_from_checkpoint_dir() by @RobinDong :: PR: #7500
- Add nemo to mcore GPT conversion script by @cuichenx :: PR: #7730
- set context for text memmap to fork by @arendu :: PR: #7784
- Support flash decoding by @hsiehjackson :: PR: #7744
- update text server to support compute logprobs by @Zhilin123 :: PR: #7733
- Revert PEFT eval fix by @ericharper :: PR: #7693
- Fix tn duplex by @ekmb :: PR: #7808
- Multimodal merge by @yaoyu-33 :: PR: #7728
- Fix flash decoding precision by @hsiehjackson :: PR: #7852
- Removing duplicate Megatron-LM installation by @Davood-M :: PR: #7864
- adding special_tokens from tokenizer config for transformer-lm model by @clumsy :: PR: #7613
- Add Adapter and IA3 support for MCore models by @cuichenx :: PR: #7750
- Add back import guard by @cuichenx :: PR: #7882
- Change FP8 Defaults by @cuichenx :: PR: #7894
- Added knob for ub_tp_comm_overlap for the MCORE pass by @sanandaraj5597 :: PR: #7902
- Upgrade NeMo to latest mcore and TE by @dimapihtar :: PR: #7862
- Pad sequences to multiples of 16 for GPTSFTDataset by @vysarge :: PR: #7904
- upgrade to latest mcore and TE by @dimapihtar :: PR: #7908
- added missing torch import by @Davood-M :: PR: #7913
- Fix CPU initialization of GPT models by @cuichenx :: PR: #7889
- Fix pinned triton version by @hsiehjackson :: PR: #7925
- fix tp_overlap config var name by @xrennvidia :: PR: #7928
- only enable query key scaling during fp16 by @gshennvm :: PR: #7946
- Fix for gpt3 eval hang with PP (a dtype issue) by @yaoyu-33 :: PR: #7927
- Pass in rotary_base to mcore and from HF by @Kipok :: PR: #7933
- Use NLPDDPStrategyNotebook in Multitask_Prompt_and_PTuning.ipynb by @athitten :: PR: #8061
General Improvements
Changelog
- Add fix for max time to quit trainer gracefully, without running validation by @SeanNaren :: PR: #7731
- SDE Tutorial minor fix by @Jorjeous :: PR: #7598
- Temporary pin Lightning-Utilities version due to broken NamedTuple by @artbataev :: PR: #8022
- Karpnv/issue 7320 by @karpnv :: PR: #7418
- Speech Simulator, update README.md: output_path --> output_manifest_filepath by @popcornell :: PR: #7442
- Fix None dataloader issue in PTL2.0 by @KunalDhawan :: PR: #7455
- HF StarCoder to NeMo conversion script by @janekl :: PR: #7421
- [doc] fix broken link by @stas00 :: PR: #7481
- dllogger - log on rank 0 only by @stas00 :: PR: #7513
- Add two youtube introductory videos to README and Docs. by @XuesongYang :: PR: #7570
- defaults changed by @arendu :: PR: #7600
- Bound transformers version in requirements by @athitten :: PR: #7620
- Fix import error no module name model_utils by @menon92 :: PR: #7629
- Fix in the confidence ensemble test by @Kipok :: PR: #7682
- move core install to /workspace by @aklife97 :: PR: #7706
- distributed checkpoint average script by @yidong72 :: PR: #7721
- fix hybrid eval by @karpnv :: PR: #7757
- fix(diarization-README): typo by @jqueguiner :: PR: #7771
- Configure MCore logger by @mikolajblaz :: PR: #7781
- Nemo to HF converter for LLaMA model by @uppalutkarsh :: PR: #7770
- [Fix] Save best NeMo model only when necessary by @anteju :: PR: #7836
- add guard if its a distributed checkpoint b...
NVIDIA Neural Modules 1.21.0
Highlights
Models
NeMo ASR
- Multi-lookahead cache-aware streaming
- Speech enhancement tutorial #6492
- Online code switching dataset #6579
NeMo TTS
- AudioCodec: Training recipe for EnCodec #6852
NeMo Framework
NeMo Core
- Update to PTL 2.0 #6433
NeMo Tools
- Forced aligner tutorial #7210
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:23.08
ASR
Changelog
- Fix require_grad typos by @kit1980 :: PR: #6930
- rnnt_greedy_decoding.py: typos? auto-repressively -> auto-regressively by @vadimkantorov :: PR: #6989
- Adding tutorial for confidence ensembles by @Kipok :: PR: #6932
- Add support for Numba FP16 RNNT Loss by @titu1994 :: PR: #6991
- fix install_beamsearch_decoders by @karpnv :: PR: #7011
- rnnt and char utils by @karpnv :: PR: #6971
- ASR Confidence update and tutorial by @GNroy :: PR: #6810
- st standalone model by @AlexGrinch :: PR: #6969
- Fix typo in ASR-TTS tutorial by @artbataev :: PR: #7049
- Update Frame-VAD doc and fix onnx export by @stevehuang52 :: PR: #7076
- Fast Conformer global token fix by @sam1373 :: PR: #7085
- Added script to extract ASR CTC and RNNT models from ASR hybrid models by @trias702 :: PR: #7092
- Fix absolute path in path join call by @kingjan1999 :: PR: #7099
- NeMo ASR Demo by @lleaver :: PR: #7110
- Fix plot function in vad_utils.py by @stevehuang52 :: PR: #7113
- Fixed small bug with NoisePerturbationWithNormalization by @trias702 :: PR: #7118
- Merge release r1.20.0 to main by @ericharper :: PR: #7167
- minor fix for conformer subsampling docstring. by @XuesongYang :: PR: #7195
- [ASR] Fix GPU memory leak in transcribe_speech.py by @rlangman :: PR: #7249
- Adding Multilingual, Code-Switched, and Hybrid ASR models by @KunalDhawan :: PR: #7250
- fix partial transcribe by @stevehuang52 :: PR: #7284
- Conv1d subsampling by @burchim :: PR: #7294
- add bf16 inference support and fix seq_len stft issue by @nithinraok :: PR: #7338
- Add finetuning scripts by @nithinraok :: PR: #7263
- Move parameter: trainer -> exp_manager (for PTL 2.0) by @artbataev :: PR: #7339
- Fix typos by @omahs :: PR: #7361
- Fix wrong calling of librosa.get_duration() in notebook by @RobinDong :: PR: #7376
- RNN-T confidence and alignment bugfix (#7381) by @GNroy :: PR: #7459
- update branch by @nithinraok :: PR: #7488
- Replace strategy = None with strategy = auto for notebooks by @athitten :: PR: #7521
- Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue by @KunalDhawan :: PR: #7531
- gpus -> devices by @nithinraok :: PR: #7542
- [BugFix] Add missing quotes for auto strategy in tutorial notebooks by @athitten :: PR: #7541
- Append output of val_step to self.validation_step_outputs in EncMaskDecAudioToAudioModel by @athitten :: PR: #7543
- fix validation_step_outputs initialization for multi-dataloader by @KunalDhawan :: PR: #7546
- Append val/test output to instance variable in EncDecSpeakerLabelModel by @athitten :: PR: #7562
- update strategy by @nithinraok :: PR: #7577
- Typo fixes by @Kipok :: PR: #7591
- Fix metrics for SE tutorial by @anteju :: PR: #7604
- fix ssl models ptl monitor val through logging by @nithinraok :: PR: #7608
- Fix py3.11 dataclasses issue by @titu1994 :: PR: #7582
- bugfix: trainer.gpus, trainer.strategy, trainer.accelerator by @XuesongYang :: PR: #7621
- Safeguard nemo_text_processing installation on ARM (#7485) by @blisc :: PR: #7619
- [ASR] Fix type error in jasper by @rlangman :: PR: #7636
- Fix vad & speech command tutorial - onnx by @fayejf :: PR: #7671
- Replace strategy='dp'/None with 'auto' by @athitten :: PR: #7681
- Fix multi rank finetune for ASR by @titu1994 :: PR: #7684
- fix ptl_bugs in slu_models.py by @jzi040941 :: PR: #7689
- Add NLPDDPStrategyNotebook and change trainer gpus to devices by @athitten :: PR: #7741
- Updated installation of ctc-decoders by @vsl9 :: PR: #7746
- Fix bug wrt change decoding strategy for bpe models by @titu1994 :: PR: #7762
TTS
Changelog
- [TTS] Add cosine distance option to TTS aligner by @rlangman :: PR: #6806
- [TTS] Add tutorial for TTS data prep scripts by @rlangman :: PR: #6922
- update TTS readme by @XuesongYang :: PR: #7088
- [TTS] Create EnCodec training recipe by @rlangman :: PR: #6852
- [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. by @XuesongYang :: PR: #6893
- [TTS] Add output audio format to preprocessing by @rlangman :: PR: #6889
- [TTS] Remove nested TTS configs by @rlangman :: PR: #7154
- [TTS] Fix TTS recipes with PTL 2.0 by @rlangman :: PR: #7188
- [TTS] Add license to ported EnCodec code by @rlangman :: PR: #7197
- [Fix] Discriminator update in AudioCodecModel by @anteju :: PR: #7209
- Adapter ipa Tutorial and config update by @styagi130 :: PR: #7260
- [TTS] Audio codec fixes by @rlangman :: PR: #7266
- [TTS] minor fix typos and input_types by @XuesongYang :: PR: #7272
- specify explicitly to set pretrained model paths by @styagi130 :: PR: #7305
- [TTS] Update AudioCodec API by @anteju :: PR: #7310
- [TTS] Add additional config to preprocess_text and compute_feature_stats by @rlangman :: PR: #7321
- [TTS] Change audio codec token type to TokenIndex by @rlangman :: PR: #7356
- fixed trainer.strategy=auto from None. by @XuesongYang :: PR: #7369
- [TTS] Added a callback for logging initial data by @anteju :: PR: #7384
- [TTS] bugfix: trainer.accelerator=auto from None. by @XuesongYang :: PR: #7492
- bugfix: specify trainer.strategy=auto when devices=1 by @XuesongYang :: PR: #7509
- Fix dimensionality in get_dist function by @redoctopus :: PR: #7506
- Fix TTS FastPitch tutorial by @hsiehjackson :: PR: #7494
- [TTS] remove curly braces from in Jupyter notebook cell. by @XuesongYang :: PR: #7554
- [TTS] fixed trainer's accelerator and strategy. by @XuesongYang :: PR: #7569
- Change hifigan finetune strategy to ddp_find_unused_parameters_true by @hsiehjackson :: PR: #7579
- Fix validation in G2PModel and ThutmoseTaggerModel by @athitten :: PR: #7597
- [TTS] Fix FastPitch data prep tutorial by @rlangman :: PR: #7602
- [TTS] Add dataset to path of logged artifacts by @rlangman :: PR: #7651
NLP / NMT
Changelog
- Minor MPT-7B fixes and creation script update by @trias702 :: PR: #6982
- remove hard coded input and output fields by @arendu :: PR: #7008
- RoPE length extrapolation with interpolation by @MaximumEntropy :: PR: #7005
- add async + distopt to sft by @MaximumEntropy :: PR: #7018
- ptuning inference table bug fix by @arendu :: PR: #7015
- Fix missing import for GPT SFT by @MaximumEntropy :: PR: #7026
- Add end_strings to SamplingParams by @markelsanz14 :: PR: #6986
- Fix race condition for downloading cache when executing with multi-node by @findkim :: PR: #7016
- added back the retro documents. by @yidong72 :: PR: #7033
- remove pos emb from state dict for old models by @ekmb :: PR: #7068
- memmap worker arg by @arendu :: PR: #7062
- Disable distopt contiguous param buffer by default by @timmoon10 :: PR: #7095
- [Fix] load_state_dict in nlp_model.py by @stevehuang52 :: PR: #7086
- Fix tokenizer file caching where torch.distributed may not be initialized yet by @findkim :: PR: #7061
- freeze base mode on init during peft by @arendu :: PR: #7152
- Include the scripts for preprocessing OAST and unit tests for chat sft datasets by @yidong72 :: PR: #7112
- T5 metrics fix by @jubick1337 :: PR: #7037
- megatron gpt training fix by @anmolgupt :: PR: #7199
- Fix T5 using FA by @hsiehjackson :: PR: #7196
- fix-causal-fa-infer by @hsiehjackson :: PR: #7200
- Fix gpt trainer test by @hsiehjackson :: PR: #6915
- Load ub_cfg from hydra config by @jbaczek :: PR: #7003
- Fixes for lightning 2.0 upgrade by @athitten :: PR: #7176
- Fix which was off by one batch by @odelalleau :: PR: #7212
- Start using ModelParallelConfig from Megatron Core by @ericharper :: PR: #6885
- deprecation warning by @arendu :: PR: #7193
- Fix attention mask inference by @hsiehjackson :: PR: #7213
- Use GPTModel from mcore by @ericharper :: PR: #7093
- Add bf16-mixed and 16-mixed in module.py by @athitten :: PR: #7227
- Refactor LLM pretraining examples by @maanug-nv :: PR: #7159
- Add only trainable parameters to optimizer group in PEFT by @guyueh1 :: PR: #7230
- Dummy class for ModelParallelConfig by @ericharper :: PR: #7254
- [TN][Docs] update language coverage matrix and refs by @mgrafu :: PR: #7247
- tied weights for adapters by @arendu :: PR: #6928
- Fix skip generation by @hsiehjackson :: PR: #7270
- Hidden transforms model parallel config + CI with Perceiver by @michalivne :: PR: #7241
- Fix restore sequence parallel by @hsiehjackson :: PR: #7273
- fix ptuning and lora model_parallel_config by @blahBlahhhJ :: PR: #7287
- Fix adapters and ptuning for amp O2 by @guyueh1 :: PR: #7285
- remove additional line in peft state dict by @blahBlahhhJ :: PR: #7293
- loss mask aware final layer application by @arendu :: PR: #7275
- Adding server option to peft eval by @Davood-M :: PR: #7292
- migrated class CSVFieldsMemmapDataset from BioNeMo by @dorotat-nv :: PR: #7314
- remove old prompt table for storing cached ptuning representations by @arendu :: PR: #7295
- Bugfix and optimization in by @odelalleau :: PR: #7267
- Set a default value when getting by @yaox12 :: PR: #7115
- Distributed checkpointing with mcore GPT by @ericharper :: PR: #7116
- Fix activation checkpoint by @hsiehjackson :: PR: #7334
- Replace prefetch with val iterator check in megatron models by @athitten :: PR: #7318
- Fixing indentation bug in indexed_dataset memory d...
NVIDIA Neural Modules 1.20.0
Highlights
Models
- STT En Fast Conformer CTC XXLarge - 1.2 B param Fast Conformer CTC
- STT En Fast Conformer Transducer XXLarge - 1.2 B param Fast Conformer Transducer
- STT En Fast Conformer Transducer XLarge - XLarge Fast Conformer English
- STT En Fast Conformer CTC XLarge - XLarge Fast Conformer CTC
- STT En Fast Conformer Transducer XLarge - XLarge Fast Conformer Transducer
- STT En Fast Conformer CTC Large - Large Fast Conformer CTC
- STT En Fast Conformer Transducer Large - Large Fast Conformer Transducer
- STT It Fast Conformer Hybrid Large P&C - Large P&C Italian Fast Conformer
- STT Ua Fast Conformer Hybrid Large P&C - Large Ukrainian Fast Conformer
NeMo ASR
- Graph-RNN-T #6168
- WildCard-RNN-T #6168
- Confidence Ensembles for ASR
- Token-and-Duration Transducer (TDT) #6536
- Spellchecking ASR #6179
- Numba FP16 RNNT Loss #6991
NeMo TTS
- TTS Adapter Customization
- TTS Dataloader Framework
NeMo Framework
- LoRA for T5 and mT5 #6612
- Flash Attention integration #6666
- Mosaic 7B compatibility
- Models with LongContext (32K) #6666, #6687, #6773
NeMo Tools
- Speech Data Explorer: Utterance level ASR model comparison #6669
- Speech Data Processor: Spanish P&C
- NeMo Forced Aligner: Large sequence alignment + memory reduction #6695
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:23.06
Detailed Changelogs
ASR
Changelog
- [ASR] Adding ssl config for fast-conformer by @krishnacpuvvada :: PR: #6672
- Fix for interctc test random failure by @Kipok :: PR: #6644
- sharded manifests docs by @bmwshop :: PR: #6751
- [TTS] Implement new vocoder dataset by @rlangman :: PR: #6670
- TDT model pull request by @hainan-xv :: PR: #6536
- Spec aug fix by @tbartley94 :: PR: #6775
- Support large inputs to Conformer and Fast Conformer by @bmwshop :: PR: #6556
- sharded manifests updated docs by @bmwshop :: PR: #6833
- added fc-xl, xxl and titanet-s models by @nithinraok :: PR: #6832
- Multi-lookahead cache-aware streaming models by @VahidooX :: PR: #6711
- Update transcribe_utils.py by @stevehuang52 :: PR: #6865
- Fix k2 build topo helper by @artbataev :: PR: #6887
- Fix transcribe_utils.py for hybrid models in partial transcribe mode by @stevehuang52 :: PR: #6899
- Add hybrid model support to transcribe_speech_parallel.py by @stevehuang52 :: PR: #6906
- Update Frame-VAD doc by @stevehuang52 :: PR: #6902
- Make sure asr_model.change_attention_model is run if either cfg.model_path or cfg.pretrained_name is specified by @erastorgueva-nv :: PR: #6908
- Update fvad doc by @stevehuang52 :: PR: #6920
- Online Code Switching Dataset for ASR by @trias702 :: PR: #6579
- Fix AN4 dataset links by @artbataev :: PR: #6926
- Fix confidence ensembles RNNT logprobs selection logic for exclude_blank scenario by @KunalDhawan :: PR: #6937
- Adding cache-aware streaming ASR checkpoints. by @VahidooX :: PR: #6940
- Remove from metrics by @titu1994 :: PR: #6979
- Hybrid conformer export by @borisfom :: PR: #6983
- Cache handling without input tensors mutation by @borisfom :: PR: #6980
- Fixing an issue with confidence ensembles by @Kipok :: PR: #6987
- Add ASR with TTS Tutorial. Fix enhancer usage. by @artbataev :: PR: #6955
- fix install_beamsearch_decoders.sh by @karpnv :: PR: #7019
- Add support for Numba FP16 RNNT Loss (#6991) by @titu1994 :: PR: #7038
- Fix typo and branch in tutorial by @artbataev :: PR: #7048
- Refined export_config by @borisfom :: PR: #7053
- Fix documentation for Numba by @titu1994 :: PR: #7065
- Adding docs and models for multiple lookahead cache-aware ASR by @VahidooX :: PR: #7067
- Add updated fc ctc and rnnt xxl models by @nithinraok :: PR: #7128
- Update notebook branch by @ericharper :: PR: #7135
- Fixed main and merging this to r1.20 by @tango4j :: PR: #7127
- Fix default context size by @nithinraok :: PR: #7141
- Fix incorrect embedding grads with distopt BF16 grad reductions by @timmoon10 :: PR: #6958
TTS
Changelog
- [TTS] Add callback for saving audio during FastPitch training by @rlangman :: PR: #6665
- [TTS] Add script for text preprocessing by @rlangman :: PR: #6541
- [TTS] Fix adapter duration issue by @hsiehjackson :: PR: #6697
- [TTS] Filter out silent audio files during preprocessing by @rlangman :: PR: #6716
- [TTS] fix inconsistent type hints for IpaG2p by @XuesongYang :: PR: #6733
- [TTS] relax hardcoded prefix for phonemes and tones and infer phoneme set through dict by @XuesongYang :: PR: #6735
- [TTS] corrected misleading deprecation warnings. by @XuesongYang :: PR: #6702
- Fix TTS adapter tutorial by @hsiehjackson :: PR: #6741
- [TTS][zh] refine hardcoded lowercase for ASCII letters. by @XuesongYang :: PR: #6781
- [TTS] Append pretrained FastPitch & SpectrogramEnhancer pair to available models by @racoiaws :: PR: #7012
NLP / NMT
Changelog
- minor fix for missing chat attr by @arendu :: PR: #6671
- eval fix by @arendu :: PR: #6685
- VP Fixes for converter + Config management by @titu1994 :: PR: #6698
- lora notebook by @arendu :: PR: #6765
- peft eval directly from ckpt by @arendu :: PR: #6785
- GPT inference long context by @ekmb :: PR: #6687
- Fix validation with drop_last=False by @mikolajblaz :: PR: #6704
- fix spellmapper tutorial, change branch to main by @bene-ges :: PR: #6803
- text_generation_utils memory reduction if no logprob needed by @yzhang123 :: PR: #6773
- Add optional index mapping dir in mmap text datasets by @gheinrich :: PR: #6683
- Add inference kv cache support for transformer TE path by @yen-shi :: PR: #6627
- add reference to our paper by @bene-ges :: PR: #6821
- added changes to ramp up bs by @dimapihtar :: PR: #6799
- t5 lora tuning by @arendu :: PR: #6612
- Added rouge monitoring support for T5 by @jubick1337 :: PR: #6737
- GPT extrapolatable position embedding (xpos/sandwich/alibi/kerple) and Flash Attention by @hsiehjackson :: PR: #6666
- Import Enum for chatbot component by @ericharper :: PR: #6877
- typo fix from #6666 by @arendu :: PR: #6882
- removed unnecessary print by @dimapihtar :: PR: #6884
- Fix destructor for delayed mmap dataset case by @mikolajblaz :: PR: #6703
- Make Gradio library optional by @yidong72 :: PR: #6904
- Fix fast-glu activation in change partitions by @hsiehjackson :: PR: #6909
- Documentation for ONNX export of Megatron Models by @asfiyab-nvidia :: PR: #6914
- FixTextMemMapDataset index file creation in multi-node setup by @gheinrich :: PR: #6768
- Fix flash-attention by @hsiehjackson :: PR: #6901
- ptuning oom fix by @arendu :: PR: #6916
- add rampup bs assertion by @dimapihtar :: PR: #6927
- Enable methods in bert-like models by @sararb :: PR: #6898
- support value attribution condition by @yidong72 :: PR: #6934
- Add missing save restore connector to eval scripts by @titu1994 :: PR: #6935
- Merge release r1.19.0 into main by @ericharper :: PR: #6948
- Stop at the stop token by @yidong72 :: PR: #6957
- fixes for spellmapper by @bene-ges :: PR: #6994
- Fix tabular data text generation by @yidong72 :: PR: #7022
- fix pos id - hf update by @ekmb :: PR: #7075
- fix syntax error introduced in PR-7079 by @bene-ges :: PR: #7102
NeMo Tools
Bugfixes
Changelog
- small Bugfix by @fayejf :: PR: #7079
- Fix caching bug in causal convolutions for cache-aware ASR models by @VahidooX :: PR: #7034
- Fix masking bug for TTS Aligner by @redoctopus :: PR: #6677
- [bugfix] avoid the random shuffle of phoneme and tone tokens. by @XuesongYang :: PR: #6855
- fix ptuning residuals bug by @arendu :: PR: #6866
- TE bug fix by @dimapihtar :: PR: #7027
- Update distopt API for coalesced NCCL calls by @timmoon10 :: PR: #6886
General Improvements
Changelog
- update batch size recommendation to min 32 for 43b by @Zhilin123 :: PR: #6675
- Make Note usage consistent in adapter_mixins.py by @BrianMcBrayer :: PR: #6678
- Update all invalid tree references to blobs for NeMo samples by @BrianMcBrayer :: PR: #6679
- Update README.rst about container by @fayejf :: PR: #6686
- karpnv/issues6690 by @karpnv :: PR: #6705
- Limit codeql scope by @titu1994 :: PR: #6710
- Not pinning Gradio version by @yidong72 :: PR: #6680
- preprocess squad in sft format by @arendu :: PR: #6727
- Fix Codeql config by @titu1994 :: PR: #6731
- Fix fastpitch test nightly by @hsiehjackson :: PR: #6730
- Lora/PEFT training script CI test by @arendu :: PR: #6664
- fixed decor to show messages only when the wrapped object is called. by @XuesongYang :: PR: #6793
- lora pp2 by @arendu :: PR: #6818
- Upperbound Numpy to < 1.24 by @titu1994 :: PR: #6829
- Fix typo in documentation by @Dounx :: PR: #6838
- NFA updates by @erastorgueva-nv :: PR: #6695
- Update container for import action by @ericharper :: PR: #6883
- removed some tests by @arendu :: PR: #6900
- Update contai...
NVIDIA Neural Modules 1.19.1
This release is a small patch to fix torchmetrics.
- Remove deprecated arg compute_on_step. See #6979.
NVIDIA Neural Modules 1.19.0
Highlights
NeMo ASR
- Sharded Manifests for Tarred Datasets #6395
- Frame-VAD model + datasets support #6441
- Noise Norm Perturbation #6445
- Code Switched Dataset with IID Sampling #6448
NeMo TTS
NeMo Megatron
- Batch size rampup #6424
- Unify dataset and model classes for all PEFT #6391
- LoRA for GPT #6391
- Convert interleaved pipeline model to non-interleaved #6498
- Dialog Dataset for SFT #6654
- Dynamic length batches for GPT SFT #6510
- Merge LoRA weights into base model #6597
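As an illustration of the batch size rampup item above, the following sketch computes the batch size to use after a given number of consumed samples. It follows the three-value convention (start size, increment, ramp-up samples) used by Megatron-style trainers; whether the implementation in #6424 matches it exactly is an assumption, and the function and parameter names are hypothetical.

```python
def rampup_batch_size(consumed_samples, start, increment, rampup_samples, global_batch_size):
    """Return the batch size for the current point in a linear ramp-up.

    Hypothetical sketch; names and exact rounding are assumptions, not NeMo's API.
    """
    if global_batch_size <= start or increment <= 0:
        raise ValueError("need global_batch_size > start and increment > 0")
    if (global_batch_size - start) % increment != 0:
        raise ValueError("global_batch_size - start must be a multiple of increment")

    # Once the ramp-up budget is consumed, use the full batch size.
    if consumed_samples >= rampup_samples:
        return global_batch_size

    # Spread the increments evenly over the ramp-up samples.
    num_increments = (global_batch_size - start) // increment
    samples_per_increment = rampup_samples / num_increments
    steps = int(consumed_samples / samples_per_increment)
    return min(start + steps * increment, global_batch_size)
```

With start=32, increment=32, rampup_samples=900 and global_batch_size=128, the batch size grows in three steps of 32 and reaches 128 once 900 samples have been consumed.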
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:23.04
Detailed Changelogs
ASR
Changelog
- Sharded manifests for tarred datasets by @bmwshop :: PR: #6395
- Update script for ngram rnnt and hat beam search decoding by @andrusenkoau :: PR: #6370
- Add disclaimer about dataset for ASR by @titu1994 :: PR: #6496
- New noise_norm perturbation based on Riva work by @trias702 :: PR: #6445
- Add Frame-VAD model and datasets by @stevehuang52 :: PR: #6441
- removing unnecessary avoid_bfloat16_autocast_context by @bmwshop :: PR: #6481
- FC models in menu by @bmwshop :: PR: #6473
- Separate punctuation by whitespace by @karpnv :: PR: #6574
- Cherry pick commits in #6601 to main by @fayejf :: PR: #6611
- Offline and streaming inference support for hybrid model by @fayejf :: PR: #6570
- Disable interctc tests by @Kipok :: PR: #6638
- ASR-TTS Models: Support hybrid RNNT-CTC, improve docs. by @artbataev :: PR: #6620
- Confidence ensembles implementation by @Kipok :: PR: #6614
- Confidence ensembles: fix issues and add tuning functionality by @Kipok :: PR: #6657
- Add support for RNNT/hybrid models to partial transcribe by @stevehuang52 :: PR: #6609
- eval_beamsearch_ngram.py with hybrid ctc by @karpnv :: PR: #6656
TTS
Changelog
- [TTS] FastPitch adapter fine-tune and conditional layer normalization by @hsiehjackson :: PR: #6416
- [TTS] whitelist broken path fix. by @XuesongYang :: PR: #6412
- [TTS] FastPitch speaker encoder by @hsiehjackson :: PR: #6417
- Update NeMo_TTS_Primer.ipynb by @pythinker :: PR: #6436
- [TTS] Create functions for TTS preprocessing without dataloader by @rlangman :: PR: #6317
- [TTS] Fix FastPitch energy code by @rlangman :: PR: #6511
- [TTS] Add script for computing feature stats by @rlangman :: PR: #6508
- [TTS] Add tutorials for FastPitch TTS speaker adaptation with adapters by @hsiehjackson :: PR: #6431
- [TTS] Create initial TTS dataset feature processors by @rlangman :: PR: #6507
- [TTS] Add script for mapping speaker names to indices by @rlangman :: PR: #6509
- [TTS] Implement new TextToSpeech dataset by @rlangman :: PR: #6575
NLP / NMT
Changelog
- Add patches for Virtual Parallel conversion by @titu1994 :: PR: #6589
- Update wfst_text_normalization.rst by @jimregan :: PR: #6374
- add rampup batch size support for Megatron GPT by @dimapihtar :: PR: #6424
- Add interleaved pp support by @titu1994 :: PR: #6498
- Support dynamic length batches with GPT SFT by @aklife97 :: PR: #6510
- Framework for PEFT via mixins by @arendu :: PR: #6391
- Add GPT eval mode fix for interleaved to main (#6449) by @aklife97 :: PR: #6610
- sft model can use this script for eval by @arendu :: PR: #6637
- Patch memory used for NeMo Megatron models by @titu1994 :: PR: #6615
- merge lora weights into base model by @arendu :: PR: #6597
- Dialogue dataset by @yidong72 :: PR: #6654
- check for first or last stage by @ericharper :: PR: #6708
- A few small typo fixes by @Kipok :: PR: #6599
- Lddl bert by @wdykas :: PR: #6761
- Debug Transformer Engine FP8 support with Megatron-core infrastructure by @timmoon10 :: PR: #6740
- Tensor-parallel communication overlap with userbuffer backend by @erhoo82 :: PR: #6780
- Add ub communicator initialization to validation step by @erhoo82 :: PR: #6807
- Add trainer.validate example for GPT by @ericharper :: PR: #6794
- Add API docs for NeMo Megatron by @ericharper :: PR: #6850
- Apply garbage collection interval to validation steps by @erhoo82 :: PR: #6870
Bugfixes
Changelog
- [BugFix] Force _get_batch_preds() to keep logits in decoder timestamps generator by @tango4j :: PR: #6499
- small bugfix for asr_evaluator by @fayejf :: PR: #6636
- fix bucketing bug issue for picking new bucket by @nithinraok :: PR: #6663
- [TTS] Fix TTS audio preprocessing bugs by @rlangman :: PR: #6628
- Fix a bug, use _ceil_to_nearest instead as _round_to_nearest is not d… by @BestJuly :: PR: #6681
- Bug fix to restore act ckpt by @markelsanz14 :: PR: #6753
- Bug fix to reset sequence parallelism by @markelsanz14 :: PR: #6756
- Bug fix for reset_sequence_parallel_args by @markelsanz14 :: PR: #6802
- Fix adapter tutorial r1.19.0 by @hsiehjackson :: PR: #6776
- Fix error appearing when using tar datasets by @Jorjeous :: PR: #6502
- Fix normalization of impulse response in ImpulsePerturbation by @anteju :: PR: #6505
- Fix typos by @titu1994 :: PR: #6523
- Fix notebook bad json by @titu1994 :: PR: #6561
- [ASR] Fix for old models in change_attention_model by @sam1373 :: PR: #6608
- Fix k2 installation in Docker with CUDA 12 by @artbataev :: PR: #6707
- Tutorial fixes by @titu1994 :: PR: #6717
- Vp fixes by @titu1994 :: PR: #6738
- [TTS] Fix aligner nan loss in fp32 by @hsiehjackson :: PR: #6435
- fix conversion and eval by @arendu :: PR: #6648
- Fix checkpointed forward and add test for full activation checkpointing by @aklife97 :: PR: #6744
- add call to p2p overlap by @aklife97 :: PR: #6779
- Fix get_parameters when using main params optimizer by @ericharper :: PR: #6764
- Fix GPTDataset Assert by @MaximumEntropy :: PR: #6798
- fix notebook error by @yidong72 :: PR: #6840
- final fix of notebook by @yidong72 :: PR: #6842
General Improvements
Changelog
- Code-Switching dataset creation - upgrading to aggregate tokenizer manifest format by @KunalDhawan :: PR: #6448
- Fix an invalid link in get_data.py of ljspeech by @pythinker :: PR: #6456
- Update manifest.py to use os.path for get_full_path by @stevehuang52 :: PR: #6598
- Cherry pick commits in #6528 to main by @timmoon10 :: PR: #6613
- Move black parameters to pyproject.toml by @artbataev :: PR: #6647
- handle artifacts when path is an extracted dir by @arendu :: PR: #6658
- remove upgrading setuptools in reinstall.sh by @XuesongYang :: PR: #6659
- Upgrade to PyTorch 23.04 Container by @ericharper :: PR: #6660
- Fix fastpitch test nightly by @hsiehjackson :: PR: #6742
- Fix Links for tutorials by @titu1994 :: PR: #6777
- Update core version in Jenkinsfile by @aklife97 :: PR: #6817
- Update mcore requirement to 0.2.0 by @ericharper :: PR: #6875
NVIDIA Neural Modules 1.18.1
Highlights
For the complete release note, please see NeMo 1.18.0 Release Notes
Bugfix
This patch release fixes a major bug in ASR bucketing datasets that was introduced in r1.17.0 by PR #6191. With this bug, each bucket was still randomly shuffled before selection on each rank, but a single bucket would then loop indefinitely and iteration never continued on to the subsequent buckets.
Effect: Significantly worse WER would be obtained since not all buckets would be used.
This has been patched and should work correctly in 1.18.1 onwards.
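The failure mode described above can be illustrated with a toy example. This is not the NeMo bucketing code; both functions are hypothetical and only contrast an iterator that gets stuck on one bucket with one that visits every bucket once, in shuffled order.

```python
import itertools
import random


def iterate_buckets_buggy(buckets, num_batches, seed=0):
    """Toy version of the bug: the first selected bucket repeats forever."""
    rng = random.Random(seed)
    order = list(range(len(buckets)))
    rng.shuffle(order)
    # Cycling over a single bucket means the other buckets are never reached.
    stuck = itertools.cycle(buckets[order[0]])
    return [next(stuck) for _ in range(num_batches)]


def iterate_buckets_fixed(buckets, seed=0):
    """Toy version of the intended behavior: every bucket is consumed once."""
    rng = random.Random(seed)
    order = list(range(len(buckets)))
    rng.shuffle(order)
    batches = []
    for idx in order:
        batches.extend(buckets[idx])
    return batches
```

In the buggy variant, every returned batch comes from the same bucket regardless of how many batches are requested, which matches the reported effect that not all buckets were used.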
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:23.03
NVIDIA Neural Modules 1.18.0
Highlights
Models
- GPT-2B-001, trained on 1.1T tokens with 4K sequence length.
- STT En Fast Conformer-CTC Large
- STT En Fast Conformer-Transducer Large
- STT En Fast Conformer-Transducer Large LibriSpeech
- STT En FastConformer Hybrid Transducer-CTC Large P&C
- STT De FastConformer Hybrid Transducer-CTC Large P&C
- STT Es FastConformer Hybrid Transducer-CTC Large P&C
- STT It FastConformer Hybrid Transducer-CTC Large P&C
- STT Pl FastConformer Hybrid Transducer-CTC Large P&C
- STT Ua FastConformer Hybrid Transducer-CTC Large P&C
- STT Hr FastConformer Hybrid Transducer-CTC Large P&C
- STT By Conformer-RNNT Large
NeMo ASR
- Hybrid Autoregressive Transducer (HAT) #6260
- Apple MPS Support for ASR Inference #6289
- InterCTC Support for Hybrid ASR Models #6215
- RNNT N-Gram Fusion with mAES algo #6118
- ASR + Apple M2 CPU/GPU MPS #6289
NeMo TTS
- TTS directory structure refactor
- User-set symbol vocabulary #6172
NeMo Megatron
- Model parallelism from Megatron Core #6393
- Continued training for P-tuning #6273
- SFT for GPT-3 #6210
- Tensor and pipeline model parallel conversion #6218
- Megatron NMT Export to Riva
NeMo Core
Detailed Changelogs
ASR
Changelog
- minor cleanup by @messiaen :: PR: #6311
- docs on the use of heterogeneous test / val manifests by @bmwshop :: PR: #6352
- [WIP] add buffered chunked streaming for nemo force aligner by @Slyne :: PR: #6185
- Word boosting for Flashlight decoder by @trias702 :: PR: #6367
- Add installation and ASR inference instructions for Mac by @artbataev :: PR: #6377
- specaug speedup by @1-800-BAD-CODE :: PR: #6347
- updated lr for FC configs by @bmwshop :: PR: #6379
- Make possible to control tqdm progress bar in ASR models by @SN4KEBYTE :: PR: #6375
- [ASR] Conformer global tokens in local attention by @sam1373 :: PR: #6253
- fixed torch warning on using a list of numpy arrays by @MKNachesa :: PR: #6382
- Fix FastConformer config: correct bucketing strategy by @artbataev :: PR: #6413
- fixing the ability to use temp sampling with concat datasets by @bmwshop :: PR: #6423
- add conformer configs for hat model by @andrusenkoau :: PR: #6372
- [ASR] Add optimization util for linear sum assignment algorithm by @tango4j :: PR: #6349
- Added/updated new Conformer configs by @VahidooX :: PR: #6426
- Fix typos by @titu1994 :: PR: #6494
- Fix typos (#6523) by @titu1994 :: PR: #6539
- added back the fast emit section to the configs. by @VahidooX :: PR: #6540
- Add FastConformer Hybrid ASR models for EN, ES, IT, DE, PL, HR, UA, BY by @KunalDhawan :: PR: #6549
- Add scores for FastConformer models by @titu1994 :: PR: #6557
- Patch transcribe and support offline transcribe for hybrid model by @fayejf :: PR: #6550
- More streaming conformer export fixes by @messiaen :: PR: #6567
- Documentation for ASR-TTS models by @artbataev :: PR: #6594
- Patch transcribe_util for streaming mode and add wer calculation back to inference scripts by @fayejf :: PR: #6601
- Add HAT image to docs by @andrusenkoau :: PR: #6619
- Patch decoding for PC models by @titu1994 :: PR: #6630
- Fix wer.py where 'errors' variable was not set by @stevehuang52 :: PR: #6633
- Fix for old models in change_attention_model by @VahidooX :: PR: #6635
TTS
Changelog
NLP / NMT
Changelog
- [Core] return_config=True now extracts just config, not full tarfile by @titu1994 :: PR: #6346
- restore path for p-tuning by @arendu :: PR: #6273
- taskname and early stopping for adapters by @arendu :: PR: #6366
- Adapter tuning accepts expanded language model dir by @arendu :: PR: #6376
- Update gpt_training.rst by @blisc :: PR: #6378
- Megatron GPT model finetuning by @MaximumEntropy :: PR: #6210
- [NeMo Megatron] Cleanup configs to infer the models TP PP config automatically by @titu1994 :: PR: #6368
- Fix prompt template unescaping by @MaximumEntropy :: PR: #6399
- Add support for Megatron GPT Untied Embd TP PP Change by @titu1994 :: PR: #6388
- Move Parallelism usage from Apex -> Megatron Core by @aklife97 :: PR: #6393
- Add ability to enable/disable act ckpt and seq parallelism in GPT by @markelsanz14 :: PR: #6327
- Refactor PP conversion + add support for TP only conversion by @titu1994 :: PR: #6419
- fix CPU overheads of GPT synthetic dataset by @xrennvidia :: PR: #6427
- check if grad is none before calling all_reduce by @arendu :: PR: #6428
- Fix replace_bos_with_pad not found by @aklife97 :: PR: #6443
- Support Swiglu in TP PP Conversion by @titu1994 :: PR: #6437
- BERT pre-training mp fork to spawn by @aklife97 :: PR: #6442
- Megatron encoder decoder fix for empty validation outputs by @michalivne :: PR: #6459
- Reduce workers on NMT CI by @aklife97 :: PR: #6472
- Switch to NVIDIA Megatron repo by @aklife97 :: PR: #6465
- Megatron KERPLE positional embeddings by @michalivne :: PR: #6478
- Support in external sample mapping for Megatron datasets by @michalivne :: PR: #6462
- Fix custom by @aklife97 :: PR: #6512
- GPT fp16 inference fix by @MaximumEntropy :: PR: #6543
- Fix for T5 FT model by @aklife97 :: PR: #6529
- Pass instead of scaler object to core by @aklife97 :: PR: #6545
- Change Megatron Enc Dec model to use persistent_workers by @aklife97 :: PR: #6548
- Turn autocast off when precision is fp32 by @aklife97 :: PR: #6554
- Fix batch size reconf for T5 FT for multi-validation by @aklife97 :: PR: #6582
- Make tensor split contiguous for qkv and kv in attention by @aklife97 :: PR: #6580
- Patches from main to r1.18.0 for Virtual Parallel by @titu1994 :: PR: #6592
- Create dummy iters to satisfy iter type len checks in core + update core commit by @aklife97 :: PR: #6600
- Restore GPT support for interleaved pipeline parallelism by @timmoon10 :: PR: #6528
- Add megatron_core to requirements by @ericharper :: PR: #6639
Export
Changelog
Bugfixes
Changelog
- Fix the GPT SFT datasets loss mask bug by @yidong72 :: PR: #6409
- [BugFix] Fix multi-processing bug in data simulator by @tango4j :: PR: #6310
- Fix cache aware hybrid bugs by @VahidooX :: PR: #6466
- [BugFix] Force _get_batch_preds() to keep logits in decoder timestamp… by @tango4j :: PR: #6500
- Fixing bug in unsort_tensor by @borisfom :: PR: #6320
- Bugfix for BF16 grad reductions with distopt by @timmoon10 :: PR: #6340
- Limit urllib3 version to patch issue with RTD by @aklife97 :: PR: #6568
General improvements
Changelog
- Pin the version to hopefully fix rtd build by @SeanNaren :: PR: #6334
- enabling diverse datasets in val / test by @bmwshop :: PR: #6306
- extract inference weights by @arendu :: PR: #6353
- Add opengraph support for NeMo docs by @titu1994 :: PR: #6380
- Adding basic preemption code by @athitten :: PR: #6161
- Add documentation for preemption support by @athitten :: PR: #6403
- Update hyperparameter recommendation based on experiments by @Zhilin123 :: PR: #6405
- exceptions with empty test / val ds config sections by @bmwshop :: PR: #6421
- Upgrade pt 23.03 by @ericharper :: PR: #6430
- Update README to add core installation by @aklife97 :: PR: #6488
- Not doing CastToFloat by default by @borisfom :: PR: #6524
- Update manifest.py for speedup by @stevehuang52 :: PR: #6565
- Update SDP docs by @erastorgueva-nv :: PR: #6485
- Update core commit hash in readme by @aklife97 :: PR: #6622
- Remove from jenkins by @ericharper :: PR: #6641
- Remove dup by @ericharper :: PR: #6643
NVIDIA Neural Modules 1.17.0
Highlights
NeMo ASR
- Online Clustering Diarizer
- High Level Diarization API
- PyCTC Decode Beam Search Support
- RNNT Beam Search Alignment Extraction
- InterCTC Loss
- AIStore Documentation
- ASR & AWS Multi-node Integration
- Convolution Invariant SDR losses
NeMo TTS
NeMo Megatron
- SquaredReLU, SwiGLU, No-Dropout
- Rotary Position Embedding
- Untie word embeddings and output projection
NeMo Core
- Dynamic freezing of modules during training
- NeMo Multi-Run Documentation
- ClearML Logging
- Early Stopping
- Experiment Manager Docs Update
Container
For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo
docker pull nvcr.io/nvidia/nemo:23.02
Detailed Changelogs
ASR
Changelog
- Support Alignment Extraction for all RNNT Beam decoding methods by @titu1994 :: PR: #5925
- Use module-based k2 import guard by @artbataev :: PR: #6006
- Default RNNT loss to int64 targets by @titu1994 :: PR: #6011
- Added documentation section for ASR datasets from AIStore by @anteju :: PR: #6008
- Change perturb rng for reproducing results easily by @fayejf :: PR: #6042
- InterCTC loss and stochastic depth implementation by @Kipok :: PR: #6013
- Add pyctcdecode to high level beam search API by @titu1994 :: PR: #6026
- Convert esperanto into a notebook by @SeanNaren :: PR: #6070
- [ASR] Added a script for evaluating metrics for audio-to-audio by @anteju :: PR: #5971
- [ASR] Convolution-invariant SDR loss + unit tests by @anteju :: PR: #5992
- Adjust stochastic depth dropout probability calculation by @anteju :: PR: #6120
- Add file class based inference API for diarization by @SeanNaren :: PR: #5945
- Ngram by @karpnv :: PR: #6063
- remove duplicate definition of manifest read and write func. by @XuesongYang :: PR: #6088
- Streaming conformer CTC export by @messiaen :: PR: #5837
- [TTS] Make mel spectrogram norm configurable by @rlangman :: PR: #6155
- Ngram lm fusion for RNNT maes decoding by @andrusenkoau :: PR: #6118
- ASR Beam search documentation by @titu1994 :: PR: #6244
TTS
Changelog
- [TTS][ZH] added new NGC model cards with polyphone disambiguation. by @XuesongYang :: PR: #5940
- [TTS] deprecate AudioToCharWithPriorAndPitchDataset. by @XuesongYang :: PR: #5959
- [TTS][G2P] deprecate add_symbols by @XuesongYang :: PR: #5961
- Added list_available_models by @treacker :: PR: #5967
- Update Fastpitch energy bug by @blisc :: PR: #5969
- removed WHATEVER(1) ˌhwʌˈtɛvɚ from scripts/tts_dataset_files/ipa_cmudict-0.7b_nv22.10.txt by @MikyasDesta :: PR: #5869
- ONNX export for RadTTS by @borisfom :: PR: #5880
- Add some info about FastPitch SSL model by @redoctopus :: PR: #5994
- Vits doc by @treacker :: PR: #5989
- Ragged batching changes for RadTTS, some refactoring by @borisfom :: PR: #6020
- Working enabled ragged batching with ONNX by @borisfom :: PR: #6030
- [TTS/TN/G2P] Remove Text Processing from NeMo, move G2P to TTS by @ekmb :: PR: #5982
- [TTS] Add Spanish IPA dictionaries and heteronyms by @rlangman :: PR: #6037
- [TTS] Separate TTS tokenization and g2p util to fix circular import by @rlangman :: PR: #6080
- [TTS][refactor] Part 7 - move module from model file. by @XuesongYang :: PR: #6098
- [TTS][refactor] Part 1 - nemo.collections.tts.data by @XuesongYang :: PR: #6099
- [TTS][refactor] Part 2 - nemo.collections.tts.parts by @XuesongYang :: PR: #6105
- [TTS][refactor] Part 6 - remove nemo.collections.tts.torch.README.md and tts_dataset.yaml by @XuesongYang :: PR: #6103
- [TTS][refactor] Part 3 - nemo.collections.tts.g2p.models by @XuesongYang :: PR: #6113
- [TTS] update German NGC models trained on Thorsten Datasets by @XuesongYang :: PR: #6125
- [TTS] remove old waveglow model that relies on torch_stft. by @XuesongYang :: PR: #6128
- [TTS] Move Spanish polyphones from heteronym to dictionary by @rlangman :: PR: #6123
- [TTS][refactor] Part 8 - added model inference tests to safeguard changes. by @XuesongYang :: PR: #6129
- remove duplicate definition of manifest read and write func. by @XuesongYang :: PR: #6088
- [TTS][refactor] update tutorial import paths. by @XuesongYang :: PR: #6176
- [TTS] Add univnet scheduler by @ArtyomZemlyak :: PR: #6157
- [TTS] Make mel spectrogram norm configurable by @rlangman :: PR: #6155
NLP / NMT
Changelog
- add new languages to doc by @yzhang123 :: PR: #5939
- Distributed Adam optimizer overlaps param all-gather with forward compute by @timmoon10 :: PR: #5684
- Refactor the retrieval services for microservice architecture by @yidong72 :: PR: #5910
- make validation accuracy reporting optional for adapters/ptuning by @arendu :: PR: #5843
- Add BERT support for overlapping forward compute with distopt communication by @timmoon10 :: PR: #6024
- [TTS/TN/G2P] Remove Text Processing from NeMo, move G2P to TTS by @ekmb :: PR: #5982
- adding early stop callback to ptuning by @arendu :: PR: #6028
- Pr doc tn by @yzhang123 :: PR: #6041
- Adds several configurable flags for Megatron GPT models by @MaximumEntropy :: PR: #5991
- P-tuning refactor Part 1/N by @arendu :: PR: #6054
- Fast glu activations by @MaximumEntropy :: PR: #6058
- P-tuning refactor Part 2/N by @arendu :: PR: #6056
- P-tuning refactor Part 3/N by @arendu :: PR: #6106
- Explicitly check for united embeddings when logging params by @MaximumEntropy :: PR: #6085
- Add flag to get attention from fusion by @ericharper :: PR: #6049
- Improving text memmap generated index files error messages by @michalivne :: PR: #6093
- Megatron Encoder-Decoder Sampler Function by @michalivne :: PR: #6095
- Sentence piece legacy false compatibility by @arendu :: PR: #6154
- convert Megatron LM ckpt to NeMo PP support. by @yidong72 :: PR: #6159
- Avoid multiple warnings for loss mask by @mikolajblaz :: PR: #6062
- Propagate LayerNorm1P to TE by @mikolajblaz :: PR: #6061
- Filter p-tuning by example length by @arendu :: PR: #6182
- Add sequence parallel support to Rope positional embedding by @yidong72 :: PR: #6178
- Use a separate communicator for DP AMAX reduction by @erhoo82 :: PR: #6022
- Add persistent workers to GPT by @ericharper :: PR: #6205
- Micro batch loader for bert model by @shanmugamr1992 :: PR: #6046
- GPT P tuning Eval changes (#5952) by @aklife97 :: PR: #6272
- add template for taskname=taskname by @Zhilin123 :: PR: #6283
- added RPE + fixed RMSNorm by @Davood-M :: PR: #6304
- simplified notebook for p-tuning by @arendu :: PR: #6326
- Added num decoder blocks in megatron export by @Davood-M :: PR: #6331
Text Normalization / Inverse Text Normalization
Export
Changelog
- ONNX export for RadTTS by @borisfom :: PR: #5880
- Working enabled ragged batching with ONNX by @borisfom :: PR: #6030
- Update docs for ExpManager and Exportable frameworks by @titu1994 :: PR: #6165
- Streaming conformer CTC export by @messiaen :: PR: #5837
- MixedFusedRMSNorm Export Fix by @Davood-M :: PR: #6296
- Added num decoder blocks in megatron export by @Davood-M :: PR: #6331
Bugfixes
Changelog
- Fix bug where GPT always enabled distopt overlapped param sync by @timmoon10 :: PR: #5995
- CS bugfix by @bmwshop :: PR: #6122
- RNNT patch by @titu1994 :: PR: #6231
- Notebook fixes by @titu1994 :: PR: #6212
- Small fixes for flashlight decoder by @trias702 :: PR: #6071
- Various fixes in docs and RNNT by @titu1994 :: PR: #6156
- Fix k2 and torchaudio installation (Docker, macOS) by @artbataev :: PR: #6094
- update and deprecate warning for Mic notebook by @fayejf :: PR: #6307
- small bugfix and add asr evaluator to doc by @fayejf :: PR: #6229
- Bug fixing for bucketing dataset by @VahidooX :: PR: #6191
- Fix character beam decoding algorithm with vocab index map by @titu1994 :: PR: #6140
- fix typo in asr evaluator readme by @fayejf :: PR: #6053
- Fix typos by @titu1994 :: PR: #6241
- [ASR]:fixed augmentor arguments for transcribe functionality of Hybrid CTC-RNNT model by @KunalDhawan :: PR: #6290
- Fix hybrid transcribe by @ArtyomZemlyak :: PR: #6003
- Fix bucketing seeding by @VahidooX :: PR: #6254
- Fix for CTC decoder setup by @vsl9 :: PR: #6303
- Fix RNNT Joint narrow() by @titu1994 :: PR: #6336
- Fix bugs with interctc mixin by @Kipok :: PR: #6228
- Update IPA dict path in tutorial by @redoctopus :: PR: #6208
- [TTS] fix broken tutorial for Tacotron2 by @XuesongYang :: PR: #6199
- [TTS] fix bugs for Chinese and German tutorials. by @XuesongYang :: PR: #6216
- Fix radtts sort r17 by @borisfom :: PR: #6344
- Quick Fix for RadTTS test by @blisc :: PR: #6034
- Disabling radtts tests until we have real model by @borisfom :: PR: #6036
- fix val loss computation in megatron by @anmolgupt :: PR: #5871
- Fix incomplete batches by @mikolajblaz :: PR: #6083
- Avoid unnecessarily accessing data loader with pipeline parallelism by @timmoon10 :: PR: #6164
- bugfix: file handlers are not closed. by @XuesongYang :: PR: #5956
- Fix Silence Sampling Algorithm for ASR Multi-speaker Data Simulator by @stevehuang52 :: PR: #5897
- Fix Windows bug with save_restore_connector by @trias702 :: PR: #5919
- fix broken link by @ericharper :: PR: #5968
- Fix torchaudio installation by @artbataev :: PR: #5850
- Fix reinstall.sh dependencies by @titu1994 :: PR: #6027
- Adding changes to fix the mv error by @tango4j :: PR: #6087
- Fix README by @flx42 :: PR: #6137
- Fix typos in voiceapp notebook by @titu1994 :: PR: #6262
- [BugFix] Fix diarization result path errors in tutorial notebook for r1.17.0 by @tango4j :: PR: #6234
- [BugFix] Fix ...