migrated class CSVFieldsMemmapDataset from BioNeMo #7314

dorotat-nv · 2023-08-24T13:18:44Z

What does this PR do?

Adds the class CSVFieldsMemmapDataset, migrated from BioNeMo, together with a unit test.

Collection: [Note which collection this PR will affect]

Changelog

- Migrated the class CSVFieldsMemmapDataset from BioNeMo.
- Added a unit test.

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this
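
The PR description leaves the usage example unfilled. As a rough illustration of the idea the class name suggests (a memory-mapped CSV file whose selected columns are returned as named fields), here is a minimal, self-contained sketch. It is not the NeMo implementation: the class `TinyCSVFieldsDataset` and its parameters `data_fields`, `data_sep`, and `header_lines` are hypothetical names chosen for this example, and the sample file contents are made up.

```python
import mmap
import os
import tempfile


class TinyCSVFieldsDataset:
    """Illustrative stand-in (NOT the NeMo class).

    Memory-maps a non-empty CSV file, indexes the byte offsets of its lines,
    and returns the requested columns of a row as a dict.
    """

    def __init__(self, path, data_fields, data_sep=",", header_lines=1):
        self.data_fields = data_fields  # maps field name -> column index
        self.data_sep = data_sep
        self._file = open(path, "rb")
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

        # Record the (start, end) byte span of every line once, up front.
        spans = []
        start, size = 0, len(self._mm)
        while start < size:
            end = self._mm.find(b"\n", start)
            if end == -1:  # last line without a trailing newline
                end = size
            spans.append((start, end))
            start = end + 1
        self._spans = spans[header_lines:]  # skip the header row(s)

    def __len__(self):
        return len(self._spans)

    def __getitem__(self, idx):
        start, end = self._spans[idx]
        line = self._mm[start:end].decode("utf-8").rstrip("\r")
        # Plain split: quoted separators are not handled in this sketch.
        row = line.split(self.data_sep)
        return {name: row[col] for name, col in self.data_fields.items()}

    def close(self):
        self._mm.close()
        self._file.close()


# Usage with a throwaway two-row CSV file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("id,sequence\n0,MKTAYIAK\n1,GSHMLEDP\n")

    ds = TinyCSVFieldsDataset(path, data_fields={"id": 0, "sequence": 1})
    n = len(ds)
    first = ds[0]
    ds.close()
```

Indexing line offsets once and decoding a row only on access keeps memory use low for large files, which is the usual motivation for a memory-mapped dataset. For the real class, consult the code added in this PR.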

Before your PR is "Ready for review"

Pre checks:

[x] Make sure you read and followed Contributor guidelines
[x] Did you write any new necessary tests?
[x] Did you add or update any necessary documentation?
[ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

[x] New Feature
[ ] Bugfix
[ ] Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

Related to # (issue)


* memmap worker arg Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: dorotat <[email protected]>

…IDIA#7034) (NVIDIA#7082) Co-authored-by: Vahid Noroozi <[email protected]> Signed-off-by: dorotat <[email protected]>

* old way Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * remove extra Signed-off-by: sam1373 <[email protected]> * clean Signed-off-by: sam1373 <[email protected]> * clean Signed-off-by: sam1373 <[email protected]> * clean Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: sam1373 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: dorotat <[email protected]>

* Refined export_config * Rolling back hierarchy change --------- Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Signed-off-by: dorotat <[email protected]>

* small Bugfix (NVIDIA#7079) * fix branch Signed-off-by: fayejf <[email protected]> * fix typo Signed-off-by: fayejf <[email protected]> * fix link Signed-off-by: fayejf <[email protected]> --------- Signed-off-by: fayejf <[email protected]> * Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb Signed-off-by: Somshubra Majumdar <[email protected]> * Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb Signed-off-by: Somshubra Majumdar <[email protected]> --------- Signed-off-by: fayejf <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: dorotat <[email protected]>

NVIDIA#7092) * Added script to extract ctc and rnnt models from hybrid models Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updated hybrid extraction script for review request 1 Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updated hybrid convert script to remove --cuda flag Signed-off-by: Daniel Egert <[email protected]> --------- Signed-off-by: Daniel Egert <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: dorotat <[email protected]>

…#7067) (NVIDIA#7094) Signed-off-by: dorotat <[email protected]>

* update TTS readme Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dorotat <[email protected]>


* Fix load_state_dict in nlp_model.py Signed-off-by: He Huang (Steve) <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: He Huang (Steve) <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: dorotat <[email protected]>

Fix plot function in vad_utils.py Signed-off-by: He Huang (Steve) <[email protected]> Signed-off-by: dorotat <[email protected]>


This reverts commit a46e325. Signed-off-by: dorotat <[email protected]>

* Fix import guard checks Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: dorotat <[email protected]>


* [TTS] Create EnCodec training recipe Signed-off-by: Ryan <[email protected]> * [TTS] Update encodec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Rename EnCodec to AudioCodec Signed-off-by: Ryan <[email protected]> * [TTS] Add EnCodec unit tests Signed-off-by: Ryan <[email protected]> * [TTS] Add copyright header to distributed.py Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: dorotat <[email protected]>

… not wait for tokenizer file caching (NVIDIA#7061) Signed-off-by: Kim Ngo <[email protected]> Co-authored-by: David <[email protected]> Signed-off-by: dorotat <[email protected]>


) * [TTS] add Chinese TTS recipe based on IPA. * add new pinyin and ipa dictionaries with 36 finals. * add yaml configs for 24-final pinyin and ipa. * add copyright header * add a directory level 24finals to discriminate from 36 finals. Signed-off-by: Xuesong Yang <[email protected]> * unify configs into a single one and add detailed comments providing supported candidates. Signed-off-by: Xuesong Yang <[email protected]> * choose 36-final IPA as default phoneme dict Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dorotat <[email protected]>

* [TTS] Add output audio format to preprocessing Signed-off-by: Ryan <[email protected]> * [TTS] Add format validation Signed-off-by: Ryan <[email protected]> * [TTS] Fix data tutorial Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: dorotat <[email protected]>


* Transform adapter modules to fp16/bf16 under amp_O2 * Under megatron_amp_O2, transform the adapter modules to low precision after instantiation Signed-off-by: Guyue Huang <[email protected]> Conflicts: nemo/collections/nlp/modules/common/megatron/adapters/parallel_adapters.py * Fix ptuning under amp O2 * Fix the first_stage_of_pipeline detection for half models * Fix the freezing of InferenceTable for half models Signed-off-by: Guyue Huang <[email protected]> * Fix MegatronGPTAdapterPTuningModel * When unfreezing adapters, we explicitly set inference embedding table in prompt encoder to be untrainable. Signed-off-by: Guyue Huang <[email protected]> * Add comments for feature explanation Signed-off-by: Guyue Huang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix ptuning and lora model_parallel_config Signed-off-by: jasonwan <[email protected]> * Put the casting of adapters in their instantiaion Signed-off-by: Guyue Huang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * small fix for state dict Signed-off-by: jasonwan <[email protected]> * optional model_parallel_config Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Guyue Huang <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: dorotat <[email protected]>

* fix partial transcribe Signed-off-by: stevehuang52 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refactor Signed-off-by: stevehuang52 <[email protected]> --------- Signed-off-by: stevehuang52 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: dorotat <[email protected]>

Signed-off-by: Jason Wang <[email protected]> Signed-off-by: dorotat <[email protected]>

* loss mask for final output and softmax Signed-off-by: arendu <[email protected]> * bs2 working Signed-off-by: arendu <[email protected]> * Fix skip generation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add metric condition Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * encoder_input is none check Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: dorotat <[email protected]>

* Adding server to peft eval Signed-off-by: David Mosallanezhad <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated trainer.test for server Signed-off-by: David Mosallanezhad <[email protected]> --------- Signed-off-by: David Mosallanezhad <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: dorotat <[email protected]>

Signed-off-by: arendu <[email protected]> Signed-off-by: dorotat <[email protected]>

Signed-off-by: Alireza Morsali <[email protected]> Signed-off-by: dorotat <[email protected]>

Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dorotat <[email protected]>


…dataset

Davood-M

LGTM, thank you!


…dataset

Davood-M

LGTM, thanks!

* migrated class Signed-off-by: dorotat <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: dorotat <[email protected]> * added unit test Signed-off-by: dorotat <[email protected]> * memmap worker arg (#7062) * memmap worker arg Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: dorotat <[email protected]> * Fix caching bug in causal convolutions for cache-aware ASR models (#7034) (#7082) Co-authored-by: Vahid Noroozi <[email protected]> Signed-off-by: dorotat <[email protected]> * Fast Conformer global token fix (#7085) * old way Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * remove extra Signed-off-by: sam1373 <[email protected]> * clean Signed-off-by: sam1373 <[email protected]> * clean Signed-off-by: sam1373 <[email protected]> * clean Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: sam1373 <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: dorotat <[email protected]> * Refined export_config (#7053) (#7066) * Refined export_config * Rolling back hierarchy change --------- Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Signed-off-by: dorotat <[email protected]> * small Bugfix (#7081) * small Bugfix (#7079) * fix branch Signed-off-by: fayejf <[email protected]> * fix typo Signed-off-by: fayejf <[email protected]> * fix link Signed-off-by: fayejf <[email protected]> --------- Signed-off-by: fayejf <[email protected]> * Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb Signed-off-by: Somshubra Majumdar <[email protected]> * Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb Signed-off-by: Somshubra Majumdar <[email protected]> --------- Signed-off-by: fayejf <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: dorotat <[email protected]> * Added script to extract ASR CTC and RNNT models from ASR hybrid models (#7092) * Added script to extract ctc and rnnt models from hybrid models Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updated hybrid extraction script for review request 1 Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updated hybrid convert script to remove --cuda flag Signed-off-by: Daniel Egert <[email protected]> --------- Signed-off-by: Daniel Egert <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: dorotat <[email protected]> * Adding docs and models for 
multiple lookahead cache-aware ASR (#7067) (#7094) Signed-off-by: dorotat <[email protected]> * update TTS readme (#7088) * update TTS readme Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dorotat <[email protected]> * Fix absolute path in path join call (#7099) Signed-off-by: Jan Beckmann <[email protected]> Signed-off-by: dorotat <[email protected]> * Disable distopt contiguous param buffer by default (#7095) Signed-off-by: Tim Moon <[email protected]> Signed-off-by: dorotat <[email protected]> * microphone demo (#7110) Signed-off-by: Linnea Pari Leaver <[email protected]> Co-authored-by: Linnea Pari Leaver <[email protected]> Signed-off-by: dorotat <[email protected]> * [Fix] load_state_dict in nlp_model.py (#7086) * Fix load_state_dict in nlp_model.py Signed-off-by: He Huang (Steve) <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: He Huang (Steve) <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: dorotat <[email protected]> * Fix plot function in vad_utils.py (#7113) Fix plot function in vad_utils.py Signed-off-by: He Huang (Steve) <[email protected]> Signed-off-by: dorotat <[email protected]> * Fixed small bug with NoisePerturbationWithNormalization (#7118) Signed-off-by: Daniel Egert <[email protected]> Signed-off-by: dorotat <[email protected]> * Fix import guard checks (#7124) Signed-off-by: smajumdar <[email protected]> Signed-off-by: dorotat <[email protected]> * Revert "Fix import guard checks (#7124)" (#7125) This reverts commit ae7624da7d773a6b9436ff61903dc4b99c7c27cb. 
Signed-off-by: dorotat <[email protected]> * Fix import guard checks (#7126) * Fix import guard checks Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: dorotat <[email protected]> * Add updated fc ctc and rnnt xxl models (#7128) (#7130) Signed-off-by: dorotat <[email protected]> * [TTS] Create EnCodec training recipe (#6852) * [TTS] Create EnCodec training recipe Signed-off-by: Ryan <[email protected]> * [TTS] Update encodec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Rename EnCodec to AudioCodec Signed-off-by: Ryan <[email protected]> * [TTS] Add EnCodec unit tests Signed-off-by: Ryan <[email protected]> * [TTS] Add copyright header to distributed.py Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: dorotat <[email protected]> * Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (#7061) Signed-off-by: Kim Ngo <[email protected]> Co-authored-by: David <[email protected]> Signed-off-by: dorotat <[email protected]> * fix default attention size (#7141) (#7143) Signed-off-by: dorotat <[email protected]> * fix evaluator.py for various exceptions by ast (#7150) Signed-off-by: He Huang (Steve) <[email protected]> Signed-off-by: dorotat <[email protected]> * [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. (#6893) * [TTS] add Chinese TTS recipe based on IPA. * add new pinyin and ipa dictionaries with 36 finals. * add yaml configs for 24-final pinyin and ipa. * add copyright header * add a directory level 24finals to discriminate from 36 finals. Signed-off-by: Xuesong Yang <[email protected]> * unify configs into a single one and add detailed comments providing supported candidates. 
Signed-off-by: Xuesong Yang <[email protected]> * choose 36-final IPA as default phoneme dict Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dorotat <[email protected]> * [TTS] Add output audio format to preprocessing (#6889) * [TTS] Add output audio format to preprocessing Signed-off-by: Ryan <[email protected]> * [TTS] Add format validation Signed-off-by: Ryan <[email protected]> * [TTS] Fix data tutorial Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: dorotat <[email protected]> * freeze (#7152) Signed-off-by: arendu <[email protected]> Signed-off-by: dorotat <[email protected]> * make sure any empty segments are removed (#7155) Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: dorotat <[email protected]> * Update RIR generation scripts (#6547) - fix: reduce room size if evaluation of params fails - added randomized mic placement - added diffuse noise generation - added an option to specify the format and subtype for saved audio Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: dorotat <[email protected]> * A quickstart speech enhancement tutorial (#6492) A simple example of training a model for speech enhancement task Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: dorotat <[email protected]> * NFA subtitle file config - specify colors and vertical alignment (#7160) * allow specifying colors of text in ASS subtitle file Signed-off-by: Elena Rastorgueva <[email protected]> * specify vertical_alignment instead of marginv in ass_file_config Signed-off-by: Elena Rastorgueva <[email protected]> * add documentation of CTMFileConfig and ASSFileConfig to NFA README Signed-off-by: Elena Rastorgueva <[email protected]> --------- Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: dorotat <[email protected]> * Eagerly accumulate embedding grads into fp32 buffer (#6958) (#7153) 
Signed-off-by: Tim Moon <[email protected]> Co-authored-by: Tim Moon <[email protected]> Signed-off-by: dorotat <[email protected]> * TE bug fix (#7027) (#7036) Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dorotat <[email protected]> * [TTS] Remove nested TTS configs (#7154) * [TTS] Remove nested TTS configs Signed-off-by: Ryan <[email protected]> * [TTS] Modify tutorial to support multiple sampling rates Signed-off-by: Ryan <[email protected]> * [TTS] Clarify min_duration unit Signed-off-by: Ryan <[email protected]> * [TTS] Default 22.05kHz highfreq to null Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: dorotat <[email protected]> * Merge release r1.20.0 to main (#7167) * update package info Signed-off-by: ericharper <[email protected]> * Add ASR with TTS Tutorial. Fix enhancer usage. (#6955) * Add ASR with TTS Tutorial * Fix enhancer usage Signed-off-by: Vladimir Bataev <[email protected]> * install_bs (#7019) Signed-off-by: Nikolay Karpov <[email protected]> * Fix typo and branch in tutorial (#7048) Signed-off-by: Vladimir Bataev <[email protected]> * fix syntax error introduced in PR-7079 (#7102) * fix syntax error introduced in PR-7079 Signed-off-by: Alexandra Antonova <[email protected]> * fixes for pr review Signed-off-by: Alexandra Antonova <[email protected]> --------- Signed-off-by: Alexandra Antonova <[email protected]> * fix links for TN (#7117) Signed-off-by: Evelina <[email protected]> * update branch (#7135) Signed-off-by: ericharper <[email protected]> * Fixed main and merging this to r1.20 (#7127) * Fixed main and merging this to r1.20 Signed-off-by: Taejin Park <[email protected]> * Update vad_utils.py Signed-off-by: He Huang (Steve) <[email protected]> --------- Signed-off-by: Taejin Park <[email protected]> Signed-off-by: He Huang (Steve) <[email protected]> Co-authored-by: He Huang (Steve) <[email protected]> * update 
branch Signed-off-by: ericharper <[email protected]> * fix version Signed-off-by: ericharper <[email protected]> * resolve conflict the other way Signed-off-by: ericharper <[email protected]> * keep both Signed-off-by: ericharper <[email protected]> * revert keep both Signed-off-by: ericharper <[email protected]> --------- Signed-off-by: ericharper <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Alexandra Antonova <[email protected]> Signed-off-by: Evelina <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: He Huang (Steve) <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: Nikolay Karpov <[email protected]> Co-authored-by: bene-ges <[email protected]> Co-authored-by: Evelina <[email protected]> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: He Huang (Steve) <[email protected]> Signed-off-by: dorotat <[email protected]> * Upgrade to pytorch lightning 2.0 (#6433) * Upgrade pytorch lightning version in requirements Signed-off-by: Abhishree <[email protected]> * Initial fixes for PTL2.0 Signed-off-by: Abhishree <[email protected]> * Add further fixes to support lightning 2.0 Signed-off-by: Abhishree <[email protected]> * Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and few occurances of validation_epoch_end Signed-off-by: Abhishree <[email protected]> * Replace all occurances of validation_epoch_end to on_validation_epoch_end Signed-off-by: Abhishree <[email protected]> * Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively Signed-off-by: Abhishree <[email protected]> * Change logger=None to logger=False in Trainer object Signed-off-by: Abhishree <[email protected]> * Remove PTL2.0 deprecated Trainer args from TrainerConfig dataclass Signed-off-by: Abhishree <[email protected]> * Modify trainer.precision check and other 
small edits Signed-off-by: Abhishree <[email protected]> * Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer Signed-off-by: Abhishree <[email protected]> * Add default values for args to fix Attribute Error Signed-off-by: Abhishree <[email protected]> * Add the following modifications 1) Remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class 2) Replace resume_from_checkpoint with ckpt_path as needed 3) Explicitly add accelerator as 'CPU' in UTs being run on CPU Signed-off-by: Abhishree <[email protected]> * Remove outputs arg from on_validation_epoch_end, on_test_epoch_end Signed-off-by: Abhishree <[email protected]> * Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings Signed-off-by: Abhishree <[email protected]> * Add val, test outputs as instance vars in PunctuationCapitalizationModel and TokenClassificationModel Signed-off-by: Abhishree <[email protected]> * Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py Signed-off-by: Abhishree <[email protected]> * Revert an extra space that was mistakenly added Signed-off-by: Abhishree <[email protected]> * Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity Signed-off-by: Abhishree <[email protected]> * Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity Signed-off-by: Abhishree <[email protected]> * Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing Signed-off-by: Abhishree <[email protected]> * Remove outputs arg from on_train_epoch_end Signed-off-by: Abhishree <[email protected]> * Remove outputs from on_validation_epoch_end in multi_binary_acc.py Signed-off-by: Abhishree <[email protected]> * Remove output args from on_validation_epoch_end in the docstrings of some ASR files Signed-off-by: Abhishree 
<[email protected]> * Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs Signed-off-by: Abhishree <[email protected]> * Add on_validation_epoch_end and remove outputs args for nlp models Signed-off-by: Abhishree <[email protected]> * Append output of validation_step to validation_step_outputs in EncDecClassificationModel Signed-off-by: Abhishree <[email protected]> * Add the following changes 1) Index self.validation_step_outputs and self.test_step.outputs with dataloader_idx wherever needed 2) Initialize self.validation_step_outputs and self.test_step.outputs as empty lists and add support for multi dataloaders if they exist 3) Remove self.pre_configure_ddp from NLPDDPStrategy class as its removed in PTL 2.0 Signed-off-by: Abhishree <[email protected]> * Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py Signed-off-by: Abhishree <[email protected]> * TypeCast precision to str in attention.py and utils_funcs.py to avoid TypeError Signed-off-by: Abhishree <[email protected]> * Add if condition check for multiple dataloaders when appending to validation outputs Signed-off-by: Abhishree <[email protected]> * Separate validation pass to be used with both validation_step and test_step Signed-off-by: Abhishree <[email protected]> * Add if condition check for multiple dataloader while appending to test_step_outputs in punctuation_capitalization_model.py Signed-off-by: Abhishree <[email protected]> * Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len Signed-off-by: Abhishree <[email protected]> * Comment Megatron T5 IA3 PP=2 in CI pipeline due to dataloader_iter issue with PTL 2.0 Signed-off-by: Abhishree <[email protected]> * Modify precision checks to account for 16-mixed and bf16-mixed Signed-off-by: Abhishree <[email protected]> * Append output of validation/test_step to self.validation/test_step_outputs 
in CTCG2PModel Signed-off-by: Abhishree <[email protected]> * Modify find_unused_parameters=True in g2p_heteronym model 1) Add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py 2) Remove args output in validation/test_step and add instance variables instead for heteronym_classification.py Signed-off-by: Abhishree <[email protected]> * Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel Signed-off-by: Abhishree <[email protected]> * Add validation/test outputs in sgdqa_model and modify dialogue_config.yaml Signed-off-by: Abhishree <[email protected]> * Add split arg self.test_step_outputs to TextClassificationModel Signed-off-by: Abhishree <[email protected]> * Add test_step_outputs to dialogue and text classification models Signed-off-by: Abhishree <[email protected]> * Change condition check for multiple dataloaders: 1) Replace ds_item as list in dialogue_config.yaml 2) Check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step 3) Check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_cpitalization_model.py Signed-off-by: Abhishree <[email protected]> * Add additional condition for multi dataloaders Check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list for multi dataloaders in validation/test_step Signed-off-by: Abhishree <[email protected]> * Add val step outputs and default val for dataloader_idx 1) Append validation_step outout to self.validation_step_outputs in MultiLabelIntentSlotClassificationMode 2) Add default val for dataloader_idx for on_test_batch_start/end in TimingCallback 3) Add self.validation/test_step_outputs in BERTQAModel and remove outputs arg Signed-off-by: Abhishree <[email protected]> * Add val/test_step_outputs to S2SQAModel and GPTQAModel Signed-off-by: Abhishree 
<[email protected]> * Edit JenkinsFile for bert_pretraining.py Edit Jenkinsfile for this test to disable validation as a workaround for trainer.val_dataloader None error Signed-off-by: Abhishree <[email protected]> * Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py Signed-off-by: Abhishree <[email protected]> * Add ddp_find_unused_parameters_true and remove output args 1) Add ddp_find_unused_parameters_true for trainer.strategy in self_alignment_pretraining.py as it has unused parameters 2) Remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py 3) Comment tests in JenkinsFile that need to be fixed Signed-off-by: Abhishree <[email protected]> * Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed Signed-off-by: Abhishree <[email protected]> * Precision fix for megatron_bert_pretraining.py and megatron_bert_model.py Signed-off-by: Abhishree <[email protected]> * Precision fix and validation/test_step_outputs 1) Add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py 2) Reset ckpt_path for test in enc_dec_nmt.py 3) Remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py 4) Comment Megatron Bert Pretraining and Resume Training with Pipeline Parallelism and add back NMT Training Post-LN Signed-off-by: Abhishree <[email protected]> * Precision fix and skip a few failing tests Signed-off-by: Abhishree <[email protected]> * Add missing comment lines in JenkinsFile Signed-off-by: Abhishree <[email protected]> * Comment Jenkins tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py Signed-off-by: Abhishree <[email protected]> * Minor edit JenkinsFile Signed-off-by: Abhishree <[email protected]> * Minor edit in jenkins file Signed-off-by: Abhishree <[email protected]> * Edit in Jenkins file Signed-off-by: Abhishree <[email protected]> * Comment missed lines in Jenkins file 
Signed-off-by: Abhishree <[email protected]> * Fix precision and validation/test outputs 1) Add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py 2) Remove outputs args and add append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py 3) Add back resume_from_checkpoint in the megatron_t5_config.yaml 4) Comment out certain tests in Jenkins file Signed-off-by: Abhishree <[email protected]> * Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py Signed-off-by: Abhishree <[email protected]> * Precision fix and edit precision typo in all files 1) Account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py 2) Fix precision typo in all files Signed-off-by: Abhishree <[email protected]> * Fix all CI TTS tests and comment few Jenkins tests Signed-off-by: Abhishree <[email protected]> * Combine xx_epoch_end and on_xx_epoch_end Add on_inference_epoch_end to inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py Signed-off-by: Abhishree <[email protected]> * Add a missing comment in JenkinsFile Signed-off-by: Abhishree <[email protected]> * Add try except StopIteration in validation_step for models with dataloader_iter Signed-off-by: Abhishree <[email protected]> * Remove pyyaml from requirements Signed-off-by: Abhishree <[email protected]> * Add try except for inference_step in megatron_finetune_model.py Signed-off-by: Abhishree <[email protected]> * Remove limit_val_batches for mockGPTDataset test Signed-off-by: Abhishree <[email protected]> * Add new self.validation_step_outputs for MegatronGPTSFTModel Signed-off-by: Abhishree <[email protected]> * Minor edit Jenkinsfile Signed-off-by: Abhishree <[email protected]> * Initialize self.validation/test_step_outputs in megatron_gpt_sft_model.py Initialize self.validation/test_step_outputs in setup of 
MegatronGPTSFTModel to take care of cases when dataloaders are not set up in ModelPT, for example while restoring the model. Signed-off-by: Abhishree <[email protected]> * Remove resume_from_checkpoint if trainer arg in conf yaml files Signed-off-by: Abhishree <[email protected]> * Remove resume_from_checkpoint as trainer arg in GPT, T5 configs Signed-off-by: Abhishree <[email protected]> * Remove resume_from_checkpoint in duplex_tn_config.yaml Signed-off-by: Abhishree <[email protected]> * Fix typos, unused imports and refactor code to remove redundant funcs Signed-off-by: Abhishree <[email protected]> * Remove commented code in megatron_nmt_model.py Signed-off-by: Abhishree <[email protected]> * Fix overridden functions to match parent class functions Signed-off-by: Abhishree <[email protected]> * Prefetch dataloader_iter to prevent hang for PP>1 Signed-off-by: Abhishree <[email protected]> * Override setup() in NLPDDPStrategy to avoid hang during predict with PP>1 Signed-off-by: Abhishree <[email protected]> * Uncomment tests in JenkinsFile Signed-off-by: Abhishree <[email protected]> * Add '16' to precision checks and other minor fixes Signed-off-by: Abhishree <[email protected]> * Clear validation/test_step_outputs with dataloader_idx for multi dataloaders Signed-off-by: Abhishree <[email protected]> * Minor edits Signed-off-by: Abhishree <[email protected]> * Modify precision checks to avoid indexing Signed-off-by: Abhishree <[email protected]> * Remove self.validation_step_outputs_sft and add dataloader_idx to clear outputs Signed-off-by: Abhishree <[email protected]> * Reference checkpoint with trainer.ckpt_path Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add _prefetch to NLPModel and minor fixes Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add 
limit_val_batches in JenkinsFile for NMT 1) Add trainer.limit_val_batches in Megatron NMT Training TP=2 2) Remove unused import in ModelPT Signed-off-by: Abhishree <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: dorotat <[email protected]> * Include the scripts for preprocessing OAST and unit tests for chat sft datasets (#7112) * scripts for sft Signed-off-by: Yi Dong <[email protected]> * fix style Signed-off-by: Yi Dong <[email protected]> * added special token only for huggingface model Signed-off-by: Yi Dong <[email protected]> * change default name Signed-off-by: Yi Dong <[email protected]> * print out error datapoint content Signed-off-by: Yi Dong <[email protected]> * show error id Signed-off-by: Yi Dong <[email protected]> * annotation script working Signed-off-by: Yi Dong <[email protected]> * try to be compatible with huggingface tokenizer Signed-off-by: Yi Dong <[email protected]> * added examples Signed-off-by: Yi Dong <[email protected]> * added lang Signed-off-by: Yi Dong <[email protected]> * added lang Signed-off-by: Yi Dong <[email protected]> * text to value special case Signed-off-by: Yi Dong <[email protected]> * configure the slider Signed-off-by: Yi Dong <[email protected]> * annotation handles lang Signed-off-by: Yi Dong <[email protected]> * added the unit test for chat sft dataset Signed-off-by: Yi Dong <[email protected]> * used the file in the test dir Signed-off-by: Yi Dong <[email protected]> * fix json error Signed-off-by: Yi Dong <[email protected]> * load local tokenizer Signed-off-by: Yi Dong <[email protected]> * remove mask count check Signed-off-by: Yi Dong <[email protected]> * added HF dataset backend Signed-off-by: Yi Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Yi Dong <[email protected]> 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: dorotat <[email protected]> * add paths to labeler. (#7087) Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dorotat <[email protected]> * T5 metrics fix (#7037) * Fix race condition when executing with multi-node where some ranks do not wait for setup (#7016) Signed-off-by: Kim Ngo <[email protected]> Signed-off-by: jubick1337 <[email protected]> * Added bool types to neural_types export (#7032) Signed-off-by: tbartley94 <[email protected]> Signed-off-by: jubick1337 <[email protected]> * rnnt and char utils (#6971) * rnnt_ngram_merge Signed-off-by: Nikolay Karpov <[email protected]> * char level bug Signed-off-by: Nikolay Karpov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Nikolay Karpov <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: jubick1337 <[email protected]> * fix tab text gen (#7022) (#7031) Signed-off-by: Yi Dong <[email protected]> Co-authored-by: Yi Dong <[email protected]> Signed-off-by: jubick1337 <[email protected]> * Fixed kwargs for metric instance init Signed-off-by: jubick1337 <[email protected]> * Fixed kwargs for metric instance init Signed-off-by: jubick1337 <[email protected]> * removed kwargs Signed-off-by: jubick1337 <[email protected]> * Updated config desc Signed-off-by: jubick1337 <[email protected]> * ASR Confidence update and tutorial (#6810) * small fixes and tests Signed-off-by: Aleksandr Laptev <[email protected]> * various fixes for the tutorial Signed-off-by: Aleksandr Laptev <[email protected]> * tutorial added Signed-off-by: Aleksandr Laptev <[email protected]> * fix for a little oops after rebase Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix tests Signed-off-by: Aleksandr Laptev <[email protected]> * unused import removed Signed-off-by: Aleksandr Laptev <[email protected]> * fix review comments Signed-off-by: Aleksandr Laptev <[email protected]> * deprecated parameters for greedy configs Signed-off-by: Aleksandr Laptev <[email protected]> * move re-assigning to configs Signed-off-by: Aleksandr Laptev <[email protected]> * fix comments 2 Signed-off-by: Aleksandr Laptev <[email protected]> * fix config tests Signed-off-by: Aleksandr Laptev <[email protected]> * fix ece test (my env was bugged apparently) Signed-off-by: Aleksandr Laptev <[email protected]> * renamings for confidence ensemble Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix comments 3 Signed-off-by: Aleksandr Laptev <[email protected]> * return dropped tutorial Signed-off-by: Aleksandr Laptev <[email protected]> * CI flips back and forth, increasing tolerance Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: jubick1337 <[email protected]> * install_bs (#7019) (#7028) Signed-off-by: Nikolay Karpov <[email protected]> Co-authored-by: Nikolay Karpov <[email protected]> Signed-off-by: jubick1337 <[email protected]> * fixes for spellmapper (#6994) (#7000) Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: bene-ges <[email protected]> Co-authored-by: Evelina <[email protected]> Signed-off-by: jubick1337 <[email protected]> * added back the retro documents (#7033) Signed-off-by: Yi Dong <[email protected]> Signed-off-by: jubick1337 <[email protected]> * Remove pyyaml (#7052) (#7054) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar 
<[email protected]> Signed-off-by: jubick1337 <[email protected]> * st standalone model (#6969) * st standalone model Signed-off-by: AlexGrinch <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: AlexGrinch <[email protected]> * sacrebleu import fix, unused imports removed Signed-off-by: AlexGrinch <[email protected]> * import guard for nlp inside asr transformer bpe model Signed-off-by: AlexGrinch <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql fixes Signed-off-by: AlexGrinch <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * comments answered Signed-off-by: AlexGrinch <[email protected]> * import ordering fix Signed-off-by: AlexGrinch <[email protected]> * yttm for asr removed Signed-off-by: AlexGrinch <[email protected]> * logging added Signed-off-by: AlexGrinch <[email protected]> * added inference and translate method Signed-off-by: AlexGrinch <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: AlexGrinch <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: jubick1337 <[email protected]> * remove pos emb from state dict for old models (#7068) * remove pos emb from state dict Signed-off-by: Evelina <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to nlp_model Signed-off-by: Evelina <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update comment Signed-off-by: Evelina <[email protected]> * fix nmt test Signed-off-by: Evelina <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com 
hooks for more information, see https://pre-commit.ci * fix nmt test Signed-off-by: Evelina <[email protected]> --------- Signed-off-by: Evelina <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: jubick1337 <[email protected]> * Fix typo in ASR-TTS tutorial (#7049) Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: jubick1337 <[email protected]> * Fixed tutorial's name (#7047) Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Signed-off-by: jubick1337 <[email protected]> * Fix documentation for Numba (#7065) (#7077) * Fix documentation for Numba * Update force float32 flag dynamically * Update force float32 flag dynamically * Fix nemo version --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: jubick1337 <[email protected]> * Update Frame-VAD doc and fix onnx export (#7076) * update fvad doc Signed-off-by: stevehuang52 <[email protected]> * fix typo Signed-off-by: stevehuang52 <[email protected]> * update fvad example Signed-off-by: stevehuang52 <[email protected]> * update Signed-off-by: stevehuang52 <[email protected]> * fix onnx export Signed-off-by: stevehuang52 <[email protected]> * update test Signed-off-by: stevehuang52 <[email protected]> * refactor Signed-off-by: stevehuang52 <[email protected]> * update doc Signed-off-by: stevehuang52 <[email protected]> * update Signed-off-by: stevehuang52 <[email protected]> --------- Signed-off-by: stevehuang52 <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: jubick1337 <[email protected]> * memmap worker arg (#7062) * memmap worker arg Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update Signed-off-by: arendu <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: jubick1337 <[email protected]> * Fix caching bug in causal convolutions for cache-aware ASR models (#7034) (#7082) Co-authored-by: Vahid Noroozi <[email protected]> Signed-off-by: jubick1337 <[email protected]> * Fast Conformer global token fix (#7085) * old way Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * remove extra Signed-off-by: sam1373 <[email protected]> * clean Signed-off-by: sam1373 <[email protected]> * clean Signed-off-by: sam1373 <[email protected]> * clean Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * fix Signed-off-by: sam1373 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: sam1373 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: jubick1337 <[email protected]> * Refined export_config (#7053) (#7066) * Refined export_config * Rolling back hierarchy change --------- Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Signed-off-by: jubick1337 <[email protected]> * small Bugfix (#7081) * small Bugfix (#7079) * fix branch Signed-off-by: fayejf <[email protected]> * fix typo Signed-off-by: fayejf 
<[email protected]> * fix link Signed-off-by: fayejf <[email protected]> --------- Signed-off-by: fayejf <[email protected]> * Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb Signed-off-by: Somshubra Majumdar <[email protected]> * Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb Signed-off-by: Somshubra Majumdar <[email protected]> --------- Signed-off-by: fayejf <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: fayejf <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: jubick1337 <[email protected]> * Added script to extract ASR CTC and RNNT models from ASR hybrid models (#7092) * Added script to extract ctc and rnnt models from hybrid models Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updated hybrid extraction script for review request 1 Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updated hybrid convert script to remove --cuda flag Signed-off-by: Daniel Egert <[email protected]> --------- Signed-off-by: Daniel Egert <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: jubick1337 <[email protected]> * Adding docs and models for multiple lookahead cache-aware ASR (#7067) (#7094) Signed-off-by: jubick1337 <[email protected]> * update TTS readme (#7088) * update TTS readme Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: jubick1337 <[email protected]> * Fix absolute path in path join call (#7099) Signed-off-by: Jan Beckmann <[email protected]> Signed-off-by: jubick1337 <[email protected]> * Disable distopt contiguous param buffer by default (#7095) 
Signed-off-by: Tim Moon <[email protected]> Signed-off-by: jubick1337 <[email protected]> * microphone demo (#7110) Signed-off-by: Linnea Pari Leaver <[email protected]> Co-authored-by: Linnea Pari Leaver <[email protected]> Signed-off-by: jubick1337 <[email protected]> * [Fix] load_state_dict in nlp_model.py (#7086) * Fix load_state_dict in nlp_model.py Signed-off-by: He Huang (Steve) <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: He Huang (Steve) <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: jubick1337 <[email protected]> * Fix plot function in vad_utils.py (#7113) Fix plot function in vad_utils.py Signed-off-by: He Huang (Steve) <[email protected]> Signed-off-by: jubick1337 <[email protected]> * Fixed small bug with NoisePerturbationWithNormalization (#7118) Signed-off-by: Daniel Egert <[email protected]> Signed-off-by: jubick1337 <[email protected]> * Fix import guard checks (#7124) Signed-off-by: smajumdar <[email protected]> Signed-off-by: jubick1337 <[email protected]> * Revert "Fix import guard checks (#7124)" (#7125) This reverts commit ae7624da7d773a6b9436ff61903dc4b99c7c27cb. 
Signed-off-by: jubick1337 <[email protected]> * Fix import guard checks (#7126) * Fix import guard checks Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: jubick1337 <[email protected]> * Add updated fc ctc and rnnt xxl models (#7128) (#7130) Signed-off-by: jubick1337 <[email protected]> * [TTS] Create EnCodec training recipe (#6852) * [TTS] Create EnCodec training recipe Signed-off-by: Ryan <[email protected]> * [TTS] Update encodec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Rename EnCodec to AudioCodec Signed-off-by: Ryan <[email protected]> * [TTS] Add EnCodec unit tests Signed-off-by: Ryan <[email protected]> * [TTS] Add copyright header to distributed.py Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: jubick1337 <[email protected]> * Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (#7061) Signed-off-by: Kim Ngo <[email protected]> Co-authored-by: David <[email protected]> Signed-off-by: jubick1337 <[email protected]> * fix default attention size (#7141) (#7143) Signed-off-by: jubick1337 <[email protected]> * fix evaluator.py for various exceptions by ast (#7150) Signed-off-by: He Huang (Steve) <[email protected]> Signed-off-by: jubick1337 <[email protected]> * [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. (#6893) * [TTS] add Chinese TTS recipe based on IPA. * add new pinyin and ipa dictionaries with 36 finals. * add yaml configs for 24-final pinyin and ipa. * add copyright header * add a directory level 24finals to discriminate from 36 finals. 
Signed-off-by: Xuesong Yang <[email protected]> * unify configs into a single one and add detailed comments providing supported candidates. Signed-off-by: Xuesong Yang <[email protected]> * choose 36-final IPA as default phoneme dict Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: jubick1337 <[email protected]> * [TTS] Add output audio format to preprocessing (#6889) * [TTS] Add output audio format to preprocessing Signed-off-by: Ryan <[email protected]> * [TTS] Add format validation Signed-off-by: Ryan <[email protected]> * [TTS] Fix data tutorial Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: jubick1337 <[email protected]> * freeze (#7152) Signed-off-by: arendu <[email protected]> Signed-off-by: jubick1337 <[email protected]> * make sure any empty segments are removed (#7155) Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: jubick1337 <[email protected]> * Update RIR generation scripts (#6547) - fix: reduce room size if evaluation of params fails - added randomized mic placement - added diffuse noise generation - added an option to specify the format and subtype for saved audio Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: jubick1337 <[email protected]> * A quickstart speech enhancement tutorial (#6492) A simple example of training a model for speech enhancement task Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: jubick1337 <[email protected]> * NFA subtitle file config - specify colors and vertical alignment (#7160) * allow specifying colors of text in ASS subtitle file Signed-off-by: Elena Rastorgueva <[email protected]> * specify vertical_alignment instead of marginv in ass_file_config Signed-off-by: Elena Rastorgueva <[email protected]> * add documentation of CTMFileConfig and ASSFileConfig to NFA README Signed-off-by: Elena Rastorgueva <[email protected]> --------- Signed-off-by: 
Elena Rastorgueva <[email protected]> Signed-off-by: jubick1337 <[email protected]> * Eagerly accumulate embedding grads into fp32 buffer (#6958) (#7153) Signed-off-by: Tim Moon <[email protected]> Co-authored-by: Tim Moon <[email protected]> Signed-off-by: jubick1337 <[email protected]> * TE bug fix (#7027) (#7036) Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: jubick1337 <[email protected]> * [TTS] Remove nested TTS configs (#7154) * [TTS] Remove nested TTS configs Signed-off-by: Ryan <[email protected]> * [TTS] Modify tutorial to support multiple sampling rates Signed-off-by: Ryan <[email protected]> * [TTS] Clarify min_duration unit Signed-off-by: Ryan <[email protected]> * [TTS] Default 22.05kHz highfreq to null Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: jubick1337 <[email protected]> * Merge release r1.20.0 to main (#7167) * update package info Signed-off-by: ericharper <[email protected]> * Add ASR with TTS Tutorial. Fix enhancer usage. 
(#6955) * Add ASR with TTS Tutorial * Fix enhancer usage Signed-off-by: Vladimir Bataev <[email protected]> * install_bs (#7019) Signed-off-by: Nikolay Karpov <[email protected]> * Fix typo and branch in tutorial (#7048) Signed-off-by: Vladimir Bataev <[email protected]> * fix syntax error introduced in PR-7079 (#7102) * fix syntax error introduced in PR-7079 Signed-off-by: Alexandra Antonova <[email protected]> * fixes for pr review Signed-off-by: Alexandra Antonova <[email protected]> --------- Signed-off-by: Alexandra Antonova <[email protected]> * fix links for TN (#7117) Signed-off-by: Evelina <[email protected]> * update branch (#7135) Signed-off-by: ericharper <[email protected]> * Fixed main and merging this to r1.20 (#7127) * Fixed main and merging this to r1.20 Signed-off-by: Taejin Park <[email protected]> * Update vad_utils.py Signed-off-by: He Huang (Steve) <[email protected]> --------- Signed-off-by: Taejin Park <[email protected]> Signed-off-by: He Huang (Steve) <[email protected]> Co-authored-by: He Huang (Steve) <[email protected]> * update branch Signed-off-by: ericharper <[email protected]> * fix version Signed-off-by: ericharper <[email protected]> * resolve conflict the other way Signed-off-by: ericharper <[email protected]> * keep both Signed-off-by: ericharper <[email protected]> * revert keep both Signed-off-by: ericharper <[email protected]> --------- Signed-off-by: ericharper <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Alexandra Antonova <[email protected]> Signed-off-by: Evelina <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: He Huang (Steve) <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: Nikolay Karpov <[email protected]> Co-authored-by: bene-ges <[email protected]> Co-authored-by: Evelina <[email protected]> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: 
He Huang (Steve) <[email protected]> Signed-off-by: jubick1337 <[email protected]> * Upgrade to pytorch lightning 2.0 (#6433) * Upgrade pytorch lightning version in requirements Signed-off-by: Abhishree <[email protected]> * Initial fixes for PTL2.0 Signed-off-by: Abhishree <[email protected]> * Add further fixes to support lightning 2.0 Signed-off-by: Abhishree <[email protected]> * Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and a few occurrences of validation_epoch_end Signed-off-by: Abhishree <[email protected]> * Replace all occurrences of validation_epoch_end with on_validation_epoch_end Signed-off-by: Abhishree <[email protected]> * Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively Signed-off-by: Abhishree <[email protected]> * Change logger=None to logger=False in Trainer object Signed-off-by: Abhishree <[email protected]> * Remove PTL2.0 deprecated Trainer args from TrainerConfig dataclass Signed-off-by: Abhishree <[email protected]> * Modify trainer.precision check and other small edits Signed-off-by: Abhishree <[email protected]> * Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer Signed-off-by: Abhishree <[email protected]> * Add default values for args to fix AttributeError Signed-off-by: Abhishree <[email protected]> * Add the following modifications 1) Remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class 2) Replace resume_from_checkpoint with ckpt_path as needed 3) Explicitly add accelerator as 'CPU' in UTs being run on CPU Signed-off-by: Abhishree <[email protected]> * Remove outputs arg from on_validation_epoch_end, on_test_epoch_end Signed-off-by: Abhishree <[email protected]> * Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings Signed-off-by: Abhishree <[email protected]> * Add val, test outputs as instance vars in PunctuationCapitalizationModel and 
TokenClassificationModel Signed-off-by: Abhishree <[email protected]> * Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py Signed-off-by: Abhishree <[email protected]> * Revert an extra space that was mistakenly added Signed-off-by: Abhishree <[email protected]> * Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity Signed-off-by: Abhishree <[email protected]> * Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity Signed-off-by: Abhishree <[email protected]> * Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing Signed-off-by: Abhishree <[email protected]> * Remove outputs arg from on_train_epoch_end Signed-off-by: Abhishree <[email protected]> * Remove outputs from on_validation_epoch_end in multi_binary_acc.py Signed-off-by: Abhishree <[email protected]> * Remove output args from on_validation_epoch_end in the docstrings of some ASR files Signed-off-by: Abhishree <[email protected]> * Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs Signed-off-by: Abhishree <[email protected]> * Add on_validation_epoch_end and remove outputs args for nlp models Signed-off-by: Abhishree <[email protected]> * Append output of validation_step to validation_step_outputs in EncDecClassificationModel Signed-off-by: Abhishree <[email protected]> * Add the following changes 1) Index self.validation_step_outputs and self.test_step_outputs with dataloader_idx wherever needed 2) Initialize self.validation_step_outputs and self.test_step_outputs as empty lists and add support for multi dataloaders if they exist 3) Remove self.pre_configure_ddp from NLPDDPStrategy class as it's removed in PTL 2.0 Signed-off-by: Abhishree <[email protected]> * Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py 
Signed-off-by: Abhishree <[email protected]> * TypeCast precision to str in attention.py and utils_funcs.py to avoid TypeError Signed-off-by: Abhishree <[email protected]> * Add if condition check for multiple dataloaders when appending to validation outputs Signed-off-by: Abhishree <[email protected]> * Separate validation pass to be used with both validation_step and test_step Signed-off-by: Abhishree <[email protected]> * Add if condition check for multiple dataloader while appending to test_step_outputs in punctuation_capitalization_model.py Signed-off-by: Abhishree <[email protected]> * Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len Signed-off-by: Abhishree <[email protected]> * Comment Megatron T5 IA3 PP=2 in CI pipeline due to dataloader_iter issue with PTL 2.0 Signed-off-by: Abhishree <[email protected]> * Modify precision checks to account for 16-mixed and bf16-mixed Signed-off-by: Abhishree <[email protected]> * Append output of validation/test_step to self.validation/test_step_outputs in CTCG2PModel Signed-off-by: Abhishree <[email protected]> * Modify find_unused_parameters=True in g2p_heteronym model 1) Add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py 2) Remove args output in validation/test_step and add instance variables instead for heteronym_classification.py Signed-off-by: Abhishree <[email protected]> * Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel Signed-off-by: Abhishree <[email protected]> * Add validation/test outputs in sgdqa_model and modify dialogue_config.yaml Signed-off-by: Abhishree <[email protected]> * Add split arg self.test_step_outputs to TextClassificationModel Signed-off-by: Abhishree <[email protected]> * Add test_step_outputs to dialogue and text classification models Signed-off-by: Abhishree <[email protected]> * Change condition check for multiple dataloaders: 1) 
Replace ds_item as list in dialogue_config.yaml 2) Check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step 3) Check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_cpitalization_model.py Signed-off-by: Abhishree <[email protected]> * Add additional condition for multi dataloaders Check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list for multi dataloaders in validation/test_step Signed-off-by: Abhishree <[email protected]> * Add val step outputs and default val for dataloader_idx 1) Append validation_step outout to self.validation_step_outputs in MultiLabelIntentSlotClassificationMode 2) Add default val for dataloader_idx for on_test_batch_start/end in TimingCallback 3) Add self.validation/test_step_outputs in BERTQAModel and remove outputs arg Signed-off-by: Abhishree <[email protected]> * Add val/test_step_outputs to S2SQAModel and GPTQAModel Signed-off-by: Abhishree <[email protected]> * Edit JenkinsFile for bert_pretrainig.py Edit Jenkinsfile for this test to disable validation as a workaround for trainer.val_dataloader None error Signed-off-by: Abhishree <[email protected]> * Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py Signed-off-by: Abhishree <[email protected]> * Add ddp_find_unused_parameters_true and remove output args 1) Add ddp_find_unused_parameters_true fro trainer.strategy in self_alignment_pretraining.py as it has unused parameters 2) Remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py 3) Comment tests in JenkinsFile that need to be fixed Signed-off-by: Abhishree <[email protected]> * Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed Signed-off-by: Abhishree <[email protected]> * Precision fix for megatron_bert_pretraining.py and 
megatron_bert_model.py Signed-off-by: Abhishree <[email protected]> * Precision fix and validation/test_step_outputs 1) Add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py 2) Reset ckpt_path for test in enc_dec_nmt.py 3) Remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py 4) Comment Megatron Bert Pretraining and Resume Training with Pipeline Paralleism and add back NMT Training Post-LN Signed-off-by: Abhishree <[email protected]> * Precision fix and skip few failing tests Signed-off-by: Abhishree <[email protected]> * Add missing comment lines in JenkinsFile Signed-off-by: Abhishree <[email protected]> * Comment jenkin tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py Signed-off-by: Abhishree <[email protected]> * Minor edit JenkinsFile Signed-off-by: Abhishree <[email protected]> * Minor edit in jenkins file Signed-off-by: Abhishree <[email protected]> * Edit in Jenkins file Signed-off-by: Abhishree <[email protected]> * Comment missed lines in Jenkins file Signed-off-by: Abhishree <[email protected]> * Fix precision and validation/test outputs 1) Add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py 2) Remove outputs args and add append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py 3) Add back resume_from_checkpoint in the megatron_t5_config.yaml 4) Comment out certain tests in Jenkins file Signed-off-by: Abhishree <[email protected]> * Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py Signed-off-by: Abhishree <[email protected]> * Precision fix and edit precision typo in all files 1) Account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py 2) Fix precision typo in all files Signed-off-by: Abhishree <[email protected]> * Fix all CI TTS tests and comment few Jenkins tests Signed-off-by: 
Abhishree <[email protected]> * Combine xx_epoch_end and on_xx_epoch_end Add on_inference_epoch_end to inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py Signed-off-by: Abhishree <[email protected]> * Add a missing comment in JenkinsFile Signed-off-by: Abhishree <[email protected]> * Add try except StopIteration in validation_step for models with dataloader_iter Signed-off-by: Abhishree <[email protected]> * Remove pyyaml from requirements Signed-off-by: Abhishree <[email protected]> * Add try except for inference_step in megatron_finetune_model.py Signed-off-by: Abhishree <[email protected]> * Remove limit_val_batches for mockGPTDataset test Signed-off-by: Abhishree <[email protected]> * Add new self.validation_step_outputs for MegatronGPTSFTModel Signed-off-by: Abhishree <[email protected]> * Minor edit Jenkinsfile Signed-off-by: Abhishree <[email protected]> * Initialize self.validation/test_step_outputs in megatron_gpt_sft_model.py Initialize self.validation/test_step_outputs in setup of MegatronGPTSFTModel to take care of cases when datalaoders are not setup in ModelPT for example while restoring the model. Signed-off-by: Abhishree <[email protected]> * Remove resume_from_checkpoint if trainer arg in conf yaml files Signed-off-by: Abhishree <[email protected]> * Remove resume_from_checkpoint as trainer arg in GPT, T5 configs Signed-off-by: Abhishree <abhishreetm@gmai…
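
For readers following the PTL 2.0 commit message above, here is a minimal sketch of the pattern it describes: the epoch-end hook loses its `outputs` argument, each step appends to an instance list, and that list is indexed by `dataloader_idx` when there are several dataloaders. The class `ToyModule` and its `num_val_dataloaders` parameter are hypothetical stand-ins written in plain Python. This is not NeMo or Lightning code, and the real hook returns nothing (the mean is returned here only so it can be inspected).

```python
from typing import Dict, List


class ToyModule:
    """Hypothetical plain-Python stand-in for a LightningModule (no Lightning import)."""

    def __init__(self, num_val_dataloaders: int = 1) -> None:
        self.num_val_dataloaders = num_val_dataloaders
        # Single dataloader: one flat list. Several dataloaders: one list per dataloader.
        if num_val_dataloaders > 1:
            self.validation_step_outputs = [[] for _ in range(num_val_dataloaders)]
        else:
            self.validation_step_outputs = []

    def validation_step(
        self, batch: List[float], batch_idx: int, dataloader_idx: int = 0
    ) -> Dict[str, float]:
        # Toy "loss": the mean of the batch values.
        output = {"val_loss": sum(batch) / len(batch)}
        # Store the step output on the instance instead of relying on an
        # `outputs` argument being passed to the epoch-end hook.
        if self.num_val_dataloaders > 1:
            self.validation_step_outputs[dataloader_idx].append(output)
        else:
            self.validation_step_outputs.append(output)
        return output

    def on_validation_epoch_end(self) -> float:
        # No `outputs` parameter: read the stored step outputs.
        if self.num_val_dataloaders > 1:
            flat = [o for per_dl in self.validation_step_outputs for o in per_dl]
        else:
            flat = list(self.validation_step_outputs)
        mean_loss = sum(o["val_loss"] for o in flat) / len(flat)
        # Clear the stored outputs so they do not accumulate across epochs.
        if self.num_val_dataloaders > 1:
            for per_dl in self.validation_step_outputs:
                per_dl.clear()
        else:
            self.validation_step_outputs.clear()
        return mean_loss
```

The explicit `clear()` at the end mirrors the commits that add `self.validation_step_outputs.clear()` "wherever missing": without it the list would keep growing from one validation epoch to the next.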

github-actions bot added the NLP label Aug 24, 2023

dorotat-nv and others added 29 commits August 24, 2023 15:22

migrated class

3c1e8f8

Signed-off-by: dorotat <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

c8b80b2

for more information, see https://pre-commit.ci Signed-off-by: dorotat <[email protected]>

added unit test

21ac92f

Signed-off-by: dorotat <[email protected]>

Fix caching bug in causal convolutions for cache-aware ASR models (NV…

1ea9bf5

…IDIA#7034) (NVIDIA#7082) Co-authored-by: Vahid Noroozi <[email protected]> Signed-off-by: dorotat <[email protected]>

Refined export_config (NVIDIA#7053) (NVIDIA#7066)

0a22846

* Refined export_config * Rolling back hierarchy change --------- Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Signed-off-by: dorotat <[email protected]>

Adding docs and models for multiple lookahead cache-aware ASR (NVIDIA…

9c6c494

…#7067) (NVIDIA#7094) Signed-off-by: dorotat <[email protected]>

update TTS readme (NVIDIA#7088)

1a5a32f

* update TTS readme Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dorotat <[email protected]>

Fix absolute path in path join call (NVIDIA#7099)

abc48d6

Signed-off-by: Jan Beckmann <[email protected]> Signed-off-by: dorotat <[email protected]>

Disable distopt contiguous param buffer by default (NVIDIA#7095)

bd30112

Signed-off-by: Tim Moon <[email protected]> Signed-off-by: dorotat <[email protected]>

microphone demo (NVIDIA#7110)

274c66e

Signed-off-by: Linnea Pari Leaver <[email protected]> Co-authored-by: Linnea Pari Leaver <[email protected]> Signed-off-by: dorotat <[email protected]>

Fix plot function in vad_utils.py (NVIDIA#7113)

7f9d91c

Fix plot function in vad_utils.py Signed-off-by: He Huang (Steve) <[email protected]> Signed-off-by: dorotat <[email protected]>

Fixed small bug with NoisePerturbationWithNormalization (NVIDIA#7118)

5f17ceb

Signed-off-by: Daniel Egert <[email protected]> Signed-off-by: dorotat <[email protected]>

Fix import guard checks (NVIDIA#7124)

83074c2

Signed-off-by: smajumdar <[email protected]> Signed-off-by: dorotat <[email protected]>

Revert "Fix import guard checks (NVIDIA#7124)" (NVIDIA#7125)

9a35a3c

This reverts commit a46e325. Signed-off-by: dorotat <[email protected]>

Add updated fc ctc and rnnt xxl models (NVIDIA#7128) (NVIDIA#7130)

4c469f9

Signed-off-by: dorotat <[email protected]>

Fix rank where torch.distributed may not be initialized yet and would…

e05a0c6

… not wait for tokenizer file caching (NVIDIA#7061) Signed-off-by: Kim Ngo <[email protected]> Co-authored-by: David <[email protected]> Signed-off-by: dorotat <[email protected]>

fix default attention size (NVIDIA#7141) (NVIDIA#7143)

25517ca

Signed-off-by: dorotat <[email protected]>

fix evaluator.py for various exceptions by ast (NVIDIA#7150)

ae7ebac

Signed-off-by: He Huang (Steve) <[email protected]> Signed-off-by: dorotat <[email protected]>

freeze (NVIDIA#7152)

90acab5

Signed-off-by: arendu <[email protected]> Signed-off-by: dorotat <[email protected]>

make sure any empty segments are removed (NVIDIA#7155)

a1e65bf

Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: dorotat <[email protected]>

guyueh1 and others added 10 commits August 24, 2023 15:22

remove additional line (NVIDIA#7293)

0f0bb53

Signed-off-by: Jason Wang <[email protected]> Signed-off-by: dorotat <[email protected]>

remove deprecated scripts from ci (NVIDIA#7239)

daf65ae

Signed-off-by: arendu <[email protected]> Signed-off-by: dorotat <[email protected]>

add log_model to MLFlowParams (NVIDIA#7258)

95df78d

Signed-off-by: Alireza Morsali <[email protected]> Signed-off-by: dorotat <[email protected]>

[TTS] minor fix typos and input_types (NVIDIA#7272)

a0e7f48

Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dorotat <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

849cab3

for more information, see https://pre-commit.ci Signed-off-by: dorotat <[email protected]>

improved docs

6dd3e73

Signed-off-by: dorotat <[email protected]>

dorotat-nv force-pushed the dorotat/migrate-from-bionemo-csvfieldsmemmapdataset branch from e8cc235 to 6dd3e73 Compare August 24, 2023 13:22

github-actions bot added the core (Changes to NeMo Core), TTS, ASR, CI, and common labels Aug 24, 2023

Merge branch 'main' into dorotat/migrate-from-bionemo-csvfieldsmemmap…

6ec1b0c

…dataset

github-actions bot removed the core (Changes to NeMo Core), TTS, ASR, CI, and common labels Aug 24, 2023

Davood-M previously approved these changes Aug 24, 2023

fixed unit test

9db1364

dorotat-nv dismissed Davood-M’s stale review via 9db1364 August 24, 2023 16:09

pre-commit-ci bot and others added 2 commits August 24, 2023 16:10

[pre-commit.ci] auto fixes from pre-commit.com hooks

99e70d2

for more information, see https://pre-commit.ci

Merge branch 'main' into dorotat/migrate-from-bionemo-csvfieldsmemmap…

965c96c

…dataset

Davood-M approved these changes Aug 24, 2023

Davood-M merged commit 11c0b2a into NVIDIA:main Aug 24, 2023
11 checks passed
