T5 metrics fix (NVIDIA#7037)
* Fix race condition when executing with multi-node where some ranks do not wait for setup (NVIDIA#7016)

Signed-off-by: Kim Ngo <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* Added bool types to neural_types export (NVIDIA#7032)

Signed-off-by: tbartley94 <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* rnnt and char utils (NVIDIA#6971)

* rnnt_ngram_merge

Signed-off-by: Nikolay Karpov <[email protected]>

* char level bug

Signed-off-by: Nikolay Karpov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Nikolay Karpov <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* fix tab text gen (NVIDIA#7022) (NVIDIA#7031)

Signed-off-by: Yi Dong <[email protected]>
Co-authored-by: Yi Dong <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* Fixed kwargs for metric instance init

Signed-off-by: jubick1337 <[email protected]>

* Fixed kwargs for metric instance init

Signed-off-by: jubick1337 <[email protected]>

* removed kwargs

Signed-off-by: jubick1337 <[email protected]>

* Updated config desc

Signed-off-by: jubick1337 <[email protected]>

* ASR Confidence update and tutorial (NVIDIA#6810)

* small fixes and tests

Signed-off-by: Aleksandr Laptev <[email protected]>

* various fixes for the tutorial

Signed-off-by: Aleksandr Laptev <[email protected]>

* tutorial added

Signed-off-by: Aleksandr Laptev <[email protected]>

* fix for a little oops after rebase

Signed-off-by: Aleksandr Laptev <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tests

Signed-off-by: Aleksandr Laptev <[email protected]>

* unused import removed

Signed-off-by: Aleksandr Laptev <[email protected]>

* fix review comments

Signed-off-by: Aleksandr Laptev <[email protected]>

* deprecated parameters for greedy configs

Signed-off-by: Aleksandr Laptev <[email protected]>

* move re-assigning to configs

Signed-off-by: Aleksandr Laptev <[email protected]>

* fix comments 2

Signed-off-by: Aleksandr Laptev <[email protected]>

* fix config tests

Signed-off-by: Aleksandr Laptev <[email protected]>

* fix ece test (my env was bugged apparently)

Signed-off-by: Aleksandr Laptev <[email protected]>

* renamings for confidence ensemble

Signed-off-by: Aleksandr Laptev <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comments 3

Signed-off-by: Aleksandr Laptev <[email protected]>

* return dropped tutorial

Signed-off-by: Aleksandr Laptev <[email protected]>

* CI flips back and forth, increasing tolerance

Signed-off-by: Aleksandr Laptev <[email protected]>

---------

Signed-off-by: Aleksandr Laptev <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <[email protected]>

* install_bs (NVIDIA#7019) (NVIDIA#7028)

Signed-off-by: Nikolay Karpov <[email protected]>
Co-authored-by: Nikolay Karpov <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* fixes for spellmapper (NVIDIA#6994) (NVIDIA#7000)

Signed-off-by: Alexandra Antonova <[email protected]>
Co-authored-by: bene-ges <[email protected]>
Co-authored-by: Evelina <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* added back the retro documents (NVIDIA#7033)

Signed-off-by: Yi Dong <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* Remove pyyaml (NVIDIA#7052) (NVIDIA#7054)

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* st standalone model (NVIDIA#6969)

* st standalone model

Signed-off-by: AlexGrinch <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: AlexGrinch <[email protected]>

* sacrebleu import fix, unused imports removed

Signed-off-by: AlexGrinch <[email protected]>

* import guard for nlp inside asr transformer bpe model

Signed-off-by: AlexGrinch <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql fixes

Signed-off-by: AlexGrinch <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comments answered

Signed-off-by: AlexGrinch <[email protected]>

* import ordering fix

Signed-off-by: AlexGrinch <[email protected]>

* yttm for asr removed

Signed-off-by: AlexGrinch <[email protected]>

* logging added

Signed-off-by: AlexGrinch <[email protected]>

* added inference and translate method

Signed-off-by: AlexGrinch <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: AlexGrinch <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <[email protected]>

* remove pos emb from state dict for old models (NVIDIA#7068)

* remove pos emb from state dict

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to nlp_model

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update comment

Signed-off-by: Evelina <[email protected]>

* fix nmt test

Signed-off-by: Evelina <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix nmt test

Signed-off-by: Evelina <[email protected]>

---------

Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <[email protected]>

* Fix typo in ASR-TTS tutorial (NVIDIA#7049)

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* Fixed tutorial's name (NVIDIA#7047)

Signed-off-by: Vitaly Lavrukhin <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* Fix documentation for Numba (NVIDIA#7065) (NVIDIA#7077)

* Fix documentation for Numba

* Update force float32 flag dynamically

* Update force float32 flag dynamically

* Fix nemo version

---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* Update Frame-VAD doc and fix onnx export (NVIDIA#7076)

* update fvad doc

Signed-off-by: stevehuang52 <[email protected]>

* fix typo

Signed-off-by: stevehuang52 <[email protected]>

* update fvad example

Signed-off-by: stevehuang52 <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

* fix onnx export

Signed-off-by: stevehuang52 <[email protected]>

* update test

Signed-off-by: stevehuang52 <[email protected]>

* refactor

Signed-off-by: stevehuang52 <[email protected]>

* update doc

Signed-off-by: stevehuang52 <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: fayejf <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* memmap worker arg (NVIDIA#7062)

* memmap worker arg

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <[email protected]>

* update

Signed-off-by: arendu <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <[email protected]>

* Fix caching bug in causal convolutions for cache-aware ASR models (NVIDIA#7034) (NVIDIA#7082)

Co-authored-by: Vahid Noroozi <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* Fast Conformer global token fix (NVIDIA#7085)

* old way

Signed-off-by: sam1373 <[email protected]>

* fix

Signed-off-by: sam1373 <[email protected]>

* fix

Signed-off-by: sam1373 <[email protected]>

* fix

Signed-off-by: sam1373 <[email protected]>

* remove extra

Signed-off-by: sam1373 <[email protected]>

* clean

Signed-off-by: sam1373 <[email protected]>

* clean

Signed-off-by: sam1373 <[email protected]>

* clean

Signed-off-by: sam1373 <[email protected]>

* fix

Signed-off-by: sam1373 <[email protected]>

* fix

Signed-off-by: sam1373 <[email protected]>

* fix

Signed-off-by: sam1373 <[email protected]>

* fix

Signed-off-by: sam1373 <[email protected]>

* fix

Signed-off-by: sam1373 <[email protected]>

* fix

Signed-off-by: sam1373 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: sam1373 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <[email protected]>

* Refined export_config (NVIDIA#7053) (NVIDIA#7066)

* Refined export_config
* Rolling back hierarchy change
---------

Signed-off-by: Boris Fomitchev <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* small Bugfix (NVIDIA#7081)

* small Bugfix (NVIDIA#7079)

* fix branch

Signed-off-by: fayejf <[email protected]>

* fix typo

Signed-off-by: fayejf <[email protected]>

* fix link

Signed-off-by: fayejf <[email protected]>

---------

Signed-off-by: fayejf <[email protected]>

* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Signed-off-by: Somshubra Majumdar <[email protected]>

* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Signed-off-by: Somshubra Majumdar <[email protected]>

---------

Signed-off-by: fayejf <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* Added script to extract ASR CTC and RNNT models from ASR hybrid models (NVIDIA#7092)

* Added script to extract ctc and rnnt models from hybrid models

Signed-off-by: Daniel Egert <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid extraction script for review request 1

Signed-off-by: Daniel Egert <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid convert script to remove --cuda flag

Signed-off-by: Daniel Egert <[email protected]>

---------

Signed-off-by: Daniel Egert <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* Adding docs and models for multiple lookahead cache-aware ASR (NVIDIA#7067) (NVIDIA#7094)

Signed-off-by: jubick1337 <[email protected]>

* update TTS readme (NVIDIA#7088)

* update TTS readme

Signed-off-by: Xuesong Yang <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* Fix absolute path in path join call (NVIDIA#7099)

Signed-off-by: Jan Beckmann <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* Disable distopt contiguous param buffer by default (NVIDIA#7095)

Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* microphone demo (NVIDIA#7110)

Signed-off-by: Linnea Pari Leaver <[email protected]>
Co-authored-by: Linnea Pari Leaver <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* [Fix] load_state_dict in nlp_model.py (NVIDIA#7086)

* Fix load_state_dict in nlp_model.py

Signed-off-by: He Huang (Steve) <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: He Huang (Steve) <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <[email protected]>

* Fix plot function in vad_utils.py (NVIDIA#7113)

Fix plot function in vad_utils.py

Signed-off-by: He Huang (Steve) <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* Fixed small bug with NoisePerturbationWithNormalization (NVIDIA#7118)

Signed-off-by: Daniel Egert <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* Fix import guard checks (NVIDIA#7124)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* Revert "Fix import guard checks (NVIDIA#7124)" (NVIDIA#7125)

This reverts commit a46e325.

Signed-off-by: jubick1337 <[email protected]>

* Fix import guard checks (NVIDIA#7126)

* Fix import guard checks

Signed-off-by: smajumdar <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <[email protected]>

* Add updated fc ctc and rnnt xxl models (NVIDIA#7128) (NVIDIA#7130)

Signed-off-by: jubick1337 <[email protected]>

* [TTS] Create EnCodec training recipe (NVIDIA#6852)

* [TTS] Create EnCodec training recipe

Signed-off-by: Ryan <[email protected]>

* [TTS] Update encodec recipe

Signed-off-by: Ryan <[email protected]>

* [TTS] Rename EnCodec to AudioCodec

Signed-off-by: Ryan <[email protected]>

* [TTS] Add EnCodec unit tests

Signed-off-by: Ryan <[email protected]>

* [TTS] Add copyright header to distributed.py

Signed-off-by: Ryan <[email protected]>

---------

Signed-off-by: Ryan <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (NVIDIA#7061)

Signed-off-by: Kim Ngo <[email protected]>
Co-authored-by: David <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* fix default attention size (NVIDIA#7141) (NVIDIA#7143)

Signed-off-by: jubick1337 <[email protected]>

* fix evaluator.py for various exceptions by ast (NVIDIA#7150)

Signed-off-by: He Huang (Steve) <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. (NVIDIA#6893)

* [TTS] add Chinese TTS recipe based on IPA.
* add new pinyin and ipa dictionaries with 36 finals.
* add yaml configs for 24-final pinyin and ipa.
* add copyright header
* add a directory level 24finals to discriminate from 36 finals.

Signed-off-by: Xuesong Yang <[email protected]>

* unify configs into a single one and add detailed comments providing supported candidates.

Signed-off-by: Xuesong Yang <[email protected]>

* choose 36-final IPA as default phoneme dict

Signed-off-by: Xuesong Yang <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* [TTS] Add output audio format to preprocessing (NVIDIA#6889)

* [TTS] Add output audio format to preprocessing

Signed-off-by: Ryan <[email protected]>

* [TTS] Add format validation

Signed-off-by: Ryan <[email protected]>

* [TTS] Fix data tutorial

Signed-off-by: Ryan <[email protected]>

---------

Signed-off-by: Ryan <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* freeze (NVIDIA#7152)

Signed-off-by: arendu <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* make sure any empty segments are removed (NVIDIA#7155)

Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* Update RIR generation scripts (NVIDIA#6547)

- fix: reduce room size if evaluation of params fails
- added randomized mic placement
- added diffuse noise generation
- added an option to specify the format and subtype for saved audio

Signed-off-by: Ante Jukić <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* A quickstart speech enhancement tutorial (NVIDIA#6492)

A simple example of training a model for speech enhancement task

Signed-off-by: Ante Jukić <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* NFA subtitle file config - specify colors and vertical alignment (NVIDIA#7160)

* allow specifying colors of text in ASS subtitle file

Signed-off-by: Elena Rastorgueva <[email protected]>

* specify vertical_alignment instead of marginv in ass_file_config

Signed-off-by: Elena Rastorgueva <[email protected]>

* add documentation of CTMFileConfig and ASSFileConfig to NFA README

Signed-off-by: Elena Rastorgueva <[email protected]>

---------

Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* Eagerly accumulate embedding grads into fp32 buffer (NVIDIA#6958) (NVIDIA#7153)

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* TE bug fix (NVIDIA#7027) (NVIDIA#7036)

Signed-off-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* [TTS] Remove nested TTS configs (NVIDIA#7154)

* [TTS] Remove nested TTS configs

Signed-off-by: Ryan <[email protected]>

* [TTS] Modify tutorial to support multiple sampling rates

Signed-off-by: Ryan <[email protected]>

* [TTS] Clarify min_duration unit

Signed-off-by: Ryan <[email protected]>

* [TTS] Default 22.05kHz highfreq to null

Signed-off-by: Ryan <[email protected]>

---------

Signed-off-by: Ryan <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* Merge release r1.20.0 to main (NVIDIA#7167)

* update package info

Signed-off-by: ericharper <[email protected]>

* Add ASR with TTS Tutorial. Fix enhancer usage. (NVIDIA#6955)

* Add ASR with TTS Tutorial
* Fix enhancer usage

Signed-off-by: Vladimir Bataev <[email protected]>

* install_bs (NVIDIA#7019)

Signed-off-by: Nikolay Karpov <[email protected]>

* Fix typo and branch in tutorial (NVIDIA#7048)

Signed-off-by: Vladimir Bataev <[email protected]>

* fix syntax error introduced in PR-7079 (NVIDIA#7102)

* fix syntax error introduced in PR-7079

Signed-off-by: Alexandra Antonova <[email protected]>

* fixes for pr review

Signed-off-by: Alexandra Antonova <[email protected]>

---------

Signed-off-by: Alexandra Antonova <[email protected]>

* fix links for TN (NVIDIA#7117)

Signed-off-by: Evelina <[email protected]>

* update branch (NVIDIA#7135)

Signed-off-by: ericharper <[email protected]>

* Fixed main and merging this to r1.20 (NVIDIA#7127)

* Fixed main and merging this to r1.20

Signed-off-by: Taejin Park <[email protected]>

* Update vad_utils.py

Signed-off-by: He Huang (Steve) <[email protected]>

---------

Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: He Huang (Steve) <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>

* update branch

Signed-off-by: ericharper <[email protected]>

* fix version

Signed-off-by: ericharper <[email protected]>

* resolve conflict the other way

Signed-off-by: ericharper <[email protected]>

* keep both

Signed-off-by: ericharper <[email protected]>

* revert keep both

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Alexandra Antonova <[email protected]>
Signed-off-by: Evelina <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: He Huang (Steve) <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Nikolay Karpov <[email protected]>
Co-authored-by: bene-ges <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* Upgrade to pytorch lightning 2.0 (NVIDIA#6433)

* Upgrade pytorch lightning version in requirements

Signed-off-by: Abhishree <[email protected]>

* Initial fixes for PTL2.0

Signed-off-by: Abhishree <[email protected]>

* Add further fixes to support lightning 2.0

Signed-off-by: Abhishree <[email protected]>

* Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and a few occurrences of validation_epoch_end

Signed-off-by: Abhishree <[email protected]>

* Replace all occurrences of validation_epoch_end with on_validation_epoch_end

Signed-off-by: Abhishree <[email protected]>

* Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively

Signed-off-by: Abhishree <[email protected]>

* Change logger=None to logger=False in Trainer object

Signed-off-by: Abhishree <[email protected]>

* Remove PTL2.0 deprecated Trainer args from TrainerConfig dataclass

Signed-off-by: Abhishree <[email protected]>

* Modify trainer.precision check and other small edits

Signed-off-by: Abhishree <[email protected]>

* Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer

Signed-off-by: Abhishree <[email protected]>

* Add default values for args to fix Attribute Error

Signed-off-by: Abhishree <[email protected]>

* Add the following modifications

1) Remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class
2) Replace resume_from_checkpoint with ckpt_path as needed
3) Explicitly add accelerator as 'CPU' in UTs being run on CPU
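
The hook change in item 1 can be sketched roughly as follows. This is a plain-Python stand-in, not a real LightningModule: the class name `ToyModel`, the `val_loss` key and the value returned from the epoch-end hook are illustrative assumptions; only the `validation_step_outputs` attribute name and the hook names come from this log.

```python
class ToyModel:
    """Stand-in for a LightningModule; it only mimics the PTL 2.0 hook signatures."""

    def __init__(self):
        # The epoch-end hook no longer receives `outputs`,
        # so per-step results are kept on the instance instead.
        self.validation_step_outputs = []

    def validation_step(self, batch, batch_idx):
        loss = sum(batch) / len(batch)  # placeholder "loss" for the sketch
        self.validation_step_outputs.append({"val_loss": loss})
        return loss

    def on_validation_epoch_end(self):  # note: no `outputs` argument
        losses = [o["val_loss"] for o in self.validation_step_outputs]
        mean_loss = sum(losses) / len(losses)
        self.validation_step_outputs.clear()  # free memory before the next epoch
        return mean_loss  # returned only so the sketch can be inspected


model = ToyModel()
for idx, batch in enumerate([[1.0, 3.0], [2.0, 6.0]]):
    model.validation_step(batch, idx)
epoch_loss = model.on_validation_epoch_end()
```

Clearing the list at the end of the hook matters because nothing else resets it between epochs.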

Signed-off-by: Abhishree <[email protected]>

* Remove outputs arg from on_validation_epoch_end, on_test_epoch_end

Signed-off-by: Abhishree <[email protected]>

* Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings

Signed-off-by: Abhishree <[email protected]>

* Add val, test outputs as instance vars in PunctuationCapitalizationModel and TokenClassificationModel

Signed-off-by: Abhishree <[email protected]>

* Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py

Signed-off-by: Abhishree <[email protected]>

* Revert an extra space that was mistakenly added

Signed-off-by: Abhishree <[email protected]>

* Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity

Signed-off-by: Abhishree <[email protected]>

* Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity

Signed-off-by: Abhishree <[email protected]>

* Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing

Signed-off-by: Abhishree <[email protected]>

* Remove outputs arg from on_train_epoch_end

Signed-off-by: Abhishree <[email protected]>

* Remove outputs from on_validation_epoch_end in multi_binary_acc.py

Signed-off-by: Abhishree <[email protected]>

* Remove output args from on_validation_epoch_end in the docstrings of some ASR files

Signed-off-by: Abhishree <[email protected]>

* Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs

Signed-off-by: Abhishree <[email protected]>

* Add on_validation_epoch_end and remove outputs args for nlp models

Signed-off-by: Abhishree <[email protected]>

* Append output of validation_step to validation_step_outputs in EncDecClassificationModel

Signed-off-by: Abhishree <[email protected]>

* Add the following changes

1) Index self.validation_step_outputs and self.test_step_outputs with dataloader_idx wherever needed
2) Initialize self.validation_step_outputs and self.test_step_outputs as empty lists and add support for multi dataloaders if they exist
3) Remove self.pre_configure_ddp from NLPDDPStrategy class as it is removed in PTL 2.0

Signed-off-by: Abhishree <[email protected]>

* Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py

Signed-off-by: Abhishree <[email protected]>

* TypeCast precision to str in attention.py and utils_funcs.py to avoid TypeError

Signed-off-by: Abhishree <[email protected]>

* Add if condition check for multiple dataloaders when appending to validation outputs

Signed-off-by: Abhishree <[email protected]>

* Separate validation pass to be used with both validation_step and test_step

Signed-off-by: Abhishree <[email protected]>

* Add if condition check for multiple dataloader while appending to test_step_outputs in punctuation_capitalization_model.py

Signed-off-by: Abhishree <[email protected]>

* Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len

Signed-off-by: Abhishree <[email protected]>

* Comment Megatron T5 IA3 PP=2 in CI pipeline due to dataloader_iter issue with PTL 2.0

Signed-off-by: Abhishree <[email protected]>

* Modify precision checks to account for 16-mixed and bf16-mixed
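
A minimal sketch of what such a check might look like, assuming the precision value can arrive either as an int (16) or as a string ("16-mixed", "bf16-mixed"). The helper name `precision_kind` and its return labels are hypothetical and are not taken from the NeMo code.

```python
# Precision spellings treated as half precision in this sketch.
FP16_VALUES = {"16", "16-mixed"}
BF16_VALUES = {"bf16", "bf16-mixed"}


def precision_kind(precision):
    """Classify a trainer precision value as 'fp16', 'bf16' or 'other'."""
    key = str(precision)  # cast first so that 16 and "16" compare equal
    if key in FP16_VALUES:
        return "fp16"
    if key in BF16_VALUES:
        return "bf16"
    return "other"
```

Comparing the whole string against a set avoids indexing into the value, which would fail for an int.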

Signed-off-by: Abhishree <[email protected]>

* Append output of validation/test_step to self.validation/test_step_outputs in CTCG2PModel

Signed-off-by: Abhishree <[email protected]>

* Modify find_unused_parameters=True in g2p_heteronym model

1) Add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py
2) Remove args output in validation/test_step and add instance variables instead for heteronym_classification.py

Signed-off-by: Abhishree <[email protected]>

* Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel

Signed-off-by: Abhishree <[email protected]>

* Add validation/test outputs in sgdqa_model and modify dialogue_config.yaml

Signed-off-by: Abhishree <[email protected]>

* Add split arg self.test_step_outputs to TextClassificationModel

Signed-off-by: Abhishree <[email protected]>

* Add test_step_outputs to dialogue and text classification models

Signed-off-by: Abhishree <[email protected]>

* Change condition check for multiple dataloaders:

1) Replace ds_item as list in dialogue_config.yaml
2) Check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step
3) Check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_capitalization_model.py

Signed-off-by: Abhishree <[email protected]>

* Add additional condition for multi dataloaders

Check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list for multi dataloaders in validation/test_step
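
The condition described above could be sketched like this. The function names are hypothetical, the dataloaders are represented by plain strings, and `isinstance` is used here in place of the `type(...) == list` comparison mentioned in the message.

```python
def is_multi_dataloader(dataloaders):
    """True only for a list that holds more than one dataloader."""
    return isinstance(dataloaders, list) and len(dataloaders) > 1


def init_step_outputs(dataloaders):
    """One output list per dataloader in the multi case, otherwise a flat list."""
    if is_multi_dataloader(dataloaders):
        return [[] for _ in dataloaders]
    return []


def append_step_output(step_outputs, dataloaders, output, dataloader_idx=0):
    """Store a step result, indexing by dataloader_idx only in the multi case."""
    if is_multi_dataloader(dataloaders):
        step_outputs[dataloader_idx].append(output)
    else:
        step_outputs.append(output)


multi = ["dl_a", "dl_b"]
multi_outputs = init_step_outputs(multi)
append_step_output(multi_outputs, multi, 0.5, dataloader_idx=1)

single = ["dl_a"]
single_outputs = init_step_outputs(single)
append_step_output(single_outputs, single, 0.25)
```

Checking both the type and the length keeps a one-element list on the flat-list path, so single-dataloader models do not need to index by `dataloader_idx`.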

Signed-off-by: Abhishree <[email protected]>

* Add val step outputs and default val for dataloader_idx

1) Append validation_step output to self.validation_step_outputs in MultiLabelIntentSlotClassificationModel
2) Add default val for dataloader_idx for on_test_batch_start/end in TimingCallback
3) Add self.validation/test_step_outputs in BERTQAModel and remove outputs arg

Signed-off-by: Abhishree <[email protected]>

* Add val/test_step_outputs to S2SQAModel and GPTQAModel

Signed-off-by: Abhishree <[email protected]>

* Edit JenkinsFile for bert_pretraining.py

Edit Jenkinsfile for this test to disable validation as a workaround for trainer.val_dataloader None error

Signed-off-by: Abhishree <[email protected]>

* Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py

Signed-off-by: Abhishree <[email protected]>

* Add ddp_find_unused_parameters_true and remove output args

1) Add ddp_find_unused_parameters_true for trainer.strategy in self_alignment_pretraining.py as it has unused parameters
2) Remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py
3) Comment tests in JenkinsFile that need to be fixed

Signed-off-by: Abhishree <[email protected]>

* Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed

Signed-off-by: Abhishree <[email protected]>

* Precision fix for megatron_bert_pretraining.py and megatron_bert_model.py

Signed-off-by: Abhishree <[email protected]>

* Precision fix and validation/test_step_outputs

1) Add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py
2) Reset ckpt_path for test in enc_dec_nmt.py
3) Remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py
4) Comment Megatron Bert Pretraining and Resume Training with Pipeline Parallelism and add back NMT Training Post-LN

Signed-off-by: Abhishree <[email protected]>

* Precision fix and skip few failing tests

Signed-off-by: Abhishree <[email protected]>

* Add missing comment lines in JenkinsFile

Signed-off-by: Abhishree <[email protected]>

* Comment jenkin tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py

Signed-off-by: Abhishree <[email protected]>

* Minor edit JenkinsFile

Signed-off-by: Abhishree <[email protected]>

* Minor edit in jenkins file

Signed-off-by: Abhishree <[email protected]>

* Edit in Jenkins file

Signed-off-by: Abhishree <[email protected]>

* Comment missed lines in Jenkins file

Signed-off-by: Abhishree <[email protected]>

* Fix precision and validation/test outputs

1) Add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py
2) Remove outputs args and append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py
3) Add back resume_from_checkpoint in the megatron_t5_config.yaml
4) Comment out certain tests in Jenkins file

Signed-off-by: Abhishree <[email protected]>

* Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py

Signed-off-by: Abhishree <[email protected]>

* Precision fix and edit precision typo in all files

1) Account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py
2) Fix precision typo in all files

Signed-off-by: Abhishree <[email protected]>

* Fix all CI TTS tests and comment few Jenkins tests

Signed-off-by: Abhishree <[email protected]>

* Combine xx_epoch_end and on_xx_epoch_end

Add on_inference_epoch_end to inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py

Signed-off-by: Abhishree <[email protected]>

* Add a missing comment in JenkinsFile

Signed-off-by: Abhishree <[email protected]>

* Add try except StopIteration in validation_step for models with dataloader_iter
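
As a rough illustration, a step that pulls its own batch from `dataloader_iter` could guard the `next()` call as below. The message does not say what the handler does, so returning `None` is an assumption, and `sum(batch)` is only a placeholder for the real forward pass.

```python
def validation_step(dataloader_iter):
    """Fetch one batch from the iterator; stop quietly once it is exhausted."""
    try:
        batch = next(dataloader_iter)
    except StopIteration:
        return None  # assumed behaviour: nothing left to validate
    return sum(batch)  # placeholder for the real forward pass


batches = iter([[1, 2]])
first = validation_step(batches)
second = validation_step(batches)
```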

Signed-off-by: Abhishree <[email protected]>

* Remove pyyaml from requirements

Signed-off-by: Abhishree <[email protected]>

* Add try except for inference_step in megatron_finetune_model.py

Signed-off-by: Abhishree <[email protected]>

* Remove limit_val_batches for mockGPTDataset test

Signed-off-by: Abhishree <[email protected]>

* Add new self.validation_step_outputs for MegatronGPTSFTModel

Signed-off-by: Abhishree <[email protected]>

* Minor edit Jenkinsfile

Signed-off-by: Abhishree <[email protected]>

* Initialize self.validation/test_step_outputs in megatron_gpt_sft_model.py

Initialize self.validation/test_step_outputs in setup of MegatronGPTSFTModel to take care of cases when dataloaders are not set up in ModelPT, for example while restoring the model.

Signed-off-by: Abhishree <[email protected]>

* Remove resume_from_checkpoint as trainer arg in conf yaml files

Signed-off-by: Abhishree <[email protected]>

* Remove resume_from_checkpoint as trainer arg in GPT, T5 configs

Signed-off-by: Abhishree <[email protected]>

* Remove resume_from_checkpoint in duplex_tn_config.yaml

Signed-off-by: Abhishree <[email protected]>

* Fix typos, unused imports and refactor code to remove redundant funcs

Signed-off-by: Abhishree <[email protected]>

* Remove commented code in megatron_nmt_model.py

Signed-off-by: Abhishree <[email protected]>

* Fix overridden functions to match parent class functions

Signed-off-by: Abhishree <[email protected]>

* Prefetch dataloader_iter to prevent hang for PP>1

Signed-off-by: Abhishree <[email protected]>

* Override setup() in NLPDDPStrategy to avoid hang during predict with PP>1

Signed-off-by: Abhishree <[email protected]>

* Uncomment tests in JenkinsFile

Signed-off-by: Abhishree <[email protected]>

* Add '16' to precision checks and other minor fixes

Signed-off-by: Abhishree <[email protected]>

* Clear validation/test_step_outputs with dataloader_idx for multi dataloaders

Signed-off-by: Abhishree <[email protected]>

* Minor edits

Signed-off-by: Abhishree <[email protected]>

* Modify precision checks to avoid indexing

Signed-off-by: Abhishree <[email protected]>

* Remove self.validation_step_outputs_sft and add dataloader_idx to clear outputs

Signed-off-by: Abhishree <[email protected]>

* Reference checkpoint with trainer.ckpt_path

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add _prefetch to NLPModel and minor fixes

Signed-off-by: Abhishree <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add limit_val_batches in JenkinsFile for NMT

1) Add trainer.limit_val_batches in Megatron NMT Training TP=2
2) Remove unused import in ModelPT

Signed-off-by: Abhishree <[email protected]>

---------

Signed-off-by: Abhishree <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <[email protected]>

* Include the scripts for preprocessing OAST and unit tests for chat sft datasets (NVIDIA#7112)

* scripts for sft

Signed-off-by: Yi Dong <[email protected]>

* fix style

Signed-off-by: Yi Dong <[email protected]>

* added special token only for huggingface model

Signed-off-by: Yi Dong <[email protected]>

* change default name

Signed-off-by: Yi Dong <[email protected]>

* print out error datapoint content

Signed-off-by: Yi Dong <[email protected]>

* show error id

Signed-off-by: Yi Dong <[email protected]>

* annotation script working

Signed-off-by: Yi Dong <[email protected]>

* try to be compatible with huggingface tokenizer

Signed-off-by: Yi Dong <[email protected]>

* added examples

Signed-off-by: Yi Dong <[email protected]>

* added lang

Signed-off-by: Yi Dong <[email protected]>

* added lang

Signed-off-by: Yi Dong <[email protected]>

* text to value special case

Signed-off-by: Yi Dong <[email protected]>

* configure the slider

Signed-off-by: Yi Dong <[email protected]>

* annotation handles lang

Signed-off-by: Yi Dong <[email protected]>

* added the unit test for chat sft dataset

Signed-off-by: Yi Dong <[email protected]>

* used the file in the test dir

Signed-off-by: Yi Dong <[email protected]>

* fix json error

Signed-off-by: Yi Dong <[email protected]>

* load local tokenizer

Signed-off-by: Yi Dong <[email protected]>

* remove mask count check

Signed-off-by: Yi Dong <[email protected]>

* added HF dataset backend

Signed-off-by: Yi Dong <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Yi Dong <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <[email protected]>

* add paths to labeler. (NVIDIA#7087)

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: jubick1337 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Kim Ngo <[email protected]>
Signed-off-by: jubick1337 <[email protected]>
Signed-off-by: tbartley94 <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Yi Dong <[email protected]>
Signed-off-by: Aleksandr Laptev <[email protected]>
Signed-off-by: Alexandra Antonova <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: AlexGrinch <[email protected]>
Signed-off-by: Evelina <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Vitaly Lavrukhin <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: sam1373 <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: fayejf <[email protected]>
Signed-off-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Daniel Egert <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Jan Beckmann <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: Linnea Pari Leaver <[email protected]>
Signed-off-by: He Huang (Steve) <[email protected]>
Signed-off-by: Ryan <[email protected]>
Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Ante Jukić <[email protected]>
Signed-off-by: Dmytro Pykhtar <[email protected]>
Signed-off-by: ericharper <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: Abhishree <[email protected]>
Co-authored-by: Kim Ngo <[email protected]>
Co-authored-by: tbartley94 <[email protected]>
Co-authored-by: Nikolay Karpov <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Yi Dong <[email protected]>
Co-authored-by: Aleksandr Laptev <[email protected]>
Co-authored-by: bene-ges <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Vitaly Lavrukhin <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: Vahid Noroozi <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: trias702 <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: Jan Beckmann <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Co-authored-by: lleaver <[email protected]>
Co-authored-by: Linnea Pari Leaver <[email protected]>
Co-authored-by: Ryan Langman <[email protected]>
Co-authored-by: David <[email protected]>
Co-authored-by: Elena Rastorgueva <[email protected]>
Co-authored-by: anteju <[email protected]>
Co-authored-by: Dmytro Pykhtar <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Abhishree Thittenamane <[email protected]>
Signed-off-by: dorotat <[email protected]>
1 parent c6387a3 commit 3c5c4d5
Showing 2 changed files with 22 additions and 10 deletions.
@@ -87,7 +87,7 @@ model:
   add_bos_to_input: ${data.train_ds.add_bos_to_input}
   add_eos_to_input: ${data.train_ds.add_eos_to_input}
   metric:
-    name: "exact_string_match" # Name of the evaluation metric to use.
+    name: "exact_string_match" # Name of the evaluation metric to use. Supported metrics: [`exact_string_match`, `rouge`, `pearson_corr_coef`, `spearman_corr_coef`, `f1`, `accuracy`, `average_precision`]
     average: micro # Average the metric over the dataset. Options: ['macro', 'micro']. Works only for 'F1', 'accuracy' etc. Refer to torchmetrics for metrics where this is supported.
     num_classes: null # Number of classes for the metric. Works only for 'F1', 'accuracy' and 'average_precision' etc. Refer to torchmetrics for metrics where this is supported.
     class_labels: null # If the targets in your dataset are strings and not integers/float, you need to provide a list of class labels (size = num_classes) so we can convert from strings to integer categories to compute the metric.
@@ -106,24 +106,36 @@ def setup_metric(self, data_cfg):
                 )

             metric_name = data_cfg.metric.name
-            metric = MetricStringToTorchMetric[metric_name]
+            metric_class = MetricStringToTorchMetric[metric_name]

             # GLUE will not have a "src_file_name" attribute and will always have only a single metric.
             if hasattr(data_cfg, "src_file_name") or hasattr(data_cfg, "file_names"):
-                if hasattr(data_cfg, "src_file_name") and isinstance(data_cfg.src_file_name, ListConfig):
-                    # We pass average and num_classes to the metric constructor via kwargs even if they don't exist for each metric.
+                # We pass average and num_classes to the metric constructor via kwargs even if they don't exist for each metric.
+                if (
+                    hasattr(data_cfg, "src_file_name")
+                    and isinstance(data_cfg.src_file_name, ListConfig)
+                    and metric_name != 'rouge'
+                ):
                     metric = [
-                        metric(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)
+                        metric_class(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)
                         for _ in range(len(data_cfg.src_file_name))
                     ]
-                elif hasattr(data_cfg, "file_names") and isinstance(data_cfg.file_names, ListConfig):
+                elif (
+                    hasattr(data_cfg, "file_names")
+                    and isinstance(data_cfg.file_names, ListConfig)
+                    and metric_name != 'rouge'
+                ):
                     metric = [
-                        metric(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)
+                        metric_class(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)
                         for _ in range(len(data_cfg.file_names))
                     ]
+                elif hasattr(data_cfg, "src_file_name") and isinstance(data_cfg.src_file_name, ListConfig):
+                    metric = [metric_class() for _ in range(len(data_cfg.src_file_name))]
+                elif hasattr(data_cfg, "file_names") and isinstance(data_cfg.file_names, ListConfig):
+                    metric = [metric_class() for _ in range(len(data_cfg.file_names))]
                 else:
-                    metric = [metric(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)]
+                    metric = [metric_class(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)]
             else:
-                metric = [metric()]  # GLUE does need to specify average or num_classes.
+                metric = [metric_class()]  # GLUE does need to specify average or num_classes.

         return metric, metric_name
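
The setup_metric change above does two things: it stops reusing the name `metric` for both the looked-up class and the resulting list, and it builds `rouge` metrics without the `average`/`num_classes` kwargs. A minimal sketch of that dispatch follows. `FakeAccuracy`, `FakeRouge`, `METRICS` and `build_metrics` are hypothetical stand-ins written for this illustration, not NeMo or torchmetrics classes, and the sketch assumes the ROUGE metric's constructor rejects those kwargs.

```python
from typing import List, Optional


class FakeAccuracy:
    """Hypothetical metric whose constructor accepts averaging kwargs."""

    def __init__(self, average: str = "micro", num_classes: Optional[int] = None):
        self.average = average
        self.num_classes = num_classes


class FakeRouge:
    """Hypothetical metric whose constructor accepts no kwargs (assumption)."""

    def __init__(self):
        pass


# Hypothetical registry, standing in for MetricStringToTorchMetric.
METRICS = {"accuracy": FakeAccuracy, "rouge": FakeRouge}


def build_metrics(
    metric_name: str,
    num_datasets: int,
    average: str = "micro",
    num_classes: Optional[int] = None,
) -> List[object]:
    """Create one independent metric instance per dataset."""
    # Keep the class in its own variable so the list below does not shadow it.
    metric_class = METRICS[metric_name]
    if metric_name == "rouge":
        # No kwargs here: FakeRouge() would raise TypeError if given them.
        return [metric_class() for _ in range(num_datasets)]
    return [
        metric_class(average=average, num_classes=num_classes)
        for _ in range(num_datasets)
    ]


if __name__ == "__main__":
    per_file = build_metrics("accuracy", 3, average="macro", num_classes=4)
    print(len(per_file), per_file[0].average)
    print(len(build_metrics("rouge", 2)))
```

Building a fresh instance per dataset, rather than repeating one object, keeps each validation file's accumulated state separate.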

@@ -221,7 +233,7 @@ def cast_for_metric(self, pred, label, metric_name, class_labels=None, labels_ar
                 else:
                     pred = class_labels.index(pred)
                 if label not in class_labels:
-                    raise ValueError(f"Ground truth labe; {label} is not in the class labels list : {class_labels}")
+                    raise ValueError(f"Ground truth label {label} is not in the class labels list : {class_labels}")
                 label = class_labels.index(label)
             pred = torch.LongTensor([pred]).to(self.device)
             label = torch.LongTensor([label]).to(self.device)
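
The cast_for_metric hunk above converts string predictions and labels into integer class indices before wrapping them in tensors. The sketch below shows that mapping on plain Python values. `labels_to_indices` is a hypothetical helper, the tensor conversion is left out so the example has no dependencies, and the fallback of index 0 for an unknown prediction is an assumption, since the hunk shows only the `else` branch.

```python
from typing import List, Tuple


def labels_to_indices(pred: str, label: str, class_labels: List[str]) -> Tuple[int, int]:
    """Map a string prediction and ground-truth label to class indices."""
    # Assumption: an unknown prediction falls back to index 0.
    pred_idx = class_labels.index(pred) if pred in class_labels else 0
    # An unknown ground-truth label is a data error, so fail loudly.
    if label not in class_labels:
        raise ValueError(
            f"Ground truth label {label} is not in the class labels list : {class_labels}"
        )
    return pred_idx, class_labels.index(label)


if __name__ == "__main__":
    classes = ["negative", "neutral", "positive"]
    print(labels_to_indices("neutral", "positive", classes))
```

Treating the two cases differently seems deliberate: a generative model can emit any string, while a label outside `class_labels` points to a mismatch between the dataset and the config.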
