From 026e363fa6c55e8e0434ee29e8929b58968d2998 Mon Sep 17 00:00:00 2001
From: Matvei Novikov
Date: Tue, 8 Aug 2023 07:50:02 +0400
Subject: [PATCH] T5 metrics fix (#7037)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Fix race condition when executing with multi-node where some ranks do not wait for setup (#7016)
Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Signed-off-by: jubick1337

* Added bool types to neural_types export (#7032)
Signed-off-by: tbartley94
Signed-off-by: jubick1337

* rnnt and char utils (#6971)
* rnnt_ngram_merge
Signed-off-by: Nikolay Karpov
* char level bug
Signed-off-by: Nikolay Karpov
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Nikolay Karpov
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar
Signed-off-by: jubick1337

* fix tab text gen (#7022) (#7031)
Signed-off-by: Yi Dong
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Signed-off-by: jubick1337

* Fixed kwargs for metric instance init
Signed-off-by: jubick1337

* Fixed kwargs for metric instance init
Signed-off-by: jubick1337

* removed kwargs
Signed-off-by: jubick1337

* Updated config desc
Signed-off-by: jubick1337

* ASR Confidence update and tutorial (#6810)
* small fixes and tests
Signed-off-by: Aleksandr Laptev
* various fixes for the tutorial
Signed-off-by: Aleksandr Laptev
* tutorial added
Signed-off-by: Aleksandr Laptev
* fix for a little oops after rebasement
Signed-off-by: Aleksandr Laptev
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix tests
Signed-off-by: Aleksandr Laptev
* unused import removed
Signed-off-by: Aleksandr Laptev
* fix review comments
Signed-off-by: Aleksandr Laptev
* deprecated parameters for greedy configs
Signed-off-by: Aleksandr Laptev
* move re-assigning to configs
Signed-off-by: Aleksandr Laptev
* fix comments 2
Signed-off-by: Aleksandr Laptev
* fix config tests
Signed-off-by: Aleksandr Laptev
* fix ece test (my env was bugged apparently)
Signed-off-by: Aleksandr Laptev
* renamings for confidence ensemble
Signed-off-by: Aleksandr Laptev
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix comments 3
Signed-off-by: Aleksandr Laptev
* return dropped tutorial
Signed-off-by: Aleksandr Laptev
* CI flips back and forth, increasing tolerance
Signed-off-by: Aleksandr Laptev
---------
Signed-off-by: Aleksandr Laptev
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337

* install_bs (#7019) (#7028)
Signed-off-by: Nikolay Karpov
Co-authored-by: Nikolay Karpov
Signed-off-by: jubick1337

* fixes for spellmapper (#6994) (#7000)
Signed-off-by: Alexandra Antonova
Co-authored-by: bene-ges
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Signed-off-by: jubick1337

* added back the retro documents (#7033)
Signed-off-by: Yi Dong
Signed-off-by: jubick1337

* Remove pyyaml (#7052) (#7054)
Signed-off-by: smajumdar
Co-authored-by: Somshubra Majumdar
Signed-off-by: jubick1337

* st standalone model (#6969)
* st standalone model
Signed-off-by: AlexGrinch
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* style fix
Signed-off-by: AlexGrinch
* sacrebleu import fix, unused imports removed
Signed-off-by: AlexGrinch
* import guard for nlp inside asr transformer bpe model
Signed-off-by: AlexGrinch
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* codeql fixes
Signed-off-by: AlexGrinch
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* comments answered
Signed-off-by: AlexGrinch
* import ordering fix
Signed-off-by: AlexGrinch
* yttm for asr removed
Signed-off-by: AlexGrinch
* logging added
Signed-off-by: AlexGrinch
* added inference and translate method
Signed-off-by: AlexGrinch
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: AlexGrinch
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337

* remove pos emb from state dict for old models (#7068)
* remove pos emb from state dict
Signed-off-by: Evelina
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* move to nlp_model
Signed-off-by: Evelina
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update comment
Signed-off-by: Evelina
* fix nmt test
Signed-off-by: Evelina
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix nmt test
Signed-off-by: Evelina
---------
Signed-off-by: Evelina
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337

* Fix typo in ASR-TTS tutorial (#7049)
Signed-off-by: Vladimir Bataev
Signed-off-by: jubick1337

* Fixed tutorial's name (#7047)
Signed-off-by: Vitaly Lavrukhin
Co-authored-by: Vladimir Bataev
Signed-off-by: jubick1337

* Fix documentation for Numba (#7065) (#7077)
* Fix documentation for Numba
* Update force float32 flag dynamically
* Update force float32 flag dynamically
* Fix nemo version
---------
Signed-off-by: smajumdar
Co-authored-by: Somshubra Majumdar
Co-authored-by: Eric Harper
Signed-off-by: jubick1337

* Update Frame-VAD doc and fix onnx export (#7076)
* update fvad doc
Signed-off-by: stevehuang52
* fix typo
Signed-off-by: stevehuang52
* update fvad example
Signed-off-by: stevehuang52
* update
Signed-off-by: stevehuang52
* fix onnx export
Signed-off-by: stevehuang52
* update test
Signed-off-by: stevehuang52
* refactor
Signed-off-by: stevehuang52
* update doc
Signed-off-by: stevehuang52
* update
Signed-off-by: stevehuang52
---------
Signed-off-by: stevehuang52
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Signed-off-by: jubick1337

* memmap worker arg (#7062)
* memmap worker arg
Signed-off-by: arendu
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update
Signed-off-by: arendu
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update
Signed-off-by: arendu
* update
Signed-off-by: arendu
---------
Signed-off-by: arendu
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337

* Fix caching bug in causal convolutions for cache-aware ASR models (#7034) (#7082)
Co-authored-by: Vahid Noroozi
Signed-off-by: jubick1337

* Fast Conformer global token fix (#7085)
* old way
Signed-off-by: sam1373
* fix
Signed-off-by: sam1373
* fix
Signed-off-by: sam1373
* fix
Signed-off-by: sam1373
* remove extra
Signed-off-by: sam1373
* clean
Signed-off-by: sam1373
* clean
Signed-off-by: sam1373
* clean
Signed-off-by: sam1373
* fix
Signed-off-by: sam1373
* fix
Signed-off-by: sam1373
* fix
Signed-off-by: sam1373
* fix
Signed-off-by: sam1373
* fix
Signed-off-by: sam1373
* fix
Signed-off-by: sam1373
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: sam1373
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337

* Refined export_config (#7053) (#7066)
* Refined export_config
* Rolling back hierarchy change
---------
Signed-off-by: Boris Fomitchev
Co-authored-by: Boris Fomitchev
Signed-off-by: jubick1337

* small Bugfix (#7081)
* small Bugfix (#7079)
* fix branch
Signed-off-by: fayejf
* fix typo
Signed-off-by: fayejf
* fix link
Signed-off-by: fayejf
---------
Signed-off-by: fayejf
* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb
Signed-off-by: Somshubra Majumdar
* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb
Signed-off-by: Somshubra Majumdar
---------
Signed-off-by: fayejf
Signed-off-by: Somshubra Majumdar
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar
Signed-off-by: jubick1337

* Added script to extract ASR CTC and RNNT models from ASR hybrid models (#7092)
* Added script to extract ctc and rnnt models from hybrid models
Signed-off-by: Daniel Egert
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Updated hybrid extraction script for review request 1
Signed-off-by: Daniel Egert
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Updated hybrid convert script to remove --cuda flag
Signed-off-by: Daniel Egert
---------
Signed-off-by: Daniel Egert
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar
Signed-off-by: jubick1337

* Adding docs and models for multiple lookahead cache-aware ASR (#7067) (#7094)
Signed-off-by: jubick1337

* update TTS readme (#7088)
* update TTS readme
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
---------
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337

* Fix absolute path in path join call (#7099)
Signed-off-by: Jan Beckmann
Signed-off-by: jubick1337

* Disable distopt contiguous param buffer by default (#7095)
Signed-off-by: Tim Moon
Signed-off-by: jubick1337

* microphone demo (#7110)
Signed-off-by: Linnea Pari Leaver
Co-authored-by: Linnea Pari Leaver
Signed-off-by: jubick1337

* [Fix] load_state_dict in nlp_model.py (#7086)
* Fix load_state_dict in nlp_model.py
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337

* Fix plot function in vad_utils.py (#7113)
Fix plot function in vad_utils.py
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337

* Fixed small bug with NoisePerturbationWithNormalization (#7118)
Signed-off-by: Daniel Egert
Signed-off-by: jubick1337

* Fix import guard checks (#7124)
Signed-off-by: smajumdar
Signed-off-by: jubick1337

* Revert "Fix import guard checks (#7124)" (#7125)
This reverts commit a46e3251944642f9102aa16ce2d2f9d3a804ff8a.
Signed-off-by: jubick1337

* Fix import guard checks (#7126)
* Fix import guard checks
Signed-off-by: smajumdar
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: smajumdar
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337

* Add updated fc ctc and rnnt xxl models (#7128) (#7130)
Signed-off-by: jubick1337

* [TTS] Create EnCodec training recipe (#6852)
* [TTS] Create EnCodec training recipe
Signed-off-by: Ryan
* [TTS] Update encodec recipe
Signed-off-by: Ryan
* [TTS] Rename EnCodec to AudioCodec
Signed-off-by: Ryan
* [TTS] Add EnCodec unit tests
Signed-off-by: Ryan
* [TTS] Add copyright header to distributed.py
Signed-off-by: Ryan
---------
Signed-off-by: Ryan
Signed-off-by: jubick1337

* Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (#7061)
Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Co-authored-by: David
Signed-off-by: jubick1337

* fix default attention size (#7141) (#7143)
Signed-off-by: jubick1337

* fix evaluator.py for various exceptions by ast (#7150)
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337

* [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. (#6893)
* [TTS] add Chinese TTS recipe based on IPA.
* add new pinyin and ipa dictionaries with 36 finals.
* add yaml configs for 24-final pinyin and ipa.
* add copyright header
* add a directory level 24finals to discriminate from 36 finals.
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* unify configs into a single one and add detailed comments providing supported candidates.
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* choose 36-final IPA as default phoneme dict
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
---------
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337

* [TTS] Add output audio format to preprocessing (#6889)
* [TTS] Add output audio format to preprocessing
Signed-off-by: Ryan
* [TTS] Add format validation
Signed-off-by: Ryan
* [TTS] Fix data tutorial
Signed-off-by: Ryan
---------
Signed-off-by: Ryan
Signed-off-by: jubick1337

* freeze (#7152)
Signed-off-by: arendu
Signed-off-by: jubick1337

* make sure any empty segments are removed (#7155)
Signed-off-by: Elena Rastorgueva
Signed-off-by: jubick1337

* Update RIR generation scripts (#6547)
- fix: reduce room size if evaluation of params fails
- added randomized mic placement
- added diffuse noise generation
- added an option to specify the format and subtype for saved audio
Signed-off-by: Ante Jukić
Signed-off-by: jubick1337

* A quickstart speech enhancement tutorial (#6492)
A simple example of training a model for speech enhancement task
Signed-off-by: Ante Jukić
Signed-off-by: jubick1337

* NFA subtitle file config - specify colors and vertical alignment (#7160)
* allow specifying colors of text in ASS subtitle file
Signed-off-by: Elena Rastorgueva
* specify vertical_alignment instead of marginv in ass_file_config
Signed-off-by: Elena Rastorgueva
* add documentation of CTMFileConfig and ASSFileConfig to NFA README
Signed-off-by: Elena Rastorgueva
---------
Signed-off-by: Elena Rastorgueva
Signed-off-by: jubick1337

* Eagerly accumulate embedding grads into fp32 buffer (#6958) (#7153)
Signed-off-by: Tim Moon
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: jubick1337

* TE bug fix (#7027) (#7036)
Signed-off-by: Dmytro Pykhtar
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: jubick1337

* [TTS] Remove nested TTS configs (#7154)
* [TTS] Remove nested TTS configs
Signed-off-by: Ryan
* [TTS] Modify tutorial to support multiple sampling rates
Signed-off-by: Ryan
* [TTS] Clarify min_duration unit
Signed-off-by: Ryan
* [TTS] Default 22.05kHz highfreq to null
Signed-off-by: Ryan
---------
Signed-off-by: Ryan
Signed-off-by: jubick1337

* Merge release r1.20.0 to main (#7167)
* update package info
Signed-off-by: ericharper
* Add ASR with TTS Tutorial. Fix enhancer usage. (#6955)
* Add ASR with TTS Tutorial
* Fix enhancer usage
Signed-off-by: Vladimir Bataev
* install_bs (#7019)
Signed-off-by: Nikolay Karpov
* Fix typo and branch in tutorial (#7048)
Signed-off-by: Vladimir Bataev
* fix syntax error introduced in PR-7079 (#7102)
* fix syntax error introduced in PR-7079
Signed-off-by: Alexandra Antonova
* fixes for pr review
Signed-off-by: Alexandra Antonova
---------
Signed-off-by: Alexandra Antonova
* fix links for TN (#7117)
Signed-off-by: Evelina
* update branch (#7135)
Signed-off-by: ericharper
* Fixed main and merging this to r1.20 (#7127)
* Fixed main and merging this to r1.20
Signed-off-by: Taejin Park
* Update vad_utils.py
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
---------
Signed-off-by: Taejin Park
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
* update branch
Signed-off-by: ericharper
* fix version
Signed-off-by: ericharper
* resolve conflict the other way
Signed-off-by: ericharper
* keep both
Signed-off-by: ericharper
* revert keep both
Signed-off-by: ericharper
---------
Signed-off-by: ericharper
Signed-off-by: Vladimir Bataev
Signed-off-by: Nikolay Karpov
Signed-off-by: Alexandra Antonova
Signed-off-by: Evelina
Signed-off-by: Taejin Park
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Vladimir Bataev
Co-authored-by: Nikolay Karpov
Co-authored-by: bene-ges
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Taejin Park
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337

* Upgrade to pytorch lightning 2.0 (#6433)
* Upgrade pytorch lightning version in requirements
Signed-off-by: Abhishree
* Initial fixes for PTL2.0
Signed-off-by: Abhishree
* Add further fixes to support lightning 2.0
Signed-off-by: Abhishree
* Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and a few occurrences of validation_epoch_end
Signed-off-by: Abhishree
* Replace all occurrences of validation_epoch_end with on_validation_epoch_end
Signed-off-by: Abhishree
* Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively
Signed-off-by: Abhishree
* Change logger=None to logger=False in Trainer object
Signed-off-by: Abhishree
* Remove PTL2.0 deprecated Trainer args from TrainerConfig dataclass
Signed-off-by: Abhishree
* Modify trainer.precision check and other small edits
Signed-off-by: Abhishree
* Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer
Signed-off-by: Abhishree
* Add default values for args to fix Attribute Error
Signed-off-by: Abhishree
* Add the following modifications
1) Remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class
2) Replace resume_from_checkpoint with ckpt_path as needed
3) Explicitly add accelerator as 'CPU' in UTs being run on CPU
Signed-off-by: Abhishree
* Remove outputs arg from on_validation_epoch_end, on_test_epoch_end
Signed-off-by: Abhishree
* Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings
Signed-off-by: Abhishree
* Add val, test outputs as instance vars in PunctuationCapitalizationModel and TokenClassificationModel
Signed-off-by: Abhishree
* Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py
Signed-off-by: Abhishree
* Revert an extra space that was mistakenly added
Signed-off-by: Abhishree
* Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity
Signed-off-by: Abhishree
* Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity
Signed-off-by: Abhishree
* Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing
Signed-off-by: Abhishree
* Remove outputs arg from on_train_epoch_end
Signed-off-by: Abhishree
* Remove outputs from on_validation_epoch_end in multi_binary_acc.py
Signed-off-by: Abhishree
* Remove output args from on_validation_epoch_end in the docstrings of some ASR files
Signed-off-by: Abhishree
* Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs
Signed-off-by: Abhishree
* Add on_validation_epoch_end and remove outputs args for nlp models
Signed-off-by: Abhishree
* Append output of validation_step to validation_step_outputs in EncDecClassificationModel
Signed-off-by: Abhishree
* Add the following changes
1) Index self.validation_step_outputs and self.test_step.outputs with dataloader_idx wherever needed
2) Initialize self.validation_step_outputs and self.test_step.outputs as empty lists and add support for multi dataloaders if they exist
3) Remove self.pre_configure_ddp from NLPDDPStrategy class as it's removed in PTL 2.0
Signed-off-by: Abhishree
* Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py
Signed-off-by: Abhishree
* TypeCast precision to str in attention.py and utils_funcs.py to avoid TypeError
Signed-off-by: Abhishree
* Add if condition check for multiple dataloaders when appending to validation outputs
Signed-off-by: Abhishree
* Separate validation pass to be used with both validation_step and test_step
Signed-off-by: Abhishree
* Add if condition check for multiple dataloader while appending to test_step_outputs in punctuation_capitalization_model.py
Signed-off-by: Abhishree
* Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len
Signed-off-by: Abhishree
* Comment Megatron T5 IA3 PP=2 in CI pipeline due to dataloader_iter issue with PTL 2.0
Signed-off-by: Abhishree
* Modify precision checks to account for 16-mixed and bf16-mixed
Signed-off-by: Abhishree
* Append output of validation/test_step to self.validation/test_step_outputs in CTCG2PModel
Signed-off-by: Abhishree
* Modify find_unused_parameters=True in g2p_heteronym model
1) Add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py
2) Remove args output in validation/test_step and add instance variables instead for heteronym_classification.py
Signed-off-by: Abhishree
* Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel
Signed-off-by: Abhishree
* Add validation/test outputs in sgdqa_model and modify dialogue_config.yaml
Signed-off-by: Abhishree
* Add split arg self.test_step_outputs to TextClassificationModel
Signed-off-by: Abhishree
* Add test_step_outputs to dialogue and text classification models
Signed-off-by: Abhishree
* Change condition check for multiple dataloaders:
1) Replace ds_item as list in dialogue_config.yaml
2) Check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step
3) Check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_capitalization_model.py
Signed-off-by: Abhishree
* Add additional condition for multi dataloaders
Check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list for multi dataloaders in validation/test_step
Signed-off-by: Abhishree
* Add val step outputs and default val for dataloader_idx
1) Append validation_step output to self.validation_step_outputs in MultiLabelIntentSlotClassificationModel
2) Add default val for dataloader_idx for on_test_batch_start/end in TimingCallback
3) Add self.validation/test_step_outputs in BERTQAModel and remove outputs arg
Signed-off-by: Abhishree
* Add val/test_step_outputs to S2SQAModel and GPTQAModel
Signed-off-by: Abhishree
* Edit JenkinsFile for bert_pretraining.py
Edit Jenkinsfile for this test to disable validation as a workaround for trainer.val_dataloader None error
Signed-off-by: Abhishree
* Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py
Signed-off-by: Abhishree
* Add ddp_find_unused_parameters_true and remove output args
1) Add ddp_find_unused_parameters_true for trainer.strategy in self_alignment_pretraining.py as it has unused parameters
2) Remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py
3) Comment tests in JenkinsFile that need to be fixed
Signed-off-by: Abhishree
* Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed
Signed-off-by: Abhishree
* Precision fix for megatron_bert_pretraining.py and megatron_bert_model.py
Signed-off-by: Abhishree
* Precision fix and validation/test_step_outputs
1) Add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py
2) Reset ckpt_path for test in enc_dec_nmt.py
3) Remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py
4) Comment Megatron Bert Pretraining and Resume Training with Pipeline Parallelism and add back NMT Training Post-LN
Signed-off-by: Abhishree
* Precision fix and skip a few failing tests
Signed-off-by: Abhishree
* Add missing comment lines in JenkinsFile
Signed-off-by: Abhishree
* Comment jenkins tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py
Signed-off-by: Abhishree
* Minor edit JenkinsFile
Signed-off-by: Abhishree
* Minor edit in jenkins file
Signed-off-by: Abhishree
* Edit in Jenkins file
Signed-off-by: Abhishree
* Comment missed lines in Jenkins file
Signed-off-by: Abhishree
* Fix precision and validation/test outputs
1) Add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py
2) Remove outputs args and append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py
3) Add back resume_from_checkpoint in the megatron_t5_config.yaml
4) Comment out certain tests in Jenkins file
Signed-off-by: Abhishree
* Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py
Signed-off-by: Abhishree
* Precision fix and edit precision typo in all files
1) Account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py
2) Fix precision typo in all files
Signed-off-by: Abhishree
* Fix all CI TTS tests and comment a few Jenkins tests
Signed-off-by: Abhishree
* Combine xx_epoch_end and on_xx_epoch_end
Add on_inference_epoch_end to inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py
Signed-off-by: Abhishree
* Add a missing comment in JenkinsFile
Signed-off-by: Abhishree
* Add try except StopIteration in validation_step for models with dataloader_iter
Signed-off-by: Abhishree
* Remove pyyaml from requirements
Signed-off-by: Abhishree
* Add try except for inference_step in megatron_finetune_model.py
Signed-off-by: Abhishree
* Remove limit_val_batches for mockGPTDataset test
Signed-off-by: Abhishree
* Add new self.validation_step_outputs for MegatronGPTSFTModel
Signed-off-by: Abhishree
* Minor edit Jenkinsfile
Signed-off-by: Abhishree
* Initialize self.validation/test_step_outputs in megatron_gpt_sft_model.py
Initialize self.validation/test_step_outputs in setup of MegatronGPTSFTModel to take care of cases when dataloaders are not set up in ModelPT, for example while restoring the model.
Signed-off-by: Abhishree
* Remove resume_from_checkpoint if trainer arg in conf yaml files
Signed-off-by: Abhishree
* Remove resume_from_checkpoint as trainer arg in GPT, T5 configs
Signed-off-by: Abhishree
* Remove resume_from_checkpoint in duplex_tn_config.yaml
Signed-off-by: Abhishree
* Fix typos, unused imports and refactor code to remove redundant funcs
Signed-off-by: Abhishree
* Remove commented code in megatron_nmt_model.py
Signed-off-by: Abhishree
* Fix overridden functions to match parent class functions
Signed-off-by: Abhishree
* Prefetch dataloader_iter to prevent hang for PP>1
Signed-off-by: Abhishree
* Override setup() in NLPDDPStrategy to avoid hang during predict with PP>1
Signed-off-by: Abhishree
* Uncomment tests in JenkinsFile
Signed-off-by: Abhishree
* Add '16' to precision checks and other minor fixes
Signed-off-by: Abhishree
* Clear validation/test_step_outputs with dataloader_idx for multi dataloaders
Signed-off-by: Abhishree
* Minor edits
Signed-off-by: Abhishree
* Modify precision checks to avoid indexing
Signed-off-by: Abhishree
* Remove self.validation_step_outputs_sft and add dataloader_idx to clear outputs
Signed-off-by: Abhishree
* Reference checkpoint with trainer.ckpt_path
Signed-off-by: Abhishree
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add _prefetch to NLPModel and minor fixes
Signed-off-by: Abhishree
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add limit_val_batches in JenkinsFile for NMT
1) Add trainer.limit_val_batches in Megatron NMT Training TP=2
2) Remove unused import in ModelPT
Signed-off-by: Abhishree
---------
Signed-off-by: Abhishree
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337

* Include the scripts for preprocessing OAST and unit tests for chat sft datasets (#7112)
* scripts for sft
Signed-off-by: Yi Dong
* fix style
Signed-off-by: Yi Dong
* added special token only for huggingface model
Signed-off-by: Yi Dong
* change default name
Signed-off-by: Yi Dong
* print out error datapoint content
Signed-off-by: Yi Dong
* show error id
Signed-off-by: Yi Dong
* annotation script working
Signed-off-by: Yi Dong
* try to be compatible with huggingface tokenizer
Signed-off-by: Yi Dong
* added examples
Signed-off-by: Yi Dong
* added lang
Signed-off-by: Yi Dong
* added lang
Signed-off-by: Yi Dong
* text to value special case
Signed-off-by: Yi Dong
* configure the slider
Signed-off-by: Yi Dong
* annotation handles lang
Signed-off-by: Yi Dong
* added the unit test for chat sft dataset
Signed-off-by: Yi Dong
* used the file in the test dir
Signed-off-by: Yi Dong
* fix json error
Signed-off-by: Yi Dong
* load local tokenizer
Signed-off-by: Yi Dong
* remove mask count check
Signed-off-by: Yi Dong
* added HF dataset backend
Signed-off-by: Yi Dong
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Yi Dong
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337

* add paths to labeler. (#7087)
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337

* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Signed-off-by: jubick1337
Signed-off-by: tbartley94
Signed-off-by: Nikolay Karpov
Signed-off-by: Yi Dong
Signed-off-by: Aleksandr Laptev
Signed-off-by: Alexandra Antonova
Signed-off-by: smajumdar
Signed-off-by: AlexGrinch
Signed-off-by: Evelina
Signed-off-by: Vladimir Bataev
Signed-off-by: Vitaly Lavrukhin
Signed-off-by: stevehuang52
Signed-off-by: arendu
Signed-off-by: sam1373
Signed-off-by: Boris Fomitchev
Signed-off-by: fayejf
Signed-off-by: Somshubra Majumdar
Signed-off-by: Daniel Egert
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jan Beckmann
Signed-off-by: Tim Moon
Signed-off-by: Linnea Pari Leaver
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: Ryan
Signed-off-by: Elena Rastorgueva
Signed-off-by: Ante Jukić
Signed-off-by: Dmytro Pykhtar
Signed-off-by: ericharper
Signed-off-by: Taejin Park
Signed-off-by: Abhishree
Co-authored-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Nikolay Karpov
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Co-authored-by: Aleksandr Laptev
Co-authored-by: bene-ges
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk)
Co-authored-by: Vladimir Bataev
Co-authored-by: Vitaly Lavrukhin
Co-authored-by: Eric Harper
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Adi Renduchintala
Co-authored-by: Vahid Noroozi
Co-authored-by: Samuel Kriman
Co-authored-by: Boris Fomitchev
Co-authored-by: trias702 <25867060+trias702@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Jan Beckmann
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: lleaver <137942999+lleaver@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver
Co-authored-by: Ryan Langman
Co-authored-by: David
Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com>
Co-authored-by: anteju <108555623+anteju@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Taejin Park
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
---
 .../conf/megatron_t5_finetune.yaml            |  2 +-
 .../megatron_finetune_model.py                | 30 +++++++++++++------
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/examples/nlp/language_modeling/conf/megatron_t5_finetune.yaml b/examples/nlp/language_modeling/conf/megatron_t5_finetune.yaml
index ab9939af518f..2ba68cbc5979 100644
--- a/examples/nlp/language_modeling/conf/megatron_t5_finetune.yaml
+++ b/examples/nlp/language_modeling/conf/megatron_t5_finetune.yaml
@@ -87,7 +87,7 @@ model:
     add_bos_to_input: ${data.train_ds.add_bos_to_input}
     add_eos_to_input: ${data.train_ds.add_eos_to_input}
     metric:
-      name: "exact_string_match" # Name of the evaluation metric to use.
+      name: "exact_string_match" # Name of the evaluation metric to use. Supported metrics: [`exact_string_match`, `rouge`, `pearson_corr_coef`, `spearman_corr_coef`, `f1`, `accuracy`, `average_precision`]
       average: micro # Average the metric over the dataset. Options: ['macro', 'micro']. Works only for 'F1', 'accuracy' etc. Refer to torchmetrics for metrics where this is supported.
       num_classes: null # Number of classes for the metric. Works only for 'F1', 'accuracy' and 'average_precision' etc. Refer to torchmetrics for metrics where this is supported.
       class_labels: null # If the targets in your dataset are strings and not integers/float, you need to provide a list of class labels (size = num_classes) so we can convert from strings to integer categories to compute the metric.
diff --git a/nemo/collections/nlp/models/language_modeling/megatron_finetune_model.py b/nemo/collections/nlp/models/language_modeling/megatron_finetune_model.py
index fb1fe83ee68e..9fce0d52c4a1 100644
--- a/nemo/collections/nlp/models/language_modeling/megatron_finetune_model.py
+++ b/nemo/collections/nlp/models/language_modeling/megatron_finetune_model.py
@@ -106,24 +106,36 @@ def setup_metric(self, data_cfg):
             )
         metric_name = data_cfg.metric.name
-        metric = MetricStringToTorchMetric[metric_name]
+        metric_class = MetricStringToTorchMetric[metric_name]
+        # GLUE will not have a "src_file_name" attribute and will always have only a single metric.
         if hasattr(data_cfg, "src_file_name") or hasattr(data_cfg, "file_names"):
-            if hasattr(data_cfg, "src_file_name") and isinstance(data_cfg.src_file_name, ListConfig):
-                # We pass average and num_classes to the metric constructor via kwargs even if they don't exist for each metric.
+            if (
+                hasattr(data_cfg, "src_file_name")
+                and isinstance(data_cfg.src_file_name, ListConfig)
+                and metric_name != 'rouge'
+            ):
                 metric = [
-                    metric(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)
+                    metric_class(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)
                     for _ in range(len(data_cfg.src_file_name))
                 ]
-            elif hasattr(data_cfg, "file_names") and isinstance(data_cfg.file_names, ListConfig):
+            elif (
+                hasattr(data_cfg, "file_names")
+                and isinstance(data_cfg.file_names, ListConfig)
+                and metric_name != 'rouge'
+            ):
                 metric = [
-                    metric(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)
+                    metric_class(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)
                     for _ in range(len(data_cfg.file_names))
                 ]
+            elif hasattr(data_cfg, "src_file_name") and isinstance(data_cfg.src_file_name, ListConfig):
+                metric = [metric_class() for _ in range(len(data_cfg.src_file_name))]
+            elif hasattr(data_cfg, "file_names") and isinstance(data_cfg.file_names, ListConfig):
+                metric = [metric_class() for _ in range(len(data_cfg.file_names))]
             else:
-                metric = [metric(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)]
+                metric = [metric_class(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)]
         else:
-            metric = [metric()]  # GLUE does need to specify average or num_classes.
+            metric = [metric_class()]  # GLUE does need to specify average or num_classes.

         return metric, metric_name
@@ -221,7 +233,7 @@ def cast_for_metric(self, pred, label, metric_name, class_labels=None, labels_ar
             else:
                 pred = class_labels.index(pred)
             if label not in class_labels:
-                raise ValueError(f"Ground truth labe; {label} is not in the class labels list : {class_labels}")
+                raise ValueError(f"Ground truth label {label} is not in the class labels list : {class_labels}")
             label = class_labels.index(label)
             pred = torch.LongTensor([pred]).to(self.device)
             label = torch.LongTensor([label]).to(self.device)
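
The gist of the setup_metric() change above: `rouge` is special-cased because torchmetrics' ROUGEScore takes neither `average` nor `num_classes` in its constructor, unlike the classification metrics the old code configured unconditionally. Below is a minimal sketch of the resulting dispatch, assuming the torchmetrics 0.7-0.10 classification API (where Accuracy/F1Score still accepted average/num_classes directly); the mapping is an abbreviated stand-in for NeMo's full MetricStringToTorchMetric table, and make_metrics is a hypothetical helper, not NeMo API:

    # Sketch only: abbreviated metric table, not NeMo's full mapping.
    from torchmetrics import Accuracy, F1Score
    from torchmetrics.text.rouge import ROUGEScore

    MetricStringToTorchMetric = {
        'accuracy': Accuracy,
        'f1': F1Score,
        'rouge': ROUGEScore,
    }

    def make_metrics(metric_name, num_dataloaders, average=None, num_classes=None):
        # One metric instance per validation/test file, as setup_metric() builds.
        metric_class = MetricStringToTorchMetric[metric_name]
        if metric_name != 'rouge':
            # Classification-style metrics are configured from the config's
            # metric.average / metric.num_classes fields.
            return [
                metric_class(average=average, num_classes=num_classes)
                for _ in range(num_dataloaders)
            ]
        # ROUGEScore rejects those kwargs, so it is constructed bare.
        return [metric_class() for _ in range(num_dataloaders)]

    # e.g. two validation files scored with ROUGE:
    metrics = make_metrics('rouge', num_dataloaders=2)

This mirrors the patch's design choice of an explicit metric_name != 'rouge' guard in the ListConfig branches, rather than wrapping the constructor call in try/except.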