Merge main to asr_normalize #7084

KunalDhawan · 2023-07-20T21:08:11Z

What does this PR do?

Merges the main branch into the asr_normalize branch.

Collection: [Note which collection this PR will affect]

Changelog

Add specific line by line info of high level changes in this PR.

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

Related to # (issue)

* update to load from ckpt Signed-off-by: arendu <[email protected]>
* update Signed-off-by: arendu <[email protected]>
* load ckpt peft model Signed-off-by: arendu <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* update style Signed-off-by: arendu <[email protected]>
---------
Signed-off-by: arendu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add model, dataset, necessary utils and tests Signed-off-by: stevehuang52 <[email protected]>
* fix tarred data Signed-off-by: stevehuang52 <[email protected]>
* fix typo Signed-off-by: stevehuang52 <[email protected]>
* add fvad examples and update utils Signed-off-by: stevehuang52 <[email protected]>
* add copyright Signed-off-by: stevehuang52 <[email protected]>
* refactor and add tests Signed-off-by: stevehuang52 <[email protected]>
* update dataset Signed-off-by: stevehuang52 <[email protected]>
* update test Signed-off-by: stevehuang52 <[email protected]>
* refactor Signed-off-by: stevehuang52 <[email protected]>
* refactor Signed-off-by: stevehuang52 <[email protected]>
* fix typos Signed-off-by: stevehuang52 <[email protected]>
---------
Signed-off-by: stevehuang52 <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: Taejin Park <[email protected]>

Signed-off-by: Xuesong Yang <[email protected]>

* bug fixes Signed-off-by: Alexandra Antonova <[email protected]>
* fix bugs, add preparation and evaluation scripts, add readme Signed-off-by: Alexandra Antonova <[email protected]>
* small fixes Signed-off-by: Alexandra Antonova <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add real coverage calculation, small fixes, more debug information Signed-off-by: Alexandra Antonova <[email protected]>
* add option to pass a filelist and output folder - to handle inference from multiple input files Signed-off-by: Alexandra Antonova <[email protected]>
* added preprocessing for yago wikipedia articles - finding yago entities and their subphrases Signed-off-by: Alexandra Antonova <[email protected]>
* yago wiki preprocessing, sampling, pseudonormalization Signed-off-by: Alexandra Antonova <[email protected]>
* more scripts for preparation of training examples Signed-off-by: Alexandra Antonova <[email protected]>
* bug fixes Signed-off-by: Alexandra Antonova <[email protected]>
* add some alphabet checks Signed-off-by: Alexandra Antonova <[email protected]>
* add bert on subwords, concatenate it to bert on characters Signed-off-by: Alexandra Antonova <[email protected]>
* add calculation of character_pos_to_subword_pos Signed-off-by: Alexandra Antonova <[email protected]>
* bug fix Signed-off-by: Alexandra Antonova <[email protected]>
* bug fix Signed-off-by: Alexandra Antonova <[email protected]>
* pdb Signed-off-by: Alexandra Antonova <[email protected]>
* tensor join bug fix Signed-off-by: Alexandra Antonova <[email protected]>
* double hidden_size in classifier Signed-off-by: Alexandra Antonova <[email protected]>
* pdb Signed-off-by: Alexandra Antonova <[email protected]>
* default index value 0 instead of -1 because index cannot be negative Signed-off-by: Alexandra Antonova <[email protected]>
* pad index value 0 instead of -1 because index cannot be negative Signed-off-by: Alexandra Antonova <[email protected]>
* remove pdb Signed-off-by: Alexandra Antonova <[email protected]>
* fix bugs, add creation of tarred dataset Signed-off-by: Alexandra Antonova <[email protected]>
* add possibility to change sequence len at inference Signed-off-by: Alexandra Antonova <[email protected]>
* change sampling of dummy candidates at inference, add candidate info file Signed-off-by: Alexandra Antonova <[email protected]>
* fix import Signed-off-by: Alexandra Antonova <[email protected]>
* fix bug Signed-off-by: Alexandra Antonova <[email protected]>
* update transcription now uses info Signed-off-by: Alexandra Antonova <[email protected]>
* write path Signed-off-by: Alexandra Antonova <[email protected]>
* 1. add tarred dataset support(untested). 2. fix bug with ban_ngrams in indexing Signed-off-by: Alexandra Antonova <[email protected]>
* skip short_sent if no real candidates Signed-off-by: Alexandra Antonova <[email protected]>
* fix import Signed-off-by: Alexandra Antonova <[email protected]>
* add braceexpand Signed-off-by: Alexandra Antonova <[email protected]>
* fixes Signed-off-by: Alexandra Antonova <[email protected]>
* fix bug Signed-off-by: Alexandra Antonova <[email protected]>
* fix bug Signed-off-by: Alexandra Antonova <[email protected]>
* fix bug in np.ones Signed-off-by: Alexandra Antonova <[email protected]>
* fix bug in collate Signed-off-by: Alexandra Antonova <[email protected]>
* change tensor type to long because of error in torch.gather Signed-off-by: Alexandra Antonova <[email protected]>
* fix for empty spans tensor Signed-off-by: Alexandra Antonova <[email protected]>
* same fixes in _collate_fn for tarred dataset Signed-off-by: Alexandra Antonova <[email protected]>
* fix bug from previous commit Signed-off-by: Alexandra Antonova <[email protected]>
* change int types to be shorter to minimize tar size Signed-off-by: Alexandra Antonova <[email protected]>
* refactoring of datasets and inference Signed-off-by: Alexandra Antonova <[email protected]>
* bug fix Signed-off-by: Alexandra Antonova <[email protected]>
* bug fix Signed-off-by: Alexandra Antonova <[email protected]>
* bug fix Signed-off-by: Alexandra Antonova <[email protected]>
* tar by 100k examples, small fixes Signed-off-by: Alexandra Antonova <[email protected]>
* small fixes, add analytics script Signed-off-by: Alexandra Antonova <[email protected]>
* Add functions for dynamic programming comparison to get best path by ngrams Signed-off-by: Alexandra Antonova <[email protected]>
* fixes Signed-off-by: Alexandra Antonova <[email protected]>
* small fix Signed-off-by: Alexandra Antonova <[email protected]>
* fixes to support testing on SPGISpeech Signed-off-by: Alexandra Antonova <[email protected]>
* add preprocessing for userlibri Signed-off-by: Alexandra Antonova <[email protected]>
* some refactoring Signed-off-by: Alexandra Antonova <[email protected]>
* some refactoring Signed-off-by: Alexandra Antonova <[email protected]>
* move some functions to utils to reuse from other project Signed-off-by: Alexandra Antonova <[email protected]>
* move some functions to utils to reuse from other project Signed-off-by: Alexandra Antonova <[email protected]>
* move some functions to utils to reuse from other project Signed-off-by: Alexandra Antonova <[email protected]>
* small refactoring before pr. Add bash-scripts reproducing evaluation Signed-off-by: Alexandra Antonova <[email protected]>
* style fix Signed-off-by: Alexandra Antonova <[email protected]>
* small fixes in inference Signed-off-by: Alexandra Antonova <[email protected]>
* bug fix - didn't move window on last symbol Signed-off-by: Alexandra Antonova <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix bug - shuffle was before truncation of sorted candidates Signed-off-by: Alexandra Antonova <[email protected]>
* refactoring, fix some bugs Signed-off-by: Alexandra Antonova <[email protected]>
* variour fixes. Add word_indices at inference Signed-off-by: Alexandra Antonova <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add candidate positions Signed-off-by: Alexandra Antonova <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Move data preparation and evaluation to other repo Signed-off-by: Alexandra Antonova <[email protected]>
* add infer_reproduce_paper. Refactoring Signed-off-by: Alexandra Antonova <[email protected]>
* refactor inference using fragment indices Signed-off-by: Alexandra Antonova <[email protected]>
* add some helper functions Signed-off-by: Alexandra Antonova <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix bug with parameters order Signed-off-by: Alexandra Antonova <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix bugs Signed-off-by: Alexandra Antonova <[email protected]>
* refactoring, fix bug Signed-off-by: Alexandra Antonova <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add multiple variants of adjusting start/end positions Signed-off-by: Alexandra Antonova <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* more fixes Signed-off-by: Alexandra Antonova <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add unit tests, other fixes Signed-off-by: Alexandra Antonova <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix Signed-off-by: Alexandra Antonova <[email protected]>
* fix CodeQl warnings Signed-off-by: Alexandra Antonova <[email protected]>
* bug fixes Signed-off-by: Alexandra Antonova <[email protected]>
* fix bugs, add preparation and evaluation scripts, add readme Signed-off-by: Alexandra Antonova <[email protected]>
* small fixes Signed-off-by: Alexandra Antonova <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add real coverage calculation, small fixes, more debug information Signed-off-by: Alexandra Antonova <[email protected]>
* add option to pass a filelist and output folder - to handle inference from multiple input files Signed-off-by: Alexandra Antonova <[email protected]>
* added preprocessing for yago wikipedia articles - finding yago entities and their subphrases Signed-off-by: Alexandra Antonova <[email protected]>
* yago wiki preprocessing, sampling, pseudonormalization Signed-off-by: Alexandra Antonova <[email protected]>
* more scripts for preparation of training examples Signed-off-by: Alexandra Antonova <[email protected]>
* bug fixes Signed-off-by: Alexandra Antonova <[email protected]>
* add some alphabet checks Signed-off-by: Alexandra Antonova <[email protected]>
* add bert on subwords, concatenate it to bert on characters Signed-off-by: Alexandra Antonova <[email protected]>
* add calculation of character_pos_to_subword_pos Signed-off-by: Alexandra Antonova <[email protected]>
* bug fix Signed-off-by: Alexandra Antonova <[email protected]>
* bug fix Signed-off-by: Alexandra Antonova <[email protected]>
* pdb Signed-off-by: Alexandra Antonova <[email protected]>
* tensor join bug fix Signed-off-by: Alexandra Antonova <[email protected]>
* double hidden_size in classifier Signed-off-by: Alexandra Antonova <[email protected]>
* pdb Signed-off-by: Alexandra Antonova <[email protected]>
* default index value 0 instead of -1 because index cannot be negative Signed-off-by: Alexandra Antonova <[email protected]>
* pad index value 0 instead of -1 because index cannot be negative Signed-off-by: Alexandra Antonova <[email protected]>
* remove pdb Signed-off-by: Alexandra Antonova <[email protected]>
* fix bugs, add creation of tarred dataset Signed-off-by: Alexandra Antonova <[email protected]>
* add possibility to change sequence len at inference Signed-off-by: Alexandra Antonova <[email protected]>
* change sampling of dummy candidates at inference, add candidate info file Signed-off-by: Alexandra Antonova <[email protected]>
* fix import Signed-off-by: Alexandra Antonova <[email protected]>
* fix bug Signed-off-by: Alexandra Antonova <[email protected]>
* update transcription now uses info Signed-off-by: Alexandra Antonova <[email protected]>
* write path Signed-off-by: Alexandra Antonova <[email protected]>
* 1. add tarred dataset support(untested). 2. fix bug with ban_ngrams in indexing Signed-off-by: Alexandra Antonova <[email protected]>
* skip short_sent if no real candidates Signed-off-by: Alexandra Antonova <[email protected]>
* fix import Signed-off-by: Alexandra Antonova <[email protected]>
* add braceexpand Signed-off-by: Alexandra Antonova <[email protected]>
* fixes Signed-off-by: Alexandra Antonova <[email protected]>
* fix bug Signed-off-by: Alexandra Antonova <[email protected]>
* fix bug Signed-off-by: Alexandra Antonova <[email protected]>
* fix bug in np.ones Signed-off-by: Alexandra Antonova <[email protected]>
* fix bug in collate Signed-off-by: Alexandra Antonova <[email protected]>
* change tensor type to long because of error in torch.gather Signed-off-by: Alexandra Antonova <[email protected]>
* fix for empty spans tensor Signed-off-by: Alexandra Antonova <[email protected]>
* same fixes in _collate_fn for tarred dataset Signed-off-by: Alexandra Antonova <[email protected]>
* fix bug from previous commit Signed-off-by: Alexandra Antonova <[email protected]>
* change int types to be shorter to minimize tar size Signed-off-by: Alexandra Antonova <[email protected]>
* refactoring of datasets and inference Signed-off-by: Alexandra Antonova <[email protected]>
* bug fix Signed-off-by: Alexandra Antonova <[email protected]>
* bug fix Signed-off-by: Alexandra Antonova <[email protected]>
* bug fix Signed-off-by: Alexandra Antonova <[email protected]>
* tar by 100k examples, small fixes Signed-off-by: Alexandra Antonova <[email protected]>
* small fixes, add analytics script Signed-off-by: Alexandra Antonova <[email protected]>
* Add functions for dynamic programming comparison to get best path by ngrams Signed-off-by: Alexandra Antonova <[email protected]>
* fixes Signed-off-by: Alexandra Antonova <[email protected]>
* small fix Signed-off-by: Alexandra Antonova <[email protected]>
* fixes to support testing on SPGISpeech Signed-off-by: Alexandra Antonova <[email protected]>
* add preprocessing for userlibri Signed-off-by: Alexandra Antonova <[email protected]>
* some refactoring Signed-off-by: Alexandra Antonova <[email protected]>
* some refactoring Signed-off-by: Alexandra Antonova <[email protected]>
* move some functions to utils to reuse from other project Signed-off-by: Alexandra Antonova <[email protected]>
* move some functions to utils to reuse from other project Signed-off-by: Alexandra Antonova <[email protected]>
* move some functions to utils to reuse from other project Signed-off-by: Alexandra Antonova <[email protected]>
* small refactoring before pr. Add bash-scripts reproducing evaluation Signed-off-by: Alexandra Antonova <[email protected]>
* style fix Signed-off-by: Alexandra Antonova <[email protected]>
* small fixes in inference Signed-off-by: Alexandra Antonova <[email protected]>
* bug fix - didn't move window on last symbol Signed-off-by: Alexandra Antonova <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix bug - shuffle was before truncation of sorted candidates Signed-off-by: Alexandra Antonova <[email protected]>
* refactoring, fix some bugs Signed-off-by: Alexandra Antonova <[email protected]>
* variour fixes. Add word_indices at inference Signed-off-by: Alexandra Antonova <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add candidate positions Signed-off-by: Alexandra Antonova <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Move data preparation and evaluation to other repo Signed-off-by: Alexandra Antonova <[email protected]>
* add infer_reproduce_paper. Refactoring Signed-off-by: Alexandra Antonova <[email protected]>
* refactor inference using fragment indices Signed-off-by: Alexandra Antonova <[email protected]>
* add some helper functions Signed-off-by: Alexandra Antonova <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix bug with parameters order Signed-off-by: Alexandra Antonova <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix bugs Signed-off-by: Alexandra Antonova <[email protected]>
* refactoring, fix bug Signed-off-by: Alexandra Antonova <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add multiple variants of adjusting start/end positions Signed-off-by: Alexandra Antonova <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* more fixes Signed-off-by: Alexandra Antonova <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add unit tests, other fixes Signed-off-by: Alexandra Antonova <[email protected]>
* fix Signed-off-by: Alexandra Antonova <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix CodeQl warnings Signed-off-by: Alexandra Antonova <[email protected]>
* add script for full inference pipeline, refactoring Signed-off-by: Alexandra Antonova <[email protected]>
* add tutorial Signed-off-by: Alexandra Antonova <[email protected]>
* take example data from HuggingFace Signed-off-by: Alexandra Antonova <[email protected]>
* add docs Signed-off-by: Alexandra Antonova <[email protected]>
* fix comment Signed-off-by: Alexandra Antonova <[email protected]>
* fix bug Signed-off-by: Alexandra Antonova <[email protected]>
* small fixes for PR Signed-off-by: Alexandra Antonova <[email protected]>
* add some more tests Signed-off-by: Alexandra Antonova <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* try to fix tests adding with_downloads Signed-off-by: Alexandra Antonova <[email protected]>
* skip tests with tokenizer download Signed-off-by: Alexandra Antonova <[email protected]>
---------
Signed-off-by: Alexandra Antonova <[email protected]>
Signed-off-by: Alexandra Antonova <[email protected]>
Co-authored-by: Alexandra Antonova <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [TTS] Implement new vocoder dataset Signed-off-by: Ryan <[email protected]>
* [TTS] Redo config structure, minor fixes Signed-off-by: Ryan <[email protected]>
* [TTS] Fix alignment logging Signed-off-by: Ryan <[email protected]>
* [TTS] Fix script usage example Signed-off-by: Ryan <[email protected]>
* [TTS] Fixed epoch LR scheduling Signed-off-by: Ryan <[email protected]>
* [TTS] Support .nemo checkpoint in FP callback Signed-off-by: Ryan <[email protected]>
* [TTS] Remove align interpolator Signed-off-by: Ryan <[email protected]>
* [TTS] Remove HiFi-GAN defaults list interpolation Signed-off-by: Ryan <[email protected]>
* [TTS] Rename weighted_sample_steps to weighted_sampling_steps_per_epoch Signed-off-by: Ryan <[email protected]>
---------
Signed-off-by: Ryan <[email protected]>

* deb infer Signed-off-by: Evelina <[email protected]>
* deb infer Signed-off-by: Evelina <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* clean up Signed-off-by: Evelina <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* dont do maxlen trunc for non abs pos emb Signed-off-by: Evelina <[email protected]>
* dont do maxlen trunc for non abs pos emb Signed-off-by: Evelina <[email protected]>
* convert for training only Signed-off-by: Evelina <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add eval test, add save .nemo for sft model Signed-off-by: Evelina <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* jenkins format fix Signed-off-by: Evelina <[email protected]>
* update jenkins Signed-off-by: Evelina <[email protected]>
* update jenkins Signed-off-by: Evelina <[email protected]>
* fix jenkins Signed-off-by: Evelina <[email protected]>
* remove test, ci timeout Signed-off-by: Evelina <[email protected]>
* fix for m_gpt_eval.py Signed-off-by: Evelina <[email protected]>
* jenkins test Signed-off-by: Evelina <[email protected]>
* fix gpt_eval with sft model Signed-off-by: Evelina <[email protected]>
* revert jenkins Signed-off-by: Evelina <[email protected]>
* keep float conversion for model.generate() Signed-off-by: Evelina <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix inference dtype Signed-off-by: Evelina <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Signed-off-by: Evelina <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* TDT model pull request, initial draft Signed-off-by: Hainan Xu <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* TDT PR WIP Signed-off-by: Hainan Xu <[email protected]>
* TDT PR WIP Signed-off-by: Hainan Xu <[email protected]>
* TDT PR WIP Signed-off-by: Hainan Xu <[email protected]>
* TDT WIP Signed-off-by: Hainan Xu <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* TDT WIP Signed-off-by: Hainan Xu <[email protected]>
* TDT WIP Signed-off-by: Hainan Xu <[email protected]>
* TDT WIP Signed-off-by: Hainan Xu <[email protected]>
* TDT WIP Signed-off-by: Hainan Xu <[email protected]>
* TDT WIP Signed-off-by: Hainan Xu <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* TDT WIP Signed-off-by: Hainan Xu <[email protected]>
* TDT WIP Signed-off-by: Hainan Xu <[email protected]>
* TDT WIP Signed-off-by: Hainan Xu <[email protected]>
* TDT WIP Signed-off-by: Hainan Xu <[email protected]>
* TDT WIP Signed-off-by: Hainan Xu <[email protected]>
* addressed some review comments, part1 Signed-off-by: Hainan Xu <[email protected]>
* addressed some review comments, part1, one line fix Signed-off-by: Hainan Xu <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add tests for comparing TDT alphas with pytorch VS kernel computation Signed-off-by: Hainan Xu <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add tests for comparing multiblank alphas with pytorch VS kernel computation Signed-off-by: Hainan Xu <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add tests for fixed case computation for TDT Signed-off-by: Hainan Xu <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add more comments for greedy-batch decoding for TDT Signed-off-by: Hainan Xu <[email protected]>
* include config for TDT model with stateless decoders Signed-off-by: Hainan Xu <[email protected]>
* add reference to TDT in Readme Signed-off-by: Hainan Xu <[email protected]>
* slight modification of config file comments Signed-off-by: Hainan Xu <[email protected]>
* addressed more comments Signed-off-by: Hainan Xu <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* more detailed comments for tdt kernel Signed-off-by: Hainan Xu <[email protected]>
* one line fix Signed-off-by: Hainan Xu <[email protected]>
* fixed small bug that results in test fails for rnnt_decoding Signed-off-by: Hainan Xu <[email protected]>
* fixed small bug that results in test fails for rnnt_decoding Signed-off-by: Hainan Xu <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fixed small bug that results in test fails for rnnt_decoding Signed-off-by: Hainan Xu <[email protected]>
* remove unused import Signed-off-by: Hainan Xu <[email protected]>
---------
Signed-off-by: Hainan Xu <[email protected]>
Co-authored-by: Hainan Xu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix get param
* change name
---------
Signed-off-by: ericharper <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* initial POC for LDDL Bert
* Finish LDDL POC
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* address comments
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix merge head
* resolving merge
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add support for val/test loaders
* change to new LDDL class + add winding
* fix logging level
* fix winding
* test fix
* fixes to winding
* add file system
* add prepemption optimizations
* more logging
* more prints
* better logging
* asfsf
* add barrier
* removing prints
* working with mb lddl loader
* final changes
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* update requirements file with LDDL
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* revert adding to requirements
---------
Signed-off-by: wdykas <[email protected]>
Co-authored-by: wdykas <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

Signed-off-by: MaximumEntropy <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>

Signed-off-by: Mikołaj Błaż <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

Added a visual utterance-level comparison of two ASR models
Signed-off-by: George <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

#6791)
* Construct FP8 amax reduction group Signed-off-by: Tim Moon <[email protected]>
* Update Megatron-core version in CI Signed-off-by: Tim Moon <[email protected]>
---------
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Co-authored-by: Tim Moon <[email protected]>

* new lora test Signed-off-by: arendu <[email protected]>
* updates Signed-off-by: arendu <[email protected]>
* check for chat Signed-off-by: arendu <[email protected]>
* update Signed-off-by: arendu <[email protected]>
* update Signed-off-by: arendu <[email protected]>
* small train set Signed-off-by: arendu <[email protected]>
* update Signed-off-by: arendu <[email protected]>
* precision change Signed-off-by: arendu <[email protected]>
* fixed typo in paths Signed-off-by: arendu <[email protected]>
* full data with limit val batches Signed-off-by: arendu <[email protected]>
* tp2 instead of pp2 Signed-off-by: arendu <[email protected]>
* tp2 instead of pp2 Signed-off-by: arendu <[email protected]>
---------
Signed-off-by: arendu <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>

Signed-off-by: Alexandra Antonova <[email protected]>

* add call to p2p overlap
* update Jenkins for test
---------
Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: Eric Harper <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

…6793) Signed-off-by: Xuesong Yang <[email protected]>

Signed-off-by: Markel Sanz Ausin <[email protected]>
Co-authored-by: Markel Sanz Ausin <[email protected]>

* repro for gpt eval mp mem issue Signed-off-by: Yang Zhang <[email protected]>
* add print statements for memory allocation Signed-off-by: Yang Zhang <[email protected]>
* adjusted hot fix that prevents softmax on the entire output embedding,now memory bottlenecked by attention softmax which needs to be solved with FA or long attention Signed-off-by: Yang Zhang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* using compute_logprob to configure inference Signed-off-by: Yang Zhang <[email protected]>
* enable compute logprob for peft Signed-off-by: Yang Zhang <[email protected]>
* remove print statements Signed-off-by: Yang Zhang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix ci Signed-off-by: Yang Zhang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* added docstrings Signed-off-by: Yang Zhang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add missing config Signed-off-by: Yang Zhang <[email protected]>
* remove truncate prompt length feature Signed-off-by: Yang Zhang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* tensor before all gather needs to be contiguous Signed-off-by: Yang Zhang <[email protected]>
---------
Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>

Signed-off-by: tbartley94 <[email protected]>

Signed-off-by: Abhinav Khattar <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>

Signed-off-by: arendu <[email protected]>

If datasets are stored on a read-only medium, index files cannot be created next to the data files, so an alternative directory must be specified for the index mapping files. This commit adds an optional `index_mapping_dir` argument to the constructors. Unit tests are also added.
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
Update path formatting for relative paths
Signed-off-by: Greg Heinrich <[email protected]>
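The `index_mapping_dir` option described in this commit can be illustrated with a small path-resolution sketch. The helper name `index_file_path` and the `.idx` suffix are hypothetical, chosen for illustration; this is not the NeMo implementation, only one plausible way to re-root an index path under an alternative directory while keeping absolute and relative dataset paths inside it.

```python
import os
from typing import Optional


def index_file_path(data_path: str, index_mapping_dir: Optional[str] = None, suffix: str = "idx") -> str:
    """Return the path where the index file for `data_path` would be written.

    Hypothetical helper for illustration only (not NeMo's API).
    """
    if not index_mapping_dir:
        # Default: the index is an adjacent file, e.g. /data/train.jsonl.idx
        return f"{data_path}.{suffix}"

    # normpath collapses "." and inner ".." components, so any remaining ".."
    # can only be leading (relative paths). Dropping empty, "." and ".." parts
    # keeps both absolute and relative dataset paths inside index_mapping_dir.
    parts = [p for p in os.path.normpath(data_path).split(os.sep) if p not in ("", ".", "..")]
    return os.path.join(index_mapping_dir, *parts) + f".{suffix}"


if __name__ == "__main__":
    # Dataset on a read-only mount: index files are redirected to a writable dir.
    print(index_file_path("/data/train.jsonl"))
    print(index_file_path("/data/train.jsonl", "/tmp/idx"))
    print(index_file_path("../data/train.jsonl", "/tmp/idx"))
```

One trade-off of stripping leading `/` and `..` components is that `/data/train.jsonl` and `../data/train.jsonl` map to the same index file. The sketch also only computes a path; a caller would still need to create the parent directory (for example with `os.makedirs(..., exist_ok=True)`) before writing the index.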

* Add kv cache support for transformer TE path Signed-off-by: Yen-Shi Wang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Mark get_data_parallel_group as WAR Signed-off-by: Yen-Shi Wang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Initialize process group for FP8 training Signed-off-by: Tim Moon <[email protected]>
* Update Megatron GPT eval script for non-FP8 path Signed-off-by: Yen-Shi Wang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Signed-off-by: Yen-Shi Wang <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: Yen-Shi Wang <[email protected]>
Co-authored-by: Yen-Shi Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* initial commit Signed-off-by: Dima Rekesh <[email protected]> * typos Signed-off-by: Dima Rekesh <[email protected]> * tweaks to padding Signed-off-by: Dima Rekesh <[email protected]> * comments Signed-off-by: Dima Rekesh <[email protected]> * attempt at first working version Signed-off-by: Dima Rekesh <[email protected]> * typos and fixed p calculation Signed-off-by: Dima Rekesh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removing merge artifacts Signed-off-by: Dima Rekesh <[email protected]> * typo Signed-off-by: Dima Rekesh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removing unnecessary imports Signed-off-by: Dima Rekesh <[email protected]> * if batch split succeeded no need to conv again Signed-off-by: Dima Rekesh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adding channel wise split Signed-off-by: Dima Rekesh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adding reference to pytorch issue 80020 Signed-off-by: Dima Rekesh <[email protected]> * removing time chunking methods Signed-off-by: Dima Rekesh <[email protected]> * accounting for the actual self._stride value Signed-off-by: Dima Rekesh <[email protected]> * limiting the fix to dw_striding subsampling Signed-off-by: Dima Rekesh <[email protected]> * renamed methods Signed-off-by: Dima Rekesh <[email protected]> * one more accounting for the actual self._stride value Signed-off-by: Dima Rekesh <[email protected]> * support for causal convs Signed-off-by: Dima Rekesh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * option to set conv chunking size manually * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing imports * subsampling test Signed-off-by: Dima Rekesh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename variable Signed-off-by: Dima Rekesh <[email protected]> * imports in test Signed-off-by: Dima Rekesh <[email protected]> * more runtime checks * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * a more careful test Signed-off-by: Dima Rekesh <[email protected]> * bug in causal Signed-off-by: Dima Rekesh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix in causal Signed-off-by: Dima Rekesh <[email protected]> * change_conv_chunking_factor methods Signed-off-by: Dima Rekesh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * renamed methods Signed-off-by: Dima Rekesh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * disabling chunking by default Signed-off-by: Dima Rekesh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * typo Signed-off-by: Dima Rekesh <[email protected]> * changing default chunking to auto Signed-off-by: Dima Rekesh <[email protected]> * only split if needed Signed-off-by: Dima Rekesh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * only split if needed Signed-off-by: Dima Rekesh <[email protected]> --------- Signed-off-by: Dima Rekesh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
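The "auto" chunking and "only split if needed" behavior described in these commits can be sketched without PyTorch. This is an illustrative sketch, not NeMo's subsampling code: the element limit of 2**31, the power-of-two chunking factor, and all function names are assumptions, and a real implementation would split tensors rather than Python lists.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# Assumed element budget per call (2**31), e.g. for kernels that index with 32-bit integers.
ELEMENT_LIMIT = 2**31


def auto_chunking_factor(numel: int, limit: int = ELEMENT_LIMIT) -> int:
    """Smallest power of two `cf` with numel / cf <= limit; 1 means no split is needed."""
    if numel <= limit:
        return 1
    return 2 ** math.ceil(math.log2(numel / limit))


def split_by_batch(batch: Sequence[T], chunking_factor: int) -> List[Sequence[T]]:
    """Split `batch` into at most `chunking_factor` contiguous, roughly equal chunks."""
    chunk_size = max(1, math.ceil(len(batch) / chunking_factor))
    return [batch[i : i + chunk_size] for i in range(0, len(batch), chunk_size)]


def run_chunked(
    batch: Sequence[T],
    fn: Callable[[Sequence[T]], Sequence[R]],
    numel_per_item: int,
    limit: int = ELEMENT_LIMIT,
) -> List[R]:
    """Apply `fn` to the whole batch, or chunk by chunk if it is too large, and concatenate."""
    cf = auto_chunking_factor(len(batch) * numel_per_item, limit)
    if cf == 1 or len(batch) == 1:
        # Only split if needed. A single item cannot be split by batch; the commits
        # above add a channel-wise split for that case, which is not shown here.
        return list(fn(batch))
    outputs: List[R] = []
    for chunk in split_by_batch(batch, cf):
        outputs.extend(fn(chunk))
    return outputs
```

Rounding the factor up to a power of two keeps the number of distinct chunk shapes small; a factor that exactly divides the element count would work equally well for correctness.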

Signed-off-by: Dima Rekesh <[email protected]>

Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok>

* add reference to our paper Signed-off-by: Alexandra Antonova <[email protected]> * add paper reference to docs Signed-off-by: Alexandra Antonova <[email protected]> --------- Signed-off-by: Alexandra Antonova <[email protected]>

Signed-off-by: smajumdar <[email protected]> Co-authored-by: Eric Harper <[email protected]>

* added methods. Signed-off-by: Vahid <[email protected]> * added methods. Signed-off-by: Vahid <[email protected]> * added initial code. Signed-off-by: Vahid <[email protected]> * added initial code. Signed-off-by: Vahid <[email protected]> * added initial code. Signed-off-by: Vahid <[email protected]> * added config files. Signed-off-by: Vahid <[email protected]> * fixed bugs. Signed-off-by: Vahid <[email protected]> * updated confs. Signed-off-by: Vahid <[email protected]> * updated confs. Signed-off-by: Vahid <[email protected]> * updated confs. Signed-off-by: Vahid <[email protected]> * updated confs. Signed-off-by: Vahid <[email protected]> * improved f.conv1d Signed-off-by: Vahid <[email protected]> * pulled from main. Signed-off-by: Vahid <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * pulled from main. Signed-off-by: Vahid <[email protected]> * added postpostnorm. Signed-off-by: Vahid <[email protected]> * fixed the target continiouse bug. Signed-off-by: Vahid <[email protected]> * added dw_striding causal. Signed-off-by: Vahid <[email protected]> * added print for debugging. Signed-off-by: Vahid <[email protected]> * added print for debugging. Signed-off-by: Vahid <[email protected]> * fixed causal convolutions. Signed-off-by: Vahid <[email protected]> * added _midnorm. Signed-off-by: Vahid <[email protected]> * fixed transcribe. Signed-off-by: Vahid <[email protected]> * cleaned code. Signed-off-by: Vahid <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * moved back configs. Signed-off-by: Vahid <[email protected]> * moved back configs. Signed-off-by: Vahid <[email protected]> * updated fast emit for FC models. Signed-off-by: Vahid <[email protected]> * updated fast emit for FC models. Signed-off-by: Vahid <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed bug. Signed-off-by: Vahid <[email protected]> * fixed bug and addressed comments. Signed-off-by: Vahid <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed configs. Signed-off-by: Vahid <[email protected]> * fixed configs. Signed-off-by: Vahid <[email protected]> * dropped the test. Signed-off-by: Vahid <[email protected]> --------- Signed-off-by: Vahid <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
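Several of these commits touch causal convolutions ("added dw_striding causal", "fixed causal convolutions"). The core idea, padding only on the left so that an output frame never sees future input frames, can be sketched in plain Python. This is a simplified 1-D illustration with a hypothetical function name; it does not reproduce how NeMo pads its strided causal subsampling layers.

```python
from typing import List, Sequence


def causal_conv1d(x: Sequence[float], kernel: Sequence[float], stride: int = 1) -> List[float]:
    """1-D causal convolution (cross-correlation, as deep-learning frameworks compute it).

    Only the left side is padded, with kernel_size - 1 zeros, so output frame t is
    computed from inputs x[t*stride - (k-1)] .. x[t*stride] and never from later frames.
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)
    out = []
    for start in range(0, len(padded) - k + 1, stride):
        window = padded[start : start + k]
        out.append(sum(w * v for w, v in zip(kernel, window)))
    return out
```

With stride 1 the output has the same length as the input, which is what makes such layers usable for streaming: each new input frame produces exactly one new output frame.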

* aliases Signed-off-by: Nikolay Karpov <[email protected]> * add NEMO_PATH Signed-off-by: Nikolay Karpov <[email protected]> * expand_aliases Signed-off-by: Nikolay Karpov <[email protected]> --------- Signed-off-by: Nikolay Karpov <[email protected]>

Signed-off-by: Igor Gitman <[email protected]>

…ble models (#7012) (#7013) * [TTS] fastpitch: add english libritts model with asr stft parameters (25 ms 10 ms) * [TTS] enhancer: add pretrained model intended for asr finetuning --------- Signed-off-by: Roman Korostik <[email protected]>

* Add ASR with TTS Tutorial * Fix enhancer usage Signed-off-by: Vladimir Bataev <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]>

* Add end_strings to SamplingParams Signed-off-by: Gerald Shen <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Gerald Shen <[email protected]> * Add end_strings to megatron_gpt_inference.yaml Signed-off-by: Gerald Shen <[email protected]> * Add end_strings to sampling params Signed-off-by: Gerald Shen <[email protected]> * Remove extra_id_1 from default end_strings Signed-off-by: Gerald Shen <[email protected]> * Fix require_grad typos (#6930) Signed-off-by: Sergii Dymchenko <[email protected]> Signed-off-by: Gerald Shen <[email protected]> * fix syntax error Signed-off-by: Gerald Shen <[email protected]> * fix the mpt chatbot (#6957) (#6968) Signed-off-by: Yi Dong <[email protected]> Co-authored-by: Yi Dong <[email protected]> Signed-off-by: Gerald Shen <[email protected]> * add support for max_total_length=4096 for 43b (#6763) * add support for max_total_length=4096 for 43b Signed-off-by: Zhilin Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Zhilin Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Gerald Shen <[email protected]> * rnnt_greedy_decoding.py: typos? auto-repressively -> auto-regressively (#6989) Signed-off-by: Vadim Kantorov <[email protected]> Signed-off-by: Gerald Shen <[email protected]> * Cache handling without input tensors mutation (#6980) (#6996) * Cache handling without input tensors mutation * Cleanup * Cleanup#2 * Cleanup#3 --------- Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Gerald Shen <[email protected]> * Hybrid conformer export (#6983) (#6995) * Implemented generic kv-pair setting of export_config from args * Hybrid conformer export * Hybrid decoder export * Cleanup * Changed from **kwargs * Docstring * Docs added * Stringify args * Added docs for ASR export configs * lowercase ctc --------- Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Signed-off-by: Gerald Shen <[email protected]> * Fixing an issue with confidence ensembles (#6987) (#7004) * Bug fix for the confidence ensembles * Relax constraints for the test --------- Signed-off-by: Igor Gitman <[email protected]> Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Gerald Shen <[email protected]> * [TTS] Add cosine distance option to TTS aligner (#6806) * [TTS] Add cosine distance option to TTS aligner Signed-off-by: Ryan <[email protected]> * [TTS] Update aligner comments Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Gerald Shen <[email protected]> * Minor MPT-7B fixes and creation script update (#6982) * Initial commit of minor MPT-7B fixes Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Daniel Egert <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Gerald Shen <[email protected]> * Change Jenkins timeout (#6997) * change timeout Signed-off-by: ericharper <[email protected]> * change to 8 hours Signed-off-by: ericharper <[email protected]> --------- Signed-off-by: ericharper <[email protected]> Signed-off-by: Gerald Shen <[email protected]> * remove hard coded input and output fields (#7008) * remove hard coded input and output fields Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Gerald Shen <[email protected]> * RoPE length extrapolation with interpolation (#7005) * Push changes Signed-off-by: MaximumEntropy <[email protected]> * Fixes Signed-off-by: MaximumEntropy <[email protected]> * add continue training script Signed-off-by: MaximumEntropy <[email protected]> * [WIP] nonlinear interp Signed-off-by: MaximumEntropy <[email protected]> * Fix Signed-off-by: MaximumEntropy <[email protected]> * override encoder_seq_len Signed-off-by: MaximumEntropy <[email protected]> * Remove nonlinear Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * sft with pi (#7006) * sft with pi Signed-off-by: Evelina <[email protected]> * update values only if not None Signed-off-by: Evelina <[email protected]> --------- Signed-off-by: Evelina <[email protected]> * Address comments Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add info Signed-off-by: MaximumEntropy <[email protected]> * Empty Signed-off-by: MaximumEntropy <[email protected]> --------- Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: Evelina <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Evelina <[email protected]> Signed-off-by: Gerald Shen <[email protected]> * use proper config Signed-off-by: Gerald Shen <[email protected]> * Add end_strings to SamplingParams Signed-off-by: Gerald Shen <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Gerald Shen <[email protected]> * Add end_strings to megatron_gpt_inference.yaml Signed-off-by: Gerald Shen <[email protected]> * Add end_strings to sampling params Signed-off-by: Gerald Shen <[email protected]> * Remove extra_id_1 from default end_strings Signed-off-by: Gerald Shen <[email protected]> * fix syntax error Signed-off-by: Gerald Shen <[email protected]> * use proper config Signed-off-by: Gerald Shen <[email protected]> --------- Signed-off-by: Gerald Shen <[email protected]> Signed-off-by: Sergii Dymchenko <[email protected]> Signed-off-by: Yi Dong <[email protected]> Signed-off-by: Zhilin Wang <[email protected]> Signed-off-by: Vadim Kantorov <[email protected]> Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Igor Gitman <[email protected]> Signed-off-by: Ryan <[email protected]> Signed-off-by: Daniel Egert <[email protected]> Signed-off-by: ericharper <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: Evelina <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sergii Dymchenko <[email protected]> Co-authored-by: Gerald Shen <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Yi Dong <[email protected]> Co-authored-by: Zhilin Wang <[email protected]> Co-authored-by: Vadim Kantorov <[email protected]> Co-authored-by: Boris Fomitchev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Igor Gitman <[email protected]> Co-authored-by: Ryan Langman <[email protected]> Co-authored-by: trias702 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Evelina <[email protected]>
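The position-interpolation idea behind "RoPE length extrapolation with interpolation (#7005)" can be sketched in plain Python. Instead of feeding the rotary embedding positions beyond the trained context length, positions are divided by a scale factor so they fall back into the trained range. The function and parameter names below (including `interpolation_factor`) are hypothetical, and the base of 10000 is the common RoPE default, assumed here rather than taken from this PR.

```python
import math
from typing import List, Tuple


def rope_inv_freq(dim: int, base: float = 10000.0) -> List[float]:
    """Inverse frequencies base**(-2i/dim) for i in range(dim // 2)."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]


def rope_angles(
    position: int, dim: int, interpolation_factor: float = 1.0, base: float = 10000.0
) -> List[float]:
    """Rotation angles for one position, with optional position interpolation.

    Dividing the position by `interpolation_factor` (e.g. 2.0 when running a model
    trained on 2048 tokens at 4096 tokens) maps the longer context back onto
    (essentially) the position range seen during training.
    """
    scaled_position = position / interpolation_factor
    return [scaled_position * f for f in rope_inv_freq(dim, base)]


def apply_rope(pair: Tuple[float, float], angle: float) -> Tuple[float, float]:
    """Rotate one (x0, x1) feature pair by `angle`, as RoPE does for each pair."""
    x0, x1 = pair
    c, s = math.cos(angle), math.sin(angle)
    return (x0 * c - x1 * s, x0 * s + x1 * c)
```

With a factor of 2.0, position 4096 receives exactly the angles that position 2048 had without interpolation, so the model sees familiar angle ranges at half the spacing rather than angles it never encountered during training.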

…es not wait for setup (#7016) Signed-off-by: Kim Ngo <[email protected]>

Signed-off-by: tbartley94 <[email protected]>

* rnnt_ngram_merge Signed-off-by: Nikolay Karpov <[email protected]> * char level bug Signed-off-by: Nikolay Karpov <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Nikolay Karpov <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Somshubra Majumdar <[email protected]>

Signed-off-by: Yi Dong <[email protected]> Co-authored-by: Yi Dong <[email protected]>

* small fixes and tests Signed-off-by: Aleksandr Laptev <[email protected]> * various fixes for the tutorial Signed-off-by: Aleksandr Laptev <[email protected]> * tutorial added Signed-off-by: Aleksandr Laptev <[email protected]> * fix for a little oops after rebase Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix tests Signed-off-by: Aleksandr Laptev <[email protected]> * unused import removed Signed-off-by: Aleksandr Laptev <[email protected]> * fix review comments Signed-off-by: Aleksandr Laptev <[email protected]> * deprecated parameters for greedy configs Signed-off-by: Aleksandr Laptev <[email protected]> * move re-assigning to configs Signed-off-by: Aleksandr Laptev <[email protected]> * fix comments 2 Signed-off-by: Aleksandr Laptev <[email protected]> * fix config tests Signed-off-by: Aleksandr Laptev <[email protected]> * fix ece test (my env was bugged apparently) Signed-off-by: Aleksandr Laptev <[email protected]> * renamings for confidence ensemble Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix comments 3 Signed-off-by: Aleksandr Laptev <[email protected]> * return dropped tutorial Signed-off-by: Aleksandr Laptev <[email protected]> * CI flips back and forth, increasing tolerance Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
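For readers unfamiliar with the expected calibration error (ECE) exercised by the "fix ece test" commit, the textbook equal-width-binning definition is sketched below: ECE is the sum over bins B of (|B| / n) * |accuracy(B) - mean confidence(B)|. This is not NeMo's implementation; the number of bins and the bin-edge convention are assumptions, and different choices yield slightly different values.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """ECE with equal-width bins: sum over bins of (|B| / n) * |accuracy(B) - mean confidence(B)|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Bins are [i/n_bins, (i+1)/n_bins); a confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        accuracy = sum(ok for _, ok in members) / len(members)
        mean_conf = sum(c for c, _ in members) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece
```

Because small bins can swing the result noticeably, ECE-based tests on small samples tend to need a tolerance, which is consistent with the "increasing tolerance" commit above.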

Signed-off-by: Nikolay Karpov <[email protected]> Co-authored-by: Nikolay Karpov <[email protected]>

Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: bene-ges <[email protected]> Co-authored-by: Evelina <[email protected]>

Signed-off-by: Yi Dong <[email protected]>

Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]>

* st standalone model Signed-off-by: AlexGrinch <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: AlexGrinch <[email protected]> * sacrebleu import fix, unused imports removed Signed-off-by: AlexGrinch <[email protected]> * import guard for nlp inside asr transformer bpe model Signed-off-by: AlexGrinch <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql fixes Signed-off-by: AlexGrinch <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * comments answered Signed-off-by: AlexGrinch <[email protected]> * import ordering fix Signed-off-by: AlexGrinch <[email protected]> * yttm for asr removed Signed-off-by: AlexGrinch <[email protected]> * logging added Signed-off-by: AlexGrinch <[email protected]> * added inference and translate method Signed-off-by: AlexGrinch <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: AlexGrinch <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* remove pos emb from state dict Signed-off-by: Evelina <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to nlp_model Signed-off-by: Evelina <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update comment Signed-off-by: Evelina <[email protected]> * fix nmt test Signed-off-by: Evelina <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix nmt test Signed-off-by: Evelina <[email protected]> --------- Signed-off-by: Evelina <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Signed-off-by: Vladimir Bataev <[email protected]>

Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]>

* Fix documentation for Numba * Update force float32 flag dynamically * Update force float32 flag dynamically * Fix nemo version --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Eric Harper <[email protected]>

* update fvad doc Signed-off-by: stevehuang52 <[email protected]> * fix typo Signed-off-by: stevehuang52 <[email protected]> * update fvad example Signed-off-by: stevehuang52 <[email protected]> * update Signed-off-by: stevehuang52 <[email protected]> * fix onnx export Signed-off-by: stevehuang52 <[email protected]> * update test Signed-off-by: stevehuang52 <[email protected]> * refactor Signed-off-by: stevehuang52 <[email protected]> * update doc Signed-off-by: stevehuang52 <[email protected]> * update Signed-off-by: stevehuang52 <[email protected]> --------- Signed-off-by: stevehuang52 <[email protected]> Co-authored-by: fayejf <[email protected]>

* memmap worker arg Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

nemo/collections/asr/parts/numba/rnnt_loss/rnnt_pytorch.py

+        return costs
+
+    @staticmethod
+    def backward(ctx, grad_output):


examples/nlp/language_modeling/tuning/megatron_t5_lora_eval.py

+    # Have to turn off activations_checkpoint_method for inference
+    try:
+        model.model.language_model.encoder.activations_checkpoint_method = None
+    except AttributeError:


examples/nlp/language_modeling/tuning/megatron_t5_lora_eval.py

+
+    try:
+        model.frozen_model.model.language_model.encoder.activations_checkpoint_method = None
+    except AttributeError:
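Both excerpts turn off activation checkpointing for inference on whichever attribute path the loaded model actually has (`model.model...` or `model.frozen_model.model...`) and ignore the path it lacks. A hypothetical helper that expresses the same pattern once, and reports whether the attribute was set instead of silently swallowing the error, might look like this; the helper name is invented for illustration.

```python
from functools import reduce
from types import SimpleNamespace
from typing import Any


def set_nested_attr_if_present(obj: Any, dotted_path: str, value: Any) -> bool:
    """Set obj.<dotted_path> = value if every parent object on the path exists.

    Returns True when the attribute was set and False when a parent attribute is
    missing, the case the excerpts above handle with `except AttributeError`.
    """
    *parents, leaf = dotted_path.split(".")
    try:
        target = reduce(getattr, parents, obj)
    except AttributeError:
        return False
    setattr(target, leaf, value)
    return True


# Usage: a toy model that only has the `model.` path, not `frozen_model.`.
model = SimpleNamespace(
    model=SimpleNamespace(
        language_model=SimpleNamespace(encoder=SimpleNamespace(activations_checkpoint_method="uniform"))
    )
)
for path in (
    "model.language_model.encoder.activations_checkpoint_method",
    "frozen_model.model.language_model.encoder.activations_checkpoint_method",
):
    set_nested_attr_if_present(model, path, None)
```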


examples/nlp/language_modeling/megatron_gpt_validate.py

+from nemo.collections.nlp.parts.nlp_overrides import (
+    MegatronHalfPrecisionPlugin,
+    NLPDDPStrategy,
+    NLPSaveRestoreConnector,
+    PipelineMixedPrecisionPlugin,
+)


nemo/collections/asr/models/transformer_bpe_models.py

+    def multi_test_epoch_end(self, outputs, dataloader_idx: int = 0):
+        return self.multi_validation_epoch_end(outputs, dataloader_idx, eval_mode="test")
+
+    def test_dataloader(self):


nemo/collections/asr/models/transformer_bpe_models.py

+    from nemo.collections.nlp.modules.common.lm_utils import get_transformer
+    from nemo.collections.nlp.modules.common.transformer import BeamSearchSequenceGenerator, TransformerEncoder
+
+    NLP_AVAILABLE = True


nemo/collections/asr/models/transformer_bpe_models.py

+
+    NLP_AVAILABLE = True
+except (ImportError, ModuleNotFoundError):
+    NLP_AVAILABLE = False
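The excerpts guard the NLP imports so that the ASR model module still loads when the NLP collection is unavailable. A stand-alone sketch of the same guard follows; `nemo_nlp_stub_pkg` is a deliberately nonexistent module name standing in for the optional dependency, and `require_optional` and `try_import` are hypothetical helpers. Catching `ImportError` alone is sufficient, because `ModuleNotFoundError` is a subclass of `ImportError`.

```python
import importlib
from types import ModuleType
from typing import Optional

# Module-level guard in the style of the excerpt. `nemo_nlp_stub_pkg` is a made-up
# name standing in for an optional dependency that may not be installed.
try:
    import nemo_nlp_stub_pkg  # noqa: F401

    OPTIONAL_AVAILABLE = True  # set only after every guarded import has succeeded
except ImportError:  # also catches ModuleNotFoundError, which subclasses ImportError
    OPTIONAL_AVAILABLE = False


def require_optional(feature: str) -> None:
    """Fail with an actionable message when `feature` needs the missing dependency."""
    if not OPTIONAL_AVAILABLE:
        raise ModuleNotFoundError(f"{feature} requires an optional dependency that is not installed.")


def try_import(name: str) -> Optional[ModuleType]:
    """Functional form of the same guard: the imported module, or None if unavailable."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None
```

Setting the flag only after the last guarded import succeeds means a partially available package still leaves it `False`, so later code can check one flag instead of retrying each import.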


nemo/collections/asr/data/audio_to_text_dataset.py

+        zip(tarred_audio_filepaths, manifest_filepaths)
+    ):
+        conf = copy.deepcopy(config)
+        conf['manifest_filepath'] = manifest_filepath


nemo/collections/asr/data/audio_to_text_dataset.py

+        conf = copy.deepcopy(config)
+        conf['manifest_filepath'] = manifest_filepath
+        with open_dict(conf):
+            conf['tarred_audio_filepaths'] = tarred_audio_filepath
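The excerpt builds one dataset config per (tarred audio, manifest) pair, deep-copying the shared config first. A plain-dictionary sketch of that step follows; the function name is hypothetical. In the excerpt the config is an OmegaConf object, which is why `open_dict` is used there before adding a key to a struct config; plain dictionaries need no such context manager. The explicit length check is an addition not shown in the excerpt: `zip` silently stops at the shorter input, which would drop shards without warning.

```python
import copy
from typing import Any, Dict, List, Sequence


def build_per_shard_configs(
    config: Dict[str, Any],
    tarred_audio_filepaths: Sequence[str],
    manifest_filepaths: Sequence[str],
) -> List[Dict[str, Any]]:
    """Return one config per (tarred audio, manifest) pair (hypothetical helper)."""
    if len(tarred_audio_filepaths) != len(manifest_filepaths):
        # zip() would silently drop the unmatched tail, so fail loudly instead.
        raise ValueError(
            f"{len(tarred_audio_filepaths)} tarred audio entries but "
            f"{len(manifest_filepaths)} manifests"
        )
    configs = []
    for tarred_audio_filepath, manifest_filepath in zip(tarred_audio_filepaths, manifest_filepaths):
        # Deep copy so per-shard edits (including to nested values) never leak into
        # the shared base config or into sibling shards.
        conf = copy.deepcopy(config)
        conf["manifest_filepath"] = manifest_filepath
        conf["tarred_audio_filepaths"] = tarred_audio_filepath
        configs.append(conf)
    return configs
```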


arendu and others added 30 commits June 1, 2023 11:42

[TTS][zh] refine hardcoded lowercase for ASCII letters. (#6781)

cfbe092

Signed-off-by: Xuesong Yang <[email protected]>

Fix get_parameters when using main params optimizer (#6764) (#6787)

ef74006

* fix get param * change name --------- Signed-off-by: ericharper <[email protected]> Co-authored-by: Eric Harper <[email protected]>

Fix check (#6798) (#6800)

a7403c2

Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]>

Fix validation with drop_last=False (#6704)

d984333

Signed-off-by: Mikołaj Błaż <[email protected]> Co-authored-by: Eric Harper <[email protected]>

SDE unt lvl comparison (#6669)

8f26d83

Added a visual utterance-level comparison of two ASR models Signed-off-by: George <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

change branch to main, small fix (#6803)

76fc488

Signed-off-by: Alexandra Antonova <[email protected]>

fixed decor to show messages only when the wrapped object is called. (#…

aa21e8a

…6793) Signed-off-by: Xuesong Yang <[email protected]>

Bug fix for reset_sequence_parallel_args (#6802) (#6805)

f9bb1b0

Signed-off-by: Markel Sanz Ausin <[email protected]> Co-authored-by: Markel Sanz Ausin <[email protected]>

Fixed bug in MaskedSpecAug that overestimates samples. (#6775)

010a0e6

Signed-off-by: tbartley94 <[email protected]>

update core version (#6817) (#6819)

8c26464

Signed-off-by: Abhinav Khattar <[email protected]> Co-authored-by: Abhinav Khattar <[email protected]>

lora pp2 (#6818)

acf50f4

Signed-off-by: arendu <[email protected]>

sharded_manifests updated docs (#6833)

ebfcef7

Signed-off-by: Dima Rekesh <[email protected]>

added fc-xl, xxl and titanet-s models (#6832)

52e23e0

Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok>

add reference to our paper (#6821)

6903d9b

* add reference to our paper Signed-off-by: Alexandra Antonova <[email protected]> * add paper reference to docs Signed-off-by: Alexandra Antonova <[email protected]> --------- Signed-off-by: Alexandra Antonova <[email protected]>

Upperbound Numpy to < 1.24 (#6829)

9cca92b

Signed-off-by: smajumdar <[email protected]> Co-authored-by: Eric Harper <[email protected]>

karpnv and others added 21 commits July 13, 2023 09:05

Update SDP docs page with a new documentation link (#7029)

caddb8d

Signed-off-by: Igor Gitman <[email protected]>

Add ASR with TTS Tutorial. Fix enhancer usage. (#6955) (#7023)

d44127e

* Add ASR with TTS Tutorial * Fix enhancer usage Signed-off-by: Vladimir Bataev <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]>

Fix race condition when executing with multi-node where some ranks do…

5733975

…es not wait for setup (#7016) Signed-off-by: Kim Ngo <[email protected]>

Added bool types to neural_types export (#7032)

470f178

Signed-off-by: tbartley94 <[email protected]>

fix tab text gen (#7022) (#7031)

18f283e

Signed-off-by: Yi Dong <[email protected]> Co-authored-by: Yi Dong <[email protected]>

install_bs (#7019) (#7028)

2ef544f

Signed-off-by: Nikolay Karpov <[email protected]> Co-authored-by: Nikolay Karpov <[email protected]>

fixes for spellmapper (#6994) (#7000)

8b4b382

Signed-off-by: Alexandra Antonova <[email protected]> Co-authored-by: bene-ges <[email protected]> Co-authored-by: Evelina <[email protected]>

added back the retro documents (#7033)

9051440

Signed-off-by: Yi Dong <[email protected]>

Remove pyyaml (#7052) (#7054)

84ae944

Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]>

Fix typo in ASR-TTS tutorial (#7049)

b1aa4c2

Signed-off-by: Vladimir Bataev <[email protected]>

Fixed tutorial's name (#7047)

1dde267

Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]>

github-advanced-security bot found potential problems Jul 20, 2023

github-actions bot added the core, TTS, ASR, NLP, Speaker Tasks, CI, and common labels Jul 20, 2023

KunalDhawan merged commit 1b17b22 into asr_normalize Jul 20, 2023
