NMT bottleneck #2390

Merged
merged 472 commits into from
Jul 14, 2021

Conversation

michalivne
Collaborator

@michalivne michalivne commented Jun 22, 2021

This PR adds a bottleneck architecture to NMT models, along with support for training VAE and MIM latent variable models.

Summary

The bottleneck class (MTBottleneckModel) supports two architectures:

  • No bottleneck: model_type=seq2seq, the usual NMT model.
  • Fixed-size bottleneck: model_type in [seq2seq-br, seq2seq-mim, seq2seq-vae], where the output of the encoder is projected to a fixed number of steps.

The projection to a fixed number of steps is based on the paper https://arxiv.org/pdf/1703.03130.pdf.
The idea is to use K attention heads to compute K weighted-average hidden states, projecting a variable number of steps into K steps.
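
The projection can be sketched in plain Python as follows. This is a minimal illustration of the structured self-attention idea from the paper, not the AttentionBridge code added in this PR: the names `attention_bridge`, `w1`, `w2` and `random_matrix` are hypothetical, the weights are random rather than learned, and padding masks are ignored.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, v):
    """Multiply matrix w (rows x cols) by vector v (cols)."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def attention_bridge(hidden, w1, w2):
    """Project T encoder states (T x D) to K states (K x D).

    w1 is (inner x D) and w2 is (K x inner); both are hypothetical
    stand-ins for learned weights.
    """
    steps = len(hidden)
    k_heads = len(w2)
    dim = len(hidden[0])
    # scores[t][k]: unnormalised score of head k for input step t.
    scores = [matvec(w2, [math.tanh(x) for x in matvec(w1, h)]) for h in hidden]
    # Each head normalises its scores over the T input steps.
    weights = [softmax([scores[t][k] for t in range(steps)]) for k in range(k_heads)]
    # Each output step is a weighted average of all input hidden states.
    bridged = [
        [sum(weights[k][t] * hidden[t][d] for t in range(steps)) for d in range(dim)]
        for k in range(k_heads)
    ]
    return bridged, weights


def random_matrix(rows, cols, rng):
    """Matrix of uniform random values, used here only as toy data."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
D, INNER, K = 4, 6, 3
w1 = random_matrix(INNER, D, rng)
w2 = random_matrix(K, INNER, rng)
# Two inputs of different length (5 and 11 steps) both map to K steps.
short_out, short_weights = attention_bridge(random_matrix(5, D, rng), w1, w2)
long_out, long_weights = attention_bridge(random_matrix(11, D, rng), w1, w2)
print(len(short_out), len(long_out))
```

In this sketch K plays the role of att_bridge_k and INNER the role of att_bridge_inner_size; the decoder would then attend over the K bridged states instead of the variable-length encoder output.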

The bottleneck variants offer different losses:

YAML Configuration

The following configurations were added to the YAML config:

model:
  model_type: 'seq2seq-br' # supports seq2seq, seq2seq-br, seq2seq-mim, seq2seq-vae (see description above)
  min_logv: -8 # minimal allowed logv for seq2seq-mim
  ortho_loss_coef: 0.0 # orthogonality coefficient for attention bridge
  att_bridge_size: 512 # dimension of a step in attention bridge
  att_bridge_k: 16 # fixed number of steps in attention bridge
  att_bridge_inner_size: 1024 # feedforward size in attention bridge
  non_recon_warmup_batches: 200000 # warm-up steps for seq2seq-mim, seq2seq-vae
  recon_per_token: true # when false reconstruction is computed per sample, not per token
  • ortho_loss_coef - coefficient of the orthogonality term for the attention bridge.
  • non_recon_warmup_batches - number of batches over which the KL divergence term (for VAE) or the latent entropy term (for MIM) is annealed.
  • recon_per_token - if false, the loss is computed per sample (i.e., summed over all tokens); otherwise the loss is averaged per token (the default behaviour).
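
How non_recon_warmup_batches and recon_per_token might interact can be sketched as below. This is only an illustration under stated assumptions: a linear ramp is assumed for the annealing schedule, the per-sample loss is assumed to be averaged over the batch, and the function names (`warmup_coef`, `reconstruction_loss`, `total_loss`) are hypothetical rather than taken from MTBottleneckModel.

```python
def warmup_coef(batch_idx, warmup_batches):
    """Linear ramp from 0 to 1 over warmup_batches (assumed schedule)."""
    if warmup_batches <= 0:
        return 1.0
    return min(1.0, batch_idx / warmup_batches)


def reconstruction_loss(token_nll, recon_per_token):
    """Reduce per-token negative log-likelihoods for one batch.

    token_nll is a list of samples, each a list of per-token losses.
    """
    total = sum(sum(sample) for sample in token_nll)
    if recon_per_token:
        # Average over every token in the batch.
        return total / sum(len(sample) for sample in token_nll)
    # Sum over tokens within a sample, then average over samples (assumption).
    return total / len(token_nll)


def total_loss(token_nll, non_recon_term, batch_idx, warmup_batches, recon_per_token=True):
    """Reconstruction loss plus the annealed KL (VAE) or latent entropy (MIM) term."""
    recon = reconstruction_loss(token_nll, recon_per_token)
    return recon + warmup_coef(batch_idx, warmup_batches) * non_recon_term


batch = [[1.0, 2.0, 3.0], [2.0]]  # toy per-token losses for two samples
per_token = reconstruction_loss(batch, recon_per_token=True)
per_sample = reconstruction_loss(batch, recon_per_token=False)
quarter_way = total_loss(batch, non_recon_term=4.0, batch_idx=50000, warmup_batches=200000)
print(per_token, per_sample, quarter_way)
```

With recon_per_token=false the reconstruction term grows with sentence length, so it weighs more heavily against the annealed term than the per-token average does.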

Usage

The example below trains a seq2seq-br model with a 32-step bottleneck (att_bridge_k=32).

NOTE: max_generation_delta must be big enough to allow the generation of the longest sequence given the chosen bottleneck size.

python -- examples/nlp/machine_translation/enc_dec_nmt-bottleneck.py \
      --config-name=aayn_bottleneck \
      ...
      model.max_generation_delta=256 \
      ...
      model.model_type=seq2seq-br \
      model.att_bridge_size=1024 \
      model.ortho_loss_coef=0.0 \
      model.att_bridge_k=32 \
      model.att_bridge_inner_size=1024 \
      model.recon_per_token=true \
      model.non_recon_warmup_batches=150000 \
      ...

Additional Info

  • The attention bridge class (AttentionBridge) was added to nemo/collections/nlp/modules/common/transformer/transformer_modules.py.
  • The bottleneck class (MTBottleneckModel) supports logging of various loss terms.

yzhang123 and others added 30 commits April 29, 2021 18:58
* move do_training flag to config

Signed-off-by: Yang Zhang <[email protected]>

* added telephone to itn

Signed-off-by: Yang Zhang <[email protected]>

* add telephone and email to itn

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
* Preserve the tokenizer config for ASR

Signed-off-by: smajumdar <[email protected]>

* Correct nlp docs

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
NVIDIA#2144)

* Removing graphsurgeon optional dependency, improving import error reporting

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing scope error

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
Signed-off-by: PiotrDabkowski <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
* removed graphutils, suppletive, data_loader_utils from itn to be reused from tn

Signed-off-by: Yang Zhang <[email protected]>

* inheriting itn from tn, thus removing redundancy

Signed-off-by: Yang Zhang <[email protected]>

* cleaned whitelist

Signed-off-by: Yang Zhang <[email protected]>

* lgtm fix

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
* Update how artifacts work

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* fixing some tests

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* fix more tests

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* add __init__ to tests to make them discoverable

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* empty src support

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* updates plust unittest

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* add copyright check

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* copyright header

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* fix style

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* handle hashed megatron checkpoint version in nlp restore_from

Signed-off-by: ericharper <[email protected]>

* add _MODEL_RESTORE_PATH to AppState

Signed-off-by: ericharper <[email protected]>

* get rid of global folder caching

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* double register - warning instead of exception

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* Add asr spe tests

Signed-off-by: smajumdar <[email protected]>

* Pop out asr wpe pre-registered value

Signed-off-by: smajumdar <[email protected]>

* Correct ASR tests and paths

Signed-off-by: smajumdar <[email protected]>

* Correct tokenizer saving

Signed-off-by: smajumdar <[email protected]>

* Correct ASR tests

Signed-off-by: smajumdar <[email protected]>

* Correct ASR bpe mixin

Signed-off-by: smajumdar <[email protected]>

* Patch up backward compatibility

Signed-off-by: smajumdar <[email protected]>

* update register_bert_model

Signed-off-by: ericharper <[email protected]>

* update all get_lm_model calls

Signed-off-by: ericharper <[email protected]>

* return None if src not found

Signed-off-by: ericharper <[email protected]>

* handle case with no tokenizer

Signed-off-by: ericharper <[email protected]>

* do not add another hash is using tarfile_artifacts

Signed-off-by: ericharper <[email protected]>

* add return_none flag, update doc string

Signed-off-by: ericharper <[email protected]>

* update default behavior of register_artifact for NLPModel

Signed-off-by: ericharper <[email protected]>

* change kwarg name to verify_src_exists

Signed-off-by: ericharper <[email protected]>

* use cfg instead of _cfg

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* some cleanups

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Co-authored-by: ericharper <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* bucketing tarred dataset for lm training

Signed-off-by: AlexGrinch <[email protected]>

* updated global rank

Signed-off-by: AlexGrinch <[email protected]>

* perplexity update

Signed-off-by: AlexGrinch <[email protected]>

* refactor lm to be campatible with latest nmt

Signed-off-by: AlexGrinch <[email protected]>

* perplexity change

Signed-off-by: AlexGrinch <[email protected]>

* removed obsolete config

Signed-off-by: AlexGrinch <[email protected]>

* added sequence perplexity

Signed-off-by: AlexGrinch <[email protected]>

* added non-smoothed CE loss for validation

Signed-off-by: AlexGrinch <[email protected]>

* unified sentence dataset, torchmetrics for sequence perplexity

Signed-off-by: AlexGrinch <[email protected]>

* translate_ddp refactor

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
* rename dl index 0 loss and sacrebleu for backwards compatibility

Signed-off-by: ericharper <[email protected]>

* eval -> val/tst

Signed-off-by: ericharper <[email protected]>

* instantiate torchmetrics after instantiating dataloaders

Signed-off-by: ericharper <[email protected]>

* bug

Signed-off-by: ericharper <[email protected]>

* remove debugging log

Signed-off-by: ericharper <[email protected]>

* remove debugging log

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>
* fix for electronic

Signed-off-by: ekmb <[email protected]>

* special symbols added

Signed-off-by: ekmb <[email protected]>

* restrict symbols list

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
* Add FS2 data loading test

Signed-off-by: Jocelyn Huang <[email protected]>

* TTS docs update for FastSpeech 2

Signed-off-by: Jocelyn Huang <[email protected]>

* Style fix for FS2 dataset test

Signed-off-by: Jocelyn Huang <[email protected]>

* Fix transpose typo

Signed-off-by: Jocelyn Huang <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
* Patch for backtranslation in lm dataset

Signed-off-by: MaximumEntropy <[email protected]>

* One more fix

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
* Started adding SAP dataset

Signed-off-by: Virginia Adams <[email protected]>

* Delete .lm_bert_dataset.py.swp

Signed-off-by: Virginia Adams <[email protected]>

* Added dataset and loss

Signed-off-by: Virginia Adams <[email protected]>

* Added entity linking encoder model

Signed-off-by: Virginia Adams <[email protected]>

* Can build and use index from pubmedbert model

Signed-off-by: Virginia Adams <[email protected]>

* checked boolean logic in build_index.py

Signed-off-by: Virginia Adams <[email protected]>

* End to end tested all functionality

Signed-off-by: Virginia Adams <[email protected]>

* fixed val loss none at end of validation

Signed-off-by: Virginia Adams <[email protected]>

* Started adding demo entity linking notebook

Signed-off-by: Virginia Adams <[email protected]>

* adding in notebook demo

Signed-off-by: Virginia Adams <[email protected]>

* added call to entitylinking classes in __init__.py files

Signed-off-by: Virginia Adams <[email protected]>

* Added eval code to notebook

Signed-off-by: Virginia Adams <[email protected]>

* Adding unfinished notebook

Signed-off-by: Virginia Adams <[email protected]>

* Cleaned up example dir

Signed-off-by: Virginia Adams <[email protected]>

* Fixed recap commands

Signed-off-by: Virginia Adams <[email protected]>

* added model typing and tiny data tar

Signed-off-by: Virginia Adams <[email protected]>

* Adding tiny data zip

Signed-off-by: Virginia Adams <[email protected]>

* updated tiny example config data path

Signed-off-by: Virginia Adams <[email protected]>

* Notebook demo works

Signed-off-by: Virginia Adams <[email protected]>

* Changed training epochs

Signed-off-by: Virginia Adams <[email protected]>

* Removed output from training and install cells

Signed-off-by: Virginia Adams <[email protected]>

* changed code formatting

Signed-off-by: Virginia Adams <[email protected]>

* Started doc string for new functions

Signed-off-by: Virginia Adams <[email protected]>

* Updated data_preprocessing to save to data_dir

Signed-off-by: Virginia Adams <[email protected]>

* fixed comment in notebook demo

Signed-off-by: Virginia Adams <[email protected]>

* Update data_preprocessing.py

Signed-off-by: Virginia Adams <[email protected]>

* updated nemo typing imports

Signed-off-by: Virginia Adams <[email protected]>

* about to rebase

Signed-off-by: Virginia Adams <[email protected]>

* added back umls_dataset_processing.py

Signed-off-by: Virginia Adams <[email protected]>

* Removed example data

Signed-off-by: Virginia Adams <[email protected]>

* Fixed typos in notebook demo

Signed-off-by: Virginia Adams <[email protected]>

* fixed lgtm-com issues

Signed-off-by: Virginia Adams <[email protected]>

* added copyright headers

Signed-off-by: Virginia Adams <[email protected]>

* fixed import and copyright headers

Signed-off-by: Virginia Adams <[email protected]>

* Fixed formatting

Signed-off-by: Virginia Adams <[email protected]>

* Fixed formatting changes 2

Signed-off-by: Virginia Adams <[email protected]>

* fixed test formatting

Signed-off-by: Virginia Adams <[email protected]>

* Added __init__.py for model and dataset

Signed-off-by: Virginia Adams <[email protected]>

* loading newline file returns data_dir now

Signed-off-by: Virginia Adams <[email protected]>

* Removed conf notebook and deleted comment

Signed-off-by: Virginia Adams <[email protected]>

* Added jenkins test

Signed-off-by: Virginia Adams <[email protected]>

* Updated Jenkins test

Signed-off-by: Virginia Adams <[email protected]>

* fixed file path

Signed-off-by: Virginia Adams <[email protected]>

* Changed Jenkins pipeline order

Signed-off-by: Virginia Adams <[email protected]>

* Fixed Jenkins datapath... again...

Signed-off-by: Virginia Adams <[email protected]>

* Made most review changes

Signed-off-by: Virginia Adams <[email protected]>

* fixed copy right

Signed-off-by: Virginia Adams <[email protected]>

* updated unit test to wget config

Signed-off-by: Virginia Adams <[email protected]>

* reverted test file back

Signed-off-by: Virginia Adams <[email protected]>

* Added project dir to jenkins test

Signed-off-by: Virginia Adams <[email protected]>

* defined config in unit test

Signed-off-by: Virginia Adams <[email protected]>

Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
* Correct branch version

Signed-off-by: smajumdar <[email protected]>

* Correct Jenkinsfile

Signed-off-by: smajumdar <[email protected]>

* Update rst files

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>
* FastSpeech 2 Test & Docs (NVIDIA#2143)

* Add FS2 data loading test

Signed-off-by: Jocelyn Huang <[email protected]>

* TTS docs update for FastSpeech 2

Signed-off-by: Jocelyn Huang <[email protected]>

* Style fix for FS2 dataset test

Signed-off-by: Jocelyn Huang <[email protected]>

* Fix transpose typo

Signed-off-by: Jocelyn Huang <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Jason <[email protected]>

* Entity linking (NVIDIA#2050)

* Started adding SAP dataset

Signed-off-by: Virginia Adams <[email protected]>

* Delete .lm_bert_dataset.py.swp

Signed-off-by: Virginia Adams <[email protected]>

* Added dataset and loss

Signed-off-by: Virginia Adams <[email protected]>

* Added entity linking encoder model

Signed-off-by: Virginia Adams <[email protected]>

* Can build and use index from pubmedbert model

Signed-off-by: Virginia Adams <[email protected]>

* checked boolean logic in build_index.py

Signed-off-by: Virginia Adams <[email protected]>

* End to end tested all functionality

Signed-off-by: Virginia Adams <[email protected]>

* fixed val loss none at end of validation

Signed-off-by: Virginia Adams <[email protected]>

* Started adding demo entity linking notebook

Signed-off-by: Virginia Adams <[email protected]>

* adding in notebook demo

Signed-off-by: Virginia Adams <[email protected]>

* added call to entitylinking classes in __init__.py files

Signed-off-by: Virginia Adams <[email protected]>

* Added eval code to notebook

Signed-off-by: Virginia Adams <[email protected]>

* Adding unfinished notebook

Signed-off-by: Virginia Adams <[email protected]>

* Cleaned up example dir

Signed-off-by: Virginia Adams <[email protected]>

* Fixed recap commands

Signed-off-by: Virginia Adams <[email protected]>

* added model typing and tiny data tar

Signed-off-by: Virginia Adams <[email protected]>

* Adding tiny data zip

Signed-off-by: Virginia Adams <[email protected]>

* updated tiny example config data path

Signed-off-by: Virginia Adams <[email protected]>

* Notebook demo works

Signed-off-by: Virginia Adams <[email protected]>

* Changed training epochs

Signed-off-by: Virginia Adams <[email protected]>

* Removed output from training and install cells

Signed-off-by: Virginia Adams <[email protected]>

* changed code formatting

Signed-off-by: Virginia Adams <[email protected]>

* Started doc string for new functions

Signed-off-by: Virginia Adams <[email protected]>

* Updated data_preprocessing to save to data_dir

Signed-off-by: Virginia Adams <[email protected]>

* fixed comment in notebook demo

Signed-off-by: Virginia Adams <[email protected]>

* Update data_preprocessing.py

Signed-off-by: Virginia Adams <[email protected]>

* updated nemo typing imports

Signed-off-by: Virginia Adams <[email protected]>

* about to rebase

Signed-off-by: Virginia Adams <[email protected]>

* added back umls_dataset_processing.py

Signed-off-by: Virginia Adams <[email protected]>

* Removed example data

Signed-off-by: Virginia Adams <[email protected]>

* Fixed typos in notebook demo

Signed-off-by: Virginia Adams <[email protected]>

* fixed lgtm-com issues

Signed-off-by: Virginia Adams <[email protected]>

* added copyright headers

Signed-off-by: Virginia Adams <[email protected]>

* fixed import and copyright headers

Signed-off-by: Virginia Adams <[email protected]>

* Fixed formatting

Signed-off-by: Virginia Adams <[email protected]>

* Fixed formatting changes 2

Signed-off-by: Virginia Adams <[email protected]>

* fixed test formatting

Signed-off-by: Virginia Adams <[email protected]>

* Added __init__.py for model and dataset

Signed-off-by: Virginia Adams <[email protected]>

* loading newline file returns data_dir now

Signed-off-by: Virginia Adams <[email protected]>

* Removed conf notebook and deleted comment

Signed-off-by: Virginia Adams <[email protected]>

* Added jenkins test

Signed-off-by: Virginia Adams <[email protected]>

* Updated Jenkins test

Signed-off-by: Virginia Adams <[email protected]>

* fixed file path

Signed-off-by: Virginia Adams <[email protected]>

* Changed Jenkins pipeline order

Signed-off-by: Virginia Adams <[email protected]>

* Fixed Jenkins datapath... again...

Signed-off-by: Virginia Adams <[email protected]>

* Made most review changes

Signed-off-by: Virginia Adams <[email protected]>

* fixed copy right

Signed-off-by: Virginia Adams <[email protected]>

* updated unit test to wget config

Signed-off-by: Virginia Adams <[email protected]>

* reverted test file back

Signed-off-by: Virginia Adams <[email protected]>

* Added project dir to jenkins test

Signed-off-by: Virginia Adams <[email protected]>

* defined config in unit test

Signed-off-by: Virginia Adams <[email protected]>

Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>

* switch CI back to main

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* Make Hifigan jittable

Signed-off-by: Ryan Leary <[email protected]>

* Remove vestigial debugging printout

Signed-off-by: Ryan Leary <[email protected]>

* Add export forward and fix style

Signed-off-by: Ryan Leary <[email protected]>

* Fix load_state_dict override for arbitrary layers

Signed-off-by: Ryan Leary <[email protected]>

Co-authored-by: Jocelyn <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: vadam5 <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Ryan Leary <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
* notebook size reduced

Signed-off-by: ekmb <[email protected]>

* notebook size reduced

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
* update spectral clustering method

Signed-off-by: nithinraok <[email protected]>

* update Jenkins File

Signed-off-by: nithinraok <[email protected]>

* threshold fix by reducing window length for shorter embs

Signed-off-by: nithinraok <[email protected]>

* grammar fixes

Signed-off-by: nithinraok <[email protected]>

* CR update

Signed-off-by: nithinraok <[email protected]>

* paper reference

Signed-off-by: nithinraok <[email protected]>

* improve docstring for yaml

Signed-off-by: nithinraok <[email protected]>

* Doc fixes

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
* artifacts update

Signed-off-by: ekmb <[email protected]>

* artifacts update

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* fix for model restoration

Signed-off-by: ekmb <[email protected]>

* typos fix + jenkins dir update

Signed-off-by: ekmb <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* update jenkins branch

Signed-off-by: ericharper <[email protected]>

* add &&

Signed-off-by: ericharper <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins disable

Signed-off-by: ekmb <[email protected]>

* revert jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins disable

Signed-off-by: ekmb <[email protected]>

* revert jenkins

Signed-off-by: ekmb <[email protected]>

Co-authored-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
* Initial attempt at always_save_nemo fix

Signed-off-by: MaximumEntropy <[email protected]>

* updated path before saving in exp manager, fixed bug when handling tarfile artifacts

Signed-off-by: ericharper <[email protected]>

* Add test with always_save_nemo to exp_manager

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* update jenkins branch

Signed-off-by: ericharper <[email protected]>

* check for nemo:

Signed-off-by: ericharper <[email protected]>

* check for nemo:

Signed-off-by: ericharper <[email protected]>

* check for nemo:

Signed-off-by: ericharper <[email protected]>

Co-authored-by: ericharper <[email protected]>
Co-authored-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
* Limit Pytorch lightning release

Signed-off-by: smajumdar <[email protected]>

* Add final two checks

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
* squash

Signed-off-by: Jason <[email protected]>

* notebook fixes

Signed-off-by: Jason <[email protected]>

* notebook fixes

Signed-off-by: Jason <[email protected]>

* typos

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
@lgtm-com

lgtm-com bot commented Jul 12, 2021

This pull request introduces 12 alerts when merging d3c655b into 44a3d02 - view on LGTM.com

new alerts:

  • 12 for Unused import

Signed-off-by: Micha Livne <[email protected]>
@lgtm-com

lgtm-com bot commented Jul 12, 2021

This pull request introduces 12 alerts when merging ec1303e into 44a3d02 - view on LGTM.com

new alerts:

  • 12 for Unused import

@lgtm-com

lgtm-com bot commented Jul 12, 2021

This pull request introduces 12 alerts when merging 78382bc into 44a3d02 - view on LGTM.com

new alerts:

  • 12 for Unused import

@ericharper
Collaborator

@michalivne, could you add usage instructions to the README?

@ericharper
Collaborator

Please remove the empty file get_wk2.sh

@lgtm-com

lgtm-com bot commented Jul 13, 2021

This pull request introduces 12 alerts when merging b91349b into 5363b49 - view on LGTM.com

new alerts:

  • 12 for Unused import

Collaborator

@ericharper ericharper left a comment

LGTM. Thanks!

@lgtm-com

lgtm-com bot commented Jul 13, 2021

This pull request introduces 12 alerts when merging 14395e3 into 5363b49 - view on LGTM.com

new alerts:

  • 12 for Unused import

Signed-off-by: Micha Livne <[email protected]>
Contributor

@MaximumEntropy MaximumEntropy left a comment

Thanks for making all of the changes :)

@lgtm-com

lgtm-com bot commented Jul 13, 2021

This pull request introduces 12 alerts when merging 85449f3 into 5363b49 - view on LGTM.com

new alerts:

  • 12 for Unused import

@lgtm-com

lgtm-com bot commented Jul 14, 2021

This pull request introduces 12 alerts when merging efa5d3f into 6ebbcb8 - view on LGTM.com

new alerts:

  • 12 for Unused import

@lgtm-com

lgtm-com bot commented Jul 14, 2021

This pull request introduces 12 alerts when merging d95f97f into ed08545 - view on LGTM.com

new alerts:

  • 12 for Unused import

@MaximumEntropy MaximumEntropy merged commit da90c34 into NVIDIA:main Jul 14, 2021
fayejf added a commit that referenced this pull request Jul 16, 2021
* Itn add classes (#2141)

* move do_training flag to config

Signed-off-by: Yang Zhang <[email protected]>

* added telephone to itn

Signed-off-by: Yang Zhang <[email protected]>

* add telephone and email to itn

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* ASR + NLP Doc Fixes (#2136)

* Preserve the tokenizer config for ASR

Signed-off-by: smajumdar <[email protected]>

* Correct nlp docs

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Removing graphsurgeon optional dependency, improving import error rep… (#2144)

* Removing graphsurgeon optional dependency, improving import error reporting

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing scope error

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix FilterbankFeatures eval nondeterminism. (#2146)

Signed-off-by: PiotrDabkowski <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fix the docs. (#2148)


Signed-off-by: Micha Livne <[email protected]>

* Text processing refactor (#2149)

* removed graphutils, suppletive, data_loader_utils from itn to be reused from tn

Signed-off-by: Yang Zhang <[email protected]>

* inheriting itn from tn, thus removing redundancy

Signed-off-by: Yang Zhang <[email protected]>

* cleaned whitelist

Signed-off-by: Yang Zhang <[email protected]>

* lgtm fix

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update how artifacts work (#2138)

* Update how artifacts work

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* fixing some tests

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* fix more tests

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* add __init__ to tests to make them discoverable

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* empty src support

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* updates plust unittest

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* add copyright check

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* copyright header

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* fix style

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* handle hashed megatron checkpoint version in nlp restore_from

Signed-off-by: ericharper <[email protected]>

* add _MODEL_RESTORE_PATH to AppState

Signed-off-by: ericharper <[email protected]>

* get rid of global folder caching

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* double register - warning instead of exception

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* Add asr spe tests

Signed-off-by: smajumdar <[email protected]>

* Pop out asr wpe pre-registered value

Signed-off-by: smajumdar <[email protected]>

* Correct ASR tests and paths

Signed-off-by: smajumdar <[email protected]>

* Correct tokenizer saving

Signed-off-by: smajumdar <[email protected]>

* Correct ASR tests

Signed-off-by: smajumdar <[email protected]>

* Correct ASR bpe mixin

Signed-off-by: smajumdar <[email protected]>

* Patch up backward compatibility

Signed-off-by: smajumdar <[email protected]>

* update register_bert_model

Signed-off-by: ericharper <[email protected]>

* update all get_lm_model calls

Signed-off-by: ericharper <[email protected]>

* return None if src not found

Signed-off-by: ericharper <[email protected]>

* handle case with no tokenizer

Signed-off-by: ericharper <[email protected]>

* do not add another hash is using tarfile_artifacts

Signed-off-by: ericharper <[email protected]>

* add return_none flag, update doc string

Signed-off-by: ericharper <[email protected]>

* update default behavior of register_artifact for NLPModel

Signed-off-by: ericharper <[email protected]>

* change kwarg name to verify_src_exists

Signed-off-by: ericharper <[email protected]>

* use cfg instead of _cfg

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* some cleanups

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Co-authored-by: ericharper <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Language model refactoring (#2120)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* bucketing tarred dataset for lm training

Signed-off-by: AlexGrinch <[email protected]>

* updated global rank

Signed-off-by: AlexGrinch <[email protected]>

* perplexity update

Signed-off-by: AlexGrinch <[email protected]>

* refactor lm to be campatible with latest nmt

Signed-off-by: AlexGrinch <[email protected]>

* perplexity change

Signed-off-by: AlexGrinch <[email protected]>

* removed obsolete config

Signed-off-by: AlexGrinch <[email protected]>

* added sequence perplexity

Signed-off-by: AlexGrinch <[email protected]>

* added non-smoothed CE loss for validation

Signed-off-by: AlexGrinch <[email protected]>

* unified sentence dataset, torchmetrics for sequence perplexity

Signed-off-by: AlexGrinch <[email protected]>

* translate_ddp refactor

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* [NMT] Multi-validation Patch (#2150)

* rename dl index 0 loss and sacrebleu for backwards compatibility

Signed-off-by: ericharper <[email protected]>

* eval -> val/tst

Signed-off-by: ericharper <[email protected]>

* instantiate torchmetrics after instantiating dataloaders

Signed-off-by: ericharper <[email protected]>

* bug

Signed-off-by: ericharper <[email protected]>

* remove debugging log

Signed-off-by: ericharper <[email protected]>

* remove debugging log

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* bumping version to 1.0.0

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* fixed the num_samples of text classification model. (#2152)

Signed-off-by: Vahid <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fix for electronic (#2153)

* fix for electronic

Signed-off-by: ekmb <[email protected]>

* special symbols added

Signed-off-by: ekmb <[email protected]>

* restrict symbols list

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* FastSpeech 2 Test & Docs (#2143)

* Add FS2 data loading test

Signed-off-by: Jocelyn Huang <[email protected]>

* TTS docs update for FastSpeech 2

Signed-off-by: Jocelyn Huang <[email protected]>

* Style fix for FS2 dataset test

Signed-off-by: Jocelyn Huang <[email protected]>

* Fix transpose typo

Signed-off-by: Jocelyn Huang <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Minor patch for translate_ddp (#2155)

* Patch for backtranslation in lm dataset

Signed-off-by: MaximumEntropy <[email protected]>

* One more fix

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Entity linking (#2050)

* Started adding SAP dataset

Signed-off-by: Virginia Adams <[email protected]>

* Delete .lm_bert_dataset.py.swp

Signed-off-by: Virginia Adams <[email protected]>

* Added dataset and loss

Signed-off-by: Virginia Adams <[email protected]>

* Added entity linking encoder model

Signed-off-by: Virginia Adams <[email protected]>

* Can build and use index from pubmedbert model

Signed-off-by: Virginia Adams <[email protected]>

* checked boolean logic in build_index.py

Signed-off-by: Virginia Adams <[email protected]>

* End to end tested all functionality

Signed-off-by: Virginia Adams <[email protected]>

* fixed val loss none at end of validation

Signed-off-by: Virginia Adams <[email protected]>

* Started adding demo entity linking notebook

Signed-off-by: Virginia Adams <[email protected]>

* adding in notebook demo

Signed-off-by: Virginia Adams <[email protected]>

* added call to entitylinking classes in __init__.py files

Signed-off-by: Virginia Adams <[email protected]>

* Added eval code to notebook

Signed-off-by: Virginia Adams <[email protected]>

* Adding unfinished notebook

Signed-off-by: Virginia Adams <[email protected]>

* Cleaned up example dir

Signed-off-by: Virginia Adams <[email protected]>

* Fixed recap commands

Signed-off-by: Virginia Adams <[email protected]>

* added model typing and tiny data tar

Signed-off-by: Virginia Adams <[email protected]>

* Adding tiny data zip

Signed-off-by: Virginia Adams <[email protected]>

* updated tiny example config data path

Signed-off-by: Virginia Adams <[email protected]>

* Notebook demo works

Signed-off-by: Virginia Adams <[email protected]>

* Changed training epochs

Signed-off-by: Virginia Adams <[email protected]>

* Removed output from training and install cells

Signed-off-by: Virginia Adams <[email protected]>

* changed code formatting

Signed-off-by: Virginia Adams <[email protected]>

* Started doc string for new functions

Signed-off-by: Virginia Adams <[email protected]>

* Updated data_preprocessing to save to data_dir

Signed-off-by: Virginia Adams <[email protected]>

* fixed comment in notebook demo

Signed-off-by: Virginia Adams <[email protected]>

* Update data_preprocessing.py

Signed-off-by: Virginia Adams <[email protected]>

* updated nemo typing imports

Signed-off-by: Virginia Adams <[email protected]>

* about to rebase

Signed-off-by: Virginia Adams <[email protected]>

* added back umls_dataset_processing.py

Signed-off-by: Virginia Adams <[email protected]>

* Removed example data

Signed-off-by: Virginia Adams <[email protected]>

* Fixed typos in notebook demo

Signed-off-by: Virginia Adams <[email protected]>

* fixed lgtm-com issues

Signed-off-by: Virginia Adams <[email protected]>

* added copyright headers

Signed-off-by: Virginia Adams <[email protected]>

* fixed import and copyright headers

Signed-off-by: Virginia Adams <[email protected]>

* Fixed formatting

Signed-off-by: Virginia Adams <[email protected]>

* Fixed formatting changes 2

Signed-off-by: Virginia Adams <[email protected]>

* fixed test formatting

Signed-off-by: Virginia Adams <[email protected]>

* Added __init__.py for model and dataset

Signed-off-by: Virginia Adams <[email protected]>

* loading newline file returns data_dir now

Signed-off-by: Virginia Adams <[email protected]>

* Removed conf notebook and deleted comment

Signed-off-by: Virginia Adams <[email protected]>

* Added jenkins test

Signed-off-by: Virginia Adams <[email protected]>

* Updated Jenkins test

Signed-off-by: Virginia Adams <[email protected]>

* fixed file path

Signed-off-by: Virginia Adams <[email protected]>

* Changed Jenkins pipeline order

Signed-off-by: Virginia Adams <[email protected]>

* Fixed Jenkins datapath... again...

Signed-off-by: Virginia Adams <[email protected]>

* Made most review changes

Signed-off-by: Virginia Adams <[email protected]>

* fixed copyright

Signed-off-by: Virginia Adams <[email protected]>

* updated unit test to wget config

Signed-off-by: Virginia Adams <[email protected]>

* reverted test file back

Signed-off-by: Virginia Adams <[email protected]>

* Added project dir to jenkins test

Signed-off-by: Virginia Adams <[email protected]>

* defined config in unit test

Signed-off-by: Virginia Adams <[email protected]>

Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Correct branch version for v1.0.0 (#2157)

* Correct branch version

Signed-off-by: smajumdar <[email protected]>

* Correct Jenkinsfile

Signed-off-by: smajumdar <[email protected]>

* Update rst files

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* switch CI back to main

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* fixed the docs. (#2156)


Signed-off-by: Micha Livne <[email protected]>

* Make Hifigan jittable (#2159)

* FastSpeech 2 Test & Docs (#2143)

* Add FS2 data loading test

Signed-off-by: Jocelyn Huang <[email protected]>

* TTS docs update for FastSpeech 2

Signed-off-by: Jocelyn Huang <[email protected]>

* Style fix for FS2 dataset test

Signed-off-by: Jocelyn Huang <[email protected]>

* Fix transpose typo

Signed-off-by: Jocelyn Huang <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Jason <[email protected]>

* Entity linking (#2050)

* Started adding SAP dataset

Signed-off-by: Virginia Adams <[email protected]>

* Delete .lm_bert_dataset.py.swp

Signed-off-by: Virginia Adams <[email protected]>

* Added dataset and loss

Signed-off-by: Virginia Adams <[email protected]>

* Added entity linking encoder model

Signed-off-by: Virginia Adams <[email protected]>

* Can build and use index from pubmedbert model

Signed-off-by: Virginia Adams <[email protected]>

* checked boolean logic in build_index.py

Signed-off-by: Virginia Adams <[email protected]>

* End to end tested all functionality

Signed-off-by: Virginia Adams <[email protected]>

* fixed val loss none at end of validation

Signed-off-by: Virginia Adams <[email protected]>

* Started adding demo entity linking notebook

Signed-off-by: Virginia Adams <[email protected]>

* adding in notebook demo

Signed-off-by: Virginia Adams <[email protected]>

* added call to entitylinking classes in __init__.py files

Signed-off-by: Virginia Adams <[email protected]>

* Added eval code to notebook

Signed-off-by: Virginia Adams <[email protected]>

* Adding unfinished notebook

Signed-off-by: Virginia Adams <[email protected]>

* Cleaned up example dir

Signed-off-by: Virginia Adams <[email protected]>

* Fixed recap commands

Signed-off-by: Virginia Adams <[email protected]>

* added model typing and tiny data tar

Signed-off-by: Virginia Adams <[email protected]>

* Adding tiny data zip

Signed-off-by: Virginia Adams <[email protected]>

* updated tiny example config data path

Signed-off-by: Virginia Adams <[email protected]>

* Notebook demo works

Signed-off-by: Virginia Adams <[email protected]>

* Changed training epochs

Signed-off-by: Virginia Adams <[email protected]>

* Removed output from training and install cells

Signed-off-by: Virginia Adams <[email protected]>

* changed code formatting

Signed-off-by: Virginia Adams <[email protected]>

* Started doc string for new functions

Signed-off-by: Virginia Adams <[email protected]>

* Updated data_preprocessing to save to data_dir

Signed-off-by: Virginia Adams <[email protected]>

* fixed comment in notebook demo

Signed-off-by: Virginia Adams <[email protected]>

* Update data_preprocessing.py

Signed-off-by: Virginia Adams <[email protected]>

* updated nemo typing imports

Signed-off-by: Virginia Adams <[email protected]>

* about to rebase

Signed-off-by: Virginia Adams <[email protected]>

* added back umls_dataset_processing.py

Signed-off-by: Virginia Adams <[email protected]>

* Removed example data

Signed-off-by: Virginia Adams <[email protected]>

* Fixed typos in notebook demo

Signed-off-by: Virginia Adams <[email protected]>

* fixed lgtm-com issues

Signed-off-by: Virginia Adams <[email protected]>

* added copyright headers

Signed-off-by: Virginia Adams <[email protected]>

* fixed import and copyright headers

Signed-off-by: Virginia Adams <[email protected]>

* Fixed formatting

Signed-off-by: Virginia Adams <[email protected]>

* Fixed formatting changes 2

Signed-off-by: Virginia Adams <[email protected]>

* fixed test formatting

Signed-off-by: Virginia Adams <[email protected]>

* Added __init__.py for model and dataset

Signed-off-by: Virginia Adams <[email protected]>

* loading newline file returns data_dir now

Signed-off-by: Virginia Adams <[email protected]>

* Removed conf notebook and deleted comment

Signed-off-by: Virginia Adams <[email protected]>

* Added jenkins test

Signed-off-by: Virginia Adams <[email protected]>

* Updated Jenkins test

Signed-off-by: Virginia Adams <[email protected]>

* fixed file path

Signed-off-by: Virginia Adams <[email protected]>

* Changed Jenkins pipeline order

Signed-off-by: Virginia Adams <[email protected]>

* Fixed Jenkins datapath... again...

Signed-off-by: Virginia Adams <[email protected]>

* Made most review changes

Signed-off-by: Virginia Adams <[email protected]>

* fixed copyright

Signed-off-by: Virginia Adams <[email protected]>

* updated unit test to wget config

Signed-off-by: Virginia Adams <[email protected]>

* reverted test file back

Signed-off-by: Virginia Adams <[email protected]>

* Added project dir to jenkins test

Signed-off-by: Virginia Adams <[email protected]>

* defined config in unit test

Signed-off-by: Virginia Adams <[email protected]>

Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>

* switch CI back to main

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* Make Hifigan jittable

Signed-off-by: Ryan Leary <[email protected]>

* Remove vestigial debugging printout

Signed-off-by: Ryan Leary <[email protected]>

* Add export forward and fix style

Signed-off-by: Ryan Leary <[email protected]>

* Fix load_state_dict override for arbitrary layers

Signed-off-by: Ryan Leary <[email protected]>

Co-authored-by: Jocelyn <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: vadam5 <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Ryan Leary <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fix version (#2162)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Megatron nb size reduced (#2163)

* notebook size reduced

Signed-off-by: ekmb <[email protected]>

* notebook size reduced

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update spectral clustering method (#2158)

* update spectral clustering method

Signed-off-by: nithinraok <[email protected]>

* update Jenkins File

Signed-off-by: nithinraok <[email protected]>

* threshold fix by reducing window length for shorter embs

Signed-off-by: nithinraok <[email protected]>

* grammar fixes

Signed-off-by: nithinraok <[email protected]>

* CR update

Signed-off-by: nithinraok <[email protected]>

* paper reference

Signed-off-by: nithinraok <[email protected]>

* improve docstring for yaml

Signed-off-by: nithinraok <[email protected]>

* Doc fixes

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* revert (#2167)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Limit Pytorch lightning release (#2170)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* token classification models artifacts update (#2169)

* artifacts update

Signed-off-by: ekmb <[email protected]>

* artifacts update

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* fix for model restoration

Signed-off-by: ekmb <[email protected]>

* typos fix + jenkins dir update

Signed-off-by: ekmb <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* update jenkins branch

Signed-off-by: ericharper <[email protected]>

* add &&

Signed-off-by: ericharper <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins disable

Signed-off-by: ekmb <[email protected]>

* revert jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins disable

Signed-off-by: ekmb <[email protected]>

* revert jenkins

Signed-off-by: ekmb <[email protected]>

Co-authored-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix to always_save_nemo (#2174)

* Initial attempt at always_save_nemo fix

Signed-off-by: MaximumEntropy <[email protected]>

* updated path before saving in exp manager, fixed bug when handling tarfile artifacts

Signed-off-by: ericharper <[email protected]>

* Add test with always_save_nemo to exp_manager

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* update jenkins branch

Signed-off-by: ericharper <[email protected]>

* check for nemo:

Signed-off-by: ericharper <[email protected]>

* check for nemo:

Signed-off-by: ericharper <[email protected]>

* check for nemo:

Signed-off-by: ericharper <[email protected]>

Co-authored-by: ericharper <[email protected]>
Co-authored-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fix typo (#2179)

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Make itn tests optional (#2173)

* Limit Pytorch lightning release

Signed-off-by: smajumdar <[email protected]>

* Add final two checks

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* First Revision of TTS Docs and Notebooks Update for 1.0 (#2166)

* squash

Signed-off-by: Jason <[email protected]>

* notebook fixes

Signed-off-by: Jason <[email protected]>

* notebook fixes

Signed-off-by: Jason <[email protected]>

* typos

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* add more alternatives of 0 for telephone (#2171)

Signed-off-by: Yang Zhang <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Acc tn (#2180)

* make tn cardinal faster

Signed-off-by: Yang Zhang <[email protected]>

* add number far

Signed-off-by: Yang Zhang <[email protected]>

* add test

Signed-off-by: Yang Zhang <[email protected]>

* fix lgtm

Signed-off-by: Yang Zhang <[email protected]>

* fix lgtm

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* [DOCS] NLP Model parallel, NMT multi-val, CORE register artifacts (#2168)

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Change label smoothing prob to reduce chance of test failure (#2184)

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add FS2 checkpoint links to docs and inference notebook (#2181)

* Add FS2 checkpoint links to docs and inference notebook

Signed-off-by: Jocelyn Huang <[email protected]>

* Remove empty cell from TTS notebook

Signed-off-by: Jocelyn Huang <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update ptl to 1.3 on main branch (#2178)

* Update PTL

Signed-off-by: smajumdar <[email protected]>

* Begin update to Pytorch Lightning 1.3.x

Signed-off-by: smajumdar <[email protected]>

* Formatting

Signed-off-by: smajumdar <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* Formatting

Signed-off-by: smajumdar <[email protected]>

* minor fix

Signed-off-by: Jason <[email protected]>

* minor fix

Signed-off-by: Jason <[email protected]>

* get testing attribute from trainer

Signed-off-by: ericharper <[email protected]>

* update init_ddp_connection override

Signed-off-by: ericharper <[email protected]>

* update attribute

Signed-off-by: ericharper <[email protected]>

* add barrier after load checkpoint in megatron

Signed-off-by: ericharper <[email protected]>

* remove barrier

Signed-off-by: ericharper <[email protected]>

* update last naming

Signed-off-by: Jason <[email protected]>

Co-authored-by: ericharper <[email protected]>
Co-authored-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* SDE updates (#2187)

* Added updates to SDE:
- support for external vocabulary (to detect OOV words)
- support for offset field (for segmented long recordings)
- UI improvements

Signed-off-by: Vitaly Lavrukhin <[email protected]>

* Refactored diff in SDE

Signed-off-by: Vitaly Lavrukhin <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add TTS aligner and improved version of g2p for vocabs.Phonemes, small improvement in TalkNet (#2189)

* add first version of aligner

Signed-off-by: Oktai Tatanov <[email protected]>

* aligner docs, new g2p version, fix bugs in talknet

Signed-off-by: Oktai Tatanov <[email protected]>

* update docs and remove lj related code

Signed-off-by: Oktai Tatanov <[email protected]>

* fix style

Signed-off-by: Oktai Tatanov <[email protected]>

* fix import

Signed-off-by: Oktai Tatanov <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* set the default of nodessplitter to None. (#2190)

Signed-off-by: Vahid <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* NMT fixes (#2194)

* minor fixes

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* minor bugfixes

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Store mappings file in .nemo for FS2 model (#2196)

* Store mappings file in .nemo for FS2 model

Signed-off-by: Jocelyn Huang <[email protected]>

* Add error enforcing mappings file during training (FS2)

Signed-off-by: Jocelyn Huang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add support to change the SE context window of ConvASREncoder (#2193)

* Add support for changing context window on the fly

Signed-off-by: smajumdar <[email protected]>

* Add support to change the SE context window of ConvASREncoder

Signed-off-by: smajumdar <[email protected]>

* Add ability to skip config updating

Signed-off-by: smajumdar <[email protected]>

* Switch to mixin based API

Signed-off-by: smajumdar <[email protected]>

* Update docs and api for ASRModuleMixin

Signed-off-by: smajumdar <[email protected]>

* Change print to logging.info

Signed-off-by: smajumdar <[email protected]>

* Correct stride level when computing context window

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add a CI test for doing inference with an NMT model trained with Pre-LN (#2198)

* Change label smoothing prob to reduce chance of test failure

Signed-off-by: MaximumEntropy <[email protected]>

* Add Pre-LN inference test to Jenkinsfile

Signed-off-by: MaximumEntropy <[email protected]>

* Separate tests for training and NMT inference

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix ipywidgets error in asr notebook (#2199)

Added `ipywidgets` to avoid `ImportError: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html` error.

Signed-off-by: Derek Chia <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* metrics fix (#2202)

* metrics fix

Signed-off-by: ekmb <[email protected]>

* metrics reset for punct model

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* readme and minor improvements (#2203)

* readme and minor improvements

Signed-off-by: nithinraok <[email protected]>

* vad threshold update

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix text processing docs (#2195)

* fix text processing docs

Signed-off-by: Yang Zhang <[email protected]>

* fix name

Signed-off-by: Yang Zhang <[email protected]>

* add guard to pynini import

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix bug in SpecCutout (#2201)

Signed-off-by: Robert Bracco <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix bug in SpecCutout (#2201) (#2205)

Signed-off-by: Robert Bracco <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Robert Bracco <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Set seed before generating random tensors in NMT test (#2206)

* Change label smoothing prob to reduce chance of test failure

Signed-off-by: MaximumEntropy <[email protected]>

* Set seed before generating tensors

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* ASR patches for v1.0.0 (#2207)

* Multiple updates to RNNT add initialization

Signed-off-by: smajumdar <[email protected]>

* Correct name of initialization

Signed-off-by: smajumdar <[email protected]>

* Update dockerignore

Signed-off-by: smajumdar <[email protected]>

* Fix RNNT WER calculation

Signed-off-by: smajumdar <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Multilingual training for NMT (#2160)

* mnmt on fresh main

Signed-off-by: Abhinav Khattar <[email protected]>

* push for test

Signed-off-by: Abhinav Khattar <[email protected]>

* debug

Signed-off-by: Abhinav Khattar <[email protected]>

* check

Signed-off-by: Abhinav Khattar <[email protected]>

* cleanup

Signed-off-by: Abhinav Khattar <[email protected]>

* minor fix

Signed-off-by: Abhinav Khattar <[email protected]>

* more minor fixes

Signed-off-by: Abhinav Khattar <[email protected]>

* fix for test

Signed-off-by: Abhinav Khattar <[email protected]>

* fix list size error

Signed-off-by: Abhinav Khattar <[email protected]>

* multilingual in infer

Signed-off-by: Abhinav Khattar <[email protected]>

* changes

Signed-off-by: Abhinav Khattar <[email protected]>

* tar creation with multilingual

Signed-off-by: Abhinav Khattar <[email protected]>

* fix

Signed-off-by: Abhinav Khattar <[email protected]>

* changes + parallelism + bug fix

Signed-off-by: Abhinav Khattar <[email protected]>

* small fix

Signed-off-by: Abhinav Khattar <[email protected]>

* multilingual preprocessor fix

Signed-off-by: Abhinav Khattar <[email protected]>

* globally unique fragment names in tarred dataset

Signed-off-by: Abhinav Khattar <[email protected]>

* minor changes

Signed-off-by: Abhinav Khattar <[email protected]>

* rm load_from_cached_dataset

Signed-off-by: Abhinav Khattar <[email protected]>

* minor config change

Signed-off-by: Abhinav Khattar <[email protected]>

* rm unused import

Signed-off-by: Abhinav Khattar <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Remove memory leak from ASR notebook + update model notebook (#2213)

* ASR patches for v1.0.0 (#2207)

* Multiple updates to RNNT add initialization

Signed-off-by: smajumdar <[email protected]>

* Correct name of initialization

Signed-off-by: smajumdar <[email protected]>

* Update dockerignore

Signed-off-by: smajumdar <[email protected]>

* Fix RNNT WER calculation

Signed-off-by: smajumdar <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Correct model notebook to log the loss and correctly assign keys

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* replace names in vad tutorials (#2220)

Signed-off-by: fayejf <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fix the versioning name. (#2209)

* fix the versioning name.

Signed-off-by: Vahid <[email protected]>

* Made version None.

Signed-off-by: Vahid <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Enabled passing kwargs to export() (#2175)

* Enabled passing kwargs to export()

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing style; changed Classifier input_example to new extended syntax

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixed order of forward() call in export

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing style

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update g2p: ambiguous ignore, flag for skipping seq2seq (#2223)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update TTS notebook with TalkNet inference (#2133)

* Update TTS notebook with TalkNet inference.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Update TTS Notebook.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Update TTS TN Training Notebook.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Fix TN paper link.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Remove branch updating TODOs.

Signed-off-by: Stanislav Beliaev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update speaker notebooks (#2224)

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Support symlinked files (#2216)

Signed-off-by: Anas Abou Allaban <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Set strict=True everywhere by default. (#2225)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* set strict=True in nlp_model (#2227)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* set strict=False for model parallel examples

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* Make Text processing installation optional via reinstall.sh (#2226)

* Make Text processing installation optional via reinstall.sh

Signed-off-by: smajumdar <[email protected]>

* Support both success and failure states

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Transformer final norm preln (#2197)

* fix pre_ln final norm

Signed-off-by: fayejf <[email protected]>

* style fix

Signed-off-by: fayejf <[email protected]>

* bug fixed

Signed-off-by: fayejf <[email protected]>

* bugfix post_ln

Signed-off-by: fayejf <[email protected]>

* update and add pre_ln_final_norm

Signed-off-by: fayejf <[email protected]>

* style fix

Signed-off-by: fayejf <[email protected]>

* fix for unit test

Signed-off-by: fayejf <[email protected]>

* rename final_norm to final_layer_norm

Signed-off-by: fayejf <[email protected]>

* bug fix

Signed-off-by: fayejf <[email protected]>

* tiny fix

Signed-off-by: fayejf <[email protected]>

* fix and improve

Signed-off-by: fayejf <[email protected]>

* tiny fix

Signed-off-by: fayejf <[email protected]>

* Patch for NMT to allow loading old models trained with pre-LN

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update models and notebook for 1.0 (#2211)

* update models

Signed-off-by: Jason <[email protected]>

* updates

Signed-off-by: Jason <[email protected]>

* fix

Signed-off-by: Jason <[email protected]>

* add links

Signed-off-by: Jason <[email protected]>

* fixes

Signed-off-by: Jason <[email protected]>

* style

Signed-off-by: Jason <[email protected]>

* update checkpoints

Signed-off-by: Jason <[email protected]>

* rename

Signed-off-by: Jason <[email protected]>

* fixes

Signed-off-by: Jason <[email protected]>

* lgtm

Signed-off-by: Jason <[email protected]>

* fix loading waveglow

Signed-off-by: Jason <[email protected]>

* typo

Signed-off-by: Jason <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update_metrics_classification_models (#2228)

Signed-off-by: nithinraok <[email protected]>

Co-authored-by: fayejf <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Data loader for seq of label model (#2084)

* feature to seq label data loader

Signed-off-by: fayejf <[email protected]>

* style fix

Signed-off-by: fayejf <[email protected]>

* small fix

Signed-off-by: fayejf <[email protected]>

* update tl to be length of seq label

Signed-off-by: fayejf <[email protected]>

* style fix

Signed-off-by: fayejf <[email protected]>

* tiny bug fix

Signed-off-by: fayejf <[email protected]>

* small updates

Signed-off-by: fayejf <[email protected]>

* updates for review feedback

Signed-off-by: fayejf <[email protected]>

* style fix

Signed-off-by: fayejf <[email protected]>

* explain seq_label

Signed-off-by: fayejf <[email protected]>

* fix lgtm

Signed-off-by: fayejf <[email protected]>

* small updates

Signed-off-by: fayejf <[email protected]>

* improve as discussed

Signed-off-by: fayejf <[email protected]>

* add docstring

Signed-off-by: fayejf <[email protected]>

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fix comments (#2236)

* fix comments

Signed-off-by: Yang Zhang <[email protected]>

* fix typo

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* add paper ref to sgdqa model doc (#2233)

* add paper ref to sgdqa model doc

Signed-off-by: Yang Zhang <[email protected]>

* fix comments

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Move ConcatDataset to common (#2237)

* move concatdataset to common

Signed-off-by: Abhinav Khattar <[email protected]>

* var name change

Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* audio based normalization (#2231)

* squash norm_audio

Signed-off-by: ekmb <[email protected]>

* add missing files

Signed-off-by: ekmb <[email protected]>

* style

Signed-off-by: ekmb <[email protected]>

* unit tests added, docstrings fixed

Signed-off-by: ekmb <[email protected]>

* fix lgtm errors

Signed-off-by: ekmb <[email protected]>

* debug jenkins

Signed-off-by: ekmb <[email protected]>

* debug jenkins

Signed-off-by: ekmb <[email protected]>

* signature update

Signed-off-by: ekmb <[email protected]>

* set deterministic default

Signed-off-by: ekmb <[email protected]>

* add more test cases

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* bug fix config (#2232)

Signed-off-by: fayejf <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Alias Swish to SiLU (#2239)

* Alias Swish to SiLU and move activations to inplace execution if possible

Signed-off-by: smajumdar <[email protected]>

* Remove unused import

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update README.rst
Signed-off-by: Micha Livne <[email protected]>

* Offline asr notebook bug fix (#2242)

* fix

Signed-off-by: fayejf <[email protected]>

* install

Signed-off-by: fayejf <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix docstring (#2244)

* fix comments

Signed-off-by: Yang Zhang <[email protected]>

* fix doc string

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update "last" Checkpoint (#2241)

* fix

Signed-off-by: Jason <[email protected]>

* change

Signed-off-by: Jason <[email protected]>

* fix

Signed-off-by: Jason <[email protected]>

Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add pretrained model stt_es_citrinet_512 (#2247)

Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* [BUGFIX] Only process tarfile artifacts when model was restored from tarfile (#2250)

* process tarfile artifacts only if model is being restored

Signed-off-by: ericharper <[email protected]>

* process tarfile artifacts only if model was restored from a tarfile

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Log average metrics for Multi-validation in NMT (#2251)

* add avg metrics NMT

Signed-off-by: Abhinav Khattar <[email protected]>

* name change

Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update Primer notebook (#2258)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fixed Bug 3310780 and 3310799 (#2264)

Signed-off-by: Virginia Adams <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Support multiple models being instantiated in same execution scope (#2245)

* Support multiple models being instantiated in same execution scope

Signed-off-by: smajumdar <[email protected]>

* Fix tests

Signed-off-by: smajumdar <[email protected]>

* Add locks to methods in appstate

Signed-off-by: smajumdar <[email protected]>

* Perform locks only on write operations

Signed-off-by: smajumdar <[email protected]>

* Correct deadlock issue

Signed-off-by: smajumdar <[email protected]>

* Add more tests

Signed-off-by: smajumdar <[email protected]>

* Add test for multi save and remove patch to change save type

Signed-off-by: smajumdar <[email protected]>

* Update app state to preserve gidx of previous token

Signed-off-by: smajumdar <[email protected]>

* Correct restoration logic for tarfiles

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* ASR Refactoring (#2240)

* Refactor out the preprocessing from ASR into common

Signed-off-by: smajumdar <[email protected]>

* Correct nltk issue with vocabs.py for clusters

Signed-off-by: smajumdar <[email protected]>

* Add typing information to SpecAugment and SpecCutout

Signed-off-by: smajumdar <[email protected]>

* Reorganize parts directory

Signed-off-by: smajumdar <[email protected]>

* Refactor parts submodules, add __init__ to few important parts

Signed-off-by: smajumdar <[email protected]>

* Update docs for new path to parts

Signed-off-by: smajumdar <[email protected]>

* Cherry pick PR https://github.com/NVIDIA/NeMo/pull/2219

Signed-off-by: smajumdar <[email protected]>

* Add header for preprocessing commons

Signed-off-by: smajumdar <[email protected]>

* Fix style of tests

Signed-off-by: smajumdar <[email protected]>

* Add forced update of configs for train-val-test ds to new labels tests

Signed-off-by: smajumdar <[email protected]>

* Update path to FilterbankFeatures for TTS

Signed-off-by: smajumdar <[email protected]>

* Add an alias file for backward compatibility

Signed-off-by: smajumdar <[email protected]>

* Add an alias file for backward compatibility

Signed-off-by: smajumdar <[email protected]>

* Update training scripts of ASR to support finetuning

Signed-off-by: smajumdar <[email protected]>

* Update Finetuning step to be ModelPT level

Signed-off-by: smajumdar <[email protected]>

* Update docs for finetuning for ASR

Signed-off-by: smajumdar <[email protected]>

* Fix style

Signed-off-by: smajumdar <[email protected]>

* Update docs and scripts with fine-tuning info

Signed-off-by: smajumdar <[email protected]>

* Update docs and scripts with fine-tuning info

Signed-off-by: smajumdar <[email protected]>

* Fix style

Signed-off-by: smajumdar <[email protected]>

* Update scripts

Signed-off-by: smajumdar <[email protected]>

* Add comment for weight initialization

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* TTS Doc Fix and Remove TTS Test (#2272)

* bug fix and remove test

Signed-off-by: Jason <[email protected]>

* syntax

Signed-off-by: Jason <[email protected]>

* syntax

Signed-off-by: Jason <[email protected]>

* syntax

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Talknet training Fix (#2273)

* TalkNet Training notebook fix.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Remove debug stuff.

Signed-off-by: Stanislav Beliaev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update (#2274)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add links (#2275)

* update

Signed-off-by: Jason <[email protected]>

* link

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Delete 3_TTS_TalkNet_Training.ipynb (#2276)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* tune down logging (#2277)

* tune down logging

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* debug message instead of removing it completely

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* minor bugfix

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* remove confusing message

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* Restore TalkNet training notebook (#2281)

* Restore TalkNet training notebook.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Remove torchaudio dep.

Signed-off-by: Stanislav Beliaev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix ExpManager Issues and FastPitch (#2283)

* backport exp_manager fixes to v1

Signed-off-by: Jason <[email protected]>

* fix fastpitch

Signed-off-by: Jason <[email protected]>

* fix tests

Signed-off-by: Jason <[email protected]>

* update prefix

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Organize asr config folders (#2284)


Signed-off-by: Micha Livne <[email protected]>

* Fix and enable DALI tests (#2077)

* Fix and enable DALI tests

Signed-off-by: Joaquin Anton <[email protected]>

* remove unused import

Signed-off-by: Joaquin Anton <[email protected]>

* Move DALI tests to a separate Jenkins stage

Signed-off-by: Joaquin Anton <[email protected]>

* Remove DALI tests from the main jenkins ASR stage

Signed-off-by: Joaquin Anton <[email protected]>

* Comment out MFCC test

Signed-off-by: Joaquin Anton <[email protected]>

* Working version

Signed-off-by: Joaquin Anton <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Added unit test for hifigan export, fixed hifigan export (#2279)

* Added unit test for hifigan export, Removed runtime test from waveglow test (now in export)

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixed style

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixed style

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update conformer recipes (#2265)

* updated readme asr.

Signed-off-by: Vahid <[email protected]>

* added models.

Signed-off-by: Vahid <[email protected]>

* fixed docs.

Signed-off-by: Vahid <[email protected]>

* fixed docs.

Signed-off-by: Vahid <[email protected]>

* fixed docs.

Signed-off-by: Vahid <[email protected]>

* fixed docs.

Signed-off-by: Vahid <[email protected]>

* fixed the docs.

Signed-off-by: Vahid <[email protected]>

* fixed the docs.

Signed-off-by: Vahid <[email protected]>

* fixed the docs.

Signed-off-by: Vahid <[email protected]>

* disabled test.

Signed-off-by: Vahid <[email protected]>

* Updated the config files.

Signed-off-by: Vahid <[email protected]>

* Updated the config files.

Signed-off-by: Vahid <[email protected]>

* Updated the config files.

Signed-off-by: Vahid <[email protected]>

* dropped the wers.

Signed-off-by: Vahid <[email protected]>

* dropped the wers.

Signed-off-by: Vahid <[email protected]>

* dropped new models and reverted to old versions.

Signed-off-by: Vahid <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Adding neural rescorer and its documentations (#2287)

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* added more docs, figures, and output file.

Signed-off-by: Vahid <[email protected]>

* added more docs, figures, and output file.

Signed-off-by: Vahid <[email protected]>

* added more docs, figures, and output file.

Signed-off-by: Vahid <[email protected]>

* added more docs, figures, and output file.

Signed-off-by: Vahid <[email protected]>

* added more docs, figures, and output file.

Signed-off-by: Vahid <[email protected]>

* fixed style

Signed-off-by: Vahid <[email protected]>

* add a note to asr notebook.

Signed-off-by: Vahid <[email protected]>

* add a note to asr notebook.

Signed-off-by: Vahid <[email protected]>

* add a note to asr notebook.

Signed-off-by: Vahid <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Adjust warning messages

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* Revert "Adjust warning messages"

This reverts commit df046ec55754d0136a2a28451435068f32409f30.

Signed-off-by: Micha Livne <[email protected]>

* Adjust warning messages (#2294)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Adding new Models releases on NGC. (#2295)

* added new models.

Signed-off-by: Vahid <[email protected]>

* added tests for asr lm.

Signed-off-by: Vahid <[email protected]>

* added tests for asr lm.

Signed-off-by: Vahid <[email protected]>

* dropped the test.

Signed-off-by: Vahid <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update quantization (#2298)

Signed-off-by: slyned <[email protected]>

Co-authored-by: slyned <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* ASR improvements (#2293)

* Update numba messages and citrinet configs

Signed-off-by: smajumdar <[email protected]>

* Remove support for weight init scale and hidden hidden bias scale for layer normalized lstm

Signed-off-by: smajumdar <[email protected]>

* Add support for multiple filetypes in tarred datasets, correct rnn LN-lstm inputs, fix OmegaConf compat issue

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Time quarter to (#2292)

* fix comments

Signed-off-by: Yang Zhang <[email protected]>

* fix doc string

Signed-off-by: Yang Zhang <[email protected]>

* adding quarter to to time class

Signed-off-by: Yang Zhang <[email protected]>

* fix typo

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fixed paths. (#2301)

Signed-off-by: Vahid <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Added onnxruntime check of exported ONNX, bumped up default ONNX opset (#2278)

* Added onnxruntime check of exported ONNX, bumped up default ONNX opset

Signed-off-by: Boris Fomitchev <[email protected]>

* Made TS export to accept ONNX-style input example, removed unused param to export

Signed-off-by: Boris Fomitchev <[email protected]>

* check_trace default made False

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixed for updated export signature

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update readmes

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* update readme

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* update readme

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* fix docs table

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* Add support for Numba CUDA optimized SpecAugment (#2269)

* Initial implementation

Signed-off-by: smajumdar <[email protected]>

* Initial implementation

Signed-off-by: smajumdar <[email protected]>

* Finish initial implementation of numba spec augment

Signed-off-by: smajumdar <[email protected]>

* Correct mask propagation

Signed-off-by: smajumdar <[email protected]>

* Parallelize kernel over batch instead of over masks

Signed-off-by: smajumdar <[email protected]>

* Finish tests and update to signature of spectrogramaugmentation calls

Signed-off-by: smajumdar <[email protected]>

* Finish tests and update to signature of spectrogramaugmentation calls

Signed-off-by: smajumdar <[email protected]>

* Add header

Signed-off-by: smajumdar <[email protected]>

* Fix style

Signed-off-by: smajumdar <[email protected]>

* Add heuristics

Signed-off-by: smajumdar <[email protected]>

* Correct inclusive range of padding

Signed-off-by: smajumdar <[email protected]>

* Correct typing for spec aug numba

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Added JSON manifest's support to transcribe_speech.py (#2304)

* Added JSON manifest's support to transcribe_speech.py

Signed-off-by: Vitaly Lavrukhin <[email protected]>

* Dropped unused import

Signed-off-by: Vitaly Lavrukhin <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* get embedding for a single file (#2310)

* get embedding for a single file

Signed-off-by: nithinraok <[email protected]>

* fixes

Signed-off-by: nithinraok <[email protected]>

* sr update

Signed-off-by: nithinraok <[email protected]>

* regain train mode

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update FastPitch (#2249)

* wip

Signed-off-by: Jason <[email protected]>

* c1

Signed-off-by: Jason <[email protected]>

* bug fixes

Signed-off-by: Jason <[email protected]>

* bug fixes

Signed-off-by: Jason <[email protected]>

* v2

Signed-off-by: Jason <[email protected]>

* changes

Signed-off-by: Jason <[email protected]>

* add types, old model working

Signed-off-by: Jason <[email protected]>

* pitch

Signed-off-by: Jason <[email protected]>

* update

Signed-off-by: Jason <[email protected]>

* update

Signed-off-by: Jason <[email protected]>

* let it work

Signed-off-by: Jason <[email protected]>

* fixes

Signed-off-by: Jason <[email protected]>

* add oktai comments

Signed-off-by: Jason <[email protected]>

* debug

Signed-off-by: Jason <[email protected]>

* scale

Signed-off-by: Jason <[email protected]>

* wip

Signed-off-by: Jason <[email protected]>

* fix test for v1

Signed-off-by: Jason <[email protected]>

* merge train and val

Signed-off-by: Jason <[email protected]>

* back to par bin att, add correct encoder settings

Signed-off-by: Jason <[email protected]>

* try

Signed-off-by: Jason <[email protected]>

* undo

Signed-off-by: Jason <[email protected]>

* lgtm:

Signed-off-by: Jason <[email protected]>

* style

Signed-off-by: Jason <[email protected]>

* default to ljs

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* patch quantization (#2314)

* update quantization

Signed-off-by: slyned <[email protected]>

* update quant infer trt

Signed-off-by: slyned <[email protected]>

* fix style

Signed-off-by: slyned <[email protected]>

Co-authored-by: slyned <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Pin OmegaConf version for 1.0.0 (#2316)

* Update OmegaConf compatibility

Signed-off-by: smajumdar <[email protected]>

* Correct OmegaConf.pretty()

Signed-off-by: smajumdar <[email protected]>

* Upper bound omegaconf

Signed-off-by: smajumdar <[email protected]>

* Revert "Correct OmegaConf.pretty()"

This reverts commit 6ebae2ef

Signed-off-by: smajumdar <[email protected]>

* Revert "Update OmegaConf compatibility"

This reverts commit 83b2cf35a07a742552082e80e6ca34c9b8203cbc.

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* [BUGFIX] OmegaConf forward compatibility (#2319)

* Update OmegaConf compatibility

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: ericharper <[email protected]>

* Correct OmegaConf.pretty()

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: ericharper <[email protected]>

* upper bound omegaconf

Signed-off-by: ericharper <[email protected]>

* add if,else back

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* bumping version to 1.0.1

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* fix_cluster_small_sample (#2303)

* fix_cluster_small_sample

Signed-off-by: nithinraok <[email protected]>

* for smaller samples

Signed-off-by: nithinraok <[email protected]>

* remove type

Signed-off-by: nithinraok <[email protected]>

* similarity matrix

Signed-off-by: nithinraok <[email protected]>

* est num of speakers add

Signed-off-by: nithinraok <[email protected]>

* comment update

Signed-off-by: nithinraok <[email protected]>

* style fix

Signed-off-by: nithinraok <[email protected]>

* MIN_SAMPLES passed through func arg

Signed-off-by: nithinraok <[email protected]>

* doc string update

Signed-off-by: nithinraok <[email protected]>

* spell mistake

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fastpitch export (#2300)

* wip

Signed-off-by: Jason <[email protected]>

* c1

Signed-off-by: Jason <[email protected]>

* bug fixes

Signed-off-by: Jason <[email protected]>

* bug fixes

Signed-off-by: Jason <[email protected]>

* v2

Signed-off-by: Jason <[email protected]>

* changes

Signed-off-by: Jason <[email protected]>

* add types, old model working

Signed-off-by: Jason <[email protected]>

* pitch

Signed-off-by: Jason <[email protected]>

* update

Signed-off-by: Jason <[email protected]>

* update

Signed-off-by: Jason <[email protected]>

* let it work

Signed-off-by: Jason <[email protected]>

* fixes

Signed-off-by: Jason <[email protected]>

* add oktai comments

Signed-off-by: Jason <[email protected]>

* debug

Signed-off-by: Jason <[email protected]>

* scale

Signed-off-by: Jason <[email protected]>

* wip

Signed-off-by: Jason <[email protected]>

* fix test for v1

Signed-off-by: Jason <[email protected]>
…
titu1994 added a commit to titu1994/NeMo that referenced this pull request Jul 20, 2021
* Itn add classes (#2141)

* move do_training flag to config

Signed-off-by: Yang Zhang <[email protected]>

* added telephone to itn

Signed-off-by: Yang Zhang <[email protected]>

* add telephone and email to itn

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* ASR + NLP Doc Fixes (#2136)

* Preserve the tokenizer config for ASR

Signed-off-by: smajumdar <[email protected]>

* Correct nlp docs

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Removing graphsurgeon optional dependency, improving import error rep… (#2144)

* Removing graphsurgeon optional dependency, improving import error reporting

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing scope error

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix FilterbankFeatures eval nondeterminism. (#2146)

Signed-off-by: PiotrDabkowski <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fix the docs. (#2148)


Signed-off-by: Micha Livne <[email protected]>

* Text processing refactor (#2149)

* removed graphutils, suppletive, data_loader_utils from itn to be reused from tn

Signed-off-by: Yang Zhang <[email protected]>

* inheriting itn from tn, thus removing redundancy

Signed-off-by: Yang Zhang <[email protected]>

* cleaned whitelist

Signed-off-by: Yang Zhang <[email protected]>

* lgtm fix

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update how artifacts work (#2138)

* Update how artifacts work

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* fixing some tests

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* fix more tests

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* add __init__ to tests to make them discoverable

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* empty src support

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* updates plus unittest

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* add copyright check

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* copyright header

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* fix style

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* handle hashed megatron checkpoint version in nlp restore_from

Signed-off-by: ericharper <[email protected]>

* add _MODEL_RESTORE_PATH to AppState

Signed-off-by: ericharper <[email protected]>

* get rid of global folder caching

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* double register - warning instead of exception

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* Add asr spe tests

Signed-off-by: smajumdar <[email protected]>

* Pop out asr wpe pre-registered value

Signed-off-by: smajumdar <[email protected]>

* Correct ASR tests and paths

Signed-off-by: smajumdar <[email protected]>

* Correct tokenizer saving

Signed-off-by: smajumdar <[email protected]>

* Correct ASR tests

Signed-off-by: smajumdar <[email protected]>

* Correct ASR bpe mixin

Signed-off-by: smajumdar <[email protected]>

* Patch up backward compatibility

Signed-off-by: smajumdar <[email protected]>

* update register_bert_model

Signed-off-by: ericharper <[email protected]>

* update all get_lm_model calls

Signed-off-by: ericharper <[email protected]>

* return None if src not found

Signed-off-by: ericharper <[email protected]>

* handle case with no tokenizer

Signed-off-by: ericharper <[email protected]>

* do not add another hash if using tarfile_artifacts

Signed-off-by: ericharper <[email protected]>

* add return_none flag, update doc string

Signed-off-by: ericharper <[email protected]>

* update default behavior of register_artifact for NLPModel

Signed-off-by: ericharper <[email protected]>

* change kwarg name to verify_src_exists

Signed-off-by: ericharper <[email protected]>

* use cfg instead of _cfg

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* some cleanups

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Co-authored-by: ericharper <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Language model refactoring (#2120)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* bucketing tarred dataset for lm training

Signed-off-by: AlexGrinch <[email protected]>

* updated global rank

Signed-off-by: AlexGrinch <[email protected]>

* perplexity update

Signed-off-by: AlexGrinch <[email protected]>

* refactor lm to be compatible with latest nmt

Signed-off-by: AlexGrinch <[email protected]>

* perplexity change

Signed-off-by: AlexGrinch <[email protected]>

* removed obsolete config

Signed-off-by: AlexGrinch <[email protected]>

* added sequence perplexity

Signed-off-by: AlexGrinch <[email protected]>

* added non-smoothed CE loss for validation

Signed-off-by: AlexGrinch <[email protected]>

* unified sentence dataset, torchmetrics for sequence perplexity

Signed-off-by: AlexGrinch <[email protected]>

* translate_ddp refactor

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* [NMT] Multi-validation Patch (#2150)

* rename dl index 0 loss and sacrebleu for backwards compatibility

Signed-off-by: ericharper <[email protected]>

* eval -> val/tst

Signed-off-by: ericharper <[email protected]>

* instantiate torchmetrics after instantiating dataloaders

Signed-off-by: ericharper <[email protected]>

* bug

Signed-off-by: ericharper <[email protected]>

* remove debugging log

Signed-off-by: ericharper <[email protected]>

* remove debugging log

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* bumping version to 1.0.0

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* fixed the num_samples of text classification model. (#2152)

Signed-off-by: Vahid <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fix for electronic (#2153)

* fix for electronic

Signed-off-by: ekmb <[email protected]>

* special symbols added

Signed-off-by: ekmb <[email protected]>

* restrict symbols list

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* FastSpeech 2 Test & Docs (#2143)

* Add FS2 data loading test

Signed-off-by: Jocelyn Huang <[email protected]>

* TTS docs update for FastSpeech 2

Signed-off-by: Jocelyn Huang <[email protected]>

* Style fix for FS2 dataset test

Signed-off-by: Jocelyn Huang <[email protected]>

* Fix transpose typo

Signed-off-by: Jocelyn Huang <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Minor patch for translate_ddp (#2155)

* Patch for backtranslation in lm dataset

Signed-off-by: MaximumEntropy <[email protected]>

* One more fix

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Entity linking (#2050)

* Started adding SAP dataset

Signed-off-by: Virginia Adams <[email protected]>

* Delete .lm_bert_dataset.py.swp

Signed-off-by: Virginia Adams <[email protected]>

* Added dataset and loss

Signed-off-by: Virginia Adams <[email protected]>

* Added entity linking encoder model

Signed-off-by: Virginia Adams <[email protected]>

* Can build and use index from pubmedbert model

Signed-off-by: Virginia Adams <[email protected]>

* checked boolean logic in build_index.py

Signed-off-by: Virginia Adams <[email protected]>

* End to end tested all functionality

Signed-off-by: Virginia Adams <[email protected]>

* fixed val loss none at end of validation

Signed-off-by: Virginia Adams <[email protected]>

* Started adding demo entity linking notebook

Signed-off-by: Virginia Adams <[email protected]>

* adding in notebook demo

Signed-off-by: Virginia Adams <[email protected]>

* added call to entitylinking classes in __init__.py files

Signed-off-by: Virginia Adams <[email protected]>

* Added eval code to notebook

Signed-off-by: Virginia Adams <[email protected]>

* Adding unfinished notebook

Signed-off-by: Virginia Adams <[email protected]>

* Cleaned up example dir

Signed-off-by: Virginia Adams <[email protected]>

* Fixed recap commands

Signed-off-by: Virginia Adams <[email protected]>

* added model typing and tiny data tar

Signed-off-by: Virginia Adams <[email protected]>

* Adding tiny data zip

Signed-off-by: Virginia Adams <[email protected]>

* updated tiny example config data path

Signed-off-by: Virginia Adams <[email protected]>

* Notebook demo works

Signed-off-by: Virginia Adams <[email protected]>

* Changed training epochs

Signed-off-by: Virginia Adams <[email protected]>

* Removed output from training and install cells

Signed-off-by: Virginia Adams <[email protected]>

* changed code formatting

Signed-off-by: Virginia Adams <[email protected]>

* Started doc string for new functions

Signed-off-by: Virginia Adams <[email protected]>

* Updated data_preprocessing to save to data_dir

Signed-off-by: Virginia Adams <[email protected]>

* fixed comment in notebook demo

Signed-off-by: Virginia Adams <[email protected]>

* Update data_preprocessing.py

Signed-off-by: Virginia Adams <[email protected]>

* updated nemo typing imports

Signed-off-by: Virginia Adams <[email protected]>

* about to rebase

Signed-off-by: Virginia Adams <[email protected]>

* added back umls_dataset_processing.py

Signed-off-by: Virginia Adams <[email protected]>

* Removed example data

Signed-off-by: Virginia Adams <[email protected]>

* Fixed typos in notebook demo

Signed-off-by: Virginia Adams <[email protected]>

* fixed lgtm-com issues

Signed-off-by: Virginia Adams <[email protected]>

* added copyright headers

Signed-off-by: Virginia Adams <[email protected]>

* fixed import and copyright headers

Signed-off-by: Virginia Adams <[email protected]>

* Fixed formatting

Signed-off-by: Virginia Adams <[email protected]>

* Fixed formatting changes 2

Signed-off-by: Virginia Adams <[email protected]>

* fixed test formatting

Signed-off-by: Virginia Adams <[email protected]>

* Added __init__.py for model and dataset

Signed-off-by: Virginia Adams <[email protected]>

* loading newline file returns data_dir now

Signed-off-by: Virginia Adams <[email protected]>

* Removed conf notebook and deleted comment

Signed-off-by: Virginia Adams <[email protected]>

* Added jenkins test

Signed-off-by: Virginia Adams <[email protected]>

* Updated Jenkins test

Signed-off-by: Virginia Adams <[email protected]>

* fixed file path

Signed-off-by: Virginia Adams <[email protected]>

* Changed Jenkins pipeline order

Signed-off-by: Virginia Adams <[email protected]>

* Fixed Jenkins datapath... again...

Signed-off-by: Virginia Adams <[email protected]>

* Made most review changes

Signed-off-by: Virginia Adams <[email protected]>

* fixed copyright

Signed-off-by: Virginia Adams <[email protected]>

* updated unit test to wget config

Signed-off-by: Virginia Adams <[email protected]>

* reverted test file back

Signed-off-by: Virginia Adams <[email protected]>

* Added project dir to jenkins test

Signed-off-by: Virginia Adams <[email protected]>

* defined config in unit test

Signed-off-by: Virginia Adams <[email protected]>

Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Correct branch version for v1.0.0 (#2157)

* Correct branch version

Signed-off-by: smajumdar <[email protected]>

* Correct Jenkinsfile

Signed-off-by: smajumdar <[email protected]>

* Update rst files

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* switch CI back to main

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* fixed the docs. (#2156)


Signed-off-by: Micha Livne <[email protected]>

* Make Hifigan jittable (#2159)

* FastSpeech 2 Test & Docs (#2143)

* Add FS2 data loading test

Signed-off-by: Jocelyn Huang <[email protected]>

* TTS docs update for FastSpeech 2

Signed-off-by: Jocelyn Huang <[email protected]>

* Style fix for FS2 dataset test

Signed-off-by: Jocelyn Huang <[email protected]>

* Fix transpose typo

Signed-off-by: Jocelyn Huang <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Jason <[email protected]>

* switch CI back to main

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* Make Hifigan jittable

Signed-off-by: Ryan Leary <[email protected]>

* Remove vestigial debugging printout

Signed-off-by: Ryan Leary <[email protected]>

* Add export forward and fix style

Signed-off-by: Ryan Leary <[email protected]>

* Fix load_state_dict override for arbitrary layers

Signed-off-by: Ryan Leary <[email protected]>

Co-authored-by: Jocelyn <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: vadam5 <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Ryan Leary <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fix version (#2162)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Megatron nb size reduced (#2163)

* notebook size reduced

Signed-off-by: ekmb <[email protected]>

* notebook size reduced

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update spectral clustering method (#2158)

* update spectral clustering method

Signed-off-by: nithinraok <[email protected]>

* update Jenkins File

Signed-off-by: nithinraok <[email protected]>

* threshold fix by reducing window length for shorter embs

Signed-off-by: nithinraok <[email protected]>

* grammar fixes

Signed-off-by: nithinraok <[email protected]>

* CR update

Signed-off-by: nithinraok <[email protected]>

* paper reference

Signed-off-by: nithinraok <[email protected]>

* improve docstring for yaml

Signed-off-by: nithinraok <[email protected]>

* Doc fixes

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* revert (#2167)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Limit Pytorch lightning release (#2170)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* token classification models artifacts update (#2169)

* artifacts update

Signed-off-by: ekmb <[email protected]>

* artifacts update

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* fix for model restoration

Signed-off-by: ekmb <[email protected]>

* typos fix + jenkins dir update

Signed-off-by: ekmb <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* update jenkins branch

Signed-off-by: ericharper <[email protected]>

* add &&

Signed-off-by: ericharper <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins disable

Signed-off-by: ekmb <[email protected]>

* revert jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins disable

Signed-off-by: ekmb <[email protected]>

* revert jenkins

Signed-off-by: ekmb <[email protected]>

Co-authored-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix to always_save_nemo (#2174)

* Initial attempt at always_save_nemo fix

Signed-off-by: MaximumEntropy <[email protected]>

* updated path before saving in exp manager, fixed bug when handling tarfile artifacts

Signed-off-by: ericharper <[email protected]>

* Add test with always_save_nemo to exp_manager

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* update jenkins branch

Signed-off-by: ericharper <[email protected]>

* check for nemo:

Signed-off-by: ericharper <[email protected]>

* check for nemo:

Signed-off-by: ericharper <[email protected]>

* check for nemo:

Signed-off-by: ericharper <[email protected]>

Co-authored-by: ericharper <[email protected]>
Co-authored-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fix typo (#2179)

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Make itn tests optional (#2173)

* Limit Pytorch lightning release

Signed-off-by: smajumdar <[email protected]>

* Add final two checks

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* First Revision of TTS Docs and Notebooks Update for 1.0 (#2166)

* squash

Signed-off-by: Jason <[email protected]>

* notebook fixes

Signed-off-by: Jason <[email protected]>

* notebook fixes

Signed-off-by: Jason <[email protected]>

* typos

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* add more alternatives of 0 for telephone (#2171)

Signed-off-by: Yang Zhang <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Acc tn (#2180)

* make tn cardinal faster

Signed-off-by: Yang Zhang <[email protected]>

* add number far

Signed-off-by: Yang Zhang <[email protected]>

* add test

Signed-off-by: Yang Zhang <[email protected]>

* fix lgtm

Signed-off-by: Yang Zhang <[email protected]>

* fix lgtm

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* [DOCS] NLP Model parallel, NMT multi-val, CORE register artifacts (#2168)

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Change label smoothing prob to reduce chance of test failure (#2184)

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add FS2 checkpoint links to docs and inference notebook (#2181)

* Add FS2 checkpoint links to docs and inference notebook

Signed-off-by: Jocelyn Huang <[email protected]>

* Remove empty cell from TTS notebook

Signed-off-by: Jocelyn Huang <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update ptl to 1.3 on main branch (#2178)

* Update PTL

Signed-off-by: smajumdar <[email protected]>

* Begin update to Pytorch Lightning 1.3.x

Signed-off-by: smajumdar <[email protected]>

* Formatting

Signed-off-by: smajumdar <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* Formatting

Signed-off-by: smajumdar <[email protected]>

* minor fix

Signed-off-by: Jason <[email protected]>

* minor fix

Signed-off-by: Jason <[email protected]>

* get testing attribute from trainer

Signed-off-by: ericharper <[email protected]>

* update init_ddp_connection override

Signed-off-by: ericharper <[email protected]>

* update attribute

Signed-off-by: ericharper <[email protected]>

* add barrier after load checkpoint in megatron

Signed-off-by: ericharper <[email protected]>

* remove barrier

Signed-off-by: ericharper <[email protected]>

* update last naming

Signed-off-by: Jason <[email protected]>

Co-authored-by: ericharper <[email protected]>
Co-authored-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* SDE updates (#2187)

* Added updates to SDE:
- support for external vocabulary (to detect OOV words)
- support for offset field (for segmented long recordings)
- UI improvements

Signed-off-by: Vitaly Lavrukhin <[email protected]>

* Refactored diff in SDE

Signed-off-by: Vitaly Lavrukhin <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add TTS aligner and improved version of g2p for vocabs.Phonemes, small improvement in TalkNet (#2189)

* add first version of aligner

Signed-off-by: Oktai Tatanov <[email protected]>

* aligner docs, new g2p version, fix bugs in talknet

Signed-off-by: Oktai Tatanov <[email protected]>

* update docs and remove lj related code

Signed-off-by: Oktai Tatanov <[email protected]>

* fix style

Signed-off-by: Oktai Tatanov <[email protected]>

* fix import

Signed-off-by: Oktai Tatanov <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* set the default of nodessplitter to None. (#2190)

Signed-off-by: Vahid <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* NMT fixes (#2194)

* minor fixes

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* minor bugfixes

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Store mappings file in .nemo for FS2 model (#2196)

* Store mappings file in .nemo for FS2 model

Signed-off-by: Jocelyn Huang <[email protected]>

* Add error enforcing mappings file during training (FS2)

Signed-off-by: Jocelyn Huang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add support to change the SE context window of ConvASREncoder (#2193)

* Add support for changing context window on the fly

Signed-off-by: smajumdar <[email protected]>

* Add support to change the SE context window of ConvASREncoder

Signed-off-by: smajumdar <[email protected]>

* Add ability to skip config updating

Signed-off-by: smajumdar <[email protected]>

* Switch to mixin based API

Signed-off-by: smajumdar <[email protected]>

* Update docs and api for ASRModuleMixin

Signed-off-by: smajumdar <[email protected]>

* Change print to logging.info

Signed-off-by: smajumdar <[email protected]>

* Correct stride level when computing context window

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add a CI test for doing inference with an NMT model trained with Pre-LN (#2198)

* Change label smoothing prob to reduce chance of test failure

Signed-off-by: MaximumEntropy <[email protected]>

* Add Pre-LN inference test to Jenkinsfile

Signed-off-by: MaximumEntropy <[email protected]>

* Separate tests for training and NMT inference

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix ipywidgets error in asr notebook (#2199)

Added `ipywidgets` to avoid `ImportError: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html` error.

Signed-off-by: Derek Chia <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* metrics fix (#2202)

* metrics fix

Signed-off-by: ekmb <[email protected]>

* metrics reset for punct model

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* readme and minor improvements (#2203)

* readme and minor improvements

Signed-off-by: nithinraok <[email protected]>

* vad threshold update

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix text processing docs (#2195)

* fix text processing docs

Signed-off-by: Yang Zhang <[email protected]>

* fix name

Signed-off-by: Yang Zhang <[email protected]>

* add guard to pynini import

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix bug in SpecCutout (#2201)

Signed-off-by: Robert Bracco <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix bug in SpecCutout (#2201) (#2205)

Signed-off-by: Robert Bracco <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Robert Bracco <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Set seed before generating random tensors in NMT test (#2206)

* Change label smoothing prob to reduce chance of test failure

Signed-off-by: MaximumEntropy <[email protected]>

* Set seed before generating tensors

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* ASR patches for v1.0.0 (#2207)

* Multiple updates to RNNT add initialization

Signed-off-by: smajumdar <[email protected]>

* Correct name of initialization

Signed-off-by: smajumdar <[email protected]>

* Update dockerignore

Signed-off-by: smajumdar <[email protected]>

* Fix RNNT WER calculation

Signed-off-by: smajumdar <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Multilingual training for NMT (#2160)

* mnmt on fresh main

Signed-off-by: Abhinav Khattar <[email protected]>

* push for test

Signed-off-by: Abhinav Khattar <[email protected]>

* debug

Signed-off-by: Abhinav Khattar <[email protected]>

* check

Signed-off-by: Abhinav Khattar <[email protected]>

* cleanup

Signed-off-by: Abhinav Khattar <[email protected]>

* minor fix

Signed-off-by: Abhinav Khattar <[email protected]>

* more minor fixes

Signed-off-by: Abhinav Khattar <[email protected]>

* fix for test

Signed-off-by: Abhinav Khattar <[email protected]>

* fix list size error

Signed-off-by: Abhinav Khattar <[email protected]>

* multilingual in infer

Signed-off-by: Abhinav Khattar <[email protected]>

* changes

Signed-off-by: Abhinav Khattar <[email protected]>

* tar creation with multilingual

Signed-off-by: Abhinav Khattar <[email protected]>

* fix

Signed-off-by: Abhinav Khattar <[email protected]>

* changes + parallelism + bug fix

Signed-off-by: Abhinav Khattar <[email protected]>

* small fix

Signed-off-by: Abhinav Khattar <[email protected]>

* multilingual preprocessor fix

Signed-off-by: Abhinav Khattar <[email protected]>

* globally unique fragment names in tarred dataset

Signed-off-by: Abhinav Khattar <[email protected]>

* minor changes

Signed-off-by: Abhinav Khattar <[email protected]>

* rm load_from_cached_dataset

Signed-off-by: Abhinav Khattar <[email protected]>

* minor config change

Signed-off-by: Abhinav Khattar <[email protected]>

* rm unused import

Signed-off-by: Abhinav Khattar <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Remove memory leak from ASR notebook + update model notebook (#2213)

* Correct model notebook to log the loss and correctly assign keys

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* replace names in vad tutorials (#2220)

Signed-off-by: fayejf <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fix the versioning name. (#2209)

* fix the versioning name.

Signed-off-by: Vahid <[email protected]>

* Made version None.

Signed-off-by: Vahid <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Enabled passing kwargs to export() (#2175)

* Enabled passing kwargs to export()

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing style; changed Classifier input_example to new extended syntax

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixed order of forward() call in export

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing style

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update g2p: ambiguous ignore, flag for skipping seq2seq (#2223)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update TTS notebook with TalkNet inference (#2133)

* Update TTS notebook with TalkNet inference.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Update TTS Notebook.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Update TTS TN Training Notebook.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Fix TN paper link.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Remove branch updating TODOs.

Signed-off-by: Stanislav Beliaev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update speaker notebooks (#2224)

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Support symlinked files (#2216)

Signed-off-by: Anas Abou Allaban <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Set strict=True everywhere by default. (#2225)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* set strict=True in nlp_model (#2227)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* set strict=False for model parallel examples

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* Make Text processing installation optional via reinstall.sh (#2226)

* Make Text processing installation optional via reinstall.sh

Signed-off-by: smajumdar <[email protected]>

* Support both success and failure states

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Transformer final norm preln (#2197)

* fix pre_ln final norm

Signed-off-by: fayejf <[email protected]>

* style fix

Signed-off-by: fayejf <[email protected]>

* bug fixed

Signed-off-by: fayejf <[email protected]>

* bugfix post_ln

Signed-off-by: fayejf <[email protected]>

* update and add pre_ln_final_norm

Signed-off-by: fayejf <[email protected]>

* style fix

Signed-off-by: fayejf <[email protected]>

* fix for unit test

Signed-off-by: fayejf <[email protected]>

* rename final_norm to final_layer_norm

Signed-off-by: fayejf <[email protected]>

* bug fix

Signed-off-by: fayejf <[email protected]>

* tiny fix

Signed-off-by: fayejf <[email protected]>

* fix and improve

Signed-off-by: fayejf <[email protected]>

* tiny fix

Signed-off-by: fayejf <[email protected]>

* Patch for NMT to allow loading old models trained with pre-LN

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update models and notebook for 1.0 (#2211)

* update models

Signed-off-by: Jason <[email protected]>

* updates

Signed-off-by: Jason <[email protected]>

* fix

Signed-off-by: Jason <[email protected]>

* add links

Signed-off-by: Jason <[email protected]>

* fixes

Signed-off-by: Jason <[email protected]>

* style

Signed-off-by: Jason <[email protected]>

* update checkpoints

Signed-off-by: Jason <[email protected]>

* rename

Signed-off-by: Jason <[email protected]>

* fixes

Signed-off-by: Jason <[email protected]>

* lgtm

Signed-off-by: Jason <[email protected]>

* fix loading waveglow

Signed-off-by: Jason <[email protected]>

* typo

Signed-off-by: Jason <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update_metrics_classification_models (#2228)

Signed-off-by: nithinraok <[email protected]>

Co-authored-by: fayejf <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Data loader for seq of label model (#2084)

* feature to seq label data loader

Signed-off-by: fayejf <[email protected]>

* style fix

Signed-off-by: fayejf <[email protected]>

* small fix

Signed-off-by: fayejf <[email protected]>

* update tl to be length of seq label

Signed-off-by: fayejf <[email protected]>

* style fix

Signed-off-by: fayejf <[email protected]>

* tiny bug fix

Signed-off-by: fayejf <[email protected]>

* small updates

Signed-off-by: fayejf <[email protected]>

* updates for review feedback

Signed-off-by: fayejf <[email protected]>

* style fix

Signed-off-by: fayejf <[email protected]>

* explain seq_label

Signed-off-by: fayejf <[email protected]>

* fix lgtm

Signed-off-by: fayejf <[email protected]>

* small updates

Signed-off-by: fayejf <[email protected]>

* improve as discussed

Signed-off-by: fayejf <[email protected]>

* add docstring

Signed-off-by: fayejf <[email protected]>

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fix comments (#2236)

* fix comments

Signed-off-by: Yang Zhang <[email protected]>

* fix typo

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* add paper ref to sgdqa model doc (#2233)

* add paper ref to sgdqa model doc

Signed-off-by: Yang Zhang <[email protected]>

* fix comments

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Move ConcatDataset to common (#2237)

* move concatdataset to common

Signed-off-by: Abhinav Khattar <[email protected]>

* var name change

Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* audio based normalization (#2231)

* squash norm_audio

Signed-off-by: ekmb <[email protected]>

* add missing files

Signed-off-by: ekmb <[email protected]>

* style

Signed-off-by: ekmb <[email protected]>

* unit tests added, docstrings fixed

Signed-off-by: ekmb <[email protected]>

* fix lgtm errors

Signed-off-by: ekmb <[email protected]>

* debug jenkins

Signed-off-by: ekmb <[email protected]>

* debug jenkins

Signed-off-by: ekmb <[email protected]>

* signature update

Signed-off-by: ekmb <[email protected]>

* set deterministic default

Signed-off-by: ekmb <[email protected]>

* add more test cases

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* bug fix config (#2232)

Signed-off-by: fayejf <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Alias Swish to SiLU (#2239)

* Alias Swish to SiLU and move activations to inplace execution if possible

Signed-off-by: smajumdar <[email protected]>

* Remove unused import

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update README.rst

Signed-off-by: Micha Livne <[email protected]>

* Offline asr notebook bug fix (#2242)

* fix

Signed-off-by: fayejf <[email protected]>

* install

Signed-off-by: fayejf <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix docstring (#2244)

* fix comments

Signed-off-by: Yang Zhang <[email protected]>

* fix doc string

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update "last" Checkpoint (#2241)

* fix

Signed-off-by: Jason <[email protected]>

* change

Signed-off-by: Jason <[email protected]>

* fix

Signed-off-by: Jason <[email protected]>

Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add pretrained model stt_es_citrinet_512 (#2247)

Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* [BUGFIX] Only process tarfile artifacts when model was restored from tarfile (#2250)

* process tarfile artifacts only if model is being restored

Signed-off-by: ericharper <[email protected]>

* process tarfile artifacts only if model was restored from a tarfile

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Log average metrics for Multi-validation in NMT (#2251)

* add avg metrics NMT

Signed-off-by: Abhinav Khattar <[email protected]>

* name change

Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update Primer notebook (#2258)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fixed Bug 3310780 and 3310799 (#2264)

Signed-off-by: Virginia Adams <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Support multiple models being instantiated in same execution scope (#2245)

* Support multiple models being instantiated in same execution scope

Signed-off-by: smajumdar <[email protected]>

* Fix tests

Signed-off-by: smajumdar <[email protected]>

* Add locks to methods in appstate

Signed-off-by: smajumdar <[email protected]>

* Perform locks only on write operations

Signed-off-by: smajumdar <[email protected]>

* Correct deadlock issue

Signed-off-by: smajumdar <[email protected]>

* Add more tests

Signed-off-by: smajumdar <[email protected]>

* Add test for multi save and remove patch to change save type

Signed-off-by: smajumdar <[email protected]>

* Update app state to preserve gidx of previous token

Signed-off-by: smajumdar <[email protected]>

* Correct restoration logic for tarfiles

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* ASR Refactoring (#2240)

* Refactor out the preprocessing from ASR into common

Signed-off-by: smajumdar <[email protected]>

* Correct nltk issue with vocabs.py for clusters

Signed-off-by: smajumdar <[email protected]>

* Add typing information to SpecAugment and SpecCutout

Signed-off-by: smajumdar <[email protected]>

* Reorganize parts directory

Signed-off-by: smajumdar <[email protected]>

* Refactor parts submodules, add __init__ to few important parts

Signed-off-by: smajumdar <[email protected]>

* Update docs for new path to parts

Signed-off-by: smajumdar <[email protected]>

* Cherry pick PR https://github.com/NVIDIA/NeMo/pull/2219

Signed-off-by: smajumdar <[email protected]>

* Add header for preprocessing commons

Signed-off-by: smajumdar <[email protected]>

* Fix style of tests

Signed-off-by: smajumdar <[email protected]>

* Add forced update of configs for train-val-test ds to new labels tests

Signed-off-by: smajumdar <[email protected]>

* Update path to FilterbankFeatures for TTS

Signed-off-by: smajumdar <[email protected]>

* Add an alias file for backward compatibility

Signed-off-by: smajumdar <[email protected]>

* Add an alias file for backward compatibility

Signed-off-by: smajumdar <[email protected]>

* Update training scripts of ASR to support finetuning

Signed-off-by: smajumdar <[email protected]>

* Update Finetuning step to be ModelPT level

Signed-off-by: smajumdar <[email protected]>

* Update docs for finetuning for ASR

Signed-off-by: smajumdar <[email protected]>

* Fix style

Signed-off-by: smajumdar <[email protected]>

* Update docs and scripts with fine-tuning info

Signed-off-by: smajumdar <[email protected]>

* Update docs and scripts with fine-tuning info

Signed-off-by: smajumdar <[email protected]>

* Fix style

Signed-off-by: smajumdar <[email protected]>

* Update scripts

Signed-off-by: smajumdar <[email protected]>

* Add comment for weight initialization

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* TTS Doc Fix and Remove TTS Test (#2272)

* bug fix and remove test

Signed-off-by: Jason <[email protected]>

* syntax

Signed-off-by: Jason <[email protected]>

* syntax

Signed-off-by: Jason <[email protected]>

* syntax

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Talknet training Fix (#2273)

* TalkNet Training notebook fix.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Remove debug stuff.

Signed-off-by: Stanislav Beliaev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update (#2274)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add links (#2275)

* update

Signed-off-by: Jason <[email protected]>

* link

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Delete 3_TTS_TalkNet_Training.ipynb (#2276)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* tune down logging (#2277)

* tune down logging

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* debug message instead of removing it completely

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* minor bugfix

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* remove confusing message

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* Restore TalkNet training notebook (#2281)

* Restore TalkNet training notebook.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Remove torchaudio dep.

Signed-off-by: Stanislav Beliaev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix ExpManager Issues and FastPitch (#2283)

* backport exp_manager fixes to v1

Signed-off-by: Jason <[email protected]>

* fix fastpitch

Signed-off-by: Jason <[email protected]>

* fix tests

Signed-off-by: Jason <[email protected]>

* update prefix

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Organize asr config folders (#2284)


Signed-off-by: Micha Livne <[email protected]>

* Fix and enable DALI tests (#2077)

* Fix and enable DALI tests

Signed-off-by: Joaquin Anton <[email protected]>

* remove unused import

Signed-off-by: Joaquin Anton <[email protected]>

* Move DALI tests to a separate Jenkins stage

Signed-off-by: Joaquin Anton <[email protected]>

* Remove DALI tests from the main jenkins ASR stage

Signed-off-by: Joaquin Anton <[email protected]>

* Comment out MFCC test

Signed-off-by: Joaquin Anton <[email protected]>

* Working version

Signed-off-by: Joaquin Anton <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Added unit test for hifigan export, fixed hifigan export (#2279)

* Added unit test for hifigan export, Removed runtime test from waveglow test (now in export)

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixed style

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixed style

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update conformer recipes (#2265)

* updated readme asr.

Signed-off-by: Vahid <[email protected]>

* added models.

Signed-off-by: Vahid <[email protected]>

* fixed docs.

Signed-off-by: Vahid <[email protected]>

* fixed docs.

Signed-off-by: Vahid <[email protected]>

* fixed docs.

Signed-off-by: Vahid <[email protected]>

* fixed docs.

Signed-off-by: Vahid <[email protected]>

* fixed the docs.

Signed-off-by: Vahid <[email protected]>

* fixed the docs.

Signed-off-by: Vahid <[email protected]>

* fixed the docs.

Signed-off-by: Vahid <[email protected]>

* disabled test.

Signed-off-by: Vahid <[email protected]>

* Updated the config files.

Signed-off-by: Vahid <[email protected]>

* Updated the config files.

Signed-off-by: Vahid <[email protected]>

* Updated the config files.

Signed-off-by: Vahid <[email protected]>

* dropped the wers.

Signed-off-by: Vahid <[email protected]>

* dropped the wers.

Signed-off-by: Vahid <[email protected]>

* dropped new models and reverted to old versions.

Signed-off-by: Vahid <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Adding neural rescorer and its documentations (#2287)

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* added more docs, figures, and output file.

Signed-off-by: Vahid <[email protected]>

* added more docs, figures, and output file.

Signed-off-by: Vahid <[email protected]>

* added more docs, figures, and output file.

Signed-off-by: Vahid <[email protected]>

* added more docs, figures, and output file.

Signed-off-by: Vahid <[email protected]>

* added more docs, figures, and output file.

Signed-off-by: Vahid <[email protected]>

* fixed style

Signed-off-by: Vahid <[email protected]>

* add a note to asr notebook.

Signed-off-by: Vahid <[email protected]>

* add a note to asr notebook.

Signed-off-by: Vahid <[email protected]>

* add a note to asr notebook.

Signed-off-by: Vahid <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Adjust warning messages

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* Revert "Adjust warning messages"

This reverts commit df046ec55754d0136a2a28451435068f32409f30.

Signed-off-by: Micha Livne <[email protected]>

* Adjust warning messages (#2294)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Adding new Models releases on NGC. (#2295)

* added new models.

Signed-off-by: Vahid <[email protected]>

* added tests for asr lm.

Signed-off-by: Vahid <[email protected]>

* added tests for asr lm.

Signed-off-by: Vahid <[email protected]>

* dropped the test.

Signed-off-by: Vahid <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update quantization (#2298)

Signed-off-by: slyned <[email protected]>

Co-authored-by: slyned <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* ASR improvements (#2293)

* Update numba messages and citrinet configs

Signed-off-by: smajumdar <[email protected]>

* Remove support for weight init scale and hidden hidden bias scale for layer normalized lstm

Signed-off-by: smajumdar <[email protected]>

* Add support for multiple filetypes in tarred datasets, correct rnn LN-lstm inputs, fix OmegaConf compat issue

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Time quarter to (#2292)

* fix comments

Signed-off-by: Yang Zhang <[email protected]>

* fix doc string

Signed-off-by: Yang Zhang <[email protected]>

* adding quarter to to time class

Signed-off-by: Yang Zhang <[email protected]>

* fix typo

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fixed paths. (#2301)

Signed-off-by: Vahid <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Added onnxruntime check of exported ONNX, bumped up default ONNX opset (#2278)

* Added onnxruntime check of exported ONNX, bumped up default ONNX opset

Signed-off-by: Boris Fomitchev <[email protected]>

* Made TS export to accept ONNX-style input example, removed unused param to export

Signed-off-by: Boris Fomitchev <[email protected]>

* check_trace default made False

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixed for updated export signature

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update readmes

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* update readme

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* update readme

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* fix docs table

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* Add support for Numba CUDA optimized SpecAugment (#2269)

* Initial implementation

Signed-off-by: smajumdar <[email protected]>

* Initial implementation

Signed-off-by: smajumdar <[email protected]>

* Finish initial implementation of numba spec augment

Signed-off-by: smajumdar <[email protected]>

* Correct mask propagation

Signed-off-by: smajumdar <[email protected]>

* Parallelize kernel over batch instead of over masks

Signed-off-by: smajumdar <[email protected]>

* Finish tests and update to signature of spectrogramaugmentation calls

Signed-off-by: smajumdar <[email protected]>

* Finish tests and update to signature of spectrogramaugmentation calls

Signed-off-by: smajumdar <[email protected]>

* Add header

Signed-off-by: smajumdar <[email protected]>

* Fix style

Signed-off-by: smajumdar <[email protected]>

* Add heuristics

Signed-off-by: smajumdar <[email protected]>

* Correct inclusive range of padding

Signed-off-by: smajumdar <[email protected]>

* Correct typing for spec aug numba

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Added JSON manifest's support to transcribe_speech.py (#2304)

* Added JSON manifest's support to transcribe_speech.py

Signed-off-by: Vitaly Lavrukhin <[email protected]>

* Dropped unused import

Signed-off-by: Vitaly Lavrukhin <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* get embedding for a single file (#2310)

* get embedding for a single file

Signed-off-by: nithinraok <[email protected]>

* fixes

Signed-off-by: nithinraok <[email protected]>

* sr update

Signed-off-by: nithinraok <[email protected]>

* regain train mode

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update FastPitch (#2249)

* wip

Signed-off-by: Jason <[email protected]>

* c1

Signed-off-by: Jason <[email protected]>

* bug fixes

Signed-off-by: Jason <[email protected]>

* bug fixes

Signed-off-by: Jason <[email protected]>

* v2

Signed-off-by: Jason <[email protected]>

* changes

Signed-off-by: Jason <[email protected]>

* add types, old model working

Signed-off-by: Jason <[email protected]>

* pitch

Signed-off-by: Jason <[email protected]>

* update

Signed-off-by: Jason <[email protected]>

* update

Signed-off-by: Jason <[email protected]>

* let it work

Signed-off-by: Jason <[email protected]>

* fixes

Signed-off-by: Jason <[email protected]>

* add oktai comments

Signed-off-by: Jason <[email protected]>

* debug

Signed-off-by: Jason <[email protected]>

* scale

Signed-off-by: Jason <[email protected]>

* wip

Signed-off-by: Jason <[email protected]>

* fix test for v1

Signed-off-by: Jason <[email protected]>

* merge train and val

Signed-off-by: Jason <[email protected]>

* back to par bin att, add correct encoder settings

Signed-off-by: Jason <[email protected]>

* try

Signed-off-by: Jason <[email protected]>

* undo

Signed-off-by: Jason <[email protected]>

* lgtm:

Signed-off-by: Jason <[email protected]>

* style

Signed-off-by: Jason <[email protected]>

* default to ljs

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* patch quantization (#2314)

* update quantization

Signed-off-by: slyned <[email protected]>

* update quant infer trt

Signed-off-by: slyned <[email protected]>

* fix style

Signed-off-by: slyned <[email protected]>

Co-authored-by: slyned <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Pin OmegaConf version for 1.0.0 (#2316)

* Update OmegaConf compatibility

Signed-off-by: smajumdar <[email protected]>

* Correct OmegaConf.pretty()

Signed-off-by: smajumdar <[email protected]>

* Upper bound omegaconf

Signed-off-by: smajumdar <[email protected]>

* Revert "Correct OmegaConf.pretty()"

This reverts commit 6ebae2ef

Signed-off-by: smajumdar <[email protected]>

* Revert "Update OmegaConf compatibility"

This reverts commit 83b2cf35a07a742552082e80e6ca34c9b8203cbc.

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* [BUGFIX] OmegaConf forward compatibility (#2319)

* Update OmegaConf compatibility

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: ericharper <[email protected]>

* Correct OmegaConf.pretty()

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: ericharper <[email protected]>

* upper bound omegaconf

Signed-off-by: ericharper <[email protected]>

* add if,else back

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* bumping version to 1.0.1

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* fix_cluster_small_sample (#2303)

* fix_cluster_small_sample

Signed-off-by: nithinraok <[email protected]>

* for smaller samples

Signed-off-by: nithinraok <[email protected]>

* remove type

Signed-off-by: nithinraok <[email protected]>

* similarity matrix

Signed-off-by: nithinraok <[email protected]>

* est num of speakers add

Signed-off-by: nithinraok <[email protected]>

* comment update

Signed-off-by: nithinraok <[email protected]>

* style fix

Signed-off-by: nithinraok <[email protected]>

* MIN_SAMPLES passed through func arg

Signed-off-by: nithinraok <[email protected]>

* doc string update

Signed-off-by: nithinraok <[email protected]>

* spell mistake

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fastpitch export (#2300)

* wip

Signed-off-by: Jason <[email protected]>

* c1

Signed-off-by: Jason <[email protected]>

* bug fixes

Signed-off-by: Jason <[email protected]>

* bug fixes

Signed-off-by: Jason <[email protected]>

* v2

Signed-off-by: Jason <[email protected]>

* changes

Signed-off-by: Jason <[email protected]>

* add types, old model working

Signed-off-by: Jason <[email protected]>

* pitch

Signed-off-by: Jason <[email protected]>

* update

Signed-off-by: Jason <[email protected]>

* update

Signed-off-by: Jason <[email protected]>

* let it work

Signed-off-by: Jason <[email protected]>

* fixes

Signed-off-by: Jason <[email protected]>

* add oktai comments

Signed-off-by: Jason <[email protected]>

* debug

Signed-off-by: Jason <[email protected]>

* scale

Signed-off-by: Jason <[email protected]>

* wip

Signed-off-by: Jason <[email protected]>

* fix test for v1

Signed-off-by: Jason <[email protected]>
…
paarthneekhara pushed a commit to paarthneekhara/NeMo that referenced this pull request Sep 17, 2021
* Itn add classes (#2141)

* move do_training flag to config

Signed-off-by: Yang Zhang <[email protected]>

* added telephone to itn

Signed-off-by: Yang Zhang <[email protected]>

* add telephone and email to itn

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* ASR + NLP Doc Fixes (#2136)

* Preserve the tokenizer config for ASR

Signed-off-by: smajumdar <[email protected]>

* Correct nlp docs

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Removing graphsurgeon optional dependency, improving import error rep… (#2144)

* Removing graphsurgeon optional dependency, improving import error reporting

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing scope error

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix FilterbankFeatures eval nondeterminism. (#2146)

Signed-off-by: PiotrDabkowski <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fix the docs. (#2148)


Signed-off-by: Micha Livne <[email protected]>

* Text processing refactor (#2149)

* removed graphutils, suppletive, data_loader_utils from itn to be reused from tn

Signed-off-by: Yang Zhang <[email protected]>

* inheriting itn from tn, thus removing redundancy

Signed-off-by: Yang Zhang <[email protected]>

* cleaned whitelist

Signed-off-by: Yang Zhang <[email protected]>

* lgtm fix

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update how artifacts work (#2138)

* Update how artifacts work

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* fixing some tests

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* fix more tests

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* add __init__ to tests to make them discoverable

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* empty src support

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* updates plus unittest

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* add copyright check

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* copyright header

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* fix style

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* handle hashed megatron checkpoint version in nlp restore_from

Signed-off-by: ericharper <[email protected]>

* add _MODEL_RESTORE_PATH to AppState

Signed-off-by: ericharper <[email protected]>

* get rid of global folder caching

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* double register - warning instead of exception

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* Add asr spe tests

Signed-off-by: smajumdar <[email protected]>

* Pop out asr wpe pre-registered value

Signed-off-by: smajumdar <[email protected]>

* Correct ASR tests and paths

Signed-off-by: smajumdar <[email protected]>

* Correct tokenizer saving

Signed-off-by: smajumdar <[email protected]>

* Correct ASR tests

Signed-off-by: smajumdar <[email protected]>

* Correct ASR bpe mixin

Signed-off-by: smajumdar <[email protected]>

* Patch up backward compatibility

Signed-off-by: smajumdar <[email protected]>

* update register_bert_model

Signed-off-by: ericharper <[email protected]>

* update all get_lm_model calls

Signed-off-by: ericharper <[email protected]>

* return None if src not found

Signed-off-by: ericharper <[email protected]>

* handle case with no tokenizer

Signed-off-by: ericharper <[email protected]>

* do not add another hash if using tarfile_artifacts

Signed-off-by: ericharper <[email protected]>

* add return_none flag, update doc string

Signed-off-by: ericharper <[email protected]>

* update default behavior of register_artifact for NLPModel

Signed-off-by: ericharper <[email protected]>

* change kwarg name to verify_src_exists

Signed-off-by: ericharper <[email protected]>

* use cfg instead of _cfg

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* some cleanups

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Co-authored-by: ericharper <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Language model refactoring (#2120)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <[email protected]>

* bucketing tarred dataset for lm training

Signed-off-by: AlexGrinch <[email protected]>

* updated global rank

Signed-off-by: AlexGrinch <[email protected]>

* perplexity update

Signed-off-by: AlexGrinch <[email protected]>

* refactor lm to be compatible with latest nmt

Signed-off-by: AlexGrinch <[email protected]>

* perplexity change

Signed-off-by: AlexGrinch <[email protected]>

* removed obsolete config

Signed-off-by: AlexGrinch <[email protected]>

* added sequence perplexity

Signed-off-by: AlexGrinch <[email protected]>

* added non-smoothed CE loss for validation

Signed-off-by: AlexGrinch <[email protected]>

* unified sentence dataset, torchmetrics for sequence perplexity

Signed-off-by: AlexGrinch <[email protected]>

* translate_ddp refactor

Signed-off-by: AlexGrinch <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* [NMT] Multi-validation Patch (#2150)

* rename dl index 0 loss and sacrebleu for backwards compatibility

Signed-off-by: ericharper <[email protected]>

* eval -> val/tst

Signed-off-by: ericharper <[email protected]>

* instantiate torchmetrics after instantiating dataloaders

Signed-off-by: ericharper <[email protected]>

* bug

Signed-off-by: ericharper <[email protected]>

* remove debugging log

Signed-off-by: ericharper <[email protected]>

* remove debugging log

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* bumping version to 1.0.0

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* fixed the num_samples of text classification model. (#2152)

Signed-off-by: Vahid <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fix for electronic (#2153)

* fix for electronic

Signed-off-by: ekmb <[email protected]>

* special symbols added

Signed-off-by: ekmb <[email protected]>

* restrict symbols list

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* FastSpeech 2 Test & Docs (#2143)

* Add FS2 data loading test

Signed-off-by: Jocelyn Huang <[email protected]>

* TTS docs update for FastSpeech 2

Signed-off-by: Jocelyn Huang <[email protected]>

* Style fix for FS2 dataset test

Signed-off-by: Jocelyn Huang <[email protected]>

* Fix transpose typo

Signed-off-by: Jocelyn Huang <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Minor patch for translate_ddp (#2155)

* Patch for backtranslation in lm dataset

Signed-off-by: MaximumEntropy <[email protected]>

* One more fix

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Entity linking (#2050)

* Started adding SAP dataset

Signed-off-by: Virginia Adams <[email protected]>

* Delete .lm_bert_dataset.py.swp

Signed-off-by: Virginia Adams <[email protected]>

* Added dataset and loss

Signed-off-by: Virginia Adams <[email protected]>

* Added entity linking encoder model

Signed-off-by: Virginia Adams <[email protected]>

* Can build and use index from pubmedbert model

Signed-off-by: Virginia Adams <[email protected]>

* checked boolean logic in build_index.py

Signed-off-by: Virginia Adams <[email protected]>

* End to end tested all functionality

Signed-off-by: Virginia Adams <[email protected]>

* fixed val loss none at end of validation

Signed-off-by: Virginia Adams <[email protected]>

* Started adding demo entity linking notebook

Signed-off-by: Virginia Adams <[email protected]>

* adding in notebook demo

Signed-off-by: Virginia Adams <[email protected]>

* added call to entitylinking classes in __init__.py files

Signed-off-by: Virginia Adams <[email protected]>

* Added eval code to notebook

Signed-off-by: Virginia Adams <[email protected]>

* Adding unfinished notebook

Signed-off-by: Virginia Adams <[email protected]>

* Cleaned up example dir

Signed-off-by: Virginia Adams <[email protected]>

* Fixed recap commands

Signed-off-by: Virginia Adams <[email protected]>

* added model typing and tiny data tar

Signed-off-by: Virginia Adams <[email protected]>

* Adding tiny data zip

Signed-off-by: Virginia Adams <[email protected]>

* updated tiny example config data path

Signed-off-by: Virginia Adams <[email protected]>

* Notebook demo works

Signed-off-by: Virginia Adams <[email protected]>

* Changed training epochs

Signed-off-by: Virginia Adams <[email protected]>

* Removed output from training and install cells

Signed-off-by: Virginia Adams <[email protected]>

* changed code formatting

Signed-off-by: Virginia Adams <[email protected]>

* Started doc string for new functions

Signed-off-by: Virginia Adams <[email protected]>

* Updated data_preprocessing to save to data_dir

Signed-off-by: Virginia Adams <[email protected]>

* fixed comment in notebook demo

Signed-off-by: Virginia Adams <[email protected]>

* Update data_preprocessing.py

Signed-off-by: Virginia Adams <[email protected]>

* updated nemo typing imports

Signed-off-by: Virginia Adams <[email protected]>

* about to rebase

Signed-off-by: Virginia Adams <[email protected]>

* added back umls_dataset_processing.py

Signed-off-by: Virginia Adams <[email protected]>

* Removed example data

Signed-off-by: Virginia Adams <[email protected]>

* Fixed typos in notebook demo

Signed-off-by: Virginia Adams <[email protected]>

* fixed lgtm-com issues

Signed-off-by: Virginia Adams <[email protected]>

* added copyright headers

Signed-off-by: Virginia Adams <[email protected]>

* fixed import and copyright headers

Signed-off-by: Virginia Adams <[email protected]>

* Fixed formatting

Signed-off-by: Virginia Adams <[email protected]>

* Fixed formatting changes 2

Signed-off-by: Virginia Adams <[email protected]>

* fixed test formatting

Signed-off-by: Virginia Adams <[email protected]>

* Added __init__.py for model and dataset

Signed-off-by: Virginia Adams <[email protected]>

* loading newline file returns data_dir now

Signed-off-by: Virginia Adams <[email protected]>

* Removed conf notebook and deleted comment

Signed-off-by: Virginia Adams <[email protected]>

* Added jenkins test

Signed-off-by: Virginia Adams <[email protected]>

* Updated Jenkins test

Signed-off-by: Virginia Adams <[email protected]>

* fixed file path

Signed-off-by: Virginia Adams <[email protected]>

* Changed Jenkins pipeline order

Signed-off-by: Virginia Adams <[email protected]>

* Fixed Jenkins datapath... again...

Signed-off-by: Virginia Adams <[email protected]>

* Made most review changes

Signed-off-by: Virginia Adams <[email protected]>

* fixed copyright

Signed-off-by: Virginia Adams <[email protected]>

* updated unit test to wget config

Signed-off-by: Virginia Adams <[email protected]>

* reverted test file back

Signed-off-by: Virginia Adams <[email protected]>

* Added project dir to jenkins test

Signed-off-by: Virginia Adams <[email protected]>

* defined config in unit test

Signed-off-by: Virginia Adams <[email protected]>

Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Correct branch version for v1.0.0 (#2157)

* Correct branch version

Signed-off-by: smajumdar <[email protected]>

* Correct Jenkinsfile

Signed-off-by: smajumdar <[email protected]>

* Update rst files

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* switch CI back to main

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* fixed the docs. (#2156)


Signed-off-by: Micha Livne <[email protected]>

* Make Hifigan jittable (#2159)

* FastSpeech 2 Test & Docs (#2143)

* Add FS2 data loading test

Signed-off-by: Jocelyn Huang <[email protected]>

* TTS docs update for FastSpeech 2

Signed-off-by: Jocelyn Huang <[email protected]>

* Style fix for FS2 dataset test

Signed-off-by: Jocelyn Huang <[email protected]>

* Fix transpose typo

Signed-off-by: Jocelyn Huang <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Jason <[email protected]>

* Entity linking (#2050)

* Started adding SAP dataset

Signed-off-by: Virginia Adams <[email protected]>

* Delete .lm_bert_dataset.py.swp

Signed-off-by: Virginia Adams <[email protected]>

* Added dataset and loss

Signed-off-by: Virginia Adams <[email protected]>

* Added entity linking encoder model

Signed-off-by: Virginia Adams <[email protected]>

* Can build and use index from pubmedbert model

Signed-off-by: Virginia Adams <[email protected]>

* checked boolean logic in build_index.py

Signed-off-by: Virginia Adams <[email protected]>

* End to end tested all functionality

Signed-off-by: Virginia Adams <[email protected]>

* fixed val loss none at end of validation

Signed-off-by: Virginia Adams <[email protected]>

* Started adding demo entity linking notebook

Signed-off-by: Virginia Adams <[email protected]>

* adding in notebook demo

Signed-off-by: Virginia Adams <[email protected]>

* added call to entitylinking classes in __init__.py files

Signed-off-by: Virginia Adams <[email protected]>

* Added eval code to notebook

Signed-off-by: Virginia Adams <[email protected]>

* Adding unfinished notebook

Signed-off-by: Virginia Adams <[email protected]>

* Cleaned up example dir

Signed-off-by: Virginia Adams <[email protected]>

* Fixed recap commands

Signed-off-by: Virginia Adams <[email protected]>

* added model typing and tiny data tar

Signed-off-by: Virginia Adams <[email protected]>

* Adding tiny data zip

Signed-off-by: Virginia Adams <[email protected]>

* updated tiny example config data path

Signed-off-by: Virginia Adams <[email protected]>

* Notebook demo works

Signed-off-by: Virginia Adams <[email protected]>

* Changed training epochs

Signed-off-by: Virginia Adams <[email protected]>

* Removed output from training and install cells

Signed-off-by: Virginia Adams <[email protected]>

* changed code formatting

Signed-off-by: Virginia Adams <[email protected]>

* Started doc string for new functions

Signed-off-by: Virginia Adams <[email protected]>

* Updated data_preprocessing to save to data_dir

Signed-off-by: Virginia Adams <[email protected]>

* fixed comment in notebook demo

Signed-off-by: Virginia Adams <[email protected]>

* Update data_preprocessing.py

Signed-off-by: Virginia Adams <[email protected]>

* updated nemo typing imports

Signed-off-by: Virginia Adams <[email protected]>

* about to rebase

Signed-off-by: Virginia Adams <[email protected]>

* added back umls_dataset_processing.py

Signed-off-by: Virginia Adams <[email protected]>

* Removed example data

Signed-off-by: Virginia Adams <[email protected]>

* Fixed typos in notebook demo

Signed-off-by: Virginia Adams <[email protected]>

* fixed lgtm-com issues

Signed-off-by: Virginia Adams <[email protected]>

* added copyright headers

Signed-off-by: Virginia Adams <[email protected]>

* fixed import and copyright headers

Signed-off-by: Virginia Adams <[email protected]>

* Fixed formatting

Signed-off-by: Virginia Adams <[email protected]>

* Fixed formatting changes 2

Signed-off-by: Virginia Adams <[email protected]>

* fixed test formatting

Signed-off-by: Virginia Adams <[email protected]>

* Added __init__.py for model and dataset

Signed-off-by: Virginia Adams <[email protected]>

* loading newline file returns data_dir now

Signed-off-by: Virginia Adams <[email protected]>

* Removed conf notebook and deleted comment

Signed-off-by: Virginia Adams <[email protected]>

* Added jenkins test

Signed-off-by: Virginia Adams <[email protected]>

* Updated Jenkins test

Signed-off-by: Virginia Adams <[email protected]>

* fixed file path

Signed-off-by: Virginia Adams <[email protected]>

* Changed Jenkins pipeline order

Signed-off-by: Virginia Adams <[email protected]>

* Fixed Jenkins datapath... again...

Signed-off-by: Virginia Adams <[email protected]>

* Made most review changes

Signed-off-by: Virginia Adams <[email protected]>

* fixed copyright

Signed-off-by: Virginia Adams <[email protected]>

* updated unit test to wget config

Signed-off-by: Virginia Adams <[email protected]>

* reverted test file back

Signed-off-by: Virginia Adams <[email protected]>

* Added project dir to jenkins test

Signed-off-by: Virginia Adams <[email protected]>

* defined config in unit test

Signed-off-by: Virginia Adams <[email protected]>

Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>

* switch CI back to main

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* Make Hifigan jittable

Signed-off-by: Ryan Leary <[email protected]>

* Remove vestigial debugging printout

Signed-off-by: Ryan Leary <[email protected]>

* Add export forward and fix style

Signed-off-by: Ryan Leary <[email protected]>

* Fix load_state_dict override for arbitrary layers

Signed-off-by: Ryan Leary <[email protected]>

Co-authored-by: Jocelyn <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: vadam5 <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Ryan Leary <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fix version (#2162)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Megatron nb size reduced (#2163)

* notebook size reduced

Signed-off-by: ekmb <[email protected]>

* notebook size reduced

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update spectral clustering method (#2158)

* update spectral clustering method

Signed-off-by: nithinraok <[email protected]>

* update Jenkins File

Signed-off-by: nithinraok <[email protected]>

* threshold fix by reducing window length for shorter embs

Signed-off-by: nithinraok <[email protected]>

* grammar fixes

Signed-off-by: nithinraok <[email protected]>

* CR update

Signed-off-by: nithinraok <[email protected]>

* paper reference

Signed-off-by: nithinraok <[email protected]>

* improve docstring for yaml

Signed-off-by: nithinraok <[email protected]>

* Doc fixes

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* revert (#2167)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Limit Pytorch lightning release (#2170)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* token classification models artifacts update (#2169)

* artifacts update

Signed-off-by: ekmb <[email protected]>

* artifacts update

Signed-off-by: ekmb <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* fix for model restoration

Signed-off-by: ekmb <[email protected]>

* typos fix + jenkins dir update

Signed-off-by: ekmb <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* update jenkins branch

Signed-off-by: ericharper <[email protected]>

* add &&

Signed-off-by: ericharper <[email protected]>

* jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins disable

Signed-off-by: ekmb <[email protected]>

* revert jenkins

Signed-off-by: ekmb <[email protected]>

* jenkins disable

Signed-off-by: ekmb <[email protected]>

* revert jenkins

Signed-off-by: ekmb <[email protected]>

Co-authored-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix to always_save_nemo (#2174)

* Initial attempt at always_save_nemo fix

Signed-off-by: MaximumEntropy <[email protected]>

* updated path before saving in exp manager, fixed bug when handling tarfile artifacts

Signed-off-by: ericharper <[email protected]>

* Add test with always_save_nemo to exp_manager

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* update jenkins branch

Signed-off-by: ericharper <[email protected]>

* check for nemo:

Signed-off-by: ericharper <[email protected]>

* check for nemo:

Signed-off-by: ericharper <[email protected]>

* check for nemo:

Signed-off-by: ericharper <[email protected]>

Co-authored-by: ericharper <[email protected]>
Co-authored-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fix typo (#2179)

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Make itn tests optional (#2173)

* Limit Pytorch lightning release

Signed-off-by: smajumdar <[email protected]>

* Add final two checks

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* First Revision of TTS Docs and Notebooks Update for 1.0 (#2166)

* squash

Signed-off-by: Jason <[email protected]>

* notebook fixes

Signed-off-by: Jason <[email protected]>

* notebook fixes

Signed-off-by: Jason <[email protected]>

* typos

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* add more alternatives of 0 for telephone (#2171)

Signed-off-by: Yang Zhang <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Acc tn (#2180)

* make tn cardinal faster

Signed-off-by: Yang Zhang <[email protected]>

* add number far

Signed-off-by: Yang Zhang <[email protected]>

* add test

Signed-off-by: Yang Zhang <[email protected]>

* fix lgtm

Signed-off-by: Yang Zhang <[email protected]>

* fix lgtm

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* [DOCS] NLP Model parallel, NMT multi-val, CORE register artifacts (#2168)

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

* update docs

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Change label smoothing prob to reduce chance of test failure (#2184)

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add FS2 checkpoint links to docs and inference notebook (#2181)

* Add FS2 checkpoint links to docs and inference notebook

Signed-off-by: Jocelyn Huang <[email protected]>

* Remove empty cell from TTS notebook

Signed-off-by: Jocelyn Huang <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update ptl to 1.3 on main branch (#2178)

* Update PTL

Signed-off-by: smajumdar <[email protected]>

* Begin update to Pytorch Lightning 1.3.x

Signed-off-by: smajumdar <[email protected]>

* Formatting

Signed-off-by: smajumdar <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* Formatting

Signed-off-by: smajumdar <[email protected]>

* minor fix

Signed-off-by: Jason <[email protected]>

* minor fix

Signed-off-by: Jason <[email protected]>

* get testing attribute from trainer

Signed-off-by: ericharper <[email protected]>

* update init_ddp_connection override

Signed-off-by: ericharper <[email protected]>

* update attribute

Signed-off-by: ericharper <[email protected]>

* add barrier after load checkpoint in megatron

Signed-off-by: ericharper <[email protected]>

* remove barrier

Signed-off-by: ericharper <[email protected]>

* update last naming

Signed-off-by: Jason <[email protected]>

Co-authored-by: ericharper <[email protected]>
Co-authored-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* SDE updates (#2187)

* Added updates to SDE:
- support for external vocabulary (to detect OOV words)
- support for offset field (for segmented long recordings)
- UI improvements

Signed-off-by: Vitaly Lavrukhin <[email protected]>

* Refactored diff in SDE

Signed-off-by: Vitaly Lavrukhin <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add TTS aligner and improved version of g2p for vocabs.Phonemes, small improvement in TalkNet (#2189)

* add first version of aligner

Signed-off-by: Oktai Tatanov <[email protected]>

* aligner docs, new g2p version, fix bugs in talknet

Signed-off-by: Oktai Tatanov <[email protected]>

* update docs and remove lj related code

Signed-off-by: Oktai Tatanov <[email protected]>

* fix style

Signed-off-by: Oktai Tatanov <[email protected]>

* fix import

Signed-off-by: Oktai Tatanov <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* set the default of nodessplitter to None. (#2190)

Signed-off-by: Vahid <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* NMT fixes (#2194)

* minor fixes

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* minor bugfixes

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Store mappings file in .nemo for FS2 model (#2196)

* Store mappings file in .nemo for FS2 model

Signed-off-by: Jocelyn Huang <[email protected]>

* Add error enforcing mappings file during training (FS2)

Signed-off-by: Jocelyn Huang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add support to change the SE context window of ConvASREncoder (#2193)

* Add support for changing context window on the fly

Signed-off-by: smajumdar <[email protected]>

* Add support to change the SE context window of ConvASREncoder

Signed-off-by: smajumdar <[email protected]>

* Add ability to skip config updating

Signed-off-by: smajumdar <[email protected]>

* Switch to mixin based API

Signed-off-by: smajumdar <[email protected]>

* Update docs and api for ASRModuleMixin

Signed-off-by: smajumdar <[email protected]>

* Change print to logging.info

Signed-off-by: smajumdar <[email protected]>

* Correct stride level when computing context window

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add a CI test for doing inference with an NMT model trained with Pre-LN (#2198)

* Change label smoothing prob to reduce chance of test failure

Signed-off-by: MaximumEntropy <[email protected]>

* Add Pre-LN inference test to Jenkinsfile

Signed-off-by: MaximumEntropy <[email protected]>

* Separate tests for training and NMT inference

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix ipywidgets error in asr notebook (#2199)

Added `ipywidgets` to avoid `ImportError: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html` error.

Signed-off-by: Derek Chia <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* metrics fix (#2202)

* metrics fix

Signed-off-by: ekmb <[email protected]>

* metrics reset for punct model

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* readme and minor improvements (#2203)

* readme and minor improvements

Signed-off-by: nithinraok <[email protected]>

* vad threshold update

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix text processing docs (#2195)

* fix text processing docs

Signed-off-by: Yang Zhang <[email protected]>

* fix name

Signed-off-by: Yang Zhang <[email protected]>

* add guard to pynini import

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix bug in SpecCutout (#2201)

Signed-off-by: Robert Bracco <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix bug in SpecCutout (#2201) (#2205)

Signed-off-by: Robert Bracco <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Robert Bracco <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Set seed before generating random tensors in NMT test (#2206)

* Change label smoothing prob to reduce chance of test failure

Signed-off-by: MaximumEntropy <[email protected]>

* Set seed before generating tensors

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* ASR patches for v1.0.0 (#2207)

* Multiple updates to RNNT add initialization

Signed-off-by: smajumdar <[email protected]>

* Correct name of initialization

Signed-off-by: smajumdar <[email protected]>

* Update dockerignore

Signed-off-by: smajumdar <[email protected]>

* Fix RNNT WER calculation

Signed-off-by: smajumdar <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Multilingual training for NMT (#2160)

* mnmt on fresh main

Signed-off-by: Abhinav Khattar <[email protected]>

* push for test

Signed-off-by: Abhinav Khattar <[email protected]>

* debug

Signed-off-by: Abhinav Khattar <[email protected]>

* check

Signed-off-by: Abhinav Khattar <[email protected]>

* cleanup

Signed-off-by: Abhinav Khattar <[email protected]>

* minor fix

Signed-off-by: Abhinav Khattar <[email protected]>

* more minor fixes

Signed-off-by: Abhinav Khattar <[email protected]>

* fix for test

Signed-off-by: Abhinav Khattar <[email protected]>

* fix list size error

Signed-off-by: Abhinav Khattar <[email protected]>

* multilingual in infer

Signed-off-by: Abhinav Khattar <[email protected]>

* changes

Signed-off-by: Abhinav Khattar <[email protected]>

* tar creation with multilingual

Signed-off-by: Abhinav Khattar <[email protected]>

* fix

Signed-off-by: Abhinav Khattar <[email protected]>

* changes + parallelism + bug fix

Signed-off-by: Abhinav Khattar <[email protected]>

* small fix

Signed-off-by: Abhinav Khattar <[email protected]>

* multilingual preprocessor fix

Signed-off-by: Abhinav Khattar <[email protected]>

* globally unique fragment names in tarred dataset

Signed-off-by: Abhinav Khattar <[email protected]>

* minor changes

Signed-off-by: Abhinav Khattar <[email protected]>

* rm load_from_cached_dataset

Signed-off-by: Abhinav Khattar <[email protected]>

* minor config change

Signed-off-by: Abhinav Khattar <[email protected]>

* rm unused import

Signed-off-by: Abhinav Khattar <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Remove memory leak from ASR notebook + update model notebook (#2213)

* ASR patches for v1.0.0 (#2207)

* Multiple updates to RNNT add initialization

Signed-off-by: smajumdar <[email protected]>

* Correct name of initialization

Signed-off-by: smajumdar <[email protected]>

* Update dockerignore

Signed-off-by: smajumdar <[email protected]>

* Fix RNNT WER calculation

Signed-off-by: smajumdar <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

* Correct model notebook to log the loss and correctly assign keys

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* replace names in vad tutorials (#2220)

Signed-off-by: fayejf <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fix the versioning name. (#2209)

* fix the versioning name.

Signed-off-by: Vahid <[email protected]>

* Made version None.

Signed-off-by: Vahid <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Enabled passing kwargs to export() (#2175)

* Enabled passing kwargs to export()

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing style; changed Classifier input_example to new extended syntax

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixed order of forward() call in export

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing style

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update g2p: ambiguous ignore, flag for skipping seq2seq (#2223)

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update TTS notebook with TalkNet inference (#2133)

* Update TTS notebook with TalkNet inference.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Update TTS Notebook.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Update TTS TN Training Notebook.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Fix TN paper link.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Remove branch updating TODOs.

Signed-off-by: Stanislav Beliaev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update speaker notebooks (#2224)

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Support symlinked files (#2216)

Signed-off-by: Anas Abou Allaban <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Set strict=True everywhere by default. (#2225)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* set strict=True in nlp_model (#2227)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* set strict=False for model parallel examples

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* Make Text processing installation optional via reinstall.sh (#2226)

* Make Text processing installation optional via reinstall.sh

Signed-off-by: smajumdar <[email protected]>

* Support both success and failure states

Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Transformer final norm preln (#2197)

* fix pre_ln final norm

Signed-off-by: fayejf <[email protected]>

* style fix

Signed-off-by: fayejf <[email protected]>

* bug fixed

Signed-off-by: fayejf <[email protected]>

* bugfix post_ln

Signed-off-by: fayejf <[email protected]>

* update and add pre_ln_final_norm

Signed-off-by: fayejf <[email protected]>

* style fix

Signed-off-by: fayejf <[email protected]>

* fix for unit test

Signed-off-by: fayejf <[email protected]>

* rename final_norm to final_layer_norm

Signed-off-by: fayejf <[email protected]>

* bug fix

Signed-off-by: fayejf <[email protected]>

* tiny fix

Signed-off-by: fayejf <[email protected]>

* fix and improve

Signed-off-by: fayejf <[email protected]>

* tiny fix

Signed-off-by: fayejf <[email protected]>

* Patch for NMT to allow loading old models trained with pre-LN

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update models and notebook for 1.0 (#2211)

* update models

Signed-off-by: Jason <[email protected]>

* updates

Signed-off-by: Jason <[email protected]>

* fix

Signed-off-by: Jason <[email protected]>

* add links

Signed-off-by: Jason <[email protected]>

* fixes

Signed-off-by: Jason <[email protected]>

* style

Signed-off-by: Jason <[email protected]>

* update checkpoints

Signed-off-by: Jason <[email protected]>

* rename

Signed-off-by: Jason <[email protected]>

* fixes

Signed-off-by: Jason <[email protected]>

* lgtm

Signed-off-by: Jason <[email protected]>

* fix loading waveglow

Signed-off-by: Jason <[email protected]>

* typo

Signed-off-by: Jason <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update_metrics_classification_models (#2228)

Signed-off-by: nithinraok <[email protected]>

Co-authored-by: fayejf <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Data loader for seq of label model (#2084)

* feature to seq label data loader

Signed-off-by: fayejf <[email protected]>

* style fix

Signed-off-by: fayejf <[email protected]>

* small fix

Signed-off-by: fayejf <[email protected]>

* update tl to be length of seq label

Signed-off-by: fayejf <[email protected]>

* style fix

Signed-off-by: fayejf <[email protected]>

* tiny bug fix

Signed-off-by: fayejf <[email protected]>

* small updates

Signed-off-by: fayejf <[email protected]>

* updates for review feedback

Signed-off-by: fayejf <[email protected]>

* style fix

Signed-off-by: fayejf <[email protected]>

* explain seq_label

Signed-off-by: fayejf <[email protected]>

* fix lgtm

Signed-off-by: fayejf <[email protected]>

* small updates

Signed-off-by: fayejf <[email protected]>

* improve as discussed

Signed-off-by: fayejf <[email protected]>

* add docstring

Signed-off-by: fayejf <[email protected]>

Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fix comments (#2236)

* fix comments

Signed-off-by: Yang Zhang <[email protected]>

* fix typo

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* add paper ref to sgdqa model doc (#2233)

* add paper ref to sgdqa model doc

Signed-off-by: Yang Zhang <[email protected]>

* fix comments

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Move ConcatDataset to common (#2237)

* move concatdataset to common

Signed-off-by: Abhinav Khattar <[email protected]>

* var name change

Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* audio based normalization (#2231)

* squash norm_audio

Signed-off-by: ekmb <[email protected]>

* add missing files

Signed-off-by: ekmb <[email protected]>

* style

Signed-off-by: ekmb <[email protected]>

* unit tests added, docstrings fixed

Signed-off-by: ekmb <[email protected]>

* fix lgtm errors

Signed-off-by: ekmb <[email protected]>

* debug jenkins

Signed-off-by: ekmb <[email protected]>

* debug jenkins

Signed-off-by: ekmb <[email protected]>

* signature update

Signed-off-by: ekmb <[email protected]>

* set deterministic default

Signed-off-by: ekmb <[email protected]>

* add more test cases

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* bug fix config (#2232)

Signed-off-by: fayejf <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Alias Swish to SiLU (#2239)

* Alias Swish to SiLU and move activations to inplace execution if possible

Signed-off-by: smajumdar <[email protected]>

* Remove unused import

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update README.rst

Signed-off-by: Micha Livne <[email protected]>

* Offline asr notebook bug fix (#2242)

* fix

Signed-off-by: fayejf <[email protected]>

* install

Signed-off-by: fayejf <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix docstring (#2244)

* fix comments

Signed-off-by: Yang Zhang <[email protected]>

* fix doc string

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update "last" Checkpoint (#2241)

* fix

Signed-off-by: Jason <[email protected]>

* change

Signed-off-by: Jason <[email protected]>

* fix

Signed-off-by: Jason <[email protected]>

Co-authored-by: Sandeep Subramanian <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add pretrained model stt_es_citrinet_512 (#2247)

Signed-off-by: Elena Rastorgueva <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* [BUGFIX] Only process tarfile artifacts when model was restored from tarfile (#2250)

* process tarfile artifacts only if model is being restored

Signed-off-by: ericharper <[email protected]>

* process tarfile artifacts only if model was restored from a tarfile

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Log average metrics for Multi-validation in NMT (#2251)

* add avg metrics NMT

Signed-off-by: Abhinav Khattar <[email protected]>

* name change

Signed-off-by: Abhinav Khattar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update Primer notebook (#2258)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fixed Bug 3310780 and 3310799 (#2264)

Signed-off-by: Virginia Adams <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Support multiple models being instantiated in same execution scope (#2245)

* Support multiple models being instantiated in same execution scope

Signed-off-by: smajumdar <[email protected]>

* Fix tests

Signed-off-by: smajumdar <[email protected]>

* Add locks to methods in appstate

Signed-off-by: smajumdar <[email protected]>

* Perform locks only on write operations

Signed-off-by: smajumdar <[email protected]>

* Correct deadlock issue

Signed-off-by: smajumdar <[email protected]>

* Add more tests

Signed-off-by: smajumdar <[email protected]>

* Add test for multi save and remove patch to change save type

Signed-off-by: smajumdar <[email protected]>

* Update app state to preserve gidx of previous token

Signed-off-by: smajumdar <[email protected]>

* Correct restoration logic for tarfiles

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* ASR Refactoring (#2240)

* Refactor out the preprocessing from ASR into common

Signed-off-by: smajumdar <[email protected]>

* Correct nltk issue with vocabs.py for clusters

Signed-off-by: smajumdar <[email protected]>

* Add typing information to SpecAugment and SpecCutout

Signed-off-by: smajumdar <[email protected]>

* Reorganize parts directory

Signed-off-by: smajumdar <[email protected]>

* Refactor parts submodules, add __init__ to few important parts

Signed-off-by: smajumdar <[email protected]>

* Update docs for new path to parts

Signed-off-by: smajumdar <[email protected]>

* Cherry pick PR https://github.com/NVIDIA/NeMo/pull/2219

Signed-off-by: smajumdar <[email protected]>

* Add header for preprocessing commons

Signed-off-by: smajumdar <[email protected]>

* Fix style of tests

Signed-off-by: smajumdar <[email protected]>

* Add forced update of configs for train-val-test ds to new labels tests

Signed-off-by: smajumdar <[email protected]>

* Update path to FilterbankFeatures for TTS

Signed-off-by: smajumdar <[email protected]>

* Add an alias file for backward compatibility

Signed-off-by: smajumdar <[email protected]>

* Add an alias file for backward compatibility

Signed-off-by: smajumdar <[email protected]>

* Update training scripts of ASR to support finetuning

Signed-off-by: smajumdar <[email protected]>

* Update Finetuning step to be ModelPT level

Signed-off-by: smajumdar <[email protected]>

* Update docs for finetuning for ASR

Signed-off-by: smajumdar <[email protected]>

* Fix style

Signed-off-by: smajumdar <[email protected]>

* Update docs and scripts with fine-tuning info

Signed-off-by: smajumdar <[email protected]>

* Update docs and scripts with fine-tuning info

Signed-off-by: smajumdar <[email protected]>

* Fix style

Signed-off-by: smajumdar <[email protected]>

* Update scripts

Signed-off-by: smajumdar <[email protected]>

* Add comment for weight initialization

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* TTS Doc Fix and Remove TTS Test (#2272)

* bug fix and remove test

Signed-off-by: Jason <[email protected]>

* syntax

Signed-off-by: Jason <[email protected]>

* syntax

Signed-off-by: Jason <[email protected]>

* syntax

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Talknet training Fix (#2273)

* TalkNet Training notebook fix.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Remove debug stuff.

Signed-off-by: Stanislav Beliaev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update (#2274)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Add links (#2275)

* update

Signed-off-by: Jason <[email protected]>

* link

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Delete 3_TTS_TalkNet_Training.ipynb (#2276)

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* tune down logging (#2277)

* tune down logging

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* debug message instead of removing it completely

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* minor bugfix

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* remove confusing message

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* Restore TalkNet training notebook (#2281)

* Restore TalkNet training notebook.

Signed-off-by: Stanislav Beliaev <[email protected]>

* Remove torchaudio dep.

Signed-off-by: Stanislav Beliaev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fix ExpManager Issues and FastPitch (#2283)

* backport exp_manager fixes to v1

Signed-off-by: Jason <[email protected]>

* fix fastpitch

Signed-off-by: Jason <[email protected]>

* fix tests

Signed-off-by: Jason <[email protected]>

* update prefix

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Organize asr config folders (#2284)


Signed-off-by: Micha Livne <[email protected]>

* Fix and enable DALI tests (#2077)

* Fix and enable DALI tests

Signed-off-by: Joaquin Anton <[email protected]>

* remove unused import

Signed-off-by: Joaquin Anton <[email protected]>

* Move DALI tests to a separate Jenkins stage

Signed-off-by: Joaquin Anton <[email protected]>

* Remove DALI tests from the main jenkins ASR stage

Signed-off-by: Joaquin Anton <[email protected]>

* Comment out MFCC test

Signed-off-by: Joaquin Anton <[email protected]>

* Working version

Signed-off-by: Joaquin Anton <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Added unit test for hifigan export, fixed hifigan export (#2279)

* Added unit test for hifigan export, Removed runtime test from waveglow test (now in export)

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixed style

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixed style

Signed-off-by: Boris Fomitchev <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update conformer recipes (#2265)

* updated readme asr.

Signed-off-by: Vahid <[email protected]>

* added models.

Signed-off-by: Vahid <[email protected]>

* fixed docs.

Signed-off-by: Vahid <[email protected]>

* fixed docs.

Signed-off-by: Vahid <[email protected]>

* fixed docs.

Signed-off-by: Vahid <[email protected]>

* fixed docs.

Signed-off-by: Vahid <[email protected]>

* fixed the docs.

Signed-off-by: Vahid <[email protected]>

* fixed the docs.

Signed-off-by: Vahid <[email protected]>

* fixed the docs.

Signed-off-by: Vahid <[email protected]>

* disabled test.

Signed-off-by: Vahid <[email protected]>

* Updated the config files.

Signed-off-by: Vahid <[email protected]>

* Updated the config files.

Signed-off-by: Vahid <[email protected]>

* Updated the config files.

Signed-off-by: Vahid <[email protected]>

* dropped the wers.

Signed-off-by: Vahid <[email protected]>

* dropped the wers.

Signed-off-by: Vahid <[email protected]>

* dropped new models and reverted to old versions.

Signed-off-by: Vahid <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Adding neural rescorer and its documentations (#2287)

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* Added initial neural rescorer scripts.

Signed-off-by: Vahid <[email protected]>

* added more docs, figures, and output file.

Signed-off-by: Vahid <[email protected]>

* added more docs, figures, and output file.

Signed-off-by: Vahid <[email protected]>

* added more docs, figures, and output file.

Signed-off-by: Vahid <[email protected]>

* added more docs, figures, and output file.

Signed-off-by: Vahid <[email protected]>

* added more docs, figures, and output file.

Signed-off-by: Vahid <[email protected]>

* fixed style

Signed-off-by: Vahid <[email protected]>

* add a note to asr notebook.

Signed-off-by: Vahid <[email protected]>

* add a note to asr notebook.

Signed-off-by: Vahid <[email protected]>

* add a note to asr notebook.

Signed-off-by: Vahid <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Adjust warning messages

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* Revert "Adjust warning messages"

This reverts commit df046ec55754d0136a2a28451435068f32409f30.

Signed-off-by: Micha Livne <[email protected]>

* Adjust warning messages (#2294)

Signed-off-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Adding new Models releases on NGC. (#2295)

* added new models.

Signed-off-by: Vahid <[email protected]>

* added tests for asr lm.

Signed-off-by: Vahid <[email protected]>

* added tests for asr lm.

Signed-off-by: Vahid <[email protected]>

* dropped the test.

Signed-off-by: Vahid <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* update quantization (#2298)

Signed-off-by: slyned <[email protected]>

Co-authored-by: slyned <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* ASR improvements (#2293)

* Update numba messages and citrinet configs

Signed-off-by: smajumdar <[email protected]>

* Remove support for weight init scale and hidden hidden bias scale for layer normalized lstm

Signed-off-by: smajumdar <[email protected]>

* Add support for multiple filetypes in tarred datasets, correct rnn LN-lstm inputs, fix OmegaConf compat issue

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Time quarter to (#2292)

* fix comments

Signed-off-by: Yang Zhang <[email protected]>

* fix doc string

Signed-off-by: Yang Zhang <[email protected]>

* adding quarter to to time class

Signed-off-by: Yang Zhang <[email protected]>

* fix typo

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* fixed paths. (#2301)

Signed-off-by: Vahid <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Added onnxruntime check of exported ONNX, bumped up default ONNX opset (#2278)

* Added onnxruntime check of exported ONNX, bumped up default ONNX opset

Signed-off-by: Boris Fomitchev <[email protected]>

* Made TS export to accept ONNX-style input example, removed unused param to export

Signed-off-by: Boris Fomitchev <[email protected]>

* check_trace default made False

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixed for updated export signature

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update readmes

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* update readme

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* update readme

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* fix docs table

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* Add support for Numba CUDA optimized SpecAugment (#2269)

* Initial implementation

Signed-off-by: smajumdar <[email protected]>

* Initial implementation

Signed-off-by: smajumdar <[email protected]>

* Finish initial implementation of numba spec augment

Signed-off-by: smajumdar <[email protected]>

* Correct mask propagation

Signed-off-by: smajumdar <[email protected]>

* Parallelize kernel over batch instead of over masks

Signed-off-by: smajumdar <[email protected]>

* Finish tests and update to signature of spectrogramaugmentation calls

Signed-off-by: smajumdar <[email protected]>

* Finish tests and update to signature of spectrogramaugmentation calls

Signed-off-by: smajumdar <[email protected]>

* Add header

Signed-off-by: smajumdar <[email protected]>

* Fix style

Signed-off-by: smajumdar <[email protected]>

* Add heuristics

Signed-off-by: smajumdar <[email protected]>

* Correct inclusive range of padding

Signed-off-by: smajumdar <[email protected]>

* Correct typing for spec aug numba

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Added JSON manifest's support to transcribe_speech.py (#2304)

* Added JSON manifest's support to transcribe_speech.py

Signed-off-by: Vitaly Lavrukhin <[email protected]>

* Dropped unused import

Signed-off-by: Vitaly Lavrukhin <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* get embedding for a single file (#2310)

* get embedding for a single file

Signed-off-by: nithinraok <[email protected]>

* fixes

Signed-off-by: nithinraok <[email protected]>

* sr update

Signed-off-by: nithinraok <[email protected]>

* regain train mode

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Update FastPitch (#2249)

* wip

Signed-off-by: Jason <[email protected]>

* c1

Signed-off-by: Jason <[email protected]>

* bug fixes

Signed-off-by: Jason <[email protected]>

* bug fixes

Signed-off-by: Jason <[email protected]>

* v2

Signed-off-by: Jason <[email protected]>

* changes

Signed-off-by: Jason <[email protected]>

* add types, old model working

Signed-off-by: Jason <[email protected]>

* pitch

Signed-off-by: Jason <[email protected]>

* update

Signed-off-by: Jason <[email protected]>

* update

Signed-off-by: Jason <[email protected]>

* let it work

Signed-off-by: Jason <[email protected]>

* fixes

Signed-off-by: Jason <[email protected]>

* add oktai comments

Signed-off-by: Jason <[email protected]>

* debug

Signed-off-by: Jason <[email protected]>

* scale

Signed-off-by: Jason <[email protected]>

* wip

Signed-off-by: Jason <[email protected]>

* fix test for v1

Signed-off-by: Jason <[email protected]>

* merge train and val

Signed-off-by: Jason <[email protected]>

* back to par bin att, add correct encoder settings

Signed-off-by: Jason <[email protected]>

* try

Signed-off-by: Jason <[email protected]>

* undo

Signed-off-by: Jason <[email protected]>

* lgtm:

Signed-off-by: Jason <[email protected]>

* style

Signed-off-by: Jason <[email protected]>

* default to ljs

Signed-off-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* patch quantization (#2314)

* update quantization

Signed-off-by: slyned <[email protected]>

* update quant infer trt

Signed-off-by: slyned <[email protected]>

* fix style

Signed-off-by: slyned <[email protected]>

Co-authored-by: slyned <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Pin OmegaConf version for 1.0.0 (#2316)

* Update OmegaConf compatibility

Signed-off-by: smajumdar <[email protected]>

* Correct OmegaConf.pretty()

Signed-off-by: smajumdar <[email protected]>

* Upper bound omegaconf

Signed-off-by: smajumdar <[email protected]>

* Revert "Correct OmegaConf.pretty()"

This reverts commit 6ebae2ef

Signed-off-by: smajumdar <[email protected]>

* Revert "Update OmegaConf compatibility"

This reverts commit 83b2cf35a07a742552082e80e6ca34c9b8203cbc.

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* [BUGFIX] OmegaConf forward compatibility (#2319)

* Update OmegaConf compatibility

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: ericharper <[email protected]>

* Correct OmegaConf.pretty()

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: ericharper <[email protected]>

* upper bound omegaconf

Signed-off-by: ericharper <[email protected]>

* add if,else back

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* bumping version to 1.0.1

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

* fix_cluster_small_sample (#2303)

* fix_cluster_small_sample

Signed-off-by: nithinraok <[email protected]>

* for smaller samples

Signed-off-by: nithinraok <[email protected]>

* remove type

Signed-off-by: nithinraok <[email protected]>

* similarity matrix

Signed-off-by: nithinraok <[email protected]>

* est num of speakers add

Signed-off-by: nithinraok <[email protected]>

* comment update

Signed-off-by: nithinraok <[email protected]>

* style fix

Signed-off-by: nithinraok <[email protected]>

* MIN_SAMPLES passed through func arg

Signed-off-by: nithinraok <[email protected]>

* doc string update

Signed-off-by: nithinraok <[email protected]>

* spell mistake

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

* Fastpitch export (#2300)

* wip

Signed-off-by: Jason <[email protected]>

* c1

Signed-off-by: Jason <[email protected]>

* bug fixes

Signed-off-by: Jason <[email protected]>

* bug fixes

Signed-off-by: Jason <[email protected]>

* v2

Signed-off-by: Jason <[email protected]>

* changes

Signed-off-by: Jason <[email protected]>

* add types, old model working

Signed-off-by: Jason <[email protected]>

* pitch

Signed-off-by: Jason <[email protected]>

* update

Signed-off-by: Jason <[email protected]>

* update

Signed-off-by: Jason <[email protected]>

* let it work

Signed-off-by: Jason <[email protected]>

* fixes

Signed-off-by: Jason <[email protected]>

* add oktai comments

Signed-off-by: Jason <[email protected]>

* debug

Signed-off-by: Jason <[email protected]>

* scale

Signed-off-by: Jason <[email protected]>

* wip

Signed-off-by: Jason <[email protected]>

* fix test for v1

Signed-off-by: Jason <[email protected]>
…