NMT bottleneck #2390

michalivne · 2021-06-22T23:29:23Z

This PR adds a bottleneck architecture to NMT models, along with support for training VAE and MIM latent variable models.

Summary

The bottleneck class (MTBottleneckModel) supports two architectures:

No bottleneck: model_type=seq2seq, the usual NMT model.
Fixed-size bottleneck: model_type in [seq2seq-br, seq2seq-mim, seq2seq-vae], where the output of the encoder is projected to a fixed number of steps.

The projection to a fixed number of steps is based on the paper https://arxiv.org/pdf/1703.03130.pdf.
The idea is to use K attention heads to compute K weighted-average hidden states, projecting a variable number of steps into K steps.
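As a rough illustration of the projection described above, the sketch below follows the structured self-attention formulation of the cited paper: each of the K heads scores every encoder step, the scores are normalized over the time axis, and the head's output is the weighted average of the hidden states. This is not the AttentionBridge code from this PR; the function name attention_bridge, the weight shapes, and the toy sizes are assumptions made for the example.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_bridge(hidden, w1, w2):
    """Project T hidden states (T x D) to K states (K x D).

    hidden: list of T vectors of size D (encoder output for one sentence)
    w1:     INNER x D matrix (hypothetical first projection)
    w2:     K x INNER matrix (one row per attention head)
    """
    dim = len(hidden[0])
    # tanh(W1 h_t) for every time step t, giving a T x INNER matrix.
    inner = [[math.tanh(sum(w * x for w, x in zip(row, h))) for row in w1] for h in hidden]
    bridged = []
    for head in w2:
        # One scalar score per time step, normalized over the T steps.
        scores = [sum(w * z for w, z in zip(head, z_t)) for z_t in inner]
        weights = softmax(scores)
        # Weighted average of the hidden states for this head.
        bridged.append([sum(a * h[d] for a, h in zip(weights, hidden)) for d in range(dim)])
    return bridged


random.seed(0)
T, D, INNER, K = 7, 4, 8, 3  # toy sizes; K plays the role of att_bridge_k
hidden = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
w1 = [[random.gauss(0, 1) for _ in range(D)] for _ in range(INNER)]
w2 = [[random.gauss(0, 1) for _ in range(INNER)] for _ in range(K)]

out = attention_bridge(hidden, w1, w2)
print(len(out), len(out[0]))  # K steps, each of size D
```

Whatever the input length T, the output always has K rows, which is what lets the decoder attend to a fixed-size representation.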

The bottleneck variants offer different losses:

The loss of seq2seq-br is reconstruction only (like seq2seq).
The loss of seq2seq-mim is reconstruction + latent entropy minimization (see MIM, https://arxiv.org/pdf/2003.02645.pdf).
The loss of seq2seq-vae is reconstruction + KL divergence regularization (see https://arxiv.org/pdf/1312.6114.pdf).
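For reference, the regularizer in the cited VAE paper is the KL divergence between a diagonal Gaussian posterior and a standard normal prior, which has a closed form. The sketch below computes it for a single latent vector. The function name gaussian_kl and the list-based inputs are assumptions made for illustration; they are not taken from MTBottleneckModel.

```python
import math


def gaussian_kl(mean, logv):
    """KL( N(mean, exp(logv)) || N(0, I) ) for a diagonal Gaussian.

    mean, logv: lists of equal length (per-dimension mean and log-variance).
    """
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mean, logv))


# A posterior equal to the prior has zero KL.
print(gaussian_kl([0.0, 0.0], [0.0, 0.0]))
# Moving the mean away from zero increases the penalty.
print(gaussian_kl([1.0, -1.0], [0.0, 0.0]))
```

In a VAE-style objective this term would be added to the reconstruction loss, typically scaled by an annealing weight during warm-up.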

YAML Configuration

The following configurations were added to the YAML config:

model:
  model_type: 'seq2seq-br' # supports seq2seq, seq2seq-br, seq2seq-mim, seq2seq-vae (see description above)
  min_logv: -8 # minimal allowed logv for seq2seq-mim
  ortho_loss_coef: 0.0 # orthogonality coefficient for attention bridge
  att_bridge_size: 512 # dimension of a step in attention bridge
  att_bridge_k: 16 # fixed number of steps in attention bridge
  att_bridge_inner_size: 1024 # feedforward size in attention bridge
  non_recon_warmup_batches: 200000 # warm-up steps for seq2seq-mim, seq2seq-vae
  recon_per_token: true # when false reconstruction is computed per sample, not per token

ortho_loss_coef - weight of the orthogonality loss term for the attention bridge
non_recon_warmup_batches - anneals the KL divergence term (for VAE) or latent entropy term (for MIM)
recon_per_token - if false, loss is computed per sample (i.e., summed over all tokens), otherwise loss is averaged per token (the default behaviour).
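A minimal sketch of how the last two options could behave, under stated assumptions: the warm-up is modeled as a linear ramp from 0 to 1 over non_recon_warmup_batches (the actual schedule is not described here), and the per-sample variant averages the per-sentence sums over the batch. The function names warmup_coef and reconstruction_loss are hypothetical.

```python
def warmup_coef(batch_idx, non_recon_warmup_batches):
    """Weight of the non-reconstruction term; the linear ramp is an assumption."""
    if non_recon_warmup_batches <= 0:
        return 1.0
    return min(1.0, batch_idx / non_recon_warmup_batches)


def reconstruction_loss(token_nll, recon_per_token=True):
    """Reduce per-token negative log-likelihoods over a batch.

    token_nll: list of sentences, each a list of per-token NLL values.
    """
    total = sum(sum(sent) for sent in token_nll)
    if recon_per_token:
        # Average over all tokens in the batch.
        return total / sum(len(sent) for sent in token_nll)
    # Sum over the tokens of each sentence, then average over sentences.
    return total / len(token_nll)


batch = [[1.0, 2.0, 3.0], [4.0]]
print(reconstruction_loss(batch, recon_per_token=True))   # 10 / 4 tokens
print(reconstruction_loss(batch, recon_per_token=False))  # 10 / 2 sentences
print(warmup_coef(50_000, 200_000))
```

With recon_per_token=false, long sentences contribute more to the reconstruction term, which changes its scale relative to the latent term.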

Usage

The example below trains a seq2seq-br model with a 32-step bottleneck (att_bridge_k=32).

NOTE: max_generation_delta must be big enough to allow the generation of the longest sequence given the chosen bottleneck size.

python -- examples/nlp/machine_translation/enc_dec_nmt-bottleneck.py \
      --config-name=aayn_bottleneck \
      ...
      model.max_generation_delta=256 \
      ...
      model.model_type=seq2seq-br \
      model.att_bridge_size=1024 \
      model.ortho_loss_coef=0.0 \
      model.att_bridge_k=32 \
      model.att_bridge_inner_size=1024 \
      model.recon_per_token=true \
      model.non_recon_warmup_batches=150000 \
      ...

Additional Info

The attention bridge class (AttentionBridge) was added to nemo/collections/nlp/modules/common/transformer/transformer_modules.py.
The bottleneck class (MTBottleneckModel) supports logging of various loss terms.

* move do_training flag to config Signed-off-by: Yang Zhang <[email protected]> * added telephone to itn Signed-off-by: Yang Zhang <[email protected]> * add telephone and email to itn Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]>

* Preserve the tokenizer config for ASR Signed-off-by: smajumdar <[email protected]> * Correct nlp docs Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]>

NVIDIA#2144) * Removing graphsurgeon optional dependency, improving import error reporting Signed-off-by: Boris Fomitchev <[email protected]> * Fixing scope error Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Micha Livne <[email protected]>

Signed-off-by: PiotrDabkowski <[email protected]> Signed-off-by: Micha Livne <[email protected]>

* removed graphutils, suppletive, data_loader_utils from itn to be reused from tn Signed-off-by: Yang Zhang <[email protected]> * inheriting itn from tn, thus removing redundancy Signed-off-by: Yang Zhang <[email protected]> * cleaned whitelist Signed-off-by: Yang Zhang <[email protected]> * lgtm fix Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]>

* Update how artifacts work Signed-off-by: Oleksii Kuchaiev <[email protected]> * fixing some tests Signed-off-by: Oleksii Kuchaiev <[email protected]> * fix more tests Signed-off-by: Oleksii Kuchaiev <[email protected]> * add __init__ to tests to make them discoverable Signed-off-by: Oleksii Kuchaiev <[email protected]> * empty src support Signed-off-by: Oleksii Kuchaiev <[email protected]> * updates plust unittest Signed-off-by: Oleksii Kuchaiev <[email protected]> * add copyright check Signed-off-by: Oleksii Kuchaiev <[email protected]> * copyright header Signed-off-by: Oleksii Kuchaiev <[email protected]> * fix style Signed-off-by: Oleksii Kuchaiev <[email protected]> * handle hashed megatron checkpoint version in nlp restore_from Signed-off-by: ericharper <[email protected]> * add _MODEL_RESTORE_PATH to AppState Signed-off-by: ericharper <[email protected]> * get rid of global folder caching Signed-off-by: Oleksii Kuchaiev <[email protected]> * double register - warning instead of exception Signed-off-by: Oleksii Kuchaiev <[email protected]> * Add asr spe tests Signed-off-by: smajumdar <[email protected]> * Pop out asr wpe pre-registered value Signed-off-by: smajumdar <[email protected]> * Correct ASR tests and paths Signed-off-by: smajumdar <[email protected]> * Correct tokenizer saving Signed-off-by: smajumdar <[email protected]> * Correct ASR tests Signed-off-by: smajumdar <[email protected]> * Correct ASR bpe mixin Signed-off-by: smajumdar <[email protected]> * Patch up backward compatibility Signed-off-by: smajumdar <[email protected]> * update register_bert_model Signed-off-by: ericharper <[email protected]> * update all get_lm_model calls Signed-off-by: ericharper <[email protected]> * return None if src not found Signed-off-by: ericharper <[email protected]> * handle case with no tokenizer Signed-off-by: ericharper <[email protected]> * do not add another hash is using tarfile_artifacts Signed-off-by: ericharper <[email protected]> * add return_none 
flag, update doc string Signed-off-by: ericharper <[email protected]> * update default behavior of register_artifact for NLPModel Signed-off-by: ericharper <[email protected]> * change kwarg name to verify_src_exists Signed-off-by: ericharper <[email protected]> * use cfg instead of _cfg Signed-off-by: Oleksii Kuchaiev <[email protected]> * some cleanups Signed-off-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]>

* fixed branch in IR tutorial Signed-off-by: AlexGrinch <[email protected]> * bucketing tarred dataset for lm training Signed-off-by: AlexGrinch <[email protected]> * updated global rank Signed-off-by: AlexGrinch <[email protected]> * perplexity update Signed-off-by: AlexGrinch <[email protected]> * refactor lm to be campatible with latest nmt Signed-off-by: AlexGrinch <[email protected]> * perplexity change Signed-off-by: AlexGrinch <[email protected]> * removed obsolete config Signed-off-by: AlexGrinch <[email protected]> * added sequence perplexity Signed-off-by: AlexGrinch <[email protected]> * added non-smoothed CE loss for validation Signed-off-by: AlexGrinch <[email protected]> * unified sentence dataset, torchmetrics for sequence perplexity Signed-off-by: AlexGrinch <[email protected]> * translate_ddp refactor Signed-off-by: AlexGrinch <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Micha Livne <[email protected]>

* rename dl index 0 loss and sacrebleu for backwards compatibility Signed-off-by: ericharper <[email protected]> * eval -> val/tst Signed-off-by: ericharper <[email protected]> * instantiate torchmetrics after instantiating dataloaders Signed-off-by: ericharper <[email protected]> * bug Signed-off-by: ericharper <[email protected]> * remove debugging log Signed-off-by: ericharper <[email protected]> * remove debugging log Signed-off-by: ericharper <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]>

Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]>

Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]>

* fix for electronic Signed-off-by: ekmb <[email protected]> * special symbols added Signed-off-by: ekmb <[email protected]> * restrict symbols list Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]>

* Add FS2 data loading test Signed-off-by: Jocelyn Huang <[email protected]> * TTS docs update for FastSpeech 2 Signed-off-by: Jocelyn Huang <[email protected]> * Style fix for FS2 dataset test Signed-off-by: Jocelyn Huang <[email protected]> * Fix transpose typo Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]>

* Patch for backtranslation in lm dataset Signed-off-by: MaximumEntropy <[email protected]> * One more fix Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: Micha Livne <[email protected]>

* Started adding SAP dataset Signed-off-by: Virginia Adams <[email protected]> * Delete .lm_bert_dataset.py.swp Signed-off-by: Virginia Adams <[email protected]> * Added dataset and loss Signed-off-by: Virginia Adams <[email protected]> * Added entity linking encoder model Signed-off-by: Virginia Adams <[email protected]> * Can build and use index from pubmedbert model Signed-off-by: Virginia Adams <[email protected]> * checked boolean logic in build_index.py Signed-off-by: Virginia Adams <[email protected]> * End to end tested all functionality Signed-off-by: Virginia Adams <[email protected]> * fixed val loss none at end of validation Signed-off-by: Virginia Adams <[email protected]> * Started adding demo entity linking notebook Signed-off-by: Virginia Adams <[email protected]> * adding in notebook demo Signed-off-by: Virginia Adams <[email protected]> * added call to entitylinking classes in __init__.py files Signed-off-by: Virginia Adams <[email protected]> * Added eval code to notebook Signed-off-by: Virginia Adams <[email protected]> * Adding unfinished notebook Signed-off-by: Virginia Adams <[email protected]> * Cleaned up example dir Signed-off-by: Virginia Adams <[email protected]> * Fixed recap commands Signed-off-by: Virginia Adams <[email protected]> * added model typing and tiny data tar Signed-off-by: Virginia Adams <[email protected]> * Adding tiny data zip Signed-off-by: Virginia Adams <[email protected]> * updated tiny example config data path Signed-off-by: Virginia Adams <[email protected]> * Notebook demo works Signed-off-by: Virginia Adams <[email protected]> * Changed training epochs Signed-off-by: Virginia Adams <[email protected]> * Removed output from training and install cells Signed-off-by: Virginia Adams <[email protected]> * changed code formatting Signed-off-by: Virginia Adams <[email protected]> * Started doc string for new functions Signed-off-by: Virginia Adams <[email protected]> * Updated data_preprocessing to save to data_dir 
Signed-off-by: Virginia Adams <[email protected]> * fixed comment in notebook demo Signed-off-by: Virginia Adams <[email protected]> * Update data_preprocessing.py Signed-off-by: Virginia Adams <[email protected]> * updated nemo typing imports Signed-off-by: Virginia Adams <[email protected]> * about to rebase Signed-off-by: Virginia Adams <[email protected]> * added back umls_dataset_processing.py Signed-off-by: Virginia Adams <[email protected]> * Removed example data Signed-off-by: Virginia Adams <[email protected]> * Fixed typos in notebook demo Signed-off-by: Virginia Adams <[email protected]> * fixed lgtm-com issues Signed-off-by: Virginia Adams <[email protected]> * added copyright headers Signed-off-by: Virginia Adams <[email protected]> * fixed import and copyright headers Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting changes 2 Signed-off-by: Virginia Adams <[email protected]> * fixed test formatting Signed-off-by: Virginia Adams <[email protected]> * Added __init__.py for model and dataset Signed-off-by: Virginia Adams <[email protected]> * loading newline file returns data_dir now Signed-off-by: Virginia Adams <[email protected]> * Removed conf notebook and deleted comment Signed-off-by: Virginia Adams <[email protected]> * Added jenkins test Signed-off-by: Virginia Adams <[email protected]> * Updated Jenkins test Signed-off-by: Virginia Adams <[email protected]> * fixed file path Signed-off-by: Virginia Adams <[email protected]> * Changed Jenkins pipeline order Signed-off-by: Virginia Adams <[email protected]> * Fixed Jenkins datapath... again... 
Signed-off-by: Virginia Adams <[email protected]> * Made most review changes Signed-off-by: Virginia Adams <[email protected]> * fixed copy right Signed-off-by: Virginia Adams <[email protected]> * updated unit test to wget config Signed-off-by: Virginia Adams <[email protected]> * reverted test file back Signed-off-by: Virginia Adams <[email protected]> * Added project dir to jenkins test Signed-off-by: Virginia Adams <[email protected]> * defined config in unit test Signed-off-by: Virginia Adams <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]>

* Correct branch version Signed-off-by: smajumdar <[email protected]> * Correct Jenkinsfile Signed-off-by: smajumdar <[email protected]> * Update rst files Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]>

Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]>

* switch CI back to main Signed-off-by: Oleksii Kuchaiev <[email protected]> * Make Hifigan jittable Signed-off-by: Ryan Leary <[email protected]> * Remove vestigial debugging printout Signed-off-by: Ryan Leary <[email protected]> * Add export forward and fix style Signed-off-by: Ryan Leary <[email protected]> * Fix load_state_dict override for arbitrary layers Signed-off-by: Ryan Leary <[email protected]> Co-authored-by: Jocelyn <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: vadam5 <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Ryan Leary <[email protected]> Signed-off-by: Micha Livne <[email protected]>

Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]>

* notebook size reduced Signed-off-by: ekmb <[email protected]> * notebook size reduced Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]>

* update spectral clustering method Signed-off-by: nithinraok <[email protected]> * update Jenkins File Signed-off-by: nithinraok <[email protected]> * threshold fix by reducing window length for shorter embs Signed-off-by: nithinraok <[email protected]> * grammar fixes Signed-off-by: nithinraok <[email protected]> * CR update Signed-off-by: nithinraok <[email protected]> * paper reference Signed-off-by: nithinraok <[email protected]> * improve docstring for yaml Signed-off-by: nithinraok <[email protected]> * Doc fixes Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]>

Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]>

Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]>

* artifacts update Signed-off-by: ekmb <[email protected]> * artifacts update Signed-off-by: ekmb <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * fix for model restoration Signed-off-by: ekmb <[email protected]> * typos fix + jenkins dir update Signed-off-by: ekmb <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * update jenkins branch Signed-off-by: ericharper <[email protected]> * add && Signed-off-by: ericharper <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * jenkins disable Signed-off-by: ekmb <[email protected]> * revert jenkins Signed-off-by: ekmb <[email protected]> * jenkins disable Signed-off-by: ekmb <[email protected]> * revert jenkins Signed-off-by: ekmb <[email protected]> Co-authored-by: ericharper <[email protected]> Signed-off-by: Micha Livne <[email protected]>

* Initial attempt at always_save_nemo fix Signed-off-by: MaximumEntropy <[email protected]> * updated path before saving in exp manager, fixed bug when handling tarfile artifacts Signed-off-by: ericharper <[email protected]> * Add test with always_save_nemo to exp_manager Signed-off-by: MaximumEntropy <[email protected]> * Style fixes Signed-off-by: MaximumEntropy <[email protected]> * update jenkins branch Signed-off-by: ericharper <[email protected]> * check for nemo: Signed-off-by: ericharper <[email protected]> * check for nemo: Signed-off-by: ericharper <[email protected]> * check for nemo: Signed-off-by: ericharper <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]>

Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]>

* Limit Pytorch lightning release Signed-off-by: smajumdar <[email protected]> * Add final two checks Signed-off-by: smajumdar <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]>

* squash Signed-off-by: Jason <[email protected]> * notebook fixes Signed-off-by: Jason <[email protected]> * notebook fixes Signed-off-by: Jason <[email protected]> * typos Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]>

lgtm-com · 2021-07-12T22:56:22Z

This pull request introduces 12 alerts when merging d3c655b into 44a3d02 - view on LGTM.com

new alerts:

12 for Unused import

lgtm-com · 2021-07-12T23:26:48Z

This pull request introduces 12 alerts when merging ec1303e into 44a3d02 - view on LGTM.com

new alerts:

12 for Unused import

lgtm-com · 2021-07-12T23:54:56Z

This pull request introduces 12 alerts when merging 78382bc into 44a3d02 - view on LGTM.com

new alerts:

12 for Unused import

ericharper · 2021-07-13T15:30:24Z

@michalivne, could you add usage instructions to the README?

ericharper · 2021-07-13T15:31:01Z

Please remove the empty file get_wk2.sh

nemo/collections/nlp/models/machine_translation/mt_enc_dec_model.py

lgtm-com · 2021-07-13T20:10:05Z

This pull request introduces 12 alerts when merging b91349b into 5363b49 - view on LGTM.com

new alerts:

12 for Unused import

ericharper

LGTM. Thanks!

lgtm-com · 2021-07-13T20:45:21Z

This pull request introduces 12 alerts when merging 14395e3 into 5363b49 - view on LGTM.com

new alerts:

12 for Unused import

MaximumEntropy

Thanks for making all of the changes :)

lgtm-com · 2021-07-13T22:39:12Z

This pull request introduces 12 alerts when merging 85449f3 into 5363b49 - view on LGTM.com

new alerts:

12 for Unused import

lgtm-com · 2021-07-14T13:30:41Z

This pull request introduces 12 alerts when merging efa5d3f into 6ebbcb8 - view on LGTM.com

new alerts:

12 for Unused import

lgtm-com · 2021-07-14T16:42:05Z

This pull request introduces 12 alerts when merging d95f97f into ed08545 - view on LGTM.com

new alerts:

12 for Unused import

<[email protected]> * Updated data_preprocessing to save to data_dir Signed-off-by: Virginia Adams <[email protected]> * fixed comment in notebook demo Signed-off-by: Virginia Adams <[email protected]> * Update data_preprocessing.py Signed-off-by: Virginia Adams <[email protected]> * updated nemo typing imports Signed-off-by: Virginia Adams <[email protected]> * about to rebase Signed-off-by: Virginia Adams <[email protected]> * added back umls_dataset_processing.py Signed-off-by: Virginia Adams <[email protected]> * Removed example data Signed-off-by: Virginia Adams <[email protected]> * Fixed typos in notebook demo Signed-off-by: Virginia Adams <[email protected]> * fixed lgtm-com issues Signed-off-by: Virginia Adams <[email protected]> * added copyright headers Signed-off-by: Virginia Adams <[email protected]> * fixed import and copyright headers Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting Signed-off-by: Virginia Adams <[email protected]> * Fixed formatting changes 2 Signed-off-by: Virginia Adams <[email protected]> * fixed test formatting Signed-off-by: Virginia Adams <[email protected]> * Added __init__.py for model and dataset Signed-off-by: Virginia Adams <[email protected]> * loading newline file returns data_dir now Signed-off-by: Virginia Adams <[email protected]> * Removed conf notebook and deleted comment Signed-off-by: Virginia Adams <[email protected]> * Added jenkins test Signed-off-by: Virginia Adams <[email protected]> * Updated Jenkins test Signed-off-by: Virginia Adams <[email protected]> * fixed file path Signed-off-by: Virginia Adams <[email protected]> * Changed Jenkins pipeline order Signed-off-by: Virginia Adams <[email protected]> * Fixed Jenkins datapath... again... 
Signed-off-by: Virginia Adams <[email protected]> * Made most review changes Signed-off-by: Virginia Adams <[email protected]> * fixed copy right Signed-off-by: Virginia Adams <[email protected]> * updated unit test to wget config Signed-off-by: Virginia Adams <[email protected]> * reverted test file back Signed-off-by: Virginia Adams <[email protected]> * Added project dir to jenkins test Signed-off-by: Virginia Adams <[email protected]> * defined config in unit test Signed-off-by: Virginia Adams <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Correct branch version for v1.0.0 (#2157) * Correct branch version Signed-off-by: smajumdar <[email protected]> * Correct Jenkinsfile Signed-off-by: smajumdar <[email protected]> * Update rst files Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * switch CI back to main Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed the docs. 
(#2156) Signed-off-by: Micha Livne <[email protected]> * Make Hifigan jittable (#2159) * Make Hifigan jittable Signed-off-by: Ryan Leary <[email protected]> * Remove vestigial debugging printout Signed-off-by: Ryan Leary <[email protected]> * Add export forward and fix style Signed-off-by: Ryan Leary <[email protected]> * Fix load_state_dict override for arbitrary layers Signed-off-by: Ryan Leary <[email protected]> Co-authored-by: Jocelyn <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> Co-authored-by: vadam5 <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Ryan Leary <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix version (#2162) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Megatron nb size reduced (#2163) * notebook size reduced Signed-off-by: ekmb <[email protected]> * notebook size reduced Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update 
spectral clustering method (#2158) * update spectral clustering method Signed-off-by: nithinraok <[email protected]> * update Jenkins File Signed-off-by: nithinraok <[email protected]> * threshold fix by reducing window length for shorter embs Signed-off-by: nithinraok <[email protected]> * grammar fixes Signed-off-by: nithinraok <[email protected]> * CR update Signed-off-by: nithinraok <[email protected]> * paper reference Signed-off-by: nithinraok <[email protected]> * improve docstring for yaml Signed-off-by: nithinraok <[email protected]> * Doc fixes Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * revert (#2167) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Limit Pytorch lightning release (#2170) Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * token classification models artifacts update (#2169) * artifacts update Signed-off-by: ekmb <[email protected]> * artifacts update Signed-off-by: ekmb <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * fix for model restoration Signed-off-by: ekmb <[email protected]> * typos fix + jenkins dir update Signed-off-by: ekmb <[email protected]> * typo Signed-off-by: ericharper <[email protected]> * update jenkins branch Signed-off-by: ericharper <[email protected]> * add && Signed-off-by: ericharper <[email protected]> * jenkins Signed-off-by: ekmb <[email protected]> * jenkins disable Signed-off-by: ekmb <[email protected]> * revert jenkins Signed-off-by: ekmb <[email protected]> * jenkins disable Signed-off-by: ekmb <[email protected]> * revert jenkins Signed-off-by: ekmb <[email protected]> Co-authored-by: ericharper <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix to always_save_nemo (#2174) * Initial attempt at always_save_nemo fix Signed-off-by: MaximumEntropy <[email protected]> * updated path before saving in exp manager, fixed bug when 
handling tarfile artifacts Signed-off-by: ericharper <[email protected]> * Add test with always_save_nemo to exp_manager Signed-off-by: MaximumEntropy <[email protected]> * Style fixes Signed-off-by: MaximumEntropy <[email protected]> * update jenkins branch Signed-off-by: ericharper <[email protected]> * check for nemo: Signed-off-by: ericharper <[email protected]> * check for nemo: Signed-off-by: ericharper <[email protected]> * check for nemo: Signed-off-by: ericharper <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix typo (#2179) Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Make itn tests optional (#2173) * Limit Pytorch lightning release Signed-off-by: smajumdar <[email protected]> * Add final two checks Signed-off-by: smajumdar <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * First Revision of TTS Docs and Notebooks Update for 1.0 (#2166) * squash Signed-off-by: Jason <[email protected]> * notebook fixes Signed-off-by: Jason <[email protected]> * notebook fixes Signed-off-by: Jason <[email protected]> * typos Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * add more alternatives of 0 for telephone (#2171) Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Acc tn (#2180) * make tn cardinal faster Signed-off-by: Yang Zhang <[email protected]> * add number far Signed-off-by: Yang Zhang <[email protected]> * add test Signed-off-by: Yang Zhang <[email protected]> * fix lgtm Signed-off-by: Yang Zhang <[email protected]> * fix lgtm Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [DOCS] NLP Model parallel, NMT multi-val, CORE register artifacts (#2168) * update docs 
Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> * update docs Signed-off-by: ericharper <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Change label smoothing prob to reduce chance of test failure (#2184) Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add FS2 checkpoint links to docs and inference notebook (#2181) * Add FS2 checkpoint links to docs and inference notebook Signed-off-by: Jocelyn Huang <[email protected]> * Remove empty cell from TTS notebook Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update ptl to 1.3 on main branch (#2178) * Update PTL Signed-off-by: smajumdar <[email protected]> * Begin update to Pytorch Lightning 1.3.x Signed-off-by: smajumdar <[email protected]> * Formatting Signed-off-by: smajumdar <[email protected]> * style Signed-off-by: ericharper <[email protected]> * Formatting Signed-off-by: smajumdar <[email protected]> * minor fix Signed-off-by: Jason <[email protected]> * minor fix Signed-off-by: Jason <[email protected]> * get testing attribute from trainer Signed-off-by: ericharper <[email protected]> * update init_ddp_connection override Signed-off-by: ericharper 
<[email protected]> * update attribute Signed-off-by: ericharper <[email protected]> * add barrier after load checkpoint in megatron Signed-off-by: ericharper <[email protected]> * remove barrier Signed-off-by: ericharper <[email protected]> * update last naming Signed-off-by: Jason <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * SDE updates (#2187) * Added updates to SDE: - support for external vocabulary (to detect OOV words) - support for offset field (for segmented long recordings) - UI improvements Signed-off-by: Vitaly Lavrukhin <[email protected]> * Refactored diff in SDE Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add TTS aligner and improved version of g2p for vocabs.Phonemes, small improvement in TalkNet (#2189) * add first version of aligner Signed-off-by: Oktai Tatanov <[email protected]> * aligner docs, new g2p version, fix bugs in talknet Signed-off-by: Oktai Tatanov <[email protected]> * update docs and remove lj related code Signed-off-by: Oktai Tatanov <[email protected]> * fix style Signed-off-by: Oktai Tatanov <[email protected]> * fix import Signed-off-by: Oktai Tatanov <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * set the default of nodessplitter to None. 
(#2190) Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * NMT fixes (#2194) * minor fixes Signed-off-by: Oleksii Kuchaiev <[email protected]> * minor bugfixes Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Store mappings file in .nemo for FS2 model (#2196) * Store mappings file in .nemo for FS2 model Signed-off-by: Jocelyn Huang <[email protected]> * Add error enforcing mappings file during training (FS2) Signed-off-by: Jocelyn Huang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add support to change the SE context window of ConvASREncoder (#2193) * Add support for changing context window on the fly Signed-off-by: smajumdar <[email protected]> * Add support to change the SE context window of ConvASREncoder Signed-off-by: smajumdar <[email protected]> * Add ability to skip config updating Signed-off-by: smajumdar <[email protected]> * Switch to mixin based API Signed-off-by: smajumdar <[email protected]> * Update docs and api for ASRModuleMixin Signed-off-by: smajumdar <[email protected]> * Change print to logging.info Signed-off-by: smajumdar <[email protected]> * Correct stride level when computing context window Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add a CI test for doing inference with an NMT model trained with Pre-LN (#2198) * Change label smoothing prob to reduce chance of test failure Signed-off-by: MaximumEntropy <[email protected]> * Add Pre-LN inference test to Jenkinsfile Signed-off-by: MaximumEntropy <[email protected]> * Separate tests for training and NMT inference Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix ipywidgets error in asr notebook (#2199) Added `ipywidgets` to avoid `ImportError: IProgress not found. Please update jupyter and ipywidgets. 
See https://ipywidgets.readthedocs.io/en/stable/user_install.html` error. Signed-off-by: Derek Chia <[email protected]> Signed-off-by: Micha Livne <[email protected]> * metrics fix (#2202) * metrics fix Signed-off-by: ekmb <[email protected]> * metrics reset for punct model Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * readme and minor improvements (#2203) * readme and minor improvements Signed-off-by: nithinraok <[email protected]> * vad threshold update Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix text processing docs (#2195) * fix text processing docs Signed-off-by: Yang Zhang <[email protected]> * fix name Signed-off-by: Yang Zhang <[email protected]> * add guard to pynini import Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix bug in SpecCutout (#2201) Signed-off-by: Robert Bracco <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix bug in SpecCutout (#2201) (#2205) Signed-off-by: Robert Bracco <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Robert Bracco <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Set seed before generating random tensors in NMT test (#2206) * Change label smoothing prob to reduce chance of test failure Signed-off-by: MaximumEntropy <[email protected]> * Set seed before generating tensors Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR patches for v1.0.0 (#2207) * Multiple updates to RNNT add initialization Signed-off-by: smajumdar <[email protected]> * Correct name of initilization Signed-off-by: 
smajumdar <[email protected]> * Update dockerignore Signed-off-by: smajumdar <[email protected]> * Fix RNNT WER calculation Signed-off-by: smajumdar <[email protected]> * Address comments Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Multilingual training for NMT (#2160) * mnmt on fresh main Signed-off-by: Abhinav Khattar <[email protected]> * push for test Signed-off-by: Abhinav Khattar <[email protected]> * debug Signed-off-by: Abhinav Khattar <[email protected]> * check Signed-off-by: Abhinav Khattar <[email protected]> * cleanup Signed-off-by: Abhinav Khattar <[email protected]> * minor fix Signed-off-by: Abhinav Khattar <[email protected]> * more minor fixes Signed-off-by: Abhinav Khattar <[email protected]> * fix for test Signed-off-by: Abhinav Khattar <[email protected]> * fix list size error Signed-off-by: Abhinav Khattar <[email protected]> * multilingual in infer Signed-off-by: Abhinav Khattar <[email protected]> * changes Signed-off-by: Abhinav Khattar <[email protected]> * tar creation with multilingual Signed-off-by: Abhinav Khattar <[email protected]> * fix Signed-off-by: Abhinav Khattar <[email protected]> * changes + parallelism + bug fix Signed-off-by: Abhinav Khattar <[email protected]> * small fix Signed-off-by: Abhinav Khattar <[email protected]> * multilingual preprocessor fix Signed-off-by: Abhinav Khattar <[email protected]> * globally unique fragment names in tarred dataset Signed-off-by: Abhinav Khattar <[email protected]> * minor changes Signed-off-by: Abhinav Khattar <[email protected]> * rm load_from_cached_dataset Signed-off-by: Abhinav Khattar <[email protected]> * minor config change Signed-off-by: Abhinav Khattar <[email protected]> * rm unsued import Signed-off-by: Abhinav Khattar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Remove memory leak from ASR notebook + update model notebook (#2213) * ASR patches for 
v1.0.0 (#2207) * Multiple updates to RNNT add initialization Signed-off-by: smajumdar <[email protected]> * Correct name of initilization Signed-off-by: smajumdar <[email protected]> * Update dockerignore Signed-off-by: smajumdar <[email protected]> * Fix RNNT WER calculation Signed-off-by: smajumdar <[email protected]> * Address comments Signed-off-by: smajumdar <[email protected]> * Correct model notebook to log the loss and correctly assign keys Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * replace names in vad tutorials (#2220) Signed-off-by: fayejf <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix the versioning name. (#2209) * fix the versioning name. Signed-off-by: Vahid <[email protected]> * Made version None. Signed-off-by: Vahid <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Enabled passing kwargs to export() (#2175) * Enabled passing kwargs to export() Signed-off-by: Boris Fomitchev <[email protected]> * Fixing style; changed Classifier input_example to new extended syntax Signed-off-by: Boris Fomitchev <[email protected]> * Fixed order of forward() call in export Signed-off-by: Boris Fomitchev <[email protected]> * Fixing style Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update g2p: ambigious ignore, flag for skipping seq2seq (#2223) Signed-off-by: Oktai Tatanov <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update TTS notebook with TalkNet inference (#2133) * Update TTS notebook with TalkNet inference. Signed-off-by: Stanislav Beliaev <[email protected]> * Update TTS Notebook. Signed-off-by: Stanislav Beliaev <[email protected]> * Update TTS TN Training Notebook. Signed-off-by: Stanislav Beliaev <[email protected]> * Fix TN paper link. 
Signed-off-by: Stanislav Beliaev <[email protected]> * Remove branch updaing TODOs. Signed-off-by: Stanislav Beliaev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update speaker notebooks (#2224) Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Support symlinked files (#2216) Signed-off-by: Anas Abou Allaban <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Set strict=True everywhere by default. (#2225) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * set strict=True in nlp_model (#2227) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * set strict=False for model parallel examples Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Make Text processing installation optional via reinstall.sh (#2226) * Make Text processing installation optional via reinstall.sh Signed-off-by: smajumdar <[email protected]> * Support both success and failure states Signed-off-by: smajumdar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Transformer final norm preln (#2197) * fix pre_ln final norm Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * bug fixed Signed-off-by: fayejf <[email protected]> * bugfix post_ln Signed-off-by: fayejf <[email protected]> * update and add pre_ln_final_norm Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * fix for unit test Signed-off-by: fayejf <[email protected]> * rename final_norm to final_layer_norm Signed-off-by: fayejf <[email protected]> * bug fix Signed-off-by: fayejf <[email protected]> * tiny fix Signed-off-by: fayejf <[email protected]> * fix and improve Signed-off-by: fayejf <[email 
protected]> * tiny fix Signed-off-by: fayejf <[email protected]> * Patch for NMT to allow loading old modlels trained with pre-LN Signed-off-by: MaximumEntropy <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update models and notebook for 1.0 (#2211) * update models Signed-off-by: Jason <[email protected]> * updates Signed-off-by: Jason <[email protected]> * fix Signed-off-by: Jason <[email protected]> * add links Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * style Signed-off-by: Jason <[email protected]> * update checkpoints Signed-off-by: Jason <[email protected]> * rename Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * lgtm Signed-off-by: Jason <[email protected]> * fix loading waveglow Signed-off-by: Jason <[email protected]> * typo Signed-off-by: Jason <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update_metrics_classification_models (#2228) Signed-off-by: nithinraok <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Data loader for seq of label model (#2084) * feature to seq label data loader Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * small fix Signed-off-by: fayejf <[email protected]> * update tl to be length of seq label Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * tiny bug fix Signed-off-by: fayejf <[email protected]> * small updates Signed-off-by: fayejf <[email protected]> * updates for review feedback Signed-off-by: fayejf <[email protected]> * style fix Signed-off-by: fayejf <[email protected]> * explain seq_label Signed-off-by: fayejf <[email protected]> * fix lgtm Signed-off-by: fayejf <[email 
protected]> * small updates Signed-off-by: fayejf <[email protected]> * improve as discussed Signed-off-by: fayejf <[email protected]> * add docstring Signed-off-by: fayejf <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix comments (#2236) * fix comments Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * add paper ref to sgdqa model doc (#2233) * add paper ref to sgdqa model doc Signed-off-by: Yang Zhang <[email protected]> * fix comments Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Move ConcatDataset to common (#2237) * move concatdataset to common Signed-off-by: Abhinav Khattar <[email protected]> * var name change Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * audio based normalization (#2231) * squash norm_audio Signed-off-by: ekmb <[email protected]> * add missing files Signed-off-by: ekmb <[email protected]> * style Signed-off-by: ekmb <[email protected]> * unit tests added, docstrings fixed Signed-off-by: ekmb <[email protected]> * fix lgtm errors Signed-off-by: ekmb <[email protected]> * debug jenkins Signed-off-by: ekmb <[email protected]> * debug jenkins Signed-off-by: ekmb <[email protected]> * signature update Signed-off-by: ekmb <[email protected]> * set deterministic default Signed-off-by: ekmb <[email protected]> * add more test cases Signed-off-by: ekmb <[email protected]> Signed-off-by: Micha Livne <[email protected]> * bug fix config (#2232) Signed-off-by: fayejf <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Alias Swish to SiLU (#2239) * Alias Swish to SiLU and move activations to inplace execution if possible Signed-off-by: smajumdar <[email protected]> * Remove unused import Signed-off-by: smajumdar 
<[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update README.rst Signed-off-by: Micha Livne <[email protected]> * Offline asr notebook bug fix (#2242) * fix Signed-off-by: fayejf <[email protected]> * install Signed-off-by: fayejf <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix docstring (#2244) * fix comments Signed-off-by: Yang Zhang <[email protected]> * fix doc string Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update "last" Checkpoint (#2241) * fix Signed-off-by: Jason <[email protected]> * change Signed-off-by: Jason <[email protected]> * fix Signed-off-by: Jason <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add pretrained model stt_es_citrinet_512 (#2247) Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [BUGFIX] Only process tarfile artifacts when model was restored from tarfile (#2250) * process tarfile artifacts only if model is being restored Signed-off-by: ericharper <[email protected]> * process tarfile artifacts only if model was restored from a tarfile Signed-off-by: ericharper <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Log average metrics for Multi-validation in NMT (#2251) * add avg metrics NMT Signed-off-by: Abhinav Khattar <[email protected]> * name change Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update Primer notebook (#2258) Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed Bug 3310780 and 3310799 (#2264) Signed-off-by: Virginia Adams <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Support multiple models being instantiated in same execution scope (#2245) * Support multiple models being instantiated in same execution scope Signed-off-by: smajumdar <[email protected]> 
* Fix tests Signed-off-by: smajumdar <[email protected]> * Add locks to methods in appstate Signed-off-by: smajumdar <[email protected]> * Perform locks only on write operations Signed-off-by: smajumdar <[email protected]> * Correct deadlock issue Signed-off-by: smajumdar <[email protected]> * Add more tests Signed-off-by: smajumdar <[email protected]> * Add test for multi save and remove patch to change save type Signed-off-by: smajumdar <[email protected]> * Update app state to preserve gidx of previous token Signed-off-by: smajumdar <[email protected]> * Correct restoration logic for tarfiles Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR Refactoring (#2240) * Refactor out the preprocessing from ASR into common Signed-off-by: smajumdar <[email protected]> * Correct nltk issue with vocabs.py for clusters Signed-off-by: smajumdar <[email protected]> * Add typing information to SpecAugment and SpecCutout Signed-off-by: smajumdar <[email protected]> * Reorganize parts directory Signed-off-by: smajumdar <[email protected]> * Refactor parts submodules, add __init__ to few important parts Signed-off-by: smajumdar <[email protected]> * Update docs for new path to parts Signed-off-by: smajumdar <[email protected]> * Cherry pick PR https://github.com/NVIDIA/NeMo/pull/2219 Signed-off-by: smajumdar <[email protected]> * Add header for preprocessing commons Signed-off-by: smajumdar <[email protected]> * Fix style of tests Signed-off-by: smajumdar <[email protected]> * Add forced update of configs for train-val-test ds to new labels tests Signed-off-by: smajumdar <[email protected]> * Update path to FilterbankFeatures for TTS Signed-off-by: smajumdar <[email protected]> * Add an alias file for backward compatibility Signed-off-by: smajumdar <[email protected]> * Add an alias file for backward compatibility Signed-off-by: smajumdar <[email protected]> * Update training scripts of ASR to support finetuning Signed-off-by: 
smajumdar <[email protected]> * Update Finetuning step to be ModelPT level Signed-off-by: smajumdar <[email protected]> * Update docs for finetuning for ASR Signed-off-by: smajumdar <[email protected]> * Fix style Signed-off-by: smajumdar <[email protected]> * Update docs and scripts with fine-tuning info Signed-off-by: smajumdar <[email protected]> * Update docs and scripts with fine-tuning info Signed-off-by: smajumdar <[email protected]> * Fix style Signed-off-by: smajumdar <[email protected]> * Update scripts Signed-off-by: smajumdar <[email protected]> * Add comment for weight initialization Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * TTS Doc Fix and Remove TTS Test (#2272) * bug fix and remove test Signed-off-by: Jason <[email protected]> * syntax Signed-off-by: Jason <[email protected]> * syntax Signed-off-by: Jason <[email protected]> * syntax Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Talknet training Fix (#2273) * TalkNet Training notebook fix. Signed-off-by: Stanislav Beliaev <[email protected]> * Remove debug stuff. 
Signed-off-by: Stanislav Beliaev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update (#2274) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add links (#2275) * update Signed-off-by: Jason <[email protected]> * link Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Delete 3_TTS_TalkNet_Training.ipynb (#2276) Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * tune down logging (#2277) * tune down logging Signed-off-by: Oleksii Kuchaiev <[email protected]> * debug message instead of removing it completely Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * minor bugfix Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * remove confusing message Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Restore TalkNet training notebook (#2281) * Restore TalkNet training notebook. Signed-off-by: Stanislav Beliaev <[email protected]> * Remove torchaudio dep. 
Signed-off-by: Stanislav Beliaev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fix ExpManager Issues and FastPitch (#2283) * backport exp_manager fixes to v1 Signed-off-by: Jason <[email protected]> * fix fastpitch Signed-off-by: Jason <[email protected]> * fix tests Signed-off-by: Jason <[email protected]> * update prefix Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Organize asr config folders (#2284) Signed-off-by: Micha Livne <[email protected]> * Fix and enable DALI tests (#2077) * Fix and enable DALI tests Signed-off-by: Joaquin Anton <[email protected]> * remove unused import Signed-off-by: Joaquin Anton <[email protected]> * Move DALI tests to a separate Jenkins stage Signed-off-by: Joaquin Anton <[email protected]> * Remove DALI tests from the main jenkins ASR stage Signed-off-by: Joaquin Anton <[email protected]> * Comment out MFCC test Signed-off-by: Joaquin Anton <[email protected]> * Working version Signed-off-by: Joaquin Anton <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Added unit test for hifigan export, fixed hifigan export (#2279) * Added unit test for hifigan export, Removed runtime test from waveglow test (now in export) Signed-off-by: Boris Fomitchev <[email protected]> * Fixed style Signed-off-by: Boris Fomitchev <[email protected]> * Fixed style Signed-off-by: Boris Fomitchev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update conformer recipes (#2265) * updated readme asr. Signed-off-by: Vahid <[email protected]> * added models. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed docs. Signed-off-by: Vahid <[email protected]> * fixed the docs. Signed-off-by: Vahid <[email protected]> * fixed the docs. 
Signed-off-by: Vahid <[email protected]> * fixed the docs. Signed-off-by: Vahid <[email protected]> * disabled test. Signed-off-by: Vahid <[email protected]> * Updated the config files. Signed-off-by: Vahid <[email protected]> * Updated the config files. Signed-off-by: Vahid <[email protected]> * Updated the config files. Signed-off-by: Vahid <[email protected]> * dropped the wers. Signed-off-by: Vahid <[email protected]> * dropped the wers. Signed-off-by: Vahid <[email protected]> * dropped new models and reverted to old versions. Signed-off-by: Vahid <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Adding neural rescorer and its documentations (#2287) * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * Added intial neural rescorer scripts. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. Signed-off-by: Vahid <[email protected]> * added more docs, figures, and output file. 
Signed-off-by: Vahid <[email protected]> * fixed style Signed-off-by: Vahid <[email protected]> * add a note to asr notebook. Signed-off-by: Vahid <[email protected]> * add a note to asr notebook. Signed-off-by: Vahid <[email protected]> * add a note to asr notebook. Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Adjust warning messages Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Revert "Adjust warning messages" This reverts commit df046ec55754d0136a2a28451435068f32409f30. Signed-off-by: Micha Livne <[email protected]> * Adjust warning messages (#2294) Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Adding new Models releases on NGC. (#2295) * added new models. Signed-off-by: Vahid <[email protected]> * added tests for asr lm. Signed-off-by: Vahid <[email protected]> * added tests for asr lm. Signed-off-by: Vahid <[email protected]> * dropped the test. 
Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update quantization (#2298) Signed-off-by: slyned <[email protected]> Co-authored-by: slyned <[email protected]> Signed-off-by: Micha Livne <[email protected]> * ASR improvements (#2293) * Update numba messages and citrinet configs Signed-off-by: smajumdar <[email protected]> * Remove support for weight init scale and hidden hidden bias scale for layer normalized lstm Signed-off-by: smajumdar <[email protected]> * Add support for multiple filetypes in tarred datasets, correct rnn LN-lstm inputs, fix OmegaConf compat issue Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Time quarter to (#2292) * fix comments Signed-off-by: Yang Zhang <[email protected]> * fix doc string Signed-off-by: Yang Zhang <[email protected]> * adding quarter to to time class Signed-off-by: Yang Zhang <[email protected]> * fix typo Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fixed paths. 
(#2301) Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Added onnxruntime check of exported ONNX, bumped up default ONNX opset (#2278) * Added onnxruntime check of exported ONNX, bumped up default ONNX opset Signed-off-by: Boris Fomitchev <[email protected]> * Made TS export to accept ONNX-style input example, removed unused param to export Signed-off-by: Boris Fomitchev <[email protected]> * check_trace default made False Signed-off-by: Boris Fomitchev <[email protected]> * Fixed for updated export signature Signed-off-by: Boris Fomitchev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update readmes Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update readme Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * update readme Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix docs table Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Add support for Numba CUDA optimized SpecAugment (#2269) * Initial implementation Signed-off-by: smajumdar <[email protected]> * Initial implementation Signed-off-by: smajumdar <[email protected]> * Finish initial implementation of numba spec augment Signed-off-by: smajumdar <[email protected]> * Correct mask propagataion Signed-off-by: smajumdar <[email protected]> * Parallelize kernel over batch instead of over masks Signed-off-by: smajumdar <[email protected]> * Finish tests and update to signature of spectrogramaugmentation calls Signed-off-by: smajumdar <[email protected]> * Finish tests and update to signature of spectrogramaugmentation calls Signed-off-by: smajumdar <[email protected]> * Add header Signed-off-by: smajumdar <[email protected]> * Fix style Signed-off-by: smajumdar <[email protected]> * Add heuristics Signed-off-by: smajumdar <[email protected]> * 
Correct inclusive range of padding Signed-off-by: smajumdar <[email protected]> * Correct typing for spec aug numba Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Added JSON manifest's support to transcribe_speech.py (#2304) * Added JSON manifest's support to transcribe_speech.py Signed-off-by: Vitaly Lavrukhin <[email protected]> * Dropped unused import Signed-off-by: Vitaly Lavrukhin <[email protected]> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * get embedding for a single file (#2310) * get embedding for a single file Signed-off-by: nithinraok <[email protected]> * fixes Signed-off-by: nithinraok <[email protected]> * sr update Signed-off-by: nithinraok <[email protected]> * regain train mode Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Update FastPitch (#2249) * wip Signed-off-by: Jason <[email protected]> * c1 Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * v2 Signed-off-by: Jason <[email protected]> * changes Signed-off-by: Jason <[email protected]> * add types, old model working Signed-off-by: Jason <[email protected]> * pitch Signed-off-by: Jason <[email protected]> * update Signed-off-by: Jason <[email protected]> * update Signed-off-by: Jason <[email protected]> * let it work Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * add oktai comments Signed-off-by: Jason <[email protected]> * debug Signed-off-by: Jason <[email protected]> * scale Signed-off-by: Jason <[email protected]> * wip Signed-off-by: Jason <[email protected]> * fix test for v1 Signed-off-by: Jason <[email protected]> * merge train and val Signed-off-by: Jason <[email protected]> * back to par bin att, add correct encoder settings 
Signed-off-by: Jason <[email protected]> * try Signed-off-by: Jason <[email protected]> * undo Signed-off-by: Jason <[email protected]> * lgtm: Signed-off-by: Jason <[email protected]> * style Signed-off-by: Jason <[email protected]> * default to ljs Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]> * patch quantization (#2314) * update quantization Signed-off-by: slyned <[email protected]> * update quant infer trt Signed-off-by: slyned <[email protected]> * fix style Signed-off-by: slyned <[email protected]> Co-authored-by: slyned <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Pin OmegaConf version for 1.0.0 (#2316) * Update OmegaConf compatibility Signed-off-by: smajumdar <[email protected]> * Correct OmegaConf.pretty() Signed-off-by: smajumdar <[email protected]> * Upper bound omegaconf Signed-off-by: smajumdar <[email protected]> * Revert "Correct OmegaConf.pretty()" This reverts commit 6ebae2ef Signed-off-by: smajumdar <[email protected]> * Revert "Update OmegaConf compatibility" This reverts commit 83b2cf35a07a742552082e80e6ca34c9b8203cbc. 
Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * [BUGFIX] OmegaConf forward compatibility (#2319) * Update OmegaConf compatibility Signed-off-by: smajumdar <[email protected]> Signed-off-by: ericharper <[email protected]> * Correct OmegaConf.pretty() Signed-off-by: smajumdar <[email protected]> Signed-off-by: ericharper <[email protected]> * upper bound omegaconf Signed-off-by: ericharper <[email protected]> * add if,else back Signed-off-by: ericharper <[email protected]> * typo Signed-off-by: ericharper <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]> * bumping version to 1.0.1 Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]> * fix_cluster_small_sample (#2303) * fix_cluster_small_sample Signed-off-by: nithinraok <[email protected]> * for smaller samples Signed-off-by: nithinraok <[email protected]> * remove type Signed-off-by: nithinraok <[email protected]> * similarity matrix Signed-off-by: nithinraok <[email protected]> * est num of speakers add Signed-off-by: nithinraok <[email protected]> * comment update Signed-off-by: nithinraok <[email protected]> * style fix Signed-off-by: nithinraok <[email protected]> * MIN_SAMPLES passed through func arg Signed-off-by: nithinraok <[email protected]> * doc string update Signed-off-by: nithinraok <[email protected]> * spell mistake Signed-off-by: nithinraok <[email protected]> Signed-off-by: Micha Livne <[email protected]> * Fastpitch export (#2300) * wip Signed-off-by: Jason <[email protected]> * c1 Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * bug fixes Signed-off-by: Jason <[email protected]> * v2 Signed-off-by: Jason <[email protected]> * changes Signed-off-by: Jason <[email protected]> * add types, old model working Signed-off-by: Jason <[email protected]> * pitch Signed-off-by: Jason <[email 
protected]> * update Signed-off-by: Jason <[email protected]> * update Signed-off-by: Jason <[email protected]> * let it work Signed-off-by: Jason <[email protected]> * fixes Signed-off-by: Jason <[email protected]> * add oktai comments Signed-off-by: Jason <[email protected]> * debug Signed-off-by: Jason <[email protected]> * scale Signed-off-by: Jason <[email protected]> * wip Signed-off-by: Jason <[email protected]> * fix test for v1 Signed-off-by: Jason <[email protected]> …

yzhang123 and others added 30 commits April 29, 2021 18:58

ASR + NLP Doc Fixes (NVIDIA#2136)

a50aa57

* Preserve the tokenizer config for ASR
  Signed-off-by: smajumdar <[email protected]>
* Correct nlp docs
  Signed-off-by: smajumdar <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

Fix FilterbankFeatures eval nondeterminism. (NVIDIA#2146)

2ba8779

Signed-off-by: PiotrDabkowski <[email protected]> Signed-off-by: Micha Livne <[email protected]>

fix the docs. (NVIDIA#2148)

617bf1c

Signed-off-by: Micha Livne <[email protected]>

bumping version to 1.0.0

4f539d5

Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]>

fixed the num_samples of text classification model. (NVIDIA#2152)

068b708

Signed-off-by: Vahid <[email protected]> Signed-off-by: Micha Livne <[email protected]>

fix for electronic (NVIDIA#2153)

962bc54

* fix for electronic
  Signed-off-by: ekmb <[email protected]>
* special symbols added
  Signed-off-by: ekmb <[email protected]>
* restrict symbols list
  Signed-off-by: ekmb <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

Minor patch for translate_ddp (NVIDIA#2155)

a54b962

* Patch for backtranslation in lm dataset
  Signed-off-by: MaximumEntropy <[email protected]>
* One more fix
  Signed-off-by: MaximumEntropy <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

Merge branch 'v1.0.0' into main

0b15937

Signed-off-by: Micha Livne <[email protected]>

switch CI back to main

fc0eea5

Signed-off-by: Oleksii Kuchaiev <[email protected]> Signed-off-by: Micha Livne <[email protected]>

fixed the docs. (NVIDIA#2156)

750d9ab

Signed-off-by: Micha Livne <[email protected]>

fix version (NVIDIA#2162)

68c922b

Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]>

Megatron nb size reduced (NVIDIA#2163)

e31b86b

* notebook size reduced
  Signed-off-by: ekmb <[email protected]>
* notebook size reduced
  Signed-off-by: ekmb <[email protected]>

Signed-off-by: Micha Livne <[email protected]>

revert (NVIDIA#2167)

0178d6c

Signed-off-by: Jason <[email protected]> Signed-off-by: Micha Livne <[email protected]>

Limit Pytorch lightning release (NVIDIA#2170)

b0526a3

Signed-off-by: smajumdar <[email protected]> Signed-off-by: Micha Livne <[email protected]>

fix typo (NVIDIA#2179)

910b775

Signed-off-by: Yang Zhang <[email protected]> Signed-off-by: Micha Livne <[email protected]>

Make itn tests optional (NVIDIA#2173)

8d067e0

* Limit Pytorch lightning release
  Signed-off-by: smajumdar <[email protected]>
* Add final two checks
  Signed-off-by: smajumdar <[email protected]>

Co-authored-by: Jason <[email protected]>
Signed-off-by: Micha Livne <[email protected]>

1. Added a missing import.

2a8c732

Signed-off-by: Micha Livne <[email protected]>

1. Added NLPDDPPlugin.

8b692fe

Signed-off-by: Micha Livne <[email protected]>

michalivne force-pushed the nmt-bottleneck branch from 2a8c732 to 8b692fe on July 12, 2021 23:07

michalivne added 2 commits July 12, 2021 16:15

1. Cleaned style.

cf1a059

Signed-off-by: Micha Livne <[email protected]>

Merge branch 'nmt-bottleneck' of github.com:michalivne/NeMo into nmt-bottleneck

ec1303e

michalivne added 2 commits July 12, 2021 19:39

1. Updated sign of computed loss.

bc2b173

Signed-off-by: Micha Livne <[email protected]>

1. Fixed double import.

78382bc

Signed-off-by: Micha Livne <[email protected]>

ericharper reviewed Jul 13, 2021

nemo/collections/nlp/models/machine_translation/mt_enc_dec_model.py (outdated, resolved)

Merge remote-tracking branch 'upstream/main' into nmt-bottleneck

b91349b

1. Moved logging of additional loss terms into MTBottleneckModel class.

14395e3

Signed-off-by: Micha Livne <[email protected]>

ericharper approved these changes Jul 13, 2021

1. Updated permissions.

85449f3

Signed-off-by: Micha Livne <[email protected]>

michalivne requested a review from MaximumEntropy July 13, 2021 22:22

MaximumEntropy approved these changes Jul 13, 2021

Merge branch 'main' into nmt-bottleneck

efa5d3f

Merge branch 'main' into nmt-bottleneck

d95f97f

MaximumEntropy merged commit da90c34 into NVIDIA:main Jul 14, 2021
