From 40c1cccc9eb6773243a5637ddaf0e6ff023d87e9 Mon Sep 17 00:00:00 2001 From: Somshubra Majumdar Date: Wed, 12 Jul 2023 10:00:05 -0700 Subject: [PATCH] Add support for Numba FP16 RNNT Loss (#6991) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Force working space memory to always be in fp32 Signed-off-by: smajumdar * Add support for fp16 testing in Numba Signed-off-by: smajumdar * Add support for fp16 testing in Numba Signed-off-by: smajumdar * Add support for fp16 testing in Numba Signed-off-by: smajumdar * Fix cost calculation by upcasting to fp32 Signed-off-by: smajumdar * Fix cost calculation by upcasting to fp32 Signed-off-by: smajumdar * Add support to check if numba fp16 is available Signed-off-by: smajumdar * add RNN-T loss implemented by PyTorch and test code (#5312) * Fix the bugs in cache-aware streaming Conformer (#5032) Signed-off-by: Vahid Signed-off-by: Hainan Xu * IA3 support for GPT and T5 (#4909) * init commit for ia3 adater training in GPT Signed-off-by: arendu * ia3 adater training in GPT, models and adapter classes Signed-off-by: arendu * reshape to operate even on non-contiguous tensors Signed-off-by: arendu * configs Signed-off-by: arendu * fixed none init Signed-off-by: arendu * adding adapter and ia3 support for T5 based models Signed-off-by: arendu * style fix Signed-off-by: arendu * config update and t5 model adapter and ia3 Signed-off-by: arendu * removed unused imports Signed-off-by: arendu * predict step for inference Signed-off-by: arendu * style fix Signed-off-by: arendu * style fix Signed-off-by: arendu * adapter inference for t5 Signed-off-by: arendu * style fix Signed-off-by: arendu * fixed bug micro and global batch size in eval Signed-off-by: arendu * minor edit Signed-off-by: arendu * agressive truncation if in test examples if no truncation field is given Signed-off-by: arendu * corrected for language_model_path name changes in main Signed-off-by: arendu * removed unused import Signed-off-by: arendu * name change for language_model_path Signed-off-by: arendu * include inter_attention to IA3 Signed-off-by: arendu * minor fix in confg Signed-off-by: arendu * minor fixes Signed-off-by: arendu * removed unused flag Signed-off-by: arendu * addressing PR comments Signed-off-by: arendu * address PR comments Signed-off-by: arendu * minor fix Signed-off-by: arendu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: arendu * CI test Signed-off-by: arendu * minor fix in jenkinsfile Signed-off-by: arendu Signed-off-by: arendu Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Hainan Xu * Bug fix - Limit val batches set to 1.0 (#5023) * Bug fix Signed-off-by: shanmugamr1992 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Adressed sandeep's comments * Fixing limit val batches support in bert * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixing limit val batches support in bert * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: shanmugamr1992 Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sandeep Subramanian Signed-off-by: Hainan Xu * [bug_fix] kv_channels is used when available (#5066) * fix bug s.t kv_channels is used when available Signed-off-by: arendu 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: arendu Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Hainan Xu * P&C Docs (#5068) (#5069) Signed-off-by: Matvei Novikov Signed-off-by: Matvei Novikov Signed-off-by: Matvei Novikov Co-authored-by: Matvei Novikov Signed-off-by: Hainan Xu * Add spe_split_by_unicode_script arg (#5072) * Add spe_split_by_unicode_script arg Signed-off-by: Anas * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Anas Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Hainan Xu * probabilites -> probabilities (#5078) (#5079) Signed-off-by: nithinraok Signed-off-by: nithinraok Signed-off-by: nithinraok Co-authored-by: Nithin Rao Signed-off-by: Hainan Xu * increase PR and Issue sweep quantity and active close PRs. (#5073) * increase PR and Issue sweep quantity and active close PRs. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * update with stricter rules, 30 days to be stale and 7 days to be closed for both Issues and PRs. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Hainan Xu * [TTS] added missing German phoneme tokenizer. (#5070) (#5074) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Hainan Xu * rename to match prompt leanring (#5076) Signed-off-by: arendu Signed-off-by: arendu Signed-off-by: Hainan Xu * Missing fixes from r1.11.0 to T5 finetuning eval (#5054) (#5061) * Fixes to seq2seq eval Signed-off-by: MaximumEntropy * Style Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: MaximumEntropy Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: MaximumEntropy Co-authored-by: Sandeep Subramanian Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Hainan Xu * Notebook bug fixes (#5084) (#5085) * Notebook bug fixes Signed-off-by: Virginia Adams * Turned nemo install back on Signed-off-by: Virginia Adams * reverted notebook Signed-off-by: Virginia Adams * Updated one line in entity linking nb Signed-off-by: Virginia Adams Signed-off-by: Virginia Adams Co-authored-by: Eric Harper Signed-off-by: Virginia Adams Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com> Co-authored-by: Eric Harper Signed-off-by: Hainan Xu * update strategy in notebook from ddp_fork to dp (#5088) (#5089) Co-authored-by: Zhilin Wang Signed-off-by: Hainan Xu * Fix bug in Squeezeformer Conv block (#5011) (#5024) * Fix bug in Squeezeformer Conv block Signed-off-by: smajumdar * Fix kernel context Signed-off-by: smajumdar * Fix access mixin Signed-off-by: smajumdar Signed-off-by: smajumdar Signed-off-by: smajumdar Co-authored-by: Somshubra Majumdar Signed-off-by: Hainan Xu * fixed megatron lm conversion bug (PTL related) (#5038) (#5063) Signed-off-by: David Mosallanezhad Signed-off-by: David Mosallanezhad Co-authored-by: David Mosallanezhad Signed-off-by: David Mosallanezhad Co-authored-by: David Co-authored-by: David Mosallanezhad Co-authored-by: Eric Harper Signed-off-by: Hainan Xu * Fix Unhashable type list for Numba Cuda spec augment kernel 
(#5093) (#5094) Signed-off-by: smajumdar Signed-off-by: smajumdar Signed-off-by: smajumdar Co-authored-by: Somshubra Majumdar Signed-off-by: Hainan Xu * Fix numba (#5098) Signed-off-by: smajumdar Signed-off-by: smajumdar Signed-off-by: Hainan Xu * Make it possible to specify output_filename in normalize_with_audio.py (#5092) Signed-off-by: Elena Rastorgueva Signed-off-by: Elena Rastorgueva Signed-off-by: Hainan Xu * Greedy decoding confidence for CTC and RNNT (#4931) * rnnt confidence draft Signed-off-by: Aleksandr Laptev * word confidence Signed-off-by: Aleksandr Laptev * advanced entropies added Signed-off-by: Aleksandr Laptev * refactoring Signed-off-by: Aleksandr Laptev * oops forgot a file Signed-off-by: Aleksandr Laptev * metrics and benchmarking script added Signed-off-by: Aleksandr Laptev * style fix Signed-off-by: Aleksandr Laptev * texterrors installation added Signed-off-by: Aleksandr Laptev * lgtm and bug fix Signed-off-by: Aleksandr Laptev * fix comments Signed-off-by: Aleksandr Laptev * fix typos Signed-off-by: Aleksandr Laptev * add missing import after rebase Signed-off-by: Aleksandr Laptev Signed-off-by: Aleksandr Laptev Co-authored-by: Aleksandr Laptev Signed-off-by: Hainan Xu * [Add] SLURP models and examples (#4668) * add model, util and loss Signed-off-by: stevehuang52 * update Signed-off-by: stevehuang52 * update Signed-off-by: stevehuang52 * update Signed-off-by: stevehuang52 * update Signed-off-by: stevehuang52 * update Signed-off-by: stevehuang52 * update Signed-off-by: stevehuang52 * update Signed-off-by: stevehuang52 * update Signed-off-by: stevehuang52 * update Signed-off-by: stevehuang52 * refactor Signed-off-by: stevehuang52 * refactor annd update Signed-off-by: stevehuang52 * update Signed-off-by: stevehuang52 * update and refactor Signed-off-by: stevehuang52 * update and refactor Signed-off-by: stevehuang52 * update and refactor Signed-off-by: stevehuang52 * update Signed-off-by: stevehuang52 * update Signed-off-by: stevehuang52 * update Signed-off-by: stevehuang52 * update Signed-off-by: stevehuang52 * update Signed-off-by: stevehuang52 * update Signed-off-by: stevehuang52 * update Signed-off-by: stevehuang52 * update docs Signed-off-by: stevehuang52 * update available models Signed-off-by: stevehuang52 * update Signed-off-by: stevehuang52 * refactor data processing Signed-off-by: stevehuang52 * fix typo Signed-off-by: stevehuang52 * update docs Signed-off-by: stevehuang52 * refactor and update Signed-off-by: stevehuang52 * update doc Signed-off-by: stevehuang52 * move transformer to asr.modules Signed-off-by: stevehuang52 * move transformer to asr.modules Signed-off-by: stevehuang52 * get rid of jsonlines Signed-off-by: stevehuang52 * refactor Signed-off-by: stevehuang52 * revert changes to nlp Signed-off-by: stevehuang52 Signed-off-by: stevehuang52 Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: Jagadeesh Balam <4916480+jbalam-nv@users.noreply.github.com> Signed-off-by: Hainan Xu * only optimize params that are part of the adapter modules (#5086) Signed-off-by: arendu Signed-off-by: arendu Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com> Signed-off-by: Hainan Xu * Pipeline Parallel T5 Prompt Learning (#4956) * Added pre process flag checks and pipeline parallel in fwd Signed-off-by: Virginia Adams * Added rank check for pipeline parallel Signed-off-by: Virginia Adams * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * T5 prompt 
learning works! Signed-off-by: Virginia Adams * IA3 passing CI Signed-off-by: Virginia Adams * Fixed typo Signed-off-by: Virginia Adams * removed optimizer setup so Adi's change will not conflict Signed-off-by: Virginia Adams Signed-off-by: Virginia Adams Signed-off-by: Adi Renduchintala <108822655+arendu@users.noreply.github.com> Co-authored-by: Adi Renduchintala <108822655+arendu@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Hainan Xu * [TTS] remove phonemizer.py (#5090) remove phonemizer.py and convert code block to markdown in the tutorial. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Hainan Xu * T5 Decoding with PP > 2 fix (#5091) (#5103) * set sequence lenghts in the pipeline properly Signed-off-by: MaximumEntropy * Fix Signed-off-by: MaximumEntropy Signed-off-by: MaximumEntropy Signed-off-by: MaximumEntropy Co-authored-by: Sandeep Subramanian Signed-off-by: Hainan Xu * [TTS] fixed wrong val loss for epoch 0 and inconsistent metrics names (#5087) (#5102) * fixed hifigan configs as well * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Hainan Xu * Fix and refactor consumed samples save/restore for Megatron models. (#5077) * Fixes and refactor Signed-off-by: MaximumEntropy * Fix Signed-off-by: MaximumEntropy * Remove unused imports Signed-off-by: MaximumEntropy * Empty Signed-off-by: MaximumEntropy * Fix Signed-off-by: MaximumEntropy Signed-off-by: MaximumEntropy Signed-off-by: Hainan Xu * RIR corpus generator tool (#4927) Signed-off-by: Ante Jukić Signed-off-by: Ante Jukić Signed-off-by: Hainan Xu * Multiprocessing fix (#5106) (#5107) Signed-off-by: Matvei Novikov Signed-off-by: Matvei Novikov Signed-off-by: Matvei Novikov Co-authored-by: Matvei Novikov Signed-off-by: Hainan Xu * [Bug fix] PC lexical + audio (#5109) (#5110) * training running Signed-off-by: ekmb * revert Signed-off-by: ekmb * revert Signed-off-by: ekmb Signed-off-by: ekmb Signed-off-by: ekmb Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Signed-off-by: Hainan Xu * [Fix] schedulers with no max_steps param (#4564) * fix schedulers Signed-off-by: stevehuang52 * update to use python inspect module Signed-off-by: stevehuang52 * update Signed-off-by: stevehuang52 Signed-off-by: stevehuang52 Signed-off-by: Hainan Xu * T5 prompt learning fixes missing from r.11.0 merge (#5075) (#5101) * Fix special tokens Signed-off-by: MaximumEntropy * Fix Signed-off-by: MaximumEntropy * Empty Signed-off-by: MaximumEntropy Signed-off-by: MaximumEntropy Co-authored-by: David Signed-off-by: MaximumEntropy Co-authored-by: Sandeep Subramanian Co-authored-by: David Co-authored-by: Eric Harper Signed-off-by: Hainan Xu * [TTS] Add NeMo TTS Primer Tutorial (#4933) * [TTS] Add NeMo TTS Primer Tutorial Signed-off-by: Ryan Signed-off-by: Hainan Xu * Add Squeezeformer CTC model checkpoints on Librispeech (#5121) Signed-off-by: smajumdar Signed-off-by: smajumdar Signed-off-by: Hainan Xu * adding loss normalization options to rnnt joint (#4829) * adding normalization options to rnnt joint loss * moving the param to joint * moving loss normalization to rnnt loss config * style * cleaning up * fixing sum reduction in joint Signed-off-by: Dima Rekesh * moving reduction into RNNT loss 
class * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refactoring * typos Signed-off-by: Dima Rekesh Signed-off-by: Dima Rekesh Co-authored-by: Dima Rekesh Co-authored-by: Oleksii Kuchaiev Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Hainan Xu * Asr concat dataloader (#5108) * forced precision * typo * initial commit Signed-off-by: Dima Rekesh * typos and bugs Signed-off-by: Dima Rekesh * reverting conformer encoder Signed-off-by: Dima Rekesh * additional checks Signed-off-by: Dima Rekesh * adding support to CTC models as well * reverting conformer_encoder Signed-off-by: Dima Rekesh * typo Signed-off-by: Dima Rekesh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refactoring Signed-off-by: Dima Rekesh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refactoring Signed-off-by: Dima Rekesh * merging Signed-off-by: Dima Rekesh Signed-off-by: Dima Rekesh Signed-off-by: Dima Rekesh Co-authored-by: Dima Rekesh Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Somshubra Majumdar Signed-off-by: Hainan Xu * fix blossom ci unittests Signed-off-by: Oleksii Kuchaiev Signed-off-by: Hainan Xu * bugfix: pybtex.database.InvalidNameString: Too many commas in author field. (#5112) (#5115) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Hainan Xu * Uppdate container version to 22.09 (#5105) * update container version Signed-off-by: ericharper * pin click Signed-off-by: ericharper * pin click 8.0.2 Signed-off-by: ericharper Signed-off-by: ericharper Signed-off-by: Hainan Xu * Remove unsupported arguments from MegatronNMT (#5065) * Fixes Signed-off-by: MaximumEntropy * Fixes Signed-off-by: MaximumEntropy * Style Signed-off-by: MaximumEntropy * Fix Signed-off-by: MaximumEntropy * More fixes Signed-off-by: MaximumEntropy Signed-off-by: MaximumEntropy Signed-off-by: Hainan Xu * pp2 support for T5 IA3 learning and T5 Adapters learning (#5116) * enabling pp2 Signed-off-by: arendu * optimizer update Signed-off-by: arendu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * T5 pp>1 support for adapters and ia3 Signed-off-by: arendu * fix bug with missing adapter_tuning Signed-off-by: arendu * inference error fixed, pp=2 Signed-off-by: arendu Signed-off-by: arendu Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Oleksii Kuchaiev Signed-off-by: Hainan Xu * T5 Prompt Learning Fixes for Pipeline Parallel (#5120) * Initial fixes Signed-off-by: MaximumEntropy * Added back validation acc Signed-off-by: Virginia Adams * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Put num workers back Signed-off-by: Virginia Adams * added relative encoding if statament Signed-off-by: Virginia Adams * Added back val loss only validation Signed-off-by: Virginia Adams * Revert "Added back val loss only validation" This reverts commit 86d8f4806fe30335c40c3716ce18259939df500f. 
* Removed val acc for PP > 1 Signed-off-by: Virginia Adams * Removed enc_seq_len if statement Signed-off-by: Virginia Adams * Added back validation acc calc Signed-off-by: Virginia Adams * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: MaximumEntropy Signed-off-by: Virginia Adams Signed-off-by: Virginia Adams Co-authored-by: Virginia Adams Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Virginia Adams Signed-off-by: Hainan Xu * add doc info (#4721) Signed-off-by: Yang Zhang Signed-off-by: Yang Zhang Signed-off-by: Hainan Xu * [TTS] Add SpanishCharsTokenizer (#5135) * [TTS] Add SpanishCharsTokenizer Signed-off-by: Ryan Signed-off-by: Hainan Xu * Update megatron interface to dialogue (#4936) * fix style formatting Signed-off-by: Zhilin Wang * update template to include description of intent Signed-off-by: Zhilin Wang * update Jenkinsfile Signed-off-by: Zhilin Wang * changes based on requests in review Signed-off-by: Zhilin Wang * add compatibility with assistant dataset Signed-off-by: Zhilin Wang * update Jenkins Signed-off-by: Zhilin Wang * remove dialogue_state_tracking Signed-off-by: Zhilin Wang * update huggingface utils for dialogue Signed-off-by: Zhilin Wang * rename dialogue_state_tracking_hybrid to dialogue_state_tracking_sgdqa Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * fix style Signed-off-by: Zhilin Wang * style fix nemo/collections/nlp/models/dialogue_state_tracking_sgdqa/__init__.py Signed-off-by: Zhilin Wang * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang * update Jenkinsfile for SGDGEN Signed-off-by: Zhilin Wang * fix typo Signed-off-by: Zhilin Wang * add docstrings for assistant data processsor Signed-off-by: Zhilin Wang * update Jenkins for SGDGEN local checkpoint Signed-off-by: Zhilin Wang * update style Signed-off-by: Zhilin Wang * use local vocab file for Jenkinsfile Signed-off-by: Zhilin Wang * patch for Jenkins CI using local file Signed-off-by: Zhilin Wang * add slot filling prediction and metrics Signed-off-by: Zhilin Wang * remove unused code Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * refactor metrics code out of Dialogue GPT Model Signed-off-by: Zhilin Wang * integrate backward compatible support for IntentSlotClassificationModel (bert model) Signed-off-by: Zhilin Wang * save prediction file for IntentSlotClassification Signed-off-by: Zhilin Wang * update dialogue gpt model training for megatron gpt Signed-off-by: Zhilin Wang * remove batch generate for HF GPT2, which causes lower performance Signed-off-by: Zhilin Wang * add few shot capability to dialogue gpt model Signed-off-by: Zhilin Wang * update Jenkinsfile and remove unused import Signed-off-by: Zhilin Wang * update code description and clarity Signed-off-by: Zhilin Wang * address PR comments Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * integrate compatibility with ZeroShotIntentModel Signed-off-by: Zhilin Wang * rename folder to dialogue due to increased scope and further refactor for clarity Signed-off-by: Zhilin Wang * added dialogue GPT for sequence generation task (e.g. 
answer extender) Signed-off-by: Zhilin Wang * add CI test for DialogueGPTGenerationModel Signed-off-by: Zhilin Wang * integrate DialogueS2SGenerationModel for generation task (e.g. answer extender) Signed-off-by: Zhilin Wang * modify huggingface utils to support HF t5/BART models Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * remove unused imports Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * update Jenkinsfile Signed-off-by: Zhilin Wang * update Jenkinsfile Signed-off-by: Zhilin Wang * update bleu metric Signed-off-by: Zhilin Wang * fix bleu metric style Signed-off-by: Zhilin Wang * debug bleu metric Signed-off-by: Zhilin Wang * debug bleu metric Signed-off-by: Zhilin Wang * update based on PR #3893 Signed-off-by: Zhilin Wang * update 2 based on PR #3893 Signed-off-by: Zhilin Wang * update 3 based on PR #3893 Signed-off-by: Zhilin Wang * integrate sgd generation based on user user utterance and system slot-values to generate system utterance Signed-off-by: Zhilin Wang * add validation model saving capabilities Signed-off-by: Zhilin Wang * cleaned up code for SGD Based Answer extender Signed-off-by: Zhilin Wang * update Dialogue Generation CI Signed-off-by: Zhilin Wang * update Jenkinsfile Signed-off-by: Zhilin Wang * update Jenkinsfile Signed-off-by: Zhilin Wang * fix Jenkins CI issue" Signed-off-by: Zhilin Wang * add support for design dataset Signed-off-by: Zhilin Wang * remove unnecessary imports Signed-off-by: Zhilin Wang * update Jenkins Signed-off-by: Zhilin Wang * update jenkins Signed-off-by: Zhilin Wang * update jenkins Signed-off-by: Zhilin Wang * support megatron for dialogue_s2s_generation_model Signed-off-by: Zhilin Wang * reduce loaded samples in MSMarcoDataProcessor to 64 when cfg.model.dataset.debug_mode=True Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * update CI Signed-off-by: Zhilin Wang * update checkpoint and predictions filename to include epoch number Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * integrate HF BART MNLI into zero shot intent model Signed-off-by: Zhilin Wang * integrate Dialogue Nearest Neighbour Model Signed-off-by: Zhilin Wang * update Jenkins Signed-off-by: Zhilin Wang * update Jenkins Signed-off-by: Zhilin Wang * refactor Dialogue SGD Data Processor to make interface for models cleaner Signed-off-by: Zhilin Wang * update jenkins Signed-off-by: Zhilin Wang * update Dialogue S2S Generation model for DialogueSGDDataProcessor interface Signed-off-by: Zhilin Wang * update jenkins Signed-off-by: Zhilin Wang * update jenkins Signed-off-by: Zhilin Wang * support sgd and drive thru datasets by zero shot model and nearest neighbour model Signed-off-by: Zhilin Wang * add prediction saving code to nearest neighbour and zero shot intent models Signed-off-by: Zhilin Wang * fix typo in sgd data processor Signed-off-by: Zhilin Wang * integrate Dialogue Mellon QA Data Processor Signed-off-by: Zhilin Wang * update mellon qa Signed-off-by: Zhilin Wang * update dialogue.py to remove outdated info Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * update dialogue_config.yaml Signed-off-by: Zhilin Wang * update dialogue_config.yaml Signed-off-by: Zhilin Wang * add dialogue docs Signed-off-by: Zhilin Wang * address review comments Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * style fix 
Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * style fix for cfg Signed-off-by: Zhilin Wang * make dependency on apex optional Signed-off-by: Zhilin Wang * change NLPDDPluggin calling logic to make it possible to run without apex Signed-off-by: Zhilin Wang * add first draft of tutorial Signed-off-by: Zhilin Wang * reduce ms marco size by removing lines without wellFormedAnswers Signed-off-by: Zhilin Wang * address pr comments Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * update colab tutorial link in dialogue docs Signed-off-by: Zhilin Wang * include unit test and some refactor to facilitate unit test Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * address pr issues Signed-off-by: Zhilin Wang * remove typos in dialogue tutorial Signed-off-by: Zhilin Wang * support larger files for question answering Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * remove unnecessary artifacts to reduce memory use Signed-off-by: Zhilin Wang * put 0 tensor to device Signed-off-by: Zhilin Wang * update link within dialogue tutorial Signed-off-by: Zhilin Wang * restore previously delete files Signed-off-by: Zhilin Wang * update error handling when loss = nan Signed-off-by: Zhilin Wang * update nan handling Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * update spanning loss func Signed-off-by: Zhilin Wang * update spanning loss Signed-off-by: Zhilin Wang * fix type error raised in qa_dataset.py Signed-off-by: Zhilin Wang * add error checking message Signed-off-by: Zhilin Wang * revert back to float32 Signed-off-by: Zhilin Wang * revert back to float32 Signed-off-by: Zhilin Wang * update error msgs Signed-off-by: Zhilin Wang * update error msgs Signed-off-by: Zhilin Wang * update error msgs Signed-off-by: Zhilin Wang * update error msgs Signed-off-by: Zhilin Wang * update error msgs Signed-off-by: Zhilin Wang * update error msgs Signed-off-by: Zhilin Wang * update error msgs Signed-off-by: Zhilin Wang * update error msgs Signed-off-by: Zhilin Wang * update exp logging Signed-off-by: Zhilin Wang * update error msgs Signed-off-by: Zhilin Wang * update loading of large file from pickle to json Signed-off-by: Zhilin Wang * update loading of large file from pickle to json Signed-off-by: Zhilin Wang * limit number of negative samples Signed-off-by: Zhilin Wang * revert post processing Signed-off-by: Zhilin Wang * revert post processing Signed-off-by: Zhilin Wang * remove unused methods and style fix Signed-off-by: Zhilin Wang * add more documentation Signed-off-by: Zhilin Wang * remove unused imports Signed-off-by: Zhilin Wang * changes base on PR review Signed-off-by: Zhilin Wang * set wandb logger falseby default Signed-off-by: Zhilin Wang * update interface with megatron gpt prompt learning Signed-off-by: Zhilin Wang * update inline documentation Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * update prompt_ids Signed-off-by: Zhilin Wang * update error msg Signed-off-by: Zhilin Wang * update config Signed-off-by: Zhilin Wang * update config Signed-off-by: Zhilin Wang * set inference = False for dialgue prompt learning during trainng Signed-off-by: Zhilin Wang * set inference = False for dialgue prompt learning during trainng Signed-off-by: Zhilin Wang * remove unused code Signed-off-by: Zhilin Wang * update config yaml Signed-off-by: Zhilin Wang * fix bug for megatron gpt prompt 
learning Signed-off-by: Zhilin Wang * remove unused import Signed-off-by: Zhilin Wang * address comments in PR Signed-off-by: Zhilin Wang * address comments in PR Signed-off-by: Zhilin Wang * address typo Signed-off-by: Zhilin Wang * add megatron t5 inference Signed-off-by: Zhilin Wang * fix bug due to bert tokenizer not being space-aware Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * update style Signed-off-by: Zhilin Wang * update IntentSlotModel onnx export test Signed-off-by: Zhilin Wang * update style Signed-off-by: Zhilin Wang * update exportable Signed-off-by: Zhilin Wang * address PR comments Signed-off-by: Zhilin Wang * replace functools.cache_property with functools.lru_cache to maintain python 3.7 compatibility Signed-off-by: Zhilin Wang * improve speed of rank_candidates and support for p tuning Signed-off-by: Zhilin Wang * update dialogue.py Signed-off-by: Zhilin Wang * fix megatron prompt learning saving bug Signed-off-by: Zhilin Wang * update generate_candidate method Signed-off-by: Zhilin Wang * remove repeated init text ids and invert attention masks Signed-off-by: Zhilin Wang * update typo Signed-off-by: Zhilin Wang * custom collate fn to remove excess padding in batch Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * style fix Signed-off-by: Zhilin Wang * update complete method to mitigate issue when max seq len is low Signed-off-by: Zhilin Wang * address pr comments Signed-off-by: Zhilin Wang * update generation interface Signed-off-by: Zhilin Wang Signed-off-by: Zhilin Wang Co-authored-by: Zhilin Wang Co-authored-by: Oleksii Kuchaiev Co-authored-by: Yang Zhang Co-authored-by: Eric Harper Co-authored-by: Sandeep Subramanian Signed-off-by: Hainan Xu * Added save inference ready .nemo file with every checkpoint (#5055) * Added save inference ready .nemo file with every checkpoint Signed-off-by: Virginia Adams * Python style fix Signed-off-by: Virginia Adams * addressed Adi's comment Signed-off-by: Virginia Adams * Added ptuning check in model checkpoint saving Signed-off-by: Virginia Adams * Changed save_nemo_on_valdaition default to False Signed-off-by: Virginia Adams * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Changes global batch size of adapter CI Signed-off-by: Virginia Adams * Changed num workers to 0 Signed-off-by: Virginia Adams * added first stage of pipeline check Signed-off-by: Virginia Adams * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Virginia Adams Signed-off-by: Virginia Adams <78445382+vadam5@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Hainan Xu * Fixes for docs/typos + remove max_utts parameter from tarred datasets as it causes hang in training (#5118) * Remove ; from jupyter notebook cells Signed-off-by: Igor Gitman * Fix typos in documentation/code Signed-off-by: Igor Gitman * Fix output message to have 'or equal' Signed-off-by: Igor Gitman * Link formatting fixes Signed-off-by: Igor Gitman * Add error if max_utts is used in tarred datasets Signed-off-by: Igor Gitman * Remove max_utts parameter from tarred datasets Signed-off-by: Igor Gitman * Fix max_utts removal in tests Signed-off-by: Igor Gitman * Fix typo if -> is Signed-off-by: Igor Gitman Signed-off-by: Igor Gitman Signed-off-by: Hainan Xu * Merge r1.12.0 main (#5139) * update branch Signed-off-by: 
ericharper * Add cherry-pick action (#4958) * add cherry-pick action Signed-off-by: ericharper * Pin Transformers version to fix CI (#4955) * Pin transformers version in CI to prevent offline tokenizer loading error Signed-off-by: SeanNaren * Drop version Signed-off-by: SeanNaren * Disable offline temporarily Signed-off-by: SeanNaren * Disable offline temporarily Signed-off-by: SeanNaren * Enable offline Signed-off-by: SeanNaren Signed-off-by: SeanNaren Signed-off-by: ericharper Signed-off-by: SeanNaren Co-authored-by: Sean Naren * upper bound transformers Signed-off-by: ericharper * remove duplicate transformers requirement Signed-off-by: ericharper * Release SOTA Lang ID model (#5080) * add pretrained lang id model ambernet Signed-off-by: fayejf * update doc and style fix Signed-off-by: fayejf Signed-off-by: fayejf * update branch and package info Signed-off-by: ericharper * remove upper bounds on lightning and transformers Signed-off-by: ericharper * remove transformers offline from ci Signed-off-by: ericharper * upper bound transformers Signed-off-by: ericharper Signed-off-by: ericharper Signed-off-by: SeanNaren Signed-off-by: fayejf Co-authored-by: Sean Naren Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Hainan Xu * Added ASR model comparison to SDE (#5043) SDE: Added ASR model comparison tool to SDE transcribe speech: Added support for many predictions in one file, as well as custom field names Signed-off-by: George Zelenfroynd Signed-off-by: Hainan Xu * fix nmt eval sampler (#5154) Signed-off-by: Abhinav Khattar Signed-off-by: Abhinav Khattar Signed-off-by: Hainan Xu * Fix Global init steps (#5143) * move global step to base Signed-off-by: Yi Dong * fix fused softmax Signed-off-by: Yi Dong * add the missing file Signed-off-by: Yi Dong * update the fused kernel Signed-off-by: Yi Dong * fix import error Signed-off-by: Yi Dong * fix import again Signed-off-by: Yi Dong Signed-off-by: Yi Dong Signed-off-by: Yi Dong Co-authored-by: Yi Dong Co-authored-by: Sandeep Subramanian Signed-off-by: Hainan Xu * [TTS] bug fix - sample rate was being ignored in vocoder dataset (#4518) * bug fix - sample rate was being ignored in vocoder dataset when not loading mel * handled n segments for a different sampling rate than original sampling rate * Added case for n_segments 0, warning for n_segments greater than file length Signed-off-by: Paarth Neekhara Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Jocelyn Signed-off-by: Hainan Xu * Add EMA support to NeMo (#4764) * Added Base files Signed-off-by: SeanNaren * Some refactors, swap to using MNIST Lnet Signed-off-by: SeanNaren * Add a few more tests, allow the callback to be set via the exp manager Signed-off-by: SeanNaren * Actually run validation for testing Signed-off-by: SeanNaren * Run isort Signed-off-by: SeanNaren * Add test for saving state/fix saving state Signed-off-by: SeanNaren * Use dummy model Signed-off-by: SeanNaren * Fix test Signed-off-by: SeanNaren * Add copyright Signed-off-by: SeanNaren * Support saving separate EMA weight module Signed-off-by: SeanNaren * Add standalone functionality/logging Signed-off-by: SeanNaren * Expose more parameters Signed-off-by: SeanNaren * Modify to allow option to replace validation Signed-off-by: SeanNaren * Add jenkins test, formatting Signed-off-by: SeanNaren * Pin Transformers version to fix CI (#4955) * Pin transformers version in CI to prevent offline tokenizer loading error Signed-off-by: SeanNaren * Drop version 
Signed-off-by: SeanNaren * Disable offline temporarily Signed-off-by: SeanNaren * Disable offline temporarily Signed-off-by: SeanNaren * Enable offline Signed-off-by: SeanNaren Signed-off-by: SeanNaren * Add cherry-pick action (#4958) (#4961) * add cherry-pick action Signed-off-by: ericharper * Pin Transformers version to fix CI (#4955) * Pin transformers version in CI to prevent offline tokenizer loading error Signed-off-by: SeanNaren * Drop version Signed-off-by: SeanNaren * Disable offline temporarily Signed-off-by: SeanNaren * Disable offline temporarily Signed-off-by: SeanNaren * Enable offline Signed-off-by: SeanNaren Signed-off-by: SeanNaren Signed-off-by: ericharper Signed-off-by: SeanNaren Co-authored-by: Sean Naren Signed-off-by: ericharper Signed-off-by: SeanNaren Co-authored-by: Eric Harper Co-authored-by: Sean Naren Signed-off-by: SeanNaren * Fix changelog builder (#4962) (#4963) Signed-off-by: smajumdar Signed-off-by: smajumdar Signed-off-by: smajumdar Signed-off-by: SeanNaren * fix cherry pick workflow (#4964) (#4965) Signed-off-by: ericharper Signed-off-by: ericharper Signed-off-by: ericharper Co-authored-by: Eric Harper Signed-off-by: SeanNaren * reorder model check (#4959) (#4967) Signed-off-by: nithinraok Signed-off-by: nithinraok Signed-off-by: nithinraok Co-authored-by: Nithin Rao Signed-off-by: SeanNaren * check for active conda environment (#4970) (#4971) Signed-off-by: SeanNaren * [TTS] fix broken tutorial for MixerTTS. (#4949) (#4976) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: SeanNaren * Checkpoint averaging class fix (#4946) * 1. Added args.class_path to provide it externally. Signed-off-by: Micha Livne * 1. Fixed style. Signed-off-by: Micha Livne Signed-off-by: Micha Livne Signed-off-by: SeanNaren * Add ability to give seperate datasets for test, train and validation (#4798) * Add ability to give seperate datasets for test, train and validation * Addressed Sandeeps comments * Addressed Sandeeps comments * Add ability to give seperate datasets for test, train and validation * Add ability to give seperate datasets for test, train and validation * Addressed review comments * Bug fix for common dataset utils * Add CI tests Signed-off-by: shanmugamr1992 * Reformat code Signed-off-by: shanmugamr1992 * Bug fix Signed-off-by: shanmugamr1992 * Bug fix * Bug Fix * Bug Fix * Update Jenkinsfile * Addressed comments * Addressed Eriks comments. 
* Addressed Sandeep * Update Jenkinsfile * Update Jenkinsfile * Update dataset_utils.py * Update Jenkinsfile * Update Jenkinsfile * Use GPT CI config Signed-off-by: MaximumEntropy Signed-off-by: shanmugamr1992 Signed-off-by: MaximumEntropy Co-authored-by: MaximumEntropy Signed-off-by: SeanNaren * fix label models restoring issue from wrighted cross entropy (#4968) (#4975) Signed-off-by: nithinraok Signed-off-by: nithinraok Signed-off-by: nithinraok Co-authored-by: Nithin Rao Signed-off-by: SeanNaren * Add simple pre-commit file (#4983) * Add simple pre-commit file Signed-off-by: SeanNaren * Exclude docs folder Signed-off-by: SeanNaren * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: SeanNaren * Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks" This reverts commit 053bd5ba579537a5f311b431871c21f3381b43eb. Signed-off-by: SeanNaren * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: SeanNaren Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: SeanNaren * Import pycuda.autoprimaryctx or pycuda.autoinit to init pycuda execution environment (#4951) Signed-off-by: Jin Li Signed-off-by: Jin Li Co-authored-by: Somshubra Majumdar Signed-off-by: SeanNaren * Adding speaker embedding conditioning in fastpitch (#4986) Signed-off-by: subhankar-ghosh Signed-off-by: subhankar-ghosh Signed-off-by: SeanNaren * Fix ASR issues (#4984) (#4991) * Fix ASR issues Signed-off-by: smajumdar * Revert fix Signed-off-by: smajumdar Signed-off-by: smajumdar Signed-off-by: smajumdar Co-authored-by: Somshubra Majumdar Signed-off-by: SeanNaren * Fix current tests Signed-off-by: SeanNaren * More test coverage Signed-off-by: SeanNaren * Address reviews Signed-off-by: SeanNaren * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address review Signed-off-by: SeanNaren * Drop bf16 test Signed-off-by: SeanNaren * Address review Signed-off-by: SeanNaren * remove print Signed-off-by: SeanNaren * Add bf16 Signed-off-by: SeanNaren Signed-off-by: SeanNaren Signed-off-by: ericharper Signed-off-by: smajumdar Signed-off-by: nithinraok Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Micha Livne Signed-off-by: shanmugamr1992 Signed-off-by: MaximumEntropy Signed-off-by: Jin Li Signed-off-by: subhankar-ghosh Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Eric Harper Co-authored-by: Somshubra Majumdar Co-authored-by: Nithin Rao Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Micha Livne Co-authored-by: shanmugamr1992 <111910568+shanmugamr1992@users.noreply.github.com> Co-authored-by: MaximumEntropy Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: liji-nv <59594262+liji-nv@users.noreply.github.com> Co-authored-by: Subhankar Ghosh Signed-off-by: Hainan Xu * Fix BF16 test (#5162) Signed-off-by: SeanNaren Signed-off-by: SeanNaren Signed-off-by: Hainan Xu * Fix errors in speaker diarization nemo docs (#5153) * fix docs and docstrings for MSDD Signed-off-by: Taejin Park * fix nemo docs errors Signed-off-by: Taejin Park * reflected review comments Signed-off-by: Taejin Park Signed-off-by: Taejin Park Signed-off-by: Hainan Xu * Add interleaved pipeline schedule to GPT (#5025) * add virtual pipeline 
size to config Signed-off-by: ericharper * convert model to list of modules Signed-off-by: ericharper * convert model to list of modules Signed-off-by: ericharper * convert model to list of modules Signed-off-by: ericharper * update for list of modules Signed-off-by: ericharper * add virtual to init Signed-off-by: ericharper * update first last stage embedding all reduce Signed-off-by: ericharper * update sequence parallel all reduce for virtual models Signed-off-by: ericharper * runs but we get an error Signed-off-by: ericharper * set virtual rank 0 after looping Signed-off-by: ericharper * account for virtual when determinining first and last pipeline stages Signed-off-by: ericharper * checkpointing for virtual models in progress Signed-off-by: ericharper * add checkpoint hooks Signed-off-by: ericharper * working on validation when resuming Signed-off-by: ericharper * skip sanity val steps by default in config Signed-off-by: ericharper * remove comment Signed-off-by: ericharper * log number of params Signed-off-by: ericharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style Signed-off-by: ericharper * check if self.model is a list Signed-off-by: ericharper * make virtual pipeline default size None on init Signed-off-by: ericharper * make virtual pipeline default to None in config Signed-off-by: ericharper * remove ensure_divisibility call Signed-off-by: ericharper * fix lgtm alerts Signed-off-by: ericharper * remove num_sanity_val_steps from config Signed-off-by: ericharper * default virtual pipeline size to none Signed-off-by: ericharper * check for list Signed-off-by: ericharper * update assert to make sure we are only doing virtual for gpt Signed-off-by: ericharper * revert change to get_params_for_weight_decay Signed-off-by: ericharper * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * init var Signed-off-by: ericharper * add import guard for set virtual model parallel world size Signed-off-by: ericharper * use import guard Signed-off-by: ericharper * update calls to fake init in eval scripts Signed-off-by: ericharper * add _get_fwd_bwd_function Signed-off-by: ericharper * log all total model parameters Signed-off-by: ericharper * remove unused import Signed-off-by: ericharper Signed-off-by: ericharper Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Hainan Xu * reduced to 14 inactive days to be stale for PRs. (#5165) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Hainan Xu * refactor TTS documentation organization and add new contents. (#5137) * refactor TTS documentation organization and add new contents. * fix asr api bug. * fix broken links. * fix unexpected indentation errors. * fixed unexpected indentation. * fixed broken paper reference. * fixed cross-reference and typos. * fixed toctree errors. * revert to 'Augmentors' * reordered TTS tutorial list in starthere. * ordered api classes alphabetically for each Section. * fixed underscore typo for fastpitch checkpoint. 
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * upcase 'Tuning' Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * fixed typo for RAD-TTS Aligner Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * reorder aligner section after mel-gen and vocoders in models.rst. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * clarify Mixer-TTS-X and reorder model descriptions alphabetically. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * fixed some typos and formats. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * removed old megatron.rst. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * fixed block quote ends without a blank line warnings. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * remove duplicate reference; fixed missing key nlp-megatron-shoeybi2019megatron Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * Revert "removed old megatron.rst." This reverts commit c5ea1dc3f23272eecfe8040e3abfa54fa122cf73. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * removed Russian, a hyphen, and add a note about G2P in tts/config.rst Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * added pynini installation in wfst_text_normalization.rst Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * added description of manifest key/value pairs. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * add toctree in tts/intro Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * replace main branch to stable. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * add 'upcoming' for e2e systems. 
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * replaced main branch to stabl… * Multiblank Transducer (#5527) * multi-blank transducers Signed-off-by: Hainan Xu * one line bug fix Signed-off-by: Hainan Xu * change interface of RNNTDecoding class to extract num-extra-output from joint instead of constructor Signed-off-by: Hainan Xu * addressed PR comments Signed-off-by: Hainan Xu * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Hainan Xu Co-authored-by: Hainan Xu Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Default RNNT loss to int64 targets (#6011) Signed-off-by: smajumdar * Rebase Signed-off-by: smajumdar * Begin refactoring tests Signed-off-by: smajumdar * Pass all tests for RNNT numba loss Signed-off-by: smajumdar * Pass all tests for RNNT numba loss Signed-off-by: smajumdar * Remove print Signed-off-by: smajumdar * Fix test for version Signed-off-by: smajumdar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert bad merges Signed-off-by: smajumdar * Revert bad merges Signed-off-by: smajumdar * Address comments Signed-off-by: smajumdar * Remove wrong file Signed-off-by: smajumdar --------- Signed-off-by: smajumdar Signed-off-by: Vahid Signed-off-by: Hainan Xu Signed-off-by: arendu Signed-off-by: shanmugamr1992 Signed-off-by: Matvei Novikov Signed-off-by: Anas Signed-off-by: nithinraok Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: MaximumEntropy Signed-off-by: Virginia Adams Signed-off-by: smajumdar Signed-off-by: David Mosallanezhad Signed-off-by: Elena Rastorgueva Signed-off-by: Aleksandr Laptev Signed-off-by: stevehuang52 Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Signed-off-by: Adi Renduchintala <108822655+arendu@users.noreply.github.com> Signed-off-by: Ante Jukić Signed-off-by: ekmb Signed-off-by: Ryan Signed-off-by: Dima Rekesh Signed-off-by: Dima Rekesh Signed-off-by: Oleksii Kuchaiev Signed-off-by: ericharper Signed-off-by: Virginia Adams Signed-off-by: Yang Zhang Signed-off-by: Zhilin Wang Signed-off-by: Virginia Adams <78445382+vadam5@users.noreply.github.com> Signed-off-by: Igor Gitman Signed-off-by: SeanNaren Signed-off-by: fayejf Signed-off-by: George Zelenfroynd Signed-off-by: Abhinav Khattar Signed-off-by: Yi Dong Signed-off-by: Yi Dong Signed-off-by: Paarth Neekhara Signed-off-by: Micha Livne Signed-off-by: Jin Li Signed-off-by: subhankar-ghosh Signed-off-by: Taejin Park Signed-off-by: Miguel Martínez Signed-off-by: miguelangel Signed-off-by: Jocelyn Huang Signed-off-by: 彭震东 <275331498@qq.com> Signed-off-by: Alexandra Antonova Signed-off-by: Jason Signed-off-by: Patrick Simianer Signed-off-by: Shantanu Acharya Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Signed-off-by: Tim Moon Signed-off-by: eharper Signed-off-by: Micha Livne Signed-off-by: Oleksii Volkovskyi Signed-off-by: Yuekai Zhang Signed-off-by: Boris Fomitchev Signed-off-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Signed-off-by: whrichd Signed-off-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Signed-off-by: Vladimir Bataev Signed-off-by: Viraj Karandikar Signed-off-by: Yu Yao Signed-off-by: PeganovAnton Signed-off-by: Somshubra Majumdar Signed-off-by: Jonghwan Hyeon Signed-off-by: Boris Fomitchev Signed-off-by: shane carroll Co-authored-by: 
Samuel Kriman Co-authored-by: Hainan Xu Co-authored-by: Vahid Noroozi Co-authored-by: Adi Renduchintala <108822655+arendu@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Co-authored-by: Sandeep Subramanian Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Matvei Novikov Co-authored-by: Anas Abou Allaban Co-authored-by: Nithin Rao Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com> Co-authored-by: Eric Harper Co-authored-by: Zhilin Wang Co-authored-by: David Co-authored-by: David Mosallanezhad Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com> Co-authored-by: Aleksandr Laptev Co-authored-by: Aleksandr Laptev Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: Jagadeesh Balam <4916480+jbalam-nv@users.noreply.github.com> Co-authored-by: anteju <108555623+anteju@users.noreply.github.com> Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Co-authored-by: Ryan Langman Co-authored-by: Dima Rekesh Co-authored-by: Dima Rekesh Co-authored-by: Oleksii Kuchaiev Co-authored-by: Oleksii Kuchaiev Co-authored-by: Virginia Adams Co-authored-by: Virginia Adams Co-authored-by: Yang Zhang Co-authored-by: Zhilin Wang Co-authored-by: Igor Gitman Co-authored-by: Sean Naren Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Abhinav Khattar Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com> Co-authored-by: Yi Dong Co-authored-by: Paarth Neekhara Co-authored-by: Jocelyn Co-authored-by: Micha Livne Co-authored-by: liji-nv <59594262+liji-nv@users.noreply.github.com> Co-authored-by: Subhankar Ghosh Co-authored-by: Taejin Park Co-authored-by: Miguel Martínez <26169771+miguelusque@users.noreply.github.com> Co-authored-by: miguelangel Co-authored-by: 彭震东 <275331498@qq.com> Co-authored-by: Igor Gitman Co-authored-by: bene-ges <61418381+bene-ges@users.noreply.github.com> Co-authored-by: Alexandra Antonova Co-authored-by: Jason Co-authored-by: Rajesh Ilango Co-authored-by: pks Co-authored-by: Shantanu Acharya Co-authored-by: Shantanu Acharya Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by: Sangkug Lym Co-authored-by: Kirthi Shankar Sivamani Co-authored-by: ksivamani Co-authored-by: Eric Harper Co-authored-by: Sandeep Subramanian Co-authored-by: Micha Livne Co-authored-by: Oleksii Volkovskyi Co-authored-by: Yuekai Zhang Co-authored-by: Hainan Xu Co-authored-by: Boris Fomitchev Co-authored-by: anmolgupt <14880251+anmolgupt@users.noreply.github.com> Co-authored-by: Anmol Gupta Co-authored-by: Sasha Meister <117230141+ssh-meister@users.noreply.github.com> Co-authored-by: Riqiang Wang <43883260+whrichd@users.noreply.github.com> Co-authored-by: Vladimir Bataev Co-authored-by: Shanmugam Ramasamy Co-authored-by: Viraj Karandikar <16838694+virajkarandikar@users.noreply.github.com> Co-authored-by: Shane Carroll <50530592+1-800-BAD-CODE@users.noreply.github.com> Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: Yi Dong Co-authored-by: PeganovAnton Co-authored-by: Jonghwan Hyeon Co-authored-by: Kaden Uhlig Co-authored-by: Kaden Uhlig Co-authored-by: 
Boris Fomitchev
Co-authored-by: Jonghwan Hyeon
---
 nemo/collections/asr/losses/rnnt.py           |  26 +++-
 nemo/collections/asr/losses/rnnt_pytorch.py   |   5 +
 .../asr/parts/numba/rnnt_loss/rnnt.py         |   2 +-
 .../asr/parts/numba/rnnt_loss/rnnt_numpy.py   |   5 +
 .../asr/parts/numba/rnnt_loss/rnnt_pytorch.py |   7 +-
 .../rnnt_loss/utils/cpu_utils/cpu_rnnt.py     |   8 +-
 .../numba/rnnt_loss/utils/rnnt_helper.py      |   3 +-
 nemo/core/utils/numba_utils.py                |  36 +++++
 .../asr/numba/rnnt_loss/test_rnnt_pytorch.py  | 126 ++++++++++++------
 .../rnnt_loss/utils/test_gpu_rnnt_kernel.py   |  64 +++++----
 .../asr/numba/rnnt_loss/utils/test_reduce.py  |  18 ++-
 .../numba/rnnt_loss/utils/test_rnnt_helper.py |  75 +++++++----
 12 files changed, 263 insertions(+), 112 deletions(-)

diff --git a/nemo/collections/asr/losses/rnnt.py b/nemo/collections/asr/losses/rnnt.py
index 10b85acb42ef..a884f7d3cc68 100644
--- a/nemo/collections/asr/losses/rnnt.py
+++ b/nemo/collections/asr/losses/rnnt.py
@@ -38,9 +38,10 @@
 from nemo.collections.asr.losses.rnnt_pytorch import MultiblankRNNTLossPytorch, RNNTLossPytorch, TDTLossPytorch
 from nemo.core.classes import Loss, typecheck
 from nemo.core.neural_types import LabelsType, LengthsType, LogprobsType, LossType, NeuralType
+from nemo.core.utils import numba_utils
 from nemo.core.utils.k2_utils import K2_INSTALLATION_MESSAGE
 from nemo.core.utils.numba_utils import NUMBA_INSTALLATION_MESSAGE
-from nemo.utils import logging, model_utils
+from nemo.utils import logging, logging_mode, model_utils
 
 try:
     import warprnnt_pytorch as warprnnt
@@ -98,7 +99,7 @@ class RNNTLossConfig:
         min_version='0.53.0',
         is_available=NUMBA_RNNT_AVAILABLE,
         installation_msg=NUMBA_INSTALLATION_MESSAGE,
-        force_float32=True,
+        force_float32=not numba_utils.NUMBA_FP16_SUPPORTED,
     ),
     "pytorch": RNNTLossConfig(
         loss_name="pytorch",
@@ -387,7 +388,7 @@ def __init__(self, num_classes, reduction: str = 'mean_batch', loss_name: str =
            for the standard "blank" symbol. In particular, say V is the number of non-blank tokens in
            the vocabulary, then in the case of,
            standard RNNT: num_classes = V
-           multiblank RNNT: num_classes = V + number-big-blanks (since we store big-blanks before 
+           multiblank RNNT: num_classes = V + number-big-blanks (since we store big-blanks before
            standard blank, and the standard blank is the last symbol in the vocab)
            TDT: num_classes = V. Note, V here does not include any of the "duration outputs".
 
@@ -413,6 +414,7 @@ def __init__(self, num_classes, reduction: str = 'mean_batch', loss_name: str =
         self.reduction = reduction
         self._loss = resolve_rnnt_loss(loss_name, blank_idx=self._blank, loss_kwargs=loss_kwargs)
         self._force_float32 = RNNT_LOSS_RESOLVER[loss_name].force_float32
+        self._fp16_compat_checked = False
 
     def reduce(self, losses, target_lengths):
@@ -442,8 +444,22 @@ def forward(self, log_probs, targets, input_lengths, target_lengths):
         max_targets_len = target_lengths.max()
 
         # Force cast joint to float32
-        # TODO: Remove once Numba supports FP16
-        if self._force_float32 and log_probs.dtype != torch.float32:
+        if not self._force_float32 and numba_utils.NUMBA_FP16_SUPPORTED:
+            # Execute the kernel in fp16
+            pass
+        elif self._force_float32 and log_probs.dtype != torch.float32:
+            # Log just once if fp16 tensor was passed and fp16 Numba CUDA loss could not be used.
+            if log_probs.dtype == torch.float16 and not self._fp16_compat_checked:
+                _, reason = numba_utils.is_numba_cuda_fp16_supported(return_reason=True)
+                logging.warning(
+                    f"Provided RNNT Joint tensor is of dtype {log_probs.dtype}, but RNNT loss could not be calculated "
+                    f"in fp16 due to following reason stated below. Loss will be calculated in fp32. \n\n"
+                    f"{reason}",
+                    mode=logging_mode.ONCE,
+                )
+                self._fp16_compat_checked = True
+
+            # Upcast the activation tensor and compute loss and grads in fp32
             logits_orig = log_probs
             log_probs = log_probs.float()
             del logits_orig  # save memory *before* computing the loss
diff --git a/nemo/collections/asr/losses/rnnt_pytorch.py b/nemo/collections/asr/losses/rnnt_pytorch.py
index bc6e5a25a3b2..c8eee90a2eb5 100644
--- a/nemo/collections/asr/losses/rnnt_pytorch.py
+++ b/nemo/collections/asr/losses/rnnt_pytorch.py
@@ -47,7 +47,12 @@ def __init__(self, blank, reduction):
         self.reduction = reduction
 
     def forward(self, acts, labels, act_lens, label_lens):
+        # CPU patch for FP16
+        if not acts.is_cuda and acts.dtype == torch.float16:
+            acts = acts.float()
+
         acts = torch.log_softmax(acts, -1)
+
         forward_logprob = self.compute_forward_prob(acts, labels, act_lens, label_lens)
         losses = -forward_logprob
         if self.reduction == 'mean_batch':
diff --git a/nemo/collections/asr/parts/numba/rnnt_loss/rnnt.py b/nemo/collections/asr/parts/numba/rnnt_loss/rnnt.py
index 118ee88acbfe..046aea425e20 100644
--- a/nemo/collections/asr/parts/numba/rnnt_loss/rnnt.py
+++ b/nemo/collections/asr/parts/numba/rnnt_loss/rnnt.py
@@ -186,7 +186,7 @@ def rnnt_loss_gpu(
     # Select GPU index
     cuda.select_device(acts.device.index)
-    gpu_workspace = torch.zeros(gpu_size, device=acts.device, dtype=acts.dtype, requires_grad=False)
+    gpu_workspace = torch.zeros(gpu_size, device=acts.device, dtype=torch.float32, requires_grad=False)
 
     ### VIEW TENSORS AS VECTORS FOR POINTER INDEXING ###
     acts, acts_shape = rnnt_helper.flatten_tensor(acts)
diff --git a/nemo/collections/asr/parts/numba/rnnt_loss/rnnt_numpy.py b/nemo/collections/asr/parts/numba/rnnt_loss/rnnt_numpy.py
index eaa6d332a0fc..58508970aa83 100644
--- a/nemo/collections/asr/parts/numba/rnnt_loss/rnnt_numpy.py
+++ b/nemo/collections/asr/parts/numba/rnnt_loss/rnnt_numpy.py
@@ -344,10 +344,15 @@ def forward(self, acts, labels, act_lens, label_lens):
         _assert_no_grad(label_lens)
         certify_inputs(acts, labels, act_lens, label_lens)
 
+        # CPU Patch for fp16 - force cast to fp32
+        if not acts.is_cuda and acts.dtype == torch.float16:
+            acts = acts.float()
+
         if self.clamp > 0.0:
             acts = LogSoftmaxGradModification.apply(acts, self.clamp)
 
         acts = torch.nn.functional.log_softmax(acts, -1)
+
         return self.rnnt(acts, labels, act_lens, label_lens, self.blank, self.fastemit_lambda)
diff --git a/nemo/collections/asr/parts/numba/rnnt_loss/rnnt_pytorch.py b/nemo/collections/asr/parts/numba/rnnt_loss/rnnt_pytorch.py
index 2ffe08be361e..5960d5ab6b18 100644
--- a/nemo/collections/asr/parts/numba/rnnt_loss/rnnt_pytorch.py
+++ b/nemo/collections/asr/parts/numba/rnnt_loss/rnnt_pytorch.py
@@ -57,7 +57,7 @@ def forward(ctx, acts, labels, act_lens, label_lens, blank, reduction, fastemit_
         loss_func = rnnt.rnnt_loss_gpu if is_cuda else rnnt.rnnt_loss_cpu
         grads = torch.zeros_like(acts) if acts.requires_grad else None
         minibatch_size = acts.size(0)
-        costs = torch.zeros(minibatch_size, device=acts.device, dtype=acts.dtype)
+        costs = torch.zeros(minibatch_size, device=acts.device, dtype=torch.float32)
 
         loss_func(
             acts,
@@ -119,7 +119,6 @@ def forward(
             label_lens: Tensor of (batch) containing label length of each example
             fastemit_lambda: Float scaling factor for FastEmit regularization. Refer to
                 FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization.
-            durations: list of durations for TDT model, must include 0 and 1, e.g. [0, 1, 2, 3, 4].
             sigma: hyper-parameter for logit under-normalization method for training
@@ -417,6 +416,10 @@ def forward(self, acts, labels, act_lens, label_lens):
             label_lens: Tensor of (batch) containing label length of each example
         """
         if not acts.is_cuda:
+            # Force FP32 until log_softmax() is implemented for fp16 on CPU
+            if acts.dtype == torch.float16:
+                acts = acts.float()
+
             # Since CPU requires log_softmax to be computed explicitly, we need to perform grad clipping
             # *after* we have obtained the gradients of loss(logsoftmax()).
             # This is highly wasteful since it requires a copy of the entire joint tensor which is expensive.
diff --git a/nemo/collections/asr/parts/numba/rnnt_loss/utils/cpu_utils/cpu_rnnt.py b/nemo/collections/asr/parts/numba/rnnt_loss/utils/cpu_utils/cpu_rnnt.py
index 1528606716e1..3feb7b513a50 100644
--- a/nemo/collections/asr/parts/numba/rnnt_loss/utils/cpu_utils/cpu_rnnt.py
+++ b/nemo/collections/asr/parts/numba/rnnt_loss/utils/cpu_utils/cpu_rnnt.py
@@ -231,8 +231,8 @@ def cost_and_grad_kernel(
         )
 
         # Scale llForward by FastEmit lambda
-        llForward *= 1.0 + self.fastemit_lambda_
-        llBackward *= 1.0 + self.fastemit_lambda_
+        llForward += llForward * self.fastemit_lambda_
+        llBackward += llBackward * self.fastemit_lambda_
 
         diff = (llForward - llBackward).abs()
         if diff > 0.1:
@@ -300,6 +300,10 @@ def compute_betas_and_grads(
         Returns:
             Loglikelihood of the forward variable and inplace updates the grad tensor.
         """
+        # Patch for CPU + fp16
+        if log_probs.dtype == torch.float16 and not log_probs.is_cuda:
+            log_probs = log_probs.float()
+
         idx = CpuRNNT_index(U, self.maxU_, self.minibatch_, self.alphabet_size_, self.batch_first)
         betas[idx(T - 1, U - 1)] = log_probs[idx(T - 1, U - 1) * 2]
diff --git a/nemo/collections/asr/parts/numba/rnnt_loss/utils/rnnt_helper.py b/nemo/collections/asr/parts/numba/rnnt_loss/utils/rnnt_helper.py
index b579b7315ef2..6ca7cd237264 100644
--- a/nemo/collections/asr/parts/numba/rnnt_loss/utils/rnnt_helper.py
+++ b/nemo/collections/asr/parts/numba/rnnt_loss/utils/rnnt_helper.py
@@ -30,6 +30,7 @@
 import math
 from typing import Optional, Tuple
 
+import numba
 import torch
 from numba import cuda
 
@@ -112,7 +113,7 @@ def compute_costs_data(source: torch.Tensor, dest: torch.Tensor, fastemit_lambda
     if idx < length:
         copy_data_1d(source, dest, idx)
         dest[idx] *= -1.0
-        dest[idx] *= 1.0 + fastemit_lambda
+        dest[idx] *= numba.float32(1.0 + fastemit_lambda)
 
 
 def get_workspace_size(
diff --git a/nemo/core/utils/numba_utils.py b/nemo/core/utils/numba_utils.py
index 6e1a8cb247d6..04010a2f7db4 100644
--- a/nemo/core/utils/numba_utils.py
+++ b/nemo/core/utils/numba_utils.py
@@ -17,6 +17,8 @@
 import operator
 import os
 
+from typing import Tuple, Union
+
 from nemo.utils import model_utils
 
 # Prevent Numba CUDA logs from showing at info level
@@ -26,6 +28,11 @@
 __NUMBA_DEFAULT_MINIMUM_VERSION__ = "0.53.0"
 __NUMBA_MINIMUM_VERSION__ = os.environ.get("NEMO_NUMBA_MINVER", __NUMBA_DEFAULT_MINIMUM_VERSION__)
 
+__NUMBA_MINIMUM_VERSION_FP16_SUPPORTED__ = "0.57.0"
+NUMBA_FP16_SUPPORTED = model_utils.check_lib_version(
+    'numba', __NUMBA_MINIMUM_VERSION_FP16_SUPPORTED__, operator=operator.ge
+)[0]
+
 
 NUMBA_INSTALLATION_MESSAGE = (
     "Could not import `numba`.\n"
@@ -148,6 +155,35 @@ def numba_cuda_is_supported(min_version: str) -> bool:
return False +def is_numba_cuda_fp16_supported(return_reason: bool = False) -> Union[bool, Tuple[bool, str]]: + """ + Utility method that returns a bool, stating if FP16 is supported for numba cuda kernels or not. + + Returns: + bool, whether Numba CUDA will support fp16 or not. + """ + reason = "" + use_nvidia_binding = os.environ.get('NUMBA_CUDA_USE_NVIDIA_BINDING', None) + if use_nvidia_binding is not None: + use_nvidia_binding = use_nvidia_binding.lower() == "1" + reason += "Env variable `NUMBA_CUDA_USE_NVIDIA_BINDING` is available and set to `1`. " + else: + use_nvidia_binding = False + reason += "Env variable `NUMBA_CUDA_USE_NVIDIA_BINDING` is not available or has not set to `1`." + + if NUMBA_FP16_SUPPORTED: + reason += f"Numba CUDA FP16 is supported in installed numba version." + else: + reason += f"Numba CUDA FP16 is not supported in installed numba version." + + result = use_nvidia_binding and NUMBA_FP16_SUPPORTED + + if return_reason: + return result, reason + else: + return result + + def skip_numba_cuda_test_if_unsupported(min_version: str): """ Helper method to skip pytest test case if numba cuda is not supported. diff --git a/tests/collections/asr/numba/rnnt_loss/test_rnnt_pytorch.py b/tests/collections/asr/numba/rnnt_loss/test_rnnt_pytorch.py index 3fbfcf6df54b..1a29a14f540d 100644 --- a/tests/collections/asr/numba/rnnt_loss/test_rnnt_pytorch.py +++ b/tests/collections/asr/numba/rnnt_loss/test_rnnt_pytorch.py @@ -34,9 +34,14 @@ DEVICES.append('cuda') +DTYPES = [np.float32] +if numba_utils.is_numba_cuda_fp16_supported(): + DTYPES.append(np.float16) + + def wrap_and_call(fn, acts, labels, device): if not torch.is_tensor(acts): - acts = torch.FloatTensor(acts) + acts = torch.tensor(acts) if 'cuda' in device: acts = acts.cuda() @@ -72,7 +77,8 @@ def wrap_and_call(fn, acts, labels, device): class TestRNNTLossPytorch: @pytest.mark.unit @pytest.mark.parametrize('device', DEVICES) - def test_case_small(self, device): + @pytest.mark.parametrize('dtype', DTYPES) + def test_case_small(self, device, dtype): if device == 'cuda': numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) @@ -83,9 +89,13 @@ def test_case_small(self, device): [[0.1, 0.6, 0.1, 0.1, 0.1], [0.1, 0.1, 0.2, 0.1, 0.1], [0.7, 0.1, 0.2, 0.1, 0.1]], ] ] - ) + ).astype(dtype) labels = [[1, 2]] + cost_threshold = 1e-8 if dtype == np.float32 else 5e-4 + grad_threshold = 1e-8 if dtype == np.float32 else 1e-4 + rtol = 1e-5 if dtype == np.float32 else 1e-3 + fn_pt = RNNTLossNumba(blank=0, reduction='sum') pt_cost, pt_grads = wrap_and_call(fn_pt, acts, labels, device) @@ -113,23 +123,28 @@ def test_case_small(self, device): ] ) - assert np.allclose(pt_cost, expected_cost, rtol=1e-6), "small_test costs mismatch." - assert np.allclose(pt_grads, expected_grads), "small_test gradient mismatch." + assert np.allclose(pt_cost, expected_cost, atol=cost_threshold, rtol=1e-6), "small_test costs mismatch." + assert np.allclose(pt_grads, expected_grads, atol=grad_threshold, rtol=rtol), "small_test gradient mismatch." - assert np.allclose(pt_cost, np_cost, rtol=1e-6), "small_test costs mismatch." - assert np.allclose(pt_grads, np_grads), "small_test gradient mismatch." + assert np.allclose(pt_cost, np_cost, atol=cost_threshold, rtol=rtol), "small_test costs mismatch." + assert np.allclose(pt_grads, np_grads, atol=grad_threshold, rtol=rtol), "small_test gradient mismatch." - assert np.allclose(ag_cost, np_cost, rtol=1e-6), "small_test costs mismatch." 
- assert np.allclose(ag_grads, np_grads), "small_test gradient mismatch." + assert np.allclose(ag_cost, np_cost, atol=cost_threshold, rtol=rtol), "small_test costs mismatch." + assert np.allclose(ag_grads, np_grads, atol=cost_threshold, rtol=rtol), "small_test gradient mismatch." @pytest.mark.unit @pytest.mark.parametrize('device', DEVICES) - def test_case_small_random(self, device): + @pytest.mark.parametrize('dtype', DTYPES) + def test_case_small_random(self, device, dtype): if device == 'cuda': numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) + cost_threshold = 1e-8 if dtype == np.float32 else 5e-4 + grad_threshold = 1e-8 if dtype == np.float32 else 1e-4 + rtol = 1e-5 if dtype == np.float32 else 1e-3 + rng = np.random.RandomState(0) - acts = rng.randn(1, 4, 3, 3) + acts = rng.randn(1, 4, 3, 3).astype(dtype) labels = [[1, 2]] fn_pt = RNNTLossNumba(blank=0, reduction='sum') @@ -141,16 +156,17 @@ def test_case_small_random(self, device): fn_ag = RNNTLossPytorch(blank=0, reduction='sum') # ag for automatic gradient computation ag_cost, ag_grads = wrap_and_call(fn_ag, acts, labels, device) - assert np.allclose(pt_cost, np_cost, rtol=1e-6), "small_random_test costs mismatch." - assert np.allclose(pt_grads, np_grads), "small_random_test gradient mismatch." + assert np.allclose(pt_cost, np_cost, atol=cost_threshold, rtol=rtol), "small_random_test costs mismatch." + assert np.allclose(pt_grads, np_grads, atol=grad_threshold, rtol=rtol), "small_random_test gradient mismatch." - assert np.allclose(pt_cost, ag_cost, rtol=1e-6), "small_random_test costs mismatch." - assert np.allclose(pt_grads, ag_grads), "small_random_test gradient mismatch." + assert np.allclose(pt_cost, ag_cost, atol=cost_threshold, rtol=rtol), "small_random_test costs mismatch." + assert np.allclose(pt_grads, ag_grads, atol=grad_threshold, rtol=rtol), "small_random_test gradient mismatch." @pytest.mark.unit @pytest.mark.parametrize('device', DEVICES) + @pytest.mark.parametrize('dtype', DTYPES) @pytest.mark.parametrize('fastemit_lambda', [1.0, 0.01, 0.00001]) - def test_case_small_random_fastemit_reg(self, device, fastemit_lambda): + def test_case_small_random_fastemit_reg(self, device, dtype, fastemit_lambda): if device == 'cuda': numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) @@ -165,11 +181,12 @@ def test_case_small_random_fastemit_reg(self, device, fastemit_lambda): np_cost, np_grads = wrap_and_call(fn_np, acts, labels, device) assert np.allclose(pt_cost, np_cost, rtol=1e-6), "small_random_test costs mismatch." - assert np.allclose(pt_grads, np_grads, atol=1e-5, rtol=1e-5), "small_random_test gradient mismatch." + assert np.allclose(pt_grads, np_grads, rtol=1e-5), "small_random_test gradient mismatch." 
@pytest.mark.unit @pytest.mark.parametrize('device', DEVICES) - def test_case_big_tensor(self, device): + @pytest.mark.parametrize('dtype', DTYPES) + def test_case_big_tensor(self, device, dtype): if device == 'cuda': numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) @@ -269,9 +286,13 @@ def test_case_big_tensor(self, device): ], ] - activations = np.array(activations) + activations = np.array(activations).astype(dtype) labels = [[1, 2], [1, 1]] + cost_threshold = 1e-8 if dtype == np.float32 else 5e-4 + grad_threshold = 1e-8 if dtype == np.float32 else 1e-4 + rtol = 1e-3 if dtype == np.float32 else 0.1 + fn_pt = RNNTLossNumba(blank=0, reduction='sum') pt_costs, pt_grads = wrap_and_call(fn_pt, activations, labels, device) @@ -281,23 +302,30 @@ def test_case_big_tensor(self, device): fn_ag = RNNTLossPytorch(blank=0, reduction='sum') ag_costs, ag_grads = wrap_and_call(fn_ag, activations, labels, device) - assert np.allclose(pt_costs, sum(expected_costs)), "big_test average costs mismatch." - assert np.allclose(pt_grads, expected_grads, rtol=1e-3), "big_test grads for average cost mismatch." + assert np.allclose(pt_costs, sum(expected_costs), atol=cost_threshold), "big_test average costs mismatch." + assert np.allclose( + pt_grads, expected_grads, atol=grad_threshold, rtol=1e-3 + ), "big_test grads for average cost mismatch." - assert np.allclose(pt_costs, np_costs), "big_test average costs mismatch." - assert np.allclose(pt_grads, np_grads, rtol=1e-3), "big_test grads for average cost mismatch." + assert np.allclose(pt_costs, np_costs, atol=cost_threshold, rtol=rtol), "big_test average costs mismatch." + assert np.allclose( + pt_grads, np_grads, atol=grad_threshold, rtol=rtol + ), "big_test grads for average cost mismatch." - assert np.allclose(pt_costs, ag_costs), "big_test average costs mismatch." - assert np.allclose(pt_grads, ag_grads, rtol=1e-3), "big_test grads for average cost mismatch." + assert np.allclose(pt_costs, ag_costs, atol=cost_threshold, rtol=rtol), "big_test average costs mismatch." + assert np.allclose( + pt_grads, ag_grads, atol=grad_threshold, rtol=rtol + ), "big_test grads for average cost mismatch." @pytest.mark.unit @pytest.mark.parametrize('device', DEVICES) - def test_case_large_random(self, device): + @pytest.mark.parametrize('dtype', DTYPES) + def test_case_large_random(self, device, dtype): if device == 'cuda': numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) rng = np.random.RandomState(0) - acts = rng.randn(4, 8, 11, 5) + acts = rng.randn(4, 8, 11, 5).astype(dtype) labels = [ [1, 2, 4, 3, 2, 2, 1, 1, 1, 1], [3, 2, 2, 3, 4, 1, 1, 1, 1, 1], @@ -305,6 +333,10 @@ def test_case_large_random(self, device): [1, 1, 2, 1, 2, 3, 3, 1, 1, 1], ] + cost_threshold = 1e-8 if dtype == np.float32 else 5e-4 + grad_threshold = 1e-8 if dtype == np.float32 else 1e-4 + rtol = 1e-3 if dtype == np.float32 else 5e-2 + fn_pt = RNNTLossNumba(blank=0, reduction='sum') pt_cost, pt_grads = wrap_and_call(fn_pt, acts, labels, device) @@ -314,14 +346,15 @@ def test_case_large_random(self, device): fn_ag = RNNTLossPytorch(blank=0, reduction='sum') ag_cost, ag_grads = wrap_and_call(fn_ag, acts, labels, device) - assert np.allclose(pt_cost, np_cost, atol=1e-5, rtol=1e-3), "large_random_test costs mismatch." - assert np.allclose(ag_cost, np_cost, atol=1e-5, rtol=1e-3), "large_random_test costs mismatch." - assert np.allclose(pt_grads, np_grads, atol=1e-5, rtol=1e-3), "large_random_test gradient mismatch." 
- assert np.allclose(ag_grads, np_grads, atol=1e-5, rtol=1e-3), "large_random_test gradient mismatch." + assert np.allclose(pt_cost, np_cost, atol=cost_threshold, rtol=rtol), "large_random_test costs mismatch." + assert np.allclose(ag_cost, np_cost, atol=cost_threshold, rtol=rtol), "large_random_test costs mismatch." + assert np.allclose(pt_grads, np_grads, atol=grad_threshold, rtol=rtol), "large_random_test gradient mismatch." + assert np.allclose(ag_grads, np_grads, atol=grad_threshold, rtol=rtol), "large_random_test gradient mismatch." @pytest.mark.unit @pytest.mark.parametrize('device', DEVICES) - def test_case_small_clamp(self, device): + @pytest.mark.parametrize('dtype', DTYPES) + def test_case_small_clamp(self, device, dtype): if device == 'cuda': numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) @@ -333,9 +366,13 @@ def test_case_small_clamp(self, device): [[0.1, 0.6, 0.1, 0.1, 0.1], [0.1, 0.1, 0.2, 0.1, 0.1], [0.7, 0.1, 0.2, 0.1, 0.1]], ] ] - ) + ).astype(dtype) labels = [[1, 2]] + cost_threshold = 1e-8 if dtype == np.float32 else 5e-4 + grad_threshold = 1e-8 if dtype == np.float32 else 5e-5 + rtol = 1e-5 if dtype == np.float32 else 1e-3 + fn_pt = RNNTLossNumba(blank=0, reduction='sum', clamp=GRAD_CLAMP) pt_cost, pt_grads = wrap_and_call(fn_pt, acts, labels, device) @@ -360,16 +397,17 @@ def test_case_small_clamp(self, device): ] ) - assert np.allclose(pt_cost, expected_cost, rtol=1e-6), "small_test costs mismatch." - assert np.allclose(pt_grads, expected_grads), "small_test gradient mismatch." + assert np.allclose(pt_cost, expected_cost, atol=cost_threshold, rtol=rtol), "small_test costs mismatch." + assert np.allclose(pt_grads, expected_grads, atol=grad_threshold, rtol=rtol), "small_test gradient mismatch." - assert np.allclose(pt_cost, np_cost, rtol=1e-6), "small_test costs mismatch." - assert np.allclose(pt_grads, np_grads), "small_test gradient mismatch." + assert np.allclose(pt_cost, np_cost, atol=cost_threshold, rtol=rtol), "small_test costs mismatch." + assert np.allclose(pt_grads, np_grads, atol=grad_threshold, rtol=rtol), "small_test gradient mismatch." @pytest.mark.unit @pytest.mark.parametrize('device', DEVICES) + @pytest.mark.parametrize('dtype', DTYPES) @pytest.mark.parametrize('fastemit_lambda', [1.0, 0.01, 0.00001]) - def test_case_small_fastemit_clamp(self, device, fastemit_lambda): + def test_case_small_fastemit_clamp(self, device, dtype, fastemit_lambda): if device == 'cuda': numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) @@ -381,9 +419,13 @@ def test_case_small_fastemit_clamp(self, device, fastemit_lambda): [[0.1, 0.6, 0.1, 0.1, 0.1], [0.1, 0.1, 0.2, 0.1, 0.1], [0.7, 0.1, 0.2, 0.1, 0.1]], ] ] - ) + ).astype(dtype) labels = [[1, 2]] + cost_threshold = 1e-8 if dtype == np.float32 else 1e-3 + grad_threshold = 1e-8 if dtype == np.float32 else 5e-4 + rtol = 1e-5 if dtype == np.float32 else 1e-3 + fn_pt = RNNTLossNumba(blank=0, reduction='sum', fastemit_lambda=fastemit_lambda, clamp=GRAD_CLAMP) pt_cost, pt_grads = wrap_and_call(fn_pt, acts, labels, device) @@ -393,9 +435,9 @@ def test_case_small_fastemit_clamp(self, device, fastemit_lambda): expected_cost = 4.495666 expected_cost += expected_cost * fastemit_lambda - assert np.allclose(pt_cost, expected_cost, rtol=1e-6), "small_test costs mismatch." - assert np.allclose(pt_cost, np_cost, rtol=1e-6), "small_test costs mismatch." - assert np.allclose(pt_grads, np_grads), "small_test gradient mismatch." 
+ assert np.allclose(pt_cost, expected_cost, atol=cost_threshold, rtol=rtol), "small_test costs mismatch." + assert np.allclose(pt_cost, np_cost, atol=cost_threshold, rtol=rtol), "small_test costs mismatch." + assert np.allclose(pt_grads, np_grads, atol=grad_threshold, rtol=rtol), "small_test gradient mismatch." @pytest.mark.unit @pytest.mark.parametrize('device', DEVICES) diff --git a/tests/collections/asr/numba/rnnt_loss/utils/test_gpu_rnnt_kernel.py b/tests/collections/asr/numba/rnnt_loss/utils/test_gpu_rnnt_kernel.py index 230b6b7c099f..cb5a9816e237 100644 --- a/tests/collections/asr/numba/rnnt_loss/utils/test_gpu_rnnt_kernel.py +++ b/tests/collections/asr/numba/rnnt_loss/utils/test_gpu_rnnt_kernel.py @@ -25,8 +25,14 @@ from nemo.core.utils.numba_utils import __NUMBA_MINIMUM_VERSION__ +DTYPES = [torch.float32] +if numba_utils.is_numba_cuda_fp16_supported(): + DTYPES.append(torch.float16) + + def log_softmax(x, axis=-1): x = torch.from_numpy(x) # zero-copy + x = x.float() x = torch.log_softmax(x, dim=axis) x = x.numpy() return x @@ -42,12 +48,14 @@ def log_softmax_grad(x, axis=-1): class TestRNNTCUDAKernels: @pytest.mark.skipif(not cuda.is_available(), reason="CUDA Reductions can only be run when CUDA is available") @pytest.mark.unit - def test_compute_alphas_kernel(self): + @pytest.mark.parametrize('dtype', DTYPES) + def test_compute_alphas_kernel(self, dtype): numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) random = np.random.RandomState(0) original_shape = [1, 5, 11, 3] B, T, U, V = original_shape + threshold = 1e-5 if dtype == torch.float32 else 3e-4 # Numpy kernel x = random.randn(*original_shape) @@ -67,7 +75,7 @@ def test_compute_alphas_kernel(self): else: stream = cuda.default_stream() - x_c = torch.tensor(x, device=device, dtype=torch.float32) + x_c = torch.tensor(x, device=device, dtype=dtype) labels_c = torch.tensor(labels, device=device, dtype=torch.int64) # Allocate workspace memory @@ -100,22 +108,24 @@ def test_compute_alphas_kernel(self): alphas = alphas.view([B, T, U]) diff = ground_alphas - alphas[0].cpu().numpy() - assert np.abs(diff).mean() <= 1e-5 - assert np.square(diff).mean() <= 1e-10 + assert np.abs(diff).mean() <= threshold + assert np.square(diff).mean() <= (threshold ** 2) ll_diff = ground_log_likelihood - llForward[0].cpu().numpy() - assert np.abs(ll_diff).mean() <= 1e-5 - assert np.square(ll_diff).mean() <= 1e-10 + assert np.abs(ll_diff).mean() <= threshold + assert np.square(ll_diff).mean() <= (threshold ** 2) @pytest.mark.skipif(not cuda.is_available(), reason="CUDA Reductions can only be run when CUDA is available") @pytest.mark.unit - def test_compute_betas_kernel(self): + @pytest.mark.parametrize('dtype', DTYPES) + def test_compute_betas_kernel(self, dtype): numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) random = np.random.RandomState(0) original_shape = [1, 5, 11, 3] B, T, U, V = original_shape + threshold = 1e-5 if dtype == torch.float32 else 3e-4 # Numpy kernel x = random.randn(*original_shape) @@ -135,7 +145,7 @@ def test_compute_betas_kernel(self): else: stream = cuda.default_stream() - x_c = torch.tensor(x, device=device, dtype=torch.float32) + x_c = torch.tensor(x, device=device, dtype=dtype) labels_c = torch.tensor(labels, device=device, dtype=torch.int64) # Allocate workspace memory @@ -168,17 +178,18 @@ def test_compute_betas_kernel(self): betas = betas.view([B, T, U]) diff = ground_alphas - betas[0].cpu().numpy() - assert np.abs(diff).mean() <= 1e-5 - assert np.square(diff).mean() <= 
1e-10 + assert np.abs(diff).mean() <= threshold + assert np.square(diff).mean() <= (threshold ** 2) ll_diff = ground_log_likelihood - llBackward[0].cpu().numpy() - assert np.abs(ll_diff).mean() <= 1e-5 - assert np.square(ll_diff).mean() <= 1e-10 + assert np.abs(ll_diff).mean() <= threshold + assert np.square(ll_diff).mean() <= (threshold ** 2) @pytest.mark.skipif(not cuda.is_available(), reason="CUDA Reductions can only be run when CUDA is available") @pytest.mark.unit - def test_compute_grads_kernel(self): + @pytest.mark.parametrize('dtype', DTYPES) + def test_compute_grads_kernel(self, dtype): numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) fastemit_lambda = 0.0 @@ -187,6 +198,7 @@ def test_compute_grads_kernel(self): random = np.random.RandomState(0) original_shape = [1, 5, 11, 3] B, T, U, V = original_shape + threshold = 1e-5 if dtype == torch.float32 else 3e-5 # Numpy kernel x = random.randn(*original_shape) @@ -220,7 +232,7 @@ def test_compute_grads_kernel(self): else: stream = cuda.default_stream() - x_c = torch.tensor(x, device=device, dtype=torch.float32) + x_c = torch.tensor(x, device=device, dtype=dtype) labels_c = labels.clone().to(device=device, dtype=torch.int64) # Allocate workspace memory @@ -283,12 +295,13 @@ def test_compute_grads_kernel(self): grads = grads.view([B, T, U, V]) diff = true_grads - grads[0].cpu().numpy() - assert np.abs(diff).mean() <= 1e-5 - assert np.square(diff).mean() <= 1e-10 + assert np.abs(diff).mean() <= threshold + assert np.square(diff).mean() <= (threshold ** 2) * 5.0 @pytest.mark.skipif(not cuda.is_available(), reason="CUDA Reductions can only be run when CUDA is available") @pytest.mark.unit - def test_compute_grads_kernel_fastemit(self): + @pytest.mark.parametrize('dtype', DTYPES) + def test_compute_grads_kernel_fastemit(self, dtype): numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) fastemit_lambda = 0.001 @@ -297,6 +310,7 @@ def test_compute_grads_kernel_fastemit(self): random = np.random.RandomState(0) original_shape = [1, 5, 11, 3] B, T, U, V = original_shape + threshold = 1e-5 if dtype == torch.float32 else 3e-5 # Numpy kernel x = random.randn(*original_shape) @@ -330,7 +344,7 @@ def test_compute_grads_kernel_fastemit(self): else: stream = cuda.default_stream() - x_c = torch.tensor(x, device=device, dtype=torch.float32) + x_c = torch.tensor(x, device=device, dtype=dtype) labels_c = labels.clone().to(device=device, dtype=torch.int64) # Allocate workspace memory @@ -393,12 +407,13 @@ def test_compute_grads_kernel_fastemit(self): grads = grads.view([B, T, U, V]) diff = true_grads - grads[0].cpu().numpy() - assert np.abs(diff).mean() <= 1e-5 - assert np.square(diff).mean() <= 1e-10 + assert np.abs(diff).mean() <= threshold + assert np.square(diff).mean() <= (threshold ** 2) * 5 @pytest.mark.skipif(not cuda.is_available(), reason="CUDA Reductions can only be run when CUDA is available") @pytest.mark.unit - def test_compute_grads_kernel_clamp(self): + @pytest.mark.parametrize('dtype', DTYPES) + def test_compute_grads_kernel_clamp(self, dtype): numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) fastemit_lambda = 0.0 @@ -407,6 +422,7 @@ def test_compute_grads_kernel_clamp(self): random = np.random.RandomState(0) original_shape = [1, 5, 11, 3] B, T, U, V = original_shape + threshold = 1e-5 if dtype == torch.float32 else 3e-5 # Numpy kernel x = random.randn(*original_shape) @@ -440,7 +456,7 @@ def test_compute_grads_kernel_clamp(self): else: stream = cuda.default_stream() - 
x_c = torch.tensor(x, device=device, dtype=torch.float32) + x_c = torch.tensor(x, device=device, dtype=dtype) labels_c = labels.clone().to(device=device, dtype=torch.int64) # Allocate workspace memory @@ -503,8 +519,8 @@ def test_compute_grads_kernel_clamp(self): grads = grads.view([B, T, U, V]) diff = true_grads - grads[0].cpu().numpy() - assert np.abs(diff).mean() <= 1e-5 - assert np.square(diff).mean() <= 1e-10 + assert np.abs(diff).mean() <= threshold + assert np.square(diff).mean() <= (threshold ** 2) * 5 class TestTDTCUDAKernels: diff --git a/tests/collections/asr/numba/rnnt_loss/utils/test_reduce.py b/tests/collections/asr/numba/rnnt_loss/utils/test_reduce.py index 7c2ba6a41208..5994d53e1d8f 100644 --- a/tests/collections/asr/numba/rnnt_loss/utils/test_reduce.py +++ b/tests/collections/asr/numba/rnnt_loss/utils/test_reduce.py @@ -20,17 +20,22 @@ from nemo.core.utils import numba_utils from nemo.core.utils.numba_utils import __NUMBA_MINIMUM_VERSION__ +DTYPES = [np.float32] +if numba_utils.is_numba_cuda_fp16_supported(): + DTYPES.append(np.float16) + class TestRNNTCUDAReductions: @pytest.mark.skipif(not cuda.is_available(), reason="CUDA Reductions can only be run when CUDA is available") @pytest.mark.unit - def test_reduce_max(self): + @pytest.mark.parametrize('dtype', DTYPES) + def test_reduce_max(self, dtype): numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) random = np.random.RandomState(0) original_shape = [1, 5, 4, 3] - x = random.randn(*original_shape).reshape([-1]) - dx = random.randn(*x.shape) + x = random.randn(*original_shape).reshape([-1]).astype(dtype) + dx = random.randn(*x.shape).astype(dtype) stream = cuda.stream() x_c = cuda.to_device(x, stream=stream) @@ -53,13 +58,14 @@ def test_reduce_max(self): @pytest.mark.skipif(not cuda.is_available(), reason="CUDA Reductions can only be run when CUDA is available") @pytest.mark.unit - def test_reduce_exp(self): + @pytest.mark.parametrize('dtype', DTYPES) + def test_reduce_exp(self, dtype): numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) random = np.random.RandomState(0) original_shape = [1, 5, 4, 2] - x = random.randn(*original_shape).reshape([-1]) - dx = np.zeros_like(x) + x = random.randn(*original_shape).reshape([-1]).astype(dtype) + dx = np.zeros_like(x).astype(dtype) stream = cuda.stream() x_c = cuda.to_device(x, stream=stream) diff --git a/tests/collections/asr/numba/rnnt_loss/utils/test_rnnt_helper.py b/tests/collections/asr/numba/rnnt_loss/utils/test_rnnt_helper.py index 243fe727e172..08f12da8324d 100644 --- a/tests/collections/asr/numba/rnnt_loss/utils/test_rnnt_helper.py +++ b/tests/collections/asr/numba/rnnt_loss/utils/test_rnnt_helper.py @@ -20,11 +20,16 @@ from nemo.core.utils import numba_utils from nemo.core.utils.numba_utils import __NUMBA_MINIMUM_VERSION__ +DTYPES = [np.float32] +if numba_utils.is_numba_cuda_fp16_supported(): + DTYPES.append(np.float16) + class TestRNNTHelper: @pytest.mark.skipif(not cuda.is_available(), reason="CUDA Helpers can only be run when CUDA is available") @pytest.mark.unit - def test_log_sum_exp(self): + @pytest.mark.parametrize('dtype', DTYPES) + def test_log_sum_exp(self, dtype): numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) # wrapper kernel for device function that is tested @@ -34,8 +39,9 @@ def _kernel(x, y): if x_pos < x.shape[0] and x_pos < y.shape[0]: x[x_pos] = rnnt_helper.log_sum_exp(x[x_pos], y[x_pos]) - x = np.zeros([8]) # np.random.rand(8192) - y = np.ones([8]) # np.random.rand(8192) + x = 
np.zeros([8]).astype(dtype) # np.random.rand(8192) + y = np.ones([8]).astype(dtype) # np.random.rand(8192) + threshold = 1e-5 if dtype == np.float32 else 2e-3 stream = cuda.stream() x_c = cuda.to_device(x, stream=stream) @@ -52,11 +58,12 @@ def _kernel(x, y): x_new = x_c.copy_to_host(stream=stream) del x_c, y_c - assert (x_new.sum() - 10.506093500145782) <= 1e-5 + assert (x_new.sum() - 10.506093500145782) <= threshold @pytest.mark.skipif(not cuda.is_available(), reason="CUDA Helpers can only be run when CUDA is available") @pytest.mark.unit - def test_log_sum_exp_neg_inf(self): + @pytest.mark.parametrize('dtype', DTYPES) + def test_log_sum_exp_neg_inf(self, dtype): numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) # wrapper kernel for device function that is tested @@ -66,8 +73,8 @@ def _kernel(x, y): if x_pos < x.shape[0] and x_pos < y.shape[0]: x[x_pos] = rnnt_helper.log_sum_exp(x[x_pos], y[x_pos]) - x = np.asarray([global_constants.FP32_NEG_INF] * 8) - y = np.ones([len(x)]) + x = np.asarray([global_constants.FP32_NEG_INF] * 8).astype(dtype) + y = np.ones([len(x)]).astype(dtype) stream = cuda.stream() x_c = cuda.to_device(x, stream=stream) @@ -88,7 +95,8 @@ def _kernel(x, y): @pytest.mark.skipif(not cuda.is_available(), reason="CUDA Helpers can only be run when CUDA is available") @pytest.mark.unit - def test_div_up(self): + @pytest.mark.parametrize('dtype', DTYPES) + def test_div_up(self, dtype): numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) # wrapper kernel for device function that is tested @@ -98,8 +106,8 @@ def _kernel(x, y): if x_pos < x.shape[0] and x_pos < y.shape[0]: x[x_pos] = rnnt_helper.div_up(x[x_pos], y[x_pos]) - x = np.full([8], fill_value=10) # np.random.rand(8192) - y = np.full([8], fill_value=2) # np.random.rand(8192) + x = np.full([8], fill_value=10).astype(dtype) # np.random.rand(8192) + y = np.full([8], fill_value=2).astype(dtype) # np.random.rand(8192) stream = cuda.stream() x_c = cuda.to_device(x, stream=stream) @@ -121,7 +129,8 @@ def _kernel(x, y): @pytest.mark.skipif(not cuda.is_available(), reason="CUDA Helpers can only be run when CUDA is available") @pytest.mark.unit - def test_add(self): + @pytest.mark.parametrize('dtype', DTYPES) + def test_add(self, dtype): numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) # wrapper kernel for device function that is tested @@ -131,8 +140,8 @@ def _kernel(x, y): if x_pos < x.shape[0] and x_pos < y.shape[0]: x[x_pos] = rnnt_helper.add(x[x_pos], y[x_pos]) - x = np.full([8], fill_value=10) # np.random.rand(8192) - y = np.full([8], fill_value=2) # np.random.rand(8192) + x = np.full([8], fill_value=10).astype(dtype) # np.random.rand(8192) + y = np.full([8], fill_value=2).astype(dtype) # np.random.rand(8192) stream = cuda.stream() x_c = cuda.to_device(x, stream=stream) @@ -154,7 +163,8 @@ def _kernel(x, y): @pytest.mark.skipif(not cuda.is_available(), reason="CUDA Helpers can only be run when CUDA is available") @pytest.mark.unit - def test_maximum(self): + @pytest.mark.parametrize('dtype', DTYPES) + def test_maximum(self, dtype): numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) # wrapper kernel for device function that is tested @@ -164,8 +174,8 @@ def _kernel(x, y): if x_pos < x.shape[0] and x_pos < y.shape[0]: x[x_pos] = rnnt_helper.maximum(x[x_pos], y[x_pos]) - x = np.full([8], fill_value=10) # np.random.rand(8192) - y = np.full([8], fill_value=2) # np.random.rand(8192) + x = np.full([8], fill_value=10).astype(dtype) # 
np.random.rand(8192) + y = np.full([8], fill_value=2).astype(dtype) # np.random.rand(8192) stream = cuda.stream() x_c = cuda.to_device(x, stream=stream) @@ -187,7 +197,8 @@ def _kernel(x, y): @pytest.mark.skipif(not cuda.is_available(), reason="CUDA Helpers can only be run when CUDA is available") @pytest.mark.unit - def test_identity(self): + @pytest.mark.parametrize('dtype', DTYPES) + def test_identity(self, dtype): numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) # wrapper kernel for device function that is tested @@ -197,7 +208,7 @@ def _kernel(x): if x_pos < x.shape[0]: x[x_pos] = rnnt_helper.identity(x[x_pos]) - x = np.full([8], fill_value=10) # np.random.rand(8192) + x = np.full([8], fill_value=10).astype(dtype) # np.random.rand(8192) stream = cuda.stream() x_c = cuda.to_device(x, stream=stream) @@ -218,7 +229,8 @@ def _kernel(x): @pytest.mark.skipif(not cuda.is_available(), reason="CUDA Helpers can only be run when CUDA is available") @pytest.mark.unit - def test_negate(self): + @pytest.mark.parametrize('dtype', [np.float32, np.float16]) + def test_negate(self, dtype): numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) # wrapper kernel for device function that is tested @@ -228,7 +240,7 @@ def _kernel(x): if x_pos < x.shape[0]: x[x_pos] = rnnt_helper.negate(x[x_pos]) - x = np.full([8], fill_value=10) # np.random.rand(8192) + x = np.full([8], fill_value=10).astype(dtype) # np.random.rand(8192) stream = cuda.stream() x_c = cuda.to_device(x, stream=stream) @@ -249,7 +261,8 @@ def _kernel(x): @pytest.mark.skipif(not cuda.is_available(), reason="CUDA Helpers can only be run when CUDA is available") @pytest.mark.unit - def test_exponential(self): + @pytest.mark.parametrize('dtype', DTYPES) + def test_exponential(self, dtype): numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) # wrapper kernel for device function that is tested @@ -259,7 +272,7 @@ def _kernel(x): if x_pos < x.shape[0]: x[x_pos] = rnnt_helper.exponential(x[x_pos]) - x = np.random.rand(8) + x = np.random.rand(8).astype(dtype) stream = cuda.stream() x_c = cuda.to_device(x, stream=stream) @@ -281,7 +294,8 @@ def _kernel(x): @pytest.mark.skipif(not cuda.is_available(), reason="CUDA Helpers can only be run when CUDA is available") @pytest.mark.unit - def test_log_plus(self): + @pytest.mark.parametrize('dtype', DTYPES) + def test_log_plus(self, dtype): numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) # wrapper kernel for device function that is tested @@ -291,8 +305,8 @@ def _kernel(x, y): if x_pos < x.shape[0] and x_pos < y.shape[0]: x[x_pos] = rnnt_helper.log_plus(x[x_pos], y[x_pos]) - x = np.full([8], fill_value=10.0) # np.random.rand(8192) - y = np.full([8], fill_value=2.0) # np.random.rand(8192) + x = np.full([8], fill_value=10.0).astype(dtype) # np.random.rand(8192) + y = np.full([8], fill_value=2.0).astype(dtype) # np.random.rand(8192) stream = cuda.stream() x_c = cuda.to_device(x, stream=stream) @@ -317,12 +331,15 @@ def _kernel(x, y): @pytest.mark.skipif(not cuda.is_available(), reason="CUDA Helpers can only be run when CUDA is available") @pytest.mark.parametrize('batch_size', [8, 128, 256]) @pytest.mark.parametrize('fastemit_lambda', [0.0, 0.001]) + @pytest.mark.parametrize('dtype', DTYPES) @pytest.mark.unit - def test_compute_costs_data(self, batch_size, fastemit_lambda): + def test_compute_costs_data(self, batch_size, fastemit_lambda, dtype): numba_utils.skip_numba_cuda_test_if_unsupported(__NUMBA_MINIMUM_VERSION__) + 
np.random.seed(0) x = np.full([batch_size], fill_value=0.0) # np.random.rand(8192) - y = np.random.randn(batch_size) # np.random.rand(8192) + y = np.random.randn(batch_size).astype(dtype) # np.random.rand(8192) + threshold = 1e-5 if dtype == np.float32 else 1e-5 stream = cuda.stream() x_c = cuda.to_device(x, stream=stream) @@ -340,11 +357,11 @@ def test_compute_costs_data(self, batch_size, fastemit_lambda): x_new = x_c.copy_to_host(stream=stream) del x_c, y_c - res = -(y.copy()) + res = -(y.astype(np.float32).copy()) res *= 1.0 + fastemit_lambda for i in range(len(x_new)): - assert x_new[i] == res[i], f"index failed {i}" + assert abs(x_new[i] - res[i]) < threshold, f"index failed {i}" if __name__ == '__main__':
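
Taken together, the changes above gate the Numba CUDA fp16 kernel on numba_utils.is_numba_cuda_fp16_supported() while keeping the workspace and the per-batch costs in fp32 (grads are still allocated with torch.zeros_like(acts), so they follow the input dtype). Below is a minimal usage sketch, not part of the patch, that mirrors the test_case_small pattern from the test diff; the RNNTLossNumba import path and the int64 dtypes chosen for labels and lengths are assumptions for illustration and are not taken from this diff.

    import torch

    from nemo.core.utils import numba_utils
    # Assumed import path for illustration; adjust to wherever RNNTLossNumba is exposed in the installed NeMo.
    from nemo.collections.asr.parts.numba.rnnt_loss import RNNTLossNumba

    if torch.cuda.is_available() and numba_utils.is_numba_cuda_fp16_supported():
        # Joint tensor layout [B, T, U+1, V+1], as in the small test case above.
        B, T, U, V = 1, 2, 3, 5
        acts = torch.randn(B, T, U, V, device='cuda', dtype=torch.float16, requires_grad=True)
        labels = torch.tensor([[1, 2]], device='cuda', dtype=torch.int64)    # integer dtype assumed
        act_lens = torch.tensor([T], device='cuda', dtype=torch.int64)       # integer dtype assumed
        label_lens = torch.tensor([2], device='cuda', dtype=torch.int64)     # integer dtype assumed

        loss_fn = RNNTLossNumba(blank=0, reduction='sum')
        loss = loss_fn(acts, labels, act_lens, label_lens)  # Numba CUDA kernel consumes the fp16 joint
        loss.backward()  # gradients match acts' dtype; costs and workspace remain fp32 per this patch
    # If fp16 is not supported, the RNNTLoss wrapper in rnnt.py upcasts the joint to fp32
    # and logs the reason once via is_numba_cuda_fp16_supported(return_reason=True).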