From 026e363fa6c55e8e0434ee29e8929b58968d2998 Mon Sep 17 00:00:00 2001
From: Matvei Novikov
Date: Tue, 8 Aug 2023 07:50:02 +0400
Subject: [PATCH] T5 metrics fix (#7037)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Fix race condition when executing with multi-node where some ranks do not wait for setup (#7016)
Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Signed-off-by: jubick1337

* Added bool types to neural_types export (#7032)
Signed-off-by: tbartley94
Signed-off-by: jubick1337

* rnnt and char utils (#6971)
* rnnt_ngram_merge
Signed-off-by: Nikolay Karpov
* char level bug
Signed-off-by: Nikolay Karpov
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Nikolay Karpov
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar
Signed-off-by: jubick1337

* fix tab text gen (#7022) (#7031)
Signed-off-by: Yi Dong
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Signed-off-by: jubick1337

* Fixed kwargs for metric instance init
Signed-off-by: jubick1337

* Fixed kwargs for metric instance init
Signed-off-by: jubick1337

* removed kwargs
Signed-off-by: jubick1337

* Updated config desc
Signed-off-by: jubick1337

* ASR Confidence update and tutorial (#6810)
* small fixes and tests
Signed-off-by: Aleksandr Laptev
* various fixes for the tutorial
Signed-off-by: Aleksandr Laptev
* tutorial added
Signed-off-by: Aleksandr Laptev
* fix for a little oops after rebasement
Signed-off-by: Aleksandr Laptev
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix tests
Signed-off-by: Aleksandr Laptev
* unused import removed
Signed-off-by: Aleksandr Laptev
* fix review comments
Signed-off-by: Aleksandr Laptev
* deprecated parameters for greedy configs
Signed-off-by: Aleksandr Laptev
* move re-assigning to configs
Signed-off-by: Aleksandr Laptev
* fix comments 2
Signed-off-by: Aleksandr Laptev
* fix config tests
Signed-off-by: Aleksandr Laptev
* fix ece test (my env was bugged apparently)
Signed-off-by: Aleksandr Laptev
* renamings for confidence ensemble
Signed-off-by: Aleksandr Laptev
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix comments 3
Signed-off-by: Aleksandr Laptev
* return dropped tutorial
Signed-off-by: Aleksandr Laptev
* CI flips back and forth, increasing tolerance
Signed-off-by: Aleksandr Laptev
---------
Signed-off-by: Aleksandr Laptev
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337

* install_bs (#7019) (#7028)
Signed-off-by: Nikolay Karpov
Co-authored-by: Nikolay Karpov
Signed-off-by: jubick1337

* fixes for spellmapper (#6994) (#7000)
Signed-off-by: Alexandra Antonova
Co-authored-by: bene-ges
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Signed-off-by: jubick1337

* added back the retro documents (#7033)
Signed-off-by: Yi Dong
Signed-off-by: jubick1337

* Remove pyyaml (#7052) (#7054)
Signed-off-by: smajumdar
Co-authored-by: Somshubra Majumdar
Signed-off-by: jubick1337

* st standalone model (#6969)
* st standalone model
Signed-off-by: AlexGrinch
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* style fix
Signed-off-by: AlexGrinch
* sacrebleu import fix, unused imports removed
Signed-off-by: AlexGrinch
* import guard for nlp inside asr transformer bpe model
Signed-off-by: AlexGrinch
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* codeql fixes
Signed-off-by: AlexGrinch
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* comments answered
Signed-off-by: AlexGrinch
* import ordering fix
Signed-off-by: AlexGrinch
* yttm for asr removed
Signed-off-by: AlexGrinch
* logging added
Signed-off-by: AlexGrinch
* added inference and translate method
Signed-off-by: AlexGrinch
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: AlexGrinch
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337

* remove pos emb from state dict for old models (#7068)
* remove pos emb from state dict
Signed-off-by: Evelina
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* move to nlp_model
Signed-off-by: Evelina
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update comment
Signed-off-by: Evelina
* fix nmt test
Signed-off-by: Evelina
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix nmt test
Signed-off-by: Evelina
---------
Signed-off-by: Evelina
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337

* Fix typo in ASR-TTS tutorial (#7049)
Signed-off-by: Vladimir Bataev
Signed-off-by: jubick1337

* Fixed tutorial's name (#7047)
Signed-off-by: Vitaly Lavrukhin
Co-authored-by: Vladimir Bataev
Signed-off-by: jubick1337

* Fix documentation for Numba (#7065) (#7077)
* Fix documentation for Numba
* Update force float32 flag dynamically
* Update force float32 flag dynamically
* Fix nemo version
---------
Signed-off-by: smajumdar
Co-authored-by: Somshubra Majumdar
Co-authored-by: Eric Harper
Signed-off-by: jubick1337

* Update Frame-VAD doc and fix onnx export (#7076)
* update fvad doc
Signed-off-by: stevehuang52
* fix typo
Signed-off-by: stevehuang52
* update fvad example
Signed-off-by: stevehuang52
* update
Signed-off-by: stevehuang52
* fix onnx export
Signed-off-by: stevehuang52
* update test
Signed-off-by: stevehuang52
* refactor
Signed-off-by: stevehuang52
* update doc
Signed-off-by: stevehuang52
* update
Signed-off-by: stevehuang52
---------
Signed-off-by: stevehuang52
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Signed-off-by: jubick1337

* memmap worker arg (#7062)
* memmap worker arg
Signed-off-by: arendu
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update
Signed-off-by: arendu
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update
Signed-off-by: arendu
* update
Signed-off-by: arendu
---------
Signed-off-by: arendu
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337

* Fix caching bug in causal convolutions for cache-aware ASR models (#7034) (#7082)
Co-authored-by: Vahid Noroozi
Signed-off-by: jubick1337

* Fast Conformer global token fix (#7085)
* old way
Signed-off-by: sam1373
* fix
Signed-off-by: sam1373
* fix
Signed-off-by: sam1373
* fix
Signed-off-by: sam1373
* remove extra
Signed-off-by: sam1373
* clean
Signed-off-by: sam1373
* clean
Signed-off-by: sam1373
* clean
Signed-off-by: sam1373
* fix
Signed-off-by: sam1373
* fix
Signed-off-by: sam1373
* fix
Signed-off-by: sam1373
* fix
Signed-off-by: sam1373
* fix
Signed-off-by: sam1373
* fix
Signed-off-by: sam1373
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: sam1373
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337

* Refined export_config (#7053) (#7066)
* Refined export_config
* Rolling back hierarchy change
---------
Signed-off-by: Boris Fomitchev
Co-authored-by: Boris Fomitchev
Signed-off-by: jubick1337

* small Bugfix (#7081)
* small Bugfix (#7079)
* fix branch
Signed-off-by: fayejf
* fix typo
Signed-off-by: fayejf
* fix link
Signed-off-by: fayejf
---------
Signed-off-by: fayejf
* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb
Signed-off-by: Somshubra Majumdar
* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb
Signed-off-by: Somshubra Majumdar
---------
Signed-off-by: fayejf
Signed-off-by: Somshubra Majumdar
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar
Signed-off-by: jubick1337

* Added script to extract ASR CTC and RNNT models from ASR hybrid models (#7092)
* Added script to extract ctc and rnnt models from hybrid models
Signed-off-by: Daniel Egert
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Updated hybrid extraction script for review request 1
Signed-off-by: Daniel Egert
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Updated hybrid convert script to remove --cuda flag
Signed-off-by: Daniel Egert
---------
Signed-off-by: Daniel Egert
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar
Signed-off-by: jubick1337

* Adding docs and models for multiple lookahead cache-aware ASR (#7067) (#7094)
Signed-off-by: jubick1337

* update TTS readme (#7088)
* update TTS readme
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
---------
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337

* Fix absolute path in path join call (#7099)
Signed-off-by: Jan Beckmann
Signed-off-by: jubick1337

* Disable distopt contiguous param buffer by default (#7095)
Signed-off-by: Tim Moon
Signed-off-by: jubick1337

* microphone demo (#7110)
Signed-off-by: Linnea Pari Leaver
Co-authored-by: Linnea Pari Leaver
Signed-off-by: jubick1337

* [Fix] load_state_dict in nlp_model.py (#7086)
* Fix load_state_dict in nlp_model.py
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337

* Fix plot function in vad_utils.py (#7113)
Fix plot function in vad_utils.py
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337

* Fixed small bug with NoisePerturbationWithNormalization (#7118)
Signed-off-by: Daniel Egert
Signed-off-by: jubick1337

* Fix import guard checks (#7124)
Signed-off-by: smajumdar
Signed-off-by: jubick1337

* Revert "Fix import guard checks (#7124)" (#7125)
This reverts commit a46e3251944642f9102aa16ce2d2f9d3a804ff8a.
Signed-off-by: jubick1337

* Fix import guard checks (#7126)
* Fix import guard checks
Signed-off-by: smajumdar
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: smajumdar
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337

* Add updated fc ctc and rnnt xxl models (#7128) (#7130)
Signed-off-by: jubick1337

* [TTS] Create EnCodec training recipe (#6852)
* [TTS] Create EnCodec training recipe
Signed-off-by: Ryan
* [TTS] Update encodec recipe
Signed-off-by: Ryan
* [TTS] Rename EnCodec to AudioCodec
Signed-off-by: Ryan
* [TTS] Add EnCodec unit tests
Signed-off-by: Ryan
* [TTS] Add copyright header to distributed.py
Signed-off-by: Ryan
---------
Signed-off-by: Ryan
Signed-off-by: jubick1337

* Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (#7061)
Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Co-authored-by: David
Signed-off-by: jubick1337

* fix default attention size (#7141) (#7143)
Signed-off-by: jubick1337

* fix evaluator.py for various exceptions by ast (#7150)
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337

* [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. (#6893)
* [TTS] add Chinese TTS recipe based on IPA.
* add new pinyin and ipa dictionaries with 36 finals.
* add yaml configs for 24-final pinyin and ipa.
* add copyright header
* add a directory level 24finals to discriminate from 36 finals.
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* unify configs into a single one and add detailed comments providing supported candidates.
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* choose 36-final IPA as default phoneme dict
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
---------
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337

* [TTS] Add output audio format to preprocessing (#6889)
* [TTS] Add output audio format to preprocessing
Signed-off-by: Ryan
* [TTS] Add format validation
Signed-off-by: Ryan
* [TTS] Fix data tutorial
Signed-off-by: Ryan
---------
Signed-off-by: Ryan
Signed-off-by: jubick1337

* freeze (#7152)
Signed-off-by: arendu
Signed-off-by: jubick1337

* make sure any empty segments are removed (#7155)
Signed-off-by: Elena Rastorgueva
Signed-off-by: jubick1337

* Update RIR generation scripts (#6547)
- fix: reduce room size if evaluation of params fails
- added randomized mic placement
- added diffuse noise generation
- added an option to specify the format and subtype for saved audio
Signed-off-by: Ante Jukić
Signed-off-by: jubick1337

* A quickstart speech enhancement tutorial (#6492)
A simple example of training a model for speech enhancement task
Signed-off-by: Ante Jukić
Signed-off-by: jubick1337

* NFA subtitle file config - specify colors and vertical alignment (#7160)
* allow specifying colors of text in ASS subtitle file
Signed-off-by: Elena Rastorgueva
* specify vertical_alignment instead of marginv in ass_file_config
Signed-off-by: Elena Rastorgueva
* add documentation of CTMFileConfig and ASSFileConfig to NFA README
Signed-off-by: Elena Rastorgueva
---------
Signed-off-by: Elena Rastorgueva
Signed-off-by: jubick1337

* Eagerly accumulate embedding grads into fp32 buffer (#6958) (#7153)
Signed-off-by: Tim Moon
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: jubick1337

* TE bug fix (#7027) (#7036)
Signed-off-by: Dmytro Pykhtar
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: jubick1337

* [TTS] Remove nested TTS configs (#7154)
* [TTS] Remove nested TTS configs
Signed-off-by: Ryan
* [TTS] Modify tutorial to support multiple sampling rates
Signed-off-by: Ryan
* [TTS] Clarify min_duration unit
Signed-off-by: Ryan
* [TTS] Default 22.05kHz highfreq to null
Signed-off-by: Ryan
---------
Signed-off-by: Ryan
Signed-off-by: jubick1337

* Merge release r1.20.0 to main (#7167)
* update package info
Signed-off-by: ericharper
* Add ASR with TTS Tutorial. Fix enhancer usage. (#6955)
* Add ASR with TTS Tutorial
* Fix enhancer usage
Signed-off-by: Vladimir Bataev
* install_bs (#7019)
Signed-off-by: Nikolay Karpov
* Fix typo and branch in tutorial (#7048)
Signed-off-by: Vladimir Bataev
* fix syntax error introduced in PR-7079 (#7102)
* fix syntax error introduced in PR-7079
Signed-off-by: Alexandra Antonova
* fixes for pr review
Signed-off-by: Alexandra Antonova
---------
Signed-off-by: Alexandra Antonova
* fix links for TN (#7117)
Signed-off-by: Evelina
* update branch (#7135)
Signed-off-by: ericharper
* Fixed main and merging this to r1.20 (#7127)
* Fixed main and merging this to r1.20
Signed-off-by: Taejin Park
* Update vad_utils.py
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
---------
Signed-off-by: Taejin Park
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
* update branch
Signed-off-by: ericharper
* fix version
Signed-off-by: ericharper
* resolve conflict the other way
Signed-off-by: ericharper
* keep both
Signed-off-by: ericharper
* revert keep both
Signed-off-by: ericharper
---------
Signed-off-by: ericharper
Signed-off-by: Vladimir Bataev
Signed-off-by: Nikolay Karpov
Signed-off-by: Alexandra Antonova
Signed-off-by: Evelina
Signed-off-by: Taejin Park
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Vladimir Bataev
Co-authored-by: Nikolay Karpov
Co-authored-by: bene-ges
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Taejin Park
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337

* Upgrade to pytorch lightning 2.0 (#6433)
* Upgrade pytorch lightning version in requirements
Signed-off-by: Abhishree
* Initial fixes for PTL2.0
Signed-off-by: Abhishree
* Add further fixes to support lightning 2.0
Signed-off-by: Abhishree
* Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and a few occurrences of validation_epoch_end
Signed-off-by: Abhishree
* Replace all occurrences of validation_epoch_end with on_validation_epoch_end
Signed-off-by: Abhishree
* Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively
Signed-off-by: Abhishree
* Change logger=None to logger=False in Trainer object
Signed-off-by: Abhishree
* Remove PTL2.0 deprecated Trainer args from TrainerConfig dataclass
Signed-off-by: Abhishree
* Modify trainer.precision check and other small edits
Signed-off-by: Abhishree
* Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer
Signed-off-by: Abhishree
* Add default values for args to fix Attribute Error
Signed-off-by: Abhishree
* Add the following modifications
1) Remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class
2) Replace resume_from_checkpoint with ckpt_path as needed
3) Explicitly add accelerator as 'CPU' in UTs being run on CPU
Signed-off-by: Abhishree
* Remove outputs arg from on_validation_epoch_end, on_test_epoch_end
Signed-off-by: Abhishree
* Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings
Signed-off-by: Abhishree
* Add val, test outputs as instance vars in PunctuationCapitalizationModel and TokenClassificationModel
Signed-off-by: Abhishree
* Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py
Signed-off-by: Abhishree
* Revert an extra space that was mistakenly added
Signed-off-by: Abhishree
* Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity
Signed-off-by: Abhishree
* Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity
Signed-off-by: Abhishree
* Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing
Signed-off-by: Abhishree
* Remove outputs arg from on_train_epoch_end
Signed-off-by: Abhishree
* Remove outputs from on_validation_epoch_end in multi_binary_acc.py
Signed-off-by: Abhishree
* Remove output args from on_validation_epoch_end in the docstrings of some ASR files
Signed-off-by: Abhishree
* Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs
Signed-off-by: Abhishree
* Add on_validation_epoch_end and remove outputs args for nlp models
Signed-off-by: Abhishree
* Append output of validation_step to validation_step_outputs in EncDecClassificationModel
Signed-off-by: Abhishree
* Add the following changes
1) Index self.validation_step_outputs and self.test_step.outputs with dataloader_idx wherever needed
2) Initialize self.validation_step_outputs and self.test_step.outputs as empty lists and add support for multi dataloaders if they exist
3) Remove self.pre_configure_ddp from NLPDDPStrategy class as it's removed in PTL 2.0
Signed-off-by: Abhishree
* Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py
Signed-off-by: Abhishree
* TypeCast precision to str in attention.py and utils_funcs.py to avoid TypeError
Signed-off-by: Abhishree
* Add if condition check for multiple dataloaders when appending to validation outputs
Signed-off-by: Abhishree
* Separate validation pass to be used with both validation_step and test_step
Signed-off-by: Abhishree
* Add if condition check for multiple dataloader while appending to test_step_outputs in punctuation_capitalization_model.py
Signed-off-by: Abhishree
* Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len
Signed-off-by: Abhishree
* Comment Megatron T5 IA3 PP=2 in CI pipeline due to dataloader_iter issue with PTL 2.0
Signed-off-by: Abhishree
* Modify precision checks to account for 16-mixed and bf16-mixed
Signed-off-by: Abhishree
* Append output of validation/test_step to self.validation/test_step_outputs in CTCG2PModel
Signed-off-by: Abhishree
* Modify find_unused_parameters=True in g2p_heteronym model
1) Add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py
2) Remove args output in validation/test_step and add instance variables instead for heteronym_classification.py
Signed-off-by: Abhishree
* Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel
Signed-off-by: Abhishree
* Add validation/test outputs in sgdqa_model and modify dialogue_config.yaml
Signed-off-by: Abhishree
* Add split arg self.test_step_outputs to TextClassificationModel
Signed-off-by: Abhishree
* Add test_step_outputs to dialogue and text classification models
Signed-off-by: Abhishree
* Change condition check for multiple dataloaders:
1) Replace ds_item as list in dialogue_config.yaml
2) Check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step
3) Check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_capitalization_model.py
Signed-off-by: Abhishree
* Add additional condition for multi dataloaders
Check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list for multi dataloaders in validation/test_step
Signed-off-by: Abhishree
* Add val step outputs and default val for dataloader_idx
1) Append validation_step output to self.validation_step_outputs in MultiLabelIntentSlotClassificationModel
2) Add default val for dataloader_idx for on_test_batch_start/end in TimingCallback
3) Add self.validation/test_step_outputs in BERTQAModel and remove outputs arg
Signed-off-by: Abhishree
* Add val/test_step_outputs to S2SQAModel and GPTQAModel
Signed-off-by: Abhishree
* Edit JenkinsFile for bert_pretraining.py
Edit Jenkinsfile for this test to disable validation as a workaround for trainer.val_dataloader None error
Signed-off-by: Abhishree
* Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py
Signed-off-by: Abhishree
* Add ddp_find_unused_parameters_true and remove output args
1) Add ddp_find_unused_parameters_true for trainer.strategy in self_alignment_pretraining.py as it has unused parameters
2) Remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py
3) Comment tests in JenkinsFile that need to be fixed
Signed-off-by: Abhishree
* Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed
Signed-off-by: Abhishree
* Precision fix for megatron_bert_pretraining.py and megatron_bert_model.py
Signed-off-by: Abhishree
* Precision fix and validation/test_step_outputs
1) Add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py
2) Reset ckpt_path for test in enc_dec_nmt.py
3) Remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py
4) Comment Megatron Bert Pretraining and Resume Training with Pipeline Parallelism and add back NMT Training Post-LN
Signed-off-by: Abhishree
* Precision fix and skip a few failing tests
Signed-off-by: Abhishree
* Add missing comment lines in JenkinsFile
Signed-off-by: Abhishree
* Comment jenkins tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py
Signed-off-by: Abhishree
* Minor edit JenkinsFile
Signed-off-by: Abhishree
* Minor edit in jenkins file
Signed-off-by: Abhishree
* Edit in Jenkins file
Signed-off-by: Abhishree
* Comment missed lines in Jenkins file
Signed-off-by: Abhishree
* Fix precision and validation/test outputs
1) Add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py
2) Remove outputs args and append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py
3) Add back resume_from_checkpoint in the megatron_t5_config.yaml
4) Comment out certain tests in Jenkins file
Signed-off-by: Abhishree
* Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py
Signed-off-by: Abhishree
* Precision fix and edit precision typo in all files
1) Account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py
2) Fix precision typo in all files
Signed-off-by: Abhishree
* Fix all CI TTS tests and comment a few Jenkins tests
Signed-off-by: Abhishree
* Combine xx_epoch_end and on_xx_epoch_end
Add on_inference_epoch_end to inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py
Signed-off-by: Abhishree
* Add a missing comment in JenkinsFile
Signed-off-by: Abhishree
* Add try except StopIteration in validation_step for models with dataloader_iter
Signed-off-by: Abhishree
* Remove pyyaml from requirements
Signed-off-by: Abhishree
* Add try except for inference_step in megatron_finetune_model.py
Signed-off-by: Abhishree
* Remove limit_val_batches for mockGPTDataset test
Signed-off-by: Abhishree
* Add new self.validation_step_outputs for MegatronGPTSFTModel
Signed-off-by: Abhishree
* Minor edit Jenkinsfile
Signed-off-by: Abhishree
* Initialize self.validation/test_step_outputs in megatron_gpt_sft_model.py
Initialize self.validation/test_step_outputs in setup of MegatronGPTSFTModel to take care of cases when dataloaders are not set up in ModelPT, for example while restoring the model.
Signed-off-by: Abhishree
* Remove resume_from_checkpoint if trainer arg in conf yaml files
Signed-off-by: Abhishree
* Remove resume_from_checkpoint as trainer arg in GPT, T5 configs
Signed-off-by: Abhishree
* Remove resume_from_checkpoint in duplex_tn_config.yaml
Signed-off-by: Abhishree
* Fix typos, unused imports and refactor code to remove redundant funcs
Signed-off-by: Abhishree
* Remove commented code in megatron_nmt_model.py
Signed-off-by: Abhishree
* Fix overridden functions to match parent class functions
Signed-off-by: Abhishree
* Prefetch dataloader_iter to prevent hang for PP>1
Signed-off-by: Abhishree
* Override setup() in NLPDDPStrategy to avoid hang during predict with PP>1
Signed-off-by: Abhishree
* Uncomment tests in JenkinsFile
Signed-off-by: Abhishree
* Add '16' to precision checks and other minor fixes
Signed-off-by: Abhishree
* Clear validation/test_step_outputs with dataloader_idx for multi dataloaders
Signed-off-by: Abhishree
* Minor edits
Signed-off-by: Abhishree
* Modify precision checks to avoid indexing
Signed-off-by: Abhishree
* Remove self.validation_step_outputs_sft and add dataloader_idx to clear outputs
Signed-off-by: Abhishree
* Reference checkpoint with trainer.ckpt_path
Signed-off-by: Abhishree
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add _prefetch to NLPModel and minor fixes
Signed-off-by: Abhishree
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add limit_val_batches in JenkinsFile for NMT
1) Add trainer.limit_val_batches in Megatron NMT Training TP=2
2) Remove unused import in ModelPT
Signed-off-by: Abhishree
---------
Signed-off-by: Abhishree
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337

* Include the scripts for preprocessing OAST and unit tests for chat sft datasets (#7112)
* scripts for sft
Signed-off-by: Yi Dong
* fix style
Signed-off-by: Yi Dong
* added special token only for huggingface model
Signed-off-by: Yi Dong
* change default name
Signed-off-by: Yi Dong
* print out error datapoint content
Signed-off-by: Yi Dong
* show error id
Signed-off-by: Yi Dong
* annotation script working
Signed-off-by: Yi Dong
* try to be compatible with huggingface tokenizer
Signed-off-by: Yi Dong
* added examples
Signed-off-by: Yi Dong
* added lang
Signed-off-by: Yi Dong
* added lang
Signed-off-by: Yi Dong
* text to value special case
Signed-off-by: Yi Dong
* configure the slider
Signed-off-by: Yi Dong
* annotation handles lang
Signed-off-by: Yi Dong
* added the unit test for chat sft dataset
Signed-off-by: Yi Dong
* used the file in the test dir
Signed-off-by: Yi Dong
* fix json error
Signed-off-by: Yi Dong
* load local tokenizer
Signed-off-by: Yi Dong
* remove mask count check
Signed-off-by: Yi Dong
* added HF dataset backend
Signed-off-by: Yi Dong
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Yi Dong
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337

* add paths to labeler. (#7087)
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337

* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Signed-off-by: jubick1337
Signed-off-by: tbartley94
Signed-off-by: Nikolay Karpov
Signed-off-by: Yi Dong
Signed-off-by: Aleksandr Laptev
Signed-off-by: Alexandra Antonova
Signed-off-by: smajumdar
Signed-off-by: AlexGrinch
Signed-off-by: Evelina
Signed-off-by: Vladimir Bataev
Signed-off-by: Vitaly Lavrukhin
Signed-off-by: stevehuang52
Signed-off-by: arendu
Signed-off-by: sam1373
Signed-off-by: Boris Fomitchev
Signed-off-by: fayejf
Signed-off-by: Somshubra Majumdar
Signed-off-by: Daniel Egert
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jan Beckmann
Signed-off-by: Tim Moon
Signed-off-by: Linnea Pari Leaver
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: Ryan
Signed-off-by: Elena Rastorgueva
Signed-off-by: Ante Jukić
Signed-off-by: Dmytro Pykhtar
Signed-off-by: ericharper
Signed-off-by: Taejin Park
Signed-off-by: Abhishree
Co-authored-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Nikolay Karpov
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Co-authored-by: Aleksandr Laptev
Co-authored-by: bene-ges
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk)
Co-authored-by: Vladimir Bataev
Co-authored-by: Vitaly Lavrukhin
Co-authored-by: Eric Harper
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Adi Renduchintala
Co-authored-by: Vahid Noroozi
Co-authored-by: Samuel Kriman
Co-authored-by: Boris Fomitchev
Co-authored-by: trias702 <25867060+trias702@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Jan Beckmann
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: lleaver <137942999+lleaver@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver
Co-authored-by: Ryan Langman
Co-authored-by: David
Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com>
Co-authored-by: anteju <108555623+anteju@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Taejin Park
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
---
 .../conf/megatron_t5_finetune.yaml            |  2 +-
 .../megatron_finetune_model.py                | 30 +++++++++++++------
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/examples/nlp/language_modeling/conf/megatron_t5_finetune.yaml b/examples/nlp/language_modeling/conf/megatron_t5_finetune.yaml
index ab9939af518f..2ba68cbc5979 100644
--- a/examples/nlp/language_modeling/conf/megatron_t5_finetune.yaml
+++ b/examples/nlp/language_modeling/conf/megatron_t5_finetune.yaml
@@ -87,7 +87,7 @@ model:
     add_bos_to_input: ${data.train_ds.add_bos_to_input}
     add_eos_to_input: ${data.train_ds.add_eos_to_input}
     metric:
-      name: "exact_string_match" # Name of the evaluation metric to use.
+      name: "exact_string_match" # Name of the evaluation metric to use. Supported metrics: [`exact_string_match`, `rouge`, `pearson_corr_coef`, `spearman_corr_coef`, `f1`, `accuracy`, `average_precision`]
       average: micro # Average the metric over the dataset. Options: ['macro', 'micro']. Works only for 'F1', 'accuracy' etc. Refer to torchmetrics for metrics where this is supported.
       num_classes: null # Number of classes for the metric. Works only for 'F1', 'accuracy' and 'average_precision' etc. Refer to torchmetrics for metrics where this is supported.
       class_labels: null # If the targets in your dataset are strings and not integers/float, you need to provide a list of class labels (size = num_classes) so we can convert from strings to integer categories to compute the metric.
diff --git a/nemo/collections/nlp/models/language_modeling/megatron_finetune_model.py b/nemo/collections/nlp/models/language_modeling/megatron_finetune_model.py
index fb1fe83ee68e..9fce0d52c4a1 100644
--- a/nemo/collections/nlp/models/language_modeling/megatron_finetune_model.py
+++ b/nemo/collections/nlp/models/language_modeling/megatron_finetune_model.py
@@ -106,24 +106,36 @@ def setup_metric(self, data_cfg):
             )
         metric_name = data_cfg.metric.name
-        metric = MetricStringToTorchMetric[metric_name]
+        metric_class = MetricStringToTorchMetric[metric_name]
+        # GLUE will not have a "src_file_name" attribute and will always have only a single metric.
         if hasattr(data_cfg, "src_file_name") or hasattr(data_cfg, "file_names"):
-            if hasattr(data_cfg, "src_file_name") and isinstance(data_cfg.src_file_name, ListConfig):
-                # We pass average and num_classes to the metric constructor via kwargs even if they don't exist for each metric.
+            if (
+                hasattr(data_cfg, "src_file_name")
+                and isinstance(data_cfg.src_file_name, ListConfig)
+                and metric_name != 'rouge'
+            ):
                 metric = [
-                    metric(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)
+                    metric_class(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)
                     for _ in range(len(data_cfg.src_file_name))
                 ]
-            elif hasattr(data_cfg, "file_names") and isinstance(data_cfg.file_names, ListConfig):
+            elif (
+                hasattr(data_cfg, "file_names")
+                and isinstance(data_cfg.file_names, ListConfig)
+                and metric_name != 'rouge'
+            ):
                 metric = [
-                    metric(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)
+                    metric_class(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)
                     for _ in range(len(data_cfg.file_names))
                 ]
+            elif hasattr(data_cfg, "src_file_name") and isinstance(data_cfg.src_file_name, ListConfig):
+                metric = [metric_class() for _ in range(len(data_cfg.src_file_name))]
+            elif hasattr(data_cfg, "file_names") and isinstance(data_cfg.file_names, ListConfig):
+                metric = [metric_class() for _ in range(len(data_cfg.file_names))]
             else:
-                metric = [metric(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)]
+                metric = [metric_class(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)]
         else:
-            metric = [metric()]  # GLUE does need to specify average or num_classes.
+            metric = [metric_class()]  # GLUE does need to specify average or num_classes.

         return metric, metric_name
@@ -221,7 +233,7 @@ def cast_for_metric(self, pred, label, metric_name, class_labels=None, labels_ar
             else:
                 pred = class_labels.index(pred)
             if label not in class_labels:
-                raise ValueError(f"Ground truth labe; {label} is not in the class labels list : {class_labels}")
+                raise ValueError(f"Ground truth label {label} is not in the class labels list : {class_labels}")
             label = class_labels.index(label)
             pred = torch.LongTensor([pred]).to(self.device)
             label = torch.LongTensor([label]).to(self.device)
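
The gist of the setup_metric() change above: `rouge` is special-cased because torchmetrics' ROUGEScore takes neither `average` nor `num_classes` in its constructor, unlike the classification metrics the old code configured unconditionally. Below is a minimal sketch of the resulting dispatch, assuming the torchmetrics 0.7-0.10 classification API (where Accuracy/F1Score still accepted average/num_classes directly); the mapping is an abbreviated stand-in for NeMo's full MetricStringToTorchMetric table, and make_metrics is a hypothetical helper, not NeMo API:

    # Sketch only: abbreviated metric table, not NeMo's full mapping.
    from torchmetrics import Accuracy, F1Score
    from torchmetrics.text.rouge import ROUGEScore

    MetricStringToTorchMetric = {
        'accuracy': Accuracy,
        'f1': F1Score,
        'rouge': ROUGEScore,
    }

    def make_metrics(metric_name, num_dataloaders, average=None, num_classes=None):
        # One metric instance per validation/test file, as setup_metric() builds.
        metric_class = MetricStringToTorchMetric[metric_name]
        if metric_name != 'rouge':
            # Classification-style metrics are configured from the config's
            # metric.average / metric.num_classes fields.
            return [
                metric_class(average=average, num_classes=num_classes)
                for _ in range(num_dataloaders)
            ]
        # ROUGEScore rejects those kwargs, so it is constructed bare.
        return [metric_class() for _ in range(num_dataloaders)]

    # e.g. two validation files scored with ROUGE:
    metrics = make_metrics('rouge', num_dataloaders=2)

This mirrors the patch's design choice of an explicit metric_name != 'rouge' guard in the ListConfig branches, rather than wrapping the constructor call in try/except.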