
mT5 whole word masking and T5 finetuning config fixes #3776

Merged: 12 commits, merged on Mar 7, 2022


MaximumEntropy (Contributor)

Signed-off-by: MaximumEntropy [email protected]

What does this PR do?

This PR makes two fixes to T5 before r1.7.0: (a) it sets async gradient all-reduce to false for bf16 O2 training, and (b) it undoes the whole word masking logic for SentencePiece tokenizers.
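For readers unfamiliar with fix (b): whole word masking masks all subword pieces of a word together instead of masking pieces independently. With a SentencePiece vocabulary (as used by mT5), word-initial pieces carry the U+2581 ("▁") prefix, so word groups can be recovered from that marker. The sketch below only illustrates the general technique; it is not the code added or removed by this PR, and `group_whole_words` and `mask_whole_words` are hypothetical names.

```python
from typing import List, Sequence

# SentencePiece prefixes a piece with U+2581 ("▁") when it starts a new
# whitespace-delimited word.
WORD_START = "\u2581"


def group_whole_words(pieces: Sequence[str]) -> List[List[int]]:
    """Group SentencePiece token indices into whole words (illustrative only).

    A new group starts at every piece beginning with the "▁" marker; pieces
    without the marker are appended to the previous group.
    """
    words: List[List[int]] = []
    for idx, piece in enumerate(pieces):
        if piece.startswith(WORD_START) or not words:
            words.append([idx])
        else:
            words[-1].append(idx)
    return words


def mask_whole_words(
    pieces: Sequence[str], word_indices: Sequence[int], mask_token: str = "<mask>"
) -> List[str]:
    """Replace every piece of each selected word with mask_token (hypothetical helper)."""
    words = group_whole_words(pieces)
    masked = list(pieces)
    for w in word_indices:
        for idx in words[w]:
            masked[idx] = mask_token
    return masked
```

For example, `["▁The", "▁un", "believ", "able", "▁story"]` groups into `[[0], [1, 2, 3], [4]]`, and masking word 1 yields `["▁The", "<mask>", "<mask>", "<mask>", "▁story"]`. Because the marker reflects whitespace in the input, a "▁"-delimited unit in scripts written without spaces between words (for example Japanese or Thai) can span several linguistic words.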

Collection: NLP

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • N/A

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

MaximumEntropy changed the title from "O2 and whole word masking changes" to "mT5 whole word masking and T5 finetuning config fixes" on Mar 2, 2022
MaximumEntropy changed the base branch from r1.7.0 to r1.7.1 on March 2, 2022, 18:28
ericharper (Collaborator) left a comment:

LGTM.

ericharper merged commit 836d813 into r1.7.1 on Mar 7, 2022
ericharper deleted the t5_masking_and_o2_fixes branch on March 7, 2022, 23:08
ericharper added a commit that referenced this pull request on Mar 14, 2022:
* Tn bug 1.7.0 (#3730)

* fix es and fr bug

Signed-off-by: Yang Zhang <[email protected]>

* add file

Signed-off-by: Yang Zhang <[email protected]>

* [TTS] Fix bugs in E2E TTS, Mixer-TTS and FastPitch (#3740)

* fix bugs

Signed-off-by: Oktai Tatanov <[email protected]>

* fix bug in e2e tts and mixer tts

Signed-off-by: Oktai Tatanov <[email protected]>

* Mirror AN4 data while servers are down (#3743)

Signed-off-by: smajumdar <[email protected]>

* Bugfix for GPT eval  (#3744)

* use tokens_cut not tokens

Signed-off-by: ericharper <[email protected]>

* remove precision conversion and comment jit for bias gelu

Signed-off-by: ericharper <[email protected]>

* revert comment update mbs in config

Signed-off-by: ericharper <[email protected]>

* calculate micro_batch_size during complete and compute_logprobs

Signed-off-by: ericharper <[email protected]>

* ASR SSL update (#3746)

* ssl update

Signed-off-by: sam1373 <[email protected]>

* tutorial update

Signed-off-by: sam1373 <[email protected]>

* Fix SSL configs for 1.7 (#3748)

* ssl update

Signed-off-by: sam1373 <[email protected]>

* tutorial update

Signed-off-by: sam1373 <[email protected]>

* revert configs

Signed-off-by: sam1373 <[email protected]>

* revert configs

Signed-off-by: sam1373 <[email protected]>

* punct process bug fix (#3747)

Signed-off-by: ekmb <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>

* updated conformer models. (#3741)

Signed-off-by: Vahid <[email protected]>

Co-authored-by: Somshubra Majumdar <[email protected]>

* Yuya/megatron t5 glue eval (#3751)

* Add megatron t5 glue eval-only script

Signed-off-by: Yu Yao <[email protected]>

* Update megatron t5 glue eval default configs

Signed-off-by: Yu Yao <[email protected]>

* Update megatron t5 glue eval configs

Signed-off-by: Yu Yao <[email protected]>

* Update config comments

Signed-off-by: Yu Yao <[email protected]>

Co-authored-by: Yu Yao <[email protected]>

* Specify gpus in SSL notebook (#3753)

* ssl update

Signed-off-by: sam1373 <[email protected]>

* tutorial update

Signed-off-by: sam1373 <[email protected]>

* revert configs

Signed-off-by: sam1373 <[email protected]>

* revert configs

Signed-off-by: sam1373 <[email protected]>

* specify gpus

Signed-off-by: sam1373 <[email protected]>

* Duplex model inference fix, money encoder fix (#3754)

Signed-off-by: ekmb <[email protected]>

* Update docs for RNNT and overriding fused batch size (#3755)

Signed-off-by: smajumdar <[email protected]>

* fix consumed samples calculation + PTune Model bugs (#3738)

* fix the way consumed samples are computed

Signed-off-by: Yi Dong <[email protected]>

* fixed ptune model

Signed-off-by: Yi Dong <[email protected]>

* make sure notebook is working

Signed-off-by: Yi Dong <[email protected]>

* added try-catch

Signed-off-by: Yi Dong <[email protected]>

Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* fix directories in ssl notebook (#3758)

* ssl update

Signed-off-by: sam1373 <[email protected]>

* tutorial update

Signed-off-by: sam1373 <[email protected]>

* revert configs

Signed-off-by: sam1373 <[email protected]>

* revert configs

Signed-off-by: sam1373 <[email protected]>

* specify gpus

Signed-off-by: sam1373 <[email protected]>

* update dirs

Signed-off-by: sam1373 <[email protected]>

* TN docs update (#3735)

* TN docs update: audio based docs added, quick start, ref fixed, etc

Signed-off-by: ekmb <[email protected]>

* add deployment script dir and Sp TN

Signed-off-by: ekmb <[email protected]>

Co-authored-by: Yang Zhang <[email protected]>

* Update Tacotron2_Training.ipynb (#3769)

Signed-off-by: Jason <[email protected]>

* fix dockerfile (#3778)

Signed-off-by: Yang Zhang <[email protected]>

* Prompt-Tuning-Documentation (#3777)

* Update megatron.rst

* Updated example prompt tuning script's doc string

* Update megatron.rst

* Update megatron.rst

Co-authored-by: Eric Harper <[email protected]>

* Prompt tuning bug fix (#3780)

* Making updated code backwards compatible with previous prompt tuned models

Signed-off-by: Virginia Adams <[email protected]>

* Fixed backward compatibility bug

Signed-off-by: Virginia Adams <[email protected]>

* Removed random import

Signed-off-by: Virginia Adams <[email protected]>

Co-authored-by: Eric Harper <[email protected]>

* update branch

Signed-off-by: ericharper <[email protected]>

* revert changes (#3785)

Signed-off-by: Yang Zhang <[email protected]>

* Fixed soft prompt eval loading bug (#3805)

Signed-off-by: Virginia Adams <[email protected]>

* mT5 whole word masking and T5 finetuning config fixes (#3776)

* O2 and whole word masking changes

Signed-off-by: MaximumEntropy <[email protected]>

* Style

Signed-off-by: MaximumEntropy <[email protected]>

* Update yaml

Signed-off-by: MaximumEntropy <[email protected]>

* Tok and O2 fix

Signed-off-by: MaximumEntropy <[email protected]>

* Fix arg passing

Signed-off-by: MaximumEntropy <[email protected]>

* Fix checkpoint path

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Raise error if FP16 training is tried with O2 recipe. (#3806)

* raise error

Signed-off-by: ericharper <[email protected]>

* update assert

Signed-off-by: ericharper <[email protected]>

* update error message

Signed-off-by: ericharper <[email protected]>

* update error message

Signed-off-by: ericharper <[email protected]>

* update branch

Signed-off-by: ericharper <[email protected]>

* remove test

Signed-off-by: ericharper <[email protected]>

* revert bad merges

Signed-off-by: ericharper <[email protected]>

* revert change partitions

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Oktai Tatanov <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Samuel Kriman <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Vahid Noroozi <[email protected]>
Co-authored-by: yaoyu-33 <[email protected]>
Co-authored-by: Yu Yao <[email protected]>
Co-authored-by: Yi Dong <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: Virginia Adams <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
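The two precision rules in this history, fix (a) of this PR (async gradient all-reduce off for bf16 O2) and the #3806 check ("Raise error if FP16 training is tried with O2 recipe"), can be summarized as a small config-validation sketch. Everything here is hypothetical: `O2Settings` and `resolve_o2_settings` are illustrative names, not NeMo's API, and the precision values assume Lightning-style settings such as `16`, `32`, or `"bf16"`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class O2Settings:
    """Hypothetical container for the mixed-precision options discussed above."""

    precision: str  # normalized precision string, e.g. "16", "32", or "bf16"
    megatron_amp_o2: bool  # whether the O2 mixed-precision recipe is enabled
    async_grad_allreduce: bool = True


def resolve_o2_settings(precision: Union[int, str], megatron_amp_o2: bool) -> O2Settings:
    """Validate precision/O2 flags and derive the all-reduce setting (sketch only)."""
    precision = str(precision)
    if megatron_amp_o2 and precision == "16":
        # Mirrors the intent of #3806: FP16 training with the O2 recipe is rejected.
        raise ValueError("FP16 training is not supported with the O2 recipe; use bf16 instead.")
    # Mirrors fix (a) of this PR: disable async gradient all-reduce for bf16 O2.
    async_allreduce = not (megatron_amp_o2 and precision == "bf16")
    return O2Settings(precision, megatron_amp_o2, async_allreduce)
```

With these rules, `resolve_o2_settings("bf16", True)` returns settings with `async_grad_allreduce=False`, non-O2 runs keep the default of `True`, and `resolve_o2_settings(16, True)` raises `ValueError`.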
fayejf pushed a commit that referenced this pull request on Mar 22, 2022

2 participants