LogProb calculation performance fix #3984

Merged
merged 6 commits into from
Apr 15, 2022

Conversation

yidong72
Collaborator

What does this PR do?

Makes the log-probability calculation faster and able to handle longer sequences.
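
The description does not say how the speedup is achieved. For readers unfamiliar with the term, a generic per-token log-probability computation might look like the sketch below. The function name `token_log_probs` and the plain-list inputs are illustrative assumptions, not NeMo's code or this PR's actual change.

```python
import math


def token_log_probs(logits, token_ids):
    """Return the log-probability of the observed token at each position.

    logits: list of per-position score lists (one score per vocabulary entry).
    token_ids: index of the token actually observed at each position.
    Both names are hypothetical, chosen only for this illustration.
    """
    result = []
    for scores, token in zip(logits, token_ids):
        # Log-sum-exp with the row maximum subtracted for numerical stability.
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        # Only the observed token's entry is needed, so the full
        # log-softmax row is never materialized.
        result.append(scores[token] - log_norm)
    return result


# Two positions over a three-entry vocabulary.
logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
log_probs = token_log_probs(logits, [0, 2])
```

Computing only the selected entry per position, rather than a full normalized distribution, is one common way such a calculation is kept cheap on long sequences. Whether this PR does that is not stated here.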


lgtm-com bot commented Apr 13, 2022

This pull request introduces 2 alerts when merging b35ad85 into 1a0575b - view on LGTM.com

new alerts:

  • 2 for Redundant assignment

@yidong72 yidong72 changed the base branch from main to r1.8.0 April 14, 2022 18:05
@yidong72 yidong72 changed the base branch from r1.8.0 to main April 14, 2022 18:19
@yidong72 yidong72 changed the base branch from main to r1.8.0 April 14, 2022 18:28
ericharper previously approved these changes Apr 14, 2022
Collaborator

@ericharper ericharper left a comment

LGTM. Thanks!

@yidong72 yidong72 merged commit 90b7dc0 into r1.8.0 Apr 15, 2022
@yidong72 yidong72 deleted the performance_fix branch April 15, 2022 18:10
ericharper pushed a commit that referenced this pull request Apr 20, 2022
* performance fix for logprob computation

Signed-off-by: Yi Dong <[email protected]>

* fix redundant assign

Signed-off-by: Yi Dong <[email protected]>

* fix bug to add gather from TP workers

Signed-off-by: Yi Dong <[email protected]>

Co-authored-by: Yi Dong <[email protected]>
ericharper added a commit that referenced this pull request Apr 20, 2022
* update version

Signed-off-by: ericharper <[email protected]>

* Stateless timer fix for PTL 1.6 (#3925)

* Stateless timer fix for PTL 1.6

Signed-off-by: MaximumEntropy <[email protected]>

* Stateless timer PTL test

Signed-off-by: MaximumEntropy <[email protected]>

* Fix year

Signed-off-by: MaximumEntropy <[email protected]>

* Style

Signed-off-by: MaximumEntropy <[email protected]>

* Remove unused imports

Signed-off-by: MaximumEntropy <[email protected]>

* Style

Signed-off-by: MaximumEntropy <[email protected]>

* GPU test

Signed-off-by: MaximumEntropy <[email protected]>

* Style

Signed-off-by: MaximumEntropy <[email protected]>

* clean import

Signed-off-by: ericharper <[email protected]>

Co-authored-by: ericharper <[email protected]>

* Fix issues with librosa deprecations (#3950)

Signed-off-by: smajumdar <[email protected]>

* Fix notebook bugs for branch r1.8.0 (#3948)

* load the model from ngc

Signed-off-by: Yi Dong <[email protected]>

* fix all biomegatron notebook

Signed-off-by: Yi Dong <[email protected]>

* fix the typos

Signed-off-by: Yi Dong <[email protected]>

* remove output

Signed-off-by: Yi Dong <[email protected]>

* fix isort

Signed-off-by: Yi Dong <[email protected]>

* fix merge error

Signed-off-by: Yi Dong <[email protected]>

* change ntpath for isort workaround

Signed-off-by: Yi Dong <[email protected]>

* fix unit test

Signed-off-by: Yi Dong <[email protected]>

* fix ci

Signed-off-by: Yi Dong <[email protected]>

* fix ci bert pretraining

Signed-off-by: Yi Dong <[email protected]>

* make it compatible with main

Signed-off-by: Yi Dong <[email protected]>

* add the test for biomegatron ner

Signed-off-by: Yi Dong <[email protected]>

* fix argument

Signed-off-by: Yi Dong <[email protected]>

* fix usability issue

Signed-off-by: Yi Dong <[email protected]>

* work around

Signed-off-by: Yi Dong <[email protected]>

Co-authored-by: Yi Dong <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Fix global batch fit loop (#3936)

* add lightning module hooks for global batch

Signed-off-by: ericharper <[email protected]>

* clean scripts

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* remove unused import

Signed-off-by: ericharper <[email protected]>

* DP=1 fix

Signed-off-by: MaximumEntropy <[email protected]>

* set num dataset workers to 2

Signed-off-by: ericharper <[email protected]>

* update validation_loop with GlobalDataFetcher

Signed-off-by: ericharper <[email protected]>

* add test global data fetcher

Signed-off-by: ericharper <[email protected]>

* Drop last for test ds

Signed-off-by: MaximumEntropy <[email protected]>

* Fix test epoch end

Signed-off-by: MaximumEntropy <[email protected]>

* Style

Signed-off-by: MaximumEntropy <[email protected]>

* Fix eval

Signed-off-by: MaximumEntropy <[email protected]>

* Fix reconfigure microbatch in the complete method

Signed-off-by: MaximumEntropy <[email protected]>

* add comments

Signed-off-by: MaximumEntropy <[email protected]>

* Set init consumed samples

Signed-off-by: MaximumEntropy <[email protected]>

* fix shuffle

Signed-off-by: MaximumEntropy <[email protected]>

* add save_restore_connector arg

Signed-off-by: ericharper <[email protected]>

* Fix padding for labels and loss mask

Signed-off-by: MaximumEntropy <[email protected]>

* GLUE/XNLI CI tests

Signed-off-by: MaximumEntropy <[email protected]>

* limit val batches in hydra fix

Signed-off-by: MaximumEntropy <[email protected]>

* Restart CI

Signed-off-by: MaximumEntropy <[email protected]>

* Fix unittest

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: MaximumEntropy <[email protected]>

* Exports 22.03 war (#3957)

* Fixed fastpitch for 22.03

Signed-off-by: Boris Fomitchev <[email protected]>

* cleanup

Signed-off-by: Boris Fomitchev <[email protected]>

* Restored mask expansion; added WAR for test container images

Signed-off-by: Boris Fomitchev <[email protected]>

* style

Signed-off-by: Boris Fomitchev <[email protected]>

* Refactor restorefrom (#3927)

* update package info (#3926)

Signed-off-by: ericharper <[email protected]>

* Refactor restore_from

Signed-off-by: Ramanathan Arunachalam <[email protected]>

* Move export related python files to scripts/export/

Signed-off-by: Ramanathan Arunachalam <[email protected]>

* Return state dict after modification function

* Remove Megatron legacy parameter in common.py restore_from function

Signed-off-by: Ramanathan Arunachalam <[email protected]>

* ability to set log_predictions to false (#3929)

* Bumping Python version

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* fixing style

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* load the model from ngc

Signed-off-by: Yi Dong <[email protected]>

* fix all biomegatron notebook

Signed-off-by: Yi Dong <[email protected]>

* fix the typos

Signed-off-by: Yi Dong <[email protected]>

* remove output

Signed-off-by: Yi Dong <[email protected]>

* fix isort

Signed-off-by: Yi Dong <[email protected]>

* fix merge error

Signed-off-by: Yi Dong <[email protected]>

* change ntpath for isort workaround

Signed-off-by: Yi Dong <[email protected]>

* fix unit test

Signed-off-by: Yi Dong <[email protected]>

* fix ci

Signed-off-by: Yi Dong <[email protected]>

* fix ci bert pretraining

Signed-off-by: Yi Dong <[email protected]>

* Rearrange export files; Style fix; Extend legacy MegatronBert conversion to NLP models nemo version update

* Glu activation variants (#3951)

* Temp

Signed-off-by: MaximumEntropy <[email protected]>

* Add reglu and swiglu activations

Signed-off-by: MaximumEntropy <[email protected]>

* Style

Signed-off-by: MaximumEntropy <[email protected]>

* Style on unrelated file

Signed-off-by: MaximumEntropy <[email protected]>

* CI changes to test activations

Signed-off-by: MaximumEntropy <[email protected]>

* Fix unused import

Signed-off-by: MaximumEntropy <[email protected]>

* Style fix because of merge from main

Signed-off-by: Ramanathan Arunachalam <[email protected]>

* make it compatible with main

Signed-off-by: Yi Dong <[email protected]>

* add the test for biomegatron ner

Signed-off-by: Yi Dong <[email protected]>

* fix argument

Signed-off-by: Yi Dong <[email protected]>

* fix usability issue

Signed-off-by: Yi Dong <[email protected]>

* FastPitch FT notebook - Improving Speech Quality clarifications (#3954)

* FastPitch FT notebook - Improving Speech Quality clarifications

Signed-off-by: Jocelyn Huang <[email protected]>

* Add pynini dependency install to FastPitch FT notebook

Signed-off-by: Jocelyn Huang <[email protected]>

* Pin pynini install for FastPitch FT tutorial

Signed-off-by: Jocelyn Huang <[email protected]>

* work around

Signed-off-by: Yi Dong <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Ramanathan Arunachalam <[email protected]>
Co-authored-by: Dima Rekesh <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Yi Dong <[email protected]>
Co-authored-by: Yi Dong <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Jocelyn <[email protected]>

* Bump TTS deprecation version to 1.9 (#3955)

* bump deprecation version

Signed-off-by: Jason <[email protected]>

* update talknet depre

Signed-off-by: Jason <[email protected]>

* added conformer for zh. (#3970)

Signed-off-by: Vahid <[email protected]>

* Add pinned pynini and scipy installs to TTS training tutorial (#3967)

Signed-off-by: Jocelyn Huang <[email protected]>

* Fix variable name and move models to CPU in Change partition (#3972)

* fixes

Signed-off-by: Abhinav Khattar <[email protected]>

* add CI

Signed-off-by: Abhinav Khattar <[email protected]>

Co-authored-by: Sandeep Subramanian <[email protected]>

* fix misconfiguration (#3975)

Signed-off-by: Yi Dong <[email protected]>

Co-authored-by: Yi Dong <[email protected]>

* Fix NMT variable passing bug (#3985)

* fix

Signed-off-by: Abhinav Khattar <[email protected]>

* stylefix

Signed-off-by: Abhinav Khattar <[email protected]>

* Compatibility override to load_state_dict for old TTS checkpoints (#3978)

* Compatibility override to load_state_dict for old TTS checkpoints

Signed-off-by: Jocelyn Huang <[email protected]>

* Tacotron2 training notebook fix - add GPU argument

Signed-off-by: Jocelyn Huang <[email protected]>

* Add hann window override warning for old model loading

Signed-off-by: Jocelyn Huang <[email protected]>

* Notebook Bug Fixes for r1.8.0 (#3989)

* Made config related bug fixes

Signed-off-by: Virginia Adams <[email protected]>

* Fixed cfg.get syntax

Signed-off-by: Virginia Adams <[email protected]>

* Fix compat override for TalkNet Aligner (#3993)

* Fix compatibility override for TalkNet Aligner

Signed-off-by: Jocelyn Huang <[email protected]>

* Remove extraneous logging import

Signed-off-by: Jocelyn Huang <[email protected]>

* docs fixes (#3987)

* docs fixes

Signed-off-by: ekmb <[email protected]>

* rename files in docs

Signed-off-by: ekmb <[email protected]>

* docs improvement

Signed-off-by: ekmb <[email protected]>

* arg renamed

Signed-off-by: ekmb <[email protected]>

* Fix nemo megatron restore with artifacts (#3997)

* update config_path in register_artifact

Signed-off-by: ericharper <[email protected]>

* fix register_artifact calls

Signed-off-by: ericharper <[email protected]>

* fix register_artifact calls

Signed-off-by: ericharper <[email protected]>

* update log messages to include merges file

Signed-off-by: ericharper <[email protected]>

* add default prompts to config

Signed-off-by: ericharper <[email protected]>

* Fixes val_check_interval, skip loading train data during eval (#3968)

* Change stage check

Signed-off-by: MaximumEntropy <[email protected]>

* Fix bugs in megatron t5 glue eval scripts

Signed-off-by: Yu Yao <[email protected]>

* Fix reconfigure

Signed-off-by: MaximumEntropy <[email protected]>

* Change check

Signed-off-by: MaximumEntropy <[email protected]>

* Fix hasattr

Signed-off-by: MaximumEntropy <[email protected]>

* Fix typo in cfg structure

Signed-off-by: Yu Yao <[email protected]>

* Update megatron t5 glue eval config file

Signed-off-by: Yu Yao <[email protected]>

* Reconfigure to avoid drop last

Signed-off-by: MaximumEntropy <[email protected]>

* Fix for train step reconfigure as well

Signed-off-by: MaximumEntropy <[email protected]>

* Update megatron t5 glue eval config file drop_last to False

Signed-off-by: Yu Yao <[email protected]>

* Style

Signed-off-by: MaximumEntropy <[email protected]>

* limit test batches

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: Yu Yao <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* LogProb calculation performance fix (#3984)

* performance fix for logprob computation

Signed-off-by: Yi Dong <[email protected]>

* fix redundant assign

Signed-off-by: Yi Dong <[email protected]>

* fix bug to add gather from TP workers

Signed-off-by: Yi Dong <[email protected]>

Co-authored-by: Yi Dong <[email protected]>

* Fix link issues in export example notebook and fix pretrained model info for MegatronBert (#4004)

Signed-off-by: Ramanathan Arunachalam <[email protected]>

Co-authored-by: Ramanathan Arunachalam <[email protected]>

* Fix single GPU training issue + change deprecated Lightning args (#4010)

* change vars

Signed-off-by: Abhinav Khattar <[email protected]>

* style fix

Signed-off-by: Abhinav Khattar <[email protected]>

* Fix P-Tune T5 model (#4001)

* fix ptune t5

Signed-off-by: Yi Dong <[email protected]>

* fix ci test

Signed-off-by: Yi Dong <[email protected]>

* fix the ci fail because of the order problem

Signed-off-by: Yi Dong <[email protected]>

Co-authored-by: Yi Dong <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* Megatron work-arounds (#3998)

* WAR around Apex issue, and making sure output is FP32

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing merge issues; moving dummy Trainer; adding float() casts

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing ColumnParallelLinear call

Signed-off-by: Boris Fomitchev <[email protected]>

* Cleanup

Signed-off-by: Boris Fomitchev <[email protected]>

* Cleanup#2

Signed-off-by: Boris Fomitchev <[email protected]>

* fix the broadcast shape mismatch (#4017)

Signed-off-by: Yi Dong <[email protected]>

Co-authored-by: Yi Dong <[email protected]>
Co-authored-by: Eric Harper <[email protected]>

* add known issues (#4024)

Signed-off-by: ericharper <[email protected]>

* update readme with conda env setup instructions

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* update package info

Signed-off-by: ericharper <[email protected]>

* update branch

Signed-off-by: ericharper <[email protected]>

* update package info

Signed-off-by: ericharper <[email protected]>

* revert apex guard removal

Signed-off-by: ericharper <[email protected]>

* revert --language to --lang

Signed-off-by: ericharper <[email protected]>

* fix apex guard

Signed-off-by: ericharper <[email protected]>

* remove set_trace

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* fix apex guard

Signed-off-by: ericharper <[email protected]>

* remove unreachable statement

Signed-off-by: ericharper <[email protected]>

* remove duplicate lines

Signed-off-by: ericharper <[email protected]>

* remove duplicate lines

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Yi Dong <[email protected]>
Co-authored-by: Yi Dong <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: Ramanathan Arunachalam <[email protected]>
Co-authored-by: Ramanathan Arunachalam <[email protected]>
Co-authored-by: Dima Rekesh <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Jocelyn <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: Vahid Noroozi <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>
Co-authored-by: Virginia Adams <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Yu Yao <[email protected]>

3 participants