
Fix adapters and ptuning for amp O2 #7285

Merged
arendu merged 14 commits into main from guyueh1/fix_adapters_and_ptuning_for_ampO2_rebased on Aug 22, 2023

Conversation

@guyueh1 (Contributor) commented on Aug 21, 2023

What does this PR do ?

Fixes issues that arise when using megatron_amp_O2 with PEFT models.

Collection: NLP

Changelog

  • Cast adapters to fp16/bf16 when the base model uses megatron_amp_O2. Under amp O2 the base model's parameters are kept in fp16/bf16, so newly added adapter parameters must match. The previous PR Add precision and megatron_amp_O2 configs to adapters #7232 handled the precision of ColumnParallelLinear and RowParallelLinear in PEFT models; this PR handles the precision casting of the remaining modules, such as layernorm (a minimal sketch follows this list).
  • Explicitly set the p-tuning inference table to untrainable for FP16/BF16 models. The previous whitelist only contained the inference table's parameter name as it appears in an FP32 model; this PR also adds its name as it appears in FP16/BF16 models.
  • Explicitly set the p-tuning inference table to untrainable in AdapterPTuningModel, matching the existing behavior of PTuningModel.
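The following is a minimal, self-contained sketch (plain PyTorch, not the NeMo API) of the two ideas above: cast a freshly instantiated adapter, including its layernorm, to the base model's half precision when megatron_amp_O2 is enabled, and keep the p-tuning inference table untrainable. Names such as `Adapter` and `inference_table` are illustrative stand-ins, not NeMo identifiers.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Toy bottleneck adapter with a LayerNorm, standing in for a PEFT adapter module."""

    def __init__(self, hidden: int, bottleneck: int):
        super().__init__()
        self.layer_norm = nn.LayerNorm(hidden)
        self.down = nn.Linear(hidden, bottleneck, bias=False)
        self.up = nn.Linear(bottleneck, hidden, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(self.layer_norm(x))))


def build_adapter(hidden: int, bottleneck: int, megatron_amp_O2: bool, precision: str) -> Adapter:
    adapter = Adapter(hidden, bottleneck)
    if megatron_amp_O2:
        # With amp O2 the base model's parameters are already fp16/bf16, so the adapter
        # (its linear layers *and* its layernorm) must be cast to the same dtype right
        # after instantiation, otherwise mixed-dtype matmuls fail at runtime.
        dtype = torch.bfloat16 if precision == "bf16" else torch.float16
        adapter = adapter.to(dtype)
    return adapter


# Freezing the p-tuning inference table: keep it as a parameter (so it is saved/restored
# with the state dict) but exclude it from gradient updates, regardless of model precision.
inference_table = nn.Parameter(torch.empty(10, 1024, dtype=torch.bfloat16), requires_grad=False)

adapter = build_adapter(hidden=1024, bottleneck=32, megatron_amp_O2=True, precision="bf16")
print(next(adapter.parameters()).dtype)  # torch.bfloat16
```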

Usage

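A hedged usage sketch, expressed as an OmegaConf config fragment rather than a full training command. The key names (trainer.precision, model.megatron_amp_O2, model.peft.peft_scheme) follow the NeMo Megatron GPT PEFT tuning configs, but exact names and defaults may differ between NeMo versions.

```python
from omegaconf import OmegaConf

# Hedged config fragment: enable O2-style half precision for the base model while
# training a PEFT method; key names are assumptions based on the NeMo PEFT configs.
overrides = OmegaConf.create(
    {
        "trainer": {"precision": "bf16"},          # train in bf16 (or 16 for fp16)
        "model": {
            "megatron_amp_O2": True,               # keep base-model weights in half precision
            "peft": {"peft_scheme": "ptuning"},    # or "lora" / "adapter"
        },
    }
)
print(OmegaConf.to_yaml(overrides))
```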

Before your PR is "Ready for review"

Pre checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g. Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

* Under megatron_amp_O2, transform the adapter modules to low precision after instantiation

Signed-off-by: Guyue Huang <[email protected]>

Conflicts:
	nemo/collections/nlp/modules/common/megatron/adapters/parallel_adapters.py
* Fix the first_stage_of_pipeline detection for half models
* Fix the freezing of InferenceTable for half models

Signed-off-by: Guyue Huang <[email protected]>
* When unfreezing adapters, we explicitly set inference embedding table
in prompt encoder to be untrainable.

Signed-off-by: Guyue Huang <[email protected]>
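The "first_stage_of_pipeline detection for half models" and "freezing of InferenceTable for half models" fixes listed above both stem from the same issue: under megatron_amp_O2 the language model is wrapped in a half-precision wrapper, so code that probes attributes on the bare model misses the extra `.module` level, and parameter names gain a `module.` prefix (which is why the freeze whitelist needs both spellings). Below is a hedged sketch of the detection side; `HalfWrapper`, `unwrap`, and `ToyFirstStage` are stand-ins, not the actual NeMo/Megatron classes.

```python
import torch.nn as nn


class HalfWrapper(nn.Module):
    """Stand-in for the fp16/bf16 wrapper that amp O2 puts around the real model."""

    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module  # parameter names now carry a "module." prefix


def unwrap(model: nn.Module) -> nn.Module:
    # Peel off wrapper levels until we reach the module that actually owns the embeddings.
    while isinstance(getattr(model, "module", None), nn.Module):
        model = model.module
    return model


def first_stage_of_pipeline(model: nn.Module) -> bool:
    # Probe the unwrapped module, so the check works for both fp32 and "half" models.
    return hasattr(unwrap(model), "word_embeddings")


class ToyFirstStage(nn.Module):
    def __init__(self):
        super().__init__()
        self.word_embeddings = nn.Embedding(100, 16)


print(first_stage_of_pipeline(ToyFirstStage()))               # True
print(first_stage_of_pipeline(HalfWrapper(ToyFirstStage())))  # True: wrapper is unwrapped first
```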
github-actions bot added the core (Changes to NeMo Core) and NLP labels on Aug 21, 2023
guyueh1 marked this pull request as ready for review on August 21, 2023 19:10
github-actions bot removed the core (Changes to NeMo Core) label on Aug 22, 2023
@arendu (Collaborator) left a comment


LGTM!!

arendu merged commit 69af78c into main on Aug 22, 2023
15 checks passed
arendu deleted the guyueh1/fix_adapters_and_ptuning_for_ampO2_rebased branch on August 22, 2023 14:52
styagi130 pushed a commit to styagi130/NeMo that referenced this pull request Aug 23, 2023
* Transform adapter modules to fp16/bf16 under amp_O2

* Under megatron_amp_O2, transform the adapter modules to low precision after instantiation

Signed-off-by: Guyue Huang <[email protected]>

Conflicts:
	nemo/collections/nlp/modules/common/megatron/adapters/parallel_adapters.py

* Fix ptuning under amp O2

* Fix the first_stage_of_pipeline detection for half models
* Fix the freezing of InferenceTable for half models

Signed-off-by: Guyue Huang <[email protected]>

* Fix MegatronGPTAdapterPTuningModel

* When unfreezing adapters, we explicitly set inference embedding table
in prompt encoder to be untrainable.

Signed-off-by: Guyue Huang <[email protected]>

* Add comments for feature explanation

Signed-off-by: Guyue Huang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix ptuning and lora model_parallel_config

Signed-off-by: jasonwan <[email protected]>

* Put the casting of adapters in their instantiation

Signed-off-by: Guyue Huang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* small fix for state dict

Signed-off-by: jasonwan <[email protected]>

* optional model_parallel_config

Signed-off-by: jasonwan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Guyue Huang <[email protected]>
Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: jasonwan <[email protected]>

Signed-off-by: Siddharth Tyagi <[email protected]>
dorotat-nv pushed a commit to dorotat-nv/NeMo that referenced this pull request Aug 24, 2023
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
Labels: NLP
Projects: None yet
Development: Successfully merging this pull request may close these issues. None yet.
3 participants