Megatron GPT model finetuning #6210
Conversation
Looks good. Left some comments.
```python
        return model

    def load_from_checkpoint_dir(cls, cfg, trainer, modify_confg_fn):
```
Or we could put this into a utility function. This pattern is used in a lot of other places to load from a checkpoint dir.
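The suggested refactor could look something like the following generic helper. This is only a sketch with hypothetical names (`load_with_modified_config`, the dict-based stand-ins), not NeMo's actual API; the real version would delegate to the model class's restore call.

```python
def load_with_modified_config(cfg, checkpoint_dir, modify_config_fn, restore_fn):
    """Hypothetical helper: apply a config-modifier callback, then delegate to a
    restore function, so each caller doesn't reimplement the checkpoint-dir
    loading boilerplate."""
    modified_cfg = modify_config_fn(cfg)  # e.g. override dropout or batch sizes
    return restore_fn(modified_cfg, checkpoint_dir)


# Usage with dummy stand-ins for the NeMo-specific pieces:
model = load_with_modified_config(
    {"dropout": 0.1},
    "/tmp/ckpt_dir",  # hypothetical path
    lambda c: {**c, "dropout": 0.0},  # config-modifier callback
    lambda c, d: {"cfg": c, "dir": d},  # stand-in for the real restore call
)
```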
```python
        text = self.prompt_template.replace('{input}', original_context).replace('{output}', output)

        if self.separate_prompt_and_response_with_newline and self.prompt_template is None:
            text = context + '\n' + output
```
Should we use user-provided separators?
I think the prompt_template should cover this case, right?
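To make the two formatting paths in the diff concrete, here is a minimal standalone sketch of that logic, with the `self.*` attributes replaced by plain parameters. The final space-separated fallback is an assumption for illustration; it is not shown in the diff.

```python
def format_example(context, output, prompt_template=None,
                   separate_prompt_and_response_with_newline=False):
    """Mirror of the dataset's text-construction logic: a user-supplied
    template with {input}/{output} placeholders takes precedence; otherwise
    fall back to a newline separator (or, assumed here, a space)."""
    if prompt_template is not None:
        # Template path: placeholders are substituted directly.
        return prompt_template.replace('{input}', context).replace('{output}', output)
    if separate_prompt_and_response_with_newline:
        # Newline-separator path from the diff.
        return context + '\n' + output
    # Assumed fallback, not shown in the diff.
    return context + ' ' + output
```

So a template like `"User: {input}\nAssistant: {output}"` already subsumes the newline-separator flag, which supports the comment above.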
```python
        if self.prompt_template is not None:
            import ipdb

            ipdb.set_trace()
```
Remove the debug statement?
```python
from argparse import ArgumentParser
from multiprocessing import Pool

from sacremoses import MosesDetokenizer
```
Is it part of the plan to release the NIv2 and T0 data preprocessing scripts? We would like others to be able to SFT GPT with the same instruction datasets.
LGTM. Thanks!
Squashed commit history:

* copy from sft_from_gpt
* [pre-commit.ci] auto fixes from pre-commit.com hooks (applied repeatedly throughout)
* Changed tokenization and example
* maybe remove (got from upstream)
* Eval metrics while finetuning
* Add missing args; Add arg
* Wrap in try except
* Add separate validation and test batch sizes
* Add assert
* Fix checkpoint name
* Explicit sampling args
* Update t0 script
* Add niv2 script
* Change workers
* Fix labels
* Ignore download
* Minor fixes
* Add dist opt support
* Allow skipping validation
* Fix tokenization and padding to max batch
* Adds several configurable flags for Megatron GPT models (NVIDIA#5991): check position embs for gpt prompt learning; add to CI test; update args; disable tts unit test; change the Jenkinsfile GPT optimizer from 'fused_adam' to 'distributed_fused_adam', then revert it; update config to use the correct key
* Fast glu activations (NVIDIA#6058): fast glu activations; clean up activation list
* Explicitly check for united embeddings when logging params (NVIDIA#6085)
* Option for model extracted dir
* Add index mapping dir
* Assistant prompt
* Remove ipdb
* Override dropout
* Change sampler
* Roll back again; Revert TTS; Reset TTS; Revert further; Revert more to main
* Fix Test DS
* Address PR comments
* Add the option to provide a prompt template via fstrings
* Add CI test; fix CI test
* Fix workers issue; Fix workers
* Numerous small fixes ("Fix", "Try fix", "Minor", "Fix CI") interspersed throughout

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: khcs <[email protected]>
Signed-off-by: ericharper <[email protected]>
Signed-off-by: hsiehjackson <[email protected]>
Co-authored-by: soares-f <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: ericharper <[email protected]>
What does this PR do ?
Adds the ability to fine-tune Megatron GPT Models.
Collection: NLP
Changelog
Usage
# Add a code snippet demonstrating how to use this
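As an illustration, a fine-tuning run might configure its data section along these lines. All key names and paths here are illustrative, assembled from fields discussed in this PR (prompt_template, separate_prompt_and_response_with_newline, index mapping dir, separate validation batch sizes); check the shipped example config for the exact schema.

```yaml
model:
  data:
    train_ds:
      file_names: [/path/to/train.jsonl]   # illustrative path
      prompt_template: "User: {input}\nAssistant: {output}"
      separate_prompt_and_response_with_newline: False  # ignored when prompt_template is set
      index_mapping_dir: /path/to/index_cache  # illustrative
    validation_ds:
      file_names: [/path/to/val.jsonl]
      micro_batch_size: 4  # validation batch size can differ from training
```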
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.
Additional Information