
Adds several configurable flags for Megatron GPT models #5991

Merged: 33 commits merged into main from sandeepsub/megatron_lm_gpt_compat on Feb 18, 2023

Conversation

@MaximumEntropy (Contributor) commented Feb 10, 2023

What does this PR do?

This PR adds the following functionality to Megatron GPT models:

  1. Disable biases.
  2. Add different activation functions: geglu, swiglu, reglu, squared ReLU (sketched below).
  3. Add different transformer layer configurations: pre-LN, post-LN, NormFormer.
  4. Add rotary position embeddings (RoPE).
  5. Allow disabling both hidden and attention dropout.
  6. Add RMSNorm normalization.
  7. Untie the embedding and output layer.
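For reference, the GLU-variant activations in item 2 admit a compact definition. The following PyTorch sketch is illustrative only (it is not the NeMo implementation, which may fuse these operations); each GLU variant splits the feed-forward up-projection output in half along the last dimension and gates one half with the other.

    # Illustrative definitions only, not the NeMo kernels.
    import torch
    import torch.nn.functional as F

    def geglu(x: torch.Tensor) -> torch.Tensor:
        a, b = x.chunk(2, dim=-1)  # split the up-projection output in half
        return F.gelu(a) * b

    def swiglu(x: torch.Tensor) -> torch.Tensor:
        a, b = x.chunk(2, dim=-1)
        return F.silu(a) * b  # SiLU(a) gates b

    def reglu(x: torch.Tensor) -> torch.Tensor:
        a, b = x.chunk(2, dim=-1)
        return F.relu(a) * b

    def squared_relu(x: torch.Tensor) -> torch.Tensor:
        return F.relu(x) ** 2  # no gating; elementwise ReLU squared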

Collection: NLP

Changelog

  • Add config flags controlling biases, GLU-variant activations, transformer block type (pre-LN/post-LN/NormFormer), rotary position embeddings, hidden/attention dropout, normalization (LayerNorm/RMSNorm), and embedding/output-weight tying.
  • Move the word-embedding parameter-count adjustment from the last to the first pipeline stage so it also works with untied embeddings.
  • Add a CI test exercising the new flags.

Usage

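A usage sketch is shown below. The config keys are illustrative and should be confirmed against the YAML configs touched in this PR (for example, the rotary flag discussed in review as "rotary_percent(age)"); this is a sketch, not the authoritative schema.

    # Hypothetical usage sketch; key names are illustrative.
    from omegaconf import OmegaConf

    cfg = OmegaConf.load("examples/nlp/language_modeling/conf/megatron_gpt_config.yaml")

    cfg.model.bias = False                       # 1. disable biases
    cfg.model.activation = "swiglu"              # 2. or geglu / reglu / squared relu
    cfg.model.transformer_block_type = "pre_ln"  # 3. or post_ln / normformer
    cfg.model.position_embedding_type = "rope"   # 4. rotary position embeddings
    cfg.model.rotary_percentage = 1.0            #    fraction of head dim rotated
    cfg.model.hidden_dropout = 0.0               # 5. disable hidden dropout...
    cfg.model.attention_dropout = 0.0            #    ...and attention dropout
    cfg.model.normalization = "rmsnorm"          # 6. RMSNorm instead of LayerNorm
    cfg.model.share_embeddings_and_output_weights = False  # 7. untie embeddings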

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

@github-actions github-actions bot added the NLP label Feb 10, 2023
@MaximumEntropy MaximumEntropy marked this pull request as draft February 10, 2023 18:51
ericharper previously approved these changes Feb 10, 2023

@ericharper (Collaborator) left a comment:

LGTM. Thanks!

@MaximumEntropy MaximumEntropy marked this pull request as ready for review February 13, 2023 22:52
@github-actions github-actions bot added the CI label Feb 13, 2023
@khcs (Collaborator) left a comment:

Looks good as unified config flags. Only a minor comment on flag-naming consistency regarding "rotary_percent(age)". Thanks!

Two review threads on the Jenkinsfile were marked outdated and resolved.
@khcs khcs self-requested a review February 15, 2023 23:49
khcs previously approved these changes Feb 15, 2023

@khcs (Collaborator) left a comment:

Thanks for integrating the config changes and also adding the CI test, Sandeep!

@@ -508,16 +508,15 @@ def _get_total_params_across_model_parallel_groups_gpt_bert(self, model):
             num_parameters_on_device = sum(
                 [sum([p.nelement() for p in model_module.parameters()]) for model_module in model]
             )
-            if parallel_state.get_pipeline_model_parallel_world_size() > 1 and parallel_state.is_pipeline_last_stage(
+            if parallel_state.get_pipeline_model_parallel_world_size() > 1 and parallel_state.is_pipeline_first_stage(

A collaborator commented:

Should this be changed?

@MaximumEntropy (Contributor, Author) replied:

Yeah, because word embeddings are no longer present in the last pipeline stage if you untie the embeddings and output weights. But they should always be present in the first pipeline stage. Same for the comment below.

                 ignore_virtual=True
             ):
                 # subtract the embedding weights on the last virtual stage
                 num_word_embedding_parameters = sum([p.nelement() for p in model[-1].word_embeddings_weight()])
                 num_parameters_on_device -= num_word_embedding_parameters
         else:
             num_parameters_on_device = sum([p.nelement() for p in model.parameters()])

-        if parallel_state.get_pipeline_model_parallel_world_size() > 1 and parallel_state.is_pipeline_last_stage(
+        if parallel_state.get_pipeline_model_parallel_world_size() > 1 and parallel_state.is_pipeline_first_stage(

A collaborator commented:

Should this be changed?
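To make the intent of the change concrete, here is a minimal standalone sketch (not the NeMo implementation; names are illustrative): the duplicated word-embedding copy is subtracted on the first pipeline stage, which owns the embeddings whether or not they are tied to the output layer.

    # Illustrative sketch of the parameter accounting discussed above.
    def adjusted_param_count(stage_params: int,
                             word_embedding_params: int,
                             pp_world_size: int,
                             is_first_stage: bool) -> int:
        total = stage_params
        if pp_world_size > 1 and is_first_stage:
            # Subtract the embedding weights so the copy shared across
            # pipeline stages is only counted once in the global total.
            total -= word_embedding_params
        return total

    # Example: 2 pipeline stages with 10M params each, 1M of which are
    # word embeddings replicated on the first and last stage when tied.
    print(adjusted_param_count(10_000_000, 1_000_000, 2, True))   # 9000000
    print(adjusted_param_count(10_000_000, 1_000_000, 2, False))  # 10000000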

Commit "Update Jenkinsfile": changed the optimizer for GPT training from 'fused_adam' to 'distributed_fused_adam'. (Signed-off-by: khcs <[email protected]>)
@okuchaiev (Member) commented:

@khcs and @ericharper is this good to merge now?

@ericharper (Collaborator) replied:

> @khcs and @ericharper is this good to merge now?

No, the CI test with pp=2 is failing.

@ericharper (Collaborator) replied:

> No, the CI test with pp=2 is failing.

Update: it looks like we found the issue. Waiting to see if CI passes now.

@khcs (Collaborator) left a comment:

Looks good, and all CI tests are passing now! Thanks all!

@MaximumEntropy MaximumEntropy merged commit 4a56631 into main Feb 18, 2023
@MaximumEntropy MaximumEntropy deleted the sandeepsub/megatron_lm_gpt_compat branch February 18, 2023 05:13
MaximumEntropy added a commit that referenced this pull request Mar 4, 2023
* Initial
* Multiple fixes
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Fix
* Add to CI test
* check position embs for gpt prompt learning
* Update args
* Disable tts unit test
* Empty
* Update Jenkinsfile (changed optimizer for GPT training from 'fused_adam' to 'distributed_fused_adam')
* update config to use correct key
* revert Jenkinsfile back to fused_adam

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: khcs <[email protected]>
Signed-off-by: ericharper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: ericharper <[email protected]>
@hiyijian commented Mar 8, 2023:

It seems that RoPE cannot work with sequence_parallel.

titu1994 pushed a commit to titu1994/NeMo that referenced this pull request Mar 24, 2023
ericharper added a commit that referenced this pull request Apr 6, 2023
* copy from sft_from_gpt
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Changed tokenization and example
* maybe remove (got from upstream)
* Eval metrics while finetuning
* Add missing args
* Add arg
* Wrap in try except
* Try fix
* Add separate validation and test batch sizes
* Add assert
* Fix checkpoint name
* Explicit sampling args
* Update t0 script
* Add niv2 script
* Change workers
* Fix labels
* Ignore download
* Minor fixes
* Add dist opt support
* Allow skipping validation
* Fix tokenization and padding to max batch
* Adds several configurable flags for Megatron GPT models (#5991)
* Fast glu activations (#6058): fast glu activations; clean up activation list
* Explicitly check for united embeddings when logging params (#6085)
* Option for model extracted dir
* Add index mapping dir
* Assistant prompt
* Remove ipdb
* Override dropout
* Change sampler
* Roll back again
* Revert TTS; Reset TTS; Revert further; Revert more to main
* Fix Test DS
* Address PR comments
* Add the option to provide a prompt template via fstrings
* Add CI test; fix CI test
* Fix workers issue

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: khcs <[email protected]>
Signed-off-by: ericharper <[email protected]>
Co-authored-by: soares-f <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: ericharper <[email protected]>
hsiehjackson pushed a commit to hsiehjackson/NeMo that referenced this pull request Jun 2, 2023
hsiehjackson pushed a commit to hsiehjackson/NeMo that referenced this pull request Jun 2, 2023
Labels: NLP, CI
6 participants