
Use GPTModel from mcore #7093

Merged
merged 78 commits into main from mcore_gpt_model on Aug 14, 2023

Conversation

@ericharper (Collaborator) commented Jul 21, 2023

What does this PR do?

This PR adds a path to use GPTModel from Megatron Core.

Requirements:
Make sure that megatron core is installed from a recent commit:

git clone https://github.com/NVIDIA/Megatron-LM.git && \
    cd Megatron-LM && \
    git checkout 3316e811cc5335ee24c2d203416d864edcf2f7a8 && \
    pip install -e .
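
Equivalently, the pinned commit can be installed in one step using pip's VCS support (an alternative to the editable install above, not taken from this PR):

    pip install git+https://github.com/NVIDIA/Megatron-LM.git@3316e811cc5335ee24c2d203416d864edcf2f7a8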

Collection: NLP

Changelog

  • Add an mcore_gpt config flag and a path to instantiate GPTModel from megatron.core in MegatronGPTModel.
  • Build the model parallel config from the NeMo model config and thread it through the model provider, the forward pass, and the Float16 wrapper.
  • Add GQA config to the megatron gpt config file and verify mcore is enabled when GQA is used (#7096).
  • Pin the megatron core commit installed by Jenkins.

Usage

Set mcore_gpt=True in the MegatronGPTModel config.
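
For example, with NeMo's Hydra-based training scripts the flag can be passed as a command-line override (the script and config names below are illustrative, not prescribed by this PR):

    python examples/nlp/language_modeling/megatron_gpt_pretraining.py \
        --config-name=megatron_gpt_config \
        model.mcore_gpt=True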

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g. Numba, Pynini, Apex, etc.)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines contain specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

@github-actions github-actions bot added core Changes to NeMo Core NLP CI labels Jul 21, 2023
@github-advanced-security bot left a comment:

CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.

@ericharper ericharper marked this pull request as ready for review July 25, 2023 18:16
@ericharper ericharper changed the base branch from main to mcore_gpt_path July 25, 2023 18:16
@github-actions github-actions bot removed core Changes to NeMo Core CI labels Jul 25, 2023
ericharper and others added 21 commits August 8, 2023 10:46
ericharper and others added 4 commits August 10, 2023 11:29
* Add GQA config in gpt config file
* Verify mcore is enabled when using GQA

Signed-off-by: jasonwan <[email protected]>
Signed-off-by: ericharper <[email protected]>
Base automatically changed from mcore_gpt_path to main August 14, 2023 04:55
@github-actions github-actions bot added the CI label Aug 14, 2023
@@ -44,6 +44,9 @@ exp_manager:
  model_parallel_size: ${multiply:${model.tensor_model_parallel_size}, ${model.pipeline_model_parallel_size}}

model:
  # use GPTModel from megatron.core
  mcore_gpt: False
A collaborator commented on this diff:
I think we should add a CI test for mcore gpt = True
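
Such a test could, for instance, run a short pretraining smoke test with the mcore path enabled (a sketch only; the script path and overrides below are assumptions, not taken from this PR):

    python examples/nlp/language_modeling/megatron_gpt_pretraining.py \
        trainer.devices=2 \
        trainer.max_steps=3 \
        model.mcore_gpt=True \
        model.tensor_model_parallel_size=2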

@aklife97 (Collaborator) left a comment:

LGTM!

@ericharper ericharper merged commit e64076e into main Aug 14, 2023
13 of 15 checks passed
@ericharper ericharper deleted the mcore_gpt_model branch August 14, 2023 21:37
dorotat-nv pushed a commit to dorotat-nv/NeMo that referenced this pull request Aug 24, 2023
* start adding gpt from megatron core path
* set model parallel config
* use model parallel config object
* update args
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* set vp size to none if it is 1
* set vp size to none if it is 1
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* add TransformerConfig
* start updating to TransformerConfig
* add todo
* revert to model parallel config
* add hidden_size to model_parallel_config
* remove imports
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* remove import
* small clean up
* update hidden size in peft base model, add mcore commit to jenkins
* update module args
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* add config obj to flash attention tests
* remove args
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* remove sequence parallel arg
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* update args
* add config to self
* update args
* update args
* update args
* add config to test
* get hidden_size from config
* add try except
* use default
* update config with hidden size
* remove arg
* comment out jenkins test
* revert import
* remove optimizer_idx
* prefetch num microbatches
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* start adding gpt from megatron core path
* set model parallel config
* use model parallel config object
* update args
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* start updating to TransformerConfig
* revert to model parallel config
* add hidden_size to model_parallel_config
* remove imports
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* update module args
* add config to self
* build transformer config
* add model to provider func
* update forward and float16 wrapper
* instantiate model parallel config after init model parallel
* set virtual rank
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Add GQA config to megatron gpt model (NVIDIA#7096)
  * Add GQA config in gpt config file
  * Verify mcore is enabled when using GQA
* revert
* remove import
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* update for dist adam
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* use get_gpt_module_list
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* update megatron core commit
* revert change
* remove import
* remove import
* remove import

---------

Signed-off-by: ericharper <[email protected]>
Signed-off-by: eharper <[email protected]>
Signed-off-by: jasonwan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jason Wang <[email protected]>
Signed-off-by: dorotat <[email protected]>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024 (same squashed commit message as above)