
Start using ModelParallelConfig from Megatron Core #6885

Merged: 57 commits into main from mcore_gpt_path, Aug 14, 2023
Conversation

ericharper (Collaborator)

What does this PR do?

This PR adds the ModelParallelConfig arguments to be used with the next release of Megatron Core.

Collection: NLP

Changelog

  • Add specific line-by-line info of high-level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 
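
A minimal usage sketch (an assumption based on this PR's description, not the exact merged code; the field names follow megatron.core's ModelParallelConfig dataclass, and the concrete values are hypothetical):

from megatron.core import ModelParallelConfig

# Hypothetical example: gather the parallelism arguments that were previously
# passed individually into a single ModelParallelConfig object.
model_parallel_config = ModelParallelConfig(
    tensor_model_parallel_size=2,
    pipeline_model_parallel_size=1,
    sequence_parallel=False,
)

# hidden_size is needed for pipeline schedules but is not a ModelParallelConfig
# field, so it is attached separately (see the review discussion below).
setattr(model_parallel_config, 'hidden_size', 4096)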

Before your PR is "Ready for review"

Pre checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g. Numba, Pynini, Apex, etc.)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines contain specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

@github-actions github-actions bot added the NLP label Jun 19, 2023

@github-advanced-security bot left a comment

CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.

@github-actions github-actions bot added CI core Changes to NeMo Core labels Jun 30, 2023
@ericharper ericharper marked this pull request as ready for review July 25, 2023 16:39

@michalivne (Collaborator) left a comment

LGTM! Very useful to collect configs into model parallel configs. See minor comments.

try:
    # hidden size is needed for pipeline schedules but is not currently in ModelParallelConfig
    setattr(model_parallel_config, 'hidden_size', self.cfg.hidden_size)
except AttributeError:
    logging.warning(

Collaborator:

Why not also fail here? If it is missing and will fail later, wouldn't this be a good place to stop?

ericharper (Collaborator, Author):

I found this was too brittle. Maybe we can add a strict argument?
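
A sketch of what that strict argument might look like (an assumption based on this thread, not the merged code; the method name build_model_parallel_config comes from the excerpt quoted later in this conversation):

def build_model_parallel_config(self, strict: bool = False):
    model_parallel_config = super().build_model_parallel_config()
    try:
        # hidden size is needed for pipeline schedules but is not
        # currently in ModelParallelConfig
        setattr(model_parallel_config, 'hidden_size', self.cfg.hidden_size)
    except AttributeError:
        if strict:
            # fail fast instead of deferring the error to the pipeline schedule
            raise
        logging.warning("hidden_size not found in cfg; pipeline schedules may fail later")
    return model_parallel_config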

ericharper (Collaborator, Author):

What do you think about the suggestion?

""" Hidden size needs to be set from the cfg.encoder for the pipeline schedule.
"""

model_parallel_config = super().build_model_parallel_config()

Collaborator:

Wouldn't the parent class emit a warning if hidden_size is not in cfg.model.hidden_size? Perhaps this argument could be passed to the parent method?

ericharper (Collaborator, Author):

Could you expand more on your suggestion? I added this because the parent class didn't have hidden_size for this model.
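
For context, a hedged sketch of the kind of subclass override being discussed (reconstructed from the excerpt above; reading hidden_size from cfg.encoder is an assumption based on the docstring, not a quote from the merged code):

def build_model_parallel_config(self):
    """Hidden size needs to be set from cfg.encoder for the pipeline schedule."""
    model_parallel_config = super().build_model_parallel_config()
    # this encoder/decoder model keeps hidden_size under cfg.encoder rather than
    # at the top level of cfg, which is where the parent class looks for it
    setattr(model_parallel_config, 'hidden_size', self.cfg.encoder.hidden_size)
    return model_parallel_config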

aklife97 previously approved these changes Aug 7, 2023

@aklife97 (Collaborator) left a comment

LGTM, thank you!
The main concern I have is MPConfig vs TransformerConfig; we probably need to discuss more how we should structure the usages. Apart from that, this looks like it covers everything.

ericharper and others added 22 commits August 8, 2023 10:46

aklife97 previously approved these changes Aug 8, 2023

@aklife97 (Collaborator) left a comment

LGTM, thank you!! Just one potential issue with the sequence length setting.

@aklife97 (Collaborator) left a comment

LGTM! I think we should merge this in now.
@michalivne: let us know what your feedback is on Eric's response, and we can send fixes in later PRs accordingly!

@ericharper ericharper merged commit 4833347 into main Aug 14, 2023
13 of 15 checks passed
@ericharper ericharper deleted the mcore_gpt_path branch August 14, 2023 04:55
guyueh1 pushed a commit to guyueh1/NeMo that referenced this pull request Aug 14, 2023
* start adding gpt from megatron core path
* set model parallel config
* use model parallel config object
* update args
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* set vp size to none if it is 1
* set vp size to none if it is 1
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* add TransformerConfig
* start updating to TransformerConfig
* add todo
* revert to model parallel config
* add hidden_size to model_parallel_config
* remove imports
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* remove import
* small clean up
* update hidden size in peft base model, add mcore commit to jenkins
* update module args
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* add config obj to flash attention tests
* remove args
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* remove sequence parallel arg
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* update args
* add config to self
* update args
* update args
* update args
* add config to test
* get hidden_size from config
* add try except
* use default
* update config with hidden size
* remove arg
* comment out jenkins test
* revert import
* remove optimizer_idx
* prefetch num microbatches
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* remove import
* temporarily comment jenkins test
* update seq_length
* remove commented code
* update arg
* update mbs and gbs of test
* update batch size in test
* fix precision in test
* update precision
* move hidden_size out of conditional
* [pre-commit.ci] auto fixes from pre-commit.com hooks

---------

Signed-off-by: ericharper <[email protected]>
Signed-off-by: eharper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@@ -16,6 +16,7 @@

import pytest
import torch
from megatron.core import ModelParallelConfig

Collaborator:

This is breaking pytest --cpu when doing a basic setup without all the fluff.
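
A hedged sketch of an import guard that would keep the test module collectible without Megatron Core installed (following the optional-import pattern the PR checklist asks reviewers to verify; the flag name and the test itself are hypothetical):

import pytest
import torch

try:
    from megatron.core import ModelParallelConfig

    HAVE_MEGATRON_CORE = True
except (ImportError, ModuleNotFoundError):
    HAVE_MEGATRON_CORE = False


@pytest.mark.skipif(not HAVE_MEGATRON_CORE, reason="megatron.core is not installed")
def test_model_parallel_config_defaults():
    # hypothetical test: the guarded import is only touched when available
    config = ModelParallelConfig()
    assert config is not None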

dorotat-nv pushed a commit to dorotat-nv/NeMo that referenced this pull request Aug 24, 2023
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
Labels: CI, core (Changes to NeMo Core), NLP
Projects: none yet
Development: no issues linked that merging this pull request may close
4 participants