Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix gpt trainer test #6915

Merged
merged 156 commits into from
Aug 10, 2023
Merged

Fix gpt trainer test #6915

merged 156 commits into from
Aug 10, 2023

Conversation

hsiehjackson
Copy link
Collaborator

@hsiehjackson hsiehjackson commented Jun 24, 2023

What does this PR do ?

  • Fix gpt trainer inference_step to compute generation results during validation and test
  • Add dataset with metadata, to output in prediction file
  • Add token_f1 metrics for Scrolls

Collection: [NLP]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: hsiehjackson <[email protected]>
Signed-off-by: hsiehjackson <[email protected]>
@github-actions github-actions bot added the NLP label Jun 24, 2023
@hsiehjackson hsiehjackson force-pushed the fix-gpt-trainer-test branch 2 times, most recently from 70e3747 to fadf272 Compare June 26, 2023 16:47
Signed-off-by: hsiehjackson <[email protected]>
Signed-off-by: hsiehjackson <[email protected]>
@github-actions github-actions bot added the CI label Jun 27, 2023
@arendu arendu self-requested a review June 27, 2023 23:36
Jenkinsfile Outdated Show resolved Hide resolved
ekmb
ekmb previously approved these changes Aug 8, 2023
arendu
arendu previously approved these changes Aug 9, 2023
yzhang123
yzhang123 previously approved these changes Aug 10, 2023
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
@hsiehjackson hsiehjackson dismissed stale reviews from yzhang123, arendu, and ekmb via cf2f2b9 August 10, 2023 02:48
Copy link
Collaborator

@yidong72 yidong72 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, please don't change the default configuration behavior as the old code depends on it.

drop_last: True
context_key: 'input'
label_key: 'output'
drop_last: False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't change the default behavior

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we drop last if we evaluate with the validation dataset?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one use case that I have to drop the last because I am preparing a dataset that computes contrastive loss. I need to make sure all the batch has the same batch size.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we set drop_last=False, we can use pad_samples_to_global_batch_size=True right? Or this doesn't fulfill your settings? https://github.com/NVIDIA/NeMo/blob/fix-gpt-trainer-test/nemo/collections/nlp/models/language_modeling/megatron_gpt_sft_model.py#L788
If we set default drop_last=True, then we may drop some samples for evaluation which may show incorrect results for comparison.

@hsiehjackson hsiehjackson merged commit e0f8b9b into main Aug 10, 2023
15 checks passed
@hsiehjackson hsiehjackson deleted the fix-gpt-trainer-test branch August 10, 2023 19:44
dorotat-nv pushed a commit to dorotat-nv/NeMo that referenced this pull request Aug 24, 2023
* Add trainer.test() for GPT

Signed-off-by: hsiehjackson <[email protected]>

* Remove unused part

Signed-off-by: hsiehjackson <[email protected]>

* Add trainer.test() for GPT

Signed-off-by: hsiehjackson <[email protected]>

* Remove unused part

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix training part

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix config

Signed-off-by: hsiehjackson <[email protected]>

* Fix references and add CI

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix config error

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix dataset

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add metadata

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix config

Signed-off-by: hsiehjackson <[email protected]>

* Fix empty batch

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix config

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix config

Signed-off-by: hsiehjackson <[email protected]>

* Fix max seq length

Signed-off-by: hsiehjackson <[email protected]>

* Fix dataset

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix dataset

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add token f1

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add FA in sft

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add inference config

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: hsiehjackson <[email protected]>

* Fix pad

Signed-off-by: hsiehjackson <[email protected]>

* Fix num batch

Signed-off-by: hsiehjackson <[email protected]>

* Add query_key

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove pdb

Signed-off-by: hsiehjackson <[email protected]>

* Fix write json

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix dataset bug and refactor

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add logging for prediction

Signed-off-by: hsiehjackson <[email protected]>

* Fix retrain

Signed-off-by: hsiehjackson <[email protected]>

* Add query_key in config

Signed-off-by: hsiehjackson <[email protected]>

* Fix bug

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix config

Signed-off-by: hsiehjackson <[email protected]>

* Fix bug

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add inference config

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix mask

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix mask

Signed-off-by: hsiehjackson <[email protected]>

* Split PR

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Undo commit

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Add query_key to doc_string

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Adjust yzhang123 comments

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix error and follow comments

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove query key

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove logic and query

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove query from model

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove query_key

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix error

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix pdb

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Add default tokens_to_generate

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Revert prompt truncate re-prompt

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* skip generation with metric loss

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* support GPTSFTChatDataset

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add comment

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

---------

Signed-off-by: hsiehjackson <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: dorotat <[email protected]>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
* Add trainer.test() for GPT

Signed-off-by: hsiehjackson <[email protected]>

* Remove unused part

Signed-off-by: hsiehjackson <[email protected]>

* Add trainer.test() for GPT

Signed-off-by: hsiehjackson <[email protected]>

* Remove unused part

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix training part

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix config

Signed-off-by: hsiehjackson <[email protected]>

* Fix references and add CI

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix config error

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix dataset

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add metadata

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix config

Signed-off-by: hsiehjackson <[email protected]>

* Fix empty batch

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix config

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix config

Signed-off-by: hsiehjackson <[email protected]>

* Fix max seq length

Signed-off-by: hsiehjackson <[email protected]>

* Fix dataset

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix dataset

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add token f1

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add FA in sft

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add inference config

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: hsiehjackson <[email protected]>

* Fix pad

Signed-off-by: hsiehjackson <[email protected]>

* Fix num batch

Signed-off-by: hsiehjackson <[email protected]>

* Add query_key

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove pdb

Signed-off-by: hsiehjackson <[email protected]>

* Fix write json

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix dataset bug and refactor

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add logging for prediction

Signed-off-by: hsiehjackson <[email protected]>

* Fix retrain

Signed-off-by: hsiehjackson <[email protected]>

* Add query_key in config

Signed-off-by: hsiehjackson <[email protected]>

* Fix bug

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix config

Signed-off-by: hsiehjackson <[email protected]>

* Fix bug

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add inference config

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix mask

Signed-off-by: hsiehjackson <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix mask

Signed-off-by: hsiehjackson <[email protected]>

* Split PR

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Undo commit

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Add query_key to doc_string

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Adjust yzhang123 comments

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix error and follow comments

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove query key

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove logic and query

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove query from model

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Remove query_key

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Fix error

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix pdb

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Add default tokens_to_generate

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* Revert prompt truncate re-prompt

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* skip generation with metric loss

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* support GPTSFTChatDataset

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add comment

Signed-off-by: Cheng-Ping Hsieh <[email protected]>

---------

Signed-off-by: hsiehjackson <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Signed-off-by: Cheng-Ping Hsieh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants