
Megatron legacy conversion support #3919

Merged: 7 commits merged into main from fix_megatron_fix on Apr 4, 2022

Conversation

ramanathan831 (Contributor)

Signed-off-by: Ramanathan Arunachalam [email protected]

What does this PR do ?

  • Fix bugs in MegatronBert export to support models trained on NeMo < 1.5
  • Refactor the forward functions of Bert-based NLP models
  • Fix a path-join bug in Intent Slot classification
  • Modify export.py to showcase exporting legacy MegatronBert NLP models

Add a one line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 
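A rough sketch (not from the PR itself) of restoring a legacy MegatronBert-based checkpoint and exporting it after this change; it assumes ModelPT.restore_from accepts a save_restore_connector argument and that NLPSaveRestoreConnector handles the legacy state dict, and the file names are placeholders:

import torch

from nemo.collections.nlp.parts.nlp_overrides import NLPSaveRestoreConnector
from nemo.core import ModelPT

nemo_in = "legacy_megatronbert_intent_slot.nemo"  # hypothetical legacy checkpoint

with torch.inference_mode():
    # The NLP save-restore connector takes care of remapping the legacy
    # Megatron state dict while the model is being restored.
    model = ModelPT.restore_from(
        restore_path=nemo_in,
        save_restore_connector=NLPSaveRestoreConnector(),
    )

model.export("exported_model.onnx")  # Exportable NeMo models provide export()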

Before your PR is "Ready for review"

Pre checks:

  • [x] Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • [x] Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: Ramanathan Arunachalam <[email protected]>
logging.info("Restoring NeMo model from '{}'".format(nemo_in))
try:
with torch.inference_mode():
# Restore instance from .nemo file using generic model restore_from
model = ModelPT.restore_from(restore_path=nemo_in)
# If the megatron based NLP model was trained on NeMo < 1.5, then we need to update the lm_checkpoint on the model config
Collaborator

This megatron_legacy block would be better off as a separate script that converts legacy .nemo to regular .nemo, and not part of the export script. Otherwise I would have to duplicate this logic in nemo2riva if we want to support legacy megatron -> Riva link.

Contributor Author

Introduced a new file, scripts/legacy_megatronbert_nlp_to_current_version.py, to convert legacy MegatronBert-based checkpoints to the current version of NeMo.
Reverted export.py to its previous state.
Developers now have to run scripts/legacy_megatronbert_nlp_to_current_version.py first and then scripts/export.py (see the sketch below).
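A hedged sketch of that two-step workflow; the scripts' exact command-line arguments are assumptions and may differ from the real CLIs:

import subprocess

# Step 1: convert the legacy (NeMo < 1.5) MegatronBert checkpoint to the current format.
subprocess.run(
    [
        "python",
        "scripts/legacy_megatronbert_nlp_to_current_version.py",
        "legacy_model.nemo",      # hypothetical input path
        "converted_model.nemo",   # hypothetical output path
    ],
    check=True,
)

# Step 2: export the converted checkpoint with the regular export script.
subprocess.run(
    ["python", "scripts/export.py", "converted_model.nemo", "exported_model.onnx"],
    check=True,
)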

Collaborator

@ramanathan831 can we move the export scripts to their own directory, scripts/export/... ?

…ng legacy MegatronBert NLP models to current version

Signed-off-by: Ramanathan Arunachalam <[email protected]>

lgtm-com bot commented Apr 1, 2022

This pull request introduces 3 alerts when merging ab12f6c into c15ed04 - view on LGTM.com

new alerts:

  • 2 for Unused import
  • 1 for Unnecessary delete statement in function

Signed-off-by: Ramanathan Arunachalam <[email protected]>
borisfom previously approved these changes Apr 1, 2022

lgtm-com bot commented Apr 1, 2022

This pull request introduces 3 alerts when merging 84a5606 into c15ed04 - view on LGTM.com

new alerts:

  • 3 for Unused import


lgtm-com bot commented Apr 1, 2022

This pull request introduces 3 alerts when merging 2a1584c into b0e250f - view on LGTM.com

new alerts:

  • 3 for Unused import

ericharper (Collaborator) left a comment

Can we move the ModelPT changes to NLPModel and NLPSaveRestoreConnector?

…; Do transpose op in Self attention if it's a legacy checkpoint; Add an example notebook for exporting NLP Bert models

Signed-off-by: Ramanathan Arunachalam <[email protected]>

lgtm-com bot commented Apr 4, 2022

This pull request introduces 3 alerts when merging eef8294 into 087de54 - view on LGTM.com

new alerts:

  • 3 for Unused import

ramanathan831 (Contributor Author)

@ericharper @borisfom : Moved the state dict mapping to NLPSaveRestoreConnector
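Purely as an illustration of the technique (the actual key patterns and transpose rule in the PR may differ), a state-dict remap inside a SaveRestoreConnector subclass could look like this:

from nemo.core.connectors.save_restore_connector import SaveRestoreConnector


class LegacyMegatronBertConnector(SaveRestoreConnector):  # hypothetical name
    def _remap_legacy_state_dict(self, state_dict):
        remapped = {}
        for key, value in state_dict.items():
            # Example rename from an old Megatron prefix to the current NeMo one (assumed pattern).
            new_key = key.replace("language_model.", "bert_model.language_model.")
            # Legacy checkpoints may store fused QKV weights in a transposed layout (assumed),
            # matching the "transpose op in self attention" mentioned in the commit message above.
            if new_key.endswith("query_key_value.weight"):
                value = value.transpose(0, 1).contiguous()
            remapped[new_key] = value
        return remapped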

ericharper previously approved these changes Apr 4, 2022
ericharper (Collaborator) left a comment

LGTM. Thanks!

Signed-off-by: Ramanathan Arunachalam <[email protected]>

lgtm-com bot commented Apr 4, 2022

This pull request introduces 3 alerts when merging 5ef30ac into 087de54 - view on LGTM.com

new alerts:

  • 3 for Unused import

ericharper (Collaborator) left a comment

LGTM.

ericharper merged commit 01ba140 into main on Apr 4, 2022
ericharper deleted the fix_megatron_fix branch on April 4, 2022 at 19:12
@@ -308,6 +310,111 @@ def save_to(self, model, save_path: str):
            else:
                return super().save_to(model, save_path)

    def restore_from(
Collaborator

Do we have to duplicate the whole method here? Can we call the base class restore_from and then override? Or extract some additional parameters and use an auxiliary method?
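A sketch of what the reviewer suggests here, delegating to the base-class restore_from and post-processing its result instead of duplicating the method; the subclass and helper names are hypothetical:

from nemo.core.connectors.save_restore_connector import SaveRestoreConnector


class DelegatingNLPConnector(SaveRestoreConnector):  # illustrative name only
    def restore_from(self, *args, **kwargs):
        # Let the generic connector perform the actual restore ...
        instance = super().restore_from(*args, **kwargs)
        # ... then apply the legacy Megatron fix-ups only where required.
        return self._maybe_convert_legacy_checkpoint(instance)

    def _maybe_convert_legacy_checkpoint(self, instance):
        # Placeholder for the legacy state-dict handling discussed in this PR.
        return instance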

@@ -530,6 +530,7 @@ def restore_from(
        return_config: bool = False,
        trainer: Optional['Trainer'] = None,
        save_restore_connector: SaveRestoreConnector = None,
Collaborator

I thought the idea was to not have legacy args in this method. Can we move that arg to the SaveRestoreConnector constructor instead?
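A sketch of the alternative suggested here, keeping the legacy flag on the connector rather than extending ModelPT.restore_from; the constructor argument is hypothetical:

from nemo.core import ModelPT
from nemo.core.connectors.save_restore_connector import SaveRestoreConnector


class LegacyAwareConnector(SaveRestoreConnector):  # illustrative subclass
    def __init__(self, megatron_legacy: bool = False):
        super().__init__()
        self.megatron_legacy = megatron_legacy  # the flag lives on the connector


model = ModelPT.restore_from(
    restore_path="legacy.nemo",  # hypothetical path
    save_restore_connector=LegacyAwareConnector(megatron_legacy=True),
)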

Collaborator

Or is it just a leftover and not used?

Contributor Author

Oh, I removed it in ModelPT and the save-restore connector, but missed it here. Will exclude this in the refactor PR.
