Megatron legacy conversion support #3919
Conversation
Signed-off-by: Ramanathan Arunachalam <[email protected]>
scripts/export.py (Outdated)
logging.info("Restoring NeMo model from '{}'".format(nemo_in))
try:
    with torch.inference_mode():
        # Restore instance from .nemo file using generic model restore_from
        model = ModelPT.restore_from(restore_path=nemo_in)
        # If the megatron based NLP model was trained on NeMo < 1.5, then we need to update the lm_checkpoint on the model config
This megatron_legacy block would be better off as a separate script that converts a legacy .nemo to a regular .nemo, rather than being part of the export script. Otherwise I would have to duplicate this logic in nemo2riva if we want to support legacy Megatron -> Riva.
Introduced a new file scripts/legacy_megatronbert_nlp_to_current_version.py to convert legacy MegatronBert-based checkpoints to the current version of NeMo.
Reverted export.py to its previous state.
Now developers have to run scripts/legacy_megatronbert_nlp_to_current_version.py first and then scripts/export.py.
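Conceptually, the conversion reduces to restoring the checkpoint through the legacy-aware connector and saving it back out. A minimal sketch, assuming these import paths and with hypothetical file names (the script's real interface may differ):

import torch
from nemo.core.classes import ModelPT
from nemo.collections.nlp.parts.nlp_overrides import NLPSaveRestoreConnector

with torch.inference_mode():
    model = ModelPT.restore_from(
        restore_path="legacy_megatronbert.nemo",           # hypothetical input path
        save_restore_connector=NLPSaveRestoreConnector(),  # connector carrying the legacy handling
    )
model.save_to("converted_megatronbert.nemo")               # hypothetical output path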
@ramanathan831 can we move the export scripts to their own directory: scripts/export/...?
…ng legacy MegatronBert NLP models to current version Signed-off-by: Ramanathan Arunachalam <[email protected]>
This pull request introduces 3 alerts when merging ab12f6c into c15ed04 - view on LGTM.com.
Signed-off-by: Ramanathan Arunachalam <[email protected]>
This pull request introduces 3 alerts when merging 84a5606 into c15ed04 - view on LGTM.com.
This pull request introduces 3 alerts when merging 2a1584c into b0e250f - view on LGTM.com.
Can we move the ModelPT changes to NLPModel and NLPSaveRestoreConnector?
…; Do transpose op in self-attention if it's a legacy checkpoint; Add an example notebook for exporting NLP Bert models Signed-off-by: Ramanathan Arunachalam <[email protected]>
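The transpose op mentioned here adjusts the weight layout of legacy checkpoints on load. A toy sketch of the idea, assuming (purely for illustration) that legacy weights are stored transposed relative to the current layout:

import torch

def maybe_transpose_legacy_weight(weight: torch.Tensor, megatron_legacy: bool) -> torch.Tensor:
    # For this sketch, legacy checkpoints are assumed to store the
    # attention projection transposed; the PR's connector code is the
    # authoritative handling.
    if megatron_legacy:
        return weight.t().contiguous()
    return weight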
Signed-off-by: Ramanathan Arunachalam <[email protected]>
This pull request introduces 3 alerts when merging eef8294 into 087de54 - view on LGTM.com.
@ericharper @borisfom : Moved the state dict mapping to NLPSaveRestoreConnector
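The state dict mapping referenced here is conceptually a key-rename pass over the checkpoint; the prefixes in this sketch are placeholders, not the real mapping:

from typing import Dict
import torch

def map_legacy_state_dict(state_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    # Rename legacy Megatron parameter keys to the current naming scheme.
    # The old/new prefixes below are illustrative only.
    return {key.replace("language_model.", "bert_model."): value
            for key, value in state_dict.items()}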
LGTM. Thanks!
Signed-off-by: Ramanathan Arunachalam <[email protected]>
This pull request introduces 3 alerts when merging 5ef30ac into 087de54 - view on LGTM.com.
LGTM.
@@ -308,6 +310,111 @@ def save_to(self, model, save_path: str):
        else:
            return super().save_to(model, save_path)

    def restore_from(
Do we have to duplicate the whole method here? Can we call the base class restore_from and then override? Or extract some additional parameters and use an auxiliary method?
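The non-duplicating version being suggested could look roughly like this (a sketch with a heavily simplified signature and a hypothetical _apply_legacy_fixups helper, not the connector's actual code):

from nemo.core.connectors.save_restore_connector import SaveRestoreConnector

class NLPSaveRestoreConnector(SaveRestoreConnector):
    def restore_from(self, *args, **kwargs):
        # Let the base class do the generic .nemo unpacking and restore...
        instance = super().restore_from(*args, **kwargs)
        # ...then apply only the NLP-specific legacy fix-ups afterwards.
        if getattr(self, "megatron_legacy", False):
            self._apply_legacy_fixups(instance)  # hypothetical helper
        return instance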
@@ -530,6 +530,7 @@ def restore_from(
        return_config: bool = False,
        trainer: Optional['Trainer'] = None,
        save_restore_connector: SaveRestoreConnector = None,
I thought the idea was to not have legacy args in this method. Can we move that arg to the SaveRestoreConnector constructor instead?
Or is it just a leftover and not used?
Oh, I removed it in ModelPT and SaveRestoreConnector, but missed it here. Will exclude this in the refactor PR.
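For reference, the constructor-argument approach suggested above would look roughly like this (a sketch assuming the flag is named megatron_legacy):

from nemo.core.connectors.save_restore_connector import SaveRestoreConnector

class NLPSaveRestoreConnector(SaveRestoreConnector):
    def __init__(self, megatron_legacy: bool = False):
        super().__init__()
        # Keep the legacy flag on the connector so ModelPT.restore_from
        # stays free of legacy-specific arguments.
        self.megatron_legacy = megatron_legacy

Callers would then opt in with NLPSaveRestoreConnector(megatron_legacy=True) instead of a new restore_from argument.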
Signed-off-by: Ramanathan Arunachalam <[email protected]>
What does this PR do?
Fix bugs in MegatronBert export to support models trained on NeMo < 1.5
Refactor the forward functions of BERT-based NLP models
Fix a path-join bug in Intent Slot classification
Modify export.py to showcase exporting legacy MegatronBert NLP models
Collection: NLP
Changelog
Usage
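A hedged sketch of the two-step flow described above; the scripts' argument names are assumptions, so check each script's --help for the real interface:

import subprocess

# Step 1: convert the legacy MegatronBert .nemo to the current format.
subprocess.run(
    ["python", "scripts/legacy_megatronbert_nlp_to_current_version.py",
     "legacy_model.nemo", "converted_model.nemo"],  # hypothetical arguments
    check=True,
)

# Step 2: export the converted model as usual.
subprocess.run(
    ["python", "scripts/export.py", "converted_model.nemo", "model.onnx"],  # hypothetical arguments
    check=True,
)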
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs to various areas.
Additional Information