
Megatron legacy conversion support #3919

Merged: 7 commits merged into main from fix_megatron_fix on Apr 4, 2022

Conversation

ramanathan831 (Contributor)

Signed-off-by: Ramanathan Arunachalam [email protected]

What does this PR do ?

  • Fix bugs in MegatronBert export to support models trained on NeMo < 1.5
  • Refactor the forward functions of Bert-based NLP models
  • Fix a path-join bug in Intent Slot classification
  • Modify export.py to showcase exporting legacy MegatronBert NLP models

Add a one line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 
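A rough sketch (not from the PR itself) of restoring a legacy MegatronBert-based checkpoint and exporting it after this change; it assumes ModelPT.restore_from accepts a save_restore_connector argument and that NLPSaveRestoreConnector handles the legacy state dict, and the file names are placeholders:

import torch

from nemo.collections.nlp.parts.nlp_overrides import NLPSaveRestoreConnector
from nemo.core import ModelPT

nemo_in = "legacy_megatronbert_intent_slot.nemo"  # hypothetical legacy checkpoint

with torch.inference_mode():
    # The NLP save-restore connector takes care of remapping the legacy
    # Megatron state dict while the model is being restored.
    model = ModelPT.restore_from(
        restore_path=nemo_in,
        save_restore_connector=NLPSaveRestoreConnector(),
    )

model.export("exported_model.onnx")  # Exportable NeMo models provide export()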

Before your PR is "Ready for review"

Pre checks:

  • [x] Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • [x] Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: Ramanathan Arunachalam <[email protected]>
logging.info("Restoring NeMo model from '{}'".format(nemo_in))
try:
with torch.inference_mode():
# Restore instance from .nemo file using generic model restore_from
model = ModelPT.restore_from(restore_path=nemo_in)
# If the megatron based NLP model was trained on NeMo < 1.5, then we need to update the lm_checkpoint on the model config
Collaborator

This megatron_legacy block would be better off as a separate script that converts legacy .nemo to regular .nemo, and not part of the export script. Otherwise I would have to duplicate this logic in nemo2riva if we want to support legacy megatron -> Riva link.

Contributor Author

Introduced a new file, scripts/legacy_megatronbert_nlp_to_current_version.py, to convert legacy MegatronBert-based checkpoints to the current version of NeMo.
Reverted export.py to its previous state.
Developers now have to run scripts/legacy_megatronbert_nlp_to_current_version.py first and then scripts/export.py (see the sketch below).
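A hedged sketch of that two-step workflow; the scripts' exact command-line arguments are assumptions and may differ from the real CLIs:

import subprocess

# Step 1: convert the legacy (NeMo < 1.5) MegatronBert checkpoint to the current format.
subprocess.run(
    [
        "python",
        "scripts/legacy_megatronbert_nlp_to_current_version.py",
        "legacy_model.nemo",      # hypothetical input path
        "converted_model.nemo",   # hypothetical output path
    ],
    check=True,
)

# Step 2: export the converted checkpoint with the regular export script.
subprocess.run(
    ["python", "scripts/export.py", "converted_model.nemo", "exported_model.onnx"],
    check=True,
)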

Collaborator

@ramanathan831 can we move the export scripts to their own directory, scripts/export/... ?

…ng legacy MegatronBert NLP models to current version

Signed-off-by: Ramanathan Arunachalam <[email protected]>

lgtm-com bot commented Apr 1, 2022

This pull request introduces 3 alerts when merging ab12f6c into c15ed04 - view on LGTM.com

new alerts:

  • 2 for Unused import
  • 1 for Unnecessary delete statement in function

Signed-off-by: Ramanathan Arunachalam <[email protected]>
borisfom previously approved these changes Apr 1, 2022

lgtm-com bot commented Apr 1, 2022

This pull request introduces 3 alerts when merging 84a5606 into c15ed04 - view on LGTM.com

new alerts:

  • 3 for Unused import


lgtm-com bot commented Apr 1, 2022

This pull request introduces 3 alerts when merging 2a1584c into b0e250f - view on LGTM.com

new alerts:

  • 3 for Unused import

ericharper (Collaborator) left a comment

Can we move the ModelPT changes to NLPModel and NLPSaveRestoreConnector?

…; Do transpose op in Self attention if it's a legacy checkpoint; Add an example notebook for exporting NLP Bert models

Signed-off-by: Ramanathan Arunachalam <[email protected]>

lgtm-com bot commented Apr 4, 2022

This pull request introduces 3 alerts when merging eef8294 into 087de54 - view on LGTM.com

new alerts:

  • 3 for Unused import

ramanathan831 (Contributor Author)

@ericharper @borisfom : Moved the state dict mapping to NLPSaveRestoreConnector
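Purely as an illustration of the technique (the actual key patterns and transpose rule in the PR may differ), a state-dict remap inside a SaveRestoreConnector subclass could look like this:

from nemo.core.connectors.save_restore_connector import SaveRestoreConnector


class LegacyMegatronBertConnector(SaveRestoreConnector):  # hypothetical name
    def _remap_legacy_state_dict(self, state_dict):
        remapped = {}
        for key, value in state_dict.items():
            # Example rename from an old Megatron prefix to the current NeMo one (assumed pattern).
            new_key = key.replace("language_model.", "bert_model.language_model.")
            # Legacy checkpoints may store fused QKV weights in a transposed layout (assumed),
            # matching the "transpose op in self attention" mentioned in the commit message above.
            if new_key.endswith("query_key_value.weight"):
                value = value.transpose(0, 1).contiguous()
            remapped[new_key] = value
        return remapped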

ericharper previously approved these changes Apr 4, 2022
ericharper (Collaborator) left a comment

LGTM. Thanks!

Signed-off-by: Ramanathan Arunachalam <[email protected]>

lgtm-com bot commented Apr 4, 2022

This pull request introduces 3 alerts when merging 5ef30ac into 087de54 - view on LGTM.com

new alerts:

  • 3 for Unused import

ericharper (Collaborator) left a comment

LGTM.

ericharper merged commit 01ba140 into main on Apr 4, 2022
ericharper deleted the fix_megatron_fix branch on April 4, 2022 at 19:12
@@ -308,6 +310,111 @@ def save_to(self, model, save_path: str):
            else:
                return super().save_to(model, save_path)

    def restore_from(
Collaborator

Do we have to duplicate the whole method here? Can we call the base class restore_from and then override? Or extract some additional parameters and use an auxiliary method?
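A sketch of what the reviewer suggests here, delegating to the base-class restore_from and post-processing its result instead of duplicating the method; the subclass and helper names are hypothetical:

from nemo.core.connectors.save_restore_connector import SaveRestoreConnector


class DelegatingNLPConnector(SaveRestoreConnector):  # illustrative name only
    def restore_from(self, *args, **kwargs):
        # Let the generic connector perform the actual restore ...
        instance = super().restore_from(*args, **kwargs)
        # ... then apply the legacy Megatron fix-ups only where required.
        return self._maybe_convert_legacy_checkpoint(instance)

    def _maybe_convert_legacy_checkpoint(self, instance):
        # Placeholder for the legacy state-dict handling discussed in this PR.
        return instance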

@@ -530,6 +530,7 @@ def restore_from(
        return_config: bool = False,
        trainer: Optional['Trainer'] = None,
        save_restore_connector: SaveRestoreConnector = None,
Collaborator

I thought the idea was to not have legacy args in this method. Can we move that arg to the SaveRestoreConnector constructor instead?
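A sketch of the alternative suggested here, keeping the legacy flag on the connector rather than extending ModelPT.restore_from; the constructor argument is hypothetical:

from nemo.core import ModelPT
from nemo.core.connectors.save_restore_connector import SaveRestoreConnector


class LegacyAwareConnector(SaveRestoreConnector):  # illustrative subclass
    def __init__(self, megatron_legacy: bool = False):
        super().__init__()
        self.megatron_legacy = megatron_legacy  # the flag lives on the connector


model = ModelPT.restore_from(
    restore_path="legacy.nemo",  # hypothetical path
    save_restore_connector=LegacyAwareConnector(megatron_legacy=True),
)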

Collaborator

Or is it just a leftover and not used?

Contributor Author

Oh, I removed it in ModelPT and the save-restore connector, but missed it here. Will exclude this in the refactor PR.
