Save model parallel .nemo in ExpManager #6115

arendu · 2023-02-25T17:37:45Z

What does this PR do?

With this PR, save_best_model and always_save_nemo work when the tensor-parallel (tp) or pipeline-parallel (pp) size is greater than 1.
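
To illustrate why saving needs extra care when tp/pp > 1: each tensor-parallel and pipeline-parallel rank holds a different shard of the weights, so writing from global rank 0 alone would miss most of the model. The sketch below is not the code from this PR. It assumes a Megatron-style rank layout (tensor-parallel ranks contiguous, then data-parallel, then pipeline-parallel), and the helper names `data_parallel_rank` and `should_write_shard` are hypothetical.

```python
def data_parallel_rank(global_rank: int, world_size: int, tp: int, pp: int) -> int:
    """Data-parallel rank under the assumed tp -> dp -> pp rank ordering."""
    if world_size % (tp * pp) != 0:
        raise ValueError("world_size must be divisible by tp * pp")
    dp = world_size // (tp * pp)
    return (global_rank // tp) % dp


def should_write_shard(global_rank: int, world_size: int, tp: int, pp: int) -> bool:
    """Decide whether this rank writes weights when a .nemo file is saved."""
    if tp * pp == 1:
        # No model parallelism: a single copy, written by global rank 0.
        return global_rank == 0
    # Model parallel: one writer per (tp, pp) shard, the replica with dp rank 0.
    return data_parallel_rank(global_rank, world_size, tp, pp) == 0


if __name__ == "__main__":
    writers = [r for r in range(8) if should_write_shard(r, world_size=8, tp=2, pp=2)]
    print(writers)  # [0, 1, 4, 5]
```

With 8 GPUs and tp=2, pp=2 there are four distinct shards, and the sketch picks exactly four writers, one per shard.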

Collection: [NLP,ASR]

Changelog

Add specific line by line info of high level changes in this PR.

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this
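
A possible usage sketch, since the template above was left unfilled. It assumes the experiment manager reads `save_best_model` and `always_save_nemo` from `checkpoint_callback_params`, and that the parallel sizes are set through `tensor_model_parallel_size` and `pipeline_model_parallel_size` in the model config. The experiment name, monitored metric, and sizes are placeholders. Plain dictionaries are used so the snippet runs without NeMo installed.

```python
# Hypothetical values throughout; the key names are assumed to match NeMo's YAML configs.
exp_manager_cfg = {
    "name": "megatron_gpt_tp2_pp2",  # placeholder experiment name
    "create_checkpoint_callback": True,
    "checkpoint_callback_params": {
        "monitor": "val_loss",
        "mode": "min",
        "save_top_k": 3,
        "save_best_model": True,   # intended to package the best checkpoint as .nemo
        "always_save_nemo": True,  # intended to write a .nemo whenever a checkpoint is saved
    },
}

model_cfg = {
    "tensor_model_parallel_size": 2,
    "pipeline_model_parallel_size": 2,
}

# Model parallel size is the product of the two sizes; > 1 is the case this PR targets.
model_parallel_size = (
    model_cfg["tensor_model_parallel_size"] * model_cfg["pipeline_model_parallel_size"]
)
print(f"model parallel size: {model_parallel_size}")
```

In an actual training script these values would normally live in the YAML config, with the `exp_manager` section handed to `exp_manager(trainer, cfg.exp_manager)`.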

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

Related to # (issue)

arendu · 2023-03-07T22:03:21Z

@titu1994 I added the pleasefixme for the test.

ericharper

LGTM. Thanks!

titu1994

Looks correct, minor modification below

nemo/utils/exp_manager.py

titu1994

LGTM. I have a PR ready to remove the pleasefixme for RNNT.

* patch to allow using tokenizers without additional_special_tokens_ids attribute
  Signed-off-by: arendu <[email protected]>
* save tp pp > 1 .nemo in exp manager
  Signed-off-by: arendu <[email protected]>
* Better rank checking for model parallel > 1 .nemo saving
  Signed-off-by: MaximumEntropy <[email protected]>
* Safety check
  Signed-off-by: MaximumEntropy <[email protected]>
* check for nlp model
  Signed-off-by: arendu <[email protected]>
* custom on save checkpoint for NLPModel
  Signed-off-by: arendu <[email protected]>
* minor update
  Signed-off-by: arendu <[email protected]>
* minor updates
  Signed-off-by: arendu <[email protected]>
* reverting custom save logic
  Signed-off-by: arendu <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* reverting custom save logic
  Signed-off-by: arendu <[email protected]>
* reverting custom save logic
  Signed-off-by: arendu <[email protected]>
* reverting custom save logic
  Signed-off-by: arendu <[email protected]>
* reverting custom save logic
  Signed-off-by: arendu <[email protected]>
* reverting custom save logic
  Signed-off-by: arendu <[email protected]>
* added pleasefixme
  Signed-off-by: arendu <[email protected]>
* updated
  Signed-off-by: arendu <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci

---------

Signed-off-by: arendu <[email protected]>
Signed-off-by: MaximumEntropy <[email protected]>
Co-authored-by: MaximumEntropy <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

The same message is repeated once more with an added trailer: Signed-off-by: hsiehjackson <[email protected]>

arendu added 30 commits December 15, 2022 10:51

patch to allow using tokenizers without additional_special_tokens_ids attribute

2b95406

Signed-off-by: arendu <[email protected]>

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

c131a90

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

9e15c3a

merge main

d0e3669

Signed-off-by: arendu <[email protected]>

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

0a19a5a

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

ec3d57b

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

64e36ba

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

5bfde7e

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

b04b145

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

b1906ab

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

9795062

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

0f83085

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

ee4dd1a

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

53ba0b2

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

a6aee2a

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

33442d4

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

8e6c5c9

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

efd263c

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

ecfda4f

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

15aee0c

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

2b7f3de

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

f62cde9

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

31915c9

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

fa22a1f

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

e62cd47

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

321a907

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

aeaf13f

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

c9a61f1

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

942b58c

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

0007d32

arendu added 5 commits March 7, 2023 10:22

reverting custom save logic

02dc335

Signed-off-by: arendu <[email protected]>

reverting custom save logic

52c0687

Signed-off-by: arendu <[email protected]>

Merge branch 'adithyare/save_model_parallel_nemo' of https://github.com/NVIDIA/NeMo into adithyare/save_model_parallel_nemo

61327fb

reverting custom save logic

788205f

Signed-off-by: arendu <[email protected]>

reverting custom save logic

a504af3

Signed-off-by: arendu <[email protected]>

github-actions bot removed the NLP label Mar 7, 2023

reverting custom save logic

77a93b8

Signed-off-by: arendu <[email protected]>

arendu requested a review from MaximumEntropy March 7, 2023 18:32

ericharper reviewed Mar 7, 2023

nemo/core/classes/modelPT.py

arendu and others added 3 commits March 7, 2023 12:27

Merge branch 'main' into adithyare/save_model_parallel_nemo

fa375d8

added pleasefixme

8c28b72

Signed-off-by: arendu <[email protected]>

Merge branch 'adithyare/save_model_parallel_nemo' of https://github.com/NVIDIA/NeMo into adithyare/save_model_parallel_nemo

f6a1337

arendu requested a review from ericharper March 7, 2023 22:35

arendu added 3 commits March 7, 2023 16:03

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

5c200e2

Merge branch 'main' of https://github.com/NVIDIA/NeMo into main

638f66e

Merge branch 'main' into adithyare/save_model_parallel_nemo

2488a51

ericharper previously approved these changes Mar 8, 2023

titu1994 reviewed Mar 8, 2023

nemo/utils/exp_manager.py (outdated)

updated

e453062

Signed-off-by: arendu <[email protected]>

arendu dismissed ericharper’s stale review via e453062 March 9, 2023 22:14

arendu requested review from ericharper and titu1994 March 9, 2023 22:15

arendu and others added 2 commits March 9, 2023 14:15

Merge branch 'main' into adithyare/save_model_parallel_nemo

30d4eea

[pre-commit.ci] auto fixes from pre-commit.com hooks

75862b2

for more information, see https://pre-commit.ci

titu1994 approved these changes Mar 9, 2023

arendu merged commit 15766ca into main Mar 10, 2023

arendu deleted the adithyare/save_model_parallel_nemo branch March 10, 2023 00:28
