
Mistral 7b conversion script #8052

Merged: 8 commits into NVIDIA:main on Jan 25, 2024

Conversation

@akoumpa (Collaborator) commented Dec 19, 2023

What does this PR do?

Adds a conversion script for Mistral-7B to NeMo.

Collection: nlp/language_modeling

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Jenkins CI

To run Jenkins, a NeMo User with write access must comment jenkins on the PR.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

@akoumpa akoumpa force-pushed the mistral_7b_support/akoumparouli branch 3 times, most recently from 57d578c to 91ac647, on December 19, 2023 18:36
@akoumpa akoumpa changed the title Mistral 7b support/akoumparouli Mistral 7b support Dec 19, 2023
@akoumpa akoumpa force-pushed the mistral_7b_support/akoumparouli branch 10 times, most recently from a8516bf to c1b0eb0, on December 20, 2023 00:43
@github-actions github-actions bot added the NLP label Dec 20, 2023
@akoumpa akoumpa force-pushed the mistral_7b_support/akoumparouli branch from 4a4e725 to 29c543a on December 26, 2023 23:10
@janekl (Collaborator) commented Dec 27, 2023

Could you please keep the conversion script consistent with the other ones? In particular, these should ideally stay aligned across scripts:

  • defining a new convert_state_dict function
  • config handling (override_model_dict etc.)
  • script parameters.

The Llama2 conversion script has a slightly different flavour, but I believe we should follow the "majority" style for new scripts (and align Llama2 at some point).
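For concreteness, here is a minimal sketch of the convert_state_dict pattern being asked for. The key names below are hypothetical placeholders, not the actual HF-Mistral to NeMo mapping; the point is only the shape of such a helper.

from typing import Dict

import torch

def convert_state_dict(hf_state_dict: Dict[str, torch.Tensor], prefix: str = "model.") -> Dict[str, torch.Tensor]:
    """Rename HF checkpoint keys to their NeMo counterparts (illustrative names only)."""
    rename_map = {
        # Hypothetical examples; the real mapping lives in the conversion script.
        "model.embed_tokens.weight": "embedding.word_embeddings.weight",
        "lm_head.weight": "output_layer.weight",
    }
    nemo_state_dict = {}
    for hf_key, tensor in hf_state_dict.items():
        nemo_key = rename_map.get(hf_key, hf_key)
        nemo_state_dict[prefix + nemo_key] = tensor
    return nemo_state_dict

Keeping the renaming logic in one pure function like this makes it easier to compare converters across models and to unit-test the key mapping in isolation.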

for key in keys:
    checkpoint['state_dict'][key.replace('model.', 'model.module.', 1)] = checkpoint['state_dict'].pop(key)

model = load_model(MegatronGPTModel, checkpoint, strict=False, trainer=trainer)
Collaborator:

Function load_model is a bit convoluted; can you avoid using it? I recommend instantiating the model with something like

model = MegatronGPTModel(cfg, trainer)
missing_keys, unexpected_keys = model.load_state_dict(hf_state_dict, strict=False)

see also #7977.

Collaborator (Author):

Do you know if model.load_state_dict with strict=False verifies whether any weights were loaded at all? I'm not implying that load_model performs this check; rather, I want to understand whether this is something we care about doing at this stage of the script.

Collaborator:

That PR #7977 has been merged. Could you please replace load_model with load_state_dict_helper? The former is really cluttered and unnecessarily saves two tokenizers -- the one stored in tokenizer.tokenizer_model is not needed.

Collaborator:

And as for strict=False: load_state_dict_helper checks the missing_keys and unexpected_keys lists and will complain if any expected weights are not loaded, or if any superfluous ones are present.
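A self-contained illustration (plain PyTorch, not the NeMo helper itself) of the strict=False behaviour described above: load_state_dict returns the lists of missing and unexpected keys, which a helper such as load_state_dict_helper can then inspect and complain about.

import torch
import torch.nn as nn

# Toy module standing in for the converted model.
model = nn.Linear(4, 4)

# 'bias' is deliberately missing and 'extra.weight' is superfluous.
state_dict = {"weight": torch.zeros(4, 4), "extra.weight": torch.zeros(4, 4)}

missing_keys, unexpected_keys = model.load_state_dict(state_dict, strict=False)
print(missing_keys)     # ['bias']
print(unexpected_keys)  # ['extra.weight']

# A conversion script (or a helper wrapping this call) can fail fast here:
if missing_keys or unexpected_keys:
    raise RuntimeError(f"state dict mismatch: missing={missing_keys}, unexpected={unexpected_keys}")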

@janekl (Collaborator) commented Dec 27, 2023

As for architecture, Mistral seems to heavily borrow from Llama2.

Would it be possible to just extend the Llama2 conversion script to cover the HF Mistral checkpoint?
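Purely as an illustration of that idea (none of this is code from the PR or from NeMo): since the two architectures are so close, a shared converter could in principle keep one code path and only override a handful of family-specific settings. All names and values below are hypothetical; the window_size entry only echoes the "add window_size to nemo_config" commit in this PR.

# Illustrative only: per-family overrides applied on top of a shared base config.
FAMILY_OVERRIDES = {
    "llama2": {"window_size": None},   # full attention
    "mistral": {"window_size": 4096},  # sliding-window attention; exact representation is an assumption
}

def build_nemo_config(base_config: dict, family: str) -> dict:
    """Apply per-family overrides on top of a shared base config (sketch, not NeMo code)."""
    cfg = dict(base_config)
    cfg.update(FAMILY_OVERRIDES[family])
    return cfg

print(build_nemo_config({"num_layers": 32, "hidden_size": 4096}, "mistral"))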

github-actions bot (Contributor) commented:

This PR is stale because it has been open for 14 days with no activity. Remove the stale label, leave a comment, or push an update, or it will be closed in 7 days.

@github-actions github-actions bot added the stale label Jan 11, 2024
@akoumpa (Collaborator, Author) commented Jan 11, 2024

Hi @janekl, thanks for your comments! I'm leaning towards merging this as is, and doing the actual refactor once we settle on how we want to refactor the import scripts. I think you are right that it shares a lot with the Llama script; once the PRs you mention have been approved and merged, we can follow the appropriate guidelines here as well.

@akoumpa akoumpa removed the stale label Jan 11, 2024
@akoumpa akoumpa force-pushed the mistral_7b_support/akoumparouli branch 2 times, most recently from 4dc48ae to fab45fc, on January 17, 2024 22:57
@akoumpa akoumpa marked this pull request as ready for review January 18, 2024 17:57
akoumpa and others added 4 commits January 18, 2024
@akoumpa akoumpa force-pushed the mistral_7b_support/akoumparouli branch from fab45fc to 1a0e698 on January 18, 2024 18:01
@akoumpa akoumpa changed the title Mistral 7b support Mistral 7b conversion script Jan 18, 2024
@akoumpa akoumpa force-pushed the mistral_7b_support/akoumparouli branch from 1a0e698 to 763fbb2 on January 18, 2024 23:30
@ericharper (Collaborator) commented:

jenkins

@ericharper (Collaborator) commented:

jenkins

@janekl (Collaborator) commented Jan 25, 2024

I think that adding a Jenkins conversion test for a small Mistral model would be really desirable.

Examples are here

NeMo/Jenkinsfile

Lines 391 to 419 in f10d694

stage('Llama') {
  steps {
    sh 'CUDA_VISIBLE_DEVICES=0 python scripts/nlp_language_modeling/convert_hf_llama_to_nemo.py \
        --in-file=/home/TestData/nlp/megatron_llama/llama-ci-hf \
        --out-file=/home/TestData/nlp/megatron_llama/ci.nemo \
        --precision=16'
    sh 'rm -f /home/TestData/nlp/megatron_llama/ci.nemo'
  }
}
stage('StarCoder') {
  steps {
    sh 'python scripts/nlp_language_modeling/convert_starcoder_hf_to_nemo.py \
        --config examples/nlp/language_modeling/conf/megatron_gpt_config.yaml \
        --input /home/TestData/nlp/megatron_gpt/starcoder-ci-hf \
        --output /home/TestData/nlp/megatron_gpt/starcoder-ci-hf'
    sh 'rm -f /home/TestData/nlp/megatron_gpt/starcoder-ci-hf/megatron_starcoder_tp1_pp1.nemo'
  }
}
stage('Falcon') {
  steps {
    sh 'python scripts/nlp_language_modeling/convert_hf_falcon_to_nemo.py \
        --config examples/nlp/language_modeling/conf/megatron_falcon_config.yaml \
        --input /home/TestData/nlp/megatron_gpt/falcon-ci-hf \
        --output /home/TestData/nlp/megatron_gpt/falcon-ci-hf/falcon_ci.nemo'
    sh 'rm -f /home/TestData/nlp/megatron_gpt/falcon-ci-hf/falcon_ci.nemo'
  }
}
}
}

cc @ericharper
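As a rough sketch of what such a conversion check would verify, here is the same idea expressed as a pytest-style test rather than a Jenkins stage, purely for illustration. The script name, flags and test-data paths simply mirror the Llama example above and are assumptions, not the actual CI setup.

import os
import subprocess

def test_mistral_conversion(tmp_path):
    """Run the converter on a small checkpoint and check that a .nemo file is produced."""
    out_file = tmp_path / "mistral_ci.nemo"
    subprocess.run(
        [
            "python",
            "scripts/nlp_language_modeling/convert_hf_mistral_7b_to_nemo.py",  # assumed script name
            "--in-file=/home/TestData/nlp/megatron_mistral/mistral-ci-hf",     # assumed test data path
            f"--out-file={out_file}",
        ],
        check=True,
    )
    assert os.path.exists(out_file)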

@janekl (Collaborator) commented Jan 25, 2024

> Hi @janekl, thanks for your comments! I'm leaning towards merging this as is, and doing the actual refactor once we settle on how we want to refactor the import scripts. I think you are right that it shares a lot with the Llama script; once the PRs you mention have been approved and merged, we can follow the appropriate guidelines here as well.

Sorry for replying late. That's fine, we can polish it later; there is a consistency PR open at #8192.

In any case, please see my comments in the two other threads. We really need to test the conversion the same way the other scripts are tested; otherwise we won't have any confidence in later refactoring work. I think @ericharper should support this extra effort.

@ericharper (Collaborator) left a comment:

LGTM. Thanks!

@ericharper ericharper merged commit 9944304 into NVIDIA:main Jan 25, 2024
11 checks passed
yaoyu-33 pushed a commit that referenced this pull request Jan 31, 2024
* Import script for mistral-7b.

From mistral checkpoint not hf.
Pending: support for block-diagonal attention mask.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* add window_size to nemo_config.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Switch from Mistral checkpoint to HF-Mistral.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* Force lowercase when checking for normalization type.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

* NeMo-Mistral-7B to HF-Mistral-7B.

Signed-off-by: Alexandros Koumparoulis <[email protected]>

---------

Signed-off-by: Alexandros Koumparoulis <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
stevehuang52 pushed a commit that referenced this pull request Jan 31, 2024
ssh-meister pushed a commit to ssh-meister/NeMo that referenced this pull request Feb 15, 2024
layalir pushed a commit to layalir/NeMo that referenced this pull request Feb 29, 2024
pablo-garay pushed a commit that referenced this pull request Mar 19, 2024
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024