
Support Falcon Variants (7B/40B/180B) in Mcore NeMo #7666

Merged: 86 commits into NVIDIA:main on Dec 15, 2023

Conversation

@xuanzic (Contributor) commented Oct 9, 2023

What does this PR do ?

Adds conversion scripts between HuggingFace Falcon checkpoints (1b-rw/7b-rw/7b/40b/180b) and NeMo format, in both directions.
Supports Falcon's parallel attention and new decoder architecture by adding falcon_decoder_layer and falcon_spec,
which make use of mcore's latest spec system.

Limitation: although the conversion script works for 1b-rw/7b-rw, ALiBi is currently not supported in mcore GPT, so expect suboptimal generation results from these two models, which use ALiBi. This limitation does NOT affect the other Falcon variants (7b/40b/180b).
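Falcon's parallel-attention layout, which the new falcon_decoder_layer supports, feeds the attention and MLP branches from the same normalized input and adds both to the residual stream, instead of chaining them as a standard GPT block does. A minimal sketch in plain Python (toy ln/attn/mlp stand-ins, not the mcore modules; the single shared norm shown here matches Falcon-7B, while 40B/180B use a separate norm per branch):

```python
# Toy sketch: parallel residuals (Falcon) vs. sequential residuals (standard GPT).
# ln/attn/mlp are illustrative stand-ins, not the real mcore layers.

def ln(x):            # toy "layernorm": identity placeholder
    return x

def attn(x):          # toy attention: scale each element by 2
    return [2 * v for v in x]

def mlp(x):           # toy MLP: add 1 to each element
    return [v + 1 for v in x]

def gpt_block(x):
    # sequential: x -> x + attn(ln(x)) -> h + mlp(ln(h))
    h = [a + b for a, b in zip(x, attn(ln(x)))]
    return [a + b for a, b in zip(h, mlp(ln(h)))]

def falcon_parallel_block(x):
    # parallel: attention and MLP both read ln(x), both add into the residual
    y = ln(x)
    return [a + b + c for a, b, c in zip(x, attn(y), mlp(y))]

print(gpt_block([1.0, 2.0]))              # sequential result
print(falcon_parallel_block([1.0, 2.0]))  # parallel result
```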

Collection: NLP

Changelog

  • Add a bidirectional HF <-> NeMo conversion script.
  • Implement the Falcon architecture using the mcore spec system and mcore layers.
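At its core, a checkpoint conversion of this kind is a state-dict traversal that renames parameters from one naming scheme to the other (plus any fusing or reshaping of QKV tensors, omitted here). A toy sketch of the renaming step; the mapping entries below are illustrative placeholders, not the actual HF or NeMo parameter names used by the script:

```python
# Toy sketch of HF -> NeMo state-dict key renaming.
# KEY_MAP entries are illustrative only; the real scripts define their own mapping.
KEY_MAP = {
    "transformer.word_embeddings.weight":
        "embedding.word_embeddings.weight",
    "transformer.h.{L}.self_attention.query_key_value.weight":
        "decoder.layers.{L}.self_attention.linear_qkv.weight",
}

def rename_keys(hf_state_dict, num_layers):
    """Return a new dict with HF keys renamed to the target scheme."""
    out = {}
    for hf_key, nemo_key in KEY_MAP.items():
        if "{L}" in hf_key:
            # per-layer parameters: expand the layer index
            for layer in range(num_layers):
                src = hf_key.format(L=layer)
                if src in hf_state_dict:
                    out[nemo_key.format(L=layer)] = hf_state_dict[src]
        elif hf_key in hf_state_dict:
            out[nemo_key] = hf_state_dict[hf_key]
    return out

sd = {"transformer.word_embeddings.weight": "W",
      "transformer.h.0.self_attention.query_key_value.weight": "QKV0"}
print(rename_keys(sd, num_layers=1))
```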

Usage

  • See the docstring in the conversion script for usage instructions.
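An illustrative invocation; the script name and flag names here are assumptions, so refer to the docstring in the conversion script for the authoritative interface:

```shell
# Hypothetical example only -- check the script's docstring for the real
# script path and flag names.
python convert_hf_falcon_to_nemo.py \
    --input-name-or-path tiiuae/falcon-7b \
    --output-path ./falcon-7b.nemo
```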

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

@github-actions github-actions bot added the NLP label Oct 9, 2023
@xuanzic xuanzic marked this pull request as draft October 9, 2023 19:14
@ericharper (Collaborator): jenkins

@ericharper (Collaborator): jenkins

@ericharper (Collaborator): jenkins

ericharper previously approved these changes Dec 15, 2023

@ericharper (Collaborator) left a comment: LGTM. Thanks!

@ericharper (Collaborator): jenkins

@ericharper (Collaborator): jenkins

@ericharper (Collaborator) left a comment: LGTM. Thanks!

@ericharper ericharper merged commit 8523384 into NVIDIA:main Dec 15, 2023
11 checks passed
ashbhandare pushed a commit to ashbhandare/NeMo that referenced this pull request Dec 15, 2023
* support falcon

* support falcon bug fix layernorm naming

* fix todo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix for new architecture

* new transformerlayer for falcon

* fix for new decoder architecture

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add DDP

* fix state dict based on spec system

* fix state dict based on change in layers, fix amp O2

* add falcon spec system support

* remove old falcon mcore support

* refactor conversion script to align with others

* add support for falcon-rw model (normal gpt architecture)

* modify falcon 7b config and remove trust remote code due to HF code changes

* rename falcon implementation dir

* change dir name

* modify block name

* rename decoder layer

* clean up

* remove debug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add proper header

Signed-off-by: Vivian <[email protected]>

* falcon lora mixin to support when non-fused LN linear

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revise jenkinsfile, tokenizer update in convertion script, add two falcon config files

Signed-off-by: Vivian <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor falcon to use MCoreGPT+spec+baselayer initial commit

* modification to get nemo run with mcore in this version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* small fix on the output file path

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add nemo to hf conversion script

* fix on base layer config and missing state dict due to dist ckpt

* Revert "fix on base layer config and missing state dict due to dist ckpt"

This reverts commit c85f3ac.

* fix on base layer config and missing state dict due to dist ckpt

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix megatron_gpt_model

Signed-off-by: Vivian chen <[email protected]>

* modify model config

Signed-off-by: Vivian chen <[email protected]>

* Apply suggestions from code review

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Vivian Chen <[email protected]>

* fix based on review

Signed-off-by: Vivian Chen <[email protected]>

* multiple revise based on review and latest mcore changes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Vivian Chen <[email protected]>

* subclass from TransformerLayer

* fixes according to comments

* add falcon ci test

Signed-off-by: Vivian Chen <[email protected]>

* add post_self_attn_layernorm

* add explicit explanation/refs for handling lora logic

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes for code scanning

* remove unused imports

* unit test for falcon model

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add falcon transformer layer unit test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes for code scan

* remove mcore dependent tests

* Revert "remove mcore dependent tests"

This reverts commit 9c4960f.

* add import guards

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add import guards cont

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes for ci import tests and unit tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes for codeql

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "fixes for codeql"

This reverts commit 9028555.

---------

Signed-off-by: Vivian <[email protected]>
Signed-off-by: Vivian chen <[email protected]>
Signed-off-by: Vivian Chen <[email protected]>
Signed-off-by: Vivian Chen <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: HuiyingLi <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
pzelasko pushed a commit to pzelasko/NeMo that referenced this pull request Jan 3, 2024
ssh-meister pushed a commit to ssh-meister/NeMo that referenced this pull request Feb 15, 2024
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
4 participants