
Support Falcon Variants (7B/40B/180B) in Mcore NeMo #7666

Merged: 86 commits into NVIDIA:main on Dec 15, 2023

Conversation

@xuanzic (Contributor) commented Oct 9, 2023

What does this PR do ?

Adds conversion scripts between HuggingFace Falcon checkpoints (1b-rw/7b-rw/7b/40b/180b) and NeMo format, in both directions.
Supports Falcon's parallel attention and new decoder architecture by adding falcon_decoder_layer and falcon_spec,
which make use of mcore's latest spec system.

Limitation: although the conversion script works for 1b-rw/7b-rw, ALiBi is currently not supported in mcore GPT, so expect suboptimal generation results from these two models, which use ALiBi. This limitation does NOT affect the other Falcon variants (7b/40b/180b).
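Falcon's parallel-attention layout, which the new falcon_decoder_layer supports, feeds the attention and MLP branches from the same normalized input and adds both to the residual stream, instead of chaining them as a standard GPT block does. A minimal sketch in plain Python (toy ln/attn/mlp stand-ins, not the mcore modules; the single shared norm shown here matches Falcon-7B, while 40B/180B use a separate norm per branch):

```python
# Toy sketch: parallel residuals (Falcon) vs. sequential residuals (standard GPT).
# ln/attn/mlp are illustrative stand-ins, not the real mcore layers.

def ln(x):            # toy "layernorm": identity placeholder
    return x

def attn(x):          # toy attention: scale each element by 2
    return [2 * v for v in x]

def mlp(x):           # toy MLP: add 1 to each element
    return [v + 1 for v in x]

def gpt_block(x):
    # sequential: x -> x + attn(ln(x)) -> h + mlp(ln(h))
    h = [a + b for a, b in zip(x, attn(ln(x)))]
    return [a + b for a, b in zip(h, mlp(ln(h)))]

def falcon_parallel_block(x):
    # parallel: attention and MLP both read ln(x), both add into the residual
    y = ln(x)
    return [a + b + c for a, b, c in zip(x, attn(y), mlp(y))]

print(gpt_block([1.0, 2.0]))              # sequential result
print(falcon_parallel_block([1.0, 2.0]))  # parallel result
```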

Collection: NLP

Changelog

  • Add a bidirectional HF <-> NeMo conversion script.
  • Implement the Falcon architecture using the mcore spec system and mcore layers.
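At its core, a checkpoint conversion of this kind is a state-dict traversal that renames parameters from one naming scheme to the other (plus any fusing or reshaping of QKV tensors, omitted here). A toy sketch of the renaming step; the mapping entries below are illustrative placeholders, not the actual HF or NeMo parameter names used by the script:

```python
# Toy sketch of HF -> NeMo state-dict key renaming.
# KEY_MAP entries are illustrative only; the real scripts define their own mapping.
KEY_MAP = {
    "transformer.word_embeddings.weight":
        "embedding.word_embeddings.weight",
    "transformer.h.{L}.self_attention.query_key_value.weight":
        "decoder.layers.{L}.self_attention.linear_qkv.weight",
}

def rename_keys(hf_state_dict, num_layers):
    """Return a new dict with HF keys renamed to the target scheme."""
    out = {}
    for hf_key, nemo_key in KEY_MAP.items():
        if "{L}" in hf_key:
            # per-layer parameters: expand the layer index
            for layer in range(num_layers):
                src = hf_key.format(L=layer)
                if src in hf_state_dict:
                    out[nemo_key.format(L=layer)] = hf_state_dict[src]
        elif hf_key in hf_state_dict:
            out[nemo_key] = hf_state_dict[hf_key]
    return out

sd = {"transformer.word_embeddings.weight": "W",
      "transformer.h.0.self_attention.query_key_value.weight": "QKV0"}
print(rename_keys(sd, num_layers=1))
```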

Usage

  • See the docstring in the conversion script for usage instructions.
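An illustrative invocation; the script name and flag names here are assumptions, so refer to the docstring in the conversion script for the authoritative interface:

```shell
# Hypothetical example only -- check the script's docstring for the real
# script path and flag names.
python convert_hf_falcon_to_nemo.py \
    --input-name-or-path tiiuae/falcon-7b \
    --output-path ./falcon-7b.nemo
```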

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

@github-actions github-actions bot added the NLP label Oct 9, 2023
@xuanzic xuanzic marked this pull request as draft October 9, 2023 19:14
@ericharper (Collaborator): jenkins

@ericharper (Collaborator): jenkins

@ericharper (Collaborator): jenkins

ericharper previously approved these changes Dec 15, 2023

@ericharper (Collaborator) left a comment: LGTM. Thanks!

@ericharper (Collaborator): jenkins

@ericharper (Collaborator): jenkins

@ericharper (Collaborator) left a comment: LGTM. Thanks!

@ericharper ericharper merged commit 8523384 into NVIDIA:main Dec 15, 2023
11 checks passed
ashbhandare pushed a commit to ashbhandare/NeMo that referenced this pull request Dec 15, 2023
* support falcon

* support falcon bug fix layernorm naming

* fix todo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix for new architecture

* new transformerlayer for falcon

* fix for new decoder architecture

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add DDP

* fix state dict based on spec system

* fix state dict based on change in layers, fix amp O2

* add falcon spec system support

* remove old falcon mcore support

* refactor conversion script to align with others

* add support for falcon-rw model (normal gpt architecture)

* modify falcon 7b config and remove trust remote code due to HF code changes

* rename falcon implementation dir

* change dir name

* modify block name

* rename decoder layer

* clean up

* remove debug

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add proper header

Signed-off-by: Vivian <[email protected]>

* falcon lora mixin to support when non-fused LN linear

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revise jenkinsfile, tokenizer update in convertion script, add two falcon config files

Signed-off-by: Vivian <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor falcon to use MCoreGPT+spec+baselayer initial commit

* modification to get nemo run with mcore in this version

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* small fix on the output file path

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add nemo to hf conversion script

* fix on base layer config and missing state dict due to dist ckpt

* Revert "fix on base layer config and missing state dict due to dist ckpt"

This reverts commit c85f3ac.

* fix on base layer config and missing state dict due to dist ckpt

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix megatron_gpt_model

Signed-off-by: Vivian chen <[email protected]>

* modify model config

Signed-off-by: Vivian chen <[email protected]>

* Apply suggestions from code review

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Vivian Chen <[email protected]>

* fix based on review

Signed-off-by: Vivian Chen <[email protected]>

* multiple revise based on review and latest mcore changes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Vivian Chen <[email protected]>

* subclass from TransformerLayer

* fixes according to comments

* add falcon ci test

Signed-off-by: Vivian Chen <[email protected]>

* add post_self_attn_layernorm

* add explicit explanation/refs for handling lora logic

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes for code scanning

* remove unused imports

* unit test for falcon model

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add falcon transformer layer unit test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes for code scan

* remove mcore dependent tests

* Revert "remove mcore dependent tests"

This reverts commit 9c4960f.

* add import guards

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add import guards cont

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes for ci import tests and unit tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes for codeql

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "fixes for codeql"

This reverts commit 9028555.

---------

Signed-off-by: Vivian <[email protected]>
Signed-off-by: Vivian chen <[email protected]>
Signed-off-by: Vivian Chen <[email protected]>
Signed-off-by: Vivian Chen <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: HuiyingLi <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
pzelasko pushed a commit to pzelasko/NeMo that referenced this pull request Jan 3, 2024
ssh-meister pushed a commit to ssh-meister/NeMo that referenced this pull request Feb 15, 2024
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
4 participants