Support Falcon Variants (7B/40B/180B) in Mcore NeMo #7666
Merged
Conversation
Commits:
- [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
- jenkins
- Revert commit 9c4960f
- [pre-commit.ci] auto fixes from pre-commit.com hooks
- jenkins
- jenkins
Files with fixed code-scanning alerts:
- nemo/collections/nlp/models/language_modeling/megatron/falcon/falcon_decoder_layer.py
- nemo/collections/nlp/models/language_modeling/megatron/falcon/falcon_spec.py
ericharper previously approved these changes on Dec 15, 2023: "LGTM. Thanks!"
Commits:
- jenkins
- Revert commit 9028555
- jenkins
ericharper approved these changes on Dec 15, 2023: "LGTM. Thanks!"
ashbhandare pushed a commit to ashbhandare/NeMo that referenced this pull request on Dec 15, 2023.
Squashed commit message:
- support falcon
- support falcon bug fix: layernorm naming
- fix todo
- [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci; recurring after several of the commits below, duplicates omitted)
- fix for new architecture
- new transformerlayer for falcon
- fix for new decoder architecture
- add DDP
- fix state dict based on spec system
- fix state dict based on change in layers, fix amp O2
- add falcon spec system support
- remove old falcon mcore support
- refactor conversion script to align with others
- add support for falcon-rw model (normal gpt architecture)
- modify falcon 7b config and remove trust remote code due to HF code changes
- rename falcon implementation dir
- change dir name
- modify block name
- rename decoder layer
- clean up
- remove debug
- add proper header
- falcon lora mixin to support non-fused LN linear
- revise jenkinsfile, tokenizer update in conversion script, add two falcon config files
- refactor falcon to use MCoreGPT + spec + baselayer (initial commit)
- modification to get nemo running with mcore in this version
- small fix on the output file path
- add nemo-to-hf conversion script
- fix on base layer config and missing state dict due to dist ckpt
- Revert "fix on base layer config and missing state dict due to dist ckpt" (reverts commit c85f3ac)
- fix on base layer config and missing state dict due to dist ckpt
- fix megatron_gpt_model
- modify model config
- apply suggestions from code review
- fix based on review
- multiple revisions based on review and latest mcore changes
- fix
- subclass from TransformerLayer
- fixes according to comments
- add falcon CI test
- add post_self_attn_layernorm
- add explicit explanation/refs for handling lora logic
- fixes for code scanning
- remove unused imports
- unit test for falcon model
- add falcon transformer layer unit test
- fixes for code scan
- remove mcore-dependent tests
- Revert "remove mcore dependent tests" (reverts commit 9c4960f)
- add import guards
- add import guards (cont.)
- fixes for CI import tests and unit tests
- fixes for codeql
- Revert "fixes for codeql" (reverts commit 9028555)

Signed-off-by: Vivian <[email protected]>
Signed-off-by: Vivian chen <[email protected]>
Signed-off-by: Vivian Chen <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Huiying Li <[email protected]>
Co-authored-by: HuiyingLi <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Aishwarya Bhandare <[email protected]>
pzelasko pushed a commit to pzelasko/NeMo that referenced this pull request on Jan 3, 2024.
ssh-meister pushed a commit to ssh-meister/NeMo that referenced this pull request on Feb 15, 2024.
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request on Jun 25, 2024.
What does this PR do?
Adds conversion scripts between HuggingFace Falcon checkpoints (1b-rw/7b-rw/7b/40b/180b) and NeMo format, in both directions.
Supports Falcon's parallel attention and new decoder architecture by adding falcon_decoder_layer and falcon_spec, which make use of mcore's latest spec system.
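To illustrate the idea behind a spec system, here is a deliberately simplified mock in plain Python. This is not megatron-core's actual ModuleSpec API (the class names, fields, and builder here are illustrative assumptions); it only shows the pattern: a spec declares which class implements each submodule, so a custom layer such as Falcon's can swap in its own attention/MLP wiring without reimplementing the whole layer.

```python
from dataclasses import dataclass, field
from typing import Dict

# Simplified, illustrative spec system -- NOT megatron-core's real API.
@dataclass
class ModuleSpec:
    module: type                                   # class to instantiate
    params: dict = field(default_factory=dict)     # constructor kwargs
    submodules: Dict[str, "ModuleSpec"] = field(default_factory=dict)

def build_module(spec: ModuleSpec):
    # Recursively build declared submodules, then the module itself.
    subs = {name: build_module(s) for name, s in spec.submodules.items()}
    return spec.module(**spec.params, **subs)

# Hypothetical placeholder modules, for demonstration only.
class Linear:
    def __init__(self, size):
        self.size = size

class DecoderLayer:
    def __init__(self, attn, mlp):
        self.attn, self.mlp = attn, mlp

# A "falcon-like" layer spec: the layer class stays generic, while the
# spec decides which submodule implementations get plugged in.
falcon_layer_spec = ModuleSpec(
    module=DecoderLayer,
    submodules={
        "attn": ModuleSpec(Linear, {"size": 64}),
        "mlp": ModuleSpec(Linear, {"size": 256}),
    },
)
layer = build_module(falcon_layer_spec)
```

The point of the indirection is that supporting a new architecture becomes a matter of writing a new spec (plus any genuinely new submodules) rather than forking the layer implementation.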
Limitation: although the conversion script works for 1b-rw/7b-rw, ALiBi is currently not supported in mcore GPT, so expect suboptimal generation results from those two models, which use ALiBi. This limitation does NOT affect the 7b/40b/180b Falcon variants.
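As background (this is a conceptual sketch, not code from this PR), the "parallel attention" wiring that distinguishes Falcon's decoder from a standard GPT block can be shown with plain callables standing in for the sublayers:

```python
# Conceptual sketch of residual wiring, with plain callables standing in
# for layernorm / attention / MLP. Real layers operate on tensors.

def sequential_block(x, ln1, attn, ln2, mlp):
    # Standard GPT-style block: the MLP branch consumes the output of
    # the attention branch, via two separate layernorms.
    x = x + attn(ln1(x))
    return x + mlp(ln2(x))

def parallel_block(x, ln, attn, mlp):
    # Falcon-style parallel attention: attention and MLP both read the
    # SAME normalized input, and their outputs are summed into a single
    # residual. The branches are independent, so they can run in parallel.
    normed = ln(x)
    return x + attn(normed) + mlp(normed)
```

This independence of the two branches is why a stock sequential TransformerLayer cannot express Falcon's 7b/40b/180b variants directly, and why the PR adds a dedicated falcon_decoder_layer.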
Collection: [Note which collection this PR will affect]
Changelog
Usage
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed. The contributor guidelines list specific people who can review PRs to various areas.
Additional Information