fix gated_linear_unit bug #8042

Agoniii · 2023-12-18T12:19:13Z

What does this PR do?

Fixes the following checkpoint-loading error, raised when running Llama2 eval or Llama2 SFT:

megatron.core.dist_checkpointing.core.CheckpointingException: Global shape mismatch for loaded ((32, 22016, 4096)) and expected ((32, 11008, 4096)) tensor for key model.decoder.layers.mlp.linear_fc1.weight
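The two shapes in the error differ by exactly a factor of two in the second dimension (22016 = 2 × 11008), which is what one would expect if the checkpoint was saved with a gated linear unit but the model was built without one. The sketch below illustrates that arithmetic. It assumes the common gated-MLP layout in which the gate and up projections are fused into a single `linear_fc1` weight. The helper `fc1_weight_shape` and its parameters are hypothetical names for illustration only, not part of NeMo or Megatron-Core, and the sizes are taken from the error message above.

```python
def fc1_weight_shape(num_layers, hidden_size, ffn_hidden_size, gated_linear_unit):
    """Hypothetical helper: global shape of the stacked linear_fc1 weight.

    Assumption: with a gated linear unit (e.g. SwiGLU), the gate and up
    projections are fused, so fc1 has 2 * ffn_hidden_size output features.
    """
    out_features = 2 * ffn_hidden_size if gated_linear_unit else ffn_hidden_size
    return (num_layers, out_features, hidden_size)


# Sizes read off the error message: 32 layers, hidden size 4096, FFN size 11008.
loaded = fc1_weight_shape(32, 4096, 11008, gated_linear_unit=True)
expected = fc1_weight_shape(32, 4096, 11008, gated_linear_unit=False)

print(loaded)    # (32, 22016, 4096)
print(expected)  # (32, 11008, 4096)
```

Under this assumption, the "loaded" shape matches a checkpoint saved with the gated unit enabled, and the "expected" shape matches a model constructed with it disabled.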

Collection: [Note which collection this PR will affect]

Changelog

Add specific line by line info of high level changes in this PR.

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

Jenkins CI

To run Jenkins, a NeMo User with write access must comment jenkins on the PR.

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

Related to # (issue)

Signed-off-by: Agoniii <[email protected]>

cuichenx · 2024-01-03T19:45:08Z

jenkins

cuichenx · 2024-01-04T16:35:42Z

merging this as this is critical to unblock any llama2 workflows

github-actions bot added the NLP label Dec 18, 2023

fix gated_linear_unit bug

676264d

Signed-off-by: Agoniii <[email protected]>

Agoniii force-pushed the xueh/fix_gated_linear_unit branch from 2d63996 to 676264d on December 18, 2023 12:20

shanmugamr1992 approved these changes Dec 22, 2023

Merge branch 'main' into xueh/fix_gated_linear_unit

5f7d181

cuichenx merged commit d62f6ff into NVIDIA:main Jan 4, 2024
11 checks passed

minitu pushed a commit to minitu/NeMo that referenced this pull request Jan 19, 2024

fix gated_linear_unit bug (NVIDIA#8042)

f686ce9

Signed-off-by: Agoniii <[email protected]> Co-authored-by: Chen Cui <[email protected]>

ssh-meister pushed a commit to ssh-meister/NeMo that referenced this pull request Feb 15, 2024

fix gated_linear_unit bug (NVIDIA#8042)

bb32442

Signed-off-by: Agoniii <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Sasha Meister <[email protected]>

rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024

fix gated_linear_unit bug (NVIDIA#8042)

245c2ff

Signed-off-by: Agoniii <[email protected]> Co-authored-by: Chen Cui <[email protected]>
