Fix the incorrect permutation of gguf #31788

PenutChen · 2024-07-04T05:25:05Z

What does this PR do?

Fixes #31766. The permutation of q_proj and k_proj needs to consider both num_attention_heads and num_key_value_heads.
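To illustrate why both head counts matter, here is a minimal sketch on row indices only. It assumes the GGUF conversion interleaves the two halves of each head's rows, and that k_proj is grouped by num_key_value_heads rather than num_attention_heads. The helper names `permute_rows` and `reverse_permute_rows` are hypothetical and are not the functions in this PR.

```python
def permute_rows(rows, n_head, n_kv_head=None):
    """Interleave the two halves of each head's rows (assumed conversion-side layout)."""
    if n_kv_head is not None and n_head != n_kv_head:
        n_head = n_kv_head  # k_proj is grouped by key/value heads
    size = len(rows) // n_head  # rows per head
    half = size // 2
    out = []
    for h in range(n_head):
        block = rows[h * size:(h + 1) * size]
        # View the block as (2, half) and read it out as (half, 2).
        for d in range(half):
            for t in range(2):
                out.append(block[t * half + d])
    return out


def reverse_permute_rows(rows, n_head, n_kv_head=None):
    """Undo permute_rows; must use the same head count that was used to permute."""
    if n_kv_head is not None and n_head != n_kv_head:
        n_head = n_kv_head
    size = len(rows) // n_head
    half = size // 2
    out = []
    for h in range(n_head):
        block = rows[h * size:(h + 1) * size]
        # View the block as (half, 2) and read it out as (2, half).
        for t in range(2):
            for d in range(half):
                out.append(block[d * 2 + t])
    return out


# Toy configuration (illustrative values): 4 attention heads, 2 key/value heads,
# head_dim 8, so k_proj has 2 * 8 = 16 output rows.
k_rows = list(range(16))
stored = permute_rows(k_rows, n_head=4, n_kv_head=2)

restored_ok = reverse_permute_rows(stored, n_head=4, n_kv_head=2)
restored_bad = reverse_permute_rows(stored, n_head=4)  # ignores key/value heads
```

In this toy case `restored_ok` matches the original row order, while `restored_bad` leaves the rows scrambled because the block size is derived from the wrong head count. For q_proj, where the row count is num_attention_heads times head_dim, both calls would agree.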

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@SunMarc

LysandreJik · 2024-07-04T13:10:18Z

Thanks for your PR @PenutChen, we really appreciate it!

@SunMarc it'd be awesome if you can review this when you have a minute.

SunMarc

Thanks for the deep dive, @PenutChen. Really good job! Can you add a few tests, since it was not working with a bigger Llama? If possible, we can test this on a Q4 model so that it does not take too much memory.

PenutChen · 2024-07-05T01:52:10Z

@SunMarc Thanks for the advice! I noticed that there is already a Q4 Llama3 test. This update will change the expected text. Should I just update the expected text?

SunMarc

Thanks a lot for iterating! Could you share the perplexity of the model before and after the fix for Llama 3? If the Llama 3 perplexity was so high, it's a bit strange that it was able to pass the test and generate coherent text.

PenutChen · 2024-07-05T13:57:45Z

The perplexity of Llama3 Q4 is 1713.8865 before the fix and 6.4626 after the fix. The model comes from NousResearch/Meta-Llama-3-8B-GGUF.

The model still produces reasonable output before the fix if the sequence length is short. This might be because only the q_proj and k_proj weights are abnormal, and their weights are relatively low, while the other weights are robust to a certain extent. This is just my guess.
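For reference, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below shows only that formula. The function name `perplexity` and its input format (a list of natural-log token probabilities) are assumptions for illustration, not the script used to obtain the numbers above.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) over a sequence of token log-probs."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 1/4 to every token has perplexity 4.
uniform_ppl = perplexity([math.log(0.25)] * 10)
```

Lower is better: a value near 6 means the model is, on average, about as uncertain as a uniform choice among 6 tokens, whereas a value in the thousands indicates predictions close to noise on longer contexts.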

SunMarc · 2024-07-05T14:21:51Z

Thanks for the confirmation!

Fix the incorrect permutation of gguf

da57298

SunMarc reviewed Jul 4, 2024

src/transformers/modeling_gguf_pytorch_utils.py

PenutChen and others added 4 commits July 5, 2024 08:48

rename num_kv_heads

9cfe762

Co-authored-by: Marc Sun

add typing to num_kv_heads

fe3cd6c

Co-authored-by: Marc Sun

rename variables

07c4ccf

refactor permute function name

d8651d3

update the expected text of the llama3 q4 test

450cd12

SunMarc approved these changes Jul 5, 2024

SunMarc requested a review from LysandreJik July 5, 2024 14:21
