@SolenoidWGT SolenoidWGT commented Feb 3, 2024

This PR fixes two problems with adapting InternLM2 to llama.cpp:

  1. The q and k weights require an additional reshape to be compatible with llama's inference interface.
  2. For the chat model, we need to explicitly replace llama's eos with InternLM2's eos, so that the model can end the conversation normally.
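A minimal sketch of the two fixes, in the spirit of llama.cpp's Python convert scripts. The function names and the token ids below are illustrative assumptions, not the PR's actual code:

```python
import numpy as np

def permute_qk(w: np.ndarray, n_head: int) -> np.ndarray:
    # Reorder q/k weight rows so the two rotary-embedding halves of each
    # head are laid out the way llama.cpp's inference code expects
    # (modeled on the permute helper used by llama.cpp's convert script).
    return (w.reshape(n_head, 2, w.shape[0] // n_head // 2, *w.shape[1:])
             .swapaxes(1, 2)
             .reshape(w.shape))

# For the chat model, generation must stop on InternLM2's end-of-turn
# token ([UNUSED_TOKEN_145]) rather than llama's default </s>.
LLAMA_EOS_ID = 2          # assumption: llama's default eos id
INTERNLM2_EOS_ID = 92542  # assumption: id of [UNUSED_TOKEN_145]

def remap_eos(token_id: int) -> int:
    # Swap llama's eos id for InternLM2's so the chat ends normally.
    return INTERNLM2_EOS_ID if token_id == LLAMA_EOS_ID else token_id
```

Note that `permute_qk` only reorders whole rows, so the weight values themselves are untouched; the reshape/swapaxes pair just regroups the rotary halves per attention head.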

The prompt format, for reference:

[UNUSED_TOKEN_146]system\nYou are InternLM (书生·浦语), a helpful, honest, and harmless AI assistant developed by Shanghai AI Laboratory (上海人工智能实验室).[UNUSED_TOKEN_145]\n

User name

[UNUSED_TOKEN_146]user

Bot name

[UNUSED_TOKEN_146]assistant

Prompt template

{{prompt}}

{{history}} 
{{char}}:

Chat history template

{{name}}: 
{{message}} [UNUSED_TOKEN_145]
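Putting the template pieces above together, a hypothetical helper that assembles a full prompt in this format (the function name and signature are illustrative, not part of the PR):

```python
def build_internlm2_prompt(system: str, history: list[tuple[str, str]]) -> str:
    # [UNUSED_TOKEN_146] opens a role block and [UNUSED_TOKEN_145] closes
    # it, matching the template shown above.
    parts = [f"[UNUSED_TOKEN_146]system\n{system}[UNUSED_TOKEN_145]\n"]
    for role, message in history:  # role is "user" or "assistant"
        parts.append(f"[UNUSED_TOKEN_146]{role}\n{message}[UNUSED_TOKEN_145]\n")
    parts.append("[UNUSED_TOKEN_146]assistant\n")  # cue the model to reply
    return "".join(parts)
```

The trailing open assistant block is what prompts the model to generate its turn; the model then emits `[UNUSED_TOKEN_145]` (its eos) to end it, which is why the eos replacement in this PR matters.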


cc: @arch-btw @sweetcard

@SolenoidWGT force-pushed the fix/internlm2_qk_shape branch from bf507c8 to 54dd7da on February 3, 2024 18:31

arch-btw commented Feb 4, 2024

Thank you! I can confirm that it works with internlm2-chat-1_8b-sft.


@sweetcard left a comment


It can work now. Thank you. 👍

@ggerganov ggerganov merged commit 7e1ae37 into ggml-org:master Feb 5, 2024
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
* py : fix internlm2-hf convert to gguf

* ggml-ci
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* py : fix internlm2-hf convert to gguf

* ggml-ci

4 participants