
server: add llama2 chat template #5425

Merged: 6 commits, Feb 11, 2024

Conversation

@ngxson (Collaborator) commented Feb 8, 2024

Motivation

Some llama2-based models (for example, Mistral) do not use the chatml template. As I observed, using chatml with Mistral still works, but it leads to some weird behaviors and makes the model hard to steer (it does not follow the system message).

Mistral uses this kind of chat template:

<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

Who are you? [/INST] I'm a helpful assistant </s>

With this PR, I observed that Mistral (and models fine-tuned from Mistral) respond more correctly and coherently than with chatml.

Ideally, I would like to switch between the chatml and mistral formats using the kv named tokenizer.chat_template, but that kv is hidden inside llama_model_loader and I cannot access it without modifying a lot of things in llama.cpp. Switching via an argument is less ideal, but it still works.
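
For context, a minimal sketch of what such a formatter could look like is below. The chat_msg struct and the function name are illustrative only; the real code lives in examples/server/oai.hpp and receives JSON messages from the OpenAI-compatible endpoint. The literal <s> is left out here because the tokenizer already prepends BOS when add_bos is true, as discussed later in the review.

#include <sstream>
#include <string>
#include <vector>

// Illustrative message type; the server actually works with JSON messages.
struct chat_msg {
    std::string role;    // "system", "user" or "assistant"
    std::string content;
};

// Sketch of a llama2-style formatter (not the exact code in this PR).
static std::string format_llama2_sketch(const std::vector<chat_msg> & messages) {
    std::ostringstream output;
    bool is_inside_turn = false;
    for (const auto & msg : messages) {
        if (!is_inside_turn) {
            output << "[INST] ";
        }
        if (msg.role == "system") {
            output << "<<SYS>>\n" << msg.content << "\n<</SYS>>\n\n";
            is_inside_turn = true;
        } else if (msg.role == "user") {
            output << msg.content << " [/INST]";
            is_inside_turn = true;
        } else { // assistant
            output << " " << msg.content << " </s>";
            is_inside_turn = false;
        }
    }
    return output.str();
}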

Review thread on examples/server/oai.hpp (outdated, resolved)
@cebtenzzre (Collaborator):

To be clear, this is Mistral Instruct that you are talking about. Vanilla Mistral doesn't have a prompt template at all, and there are derivatives such as Mistral OpenOrca that do use ChatML.

@ngxson (Collaborator, Author) commented Feb 9, 2024

@cebtenzzre Thanks for the info. From what I can find online, this format seems to have been introduced originally in the llama 2 instruct version (search Google for "llama 2 chat template"). I'll rename the template in my PR to reflect that.

@ggerganov (Owner) left a comment


Adding checks for the input of --chat-template might be nice to have


for (auto it = messages.begin(); it != messages.end(); ++it) {
    if (!is_inside_turn) {
        output << "<s>[INST] ";
@ggerganov (Owner):


I suspect that later in the logic we might be incidentally adding some extra BOS tokens through llama_tokenize(...). It's a bit difficult to follow the logic - probably need to debug this and add some more trace logs. If you feel like it, you might want to look into this

@ngxson (Collaborator, Author) commented Feb 9, 2024:


Yeah, you're right. When I tested with chat completion, add_bos was set to true, so I'll remove the <s> here. (I've also verified that in the list of tokens returned by the tokenize function, the first token is already 1, which is BOS, so there is no need to explicitly add <s>.)

@ngxson (Collaborator, Author):


The redundant <s> is removed in 7efef47

Validation for --chat-template is added in ebe3079

Please review and merge if it looks OK to you. Thank you!
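
For reference, a minimal sketch of what such a check could look like is below. The helper name and the error handling are illustrative, not the exact code added in ebe3079.

#include <string>

// Illustrative: accept only the template names this PR understands.
static bool verify_chat_template(const std::string & tmpl) {
    return tmpl == "chatml" || tmpl == "llama2";
}

// During argument parsing, an unknown value would then be rejected up front, e.g.:
//   if (!verify_chat_template(value)) { /* print an error and exit */ }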

@ggerganov (Owner):


Ok, waiting for @cebtenzzre's review

Review thread on examples/server/oai.hpp (outdated, resolved)
@arch-btw (Contributor):

Are you sure it's the right prompt? It is my understanding that there is no system prompt with Mistral.

You could add it like this, I believe, but without the <<SYS>> tags: HuggingFace Discussion

Also see: https://docs.mistral.ai/models/
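
For comparison, the single-turn format in Mistral's documentation has no system section at all; it looks roughly like this (illustrative):

<s>[INST] Who are you? [/INST] I'm a helpful assistant</s>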

@ngxson (Collaborator, Author) commented Feb 10, 2024

@arch-btw The << SYS >> is supported by some models finetuned from Mistral. You can see details in my discussion above: #5425 (comment)

Edit: also see the message above where I realized that this template comes from llama 2 rather than mistral. I fixed it in the code but forgot to update the title of this PR.

@ngxson changed the title from "server: add mistral chat template" to "server: add llama2 chat template" on Feb 10, 2024
@ggerganov merged commit 907e08c into ggerganov:master on Feb 11, 2024 (54 checks passed)
        output << content << " [/INST]";
        is_inside_turn = true;
    } else {
        output << " " << content << " </s>";
Contributor comment:


I don't think there should be a space in front of the eos token.

@ngxson (Collaborator, Author):


As discussed in #5425 (comment), the space is not consistent among different variations of this template. I leave it here to be safe; it changes nothing about the behavior of the model.

Ideally, if I had access to tokenizer.chat_template, I could adapt the template to exactly what the model needs (i.e. spaces, <<SYS>>, ...). But since I don't have access to that part for now, I have to use a "catch-all" template.

Also, what matters in this template is the placement of [INST], [/INST] and </s>. A model that was never trained to understand chatml will not understand that messages should be encapsulated between <|im_start|> and <|im_end|>; it will tend to "repeat what's already in the conversation".

As long as the model sees [INST] some user prompt [/INST] some assistant response </s>, it will behave just fine. The spaces do not matter; that is what I experienced while testing this PR.
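
To illustrate the placement, a two-turn conversation rendered with this template (with BOS added by the tokenizer rather than a literal <s>) would look roughly like:

[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

Who are you? [/INST] I'm a helpful assistant </s>[INST] And what can you do? [/INST]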

@duykhanhbk:

Is the --chat-template argument actually added? I tried it, but it doesn't seem to be handled by the server correctly. @ngxson @ggerganov

@ngxson (Collaborator, Author) commented Feb 16, 2024

@duykhanhbk Can you run the server with --verbose and paste the log here? Also, I may rewrite this part soon, because I now know how to access the template saved in the model.
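
For context, llama.cpp already exposes GGUF metadata through llama_model_meta_val_str, so a sketch of reading the stored template could look like the following. This is only an assumption of how such a rewrite might start; the helper name and buffer size are illustrative.

#include <string>
#include <vector>
#include "llama.h"

// Sketch: read the chat template stored in the model's GGUF metadata, if any.
static std::string get_model_chat_template(const llama_model * model) {
    std::vector<char> buf(4096, 0);
    // A negative return value means the key is not present in this model.
    int32_t res = llama_model_meta_val_str(model, "tokenizer.chat_template", buf.data(), buf.size());
    if (res < 0) {
        return "";
    }
    return std::string(buf.data());
}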

@duykhanhbk commented Feb 16, 2024

@ngxson I got an error: unknown argument: --chat-template llama2 (I did pull the latest code)

@ngxson (Collaborator, Author) commented Feb 16, 2024

@duykhanhbk Can you try the latest official pre-built binary (or docker image)?

I suspect it's a problem on your side where the compilation was not done correctly. Without any further logs and details, I really cannot help.

@duykhanhbk commented Feb 16, 2024

My command, @ngxson: ./server -m /home/azureuser/khanhvd/llama.cpp/models/vistral-7b-chat/hf_model/ggml-model-f16.gguf --port 8002 --alias vistral-7b-chat -c 4000 --chat-template llama2
Update: I rebuilt with make and ran it again, and it works now! Thanks @ngxson

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
* server: add mistral chat template

* server: fix typo

* server: rename template mistral to llama2

* server: format_llama2: remove BOS

* server: validate "--chat-template" argument

* server: clean up using_chatml variable

Co-authored-by: Jared Van Bortel <[email protected]>

---------

Co-authored-by: Jared Van Bortel <[email protected]>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024