server: add llama2 chat template #5425
Conversation
To be clear, it is Mistral Instruct that you are talking about. Vanilla Mistral doesn't have a prompt template at all, and there are derivatives such as Mistral OpenOrca that do use ChatML.
@cebtenzzre Thanks for the info. From what I found searching the internet, this format seems to have been introduced initially in the llama 2 instruct version (search on Google for "llama 2 chat template"). I'll need to rename the template in my PR to reflect that.
Adding checks for the input of --chat-template might be nice to have.
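A minimal sketch of what such a check could look like, assuming the accepted values are just the two templates the server renders; the helper name and error handling here are illustrative, not necessarily what the PR ends up with:

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Illustrative check: only accept template names the server knows how to render.
static bool verify_chat_template(const std::string & tmpl) {
    return tmpl == "chatml" || tmpl == "llama2";
}

// Sketch of how it could be wired into argument parsing:
//   if (arg == "--chat-template") {
//       std::string value = argv[++i];
//       if (!verify_chat_template(value)) {
//           fprintf(stderr, "error: invalid value for --chat-template: %s\n", value.c_str());
//           exit(1);
//       }
//   }
```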
examples/server/utils.hpp (Outdated)
```cpp
for (auto it = messages.begin(); it != messages.end(); ++it) {
    if (!is_inside_turn) {
        output << "<s>[INST] ";
```
I suspect that later in the logic we might be incidentally adding some extra BOS tokens through llama_tokenize(...). It's a bit difficult to follow the logic - probably need to debug this and add some more trace logs. If you feel like it, you might want to look into this.
Yeah you're right: when I tested with chat completion, add_bos is set to true, so I'll remove the <s> here. (I've also verified that in the list of tokens returned by the tokenize function, the first token is already 1, which is BOS, so there's no need to explicitly add <s>.)
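A small sketch of the kind of check described above, assuming the prompt has already been tokenized into a std::vector<llama_token> and that the model-based llama_token_bos() from llama.h is available (illustrative only):

```cpp
#include <cstdio>
#include <vector>
#include "llama.h"

// If the tokenizer already prepended BOS, the template must not emit an
// explicit "<s>" as well, or the prompt ends up with two BOS tokens.
static void check_leading_bos(const struct llama_model * model,
                              const std::vector<llama_token> & tokens) {
    if (!tokens.empty() && tokens[0] == llama_token_bos(model)) {
        fprintf(stderr, "first token is already BOS (id %d), no need to add <s> in the template\n",
                tokens[0]);
    }
}
```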
Ok, waiting for @cebtenzzre's review
Co-authored-by: Jared Van Bortel <[email protected]>
Are you sure it's the right prompt? It is my understanding that there is no system prompt with Mistral. You could add it like this, I believe, but it's without [...]. Also see: https://docs.mistral.ai/models/
@arch-btw The <<SYS>> is supported by some models fine-tuned from Mistral. You can see details in my discussion above: #5425 (comment). Edit: also refer to the message above where I realized that this template comes from llama 2 instead of Mistral. I edited it in the code, but I forgot to update the title of this PR.
```cpp
    output << content << " [/INST]";
    is_inside_turn = true;
} else {
    output << " " << content << " </s>";
```
I don't think there should be a space in front of the eos token.
As discussed in #5425 (comment), the space is not consistent among different variations of this template. I leave it here to be sure; it changes nothing about the behavior of the model. Ideally, if I had access to tokenizer.chat_template, I could adapt the template to exactly what the model needs (i.e. spaces, <<SYS>>, ...). But since I don't have access to that part for now, I kind of have to make a "catch-all" template.
Also, what is important in this template is the placement of [INST], [/INST] and </s>. A model that was never trained to understand ChatML will not understand that the message should be encapsulated between <|im_start|> and <|im_end|>; it will kind of "repeat what's already in the conversation". As long as the model sees [INST] some user prompt [/INST] some assistant response </s>, it will behave just fine. The spaces do not matter; that's what I experienced while testing this PR.
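To make this concrete, here is a rough sketch of such a "catch-all" llama2-style formatter, assuming a simple message struct; the actual code in this PR works on JSON messages and may differ in the details. The explicit <s> is left out because llama_tokenize already adds the BOS token:

```cpp
#include <sstream>
#include <string>
#include <vector>

struct chat_msg {
    std::string role;    // "system", "user" or "assistant"
    std::string content;
};

static std::string format_llama2(const std::vector<chat_msg> & messages) {
    std::ostringstream output;
    bool is_inside_turn = false;

    for (const auto & msg : messages) {
        if (!is_inside_turn) {
            output << "[INST] ";
        }
        if (msg.role == "system") {
            output << "<<SYS>>\n" << msg.content << "\n<</SYS>>\n\n";
            is_inside_turn = true;
        } else if (msg.role == "user") {
            output << msg.content << " [/INST]";
            is_inside_turn = true;
        } else { // assistant
            output << " " << msg.content << " </s>";
            is_inside_turn = false;
        }
    }
    return output.str();
}
```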
Is the --chat-template argument actually added? I tried it but it doesn't seem to be added to the server correctly. @ngxson @ggerganov
@duykhanhbk Can you run the server with [...]?
@ngxson I got an error: unknown argument: --chat-template llama2 (I really did pull the latest code).
@duykhanhbk Can you try the latest official pre-built binary (or Docker image)? I suspect the problem is on your side and the compilation was not done correctly. Without any further logs and details, I really cannot help.
* server: add mistral chat template
* server: fix typo
* server: rename template mistral to llama2
* server: format_llama2: remove BOS
* server: validate "--chat-template" argument
* server: clean up using_chatml variable

Co-authored-by: Jared Van Bortel <[email protected]>

---------

Co-authored-by: Jared Van Bortel <[email protected]>
Motivation
Some llama2-based models (for example, Mistral) do not use the ChatML template. As I observed, using ChatML with Mistral still works, but it leads to some weird behaviors and the model is hard to steer (it does not follow the system message).
Mistral uses this kind of chat template:
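Roughly, each exchange is rendered in the following pattern (the exact spacing varies between variants of the template, as the review comments above discuss), with further turns appended in the same way:

```
<s>[INST] some user prompt [/INST] some assistant response </s>
```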
With this PR, I observed that Mistral (and fine-tuned models based on Mistral) responds more correctly and coherently than with ChatML.
Ideally, I want to switch between the chatml and mistral formats using the kv named tokenizer.chat_template, but access to the kv is hidden inside llama_model_loader and I cannot access it without modifying a lot of things in llama.cpp. Switching using an argument is less ideal, but it still works.
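For illustration, the argument-based switch could look roughly like this; format_chatml stands in for a ChatML formatter and format_llama2 for the one added here, and the names and signatures are assumptions rather than the exact code in the server:

```cpp
#include <string>
#include <vector>

// Assumed to be defined elsewhere (see the formatter sketch in the review thread).
struct chat_msg;
std::string format_chatml(const std::vector<chat_msg> & messages);
std::string format_llama2(const std::vector<chat_msg> & messages);

// Pick a prompt formatter based on the --chat-template argument, since
// tokenizer.chat_template from the model metadata is not consulted here.
static std::string format_chat(const std::string & chat_template,
                               const std::vector<chat_msg> & messages) {
    if (chat_template == "llama2") {
        return format_llama2(messages);
    }
    return format_chatml(messages); // default: ChatML
}
```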