server: add llama2 chat template #5425
Conversation
To be clear, it is Mistral Instruct that you are talking about. Vanilla Mistral doesn't have a prompt template at all, and there are derivatives such as Mistral OpenOrca that do use ChatML.
@cebtenzzre Thanks for the info. From what I found searching the internet, this format seems to have been introduced initially in the llama 2 instruct version (search on Google for "llama 2 chat template"). I'll need to rename the template in my PR to reflect that.
Adding checks for the input of --chat-template might be nice to have.
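A minimal sketch of what such a check could look like, assuming the accepted values are just the two templates the server renders; the helper name and error handling here are illustrative, not necessarily what the PR ends up with:

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Illustrative check: only accept template names the server knows how to render.
static bool verify_chat_template(const std::string & tmpl) {
    return tmpl == "chatml" || tmpl == "llama2";
}

// Sketch of how it could be wired into argument parsing:
//   if (arg == "--chat-template") {
//       std::string value = argv[++i];
//       if (!verify_chat_template(value)) {
//           fprintf(stderr, "error: invalid value for --chat-template: %s\n", value.c_str());
//           exit(1);
//       }
//   }
```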
examples/server/utils.hpp (Outdated)
```cpp
for (auto it = messages.begin(); it != messages.end(); ++it) {
    if (!is_inside_turn) {
        output << "<s>[INST] ";
```
I suspect that later in the logic we might be incidentally adding some extra BOS tokens through llama_tokenize(...). It's a bit difficult to follow the logic - probably need to debug this and add some more trace logs. If you feel like it, you might want to look into this.
Yeah you're right: when I tested with chat completion, add_bos is set to true, so I'll remove the <s> here. (I've also verified that in the list of tokens returned by the tokenize function, the first token is already 1, which is BOS, so there's no need to explicitly add <s>.)
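A small sketch of the kind of check described above, assuming the prompt has already been tokenized into a std::vector<llama_token> and that the model-based llama_token_bos() from llama.h is available (illustrative only):

```cpp
#include <cstdio>
#include <vector>
#include "llama.h"

// If the tokenizer already prepended BOS, the template must not emit an
// explicit "<s>" as well, or the prompt ends up with two BOS tokens.
static void check_leading_bos(const struct llama_model * model,
                              const std::vector<llama_token> & tokens) {
    if (!tokens.empty() && tokens[0] == llama_token_bos(model)) {
        fprintf(stderr, "first token is already BOS (id %d), no need to add <s> in the template\n",
                tokens[0]);
    }
}
```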
Ok, waiting for @cebtenzzre's review
Co-authored-by: Jared Van Bortel <[email protected]>
Are you sure it's the right prompt? It is my understanding that there is no system prompt with Mistral. You could add it like this, I believe, but it's without [...]. Also see: https://docs.mistral.ai/models/
@arch-btw The <<SYS>> is supported by some models fine-tuned from Mistral. You can see details in my discussion above: #5425 (comment). Edit: also refer to the message above where I realized that this template comes from llama 2 instead of Mistral. I edited it in the code, but I forgot to update the title of this PR.
```cpp
    output << content << " [/INST]";
    is_inside_turn = true;
} else {
    output << " " << content << " </s>";
```
I don't think there should be a space in front of the eos token.
As discussed in #5425 (comment), the space is not consistent among different variations of this template. I leave it here to be sure; it changes nothing about the behavior of the model. Ideally, if I had access to tokenizer.chat_template, I could adapt the template to exactly what the model needs (i.e. spaces, <<SYS>>, ...). But since I don't have access to that part for now, I kind of have to make a "catch-all" template.
Also, what is important in this template is the placement of [INST], [/INST] and </s>. A model that was never trained to understand ChatML will not understand that the message should be encapsulated between <|im_start|> and <|im_end|>; it will kind of "repeat what's already in the conversation". As long as the model sees [INST] some user prompt [/INST] some assistant response </s>, it will behave just fine. The spaces do not matter; that's what I experienced while testing this PR.
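To make this concrete, here is a rough sketch of such a "catch-all" llama2-style formatter, assuming a simple message struct; the actual code in this PR works on JSON messages and may differ in the details. The explicit <s> is left out because llama_tokenize already adds the BOS token:

```cpp
#include <sstream>
#include <string>
#include <vector>

struct chat_msg {
    std::string role;    // "system", "user" or "assistant"
    std::string content;
};

static std::string format_llama2(const std::vector<chat_msg> & messages) {
    std::ostringstream output;
    bool is_inside_turn = false;

    for (const auto & msg : messages) {
        if (!is_inside_turn) {
            output << "[INST] ";
        }
        if (msg.role == "system") {
            output << "<<SYS>>\n" << msg.content << "\n<</SYS>>\n\n";
            is_inside_turn = true;
        } else if (msg.role == "user") {
            output << msg.content << " [/INST]";
            is_inside_turn = true;
        } else { // assistant
            output << " " << msg.content << " </s>";
            is_inside_turn = false;
        }
    }
    return output.str();
}
```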
Is the --chat-template argument actually added? I tried it but it doesn't seem to be added to the server correctly. @ngxson @ggerganov
@duykhanhbk Can you run the server with [...]?
@ngxson I got an error: unknown argument: --chat-template llama2 (I really did pull the latest code).
@duykhanhbk Can you try the latest official pre-built binary (or Docker image)? I suspect the problem is on your side and the compilation was not done correctly. Without any further logs and details, I really cannot help.
* server: add mistral chat template
* server: fix typo
* server: rename template mistral to llama2
* server: format_llama2: remove BOS
* server: validate "--chat-template" argument
* server: clean up using_chatml variable

Co-authored-by: Jared Van Bortel <[email protected]>

---------

Co-authored-by: Jared Van Bortel <[email protected]>
Motivation
Some llama2-based models (for example, Mistral) do not use the ChatML template. As I observed, using ChatML with Mistral still works, but it leads to some weird behaviors and the model is hard to steer (it does not follow the system message).
Mistral uses this kind of chat template:
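Roughly, each exchange is rendered in the following pattern (the exact spacing varies between variants of the template, as the review comments above discuss), with further turns appended in the same way:

```
<s>[INST] some user prompt [/INST] some assistant response </s>
```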
With this PR, I observed that Mistral (and fine-tuned models based on Mistral) responds more correctly and coherently than with ChatML.
Ideally, I want to switch between the chatml and mistral formats using the kv named tokenizer.chat_template, but access to the kv is hidden inside llama_model_loader and I cannot access it without modifying a lot of things in llama.cpp. Switching using an argument is less ideal, but it still works.
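For illustration, the argument-based switch could look roughly like this; format_chatml stands in for a ChatML formatter and format_llama2 for the one added here, and the names and signatures are assumptions rather than the exact code in the server:

```cpp
#include <string>
#include <vector>

// Assumed to be defined elsewhere (see the formatter sketch in the review thread).
struct chat_msg;
std::string format_chatml(const std::vector<chat_msg> & messages);
std::string format_llama2(const std::vector<chat_msg> & messages);

// Pick a prompt formatter based on the --chat-template argument, since
// tokenizer.chat_template from the model metadata is not consulted here.
static std::string format_chat(const std::string & chat_template,
                               const std::vector<chat_msg> & messages) {
    if (chat_template == "llama2") {
        return format_llama2(messages);
    }
    return format_chatml(messages); // default: ChatML
}
```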