Output shows <|endoftext|> tokens (#10604)
UltraWelfare asked this question in Q&A · Unanswered · 2 comments · 1 reply
- This is most likely caused by an incorrect configuration of the model tokenizer. From a quick look at https://huggingface.co/openGPT-X/Teuken-7B-instruct-commercial-v0.4/blob/main/tokenizer_config.json it seems that this model does not have a …
- See also the discussion I've started at #10539: llama.cpp doesn't actually support the Teuken models.
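For reference, one quick way to verify the tokenizer-configuration point from the first reply is to fetch the model's tokenizer_config.json and look for a chat_template entry; a missing template is a common reason why a converted GGUF model never emits a proper stop token and keeps "answering itself". The snippet below is only an illustrative sketch, assuming the Hugging Face repo files are publicly downloadable without authentication.

```python
# Illustrative check (not part of llama.cpp): inspect tokenizer_config.json
# for a chat_template and the declared eos_token. Assumes the files can be
# downloaded anonymously from the Hugging Face repo.
import json
import urllib.request

URL = (
    "https://huggingface.co/openGPT-X/Teuken-7B-instruct-commercial-v0.4"
    "/resolve/main/tokenizer_config.json"
)

with urllib.request.urlopen(URL) as resp:
    cfg = json.load(resp)

print("chat_template present:", bool(cfg.get("chat_template")))
print("eos_token:", cfg.get("eos_token"))
print("added special tokens:", len(cfg.get("added_tokens_decoder", {})))
```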
- I'm trying two models converted to GGUF using the GGUF-my-repo space: Model 1 and Model 2. They both face the same issue: their output contains `<|endoftext|>` or `<|im_end|>` tokens, and they start questioning and answering themselves. I'm starting `llama-server` like this: `.\llama-server --model .\teuken-7b-instruct-commercial-v0.4-q6_k.gguf`. Setting the context size (parameter `-c`) doesn't change the output. Other models, such as Llama 3.2 and Llama 3.1, work correctly; I'm not sure what is up with these two specifically.
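As a stop-gap while the missing chat template is unresolved, the leaked special tokens can at least be trimmed by passing explicit stop strings to the server's /completion endpoint. This is only a sketch under stated assumptions: it assumes the `llama-server` started with the command above is listening on the default port 8080, and the prompt is a hypothetical placeholder.

```python
# Illustrative workaround: ask llama-server to stop generating as soon as
# one of the leaked special tokens appears. Assumes the server is running
# locally at http://localhost:8080 (the default).
import json
import urllib.request

payload = {
    "prompt": "What is the capital of Germany?",  # hypothetical prompt
    "n_predict": 128,
    # Treat the leaked special tokens as stop strings so they are cut from
    # the output and the model cannot keep questioning/answering itself.
    "stop": ["<|endoftext|>", "<|im_end|>"],
}

req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

print(result["content"])
```

Note that this only hides the stray tokens; it does not apply the Teuken instruct formatting, so the underlying template/support issue remains.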