Special tokens are not rendered correctly (as empty) -- llama3 specific? #6770

Closed
DreamGenX opened this issue Apr 19, 2024 · 7 comments · Fixed by #6807

Comments

@DreamGenX

DreamGenX commented Apr 19, 2024

Hello!

Using this GGUF: https://huggingface.co/LoneStriker/opus-v1.2-llama-3-8b-GGUF

When the output contains any of the special tokens, like <|im_start|> or <|im_end|>, they are rendered as an empty string. This breaks custom stop-string functionality (e.g. adding "<|im_end|>" to the stop strings does not work, since it relies on string comparison).

The tokens are tokenized correctly, just not rendered:

main: prompt: '<|im_end|>'
main: number of tokens in prompt = 1
128009 -> ''
main: prompt: '<|im_start|>'
main: number of tokens in prompt = 1
128006 -> ''
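For reference, a minimal repro sketch of the same check (not code from the issue; it assumes the llama.cpp common helpers llama_tokenize / llama_token_to_piece as they looked around this commit):

```cpp
// Repro sketch: tokenize a special token with parse_special enabled, then try to
// render each token back. Before the fix, control tokens come back as empty pieces.
#include "common.h"
#include "llama.h"
#include <cstdio>
#include <vector>

int main(int argc, char ** argv) {
    llama_backend_init();

    llama_model   * model = llama_load_model_from_file(argv[1], llama_model_default_params());
    llama_context * ctx   = llama_new_context_with_model(model, llama_context_default_params());

    // parse_special = true so "<|im_end|>" becomes a single control token (128009 in the log above)
    std::vector<llama_token> toks = llama_tokenize(ctx, "<|im_end|>", /*add_special*/ false, /*parse_special*/ true);

    for (llama_token t : toks) {
        printf("%d -> '%s'\n", t, llama_token_to_piece(ctx, t).c_str());
    }

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```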

I first tested this with an old commit:

version: 2243 (201294ae)
201294ae177b308fb3a99dc504dd6d27e8afa907

And replicated it with a fresh main:

version: 2698 (637e9a86)
637e9a86c220718d008b54842dfd294aa96d3b7a
@bavellone

bavellone commented Apr 19, 2024

I believe I'm also running into this issue using Meta-Llama-3-70B-Instruct.IQ3_XS.gguf - I'm seeing tokens being output from the model, but decoding them all returns empty strings (I let it run for a few hundred tokens). I'm not seeing this behaviour with a Meta-Llama-3-8B-Instruct.Q6_K.gguf model.

Offloading to ROCm, only loading ~25 layers for 70B.

@DreamGenX
Author

DreamGenX commented Apr 20, 2024

KoboldCpp has somewhat of a fix: https://github.com/LostRuins/koboldcpp/releases/tag/v1.63

Added support for special tokens in stop_sequences. Thus, if you set <|eot_id|> as a stop sequence and it can be tokenized into a single token, it will just work and function like the EOS token, allowing multiple EOS-like tokens.

Commit: LostRuins@3170284

As far as I can tell, it will still not render the tokens, but at least stopping should work.
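Conceptually, that workaround matches stop sequences on token IDs instead of on detokenized text. A rough sketch of the idea (not KoboldCpp's actual code, and assuming the common llama_tokenize helper):

```cpp
// Sketch: if a stop string tokenizes to a single (special) token, compare token IDs
// instead of the rendered text, which may be empty for control tokens.
#include "common.h"
#include "llama.h"
#include <string>
#include <vector>

static bool is_stop_token(llama_context * ctx, const std::string & stop, llama_token sampled) {
    // parse_special = true so "<|eot_id|>" maps to its single control-token ID
    std::vector<llama_token> toks = llama_tokenize(ctx, stop, /*add_special*/ false, /*parse_special*/ true);
    return toks.size() == 1 && toks[0] == sampled;
}
```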

@Lyrcaxis

There's also something wrong with the existing tokenizer -- \n\n (Ċ Ċ) should be properly merged into a single token based on the tokenizer's merge instructions (ĊĊ), but unless it's at the end of the prompt, it tokenizes as [\n, \n].

(Note: Only tried via the LLAMA API using LLamaSharp)

I think the tokenizer integration in GGUFs could use some attention overall (merges + added_tokens).
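A quick way to check the merge behaviour described above (a sketch under the same assumptions about the common llama_tokenize helper):

```cpp
// Sketch: "\n\n" should come back as one merged token (the ĊĊ merge), both on its own
// and in the middle of a prompt, rather than as two separate "\n" tokens.
#include "common.h"
#include "llama.h"
#include <vector>

static bool newline_merge_ok(llama_context * ctx) {
    std::vector<llama_token> alone  = llama_tokenize(ctx, "\n\n",           /*add_special*/ false);
    std::vector<llama_token> inside = llama_tokenize(ctx, "Hello\n\nWorld", /*add_special*/ false);
    if (alone.size() != 1) return false;
    for (llama_token t : inside) {
        if (t == alone[0]) return true; // the merged token also appears mid-prompt
    }
    return false;
}
```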

@phymbert
Collaborator

@DreamGenX
Author

Hey @phymbert -- did you check the description of the issue? I don't think anything in the issues you linked is really relevant to, or solves, this problem -- the problem being that special tokens are not rendered.

@ggerganov
Owner

We can start rendering special tokens here:

llama.cpp/llama.cpp, lines 17017 to 17019 in 0e4802b:

} else if (llama_is_control_token(model->vocab, token)) {
;
}
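For context, rendering at that spot would roughly mean copying the stored token text in the control-token branch, the same way normal tokens are handled -- an illustrative sketch only (field names assumed from llama.cpp internals; the actual change in #6807 may differ, e.g. by making this opt-in):

```cpp
} else if (llama_is_control_token(model->vocab, token)) {
    // sketch: render the control token's stored text (e.g. "<|im_end|>") instead of nothing
    const std::string & result = model->vocab.id_to_token[token].text;
    if (length < (int) result.length()) {
        return -(int) result.length();
    }
    memcpy(buf, result.c_str(), result.length());
    return result.length();
}
```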

But my personal opinion is that parsing the text of special/control tokens is a poor practice. AFAICT it seems to have worked so far since we have incorrectly exported tokens such as "<|im_end|>" as normal text tokens.

In #6745 we will introduce llama_token_is_eog(), which can be used to properly check for end-of-generation tokens. I think this is more robust, and it's better to adopt that interface.
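In that model, a generation loop checks the sampled token ID directly rather than its rendered text. A small sketch assuming the llama_token_is_eog() signature from #6745:

```cpp
#include "llama.h"

// Sketch: decide whether to stop generation on the sampled token via the
// end-of-generation check from #6745, instead of string-matching stop text.
static bool should_stop(const llama_model * model, llama_token sampled) {
    // true for EOS-like tokens such as <|eot_id|> on Llama 3 chat models
    return llama_token_is_eog(model, sampled);
}
```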

@DreamGenX
Author

DreamGenX commented Apr 22, 2024

Thanks for the PR @ggerganov, awesome.

Whether it's bad practice depends very much on the use case. For the use case most people deal with, which is generating an assistant response based on conversation history, I agree it's not needed -- just end the prompt with the message header and stop on the EOM / EOT (end-of-message) token.

There are other use cases, though, where you fine-tune the model to generate multiple turns. The simplest example would be multiple messages from multiple role-play characters at once, where the message header contains the character name and possibly other metadata. Or generating multiple function-call instructions. In those cases special tokens allow you to properly parse the response, rather than relying on ad-hoc formatting.
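To make that concrete, a multi-turn response could be split on the control-token IDs themselves rather than on their (possibly empty) text. An illustrative sketch, using the ChatML-style marker names from this thread and the same assumed common helpers:

```cpp
// Sketch: split a generated token stream into turns by matching the control-token IDs
// for <|im_start|> / <|im_end|>, instead of scanning the detokenized text.
#include "common.h"
#include "llama.h"
#include <string>
#include <vector>

static std::vector<std::string> split_turns(llama_context * ctx, const std::vector<llama_token> & out) {
    const llama_token tok_start = llama_tokenize(ctx, "<|im_start|>", false, /*parse_special*/ true).at(0);
    const llama_token tok_end   = llama_tokenize(ctx, "<|im_end|>",   false, /*parse_special*/ true).at(0);

    std::vector<std::string> turns;
    std::string cur;
    bool in_turn = false;
    for (llama_token t : out) {
        if (t == tok_start) { in_turn = true; cur.clear(); continue; }
        if (t == tok_end)   { if (in_turn) turns.push_back(cur); in_turn = false; continue; }
        if (in_turn) cur += llama_token_to_piece(ctx, t); // header + content of the turn
    }
    return turns;
}
```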
