Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Description
(This is specifically for the latest 72B models. I have never tried the smaller ones).
I'm using this model: https://huggingface.co/Qwen/Qwen-72B-Chat
Commit: 33e171d1e9fc4903f9314b490d77fb8d58331b63
I think the current convert-hf-to-gguf.py does not produce a .gguf file that handles the two special tokens <|im_start|> and <|im_end|> correctly.
The prompt I used is "<|im_start|>system" for the examples below.
Following the steps in #4281 to produce some .gguf files (I personally used the Q6_K quant on a Mac Studio), I tried the tokenize tool:
```
27 -> '<'
91 -> '|'
318 -> 'im'
4906 -> '_start'
91 -> '|'
29 -> '>'
8948 -> 'system'
```
Compare this to a Yi model with the exact same prompt:
```
6 -> '<|im_start|>'
10707 -> 'system'
```
I looked at the Qwen tokenizer code (https://huggingface.co/Qwen/Qwen-72B/blob/main/tokenization_qwen.py#L37), and I think these are intended to be single special tokens, but the current conversion script does not handle them properly.
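For what it's worth, the upstream tokenizer does treat them as single tokens. Here is a minimal sketch to double-check that with transformers (trust_remote_code is needed because Qwen ships a custom tiktoken-based tokenizer; exact behavior may vary by tokenizer version):

```python
# Sketch: verify the upstream Qwen tokenizer encodes the ChatML markers
# as single tokens. Requires transformers and tiktoken installed.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen-72B-Chat", trust_remote_code=True)

# Depending on the tokenizer version, special tokens may need to be
# explicitly allowed, e.g. tok.tokenize(text, allowed_special="all").
ids = tok.encode("<|im_start|>system")
print(ids)                             # expected: two ids, not seven
print([tok.decode([i]) for i in ids])  # expected: ['<|im_start|>', 'system']
```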
Steps to Reproduce
- Download the Qwen models. (https://huggingface.co/Qwen/Qwen-72B-Chat)
- Use the convert-hf-to-gguf.py script to convert one into a .gguf file. (This is the exact command I used on my Mac Studio: python3 convert-hf-to-gguf.py --outfile /Volumes/T9/qwen_72b_chat_v3_f16.gguf --outtype f16 ~/text-generation-webui/models/Qwen_Qwen-72B-Chat)
- Run tokenize on the result to see how the prompt is tokenized (or inspect the vocab directly, as sketched below).
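As an alternative to the tokenize binary, the vocab metadata of the converted file can be inspected directly. A rough sketch, assuming gguf-py's GGUFReader is available (the .gguf path is the one from my command above):

```python
# Sketch: check whether the ChatML markers exist at all in the vocab of a
# converted .gguf file. The field name is the standard gguf vocab key.
from gguf import GGUFReader

reader = GGUFReader("/Volumes/T9/qwen_72b_chat_v3_f16.gguf")
tokens = reader.fields["tokenizer.ggml.tokens"]

# For string arrays, field.data holds indices into field.parts, one per piece.
pieces = [bytes(tokens.parts[i]).decode("utf-8", errors="replace") for i in tokens.data]

for marker in ("<|im_start|>", "<|im_end|>"):
    if marker in pieces:
        print(marker, "present at id", pieces.index(marker))
    else:
        print(marker, "missing from vocab")  # would explain the splitting above
```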
To be honest, I'm not sure whether this is a bug for the llama.cpp repository or something the Qwen team might want to fix in their repo, but I'm submitting it here for awareness.
Also, the model seems to work fine despite this, but maybe it would work better if these tokens were interpreted correctly? No idea.
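For context, if the fix ends up on the llama.cpp side, I'd guess it amounts to marking these pieces as control tokens when the vocab is written out. A hand-wavy sketch using gguf-py's TokenType enum (the helper and the SPECIAL_TOKENS set are mine, not the actual fix):

```python
# Sketch: during conversion, the ChatML markers should be emitted with
# TokenType.CONTROL so llama.cpp matches them as single tokens.
import gguf

# Per tokenization_qwen.py; the full set there also includes extra tokens.
SPECIAL_TOKENS = {"<|endoftext|>", "<|im_start|>", "<|im_end|>"}

def token_type_for(piece: str) -> gguf.TokenType:
    return gguf.TokenType.CONTROL if piece in SPECIAL_TOKENS else gguf.TokenType.NORMAL

# e.g. writer.add_token_types([token_type_for(p) for p in pieces])
```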