
Server completion streaming returns special tokens as empty strings in chunks #7106

Closed · Inego opened this issue May 6, 2024 · 9 comments

Inego commented May 6, 2024

Version: b2794.
Model: Meta-Llama-3-8B-Instruct-Q8_0.gguf (updated)
Prompt: "<|start_header_id|>user<|end_header_id|>How much is 12 plus 19?<|eot_id|>"

When I run the server and send a completion request with streaming, I see in the verbose logs that the server generates "<|start_header_id|>", "assistant", and "<|end_header_id|>", followed by "\n\n12 + 19 = 31".

However, the streaming chunks sent by the server for <|start_header_id|> and <|end_header_id|> have empty strings as their content.

I couldn't find a config parameter, either in the server or in the request, that could change this behavior.
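
A minimal reproduction sketch, assuming a llama.cpp server listening on localhost:8080 with its /completion streaming endpoint (host, port, and payload fields are assumptions; adjust to your setup):

```python
# Minimal reproduction sketch. Assumes a llama.cpp server on localhost:8080
# exposing the /completion endpoint with SSE streaming ("data: {...}" lines);
# adjust host, port, and payload fields to your setup.
import json
import requests

payload = {
    "prompt": "<|start_header_id|>user<|end_header_id|>How much is 12 plus 19?<|eot_id|>",
    "stream": True,
    "n_predict": 64,
}

with requests.post("http://localhost:8080/completion", json=payload, stream=True) as resp:
    for raw in resp.iter_lines():
        if not raw or not raw.startswith(b"data: "):
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw[len(b"data: "):])
        # Chunks corresponding to special tokens arrive with an empty "content".
        print(repr(chunk.get("content")))
        if chunk.get("stop"):
            break
```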

Inego (Author) commented May 6, 2024

Actually, the special tokens are not output to the content without streaming either.

Inego (Author) commented May 6, 2024

This may be related to #6860.

turian commented May 7, 2024

Did you generate the GGUF yourself, or download it? How old is it?

Inego (Author) commented May 7, 2024

ggerganov (Owner) commented

> This may be related to #6860.

Yes, special tokens are not rendered in the server. This can become a user-configurable option.

Inego (Author) commented May 7, 2024

> Yes, special tokens are not rendered in the server. This can become a user-configurable option.

I would argue that rendering them should be mandatory for the Completion API, since it deals with token generation at a lower level than the Chat API. Therefore, if the model generates a sequence of tokens, these tokens should be visible to the API client.

teleprint-me (Contributor) commented May 8, 2024

> I would argue that rendering them should be mandatory for the Completion API, since it deals with token generation at a lower level than the Chat API. Therefore, if the model generates a sequence of tokens, these tokens should be visible to the API client.

I agree. This is especially true for training, finetuning, and testing.

> Yes, special tokens are not rendered in the server. This can become a user-configurable option.

I think making this user-configurable is a good compromise.

github-actions bot added the stale label Jun 8, 2024
shibe2 (Collaborator) commented Jun 8, 2024

If you render special tokens as text, it will be difficult to distinguish a special token from regular text that happens to match the token's name/string. When streaming, if the whole name comes in one event, it's probably the special token, and if it's broken into multiple events, it's regular text. Without streaming, I don't see any way to distinguish between the cases.
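
A minimal sketch of that heuristic, assuming the Llama 3 special-token strings from the original report:

```python
# Hypothetical client-side heuristic: treat a streamed event as a special token
# only if its content is exactly one known special-token string. Regular text
# that merely contains such a string is usually split across several events,
# and non-streaming responses give no such signal at all.
LLAMA3_SPECIAL_TOKENS = {"<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>"}

def is_probably_special(event_content: str) -> bool:
    return event_content in LLAMA3_SPECIAL_TOKENS
```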

A better way would be to return special tokens in a separate field. For streaming, we can add a tokens field with an array of tokens that correspond to the text in content. When content is empty and the tokens field is non-empty, the client will know that it's a special token. When not streaming, we can use the same format that is accepted for prompts – an array with token identifiers and strings. The response to the example in the original report would be:

"content": "assistant\n\n12 + 19 = 31",
"generated": [128006, "assistant", 128007, "\n\n12 + 19 = 31"]

A client that expects special tokens to be generated should ignore content and process generated field, or however it will be named.
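
A sketch of how a client could process the proposed field (hypothetical; the generated field does not exist in the server yet), treating integers as special-token ids and strings as regular text:

```python
# Sketch only: "generated" is the field proposed above, not an existing
# llama.cpp server response field. Integers are special-token ids,
# strings are regular generated text.
def split_generated(generated: list) -> tuple[list[int], str]:
    special_ids: list[int] = []
    text_parts: list[str] = []
    for item in generated:
        if isinstance(item, int):
            special_ids.append(item)    # e.g. 128006 = <|start_header_id|>
        else:
            text_parts.append(item)     # regular text
    return special_ids, "".join(text_parts)

# Using the example response above:
ids, text = split_generated([128006, "assistant", 128007, "\n\n12 + 19 = 31"])
# ids  == [128006, 128007]
# text == "assistant\n\n12 + 19 = 31"  (matches "content")
```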

Also, I think you are not supposed to ask Llama 3 to generate special tokens other than eot. You add <|start_header_id|> "assistant" <|end_header_id|> "\n\n" to the end of the prompt, and the model generates just the message content.
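
For reference, a sketch of that prompt shape (Llama 3 instruct format; whether the server prepends <|begin_of_text|> itself depends on the setup):

```python
# Prompt ending with the assistant header, as described above: the model then
# generates only the reply text followed by <|eot_id|>, so no special tokens
# other than eot need to appear in the generated output.
prompt = (
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "How much is 12 plus 19?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
```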

github-actions bot removed the stale label Jun 9, 2024
github-actions bot added the stale label Jul 9, 2024
This issue was closed because it has been inactive for 14 days since being marked as stale.
