server : support preserving reasoning_content in assistant message#18994
server : support preserving reasoning_content in assistant message#18994pwilkin merged 5 commits intoggml-org:masterfrom
Conversation
pwilkin
left a comment
There was a problem hiding this comment.
Just as a general notion: I am not a fan of splitting reasoning handling into "enable_reasoning", "clear_thinking" and the passive "supports_preserve_reasoning". I think this is a bit messy. Don't have a clear idea of how to handle this yet, but I guess we should (a) detect whether model supports reasoning (b) enable reasoning by default if it does (c) pass reasoning traces if the template supports it (d) accept explicit overrides, but I'm not sure if the explicit overrides are something we should handle on the level of flags or just allow passing it in template_kwargs.
| #include "log.h" | ||
| #include "regex-partial.h" | ||
|
|
||
| // #include <minja/chat-template.hpp> |
There was a problem hiding this comment.
Should just remove those at this point, we're not going back to Minja.
common/chat.cpp
Outdated
| } | ||
| // std::vector<common_chat_msg> common_chat_msgs_parse_oaicompat(const std::string & messages) { | ||
| // return common_chat_msgs_parse_oaicompat(json::parse(messages)); | ||
| // } |
There was a problem hiding this comment.
Likewise, I'd just remove this. The code files are littered with comments like this that are left and then never removed.
common/chat.cpp
Outdated
| } | ||
| // std::vector<common_chat_tool> common_chat_tools_parse_oaicompat(const std::string & tools) { | ||
| // return common_chat_tools_parse_oaicompat(json::parse(tools)); | ||
| // } |
| // TODO @ngxson : no known chat templates support reasoning_content in content parts yet | ||
| // this can be useful for models with interleaved thinking (like Kimi-K2) | ||
| // if you see any templates explicitly support this, please ping me | ||
| // std::string reasoning_content; |
There was a problem hiding this comment.
I guess you could argue that GPT-OSS does, but don't know if anyone properly supports that.
| { | ||
| {"role", "assistant"}, | ||
| {"content", "Assistant message"}, | ||
| {"reasoning_content", "Reasoning content"} | ||
| }, |
There was a problem hiding this comment.
Might need a couple more capability checks for thinking at the message level and "type": "thinking" in content parts for gpt-oss and ministral 3 respectively.
The current logic for these models transforms reasoning_content to their expected field at init.
There was a problem hiding this comment.
for gpt-oss, it seems like reasoning is only allowed to be added if add_generation_prompt = false, so not usable in llama.cpp use case I think:
{%- elif loop.last and not add_generation_prompt %}
{#- Only render the CoT if the final turn is an assistant turn and add_generation_prompt is false #}
{#- This is a situation that should only occur in training, never in inference. #}
{%- if "thinking" in message %}
{{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }}
{%- endif %}
There was a problem hiding this comment.
Line 293:
{%- elif message.thinking and not future_final_message.found %}
{{- "<|start|>assistant<|channel|>analysis<|message|>" + message.thinking + "<|end|>" }}
{%- endif %}
@pwilkin I'm not splitting but they are indeed different notions:
Edit: I think this PR already provide the 4 points a,b,c,d that you brought up |
|
@ngxson yeah, you're right. I was somehow confused that we're already passing the reasoning_content to the template. |
|
The API supports it, but the WebUI does not. I assume this is setting up the foundation to add first class support in the WebUI. By support, I mean it'll pass the reasoning in the message objects fed to the template. |
…gml-org#18994) * support reasoning_content input * report template caps to webui * add docs * rm commented code
…gml-org#18994) * support reasoning_content input * report template caps to webui * add docs * rm commented code
…gml-org#18994) * support reasoning_content input * report template caps to webui * add docs * rm commented code
Ref: #18936 (comment)
Changes included in this PR
json_fwdinchat.hto avoid usingtemplatetrickcommon_chat_msgs_to_json_oaicompatandcommon_chat_msg::to_json_oaicompat()clear_thinking = falsefor GLM 4.7 if it is not specifiedsupports_preserve_reasoningto server/props(Web UI support is TBD)
Changes in API
The
/chat/completionsAPI now acceptsreasoning_contentfor assistant message:{ "messages": [ { "content": "Hello, world!", "role": "user" }, { "content": "Hey there!", "role": "assistant", "reasoning_content": "This is my reasoning." }, { "content": "Hello, world!", "role": "user" } ], "stream": false, "max_tokens": 64 }If the template supports it, the reasoning will be put back into the message (testing with GLM 4.7)
[gMASK]<sop><|user|>Hello, world!<|assistant|><think>This is my reasoning.</think>Hey there!<|user|>Hello, world!<|assistant|><think>Otherwise, it will be ignored.
To know if the template supports it,
/propsendpoint will indicate:{ "chat_template_caps": { ... "supports_preserve_reasoning": true, ... } }