fix no think of GLM-4.5 / GLM-4.7 #31449
Conversation
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Code Review
This pull request refactors Glm4MoeModelReasoningParser to inherit from BaseThinkingReasoningParser, simplifying the code. It also introduces a fix to handle cases where GLM-4.5 models might not output a <think> start token, which is a good improvement. However, the implementation introduces code duplication between the subclass and the base class. I've left a comment highlighting this maintainability issue and suggesting a refactoring approach.
```python
if (
    ret is not None
    and self.start_token_id not in previous_token_ids
    and self.start_token_id not in delta_token_ids
):
    return None
```
```diff
-        if self.think_start_token_id in previous_token_ids:
-            if self.think_end_token_id in delta_token_ids:
-                # <think> in previous, </think> in delta,
-                # extract reasoning content
-                end_index = delta_text.find(self.think_end_token)
+            if self.end_token_id in delta_token_ids:
+                # end token in delta with more tokens,
+                # extract reasoning content and content
+                end_index = delta_text.find(self.end_token)
                 reasoning = delta_text[:end_index]
-                content = delta_text[end_index + len(self.think_end_token) :]
+                content = delta_text[end_index + len(self.end_token) :]
                 return DeltaMessage(
                     reasoning=reasoning,
                     content=content if content else None,
                 )
-            elif self.think_end_token_id in previous_token_ids:
-                # <think> in previous, </think> in previous,
-                # reasoning content continues
+            elif self.end_token_id in previous_token_ids:
+                # end token in previous, thinking content ends
                 return DeltaMessage(content=delta_text)
             else:
-                # <think> in previous, no </think> in previous or delta,
-                # reasoning content continues
                 return DeltaMessage(reasoning=delta_text)
-        elif self.think_start_token_id in delta_token_ids:
-            if self.think_end_token_id in delta_token_ids:
-                # <think> in delta, </think> in delta, extract reasoning content
-                start_index = delta_text.find(self.think_start_token)
-                end_index = delta_text.find(self.think_end_token)
-                reasoning = delta_text[
-                    start_index + len(self.think_start_token) : end_index
-                ]
-                content = delta_text[end_index + len(self.think_end_token) :]
-                return DeltaMessage(
-                    reasoning=reasoning,
-                    content=content if content else None,
-                )
-            else:
-                # <think> in delta, no </think> in delta,
-                # reasoning content continues
+                # no end token in previous or delta, reasoning content continues
                 return DeltaMessage(reasoning=delta_text)
```
The logic in this if block, which handles streaming extraction when the start token is missing, duplicates the existing logic in BaseThinkingReasoningParser for the case where the start token is present. This duplication is a maintenance risk: future changes to the parsing logic will need to be made in two places, which is error-prone.
To improve maintainability, the duplicated logic should be extracted into a shared helper method on the base class.
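A minimal sketch of the suggested refactor. The class and method names below are illustrative, not vLLM's actual API: the end-token scanning lives in one base-class helper, and the missing-start-token branch in the subclass reuses it instead of duplicating it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeltaMessage:
    reasoning: Optional[str] = None
    content: Optional[str] = None


class BaseThinkingReasoningParser:
    end_token = "</think>"

    def _split_delta_on_end_token(self, delta_text: str) -> DeltaMessage:
        # Shared helper: everything before the end token is reasoning,
        # everything after it (if non-empty) is content.
        if self.end_token in delta_text:
            reasoning, _, content = delta_text.partition(self.end_token)
            return DeltaMessage(reasoning=reasoning, content=content or None)
        return DeltaMessage(reasoning=delta_text)


class Glm4MoeModelReasoningParser(BaseThinkingReasoningParser):
    def extract_streaming(self, previous_text: str, delta_text: str) -> DeltaMessage:
        # GLM-4.5/4.7 may omit the <think> start token entirely; the
        # subclass branch can then reuse the same helper as the base class
        # instead of re-implementing the end-token split.
        return self._split_delta_on_end_token(delta_text)
```

With a helper like this, the "start token never seen" path in the subclass stays a one-liner, and any future change to the split rule happens in exactly one place.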
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
if (
    ret is not None
    and self.start_token_id not in previous_token_ids
    and self.start_token_id not in delta_token_ids
):
```
Normal outputs treated as reasoning when think tokens absent
The new branch that runs when no <think> start token has ever been seen routes text into the reasoning field even though the model never emitted thinking markers. The class also now inherits the base extract_reasoning (basic_parsers.py 151-175), which likewise returns the entire output as reasoning when <think> is missing. For GLM calls with enable_thinking=False, or for variants that simply omit the tags, content stays None, so the OpenAI response builders (serving_responses.py 844-889) skip creating the assistant message and the user receives no answer despite the model producing one. The previous parser returned the text as content when the tags were absent, so non-thinking generations are now silently dropped.
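A sketch of the fallback this finding is asking for. This is a standalone illustration, not the actual vLLM implementation: when no think tags appear at all, the text should be routed to content so the assistant message survives.

```python
from typing import Optional, Tuple


def extract_reasoning(
    output: str,
    start: str = "<think>",
    end: str = "</think>",
) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning, content); tag-less output is treated as content."""
    if end in output:
        # Reasoning may be emitted without the start tag (GLM-4.5 behavior):
        # everything before </think> counts as reasoning either way.
        before, _, after = output.partition(end)
        return before.removeprefix(start), after or None
    if output.startswith(start):
        # Thinking started but never closed: no final content yet.
        return output.removeprefix(start), None
    # No think tags at all, e.g. enable_thinking=False: plain answer.
    return None, output
```

With this fallback, a generation that never emits thinking markers comes back in content instead of being dropped.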
chaunceyjiang left a comment
It seems exactly the same as DeepSeekR1ReasoningParser. I suggest adding a new entry glm47 in __init__.py:

```python
"glm47": (  # name
    "deepseek_r1_reasoning_parser",  # filename
    "DeepSeekR1ReasoningParser",  # class_name
),
```
No. Since the official GLM blog recommends using the GLM-4.5 reasoning-parser for the GLM-4.7 model, end users can continue using the GLM-4.5 reasoning-parser (even though it is effectively an R1 implementation) without needing to modify any startup parameters.
I understand, I modified the code. Is this what you mean?
@zRzRzRzRzRzRzR Could you test
Thank you for your patience. @zRzRzRzRzRzRzR I discussed this with the other maintainers. However, considering that the file
Alright, I won't modify this PR anymore, looking forward to the merge.
I tried to use the deepseek_r1 parser with GLM-4.7, and when {"chat_template_kwargs": {"enable_thinking": False}} is added to the request, the deepseek_r1 parser seems to put the entire non-reasoning content into reasoning_content. Simply extending DeepSeek-R1's reasoning parser class will introduce the same issue.
curl http://localhost:18001/v1/chat/completions returned:

```json
{"id":"chatcmpl-9ba8e0f3887b48b7","object":"chat.completion","created":1767525526,"model":"glm-4.7-fp8","choices":[{"index":0,"message":{"role":"assistant","content":null,"refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":"青泥何盘盘,百步九折萦岩峦。扪参历井仰胁息,以手抚膺坐长叹。问君西游何时还?畏途巉岩不可攀。但见悲鸟号古木,雄飞雌从绕林间。又闻子规啼夜月,愁空山。蜀道之难,难于上青天,使人听此凋朱颜!连峰去天不盈尺,枯松倒挂倚绝壁。飞湍瀑流争喧豗,砯崖转石万壑雷。其险也如此,嗟尔远道之人胡为乎来哉!剑阁峥嵘而崔嵬,一夫当关,万夫莫开。所守或匪亲,化为狼与豺。朝避猛虎,夕避长蛇;磨牙吮血,杀人如麻。锦城虽云乐,不如早还家。蜀道之难,难于上青天,侧身西望长咨嗟!","reasoning_content":"青泥何盘盘,百步九折萦岩峦。扪参历井仰胁息,以手抚膺坐长叹。问君西游何时还?畏途巉岩不可攀。但见悲鸟号古木,雄飞雌从绕林间。又闻子规啼夜月,愁空山。蜀道之难,难于上青天,使人听此凋朱颜!连峰去天不盈尺,枯松倒挂倚绝壁。飞湍瀑流争喧豗,砯崖转石万壑雷。其险也如此,嗟尔远道之人胡为乎来哉!剑阁峥嵘而崔嵬,一夫当关,万夫莫开。所守或匪亲,化为狼与豺。朝避猛虎,夕避长蛇;磨牙吮血,杀人如麻。锦城虽云乐,不如早还家。蜀道之难,难于上青天,侧身西望长咨嗟!"},"logprobs":null,"finish_reason":"stop","stop_reason":151336,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":13,"total_tokens":242,"completion_tokens":229,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
```

The content is null, and the text has been moved into reasoning?
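The failure mode in the response above can be checked mechanically on the client side. The helper below is a hypothetical sketch (field names are taken from the JSON shown above, not from any vLLM utility):

```python
def answer_was_dropped(choice: dict) -> bool:
    """True when a response misroutes the whole answer into reasoning:
    message.content is null while reasoning/reasoning_content carry text."""
    msg = choice["message"]
    has_reasoning = bool(msg.get("reasoning") or msg.get("reasoning_content"))
    return msg.get("content") is None and has_reasoning


# Applied to choices[0] of the response above, this returns True:
# content is null and the poem sits in reasoning/reasoning_content.
```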
I ran curl http://localhost:18001/v1/chat/completions -H "Content-Type: application/json" -d '{ ... without setting "chat_template_kwargs": {"enable_thinking": false}, but the response still has both "reasoning" and "reasoning_content".
/cc @zRzRzRzRzRzRzR WDYT?
I'm hitting the same problem, cc @chaunceyjiang
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Following the same logic as DeepSeek-R1 and Qwen3, the reasoning content can still be extracted even if the <think> start token does not appear.