[Frontend] Support GLM-4.5 / GLM-4.7 with enable_thinking: false #31788

Merged
DarkLight1337 merged 3 commits into vllm-project:main from chaunceyjiang:glm7_enable_thinking on Jan 6, 2026

Conversation

@chaunceyjiang
Collaborator

@chaunceyjiang chaunceyjiang commented Jan 6, 2026

Purpose

[Frontend] Support GLM-4.5 / GLM-4.7 with enable_thinking: false

FIX #31319
FIX #31449 (comment)

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@chaunceyjiang chaunceyjiang requested a review from aarnphm as a code owner January 6, 2026 07:51
@mergify mergify bot added the deepseek Related to DeepSeek models label Jan 6, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to add support for GLM-4.5/GLM-4.7 models, particularly handling the enable_thinking: false parameter. The changes involve updating reasoning parsers.

My review found a critical logic issue in holo2_reasoning_parser.py where the or operator is used incorrectly, preventing the enable_thinking: false flag from disabling the thinking feature as intended. I've suggested using and instead. Additionally, I've pointed out an outdated docstring in glm4_moe_reasoning_parser.py that needs to be updated to reflect the new class inheritance.
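The `or`-vs-`and` issue the bot describes can be illustrated with a minimal sketch. The function names and the two-flag model below are assumptions for illustration, not the actual vLLM parser code: the point is only that combining a default of "thinking on" with the request's flag via `or` makes `enable_thinking: false` unreachable.

```python
# Illustrative sketch (names are hypothetical, not the vLLM internals):
# when the parser's default is "thinking enabled", `or` can never let
# a request turn thinking off, while `and` respects the request flag.

def thinking_enabled_buggy(default_enabled: bool, request_flag: bool) -> bool:
    # `or` short-circuits: with default_enabled=True, the request's
    # enable_thinking: false is silently ignored.
    return default_enabled or request_flag

def thinking_enabled_fixed(default_enabled: bool, request_flag: bool) -> bool:
    # `and` lets either side disable thinking, which is the intended
    # semantics for enable_thinking: false.
    return default_enabled and request_flag

print(thinking_enabled_buggy(True, False))  # True  -- flag ignored
print(thinking_enabled_fixed(True, False))  # False -- flag respected
```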

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@chaunceyjiang
Collaborator Author

/cc @hhd52859 @zhangsongqing @athenacykes Could you help test this?

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@hhd52859

hhd52859 commented Jan 6, 2026

/cc @hhd52859 @zhangsongqing @athenacykes Could you help test this?

LGTM, thanks!

@chaunceyjiang
Collaborator Author

/cc @zRzRzRzRzRzRzR PTAL

@JaredforReal
Contributor

@chaunceyjiang it's not working well; extra_body is ignored
[screenshot]

@JaredforReal
Contributor

@chaunceyjiang could you try working on protocol.py? Appreciate that.

@cjackal
Contributor

cjackal commented Jan 6, 2026

So in effect the glm45 reasoning parser is --reasoning-parser deepseek_v3 --default-chat-template-kwargs '{"enable_thinking": true}'. After #31581, reasoning parsers handle default chat template kwargs well, FYI.

@chaunceyjiang
Collaborator Author

So in effect the glm45 reasoning parser is --reasoning-parser deepseek_v3 --default-chat-template-kwargs '{"enable_thinking": true}'. After #31581, reasoning parsers handle default chat template kwargs well, FYI.

@cjackal Yes. This is more of a workaround. The root cause is still an incorrect implementation of the GLM45 reasoning parser.
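The workaround described above relies on per-request kwargs taking precedence over the server-side default. A minimal sketch of that interaction, with names chosen for illustration rather than taken from the vLLM internals:

```python
# Illustrative sketch of how --default-chat-template-kwargs interacts
# with a request's chat_template_kwargs (not the exact vLLM code):
# the per-request values override the server default.

server_defaults = {"enable_thinking": True}   # --default-chat-template-kwargs
request_kwargs = {"enable_thinking": False}   # from the client's request body

# Dict merge: later keys win, so the request overrides the default.
merged = {**server_defaults, **request_kwargs}
print(merged)  # {'enable_thinking': False}
```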

@chaunceyjiang
Collaborator Author

@chaunceyjiang it's not working well; extra_body is ignored

@JaredforReal, could you share your client code? I think the issue is with the client-side code.
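For reference, a common client-side pitfall is building the request body so that chat_template_kwargs never reaches the server. A minimal sketch of a correct body follows; the model name is illustrative, and with the official `openai` SDK this dict would be passed through the `extra_body` parameter of `client.chat.completions.create(...)`:

```python
import json

# Sketch of the request body the server expects: chat_template_kwargs
# must be a top-level key of the JSON body (via extra_body in the
# openai SDK), not nested somewhere the server won't look.

payload = {
    "model": "zai-org/GLM-4.7",  # illustrative model name
    "messages": [{"role": "user", "content": "Hello"}],
    "chat_template_kwargs": {"enable_thinking": False},
}

print(json.dumps(payload["chat_template_kwargs"]))
```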

@JaredforReal
Contributor

@chaunceyjiang Thanks!

@chaunceyjiang chaunceyjiang added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 6, 2026
@chaunceyjiang
Collaborator Author

cc @DarkLight1337 PTAL

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) January 6, 2026 12:11
@DarkLight1337 DarkLight1337 merged commit 0202971 into vllm-project:main Jan 6, 2026
46 checks passed
Anexdeus pushed a commit to Anexdeus/vllm that referenced this pull request Jan 6, 2026
LucasWilkinson pushed a commit to neuralmagic/vllm that referenced this pull request Jan 6, 2026
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026
@mballesterosc

mballesterosc commented Jan 9, 2026

In streaming mode, when {"chat_template_kwargs": {"enable_thinking": False}} is set, the output is encapsulated in <think></think> tags. Is this intended?

Thanks

@RocketRider

In streaming mode, when {"chat_template_kwargs": {"enable_thinking": False}} is set, the output is encapsulated in <think></think> tags. Is this intended?

Thanks

Did you test it with the latest nightly?
For me it is fixed with the latest version.

@mballesterosc

In streaming mode, when {"chat_template_kwargs": {"enable_thinking": False}} is set, the output is encapsulated in <think></think> tags. Is this intended?
Thanks

Did you test it with the latest nightly? For me it is fixed with the latest version.

I'm sorry, I tried with the nightly build from the day the pull request was closed, and it's OK now. Thank you.
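The symptom discussed above is the kind of split a reasoning parser performs on `<think>...</think>` output. A small illustrative helper (not vLLM code) that separates the reasoning block from the final content; with the fix merged, enable_thinking: false should produce no `<think>` block at all, making such post-processing unnecessary:

```python
# Hypothetical helper for illustration only: split a completion into
# (reasoning, content) when it starts with a <think>...</think> block.

def split_reasoning(text: str) -> tuple[str, str]:
    start, end = "<think>", "</think>"
    if text.startswith(start) and end in text:
        close = text.index(end)
        # Reasoning is everything between the tags; content follows.
        return text[len(start):close], text[close + len(end):]
    return "", text

print(split_reasoning("<think>weigh options</think>Final answer"))
# -> ('weigh options', 'Final answer')
```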

akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

Labels

deepseek Related to DeepSeek models ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: GLM-4.7-FP8 missing beginning <think> tag

8 participants