Skip to content

reasoning: fix tool-call routing bug in GLM-5.1-FP8 at long context (AIVLLM-229)#40782

Closed
jatseng-ai wants to merge 1 commit into
vllm-project:mainfrom
jatseng-ai:fix/glm45-tool-call-routing-aivllm-229
Closed

reasoning: fix tool-call routing bug in GLM-5.1-FP8 at long context (AIVLLM-229)#40782
jatseng-ai wants to merge 1 commit into
vllm-project:mainfrom
jatseng-ai:fix/glm45-tool-call-routing-aivllm-229

Conversation

@jatseng-ai
Copy link
Copy Markdown
Contributor

Summary

GLM-5.1-FP8 places <tool_call> XML inside <think> blocks when prompt context exceeds ~8k tokens. vLLM's glm45 reasoning parser captures this in reasoning_content instead of routing it to tool_calls, producing finish_reason=stop instead of finish_reason=tool_calls.

Three-phase fix in DeepSeekV3ReasoningWithThinkingParser:

  • Phase 1 — complete <tool_call>…</tool_call> blocks inside reasoning promoted verbatim
  • Phase 2 — body-without-open-tag (token id 154829 stripped by vLLM) reconstructed by prepending <tool_call>
  • Phase 3a/b — closing fragment appended to unclosed content call, or promoted standalone

19/19 unit tests pass. Live test enforces 8k-token gate. Confirmed PASS at 10,431 prompt tokens.

Analogous to #39055 (Qwen3) but requires additional phases because GLM's <tool_call> open token is a special token stripped from text output.

Ref: AIVLLM-229

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added deepseek Related to DeepSeek models bug Something isn't working labels Apr 24, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a three-phase fix for the GLM-4.5 model to handle tool-call XML blocks that are incorrectly emitted inside reasoning tags. It includes a new _split_embedded_tool_calls method to promote these blocks to the content field and comprehensive unit tests for various edge cases. Feedback highlights a regression where essential properties were removed from the parser and a potential issue with a regular expression being too greedy, which could lead to the accidental removal of reasoning prose.

Comment on lines 43 to -49
@@ -40,14 +76,6 @@ def __init__(self, tokenizer: PreTrainedTokenizerBase, *args, **kwargs):
else:
self._parser = IdentityReasoningParser(tokenizer, *args, **kwargs)

@property
def reasoning_start_str(self) -> str | None:
return self._parser.reasoning_start_str

@property
def reasoning_end_str(self) -> str | None:
return self._parser.reasoning_end_str
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The removal of the reasoning_start_str and reasoning_end_str properties from DeepSeekV3ReasoningParser is a regression. These properties are essential for delegating to the underlying parser (e.g., DeepSeekR1ReasoningParser), which defines the actual delimiters like <think> and </think>. Without these, the parser will return None for these delimiters, which can break reasoning block detection in the engine or API layers that rely on them.

Comment on lines +35 to +45
_BODY_WITHOUT_OPEN_RE = re.compile(
r"^(?:[\w.-]+\n?)?" # optional function name (no spaces, optional newline)
r"(?:<arg_key>.*?</arg_key>" # at least one complete key-value pair
r"<arg_value>.*?</arg_value>)*"
r"<arg_key>.*?</arg_key>" # required: at least one <arg_key> must be present
r"<arg_value>.*?" # value (may be truncated)
r"(?:</arg_value>)?"
r"</tool_call>" # required closing tag
r".*$",
re.DOTALL,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The regex _BODY_WITHOUT_OPEN_RE uses .*$ at the end, which greedily captures any text following the </tool_call> tag. If the model produces additional reasoning prose after the tool call body within the same block, that prose will be incorrectly moved to the content field and removed from reasoning_content. Changing this to \s*$ ensures that Phase 2 only triggers when the entire remaining string is a tool call body (optionally followed by whitespace), preventing accidental loss of reasoning prose.

Suggested change
_BODY_WITHOUT_OPEN_RE = re.compile(
r"^(?:[\w.-]+\n?)?" # optional function name (no spaces, optional newline)
r"(?:<arg_key>.*?</arg_key>" # at least one complete key-value pair
r"<arg_value>.*?</arg_value>)*"
r"<arg_key>.*?</arg_key>" # required: at least one <arg_key> must be present
r"<arg_value>.*?" # value (may be truncated)
r"(?:</arg_value>)?"
r"</tool_call>" # required closing tag
r".*$",
re.DOTALL,
)
_BODY_WITHOUT_OPEN_RE = re.compile(
r"^(?:[\w.-]+\n?)?" # optional function name (no spaces, optional newline)
r"(?:<arg_key>.*?</arg_key>"
r"<arg_value>.*?</arg_value>)*"
r"<arg_key>.*?</arg_key>" # required: at least one <arg_key> must be present
r"<arg_value>.*?" # value (may be truncated)
r"(?:</arg_value>)?"
r"</tool_call>"
r"\s*$",
re.DOTALL,
)

@jatseng-ai jatseng-ai closed this Apr 24, 2026
@jatseng-ai jatseng-ai deleted the fix/glm45-tool-call-routing-aivllm-229 branch April 24, 2026 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working deepseek Related to DeepSeek models

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant