[Bugfix] Fix GLM4 MoE and SeedOSS reasoning parser regressions#37044
he-yufeng wants to merge 2 commits into vllm-project:main
Conversation
Code Review
This pull request introduces two fixes for reasoning parser regressions in GLM4 MoE and SeedOSS models. The changes for SeedOSS correctly align streaming and non-streaming behavior. For GLM4 MoE, a new parser is introduced that handles non-streaming output without a closing </think> tag. However, I've identified an inconsistency between the streaming and non-streaming behavior in the new GLM4 parser. Specifically, output with an opening <think> tag but no closing tag is treated as reasoning in streaming mode but as content in non-streaming mode. This should be addressed to ensure consistent parsing logic across both modes.
```python
class Glm4MoeReasoningParser(BaseThinkingReasoningParser):
    """
    Reasoning parser for GLM-4 MoE models.

    Unlike DeepSeek R1, GLM-4 injects <think> via the chat template rather
    than generating it. When the model output lacks </think>, the entire
    output is treated as *content* (not reasoning), because the absence of
    the end tag means the model chose not to reason.
    """
```
There's an inconsistency between the non-streaming and streaming behavior of this parser for outputs that contain `<think>` but not `</think>`.

- The `extract_reasoning` method correctly implements the logic described in the docstring: if `</think>` is absent, the entire output is treated as content. For an input like `"<think>some reasoning"`, it will return `(None, "<think>some reasoning")`.
- However, this class inherits `extract_reasoning_streaming` from `BaseThinkingReasoningParser`. The base implementation will treat `"<think>some reasoning"` as reasoning during streaming, which contradicts this parser's stated logic for handling outputs without a closing `</think>` tag.

This is the same type of inconsistency that this PR fixes for `SeedOSSReasoningParser`. To ensure consistent behavior, `Glm4MoeReasoningParser` should also override `extract_reasoning_streaming`. A potential approach is to buffer content after `<think>` and only flush it as reasoning once `</think>` is seen. If the stream ends before `</think>`, the buffer would be flushed as content.
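The buffering approach suggested above could look roughly like the standalone sketch below. This is illustrative only and independent of vLLM's actual parser classes (the class name `BufferedThinkStream` and its methods are invented for the example): text after `<think>` is held in a buffer, flushed as reasoning once `</think>` arrives, and flushed as content if the stream ends without the closing tag.

```python
# Illustrative sketch of the buffering idea, not vLLM's actual parser code.
class BufferedThinkStream:
    def __init__(self, start_tag: str = "<think>", end_tag: str = "</think>"):
        self.start_tag = start_tag
        self.end_tag = end_tag
        self.buffer = ""
        self.closed = False

    def feed(self, delta: str) -> tuple[str, str]:
        """Return (reasoning_delta, content_delta) for this chunk."""
        if self.closed:
            # Everything after </think> is plain content.
            return "", delta
        self.buffer += delta
        if self.end_tag in self.buffer:
            reasoning, _, content = self.buffer.partition(self.end_tag)
            reasoning = reasoning.removeprefix(self.start_tag)
            self.buffer = ""
            self.closed = True
            return reasoning, content
        # Keep buffering until we know which side of the tag we are on.
        return "", ""

    def finish(self) -> tuple[str, str]:
        """Stream ended: an unclosed buffer is flushed as content."""
        if self.closed or not self.buffer:
            return "", ""
        content, self.buffer = self.buffer, ""
        return "", content
```

The obvious trade-off of this design is latency: no reasoning tokens are emitted to the client until `</think>` is seen, which is the price of deferring the reasoning-vs-content decision.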
alvinttang
left a comment
Review
This PR fixes two distinct regressions — the GLM-4 MoE reasoning parser and the SeedOSS streaming parser. Both fixes are well-scoped.
GLM-4 MoE reasoning parser
The old mapping pointed "glm45" at DeepSeekV3ReasoningWithThinkingParser, which is incorrect for GLM-4 MoE. The new Glm4MoeReasoningParser correctly inherits from BaseThinkingReasoningParser and overrides the key semantic difference: when </think> is absent, the entire output is treated as content rather than reasoning. This matches GLM-4's behavior where <think> is injected via the chat template, and the model can opt out of reasoning by not emitting the end tag.
The `extract_reasoning` implementation is clean. One edge case to consider: if the model outputs `<think></think>` (empty reasoning), reasoning will be `""` and content will be `None` (since `content or None` converts the empty string to `None`). Is empty-string reasoning semantically meaningful here, or should it also be normalized to `None`?
SeedOSS streaming parser
The `extract_reasoning_streaming` override handles the case where the start token is in the chat template (not generated). When neither `previous_token_ids` nor `delta_token_ids` contain the start token ID, the logic is:

- If the end token is in the delta → split the delta into reasoning + content
- If the end token is in the previous tokens → all of the delta is content
- Otherwise → all of the delta is reasoning
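The three branches above can be sketched over plain token-ID lists (a simplified stand-in for the parser's real signature; the function name and the placeholder end-token ID are invented for illustration):

```python
END = 2  # placeholder end-token ID, for illustration only

def classify_delta(previous_ids: list[int],
                   delta_ids: list[int]) -> tuple[list[int], list[int]]:
    """Classify a streamed delta into (reasoning_ids, content_ids) when
    the start token was never generated (it lives in the chat template)."""
    if END in delta_ids:
        # End token arrives in this chunk: split around it.
        i = delta_ids.index(END)
        return delta_ids[:i], delta_ids[i + 1:]
    if END in previous_ids:
        # Reasoning already finished in an earlier chunk: all content.
        return [], delta_ids
    # End token not seen yet: the delta is still reasoning.
    return delta_ids, []
```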
This is correct, but I have a concern about the token ID checks: `self.start_token_id not in previous_token_ids` does a linear scan of the full token history on every streaming chunk. For long generations, this could become expensive. Consider tracking whether the start token was seen via a boolean flag on the parser instance rather than re-scanning the history each time.
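The flag idea could look roughly like this (a hedged sketch; in the real parser the state would live on the parser instance, and `StartTokenTracker` is a name invented for the example):

```python
class StartTokenTracker:
    """Remember whether a given token ID has appeared, so each streaming
    chunk only checks the new delta instead of rescanning the history."""

    def __init__(self, start_token_id: int):
        self.start_token_id = start_token_id
        self.seen = False

    def update(self, delta_ids: list[int]) -> bool:
        # O(len(delta)) per chunk instead of O(total history).
        if not self.seen and self.start_token_id in delta_ids:
            self.seen = True
        return self.seen
```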
Also, regarding the `end_index = delta_text.find(self.end_token)` call in the end-token-in-delta case: if the end token is split across two delta chunks (rare, but possible with some tokenizers), `find` will fail and the text won't be split correctly. The base class may already handle this via its partial-match machinery, so this might be fine, but it is worth verifying.
Missing tests
Neither parser has unit tests in this PR. Given that these are regression fixes, adding at least a couple of test cases for each (especially the "no end tag" and "streaming split" edge cases) would help prevent future regressions.
Overall, solid fixes. The GLM-4 parser is particularly clean.
Thanks for the thorough review @alvinttang!

- Empty reasoning: Good catch.
- Linear scan concern: Valid point. However, this is inherited from …
- Tests: The test cases already exist in …
GLM4 MoE (vllm-project#37023): PR vllm-project#33221 replaced the dedicated `Glm4MoeModelReasoningParser` with the generic `DeepSeekV3ReasoningWithThinkingParser`, which delegates to R1. R1 treats tagless text as reasoning, but GLM4 injects `<think>` via the chat template, so tagless output means the model chose not to reason. Add `Glm4MoeReasoningParser` that returns `(None, content)` when `</think>` is absent, matching the expected semantics. Empty reasoning (`<think></think>`) is normalized to `None`.

SeedOSS (vllm-project#37022): The streaming path in `BaseThinkingReasoningParser` returns content for tagless text, but SeedOSS (like R1) may not emit the start token. Add the same R1-style streaming override so tagless streaming text is correctly classified as reasoning.

Fixes vllm-project#37023, fixes vllm-project#37022

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Force-pushed from 10ae5cf to 7ae12d6
<think></think> means the model chose not to reason, so empty string reasoning should be None, not "". Added test cases for this edge case.
Thanks again for the detailed review @alvinttang! Pushed a fix (158912e):

- Empty reasoning normalization: …
- Added an explicit comment clarifying the GLM-4 MoE streaming inconsistency (also @gemini-code-assist): the streaming/non-streaming divergence …
- SeedOSS end token split across chunks: this is safe because the split check operates on token IDs (…
Thanks for the detailed explanation and for pushing the fix! The clarification on the streaming/non-streaming divergence for …
Summary
Two reasoning parser regressions were introduced by PR #33221 (which consolidated model-specific parsers into the generic DeepSeek V3 delegation chain):
1. GLM4 MoE: tagless text misclassified as reasoning instead of content.
   GLM4 injects `<think>` via the chat template, so when the model output lacks `</think>`, it means the model chose not to reason. The R1-based parser incorrectly treats this as reasoning. Added a dedicated `Glm4MoeReasoningParser` that returns `(None, content)` when `</think>` is absent.
2. SeedOSS: streaming output inconsistent with non-streaming.
   `SeedOSSReasoningParser` extends `BaseThinkingReasoningParser` directly, but the base class's streaming path returns `content` for tagless text, while the non-streaming `extract_reasoning()` returns it as `reasoning`. Added the same R1-style streaming override so both paths agree.

Test plan

- `pytest tests/reasoning/test_glm4_moe_reasoning_parser.py`: all 10 cases pass (`without_think`, `without_think_stream`, `only_open_tag` previously failing)
- `pytest tests/reasoning/test_seedoss_reasoning_parser.py`: streaming cases pass (previously misclassifying tagless text)

Fixes #37023, fixes #37022