Fix assistant thinking block normalization#41718

Open
mertunsall wants to merge 1 commit into vllm-project:main from mertunsall:fix-assistant-thinking-block-reasoning

Conversation

Contributor

@mertunsall mertunsall commented May 5, 2026

Summary

  • route assistant content parts of type thinking into the normalized reasoning / reasoning_content fields instead of visible content
  • reject assistant messages that provide both top-level reasoning and typed thinking content blocks, since those are duplicate reasoning representations
  • add CPU-friendly regression coverage for generic chat message normalization and DeepSeek V4 prompt rendering

This fixes a DeepSeek V4 history-rendering issue where prior assistant messages shaped like {"content": [{"type": "thinking", "thinking": "..."}]} rendered as <think></think>... instead of <think>...</think>.
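The intended behavior can be sketched as follows. This is an illustrative, self-contained sketch of the normalization described above, not the actual `chat_utils.py` code; the function and field names (`normalize_assistant_message`, `reasoning_content`) are assumptions for demonstration.

```python
# Illustrative sketch: route assistant "thinking" content parts into the
# normalized reasoning_content field, and reject messages that carry both a
# top-level reasoning field and typed thinking blocks (duplicate reasoning).
def normalize_assistant_message(msg: dict) -> dict:
    content = msg.get("content")
    reasoning = msg.get("reasoning")
    if not isinstance(content, list):
        return {"content": content, "reasoning_content": reasoning}

    thinking_parts: list[str] = []
    visible_parts: list[dict] = []
    for part in content:
        if part.get("type") == "thinking":
            thinking_parts.append(part.get("thinking", ""))
        else:
            visible_parts.append(part)

    if thinking_parts and reasoning is not None:
        # Two representations of the same reasoning: ambiguous, so reject.
        raise ValueError(
            "assistant message provides both top-level reasoning and "
            "typed thinking content blocks")

    reasoning_content = "".join(thinking_parts) if thinking_parts else reasoning
    return {"content": visible_parts, "reasoning_content": reasoning_content}
```

With this routing, a history entry like `{"content": [{"type": "thinking", "thinking": "..."}]}` yields a populated `reasoning_content` and empty visible content, so a template that wraps reasoning in `<think>...</think>` renders it correctly.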

Tests

  • git diff --check
  • CUDA_VISIBLE_DEVICES='' .venv/bin/python -m pytest tests/entrypoints/test_chat_utils.py::test_parse_chat_messages_include_thinking_chunk tests/entrypoints/test_chat_utils.py::test_parse_chat_messages_rejects_duplicate_assistant_reasoning tests/tokenizers_/test_deepseek_v4.py::test_deepseek_v4_renders_assistant_thinking_content_as_reasoning -q
    • Note: after reverting the minimal test config per reviewer preference, the two test_chat_utils.py targets require the existing mistral_model_config fixture and may need optional vision deps in local environments.

AI assistance

AI assistance was used to prepare this patch. The submitting human should review every changed line and CI result before merge.


@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added deepseek Related to DeepSeek models frontend labels May 5, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements the extraction of thinking content parts from assistant messages, ensuring they are processed as reasoning. It adds validation to prevent the simultaneous use of top-level reasoning fields and thinking content parts, and includes tests for DeepSeek V4 and chat utility parsing. A review comment suggests using a newline separator when joining multiple reasoning parts to enhance readability.

Comment thread vllm/entrypoints/chat_utils.py
@mertunsall mertunsall force-pushed the fix-assistant-thinking-block-reasoning branch from f7fc8db to e074f56 Compare May 5, 2026 09:15
@BugenZhao BugenZhao self-requested a review May 5, 2026 09:18
Co-authored-by: OpenAI Codex <codex@openai.com>
Signed-off-by: mertunsall <mert.unsal@mistral.ai>
@mertunsall mertunsall force-pushed the fix-assistant-thinking-block-reasoning branch from e074f56 to e35531e Compare May 5, 2026 09:26
Contributor

@juliendenize juliendenize left a comment

The issue I see here is that populating reasoning as an additional field based on thinking chunks will lead to duplicated reasoning for mistral-common and recent chat templates.
FYI, this PR adds support for reasoning with mistral-common inside vLLM:
#41658

The duplication arises because we don't expect users to send both formats at the same time. I therefore think it is necessary to raise an error, as you did, if both formats are sent inside one message, and that is probably something we should also add to mistral-common.

If thinking is used, we should probably either raise if the model is not a Mistral model or find a way to avoid duplicating the two entries. AFAIK we don't have models that interleave thinking chunks between text chunks in an assistant message, so one solution might be for
_extract_assistant_thinking_parts to remove the thinking chunks from content. However, that won't work for past chat templates, so it is not ideal.

Comment on lines +1787 to +1788

    if reasoning_from_content is not None:
        reasoning = reasoning_from_content

Recent Mistral chat templates such as this one
(https://huggingface.co/mistralai/Mistral-Medium-3.5-128B/blob/main/chat_template.jinja#L80), as well as mistral-common,
would add the thinking part twice if you do this.
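The duplication risk described above can be illustrated with a small sketch. This is not the actual Jinja template; it models a renderer that, like the linked template, already reads inline thinking chunks, so also copying them into `reasoning_content` (a hypothetical field name here) would emit the reasoning twice.

```python
# Illustrative renderer: emits <think> blocks from BOTH the normalized
# reasoning_content field and any inline "thinking" chunks. If normalization
# copies chunks into reasoning_content without removing them from content,
# the reasoning is rendered twice.
def render_assistant(msg: dict) -> str:
    out = []
    if msg.get("reasoning_content"):
        out.append(f"<think>{msg['reasoning_content']}</think>")
    for part in msg.get("content", []):
        if part.get("type") == "thinking":
            out.append(f"<think>{part['thinking']}</think>")  # duplicate!
        elif part.get("type") == "text":
            out.append(part["text"])
    return "".join(out)
```

A message whose thinking chunk was copied into `reasoning_content` but left in `content` would render the same `<think>` block twice, which is the template-compatibility concern raised here.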
