Fix assistant thinking block normalization#41718

Open
mertunsall wants to merge 1 commit into vllm-project:main from mertunsall:fix-assistant-thinking-block-reasoning

Conversation

Contributor

@mertunsall mertunsall commented May 5, 2026

Summary

  • route assistant content parts of type thinking into the normalized reasoning / reasoning_content fields instead of visible content
  • reject assistant messages that provide both top-level reasoning and typed thinking content blocks, since those are duplicate reasoning representations
  • add CPU-friendly regression coverage for generic chat message normalization and DeepSeek V4 prompt rendering

This fixes a DeepSeek V4 history-rendering issue where prior assistant messages shaped like {"content": [{"type": "thinking", "thinking": "..."}]} rendered as <think></think>... instead of <think>...</think>.
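The intended behavior can be sketched as follows. This is an illustrative, self-contained sketch of the normalization described above, not the actual `chat_utils.py` code; the function and field names (`normalize_assistant_message`, `reasoning_content`) are assumptions for demonstration.

```python
# Illustrative sketch: route assistant "thinking" content parts into the
# normalized reasoning_content field, and reject messages that carry both a
# top-level reasoning field and typed thinking blocks (duplicate reasoning).
def normalize_assistant_message(msg: dict) -> dict:
    content = msg.get("content")
    reasoning = msg.get("reasoning")
    if not isinstance(content, list):
        return {"content": content, "reasoning_content": reasoning}

    thinking_parts: list[str] = []
    visible_parts: list[dict] = []
    for part in content:
        if part.get("type") == "thinking":
            thinking_parts.append(part.get("thinking", ""))
        else:
            visible_parts.append(part)

    if thinking_parts and reasoning is not None:
        # Two representations of the same reasoning: ambiguous, so reject.
        raise ValueError(
            "assistant message provides both top-level reasoning and "
            "typed thinking content blocks")

    reasoning_content = "".join(thinking_parts) if thinking_parts else reasoning
    return {"content": visible_parts, "reasoning_content": reasoning_content}
```

With this routing, a history entry like `{"content": [{"type": "thinking", "thinking": "..."}]}` yields a populated `reasoning_content` and empty visible content, so a template that wraps reasoning in `<think>...</think>` renders it correctly.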

Tests

  • git diff --check
  • CUDA_VISIBLE_DEVICES='' .venv/bin/python -m pytest tests/entrypoints/test_chat_utils.py::test_parse_chat_messages_include_thinking_chunk tests/entrypoints/test_chat_utils.py::test_parse_chat_messages_rejects_duplicate_assistant_reasoning tests/tokenizers_/test_deepseek_v4.py::test_deepseek_v4_renders_assistant_thinking_content_as_reasoning -q
    • Note: after reverting the minimal test config per reviewer preference, the two test_chat_utils.py targets require the existing mistral_model_config fixture and may need optional vision deps in local environments.

AI assistance

AI assistance was used to prepare this patch. The submitting human should review every changed line and CI result before merge.


@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added deepseek Related to DeepSeek models frontend labels May 5, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements the extraction of thinking content parts from assistant messages, ensuring they are processed as reasoning. It adds validation to prevent the simultaneous use of top-level reasoning fields and thinking content parts, and includes tests for DeepSeek V4 and chat utility parsing. A review comment suggests using a newline separator when joining multiple reasoning parts to enhance readability.

Comment thread vllm/entrypoints/chat_utils.py
@mertunsall mertunsall force-pushed the fix-assistant-thinking-block-reasoning branch from f7fc8db to e074f56 Compare May 5, 2026 09:15
@BugenZhao BugenZhao self-requested a review May 5, 2026 09:18
Co-authored-by: OpenAI Codex <codex@openai.com>
Signed-off-by: mertunsall <mert.unsal@mistral.ai>
@mertunsall mertunsall force-pushed the fix-assistant-thinking-block-reasoning branch from e074f56 to e35531e Compare May 5, 2026 09:26
Contributor

@juliendenize juliendenize left a comment

The issue I see here is that populating reasoning as an additional field based on thinking chunks will lead to duplicated reasoning for mistral-common and recent chat templates.
FYI, this PR adds support for reasoning with mistral-common inside vLLM:
#41658

The duplication arises because we don't expect users to send both formats at the same time. I therefore think it is necessary to raise an error, as you did, if both formats are sent inside one message, and that is probably something we should also add to mistral-common.

If thinking is used, we should probably either raise if the model is not a Mistral model or find a way to avoid duplicating the two entries. AFAIK we don't have models that interleave thinking chunks between text chunks in an assistant message, so one solution might be for
_extract_assistant_thinking_parts to remove the thinking chunks from content. However, that won't work for past chat templates, so it is not ideal.

Comment on lines +1787 to +1788

    if reasoning_from_content is not None:
        reasoning = reasoning_from_content

Recent Mistral chat templates such as this one
(https://huggingface.co/mistralai/Mistral-Medium-3.5-128B/blob/main/chat_template.jinja#L80), as well as mistral-common,
would add the thinking part twice if you do this.
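The duplication risk described above can be illustrated with a small sketch. This is not the actual Jinja template; it models a renderer that, like the linked template, already reads inline thinking chunks, so also copying them into `reasoning_content` (a hypothetical field name here) would emit the reasoning twice.

```python
# Illustrative renderer: emits <think> blocks from BOTH the normalized
# reasoning_content field and any inline "thinking" chunks. If normalization
# copies chunks into reasoning_content without removing them from content,
# the reasoning is rendered twice.
def render_assistant(msg: dict) -> str:
    out = []
    if msg.get("reasoning_content"):
        out.append(f"<think>{msg['reasoning_content']}</think>")
    for part in msg.get("content", []):
        if part.get("type") == "thinking":
            out.append(f"<think>{part['thinking']}</think>")  # duplicate!
        elif part.get("type") == "text":
            out.append(part["text"])
    return "".join(out)
```

A message whose thinking chunk was copied into `reasoning_content` but left in `content` would render the same `<think>` block twice, which is the template-compatibility concern raised here.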
