Fix assistant thinking block normalization #41718
mertunsall wants to merge 1 commit into vllm-project:main from
Conversation
Code Review
This pull request extracts thinking content parts from assistant messages and processes them as reasoning. It adds validation to prevent the simultaneous use of top-level reasoning fields and thinking content parts, and includes tests for DeepSeek V4 and chat-utility parsing. A review comment suggests joining multiple reasoning parts with a newline separator to improve readability.
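The extraction described above can be sketched as follows. This is a minimal illustration, not vLLM's actual implementation; the helper name `extract_thinking_parts` is hypothetical, and the newline join reflects the reviewer's suggestion:

```python
def extract_thinking_parts(content):
    # Collect "thinking" parts and join them with a newline separator,
    # as the review suggests for readability when multiple parts exist.
    thinking = [part["thinking"] for part in content if part.get("type") == "thinking"]
    rest = [part for part in content if part.get("type") != "thinking"]
    reasoning = "\n".join(thinking) if thinking else None
    return reasoning, rest
```

The remaining (non-thinking) parts are returned separately so the caller can decide whether to drop them from visible content.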
Force-pushed f7fc8db to e074f56
Co-authored-by: OpenAI Codex <codex@openai.com>
Signed-off-by: mertunsall <mert.unsal@mistral.ai>
Force-pushed e074f56 to e35531e
The issue I see here is that creating reasoning as an additional field based on thinking chunks will lead to duplicated reasoning for mistral-common and recent chat templates.
FYI, this PR adds support for reasoning with mistral-common inside vLLM:
#41658
The duplication arises because we don't expect users to send both formats at the same time. Therefore, I think it is necessary to raise an error, as you did, if both formats are sent inside a message, and it is probably something we should also add in mistral-common.
If thinking is used, we should probably either raise if it is not a Mistral model or find a way not to duplicate the entries. AFAIK we don't have models that interleave thinking chunks between text chunks in an assistant message, so one solution might be for
_extract_assistant_thinking_parts to remove the thinking chunks. However, that won't work for past chat templates, so it is not ideal.
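The raise-on-both-formats validation discussed here could look like the sketch below. This is an illustrative assumption, not the PR's actual code; the function name `validate_assistant_reasoning` is hypothetical:

```python
def validate_assistant_reasoning(message):
    # Reject messages that carry reasoning in both representations at once:
    # a top-level reasoning field and typed "thinking" content parts.
    has_top_level = (message.get("reasoning") is not None
                     or message.get("reasoning_content") is not None)
    content = message.get("content")
    has_thinking_parts = isinstance(content, list) and any(
        part.get("type") == "thinking" for part in content)
    if has_top_level and has_thinking_parts:
        raise ValueError(
            "Assistant message mixes a top-level reasoning field with "
            "'thinking' content parts; provide only one representation.")
```

Messages that use only one of the two representations pass through unchanged.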
if reasoning_from_content is not None:
    reasoning = reasoning_from_content
Recent Mistral chat templates, such as this one
https://huggingface.co/mistralai/Mistral-Medium-3.5-128B/blob/main/chat_template.jinja#L80, and mistral-common
would add the thinking part twice if you do this.
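The duplication concern can be demonstrated with a toy stand-in for such a template. This is a hypothetical sketch (the `render` function is not the real Jinja template): a template that emits both the normalized `reasoning` field and any remaining "thinking" content parts prints the same text twice when `reasoning` was copied from those parts without removing them:

```python
def render(message):
    # Toy stand-in for a chat template that emits both the normalized
    # `reasoning` field and any remaining "thinking" content parts.
    out = []
    if message.get("reasoning"):
        out.append(f"<think>{message['reasoning']}</think>")
    for part in message.get("content", []):
        if part.get("type") == "thinking":
            out.append(f"<think>{part['thinking']}</think>")
        elif part.get("type") == "text":
            out.append(part["text"])
    return "".join(out)

msg = {
    "reasoning": "plan",  # copied from the thinking chunk below
    "content": [{"type": "thinking", "thinking": "plan"},
                {"type": "text", "text": "answer"}],
}
# render(msg) -> "<think>plan</think><think>plan</think>answer" (duplicated)
```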
Summary
- Normalize assistant thinking content parts into the reasoning/reasoning_content fields instead of visible content.
- Reject messages that set both a top-level reasoning field and typed thinking content blocks, since those are duplicate reasoning representations.

This fixes a DeepSeek V4 history-rendering issue where prior assistant messages shaped like {"content": [{"type": "thinking", "thinking": "..."}]} rendered as <think></think>... instead of <think>...</think>.

Tests
- git diff --check
- CUDA_VISIBLE_DEVICES='' .venv/bin/python -m pytest tests/entrypoints/test_chat_utils.py::test_parse_chat_messages_include_thinking_chunk tests/entrypoints/test_chat_utils.py::test_parse_chat_messages_rejects_duplicate_assistant_reasoning tests/tokenizers_/test_deepseek_v4.py::test_deepseek_v4_renders_assistant_thinking_content_as_reasoning -q

The test_chat_utils.py targets require the existing mistral_model_config fixture and may need optional vision deps in local environments.

AI assistance
AI assistance was used to prepare this patch. The submitting human should review every changed line and CI result before merge.
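The DeepSeek V4 rendering fix described in the summary can be sketched as follows. This is a simplified illustration under assumed names (`normalize_assistant_message` and `render_deepseek_style` are hypothetical, not vLLM's code); the point is that moving thinking parts into a reasoning field puts the text inside the think tags rather than after an empty pair:

```python
def normalize_assistant_message(message):
    # Move "thinking" parts into reasoning_content so a template that
    # renders "<think>{reasoning}</think>{content}" produces the thinking
    # text inside the tags rather than after an empty <think></think>.
    content = message.get("content")
    if not isinstance(content, list):
        return message
    thinking = "\n".join(p["thinking"] for p in content if p.get("type") == "thinking")
    text = "".join(p.get("text", "") for p in content if p.get("type") == "text")
    normalized = {"role": "assistant", "content": text}
    if thinking:
        normalized["reasoning_content"] = thinking
    return normalized

def render_deepseek_style(message):
    # Toy rendering in the spirit of a <think>-tag chat template.
    return f"<think>{message.get('reasoning_content', '')}</think>{message['content']}"

msg = {"role": "assistant",
       "content": [{"type": "thinking", "thinking": "plan"}]}
# render_deepseek_style(normalize_assistant_message(msg)) -> "<think>plan</think>"
```

Without normalization, the thinking text ends up in plain content and renders after an empty tag pair, e.g. "<think></think>plan", which is the bug the PR targets.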