
[Bugfix] Fix the issue with interleaved thinking when using streaming #30033

Merged
DarkLight1337 merged 4 commits into vllm-project:main from chaunceyjiang:interleaved_thinking_bug
Dec 4, 2025

Conversation

@chaunceyjiang
Collaborator

@chaunceyjiang chaunceyjiang commented Dec 4, 2025

Purpose

Fix the issue with interleaved thinking when using streaming.

<|begin▁of▁sentence|>

## Tools
You have access to a set of tools you can use to answer the user's question.

<think>...thinking about results</think>

The current date is: 2024-09-30<|User|> What is the weather like in 
Beijing today? And tomorrow? <|Assistant|><think>


When the prompt contains a sequence such as <think>...</think> ... <think>, is_reasoning_end incorrectly concludes that reasoning has already ended. In reality, the last token is <think>, so reasoning has only just begun. As a result, reasoning does not work properly.

This is especially a problem in interleaved thinking mode, where the chat_template always renders a prompt of the form <think>...</think> ... <think>.
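
The failure mode described above can be illustrated with a minimal sketch. This is not the code changed in this PR. The token ids and the function name is_reasoning_end_naive are hypothetical, chosen only for illustration, and the naive variant is one assumed way of producing the bug. The second function decides based on whichever marker occurs last, which is the behavior the description asks for.

```python
from collections.abc import Sequence

# Hypothetical token ids for <think> and </think>; real ids depend on the tokenizer.
THINK_START_ID = 1
THINK_END_ID = 2


def is_reasoning_end_naive(input_ids: Sequence[int]) -> bool:
    """Assumed buggy variant: any </think> anywhere counts as 'reasoning ended'."""
    return THINK_END_ID in input_ids


def is_reasoning_end(input_ids: Sequence[int]) -> bool:
    """Scan from the end; the marker that occurs last decides the state."""
    for token_id in reversed(input_ids):
        if token_id == THINK_START_ID:
            # A <think> after the last </think>: reasoning has just begun.
            return False
        if token_id == THINK_END_ID:
            # A </think> with no later <think>: reasoning has ended.
            return True
    # No marker at all: reasoning has not ended.
    return False


# Prompt shaped like <think>...</think> ... <think> (other ids stand in for text).
prompt_ids = [9, THINK_START_ID, 7, THINK_END_ID, 8, THINK_START_ID]
print(is_reasoning_end_naive(prompt_ids))  # True (the wrong answer)
print(is_reasoning_end(prompt_ids))        # False (reasoning has just begun)
```

Scanning backwards stops at the first marker it meets, so the cost depends on how far the last marker is from the end rather than on the full prompt length.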

Test Plan

Test Result



Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug in is_reasoning_end that caused incorrect behavior with interleaved thinking blocks during streaming. The change correctly determines if reasoning has ended by checking for the last occurrence of a start or end token. The logic is sound and effectively resolves the issue. However, the pull request lacks unit tests to cover the fixed scenario, which is a significant omission for preventing future regressions. I have added a comment to highlight the need for adding a dedicated test case.

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) December 4, 2025 08:38
@chaunceyjiang chaunceyjiang added the ready label (ONLY add when PR is ready to merge/full CI is needed) Dec 4, 2025
@DarkLight1337 DarkLight1337 merged commit 6796ce8 into vllm-project:main Dec 4, 2025
46 of 47 checks passed
@chaunceyjiang chaunceyjiang deleted the interleaved_thinking_bug branch December 4, 2025 12:35
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…vllm-project#30033)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>