[Bugfix] Fix the issue with interleaved thinking when using streaming by chaunceyjiang · Pull Request #30033 · vllm-project/vllm

chaunceyjiang · 2025-12-04T07:53:27Z

Purpose

Fix the issue with interleaved thinking when using streaming.

<｜begin▁of▁sentence｜>

## Tools
You have access to a set of tools you can use to answer the user's question.

<think>...thinking about results</think>

The current date is: 2024-09-30<｜User｜> What is the weather like in 
Beijing today? And tomorrow? <｜Assistant｜><think>

When the rendered prompt has the form <think>...</think> ... <think>, is_reasoning_end incorrectly reports that reasoning has already ended. In reality, the last token is <think>, so reasoning has only just begun. As a result, reasoning does not work properly.

This matters most in interleaved thinking mode, where the chat_template always renders a prompt of the form <think>...</think> ... <think>.
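
The failure mode can be reproduced in isolation. The snippet below is a minimal sketch, assuming the pre-fix check behaved like a simple membership test (an end token anywhere in the sequence means "ended"). The token IDs and the function name are hypothetical and are not taken from vLLM.

```python
# Hypothetical token IDs for illustration only; real IDs depend on the tokenizer.
THINK_START = 1  # stands in for <think>
THINK_END = 2    # stands in for </think>


def naive_is_reasoning_end(input_ids: list[int]) -> bool:
    """Assumed pre-fix behavior: 'ended' if an end token appears anywhere."""
    return THINK_END in input_ids


# Prompt shaped like <think>...</think> ... <think>
prompt_ids = [THINK_START, 10, 11, THINK_END, 20, 21, THINK_START]

# The prompt ends with an opening <think>, so reasoning has just begun,
# yet the membership test still reports that reasoning has ended.
print(naive_is_reasoning_end(prompt_ids))  # True
```

The earlier </think> from the prompt is enough to flip the result, even though a new reasoning block has been opened after it.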

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

gemini-code-assist

Code Review

This pull request addresses a bug in is_reasoning_end that caused incorrect behavior with interleaved thinking blocks during streaming. The change correctly determines if reasoning has ended by checking for the last occurrence of a start or end token. The logic is sound and effectively resolves the issue. However, the pull request lacks unit tests to cover the fixed scenario, which is a significant omission for preventing future regressions. I have added a comment to highlight the need for adding a dedicated test case.
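
The approach summarized in the review, deciding by whichever of the start or end token occurs last, could be sketched as follows. This is an illustrative standalone function under assumed token IDs, not the code in vllm/reasoning/basic_parsers.py; the name is_reasoning_end_last_marker and both IDs are hypothetical. The assertions show the kind of cases a regression test might cover.

```python
# Hypothetical token IDs for illustration only; real IDs depend on the tokenizer.
THINK_START = 1  # stands in for <think>
THINK_END = 2    # stands in for </think>


def is_reasoning_end_last_marker(input_ids: list[int]) -> bool:
    """Return True only if the most recent marker token is the end token."""
    # Walk backwards and stop at the first start or end marker found.
    for token_id in reversed(input_ids):
        if token_id == THINK_START:
            return False  # a reasoning block is currently open
        if token_id == THINK_END:
            return True   # the latest reasoning block has been closed
    return False          # no marker at all: reasoning has not ended


# <think>...</think> ... <think>: reasoning has just restarted.
assert not is_reasoning_end_last_marker([THINK_START, 10, THINK_END, 20, THINK_START])
# <think>...</think> ...: the last marker is an end token.
assert is_reasoning_end_last_marker([THINK_START, 10, THINK_END, 20])
# No markers present.
assert not is_reasoning_end_last_marker([10, 20, 30])
```

Scanning from the end returns as soon as a marker is found, so for typical streaming inputs only the tail of the sequence is inspected.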

vllm/reasoning/basic_parsers.py

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

vllm/reasoning/basic_parsers.py

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: Chauncey <chaunceyjiang@gmail.com>

tests/reasoning/test_base_thinking_reasoning_parser.py

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

…vllm-project#30033) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

[Bugfix] Fix the issue with interleaved thinking when using streaming

00e9cc9

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

chaunceyjiang requested a review from aarnphm as a code owner December 4, 2025 07:53

gemini-code-assist bot reviewed Dec 4, 2025

vllm/reasoning/basic_parsers.py (review thread resolved)

[Bugfix] Fix the issue with interleaved thinking when using streaming

82f324c

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

DarkLight1337 reviewed Dec 4, 2025

vllm/reasoning/basic_parsers.py (review thread resolved)

Update vllm/reasoning/basic_parsers.py

a980672

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: Chauncey <chaunceyjiang@gmail.com>

DarkLight1337 approved these changes Dec 4, 2025

DarkLight1337 reviewed Dec 4, 2025

tests/reasoning/test_base_thinking_reasoning_parser.py (outdated, review thread resolved)

[Bugfix] Fix the issue with interleaved thinking when using streaming

a07a382

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

DarkLight1337 enabled auto-merge (squash) December 4, 2025 08:38

chaunceyjiang added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Dec 4, 2025

DarkLight1337 merged commit 6796ce8 into vllm-project:main Dec 4, 2025
46 of 47 checks passed

chaunceyjiang deleted the interleaved_thinking_bug branch December 4, 2025 12:35

hdlj-h mentioned this pull request Dec 8, 2025

[Structured Output][Reasoning] Improves decoding throughput for models using single-token reasoning endings. #30056

Merged

5 tasks

DarkLight1337 merged 4 commits into vllm-project:main from chaunceyjiang:interleaved_thinking_bug

2 participants
