[Refactor][Parser] Migrate chat completion auto-tool/reasoning/plain streaming to parse_delta#39446

Merged
chaunceyjiang merged 2 commits into vllm-project:main from sfeng33:parser-1 on Apr 14, 2026
sfeng33 (Collaborator) commented Apr 9, 2026

Purpose

Migrate the branches in chat_completion_stream_generator (auto tool + reasoning, auto tool only, reasoning only, plain content) to unified Parser.parse_delta() calls.
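
The consolidation can be pictured with a toy sketch. This is not vLLM's code: `ToyParser`, `Delta`, the tag strings, and the `parse_delta(text)` signature are hypothetical stand-ins, and the real `Parser.parse_delta()` may take different arguments and return a different type. The sketch only shows how one call per chunk can cover all four configurations when the parser itself knows which sub-parsers are enabled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Delta:
    """Hypothetical per-chunk result; not vLLM's delta message type."""
    reasoning: Optional[str] = None
    content: Optional[str] = None
    tool_call: Optional[str] = None


class ToyParser:
    """Hypothetical unified parser; tags and behaviour are illustrative only."""

    def __init__(self, reasoning: bool = False, tools: bool = False) -> None:
        self.tools = tools
        # Assumption: with a reasoning parser, output starts inside the
        # reasoning block and leaves it at the first "</think>".
        self.in_reasoning = reasoning
        self.in_tool = False

    def parse_delta(self, text: str) -> Delta:
        if self.in_reasoning:
            head, sep, tail = text.partition("</think>")
            if not sep:
                return Delta(reasoning=text)
            self.in_reasoning = False
            # Keep both halves of a chunk that straddles the boundary.
            rest = self._after_reasoning(tail)
            rest.reasoning = head or None
            return rest
        return self._after_reasoning(text)

    def _after_reasoning(self, text: str) -> Delta:
        if self.in_tool:
            # Toy simplification: never leaves the tool call once entered.
            return Delta(tool_call=text or None)
        if self.tools and "<tool_call>" in text:
            head, _, tail = text.partition("<tool_call>")
            self.in_tool = True
            return Delta(content=head or None, tool_call=tail or None)
        return Delta(content=text or None)


def stream(parser: ToyParser, chunks: list[str]) -> list[Delta]:
    # The generator does not branch on configuration: one call per chunk.
    return [parser.parse_delta(chunk) for chunk in chunks]
```

Constructing `ToyParser()` with neither flag set passes every chunk through as plain content, which corresponds to the "no parsers" case.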

Test Plan

Verified against the chat completion API that there is no behaviour change in the following configurations:

| # | Category | Server Command |
|---|----------|----------------|
| 1 | Both reasoning + tool parser | `vllm serve Qwen/Qwen3-4B --reasoning-parser qwen3 --tool-call-parser hermes --enable-auto-tool-choice` |
| 2 | Reasoning parser only | `vllm serve Qwen/Qwen3-4B --reasoning-parser qwen3` |
| 3 | Tool parser only | `vllm serve Qwen/Qwen3-4B --tool-call-parser hermes --enable-auto-tool-choice` |
| 4 | No parsers | `vllm serve Qwen/Qwen3-4B` |
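
One way to check for "no behaviour change" across these configurations is to collapse each streamed response into a single record and compare the records captured before and after the refactor. The sketch below is not part of the PR. It assumes OpenAI-style chunk dictionaries with a `choices[0].delta` object; the `reasoning` field name and the helper `collapse_stream` are assumptions for illustration and may need adjusting to the server's actual schema.

```python
from typing import Any, Iterable


def collapse_stream(chunks: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Merge streamed deltas into one comparable record (hypothetical helper)."""
    record: dict[str, Any] = {"reasoning": "", "content": "", "tool_calls": {}}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            # "reasoning" is an assumed field name, not confirmed by the PR.
            record["reasoning"] += delta.get("reasoning") or ""
            record["content"] += delta.get("content") or ""
            for call in delta.get("tool_calls") or []:
                # Tool-call fragments are keyed by their index in the stream.
                slot = record["tool_calls"].setdefault(
                    call["index"], {"name": "", "arguments": ""}
                )
                fn = call.get("function", {})
                slot["name"] += fn.get("name") or ""
                slot["arguments"] += fn.get("arguments") or ""
    return record
```

Because the record is independent of how the text was split into chunks, two runs that differ only in chunk boundaries compare equal, while a dropped reasoning or tool-call fragment shows up as a mismatch.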

mergify bot added the frontend label Apr 9, 2026
gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request refactors the chat completion serving logic to use a unified Parser interface, simplifying the handling of streaming deltas for reasoning and tool calls. The changes include initializing a parser class via ParserManager and replacing manual delta extraction logic with parser.parse_delta. Feedback indicates a potential issue where reasoning data might be lost if a single streaming chunk contains both the end of a reasoning block and the start of a tool call, as the reasoning delta could be overwritten.
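
The concern raised in this review can be illustrated with a toy example. The code below is not taken from `serving.py`; `Delta`, `naive_combine`, and `merged_combine` are hypothetical names. It shows how a chunk that yields both a reasoning delta and a tool-call delta loses the reasoning text if the second result simply replaces the first, and how merging the fields avoids that.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Delta:
    """Hypothetical delta message with independent fields."""
    reasoning: Optional[str] = None
    content: Optional[str] = None
    tool_call: Optional[str] = None


def naive_combine(
    reasoning_delta: Optional[Delta], tool_delta: Optional[Delta]
) -> Optional[Delta]:
    # The later assignment wins, so the reasoning delta is dropped
    # whenever the same chunk also produced a tool-call delta.
    message = reasoning_delta
    if tool_delta is not None:
        message = tool_delta
    return message


def merged_combine(
    reasoning_delta: Optional[Delta], tool_delta: Optional[Delta]
) -> Optional[Delta]:
    # Keep whichever delta exists; if both exist, merge field by field.
    if reasoning_delta is None:
        return tool_delta
    if tool_delta is None:
        return reasoning_delta
    content = (
        tool_delta.content
        if tool_delta.content is not None
        else reasoning_delta.content
    )
    return Delta(
        reasoning=reasoning_delta.reasoning,
        content=content,
        tool_call=tool_delta.tool_call,
    )
```

With a reasoning delta and a tool-call delta from the same chunk, `naive_combine` returns only the tool-call delta, whereas `merged_combine` returns a single delta carrying both fields.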

Comment thread vllm/entrypoints/openai/chat_completion/serving.py
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: sfeng33 <4florafeng@gmail.com>
sfeng33 marked this pull request as ready for review April 13, 2026 17:46
sfeng33 (Collaborator, Author) commented Apr 14, 2026

PTAL @chaunceyjiang @aarnphm

chaunceyjiang (Collaborator) left a comment

LGTM.

chaunceyjiang added the ready label (ONLY add when PR is ready to merge/full CI is needed) Apr 14, 2026
chaunceyjiang changed the title from "[Parser] Migrate chat completion auto-tool/reasoning/plain streaming to parse_delta" to "[Refactor][Parser] Migrate chat completion auto-tool/reasoning/plain streaming to parse_delta" Apr 14, 2026
chaunceyjiang enabled auto-merge (squash) April 14, 2026 03:14
chaunceyjiang merged commit db8a6d6 into vllm-project:main Apr 14, 2026
53 checks passed
sfeng33 deleted the parser-1 branch April 14, 2026 04:40
whk-lab pushed a commit to whk-lab/vllm that referenced this pull request Apr 23, 2026
…streaming to parse_delta (vllm-project#39446)

Signed-off-by: sfeng33 <4florafeng@gmail.com>
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026
…streaming to parse_delta (vllm-project#39446)

Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>