feat: add reasoning/thinking support to Anthropic /v1/messages endpoint#35035
feat: add reasoning/thinking support to Anthropic /v1/messages endpoint#35035timon0305 wants to merge 1 commit intovllm-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
The pull request successfully implements reasoning/thinking support for the Anthropic Messages API. The protocol models are correctly updated to include the thinking content type and configuration, and the serving logic handles both streaming and non-streaming responses. However, there are several issues in the streaming converter logic where continue and elif statements could lead to data loss if multiple types of deltas (reasoning, content, tool calls) are present in a single chunk from the engine. These should be addressed to ensure robustness.
| if delta.content == "": | ||
| continue | ||
| chunk = AnthropicStreamEvent( | ||
| index=content_block_index, | ||
| type="content_block_delta", | ||
| delta=AnthropicDelta( | ||
| type="text_delta", | ||
| text=origin_chunk.choices[0].delta.content, | ||
| text=delta.content, | ||
| ), | ||
| ) | ||
| data = chunk.model_dump_json(exclude_unset=True) | ||
| yield wrap_data_with_event(data, "content_block_delta") | ||
| continue |
There was a problem hiding this comment.
Similar to the reasoning block, using continue here prevents the processing of tool_calls if they are packed into the same chunk as a content delta. Removing the continue and adjusting the logic to use a non-skipping check for empty content ensures that all parts of the delta are handled.
| if delta.content == "": | |
| continue | |
| chunk = AnthropicStreamEvent( | |
| index=content_block_index, | |
| type="content_block_delta", | |
| delta=AnthropicDelta( | |
| type="text_delta", | |
| text=origin_chunk.choices[0].delta.content, | |
| text=delta.content, | |
| ), | |
| ) | |
| data = chunk.model_dump_json(exclude_unset=True) | |
| yield wrap_data_with_event(data, "content_block_delta") | |
| continue | |
| if delta.content != "": | |
| chunk = AnthropicStreamEvent( | |
| index=content_block_index, | |
| type="content_block_delta", | |
| delta=AnthropicDelta( | |
| type="text_delta", | |
| text=delta.content, | |
| ), | |
| ) | |
| data = chunk.model_dump_json(exclude_unset=True) | |
| yield wrap_data_with_event(data, "content_block_delta") |
| # tool calls | ||
| elif len(origin_chunk.choices[0].delta.tool_calls) > 0: | ||
| tool_call = origin_chunk.choices[0].delta.tool_calls[0] | ||
| elif len(delta.tool_calls) > 0: |
There was a problem hiding this comment.
Using elif here means that tool calls will be ignored if delta.content was also present in the same chunk (and the continue above was removed). Changing this to an if allows the converter to sequentially close the text block and open a tool block within the same iteration if the engine emits them together.
| elif len(delta.tool_calls) > 0: | |
| if len(delta.tool_calls) > 0: |
Signed-off-by: timon0305 <timon0305@outlook.com>
d53fbec to
e680cbf
Compare
|
This pull request has merge conflicts that must be resolved before it can be |
|
Thanks~ @timon0305 see #33671 |
Summary
Adds support for extended thinking / reasoning output in the Anthropic Messages API (
/v1/messages), resolving a feature gap where reasoning tokens from models like QwQ, DeepSeek-R1, and other thinking-capable models were not exposed through the Anthropic-compatible endpoint.Changes:
protocol.py: AddedAnthropicThinkingConfigmodel for thethinkingrequest parameter (matching Anthropic's{"type": "enabled", "budget_tokens": N}format), added"thinking"content block type toAnthropicContentBlock, and added"thinking_delta"delta type toAnthropicDeltaserving.py:thinking.type == "enabled"toinclude_reasoning=Trueon the OpenAI request; handles incomingthinkingcontent blocks from prior assistant turns by converting them to text for the modelmessage.reasoningfrom the OpenAI response and prepends athinkingcontent block before thetextcontent blockdelta.reasoningby emitting propercontent_block_start/content_block_delta/content_block_stopevents withthinkingtype blocks, correctly transitioning between thinking → text → tool_use block typestest_anthropic_reasoning.py: Unit tests covering protocol validation, request conversion, non-streaming response conversion, streaming response conversion with reasoning→text transitions, and serialization round-tripsProtocol compatibility:
The implementation follows the Anthropic API spec for extended thinking:
{"thinking": {"type": "enabled", "budget_tokens": 4096}}[{"type": "thinking", "thinking": "..."}, {"type": "text", "text": "..."}]thinking_deltaevents withthinkingfieldCloses #29915
Test plan
AnthropicThinkingConfigvalidation (enabled requires budget_tokens)include_reasoningflag propagation)