
feat: add reasoning/thinking support to Anthropic /v1/messages endpoint #35035

Open
timon0305 wants to merge 1 commit into vllm-project:main from timon0305:add-anthropic-reasoning-support

@timon0305

Summary

Adds support for extended thinking / reasoning output in the Anthropic Messages API (/v1/messages), resolving a feature gap where reasoning tokens from models like QwQ, DeepSeek-R1, and other thinking-capable models were not exposed through the Anthropic-compatible endpoint.

Changes:

  • protocol.py: Added AnthropicThinkingConfig model for the thinking request parameter (matching Anthropic's {"type": "enabled", "budget_tokens": N} format), added "thinking" content block type to AnthropicContentBlock, and added "thinking_delta" delta type to AnthropicDelta
  • serving.py:
    • Request conversion: Maps thinking.type == "enabled" to include_reasoning=True on the OpenAI request; handles incoming thinking content blocks from prior assistant turns by converting them to text for the model
    • Non-streaming: Extracts message.reasoning from the OpenAI response and prepends a thinking content block before the text content block
    • Streaming: Handles delta.reasoning by emitting proper content_block_start/content_block_delta/content_block_stop events with thinking type blocks, correctly transitioning between thinking → text → tool_use block types
  • test_anthropic_reasoning.py: Unit tests covering protocol validation, request conversion, non-streaming response conversion, streaming response conversion with reasoning→text transitions, and serialization round-trips
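The request-conversion step can be sketched in plain Python. This is a minimal illustration, not vLLM's serving.py: the function name convert_request, the dict-based message shapes, and joining prior-turn thinking and text with a newline are assumptions. The PR itself only states that thinking.type == "enabled" sets include_reasoning=True and that prior thinking blocks are converted to text.

```python
from typing import Any, Optional


def convert_request(
    messages: list[dict[str, Any]],
    thinking: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical sketch: map an Anthropic-style request to an OpenAI-style payload."""
    openai_messages: list[dict[str, Any]] = []
    for msg in messages:
        content = msg["content"]
        if isinstance(content, str):
            openai_messages.append({"role": msg["role"], "content": content})
            continue
        parts: list[str] = []
        for block in content:
            if block["type"] == "text":
                parts.append(block["text"])
            elif block["type"] == "thinking":
                # Assumption: prior-turn thinking is flattened to plain text.
                parts.append(block["thinking"])
            # Other block types (images, tool_use, tool_result) are ignored here.
        openai_messages.append({"role": msg["role"], "content": "\n".join(parts)})
    return {
        "messages": openai_messages,
        # Only an explicit {"type": "enabled", ...} config turns reasoning on.
        "include_reasoning": thinking is not None and thinking.get("type") == "enabled",
    }
```

With thinking omitted or set to a non-enabled type, include_reasoning stays False in this sketch, so requests without a thinking parameter convert as before.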

Protocol compatibility:
The implementation follows the Anthropic API spec for extended thinking:

  • Request: {"thinking": {"type": "enabled", "budget_tokens": 4096}}
  • Response content blocks: [{"type": "thinking", "thinking": "..."}, {"type": "text", "text": "..."}]
  • Streaming: content_block_delta events whose delta has type thinking_delta and carries the reasoning text in a thinking field
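The response shapes listed above can be illustrated with two small helpers. This is a hedged sketch, not the PR's code: the helper names, the dict-based message with reasoning/content keys (mirroring the message.reasoning field mentioned in the changes list), and the standard SSE framing (an event: line, a data: line, then a blank line) are assumptions; the exact output of the PR's wrap_data_with_event is not shown here.

```python
import json
from typing import Any, Optional


def to_content_blocks(message: dict[str, Any]) -> list[dict[str, Any]]:
    """Non-streaming: build Anthropic content blocks from an OpenAI-style message.

    Reasoning, when present, becomes a "thinking" block placed before the "text" block.
    """
    blocks: list[dict[str, Any]] = []
    reasoning: Optional[str] = message.get("reasoning")
    if reasoning:
        blocks.append({"type": "thinking", "thinking": reasoning})
    content: Optional[str] = message.get("content")
    if content:
        blocks.append({"type": "text", "text": content})
    return blocks


def thinking_delta_sse(index: int, fragment: str) -> str:
    """Streaming: wrap one reasoning fragment as an SSE content_block_delta event."""
    event = {
        "type": "content_block_delta",
        "index": index,
        "delta": {"type": "thinking_delta", "thinking": fragment},
    }
    return f"event: content_block_delta\ndata: {json.dumps(event)}\n\n"
```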

Closes #29915

Test plan

  • Unit tests for AnthropicThinkingConfig validation (enabled requires budget_tokens)
  • Unit tests for request conversion (include_reasoning flag propagation)
  • Unit tests for non-streaming response with/without reasoning
  • Unit tests for streaming response with reasoning→text block transitions
  • Unit tests for serialization round-trips of thinking content blocks
  • Ruff lint and format checks pass
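The first test-plan item can be illustrated with a stdlib-only sketch. The ThinkingConfig dataclass below is a hypothetical stand-in for the PR's AnthropicThinkingConfig model, whose exact fields and validator are not shown here; accepting "disabled" without a budget is an assumption based on Anthropic's public API, not on this PR.

```python
import unittest
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThinkingConfig:
    """Hypothetical stand-in for AnthropicThinkingConfig (field names assumed)."""

    type: str  # "enabled" or "disabled"
    budget_tokens: Optional[int] = None

    def __post_init__(self) -> None:
        if self.type not in ("enabled", "disabled"):
            raise ValueError(f"unknown thinking type: {self.type!r}")
        # Mirrors the test-plan rule: "enabled" requires budget_tokens.
        if self.type == "enabled" and self.budget_tokens is None:
            raise ValueError("budget_tokens is required when thinking is enabled")


class ThinkingConfigTest(unittest.TestCase):
    def test_enabled_requires_budget(self) -> None:
        with self.assertRaises(ValueError):
            ThinkingConfig(type="enabled")

    def test_enabled_with_budget(self) -> None:
        cfg = ThinkingConfig(type="enabled", budget_tokens=4096)
        self.assertEqual(cfg.budget_tokens, 4096)

    def test_disabled_without_budget(self) -> None:
        self.assertIsNone(ThinkingConfig(type="disabled").budget_tokens)

    def test_unknown_type_rejected(self) -> None:
        with self.assertRaises(ValueError):
            ThinkingConfig(type="auto", budget_tokens=1)
```

Saved as a test module, this runs under python -m unittest or pytest, which also collects unittest.TestCase classes.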

@dosubot

dosubot bot commented Feb 22, 2026

Related Documentation

Checked 0 published document(s) in 1 knowledge base(s). No updates required.

mergify bot added the frontend label on Feb 22, 2026
Contributor

gemini-code-assist bot left a comment

Code Review

The pull request successfully implements reasoning/thinking support for the Anthropic Messages API. The protocol models are correctly updated to include the thinking content type and configuration, and the serving logic handles both streaming and non-streaming responses. However, there are several issues in the streaming converter logic where continue and elif statements could lead to data loss if multiple types of deltas (reasoning, content, tool calls) are present in a single chunk from the engine. These should be addressed to ensure robustness.

Comment on lines +481 to 493
 if delta.content == "":
     continue
 chunk = AnthropicStreamEvent(
     index=content_block_index,
     type="content_block_delta",
     delta=AnthropicDelta(
         type="text_delta",
-        text=origin_chunk.choices[0].delta.content,
+        text=delta.content,
     ),
 )
 data = chunk.model_dump_json(exclude_unset=True)
 yield wrap_data_with_event(data, "content_block_delta")
 continue
Contributor

Priority: high

Similar to the reasoning block, using continue here prevents the processing of tool_calls if they are packed into the same chunk as a content delta. Removing the continue and adjusting the logic to use a non-skipping check for empty content ensures that all parts of the delta are handled.

Suggested change
-if delta.content == "":
-    continue
-chunk = AnthropicStreamEvent(
-    index=content_block_index,
-    type="content_block_delta",
-    delta=AnthropicDelta(
-        type="text_delta",
-        text=delta.content,
-    ),
-)
-data = chunk.model_dump_json(exclude_unset=True)
-yield wrap_data_with_event(data, "content_block_delta")
-continue
+if delta.content != "":
+    chunk = AnthropicStreamEvent(
+        index=content_block_index,
+        type="content_block_delta",
+        delta=AnthropicDelta(
+            type="text_delta",
+            text=delta.content,
+        ),
+    )
+    data = chunk.model_dump_json(exclude_unset=True)
+    yield wrap_data_with_event(data, "content_block_delta")

 # tool calls
-elif len(origin_chunk.choices[0].delta.tool_calls) > 0:
-    tool_call = origin_chunk.choices[0].delta.tool_calls[0]
+elif len(delta.tool_calls) > 0:
Contributor

Priority: high

Using elif here means that tool calls will be ignored if delta.content was also present in the same chunk (and the continue above was removed). Changing this to an if allows the converter to sequentially close the text block and open a tool block within the same iteration if the engine emits them together.

Suggested change
-elif len(delta.tool_calls) > 0:
+if len(delta.tool_calls) > 0:
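Taken together, the two suggestions replace first-match dispatch (continue/elif) with sequential checks, so a single engine chunk can close one block and open the next. The following is a minimal, self-contained sketch of that pattern, not vLLM's actual converter: deltas are plain dicts, tool calls use a simplified hypothetical flat shape ({"id", "name", "arguments"}) rather than OpenAI's nested function field, and message_start/message_delta/message_stop events are omitted.

```python
from typing import Any, Iterable, Iterator, Optional


def convert_stream(deltas: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Turn OpenAI-style deltas into Anthropic-style content block events.

    Every part of a delta (reasoning, content, tool calls) is checked in turn,
    so parts packed into one chunk are never skipped.
    """
    index = -1  # index of the most recently opened content block
    open_type: Optional[str] = None  # "thinking", "text", "tool_use", or None

    def open_block(block: dict[str, Any]) -> Iterator[dict[str, Any]]:
        # Close the current block (if any), then start a new one.
        nonlocal index, open_type
        if open_type is not None:
            yield {"type": "content_block_stop", "index": index}
        index += 1
        open_type = block["type"]
        yield {"type": "content_block_start", "index": index, "content_block": block}

    for delta in deltas:
        reasoning = delta.get("reasoning")
        if reasoning:
            if open_type != "thinking":
                yield from open_block({"type": "thinking", "thinking": ""})
            yield {"type": "content_block_delta", "index": index,
                   "delta": {"type": "thinking_delta", "thinking": reasoning}}
        content = delta.get("content")
        if content:  # plain `if`, not `elif`: content may share a chunk with reasoning
            if open_type != "text":
                yield from open_block({"type": "text", "text": ""})
            yield {"type": "content_block_delta", "index": index,
                   "delta": {"type": "text_delta", "text": content}}
        for call in delta.get("tool_calls") or []:
            # Simplification: each tool call gets its own tool_use block, and
            # argument fragments split across chunks are not merged.
            yield from open_block({"type": "tool_use", "id": call["id"],
                                   "name": call["name"], "input": {}})
            yield {"type": "content_block_delta", "index": index,
                   "delta": {"type": "input_json_delta",
                             "partial_json": call.get("arguments", "")}}
    if open_type is not None:
        yield {"type": "content_block_stop", "index": index}
```

When a thinking block is already open, a chunk such as {"reasoning": "more", "content": "Hi"} yields a thinking_delta, then content_block_stop for the thinking block, content_block_start for a new text block, and a text_delta, instead of dropping the text.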

Signed-off-by: timon0305 <timon0305@outlook.com>
timon0305 force-pushed the add-anthropic-reasoning-support branch from d53fbec to e680cbf on February 22, 2026 01:13
@mergify

mergify bot commented Feb 26, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @timon0305.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label on Feb 26, 2026
@chaunceyjiang
Collaborator

Thanks~ @timon0305 see #33671

Development

Successfully merging this pull request may close these issues.

[Feature]: include reasoning tokens in /v1/messages Anthropic endpoint if model supports it

2 participants