
fix(websearch_interception): preserve thinking blocks in agentic loop follow-up messages#21604

Merged
Sameerlite merged 1 commit into BerriAI:main from michelligabriele:fix/websearch-thinking-blocks
Feb 26, 2026

Conversation

@michelligabriele
Contributor

When extended thinking is enabled, the websearch interception agentic loop builds a follow-up assistant message with only tool_use blocks. Anthropic's API requires assistant messages to start with thinking/redacted_thinking blocks when thinking is enabled, causing a 400 Bad Request.

Extract thinking blocks from the model's initial response, thread them through the agentic loop, and prepend them to the follow-up assistant message — matching the pattern used by anthropic_messages_pt in factory.py.

Fixes the error: "Expected 'thinking' or 'redacted_thinking', but found 'tool_use'"
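The shapes involved can be sketched in a few lines. This is an illustrative helper, not the PR's actual code; the dict shapes follow Anthropic's Messages API, where an assistant turn must begin with thinking/redacted_thinking blocks whenever extended thinking is enabled:

```python
def build_followup_assistant_message(thinking_blocks, tool_use_blocks):
    """Hypothetical helper: prepend thinking blocks so the assistant turn
    does not start with a tool_use block (which Anthropic rejects with a
    400 when extended thinking is enabled)."""
    return {
        "role": "assistant",
        "content": list(thinking_blocks) + list(tool_use_blocks),
    }

thinking_blocks = [
    {"type": "thinking", "thinking": "I should search the web.", "signature": "sig-abc"}
]
tool_use_blocks = [
    {"type": "tool_use", "id": "toolu_1", "name": "web_search", "input": {"query": "litellm"}}
]

msg = build_followup_assistant_message(thinking_blocks, tool_use_blocks)
# The first content block is now "thinking" rather than "tool_use".
```

Without the prepend, `msg["content"][0]["type"]` would be `"tool_use"`, which is exactly the shape the 400 error message complains about.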

Relevant issues

Fixes #20187

Related PRs: #20488 (by @mpcusack-altos) and #20489 (by @Quentin-M) attempt the same fix with broader scope. This PR takes a minimal, focused approach — fixing only the core thinking block issue in the Anthropic Messages API pass-through path.

Pre-Submission checklist

  • I have added tests in the tests/litellm/ directory (adding at least 1 test is a hard requirement - see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🐛 Bug Fix

Changes

  • handler.py async_should_run_agentic_loop: Extract thinking/redacted_thinking blocks from the model response content and include them in the tools_dict passed to the agentic loop
  • handler.py async_run_agentic_loop / _execute_agentic_loop: Thread thinking_blocks through to transform_response
  • transformation.py transform_response / _transform_response_anthropic: Accept optional thinking_blocks parameter and prepend them before tool_use blocks in the follow-up assistant message (same pattern as anthropic_messages_pt in factory.py)
  • test_websearch_interception_thinking.py: 9 new unit tests covering thinking block extraction (dict + object responses), prepending, backward compatibility (no thinking / empty list), public API routing, and OpenAI path isolation
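The extraction step in the first bullet can be sketched as follows. This is a minimal illustration, not the PR's exact code: the function name is hypothetical, and it mirrors the dict-vs-object handling the Changes list describes (response content blocks may arrive as plain dicts or as attribute objects):

```python
from types import SimpleNamespace
from typing import Any, Dict, List

def extract_thinking_blocks(content: List[Any]) -> List[Dict]:
    """Hypothetical extractor: collect thinking/redacted_thinking blocks
    from response content, whether blocks are dicts or attribute objects."""
    thinking_blocks: List[Dict] = []
    for block in content:
        block_type = (
            block.get("type") if isinstance(block, dict) else getattr(block, "type", None)
        )
        if block_type not in ("thinking", "redacted_thinking"):
            continue  # skip tool_use, text, etc.
        if isinstance(block, dict):
            thinking_blocks.append(dict(block))
            continue
        # Object case: copy the fields relevant to each block type.
        d: Dict = {"type": block_type}
        if block_type == "thinking":
            d["thinking"] = getattr(block, "thinking", "")
            d["signature"] = getattr(block, "signature", "")
        else:  # redacted_thinking
            d["data"] = getattr(block, "data", "")
        thinking_blocks.append(d)
    return thinking_blocks

# Handles dict blocks and object blocks alike; non-thinking blocks are skipped.
content = [
    {"type": "thinking", "thinking": "plan", "signature": "s1"},
    SimpleNamespace(type="redacted_thinking", data="opaque"),
    {"type": "tool_use", "id": "toolu_1", "name": "web_search", "input": {}},
]
blocks = extract_thinking_blocks(content)
```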

@vercel

vercel bot commented Feb 19, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm · Deployment: Ready · Actions: Preview, Comment · Updated (UTC): Feb 19, 2026 8:54pm


@greptile-apps
Contributor

greptile-apps bot commented Feb 19, 2026

Greptile Summary

Fixes "Expected 'thinking' or 'redacted_thinking', but found 'tool_use'" error when websearch interception is used with extended thinking enabled.

Key changes:

  • Extracts thinking/redacted_thinking blocks from initial model response in async_should_run_agentic_loop
  • Threads thinking blocks through async_run_agentic_loop → _execute_agentic_loop → transform_response
  • Prepends thinking blocks before tool_use blocks in follow-up assistant message, matching Anthropic API requirements
  • Adds 9 comprehensive unit tests covering extraction (dict + object responses), prepending, backward compatibility, and format isolation

Issues found:

  • Missing cache_control field preservation when converting thinking block objects to dicts (handler.py:322-336)

Confidence Score: 4/5

  • Safe to merge with one logic fix needed for cache_control field preservation
  • Well-structured fix with comprehensive tests, but missing cache_control field in thinking block object-to-dict conversion could cause issues if prompt caching is used with extended thinking
  • handler.py lines 322-336 need to preserve cache_control field

Important Files Changed

  • litellm/integrations/websearch_interception/handler.py — Extracts thinking/redacted_thinking blocks from the model response and threads them through the agentic loop. One issue: missing cache_control field preservation in object-to-dict conversion.
  • litellm/integrations/websearch_interception/transformation.py — Prepends thinking blocks before tool_use blocks in the assistant message. Implementation correctly matches Anthropic API requirements and includes proper backward compatibility.
  • tests/test_litellm/integrations/websearch_interception/test_websearch_interception_thinking.py — Comprehensive test coverage with 9 unit tests covering extraction, prepending, backward compatibility, and OpenAI path isolation. All tests use mocks (no network calls).

Sequence Diagram

sequenceDiagram
    participant Model as Anthropic Model
    participant Handler as WebSearchInterceptionLogger
    participant Transform as WebSearchTransformation
    participant Search as litellm.asearch()
    
    Model->>Handler: response with thinking + tool_use blocks
    Note over Handler: Extract thinking/redacted_thinking blocks<br/>from response.content
    Handler->>Transform: transform_request(response)
    Transform-->>Handler: tool_calls
    Note over Handler: Store thinking_blocks in tools_dict
    
    Handler->>Search: Execute searches in parallel
    Search-->>Handler: search_results
    
    Handler->>Transform: transform_response(tool_calls, search_results, thinking_blocks)
    Note over Transform: Prepend thinking_blocks before tool_use blocks<br/>in assistant message
    Transform-->>Handler: assistant_message, user_message
    
    Handler->>Model: Follow-up request with:<br/>1. thinking blocks (prepended)<br/>2. tool_use blocks<br/>3. tool_result blocks
    Model-->>Handler: Final response

Last reviewed commit: 4630793
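The final follow-up request from the diagram can be sketched end to end. This is a hedged illustration under assumed names (the function is hypothetical, not the PR's code): the assistant turn leads with the thinking blocks, and each search result is returned as a tool_result block referencing its tool_use id in a user turn:

```python
def build_followup_messages(thinking_blocks, tool_use_blocks, search_results):
    """Hypothetical assembly of the follow-up request: thinking blocks are
    prepended to the assistant turn; each search result becomes a tool_result
    block tied to its tool_use id in the next user turn."""
    assistant_msg = {
        "role": "assistant",
        "content": list(thinking_blocks) + list(tool_use_blocks),
    }
    tool_results = [
        {"type": "tool_result", "tool_use_id": tu["id"], "content": result}
        for tu, result in zip(tool_use_blocks, search_results)
    ]
    return [assistant_msg, {"role": "user", "content": tool_results}]

messages = build_followup_messages(
    [{"type": "thinking", "thinking": "search first", "signature": "s"}],
    [{"type": "tool_use", "id": "toolu_1", "name": "web_search", "input": {"query": "q"}}],
    ["result text"],
)
```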

@greptile-apps greptile-apps bot left a comment

3 files reviewed, 1 comment

Comment on lines +322 to +336
# Convert object to dict using getattr, matching the
# pattern in _detect_from_non_streaming_response
thinking_block_dict: Dict = {"type": block_type}
if block_type == "thinking":
    thinking_block_dict["thinking"] = getattr(
        block, "thinking", ""
    )
    thinking_block_dict["signature"] = getattr(
        block, "signature", ""
    )
else:  # redacted_thinking
    thinking_block_dict["data"] = getattr(
        block, "data", ""
    )
thinking_blocks.append(thinking_block_dict)

Missing cache_control field when converting object to dict.

thinking blocks can include an optional cache_control field (see ChatCompletionThinkingBlock and ChatCompletionRedactedThinkingBlock in types/llms/openai.py), but this conversion only copies type, thinking, signature, and data fields

Suggested change
Original:

# Convert object to dict using getattr, matching the
# pattern in _detect_from_non_streaming_response
thinking_block_dict: Dict = {"type": block_type}
if block_type == "thinking":
    thinking_block_dict["thinking"] = getattr(
        block, "thinking", ""
    )
    thinking_block_dict["signature"] = getattr(
        block, "signature", ""
    )
else:  # redacted_thinking
    thinking_block_dict["data"] = getattr(
        block, "data", ""
    )
thinking_blocks.append(thinking_block_dict)

Suggested:

# Convert object to dict using getattr, matching the
# pattern in _detect_from_non_streaming_response
thinking_block_dict: Dict = {"type": block_type}
if block_type == "thinking":
    thinking_block_dict["thinking"] = getattr(
        block, "thinking", ""
    )
    thinking_block_dict["signature"] = getattr(
        block, "signature", ""
    )
else:  # redacted_thinking
    thinking_block_dict["data"] = getattr(
        block, "data", ""
    )
# Preserve cache_control if present
cache_control = getattr(block, "cache_control", None)
if cache_control is not None:
    thinking_block_dict["cache_control"] = cache_control
thinking_blocks.append(thinking_block_dict)

@Sameerlite Sameerlite merged commit 1790a6b into BerriAI:main Feb 26, 2026
8 of 21 checks passed


Development

Successfully merging this pull request may close these issues.

[Bug]: Claude Code shows 0 output tokens when using websearch_interception in bedrock and tool call not recorded in request logs
