fix(websearch_interception): preserve thinking blocks in agentic loop follow-up messages#21604
Merged
Sameerlite merged 1 commit into BerriAI:main on Feb 26, 2026
Conversation
… follow-up messages

When extended thinking is enabled, the websearch interception agentic loop builds a follow-up assistant message with only tool_use blocks. Anthropic's API requires assistant messages to start with thinking/redacted_thinking blocks when thinking is enabled, causing a 400 Bad Request.

Extract thinking blocks from the model's initial response, thread them through the agentic loop, and prepend them to the follow-up assistant message, matching the pattern used by anthropic_messages_pt in factory.py.

Fixes the error: "Expected 'thinking' or 'redacted_thinking', but found 'tool_use'"
Greptile Summary

Fixes "Expected 'thinking' or 'redacted_thinking', but found 'tool_use'" error when websearch interception is used with extended thinking enabled.

Key changes:
Issues found:
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/integrations/websearch_interception/handler.py | Extracts thinking/redacted_thinking blocks from model response and threads them through the agentic loop. One issue: missing cache_control field preservation in object-to-dict conversion. |
| litellm/integrations/websearch_interception/transformation.py | Prepends thinking blocks before tool_use blocks in assistant message. Implementation correctly matches Anthropic API requirements and includes proper backward compatibility. |
| tests/test_litellm/integrations/websearch_interception/test_websearch_interception_thinking.py | Comprehensive test coverage with 9 unit tests covering extraction, prepending, backward compatibility, and OpenAI path isolation. All tests use mocks (no network calls). |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Model as Anthropic Model
    participant Handler as WebSearchInterceptionLogger
    participant Transform as WebSearchTransformation
    participant Search as litellm.asearch()
    Model->>Handler: response with thinking + tool_use blocks
    Note over Handler: Extract thinking/redacted_thinking blocks<br/>from response.content
    Handler->>Transform: transform_request(response)
    Transform-->>Handler: tool_calls
    Note over Handler: Store thinking_blocks in tools_dict
    Handler->>Search: Execute searches in parallel
    Search-->>Handler: search_results
    Handler->>Transform: transform_response(tool_calls, search_results, thinking_blocks)
    Note over Transform: Prepend thinking_blocks before tool_use blocks<br/>in assistant message
    Transform-->>Handler: assistant_message, user_message
    Handler->>Model: Follow-up request with:<br/>1. thinking blocks (prepended)<br/>2. tool_use blocks<br/>3. tool_result blocks
    Model-->>Handler: Final response
```
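The extraction and prepending steps in the diagram can be sketched as follows. This is a minimal standalone sketch: the helper names and the simplified dict-based content blocks are illustrative, not the actual handler/transformation APIs.

```python
from typing import Any, Dict, List

def extract_thinking_blocks(content: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collect thinking/redacted_thinking blocks from the model response content."""
    return [b for b in content if b.get("type") in ("thinking", "redacted_thinking")]

def build_followup_assistant_message(
    tool_use_blocks: List[Dict[str, Any]],
    thinking_blocks: List[Dict[str, Any]],
) -> Dict[str, Any]:
    # Anthropic requires assistant messages to START with thinking blocks
    # when extended thinking is enabled, so prepend them before tool_use.
    return {"role": "assistant", "content": thinking_blocks + tool_use_blocks}

response_content = [
    {"type": "thinking", "thinking": "Need to search...", "signature": "sig"},
    {"type": "tool_use", "id": "t1", "name": "web_search", "input": {"query": "x"}},
]
thinking = extract_thinking_blocks(response_content)
tool_use = [b for b in response_content if b.get("type") == "tool_use"]
msg = build_followup_assistant_message(tool_use, thinking)
assert msg["content"][0]["type"] == "thinking"
```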
Last reviewed commit: 4630793
Comment on lines +322 to +336
```python
# Convert object to dict using getattr, matching the
# pattern in _detect_from_non_streaming_response
thinking_block_dict: Dict = {"type": block_type}
if block_type == "thinking":
    thinking_block_dict["thinking"] = getattr(
        block, "thinking", ""
    )
    thinking_block_dict["signature"] = getattr(
        block, "signature", ""
    )
else:  # redacted_thinking
    thinking_block_dict["data"] = getattr(
        block, "data", ""
    )
thinking_blocks.append(thinking_block_dict)
```
Missing `cache_control` field when converting object to dict.

Thinking blocks can include an optional `cache_control` field (see `ChatCompletionThinkingBlock` and `ChatCompletionRedactedThinkingBlock` in `types/llms/openai.py`), but this conversion only copies the `type`, `thinking`, `signature`, and `data` fields.
Suggested change

```diff
 # Convert object to dict using getattr, matching the
 # pattern in _detect_from_non_streaming_response
 thinking_block_dict: Dict = {"type": block_type}
 if block_type == "thinking":
     thinking_block_dict["thinking"] = getattr(
         block, "thinking", ""
     )
     thinking_block_dict["signature"] = getattr(
         block, "signature", ""
     )
 else:  # redacted_thinking
     thinking_block_dict["data"] = getattr(
         block, "data", ""
     )
+# Preserve cache_control if present
+cache_control = getattr(block, "cache_control", None)
+if cache_control is not None:
+    thinking_block_dict["cache_control"] = cache_control
 thinking_blocks.append(thinking_block_dict)
```
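As a quick sanity check of the suggested conversion, here is a small standalone sketch; the `to_thinking_dict` helper and the `SimpleNamespace` stand-in for the SDK response object are illustrative, not LiteLLM APIs.

```python
from types import SimpleNamespace
from typing import Any, Dict

def to_thinking_dict(block: Any) -> Dict[str, Any]:
    """Convert a thinking/redacted_thinking block object to a plain dict."""
    block_type = getattr(block, "type", "thinking")
    d: Dict[str, Any] = {"type": block_type}
    if block_type == "thinking":
        d["thinking"] = getattr(block, "thinking", "")
        d["signature"] = getattr(block, "signature", "")
    else:  # redacted_thinking
        d["data"] = getattr(block, "data", "")
    # Preserve cache_control if present (the point of the suggestion above)
    cache_control = getattr(block, "cache_control", None)
    if cache_control is not None:
        d["cache_control"] = cache_control
    return d

block = SimpleNamespace(
    type="thinking",
    thinking="why",
    signature="s",
    cache_control={"type": "ephemeral"},
)
assert to_thinking_dict(block)["cache_control"] == {"type": "ephemeral"}
```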
When extended thinking is enabled, the websearch interception agentic loop builds a follow-up assistant message with only tool_use blocks. Anthropic's API requires assistant messages to start with thinking/redacted_thinking blocks when thinking is enabled, causing a 400 Bad Request.
Extract thinking blocks from the model's initial response, thread them through the agentic loop, and prepend them to the follow-up assistant message — matching the pattern used by anthropic_messages_pt in factory.py.
Fixes the error: "Expected 'thinking' or 'redacted_thinking', but found 'tool_use'"
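A minimal illustration of the constraint behind that error; the payloads are simplified, with placeholder ids and signatures.

```python
# With extended thinking enabled, Anthropic rejects an assistant turn
# whose content starts with tool_use (this shape caused the 400):
invalid_assistant = {
    "role": "assistant",
    "content": [
        {"type": "tool_use", "id": "t1", "name": "web_search", "input": {"query": "q"}},
    ],
}
# The fix prepends the thinking block(s) so the turn starts with them:
valid_assistant = {
    "role": "assistant",
    "content": [
        {"type": "thinking", "thinking": "I should search.", "signature": "sig"},
        {"type": "tool_use", "id": "t1", "name": "web_search", "input": {"query": "q"}},
    ],
}

def starts_with_thinking(msg):
    return msg["content"][0]["type"] in ("thinking", "redacted_thinking")

assert not starts_with_thinking(invalid_assistant)
assert starts_with_thinking(valid_assistant)
```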
Relevant issues
Fixes #20187
Related PRs: #20488 (by @mpcusack-altos) and #20489 (by @Quentin-M) attempt the same fix with broader scope. This PR takes a minimal, focused approach — fixing only the core thinking block issue in the Anthropic Messages API pass-through path.
Pre-Submission checklist
- Added at least 1 test in the `tests/litellm/` directory (adding at least 1 test is a hard requirement - see details)
- `make test-unit` passes
- Reviewed by `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🐛 Bug Fix
Changes
- `handler.py` (`async_should_run_agentic_loop`): Extract `thinking`/`redacted_thinking` blocks from the model response content and include them in the `tools_dict` passed to the agentic loop
- `handler.py` (`async_run_agentic_loop` / `_execute_agentic_loop`): Thread `thinking_blocks` through to `transform_response`
- `transformation.py` (`transform_response` / `_transform_response_anthropic`): Accept an optional `thinking_blocks` parameter and prepend the blocks before `tool_use` blocks in the follow-up assistant message (same pattern as `anthropic_messages_pt` in `factory.py`)
- `test_websearch_interception_thinking.py`: 9 new unit tests covering thinking block extraction (dict and object responses), prepending, backward compatibility (no thinking / empty list), public API routing, and OpenAI path isolation
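The backward-compatibility behavior listed above (no thinking / empty list leaves the message unchanged) can be sketched as follows; the `build_assistant_content` helper is a hypothetical stand-in for the transformation logic, not the actual `transformation.py` API.

```python
from typing import Any, Dict, List, Optional

def build_assistant_content(
    tool_use_blocks: List[Dict[str, Any]],
    thinking_blocks: Optional[List[Dict[str, Any]]] = None,
) -> List[Dict[str, Any]]:
    # None or [] preserves the pre-fix behavior: tool_use blocks only
    if not thinking_blocks:
        return list(tool_use_blocks)
    # Otherwise prepend thinking blocks, per Anthropic's ordering requirement
    return thinking_blocks + tool_use_blocks

tools = [{"type": "tool_use", "id": "t1", "name": "web_search", "input": {}}]
assert build_assistant_content(tools) == tools        # no thinking blocks
assert build_assistant_content(tools, []) == tools    # empty list
out = build_assistant_content(
    tools, [{"type": "thinking", "thinking": "x", "signature": "s"}]
)
assert out[0]["type"] == "thinking" and out[1]["type"] == "tool_use"
```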