Fix websearch interception with extended thinking mode support #20488
mpcusack-altos wants to merge 1 commit into BerriAI:main
Conversation
Greptile Summary

This PR fixes websearch interception when Anthropic's extended thinking mode is enabled by capturing and preserving thinking blocks throughout the agentic loop.

The implementation correctly addresses the validation error described in the PR description. Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/integrations/websearch_interception/transformation.py | Introduces TransformRequestResult NamedTuple and adds thinking block capture/normalization logic to handle extended thinking mode requirements |
| litellm/integrations/websearch_interception/handler.py | Threads thinking_blocks through agentic loop chain and changes hook return to spread all kwargs; contains duplicate tool conversion logic across two hooks |
| litellm/litellm_core_utils/core_helpers.py | Adds INTERNAL_PARAMS_PREFIXES constant and refactors filter_internal_params() to use prefix-based filtering with clean implementation |
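The prefix-based filtering described for `core_helpers.py` can be sketched as follows. The prefix values and dict shapes here are illustrative assumptions, not the actual `INTERNAL_PARAMS_PREFIXES` constant from the PR:

```python
# Sketch of prefix-based internal-param filtering. The prefixes below are
# assumptions for illustration, not LiteLLM's real constant.
INTERNAL_PARAMS_PREFIXES = ("litellm_", "proxy_")


def filter_internal_params(kwargs: dict) -> dict:
    """Drop any kwarg whose name starts with a known internal prefix."""
    # str.startswith accepts a tuple, so one check covers all prefixes
    return {
        k: v
        for k, v in kwargs.items()
        if not k.startswith(INTERNAL_PARAMS_PREFIXES)
    }
```

The advantage of the prefix approach over an explicit allow/deny list is that newly added internal params are filtered automatically as long as they follow the naming convention.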
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client as Claude Code
    participant Router as LiteLLM Router
    participant Hook as WebSearchInterception
    participant Transform as WebSearchTransformation
    participant Provider as AWS Bedrock
    participant Search as litellm.asearch()
    Client->>Router: completion(model, messages, tools=[web_search])
    Router->>Hook: async_pre_call_deployment_hook(kwargs)
    Hook->>Hook: Convert native web_search to litellm_web_search tool
    Hook-->>Router: Return modified kwargs with converted tools
    Router->>Provider: API request with litellm_web_search tool
    Provider-->>Router: Response with thinking blocks + tool_use blocks
    Router->>Hook: async_should_run_agentic_loop(response)
    Hook->>Transform: transform_request(response)
    Transform->>Transform: Extract tool_calls and thinking_blocks from response
    Transform-->>Hook: TransformRequestResult(has_websearch=True, tool_calls, thinking_blocks)
    Hook-->>Router: Return (True, tools_dict)
    Router->>Hook: async_run_agentic_loop(tools, messages, kwargs)
    Hook->>Search: Execute searches in parallel via litellm.asearch()
    Search-->>Hook: Return search results
    Hook->>Transform: transform_response(tool_calls, results, thinking_blocks)
    Note over Transform: Builds assistant message with thinking blocks first,<br/>then tool_use blocks (Anthropic requirement)
    Transform-->>Hook: Return (assistant_message, user_message)
    Hook->>Provider: Follow-up request with messages + search results
    Provider-->>Hook: Final response with answer
    Hook-->>Router: Return final response
    Router-->>Client: Final response
```
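The "execute searches in parallel" step in the diagram can be sketched with `asyncio.gather`. `run_search` here is a stand-in for `litellm.asearch()`, whose real signature is not shown in this thread:

```python
import asyncio

# Sketch of the parallel-search step. run_search() is a hypothetical
# stand-in for litellm.asearch(); only the fan-out pattern is the point.
async def run_search(query: str) -> str:
    await asyncio.sleep(0)  # placeholder for the real network call
    return f"results for {query}"


async def run_searches(queries: list[str]) -> list[str]:
    # gather() preserves input order, so results[i] pairs with queries[i]
    return await asyncio.gather(*(run_search(q) for q in queries))


results = asyncio.run(run_searches(["a", "b"]))
```

Order preservation matters here: the handler later pairs each tool call with its result by position, so the fan-out must not shuffle them.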
Additional Comments (1)
Prompt To Fix With AI
This is a comment left during a code review.
Path: litellm/integrations/websearch_interception/transformation.py
Line: 186:197
Comment:
**Mismatched tool/result lengths**
`transform_response()` indexes `search_results[i]` for `i in range(len(tool_calls))`. If any tool call didn’t produce a result (e.g., earlier code adds an empty placeholder for missing `query`, or future callers pass fewer results), this will raise `IndexError` and break the agentic loop. Consider iterating over `zip(tool_calls, search_results)` (and/or asserting lengths match) to avoid runtime crashes.
How can I resolve this? If you propose a fix, please make it concise.
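A minimal sketch of the suggested `zip`-based fix. The function and field names are illustrative, not the actual `transformation.py` code:

```python
# Pair tool calls with results via zip() instead of indexing, and surface a
# length mismatch explicitly instead of crashing mid-loop with IndexError.
def build_tool_results(tool_calls: list[dict], search_results: list[str]) -> list[dict]:
    if len(tool_calls) != len(search_results):
        raise ValueError(
            f"{len(tool_calls)} tool calls but {len(search_results)} search results"
        )
    return [
        {"type": "tool_result", "tool_use_id": call["id"], "content": result}
        for call, result in zip(tool_calls, search_results)
    ]
```

On Python 3.10+ the explicit length check can be replaced with `zip(tool_calls, search_results, strict=True)`, which raises `ValueError` on mismatch automatically.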
Hey @mpcusack-altos, solid PR — the thinking block fix is well-architected.

1. Normalize thinking blocks to dicts when capturing

The blocks captured from response content could be Anthropic SDK response objects rather than plain dicts. In that case `assistant_content.extend(thinking_blocks)` could fail when the blocks are later serialized into the follow-up request. Consider normalizing on capture:

```python
if block_type in ("thinking", "redacted_thinking"):
    if isinstance(block, dict):
        thinking_blocks.append(block)
    else:
        # Normalize SDK objects to dicts for safe serialization in follow-up requests
        normalized = {"type": block_type}
        for attr in ("thinking", "data", "signature"):
            if hasattr(block, attr):
                normalized[attr] = getattr(block, attr)
        thinking_blocks.append(normalized)
```

2. Good catch on
Additional Comments (1)
Prompt To Fix With AI
This is a comment left during a code review.
Path: litellm/integrations/websearch_interception/handler.py
Line: 68:115
Comment:
Duplicate logic in two hooks - `async_pre_call_deployment_hook` (lines 68-115) and `async_pre_request_hook` (lines 163-238) perform nearly identical tool conversion. Both iterate through tools and convert web search tools to LiteLLM standard format. Consider consolidating this logic into a shared helper method to reduce duplication and maintenance burden.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
How can I resolve this? If you propose a fix, please make it concise.
@jquinter Thanks for the review. 1-4) ✔️ CI linting/tests are currently failing due to merge-conflict cruft upstream, but they pass locally.
That's amazing! Thank you so much for putting together such a comprehensive fix. Your PR is vastly superior to my poor attempt (#20489), so I will simply yield to your change and address the API key loading & budget constraints on top of your PR separately.
Unfortunately, there are still a few issues despite the fix, e.g. when running your fix alongside Bedrock + Sonnet 4.5. More specifically: My Claude put together an attempt at a fix, but it's mostly wrong, as it (and I) do not understand the APIs & flows correctly.
@mpcusack-altos can you address the feedback from @Quentin-M so it can actually work for Claude Code? Would love to have just one PR which fixes this. cc @jquinter for monitoring
The PR was working for me yesterday against Opus 4.5. I'll stress test it with Opus + Sonnet some more today and see if I can reproduce @Quentin-M's error. I think I can follow what the error message is asking us to do, but it's tough to be sure if I can't reproduce it. @Quentin-M were you able to use web search at all? Did it fail in a fresh session or only in the middle of a chat? Does it work for you with Opus 4.5? What version of Claude Code?
@mpcusack-altos Sorry for the late reply - I had to make a last minute trip after losing a family member.
Interesting question. I was starting fresh sessions to make a basic demo video.
Actually, trying to reproduce again right now with Opus 4.6 1M & Opus 4.5, web search seems to work well, even when asking Claude Code to think hard. That said, we run LiteLLM with the following PR: https://github.com/BerriAI/litellm/pull/20489/commits, which carries various 'fixes' (all Claude-generated, without review), including "fix(thinking): drop thinking param when assistant messages have text without thinking blocks" as well as a few other thinking & beta-header fixes. The logic might be flawed (I am no expert, and Claude certainly isn't one either), but it has allowed our various teams to work over the past few days. I have not heard complaints, and I can't look back through central logs, as we have disabled them for LiteLLM specifically due to LiteLLM leaking prompts/responses in a few places of the code despite having that generally disabled.
I've rebased my PR on your latest changes. My branch builds on top of yours and fixes many more issues in order to make websearch/thinking work relatively well with Claude Code and the latest models.
When Anthropic's extended thinking is enabled, assistant messages must start with thinking blocks before tool_use blocks. The agentic loop was creating follow-up messages with only tool_use blocks, causing validation errors. This change ensures thinking blocks from the original response are preserved and included at the start of follow-up assistant messages.
- Created `TransformRequestResult` NamedTuple to capture both tool_calls and thinking_blocks from `transform_request()`, making the contract explicit and extensible
- Modified `transform_request()` to extract and return thinking/redacted_thinking blocks alongside tool calls
- Updated `transform_response()` to accept thinking_blocks and prepend them to follow-up assistant messages
- Passed thinking_blocks through the agentic loop chain: detection → execution → message transformation
- Fixed `transform_request()` to return full kwargs (not just tools) to preserve other request parameters
- Used `filter_internal_params()` utility instead of manual filtering for consistency
This change fixes websearch interception when extended thinking mode is enabled.
**Problem**: When Anthropic's extended thinking is enabled, assistant messages must start with thinking blocks before tool_use blocks. The agentic loop was creating follow-up messages with only tool_use blocks, causing the error: `messages.1.content.0.type: Expected 'thinking' or 'redacted_thinking', but found 'tool_use'`
**Solution**: Modified `transform_request()` to capture thinking/redacted_thinking blocks from the original response, and `transform_response()` to include them at the start of the assistant message in follow-up requests.
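The ordering requirement the solution enforces can be sketched in a few lines. The function name and block shapes here are illustrative, not the actual `transform_response()` code:

```python
# Minimal sketch of the Anthropic ordering requirement: thinking /
# redacted_thinking blocks must precede tool_use blocks in the follow-up
# assistant message, or the API rejects the request.
def build_assistant_message(thinking_blocks: list[dict], tool_use_blocks: list[dict]) -> dict:
    content = list(thinking_blocks)   # thinking blocks first
    content.extend(tool_use_blocks)   # then the tool_use blocks
    return {"role": "assistant", "content": content}
```

Building the follow-up message with only `tool_use_blocks` is exactly what produced the `Expected 'thinking' or 'redacted_thinking', but found 'tool_use'` validation error this PR fixes.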
**Testing**: Successfully tested end-to-end with Claude Code → LiteLLM Proxy → AWS Bedrock → Claude Opus 4.5.
```yaml
model_list:
- model_name: claude-opus-4-5-20251101
litellm_params:
model: bedrock/us.anthropic.claude-opus-4-5-20251101-v1:0
aws_region_name: us-west-2
model_info:
supports_web_search: true
litellm_settings:
callbacks: ["websearch_interception"]
websearch_interception_params:
enabled_providers: ["bedrock"]
search_tool_name: "searxng-search"
search_tools:
- search_tool_name: searxng-search
litellm_params:
search_provider: searxng
api_base: "https://searxng.example.com"
```
**Note**: Uses `bedrock/` (not `bedrock/converse/`) to route through `anthropic_messages_handler()` which supports agentic hooks.
@greptile please review this PR
I tested this some more today with Sonnet/Opus 4.5/4.6, and this PR as-is is working for me. I haven't been able to reproduce the issues @Quentin-M is running into for some reason, although looking at his code I agree most/all of them are issues this code could hit. Claude summary of those changes:

This, combined with the thinking support I have in this PR, results in complex and somewhat brittle code in the web search interceptor that really has nothing to do with web search, and I think it will have us chasing our tails with every new Claude Code/model update. I feel like it's an indication that the

@krrishdholakia @jquinter You know litellm way better than I do; might there be a better hook the interceptor could use? Or some common code path it could delegate all the thinking block manipulation to?