fix(websearch_interception): Convert agentic loop response to streaming format for Claude Code#21878
shin-bot-litellm wants to merge 1 commit into main from
Conversation
Fixes #20187 - When using websearch_interception in Bedrock with Claude Code:

1. Output tokens were showing as 0 because the agentic loop response wasn't being converted back to streaming format
2. The response from the agentic loop (follow-up request) was returned as a non-streaming dict, but Claude Code expects a streaming response

This fix adds streaming format conversion for the agentic loop response when the original request was streaming (detected via the `websearch_interception_converted_stream` flag in logging_obj).

The fix applies to both:
- Anthropic Messages API (`_call_agentic_completion_hooks`)
- Chat Completions API (`_call_agentic_chat_completion_hooks`)

The fix ensures:
- Output tokens are correctly included in the `message_delta` event
- `stop_reason` is properly preserved
- The response format matches what Claude Code expects

Note: This fix was previously in PR #20631 but was merged to a staging branch (`litellm_oss_staging_02_07_2026`) and never made it to main.
Greptile Summary

This PR fixes a streaming format mismatch: the agentic loop's follow-up response was returned as a non-streaming dict even when the original request was streaming, so Claude Code reported 0 output tokens.
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/llms/custom_httpx/llm_http_handler.py | Adds fake-stream conversion inside both _call_agentic_completion_hooks and _call_agentic_chat_completion_hooks for when websearch_interception_converted_stream is set. The logic mirrors the existing non-agentic fallback paths already on main. No new bugs introduced. |
| tests/test_litellm/integrations/websearch_interception/test_websearch_interception_handler.py | Two new mock-only unit tests verifying FakeAnthropicMessagesStreamIterator correctly includes output_tokens and stop_reason in the message_delta event. No network calls, compliant with CI requirements. |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client as Client (Claude Code)
    participant Proxy as LiteLLM Proxy
    participant Hook as WebSearch Interception Hook
    participant LLM as LLM Provider (Bedrock)
    participant Search as Search Provider
    Client->>Proxy: POST /v1/messages (stream=True)
    Proxy->>Hook: async_pre_call_deployment_hook
    Hook-->>Proxy: stream=False (converted)
    Proxy->>LLM: Non-streaming request
    LLM-->>Proxy: Non-streaming response (with tool_use)
    Proxy->>Hook: async_should_run_agentic_loop
    Hook-->>Proxy: should_run=True
    Proxy->>Hook: async_run_agentic_loop
    Hook->>Search: Execute web search
    Search-->>Hook: Search results
    Hook->>LLM: Follow-up request (non-streaming)
    LLM-->>Hook: Final response (dict)
    Hook-->>Proxy: agentic_response (dict)
    Note over Proxy: websearch_converted_stream=True
    Proxy->>Proxy: Wrap in FakeAnthropicMessagesStreamIterator
    Proxy-->>Client: Streaming SSE response (message_start, content_block_delta, message_delta, message_stop)
```
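The diagram's final arrow returns the converted events over SSE. A minimal sketch of that framing, assuming the standard Anthropic-style SSE convention of an `event:` line naming the event type followed by a `data:` line with the JSON payload (this helper is illustrative, not LiteLLM code):

```python
import json
from typing import Any, Dict, Iterable


def to_sse(events: Iterable[Dict[str, Any]]) -> str:
    """Frame streaming events as Server-Sent Events text."""
    chunks = []
    for event in events:
        # Each SSE record: event-type line, data line, then a blank line terminator.
        chunks.append(f"event: {event['type']}\ndata: {json.dumps(event)}\n\n")
    return "".join(chunks)


sse = to_sse([
    {"type": "message_start"},
    {"type": "message_delta", "usage": {"output_tokens": 7}},
    {"type": "message_stop"},
])
```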
Last reviewed commit: 990cda8
```python
from typing import cast

from litellm.llms.anthropic.experimental_pass_through.messages.fake_stream_iterator import (
    FakeAnthropicMessagesStreamIterator,
)
from litellm.types.llms.anthropic_messages.anthropic_response import (
    AnthropicMessagesResponse,
)
```
Inline imports inside method body
Per CLAUDE.md style guidelines: "Avoid imports within methods — place all imports at the top of the file (module-level). Inline imports inside functions/methods make dependencies harder to trace and hurt readability."
Note: This follows the existing pattern in this method (e.g., verbose_logger is re-imported inline at line 4408 despite being a top-level import). Ideally these would be consolidated at the module level, but this is a pre-existing pattern rather than a new concern.
Context Used: Context from dashboard - CLAUDE.md (source)
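The guideline quoted above can be shown with a trivial sketch (the module and function names are illustrative only):

```python
# Preferred per the CLAUDE.md style guideline: import at module level,
# not inside the function body, so dependencies are visible at the top of the file.
import json  # stdlib stand-in for a project dependency


def handler(payload: dict) -> str:
    # Avoid re-importing `json` here inside the method body.
    return json.dumps(payload)


result = handler({"ok": True})
```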
What does this PR do?
Fixes #20187 - When using the `websearch_interception` callback with Bedrock + Claude Code, output tokens showed as 0 and the agentic loop response was returned as a non-streaming dict even though the client requested streaming.

Root Cause
When the agentic loop runs (after a web search tool call is detected), the follow-up LLM request returns a non-streaming response. However, the original request had `stream=True`, which was converted to `stream=False` by the `websearch_interception` hook. The response needs to be converted back to streaming format before being returned to the client.

This fix was previously in PR #20631 but was merged to a staging branch (`litellm_oss_staging_02_07_2026`) and never made it to main.

Changes
`litellm/llms/custom_httpx/llm_http_handler.py`:
- `_call_agentic_completion_hooks`: convert the response back to streaming format after the agentic loop completes
- `_call_agentic_chat_completion_hooks`: convert the response back to streaming format after the chat completion agentic loop completes
- Uses `FakeAnthropicMessagesStreamIterator` for the Anthropic Messages API
- Uses `convert_model_response_to_streaming` for the Chat Completions API

`tests/test_litellm/integrations/websearch_interception/test_websearch_interception_handler.py`:
- `test_fake_anthropic_messages_stream_iterator_includes_output_tokens` verifies output_tokens is included in message_delta
- `test_fake_anthropic_messages_stream_iterator_preserves_stop_reason` verifies stop_reason is preserved

How the fix works
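The gate described in the changes above can be sketched as follows. This is a hedged simplification: the function name, the metadata dict, and the replayed event shapes are assumptions for illustration; only the `websearch_interception_converted_stream` flag name comes from the PR.

```python
from typing import Any, Dict, Iterator, Union


def maybe_restream(
    agentic_response: Dict[str, Any],
    logging_metadata: Dict[str, Any],
) -> Union[Dict[str, Any], Iterator[Dict[str, Any]]]:
    # If the pre-call hook downgraded stream=True to stream=False, the flag was
    # recorded in the logging metadata; convert the dict back into a stream.
    if logging_metadata.get("websearch_interception_converted_stream"):
        def replay() -> Iterator[Dict[str, Any]]:
            yield {"type": "message_start"}
            yield {
                "type": "message_delta",
                "delta": {"stop_reason": agentic_response.get("stop_reason")},
                "usage": agentic_response.get("usage", {}),
            }
            yield {"type": "message_stop"}
        return replay()
    # Original request was genuinely non-streaming: pass the dict through untouched.
    return agentic_response


streamed = maybe_restream(
    {"stop_reason": "end_turn", "usage": {"output_tokens": 5}},
    {"websearch_interception_converted_stream": True},
)
passthrough = maybe_restream({"stop_reason": "end_turn"}, {})
```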
- After the agentic loop completes (`async_run_agentic_loop` or `async_run_chat_completion_agentic_loop`), the handler checks for `websearch_interception_converted_stream=True` in logging_obj (set when stream was converted)
- If the flag is set, the non-streaming response is wrapped back into a streaming iterator before being returned

This ensures:
- Output tokens are correctly included in the `message_delta` event
- `stop_reason` is properly preserved

Testing
All existing tests pass + 2 new tests added:
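A mock-only sketch of the style of assertion the two new tests make (no network calls, per the CI requirements noted in the review summary). The stand-in `fake_iterator` below is hypothetical and only mirrors the shape of LiteLLM's real `FakeAnthropicMessagesStreamIterator` output:

```python
from typing import Any, Dict, List


def fake_iterator(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    # Stand-in for the real iterator: emit start, delta (with usage), and stop.
    return [
        {"type": "message_start"},
        {
            "type": "message_delta",
            "delta": {"stop_reason": response["stop_reason"]},
            "usage": {"output_tokens": response["usage"]["output_tokens"]},
        },
        {"type": "message_stop"},
    ]


def test_includes_output_tokens() -> None:
    events = fake_iterator({"stop_reason": "end_turn", "usage": {"output_tokens": 42}})
    delta = next(e for e in events if e["type"] == "message_delta")
    assert delta["usage"]["output_tokens"] == 42


def test_preserves_stop_reason() -> None:
    events = fake_iterator({"stop_reason": "tool_use", "usage": {"output_tokens": 1}})
    delta = next(e for e in events if e["type"] == "message_delta")
    assert delta["delta"]["stop_reason"] == "tool_use"


test_includes_output_tokens()
test_preserves_stop_reason()
```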