
fix(websearch_interception): Convert agentic loop response to streaming format for Claude Code#21878

Open
shin-bot-litellm wants to merge 1 commit into main from litellm_fix_websearch_interception_logging

Conversation

@shin-bot-litellm
Contributor

What does this PR do?

Fixes #20187 - When using websearch_interception callback with Bedrock + Claude Code:

  1. Output tokens were showing as 0 in Claude Code session logs
  2. The response from the agentic loop (follow-up request) was returned as a non-streaming dict, but Claude Code expects a streaming response

Root Cause

When the agentic loop runs (after a web search tool call is detected), the follow-up LLM request returns a non-streaming response. However, the original request had stream=True, which the websearch_interception hook converted to stream=False, so the response needs to be converted back to streaming format before being returned to the client.
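The flag flip at the root of this can be sketched as follows. This is a minimal illustration assuming dict-shaped request and logging metadata; the real hook in litellm operates on its own request/logging objects, and only the flag name websearch_interception_converted_stream comes from this PR.

```python
# Hypothetical sketch of the pre-call stream downgrade; shapes are illustrative.

def pre_call_hook(request: dict, logging_metadata: dict) -> dict:
    """Downgrade a streaming request to non-streaming so the interception
    hook can inspect the complete response for web search tool calls."""
    if request.get("stream"):
        request = {**request, "stream": False}
        # Record that we flipped the flag, so the response can later be
        # converted back to streaming format for the client.
        logging_metadata["websearch_interception_converted_stream"] = True
    return request
```

Without the matching conversion back, the client that asked for stream=True receives a plain dict, which is exactly the mismatch this PR fixes.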

This fix was previously in PR #20631 but was merged to a staging branch (litellm_oss_staging_02_07_2026) and never made it to main.

Changes

  1. litellm/llms/custom_httpx/llm_http_handler.py:

    • Added streaming format conversion in _call_agentic_completion_hooks after the agentic loop completes
    • Added streaming format conversion in _call_agentic_chat_completion_hooks after the chat completion agentic loop completes
    • Uses FakeAnthropicMessagesStreamIterator for Anthropic Messages API
    • Uses convert_model_response_to_streaming for Chat Completions API
  2. tests/test_litellm/integrations/websearch_interception/test_websearch_interception_handler.py:

    • Added test test_fake_anthropic_messages_stream_iterator_includes_output_tokens to verify output_tokens is included in message_delta
    • Added test test_fake_anthropic_messages_stream_iterator_preserves_stop_reason to verify stop_reason is preserved

How the fix works

  1. After the agentic loop completes (async_run_agentic_loop or async_run_chat_completion_agentic_loop)
  2. Check if websearch_interception_converted_stream=True in logging_obj (set when stream was converted)
  3. If true, convert the non-streaming response to a fake streaming iterator
  4. Return the fake stream to the client
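The steps above can be sketched like this. FakeStream below is a local stand-in for litellm's FakeAnthropicMessagesStreamIterator, whose real constructor and event types may differ; the response shape is illustrative.

```python
# Illustrative sketch of steps 2-4; not litellm's actual implementation.

class FakeStream:
    """Replays a completed non-streaming response as Anthropic-style
    streaming events, carrying usage into the message_delta event."""

    def __init__(self, response: dict):
        self.response = response

    def __iter__(self):
        yield {"type": "message_start"}
        for block in self.response.get("content", []):
            yield {"type": "content_block_delta", "delta": block}
        yield {
            "type": "message_delta",
            "delta": {"stop_reason": self.response.get("stop_reason")},
            "usage": {"output_tokens": self.response["usage"]["output_tokens"]},
        }
        yield {"type": "message_stop"}


def finalize_agentic_response(response: dict, logging_metadata: dict):
    # Step 2: only convert when the hook had downgraded a streaming request.
    if logging_metadata.get("websearch_interception_converted_stream"):
        return FakeStream(response)  # steps 3-4: return a fake stream
    return response
```

When the flag is unset, the response passes through unchanged, which is why the fix only activates for originally-streaming requests.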

This ensures:

  • Output tokens are correctly included in the message_delta event
  • stop_reason is properly preserved
  • The response format matches what Claude Code expects
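For reference, in the Anthropic streaming format the final usage and stop reason arrive in the message_delta event, which is why Claude Code showed 0 output tokens when that event was missing. A representative event (token count and stop reason values are illustrative):

```python
# Representative Anthropic message_delta event shape; values illustrative.
message_delta_event = {
    "type": "message_delta",
    "delta": {"stop_reason": "end_turn", "stop_sequence": None},
    "usage": {"output_tokens": 512},
}
```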

Testing

All existing tests pass, and 2 new tests were added:

tests/test_litellm/integrations/websearch_interception/test_websearch_interception_handler.py::test_fake_anthropic_messages_stream_iterator_includes_output_tokens PASSED
tests/test_litellm/integrations/websearch_interception/test_websearch_interception_handler.py::test_fake_anthropic_messages_stream_iterator_preserves_stop_reason PASSED

…ng format for Claude Code

Fixes #20187 - When using websearch_interception in Bedrock with Claude Code:

1. Output tokens were showing as 0 because the agentic loop response wasn't
   being converted back to streaming format
2. The response from the agentic loop (follow-up request) was returned as a
   non-streaming dict, but Claude Code expects a streaming response

This fix adds streaming format conversion for the agentic loop response when
the original request was streaming (detected via the
websearch_interception_converted_stream flag in logging_obj).

The fix applies to both:
- Anthropic Messages API (_call_agentic_completion_hooks)
- Chat Completions API (_call_agentic_chat_completion_hooks)

The fix ensures:
- Output tokens are correctly included in the message_delta event
- stop_reason is properly preserved
- The response format matches what Claude Code expects

Note: This fix was previously in PR #20631 but was merged to a staging branch
(litellm_oss_staging_02_07_2026) and never made it to main.
@vercel

vercel bot commented Feb 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
litellm | Ready | Preview, Comment | Feb 22, 2026 4:33am


@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@greptile-apps
Contributor

greptile-apps bot commented Feb 22, 2026

Greptile Summary

This PR fixes a streaming format mismatch when websearch_interception is used with Bedrock + Claude Code. When the agentic loop runs (after a web search tool call), the follow-up LLM response is non-streaming, but the original client expects a stream. The fix adds conversion logic inside the agentic loop paths of both _call_agentic_completion_hooks (Anthropic Messages API) and _call_agentic_chat_completion_hooks (Chat Completions API), matching the same pattern already used in the non-agentic fallback code on main.

  • Adds FakeAnthropicMessagesStreamIterator wrapping for agentic Anthropic Messages responses when websearch_interception_converted_stream flag is set
  • Adds convert_model_response_to_streaming wrapping for agentic Chat Completions responses under the same flag
  • Two new unit tests verify output_tokens and stop_reason are preserved in the fake stream's message_delta event
  • Tests are mock-only with no network calls, following CI requirements
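A mock-only test in the spirit of the two the PR adds might look like the sketch below. The event generator is a local stand-in for litellm's FakeAnthropicMessagesStreamIterator, not the real class, and the response dict is illustrative.

```python
# Hypothetical mock-only test; no network calls, per the CI requirement.

def fake_stream_events(response: dict):
    """Yield a minimal Anthropic-style event sequence for a finished response."""
    yield {"type": "message_start"}
    yield {
        "type": "message_delta",
        "delta": {"stop_reason": response.get("stop_reason")},
        "usage": {"output_tokens": response["usage"]["output_tokens"]},
    }
    yield {"type": "message_stop"}


def test_fake_stream_includes_output_tokens_and_stop_reason():
    response = {"stop_reason": "end_turn", "usage": {"output_tokens": 42}}
    events = list(fake_stream_events(response))
    message_delta = next(e for e in events if e["type"] == "message_delta")
    # The regression under test: output_tokens must not be 0/missing,
    # and stop_reason must survive the conversion.
    assert message_delta["usage"]["output_tokens"] == 42
    assert message_delta["delta"]["stop_reason"] == "end_turn"
```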

Confidence Score: 4/5

  • This PR is safe to merge — it follows existing patterns and only activates under the specific websearch interception flag.
  • The code changes are well-scoped and follow the same conversion patterns already established in the non-agentic fallback paths on main. The logic is guarded by the websearch_interception_converted_stream flag so it only activates when relevant. Both Anthropic Messages and Chat Completions paths are covered with appropriate type checks. The tests are mock-only and verify the critical regression (output_tokens and stop_reason). Minor style note: inline imports follow the existing pattern in these methods though CLAUDE.md prefers module-level imports.
  • No files require special attention — the core logic in litellm/llms/custom_httpx/llm_http_handler.py is well-guarded and follows existing patterns.

Important Files Changed

Filename | Overview
litellm/llms/custom_httpx/llm_http_handler.py | Adds fake-stream conversion inside both _call_agentic_completion_hooks and _call_agentic_chat_completion_hooks for when websearch_interception_converted_stream is set. The logic mirrors the existing non-agentic fallback paths already on main. No new bugs introduced.
tests/test_litellm/integrations/websearch_interception/test_websearch_interception_handler.py | Two new mock-only unit tests verifying FakeAnthropicMessagesStreamIterator correctly includes output_tokens and stop_reason in the message_delta event. No network calls, compliant with CI requirements.

Sequence Diagram

sequenceDiagram
    participant Client as Client (Claude Code)
    participant Proxy as LiteLLM Proxy
    participant Hook as WebSearch Interception Hook
    participant LLM as LLM Provider (Bedrock)
    participant Search as Search Provider

    Client->>Proxy: POST /v1/messages (stream=True)
    Proxy->>Hook: async_pre_call_deployment_hook
    Hook-->>Proxy: stream=False (converted)
    Proxy->>LLM: Non-streaming request
    LLM-->>Proxy: Non-streaming response (with tool_use)
    Proxy->>Hook: async_should_run_agentic_loop
    Hook-->>Proxy: should_run=True
    Proxy->>Hook: async_run_agentic_loop
    Hook->>Search: Execute web search
    Search-->>Hook: Search results
    Hook->>LLM: Follow-up request (non-streaming)
    LLM-->>Hook: Final response (dict)
    Hook-->>Proxy: agentic_response (dict)
    Note over Proxy: websearch_converted_stream=True
    Proxy->>Proxy: Wrap in FakeAnthropicMessagesStreamIterator
    Proxy-->>Client: Streaming SSE response (message_start, content_block_delta, message_delta, message_stop)

Last reviewed commit: 990cda8


@greptile-apps greptile-apps bot left a comment


2 files reviewed, 1 comment


Comment on lines +4459 to +4466
from typing import cast

from litellm.llms.anthropic.experimental_pass_through.messages.fake_stream_iterator import (
    FakeAnthropicMessagesStreamIterator,
)
from litellm.types.llms.anthropic_messages.anthropic_response import (
    AnthropicMessagesResponse,
)

Inline imports inside method body

Per CLAUDE.md style guidelines: "Avoid imports within methods — place all imports at the top of the file (module-level). Inline imports inside functions/methods make dependencies harder to trace and hurt readability."

Note: This follows the existing pattern in this method (e.g., verbose_logger is re-imported inline at line 4408 despite being a top-level import). Ideally these would be consolidated at the module level, but this is a pre-existing pattern rather than a new concern.

Context Used: Context from dashboard - CLAUDE.md (source)




Development

Successfully merging this pull request may close these issues.

[Bug]: Claude Code shows 0 output tokens when using websearch_interception in bedrock and tool call not recorded in request logs

2 participants