
fix(websearch_interception): Convert agentic loop response to streaming format for Claude Code #20631

Merged
krrishdholakia merged 13 commits into litellm_oss_staging_02_07_2026 from litellm_fix_websearch_token_tracking on Feb 7, 2026

Conversation

@shin-bot-litellm
Contributor

Summary

Fixes #20187

When using websearch_interception with Bedrock and Claude Code, the output tokens were showing as 0 because the agentic loop response wasn't being converted back to streaming format.

Problem

  1. Claude Code sends a streaming request with websearch tools
  2. The websearch_interception handler converts stream=True to stream=False so it can intercept the response
  3. The agentic loop runs (executing the search and making a follow-up request), but its response was returned as a non-streaming dict
  4. Claude Code expects a streaming response, so when it parsed the non-streaming dict, the output tokens appeared as 0

Solution

After the agentic loop completes, check if the original request was streaming (via the websearch_interception_converted_stream flag). If so, convert the agentic loop's non-streaming response to streaming format using FakeAnthropicMessagesStreamIterator.

This ensures:

  • Output tokens are correctly included in the message_delta event (as per Anthropic's streaming spec)
  • stop_reason is properly preserved
  • The response format matches what Claude Code expects
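To make the conversion concrete, here is a minimal, hypothetical sketch of what a fake stream iterator such as FakeAnthropicMessagesStreamIterator does: replay a complete (non-streaming) Anthropic Messages response as the event sequence a streaming client expects. Event names follow Anthropic's streaming spec; the function and helper names below are illustrative, not litellm's actual API.

```python
# Hypothetical, simplified sketch: replay a non-streaming Anthropic Messages
# response dict as SSE streaming events. Not litellm's actual implementation.
import json
from typing import Any, Dict, Iterator


def _sse(event: str, data: Dict[str, Any]) -> str:
    """Format one server-sent event."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def fake_anthropic_stream(response: Dict[str, Any]) -> Iterator[str]:
    """Yield SSE events reconstructing `response` as a stream."""
    usage = response.get("usage", {})

    # message_start carries the message shell; output tokens come later.
    yield _sse("message_start", {
        "type": "message_start",
        "message": {**response, "content": [], "stop_reason": None},
    })

    for index, block in enumerate(response.get("content", [])):
        yield _sse("content_block_start", {
            "type": "content_block_start", "index": index,
            "content_block": {"type": block["type"], "text": ""},
        })
        yield _sse("content_block_delta", {
            "type": "content_block_delta", "index": index,
            "delta": {"type": "text_delta", "text": block.get("text", "")},
        })
        yield _sse("content_block_stop",
                   {"type": "content_block_stop", "index": index})

    # Per Anthropic's streaming spec, output_tokens and stop_reason ride on
    # message_delta -- the event that showed 0 tokens before this fix.
    yield _sse("message_delta", {
        "type": "message_delta",
        "delta": {"stop_reason": response.get("stop_reason")},
        "usage": {"output_tokens": usage.get("output_tokens", 0)},
    })
    yield _sse("message_stop", {"type": "message_stop"})
```

If the message_delta event is omitted or emitted without a usage payload, a spec-compliant client has no output-token count to display, which is exactly the reported symptom.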

Testing

  • Added unit tests to verify FakeAnthropicMessagesStreamIterator correctly includes output tokens and preserves stop_reason
  • All existing tests pass

Changes

  • litellm/llms/custom_httpx/llm_http_handler.py: Added streaming conversion for agentic loop response
  • tests/test_litellm/integrations/websearch_interception/test_websearch_interception_handler.py: Added regression tests
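The shape of the handler change can be sketched as follows. This is an illustrative reduction, not the actual code in llm_http_handler.py: the flag name and FakeAnthropicMessagesStreamIterator come from the PR, while the function signature and stand-in class are assumptions.

```python
# Illustrative sketch of the fix: after the agentic loop returns a plain
# dict, check the flag recording that the original request was streaming,
# and re-wrap the dict as a fake stream if so. Names besides the flag and
# iterator concept are hypothetical.
from typing import Any, Dict, Union


class FakeStream:
    """Stand-in for FakeAnthropicMessagesStreamIterator."""
    def __init__(self, response: Dict[str, Any]):
        self.response = response


def finalize_agentic_response(
    agentic_response: Dict[str, Any],
    model_call_details: Dict[str, Any],
) -> Union[Dict[str, Any], FakeStream]:
    # The interception handler set this flag when it converted
    # stream=True -> stream=False to inspect the websearch tool call.
    if model_call_details.get("websearch_interception_converted_stream"):
        # Original caller expects a stream: replay the dict as fake
        # streaming events so output_tokens land in message_delta.
        return FakeStream(agentic_response)
    # Non-streaming caller: return the dict unchanged.
    return agentic_response
```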

krrishdholakia and others added 13 commits February 6, 2026 17:34
…CP + Agent guardrail support (#20619)

* fix: fix styling

* fix(custom_code_guardrail.py): add http support for custom code guardrails

allows users to call external guardrails on litellm with minimal code changes (no custom handlers)

Test guardrail integrations more easily

* feat(a2a/): add guardrails for agent interactions

allows the same guardrails used for LLMs to be applied to agents as well

* fix(a2a/): support passing guardrails to a2a from the UI

* style(code-editor): allow editing custom code guardrails on ui + add examples of pre/post calls for custom code guardrails

* feat(mcp/): support custom code guardrails for mcp calls

allows custom code guardrails to work on mcp input

* feat(chatui.tsx): support guardrails on mcp tool calls on playground
…20618)

* fix(mypy): resolve missing return statements and type casting issues

* fix(pangea): use elif to prevent UnboundLocalError and handle None messages

Address Greptile review feedback:
- Make branches mutually exclusive using elif to prevent input_messages from being overwritten
- Handle case where data.get('messages') returns None to avoid passing invalid payload to Pangea API

---------

Co-authored-by: Shin <shin@openclaw.ai>
…lable on Internet (#20607)

* update MCPAuthenticatedUser

* add available_on_public_internet for MCPs

* update claude.md

* init IPAddressUtils

* init available_on_public_internet

* add on REST endpoints

* filter with IP

* TestIsInternalIp

* _extract_mcp_headers_from_request

* init get_mcp_client_ip

* _get_general_settings

* allowed_server_ids

* address PR comments

* get_mcp_server_by_name fix

* fix server

* fix review comments

* get_public_mcp_servers

* address _get_allowed_mcp_servers

* test fix

* fix linting

* init ui types

* add ui for managing MCP private/public

* add ui

* fixes

* add to schema

* add types

* fix endpoint

* add endpoint

* update manager

* test mcp

* don't use external party for ip address

[Fix] /key/list user_id Empty String Edge Case
- a2a_protocol/exception_mapping_utils.py: Fix type ignore comment for None assignment
- caching/redis_cache.py: Add type ignore for async ping return type
- caching/redis_cluster_cache.py: Add type ignore for async ping return type
- llms/deprecated_providers/palm.py: Add type ignore for palm.generate_text
- proxy/auth/handle_jwt.py: Add type ignore for jwt.decode options argument

All changes add appropriate type: ignore comments to handle library typing inconsistencies.
Replace text-embedding-004 with gemini-embedding-001.

The old model was deprecated and returns 404:
'models/text-embedding-004 is not found for API version v1beta'

Co-authored-by: Shin <shin@openclaw.ai>
…ng format when original request was streaming

Fixes #20187 - When using websearch_interception in Bedrock with Claude Code:
1. Output tokens were showing as 0 because the agentic loop response wasn't
   being converted back to streaming format
2. The response from the agentic loop (follow-up request) was returned as a
   non-streaming dict, but Claude Code expects a streaming response

This fix adds streaming format conversion for the agentic loop response when
the original request was streaming (detected via the
websearch_interception_converted_stream flag in logging_obj).

The fix ensures:
- Output tokens are correctly included in the message_delta event
- stop_reason is properly preserved
- The response format matches what Claude Code expects
@vercel

vercel bot commented Feb 7, 2026

The latest updates on your projects:

  • litellm — Ready — Preview, Comment — updated Feb 7, 2026 4:19am (UTC)

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@greptile-apps
Contributor

greptile-apps bot commented Feb 7, 2026

Greptile Overview

Greptile Summary

This PR fixes a bug where Claude Code showed 0 output tokens when using websearch_interception with Bedrock. The issue occurred because the websearch handler converts streaming requests to non-streaming for interception, and when the agentic loop completed, it returned a dict response instead of converting it back to streaming format.

Changes:

  • Added logic in llm_http_handler.py:4415-4440 to check if the original request was streaming (via the websearch_interception_converted_stream flag) and convert the agentic loop's dict response to a fake stream using FakeAnthropicMessagesStreamIterator
  • Added two regression tests to verify the fake stream iterator correctly includes output_tokens in the message_delta event and preserves stop_reason
  • The fix mirrors existing logic at line 4455 that handles the case when no agentic loop runs

Technical Details:

  • When websearch tools are detected, stream=True is converted to stream=False and a flag is set in logging_obj.model_call_details
  • After the agentic loop completes its follow-up request, the code now checks this flag and converts the dict response to streaming format
  • The FakeAnthropicMessagesStreamIterator properly formats the response with all required Anthropic streaming events including correct token counts
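The request-side half described above can be sketched as follows. This is an assumed shape, not the actual litellm code: the flag name matches the PR, but the function, the tool-type check, and the dict layout are illustrative.

```python
# Sketch (assumed shapes) of the request-side conversion: when websearch
# tools are present on a streaming request, flip stream off and record
# that we did so on the logging object's model_call_details.
from typing import Any, Dict


def intercept_websearch_request(
    request: Dict[str, Any], model_call_details: Dict[str, Any]
) -> Dict[str, Any]:
    tools = request.get("tools") or []
    # Hypothetical detection: Anthropic websearch tool types start with
    # "web_search" (e.g. versioned variants).
    has_websearch = any(
        t.get("type", "").startswith("web_search") for t in tools
    )
    if has_websearch and request.get("stream"):
        # Intercept as non-streaming so the tool_use block can be read.
        request = {**request, "stream": False}
        # Remembered so the response path knows to convert the final
        # dict back into a fake stream.
        model_call_details["websearch_interception_converted_stream"] = True
    return request
```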

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The changes are well-isolated to the websearch interception flow, follow existing patterns in the codebase (duplicating logic from line 4455), include comprehensive regression tests, and fix a specific reported issue without introducing new dependencies or altering the core request path
  • No files require special attention

Important Files Changed

  • litellm/llms/custom_httpx/llm_http_handler.py — Adds streaming conversion for agentic loop responses when websearch interception converts stream=True to stream=False. Uses FakeAnthropicMessagesStreamIterator to properly format the response with output tokens and stop_reason.
  • tests/test_litellm/integrations/websearch_interception/test_websearch_interception_handler.py — Adds regression tests to verify FakeAnthropicMessagesStreamIterator correctly includes output_tokens in the message_delta event and preserves stop_reason, addressing the reported issue.

Sequence Diagram

sequenceDiagram
    participant Client as Claude Code Client
    participant Handler as WebSearchInterception
    participant LLM as LLM Provider
    participant AgenticLoop as Agentic Loop
    participant Converter as FakeStreamIterator

    Client->>Handler: Request with stream=True and websearch tools
    Handler->>Handler: Convert stream=True to stream=False
    Handler->>Handler: Set websearch_interception_converted_stream flag
    Handler->>LLM: Make request with stream=False
    LLM->>Handler: Non-streaming response with tool_use
    Handler->>AgenticLoop: Execute agentic loop with websearch
    AgenticLoop->>AgenticLoop: Execute search queries
    AgenticLoop->>LLM: Follow-up request with search results
    LLM->>AgenticLoop: Non-streaming response dict
    AgenticLoop->>Handler: Return agentic response dict
    Handler->>Handler: Check websearch_interception_converted_stream flag
    Handler->>Converter: Convert dict to fake stream
    Converter->>Converter: Create streaming events
    Converter->>Converter: Include output_tokens in message_delta
    Converter->>Client: Return streaming response with correct tokens

@greptile-apps bot left a comment:

2 files reviewed, no comments

@krrishdholakia krrishdholakia changed the base branch from main to litellm_oss_staging_02_07_2026 February 7, 2026 07:54
@krrishdholakia krrishdholakia merged commit 8f66873 into litellm_oss_staging_02_07_2026 Feb 7, 2026
55 of 67 checks passed
shin-bot-litellm added a commit that referenced this pull request Feb 22, 2026
…ng format for Claude Code

Fixes #20187 - When using websearch_interception in Bedrock with Claude Code:

1. Output tokens were showing as 0 because the agentic loop response wasn't
   being converted back to streaming format
2. The response from the agentic loop (follow-up request) was returned as a
   non-streaming dict, but Claude Code expects a streaming response

This fix adds streaming format conversion for the agentic loop response when
the original request was streaming (detected via the
websearch_interception_converted_stream flag in logging_obj).

The fix applies to both:
- Anthropic Messages API (_call_agentic_completion_hooks)
- Chat Completions API (_call_agentic_chat_completion_hooks)

The fix ensures:
- Output tokens are correctly included in the message_delta event
- stop_reason is properly preserved
- The response format matches what Claude Code expects

Note: This fix was previously in PR #20631 but was merged to a staging branch
(litellm_oss_staging_02_07_2026) and never made it to main.


Development

Successfully merging this pull request may close these issues.

[Bug]: Claude Code shows 0 output tokens when using websearch_interception in bedrock and tool call not recorded in request logs

6 participants