
fix(websearch_interception): Convert agentic loop response to streaming format for Claude Code #20631

Merged
krrishdholakia merged 13 commits into litellm_oss_staging_02_07_2026 from litellm_fix_websearch_token_tracking on Feb 7, 2026

Conversation

@shin-bot-litellm
Contributor

Summary

Fixes #20187

When using websearch_interception with Bedrock and Claude Code, the output tokens were showing as 0 because the agentic loop response wasn't being converted back to streaming format.

Problem

  1. Claude Code sends a streaming request with websearch tools
  2. The websearch_interception handler converts stream=True to stream=False so it can intercept the response
  3. The agentic loop runs (executing the search and making a follow-up request), but its response was returned as a non-streaming dict
  4. Claude Code expects a streaming response, so when it parsed the non-streaming dict, the output tokens appeared as 0

Solution

After the agentic loop completes, check if the original request was streaming (via the websearch_interception_converted_stream flag). If so, convert the agentic loop's non-streaming response to streaming format using FakeAnthropicMessagesStreamIterator.

This ensures:

  • Output tokens are correctly included in the message_delta event (as per Anthropic's streaming spec)
  • stop_reason is properly preserved
  • The response format matches what Claude Code expects
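To make the conversion concrete, here is a minimal, hypothetical sketch of what a fake stream iterator such as FakeAnthropicMessagesStreamIterator does: replay a complete (non-streaming) Anthropic Messages response as the event sequence a streaming client expects. Event names follow Anthropic's streaming spec; the function and helper names below are illustrative, not litellm's actual API.

```python
# Hypothetical, simplified sketch: replay a non-streaming Anthropic Messages
# response dict as SSE streaming events. Not litellm's actual implementation.
import json
from typing import Any, Dict, Iterator


def _sse(event: str, data: Dict[str, Any]) -> str:
    """Format one server-sent event."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def fake_anthropic_stream(response: Dict[str, Any]) -> Iterator[str]:
    """Yield SSE events reconstructing `response` as a stream."""
    usage = response.get("usage", {})

    # message_start carries the message shell; output tokens come later.
    yield _sse("message_start", {
        "type": "message_start",
        "message": {**response, "content": [], "stop_reason": None},
    })

    for index, block in enumerate(response.get("content", [])):
        yield _sse("content_block_start", {
            "type": "content_block_start", "index": index,
            "content_block": {"type": block["type"], "text": ""},
        })
        yield _sse("content_block_delta", {
            "type": "content_block_delta", "index": index,
            "delta": {"type": "text_delta", "text": block.get("text", "")},
        })
        yield _sse("content_block_stop",
                   {"type": "content_block_stop", "index": index})

    # Per Anthropic's streaming spec, output_tokens and stop_reason ride on
    # message_delta -- the event that showed 0 tokens before this fix.
    yield _sse("message_delta", {
        "type": "message_delta",
        "delta": {"stop_reason": response.get("stop_reason")},
        "usage": {"output_tokens": usage.get("output_tokens", 0)},
    })
    yield _sse("message_stop", {"type": "message_stop"})
```

If the message_delta event is omitted or emitted without a usage payload, a spec-compliant client has no output-token count to display, which is exactly the reported symptom.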

Testing

  • Added unit tests to verify FakeAnthropicMessagesStreamIterator correctly includes output tokens and preserves stop_reason
  • All existing tests pass

Changes

  • litellm/llms/custom_httpx/llm_http_handler.py: Added streaming conversion for agentic loop response
  • tests/test_litellm/integrations/websearch_interception/test_websearch_interception_handler.py: Added regression tests
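The shape of the handler change can be sketched as follows. This is an illustrative reduction, not the actual code in llm_http_handler.py: the flag name and FakeAnthropicMessagesStreamIterator come from the PR, while the function signature and stand-in class are assumptions.

```python
# Illustrative sketch of the fix: after the agentic loop returns a plain
# dict, check the flag recording that the original request was streaming,
# and re-wrap the dict as a fake stream if so. Names besides the flag and
# iterator concept are hypothetical.
from typing import Any, Dict, Union


class FakeStream:
    """Stand-in for FakeAnthropicMessagesStreamIterator."""
    def __init__(self, response: Dict[str, Any]):
        self.response = response


def finalize_agentic_response(
    agentic_response: Dict[str, Any],
    model_call_details: Dict[str, Any],
) -> Union[Dict[str, Any], FakeStream]:
    # The interception handler set this flag when it converted
    # stream=True -> stream=False to inspect the websearch tool call.
    if model_call_details.get("websearch_interception_converted_stream"):
        # Original caller expects a stream: replay the dict as fake
        # streaming events so output_tokens land in message_delta.
        return FakeStream(agentic_response)
    # Non-streaming caller: return the dict unchanged.
    return agentic_response
```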

krrishdholakia and others added 13 commits February 6, 2026 17:34
…CP + Agent guardrail support (#20619)

* fix: fix styling

* fix(custom_code_guardrail.py): add http support for custom code guardrails

allows users to call external guardrails on litellm with minimal code changes (no custom handlers)

Test guardrail integrations more easily

* feat(a2a/): add guardrails for agent interactions

allows the same guardrails used for LLMs to be applied to agents as well

* fix(a2a/): support passing guardrails to a2a from the UI

* style(code-editor): allow editing custom code guardrails on ui + add examples of pre/post calls for custom code guardrails

* feat(mcp/): support custom code guardrails for mcp calls

allows custom code guardrails to work on mcp input

* feat(chatui.tsx): support guardrails on mcp tool calls on playground
…20618)

* fix(mypy): resolve missing return statements and type casting issues

* fix(pangea): use elif to prevent UnboundLocalError and handle None messages

Address Greptile review feedback:
- Make branches mutually exclusive using elif to prevent input_messages from being overwritten
- Handle case where data.get('messages') returns None to avoid passing invalid payload to Pangea API

---------

Co-authored-by: Shin <shin@openclaw.ai>
…lable on Internet (#20607)

* update MCPAuthenticatedUser

* add available_on_public_internet for MCPs

* update claude.md

* init IPAddressUtils

* init available_on_public_internet

* add on REST endpoints

* filter with IP

* TestIsInternalIp

* _extract_mcp_headers_from_request

* init get_mcp_client_ip

* _get_general_settings

* allowed_server_ids

* address PR comments

* get_mcp_server_by_name fix

* fix server

* fix review comments

* get_public_mcp_servers

* address _get_allowed_mcp_servers

* test fix

* fix linting

* init ui types

* add ui for managing MCP private/public

* add ui

* fixes

* add to schema

* add types

* fix endpoint

* add endpoint

* update manager

* test mcp

* don't use external party for ip address

[Fix] /key/list user_id Empty String Edge Case
- a2a_protocol/exception_mapping_utils.py: Fix type ignore comment for None assignment
- caching/redis_cache.py: Add type ignore for async ping return type
- caching/redis_cluster_cache.py: Add type ignore for async ping return type
- llms/deprecated_providers/palm.py: Add type ignore for palm.generate_text
- proxy/auth/handle_jwt.py: Add type ignore for jwt.decode options argument

All changes add appropriate type: ignore comments to handle library typing inconsistencies.
Replace text-embedding-004 with gemini-embedding-001.

The old model was deprecated and returns 404:
'models/text-embedding-004 is not found for API version v1beta'

Co-authored-by: Shin <shin@openclaw.ai>
…ng format when original request was streaming

Fixes #20187 - When using websearch_interception in Bedrock with Claude Code:
1. Output tokens were showing as 0 because the agentic loop response wasn't
   being converted back to streaming format
2. The response from the agentic loop (follow-up request) was returned as a
   non-streaming dict, but Claude Code expects a streaming response

This fix adds streaming format conversion for the agentic loop response when
the original request was streaming (detected via the
websearch_interception_converted_stream flag in logging_obj).

The fix ensures:
- Output tokens are correctly included in the message_delta event
- stop_reason is properly preserved
- The response format matches what Claude Code expects
@vercel

vercel bot commented Feb 7, 2026

The latest updates on your projects:

  • litellm — Ready — Preview, Comment — updated Feb 7, 2026 4:19am (UTC)

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@greptile-apps
Contributor

greptile-apps bot commented Feb 7, 2026

Greptile Overview

Greptile Summary

This PR fixes a bug where Claude Code showed 0 output tokens when using websearch_interception with Bedrock. The issue occurred because the websearch handler converts streaming requests to non-streaming for interception, and when the agentic loop completed, it returned a dict response instead of converting it back to streaming format.

Changes:

  • Added logic in llm_http_handler.py:4415-4440 to check if the original request was streaming (via the websearch_interception_converted_stream flag) and convert the agentic loop's dict response to a fake stream using FakeAnthropicMessagesStreamIterator
  • Added two regression tests to verify the fake stream iterator correctly includes output_tokens in the message_delta event and preserves stop_reason
  • The fix mirrors existing logic at line 4455 that handles the case when no agentic loop runs

Technical Details:

  • When websearch tools are detected, stream=True is converted to stream=False and a flag is set in logging_obj.model_call_details
  • After the agentic loop completes its follow-up request, the code now checks this flag and converts the dict response to streaming format
  • The FakeAnthropicMessagesStreamIterator properly formats the response with all required Anthropic streaming events including correct token counts
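The request-side half described above can be sketched as follows. This is an assumed shape, not the actual litellm code: the flag name matches the PR, but the function, the tool-type check, and the dict layout are illustrative.

```python
# Sketch (assumed shapes) of the request-side conversion: when websearch
# tools are present on a streaming request, flip stream off and record
# that we did so on the logging object's model_call_details.
from typing import Any, Dict


def intercept_websearch_request(
    request: Dict[str, Any], model_call_details: Dict[str, Any]
) -> Dict[str, Any]:
    tools = request.get("tools") or []
    # Hypothetical detection: Anthropic websearch tool types start with
    # "web_search" (e.g. versioned variants).
    has_websearch = any(
        t.get("type", "").startswith("web_search") for t in tools
    )
    if has_websearch and request.get("stream"):
        # Intercept as non-streaming so the tool_use block can be read.
        request = {**request, "stream": False}
        # Remembered so the response path knows to convert the final
        # dict back into a fake stream.
        model_call_details["websearch_interception_converted_stream"] = True
    return request
```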

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The changes are well-isolated to the websearch interception flow, follow existing patterns in the codebase (duplicating logic from line 4455), include comprehensive regression tests, and fix a specific reported issue without introducing new dependencies or altering the core request path
  • No files require special attention

Important Files Changed

  • litellm/llms/custom_httpx/llm_http_handler.py — Adds streaming conversion for agentic loop responses when websearch interception converts stream=True to stream=False. Uses FakeAnthropicMessagesStreamIterator to properly format the response with output tokens and stop_reason.
  • tests/test_litellm/integrations/websearch_interception/test_websearch_interception_handler.py — Adds regression tests to verify FakeAnthropicMessagesStreamIterator correctly includes output_tokens in the message_delta event and preserves stop_reason, addressing the reported issue.

Sequence Diagram

sequenceDiagram
    participant Client as Claude Code Client
    participant Handler as WebSearchInterception
    participant LLM as LLM Provider
    participant AgenticLoop as Agentic Loop
    participant Converter as FakeStreamIterator

    Client->>Handler: Request with stream=True and websearch tools
    Handler->>Handler: Convert stream=True to stream=False
    Handler->>Handler: Set websearch_interception_converted_stream flag
    Handler->>LLM: Make request with stream=False
    LLM->>Handler: Non-streaming response with tool_use
    Handler->>AgenticLoop: Execute agentic loop with websearch
    AgenticLoop->>AgenticLoop: Execute search queries
    AgenticLoop->>LLM: Follow-up request with search results
    LLM->>AgenticLoop: Non-streaming response dict
    AgenticLoop->>Handler: Return agentic response dict
    Handler->>Handler: Check websearch_interception_converted_stream flag
    Handler->>Converter: Convert dict to fake stream
    Converter->>Converter: Create streaming events
    Converter->>Converter: Include output_tokens in message_delta
    Converter->>Client: Return streaming response with correct tokens

@greptile-apps bot left a comment:

2 files reviewed, no comments

@krrishdholakia krrishdholakia changed the base branch from main to litellm_oss_staging_02_07_2026 February 7, 2026 07:54
@krrishdholakia krrishdholakia merged commit 8f66873 into litellm_oss_staging_02_07_2026 Feb 7, 2026
55 of 67 checks passed
shin-bot-litellm added a commit that referenced this pull request Feb 22, 2026
…ng format for Claude Code

Fixes #20187 - When using websearch_interception in Bedrock with Claude Code:

1. Output tokens were showing as 0 because the agentic loop response wasn't
   being converted back to streaming format
2. The response from the agentic loop (follow-up request) was returned as a
   non-streaming dict, but Claude Code expects a streaming response

This fix adds streaming format conversion for the agentic loop response when
the original request was streaming (detected via the
websearch_interception_converted_stream flag in logging_obj).

The fix applies to both:
- Anthropic Messages API (_call_agentic_completion_hooks)
- Chat Completions API (_call_agentic_chat_completion_hooks)

The fix ensures:
- Output tokens are correctly included in the message_delta event
- stop_reason is properly preserved
- The response format matches what Claude Code expects

Note: This fix was previously in PR #20631 but was merged to a staging branch
(litellm_oss_staging_02_07_2026) and never made it to main.


Development

Successfully merging this pull request may close these issues.

[Bug]: Claude Code shows 0 output tokens when using websearch_interception in bedrock and tool call not recorded in request logs

6 participants