
fix(websearch_interception): Convert agentic loop response to streaming format for Claude Code#21878

Open
shin-bot-litellm wants to merge 1 commit into main from litellm_fix_websearch_interception_logging

Conversation

@shin-bot-litellm
Contributor

What does this PR do?

Fixes #20187 - When using websearch_interception callback with Bedrock + Claude Code:

  1. Output tokens were showing as 0 in Claude Code session logs
  2. The response from the agentic loop (follow-up request) was returned as a non-streaming dict, but Claude Code expects a streaming response

Root Cause

When the agentic loop runs (after a web search tool call is detected), the follow-up LLM request returns a non-streaming response. However, the original request had stream=True, which the websearch_interception hook converted to stream=False, so the response needs to be converted back to streaming format before being returned to the client.
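The flag flip at the root of this can be sketched as follows. This is a minimal illustration assuming dict-shaped request and logging metadata; the real hook in litellm operates on its own request/logging objects, and only the flag name websearch_interception_converted_stream comes from this PR.

```python
# Hypothetical sketch of the pre-call stream downgrade; shapes are illustrative.

def pre_call_hook(request: dict, logging_metadata: dict) -> dict:
    """Downgrade a streaming request to non-streaming so the interception
    hook can inspect the complete response for web search tool calls."""
    if request.get("stream"):
        request = {**request, "stream": False}
        # Record that we flipped the flag, so the response can later be
        # converted back to streaming format for the client.
        logging_metadata["websearch_interception_converted_stream"] = True
    return request
```

Without the matching conversion back, the client that asked for stream=True receives a plain dict, which is exactly the mismatch this PR fixes.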

This fix was previously in PR #20631 but was merged to a staging branch (litellm_oss_staging_02_07_2026) and never made it to main.

Changes

  1. litellm/llms/custom_httpx/llm_http_handler.py:

    • Added streaming format conversion in _call_agentic_completion_hooks after the agentic loop completes
    • Added streaming format conversion in _call_agentic_chat_completion_hooks after the chat completion agentic loop completes
    • Uses FakeAnthropicMessagesStreamIterator for Anthropic Messages API
    • Uses convert_model_response_to_streaming for Chat Completions API
  2. tests/test_litellm/integrations/websearch_interception/test_websearch_interception_handler.py:

    • Added test test_fake_anthropic_messages_stream_iterator_includes_output_tokens to verify output_tokens is included in message_delta
    • Added test test_fake_anthropic_messages_stream_iterator_preserves_stop_reason to verify stop_reason is preserved

How the fix works

  1. After the agentic loop completes (async_run_agentic_loop or async_run_chat_completion_agentic_loop)
  2. Check if websearch_interception_converted_stream=True in logging_obj (set when stream was converted)
  3. If true, convert the non-streaming response to a fake streaming iterator
  4. Return the fake stream to the client
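The steps above can be sketched like this. FakeStream below is a local stand-in for litellm's FakeAnthropicMessagesStreamIterator, whose real constructor and event types may differ; the response shape is illustrative.

```python
# Illustrative sketch of steps 2-4; not litellm's actual implementation.

class FakeStream:
    """Replays a completed non-streaming response as Anthropic-style
    streaming events, carrying usage into the message_delta event."""

    def __init__(self, response: dict):
        self.response = response

    def __iter__(self):
        yield {"type": "message_start"}
        for block in self.response.get("content", []):
            yield {"type": "content_block_delta", "delta": block}
        yield {
            "type": "message_delta",
            "delta": {"stop_reason": self.response.get("stop_reason")},
            "usage": {"output_tokens": self.response["usage"]["output_tokens"]},
        }
        yield {"type": "message_stop"}


def finalize_agentic_response(response: dict, logging_metadata: dict):
    # Step 2: only convert when the hook had downgraded a streaming request.
    if logging_metadata.get("websearch_interception_converted_stream"):
        return FakeStream(response)  # steps 3-4: return a fake stream
    return response
```

When the flag is unset, the response passes through unchanged, which is why the fix only activates for originally-streaming requests.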

This ensures:

  • Output tokens are correctly included in the message_delta event
  • stop_reason is properly preserved
  • The response format matches what Claude Code expects
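For reference, in the Anthropic streaming format the final usage and stop reason arrive in the message_delta event, which is why Claude Code showed 0 output tokens when that event was missing. A representative event (token count and stop reason values are illustrative):

```python
# Representative Anthropic message_delta event shape; values illustrative.
message_delta_event = {
    "type": "message_delta",
    "delta": {"stop_reason": "end_turn", "stop_sequence": None},
    "usage": {"output_tokens": 512},
}
```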

Testing

All existing tests pass, and 2 new tests were added:

tests/test_litellm/integrations/websearch_interception/test_websearch_interception_handler.py::test_fake_anthropic_messages_stream_iterator_includes_output_tokens PASSED
tests/test_litellm/integrations/websearch_interception/test_websearch_interception_handler.py::test_fake_anthropic_messages_stream_iterator_preserves_stop_reason PASSED

…ng format for Claude Code

Fixes #20187 - When using websearch_interception in Bedrock with Claude Code:

1. Output tokens were showing as 0 because the agentic loop response wasn't
   being converted back to streaming format
2. The response from the agentic loop (follow-up request) was returned as a
   non-streaming dict, but Claude Code expects a streaming response

This fix adds streaming format conversion for the agentic loop response when
the original request was streaming (detected via the
websearch_interception_converted_stream flag in logging_obj).

The fix applies to both:
- Anthropic Messages API (_call_agentic_completion_hooks)
- Chat Completions API (_call_agentic_chat_completion_hooks)

The fix ensures:
- Output tokens are correctly included in the message_delta event
- stop_reason is properly preserved
- The response format matches what Claude Code expects

Note: This fix was previously in PR #20631 but was merged to a staging branch
(litellm_oss_staging_02_07_2026) and never made it to main.
@vercel

vercel bot commented Feb 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
litellm | Ready | Preview, Comment | Feb 22, 2026 4:33am


@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@greptile-apps
Contributor

greptile-apps bot commented Feb 22, 2026

Greptile Summary

This PR fixes a streaming format mismatch when websearch_interception is used with Bedrock + Claude Code. When the agentic loop runs (after a web search tool call), the follow-up LLM response is non-streaming, but the original client expects a stream. The fix adds conversion logic inside the agentic loop paths of both _call_agentic_completion_hooks (Anthropic Messages API) and _call_agentic_chat_completion_hooks (Chat Completions API), matching the same pattern already used in the non-agentic fallback code on main.

  • Adds FakeAnthropicMessagesStreamIterator wrapping for agentic Anthropic Messages responses when websearch_interception_converted_stream flag is set
  • Adds convert_model_response_to_streaming wrapping for agentic Chat Completions responses under the same flag
  • Two new unit tests verify output_tokens and stop_reason are preserved in the fake stream's message_delta event
  • Tests are mock-only with no network calls, following CI requirements
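A mock-only test in the spirit of the two the PR adds might look like the sketch below. The event generator is a local stand-in for litellm's FakeAnthropicMessagesStreamIterator, not the real class, and the response dict is illustrative.

```python
# Hypothetical mock-only test; no network calls, per the CI requirement.

def fake_stream_events(response: dict):
    """Yield a minimal Anthropic-style event sequence for a finished response."""
    yield {"type": "message_start"}
    yield {
        "type": "message_delta",
        "delta": {"stop_reason": response.get("stop_reason")},
        "usage": {"output_tokens": response["usage"]["output_tokens"]},
    }
    yield {"type": "message_stop"}


def test_fake_stream_includes_output_tokens_and_stop_reason():
    response = {"stop_reason": "end_turn", "usage": {"output_tokens": 42}}
    events = list(fake_stream_events(response))
    message_delta = next(e for e in events if e["type"] == "message_delta")
    # The regression under test: output_tokens must not be 0/missing,
    # and stop_reason must survive the conversion.
    assert message_delta["usage"]["output_tokens"] == 42
    assert message_delta["delta"]["stop_reason"] == "end_turn"
```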

Confidence Score: 4/5

  • This PR is safe to merge — it follows existing patterns and only activates under the specific websearch interception flag.
  • The code changes are well-scoped and follow the same conversion patterns already established in the non-agentic fallback paths on main. The logic is guarded by the websearch_interception_converted_stream flag so it only activates when relevant. Both Anthropic Messages and Chat Completions paths are covered with appropriate type checks. The tests are mock-only and verify the critical regression (output_tokens and stop_reason). Minor style note: inline imports follow the existing pattern in these methods though CLAUDE.md prefers module-level imports.
  • No files require special attention — the core logic in litellm/llms/custom_httpx/llm_http_handler.py is well-guarded and follows existing patterns.

Important Files Changed

Filename | Overview
litellm/llms/custom_httpx/llm_http_handler.py | Adds fake-stream conversion inside both _call_agentic_completion_hooks and _call_agentic_chat_completion_hooks for when websearch_interception_converted_stream is set. The logic mirrors the existing non-agentic fallback paths already on main. No new bugs introduced.
tests/test_litellm/integrations/websearch_interception/test_websearch_interception_handler.py | Two new mock-only unit tests verifying FakeAnthropicMessagesStreamIterator correctly includes output_tokens and stop_reason in the message_delta event. No network calls, compliant with CI requirements.

Sequence Diagram

sequenceDiagram
    participant Client as Client (Claude Code)
    participant Proxy as LiteLLM Proxy
    participant Hook as WebSearch Interception Hook
    participant LLM as LLM Provider (Bedrock)
    participant Search as Search Provider

    Client->>Proxy: POST /v1/messages (stream=True)
    Proxy->>Hook: async_pre_call_deployment_hook
    Hook-->>Proxy: stream=False (converted)
    Proxy->>LLM: Non-streaming request
    LLM-->>Proxy: Non-streaming response (with tool_use)
    Proxy->>Hook: async_should_run_agentic_loop
    Hook-->>Proxy: should_run=True
    Proxy->>Hook: async_run_agentic_loop
    Hook->>Search: Execute web search
    Search-->>Hook: Search results
    Hook->>LLM: Follow-up request (non-streaming)
    LLM-->>Hook: Final response (dict)
    Hook-->>Proxy: agentic_response (dict)
    Note over Proxy: websearch_converted_stream=True
    Proxy->>Proxy: Wrap in FakeAnthropicMessagesStreamIterator
    Proxy-->>Client: Streaming SSE response (message_start, content_block_delta, message_delta, message_stop)

Last reviewed commit: 990cda8


@greptile-apps greptile-apps bot left a comment


2 files reviewed, 1 comment


Comment on lines +4459 to +4466
from typing import cast

from litellm.llms.anthropic.experimental_pass_through.messages.fake_stream_iterator import (
    FakeAnthropicMessagesStreamIterator,
)
from litellm.types.llms.anthropic_messages.anthropic_response import (
    AnthropicMessagesResponse,
)

Inline imports inside method body

Per CLAUDE.md style guidelines: "Avoid imports within methods — place all imports at the top of the file (module-level). Inline imports inside functions/methods make dependencies harder to trace and hurt readability."

Note: This follows the existing pattern in this method (e.g., verbose_logger is re-imported inline at line 4408 despite being a top-level import). Ideally these would be consolidated at the module level, but this is a pre-existing pattern rather than a new concern.

Context Used: Context from dashboard - CLAUDE.md (source)




Development

Successfully merging this pull request may close these issues.

[Bug]: Claude Code shows 0 output tokens when using websearch_interception in bedrock and tool call not recorded in request logs

2 participants