fix(websearch_interception): Convert agentic loop response to streaming format for Claude Code #20631
Merged
krrishdholakia merged 13 commits into litellm_oss_staging_02_07_2026 on Feb 7, 2026
Conversation
…CP + Agent guardrail support (#20619)

- fix: fix styling
- fix(custom_code_guardrail.py): add HTTP support for custom code guardrails — lets users call external guardrails on LiteLLM with minimal code changes (no custom handlers) and test guardrail integrations more easily
- feat(a2a/): add guardrails for agent interactions, so the same guardrails used for LLMs can be applied to agents as well
- fix(a2a/): support passing guardrails to a2a from the UI
- style(code-editor): allow editing custom code guardrails on the UI, and add examples of pre/post calls for custom code guardrails
- feat(mcp/): support custom code guardrails for MCP calls, allowing custom code guardrails to run on MCP input
- feat(chatui.tsx): support guardrails on MCP tool calls in the playground
…20618)

- fix(mypy): resolve missing return statements and type-casting issues
- fix(pangea): use elif to prevent UnboundLocalError and handle None messages

Addresses Greptile review feedback:
- Make branches mutually exclusive using elif so input_messages is not overwritten
- Handle the case where data.get('messages') returns None, to avoid passing an invalid payload to the Pangea API

Co-authored-by: Shin <shin@openclaw.ai>
…lable on Internet (#20607)

- update MCPAuthenticatedUser
- add available_on_public_internet for MCPs
- update claude.md
- init IPAddressUtils
- init available_on_public_internet
- add on REST endpoints
- filter with IP
- TestIsInternalIp
- _extract_mcp_headers_from_request
- init get_mcp_client_ip
- _get_general_settings
- allowed_server_ids
- address PR comments
- get_mcp_server_by_name fix
- fix server
- fix review comments
- get_public_mcp_servers
- address _get_allowed_mcp_servers
- test fix
- fix linting
- init UI types
- add UI for managing MCP private/public
- add UI
- fixes
- add to schema
- add types
- fix endpoint
- add endpoint
- update manager
- test mcp
- don't use external party for IP address
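The IP-based filtering described in this commit can be sketched with Python's stdlib `ipaddress` module. This is a minimal illustration of the idea (internal clients see all MCP servers, external clients only see ones flagged public); the function and field names below follow the commit's bullet list, but the actual `IPAddressUtils` implementation may differ:

```python
import ipaddress

def is_internal_ip(ip_str: str) -> bool:
    """Return True for private, loopback, or link-local addresses."""
    try:
        ip = ipaddress.ip_address(ip_str)
    except ValueError:
        # Unparseable addresses are treated as internal-only (fail closed).
        return True
    return ip.is_private or ip.is_loopback or ip.is_link_local

def filter_public_servers(servers: list, client_ip: str) -> list:
    """Hide servers not flagged available_on_public_internet from external clients."""
    if is_internal_ip(client_ip):
        return servers  # internal callers see everything
    return [s for s in servers if s.get("available_on_public_internet")]
```

The key design choice is failing closed: an address that cannot be parsed is treated as untrusted for public exposure purposes rather than granted internal access.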
[Fix] /key/list user_id Empty String Edge Case
- a2a_protocol/exception_mapping_utils.py: fix type ignore comment for None assignment
- caching/redis_cache.py: add type ignore for async ping return type
- caching/redis_cluster_cache.py: add type ignore for async ping return type
- llms/deprecated_providers/palm.py: add type ignore for palm.generate_text
- proxy/auth/handle_jwt.py: add type ignore for jwt.decode options argument

All changes add appropriate type: ignore comments to handle library typing inconsistencies.
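The pattern used in those files is an error-code-scoped `type: ignore` comment, which suppresses a single mypy check while leaving all others active. A hypothetical, self-contained example (not the PR's actual code — the redis/jwt calls are stubbed here with a plain function):

```python
from typing import Optional

def ping_stub() -> Optional[bool]:
    """Stand-in for a client whose type stub says the call may return None,
    even though at runtime it always returns a bool in this code path."""
    return True

def health_check() -> bool:
    result = ping_stub()
    # mypy would flag Optional[bool] -> bool here; scoping the ignore to
    # [return-value] documents exactly which check is being suppressed.
    return result  # type: ignore[return-value]
```

Scoping the ignore (`# type: ignore[return-value]` rather than bare `# type: ignore`) keeps mypy able to report any *other* error later introduced on the same line.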
Replace text-embedding-004 with gemini-embedding-001. The old model was deprecated and returns 404: 'models/text-embedding-004 is not found for API version v1beta' Co-authored-by: Shin <shin@openclaw.ai>
…ng format when original request was streaming

Fixes #20187 - When using websearch_interception in Bedrock with Claude Code:
1. Output tokens were showing as 0 because the agentic loop response wasn't being converted back to streaming format
2. The response from the agentic loop (follow-up request) was returned as a non-streaming dict, but Claude Code expects a streaming response

This fix adds streaming format conversion for the agentic loop response when the original request was streaming (detected via the websearch_interception_converted_stream flag in logging_obj).

The fix ensures:
- Output tokens are correctly included in the message_delta event
- stop_reason is properly preserved
- The response format matches what Claude Code expects
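The conversion this commit describes — replaying a completed non-streaming Anthropic response as server-sent streaming events, with usage carried in `message_delta` — can be sketched as below. This is an illustrative simplification of what `FakeAnthropicMessagesStreamIterator` does per the commit message, not the actual LiteLLM implementation:

```python
import json

def fake_anthropic_stream(response: dict):
    """Yield SSE lines that replay a non-streaming Anthropic Messages
    response as streaming events (message_start .. message_stop)."""
    def event(name: str, payload: dict) -> str:
        return f"event: {name}\ndata: {json.dumps(payload)}\n\n"

    usage = response.get("usage", {})
    yield event("message_start", {
        "type": "message_start",
        "message": {**response, "content": [], "stop_reason": None},
    })
    for i, block in enumerate(response.get("content", [])):
        yield event("content_block_start",
                    {"type": "content_block_start", "index": i, "content_block": block})
        yield event("content_block_stop",
                    {"type": "content_block_stop", "index": i})
    # The crux of the fix: message_delta must carry stop_reason and
    # usage.output_tokens, otherwise clients report 0 output tokens.
    yield event("message_delta", {
        "type": "message_delta",
        "delta": {"stop_reason": response.get("stop_reason")},
        "usage": {"output_tokens": usage.get("output_tokens", 0)},
    })
    yield event("message_stop", {"type": "message_stop"})
```

Per Anthropic's streaming spec, cumulative output token usage belongs on the `message_delta` event, which is why a fake stream that omits it shows up to clients as 0 output tokens.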
Greptile Overview

Greptile Summary

This PR fixes a bug where Claude Code showed 0 output tokens when using `websearch_interception`.

Changes:
Technical Details:
Confidence Score: 5/5
| Filename | Overview |
|---|---|
| litellm/llms/custom_httpx/llm_http_handler.py | Adds streaming conversion for agentic loop responses when websearch interception converts stream=True to stream=False. Uses FakeAnthropicMessagesStreamIterator to properly format the response with output tokens and stop_reason. |
| tests/test_litellm/integrations/websearch_interception/test_websearch_interception_handler.py | Adds regression tests to verify FakeAnthropicMessagesStreamIterator correctly includes output_tokens in message_delta event and preserves stop_reason, addressing the reported issue. |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client as Claude Code Client
    participant Handler as WebSearchInterception
    participant LLM as LLM Provider
    participant AgenticLoop as Agentic Loop
    participant Converter as FakeStreamIterator
    Client->>Handler: Request with stream=True and websearch tools
    Handler->>Handler: Convert stream=True to stream=False
    Handler->>Handler: Set websearch_interception_converted_stream flag
    Handler->>LLM: Make request with stream=False
    LLM->>Handler: Non-streaming response with tool_use
    Handler->>AgenticLoop: Execute agentic loop with websearch
    AgenticLoop->>AgenticLoop: Execute search queries
    AgenticLoop->>LLM: Follow-up request with search results
    LLM->>AgenticLoop: Non-streaming response dict
    AgenticLoop->>Handler: Return agentic response dict
    Handler->>Handler: Check websearch_interception_converted_stream flag
    Handler->>Converter: Convert dict to fake stream
    Converter->>Converter: Create streaming events
    Converter->>Converter: Include output_tokens in message_delta
    Converter->>Client: Return streaming response with correct tokens
```
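The flag check in the handler step of the diagram above can be sketched as follows. This is a hedged illustration of the control flow described in the PR — the flag name comes from the PR description, but the surrounding attribute access and event shapes are simplified stand-ins, not LiteLLM's actual code:

```python
def was_stream_converted(logging_obj: dict) -> bool:
    """Check the flag the handler sets when it rewrites stream=True -> stream=False."""
    return bool(logging_obj.get("websearch_interception_converted_stream"))

def finalize_agentic_response(response: dict, logging_obj: dict):
    """Return the dict as-is for genuinely non-streaming callers, or
    replay it as a minimal list of pseudo-streaming events when the
    original request was streaming."""
    if not was_stream_converted(logging_obj):
        return response
    # Minimal replay: message_delta must carry output_tokens + stop_reason.
    return [
        {"type": "message_start", "message": {"id": response.get("id")}},
        {"type": "message_delta",
         "delta": {"stop_reason": response.get("stop_reason")},
         "usage": {"output_tokens": response.get("usage", {}).get("output_tokens", 0)}},
        {"type": "message_stop"},
    ]
```

The point of keying off a flag rather than the request's current `stream` value is that by this stage the request has already been rewritten to `stream=False`, so only the flag remembers what the client originally asked for.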
Merged commit 8f66873 into litellm_oss_staging_02_07_2026
55 of 67 checks passed
shin-bot-litellm added a commit that referenced this pull request on Feb 22, 2026
…ng format for Claude Code

Fixes #20187 - When using websearch_interception in Bedrock with Claude Code:
1. Output tokens were showing as 0 because the agentic loop response wasn't being converted back to streaming format
2. The response from the agentic loop (follow-up request) was returned as a non-streaming dict, but Claude Code expects a streaming response

This fix adds streaming format conversion for the agentic loop response when the original request was streaming (detected via the websearch_interception_converted_stream flag in logging_obj).

The fix applies to both:
- Anthropic Messages API (_call_agentic_completion_hooks)
- Chat Completions API (_call_agentic_chat_completion_hooks)

The fix ensures:
- Output tokens are correctly included in the message_delta event
- stop_reason is properly preserved
- The response format matches what Claude Code expects

Note: This fix was previously in PR #20631 but was merged to a staging branch (litellm_oss_staging_02_07_2026) and never made it to main.
Summary
Fixes #20187
When using `websearch_interception` with Bedrock and Claude Code, the output tokens were showing as 0 because the agentic loop response wasn't being converted back to streaming format.

Problem
- The `websearch_interception` handler converts `stream=True` to `stream=False` to intercept the response

Solution
After the agentic loop completes, check if the original request was streaming (via the `websearch_interception_converted_stream` flag). If so, convert the agentic loop's non-streaming response to streaming format using `FakeAnthropicMessagesStreamIterator`.

This ensures:
- Output tokens are correctly included in the `message_delta` event (as per Anthropic's streaming spec)
- `stop_reason` is properly preserved

Testing
- Added regression tests verifying `FakeAnthropicMessagesStreamIterator` correctly includes output tokens and preserves `stop_reason`

Changes
- `litellm/llms/custom_httpx/llm_http_handler.py`: Added streaming conversion for agentic loop response
- `tests/test_litellm/integrations/websearch_interception/test_websearch_interception_handler.py`: Added regression tests