fix(bedrock): filter internal json_tool_call when mixed with real tools #20916
jquinter wants to merge 9 commits into BerriAI:main
Conversation
Greptile Overview

Greptile Summary: Filters the internal json_tool_call tool out of tool_calls when Bedrock returns it alongside real user-defined tools.
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/llms/bedrock/chat/converse_transformation.py | Extracts _filter_json_mode_tools() to handle 3 scenarios for json_tool_call filtering; fixes .pop() → .get() to avoid mutating optional_params. Minor style note: inline import json should be at module level per CLAUDE.md. |
| litellm/llms/bedrock/chat/invoke_handler.py | Adds json_mode and _current_tool_name tracking to AWSEventStreamDecoder. Suppresses json_tool_call start/stop events and converts delta chunks to text content when in json_mode. Logic is sound and handles state transitions correctly. |
| litellm/llms/bedrock/chat/converse_handler.py | Passes json_mode to AWSEventStreamDecoder constructor in make_sync_call. Minimal, correct plumbing change. |
| tests/test_litellm/llms/bedrock/chat/test_converse_transformation.py | Adds 4 well-structured mock-only tests covering: mixed tools filtering, optional_params non-mutation, streaming json_tool_call suppression, and backward compatibility with json_mode=False. |
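The three-scenario filtering described in the table above can be sketched as follows. This is a minimal illustration, not the PR's actual implementation — the function name mirrors the PR but the body and simplified tool dicts are assumptions:

```python
from typing import List, Optional

RESPONSE_FORMAT_TOOL_NAME = "json_tool_call"  # mirrors litellm.constants

def filter_json_mode_tools(
    json_mode: Optional[bool],
    tools: Optional[List[dict]],
    message: dict,
) -> Optional[List[dict]]:
    """Illustrative sketch of the 3 scenarios handled by
    _filter_json_mode_tools(); not the repo's exact code."""
    if not json_mode or not tools:
        return tools  # scenario 3: no json_tool_call involved, pass through
    internal = [t for t in tools if t["function"].get("name") == RESPONSE_FORMAT_TOOL_NAME]
    real = [t for t in tools if t["function"].get("name") != RESPONSE_FORMAT_TOOL_NAME]
    if internal:
        # Preserve the structured output as message content.
        args = internal[0]["function"].get("arguments")
        if args is not None:
            message["content"] = args
    if not real:
        return None  # scenario 1: only json_tool_call -> converted to content
    return real  # scenario 2: mixed -> keep only the real tools
```

In the mixed case the internal tool's arguments still land in `message["content"]`, so the structured output from `response_format` is not silently lost.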
Sequence Diagram
sequenceDiagram
participant Client
participant LiteLLM
participant Bedrock
Client->>LiteLLM: completion(tools + response_format)
LiteLLM->>Bedrock: Converse API (tools + json_tool_call)
Bedrock-->>LiteLLM: Response with json_tool_call + real tools
alt Non-streaming
LiteLLM->>LiteLLM: _filter_json_mode_tools()
alt Only json_tool_call
LiteLLM->>LiteLLM: Convert to text content
else Mixed with real tools
LiteLLM->>LiteLLM: Filter out json_tool_call, keep real tools
end
else Streaming
LiteLLM->>LiteLLM: AWSEventStreamDecoder (json_mode=True)
loop Per chunk
alt json_tool_call chunk
LiteLLM->>LiteLLM: Suppress start/stop, convert delta to text
else Real tool chunk
LiteLLM->>LiteLLM: Pass through normally
end
end
end
LiteLLM-->>Client: Response (only real tools in tool_calls)
@greptile-apps re-review please!
Greptile Overview

Greptile Summary: Fixes an issue where the Bedrock Converse API returns both the internal json_tool_call and real user-defined tools, causing the internal tool to leak into tool_calls.
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/llms/bedrock/chat/converse_transformation.py | New _filter_json_mode_tools static method handles 3 scenarios for json_tool_call filtering. Changed optional_params.pop() to .get() to avoid mutation. Logic is sound and well-structured. |
| litellm/llms/bedrock/chat/invoke_handler.py | Added json_mode and _current_tool_name tracking to AWSEventStreamDecoder. Suppresses json_tool_call start/stop/delta chunks in streaming and converts deltas to text content. State tracking relies on Bedrock's sequential content block delivery. |
| litellm/llms/bedrock/chat/converse_handler.py | One-line change to pass json_mode to AWSEventStreamDecoder constructor in make_sync_call. |
| tests/test_litellm/llms/bedrock/chat/test_converse_transformation.py | 4 new mock-only tests covering mixed tools, optional_params mutation, streaming filtering, and backward compatibility. No real network calls. |
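The per-chunk state tracking the review describes for `AWSEventStreamDecoder` can be sketched as a small filter. This is a simplified stand-in under stated assumptions — the class name, chunk shapes, and event names here are illustrative, not the decoder's real event types:

```python
RESPONSE_FORMAT_TOOL_NAME = "json_tool_call"  # mirrors litellm.constants

class JsonModeChunkFilter:
    """Illustrative per-chunk state machine mirroring the decoder change:
    suppress start/stop events for the internal tool and re-emit its
    argument deltas as text content. Chunk dicts are simplified stand-ins."""

    def __init__(self, json_mode: bool):
        self.json_mode = json_mode
        self._current_tool_name = None  # relies on sequential content blocks

    def process(self, chunk: dict):
        if not self.json_mode:
            return chunk  # backward compat: pass everything through
        kind = chunk.get("type")
        if kind == "tool_start":
            self._current_tool_name = chunk["name"]
            if chunk["name"] == RESPONSE_FORMAT_TOOL_NAME:
                return None  # suppress start event for the internal tool
        elif kind == "tool_delta":
            if self._current_tool_name == RESPONSE_FORMAT_TOOL_NAME:
                # Convert tool-argument delta into plain text content.
                return {"type": "text", "text": chunk["arguments"]}
        elif kind == "tool_stop":
            name, self._current_tool_name = self._current_tool_name, None
            if name == RESPONSE_FORMAT_TOOL_NAME:
                return None  # suppress stop event for the internal tool
        return chunk
```

As the review notes, this style of tracking assumes Bedrock delivers content blocks sequentially, so a single `_current_tool_name` slot is sufficient.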
Sequence Diagram
sequenceDiagram
participant Client
participant LiteLLM
participant Bedrock
Client->>LiteLLM: completion(tools + response_format)
LiteLLM->>Bedrock: converse(tools + json_tool_call)
Bedrock-->>LiteLLM: response with json_tool_call + real tools
alt Non-Streaming
LiteLLM->>LiteLLM: _filter_json_mode_tools()
Note over LiteLLM: Converts json_tool_call to text content<br/>Keeps real tool_calls intact
else Streaming
loop For each content block
alt json_tool_call block
LiteLLM->>LiteLLM: Suppress start/stop chunks
LiteLLM->>LiteLLM: Convert delta to text content
else Real tool block
LiteLLM->>LiteLLM: Pass through as tool_call
end
end
end
LiteLLM-->>Client: response with only real tool_calls + text content
@jquinter have you validated that this PR solves the problem by doing real tests with Bedrock?
Yes, I created two scripts: one simulating a request, and the one included here making a real request to Bedrock.

Result doing real requests (before vs after PR)

Script doing real requests:

"""
Demo: Bedrock Converse API json_tool_call fix — live API requests
Makes a real Bedrock API call and shows the before/after behavior by
capturing the raw tool_calls list BEFORE _filter_json_mode_tools processes it.
This proves the bug from https://github.com/BerriAI/litellm/issues/18381
exists in the raw Bedrock response and that this PR filters it correctly.
Requires AWS credentials configured for Bedrock access.
Usage:
poetry run python scripts/demo_bedrock_live.py
poetry run python scripts/demo_bedrock_live.py --model bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0
poetry run python scripts/demo_bedrock_live.py --region us-west-2
"""
import argparse
import json
import os
from copy import deepcopy
from typing import List, Optional
import litellm
from litellm.constants import RESPONSE_FORMAT_TOOL_NAME
from litellm.llms.bedrock.chat.converse_transformation import AmazonConverseConfig
from litellm.types.utils import ChatCompletionToolCallChunk
# ---------------------------------------------------------------------------
# Formatting
# ---------------------------------------------------------------------------
BOLD = "\033[1m"
GREEN = "\033[1;32m"
RED = "\033[1;31m"
YELLOW = "\033[1;33m"
CYAN = "\033[1;36m"
DIM = "\033[2m"
RESET = "\033[0m"
def heading(text: str):
print(f"\n{BOLD}{'=' * 72}")
print(f" {text}")
print(f"{'=' * 72}{RESET}\n")
def subheading(text: str):
print(f"\n{BOLD}--- {text} ---{RESET}\n")
def print_tool_calls_dicts(label: str, tool_calls):
"""Print tool_calls from list of dicts (raw capture)."""
if not tool_calls:
print(f" {label}: {YELLOW}None{RESET}")
return
for i, tc in enumerate(tool_calls):
name = tc["function"]["name"]
args = tc["function"].get("arguments", "")
color = RED if name == RESPONSE_FORMAT_TOOL_NAME else GREEN
args_short = args[:120] + "..." if len(args) > 120 else args
print(f" {label}[{i}]: {color}{name}{RESET}({args_short})")
def print_tool_calls_objects(label: str, tool_calls):
"""Print tool_calls from litellm response objects."""
if not tool_calls:
print(f" {label}: {YELLOW}None{RESET}")
return
for i, tc in enumerate(tool_calls):
name = tc.function.name
args = tc.function.arguments or ""
color = RED if name == RESPONSE_FORMAT_TOOL_NAME else GREEN
args_short = args[:120] + "..." if len(args) > 120 else args
print(f" {label}[{i}]: {color}{name}{RESET}({args_short})")
# ---------------------------------------------------------------------------
# Old logic — verbatim from main branch before this PR
# ---------------------------------------------------------------------------
def old_logic(tools, chat_completion_message):
"""
Simulates the old _transform_response code (main branch):
if (
json_mode is True
and tools is not None
and len(tools) == 1 # <-- BUG
and tools[0]["function"].get("name") == RESPONSE_FORMAT_TOOL_NAME
):
# Extract content — only works for single json_tool_call
...
else:
chat_completion_message["tool_calls"] = tools # ALL tools leak
"""
if (
tools is not None
and len(tools) == 1
and tools[0]["function"].get("name") == RESPONSE_FORMAT_TOOL_NAME
):
content = tools[0]["function"].get("arguments")
if content is not None:
chat_completion_message["content"] = content
else:
chat_completion_message["tool_calls"] = tools
# ---------------------------------------------------------------------------
# Capture hook — monkeypatches _filter_json_mode_tools to grab raw tools
# ---------------------------------------------------------------------------
captured_raw_tools: List[ChatCompletionToolCallChunk] = []
_original_filter = AmazonConverseConfig._filter_json_mode_tools
def _capturing_filter(json_mode, tools, chat_completion_message):
"""Wraps _filter_json_mode_tools to capture the raw tools before filtering."""
captured_raw_tools.clear()
if tools:
captured_raw_tools.extend(deepcopy(tools))
return _original_filter(json_mode, tools, chat_completion_message)
# ---------------------------------------------------------------------------
# Test: Non-streaming with tools + response_format
# ---------------------------------------------------------------------------
def test_non_streaming(model: str):
subheading("Non-streaming: tools + response_format (real Bedrock call)")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and state, e.g. San Francisco, CA",
},
},
"required": ["location"],
},
},
}
]
response_format = {
"type": "json_schema",
"json_schema": {
"name": "weather_response",
"schema": {
"type": "object",
"properties": {
"summary": {"type": "string"},
"needs_tool_call": {"type": "boolean"},
},
"required": ["summary", "needs_tool_call"],
},
},
}
messages = [
{
"role": "user",
"content": (
"What's the weather in San Francisco? "
"You MUST call the get_weather tool to check the weather."
),
}
]
# Install the capture hook (must wrap in staticmethod to avoid self injection)
AmazonConverseConfig._filter_json_mode_tools = staticmethod(_capturing_filter)
try:
response = litellm.completion(
model=model,
messages=messages,
tools=tools,
response_format=response_format,
)
finally:
# Restore original as staticmethod
AmazonConverseConfig._filter_json_mode_tools = staticmethod(_original_filter)
msg = response.choices[0].message
raw_tools = list(captured_raw_tools)
# --- Show what Bedrock returned (raw) ---
print(f" {CYAN}{BOLD}Raw tools from Bedrock (before filtering):{RESET}")
print_tool_calls_dicts("raw_tools", raw_tools)
has_json_tool = any(
t["function"]["name"] == RESPONSE_FORMAT_TOOL_NAME for t in raw_tools
)
has_real_tool = any(
t["function"]["name"] != RESPONSE_FORMAT_TOOL_NAME for t in raw_tools
)
if has_json_tool and has_real_tool:
print(f"\n {RED}{BOLD}Bedrock returned BOTH json_tool_call AND get_weather!{RESET}")
print(f" {DIM}This is the exact scenario that triggers issue #18381.{RESET}")
# Show old behavior
print(f"\n {RED}{BOLD}BEFORE this PR (old logic would produce):{RESET}")
msg_old = {"role": "assistant", "content": ""}
old_logic(tools=deepcopy(raw_tools), chat_completion_message=msg_old)
print_tool_calls_dicts("tool_calls", msg_old.get("tool_calls"))
print(f" content: {msg_old.get('content', '') or '(empty)'}")
leaked = any(
tc["function"]["name"] == RESPONSE_FORMAT_TOOL_NAME
for tc in msg_old.get("tool_calls", [])
)
if leaked:
print(f" {RED}^^^ BUG: json_tool_call leaked into tool_calls!{RESET}")
# Show new behavior (what litellm actually returned)
print(f"\n {GREEN}{BOLD}AFTER this PR (litellm.completion() returned):{RESET}")
print_tool_calls_objects("tool_calls", msg.tool_calls)
print(f" content: {msg.content or '(empty)'}")
no_leak = not msg.tool_calls or not any(
tc.function.name == RESPONSE_FORMAT_TOOL_NAME for tc in msg.tool_calls
)
if no_leak:
print(f"\n {GREEN}FIXED: json_tool_call filtered, real tool preserved.{RESET}")
return True
else:
print(f"\n {RED}json_tool_call still leaking!{RESET}")
return False
elif has_json_tool and not has_real_tool:
print(f"\n {YELLOW}Bedrock returned only json_tool_call (no real tool).{RESET}")
print(f" {DIM}The model chose not to call get_weather this time.")
print(f" This case is handled correctly by both old and new code.{RESET}")
print(f"\n litellm result:")
print_tool_calls_objects("tool_calls", msg.tool_calls)
print(f" content: {msg.content or '(empty)'}")
no_leak = not msg.tool_calls or not any(
tc.function.name == RESPONSE_FORMAT_TOOL_NAME for tc in msg.tool_calls
)
print(f"\n {'PASS' if no_leak else 'FAIL'}: json_tool_call {'filtered' if no_leak else 'leaked'}")
print(f"\n {YELLOW}TIP: Re-run — the model sometimes returns both tools.{RESET}")
return no_leak
elif has_real_tool and not has_json_tool:
print(f"\n {YELLOW}Bedrock returned only get_weather (no json_tool_call).{RESET}")
print(f" {DIM}The model chose not to use the json_tool_call tool this time.")
print(f" No filtering needed — both old and new code work fine.{RESET}")
print(f"\n litellm result:")
print_tool_calls_objects("tool_calls", msg.tool_calls)
print(f" content: {msg.content or '(empty)'}")
print(f"\n {YELLOW}TIP: Re-run — the model sometimes returns both tools.{RESET}")
return True
else:
print(f"\n {YELLOW}Bedrock returned no tools at all.{RESET}")
print(f" content: {msg.content or '(empty)'}")
print(f"\n {YELLOW}TIP: Re-run or try a different prompt/model.{RESET}")
return True
# ---------------------------------------------------------------------------
# Test: Streaming with tools + response_format
# ---------------------------------------------------------------------------
def test_streaming(model: str):
subheading("Streaming: tools + response_format (real Bedrock call)")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and state, e.g. San Francisco, CA",
},
},
"required": ["location"],
},
},
}
]
response_format = {
"type": "json_schema",
"json_schema": {
"name": "weather_response",
"schema": {
"type": "object",
"properties": {
"summary": {"type": "string"},
"needs_tool_call": {"type": "boolean"},
},
"required": ["summary", "needs_tool_call"],
},
},
}
messages = [
{
"role": "user",
"content": (
"What's the weather in San Francisco? "
"You MUST call the get_weather tool to check the weather."
),
}
]
response = litellm.completion(
model=model,
messages=messages,
tools=tools,
response_format=response_format,
stream=True,
)
collected_text = ""
collected_tool_calls = {}
finish_reason = None
for chunk in response:
choice = chunk.choices[0]
if choice.finish_reason:
finish_reason = choice.finish_reason
delta = choice.delta
if delta.content:
collected_text += delta.content
if delta.tool_calls:
for tc in delta.tool_calls:
idx = tc.index
if idx not in collected_tool_calls:
collected_tool_calls[idx] = {"name": None, "args": ""}
if tc.function and tc.function.name:
collected_tool_calls[idx]["name"] = tc.function.name
if tc.function and tc.function.arguments:
collected_tool_calls[idx]["args"] += tc.function.arguments
print(f" finish_reason: {finish_reason}")
print(f" content: {collected_text[:200] if collected_text else '(empty)'}")
if collected_tool_calls:
for idx, tc in sorted(collected_tool_calls.items()):
name = tc["name"] or "(unknown)"
color = RED if name == RESPONSE_FORMAT_TOOL_NAME else GREEN
args_short = tc["args"][:120] + "..." if len(tc["args"]) > 120 else tc["args"]
print(f" tool_calls[{idx}]: {color}{name}{RESET}({args_short})")
else:
print(f" tool_calls: {YELLOW}None{RESET}")
leaked = any(
tc["name"] == RESPONSE_FORMAT_TOOL_NAME
for tc in collected_tool_calls.values()
)
if leaked:
print(f"\n {RED}FAIL: json_tool_call leaked into streaming tool_calls!{RESET}")
else:
print(f"\n {GREEN}PASS: No json_tool_call in streaming output.{RESET}")
return not leaked
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def main():
parser = argparse.ArgumentParser(
description="Demo: Bedrock json_tool_call fix (live API requests)"
)
parser.add_argument(
"--model",
default="bedrock/us.anthropic.claude-3-5-haiku-20241022-v1:0",
help="Model to test (default: bedrock/us.anthropic.claude-3-5-haiku-20241022-v1:0)",
)
parser.add_argument(
"--region",
default=None,
help="AWS region (e.g. us-east-1, us-west-2). Overrides AWS_REGION_NAME env var.",
)
args = parser.parse_args()
if args.region:
os.environ["AWS_REGION_NAME"] = args.region
region_display = (
args.region
or os.environ.get("AWS_REGION_NAME")
or os.environ.get("AWS_REGION")
or "(default)"
)
heading(
f"Bedrock json_tool_call fix — live API demo\n"
f" model: {args.model}\n"
f" region: {region_display}\n"
f" issue: https://github.com/BerriAI/litellm/issues/18381"
)
tests = [
("non_streaming", test_non_streaming),
("streaming", test_streaming),
]
results = {}
for name, fn in tests:
try:
results[name] = fn(args.model)
except Exception as e:
print(f"\n {RED}ERROR: {e}{RESET}")
results[name] = None
# Summary
heading("Summary")
all_ok = True
for name, result in results.items():
if result is None:
status = f"{YELLOW}SKIP (error){RESET}"
all_ok = False
elif result:
status = f"{GREEN}PASS{RESET}"
else:
status = f"{RED}FAIL{RESET}"
all_ok = False
print(f" {name}: {status}")
if all_ok:
print(f"\n {GREEN}{BOLD}All tests passed! json_tool_call is properly filtered.{RESET}")
else:
ran = [v for v in results.values() if v is not None]
if ran and all(v is True for v in ran):
print(f"\n {GREEN}{BOLD}Runnable tests passed!{RESET} {YELLOW}Some skipped.{RESET}")
else:
print(f"\n {RED}{BOLD}Some tests failed or were skipped.{RESET}")
return 0 if all_ok else 1
if __name__ == "__main__":
raise SystemExit(main())

Result doing simulated requests (before vs after PR)

Script simulating processing requests:

"""
Demo: Bedrock Converse API json_tool_call filtering — before vs after
Shows the exact bug from https://github.com/BerriAI/litellm/issues/18381
by replaying a realistic Bedrock response through the old logic (before)
and the new _filter_json_mode_tools logic (after this PR).
No AWS credentials required — this exercises the transformation code directly.
Usage:
poetry run python scripts/demo_bedrock_json_tool_call_fix.py
"""
import json
from copy import deepcopy
from litellm.constants import RESPONSE_FORMAT_TOOL_NAME
from litellm.llms.bedrock.chat.converse_transformation import AmazonConverseConfig
# ---------------------------------------------------------------------------
# Formatting helpers
# ---------------------------------------------------------------------------
BOLD = "\033[1m"
GREEN = "\033[1;32m"
RED = "\033[1;31m"
YELLOW = "\033[1;33m"
CYAN = "\033[1;36m"
DIM = "\033[2m"
RESET = "\033[0m"
def heading(text: str):
print(f"\n{BOLD}{'=' * 72}")
print(f" {text}")
print(f"{'=' * 72}{RESET}\n")
def subheading(text: str):
print(f"\n{BOLD}--- {text} ---{RESET}\n")
def print_tool_calls(label: str, tool_calls):
if not tool_calls:
print(f" {label}: {YELLOW}None{RESET}")
return
for i, tc in enumerate(tool_calls):
name = tc["function"]["name"]
args = tc["function"].get("arguments", "")
color = RED if name == RESPONSE_FORMAT_TOOL_NAME else GREEN
args_short = args[:100] + "..." if len(args) > 100 else args
print(f" {label}[{i}]: {color}{name}{RESET}({args_short})")
# ---------------------------------------------------------------------------
# Simulated Bedrock response — this is what _translate_message_content
# produces when Bedrock returns BOTH json_tool_call AND a real tool.
#
# This happens when the user passes tools + response_format. LiteLLM
# injects a json_tool_call tool, and Bedrock may call both.
# ---------------------------------------------------------------------------
MIXED_TOOLS = [
{
"id": "call_json_001",
"type": "function",
"function": {
"name": RESPONSE_FORMAT_TOOL_NAME, # "json_tool_call"
"arguments": json.dumps({
"summary": "Checking weather in San Francisco",
"needs_tool_call": True,
}),
},
},
{
"id": "call_real_001",
"type": "function",
"function": {
"name": "get_weather",
"arguments": json.dumps({"location": "San Francisco, CA"}),
},
},
]
ONLY_JSON_TOOL = [
{
"id": "call_json_002",
"type": "function",
"function": {
"name": RESPONSE_FORMAT_TOOL_NAME,
"arguments": json.dumps({
"summary": "Paris is cold in January with average highs around 7C",
"needs_tool_call": False,
}),
},
},
]
# ---------------------------------------------------------------------------
# Old logic (before this PR) — extracted verbatim from main branch
# ---------------------------------------------------------------------------
def old_logic(json_mode, tools, chat_completion_message):
"""
The code from main branch (converse_transformation.py lines 1679-1714):
if (
json_mode is True
and tools is not None
and len(tools) == 1 # <-- BUG
and tools[0]["function"].get("name") == RESPONSE_FORMAT_TOOL_NAME
):
# Extract content — only works for single json_tool_call
...
else:
chat_completion_message["tool_calls"] = tools # ALL tools leak
"""
if (
json_mode is True
and tools is not None
and len(tools) == 1
and tools[0]["function"].get("name") == RESPONSE_FORMAT_TOOL_NAME
):
json_mode_content_str = tools[0]["function"].get("arguments")
if json_mode_content_str is not None:
try:
response_data = json.loads(json_mode_content_str)
if (
isinstance(response_data, dict)
and "properties" in response_data
and len(response_data) == 1
):
response_data = response_data["properties"]
json_mode_content_str = json.dumps(response_data)
except json.JSONDecodeError:
pass
chat_completion_message["content"] = json_mode_content_str
else:
chat_completion_message["tool_calls"] = tools
# ---------------------------------------------------------------------------
# Scenario 1: Mixed json_tool_call + real tool
# ---------------------------------------------------------------------------
def scenario_mixed():
subheading("Scenario 1: Bedrock returns BOTH json_tool_call AND get_weather")
print(f" {DIM}This happens when using tools + response_format together.")
print(f" Bedrock calls the internal json_tool_call AND the user's real tool.{RESET}\n")
# --- BEFORE ---
print(f" {RED}{BOLD}BEFORE this PR (old logic):{RESET}")
msg_before = {"role": "assistant", "content": ""}
tools_before = deepcopy(MIXED_TOOLS)
old_logic(json_mode=True, tools=tools_before, chat_completion_message=msg_before)
print_tool_calls("tool_calls", msg_before.get("tool_calls"))
print(f" content: {msg_before.get('content', '') or '(empty)'}")
leaked_before = any(
tc["function"]["name"] == RESPONSE_FORMAT_TOOL_NAME
for tc in msg_before.get("tool_calls", [])
)
if leaked_before:
print(f"\n {RED}BUG: json_tool_call leaked into tool_calls!")
print(f" Consumers (e.g. OpenAI Agents SDK) try to dispatch it as a real")
print(f" tool and crash because no such tool exists.{RESET}")
# --- AFTER ---
print(f"\n {GREEN}{BOLD}AFTER this PR (new _filter_json_mode_tools):{RESET}")
msg_after = {"role": "assistant", "content": ""}
tools_after = deepcopy(MIXED_TOOLS)
filtered = AmazonConverseConfig._filter_json_mode_tools(
json_mode=True,
tools=tools_after,
chat_completion_message=msg_after,
)
print_tool_calls("tool_calls", filtered)
content_after = msg_after.get("content", "") or "(empty)"
print(f" content: {content_after}")
leaked_after = filtered and any(
tc["function"]["name"] == RESPONSE_FORMAT_TOOL_NAME
for tc in filtered
)
has_real = filtered and any(
tc["function"]["name"] == "get_weather"
for tc in filtered
)
if not leaked_after and has_real:
print(f"\n {GREEN}FIXED: json_tool_call filtered out, real tool preserved.")
print(f" Structured output preserved in message.content.{RESET}")
return True
else:
print(f"\n {RED}UNEXPECTED result.{RESET}")
return False
# ---------------------------------------------------------------------------
# Scenario 2: Only json_tool_call (no real tools)
# ---------------------------------------------------------------------------
def scenario_only_json():
subheading("Scenario 2: Bedrock returns only json_tool_call (no user tools)")
print(f" {DIM}This happens with response_format but no user-defined tools.")
print(f" Both old and new code handle this correctly.{RESET}\n")
# --- BEFORE ---
print(f" {CYAN}{BOLD}BEFORE this PR:{RESET}")
msg_before = {"role": "assistant", "content": ""}
tools_before = deepcopy(ONLY_JSON_TOOL)
old_logic(json_mode=True, tools=tools_before, chat_completion_message=msg_before)
print_tool_calls("tool_calls", msg_before.get("tool_calls"))
content_b = msg_before.get("content", "") or "(empty)"
print(f" content: {content_b}")
ok_before = "tool_calls" not in msg_before and msg_before.get("content")
print(f" {'OK' if ok_before else 'ISSUE'}: json_tool_call converted to content\n")
# --- AFTER ---
print(f" {CYAN}{BOLD}AFTER this PR:{RESET}")
msg_after = {"role": "assistant", "content": ""}
tools_after = deepcopy(ONLY_JSON_TOOL)
filtered = AmazonConverseConfig._filter_json_mode_tools(
json_mode=True,
tools=tools_after,
chat_completion_message=msg_after,
)
print_tool_calls("tool_calls", filtered)
content_a = msg_after.get("content", "") or "(empty)"
print(f" content: {content_a}")
ok_after = filtered is None and msg_after.get("content")
print(f" {'OK' if ok_after else 'ISSUE'}: json_tool_call converted to content")
return ok_before and ok_after
# ---------------------------------------------------------------------------
# Scenario 3: pop vs get — optional_params mutation
# ---------------------------------------------------------------------------
def scenario_no_mutation():
subheading("Scenario 3: optional_params mutation (.pop vs .get)")
print(f" {DIM}The old code used optional_params.pop('json_mode'), which mutates")
print(f" the caller's dict. The new code uses .get() instead.{RESET}\n")
params = {"json_mode": True, "tools": []}
print(f" {RED}{BOLD}BEFORE:{RESET} optional_params.pop('json_mode', None)")
params_before = deepcopy(params)
params_before.pop("json_mode", None)
print(f" After pop: 'json_mode' in params = {'json_mode' in params_before}")
print(f" {RED}json_mode is GONE from the dict — may cause issues downstream{RESET}\n")
print(f" {GREEN}{BOLD}AFTER:{RESET} optional_params.get('json_mode', None)")
params_after = deepcopy(params)
params_after.get("json_mode", None)
print(f" After get: 'json_mode' in params = {'json_mode' in params_after}")
print(f" {GREEN}json_mode stays in the dict — no side effects{RESET}")
return True
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def main():
heading(
"Bedrock json_tool_call fix — before vs after\n"
f" Issue: https://github.com/BerriAI/litellm/issues/18381\n"
f" Internal tool name: {CYAN}{RESPONSE_FORMAT_TOOL_NAME}{RESET}"
)
print(f" {BOLD}Background:{RESET}")
print(f" When using tools + response_format with Bedrock, LiteLLM injects an")
print(f" internal tool called '{RESPONSE_FORMAT_TOOL_NAME}' to get structured output.")
print(f" Bedrock may return this internal tool ALONGSIDE real user tools.")
print(f" Consumers (e.g. OpenAI Agents SDK) see '{RESPONSE_FORMAT_TOOL_NAME}' in")
print(f" tool_calls and crash trying to dispatch it.\n")
print(f" {BOLD}The old code only handled len(tools)==1. When Bedrock returned 2+")
print(f" tools, json_tool_call leaked through.{RESET}")
results = {}
for name, fn in [
("mixed_tools", scenario_mixed),
("only_json_tool", scenario_only_json),
("no_mutation", scenario_no_mutation),
]:
try:
results[name] = fn()
except Exception as e:
print(f"\n {RED}ERROR: {e}{RESET}")
results[name] = False
# Summary
heading("Summary")
all_ok = True
for name, ok in results.items():
status = f"{GREEN}PASS{RESET}" if ok else f"{RED}FAIL{RESET}"
print(f" {name}: {status}")
if not ok:
all_ok = False
if all_ok:
print(f"\n {GREEN}{BOLD}All scenarios demonstrate the fix correctly.{RESET}")
else:
print(f"\n {RED}{BOLD}Some scenarios did not behave as expected.{RESET}")
return 0 if all_ok else 1
if __name__ == "__main__":
raise SystemExit(main())
…ls in both streaming and non-streaming
When using both `tools` and `response_format` with Bedrock Converse API, LiteLLM
internally adds a fake tool called `json_tool_call` to handle structured output.
Bedrock may return both this internal tool AND real user-defined tools, causing
consumers like OpenAI Agents SDK to break trying to dispatch `json_tool_call`.
This fix:
- Extracts `_filter_json_mode_tools()` to handle 3 scenarios: only json_tool_call
(convert to content), mixed with real tools (filter it out), or no json_tool_call
- Fixes streaming by adding json_mode awareness to AWSEventStreamDecoder, converting
json_tool_call chunks to text content while passing real tool chunks through
- Changes `optional_params.pop("json_mode")` to `.get()` to avoid mutating caller dict
Fixes BerriAI#18381
Credits @haggai-backline for the original investigation in PR BerriAI#18384
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…_tool_call content in mixed case

- Move `import json` to top of converse_transformation.py per CLAUDE.md style guide
- In the mixed tools case, preserve json_tool_call arguments as message content so the structured output from response_format is not silently lost
- Update test to verify json_tool_call content is preserved as message text

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
httpx.Response.json() is synchronous, not async. Using AsyncMock made the test fail because it turned json() into a coroutine.
Adds try/except around FastAPI imports with fallback mock classes. This allows the module to be imported in test environments where proxy dependencies (FastAPI) may not be installed.

Fixes NameError when MCP tests try to import from proxy_server, which imports from this module:
- NameError: name 'APIRouter' is not defined
- NameError: name 'Depends' is not defined
- NameError: name 'HTTPException' is not defined
- NameError: name 'Query' is not defined
The test was intermittently failing in CI because it used a default MagicMock for the async post() method. This is unreliable across environments. Using AsyncMock explicitly ensures the mock properly handles async/await.
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Closing this PR as it includes unrelated changes (policy_resolve_endpoints, MCP tests, Anthropic tests). Replaced by the properly scoped PR #21107, which contains only the Bedrock json_tool_call filtering changes.
Summary
When using `tools` and `response_format` together with the Bedrock Converse API, LiteLLM internally adds a fake tool called `json_tool_call` (`RESPONSE_FORMAT_TOOL_NAME`) to handle structured output. Bedrock may return both this internal tool AND real user-defined tools, causing consumers like the OpenAI Agents SDK to break when trying to dispatch `json_tool_call` as a real tool.

Relates to "MultiProcessCollector for Prometheus" #11067 and builds on the approach from "Fix Bedrock Converse API returning both json_tool_call and real tools when tools and response_format are used" #18384 by @haggai-backline, extending it to cover the streaming case and fixing the `optional_params.pop()` mutation issue.

Changes
Non-streaming fix (`converse_transformation.py`):
- New `_filter_json_mode_tools()` static method that handles 3 scenarios: only `json_tool_call` (convert to content), mixed with real tools (filter it out), or no `json_tool_call` (pass through)
- `optional_params.pop("json_mode")` → `.get("json_mode")` to avoid mutating the caller's dict

Streaming fix (`invoke_handler.py`):
- Adds a `json_mode` parameter and `_current_tool_name` tracking to `AWSEventStreamDecoder`
- When `json_mode=True`, suppresses `json_tool_call` start/stop chunks and converts delta chunks to text content instead of tool call arguments

Plumbing (`converse_handler.py`, `invoke_handler.py`):
- Passes `json_mode` to `AWSEventStreamDecoder` in `make_call` and `make_sync_call`

Test plan
- `test_transform_response_with_both_json_tool_call_and_real_tool` — Bedrock returns both `json_tool_call` AND `get_weather`; verifies only `get_weather` remains
- `test_transform_response_does_not_mutate_optional_params` — verifies `optional_params` still contains `json_mode` after `_transform_response()`
- `test_streaming_filters_json_tool_call_with_real_tools` — streaming chunks with both tools; verifies `json_tool_call` becomes text content while the real tool passes through
- `test_streaming_without_json_mode_passes_all_tools` — backward compat: `json_mode=False` passes all tools through unchanged
- All existing tests in `test_converse_transformation.py` continue to pass

Credits to @haggai-backline for the original investigation and non-streaming approach in PR #18384.
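The `.pop()` vs `.get()` distinction behind the non-mutation test can be isolated in a few lines. A standalone sketch — the `optional_params` contents here are illustrative, and the real test exercises `_transform_response()` rather than the dict directly:

```python
# Demonstrates why reading json_mode with .get() is safe while .pop()
# mutates the caller's dict (the bug the PR fixes).
optional_params = {"json_mode": True, "tools": []}

json_mode = optional_params.get("json_mode", None)  # new behavior: read-only
assert json_mode is True
assert "json_mode" in optional_params  # caller's dict untouched

mutated = dict(optional_params)
mutated.pop("json_mode", None)  # old behavior: removes the key
assert "json_mode" not in mutated  # downstream readers would now miss it
```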
🤖 Generated with Claude Code