fix(bedrock): filter internal json_tool_call when mixed with real tools #20916
jquinter wants to merge 9 commits into BerriAI:main
Conversation
Greptile Overview

Greptile Summary: Filters the internal json_tool_call tool out of tool_calls when Bedrock returns it alongside real user-defined tools.
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/llms/bedrock/chat/converse_transformation.py | Extracts _filter_json_mode_tools() to handle 3 scenarios for json_tool_call filtering; fixes .pop() → .get() to avoid mutating optional_params. Minor style note: inline import json should be at module level per CLAUDE.md. |
| litellm/llms/bedrock/chat/invoke_handler.py | Adds json_mode and _current_tool_name tracking to AWSEventStreamDecoder. Suppresses json_tool_call start/stop events and converts delta chunks to text content when in json_mode. Logic is sound and handles state transitions correctly. |
| litellm/llms/bedrock/chat/converse_handler.py | Passes json_mode to AWSEventStreamDecoder constructor in make_sync_call. Minimal, correct plumbing change. |
| tests/test_litellm/llms/bedrock/chat/test_converse_transformation.py | Adds 4 well-structured mock-only tests covering: mixed tools filtering, optional_params non-mutation, streaming json_tool_call suppression, and backward compatibility with json_mode=False. |
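The three-scenario filtering described in the table above can be sketched as follows. This is a minimal illustration, not the PR's actual implementation — the function name mirrors the PR but the body and simplified tool dicts are assumptions:

```python
from typing import List, Optional

RESPONSE_FORMAT_TOOL_NAME = "json_tool_call"  # mirrors litellm.constants

def filter_json_mode_tools(
    json_mode: Optional[bool],
    tools: Optional[List[dict]],
    message: dict,
) -> Optional[List[dict]]:
    """Illustrative sketch of the 3 scenarios handled by
    _filter_json_mode_tools(); not the repo's exact code."""
    if not json_mode or not tools:
        return tools  # scenario 3: no json_tool_call involved, pass through
    internal = [t for t in tools if t["function"].get("name") == RESPONSE_FORMAT_TOOL_NAME]
    real = [t for t in tools if t["function"].get("name") != RESPONSE_FORMAT_TOOL_NAME]
    if internal:
        # Preserve the structured output as message content.
        args = internal[0]["function"].get("arguments")
        if args is not None:
            message["content"] = args
    if not real:
        return None  # scenario 1: only json_tool_call -> converted to content
    return real  # scenario 2: mixed -> keep only the real tools
```

In the mixed case the internal tool's arguments still land in `message["content"]`, so the structured output from `response_format` is not silently lost.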
Sequence Diagram
sequenceDiagram
participant Client
participant LiteLLM
participant Bedrock
Client->>LiteLLM: completion(tools + response_format)
LiteLLM->>Bedrock: Converse API (tools + json_tool_call)
Bedrock-->>LiteLLM: Response with json_tool_call + real tools
alt Non-streaming
LiteLLM->>LiteLLM: _filter_json_mode_tools()
alt Only json_tool_call
LiteLLM->>LiteLLM: Convert to text content
else Mixed with real tools
LiteLLM->>LiteLLM: Filter out json_tool_call, keep real tools
end
else Streaming
LiteLLM->>LiteLLM: AWSEventStreamDecoder (json_mode=True)
loop Per chunk
alt json_tool_call chunk
LiteLLM->>LiteLLM: Suppress start/stop, convert delta to text
else Real tool chunk
LiteLLM->>LiteLLM: Pass through normally
end
end
end
LiteLLM-->>Client: Response (only real tools in tool_calls)
@greptile-apps re-review please!
Greptile Overview

Greptile Summary: Fixes an issue where the Bedrock Converse API returns both the internal json_tool_call and real user-defined tools, causing the internal tool to leak into tool_calls.
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/llms/bedrock/chat/converse_transformation.py | New _filter_json_mode_tools static method handles 3 scenarios for json_tool_call filtering. Changed optional_params.pop() to .get() to avoid mutation. Logic is sound and well-structured. |
| litellm/llms/bedrock/chat/invoke_handler.py | Added json_mode and _current_tool_name tracking to AWSEventStreamDecoder. Suppresses json_tool_call start/stop/delta chunks in streaming and converts deltas to text content. State tracking relies on Bedrock's sequential content block delivery. |
| litellm/llms/bedrock/chat/converse_handler.py | One-line change to pass json_mode to AWSEventStreamDecoder constructor in make_sync_call. |
| tests/test_litellm/llms/bedrock/chat/test_converse_transformation.py | 4 new mock-only tests covering mixed tools, optional_params mutation, streaming filtering, and backward compatibility. No real network calls. |
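The per-chunk state tracking the review describes for `AWSEventStreamDecoder` can be sketched as a small filter. This is a simplified stand-in under stated assumptions — the class name, chunk shapes, and event names here are illustrative, not the decoder's real event types:

```python
RESPONSE_FORMAT_TOOL_NAME = "json_tool_call"  # mirrors litellm.constants

class JsonModeChunkFilter:
    """Illustrative per-chunk state machine mirroring the decoder change:
    suppress start/stop events for the internal tool and re-emit its
    argument deltas as text content. Chunk dicts are simplified stand-ins."""

    def __init__(self, json_mode: bool):
        self.json_mode = json_mode
        self._current_tool_name = None  # relies on sequential content blocks

    def process(self, chunk: dict):
        if not self.json_mode:
            return chunk  # backward compat: pass everything through
        kind = chunk.get("type")
        if kind == "tool_start":
            self._current_tool_name = chunk["name"]
            if chunk["name"] == RESPONSE_FORMAT_TOOL_NAME:
                return None  # suppress start event for the internal tool
        elif kind == "tool_delta":
            if self._current_tool_name == RESPONSE_FORMAT_TOOL_NAME:
                # Convert tool-argument delta into plain text content.
                return {"type": "text", "text": chunk["arguments"]}
        elif kind == "tool_stop":
            name, self._current_tool_name = self._current_tool_name, None
            if name == RESPONSE_FORMAT_TOOL_NAME:
                return None  # suppress stop event for the internal tool
        return chunk
```

As the review notes, this style of tracking assumes Bedrock delivers content blocks sequentially, so a single `_current_tool_name` slot is sufficient.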
Sequence Diagram
sequenceDiagram
participant Client
participant LiteLLM
participant Bedrock
Client->>LiteLLM: completion(tools + response_format)
LiteLLM->>Bedrock: converse(tools + json_tool_call)
Bedrock-->>LiteLLM: response with json_tool_call + real tools
alt Non-Streaming
LiteLLM->>LiteLLM: _filter_json_mode_tools()
Note over LiteLLM: Converts json_tool_call to text content<br/>Keeps real tool_calls intact
else Streaming
loop For each content block
alt json_tool_call block
LiteLLM->>LiteLLM: Suppress start/stop chunks
LiteLLM->>LiteLLM: Convert delta to text content
else Real tool block
LiteLLM->>LiteLLM: Pass through as tool_call
end
end
end
LiteLLM-->>Client: response with only real tool_calls + text content
@jquinter have you validated that this PR solves the problem by doing real tests with Bedrock?
Yes, I created two scripts: one simulating a request, and the one included here making a real request to Bedrock.

Result doing real requests (before vs after PR)

Script doing real requests:

"""
Demo: Bedrock Converse API json_tool_call fix — live API requests
Makes a real Bedrock API call and shows the before/after behavior by
capturing the raw tool_calls list BEFORE _filter_json_mode_tools processes it.
This proves the bug from https://github.com/BerriAI/litellm/issues/18381
exists in the raw Bedrock response and that this PR filters it correctly.
Requires AWS credentials configured for Bedrock access.
Usage:
poetry run python scripts/demo_bedrock_live.py
poetry run python scripts/demo_bedrock_live.py --model bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0
poetry run python scripts/demo_bedrock_live.py --region us-west-2
"""
import argparse
import json
import os
from copy import deepcopy
from typing import List, Optional
import litellm
from litellm.constants import RESPONSE_FORMAT_TOOL_NAME
from litellm.llms.bedrock.chat.converse_transformation import AmazonConverseConfig
from litellm.types.utils import ChatCompletionToolCallChunk
# ---------------------------------------------------------------------------
# Formatting
# ---------------------------------------------------------------------------
BOLD = "\033[1m"
GREEN = "\033[1;32m"
RED = "\033[1;31m"
YELLOW = "\033[1;33m"
CYAN = "\033[1;36m"
DIM = "\033[2m"
RESET = "\033[0m"
def heading(text: str):
print(f"\n{BOLD}{'=' * 72}")
print(f" {text}")
print(f"{'=' * 72}{RESET}\n")
def subheading(text: str):
print(f"\n{BOLD}--- {text} ---{RESET}\n")
def print_tool_calls_dicts(label: str, tool_calls):
"""Print tool_calls from list of dicts (raw capture)."""
if not tool_calls:
print(f" {label}: {YELLOW}None{RESET}")
return
for i, tc in enumerate(tool_calls):
name = tc["function"]["name"]
args = tc["function"].get("arguments", "")
color = RED if name == RESPONSE_FORMAT_TOOL_NAME else GREEN
args_short = args[:120] + "..." if len(args) > 120 else args
print(f" {label}[{i}]: {color}{name}{RESET}({args_short})")
def print_tool_calls_objects(label: str, tool_calls):
"""Print tool_calls from litellm response objects."""
if not tool_calls:
print(f" {label}: {YELLOW}None{RESET}")
return
for i, tc in enumerate(tool_calls):
name = tc.function.name
args = tc.function.arguments or ""
color = RED if name == RESPONSE_FORMAT_TOOL_NAME else GREEN
args_short = args[:120] + "..." if len(args) > 120 else args
print(f" {label}[{i}]: {color}{name}{RESET}({args_short})")
# ---------------------------------------------------------------------------
# Old logic — verbatim from main branch before this PR
# ---------------------------------------------------------------------------
def old_logic(tools, chat_completion_message):
"""
Simulates the old _transform_response code (main branch):
if (
json_mode is True
and tools is not None
and len(tools) == 1 # <-- BUG
and tools[0]["function"].get("name") == RESPONSE_FORMAT_TOOL_NAME
):
# Extract content — only works for single json_tool_call
...
else:
chat_completion_message["tool_calls"] = tools # ALL tools leak
"""
if (
tools is not None
and len(tools) == 1
and tools[0]["function"].get("name") == RESPONSE_FORMAT_TOOL_NAME
):
content = tools[0]["function"].get("arguments")
if content is not None:
chat_completion_message["content"] = content
else:
chat_completion_message["tool_calls"] = tools
# ---------------------------------------------------------------------------
# Capture hook — monkeypatches _filter_json_mode_tools to grab raw tools
# ---------------------------------------------------------------------------
captured_raw_tools: List[ChatCompletionToolCallChunk] = []
_original_filter = AmazonConverseConfig._filter_json_mode_tools
def _capturing_filter(json_mode, tools, chat_completion_message):
"""Wraps _filter_json_mode_tools to capture the raw tools before filtering."""
captured_raw_tools.clear()
if tools:
captured_raw_tools.extend(deepcopy(tools))
return _original_filter(json_mode, tools, chat_completion_message)
# ---------------------------------------------------------------------------
# Test: Non-streaming with tools + response_format
# ---------------------------------------------------------------------------
def test_non_streaming(model: str):
subheading("Non-streaming: tools + response_format (real Bedrock call)")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and state, e.g. San Francisco, CA",
},
},
"required": ["location"],
},
},
}
]
response_format = {
"type": "json_schema",
"json_schema": {
"name": "weather_response",
"schema": {
"type": "object",
"properties": {
"summary": {"type": "string"},
"needs_tool_call": {"type": "boolean"},
},
"required": ["summary", "needs_tool_call"],
},
},
}
messages = [
{
"role": "user",
"content": (
"What's the weather in San Francisco? "
"You MUST call the get_weather tool to check the weather."
),
}
]
# Install the capture hook (must wrap in staticmethod to avoid self injection)
AmazonConverseConfig._filter_json_mode_tools = staticmethod(_capturing_filter)
try:
response = litellm.completion(
model=model,
messages=messages,
tools=tools,
response_format=response_format,
)
finally:
# Restore original as staticmethod
AmazonConverseConfig._filter_json_mode_tools = staticmethod(_original_filter)
msg = response.choices[0].message
raw_tools = list(captured_raw_tools)
# --- Show what Bedrock returned (raw) ---
print(f" {CYAN}{BOLD}Raw tools from Bedrock (before filtering):{RESET}")
print_tool_calls_dicts("raw_tools", raw_tools)
has_json_tool = any(
t["function"]["name"] == RESPONSE_FORMAT_TOOL_NAME for t in raw_tools
)
has_real_tool = any(
t["function"]["name"] != RESPONSE_FORMAT_TOOL_NAME for t in raw_tools
)
if has_json_tool and has_real_tool:
print(f"\n {RED}{BOLD}Bedrock returned BOTH json_tool_call AND get_weather!{RESET}")
print(f" {DIM}This is the exact scenario that triggers issue #18381.{RESET}")
# Show old behavior
print(f"\n {RED}{BOLD}BEFORE this PR (old logic would produce):{RESET}")
msg_old = {"role": "assistant", "content": ""}
old_logic(tools=deepcopy(raw_tools), chat_completion_message=msg_old)
print_tool_calls_dicts("tool_calls", msg_old.get("tool_calls"))
print(f" content: {msg_old.get('content', '') or '(empty)'}")
leaked = any(
tc["function"]["name"] == RESPONSE_FORMAT_TOOL_NAME
for tc in msg_old.get("tool_calls", [])
)
if leaked:
print(f" {RED}^^^ BUG: json_tool_call leaked into tool_calls!{RESET}")
# Show new behavior (what litellm actually returned)
print(f"\n {GREEN}{BOLD}AFTER this PR (litellm.completion() returned):{RESET}")
print_tool_calls_objects("tool_calls", msg.tool_calls)
print(f" content: {msg.content or '(empty)'}")
no_leak = not msg.tool_calls or not any(
tc.function.name == RESPONSE_FORMAT_TOOL_NAME for tc in msg.tool_calls
)
if no_leak:
print(f"\n {GREEN}FIXED: json_tool_call filtered, real tool preserved.{RESET}")
return True
else:
print(f"\n {RED}json_tool_call still leaking!{RESET}")
return False
elif has_json_tool and not has_real_tool:
print(f"\n {YELLOW}Bedrock returned only json_tool_call (no real tool).{RESET}")
print(f" {DIM}The model chose not to call get_weather this time.")
print(f" This case is handled correctly by both old and new code.{RESET}")
print(f"\n litellm result:")
print_tool_calls_objects("tool_calls", msg.tool_calls)
print(f" content: {msg.content or '(empty)'}")
no_leak = not msg.tool_calls or not any(
tc.function.name == RESPONSE_FORMAT_TOOL_NAME for tc in msg.tool_calls
)
print(f"\n {'PASS' if no_leak else 'FAIL'}: json_tool_call {'filtered' if no_leak else 'leaked'}")
print(f"\n {YELLOW}TIP: Re-run — the model sometimes returns both tools.{RESET}")
return no_leak
elif has_real_tool and not has_json_tool:
print(f"\n {YELLOW}Bedrock returned only get_weather (no json_tool_call).{RESET}")
print(f" {DIM}The model chose not to use the json_tool_call tool this time.")
print(f" No filtering needed — both old and new code work fine.{RESET}")
print(f"\n litellm result:")
print_tool_calls_objects("tool_calls", msg.tool_calls)
print(f" content: {msg.content or '(empty)'}")
print(f"\n {YELLOW}TIP: Re-run — the model sometimes returns both tools.{RESET}")
return True
else:
print(f"\n {YELLOW}Bedrock returned no tools at all.{RESET}")
print(f" content: {msg.content or '(empty)'}")
print(f"\n {YELLOW}TIP: Re-run or try a different prompt/model.{RESET}")
return True
# ---------------------------------------------------------------------------
# Test: Streaming with tools + response_format
# ---------------------------------------------------------------------------
def test_streaming(model: str):
subheading("Streaming: tools + response_format (real Bedrock call)")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and state, e.g. San Francisco, CA",
},
},
"required": ["location"],
},
},
}
]
response_format = {
"type": "json_schema",
"json_schema": {
"name": "weather_response",
"schema": {
"type": "object",
"properties": {
"summary": {"type": "string"},
"needs_tool_call": {"type": "boolean"},
},
"required": ["summary", "needs_tool_call"],
},
},
}
messages = [
{
"role": "user",
"content": (
"What's the weather in San Francisco? "
"You MUST call the get_weather tool to check the weather."
),
}
]
response = litellm.completion(
model=model,
messages=messages,
tools=tools,
response_format=response_format,
stream=True,
)
collected_text = ""
collected_tool_calls = {}
finish_reason = None
for chunk in response:
choice = chunk.choices[0]
if choice.finish_reason:
finish_reason = choice.finish_reason
delta = choice.delta
if delta.content:
collected_text += delta.content
if delta.tool_calls:
for tc in delta.tool_calls:
idx = tc.index
if idx not in collected_tool_calls:
collected_tool_calls[idx] = {"name": None, "args": ""}
if tc.function and tc.function.name:
collected_tool_calls[idx]["name"] = tc.function.name
if tc.function and tc.function.arguments:
collected_tool_calls[idx]["args"] += tc.function.arguments
print(f" finish_reason: {finish_reason}")
print(f" content: {collected_text[:200] if collected_text else '(empty)'}")
if collected_tool_calls:
for idx, tc in sorted(collected_tool_calls.items()):
name = tc["name"] or "(unknown)"
color = RED if name == RESPONSE_FORMAT_TOOL_NAME else GREEN
args_short = tc["args"][:120] + "..." if len(tc["args"]) > 120 else tc["args"]
print(f" tool_calls[{idx}]: {color}{name}{RESET}({args_short})")
else:
print(f" tool_calls: {YELLOW}None{RESET}")
leaked = any(
tc["name"] == RESPONSE_FORMAT_TOOL_NAME
for tc in collected_tool_calls.values()
)
if leaked:
print(f"\n {RED}FAIL: json_tool_call leaked into streaming tool_calls!{RESET}")
else:
print(f"\n {GREEN}PASS: No json_tool_call in streaming output.{RESET}")
return not leaked
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def main():
parser = argparse.ArgumentParser(
description="Demo: Bedrock json_tool_call fix (live API requests)"
)
parser.add_argument(
"--model",
default="bedrock/us.anthropic.claude-3-5-haiku-20241022-v1:0",
help="Model to test (default: bedrock/us.anthropic.claude-3-5-haiku-20241022-v1:0)",
)
parser.add_argument(
"--region",
default=None,
help="AWS region (e.g. us-east-1, us-west-2). Overrides AWS_REGION_NAME env var.",
)
args = parser.parse_args()
if args.region:
os.environ["AWS_REGION_NAME"] = args.region
region_display = (
args.region
or os.environ.get("AWS_REGION_NAME")
or os.environ.get("AWS_REGION")
or "(default)"
)
heading(
f"Bedrock json_tool_call fix — live API demo\n"
f" model: {args.model}\n"
f" region: {region_display}\n"
f" issue: https://github.com/BerriAI/litellm/issues/18381"
)
tests = [
("non_streaming", test_non_streaming),
("streaming", test_streaming),
]
results = {}
for name, fn in tests:
try:
results[name] = fn(args.model)
except Exception as e:
print(f"\n {RED}ERROR: {e}{RESET}")
results[name] = None
# Summary
heading("Summary")
all_ok = True
for name, result in results.items():
if result is None:
status = f"{YELLOW}SKIP (error){RESET}"
all_ok = False
elif result:
status = f"{GREEN}PASS{RESET}"
else:
status = f"{RED}FAIL{RESET}"
all_ok = False
print(f" {name}: {status}")
if all_ok:
print(f"\n {GREEN}{BOLD}All tests passed! json_tool_call is properly filtered.{RESET}")
else:
ran = [v for v in results.values() if v is not None]
if ran and all(v is True for v in ran):
print(f"\n {GREEN}{BOLD}Runnable tests passed!{RESET} {YELLOW}Some skipped.{RESET}")
else:
print(f"\n {RED}{BOLD}Some tests failed or were skipped.{RESET}")
return 0 if all_ok else 1
if __name__ == "__main__":
raise SystemExit(main())

Result doing simulated requests (before vs after PR)

Script simulating processing requests:

"""
Demo: Bedrock Converse API json_tool_call filtering — before vs after
Shows the exact bug from https://github.com/BerriAI/litellm/issues/18381
by replaying a realistic Bedrock response through the old logic (before)
and the new _filter_json_mode_tools logic (after this PR).
No AWS credentials required — this exercises the transformation code directly.
Usage:
poetry run python scripts/demo_bedrock_json_tool_call_fix.py
"""
import json
from copy import deepcopy
from litellm.constants import RESPONSE_FORMAT_TOOL_NAME
from litellm.llms.bedrock.chat.converse_transformation import AmazonConverseConfig
# ---------------------------------------------------------------------------
# Formatting helpers
# ---------------------------------------------------------------------------
BOLD = "\033[1m"
GREEN = "\033[1;32m"
RED = "\033[1;31m"
YELLOW = "\033[1;33m"
CYAN = "\033[1;36m"
DIM = "\033[2m"
RESET = "\033[0m"
def heading(text: str):
print(f"\n{BOLD}{'=' * 72}")
print(f" {text}")
print(f"{'=' * 72}{RESET}\n")
def subheading(text: str):
print(f"\n{BOLD}--- {text} ---{RESET}\n")
def print_tool_calls(label: str, tool_calls):
if not tool_calls:
print(f" {label}: {YELLOW}None{RESET}")
return
for i, tc in enumerate(tool_calls):
name = tc["function"]["name"]
args = tc["function"].get("arguments", "")
color = RED if name == RESPONSE_FORMAT_TOOL_NAME else GREEN
args_short = args[:100] + "..." if len(args) > 100 else args
print(f" {label}[{i}]: {color}{name}{RESET}({args_short})")
# ---------------------------------------------------------------------------
# Simulated Bedrock response — this is what _translate_message_content
# produces when Bedrock returns BOTH json_tool_call AND a real tool.
#
# This happens when the user passes tools + response_format. LiteLLM
# injects a json_tool_call tool, and Bedrock may call both.
# ---------------------------------------------------------------------------
MIXED_TOOLS = [
{
"id": "call_json_001",
"type": "function",
"function": {
"name": RESPONSE_FORMAT_TOOL_NAME, # "json_tool_call"
"arguments": json.dumps({
"summary": "Checking weather in San Francisco",
"needs_tool_call": True,
}),
},
},
{
"id": "call_real_001",
"type": "function",
"function": {
"name": "get_weather",
"arguments": json.dumps({"location": "San Francisco, CA"}),
},
},
]
ONLY_JSON_TOOL = [
{
"id": "call_json_002",
"type": "function",
"function": {
"name": RESPONSE_FORMAT_TOOL_NAME,
"arguments": json.dumps({
"summary": "Paris is cold in January with average highs around 7C",
"needs_tool_call": False,
}),
},
},
]
# ---------------------------------------------------------------------------
# Old logic (before this PR) — extracted verbatim from main branch
# ---------------------------------------------------------------------------
def old_logic(json_mode, tools, chat_completion_message):
"""
The code from main branch (converse_transformation.py lines 1679-1714):
if (
json_mode is True
and tools is not None
and len(tools) == 1 # <-- BUG
and tools[0]["function"].get("name") == RESPONSE_FORMAT_TOOL_NAME
):
# Extract content — only works for single json_tool_call
...
else:
chat_completion_message["tool_calls"] = tools # ALL tools leak
"""
if (
json_mode is True
and tools is not None
and len(tools) == 1
and tools[0]["function"].get("name") == RESPONSE_FORMAT_TOOL_NAME
):
json_mode_content_str = tools[0]["function"].get("arguments")
if json_mode_content_str is not None:
try:
response_data = json.loads(json_mode_content_str)
if (
isinstance(response_data, dict)
and "properties" in response_data
and len(response_data) == 1
):
response_data = response_data["properties"]
json_mode_content_str = json.dumps(response_data)
except json.JSONDecodeError:
pass
chat_completion_message["content"] = json_mode_content_str
else:
chat_completion_message["tool_calls"] = tools
# ---------------------------------------------------------------------------
# Scenario 1: Mixed json_tool_call + real tool
# ---------------------------------------------------------------------------
def scenario_mixed():
subheading("Scenario 1: Bedrock returns BOTH json_tool_call AND get_weather")
print(f" {DIM}This happens when using tools + response_format together.")
print(f" Bedrock calls the internal json_tool_call AND the user's real tool.{RESET}\n")
# --- BEFORE ---
print(f" {RED}{BOLD}BEFORE this PR (old logic):{RESET}")
msg_before = {"role": "assistant", "content": ""}
tools_before = deepcopy(MIXED_TOOLS)
old_logic(json_mode=True, tools=tools_before, chat_completion_message=msg_before)
print_tool_calls("tool_calls", msg_before.get("tool_calls"))
print(f" content: {msg_before.get('content', '') or '(empty)'}")
leaked_before = any(
tc["function"]["name"] == RESPONSE_FORMAT_TOOL_NAME
for tc in msg_before.get("tool_calls", [])
)
if leaked_before:
print(f"\n {RED}BUG: json_tool_call leaked into tool_calls!")
print(f" Consumers (e.g. OpenAI Agents SDK) try to dispatch it as a real")
print(f" tool and crash because no such tool exists.{RESET}")
# --- AFTER ---
print(f"\n {GREEN}{BOLD}AFTER this PR (new _filter_json_mode_tools):{RESET}")
msg_after = {"role": "assistant", "content": ""}
tools_after = deepcopy(MIXED_TOOLS)
filtered = AmazonConverseConfig._filter_json_mode_tools(
json_mode=True,
tools=tools_after,
chat_completion_message=msg_after,
)
print_tool_calls("tool_calls", filtered)
content_after = msg_after.get("content", "") or "(empty)"
print(f" content: {content_after}")
leaked_after = filtered and any(
tc["function"]["name"] == RESPONSE_FORMAT_TOOL_NAME
for tc in filtered
)
has_real = filtered and any(
tc["function"]["name"] == "get_weather"
for tc in filtered
)
if not leaked_after and has_real:
print(f"\n {GREEN}FIXED: json_tool_call filtered out, real tool preserved.")
print(f" Structured output preserved in message.content.{RESET}")
return True
else:
print(f"\n {RED}UNEXPECTED result.{RESET}")
return False
# ---------------------------------------------------------------------------
# Scenario 2: Only json_tool_call (no real tools)
# ---------------------------------------------------------------------------
def scenario_only_json():
subheading("Scenario 2: Bedrock returns only json_tool_call (no user tools)")
print(f" {DIM}This happens with response_format but no user-defined tools.")
print(f" Both old and new code handle this correctly.{RESET}\n")
# --- BEFORE ---
print(f" {CYAN}{BOLD}BEFORE this PR:{RESET}")
msg_before = {"role": "assistant", "content": ""}
tools_before = deepcopy(ONLY_JSON_TOOL)
old_logic(json_mode=True, tools=tools_before, chat_completion_message=msg_before)
print_tool_calls("tool_calls", msg_before.get("tool_calls"))
content_b = msg_before.get("content", "") or "(empty)"
print(f" content: {content_b}")
ok_before = "tool_calls" not in msg_before and msg_before.get("content")
print(f" {'OK' if ok_before else 'ISSUE'}: json_tool_call converted to content\n")
# --- AFTER ---
print(f" {CYAN}{BOLD}AFTER this PR:{RESET}")
msg_after = {"role": "assistant", "content": ""}
tools_after = deepcopy(ONLY_JSON_TOOL)
filtered = AmazonConverseConfig._filter_json_mode_tools(
json_mode=True,
tools=tools_after,
chat_completion_message=msg_after,
)
print_tool_calls("tool_calls", filtered)
content_a = msg_after.get("content", "") or "(empty)"
print(f" content: {content_a}")
ok_after = filtered is None and msg_after.get("content")
print(f" {'OK' if ok_after else 'ISSUE'}: json_tool_call converted to content")
return ok_before and ok_after
# ---------------------------------------------------------------------------
# Scenario 3: pop vs get — optional_params mutation
# ---------------------------------------------------------------------------
def scenario_no_mutation():
subheading("Scenario 3: optional_params mutation (.pop vs .get)")
print(f" {DIM}The old code used optional_params.pop('json_mode'), which mutates")
print(f" the caller's dict. The new code uses .get() instead.{RESET}\n")
params = {"json_mode": True, "tools": []}
print(f" {RED}{BOLD}BEFORE:{RESET} optional_params.pop('json_mode', None)")
params_before = deepcopy(params)
params_before.pop("json_mode", None)
print(f" After pop: 'json_mode' in params = {'json_mode' in params_before}")
print(f" {RED}json_mode is GONE from the dict — may cause issues downstream{RESET}\n")
print(f" {GREEN}{BOLD}AFTER:{RESET} optional_params.get('json_mode', None)")
params_after = deepcopy(params)
params_after.get("json_mode", None)
print(f" After get: 'json_mode' in params = {'json_mode' in params_after}")
print(f" {GREEN}json_mode stays in the dict — no side effects{RESET}")
return True
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def main():
heading(
"Bedrock json_tool_call fix — before vs after\n"
f" Issue: https://github.com/BerriAI/litellm/issues/18381\n"
f" Internal tool name: {CYAN}{RESPONSE_FORMAT_TOOL_NAME}{RESET}"
)
print(f" {BOLD}Background:{RESET}")
print(f" When using tools + response_format with Bedrock, LiteLLM injects an")
print(f" internal tool called '{RESPONSE_FORMAT_TOOL_NAME}' to get structured output.")
print(f" Bedrock may return this internal tool ALONGSIDE real user tools.")
print(f" Consumers (e.g. OpenAI Agents SDK) see '{RESPONSE_FORMAT_TOOL_NAME}' in")
print(f" tool_calls and crash trying to dispatch it.\n")
print(f" {BOLD}The old code only handled len(tools)==1. When Bedrock returned 2+")
print(f" tools, json_tool_call leaked through.{RESET}")
results = {}
for name, fn in [
("mixed_tools", scenario_mixed),
("only_json_tool", scenario_only_json),
("no_mutation", scenario_no_mutation),
]:
try:
results[name] = fn()
except Exception as e:
print(f"\n {RED}ERROR: {e}{RESET}")
results[name] = False
# Summary
heading("Summary")
all_ok = True
for name, ok in results.items():
status = f"{GREEN}PASS{RESET}" if ok else f"{RED}FAIL{RESET}"
print(f" {name}: {status}")
if not ok:
all_ok = False
if all_ok:
print(f"\n {GREEN}{BOLD}All scenarios demonstrate the fix correctly.{RESET}")
else:
print(f"\n {RED}{BOLD}Some scenarios did not behave as expected.{RESET}")
return 0 if all_ok else 1
if __name__ == "__main__":
raise SystemExit(main())
…ls in both streaming and non-streaming
When using both `tools` and `response_format` with Bedrock Converse API, LiteLLM
internally adds a fake tool called `json_tool_call` to handle structured output.
Bedrock may return both this internal tool AND real user-defined tools, causing
consumers like OpenAI Agents SDK to break trying to dispatch `json_tool_call`.
This fix:
- Extracts `_filter_json_mode_tools()` to handle 3 scenarios: only json_tool_call
(convert to content), mixed with real tools (filter it out), or no json_tool_call
- Fixes streaming by adding json_mode awareness to AWSEventStreamDecoder, converting
json_tool_call chunks to text content while passing real tool chunks through
- Changes `optional_params.pop("json_mode")` to `.get()` to avoid mutating caller dict
Fixes BerriAI#18381
Credits @haggai-backline for the original investigation in PR BerriAI#18384
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…_tool_call content in mixed case

- Move `import json` to top of converse_transformation.py per CLAUDE.md style guide
- In the mixed tools case, preserve json_tool_call arguments as message content so the structured output from response_format is not silently lost
- Update test to verify json_tool_call content is preserved as message text

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
httpx.Response.json() is synchronous, not async. Using AsyncMock made the test fail because it turned json() into a coroutine.
Adds try/except around FastAPI imports with fallback mock classes. This allows the module to be imported in test environments where proxy dependencies (FastAPI) may not be installed.

Fixes NameError when MCP tests try to import from proxy_server, which imports from this module:
- NameError: name 'APIRouter' is not defined
- NameError: name 'Depends' is not defined
- NameError: name 'HTTPException' is not defined
- NameError: name 'Query' is not defined
The test was intermittently failing in CI because it used a default MagicMock for the async post() method. This is unreliable across environments. Using AsyncMock explicitly ensures the mock properly handles async/await.
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Closing this PR as it includes unrelated changes (policy_resolve_endpoints, MCP tests, Anthropic tests). Replaced by the properly scoped PR #21107, which contains only the Bedrock json_tool_call filtering changes.
Summary
When using `tools` and `response_format` together with the Bedrock Converse API, LiteLLM internally adds a fake tool called `json_tool_call` (`RESPONSE_FORMAT_TOOL_NAME`) to handle structured output. Bedrock may return both this internal tool AND real user-defined tools, causing consumers like the OpenAI Agents SDK to break when trying to dispatch `json_tool_call` as a real tool.

Relates to "MultiProcessCollector for Prometheus" #11067 and builds on the approach from "Fix Bedrock Converse API returning both json_tool_call and real tools when tools and response_format are used" #18384 by @haggai-backline, extending it to cover the streaming case and fixing the `optional_params.pop()` mutation issue.

Changes
Non-streaming fix (`converse_transformation.py`):
- New `_filter_json_mode_tools()` static method that handles 3 scenarios: only `json_tool_call` (convert to content), mixed with real tools (filter it out), or no `json_tool_call` (pass through)
- `optional_params.pop("json_mode")` → `.get("json_mode")` to avoid mutating the caller's dict

Streaming fix (`invoke_handler.py`):
- Adds a `json_mode` parameter and `_current_tool_name` tracking to `AWSEventStreamDecoder`
- When `json_mode=True`, suppresses `json_tool_call` start/stop chunks and converts delta chunks to text content instead of tool call arguments

Plumbing (`converse_handler.py`, `invoke_handler.py`):
- Passes `json_mode` to `AWSEventStreamDecoder` in `make_call` and `make_sync_call`

Test plan
- `test_transform_response_with_both_json_tool_call_and_real_tool` — Bedrock returns both `json_tool_call` AND `get_weather`; verifies only `get_weather` remains
- `test_transform_response_does_not_mutate_optional_params` — verifies `optional_params` still contains `json_mode` after `_transform_response()`
- `test_streaming_filters_json_tool_call_with_real_tools` — streaming chunks with both tools; verifies `json_tool_call` becomes text content while the real tool passes through
- `test_streaming_without_json_mode_passes_all_tools` — backward compat: `json_mode=False` passes all tools through unchanged
- All existing tests in `test_converse_transformation.py` continue to pass

Credits to @haggai-backline for the original investigation and non-streaming approach in PR #18384.
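The `.pop()` vs `.get()` distinction behind the non-mutation test can be isolated in a few lines. A standalone sketch — the `optional_params` contents here are illustrative, and the real test exercises `_transform_response()` rather than the dict directly:

```python
# Demonstrates why reading json_mode with .get() is safe while .pop()
# mutates the caller's dict (the bug the PR fixes).
optional_params = {"json_mode": True, "tools": []}

json_mode = optional_params.get("json_mode", None)  # new behavior: read-only
assert json_mode is True
assert "json_mode" in optional_params  # caller's dict untouched

mutated = dict(optional_params)
mutated.pop("json_mode", None)  # old behavior: removes the key
assert "json_mode" not in mutated  # downstream readers would now miss it
```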
🤖 Generated with Claude Code