
fix(bedrock): filter internal json_tool_call when mixed with real tools#20916

Closed
jquinter wants to merge 9 commits into BerriAI:main from jquinter:fix/bedrock-filter-json-tool-call-mixed-tools

Conversation

@jquinter
Contributor

Summary

Changes

  1. Non-streaming fix (converse_transformation.py):

    • Extracted _filter_json_mode_tools() static method that handles 3 scenarios: only json_tool_call (convert to content), mixed with real tools (filter it out), or no json_tool_call (pass through)
    • Changed optional_params.pop("json_mode") to optional_params.get("json_mode") to avoid mutating the caller's dict
  2. Streaming fix (invoke_handler.py):

    • Added json_mode parameter and _current_tool_name tracking to AWSEventStreamDecoder
    • When json_mode=True, suppresses json_tool_call start/stop chunks and converts delta chunks to text content instead of tool call arguments
    • Real tool chunks pass through normally
  3. Plumbing (converse_handler.py, invoke_handler.py):

    • Pass json_mode to AWSEventStreamDecoder in make_call and make_sync_call
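For reference, the three non-streaming scenarios described above can be sketched roughly as follows. This is a minimal illustrative sketch with assumed names and signatures, not the actual litellm implementation:

```python
# Illustrative sketch of the three-scenario filtering idea. The function
# name mirrors the PR, but the signature and dict shapes are assumptions.
RESPONSE_FORMAT_TOOL_NAME = "json_tool_call"

def filter_json_mode_tools(json_mode, tools, message):
    """Mutate `message` in place according to the three scenarios."""
    if not json_mode or not tools:
        # Scenario 3: no json_tool_call involved -- pass through unchanged.
        message["tool_calls"] = tools
        return
    json_tools = [t for t in tools if t["function"]["name"] == RESPONSE_FORMAT_TOOL_NAME]
    real_tools = [t for t in tools if t["function"]["name"] != RESPONSE_FORMAT_TOOL_NAME]
    if json_tools and not real_tools:
        # Scenario 1: only json_tool_call -- surface its arguments as content.
        message["content"] = json_tools[0]["function"].get("arguments")
    else:
        # Scenario 2: mixed with real tools -- filter json_tool_call out.
        message["tool_calls"] = real_tools
```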

Test plan

  • test_transform_response_with_both_json_tool_call_and_real_tool — Bedrock returns both json_tool_call AND get_weather, verifies only get_weather remains
  • test_transform_response_does_not_mutate_optional_params — Verifies optional_params still contains json_mode after _transform_response()
  • test_streaming_filters_json_tool_call_with_real_tools — Streaming chunks with both tools, verifies json_tool_call becomes text content while real tool passes through
  • test_streaming_without_json_mode_passes_all_tools — Backward compat: json_mode=False passes all tools through unchanged
  • All 63 existing tests in test_converse_transformation.py continue to pass
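The non-mutation guarantee tested above comes down to dict.pop versus dict.get; a standalone illustration:

```python
# dict.pop removes the key from the caller's dict; dict.get reads it
# without side effects. This is the whole mutation bug in miniature.
optional_params = {"json_mode": True, "temperature": 0.2}

value = optional_params.pop("json_mode", None)  # old behavior: mutates
assert value is True and "json_mode" not in optional_params

optional_params["json_mode"] = True             # restore for comparison
value = optional_params.get("json_mode", None)  # new behavior: read-only
assert value is True and "json_mode" in optional_params
```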

Credits to @haggai-backline for the original investigation and non-streaming approach in PR #18384.

🤖 Generated with Claude Code


@greptile-apps
Contributor

greptile-apps bot commented Feb 11, 2026

Greptile Overview

Greptile Summary

Filters the internal json_tool_call (RESPONSE_FORMAT_TOOL_NAME) from Bedrock Converse API responses when it appears alongside real user-defined tools, preventing downstream consumers (e.g., OpenAI Agents SDK) from trying to dispatch it as a real tool. Also fixes a mutation bug where optional_params.pop("json_mode") was modifying the caller's dict.

  • Non-streaming: Extracted _filter_json_mode_tools() static method handling 3 scenarios: only json_tool_call (convert to content), mixed with real tools (filter out), no json_tool_call (pass through). Changed .pop() to .get() to avoid mutating optional_params.
  • Streaming: Added json_mode + _current_tool_name tracking to AWSEventStreamDecoder. Suppresses json_tool_call start/stop chunks and converts delta chunks to text content.
  • Tests: 4 new mock-only tests cover mixed tools, non-mutation, streaming filtering, and backward compatibility.
  • One asymmetry worth verifying: in the non-streaming mixed case, json_tool_call arguments are silently discarded rather than being converted to text content (unlike the only-json_tool_call and streaming cases).

Confidence Score: 4/5

  • This PR is safe to merge — well-tested bug fix with clear scope and backward compatibility.
  • The fix addresses a real user-reported issue ([Bug]: Bedrock returns fake json_tool_call tool when using response_format + tools together, breaking OpenAI Agents SDK #18381) with clean extraction of filtering logic. Tests cover key scenarios including backward compatibility. One minor concern: asymmetric handling of json_tool_call content in the mixed non-streaming case (discarded vs converted to text in other paths).
  • litellm/llms/bedrock/chat/converse_transformation.py — verify the mixed-tools case intentionally discards json_tool_call content rather than converting it to text.

Important Files Changed

Filename Overview
litellm/llms/bedrock/chat/converse_transformation.py Extracts _filter_json_mode_tools() to handle 3 scenarios for json_tool_call filtering; replaces .pop() with .get() to avoid mutating optional_params. Minor style note: inline import json should be at module level per CLAUDE.md.
litellm/llms/bedrock/chat/invoke_handler.py Adds json_mode and _current_tool_name tracking to AWSEventStreamDecoder. Suppresses json_tool_call start/stop events and converts delta chunks to text content when in json_mode. Logic is sound and handles state transitions correctly.
litellm/llms/bedrock/chat/converse_handler.py Passes json_mode to AWSEventStreamDecoder constructor in make_sync_call. Minimal, correct plumbing change.
tests/test_litellm/llms/bedrock/chat/test_converse_transformation.py Adds 4 well-structured mock-only tests covering: mixed tools filtering, optional_params non-mutation, streaming json_tool_call suppression, and backward compatibility with json_mode=False.

Sequence Diagram

sequenceDiagram
    participant Client
    participant LiteLLM
    participant Bedrock

    Client->>LiteLLM: completion(tools + response_format)
    LiteLLM->>Bedrock: Converse API (tools + json_tool_call)
    Bedrock-->>LiteLLM: Response with json_tool_call + real tools

    alt Non-streaming
        LiteLLM->>LiteLLM: _filter_json_mode_tools()
        alt Only json_tool_call
            LiteLLM->>LiteLLM: Convert to text content
        else Mixed with real tools
            LiteLLM->>LiteLLM: Filter out json_tool_call, keep real tools
        end
    else Streaming
        LiteLLM->>LiteLLM: AWSEventStreamDecoder (json_mode=True)
        loop Per chunk
            alt json_tool_call chunk
                LiteLLM->>LiteLLM: Suppress start/stop, convert delta to text
            else Real tool chunk
                LiteLLM->>LiteLLM: Pass through normally
            end
        end
    end

    LiteLLM-->>Client: Response (only real tools in tool_calls)

@greptile-apps greptile-apps bot left a comment

4 files reviewed, 2 comments

@jquinter
Contributor Author

@greptile-apps re-review please!

@greptile-apps
Contributor

greptile-apps bot commented Feb 11, 2026

Greptile Overview

Greptile Summary

Fixes an issue where Bedrock Converse API returns both the internal json_tool_call (used internally by LiteLLM for response_format handling) and real user-defined tools, breaking consumers like OpenAI Agents SDK. The fix filters out json_tool_call in both non-streaming and streaming paths, preserving its content as message text so structured output isn't lost.

  • Extracted _filter_json_mode_tools() in converse_transformation.py to handle 3 scenarios: only json_tool_call (convert to content), mixed with real tools (filter and preserve as text), or no json_tool_call (pass through)
  • Changed optional_params.pop("json_mode") to optional_params.get("json_mode") to avoid mutating the caller's dict
  • Added json_mode + _current_tool_name tracking to AWSEventStreamDecoder for streaming: suppresses json_tool_call start/stop chunks and converts delta chunks to text content
  • 4 new mock-only tests covering mixed tools, mutation prevention, streaming filtering, and backward compatibility

Confidence Score: 4/5

  • This PR is safe to merge — it fixes a real bug with well-tested logic and no breaking changes to existing behavior.
  • The changes are well-scoped, with clear separation between non-streaming and streaming fixes. The _filter_json_mode_tools logic handles all edge cases (only json_tool_call, mixed, none). The streaming state tracking via _current_tool_name is reliable given Bedrock's sequential content block delivery. The .pop().get() fix prevents a real mutation bug. Tests cover all key scenarios with proper mocking. Minor deduction for the unusual for ... break pattern that could be simplified, but no functional issues found.
  • litellm/llms/bedrock/chat/invoke_handler.py — streaming state management with _current_tool_name relies on Bedrock always delivering content blocks sequentially (start → deltas → stop per block).
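The sequential-delivery assumption called out above (start → deltas → stop per content block) can be sketched as a tiny state machine. Event shapes and names here are simplified assumptions for illustration, not the actual AWSEventStreamDecoder API:

```python
# Hedged sketch of the streaming suppression idea: track the current tool
# name across sequential content-block events; suppress json_tool_call
# start/stop events and reroute its argument deltas into text content.
RESPONSE_FORMAT_TOOL_NAME = "json_tool_call"

class JsonModeFilter:
    def __init__(self, json_mode):
        self.json_mode = json_mode
        self._current_tool_name = None

    def process(self, event):
        """Return a chunk dict to emit, or None to suppress the event."""
        kind = event["type"]
        if kind == "tool_start":
            self._current_tool_name = event["name"]
            if self.json_mode and event["name"] == RESPONSE_FORMAT_TOOL_NAME:
                return None  # suppress the tool-call start chunk
            return {"tool_call": {"name": event["name"]}}
        if kind == "tool_delta":
            if self.json_mode and self._current_tool_name == RESPONSE_FORMAT_TOOL_NAME:
                return {"text": event["args"]}  # convert delta to text content
            return {"tool_call": {"arguments": event["args"]}}
        if kind == "tool_stop":
            name, self._current_tool_name = self._current_tool_name, None
            if self.json_mode and name == RESPONSE_FORMAT_TOOL_NAME:
                return None  # suppress the tool-call stop chunk
            return {"tool_call_end": True}
        return event  # non-tool events pass through untouched
```

Because the state is a single name, this only works if Bedrock never interleaves blocks, which is exactly the assumption the review flags.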

Important Files Changed

Filename Overview
litellm/llms/bedrock/chat/converse_transformation.py New _filter_json_mode_tools static method handles 3 scenarios for json_tool_call filtering. Changed optional_params.pop() to .get() to avoid mutation. Logic is sound and well-structured.
litellm/llms/bedrock/chat/invoke_handler.py Added json_mode and _current_tool_name tracking to AWSEventStreamDecoder. Suppresses json_tool_call start/stop/delta chunks in streaming and converts deltas to text content. State tracking relies on Bedrock's sequential content block delivery.
litellm/llms/bedrock/chat/converse_handler.py One-line change to pass json_mode to AWSEventStreamDecoder constructor in make_sync_call.
tests/test_litellm/llms/bedrock/chat/test_converse_transformation.py 4 new mock-only tests covering mixed tools, optional_params mutation, streaming filtering, and backward compatibility. No real network calls.

Sequence Diagram

sequenceDiagram
    participant Client
    participant LiteLLM
    participant Bedrock

    Client->>LiteLLM: completion(tools + response_format)
    LiteLLM->>Bedrock: converse(tools + json_tool_call)
    Bedrock-->>LiteLLM: response with json_tool_call + real tools

    alt Non-Streaming
        LiteLLM->>LiteLLM: _filter_json_mode_tools()
        Note over LiteLLM: Converts json_tool_call to text content<br/>Keeps real tool_calls intact
    else Streaming
        loop For each content block
            alt json_tool_call block
                LiteLLM->>LiteLLM: Suppress start/stop chunks
                LiteLLM->>LiteLLM: Convert delta to text content
            else Real tool block
                LiteLLM->>LiteLLM: Pass through as tool_call
            end
        end
    end

    LiteLLM-->>Client: response with only real tool_calls + text content

@greptile-apps greptile-apps bot left a comment

4 files reviewed, 1 comment

@krrishdholakia
Member

@jquinter have you validated this PR solves the problem, by doing real tests with bedrock?

@jquinter
Contributor Author

@jquinter have you validated this PR solves the problem, by doing real tests with bedrock?

Yes. I created two scripts: one that simulates request processing, and one (included below) that makes real requests to Bedrock.

Result doing real requests (before vs after PR)
  ========================================================================
    Bedrock json_tool_call fix — live API demo
    model:  bedrock/us.anthropic.claude-haiku-4-5-20251001-v1:0
    region: us-east-1
    issue:  https://github.com/BerriAI/litellm/issues/18381
  ========================================================================


  --- Non-streaming: tools + response_format (real Bedrock call) ---

    Raw tools from Bedrock (before filtering):
    raw_tools[0]: json_tool_call({"summary": "Checking weather in San Francisco, CA", "needs_tool_call": true})
    raw_tools[1]: get_weather({"location": "San Francisco, CA"})

    Bedrock returned BOTH json_tool_call AND get_weather!
    This is the exact scenario that triggers issue #18381.

    BEFORE this PR (old logic would produce):
    tool_calls[0]: json_tool_call({"summary": "Checking weather in San Francisco, CA", "needs_tool_call": true})
    tool_calls[1]: get_weather({"location": "San Francisco, CA"})
    content: (empty)
    ^^^ BUG: json_tool_call leaked into tool_calls!

    AFTER this PR (litellm.completion() returned):
    tool_calls[0]: get_weather({"location": "San Francisco, CA"})
    content: {"summary": "Checking weather in San Francisco, CA", "needs_tool_call": true}

    FIXED: json_tool_call filtered, real tool preserved.

  --- Streaming: tools + response_format (real Bedrock call) ---

    finish_reason: tool_calls
    content:       {"summary": "Checking the weather in San Francisco, CA", "needs_tool_call": true}
    tool_calls:    None

    PASS: No json_tool_call in streaming output.

  ========================================================================
    Summary
  ========================================================================

    non_streaming: PASS
    streaming: PASS

    All tests passed! json_tool_call is properly filtered.
Script doing real requests
"""
Demo: Bedrock Converse API json_tool_call fix — live API requests

Makes a real Bedrock API call and shows the before/after behavior by
capturing the raw tool_calls list BEFORE _filter_json_mode_tools processes it.

This proves the bug from https://github.com/BerriAI/litellm/issues/18381
exists in the raw Bedrock response and that this PR filters it correctly.

Requires AWS credentials configured for Bedrock access.

Usage:
    poetry run python scripts/demo_bedrock_live.py
    poetry run python scripts/demo_bedrock_live.py --model bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0
    poetry run python scripts/demo_bedrock_live.py --region us-west-2
"""

import argparse
import os
from copy import deepcopy
from typing import List

import litellm
from litellm.constants import RESPONSE_FORMAT_TOOL_NAME
from litellm.llms.bedrock.chat.converse_transformation import AmazonConverseConfig
from litellm.types.utils import ChatCompletionToolCallChunk

# ---------------------------------------------------------------------------
# Formatting
# ---------------------------------------------------------------------------

BOLD = "\033[1m"
GREEN = "\033[1;32m"
RED = "\033[1;31m"
YELLOW = "\033[1;33m"
CYAN = "\033[1;36m"
DIM = "\033[2m"
RESET = "\033[0m"


def heading(text: str):
    print(f"\n{BOLD}{'=' * 72}")
    print(f"  {text}")
    print(f"{'=' * 72}{RESET}\n")


def subheading(text: str):
    print(f"\n{BOLD}--- {text} ---{RESET}\n")


def print_tool_calls_dicts(label: str, tool_calls):
    """Print tool_calls from list of dicts (raw capture)."""
    if not tool_calls:
        print(f"  {label}: {YELLOW}None{RESET}")
        return
    for i, tc in enumerate(tool_calls):
        name = tc["function"]["name"]
        args = tc["function"].get("arguments", "")
        color = RED if name == RESPONSE_FORMAT_TOOL_NAME else GREEN
        args_short = args[:120] + "..." if len(args) > 120 else args
        print(f"  {label}[{i}]: {color}{name}{RESET}({args_short})")


def print_tool_calls_objects(label: str, tool_calls):
    """Print tool_calls from litellm response objects."""
    if not tool_calls:
        print(f"  {label}: {YELLOW}None{RESET}")
        return
    for i, tc in enumerate(tool_calls):
        name = tc.function.name
        args = tc.function.arguments or ""
        color = RED if name == RESPONSE_FORMAT_TOOL_NAME else GREEN
        args_short = args[:120] + "..." if len(args) > 120 else args
        print(f"  {label}[{i}]: {color}{name}{RESET}({args_short})")


# ---------------------------------------------------------------------------
# Old logic — verbatim from main branch before this PR
# ---------------------------------------------------------------------------

def old_logic(tools, chat_completion_message):
    """
    Simulates the old _transform_response code (main branch):

        if (
            json_mode is True
            and tools is not None
            and len(tools) == 1                          # <-- BUG
            and tools[0]["function"].get("name") == RESPONSE_FORMAT_TOOL_NAME
        ):
            # Extract content — only works for single json_tool_call
            ...
        else:
            chat_completion_message["tool_calls"] = tools  # ALL tools leak
    """
    if (
        tools is not None
        and len(tools) == 1
        and tools[0]["function"].get("name") == RESPONSE_FORMAT_TOOL_NAME
    ):
        content = tools[0]["function"].get("arguments")
        if content is not None:
            chat_completion_message["content"] = content
    else:
        chat_completion_message["tool_calls"] = tools


# ---------------------------------------------------------------------------
# Capture hook — monkeypatches _filter_json_mode_tools to grab raw tools
# ---------------------------------------------------------------------------

captured_raw_tools: List[ChatCompletionToolCallChunk] = []
_original_filter = AmazonConverseConfig._filter_json_mode_tools


def _capturing_filter(json_mode, tools, chat_completion_message):
    """Wraps _filter_json_mode_tools to capture the raw tools before filtering."""
    captured_raw_tools.clear()
    if tools:
        captured_raw_tools.extend(deepcopy(tools))
    return _original_filter(json_mode, tools, chat_completion_message)


# ---------------------------------------------------------------------------
# Test: Non-streaming with tools + response_format
# ---------------------------------------------------------------------------

def test_non_streaming(model: str):
    subheading("Non-streaming: tools + response_format (real Bedrock call)")

    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a location.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City and state, e.g. San Francisco, CA",
                        },
                    },
                    "required": ["location"],
                },
            },
        }
    ]

    response_format = {
        "type": "json_schema",
        "json_schema": {
            "name": "weather_response",
            "schema": {
                "type": "object",
                "properties": {
                    "summary": {"type": "string"},
                    "needs_tool_call": {"type": "boolean"},
                },
                "required": ["summary", "needs_tool_call"],
            },
        },
    }

    messages = [
        {
            "role": "user",
            "content": (
                "What's the weather in San Francisco? "
                "You MUST call the get_weather tool to check the weather."
            ),
        }
    ]

    # Install the capture hook (must wrap in staticmethod to avoid self injection)
    AmazonConverseConfig._filter_json_mode_tools = staticmethod(_capturing_filter)

    try:
        response = litellm.completion(
            model=model,
            messages=messages,
            tools=tools,
            response_format=response_format,
        )
    finally:
        # Restore original as staticmethod
        AmazonConverseConfig._filter_json_mode_tools = staticmethod(_original_filter)

    msg = response.choices[0].message
    raw_tools = list(captured_raw_tools)

    # --- Show what Bedrock returned (raw) ---
    print(f"  {CYAN}{BOLD}Raw tools from Bedrock (before filtering):{RESET}")
    print_tool_calls_dicts("raw_tools", raw_tools)

    has_json_tool = any(
        t["function"]["name"] == RESPONSE_FORMAT_TOOL_NAME for t in raw_tools
    )
    has_real_tool = any(
        t["function"]["name"] != RESPONSE_FORMAT_TOOL_NAME for t in raw_tools
    )

    if has_json_tool and has_real_tool:
        print(f"\n  {RED}{BOLD}Bedrock returned BOTH json_tool_call AND get_weather!{RESET}")
        print(f"  {DIM}This is the exact scenario that triggers issue #18381.{RESET}")

        # Show old behavior
        print(f"\n  {RED}{BOLD}BEFORE this PR (old logic would produce):{RESET}")
        msg_old = {"role": "assistant", "content": ""}
        old_logic(tools=deepcopy(raw_tools), chat_completion_message=msg_old)
        print_tool_calls_dicts("tool_calls", msg_old.get("tool_calls"))
        print(f"  content: {msg_old.get('content', '') or '(empty)'}")
        leaked = any(
            tc["function"]["name"] == RESPONSE_FORMAT_TOOL_NAME
            for tc in msg_old.get("tool_calls", [])
        )
        if leaked:
            print(f"  {RED}^^^ BUG: json_tool_call leaked into tool_calls!{RESET}")

        # Show new behavior (what litellm actually returned)
        print(f"\n  {GREEN}{BOLD}AFTER this PR (litellm.completion() returned):{RESET}")
        print_tool_calls_objects("tool_calls", msg.tool_calls)
        print(f"  content: {msg.content or '(empty)'}")

        no_leak = not msg.tool_calls or not any(
            tc.function.name == RESPONSE_FORMAT_TOOL_NAME for tc in msg.tool_calls
        )
        if no_leak:
            print(f"\n  {GREEN}FIXED: json_tool_call filtered, real tool preserved.{RESET}")
            return True
        else:
            print(f"\n  {RED}json_tool_call still leaking!{RESET}")
            return False

    elif has_json_tool and not has_real_tool:
        print(f"\n  {YELLOW}Bedrock returned only json_tool_call (no real tool).{RESET}")
        print(f"  {DIM}The model chose not to call get_weather this time.")
        print(f"  This case is handled correctly by both old and new code.{RESET}")
        print(f"\n  litellm result:")
        print_tool_calls_objects("tool_calls", msg.tool_calls)
        print(f"  content: {msg.content or '(empty)'}")
        no_leak = not msg.tool_calls or not any(
            tc.function.name == RESPONSE_FORMAT_TOOL_NAME for tc in msg.tool_calls
        )
        print(f"\n  {'PASS' if no_leak else 'FAIL'}: json_tool_call {'filtered' if no_leak else 'leaked'}")
        print(f"\n  {YELLOW}TIP: Re-run — the model sometimes returns both tools.{RESET}")
        return no_leak

    elif has_real_tool and not has_json_tool:
        print(f"\n  {YELLOW}Bedrock returned only get_weather (no json_tool_call).{RESET}")
        print(f"  {DIM}The model chose not to use the json_tool_call tool this time.")
        print(f"  No filtering needed — both old and new code work fine.{RESET}")
        print(f"\n  litellm result:")
        print_tool_calls_objects("tool_calls", msg.tool_calls)
        print(f"  content: {msg.content or '(empty)'}")
        print(f"\n  {YELLOW}TIP: Re-run — the model sometimes returns both tools.{RESET}")
        return True

    else:
        print(f"\n  {YELLOW}Bedrock returned no tools at all.{RESET}")
        print(f"  content: {msg.content or '(empty)'}")
        print(f"\n  {YELLOW}TIP: Re-run or try a different prompt/model.{RESET}")
        return True


# ---------------------------------------------------------------------------
# Test: Streaming with tools + response_format
# ---------------------------------------------------------------------------

def test_streaming(model: str):
    subheading("Streaming: tools + response_format (real Bedrock call)")

    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a location.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City and state, e.g. San Francisco, CA",
                        },
                    },
                    "required": ["location"],
                },
            },
        }
    ]

    response_format = {
        "type": "json_schema",
        "json_schema": {
            "name": "weather_response",
            "schema": {
                "type": "object",
                "properties": {
                    "summary": {"type": "string"},
                    "needs_tool_call": {"type": "boolean"},
                },
                "required": ["summary", "needs_tool_call"],
            },
        },
    }

    messages = [
        {
            "role": "user",
            "content": (
                "What's the weather in San Francisco? "
                "You MUST call the get_weather tool to check the weather."
            ),
        }
    ]

    response = litellm.completion(
        model=model,
        messages=messages,
        tools=tools,
        response_format=response_format,
        stream=True,
    )

    collected_text = ""
    collected_tool_calls = {}
    finish_reason = None

    for chunk in response:
        choice = chunk.choices[0]
        if choice.finish_reason:
            finish_reason = choice.finish_reason
        delta = choice.delta
        if delta.content:
            collected_text += delta.content
        if delta.tool_calls:
            for tc in delta.tool_calls:
                idx = tc.index
                if idx not in collected_tool_calls:
                    collected_tool_calls[idx] = {"name": None, "args": ""}
                if tc.function and tc.function.name:
                    collected_tool_calls[idx]["name"] = tc.function.name
                if tc.function and tc.function.arguments:
                    collected_tool_calls[idx]["args"] += tc.function.arguments

    print(f"  finish_reason: {finish_reason}")
    print(f"  content:       {collected_text[:200] if collected_text else '(empty)'}")
    if collected_tool_calls:
        for idx, tc in sorted(collected_tool_calls.items()):
            name = tc["name"] or "(unknown)"
            color = RED if name == RESPONSE_FORMAT_TOOL_NAME else GREEN
            args_short = tc["args"][:120] + "..." if len(tc["args"]) > 120 else tc["args"]
            print(f"  tool_calls[{idx}]: {color}{name}{RESET}({args_short})")
    else:
        print(f"  tool_calls:    {YELLOW}None{RESET}")

    leaked = any(
        tc["name"] == RESPONSE_FORMAT_TOOL_NAME
        for tc in collected_tool_calls.values()
    )
    if leaked:
        print(f"\n  {RED}FAIL: json_tool_call leaked into streaming tool_calls!{RESET}")
    else:
        print(f"\n  {GREEN}PASS: No json_tool_call in streaming output.{RESET}")
    return not leaked


# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------

def main():
    parser = argparse.ArgumentParser(
        description="Demo: Bedrock json_tool_call fix (live API requests)"
    )
    parser.add_argument(
        "--model",
        default="bedrock/us.anthropic.claude-3-5-haiku-20241022-v1:0",
        help="Model to test (default: bedrock/us.anthropic.claude-3-5-haiku-20241022-v1:0)",
    )
    parser.add_argument(
        "--region",
        default=None,
        help="AWS region (e.g. us-east-1, us-west-2). Overrides AWS_REGION_NAME env var.",
    )
    args = parser.parse_args()

    if args.region:
        os.environ["AWS_REGION_NAME"] = args.region

    region_display = (
        args.region
        or os.environ.get("AWS_REGION_NAME")
        or os.environ.get("AWS_REGION")
        or "(default)"
    )

    heading(
        f"Bedrock json_tool_call fix — live API demo\n"
        f"  model:  {args.model}\n"
        f"  region: {region_display}\n"
        f"  issue:  https://github.com/BerriAI/litellm/issues/18381"
    )

    tests = [
        ("non_streaming", test_non_streaming),
        ("streaming", test_streaming),
    ]
    results = {}

    for name, fn in tests:
        try:
            results[name] = fn(args.model)
        except Exception as e:
            print(f"\n  {RED}ERROR: {e}{RESET}")
            results[name] = None

    # Summary
    heading("Summary")
    all_ok = True
    for name, result in results.items():
        if result is None:
            status = f"{YELLOW}SKIP (error){RESET}"
            all_ok = False
        elif result:
            status = f"{GREEN}PASS{RESET}"
        else:
            status = f"{RED}FAIL{RESET}"
            all_ok = False
        print(f"  {name}: {status}")

    if all_ok:
        print(f"\n  {GREEN}{BOLD}All tests passed! json_tool_call is properly filtered.{RESET}")
    else:
        ran = [v for v in results.values() if v is not None]
        if ran and all(v is True for v in ran):
            print(f"\n  {GREEN}{BOLD}Runnable tests passed!{RESET} {YELLOW}Some skipped.{RESET}")
        else:
            print(f"\n  {RED}{BOLD}Some tests failed or were skipped.{RESET}")

    return 0 if all_ok else 1


if __name__ == "__main__":
    raise SystemExit(main())
Result doing simulated requests (before vs after PR)
========================================================================
  Bedrock json_tool_call fix — before vs after
  Issue: https://github.com/BerriAI/litellm/issues/18381
  Internal tool name: json_tool_call
========================================================================

  Background:
  When using tools + response_format with Bedrock, LiteLLM injects an
  internal tool called 'json_tool_call' to get structured output.
  Bedrock may return this internal tool ALONGSIDE real user tools.
  Consumers (e.g. OpenAI Agents SDK) see 'json_tool_call' in
  tool_calls and crash trying to dispatch it.

  The old code only handled len(tools)==1. When Bedrock returned 2+
  tools, json_tool_call leaked through.

--- Scenario 1: Bedrock returns BOTH json_tool_call AND get_weather ---

  This happens when using tools + response_format together.
  Bedrock calls the internal json_tool_call AND the user's real tool.

  BEFORE this PR (old logic):
  tool_calls[0]: json_tool_call({"summary": "Checking weather in San Francisco", "needs_tool_call": true})
  tool_calls[1]: get_weather({"location": "San Francisco, CA"})
  content: (empty)

  BUG: json_tool_call leaked into tool_calls!
  Consumers (e.g. OpenAI Agents SDK) try to dispatch it as a real
  tool and crash because no such tool exists.

  AFTER this PR (new _filter_json_mode_tools):
  tool_calls[0]: get_weather({"location": "San Francisco, CA"})
  content: {"summary": "Checking weather in San Francisco", "needs_tool_call": true}

  FIXED: json_tool_call filtered out, real tool preserved.
  Structured output preserved in message.content.

--- Scenario 2: Bedrock returns only json_tool_call (no user tools) ---

  This happens with response_format but no user-defined tools.
  Both old and new code handle this correctly.

  BEFORE this PR:
  tool_calls: None
  content: {"summary": "Paris is cold in January with average highs around 7C", "needs_tool_call": false}
  OK: json_tool_call converted to content

  AFTER this PR:
  tool_calls: None
  content: {"summary": "Paris is cold in January with average highs around 7C", "needs_tool_call": false}
  OK: json_tool_call converted to content

--- Scenario 3: optional_params mutation (.pop vs .get) ---

  The old code used optional_params.pop('json_mode'), which mutates
  the caller's dict. The new code uses .get() instead.

  BEFORE: optional_params.pop('json_mode', None)
  After pop: 'json_mode' in params = False
  json_mode is GONE from the dict — may cause issues downstream

  AFTER:  optional_params.get('json_mode', None)
  After get: 'json_mode' in params = True
  json_mode stays in the dict — no side effects

========================================================================
  Summary
========================================================================

  mixed_tools: PASS
  only_json_tool: PASS
  no_mutation: PASS

  All scenarios demonstrate the fix correctly.
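
Scenario 3 above can be reproduced in isolation with plain dicts, no litellm required (variable names here are illustrative, not the actual transformation code):

```python
# Minimal reproduction of the .pop() vs .get() side effect.
# "optional_params" stands in for the dict LiteLLM receives from the caller.

optional_params = {"json_mode": True, "tools": []}

# Old behavior: .pop() returns the value but also deletes the key,
# mutating the dict the caller still holds a reference to.
popped = dict(optional_params)  # fresh copy for this case
json_mode_old = popped.pop("json_mode", None)

# New behavior: .get() returns the value and leaves the dict untouched.
got = dict(optional_params)  # fresh copy for this case
json_mode_new = got.get("json_mode", None)

print("after pop:", "json_mode" in popped)  # False, key removed
print("after get:", "json_mode" in got)     # True, no side effect
```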

The demo script that produced the output above:
"""
Demo: Bedrock Converse API json_tool_call filtering — before vs after

Shows the exact bug from https://github.com/BerriAI/litellm/issues/18381
by replaying a realistic Bedrock response through the old logic (before)
and the new _filter_json_mode_tools logic (after this PR).

No AWS credentials required — this exercises the transformation code directly.

Usage:
    poetry run python scripts/demo_bedrock_json_tool_call_fix.py
"""

import json
from copy import deepcopy

from litellm.constants import RESPONSE_FORMAT_TOOL_NAME
from litellm.llms.bedrock.chat.converse_transformation import AmazonConverseConfig

# ---------------------------------------------------------------------------
# Formatting helpers
# ---------------------------------------------------------------------------

BOLD = "\033[1m"
GREEN = "\033[1;32m"
RED = "\033[1;31m"
YELLOW = "\033[1;33m"
CYAN = "\033[1;36m"
DIM = "\033[2m"
RESET = "\033[0m"


def heading(text: str):
    print(f"\n{BOLD}{'=' * 72}")
    print(f"  {text}")
    print(f"{'=' * 72}{RESET}\n")


def subheading(text: str):
    print(f"\n{BOLD}--- {text} ---{RESET}\n")


def print_tool_calls(label: str, tool_calls):
    if not tool_calls:
        print(f"  {label}: {YELLOW}None{RESET}")
        return
    for i, tc in enumerate(tool_calls):
        name = tc["function"]["name"]
        args = tc["function"].get("arguments", "")
        color = RED if name == RESPONSE_FORMAT_TOOL_NAME else GREEN
        args_short = args[:100] + "..." if len(args) > 100 else args
        print(f"  {label}[{i}]: {color}{name}{RESET}({args_short})")


# ---------------------------------------------------------------------------
# Simulated Bedrock response — this is what _translate_message_content
# produces when Bedrock returns BOTH json_tool_call AND a real tool.
#
# This happens when the user passes tools + response_format. LiteLLM
# injects a json_tool_call tool, and Bedrock may call both.
# ---------------------------------------------------------------------------

MIXED_TOOLS = [
    {
        "id": "call_json_001",
        "type": "function",
        "function": {
            "name": RESPONSE_FORMAT_TOOL_NAME,  # "json_tool_call"
            "arguments": json.dumps({
                "summary": "Checking weather in San Francisco",
                "needs_tool_call": True,
            }),
        },
    },
    {
        "id": "call_real_001",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": json.dumps({"location": "San Francisco, CA"}),
        },
    },
]

ONLY_JSON_TOOL = [
    {
        "id": "call_json_002",
        "type": "function",
        "function": {
            "name": RESPONSE_FORMAT_TOOL_NAME,
            "arguments": json.dumps({
                "summary": "Paris is cold in January with average highs around 7C",
                "needs_tool_call": False,
            }),
        },
    },
]


# ---------------------------------------------------------------------------
# Old logic (before this PR) — extracted verbatim from main branch
# ---------------------------------------------------------------------------

def old_logic(json_mode, tools, chat_completion_message):
    """
    The code from main branch (converse_transformation.py lines 1679-1714):

        if (
            json_mode is True
            and tools is not None
            and len(tools) == 1                          # <-- BUG
            and tools[0]["function"].get("name") == RESPONSE_FORMAT_TOOL_NAME
        ):
            # Extract content — only works for single json_tool_call
            ...
        else:
            chat_completion_message["tool_calls"] = tools  # ALL tools leak
    """
    if (
        json_mode is True
        and tools is not None
        and len(tools) == 1
        and tools[0]["function"].get("name") == RESPONSE_FORMAT_TOOL_NAME
    ):
        json_mode_content_str = tools[0]["function"].get("arguments")
        if json_mode_content_str is not None:
            try:
                response_data = json.loads(json_mode_content_str)
                if (
                    isinstance(response_data, dict)
                    and "properties" in response_data
                    and len(response_data) == 1
                ):
                    response_data = response_data["properties"]
                    json_mode_content_str = json.dumps(response_data)
            except json.JSONDecodeError:
                pass
            chat_completion_message["content"] = json_mode_content_str
    else:
        chat_completion_message["tool_calls"] = tools

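
For contrast with `old_logic` above, here is a minimal standalone approximation of what the new `_filter_json_mode_tools` does. This is a sketch for illustration only; the real method also unwraps a single `properties` key in the JSON arguments (as `old_logic` does), which is omitted here:

```python
RESPONSE_FORMAT_TOOL_NAME = "json_tool_call"  # mirrors litellm.constants

def filter_json_mode_tools_sketch(json_mode, tools, chat_completion_message):
    """Approximate the three cases _filter_json_mode_tools handles:
    only json_tool_call (move arguments into content, return None),
    mixed with real tools (keep content, drop json_tool_call, return the rest),
    or no json_tool_call (pass tools through unchanged).
    """
    if json_mode is not True or not tools:
        return tools
    json_calls = [
        t for t in tools if t["function"].get("name") == RESPONSE_FORMAT_TOOL_NAME
    ]
    real_calls = [
        t for t in tools if t["function"].get("name") != RESPONSE_FORMAT_TOOL_NAME
    ]
    if not json_calls:
        return tools  # no internal tool present: nothing to filter
    # Preserve the structured output as message content in both other cases.
    args = json_calls[0]["function"].get("arguments")
    if args is not None:
        chat_completion_message["content"] = args
    # Mixed case: return only the real tools.
    # Only-json_tool_call case: no real tools left, return None.
    return real_calls or None
```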

# ---------------------------------------------------------------------------
# Scenario 1: Mixed json_tool_call + real tool
# ---------------------------------------------------------------------------

def scenario_mixed():
    subheading("Scenario 1: Bedrock returns BOTH json_tool_call AND get_weather")
    print(f"  {DIM}This happens when using tools + response_format together.")
    print(f"  Bedrock calls the internal json_tool_call AND the user's real tool.{RESET}\n")

    # --- BEFORE ---
    print(f"  {RED}{BOLD}BEFORE this PR (old logic):{RESET}")
    msg_before = {"role": "assistant", "content": ""}
    tools_before = deepcopy(MIXED_TOOLS)
    old_logic(json_mode=True, tools=tools_before, chat_completion_message=msg_before)

    print_tool_calls("tool_calls", msg_before.get("tool_calls"))
    print(f"  content: {msg_before.get('content', '') or '(empty)'}")

    leaked_before = any(
        tc["function"]["name"] == RESPONSE_FORMAT_TOOL_NAME
        for tc in msg_before.get("tool_calls", [])
    )
    if leaked_before:
        print(f"\n  {RED}BUG: json_tool_call leaked into tool_calls!")
        print(f"  Consumers (e.g. OpenAI Agents SDK) try to dispatch it as a real")
        print(f"  tool and crash because no such tool exists.{RESET}")

    # --- AFTER ---
    print(f"\n  {GREEN}{BOLD}AFTER this PR (new _filter_json_mode_tools):{RESET}")
    msg_after = {"role": "assistant", "content": ""}
    tools_after = deepcopy(MIXED_TOOLS)
    filtered = AmazonConverseConfig._filter_json_mode_tools(
        json_mode=True,
        tools=tools_after,
        chat_completion_message=msg_after,
    )

    print_tool_calls("tool_calls", filtered)
    content_after = msg_after.get("content", "") or "(empty)"
    print(f"  content: {content_after}")

    leaked_after = filtered and any(
        tc["function"]["name"] == RESPONSE_FORMAT_TOOL_NAME
        for tc in filtered
    )
    has_real = filtered and any(
        tc["function"]["name"] == "get_weather"
        for tc in filtered
    )

    if not leaked_after and has_real:
        print(f"\n  {GREEN}FIXED: json_tool_call filtered out, real tool preserved.")
        print(f"  Structured output preserved in message.content.{RESET}")
        return True
    else:
        print(f"\n  {RED}UNEXPECTED result.{RESET}")
        return False


# ---------------------------------------------------------------------------
# Scenario 2: Only json_tool_call (no real tools)
# ---------------------------------------------------------------------------

def scenario_only_json():
    subheading("Scenario 2: Bedrock returns only json_tool_call (no user tools)")
    print(f"  {DIM}This happens with response_format but no user-defined tools.")
    print(f"  Both old and new code handle this correctly.{RESET}\n")

    # --- BEFORE ---
    print(f"  {CYAN}{BOLD}BEFORE this PR:{RESET}")
    msg_before = {"role": "assistant", "content": ""}
    tools_before = deepcopy(ONLY_JSON_TOOL)
    old_logic(json_mode=True, tools=tools_before, chat_completion_message=msg_before)

    print_tool_calls("tool_calls", msg_before.get("tool_calls"))
    content_b = msg_before.get("content", "") or "(empty)"
    print(f"  content: {content_b}")
    ok_before = "tool_calls" not in msg_before and msg_before.get("content")
    print(f"  {'OK' if ok_before else 'ISSUE'}: json_tool_call converted to content\n")

    # --- AFTER ---
    print(f"  {CYAN}{BOLD}AFTER this PR:{RESET}")
    msg_after = {"role": "assistant", "content": ""}
    tools_after = deepcopy(ONLY_JSON_TOOL)
    filtered = AmazonConverseConfig._filter_json_mode_tools(
        json_mode=True,
        tools=tools_after,
        chat_completion_message=msg_after,
    )

    print_tool_calls("tool_calls", filtered)
    content_a = msg_after.get("content", "") or "(empty)"
    print(f"  content: {content_a}")
    ok_after = filtered is None and msg_after.get("content")
    print(f"  {'OK' if ok_after else 'ISSUE'}: json_tool_call converted to content")

    return ok_before and ok_after


# ---------------------------------------------------------------------------
# Scenario 3: pop vs get — optional_params mutation
# ---------------------------------------------------------------------------

def scenario_no_mutation():
    subheading("Scenario 3: optional_params mutation (.pop vs .get)")
    print(f"  {DIM}The old code used optional_params.pop('json_mode'), which mutates")
    print(f"  the caller's dict. The new code uses .get() instead.{RESET}\n")

    params = {"json_mode": True, "tools": []}

    print(f"  {RED}{BOLD}BEFORE:{RESET} optional_params.pop('json_mode', None)")
    params_before = deepcopy(params)
    params_before.pop("json_mode", None)
    print(f"  After pop: 'json_mode' in params = {'json_mode' in params_before}")
    print(f"  {RED}json_mode is GONE from the dict — may cause issues downstream{RESET}\n")

    print(f"  {GREEN}{BOLD}AFTER:{RESET}  optional_params.get('json_mode', None)")
    params_after = deepcopy(params)
    params_after.get("json_mode", None)
    print(f"  After get: 'json_mode' in params = {'json_mode' in params_after}")
    print(f"  {GREEN}json_mode stays in the dict — no side effects{RESET}")

    return True


# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------

def main():
    heading(
        "Bedrock json_tool_call fix — before vs after\n"
        f"  Issue: https://github.com/BerriAI/litellm/issues/18381\n"
        f"  Internal tool name: {CYAN}{RESPONSE_FORMAT_TOOL_NAME}{RESET}"
    )

    print(f"  {BOLD}Background:{RESET}")
    print(f"  When using tools + response_format with Bedrock, LiteLLM injects an")
    print(f"  internal tool called '{RESPONSE_FORMAT_TOOL_NAME}' to get structured output.")
    print(f"  Bedrock may return this internal tool ALONGSIDE real user tools.")
    print(f"  Consumers (e.g. OpenAI Agents SDK) see '{RESPONSE_FORMAT_TOOL_NAME}' in")
    print(f"  tool_calls and crash trying to dispatch it.\n")
    print(f"  {BOLD}The old code only handled len(tools)==1. When Bedrock returned 2+")
    print(f"  tools, json_tool_call leaked through.{RESET}")

    results = {}

    for name, fn in [
        ("mixed_tools", scenario_mixed),
        ("only_json_tool", scenario_only_json),
        ("no_mutation", scenario_no_mutation),
    ]:
        try:
            results[name] = fn()
        except Exception as e:
            print(f"\n  {RED}ERROR: {e}{RESET}")
            results[name] = False

    # Summary
    heading("Summary")
    all_ok = True
    for name, ok in results.items():
        status = f"{GREEN}PASS{RESET}" if ok else f"{RED}FAIL{RESET}"
        print(f"  {name}: {status}")
        if not ok:
            all_ok = False

    if all_ok:
        print(f"\n  {GREEN}{BOLD}All scenarios demonstrate the fix correctly.{RESET}")
    else:
        print(f"\n  {RED}{BOLD}Some scenarios did not behave as expected.{RESET}")

    return 0 if all_ok else 1


if __name__ == "__main__":
    raise SystemExit(main())

jquinter and others added 9 commits February 12, 2026 21:00
…ls in both streaming and non-streaming

When using both `tools` and `response_format` with Bedrock Converse API, LiteLLM
internally adds a fake tool called `json_tool_call` to handle structured output.
Bedrock may return both this internal tool AND real user-defined tools, causing
consumers like OpenAI Agents SDK to break trying to dispatch `json_tool_call`.

This fix:
- Extracts `_filter_json_mode_tools()` to handle 3 scenarios: only json_tool_call
  (convert to content), mixed with real tools (filter it out), or no json_tool_call
- Fixes streaming by adding json_mode awareness to AWSEventStreamDecoder, converting
  json_tool_call chunks to text content while passing real tool chunks through
- Changes `optional_params.pop("json_mode")` to `.get()` to avoid mutating caller dict

Fixes BerriAI#18381
Credits @haggai-backline for the original investigation in PR BerriAI#18384

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…_tool_call content in mixed case

- Move `import json` to top of converse_transformation.py per CLAUDE.md style guide
- In the mixed tools case, preserve json_tool_call arguments as message content
  so the structured output from response_format is not silently lost
- Update test to verify json_tool_call content is preserved as message text

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
httpx.Response.json() is synchronous, not async. Using AsyncMock
made the test fail because it turned json() into a coroutine.
Adds try/except around FastAPI imports with fallback mock classes.
This allows the module to be imported in test environments where
proxy dependencies (FastAPI) may not be installed.

Fixes NameError when MCP tests try to import from proxy_server which
imports from this module:
- NameError: name 'APIRouter' is not defined
- NameError: name 'Depends' is not defined
- NameError: name 'HTTPException' is not defined
- NameError: name 'Query' is not defined
The test was intermittently failing in CI because it used a default MagicMock
for the async post() method. This is unreliable across environments. Using
AsyncMock explicitly ensures the mock properly handles async/await.
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
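
The streaming-side change described in the first commit (json_mode awareness in AWSEventStreamDecoder) can be modeled with a simplified, hypothetical chunk filter. Chunk shapes and names here are illustrative only and do not match the actual Bedrock event format:

```python
def filter_stream_chunk(chunk, json_mode, state):
    """Simplified model of the json_mode-aware streaming filter.

    state["current_tool"] tracks which tool's argument deltas are
    currently streaming (set on a tool-start chunk), analogous to the
    _current_tool_name tracking added to the decoder. Returns None for
    suppressed chunks.
    """
    kind = chunk["type"]
    if kind == "tool_start":
        state["current_tool"] = chunk["name"]
        if json_mode and chunk["name"] == "json_tool_call":
            return None  # suppress the internal tool's start chunk
        return chunk
    if kind == "tool_delta":
        if json_mode and state.get("current_tool") == "json_tool_call":
            # Convert the internal tool's argument deltas to text content.
            return {"type": "text", "text": chunk["arguments"]}
        return chunk
    if kind == "tool_stop":
        name = state.pop("current_tool", None)
        if json_mode and name == "json_tool_call":
            return None  # suppress the internal tool's stop chunk
        return chunk
    return chunk  # non-tool chunks pass through unchanged
```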
@jquinter
Contributor Author

Closing this PR as it includes unrelated changes (policy_resolve_endpoints, MCP tests, Anthropic tests). Replaced by properly scoped PR #21107 which contains only the Bedrock json_tool_call filtering changes.

@jquinter jquinter closed this Feb 13, 2026

Development

Successfully merging this pull request may close these issues.

[Bug]: Bedrock returns fake json_tool_call tool when using response_format + tools together, breaking OpenAI Agents SDK
