
Bedrock: move native structured output model list to cost JSON, add Sonnet 4.6#23794

Open
ndgigliotti wants to merge 7 commits into BerriAI:main from ndgigliotti:feat/bedrock-structured-output-cost-json

Conversation

@ndgigliotti
Contributor

@ndgigliotti ndgigliotti commented Mar 17, 2026

Relevant issues

Addresses Greptile feedback on #21222 and #23778 recommending the hardcoded model set be moved to the cost JSON.

Pre-Submission checklist

  • I have Added testing in the tests/test_litellm/ directory
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type

Refactoring / New Feature

Changes

Claude Sonnet 4.6 was released after the native structured output feature landed (#21222) and was not included. DeepSeek v3 was listed in the hardcoded set but the substring didn't match the actual model ID, so it was silently broken. This PR flags both for native structured output and moves the model capability check from a hardcoded set to the cost JSON, so future models are supported without code changes or releases (since litellm.model_cost is fetched from the remote JSON at import time).

  • litellm/llms/bedrock/chat/converse_transformation.py: Removed hardcoded BEDROCK_NATIVE_STRUCTURED_OUTPUT_MODELS set. _supports_native_structured_outputs() now looks up the supports_native_structured_output flag in litellm.model_cost via get_bedrock_base_model(), with a fallback that strips version suffixes (e.g. :0).
  • model_prices_and_context_window.json / litellm/model_prices_and_context_window_backup.json: Added "supports_native_structured_output": true to 44 Bedrock models:
    • Claude 4.5/4.6 (haiku, sonnet, opus) + all regional variants
    • Qwen3 (32b, 235b, coder-30b, coder-480b, next-80b, vl-235b)
    • Mistral (ministral-3 3b/8b/14b, mistral-large-3, voxtral-mini-3b, voxtral-small-24b)
    • Minimax (m2), Moonshot (kimi-k2-thinking)
    • Nvidia (nemotron-nano-3-30b), DeepSeek (v3-v1:0)
  • tests/test_litellm/llms/bedrock/chat/test_converse_transformation.py: Updated tests to load local cost map and assert against real model IDs.
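The lookup-with-fallback described above can be sketched as follows. This is a simplified illustration, not the actual implementation: the real code lives in litellm/llms/bedrock/chat/converse_transformation.py and first normalizes the model ID via get_bedrock_base_model() (approximated here as a pre-normalized input), and the toy cost map below is reduced to the one relevant flag.

```python
# Simplified sketch of the cost-JSON capability check described above.
# Assumes the model ID has already been normalized (region prefix,
# routing prefix, throughput suffix, and ARN stripped).

def supports_native_structured_outputs(base_model: str, model_cost: dict) -> bool:
    info = model_cost.get(base_model)
    if info is None and ":" in base_model:
        # Fallback: strip the version suffix, e.g. "model-v1:0" -> "model-v1",
        # for models listed in the JSON without a version qualifier.
        info = model_cost.get(base_model.rsplit(":", 1)[0])
    if info is None:
        return False
    return info.get("supports_native_structured_output", False) is True


# Toy cost map: real entries also carry pricing and other capability fields.
toy_cost = {"anthropic.claude-sonnet-4-5": {"supports_native_structured_output": True}}
print(supports_native_structured_outputs("anthropic.claude-sonnet-4-5:0", toy_cost))  # True
```

Because the flag lookup is data-driven, a model absent from the cost map (or present without the flag) simply falls through to `return False` and takes the tool-call fallback path.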

All 44 flagged models were integration tested against real Bedrock endpoints (sync + streaming) on the native outputConfig.textFormat path.

Models deliberately excluded after integration testing:

  • google.gemma-3 (4b/12b/27b): ignores schema, returns free text
  • nvidia.nemotron-nano (9b/12b): errors with "Tool calling is not supported in streaming mode" even on sync
  • deepseek.v3.2: Bedrock returns 400 on outputConfig.textFormat
  • minimax.minimax-m2.1: Bedrock returns 400 on outputConfig.textFormat
  • moonshotai.kimi-k2.5: Bedrock returns 400 on outputConfig.textFormat
  • qwen.qwen3-coder-next: unavailable in both us-east-1 and us-west-2

These excluded models continue to work via the existing tool-call fallback path.


@codspeed-hq
Contributor

codspeed-hq bot commented Mar 17, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing ndgigliotti:feat/bedrock-structured-output-cost-json (3b1e124) with main (278c9ba)


@ndgigliotti
Contributor Author

@greptileai

@ndgigliotti ndgigliotti changed the title Flag Claude Sonnet 4.6 for native structured output and move Bedrock model check to cost JSON Bedrock: move native structured output model list to cost JSON, add Sonnet 4.6 Mar 17, 2026
@greptile-apps
Contributor

greptile-apps bot commented Mar 17, 2026

Greptile Summary

This PR replaces a hardcoded BEDROCK_NATIVE_STRUCTURED_OUTPUT_MODELS substring-match set with a cost-JSON-driven lookup in _supports_native_structured_outputs, aligning with the project's convention of encoding model capabilities in model_prices_and_context_window.json. It also adds Claude Sonnet 4.6 (previously missing) and correctly identifies DeepSeek v3 (the old "deepseek-v3.1" substring never matched real Bedrock model IDs).

Key changes:

  • _supports_native_structured_outputs now calls get_bedrock_base_model() to normalize the model ID, does a direct litellm.model_cost lookup, then falls back to stripping the version suffix (:0) for models that appear in the JSON without a version qualifier.
  • 44 Bedrock models gain "supports_native_structured_output": true in both the main and backup JSON files.
  • New tests (test_supports_native_structured_outputs, test_native_structured_output_no_fake_stream, test_json_object_no_schema_falls_back_to_tool_call) correctly guard litellm.model_cost state with try/finally. However, test_translate_response_format_native_output_config (line 2729) omits that setup and may be flaky if run against a remote cost map that hasn't been updated yet.
  • The remainder of the diff is pure line-wrapping/whitespace reformatting with no logic changes.
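Illustratively, a flagged entry in the cost JSON looks roughly like this (only the fields relevant to this PR are shown; real entries also carry pricing and other capability fields):

```json
"anthropic.claude-sonnet-4-5-20250929-v1:0": {
  "mode": "chat",
  "supports_native_structured_output": true
}
```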

Confidence Score: 4/5

  • Safe to merge with one minor fix: test_translate_response_format_native_output_config should set up the local cost map the same way the other new tests do.
  • The core logic change is clean, well-tested, and follows the project convention for capability flags. The 44 JSON additions are consistent. The only concern is a potential flaky test that could cause CI noise when the remote cost map CDN hasn't been updated yet.
  • tests/test_litellm/llms/bedrock/chat/test_converse_transformation.py — test_translate_response_format_native_output_config (line 2729) needs the same LITELLM_LOCAL_MODEL_COST_MAP / litellm.model_cost setup guard as the other new tests.

Important Files Changed

Filename — Overview

  • litellm/llms/bedrock/chat/converse_transformation.py — Removes the hardcoded BEDROCK_NATIVE_STRUCTURED_OUTPUT_MODELS set and replaces _supports_native_structured_outputs with a cost-JSON lookup via litellm.model_cost + get_bedrock_base_model, with a version-suffix fallback. The rest of the diff is pure whitespace/line-wrapping reformatting with no logic changes.
  • model_prices_and_context_window.json — Adds "supports_native_structured_output": true to 44 Bedrock models (Claude 4.5/4.6 variants, Qwen3, Mistral, MiniMax, Moonshot, NVIDIA, DeepSeek). Also includes an unrelated correction to vertex_ai/gemini-embedding-2-preview (discussed in a previous thread).
  • litellm/model_prices_and_context_window_backup.json — Mirror of the main JSON changes: the same 44 models flagged with supports_native_structured_output: true, and the same vertex_ai/gemini-embedding-2-preview correction.
  • tests/test_litellm/llms/bedrock/chat/test_converse_transformation.py — Adds tests for the new cost-JSON-backed _supports_native_structured_outputs logic. Three new tests properly save/restore litellm.model_cost with try/finally, but test_translate_response_format_native_output_config (line 2729) skips that setup, making it potentially flaky when the remote cost map hasn't been updated yet. The rest of the diff is cosmetic reformatting.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["_translate_response_format_param(model, response_format, ...)"] --> B{"json_schema present?"}
    B -- No --> E["Tool-call fallback path"]
    B -- Yes --> C["_supports_native_structured_outputs(model)"]
    C --> C1["get_bedrock_base_model(model)\n(strip region prefix, routing prefix, throughput suffix, ARN)"]
    C1 --> C2["litellm.model_cost.get(base_model)"]
    C2 -- found --> C4["return info.get('supports_native_structured_output', False) is True"]
    C2 -- not found & ':' in model --> C3["Retry with version suffix stripped\n(e.g. 'model-v1:0' → 'model-v1')"]
    C3 --> C4
    C2 -- not found & no ':' --> C5["return False"]
    C4 -- True --> D["Native path: build outputConfig.textFormat\nNo tool injection, no fake_stream"]
    C4 -- False --> E
    C5 --> E
    E["Tool-call fallback: inject synthetic tool,\nset tool_choice, possibly set fake_stream"]

Comments Outside Diff (1)

  1. tests/test_litellm/llms/bedrock/chat/test_converse_transformation.py, lines 2729-2770

    P2 test_translate_response_format_native_output_config may be flaky without local cost map setup

    Unlike test_supports_native_structured_outputs, test_native_structured_output_no_fake_stream, and test_json_object_no_schema_falls_back_to_tool_call — all of which explicitly set LITELLM_LOCAL_MODEL_COST_MAP and reload litellm.model_cost from the local backup — this test relies on whatever litellm.model_cost was loaded at import time.

    If the CI environment fetches the remote JSON (i.e., LITELLM_LOCAL_MODEL_COST_MAP is not set in the process environment at import), and the remote CDN hasn't yet been updated with the "supports_native_structured_output": true flag added by this PR, then _supports_native_structured_outputs("anthropic.claude-sonnet-4-5-20250929-v1:0") will return False, outputConfig won't be added, and the assertion on line 2758 will fail.

    Consider applying the same pattern as the other tests:

    def test_translate_response_format_native_output_config():
        old_env = os.environ.get("LITELLM_LOCAL_MODEL_COST_MAP")
        old_cost = litellm.model_cost
        os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] = "True"
        litellm.model_cost = litellm.get_model_cost_map(url="")
        try:
            config = AmazonConverseConfig()
            # ... rest of test ...
        finally:
            litellm.model_cost = old_cost
            if old_env is None:
                os.environ.pop("LITELLM_LOCAL_MODEL_COST_MAP", None)
            else:
                os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] = old_env

Last reviewed commit: 3b1e124

Comment on lines +2648 to +2649
os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] = "True"
litellm.model_cost = litellm.get_model_cost_map(url="")

Missing cleanup of LITELLM_LOCAL_MODEL_COST_MAP and litellm.model_cost

os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] and litellm.model_cost are set here without any teardown, which means these side effects persist for the rest of the test session. Since litellm.model_cost is a module-level global, any test that runs afterward and relies on the original (remote) cost map—or on the env var not being set—could behave differently or produce false positives/negatives.

The same pattern appears in test_native_structured_output_no_fake_stream (line 2797) and test_json_object_no_schema_falls_back_to_tool_call (line 3066).

Each of these tests should save and restore both values with try/finally, or use pytest's monkeypatch fixture:

def test_supports_native_structured_outputs(monkeypatch):
    monkeypatch.setenv("LITELLM_LOCAL_MODEL_COST_MAP", "True")
    original_model_cost = litellm.model_cost
    litellm.model_cost = litellm.get_model_cost_map(url="")
    try:
        # ... assertions ...
    finally:
        litellm.model_cost = original_model_cost

Or more concisely with monkeypatch for the attribute too:

def test_supports_native_structured_outputs(monkeypatch):
    monkeypatch.setenv("LITELLM_LOCAL_MODEL_COST_MAP", "True")
    monkeypatch.setattr(litellm, "model_cost", litellm.get_model_cost_map(url=""))
    # ... assertions ...

@ndgigliotti
Contributor Author

Fixed the test state leakage in 97e7fb9. All three tests now restore os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] and litellm.model_cost in a finally block.

@greptileai

Comment on lines 30970 to +30975
"mode": "chat",
"output_cost_per_token": 3.2e-06,
"source": "https://cloud.google.com/vertex-ai/generative-ai/pricing#glm-models",
"supported_regions": ["global"],
"supported_regions": [
"global"
],

Unrelated pricing change modifies vertex_ai/gemini-embedding-2-preview

This PR also updates vertex_ai/gemini-embedding-2-preview in ways unrelated to Bedrock native structured outputs:

  • Drops input_cost_per_audio_per_second, input_cost_per_image, and input_cost_per_video_per_second fields
  • Changes input_cost_per_token from 2e-07 to 1.5e-07
  • Switches the source URL from the Google Cloud pricing page to the AI Studio embeddings page

The identical change is in litellm/model_prices_and_context_window_backup.json.

This seems to be fixing a pre-existing duplicate key in the JSON (there were two vertex_ai/gemini-embedding-2-preview entries), but by removing audio/video/image costs it changes the cost calculation result for users calling this embedding model. If this change is intentional, it should either be documented in this PR or separated into its own PR to make the impact clear.

@ndgigliotti
Contributor Author

Thanks for the follow-up review.

  1. vertex_ai/gemini-embedding-2-preview diff: Main has two duplicate JSON keys for this entry. Our JSON round-trip collapsed them to the second (text-only) entry, but PR #23599 (fix(model-prices): remove duplicate vertex_ai/gemini-embedding-2-preview entry) intentionally keeps the first (multimodal pricing with audio/image/video fields). Fixed in e404349 to align with #23599.

  2. Missing deepseek.v3-v1:0 test assertion: Added in 254567f.

…t JSON lookup

Move the source of truth for which Bedrock models support native structured
outputs (outputConfig.textFormat) from a hardcoded substring set
(BEDROCK_NATIVE_STRUCTURED_OUTPUT_MODELS) to the cost JSON via a new
"supports_native_structured_output" flag. This makes it possible to add
support for new models (including Claude Sonnet 4.6, which was missing)
by updating the JSON alone, with no code changes needed.
Integration testing confirmed gemma-3 (4b/12b/27b) ignores the JSON
schema and returns free text, and nemotron-nano (9b/12b) errors with
"Tool calling is not supported in streaming mode" even on sync calls.
Remove the flag so these models fall back to the tool-call approach.
Also fix test assertions to match (nemotron-nano-3-30b is supported,
gemma-3 and nemotron-nano-12b are not).
Integration tested 28/28 (10 sync + 10 streaming + extras) on the
native outputConfig.textFormat path in us-west-2. deepseek.v3.2 does
not support native structured output (Bedrock returns 400).
…en3-coder-next

Integration testing confirmed:
- minimax.minimax-m2.1: Bedrock rejects outputConfig.textFormat (400)
- moonshotai.kimi-k2.5: Bedrock rejects outputConfig.textFormat (400)
- qwen.qwen3-coder-next: unavailable in us-east-1 and us-west-2
Wrap cost-map-dependent tests in try/finally to restore
os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] and litellm.model_cost,
preventing test-ordering sensitivity.
…I#23599

Main has two duplicate keys for vertex_ai/gemini-embedding-2-preview.
Our JSON round-trip collapsed them to the second (text-only) entry, but
PR BerriAI#23599 intentionally keeps the first (multimodal pricing) entry.
Restore the multimodal entry to avoid conflicts.
@ndgigliotti ndgigliotti force-pushed the feat/bedrock-structured-output-cost-json branch from e404349 to 3b1e124 on March 17, 2026 at 01:10
@ndgigliotti
Contributor Author

Rebased on latest main (3b1e124).

@ndgigliotti
Contributor Author

@krrishdholakia ready for review when you get a chance. Greptile gave 4/5 and all feedback has been addressed.
