Bedrock: move native structured output model list to cost JSON, add Sonnet 4.6 #23794
ndgigliotti wants to merge 7 commits into BerriAI:main from
Conversation
Greptile Summary

This PR replaces a hardcoded set of Bedrock model IDs with a `supports_native_structured_output` flag looked up in the cost JSON. Key changes:
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/llms/bedrock/chat/converse_transformation.py | Removes hardcoded BEDROCK_NATIVE_STRUCTURED_OUTPUT_MODELS set and replaces _supports_native_structured_outputs with a cost-JSON lookup via litellm.model_cost + get_bedrock_base_model, with a version-suffix fallback. The rest of the diff is pure whitespace/line-wrapping reformatting with no logic changes. |
| model_prices_and_context_window.json | Adds "supports_native_structured_output": true to 44 Bedrock models (Claude 4.5/4.6 variants, Qwen3, Mistral, MiniMax, Moonshot, NVIDIA, DeepSeek). Also includes an unrelated correction to vertex_ai/gemini-embedding-2-preview (discussed in a previous thread). |
| litellm/model_prices_and_context_window_backup.json | Mirror of the main JSON changes: same 44 models flagged with supports_native_structured_output: true, and the same vertex_ai/gemini-embedding-2-preview correction. |
| tests/test_litellm/llms/bedrock/chat/test_converse_transformation.py | Adds tests for the new cost-JSON-backed _supports_native_structured_outputs logic. Three new tests properly save/restore litellm.model_cost with try/finally, but test_translate_response_format_native_output_config (line 2729) skips that setup, making it potentially flaky when the remote cost map hasn't been updated yet. The rest of the diff is cosmetic reformatting. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["_translate_response_format_param(model, response_format, ...)"] --> B{"json_schema present?"}
    B -- No --> E["Tool-call fallback path"]
    B -- Yes --> C["_supports_native_structured_outputs(model)"]
    C --> C1["get_bedrock_base_model(model)\n(strip region prefix, routing prefix, throughput suffix, ARN)"]
    C1 --> C2["litellm.model_cost.get(base_model)"]
    C2 -- found --> C4["return info.get('supports_native_structured_output', False) is True"]
    C2 -- "not found & ':' in model" --> C3["Retry with version suffix stripped\n(e.g. 'model-v1:0' → 'model-v1')"]
    C3 --> C4
    C2 -- "not found & no ':'" --> C5["return False"]
    C4 -- True --> D["Native path: build outputConfig.textFormat\nNo tool injection, no fake_stream"]
    C4 -- False --> E
    C5 --> E
    E["Tool-call fallback: inject synthetic tool,\nset tool_choice, possibly set fake_stream"]
```
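The lookup the flowchart describes can be sketched as a small standalone function. This is an illustrative approximation, not the actual litellm implementation: the region-prefix stripping here is a simplified stand-in for `get_bedrock_base_model`, which also handles routing prefixes, throughput suffixes, and ARNs.

```python
def supports_native_structured_outputs(model: str, model_cost: dict) -> bool:
    """Sketch of the cost-JSON capability lookup described in the flowchart.

    The region-prefix handling below is a simplified stand-in for
    litellm's get_bedrock_base_model helper.
    """
    # Strip a region prefix like "us." / "eu." / "ap." (simplified)
    head, _, tail = model.partition(".")
    base = tail if head in {"us", "eu", "ap"} and tail else model

    info = model_cost.get(base)
    if info is None and ":" in base:
        # Fallback: retry with the version suffix stripped,
        # e.g. "model-v1:0" -> "model-v1"
        info = model_cost.get(base.rsplit(":", 1)[0])
    if info is None:
        return False
    return info.get("supports_native_structured_output", False) is True


# Example with a minimal, hypothetical cost map
cost_map = {
    "anthropic.claude-sonnet-4-5-20250929-v1": {
        "supports_native_structured_output": True,
    },
}
print(supports_native_structured_outputs(
    "us.anthropic.claude-sonnet-4-5-20250929-v1:0", cost_map))
```

Both the region prefix and the version suffix are stripped before the flag is read, so a fully qualified inference-profile ID resolves to the same entry as the bare model ID.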
Comments Outside Diff (1)
- tests/test_litellm/llms/bedrock/chat/test_converse_transformation.py, lines 2729-2770 (link): `test_translate_response_format_native_output_config` may be flaky without local cost map setup.

  Unlike `test_supports_native_structured_outputs`, `test_native_structured_output_no_fake_stream`, and `test_json_object_no_schema_falls_back_to_tool_call` — all of which explicitly set `LITELLM_LOCAL_MODEL_COST_MAP` and reload `litellm.model_cost` from the local backup — this test relies on whatever `litellm.model_cost` was loaded at import time. If the CI environment fetches the remote JSON (i.e., `LITELLM_LOCAL_MODEL_COST_MAP` is not set in the process environment at import), and the remote CDN hasn't yet been updated with the `"supports_native_structured_output": true` flag added by this PR, then `_supports_native_structured_outputs("anthropic.claude-sonnet-4-5-20250929-v1:0")` will return `False`, `outputConfig` won't be added, and the assertion on line 2758 will fail.

  Consider applying the same pattern as the other tests:

  ```python
  def test_translate_response_format_native_output_config():
      old_env = os.environ.get("LITELLM_LOCAL_MODEL_COST_MAP")
      old_cost = litellm.model_cost
      os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] = "True"
      litellm.model_cost = litellm.get_model_cost_map(url="")
      try:
          config = AmazonConverseConfig()
          # ... rest of test ...
      finally:
          litellm.model_cost = old_cost
          if old_env is None:
              os.environ.pop("LITELLM_LOCAL_MODEL_COST_MAP", None)
          else:
              os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] = old_env
  ```
Last reviewed commit: 3b1e124
```python
os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] = "True"
litellm.model_cost = litellm.get_model_cost_map(url="")
```
Missing cleanup of LITELLM_LOCAL_MODEL_COST_MAP and litellm.model_cost
os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] and litellm.model_cost are set here without any teardown, which means these side effects persist for the rest of the test session. Since litellm.model_cost is a module-level global, any test that runs afterward and relies on the original (remote) cost map—or on the env var not being set—could behave differently or produce false positives/negatives.
The same pattern appears in test_native_structured_output_no_fake_stream (line 2797) and test_json_object_no_schema_falls_back_to_tool_call (line 3066).
Each of these tests should save and restore both values with try/finally, or use pytest's monkeypatch fixture:
```python
def test_supports_native_structured_outputs(monkeypatch):
    monkeypatch.setenv("LITELLM_LOCAL_MODEL_COST_MAP", "True")
    original_model_cost = litellm.model_cost
    litellm.model_cost = litellm.get_model_cost_map(url="")
    try:
        # ... assertions ...
        pass
    finally:
        litellm.model_cost = original_model_cost
```

Or more concisely with monkeypatch for the attribute too:

```python
def test_supports_native_structured_outputs(monkeypatch):
    monkeypatch.setenv("LITELLM_LOCAL_MODEL_COST_MAP", "True")
    monkeypatch.setattr(litellm, "model_cost", litellm.get_model_cost_map(url=""))
    # ... assertions ...
```
Fixed the test state leakage in 97e7fb9. All three tests now restore both `os.environ["LITELLM_LOCAL_MODEL_COST_MAP"]` and `litellm.model_cost` in a `try`/`finally`.
```diff
  "mode": "chat",
  "output_cost_per_token": 3.2e-06,
  "source": "https://cloud.google.com/vertex-ai/generative-ai/pricing#glm-models",
- "supported_regions": ["global"],
+ "supported_regions": [
+     "global"
+ ],
```
Unrelated pricing change modifies vertex_ai/gemini-embedding-2-preview
This PR also updates `vertex_ai/gemini-embedding-2-preview` in ways unrelated to Bedrock native structured outputs:

- Drops the `input_cost_per_audio_per_second`, `input_cost_per_image`, and `input_cost_per_video_per_second` fields
- Changes `input_cost_per_token` from `2e-07` to `1.5e-07`
- Switches the `source` URL from the Google Cloud pricing page to the AI Studio embeddings page
The identical change is in litellm/model_prices_and_context_window_backup.json.
This seems to be fixing a pre-existing duplicate key in the JSON (there were two vertex_ai/gemini-embedding-2-preview entries), but by removing audio/video/image costs it changes the cost calculation result for users calling this embedding model. If this change is intentional, it should either be documented in this PR or separated into its own PR to make the impact clear.
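To make the cost impact concrete, here is a toy input-token cost calculation using only the two per-token rates quoted in the comment above. The real litellm cost calculator also accounts for completion tokens and the dropped audio/image/video rates; this is purely an illustration of the token-rate change.

```python
def input_token_cost(tokens: int, rate_per_token: float) -> float:
    """Toy input-cost estimate: token count times per-token rate (USD)."""
    return tokens * rate_per_token

# Rates from the diff under discussion
old_rate = 2e-07    # input_cost_per_token before this PR
new_rate = 1.5e-07  # input_cost_per_token after this PR

# For a 1M-token embedding workload, the billed estimate drops by 25%
old_cost = input_token_cost(1_000_000, old_rate)  # about $0.20
new_cost = input_token_cost(1_000_000, new_rate)  # about $0.15
print(old_cost, new_cost)
```

For callers that were billing audio/image/video inputs against the removed fields, those components would now contribute nothing, which is a larger behavioral change than the rate adjustment itself.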
Thanks for the follow-up review.
…t JSON lookup

Move the source of truth for which Bedrock models support native structured outputs (`outputConfig.textFormat`) from a hardcoded substring set (`BEDROCK_NATIVE_STRUCTURED_OUTPUT_MODELS`) to the cost JSON via a new `"supports_native_structured_output"` flag. This makes it possible to add support for new models (including Claude Sonnet 4.6, which was missing) by updating the JSON alone, with no code changes needed.
Integration testing confirmed gemma-3 (4b/12b/27b) ignores the JSON schema and returns free text, and nemotron-nano (9b/12b) errors with "Tool calling is not supported in streaming mode" even on sync calls. Remove the flag so these models fall back to the tool-call approach. Also fix test assertions to match (nemotron-nano-3-30b is supported, gemma-3 and nemotron-nano-12b are not).
Integration tested 28/28 (10 sync + 10 streaming + extras) on the native outputConfig.textFormat path in us-west-2. deepseek.v3.2 does not support native structured output (Bedrock returns 400).
…en3-coder-next

Integration testing confirmed:
- minimax.minimax-m2.1: Bedrock rejects outputConfig.textFormat (400)
- moonshotai.kimi-k2.5: Bedrock rejects outputConfig.textFormat (400)
- qwen.qwen3-coder-next: unavailable in us-east-1 and us-west-2
Wrap cost-map-dependent tests in try/finally to restore os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] and litellm.model_cost, preventing test-ordering sensitivity.
…I#23599

Main has two duplicate keys for vertex_ai/gemini-embedding-2-preview. Our JSON round-trip collapsed them to the second (text-only) entry, but PR BerriAI#23599 intentionally keeps the first (multimodal pricing) entry. Restore the multimodal entry to avoid conflicts.
Force-pushed e404349 to 3b1e124 (Compare)
Rebased on latest main (3b1e124).
@krrishdholakia ready for review when you get a chance. Greptile gave 4/5 and all feedback has been addressed. |
Relevant issues
Addresses Greptile feedback on #21222 and #23778 recommending the hardcoded model set be moved to the cost JSON.
Pre-Submission checklist
- [x] Added tests in the `tests/test_litellm/` directory
- [x] Ran `make test-unit`
- [x] Tagged `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type
Refactoring / New Feature
Changes
Claude Sonnet 4.6 was released after the native structured output feature landed (#21222) and was not included. DeepSeek v3 was listed in the hardcoded set, but the substring didn't match the actual model ID, so it was silently broken. This PR flags both for native structured output and moves the model capability check from a hardcoded set to the cost JSON, so future models are supported without code changes or releases (since `litellm.model_cost` is fetched from the remote JSON at import time).

- `litellm/llms/bedrock/chat/converse_transformation.py`: Removed the hardcoded `BEDROCK_NATIVE_STRUCTURED_OUTPUT_MODELS` set. `_supports_native_structured_outputs()` now looks up the `supports_native_structured_output` flag in `litellm.model_cost` via `get_bedrock_base_model()`, with a fallback that strips version suffixes (e.g. `:0`).
- `model_prices_and_context_window.json` / `litellm/model_prices_and_context_window_backup.json`: Added `"supports_native_structured_output": true` to 44 Bedrock models.
- `tests/test_litellm/llms/bedrock/chat/test_converse_transformation.py`: Updated tests to load the local cost map and assert against real model IDs.

All 44 flagged models were integration tested against real Bedrock endpoints (sync + streaming) on the native `outputConfig.textFormat` path.

Models deliberately excluded after integration testing:
- `google.gemma-3` (4b/12b/27b): ignores schema, returns free text
- `nvidia.nemotron-nano` (9b/12b): errors with "Tool calling is not supported in streaming mode" even on sync
- `deepseek.v3.2`: Bedrock returns 400 on `outputConfig.textFormat`
- `minimax.minimax-m2.1`: Bedrock returns 400 on `outputConfig.textFormat`
- `moonshotai.kimi-k2.5`: Bedrock returns 400 on `outputConfig.textFormat`
- `qwen.qwen3-coder-next`: unavailable in both us-east-1 and us-west-2