fix(azure): auto-route gpt-5.4+ tools+reasoning to Responses API #23926
Conversation
Azure GPT-5.4+ models now get the same auto-routing treatment as OpenAI when both `reasoning_effort` and `tools` are used in `litellm.completion()`. Previously, `reasoning_effort` was silently dropped for Azure; now the request is bridged to the Responses API which supports both parameters. Fixes BerriAI#23914
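For reference, this is the call shape affected by the fix; the deployment name and tool schema below are illustrative, not taken from the PR:

```python
# Illustrative kwargs for litellm.completion(). With this combination, Azure
# requests previously lost reasoning_effort; they are now bridged to the
# Responses API, which accepts both tools and reasoning_effort.
request = {
    "model": "azure/gpt-5.4",  # hypothetical Azure deployment named after the model
    "messages": [{"role": "user", "content": "Summarize the latest run logs."}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_run_logs",  # hypothetical tool
                "parameters": {"type": "object", "properties": {}},
            },
        }
    ],
    "reasoning_effort": "high",
}
# The bridge only fires when both parameters are present together:
assert request["tools"] and request["reasoning_effort"] is not None
```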
Greptile Summary

This PR fixes a silent data-loss bug where Azure GPT-5.4+ requests combining `tools` and `reasoning_effort` dropped `reasoning_effort`. Key changes:

Issues found: noted per-file in the table below.

Confidence Score: 2/5
| Filename | Overview |
|---|---|
| litellm/main.py | Moved gpt-5.4+ Responses API bridge check outside the try block and extended it to include Azure — introduces a silent backward-incompatible reroute for Azure users without Responses API access, and contains a no-op model.replace("responses/", "") in the auto-routing path. |
| litellm/llms/azure/chat/gpt_5_transformation.py | Removed the workaround that dropped reasoning_effort when tools were present for Azure gpt-5.4+ — correct cleanup now that routing is handled upstream in responses_api_bridge_check. |
| tests/test_litellm/llms/azure/chat/test_azure_gpt5_transformation.py | Test updated from asserting reasoning_effort is dropped to asserting it is preserved — a legitimate change reflecting the new routing behavior, not a mock integrity violation. |
| tests/test_litellm/test_main.py | Two new mock-only unit tests added for Azure routing in responses_api_bridge_check — tests are properly mocked with no real network calls. |
| docs/my-website/docs/reasoning_content.md | Documentation updated to reflect auto-routing for both OpenAI and Azure, with a workaround note for custom Azure deployment names — accurately reflects current behavior and limitations. |
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["litellm.completion(model='azure/gpt-5.4', tools=[...], reasoning_effort='high')"] --> B["responses_api_bridge_check()"]
    B --> C{model starts with 'responses/'?}
    C -- Yes --> D["Strip prefix, set mode='responses'"]
    C -- No --> E["_get_model_info_helper() try block"]
    E -- Exception --> F{model starts with 'responses/'?}
    F -- Yes --> D
    F -- No --> G["model_info = {}"]
    D --> H{mode already 'responses'?}
    G --> I{"NEW: custom_llm_provider in (openai, azure)\nAND is_model_gpt_5_4_plus_model(model)\nAND tools\nAND reasoning_effort is not None\nAND mode != 'responses'"}
    E -- Success --> I
    I -- Yes --> J["Set mode='responses'\nmodel.replace('responses/','') [no-op]"]
    I -- No --> K["Return model_info, model"]
    H -- Yes --> K
    J --> K
    K --> L{mode == 'responses'?}
    L -- Yes --> M["Route to Responses API Bridge"]
    L -- No --> N["Route to Azure Chat Completions"]
    M --> O["Azure Responses API\n⚠️ May not be available on all\nsubscriptions/API versions"]
    N --> P["Azure Chat Completions API"]
```
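The decision node marked NEW in the flowchart can be paraphrased as a single predicate. This is a hedged sketch mirroring the diagram, not litellm's actual function signatures:

```python
def _looks_gpt_5_4_plus(model: str) -> bool:
    """Simplified stand-in for is_model_gpt_5_4_plus_model (string match only)."""
    name = model.split("/")[-1]
    if not name.startswith("gpt-5."):
        return False
    try:
        return int(name.replace("gpt-5.", "").split("-")[0].split(".")[0]) >= 4
    except ValueError:
        return False

def should_bridge_to_responses(custom_llm_provider, model, tools, reasoning_effort, mode):
    """Paraphrase of the NEW condition in the flowchart above."""
    return (
        custom_llm_provider in ("openai", "azure")
        and _looks_gpt_5_4_plus(model)
        and bool(tools)
        and reasoning_effort is not None
        and mode != "responses"
    )
```

With this predicate, `azure/gpt-5.4` plus tools plus `reasoning_effort='high'` goes to the bridge, while the same call without tools, without `reasoning_effort`, or with `mode` already `'responses'` falls through unchanged.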
Last reviewed commit: "docs: add Azure cust..."
@greptile if users define their custom deployment name, what is the chance for us to detect it is gpt-5.4 and auto-route to responses?
Based on the code, the detection relies on string matching against the model name:

```python
@classmethod
def is_model_gpt_5_4_plus_model(cls, model: str) -> bool:
    """Check if the model is gpt-5.4 or newer (5.4, 5.5, 5.6, etc., including pro)."""
    model_name = model.split("/")[-1]
    if not model_name.startswith("gpt-5."):
        return False
    try:
        version_str = model_name.replace("gpt-5.", "").split("-")[0]
        major = version_str.split(".")[0]
        return int(major) >= 4
    except (ValueError, IndexError):
        return False
```

The chance is very low if users define custom deployment names. Here's why:

- ✅ Works (will be detected): deployment names that keep the `gpt-5.x` prefix, e.g. `gpt-5.4`, `gpt-5.5-mini`
- ❌ Doesn't work (will NOT be detected): arbitrary custom deployment names, e.g. `my-prod-deployment`

For Azure specifically: users pass the deployment name as the `model` parameter, not the underlying model. Unless they name their deployment following the `gpt-5.x` convention, auto-routing will not trigger. Potential solutions:
Would you like me to check if Azure exposes the underlying model information that could be used for detection?
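The matching rule quoted in the comment above can be exercised standalone. Below is a plain-function copy of the snippet for quick experimentation (not imported from litellm):

```python
def is_model_gpt_5_4_plus_model(model: str) -> bool:
    """Standalone copy of the quoted classmethod, for experimentation."""
    model_name = model.split("/")[-1]
    if not model_name.startswith("gpt-5."):
        return False
    try:
        version_str = model_name.replace("gpt-5.", "").split("-")[0]
        major = version_str.split(".")[0]
        return int(major) >= 4
    except (ValueError, IndexError):
        return False

# Deployment names that keep the upstream naming convention are detected:
assert is_model_gpt_5_4_plus_model("azure/gpt-5.4")
assert is_model_gpt_5_4_plus_model("gpt-5.5-mini")
# Custom deployment names (the common Azure case) and older versions are not:
assert not is_model_gpt_5_4_plus_model("my-prod-deployment")
assert not is_model_gpt_5_4_plus_model("gpt-5.3")
```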
@greptile Azure Responses API uses the same deployment, key, and base URL as Chat Completions — no extra enablement needed. Routing to Responses API same as OpenAI is backwards compatible here.
Good clarification. Looking at the changes, this appears backwards compatible: the Responses API reuses the same deployment, key, and base URL as Chat Completions, so the reroute adds no new configuration requirements.

One verification needed: can you confirm no special enablement is required, i.e. that your e2e test's deployment served `/responses` out of the box?
Yes — Microsoft docs list prerequisites: a deployed model and an API key. No special enablement required. Docs: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/responses#prerequisites
Merged commit 6a5b005 into BerriAI:litellm_oss_staging_03_17_2026
- Fix azure/gpt-5.4-mini context window: 1.05M → 272K
- Move xai/grok-beta to xAI subsection (was under OpenAI)
- Fix DeepSeek model names: DeepSeek-V3-2 → deepseek-v3.2 (lowercase)
- Remove duplicate PR BerriAI#23926 from Bugs (already in Features)
- Fix Diff Summary LLM API count: 6 → 4
- PR description: fix stale PR refs (BerriAI#24102, BerriAI#24128 → correct ones)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Relevant issues
Fixes #23914
Pre-Submission checklist

- [x] I have added testing in the `tests/test_litellm/` directory (adding at least 1 test is a hard requirement - see details)
- [x] My PR passes all unit tests on `make test-unit`
- [x] I have asked `@greptileai` for a review and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type
🐛 Bug Fix
Changes

Azure GPT-5.4+ models silently drop `reasoning_effort` when `tools` are also present in `litellm.completion()`. OpenAI already auto-routes these requests to the Responses API (which supports both params), but Azure was excluded from this routing.

Fix

- `litellm/main.py` — Moved the gpt-5.4+ auto-routing check out of the try block (Azure models aren't in the `model_cost` map) and extended it to include `custom_llm_provider == "azure"` alongside `"openai"`.
- `litellm/llms/azure/chat/gpt_5_transformation.py` — Removed the code that silently dropped `reasoning_effort` when tools were present for gpt-5.4+ models. This is no longer needed since requests are now routed to the Responses API bridge.
- `docs/my-website/docs/reasoning_content.md` — Updated the docs tip to reflect that auto-routing now works for both OpenAI and Azure.

Testing

- New mocked unit tests in `tests/test_litellm/test_main.py` covering the Azure routing in `responses_api_bridge_check`
- Updated the transformation test from asserting `reasoning_effort` is dropped to asserting it's preserved
- E2E verified against a real Azure `gpt-5-nano` deployment (the same deployment serves both `/chat/completions` and `/responses` endpoints)