
fix(azure): auto-route gpt-5.4+ tools+reasoning to Responses API#23926

Merged
Chesars merged 2 commits into BerriAI:litellm_oss_staging_03_17_2026 from Chesars:fix/azure-gpt5-4-responses-api-routing
Mar 18, 2026

Conversation

Contributor

@Chesars Chesars commented Mar 18, 2026

Relevant issues

Fixes #23914

Pre-Submission checklist

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type

🐛 Bug Fix

Changes

Azure GPT-5.4+ models silently drop reasoning_effort when tools are also present in litellm.completion(). OpenAI already auto-routes these requests to the Responses API (which supports both params), but Azure was excluded from this routing.

Fix

  1. litellm/main.py — Moved the gpt-5.4+ auto-routing check out of the try block (Azure models aren't in model_cost map) and extended it to include custom_llm_provider == "azure" alongside "openai".

  2. litellm/llms/azure/chat/gpt_5_transformation.py — Removed the code that silently dropped reasoning_effort when tools were present for gpt-5.4+ models. This is no longer needed since requests are now routed to the Responses API bridge.

  3. docs/my-website/docs/reasoning_content.md — Updated the docs tip to reflect that auto-routing now works for both OpenAI and Azure.
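The routing change described in items 1 and 2 can be condensed into a short sketch. This is a hypothetical, simplified version of the check, not the actual litellm internals; the function name and signature are illustrative, and the version check mirrors the `is_model_gpt_5_4_plus_model` helper quoted later in this thread.

```python
# Hypothetical sketch of the extended Responses API bridge check
# (illustrative names, not the real litellm/main.py code).
def should_route_to_responses_api(
    model: str,
    custom_llm_provider: str,
    tools,
    reasoning_effort,
    mode: str,
) -> bool:
    def is_gpt_5_4_plus(model: str) -> bool:
        name = model.split("/")[-1]
        if not name.startswith("gpt-5."):
            return False
        try:
            version = name.replace("gpt-5.", "").split("-")[0]
            return int(version.split(".")[0]) >= 4
        except (ValueError, IndexError):
            return False

    return (
        custom_llm_provider in ("openai", "azure")  # fix: Azure now included
        and is_gpt_5_4_plus(model)
        and bool(tools)
        and reasoning_effort is not None
        and mode != "responses"  # guard against double-routing
    )
```

The `mode != "responses"` clause reflects the guard the review summary mentions: a request already marked for the Responses API must not be re-routed.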

Testing

  • 2 new unit tests for Azure routing in test_main.py
  • Updated existing test from asserting reasoning_effort is dropped to asserting it's preserved
  • Verified e2e with Azure gpt-5-nano deployment (same deployment serves both /chat/completions and /responses endpoints)

Azure GPT-5.4+ models now get the same auto-routing treatment as OpenAI
when both `reasoning_effort` and `tools` are used in `litellm.completion()`.
Previously, `reasoning_effort` was silently dropped for Azure; now the
request is bridged to the Responses API which supports both parameters.

Fixes BerriAI#23914

vercel bot commented Mar 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm · Deployment: Ready · Actions: Preview, Comment · Updated (UTC): Mar 18, 2026 3:58am


Contributor

codspeed-hq bot commented Mar 18, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing Chesars:fix/azure-gpt5-4-responses-api-routing (cb15296) with main (cfeafbe)

Open in CodSpeed

Contributor

greptile-apps bot commented Mar 18, 2026

Greptile Summary

This PR fixes a silent data-loss bug where Azure gpt-5.4+ requests containing both tools and reasoning_effort had reasoning_effort silently dropped, whereas OpenAI was already auto-routing such requests through the Responses API bridge. The fix moves the routing check outside the try block in responses_api_bridge_check() and extends it to include azure alongside openai, and removes the corresponding drop-workaround in AzureOpenAIGPT5Config.map_openai_params.

Key changes:

  • litellm/main.py: Routing condition relocated outside the try/except and expanded to custom_llm_provider in ("openai", "azure"), with a guard against double-routing (mode != "responses").
  • litellm/llms/azure/chat/gpt_5_transformation.py: Removed the block that silently dropped reasoning_effort when tools were present for Azure gpt-5.4+.
  • tests/: Two new mock unit tests for the Azure routing path; existing Azure transformation test updated to assert reasoning_effort is preserved (not dropped).
  • docs/: Updated to reflect that auto-routing now covers both OpenAI and Azure, with a documented workaround for custom Azure deployment names.

Issues found:

  • The auto-routing silently changes the Azure endpoint from Chat Completions to the Responses API for users who previously got a working (if degraded) result. Azure Responses API is not universally available across all API versions and subscription tiers. Per the backward-compatibility rule, this behavior change should be guarded behind an opt-in flag so existing users are not broken on upgrade.
  • model.replace("responses/", "") on line 978 is a no-op in the auto-routing path (the model string never contains "responses/" at that point), making it dead code.
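The dead-code claim in the second bullet is easy to verify in isolation: by the time the auto-routing branch runs, the model string no longer carries a `responses/` prefix, so the replace returns its input unchanged.

```python
# Minimal demonstration of the no-op: a model string reaching the
# auto-routing branch (e.g. an Azure deployment) has no "responses/"
# prefix, so replace() finds nothing to strip.
model = "azure/gpt-5.4"
stripped = model.replace("responses/", "")
assert stripped == model
```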

Confidence Score: 2/5

  • Merging as-is risks silently breaking Azure users who have gpt-5.4+ deployments with tools+reasoning but no access to the Azure Responses API.
  • The core logic is sound and the tests are well-structured, but the change is backward-incompatible for a subset of Azure users — requests that previously succeeded (with degraded output) will now hard-fail if Azure Responses API is not available in their subscription or API version. There is no opt-out mechanism. The hardcoded model-name string check (flagged in prior threads) also limits coverage to standard deployment names, making the fix incomplete for custom Azure deployments. These two issues together warrant a low confidence score.
  • litellm/main.py — the new routing block at lines 970–979 introduces the backward-compatibility risk and contains the no-op replace call.

Important Files Changed

Filename Overview
litellm/main.py Moved gpt-5.4+ Responses API bridge check outside the try block and extended it to include Azure — introduces a silent backward-incompatible reroute for Azure users without Responses API access, and contains a no-op model.replace("responses/", "") in the auto-routing path.
litellm/llms/azure/chat/gpt_5_transformation.py Removed the workaround that dropped reasoning_effort when tools were present for Azure gpt-5.4+ — correct cleanup now that routing is handled upstream in responses_api_bridge_check.
tests/test_litellm/llms/azure/chat/test_azure_gpt5_transformation.py Test updated from asserting reasoning_effort is dropped to asserting it is preserved — a legitimate change reflecting the new routing behavior, not a mock integrity violation.
tests/test_litellm/test_main.py Two new mock-only unit tests added for Azure routing in responses_api_bridge_check — tests are properly mocked with no real network calls.
docs/my-website/docs/reasoning_content.md Documentation updated to reflect auto-routing for both OpenAI and Azure, with a workaround note for custom Azure deployment names — accurately reflects current behavior and limitations.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["litellm.completion(model='azure/gpt-5.4', tools=[...], reasoning_effort='high')"] --> B["responses_api_bridge_check()"]
    B --> C{model starts with 'responses/'?}
    C -- Yes --> D["Strip prefix, set mode='responses'"]
    C -- No --> E["_get_model_info_helper() try block"]
    E -- Exception --> F{model starts with 'responses/'?}
    F -- Yes --> D
    F -- No --> G["model_info = {}"]
    D --> H{mode already 'responses'?}
    G --> I{"NEW: custom_llm_provider in (openai, azure)\nAND is_model_gpt_5_4_plus_model(model)\nAND tools\nAND reasoning_effort is not None\nAND mode != 'responses'"}
    E -- Success --> I
    I -- Yes --> J["Set mode='responses'\nmodel.replace('responses/','') [no-op]"]
    I -- No --> K["Return model_info, model"]
    H -- Yes --> K
    J --> K
    K --> L{mode == 'responses'?}
    L -- Yes --> M["Route to Responses API Bridge"]
    L -- No --> N["Route to Azure Chat Completions"]
    M --> O["Azure Responses API\n⚠️ May not be available on all\nsubscriptions/API versions"]
    N --> P["Azure Chat Completions API"]

Last reviewed commit: "docs: add Azure cust..."

Contributor Author

Chesars commented Mar 18, 2026

@greptile if users define a custom deployment name, what is the chance for us to detect that it is gpt-5.4 and auto-route to the Responses API?

@Chesars Chesars changed the base branch from main to litellm_oss_staging_03_17_2026 March 18, 2026 03:48
Contributor

greptile-apps bot commented Mar 18, 2026

Based on the code, the detection relies on string matching against the model name:

@classmethod
def is_model_gpt_5_4_plus_model(cls, model: str) -> bool:
    """Check if the model is gpt-5.4 or newer (5.4, 5.5, 5.6, etc., including pro)."""
    model_name = model.split("/")[-1]
    if not model_name.startswith("gpt-5."):
        return False
    try:
        version_str = model_name.replace("gpt-5.", "").split("-")[0]
        major = version_str.split(".")[0]
        return int(major) >= 4
    except (ValueError, IndexError):
        return False

The chance is very low if users define custom deployment names. Here's why:

✅ Works (will be detected):

  • model="gpt-5.4-turbo"
  • model="gpt-5.4-pro"
  • model="azure/gpt-5.5"

❌ Doesn't work (will NOT be detected):

  • model="my-gpt-5.4-deployment" ← fails the startswith("gpt-5.") check; any custom prefix breaks detection
  • model="prod-gpt5"
  • model="my-custom-deployment"
  • model="ai-reasoning-model"
  • model="gpt54-deployment" ← missing the dot

For Azure specifically: Users pass the deployment name as the model parameter, not the underlying model. Unless they name their deployment following the gpt-5.4* pattern, auto-routing won't kick in.
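The detection behavior above can be reproduced with a standalone copy of the classifier. This duplicates the snippet quoted earlier in the thread purely so it can be run in isolation, assuming that snippet accurately reflects the repo.

```python
def is_model_gpt_5_4_plus_model(model: str) -> bool:
    """Standalone copy of the gpt-5.4+ version check quoted above."""
    model_name = model.split("/")[-1]
    if not model_name.startswith("gpt-5."):
        return False
    try:
        version_str = model_name.replace("gpt-5.", "").split("-")[0]
        major = version_str.split(".")[0]
        return int(major) >= 4
    except (ValueError, IndexError):
        return False

# Detected: standard names, with or without a provider prefix.
assert is_model_gpt_5_4_plus_model("gpt-5.4-turbo")
assert is_model_gpt_5_4_plus_model("azure/gpt-5.5")
# Not detected: older versions, custom names, and prefixed deployment names.
assert not is_model_gpt_5_4_plus_model("gpt-5.3")
assert not is_model_gpt_5_4_plus_model("my-gpt-5.4-deployment")
assert not is_model_gpt_5_4_plus_model("gpt54-deployment")
```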

Potential solutions:

  1. Check if there's a base_model or similar field in Azure's model info that contains the actual model ID
  2. Add a way for users to explicitly set model_info["supports_reasoning_effort"] = True in their config
  3. Document that Azure deployments should follow naming conventions like gpt-5.4-* for auto-routing to work
  4. Add an azure_model_mapping config where users can map custom names to model families

Would you like me to check if Azure exposes the underlying model information that could be used for detection?

Contributor Author

Chesars commented Mar 18, 2026

@greptile Azure Responses API uses the same deployment, key, and base URL as Chat Completions — no extra enablement needed. Routing to the Responses API, the same as for OpenAI, is backwards compatible here.

Contributor

greptile-apps bot commented Mar 18, 2026

Good clarification. Looking at the changes, this appears backwards compatible:

Why this works:

  1. Only affects a previously broken scenario: Users with both tools + reasoning_effort were getting reasoning_effort silently dropped. Now they get both params working via Responses API.

  2. Conditional routing: The auto-routing only triggers when BOTH params are present, so existing users with only tools OR only reasoning_effort see no change.

  3. Matches proven OpenAI pattern: OpenAI has been doing this auto-routing, so the behavior is validated.

One verification needed:
Is the /responses endpoint universally available for all Azure GPT-5.4+ deployments without extra enablement? If any deployments don't support it, this could break calls that previously "worked" (with reasoning_effort dropped).

Your e2e test with gpt-5-nano confirms it works for that deployment, but can you confirm the Responses API endpoint is part of the standard Azure GPT-5.4+ API surface across all deployments?

Contributor Author

Chesars commented Mar 18, 2026

Yes — Microsoft docs list prerequisites: a deployed model and an API key. No special enablement required. Docs: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/responses#prerequisites

@Chesars Chesars merged commit 6a5b005 into BerriAI:litellm_oss_staging_03_17_2026 Mar 18, 2026
5 checks passed
@Chesars Chesars deleted the fix/azure-gpt5-4-responses-api-routing branch March 18, 2026 04:14
joereyna added a commit to joereyna/litellm that referenced this pull request Mar 24, 2026
- Fix azure/gpt-5.4-mini context window: 1.05M → 272K
- Move xai/grok-beta to xAI subsection (was under OpenAI)
- Fix DeepSeek model names: DeepSeek-V3-2 → deepseek-v3.2 (lowercase)
- Remove duplicate PR BerriAI#23926 from Bugs (already in Features)
- Fix Diff Summary LLM API count: 6 → 4
- PR description: fix stale PR refs (BerriAI#24102, BerriAI#24128 → correct ones)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

[Feature]: Auto-route Azure GPT-5.4 models to Responses API when reasoning_effort + tools are used

1 participant