fix(azure): auto-route gpt-5.4+ tools+reasoning to Responses API #23926
Conversation
Azure GPT-5.4+ models now get the same auto-routing treatment as OpenAI when both `reasoning_effort` and `tools` are used in `litellm.completion()`. Previously, `reasoning_effort` was silently dropped for Azure; now the request is bridged to the Responses API which supports both parameters. Fixes BerriAI#23914
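For reference, this is the call shape affected by the fix; the deployment name and tool schema below are illustrative, not taken from the PR:

```python
# Illustrative kwargs for litellm.completion(). With this combination, Azure
# requests previously lost reasoning_effort; they are now bridged to the
# Responses API, which accepts both tools and reasoning_effort.
request = {
    "model": "azure/gpt-5.4",  # hypothetical Azure deployment named after the model
    "messages": [{"role": "user", "content": "Summarize the latest run logs."}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_run_logs",  # hypothetical tool
                "parameters": {"type": "object", "properties": {}},
            },
        }
    ],
    "reasoning_effort": "high",
}
# The bridge only fires when both parameters are present together:
assert request["tools"] and request["reasoning_effort"] is not None
```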
Greptile Summary

This PR fixes a silent data-loss bug where Azure GPT-5.4+ requests combining `tools` and `reasoning_effort` dropped `reasoning_effort`. Key changes:

Issues found: noted per-file in the table below.

Confidence Score: 2/5
| Filename | Overview |
|---|---|
| litellm/main.py | Moved gpt-5.4+ Responses API bridge check outside the try block and extended it to include Azure — introduces a silent backward-incompatible reroute for Azure users without Responses API access, and contains a no-op model.replace("responses/", "") in the auto-routing path. |
| litellm/llms/azure/chat/gpt_5_transformation.py | Removed the workaround that dropped reasoning_effort when tools were present for Azure gpt-5.4+ — correct cleanup now that routing is handled upstream in responses_api_bridge_check. |
| tests/test_litellm/llms/azure/chat/test_azure_gpt5_transformation.py | Test updated from asserting reasoning_effort is dropped to asserting it is preserved — a legitimate change reflecting the new routing behavior, not a mock integrity violation. |
| tests/test_litellm/test_main.py | Two new mock-only unit tests added for Azure routing in responses_api_bridge_check — tests are properly mocked with no real network calls. |
| docs/my-website/docs/reasoning_content.md | Documentation updated to reflect auto-routing for both OpenAI and Azure, with a workaround note for custom Azure deployment names — accurately reflects current behavior and limitations. |
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["litellm.completion(model='azure/gpt-5.4', tools=[...], reasoning_effort='high')"] --> B["responses_api_bridge_check()"]
    B --> C{model starts with 'responses/'?}
    C -- Yes --> D["Strip prefix, set mode='responses'"]
    C -- No --> E["_get_model_info_helper() try block"]
    E -- Exception --> F{model starts with 'responses/'?}
    F -- Yes --> D
    F -- No --> G["model_info = {}"]
    D --> H{mode already 'responses'?}
    G --> I{"NEW: custom_llm_provider in (openai, azure)\nAND is_model_gpt_5_4_plus_model(model)\nAND tools\nAND reasoning_effort is not None\nAND mode != 'responses'"}
    E -- Success --> I
    I -- Yes --> J["Set mode='responses'\nmodel.replace('responses/','') [no-op]"]
    I -- No --> K["Return model_info, model"]
    H -- Yes --> K
    J --> K
    K --> L{mode == 'responses'?}
    L -- Yes --> M["Route to Responses API Bridge"]
    L -- No --> N["Route to Azure Chat Completions"]
    M --> O["Azure Responses API\n⚠️ May not be available on all\nsubscriptions/API versions"]
    N --> P["Azure Chat Completions API"]
```
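The decision node marked NEW in the flowchart can be paraphrased as a single predicate. This is a hedged sketch mirroring the diagram, not litellm's actual function signatures:

```python
def _looks_gpt_5_4_plus(model: str) -> bool:
    """Simplified stand-in for is_model_gpt_5_4_plus_model (string match only)."""
    name = model.split("/")[-1]
    if not name.startswith("gpt-5."):
        return False
    try:
        return int(name.replace("gpt-5.", "").split("-")[0].split(".")[0]) >= 4
    except ValueError:
        return False

def should_bridge_to_responses(custom_llm_provider, model, tools, reasoning_effort, mode):
    """Paraphrase of the NEW condition in the flowchart above."""
    return (
        custom_llm_provider in ("openai", "azure")
        and _looks_gpt_5_4_plus(model)
        and bool(tools)
        and reasoning_effort is not None
        and mode != "responses"
    )
```

With this predicate, `azure/gpt-5.4` plus tools plus `reasoning_effort='high'` goes to the bridge, while the same call without tools, without `reasoning_effort`, or with `mode` already `'responses'` falls through unchanged.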
Last reviewed commit: "docs: add Azure cust..."
@greptile if users define their custom deployment name, what is the chance for us to detect it is gpt-5.4 and auto-route to responses?
Based on the code, the detection relies on string matching against the model name:

```python
@classmethod
def is_model_gpt_5_4_plus_model(cls, model: str) -> bool:
    """Check if the model is gpt-5.4 or newer (5.4, 5.5, 5.6, etc., including pro)."""
    model_name = model.split("/")[-1]
    if not model_name.startswith("gpt-5."):
        return False
    try:
        version_str = model_name.replace("gpt-5.", "").split("-")[0]
        major = version_str.split(".")[0]
        return int(major) >= 4
    except (ValueError, IndexError):
        return False
```

The chance is very low if users define custom deployment names. Here's why:

- ✅ Works (will be detected): deployment names that keep the `gpt-5.x` prefix, e.g. `gpt-5.4`, `gpt-5.5-mini`
- ❌ Doesn't work (will NOT be detected): arbitrary custom deployment names, e.g. `my-prod-deployment`

For Azure specifically: users pass the deployment name as the `model` parameter, not the underlying model. Unless they name their deployment following the `gpt-5.x` convention, auto-routing will not trigger. Potential solutions:
Would you like me to check if Azure exposes the underlying model information that could be used for detection?
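The matching rule quoted in the comment above can be exercised standalone. Below is a plain-function copy of the snippet for quick experimentation (not imported from litellm):

```python
def is_model_gpt_5_4_plus_model(model: str) -> bool:
    """Standalone copy of the quoted classmethod, for experimentation."""
    model_name = model.split("/")[-1]
    if not model_name.startswith("gpt-5."):
        return False
    try:
        version_str = model_name.replace("gpt-5.", "").split("-")[0]
        major = version_str.split(".")[0]
        return int(major) >= 4
    except (ValueError, IndexError):
        return False

# Deployment names that keep the upstream naming convention are detected:
assert is_model_gpt_5_4_plus_model("azure/gpt-5.4")
assert is_model_gpt_5_4_plus_model("gpt-5.5-mini")
# Custom deployment names (the common Azure case) and older versions are not:
assert not is_model_gpt_5_4_plus_model("my-prod-deployment")
assert not is_model_gpt_5_4_plus_model("gpt-5.3")
```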
@greptile Azure Responses API uses the same deployment, key, and base URL as Chat Completions — no extra enablement needed. Routing to Responses API same as OpenAI is backwards compatible here.
Good clarification. Looking at the changes, this appears backwards compatible: the Responses API reuses the same deployment, key, and base URL as Chat Completions, so the reroute adds no new configuration requirements.

One verification needed: can you confirm no special enablement is required, i.e. that your e2e test's deployment served `/responses` out of the box?
Yes — Microsoft docs list prerequisites: a deployed model and an API key. No special enablement required. Docs: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/responses#prerequisites
Merged commit 6a5b005 into BerriAI:litellm_oss_staging_03_17_2026
- Fix azure/gpt-5.4-mini context window: 1.05M → 272K
- Move xai/grok-beta to xAI subsection (was under OpenAI)
- Fix DeepSeek model names: DeepSeek-V3-2 → deepseek-v3.2 (lowercase)
- Remove duplicate PR BerriAI#23926 from Bugs (already in Features)
- Fix Diff Summary LLM API count: 6 → 4
- PR description: fix stale PR refs (BerriAI#24102, BerriAI#24128 → correct ones)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Relevant issues
Fixes #23914
Pre-Submission checklist

- [x] I have added testing in the `tests/test_litellm/` directory (adding at least 1 test is a hard requirement - see details)
- [x] My PR passes all unit tests on `make test-unit`
- [x] I have asked `@greptileai` for a review and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type
🐛 Bug Fix
Changes

Azure GPT-5.4+ models silently drop `reasoning_effort` when `tools` are also present in `litellm.completion()`. OpenAI already auto-routes these requests to the Responses API (which supports both params), but Azure was excluded from this routing.

Fix

- `litellm/main.py` — Moved the gpt-5.4+ auto-routing check out of the try block (Azure models aren't in the `model_cost` map) and extended it to include `custom_llm_provider == "azure"` alongside `"openai"`.
- `litellm/llms/azure/chat/gpt_5_transformation.py` — Removed the code that silently dropped `reasoning_effort` when tools were present for gpt-5.4+ models. This is no longer needed since requests are now routed to the Responses API bridge.
- `docs/my-website/docs/reasoning_content.md` — Updated the docs tip to reflect that auto-routing now works for both OpenAI and Azure.

Testing

- New mocked unit tests in `tests/test_litellm/test_main.py` covering the Azure routing in `responses_api_bridge_check`
- Updated the transformation test from asserting `reasoning_effort` is dropped to asserting it's preserved
- E2E verified against a real Azure `gpt-5-nano` deployment (the same deployment serves both `/chat/completions` and `/responses` endpoints)