feat(azure_ai): add router flat cost when response contains actual model #22957
Sameerlite merged 1 commit into main from
Conversation
Greptile Summary
This PR fixes Azure Model Router cost tracking so that the $0.14/M-token infrastructure flat cost is correctly applied even when Azure returns the actual model name (e.g., `gpt-5-nano-2025-08-07`) instead of the router deployment name.
Key changes:
Issues found:
Confidence Score: 2/5
| Filename | Overview |
|---|---|
| litellm/llms/azure_ai/cost_calculator.py | Core fix adds request_model to detect router requests when Azure returns the actual model; however the exception handler still only checks _is_azure_model_router(model) (the response model), not is_router_request, which can cause the full cost calculation to fail when an unknown response model is returned via the router. |
| litellm/cost_calculator.py | Correctly threads request_model from litellm_logging_obj.model into cost_per_token, but _get_additional_costs is not updated to receive request_model, leaving the cost breakdown logging incomplete when the response model is not a router identifier. |
| tests/test_litellm/llms/azure_ai/test_azure_ai_cost_calculator.py | New test test_router_flat_cost_when_response_has_actual_model correctly validates the happy-path fix; all tests are mock-only (no real network calls), which is compliant with project rules. |
| litellm/model_prices_and_context_window_backup.json | Adds supports_web_search: true to several OpenAI models; unrelated to the main Azure Model Router fix but is a clean additive change with no issues. |
| docs/my-website/docs/providers/azure_ai/azure_model_router.md | Documentation improvements: adds Quick Start section, cost calculation flow table, and configuration requirements. Clear and accurate for the described behaviour. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[completion_cost called] --> B[Extract request_model\nfrom litellm_logging_obj.model]
B --> C[Call cost_per_token\nwith model + request_model]
C --> D{is_router_request?\n_is_azure_model_router\nresponse model OR request model}
D -- Yes --> E[generic_cost_per_token\nresponse model\ne.g. gpt-5-nano-2025-08-07]
D -- No --> E2[generic_cost_per_token\nnormal model]
E --> F{Model in cost map?}
F -- Yes --> G[base prompt_cost\n+ completion_cost]
F -- No --> H{_is_azure_model_router\nresponse model only\n⚠️ BUG: ignores is_router_request}
H -- True --> I[Swallow exception\ncontinue with 0 cost]
H -- False --> J[❌ Re-raise exception\neven if is_router_request=True]
G --> K[Add router flat cost\nvia calculate_azure_model_router_flat_cost\nusing request_model]
I --> K
K --> L[Return prompt_cost, completion_cost]
L --> M[_get_additional_costs\nmodel only, no request_model\n⚠️ BUG: breakdown misses flat cost\nfor router responses]
M --> N[_store_cost_breakdown_in_logging_obj]
L --> O[_final_cost = prompt + completion]
```
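The core of the flow can be condensed into a small sketch. The constant and helper names here are assumptions, not LiteLLM's actual identifiers; the real code lives in `litellm/llms/azure_ai/cost_calculator.py`.

```python
from typing import Optional

# Assumed constant name; the PR states the flat cost is $0.14 per 1M input tokens.
AZURE_ROUTER_FLAT_COST_PER_INPUT_TOKEN = 0.14 / 1_000_000

def _is_azure_model_router(model: str) -> bool:
    # Simplified stand-in for the real detection helper.
    model_lower = model.lower()
    return "model-router" in model_lower or "model_router" in model_lower

def prompt_cost_with_router_flat_cost(
    base_prompt_cost: float,
    prompt_tokens: int,
    model: str,
    request_model: Optional[str] = None,
) -> float:
    # The fix: a request is a router request if EITHER the response model
    # OR the original request model looks like a router deployment.
    is_router_request = _is_azure_model_router(model) or (
        request_model is not None and _is_azure_model_router(request_model)
    )
    if is_router_request:
        return base_prompt_cost + prompt_tokens * AZURE_ROUTER_FLAT_COST_PER_INPUT_TOKEN
    return base_prompt_cost

# Response carries the actual model, but the request used the router deployment:
cost = prompt_cost_with_router_flat_cost(
    base_prompt_cost=0.001,
    prompt_tokens=10_000,
    model="gpt-5-nano-2025-08-07",
    request_model="azure_ai/model_router/model-router",
)
```

With these inputs the flat cost adds 10,000 × $0.14/1M = $0.0014 on top of the base token cost, which is exactly the component that was previously dropped.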
Comments Outside Diff (3)
-
litellm/llms/azure_ai/cost_calculator.py, lines 106-115 (link): Exception handler ignores `is_router_request` — may silently re-raise for router responses.

The exception handler still only checks `_is_azure_model_router(model)` — the response model — when deciding whether to swallow the error. But the whole purpose of this PR is to handle the case where Azure returns the actual model name (e.g., `gpt-5-nano-2025-08-07`) in the response, which is not a router model name.

If `gpt-5-nano-2025-08-07` (or any future actual model) is not present in LiteLLM's cost map, `generic_cost_per_token` will raise, `_is_azure_model_router("gpt-5-nano-2025-08-07")` returns `False`, and the exception propagates — losing the entire cost calculation even though `is_router_request` is `True`.

The fix needs to use `is_router_request` (which was correctly computed above) instead of re-calling `_is_azure_model_router(model)`.
-
litellm/llms/azure_ai/cost_calculator.py, lines 14-35 (link): Hardcoded model-router string patterns violate the model-flags-in-JSON rule.

`_is_azure_model_router` hardcodes the strings `"model-router"`, `"model_router"`, and `"azure-model-router"` to detect router deployments. Per the project's guidelines, model-specific capability flags should be placed in `model_prices_and_context_window.json` and read back via `get_model_info` — this prevents requiring a LiteLLM upgrade whenever a new naming convention is introduced.

The PR extends the reliance on this function (now also checking `request_model` against it), so the same risk applies to the new code path. Consider introducing a JSON field such as `"is_model_router": true` on the `azure_ai/model_router` entry and reading it via `get_model_info` at detection time, so future router deployment name patterns only need a JSON update rather than a code change.

Context Used: Rule from dashboard — What: Do not hardcode model-specific flags in the codebase. Instead, put them in model_prices_and_co... (source)
-
litellm/cost_calculator.py, lines 1494-1500 (link): `_get_additional_costs` missing `request_model` — cost breakdown incomplete for router responses.

`_get_additional_costs` is called with only the response `model` (e.g., a plain model name like `gpt-5-nano-2025-08-07`). Internally it calls `AzureFoundryModelInfo.get_azure_ai_config_for_model(model)`, which returns `AzureAIStudioConfig` for a plain model name — not `AzureModelRouterConfig`. This means the "Azure Model Router Flat Cost" entry is never written to the cost breakdown when Azure returns the actual model in the response.

The `prompt_cost` returned from `cost_per_token` is correct because the new `request_model` parameter is threaded in there, but the cost breakdown stored in the logging object (used for spend analytics / proxy dashboards) will be missing the flat-cost line item. This creates an inconsistency between the reported total cost and the breakdown stored in `litellm_logging_obj.cost_breakdown`.

`request_model_for_cost` should be passed into `_get_additional_costs` (and propagated down through `AzureFoundryModelInfo.get_azure_ai_config_for_model`) so it can make the same router-detection decision as `cost_per_token` does.
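The first reviewer's suggested fix can be sketched as follows. This is illustrative: `cost_fn` stands in for the real `generic_cost_per_token` call, and the wrapper name is hypothetical.

```python
from typing import Callable, Tuple

def token_costs_with_router_fallback(
    model: str,
    is_router_request: bool,
    cost_fn: Callable[[str], Tuple[float, float]],
) -> Tuple[float, float]:
    """Sketch of the suggested exception handler: decide whether to swallow
    a cost-map lookup failure based on is_router_request (which also
    considers the request model), not _is_azure_model_router(model) alone."""
    try:
        return cost_fn(model)
    except Exception:
        if is_router_request:
            # Unknown response model behind the router: continue with zero
            # base token cost; the flat cost is still added downstream.
            return 0.0, 0.0
        raise  # non-router request: surface the error

def unknown_model_lookup(model: str) -> Tuple[float, float]:
    # Simulates a model missing from LiteLLM's cost map.
    raise KeyError(f"{model} not found in cost map")

# A router request with an unknown response model no longer loses the
# whole cost calculation:
print(token_costs_with_router_fallback("gpt-5-future", True, unknown_model_lookup))
# -> (0.0, 0.0)
```

A non-router request with the same unknown model still re-raises, preserving the current error behaviour for genuinely misconfigured models.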
Last reviewed commit: c23eb5a
- Pass `request_model` to Azure AI cost calculator to detect router requests
- Add router flat cost ($0.14/M input tokens) even when Azure returns actual model in response
- Add test for router flat cost with response containing actual model
- Update docs with cost calculation flow and configuration requirements

Made-with: Cursor
Force-pushed from 88e1e6c to c23eb5a
Summary
Fixes Azure Model Router cost tracking so that the router flat cost ($0.14 per M input tokens) is correctly added even when Azure returns the actual model (e.g., `gpt-5-nano-2025-08-07`) in the response instead of the router deployment name.

Problem
When using Azure Model Router, the Azure API returns the actual model used in the response (e.g., `gpt-5-nano-2025-08-07`). LiteLLM was only checking whether the response model was a router model before adding the flat cost. Since the response contained the actual model name, the router flat cost was never added — only the model cost was tracked.

Solution
- Pass `request_model` from the logging object to the Azure AI cost calculator

Changes
- Add a `request_model` parameter; use it to detect router requests and add the flat cost
- Thread `request_model` from `litellm_logging_obj.model` through to Azure AI cost calculation
- Add `test_router_flat_cost_when_response_has_actual_model` to verify the fix

Configuration
For cost tracking to work correctly, use the full pattern `azure_ai/model_router/<deployment-name>` (e.g., `azure_ai/model_router/model-router`).

Fixes LIT-2013
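As a worked example of the flat cost being tracked here: the $0.14-per-1M-input-tokens figure comes from this PR, while the token count below is arbitrary.

```python
# Router infrastructure flat cost: $0.14 per 1M input tokens, added on
# top of the routed model's own token cost.
FLAT_COST_PER_MILLION_INPUT_TOKENS = 0.14

prompt_tokens = 50_000  # arbitrary example request
flat_cost = prompt_tokens * FLAT_COST_PER_MILLION_INPUT_TOKENS / 1_000_000
print(f"${flat_cost:.6f}")  # $0.007000
```

Before this fix, a router request whose response named the actual model would have reported only the model's token cost and silently dropped this component.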