feat(azure_ai): add router flat cost when response contains actual model #22957
Sameerlite merged 1 commit into main from
Conversation
Greptile Summary
This PR fixes Azure Model Router cost tracking so that the $0.14/M-token infrastructure flat cost is correctly applied even when Azure returns the actual model name (e.g., `gpt-5-nano-2025-08-07`) instead of the router deployment name.
Key changes:
Issues found:
Confidence Score: 2/5
| Filename | Overview |
|---|---|
| litellm/llms/azure_ai/cost_calculator.py | Core fix adds request_model to detect router requests when Azure returns the actual model; however the exception handler still only checks _is_azure_model_router(model) (the response model), not is_router_request, which can cause the full cost calculation to fail when an unknown response model is returned via the router. |
| litellm/cost_calculator.py | Correctly threads request_model from litellm_logging_obj.model into cost_per_token, but _get_additional_costs is not updated to receive request_model, leaving the cost breakdown logging incomplete when the response model is not a router identifier. |
| tests/test_litellm/llms/azure_ai/test_azure_ai_cost_calculator.py | New test test_router_flat_cost_when_response_has_actual_model correctly validates the happy-path fix; all tests are mock-only (no real network calls), which is compliant with project rules. |
| litellm/model_prices_and_context_window_backup.json | Adds supports_web_search: true to several OpenAI models; unrelated to the main Azure Model Router fix but is a clean additive change with no issues. |
| docs/my-website/docs/providers/azure_ai/azure_model_router.md | Documentation improvements: adds Quick Start section, cost calculation flow table, and configuration requirements. Clear and accurate for the described behaviour. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[completion_cost called] --> B[Extract request_model\nfrom litellm_logging_obj.model]
B --> C[Call cost_per_token\nwith model + request_model]
C --> D{is_router_request?\n_is_azure_model_router\nresponse model OR request model}
D -- Yes --> E[generic_cost_per_token\nresponse model\ne.g. gpt-5-nano-2025-08-07]
D -- No --> E2[generic_cost_per_token\nnormal model]
E --> F{Model in cost map?}
F -- Yes --> G[base prompt_cost\n+ completion_cost]
F -- No --> H{_is_azure_model_router\nresponse model only\n⚠️ BUG: ignores is_router_request}
H -- True --> I[Swallow exception\ncontinue with 0 cost]
H -- False --> J[❌ Re-raise exception\neven if is_router_request=True]
G --> K[Add router flat cost\nvia calculate_azure_model_router_flat_cost\nusing request_model]
I --> K
K --> L[Return prompt_cost, completion_cost]
L --> M[_get_additional_costs\nmodel only, no request_model\n⚠️ BUG: breakdown misses flat cost\nfor router responses]
M --> N[_store_cost_breakdown_in_logging_obj]
L --> O[_final_cost = prompt + completion]
```
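The core of the flow can be condensed into a small sketch. The constant and helper names here are assumptions, not LiteLLM's actual identifiers; the real code lives in `litellm/llms/azure_ai/cost_calculator.py`.

```python
from typing import Optional

# Assumed constant name; the PR states the flat cost is $0.14 per 1M input tokens.
AZURE_ROUTER_FLAT_COST_PER_INPUT_TOKEN = 0.14 / 1_000_000

def _is_azure_model_router(model: str) -> bool:
    # Simplified stand-in for the real detection helper.
    model_lower = model.lower()
    return "model-router" in model_lower or "model_router" in model_lower

def prompt_cost_with_router_flat_cost(
    base_prompt_cost: float,
    prompt_tokens: int,
    model: str,
    request_model: Optional[str] = None,
) -> float:
    # The fix: a request is a router request if EITHER the response model
    # OR the original request model looks like a router deployment.
    is_router_request = _is_azure_model_router(model) or (
        request_model is not None and _is_azure_model_router(request_model)
    )
    if is_router_request:
        return base_prompt_cost + prompt_tokens * AZURE_ROUTER_FLAT_COST_PER_INPUT_TOKEN
    return base_prompt_cost

# Response carries the actual model, but the request used the router deployment:
cost = prompt_cost_with_router_flat_cost(
    base_prompt_cost=0.001,
    prompt_tokens=10_000,
    model="gpt-5-nano-2025-08-07",
    request_model="azure_ai/model_router/model-router",
)
```

With these inputs the flat cost adds 10,000 × $0.14/1M = $0.0014 on top of the base token cost, which is exactly the component that was previously dropped.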
Comments Outside Diff (3)
-
litellm/llms/azure_ai/cost_calculator.py, lines 106-115 (link): Exception handler ignores `is_router_request` — may silently re-raise for router responses.

The exception handler still only checks `_is_azure_model_router(model)` — the response model — when deciding whether to swallow the error. But the whole purpose of this PR is to handle the case where Azure returns the actual model name (e.g., `gpt-5-nano-2025-08-07`) in the response, which is not a router model name.

If `gpt-5-nano-2025-08-07` (or any future actual model) is not present in LiteLLM's cost map, `generic_cost_per_token` will raise, `_is_azure_model_router("gpt-5-nano-2025-08-07")` returns `False`, and the exception propagates — losing the entire cost calculation even though `is_router_request` is `True`.

The fix needs to use `is_router_request` (which was correctly computed above) instead of re-calling `_is_azure_model_router(model)`.
-
litellm/llms/azure_ai/cost_calculator.py, lines 14-35 (link): Hardcoded model-router string patterns violate the model-flags-in-JSON rule.

`_is_azure_model_router` hardcodes the strings `"model-router"`, `"model_router"`, and `"azure-model-router"` to detect router deployments. Per the project's guidelines, model-specific capability flags should be placed in `model_prices_and_context_window.json` and read back via `get_model_info` — this prevents requiring a LiteLLM upgrade whenever a new naming convention is introduced.

The PR extends the reliance on this function (now also checking `request_model` against it), so the same risk applies to the new code path. Consider introducing a JSON field such as `"is_model_router": true` on the `azure_ai/model_router` entry and reading it via `get_model_info` at detection time, so future router deployment name patterns only need a JSON update rather than a code change.

Context Used: Rule from dashboard — What: Do not hardcode model-specific flags in the codebase. Instead, put them in model_prices_and_co... (source)
-
litellm/cost_calculator.py, lines 1494-1500 (link): `_get_additional_costs` missing `request_model` — cost breakdown incomplete for router responses.

`_get_additional_costs` is called with only the response `model` (e.g., a plain model name like `gpt-5-nano-2025-08-07`). Internally it calls `AzureFoundryModelInfo.get_azure_ai_config_for_model(model)`, which returns `AzureAIStudioConfig` for a plain model name — not `AzureModelRouterConfig`. This means the "Azure Model Router Flat Cost" entry is never written to the cost breakdown when Azure returns the actual model in the response.

The `prompt_cost` returned from `cost_per_token` is correct because the new `request_model` parameter is threaded in there, but the cost breakdown stored in the logging object (used for spend analytics / proxy dashboards) will be missing the flat-cost line item. This creates an inconsistency between the reported total cost and the breakdown stored in `litellm_logging_obj.cost_breakdown`.

`request_model_for_cost` should be passed into `_get_additional_costs` (and propagated down through `AzureFoundryModelInfo.get_azure_ai_config_for_model`) so it can make the same router-detection decision as `cost_per_token` does.
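The first reviewer's suggested fix can be sketched as follows. This is illustrative: `cost_fn` stands in for the real `generic_cost_per_token` call, and the wrapper name is hypothetical.

```python
from typing import Callable, Tuple

def token_costs_with_router_fallback(
    model: str,
    is_router_request: bool,
    cost_fn: Callable[[str], Tuple[float, float]],
) -> Tuple[float, float]:
    """Sketch of the suggested exception handler: decide whether to swallow
    a cost-map lookup failure based on is_router_request (which also
    considers the request model), not _is_azure_model_router(model) alone."""
    try:
        return cost_fn(model)
    except Exception:
        if is_router_request:
            # Unknown response model behind the router: continue with zero
            # base token cost; the flat cost is still added downstream.
            return 0.0, 0.0
        raise  # non-router request: surface the error

def unknown_model_lookup(model: str) -> Tuple[float, float]:
    # Simulates a model missing from LiteLLM's cost map.
    raise KeyError(f"{model} not found in cost map")

# A router request with an unknown response model no longer loses the
# whole cost calculation:
print(token_costs_with_router_fallback("gpt-5-future", True, unknown_model_lookup))
# -> (0.0, 0.0)
```

A non-router request with the same unknown model still re-raises, preserving the current error behaviour for genuinely misconfigured models.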
Last reviewed commit: c23eb5a
- Pass `request_model` to Azure AI cost calculator to detect router requests
- Add router flat cost ($0.14/M input tokens) even when Azure returns actual model in response
- Add test for router flat cost with response containing actual model
- Update docs with cost calculation flow and configuration requirements

Made-with: Cursor
Force-pushed from 88e1e6c to c23eb5a
Summary
Fixes Azure Model Router cost tracking so that the router flat cost ($0.14 per M input tokens) is correctly added even when Azure returns the actual model (e.g., `gpt-5-nano-2025-08-07`) in the response instead of the router deployment name.

Problem
When using Azure Model Router, the Azure API returns the actual model used in the response (e.g., `gpt-5-nano-2025-08-07`). LiteLLM was only checking whether the response model was a router model before adding the flat cost. Since the response contained the actual model name, the router flat cost was never added — only the model cost was tracked.

Solution
- Pass `request_model` from the logging object to the Azure AI cost calculator

Changes
- Add a `request_model` parameter; use it to detect router requests and add the flat cost
- Thread `request_model` from `litellm_logging_obj.model` through to Azure AI cost calculation
- Add `test_router_flat_cost_when_response_has_actual_model` to verify the fix

Configuration
For cost tracking to work correctly, use the full pattern `azure_ai/model_router/<deployment-name>` (e.g., `azure_ai/model_router/model-router`).

Fixes LIT-2013
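As a worked example of the flat cost being tracked here: the $0.14-per-1M-input-tokens figure comes from this PR, while the token count below is arbitrary.

```python
# Router infrastructure flat cost: $0.14 per 1M input tokens, added on
# top of the routed model's own token cost.
FLAT_COST_PER_MILLION_INPUT_TOKENS = 0.14

prompt_tokens = 50_000  # arbitrary example request
flat_cost = prompt_tokens * FLAT_COST_PER_MILLION_INPUT_TOKENS / 1_000_000
print(f"${flat_cost:.6f}")  # $0.007000
```

Before this fix, a router request whose response named the actual model would have reported only the model's token cost and silently dropped this component.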