
feat(azure_ai): add router flat cost when response contains actual model #22957

Merged

Sameerlite merged 1 commit into main from litellm_azure-model-router-cost-tracking on Mar 6, 2026
Conversation

@Sameerlite Sameerlite (Collaborator) commented Mar 6, 2026

Summary

Fixes Azure Model Router cost tracking so that the router flat cost ($0.14 per M input tokens) is correctly added even when Azure returns the actual model (e.g., gpt-5-nano-2025-08-07) in the response instead of the router deployment name.

Problem

When using Azure Model Router, the Azure API returns the actual model used in the response (e.g., gpt-5-nano-2025-08-07). LiteLLM was only checking if the response model was a router model to add the flat cost. Since the response contained the actual model name, the router flat cost was never added—only the model cost was tracked.

Solution

  • Pass request_model from the logging object to the Azure AI cost calculator
  • Detect router requests by checking both the response model and the original request model
  • Add the router flat cost when the request was made via a model router endpoint, regardless of what model Azure returns in the response

Changes

  • litellm/llms/azure_ai/cost_calculator.py: Add request_model parameter; use it to detect router requests and add flat cost
  • litellm/cost_calculator.py: Thread request_model from litellm_logging_obj.model through to Azure AI cost calculation
  • tests: Add test_router_flat_cost_when_response_has_actual_model to verify the fix
  • docs: Update Azure Model Router docs with cost calculation flow and configuration requirements

Configuration

For cost tracking to work correctly, use the full pattern: azure_ai/model_router/<deployment-name> (e.g., azure_ai/model_router/model-router).
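A minimal proxy config sketch using this pattern, assuming the standard litellm `model_list` format; the resource URL and environment variable name are placeholders:

```yaml
model_list:
  - model_name: model-router
    litellm_params:
      # Full pattern required for router cost tracking:
      # azure_ai/model_router/<deployment-name>
      model: azure_ai/model_router/model-router
      api_base: https://<your-resource>.services.ai.azure.com
      api_key: os.environ/AZURE_AI_API_KEY
```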


Fixes LIT-2013

@vercel vercel bot commented Mar 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project    Deployment    Actions             Updated (UTC)
litellm    Ready         Preview, Comment    Mar 6, 2026 0:50am


@greptile-apps greptile-apps bot (Contributor) commented Mar 6, 2026

Greptile Summary

This PR fixes Azure Model Router cost tracking so that the $0.14/M-token infrastructure flat cost is correctly applied even when Azure returns the actual model name (e.g., gpt-5-nano-2025-08-07) in the response rather than the router deployment name. It achieves this by threading request_model from the logging object into the Azure AI cost calculator and using it as a fallback router-detection signal.

Key changes:

  • litellm/llms/azure_ai/cost_calculator.py: Adds request_model parameter; computes is_router_request from either the response model or the request model; uses request_model when calling calculate_azure_model_router_flat_cost.
  • litellm/cost_calculator.py: Extracts litellm_logging_obj.model as request_model_for_cost and passes it to cost_per_token.
  • tests/: Adds a mock-only test that validates flat cost is applied when request was via router but response contains an actual model name.
  • model_prices_and_context_window_backup.json: Unrelated addition of supports_web_search: true to several OpenAI models.

Issues found:

  • Exception handler regression (logic): litellm/llms/azure_ai/cost_calculator.py lines 106–115 — the except block still calls _is_azure_model_router(model) on the response model rather than checking the already-computed is_router_request. If Azure returns a model not yet in LiteLLM's cost database, the exception will propagate and the entire cost calculation fails, even for valid router requests.
  • Incomplete cost breakdown (logic): litellm/cost_calculator.py lines 1494–1500 — _get_additional_costs is not passed request_model, so AzureFoundryModelInfo.get_azure_ai_config_for_model picks AzureAIStudioConfig for the raw response model and the "Azure Model Router Flat Cost" entry is never written to the logging cost breakdown, creating a mismatch between the reported total cost and the breakdown visible in proxy dashboards.
  • Hardcoded router-name patterns: _is_azure_model_router uses hard-coded strings ("model-router", "model_router", "azure-model-router"), which is inconsistent with the project guideline to place model-capability flags in model_prices_and_context_window.json.
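For concreteness, the flat cost at issue is simple arithmetic: a fixed $0.14 per 1M input tokens added on top of the routed model's own per-token cost. A minimal sketch, where the constant and helper name are illustrative stand-ins rather than the actual source:

```python
# Router infrastructure flat cost: $0.14 per 1M input (prompt) tokens,
# charged in addition to the underlying model's own token costs.
ROUTER_FLAT_COST_PER_MILLION_INPUT_TOKENS = 0.14


def router_flat_cost(prompt_tokens: int) -> float:
    return (prompt_tokens / 1_000_000) * ROUTER_FLAT_COST_PER_MILLION_INPUT_TOKENS


# 500k input tokens routed through the model router add about $0.07
print(router_flat_cost(500_000))
```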

Confidence Score: 2/5

  • The core fix is correct in intent but contains two logic bugs that can cause cost calculation failures and silent cost-breakdown inconsistencies in production.
  • The exception handler in azure_ai/cost_calculator.py still tests only the response model for router detection, meaning any router response carrying an unknown model name will throw an exception and record $0 cost. Additionally, _get_additional_costs is not updated with request_model, producing an inconsistent cost breakdown in logging even when the total cost is calculated correctly. Both issues affect the stated goal of the PR.
  • litellm/llms/azure_ai/cost_calculator.py (exception handler on lines 106–115) and litellm/cost_calculator.py (the _get_additional_costs call on lines 1494–1500) need attention before merging.

Important Files Changed

  • litellm/llms/azure_ai/cost_calculator.py: Core fix adds request_model to detect router requests when Azure returns the actual model; however the exception handler still only checks _is_azure_model_router(model) (the response model), not is_router_request, which can cause the full cost calculation to fail when an unknown response model is returned via the router.
  • litellm/cost_calculator.py: Correctly threads request_model from litellm_logging_obj.model into cost_per_token, but _get_additional_costs is not updated to receive request_model, leaving the cost breakdown logging incomplete when the response model is not a router identifier.
  • tests/test_litellm/llms/azure_ai/test_azure_ai_cost_calculator.py: New test test_router_flat_cost_when_response_has_actual_model correctly validates the happy-path fix; all tests are mock-only (no real network calls), which is compliant with project rules.
  • litellm/model_prices_and_context_window_backup.json: Adds supports_web_search: true to several OpenAI models; unrelated to the main Azure Model Router fix but a clean additive change with no issues.
  • docs/my-website/docs/providers/azure_ai/azure_model_router.md: Documentation improvements: adds Quick Start section, cost calculation flow table, and configuration requirements. Clear and accurate for the described behaviour.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[completion_cost called] --> B[Extract request_model\nfrom litellm_logging_obj.model]
    B --> C[Call cost_per_token\nwith model + request_model]
    C --> D{is_router_request?\n_is_azure_model_router\nresponse model OR request model}
    D -- Yes --> E[generic_cost_per_token\nresponse model\ne.g. gpt-5-nano-2025-08-07]
    D -- No --> E2[generic_cost_per_token\nnormal model]
    E --> F{Model in cost map?}
    F -- Yes --> G[base prompt_cost\n+ completion_cost]
    F -- No --> H{_is_azure_model_router\nresponse model only\n⚠️ BUG: ignores is_router_request}
    H -- True --> I[Swallow exception\ncontinue with 0 cost]
    H -- False --> J[❌ Re-raise exception\neven if is_router_request=True]
    G --> K[Add router flat cost\nvia calculate_azure_model_router_flat_cost\nusing request_model]
    I --> K
    K --> L[Return prompt_cost, completion_cost]
    L --> M[_get_additional_costs\nmodel only, no request_model\n⚠️ BUG: breakdown misses flat cost\nfor router responses]
    M --> N[_store_cost_breakdown_in_logging_obj]
    L --> O[_final_cost = prompt + completion]

Comments Outside Diff (3)

  1. litellm/llms/azure_ai/cost_calculator.py, line 106-115 (link)

    Exception handler ignores is_router_request — may silently re-raise for router responses

    The exception handler still only checks _is_azure_model_router(model) — the response model — when deciding whether to swallow the error. But the whole purpose of this PR is to handle the case where Azure returns the actual model name (e.g., gpt-5-nano-2025-08-07) in the response, which is not a router model name.

    If gpt-5-nano-2025-08-07 (or any future actual model) is not present in LiteLLM's cost map, generic_cost_per_token will raise, _is_azure_model_router("gpt-5-nano-2025-08-07") returns False, and the exception propagates — losing the entire cost calculation even though is_router_request is True.

    The fix needs to use is_router_request (which was correctly computed above) instead of re-calling _is_azure_model_router(model):
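A self-contained sketch of the suggested change, with `generic_cost_per_token` stubbed out for illustration (the real function lives in litellm and takes different arguments; the cost map and rates here are invented):

```python
# Illustrative per-token rates (prompt, completion) -- not real prices.
KNOWN_COSTS = {"gpt-4o": (0.000005, 0.000015)}


def generic_cost_per_token(model: str) -> tuple[float, float]:
    """Stub: raises when the model is missing from the cost map."""
    if model not in KNOWN_COSTS:
        raise KeyError(f"model {model} not in cost map")
    return KNOWN_COSTS[model]


def base_cost(model: str, is_router_request: bool) -> tuple[float, float]:
    try:
        return generic_cost_per_token(model)
    except Exception:
        # BEFORE the suggested fix: this branch re-checked
        # _is_azure_model_router(model) on the *response* model, so an
        # unknown actual model (e.g. "gpt-5-nano-2025-08-07") re-raised
        # even for router requests. Reusing the already-computed flag
        # falls back to a zero base cost so the flat cost can still apply.
        if is_router_request:
            return 0.0, 0.0
        raise


print(base_cost("gpt-5-nano-2025-08-07", True))  # (0.0, 0.0)
```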

  2. litellm/llms/azure_ai/cost_calculator.py, line 14-35 (link)

    Hardcoded model-router string patterns violate the model-flags-in-JSON rule

    _is_azure_model_router hardcodes the strings "model-router", "model_router", and "azure-model-router" to detect router deployments. Per the project's guidelines, model-specific capability flags should be placed in model_prices_and_context_window.json and read back via get_model_info — this prevents requiring a LiteLLM upgrade whenever a new naming convention is introduced.

    The PR extends the reliance on this function (now also checking request_model against it), so the same risk applies to the new code path. Consider introducing a JSON field such as "is_model_router": true on the azure_ai/model_router entry and reading it via get_model_info at detection time, so future router deployment name patterns only need a JSON update rather than a code change.

    Context Used: Rule from dashboard - What: Do not hardcode model-specific flags in the codebase. Instead, put them in model_prices_and_co... (source)

  3. litellm/cost_calculator.py, line 1494-1500 (link)

    _get_additional_costs missing request_model — cost breakdown incomplete for router responses

    _get_additional_costs is called with only the response model (e.g., a plain model name like gpt-5-nano-2025-08-07). Internally it calls AzureFoundryModelInfo.get_azure_ai_config_for_model(model), which returns AzureAIStudioConfig for a plain model name — not AzureModelRouterConfig. This means the "Azure Model Router Flat Cost" entry is never written to the cost breakdown when Azure returns the actual model in the response.

    The actual prompt_cost returned from cost_per_token is correct because the new request_model parameter is threaded in there, but the cost breakdown stored in the logging object (used for spend analytics / proxy dashboards) will be missing the flat cost line item. This creates an inconsistency between the reported total cost and the breakdown stored in litellm_logging_obj.cost_breakdown.

    request_model_for_cost should be passed into _get_additional_costs (and propagated down through AzureFoundryModelInfo.get_azure_ai_config_for_model) so it can make the same router detection decision as cost_per_token does.

Last reviewed commit: c23eb5a

- Pass request_model to Azure AI cost calculator to detect router requests
- Add router flat cost ($0.14/M input tokens) even when Azure returns actual model in response
- Add test for router flat cost with response containing actual model
- Update docs with cost calculation flow and configuration requirements

Made-with: Cursor
@Sameerlite Sameerlite force-pushed the litellm_azure-model-router-cost-tracking branch from 88e1e6c to c23eb5a on March 6, 2026 at 12:48
@Sameerlite Sameerlite merged commit 118cad8 into main Mar 6, 2026
29 of 41 checks passed