
[Fix] prevent shared backend model key from being polluted by per-deployment custom pricing #20679

Merged
ishaan-jaff merged 2 commits into main from litellm_custom_price_override_for_models
Feb 10, 2026

Conversation

shivamrawat1 (Collaborator) commented Feb 7, 2026

Relevant issues

Closes #20546

Problem
When the proxy model_list contains two deployments that use the same backend model (e.g., both pointing to vertex_ai/gemini-2.5-flash), and one deployment has explicit zero-cost pricing in model_info while the other relies on built-in pricing, both models incorrectly reported $0 cost.
Example config:

model_list:
  - model_name: gcp/google/gemini-2.5-flash.custom
    litellm_params:
      model: vertex_ai/gemini-2.5-flash
    model_info:
      id: gcp/google/gemini-2.5-flash.custom
      input_cost_per_token: 0.0
      output_cost_per_token: 0.0
  - model_name: gcp/google/gemini-2.5-flash.same
    litellm_params:
      model: vertex_ai/gemini-2.5-flash    # no custom pricing — should use built-in pricing
    model_info:
      id: gcp/google/gemini-2.5-flash.same

With both models present, the second model also reported $0 cost. With only the second model in the config, the cost was correct.

Before: (screenshot showing $0 cost)

After: (screenshot showing correct cost)

The deployment with custom pricing still uses its custom pricing. Note: both models must have distinct ids under model_info.

(screenshots)

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement (see details)
  • My PR passes all unit tests via make test-unit
  • My PR's scope is as isolated as possible; it only solves 1 specific problem

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🐛 Bug Fix

Changes

Root Cause

In Router._create_deployment(), each deployment's model_info was registered in litellm.model_cost under two keys:

  • The deployment's unique model_id (safe — unique per deployment)
  • The shared backend model name (e.g., vertex_ai/gemini-2.5-flash) — this key is global and shared across all deployments using the same underlying model

When the first deployment (with input_cost_per_token: 0) was processed, its zero-cost pricing was written to the shared vertex_ai/gemini-2.5-flash key, overwriting the built-in pricing. The second deployment then picked up that zero-cost entry as its base, so both deployments reported $0.
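The pollution can be reproduced in a few lines. This is a minimal sketch of the pre-fix behavior: the registry and register_model() here are simplified stand-ins modeled on the description above, not LiteLLM's actual implementation, and the built-in price is illustrative.

```python
model_cost = {
    # built-in pricing shipped with the library (illustrative value)
    "vertex_ai/gemini-2.5-flash": {"input_cost_per_token": 3e-07},
}

def register_model(key: str, info: dict) -> None:
    # merge `info` into any existing entry under `key`
    model_cost.setdefault(key, {}).update(info)

# Deployment 1 registers its zero-cost override under BOTH its unique
# model_id and the shared backend key (the bug):
custom_pricing = {"input_cost_per_token": 0.0, "output_cost_per_token": 0.0}
register_model("gcp/google/gemini-2.5-flash.custom", custom_pricing)
register_model("vertex_ai/gemini-2.5-flash", custom_pricing)

# Deployment 2 has no override and falls back to the shared key,
# which now reports $0 instead of the built-in price:
print(model_cost["vertex_ai/gemini-2.5-flash"]["input_cost_per_token"])  # 0.0
```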
Fix

Strip custom pricing fields from the model info before registering it under the shared backend model name. Each deployment's full pricing (including custom overrides) is still stored under its unique model_id. This prevents one deployment's pricing from polluting another deployment that shares the same backend.

The fix is backward compatible because:

  • The shared key is still always registered (preserving lookups by backend name)
  • Built-in pricing in the shared key is never overwritten by per-deployment overrides
  • The cost calculator already uses model_id for lookups when custom_pricing=True

vercel bot commented Feb 7, 2026

The latest updates on your projects:

Project    Deployment    Actions             Updated (UTC)
litellm    Ready         Preview, Comment    Feb 8, 2026 0:09am


greptile-apps bot commented Feb 8, 2026

Greptile Overview

Greptile Summary

Fixed pricing pollution bug where multiple deployments sharing the same backend model (e.g., vertex_ai/gemini-2.5-flash) would incorrectly inherit custom pricing from each other. The issue occurred because custom pricing fields (like input_cost_per_token: 0.0) from one deployment were being registered under the shared backend model key, affecting all other deployments using that model.

Key Changes:

  • Strips custom pricing fields from model_info before registering under the shared backend model name (vertex_ai/gemini-2.5-flash)
  • Each deployment's full pricing (including custom overrides) is still stored under its unique model_id
  • The shared backend key now only contains base model information and built-in pricing

Impact:

  • Prevents zero-cost deployments from making other deployments report $0 cost
  • Maintains backward compatibility since the shared key is still registered
  • Cost calculator already uses model_id for lookups when custom pricing is present

Confidence Score: 3/5

  • Safe to merge with test coverage recommendation
  • The fix correctly addresses the root cause by filtering custom pricing fields from the shared backend model registration while preserving them in the per-deployment registration. The logic is sound and backward compatible. However, the PR lacks test coverage for this specific scenario (multiple deployments with the same backend model but different custom pricing), which is a hard requirement according to the PR checklist. The fix is minimal and isolated, but without tests it's difficult to verify it fully resolves the issue and prevents regression.
  • No files require special attention beyond adding test coverage

Important Files Changed

Filename Overview
litellm/router.py Strips custom pricing fields from shared backend model registration to prevent pollution across deployments with same backend model

Sequence Diagram

sequenceDiagram
    participant Config as Router Config
    participant Router as Router._create_deployment
    participant Deployment as Deployment Object
    participant ModelCost as litellm.model_cost
    
    Note over Config,ModelCost: Deployment 1: custom zero-cost pricing
    Config->>Router: deployment_info, model_info (empty or base)
    Router->>Router: Extract custom pricing from litellm_params
    Router->>Router: Add custom pricing to _model_info<br/>(input_cost_per_token: 0.0)
    Router->>Deployment: Create deployment with model_info
    Router->>ModelCost: register_model(model_id: _model_info)<br/>[includes custom pricing]
    Router->>Router: Strip custom pricing fields<br/>_shared_model_info = filter(_model_info)
    Router->>ModelCost: register_model(vertex_ai/gemini: _shared_model_info)<br/>[no custom pricing, preserves built-in]
    
    Note over Config,ModelCost: Deployment 2: no custom pricing
    Config->>Router: deployment_info, model_info (empty or base)
    Router->>Router: No custom pricing in litellm_params
    Router->>Deployment: Create deployment with model_info
    Router->>ModelCost: register_model(model_id: _model_info)<br/>[uses built-in pricing]
    Router->>Router: Strip custom pricing fields<br/>_shared_model_info = filter(_model_info)
    Router->>ModelCost: register_model(vertex_ai/gemini: _shared_model_info)<br/>[preserves built-in, not polluted by Deployment 1]
    
    Note over ModelCost: Result: Each deployment has correct pricing<br/>Shared backend key has built-in pricing

greptile-apps bot left a comment

1 file reviewed, 1 comment


Comment on lines +5933 to +5938
_custom_pricing_fields = CustomPricingLiteLLMParams.model_fields.keys()
_shared_model_info = {
    k: v
    for k, v in _model_info.items()
    if k not in _custom_pricing_fields
}
greptile-apps bot commented:

Check that _model_info doesn't become empty after filtering. If a deployment only has custom pricing fields and no other model info, _shared_model_info could be empty or contain no useful data for the shared backend key.

shivamrawat1 (Collaborator, Author) replied:
@greptile If _shared_model_info ends up empty (because _model_info only contained custom pricing fields), calling register_model with an empty dict is a harmless no-op — _update_dictionary(existing_model, {}) iterates over zero keys and leaves the built-in pricing untouched.
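The no-op argument is easy to verify. Below, _update_dictionary is a stand-in modeled on the helper named in the reply, not the real LiteLLM code:

```python
def _update_dictionary(existing_model: dict, new_dict: dict) -> dict:
    # copies each key of new_dict into existing_model
    for key, value in new_dict.items():
        existing_model[key] = value
    return existing_model

built_in = {"input_cost_per_token": 3e-07}
_update_dictionary(built_in, {})  # iterates over zero keys
print(built_in)  # built-in pricing untouched
```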


shivamrawat1 commented Feb 8, 2026

@greptile review the pr again with the tests added


greptile-apps bot commented Feb 8, 2026

Greptile Overview

Greptile Summary

This PR fixes a critical pricing bug where per-deployment custom pricing was polluting the shared backend model key in litellm.model_cost, causing deployments without custom pricing to incorrectly inherit zero or custom costs from other deployments sharing the same backend model.

Changes:

  • Modified Router._create_deployment() in litellm/router.py to filter out custom pricing fields before registering under the shared backend model name
  • Each deployment's full pricing (including custom overrides) is still stored under its unique model_id key
  • The shared backend key now only contains non-pricing metadata, allowing register_model() to preserve built-in pricing
  • Added comprehensive test suite with 4 test cases covering zero-cost pricing, non-zero custom pricing, model_id storage, and deployment order independence

Technical correctness:

  • The fix correctly identifies all custom pricing fields from CustomPricingLiteLLMParams.model_fields.keys()
  • Empty _shared_model_info dict is safe because register_model() fetches fresh built-in pricing and merges with it
  • The lookup mechanism in get_deployment_model_info() correctly merges deployment-specific info with built-in pricing
  • Backward compatible because shared key registration is preserved and deployment-specific pricing is still fully accessible via model_id
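The lookup merge described in the bullets above can be sketched as follows; the variable names are illustrative, not LiteLLM's actual API:

```python
# Built-in pricing for the backend model (illustrative value):
built_in = {"input_cost_per_token": 3e-07, "mode": "chat"}

# After the fix, a deployment without overrides carries no pricing fields:
deployment_info = {"id": "gcp/google/gemini-2.5-flash.same"}

# Deployment-specific info is layered over built-ins, so any missing
# pricing key falls back to the built-in value:
merged = {**built_in, **deployment_info}
print(merged["input_cost_per_token"])  # 3e-07, from built-in pricing
```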

Confidence Score: 5/5

  • This PR is safe to merge with high confidence
  • The fix is surgically targeted, well-tested, and solves a clear bug without introducing new issues. The change correctly strips only custom pricing fields before shared key registration while preserving deployment-specific pricing under unique model_id keys. The comprehensive test suite validates the fix across multiple scenarios including zero-cost, non-zero custom pricing, and deployment order independence. The implementation is backward compatible and aligns with the existing cost lookup mechanism in get_deployment_model_info().
  • No files require special attention

Important Files Changed

Filename Overview
litellm/router.py Correctly strips custom pricing fields before registering shared backend model key, preventing pricing pollution across deployments
tests/test_litellm/test_router_model_cost_isolation.py Comprehensive test coverage with 4 test cases covering zero-cost pricing, non-zero custom pricing, model_id storage, and deployment order independence

Sequence Diagram

sequenceDiagram
    participant Router
    participant DeployA as Deployment A<br/>(custom pricing)
    participant DeployB as Deployment B<br/>(built-in pricing)
    participant ModelCost as litellm.model_cost
    participant BuiltIn as Built-in Pricing

    Note over Router: Process Deployment A
    Router->>DeployA: Create with custom pricing
    Router->>ModelCost: register(deployment-a, full pricing)
    Note over ModelCost: Deployment-specific key stored
    
    Router->>Router: Strip custom pricing fields
    Router->>BuiltIn: get_model_info(backend_model)
    BuiltIn-->>Router: Return built-in pricing
    Router->>ModelCost: register(backend_model, stripped info)
    Note over ModelCost: Shared key keeps built-in pricing

    Note over Router: Process Deployment B
    Router->>DeployB: Create without custom pricing
    Router->>ModelCost: register(deployment-b, model_info)
    Note over ModelCost: Deployment-specific key stored
    
    Router->>Router: Strip custom pricing (none exist)
    Router->>BuiltIn: get_model_info(backend_model)
    BuiltIn-->>Router: Return built-in pricing
    Router->>ModelCost: register(backend_model, stripped info)
    Note over ModelCost: Shared key maintains built-in pricing

    Note over Router,ModelCost: Lookup Phase
    Router->>ModelCost: get_deployment_model_info(deployment-a)
    ModelCost-->>Router: Custom pricing returned
    
    Router->>ModelCost: get_deployment_model_info(deployment-b)
    ModelCost-->>Router: Non-pricing fields only
    Router->>BuiltIn: get_model_info(backend_model)
    BuiltIn-->>Router: Built-in pricing
    Router->>Router: Merge deployment-b + built-in
    Note over Router: Deployment B correctly uses built-in pricing

greptile-apps bot left a comment

2 files reviewed, no comments

ishaan-jaff (Member) left a comment

lgtm

@ishaan-jaff ishaan-jaff merged commit 1ee43b1 into main Feb 10, 2026
63 of 67 checks passed
krrishdholakia pushed a commit that referenced this pull request Feb 10, 2026
…logging_payload is missing (#20851)

* fix: Preserved nullable object fields by carrying schema properties

* Fix: _convert_schema_types

* Fix all mypy issues

* Add alert about email notifications

* fixing tests

* extending timeout for long running tests

* Text changes

* [Feat] MCP Oauth2 Fixes - Add support for MCP M2M Oauth2 support (#20788)

* add has_client_credentials

* MCPOAuth2TokenCache

* init MCP Oauth2 constants

* MCPOAuth2TokenCache

* resolve_mcp_auth

* test fixes

* docs fix

* address greptile review: min TTL, env-configurable constants, tests, docs

- Fix zero-TTL edge case: floor at MCP_OAUTH2_TOKEN_CACHE_MIN_TTL (10s)
- Make all MCP OAuth2 constants env-configurable via os.getenv()
- Move test file to follow 1:1 mapping convention (test_oauth2_token_cache.py)
- Add MCP OAuth doc page (mcp_oauth.md) with M2M and PKCE sections
- Update FAQ in mcp.md to reflect M2M support
- Add E2E test script and config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix mypy lint

* fix oauth2

* remove old files

* docs fix

* address greptile comments

* fix: atomic lock creation + validate JSON response shape

- Use dict.setdefault() for atomic per-server lock creation
- Add isinstance(body, dict) check before accessing token response fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replace asserts with proper guards, wrap HTTP errors with context

- Replace `assert` statements with `if/raise ValueError` (asserts can be
  disabled with python -O in production)
- Wrap `httpx.HTTPStatusError` to provide a clear error message with
  server_id and status code
- Add tests for HTTP error and non-dict JSON response error paths
- Remove unused imports

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* [UI] M2M OAuth2 UI Flow  (#20794)

* add has_client_credentials

* MCPOAuth2TokenCache

* init MCP Oauth2 constants

* MCPOAuth2TokenCache

* resolve_mcp_auth

* test fixes

* docs fix

* address greptile review: min TTL, env-configurable constants, tests, docs

- Fix zero-TTL edge case: floor at MCP_OAUTH2_TOKEN_CACHE_MIN_TTL (10s)
- Make all MCP OAuth2 constants env-configurable via os.getenv()
- Move test file to follow 1:1 mapping convention (test_oauth2_token_cache.py)
- Add MCP OAuth doc page (mcp_oauth.md) with M2M and PKCE sections
- Update FAQ in mcp.md to reflect M2M support
- Add E2E test script and config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix mypy lint

* fix oauth2

* ui feat fixes

* test M2M

* test fix

* ui feats

* ui fixes

* ui fix client ID

* fix: backend endpoints

* docs fix

* fixes greptile

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* [Fix] prevent shared backend model key from being polluted by per-deployment custom pricing (#20679)

* bug: custom price override for models

* added associated test

* fix(mcp): resolve OAuth2 root endpoints returning "MCP server not found" (#20784)

When MCP SDK hits root-level /register, /authorize, /token without
server name prefix, auto-resolve to the single configured OAuth2
server. Also fix WWW-Authenticate header to use correct public URL
behind reverse proxy.

* Add support for langchain_aws via litellm passthrough

* fix(proxy): return early instead of raising ValueError when standard_logging_payload is missing

The `_PROXY_VirtualKeyModelMaxBudgetLimiter.async_log_success_event` hook
raises `ValueError` when `standard_logging_payload` is `None`.  This breaks
non-standard call types (e.g. vLLM `/classify`) that do not populate the
payload, and the resulting exception disrupts downstream success callbacks
like Langfuse.

Return early with a debug log instead, matching the existing pattern used
for missing `user_api_key_model_max_budget`.

Fixes #18986

---------

Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Shivam Rawat <161387515+shivamrawat1@users.noreply.github.com>
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Sameerlite added a commit that referenced this pull request Feb 11, 2026
…logging_payload is missing (#20851)
Chesars added a commit to Chesars/litellm that referenced this pull request Feb 27, 2026
…icing

image_edit was not forwarding model_info/metadata to the logging object,
so custom_pricing was never detected. After PR BerriAI#20679 stripped custom
pricing fields from the shared backend key, image_edit cost became 0.

Fixes BerriAI#22244