[Fix] prevent shared backend model key from being polluted by per-deployment custom pricing#20679
Conversation
Greptile Summary: Fixed a pricing pollution bug where multiple deployments sharing the same backend model (e.g., `vertex_ai/gemini-2.5-flash`) ended up sharing one pricing entry. Key Changes:
Impact:
Confidence Score: 3/5
| Filename | Overview |
|---|---|
| litellm/router.py | Strips custom pricing fields from shared backend model registration to prevent pollution across deployments with same backend model |
Sequence Diagram
sequenceDiagram
participant Config as Router Config
participant Router as Router._create_deployment
participant Deployment as Deployment Object
participant ModelCost as litellm.model_cost
Note over Config,ModelCost: Deployment 1: custom zero-cost pricing
Config->>Router: deployment_info, model_info (empty or base)
Router->>Router: Extract custom pricing from litellm_params
Router->>Router: Add custom pricing to _model_info<br/>(input_cost_per_token: 0.0)
Router->>Deployment: Create deployment with model_info
Router->>ModelCost: register_model(model_id: _model_info)<br/>[includes custom pricing]
Router->>Router: Strip custom pricing fields<br/>_shared_model_info = filter(_model_info)
Router->>ModelCost: register_model(vertex_ai/gemini: _shared_model_info)<br/>[no custom pricing, preserves built-in]
Note over Config,ModelCost: Deployment 2: no custom pricing
Config->>Router: deployment_info, model_info (empty or base)
Router->>Router: No custom pricing in litellm_params
Router->>Deployment: Create deployment with model_info
Router->>ModelCost: register_model(model_id: _model_info)<br/>[uses built-in pricing]
Router->>Router: Strip custom pricing fields<br/>_shared_model_info = filter(_model_info)
Router->>ModelCost: register_model(vertex_ai/gemini: _shared_model_info)<br/>[preserves built-in, not polluted by Deployment 1]
Note over ModelCost: Result: Each deployment has correct pricing<br/>Shared backend key has built-in pricing
```python
_custom_pricing_fields = CustomPricingLiteLLMParams.model_fields.keys()
_shared_model_info = {
    k: v
    for k, v in _model_info.items()
    if k not in _custom_pricing_fields
}
```
Check that _model_info doesn't become empty after filtering. If a deployment only has custom pricing fields and no other model info, _shared_model_info could be empty or contain no useful data for the shared backend key.
@greptile If _shared_model_info ends up empty (because _model_info only contained custom pricing fields), calling register_model with an empty dict is a harmless no-op — _update_dictionary(existing_model, {}) iterates over zero keys and leaves the built-in pricing untouched.
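The no-op claim above can be checked with a minimal sketch. `update_dictionary` here is a stand-in mirroring the merge pattern the comment describes for litellm's `_update_dictionary`, not the library's actual code:

```python
# Stand-in for the merge behavior described above: copy each key from
# `new` onto `existing`. With an empty `new`, nothing changes.
def update_dictionary(existing: dict, new: dict) -> dict:
    for key, value in new.items():
        existing[key] = value
    return existing

# Built-in pricing for the shared backend key (values illustrative).
builtin_pricing = {"input_cost_per_token": 3e-07, "output_cost_per_token": 2.5e-06}

# Registering an empty _shared_model_info iterates over zero keys,
# so the built-in pricing survives untouched.
result = update_dictionary(dict(builtin_pricing), {})
assert result == builtin_pricing
```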
@greptile review the pr again with the tests added
Greptile Summary: This PR fixes a critical pricing bug where per-deployment custom pricing was polluting the shared backend model key in `litellm.model_cost`. Changes:
Technical correctness:
Confidence Score: 5/5
| Filename | Overview |
|---|---|
| litellm/router.py | Correctly strips custom pricing fields before registering shared backend model key, preventing pricing pollution across deployments |
| tests/test_litellm/test_router_model_cost_isolation.py | Comprehensive test coverage with 4 test cases covering zero-cost pricing, non-zero custom pricing, model_id storage, and deployment order independence |
Sequence Diagram
sequenceDiagram
participant Router
participant DeployA as Deployment A<br/>(custom pricing)
participant DeployB as Deployment B<br/>(built-in pricing)
participant ModelCost as litellm.model_cost
participant BuiltIn as Built-in Pricing
Note over Router: Process Deployment A
Router->>DeployA: Create with custom pricing
Router->>ModelCost: register(deployment-a, full pricing)
Note over ModelCost: Deployment-specific key stored
Router->>Router: Strip custom pricing fields
Router->>BuiltIn: get_model_info(backend_model)
BuiltIn-->>Router: Return built-in pricing
Router->>ModelCost: register(backend_model, stripped info)
Note over ModelCost: Shared key keeps built-in pricing
Note over Router: Process Deployment B
Router->>DeployB: Create without custom pricing
Router->>ModelCost: register(deployment-b, model_info)
Note over ModelCost: Deployment-specific key stored
Router->>Router: Strip custom pricing (none exist)
Router->>BuiltIn: get_model_info(backend_model)
BuiltIn-->>Router: Return built-in pricing
Router->>ModelCost: register(backend_model, stripped info)
Note over ModelCost: Shared key maintains built-in pricing
Note over Router,ModelCost: Lookup Phase
Router->>ModelCost: get_deployment_model_info(deployment-a)
ModelCost-->>Router: Custom pricing returned
Router->>ModelCost: get_deployment_model_info(deployment-b)
ModelCost-->>Router: Non-pricing fields only
Router->>BuiltIn: get_model_info(backend_model)
BuiltIn-->>Router: Built-in pricing
Router->>Router: Merge deployment-b + built-in
Note over Router: Deployment B correctly uses built-in pricing
…logging_payload is missing (#20851) * fix: Preserved nullable object fields by carrying schema properties * Fix: _convert_schema_types * Fix all mypy issues * Add alert about email notifications * fixing tests * extending timeout for long running tests * Text changes * [Feat] MCP Oauth2 Fixes - Add support for MCP M2M Oauth2 support (#20788) * add has_client_credentials * MCPOAuth2TokenCache * init MCP Oauth2 constants * MCPOAuth2TokenCache * resolve_mcp_auth * test fixes * docs fix * address greptile review: min TTL, env-configurable constants, tests, docs - Fix zero-TTL edge case: floor at MCP_OAUTH2_TOKEN_CACHE_MIN_TTL (10s) - Make all MCP OAuth2 constants env-configurable via os.getenv() - Move test file to follow 1:1 mapping convention (test_oauth2_token_cache.py) - Add MCP OAuth doc page (mcp_oauth.md) with M2M and PKCE sections - Update FAQ in mcp.md to reflect M2M support - Add E2E test script and config Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix mypy lint * fix oauth2 * remove old files * docs fix * address greptile comments * fix: atomic lock creation + validate JSON response shape - Use dict.setdefault() for atomic per-server lock creation - Add isinstance(body, dict) check before accessing token response fields Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: replace asserts with proper guards, wrap HTTP errors with context - Replace `assert` statements with `if/raise ValueError` (asserts can be disabled with python -O in production) - Wrap `httpx.HTTPStatusError` to provide a clear error message with server_id and status code - Add tests for HTTP error and non-dict JSON response error paths - Remove unused imports Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * [UI] M2M OAuth2 UI Flow (#20794) * add has_client_credentials * MCPOAuth2TokenCache * init MCP Oauth2 constants * MCPOAuth2TokenCache * resolve_mcp_auth * test fixes * docs fix * address 
greptile review: min TTL, env-configurable constants, tests, docs - Fix zero-TTL edge case: floor at MCP_OAUTH2_TOKEN_CACHE_MIN_TTL (10s) - Make all MCP OAuth2 constants env-configurable via os.getenv() - Move test file to follow 1:1 mapping convention (test_oauth2_token_cache.py) - Add MCP OAuth doc page (mcp_oauth.md) with M2M and PKCE sections - Update FAQ in mcp.md to reflect M2M support - Add E2E test script and config Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix mypy lint * fix oauth2 * ui feat fixes * test M2M * test fix * ui feats * ui fixes * ui fix client ID * fix: backend endpoints * docs fix * fixes greptile --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * [Fix] prevent shared backend model key from being polluted by per-deployment custom pricing (#20679) * bug: custom price override for models * added associated test * fix(mcp): resolve OAuth2 root endpoints returning "MCP server not found" (#20784) When MCP SDK hits root-level /register, /authorize, /token without server name prefix, auto-resolve to the single configured OAuth2 server. Also fix WWW-Authenticate header to use correct public URL behind reverse proxy. * Add support for langchain_aws via litellm passthrough * fix(proxy): return early instead of raising ValueError when standard_logging_payload is missing The `_PROXY_VirtualKeyModelMaxBudgetLimiter.async_log_success_event` hook raises `ValueError` when `standard_logging_payload` is `None`. This breaks non-standard call types (e.g. vLLM `/classify`) that do not populate the payload, and the resulting exception disrupts downstream success callbacks like Langfuse. Return early with a debug log instead, matching the existing pattern used for missing `user_api_key_model_max_budget`. 
Fixes #18986 --------- Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Shivam Rawat <161387515+shivamrawat1@users.noreply.github.com> Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
…icing image_edit was not forwarding model_info/metadata to the logging object, so custom_pricing was never detected. After PR BerriAI#20679 stripped custom pricing fields from the shared backend key, image_edit cost became 0. Fixes BerriAI#22244
Relevant issues
Closes #20546
Problem
When the proxy model_list contains two deployments that use the same backend model (e.g., both pointing to vertex_ai/gemini-2.5-flash), and one deployment has explicit zero-cost pricing in model_info while the other relies on built-in pricing, both models incorrectly reported $0 cost.
Example config:
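The original config was not captured here; a minimal reconstruction matching the description (model names are illustrative, and custom pricing is shown under `model_info` per the problem statement) might look like:

```yaml
model_list:
  - model_name: gemini-zero-cost          # deployment 1: explicit zero-cost pricing
    litellm_params:
      model: vertex_ai/gemini-2.5-flash
    model_info:
      input_cost_per_token: 0.0
      output_cost_per_token: 0.0
  - model_name: gemini-default            # deployment 2: relies on built-in pricing
    litellm_params:
      model: vertex_ai/gemini-2.5-flash
```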
With both models present, the second model incorrectly reported zero cost. With only the second model in the config, its cost was correct.
Before:

After:

After the fix, each model uses its own pricing.
Note: both models should have distinct `id` values under their model info.
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added a relevant test in the `tests/litellm/` directory (adding at least 1 test is a hard requirement - see details)
- I have run `make test-unit`
- CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🐛 Bug Fix
Changes
Root Cause
In Router._create_deployment(), each deployment's model_info was registered in litellm.model_cost under two keys:
The deployment's unique model_id (safe — unique per deployment)
The shared backend model name (e.g., vertex_ai/gemini-2.5-flash) — this key is global and shared across all deployments using the same underlying model
When the first deployment (with input_cost_per_token: 0) was processed, its zero-cost pricing was written to the shared vertex_ai/gemini-2.5-flash key, overwriting the built-in pricing. The second deployment then picked up that zero-cost entry as its base, resulting in both reporting $0.
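The pollution can be sketched in a few lines. `register_model` here is a toy stand-in for litellm's registration into `litellm.model_cost`, showing why the second deployment inherited the first deployment's zero-cost override before the fix:

```python
# Toy registry standing in for litellm.model_cost, pre-seeded with the
# built-in pricing for the shared backend key (value illustrative).
model_cost = {
    "vertex_ai/gemini-2.5-flash": {"input_cost_per_token": 3e-07},
}

def register_model(key, info):
    # Merge `info` into the entry for `key`, creating it if absent.
    model_cost.setdefault(key, {}).update(info)

# Deployment 1 has a zero-cost override. Before the fix, its model_info
# was registered under BOTH its unique id and the shared backend key:
register_model("deployment-1-id", {"input_cost_per_token": 0.0})
register_model("vertex_ai/gemini-2.5-flash", {"input_cost_per_token": 0.0})

# Deployment 2 (no custom pricing) now resolves the shared key and
# picks up the polluted zero-cost entry:
print(model_cost["vertex_ai/gemini-2.5-flash"]["input_cost_per_token"])  # 0.0
```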
Fix
Strip custom pricing fields from the model info before registering under the shared backend model name. Each deployment's full pricing (including custom overrides) is still stored under its unique model_id. This prevents one deployment's pricing from polluting another deployment that shares the same backend.
The fix is backward compatible because:
The shared key is still always registered (preserving lookups by backend name)
Built-in pricing in the shared key is never overwritten by per-deployment overrides
The cost calculator already uses model_id for lookups when custom_pricing=True
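The lookup precedence in the last bullet can be illustrated with a toy resolver (names are illustrative, not litellm's actual API): when `custom_pricing=True`, cost is resolved by the deployment's unique `model_id`; otherwise the shared backend key is used:

```python
# Toy registry after the fix: the override lives only under the unique
# deployment id; the shared key keeps built-in pricing (values illustrative).
model_cost = {
    "deployment-1-id": {"input_cost_per_token": 0.0},
    "vertex_ai/gemini-2.5-flash": {"input_cost_per_token": 3e-07},
}

def resolve_pricing(model_id: str, backend_model: str, custom_pricing: bool) -> float:
    # Custom pricing -> look up by unique id; otherwise use the shared key.
    key = model_id if custom_pricing else backend_model
    return model_cost[key]["input_cost_per_token"]

# Deployment 1 keeps its zero-cost override; deployment 2 gets built-in pricing.
assert resolve_pricing("deployment-1-id", "vertex_ai/gemini-2.5-flash", True) == 0.0
assert resolve_pricing("deployment-2-id", "vertex_ai/gemini-2.5-flash", False) == 3e-07
```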