fix: preserve deployment custom pricing for /responses and /messages #23239
DmitriyAlergant wants to merge 17 commits into BerriAI:main
Conversation
@greptileai please review
Greptile Summary

This PR fixes a regression introduced by PR #20679 where custom pricing from deployment `model_info` was ignored for the `/v1/responses` and `/v1/messages` endpoints. Key changes are summarized per file in the table below.
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/litellm_core_utils/core_helpers.py | Adds two well-implemented helper functions: get_model_info_from_litellm_params (with sentinel pattern for explicit empty dict) and merge_metadata_preserving_deployment_model_info (preserves deployment model_info while merging user metadata). Clean, centralized solution. Minor: or {} for metadata uses falsy semantics while model_info uses sentinel, creating a subtle inconsistency (e.g., metadata=None falls through to lower-priority locations). |
| litellm/litellm_core_utils/litellm_logging.py | Three cost calculation call sites updated to extract router_model_id from model_call_details["litellm_params"] via the new helper — covering non-streaming, sync-streaming, and async-streaming paths. The change in _process_hidden_params_and_response_cost now applies to all non-streaming call types (not just responses/messages), which is guarded by the existing elif response_cost is not None: pass branch for most non-responses paths. |
| litellm/llms/custom_httpx/llm_http_handler.py | Fixes the /v1/messages path by using merge_metadata_preserving_deployment_model_info to build the litellm_params["metadata"] passed to update_environment_variables, ensuring model_info from litellm_metadata reaches the logging object's metadata for cost calculation. |
| litellm/proxy/pass_through_endpoints/llm_provider_handlers/anthropic_passthrough_logging_handler.py | Explicitly passes router_model_id and custom_pricing to completion_cost for the passthrough path. Also merges logging-obj metadata into kwargs["litellm_params"]["metadata"] using the helper, preserving deployment model_info while allowing existing pipeline keys to win for non-model_info fields. Well-tested by the new unit test. |
| litellm/responses/main.py | Core bug fix: replaces the broken metadata or kwargs.get("litellm_metadata") or {} or-chain with merge_metadata_preserving_deployment_model_info(...), ensuring litellm_metadata["model_info"] (router-injected) is never silently dropped when user metadata is also non-empty. |
| tests/test_litellm/test_custom_pricing_metadata_propagation.py | New comprehensive test file covering unit tests for both helpers and integration tests for all responses API paths (async non-streaming, async streaming, sync streaming) and the messages API path. All HTTP calls are properly mocked. Minor: list[str] type annotation (line 143) is Python 3.9+ syntax — consistent with the rest of the codebase. |
| tests/test_litellm/litellm_core_utils/test_litellm_logging.py | Adds unit tests verifying the sentinel-based empty-dict behavior of get_model_info_from_litellm_params and the model_info precedence of merge_metadata_preserving_deployment_model_info. Also adds integration tests for _process_hidden_params_and_response_cost and success_handler confirming router_model_id is passed to _response_cost_calculator for aresponses call types. |
| tests/test_litellm/proxy/pass_through_endpoints/llm_provider_handlers/test_anthropic_passthrough_logging_handler.py | Adds TestAnthropicPassthroughLoggingPayload class with streaming integration tests using a mocked HTTP response and a CostCapturingCallback to verify custom pricing is applied for the passthrough streaming path. Also adds a unit test verifying the metadata merge doesn't clobber existing litellm_params keys like stream_response. |
Sequence Diagram
```mermaid
sequenceDiagram
participant User
participant Router
participant responses_main as responses/main.py
participant llm_http as llm_http_handler.py
participant passthrough as anthropic_passthrough_logging_handler.py
participant logging as litellm_logging.py
participant core_helpers as core_helpers.py
User->>Router: aresponses(model, input, metadata={...})
Router->>Router: _update_kwargs_with_deployment()<br/>sets kwargs["litellm_metadata"]["model_info"]
Router->>responses_main: responses(kwargs incl. litellm_metadata)
responses_main->>core_helpers: merge_metadata_preserving_deployment_model_info(<br/>litellm_metadata=kwargs["litellm_metadata"],<br/>user_metadata=metadata, model_info=kwargs["model_info"])
core_helpers-->>responses_main: metadata_for_callbacks<br/>(model_info preserved from litellm_metadata)
responses_main->>logging: update_environment_variables(<br/>litellm_params={"metadata": metadata_for_callbacks})
Note over logging: self.litellm_params["metadata"]["model_info"] = deployment model_info
responses_main->>llm_http: HTTP call
alt /v1/messages path
llm_http->>core_helpers: merge_metadata_preserving_deployment_model_info(...)
core_helpers-->>llm_http: merged_metadata (model_info preserved)
llm_http->>logging: update_environment_variables(litellm_params={"metadata": merged_metadata})
end
alt Anthropic passthrough path
passthrough->>core_helpers: get_model_info_from_litellm_params(logging_obj.litellm_params)
core_helpers-->>passthrough: model_info
passthrough->>passthrough: completion_cost(router_model_id=model_info["id"],<br/>custom_pricing=use_custom_pricing_for_model(...))
end
llm_http-->>responses_main: response
responses_main->>logging: success_handler(result)
logging->>core_helpers: get_model_info_from_litellm_params(model_call_details["litellm_params"])
core_helpers-->>logging: model_info with deployment id
logging->>logging: _response_cost_calculator(result,<br/>router_model_id=model_info["id"])
Note over logging: Custom pricing applied via deployment ID lookup
```
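The merge step in the diagram can be approximated by a standalone sketch. The signature and merge order are inferred from this review's description (pipeline/user keys win for ordinary fields, deployment `model_info` is never dropped); this is not the actual implementation:

```python
from typing import Optional

def merge_metadata_preserving_deployment_model_info(
    litellm_metadata: Optional[dict] = None,
    user_metadata: Optional[dict] = None,
    model_info: Optional[dict] = None,
) -> dict:
    # Sketch: user metadata overlays the router-injected litellm_metadata,
    # then the deployment's model_info is restored so the merge can never
    # silently drop it (the core bug this PR fixes).
    merged = dict(litellm_metadata or {})
    merged.update(user_metadata or {})
    deployment_mi = (litellm_metadata or {}).get("model_info") or model_info
    if deployment_mi is not None:
        merged["model_info"] = deployment_mi
    return merged

# Router injected model_info; the user also passed their own metadata:
out = merge_metadata_preserving_deployment_model_info(
    litellm_metadata={"model_info": {"id": "deploy-1"}},
    user_metadata={"trace_id": "t-1"},
)
assert out == {"model_info": {"id": "deploy-1"}, "trace_id": "t-1"}
```

Contrast with the broken `metadata or kwargs.get("litellm_metadata") or {}` or-chain: when user metadata is non-empty, the or-chain returns it alone and the router-injected `model_info` is lost entirely.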
Last reviewed commit: b19fcaa
```python
kwargs["model"] = model
kwargs.setdefault("litellm_params", {})
if litellm_params:
    kwargs["litellm_params"].update(litellm_params)
```
Unconditional .update(litellm_params) may overwrite existing kwargs["litellm_params"] keys
litellm_params here is extracted from logging_obj.model_call_details["litellm_params"], which was populated by update_environment_variables in llm_http_handler.py. It contains not only metadata (the key you need to propagate) but also preset_cache_key: None, stream_response: {}, and all of anthropic_messages_optional_request_params (e.g. max_tokens, stream, etc.).
If kwargs["litellm_params"] already holds data that was set earlier in the proxy request pipeline, calling .update(litellm_params) overwrites any conflicting keys. The intent appears to be to ensure metadata with model_info reaches the downstream logging callbacks — a targeted merge on just metadata would be safer:
```python
kwargs.setdefault("litellm_params", {})
if litellm_params:
    # Only propagate metadata (which carries model_info for custom pricing).
    # Avoid clobbering unrelated keys already set in kwargs["litellm_params"].
    kwargs["litellm_params"].setdefault("metadata", {})
    kwargs["litellm_params"]["metadata"].update(
        litellm_params.get("metadata") or {}
    )
```

```python
_lp_meta = (
    self.model_call_details
    .get("litellm_params", {})
    .get("metadata", {}) or {}
)
_lp = self.model_call_details.get("litellm_params", {}) or {}
_mi = (
    _lp_meta.get("model_info")
    or _lp.get("model_info")
    or (_lp.get("litellm_metadata", {}) or {}).get("model_info")
    or {}
)
```
Duplicated model_info resolution logic
The identical 3-location lookup pattern:
```python
_lp_meta = self.model_call_details.get("litellm_params", {}).get("metadata", {}) or {}
_lp = self.model_call_details.get("litellm_params", {}) or {}
_mi = (
    _lp_meta.get("model_info")
    or _lp.get("model_info")
    or (_lp.get("litellm_metadata", {}) or {}).get("model_info")
    or {}
)
```

appears verbatim three times: here (non-streaming path), at the streaming path around line 2491, and in anthropic_passthrough_logging_handler.py lines 122-129. Any future change to where model_info can live (e.g. a fourth location) requires updating all three copies.
Consider extracting it to a small helper on the Logging class (or as a module-level function), e.g.:
```python
def _extract_model_info_from_litellm_params(litellm_params: dict) -> dict:
    metadata = litellm_params.get("metadata", {}) or {}
    return (
        metadata.get("model_info")
        or litellm_params.get("model_info")
        or (litellm_params.get("litellm_metadata", {}) or {}).get("model_info")
        or {}
    )
```

```python
def _get_model_info_from_litellm_params(
    litellm_params: Optional[dict],
) -> Dict[str, Any]:
    if litellm_params is None:
        return {}

    metadata = litellm_params.get("metadata", {}) or {}
    return (
        metadata.get("model_info")
        or litellm_params.get("model_info")
        or (litellm_params.get("litellm_metadata", {}) or {}).get("model_info")
        or {}
    )
```
_get_model_info_from_litellm_params is annotated to return Dict[str, Any], but the expression chain can return a non-dict truthy value. If someone stores a non-dict value under model_info (e.g., a string), call sites like line 1656 (model_info.get("id")) will raise AttributeError at runtime.
Add a defensive cast to guarantee the return type:
```python
def _get_model_info_from_litellm_params(
    litellm_params: Optional[dict],
) -> Dict[str, Any]:
    if litellm_params is None:
        return {}
    metadata = litellm_params.get("metadata", {}) or {}
    result = (
        metadata.get("model_info")
        or litellm_params.get("model_info")
        or (litellm_params.get("litellm_metadata", {}) or {}).get("model_info")
        or {}
    )
    return result if isinstance(result, dict) else {}
```
```python
if "model_info" not in request_payload and self.litellm_metadata:
    _mi = self.litellm_metadata.get("model_info")
    if _mi:
        request_payload["model_info"] = _mi
```
Top-level model_info injection bypasses the established lookup path
_get_model_info_from_litellm_params (the helper added by this PR) looks inside litellm_params["metadata"], litellm_params["model_info"], or litellm_params["litellm_metadata"] — it does not look at the top-level key being set here (request_payload["model_info"]).
After the fix in responses/main.py, litellm_params["metadata"]["model_info"] is already populated correctly when _handle_completed_response runs. The streaming cost path at litellm_logging.py:2473 extracts model_info from model_call_details["litellm_params"] (via the same helper), so this top-level injection has no effect on cost calculation.
The change appears to target update_response_metadata / set_hidden_params so that hidden_params["model_id"] gets set — but the explicit router_model_id argument now passed by litellm_logging.py renders that fallback unnecessary.
Consider documenting the exact code path this unblocks (i.e., which consumer of request_payload["model_info"] requires it), or remove the block if it is truly redundant, to avoid future confusion.
```python
model_info = _get_model_info_from_litellm_params(
    self.model_call_details.get("litellm_params", {}) or {}
)
self.model_call_details["response_cost"] = self._response_cost_calculator(
    result=logging_result,
    router_model_id=model_info.get("id"),
)
```
Broader scope than intended for this fix
The new router_model_id injection at lines 1651–1657 affects ALL non-streaming call types that reach the else branch (chat completions, embeddings, etc.), not just /v1/responses. While regressions are prevented by the earlier elif self.model_call_details.get("response_cost") is not None: pass guard, it is worth verifying that for these other call types _get_model_info_from_litellm_params consistently returns {} (or a dict without "id") so that router_model_id=None is passed and cost calculation is unchanged.
If any existing call type stores a model_info with an "id" in litellm_params (which many router-dispatched calls do), _response_cost_calculator will now be called with a non-None router_model_id where it previously wasn't. If the deployment ID happens to exist in litellm.model_cost with unexpectedly-priced entries, this could silently change cost calculations for those call types.
Consider scoping the router_model_id injection to the specific call types this PR targets (aresponses, responses) by guarding on self.call_type:
```python
# Only apply router_model_id for response API call types
if self.call_type in ("aresponses", "responses"):
    model_info = _get_model_info_from_litellm_params(
        self.model_call_details.get("litellm_params", {}) or {}
    )
    router_model_id = model_info.get("id")
else:
    router_model_id = None
self.model_call_details["response_cost"] = self._response_cost_calculator(
    result=logging_result,
    router_model_id=router_model_id,
)
```

```python
if litellm_params.get("metadata") is not None:
    existing_metadata = kwargs["litellm_params"].get("metadata", {}) or {}
    merged_metadata = dict(existing_metadata)
    merged_metadata.update(litellm_params.get("metadata") or {})
    kwargs["litellm_params"]["metadata"] = merged_metadata
```
Merge direction may silently overwrite proxy-pipeline metadata
The merge applies logging_obj's metadata on top of the existing kwargs["litellm_params"]["metadata"]:

```python
merged_metadata = dict(existing_metadata)
merged_metadata.update(litellm_params.get("metadata") or {})  # logging_obj wins
```

`litellm_params.get("metadata")` here is the merged_metadata built by llm_http_handler.py, which already contains all user-provided metadata keys plus the deployment's model_info. Because logging_obj's metadata wins, any key that the proxy pipeline added to kwargs["litellm_params"]["metadata"] after the initial handler ran (e.g., a middleware that injects user or trace_id) will be silently overwritten if that key also appears in the logging object's metadata.
For correctness in the custom-pricing case you only need to ensure the model_info key from litellm_params reaches kwargs["litellm_params"]["metadata"]. A targeted write is safer:
```python
kwargs.setdefault("litellm_params", {})
kwargs["litellm_params"].setdefault("metadata", {})
if litellm_params.get("metadata", {}).get("model_info"):
    kwargs["litellm_params"]["metadata"]["model_info"] = (
        litellm_params["metadata"]["model_info"]
    )
```

```python
if litellm_params.get("metadata") is not None:
    logging_metadata = litellm_params.get("metadata") or {}
    existing_metadata = kwargs["litellm_params"].get("metadata", {}) or {}
    merged_metadata = dict(logging_metadata)
    merged_metadata.update(existing_metadata)
    if logging_metadata.get("model_info") is not None:
        merged_metadata["model_info"] = logging_metadata["model_info"]
    kwargs["litellm_params"]["metadata"] = merged_metadata
```
Merge start-point is counter-intuitive but correctly tested
The merge starts from logging_metadata and then overlays existing_metadata, which means existing pipeline keys win for every field except model_info. The test at test_custom_pricing_metadata_propagation.py:785 (shared_field == "from-existing") confirms this is intentional.
A brief comment would help future readers understand the direction:
```python
if litellm_params.get("metadata") is not None:
    logging_metadata = litellm_params.get("metadata") or {}
    existing_metadata = kwargs["litellm_params"].get("metadata", {}) or {}
    # Start from logging_obj metadata, let existing pipeline metadata win for
    # all keys, then unconditionally restore deployment model_info.
    merged_metadata = dict(logging_metadata)
    merged_metadata.update(existing_metadata)
    if logging_metadata.get("model_info") is not None:
        merged_metadata["model_info"] = logging_metadata["model_info"]
    kwargs["litellm_params"]["metadata"] = merged_metadata
```
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
```python
deadline = time.time() + 2.0
while cost_callback.response_cost is None and time.time() < deadline:
    time.sleep(0.01)
```
Polling loop is redundant for sync streaming
For sync streaming, log_success_event is called synchronously as each chunk is processed inside the for _chunk in response loop above. By the time that loop exits, cost_callback.response_cost is already set. The while polling loop will exit immediately on the first iteration.
Consider simplifying:
```python
# For sync streaming, cost_callback.response_cost is already set
# by the time the for-loop above completes (callbacks are synchronous).
```
If there is a theoretical case where the callback is deferred (e.g. a background thread), a brief comment explaining that intent would help readers understand why the poll is needed.
@greptileai @greptile-apps can you review one last commit 1f3530185 and update the summary?
```python
or litellm_params.get("model_info")
or (litellm_params.get("litellm_metadata", {}) or {}).get("model_info")
```
Empty-dict short-circuit in or-chain silently skips explicit model_info: {}
The chained or evaluates Python truthiness, so if metadata["model_info"] is an empty dict {} (falsy), the expression falls through to litellm_params.get("model_info") even though model_info was explicitly set in the highest-priority location.
In practice this is benign (an empty model_info has no custom pricing), but it creates an inconsistency: a caller who deliberately sets metadata={"model_info": {}} to reset any inherited model info would silently get a different (lower-priority) model_info returned instead of {}.
Consider using an explicit sentinel instead of truthiness:
_MISSING = object()
metadata_mi = metadata.get("model_info", _MISSING)
if metadata_mi is not _MISSING:
result = metadata_mi
else:
lp_mi = litellm_params.get("model_info", _MISSING)
if lp_mi is not _MISSING:
result = lp_mi
else:
result = (litellm_params.get("litellm_metadata") or {}).get("model_info", {})
return result if isinstance(result, dict) else {}|
@ishaan-berri can you review? This fixes a regression you introduced in #20679. 4/5 greptile confidence; 5.0 is not feasible at this complexity, it keeps coming up with ridiculously minor and far-fetched stuff.
Force-pushed 24c8438 to b07d091, then b07d091 to 719629c.
```python
    pass
else:
    router_model_id = None
    if self.call_type in (
```
this is a weird hardcoding, why do it this way? @DmitriyAlergant
@krrishdholakia to be honest this was mainly to satisfy "the beast" (greptile): #23239 (comment). She is not wrong, since other endpoints are not properly covered by tests for this change, and this core pathway is complex and not 100% understood for implications on all possible endpoints. Based on your comment, I pushed a commit to remove this guard, and the tests are still passing.
Fix on main
Relevant issues
#23185
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- Added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement - see details)
- Ran make test-unit: all good aside from 7 pre-existing failures due to the `responses` package shadowing (AttributeError: module 'responses' has no attribute 'activate'); this is a pre-existing issue
- Ran @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type
🐛 Bug Fix
Changes
Custom pricing from `model_info` config was ignored for `/v1/responses` (streaming) and `/v1/messages` (all modes) endpoints, causing cost calculations to use standard pricing instead. Regression introduced by PR #20679 (commit `1ee43b11de`). See repro steps in the issue linked above.