fix(proxy): make async_post_call_response_headers_hook consistent across all endpoints#22985
Conversation
The response headers hook had 5 gaps that prevented callbacks from reliably extracting routing metadata across endpoint types:

1. Hook never fired for /audio/transcriptions (endpoint bypasses base_process_llm_request)
2. custom_llm_provider not accessible in hook data for any endpoint
3. custom_llm_provider not stamped in ResponsesAPIResponse._hidden_params (unlike chat completions)
4. model_info under inconsistent keys (metadata vs litellm_metadata)
5. request_headers always None at all call sites

This adds a litellm_call_info parameter to the hook that normalizes routing metadata (custom_llm_provider, model_info, api_base, model_id) regardless of endpoint type. Also stamps custom_llm_provider on Responses API responses, adds the hook call to the transcription handler, and passes request_headers at all call sites.

Supersedes PR BerriAI#21385.
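For orientation, the normalized payload the new parameter carries could look like the sketch below. All values here are made up; at runtime they come from `response._hidden_params` and the request metadata.

```python
# Illustrative shape of the litellm_call_info dict described above.
# Field names match the PR description; the values are placeholders.
litellm_call_info = {
    "custom_llm_provider": "openai",
    "model_info": {"id": "gpt-4o-deployment-1"},
    "api_base": "https://api.openai.com/v1",
    "model_id": "gpt-4o-deployment-1",
}
print(sorted(litellm_call_info))
```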
Greptile Summary

This PR addresses five consistency gaps in the response headers hook.

Confidence Score: 4/5
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant ProxyServer
    participant ProxyLogging
    participant _accepts_litellm_call_info
    participant CustomLogger
    Client->>ProxyServer: POST /chat/completions or /audio/transcriptions
    ProxyServer->>ProxyServer: LLM call → response
    ProxyServer->>ProxyLogging: post_call_response_headers_hook(data, response, request_headers)
    ProxyLogging->>ProxyLogging: _build_litellm_call_info(data, response)<br/>→ {custom_llm_provider, model_info, api_base, model_id}
    loop for each callback in litellm.callbacks
        ProxyLogging->>_accepts_litellm_call_info: id(type(cb))
        _accepts_litellm_call_info-->>ProxyLogging: True / False (cached)
        alt callback accepts litellm_call_info
            ProxyLogging->>CustomLogger: async_post_call_response_headers_hook(..., litellm_call_info=...)
        else legacy callback
            ProxyLogging->>CustomLogger: async_post_call_response_headers_hook(...) [no litellm_call_info]
        end
        CustomLogger-->>ProxyLogging: Dict[str, str] headers
    end
    ProxyLogging-->>ProxyServer: merged_headers
    ProxyServer->>Client: Response + injected headers
```
```python
model_info = (
    data.get("metadata", {}).get("model_info")
    or data.get("litellm_metadata", {}).get("model_info")
    or {}
)
```
AttributeError when metadata key is explicitly None
data.get("metadata", {}) only returns the {} fallback when the key is absent. If the key exists with a None value (e.g. data = {"metadata": None}), Python returns None, and the subsequent .get("model_info") call raises AttributeError. This exception is caught by the outer try/except Exception in post_call_response_headers_hook, which means litellm_call_info is never built and no callbacks are invoked for that request.
The failure path in common_request_processing.py at line 1201 has the same pattern for proxy_server_request. The fix is to use (... or {}) to guard against None values:
```diff
-model_info = (
-    data.get("metadata", {}).get("model_info")
-    or data.get("litellm_metadata", {}).get("model_info")
-    or {}
-)
+model_info = (
+    (data.get("metadata") or {}).get("model_info")
+    or (data.get("litellm_metadata") or {}).get("model_info")
+    or {}
+)
```
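A standalone check of the two guard styles (the `data` payload here is made up):

```python
# dict.get(key, default) only uses the default when the key is absent.
# A key that exists with an explicit None value is returned as None,
# so the chained .get() raises AttributeError.
data = {"metadata": None, "litellm_metadata": {"model_info": {"id": "m-1"}}}

try:
    data.get("metadata", {}).get("model_info")
except AttributeError as e:
    print("unguarded:", e)  # 'NoneType' object has no attribute 'get'

# The (x or {}) form falls back to {} for both missing and None values.
model_info = (
    (data.get("metadata") or {}).get("model_info")
    or (data.get("litellm_metadata") or {}).get("model_info")
    or {}
)
print(model_info)  # {'id': 'm-1'}
```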
litellm/proxy/utils.py
Outdated
```python
except TypeError:
    # Backwards compat: callback doesn't accept litellm_call_info
    result = await _callback.async_post_call_response_headers_hook(
        data=data,
        user_api_key_dict=user_api_key_dict,
        response=response,
        request_headers=request_headers,
    )
```
Overly broad TypeError catch masks real callback bugs
Catching TypeError as the signal that a callback doesn't accept litellm_call_info is unsafe. Any TypeError raised inside a callback's own implementation (e.g. a type mismatch in the callback body itself) will be silently swallowed, and then the callback will be called a second time without litellm_call_info. This means:
- Real bugs in callbacks are silently hidden.
- If a callback has side effects (e.g. writes headers, updates counters), those effects may happen twice with inconsistent parameters.
The standard approach for backward-compatibility shims is to inspect the signature once rather than catching TypeError at runtime. Using inspect.signature(_callback.async_post_call_response_headers_hook) lets you check for "litellm_call_info" in sig.parameters before the call, which is both safer and avoids the double-invocation risk entirely.
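A minimal sketch of the signature-based check (the logger classes are hypothetical stand-ins for real callbacks):

```python
import inspect

class LegacyLogger:
    # Old-style callback: does not know about litellm_call_info.
    async def async_post_call_response_headers_hook(
        self, data, user_api_key_dict, response, request_headers
    ):
        return {}

class NewLogger:
    # New-style callback: accepts the extra parameter.
    async def async_post_call_response_headers_hook(
        self, data, user_api_key_dict, response, request_headers, litellm_call_info=None
    ):
        return {}

def accepts_call_info(cb) -> bool:
    # Inspect once before calling, instead of catching TypeError at runtime.
    sig = inspect.signature(cb.async_post_call_response_headers_hook)
    return "litellm_call_info" in sig.parameters

print(accepts_call_info(LegacyLogger()))  # False
print(accepts_call_info(NewLogger()))     # True
```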
fix(proxy): address review feedback — safer backwards compat and None guards
- Replace try/except TypeError with inspect.signature() check for
litellm_call_info backwards compatibility. This avoids masking real
TypeErrors inside callback implementations and prevents double
invocation with inconsistent parameters.
- Use (data.get("key") or {}) instead of data.get("key", {}) to guard
against keys that exist with an explicit None value, which would
cause AttributeError on the subsequent .get() call.
litellm/proxy/utils.py
Outdated
```python
    response=response,
    request_headers=request_headers,
)
sig = inspect.signature(_callback.async_post_call_response_headers_hook)
```
inspect.signature called per-request per-callback
inspect.signature(...) is called inside the hot callback loop on every request. While Python's inspect module caches the parsed signature on the function object after the first call (making subsequent calls cheap), the attribute lookup and wrapper overhead still fires for every request × every callback. For deployments with many callbacks at high QPS this adds up.
A simple guard caches the result in a module-level dict keyed by callback identity:
```python
_CALLBACK_ACCEPTS_CALL_INFO: Dict[int, bool] = {}

def _accepts_litellm_call_info(cb: CustomLogger) -> bool:
    key = id(cb)
    if key not in _CALLBACK_ACCEPTS_CALL_INFO:
        sig = inspect.signature(cb.async_post_call_response_headers_hook)
        _CALLBACK_ACCEPTS_CALL_INFO[key] = "litellm_call_info" in sig.parameters
    return _CALLBACK_ACCEPTS_CALL_INFO[key]
```

This makes the per-request cost a single dict lookup instead of an inspect.signature call.
Move the inspect.signature() call into a module-level helper with a dict cache keyed by callback identity. Avoids repeated introspection per request per callback in the hot path.
```python
_CALLBACK_ACCEPTS_CALL_INFO: Dict[int, bool] = {}


def _accepts_litellm_call_info(cb: CustomLogger) -> bool:
    key = id(cb)
    if key not in _CALLBACK_ACCEPTS_CALL_INFO:
        sig = inspect.signature(cb.async_post_call_response_headers_hook)
        _CALLBACK_ACCEPTS_CALL_INFO[key] = "litellm_call_info" in sig.parameters
    return _CALLBACK_ACCEPTS_CALL_INFO[key]
```
The cache key uses id(cb) (instance memory address), which is vulnerable to Python's address reuse after garbage collection. In deployments with dynamic callback add/remove: a removed callback's address may be reused by a new callback that doesn't accept litellm_call_info, causing the stale cache entry to return the wrong signature status.
Since all instances of the same callback class share the same method signature, use id(type(cb)) instead. This is safer (classes are effectively singletons), has better cache hit rates, and avoids the GC reuse problem entirely.
Key the _CALLBACK_ACCEPTS_CALL_INFO cache by id(type(cb)) instead of id(cb) to avoid stale entries from Python address reuse after GC. All instances of the same callback class share the same method signature, so class identity is both safer and more cache-efficient.
```python
def _accepts_litellm_call_info(cb: CustomLogger) -> bool:
    key = id(type(cb))
    if key not in _CALLBACK_ACCEPTS_CALL_INFO:
        sig = inspect.signature(cb.async_post_call_response_headers_hook)
        _CALLBACK_ACCEPTS_CALL_INFO[key] = "litellm_call_info" in sig.parameters
    return _CALLBACK_ACCEPTS_CALL_INFO[key]
```
VAR_KEYWORD callbacks silently excluded from litellm_call_info
"litellm_call_info" in sig.parameters only matches explicitly-named parameters. Callbacks that use variadic keyword arguments for forward-compatibility (e.g. async def async_post_call_response_headers_hook(self, data, user_api_key_dict, response, **kw)) will not match this check and will never receive litellm_call_info, even if they access it via kw.get("litellm_call_info").
The fix is to also accept callbacks whose signature contains a VAR_KEYWORD parameter:
```python
sig = inspect.signature(cb.async_post_call_response_headers_hook)
has_explicit = "litellm_call_info" in sig.parameters
has_var_kw = any(
    p.kind == inspect.Parameter.VAR_KEYWORD
    for p in sig.parameters.values()
)
_CALLBACK_ACCEPTS_CALL_INFO[key] = has_explicit or has_var_kw
```
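A standalone check of the VAR_KEYWORD-aware detection (the logger class is illustrative):

```python
import inspect

class KwLogger:
    # Forward-compatible callback: unknown keyword args arrive via **kwargs.
    async def async_post_call_response_headers_hook(
        self, data, user_api_key_dict, response, **kwargs
    ):
        return {"x-provider": str(kwargs.get("litellm_call_info"))}

def accepts_call_info(cb) -> bool:
    sig = inspect.signature(cb.async_post_call_response_headers_hook)
    if "litellm_call_info" in sig.parameters:
        return True
    # Also accept callbacks that take **kwargs, since they can receive
    # litellm_call_info even without naming it explicitly.
    return any(p.kind == inspect.Parameter.VAR_KEYWORD for p in sig.parameters.values())

print(accepts_call_info(KwLogger()))  # True
```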
The response headers hook had 5 gaps that prevented callbacks from reliably extracting routing metadata across endpoint types:

1. Hook never fired for /audio/transcriptions (endpoint bypasses base_process_llm_request)
2. custom_llm_provider not accessible in hook data for any endpoint
3. custom_llm_provider not stamped in ResponsesAPIResponse._hidden_params (unlike chat completions)
4. model_info under inconsistent keys (metadata vs litellm_metadata)
5. request_headers always None at all call sites
This adds a litellm_call_info parameter to the hook that normalizes routing metadata (custom_llm_provider, model_info, api_base, model_id) regardless of endpoint type. Also stamps custom_llm_provider on Responses API responses, adds the hook call to the transcription handler, and passes request_headers at all call sites.
Supersedes PR #21385.
Relevant issues
Fixes #19646
Related PRs: #20083 (original hook feature), #21385 (superseded by this PR)
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added testing in the `tests/test_litellm/` directory (adding at least 1 test is a hard requirement - see details)
- `make test-unit` passes
- I have tagged `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🐛 Bug Fix
Changes
- `litellm/integrations/custom_logger.py` — Added `litellm_call_info: Optional[Dict[str, Any]]` parameter to the `async_post_call_response_headers_hook` signature, with a docstring describing the fields (`custom_llm_provider`, `model_info`, `api_base`, `model_id`)
- `litellm/proxy/utils.py` — Added a `_build_litellm_call_info()` static method that extracts routing metadata from `response._hidden_params` and normalizes the `model_info` lookup across both `data["metadata"]` and `data["litellm_metadata"]`; updated `post_call_response_headers_hook()` to build and pass `litellm_call_info` to each callback, with a `TypeError` fallback for backwards compatibility with existing callbacks that don't accept the new parameter
- `litellm/proxy/common_request_processing.py` — Passes `request_headers=dict(request.headers)` at all 3 hook call sites (streaming success, non-streaming success, failure)
- `litellm/proxy/proxy_server.py` — Added the `post_call_response_headers_hook` call to the `/audio/transcriptions` handler, which previously bypassed the hook entirely
- `litellm/responses/main.py` — Stamps `custom_llm_provider` into `ResponsesAPIResponse._hidden_params` on both the async (`aresponses`) and sync (`responses`) non-streaming paths, mirroring `litellm/main.py:1371` for chat completions
- `litellm/responses/streaming_iterator.py` — Adds `custom_llm_provider` to the `_hidden_params` dict in `BaseResponsesAPIStreamingIterator.__init__`, mirroring `streaming_handler.py:701` for chat completions
- `tests/test_litellm/proxy/hooks/test_post_call_response_headers_hook.py` — New tests:
  - `test_litellm_call_info_from_hidden_params` — verifies `litellm_call_info` is built from `response._hidden_params`
  - `test_litellm_call_info_from_litellm_metadata` — verifies `model_info` is found under `litellm_metadata` (responses API path)
  - `test_litellm_call_info_with_none_response` — verifies graceful handling when response is `None` (failure path)
  - `test_litellm_call_info_backwards_compatible` — verifies existing callbacks without the new parameter still work via the `TypeError` fallback