
fix(proxy): make async_post_call_response_headers_hook consistent across all endpoints#22985

Merged

ishaan-jaff merged 4 commits into BerriAI:main from michelligabriele:fix/response-headers-hook-consistency on Mar 12, 2026
Conversation

@michelligabriele (Collaborator)

The response headers hook had 5 gaps that prevented callbacks from reliably extracting routing metadata across endpoint types:

  1. Hook never fired for /audio/transcriptions (endpoint bypasses base_process_llm_request)
  2. custom_llm_provider not accessible in hook data for any endpoint
  3. custom_llm_provider not stamped in ResponsesAPIResponse._hidden_params (unlike chat completions)
  4. model_info under inconsistent keys (metadata vs litellm_metadata)
  5. request_headers always None at all call sites

This adds a litellm_call_info parameter to the hook that normalizes routing metadata (custom_llm_provider, model_info, api_base, model_id) regardless of endpoint type. Also stamps custom_llm_provider on Responses API responses, adds the hook call to the transcription handler, and passes request_headers at all call sites.

Supersedes PR #21385.
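For illustration, a callback consuming the new parameter might look like this sketch (the real hook is defined on CustomLogger in litellm/integrations/custom_logger.py; the header names here are made up for the example):

```python
import asyncio
from typing import Any, Dict, Optional

# Hypothetical callback showing how litellm_call_info could be consumed;
# the field names follow the PR description.
async def async_post_call_response_headers_hook(
    data: dict,
    user_api_key_dict: Any,
    response: Any,
    request_headers: Optional[Dict[str, str]] = None,
    litellm_call_info: Optional[Dict[str, Any]] = None,
) -> Dict[str, str]:
    info = litellm_call_info or {}
    headers: Dict[str, str] = {}
    if info.get("custom_llm_provider"):
        headers["x-litellm-provider"] = info["custom_llm_provider"]
    if info.get("model_id"):
        headers["x-litellm-model-id"] = info["model_id"]
    return headers

headers = asyncio.run(
    async_post_call_response_headers_hook(
        data={}, user_api_key_dict=None, response=None,
        litellm_call_info={"custom_llm_provider": "openai", "model_id": "m-1"},
    )
)
print(headers)  # {'x-litellm-provider': 'openai', 'x-litellm-model-id': 'm-1'}
```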

Relevant issues

Fixes #19646

Related PRs: #20083 (original hook feature), #21385 (superseded by this PR)

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests with make test-unit
  • My PR's scope is as isolated as possible: it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🐛 Bug Fix

Changes

litellm/integrations/custom_logger.py

  • Added optional litellm_call_info: Optional[Dict[str, Any]] parameter to async_post_call_response_headers_hook signature with docstring describing the fields (custom_llm_provider, model_info, api_base, model_id)

litellm/proxy/utils.py

  • Added _build_litellm_call_info() static method that extracts routing metadata from response._hidden_params and normalizes model_info lookup across both data["metadata"] and data["litellm_metadata"]
  • Updated post_call_response_headers_hook() to build and pass litellm_call_info to each callback
  • Added TypeError fallback for backwards compatibility with existing callbacks that don't accept the new parameter
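A rough sketch of the normalization this helper performs (the field sources are assumptions based on the PR description, not the exact litellm implementation):

```python
from typing import Any, Dict

def build_litellm_call_info(data: dict, response: Any) -> Dict[str, Any]:
    # Hypothetical sketch: routing metadata comes from the response's
    # _hidden_params, model_info from either metadata key.
    hidden = getattr(response, "_hidden_params", None) or {}
    # model_info may live under either key depending on the endpoint;
    # the (… or {}) guards against keys that exist with a None value.
    model_info = (
        (data.get("metadata") or {}).get("model_info")
        or (data.get("litellm_metadata") or {}).get("model_info")
        or {}
    )
    return {
        "custom_llm_provider": hidden.get("custom_llm_provider"),
        "model_info": model_info,
        "api_base": hidden.get("api_base"),
        # assumption: the deployment id lives under model_info["id"]
        "model_id": model_info.get("id"),
    }

class _FakeResponse:
    _hidden_params = {"custom_llm_provider": "openai",
                      "api_base": "https://api.example.com/v1"}

info = build_litellm_call_info(
    {"litellm_metadata": {"model_info": {"id": "m-1"}}}, _FakeResponse()
)
print(info["custom_llm_provider"], info["model_id"])  # openai m-1
```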

litellm/proxy/common_request_processing.py

  • Added request_headers=dict(request.headers) at all 3 hook call sites (streaming success, non-streaming success, failure)

litellm/proxy/proxy_server.py

  • Added post_call_response_headers_hook call to the /audio/transcriptions handler, which previously bypassed the hook entirely

litellm/responses/main.py

  • Stamped custom_llm_provider into ResponsesAPIResponse._hidden_params on both the async (aresponses) and sync (responses) non-streaming paths, mirroring litellm/main.py:1371 for chat completions

litellm/responses/streaming_iterator.py

  • Added custom_llm_provider to _hidden_params dict in BaseResponsesAPIStreamingIterator.__init__, mirroring streaming_handler.py:701 for chat completions

tests/test_litellm/proxy/hooks/test_post_call_response_headers_hook.py

  • Added 4 new tests:
    • test_litellm_call_info_from_hidden_params — verifies litellm_call_info is built from response._hidden_params
    • test_litellm_call_info_from_litellm_metadata — verifies model_info is found under litellm_metadata (responses API path)
    • test_litellm_call_info_with_none_response — verifies graceful handling when response is None (failure path)
    • test_litellm_call_info_backwards_compatible — verifies existing callbacks without the new parameter still work via TypeError fallback


@greptile-apps (Contributor)

greptile-apps bot commented Mar 6, 2026

Greptile Summary

This PR addresses five consistency gaps in async_post_call_response_headers_hook: it adds litellm_call_info (normalized routing metadata) to the hook contract, fixes the /audio/transcriptions endpoint that previously bypassed the hook entirely, passes request_headers at all call sites, stamps custom_llm_provider into ResponsesAPIResponse._hidden_params and the streaming iterator, and replaces the unsafe TypeError-catch backward-compat pattern with an inspect.signature cache keyed by class type.

Key changes:

  • litellm/proxy/utils.py: _build_litellm_call_info() normalizes metadata from both metadata and litellm_metadata keys; _accepts_litellm_call_info() uses a class-keyed signature cache for efficient per-request dispatch
  • litellm/proxy/common_request_processing.py: request_headers now correctly populated at all 3 hook call sites (streaming success, non-streaming success, failure)
  • litellm/proxy/proxy_server.py: hook wired into /audio/transcriptions success path; failure path still missing the hook call
  • litellm/responses/main.py + streaming_iterator.py: custom_llm_provider stamped into _hidden_params for both streaming and non-streaming Responses API paths
  • Tests: 4 new mock-only tests covering the full matrix of new behaviour

Remaining gaps:

  • Callbacks using **kwargs for forward-compatibility are silently excluded from receiving litellm_call_info
  • The /audio/transcriptions failure path still skips the hook, inconsistent with other endpoints

Confidence Score: 4/5

  • Safe to merge with minor completeness gaps; no regressions introduced and all backward-compat paths are correctly handled.
  • The core logic is sound: the fragile TypeError-catch is replaced with a robust inspect.signature cache keyed by class type, the None-guard for metadata uses or {} throughout, and all four previously-flagged issues from the PR description are addressed. Two remaining style gaps lower the score slightly: (1) callbacks using variadic keyword arguments will silently not receive litellm_call_info, and (2) the /audio/transcriptions failure path still skips the hook, inconsistent with other endpoints. Neither is a regression but both limit the completeness of the fix.
  • litellm/proxy/proxy_server.py (missing hook in failure path) and litellm/proxy/utils.py (_accepts_litellm_call_info VAR_KEYWORD gap)

Sequence Diagram

sequenceDiagram
    participant Client
    participant ProxyServer
    participant ProxyLogging
    participant _accepts_litellm_call_info
    participant CustomLogger

    Client->>ProxyServer: POST /chat/completions or /audio/transcriptions
    ProxyServer->>ProxyServer: LLM call → response
    ProxyServer->>ProxyLogging: post_call_response_headers_hook(data, response, request_headers)
    ProxyLogging->>ProxyLogging: _build_litellm_call_info(data, response)<br/>→ {custom_llm_provider, model_info, api_base, model_id}
    loop for each callback in litellm.callbacks
        ProxyLogging->>_accepts_litellm_call_info: id(type(cb))
        _accepts_litellm_call_info-->>ProxyLogging: True / False (cached)
        alt callback accepts litellm_call_info
            ProxyLogging->>CustomLogger: async_post_call_response_headers_hook(..., litellm_call_info=...)
        else legacy callback
            ProxyLogging->>CustomLogger: async_post_call_response_headers_hook(...) [no litellm_call_info]
        end
        CustomLogger-->>ProxyLogging: Dict[str, str] headers
    end
    ProxyLogging-->>ProxyServer: merged_headers
    ProxyServer->>Client: Response + injected headers

Comments Outside Diff (1)

  1. litellm/proxy/proxy_server.py, line 7381-7384 (link)

    Failure path skips post_call_response_headers_hook for audio transcriptions

    The PR adds the hook to the success path of /audio/transcriptions (line 7371), but the except block omits it. Other endpoints routed through ProxyBaseLLMRequestProcessing (chat, completions, etc.) do call the hook in their failure path (see common_request_processing.py lines 1195–1206). This leaves audio transcription failure responses still not firing the hook, contradicting the PR's stated goal of making the hook consistent across all endpoints.

    Consider calling proxy_logging_obj.post_call_response_headers_hook with response=None inside the except block (after line 7383), mirroring how common_request_processing.py handles the failure path.

Last reviewed commit: 9eb8b98

Comment on lines +2026 to +2030
model_info = (
    data.get("metadata", {}).get("model_info")
    or data.get("litellm_metadata", {}).get("model_info")
    or {}
)

AttributeError when metadata key is explicitly None

data.get("metadata", {}) only returns the {} fallback when the key is absent. If the key exists with a None value (e.g. data = {"metadata": None}), Python returns None, and the subsequent .get("model_info") call raises AttributeError. This exception is caught by the outer try/except Exception in post_call_response_headers_hook, which means litellm_call_info is never built and no callbacks are invoked for that request.

The failure path in common_request_processing.py at line 1201 has the same pattern for proxy_server_request. The fix is to use (... or {}) to guard against None values:

Suggested change

    model_info = (
        data.get("metadata", {}).get("model_info")
        or data.get("litellm_metadata", {}).get("model_info")
        or {}
    )

becomes

    model_info = (
        (data.get("metadata") or {}).get("model_info")
        or (data.get("litellm_metadata") or {}).get("model_info")
        or {}
    )
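A quick demonstration of the failure mode and the guarded fix:

```python
# Demonstrates that .get("metadata", {}) does not protect against a key
# that exists with an explicit None value.
data = {"metadata": None, "litellm_metadata": {"model_info": {"id": "m-1"}}}

try:
    data.get("metadata", {}).get("model_info")  # None has no .get
except AttributeError as e:
    print("original pattern raises:", type(e).__name__)

# The (… or {}) guard substitutes an empty dict for None before .get()
model_info = (
    (data.get("metadata") or {}).get("model_info")
    or (data.get("litellm_metadata") or {}).get("model_info")
    or {}
)
print("guarded pattern returns:", model_info)  # {'id': 'm-1'}
```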

Comment on lines +1999 to +2006

    except TypeError:
        # Backwards compat: callback doesn't accept litellm_call_info
        result = await _callback.async_post_call_response_headers_hook(
            data=data,
            user_api_key_dict=user_api_key_dict,
            response=response,
            request_headers=request_headers,
        )

Overly broad TypeError catch masks real callback bugs

Catching TypeError as the signal that a callback doesn't accept litellm_call_info is unsafe. Any TypeError raised inside a callback's own implementation (e.g. a type mismatch in the callback body itself) will be silently swallowed, and then the callback will be called a second time without litellm_call_info. This means:

  1. Real bugs in callbacks are silently hidden.
  2. If a callback has side effects (e.g. writes headers, updates counters), those effects may happen twice with inconsistent parameters.

The standard approach for backward-compatibility shims is to inspect the signature once rather than catching TypeError at runtime. Using inspect.signature(_callback.async_post_call_response_headers_hook) lets you check for "litellm_call_info" in sig.parameters before the call, which is both safer and avoids the double-invocation risk entirely.
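A minimal illustration of the signature-inspection approach (hook signatures abbreviated for the example):

```python
import inspect

# Abbreviated stand-ins for a legacy and an updated callback hook.
async def legacy_hook(data, user_api_key_dict, response, request_headers=None):
    return {}

async def new_hook(data, user_api_key_dict, response, request_headers=None,
                   litellm_call_info=None):
    return {}

def accepts_call_info(fn) -> bool:
    # Inspect the parameter names once instead of catching TypeError
    # at call time.
    return "litellm_call_info" in inspect.signature(fn).parameters

print(accepts_call_info(legacy_hook))  # False
print(accepts_call_info(new_hook))     # True
```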

sig = inspect.signature(_callback.async_post_call_response_headers_hook)

inspect.signature called per-request per-callback

inspect.signature(...) is called inside the hot callback loop on every request. While Python's inspect module caches the parsed signature on the function object after the first call (making subsequent calls cheap), the attribute lookup and wrapper overhead still fires for every request × every callback. For deployments with many callbacks at high QPS this adds up.

A simple guard caches the result in a module-level dict keyed by callback identity:

_CALLBACK_ACCEPTS_CALL_INFO: Dict[int, bool] = {}

def _accepts_litellm_call_info(cb: CustomLogger) -> bool:
    key = id(cb)
    if key not in _CALLBACK_ACCEPTS_CALL_INFO:
        sig = inspect.signature(cb.async_post_call_response_headers_hook)
        _CALLBACK_ACCEPTS_CALL_INFO[key] = "litellm_call_info" in sig.parameters
    return _CALLBACK_ACCEPTS_CALL_INFO[key]

This makes the per-request cost a single dict lookup instead of an inspect.signature call.

Comment on lines +291 to +299

_CALLBACK_ACCEPTS_CALL_INFO: Dict[int, bool] = {}


def _accepts_litellm_call_info(cb: CustomLogger) -> bool:
    key = id(cb)
    if key not in _CALLBACK_ACCEPTS_CALL_INFO:
        sig = inspect.signature(cb.async_post_call_response_headers_hook)
        _CALLBACK_ACCEPTS_CALL_INFO[key] = "litellm_call_info" in sig.parameters
    return _CALLBACK_ACCEPTS_CALL_INFO[key]

The cache key uses id(cb) (instance memory address), which is vulnerable to Python's address reuse after garbage collection. In deployments with dynamic callback add/remove: a removed callback's address may be reused by a new callback that doesn't accept litellm_call_info, causing the stale cache entry to return the wrong signature status.

Since all instances of the same callback class share the same method signature, use id(type(cb)) instead. This is safer (classes are effectively singletons), has better cache hit rates, and avoids the GC reuse problem entirely.

Comment on lines +294 to +299

def _accepts_litellm_call_info(cb: CustomLogger) -> bool:
    key = id(type(cb))
    if key not in _CALLBACK_ACCEPTS_CALL_INFO:
        sig = inspect.signature(cb.async_post_call_response_headers_hook)
        _CALLBACK_ACCEPTS_CALL_INFO[key] = "litellm_call_info" in sig.parameters
    return _CALLBACK_ACCEPTS_CALL_INFO[key]

VAR_KEYWORD callbacks silently excluded from litellm_call_info

"litellm_call_info" in sig.parameters only matches explicitly-named parameters. Callbacks that use variadic keyword arguments for forward-compatibility (e.g. async def async_post_call_response_headers_hook(self, data, user_api_key_dict, response, **kw)) will not match this check and will never receive litellm_call_info, even if they access it via kw.get("litellm_call_info").

The fix is to also accept callbacks whose signature contains a VAR_KEYWORD parameter:

sig = inspect.signature(cb.async_post_call_response_headers_hook)
has_explicit = "litellm_call_info" in sig.parameters
has_var_kw = any(
    p.kind == inspect.Parameter.VAR_KEYWORD
    for p in sig.parameters.values()
)
_CALLBACK_ACCEPTS_CALL_INFO[key] = has_explicit or has_var_kw
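A self-contained demonstration of the extended check (hook signatures abbreviated; names are illustrative):

```python
import inspect

# A forward-compatible callback that takes extra arguments via **kw,
# and one that takes neither **kw nor the explicit parameter.
async def kwargs_hook(data, user_api_key_dict, response, **kw):
    return {}

async def plain_hook(data, user_api_key_dict, response):
    return {}

def accepts_call_info(fn) -> bool:
    # Accept either an explicitly-named litellm_call_info parameter
    # or a variadic-keyword (**kwargs) parameter.
    sig = inspect.signature(fn)
    if "litellm_call_info" in sig.parameters:
        return True
    return any(p.kind == inspect.Parameter.VAR_KEYWORD
               for p in sig.parameters.values())

print(accepts_call_info(kwargs_hook))  # True
print(accepts_call_info(plain_hook))   # False
```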

@ishaan-jaff ishaan-jaff merged commit 7c5e2e8 into BerriAI:main Mar 12, 2026
31 of 38 checks passed
shivamrawat1 pushed a commit that referenced this pull request Mar 12, 2026
…oss all endpoints (#22985)

* fix(proxy): make async_post_call_response_headers_hook consistent across all endpoints

The response headers hook had 5 gaps that prevented callbacks from
reliably extracting routing metadata across endpoint types:

1. Hook never fired for /audio/transcriptions (endpoint bypasses
   base_process_llm_request)
2. custom_llm_provider not accessible in hook data for any endpoint
3. custom_llm_provider not stamped in ResponsesAPIResponse._hidden_params
   (unlike chat completions)
4. model_info under inconsistent keys (metadata vs litellm_metadata)
5. request_headers always None at all call sites

This adds a litellm_call_info parameter to the hook that normalizes
routing metadata (custom_llm_provider, model_info, api_base, model_id)
regardless of endpoint type. Also stamps custom_llm_provider on
Responses API responses, adds the hook call to the transcription
handler, and passes request_headers at all call sites.

Supersedes PR #21385.

* fix(proxy): address review feedback — safer backwards compat and None guards

- Replace try/except TypeError with inspect.signature() check for
  litellm_call_info backwards compatibility. This avoids masking real
  TypeErrors inside callback implementations and prevents double
  invocation with inconsistent parameters.

- Use (data.get("key") or {}) instead of data.get("key", {}) to guard
  against keys that exist with an explicit None value, which would
  cause AttributeError on the subsequent .get() call.

* fix(proxy): cache inspect.signature result for callback compat check

Move the inspect.signature() call into a module-level helper with a
dict cache keyed by callback identity. Avoids repeated introspection
per request per callback in the hot path.

* fix(proxy): use class identity for signature cache key

Key the _CALLBACK_ACCEPTS_CALL_INFO cache by id(type(cb)) instead of
id(cb) to avoid stale entries from Python address reuse after GC.
All instances of the same callback class share the same method
signature, so class identity is both safer and more cache-efficient.


Development

Successfully merging this pull request may close these issues.

[Feature]: Approach to add custom header into response header through failure event hook

2 participants