
fix(proxy): make async_post_call_response_headers_hook consistent across all endpoints#22985

Merged

ishaan-jaff merged 4 commits into BerriAI:main from michelligabriele:fix/response-headers-hook-consistency on Mar 12, 2026
Conversation

@michelligabriele (Collaborator)

The response headers hook had 5 gaps that prevented callbacks from reliably extracting routing metadata across endpoint types:

  1. Hook never fired for /audio/transcriptions (endpoint bypasses base_process_llm_request)
  2. custom_llm_provider not accessible in hook data for any endpoint
  3. custom_llm_provider not stamped in ResponsesAPIResponse._hidden_params (unlike chat completions)
  4. model_info under inconsistent keys (metadata vs litellm_metadata)
  5. request_headers always None at all call sites

This adds a litellm_call_info parameter to the hook that normalizes routing metadata (custom_llm_provider, model_info, api_base, model_id) regardless of endpoint type. Also stamps custom_llm_provider on Responses API responses, adds the hook call to the transcription handler, and passes request_headers at all call sites.

Supersedes PR #21385.
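For illustration, a callback consuming the new parameter might look like this sketch (the real hook is defined on CustomLogger in litellm/integrations/custom_logger.py; the header names here are made up for the example):

```python
import asyncio
from typing import Any, Dict, Optional

# Hypothetical callback showing how litellm_call_info could be consumed;
# the field names follow the PR description.
async def async_post_call_response_headers_hook(
    data: dict,
    user_api_key_dict: Any,
    response: Any,
    request_headers: Optional[Dict[str, str]] = None,
    litellm_call_info: Optional[Dict[str, Any]] = None,
) -> Dict[str, str]:
    info = litellm_call_info or {}
    headers: Dict[str, str] = {}
    if info.get("custom_llm_provider"):
        headers["x-litellm-provider"] = info["custom_llm_provider"]
    if info.get("model_id"):
        headers["x-litellm-model-id"] = info["model_id"]
    return headers

headers = asyncio.run(
    async_post_call_response_headers_hook(
        data={}, user_api_key_dict=None, response=None,
        litellm_call_info={"custom_llm_provider": "openai", "model_id": "m-1"},
    )
)
print(headers)  # {'x-litellm-provider': 'openai', 'x-litellm-model-id': 'm-1'}
```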

Relevant issues

Fixes #19646

Related PRs: #20083 (original hook feature), #21385 (superseded by this PR)

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests with make test-unit
  • My PR's scope is as isolated as possible: it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🐛 Bug Fix

Changes

litellm/integrations/custom_logger.py

  • Added optional litellm_call_info: Optional[Dict[str, Any]] parameter to async_post_call_response_headers_hook signature with docstring describing the fields (custom_llm_provider, model_info, api_base, model_id)

litellm/proxy/utils.py

  • Added _build_litellm_call_info() static method that extracts routing metadata from response._hidden_params and normalizes model_info lookup across both data["metadata"] and data["litellm_metadata"]
  • Updated post_call_response_headers_hook() to build and pass litellm_call_info to each callback
  • Added TypeError fallback for backwards compatibility with existing callbacks that don't accept the new parameter
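A rough sketch of the normalization this helper performs (the field sources are assumptions based on the PR description, not the exact litellm implementation):

```python
from typing import Any, Dict

def build_litellm_call_info(data: dict, response: Any) -> Dict[str, Any]:
    # Hypothetical sketch: routing metadata comes from the response's
    # _hidden_params, model_info from either metadata key.
    hidden = getattr(response, "_hidden_params", None) or {}
    # model_info may live under either key depending on the endpoint;
    # the (… or {}) guards against keys that exist with a None value.
    model_info = (
        (data.get("metadata") or {}).get("model_info")
        or (data.get("litellm_metadata") or {}).get("model_info")
        or {}
    )
    return {
        "custom_llm_provider": hidden.get("custom_llm_provider"),
        "model_info": model_info,
        "api_base": hidden.get("api_base"),
        # assumption: the deployment id lives under model_info["id"]
        "model_id": model_info.get("id"),
    }

class _FakeResponse:
    _hidden_params = {"custom_llm_provider": "openai",
                      "api_base": "https://api.example.com/v1"}

info = build_litellm_call_info(
    {"litellm_metadata": {"model_info": {"id": "m-1"}}}, _FakeResponse()
)
print(info["custom_llm_provider"], info["model_id"])  # openai m-1
```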

litellm/proxy/common_request_processing.py

  • Added request_headers=dict(request.headers) at all 3 hook call sites (streaming success, non-streaming success, failure)

litellm/proxy/proxy_server.py

  • Added post_call_response_headers_hook call to the /audio/transcriptions handler, which previously bypassed the hook entirely

litellm/responses/main.py

  • Stamped custom_llm_provider into ResponsesAPIResponse._hidden_params on both the async (aresponses) and sync (responses) non-streaming paths, mirroring litellm/main.py:1371 for chat completions

litellm/responses/streaming_iterator.py

  • Added custom_llm_provider to _hidden_params dict in BaseResponsesAPIStreamingIterator.__init__, mirroring streaming_handler.py:701 for chat completions

tests/test_litellm/proxy/hooks/test_post_call_response_headers_hook.py

  • Added 4 new tests:
    • test_litellm_call_info_from_hidden_params — verifies litellm_call_info is built from response._hidden_params
    • test_litellm_call_info_from_litellm_metadata — verifies model_info is found under litellm_metadata (responses API path)
    • test_litellm_call_info_with_none_response — verifies graceful handling when response is None (failure path)
    • test_litellm_call_info_backwards_compatible — verifies existing callbacks without the new parameter still work via TypeError fallback


@greptile-apps (Contributor)

greptile-apps bot commented Mar 6, 2026

Greptile Summary

This PR addresses five consistency gaps in async_post_call_response_headers_hook: it adds litellm_call_info (normalized routing metadata) to the hook contract, fixes the /audio/transcriptions endpoint that previously bypassed the hook entirely, passes request_headers at all call sites, stamps custom_llm_provider into ResponsesAPIResponse._hidden_params and the streaming iterator, and replaces the unsafe TypeError-catch backward-compat pattern with an inspect.signature cache keyed by class type.

Key changes:

  • litellm/proxy/utils.py: _build_litellm_call_info() normalizes metadata from both metadata and litellm_metadata keys; _accepts_litellm_call_info() uses a class-keyed signature cache for efficient per-request dispatch
  • litellm/proxy/common_request_processing.py: request_headers now correctly populated at all 3 hook call sites (streaming success, non-streaming success, failure)
  • litellm/proxy/proxy_server.py: hook wired into /audio/transcriptions success path; failure path still missing the hook call
  • litellm/responses/main.py + streaming_iterator.py: custom_llm_provider stamped into _hidden_params for both streaming and non-streaming Responses API paths
  • Tests: 4 new mock-only tests covering the full matrix of new behaviour

Remaining gaps:

  • Callbacks using **kwargs for forward-compatibility are silently excluded from receiving litellm_call_info
  • The /audio/transcriptions failure path still skips the hook, inconsistent with other endpoints

Confidence Score: 4/5

  • Safe to merge with minor completeness gaps; no regressions introduced and all backward-compat paths are correctly handled.
  • The core logic is sound: the fragile TypeError-catch is replaced with a robust inspect.signature cache keyed by class type, the None-guard for metadata uses or {} throughout, and all four previously-flagged issues from the PR description are addressed. Two remaining style gaps lower the score slightly: (1) callbacks using variadic keyword arguments will silently not receive litellm_call_info, and (2) the /audio/transcriptions failure path still skips the hook, inconsistent with other endpoints. Neither is a regression but both limit the completeness of the fix.
  • litellm/proxy/proxy_server.py (missing hook in failure path) and litellm/proxy/utils.py (_accepts_litellm_call_info VAR_KEYWORD gap)

Sequence Diagram

sequenceDiagram
    participant Client
    participant ProxyServer
    participant ProxyLogging
    participant _accepts_litellm_call_info
    participant CustomLogger

    Client->>ProxyServer: POST /chat/completions or /audio/transcriptions
    ProxyServer->>ProxyServer: LLM call → response
    ProxyServer->>ProxyLogging: post_call_response_headers_hook(data, response, request_headers)
    ProxyLogging->>ProxyLogging: _build_litellm_call_info(data, response)<br/>→ {custom_llm_provider, model_info, api_base, model_id}
    loop for each callback in litellm.callbacks
        ProxyLogging->>_accepts_litellm_call_info: id(type(cb))
        _accepts_litellm_call_info-->>ProxyLogging: True / False (cached)
        alt callback accepts litellm_call_info
            ProxyLogging->>CustomLogger: async_post_call_response_headers_hook(..., litellm_call_info=...)
        else legacy callback
            ProxyLogging->>CustomLogger: async_post_call_response_headers_hook(...) [no litellm_call_info]
        end
        CustomLogger-->>ProxyLogging: Dict[str, str] headers
    end
    ProxyLogging-->>ProxyServer: merged_headers
    ProxyServer->>Client: Response + injected headers

Comments Outside Diff (1)

  1. litellm/proxy/proxy_server.py, line 7381-7384 (link)

    Failure path skips post_call_response_headers_hook for audio transcriptions

    The PR adds the hook to the success path of /audio/transcriptions (line 7371), but the except block omits it. Other endpoints routed through ProxyBaseLLMRequestProcessing (chat, completions, etc.) do call the hook in their failure path (see common_request_processing.py lines 1195–1206). This leaves audio transcription failure responses still not firing the hook, contradicting the PR's stated goal of making the hook consistent across all endpoints.

    Consider calling proxy_logging_obj.post_call_response_headers_hook with response=None inside the except block (after line 7383), mirroring how common_request_processing.py handles the failure path.

Last reviewed commit: 9eb8b98

Comment on lines +2026 to +2030
model_info = (
    data.get("metadata", {}).get("model_info")
    or data.get("litellm_metadata", {}).get("model_info")
    or {}
)

AttributeError when metadata key is explicitly None

data.get("metadata", {}) only returns the {} fallback when the key is absent. If the key exists with a None value (e.g. data = {"metadata": None}), Python returns None, and the subsequent .get("model_info") call raises AttributeError. This exception is caught by the outer try/except Exception in post_call_response_headers_hook, which means litellm_call_info is never built and no callbacks are invoked for that request.

The failure path in common_request_processing.py at line 1201 has the same pattern for proxy_server_request. The fix is to use (... or {}) to guard against None values:

Suggested change

    model_info = (
        data.get("metadata", {}).get("model_info")
        or data.get("litellm_metadata", {}).get("model_info")
        or {}
    )

becomes

    model_info = (
        (data.get("metadata") or {}).get("model_info")
        or (data.get("litellm_metadata") or {}).get("model_info")
        or {}
    )
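A quick demonstration of the failure mode and the guarded fix:

```python
# Demonstrates that .get("metadata", {}) does not protect against a key
# that exists with an explicit None value.
data = {"metadata": None, "litellm_metadata": {"model_info": {"id": "m-1"}}}

try:
    data.get("metadata", {}).get("model_info")  # None has no .get
except AttributeError as e:
    print("original pattern raises:", type(e).__name__)

# The (… or {}) guard substitutes an empty dict for None before .get()
model_info = (
    (data.get("metadata") or {}).get("model_info")
    or (data.get("litellm_metadata") or {}).get("model_info")
    or {}
)
print("guarded pattern returns:", model_info)  # {'id': 'm-1'}
```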

Comment on lines +1999 to +2006

    except TypeError:
        # Backwards compat: callback doesn't accept litellm_call_info
        result = await _callback.async_post_call_response_headers_hook(
            data=data,
            user_api_key_dict=user_api_key_dict,
            response=response,
            request_headers=request_headers,
        )

Overly broad TypeError catch masks real callback bugs

Catching TypeError as the signal that a callback doesn't accept litellm_call_info is unsafe. Any TypeError raised inside a callback's own implementation (e.g. a type mismatch in the callback body itself) will be silently swallowed, and then the callback will be called a second time without litellm_call_info. This means:

  1. Real bugs in callbacks are silently hidden.
  2. If a callback has side effects (e.g. writes headers, updates counters), those effects may happen twice with inconsistent parameters.

The standard approach for backward-compatibility shims is to inspect the signature once rather than catching TypeError at runtime. Using inspect.signature(_callback.async_post_call_response_headers_hook) lets you check for "litellm_call_info" in sig.parameters before the call, which is both safer and avoids the double-invocation risk entirely.
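A minimal illustration of the signature-inspection approach (hook signatures abbreviated for the example):

```python
import inspect

# Abbreviated stand-ins for a legacy and an updated callback hook.
async def legacy_hook(data, user_api_key_dict, response, request_headers=None):
    return {}

async def new_hook(data, user_api_key_dict, response, request_headers=None,
                   litellm_call_info=None):
    return {}

def accepts_call_info(fn) -> bool:
    # Inspect the parameter names once instead of catching TypeError
    # at call time.
    return "litellm_call_info" in inspect.signature(fn).parameters

print(accepts_call_info(legacy_hook))  # False
print(accepts_call_info(new_hook))     # True
```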

sig = inspect.signature(_callback.async_post_call_response_headers_hook)

inspect.signature called per-request per-callback

inspect.signature(...) is called inside the hot callback loop on every request. While Python's inspect module caches the parsed signature on the function object after the first call (making subsequent calls cheap), the attribute lookup and wrapper overhead still fires for every request × every callback. For deployments with many callbacks at high QPS this adds up.

A simple guard caches the result in a module-level dict keyed by callback identity:

_CALLBACK_ACCEPTS_CALL_INFO: Dict[int, bool] = {}

def _accepts_litellm_call_info(cb: CustomLogger) -> bool:
    key = id(cb)
    if key not in _CALLBACK_ACCEPTS_CALL_INFO:
        sig = inspect.signature(cb.async_post_call_response_headers_hook)
        _CALLBACK_ACCEPTS_CALL_INFO[key] = "litellm_call_info" in sig.parameters
    return _CALLBACK_ACCEPTS_CALL_INFO[key]

This makes the per-request cost a single dict lookup instead of an inspect.signature call.

Comment on lines +291 to +299

_CALLBACK_ACCEPTS_CALL_INFO: Dict[int, bool] = {}


def _accepts_litellm_call_info(cb: CustomLogger) -> bool:
    key = id(cb)
    if key not in _CALLBACK_ACCEPTS_CALL_INFO:
        sig = inspect.signature(cb.async_post_call_response_headers_hook)
        _CALLBACK_ACCEPTS_CALL_INFO[key] = "litellm_call_info" in sig.parameters
    return _CALLBACK_ACCEPTS_CALL_INFO[key]

The cache key uses id(cb) (instance memory address), which is vulnerable to Python's address reuse after garbage collection. In deployments with dynamic callback add/remove: a removed callback's address may be reused by a new callback that doesn't accept litellm_call_info, causing the stale cache entry to return the wrong signature status.

Since all instances of the same callback class share the same method signature, use id(type(cb)) instead. This is safer (classes are effectively singletons), has better cache hit rates, and avoids the GC reuse problem entirely.

Comment on lines +294 to +299

def _accepts_litellm_call_info(cb: CustomLogger) -> bool:
    key = id(type(cb))
    if key not in _CALLBACK_ACCEPTS_CALL_INFO:
        sig = inspect.signature(cb.async_post_call_response_headers_hook)
        _CALLBACK_ACCEPTS_CALL_INFO[key] = "litellm_call_info" in sig.parameters
    return _CALLBACK_ACCEPTS_CALL_INFO[key]

VAR_KEYWORD callbacks silently excluded from litellm_call_info

"litellm_call_info" in sig.parameters only matches explicitly-named parameters. Callbacks that use variadic keyword arguments for forward-compatibility (e.g. async def async_post_call_response_headers_hook(self, data, user_api_key_dict, response, **kw)) will not match this check and will never receive litellm_call_info, even if they access it via kw.get("litellm_call_info").

The fix is to also accept callbacks whose signature contains a VAR_KEYWORD parameter:

sig = inspect.signature(cb.async_post_call_response_headers_hook)
has_explicit = "litellm_call_info" in sig.parameters
has_var_kw = any(
    p.kind == inspect.Parameter.VAR_KEYWORD
    for p in sig.parameters.values()
)
_CALLBACK_ACCEPTS_CALL_INFO[key] = has_explicit or has_var_kw
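A self-contained demonstration of the extended check (hook signatures abbreviated; names are illustrative):

```python
import inspect

# A forward-compatible callback that takes extra arguments via **kw,
# and one that takes neither **kw nor the explicit parameter.
async def kwargs_hook(data, user_api_key_dict, response, **kw):
    return {}

async def plain_hook(data, user_api_key_dict, response):
    return {}

def accepts_call_info(fn) -> bool:
    # Accept either an explicitly-named litellm_call_info parameter
    # or a variadic-keyword (**kwargs) parameter.
    sig = inspect.signature(fn)
    if "litellm_call_info" in sig.parameters:
        return True
    return any(p.kind == inspect.Parameter.VAR_KEYWORD
               for p in sig.parameters.values())

print(accepts_call_info(kwargs_hook))  # True
print(accepts_call_info(plain_hook))   # False
```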

@ishaan-jaff ishaan-jaff merged commit 7c5e2e8 into BerriAI:main Mar 12, 2026
31 of 38 checks passed
shivamrawat1 pushed a commit that referenced this pull request Mar 12, 2026
…oss all endpoints (#22985)

* fix(proxy): make async_post_call_response_headers_hook consistent across all endpoints

The response headers hook had 5 gaps that prevented callbacks from
reliably extracting routing metadata across endpoint types:

1. Hook never fired for /audio/transcriptions (endpoint bypasses
   base_process_llm_request)
2. custom_llm_provider not accessible in hook data for any endpoint
3. custom_llm_provider not stamped in ResponsesAPIResponse._hidden_params
   (unlike chat completions)
4. model_info under inconsistent keys (metadata vs litellm_metadata)
5. request_headers always None at all call sites

This adds a litellm_call_info parameter to the hook that normalizes
routing metadata (custom_llm_provider, model_info, api_base, model_id)
regardless of endpoint type. Also stamps custom_llm_provider on
Responses API responses, adds the hook call to the transcription
handler, and passes request_headers at all call sites.

Supersedes PR #21385.

* fix(proxy): address review feedback — safer backwards compat and None guards

- Replace try/except TypeError with inspect.signature() check for
  litellm_call_info backwards compatibility. This avoids masking real
  TypeErrors inside callback implementations and prevents double
  invocation with inconsistent parameters.

- Use (data.get("key") or {}) instead of data.get("key", {}) to guard
  against keys that exist with an explicit None value, which would
  cause AttributeError on the subsequent .get() call.

* fix(proxy): cache inspect.signature result for callback compat check

Move the inspect.signature() call into a module-level helper with a
dict cache keyed by callback identity. Avoids repeated introspection
per request per callback in the hot path.

* fix(proxy): use class identity for signature cache key

Key the _CALLBACK_ACCEPTS_CALL_INFO cache by id(type(cb)) instead of
id(cb) to avoid stale entries from Python address reuse after GC.
All instances of the same callback class share the same method
signature, so class identity is both safer and more cache-efficient.


Development

Successfully merging this pull request may close these issues.

[Feature]: Approach to add custom header into response header through failure event hook

2 participants