fix(proxy): pass request_headers to response headers hook + fix guardrail gap#21385
michelligabriele wants to merge 4 commits into BerriAI:main
Conversation
Greptile Summary

This PR fixes the response headers hook so that `request_headers` is actually populated (it was previously always `None`) and adds the missing hook call on the guardrail (`ModifyResponseException`) path.
Confidence Score: 3/5
| Filename | Overview |
|---|---|
| litellm/proxy/common_request_processing.py | Core fix: captures request headers and passes them to all 3 hook call sites. Correct implementation but exposes sensitive headers (Authorization, etc.) to callbacks without filtering. |
| litellm/proxy/response_api_endpoints/endpoints.py | Adds response headers hook to ModifyResponseException handler in /responses endpoint. Correct fix but inconsistent — same gap remains in /chat/completions, /completions, and /v1/messages endpoints. |
| docs/my-website/docs/proxy/call_hooks.md | Updated docs with request_headers usage example and tip about endpoint coverage. Tip claim may be inaccurate for guardrail exceptions on non-/responses endpoints. |
| e2e_demo_response_headers_callback.py | New E2E demo file at repo root showing request header echoing. Duplicated identically in tests/ directory. |
| tests/e2e_demo_response_headers_callback.py | E2E demo callback copy under tests/. Identical to root-level file — one copy should be removed. |
| tests/test_litellm/proxy/hooks/test_post_call_response_headers_hook.py | Two new unit tests for request_headers forwarding. Properly mocked, no real network calls. Tests verify headers reach the callback. |
| tests/test_litellm/proxy/response_api_endpoints/test_response_headers_on_guardrail_exception.py | New test verifying response headers hook fires on ModifyResponseException in /responses. Uses TestClient with mocks, no real network calls. |
Flowchart
```mermaid
flowchart TD
A[Incoming HTTP Request] --> B[base_process_llm_request]
B --> C["Capture request.headers → self._request_headers"]
C --> D{Route Request}
D -->|Streaming| E[Streaming Success Path]
D -->|Non-Streaming| F[Non-Streaming Success Path]
D -->|Error| G[Failure Path]
D -->|ModifyResponseException| H{Which Endpoint?}
E --> E1["post_call_response_headers_hook(request_headers=self._request_headers)"]
E1 --> E2[Inject headers into StreamingResponse]
F --> F1["post_call_response_headers_hook(request_headers=self._request_headers)"]
F1 --> F2[Inject headers into fastapi_response]
G --> G1["post_call_response_headers_hook(request_headers=self._request_headers)"]
G1 --> G2[Inject headers into error response]
H -->|/responses ✅ Fixed| I["post_call_response_headers_hook(request_headers=dict(request.headers))"]
I --> I1[Inject headers into guardrail response]
H -->|/chat/completions ❌| J[No hook call - gap remains]
H -->|/completions ❌| K[No hook call - gap remains]
H -->|/v1/messages ❌| L[No hook call - gap remains]
```
Last reviewed commit: 0ad9bac
From `litellm/proxy/common_request_processing.py`:

```diff
@@ -749,6 +750,8 @@ async def base_process_llm_request(
     """
     Common request processing logic for both chat completions and responses API endpoints
     """
+    self._request_headers = dict(request.headers)
```
Sensitive headers exposed to callbacks
dict(request.headers) captures all HTTP headers, including authorization, cookie, and other sensitive values. These are then passed unfiltered to every CustomLogger callback via request_headers. A misconfigured or third-party callback could inadvertently log or leak credentials.
Consider filtering out sensitive headers before storing them, or at minimum document this behavior clearly. For example:
```python
_sensitive_headers = {"authorization", "cookie", "proxy-authorization"}
self._request_headers = {
    k: v
    for k, v in request.headers.items()
    if k.lower() not in _sensitive_headers
}
```

From `e2e_demo_response_headers_callback.py`:

```python
"""
Demo CustomLogger that injects custom response headers.

Shows how to:
1. Echo an incoming request header (e.g., APIGEE request ID) into the response
2. Inject headers on both success and failure paths
3. Works for /chat/completions, /embeddings, and /responses

Usage:
litellm --config tests/e2e_demo_response_headers_config.yaml

Test commands:
# /chat/completions (non-streaming)
curl -s -D- http://localhost:4000/chat/completions \
  -H "Authorization: Bearer sk-1234" \
  -H "x-apigee-request-id: apigee-req-001" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}]}'

# /chat/completions (streaming)
curl -s -D- http://localhost:4000/chat/completions \
  -H "Authorization: Bearer sk-1234" \
  -H "x-apigee-request-id: apigee-req-002" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}],"stream":true}'

# /embeddings
curl -s -D- http://localhost:4000/embeddings \
  -H "Authorization: Bearer sk-1234" \
  -H "x-apigee-request-id: apigee-req-003" \
  -d '{"model":"text-embedding-3-small","input":"hello"}'

# /v1/responses (non-streaming)
curl -s -D- http://localhost:4000/v1/responses \
  -H "Authorization: Bearer sk-1234" \
  -H "x-apigee-request-id: apigee-req-004" \
  -d '{"model":"gpt-4o-mini","input":"hi"}'

# /v1/responses (streaming)
curl -s -D- http://localhost:4000/v1/responses \
  -H "Authorization: Bearer sk-1234" \
  -H "x-apigee-request-id: apigee-req-005" \
  -d '{"model":"gpt-4o-mini","input":"hi","stream":true}'

# Failure path (bad model → headers still injected)
curl -s -D- http://localhost:4000/chat/completions \
  -H "Authorization: Bearer sk-1234" \
  -H "x-apigee-request-id: apigee-req-006" \
  -d '{"model":"nonexistent-model","messages":[{"role":"user","content":"hi"}]}'

Expected: All responses contain x-apigee-request-id, x-custom-header, and x-litellm-hook-model.
"""

from typing import Any, Dict, Optional

from litellm.integrations.custom_logger import CustomLogger
from litellm.proxy._types import UserAPIKeyAuth


class ResponseHeaderInjector(CustomLogger):
    """
    Demonstrates injecting custom HTTP response headers via the proxy hook.

    Key features:
    - Echoes the incoming x-apigee-request-id header back in the response
    - Adds a static custom header and the model name
    - Works for success (streaming + non-streaming) and failure responses
    - Works for all endpoints: /chat/completions, /embeddings, /responses
    """

    async def async_post_call_response_headers_hook(
        self,
        data: dict,
        user_api_key_dict: UserAPIKeyAuth,
        response: Any,
        request_headers: Optional[Dict[str, str]] = None,
    ) -> Optional[Dict[str, str]]:
        headers: Dict[str, str] = {
            "x-custom-header": "hello-from-hook",
            "x-litellm-hook-model": data.get("model", "unknown"),
        }

        # Echo the APIGEE request ID from the incoming request into the response
        if request_headers:
            apigee_id = request_headers.get("x-apigee-request-id")
            if apigee_id:
                headers["x-apigee-request-id"] = apigee_id

        return headers


response_header_injector = ResponseHeaderInjector()
```
Duplicate file at repo root
This file is identical to tests/e2e_demo_response_headers_callback.py. Having the same file in two places creates a maintenance burden. Consider keeping only the tests/ copy and removing this root-level file, or if it's intended to be a user-facing demo, place it under docs/ or reference it from a single location.
From `litellm/proxy/response_api_endpoints/endpoints.py`:

```python
# Call response headers hook for guardrail failure path
callback_headers = await proxy_logging_obj.post_call_response_headers_hook(
    data=_data,
    user_api_key_dict=user_api_key_dict,
    response=None,
    request_headers=dict(request.headers),
)
if callback_headers:
    fastapi_response.headers.update(callback_headers)
```
Inconsistent fix — other ModifyResponseException handlers still lack this hook
This PR adds post_call_response_headers_hook to the ModifyResponseException handler for /responses, but the same gap exists in:
- `proxy_server.py` line ~6194 (`/chat/completions` handler)
- `proxy_server.py` line ~6358 (`/completions` text completions handler)
- `anthropic_endpoints/endpoints.py` line ~72 (`/v1/messages` handler)
If the intention is that custom response headers should be injected on guardrail failures for all endpoints, those handlers need the same treatment. The docs tip at line 443 of call_hooks.md states this hook "works for all proxy endpoints" including failure responses, which would be inaccurate for these other endpoints on guardrail exceptions.
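A self-contained sketch (hypothetical helper names, not LiteLLM's actual handler code) of the pattern the review asks to replicate in those handlers: on a guardrail exception, invoke each callback's response headers hook with the request headers and merge whatever it returns onto the outgoing response.

```python
import asyncio


class EchoCallback:
    """Toy callback echoing a gateway request ID, mirroring the demo above."""

    async def async_post_call_response_headers_hook(
        self, data, user_api_key_dict, response, request_headers=None
    ):
        if request_headers and "x-apigee-request-id" in request_headers:
            return {"x-apigee-request-id": request_headers["x-apigee-request-id"]}
        return None


async def handle_guardrail_exception(callbacks, data, request_headers, response_headers):
    # Ask each callback for extra headers and merge them onto the response,
    # the same merge the ModifyResponseException handlers need to perform.
    for cb in callbacks:
        extra = await cb.async_post_call_response_headers_hook(
            data=data,
            user_api_key_dict=None,
            response=None,
            request_headers=request_headers,
        )
        if extra:
            response_headers.update(extra)
    return response_headers


merged = asyncio.run(
    handle_guardrail_exception(
        [EchoCallback()],
        data={"model": "gpt-4o-mini"},
        request_headers={"x-apigee-request-id": "apigee-req-001"},
        response_headers={},
    )
)
print(merged)  # {'x-apigee-request-id': 'apigee-req-001'}
```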
@greptileai review
Greptile Summary

This PR fixes the response headers hook so that `request_headers` is actually populated (it was previously always `None`) and adds the missing hook call on the guardrail (`ModifyResponseException`) path.
Confidence Score: 2/5
| Filename | Overview |
|---|---|
| litellm/proxy/common_request_processing.py | Core fix to populate request_headers at all hook call sites. Adds _SENSITIVE_HEADERS filtering (addressing previous review feedback) and passes request_headers to all 3 hook sites. Has a missing Dict import that will cause a NameError at runtime. |
| litellm/proxy/response_api_endpoints/endpoints.py | Adds post_call_response_headers_hook call in the ModifyResponseException handler for /responses. Uses _filter_sensitive_headers consistently with other endpoints. Implementation is correct. |
| litellm/proxy/proxy_server.py | Adds post_call_response_headers_hook to ModifyResponseException handlers in both /chat/completions and /completions endpoints. Consistent pattern with other endpoints. |
| litellm/proxy/anthropic_endpoints/endpoints.py | Adds post_call_response_headers_hook to the ModifyResponseException handler for /v1/messages. Same pattern as other endpoints. |
| docs/my-website/docs/proxy/call_hooks.md | Updated docs to show request_headers usage, removed unnecessary __init__, and added tip about endpoint coverage. Documentation is clear and accurate. |
| tests/test_litellm/proxy/hooks/test_post_call_response_headers_hook.py | Added 2 new mock-only tests verifying request_headers forwarding to callbacks. Tests are well-structured and don't make real network calls. |
| tests/test_litellm/proxy/response_api_endpoints/test_response_headers_on_guardrail_exception.py | New test verifying custom headers appear on guardrail failure in /responses. Uses TestClient with mocks — no real network calls. Properly patches auth and processor. |
| tests/e2e_demo_response_headers_callback.py | New E2E demo showing APIGEE request ID echoing. Well-documented with usage instructions. Purely demonstrative, no issues. |
Flowchart
```mermaid
flowchart TD
A[Incoming HTTP Request] --> B[ProxyBaseLLMRequestProcessing.__init__]
B --> C[base_process_llm_request]
C --> D["_filter_sensitive_headers(request.headers)"]
D --> E[Store as self._request_headers]
E --> F{Request outcome?}
F -->|Streaming Success| G["post_call_response_headers_hook(request_headers=self._request_headers)"]
G --> H[Merge into streaming custom_headers]
F -->|Non-Streaming Success| I["post_call_response_headers_hook(request_headers=self._request_headers)"]
I --> J[Update fastapi_response.headers]
F -->|Failure| K["post_call_response_headers_hook(request_headers=self._request_headers)"]
K --> L[Include in error response headers]
F -->|ModifyResponseException| M["_filter_sensitive_headers(request.headers) — called directly"]
M --> N["post_call_response_headers_hook(request_headers=filtered)"]
N --> O[Update fastapi_response.headers]
style M fill:#f9f,stroke:#333
style D fill:#bbf,stroke:#333
```
Last reviewed commit: 1f22984
From `ProxyBaseLLMRequestProcessing.__init__`:

```diff
 def __init__(self, data: dict):
     self.data = data
+    self._request_headers: Optional[Dict[str, str]] = None
```
Missing Dict import causes NameError
Dict is used in the type annotation here but is not imported from typing (the module imports TYPE_CHECKING, Any, AsyncGenerator, Callable, Literal, Optional, Tuple, and Union — but not Dict). Without `from __future__ import annotations`, annotations on attribute targets such as `self._request_headers` are evaluated at runtime (PEP 526), so calling `ProxyBaseLLMRequestProcessing.__init__` will raise `NameError: name 'Dict' is not defined`.
Suggested change:

```diff
- self._request_headers: Optional[Dict[str, str]] = None
+ self._request_headers: Optional[dict] = None
```
Alternatively, add Dict to the typing imports at line 6.
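A minimal stand-alone illustration of the annotation behavior this comment relies on (PEP 526): annotations on simple local names are never evaluated at runtime, while annotations on attribute targets are (on CPython versions with eager annotation evaluation), which is why the unimported `Dict` can raise at call time. `UndefinedType` and `Demo` are illustrative names.

```python
class Demo:
    def set_local(self):
        # Annotation on a simple local name: never evaluated at runtime,
        # so an undefined annotation name here is harmless (PEP 526).
        x: UndefinedType = 1  # noqa: F821
        return x

    def set_attr(self):
        # Annotation on an attribute target is evaluated when the method
        # runs, so an undefined name raises NameError here.
        self.y: UndefinedType = 1  # noqa: F821


d = Demo()
print(d.set_local())  # annotation ignored, returns 1

try:
    d.set_attr()
    print("no error (annotations not evaluated on this Python version)")
except NameError as exc:
    print("NameError:", exc)
```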
Force-pushed from 1f22984 to bd30289 (compare)
@greptileai review
Greptile Summary

This PR fixes the response headers hook so that `request_headers` is actually populated (it was previously always `None`) and adds the missing hook call on the guardrail (`ModifyResponseException`) path.
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/proxy/common_request_processing.py | Core fix: adds _filter_sensitive_headers static method and _request_headers instance attribute; captures filtered headers in base_process_llm_request and passes them to all three hook call sites (streaming success, non-streaming success, failure). |
| litellm/proxy/proxy_server.py | Adds post_call_response_headers_hook call to both /chat/completions and /completions ModifyResponseException handlers, using _filter_sensitive_headers to strip credentials before passing to callbacks. |
| litellm/proxy/response_api_endpoints/endpoints.py | Adds post_call_response_headers_hook call to /responses ModifyResponseException handler with filtered headers, closing the guardrail gap for the responses API. |
| litellm/proxy/anthropic_endpoints/endpoints.py | Adds post_call_response_headers_hook call to /v1/messages ModifyResponseException handler with filtered headers, ensuring consistency with other endpoints. |
| docs/my-website/docs/proxy/call_hooks.md | Updates hook example to show request_headers usage for echoing gateway IDs; adds tip about endpoint coverage. Doc says "original HTTP request headers" but sensitive headers are filtered — slightly misleading. |
| tests/e2e_demo_response_headers_callback.py | Demo callback showing APIGEE request ID echoing. Not a test file — serves as a reference implementation for documentation purposes. |
| tests/test_litellm/proxy/hooks/test_post_call_response_headers_hook.py | Adds two well-structured mock-only tests verifying request_headers forwarding to callbacks. Tests are properly isolated with no network calls. |
| tests/test_litellm/proxy/response_api_endpoints/test_response_headers_on_guardrail_exception.py | Mock-only test verifying the guardrail exception path in /responses invokes the response headers hook. Uses TestClient with mocked auth and processor — no real network calls. |
Flowchart
```mermaid
flowchart TD
A[Incoming HTTP Request] --> B[FastAPI Endpoint]
B --> C{_filter_sensitive_headers}
C -->|Remove auth, cookie, proxy-auth| D[Filtered request_headers]
D --> E[base_process_llm_request]
E --> F{Response Path}
F -->|Streaming Success| G[post_call_response_headers_hook\nwith request_headers]
F -->|Non-Streaming Success| H[post_call_response_headers_hook\nwith request_headers]
F -->|Failure| I[post_call_response_headers_hook\nwith request_headers]
F -->|ModifyResponseException\nGuardrail Block| J[post_call_response_headers_hook\nwith _filter_sensitive_headers]
G --> K[Custom headers injected into response]
H --> K
I --> K
J --> K
K --> L[HTTP Response with custom headers]
```
Last reviewed commit: bd30289
From `docs/my-website/docs/proxy/call_hooks.md`:

```markdown
:::tip
This hook works for **all proxy endpoints**: `/chat/completions`, `/embeddings`, `/responses` (streaming and non-streaming), and failure responses.

The `request_headers` parameter contains the original HTTP request headers, allowing you to echo incoming headers (e.g., API gateway request IDs) into the response.
```
Docs say "original" but headers are filtered
The text says request_headers contains "the original HTTP request headers," but _filter_sensitive_headers strips authorization, cookie, and proxy-authorization before passing them to callbacks. Consider updating the wording to note this filtering, e.g.:
"The request_headers parameter contains the HTTP request headers (with sensitive headers like authorization and cookie removed for security)..."
This helps callback authors understand they won't have access to auth headers through this parameter.
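A minimal sketch of the filtering behavior being discussed (the helper name mirrors the PR's `_filter_sensitive_headers`, but this is an illustration, not the actual implementation):

```python
# Header names stripped before request headers reach callbacks.
SENSITIVE_HEADERS = {"authorization", "cookie", "proxy-authorization"}


def filter_sensitive_headers(headers: dict) -> dict:
    """Return a copy of headers with credential-bearing keys removed."""
    return {k: v for k, v in headers.items() if k.lower() not in SENSITIVE_HEADERS}


incoming = {
    "Authorization": "Bearer sk-1234",
    "Cookie": "session=abc",
    "x-apigee-request-id": "apigee-req-001",
}
print(filter_sensitive_headers(incoming))
# only x-apigee-request-id survives; the match is case-insensitive
```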
…rail gap

The async_post_call_response_headers_hook receives request_headers as a parameter but it was never populated — always None. This prevented use cases like echoing an API gateway request ID (e.g., APIGEE) from the incoming request into the response headers.

Changes:
- Capture request headers from the FastAPI Request object at the start of base_process_llm_request and pass them to all three hook call sites (streaming success, non-streaming success, failure)
- Add missing post_call_response_headers_hook call in the /responses endpoint's ModifyResponseException handler, so custom headers are injected even when a guardrail blocks the request
- Update E2E demo to show APIGEE request ID echoing across all endpoints
- Update docs with request_headers usage example
- Add unit tests for request_headers forwarding and guardrail exception path

Closes BerriAI#19646
…ints

Address code review feedback:

1. Filter sensitive request headers (authorization, cookie, proxy-authorization) before passing to callbacks to prevent credential leaks in third-party loggers.
2. Add post_call_response_headers_hook to ModifyResponseException handlers in all four endpoints (/chat/completions, /completions, /v1/messages, /responses) — previously only /responses was covered.
3. Remove duplicate e2e demo file from repo root (canonical copy lives in tests/).
Move post_call_failure_hook + post_call_response_headers_hook calls from 4 separate endpoint handlers into a single _handle_modify_response_exception method on ProxyBaseLLMRequestProcessing, mirroring the existing _handle_llm_api_exception pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from bd30289 to 6318692 (compare)
@greptileai review
Greptile Summary

This PR fixes the response headers hook so that `request_headers` is actually populated (it was previously always `None`) and adds the missing hook call on the guardrail (`ModifyResponseException`) path.
Confidence Score: 3/5
| Filename | Overview |
|---|---|
| litellm/proxy/common_request_processing.py | Core change: adds _filter_sensitive_headers, captures request headers at start of base_process_llm_request, passes them to all hook call sites, and centralizes ModifyResponseException handling in _handle_modify_response_exception. Implementation is clean and consistent. |
| litellm/proxy/proxy_server.py | Replaces inline failure hook with centralized _handle_modify_response_exception in both /chat/completions and /completions. Custom headers set on fastapi_response are lost when a new StreamingResponse is returned in the streaming guardrail path. |
| litellm/proxy/anthropic_endpoints/endpoints.py | Refactored to use centralized _handle_modify_response_exception. Same streaming header-loss issue: create_response at line 111 is called with headers={} instead of including the custom callback headers. |
| litellm/proxy/response_api_endpoints/endpoints.py | Refactored to use centralized _handle_modify_response_exception. Non-streaming path works correctly since it returns a model object and fastapi_response headers are preserved by FastAPI. |
| tests/test_litellm/proxy/hooks/test_post_call_response_headers_hook.py | Adds two new mock-only tests verifying request_headers are forwarded to the callback hook. Tests are correctly structured and don't make real network calls. |
| tests/test_litellm/proxy/response_api_endpoints/test_response_headers_on_guardrail_exception.py | New test using FastAPI TestClient that verifies custom headers appear on guardrail failure for /v1/responses. Uses appropriate mocking to avoid real network calls. |
| docs/my-website/docs/proxy/call_hooks.md | Updated docs to show request_headers usage with gateway ID echoing example. Added tip about all-endpoint support. |
| tests/e2e_demo_response_headers_callback.py | New E2E demo callback showing how to echo APIGEE request IDs via the response headers hook. Documentation/demo only file. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Incoming Request] --> B[base_process_llm_request]
B --> C{Capture request_headers\nfilter sensitive headers}
C --> D[Route to LLM]
D --> E{Response Type?}
E -->|Streaming Success| F[post_call_response_headers_hook\nwith request_headers]
E -->|Non-Streaming Success| G[post_call_response_headers_hook\nwith request_headers]
E -->|Exception| H{Exception Type?}
H -->|ModifyResponseException| I[_handle_modify_response_exception]
I --> J[post_call_failure_hook]
J --> K[post_call_response_headers_hook\nwith request_headers]
K --> L{Streaming?}
L -->|Yes| M["⚠️ New StreamingResponse\n(headers lost)"]
L -->|No| N[Headers on fastapi_response ✓]
H -->|Other Exception| O[_handle_llm_api_exception]
O --> P[post_call_response_headers_hook\nwith request_headers]
F --> Q[Headers in custom_headers dict ✓]
G --> R[Headers on fastapi_response ✓]
```
Last reviewed commit: 6318692
Additional Comments (2)
When a streaming request triggers a ModifyResponseException, the custom headers set on fastapi_response by _handle_modify_response_exception are lost, because a new StreamingResponse is constructed and returned without them. The same issue exists in the /v1/messages handler, where create_response is called with headers={}. To fix this, the custom headers need to be explicitly passed to the new StreamingResponse.

Consider forwarding the accumulated fastapi_response.headers (e.g. dict(fastapi_response.headers)) when constructing the replacement streaming response.
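A dependency-free stand-in for this bug (plain classes instead of FastAPI/Starlette responses, so the names are illustrative): constructing a fresh response without copying the accumulated headers drops what the hook injected, while forwarding `dict(fastapi_response.headers)` preserves it.

```python
class FakeStreamingResponse:
    """Minimal stand-in for a Starlette response carrying headers."""

    def __init__(self, headers=None):
        self.headers = dict(headers or {})


# The guardrail handler has already merged the callback's custom headers
# onto the original response object.
fastapi_response = FakeStreamingResponse()
fastapi_response.headers.update({"x-apigee-request-id": "apigee-req-001"})

# Buggy path: a brand-new streaming response built with empty headers,
# so the custom header injected by the hook is silently dropped.
buggy = FakeStreamingResponse(headers={})
print("x-apigee-request-id" in buggy.headers)  # False

# Fixed path (mirrors the PR's fix): forward the accumulated headers
# when constructing the replacement response.
fixed = FakeStreamingResponse(headers=dict(fastapi_response.headers))
print(fixed.headers["x-apigee-request-id"])  # apigee-req-001
```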
…lures

When a streaming request triggered ModifyResponseException, custom headers set by _handle_modify_response_exception on fastapi_response were lost because a new StreamingResponse was returned without them.

Pass dict(fastapi_response.headers) to StreamingResponse/create_response in /chat/completions, /completions, and /v1/messages handlers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@greptileai review
Greptile Summary

This PR fixes the response headers hook so that `request_headers` is actually populated (it was previously always `None`) and adds the missing hook call on the guardrail (`ModifyResponseException`) path.
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/proxy/common_request_processing.py | Core fix: captures filtered request headers, passes them to all 3 hook call sites, and adds centralized _handle_modify_response_exception method. Well-structured changes. |
| litellm/proxy/proxy_server.py | Chat completions and text completions ModifyResponseException handlers now use centralized method and pass custom headers to StreamingResponse. |
| litellm/proxy/anthropic_endpoints/endpoints.py | Anthropic endpoint now uses centralized guardrail handler and passes fastapi_response.headers to streaming response instead of empty {}. |
| litellm/proxy/response_api_endpoints/endpoints.py | Responses endpoint now uses centralized _handle_modify_response_exception for guardrail exceptions, ensuring custom headers are injected. |
| docs/my-website/docs/proxy/call_hooks.md | Updated docs with request_headers usage example and tip about endpoint coverage. Minor inaccuracy: says "original" headers but they are filtered. |
| tests/e2e_demo_response_headers_callback.py | E2E demo callback showing APIGEE request ID echoing. Not a test file — it's a CustomLogger implementation for manual testing. |
| tests/test_litellm/proxy/hooks/test_post_call_response_headers_hook.py | Two new mock-only tests validating request_headers forwarding through ProxyLogging.post_call_response_headers_hook. Good coverage. |
| tests/test_litellm/proxy/response_api_endpoints/test_response_headers_on_guardrail_exception.py | Integration test using TestClient with mocked auth and request processing to verify custom headers on guardrail failures in /responses. Fully mocked, no real network calls. |
Sequence Diagram
```mermaid
sequenceDiagram
participant Client
participant FastAPI as FastAPI Endpoint
participant Processor as ProxyBaseLLMRequestProcessing
participant Guardrail
participant ProxyLogging
participant Callback as CustomLogger Callback
Client->>FastAPI: HTTP Request (with headers)
FastAPI->>Processor: base_process_llm_request(request)
Processor->>Processor: _filter_sensitive_headers(request.headers)
Note over Processor: Strips authorization, cookie, proxy-authorization
alt Guardrail blocks request
Processor->>Guardrail: pre-call check
Guardrail-->>Processor: ModifyResponseException
Processor->>FastAPI: raises ModifyResponseException
FastAPI->>Processor: _handle_modify_response_exception(e)
Processor->>ProxyLogging: post_call_failure_hook()
Processor->>ProxyLogging: post_call_response_headers_hook(request_headers)
ProxyLogging->>Callback: async_post_call_response_headers_hook(request_headers)
Callback-->>ProxyLogging: custom headers (e.g., x-apigee-request-id)
ProxyLogging-->>Processor: merged headers
Processor->>FastAPI: fastapi_response.headers.update(headers)
FastAPI-->>Client: 200 + custom headers + violation message
else Success (non-streaming)
Processor->>ProxyLogging: post_call_response_headers_hook(request_headers)
ProxyLogging->>Callback: async_post_call_response_headers_hook(request_headers)
Callback-->>ProxyLogging: custom headers
ProxyLogging-->>Processor: merged headers
Processor->>FastAPI: fastapi_response.headers.update(headers)
FastAPI-->>Client: 200 + custom headers + response
else Success (streaming)
Processor->>ProxyLogging: post_call_response_headers_hook(request_headers)
ProxyLogging->>Callback: async_post_call_response_headers_hook(request_headers)
Callback-->>ProxyLogging: custom headers
ProxyLogging-->>Processor: merged headers
Processor-->>FastAPI: StreamingResponse(headers=custom_headers)
FastAPI-->>Client: SSE stream + custom headers
end
```
Last reviewed commit: 36717bd
Automated patch bundle from top-50 unresolved backlog scan. Generated due to limited direct branch-write access; please apply or cherry-pick the proposed minimal edits below.

## PR #21385 — Unresolved threads summary
Concrete patch proposal
…oss all endpoints

The response headers hook had 5 gaps that prevented callbacks from reliably extracting routing metadata across endpoint types:

1. Hook never fired for /audio/transcriptions (endpoint bypasses base_process_llm_request)
2. custom_llm_provider not accessible in hook data for any endpoint
3. custom_llm_provider not stamped in ResponsesAPIResponse._hidden_params (unlike chat completions)
4. model_info under inconsistent keys (metadata vs litellm_metadata)
5. request_headers always None at all call sites

This adds a litellm_call_info parameter to the hook that normalizes routing metadata (custom_llm_provider, model_info, api_base, model_id) regardless of endpoint type. Also stamps custom_llm_provider on Responses API responses, adds the hook call to the transcription handler, and passes request_headers at all call sites.

Supersedes PR BerriAI#21385.
…oss all endpoints (#22985)

* fix(proxy): make async_post_call_response_headers_hook consistent across all endpoints

  The response headers hook had 5 gaps that prevented callbacks from reliably extracting routing metadata across endpoint types:

  1. Hook never fired for /audio/transcriptions (endpoint bypasses base_process_llm_request)
  2. custom_llm_provider not accessible in hook data for any endpoint
  3. custom_llm_provider not stamped in ResponsesAPIResponse._hidden_params (unlike chat completions)
  4. model_info under inconsistent keys (metadata vs litellm_metadata)
  5. request_headers always None at all call sites

  This adds a litellm_call_info parameter to the hook that normalizes routing metadata (custom_llm_provider, model_info, api_base, model_id) regardless of endpoint type. Also stamps custom_llm_provider on Responses API responses, adds the hook call to the transcription handler, and passes request_headers at all call sites. Supersedes PR #21385.

* fix(proxy): address review feedback — safer backwards compat and None guards

  - Replace try/except TypeError with inspect.signature() check for litellm_call_info backwards compatibility. This avoids masking real TypeErrors inside callback implementations and prevents double invocation with inconsistent parameters.
  - Use (data.get("key") or {}) instead of data.get("key", {}) to guard against keys that exist with an explicit None value, which would cause AttributeError on the subsequent .get() call.

* fix(proxy): cache inspect.signature result for callback compat check

  Move the inspect.signature() call into a module-level helper with a dict cache keyed by callback identity. Avoids repeated introspection per request per callback in the hot path.

* fix(proxy): use class identity for signature cache key

  Key the _CALLBACK_ACCEPTS_CALL_INFO cache by id(type(cb)) instead of id(cb) to avoid stale entries from Python address reuse after GC. All instances of the same callback class share the same method signature, so class identity is both safer and more cache-efficient.
The async_post_call_response_headers_hook receives request_headers as a parameter but it was never populated — always None. This prevented use cases like echoing an API gateway request ID (e.g., APIGEE) from the incoming request into the response headers.
Changes:
Closes #19646
Relevant issues
Fixes #19646
Pre-Submission checklist
- Added testing in the `tests/litellm/` directory (adding at least 1 test is a hard requirement — see details)
- `make test-unit` passes
- Ran `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🐛 Bug Fix
📖 Documentation
✅ Test
Changes
Core fix: populate `request_headers` at all hook call sites

- `litellm/proxy/common_request_processing.py`
- Add `_request_headers` attribute to `ProxyBaseLLMRequestProcessing.__init__`
- Capture `dict(request.headers)` at the start of `base_process_llm_request`
- Pass `request_headers=self._request_headers` to all 3 `post_call_response_headers_hook` calls (streaming success, non-streaming success, failure)

Fix: ModifyResponseException guardrail gap in /responses

- `litellm/proxy/response_api_endpoints/endpoints.py`
- Add `post_call_response_headers_hook` call in the `ModifyResponseException` handler so custom headers are injected even when a guardrail blocks the request

Docs & demo

- `docs/my-website/docs/proxy/call_hooks.md`: update the `async_post_call_response_headers_hook` example to show `request_headers` usage (echoing gateway request IDs), with a tip covering `/responses`
- `e2e_demo_response_headers_callback.py` + `tests/e2e_demo_response_headers_callback.py`

Tests

- `tests/test_litellm/proxy/hooks/test_post_call_response_headers_hook.py`
  - `test_response_headers_hook_receives_request_headers` — verifies hook receives request_headers
  - `test_response_headers_hook_request_headers_passed_to_callback` — verifies callback can echo incoming headers
- `tests/test_litellm/proxy/response_api_endpoints/test_response_headers_on_guardrail_exception.py`
  - `test_modify_response_exception_calls_response_headers_hook` — verifies custom headers appear on guardrail failure in /responses