fix(proxy): pass request_headers to response headers hook + fix guardrail gap#21385
michelligabriele wants to merge 4 commits into BerriAI:main
Conversation
Greptile Summary

This PR fixes the response headers hook so that `request_headers` is actually populated (it was previously always `None`) and adds the missing hook call on the guardrail (`ModifyResponseException`) path.
Confidence Score: 3/5
| Filename | Overview |
|---|---|
| litellm/proxy/common_request_processing.py | Core fix: captures request headers and passes them to all 3 hook call sites. Correct implementation but exposes sensitive headers (Authorization, etc.) to callbacks without filtering. |
| litellm/proxy/response_api_endpoints/endpoints.py | Adds response headers hook to ModifyResponseException handler in /responses endpoint. Correct fix but inconsistent — same gap remains in /chat/completions, /completions, and /v1/messages endpoints. |
| docs/my-website/docs/proxy/call_hooks.md | Updated docs with request_headers usage example and tip about endpoint coverage. Tip claim may be inaccurate for guardrail exceptions on non-/responses endpoints. |
| e2e_demo_response_headers_callback.py | New E2E demo file at repo root showing request header echoing. Duplicated identically in tests/ directory. |
| tests/e2e_demo_response_headers_callback.py | E2E demo callback copy under tests/. Identical to root-level file — one copy should be removed. |
| tests/test_litellm/proxy/hooks/test_post_call_response_headers_hook.py | Two new unit tests for request_headers forwarding. Properly mocked, no real network calls. Tests verify headers reach the callback. |
| tests/test_litellm/proxy/response_api_endpoints/test_response_headers_on_guardrail_exception.py | New test verifying response headers hook fires on ModifyResponseException in /responses. Uses TestClient with mocks, no real network calls. |
Flowchart
```mermaid
flowchart TD
A[Incoming HTTP Request] --> B[base_process_llm_request]
B --> C["Capture request.headers → self._request_headers"]
C --> D{Route Request}
D -->|Streaming| E[Streaming Success Path]
D -->|Non-Streaming| F[Non-Streaming Success Path]
D -->|Error| G[Failure Path]
D -->|ModifyResponseException| H{Which Endpoint?}
E --> E1["post_call_response_headers_hook(request_headers=self._request_headers)"]
E1 --> E2[Inject headers into StreamingResponse]
F --> F1["post_call_response_headers_hook(request_headers=self._request_headers)"]
F1 --> F2[Inject headers into fastapi_response]
G --> G1["post_call_response_headers_hook(request_headers=self._request_headers)"]
G1 --> G2[Inject headers into error response]
H -->|/responses ✅ Fixed| I["post_call_response_headers_hook(request_headers=dict(request.headers))"]
I --> I1[Inject headers into guardrail response]
H -->|/chat/completions ❌| J[No hook call - gap remains]
H -->|/completions ❌| K[No hook call - gap remains]
H -->|/v1/messages ❌| L[No hook call - gap remains]
```
Last reviewed commit: 0ad9bac
From `litellm/proxy/common_request_processing.py`:

```diff
@@ -749,6 +750,8 @@ async def base_process_llm_request(
     """
     Common request processing logic for both chat completions and responses API endpoints
     """
+    self._request_headers = dict(request.headers)
```
Sensitive headers exposed to callbacks
dict(request.headers) captures all HTTP headers, including authorization, cookie, and other sensitive values. These are then passed unfiltered to every CustomLogger callback via request_headers. A misconfigured or third-party callback could inadvertently log or leak credentials.
Consider filtering out sensitive headers before storing them, or at minimum document this behavior clearly. For example:
```python
_sensitive_headers = {"authorization", "cookie", "proxy-authorization"}
self._request_headers = {
    k: v
    for k, v in request.headers.items()
    if k.lower() not in _sensitive_headers
}
```

From `e2e_demo_response_headers_callback.py`:

```python
"""
Demo CustomLogger that injects custom response headers.

Shows how to:
1. Echo an incoming request header (e.g., APIGEE request ID) into the response
2. Inject headers on both success and failure paths
3. Works for /chat/completions, /embeddings, and /responses

Usage:
litellm --config tests/e2e_demo_response_headers_config.yaml

Test commands:
# /chat/completions (non-streaming)
curl -s -D- http://localhost:4000/chat/completions \
  -H "Authorization: Bearer sk-1234" \
  -H "x-apigee-request-id: apigee-req-001" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}]}'

# /chat/completions (streaming)
curl -s -D- http://localhost:4000/chat/completions \
  -H "Authorization: Bearer sk-1234" \
  -H "x-apigee-request-id: apigee-req-002" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}],"stream":true}'

# /embeddings
curl -s -D- http://localhost:4000/embeddings \
  -H "Authorization: Bearer sk-1234" \
  -H "x-apigee-request-id: apigee-req-003" \
  -d '{"model":"text-embedding-3-small","input":"hello"}'

# /v1/responses (non-streaming)
curl -s -D- http://localhost:4000/v1/responses \
  -H "Authorization: Bearer sk-1234" \
  -H "x-apigee-request-id: apigee-req-004" \
  -d '{"model":"gpt-4o-mini","input":"hi"}'

# /v1/responses (streaming)
curl -s -D- http://localhost:4000/v1/responses \
  -H "Authorization: Bearer sk-1234" \
  -H "x-apigee-request-id: apigee-req-005" \
  -d '{"model":"gpt-4o-mini","input":"hi","stream":true}'

# Failure path (bad model → headers still injected)
curl -s -D- http://localhost:4000/chat/completions \
  -H "Authorization: Bearer sk-1234" \
  -H "x-apigee-request-id: apigee-req-006" \
  -d '{"model":"nonexistent-model","messages":[{"role":"user","content":"hi"}]}'

Expected: All responses contain x-apigee-request-id, x-custom-header, and x-litellm-hook-model.
"""

from typing import Any, Dict, Optional

from litellm.integrations.custom_logger import CustomLogger
from litellm.proxy._types import UserAPIKeyAuth


class ResponseHeaderInjector(CustomLogger):
    """
    Demonstrates injecting custom HTTP response headers via the proxy hook.

    Key features:
    - Echoes the incoming x-apigee-request-id header back in the response
    - Adds a static custom header and the model name
    - Works for success (streaming + non-streaming) and failure responses
    - Works for all endpoints: /chat/completions, /embeddings, /responses
    """

    async def async_post_call_response_headers_hook(
        self,
        data: dict,
        user_api_key_dict: UserAPIKeyAuth,
        response: Any,
        request_headers: Optional[Dict[str, str]] = None,
    ) -> Optional[Dict[str, str]]:
        headers: Dict[str, str] = {
            "x-custom-header": "hello-from-hook",
            "x-litellm-hook-model": data.get("model", "unknown"),
        }

        # Echo the APIGEE request ID from the incoming request into the response
        if request_headers:
            apigee_id = request_headers.get("x-apigee-request-id")
            if apigee_id:
                headers["x-apigee-request-id"] = apigee_id

        return headers


response_header_injector = ResponseHeaderInjector()
```
Duplicate file at repo root
This file is identical to tests/e2e_demo_response_headers_callback.py. Having the same file in two places creates a maintenance burden. Consider keeping only the tests/ copy and removing this root-level file, or if it's intended to be a user-facing demo, place it under docs/ or reference it from a single location.
From `litellm/proxy/response_api_endpoints/endpoints.py`:

```python
# Call response headers hook for guardrail failure path
callback_headers = await proxy_logging_obj.post_call_response_headers_hook(
    data=_data,
    user_api_key_dict=user_api_key_dict,
    response=None,
    request_headers=dict(request.headers),
)
if callback_headers:
    fastapi_response.headers.update(callback_headers)
```
Inconsistent fix — other ModifyResponseException handlers still lack this hook
This PR adds post_call_response_headers_hook to the ModifyResponseException handler for /responses, but the same gap exists in:
- `proxy_server.py` line ~6194 (`/chat/completions` handler)
- `proxy_server.py` line ~6358 (`/completions` text completions handler)
- `anthropic_endpoints/endpoints.py` line ~72 (`/v1/messages` handler)
If the intention is that custom response headers should be injected on guardrail failures for all endpoints, those handlers need the same treatment. The docs tip at line 443 of call_hooks.md states this hook "works for all proxy endpoints" including failure responses, which would be inaccurate for these other endpoints on guardrail exceptions.
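A self-contained sketch (hypothetical helper names, not LiteLLM's actual handler code) of the pattern the review asks to replicate in those handlers: on a guardrail exception, invoke each callback's response headers hook with the request headers and merge whatever it returns onto the outgoing response.

```python
import asyncio


class EchoCallback:
    """Toy callback echoing a gateway request ID, mirroring the demo above."""

    async def async_post_call_response_headers_hook(
        self, data, user_api_key_dict, response, request_headers=None
    ):
        if request_headers and "x-apigee-request-id" in request_headers:
            return {"x-apigee-request-id": request_headers["x-apigee-request-id"]}
        return None


async def handle_guardrail_exception(callbacks, data, request_headers, response_headers):
    # Ask each callback for extra headers and merge them onto the response,
    # the same merge the ModifyResponseException handlers need to perform.
    for cb in callbacks:
        extra = await cb.async_post_call_response_headers_hook(
            data=data,
            user_api_key_dict=None,
            response=None,
            request_headers=request_headers,
        )
        if extra:
            response_headers.update(extra)
    return response_headers


merged = asyncio.run(
    handle_guardrail_exception(
        [EchoCallback()],
        data={"model": "gpt-4o-mini"},
        request_headers={"x-apigee-request-id": "apigee-req-001"},
        response_headers={},
    )
)
print(merged)  # {'x-apigee-request-id': 'apigee-req-001'}
```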
@greptileai review
Greptile Summary

This PR fixes the response headers hook so that `request_headers` is actually populated (it was previously always `None`) and adds the missing hook call on the guardrail (`ModifyResponseException`) path.
Confidence Score: 2/5
| Filename | Overview |
|---|---|
| litellm/proxy/common_request_processing.py | Core fix to populate request_headers at all hook call sites. Adds _SENSITIVE_HEADERS filtering (addressing previous review feedback) and passes request_headers to all 3 hook sites. Has a missing Dict import that will cause a NameError at runtime. |
| litellm/proxy/response_api_endpoints/endpoints.py | Adds post_call_response_headers_hook call in the ModifyResponseException handler for /responses. Uses _filter_sensitive_headers consistently with other endpoints. Implementation is correct. |
| litellm/proxy/proxy_server.py | Adds post_call_response_headers_hook to ModifyResponseException handlers in both /chat/completions and /completions endpoints. Consistent pattern with other endpoints. |
| litellm/proxy/anthropic_endpoints/endpoints.py | Adds post_call_response_headers_hook to the ModifyResponseException handler for /v1/messages. Same pattern as other endpoints. |
| docs/my-website/docs/proxy/call_hooks.md | Updated docs to show request_headers usage, removed unnecessary __init__, and added tip about endpoint coverage. Documentation is clear and accurate. |
| tests/test_litellm/proxy/hooks/test_post_call_response_headers_hook.py | Added 2 new mock-only tests verifying request_headers forwarding to callbacks. Tests are well-structured and don't make real network calls. |
| tests/test_litellm/proxy/response_api_endpoints/test_response_headers_on_guardrail_exception.py | New test verifying custom headers appear on guardrail failure in /responses. Uses TestClient with mocks — no real network calls. Properly patches auth and processor. |
| tests/e2e_demo_response_headers_callback.py | New E2E demo showing APIGEE request ID echoing. Well-documented with usage instructions. Purely demonstrative, no issues. |
Flowchart
```mermaid
flowchart TD
A[Incoming HTTP Request] --> B[ProxyBaseLLMRequestProcessing.__init__]
B --> C[base_process_llm_request]
C --> D["_filter_sensitive_headers(request.headers)"]
D --> E[Store as self._request_headers]
E --> F{Request outcome?}
F -->|Streaming Success| G["post_call_response_headers_hook(request_headers=self._request_headers)"]
G --> H[Merge into streaming custom_headers]
F -->|Non-Streaming Success| I["post_call_response_headers_hook(request_headers=self._request_headers)"]
I --> J[Update fastapi_response.headers]
F -->|Failure| K["post_call_response_headers_hook(request_headers=self._request_headers)"]
K --> L[Include in error response headers]
F -->|ModifyResponseException| M["_filter_sensitive_headers(request.headers) — called directly"]
M --> N["post_call_response_headers_hook(request_headers=filtered)"]
N --> O[Update fastapi_response.headers]
style M fill:#f9f,stroke:#333
style D fill:#bbf,stroke:#333
```
Last reviewed commit: 1f22984
From `ProxyBaseLLMRequestProcessing.__init__`:

```diff
 def __init__(self, data: dict):
     self.data = data
+    self._request_headers: Optional[Dict[str, str]] = None
```
Missing Dict import causes NameError
Dict is used in the type annotation here but is not imported from typing (the module imports TYPE_CHECKING, Any, AsyncGenerator, Callable, Literal, Optional, Tuple, and Union — but not Dict). Without `from __future__ import annotations`, annotations on attribute targets such as `self._request_headers` are evaluated at runtime (PEP 526), so calling `ProxyBaseLLMRequestProcessing.__init__` will raise `NameError: name 'Dict' is not defined`.
Suggested change:

```diff
- self._request_headers: Optional[Dict[str, str]] = None
+ self._request_headers: Optional[dict] = None
```
Alternatively, add Dict to the typing imports at line 6.
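A minimal stand-alone illustration of the annotation behavior this comment relies on (PEP 526): annotations on simple local names are never evaluated at runtime, while annotations on attribute targets are (on CPython versions with eager annotation evaluation), which is why the unimported `Dict` can raise at call time. `UndefinedType` and `Demo` are illustrative names.

```python
class Demo:
    def set_local(self):
        # Annotation on a simple local name: never evaluated at runtime,
        # so an undefined annotation name here is harmless (PEP 526).
        x: UndefinedType = 1  # noqa: F821
        return x

    def set_attr(self):
        # Annotation on an attribute target is evaluated when the method
        # runs, so an undefined name raises NameError here.
        self.y: UndefinedType = 1  # noqa: F821


d = Demo()
print(d.set_local())  # annotation ignored, returns 1

try:
    d.set_attr()
    print("no error (annotations not evaluated on this Python version)")
except NameError as exc:
    print("NameError:", exc)
```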
Force-pushed from 1f22984 to bd30289 (compare)
@greptileai review
Greptile Summary

This PR fixes the response headers hook so that `request_headers` is actually populated (it was previously always `None`) and adds the missing hook call on the guardrail (`ModifyResponseException`) path.
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/proxy/common_request_processing.py | Core fix: adds _filter_sensitive_headers static method and _request_headers instance attribute; captures filtered headers in base_process_llm_request and passes them to all three hook call sites (streaming success, non-streaming success, failure). |
| litellm/proxy/proxy_server.py | Adds post_call_response_headers_hook call to both /chat/completions and /completions ModifyResponseException handlers, using _filter_sensitive_headers to strip credentials before passing to callbacks. |
| litellm/proxy/response_api_endpoints/endpoints.py | Adds post_call_response_headers_hook call to /responses ModifyResponseException handler with filtered headers, closing the guardrail gap for the responses API. |
| litellm/proxy/anthropic_endpoints/endpoints.py | Adds post_call_response_headers_hook call to /v1/messages ModifyResponseException handler with filtered headers, ensuring consistency with other endpoints. |
| docs/my-website/docs/proxy/call_hooks.md | Updates hook example to show request_headers usage for echoing gateway IDs; adds tip about endpoint coverage. Doc says "original HTTP request headers" but sensitive headers are filtered — slightly misleading. |
| tests/e2e_demo_response_headers_callback.py | Demo callback showing APIGEE request ID echoing. Not a test file — serves as a reference implementation for documentation purposes. |
| tests/test_litellm/proxy/hooks/test_post_call_response_headers_hook.py | Adds two well-structured mock-only tests verifying request_headers forwarding to callbacks. Tests are properly isolated with no network calls. |
| tests/test_litellm/proxy/response_api_endpoints/test_response_headers_on_guardrail_exception.py | Mock-only test verifying the guardrail exception path in /responses invokes the response headers hook. Uses TestClient with mocked auth and processor — no real network calls. |
Flowchart
```mermaid
flowchart TD
A[Incoming HTTP Request] --> B[FastAPI Endpoint]
B --> C{_filter_sensitive_headers}
C -->|Remove auth, cookie, proxy-auth| D[Filtered request_headers]
D --> E[base_process_llm_request]
E --> F{Response Path}
F -->|Streaming Success| G[post_call_response_headers_hook\nwith request_headers]
F -->|Non-Streaming Success| H[post_call_response_headers_hook\nwith request_headers]
F -->|Failure| I[post_call_response_headers_hook\nwith request_headers]
F -->|ModifyResponseException\nGuardrail Block| J[post_call_response_headers_hook\nwith _filter_sensitive_headers]
G --> K[Custom headers injected into response]
H --> K
I --> K
J --> K
K --> L[HTTP Response with custom headers]
```
Last reviewed commit: bd30289
From `docs/my-website/docs/proxy/call_hooks.md`:

```markdown
:::tip
This hook works for **all proxy endpoints**: `/chat/completions`, `/embeddings`, `/responses` (streaming and non-streaming), and failure responses.

The `request_headers` parameter contains the original HTTP request headers, allowing you to echo incoming headers (e.g., API gateway request IDs) into the response.
```
Docs say "original" but headers are filtered
The text says request_headers contains "the original HTTP request headers," but _filter_sensitive_headers strips authorization, cookie, and proxy-authorization before passing them to callbacks. Consider updating the wording to note this filtering, e.g.:
"The request_headers parameter contains the HTTP request headers (with sensitive headers like authorization and cookie removed for security)..."
This helps callback authors understand they won't have access to auth headers through this parameter.
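A minimal sketch of the filtering behavior being discussed (the helper name mirrors the PR's `_filter_sensitive_headers`, but this is an illustration, not the actual implementation):

```python
# Header names stripped before request headers reach callbacks.
SENSITIVE_HEADERS = {"authorization", "cookie", "proxy-authorization"}


def filter_sensitive_headers(headers: dict) -> dict:
    """Return a copy of headers with credential-bearing keys removed."""
    return {k: v for k, v in headers.items() if k.lower() not in SENSITIVE_HEADERS}


incoming = {
    "Authorization": "Bearer sk-1234",
    "Cookie": "session=abc",
    "x-apigee-request-id": "apigee-req-001",
}
print(filter_sensitive_headers(incoming))
# only x-apigee-request-id survives; the match is case-insensitive
```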
…rail gap

The async_post_call_response_headers_hook receives request_headers as a parameter but it was never populated — always None. This prevented use cases like echoing an API gateway request ID (e.g., APIGEE) from the incoming request into the response headers.

Changes:
- Capture request headers from the FastAPI Request object at the start of base_process_llm_request and pass them to all three hook call sites (streaming success, non-streaming success, failure)
- Add missing post_call_response_headers_hook call in the /responses endpoint's ModifyResponseException handler, so custom headers are injected even when a guardrail blocks the request
- Update E2E demo to show APIGEE request ID echoing across all endpoints
- Update docs with request_headers usage example
- Add unit tests for request_headers forwarding and guardrail exception path

Closes BerriAI#19646
…ints

Address code review feedback:

1. Filter sensitive request headers (authorization, cookie, proxy-authorization) before passing to callbacks to prevent credential leaks in third-party loggers.
2. Add post_call_response_headers_hook to ModifyResponseException handlers in all four endpoints (/chat/completions, /completions, /v1/messages, /responses) — previously only /responses was covered.
3. Remove duplicate e2e demo file from repo root (canonical copy lives in tests/).
Move post_call_failure_hook + post_call_response_headers_hook calls from 4 separate endpoint handlers into a single _handle_modify_response_exception method on ProxyBaseLLMRequestProcessing, mirroring the existing _handle_llm_api_exception pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from bd30289 to 6318692 (compare)
@greptileai review
Greptile Summary

This PR fixes the response headers hook so that `request_headers` is actually populated (it was previously always `None`) and adds the missing hook call on the guardrail (`ModifyResponseException`) path.
Confidence Score: 3/5
| Filename | Overview |
|---|---|
| litellm/proxy/common_request_processing.py | Core change: adds _filter_sensitive_headers, captures request headers at start of base_process_llm_request, passes them to all hook call sites, and centralizes ModifyResponseException handling in _handle_modify_response_exception. Implementation is clean and consistent. |
| litellm/proxy/proxy_server.py | Replaces inline failure hook with centralized _handle_modify_response_exception in both /chat/completions and /completions. Custom headers set on fastapi_response are lost when a new StreamingResponse is returned in the streaming guardrail path. |
| litellm/proxy/anthropic_endpoints/endpoints.py | Refactored to use centralized _handle_modify_response_exception. Same streaming header-loss issue: create_response at line 111 is called with headers={} instead of including the custom callback headers. |
| litellm/proxy/response_api_endpoints/endpoints.py | Refactored to use centralized _handle_modify_response_exception. Non-streaming path works correctly since it returns a model object and fastapi_response headers are preserved by FastAPI. |
| tests/test_litellm/proxy/hooks/test_post_call_response_headers_hook.py | Adds two new mock-only tests verifying request_headers are forwarded to the callback hook. Tests are correctly structured and don't make real network calls. |
| tests/test_litellm/proxy/response_api_endpoints/test_response_headers_on_guardrail_exception.py | New test using FastAPI TestClient that verifies custom headers appear on guardrail failure for /v1/responses. Uses appropriate mocking to avoid real network calls. |
| docs/my-website/docs/proxy/call_hooks.md | Updated docs to show request_headers usage with gateway ID echoing example. Added tip about all-endpoint support. |
| tests/e2e_demo_response_headers_callback.py | New E2E demo callback showing how to echo APIGEE request IDs via the response headers hook. Documentation/demo only file. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Incoming Request] --> B[base_process_llm_request]
B --> C{Capture request_headers\nfilter sensitive headers}
C --> D[Route to LLM]
D --> E{Response Type?}
E -->|Streaming Success| F[post_call_response_headers_hook\nwith request_headers]
E -->|Non-Streaming Success| G[post_call_response_headers_hook\nwith request_headers]
E -->|Exception| H{Exception Type?}
H -->|ModifyResponseException| I[_handle_modify_response_exception]
I --> J[post_call_failure_hook]
J --> K[post_call_response_headers_hook\nwith request_headers]
K --> L{Streaming?}
L -->|Yes| M["⚠️ New StreamingResponse\n(headers lost)"]
L -->|No| N[Headers on fastapi_response ✓]
H -->|Other Exception| O[_handle_llm_api_exception]
O --> P[post_call_response_headers_hook\nwith request_headers]
F --> Q[Headers in custom_headers dict ✓]
G --> R[Headers on fastapi_response ✓]
```
Last reviewed commit: 6318692
Additional Comments (2)
When a streaming request triggers a ModifyResponseException, the custom headers set on fastapi_response by _handle_modify_response_exception are lost, because a new StreamingResponse is constructed and returned without them. The same issue exists in the /v1/messages handler, where create_response is called with headers={}. To fix this, the custom headers need to be explicitly passed to the new StreamingResponse.

Consider forwarding the accumulated fastapi_response.headers (e.g. dict(fastapi_response.headers)) when constructing the replacement streaming response.
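A dependency-free stand-in for this bug (plain classes instead of FastAPI/Starlette responses, so the names are illustrative): constructing a fresh response without copying the accumulated headers drops what the hook injected, while forwarding `dict(fastapi_response.headers)` preserves it.

```python
class FakeStreamingResponse:
    """Minimal stand-in for a Starlette response carrying headers."""

    def __init__(self, headers=None):
        self.headers = dict(headers or {})


# The guardrail handler has already merged the callback's custom headers
# onto the original response object.
fastapi_response = FakeStreamingResponse()
fastapi_response.headers.update({"x-apigee-request-id": "apigee-req-001"})

# Buggy path: a brand-new streaming response built with empty headers,
# so the custom header injected by the hook is silently dropped.
buggy = FakeStreamingResponse(headers={})
print("x-apigee-request-id" in buggy.headers)  # False

# Fixed path (mirrors the PR's fix): forward the accumulated headers
# when constructing the replacement response.
fixed = FakeStreamingResponse(headers=dict(fastapi_response.headers))
print(fixed.headers["x-apigee-request-id"])  # apigee-req-001
```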
…lures

When a streaming request triggered ModifyResponseException, custom headers set by _handle_modify_response_exception on fastapi_response were lost because a new StreamingResponse was returned without them.

Pass dict(fastapi_response.headers) to StreamingResponse/create_response in /chat/completions, /completions, and /v1/messages handlers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@greptileai review
Greptile Summary

This PR fixes the response headers hook so that `request_headers` is actually populated (it was previously always `None`) and adds the missing hook call on the guardrail (`ModifyResponseException`) path.
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/proxy/common_request_processing.py | Core fix: captures filtered request headers, passes them to all 3 hook call sites, and adds centralized _handle_modify_response_exception method. Well-structured changes. |
| litellm/proxy/proxy_server.py | Chat completions and text completions ModifyResponseException handlers now use centralized method and pass custom headers to StreamingResponse. |
| litellm/proxy/anthropic_endpoints/endpoints.py | Anthropic endpoint now uses centralized guardrail handler and passes fastapi_response.headers to streaming response instead of empty {}. |
| litellm/proxy/response_api_endpoints/endpoints.py | Responses endpoint now uses centralized _handle_modify_response_exception for guardrail exceptions, ensuring custom headers are injected. |
| docs/my-website/docs/proxy/call_hooks.md | Updated docs with request_headers usage example and tip about endpoint coverage. Minor inaccuracy: says "original" headers but they are filtered. |
| tests/e2e_demo_response_headers_callback.py | E2E demo callback showing APIGEE request ID echoing. Not a test file — it's a CustomLogger implementation for manual testing. |
| tests/test_litellm/proxy/hooks/test_post_call_response_headers_hook.py | Two new mock-only tests validating request_headers forwarding through ProxyLogging.post_call_response_headers_hook. Good coverage. |
| tests/test_litellm/proxy/response_api_endpoints/test_response_headers_on_guardrail_exception.py | Integration test using TestClient with mocked auth and request processing to verify custom headers on guardrail failures in /responses. Fully mocked, no real network calls. |
Sequence Diagram
```mermaid
sequenceDiagram
participant Client
participant FastAPI as FastAPI Endpoint
participant Processor as ProxyBaseLLMRequestProcessing
participant Guardrail
participant ProxyLogging
participant Callback as CustomLogger Callback
Client->>FastAPI: HTTP Request (with headers)
FastAPI->>Processor: base_process_llm_request(request)
Processor->>Processor: _filter_sensitive_headers(request.headers)
Note over Processor: Strips authorization, cookie, proxy-authorization
alt Guardrail blocks request
Processor->>Guardrail: pre-call check
Guardrail-->>Processor: ModifyResponseException
Processor->>FastAPI: raises ModifyResponseException
FastAPI->>Processor: _handle_modify_response_exception(e)
Processor->>ProxyLogging: post_call_failure_hook()
Processor->>ProxyLogging: post_call_response_headers_hook(request_headers)
ProxyLogging->>Callback: async_post_call_response_headers_hook(request_headers)
Callback-->>ProxyLogging: custom headers (e.g., x-apigee-request-id)
ProxyLogging-->>Processor: merged headers
Processor->>FastAPI: fastapi_response.headers.update(headers)
FastAPI-->>Client: 200 + custom headers + violation message
else Success (non-streaming)
Processor->>ProxyLogging: post_call_response_headers_hook(request_headers)
ProxyLogging->>Callback: async_post_call_response_headers_hook(request_headers)
Callback-->>ProxyLogging: custom headers
ProxyLogging-->>Processor: merged headers
Processor->>FastAPI: fastapi_response.headers.update(headers)
FastAPI-->>Client: 200 + custom headers + response
else Success (streaming)
Processor->>ProxyLogging: post_call_response_headers_hook(request_headers)
ProxyLogging->>Callback: async_post_call_response_headers_hook(request_headers)
Callback-->>ProxyLogging: custom headers
ProxyLogging-->>Processor: merged headers
Processor-->>FastAPI: StreamingResponse(headers=custom_headers)
FastAPI-->>Client: SSE stream + custom headers
end
```
Last reviewed commit: 36717bd
Automated patch bundle from top-50 unresolved backlog scan. Generated due to limited direct branch-write access; please apply or cherry-pick the proposed minimal edits below.

## PR #21385 — Unresolved threads summary
Concrete patch proposal
…oss all endpoints

The response headers hook had 5 gaps that prevented callbacks from reliably extracting routing metadata across endpoint types:

1. Hook never fired for /audio/transcriptions (endpoint bypasses base_process_llm_request)
2. custom_llm_provider not accessible in hook data for any endpoint
3. custom_llm_provider not stamped in ResponsesAPIResponse._hidden_params (unlike chat completions)
4. model_info under inconsistent keys (metadata vs litellm_metadata)
5. request_headers always None at all call sites

This adds a litellm_call_info parameter to the hook that normalizes routing metadata (custom_llm_provider, model_info, api_base, model_id) regardless of endpoint type. Also stamps custom_llm_provider on Responses API responses, adds the hook call to the transcription handler, and passes request_headers at all call sites.

Supersedes PR BerriAI#21385.
…oss all endpoints (#22985)

* fix(proxy): make async_post_call_response_headers_hook consistent across all endpoints

  The response headers hook had 5 gaps that prevented callbacks from reliably extracting routing metadata across endpoint types:

  1. Hook never fired for /audio/transcriptions (endpoint bypasses base_process_llm_request)
  2. custom_llm_provider not accessible in hook data for any endpoint
  3. custom_llm_provider not stamped in ResponsesAPIResponse._hidden_params (unlike chat completions)
  4. model_info under inconsistent keys (metadata vs litellm_metadata)
  5. request_headers always None at all call sites

  This adds a litellm_call_info parameter to the hook that normalizes routing metadata (custom_llm_provider, model_info, api_base, model_id) regardless of endpoint type. Also stamps custom_llm_provider on Responses API responses, adds the hook call to the transcription handler, and passes request_headers at all call sites. Supersedes PR #21385.

* fix(proxy): address review feedback — safer backwards compat and None guards

  - Replace try/except TypeError with inspect.signature() check for litellm_call_info backwards compatibility. This avoids masking real TypeErrors inside callback implementations and prevents double invocation with inconsistent parameters.
  - Use (data.get("key") or {}) instead of data.get("key", {}) to guard against keys that exist with an explicit None value, which would cause AttributeError on the subsequent .get() call.

* fix(proxy): cache inspect.signature result for callback compat check

  Move the inspect.signature() call into a module-level helper with a dict cache keyed by callback identity. Avoids repeated introspection per request per callback in the hot path.

* fix(proxy): use class identity for signature cache key

  Key the _CALLBACK_ACCEPTS_CALL_INFO cache by id(type(cb)) instead of id(cb) to avoid stale entries from Python address reuse after GC. All instances of the same callback class share the same method signature, so class identity is both safer and more cache-efficient.
The async_post_call_response_headers_hook receives request_headers as a parameter but it was never populated — always None. This prevented use cases like echoing an API gateway request ID (e.g., APIGEE) from the incoming request into the response headers.
Changes:
Closes #19646
Relevant issues
Fixes #19646
Pre-Submission checklist
- Added testing in the `tests/litellm/` directory (adding at least 1 test is a hard requirement — see details)
- `make test-unit` passes
- Ran `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🐛 Bug Fix
📖 Documentation
✅ Test
Changes
Core fix: populate `request_headers` at all hook call sites

- `litellm/proxy/common_request_processing.py`
- Add `_request_headers` attribute to `ProxyBaseLLMRequestProcessing.__init__`
- Capture `dict(request.headers)` at the start of `base_process_llm_request`
- Pass `request_headers=self._request_headers` to all 3 `post_call_response_headers_hook` calls (streaming success, non-streaming success, failure)

Fix: ModifyResponseException guardrail gap in /responses

- `litellm/proxy/response_api_endpoints/endpoints.py`
- Add `post_call_response_headers_hook` call in the `ModifyResponseException` handler so custom headers are injected even when a guardrail blocks the request

Docs & demo

- `docs/my-website/docs/proxy/call_hooks.md`: update the `async_post_call_response_headers_hook` example to show `request_headers` usage (echoing gateway request IDs), with a tip covering `/responses`
- `e2e_demo_response_headers_callback.py` + `tests/e2e_demo_response_headers_callback.py`

Tests

- `tests/test_litellm/proxy/hooks/test_post_call_response_headers_hook.py`
  - `test_response_headers_hook_receives_request_headers` — verifies hook receives request_headers
  - `test_response_headers_hook_request_headers_passed_to_callback` — verifies callback can echo incoming headers
- `tests/test_litellm/proxy/response_api_endpoints/test_response_headers_on_guardrail_exception.py`
  - `test_modify_response_exception_calls_response_headers_hook` — verifies custom headers appear on guardrail failure in /responses