
fix(proxy): pass request_headers to response headers hook + fix guardrail gap #21385

Open
michelligabriele wants to merge 4 commits into BerriAI:main from michelligabriele:fix/response-headers-hook-request-headers

Conversation

@michelligabriele
Collaborator

The async_post_call_response_headers_hook receives request_headers as a parameter but it was never populated — always None. This prevented use cases like echoing an API gateway request ID (e.g., APIGEE) from the incoming request into the response headers.

Changes:

  • Capture request headers from the FastAPI Request object at the start of base_process_llm_request and pass them to all three hook call sites (streaming success, non-streaming success, failure)
  • Add missing post_call_response_headers_hook call in the /responses endpoint's ModifyResponseException handler, so custom headers are injected even when a guardrail blocks the request
  • Update E2E demo to show APIGEE request ID echoing across all endpoints
  • Update docs with request_headers usage example
  • Add unit tests for request_headers forwarding and guardrail exception path

Closes #19646

Relevant issues

Fixes #19646

Pre-Submission checklist

  • I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement - see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🐛 Bug Fix
📖 Documentation
✅ Test

Changes

Core fix: populate request_headers at all hook call sites

litellm/proxy/common_request_processing.py

  • Added _request_headers attribute to ProxyBaseLLMRequestProcessing.__init__
  • Capture dict(request.headers) at the start of base_process_llm_request
  • Pass request_headers=self._request_headers to all 3 post_call_response_headers_hook calls (streaming success, non-streaming success, failure)

Fix: ModifyResponseException guardrail gap in /responses

litellm/proxy/response_api_endpoints/endpoints.py

  • Added post_call_response_headers_hook call in the ModifyResponseException handler so custom headers are injected even when a guardrail blocks the request

Docs & demo

docs/my-website/docs/proxy/call_hooks.md

  • Updated async_post_call_response_headers_hook example to show request_headers usage (echoing gateway request IDs)
  • Added tip noting the hook works for all endpoints including /responses

e2e_demo_response_headers_callback.py + tests/e2e_demo_response_headers_callback.py

  • Updated demo to show APIGEE request ID echoing from incoming request headers

Tests

tests/test_litellm/proxy/hooks/test_post_call_response_headers_hook.py

  • test_response_headers_hook_receives_request_headers — verifies hook receives request_headers
  • test_response_headers_hook_request_headers_passed_to_callback — verifies callback can echo incoming headers

tests/test_litellm/proxy/response_api_endpoints/test_response_headers_on_guardrail_exception.py

  • test_modify_response_exception_calls_response_headers_hook — verifies custom headers appear on guardrail failure in /responses

@vercel

vercel bot commented Feb 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
litellm | Ready | Preview, Comment | Feb 18, 2026 1:58pm


@greptile-apps
Contributor

greptile-apps bot commented Feb 17, 2026

Greptile Summary

This PR fixes the async_post_call_response_headers_hook to actually receive the incoming HTTP request headers (previously always None) and adds the missing hook call in the /responses endpoint's ModifyResponseException handler.

  • Core fix: Captures dict(request.headers) at the start of base_process_llm_request and passes it as request_headers to all three hook call sites (streaming success, non-streaming success, failure) in ProxyBaseLLMRequestProcessing
  • Guardrail gap fix: Adds post_call_response_headers_hook call in the /responses endpoint's ModifyResponseException handler so custom headers are injected even when a guardrail blocks the request
  • Concern — sensitive header exposure: dict(request.headers) includes authorization, cookie, and other sensitive headers that are passed unfiltered to all CustomLogger callbacks. Consider filtering sensitive headers before passing them to callbacks.
  • Inconsistency: The same ModifyResponseException guardrail gap still exists in /chat/completions, /completions, and /v1/messages endpoint handlers in proxy_server.py and anthropic_endpoints/endpoints.py. The docs claim this hook works for "all proxy endpoints" on failure responses, which is not fully accurate.
  • Duplicate demo file: e2e_demo_response_headers_callback.py exists identically at both the repo root and under tests/

Confidence Score: 3/5

  • The core fix is correct and low-risk, but passing unfiltered request headers (including Authorization) to callbacks is a security concern, and the guardrail gap fix is incomplete across other endpoints.
  • Score of 3 reflects: (1) the request_headers fix itself is straightforward and well-tested, (2) but sensitive headers are exposed to callbacks without filtering, (3) the guardrail gap fix is only applied to /responses while the same gap remains in /chat/completions, /completions, and /v1/messages, making the docs claim inaccurate, and (4) there's a duplicated demo file.
  • Pay close attention to litellm/proxy/common_request_processing.py (sensitive header exposure) and litellm/proxy/response_api_endpoints/endpoints.py (inconsistent guardrail fix).

Important Files Changed

Filename Overview
litellm/proxy/common_request_processing.py Core fix: captures request headers and passes them to all 3 hook call sites. Correct implementation but exposes sensitive headers (Authorization, etc.) to callbacks without filtering.
litellm/proxy/response_api_endpoints/endpoints.py Adds response headers hook to ModifyResponseException handler in /responses endpoint. Correct fix but inconsistent — same gap remains in /chat/completions, /completions, and /v1/messages endpoints.
docs/my-website/docs/proxy/call_hooks.md Updated docs with request_headers usage example and tip about endpoint coverage. Tip claim may be inaccurate for guardrail exceptions on non-/responses endpoints.
e2e_demo_response_headers_callback.py New E2E demo file at repo root showing request header echoing. Duplicated identically in tests/ directory.
tests/e2e_demo_response_headers_callback.py E2E demo callback copy under tests/. Identical to root-level file — one copy should be removed.
tests/test_litellm/proxy/hooks/test_post_call_response_headers_hook.py Two new unit tests for request_headers forwarding. Properly mocked, no real network calls. Tests verify headers reach the callback.
tests/test_litellm/proxy/response_api_endpoints/test_response_headers_on_guardrail_exception.py New test verifying response headers hook fires on ModifyResponseException in /responses. Uses TestClient with mocks, no real network calls.

Flowchart

flowchart TD
    A[Incoming HTTP Request] --> B[base_process_llm_request]
    B --> C["Capture request.headers → self._request_headers"]
    C --> D{Route Request}
    D -->|Streaming| E[Streaming Success Path]
    D -->|Non-Streaming| F[Non-Streaming Success Path]
    D -->|Error| G[Failure Path]
    D -->|ModifyResponseException| H{Which Endpoint?}

    E --> E1["post_call_response_headers_hook(request_headers=self._request_headers)"]
    E1 --> E2[Inject headers into StreamingResponse]

    F --> F1["post_call_response_headers_hook(request_headers=self._request_headers)"]
    F1 --> F2[Inject headers into fastapi_response]

    G --> G1["post_call_response_headers_hook(request_headers=self._request_headers)"]
    G1 --> G2[Inject headers into error response]

    H -->|/responses ✅ Fixed| I["post_call_response_headers_hook(request_headers=dict(request.headers))"]
    I --> I1[Inject headers into guardrail response]

    H -->|/chat/completions ❌| J[No hook call - gap remains]
    H -->|/completions ❌| K[No hook call - gap remains]
    H -->|/v1/messages ❌| L[No hook call - gap remains]

Last reviewed commit: 0ad9bac

@greptile-apps greptile-apps bot left a comment

7 files reviewed, 3 comments

@@ -749,6 +750,8 @@ async def base_process_llm_request(
"""
Common request processing logic for both chat completions and responses API endpoints
"""
self._request_headers = dict(request.headers)

Sensitive headers exposed to callbacks

dict(request.headers) captures all HTTP headers, including authorization, cookie, and other sensitive values. These are then passed unfiltered to every CustomLogger callback via request_headers. A misconfigured or third-party callback could inadvertently log or leak credentials.

Consider filtering out sensitive headers before storing them, or at minimum document this behavior clearly. For example:

_sensitive_headers = {"authorization", "cookie", "proxy-authorization"}
self._request_headers = {
    k: v for k, v in request.headers.items()
    if k.lower() not in _sensitive_headers
}

Comment on lines +1 to +90
"""
Demo CustomLogger that injects custom response headers.

Shows how to:
1. Echo an incoming request header (e.g., APIGEE request ID) into the response
2. Inject headers on both success and failure paths
3. Works for /chat/completions, /embeddings, and /responses

Usage:
litellm --config tests/e2e_demo_response_headers_config.yaml

Test commands:
# /chat/completions (non-streaming)
curl -s -D- http://localhost:4000/chat/completions \
-H "Authorization: Bearer sk-1234" \
-H "x-apigee-request-id: apigee-req-001" \
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}]}'

# /chat/completions (streaming)
curl -s -D- http://localhost:4000/chat/completions \
-H "Authorization: Bearer sk-1234" \
-H "x-apigee-request-id: apigee-req-002" \
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}],"stream":true}'

# /embeddings
curl -s -D- http://localhost:4000/embeddings \
-H "Authorization: Bearer sk-1234" \
-H "x-apigee-request-id: apigee-req-003" \
-d '{"model":"text-embedding-3-small","input":"hello"}'

# /v1/responses (non-streaming)
curl -s -D- http://localhost:4000/v1/responses \
-H "Authorization: Bearer sk-1234" \
-H "x-apigee-request-id: apigee-req-004" \
-d '{"model":"gpt-4o-mini","input":"hi"}'

# /v1/responses (streaming)
curl -s -D- http://localhost:4000/v1/responses \
-H "Authorization: Bearer sk-1234" \
-H "x-apigee-request-id: apigee-req-005" \
-d '{"model":"gpt-4o-mini","input":"hi","stream":true}'

# Failure path (bad model → headers still injected)
curl -s -D- http://localhost:4000/chat/completions \
-H "Authorization: Bearer sk-1234" \
-H "x-apigee-request-id: apigee-req-006" \
-d '{"model":"nonexistent-model","messages":[{"role":"user","content":"hi"}]}'

Expected: All responses contain x-apigee-request-id, x-custom-header, and x-litellm-hook-model.
"""

from typing import Any, Dict, Optional

from litellm.integrations.custom_logger import CustomLogger
from litellm.proxy._types import UserAPIKeyAuth


class ResponseHeaderInjector(CustomLogger):
"""
Demonstrates injecting custom HTTP response headers via the proxy hook.

Key features:
- Echoes the incoming x-apigee-request-id header back in the response
- Adds a static custom header and the model name
- Works for success (streaming + non-streaming) and failure responses
- Works for all endpoints: /chat/completions, /embeddings, /responses
"""

async def async_post_call_response_headers_hook(
self,
data: dict,
user_api_key_dict: UserAPIKeyAuth,
response: Any,
request_headers: Optional[Dict[str, str]] = None,
) -> Optional[Dict[str, str]]:
headers: Dict[str, str] = {
"x-custom-header": "hello-from-hook",
"x-litellm-hook-model": data.get("model", "unknown"),
}

# Echo the APIGEE request ID from the incoming request into the response
if request_headers:
apigee_id = request_headers.get("x-apigee-request-id")
if apigee_id:
headers["x-apigee-request-id"] = apigee_id

return headers


response_header_injector = ResponseHeaderInjector()

Duplicate file at repo root

This file is identical to tests/e2e_demo_response_headers_callback.py. Having the same file in two places creates a maintenance burden. Consider keeping only the tests/ copy and removing this root-level file, or if it's intended to be a user-facing demo, place it under docs/ or reference it from a single location.


Comment on lines +229 to +237
# Call response headers hook for guardrail failure path
callback_headers = await proxy_logging_obj.post_call_response_headers_hook(
data=_data,
user_api_key_dict=user_api_key_dict,
response=None,
request_headers=dict(request.headers),
)
if callback_headers:
fastapi_response.headers.update(callback_headers)

Inconsistent fix — other ModifyResponseException handlers still lack this hook

This PR adds post_call_response_headers_hook to the ModifyResponseException handler for /responses, but the same gap exists in:

  • proxy_server.py line ~6194 (/chat/completions handler)
  • proxy_server.py line ~6358 (/completions text completions handler)
  • anthropic_endpoints/endpoints.py line ~72 (/v1/messages handler)

If the intention is that custom response headers should be injected on guardrail failures for all endpoints, those handlers need the same treatment. The docs tip at line 443 of call_hooks.md states this hook "works for all proxy endpoints" including failure responses, which would be inaccurate for these other endpoints on guardrail exceptions.

@michelligabriele
Collaborator Author

@greptileai review

@greptile-apps
Contributor

greptile-apps bot commented Feb 17, 2026

Greptile Summary

This PR fixes the async_post_call_response_headers_hook to actually receive the incoming HTTP request headers (previously always None), enabling use cases like echoing API gateway request IDs into response headers. It also adds the missing post_call_response_headers_hook call in all ModifyResponseException handlers across endpoints so custom headers are injected even when guardrails block requests.

  • Core fix: Captures filtered request headers at the start of base_process_llm_request and passes them to all three hook call sites (streaming success, non-streaming success, failure) in ProxyBaseLLMRequestProcessing
  • Guardrail gap fix: Adds post_call_response_headers_hook to ModifyResponseException handlers in /responses, /chat/completions, /completions, and /v1/messages endpoints
  • Security: Filters sensitive headers (authorization, cookie, proxy-authorization) before passing to callbacks via _filter_sensitive_headers
  • Bug: Missing Dict import in common_request_processing.py will cause a NameError at runtime — must be fixed before merge

Confidence Score: 2/5

  • This PR has a missing import that will cause a runtime NameError in the critical request path — must be fixed before merge.
  • The logic and approach are sound, but the missing Dict import in common_request_processing.py line 359 will cause a NameError when ProxyBaseLLMRequestProcessing.__init__ is called, breaking all proxy request processing. This is a blocking issue.
  • litellm/proxy/common_request_processing.py — missing Dict import causes runtime failure

Important Files Changed

Filename Overview
litellm/proxy/common_request_processing.py Core fix to populate request_headers at all hook call sites. Adds _SENSITIVE_HEADERS filtering (addressing previous review feedback) and passes request_headers to all 3 hook sites. Has a missing Dict import that will cause a NameError at runtime.
litellm/proxy/response_api_endpoints/endpoints.py Adds post_call_response_headers_hook call in the ModifyResponseException handler for /responses. Uses _filter_sensitive_headers consistently with other endpoints. Implementation is correct.
litellm/proxy/proxy_server.py Adds post_call_response_headers_hook to ModifyResponseException handlers in both /chat/completions and /completions endpoints. Consistent pattern with other endpoints.
litellm/proxy/anthropic_endpoints/endpoints.py Adds post_call_response_headers_hook to the ModifyResponseException handler for /v1/messages. Same pattern as other endpoints.
docs/my-website/docs/proxy/call_hooks.md Updated docs to show request_headers usage, removed unnecessary __init__, and added tip about endpoint coverage. Documentation is clear and accurate.
tests/test_litellm/proxy/hooks/test_post_call_response_headers_hook.py Added 2 new mock-only tests verifying request_headers forwarding to callbacks. Tests are well-structured and don't make real network calls.
tests/test_litellm/proxy/response_api_endpoints/test_response_headers_on_guardrail_exception.py New test verifying custom headers appear on guardrail failure in /responses. Uses TestClient with mocks — no real network calls. Properly patches auth and processor.
tests/e2e_demo_response_headers_callback.py New E2E demo showing APIGEE request ID echoing. Well-documented with usage instructions. Purely demonstrative, no issues.

Flowchart

flowchart TD
    A[Incoming HTTP Request] --> B[ProxyBaseLLMRequestProcessing.__init__]
    B --> C[base_process_llm_request]
    C --> D["_filter_sensitive_headers(request.headers)"]
    D --> E[Store as self._request_headers]
    
    E --> F{Request outcome?}
    
    F -->|Streaming Success| G["post_call_response_headers_hook(request_headers=self._request_headers)"]
    G --> H[Merge into streaming custom_headers]
    
    F -->|Non-Streaming Success| I["post_call_response_headers_hook(request_headers=self._request_headers)"]
    I --> J[Update fastapi_response.headers]
    
    F -->|Failure| K["post_call_response_headers_hook(request_headers=self._request_headers)"]
    K --> L[Include in error response headers]
    
    F -->|ModifyResponseException| M["_filter_sensitive_headers(request.headers) — called directly"]
    M --> N["post_call_response_headers_hook(request_headers=filtered)"]
    N --> O[Update fastapi_response.headers]

    style M fill:#f9f,stroke:#333
    style D fill:#bbf,stroke:#333

Last reviewed commit: 1f22984

@greptile-apps greptile-apps bot left a comment

8 files reviewed, 1 comment

def __init__(self, data: dict):
self.data = data
self._request_headers: Optional[Dict[str, str]] = None

Missing Dict import causes NameError

Dict is used in the type annotation here but is not imported from typing (lines 6-15 import TYPE_CHECKING, Any, AsyncGenerator, Callable, Literal, Optional, Tuple, Union — but not Dict). In Python without from __future__ import annotations, variable annotations in function bodies are evaluated at runtime. This will raise a NameError: name 'Dict' is not defined when ProxyBaseLLMRequestProcessing.__init__ is called.

Suggested change:

- self._request_headers: Optional[Dict[str, str]] = None
+ self._request_headers: Optional[dict] = None

Alternatively, add Dict to the typing imports at line 6.

@michelligabriele force-pushed the fix/response-headers-hook-request-headers branch from 1f22984 to bd30289 on February 17, 2026 at 18:14
@michelligabriele
Collaborator Author

@greptileai review

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 17, 2026

Greptile Summary

This PR fixes the async_post_call_response_headers_hook so that the request_headers parameter is actually populated (previously always None), and adds the missing hook call in ModifyResponseException handlers across all endpoints so custom headers are injected even when guardrails block requests.

  • Core fix: Captures filtered request headers via _filter_sensitive_headers in base_process_llm_request and passes them to all three hook call sites (streaming success, non-streaming success, failure) in ProxyBaseLLMRequestProcessing
  • Guardrail gap fix: Adds post_call_response_headers_hook calls to ModifyResponseException handlers in /chat/completions, /completions, /v1/messages, and /responses endpoints
  • Security: Introduces _SENSITIVE_HEADERS frozenset to strip authorization, cookie, and proxy-authorization before passing headers to callbacks
  • Tests: Adds mock-only unit tests verifying request_headers forwarding and the guardrail exception path — no real network calls
  • Docs: Minor inaccuracy — tip says "original HTTP request headers" but sensitive headers are actually filtered

Confidence Score: 4/5

  • This PR is safe to merge — it fills in a previously-always-None parameter and adds missing hook calls with proper sensitive header filtering.
  • The changes are well-scoped: they populate a parameter that was always None, add consistent hook calls across all ModifyResponseException handlers, and filter sensitive headers before passing to callbacks. Tests cover the core paths. The only concern is a minor docs inaccuracy (says "original" headers but sensitive ones are filtered).
  • Pay attention to docs/my-website/docs/proxy/call_hooks.md — the tip claims headers are "original" but they are actually filtered.

Important Files Changed

Filename Overview
litellm/proxy/common_request_processing.py Core fix: adds _filter_sensitive_headers static method and _request_headers instance attribute; captures filtered headers in base_process_llm_request and passes them to all three hook call sites (streaming success, non-streaming success, failure).
litellm/proxy/proxy_server.py Adds post_call_response_headers_hook call to both /chat/completions and /completions ModifyResponseException handlers, using _filter_sensitive_headers to strip credentials before passing to callbacks.
litellm/proxy/response_api_endpoints/endpoints.py Adds post_call_response_headers_hook call to /responses ModifyResponseException handler with filtered headers, closing the guardrail gap for the responses API.
litellm/proxy/anthropic_endpoints/endpoints.py Adds post_call_response_headers_hook call to /v1/messages ModifyResponseException handler with filtered headers, ensuring consistency with other endpoints.
docs/my-website/docs/proxy/call_hooks.md Updates hook example to show request_headers usage for echoing gateway IDs; adds tip about endpoint coverage. Doc says "original HTTP request headers" but sensitive headers are filtered — slightly misleading.
tests/e2e_demo_response_headers_callback.py Demo callback showing APIGEE request ID echoing. Not a test file — serves as a reference implementation for documentation purposes.
tests/test_litellm/proxy/hooks/test_post_call_response_headers_hook.py Adds two well-structured mock-only tests verifying request_headers forwarding to callbacks. Tests are properly isolated with no network calls.
tests/test_litellm/proxy/response_api_endpoints/test_response_headers_on_guardrail_exception.py Mock-only test verifying the guardrail exception path in /responses invokes the response headers hook. Uses TestClient with mocked auth and processor — no real network calls.

Flowchart

flowchart TD
    A[Incoming HTTP Request] --> B[FastAPI Endpoint]
    B --> C{_filter_sensitive_headers}
    C -->|Remove auth, cookie, proxy-auth| D[Filtered request_headers]
    D --> E[base_process_llm_request]
    E --> F{Response Path}
    F -->|Streaming Success| G[post_call_response_headers_hook\nwith request_headers]
    F -->|Non-Streaming Success| H[post_call_response_headers_hook\nwith request_headers]
    F -->|Failure| I[post_call_response_headers_hook\nwith request_headers]
    F -->|ModifyResponseException\nGuardrail Block| J[post_call_response_headers_hook\nwith _filter_sensitive_headers]
    G --> K[Custom headers injected into response]
    H --> K
    I --> K
    J --> K
    K --> L[HTTP Response with custom headers]

Last reviewed commit: bd30289

@greptile-apps greptile-apps bot left a comment

8 files reviewed, 1 comment

:::tip
This hook works for **all proxy endpoints**: `/chat/completions`, `/embeddings`, `/responses` (streaming and non-streaming), and failure responses.

The `request_headers` parameter contains the original HTTP request headers, allowing you to echo incoming headers (e.g., API gateway request IDs) into the response.

Docs say "original" but headers are filtered

The text says request_headers contains "the original HTTP request headers," but _filter_sensitive_headers strips authorization, cookie, and proxy-authorization before passing them to callbacks. Consider updating the wording to note this filtering, e.g.:

"The request_headers parameter contains the HTTP request headers (with sensitive headers like authorization and cookie removed for security)..."

This helps callback authors understand they won't have access to auth headers through this parameter.

michelligabriele and others added 3 commits February 18, 2026 14:21
…rail gap

The async_post_call_response_headers_hook receives request_headers as a
parameter but it was never populated — always None. This prevented use cases
like echoing an API gateway request ID (e.g., APIGEE) from the incoming
request into the response headers.

Changes:
- Capture request headers from the FastAPI Request object at the start of
  base_process_llm_request and pass them to all three hook call sites
  (streaming success, non-streaming success, failure)
- Add missing post_call_response_headers_hook call in the /responses
  endpoint's ModifyResponseException handler, so custom headers are
  injected even when a guardrail blocks the request
- Update E2E demo to show APIGEE request ID echoing across all endpoints
- Update docs with request_headers usage example
- Add unit tests for request_headers forwarding and guardrail exception path

Closes BerriAI#19646
…ints

Address code review feedback:

1. Filter sensitive request headers (authorization, cookie,
   proxy-authorization) before passing to callbacks to prevent
   credential leaks in third-party loggers.

2. Add post_call_response_headers_hook to ModifyResponseException
   handlers in all four endpoints (/chat/completions, /completions,
   /v1/messages, /responses) — previously only /responses was covered.

3. Remove duplicate e2e demo file from repo root (canonical copy
   lives in tests/).
Move post_call_failure_hook + post_call_response_headers_hook calls
from 4 separate endpoint handlers into a single
_handle_modify_response_exception method on ProxyBaseLLMRequestProcessing,
mirroring the existing _handle_llm_api_exception pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@michelligabriele force-pushed the fix/response-headers-hook-request-headers branch from bd30289 to 6318692 on February 18, 2026 at 13:46
@michelligabriele
Collaborator Author

@greptileai review

@greptile-apps
Contributor

greptile-apps bot commented Feb 18, 2026

Greptile Summary

This PR fixes the async_post_call_response_headers_hook to actually receive request_headers (previously always None) and centralizes ModifyResponseException handling so custom response headers are injected on guardrail failures.

  • Core fix: Captures filtered request headers at the start of base_process_llm_request and passes them to all three hook call sites (streaming success, non-streaming success, failure)
  • Centralized guardrail handling: New _handle_modify_response_exception method replaces duplicated inline logic across /chat/completions, /completions, /responses, and /v1/messages endpoints
  • Sensitive header filtering: Strips authorization, cookie, and proxy-authorization before passing headers to callbacks
  • Bug: Streaming guardrail responses lose custom headers: In /chat/completions, /completions, and /v1/messages, when a streaming request triggers a guardrail ModifyResponseException, headers are set on fastapi_response but a new StreamingResponse is returned — the custom headers are silently dropped. This affects proxy_server.py (lines 6228, 6399) and anthropic_endpoints/endpoints.py (line 111)
  • Tests and docs are well-structured; tests are mock-only as required
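The streaming header-loss bug described above comes down to constructing a fresh response object while the callback headers sit on a discarded one. A minimal sketch of the fix the reviewer is asking for, with plain dicts standing in for the FastAPI/Starlette response objects and `build_streaming_headers` as a hypothetical helper name:

```python
from typing import Dict, Optional


def build_streaming_headers(
    base_headers: Dict[str, str],
    callback_headers: Optional[Dict[str, str]],
) -> Dict[str, str]:
    """Merge hook-provided headers into the headers for a new response.

    When a guardrail raises ModifyResponseException on a streaming
    request, the handler constructs a fresh StreamingResponse; headers
    set on the original fastapi_response are discarded, so the callback
    headers must be merged into the new response's headers explicitly.
    """
    merged = dict(base_headers)
    if callback_headers:
        merged.update(callback_headers)
    return merged
```

The merged dict would then be passed as the `headers=` argument when the new streaming response is created, instead of the empty dict the reviewer flagged.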

Confidence Score: 3/5

  • The core logic is sound but custom headers are silently lost on streaming guardrail responses across three endpoints.
  • The non-streaming paths work correctly and the centralization of ModifyResponseException handling is a good refactor. However, the streaming guardrail paths in /chat/completions, /completions, and /v1/messages create new StreamingResponse objects that don't include the custom headers set on fastapi_response, meaning the fix is incomplete for streaming guardrail scenarios.
  • litellm/proxy/proxy_server.py (streaming ModifyResponseException handlers at lines 6228 and 6399), litellm/proxy/anthropic_endpoints/endpoints.py (streaming handler at line 111)

Important Files Changed

Filename Overview
litellm/proxy/common_request_processing.py Core change: adds _filter_sensitive_headers, captures request headers at start of base_process_llm_request, passes them to all hook call sites, and centralizes ModifyResponseException handling in _handle_modify_response_exception. Implementation is clean and consistent.
litellm/proxy/proxy_server.py Replaces inline failure hook with centralized _handle_modify_response_exception in both /chat/completions and /completions. Custom headers set on fastapi_response are lost when a new StreamingResponse is returned in the streaming guardrail path.
litellm/proxy/anthropic_endpoints/endpoints.py Refactored to use centralized _handle_modify_response_exception. Same streaming header-loss issue: create_response at line 111 is called with headers={} instead of including the custom callback headers.
litellm/proxy/response_api_endpoints/endpoints.py Refactored to use centralized _handle_modify_response_exception. Non-streaming path works correctly since it returns a model object and fastapi_response headers are preserved by FastAPI.
tests/test_litellm/proxy/hooks/test_post_call_response_headers_hook.py Adds two new mock-only tests verifying request_headers are forwarded to the callback hook. Tests are correctly structured and don't make real network calls.
tests/test_litellm/proxy/response_api_endpoints/test_response_headers_on_guardrail_exception.py New test using FastAPI TestClient that verifies custom headers appear on guardrail failure for /v1/responses. Uses appropriate mocking to avoid real network calls.
docs/my-website/docs/proxy/call_hooks.md Updated docs to show request_headers usage with gateway ID echoing example. Added tip about all-endpoint support.
tests/e2e_demo_response_headers_callback.py New E2E demo callback showing how to echo APIGEE request IDs via the response headers hook. Documentation/demo only file.
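The demo callback's core pattern (reading an APIGEE request ID from the incoming headers and echoing it onto the response) can be sketched roughly as follows. This is a standalone illustration, not the actual demo file: the class does not subclass LiteLLM's CustomLogger, and the hook signature is reduced to the parameters relevant here.

```python
from typing import Dict, Optional


class EchoGatewayHeaders:
    """Sketch of a response-headers callback that echoes a gateway request ID.

    The proxy invokes the hook with the (filtered) incoming request headers
    and merges the returned dict into the outgoing response headers.
    """

    async def async_post_call_response_headers_hook(
        self,
        user_api_key_dict,                      # auth context from the proxy
        response,                               # LLM response object (unused)
        request_headers: Optional[Dict[str, str]] = None,
    ) -> Dict[str, str]:
        headers_out: Dict[str, str] = {}
        if request_headers:
            # HTTP header names are case-insensitive; normalize before lookup.
            lowered = {k.lower(): v for k, v in request_headers.items()}
            gateway_id = lowered.get("x-apigee-request-id")
            if gateway_id:
                headers_out["x-apigee-request-id"] = gateway_id
        return headers_out
```

Before this PR, `request_headers` arrived as `None` at every call site, so the lookup above could never succeed; populating it is what makes this pattern usable.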

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming Request] --> B[base_process_llm_request]
    B --> C{Capture request_headers\nfilter sensitive headers}
    C --> D[Route to LLM]
    D --> E{Response Type?}
    E -->|Streaming Success| F[post_call_response_headers_hook\nwith request_headers]
    E -->|Non-Streaming Success| G[post_call_response_headers_hook\nwith request_headers]
    E -->|Exception| H{Exception Type?}
    H -->|ModifyResponseException| I[_handle_modify_response_exception]
    I --> J[post_call_failure_hook]
    J --> K[post_call_response_headers_hook\nwith request_headers]
    K --> L{Streaming?}
    L -->|Yes| M["⚠️ New StreamingResponse\n(headers lost)"]
    L -->|No| N[Headers on fastapi_response ✓]
    H -->|Other Exception| O[_handle_llm_api_exception]
    O --> P[post_call_response_headers_hook\nwith request_headers]
    F --> Q[Headers in custom_headers dict ✓]
    G --> R[Headers on fastapi_response ✓]

Last reviewed commit: 6318692

Contributor

@greptile-apps greptile-apps bot left a comment


8 files reviewed, 2 comments


@greptile-apps
Contributor

greptile-apps bot commented Feb 18, 2026

Additional Comments (2)

litellm/proxy/proxy_server.py
Custom headers lost on streaming guardrail responses

When a streaming request triggers a ModifyResponseException, _handle_modify_response_exception sets custom headers on fastapi_response, but the handler then returns a new StreamingResponse object (line 6228) that does not carry those headers. The custom callback headers will be silently dropped.

The same issue exists in the /completions handler (line 6399) and the anthropic endpoint's streaming path (anthropic_endpoints/endpoints.py line 111 where create_response is called with headers={}).

To fix this, the custom headers need to be explicitly passed to the new StreamingResponse. For example:

            return StreamingResponse(
                selected_data_generator,
                media_type="text/event-stream",
                status_code=200,  # Return 200 for passthrough mode
                headers=dict(fastapi_response.headers),
            )

litellm/proxy/anthropic_endpoints/endpoints.py
Streaming guardrail response discards custom headers

_handle_modify_response_exception at line 75 sets callback headers on fastapi_response, but this streaming path passes headers={} to create_response, creating a new response object that doesn't include those custom headers. The headers from the hook will be silently lost for streaming Anthropic guardrail responses.

Consider forwarding the fastapi_response headers:

            return await create_response(
                generator=selected_data_generator,
                media_type="text/event-stream",
                headers=dict(fastapi_response.headers),
            )

…lures

When a streaming request triggered ModifyResponseException, custom
headers set by _handle_modify_response_exception on fastapi_response
were lost because a new StreamingResponse was returned without them.
Pass dict(fastapi_response.headers) to StreamingResponse/create_response
in /chat/completions, /completions, and /v1/messages handlers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@michelligabriele
Collaborator Author

@greptileai review

@greptile-apps
Contributor

greptile-apps bot commented Feb 18, 2026

Greptile Summary

This PR fixes the async_post_call_response_headers_hook to actually receive request headers (previously always None) and closes the guardrail gap where ModifyResponseException handlers were missing the response headers hook call.

  • Core fix: Captures filtered request headers at the start of base_process_llm_request and passes them to all three hook call sites (streaming success, non-streaming success, failure)
  • Centralized guardrail handling: New _handle_modify_response_exception method consolidates the failure hook + response headers hook logic, replacing duplicated inline code across /chat/completions, /completions, /responses, and /v1/messages endpoints
  • Streaming header propagation: StreamingResponse in ModifyResponseException handlers now correctly passes fastapi_response.headers (previously {} in the Anthropic endpoint)
  • Sensitive header filtering: Authorization, cookie, and proxy-authorization headers are stripped before passing to callbacks via _filter_sensitive_headers
  • Tests: Mock-only unit tests verify request_headers forwarding and the guardrail exception path
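The filtering step named above can be sketched as a simple denylist; the real `_filter_sensitive_headers` lives in `litellm/proxy/common_request_processing.py`, and this only reproduces the behavior the summary describes:

```python
from typing import Dict, Mapping

# Per the summary, these credential-bearing headers are stripped
# before request headers are handed to callbacks.
SENSITIVE_HEADERS = {"authorization", "cookie", "proxy-authorization"}


def filter_sensitive_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `headers` with sensitive entries removed.

    The comparison is case-insensitive because HTTP header names are.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in SENSITIVE_HEADERS
    }
```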

Confidence Score: 4/5

  • This PR is safe to merge — it fixes a real bug with well-structured, consistent changes across all endpoints.
  • The changes are well-scoped and consistently applied across all four endpoint handlers. The centralized _handle_modify_response_exception method reduces duplication. Sensitive header filtering is correctly implemented. Tests cover the key paths (request_headers forwarding and guardrail exception). The only minor concern is the docs describing headers as "original" when they are filtered, but this was already flagged in a previous review thread.
  • No files require special attention — changes are consistent and well-tested.

Important Files Changed

Filename Overview
litellm/proxy/common_request_processing.py Core fix: captures filtered request headers, passes them to all 3 hook call sites, and adds centralized _handle_modify_response_exception method. Well-structured changes.
litellm/proxy/proxy_server.py Chat completions and text completions ModifyResponseException handlers now use centralized method and pass custom headers to StreamingResponse.
litellm/proxy/anthropic_endpoints/endpoints.py Anthropic endpoint now uses centralized guardrail handler and passes fastapi_response.headers to streaming response instead of empty {}.
litellm/proxy/response_api_endpoints/endpoints.py Responses endpoint now uses centralized _handle_modify_response_exception for guardrail exceptions, ensuring custom headers are injected.
docs/my-website/docs/proxy/call_hooks.md Updated docs with request_headers usage example and tip about endpoint coverage. Minor inaccuracy: says "original" headers but they are filtered.
tests/e2e_demo_response_headers_callback.py E2E demo callback showing APIGEE request ID echoing. Not a test file — it's a CustomLogger implementation for manual testing.
tests/test_litellm/proxy/hooks/test_post_call_response_headers_hook.py Two new mock-only tests validating request_headers forwarding through ProxyLogging.post_call_response_headers_hook. Good coverage.
tests/test_litellm/proxy/response_api_endpoints/test_response_headers_on_guardrail_exception.py Integration test using TestClient with mocked auth and request processing to verify custom headers on guardrail failures in /responses. Fully mocked, no real network calls.

Sequence Diagram

sequenceDiagram
    participant Client
    participant FastAPI as FastAPI Endpoint
    participant Processor as ProxyBaseLLMRequestProcessing
    participant Guardrail
    participant ProxyLogging
    participant Callback as CustomLogger Callback

    Client->>FastAPI: HTTP Request (with headers)
    FastAPI->>Processor: base_process_llm_request(request)
    Processor->>Processor: _filter_sensitive_headers(request.headers)
    Note over Processor: Strips authorization, cookie, proxy-authorization

    alt Guardrail blocks request
        Processor->>Guardrail: pre-call check
        Guardrail-->>Processor: ModifyResponseException
        Processor->>FastAPI: raises ModifyResponseException
        FastAPI->>Processor: _handle_modify_response_exception(e)
        Processor->>ProxyLogging: post_call_failure_hook()
        Processor->>ProxyLogging: post_call_response_headers_hook(request_headers)
        ProxyLogging->>Callback: async_post_call_response_headers_hook(request_headers)
        Callback-->>ProxyLogging: custom headers (e.g., x-apigee-request-id)
        ProxyLogging-->>Processor: merged headers
        Processor->>FastAPI: fastapi_response.headers.update(headers)
        FastAPI-->>Client: 200 + custom headers + violation message
    else Success (non-streaming)
        Processor->>ProxyLogging: post_call_response_headers_hook(request_headers)
        ProxyLogging->>Callback: async_post_call_response_headers_hook(request_headers)
        Callback-->>ProxyLogging: custom headers
        ProxyLogging-->>Processor: merged headers
        Processor->>FastAPI: fastapi_response.headers.update(headers)
        FastAPI-->>Client: 200 + custom headers + response
    else Success (streaming)
        Processor->>ProxyLogging: post_call_response_headers_hook(request_headers)
        ProxyLogging->>Callback: async_post_call_response_headers_hook(request_headers)
        Callback-->>ProxyLogging: custom headers
        ProxyLogging-->>Processor: merged headers
        Processor-->>FastAPI: StreamingResponse(headers=custom_headers)
        FastAPI-->>Client: SSE stream + custom headers
    end

Last reviewed commit: 36717bd

Contributor

@greptile-apps greptile-apps bot left a comment


8 files reviewed, no comments


@giulio-leone

Automated patch bundle from top-50 unresolved backlog scan. Generated due to limited direct branch-write access; please apply/cherry-pick the proposed minimal edits below.

## PR #21385: fix/response-headers-hook-request-headers

Unresolved threads summary

  • greptile-apps, litellm/proxy/common_request_processing.py:753, Sensitive headers exposed to callbacks.
  • greptile-apps, e2e_demo_response_headers_callback.py:90, Duplicate file at repo root: This file is identical to tests/e2e_demo_response_headers_callback.py.
  • greptile-apps, litellm/proxy/response_api_endpoints/endpoints.py:237, Inconsistent fix — other ModifyResponseException handlers still lack this hook.
  • greptile-apps, litellm/proxy/common_request_processing.py:361, Missing Dict import causes NameError.
  • greptile-apps, docs/my-website/docs/proxy/call_hooks.md:445, Docs say "original" but headers are filtered.

Concrete patch proposal

  • litellm/proxy/common_request_processing.py: filter sensitive headers by allowlist before callback/log/provider forwarding; add the missing Dict import in this module.

  • e2e_demo_response_headers_callback.py: collapse duplicated branches into one canonical implementation.

  • litellm/proxy/response_api_endpoints/endpoints.py: apply the same fix across all equivalent handlers/endpoints.

  • docs/my-website/docs/proxy/call_hooks.md: rewrite documentation/comments to explicitly cover Docs say "original" but headers are filtered.

michelligabriele added a commit to michelligabriele/litellm that referenced this pull request Mar 6, 2026
…oss all endpoints

The response headers hook had 5 gaps that prevented callbacks from
reliably extracting routing metadata across endpoint types:

1. Hook never fired for /audio/transcriptions (endpoint bypasses
   base_process_llm_request)
2. custom_llm_provider not accessible in hook data for any endpoint
3. custom_llm_provider not stamped in ResponsesAPIResponse._hidden_params
   (unlike chat completions)
4. model_info under inconsistent keys (metadata vs litellm_metadata)
5. request_headers always None at all call sites

This adds a litellm_call_info parameter to the hook that normalizes
routing metadata (custom_llm_provider, model_info, api_base, model_id)
regardless of endpoint type. Also stamps custom_llm_provider on
Responses API responses, adds the hook call to the transcription
handler, and passes request_headers at all call sites.

Supersedes PR BerriAI#21385.
ishaan-jaff pushed a commit that referenced this pull request Mar 12, 2026
…oss all endpoints (#22985)

* fix(proxy): make async_post_call_response_headers_hook consistent across all endpoints

The response headers hook had 5 gaps that prevented callbacks from
reliably extracting routing metadata across endpoint types:

1. Hook never fired for /audio/transcriptions (endpoint bypasses
   base_process_llm_request)
2. custom_llm_provider not accessible in hook data for any endpoint
3. custom_llm_provider not stamped in ResponsesAPIResponse._hidden_params
   (unlike chat completions)
4. model_info under inconsistent keys (metadata vs litellm_metadata)
5. request_headers always None at all call sites

This adds a litellm_call_info parameter to the hook that normalizes
routing metadata (custom_llm_provider, model_info, api_base, model_id)
regardless of endpoint type. Also stamps custom_llm_provider on
Responses API responses, adds the hook call to the transcription
handler, and passes request_headers at all call sites.

Supersedes PR #21385.

* fix(proxy): address review feedback — safer backwards compat and None guards

- Replace try/except TypeError with inspect.signature() check for
  litellm_call_info backwards compatibility. This avoids masking real
  TypeErrors inside callback implementations and prevents double
  invocation with inconsistent parameters.

- Use (data.get("key") or {}) instead of data.get("key", {}) to guard
  against keys that exist with an explicit None value, which would
  cause AttributeError on the subsequent .get() call.
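The `(data.get("key") or {})` guard matters because a default passed to `dict.get` only applies when the key is absent, not when it is present with an explicit `None`:

```python
data = {"metadata": None}  # key exists, value is None

# The default is NOT used here, so this yields None...
value = data.get("metadata", {})
assert value is None  # a chained value.get(...) would raise AttributeError

# ...whereas `or {}` also covers the explicit-None case.
safe = data.get("metadata") or {}
assert safe.get("model_id") is None  # no AttributeError
```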

* fix(proxy): cache inspect.signature result for callback compat check

Move the inspect.signature() call into a module-level helper with a
dict cache keyed by callback identity. Avoids repeated introspection
per request per callback in the hot path.

* fix(proxy): use class identity for signature cache key

Key the _CALLBACK_ACCEPTS_CALL_INFO cache by id(type(cb)) instead of
id(cb) to avoid stale entries from Python address reuse after GC.
All instances of the same callback class share the same method
signature, so class identity is both safer and more cache-efficient.
shivamrawat1 pushed a commit that referenced this pull request Mar 12, 2026
…oss all endpoints (#22985)



Development

Successfully merging this pull request may close these issues.

[Feature]: Approach to add custom header into response header through failure event hook
