
[Perf] Remove premature model.dump call on the hot path#19109

Merged
AlexsanderHamir merged 8 commits into main from litellm_weekly_overhead_work_006
Jan 14, 2026

Conversation

Contributor

@AlexsanderHamir AlexsanderHamir commented Jan 14, 2026

Relevant issues

Previously, _extract_response_obj_and_hidden_params() called model_dump() on BaseModel objects immediately, converting them to dicts even when a dict wasn't needed. This caused:

  • Unnecessary serialization overhead (identified as a bottleneck in profiling)

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible; it only solves 1 specific problem

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable, but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.

Type

🧹 Refactoring

Changes

Performance Optimization: Lazy BaseModel Serialization

Core Change:

  • Removed early model_dump() call: _extract_response_obj_and_hidden_params() now returns Union[dict, BaseModel] instead of always converting BaseModel to dict
  • Lazy serialization: model_dump() is now only called when a dict is actually required (in get_final_response_obj() for redaction/logging)

Implementation Details:

  • Changed return type of _extract_response_obj_and_hidden_params() from Tuple[dict, Optional[dict]] to Tuple[Union[dict, BaseModel], Optional[dict]]
  • Updated get_final_response_obj() to call _safe_model_dump() only when response_obj is a BaseModel
  • Updated all downstream functions to handle both dict and BaseModel types
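
The pattern above can be sketched as follows. This is illustrative only: the function names are simplified, and `FakeModel` is a minimal stand-in for a pydantic `BaseModel` so the snippet is self-contained.

```python
from typing import Any, Dict, Optional, Tuple, Union


class FakeModel:
    """Stand-in for a pydantic BaseModel; only model_dump() matters here."""

    def __init__(self, **fields: Any) -> None:
        self._fields = dict(fields)

    def model_dump(self) -> Dict[str, Any]:
        return dict(self._fields)


def extract_response_obj(
    result: Union[dict, FakeModel],
) -> Tuple[Union[dict, FakeModel], Optional[dict]]:
    # Hot path: return the object untouched instead of eagerly serializing it.
    hidden_params = getattr(result, "_hidden_params", None)
    return result, hidden_params


def get_final_response_obj(response_obj: Union[dict, FakeModel]) -> dict:
    # Cold path (redaction/logging): serialize only when a dict is required.
    if isinstance(response_obj, FakeModel):
        return response_obj.model_dump()
    return response_obj
```

The key design choice is that the extraction function's return type widens to a Union, and serialization cost is paid only by callers that genuinely need a dict.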

Defensive Helper Functions Added:

  • _safe_model_dump(): Safely calls model_dump() with fallback strategies (__dict__, string conversion)
  • _safe_get_attribute(): Safely extracts attributes from both dict and BaseModel objects
  • _safe_extract_usage_from_obj(): Safely extracts usage from response objects
  • _try_transform_response_api_usage(): Attempts ResponseAPIUsage transformation with error handling
  • _try_create_usage_from_dict(): Attempts Usage creation from dict with error handling
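
A minimal sketch of what the first two helpers might look like, inferred from the descriptions above (hypothetical implementations, not the actual litellm code):

```python
from typing import Any


def safe_model_dump(obj: Any) -> Any:
    # Try model_dump() first, then fall back to __dict__, then to str().
    try:
        return obj.model_dump()
    except Exception:
        pass
    try:
        return dict(obj.__dict__)
    except Exception:
        pass
    return str(obj)


def safe_get_attribute(obj: Any, key: str, default: Any = None) -> Any:
    # Works uniformly for dicts and attribute-style (BaseModel-like) objects.
    if isinstance(obj, dict):
        return obj.get(key, default)
    return getattr(obj, key, default)
```

The cascading fallbacks mean logging never raises just because a response object fails to serialize cleanly.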

Unit Test Results

[Screenshot: unit test results, Jan 14, 2026]
  • Failures seem to be unrelated.

…_dump()

- Return BaseModel directly instead of converting to dict via model_dump()
- Update get_usage_from_response_obj to handle both dict and BaseModel
- Update get_final_response_obj to handle BaseModel (lazy conversion)
- Add type guards for direct attribute access when response_obj is BaseModel

This optimization eliminates the expensive model_dump() call (previously 92.6% of function time)
by leveraging ModelResponse's .get() method and deferring dict conversion until needed.
- Fix bug where ResponseAPIUsage objects from ResponsesAPIResponse were not being transformed
- Add explicit handling for ResponseAPIUsage objects (not just dicts)
- Use _is_response_api_usage() helper to handle both ResponseAPIUsage objects and dicts
- Remove problematic 'or {}' fallback that was converting None to empty dict

This fixes the test failure where prompt_tokens was 0 instead of 8 for ResponsesAPIResponse objects.
- Restore original defensive behavior: return Usage(0,0,0) for unknown types instead of raising ValueError
- Keep explicit ResponseAPIUsage handling to fix the bug where it was incorrectly returning 0 tokens
- Maintain backward compatibility by preserving original error handling approach
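
The defensive fallback described above can be sketched like this. All names are simplified for illustration; `Usage` here is a minimal stand-in, not litellm's actual class.

```python
from typing import Any, Optional


class Usage:
    """Minimal stand-in for litellm's Usage type (illustrative fields only)."""

    def __init__(self, prompt_tokens: int = 0, completion_tokens: int = 0,
                 total_tokens: int = 0) -> None:
        self.prompt_tokens = prompt_tokens
        self.completion_tokens = completion_tokens
        self.total_tokens = total_tokens


def try_create_usage_from_dict(usage: Any) -> Optional[Usage]:
    # Only attempt construction for dicts; other types are handled upstream.
    if not isinstance(usage, dict):
        return None
    try:
        return Usage(**usage)
    except (TypeError, ValueError):
        return None


def get_usage(usage: Any) -> Usage:
    # Defensive: unknown or malformed input yields an all-zero Usage
    # instead of raising, preserving the original error-handling behavior.
    created = try_create_usage_from_dict(usage)
    return created if created is not None else Usage(0, 0, 0)
```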
@vercel

vercel bot commented Jan 14, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm | Deployment: Ready | Review: Preview, Comment | Updated (UTC): Jan 14, 2026 8:36pm

- Add _safe_model_dump() for safe model_dump() calls with fallbacks
- Add _safe_get_attribute() for safe attribute access from dict/BaseModel
- Add _safe_extract_usage_from_obj() for safe usage extraction
- Add _try_transform_response_api_usage() for ResponseAPIUsage transformation
- Add _try_create_usage_from_dict() for safe Usage creation
- Refactor get_usage_from_response_obj() to use helper functions
- Refactor _extract_response_obj_and_hidden_params() to use helper functions
- Refactor get_final_response_obj() to use helper functions
- Update type hints to support Union[dict, BaseModel] throughout
- Fix ID extraction to preserve falsy values (0, empty string, False) from response_obj

- Only fall back to litellm_call_id when id is None (attribute doesn't exist)
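
The falsy-value pitfall this commit fixes is a common one: `val or fallback` also replaces 0, empty string, and False. A hedged sketch of the corrected pattern (function name is illustrative, not the actual litellm helper):

```python
from typing import Any, Optional


def extract_id(response_obj: Any, litellm_call_id: Optional[str]) -> Any:
    # Buggy pattern: `response_obj.get("id") or litellm_call_id` would
    # discard falsy-but-valid ids (0, "", False). Only a missing or
    # None id should trigger the fallback.
    if isinstance(response_obj, dict):
        val = response_obj.get("id")
    else:
        val = getattr(response_obj, "id", None)
    return litellm_call_id if val is None else val
```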

- Remove redundant comments that duplicate function names and obvious logic

- Clean up code for better readability
try:
    return Usage(**usage)
except (TypeError, ValueError) as e:
    verbose_logger.debug(f"Error creating Usage from dict: {e}, usage: {usage}")

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs sensitive data (password) as clear text. (4 alerts)
This expression logs sensitive data (secret) as clear text. (2 alerts)

Copilot Autofix (AI, 2 months ago)

To fix the problem, we need to stop logging potentially sensitive contents of the usage dict while still preserving enough information for debugging. The problematic line is the debug log in _try_create_usage_from_dict, which currently formats and logs the complete usage dict when constructing a Usage object fails.

The best fix with minimal behavior change is:

  • Keep catching the exception and returning None as before.
  • Change the log message to:
    • Omit the usage value.
    • Optionally log non-sensitive structural information, such as the keys present in usage and the type, which is sufficient for debugging schema mismatches.
  • Ensure no additional sensitive fields are logged as raw values.

Concretely:

  • In litellm/litellm_core_utils/litellm_logging.py, in _try_create_usage_from_dict, replace:
except (TypeError, ValueError) as e:
    verbose_logger.debug(f"Error creating Usage from dict: {e}, usage: {usage}")
    return None

with something like:

except (TypeError, ValueError) as e:
    try:
        usage_keys = list(usage.keys())
    except Exception:
        usage_keys = None
    verbose_logger.debug(
        "Error creating Usage from dict: %s, usage keys: %s, usage type: %s",
        e,
        usage_keys,
        type(usage),
    )
    return None

This preserves error context (exception and shape of the dict) while avoiding logging dict values.

No other files require changes for this specific alert, because the root issue is the sink (the logging of usage), and fixing that sink addresses all variants that flow into this location.


Suggested changeset 1
litellm/litellm_core_utils/litellm_logging.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/litellm/litellm_core_utils/litellm_logging.py b/litellm/litellm_core_utils/litellm_logging.py
--- a/litellm/litellm_core_utils/litellm_logging.py
+++ b/litellm/litellm_core_utils/litellm_logging.py
@@ -4972,7 +4972,17 @@
     try:
         return Usage(**usage)
     except (TypeError, ValueError) as e:
-        verbose_logger.debug(f"Error creating Usage from dict: {e}, usage: {usage}")
+        # Avoid logging full dict contents, which may include sensitive data.
+        try:
+            usage_keys = list(usage.keys())
+        except Exception:
+            usage_keys = None
+        verbose_logger.debug(
+            "Error creating Usage from dict: %s, usage keys: %s, usage type: %s",
+            e,
+            usage_keys,
+            type(usage),
+        )
         return None
 
 
EOF


Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
- Replace full usage dict logging with keys and type only

- Prevents potential exposure of sensitive information in debug logs

- Maintains debugging capability without security risk
@AlexsanderHamir AlexsanderHamir merged commit b352d0d into main Jan 14, 2026
28 of 56 checks passed
uc4w6c added a commit that referenced this pull request Jan 16, 2026
shriharsha98 added a commit to juspay/litellm that referenced this pull request Feb 10, 2026
* fix: lint

* fix: prevent MCP type objects from being captured in locals()

* Fix: BerriAI#19089 websocket version error

* Fix: handling of model name in query param

* Add logger to see websocket request

* Fix: code quality tests

* Fix: code quality tests

* docs(logging.md): add guide for mounting custom callbacks in Helm/K8s (BerriAI#19136)

* docs: new cookbook .md for claude code

allows rendering on models.litellm.ai

* refactor: rename file for consistency

* Add user auth in standard logging object for bedrock passthrough

* remove unused space

* Fix: test router

* fix(index.json): have a parent index.json and just link out to guides from docs

maintain just 1 place for tutorials

* Fix: tests/test_litellm/proxy/test_litellm_pre_call_utils.py::test_embedding_header_forwarding_with_model_group

* Fix: tests/test_litellm/proxy/test_proxy_server.py::test_embedding_input_array_of_tokens

* Fix: response enterprise tests

* docs(claude_mcp.md): separate claude mcp tutorial into a separate doc

easier to surface

* docs: update index.json

* fix: mount config.yaml as single file in Helm chart (BerriAI#19146)

* Fix: response enterprise tests

* Fix: mock test tests

* Fix: mock test tests

* Adjust icons for buttons

* fixing build

* Added ability to customize logfire base url through env var (BerriAI#19148)

* Added ability to customize logfire base url through env var

* Added test to check if env var is used correctly for logfire

* Document the env var

* Documented env var in config_settings.md

* Litellm dev 01 15 2026 p1 (BerriAI#19153)

* fix: safely handle unmapped call type

* docs: cleanup links for ai coding tools

* docs(claude_non_anthropic_models.md): add tutorial showing non anthropic model connection to claude code

* docs: link to non-anthropic model tutorial for claude code

* docs: document more tutorials on website

* chore: remove unused test files from repository root (BerriAI#19150)

Remove orphaned test files that are not referenced in any tests or code:
- flux2_test_image.png
- test_generic_guardrail_config.yaml
- test_image_edit.png (root only, tests/image_gen_tests/ copy preserved)
- document.txt
- batch_small.jsonl (root and tests/batches_tests/)

* Chore: bump boto3 version (BerriAI#19090)

* Add pricing for volcengine models (deepseek-v3-2, glm-4-7, kimi-k2-thinking) (BerriAI#19076)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* Make keepalive_timeout parameter work for Gunicorn (BerriAI#19087)

* [Fix] Containers API - Allow routing to regional endpoints (BerriAI#19118)

* fix get_complete_url

* fix url resolution containers API

* TestContainerRegionalApiBase

* feat(proxy): add keepalive_timeout support for Gunicorn server

Add configurable keepalive timeout parameter for Gunicorn workers to
match existing Uvicorn functionality. This allows users to tune the
keep-alive connection timeout based on their deployment requirements.

Changes:
- Add keepalive_timeout parameter to _run_gunicorn_server method
- Configure Gunicorn's keepalive setting (defaults to 90s if not specified)
- Update --keepalive_timeout CLI help text to document both Uvicorn and Gunicorn behavior
- Pass keepalive_timeout from run_server to _run_gunicorn_server

Tests:
- Add test to verify keepalive_timeout flag is properly passed to Gunicorn
- Add test to verify default 90s timeout when flag is not specified

Co-Authored-By: lizhen921 <294474470@qq.com>
Signed-off-by: Kris Xia <xiajiayi0506@gmail.com>

---------

Signed-off-by: Kris Xia <xiajiayi0506@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: lizhen921 <294474470@qq.com>

* Update prisma_migration.py (BerriAI#19083)

* fix: model-level guardrails not taking effect (BerriAI#18363) (BerriAI#18895)

* fix: model-level guardrails not taking effect (BerriAI#18363)

* fix(proxy): add support event-based deployment hooks

* fix(proxy): add type safety check for guardrails

* fix: models loadbalancing billing issue by filter (BerriAI#18891)

* fix: models loadbalancing billing issue by filter

* fix: separate key and team access groups in metadata

* fix: video status/content credential injection for wildcard models (BerriAI#18854)

* fix: video status/content credential injection for wildcard models

When using wildcard model patterns like `vertex_ai/*`, the video status
and content endpoints failed to resolve the model_name correctly,
causing credential injection to be skipped.

Changes:
- router.py: Added `custom_llm_provider` parameter to
  `resolve_model_name_from_model_id` method
- router.py: Added Strategy 2 (provider prefix matching) and
  Strategy 4 (wildcard pattern matching)
- endpoints.py: Pass `provider_from_id` to resolver in video_status,
  video_content, and video_remix endpoints

This allows video_id like `vertex_ai:veo-3.0-generate-preview:...` to
correctly match `vertex_ai/*` wildcard pattern and inject credentials
from the model config.

Fixes: Video status returns "Your default credentials were not found"
when using Vertex AI video generation with wildcard model patterns.

* pr18845 video feature bugfix (vibe-kanban e43e2d2d)

Addressing PR comments.

I forked litellm, created a branch, did the work, and opened a pull request, and received feedback.

I need to understand that feedback and continue the work on the PR branch I opened.

https://github.com/BerriAI/litellm/pull/18854#discussion_r2677026995

Let's read through this, assess the current state, and get to work.

They asked for test code, so I'll write the tests, run the test command locally once, then commit and push.

litellm has documentation for pull requests:

https://docs.litellm.ai/docs/extras/contributing_code

I've signed the CRA. From here on I should follow the template; for now I just fixed the bug directly and opened the PR.

* fix: resolve mypy type error in resolve_model_name_from_model_id

Rename loop variable to avoid type conflict between DeploymentTypedDict
and Dict[Any, Any] from pattern_router.route() return type.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* Reusable model select

* fixing build

* fixing build 2

* Fix Azure embeddings JSON parsing to prevent connection leaks and ensure proper router cooldown (BerriAI#19167)

* Revert "[Feat] Add support for 0 cost models"

* [Feat] Add support for Tool Search on /messages API  - Azure, Bedrock, Anthropic API (BerriAI#19165)

* fix _update_headers_with_anthropic_beta

* init ANTHROPIC_BETA_HEADER_VALUES

* fix ANTHROPIC_BETA_HEADER_VALUES

* fix: _update_headers_with_anthropic_beta - anthropic API

* init _update_headers_with_anthropic_beta - azure AI support

* init VertexAIPartnerModelsAnthropicMessagesConfig

* fix _get_total_tokens_from_usage

* working TestBedrockInvokeToolSearch

* fix get_extra_headers

* TestBedrockInvokeToolSearch

* _get_tool_search_beta_header_for_bedrock

* fix mypy linting

* test fix

* test fix

* test: fix router_code_coverage test fail

* chore: fix lint error

* ensuring this still works PENDING PROXY EXTRAS

* test: fix missing test

* [Feat] Claude Code - Add End-user tracking with Claude Code (BerriAI#19171)

* add claude code customer usage tracking

* fix get end user trackign claude code

* TestGetCustomerIdFromStandardHeaders

* Revert "fix(gemini): dereference $defs/$ref in tool response content (BerriAI#19062)"

This reverts commit 84dad95.

* chore: document temporary grype ignore for CVE-2026-22184

* Revert "[Fix] /team/daily/activity Show Internal Users Their Spend Only"

* [Docs Guide] Litellm claude code end user tracking (BerriAI#19176)

* add to sidebar

* v1 guide

* guide claude granular cost tracking

* docs fix

* fix gcp glm-4.7 pricing (BerriAI#19172)

* Improve documentation for routing LLM calls via SAP Gen AI Hub (BerriAI#19166)

* fix(sap): resolve JSON serialization error and update documentation

- Fix 'Object of type cached_property is not JSON serializable' error
- Replace @cached_property with manual caching in deployment_url
- Update documentation examples to match sap_proxy_config.yaml
- Add Anthropic model naming clarification (anthropic-- prefix)
- Improve authentication documentation with tabbed interface

Fixes critical bug preventing SAP Gen AI Hub integration from working.
Fully tested with both chat and embedding endpoints.

* docs: update SAP provider documentation

* Update SAP provider documentation with better setup instructions

Rewrote the SAP docs to make it easier for users to get started. Added a quick start section, clarified the authentication options, explained model naming differences between SDK and proxy usage, and included some troubleshooting tips.

* Revert transformation files - keep only documentation changes

* Revert "[Perf] Remove premature model.dump call on the hot path (BerriAI#19109)"

This reverts commit b352d0d.

* chore: add zlib to allow list

* fix(bedrock): strip throughput tier suffixes from model names (BerriAI#19147)

Co-authored-by: Greek, John <jgreek@users.noreply.github.com>

* chore: update jaraco

* tests: skip Azure SDK init check for acreate_skill

* Fix : test_stream_chunk_builder_litellm_mixed_calls

* chore: force jaraco.context 6.1.0 in runtime images

* chore: move install jaraco.context

* test: handle wildcard routes in route validation test

* Fix : test_streaming_multiple_partial_tool_calls

* docs: add redis initialization with kwargs

* chore: pip install upgrade

* chore: pip install force-reinstall

* chore: address jaraco.context path traversal vulnerability (GHSA-58pv-8j8x-9vj2)

* Add fallback endpoints support

* Fix get_combined_tool_content Too many statements (70 > 50)

* Add medium value support for detail param for gemini

* chore: add jaraco liccheck

* Team Settings Model Select

* adding mocks

* bump: version 1.80.16 → 1.80.17

* refactor team member icon buttons

* Fix: [Bug]: stream_timeout:The function of this parameter has been changed

* Add sanitization for anthropic messages

* Add docs for message sanitisation

* Fix : revert get_combined_tool_content

* Fix : revert get_combined_tool_content

* Fix malformed tool call transform

* fix: updated all 27 occurrences of mode: image_edit to mode: image_edits

* fix: image_edits request handling fails for Stability models

* fix documentation

* Fix mypy issues

* chore: add ALLOWED_CVES. Because Wolfi glibc still flagged even on 2.42-r5.

* Fix: vertex ai doesn't support structured output

* Revert "fix: models loadbalancing billing issue by filter (BerriAI#18891)"

This reverts commit 41d8f79.

* Fix: add async_get_available_deployment_for_pass_through in code tests

* Fix boto3 conflicting dependency

* Potential fix for code scanning alert no. 3990: Clear-text logging of sensitive information

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Fix model map

* Fix all mypy issues

* Add azure/gpt-5.2-codex

* ci/cd fixes

* fix stability mode

* only show own internal user usage

* fix: correct budget limit validation operator (>=) for team members (BerriAI#19207)

* ci(github): add automated duplicate issue checker and template safeguards (BerriAI#19218)

* bump: version 0.4.21 → 0.4.22

* Adding build artifacts

* fix(vertex_ai): Vertex AI 400 Error: Model used by GenerateContent request (models/gemini-3-*) and CachedContent (models/gemini-3-*) has to be the same (BerriAI#19193)

* fix(vertex_ai): include model in context cache key generation

* test(vertex_ai): update context caching tests to verify model in cache key

* fix(logging): Include langfuse logger in JSON logging when langfuse callback is used (BerriAI#19162)

When JSON_LOGS is enabled and langfuse is configured as a success/failure callback,
the langfuse logger now receives the JSON formatter. This ensures langfuse SDK
log messages (like 'Item exceeds size limit' warnings) are output as JSON with
proper level information, instead of plain text that log aggregators may
incorrectly classify as errors.

Fixes issue where langfuse warnings appeared as errors in Datadog due to missing
log level in unformatted output.

Co-authored-by: openhands <openhands@all-hands.dev>

* fixing tests

* Revert "[Fix] /user/new Privilege Escalation"

* Revert "Add sanitization for anthropic messages"

* Revert "Fix: malformed tool call transformation in bedrock"

* bump: version 0.4.21 → 0.4.22

* Revert "Stabilise mock tests"

* Revert "Litellm staging 01 15 2026"

* Revert "[Feature] Deleted Keys and Deleted Teams Table"

* Manual revert BerriAI#19078

* feat: add auto-labeling for 'claude code' issues (BerriAI#19242)

* Revert "Revert "[Feature] Deleted Keys and Deleted Teams Table""

* bump: version 0.4.22 → 0.4.23

* adding migration

* [Docs] Litellm architecture fixes 2 (BerriAI#19252)

* fixes 1

* docs fix

* docs fix

* docs fix

* docs fix

* /public/model_hub health information

* Public Model Hub Health UI

* fix: ci test

gemini 2.5 deprecated

* [Fix] - Reliability fix OOMs with image url handling (BerriAI#19257)

* fix MAX_IMAGE_URL_DOWNLOAD_SIZE_MB

* test_image_exceeds_size_limit_with_content_length

* fix: _process_image_response

* add constants 50MB

* fix convert_to_anthropic_image_obj image handling

* test_gemini_image_size_limit_exceeded

* MAX_IMAGE_URL_DOWNLOAD_SIZE_MB fix

* MAX_IMAGE_URL_DOWNLOAD_SIZE_MB

* test_image_size_limit_disabled

* async_convert_url_to_base64

* docs fix

* code QA check

* fix Exception

* Add status to /list in keys and teams

* adding tests

* Linting

* refresh keys on delete

* temp commit for branch switching

* fixing lint

* fixing test

* Fixing tests and adding proper returns

* linting

* [Feat] Claude Code - Add Websearch support using LiteLLM /search (using web search interception hook) (BerriAI#19263)

* init WebSearchInterceptionLogger

* test_websearch_interception_real_call

* init async_should_run_agentic_completion

* async_should_run_agentic_loop

* async_run_agentic_loop

* refactor folder

* fix organization

* WebSearchTransformation

* WebSearchInterceptionLogger

* _call_agentic_completion_hooks

* WebSearch Interception Architecture

* test_websearch_interception_real_call

* add streaming

* add transform_request for streaming

* get_llm_provider

* test fix

* fix info

* init from config.yaml

* fixes

* test handler

* fix _is_streaming_response

* async_run_agentic_loop

* mypy fix

* Deleted Teams

* Adding tests

* fixing tests

* feat(panw_prisma_airs): add custom violation message support

* Adjusting new badges

* building ui

* docs fix

* png fixes

* deleted teams endpoint fix

* rebuilding ui

* updating docker pull cmd

* fixing ui build

* docs ui usage

* docs fix

* fix doc

* docs clean up

* deleted keys and teams docs

* fix build attempt

* testing adding entire out

* [Feat] Claude Code x LiteLLM WebSearch - QA Fixes to work with Claude Code  (BerriAI#19294)

* fix websearch_interception_converted_stream

* test_websearch_interception_no_tool_call_streaming

* FakeAnthropicMessagesStreamIterator

* LITELLM_WEB_SEARCH_TOOL_NAME

* fixes tools def for litellm web search

* fixes FakeAnthropicMessagesStreamIterator

* test_litellm_standard_websearch_tool

* use new hook for modifying before any transforms from litellm

* init WebSearchInterceptionLogger + ARCHITECTURE

* fix config.yaml

* init doc for claude code web search

* docs fix

* doc fix

* fix mypy linting

* test_router_fallbacks_with_custom_model_costs

* test_deepseek_mock_completion

* v1.81.0

* docs fix

* test_aiohttp_openai

* fix

* qa fixes

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* [Fix] LiteLLM VertexAI Pass through - ensuring incoming headers are forwarded down to target  (BerriAI#19524)

* test_vertex_passthrough_forwards_anthropic_beta_header

* add_incoming_headers

* [Fix] VertexAI Pass through - Ensure only anthropic betas are forwarded down to LLM API (BerriAI#19542)

* fix ALLOWED_VERTEX_AI_PASSTHROUGH_HEADERS

* test_vertex_passthrough_forwards_anthropic_beta_header

* fix test_vertex_passthrough_forwards_anthropic_beta_header

* test_vertex_passthrough_does_not_forward_litellm_auth_token

* fix utils

* Using Anthropic Beta Features on Vertex AI

* test_forward_headers_from_request_x_pass_prefix

* Fix: Handle PostgreSQL cached plan errors during rolling deployments (BerriAI#19424)

* Fix in-flight request termination on SIGTERM when health-check runs in a separate process (BerriAI#19427)

* Fix build errors after merge

---------

Signed-off-by: Kris Xia <xiajiayi0506@gmail.com>
Co-authored-by: Yuta Saito <uc4w6c@bma.biglobe.ne.jp>
Co-authored-by: YutaSaito <36355491+uc4w6c@users.noreply.github.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Vikash <master.bvik@gmail.com>
Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com>
Co-authored-by: burnerburnerburnerman <rharmhd@gmail.com>
Co-authored-by: 拐爷&&老拐瘦 <geyf@vip.qq.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Kris Xia <xiajiayi0506@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: lizhen921 <294474470@qq.com>
Co-authored-by: danielnyari-seon <daniel.nyari@seon.io>
Co-authored-by: choigawoon <choigawoon@gmail.com>
Co-authored-by: Alexsander Hamir <alexsanderhamirgomesbaptista@gmail.com>
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
Co-authored-by: Guilherme Segantini <guilherme.segantini@sap.com>
Co-authored-by: John Greek <2006605+jgreek@users.noreply.github.com>
Co-authored-by: Greek, John <jgreek@users.noreply.github.com>
Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Anand Kamble <anandmk837@gmail.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Harshit Jain <harshitjain0562@gmail.com>