fix(vertex_ai): Vertex AI 400 Error: Model used by GenerateContent request (models/gemini-3-*) and CachedContent (models/gemini-3-*) has to be the same (#19193)
Merged
krrishdholakia merged 3 commits into BerriAI:main on Jan 16, 2026
Conversation
shriharsha98 added a commit to juspay/litellm that referenced this pull request on Feb 10, 2026


Relevant issues
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added testing in the `tests/litellm/` directory. Adding at least 1 test is a hard requirement - see details
- My PR passes all unit tests on `make test-unit`

CI (LiteLLM team)

- Branch creation CI run. Link:
- CI run for the last commit. Link:
- Merge / cherry-pick CI run. Links:
Type
🐛 Bug Fix
Changes
Include the model name when generating the context-cache key. This prevents a cache entry created for one model from being matched by a request for a different model, which Vertex AI rejects with a 400 error ("Model used by GenerateContent request ... and CachedContent ... has to be the same").
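
The change above can be sketched as follows. This is a minimal illustration, not LiteLLM's actual implementation: the function name `get_cache_key` and the SHA-256-over-JSON hashing scheme are assumptions chosen for clarity. The point is only that the model name becomes part of the key material, so identical cached contents under different Gemini models produce distinct keys.

```python
import hashlib
import json


def get_cache_key(model: str, cached_contents: list) -> str:
    """Build a context-cache key that includes the model name.

    Hypothetical sketch: folding `model` into the hashed payload ensures a
    CachedContent entry created for one Gemini model is never matched by a
    request for a different model, which Vertex AI rejects with a 400 error.
    """
    # Serialize deterministically so the same (model, contents) pair
    # always hashes to the same key.
    payload = json.dumps(
        {"model": model, "contents": cached_contents},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same contents, different models: the keys no longer collide.
contents = [{"role": "user", "parts": [{"text": "long shared context"}]}]
key_a = get_cache_key("gemini-3-flash", contents)
key_b = get_cache_key("gemini-3-pro", contents)
assert key_a != key_b
```

Before this fix, a key derived from the cached contents alone could let a `gemini-3-pro` request pick up a `CachedContent` created for `gemini-3-flash`; adding the model to the key makes such cross-model reuse impossible by construction.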