[Fix] VertexAI Pass through - Ensure only anthropic betas are forwarded down to LLM API #19542
Merged
ishaan-jaff merged 7 commits into main on Jan 22, 2026
Conversation
ishaan-jaff added a commit that referenced this pull request on Jan 22, 2026
…ed down to LLM API (#19542)

* fix ALLOWED_VERTEX_AI_PASSTHROUGH_HEADERS
* test_vertex_passthrough_forwards_anthropic_beta_header
* fix test_vertex_passthrough_forwards_anthropic_beta_header
* test_vertex_passthrough_does_not_forward_litellm_auth_token
* fix utils
* Using Anthropic Beta Features on Vertex AI
* test_forward_headers_from_request_x_pass_prefix
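Judging from the constant and test names in the commit list above (ALLOWED_VERTEX_AI_PASSTHROUGH_HEADERS, test_vertex_passthrough_forwards_anthropic_beta_header, test_vertex_passthrough_does_not_forward_litellm_auth_token), the fix amounts to allowlist-filtering the incoming request headers before the pass-through call, so `anthropic-beta` is forwarded while the LiteLLM proxy's own auth token is not. A minimal sketch — the helper name, signature, and allowlist contents here are illustrative assumptions, not LiteLLM's actual implementation:

```python
# Hypothetical sketch of allowlist-based header filtering for a pass-through
# endpoint. The constant name mirrors the commit titles; its real contents
# in LiteLLM may differ.
ALLOWED_VERTEX_AI_PASSTHROUGH_HEADERS = {"anthropic-beta", "anthropic-version"}


def filter_passthrough_headers(incoming: dict) -> dict:
    """Forward only allowlisted headers downstream.

    Anything else (e.g. the proxy's own Authorization bearer token) is
    dropped rather than leaked to the target LLM API.
    """
    forwarded = {}
    for name, value in incoming.items():
        if name.lower() in ALLOWED_VERTEX_AI_PASSTHROUGH_HEADERS:
            forwarded[name.lower()] = value
    return forwarded


headers = {
    "anthropic-beta": "context-1m-2025-08-07",
    "Authorization": "Bearer sk-litellm-...",
    "x-request-id": "abc123",
}
print(filter_passthrough_headers(headers))
# → {'anthropic-beta': 'context-1m-2025-08-07'}
```

The key property the tests above appear to assert: beta headers survive the hop, credentials and unrelated headers do not.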
shriharsha98 added a commit to juspay/litellm that referenced this pull request on Feb 10, 2026
* fix: lint
* fix: prevent MCP type objects from being captured in locals()
* Fix: BerriAI#19089 websocket version error
* Fix: handling of model name in query param
* Add logger to see websocket request
* Fix: code quality tests
* Fix: code quality tests
* docs(logging.md): add guide for mounting custom callbacks in Helm/K8s (BerriAI#19136)
* docs: new cookbook .md for claude code allows rendering on models.litellm.ai
* refactor: rename file for consistency
* Add user auth in standard logging object for bedrock passthrough
* remove unused space
* Fix: test router
* fix(index.json): have a parent index.json and just link out to guides from docs maintain just 1 place for tutorials
* Fix: tests/test_litellm/proxy/test_litellm_pre_call_utils.py::test_embedding_header_forwarding_with_model_group
* Fix: tests/test_litellm/proxy/test_proxy_server.py::test_embedding_input_array_of_tokens
* Fix: response enterprise tests
* docs(claude_mcp.md): separate claude mcp tutorial into a separate doc easier to surface
* docs: update index.json
* fix: mount config.yaml as single file in Helm chart (BerriAI#19146)
* Fix: response enterprise tests
* Fix: mock test tests
* Fix: mock test tests
* Adjust icons for buttons
* fixing build
* Added ability to customize logfire base url through env var (BerriAI#19148)
* Added ability to customize logfire base url through env var
* Added test to check if env var is used correctly for logfire
* Document the env var
* Documented env var in config_settings.md
* Litellm dev 01 15 2026 p1 (BerriAI#19153)
* fix: safely handle unmapped call type
* docs: cleanup links for ai coding tools
* docs(claude_non_anthropic_models.md): add tutorial showing non anthropic model connection to claude code
* docs: link to non-anthropic model tutorial for claude code
* docs: document more tutorials on website
* chore: remove unused test files from repository root (BerriAI#19150) Remove orphaned test files that are not referenced in any tests or code: flux2_test_image.png, test_generic_guardrail_config.yaml, test_image_edit.png (root only, tests/image_gen_tests/ copy preserved), document.txt, batch_small.jsonl (root and tests/batches_tests/)
* Chore: bump boto3 version (BerriAI#19090)
* Add pricing for volcengine models (deepseek-v3-2, glm-4-7, kimi-k2-thinking) (BerriAI#19076) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* Make keepalive_timeout parameter work for Gunicorn (BerriAI#19087)
* [Fix] Containers API - Allow routing to regional endpoints (BerriAI#19118)
* fix get_complete_url
* fix url resolution containers API
* TestContainerRegionalApiBase
* feat(proxy): add keepalive_timeout support for Gunicorn server. Add configurable keepalive timeout parameter for Gunicorn workers to match existing Uvicorn functionality. This allows users to tune the keep-alive connection timeout based on their deployment requirements. Changes: add keepalive_timeout parameter to _run_gunicorn_server method; configure Gunicorn's keepalive setting (defaults to 90s if not specified); update --keepalive_timeout CLI help text to document both Uvicorn and Gunicorn behavior; pass keepalive_timeout from run_server to _run_gunicorn_server. Tests: verify keepalive_timeout flag is properly passed to Gunicorn; verify default 90s timeout when flag is not specified. Co-Authored-By: lizhen921 <294474470@qq.com> Signed-off-by: Kris Xia <xiajiayi0506@gmail.com> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
* Update prisma_migration.py (BerriAI#19083)
* fix: model-level guardrails not taking effect (BerriAI#18363) (BerriAI#18895)
* fix: model-level guardrails not taking effect (BerriAI#18363)
* fix(proxy): add support event-based deployment hooks
* fix(proxy): add type safety check for guardrails
* fix: models loadbalancing billing issue by filter (BerriAI#18891)
* fix: models loadbalancing billing issue by filter
* fix: separate key and team access groups in metadata
* fix: video status/content credential injection for wildcard models (BerriAI#18854)
* fix: video status/content credential injection for wildcard models. When using wildcard model patterns like `vertex_ai/*`, the video status and content endpoints failed to resolve the model_name correctly, causing credential injection to be skipped. Changes: router.py: added `custom_llm_provider` parameter to `resolve_model_name_from_model_id` method; router.py: added Strategy 2 (provider prefix matching) and Strategy 4 (wildcard pattern matching); endpoints.py: pass `provider_from_id` to resolver in video_status, video_content, and video_remix endpoints. This allows video_id like `vertex_ai:veo-3.0-generate-preview:...` to correctly match the `vertex_ai/*` wildcard pattern and inject credentials from the model config. Fixes: video status returns "Your default credentials were not found" when using Vertex AI video generation with wildcard model patterns.
* pr18845-video feature bugfix (vibe-kanban e43e2d2d) — responding to PR comments. I forked litellm, created a branch, did the work, and opened a pull request, and got feedback on it. I need to read through that feedback and continue the work on my PR branch. https://github.com/BerriAI/litellm/pull/18854#discussion_r2677026995 — read this, assess the current state, and continue. They asked for test code, so after writing the tests I'll run the test command locally once, then commit and push. litellm has a document for pull requests: https://docs.litellm.ai/docs/extras/contributing_code. The CLA is signed; from here I should follow the rest of the template. For now I just fixed the bug directly and opened the PR.
* fix: resolve mypy type error in resolve_model_name_from_model_id. Rename loop variable to avoid type conflict between DeploymentTypedDict and Dict[Any, Any] from pattern_router.route() return type. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Reusable model select
* fixing build
* fixing build 2
* Fix Azure embeddings JSON parsing to prevent connection leaks and ensure proper router cooldown (BerriAI#19167)
* Revert "[Feat] Add support for 0 cost models"
* [Feat] Add support for Tool Search on /messages API - Azure, Bedrock, Anthropic API (BerriAI#19165)
* fix _update_headers_with_anthropic_beta
* init ANTHROPIC_BETA_HEADER_VALUES
* fix ANTHROPIC_BETA_HEADER_VALUES
* fix: _update_headers_with_anthropic_beta - anthropic API
* init _update_headers_with_anthropic_beta - azure AI support
* init VertexAIPartnerModelsAnthropicMessagesConfig
* fix _get_total_tokens_from_usage
* working TestBedrockInvokeToolSearch
* fix get_extra_headers
* TestBedrockInvokeToolSearch
* _get_tool_search_beta_header_for_bedrock
* fix mypy linting
* test fix
* test fix
* test: fix router_code_coverage test fail
* chore: fix lint error
* ensuring this still works PENDING PROXY EXTRAS
* test: fix missing test
* [Feat] Claude Code - Add End-user tracking with Claude Code (BerriAI#19171)
* add claude code customer usage tracking
* fix get end user tracking claude code
* TestGetCustomerIdFromStandardHeaders
* Revert "fix(gemini): dereference $defs/$ref in tool response content (BerriAI#19062)" This reverts commit 84dad95.
* chore: document temporary grype ignore for CVE-2026-22184
* Revert "[Fix] /team/daily/activity Show Internal Users Their Spend Only"
* [Docs Guide] Litellm claude code end user tracking (BerriAI#19176)
* add to sidebar
* v1 guide
* guide claude granular cost tracking
* docs fix
* fix gcp glm-4.7 pricing (BerriAI#19172)
* Improve documentation for routing LLM calls via SAP Gen AI Hub (BerriAI#19166)
* fix(sap): resolve JSON serialization error and update documentation. Fix 'Object of type cached_property is not JSON serializable' error; replace @cached_property with manual caching in deployment_url; update documentation examples to match sap_proxy_config.yaml; add Anthropic model naming clarification (anthropic-- prefix); improve authentication documentation with tabbed interface. Fixes critical bug preventing SAP Gen AI Hub integration from working. Fully tested with both chat and embedding endpoints.
* docs: update SAP provider documentation
* Update SAP provider documentation with better setup instructions. Rewrote the SAP docs to make it easier for users to get started. Added a quick start section, clarified the authentication options, explained model naming differences between SDK and proxy usage, and included some troubleshooting tips.
* Revert transformation files - keep only documentation changes
* Revert "[Perf] Remove premature model.dump call on the hot path (BerriAI#19109)" This reverts commit b352d0d.
* chore: add zlib to allow list
* fix(bedrock): strip throughput tier suffixes from model names (BerriAI#19147) Co-authored-by: Greek, John <jgreek@users.noreply.github.com>
* chore: update jaraco
* tests: skip Azure SDK init check for acreate_skill
* Fix: test_stream_chunk_builder_litellm_mixed_calls
* chore: force jaraco.context 6.1.0 in runtime images
* chore: move install jaraco.context
* test: handle wildcard routes in route validation test
* Fix: test_streaming_multiple_partial_tool_calls
* docs: add redis initialization with kwargs
* chore: pip install upgrade
* chore: pip install force-reinstall
* chore: address jaraco.context path traversal vulnerability (GHSA-58pv-8j8x-9vj2)
* Add fallback endpoints support
* Fix get_combined_tool_content Too many statements (70 > 50)
* Add medium value support for detail param for gemini
* chore: add jaraco liccheck
* Team Settings Model Select
* adding mocks
* bump: version 1.80.16 → 1.80.17
* refactor team member icon buttons
* Fix: [Bug]: stream_timeout: the function of this parameter has been changed
* Add sanititzation for anthropic messages
* Add docs for message sanitisation
* Fix: revert get_combined_tool_content
* Fix: revert get_combined_tool_content
* Fix malformed tool call transform
* fix: updated all 27 occurrences of mode: image_edit to mode: image_edits
* fix: image_edits request handling fails for Stability models
* fix documentation
* Fix mypy issues
* chore: add ALLOWED_CVES, because Wolfi glibc is still flagged even on 2.42-r5
* Fix: vertex ai doesn't support structured output
* Revert "fix: models loadbalancing billing issue by filter (BerriAI#18891)" This reverts commit 41d8f79.
* Fix: add async_get_available_deployment_for_pass_through in code tests
* Fix boto3 conflicting dependency
* Potential fix for code scanning alert no. 3990: Clear-text logging of sensitive information. Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* Fix model map
* Fix all mypy issues
* Add azure/gpt-5.2-codex
* ci/cd fixes
* fix stability mode
* only show own internal user usage
* fix: correct budget limit validation operator (>=) for team members (BerriAI#19207)
* ci(github): add automated duplicate issue checker and template safeguards (BerriAI#19218)
* bump: version 0.4.21 → 0.4.22
* Adding build artifacts
* fix(vertex_ai): Vertex AI 400 Error: Model used by GenerateContent request (models/gemini-3-*) and CachedContent (models/gemini-3-*) has to be the same (BerriAI#19193)
* fix(vertex_ai): include model in context cache key generation
* test(vertex_ai): update context caching tests to verify model in cache key
* fix(logging): Include langfuse logger in JSON logging when langfuse callback is used (BerriAI#19162). When JSON_LOGS is enabled and langfuse is configured as a success/failure callback, the langfuse logger now receives the JSON formatter. This ensures langfuse SDK log messages (like 'Item exceeds size limit' warnings) are output as JSON with proper level information, instead of plain text that log aggregators may incorrectly classify as errors. Fixes issue where langfuse warnings appeared as errors in Datadog due to missing log level in unformatted output. Co-authored-by: openhands <openhands@all-hands.dev>
* fixing tests
* Revert "[Fix] /user/new Privilege Escalation"
* Revert "Add sanititzation for anthropic messages"
* Revert "Fix: malformed tool call transformation in bedrock"
* bump: version 0.4.21 → 0.4.22
* Revert "Stabilise mock tests"
* Revert "Litellm staging 01 15 2026"
* Revert "[Feature] Deleted Keys and Deleted Teams Table"
* Manual revert BerriAI#19078
* feat: add auto-labeling for 'claude code' issues (BerriAI#19242)
* Revert "Revert "[Feature] Deleted Keys and Deleted Teams Table""
* bump: version 0.4.22 → 0.4.23
* adding migration
* [Docs] Litellm architecture fixes 2 (BerriAI#19252)
* fixes 1
* docs fix
* docs fix
* docs fix
* docs fix
* /public/model_hub health information
* Public Model Hub Health UI
* fix: ci test gemini 2.5 deprecated
* [Fix] - Reliability fix OOMs with image url handling (BerriAI#19257)
* fix MAX_IMAGE_URL_DOWNLOAD_SIZE_MB
* test_image_exceeds_size_limit_with_content_length
* fix: _process_image_response
* add constants 50MB
* fix convert_to_anthropic_image_obj image handling
* test_gemini_image_size_limit_exceeded
* MAX_IMAGE_URL_DOWNLOAD_SIZE_MB fix
* MAX_IMAGE_URL_DOWNLOAD_SIZE_MB
* test_image_size_limit_disabled
* async_convert_url_to_base64
* docs fix
* code QA check
* fix Exception
* Add status to /list in keys and teams
* adding tests
* Linting
* refresh keys on delete
* temp commit for branch switching
* fixing lint
* fixing test
* Fixing tests and adding proper returns
* linting
* [Feat] Claude Code - Add Websearch support using LiteLLM /search (using web search interception hook) (BerriAI#19263)
* init WebSearchInterceptionLogger
* test_websearch_interception_real_call
* init async_should_run_agentic_completion
* async_should_run_agentic_loop
* async_run_agentic_loop
* refactor folder
* fix organization
* WebSearchTransformation
* WebSearchInterceptionLogger
* _call_agentic_completion_hooks
* WebSearch Interception Architecture
* test_websearch_interception_real_call
* add streaming
* add transform_request for streaming
* get_llm_provider
* test fix
* fix info
* init from config.yaml
* fixes
* test handler
* fix _is_streaming_response
* async_run_agentic_loop
* mypy fix
* Deleted Teams
* Adding tests
* fixing tests
* feat(panw_prisma_airs): add custom violation message support
* Adjusting new badges
* building ui
* docs fix
* png fixes
* deleted teams endpoint fix
* rebuilding ui
* updating docker pull cmd
* fixing ui build
* docs ui usage
* docs fix
* fix doc
* docs clean up
* deleted keys and teams docs
* fix build attempt
* testing adding entire out
* [Feat] Claude Code x LiteLLM WebSearch - QA Fixes to work with Claude Code (BerriAI#19294)
* fix websearch_interception_converted_stream
* test_websearch_interception_no_tool_call_streaming
* FakeAnthropicMessagesStreamIterator
* LITELLM_WEB_SEARCH_TOOL_NAME
* fixes tools def for litellm web search
* fixes FakeAnthropicMessagesStreamIterator
* test_litellm_standard_websearch_tool
* use new hook for modifying before any transforms from litellm
* init WebSearchInterceptionLogger + ARCHITECTURE
* fix config.yaml
* init doc for claude code web search
* docs fix
* doc fix
* fix mypy linting
* test_router_fallbacks_with_custom_model_costs
* test_deepseek_mock_completion
* v1.81.0
* docs fix
* test_aiohttp_openai
* fix
* qa fixes
* docs fix
* docs fix
* docs fix
* docs fix
* docs fix
* [Fix] LiteLLM VertexAI Pass through - ensuring incoming headers are forwarded down to target (BerriAI#19524)
* test_vertex_passthrough_forwards_anthropic_beta_header
* add_incoming_headers
* [Fix] VertexAI Pass through - Ensure only anthropic betas are forwarded down to LLM API (BerriAI#19542)
* fix ALLOWED_VERTEX_AI_PASSTHROUGH_HEADERS
* test_vertex_passthrough_forwards_anthropic_beta_header
* fix test_vertex_passthrough_forwards_anthropic_beta_header
* test_vertex_passthrough_does_not_forward_litellm_auth_token
* fix utils
* Using Anthropic Beta Features on Vertex AI
* test_forward_headers_from_request_x_pass_prefix
* Fix: Handle PostgreSQL cached plan errors during rolling deployments (BerriAI#19424)
* Fix in-flight request termination on SIGTERM when health-check runs in a separate process (BerriAI#19427)
* Fix build errors after merge

---------
Signed-off-by: Kris Xia <xiajiayi0506@gmail.com>
Co-authored-by: Yuta Saito <uc4w6c@bma.biglobe.ne.jp>
Co-authored-by: YutaSaito <36355491+uc4w6c@users.noreply.github.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Vikash <master.bvik@gmail.com>
Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com>
Co-authored-by: burnerburnerburnerman <rharmhd@gmail.com>
Co-authored-by: 拐爷&&老拐瘦 <geyf@vip.qq.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Kris Xia <xiajiayi0506@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: lizhen921 <294474470@qq.com>
Co-authored-by: danielnyari-seon <daniel.nyari@seon.io>
Co-authored-by: choigawoon <choigawoon@gmail.com>
Co-authored-by: Alexsander Hamir <alexsanderhamirgomesbaptista@gmail.com>
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
Co-authored-by: Guilherme Segantini <guilherme.segantini@sap.com>
Co-authored-by: John Greek <2006605+jgreek@users.noreply.github.com>
Co-authored-by: Greek, John <jgreek@users.noreply.github.com>
Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Anand Kamble <anandmk837@gmail.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Harshit Jain <harshitjain0562@gmail.com>
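One item in the commit list above, "[Fix] - Reliability fix OOMs with image url handling (BerriAI#19257)", introduces a MAX_IMAGE_URL_DOWNLOAD_SIZE_MB cap together with a test named test_image_exceeds_size_limit_with_content_length. A rough sketch of that kind of pre-download guard — the function name and exact behavior here are assumptions for illustration, not LiteLLM's actual code:

```python
# Hypothetical sketch: reject oversized image downloads up front using the
# Content-Length response header, instead of buffering the whole body and
# risking an OOM. Constant name taken from the commit titles above.
MAX_IMAGE_URL_DOWNLOAD_SIZE_MB = 50


def check_image_size(response_headers: dict) -> None:
    """Raise before downloading if the server declares an oversized body.

    If Content-Length is absent, this check passes and a streaming read
    with a running byte counter would have to enforce the cap instead.
    """
    max_bytes = MAX_IMAGE_URL_DOWNLOAD_SIZE_MB * 1024 * 1024
    content_length = response_headers.get("Content-Length")
    if content_length is not None and int(content_length) > max_bytes:
        raise ValueError(
            f"Image exceeds {MAX_IMAGE_URL_DOWNLOAD_SIZE_MB}MB download limit"
        )
```

The companion test names (test_image_size_limit_disabled, test_gemini_image_size_limit_exceeded) suggest the cap is configurable and enforced per conversion helper.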
shriharsha98 added a commit to juspay/litellm that referenced this pull request on Feb 13, 2026
* [Fix] LiteLLM VertexAI Pass through - ensuring incoming headers are forwarded down to target (BerriAI#19524) * test_vertex_passthrough_forwards_anthropic_beta_header * add_incoming_headers * fix linting errors * fix lint * fix: Send litellm_trace_id to Langfuse to link LiteLLM logs with Langfuse logs * test: update langfuse trace_id tests to use litellm_trace_id * Fix virtual keys table sorting * Adding tests * feat: add GMI Cloud provider support (BerriAI#19376) * feat: add GMI Cloud provider support Add GMI Cloud as an OpenAI-compatible provider with: - Provider configuration in providers.json - Documentation page with usage examples - Model pricing for 16 models (Claude, GPT, DeepSeek, Gemini, etc.) - Sidebar entry for docs navigation * Add gmi_cloud to provider_endpoints_support.json Add provider entry to pass CI validation check that ensures all providers in openai_like/providers.json are documented. * Fix provider key: gmi_cloud -> gmi Match the provider key with providers.json --------- Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * Cut chat_completion latency by ~21% by reducing pre-call processing time (BerriAI#19535) * Adding scope to /models * e2e test internal viewer sidebar * Model Select for Create Team * create team model select * fixing build * [Fix] VertexAI Pass through - Ensure only anthropic betas are forwarded down to LLM API (BerriAI#19542) * fix ALLOWED_VERTEX_AI_PASSTHROUGH_HEADERS * test_vertex_passthrough_forwards_anthropic_beta_header * fix test_vertex_passthrough_forwards_anthropic_beta_header * test_vertex_passthrough_does_not_forward_litellm_auth_token * fix utils * Using Anthropic Beta Features on Vertex AI * test_forward_headers_from_request_x_pass_prefix * [Fix] VertexAI Pass through - Ensure only anthropic betas are forwarded down to LLM API (BerriAI#19542) * fix ALLOWED_VERTEX_AI_PASSTHROUGH_HEADERS * test_vertex_passthrough_forwards_anthropic_beta_header * fix test_vertex_passthrough_forwards_anthropic_beta_header 
* test_vertex_passthrough_does_not_forward_litellm_auth_token * fix utils * Using Anthropic Beta Features on Vertex AI * test_forward_headers_from_request_x_pass_prefix * fix(mcp): forward static_headers to MCP servers (BerriAI#19341) (BerriAI#19366) Forward static_headers from /mcp-rest/test/* routes into the MCP client so headers are present during session.initialize() and tool discovery. Also add a shared merge_mcp_headers() helper to keep header precedence consistent and ensure OpenAPI-to-MCP generated tools include static_headers. Tests: - pytest tests/test_litellm/proxy/_experimental/mcp_server/test_rest_endpoints.py - pytest tests/test_litellm/proxy/_experimental/mcp_server/test_mcp_server_manager.py -k register_openapi_tools_includes_static_headers Fixes BerriAI#19341 Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * fix(azure): preserve content_policy_violation details for images (BerriAI#19328) (BerriAI#19372) Azure OpenAI Images (DALL·E 3) returns policy violations as a structured payload under body["error"], including inner_error.content_filter_results and revised_prompt. 
LiteLLM previously: - Failed to extract nested error messages (get_error_message only handled body["message"]) - Missed policy violation detection when error strings were generic - Dropped inner_error details when raising ContentPolicyViolationError This change: - Extracts nested Azure error fields (code/type/message + inner_error) - Detects policy violations via structured error codes - Passes an OpenAI-style error body + provider_specific_fields to preserve details Tests: - python3 -m pytest tests/test_litellm/llms/azure/test_azure_exception_mapping.py - python3 -m pytest tests/test_litellm/litellm_core_utils/test_exception_mapping_utils.py Fixes BerriAI#19328 * [Feat] Add Structured output for /v1/messages with Anthropic API, Azure Anthropic API, Bedrock Converse (BerriAI#19545) * fix: add AnthropicMessagesRequestOptionalParams * add _update_headers_with_anthropic_beta * fix output format tests * test_structured_output_e2e * TestAnthropicAPIStructuredOutput * test_structured_output_e2e * fix BASE * TestAzureAnthropicStructuredOutput * fix: Bedrock Converse * add nthropic Messages Pass-Through Architecture * fix: bedrock invoke output_format * fix: transform_anthropic_messages_request for vertex anthropic * TestBedrockInvokeStructuredOutput * docs anthropic vertex * docs fix * docs fix * fixing prompt-security's guardrail implementation (BerriAI#19374) * Consolidated change * fix(prompt_security): update message processing to persist sanitized files and filter for API calls * fix per krrishdholakia suggestion * Fix/per service ssl override v2 (BerriAI#19538) * refactor(ssl): support per-service SSL verification overrides * add test cases for ssl * docs: update Claude Code integration guides (BerriAI#19415) * docs: document Claude Code default models and env var overrides - Update config example with current Claude Code 2.1.x model names - Add section documenting default models (sonnet/haiku) that Claude Code requests - Document env var overrides 
(ANTHROPIC_DEFAULT_SONNET_MODEL, etc.) - Show how model_name alias can route to any provider (Bedrock, Vertex, etc.) * Update docs Removed warning about changing model names in Claude Code versions. * docs: add 1M context support and improve Claude Code quickstart guide - Add comprehensive 1M context window documentation - Document [1m] suffix usage and shell escaping requirements - Clarify that LiteLLM config should NOT include [1m] in model names - Add standalone claude_code_1m_context.md guide - Improve model selection documentation with environment variables - Add section on default models used by Claude Code v2.1.14 - Add troubleshooting for 1M context issues - Reorganize to emphasize environment variables approach Addresses GitHub issue BerriAI#14444 * docs: reorder model selection options - prioritize --model over env vars - Move command line/session model selection to Option 1 (most reliable) - Move environment variables to Option 2 - Add note that env vars may be cached from previous session - Emphasize that --model always uses exact model specified * docs: reorganize 1M context section - separate command line from env vars - Split 1M context examples into two clear sections - Show command line usage first (--model and /model) - Show environment variables as alternative approach - Improves readability and emphasizes most reliable method * docs: remove misleading default models section from website tutorial - Remove 'Default Models Used by Claude Code' section (misleading) - Remove claim that config must match exact default model names - Update config comment to be more general - Add claude-opus-4-5-20251101 to example config - Keep authentication section as-is * docs: correct model selection in website tutorial - Remove incorrect claim that Claude Code automatically uses proxy models - Add explicit model selection examples with --model and /model - Show environment variables as alternative approach - Remove misleading comment about 'multiple configured' * 
docs: add 1M context section to website tutorial - Add section on using [1m] suffix for 1 million token context - Include warning about shell escaping (quotes required) - Explain how Claude Code handles [1m] internally - Add /context verification command - Note that LiteLLM config should NOT include [1m] * docs: add tip about using .env for API keys - Add note that ANTHROPIC_API_KEY can be stored in .env file - Clarifies alternative to exporting environment variables * add redisvl dependency to the root requiremnts.tx (BerriAI#19417) * [Fix] UI Cost Estimator - Fix model dropdown (BerriAI#19529) * add cost estimator * ui fix show errors * test_estimate_cost_resolves_router_model_alias * fix: UI 404 error when SERVER_ROOT_PATH is set (BerriAI#19467) * fix: add case-insensitive support for guardrail mode and actions (BerriAI#19480) * fix(bedrock): correct streaming choice index for tool calls (BerriAI#19506) Bedrock's contentBlockIndex identifies content blocks within a message (text=0, tool_call=1), not OpenAI's choice index (which varies with n>1). This caused OpenAI SDK's ChatCompletionAccumulator to fail when tool call chunks arrived on index 1 while finish_reason arrived on index 0. Bedrock doesn't support n>1 (no such parameter exists): https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InferenceConfiguration.html OpenAI choice index spec: https://platform.openai.com/docs/api-reference/chat/streaming * Fix Azure RPM calculation formula (BerriAI#19513) * Fix Azure RPM calculation formula * updated test * fix(azure response api): flatten tools for responses api to support nested definitions (BerriAI#19526) The Azure Responses API uses a different schema (flattened) for tools compared to the standard OpenAI/Azure Chat Completions API (nested). This caused a `BadRequestError` when users passed standard tool definitions. Changes: - Implemented tool flattening logic in `AzureOpenAIResponsesAPIConfig.transform_responses_api_request`. 
- Added comprehensive unit tests in test_azure_transformation.py to verify nested-to-flat transformation, pass-through of flat tools, and immutability. - Ensures cross-provider compatibility for tool definitions. Fixes BerriAI#19523 * Fix date overflow/division by zero in proxy utils (BerriAI#19527) * Fix date overflow/division by zero in proxy utils * Fix projected spend calculation * Strengthen projected spend tests * Fix Azure AI costs for Anthropic models (BerriAI#19530) * Fix Azure AI cost calculation * fixup * feat: Add MCP tools response to chat completions * feat: display mcp output on the play ground * Fix: generation config empty for batch * Add custom vertex ai mapping to the output * Add support for output formatfor bedrock invoke via v1/messages * feat: Limit stop sequence as per openai spec * Fix mypy error in litellm_staging_01_21_2026 * Fix: imagegeneration@006 has been deprecated * Fix : test_anthropic_via_responses_api * Fix: Responses API usage field type mismatch * Fix: Httpx timeout test failures * Fix: generationConfig removal from tests * fix: mypy error * comment code not used * feat: Add MCP tools response to chat completions * feat: display mcp output on the play ground * Fix batch tests * fix: mypy error * fix: mypy error * Fix:test_multiple_function_call * build(deps): bump lodash from 4.17.21 to 4.17.23 in /docs/my-website Bumps [lodash](https://github.com/lodash/lodash) from 4.17.21 to 4.17.23. - [Release notes](https://github.com/lodash/lodash/releases) - [Commits](lodash/lodash@4.17.21...4.17.23) --- updated-dependencies: - dependency-name: lodash dependency-version: 4.17.23 dependency-type: indirect ... 
Signed-off-by: dependabot[bot] <support@github.com> * Metrics prometheus user team count (BerriAI#19520) * add user count and team count prometheus metrics * rebase * revert mistaken deletion * fix ui build and mypy lint * Adding python3-dev to non root * adding node-tar cve allowlist * fix(websearch_interception): filter internal kwargs before follow-up request (BerriAI#19577) The websearch interception handler was passing internal flags like `_websearch_interception_converted_stream` to the follow-up LLM request. This caused "Extra inputs are not permitted" errors from providers like Bedrock that use strict Pydantic validation. Fix: Filter out all kwargs starting with `_websearch_interception` prefix before making the follow-up anthropic_messages.acreate() call. * skip brave tests * Fix unsafe access to request attribute (BerriAI#19573) * updating promethus tests * Fix non-root proxy tests * Adding lodash-es to allowlist * attempt fix translation tests * fix: change oss staging branch name to reflect they're oss * Revert "[Infra] UI - E2E Tests: Internal Viewer Sidebar" * Overriding lodash-es with version 4.17.23 in docs * updating lodash for dashboard * bump: version 1.81.1 → 1.81.2 * Add reusable model select to update organization page * Fixing tests * Adding EOS to finish reasons * Adding retries to flaky tests * add opencode tutorial (BerriAI#19602) * Fix org all proxy model case * adjust opencode tutorial (BerriAI#19605) * Add OSS Adopters section to README * fix: completions mcp output ordering * feat(helm): Enable PreStop hook configuration in values.yaml (BerriAI#19613) * Fix: litellm/tests/test_proxy_server_non_root.py * Update README.md * Update README.md * [Feat] New LiteLLM Policy engine - create policies to manage guardrails, conditions - permissions per Key, Team (BerriAI#19612) * init PolicyMatcher * TestPolicyMatcherGetMatchingPolicies * TestPolicyMatcherGetMatchingPolicies * feat: init PolicyResolver * init resolver types * 
init policy from config * inint PolicyValidator * validate policy * init Architecture Diagram * test_add_guardrails_from_policy_engine * init _init_policy_engine * test updates * test fixws * new attachment config * simplify types * TestPolicyResolverInheritance * fix policy resolver * fix policies * fix applied policy * docs fix * docs fix * fix linting + QA checks * fix linting + QA fixes * test fixes * docs fix * fix: pass through endpoints update registry (BerriAI#19420) * fix: pass through endpoints update registry * add test case, fix lint error and comment to avoid confusion * fix pass through endpoints test case * [Fix] Anthropic models on Azure AI cache pricing (BerriAI#19532) (BerriAI#19614) * Update README.md * fix: for test * All Models Backend Search * adding test * test: completions mcp output test * chore: fix lint error * test: Skip anthropic model test when ANTHROPIC_API_KEY is not set * fix: include tool arguments in proxy_server_request for spend logs callbacks * feat: hashicorp vault rotate support * Add tool choice mapping for giga chat * Fix: Responses API logging error for StopIteration * Fix: test_nova_invoke_streaming_chunk_parsing * Remove f string * fix BerriAI#19620: SSO user roles are not updated for existing users (BerriAI#19621) * Fix: SSO user roles are not updated for existing users Fixes BerriAI#19620 * Refactor: Remove redundant user_info retrieval in SSOAuthenticationHandler * Test: add new tests for user creation and updates in get_user_info_from_db * ci cd fixes - linting security * resetting poetry and requirements * fixing security checks * docs fix * fixing config * skipping flaky tests * skipping non root tests entirely * security scan * attempt fix flaky tests * fixing flaky tests * [Feat] Guardrail Policy Management - Allow using UI to manage guardrail policies (BerriAI#19668) * init UI * init schema.prisma * fix: policy_crud_router * UI fixes * update gitignore * working v0 for policy mgmt * fix: endpoints to resolve 
guardrails * fix code QA checks * ui build issues * schema fixes * fix checks * docs fix * remove imports from functions * add schema.prisma * add migrtion * fix schema.prisma * remove imports from functions * fix lint * BUMP pyproject * add spend-queue-troubleshooting docs (BerriAI#19659) * add spend-queue-troubleshooting docs * adjust spend-queue-troubleshooting docs * fix linting * New add fallbacks modal * adding tests * Add Langfuse mock mode for testing without API calls (BerriAI#19676) * Add GCS mock mode for testing without API calls (BerriAI#19683) * Adding router settings to create team and key * fixing build * fixing tests * perf: Optimize strip_trailing_slash with O(1) index check (BerriAI#19679) * perf: Optimize strip_trailing_slash with O(1) index check Replace rstrip("/") with direct index check for O(1) performance instead of O(n) string scanning. Results: - strip_trailing_slash: 311ms → 13ms (96% faster) - get_standard_logging_object_payload: 6.11s → 5.80s (5% faster) * Handle multiple trailing slashes in strip_trailing_slash Use rstrip for correctness when URL ends with "//" or more, otherwise use O(1) index check for single trailing slash. * Fixing tests * perf: Optimize use_custom_pricing_for_model with set intersection (BerriAI#19677) * perf: Optimize use_custom_pricing_for_model with set intersection Cache CustomPricingLiteLLMParams.model_fields.keys() as a module-level frozenset and use set intersection to reduce loop iterations from 882k to 90k (only iterating over keys that exist in both sets). Performance improvement: 84% faster (6.3x speedup) - Before: 1.17s total, 65µs per call - After: 0.19s total, 10µs per call * Use .get() for defensive dictionary access * perf: skip pattern_router.route() for non-wildcard models (BerriAI#19664) Check "*" in model before calling pattern_router.route() to avoid unnecessary pattern matching for non-wildcard model configurations. 
* perf: Add LRU caching to get_model_info for faster cost lookups (BerriAI#19606) - Add @lru_cache decorator to get_model_info() and _cached_get_model_info_helper() - Update _invalidate_model_cost_lowercase_map() to clear these caches when model_cost changes - Update test to call cache invalidation after modifying litellm.model_cost Reduces get_model_cost_information from 46% to <1% of request handling time. * UI: new build * redirect to login on expired jwt * [Feat] UI + Backend - Allow adding policies on Keys/Teams + Viewing on Info panels (BerriAI#19688) * ui for policy mgmt * test_add_guardrails_from_policy_engine_accepts_dynamic_policies_and_pops_from_data * docs: add litellm-enterprise requirement for managed files (BerriAI#19689) * Update Gemini 2.0 Flash deprecation dates to March 31, 2026 (BerriAI#19592) Google announced that Gemini 2.0 Flash and Flash Lite models will be discontinued on March 31, 2026. Updated deprecation_date field for all affected model variants across different providers (vertex_ai, gemini, deepinfra, openrouter, vercel_ai_gateway). Models updated: - gemini-2.0-flash (added deprecation date) - gemini-2.0-flash-001 (updated from 2026-02-05) - gemini-2.0-flash-lite (added deprecation date) - gemini-2.0-flash-lite-001 (updated from 2026-02-25) All variants now correctly reflect the March 31, 2026 shutdown date. 
* fixing build * Fixing failing tests * deactivating non root tests * fixing arize tests * cache tests serial * fixing circleci config * fixing circleci config * Update OSS Adopters section with new table format * Fixing ruff check * bump: version 1.81.2 → 1.81.3 * chore: update Next.js build artifacts (2026-01-24 17:18 UTC, node v22.16.0) * CI/CD fixes - split local testing * fix: _apply_search_filter_to_models mypy linting * test_partner_models_httpx_streaming * test_web_search * Fix: log duplication when json_logs is enabled (BerriAI#19705) * fix: FLAKY tests * fix unstable tests * docs fix * docs fix * docs fix * docs fix * docs fix * test_get_default_unvicorn_init_args * fix flaky tests * test_hanging_request_azure * test_team_update_sc_2 * BUMP extras * test fixes * test fixes * test_retrieve_container_basic * Model and Team filtering * TestBedrockInvokeToolSearch * fix(presidio): resolve runtime error by handling asyncio loops in bac… (BerriAI#19714) * fix(presidio): resolve runtime error by handling asyncio loops in background threads * add test case for thread safety * UI Keys Teams Router Settings docs * chore: update Next.js build artifacts (2026-01-25 00:27 UTC, node v22.16.0) * test_stream_transformation_error_sync * fix patch reliability mock tests * fix MCP tests * fix: server rooth path (BerriAI#19790) * feat: tpm-rpm limit in prometheus metrics (BerriAI#19725) Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * fix(proxy): support slashes in google generateContent model names (BerriAI#19737) * fix(proxy): support slashes in google route params * fix(proxy): extract google model ids with slashes * test(proxy): cover google model ids with slashes * fix(vertex_ai): support model names with slashes in passthrough URLs (BerriAI#19944) The regex in get_vertex_model_id_from_url() was using [^/:]+ which stopped at the first slash, truncating model names like 'gcp/google/gemini-2.5-flash' to just 'gcp'. 
This caused access_groups checks to fail for custom model names. Changed the pattern to [^:]+ to allow slashes in model names, only stopping at the colon before the action (e.g., :generateContent). * [Fix] VertexAI Pass through - fix regression that caused vertex ai passthroughs to stop working for router models (BerriAI#19967) * fix(vertex_ai): replace custom model names with actual Vertex AI model names in passthrough URLs (BerriAI#19948) When the passthrough URL already contains project and location, the code was skipping the deployment lookup and forwarding the URL as-is to Vertex AI. For custom model names like gcp/google/gemini-2.5-flash, Vertex AI returned 404 because it only knows the actual model name (gemini-2.5-flash). The fix makes the deployment lookup always run, so the custom model name gets replaced with the actual Vertex AI model name before forwarding. * add _resolve_vertex_model_from_router * fix: get_llm_provider * Potential fix for code scanning alert no. 4020: Clear-text logging of sensitive information Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> --------- Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * [Feat] - Search API add /list endpoint to list what search tools exist in router (BerriAI#19969) * feat: List all available search tools configured in the router. * add debugging search API * add debugging search API * perf(prometheus): parallelize budget metrics, fix caching bug, reduce CPU by ~40% (BerriAI#20544) * fix: revert httpx client caching that caused closed client errors AsyncHTTPHandler.__del__ was closing httpx clients still in use by AsyncOpenAI/AsyncAzureOpenAI due to independent cache lifecycles. Restores standalone httpx client creation for OpenAI/Azure providers. 
* Revert "Merge pull request BerriAI#18790 from BerriAI/litellm_key_team_routing_3" This reverts commit ae26d8e, reversing changes made to 864e8c6. * fix MYPY lint * fixed build errors after merge * least busy debug logs --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com> Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: YutaSaito <36355491+uc4w6c@users.noreply.github.com> Co-authored-by: Yuta Saito <uc4w6c@bma.biglobe.ne.jp> Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com> Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> Co-authored-by: Alexsander Hamir <alexsanderhamirgomesbaptista@gmail.com> Co-authored-by: jay prajapati <79649559+jayy-77@users.noreply.github.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: davida-ps <david.a@prompt.security> Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com> Co-authored-by: houdataali <84786211+houdataali@users.noreply.github.com> Co-authored-by: João Dinis Ferreira <hello@joaof.eu> Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com> Co-authored-by: Yogeshwaran Ravichandran <96047771+yogeshwaran10@users.noreply.github.com> Co-authored-by: Will Chen <willchen90@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Eric Cao <ecao310@gmail.com> Co-authored-by: mpcusack-altos <mcusack@altoslabs.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: John Greek <2006605+jgreek@users.noreply.github.com> Co-authored-by: xqe2011 <gz923553148@gmail.com> Co-authored-by: ryan-crabbe 
<128659760+ryan-crabbe@users.noreply.github.com> Co-authored-by: Harshit Jain <harshitjain0562@gmail.com> Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
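The `_websearch_interception` kwarg filtering mentioned in the squashed commits above amounts to dropping internal flags by prefix before the follow-up request, so strict-validation providers like Bedrock do not reject them as extra inputs. A minimal sketch (the function name is hypothetical; only the prefix comes from the commit message):

```python
def filter_internal_kwargs(kwargs: dict) -> dict:
    """Drop internal websearch-interception flags before the follow-up LLM call."""
    return {
        k: v
        for k, v in kwargs.items()
        if not k.startswith("_websearch_interception")
    }
```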
shriharsha98 added a commit to juspay/litellm that referenced this pull request on Feb 19, 2026
* Fix virtual keys table sorting * Adding tests * feat: add GMI Cloud provider support (BerriAI#19376) * feat: add GMI Cloud provider support Add GMI Cloud as an OpenAI-compatible provider with: - Provider configuration in providers.json - Documentation page with usage examples - Model pricing for 16 models (Claude, GPT, DeepSeek, Gemini, etc.) - Sidebar entry for docs navigation * Add gmi_cloud to provider_endpoints_support.json Add provider entry to pass CI validation check that ensures all providers in openai_like/providers.json are documented. * Fix provider key: gmi_cloud -> gmi Match the provider key with providers.json --------- Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * Cut chat_completion latency by ~21% by reducing pre-call processing time (BerriAI#19535) * Adding scope to /models * e2e test internal viewer sidebar * Model Select for Create Team * create team model select * fixing build * [Fix] VertexAI Pass through - Ensure only anthropic betas are forwarded down to LLM API (BerriAI#19542) * fix ALLOWED_VERTEX_AI_PASSTHROUGH_HEADERS * test_vertex_passthrough_forwards_anthropic_beta_header * fix test_vertex_passthrough_forwards_anthropic_beta_header * test_vertex_passthrough_does_not_forward_litellm_auth_token * fix utils * Using Anthropic Beta Features on Vertex AI * test_forward_headers_from_request_x_pass_prefix * fix(mcp): forward static_headers to MCP servers (BerriAI#19341) (BerriAI#19366) Forward static_headers from /mcp-rest/test/* routes into the MCP client so headers are present during
session.initialize() and tool discovery. Also add a shared merge_mcp_headers() helper to keep header precedence consistent and ensure OpenAPI-to-MCP generated tools include static_headers. Tests: - pytest tests/test_litellm/proxy/_experimental/mcp_server/test_rest_endpoints.py - pytest tests/test_litellm/proxy/_experimental/mcp_server/test_mcp_server_manager.py -k register_openapi_tools_includes_static_headers Fixes BerriAI#19341 Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * fix(azure): preserve content_policy_violation details for images (BerriAI#19328) (BerriAI#19372) Azure OpenAI Images (DALL·E 3) returns policy violations as a structured payload under body["error"], including inner_error.content_filter_results and revised_prompt. LiteLLM previously: - Failed to extract nested error messages (get_error_message only handled body["message"]) - Missed policy violation detection when error strings were generic - Dropped inner_error details when raising ContentPolicyViolationError This change: - Extracts nested Azure error fields (code/type/message + inner_error) - Detects policy violations via structured error codes - Passes an OpenAI-style error body + provider_specific_fields to preserve details Tests: - python3 -m pytest tests/test_litellm/llms/azure/test_azure_exception_mapping.py - python3 -m pytest tests/test_litellm/litellm_core_utils/test_exception_mapping_utils.py Fixes BerriAI#19328 * [Feat] Add Structured output for /v1/messages with Anthropic API, Azure Anthropic API, Bedrock Converse (BerriAI#19545) * fix: add AnthropicMessagesRequestOptionalParams * add _update_headers_with_anthropic_beta * fix output format tests * test_structured_output_e2e * TestAnthropicAPIStructuredOutput * test_structured_output_e2e * fix BASE * TestAzureAnthropicStructuredOutput * fix: Bedrock Converse * add nthropic Messages Pass-Through Architecture * fix: bedrock invoke output_format * fix: transform_anthropic_messages_request for vertex anthropic * 
TestBedrockInvokeStructuredOutput * docs anthropic vertex * docs fix * docs fix * fixing prompt-security's guardrail implementation (BerriAI#19374) * Consolidated change * fix(prompt_security): update message processing to persist sanitized files and filter for API calls * fix per krrishdholakia suggestion * Fix/per service ssl override v2 (BerriAI#19538) * refactor(ssl): support per-service SSL verification overrides * add test cases for ssl * docs: update Claude Code integration guides (BerriAI#19415) * docs: document Claude Code default models and env var overrides - Update config example with current Claude Code 2.1.x model names - Add section documenting default models (sonnet/haiku) that Claude Code requests - Document env var overrides (ANTHROPIC_DEFAULT_SONNET_MODEL, etc.) - Show how model_name alias can route to any provider (Bedrock, Vertex, etc.) * Update docs Removed warning about changing model names in Claude Code versions. * docs: add 1M context support and improve Claude Code quickstart guide - Add comprehensive 1M context window documentation - Document [1m] suffix usage and shell escaping requirements - Clarify that LiteLLM config should NOT include [1m] in model names - Add standalone claude_code_1m_context.md guide - Improve model selection documentation with environment variables - Add section on default models used by Claude Code v2.1.14 - Add troubleshooting for 1M context issues - Reorganize to emphasize environment variables approach Addresses GitHub issue BerriAI#14444 * docs: reorder model selection options - prioritize --model over env vars - Move command line/session model selection to Option 1 (most reliable) - Move environment variables to Option 2 - Add note that env vars may be cached from previous session - Emphasize that --model always uses exact model specified * docs: reorganize 1M context section - separate command line from env vars - Split 1M context examples into two clear sections - Show command line usage first (--model 
and /model) - Show environment variables as alternative approach - Improves readability and emphasizes most reliable method * docs: remove misleading default models section from website tutorial - Remove 'Default Models Used by Claude Code' section (misleading) - Remove claim that config must match exact default model names - Update config comment to be more general - Add claude-opus-4-5-20251101 to example config - Keep authentication section as-is * docs: correct model selection in website tutorial - Remove incorrect claim that Claude Code automatically uses proxy models - Add explicit model selection examples with --model and /model - Show environment variables as alternative approach - Remove misleading comment about 'multiple configured' * docs: add 1M context section to website tutorial - Add section on using [1m] suffix for 1 million token context - Include warning about shell escaping (quotes required) - Explain how Claude Code handles [1m] internally - Add /context verification command - Note that LiteLLM config should NOT include [1m] * docs: add tip about using .env for API keys - Add note that ANTHROPIC_API_KEY can be stored in .env file - Clarifies alternative to exporting environment variables * add redisvl dependency to the root requiremnts.tx (BerriAI#19417) * [Fix] UI Cost Estimator - Fix model dropdown (BerriAI#19529) * add cost estimator * ui fix show errors * test_estimate_cost_resolves_router_model_alias * fix: UI 404 error when SERVER_ROOT_PATH is set (BerriAI#19467) * fix: add case-insensitive support for guardrail mode and actions (BerriAI#19480) * fix(bedrock): correct streaming choice index for tool calls (BerriAI#19506) Bedrock's contentBlockIndex identifies content blocks within a message (text=0, tool_call=1), not OpenAI's choice index (which varies with n>1). This caused OpenAI SDK's ChatCompletionAccumulator to fail when tool call chunks arrived on index 1 while finish_reason arrived on index 0. 
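The Bedrock streaming fix described above effectively collapses content-block indices onto OpenAI's choice index 0, which is safe because Bedrock has no `n>1`. A hedged sketch with a hypothetical helper name (the real transformation happens inside LiteLLM's Bedrock chunk parser):

```python
def remap_choice_index(chunk: dict) -> dict:
    """Map Bedrock contentBlockIndex-derived indices onto OpenAI choice index 0.

    Bedrock's contentBlockIndex distinguishes content blocks within one message
    (text=0, tool_call=1); OpenAI's choice index distinguishes parallel
    completions. With n>1 unsupported, every chunk belongs to choice 0.
    """
    for choice in chunk.get("choices", []):
        choice["index"] = 0
    return chunk
```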
Bedrock doesn't support n>1 (no such parameter exists): https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InferenceConfiguration.html OpenAI choice index spec: https://platform.openai.com/docs/api-reference/chat/streaming * Fix Azure RPM calculation formula (BerriAI#19513) * Fix Azure RPM calculation formula * updated test * fix(azure response api): flatten tools for responses api to support nested definitions (BerriAI#19526) The Azure Responses API uses a different schema (flattened) for tools compared to the standard OpenAI/Azure Chat Completions API (nested). This caused a `BadRequestError` when users passed standard tool definitions. Changes: - Implemented tool flattening logic in `AzureOpenAIResponsesAPIConfig.transform_responses_api_request`. - Added comprehensive unit tests in test_azure_transformation.py to verify nested-to-flat transformation, pass-through of flat tools, and immutability. - Ensures cross-provider compatibility for tool definitions. Fixes BerriAI#19523 * Fix date overflow/division by zero in proxy utils (BerriAI#19527) * Fix date overflow/division by zero in proxy utils * Fix projected spend calculation * Strengthen projected spend tests * Fix Azure AI costs for Anthropic models (BerriAI#19530) * Fix Azure AI cost calculation * fixup * feat: Add MCP tools response to chat completions * feat: display mcp output on the play ground * Fix: generation config empty for batch * Add custom vertex ai mapping to the output * Add support for output formatfor bedrock invoke via v1/messages * feat: Limit stop sequence as per openai spec * Fix mypy error in litellm_staging_01_21_2026 * Fix: imagegeneration@006 has been deprecated * Fix : test_anthropic_via_responses_api * Fix: Responses API usage field type mismatch * Fix: Httpx timeout test failures * Fix: generationConfig removal from tests * fix: mypy error * comment code not used * feat: Add MCP tools response to chat completions * feat: display mcp output on the play ground * Fix 
batch tests * fix: mypy error * fix: mypy error * Fix:test_multiple_function_call * build(deps): bump lodash from 4.17.21 to 4.17.23 in /docs/my-website Bumps [lodash](https://github.com/lodash/lodash) from 4.17.21 to 4.17.23. - [Release notes](https://github.com/lodash/lodash/releases) - [Commits](lodash/lodash@4.17.21...4.17.23) --- updated-dependencies: - dependency-name: lodash dependency-version: 4.17.23 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>
* fix MYPY lint * fixed build errors after merge * added sandbox branch for gcr push (#61) * added sandbox branch for gcr push * jenkins setup for sbx * build fix * addding sync/v[0-9] branches for gcr push * build fix * least busy debug logs * Fix: remove x-anthropic-billing block * added backl anthropic envs * merge fixes * least busy router changes --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: YutaSaito <36355491+uc4w6c@users.noreply.github.com> Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com> Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> Co-authored-by: Alexsander Hamir <alexsanderhamirgomesbaptista@gmail.com> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: jay prajapati <79649559+jayy-77@users.noreply.github.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: davida-ps <david.a@prompt.security> Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com> Co-authored-by: houdataali <84786211+houdataali@users.noreply.github.com> Co-authored-by: João Dinis Ferreira <hello@joaof.eu> Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com> Co-authored-by: Yogeshwaran Ravichandran <96047771+yogeshwaran10@users.noreply.github.com> Co-authored-by: Will Chen <willchen90@gmail.com> Co-authored-by: Yuta Saito <uc4w6c@bma.biglobe.ne.jp> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Eric Cao <ecao310@gmail.com> Co-authored-by: mpcusack-altos <mcusack@altoslabs.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: John Greek <2006605+jgreek@users.noreply.github.com> Co-authored-by: xqe2011 <gz923553148@gmail.com> Co-authored-by: 
mubashir1osmani <mubashir.osmani777@gmail.com> Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com> Co-authored-by: Harshit Jain <harshitjain0562@gmail.com> Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Co-authored-by: pramodp-dotcom <pramod.p@juspay.in>
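The "fix: revert httpx client caching" entry in the commit message above describes a finalizer hazard worth spelling out: a cached wrapper that closes its HTTP client in `__del__` tears the client down for every other object still holding it, because the two caches have independent lifetimes. A minimal sketch of that failure mode (hypothetical `FakeClient`/`Handler` classes for illustration, not LiteLLM's actual `AsyncHTTPHandler` code):

```python
class FakeClient:
    """Stand-in for an httpx client: usable until close() is called."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def request(self):
        if self.closed:
            raise RuntimeError("client is closed")
        return "ok"


class Handler:
    """Stand-in for a cached wrapper that closes its client on GC."""

    def __init__(self, client):
        self.client = client

    def __del__(self):
        # Dangerous when self.client is shared: closing here breaks
        # every other holder of the same client object.
        self.client.close()


shared = FakeClient()
handler = Handler(shared)   # wrapper held in one cache
sdk_like_holder = shared    # e.g. an SDK object holding the same client

del handler  # cache eviction drops the last reference; __del__ runs

try:
    sdk_like_holder.request()
    outcome = "ok"
except RuntimeError:
    outcome = "closed client error"
```

Because CPython finalizes `handler` as soon as its last reference is dropped, the surviving holder immediately sees closed-client errors; the revert avoids this by giving the OpenAI/Azure providers standalone clients that no cache finalizer can close.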
shriharsha98 added a commit to juspay/litellm that referenced this pull request on Feb 23, 2026
* added sandbox branch for gcr push * jenkins setup for sbx * build fix * adding sync/v[0-9] branches for gcr push * build fix * Feature/upgrade to v1.81.3 stable (#63) * [Fix] LiteLLM VertexAI Pass through - ensuring incoming headers are forwarded down to target (BerriAI#19524) * test_vertex_passthrough_forwards_anthropic_beta_header * add_incoming_headers * fix linting errors * fix lint * fix: Send litellm_trace_id to Langfuse to link LiteLLM logs with Langfuse logs * test: update langfuse trace_id tests to use litellm_trace_id * Fix virtual keys table sorting * Adding tests * feat: add GMI Cloud provider support (BerriAI#19376) * feat: add GMI Cloud provider support Add GMI Cloud as an OpenAI-compatible provider with: - Provider configuration in providers.json - Documentation page with usage examples - Model pricing for 16 models (Claude, GPT, DeepSeek, Gemini, etc.) - Sidebar entry for docs navigation * Add gmi_cloud to provider_endpoints_support.json Add provider entry to pass CI validation check that ensures all providers in openai_like/providers.json are documented.
* Fix provider key: gmi_cloud -> gmi Match the provider key with providers.json --------- Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * Cut chat_completion latency by ~21% by reducing pre-call processing time (BerriAI#19535) * Adding scope to /models * e2e test internal viewer sidebar * Model Select for Create Team * create team model select * fixing build * [Fix] VertexAI Pass through - Ensure only anthropic betas are forwarded down to LLM API (BerriAI#19542) * fix ALLOWED_VERTEX_AI_PASSTHROUGH_HEADERS * test_vertex_passthrough_forwards_anthropic_beta_header * fix test_vertex_passthrough_forwards_anthropic_beta_header * test_vertex_passthrough_does_not_forward_litellm_auth_token * fix utils * Using Anthropic Beta Features on Vertex AI * test_forward_headers_from_request_x_pass_prefix * fix(mcp): forward static_headers to MCP servers (BerriAI#19341) (BerriAI#19366) Forward static_headers from /mcp-rest/test/* routes into the MCP client so headers are present during session.initialize() and tool discovery. Also add a shared merge_mcp_headers() helper to keep header precedence consistent and ensure OpenAPI-to-MCP generated tools include static_headers.
Tests: - pytest tests/test_litellm/proxy/_experimental/mcp_server/test_rest_endpoints.py - pytest tests/test_litellm/proxy/_experimental/mcp_server/test_mcp_server_manager.py -k register_openapi_tools_includes_static_headers Fixes BerriAI#19341 Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * fix(azure): preserve content_policy_violation details for images (BerriAI#19328) (BerriAI#19372) Azure OpenAI Images (DALL·E 3) returns policy violations as a structured payload under body["error"], including inner_error.content_filter_results and revised_prompt. LiteLLM previously: - Failed to extract nested error messages (get_error_message only handled body["message"]) - Missed policy violation detection when error strings were generic - Dropped inner_error details when raising ContentPolicyViolationError This change: - Extracts nested Azure error fields (code/type/message + inner_error) - Detects policy violations via structured error codes - Passes an OpenAI-style error body + provider_specific_fields to preserve details Tests: - python3 -m pytest tests/test_litellm/llms/azure/test_azure_exception_mapping.py - python3 -m pytest tests/test_litellm/litellm_core_utils/test_exception_mapping_utils.py Fixes BerriAI#19328 * [Feat] Add Structured output for /v1/messages with Anthropic API, Azure Anthropic API, Bedrock Converse (BerriAI#19545) * fix: add AnthropicMessagesRequestOptionalParams * add _update_headers_with_anthropic_beta * fix output format tests * test_structured_output_e2e * TestAnthropicAPIStructuredOutput * test_structured_output_e2e * fix BASE * TestAzureAnthropicStructuredOutput * fix: Bedrock Converse * add Anthropic Messages Pass-Through Architecture * fix: bedrock invoke output_format * fix: transform_anthropic_messages_request for vertex anthropic * TestBedrockInvokeStructuredOutput * docs anthropic vertex * docs fix * docs fix * fixing prompt-security's guardrail implementation (BerriAI#19374) * Consolidated change * fix(prompt_security): update
message processing to persist sanitized files and filter for API calls * fix per krrishdholakia suggestion * Fix/per service ssl override v2 (BerriAI#19538) * refactor(ssl): support per-service SSL verification overrides * add test cases for ssl * docs: update Claude Code integration guides (BerriAI#19415) * docs: document Claude Code default models and env var overrides - Update config example with current Claude Code 2.1.x model names - Add section documenting default models (sonnet/haiku) that Claude Code requests - Document env var overrides (ANTHROPIC_DEFAULT_SONNET_MODEL, etc.) - Show how model_name alias can route to any provider (Bedrock, Vertex, etc.) * Update docs Removed warning about changing model names in Claude Code versions. * docs: add 1M context support and improve Claude Code quickstart guide - Add comprehensive 1M context window documentation - Document [1m] suffix usage and shell escaping requirements - Clarify that LiteLLM config should NOT include [1m] in model names - Add standalone claude_code_1m_context.md guide - Improve model selection documentation with environment variables - Add section on default models used by Claude Code v2.1.14 - Add troubleshooting for 1M context issues - Reorganize to emphasize environment variables approach Addresses GitHub issue BerriAI#14444 * docs: reorder model selection options - prioritize --model over env vars - Move command line/session model selection to Option 1 (most reliable) - Move environment variables to Option 2 - Add note that env vars may be cached from previous session - Emphasize that --model always uses exact model specified * docs: reorganize 1M context section - separate command line from env vars - Split 1M context examples into two clear sections - Show command line usage first (--model and /model) - Show environment variables as alternative approach - Improves readability and emphasizes most reliable method * docs: remove misleading default models section from website tutorial - Remove 
'Default Models Used by Claude Code' section (misleading) - Remove claim that config must match exact default model names - Update config comment to be more general - Add claude-opus-4-5-20251101 to example config - Keep authentication section as-is * docs: correct model selection in website tutorial - Remove incorrect claim that Claude Code automatically uses proxy models - Add explicit model selection examples with --model and /model - Show environment variables as alternative approach - Remove misleading comment about 'multiple configured' * docs: add 1M context section to website tutorial - Add section on using [1m] suffix for 1 million token context - Include warning about shell escaping (quotes required) - Explain how Claude Code handles [1m] internally - Add /context verification command - Note that LiteLLM config should NOT include [1m] * docs: add tip about using .env for API keys - Add note that ANTHROPIC_API_KEY can be stored in .env file - Clarifies alternative to exporting environment variables * add redisvl dependency to the root requirements.txt (BerriAI#19417) * [Fix] UI Cost Estimator - Fix model dropdown (BerriAI#19529) * add cost estimator * ui fix show errors * test_estimate_cost_resolves_router_model_alias * fix: UI 404 error when SERVER_ROOT_PATH is set (BerriAI#19467) * fix: add case-insensitive support for guardrail mode and actions (BerriAI#19480) * fix(bedrock): correct streaming choice index for tool calls (BerriAI#19506) Bedrock's contentBlockIndex identifies content blocks within a message (text=0, tool_call=1), not OpenAI's choice index (which varies with n>1). This caused OpenAI SDK's ChatCompletionAccumulator to fail when tool call chunks arrived on index 1 while finish_reason arrived on index 0.
Bedrock doesn't support n>1 (no such parameter exists): https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InferenceConfiguration.html OpenAI choice index spec: https://platform.openai.com/docs/api-reference/chat/streaming * Fix Azure RPM calculation formula (BerriAI#19513) * Fix Azure RPM calculation formula * updated test * fix(azure response api): flatten tools for responses api to support nested definitions (BerriAI#19526) The Azure Responses API uses a different schema (flattened) for tools compared to the standard OpenAI/Azure Chat Completions API (nested). This caused a `BadRequestError` when users passed standard tool definitions. Changes: - Implemented tool flattening logic in `AzureOpenAIResponsesAPIConfig.transform_responses_api_request`. - Added comprehensive unit tests in test_azure_transformation.py to verify nested-to-flat transformation, pass-through of flat tools, and immutability. - Ensures cross-provider compatibility for tool definitions. Fixes BerriAI#19523 * Fix date overflow/division by zero in proxy utils (BerriAI#19527) * Fix date overflow/division by zero in proxy utils * Fix projected spend calculation * Strengthen projected spend tests * Fix Azure AI costs for Anthropic models (BerriAI#19530) * Fix Azure AI cost calculation * fixup * feat: Add MCP tools response to chat completions * feat: display mcp output on the play ground * Fix: generation config empty for batch * Add custom vertex ai mapping to the output * Add support for output format for bedrock invoke via v1/messages * feat: Limit stop sequence as per openai spec * Fix mypy error in litellm_staging_01_21_2026 * Fix: imagegeneration@006 has been deprecated * Fix: test_anthropic_via_responses_api * Fix: Responses API usage field type mismatch * Fix: Httpx timeout test failures * Fix: generationConfig removal from tests * fix: mypy error * comment code not used * feat: Add MCP tools response to chat completions * feat: display mcp output on the play ground * Fix
batch tests * fix: mypy error * fix: mypy error * Fix:test_multiple_function_call * build(deps): bump lodash from 4.17.21 to 4.17.23 in /docs/my-website Bumps [lodash](https://github.com/lodash/lodash) from 4.17.21 to 4.17.23. - [Release notes](https://github.com/lodash/lodash/releases) - [Commits](lodash/lodash@4.17.21...4.17.23) --- updated-dependencies: - dependency-name: lodash dependency-version: 4.17.23 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> * Metrics prometheus user team count (BerriAI#19520) * add user count and team count prometheus metrics * rebase * revert mistaken deletion * fix ui build and mypy lint * Adding python3-dev to non root * adding node-tar cve allowlist * fix(websearch_interception): filter internal kwargs before follow-up request (BerriAI#19577) The websearch interception handler was passing internal flags like `_websearch_interception_converted_stream` to the follow-up LLM request. This caused "Extra inputs are not permitted" errors from providers like Bedrock that use strict Pydantic validation. Fix: Filter out all kwargs starting with `_websearch_interception` prefix before making the follow-up anthropic_messages.acreate() call. 
* skip brave tests * Fix unsafe access to request attribute (BerriAI#19573) * updating promethus tests * Fix non-root proxy tests * Adding lodash-es to allowlist * attempt fix translation tests * fix: change oss staging branch name to reflect they're oss * Revert "[Infra] UI - E2E Tests: Internal Viewer Sidebar" * Overriding lodash-es with version 4.17.23 in docs * updating lodash for dashboard * bump: version 1.81.1 → 1.81.2 * Add reusable model select to update organization page * Fixing tests * Adding EOS to finish reasons * Adding retries to flaky tests * add opencode tutorial (BerriAI#19602) * Fix org all proxy model case * adjust opencode tutorial (BerriAI#19605) * Add OSS Adopters section to README * fix: completions mcp output ordering * feat(helm): Enable PreStop hook configuration in values.yaml (BerriAI#19613) * Fix: litellm/tests/test_proxy_server_non_root.py * Update README.md * Update README.md * [Feat] New LiteLLM Policy engine - create policies to manage guardrails, conditions - permissions per Key, Team (BerriAI#19612) * init PolicyMatcher * TestPolicyMatcherGetMatchingPolicies * TestPolicyMatcherGetMatchingPolicies * feat: init PolicyResolver * init resolver types * init policy from config * inint PolicyValidator * validate policy * init Architecture Diagram * test_add_guardrails_from_policy_engine * init _init_policy_engine * test updates * test fixws * new attachment config * simplify types * TestPolicyResolverInheritance * fix policy resolver * fix policies * fix applied policy * docs fix * docs fix * fix linting + QA checks * fix linting + QA fixes * test fixes * docs fix * fix: pass through endpoints update registry (BerriAI#19420) * fix: pass through endpoints update registry * add test case, fix lint error and comment to avoid confusion * fix pass through endpoints test case * [Fix] Anthropic models on Azure AI cache pricing (BerriAI#19532) (BerriAI#19614) * Update README.md * fix: for test * All Models Backend Search * adding test * test: 
completions mcp output test * chore: fix lint error * test: Skip anthropic model test when ANTHROPIC_API_KEY is not set * fix: include tool arguments in proxy_server_request for spend logs callbacks * feat: hashicorp vault rotate support * Add tool choice mapping for giga chat * Fix: Responses API logging error for StopIteration * Fix: test_nova_invoke_streaming_chunk_parsing * Remove f string * fix BerriAI#19620: SSO user roles are not updated for existing users (BerriAI#19621) * Fix: SSO user roles are not updated for existing users Fixes BerriAI#19620 * Refactor: Remove redundant user_info retrieval in SSOAuthenticationHandler * Test: add new tests for user creation and updates in get_user_info_from_db * ci cd fixes - linting security * resetting poetry and requirements * fixing security checks * docs fix * fixing config * skipping flaky tests * skipping non root tests entirely * security scan * attempt fix flaky tests * fixing flaky tests * [Feat] Guardrail Policy Management - Allow using UI to manage guardrail policies (BerriAI#19668) * init UI * init schema.prisma * fix: policy_crud_router * UI fixes * update gitignore * working v0 for policy mgmt * fix: endpoints to resolve guardrails * fix code QA checks * ui build issues * schema fixes * fix checks * docs fix * remove imports from functions * add schema.prisma * add migrtion * fix schema.prisma * remove imports from functions * fix lint * BUMP pyproject * add spend-queue-troubleshooting docs (BerriAI#19659) * add spend-queue-troubleshooting docs * adjust spend-queue-troubleshooting docs * fix linting * New add fallbacks modal * adding tests * Add Langfuse mock mode for testing without API calls (BerriAI#19676) * Add GCS mock mode for testing without API calls (BerriAI#19683) * Adding router settings to create team and key * fixing build * fixing tests * perf: Optimize strip_trailing_slash with O(1) index check (BerriAI#19679) * perf: Optimize strip_trailing_slash with O(1) index check Replace rstrip("/") 
with direct index check for O(1) performance instead of O(n) string scanning. Results: - strip_trailing_slash: 311ms → 13ms (96% faster) - get_standard_logging_object_payload: 6.11s → 5.80s (5% faster) * Handle multiple trailing slashes in strip_trailing_slash Use rstrip for correctness when URL ends with "//" or more, otherwise use O(1) index check for single trailing slash. * Fixing tests * perf: Optimize use_custom_pricing_for_model with set intersection (BerriAI#19677) * perf: Optimize use_custom_pricing_for_model with set intersection Cache CustomPricingLiteLLMParams.model_fields.keys() as a module-level frozenset and use set intersection to reduce loop iterations from 882k to 90k (only iterating over keys that exist in both sets). Performance improvement: 84% faster (6.3x speedup) - Before: 1.17s total, 65µs per call - After: 0.19s total, 10µs per call * Use .get() for defensive dictionary access * perf: skip pattern_router.route() for non-wildcard models (BerriAI#19664) Check "*" in model before calling pattern_router.route() to avoid unnecessary pattern matching for non-wildcard model configurations. * perf: Add LRU caching to get_model_info for faster cost lookups (BerriAI#19606) - Add @lru_cache decorator to get_model_info() and _cached_get_model_info_helper() - Update _invalidate_model_cost_lowercase_map() to clear these caches when model_cost changes - Update test to call cache invalidation after modifying litellm.model_cost Reduces get_model_cost_information from 46% to <1% of request handling time. 
* UI: new build * redirect to login on expired jwt * [Feat] UI + Backend - Allow adding policies on Keys/Teams + Viewing on Info panels (BerriAI#19688) * ui for policy mgmt * test_add_guardrails_from_policy_engine_accepts_dynamic_policies_and_pops_from_data * docs: add litellm-enterprise requirement for managed files (BerriAI#19689) * Update Gemini 2.0 Flash deprecation dates to March 31, 2026 (BerriAI#19592) Google announced that Gemini 2.0 Flash and Flash Lite models will be discontinued on March 31, 2026. Updated deprecation_date field for all affected model variants across different providers (vertex_ai, gemini, deepinfra, openrouter, vercel_ai_gateway). Models updated: - gemini-2.0-flash (added deprecation date) - gemini-2.0-flash-001 (updated from 2026-02-05) - gemini-2.0-flash-lite (added deprecation date) - gemini-2.0-flash-lite-001 (updated from 2026-02-25) All variants now correctly reflect the March 31, 2026 shutdown date. * fixing build * Fixing failing tests * deactivating non root tests * fixing arize tests * cache tests serial * fixing circleci config * fixing circleci config * Update OSS Adopters section with new table format * Fixing ruff check * bump: version 1.81.2 → 1.81.3 * chore: update Next.js build artifacts (2026-01-24 17:18 UTC, node v22.16.0) * CI/CD fixes - split local testing * fix: _apply_search_filter_to_models mypy linting * test_partner_models_httpx_streaming * test_web_search * Fix: log duplication when json_logs is enabled (BerriAI#19705) * fix: FLAKY tests * fix unstable tests * docs fix * docs fix * docs fix * docs fix * docs fix * test_get_default_unvicorn_init_args * fix flaky tests * test_hanging_request_azure * test_team_update_sc_2 * BUMP extras * test fixes * test fixes * test_retrieve_container_basic * Model and Team filtering * TestBedrockInvokeToolSearch * fix(presidio): resolve runtime error by handling asyncio loops in bac… (BerriAI#19714) * fix(presidio): resolve runtime error by handling asyncio loops in background 
threads * add test case for thread safety * UI Keys Teams Router Settings docs * chore: update Next.js build artifacts (2026-01-25 00:27 UTC, node v22.16.0) * test_stream_transformation_error_sync * fix patch reliability mock tests * fix MCP tests * fix: server rooth path (BerriAI#19790) * feat: tpm-rpm limit in prometheus metrics (BerriAI#19725) Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * fix(proxy): support slashes in google generateContent model names (BerriAI#19737) * fix(proxy): support slashes in google route params * fix(proxy): extract google model ids with slashes * test(proxy): cover google model ids with slashes * fix(vertex_ai): support model names with slashes in passthrough URLs (BerriAI#19944) The regex in get_vertex_model_id_from_url() was using [^/:]+ which stopped at the first slash, truncating model names like 'gcp/google/gemini-2.5-flash' to just 'gcp'. This caused access_groups checks to fail for custom model names. Changed the pattern to [^:]+ to allow slashes in model names, only stopping at the colon before the action (e.g., :generateContent). * [Fix] VertexAI Pass through - fix regression that caused vertex ai passthroughs to stop working for router models (BerriAI#19967) * fix(vertex_ai): replace custom model names with actual Vertex AI model names in passthrough URLs (BerriAI#19948) When the passthrough URL already contains project and location, the code was skipping the deployment lookup and forwarding the URL as-is to Vertex AI. For custom model names like gcp/google/gemini-2.5-flash, Vertex AI returned 404 because it only knows the actual model name (gemini-2.5-flash). The fix makes the deployment lookup always run, so the custom model name gets replaced with the actual Vertex AI model name before forwarding. * add _resolve_vertex_model_from_router * fix: get_llm_provider * Potential fix for code scanning alert no. 
4020: Clear-text logging of sensitive information Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> --------- Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * [Feat] - Search API add /list endpoint to list what search tools exist in router (BerriAI#19969) * feat: List all available search tools configured in the router. * add debugging search API * add debugging search API * perf(prometheus): parallelize budget metrics, fix caching bug, reduce CPU by ~40% (BerriAI#20544) * fix: revert httpx client caching that caused closed client errors AsyncHTTPHandler.__del__ was closing httpx clients still in use by AsyncOpenAI/AsyncAzureOpenAI due to independent cache lifecycles. Restores standalone httpx client creation for OpenAI/Azure providers. * Revert "Merge pull request BerriAI#18790 from BerriAI/litellm_key_team_routing_3" This reverts commit ae26d8e, reversing changes made to 864e8c6. 
* fix MYPY lint * fixed build errors after merge * least busy debug logs --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com> Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: YutaSaito <36355491+uc4w6c@users.noreply.github.com> Co-authored-by: Yuta Saito <uc4w6c@bma.biglobe.ne.jp> Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com> Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> Co-authored-by: Alexsander Hamir <alexsanderhamirgomesbaptista@gmail.com> Co-authored-by: jay prajapati <79649559+jayy-77@users.noreply.github.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: davida-ps <david.a@prompt.security> Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com> Co-authored-by: houdataali <84786211+houdataali@users.noreply.github.com> Co-authored-by: João Dinis Ferreira <hello@joaof.eu> Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com> Co-authored-by: Yogeshwaran Ravichandran <96047771+yogeshwaran10@users.noreply.github.com> Co-authored-by: Will Chen <willchen90@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Eric Cao <ecao310@gmail.com> Co-authored-by: mpcusack-altos <mcusack@altoslabs.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: John Greek <2006605+jgreek@users.noreply.github.com> Co-authored-by: xqe2011 <gz923553148@gmail.com> Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com> Co-authored-by: Harshit Jain <harshitjain0562@gmail.com> Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> Co-authored-by: 
Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * Sync/v1.81.3 stable (#67) * Fix virtual keys table sorting * Adding tests * feat: add GMI Cloud provider support (BerriAI#19376) * feat: add GMI Cloud provider support Add GMI Cloud as an OpenAI-compatible provider with: - Provider configuration in providers.json - Documentation page with usage examples - Model pricing for 16 models (Claude, GPT, DeepSeek, Gemini, etc.) - Sidebar entry for docs navigation * Add gmi_cloud to provider_endpoints_support.json Add provider entry to pass CI validation check that ensures all providers in openai_like/providers.json are documented. * Fix provider key: gmi_cloud -> gmi Match the provider key with providers.json --------- Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * Cut chat_completion latency by ~21% by reducing pre-call processing time (BerriAI#19535) * Adding scope to /models * e2e test internal viewer sidebar * Model Select for Create Team * create team model select * fixing build * [Fix] VertexAI Pass through - Ensure only anthropic betas are forwarded down to LLM API (BerriAI#19542) * fix ALLOWED_VERTEX_AI_PASSTHROUGH_HEADERS * test_vertex_passthrough_forwards_anthropic_beta_header * fix test_vertex_passthrough_forwards_anthropic_beta_header * test_vertex_passthrough_does_not_forward_litellm_auth_token * fix utils * Using Anthropic Beta Features on Vertex AI * test_forward_headers_from_request_x_pass_prefix * [Fix] VertexAI Pass through - Ensure only anthropic betas are forwarded down to LLM API (BerriAI#19542) * fix ALLOWED_VERTEX_AI_PASSTHROUGH_HEADERS * test_vertex_passthrough_forwards_anthropic_beta_header * fix test_vertex_passthrough_forwards_anthropic_beta_header * test_vertex_passthrough_does_not_forward_litellm_auth_token * fix utils * Using Anthropic Beta Features on Vertex AI * test_forward_headers_from_request_x_pass_prefix * fix(mcp): forward static_headers to MCP servers 
(BerriAI#19341) (BerriAI#19366) Forward static_headers from /mcp-rest/test/* routes into the MCP client so headers are present during session.initialize() and tool discovery. Also add a shared merge_mcp_headers() helper to keep header precedence consistent and ensure OpenAPI-to-MCP generated tools include static_headers. Tests: - pytest tests/test_litellm/proxy/_experimental/mcp_server/test_rest_endpoints.py - pytest tests/test_litellm/proxy/_experimental/mcp_server/test_mcp_server_manager.py -k register_openapi_tools_includes_static_headers Fixes BerriAI#19341 Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * fix(azure): preserve content_policy_violation details for images (BerriAI#19328) (BerriAI#19372) Azure OpenAI Images (DALL·E 3) returns policy violations as a structured payload under body["error"], including inner_error.content_filter_results and revised_prompt. LiteLLM previously: - Failed to extract nested error messages (get_error_message only handled body["message"]) - Missed policy violation detection when error strings were generic - Dropped inner_error details when raising ContentPolicyViolationError This change: - Extracts nested Azure error fields (code/type/message + inner_error) - Detects policy violations via structured error codes - Passes an OpenAI-style error body + provider_specific_fields to preserve details Tests: - python3 -m pytest tests/test_litellm/llms/azure/test_azure_exception_mapping.py - python3 -m pytest tests/test_litellm/litellm_core_utils/test_exception_mapping_utils.py Fixes BerriAI#19328 * [Feat] Add Structured output for /v1/messages with Anthropic API, Azure Anthropic API, Bedrock Converse (BerriAI#19545) * fix: add AnthropicMessagesRequestOptionalParams * add _update_headers_with_anthropic_beta * fix output format tests * test_structured_output_e2e * TestAnthropicAPIStructuredOutput * test_structured_output_e2e * fix BASE * TestAzureAnthropicStructuredOutput * fix: Bedrock Converse * add nthropic Messages 
Pass-Through Architecture * fix: bedrock invoke output_format * fix: transform_anthropic_messages_request for vertex anthropic * TestBedrockInvokeStructuredOutput * docs anthropic vertex * docs fix * docs fix * fixing prompt-security's guardrail implementation (BerriAI#19374) * Consolidated change * fix(prompt_security): update message processing to persist sanitized files and filter for API calls * fix per krrishdholakia suggestion * Fix/per service ssl override v2 (BerriAI#19538) * refactor(ssl): support per-service SSL verification overrides * add test cases for ssl * docs: update Claude Code integration guides (BerriAI#19415) * docs: document Claude Code default models and env var overrides - Update config example with current Claude Code 2.1.x model names - Add section documenting default models (sonnet/haiku) that Claude Code requests - Document env var overrides (ANTHROPIC_DEFAULT_SONNET_MODEL, etc.) - Show how model_name alias can route to any provider (Bedrock, Vertex, etc.) * Update docs Removed warning about changing model names in Claude Code versions. 
* docs: add 1M context support and improve Claude Code quickstart guide - Add comprehensive 1M context window documentation - Document [1m] suffix usage and shell escaping requirements - Clarify that LiteLLM config should NOT include [1m] in model names - Add standalone claude_code_1m_context.md guide - Improve model selection documentation with environment variables - Add section on default models used by Claude Code v2.1.14 - Add troubleshooting for 1M context issues - Reorganize to emphasize environment variables approach Addresses GitHub issue BerriAI#14444 * docs: reorder model selection options - prioritize --model over env vars - Move command line/session model selection to Option 1 (most reliable) - Move environment variables to Option 2 - Add note that env vars may be cached from previous session - Emphasize that --model always uses exact model specified * docs: reorganize 1M context section - separate command line from env vars - Split 1M context examples into two clear sections - Show command line usage first (--model and /model) - Show environment variables as alternative approach - Improves readability and emphasizes most reliable method * docs: remove misleading default models section from website tutorial - Remove 'Default Models Used by Claude Code' section (misleading) - Remove claim that config must match exact default model names - Update config comment to be more general - Add claude-opus-4-5-20251101 to example config - Keep authentication section as-is * docs: correct model selection in website tutorial - Remove incorrect claim that Claude Code automatically uses proxy models - Add explicit model selection examples with --model and /model - Show environment variables as alternative approach - Remove misleading comment about 'multiple configured' * docs: add 1M context section to website tutorial - Add section on using [1m] suffix for 1 million token context - Include warning about shell escaping (quotes required) - Explain how Claude Code 
handles [1m] internally - Add /context verification command - Note that LiteLLM config should NOT include [1m]
* docs: add tip about using .env for API keys - Add note that ANTHROPIC_API_KEY can be stored in .env file - Clarifies alternative to exporting environment variables
* add redisvl dependency to the root requirements.txt (BerriAI#19417)
* [Fix] UI Cost Estimator - Fix model dropdown (BerriAI#19529)
* add cost estimator
* ui fix show errors
* test_estimate_cost_resolves_router_model_alias
* fix: UI 404 error when SERVER_ROOT_PATH is set (BerriAI#19467)
* fix: add case-insensitive support for guardrail mode and actions (BerriAI#19480)
* fix(bedrock): correct streaming choice index for tool calls (BerriAI#19506) Bedrock's contentBlockIndex identifies content blocks within a message (text=0, tool_call=1), not OpenAI's choice index (which varies with n>1). This caused OpenAI SDK's ChatCompletionAccumulator to fail when tool call chunks arrived on index 1 while finish_reason arrived on index 0. Bedrock doesn't support n>1 (no such parameter exists): https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InferenceConfiguration.html OpenAI choice index spec: https://platform.openai.com/docs/api-reference/chat/streaming
* Fix Azure RPM calculation formula (BerriAI#19513)
* Fix Azure RPM calculation formula
* updated test
* fix(azure response api): flatten tools for responses api to support nested definitions (BerriAI#19526) The Azure Responses API uses a different schema (flattened) for tools compared to the standard OpenAI/Azure Chat Completions API (nested). This caused a `BadRequestError` when users passed standard tool definitions. Changes: - Implemented tool flattening logic in `AzureOpenAIResponsesAPIConfig.transform_responses_api_request`. - Added comprehensive unit tests in test_azure_transformation.py to verify nested-to-flat transformation, pass-through of flat tools, and immutability. - Ensures cross-provider compatibility for tool definitions. Fixes BerriAI#19523
* Fix date overflow/division by zero in proxy utils (BerriAI#19527)
* Fix date overflow/division by zero in proxy utils
* Fix projected spend calculation
* Strengthen projected spend tests
* Fix Azure AI costs for Anthropic models (BerriAI#19530)
* Fix Azure AI cost calculation
* fixup
* feat: Add MCP tools response to chat completions
* feat: display mcp output on the playground
* Fix: generation config empty for batch
* Add custom vertex ai mapping to the output
* Add support for output format for bedrock invoke via v1/messages
* feat: Limit stop sequence as per openai spec
* Fix mypy error in litellm_staging_01_21_2026
* Fix: imagegeneration@006 has been deprecated
* Fix: test_anthropic_via_responses_api
* Fix: Responses API usage field type mismatch
* Fix: Httpx timeout test failures
* Fix: generationConfig removal from tests
* fix: mypy error
* comment code not used
* feat: Add MCP tools response to chat completions
* feat: display mcp output on the playground
* Fix batch tests
* fix: mypy error
* fix: mypy error
* Fix: test_multiple_function_call
* build(deps): bump lodash from 4.17.21 to 4.17.23 in /docs/my-website Bumps [lodash](https://github.com/lodash/lodash) from 4.17.21 to 4.17.23. - [Release notes](https://github.com/lodash/lodash/releases) - [Commits](lodash/lodash@4.17.21...4.17.23) --- updated-dependencies: - dependency-name: lodash dependency-version: 4.17.23 dependency-type: indirect ...
Signed-off-by: dependabot[bot] <support@github.com>
* Metrics prometheus user team count (BerriAI#19520)
* add user count and team count prometheus metrics
* rebase
* revert mistaken deletion
* fix ui build and mypy lint
* Adding python3-dev to non root
* adding node-tar cve allowlist
* fix(websearch_interception): filter internal kwargs before follow-up request (BerriAI#19577) The websearch interception handler was passing internal flags like `_websearch_interception_converted_stream` to the follow-up LLM request. This caused "Extra inputs are not permitted" errors from providers like Bedrock that use strict Pydantic validation. Fix: Filter out all kwargs starting with `_websearch_interception` prefix before making the follow-up anthropic_messages.acreate() call.
* skip brave tests
* Fix unsafe access to request attribute (BerriAI#19573)
* updating prometheus tests
* Fix non-root proxy tests
* Adding lodash-es to allowlist
* attempt fix translation tests
* fix: change oss staging branch name to reflect they're oss
* Revert "[Infra] UI - E2E Tests: Internal Viewer Sidebar"
* Overriding lodash-es with version 4.17.23 in docs
* updating lodash for dashboard
* bump: version 1.81.1 → 1.81.2
* Add reusable model select to update organization page
* Fixing tests
* Adding EOS to finish reasons
* Adding retries to flaky tests
* add opencode tutorial (BerriAI#19602)
* Fix org all proxy model case
* adjust opencode tutorial (BerriAI#19605)
* Add OSS Adopters section to README
* fix: completions mcp output ordering
* feat(helm): Enable PreStop hook configuration in values.yaml (BerriAI#19613)
* Fix: litellm/tests/test_proxy_server_non_root.py
* Update README.md
* Update README.md
* [Feat] New LiteLLM Policy engine - create policies to manage guardrails, conditions - permissions per Key, Team (BerriAI#19612)
* init PolicyMatcher
* TestPolicyMatcherGetMatchingPolicies
* TestPolicyMatcherGetMatchingPolicies
* feat: init PolicyResolver
* init resolver types
* init policy from config
* init PolicyValidator
* validate policy
* init Architecture Diagram
* test_add_guardrails_from_policy_engine
* init _init_policy_engine
* test updates
* test fixes
* new attachment config
* simplify types
* TestPolicyResolverInheritance
* fix policy resolver
* fix policies
* fix applied policy
* docs fix
* docs fix
* fix linting + QA checks
* fix linting + QA fixes
* test fixes
* docs fix
* fix: pass through endpoints update registry (BerriAI#19420)
* fix: pass through endpoints update registry
* add test case, fix lint error and comment to avoid confusion
* fix pass through endpoints test case
* [Fix] Anthropic models on Azure AI cache pricing (BerriAI#19532) (BerriAI#19614)
* Update README.md
* fix: for test
* All Models Backend Search
* adding test
* test: completions mcp output test
* chore: fix lint error
* test: Skip anthropic model test when ANTHROPIC_API_KEY is not set
* fix: include tool arguments in proxy_server_request for spend logs callbacks
* feat: hashicorp vault rotate support
* Add tool choice mapping for giga chat
* Fix: Responses API logging error for StopIteration
* Fix: test_nova_invoke_streaming_chunk_parsing
* Remove f string
* fix BerriAI#19620: SSO user roles are not updated for existing users (BerriAI#19621)
* Fix: SSO user roles are not updated for existing users Fixes BerriAI#19620
* Refactor: Remove redundant user_info retrieval in SSOAuthenticationHandler
* Test: add new tests for user creation and updates in get_user_info_from_db
* ci cd fixes - linting security
* resetting poetry and requirements
* fixing security checks
* docs fix
* fixing config
* skipping flaky tests
* skipping non root tests entirely
* security scan
* attempt fix flaky tests
* fixing flaky tests
* [Feat] Guardrail Policy Management - Allow using UI to manage guardrail policies (BerriAI#19668)
* init UI
* init schema.prisma
* fix: policy_crud_router
* UI fixes
* update gitignore
* working v0 for policy mgmt
* fix: endpoints to resolve guardrails
* fix code QA checks
* ui build issues
* schema fixes
* fix checks
* docs fix
* remove imports from functions
* add schema.prisma
* add migration
* fix schema.prisma
* remove imports from functions
* fix lint
* BUMP pyproject
* add spend-queue-troubleshooting docs (BerriAI#19659)
* add spend-queue-troubleshooting docs
* adjust spend-queue-troubleshooting docs
* fix linting
* New add fallbacks modal
* adding tests
* Add Langfuse mock mode for testing without API calls (BerriAI#19676)
* Add GCS mock mode for testing without API calls (BerriAI#19683)
* Adding router settings to create team and key
* fixing build
* fixing tests
* perf: Optimize strip_trailing_slash with O(1) index check (BerriAI#19679)
* perf: Optimize strip_trailing_slash with O(1) index check Replace rstrip("/") with direct index check for O(1) performance instead of O(n) string scanning. Results: - strip_trailing_slash: 311ms → 13ms (96% faster) - get_standard_logging_object_payload: 6.11s → 5.80s (5% faster)
* Handle multiple trailing slashes in strip_trailing_slash Use rstrip for correctness when URL ends with "//" or more, otherwise use O(1) index check for single trailing slash.
* Fixing tests
* perf: Optimize use_custom_pricing_for_model with set intersection (BerriAI#19677)
* perf: Optimize use_custom_pricing_for_model with set intersection Cache CustomPricingLiteLLMParams.model_fields.keys() as a module-level frozenset and use set intersection to reduce loop iterations from 882k to 90k (only iterating over keys that exist in both sets). Performance improvement: 84% faster (6.3x speedup) - Before: 1.17s total, 65µs per call - After: 0.19s total, 10µs per call
* Use .get() for defensive dictionary access
* perf: skip pattern_router.route() for non-wildcard models (BerriAI#19664) Check "*" in model before calling pattern_router.route() to avoid unnecessary pattern matching for non-wildcard model configurations.
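The strip_trailing_slash optimization described in the commits above can be sketched as follows (an illustrative reconstruction from the commit message, not the actual LiteLLM helper):

```python
# Sketch of the strip_trailing_slash optimization: an O(1) index check
# handles the common single-trailing-slash case, and rstrip is only used
# as a fallback when the URL ends in "//" or more.

def strip_trailing_slash(url: str) -> str:
    if not url.endswith("/"):
        # common case: nothing to strip, return immediately
        return url
    if len(url) >= 2 and url[-2] == "/":
        # rare multi-slash case - fall back to rstrip for correctness
        return url.rstrip("/")
    # exactly one trailing slash: drop it with a single slice
    return url[:-1]

print(strip_trailing_slash("https://api.example.com/v1/"))   # https://api.example.com/v1
print(strip_trailing_slash("https://api.example.com/v1//"))  # https://api.example.com/v1
```

The fast path avoids scanning the string when there is no trailing slash at all, which the commit reports as the dominant case in request handling.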
* perf: Add LRU caching to get_model_info for faster cost lookups (BerriAI#19606) - Add @lru_cache decorator to get_model_info() and _cached_get_model_info_helper() - Update _invalidate_model_cost_lowercase_map() to clear these caches when model_cost changes - Update test to call cache invalidation after modifying litellm.model_cost Reduces get_model_cost_information from 46% to <1% of request handling time.
* UI: new build
* redirect to login on expired jwt
* [Feat] UI + Backend - Allow adding policies on Keys/Teams + Viewing on Info panels (BerriAI#19688)
* ui for policy mgmt
* test_add_guardrails_from_policy_engine_accepts_dynamic_policies_and_pops_from_data
* docs: add litellm-enterprise requirement for managed files (BerriAI#19689)
* Update Gemini 2.0 Flash deprecation dates to March 31, 2026 (BerriAI#19592) Google announced that Gemini 2.0 Flash and Flash Lite models will be discontinued on March 31, 2026. Updated deprecation_date field for all affected model variants across different providers (vertex_ai, gemini, deepinfra, openrouter, vercel_ai_gateway). Models updated: - gemini-2.0-flash (added deprecation date) - gemini-2.0-flash-001 (updated from 2026-02-05) - gemini-2.0-flash-lite (added deprecation date) - gemini-2.0-flash-lite-001 (updated from 2026-02-25) All variants now correctly reflect the March 31, 2026 shutdown date.
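The LRU-caching pattern from the get_model_info commit above can be sketched like this (function names follow the commit message, but the bodies and the MODEL_COST map are illustrative assumptions, not LiteLLM's implementation):

```python
# Sketch: cache an expensive model-info lookup with functools.lru_cache,
# and expose an invalidation hook to clear the cache whenever the
# underlying cost map is mutated.

from functools import lru_cache

# hypothetical cost map standing in for litellm.model_cost
MODEL_COST = {"gemini-2.0-flash": {"input_cost_per_token": 1e-7}}

@lru_cache(maxsize=None)
def get_model_info(model: str) -> tuple:
    # return a tuple so the cached value is hashable and immutable
    info = MODEL_COST.get(model, {})
    return tuple(sorted(info.items()))

def invalidate_model_info_cache() -> None:
    # must be called whenever MODEL_COST changes, or stale prices are served
    get_model_info.cache_clear()

get_model_info("gemini-2.0-flash")
print(get_model_info.cache_info().currsize)  # 1
```

The key detail the commit calls out is the second function: a cache over mutable global state is only safe if every mutation path also clears it.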
* fixing build
* Fixing failing tests
* deactivating non root tests
* fixing arize tests
* cache tests serial
* fixing circleci config
* fixing circleci config
* Update OSS Adopters section with new table format
* Fixing ruff check
* bump: version 1.81.2 → 1.81.3
* chore: update Next.js build artifacts (2026-01-24 17:18 UTC, node v22.16.0)
* CI/CD fixes - split local testing
* fix: _apply_search_filter_to_models mypy linting
* test_partner_models_httpx_streaming
* test_web_search
* Fix: log duplication when json_logs is enabled (BerriAI#19705)
* fix: FLAKY tests
* fix unstable tests
* docs fix
* docs fix
* docs fix
* docs fix
* docs fix
* test_get_default_unvicorn_init_args
* fix flaky tests
* test_hanging_request_azure
* test_team_update_sc_2
* BUMP extras
* test fixes
* test fixes
* test_retrieve_container_basic
* Model and Team filtering
* TestBedrockInvokeToolSearch
* fix(presidio): resolve runtime error by handling asyncio loops in bac… (BerriAI#19714)
* fix(presidio): resolve runtime error by handling asyncio loops in background threads
* add test case for thread safety
* UI Keys Teams Router Settings docs
* chore: update Next.js build artifacts (2026-01-25 00:27 UTC, node v22.16.0)
* test_stream_transformation_error_sync
* fix patch reliability mock tests
* fix MCP tests
* fix: server root path (BerriAI#19790)
* feat: tpm-rpm limit in prometheus metrics (BerriAI#19725) Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
* fix(proxy): support slashes in google generateContent model names (BerriAI#19737)
* fix(proxy): support slashes in google route params
* fix(proxy): extract google model ids with slashes
* test(proxy): cover google model ids with slashes
* fix(vertex_ai): support model names with slashes in passthrough URLs (BerriAI#19944) The regex in get_vertex_model_id_from_url() was using [^/:]+ which stopped at the first slash, truncating model names like 'gcp/google/gemini-2.5-flash' to just 'gcp'. This caused access_groups checks to fail for custom model names. Changed the pattern to [^:]+ to allow slashes in model names, only stopping at the colon before the action (e.g., :generateContent).
* [Fix] VertexAI Pass through - fix regression that caused vertex ai passthroughs to stop working for router models (BerriAI#19967)
* fix(vertex_ai): replace custom model names with actual Vertex AI model names in passthrough URLs (BerriAI#19948) When the passthrough URL already contains project and location, the code was skipping the deployment lookup and forwarding the URL as-is to Vertex AI. For custom model names like gcp/google/gemini-2.5-flash, Vertex AI returned 404 because it only knows the actual model name (gemini-2.5-flash). The fix makes the deployment lookup always run, so the custom model name gets replaced with the actual Vertex AI model name before forwarding.
* add _resolve_vertex_model_from_router
* fix: get_llm_provider
* Potential fix for code scanning alert no. 4020: Clear-text logging of sensitive information Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> --------- Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* [Feat] - Search API add /list endpoint to list what search tools exist in router (BerriAI#19969)
* feat: List all available search tools configured in the router.
* add debugging search API
* add debugging search API
* perf(prometheus): parallelize budget metrics, fix caching bug, reduce CPU by ~40% (BerriAI#20544)
* fix: revert httpx client caching that caused closed client errors AsyncHTTPHandler.__del__ was closing httpx clients still in use by AsyncOpenAI/AsyncAzureOpenAI due to independent cache lifecycles. Restores standalone httpx client creation for OpenAI/Azure providers.
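The regex change described in the get_vertex_model_id_from_url fix above can be demonstrated directly (the URL below is a simplified, hypothetical passthrough path, and the surrounding pattern is reduced to just the model segment):

```python
# Old pattern [^/:]+ stops at the first slash and truncates
# slash-containing model names; new pattern [^:]+ only stops at the
# colon before the action verb (e.g. :generateContent).

import re

URL = ("/v1/projects/my-project/locations/us-central1"
       "/publishers/google/models/gcp/google/gemini-2.5-flash:generateContent")

old_match = re.search(r"/models/([^/:]+)", URL)  # buggy: stops at "/"
new_match = re.search(r"/models/([^:]+)", URL)   # fixed: stops at ":"

print(old_match.group(1))  # gcp
print(new_match.group(1))  # gcp/google/gemini-2.5-flash
```

With the old pattern, access-group checks ran against the truncated name "gcp" and failed; the new pattern recovers the full custom model name.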
* Revert "Merge pull request BerriAI#18790 from BerriAI/litellm_key_team_routing_3" This reverts commit ae26d8e, reversing changes made to 864e8c6.
* fix MYPY lint
* fixed build errors after merge
* added sandbox branch for gcr push (#61)
* added sandbox branch for gcr push
* jenkins setup for sbx
* build fix
* adding sync/v[0-9] branches for gcr push
* build fix
* least busy debug logs
* Fix: remove x-anthropic-billing block
* added back anthropic envs
* merge fixes
* least busy router changes
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: YutaSaito <36355491+uc4w6c@users.noreply.github.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Alexsander Hamir <alexsanderhamirgomesbaptista@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: jay prajapati <79649559+jayy-77@users.noreply.github.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: davida-ps <david.a@prompt.security>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: houdataali <84786211+houdataali@users.noreply.github.com>
Co-authored-by: João Dinis Ferreira <hello@joaof.eu>
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
Co-authored-by: Yogeshwaran Ravichandran <96047771+yogeshwaran10@users.noreply.github.com>
Co-authored-by: Will Chen <willchen90@gmail.com>
Co-authored-by: Yuta Saito <uc4w6c@bma.biglobe.ne.jp>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Eric Cao <ecao310@gmail.com>
Co-authored-by: mpcusack-altos <mcusack@altoslabs.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: John Greek <2006605+jgreek@users.noreply.github.com>
Co-authored-by: xqe2011 <gz923553148@gmail.com>
Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: Harshit Jain <harshitjain0562@gmail.com>
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: pramodp-dotcom <pramod.p@juspay.in>
[Fix] VertexAI Pass through - Ensure only anthropic betas are forwarded down to LLM API
Request from LiteLLM

Request to Vertex
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement - see details)
My PR passes all unit tests on make test-unit

CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🐛 Bug Fix
✅ Test
Changes
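A minimal sketch of the behavior this PR enforces: only anthropic beta headers from the incoming request are forwarded down to the LLM API, while LiteLLM's own auth headers are stripped. The function and constant names below are illustrative, not the actual LiteLLM implementation:

```python
# Sketch of allowlist-based header forwarding for the VertexAI
# passthrough: forward only anthropic-beta headers, drop everything
# else (including proxy auth tokens).

ALLOWED_PASSTHROUGH_HEADER_PREFIXES = ("anthropic-beta",)  # illustrative allowlist

def filter_passthrough_headers(incoming_headers: dict) -> dict:
    """Keep only headers that are safe to forward to the downstream LLM API."""
    forwarded = {}
    for name, value in incoming_headers.items():
        if name.lower().startswith(ALLOWED_PASSTHROUGH_HEADER_PREFIXES):
            forwarded[name] = value
    return forwarded

request_headers = {
    "Authorization": "Bearer sk-litellm-proxy-key",  # proxy auth - must NOT leak downstream
    "anthropic-beta": "context-1m-2025-08-07",       # beta feature flag - should be forwarded
    "x-litellm-api-key": "sk-internal",              # internal header - must NOT leak downstream
}

print(filter_passthrough_headers(request_headers))  # {'anthropic-beta': 'context-1m-2025-08-07'}
```

An allowlist (rather than a denylist) is the safer design here: any header not explicitly known to be a beta flag is dropped, so new internal headers can never leak to the provider by default.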