
add timeout to onyx guardrail #19731

Merged
krrishdholakia merged 2 commits into BerriAI:litellm_oss_staging_01_26_2026 from tamirkiviti13:add-timeout-to-onyx-guardrail
Jan 26, 2026

Conversation

@tamirkiviti13
Contributor

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement; see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🚄 Infrastructure

Changes

Added the option to override the default timeout for the HTTP client in Onyx's custom guardrail.


@krrishdholakia krrishdholakia changed the base branch from main to litellm_oss_staging_01_26_2026 January 26, 2026 07:13
@krrishdholakia krrishdholakia merged commit aa8134f into BerriAI:litellm_oss_staging_01_26_2026 Jan 26, 2026
4 of 7 checks passed
krrishdholakia added a commit that referenced this pull request Feb 5, 2026
* UI: new build

* redirect to login on expired jwt

* [Feat] UI + Backend - Allow adding policies on Keys/Teams  + Viewing on Info panels  (#19688)

* ui for policy mgmt

* test_add_guardrails_from_policy_engine_accepts_dynamic_policies_and_pops_from_data

* docs: add litellm-enterprise requirement for managed files (#19689)

* Update Gemini 2.0 Flash deprecation dates to March 31, 2026 (#19592)

Google announced that Gemini 2.0 Flash and Flash Lite models will be discontinued on March 31, 2026. Updated deprecation_date field for all affected model variants across different providers (vertex_ai, gemini, deepinfra, openrouter, vercel_ai_gateway).

Models updated:
- gemini-2.0-flash (added deprecation date)
- gemini-2.0-flash-001 (updated from 2026-02-05)
- gemini-2.0-flash-lite (added deprecation date)
- gemini-2.0-flash-lite-001 (updated from 2026-02-25)

All variants now correctly reflect the March 31, 2026 shutdown date.

* fixing build

* Fixing failing tests

* deactivating non root tests

* fixing arize tests

* cache tests serial

* fixing circleci config

* fixing circleci config

* Update OSS Adopters section with new table format

* Fixing ruff check

* bump: version 1.81.2 → 1.81.3

* chore: update Next.js build artifacts (2026-01-24 17:18 UTC, node v22.16.0)

* CI/CD fixes  - split local testing

* fix: _apply_search_filter_to_models mypy linting

* test_partner_models_httpx_streaming

* test_web_search

* Fix: log duplication when json_logs is enabled (#19705)

* fix: FLAKY tests

* fix unstable tests

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* test_get_default_unvicorn_init_args

* fix flaky tests

* test_hanging_request_azure

* test_team_update_sc_2

* BUMP extras

* test fixes

* test fixes

* test_retrieve_container_basic

* Model and Team filtering

* TestBedrockInvokeToolSearch

* fix(presidio): resolve runtime error by handling asyncio loops in bac… (#19714)

* fix(presidio): resolve runtime error by handling asyncio loops in background threads

* add test case for thread safety

* UI Keys Teams Router Settings docs

* chore: update Next.js build artifacts (2026-01-25 00:27 UTC, node v22.16.0)

* test_stream_transformation_error_sync

* fix patch reliability mock tests

* fix MCP tests

* auto truncation of virtual keys table values

* fix: args issue & refactor into helper function to reduce bloat for both (#19441)

* Fix bulk user add

* fix(proxy): support slashes in google generateContent model names (#19737)

* fix(proxy): support slashes in google route params

* fix(proxy): extract google model ids with slashes

* test(proxy): cover google model ids with slashes

* Fix/non standard mcp url pattern (#19738)

* fix(mcp): Add standard MCP URL pattern support for OAuth discovery (#17272)

  OAuth discovery endpoints now support both URL patterns:
  - Standard MCP pattern: /mcp/{server_name} (new)
  - Legacy LiteLLM pattern: /{server_name}/mcp (backward compatible)

  The standard pattern is required by MCP-compliant clients like
  mcp-inspector and VSCode Copilot, which expect resource URLs
  following the /mcp/{server_name} convention per RFC 9728.

  Changes:
  - Add _build_oauth_protected_resource_response() helper
  - Add oauth_protected_resource_mcp_standard() endpoint
  - Add oauth_authorization_server_mcp_standard() endpoint
  - Keep legacy endpoints for backward compatibility
  - Add tests for both URL patterns

  Fixes #17272

* fix(mcp): Add standard MCP URL pattern support for OAuth discovery (#17272)

  OAuth discovery endpoints now support both URL patterns:
  - Standard MCP pattern: /mcp/{server_name} (new)
  - Legacy LiteLLM pattern: /{server_name}/mcp (backward compatible)

  The standard pattern is required by MCP-compliant clients like
  mcp-inspector and VSCode Copilot, which expect resource URLs
  following the /mcp/{server_name} convention per RFC 9728.

  Changes:
  - Add _build_oauth_protected_resource_response() helper
  - Add oauth_protected_resource_mcp_standard() endpoint
  - Add oauth_authorization_server_mcp_standard() endpoint
  - Keep legacy endpoints for backward compatibility
  - Add tests for both URL patterns

  Fixes #17272

* Test was relocated

* refactor(mcp): Extract helper methods from run_with_session to fix PLR0915

Split the large run_with_session method (55 statements) into smaller
helper methods to satisfy ruff's PLR0915 rule (max 50 statements):

- _create_transport_context(): Creates transport based on type
- _execute_session_operation(): Handles session lifecycle

Also changed cleanup exception handling from Exception to BaseException
to properly catch asyncio.CancelledError (which is a BaseException subclass
in Python 3.8+).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* test(mcp): Fix flaky test by mocking health_check_server

The test_mcp_server_manager_config_integration_with_database test was
making real network calls to fake URLs which caused timeouts and
CancelledError exceptions.

Fixed by mocking health_check_server to return a proper
LiteLLM_MCPServerTable object instead of making network calls.

* test(mcp): Fix skip condition to properly detect claude model names

The skip condition for missing API keys was checking for "anthropic" in
the model name, but the test uses "claude-haiku-4-5" which doesn't match.
Updated to check for both "anthropic" and "claude" model patterns.

Also added skip condition for OpenAI models when OPENAI_API_KEY is not set.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* test(mcp): Fix skip condition to properly detect claude model names

The skip condition for missing API keys was checking for "anthropic" in
the model name, but the test uses "claude-haiku-4-5" which doesn't match.
Updated to check for both "anthropic" and "claude" model patterns.

Also added skip condition for OpenAI models when OPENAI_API_KEY is not set.

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* add callbacks and labels to prometheus (#19708)

* feat: add clientip and user agent in metrics (#19717)

* feat: add clientip and user agent in metrics

* fix: lint errors

* Add model id and other req labels

---------

Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>

* fix: optimize logo fetching and resolve mcp import blockers (#19719)

* feat: tpm-rpm limit in prometheus metrics (#19725)

Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>

* add timeout to onyx guardrail (#19731)

* add timeout to onyx guardrail

* add tests

* Fix /batches to return encoded ids (from managed objects table)

* fix(proxy): use return value from CustomLogger.async_post_call_success_hook (#19670)

* fix(proxy): use return value from CustomLogger.async_post_call_success_hook

Previously the return value was ignored for CustomLogger callbacks,
preventing users from modifying responses. Now the return value is
captured and used to replace the response (if not None), consistent
with CustomGuardrail and streaming iterator hook behavior.

Fixes issue with custom_callbacks not being able to inject data into
LLM responses.

* fix(proxy): also fix async_post_call_streaming_hook to use return value

Previously the streaming hook only used return values that started with
"data: " (SSE format). Now any non-None return value is used, consistent
with async_post_call_success_hook and streaming iterator hook behavior.

Added tests for streaming hook transformation.

---------

Co-authored-by: Gabriele Michelli <michelligabriele0@gmail.com>
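The hook-return fix above can be sketched as follows. The function and callback names are illustrative, not litellm's actual proxy internals; the point is that a non-None return value from `async_post_call_success_hook` now replaces the response instead of being discarded.

```python
import asyncio

class EchoSuffixHook:
    """Dummy callback: returns a modified response (non-None replaces it)."""
    async def async_post_call_success_hook(self, data, response):
        return response + " [checked]"

class PassiveHook:
    """Dummy callback: returns None, so the response is left untouched."""
    async def async_post_call_success_hook(self, data, response):
        return None

async def apply_post_call_hooks(response, callbacks, data):
    # Sketch of the fixed behavior: capture each hook's return value and,
    # when it is not None, use it as the new response.
    for cb in callbacks:
        result = await cb.async_post_call_success_hook(data=data, response=response)
        if result is not None:
            response = result
    return response
```

Under the old behavior, `EchoSuffixHook`'s return value would have been ignored for `CustomLogger` callbacks.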

* feat(hosted_vllm): support thinking parameter for /v1/messages endpoint

Adds support for Anthropic-style 'thinking' parameter in hosted_vllm,
converting it to OpenAI-style 'reasoning_effort' since vLLM is
OpenAI-compatible.

This enables users to use Claude Code CLI with hosted vLLM models
like GLM-4.6/4.7 through the /v1/messages endpoint.

Mapping (same as Anthropic adapter):
- budget_tokens >= 10000 -> "high"
- budget_tokens >= 5000  -> "medium"
- budget_tokens >= 2000  -> "low"
- budget_tokens < 2000   -> "minimal"

Fixes #19761
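The budget-token mapping above is a simple threshold ladder. A sketch (the helper name is illustrative, not litellm's API):

```python
def budget_tokens_to_reasoning_effort(budget_tokens: int) -> str:
    """Map an Anthropic-style thinking budget to an OpenAI-style
    reasoning_effort, per the thresholds in the commit message above."""
    if budget_tokens >= 10000:
        return "high"
    if budget_tokens >= 5000:
        return "medium"
    if budget_tokens >= 2000:
        return "low"
    return "minimal"
```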

* Fix batch creation to return the input file's expires_at attribute

* bump: version 1.81.3 → 1.81.4 (#19793)

* fix: server root path (#19790)

* refactor: extract transport context creation into separate method (#19794)

* Fix user max budget reset to unlimited

- Added a Pydantic validator to convert empty string inputs for max_budget to None, preventing float parsing errors from the frontend.
- Modified the internal user update logic to explicitly allow max_budget to be None, ensuring the value isn't filtered out and can be reset to unlimited in the database.
- Added unit tests for validation and logic.

 Closes #19781

* Make test_get_users_key_count deterministic by creating dedicated test user (#19795)

- Create a test user with auto_create_key=False to ensure known starting state
- Filter get_users by user_ids to target only the test user
- Verify initial key count is 0 before creating a key
- Clean up test user after test completes
- This ensures consistent behavior across CI and local environments

* Add test for Router.get_valid_args, fix router code coverage encoding (#19797)

- Add test_get_valid_args in test_router_helper_utils.py to cover get_valid_args
- Use encoding='utf-8' in router_code_coverage.py for cross-platform file reads

* fix sso email case sensitivity

* Fix test_mcp_server_manager_config_integration_with_database cancellation error (#19801)

Mock _create_mcp_client to avoid network calls in health checks.
This prevents asyncio.CancelledError when the test teardown closes
the event loop while health checks are still pending.

The test focuses on conversion logic (access_groups, description)
not health check functionality, so mocking the network call is appropriate.

* fix: make HTTPHandler mockable in OIDC secret manager tests (#19803)

* fix: make HTTPHandler mockable in OIDC secret manager tests

- Add _get_oidc_http_handler() factory function to make HTTPHandler
  easily mockable in tests
- Update test_oidc_github_success to patch factory function instead
  of HTTPHandler directly
- Update Google OIDC tests for consistency
- Fixes test_oidc_github_success failure where mock was bypassed

This change allows tests to properly mock HTTPHandler instances used
for OIDC token requests, fixing the test failure where the mock was
not being used.

* fix: patch base_llm_http_handler method directly in container tests

- Use patch.object to patch container_create_handler method directly
  on the base_llm_http_handler instance instead of patching the module
- Fixes test_provider_support[openai] failure where mock wasn't applied
- Also fixes test_error_handling_integration with same approach

The issue was that patching 'litellm.containers.main.base_llm_http_handler'
didn't work because the module imports it with 'from litellm.main import',
creating a local reference. Using patch.object patches the method on the
actual object instance, which works regardless of import style.

* fix: resolve flaky test_openai_env_base by clearing cache

- Add cache clearing at start of test_openai_env_base to prevent cache pollution
- Ensures no cached clients from previous tests interfere with respx mocks
- Fixes intermittent failures where aiohttp transport was used instead of httpx
- Test-only change with low risk, no production code modifications

Resolves flaky test marked with @pytest.mark.flaky(retries=3, delay=1)
Both parametrized versions (OPENAI_API_BASE and OPENAI_BASE_URL) now pass consistently

* test: add explicit mock verification in test_provider_support

- Capture mock handler with 'as mock_handler' for explicit validation
- Add assert_called_once() to verify mock was actually used
- Ensures test verifies no real API calls are made
- Follows same pattern as test_openai_env_base validation

* Add light/dark mode slider for dev

* fix key duration input

* Messages api bedrock converse caching and pdf support (#19785)

* cache control for user messages and system messages

* add cache creation tokens in response

* cache controls in tool calls and assistant turns

* refactor with _should_preserve_cache_control

* add cache control unit tests

* use simpler cache creation token count logic

* use helper function

* remove unused function

* fix unit tests

* fixing team member add

* [Feat] enable progress notifications for MCP tool calls (#19809)

* enable progress notifications for MCP tool calls

* adjust mcp test

* [Feat] CLI Auth - Add configurable CLI JWT expiration via environment variable (#19780)

* fix: add CLI_JWT_EXPIRATION_HOURS

* docs: CLI_JWT_EXPIRATION_HOURS

* fix: get_cli_jwt_auth_token

* test_get_cli_jwt_auth_token_custom_expiration

* fixing flaky tests around oidc and email

* Add dont ask me again option in nudges

* CI/CD: Increase retries and stabilize litellm_mapped_tests_core (#19826)

* Fix PLR0915: Extract system message handling to reduce statement count

* fix mypy

* fix: add host_progress_callback parameter to mock_call_tool in test

The test_call_tool_without_broken_pipe_error was failing because the mock function did not accept the host_progress_callback keyword argument that the actual implementation passes to client.call_tool(). Updated the mock to accept this parameter to match the real implementation signature.

* fixing flaky tests around oidc and email

* Add documentation comment to test file

* add retry

* add dependency

* increase retry

---------

Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>

* Fix broken mocks in 6 flaky tests to prevent real API calls  (#19829)

* Fix broken mocks in 6 flaky tests to prevent real API calls

Added network-level HTTP blocking using respx to prevent tests from making real API calls when Python-level mocks fail. This makes tests more reliable and retryable in CI.

Changes:

- Azure OIDC test: Added Azure Identity SDK mock to prevent real Azure calls

- Vector store test: Added @respx.mock decorator to block HTTP requests

- Resend email tests (3): Added @respx.mock decorator for all 3 test functions

- SendGrid email test: Added @respx.mock decorator

All test assertions and verification logic remain unchanged - only added safety nets to catch leaked API calls.

* Fix failing OIDC secret manager tests

Fixed two test failures in test_secret_managers_main.py:

1. test_oidc_azure_ad_token_success: Corrected the patch path for get_bearer_token_provider from 'litellm.secret_managers.get_azure_ad_token_provider.get_bearer_token_provider' to 'azure.identity.get_bearer_token_provider' since the function is imported from azure.identity.

2. test_oidc_google_success: Added @patch('httpx.Client') decorator to prevent any real HTTP connections during test execution, resolving httpx.ConnectError issues.

Both tests now pass successfully.

* Adding tests:

* fixing breaking change: just user_id provided should upsert still

* Fix: A2A Python SDK URL

* [Feat] Add UI for /rag/ingest API - upload docs, pdfs etc to create vector stores  (#19822)

* feat: _save_vector_store_to_db_from_rag_ingest

* UI features for RAG ingest

* fix: Endpoints

* ragIngestCall

* _save_vector_store_to_db_from_rag_ingest

* fix: rag_ingest Code QA CHECK

* UI fixes unit tests

* docs(readme): add OpenAI Agents SDK to OSS Adopters (#19820)

* docs(readme): add OpenAI Agents SDK to OSS Adopters

* docs(readme): add OpenAI Agents SDK logo

* Fixing tests

* Litellm release notes 01 26 2026 (#19836)

* docs: document new models/endpoints

* docs: cleanup

* feat: update model table

* fixing tests

* Litellm release notes 01 26 2026 (#19838)

* docs: document new models/endpoints

* docs: cleanup

* feat: update model table

* fix: cleanup

* feat: Add model_id label to Prometheus metrics (#18048) (#19678)

Co-authored-by: Cursor Agent <cursoragent@cursor.com>

* fix(models): set gpt-5.2-codex mode to responses for Azure and OpenRouter (#19770)

Fixes #19754

The gpt-5.2-codex model only supports the responses API, not chat completions.
Updated azure/gpt-5.2-codex and openrouter/openai/gpt-5.2-codex entries to use
mode: "responses" and supported_endpoints: ["/v1/responses"].

* fix(responses): update local_vars with detected provider (#19782) (#19798)

When using the responses API with provider-specific params (aws_*, vertex_*)
without explicitly passing custom_llm_provider, the code crashed with:
AttributeError: 'NoneType' object has no attribute 'startswith'

Root cause: local_vars was captured via locals() before get_llm_provider()
detected the provider from the model string (e.g., "bedrock/..."), so
custom_llm_provider remained None when processing provider-specific params.

Fix: Update local_vars["custom_llm_provider"] after get_llm_provider() call
so the detected provider is available for param processing.

Affected provider-specific params:
- aws_* (aws_region_name, aws_access_key_id, etc.) for Bedrock/SageMaker
- vertex_* (vertex_project, vertex_location, etc.) for Vertex AI

* fix(azure): use generic cost calculator for audio token pricing (#19771)

Azure audio models were charging audio output tokens at the text token
rate instead of the correct audio token rate. This resulted in costs
being ~6.65x lower than expected.

The fix replaces Azure's custom cost calculation logic with the generic
cost calculator that properly handles text, audio, cached, reasoning,
and image tokens.

Fixes #19764

* fix(xai): correct cached token cost calculation for xAI models (#19772)

* fix(azure): use generic cost calculator for audio token pricing

Azure audio models were charging audio output tokens at the text token
rate instead of the correct audio token rate. This resulted in costs
being ~6.65x lower than expected.

The fix replaces Azure's custom cost calculation logic with the generic
cost calculator that properly handles text, audio, cached, reasoning,
and image tokens.

Fixes #19764

* fix(xai): correct cached token cost calculation for xAI models

- Fix double-counting issue where xAI reports text_tokens = prompt_tokens
  (including cached), causing tokens to be charged twice
- Add cache_read_input_token_cost to xAI grok-3 and grok-3-mini model variants
- Detection: when text_tokens + cached_tokens > prompt_tokens, recalculate
  text_tokens = prompt_tokens - cached_tokens

xAI pricing (25% of input for cached):
- grok-3 variants: $0.75/M cached (input $3/M)
- grok-3-mini variants: $0.075/M cached (input $0.30/M)
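The double-counting detection described above reduces to one arithmetic check. A sketch, with an illustrative function name (not litellm's actual cost-calculator code):

```python
def adjust_xai_text_tokens(text_tokens: int, cached_tokens: int, prompt_tokens: int) -> int:
    """xAI reports text_tokens == prompt_tokens (cached included).
    If charging both would exceed the prompt total, recalculate
    text_tokens so cached tokens are not billed twice."""
    if text_tokens + cached_tokens > prompt_tokens:
        return prompt_tokens - cached_tokens
    return text_tokens
```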

* Fix:Support both JSON array format and comma-separated values from user headers

* Translate advanced-tool-use to Bedrock-specific headers for Claude Opus 4.5

* fix: token calculations and refactor (#19696)

* fix(prometheus): safely handle None metadata in logging to prevent At… (#19691)

* fix(prometheus): safely handle None metadata in logging to prevent AttributeError

* fix: lint issues

* fix: resolve 'does not exist' migration errors as applied in setup_database (#19281)

* Fix: timeout exception raised error

* Add sarvam doc

* Add gemini-robotics-er-1.5-preview model in model map

* Add gemini-robotics-er-1.5-preview model documentation

* Fix: Stream the download in chunks

* Add grok reasoning content

* Revert poetry lock

* Fix mypy and code quality issues

* feat: add feature to make silent calls (#19544)

* feat: add feature to make silent calls

* add test for silent feat

* add docs for silent feat

* fix lint issues and  UI logs

* add docs of ab testing and deep copy

* fix(enterprise): correct error message for DISABLE_ADMIN_ENDPOINTS (#19861)

The error message for DISABLE_ADMIN_ENDPOINTS incorrectly said
"DISABLING LLM API ENDPOINTS is an Enterprise feature" instead of
"DISABLING ADMIN ENDPOINTS is an Enterprise feature".

This was a copy-paste bug from the is_llm_api_route_disabled() function.

Added regression tests to verify both error messages are correct.

* fix(proxy): handle agent parameter in /interactions endpoint (#19866)

* initialize tiktoken environment at import time to support offline usage

* fix(bedrock): support tool search header translation for Sonnet 4.5 (#19871)

Extend advanced-tool-use header translation to include Claude Sonnet 4.5
in addition to Opus 4.5 on Bedrock Invoke API.

When Claude Code sends the advanced-tool-use-2025-11-20 header, it now
gets correctly translated to Bedrock-specific headers for both:
- Claude Opus 4.5
- Claude Sonnet 4.5

Headers translated:
- tool-search-tool-2025-10-19
- tool-examples-2025-10-29

Fixes defer_loading validation error on Bedrock with Sonnet 4.5.

Ref: https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool

* bulk update keys endpoint

* mypy linting

* [Feat] RAG API - Add support for using s3 Vectors as Vector Store Provider for /rag/ingest (#19888)

* init S3VectorsRAGIngestion as a supported ingestion provider for RAG API

* test: TestRAGS3Vectors

* init S3VectorsVectorStoreOptions

* init s3 vectors

* code clean up + QA

* fix: get_credentials

* S3VectorsRAGIngestion

* TestRAGS3Vectors

* docs: AWS S3 Vectors

* add asyncio QA checks

* fix: S3_VECTORS_DEFAULT_DIMENSION

* Add native_background_mode to override polling_via_cache for specific models

This follow-up to PR #16862 allows users to specify models that should use
the native provider's background mode instead of polling via cache.

Config example:
  litellm_settings:
    responses:
      background_mode:
        polling_via_cache: ["openai"]
        native_background_mode: ["o4-mini-deep-research"]
        ttl: 3600

When a model is in native_background_mode list, should_use_polling_for_request
returns False, allowing the request to fall through to native provider handling.

Committed-By-Agent: cursor
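The precedence described above (native_background_mode wins over polling_via_cache) can be sketched like this. The function signature and config keys mirror the example config but are otherwise assumptions about the internals:

```python
def should_use_polling_for_request(provider: str, model: str, background_cfg: dict) -> bool:
    """Sketch: a model listed in native_background_mode falls through to
    the provider's native background handling; otherwise polling applies
    when the provider opted into polling_via_cache."""
    if model in (background_cfg.get("native_background_mode") or []):
        return False  # let the native provider handle background mode
    return provider in (background_cfg.get("polling_via_cache") or [])
```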

* [Feat] RAG API - Add s3_vectors as provider on /vector_store/search API + UI for creating + PDF support for /rag/ingest (#19895)

* init S3VectorsRAGIngestion as a supported ingestion provider for RAG API

* test: TestRAGS3Vectors

* init S3VectorsVectorStoreOptions

* init s3 vectors

* code clean up + QA

* fix: get_credentials

* S3VectorsRAGIngestion

* TestRAGS3Vectors

* docs: AWS S3 Vectors

* add asyncio QA checks

* fix: S3_VECTORS_DEFAULT_DIMENSION

* init ui for bedrock s3 vectors

* fix add /search support for s3_vectors

* init atransform_search_vector_store_request

* feat: S3VectorsVectorStoreConfig

* TestS3VectorsVectorStoreConfig

* atransform_search_vector_store_request

* fix: S3VectorsVectorStoreConfig

* add validation for bucket name etc

* fix UI validation for s3 vector store

* init extract_text_from_pdf

* add pypdf

* fix code QA checks

* fix navbar

* init s3_vector.png

* fix QA code

* Add tests for native_background_mode feature

Added 8 new unit tests for the native_background_mode feature:
- test_polling_disabled_when_model_in_native_background_mode
- test_polling_disabled_for_native_background_mode_with_provider_list
- test_polling_enabled_when_model_not_in_native_background_mode
- test_polling_enabled_when_native_background_mode_is_none
- test_polling_enabled_when_native_background_mode_is_empty_list
- test_native_background_mode_exact_match_required
- test_native_background_mode_with_provider_prefix_in_request
- test_native_background_mode_with_router_lookup

Committed-By-Agent: cursor

* add sortBy and sortOrder params for /v2/model/info

* ruff check

* Fixing UI tests

* test(proxy): add regression tests for vertex passthrough model names with slashes (#19855)

Added test cases for custom model names containing slashes in Vertex AI
passthrough URLs (e.g., gcp/google/gemini-2.5-flash).

Test cases:
- gcp/google/gemini-2.5-flash
- gcp/google/gemini-3-flash-preview
- custom/model

* fix: guardrails issues streaming-response regex (#19901)

* fix: add fix for migration issue and stable Linux Debian (#19843)

* fix: filter unsupported beta headers for Bedrock Invoke API (#19877)

- Add whitelist-based filtering for anthropic_beta headers
- Only allow Bedrock-supported beta flags (computer-use, tool-search, etc.)
- Filter out unsupported flags like mcp-servers, structured-outputs
- Remove output_format parameter from Bedrock Invoke requests
- Force tool-based structured outputs when response_format is used

Fixes #16726

* fix: allow tool_choice for Azure GPT-5 chat models (#19813)

* fix: don't treat gpt-5-chat as GPT-5 reasoning

* fix: mark azure gpt-5-chat as supporting tool_choice

* test: cover gpt-5-chat params on azure/openai

* fix: tool with anthropic #19800 (#19805)

* All Models Page server side sorting

* Add Init Containers in the community helm chart (#19816)

* docs: fix guardrail logging docs (#19833)

* Fixing build and tests

* inspect BadRequestError after all other policy types (#19878)

As indicated by https://docs.litellm.ai/docs/exception_mapping,
BadRequestError is used as the base type for multiple exceptions.  As
such, it should be tested last in handling retry policies.

This updates the integration test that validates retry policies work as
expected.

Fixes #19876
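The ordering requirement above is the standard most-specific-first rule for `isinstance` dispatch. A sketch with dummy exception classes (the real litellm exception hierarchy and policy fields differ; here `ContentPolicyViolationError` subclassing `BadRequestError` is an assumption made for illustration):

```python
class BadRequestError(Exception):
    """Stand-in for a base exception type shared by several errors."""

class ContentPolicyViolationError(BadRequestError):
    """Stand-in for a more specific subclass."""

def pick_retry_policy(exc: Exception, policy: dict):
    # Check subclasses before the BadRequestError base; testing the base
    # first would shadow every subclass, which is the bug being fixed.
    if isinstance(exc, ContentPolicyViolationError):
        return policy.get("content_policy")
    if isinstance(exc, BadRequestError):
        return policy.get("bad_request")
    return policy.get("default")
```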

* fix(main): use local tiktoken cache in lazy loading (#19774)

The lazy loading implementation for encoding in __getattr__ was calling
tiktoken.get_encoding() directly without first setting TIKTOKEN_CACHE_DIR.
This caused tiktoken to attempt downloading the encoding file from the
internet instead of using the local copy bundled with litellm.

This fix uses _get_default_encoding() from _lazy_imports which properly
sets TIKTOKEN_CACHE_DIR before loading tiktoken, ensuring the local cache
is used.

* fix(gemini): subtract implicit cached tokens from text_tokens for correct cost calculation (#19775)

When Gemini uses implicit caching, it returns cachedContentTokenCount but
NOT cacheTokensDetails. Previously, text_tokens was not adjusted in this case,
causing costs to be calculated as if all tokens were non-cached.

This fix subtracts cachedContentTokenCount from text_tokens when no
cacheTokensDetails is present (implicit caching), ensuring correct cost
calculation with the reduced cache_read pricing.
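The implicit-caching adjustment above amounts to one conditional subtraction. A sketch (illustrative helper name, not the actual Gemini transformation code):

```python
def adjust_gemini_text_tokens(
    text_tokens: int,
    cached_content_token_count: int,
    cache_tokens_details,
) -> int:
    """Implicit caching: Gemini reports cachedContentTokenCount but no
    cacheTokensDetails. Subtract the cached tokens so they are billed at
    the reduced cache_read rate instead of the full input rate."""
    if cached_content_token_count and cache_tokens_details is None:
        return text_tokens - cached_content_token_count
    return text_tokens
```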

* [Feat] UI: Allow Admins to control what pages are visible on LeftNav (#19907)

* feat: enabled_ui_pages_internal_users

* init ui for internal user controls

* fix ui settings

* fix build

* fix leftnav

* fix leftnav

* test fixes

* fix leftnav

* isPageAccessibleToInternalUsers

* docs fix

* docs ui viz

* Add xai websearch params support

* Allow dynamic setting of store_prompts_in_spend_logs

* Fix: output_tokens_details.reasoning_tokens None

* fix: Pydantic will fail to parse it because cached_tokens is required but not provided

* Spend logs setting modal

* adding tests

* fix(anthropic): remove explicit cache_control null in tool_result content

Fixes issue where tool_result content blocks include explicit
'cache_control': null which breaks some Anthropic API channels.

Changes:
- Only include cache_control field when explicitly set and not None
- Prevents serialization of null values in tool_result text content
- Maintains backward compatibility with existing cache_control usage

Related issue: Anthropic tool_result conversion adds explicit null values
that cause compatibility issues with certain API implementations.

Co-Authored-By: Claude (claude-4.5-sonnet) <noreply@anthropic.com>

* Fixing tests

* Add Prompt caching and reasoning support for MiniMax, GLM, Xiaomi

* Fix test_calculate_usage_completion_tokens_details_always_populated and logging object test

* Fix gemini-robotics-er-1.5-preview name

* Fix gemini-robotics-er-1.5-preview name

* Fix team cli auth flow (#19666)

* Cleanup code for user cli auth, and make sure not to prompt user for team multiple times while polling

* Adding tests

* Cleanup normalize teams some more

* fix(vertex_ai): support model names with slashes in passthrough URLs (#19944)

The regex in get_vertex_model_id_from_url() was using [^/:]+
which stopped at the first slash, truncating model names like
'gcp/google/gemini-2.5-flash' to just 'gcp'. This caused
access_groups checks to fail for custom model names.

Changed the pattern to [^:]+ to allow slashes in model names,
only stopping at the colon before the action (e.g., :generateContent).
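The character-class change above is easy to see with a simplified pattern. These regexes are illustrative reductions of the commit's fix, not the exact patterns in `get_vertex_model_id_from_url()`:

```python
import re

# Old: [^/:]+ stops at the first slash, truncating slashed model names.
OLD_PATTERN = re.compile(r"/models/([^/:]+)")
# New: [^:]+ allows slashes, stopping only at the colon before the action.
NEW_PATTERN = re.compile(r"/models/([^:]+)")

URL = (
    "/v1/projects/p/locations/l/publishers/google"
    "/models/gcp/google/gemini-2.5-flash:generateContent"
)
```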

* Fix thread leak in OpenTelemetry dynamic header path (#19946)

* UI: New build

* breakdown by team and keys

* Adding test

* Fixing build

* fix pypdf: >=6.6.2

* [Fix] A2a Gateway - Allow supporting old A2a card formats  (#19949)

* fix: LiteLLMA2ACardResolver

* fix: LiteLLMA2ACardResolver

* feat: .well-known/agent.json

* test_card_resolver_fallback_from_new_to_old_path

* Add error_message search in spend logs endpoint

* Adding Error message search to ui spend logs

* fix

* fix(presidio): reuse HTTP connections to prevent OOMs (#19964)

* [Fix] VertexAI Pass through - fix regression that caused vertex ai passthroughs to stop working for router models (#19967)

* fix(vertex_ai): replace custom model names with actual Vertex AI model names in passthrough URLs (#19948)

When the passthrough URL already contains project and location, the code
was skipping the deployment lookup and forwarding the URL as-is to Vertex AI.
For custom model names like gcp/google/gemini-2.5-flash, Vertex AI returned
404 because it only knows the actual model name (gemini-2.5-flash).

The fix makes the deployment lookup always run, so the custom model name
gets replaced with the actual Vertex AI model name before forwarding.

* add _resolve_vertex_model_from_router

* fix: get_llm_provider

* Potential fix for code scanning alert no. 4020: Clear-text logging of sensitive information

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

---------

Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Reusable Table Sort Component

* Fixing sorting API calls

* [Release Day] - Fixed CI/CD issues & changed processes (#19902)

* [Feat] - Search API add /list endpoint to list what search tools exist in router  (#19969)

* feat: List all available search tools configured in the router.

* add debugging search API

* add debugging search API

* fixing sorting for v2/model/info

* [Feat] LiteLLM Vector Stores - Add permission management for users, teams (#19972)

* fix: create_vector_store_in_db

* add team/user to LiteLLM_ManagedVectorStore

* add _check_vector_store_access

* add new fields

* test_check_vector_store_access

* add vector_store/list endpoints

* fix code QA checks

* feat: Add new OpenRouter models: `xiaomi/mimo-v2-flash`, `z-ai/glm-4.7`, `z-ai/glm-4.7-flash`, and `minimax/minimax-m2.1`. to model prices and context window (#19938)

Co-authored-by: Rushil Chugh <Rushil>

* fix gemini gemini-robotics-er-1.5-preview entry

* removing _experimental out routes from gitignore

* chore: update Next.js build artifacts (2026-01-29 04:12 UTC, node v22.16.0)

* Add custom_llm_provider as gemini translation

* Add test to check if model map is correctly formatted

* Intentional bad model map

* Add Validate model_prices_and_context_window.json job

* Remove validate job from lint

* Intentional bad model map

* Intentional bad model map

* Correct model map path

* Fix: litellm_fix_robotic_model_map_entry

* fix(mypy): fix type: ignore placement for OTEL LogRecord import

The type: ignore[attr-defined] comment was on the import alias line
inside parentheses, but mypy reports the error on the `from` line.
Collapse to single-line imports so the suppression is on the correct
line. Also add no-redef to the fallback branch.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com>
Co-authored-by: Yogeshwaran Ravichandran <96047771+yogeshwaran10@users.noreply.github.com>
Co-authored-by: Alexsander Hamir <alexsanderhamirgomesbaptista@gmail.com>
Co-authored-by: Harshit Jain <harshitjain0562@gmail.com>
Co-authored-by: Jay Prajapati <79649559+jayy-77@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Tamir Kiviti <95572081+tamirkiviti13@users.noreply.github.com>
Co-authored-by: Ephrim Stanley <ephrim.stanley@point72.com>
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: Gabriele Michelli <michelligabriele0@gmail.com>
Co-authored-by: Chesars <cesarponce19544@gmail.com>
Co-authored-by: yogeshwaran10 <ywaran646@gmail.com>
Co-authored-by: colinlin-stripe <colinlin@stripe.com>
Co-authored-by: houdataali <84786211+houdataali@users.noreply.github.com>
Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Milan <milan@berri.ai>
Co-authored-by: Xianzong Xie <xianzongxie@stripe.com>
Co-authored-by: Teo Stocco <zifeo@users.noreply.github.com>
Co-authored-by: Pragya Sardana <pragyasardana@gmail.com>
Co-authored-by: Ryan Wilson <84201908+ryewilson@users.noreply.github.com>
Co-authored-by: Brian Caswell <bcaswell@microsoft.com>
Co-authored-by: lizhen <lizhen10763@autohome.com.cn>
Co-authored-by: boarder7395 <37314943+boarder7395@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: rushilchugh01 <58689126+rushilchugh01@users.noreply.github.com>
