
add timeout to onyx guardrail #19731

Merged
krrishdholakia merged 2 commits into BerriAI:litellm_oss_staging_01_26_2026 from tamirkiviti13:add-timeout-to-onyx-guardrail
Jan 26, 2026

Conversation

@tamirkiviti13
Contributor

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement; see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🚄 Infrastructure

Changes

Added the option to override the default timeout for the HTTP client in Onyx's custom guardrail.


@krrishdholakia krrishdholakia changed the base branch from main to litellm_oss_staging_01_26_2026 January 26, 2026 07:13
@krrishdholakia krrishdholakia merged commit aa8134f into BerriAI:litellm_oss_staging_01_26_2026 Jan 26, 2026
4 of 7 checks passed
krrishdholakia added a commit that referenced this pull request Feb 5, 2026
* UI: new build

* redirect to login on expired jwt

* [Feat] UI + Backend - Allow adding policies on Keys/Teams  + Viewing on Info panels  (#19688)

* ui for policy mgmt

* test_add_guardrails_from_policy_engine_accepts_dynamic_policies_and_pops_from_data

* docs: add litellm-enterprise requirement for managed files (#19689)

* Update Gemini 2.0 Flash deprecation dates to March 31, 2026 (#19592)

Google announced that Gemini 2.0 Flash and Flash Lite models will be discontinued on March 31, 2026. Updated deprecation_date field for all affected model variants across different providers (vertex_ai, gemini, deepinfra, openrouter, vercel_ai_gateway).

Models updated:
- gemini-2.0-flash (added deprecation date)
- gemini-2.0-flash-001 (updated from 2026-02-05)
- gemini-2.0-flash-lite (added deprecation date)
- gemini-2.0-flash-lite-001 (updated from 2026-02-25)

All variants now correctly reflect the March 31, 2026 shutdown date.

* fixing build

* Fixing failing tests

* deactivating non root tests

* fixing arize tests

* cache tests serial

* fixing circleci config

* fixing circleci config

* Update OSS Adopters section with new table format

* Fixing ruff check

* bump: version 1.81.2 → 1.81.3

* chore: update Next.js build artifacts (2026-01-24 17:18 UTC, node v22.16.0)

* CI/CD fixes  - split local testing

* fix: _apply_search_filter_to_models mypy linting

* test_partner_models_httpx_streaming

* test_web_search

* Fix: log duplication when json_logs is enabled (#19705)

* fix: FLAKY tests

* fix unstable tests

* docs fix

* docs fix

* docs fix

* docs fix

* docs fix

* test_get_default_unvicorn_init_args

* fix flaky tests

* test_hanging_request_azure

* test_team_update_sc_2

* BUMP extras

* test fixes

* test fixes

* test_retrieve_container_basic

* Model and Team filtering

* TestBedrockInvokeToolSearch

* fix(presidio): resolve runtime error by handling asyncio loops in bac… (#19714)

* fix(presidio): resolve runtime error by handling asyncio loops in background threads

* add test case for thread safety

* UI Keys Teams Router Settings docs

* chore: update Next.js build artifacts (2026-01-25 00:27 UTC, node v22.16.0)

* test_stream_transformation_error_sync

* fix patch reliability mock tests

* fix MCP tests

* auto truncation of virtual keys table values

* fix: args issue & refactor into helper function to reduce bloat for both (#19441)

* Fix bulk user add

* fix(proxy): support slashes in google generateContent model names (#19737)

* fix(proxy): support slashes in google route params

* fix(proxy): extract google model ids with slashes

* test(proxy): cover google model ids with slashes

* Fix/non standard mcp url pattern (#19738)

* fix(mcp): Add standard MCP URL pattern support for OAuth discovery (#17272)

  OAuth discovery endpoints now support both URL patterns:
  - Standard MCP pattern: /mcp/{server_name} (new)
  - Legacy LiteLLM pattern: /{server_name}/mcp (backward compatible)

  The standard pattern is required by MCP-compliant clients like
  mcp-inspector and VSCode Copilot, which expect resource URLs
  following the /mcp/{server_name} convention per RFC 9728.

  Changes:
  - Add _build_oauth_protected_resource_response() helper
  - Add oauth_protected_resource_mcp_standard() endpoint
  - Add oauth_authorization_server_mcp_standard() endpoint
  - Keep legacy endpoints for backward compatibility
  - Add tests for both URL patterns

  Fixes #17272

* fix(mcp): Add standard MCP URL pattern support for OAuth discovery (#17272)

  OAuth discovery endpoints now support both URL patterns:
  - Standard MCP pattern: /mcp/{server_name} (new)
  - Legacy LiteLLM pattern: /{server_name}/mcp (backward compatible)

  The standard pattern is required by MCP-compliant clients like
  mcp-inspector and VSCode Copilot, which expect resource URLs
  following the /mcp/{server_name} convention per RFC 9728.

  Changes:
  - Add _build_oauth_protected_resource_response() helper
  - Add oauth_protected_resource_mcp_standard() endpoint
  - Add oauth_authorization_server_mcp_standard() endpoint
  - Keep legacy endpoints for backward compatibility
  - Add tests for both URL patterns

  Fixes #17272

* Test was relocated

* refactor(mcp): Extract helper methods from run_with_session to fix PLR0915

Split the large run_with_session method (55 statements) into smaller
helper methods to satisfy ruff's PLR0915 rule (max 50 statements):

- _create_transport_context(): Creates transport based on type
- _execute_session_operation(): Handles session lifecycle

Also changed cleanup exception handling from Exception to BaseException
to properly catch asyncio.CancelledError (which is a BaseException subclass
in Python 3.8+).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* test(mcp): Fix flaky test by mocking health_check_server

The test_mcp_server_manager_config_integration_with_database test was
making real network calls to fake URLs which caused timeouts and
CancelledError exceptions.

Fixed by mocking health_check_server to return a proper
LiteLLM_MCPServerTable object instead of making network calls.

* test(mcp): Fix skip condition to properly detect claude model names

The skip condition for missing API keys was checking for "anthropic" in
the model name, but the test uses "claude-haiku-4-5" which doesn't match.
Updated to check for both "anthropic" and "claude" model patterns.

Also added skip condition for OpenAI models when OPENAI_API_KEY is not set.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* test(mcp): Fix skip condition to properly detect claude model names

The skip condition for missing API keys was checking for "anthropic" in
the model name, but the test uses "claude-haiku-4-5" which doesn't match.
Updated to check for both "anthropic" and "claude" model patterns.

Also added skip condition for OpenAI models when OPENAI_API_KEY is not set.

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* add callbacks and labels to prometheus (#19708)

* feat: add clientip and user agent in metrics (#19717)

* feat: add clientip and user agent in metrics

* fix: lint errors

* Add model id and other req labels

---------

Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>

* fix: optimize logo fetching and resolve mcp import blockers (#19719)

* feat: tpm-rpm limit in prometheus metrics (#19725)

Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>

* add timeout to onyx guardrail (#19731)

* add timeout to onyx guardrail

* add tests

* Fix /batches to return encoded ids (from managed objects table)

* fix(proxy): use return value from CustomLogger.async_post_call_success_hook (#19670)

* fix(proxy): use return value from CustomLogger.async_post_call_success_hook

Previously the return value was ignored for CustomLogger callbacks,
preventing users from modifying responses. Now the return value is
captured and used to replace the response (if not None), consistent
with CustomGuardrail and streaming iterator hook behavior.

Fixes issue with custom_callbacks not being able to inject data into
LLM responses.

* fix(proxy): also fix async_post_call_streaming_hook to use return value

Previously the streaming hook only used return values that started with
"data: " (SSE format). Now any non-None return value is used, consistent
with async_post_call_success_hook and streaming iterator hook behavior.

Added tests for streaming hook transformation.

---------

Co-authored-by: Gabriele Michelli <michelligabriele0@gmail.com>
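The hook-return fix above can be sketched as follows. The function and callback names are illustrative, not litellm's actual proxy internals; the point is that a non-None return value from `async_post_call_success_hook` now replaces the response instead of being discarded.

```python
import asyncio

class EchoSuffixHook:
    """Dummy callback: returns a modified response (non-None replaces it)."""
    async def async_post_call_success_hook(self, data, response):
        return response + " [checked]"

class PassiveHook:
    """Dummy callback: returns None, so the response is left untouched."""
    async def async_post_call_success_hook(self, data, response):
        return None

async def apply_post_call_hooks(response, callbacks, data):
    # Sketch of the fixed behavior: capture each hook's return value and,
    # when it is not None, use it as the new response.
    for cb in callbacks:
        result = await cb.async_post_call_success_hook(data=data, response=response)
        if result is not None:
            response = result
    return response
```

Under the old behavior, `EchoSuffixHook`'s return value would have been ignored for `CustomLogger` callbacks.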

* feat(hosted_vllm): support thinking parameter for /v1/messages endpoint

Adds support for Anthropic-style 'thinking' parameter in hosted_vllm,
converting it to OpenAI-style 'reasoning_effort' since vLLM is
OpenAI-compatible.

This enables users to use Claude Code CLI with hosted vLLM models
like GLM-4.6/4.7 through the /v1/messages endpoint.

Mapping (same as Anthropic adapter):
- budget_tokens >= 10000 -> "high"
- budget_tokens >= 5000  -> "medium"
- budget_tokens >= 2000  -> "low"
- budget_tokens < 2000   -> "minimal"

Fixes #19761
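The budget-token mapping above is a simple threshold ladder. A sketch (the helper name is illustrative, not litellm's API):

```python
def budget_tokens_to_reasoning_effort(budget_tokens: int) -> str:
    """Map an Anthropic-style thinking budget to an OpenAI-style
    reasoning_effort, per the thresholds in the commit message above."""
    if budget_tokens >= 10000:
        return "high"
    if budget_tokens >= 5000:
        return "medium"
    if budget_tokens >= 2000:
        return "low"
    return "minimal"
```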

* Fix batch creation to return the input file's expires_at attribute

* bump: version 1.81.3 → 1.81.4 (#19793)

* fix: server root path (#19790)

* refactor: extract transport context creation into separate method (#19794)

* Fix user max budget reset to unlimited

- Added a Pydantic validator to convert empty string inputs for max_budget to None, preventing float parsing errors from the frontend.
- Modified the internal user update logic to explicitly allow max_budget to be None, ensuring the value isn't filtered out and can be reset to unlimited in the database.
- Added unit tests for validation and logic.

 Closes #19781

* Make test_get_users_key_count deterministic by creating dedicated test user (#19795)

- Create a test user with auto_create_key=False to ensure known starting state
- Filter get_users by user_ids to target only the test user
- Verify initial key count is 0 before creating a key
- Clean up test user after test completes
- This ensures consistent behavior across CI and local environments

* Add test for Router.get_valid_args, fix router code coverage encoding (#19797)

- Add test_get_valid_args in test_router_helper_utils.py to cover get_valid_args
- Use encoding='utf-8' in router_code_coverage.py for cross-platform file reads

* fix sso email case sensitivity

* Fix test_mcp_server_manager_config_integration_with_database cancellation error (#19801)

Mock _create_mcp_client to avoid network calls in health checks.
This prevents asyncio.CancelledError when the test teardown closes
the event loop while health checks are still pending.

The test focuses on conversion logic (access_groups, description)
not health check functionality, so mocking the network call is appropriate.

* fix: make HTTPHandler mockable in OIDC secret manager tests (#19803)

* fix: make HTTPHandler mockable in OIDC secret manager tests

- Add _get_oidc_http_handler() factory function to make HTTPHandler
  easily mockable in tests
- Update test_oidc_github_success to patch factory function instead
  of HTTPHandler directly
- Update Google OIDC tests for consistency
- Fixes test_oidc_github_success failure where mock was bypassed

This change allows tests to properly mock HTTPHandler instances used
for OIDC token requests, fixing the test failure where the mock was
not being used.

* fix: patch base_llm_http_handler method directly in container tests

- Use patch.object to patch container_create_handler method directly
  on the base_llm_http_handler instance instead of patching the module
- Fixes test_provider_support[openai] failure where mock wasn't applied
- Also fixes test_error_handling_integration with same approach

The issue was that patching 'litellm.containers.main.base_llm_http_handler'
didn't work because the module imports it with 'from litellm.main import',
creating a local reference. Using patch.object patches the method on the
actual object instance, which works regardless of import style.

* fix: resolve flaky test_openai_env_base by clearing cache

- Add cache clearing at start of test_openai_env_base to prevent cache pollution
- Ensures no cached clients from previous tests interfere with respx mocks
- Fixes intermittent failures where aiohttp transport was used instead of httpx
- Test-only change with low risk, no production code modifications

Resolves flaky test marked with @pytest.mark.flaky(retries=3, delay=1)
Both parametrized versions (OPENAI_API_BASE and OPENAI_BASE_URL) now pass consistently

* test: add explicit mock verification in test_provider_support

- Capture mock handler with 'as mock_handler' for explicit validation
- Add assert_called_once() to verify mock was actually used
- Ensures test verifies no real API calls are made
- Follows same pattern as test_openai_env_base validation

* Add light/dark mode slider for dev

* fix key duration input

* Messages api bedrock converse caching and pdf support (#19785)

* cache control for user messages and system messages

* add cache creation tokens in response

* cache controls in tool calls and assistant turns

* refactor with _should_preserve_cache_control

* add cache control unit tests

* use simpler cache creation token count logic

* use helper function

* remove unused function

* fix unit tests

* fixing team member add

* [Feat] enable progress notifications for MCP tool calls (#19809)

* enable progress notifications for MCP tool calls

* adjust mcp test

* [Feat] CLI Auth - Add configurable CLI JWT expiration via environment variable (#19780)

* fix: add CLI_JWT_EXPIRATION_HOURS

* docs: CLI_JWT_EXPIRATION_HOURS

* fix: get_cli_jwt_auth_token

* test_get_cli_jwt_auth_token_custom_expiration

* fixing flaky tests around oidc and email

* Add dont ask me again option in nudges

* CI/CD: Increase retries and stabilize litellm_mapped_tests_core (#19826)

* Fix PLR0915: Extract system message handling to reduce statement count

* fix mypy

* fix: add host_progress_callback parameter to mock_call_tool in test

The test_call_tool_without_broken_pipe_error was failing because the mock function did not accept the host_progress_callback keyword argument that the actual implementation passes to client.call_tool(). Updated the mock to accept this parameter to match the real implementation signature.

* fixing flaky tests around oidc and email

* Add documentation comment to test file

* add retry

* add dependency

* increase retry

---------

Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>

* Fix broken mocks in 6 flaky tests to prevent real API calls  (#19829)

* Fix broken mocks in 6 flaky tests to prevent real API calls

Added network-level HTTP blocking using respx to prevent tests from making real API calls when Python-level mocks fail. This makes tests more reliable and retryable in CI.

Changes:

- Azure OIDC test: Added Azure Identity SDK mock to prevent real Azure calls

- Vector store test: Added @respx.mock decorator to block HTTP requests

- Resend email tests (3): Added @respx.mock decorator for all 3 test functions

- SendGrid email test: Added @respx.mock decorator

All test assertions and verification logic remain unchanged - only added safety nets to catch leaked API calls.

* Fix failing OIDC secret manager tests

Fixed two test failures in test_secret_managers_main.py:

1. test_oidc_azure_ad_token_success: Corrected the patch path for get_bearer_token_provider from 'litellm.secret_managers.get_azure_ad_token_provider.get_bearer_token_provider' to 'azure.identity.get_bearer_token_provider' since the function is imported from azure.identity.

2. test_oidc_google_success: Added @patch('httpx.Client') decorator to prevent any real HTTP connections during test execution, resolving httpx.ConnectError issues.

Both tests now pass successfully.

* Adding tests:

* fixing breaking change: just user_id provided should upsert still

* Fix: A2A Python SDK URL

* [Feat] Add UI for /rag/ingest API - upload docs, pdfs etc to create vector stores  (#19822)

* feat: _save_vector_store_to_db_from_rag_ingest

* UI features for RAG ingest

* fix: Endpoints

* ragIngestCall

* _save_vector_store_to_db_from_rag_ingest

* fix: rag_ingest Code QA CHECK

* UI fixes unit tests

* docs(readme): add OpenAI Agents SDK to OSS Adopters (#19820)

* docs(readme): add OpenAI Agents SDK to OSS Adopters

* docs(readme): add OpenAI Agents SDK logo

* Fixing tests

* Litellm release notes 01 26 2026 (#19836)

* docs: document new models/endpoints

* docs: cleanup

* feat: update model table

* fixing tests

* Litellm release notes 01 26 2026 (#19838)

* docs: document new models/endpoints

* docs: cleanup

* feat: update model table

* fix: cleanup

* feat: Add model_id label to Prometheus metrics (#18048) (#19678)

Co-authored-by: Cursor Agent <cursoragent@cursor.com>

* fix(models): set gpt-5.2-codex mode to responses for Azure and OpenRouter (#19770)

Fixes #19754

The gpt-5.2-codex model only supports the responses API, not chat completions.
Updated azure/gpt-5.2-codex and openrouter/openai/gpt-5.2-codex entries to use
mode: "responses" and supported_endpoints: ["/v1/responses"].

* fix(responses): update local_vars with detected provider (#19782) (#19798)

When using the responses API with provider-specific params (aws_*, vertex_*)
without explicitly passing custom_llm_provider, the code crashed with:
AttributeError: 'NoneType' object has no attribute 'startswith'

Root cause: local_vars was captured via locals() before get_llm_provider()
detected the provider from the model string (e.g., "bedrock/..."), so
custom_llm_provider remained None when processing provider-specific params.

Fix: Update local_vars["custom_llm_provider"] after get_llm_provider() call
so the detected provider is available for param processing.

Affected provider-specific params:
- aws_* (aws_region_name, aws_access_key_id, etc.) for Bedrock/SageMaker
- vertex_* (vertex_project, vertex_location, etc.) for Vertex AI

* fix(azure): use generic cost calculator for audio token pricing (#19771)

Azure audio models were charging audio output tokens at the text token
rate instead of the correct audio token rate. This resulted in costs
being ~6.65x lower than expected.

The fix replaces Azure's custom cost calculation logic with the generic
cost calculator that properly handles text, audio, cached, reasoning,
and image tokens.

Fixes #19764

* fix(xai): correct cached token cost calculation for xAI models (#19772)

* fix(azure): use generic cost calculator for audio token pricing

Azure audio models were charging audio output tokens at the text token
rate instead of the correct audio token rate. This resulted in costs
being ~6.65x lower than expected.

The fix replaces Azure's custom cost calculation logic with the generic
cost calculator that properly handles text, audio, cached, reasoning,
and image tokens.

Fixes #19764

* fix(xai): correct cached token cost calculation for xAI models

- Fix double-counting issue where xAI reports text_tokens = prompt_tokens
  (including cached), causing tokens to be charged twice
- Add cache_read_input_token_cost to xAI grok-3 and grok-3-mini model variants
- Detection: when text_tokens + cached_tokens > prompt_tokens, recalculate
  text_tokens = prompt_tokens - cached_tokens

xAI pricing (25% of input for cached):
- grok-3 variants: $0.75/M cached (input $3/M)
- grok-3-mini variants: $0.075/M cached (input $0.30/M)
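The double-counting detection described above reduces to one arithmetic check. A sketch, with an illustrative function name (not litellm's actual cost-calculator code):

```python
def adjust_xai_text_tokens(text_tokens: int, cached_tokens: int, prompt_tokens: int) -> int:
    """xAI reports text_tokens == prompt_tokens (cached included).
    If charging both would exceed the prompt total, recalculate
    text_tokens so cached tokens are not billed twice."""
    if text_tokens + cached_tokens > prompt_tokens:
        return prompt_tokens - cached_tokens
    return text_tokens
```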

* Fix:Support both JSON array format and comma-separated values from user headers

* Translate advanced-tool-use to Bedrock-specific headers for Claude Opus 4.5

* fix: token calculations and refactor (#19696)

* fix(prometheus): safely handle None metadata in logging to prevent At… (#19691)

* fix(prometheus): safely handle None metadata in logging to prevent AttributeError

* fix: lint issues

* fix: resolve 'does not exist' migration errors as applied in setup_database (#19281)

* Fix: timeout exception raised error

* Add sarvam doc

* Add gemini-robotics-er-1.5-preview model in model map

* Add gemini-robotics-er-1.5-preview model documentation

* Fix: Stream the download in chunks

* Add grok reasoning content

* Revert poetry lock

* Fix mypy and code quality issues

* feat: add feature to make silent calls (#19544)

* feat: add feature to make silent calls

* add test for silent feat

* add docs for silent feat

* fix lint issues and  UI logs

* add docs of ab testing and deep copy

* fix(enterprise): correct error message for DISABLE_ADMIN_ENDPOINTS (#19861)

The error message for DISABLE_ADMIN_ENDPOINTS incorrectly said
"DISABLING LLM API ENDPOINTS is an Enterprise feature" instead of
"DISABLING ADMIN ENDPOINTS is an Enterprise feature".

This was a copy-paste bug from the is_llm_api_route_disabled() function.

Added regression tests to verify both error messages are correct.

* fix(proxy): handle agent parameter in /interactions endpoint (#19866)

* initialize tiktoken environment at import time to support offline usage

* fix(bedrock): support tool search header translation for Sonnet 4.5 (#19871)

Extend advanced-tool-use header translation to include Claude Sonnet 4.5
in addition to Opus 4.5 on Bedrock Invoke API.

When Claude Code sends the advanced-tool-use-2025-11-20 header, it now
gets correctly translated to Bedrock-specific headers for both:
- Claude Opus 4.5
- Claude Sonnet 4.5

Headers translated:
- tool-search-tool-2025-10-19
- tool-examples-2025-10-29

Fixes defer_loading validation error on Bedrock with Sonnet 4.5.

Ref: https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool

* bulk update keys endpoint

* mypy linting

* [Feat] RAG API - Add support for using s3 Vectors as Vector Store Provider for /rag/ingest (#19888)

* init S3VectorsRAGIngestion as a supported ingestion provider for RAG API

* test: TestRAGS3Vectors

* init S3VectorsVectorStoreOptions

* init s3 vectors

* code clean up + QA

* fix: get_credentials

* S3VectorsRAGIngestion

* TestRAGS3Vectors

* docs: AWS S3 Vectors

* add asyncio QA checks

* fix: S3_VECTORS_DEFAULT_DIMENSION

* Add native_background_mode to override polling_via_cache for specific models

This follow-up to PR #16862 allows users to specify models that should use
the native provider's background mode instead of polling via cache.

Config example:
  litellm_settings:
    responses:
      background_mode:
        polling_via_cache: ["openai"]
        native_background_mode: ["o4-mini-deep-research"]
        ttl: 3600

When a model is in native_background_mode list, should_use_polling_for_request
returns False, allowing the request to fall through to native provider handling.

Committed-By-Agent: cursor
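The precedence described above (native_background_mode wins over polling_via_cache) can be sketched like this. The function signature and config keys mirror the example config but are otherwise assumptions about the internals:

```python
def should_use_polling_for_request(provider: str, model: str, background_cfg: dict) -> bool:
    """Sketch: a model listed in native_background_mode falls through to
    the provider's native background handling; otherwise polling applies
    when the provider opted into polling_via_cache."""
    if model in (background_cfg.get("native_background_mode") or []):
        return False  # let the native provider handle background mode
    return provider in (background_cfg.get("polling_via_cache") or [])
```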

* [Feat] RAG API - Add s3_vectors as provider on /vector_store/search API + UI for creating + PDF support for /rag/ingest (#19895)

* init S3VectorsRAGIngestion as a supported ingestion provider for RAG API

* test: TestRAGS3Vectors

* init S3VectorsVectorStoreOptions

* init s3 vectors

* code clean up + QA

* fix: get_credentials

* S3VectorsRAGIngestion

* TestRAGS3Vectors

* docs: AWS S3 Vectors

* add asyncio QA checks

* fix: S3_VECTORS_DEFAULT_DIMENSION

* init ui for bedrock s3 vectors

* fix add /search support for s3_vectors

* init atransform_search_vector_store_request

* feat: S3VectorsVectorStoreConfig

* TestS3VectorsVectorStoreConfig

* atransform_search_vector_store_request

* fix: S3VectorsVectorStoreConfig

* add validation for bucket name etc

* fix UI validation for s3 vector store

* init extract_text_from_pdf

* add pypdf

* fix code QA checks

* fix navbar

* init s3_vector.png

* fix QA code

* Add tests for native_background_mode feature

Added 8 new unit tests for the native_background_mode feature:
- test_polling_disabled_when_model_in_native_background_mode
- test_polling_disabled_for_native_background_mode_with_provider_list
- test_polling_enabled_when_model_not_in_native_background_mode
- test_polling_enabled_when_native_background_mode_is_none
- test_polling_enabled_when_native_background_mode_is_empty_list
- test_native_background_mode_exact_match_required
- test_native_background_mode_with_provider_prefix_in_request
- test_native_background_mode_with_router_lookup

Committed-By-Agent: cursor

* add sortBy and sortOrder params for /v2/model/info

* ruff check

* Fixing UI tests

* test(proxy): add regression tests for vertex passthrough model names with slashes (#19855)

Added test cases for custom model names containing slashes in Vertex AI
passthrough URLs (e.g., gcp/google/gemini-2.5-flash).

Test cases:
- gcp/google/gemini-2.5-flash
- gcp/google/gemini-3-flash-preview
- custom/model

* fix: guardrails issues streaming-response regex (#19901)

* fix: add fix for migration issue and stable Linux Debian (#19843)

* fix: filter unsupported beta headers for Bedrock Invoke API (#19877)

- Add whitelist-based filtering for anthropic_beta headers
- Only allow Bedrock-supported beta flags (computer-use, tool-search, etc.)
- Filter out unsupported flags like mcp-servers, structured-outputs
- Remove output_format parameter from Bedrock Invoke requests
- Force tool-based structured outputs when response_format is used

Fixes #16726

* fix: allow tool_choice for Azure GPT-5 chat models (#19813)

* fix: don't treat gpt-5-chat as GPT-5 reasoning

* fix: mark azure gpt-5-chat as supporting tool_choice

* test: cover gpt-5-chat params on azure/openai

* fix: tool with anthropic #19800 (#19805)

* All Models Page server side sorting

* Add Init Containers in the community helm chart (#19816)

* docs: fix guardrail logging docs (#19833)

* Fixing build and tests

* inspect BadRequestError after all other policy types (#19878)

As indicated by https://docs.litellm.ai/docs/exception_mapping,
BadRequestError is used as the base type for multiple exceptions.  As
such, it should be tested last in handling retry policies.

This updates the integration test that validates retry policies work as
expected.

Fixes #19876
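The ordering requirement above is the standard most-specific-first rule for `isinstance` dispatch. A sketch with dummy exception classes (the real litellm exception hierarchy and policy fields differ; here `ContentPolicyViolationError` subclassing `BadRequestError` is an assumption made for illustration):

```python
class BadRequestError(Exception):
    """Stand-in for a base exception type shared by several errors."""

class ContentPolicyViolationError(BadRequestError):
    """Stand-in for a more specific subclass."""

def pick_retry_policy(exc: Exception, policy: dict):
    # Check subclasses before the BadRequestError base; testing the base
    # first would shadow every subclass, which is the bug being fixed.
    if isinstance(exc, ContentPolicyViolationError):
        return policy.get("content_policy")
    if isinstance(exc, BadRequestError):
        return policy.get("bad_request")
    return policy.get("default")
```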

* fix(main): use local tiktoken cache in lazy loading (#19774)

The lazy loading implementation for encoding in __getattr__ was calling
tiktoken.get_encoding() directly without first setting TIKTOKEN_CACHE_DIR.
This caused tiktoken to attempt downloading the encoding file from the
internet instead of using the local copy bundled with litellm.

This fix uses _get_default_encoding() from _lazy_imports which properly
sets TIKTOKEN_CACHE_DIR before loading tiktoken, ensuring the local cache
is used.

* fix(gemini): subtract implicit cached tokens from text_tokens for correct cost calculation (#19775)

When Gemini uses implicit caching, it returns cachedContentTokenCount but
NOT cacheTokensDetails. Previously, text_tokens was not adjusted in this case,
causing costs to be calculated as if all tokens were non-cached.

This fix subtracts cachedContentTokenCount from text_tokens when no
cacheTokensDetails is present (implicit caching), ensuring correct cost
calculation with the reduced cache_read pricing.
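The implicit-caching adjustment above amounts to one conditional subtraction. A sketch (illustrative helper name, not the actual Gemini transformation code):

```python
def adjust_gemini_text_tokens(
    text_tokens: int,
    cached_content_token_count: int,
    cache_tokens_details,
) -> int:
    """Implicit caching: Gemini reports cachedContentTokenCount but no
    cacheTokensDetails. Subtract the cached tokens so they are billed at
    the reduced cache_read rate instead of the full input rate."""
    if cached_content_token_count and cache_tokens_details is None:
        return text_tokens - cached_content_token_count
    return text_tokens
```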

* [Feat] UI: Allow Admins to control what pages are visible on LeftNav (#19907)

* feat: enabled_ui_pages_internal_users

* init ui for internal user controls

* fix ui settings

* fix build

* fix leftnav

* fix leftnav

* test fixes

* fix leftnav

* isPageAccessibleToInternalUsers

* docs fix

* docs ui viz

* Add xai websearch params support

* Allow dynamic setting of store_prompts_in_spend_logs

* Fix: output_tokens_details.reasoning_tokens None

* fix: Pydantic will fail to parse it because cached_tokens is required but not provided

* Spend logs setting modal

* adding tests

* fix(anthropic): remove explicit cache_control null in tool_result content

Fixes issue where tool_result content blocks include explicit
'cache_control': null which breaks some Anthropic API channels.

Changes:
- Only include cache_control field when explicitly set and not None
- Prevents serialization of null values in tool_result text content
- Maintains backward compatibility with existing cache_control usage

Related issue: Anthropic tool_result conversion adds explicit null values
that cause compatibility issues with certain API implementations.

Co-Authored-By: Claude (claude-4.5-sonnet) <noreply@anthropic.com>

* Fixing tests

* Add Prompt caching and reasoning support for MiniMax, GLM, Xiaomi

* Fix test_calculate_usage_completion_tokens_details_always_populated and logging object test

* Fix gemini-robotics-er-1.5-preview name

* Fix gemini-robotics-er-1.5-preview name

* Fix team cli auth flow (#19666)

* Cleanup code for user cli auth, and make sure not to prompt user for team multiple times while polling

* Adding tests

* Cleanup normalize teams some more

* fix(vertex_ai): support model names with slashes in passthrough URLs (#19944)

The regex in get_vertex_model_id_from_url() was using [^/:]+
which stopped at the first slash, truncating model names like
'gcp/google/gemini-2.5-flash' to just 'gcp'. This caused
access_groups checks to fail for custom model names.

Changed the pattern to [^:]+ to allow slashes in model names,
only stopping at the colon before the action (e.g., :generateContent).
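The character-class change above is easy to see with a simplified pattern. These regexes are illustrative reductions of the commit's fix, not the exact patterns in `get_vertex_model_id_from_url()`:

```python
import re

# Old: [^/:]+ stops at the first slash, truncating slashed model names.
OLD_PATTERN = re.compile(r"/models/([^/:]+)")
# New: [^:]+ allows slashes, stopping only at the colon before the action.
NEW_PATTERN = re.compile(r"/models/([^:]+)")

URL = (
    "/v1/projects/p/locations/l/publishers/google"
    "/models/gcp/google/gemini-2.5-flash:generateContent"
)
```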

* Fix thread leak in OpenTelemetry dynamic header path (#19946)

* UI: New build

* breakdown by team and keys

* Adding test

* Fixing build

* fix pypdf: >=6.6.2

* [Fix] A2a Gateway - Allow supporting old A2a card formats  (#19949)

* fix: LiteLLMA2ACardResolver

* fix: LiteLLMA2ACardResolver

* feat: .well-known/agent.json

* test_card_resolver_fallback_from_new_to_old_path

* Add error_message search in spend logs endpoint

* Adding Error message search to ui spend logs

* fix

* fix(presidio): reuse HTTP connections to prevent OOMs (#19964)

* [Fix] VertexAI Pass through - fix regression that caused vertex ai passthroughs to stop working for router models (#19967)

* fix(vertex_ai): replace custom model names with actual Vertex AI model names in passthrough URLs (#19948)

When the passthrough URL already contains project and location, the code
was skipping the deployment lookup and forwarding the URL as-is to Vertex AI.
For custom model names like gcp/google/gemini-2.5-flash, Vertex AI returned
404 because it only knows the actual model name (gemini-2.5-flash).

The fix makes the deployment lookup always run, so the custom model name
gets replaced with the actual Vertex AI model name before forwarding.

* add _resolve_vertex_model_from_router

* fix: get_llm_provider

* Potential fix for code scanning alert no. 4020: Clear-text logging of sensitive information

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

---------

Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* Reusable Table Sort Component

* Fixing sorting API calls

* [Release Day] - Fixed CI/CD issues & changed processes (#19902)

* [Feat] - Search API add /list endpoint to list what search tools exist in router  (#19969)

* feat: List all available search tools configured in the router.

* add debugging search API

* add debugging search API

* fixing sorting for v2/model/info

* [Feat] LiteLLM Vector Stores - Add permission management for users, teams (#19972)

* fix: create_vector_store_in_db

* add team/user to LiteLLM_ManagedVectorStore

* add _check_vector_store_access

* add new fields

* test_check_vector_store_access

* add vector_store/list endpoints

* fix code QA checks

* feat: Add new OpenRouter models: `xiaomi/mimo-v2-flash`, `z-ai/glm-4.7`, `z-ai/glm-4.7-flash`, and `minimax/minimax-m2.1`. to model prices and context window (#19938)

Co-authored-by: Rushil Chugh <Rushil>

* fix gemini gemini-robotics-er-1.5-preview entry

* removing _experimental out routes from gitignore

* chore: update Next.js build artifacts (2026-01-29 04:12 UTC, node v22.16.0)

* Add custom_llm_provider as gemini translation

* Add test to check if model map is correctly formatted

* Intentional bad model map

* Add Validate model_prices_and_context_window.json job

* Remove validate job from lint

* Intentional bad model map

* Intentional bad model map

* Correct model map path

* Fix: litellm_fix_robotic_model_map_entry

* fix(mypy): fix type: ignore placement for OTEL LogRecord import

The type: ignore[attr-defined] comment was on the import alias line
inside parentheses, but mypy reports the error on the `from` line.
Collapse to single-line imports so the suppression is on the correct
line. Also add no-redef to the fallback branch.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com>
Co-authored-by: Yogeshwaran Ravichandran <96047771+yogeshwaran10@users.noreply.github.com>
Co-authored-by: Alexsander Hamir <alexsanderhamirgomesbaptista@gmail.com>
Co-authored-by: Harshit Jain <harshitjain0562@gmail.com>
Co-authored-by: Jay Prajapati <79649559+jayy-77@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Tamir Kiviti <95572081+tamirkiviti13@users.noreply.github.com>
Co-authored-by: Ephrim Stanley <ephrim.stanley@point72.com>
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: Gabriele Michelli <michelligabriele0@gmail.com>
Co-authored-by: Chesars <cesarponce19544@gmail.com>
Co-authored-by: yogeshwaran10 <ywaran646@gmail.com>
Co-authored-by: colinlin-stripe <colinlin@stripe.com>
Co-authored-by: houdataali <84786211+houdataali@users.noreply.github.com>
Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Milan <milan@berri.ai>
Co-authored-by: Xianzong Xie <xianzongxie@stripe.com>
Co-authored-by: Teo Stocco <zifeo@users.noreply.github.com>
Co-authored-by: Pragya Sardana <pragyasardana@gmail.com>
Co-authored-by: Ryan Wilson <84201908+ryewilson@users.noreply.github.com>
Co-authored-by: Brian Caswell <bcaswell@microsoft.com>
Co-authored-by: lizhen <lizhen10763@autohome.com.cn>
Co-authored-by: boarder7395 <37314943+boarder7395@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: rushilchugh01 <58689126+rushilchugh01@users.noreply.github.com>
