Fix broken mocks in 6 flaky tests to prevent real API calls by AlexsanderHamir · Pull Request #19829 · BerriAI/litellm

AlexsanderHamir · 2026-01-27T01:37:58Z

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have Added testing in the tests/litellm/ directory, Adding at least 1 test is a hard requirement - see details
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem

CI (LiteLLM team)

CI status guideline:

50-55 passing tests: main is stable with minor issues.

45-49 passing tests: acceptable but needs attention

<= 40 passing tests: unstable; be careful with your merges and assess the risk.

Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

Added network-level HTTP blocking using respx to prevent tests from making real API calls when Python-level mocks fail. This makes tests more reliable and retryable in CI. Changes: - Azure OIDC test: Added Azure Identity SDK mock to prevent real Azure calls - Vector store test: Added @respx.mock decorator to block HTTP requests - Resend email tests (3): Added @respx.mock decorator for all 3 test functions - SendGrid email test: Added @respx.mock decorator All test assertions and verification logic remain unchanged - only added safety nets to catch leaked API calls.

@patch

Fixed two test failures in test_secret_managers_main.py: 1. test_oidc_azure_ad_token_success: Corrected the patch path for get_bearer_token_provider from 'litellm.secret_managers.get_azure_ad_token_provider.get_bearer_token_provider' to 'azure.identity.get_bearer_token_provider' since the function is imported from azure.identity. 2. test_oidc_google_success: Added @patch('httpx.Client') decorator to prevent any real HTTP connections during test execution, resolving httpx.ConnectError issues. Both tests now pass successfully.

vercel · 2026-01-27T01:38:03Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Review	Updated (UTC)
litellm	Ready	Preview, Comment	Jan 27, 2026 1:38am

## Problem Tests using @respx.mock were hitting real APIs because respx only intercepts httpx requests, but litellm uses aiohttp transport by default. Tests affected: - test_send_email_missing_api_key - test_send_email_multiple_recipients (resend & sendgrid) - test_search_uses_registry_credentials - test_vector_store_create_with_simple_provider_name - test_image_edit_merges_headers_and_extra_headers ## Solution 1. Add disable_aiohttp_transport fixture (autouse=True) to test files that use respx mocking. This forces httpx transport so respx works. 2. Clear client cache in fixture to prevent reuse of old clients. 3. Fix isinstance checks in vector store tests to use type name comparison (avoids module identity issues from sys.path.insert). ## Regression PR #19829 (commit f95572e) added @respx.mock but didn't account for aiohttp transport bypassing respx interception.

…checks ## Problem Tests using mocked HTTP clients were hitting real APIs because: 1. HTTP client cache was returning previously cached real clients 2. isinstance checks failed due to module identity issues from sys.path ### Tests affected: - test_send_email_missing_api_key - test_send_email_multiple_recipients (resend & sendgrid) - test_search_uses_registry_credentials - test_vector_store_create_with_simple_provider_name - test_image_edit_merges_headers_and_extra_headers ## Solution 1. Add clear_client_cache fixture (autouse=True) to clear litellm.in_memory_llm_clients_cache before each test 2. Fix isinstance checks to use type name comparison (avoids module identity issues from sys.path.insert) ## Why not disable_aiohttp_transport The default transport is aiohttp, so tests should work with it. Clearing the cache ensures mocks are used instead of cached real clients. ## Regression PR #19829 (commit f95572e) added @respx.mock but cached clients from earlier tests were being reused, bypassing the mocks.

…checks ## Problem Tests using mocked HTTP clients were hitting real APIs because: 1. HTTP client cache was returning previously cached real clients 2. isinstance checks failed due to module identity issues from sys.path ### Tests affected: - test_send_email_missing_api_key - test_send_email_multiple_recipients (resend & sendgrid) - test_search_uses_registry_credentials - test_vector_store_create_with_simple_provider_name - test_vector_store_create_with_provider_api_type - test_vector_store_create_with_ragflow_provider - test_image_edit_merges_headers_and_extra_headers - test_retrieve_container_basic (container API tests) ## Solution 1. Add clear_client_cache fixture (autouse=True) to clear litellm.in_memory_llm_clients_cache before each test 2. Fix isinstance checks to use type name comparison (avoids module identity issues from sys.path.insert) ## Why not disable_aiohttp_transport The default transport is aiohttp, so tests should work with it. Clearing the cache ensures mocks are used instead of cached real clients. ## Regression PR #19829 (commit f95572e) added @respx.mock but cached clients from earlier tests were being reused, bypassing the mocks.

…checks (#20196) ## Problem Tests using mocked HTTP clients were hitting real APIs because: 1. HTTP client cache was returning previously cached real clients 2. isinstance checks failed due to module identity issues from sys.path ### Tests affected: - test_send_email_missing_api_key - test_send_email_multiple_recipients (resend & sendgrid) - test_search_uses_registry_credentials - test_vector_store_create_with_simple_provider_name - test_vector_store_create_with_provider_api_type - test_vector_store_create_with_ragflow_provider - test_image_edit_merges_headers_and_extra_headers - test_retrieve_container_basic (container API tests) ## Solution 1. Add clear_client_cache fixture (autouse=True) to clear litellm.in_memory_llm_clients_cache before each test 2. Fix isinstance checks to use type name comparison (avoids module identity issues from sys.path.insert) ## Why not disable_aiohttp_transport The default transport is aiohttp, so tests should work with it. Clearing the cache ensures mocks are used instead of cached real clients. ## Regression PR #19829 (commit f95572e) added @respx.mock but cached clients from earlier tests were being reused, bypassing the mocks. Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com>

* feat: add disable_default_user_agent flag Add litellm.disable_default_user_agent global flag to control whether the automatic User-Agent header is injected into HTTP requests. * refactor: update HTTP handlers to respect disable_default_user_agent Modify http_handler.py and httpx_handler.py to check the disable_default_user_agent flag and return empty headers when disabled. This allows users to override the User-Agent header completely. * test: add comprehensive tests for User-Agent customization Add 8 tests covering: - Default User-Agent behavior - Disabling default User-Agent - Custom User-Agent via extra_headers - Environment variable support - Async handler support - Override without disabling - Claude Code use case - Backwards compatibility * fix: honor LITELLM_USER_AGENT for default User-Agent * refactor: drop disable_default_user_agent setting * test: cover LITELLM_USER_AGENT override in custom_httpx handlers * fix Prompt Studio history to load tools and system messages (BerriAI#19920) * fix(vertex_ai): convert image URLs to base64 in tool messages for Anthropic (BerriAI#19896) * fix(vertex_ai): convert image URLs to base64 in tool messages for Anthropic Fixes BerriAI#19891 Vertex AI Anthropic models don't support URL sources for images. LiteLLM already converted image URLs to base64 for user messages, but not for tool messages (role='tool'). This caused errors when using ToolOutputImage with image_url in tool outputs. Changes: - Add force_base64 parameter to convert_to_anthropic_tool_result() - Pass force_base64 to create_anthropic_image_param() for tool message images - Calculate force_base64 in anthropic_messages_pt() based on llm_provider - Add unit tests for tool message image handling * chore: remove extra comment from test file header * Fix/router search tools v2 (BerriAI#19840) * fix(proxy_server): pass search_tools to Router during DB-triggered initialization * fix search tools from db * add missing statement to handle from db * fix import issues to pass lint errors * Fix: Batch cancellation ownership bug * Fix stream_chunk_builder to preserve images from streaming chunks (BerriAI#19654) Fixes BerriAI#19478 The stream_chunk_builder function was not handling image chunks from models like gemini-2.5-flash-image. When streaming responses were reconstructed (e.g., for caching), images in delta.images were lost. This adds handling for image_chunks similar to how audio, annotations, and other delta fields are handled. * fix(docker): add libsndfile to main Dockerfile for ARM64 audio processing (BerriAI#19776) Fixes BerriAI#16920 for users of the stable release images. The previous fix (PR BerriAI#18092) added libsndfile to docker/Dockerfile.alpine, but stable releases are built from the main Dockerfile (Wolfi-based), not the Alpine variant. * Fix File access permissions for .retreive and .delete * Fix Only allowed to call routes: ['llm_api_routes']. Tried to call route: /batches/bGl0ZWxsbV9wcm/cancel * fix(proxy): add datadog_llm_observability to /health/services allowed list (BerriAI#19952) The /health/services endpoint rejected datadog_llm_observability as an unknown service, even though it was registered in the core callback registry and __init__.py. Added it to both the Literal type hint and the hardcoded validation list in the health endpoint. * fix(proxy): prevent provider-prefixed model leaks (BerriAI#19943) * fix(proxy): prevent provider-prefixed model leaks Proxy clients should not see LiteLLM internal provider prefixes (e.g. hosted_vllm/...) in the OpenAI-compatible response model field. This patch sanitizes the client-facing model name for both: - Non-streaming responses returned from base_process_llm_request - Streaming SSE chunks emitted by async_data_generator Adds regression tests covering vLLM-style hosted_vllm routing for both streaming and non-streaming paths. * chore(lint): suppress PLR0915 in proxy handler Ruff started flagging ProxyBaseLLMRequestProcessing.base_process_llm_request() for too many statements after the hotpatch changes. Add an explicit '# noqa: PLR0915' on the function definition to avoid a large refactor in a hotpatch. * refactor(proxy): make model restamp explicit Replace silent try/except/pass and type ignores with explicit model restamping. - Logs an error when the downstream response model differs from the client-requested model - Overwrites the OpenAI `model` field to the client-requested value to avoid leaking internal provider-prefixed identifiers - Applies the same behavior to streaming chunks, logging the mismatch only once per stream * chore(lint): drop PLR0915 suppression The model restamping bugfix made `base_process_llm_request()` slightly exceed Ruff's PLR0915 (too-many-statements) threshold, requiring a `# noqa` suppression. Collapse consecutive `hidden_params` extractions into tuple unpacking so the function falls back under the lint limit and remove the suppression. No functional change intended; this keeps the proxy model-field bugfix intact while aligning with project linting rules. * chore(proxy): log model mismatches as warnings These model-restamping logs are intentionally verbose: a mismatch is a useful signal that an internal provider/deployment identifier may be leaking into the public OpenAI response `model` field. - Downgrade model mismatch logs from error -> warning - Keep error logs only for cases where the proxy cannot read/override the model * fix(proxy): preserve client model for streaming aliasing Pre-call processing can rewrite request_data['model'] via model alias maps.\n\nOur streaming SSE generator was using the rewritten value when restamping chunk.model, which caused the public 'model' field to differ between streaming and non-streaming responses for alias-based requests.\n\nStash the original client model in request_data as _litellm_client_requested_model after the model has been routed, and prefer it when overriding the outgoing chunk model. Add a regression test for the alias-mapping case. * chore(lint): satisfy PLR0915 in streaming generator Ruff started flagging async_data_generator() for too many statements after adding model restamping logic.\n\nExtract the client-model selection + chunk restamping into small helpers to keep behavior unchanged while meeting the project's PLR0915 threshold. * fix(hosted_vllm): route through base_llm_http_handler to support ssl_verify (BerriAI#19893) * fix(hosted_vllm): route through base_llm_http_handler to support ssl_verify The hosted_vllm provider was falling through to the OpenAI catch-all path which doesn't pass ssl_verify to the HTTP client. This adds an explicit elif branch that routes hosted_vllm through base_llm_http_handler.completion() which properly passes ssl_verify to the httpx client. - Add explicit hosted_vllm branch in main.py completion() - Add ssl_verify tests for sync and async completion - Update existing audio_url test to mock httpx instead of OpenAI client * feat(hosted_vllm): add embedding support with ssl_verify - Add HostedVLLMEmbeddingConfig for embedding transformations - Register hosted_vllm embedding config in utils.py - Add lazy import for embedding transformation module - Add unit test for ssl_verify parameter handling * Add OpenRouter Kimi K2.5 (BerriAI#19872) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * Fix: Encoding cancel batch response * Add tests for user level permissions on file and batch access * Fix: mypy errors * Fix lint issues * Add litellm metadata correctly for file create * Add cost tacking and usage info in call_type=aretrieve_batch * Fix max_input_tokens for gpt-5.2-codex * fix(gemini): support file retrieval in GoogleAIStudioFilesHandler * Allow config embedding models * adding tests * Model Usage per key * adding tests * fix(ResponseAPILoggingUtils): extract input tokens details as dict * Add routing of xai chat completions to responses when web search options is present * Add web search tests * Add disable flahg for anthropic gemini cache translation * fix aspectRatio mapping * feat: add /delete endpoint support for gemini * Fix: vllm embedding format * Fix: remove unsupported prompt-caching-scope-2026-01-05 header for vertex ai * Add mock client factory pattern and mock support for PostHog, Helicone, and Braintrust integrations (BerriAI#19707) * Add LangSmith mock client support - Create langsmith_mock_client.py following GCS and Langfuse patterns - Add mock mode detection via LANGSMITH_MOCK environment variable - Intercept LangSmith API calls via AsyncHTTPHandler.post patching - Add verbose logging throughout mock implementation - Update LangsmithLogger to initialize mock client when mock mode enabled - Supports configurable mock latency via LANGSMITH_MOCK_LATENCY_MS * Add Datadog mock client support - Create datadog_mock_client.py following GCS, Langfuse, and LangSmith patterns - Add mock mode detection via DATADOG_MOCK environment variable - Intercept Datadog API calls via AsyncHTTPHandler.post and httpx.Client.post patching - Add verbose logging throughout mock implementation - Update DataDogLogger and DataDogLLMObsLogger to initialize mock client when mock mode enabled - Supports both async and sync logging paths - Supports configurable mock latency via DATADOG_MOCK_LATENCY_MS * refactor: consolidate mock client logic into factory pattern - Create mock_client_factory.py to centralize common mock HTTP client logic - Refactor GCS, Langfuse, LangSmith, and Datadog mock clients to use factory - Improve GET/DELETE mock accuracy for GCS (return valid StandardLoggingPayload) - Fix DELETE mock to return empty body (204 No Content) instead of JSON - Reduce code duplication across integration mock clients * feat: add PostHog mock client support - Create posthog_mock_client.py using factory pattern - Integrate mock client into PostHogLogger with mock mode detection - Add verbose logging for mock mode initialization and batch operations - Enable mock mode via POSTHOG_MOCK environment variable * Add Helicone mock client support - Created helicone_mock_client.py using factory pattern (similar to GCS) - Integrated mock mode detection and initialization in HeliconeLogger - Mock client patches HTTPHandler.post to intercept Helicone API calls - Uses factory pattern for should_use_mock and MockResponse utilities - Custom HTTPHandler.post patching required since HTTPHandler uses self.client.send() * Add mock support for Braintrust integration and extend mock client factory - Add braintrust_mock_client.py with mock HTTP client for Braintrust integration testing - Integrate mock client into BraintrustLogger with mock mode detection - Refactor Helicone mock client to fully utilize factory's HTTPHandler.post patching - Extend mock_client_factory to support patching HTTPHandler.post for sync calls - Enable endpoint-specific mock responses for Braintrust (/project vs /project_logs) - All mock clients now properly handle both async (AsyncHTTPHandler) and sync (HTTPHandler) calls * Fix linter errors: remove unused imports and suppress complexity warning - Remove unused imports from gcs_bucket_mock_client.py (httpx, json, timedelta, Dict, Optional) - Remove unused Callable import from mock_client_factory.py - Add noqa comment to suppress PLR0915 complexity warning for create_mock_client_factory function * Document mock environment variables for PostHog, Helicone, Braintrust, Datadog, and Langsmith integrations - Add POSTHOG_MOCK and POSTHOG_MOCK_LATENCY_MS documentation - Add HELICONE_MOCK and HELICONE_MOCK_LATENCY_MS documentation - Add BRAINTRUST_MOCK and BRAINTRUST_MOCK_LATENCY_MS documentation - Add DATADOG_MOCK and DATADOG_MOCK_LATENCY_MS documentation - Add LANGSMITH_MOCK and LANGSMITH_MOCK_LATENCY_MS documentation All mock env vars follow the same pattern: enable mock mode for integration testing by intercepting API calls and returning mock responses without making actual network calls. * Fix security issue * Realtime API benchmarks (BerriAI#20074) * Add /realtime API benchmarks to Benchmarks documentation - Added new section showing performance improvements for /realtime endpoint - Included before/after metrics showing 182× faster p99 latency - Added test setup specifications and key optimizations - Referenced from v1.80.5-stable release notes Co-authored-by: ishaan <ishaan@berri.ai> * Update /realtime benchmarks to show current performance only - Removed before/after comparison, showing only current metrics - Clarified that benchmarks are e2e latency against fake realtime endpoint - Simplified table format for better readability Co-authored-by: ishaan <ishaan@berri.ai> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: ishaan <ishaan@berri.ai> * fixes: ci pipeline router coverage failure (BerriAI#20065) * fix: working claude code with agent SDKs (BerriAI#20081) * [Feat] Add async_post_call_response_headers_hook to CustomLogger (BerriAI#20083) * Add async_post_call_response_headers_hook to CustomLogger (BerriAI#20070) Allow CustomLogger callbacks to inject custom HTTP response headers into streaming, non-streaming, and failure responses via a new async_post_call_response_headers_hook method. * async_post_call_response_headers_hook --------- Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> * Add WATSONX_ZENAPIKEY * fix(proxy): resolve high CPU when router_settings in DB by avoiding REGISTRY.collect() in PrometheusServicesLogger (BerriAI#20087) * v0 - looks decen view * refactored code * fix ui * fixes ui * complete v2 viewer * fix drawer * Revert logs view commits to recreate with clean history (BerriAI#20090) This reverts commits: - 437e9e2 fix drawer - 61bb51d complete v2 viewer - 2014bcf fixes ui - 5f07635 fix ui - f07ef8a refactored code - 8b7a925 v0 - looks decen view Will create a new clean PR with the original changes. * update image and bounded logo in navbar * refactoring user dropdown * new utils * address feedback * [Feat] v2 - Logs view with side panel and improved UX (BerriAI#20091) * init: azure_ai/azure-model-router * show additional_costs in CostBreakdown * UI show cost breakdown fields * feat: dedicated cost calc for azure ai * test_azure_ai_model_router * docs azure model router * test azure model router * fix transfrom * Add transform file * fix:feat: route to config * v0 - looks decen view * refactored code * fix ui * fixes ui * complete v2 viewer * address feedback * address feedback * Delete resource modal dark mode * [Feat] UI - New View to render "Tools" on Logs View (BerriAI#20093) * v1 - tool viewer in logs page * add preview for tool sections * ui fixes * new tool view * Refactor: Address code review feedback - use Antd components Changes: - Use Antd Space component instead of manual flex layouts - Use Antd Text.copyable prop instead of custom clipboard utilities - Extract helper functions to utils.ts for testability - Remove clipboardUtils.ts (replaced with Antd built-in) - Update DrawerHeader, LogDetailsDrawer, and constants Benefits: - Cleaner code using standard Antd patterns - Better testability with separated utils - Consistent UX with Antd's copy tooltips - Reduced custom code maintenance Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> * [Feat] UI - Add Pretty print view of request/response (BerriAI#20096) * v1 - tool viewer in logs page * add preview for tool sections * ui fixes * new tool view * v1 - new pretty view * clean ui * polish fixes * nice view input/output * working i/o cards * fixes for log view --------- Co-authored-by: Warp <agent@warp.dev> * remove md * fixed mcp tools instructions on ui to show comma seprated str instead of list * docs: cleanup docs * litellm_fix: add missing timezone import to proxy_server.py (BerriAI#20121) * fix(proxy): reduce PLR0915 complexity in base_process_llm_request (BerriAI#20127) * litellm_fix(ui): remove unused ToolOutlined import (BerriAI#20129) * litellm_fix(e2e): disable bedrock-converse-claude-sonnet-4.5 model in tests (BerriAI#20131) * litellm_fix(test): fix Azure AI cost calculator test - use Logging class (BerriAI#20134) * litellm_fix(test): fix Bedrock tool search header test regression (BerriAI#20135) * litellm_fix(test): allow comment field in schema and exclude robotics models from tpm check (BerriAI#20139) * litellm_docs: add missing environment variable documentation (BerriAI#20138) * litellm_fix(test): add acancel_batch to Azure SDK client initialization test (BerriAI#20143) * litellm_fix: handle unknown models in Azure AI cost calculator (BerriAI#20150) * litellm_fix(test): fix router silent experiment tests to properly mock async functions (BerriAI#20140) * chore: update Next.js build artifacts (2026-01-31 17:20 UTC, node v22.16.0) * fix(proxy): use get_async_httpx_client for logo download (BerriAI#20155) Replace direct AsyncHTTPHandler instantiation with get_async_httpx_client to avoid +500ms latency per request from creating new async clients. Added httpxSpecialProvider.UI for UI-related HTTP requests like logo downloads. * fix(datadog): check for agent mode before requiring DD_API_KEY/DD_SITE (BerriAI#20156) The DataDog LLM Obs logger was checking for DD_API_KEY and DD_SITE before checking if agent mode (LITELLM_DD_AGENT_HOST) was configured. In agent mode, the DataDog agent handles authentication, so these environment variables are not required. This fix moves the agent mode check first, and only validates DD_API_KEY and DD_SITE when using direct API mode. Fixes test_datadog_llm_obs_agent_configuration and test_datadog_llm_obs_agent_no_api_key_ok * litellm_fix: handle empty dict for web_search_options in Nova grounding (BerriAI#20159) The condition `value and isinstance(value, dict)` fails for empty dicts because `{}` is falsy in Python. Users commonly pass `web_search_options={}` to enable Nova grounding without specifying additional options. Changed the condition to `isinstance(value, dict)` which correctly handles both empty and non-empty dicts. Fixes failing tests: - test_bedrock_nova_grounding_async - test_bedrock_nova_grounding_request_transformation - test_bedrock_nova_grounding_web_search_options_non_streaming - test_bedrock_nova_grounding_with_function_tools * fix(mypy): fix type errors in files, opentelemetry, gemini transformation, and key management (BerriAI#20161) - files/main.py: rename uuid import to uuid_module to avoid conflict with router import - integrations/opentelemetry.py: add fallback for callback_name to ensure str type - llms/gemini/files/transformation.py: add type annotation for params dict - proxy/management_endpoints/key_management_endpoints.py: add null check for prisma_client * litellm_fix(test): update Prometheus metric test assertions with new labels (BerriAI#20162) This fixes the failing litellm_mapped_enterprise_tests (metrics/logging) job. Recent commits added new labels to several Prometheus metrics (model_id, client_ip, user_agent) but the test assertions weren't fully updated to expect these new labels. Tests fixed: - test_async_post_call_failure_hook - test_async_log_failure_event - test_increment_token_metrics - test_log_failure_fallback_event - test_set_latency_metrics - test_set_llm_deployment_success_metrics Labels added to test assertions: - model_id for token metrics (litellm_tokens_metric, litellm_input_tokens_metric, litellm_output_tokens_metric) - model_id for latency metrics (litellm_llm_api_latency_metric) - model_id for remaining requests/tokens metrics - model_id for fallback metrics - model_id for overhead latency metric - client_ip and user_agent for deployment failure/total/success responses - client_ip and user_agent for proxy failed/total requests metrics * test: remove hosted_vllm from OpenAI client tests (BerriAI#20163) hosted_vllm no longer uses the OpenAI client, so these tests that mock the OpenAI client are not applicable to hosted_vllm. Removes hosted_vllm from: - test_openai_compatible_custom_api_base - test_openai_compatible_custom_api_video * litellm_fix: bump litellm-proxy-extras version to 0.4.28 (BerriAI#20166) Changes were made to litellm_proxy_extras (schema.prisma, utils.py, migrations) but version was not bumped, causing CI publish job to fail. This commit bumps the version from 0.4.27 to 0.4.28 in all required files: - litellm-proxy-extras/pyproject.toml - requirements.txt - pyproject.toml Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> * litellm_fix(mypy): fix remaining type errors (BerriAI#20164) - route_llm_request.py: add acancel_batch and afile_delete to route_type Literal - router.py: add SearchToolInfoTypedDict and search_tool_info to SearchToolTypedDict - gemini/files/transformation.py: fix validate_environment signature to match base class - responses transformation.py: fix Dict type annotations to use int instead of Optional[int] - vector_stores/endpoints.py: add team_id and user_id to LiteLLM_ManagedVectorStoresTable constructor Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> * litellm_fix(security): allowlist Next.js CVEs for 7 days (BerriAI#20169) Temporarily allowlist Next.js vulnerabilities in UI dashboard: - GHSA-h25m-26qc-wcjf (HIGH: DoS via request deserialization) - CVE-2025-59471 (MEDIUM: Image Optimizer DoS) Fix: Upgrade to Next.js 15.5.10+ or 16.1.5+ (7-day timeline) Changes: - Added .trivyignore with Next.js CVEs - Updated security_scans.sh to use --ignorefile flag * litellm_fix(router): use safe_deep_copy in _get_silent_experiment_kwargs (BerriAI#20170) **Regression introduced in:** PR BerriAI#19544 (feat: add feature to make silent calls) Fixes check_code_and_doc_quality CI failure. Line 1332 used copy.deepcopy(kwargs) which violates ban_copy_deepcopy_kwargs check. kwargs can contain non-serializable objects like OTEL spans. Changed to safe_deep_copy(kwargs) which handles these correctly. * docs(embeddings): add supported input formats section (BerriAI#20073) Document valid input formats for /v1/embeddings endpoint per OpenAI spec. Clarifies that array of string arrays is not a valid format. * fix proxy extras pip * fix gemini files * fix EventDrivenCacheCoordinator * test_increment_top_level_request_and_spend_metrics * fix typing * fix transform_retrieve_file_response * fix linting * fix mcp linting * _add_web_search_tool * test_bedrock_nova_grounding_web_search_options_non_streaming * add _is_bedrock_tool_block * fix MCP client * fix files * litellm_fix(lint): remove unused ToolNameValidationResult imports (BerriAI#20176) Fixes ruff F401 errors in check_code_and_doc_quality CI job. **Regression introduced in:** 41ec820 (fix files) - added files with unused imports ## Problem ToolNameValidationResult is imported but never used in: - litellm/proxy/_experimental/mcp_server/mcp_server_manager.py - litellm/proxy/management_endpoints/mcp_management_endpoints.py ## Fix ```diff - ToolNameValidationResult, ``` Removed from both import statements. ## Changes - mcp_server_manager.py: -1 line (removed unused import) - mcp_management_endpoints.py: -1 line (removed unused import) * litellm_fix(azure): Fix acancel_batch not using Azure SDK client initialization (BerriAI#20168) - Fixed model parameter being overwritten to None in acancel_batch function - Added dedicated acancel_batch/\_acancel_batch methods in Router - Properly extracts custom_llm_provider from deployment like acreate_batch This fixes test_ensure_initialize_azure_sdk_client_always_used[acancel_batch] which expected azure_batches_instance.initialize_azure_sdk_client to be called. Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> * fix tar security issue with TAR * fix model name during fallback * test_get_image_non_root_uses_var_lib_assets_dir * test_delete_vector_store_checks_access * test_get_session_iterator_thread_safety * Fix health endpoints * _prepare_vertex_auth_headers * test_budget_reset_and_expires_at_first_of_month * fix(test): add router.acancel_batch coverage (BerriAI#20183) - Add test_router_acancel_batch.py with mock test for router.acancel_batch() - Add _acancel_batch to ignored list (internal helper tested via public API) Fixes CI failure in check_code_and_doc_quality job * fix(mypy): fix validate_tool_name return type signatures (BerriAI#20184) Move ToolNameValidationResult class definition outside the fallback function and use consistent return type annotation to satisfy mypy. Files fixed: - proxy/_experimental/mcp_server/mcp_server_manager.py - proxy/management_endpoints/mcp_management_endpoints.py * fix(test): update test_chat_completion to handle metadata in body The proxy now adds metadata to the request body during processing. Updated test to compare fields individually and strip metadata from body comparison. Fixes litellm_proxy_unit_testing_part2 CI failure. * fix(proxy): resolve 'multiple values for keyword argument' in batch cancel and file retrieve - batch_endpoints.py: Pop batch_id from data before creating CancelBatchRequest to avoid duplicate batch_id when data already contains it from earlier cast - files_endpoints.py: Pop file_id from data before calling afile_retrieve to avoid duplicate file_id when data was initialized with {"file_id": file_id} - test_claude_agent_sdk.py: Disable bedrock-nova-premier test as it requires an inference profile for on-demand throughput (AWS limitation) Fixes: e2e_openai_endpoints tests (test_batches_operations, test_file_operations) Fixes: proxy_e2e_anthropic_messages_tests (nova-premier model skip) * ci(security): allowlist GHSA-34x7-hfp2-rc4v (node-tar hardlink) Not applicable - tar CLI not exposed in application code * fix(mypy): add type: ignore for conditional function variants in MCP modules The mypy error 'All conditional function variants have identical signatures' occurs when defining fallback functions in try/except ImportError blocks. Adding '# type: ignore[misc]' suppresses this false positive. Fixes: - mcp_server_manager.py:80 - validate_tool_name fallback - mcp_management_endpoints.py:72 - validate_tool_name fallback * fix: make cache updates synchronous for budget enforcement The budget enforcement was failing in tests because cache updates were fire-and-forget (asyncio.create_task), causing race conditions where subsequent requests would read stale spend data. Changes: 1. proxy_track_cost_callback.py: await update_cache() instead of create_task 2. proxy_server.py: await async_set_cache_pipeline() instead of create_task 3. auth_checks.py: prefer valid_token.team_member_spend (from fresh cache) over team_membership.spend (which may be stale) This ensures budget checks see the most recent spend values and properly enforce budget limits when requests come in quick succession. Fixes: test_users_in_team_budget, test_chat_completion_low_budget * fix(test): accept both AuthenticationError and InternalServerError in batch_completion test (BerriAI#20186) The test uses an invalid API key to verify that batch_completion returns exceptions rather than raising them. However, depending on network conditions, the error may be: - AuthenticationError: API properly rejected the invalid key - InternalServerError: Connection error occurred before API could respond Both are valid outcomes for this test case. Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> * test_embedding fix * fix bedrock-nova-premier * Revert "fix: make cache updates synchronous for budget enforcement" This reverts commit d038341. * fix(test): correct prompt_tokens in test_string_cost_values (BerriAI#20185) The test had prompt_tokens=1000 but the sum of token details was 1150 (text=700 + audio=100 + cached=200 + cache_creation=150). This triggered the double-counting detection logic which recalculated text_tokens to 550, causing the assertion to fail. Fixed by setting prompt_tokens=1150 to match the sum of details. * fix: bedrock-converse-claude-sonnet-4.5 * fix: stabilize CI tests - routes and bedrock config - Add /v1/vector_store/list route for OpenAI API compatibility (fixes test_routes_on_litellm_proxy) - Fix Bedrock Converse API model format (bedrock_converse/ → bedrock/converse/) - Fix Nova Premier inference profile prefix (amazon. → us.amazon.) - Add STABILIZATION_TODO.md to .gitignore Tested locally - all affected tests now pass Co-authored-by: Cursor <cursoragent@cursor.com> * sync: generator client * add LiteLLM_ManagedVectorStoresTable_user_id_idx * docs/blog index page (BerriAI#20188) * docs: add card-based blog index page for mobile navigation Fixes BerriAI#20100 - the blog landing page showed post content directly instead of an index, with no way to navigate between posts on mobile. - Swizzle BlogListPage with card-based grid layout - Featured latest post spans full width with badge - Responsive 2-column grid with orphan handling - Pagination, SEO metadata, accessibility (aria-label, dateTime, heading hierarchy) - Add description frontmatter to existing blog posts * docs: add deterministic fallback colors for unknown blog tags * docs: rename blog heading to The LiteLLM Blog * UI spend logs setting docs * bump extras * fix fake-openai-endpoint * doc fix * fix team budget checks * bump: version 1.81.5 → 1.81.6 * litellm_fix_mapped_tests_core: clear client cache and fix isinstance checks (BerriAI#20196) ## Problem Tests using mocked HTTP clients were hitting real APIs because: 1. HTTP client cache was returning previously cached real clients 2. isinstance checks failed due to module identity issues from sys.path ### Tests affected: - test_send_email_missing_api_key - test_send_email_multiple_recipients (resend & sendgrid) - test_search_uses_registry_credentials - test_vector_store_create_with_simple_provider_name - test_vector_store_create_with_provider_api_type - test_vector_store_create_with_ragflow_provider - test_image_edit_merges_headers_and_extra_headers - test_retrieve_container_basic (container API tests) ## Solution 1. Add clear_client_cache fixture (autouse=True) to clear litellm.in_memory_llm_clients_cache before each test 2. Fix isinstance checks to use type name comparison (avoids module identity issues from sys.path.insert) ## Why not disable_aiohttp_transport The default transport is aiohttp, so tests should work with it. Clearing the cache ensures mocks are used instead of cached real clients. ## Regression PR BerriAI#19829 (commit f95572e) added @respx.mock but cached clients from earlier tests were being reused, bypassing the mocks. Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> * test_chat_completion_low_budget * fix: delete_file * fixes * fix: update test_prometheus to expect masked user_id in metrics The user_id field 'default_user_id' is being masked to '*******_user_id' in prometheus metrics for privacy. Updated test expectations to match the actual behavior. Co-authored-by: Cursor <cursoragent@cursor.com> * docs fix * feat(bedrock): add base cache costs for sonnet v1 (BerriAI#20214) * docs: fix dead links in v1.81.6 release notes (BerriAI#20218) - Fix /docs/search/index -> /docs/search (404 error) - Fix /cookbook/ -> GitHub cookbook URL (404 error) Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> * fix(test): update test_prometheus with masked user_id and missing labels - Update expected user_id from 'default_user_id' to '*******_user_id' (PII masking) - Add missing client_ip, user_agent, model_id labels (from PRs BerriAI#19717, BerriAI#19678) - Update label order to match Prometheus alphabetical sorting Co-authored-by: Cursor <cursoragent@cursor.com> * litellm_fix_mapped_tests_core: fix test isolation and mock injection issues (BerriAI#20209) * litellm_fix_mapped_tests_core: fix test isolation and mock injection issues ## Problem Four tests in litellm_mapped_tests_core were failing: 1. test_register_model_with_scientific_notation - KeyError due to test isolation issues 2. test_search_uses_registry_credentials - Mock not being called due to incorrect patch path 3. test_send_email_missing_api_key - Real API calls despite mocking 4. test_stream_transformation_error_sync - Mock not effective, real API called ## Solution ### test_register_model_with_scientific_notation - Use unique model name to avoid conflicts with other tests - Clear LRU caches before test to prevent stale data - Clean up model_cost entry after test ### test_search_uses_registry_credentials - Use patch.object() on the actual base_llm_http_handler instance - String-based patching for instance methods can fail; direct object patching is more reliable ### test_send_email_missing_api_key - Directly inject mock HTTP client into logger instance - This bypasses any caching issues that could cause the fixture mock to be ineffective ### test_stream_transformation_error_sync - Patch litellm.completion directly instead of the handler module's litellm reference - This ensures the mock is effective regardless of import order ## Regression These tests were affected by LRU caching added in BerriAI#19606 and HTTP client caching. * fix(test): use patch.object for container API tests to fix mock injection ## Problem test_retrieve_container_basic tests were failing because mocks weren't being applied correctly. The tests used string-based patching: patch('litellm.containers.main.base_llm_http_handler') But base_llm_http_handler is imported at module level, so the mock wasn't intercepting the actual handler calls, resulting in real HTTP requests to OpenAI API. ## Solution Use patch.object() to directly mock methods on the imported handler instance. Import base_llm_http_handler in the test file and patch like: patch.object(base_llm_http_handler, 'container_retrieve_handler', ...) This ensures the mock is applied to the actual object being used, regardless of import order or caching. * fix(test): add missing Prometheus metric labels to test_proxy_failure_metrics Add client_ip, user_agent, model_id labels to expected metric patterns. These labels were added in PRs BerriAI#19717 and BerriAI#19678 but test wasn't updated. * fix(test_resend_email): use direct mock injection for all email tests Extend the mock injection pattern used in test_send_email_missing_api_key to all other tests in the file: - test_send_email_success - test_send_email_multiple_recipients Instead of relying on fixture-based patching and respx mocks which can fail due to import order and caching issues, directly inject the mock HTTP client into the logger instance. This ensures mocks are always used regardless of test execution order. * fix(test): use patch.object for image_edit and vector_store tests - test_image_edit_merges_headers_and_extra_headers: import base_llm_http_handler and use patch.object instead of string path patching - test_search_uses_registry_credentials: import module and patch via module.base_llm_http_handler to ensure we patch the right instance --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> * Revert "Merge pull request BerriAI#18790 from BerriAI/litellm_key_team_routing_3" This reverts commit ae26d8e, reversing changes made to 864e8c6. * test_proxy_failure_metrics * test_proxy_success_metrics * fix(test): make test_proxy_failure_metrics resilient to missing proxy-level metrics - Check for both litellm_proxy_failed_requests_metric_total and the deprecated litellm_llm_api_failed_requests_metric_total - The proxy-level failure hook may not always be called depending on where the exception occurs - Simplify total_requests check to only verify key fields Co-authored-by: Cursor <cursoragent@cursor.com> * test fix * docs: Update v1.81.6 release notes - focus on Logs v2 with Tool Call Tracing (BerriAI#20225) - Updated title to highlight Logs v2 feature - Simplified Key Highlights to focus on Logs v2 / tool call tracing - Rewrote Logs v2 description with improved language style - Removed Claude Agents SDK and RAG API from key highlights section - TODO: Add image (logs_v2_tool_tracing.png) Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> * feat: enhance Cohere embedding support with additional parameters and model version * Update Vertex AI Text to Speech doc to show use of audio --------- Co-authored-by: jayy-77 <1427jay@gmail.com> Co-authored-by: Neha Prasad <neh6a683@gmail.com> Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com> Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> Co-authored-by: Bernardo Donadio <bcdonadio@bcdonadio.com> Co-authored-by: Christopher Chase <cchase@redhat.com> Co-authored-by: Aaron Yim <aaronchyim@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Takumi Matsuzawa <152503584+genga6@users.noreply.github.com> Co-authored-by: Varun Sripad <varunsripad@Varuns-MacBook-Air.local> Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: Rhys <nghuutho74@gmail.com> Co-authored-by: Alexsander Hamir <alexsanderhamirgomesbaptista@gmail.com> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: ishaan <ishaan@berri.ai> Co-authored-by: Warp <agent@warp.dev> Co-authored-by: shivam <shivam@uni.minerva.edu> Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com> Co-authored-by: Shivam Rawat <161387515+shivamrawat1@users.noreply.github.com> Co-authored-by: shin-bot-litellm <shin-bot-litellm@berri.ai> Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com> Co-authored-by: cscguochang-agent <cscguochang@gmail.com> Co-authored-by: amirzaushnizer <amir.z@qodo.ai>

@patch

* UI: new build * redirect to login on expired jwt * [Feat] UI + Backend - Allow adding policies on Keys/Teams + Viewing on Info panels (#19688) * ui for policy mgmt * test_add_guardrails_from_policy_engine_accepts_dynamic_policies_and_pops_from_data * docs: add litellm-enterprise requirement for managed files (#19689) * Update Gemini 2.0 Flash deprecation dates to March 31, 2026 (#19592) Google announced that Gemini 2.0 Flash and Flash Lite models will be discontinued on March 31, 2026. Updated deprecation_date field for all affected model variants across different providers (vertex_ai, gemini, deepinfra, openrouter, vercel_ai_gateway). Models updated: - gemini-2.0-flash (added deprecation date) - gemini-2.0-flash-001 (updated from 2026-02-05) - gemini-2.0-flash-lite (added deprecation date) - gemini-2.0-flash-lite-001 (updated from 2026-02-25) All variants now correctly reflect the March 31, 2026 shutdown date. * fixing build * Fixing failing tests * deactivating non root tests * fixing arize tests * cache tests serial * fixing circleci config * fixing circleci config * Update OSS Adopters section with new table format * Fixing ruff check * bump: version 1.81.2 → 1.81.3 * chore: update Next.js build artifacts (2026-01-24 17:18 UTC, node v22.16.0) * CI/CD fixes - split local testing * fix: _apply_search_filter_to_models mypy linting * test_partner_models_httpx_streaming * test_web_search * Fix: log duplication when json_logs is enabled (#19705) * fix: FLAKY tests * fix unstable tests * docs fix * docs fix * docs fix * docs fix * docs fix * test_get_default_unvicorn_init_args * fix flaky tests * test_hanging_request_azure * test_team_update_sc_2 * BUMP extras * test fixes * test fixes * test_retrieve_container_basic * Model and Team filtering * TestBedrockInvokeToolSearch * fix(presidio): resolve runtime error by handling asyncio loops in bac… (#19714) * fix(presidio): resolve runtime error by handling asyncio loops in background threads * add test case for thread safety * UI Keys Teams Router Settings docs * chore: update Next.js build artifacts (2026-01-25 00:27 UTC, node v22.16.0) * test_stream_transformation_error_sync * fix patch reliability mock tests * fix MCP tests * auto truncation of virtual keys table values * fix: args issue & refactor into helper function to reduce bloat for both(#19441) * Fix bulk user add * fix(proxy): support slashes in google generateContent model names (#19737) * fix(proxy): support slashes in google route params * fix(proxy): extract google model ids with slashes * test(proxy): cover google model ids with slashes * Fix/non standard mcp url pattern (#19738) * fix(mcp): Add standard MCP URL pattern support for OAuth discovery (#17272) OAuth discovery endpoints now support both URL patterns: - Standard MCP pattern: /mcp/{server_name} (new) - Legacy LiteLLM pattern: /{server_name}/mcp (backward compatible) The standard pattern is required by MCP-compliant clients like mcp-inspector and VSCode Copilot, which expect resource URLs following the /mcp/{server_name} convention per RFC 9728. Changes: - Add _build_oauth_protected_resource_response() helper - Add oauth_protected_resource_mcp_standard() endpoint - Add oauth_authorization_server_mcp_standard() endpoint - Keep legacy endpoints for backward compatibility - Add tests for both URL patterns Fixes #17272 * fix(mcp): Add standard MCP URL pattern support for OAuth discovery (#17272) OAuth discovery endpoints now support both URL patterns: - Standard MCP pattern: /mcp/{server_name} (new) - Legacy LiteLLM pattern: /{server_name}/mcp (backward compatible) The standard pattern is required by MCP-compliant clients like mcp-inspector and VSCode Copilot, which expect resource URLs following the /mcp/{server_name} convention per RFC 9728. Changes: - Add _build_oauth_protected_resource_response() helper - Add oauth_protected_resource_mcp_standard() endpoint - Add oauth_authorization_server_mcp_standard() endpoint - Keep legacy endpoints for backward compatibility - Add tests for both URL patterns Fixes #17272 * Test was relocated * refactor(mcp): Extract helper methods from run_with_session to fix PLR0915 Split the large run_with_session method (55 statements) into smaller helper methods to satisfy ruff's PLR0915 rule (max 50 statements): - _create_transport_context(): Creates transport based on type - _execute_session_operation(): Handles session lifecycle Also changed cleanup exception handling from Exception to BaseException to properly catch asyncio.CancelledError (which is a BaseException subclass in Python 3.8+). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * test(mcp): Fix flaky test by mocking health_check_server The test_mcp_server_manager_config_integration_with_database test was making real network calls to fake URLs which caused timeouts and CancelledError exceptions. Fixed by mocking health_check_server to return a proper LiteLLM_MCPServerTable object instead of making network calls. * test(mcp): Fix skip condition to properly detect claude model names The skip condition for missing API keys was checking for "anthropic" in the model name, but the test uses "claude-haiku-4-5" which doesn't match. Updated to check for both "anthropic" and "claude" model patterns. Also added skip condition for OpenAI models when OPENAI_API_KEY is not set. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * test(mcp): Fix skip condition to properly detect claude model names The skip condition for missing API keys was checking for "anthropic" in the model name, but the test uses "claude-haiku-4-5" which doesn't match. Updated to check for both "anthropic" and "claude" model patterns. Also added skip condition for OpenAI models when OPENAI_API_KEY is not set. --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * add callbacks and labels to prometheus (#19708) * feat: add clientip and user agent in metrics (#19717) * feat: add clientip and user agent in metrics * fix: lint errors * Add model id and other req labels --------- Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * fix: optimize logo fetching and resolve mcp import blockers (#19719) * feat: tpm-rpm limit in prometheus metrics (#19725) Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * add timeout to onyx guardrail (#19731) * add timeout to onyx guardrail * add tests * Fix /batches to return encoded ids (from managed objects table) * fix(proxy): use return value from CustomLogger.async_post_call_success_hook (#19670) * fix(proxy): use return value from CustomLogger.async_post_call_success_hook Previously the return value was ignored for CustomLogger callbacks, preventing users from modifying responses. Now the return value is captured and used to replace the response (if not None), consistent with CustomGuardrail and streaming iterator hook behavior. Fixes issue with custom_callbacks not being able to inject data into LLM responses. * fix(proxy): also fix async_post_call_streaming_hook to use return value Previously the streaming hook only used return values that started with "data: " (SSE format). Now any non-None return value is used, consistent with async_post_call_success_hook and streaming iterator hook behavior. Added tests for streaming hook transformation. --------- Co-authored-by: Gabriele Michelli <michelligabriele0@gmail.com> * feat(hosted_vllm): support thinking parameter for /v1/messages endpoint Adds support for Anthropic-style 'thinking' parameter in hosted_vllm, converting it to OpenAI-style 'reasoning_effort' since vLLM is OpenAI-compatible. This enables users to use Claude Code CLI with hosted vLLM models like GLM-4.6/4.7 through the /v1/messages endpoint. Mapping (same as Anthropic adapter): - budget_tokens >= 10000 -> "high" - budget_tokens >= 5000 -> "medium" - budget_tokens >= 2000 -> "low" - budget_tokens < 2000 -> "minimal" Fixes #19761 * Fix batch creation to return the input file's expires_at attribute * bump: version 1.81.3 → 1.81.4 (#19793) * fix: server rooth path (#19790) * refactor: extract transport context creation into separate method (#19794) * Fix user max budget reset to unlimited - Added a Pydantic validator to convert empty string inputs for max_budget to None, preventing float parsing errors from the frontend. - Modified the internal user update logic to explicitly allow max_budget to be None, ensuring the value isn't filtered out and can be reset to unlimited in the database. - Added unit tests for validation and logic. Closes #19781 * Make test_get_users_key_count deterministic by creating dedicated test user (#19795) - Create a test user with auto_create_key=False to ensure known starting state - Filter get_users by user_ids to target only the test user - Verify initial key count is 0 before creating a key - Clean up test user after test completes - This ensures consistent behavior across CI and local environments * Add test for Router.get_valid_args, fix router code coverage encoding (#19797) - Add test_get_valid_args in test_router_helper_utils.py to cover get_valid_args - Use encoding='utf-8' in router_code_coverage.py for cross-platform file reads * fix sso email case sensitivity * Fix test_mcp_server_manager_config_integration_with_database cancellation error (#19801) Mock _create_mcp_client to avoid network calls in health checks. This prevents asyncio.CancelledError when the test teardown closes the event loop while health checks are still pending. The test focuses on conversion logic (access_groups, description) not health check functionality, so mocking the network call is appropriate. * fix: make HTTPHandler mockable in OIDC secret manager tests (#19803) * fix: make HTTPHandler mockable in OIDC secret manager tests - Add _get_oidc_http_handler() factory function to make HTTPHandler easily mockable in tests - Update test_oidc_github_success to patch factory function instead of HTTPHandler directly - Update Google OIDC tests for consistency - Fixes test_oidc_github_success failure where mock was bypassed This change allows tests to properly mock HTTPHandler instances used for OIDC token requests, fixing the test failure where the mock was not being used. * fix: patch base_llm_http_handler method directly in container tests - Use patch.object to patch container_create_handler method directly on the base_llm_http_handler instance instead of patching the module - Fixes test_provider_support[openai] failure where mock wasn't applied - Also fixes test_error_handling_integration with same approach The issue was that patching 'litellm.containers.main.base_llm_http_handler' didn't work because the module imports it with 'from litellm.main import', creating a local reference. Using patch.object patches the method on the actual object instance, which works regardless of import style. * fix: resolve flaky test_openai_env_base by clearing cache - Add cache clearing at start of test_openai_env_base to prevent cache pollution - Ensures no cached clients from previous tests interfere with respx mocks - Fixes intermittent failures where aiohttp transport was used instead of httpx - Test-only change with low risk, no production code modifications Resolves flaky test marked with @pytest.mark.flaky(retries=3, delay=1) Both parametrized versions (OPENAI_API_BASE and OPENAI_BASE_URL) now pass consistently * test: add explicit mock verification in test_provider_support - Capture mock handler with 'as mock_handler' for explicit validation - Add assert_called_once() to verify mock was actually used - Ensures test verifies no real API calls are made - Follows same pattern as test_openai_env_base validation * Add light/dark mode slider for dev * fix key duration input * Messages api bedrock converse caching and pdf support (#19785) * cache control for user messages and system messages * add cache createion tokens in reponse * cache controls in tool calls and assistant turns * refactor with _should_preserve_cache_control * add cache control unit tests * use simpler cache creation token count logic * use helper function * remove unused function * fix unit tests * fixing team member add * [Feat] enable progress notifications for MCP tool calls (#19809) * enable progress notifications for MCP tool calls * adjust mcp test * [Feat] CLI Auth - Add configurable CLI JWT expiration via environment variable (#19780) * fix: add CLI_JWT_EXPIRATION_HOURS * docs: CLI_JWT_EXPIRATION_HOURS * fix: get_cli_jwt_auth_token * test_get_cli_jwt_auth_token_custom_expiration * fixing flaky tests around oidc and email * Add dont ask me again option in nudges * CI/CD: Increase retries and stabilize litellm_mapped_tests_core (#19826) * Fix PLR0915: Extract system message handling to reduce statement count * fix mypy * fix: add host_progress_callback parameter to mock_call_tool in test The test_call_tool_without_broken_pipe_error was failing because the mock function did not accept the host_progress_callback keyword argument that the actual implementation passes to client.call_tool(). Updated the mock to accept this parameter to match the real implementation signature. * fixing flaky tests around oidc and email * Add documentation comment to test file * add retry * add dependency * increase retry --------- Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> * Fix broken mocks in 6 flaky tests to prevent real API calls (#19829) * Fix broken mocks in 6 flaky tests to prevent real API calls Added network-level HTTP blocking using respx to prevent tests from making real API calls when Python-level mocks fail. This makes tests more reliable and retryable in CI. Changes: - Azure OIDC test: Added Azure Identity SDK mock to prevent real Azure calls - Vector store test: Added @respx.mock decorator to block HTTP requests - Resend email tests (3): Added @respx.mock decorator for all 3 test functions - SendGrid email test: Added @respx.mock decorator All test assertions and verification logic remain unchanged - only added safety nets to catch leaked API calls. * Fix failing OIDC secret manager tests Fixed two test failures in test_secret_managers_main.py: 1. test_oidc_azure_ad_token_success: Corrected the patch path for get_bearer_token_provider from 'litellm.secret_managers.get_azure_ad_token_provider.get_bearer_token_provider' to 'azure.identity.get_bearer_token_provider' since the function is imported from azure.identity. 2. test_oidc_google_success: Added @patch('httpx.Client') decorator to prevent any real HTTP connections during test execution, resolving httpx.ConnectError issues. Both tests now pass successfully. * Adding tests: * fixing breaking change: just user_id provided should upsert still * Fix: A2A Python SDK URL * [Feat] Add UI for /rag/ingest API - upload docs, pdfs etc to create vector stores (#19822) * feat: _save_vector_store_to_db_from_rag_ingest * UI features for RAG ingest * fix: Endpoints * ragIngestCall * _save_vector_store_to_db_from_rag_ingest * fix: rag_ingest Code QA CHECK * UI fixes unit tests * docs(readme): add OpenAI Agents SDK to OSS Adopters (#19820) * docs(readme): add OpenAI Agents SDK to OSS Adopters * docs(readme): add OpenAI Agents SDK logo * Fixing tests * Litellm release notes 01 26 2026 (#19836) * docs: document new models/endpoints * docs: cleanup * feat: update model table * fixing tests * Litellm release notes 01 26 2026 (#19838) * docs: document new models/endpoints * docs: cleanup * feat: update model table * fix: cleanup * feat: Add model_id label to Prometheus metrics (#18048) (#19678) Co-authored-by: Cursor Agent <cursoragent@cursor.com> * fix(models): set gpt-5.2-codex mode to responses for Azure and OpenRouter (#19770) Fixes #19754 The gpt-5.2-codex model only supports the responses API, not chat completions. Updated azure/gpt-5.2-codex and openrouter/openai/gpt-5.2-codex entries to use mode: "responses" and supported_endpoints: ["/v1/responses"]. * fix(responses): update local_vars with detected provider (#19782) (#19798) When using the responses API with provider-specific params (aws_*, vertex_*) without explicitly passing custom_llm_provider, the code crashed with: AttributeError: 'NoneType' object has no attribute 'startswith' Root cause: local_vars was captured via locals() before get_llm_provider() detected the provider from the model string (e.g., "bedrock/..."), so custom_llm_provider remained None when processing provider-specific params. Fix: Update local_vars["custom_llm_provider"] after get_llm_provider() call so the detected provider is available for param processing. Affected provider-specific params: - aws_* (aws_region_name, aws_access_key_id, etc.) for Bedrock/SageMaker - vertex_* (vertex_project, vertex_location, etc.) for Vertex AI * fix(azure): use generic cost calculator for audio token pricing (#19771) Azure audio models were charging audio output tokens at the text token rate instead of the correct audio token rate. This resulted in costs being ~6.65x lower than expected. The fix replaces Azure's custom cost calculation logic with the generic cost calculator that properly handles text, audio, cached, reasoning, and image tokens. Fixes #19764 * fix(xai): correct cached token cost calculation for xAI models (#19772) * fix(azure): use generic cost calculator for audio token pricing Azure audio models were charging audio output tokens at the text token rate instead of the correct audio token rate. This resulted in costs being ~6.65x lower than expected. The fix replaces Azure's custom cost calculation logic with the generic cost calculator that properly handles text, audio, cached, reasoning, and image tokens. Fixes #19764 * fix(xai): correct cached token cost calculation for xAI models - Fix double-counting issue where xAI reports text_tokens = prompt_tokens (including cached), causing tokens to be charged twice - Add cache_read_input_token_cost to xAI grok-3 and grok-3-mini model variants - Detection: when text_tokens + cached_tokens > prompt_tokens, recalculate text_tokens = prompt_tokens - cached_tokens xAI pricing (25% of input for cached): - grok-3 variants: $0.75/M cached (input $3/M) - grok-3-mini variants: $0.075/M cached (input $0.30/M) * Fix:Support both JSON array format and comma-separated values from user headers * Translate advanced-tool-use to Bedrock-specific headers for Claude Opus 4.5 * fix: token calculations and refactor (#19696) * fix(prometheus): safely handle None metadata in logging to prevent At… (#19691) * fix(prometheus): safely handle None metadata in logging to prevent AttributeError * fix: lint issues * fix: resolve 'does not exist' migration errors as applied in setup_database (#19281) * Fix: timeout exception raised eror * Add sarvam doc * Add gemini-robotics-er-1.5-preview model in model map * Add gemini-robotics-er-1.5-preview model documentation * Fix: Stream the download in chunks * Add grok reasoning content * Revert poetry lock * Fix mypy and code quality issues * feat: add feature to make silent calls (#19544) * feat: add feature to make silent calls * add test or silent feat * add docs for silent feat * fix lint issues and UI logs * add docs of ab testing and deep copy * fix(enterprise): correct error message for DISABLE_ADMIN_ENDPOINTS (#19861) The error message for DISABLE_ADMIN_ENDPOINTS incorrectly said "DISABLING LLM API ENDPOINTS is an Enterprise feature" instead of "DISABLING ADMIN ENDPOINTS is an Enterprise feature". This was a copy-paste bug from the is_llm_api_route_disabled() function. Added regression tests to verify both error messages are correct. * fix(proxy): handle agent parameter in /interactions endpoint (#19866) * initialize tiktoken environment at import time to support offline usage * fix(bedrock): support tool search header translation for Sonnet 4.5 (#19871) Extend advanced-tool-use header translation to include Claude Sonnet 4.5 in addition to Opus 4.5 on Bedrock Invoke API. When Claude Code sends the advanced-tool-use-2025-11-20 header, it now gets correctly translated to Bedrock-specific headers for both: - Claude Opus 4.5 - Claude Sonnet 4.5 Headers translated: - tool-search-tool-2025-10-19 - tool-examples-2025-10-29 Fixes defer_loading validation error on Bedrock with Sonnet 4.5. Ref: https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool * bulk update keys endpoint * mypy linting * [Feat] RAG API - Add support for using s3 Vectors as Vector Store Provider for /rag/ingest (#19888) * init S3VectorsRAGIngestion as a supported ingestion provider for RAG API * test: TestRAGS3Vectors * init S3VectorsVectorStoreOptions * init s3 vectors * code clean up + QA * fix: get_credentials * S3VectorsRAGIngestion * TestRAGS3Vectors * docs: AWS S3 Vectors * add asyncio QA checks * fix: S3_VECTORS_DEFAULT_DIMENSION * Add native_background_mode to override polling_via_cache for specific models This follow-up to PR #16862 allows users to specify models that should use the native provider's background mode instead of polling via cache. Config example: litellm_settings: responses: background_mode: polling_via_cache: ["openai"] native_background_mode: ["o4-mini-deep-research"] ttl: 3600 When a model is in native_background_mode list, should_use_polling_for_request returns False, allowing the request to fall through to native provider handling. Committed-By-Agent: cursor * [Feat] RAG API - Add s3_vectors as provider on /vector_store/search API + UI for creating + PDF support for /rag/ingest (#19895) * init S3VectorsRAGIngestion as a supported ingestion provider for RAG API * test: TestRAGS3Vectors * init S3VectorsVectorStoreOptions * init s3 vectors * code clean up + QA * fix: get_credentials * S3VectorsRAGIngestion * TestRAGS3Vectors * docs: AWS S3 Vectors * add asyncio QA checks * fix: S3_VECTORS_DEFAULT_DIMENSION * init ui for bedrock s3 vectors * fix add /search support for s3_vectors * init atransform_search_vector_store_request * feat: S3VectorsVectorStoreConfig * TestS3VectorsVectorStoreConfig * atransform_search_vector_store_request * fix: S3VectorsVectorStoreConfig * add validation for bucket name etd * fix UI validation for s3 vector store * init extract_text_from_pdf * add pypdf * fix code QA checks * fix navbar * init s3_vector.png * fix QA code * Add tests for native_background_mode feature Added 8 new unit tests for the native_background_mode feature: - test_polling_disabled_when_model_in_native_background_mode - test_polling_disabled_for_native_background_mode_with_provider_list - test_polling_enabled_when_model_not_in_native_background_mode - test_polling_enabled_when_native_background_mode_is_none - test_polling_enabled_when_native_background_mode_is_empty_list - test_native_background_mode_exact_match_required - test_native_background_mode_with_provider_prefix_in_request - test_native_background_mode_with_router_lookup Committed-By-Agent: cursor * add sortBy and sortOrder params for /v2/model/info * ruff check * Fixing UI tests * test(proxy): add regression tests for vertex passthrough model names with slashes (#19855) Added test cases for custom model names containing slashes in Vertex AI passthrough URLs (e.g., gcp/google/gemini-2.5-flash). Test cases: - gcp/google/gemini-2.5-flash - gcp/google/gemini-3-flash-preview - custom/model * fix: guardrails issues streaming-response regex (#19901) * fix: add fix for migration issue and and stable linux debain (#19843) * fix: filter unsupported beta headers for Bedrock Invoke API (#19877) - Add whitelist-based filtering for anthropic_beta headers - Only allow Bedrock-supported beta flags (computer-use, tool-search, etc.) - Filter out unsupported flags like mcp-servers, structured-outputs - Remove output_format parameter from Bedrock Invoke requests - Force tool-based structured outputs when response_format is used Fixes #16726 * fix: allow tool_choice for Azure GPT-5 chat models (#19813) * fix: don't treat gpt-5-chat as GPT-5 reasoning * fix: mark azure gpt-5-chat as supporting tool_choice * test: cover gpt-5-chat params on azure/openai * fix: tool with antropic #19800 (#19805) * All Models Page server side sorting * Add Init Containers in the community helm chart (#19816) * docs: fix guardrail logging docs (#19833) * Fixing build and tests * inspect BadRequestError after all other policy types (#19878) As indicated by https://docs.litellm.ai/docs/exception_mapping, BadRequestError is used as the base type for multiple exceptions. As such, it should be tested last in handling retry policies. This updates the integration test that validates retry policies work as expected. Fixes #19876 * fix(main): use local tiktoken cache in lazy loading (#19774) The lazy loading implementation for encoding in __getattr__ was calling tiktoken.get_encoding() directly without first setting TIKTOKEN_CACHE_DIR. This caused tiktoken to attempt downloading the encoding file from the internet instead of using the local copy bundled with litellm. This fix uses _get_default_encoding() from _lazy_imports which properly sets TIKTOKEN_CACHE_DIR before loading tiktoken, ensuring the local cache is used. * fix(gemini): subtract implicit cached tokens from text_tokens for correct cost calculation (#19775) When Gemini uses implicit caching, it returns cachedContentTokenCount but NOT cacheTokensDetails. Previously, text_tokens was not adjusted in this case, causing costs to be calculated as if all tokens were non-cached. This fix subtracts cachedContentTokenCount from text_tokens when no cacheTokensDetails is present (implicit caching), ensuring correct cost calculation with the reduced cache_read pricing. * [Feat] UI: Allow Admins to control what pages are visible on LeftNav (#19907) * feat: enabled_ui_pages_internal_users * init ui for internal user controsl * fix ui settings * fix build * fix leftnav * fix leftnav * test fixes * fix leftnav * isPageAccessibleToInternalUsers * docs fix * docs ui viz * Add xai websearch params support * Allow dynamic setting of store_prompts_in_spend_logs * Fix: output_tokens_details.reasoning_tokens None * fix: Pydantic will fail to parse it because cached_tokens is required but not provided * Spend logs setting modal * adding tests * fix(anthropic): remove explicit cache_control null in tool_result content Fixes issue where tool_result content blocks include explicit 'cache_control': null which breaks some Anthropic API channels. Changes: - Only include cache_control field when explicitly set and not None - Prevents serialization of null values in tool_result text content - Maintains backward compatibility with existing cache_control usage Related issue: Anthropic tool_result conversion adds explicit null values that cause compatibility issues with certain API implementations. Co-Authored-By: Claude (claude-4.5-sonnet) <noreply@anthropic.com> * Fixing tests * Add Prompt caching and reasoning support for MiniMax, GLM, Xiaomi * Fix test_calculate_usage_completion_tokens_details_always_populated and logging object test * Fix gemini-robotics-er-1.5-preview name * Fix gemini-robotics-er-1.5-preview name * Fix team cli auth flow (#19666) * Cleanup code for user cli auth, and make sure not to prompt user for team multiple times while polling * Adding tests * Cleanup normalize teams some more * fix(vertex_ai): support model names with slashes in passthrough URLs (#19944) The regex in get_vertex_model_id_from_url() was using [^/:]+ which stopped at the first slash, truncating model names like 'gcp/google/gemini-2.5-flash' to just 'gcp'. This caused access_groups checks to fail for custom model names. Changed the pattern to [^:]+ to allow slashes in model names, only stopping at the colon before the action (e.g., :generateContent). * Fix thread leak in OpenTelemetry dynamic header path (#19946) * UI: New build * breakdown by team and keys * Adding test * Fixing build * fix pypdf: >=6.6.2 * [Fix] A2a Gateway - Allow supporting old A2a card formats (#19949) * fix: LiteLLMA2ACardResolver * fix: LiteLLMA2ACardResolver * feat: .well-known/agent.json * test_card_resolver_fallback_from_new_to_old_path * Add error_message search in spend logs endpoint * Adding Error message search to ui spend logs * fix * fix(presidio): reuse HTTP connections to prevent OOMs (#19964) * [Fix] VertexAI Pass through - fix regression that caused vertex ai passthroughs to stop working for router models (#19967) * fix(vertex_ai): replace custom model names with actual Vertex AI model names in passthrough URLs (#19948) When the passthrough URL already contains project and location, the code was skipping the deployment lookup and forwarding the URL as-is to Vertex AI. For custom model names like gcp/google/gemini-2.5-flash, Vertex AI returned 404 because it only knows the actual model name (gemini-2.5-flash). The fix makes the deployment lookup always run, so the custom model name gets replaced with the actual Vertex AI model name before forwarding. * add _resolve_vertex_model_from_router * fix: get_llm_provider * Potential fix for code scanning alert no. 4020: Clear-text logging of sensitive information Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> --------- Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * Reusable Table Sort Component * Fixing sorting API calls * [Release Day] - Fixed CI/CD issues & changed processes (#19902) * [Feat] - Search API add /list endpoint to list what search tools exist in router (#19969) * feat: List all available search tools configured in the router. * add debugging search API * add debugging search API * fixing sorting for v2/model/info * [Feat] LiteLLM Vector Stores - Add permission management for users, teams (#19972) * fix: create_vector_store_in_db * add team/user to LiteLLM_ManagedVectorStore * add _check_vector_store_access * add new fields * test_check_vector_store_access * add vector_store/list endpoints * fix code QA checks * feat: Add new OpenRouter models: `xiaomi/mimo-v2-flash`, `z-ai/glm-4.7`, `z-ai/glm-4.7-flash`, and `minimax/minimax-m2.1`. to model prices and context window (#19938) Co-authored-by: Rushil Chugh <Rushil> * fix gemini gemini-robotics-er-1.5-preview entry * removing _experimental out routes from gitignore * chore: update Next.js build artifacts (2026-01-29 04:12 UTC, node v22.16.0) * Add custom_llm_provider as gemini translation * Add test to check if model map is corretly formatted * Intentional bad model map * Add Validate model_prices_and_context_window.json job * Remove validate job from lint * Intentional bad model map * Intentional bad model map * Correct model map path * Fix: litellm_fix_robotic_model_map_entry * fix(mypy): fix type: ignore placement for OTEL LogRecord import The type: ignore[attr-defined] comment was on the import alias line inside parentheses, but mypy reports the error on the `from` line. Collapse to single-line imports so the suppression is on the correct line. Also add no-redef to the fallback branch. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com> Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com> Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com> Co-authored-by: Yogeshwaran Ravichandran <96047771+yogeshwaran10@users.noreply.github.com> Co-authored-by: Alexsander Hamir <alexsanderhamirgomesbaptista@gmail.com> Co-authored-by: Harshit Jain <harshitjain0562@gmail.com> Co-authored-by: Jay Prajapati <79649559+jayy-77@users.noreply.github.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> Co-authored-by: Tamir Kiviti <95572081+tamirkiviti13@users.noreply.github.com> Co-authored-by: Ephrim Stanley <ephrim.stanley@point72.com> Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> Co-authored-by: Gabriele Michelli <michelligabriele0@gmail.com> Co-authored-by: Chesars <cesarponce19544@gmail.com> Co-authored-by: yogeshwaran10 <ywaran646@gmail.com> Co-authored-by: colinlin-stripe <colinlin@stripe.com> Co-authored-by: houdataali <84786211+houdataali@users.noreply.github.com> Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Milan <milan@berri.ai> Co-authored-by: Xianzong Xie <xianzongxie@stripe.com> Co-authored-by: Teo Stocco <zifeo@users.noreply.github.com> Co-authored-by: Pragya Sardana <pragyasardana@gmail.com> Co-authored-by: Ryan Wilson <84201908+ryewilson@users.noreply.github.com> Co-authored-by: Brian Caswell <bcaswell@microsoft.com> Co-authored-by: lizhen <lizhen10763@autohome.com.cn> Co-authored-by: boarder7395 <37314943+boarder7395@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Co-authored-by: rushilchugh01 <58689126+rushilchugh01@users.noreply.github.com>

* feat: add disable_default_user_agent flag Add litellm.disable_default_user_agent global flag to control whether the automatic User-Agent header is injected into HTTP requests. * refactor: update HTTP handlers to respect disable_default_user_agent Modify http_handler.py and httpx_handler.py to check the disable_default_user_agent flag and return empty headers when disabled. This allows users to override the User-Agent header completely. * test: add comprehensive tests for User-Agent customization Add 8 tests covering: - Default User-Agent behavior - Disabling default User-Agent - Custom User-Agent via extra_headers - Environment variable support - Async handler support - Override without disabling - Claude Code use case - Backwards compatibility * fix: honor LITELLM_USER_AGENT for default User-Agent * refactor: drop disable_default_user_agent setting * test: cover LITELLM_USER_AGENT override in custom_httpx handlers * fix Prompt Studio history to load tools and system messages (BerriAI#19920) * fix(vertex_ai): convert image URLs to base64 in tool messages for Anthropic (BerriAI#19896) * fix(vertex_ai): convert image URLs to base64 in tool messages for Anthropic Fixes BerriAI#19891 Vertex AI Anthropic models don't support URL sources for images. LiteLLM already converted image URLs to base64 for user messages, but not for tool messages (role='tool'). This caused errors when using ToolOutputImage with image_url in tool outputs. Changes: - Add force_base64 parameter to convert_to_anthropic_tool_result() - Pass force_base64 to create_anthropic_image_param() for tool message images - Calculate force_base64 in anthropic_messages_pt() based on llm_provider - Add unit tests for tool message image handling * chore: remove extra comment from test file header * Fix/router search tools v2 (BerriAI#19840) * fix(proxy_server): pass search_tools to Router during DB-triggered initialization * fix search tools from db * add missing statement to handle from db * fix import issues to pass lint errors * Fix: Batch cancellation ownership bug * Fix stream_chunk_builder to preserve images from streaming chunks (BerriAI#19654) Fixes BerriAI#19478 The stream_chunk_builder function was not handling image chunks from models like gemini-2.5-flash-image. When streaming responses were reconstructed (e.g., for caching), images in delta.images were lost. This adds handling for image_chunks similar to how audio, annotations, and other delta fields are handled. * fix(docker): add libsndfile to main Dockerfile for ARM64 audio processing (BerriAI#19776) Fixes BerriAI#16920 for users of the stable release images. The previous fix (PR BerriAI#18092) added libsndfile to docker/Dockerfile.alpine, but stable releases are built from the main Dockerfile (Wolfi-based), not the Alpine variant. * Fix File access permissions for .retreive and .delete * Fix Only allowed to call routes: ['llm_api_routes']. Tried to call route: /batches/bGl0ZWxsbV9wcm/cancel * fix(proxy): add datadog_llm_observability to /health/services allowed list (BerriAI#19952) The /health/services endpoint rejected datadog_llm_observability as an unknown service, even though it was registered in the core callback registry and __init__.py. Added it to both the Literal type hint and the hardcoded validation list in the health endpoint. * fix(proxy): prevent provider-prefixed model leaks (BerriAI#19943) * fix(proxy): prevent provider-prefixed model leaks Proxy clients should not see LiteLLM internal provider prefixes (e.g. hosted_vllm/...) in the OpenAI-compatible response model field. This patch sanitizes the client-facing model name for both: - Non-streaming responses returned from base_process_llm_request - Streaming SSE chunks emitted by async_data_generator Adds regression tests covering vLLM-style hosted_vllm routing for both streaming and non-streaming paths. * chore(lint): suppress PLR0915 in proxy handler Ruff started flagging ProxyBaseLLMRequestProcessing.base_process_llm_request() for too many statements after the hotpatch changes. Add an explicit '# noqa: PLR0915' on the function definition to avoid a large refactor in a hotpatch. * refactor(proxy): make model restamp explicit Replace silent try/except/pass and type ignores with explicit model restamping. - Logs an error when the downstream response model differs from the client-requested model - Overwrites the OpenAI `model` field to the client-requested value to avoid leaking internal provider-prefixed identifiers - Applies the same behavior to streaming chunks, logging the mismatch only once per stream * chore(lint): drop PLR0915 suppression The model restamping bugfix made `base_process_llm_request()` slightly exceed Ruff's PLR0915 (too-many-statements) threshold, requiring a `# noqa` suppression. Collapse consecutive `hidden_params` extractions into tuple unpacking so the function falls back under the lint limit and remove the suppression. No functional change intended; this keeps the proxy model-field bugfix intact while aligning with project linting rules. * chore(proxy): log model mismatches as warnings These model-restamping logs are intentionally verbose: a mismatch is a useful signal that an internal provider/deployment identifier may be leaking into the public OpenAI response `model` field. - Downgrade model mismatch logs from error -> warning - Keep error logs only for cases where the proxy cannot read/override the model * fix(proxy): preserve client model for streaming aliasing Pre-call processing can rewrite request_data['model'] via model alias maps.\n\nOur streaming SSE generator was using the rewritten value when restamping chunk.model, which caused the public 'model' field to differ between streaming and non-streaming responses for alias-based requests.\n\nStash the original client model in request_data as _litellm_client_requested_model after the model has been routed, and prefer it when overriding the outgoing chunk model. Add a regression test for the alias-mapping case. * chore(lint): satisfy PLR0915 in streaming generator Ruff started flagging async_data_generator() for too many statements after adding model restamping logic.\n\nExtract the client-model selection + chunk restamping into small helpers to keep behavior unchanged while meeting the project's PLR0915 threshold. * fix(hosted_vllm): route through base_llm_http_handler to support ssl_verify (BerriAI#19893) * fix(hosted_vllm): route through base_llm_http_handler to support ssl_verify The hosted_vllm provider was falling through to the OpenAI catch-all path which doesn't pass ssl_verify to the HTTP client. This adds an explicit elif branch that routes hosted_vllm through base_llm_http_handler.completion() which properly passes ssl_verify to the httpx client. - Add explicit hosted_vllm branch in main.py completion() - Add ssl_verify tests for sync and async completion - Update existing audio_url test to mock httpx instead of OpenAI client * feat(hosted_vllm): add embedding support with ssl_verify - Add HostedVLLMEmbeddingConfig for embedding transformations - Register hosted_vllm embedding config in utils.py - Add lazy import for embedding transformation module - Add unit test for ssl_verify parameter handling * Add OpenRouter Kimi K2.5 (BerriAI#19872) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * Fix: Encoding cancel batch response * Add tests for user level permissions on file and batch access * Fix: mypy errors * Fix lint issues * Add litellm metadata correctly for file create * Add cost tacking and usage info in call_type=aretrieve_batch * Fix max_input_tokens for gpt-5.2-codex * fix(gemini): support file retrieval in GoogleAIStudioFilesHandler * Allow config embedding models * adding tests * Model Usage per key * adding tests * fix(ResponseAPILoggingUtils): extract input tokens details as dict * Add routing of xai chat completions to responses when web search options is present * Add web search tests * Add disable flahg for anthropic gemini cache translation * fix aspectRatio mapping * feat: add /delete endpoint support for gemini * Fix: vllm embedding format * Fix: remove unsupported prompt-caching-scope-2026-01-05 header for vertex ai * Add mock client factory pattern and mock support for PostHog, Helicone, and Braintrust integrations (BerriAI#19707) * Add LangSmith mock client support - Create langsmith_mock_client.py following GCS and Langfuse patterns - Add mock mode detection via LANGSMITH_MOCK environment variable - Intercept LangSmith API calls via AsyncHTTPHandler.post patching - Add verbose logging throughout mock implementation - Update LangsmithLogger to initialize mock client when mock mode enabled - Supports configurable mock latency via LANGSMITH_MOCK_LATENCY_MS * Add Datadog mock client support - Create datadog_mock_client.py following GCS, Langfuse, and LangSmith patterns - Add mock mode detection via DATADOG_MOCK environment variable - Intercept Datadog API calls via AsyncHTTPHandler.post and httpx.Client.post patching - Add verbose logging throughout mock implementation - Update DataDogLogger and DataDogLLMObsLogger to initialize mock client when mock mode enabled - Supports both async and sync logging paths - Supports configurable mock latency via DATADOG_MOCK_LATENCY_MS * refactor: consolidate mock client logic into factory pattern - Create mock_client_factory.py to centralize common mock HTTP client logic - Refactor GCS, Langfuse, LangSmith, and Datadog mock clients to use factory - Improve GET/DELETE mock accuracy for GCS (return valid StandardLoggingPayload) - Fix DELETE mock to return empty body (204 No Content) instead of JSON - Reduce code duplication across integration mock clients * feat: add PostHog mock client support - Create posthog_mock_client.py using factory pattern - Integrate mock client into PostHogLogger with mock mode detection - Add verbose logging for mock mode initialization and batch operations - Enable mock mode via POSTHOG_MOCK environment variable * Add Helicone mock client support - Created helicone_mock_client.py using factory pattern (similar to GCS) - Integrated mock mode detection and initialization in HeliconeLogger - Mock client patches HTTPHandler.post to intercept Helicone API calls - Uses factory pattern for should_use_mock and MockResponse utilities - Custom HTTPHandler.post patching required since HTTPHandler uses self.client.send() * Add mock support for Braintrust integration and extend mock client factory - Add braintrust_mock_client.py with mock HTTP client for Braintrust integration testing - Integrate mock client into BraintrustLogger with mock mode detection - Refactor Helicone mock client to fully utilize factory's HTTPHandler.post patching - Extend mock_client_factory to support patching HTTPHandler.post for sync calls - Enable endpoint-specific mock responses for Braintrust (/project vs /project_logs) - All mock clients now properly handle both async (AsyncHTTPHandler) and sync (HTTPHandler) calls * Fix linter errors: remove unused imports and suppress complexity warning - Remove unused imports from gcs_bucket_mock_client.py (httpx, json, timedelta, Dict, Optional) - Remove unused Callable import from mock_client_factory.py - Add noqa comment to suppress PLR0915 complexity warning for create_mock_client_factory function * Document mock environment variables for PostHog, Helicone, Braintrust, Datadog, and Langsmith integrations - Add POSTHOG_MOCK and POSTHOG_MOCK_LATENCY_MS documentation - Add HELICONE_MOCK and HELICONE_MOCK_LATENCY_MS documentation - Add BRAINTRUST_MOCK and BRAINTRUST_MOCK_LATENCY_MS documentation - Add DATADOG_MOCK and DATADOG_MOCK_LATENCY_MS documentation - Add LANGSMITH_MOCK and LANGSMITH_MOCK_LATENCY_MS documentation All mock env vars follow the same pattern: enable mock mode for integration testing by intercepting API calls and returning mock responses without making actual network calls. * Fix security issue * Realtime API benchmarks (BerriAI#20074) * Add /realtime API benchmarks to Benchmarks documentation - Added new section showing performance improvements for /realtime endpoint - Included before/after metrics showing 182× faster p99 latency - Added test setup specifications and key optimizations - Referenced from v1.80.5-stable release notes Co-authored-by: ishaan <ishaan@berri.ai> * Update /realtime benchmarks to show current performance only - Removed before/after comparison, showing only current metrics - Clarified that benchmarks are e2e latency against fake realtime endpoint - Simplified table format for better readability Co-authored-by: ishaan <ishaan@berri.ai> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: ishaan <ishaan@berri.ai> * fixes: ci pipeline router coverage failure (BerriAI#20065) * fix: working claude code with agent SDKs (BerriAI#20081) * [Feat] Add async_post_call_response_headers_hook to CustomLogger (BerriAI#20083) * Add async_post_call_response_headers_hook to CustomLogger (BerriAI#20070) Allow CustomLogger callbacks to inject custom HTTP response headers into streaming, non-streaming, and failure responses via a new async_post_call_response_headers_hook method. * async_post_call_response_headers_hook --------- Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> * Add WATSONX_ZENAPIKEY * fix(proxy): resolve high CPU when router_settings in DB by avoiding REGISTRY.collect() in PrometheusServicesLogger (BerriAI#20087) * v0 - looks decen view * refactored code * fix ui * fixes ui * complete v2 viewer * fix drawer * Revert logs view commits to recreate with clean history (BerriAI#20090) This reverts commits: - 437e9e2 fix drawer - 61bb51d complete v2 viewer - 2014bcf fixes ui - 5f07635 fix ui - f07ef8a refactored code - 8b7a925 v0 - looks decen view Will create a new clean PR with the original changes. * update image and bounded logo in navbar * refactoring user dropdown * new utils * address feedback * [Feat] v2 - Logs view with side panel and improved UX (BerriAI#20091) * init: azure_ai/azure-model-router * show additional_costs in CostBreakdown * UI show cost breakdown fields * feat: dedicated cost calc for azure ai * test_azure_ai_model_router * docs azure model router * test azure model router * fix transfrom * Add transform file * fix:feat: route to config * v0 - looks decen view * refactored code * fix ui * fixes ui * complete v2 viewer * address feedback * address feedback * Delete resource modal dark mode * [Feat] UI - New View to render "Tools" on Logs View (BerriAI#20093) * v1 - tool viewer in logs page * add preview for tool sections * ui fixes * new tool view * Refactor: Address code review feedback - use Antd components Changes: - Use Antd Space component instead of manual flex layouts - Use Antd Text.copyable prop instead of custom clipboard utilities - Extract helper functions to utils.ts for testability - Remove clipboardUtils.ts (replaced with Antd built-in) - Update DrawerHeader, LogDetailsDrawer, and constants Benefits: - Cleaner code using standard Antd patterns - Better testability with separated utils - Consistent UX with Antd's copy tooltips - Reduced custom code maintenance Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> * [Feat] UI - Add Pretty print view of request/response (BerriAI#20096) * v1 - tool viewer in logs page * add preview for tool sections * ui fixes * new tool view * v1 - new pretty view * clean ui * polish fixes * nice view input/output * working i/o cards * fixes for log view --------- Co-authored-by: Warp <agent@warp.dev> * remove md * fixed mcp tools instructions on ui to show comma seprated str instead of list * docs: cleanup docs * litellm_fix: add missing timezone import to proxy_server.py (BerriAI#20121) * fix(proxy): reduce PLR0915 complexity in base_process_llm_request (BerriAI#20127) * litellm_fix(ui): remove unused ToolOutlined import (BerriAI#20129) * litellm_fix(e2e): disable bedrock-converse-claude-sonnet-4.5 model in tests (BerriAI#20131) * litellm_fix(test): fix Azure AI cost calculator test - use Logging class (BerriAI#20134) * litellm_fix(test): fix Bedrock tool search header test regression (BerriAI#20135) * litellm_fix(test): allow comment field in schema and exclude robotics models from tpm check (BerriAI#20139) * litellm_docs: add missing environment variable documentation (BerriAI#20138) * litellm_fix(test): add acancel_batch to Azure SDK client initialization test (BerriAI#20143) * litellm_fix: handle unknown models in Azure AI cost calculator (BerriAI#20150) * litellm_fix(test): fix router silent experiment tests to properly mock async functions (BerriAI#20140) * chore: update Next.js build artifacts (2026-01-31 17:20 UTC, node v22.16.0) * fix(proxy): use get_async_httpx_client for logo download (BerriAI#20155) Replace direct AsyncHTTPHandler instantiation with get_async_httpx_client to avoid +500ms latency per request from creating new async clients. Added httpxSpecialProvider.UI for UI-related HTTP requests like logo downloads. * fix(datadog): check for agent mode before requiring DD_API_KEY/DD_SITE (BerriAI#20156) The DataDog LLM Obs logger was checking for DD_API_KEY and DD_SITE before checking if agent mode (LITELLM_DD_AGENT_HOST) was configured. In agent mode, the DataDog agent handles authentication, so these environment variables are not required. This fix moves the agent mode check first, and only validates DD_API_KEY and DD_SITE when using direct API mode. Fixes test_datadog_llm_obs_agent_configuration and test_datadog_llm_obs_agent_no_api_key_ok * litellm_fix: handle empty dict for web_search_options in Nova grounding (BerriAI#20159) The condition `value and isinstance(value, dict)` fails for empty dicts because `{}` is falsy in Python. Users commonly pass `web_search_options={}` to enable Nova grounding without specifying additional options. Changed the condition to `isinstance(value, dict)` which correctly handles both empty and non-empty dicts. Fixes failing tests: - test_bedrock_nova_grounding_async - test_bedrock_nova_grounding_request_transformation - test_bedrock_nova_grounding_web_search_options_non_streaming - test_bedrock_nova_grounding_with_function_tools * fix(mypy): fix type errors in files, opentelemetry, gemini transformation, and key management (BerriAI#20161) - files/main.py: rename uuid import to uuid_module to avoid conflict with router import - integrations/opentelemetry.py: add fallback for callback_name to ensure str type - llms/gemini/files/transformation.py: add type annotation for params dict - proxy/management_endpoints/key_management_endpoints.py: add null check for prisma_client * litellm_fix(test): update Prometheus metric test assertions with new labels (BerriAI#20162) This fixes the failing litellm_mapped_enterprise_tests (metrics/logging) job. Recent commits added new labels to several Prometheus metrics (model_id, client_ip, user_agent) but the test assertions weren't fully updated to expect these new labels. Tests fixed: - test_async_post_call_failure_hook - test_async_log_failure_event - test_increment_token_metrics - test_log_failure_fallback_event - test_set_latency_metrics - test_set_llm_deployment_success_metrics Labels added to test assertions: - model_id for token metrics (litellm_tokens_metric, litellm_input_tokens_metric, litellm_output_tokens_metric) - model_id for latency metrics (litellm_llm_api_latency_metric) - model_id for remaining requests/tokens metrics - model_id for fallback metrics - model_id for overhead latency metric - client_ip and user_agent for deployment failure/total/success responses - client_ip and user_agent for proxy failed/total requests metrics * test: remove hosted_vllm from OpenAI client tests (BerriAI#20163) hosted_vllm no longer uses the OpenAI client, so these tests that mock the OpenAI client are not applicable to hosted_vllm. Removes hosted_vllm from: - test_openai_compatible_custom_api_base - test_openai_compatible_custom_api_video * litellm_fix: bump litellm-proxy-extras version to 0.4.28 (BerriAI#20166) Changes were made to litellm_proxy_extras (schema.prisma, utils.py, migrations) but version was not bumped, causing CI publish job to fail. This commit bumps the version from 0.4.27 to 0.4.28 in all required files: - litellm-proxy-extras/pyproject.toml - requirements.txt - pyproject.toml Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> * litellm_fix(mypy): fix remaining type errors (BerriAI#20164) - route_llm_request.py: add acancel_batch and afile_delete to route_type Literal - router.py: add SearchToolInfoTypedDict and search_tool_info to SearchToolTypedDict - gemini/files/transformation.py: fix validate_environment signature to match base class - responses transformation.py: fix Dict type annotations to use int instead of Optional[int] - vector_stores/endpoints.py: add team_id and user_id to LiteLLM_ManagedVectorStoresTable constructor Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> * litellm_fix(security): allowlist Next.js CVEs for 7 days (BerriAI#20169) Temporarily allowlist Next.js vulnerabilities in UI dashboard: - GHSA-h25m-26qc-wcjf (HIGH: DoS via request deserialization) - CVE-2025-59471 (MEDIUM: Image Optimizer DoS) Fix: Upgrade to Next.js 15.5.10+ or 16.1.5+ (7-day timeline) Changes: - Added .trivyignore with Next.js CVEs - Updated security_scans.sh to use --ignorefile flag * litellm_fix(router): use safe_deep_copy in _get_silent_experiment_kwargs (BerriAI#20170) **Regression introduced in:** PR BerriAI#19544 (feat: add feature to make silent calls) Fixes check_code_and_doc_quality CI failure. Line 1332 used copy.deepcopy(kwargs) which violates ban_copy_deepcopy_kwargs check. kwargs can contain non-serializable objects like OTEL spans. Changed to safe_deep_copy(kwargs) which handles these correctly. * docs(embeddings): add supported input formats section (BerriAI#20073) Document valid input formats for /v1/embeddings endpoint per OpenAI spec. Clarifies that array of string arrays is not a valid format. * fix proxy extras pip * fix gemini files * fix EventDrivenCacheCoordinator * test_increment_top_level_request_and_spend_metrics * fix typing * fix transform_retrieve_file_response * fix linting * fix mcp linting * _add_web_search_tool * test_bedrock_nova_grounding_web_search_options_non_streaming * add _is_bedrock_tool_block * fix MCP client * fix files * litellm_fix(lint): remove unused ToolNameValidationResult imports (BerriAI#20176) Fixes ruff F401 errors in check_code_and_doc_quality CI job. **Regression introduced in:** 41ec820 (fix files) - added files with unused imports ## Problem ToolNameValidationResult is imported but never used in: - litellm/proxy/_experimental/mcp_server/mcp_server_manager.py - litellm/proxy/management_endpoints/mcp_management_endpoints.py ## Fix ```diff - ToolNameValidationResult, ``` Removed from both import statements. ## Changes - mcp_server_manager.py: -1 line (removed unused import) - mcp_management_endpoints.py: -1 line (removed unused import) * litellm_fix(azure): Fix acancel_batch not using Azure SDK client initialization (BerriAI#20168) - Fixed model parameter being overwritten to None in acancel_batch function - Added dedicated acancel_batch/\_acancel_batch methods in Router - Properly extracts custom_llm_provider from deployment like acreate_batch This fixes test_ensure_initialize_azure_sdk_client_always_used[acancel_batch] which expected azure_batches_instance.initialize_azure_sdk_client to be called. Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> * fix tar security issue with TAR * fix model name during fallback * test_get_image_non_root_uses_var_lib_assets_dir * test_delete_vector_store_checks_access * test_get_session_iterator_thread_safety * Fix health endpoints * _prepare_vertex_auth_headers * test_budget_reset_and_expires_at_first_of_month * fix(test): add router.acancel_batch coverage (BerriAI#20183) - Add test_router_acancel_batch.py with mock test for router.acancel_batch() - Add _acancel_batch to ignored list (internal helper tested via public API) Fixes CI failure in check_code_and_doc_quality job * fix(mypy): fix validate_tool_name return type signatures (BerriAI#20184) Move ToolNameValidationResult class definition outside the fallback function and use consistent return type annotation to satisfy mypy. Files fixed: - proxy/_experimental/mcp_server/mcp_server_manager.py - proxy/management_endpoints/mcp_management_endpoints.py * fix(test): update test_chat_completion to handle metadata in body The proxy now adds metadata to the request body during processing. Updated test to compare fields individually and strip metadata from body comparison. Fixes litellm_proxy_unit_testing_part2 CI failure. * fix(proxy): resolve 'multiple values for keyword argument' in batch cancel and file retrieve - batch_endpoints.py: Pop batch_id from data before creating CancelBatchRequest to avoid duplicate batch_id when data already contains it from earlier cast - files_endpoints.py: Pop file_id from data before calling afile_retrieve to avoid duplicate file_id when data was initialized with {"file_id": file_id} - test_claude_agent_sdk.py: Disable bedrock-nova-premier test as it requires an inference profile for on-demand throughput (AWS limitation) Fixes: e2e_openai_endpoints tests (test_batches_operations, test_file_operations) Fixes: proxy_e2e_anthropic_messages_tests (nova-premier model skip) * ci(security): allowlist GHSA-34x7-hfp2-rc4v (node-tar hardlink) Not applicable - tar CLI not exposed in application code * fix(mypy): add type: ignore for conditional function variants in MCP modules The mypy error 'All conditional function variants have identical signatures' occurs when defining fallback functions in try/except ImportError blocks. Adding '# type: ignore[misc]' suppresses this false positive. Fixes: - mcp_server_manager.py:80 - validate_tool_name fallback - mcp_management_endpoints.py:72 - validate_tool_name fallback * fix: make cache updates synchronous for budget enforcement The budget enforcement was failing in tests because cache updates were fire-and-forget (asyncio.create_task), causing race conditions where subsequent requests would read stale spend data. Changes: 1. proxy_track_cost_callback.py: await update_cache() instead of create_task 2. proxy_server.py: await async_set_cache_pipeline() instead of create_task 3. auth_checks.py: prefer valid_token.team_member_spend (from fresh cache) over team_membership.spend (which may be stale) This ensures budget checks see the most recent spend values and properly enforce budget limits when requests come in quick succession. Fixes: test_users_in_team_budget, test_chat_completion_low_budget * fix(test): accept both AuthenticationError and InternalServerError in batch_completion test (BerriAI#20186) The test uses an invalid API key to verify that batch_completion returns exceptions rather than raising them. However, depending on network conditions, the error may be: - AuthenticationError: API properly rejected the invalid key - InternalServerError: Connection error occurred before API could respond Both are valid outcomes for this test case. Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> * test_embedding fix * fix bedrock-nova-premier * Revert "fix: make cache updates synchronous for budget enforcement" This reverts commit d038341. * fix(test): correct prompt_tokens in test_string_cost_values (BerriAI#20185) The test had prompt_tokens=1000 but the sum of token details was 1150 (text=700 + audio=100 + cached=200 + cache_creation=150). This triggered the double-counting detection logic which recalculated text_tokens to 550, causing the assertion to fail. Fixed by setting prompt_tokens=1150 to match the sum of details. * fix: bedrock-converse-claude-sonnet-4.5 * fix: stabilize CI tests - routes and bedrock config - Add /v1/vector_store/list route for OpenAI API compatibility (fixes test_routes_on_litellm_proxy) - Fix Bedrock Converse API model format (bedrock_converse/ → bedrock/converse/) - Fix Nova Premier inference profile prefix (amazon. → us.amazon.) - Add STABILIZATION_TODO.md to .gitignore Tested locally - all affected tests now pass Co-authored-by: Cursor <cursoragent@cursor.com> * sync: generator client * add LiteLLM_ManagedVectorStoresTable_user_id_idx * docs/blog index page (BerriAI#20188) * docs: add card-based blog index page for mobile navigation Fixes BerriAI#20100 - the blog landing page showed post content directly instead of an index, with no way to navigate between posts on mobile. - Swizzle BlogListPage with card-based grid layout - Featured latest post spans full width with badge - Responsive 2-column grid with orphan handling - Pagination, SEO metadata, accessibility (aria-label, dateTime, heading hierarchy) - Add description frontmatter to existing blog posts * docs: add deterministic fallback colors for unknown blog tags * docs: rename blog heading to The LiteLLM Blog * UI spend logs setting docs * bump extras * fix fake-openai-endpoint * doc fix * fix team budget checks * bump: version 1.81.5 → 1.81.6 * litellm_fix_mapped_tests_core: clear client cache and fix isinstance checks (BerriAI#20196) ## Problem Tests using mocked HTTP clients were hitting real APIs because: 1. HTTP client cache was returning previously cached real clients 2. isinstance checks failed due to module identity issues from sys.path ### Tests affected: - test_send_email_missing_api_key - test_send_email_multiple_recipients (resend & sendgrid) - test_search_uses_registry_credentials - test_vector_store_create_with_simple_provider_name - test_vector_store_create_with_provider_api_type - test_vector_store_create_with_ragflow_provider - test_image_edit_merges_headers_and_extra_headers - test_retrieve_container_basic (container API tests) ## Solution 1. Add clear_client_cache fixture (autouse=True) to clear litellm.in_memory_llm_clients_cache before each test 2. Fix isinstance checks to use type name comparison (avoids module identity issues from sys.path.insert) ## Why not disable_aiohttp_transport The default transport is aiohttp, so tests should work with it. Clearing the cache ensures mocks are used instead of cached real clients. ## Regression PR BerriAI#19829 (commit f95572e) added @respx.mock but cached clients from earlier tests were being reused, bypassing the mocks. Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> * test_chat_completion_low_budget * fix: delete_file * fixes * fix: update test_prometheus to expect masked user_id in metrics The user_id field 'default_user_id' is being masked to '*******_user_id' in prometheus metrics for privacy. Updated test expectations to match the actual behavior. Co-authored-by: Cursor <cursoragent@cursor.com> * docs fix * feat(bedrock): add base cache costs for sonnet v1 (BerriAI#20214) * docs: fix dead links in v1.81.6 release notes (BerriAI#20218) - Fix /docs/search/index -> /docs/search (404 error) - Fix /cookbook/ -> GitHub cookbook URL (404 error) Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> * fix(test): update test_prometheus with masked user_id and missing labels - Update expected user_id from 'default_user_id' to '*******_user_id' (PII masking) - Add missing client_ip, user_agent, model_id labels (from PRs BerriAI#19717, BerriAI#19678) - Update label order to match Prometheus alphabetical sorting Co-authored-by: Cursor <cursoragent@cursor.com> * litellm_fix_mapped_tests_core: fix test isolation and mock injection issues (BerriAI#20209) * litellm_fix_mapped_tests_core: fix test isolation and mock injection issues ## Problem Four tests in litellm_mapped_tests_core were failing: 1. test_register_model_with_scientific_notation - KeyError due to test isolation issues 2. test_search_uses_registry_credentials - Mock not being called due to incorrect patch path 3. test_send_email_missing_api_key - Real API calls despite mocking 4. test_stream_transformation_error_sync - Mock not effective, real API called ## Solution ### test_register_model_with_scientific_notation - Use unique model name to avoid conflicts with other tests - Clear LRU caches before test to prevent stale data - Clean up model_cost entry after test ### test_search_uses_registry_credentials - Use patch.object() on the actual base_llm_http_handler instance - String-based patching for instance methods can fail; direct object patching is more reliable ### test_send_email_missing_api_key - Directly inject mock HTTP client into logger instance - This bypasses any caching issues that could cause the fixture mock to be ineffective ### test_stream_transformation_error_sync - Patch litellm.completion directly instead of the handler module's litellm reference - This ensures the mock is effective regardless of import order ## Regression These tests were affected by LRU caching added in BerriAI#19606 and HTTP client caching. * fix(test): use patch.object for container API tests to fix mock injection ## Problem test_retrieve_container_basic tests were failing because mocks weren't being applied correctly. The tests used string-based patching: patch('litellm.containers.main.base_llm_http_handler') But base_llm_http_handler is imported at module level, so the mock wasn't intercepting the actual handler calls, resulting in real HTTP requests to OpenAI API. ## Solution Use patch.object() to directly mock methods on the imported handler instance. Import base_llm_http_handler in the test file and patch like: patch.object(base_llm_http_handler, 'container_retrieve_handler', ...) This ensures the mock is applied to the actual object being used, regardless of import order or caching. * fix(test): add missing Prometheus metric labels to test_proxy_failure_metrics Add client_ip, user_agent, model_id labels to expected metric patterns. These labels were added in PRs BerriAI#19717 and BerriAI#19678 but test wasn't updated. * fix(test_resend_email): use direct mock injection for all email tests Extend the mock injection pattern used in test_send_email_missing_api_key to all other tests in the file: - test_send_email_success - test_send_email_multiple_recipients Instead of relying on fixture-based patching and respx mocks which can fail due to import order and caching issues, directly inject the mock HTTP client into the logger instance. This ensures mocks are always used regardless of test execution order. * fix(test): use patch.object for image_edit and vector_store tests - test_image_edit_merges_headers_and_extra_headers: import base_llm_http_handler and use patch.object instead of string path patching - test_search_uses_registry_credentials: import module and patch via module.base_llm_http_handler to ensure we patch the right instance --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> * Revert "Merge pull request BerriAI#18790 from BerriAI/litellm_key_team_routing_3" This reverts commit ae26d8e, reversing changes made to 864e8c6. * test_proxy_failure_metrics * test_proxy_success_metrics * fix(test): make test_proxy_failure_metrics resilient to missing proxy-level metrics - Check for both litellm_proxy_failed_requests_metric_total and the deprecated litellm_llm_api_failed_requests_metric_total - The proxy-level failure hook may not always be called depending on where the exception occurs - Simplify total_requests check to only verify key fields Co-authored-by: Cursor <cursoragent@cursor.com> * test fix * docs: Update v1.81.6 release notes - focus on Logs v2 with Tool Call Tracing (BerriAI#20225) - Updated title to highlight Logs v2 feature - Simplified Key Highlights to focus on Logs v2 / tool call tracing - Rewrote Logs v2 description with improved language style - Removed Claude Agents SDK and RAG API from key highlights section - TODO: Add image (logs_v2_tool_tracing.png) Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> * feat: enhance Cohere embedding support with additional parameters and model version * Update Vertex AI Text to Speech doc to show use of audio * temporarily remove `litellm/proxy/_experimental/out` before merging main * chore: update Next.js build artifacts (2026-02-06 09:45 UTC, node v24.13.0) * chore: update baseline-browser-mapping to version 2.9.19 --------- Co-authored-by: jayy-77 <1427jay@gmail.com> Co-authored-by: Neha Prasad <neh6a683@gmail.com> Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com> Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> Co-authored-by: Bernardo Donadio <bcdonadio@bcdonadio.com> Co-authored-by: Christopher Chase <cchase@redhat.com> Co-authored-by: Aaron Yim <aaronchyim@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Takumi Matsuzawa <152503584+genga6@users.noreply.github.com> Co-authored-by: Varun Sripad <varunsripad@Varuns-MacBook-Air.local> Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: Rhys <nghuutho74@gmail.com> Co-authored-by: Alexsander Hamir <alexsanderhamirgomesbaptista@gmail.com> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: ishaan <ishaan@berri.ai> Co-authored-by: Warp <agent@warp.dev> Co-authored-by: shivam <shivam@uni.minerva.edu> Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com> Co-authored-by: Shivam Rawat <161387515+shivamrawat1@users.noreply.github.com> Co-authored-by: shin-bot-litellm <shin-bot-litellm@berri.ai> Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com> Co-authored-by: cscguochang-agent <cscguochang@gmail.com> Co-authored-by: amirzaushnizer <amir.z@qodo.ai>

…I (instead of the deprecated expander UI) (#7) * feat: add disable_default_user_agent flag Add litellm.disable_default_user_agent global flag to control whether the automatic User-Agent header is injected into HTTP requests. * refactor: update HTTP handlers to respect disable_default_user_agent Modify http_handler.py and httpx_handler.py to check the disable_default_user_agent flag and return empty headers when disabled. This allows users to override the User-Agent header completely. * test: add comprehensive tests for User-Agent customization Add 8 tests covering: - Default User-Agent behavior - Disabling default User-Agent - Custom User-Agent via extra_headers - Environment variable support - Async handler support - Override without disabling - Claude Code use case - Backwards compatibility * fix: honor LITELLM_USER_AGENT for default User-Agent * refactor: drop disable_default_user_agent setting * test: cover LITELLM_USER_AGENT override in custom_httpx handlers * fix Prompt Studio history to load tools and system messages (BerriAI#19920) * fix(vertex_ai): convert image URLs to base64 in tool messages for Anthropic (BerriAI#19896) * fix(vertex_ai): convert image URLs to base64 in tool messages for Anthropic Fixes BerriAI#19891 Vertex AI Anthropic models don't support URL sources for images. LiteLLM already converted image URLs to base64 for user messages, but not for tool messages (role='tool'). This caused errors when using ToolOutputImage with image_url in tool outputs. Changes: - Add force_base64 parameter to convert_to_anthropic_tool_result() - Pass force_base64 to create_anthropic_image_param() for tool message images - Calculate force_base64 in anthropic_messages_pt() based on llm_provider - Add unit tests for tool message image handling * chore: remove extra comment from test file header * Fix/router search tools v2 (BerriAI#19840) * fix(proxy_server): pass search_tools to Router during DB-triggered initialization * fix search tools from db * add missing statement to handle from db * fix import issues to pass lint errors * Fix: Batch cancellation ownership bug * Fix stream_chunk_builder to preserve images from streaming chunks (BerriAI#19654) Fixes BerriAI#19478 The stream_chunk_builder function was not handling image chunks from models like gemini-2.5-flash-image. When streaming responses were reconstructed (e.g., for caching), images in delta.images were lost. This adds handling for image_chunks similar to how audio, annotations, and other delta fields are handled. * fix(docker): add libsndfile to main Dockerfile for ARM64 audio processing (BerriAI#19776) Fixes BerriAI#16920 for users of the stable release images. The previous fix (PR BerriAI#18092) added libsndfile to docker/Dockerfile.alpine, but stable releases are built from the main Dockerfile (Wolfi-based), not the Alpine variant. * Fix File access permissions for .retreive and .delete * Fix Only allowed to call routes: ['llm_api_routes']. Tried to call route: /batches/bGl0ZWxsbV9wcm/cancel * fix(proxy): add datadog_llm_observability to /health/services allowed list (BerriAI#19952) The /health/services endpoint rejected datadog_llm_observability as an unknown service, even though it was registered in the core callback registry and __init__.py. Added it to both the Literal type hint and the hardcoded validation list in the health endpoint. * fix(proxy): prevent provider-prefixed model leaks (BerriAI#19943) * fix(proxy): prevent provider-prefixed model leaks Proxy clients should not see LiteLLM internal provider prefixes (e.g. hosted_vllm/...) in the OpenAI-compatible response model field. This patch sanitizes the client-facing model name for both: - Non-streaming responses returned from base_process_llm_request - Streaming SSE chunks emitted by async_data_generator Adds regression tests covering vLLM-style hosted_vllm routing for both streaming and non-streaming paths. * chore(lint): suppress PLR0915 in proxy handler Ruff started flagging ProxyBaseLLMRequestProcessing.base_process_llm_request() for too many statements after the hotpatch changes. Add an explicit '# noqa: PLR0915' on the function definition to avoid a large refactor in a hotpatch. * refactor(proxy): make model restamp explicit Replace silent try/except/pass and type ignores with explicit model restamping. - Logs an error when the downstream response model differs from the client-requested model - Overwrites the OpenAI `model` field to the client-requested value to avoid leaking internal provider-prefixed identifiers - Applies the same behavior to streaming chunks, logging the mismatch only once per stream * chore(lint): drop PLR0915 suppression The model restamping bugfix made `base_process_llm_request()` slightly exceed Ruff's PLR0915 (too-many-statements) threshold, requiring a `# noqa` suppression. Collapse consecutive `hidden_params` extractions into tuple unpacking so the function falls back under the lint limit and remove the suppression. No functional change intended; this keeps the proxy model-field bugfix intact while aligning with project linting rules. * chore(proxy): log model mismatches as warnings These model-restamping logs are intentionally verbose: a mismatch is a useful signal that an internal provider/deployment identifier may be leaking into the public OpenAI response `model` field. - Downgrade model mismatch logs from error -> warning - Keep error logs only for cases where the proxy cannot read/override the model * fix(proxy): preserve client model for streaming aliasing Pre-call processing can rewrite request_data['model'] via model alias maps.\n\nOur streaming SSE generator was using the rewritten value when restamping chunk.model, which caused the public 'model' field to differ between streaming and non-streaming responses for alias-based requests.\n\nStash the original client model in request_data as _litellm_client_requested_model after the model has been routed, and prefer it when overriding the outgoing chunk model. Add a regression test for the alias-mapping case. * chore(lint): satisfy PLR0915 in streaming generator Ruff started flagging async_data_generator() for too many statements after adding model restamping logic.\n\nExtract the client-model selection + chunk restamping into small helpers to keep behavior unchanged while meeting the project's PLR0915 threshold. * fix(hosted_vllm): route through base_llm_http_handler to support ssl_verify (BerriAI#19893) * fix(hosted_vllm): route through base_llm_http_handler to support ssl_verify The hosted_vllm provider was falling through to the OpenAI catch-all path which doesn't pass ssl_verify to the HTTP client. This adds an explicit elif branch that routes hosted_vllm through base_llm_http_handler.completion() which properly passes ssl_verify to the httpx client. - Add explicit hosted_vllm branch in main.py completion() - Add ssl_verify tests for sync and async completion - Update existing audio_url test to mock httpx instead of OpenAI client * feat(hosted_vllm): add embedding support with ssl_verify - Add HostedVLLMEmbeddingConfig for embedding transformations - Register hosted_vllm embedding config in utils.py - Add lazy import for embedding transformation module - Add unit test for ssl_verify parameter handling * Add OpenRouter Kimi K2.5 (BerriAI#19872) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> * Fix: Encoding cancel batch response * Add tests for user level permissions on file and batch access * Fix: mypy errors * Fix lint issues * Add litellm metadata correctly for file create * Add cost tacking and usage info in call_type=aretrieve_batch * Fix max_input_tokens for gpt-5.2-codex * fix(gemini): support file retrieval in GoogleAIStudioFilesHandler * Allow config embedding models * adding tests * Model Usage per key * adding tests * fix(ResponseAPILoggingUtils): extract input tokens details as dict * Add routing of xai chat completions to responses when web search options is present * Add web search tests * Add disable flahg for anthropic gemini cache translation * fix aspectRatio mapping * feat: add /delete endpoint support for gemini * Fix: vllm embedding format * Fix: remove unsupported prompt-caching-scope-2026-01-05 header for vertex ai * Add mock client factory pattern and mock support for PostHog, Helicone, and Braintrust integrations (BerriAI#19707) * Add LangSmith mock client support - Create langsmith_mock_client.py following GCS and Langfuse patterns - Add mock mode detection via LANGSMITH_MOCK environment variable - Intercept LangSmith API calls via AsyncHTTPHandler.post patching - Add verbose logging throughout mock implementation - Update LangsmithLogger to initialize mock client when mock mode enabled - Supports configurable mock latency via LANGSMITH_MOCK_LATENCY_MS * Add Datadog mock client support - Create datadog_mock_client.py following GCS, Langfuse, and LangSmith patterns - Add mock mode detection via DATADOG_MOCK environment variable - Intercept Datadog API calls via AsyncHTTPHandler.post and httpx.Client.post patching - Add verbose logging throughout mock implementation - Update DataDogLogger and DataDogLLMObsLogger to initialize mock client when mock mode enabled - Supports both async and sync logging paths - Supports configurable mock latency via DATADOG_MOCK_LATENCY_MS * refactor: consolidate mock client logic into factory pattern - Create mock_client_factory.py to centralize common mock HTTP client logic - Refactor GCS, Langfuse, LangSmith, and Datadog mock clients to use factory - Improve GET/DELETE mock accuracy for GCS (return valid StandardLoggingPayload) - Fix DELETE mock to return empty body (204 No Content) instead of JSON - Reduce code duplication across integration mock clients * feat: add PostHog mock client support - Create posthog_mock_client.py using factory pattern - Integrate mock client into PostHogLogger with mock mode detection - Add verbose logging for mock mode initialization and batch operations - Enable mock mode via POSTHOG_MOCK environment variable * Add Helicone mock client support - Created helicone_mock_client.py using factory pattern (similar to GCS) - Integrated mock mode detection and initialization in HeliconeLogger - Mock client patches HTTPHandler.post to intercept Helicone API calls - Uses factory pattern for should_use_mock and MockResponse utilities - Custom HTTPHandler.post patching required since HTTPHandler uses self.client.send() * Add mock support for Braintrust integration and extend mock client factory - Add braintrust_mock_client.py with mock HTTP client for Braintrust integration testing - Integrate mock client into BraintrustLogger with mock mode detection - Refactor Helicone mock client to fully utilize factory's HTTPHandler.post patching - Extend mock_client_factory to support patching HTTPHandler.post for sync calls - Enable endpoint-specific mock responses for Braintrust (/project vs /project_logs) - All mock clients now properly handle both async (AsyncHTTPHandler) and sync (HTTPHandler) calls * Fix linter errors: remove unused imports and suppress complexity warning - Remove unused imports from gcs_bucket_mock_client.py (httpx, json, timedelta, Dict, Optional) - Remove unused Callable import from mock_client_factory.py - Add noqa comment to suppress PLR0915 complexity warning for create_mock_client_factory function * Document mock environment variables for PostHog, Helicone, Braintrust, Datadog, and Langsmith integrations - Add POSTHOG_MOCK and POSTHOG_MOCK_LATENCY_MS documentation - Add HELICONE_MOCK and HELICONE_MOCK_LATENCY_MS documentation - Add BRAINTRUST_MOCK and BRAINTRUST_MOCK_LATENCY_MS documentation - Add DATADOG_MOCK and DATADOG_MOCK_LATENCY_MS documentation - Add LANGSMITH_MOCK and LANGSMITH_MOCK_LATENCY_MS documentation All mock env vars follow the same pattern: enable mock mode for integration testing by intercepting API calls and returning mock responses without making actual network calls. * Fix security issue * Realtime API benchmarks (BerriAI#20074) * Add /realtime API benchmarks to Benchmarks documentation - Added new section showing performance improvements for /realtime endpoint - Included before/after metrics showing 182× faster p99 latency - Added test setup specifications and key optimizations - Referenced from v1.80.5-stable release notes Co-authored-by: ishaan <ishaan@berri.ai> * Update /realtime benchmarks to show current performance only - Removed before/after comparison, showing only current metrics - Clarified that benchmarks are e2e latency against fake realtime endpoint - Simplified table format for better readability Co-authored-by: ishaan <ishaan@berri.ai> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: ishaan <ishaan@berri.ai> * fixes: ci pipeline router coverage failure (BerriAI#20065) * fix: working claude code with agent SDKs (BerriAI#20081) * [Feat] Add async_post_call_response_headers_hook to CustomLogger (BerriAI#20083) * Add async_post_call_response_headers_hook to CustomLogger (BerriAI#20070) Allow CustomLogger callbacks to inject custom HTTP response headers into streaming, non-streaming, and failure responses via a new async_post_call_response_headers_hook method. * async_post_call_response_headers_hook --------- Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> * Add WATSONX_ZENAPIKEY * fix(proxy): resolve high CPU when router_settings in DB by avoiding REGISTRY.collect() in PrometheusServicesLogger (BerriAI#20087) * v0 - looks decen view * refactored code * fix ui * fixes ui * complete v2 viewer * fix drawer * Revert logs view commits to recreate with clean history (BerriAI#20090) This reverts commits: - 437e9e2 fix drawer - 61bb51d complete v2 viewer - 2014bcf fixes ui - 5f07635 fix ui - f07ef8a refactored code - 8b7a925 v0 - looks decen view Will create a new clean PR with the original changes. * update image and bounded logo in navbar * refactoring user dropdown * new utils * address feedback * [Feat] v2 - Logs view with side panel and improved UX (BerriAI#20091) * init: azure_ai/azure-model-router * show additional_costs in CostBreakdown * UI show cost breakdown fields * feat: dedicated cost calc for azure ai * test_azure_ai_model_router * docs azure model router * test azure model router * fix transfrom * Add transform file * fix:feat: route to config * v0 - looks decen view * refactored code * fix ui * fixes ui * complete v2 viewer * address feedback * address feedback * Delete resource modal dark mode * [Feat] UI - New View to render "Tools" on Logs View (BerriAI#20093) * v1 - tool viewer in logs page * add preview for tool sections * ui fixes * new tool view * Refactor: Address code review feedback - use Antd components Changes: - Use Antd Space component instead of manual flex layouts - Use Antd Text.copyable prop instead of custom clipboard utilities - Extract helper functions to utils.ts for testability - Remove clipboardUtils.ts (replaced with Antd built-in) - Update DrawerHeader, LogDetailsDrawer, and constants Benefits: - Cleaner code using standard Antd patterns - Better testability with separated utils - Consistent UX with Antd's copy tooltips - Reduced custom code maintenance Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> * [Feat] UI - Add Pretty print view of request/response (BerriAI#20096) * v1 - tool viewer in logs page * add preview for tool sections * ui fixes * new tool view * v1 - new pretty view * clean ui * polish fixes * nice view input/output * working i/o cards * fixes for log view --------- Co-authored-by: Warp <agent@warp.dev> * remove md * fixed mcp tools instructions on ui to show comma seprated str instead of list * docs: cleanup docs * litellm_fix: add missing timezone import to proxy_server.py (BerriAI#20121) * fix(proxy): reduce PLR0915 complexity in base_process_llm_request (BerriAI#20127) * litellm_fix(ui): remove unused ToolOutlined import (BerriAI#20129) * litellm_fix(e2e): disable bedrock-converse-claude-sonnet-4.5 model in tests (BerriAI#20131) * litellm_fix(test): fix Azure AI cost calculator test - use Logging class (BerriAI#20134) * litellm_fix(test): fix Bedrock tool search header test regression (BerriAI#20135) * litellm_fix(test): allow comment field in schema and exclude robotics models from tpm check (BerriAI#20139) * litellm_docs: add missing environment variable documentation (BerriAI#20138) * litellm_fix(test): add acancel_batch to Azure SDK client initialization test (BerriAI#20143) * litellm_fix: handle unknown models in Azure AI cost calculator (BerriAI#20150) * litellm_fix(test): fix router silent experiment tests to properly mock async functions (BerriAI#20140) * chore: update Next.js build artifacts (2026-01-31 17:20 UTC, node v22.16.0) * fix(proxy): use get_async_httpx_client for logo download (BerriAI#20155) Replace direct AsyncHTTPHandler instantiation with get_async_httpx_client to avoid +500ms latency per request from creating new async clients. Added httpxSpecialProvider.UI for UI-related HTTP requests like logo downloads. * fix(datadog): check for agent mode before requiring DD_API_KEY/DD_SITE (BerriAI#20156) The DataDog LLM Obs logger was checking for DD_API_KEY and DD_SITE before checking if agent mode (LITELLM_DD_AGENT_HOST) was configured. In agent mode, the DataDog agent handles authentication, so these environment variables are not required. This fix moves the agent mode check first, and only validates DD_API_KEY and DD_SITE when using direct API mode. Fixes test_datadog_llm_obs_agent_configuration and test_datadog_llm_obs_agent_no_api_key_ok * litellm_fix: handle empty dict for web_search_options in Nova grounding (BerriAI#20159) The condition `value and isinstance(value, dict)` fails for empty dicts because `{}` is falsy in Python. Users commonly pass `web_search_options={}` to enable Nova grounding without specifying additional options. Changed the condition to `isinstance(value, dict)` which correctly handles both empty and non-empty dicts. Fixes failing tests: - test_bedrock_nova_grounding_async - test_bedrock_nova_grounding_request_transformation - test_bedrock_nova_grounding_web_search_options_non_streaming - test_bedrock_nova_grounding_with_function_tools * fix(mypy): fix type errors in files, opentelemetry, gemini transformation, and key management (BerriAI#20161) - files/main.py: rename uuid import to uuid_module to avoid conflict with router import - integrations/opentelemetry.py: add fallback for callback_name to ensure str type - llms/gemini/files/transformation.py: add type annotation for params dict - proxy/management_endpoints/key_management_endpoints.py: add null check for prisma_client * litellm_fix(test): update Prometheus metric test assertions with new labels (BerriAI#20162) This fixes the failing litellm_mapped_enterprise_tests (metrics/logging) job. Recent commits added new labels to several Prometheus metrics (model_id, client_ip, user_agent) but the test assertions weren't fully updated to expect these new labels. Tests fixed: - test_async_post_call_failure_hook - test_async_log_failure_event - test_increment_token_metrics - test_log_failure_fallback_event - test_set_latency_metrics - test_set_llm_deployment_success_metrics Labels added to test assertions: - model_id for token metrics (litellm_tokens_metric, litellm_input_tokens_metric, litellm_output_tokens_metric) - model_id for latency metrics (litellm_llm_api_latency_metric) - model_id for remaining requests/tokens metrics - model_id for fallback metrics - model_id for overhead latency metric - client_ip and user_agent for deployment failure/total/success responses - client_ip and user_agent for proxy failed/total requests metrics * test: remove hosted_vllm from OpenAI client tests (BerriAI#20163) hosted_vllm no longer uses the OpenAI client, so these tests that mock the OpenAI client are not applicable to hosted_vllm. Removes hosted_vllm from: - test_openai_compatible_custom_api_base - test_openai_compatible_custom_api_video * litellm_fix: bump litellm-proxy-extras version to 0.4.28 (BerriAI#20166) Changes were made to litellm_proxy_extras (schema.prisma, utils.py, migrations) but version was not bumped, causing CI publish job to fail. This commit bumps the version from 0.4.27 to 0.4.28 in all required files: - litellm-proxy-extras/pyproject.toml - requirements.txt - pyproject.toml Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> * litellm_fix(mypy): fix remaining type errors (BerriAI#20164) - route_llm_request.py: add acancel_batch and afile_delete to route_type Literal - router.py: add SearchToolInfoTypedDict and search_tool_info to SearchToolTypedDict - gemini/files/transformation.py: fix validate_environment signature to match base class - responses transformation.py: fix Dict type annotations to use int instead of Optional[int] - vector_stores/endpoints.py: add team_id and user_id to LiteLLM_ManagedVectorStoresTable constructor Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> * litellm_fix(security): allowlist Next.js CVEs for 7 days (BerriAI#20169) Temporarily allowlist Next.js vulnerabilities in UI dashboard: - GHSA-h25m-26qc-wcjf (HIGH: DoS via request deserialization) - CVE-2025-59471 (MEDIUM: Image Optimizer DoS) Fix: Upgrade to Next.js 15.5.10+ or 16.1.5+ (7-day timeline) Changes: - Added .trivyignore with Next.js CVEs - Updated security_scans.sh to use --ignorefile flag * litellm_fix(router): use safe_deep_copy in _get_silent_experiment_kwargs (BerriAI#20170) **Regression introduced in:** PR BerriAI#19544 (feat: add feature to make silent calls) Fixes check_code_and_doc_quality CI failure. Line 1332 used copy.deepcopy(kwargs) which violates ban_copy_deepcopy_kwargs check. kwargs can contain non-serializable objects like OTEL spans. Changed to safe_deep_copy(kwargs) which handles these correctly. * docs(embeddings): add supported input formats section (BerriAI#20073) Document valid input formats for /v1/embeddings endpoint per OpenAI spec. Clarifies that array of string arrays is not a valid format. * fix proxy extras pip * fix gemini files * fix EventDrivenCacheCoordinator * test_increment_top_level_request_and_spend_metrics * fix typing * fix transform_retrieve_file_response * fix linting * fix mcp linting * _add_web_search_tool * test_bedrock_nova_grounding_web_search_options_non_streaming * add _is_bedrock_tool_block * fix MCP client * fix files * litellm_fix(lint): remove unused ToolNameValidationResult imports (BerriAI#20176) Fixes ruff F401 errors in check_code_and_doc_quality CI job. **Regression introduced in:** 41ec820 (fix files) - added files with unused imports ## Problem ToolNameValidationResult is imported but never used in: - litellm/proxy/_experimental/mcp_server/mcp_server_manager.py - litellm/proxy/management_endpoints/mcp_management_endpoints.py ## Fix ```diff - ToolNameValidationResult, ``` Removed from both import statements. ## Changes - mcp_server_manager.py: -1 line (removed unused import) - mcp_management_endpoints.py: -1 line (removed unused import) * litellm_fix(azure): Fix acancel_batch not using Azure SDK client initialization (BerriAI#20168) - Fixed model parameter being overwritten to None in acancel_batch function - Added dedicated acancel_batch/\_acancel_batch methods in Router - Properly extracts custom_llm_provider from deployment like acreate_batch This fixes test_ensure_initialize_azure_sdk_client_always_used[acancel_batch] which expected azure_batches_instance.initialize_azure_sdk_client to be called. Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> * fix tar security issue with TAR * fix model name during fallback * test_get_image_non_root_uses_var_lib_assets_dir * test_delete_vector_store_checks_access * test_get_session_iterator_thread_safety * Fix health endpoints * _prepare_vertex_auth_headers * test_budget_reset_and_expires_at_first_of_month * fix(test): add router.acancel_batch coverage (BerriAI#20183) - Add test_router_acancel_batch.py with mock test for router.acancel_batch() - Add _acancel_batch to ignored list (internal helper tested via public API) Fixes CI failure in check_code_and_doc_quality job * fix(mypy): fix validate_tool_name return type signatures (BerriAI#20184) Move ToolNameValidationResult class definition outside the fallback function and use consistent return type annotation to satisfy mypy. Files fixed: - proxy/_experimental/mcp_server/mcp_server_manager.py - proxy/management_endpoints/mcp_management_endpoints.py * fix(test): update test_chat_completion to handle metadata in body The proxy now adds metadata to the request body during processing. Updated test to compare fields individually and strip metadata from body comparison. Fixes litellm_proxy_unit_testing_part2 CI failure. * fix(proxy): resolve 'multiple values for keyword argument' in batch cancel and file retrieve - batch_endpoints.py: Pop batch_id from data before creating CancelBatchRequest to avoid duplicate batch_id when data already contains it from earlier cast - files_endpoints.py: Pop file_id from data before calling afile_retrieve to avoid duplicate file_id when data was initialized with {"file_id": file_id} - test_claude_agent_sdk.py: Disable bedrock-nova-premier test as it requires an inference profile for on-demand throughput (AWS limitation) Fixes: e2e_openai_endpoints tests (test_batches_operations, test_file_operations) Fixes: proxy_e2e_anthropic_messages_tests (nova-premier model skip) * ci(security): allowlist GHSA-34x7-hfp2-rc4v (node-tar hardlink) Not applicable - tar CLI not exposed in application code * fix(mypy): add type: ignore for conditional function variants in MCP modules The mypy error 'All conditional function variants have identical signatures' occurs when defining fallback functions in try/except ImportError blocks. Adding '# type: ignore[misc]' suppresses this false positive. Fixes: - mcp_server_manager.py:80 - validate_tool_name fallback - mcp_management_endpoints.py:72 - validate_tool_name fallback * fix: make cache updates synchronous for budget enforcement The budget enforcement was failing in tests because cache updates were fire-and-forget (asyncio.create_task), causing race conditions where subsequent requests would read stale spend data. Changes: 1. proxy_track_cost_callback.py: await update_cache() instead of create_task 2. proxy_server.py: await async_set_cache_pipeline() instead of create_task 3. auth_checks.py: prefer valid_token.team_member_spend (from fresh cache) over team_membership.spend (which may be stale) This ensures budget checks see the most recent spend values and properly enforce budget limits when requests come in quick succession. Fixes: test_users_in_team_budget, test_chat_completion_low_budget * fix(test): accept both AuthenticationError and InternalServerError in batch_completion test (BerriAI#20186) The test uses an invalid API key to verify that batch_completion returns exceptions rather than raising them. However, depending on network conditions, the error may be: - AuthenticationError: API properly rejected the invalid key - InternalServerError: Connection error occurred before API could respond Both are valid outcomes for this test case. Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> * test_embedding fix * fix bedrock-nova-premier * Revert "fix: make cache updates synchronous for budget enforcement" This reverts commit d038341. * fix(test): correct prompt_tokens in test_string_cost_values (BerriAI#20185) The test had prompt_tokens=1000 but the sum of token details was 1150 (text=700 + audio=100 + cached=200 + cache_creation=150). This triggered the double-counting detection logic which recalculated text_tokens to 550, causing the assertion to fail. Fixed by setting prompt_tokens=1150 to match the sum of details. * fix: bedrock-converse-claude-sonnet-4.5 * fix: stabilize CI tests - routes and bedrock config - Add /v1/vector_store/list route for OpenAI API compatibility (fixes test_routes_on_litellm_proxy) - Fix Bedrock Converse API model format (bedrock_converse/ → bedrock/converse/) - Fix Nova Premier inference profile prefix (amazon. → us.amazon.) - Add STABILIZATION_TODO.md to .gitignore Tested locally - all affected tests now pass Co-authored-by: Cursor <cursoragent@cursor.com> * sync: generator client * add LiteLLM_ManagedVectorStoresTable_user_id_idx * docs/blog index page (BerriAI#20188) * docs: add card-based blog index page for mobile navigation Fixes BerriAI#20100 - the blog landing page showed post content directly instead of an index, with no way to navigate between posts on mobile. - Swizzle BlogListPage with card-based grid layout - Featured latest post spans full width with badge - Responsive 2-column grid with orphan handling - Pagination, SEO metadata, accessibility (aria-label, dateTime, heading hierarchy) - Add description frontmatter to existing blog posts * docs: add deterministic fallback colors for unknown blog tags * docs: rename blog heading to The LiteLLM Blog * UI spend logs setting docs * bump extras * fix fake-openai-endpoint * doc fix * fix team budget checks * bump: version 1.81.5 → 1.81.6 * litellm_fix_mapped_tests_core: clear client cache and fix isinstance checks (BerriAI#20196) ## Problem Tests using mocked HTTP clients were hitting real APIs because: 1. HTTP client cache was returning previously cached real clients 2. isinstance checks failed due to module identity issues from sys.path ### Tests affected: - test_send_email_missing_api_key - test_send_email_multiple_recipients (resend & sendgrid) - test_search_uses_registry_credentials - test_vector_store_create_with_simple_provider_name - test_vector_store_create_with_provider_api_type - test_vector_store_create_with_ragflow_provider - test_image_edit_merges_headers_and_extra_headers - test_retrieve_container_basic (container API tests) ## Solution 1. Add clear_client_cache fixture (autouse=True) to clear litellm.in_memory_llm_clients_cache before each test 2. Fix isinstance checks to use type name comparison (avoids module identity issues from sys.path.insert) ## Why not disable_aiohttp_transport The default transport is aiohttp, so tests should work with it. Clearing the cache ensures mocks are used instead of cached real clients. ## Regression PR BerriAI#19829 (commit f95572e) added @respx.mock but cached clients from earlier tests were being reused, bypassing the mocks. Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> * test_chat_completion_low_budget * fix: delete_file * fixes * fix: update test_prometheus to expect masked user_id in metrics The user_id field 'default_user_id' is being masked to '*******_user_id' in prometheus metrics for privacy. Updated test expectations to match the actual behavior. Co-authored-by: Cursor <cursoragent@cursor.com> * docs fix * feat(bedrock): add base cache costs for sonnet v1 (BerriAI#20214) * docs: fix dead links in v1.81.6 release notes (BerriAI#20218) - Fix /docs/search/index -> /docs/search (404 error) - Fix /cookbook/ -> GitHub cookbook URL (404 error) Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> * fix(test): update test_prometheus with masked user_id and missing labels - Update expected user_id from 'default_user_id' to '*******_user_id' (PII masking) - Add missing client_ip, user_agent, model_id labels (from PRs BerriAI#19717, BerriAI#19678) - Update label order to match Prometheus alphabetical sorting Co-authored-by: Cursor <cursoragent@cursor.com> * litellm_fix_mapped_tests_core: fix test isolation and mock injection issues (BerriAI#20209) * litellm_fix_mapped_tests_core: fix test isolation and mock injection issues ## Problem Four tests in litellm_mapped_tests_core were failing: 1. test_register_model_with_scientific_notation - KeyError due to test isolation issues 2. test_search_uses_registry_credentials - Mock not being called due to incorrect patch path 3. test_send_email_missing_api_key - Real API calls despite mocking 4. test_stream_transformation_error_sync - Mock not effective, real API called ## Solution ### test_register_model_with_scientific_notation - Use unique model name to avoid conflicts with other tests - Clear LRU caches before test to prevent stale data - Clean up model_cost entry after test ### test_search_uses_registry_credentials - Use patch.object() on the actual base_llm_http_handler instance - String-based patching for instance methods can fail; direct object patching is more reliable ### test_send_email_missing_api_key - Directly inject mock HTTP client into logger instance - This bypasses any caching issues that could cause the fixture mock to be ineffective ### test_stream_transformation_error_sync - Patch litellm.completion directly instead of the handler module's litellm reference - This ensures the mock is effective regardless of import order ## Regression These tests were affected by LRU caching added in BerriAI#19606 and HTTP client caching. * fix(test): use patch.object for container API tests to fix mock injection ## Problem test_retrieve_container_basic tests were failing because mocks weren't being applied correctly. The tests used string-based patching: patch('litellm.containers.main.base_llm_http_handler') But base_llm_http_handler is imported at module level, so the mock wasn't intercepting the actual handler calls, resulting in real HTTP requests to OpenAI API. ## Solution Use patch.object() to directly mock methods on the imported handler instance. Import base_llm_http_handler in the test file and patch like: patch.object(base_llm_http_handler, 'container_retrieve_handler', ...) This ensures the mock is applied to the actual object being used, regardless of import order or caching. * fix(test): add missing Prometheus metric labels to test_proxy_failure_metrics Add client_ip, user_agent, model_id labels to expected metric patterns. These labels were added in PRs BerriAI#19717 and BerriAI#19678 but test wasn't updated. * fix(test_resend_email): use direct mock injection for all email tests Extend the mock injection pattern used in test_send_email_missing_api_key to all other tests in the file: - test_send_email_success - test_send_email_multiple_recipients Instead of relying on fixture-based patching and respx mocks which can fail due to import order and caching issues, directly inject the mock HTTP client into the logger instance. This ensures mocks are always used regardless of test execution order. * fix(test): use patch.object for image_edit and vector_store tests - test_image_edit_merges_headers_and_extra_headers: import base_llm_http_handler and use patch.object instead of string path patching - test_search_uses_registry_credentials: import module and patch via module.base_llm_http_handler to ensure we patch the right instance --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> * Revert "Merge pull request BerriAI#18790 from BerriAI/litellm_key_team_routing_3" This reverts commit ae26d8e, reversing changes made to 864e8c6. * test_proxy_failure_metrics * test_proxy_success_metrics * fix(test): make test_proxy_failure_metrics resilient to missing proxy-level metrics - Check for both litellm_proxy_failed_requests_metric_total and the deprecated litellm_llm_api_failed_requests_metric_total - The proxy-level failure hook may not always be called depending on where the exception occurs - Simplify total_requests check to only verify key fields Co-authored-by: Cursor <cursoragent@cursor.com> * test fix * docs: Update v1.81.6 release notes - focus on Logs v2 with Tool Call Tracing (BerriAI#20225) - Updated title to highlight Logs v2 feature - Simplified Key Highlights to focus on Logs v2 / tool call tracing - Rewrote Logs v2 description with improved language style - Removed Claude Agents SDK and RAG API from key highlights section - TODO: Add image (logs_v2_tool_tracing.png) Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> * feat: enhance Cohere embedding support with additional parameters and model version * Update Vertex AI Text to Speech doc to show use of audio * feat(logs-ui): show additional client/model logs in JSON view and log model requests * temporarily remove `litellm/proxy/_experimental/out` before merging `ii-main` * chore: update Next.js build artifacts (2026-02-06 10:06 UTC, node v24.13.0) --------- Co-authored-by: jayy-77 <1427jay@gmail.com> Co-authored-by: Neha Prasad <neh6a683@gmail.com> Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com> Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> Co-authored-by: Bernardo Donadio <bcdonadio@bcdonadio.com> Co-authored-by: Christopher Chase <cchase@redhat.com> Co-authored-by: Aaron Yim <aaronchyim@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Takumi Matsuzawa <152503584+genga6@users.noreply.github.com> Co-authored-by: Varun Sripad <varunsripad@Varuns-MacBook-Air.local> Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: Rhys <nghuutho74@gmail.com> Co-authored-by: Alexsander Hamir <alexsanderhamirgomesbaptista@gmail.com> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: ishaan <ishaan@berri.ai> Co-authored-by: Warp <agent@warp.dev> Co-authored-by: shivam <shivam@uni.minerva.edu> Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com> Co-authored-by: Shivam Rawat <161387515+shivamrawat1@users.noreply.github.com> Co-authored-by: shin-bot-litellm <shin-bot-litellm@berri.ai> Co-authored-by: shin-bot-litellm <shin-bot-litellm@users.noreply.github.com> Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com> Co-authored-by: cscguochang-agent <cscguochang@gmail.com> Co-authored-by: amirzaushnizer <amir.z@qodo.ai>

AlexsanderHamir added 2 commits January 26, 2026 17:36

vercel bot deployed to Preview January 27, 2026 01:38 View deployment

AlexsanderHamir merged commit f95572e into main Jan 27, 2026
10 of 32 checks passed

shin-bot-litellm mentioned this pull request Jan 31, 2026

litellm_fix_mapped_tests_core: disable aiohttp transport in tests #20194

Closed

This was referenced Jan 31, 2026

litellm_fix_mapped_tests_core: clear client cache and fix isinstance checks #20195

Closed

litellm_fix_mapped_tests_core: clear client cache and fix isinstance checks #20196

Merged

krrishdholakia mentioned this pull request Feb 5, 2026

fix(mypy): fix type: ignore placement for OTEL LogRecord import (#20351) #20477

Closed

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix broken mocks in 6 flaky tests to prevent real API calls #19829

Fix broken mocks in 6 flaky tests to prevent real API calls #19829
AlexsanderHamir merged 2 commits intomainfrom
litellm_ci_fix_0000010

AlexsanderHamir commented Jan 27, 2026

Uh oh!

vercel bot commented Jan 27, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

AlexsanderHamir commented Jan 27, 2026

Relevant issues

Pre-Submission checklist

CI (LiteLLM team)

Type

Changes

Uh oh!

vercel bot commented Jan 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vercel bot commented Jan 27, 2026 •

edited

Loading