docs: add litellm-enterprise requirement for managed files #19689
Merged
ishaan-jaff merged 1 commit into BerriAI:main on Jan 24, 2026
Conversation
krrishdholakia added a commit that referenced this pull request on Feb 5, 2026
* UI: new build
* redirect to login on expired jwt
* [Feat] UI + Backend - Allow adding policies on Keys/Teams + Viewing on Info panels (#19688)
* ui for policy mgmt
* test_add_guardrails_from_policy_engine_accepts_dynamic_policies_and_pops_from_data
* docs: add litellm-enterprise requirement for managed files (#19689)
* Update Gemini 2.0 Flash deprecation dates to March 31, 2026 (#19592). Google announced that Gemini 2.0 Flash and Flash Lite models will be discontinued on March 31, 2026. Updated the deprecation_date field for all affected model variants across providers (vertex_ai, gemini, deepinfra, openrouter, vercel_ai_gateway). Models updated: gemini-2.0-flash (added deprecation date); gemini-2.0-flash-001 (updated from 2026-02-05); gemini-2.0-flash-lite (added deprecation date); gemini-2.0-flash-lite-001 (updated from 2026-02-25). All variants now correctly reflect the March 31, 2026 shutdown date.
* fixing build
* Fixing failing tests
* deactivating non root tests
* fixing arize tests
* cache tests serial
* fixing circleci config
* fixing circleci config
* Update OSS Adopters section with new table format
* Fixing ruff check
* bump: version 1.81.2 → 1.81.3
* chore: update Next.js build artifacts (2026-01-24 17:18 UTC, node v22.16.0)
* CI/CD fixes - split local testing
* fix: _apply_search_filter_to_models mypy linting
* test_partner_models_httpx_streaming
* test_web_search
* Fix: log duplication when json_logs is enabled (#19705)
* fix: FLAKY tests
* fix unstable tests
* docs fix
* docs fix
* docs fix
* docs fix
* docs fix
* test_get_default_unvicorn_init_args
* fix flaky tests
* test_hanging_request_azure
* test_team_update_sc_2
* BUMP extras
* test fixes
* test fixes
* test_retrieve_container_basic
* Model and Team filtering
* TestBedrockInvokeToolSearch
* fix(presidio): resolve runtime error by handling asyncio loops in background threads (#19714)
* add test case for thread safety
* UI Keys Teams Router Settings docs
* chore: update Next.js build artifacts (2026-01-25 00:27 UTC, node v22.16.0)
* test_stream_transformation_error_sync
* fix patch reliability mock tests
* fix MCP tests
* auto truncation of virtual keys table values
* fix: args issue & refactor into helper function to reduce bloat for both (#19441)
* Fix bulk user add
* fix(proxy): support slashes in google generateContent model names (#19737)
* fix(proxy): support slashes in google route params
* fix(proxy): extract google model ids with slashes
* test(proxy): cover google model ids with slashes
* Fix/non-standard mcp url pattern (#19738)
* fix(mcp): Add standard MCP URL pattern support for OAuth discovery (#17272). OAuth discovery endpoints now support both URL patterns: the standard MCP pattern /mcp/{server_name} (new) and the legacy LiteLLM pattern /{server_name}/mcp (backward compatible). The standard pattern is required by MCP-compliant clients like mcp-inspector and VSCode Copilot, which expect resource URLs following the /mcp/{server_name} convention per RFC 9728. Changes: add _build_oauth_protected_resource_response() helper; add oauth_protected_resource_mcp_standard() endpoint; add oauth_authorization_server_mcp_standard() endpoint; keep legacy endpoints for backward compatibility; add tests for both URL patterns. Fixes #17272
* Test was relocated
* refactor(mcp): Extract helper methods from run_with_session to fix PLR0915. Split the large run_with_session method (55 statements) into smaller helper methods to satisfy ruff's PLR0915 rule (max 50 statements): _create_transport_context() creates the transport based on type; _execute_session_operation() handles the session lifecycle. Also changed cleanup exception handling from Exception to BaseException to properly catch asyncio.CancelledError (a BaseException subclass in Python 3.8+).
* test(mcp): Fix flaky test by mocking health_check_server. The test_mcp_server_manager_config_integration_with_database test was making real network calls to fake URLs, which caused timeouts and CancelledError exceptions. Fixed by mocking health_check_server to return a proper LiteLLM_MCPServerTable object instead of making network calls.
* test(mcp): Fix skip condition to properly detect claude model names. The skip condition for missing API keys was checking for "anthropic" in the model name, but the test uses "claude-haiku-4-5", which doesn't match. Updated to check for both "anthropic" and "claude" model patterns. Also added a skip condition for OpenAI models when OPENAI_API_KEY is not set.
* add callbacks and labels to prometheus (#19708)
* feat: add clientip and user agent in metrics (#19717)
* fix: lint errors
* Add model id and other req labels
* fix: optimize logo fetching and resolve mcp import blockers (#19719)
* feat: tpm-rpm limit in prometheus metrics (#19725)
* add timeout to onyx guardrail (#19731)
* add tests
* Fix /batches to return encoded ids (from managed objects table)
* fix(proxy): use return value from CustomLogger.async_post_call_success_hook (#19670). Previously the return value was ignored for CustomLogger callbacks, preventing users from modifying responses. Now the return value is captured and used to replace the response (if not None), consistent with CustomGuardrail and streaming iterator hook behavior. Fixes an issue where custom_callbacks could not inject data into LLM responses.
* fix(proxy): also fix async_post_call_streaming_hook to use return value. Previously the streaming hook only used return values that started with "data: " (SSE format). Now any non-None return value is used, consistent with async_post_call_success_hook and streaming iterator hook behavior. Added tests for streaming hook transformation.
* feat(hosted_vllm): support thinking parameter for /v1/messages endpoint. Adds support for the Anthropic-style 'thinking' parameter in hosted_vllm, converting it to the OpenAI-style 'reasoning_effort' since vLLM is OpenAI-compatible. This enables users to use the Claude Code CLI with hosted vLLM models like GLM-4.6/4.7 through the /v1/messages endpoint. Mapping (same as the Anthropic adapter): budget_tokens >= 10000 -> "high"; budget_tokens >= 5000 -> "medium"; budget_tokens >= 2000 -> "low"; budget_tokens < 2000 -> "minimal". Fixes #19761
* Fix batch creation to return the input file's expires_at attribute
* bump: version 1.81.3 → 1.81.4 (#19793)
* fix: server root path (#19790)
* refactor: extract transport context creation into separate method (#19794)
* Fix user max budget reset to unlimited. Added a Pydantic validator to convert empty string inputs for max_budget to None, preventing float parsing errors from the frontend. Modified the internal user update logic to explicitly allow max_budget to be None, ensuring the value isn't filtered out and can be reset to unlimited in the database. Added unit tests for validation and logic. Closes #19781
* Make test_get_users_key_count deterministic by creating a dedicated test user (#19795). Create a test user with auto_create_key=False to ensure a known starting state; filter get_users by user_ids to target only the test user; verify the initial key count is 0 before creating a key; clean up the test user after the test completes. This ensures consistent behavior across CI and local environments.
* Add test for Router.get_valid_args, fix router code coverage encoding (#19797). Add test_get_valid_args in test_router_helper_utils.py to cover get_valid_args; use encoding='utf-8' in router_code_coverage.py for cross-platform file reads.
* fix sso email case sensitivity
* Fix test_mcp_server_manager_config_integration_with_database cancellation error (#19801). Mock _create_mcp_client to avoid network calls in health checks. This prevents asyncio.CancelledError when the test teardown closes the event loop while health checks are still pending. The test focuses on conversion logic (access_groups, description), not health check functionality, so mocking the network call is appropriate.
* fix: make HTTPHandler mockable in OIDC secret manager tests (#19803). Add a _get_oidc_http_handler() factory function to make HTTPHandler easily mockable in tests; update test_oidc_github_success to patch the factory function instead of HTTPHandler directly; update the Google OIDC tests for consistency. This fixes the test_oidc_github_success failure where the mock was bypassed, allowing tests to properly mock the HTTPHandler instances used for OIDC token requests.
* fix: patch base_llm_http_handler method directly in container tests. Use patch.object to patch the container_create_handler method directly on the base_llm_http_handler instance instead of patching the module. Fixes the test_provider_support[openai] failure where the mock wasn't applied, and fixes test_error_handling_integration with the same approach. The issue was that patching 'litellm.containers.main.base_llm_http_handler' didn't work because the module imports it with 'from litellm.main import', creating a local reference. Using patch.object patches the method on the actual object instance, which works regardless of import style.
* fix: resolve flaky test_openai_env_base by clearing cache. Add cache clearing at the start of test_openai_env_base to prevent cache pollution, ensuring no cached clients from previous tests interfere with respx mocks. Fixes intermittent failures where the aiohttp transport was used instead of httpx. Test-only change with low risk, no production code modifications. Resolves the flaky test marked with @pytest.mark.flaky(retries=3, delay=1); both parametrized versions (OPENAI_API_BASE and OPENAI_BASE_URL) now pass consistently.
* test: add explicit mock verification in test_provider_support. Capture the mock handler with 'as mock_handler' for explicit validation and add assert_called_once() to verify the mock was actually used, ensuring the test verifies no real API calls are made. Follows the same pattern as the test_openai_env_base validation.
* Add light/dark mode slider for dev
* fix key duration input
* Messages api bedrock converse caching and pdf support (#19785): cache control for user messages and system messages; add cache creation tokens in response; cache controls in tool calls and assistant turns; refactor with _should_preserve_cache_control; add cache control unit tests; use simpler cache creation token count logic; use helper function; remove unused function; fix unit tests
* fixing team member add
* [Feat] enable progress notifications for MCP tool calls (#19809)
* adjust mcp test
* [Feat] CLI Auth - Add configurable CLI JWT expiration via environment variable (#19780): fix: add CLI_JWT_EXPIRATION_HOURS; docs: CLI_JWT_EXPIRATION_HOURS; fix: get_cli_jwt_auth_token; test_get_cli_jwt_auth_token_custom_expiration
* fixing flaky tests around oidc and email
* Add dont ask me again option in nudges
* CI/CD: Increase retries and stabilize litellm_mapped_tests_core (#19826)
* Fix PLR0915: Extract system message handling to reduce statement count
* fix mypy
* fix: add host_progress_callback parameter to mock_call_tool in test. The test_call_tool_without_broken_pipe_error test was failing because the mock function did not accept the host_progress_callback keyword argument that the actual implementation passes to client.call_tool(). Updated the mock to accept this parameter to match the real implementation signature.
* fixing flaky tests around oidc and email
* Add documentation comment to test file
* add retry
* add dependency
* increase retry
* Fix broken mocks in 6 flaky tests to prevent real API calls (#19829). Added network-level HTTP blocking using respx to prevent tests from making real API calls when Python-level mocks fail, making tests more reliable and retryable in CI. Changes: Azure OIDC test - added an Azure Identity SDK mock to prevent real Azure calls; vector store test - added the @respx.mock decorator to block HTTP requests; Resend email tests (3) - added the @respx.mock decorator for all 3 test functions; SendGrid email test - added the @respx.mock decorator. All test assertions and verification logic remain unchanged; only safety nets were added to catch leaked API calls.
* Fix failing OIDC secret manager tests. Fixed two test failures in test_secret_managers_main.py: (1) test_oidc_azure_ad_token_success - corrected the patch path for get_bearer_token_provider from 'litellm.secret_managers.get_azure_ad_token_provider.get_bearer_token_provider' to 'azure.identity.get_bearer_token_provider', since the function is imported from azure.identity; (2) test_oidc_google_success - added a @patch('httpx.Client') decorator to prevent any real HTTP connections during test execution, resolving httpx.ConnectError issues. Both tests now pass successfully.
* Adding tests
* fixing breaking change: just user_id provided should upsert still
* Fix: A2A Python SDK URL
* [Feat] Add UI for /rag/ingest API - upload docs, pdfs etc to create vector stores (#19822): feat: _save_vector_store_to_db_from_rag_ingest; UI features for RAG ingest; fix: Endpoints; ragIngestCall; fix: rag_ingest Code QA CHECK; UI fixes unit tests
* docs(readme): add OpenAI Agents SDK to OSS Adopters (#19820)
* docs(readme): add OpenAI Agents SDK logo
* Fixing tests
* Litellm release notes 01 26 2026 (#19836): docs: document new models/endpoints; docs: cleanup; feat: update model table
* fixing tests
* Litellm release notes 01 26 2026 (#19838): docs: document new models/endpoints; docs: cleanup; feat: update model table; fix: cleanup
* feat: Add model_id label to Prometheus metrics (#18048) (#19678)
* fix(models): set gpt-5.2-codex mode to responses for Azure and OpenRouter (#19770). Fixes #19754. The gpt-5.2-codex model only supports the responses API, not chat completions. Updated the azure/gpt-5.2-codex and openrouter/openai/gpt-5.2-codex entries to use mode: "responses" and supported_endpoints: ["/v1/responses"].
* fix(responses): update local_vars with detected provider (#19782) (#19798). When using the responses API with provider-specific params (aws_*, vertex_*) without explicitly passing custom_llm_provider, the code crashed with: AttributeError: 'NoneType' object has no attribute 'startswith'. Root cause: local_vars was captured via locals() before get_llm_provider() detected the provider from the model string (e.g., "bedrock/..."), so custom_llm_provider remained None when processing provider-specific params. Fix: update local_vars["custom_llm_provider"] after the get_llm_provider() call so the detected provider is available for param processing. Affected provider-specific params: aws_* (aws_region_name, aws_access_key_id, etc.) for Bedrock/SageMaker; vertex_* (vertex_project, vertex_location, etc.) for Vertex AI.
* fix(azure): use generic cost calculator for audio token pricing (#19771). Azure audio models were charging audio output tokens at the text token rate instead of the correct audio token rate, resulting in costs ~6.65x lower than expected. The fix replaces Azure's custom cost calculation logic with the generic cost calculator, which properly handles text, audio, cached, reasoning, and image tokens. Fixes #19764
* fix(xai): correct cached token cost calculation for xAI models (#19772). Fix a double-counting issue where xAI reports text_tokens = prompt_tokens (including cached), causing tokens to be charged twice. Add cache_read_input_token_cost to the xAI grok-3 and grok-3-mini model variants. Detection: when text_tokens + cached_tokens > prompt_tokens, recalculate text_tokens = prompt_tokens - cached_tokens. xAI pricing (25% of input for cached): grok-3 variants $0.75/M cached (input $3/M); grok-3-mini variants $0.075/M cached (input $0.30/M).
* Fix: Support both JSON array format and comma-separated values from user headers
* Translate advanced-tool-use to Bedrock-specific headers for Claude Opus 4.5
* fix: token calculations and refactor (#19696)
* fix(prometheus): safely handle None metadata in logging to prevent AttributeError (#19691)
* fix: lint issues
* fix: resolve 'does not exist' migration errors as applied in setup_database (#19281)
* Fix: timeout exception raised error
* Add sarvam doc
* Add gemini-robotics-er-1.5-preview model in model map
* Add gemini-robotics-er-1.5-preview model documentation
* Fix: Stream the download in chunks
* Add grok reasoning content
* Revert poetry lock
* Fix mypy and code quality issues
* feat: add feature to make silent calls (#19544): add test for silent feat; add docs for silent feat; fix lint issues and UI logs; add docs of ab testing and deep copy
* fix(enterprise): correct error message for DISABLE_ADMIN_ENDPOINTS (#19861). The error message for DISABLE_ADMIN_ENDPOINTS incorrectly said "DISABLING LLM API ENDPOINTS is an Enterprise feature" instead of "DISABLING ADMIN ENDPOINTS is an Enterprise feature", a copy-paste bug from the is_llm_api_route_disabled() function. Added regression tests to verify both error messages are correct.
* fix(proxy): handle agent parameter in /interactions endpoint (#19866)
* initialize tiktoken environment at import time to support offline usage
* fix(bedrock): support tool search header translation for Sonnet 4.5 (#19871). Extend advanced-tool-use header translation to include Claude Sonnet 4.5 in addition to Opus 4.5 on the Bedrock Invoke API. When Claude Code sends the advanced-tool-use-2025-11-20 header, it now gets correctly translated to Bedrock-specific headers for both Claude Opus 4.5 and Claude Sonnet 4.5. Headers translated: tool-search-tool-2025-10-19; tool-examples-2025-10-29. Fixes a defer_loading validation error on Bedrock with Sonnet 4.5. Ref: https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool
* bulk update keys endpoint
* mypy linting
* [Feat] RAG API - Add support for using s3 Vectors as Vector Store Provider for /rag/ingest (#19888): init S3VectorsRAGIngestion as a supported ingestion provider for the RAG API; test: TestRAGS3Vectors; init S3VectorsVectorStoreOptions; init s3 vectors; code clean up + QA; fix: get_credentials; docs: AWS S3 Vectors; add asyncio QA checks; fix: S3_VECTORS_DEFAULT_DIMENSION
* Add native_background_mode to override polling_via_cache for specific models. This follow-up to PR #16862 allows users to specify models that should use the native provider's background mode instead of polling via cache. Config example: litellm_settings: responses: background_mode: polling_via_cache: ["openai"] native_background_mode: ["o4-mini-deep-research"] ttl: 3600. When a model is in the native_background_mode list, should_use_polling_for_request returns False, allowing the request to fall through to native provider handling. Committed-By-Agent: cursor
* [Feat] RAG API - Add s3_vectors as provider on /vector_store/search API + UI for creating + PDF support for /rag/ingest (#19895): init ui for bedrock s3 vectors; fix add /search support for s3_vectors; init atransform_search_vector_store_request; feat: S3VectorsVectorStoreConfig; TestS3VectorsVectorStoreConfig; fix: S3VectorsVectorStoreConfig; add validation for bucket name etc.; fix UI validation for s3 vector store; init extract_text_from_pdf; add pypdf; fix code QA checks; fix navbar; init s3_vector.png
* Add tests for native_background_mode feature. Added 8 new unit tests: test_polling_disabled_when_model_in_native_background_mode; test_polling_disabled_for_native_background_mode_with_provider_list; test_polling_enabled_when_model_not_in_native_background_mode; test_polling_enabled_when_native_background_mode_is_none; test_polling_enabled_when_native_background_mode_is_empty_list; test_native_background_mode_exact_match_required; test_native_background_mode_with_provider_prefix_in_request; test_native_background_mode_with_router_lookup. Committed-By-Agent: cursor
* add sortBy and sortOrder params for /v2/model/info
* ruff check
* Fixing UI tests
* test(proxy): add regression tests for vertex passthrough model names with slashes (#19855). Added test cases for custom model names containing slashes in Vertex AI passthrough URLs (e.g., gcp/google/gemini-2.5-flash). Test cases: gcp/google/gemini-2.5-flash; gcp/google/gemini-3-flash-preview; custom/model
* fix: guardrails issues streaming-response regex (#19901)
* fix: add fix for migration issue and stable linux debian (#19843)
* fix: filter unsupported beta headers for Bedrock Invoke API (#19877). Add whitelist-based filtering for anthropic_beta headers; only allow Bedrock-supported beta flags (computer-use, tool-search, etc.); filter out unsupported flags like mcp-servers and structured-outputs; remove the output_format parameter from Bedrock Invoke requests; force tool-based structured outputs when response_format is used. Fixes #16726
* fix: allow tool_choice for Azure GPT-5 chat models (#19813): don't treat gpt-5-chat as GPT-5 reasoning; mark azure gpt-5-chat as supporting tool_choice; test: cover gpt-5-chat params on azure/openai
* fix: tool with antropic #19800 (#19805)
* All Models Page server side sorting
* Add Init Containers in the community helm chart (#19816)
* docs: fix guardrail logging docs (#19833)
* Fixing build and tests
* inspect BadRequestError after all other policy types (#19878). As indicated by https://docs.litellm.ai/docs/exception_mapping, BadRequestError is used as the base type for multiple exceptions. As such, it should be tested last when handling retry policies. This updates the integration test that validates retry policies work as expected. Fixes #19876
* fix(main): use local tiktoken cache in lazy loading (#19774). The lazy loading implementation for encoding in __getattr__ was calling tiktoken.get_encoding() directly without first setting TIKTOKEN_CACHE_DIR. This caused tiktoken to attempt downloading the encoding file from the internet instead of using the local copy bundled with litellm. This fix uses _get_default_encoding() from _lazy_imports, which properly sets TIKTOKEN_CACHE_DIR before loading tiktoken, ensuring the local cache is used.
* fix(gemini): subtract implicit cached tokens from text_tokens for correct cost calculation (#19775). When Gemini uses implicit caching, it returns cachedContentTokenCount but NOT cacheTokensDetails. Previously, text_tokens was not adjusted in this case, causing costs to be calculated as if all tokens were non-cached. This fix subtracts cachedContentTokenCount from text_tokens when no cacheTokensDetails is present (implicit caching), ensuring correct cost calculation with the reduced cache_read pricing.
* [Feat] UI: Allow Admins to control what pages are visible on LeftNav (#19907): feat: enabled_ui_pages_internal_users; init ui for internal user controls; fix ui settings; fix build; fix leftnav; test fixes; isPageAccessibleToInternalUsers; docs fix; docs ui viz
* Add xai websearch params support
* Allow dynamic setting of store_prompts_in_spend_logs
* Fix: output_tokens_details.reasoning_tokens None
* fix: Pydantic will fail to parse it because cached_tokens is required but not provided
* Spend logs setting modal
* adding tests
* fix(anthropic): remove explicit cache_control null in tool_result content. Fixes an issue where tool_result content blocks include an explicit 'cache_control': null, which breaks some Anthropic API channels. Changes: only include the cache_control field when explicitly set and not None; prevent serialization of null values in tool_result text content; maintain backward compatibility with existing cache_control usage. Related issue: Anthropic tool_result conversion adds explicit null values that cause compatibility issues with certain API implementations.
* Fixing tests
* Add Prompt caching and reasoning support for MiniMax, GLM, Xiaomi
* Fix test_calculate_usage_completion_tokens_details_always_populated and logging object test
* Fix gemini-robotics-er-1.5-preview name
* Fix gemini-robotics-er-1.5-preview name
* Fix team cli auth flow (#19666): clean up code for user cli auth, and make sure not to prompt the user for a team multiple times while polling; adding tests; clean up normalize teams some more
* fix(vertex_ai): support model names with slashes in passthrough URLs (#19944). The regex in get_vertex_model_id_from_url() was using [^/:]+ which stopped at the first slash, truncating model names like 'gcp/google/gemini-2.5-flash' to just 'gcp'. This caused access_groups checks to fail for custom model names. Changed the pattern to [^:]+ to allow slashes in model names, only stopping at the colon before the action (e.g., :generateContent).
* Fix thread leak in OpenTelemetry dynamic header path (#19946)
* UI: New build
* breakdown by team and keys
* Adding test
* Fixing build
* fix pypdf: >=6.6.2
* [Fix] A2a Gateway - Allow supporting old A2a card formats (#19949): fix: LiteLLMA2ACardResolver; feat: .well-known/agent.json; test_card_resolver_fallback_from_new_to_old_path
* Add error_message search in spend logs endpoint
* Adding Error message search to ui spend logs
* fix
* fix(presidio): reuse HTTP connections to prevent OOMs (#19964)
* [Fix] VertexAI Pass through - fix regression that caused vertex ai passthroughs to stop working for router models (#19967). Includes fix(vertex_ai): replace custom model names with actual Vertex AI model names in passthrough URLs (#19948): when the passthrough URL already contains project and location, the code was skipping the deployment lookup and forwarding the URL as-is to Vertex AI; for custom model names like gcp/google/gemini-2.5-flash, Vertex AI returned 404 because it only knows the actual model name (gemini-2.5-flash); the fix makes the deployment lookup always run, so the custom model name gets replaced with the actual Vertex AI model name before forwarding. Also: add _resolve_vertex_model_from_router; fix: get_llm_provider
* Potential fix for code scanning alert no. 4020: Clear-text logging of sensitive information
* Reusable Table Sort Component
* Fixing sorting API calls
* [Release Day] - Fixed CI/CD issues & changed processes (#19902)
* [Feat] - Search API add /list endpoint to list what search tools exist in router (#19969): feat: list all available search tools configured in the router; add debugging search API
* fixing sorting for v2/model/info
* [Feat] LiteLLM Vector Stores - Add permission management for users, teams (#19972): fix: create_vector_store_in_db; add team/user to LiteLLM_ManagedVectorStore; add _check_vector_store_access; add new fields; test_check_vector_store_access; add vector_store/list endpoints; fix code QA checks
* feat: Add new OpenRouter models `xiaomi/mimo-v2-flash`, `z-ai/glm-4.7`, `z-ai/glm-4.7-flash`, and `minimax/minimax-m2.1` to model prices and context window (#19938)
* fix gemini gemini-robotics-er-1.5-preview entry
* removing _experimental out routes from gitignore
* chore: update Next.js build artifacts (2026-01-29 04:12 UTC, node v22.16.0)
* Add custom_llm_provider as gemini translation
* Add test to check if model map is correctly formatted
* Intentional bad model map
* Add Validate model_prices_and_context_window.json job
* Remove validate job from lint
* Intentional bad model map
* Intentional bad model map
* Correct model map path
* Fix: litellm_fix_robotic_model_map_entry
* fix(mypy): fix type: ignore placement for OTEL LogRecord import. The type: ignore[attr-defined] comment was on the import alias line inside parentheses, but mypy reports the error on the `from` line. Collapse to single-line imports so the suppression is on the correct line. Also add no-redef to the fallback branch.

Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com>
Co-authored-by: Yogeshwaran Ravichandran <96047771+yogeshwaran10@users.noreply.github.com>
Co-authored-by: Alexsander Hamir <alexsanderhamirgomesbaptista@gmail.com>
Co-authored-by: Harshit Jain <harshitjain0562@gmail.com>
Co-authored-by: Jay Prajapati <79649559+jayy-77@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Tamir Kiviti <95572081+tamirkiviti13@users.noreply.github.com>
Co-authored-by: Ephrim Stanley <ephrim.stanley@point72.com>
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: Gabriele Michelli <michelligabriele0@gmail.com>
Co-authored-by: Chesars <cesarponce19544@gmail.com>
Co-authored-by: yogeshwaran10 <ywaran646@gmail.com>
Co-authored-by: colinlin-stripe <colinlin@stripe.com>
Co-authored-by: houdataali <84786211+houdataali@users.noreply.github.com>
Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Milan <milan@berri.ai>
Co-authored-by: Xianzong Xie <xianzongxie@stripe.com>
Co-authored-by: Teo Stocco <zifeo@users.noreply.github.com>
Co-authored-by: Pragya Sardana <pragyasardana@gmail.com>
Co-authored-by: Ryan Wilson <84201908+ryewilson@users.noreply.github.com>
Co-authored-by: Brian Caswell <bcaswell@microsoft.com>
Co-authored-by: lizhen <lizhen10763@autohome.com.cn>
Co-authored-by: boarder7395 <37314943+boarder7395@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: rushilchugh01 <58689126+rushilchugh01@users.noreply.github.com>
shriharsha98 added a commit to juspay/litellm that referenced this pull request on Feb 13, 2026
* [Fix] LiteLLM VertexAI Pass through - ensuring incoming headers are forwarded down to target (BerriAI#19524) * test_vertex_passthrough_forwards_anthropic_beta_header * add_incoming_headers * fix linting errors * fix lint * fix: Send litellm_trace_id to Langfuse to link LiteLLM logs with Langfuse logs * test: update langfuse trace_id tests to use litellm_trace_id * Fix virtual keys table sorting * Adding tests * feat: add GMI Cloud provider support (BerriAI#19376) * feat: add GMI Cloud provider support Add GMI Cloud as an OpenAI-compatible provider with: - Provider configuration in providers.json - Documentation page with usage examples - Model pricing for 16 models (Claude, GPT, DeepSeek, Gemini, etc.) - Sidebar entry for docs navigation * Add gmi_cloud to provider_endpoints_support.json Add provider entry to pass CI validation check that ensures all providers in openai_like/providers.json are documented. * Fix provider key: gmi_cloud -> gmi Match the provider key with providers.json --------- Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * Cut chat_completion latency by ~21% by reducing pre-call processing time (BerriAI#19535) * Adding scope to /models * e2e test internal viewer sidebar * Model Select for Create Team * create team model select * fixing build * [Fix] VertexAI Pass through - Ensure only anthropic betas are forwarded down to LLM API (BerriAI#19542) * fix ALLOWED_VERTEX_AI_PASSTHROUGH_HEADERS * test_vertex_passthrough_forwards_anthropic_beta_header * fix test_vertex_passthrough_forwards_anthropic_beta_header * test_vertex_passthrough_does_not_forward_litellm_auth_token * fix utils * Using Anthropic Beta Features on Vertex AI * test_forward_headers_from_request_x_pass_prefix * [Fix] VertexAI Pass through - Ensure only anthropic betas are forwarded down to LLM API (BerriAI#19542) * fix ALLOWED_VERTEX_AI_PASSTHROUGH_HEADERS * test_vertex_passthrough_forwards_anthropic_beta_header * fix test_vertex_passthrough_forwards_anthropic_beta_header 
* test_vertex_passthrough_does_not_forward_litellm_auth_token * fix utils * Using Anthropic Beta Features on Vertex AI * test_forward_headers_from_request_x_pass_prefix * fix(mcp): forward static_headers to MCP servers (BerriAI#19341) (BerriAI#19366) Forward static_headers from /mcp-rest/test/* routes into the MCP client so headers are present during session.initialize() and tool discovery. Also add a shared merge_mcp_headers() helper to keep header precedence consistent and ensure OpenAPI-to-MCP generated tools include static_headers. Tests: - pytest tests/test_litellm/proxy/_experimental/mcp_server/test_rest_endpoints.py - pytest tests/test_litellm/proxy/_experimental/mcp_server/test_mcp_server_manager.py -k register_openapi_tools_includes_static_headers Fixes BerriAI#19341 Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * fix(azure): preserve content_policy_violation details for images (BerriAI#19328) (BerriAI#19372) Azure OpenAI Images (DALL·E 3) returns policy violations as a structured payload under body["error"], including inner_error.content_filter_results and revised_prompt. 
LiteLLM previously: - Failed to extract nested error messages (get_error_message only handled body["message"]) - Missed policy violation detection when error strings were generic - Dropped inner_error details when raising ContentPolicyViolationError This change: - Extracts nested Azure error fields (code/type/message + inner_error) - Detects policy violations via structured error codes - Passes an OpenAI-style error body + provider_specific_fields to preserve details Tests: - python3 -m pytest tests/test_litellm/llms/azure/test_azure_exception_mapping.py - python3 -m pytest tests/test_litellm/litellm_core_utils/test_exception_mapping_utils.py Fixes BerriAI#19328 * [Feat] Add Structured output for /v1/messages with Anthropic API, Azure Anthropic API, Bedrock Converse (BerriAI#19545) * fix: add AnthropicMessagesRequestOptionalParams * add _update_headers_with_anthropic_beta * fix output format tests * test_structured_output_e2e * TestAnthropicAPIStructuredOutput * test_structured_output_e2e * fix BASE * TestAzureAnthropicStructuredOutput * fix: Bedrock Converse * add nthropic Messages Pass-Through Architecture * fix: bedrock invoke output_format * fix: transform_anthropic_messages_request for vertex anthropic * TestBedrockInvokeStructuredOutput * docs anthropic vertex * docs fix * docs fix * fixing prompt-security's guardrail implementation (BerriAI#19374) * Consolidated change * fix(prompt_security): update message processing to persist sanitized files and filter for API calls * fix per krrishdholakia suggestion * Fix/per service ssl override v2 (BerriAI#19538) * refactor(ssl): support per-service SSL verification overrides * add test cases for ssl * docs: update Claude Code integration guides (BerriAI#19415) * docs: document Claude Code default models and env var overrides - Update config example with current Claude Code 2.1.x model names - Add section documenting default models (sonnet/haiku) that Claude Code requests - Document env var overrides 
(ANTHROPIC_DEFAULT_SONNET_MODEL, etc.) - Show how model_name alias can route to any provider (Bedrock, Vertex, etc.) * Update docs Removed warning about changing model names in Claude Code versions. * docs: add 1M context support and improve Claude Code quickstart guide - Add comprehensive 1M context window documentation - Document [1m] suffix usage and shell escaping requirements - Clarify that LiteLLM config should NOT include [1m] in model names - Add standalone claude_code_1m_context.md guide - Improve model selection documentation with environment variables - Add section on default models used by Claude Code v2.1.14 - Add troubleshooting for 1M context issues - Reorganize to emphasize environment variables approach Addresses GitHub issue BerriAI#14444 * docs: reorder model selection options - prioritize --model over env vars - Move command line/session model selection to Option 1 (most reliable) - Move environment variables to Option 2 - Add note that env vars may be cached from previous session - Emphasize that --model always uses exact model specified * docs: reorganize 1M context section - separate command line from env vars - Split 1M context examples into two clear sections - Show command line usage first (--model and /model) - Show environment variables as alternative approach - Improves readability and emphasizes most reliable method * docs: remove misleading default models section from website tutorial - Remove 'Default Models Used by Claude Code' section (misleading) - Remove claim that config must match exact default model names - Update config comment to be more general - Add claude-opus-4-5-20251101 to example config - Keep authentication section as-is * docs: correct model selection in website tutorial - Remove incorrect claim that Claude Code automatically uses proxy models - Add explicit model selection examples with --model and /model - Show environment variables as alternative approach - Remove misleading comment about 'multiple configured' * 
docs: add 1M context section to website tutorial - Add section on using [1m] suffix for 1 million token context - Include warning about shell escaping (quotes required) - Explain how Claude Code handles [1m] internally - Add /context verification command - Note that LiteLLM config should NOT include [1m] * docs: add tip about using .env for API keys - Add note that ANTHROPIC_API_KEY can be stored in .env file - Clarifies alternative to exporting environment variables * add redisvl dependency to the root requiremnts.tx (BerriAI#19417) * [Fix] UI Cost Estimator - Fix model dropdown (BerriAI#19529) * add cost estimator * ui fix show errors * test_estimate_cost_resolves_router_model_alias * fix: UI 404 error when SERVER_ROOT_PATH is set (BerriAI#19467) * fix: add case-insensitive support for guardrail mode and actions (BerriAI#19480) * fix(bedrock): correct streaming choice index for tool calls (BerriAI#19506) Bedrock's contentBlockIndex identifies content blocks within a message (text=0, tool_call=1), not OpenAI's choice index (which varies with n>1). This caused OpenAI SDK's ChatCompletionAccumulator to fail when tool call chunks arrived on index 1 while finish_reason arrived on index 0. Bedrock doesn't support n>1 (no such parameter exists): https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InferenceConfiguration.html OpenAI choice index spec: https://platform.openai.com/docs/api-reference/chat/streaming * Fix Azure RPM calculation formula (BerriAI#19513) * Fix Azure RPM calculation formula * updated test * fix(azure response api): flatten tools for responses api to support nested definitions (BerriAI#19526) The Azure Responses API uses a different schema (flattened) for tools compared to the standard OpenAI/Azure Chat Completions API (nested). This caused a `BadRequestError` when users passed standard tool definitions. Changes: - Implemented tool flattening logic in `AzureOpenAIResponsesAPIConfig.transform_responses_api_request`. 
- Added comprehensive unit tests in test_azure_transformation.py to verify nested-to-flat transformation, pass-through of flat tools, and immutability. - Ensures cross-provider compatibility for tool definitions. Fixes BerriAI#19523 * Fix date overflow/division by zero in proxy utils (BerriAI#19527) * Fix date overflow/division by zero in proxy utils * Fix projected spend calculation * Strengthen projected spend tests * Fix Azure AI costs for Anthropic models (BerriAI#19530) * Fix Azure AI cost calculation * fixup * feat: Add MCP tools response to chat completions * feat: display mcp output on the play ground * Fix: generation config empty for batch * Add custom vertex ai mapping to the output * Add support for output formatfor bedrock invoke via v1/messages * feat: Limit stop sequence as per openai spec * Fix mypy error in litellm_staging_01_21_2026 * Fix: imagegeneration@006 has been deprecated * Fix : test_anthropic_via_responses_api * Fix: Responses API usage field type mismatch * Fix: Httpx timeout test failures * Fix: generationConfig removal from tests * fix: mypy error * comment code not used * feat: Add MCP tools response to chat completions * feat: display mcp output on the play ground * Fix batch tests * fix: mypy error * fix: mypy error * Fix:test_multiple_function_call * build(deps): bump lodash from 4.17.21 to 4.17.23 in /docs/my-website Bumps [lodash](https://github.com/lodash/lodash) from 4.17.21 to 4.17.23. - [Release notes](https://github.com/lodash/lodash/releases) - [Commits](lodash/lodash@4.17.21...4.17.23) --- updated-dependencies: - dependency-name: lodash dependency-version: 4.17.23 dependency-type: indirect ... 
Signed-off-by: dependabot[bot] <support@github.com> * Metrics prometheus user team count (BerriAI#19520) * add user count and team count prometheus metrics * rebase * revert mistaken deletion * fix ui build and mypy lint * Adding python3-dev to non root * adding node-tar cve allowlist * fix(websearch_interception): filter internal kwargs before follow-up request (BerriAI#19577) The websearch interception handler was passing internal flags like `_websearch_interception_converted_stream` to the follow-up LLM request. This caused "Extra inputs are not permitted" errors from providers like Bedrock that use strict Pydantic validation. Fix: Filter out all kwargs starting with `_websearch_interception` prefix before making the follow-up anthropic_messages.acreate() call. * skip brave tests * Fix unsafe access to request attribute (BerriAI#19573) * updating promethus tests * Fix non-root proxy tests * Adding lodash-es to allowlist * attempt fix translation tests * fix: change oss staging branch name to reflect they're oss * Revert "[Infra] UI - E2E Tests: Internal Viewer Sidebar" * Overriding lodash-es with version 4.17.23 in docs * updating lodash for dashboard * bump: version 1.81.1 → 1.81.2 * Add reusable model select to update organization page * Fixing tests * Adding EOS to finish reasons * Adding retries to flaky tests * add opencode tutorial (BerriAI#19602) * Fix org all proxy model case * adjust opencode tutorial (BerriAI#19605) * Add OSS Adopters section to README * fix: completions mcp output ordering * feat(helm): Enable PreStop hook configuration in values.yaml (BerriAI#19613) * Fix: litellm/tests/test_proxy_server_non_root.py * Update README.md * Update README.md * [Feat] New LiteLLM Policy engine - create policies to manage guardrails, conditions - permissions per Key, Team (BerriAI#19612) * init PolicyMatcher * TestPolicyMatcherGetMatchingPolicies * TestPolicyMatcherGetMatchingPolicies * feat: init PolicyResolver * init resolver types * 
init policy from config * inint PolicyValidator * validate policy * init Architecture Diagram * test_add_guardrails_from_policy_engine * init _init_policy_engine * test updates * test fixws * new attachment config * simplify types * TestPolicyResolverInheritance * fix policy resolver * fix policies * fix applied policy * docs fix * docs fix * fix linting + QA checks * fix linting + QA fixes * test fixes * docs fix * fix: pass through endpoints update registry (BerriAI#19420) * fix: pass through endpoints update registry * add test case, fix lint error and comment to avoid confusion * fix pass through endpoints test case * [Fix] Anthropic models on Azure AI cache pricing (BerriAI#19532) (BerriAI#19614) * Update README.md * fix: for test * All Models Backend Search * adding test * test: completions mcp output test * chore: fix lint error * test: Skip anthropic model test when ANTHROPIC_API_KEY is not set * fix: include tool arguments in proxy_server_request for spend logs callbacks * feat: hashicorp vault rotate support * Add tool choice mapping for giga chat * Fix: Responses API logging error for StopIteration * Fix: test_nova_invoke_streaming_chunk_parsing * Remove f string * fix BerriAI#19620: SSO user roles are not updated for existing users (BerriAI#19621) * Fix: SSO user roles are not updated for existing users Fixes BerriAI#19620 * Refactor: Remove redundant user_info retrieval in SSOAuthenticationHandler * Test: add new tests for user creation and updates in get_user_info_from_db * ci cd fixes - linting security * resetting poetry and requirements * fixing security checks * docs fix * fixing config * skipping flaky tests * skipping non root tests entirely * security scan * attempt fix flaky tests * fixing flaky tests * [Feat] Guardrail Policy Management - Allow using UI to manage guardrail policies (BerriAI#19668) * init UI * init schema.prisma * fix: policy_crud_router * UI fixes * update gitignore * working v0 for policy mgmt * fix: endpoints to resolve 
guardrails * fix code QA checks * ui build issues * schema fixes * fix checks * docs fix * remove imports from functions * add schema.prisma * add migrtion * fix schema.prisma * remove imports from functions * fix lint * BUMP pyproject * add spend-queue-troubleshooting docs (BerriAI#19659) * add spend-queue-troubleshooting docs * adjust spend-queue-troubleshooting docs * fix linting * New add fallbacks modal * adding tests * Add Langfuse mock mode for testing without API calls (BerriAI#19676) * Add GCS mock mode for testing without API calls (BerriAI#19683) * Adding router settings to create team and key * fixing build * fixing tests * perf: Optimize strip_trailing_slash with O(1) index check (BerriAI#19679) * perf: Optimize strip_trailing_slash with O(1) index check Replace rstrip("/") with direct index check for O(1) performance instead of O(n) string scanning. Results: - strip_trailing_slash: 311ms → 13ms (96% faster) - get_standard_logging_object_payload: 6.11s → 5.80s (5% faster) * Handle multiple trailing slashes in strip_trailing_slash Use rstrip for correctness when URL ends with "//" or more, otherwise use O(1) index check for single trailing slash. * Fixing tests * perf: Optimize use_custom_pricing_for_model with set intersection (BerriAI#19677) * perf: Optimize use_custom_pricing_for_model with set intersection Cache CustomPricingLiteLLMParams.model_fields.keys() as a module-level frozenset and use set intersection to reduce loop iterations from 882k to 90k (only iterating over keys that exist in both sets). Performance improvement: 84% faster (6.3x speedup) - Before: 1.17s total, 65µs per call - After: 0.19s total, 10µs per call * Use .get() for defensive dictionary access * perf: skip pattern_router.route() for non-wildcard models (BerriAI#19664) Check "*" in model before calling pattern_router.route() to avoid unnecessary pattern matching for non-wildcard model configurations. 
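The two micro-optimizations described above (replacing `rstrip("/")` with an index check, and intersecting kwargs with a cached frozenset of pricing fields) can be sketched as follows. The names `strip_trailing_slash`, `CUSTOM_PRICING_FIELDS`, and `uses_custom_pricing` mirror the commit descriptions but are assumptions, not the exact LiteLLM implementation:

```python
def strip_trailing_slash(url: str) -> str:
    """O(1) fast path: inspect the last character instead of scanning with rstrip."""
    if url.endswith("//"):
        # Rare case from the follow-up commit: multiple trailing slashes
        # fall back to rstrip for correctness.
        return url.rstrip("/")
    if url.endswith("/"):
        return url[:-1]
    return url


# Set-intersection idea: cache the custom-pricing field names once as a
# module-level frozenset (field names here are illustrative), then iterate
# only over keys present in BOTH sets instead of looping over every kwarg.
CUSTOM_PRICING_FIELDS = frozenset(
    {"input_cost_per_token", "output_cost_per_token", "input_cost_per_second"}
)


def uses_custom_pricing(litellm_params: dict) -> bool:
    # dict.keys() is a set-like view, so `&` computes the intersection directly.
    for key in CUSTOM_PRICING_FIELDS & litellm_params.keys():
        if litellm_params.get(key) is not None:  # defensive .get() access
            return True
    return False
```

The intersection shrinks the loop from "all request kwargs" to "only kwargs that are also pricing fields", which is where the commit's reported 6.3x speedup comes from.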
* perf: Add LRU caching to get_model_info for faster cost lookups (BerriAI#19606) - Add @lru_cache decorator to get_model_info() and _cached_get_model_info_helper() - Update _invalidate_model_cost_lowercase_map() to clear these caches when model_cost changes - Update test to call cache invalidation after modifying litellm.model_cost Reduces get_model_cost_information from 46% to <1% of request handling time. * UI: new build * redirect to login on expired jwt * [Feat] UI + Backend - Allow adding policies on Keys/Teams + Viewing on Info panels (BerriAI#19688) * ui for policy mgmt * test_add_guardrails_from_policy_engine_accepts_dynamic_policies_and_pops_from_data * docs: add litellm-enterprise requirement for managed files (BerriAI#19689) * Update Gemini 2.0 Flash deprecation dates to March 31, 2026 (BerriAI#19592) Google announced that Gemini 2.0 Flash and Flash Lite models will be discontinued on March 31, 2026. Updated deprecation_date field for all affected model variants across different providers (vertex_ai, gemini, deepinfra, openrouter, vercel_ai_gateway). Models updated: - gemini-2.0-flash (added deprecation date) - gemini-2.0-flash-001 (updated from 2026-02-05) - gemini-2.0-flash-lite (added deprecation date) - gemini-2.0-flash-lite-001 (updated from 2026-02-25) All variants now correctly reflect the March 31, 2026 shutdown date. 
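The LRU-caching change above pairs `@lru_cache` with explicit invalidation whenever the cost map mutates; without the `cache_clear()` hook, stale prices would be served after `litellm.model_cost` changes. A minimal sketch under assumed names:

```python
from functools import lru_cache

# Stand-in for litellm.model_cost (illustrative data).
model_cost = {"gpt-4": {"input_cost_per_token": 3e-05}}


@lru_cache(maxsize=None)
def get_model_info(model: str) -> tuple:
    # Expensive lookup; results are memoized until the cache is cleared.
    # Return an immutable tuple so the cached value is hashable and safe to share.
    return tuple(sorted(model_cost.get(model, {}).items()))


def invalidate_model_info_cache() -> None:
    # Must be called whenever model_cost changes, mirroring the commit's
    # _invalidate_model_cost_lowercase_map() update.
    get_model_info.cache_clear()
```

The key design point is that the cache and the invalidation hook must be updated together: any code path that mutates the cost map has to call the invalidator, which is why the commit also updates the test to do so.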
* fixing build * Fixing failing tests * deactivating non root tests * fixing arize tests * cache tests serial * fixing circleci config * fixing circleci config * Update OSS Adopters section with new table format * Fixing ruff check * bump: version 1.81.2 → 1.81.3 * chore: update Next.js build artifacts (2026-01-24 17:18 UTC, node v22.16.0) * CI/CD fixes - split local testing * fix: _apply_search_filter_to_models mypy linting * test_partner_models_httpx_streaming * test_web_search * Fix: log duplication when json_logs is enabled (BerriAI#19705) * fix: FLAKY tests * fix unstable tests * docs fix * docs fix * docs fix * docs fix * docs fix * test_get_default_unvicorn_init_args * fix flaky tests * test_hanging_request_azure * test_team_update_sc_2 * BUMP extras * test fixes * test fixes * test_retrieve_container_basic * Model and Team filtering * TestBedrockInvokeToolSearch * fix(presidio): resolve runtime error by handling asyncio loops in bac… (BerriAI#19714) * fix(presidio): resolve runtime error by handling asyncio loops in background threads * add test case for thread safety * UI Keys Teams Router Settings docs * chore: update Next.js build artifacts (2026-01-25 00:27 UTC, node v22.16.0) * test_stream_transformation_error_sync * fix patch reliability mock tests * fix MCP tests * fix: server rooth path (BerriAI#19790) * feat: tpm-rpm limit in prometheus metrics (BerriAI#19725) Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * fix(proxy): support slashes in google generateContent model names (BerriAI#19737) * fix(proxy): support slashes in google route params * fix(proxy): extract google model ids with slashes * test(proxy): cover google model ids with slashes * fix(vertex_ai): support model names with slashes in passthrough URLs (BerriAI#19944) The regex in get_vertex_model_id_from_url() was using [^/:]+ which stopped at the first slash, truncating model names like 'gcp/google/gemini-2.5-flash' to just 'gcp'. 
This caused access_groups checks to fail for custom model names. Changed the pattern to [^:]+ to allow slashes in model names, only stopping at the colon before the action (e.g., :generateContent). * [Fix] VertexAI Pass through - fix regression that caused vertex ai passthroughs to stop working for router models (BerriAI#19967) * fix(vertex_ai): replace custom model names with actual Vertex AI model names in passthrough URLs (BerriAI#19948) When the passthrough URL already contains project and location, the code was skipping the deployment lookup and forwarding the URL as-is to Vertex AI. For custom model names like gcp/google/gemini-2.5-flash, Vertex AI returned 404 because it only knows the actual model name (gemini-2.5-flash). The fix makes the deployment lookup always run, so the custom model name gets replaced with the actual Vertex AI model name before forwarding. * add _resolve_vertex_model_from_router * fix: get_llm_provider * Potential fix for code scanning alert no. 4020: Clear-text logging of sensitive information Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> --------- Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * [Feat] - Search API add /list endpoint to list what search tools exist in router (BerriAI#19969) * feat: List all available search tools configured in the router. * add debugging search API * add debugging search API * perf(prometheus): parallelize budget metrics, fix caching bug, reduce CPU by ~40% (BerriAI#20544) * fix: revert httpx client caching that caused closed client errors AsyncHTTPHandler.__del__ was closing httpx clients still in use by AsyncOpenAI/AsyncAzureOpenAI due to independent cache lifecycles. Restores standalone httpx client creation for OpenAI/Azure providers. 
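The `[^/:]+` to `[^:]+` regex change described above can be demonstrated in isolation; the URL and the surrounding pattern below are illustrative of the described fix, not copied from `get_vertex_model_id_from_url()`:

```python
import re

URL = (
    "https://us-central1-aiplatform.googleapis.com/v1/projects/p/"
    "locations/us-central1/publishers/google/models/"
    "gcp/google/gemini-2.5-flash:generateContent"
)

# Old pattern: [^/:]+ stops at the first slash, truncating
# 'gcp/google/gemini-2.5-flash' to just 'gcp'.
old_match = re.search(r"/models/([^/:]+)", URL)

# Fixed pattern: [^:]+ allows slashes in the model name, stopping only at the
# colon before the action (e.g., :generateContent).
new_match = re.search(r"/models/([^:]+):", URL)
```

With the fixed pattern, access-group checks see the full custom model name instead of its first path segment.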
* Revert "Merge pull request BerriAI#18790 from BerriAI/litellm_key_team_routing_3" This reverts commit ae26d8e, reversing changes made to 864e8c6. * fix MYPY lint * fixed build errors after merge * least busy debug logs --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com> Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: YutaSaito <36355491+uc4w6c@users.noreply.github.com> Co-authored-by: Yuta Saito <uc4w6c@bma.biglobe.ne.jp> Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com> Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> Co-authored-by: Alexsander Hamir <alexsanderhamirgomesbaptista@gmail.com> Co-authored-by: jay prajapati <79649559+jayy-77@users.noreply.github.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: davida-ps <david.a@prompt.security> Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com> Co-authored-by: houdataali <84786211+houdataali@users.noreply.github.com> Co-authored-by: João Dinis Ferreira <hello@joaof.eu> Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com> Co-authored-by: Yogeshwaran Ravichandran <96047771+yogeshwaran10@users.noreply.github.com> Co-authored-by: Will Chen <willchen90@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Eric Cao <ecao310@gmail.com> Co-authored-by: mpcusack-altos <mcusack@altoslabs.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: John Greek <2006605+jgreek@users.noreply.github.com> Co-authored-by: xqe2011 <gz923553148@gmail.com> Co-authored-by: ryan-crabbe 
<128659760+ryan-crabbe@users.noreply.github.com> Co-authored-by: Harshit Jain <harshitjain0562@gmail.com> Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
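The httpx-client revert in the commit list above illustrates a general pitfall: a wrapper's `__del__` must not close a resource that other objects with independent lifecycles still share. A toy sketch of the hazard (not LiteLLM's actual classes), relying on CPython's prompt refcount-based finalization:

```python
class SharedClient:
    """Stand-in for an httpx client that several SDK objects hold a reference to."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class Handler:
    """Anti-pattern: closes the (possibly shared) client on garbage collection."""

    def __init__(self, client: SharedClient) -> None:
        self.client = client

    def __del__(self) -> None:
        # When this handler is collected, it tears down a client that
        # other objects (e.g. AsyncOpenAI wrappers) may still be using.
        self.client.close()


client = SharedClient()
handler = Handler(client)
del handler  # last reference dropped: __del__ fires under CPython refcounting
# ...and the still-shared client is now closed for every other user.
```

This is why the revert restores standalone client creation per wrapper: tying teardown to one cache's lifecycle closed clients out from under callers that cached them independently.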
shriharsha98 added a commit to juspay/litellm that referenced this pull request on Feb 19, 2026
* Fix virtual keys table sorting * Adding tests * feat: add GMI Cloud provider support (BerriAI#19376) * feat: add GMI Cloud provider support Add GMI Cloud as an OpenAI-compatible provider with: - Provider configuration in providers.json - Documentation page with usage examples - Model pricing for 16 models (Claude, GPT, DeepSeek, Gemini, etc.) - Sidebar entry for docs navigation * Add gmi_cloud to provider_endpoints_support.json Add provider entry to pass CI validation check that ensures all providers in openai_like/providers.json are documented. * Fix provider key: gmi_cloud -> gmi Match the provider key with providers.json --------- Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * Cut chat_completion latency by ~21% by reducing pre-call processing time (BerriAI#19535) * Adding scope to /models * e2e test internal viewer sidebar * Model Select for Create Team * create team model select * fixing build * [Fix] VertexAI Pass through - Ensure only anthropic betas are forwarded down to LLM API (BerriAI#19542) * fix ALLOWED_VERTEX_AI_PASSTHROUGH_HEADERS * test_vertex_passthrough_forwards_anthropic_beta_header * fix test_vertex_passthrough_forwards_anthropic_beta_header * test_vertex_passthrough_does_not_forward_litellm_auth_token * fix utils * Using Anthropic Beta Features on Vertex AI * test_forward_headers_from_request_x_pass_prefix * [Fix] VertexAI Pass through - Ensure only anthropic betas are forwarded down to LLM API (BerriAI#19542) * fix ALLOWED_VERTEX_AI_PASSTHROUGH_HEADERS * test_vertex_passthrough_forwards_anthropic_beta_header * fix test_vertex_passthrough_forwards_anthropic_beta_header * test_vertex_passthrough_does_not_forward_litellm_auth_token * fix utils * Using Anthropic Beta Features on Vertex AI * test_forward_headers_from_request_x_pass_prefix * fix(mcp): forward static_headers to MCP servers (BerriAI#19341) (BerriAI#19366) Forward static_headers from /mcp-rest/test/* routes into the MCP client so headers are present during 
session.initialize() and tool discovery. Also add a shared merge_mcp_headers() helper to keep header precedence consistent and ensure OpenAPI-to-MCP generated tools include static_headers. Tests: - pytest tests/test_litellm/proxy/_experimental/mcp_server/test_rest_endpoints.py - pytest tests/test_litellm/proxy/_experimental/mcp_server/test_mcp_server_manager.py -k register_openapi_tools_includes_static_headers Fixes BerriAI#19341 Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * fix(azure): preserve content_policy_violation details for images (BerriAI#19328) (BerriAI#19372) Azure OpenAI Images (DALL·E 3) returns policy violations as a structured payload under body["error"], including inner_error.content_filter_results and revised_prompt. LiteLLM previously: - Failed to extract nested error messages (get_error_message only handled body["message"]) - Missed policy violation detection when error strings were generic - Dropped inner_error details when raising ContentPolicyViolationError This change: - Extracts nested Azure error fields (code/type/message + inner_error) - Detects policy violations via structured error codes - Passes an OpenAI-style error body + provider_specific_fields to preserve details Tests: - python3 -m pytest tests/test_litellm/llms/azure/test_azure_exception_mapping.py - python3 -m pytest tests/test_litellm/litellm_core_utils/test_exception_mapping_utils.py Fixes BerriAI#19328 * [Feat] Add Structured output for /v1/messages with Anthropic API, Azure Anthropic API, Bedrock Converse (BerriAI#19545) * fix: add AnthropicMessagesRequestOptionalParams * add _update_headers_with_anthropic_beta * fix output format tests * test_structured_output_e2e * TestAnthropicAPIStructuredOutput * test_structured_output_e2e * fix BASE * TestAzureAnthropicStructuredOutput * fix: Bedrock Converse * add nthropic Messages Pass-Through Architecture * fix: bedrock invoke output_format * fix: transform_anthropic_messages_request for vertex anthropic * 
* TestBedrockInvokeStructuredOutput
* docs anthropic vertex
* docs fix
* docs fix
* fixing prompt-security's guardrail implementation (BerriAI#19374)
* Consolidated change
* fix(prompt_security): update message processing to persist sanitized files and filter for API calls
* fix per krrishdholakia suggestion
* Fix/per service ssl override v2 (BerriAI#19538)
* refactor(ssl): support per-service SSL verification overrides
* add test cases for ssl
* docs: update Claude Code integration guides (BerriAI#19415)
* docs: document Claude Code default models and env var overrides - Update config example with current Claude Code 2.1.x model names - Add section documenting default models (sonnet/haiku) that Claude Code requests - Document env var overrides (ANTHROPIC_DEFAULT_SONNET_MODEL, etc.) - Show how model_name alias can route to any provider (Bedrock, Vertex, etc.)
* Update docs: removed warning about changing model names in Claude Code versions
* docs: add 1M context support and improve Claude Code quickstart guide - Add comprehensive 1M context window documentation - Document [1m] suffix usage and shell escaping requirements - Clarify that LiteLLM config should NOT include [1m] in model names - Add standalone claude_code_1m_context.md guide - Improve model selection documentation with environment variables - Add section on default models used by Claude Code v2.1.14 - Add troubleshooting for 1M context issues - Reorganize to emphasize environment variables approach. Addresses GitHub issue BerriAI#14444
* docs: reorder model selection options - prioritize --model over env vars - Move command line/session model selection to Option 1 (most reliable) - Move environment variables to Option 2 - Add note that env vars may be cached from previous session - Emphasize that --model always uses exact model specified
* docs: reorganize 1M context section - separate command line from env vars - Split 1M context examples into two clear sections - Show command line usage first (--model and /model) - Show environment variables as alternative approach - Improves readability and emphasizes most reliable method
* docs: remove misleading default models section from website tutorial - Remove 'Default Models Used by Claude Code' section (misleading) - Remove claim that config must match exact default model names - Update config comment to be more general - Add claude-opus-4-5-20251101 to example config - Keep authentication section as-is
* docs: correct model selection in website tutorial - Remove incorrect claim that Claude Code automatically uses proxy models - Add explicit model selection examples with --model and /model - Show environment variables as alternative approach - Remove misleading comment about 'multiple configured'
* docs: add 1M context section to website tutorial - Add section on using [1m] suffix for 1 million token context - Include warning about shell escaping (quotes required) - Explain how Claude Code handles [1m] internally - Add /context verification command - Note that LiteLLM config should NOT include [1m]
* docs: add tip about using .env for API keys - Add note that ANTHROPIC_API_KEY can be stored in .env file - Clarifies alternative to exporting environment variables
* add redisvl dependency to the root requirements.txt (BerriAI#19417)
* [Fix] UI Cost Estimator - Fix model dropdown (BerriAI#19529)
* add cost estimator
* ui fix show errors
* test_estimate_cost_resolves_router_model_alias
* fix: UI 404 error when SERVER_ROOT_PATH is set (BerriAI#19467)
* fix: add case-insensitive support for guardrail mode and actions (BerriAI#19480)
* fix(bedrock): correct streaming choice index for tool calls (BerriAI#19506). Bedrock's contentBlockIndex identifies content blocks within a message (text=0, tool_call=1), not OpenAI's choice index (which varies with n>1). This caused OpenAI SDK's ChatCompletionAccumulator to fail when tool call chunks arrived on index 1 while finish_reason arrived on index 0. Bedrock doesn't support n>1 (no such parameter exists): https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InferenceConfiguration.html OpenAI choice index spec: https://platform.openai.com/docs/api-reference/chat/streaming
* Fix Azure RPM calculation formula (BerriAI#19513)
* updated test
* fix(azure response api): flatten tools for responses api to support nested definitions (BerriAI#19526). The Azure Responses API uses a different schema (flattened) for tools compared to the standard OpenAI/Azure Chat Completions API (nested). This caused a `BadRequestError` when users passed standard tool definitions. Changes: - Implemented tool flattening logic in `AzureOpenAIResponsesAPIConfig.transform_responses_api_request` - Added comprehensive unit tests in test_azure_transformation.py to verify nested-to-flat transformation, pass-through of flat tools, and immutability - Ensures cross-provider compatibility for tool definitions. Fixes BerriAI#19523
* Fix date overflow/division by zero in proxy utils (BerriAI#19527)
* Fix projected spend calculation
* Strengthen projected spend tests
* Fix Azure AI costs for Anthropic models (BerriAI#19530)
* Fix Azure AI cost calculation
* fixup
* feat: Add MCP tools response to chat completions
* feat: display mcp output on the playground
* Fix: generation config empty for batch
* Add custom vertex ai mapping to the output
* Add support for output format for bedrock invoke via v1/messages
* feat: Limit stop sequence as per openai spec
* Fix mypy error in litellm_staging_01_21_2026
* Fix: imagegeneration@006 has been deprecated
* Fix: test_anthropic_via_responses_api
* Fix: Responses API usage field type mismatch
* Fix: Httpx timeout test failures
* Fix: generationConfig removal from tests
* fix: mypy error
* comment code not used
* feat: Add MCP tools response to chat completions
* feat: display mcp output on the playground
* Fix batch tests
* fix: mypy error
* fix: mypy error
* Fix: test_multiple_function_call
* build(deps): bump lodash from 4.17.21 to 4.17.23 in /docs/my-website. Bumps [lodash](https://github.com/lodash/lodash) from 4.17.21 to 4.17.23. - [Release notes](https://github.com/lodash/lodash/releases) - [Commits](lodash/lodash@4.17.21...4.17.23) --- updated-dependencies: - dependency-name: lodash dependency-version: 4.17.23 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>
* Metrics prometheus user team count (BerriAI#19520)
* add user count and team count prometheus metrics
* rebase
* revert mistaken deletion
* fix ui build and mypy lint
* Adding python3-dev to non root
* adding node-tar cve allowlist
* fix(websearch_interception): filter internal kwargs before follow-up request (BerriAI#19577). The websearch interception handler was passing internal flags like `_websearch_interception_converted_stream` to the follow-up LLM request. This caused "Extra inputs are not permitted" errors from providers like Bedrock that use strict Pydantic validation. Fix: Filter out all kwargs starting with the `_websearch_interception` prefix before making the follow-up anthropic_messages.acreate() call.
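The websearch-interception fix above is a prefix filter over request kwargs. A minimal sketch of that idea, with illustrative kwargs (the function name and sample keys here are not LiteLLM's actual code):

```python
# Drop any internal flags whose key starts with the interception prefix
# before forwarding kwargs to the follow-up LLM request.
INTERNAL_PREFIX = "_websearch_interception"

def filter_internal_kwargs(kwargs: dict) -> dict:
    """Return a copy of kwargs without internal interception flags."""
    return {k: v for k, v in kwargs.items() if not k.startswith(INTERNAL_PREFIX)}

cleaned = filter_internal_kwargs({
    "model": "bedrock/claude",
    "_websearch_interception_converted_stream": True,
})
print(cleaned)  # {'model': 'bedrock/claude'}
```

Filtering by prefix (rather than listing flags individually) means any future internal flag sharing the prefix is stripped automatically, which is why strict-validation providers like Bedrock stop rejecting the follow-up request.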
* skip brave tests
* Fix unsafe access to request attribute (BerriAI#19573)
* updating prometheus tests
* Fix non-root proxy tests
* Adding lodash-es to allowlist
* attempt fix translation tests
* fix: change oss staging branch names to reflect they're oss
* Revert "[Infra] UI - E2E Tests: Internal Viewer Sidebar"
* Overriding lodash-es with version 4.17.23 in docs
* updating lodash for dashboard
* bump: version 1.81.1 → 1.81.2
* Add reusable model select to update organization page
* Fixing tests
* Adding EOS to finish reasons
* Adding retries to flaky tests
* add opencode tutorial (BerriAI#19602)
* Fix org all proxy model case
* adjust opencode tutorial (BerriAI#19605)
* Add OSS Adopters section to README
* fix: completions mcp output ordering
* feat(helm): Enable PreStop hook configuration in values.yaml (BerriAI#19613)
* Fix: litellm/tests/test_proxy_server_non_root.py
* Update README.md
* Update README.md
* [Feat] New LiteLLM Policy engine - create policies to manage guardrails, conditions - permissions per Key, Team (BerriAI#19612)
* init PolicyMatcher
* TestPolicyMatcherGetMatchingPolicies
* TestPolicyMatcherGetMatchingPolicies
* feat: init PolicyResolver
* init resolver types
* init policy from config
* init PolicyValidator
* validate policy
* init Architecture Diagram
* test_add_guardrails_from_policy_engine
* init _init_policy_engine
* test updates
* test fixes
* new attachment config
* simplify types
* TestPolicyResolverInheritance
* fix policy resolver
* fix policies
* fix applied policy
* docs fix
* docs fix
* fix linting + QA checks
* fix linting + QA fixes
* test fixes
* docs fix
* fix: pass through endpoints update registry (BerriAI#19420)
* add test case, fix lint error and comment to avoid confusion
* fix pass through endpoints test case
* [Fix] Anthropic models on Azure AI cache pricing (BerriAI#19532) (BerriAI#19614)
* Update README.md
* fix: for test
* All Models Backend Search
* adding test
* test: completions mcp output test
* chore: fix lint error
* test: Skip anthropic model test when ANTHROPIC_API_KEY is not set
* fix: include tool arguments in proxy_server_request for spend logs callbacks
* feat: hashicorp vault rotate support
* Add tool choice mapping for giga chat
* Fix: Responses API logging error for StopIteration
* Fix: test_nova_invoke_streaming_chunk_parsing
* Remove f string
* fix BerriAI#19620: SSO user roles are not updated for existing users (BerriAI#19621). Fixes BerriAI#19620
* Refactor: Remove redundant user_info retrieval in SSOAuthenticationHandler
* Test: add new tests for user creation and updates in get_user_info_from_db
* ci cd fixes - linting security
* resetting poetry and requirements
* fixing security checks
* docs fix
* fixing config
* skipping flaky tests
* skipping non root tests entirely
* security scan
* attempt fix flaky tests
* fixing flaky tests
* [Feat] Guardrail Policy Management - Allow using UI to manage guardrail policies (BerriAI#19668)
* init UI
* init schema.prisma
* fix: policy_crud_router
* UI fixes
* update gitignore
* working v0 for policy mgmt
* fix: endpoints to resolve guardrails
* fix code QA checks
* ui build issues
* schema fixes
* fix checks
* docs fix
* remove imports from functions
* add schema.prisma
* add migration
* fix schema.prisma
* remove imports from functions
* fix lint
* BUMP pyproject
* add spend-queue-troubleshooting docs (BerriAI#19659)
* adjust spend-queue-troubleshooting docs
* fix linting
* New add fallbacks modal
* adding tests
* Add Langfuse mock mode for testing without API calls (BerriAI#19676)
* Add GCS mock mode for testing without API calls (BerriAI#19683)
* Adding router settings to create team and key
* fixing build
* fixing tests
* perf: Optimize strip_trailing_slash with O(1) index check (BerriAI#19679). Replace rstrip("/") with a direct index check for O(1) performance instead of O(n) string scanning. Results: - strip_trailing_slash: 311ms → 13ms (96% faster) - get_standard_logging_object_payload: 6.11s → 5.80s (5% faster)
* Handle multiple trailing slashes in strip_trailing_slash. Use rstrip for correctness when the URL ends with "//" or more; otherwise use the O(1) index check for a single trailing slash.
* Fixing tests
* perf: Optimize use_custom_pricing_for_model with set intersection (BerriAI#19677). Cache CustomPricingLiteLLMParams.model_fields.keys() as a module-level frozenset and use set intersection to reduce loop iterations from 882k to 90k (only iterating over keys that exist in both sets). Performance improvement: 84% faster (6.3x speedup) - Before: 1.17s total, 65µs per call - After: 0.19s total, 10µs per call
* Use .get() for defensive dictionary access
* perf: skip pattern_router.route() for non-wildcard models (BerriAI#19664). Check "*" in model before calling pattern_router.route() to avoid unnecessary pattern matching for non-wildcard model configurations.
* perf: Add LRU caching to get_model_info for faster cost lookups (BerriAI#19606) - Add @lru_cache decorator to get_model_info() and _cached_get_model_info_helper() - Update _invalidate_model_cost_lowercase_map() to clear these caches when model_cost changes - Update test to call cache invalidation after modifying litellm.model_cost. Reduces get_model_cost_information from 46% to <1% of request handling time.
* UI: new build
* redirect to login on expired jwt
* [Feat] UI + Backend - Allow adding policies on Keys/Teams + Viewing on Info panels (BerriAI#19688)
* ui for policy mgmt
* test_add_guardrails_from_policy_engine_accepts_dynamic_policies_and_pops_from_data
* docs: add litellm-enterprise requirement for managed files (BerriAI#19689)
* Update Gemini 2.0 Flash deprecation dates to March 31, 2026 (BerriAI#19592). Google announced that Gemini 2.0 Flash and Flash Lite models will be discontinued on March 31, 2026. Updated deprecation_date field for all affected model variants across different providers (vertex_ai, gemini, deepinfra, openrouter, vercel_ai_gateway). Models updated: - gemini-2.0-flash (added deprecation date) - gemini-2.0-flash-001 (updated from 2026-02-05) - gemini-2.0-flash-lite (added deprecation date) - gemini-2.0-flash-lite-001 (updated from 2026-02-25). All variants now correctly reflect the March 31, 2026 shutdown date.
* fixing build
* Fixing failing tests
* deactivating non root tests
* fixing arize tests
* cache tests serial
* fixing circleci config
* fixing circleci config
* Update OSS Adopters section with new table format
* Fixing ruff check
* bump: version 1.81.2 → 1.81.3
* chore: update Next.js build artifacts (2026-01-24 17:18 UTC, node v22.16.0)
* CI/CD fixes - split local testing
* fix: _apply_search_filter_to_models mypy linting
* test_partner_models_httpx_streaming
* test_web_search
* Fix: log duplication when json_logs is enabled (BerriAI#19705)
* fix: FLAKY tests
* fix unstable tests
* docs fix
* docs fix
* docs fix
* docs fix
* docs fix
* test_get_default_unvicorn_init_args
* fix flaky tests
* test_hanging_request_azure
* test_team_update_sc_2
* BUMP extras
* test fixes
* test fixes
* test_retrieve_container_basic
* Model and Team filtering
* TestBedrockInvokeToolSearch
* fix(presidio): resolve runtime error by handling asyncio loops in background threads (BerriAI#19714)
* add test case for thread safety
* UI Keys Teams Router Settings docs
* chore: update Next.js build artifacts (2026-01-25 00:27 UTC, node v22.16.0)
* test_stream_transformation_error_sync
* fix patch reliability mock tests
* fix MCP tests
* fix: server root path (BerriAI#19790)
* feat: tpm-rpm limit in prometheus metrics (BerriAI#19725). Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
* fix(proxy): support slashes in google generateContent model names (BerriAI#19737)
* fix(proxy): support slashes in google route params
* fix(proxy): extract google model ids with slashes
* test(proxy): cover google model ids with slashes
* fix(vertex_ai): support model names with slashes in passthrough URLs (BerriAI#19944). The regex in get_vertex_model_id_from_url() was using [^/:]+ which stopped at the first slash, truncating model names like 'gcp/google/gemini-2.5-flash' to just 'gcp'. This caused access_groups checks to fail for custom model names. Changed the pattern to [^:]+ to allow slashes in model names, only stopping at the colon before the action (e.g., :generateContent).
* [Fix] VertexAI Pass through - fix regression that caused vertex ai passthroughs to stop working for router models (BerriAI#19967)
* fix(vertex_ai): replace custom model names with actual Vertex AI model names in passthrough URLs (BerriAI#19948). When the passthrough URL already contains project and location, the code was skipping the deployment lookup and forwarding the URL as-is to Vertex AI. For custom model names like gcp/google/gemini-2.5-flash, Vertex AI returned 404 because it only knows the actual model name (gemini-2.5-flash). The fix makes the deployment lookup always run, so the custom model name gets replaced with the actual Vertex AI model name before forwarding.
* add _resolve_vertex_model_from_router
* fix: get_llm_provider
* Potential fix for code scanning alert no. 4020: Clear-text logging of sensitive information. Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
---------
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* [Feat] - Search API add /list endpoint to list what search tools exist in router (BerriAI#19969)
* feat: List all available search tools configured in the router
* add debugging search API
* add debugging search API
* perf(prometheus): parallelize budget metrics, fix caching bug, reduce CPU by ~40% (BerriAI#20544)
* fix: revert httpx client caching that caused closed client errors. AsyncHTTPHandler.__del__ was closing httpx clients still in use by AsyncOpenAI/AsyncAzureOpenAI due to independent cache lifecycles. Restores standalone httpx client creation for OpenAI/Azure providers.
* Revert "Merge pull request BerriAI#18790 from BerriAI/litellm_key_team_routing_3". This reverts commit ae26d8e, reversing changes made to 864e8c6.
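The httpx client caching revert above describes a general lifetime hazard: a wrapper that closes a shared resource in its `__del__` can close it while another holder is still using it, because the two lifetimes are independent. A minimal, self-contained illustration (plain classes, not LiteLLM or httpx code):

```python
class Client:
    """Stand-in for an httpx client: usable until close() is called."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

class CachingWrapper:
    """Stand-in for a handler that owns a client and closes it on finalization."""
    def __init__(self, client):
        self.client = client

    def __del__(self):
        # Hazard: closes the client even if other objects still hold it.
        self.client.close()

client = Client()
wrapper = CachingWrapper(client)
sdk_handle = client   # a second holder, e.g. an SDK object using the same client
del wrapper           # wrapper finalized -> client closed underneath sdk_handle
print(sdk_handle.closed)  # True: the shared client is now unusable
```

This is why the revert restores standalone client creation per consumer: when each holder owns its client outright, no finalizer can close a connection out from under someone else.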
* fix MYPY lint
* fixed build errors after merge
* added sandbox branch for gcr push (#61)
* jenkins setup for sbx
* build fix
* adding sync/v[0-9] branches for gcr push
* build fix
* least busy debug logs
* Fix: remove x-anthropic-billing block
* added back anthropic envs
* merge fixes
* least busy router changes
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: YutaSaito <36355491+uc4w6c@users.noreply.github.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Alexsander Hamir <alexsanderhamirgomesbaptista@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: jay prajapati <79649559+jayy-77@users.noreply.github.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: davida-ps <david.a@prompt.security>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: houdataali <84786211+houdataali@users.noreply.github.com>
Co-authored-by: João Dinis Ferreira <hello@joaof.eu>
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
Co-authored-by: Yogeshwaran Ravichandran <96047771+yogeshwaran10@users.noreply.github.com>
Co-authored-by: Will Chen <willchen90@gmail.com>
Co-authored-by: Yuta Saito <uc4w6c@bma.biglobe.ne.jp>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Eric Cao <ecao310@gmail.com>
Co-authored-by: mpcusack-altos <mcusack@altoslabs.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: John Greek <2006605+jgreek@users.noreply.github.com>
Co-authored-by: xqe2011 <gz923553148@gmail.com>
Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: Harshit Jain <harshitjain0562@gmail.com>
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: pramodp-dotcom <pramod.p@juspay.in>
shriharsha98 added a commit to juspay/litellm that referenced this pull request Feb 23, 2026
* added sandbox branch for gcr push
* jenkins setup for sbx
* build fix
* adding sync/v[0-9] branches for gcr push
* build fix
* Feature/upgrade to v1.81.3 stable (#63)
* [Fix] LiteLLM VertexAI Pass through - ensuring incoming headers are forwarded down to target (BerriAI#19524)
* test_vertex_passthrough_forwards_anthropic_beta_header
* add_incoming_headers
* fix linting errors
* fix lint
* fix: Send litellm_trace_id to Langfuse to link LiteLLM logs with Langfuse logs
* test: update langfuse trace_id tests to use litellm_trace_id
* Fix virtual keys table sorting
* Adding tests
* feat: add GMI Cloud provider support (BerriAI#19376). Add GMI Cloud as an OpenAI-compatible provider with: - Provider configuration in providers.json - Documentation page with usage examples - Model pricing for 16 models (Claude, GPT, DeepSeek, Gemini, etc.) - Sidebar entry for docs navigation
* Add gmi_cloud to provider_endpoints_support.json. Add provider entry to pass CI validation check that ensures all providers in openai_like/providers.json are documented.
* Fix provider key: gmi_cloud -> gmi. Match the provider key with providers.json
---------
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
* Cut chat_completion latency by ~21% by reducing pre-call processing time (BerriAI#19535)
* Adding scope to /models
* e2e test internal viewer sidebar
* Model Select for Create Team
* create team model select
* fixing build
* [Fix] VertexAI Pass through - Ensure only anthropic betas are forwarded down to LLM API (BerriAI#19542)
* fix ALLOWED_VERTEX_AI_PASSTHROUGH_HEADERS
* test_vertex_passthrough_forwards_anthropic_beta_header
* fix test_vertex_passthrough_forwards_anthropic_beta_header
* test_vertex_passthrough_does_not_forward_litellm_auth_token
* fix utils
* Using Anthropic Beta Features on Vertex AI
* test_forward_headers_from_request_x_pass_prefix
* fix(mcp): forward static_headers to MCP servers (BerriAI#19341) (BerriAI#19366). Forward static_headers from /mcp-rest/test/* routes into the MCP client so headers are present during session.initialize() and tool discovery. Also add a shared merge_mcp_headers() helper to keep header precedence consistent and ensure OpenAPI-to-MCP generated tools include static_headers.
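The shared header-merge helper mentioned above is, at its core, a dict merge with a fixed precedence. A hedged sketch of that idea — the precedence shown (request-scoped headers overriding server-level static headers) is an assumption for illustration, not confirmed by the PR, and the signature is not LiteLLM's actual `merge_mcp_headers()`:

```python
def merge_mcp_headers(static_headers, request_headers):
    """Merge server-level static headers with request-scoped headers.

    Starts from the static headers, then lets request headers win on
    key conflicts. Accepts None for either argument.
    """
    merged = dict(static_headers or {})
    merged.update(request_headers or {})
    return merged
```

Centralizing the merge in one helper is what keeps precedence consistent across the REST test routes, `session.initialize()`, and OpenAPI-generated tools: every call site applies the same rule instead of re-implementing it.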
Tests: - pytest tests/test_litellm/proxy/_experimental/mcp_server/test_rest_endpoints.py - pytest tests/test_litellm/proxy/_experimental/mcp_server/test_mcp_server_manager.py -k register_openapi_tools_includes_static_headers Fixes BerriAI#19341 Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * fix(azure): preserve content_policy_violation details for images (BerriAI#19328) (BerriAI#19372) Azure OpenAI Images (DALL·E 3) returns policy violations as a structured payload under body["error"], including inner_error.content_filter_results and revised_prompt. LiteLLM previously: - Failed to extract nested error messages (get_error_message only handled body["message"]) - Missed policy violation detection when error strings were generic - Dropped inner_error details when raising ContentPolicyViolationError This change: - Extracts nested Azure error fields (code/type/message + inner_error) - Detects policy violations via structured error codes - Passes an OpenAI-style error body + provider_specific_fields to preserve details Tests: - python3 -m pytest tests/test_litellm/llms/azure/test_azure_exception_mapping.py - python3 -m pytest tests/test_litellm/litellm_core_utils/test_exception_mapping_utils.py Fixes BerriAI#19328 * [Feat] Add Structured output for /v1/messages with Anthropic API, Azure Anthropic API, Bedrock Converse (BerriAI#19545) * fix: add AnthropicMessagesRequestOptionalParams * add _update_headers_with_anthropic_beta * fix output format tests * test_structured_output_e2e * TestAnthropicAPIStructuredOutput * test_structured_output_e2e * fix BASE * TestAzureAnthropicStructuredOutput * fix: Bedrock Converse * add nthropic Messages Pass-Through Architecture * fix: bedrock invoke output_format * fix: transform_anthropic_messages_request for vertex anthropic * TestBedrockInvokeStructuredOutput * docs anthropic vertex * docs fix * docs fix * fixing prompt-security's guardrail implementation (BerriAI#19374) * Consolidated change * fix(prompt_security): update 
message processing to persist sanitized files and filter for API calls * fix per krrishdholakia suggestion * Fix/per service ssl override v2 (BerriAI#19538) * refactor(ssl): support per-service SSL verification overrides * add test cases for ssl * docs: update Claude Code integration guides (BerriAI#19415) * docs: document Claude Code default models and env var overrides - Update config example with current Claude Code 2.1.x model names - Add section documenting default models (sonnet/haiku) that Claude Code requests - Document env var overrides (ANTHROPIC_DEFAULT_SONNET_MODEL, etc.) - Show how model_name alias can route to any provider (Bedrock, Vertex, etc.) * Update docs Removed warning about changing model names in Claude Code versions. * docs: add 1M context support and improve Claude Code quickstart guide - Add comprehensive 1M context window documentation - Document [1m] suffix usage and shell escaping requirements - Clarify that LiteLLM config should NOT include [1m] in model names - Add standalone claude_code_1m_context.md guide - Improve model selection documentation with environment variables - Add section on default models used by Claude Code v2.1.14 - Add troubleshooting for 1M context issues - Reorganize to emphasize environment variables approach Addresses GitHub issue BerriAI#14444 * docs: reorder model selection options - prioritize --model over env vars - Move command line/session model selection to Option 1 (most reliable) - Move environment variables to Option 2 - Add note that env vars may be cached from previous session - Emphasize that --model always uses exact model specified * docs: reorganize 1M context section - separate command line from env vars - Split 1M context examples into two clear sections - Show command line usage first (--model and /model) - Show environment variables as alternative approach - Improves readability and emphasizes most reliable method * docs: remove misleading default models section from website tutorial - Remove 
'Default Models Used by Claude Code' section (misleading) - Remove claim that config must match exact default model names - Update config comment to be more general - Add claude-opus-4-5-20251101 to example config - Keep authentication section as-is * docs: correct model selection in website tutorial - Remove incorrect claim that Claude Code automatically uses proxy models - Add explicit model selection examples with --model and /model - Show environment variables as alternative approach - Remove misleading comment about 'multiple configured' * docs: add 1M context section to website tutorial - Add section on using [1m] suffix for 1 million token context - Include warning about shell escaping (quotes required) - Explain how Claude Code handles [1m] internally - Add /context verification command - Note that LiteLLM config should NOT include [1m] * docs: add tip about using .env for API keys - Add note that ANTHROPIC_API_KEY can be stored in .env file - Clarifies alternative to exporting environment variables * add redisvl dependency to the root requiremnts.tx (BerriAI#19417) * [Fix] UI Cost Estimator - Fix model dropdown (BerriAI#19529) * add cost estimator * ui fix show errors * test_estimate_cost_resolves_router_model_alias * fix: UI 404 error when SERVER_ROOT_PATH is set (BerriAI#19467) * fix: add case-insensitive support for guardrail mode and actions (BerriAI#19480) * fix(bedrock): correct streaming choice index for tool calls (BerriAI#19506) Bedrock's contentBlockIndex identifies content blocks within a message (text=0, tool_call=1), not OpenAI's choice index (which varies with n>1). This caused OpenAI SDK's ChatCompletionAccumulator to fail when tool call chunks arrived on index 1 while finish_reason arrived on index 0. 
Bedrock doesn't support n>1 (no such parameter exists): https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InferenceConfiguration.html OpenAI choice index spec: https://platform.openai.com/docs/api-reference/chat/streaming * Fix Azure RPM calculation formula (BerriAI#19513) * Fix Azure RPM calculation formula * updated test * fix(azure response api): flatten tools for responses api to support nested definitions (BerriAI#19526) The Azure Responses API uses a different schema (flattened) for tools compared to the standard OpenAI/Azure Chat Completions API (nested). This caused a `BadRequestError` when users passed standard tool definitions. Changes: - Implemented tool flattening logic in `AzureOpenAIResponsesAPIConfig.transform_responses_api_request`. - Added comprehensive unit tests in test_azure_transformation.py to verify nested-to-flat transformation, pass-through of flat tools, and immutability. - Ensures cross-provider compatibility for tool definitions. Fixes BerriAI#19523 * Fix date overflow/division by zero in proxy utils (BerriAI#19527) * Fix date overflow/division by zero in proxy utils * Fix projected spend calculation * Strengthen projected spend tests * Fix Azure AI costs for Anthropic models (BerriAI#19530) * Fix Azure AI cost calculation * fixup * feat: Add MCP tools response to chat completions * feat: display mcp output on the play ground * Fix: generation config empty for batch * Add custom vertex ai mapping to the output * Add support for output formatfor bedrock invoke via v1/messages * feat: Limit stop sequence as per openai spec * Fix mypy error in litellm_staging_01_21_2026 * Fix: imagegeneration@006 has been deprecated * Fix : test_anthropic_via_responses_api * Fix: Responses API usage field type mismatch * Fix: Httpx timeout test failures * Fix: generationConfig removal from tests * fix: mypy error * comment code not used * feat: Add MCP tools response to chat completions * feat: display mcp output on the play ground * Fix 
batch tests * fix: mypy error * fix: mypy error * Fix:test_multiple_function_call * build(deps): bump lodash from 4.17.21 to 4.17.23 in /docs/my-website Bumps [lodash](https://github.com/lodash/lodash) from 4.17.21 to 4.17.23. - [Release notes](https://github.com/lodash/lodash/releases) - [Commits](lodash/lodash@4.17.21...4.17.23) --- updated-dependencies: - dependency-name: lodash dependency-version: 4.17.23 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> * Metrics prometheus user team count (BerriAI#19520) * add user count and team count prometheus metrics * rebase * revert mistaken deletion * fix ui build and mypy lint * Adding python3-dev to non root * adding node-tar cve allowlist * fix(websearch_interception): filter internal kwargs before follow-up request (BerriAI#19577) The websearch interception handler was passing internal flags like `_websearch_interception_converted_stream` to the follow-up LLM request. This caused "Extra inputs are not permitted" errors from providers like Bedrock that use strict Pydantic validation. Fix: Filter out all kwargs starting with `_websearch_interception` prefix before making the follow-up anthropic_messages.acreate() call. 
* skip brave tests
* Fix unsafe access to request attribute (BerriAI#19573)
* updating prometheus tests
* Fix non-root proxy tests
* Adding lodash-es to allowlist
* attempt fix translation tests
* fix: change oss staging branch name to reflect they're oss
* Revert "[Infra] UI - E2E Tests: Internal Viewer Sidebar"
* Overriding lodash-es with version 4.17.23 in docs
* updating lodash for dashboard
* bump: version 1.81.1 → 1.81.2
* Add reusable model select to update organization page
* Fixing tests
* Adding EOS to finish reasons
* Adding retries to flaky tests
* add opencode tutorial (BerriAI#19602)
* Fix org all proxy model case
* adjust opencode tutorial (BerriAI#19605)
* Add OSS Adopters section to README
* fix: completions mcp output ordering
* feat(helm): Enable PreStop hook configuration in values.yaml (BerriAI#19613)
* Fix: litellm/tests/test_proxy_server_non_root.py
* Update README.md
* Update README.md
* [Feat] New LiteLLM Policy engine - create policies to manage guardrails, conditions - permissions per Key, Team (BerriAI#19612)
* init PolicyMatcher
* TestPolicyMatcherGetMatchingPolicies
* TestPolicyMatcherGetMatchingPolicies
* feat: init PolicyResolver
* init resolver types
* init policy from config
* init PolicyValidator
* validate policy
* init Architecture Diagram
* test_add_guardrails_from_policy_engine
* init _init_policy_engine
* test updates
* test fixes
* new attachment config
* simplify types
* TestPolicyResolverInheritance
* fix policy resolver
* fix policies
* fix applied policy
* docs fix
* docs fix
* fix linting + QA checks
* fix linting + QA fixes
* test fixes
* docs fix
* fix: pass through endpoints update registry (BerriAI#19420)
* fix: pass through endpoints update registry
* add test case, fix lint error and comment to avoid confusion
* fix pass through endpoints test case
* [Fix] Anthropic models on Azure AI cache pricing (BerriAI#19532) (BerriAI#19614)
* Update README.md
* fix: for test
* All Models Backend Search
* adding test
* test: completions mcp output test
* chore: fix lint error
* test: Skip anthropic model test when ANTHROPIC_API_KEY is not set
* fix: include tool arguments in proxy_server_request for spend logs callbacks
* feat: hashicorp vault rotate support
* Add tool choice mapping for giga chat
* Fix: Responses API logging error for StopIteration
* Fix: test_nova_invoke_streaming_chunk_parsing
* Remove f string
* fix BerriAI#19620: SSO user roles are not updated for existing users (BerriAI#19621)
* Fix: SSO user roles are not updated for existing users
  Fixes BerriAI#19620
* Refactor: Remove redundant user_info retrieval in SSOAuthenticationHandler
* Test: add new tests for user creation and updates in get_user_info_from_db
* ci cd fixes - linting security
* resetting poetry and requirements
* fixing security checks
* docs fix
* fixing config
* skipping flaky tests
* skipping non root tests entirely
* security scan
* attempt fix flaky tests
* fixing flaky tests
* [Feat] Guardrail Policy Management - Allow using UI to manage guardrail policies (BerriAI#19668)
* init UI
* init schema.prisma
* fix: policy_crud_router
* UI fixes
* update gitignore
* working v0 for policy mgmt
* fix: endpoints to resolve guardrails
* fix code QA checks
* ui build issues
* schema fixes
* fix checks
* docs fix
* remove imports from functions
* add schema.prisma
* add migration
* fix schema.prisma
* remove imports from functions
* fix lint
* BUMP pyproject
* add spend-queue-troubleshooting docs (BerriAI#19659)
* add spend-queue-troubleshooting docs
* adjust spend-queue-troubleshooting docs
* fix linting
* New add fallbacks modal
* adding tests
* Add Langfuse mock mode for testing without API calls (BerriAI#19676)
* Add GCS mock mode for testing without API calls (BerriAI#19683)
* Adding router settings to create team and key
* fixing build
* fixing tests
* perf: Optimize strip_trailing_slash with O(1) index check (BerriAI#19679)
* perf: Optimize strip_trailing_slash with O(1) index check
  Replace rstrip("/") with direct index check for O(1) performance instead of O(n) string scanning.
  Results:
  - strip_trailing_slash: 311ms → 13ms (96% faster)
  - get_standard_logging_object_payload: 6.11s → 5.80s (5% faster)
* Handle multiple trailing slashes in strip_trailing_slash
  Use rstrip for correctness when URL ends with "//" or more, otherwise use O(1) index check for single trailing slash.
* Fixing tests
* perf: Optimize use_custom_pricing_for_model with set intersection (BerriAI#19677)
* perf: Optimize use_custom_pricing_for_model with set intersection
  Cache CustomPricingLiteLLMParams.model_fields.keys() as a module-level frozenset and use set intersection to reduce loop iterations from 882k to 90k (only iterating over keys that exist in both sets).
  Performance improvement: 84% faster (6.3x speedup)
  - Before: 1.17s total, 65µs per call
  - After: 0.19s total, 10µs per call
* Use .get() for defensive dictionary access
* perf: skip pattern_router.route() for non-wildcard models (BerriAI#19664)
  Check "*" in model before calling pattern_router.route() to avoid unnecessary pattern matching for non-wildcard model configurations.
* perf: Add LRU caching to get_model_info for faster cost lookups (BerriAI#19606)
  - Add @lru_cache decorator to get_model_info() and _cached_get_model_info_helper()
  - Update _invalidate_model_cost_lowercase_map() to clear these caches when model_cost changes
  - Update test to call cache invalidation after modifying litellm.model_cost
  Reduces get_model_cost_information from 46% to <1% of request handling time.
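The strip_trailing_slash optimization above can be sketched as a standalone function (an illustrative sketch, not the actual LiteLLM source): take an O(1) fast path for the common zero- and one-slash cases, and fall back to `rstrip("/")` only when two or more trailing slashes are present.

```python
def strip_trailing_slash(url: str) -> str:
    """Remove trailing slashes; O(1) for the common zero/one-slash cases."""
    if not url.endswith("/"):
        return url  # fast path: nothing to strip
    if not url.endswith("//"):
        return url[:-1]  # O(1) slice for a single trailing slash
    return url.rstrip("/")  # rare case: two or more trailing slashes
```

The point of the design is that `rstrip` scans the string, so reserving it for the rare multi-slash case keeps the hot path constant-time.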
* UI: new build
* redirect to login on expired jwt
* [Feat] UI + Backend - Allow adding policies on Keys/Teams + Viewing on Info panels (BerriAI#19688)
* ui for policy mgmt
* test_add_guardrails_from_policy_engine_accepts_dynamic_policies_and_pops_from_data
* docs: add litellm-enterprise requirement for managed files (BerriAI#19689)
* Update Gemini 2.0 Flash deprecation dates to March 31, 2026 (BerriAI#19592)
  Google announced that Gemini 2.0 Flash and Flash Lite models will be discontinued on March 31, 2026. Updated deprecation_date field for all affected model variants across different providers (vertex_ai, gemini, deepinfra, openrouter, vercel_ai_gateway).
  Models updated:
  - gemini-2.0-flash (added deprecation date)
  - gemini-2.0-flash-001 (updated from 2026-02-05)
  - gemini-2.0-flash-lite (added deprecation date)
  - gemini-2.0-flash-lite-001 (updated from 2026-02-25)
  All variants now correctly reflect the March 31, 2026 shutdown date.
* fixing build
* Fixing failing tests
* deactivating non root tests
* fixing arize tests
* cache tests serial
* fixing circleci config
* fixing circleci config
* Update OSS Adopters section with new table format
* Fixing ruff check
* bump: version 1.81.2 → 1.81.3
* chore: update Next.js build artifacts (2026-01-24 17:18 UTC, node v22.16.0)
* CI/CD fixes - split local testing
* fix: _apply_search_filter_to_models mypy linting
* test_partner_models_httpx_streaming
* test_web_search
* Fix: log duplication when json_logs is enabled (BerriAI#19705)
* fix: FLAKY tests
* fix unstable tests
* docs fix
* docs fix
* docs fix
* docs fix
* docs fix
* test_get_default_unvicorn_init_args
* fix flaky tests
* test_hanging_request_azure
* test_team_update_sc_2
* BUMP extras
* test fixes
* test fixes
* test_retrieve_container_basic
* Model and Team filtering
* TestBedrockInvokeToolSearch
* fix(presidio): resolve runtime error by handling asyncio loops in bac… (BerriAI#19714)
* fix(presidio): resolve runtime error by handling asyncio loops in background threads
* add test case for thread safety
* UI Keys Teams Router Settings docs
* chore: update Next.js build artifacts (2026-01-25 00:27 UTC, node v22.16.0)
* test_stream_transformation_error_sync
* fix patch reliability mock tests
* fix MCP tests
* fix: server root path (BerriAI#19790)
* feat: tpm-rpm limit in prometheus metrics (BerriAI#19725)
  Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
* fix(proxy): support slashes in google generateContent model names (BerriAI#19737)
* fix(proxy): support slashes in google route params
* fix(proxy): extract google model ids with slashes
* test(proxy): cover google model ids with slashes
* fix(vertex_ai): support model names with slashes in passthrough URLs (BerriAI#19944)
  The regex in get_vertex_model_id_from_url() was using [^/:]+ which stopped at the first slash, truncating model names like 'gcp/google/gemini-2.5-flash' to just 'gcp'. This caused access_groups checks to fail for custom model names. Changed the pattern to [^:]+ to allow slashes in model names, only stopping at the colon before the action (e.g., :generateContent).
* [Fix] VertexAI Pass through - fix regression that caused vertex ai passthroughs to stop working for router models (BerriAI#19967)
* fix(vertex_ai): replace custom model names with actual Vertex AI model names in passthrough URLs (BerriAI#19948)
  When the passthrough URL already contains project and location, the code was skipping the deployment lookup and forwarding the URL as-is to Vertex AI. For custom model names like gcp/google/gemini-2.5-flash, Vertex AI returned 404 because it only knows the actual model name (gemini-2.5-flash). The fix makes the deployment lookup always run, so the custom model name gets replaced with the actual Vertex AI model name before forwarding.
* add _resolve_vertex_model_from_router
* fix: get_llm_provider
* Potential fix for code scanning alert no. 4020: Clear-text logging of sensitive information
  Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
---------
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* [Feat] - Search API add /list endpoint to list what search tools exist in router (BerriAI#19969)
* feat: List all available search tools configured in the router.
* add debugging search API
* add debugging search API
* perf(prometheus): parallelize budget metrics, fix caching bug, reduce CPU by ~40% (BerriAI#20544)
* fix: revert httpx client caching that caused closed client errors
  AsyncHTTPHandler.__del__ was closing httpx clients still in use by AsyncOpenAI/AsyncAzureOpenAI due to independent cache lifecycles. Restores standalone httpx client creation for OpenAI/Azure providers.
* Revert "Merge pull request BerriAI#18790 from BerriAI/litellm_key_team_routing_3"
  This reverts commit ae26d8e, reversing changes made to 864e8c6.
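The get_vertex_model_id_from_url() regex change described above (BerriAI#19944) can be illustrated in isolation. The patterns below are simplified stand-ins for the real source: the old character class `[^/:]+` stops at the first slash and truncates the model name, while `[^:]+` stops only at the colon before the action.

```python
import re

# Simplified stand-ins for the pattern change, not the actual LiteLLM regex.
OLD_PATTERN = re.compile(r"/models/([^/:]+)")  # stops at the first "/"
NEW_PATTERN = re.compile(r"/models/([^:]+)")   # stops only at the ":" before the action

url = "/v1/projects/p/locations/l/models/gcp/google/gemini-2.5-flash:generateContent"

old_id = OLD_PATTERN.search(url).group(1)  # truncated to "gcp"
new_id = NEW_PATTERN.search(url).group(1)  # full "gcp/google/gemini-2.5-flash"
```

Greedy matching makes `[^:]+` consume everything up to the colon, which is exactly the slash-containing model name that access_groups checks need.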
* fix MYPY lint
* fixed build errors after merge
* least busy debug logs
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: YutaSaito <36355491+uc4w6c@users.noreply.github.com>
Co-authored-by: Yuta Saito <uc4w6c@bma.biglobe.ne.jp>
Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Alexsander Hamir <alexsanderhamirgomesbaptista@gmail.com>
Co-authored-by: jay prajapati <79649559+jayy-77@users.noreply.github.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: davida-ps <david.a@prompt.security>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: houdataali <84786211+houdataali@users.noreply.github.com>
Co-authored-by: João Dinis Ferreira <hello@joaof.eu>
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
Co-authored-by: Yogeshwaran Ravichandran <96047771+yogeshwaran10@users.noreply.github.com>
Co-authored-by: Will Chen <willchen90@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Eric Cao <ecao310@gmail.com>
Co-authored-by: mpcusack-altos <mcusack@altoslabs.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: John Greek <2006605+jgreek@users.noreply.github.com>
Co-authored-by: xqe2011 <gz923553148@gmail.com>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: Harshit Jain <harshitjain0562@gmail.com>
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: 
Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* Sync/v1.81.3 stable (#67)
* Fix virtual keys table sorting
* Adding tests
* feat: add GMI Cloud provider support (BerriAI#19376)
* feat: add GMI Cloud provider support
  Add GMI Cloud as an OpenAI-compatible provider with:
  - Provider configuration in providers.json
  - Documentation page with usage examples
  - Model pricing for 16 models (Claude, GPT, DeepSeek, Gemini, etc.)
  - Sidebar entry for docs navigation
* Add gmi_cloud to provider_endpoints_support.json
  Add provider entry to pass CI validation check that ensures all providers in openai_like/providers.json are documented.
* Fix provider key: gmi_cloud -> gmi
  Match the provider key with providers.json
---------
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
* Cut chat_completion latency by ~21% by reducing pre-call processing time (BerriAI#19535)
* Adding scope to /models
* e2e test internal viewer sidebar
* Model Select for Create Team
* create team model select
* fixing build
* [Fix] VertexAI Pass through - Ensure only anthropic betas are forwarded down to LLM API (BerriAI#19542)
* fix ALLOWED_VERTEX_AI_PASSTHROUGH_HEADERS
* test_vertex_passthrough_forwards_anthropic_beta_header
* fix test_vertex_passthrough_forwards_anthropic_beta_header
* test_vertex_passthrough_does_not_forward_litellm_auth_token
* fix utils
* Using Anthropic Beta Features on Vertex AI
* test_forward_headers_from_request_x_pass_prefix
* [Fix] VertexAI Pass through - Ensure only anthropic betas are forwarded down to LLM API (BerriAI#19542)
* fix ALLOWED_VERTEX_AI_PASSTHROUGH_HEADERS
* test_vertex_passthrough_forwards_anthropic_beta_header
* fix test_vertex_passthrough_forwards_anthropic_beta_header
* test_vertex_passthrough_does_not_forward_litellm_auth_token
* fix utils
* Using Anthropic Beta Features on Vertex AI
* test_forward_headers_from_request_x_pass_prefix
* fix(mcp): forward static_headers to MCP servers (BerriAI#19341) (BerriAI#19366)
  Forward static_headers from /mcp-rest/test/* routes into the MCP client so headers are present during session.initialize() and tool discovery. Also add a shared merge_mcp_headers() helper to keep header precedence consistent and ensure OpenAPI-to-MCP generated tools include static_headers.
  Tests:
  - pytest tests/test_litellm/proxy/_experimental/mcp_server/test_rest_endpoints.py
  - pytest tests/test_litellm/proxy/_experimental/mcp_server/test_mcp_server_manager.py -k register_openapi_tools_includes_static_headers
  Fixes BerriAI#19341
  Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
* fix(azure): preserve content_policy_violation details for images (BerriAI#19328) (BerriAI#19372)
  Azure OpenAI Images (DALL·E 3) returns policy violations as a structured payload under body["error"], including inner_error.content_filter_results and revised_prompt.
  LiteLLM previously:
  - Failed to extract nested error messages (get_error_message only handled body["message"])
  - Missed policy violation detection when error strings were generic
  - Dropped inner_error details when raising ContentPolicyViolationError
  This change:
  - Extracts nested Azure error fields (code/type/message + inner_error)
  - Detects policy violations via structured error codes
  - Passes an OpenAI-style error body + provider_specific_fields to preserve details
  Tests:
  - python3 -m pytest tests/test_litellm/llms/azure/test_azure_exception_mapping.py
  - python3 -m pytest tests/test_litellm/litellm_core_utils/test_exception_mapping_utils.py
  Fixes BerriAI#19328
* [Feat] Add Structured output for /v1/messages with Anthropic API, Azure Anthropic API, Bedrock Converse (BerriAI#19545)
* fix: add AnthropicMessagesRequestOptionalParams
* add _update_headers_with_anthropic_beta
* fix output format tests
* test_structured_output_e2e
* TestAnthropicAPIStructuredOutput
* test_structured_output_e2e
* fix BASE
* TestAzureAnthropicStructuredOutput
* fix: Bedrock Converse
* add Anthropic Messages Pass-Through Architecture
* fix: bedrock invoke output_format
* fix: transform_anthropic_messages_request for vertex anthropic
* TestBedrockInvokeStructuredOutput
* docs anthropic vertex
* docs fix
* docs fix
* fixing prompt-security's guardrail implementation (BerriAI#19374)
* Consolidated change
* fix(prompt_security): update message processing to persist sanitized files and filter for API calls
* fix per krrishdholakia suggestion
* Fix/per service ssl override v2 (BerriAI#19538)
* refactor(ssl): support per-service SSL verification overrides
* add test cases for ssl
* docs: update Claude Code integration guides (BerriAI#19415)
* docs: document Claude Code default models and env var overrides
  - Update config example with current Claude Code 2.1.x model names
  - Add section documenting default models (sonnet/haiku) that Claude Code requests
  - Document env var overrides (ANTHROPIC_DEFAULT_SONNET_MODEL, etc.)
  - Show how model_name alias can route to any provider (Bedrock, Vertex, etc.)
* Update docs
  Removed warning about changing model names in Claude Code versions.
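The nested-error extraction in the Azure images fix above can be sketched roughly as follows. The function name and exact field handling are illustrative assumptions, not LiteLLM's actual get_error_message helper: prefer the structured `body["error"]["message"]` that Azure returns, surface `inner_error.content_filter_results` when present, and fall back to the flat `body["message"]`.

```python
from typing import Optional


def extract_error_message(body: dict) -> Optional[str]:
    """Hypothetical sketch of nested Azure error extraction."""
    error = body.get("error")
    if isinstance(error, dict):
        message = error.get("message")
        inner = error.get("inner_error") or {}
        filters = inner.get("content_filter_results")
        if message and filters:
            # Keep the content-filter details instead of dropping them.
            return f"{message} (content_filter_results: {filters})"
        if message:
            return message
    # Fallback: the flat shape the old code already handled.
    return body.get("message")
```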
* docs: add 1M context support and improve Claude Code quickstart guide
  - Add comprehensive 1M context window documentation
  - Document [1m] suffix usage and shell escaping requirements
  - Clarify that LiteLLM config should NOT include [1m] in model names
  - Add standalone claude_code_1m_context.md guide
  - Improve model selection documentation with environment variables
  - Add section on default models used by Claude Code v2.1.14
  - Add troubleshooting for 1M context issues
  - Reorganize to emphasize environment variables approach
  Addresses GitHub issue BerriAI#14444
* docs: reorder model selection options - prioritize --model over env vars
  - Move command line/session model selection to Option 1 (most reliable)
  - Move environment variables to Option 2
  - Add note that env vars may be cached from previous session
  - Emphasize that --model always uses exact model specified
* docs: reorganize 1M context section - separate command line from env vars
  - Split 1M context examples into two clear sections
  - Show command line usage first (--model and /model)
  - Show environment variables as alternative approach
  - Improves readability and emphasizes most reliable method
* docs: remove misleading default models section from website tutorial
  - Remove 'Default Models Used by Claude Code' section (misleading)
  - Remove claim that config must match exact default model names
  - Update config comment to be more general
  - Add claude-opus-4-5-20251101 to example config
  - Keep authentication section as-is
* docs: correct model selection in website tutorial
  - Remove incorrect claim that Claude Code automatically uses proxy models
  - Add explicit model selection examples with --model and /model
  - Show environment variables as alternative approach
  - Remove misleading comment about 'multiple configured'
* docs: add 1M context section to website tutorial
  - Add section on using [1m] suffix for 1 million token context
  - Include warning about shell escaping (quotes required)
  - Explain how Claude Code handles [1m] internally
  - Add /context verification command
  - Note that LiteLLM config should NOT include [1m]
* docs: add tip about using .env for API keys
  - Add note that ANTHROPIC_API_KEY can be stored in .env file
  - Clarifies alternative to exporting environment variables
* add redisvl dependency to the root requirements.txt (BerriAI#19417)
* [Fix] UI Cost Estimator - Fix model dropdown (BerriAI#19529)
* add cost estimator
* ui fix show errors
* test_estimate_cost_resolves_router_model_alias
* fix: UI 404 error when SERVER_ROOT_PATH is set (BerriAI#19467)
* fix: add case-insensitive support for guardrail mode and actions (BerriAI#19480)
* fix(bedrock): correct streaming choice index for tool calls (BerriAI#19506)
  Bedrock's contentBlockIndex identifies content blocks within a message (text=0, tool_call=1), not OpenAI's choice index (which varies with n>1). This caused OpenAI SDK's ChatCompletionAccumulator to fail when tool call chunks arrived on index 1 while finish_reason arrived on index 0.
  Bedrock doesn't support n>1 (no such parameter exists): https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InferenceConfiguration.html
  OpenAI choice index spec: https://platform.openai.com/docs/api-reference/chat/streaming
* Fix Azure RPM calculation formula (BerriAI#19513)
* Fix Azure RPM calculation formula
* updated test
* fix(azure response api): flatten tools for responses api to support nested definitions (BerriAI#19526)
  The Azure Responses API uses a different schema (flattened) for tools compared to the standard OpenAI/Azure Chat Completions API (nested). This caused a `BadRequestError` when users passed standard tool definitions.
  Changes:
  - Implemented tool flattening logic in `AzureOpenAIResponsesAPIConfig.transform_responses_api_request`.
  - Added comprehensive unit tests in test_azure_transformation.py to verify nested-to-flat transformation, pass-through of flat tools, and immutability.
  - Ensures cross-provider compatibility for tool definitions.
  Fixes BerriAI#19523
* Fix date overflow/division by zero in proxy utils (BerriAI#19527)
* Fix date overflow/division by zero in proxy utils
* Fix projected spend calculation
* Strengthen projected spend tests
* Fix Azure AI costs for Anthropic models (BerriAI#19530)
* Fix Azure AI cost calculation
* fixup
* feat: Add MCP tools response to chat completions
* feat: display mcp output on the play ground
* Fix: generation config empty for batch
* Add custom vertex ai mapping to the output
* Add support for output format for bedrock invoke via v1/messages
* feat: Limit stop sequence as per openai spec
* Fix mypy error in litellm_staging_01_21_2026
* Fix: imagegeneration@006 has been deprecated
* Fix: test_anthropic_via_responses_api
* Fix: Responses API usage field type mismatch
* Fix: Httpx timeout test failures
* Fix: generationConfig removal from tests
* fix: mypy error
* comment code not used
* feat: Add MCP tools response to chat completions
* feat: display mcp output on the play ground
* Fix batch tests
* fix: mypy error
* fix: mypy error
* Fix:test_multiple_function_call
* build(deps): bump lodash from 4.17.21 to 4.17.23 in /docs/my-website
  Bumps [lodash](https://github.com/lodash/lodash) from 4.17.21 to 4.17.23.
  - [Release notes](https://github.com/lodash/lodash/releases)
  - [Commits](lodash/lodash@4.17.21...4.17.23)
  ---
  updated-dependencies:
  - dependency-name: lodash
    dependency-version: 4.17.23
    dependency-type: indirect
  ...
Signed-off-by: dependabot[bot] <support@github.com>
* Metrics prometheus user team count (BerriAI#19520)
* add user count and team count prometheus metrics
* rebase
* revert mistaken deletion
* fix ui build and mypy lint
* Adding python3-dev to non root
* adding node-tar cve allowlist
* fix(websearch_interception): filter internal kwargs before follow-up request (BerriAI#19577)
  The websearch interception handler was passing internal flags like `_websearch_interception_converted_stream` to the follow-up LLM request. This caused "Extra inputs are not permitted" errors from providers like Bedrock that use strict Pydantic validation.
  Fix: Filter out all kwargs starting with `_websearch_interception` prefix before making the follow-up anthropic_messages.acreate() call.
* added sandbox branch for gcr push (#61)
* added sandbox branch for gcr push
* jenkins setup for sbx
* build fix
* adding sync/v[0-9] branches for gcr push
* build fix
* least busy debug logs
* Fix: remove x-anthropic-billing block
* added backl anthropic envs
* merge fixes
* least busy router changes
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: YutaSaito <36355491+uc4w6c@users.noreply.github.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Cesar Garcia <128240629+Chesars@users.noreply.github.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Alexsander Hamir <alexsanderhamirgomesbaptista@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: jay prajapati <79649559+jayy-77@users.noreply.github.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: davida-ps <david.a@prompt.security>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: houdataali <84786211+houdataali@users.noreply.github.com>
Co-authored-by: João Dinis Ferreira <hello@joaof.eu>
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
Co-authored-by: Yogeshwaran Ravichandran <96047771+yogeshwaran10@users.noreply.github.com>
Co-authored-by: Will Chen <willchen90@gmail.com>
Co-authored-by: Yuta Saito <uc4w6c@bma.biglobe.ne.jp>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Eric Cao <ecao310@gmail.com>
Co-authored-by: mpcusack-altos <mcusack@altoslabs.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: John Greek <2006605+jgreek@users.noreply.github.com>
Co-authored-by: xqe2011 <gz923553148@gmail.com>
Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: Harshit Jain <harshitjain0562@gmail.com>
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: pramodp-dotcom <pramod.p@juspay.in>
Relevant issues
Updated docs to clarify that the litellm-enterprise package is required for managed files and other enterprise features when building from pip.
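The requirement described above can be sketched as a minimal install sequence. The litellm[proxy] extra is an assumption of this sketch, not part of this PR; pin versions as appropriate for your deployment:

```shell
# Minimal sketch: install the LiteLLM proxy plus the enterprise package.
# Without litellm-enterprise, managed files and other enterprise features
# are unavailable in a pip-based (non-Docker) build.
pip install 'litellm[proxy]'
pip install litellm-enterprise
```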
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
📖 Documentation
Changes
docs/my-website/docs/proxy/deploy.md
docs/my-website/docs/proxy/litellm_managed_files.md