Litellm oss staging 02 16 2026 by krrishdholakia · Pull Request #21326 · BerriAI/litellm

krrishdholakia · 2026-02-16T17:20:09Z

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have Added testing in the tests/litellm/ directory, Adding at least 1 test is a hard requirement - see details
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

CI status guideline:

50-55 passing tests: main is stable with minor issues.

45-49 passing tests: acceptable but needs attention

<= 40 passing tests: unstable; be careful with your merges and assess the risk.

Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

…xtFormat)

… in native structured outputs
- Add `definitions` handling alongside `$defs` in schema normalization (older JSON Schema drafts use `definitions` instead of `$defs`)
- Fall back to tool-call approach when `response_format: {type: json_object}` has no explicit schema, since the native API requires one
- Add tests for both cases
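The `definitions` handling can be pictured with a small sketch. It assumes only the top-level `definitions` block needs to move and that references use the `#/definitions/...` form. The names `normalize_defs` and `_rewrite_refs` are hypothetical and are not taken from the LiteLLM code.

```python
import copy
from typing import Any, Dict


def _rewrite_refs(node: Any) -> Any:
    """Recursively point '#/definitions/X' references at '#/$defs/X'."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if (
                key == "$ref"
                and isinstance(value, str)
                and value.startswith("#/definitions/")
            ):
                out[key] = "#/$defs/" + value[len("#/definitions/"):]
            else:
                out[key] = _rewrite_refs(value)
        return out
    if isinstance(node, list):
        return [_rewrite_refs(item) for item in node]
    return node


def normalize_defs(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `schema` whose legacy `definitions` live under `$defs`."""
    result = copy.deepcopy(schema)
    legacy = result.pop("definitions", None)
    if isinstance(legacy, dict):
        merged = dict(legacy)
        merged.update(result.get("$defs", {}))  # explicit $defs entries win
        result["$defs"] = merged
    return _rewrite_refs(result)
```

The input schema is left untouched, so a caller can still pass the original to a code path that expects the older draft.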

…rdrail endpoint connection failures (#21245)
* Generic Guardrails: Add a configurable fallback to handle guardrail endpoint connection failures
* Fix PR comments
* Generic Guardrails: Add the fallback support to litellm.Timeout

#21243)
* fix: preserve metadata for custom callbacks on codex/responses path (#21204)
  - Use metadata or litellm_metadata when calling update_environment_variables in responses/main.py so metadata is not overwritten by None on the bridge path (completion -> responses API).
  - Add tests for metadata in custom callback for codex models and for litellm_metadata in aresponses().
  Co-authored-by: Cursor <cursoragent@cursor.com>
* Update tests/test_litellm/responses/test_metadata_codex_callback.py
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

…21159)
* fixed double counting
* Update litellm/proxy/utils.py
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* reverse prev commit
* Update litellm/proxy/utils.py
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* removed else branch
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

…ompleted contains function_call items (#19745)

When using the Responses API (e.g., Azure gpt-5.1-codex-mini), the response.completed event was always returning finish_reason='stop', even when the response contained function_call items in its output. This caused agents like OpenCode to incorrectly conclude the stream ended without tools to execute, breaking tool/function calling workflows.

The fix inspects the response.output field in the response.completed event to determine the correct finish_reason:
- 'tool_calls' when output contains function_call items
- 'stop' otherwise (text-only responses)

Added tests to verify:
- response.completed with function_call output returns finish_reason='tool_calls'
- response.completed with message-only output returns finish_reason='stop'
- response.completed with empty output returns finish_reason='stop' (backward compat)

Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
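The rule described above can be sketched as a small pure function. This is an illustration under the assumption that each output item is a mapping with a `type` key. The name `finish_reason_for_completed` is hypothetical and is not the function used in transformation.py.

```python
from typing import Any, Mapping, Optional, Sequence


def finish_reason_for_completed(
    output: Optional[Sequence[Mapping[str, Any]]],
) -> str:
    """Derive finish_reason from the output items of a response.completed event."""
    for item in output or []:
        if item.get("type") == "function_call":
            # At least one tool invocation is pending, so the agent must run tools.
            return "tool_calls"
    # Message-only, empty, or missing output keeps the previous behaviour.
    return "stop"
```

Treating a missing or empty output as 'stop' matches the backward-compatibility case listed in the tests.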

vercel · 2026-02-16T17:20:13Z

Project	Deployment	Actions	Updated (UTC)
litellm	Ready	Preview, Comment	Feb 18, 2026 0:09am

greptile-apps · 2026-02-16T17:26:34Z

Greptile Summary

This staging PR bundles four distinct fixes/features:

Fix: Prevent double-counting of litellm_proxy_total_requests_metric — The Prometheus metric was being incremented in both async_log_success_event (for streaming) and async_post_call_success_hook (for all successes), causing streaming requests to be counted twice. The fix consolidates the increment into async_log_success_event only (for both streaming and non-streaming) and makes async_post_call_success_hook a no-op. A complementary fix in _init_litellm_callbacks replaces string callbacks in-place in litellm.callbacks instead of appending initialized instances alongside the original strings, which was another source of duplicate callbacks.
Feature: Generic Guardrail API fail-open fallback — Adds a configurable unreachable_fallback parameter (fail_closed default, fail_open option) to the generic guardrail API. When set to fail_open, network errors (httpx.RequestError, litellm.Timeout) and upstream proxy errors (HTTP 502/503/504) allow the request to proceed with a critical log instead of blocking. Includes documentation and example config updates.
Fix: finish_reason='tool_calls' for response.completed with function calls — When streaming responses from codex models (GPT-5.1-codex), response.completed events that contained function_call items in their output always returned finish_reason='stop'. Agents like OpenCode need finish_reason='tool_calls' to know tools should be executed. The fix inspects the response.completed output items and returns the correct finish reason.
Fix: Preserve metadata for custom callbacks on codex/responses path — When codex models route through the responses API bridge, metadata was passed as litellm_metadata in kwargs but not forwarded to update_environment_variables. The fix falls back to kwargs.get('litellm_metadata') when metadata is None.
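The in-place callback replacement mentioned in the first item can be sketched as follows. This is a simplified illustration, not the body of `_init_litellm_callbacks`. The `factories` mapping and the function name `replace_string_callbacks` are assumptions made for the example.

```python
from typing import Any, Callable, Dict, List


def replace_string_callbacks(
    callbacks: List[Any],
    factories: Dict[str, Callable[[], Any]],
) -> None:
    """Swap recognised string entries for initialised instances at the same index.

    Appending the instance instead would leave both the string and the instance
    in the list, which is the duplication the summary describes.
    """
    for index, entry in enumerate(callbacks):
        if isinstance(entry, str) and entry in factories:
            callbacks[index] = factories[entry]()
        # Unrecognised strings and existing instances are left as they are.
```

Because the list is mutated rather than extended, its length is unchanged after initialisation.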

Confidence Score: 4/5

This PR is safe to merge — each fix is well-scoped with corresponding tests, and the changes align with the existing codebase patterns.
The PR bundles four independent changes, each with targeted tests. The Prometheus double-counting fix and callback deduplication fix work together correctly. The guardrail fail-open feature is well-implemented with proper error categorization. The only minor concerns are stylistic (lost exception chains when re-raising). All test changes are mock-based and verify the intended behavior.
Pay close attention to litellm/proxy/utils.py (callback initialization logic change) and litellm/integrations/prometheus.py (metric increment consolidation) as they affect metric accuracy across the entire proxy.

Important Files Changed

Filename	Overview
litellm/proxy/utils.py	Fixes double-counting of metrics by replacing string callbacks in-place in `litellm.callbacks` instead of appending initialized instances alongside the original strings. No longer calls `add_litellm_callback` for each entry.
litellm/integrations/prometheus.py	Moves `litellm_proxy_total_requests_metric` increment to `async_log_success_event` for ALL requests (not just streaming), and removes it from `async_post_call_success_hook` to prevent double-counting. The hook method is now a no-op `pass`.
litellm/proxy/guardrails/guardrail_hooks/generic_guardrail_api/generic_guardrail_api.py	Adds configurable `unreachable_fallback` (fail_closed/fail_open) for the generic guardrail API. Adds `httpx.HTTPStatusError` (502/503/504), `httpx.RequestError`, and `litellm.Timeout` exception handling with fail-open support. Extracts helper methods `_fail_open_passthrough` and `_build_guardrail_return_inputs`.
litellm/completion_extras/litellm_responses_transformation/transformation.py	Fixes `finish_reason` in `response.completed` streaming event to return 'tool_calls' when the response contains function_call items (instead of always 'stop'). Also includes formatting/whitespace changes.
litellm/responses/main.py	Preserves metadata for custom callbacks when called from completion bridge (codex models) by falling back to `kwargs.get('litellm_metadata')` when `metadata` is None.
litellm/types/guardrails.py	Adds `unreachable_fallback` field to `BaseLitellmParams` with 'fail_closed' default. Also adds it to `LitellmParams` validator.
litellm/types/proxy/guardrails/guardrail_hooks/generic_guardrail_api.py	Adds `unreachable_fallback` field to `GenericGuardrailAPIOptionalParams`. Minor formatting change for `litellm_trace_id`.
tests/litellm/proxy/test_init_litellm_callbacks.py	New test file validating that `_init_litellm_callbacks` replaces string callbacks in-place rather than creating duplicates. Tests cover replacement, no-duplication of instances, unrecognized strings, and multiple callbacks. All mock-based.
tests/test_litellm/responses/test_metadata_codex_callback.py	New test file validating metadata preservation for custom callbacks when using codex models (responses API bridge). Tests use HTTP mocking with `AsyncMock`.
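The metadata fallback listed for litellm/responses/main.py amounts to a one-line choice. The sketch below follows the commit wording ("metadata or litellm_metadata"), which means an empty dict also falls back. Whether the real code uses `or` or an explicit `is None` check is not shown in this PR page, and the helper name `resolve_metadata` is hypothetical.

```python
from typing import Any, Dict, Optional


def resolve_metadata(
    metadata: Optional[Dict[str, Any]],
    kwargs: Dict[str, Any],
) -> Optional[Dict[str, Any]]:
    """Prefer explicit metadata, else fall back to litellm_metadata from kwargs."""
    # On the completion -> responses bridge path, metadata arrives as
    # kwargs["litellm_metadata"], so passing `metadata` alone would forward None.
    return metadata or kwargs.get("litellm_metadata")
```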

Flowchart

flowchart TD
    subgraph Before["Before (Double-Counting Bug)"]
        A1[Successful Request] --> B1[async_log_success_event]
        B1 -->|streaming only| C1[Inc total_requests_metric]
        A1 --> D1[async_post_call_success_hook]
        D1 -->|all requests| E1[Inc total_requests_metric]
        C1 -.- F1[Non-streaming: 1x count ✓]
        E1 -.- F1
        C1 -.- G1[Streaming: 2x count ✗]
        E1 -.- G1
    end

    subgraph After["After (Fix)"]
        A2[Successful Request] --> B2[async_log_success_event]
        B2 -->|all requests| C2[Inc total_requests_metric]
        A2 --> D2[async_post_call_success_hook]
        D2 -->|pass / no-op| E2[No metric increment]
        C2 -.- F2[All requests: 1x count ✓]
    end

    subgraph Guardrail["Generic Guardrail API - Fail Open/Closed"]
        G[Guardrail API Call] -->|Success| H[Process Response]
        G -->|Timeout / RequestError| I{unreachable_fallback?}
        G -->|HTTP 502/503/504| I
        I -->|fail_closed| J[Raise Exception]
        I -->|fail_open| K[Log Critical + Passthrough]
    end
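The guardrail branch of the flowchart reduces to a small decision function. The sketch below avoids importing httpx and takes the failure kind as plain arguments. `should_fail_open` and its parameters are hypothetical names, and only the 502/503/504 set and the `fail_open` / `fail_closed` values come from the PR.

```python
from typing import Optional

# Status codes the PR treats as "backend unreachable behind a proxy or LB".
UPSTREAM_UNREACHABLE_STATUSES = frozenset({502, 503, 504})


def should_fail_open(
    unreachable_fallback: str,
    *,
    network_error: bool = False,
    status_code: Optional[int] = None,
) -> bool:
    """Return True when the request may proceed despite a guardrail failure."""
    if unreachable_fallback != "fail_open":
        # fail_closed (the default) always blocks by raising.
        return False
    return network_error or status_code in UPSTREAM_UNREACHABLE_STATUSES
```

Other HTTP errors, such as a 400 or 500 from the guardrail itself, still block the request even under `fail_open`, since they do not indicate an unreachable endpoint.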

Last reviewed commit: 504c70f

greptile-apps

16 files reviewed, 2 comments


greptile-apps · 2026-02-16T17:26:37Z

litellm/proxy/guardrails/guardrail_hooks/generic_guardrail_api/generic_guardrail_api.py

+        except httpx.HTTPStatusError as e:
+            # Common reverse-proxy/LB failures can present as HTTP errors even when the backend is unreachable.
+            status_code = getattr(getattr(e, "response", None), "status_code", None)
+            if self.unreachable_fallback == "fail_open" and status_code in (
+                502,
+                503,
+                504,
+            ):
+                return self._fail_open_passthrough(
+                    inputs=inputs,
+                    input_type=input_type,
+                    logging_obj=logging_obj,
+                    error=e,
+                    http_status_code=status_code,
+                )
+
+            verbose_proxy_logger.error(
+                "Generic Guardrail API: failed to make request: %s", str(e)
+            )
+            raise Exception(f"Generic Guardrail API failed: {str(e)}")


Non-5xx errors swallowed as generic Exception
When unreachable_fallback="fail_closed" (or any HTTP error that isn't 502/503/504), the original httpx.HTTPStatusError is caught and re-raised as a bare Exception(f"Generic Guardrail API failed: {str(e)}"). This discards the HTTP status code, response body, and exception chain, making debugging harder.

Consider preserving the original exception with raise ... from e, and ideally including the status code in the message:

Suggested change

         except httpx.HTTPStatusError as e:
             # Common reverse-proxy/LB failures can present as HTTP errors even when the backend is unreachable.
             status_code = getattr(getattr(e, "response", None), "status_code", None)
             if self.unreachable_fallback == "fail_open" and status_code in (
                 502,
                 503,
                 504,
             ):
                 return self._fail_open_passthrough(
                     inputs=inputs,
                     input_type=input_type,
                     logging_obj=logging_obj,
                     error=e,
                     http_status_code=status_code,
                 )

             verbose_proxy_logger.error(
                 "Generic Guardrail API: failed to make request: %s", str(e)
             )
-            raise Exception(f"Generic Guardrail API failed: {str(e)}")
+            raise Exception(f"Generic Guardrail API failed (HTTP {status_code}): {str(e)}") from e

greptile-apps · 2026-02-16T17:26:38Z

litellm/proxy/guardrails/guardrail_hooks/generic_guardrail_api/generic_guardrail_api.py

+            verbose_proxy_logger.error(
+                "Generic Guardrail API: failed to make request: %s", str(e)
+            )
+            raise Exception(f"Generic Guardrail API failed: {str(e)}")


Exception chain lost when re-raising
Same as the httpx.HTTPStatusError handler above — use raise ... from e to preserve the exception chain for httpx.RequestError, which aids debugging (e.g., distinguishing DNS resolution failures from connection resets).

Suggested change

             verbose_proxy_logger.error(
                 "Generic Guardrail API: failed to make request: %s", str(e)
             )
-            raise Exception(f"Generic Guardrail API failed: {str(e)}")
+            raise Exception(f"Generic Guardrail API failed: {str(e)}") from e
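To make the effect of `raise ... from e` in the two review comments concrete, here is a self-contained sketch. `UpstreamError` is a stand-in for `httpx.RequestError`, and the function names are hypothetical. Note that without `from e` Python still records the original error implicitly in `__context__`. What `from e` adds is an explicit `__cause__`, and the traceback then reports the original as the direct cause.

```python
class UpstreamError(Exception):
    """Stand-in for httpx.RequestError in this sketch."""


def call_guardrail() -> None:
    raise UpstreamError("connection reset by peer")


def wrapped_without_chain() -> None:
    try:
        call_guardrail()
    except UpstreamError as e:
        # Only the implicit __context__ links back to the original error.
        raise Exception(f"Generic Guardrail API failed: {e}")


def wrapped_with_chain() -> None:
    try:
        call_guardrail()
    except UpstreamError as e:
        # __cause__ is set explicitly, so handlers can inspect the original type.
        raise Exception(f"Generic Guardrail API failed: {e}") from e
```

A caller that needs to distinguish, say, a DNS failure from a connection reset can then check `exc.__cause__` rather than parsing the message string.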

Resolve conflict in test_converse_transformation.py by keeping both the structured outputs tests (from this branch) and the TestBedrockMinThinkingBudgetTokens tests (from main).

* fix: SSO PKCE support fails in multi-pod Kubernetes deployments
* fix: virtual key grace period from env/UI
* fix: refactor, race condition handle, fstring sql injection
* fix: add async call to avoid server pauses
* Update tests/test_litellm/proxy/management_endpoints/test_ui_sso.py
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* fix: add await in tests
* add modify test to perform async run
* Update tests/test_litellm/proxy/management_endpoints/test_ui_sso.py
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update tests/test_litellm/proxy/management_endpoints/test_ui_sso.py
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* fix grace period with better error handling on frontend and as per best practices
* Update tests/test_litellm/proxy/management_endpoints/test_ui_sso.py
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* fix: as per request changes
* Update litellm/proxy/utils.py
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Fix errors when callbacks are invoked for file delete operations:
* Fix errors when callbacks are invoked for file operations
* Fix: pass deployment credentials to afile_retrieve in managed_files post-call hook
* Fix: bypass managed files access check in batch polling by calling afile_content directly
* Update tests/test_litellm/proxy/management_endpoints/test_ui_sso.py
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* fix: afile_retrieve returns unified ID for batch output files
* fix: batch retrieve returns unified input_file_id
* fix(chatgpt): drop unsupported responses params for Codex
  Co-authored-by: Cursor <cursoragent@cursor.com>
* test(chatgpt): ensure Codex request filters unsupported params
  Co-authored-by: Cursor <cursoragent@cursor.com>
* Fix deleted managed files returning 403 instead of 404
* Add comments
* Update litellm/proxy/utils.py
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* fix: thread deployment model_info through batch cost calculation
  batch_cost_calculator only checked the global cost map, ignoring deployment-level custom pricing (input_cost_per_token_batches etc.). Add optional model_info param through the batch cost chain and pass it from CheckBatchCost.
* fix(deps): add pytest-postgresql for db schema migration tests
  The test_db_schema_migration.py test requires pytest-postgresql but it was missing from dependencies, causing import errors: ModuleNotFoundError: No module named 'pytest_postgresql'
  Added pytest-postgresql ^6.0.0 to dev dependencies to fix test collection errors in proxy_unit_tests. This is a pre-existing issue, not related to PR #21277.
  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix(test): replace caplog with custom handler for parallel execution
  The cost calculation log level tests were failing when run with pytest-xdist parallel execution because caplog doesn't work reliably across worker processes. This causes "ValueError: I/O operation on closed file" errors.
  Solution: Replace caplog fixture with a custom LogRecordHandler that directly attaches to the logger. This approach works correctly in parallel execution because each worker process has its own handler instance.
  Fixes test failures in PR #21277 when running with --dist=loadscope.
  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix(test): correct async mock for video generation logging test
  The test was failing with AuthenticationError because the mock wasn't intercepting the actual HTTP handler calls. This caused real API calls with no API key, resulting in 401 errors.
  Root cause: The test was patching the wrong target using string path 'litellm.videos.main.base_llm_http_handler' instead of using patch.object on the actual handler instance. Additionally, it was mocking the sync method instead of async_video_generation_handler.
  Solution: Use patch.object with side_effect pattern on the correct async handler method, following the same pattern used in test_video_generation_async().
  Fixes test failure in PR #21277 when running with --dist=loadscope.
  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix(test): add cleanup fixture and no_parallel mark for MCP tests
  Two MCP server tests were failing when run with pytest-xdist parallel execution (--dist=loadscope):
  - test_mcp_routing_with_conflicting_alias_and_group_name
  - test_oauth2_headers_passed_to_mcp_client
  Both tests showed assertion failures where mocks weren't being called (0 times instead of expected 1 time).
  Root cause: These tests rely on global_mcp_server_manager singleton state and complex async mocking that doesn't work reliably with parallel execution. Each worker process can have different state and patches may not apply correctly.
  Solution:
  1. Added autouse fixture to clean up global_mcp_server_manager registry before and after each test for better isolation
  2. Added @pytest.mark.no_parallel to these specific tests to ensure they run sequentially, avoiding parallel execution issues
  This approach maintains test reliability while allowing other tests in the file to still benefit from parallelization. Fixes test failures exposed by PR #21277.
  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* Regenerate poetry.lock with Poetry 2.3.2
  Updated lock file to use Poetry 2.3.2 (matching main branch standard). This addresses Greptile feedback about Poetry version mismatch.
  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* Remove unused pytest import and add trailing newline
  - Removed unused pytest import (caplog fixture was removed)
  - Added missing trailing newline at end of file
  Addresses Greptile feedback (minor style issues).
  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* Remove redundant import inside test method
  The module litellm.videos.main is already imported at the top of the file (line 21), so the import inside the test method is redundant. Addresses Greptile feedback (minor style issue).
  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* Fix converse anthropic usage object according to v1/messages specs
* Add routing based on if reasoning is supported or not
* add fireworks_ai/accounts/fireworks/models/kimi-k2p5 in model map
* Removed stray .md file
* fix(bedrock): clamp thinking.budget_tokens to minimum 1024
  Bedrock rejects thinking.budget_tokens values below 1024 with a 400 error. This adds automatic clamping in the LiteLLM transformation layer so callers (e.g. router with reasoning_effort="low") don't need to know about the provider-specific minimum.
  Fixes #21297
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: improve Langfuse test isolation to prevent flaky failures (#21093)
  The test was creating fresh mocks but not fully isolating from setUp state, causing intermittent CI failures with 'Expected generation to be called once. Called 0 times.' Instead of creating fresh mocks, properly reset the existing setUp mocks to ensure clean state while maintaining proper mock chain configuration.
* feat(s3): add support for virtual-hosted-style URLs (#21094)
  Add s3_use_virtual_hosted_style parameter to support AWS S3 virtual-hosted-style URL format (bucket.endpoint/key) alongside the existing path-style format (endpoint/bucket/key). This enables compatibility with S3-compatible services like MinIO and aligns with AWS S3 official terminology.
* Addressed greptile comments to extract common helpers and return 404
* Allow effort="max" for Claude Opus 4.6 (#21112)
* fix(aiohttp): prevent closing shared ClientSession in AiohttpTransport (#21117)
  When a shared ClientSession is passed to LiteLLMAiohttpTransport, calling aclose() on the transport would close the shared session, breaking other clients still using it. Add owns_session parameter (default True for backwards compatibility) to AiohttpTransport and LiteLLMAiohttpTransport. When a shared session is provided in http_handler.py, owns_session=False is set to prevent the transport from closing a session it does not own. This aligns AiohttpTransport with the ownership pattern already used in AiohttpHandler (aiohttp_handler.py).
* perf(spend): avoid duplicate daily agent transaction computation (#21187)
* fix: proxy/batches_endpoints/endpoints.py:309:11: PLR0915 Too many statements (54 > 50)
* fix mypy
* Add doc for OpenAI Agents SDK with LiteLLM
* Add doc for OpenAI Agents SDK with LiteLLM
* Update docs/my-website/sidebars.js
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* fix mypy
* Update tests/test_litellm/proxy/_experimental/mcp_server/test_mcp_server.py
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Add blog for Managing Anthropic Beta Headers
* Add blog for Managing Anthropic Beta Headers
* correct the time
* Fix: Exclude tool params for models without function calling support (#21125) (#21244)
  * Fix tool params reported as supported for models without function calling (#21125)
    JSON-configured providers (e.g. PublicAI) inherited all OpenAI params including tools, tool_choice, function_call, and functions — even for models that don't support function calling. This caused an inconsistency where get_supported_openai_params included "tools" but supports_function_calling returned False.
    The fix checks supports_function_calling in the dynamic config's get_supported_openai_params and removes tool-related params when the model doesn't support it. Follows the same pattern used by OVHCloud and Fireworks AI providers.
  * Style: move verbose_logger to module-level import, remove redundant try/except
    Address review feedback from Greptile bot:
    - Move verbose_logger import to top-level (matches project convention)
    - Remove redundant try/except around supports_function_calling() since it already handles exceptions internally via _supports_factory()
* fix(index.md): cleanup str
* fix(proxy): handle missing DATABASE_URL in append_query_params (#21239)
  * fix: handle missing database url in append_query_params
  * Update litellm/proxy/proxy_cli.py
    Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
  ---------
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* fix(mcp): revert StreamableHTTPSessionManager to stateless mode (#21323)
  PR #19809 changed stateless=True to stateless=False to enable progress notifications for MCP tool calls. This caused the mcp library to enforce mcp-session-id headers on all non-initialize requests, breaking MCP Inspector, curl, and any client without automatic session management.
  Revert to stateless=True to restore compatibility with all MCP clients. The progress notification code already handles missing sessions gracefully (defensive checks + try/except), so no other changes are needed.
  Fixes #20242
* UI - Content Filters, help edit/view categories and 1-click add categories + go to next page (#21223)
  * feat(ui/): allow viewing content filter categories on guardrail info
  * fix(add_guardrail_form.tsx): add validation check to prevent adding empty content filter guardrails
  * feat(ui/): improve ux around adding new content filter categories
    easy to skip adding a category, so make it a 1-click thing
* Fix OCI Grok output pricing (#21329)
* fix(proxy): fix master key rotation Prisma validation errors
  _rotate_master_key() used jsonify_object() which converts Python dicts to JSON strings. Prisma's Python client rejects strings for Json-typed fields — it requires prisma.Json() wrappers or native dicts.
  This affected three code paths:
  - Model table (create_many): litellm_params and model_info converted to strings, plus created_at/updated_at were None (non-nullable DateTime)
  - Config table (update): param_value converted to string
  - Credentials table (update): credential_values/credential_info converted to strings
  Fix: replace jsonify_object() with model_dump(exclude_none=True) + prisma.Json() wrappers for all Json fields. Wrap model delete+insert in a Prisma transaction for atomicity. Add try/except around MCP server rotation to prevent non-critical failures from blocking the entire rotation.
---------
Co-authored-by: Harshit Jain <harshitjain0562@gmail.com>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Ephrim Stanley <ephrim.stanley@point72.com>
Co-authored-by: Jay Prajapati <79649559+jayy-77@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: mjkam <mjkam@naver.com>
Co-authored-by: Fly <48186978+tuzkiyoung@users.noreply.github.com>
Co-authored-by: Kristoffer Arlind <13228507+KristofferArlind@users.noreply.github.com>
Co-authored-by: Constantine <Runixer@gmail.com>
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
Co-authored-by: Atharva Jaiswal <92455570+AtharvaJaiswal005@users.noreply.github.com>
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
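One of the fixes in the commit message above, clamping `thinking.budget_tokens` for Bedrock, is easy to picture in isolation. The sketch below assumes the `thinking` parameter is a plain dict. The helper name and constant are hypothetical, and only the 1024 minimum comes from the commit message.

```python
from typing import Any, Dict

# Minimum stated in the commit message; Bedrock returns a 400 below this value.
BEDROCK_MIN_THINKING_BUDGET_TOKENS = 1024


def clamp_thinking_budget(thinking: Dict[str, Any]) -> Dict[str, Any]:
    """Return `thinking` with budget_tokens raised to the provider minimum."""
    budget = thinking.get("budget_tokens")
    if isinstance(budget, int) and budget < BEDROCK_MIN_THINKING_BUDGET_TOKENS:
        # Copy rather than mutate so the caller's dict is left untouched.
        return {**thinking, "budget_tokens": BEDROCK_MIN_THINKING_BUDGET_TOKENS}
    return thinking
```

Doing this in the transformation layer means a caller that maps a low reasoning effort to a small budget does not need provider-specific knowledge.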

Keep both structured output tests (ours) and min thinking budget tests (staging). Accept staging poetry.lock.

The fix checks supports_function_calling in the dynamic config's get_supported_openai_params and removes tool-related params when the model doesn't support it. Follows the same pattern used by OVHCloud and Fireworks AI providers. * Style: move verbose_logger to module-level import, remove redundant try/except Address review feedback from Greptile bot: - Move verbose_logger import to top-level (matches project convention) - Remove redundant try/except around supports_function_calling() since it already handles exceptions internally via _supports_factory() * fix(index.md): cleanup str * fix(proxy): handle missing DATABASE_URL in append_query_params (#21239) * fix: handle missing database url in append_query_params * Update litellm/proxy/proxy_cli.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix(migrations): Make vector stores migration idempotent with IF NOT EXISTS - Add IF NOT EXISTS to ALTER TABLE ADD COLUMN statements - Add IF NOT EXISTS to CREATE INDEX statements - Prevents migration failures when columns/indexes already exist from manual fixes - Follows PostgreSQL best practices for idempotent migrations --------- Co-authored-by: Harshit Jain <harshitjain0562@gmail.com> Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Ephrim Stanley <ephrim.stanley@point72.com> Co-authored-by: Jay Prajapati <79649559+jayy-77@users.noreply.github.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: mjkam <mjkam@naver.com> Co-authored-by: Fly <48186978+tuzkiyoung@users.noreply.github.com> Co-authored-by: Kristoffer Arlind 
<13228507+KristofferArlind@users.noreply.github.com> Co-authored-by: Constantine <Runixer@gmail.com> Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com> Co-authored-by: Atharva Jaiswal <92455570+AtharvaJaiswal005@users.noreply.github.com> Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com> Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
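
For readers skimming the squashed message above, the fix(aiohttp) entry describes an ownership flag: a transport closes its session only when it owns it. The following is a minimal sketch of that idea, not LiteLLM's actual code. `FakeSession`, `Transport`, and `demo` are hypothetical stand-ins so the example runs without aiohttp; only the `owns_session` name and its default of True come from the commit message.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for a client session (not aiohttp.ClientSession)."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class Transport:
    """Hypothetical transport that closes its session only if it owns it."""

    def __init__(self, session: FakeSession, owns_session: bool = True) -> None:
        self.session = session
        # Defaults to True, mirroring the backwards-compatible default
        # described in the commit message.
        self.owns_session = owns_session

    async def aclose(self) -> None:
        # A shared session (owns_session=False) is left open for other users.
        if self.owns_session and not self.session.closed:
            await self.session.close()


async def demo() -> tuple[bool, bool]:
    shared = FakeSession()
    await Transport(shared, owns_session=False).aclose()

    private = FakeSession()
    await Transport(private).aclose()

    # Returns the closed state of the shared and the privately owned session.
    return shared.closed, private.closed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Keeping the default at True means existing callers that construct the transport without a shared session see no change in behavior, which matches the compatibility note in the commit.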

…ured-outputs feat(bedrock): support native structured outputs API (outputConfig.textFormat)

ndgigliotti and others added 6 commits February 14, 2026 15:48

feat(bedrock): support native structured outputs API (outputConfig.textFormat)

2129061

vercel bot deployed to Preview February 16, 2026 17:21 View deployment

greptile-apps bot reviewed Feb 16, 2026

ndgigliotti added 2 commits February 16, 2026 15:34

Merge branch 'main' into feat/bedrock-native-structured-outputs

8f647ca

Resolve conflict in test_converse_transformation.py by keeping both the structured outputs tests (from this branch) and the TestBedrockMinThinkingBudgetTokens tests (from main).
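
The TestBedrockMinThinkingBudgetTokens tests kept in this merge appear to correspond to the fix(bedrock) commit in this PR, which clamps thinking.budget_tokens to a minimum of 1024 because Bedrock rejects smaller values with a 400 error. A minimal sketch of such a clamp follows, assuming the thinking parameter is a plain dict. The function and constant names are hypothetical and this is not LiteLLM's transformation code.

```python
# Minimum stated in the commit message; the constant name is hypothetical.
BEDROCK_MIN_THINKING_BUDGET_TOKENS = 1024


def clamp_thinking_budget(thinking: dict) -> dict:
    """Return a copy of `thinking` with budget_tokens raised to the minimum.

    Hypothetical helper: dicts without an integer budget_tokens, or with a
    value already at or above the minimum, are returned unchanged.
    """
    budget = thinking.get("budget_tokens")
    if isinstance(budget, int) and budget < BEDROCK_MIN_THINKING_BUDGET_TOKENS:
        return {**thinking, "budget_tokens": BEDROCK_MIN_THINKING_BUDGET_TOKENS}
    return thinking


if __name__ == "__main__":
    print(clamp_thinking_budget({"type": "enabled", "budget_tokens": 512}))
```

Returning a new dict instead of mutating the input keeps the caller's original request untouched, which is a design choice of this sketch and not something the commit message states.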

chore: regenerate poetry.lock after merge with main

a8fbbb3

ndgigliotti mentioned this pull request Feb 16, 2026

feat(bedrock): support native structured outputs API (outputConfig.textFormat) #21222

Merged

4 tasks

vercel bot deployed to Preview February 16, 2026 23:15 View deployment

ndgigliotti and others added 2 commits February 16, 2026 20:27

fix: resolve merge conflicts with staging branch

a946cc4

Keep both structured output tests (ours) and min thinking budget tests (staging). Accept staging poetry.lock.

vercel bot deployed to Preview February 17, 2026 02:38 View deployment

Merge pull request #21222 from ndgigliotti/feat/bedrock-native-structured-outputs

3a34b63

feat(bedrock): support native structured outputs API (outputConfig.textFormat)

vercel bot deployed to Preview February 17, 2026 03:04 View deployment

krrishdholakia requested a review from Sameerlite February 17, 2026 04:05

Merge branch 'main' into litellm_oss_staging_02_16_2026

1c2e114

vercel bot deployed to Preview February 17, 2026 12:56 View deployment

Sameerlite added 2 commits February 18, 2026 17:36

fix code quality tests and mypy

7e36d47

Fix test_async_post_call_success_hook_includes_client_ip_user_agent

4bbd15f

vercel bot deployed to Preview February 18, 2026 12:09 View deployment

Sameerlite merged commit bd0c804 into main Feb 18, 2026
57 of 84 checks passed

Litellm oss staging 02 16 2026 #21326
Sameerlite merged 15 commits into main from litellm_oss_staging_02_16_2026

Conversation

krrishdholakia commented Feb 16, 2026

vercel bot commented Feb 16, 2026 (edited)

greptile-apps bot commented Feb 16, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

greptile-apps bot left a comment

greptile-apps bot Feb 16, 2026

greptile-apps bot Feb 16, 2026

9 participants
