feat: add in_flight_requests metric to /health/backlog + prometheus by ishaan-jaff · Pull Request #22319 · BerriAI/litellm

ishaan-jaff · 2026-02-27T22:16:09Z

Relevant issues

Pre-Submission checklist

I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement (see details)
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

Branch creation CI run Link:
CI run for the last commit Link:
Merge / cherry-pick CI run Links:

Type

🆕 New Feature

Changes

Problem

Users were reporting 20s total latency while LiteLLM only logged 10s. The missing 10s was the request sitting in uvicorn's event loop waiting to be dispatched — LiteLLM's clock doesn't start until its handler runs, so that queue wait is completely invisible.

T=0   Request arrives at ALB / uvicorn
      [10s gap — LiteLLM never sees this]
T=10  LiteLLM handler starts → starts its own timer
T=20  Response sent

LiteLLM logs: 10s   User experiences: 20s

What this adds

GET /health/backlog — new per-pod endpoint returning the number of HTTP requests currently in-flight on this uvicorn worker:

{"in_flight_requests": 47}

litellm_in_flight_requests Prometheus gauge — same value, scraped at /metrics. Uses multiprocess_mode="livesum" so it aggregates correctly when running with multiple workers.
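
As a rough illustration of consuming both surfaces, the sketch below parses a /health/backlog body and a /metrics payload. The helper names and the sample payload are assumptions made for this example, not part of the PR. The parser assumes samples carry no trailing timestamp.

```python
import json


def in_flight_from_backlog(body: str) -> int:
    """Extract the count from a /health/backlog JSON body."""
    return int(json.loads(body)["in_flight_requests"])


def in_flight_from_metrics(text: str, name: str = "litellm_in_flight_requests") -> float:
    """Sum every sample of `name` in a Prometheus text-format payload.

    Assumption: samples have the form `name[{labels}] value` with no timestamp.
    """
    total = 0.0
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        metric, _, value = line.rpartition(" ")
        if metric == name or metric.startswith(name + "{"):
            total += float(value)
    return total


# Illustrative payload only; a real scrape contains many more metrics.
SAMPLE_METRICS = """\
# HELP litellm_in_flight_requests Number of HTTP requests currently in-flight on this uvicorn worker
# TYPE litellm_in_flight_requests gauge
litellm_in_flight_requests 47.0
"""

backlog_count = in_flight_from_backlog('{"in_flight_requests": 47}')
metrics_count = in_flight_from_metrics(SAMPLE_METRICS)
```

Since /health/backlog is protected by user_api_key_auth, a real client would also need to send a valid key with the request.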

Implementation

litellm/proxy/middleware/in_flight_requests_middleware.py — pure ASGI middleware (not BaseHTTPMiddleware). Increments a class-level counter on request start, decrements in finally so errors never leak. Prometheus gauge is lazily initialised on first request (so PROMETHEUS_MULTIPROC_DIR is already set), with a sentinel flag so the import is attempted exactly once.
litellm/proxy/proxy_server.py — InFlightRequestsMiddleware registered as the outermost middleware (wraps everything, including CORS and Prometheus auth).
litellm/proxy/health_endpoints/_health_endpoints.py — /health/backlog endpoint, protected by user_api_key_auth.
litellm/types/integrations/prometheus.py — litellm_in_flight_requests added to DEFINED_PROMETHEUS_METRICS.
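
The counter logic described above can be sketched without any framework. This is a minimal stand-in, not the code in in_flight_requests_middleware.py: the class name `InFlightCounter`, the two toy apps, and the choice to skip non-HTTP scopes are assumptions for illustration.

```python
import asyncio


class InFlightCounter:
    """Pure ASGI middleware sketch: counts HTTP requests currently being handled."""

    in_flight = 0  # class-level, shared by every instance in this process

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Assumption: lifespan and websocket scopes pass through uncounted.
            await self.app(scope, receive, send)
            return
        InFlightCounter.in_flight += 1
        try:
            await self.app(scope, receive, send)
        finally:
            # Runs on success, on exceptions, and on cancellation.
            InFlightCounter.in_flight -= 1


async def ok_app(scope, receive, send):
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would


async def failing_app(scope, receive, send):
    raise RuntimeError("boom")


async def demo():
    scope = {"type": "http"}
    await InFlightCounter(ok_app)(scope, None, None)
    after_ok = InFlightCounter.in_flight
    try:
        await InFlightCounter(failing_app)(scope, None, None)
    except RuntimeError:
        pass  # the exception still propagates; only the counter is protected
    after_error = InFlightCounter.in_flight
    return after_ok, after_error


result = asyncio.run(demo())
```

Because the event loop is single-threaded and there is no await between reading and writing the counter, the plain integer increment needs no lock within one worker.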

POC results

start — no traffic                          0
low  — +5  requests fired                   5  █████
med  — +5  more (10 total)                 10  ██████████
high — +10 more (20 total)                 20  ████████████████████
peak — +10 more (30 total)                 30  ██████████████████████████████
peak — holding                             30  ██████████████████████████████
drained — all requests done                 0
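
The shape of these numbers can be reproduced with a small asyncio simulation. This is not the script used for the POC; the `handle` coroutine and the batch sizes are assumptions chosen to mirror the table.

```python
import asyncio

in_flight = 0
samples = []


async def handle(release: asyncio.Event):
    """Simulated request: counted while it waits for the release signal."""
    global in_flight
    in_flight += 1
    try:
        await release.wait()
    finally:
        in_flight -= 1


async def main():
    release = asyncio.Event()
    tasks = []
    samples.append(in_flight)  # start, no traffic
    for batch in (5, 5, 10, 10):
        tasks += [asyncio.create_task(handle(release)) for _ in range(batch)]
        await asyncio.sleep(0)  # let the new tasks start and increment
        samples.append(in_flight)
    release.set()  # let every simulated request finish
    await asyncio.gather(*tasks)
    samples.append(in_flight)  # drained


asyncio.run(main())
```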

How SREs use this

`in_flight_requests`	ALB `TargetResponseTime`	What it means
High	High	Pod overloaded → scale out
Low	High	Delay is pre-ASGI (network, event loop blocking)
High	Normal	Pod is busy but healthy

Works out of the box with AWS ALB — no nginx header injection needed.
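
The table can be encoded as a small triage helper for alerting or runbooks. The thresholds below are hypothetical placeholders, and the fourth outcome (low in-flight, normal latency) is not in the table and is added only to make the function total.

```python
def triage(
    in_flight: int,
    alb_latency_s: float,
    in_flight_high: int = 100,    # hypothetical threshold, tune per deployment
    latency_high_s: float = 5.0,  # hypothetical threshold, tune per deployment
) -> str:
    """Map (in_flight_requests, ALB TargetResponseTime) to a suggested reading."""
    busy = in_flight >= in_flight_high
    slow = alb_latency_s >= latency_high_s
    if busy and slow:
        return "pod overloaded: scale out"
    if slow:
        return "delay is pre-ASGI: check network or event loop blocking"
    if busy:
        return "pod busy but healthy"
    return "nothing to do"
```

For example, `triage(12, 20.0)` falls into the second row: few requests are in flight, yet the load balancer sees high latency, so the time is being lost before the middleware runs.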

greptile-apps · 2026-02-27T22:20:49Z

Greptile Summary

Adds an InFlightRequestsMiddleware ASGI middleware to track the number of concurrent HTTP requests per uvicorn worker, exposed via a new GET /health/backlog endpoint and as a litellm_in_flight_requests Prometheus gauge. This addresses a real observability gap where request queuing time inside the event loop was invisible to LiteLLM's internal latency tracking.

New pure ASGI middleware (InFlightRequestsMiddleware) with class-level counter and lazy Prometheus gauge initialization with sentinel pattern to avoid repeated import attempts
New authenticated /health/backlog health endpoint returning {"in_flight_requests": N}
litellm_in_flight_requests added to DEFINED_PROMETHEUS_METRICS type
Middleware registered as the outermost layer in proxy_server.py (correct for capturing full request lifecycle)
Includes model pricing data updates for gpt-audio-1.5 and gpt-realtime-1.5 (unrelated to the main feature)
Test for exception handling (test_counter_decrements_after_error) doesn't actually raise an exception — it returns a 500 response, missing the key try/finally code path

Confidence Score: 4/5

This PR is safe to merge — the middleware is lightweight, well-isolated, and the feature addresses a real observability need.
The implementation is clean and follows existing patterns (pure ASGI middleware, lazy initialization). The one issue is a test that doesn't actually exercise the exception path it claims to cover, which is worth fixing but doesn't affect runtime safety.
tests/test_litellm/proxy/middleware/test_in_flight_requests_middleware.py — the error-handling test needs correction to actually raise an exception.

Important Files Changed

Filename	Overview
litellm/proxy/middleware/in_flight_requests_middleware.py	New ASGI middleware tracking in-flight HTTP requests via a class-level counter and optional Prometheus gauge. Clean implementation with lazy gauge initialization and sentinel pattern to avoid repeated import attempts.
litellm/proxy/health_endpoints/_health_endpoints.py	Adds /health/backlog GET endpoint that returns the in-flight request count. Straightforward, auth-protected, consistent with existing health endpoints.
litellm/proxy/proxy_server.py	Registers InFlightRequestsMiddleware as outermost ASGI middleware (last add_middleware call = first to run). Correct placement for capturing full request lifecycle.
litellm/types/integrations/prometheus.py	Adds litellm_in_flight_requests to DEFINED_PROMETHEUS_METRICS Literal type for metric registration.
tests/test_litellm/proxy/middleware/test_in_flight_requests_middleware.py	Good test coverage for basic counter behavior, but test_counter_decrements_after_error has a misleading name — it tests a 500 response, not an actual exception, missing the key try/finally logic path.
litellm/model_prices_and_context_window_backup.json	Adds model pricing entries for gpt-audio-1.5 and gpt-realtime-1.5 — unrelated to the in-flight requests feature.
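
The ordering note for proxy_server.py ("last add_middleware call = first to run") can be illustrated with a simplified model. This sketch assumes that each add_middleware call puts the new entry at the front of a list and that the stack is then built by wrapping from the back; it uses no real framework code, and `Tag` and `build_stack` are hypothetical names.

```python
class Tag:
    """Toy middleware that records its name when a request passes through."""

    def __init__(self, app, name, log):
        self.app, self.name, self.log = app, name, log

    def __call__(self, request):
        self.log.append(self.name)
        return self.app(request)


def build_stack(app, added_in_order, log):
    # Assumed model: every add_middleware call inserts at the front...
    user_middleware = []
    for name in added_in_order:
        user_middleware.insert(0, name)
    # ...and the stack is built by wrapping from the back of that list,
    # so the most recently added middleware ends up outermost.
    for name in reversed(user_middleware):
        app = Tag(app, name, log)
    return app


log = []
stack = build_stack(lambda request: "ok", ["cors", "prometheus_auth", "in_flight"], log)
response = stack("GET /")
```

Under this model the middleware registered last sees the request first and the response last, which is what the in-flight counter needs to cover the full request lifecycle.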

Sequence Diagram

sequenceDiagram
    participant Client
    participant InFlightMW as InFlightRequestsMiddleware
    participant PrometheusGauge as Prometheus Gauge
    participant ASGI as ASGI App (FastAPI)
    participant Handler as /health/backlog

    Client->>InFlightMW: HTTP Request
    InFlightMW->>InFlightMW: _in_flight += 1
    InFlightMW->>PrometheusGauge: gauge.inc()
    InFlightMW->>ASGI: await app(scope, receive, send)
    ASGI->>Handler: Route to endpoint
    Handler->>InFlightMW: get_in_flight_requests()
    Handler-->>Client: {"in_flight_requests": N}
    InFlightMW->>InFlightMW: _in_flight -= 1
    InFlightMW->>PrometheusGauge: gauge.dec()

Last reviewed commit: c9e84f5

greptile-apps

6 files reviewed, 3 comments

greptile-apps · 2026-02-27T22:20:53Z

litellm/proxy/middleware/in_flight_requests_middleware.py

+_in_flight: int = 0
+
+# Lazily created on first request so PROMETHEUS_MULTIPROC_DIR is already set
+# by the time we register the metric.
+_gauge: Optional[object] = None
+
+
+def _get_gauge() -> Optional[object]:
+    global _gauge
+    if _gauge is not None:
+        return _gauge
+    try:
+        from prometheus_client import Gauge
+
+        kwargs = {}
+        if "PROMETHEUS_MULTIPROC_DIR" in os.environ:
+            # livesum aggregates across all worker processes in the scrape response
+            kwargs["multiprocess_mode"] = "livesum"
+        _gauge = Gauge(
+            "litellm_in_flight_requests",
+            "Number of HTTP requests currently in-flight on this uvicorn worker",
+            **kwargs,
+        )
+    except Exception:
+        pass
+    return _gauge


Repeated import attempt on every request when prometheus_client is missing

When prometheus_client is not installed, _get_gauge() catches the ImportError and leaves _gauge as None. Because the guard on line 22 is if _gauge is not None, every subsequent HTTP request re-enters the try block, retries the import, and fails again. Consider a separate sentinel flag (e.g., _gauge_initialized = False) so the import is not retried after the first failure:

Suggested change

 _in_flight: int = 0

 # Lazily created on first request so PROMETHEUS_MULTIPROC_DIR is already set
 # by the time we register the metric.
 _gauge: Optional[object] = None
+_gauge_initialized: bool = False


 def _get_gauge() -> Optional[object]:
-    global _gauge
-    if _gauge is not None:
-        return _gauge
+    global _gauge, _gauge_initialized
+    if _gauge_initialized:
+        return _gauge
+    _gauge_initialized = True
     try:
         from prometheus_client import Gauge

         kwargs = {}
         if "PROMETHEUS_MULTIPROC_DIR" in os.environ:
             # livesum aggregates across all worker processes in the scrape response
             kwargs["multiprocess_mode"] = "livesum"
         _gauge = Gauge(
             "litellm_in_flight_requests",
             "Number of HTTP requests currently in-flight on this uvicorn worker",
             **kwargs,
         )
     except Exception:
         pass
     return _gauge
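
The effect of the sentinel can be checked in isolation. The sketch below swaps prometheus_client for a deliberately missing module (the name is made up) and counts import attempts; `import_attempts` exists only for this demonstration.

```python
import importlib

_gauge = None
_gauge_initialized = False
import_attempts = 0  # instrumentation for the demo only


def get_gauge(module_name: str = "a_module_that_is_not_installed_xyz"):
    """Lazy initialiser that tries the import at most once per process."""
    global _gauge, _gauge_initialized, import_attempts
    if _gauge_initialized:
        return _gauge
    _gauge_initialized = True  # set before trying, so a failure is not retried
    try:
        import_attempts += 1
        _gauge = importlib.import_module(module_name)
    except Exception:
        pass  # optional dependency missing: keep _gauge as None
    return _gauge


results = [get_gauge() for _ in range(3)]
```

With the original `if _gauge is not None` guard, the same three calls would each attempt the import.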

greptile-apps · 2026-02-27T22:20:54Z

litellm/proxy/middleware/in_flight_requests_middleware.py

+        _gauge = Gauge(
+            "litellm_in_flight_requests",
+            "Number of HTTP requests currently in-flight on this uvicorn worker",
+            **kwargs,
+        )


Gauge bypasses PrometheusLogger metric-enable/disable framework

The existing PrometheusLogger (in litellm/integrations/prometheus.py) uses _create_metric_factory which checks _is_metric_enabled() and respects the enabled_metrics configuration — allowing users to disable specific metrics. This new gauge is created directly via prometheus_client.Gauge(...), so it will always be registered regardless of the user's Prometheus metric configuration. Although this metric is added to DEFINED_PROMETHEUS_METRICS, the enable/disable check is never applied here.

This is worth aligning for consistency with the existing metric management pattern, though it may be intentionally standalone since it's middleware-level rather than logger-level.
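
One way to align the gauge with an enable/disable setting is to gate its creation on an allow-list. The sketch below is hypothetical: `create_if_enabled`, its parameters, and the rule that `None` means "everything enabled" are assumptions for illustration and do not reproduce PrometheusLogger's actual logic.

```python
from typing import Callable, Iterable, Optional


def create_if_enabled(
    name: str,
    enabled_metrics: Optional[Iterable[str]],
    factory: Callable[[], object],
) -> Optional[object]:
    """Build a metric only when `name` passes the (hypothetical) allow-list.

    Assumption: None means no filter is configured, so everything is enabled.
    """
    if enabled_metrics is not None and name not in set(enabled_metrics):
        return None
    return factory()


# Stand-in factory; real code would construct the Prometheus gauge here.
made = create_if_enabled("litellm_in_flight_requests", None, lambda: "gauge")
skipped = create_if_enabled(
    "litellm_in_flight_requests", ["some_other_metric"], lambda: "gauge"
)
```

The middleware would then treat a `None` result the same way it treats a missing prometheus_client: keep the in-process counter and skip the gauge updates.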

ishaan-jaff · 2026-02-28T01:51:32Z

@greptileai review

greptile-apps · 2026-02-28T01:54:08Z

tests/test_litellm/proxy/middleware/test_in_flight_requests_middleware.py

+def test_counter_decrements_after_error():
+    """Counter must reach 0 even when the handler raises."""
+
+    async def handler(request: Request) -> Response:
+        return Response("boom", status_code=500)
+
+    TestClient(_make_app(handler)).get("/")
+    assert get_in_flight_requests() == 0


Test doesn't cover the exception path it claims to test

The docstring says "Counter must reach 0 even when the handler raises" but the handler returns a normal Response("boom", status_code=500) — it never raises an exception. This means the finally block in InFlightRequestsMiddleware.__call__ (which is the whole reason for using try/finally) is never exercised by this test.

To actually test exception resilience, the handler should raise:

Suggested change

 def test_counter_decrements_after_error():
     """Counter must reach 0 even when the handler raises."""

     async def handler(request: Request) -> Response:
-        return Response("boom", status_code=500)
+        raise RuntimeError("boom")

-    TestClient(_make_app(handler)).get("/")
+    client = TestClient(_make_app(handler), raise_server_exceptions=False)
+    client.get("/")
     assert get_in_flight_requests() == 0
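
To see why the distinction matters, consider a deliberately broken middleware with no try/finally. In this framework-free sketch (all names are hypothetical), a handler that merely reports a 500 leaves the counter at zero, so the original test would pass even against the buggy version. Only a handler that raises exposes the leak.

```python
import asyncio


class LeakyCounter:
    """Deliberately buggy: no try/finally, so an exception skips the decrement."""

    def __init__(self, app):
        self.app = app
        self.in_flight = 0

    async def __call__(self, scope, receive, send):
        self.in_flight += 1
        await self.app(scope, receive, send)
        self.in_flight -= 1  # never reached if the app raises


async def returns_500(scope, receive, send):
    return 500  # an error status, but control flow is normal


async def raises(scope, receive, send):
    raise RuntimeError("boom")


async def leaked_after(app) -> int:
    mw = LeakyCounter(app)
    try:
        await mw({"type": "http"}, None, None)
    except RuntimeError:
        pass
    return mw.in_flight


leak_after_500 = asyncio.run(leaked_after(returns_500))
leak_after_raise = asyncio.run(leaked_after(raises))
```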

* Add Prometheus child_exit cleanup for gunicorn workers When a gunicorn worker exits (e.g. from max_requests recycling), its per-process prometheus .db files remain on disk. For gauges using livesum/liveall mode, this means the dead worker's last-known values persist as if the process were still alive. Wire gunicorn's child_exit hook to call mark_process_dead() so live-tracking gauges accurately reflect only running workers. * docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway (#21130) * docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway provider config * feat: add AssemblyAI LLM Gateway as OpenAI-compatible provider * fix(mcp): update test mocks to use renamed filter_server_ids_by_ip_with_info Tests were mocking the old method name `filter_server_ids_by_ip` but production code at server.py:774 calls `filter_server_ids_by_ip_with_info` which returns a (server_ids, blocked_count) tuple. The unmocked method on AsyncMock returned a coroutine, causing "cannot unpack non-iterable coroutine object" errors. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): update realtime guardrail test assertions for voice violation behavior Tests were asserting no response.create/conversation.item.create sent to backend when guardrail blocks, but the implementation intentionally sends these to have the LLM voice the guardrail violation message to the user. 
Updated assertions to verify the correct guardrail flow: - response.cancel is sent to stop any in-progress response - conversation.item.create with violation message is injected - response.create is sent to voice the violation - original blocked content is NOT forwarded Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(bedrock): restore parallel_tool_calls mapping in map_openai_params The revert in 8565c70 removed the parallel_tool_calls handling from map_openai_params, and the subsequent fix d0445e1 only re-added the transform_request consumption but forgot to re-add the map_openai_params producer that sets _parallel_tool_use_config. This meant parallel_tool_calls was silently ignored for all Bedrock models. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): update Azure pass-through test to mock litellm.completion Commit 99c62ca removed "azure" from _RESPONSES_API_PROVIDERS, routing Azure models through litellm.completion instead of litellm.responses. The test was not updated to match, causing it to assert against the wrong mock. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add in_flight_requests metric to /health/backlog + prometheus (#22319) * feat: add in_flight_requests metric to /health/backlog + prometheus * refactor: clean class with static methods, add tests, fix sentinel pattern * docs: add in_flight_requests to prometheus metrics and latency troubleshooting * fix(db): add missing migration for LiteLLM_ClaudeCodePluginTable PR #22271 added the LiteLLM_ClaudeCodePluginTable model to schema.prisma but did not include a corresponding migration file, causing test_aaaasschema_migration_check to fail. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: update stale docstring to match guardrail voicing behavior Addresses Greptile review feedback. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [Feat] Agent RBAC Permission Fix - Ensure Internal Users cannot create agents (#22329) * fix: enforce RBAC on agent endpoints — block non-admin create/update/delete - Add /v1/agents/{agent_id} to agent_routes so internal users can access GET-by-ID (previously returned 403 due to missing route pattern) - Add _check_agent_management_permission() guard to POST, PUT, PATCH, DELETE agent endpoints — only PROXY_ADMIN may mutate agents - Add user_api_key_dict param to delete_agent so the role check works - Add comprehensive unit tests for RBAC enforcement across all roles Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: mock prisma_client in internal user get-agent-by-id test Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * feat(ui): hide agent create/delete controls for non-admin users Match MCP servers pattern: wrap '+ Add New Agent' button in isAdmin conditional so internal users see a read-only agents view. Delete buttons in card and table were already gated. Update empty-state copy for non-admin users. Add 7 Vitest tests covering role-based visibility. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: exclude gpt-5.2-chat from temperature passthrough (#21911) gpt-5.2-chat and gpt-5.2-chat-latest only support temperature=1 (like base gpt-5), not arbitrary values (like gpt-5.2). Update is_model_gpt_5_1_model() to exclude gpt-5.2-chat variants so drop_params correctly drops unsupported temperature values. 
Fixes #21911 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu> Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com> Co-authored-by: Dylan Duan <dylan.duan@assemblyai.com> Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@default

* fix(image_generation): propagate extra_headers to OpenAI image generation Add headers parameter to image_generation() and aimage_generation() methods in OpenAI provider, and pass headers from images/main.py to ensure custom headers like cf-aig-authorization are properly forwarded to the OpenAI API. Aligns behavior with completion() method and Azure provider implementation. * test(image_generation): add tests for extra_headers propagation Verify that extra_headers are correctly forwarded to OpenAI's images.generate() in both sync and async paths, and that they are absent when not provided. * Add Prometheus child_exit cleanup for gunicorn workers When a gunicorn worker exits (e.g. from max_requests recycling), its per-process prometheus .db files remain on disk. For gauges using livesum/liveall mode, this means the dead worker's last-known values persist as if the process were still alive. Wire gunicorn's child_exit hook to call mark_process_dead() so live-tracking gauges accurately reflect only running workers. * docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway (#21130) * docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway provider config * feat: add AssemblyAI LLM Gateway as OpenAI-compatible provider * fix(mcp): update test mocks to use renamed filter_server_ids_by_ip_with_info Tests were mocking the old method name `filter_server_ids_by_ip` but production code at server.py:774 calls `filter_server_ids_by_ip_with_info` which returns a (server_ids, blocked_count) tuple. The unmocked method on AsyncMock returned a coroutine, causing "cannot unpack non-iterable coroutine object" errors. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): update realtime guardrail test assertions for voice violation behavior Tests were asserting no response.create/conversation.item.create sent to backend when guardrail blocks, but the implementation intentionally sends these to have the LLM voice the guardrail violation message to the user. Updated assertions to verify the correct guardrail flow: - response.cancel is sent to stop any in-progress response - conversation.item.create with violation message is injected - response.create is sent to voice the violation - original blocked content is NOT forwarded Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(bedrock): restore parallel_tool_calls mapping in map_openai_params The revert in 8565c70 removed the parallel_tool_calls handling from map_openai_params, and the subsequent fix d0445e1 only re-added the transform_request consumption but forgot to re-add the map_openai_params producer that sets _parallel_tool_use_config. This meant parallel_tool_calls was silently ignored for all Bedrock models. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): update Azure pass-through test to mock litellm.completion Commit 99c62ca removed "azure" from _RESPONSES_API_PROVIDERS, routing Azure models through litellm.completion instead of litellm.responses. The test was not updated to match, causing it to assert against the wrong mock. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add in_flight_requests metric to /health/backlog + prometheus (#22319) * feat: add in_flight_requests metric to /health/backlog + prometheus * refactor: clean class with static methods, add tests, fix sentinel pattern * docs: add in_flight_requests to prometheus metrics and latency troubleshooting * fix(db): add missing migration for LiteLLM_ClaudeCodePluginTable PR #22271 added the LiteLLM_ClaudeCodePluginTable model to schema.prisma but did not include a corresponding migration file, causing test_aaaasschema_migration_check to fail. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: update stale docstring to match guardrail voicing behavior Addresses Greptile review feedback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(caching): store background task references in LLMClientCache._remove_key to prevent unawaited coroutine warnings Fixes #22128 * [Feat] Agent RBAC Permission Fix - Ensure Internal Users cannot create agents (#22329) * fix: enforce RBAC on agent endpoints — block non-admin create/update/delete - Add /v1/agents/{agent_id} to agent_routes so internal users can access GET-by-ID (previously returned 403 due to missing route pattern) - Add _check_agent_management_permission() guard to POST, PUT, PATCH, DELETE agent endpoints — only PROXY_ADMIN may mutate agents - Add user_api_key_dict param to delete_agent so the role check works - Add comprehensive unit tests for RBAC enforcement across all roles Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: mock prisma_client in internal user get-agent-by-id test Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * feat(ui): hide agent create/delete controls for non-admin users Match MCP servers pattern: wrap '+ Add New Agent' button in isAdmin conditional so internal users see a read-only agents view. Delete buttons in card and table were already gated. 
Update empty-state copy for non-admin users. Add 7 Vitest tests covering role-based visibility. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: Add PROXY_ADMIN role to system user for key rotation (#21896) * fix: Add PROXY_ADMIN role to system user for key rotation The key rotation worker was failing with 'You are not authorized to regenerate this key' when rotating team keys. This was because the system user created by get_litellm_internal_jobs_user_api_key_auth() was missing the user_role field. Without user_role=PROXY_ADMIN, the system user couldn't bypass team permission checks in can_team_member_execute_key_management_endpoint(), causing authorization failures for team key rotation. This fix adds user_role=LitellmUserRoles.PROXY_ADMIN to the system user, allowing it to bypass team permission checks and successfully rotate keys for all teams. * test: Add unit test for system user PROXY_ADMIN role - Verify internal jobs system user has PROXY_ADMIN role - Critical for key rotation to bypass team permission checks - Regression test for PR #21896 * fix: populate user_id and user_info for admin users in /user/info (#22239) * fix: populate user_id and user_info for admin users in /user/info endpoint Fixes #22179 When admin users call /user/info without a user_id parameter, the endpoint was returning null for both user_id and user_info fields. This broke budgeting tooling that relies on /user/info to look up current budget and spend. Changes: - Modified _get_user_info_for_proxy_admin() to accept user_api_key_dict parameter - Added logic to fetch admin's own user info from database - Updated function to return admin's user_id and user_info instead of null - Updated unit test to verify admin user_id is populated The fix ensures admin users get their own user information just like regular users. 
* test: make mock get_data signature match real method - Updated MockPrismaClientDB.get_data() to accept all parameters that the real method accepts - Makes mock more robust against future refactors - Added datetime and Union imports - Mock now returns None when user_id is not provided * [Fix] Pass MCP auth headers from request into tool fetch for /v1/responses and chat completions (#22291) * fixed dynamic auth for /responses with mcp * fixed greptile concern * fix(bedrock): filter internal json_tool_call when mixed with real tools Fixes #18381: When using both tools and response_format with Bedrock Converse API, LiteLLM internally adds json_tool_call to handle structured output. Bedrock may return both this internal tool AND real user-defined tools, breaking consumers like OpenAI Agents SDK. Changes: - Non-streaming: Added _filter_json_mode_tools() to handle 3 scenarios: only json_tool_call (convert to content), mixed (filter it out), or no json_tool_call (pass through) - Streaming: Added json_mode tracking to AWSEventStreamDecoder to suppress json_tool_call chunks and convert to text content - Fixed optional_params.pop() mutation issue Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * refactor: extract duplicated JSON unwrapping into helper method Addresses review comment from greptile-apps: #21107 (review) Changes: - Added `_unwrap_bedrock_properties()` helper method to eliminate code duplication - Replaced two identical JSON unwrapping blocks (lines 1592-1601 and 1612-1620) with calls to the new helper method - Improves maintainability - single source of truth for Bedrock properties unwrapping logic The helper method: - Parses JSON string - Checks for single "properties" key structure - Unwraps and returns the properties value - Returns original string if unwrapping not needed or parsing fails No functional changes - pure refactoring. 
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: use correct class name AmazonConverseConfig in helper method calls Fixed MyPy errors where BedrockConverseConfig was used instead of AmazonConverseConfig in the _unwrap_bedrock_properties() calls. Errors: - Line 1619: BedrockConverseConfig -> AmazonConverseConfig - Line 1631: BedrockConverseConfig -> AmazonConverseConfig Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: shorten guardrail benchmark result filenames for Windows long path support Fixes #21941 The generated result filenames from _save_confusion_results contained parentheses, dots, and full yaml filenames, producing paths that exceed the Windows 260-char MAX_PATH limit. Rework the safe_label logic to produce short {topic}_{method_abbrev} filenames (e.g. insults_cf.json) while preserving the full label inside the JSON content. Rename existing tracked result files to match the new naming convention. * Update litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/guardrail_benchmarks/test_eval.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Remove Apache 2 license from SKILL.md (#22322) * fix(mcp): default available_on_public_internet to true (#22331) * fix(mcp): default available_on_public_internet to true MCPs were defaulting to private (available_on_public_internet=false) which was a breaking change. This reverts the default to public (true) across: - Pydantic models (AddMCPServerRequest, UpdateMCPServerRequest, LiteLLM_MCPServerTable) - Prisma schema @default - mcp_server_manager.py YAML config + DB loading fallbacks - UI form initialValue and setFieldValue defaults * fix(ui): add forceRender to Collapse.Panel so toggle defaults render correctly Ant Design's Collapse.Panel lazy-renders children by default. 
Without forceRender, the Form.Item for 'Available on Public Internet' isn't mounted when the useEffect fires form.setFieldValue, causing the Switch to visually show OFF even though the intended default is true. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(mcp): update remaining schema copies and MCPServer type default to true Missed in previous commit per Greptile review: - schema.prisma (root) - litellm-proxy-extras/litellm_proxy_extras/schema.prisma - litellm/types/mcp_server/mcp_server_manager.py MCPServer class * ui(mcp): reframe network access as 'Internal network only' restriction Replace scary 'Available on Public Internet' toggle with 'Internal network only' opt-in restriction. Toggle OFF (default) = all networks allowed. Toggle ON = restricted to internal network only. Auth is always required either way. - MCPPermissionManagement: new label/tooltip/description, invert display via getValueProps/getValueFromEvent so underlying available_on_public_internet value is unchanged - mcp_server_view: 'Public' → 'All networks', 'Internal' → 'Internal only' (orange) - mcp_server_columns: same badge updates --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(jwt): OIDC discovery URLs, roles array handling, dot-notation error hints (#22336) * fix(jwt): support OIDC discovery URLs, handle roles array, improve error hints Three fixes for Azure AD JWT auth: 1. OIDC discovery URL support - JWT_PUBLIC_KEY_URL can now be set to .well-known/openid-configuration endpoints. The proxy fetches the discovery doc, extracts jwks_uri, and caches it. 2. Handle roles claim as array - when team_id_jwt_field points to a list (e.g. AAD's "roles": ["team1"]), auto-unwrap the first element instead of crashing with 'unhashable type: list'. 3. 
Better error hint for dot-notation indexing - when team_id_jwt_field is set to "roles.0" or "roles[0]", the 401 error now explains to use "roles" instead and that LiteLLM auto-unwraps lists. * Add integration demo script for JWT auth fixes (OIDC discovery, array roles, dot-notation hints) Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Add demo_servers.py for manual JWT auth testing with mock JWKS/OIDC endpoints Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Add demo screenshots for PR comment Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Add integration test results with screenshots for PR review Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * address greptile review feedback (greploop iteration 1) - fix: add HTTP status code check in _resolve_jwks_url before parsing JSON - fix: remove misleading bracket-notation hint from debug log (get_nested_value does not support it) * Update tests/test_litellm/proxy/auth/test_handle_jwt.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * remove demo scripts and assets --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * perf: streaming latency improvements — 4 targeted hot-path fixes (#22346) * perf: raise aiohttp connection pool limits (300→1000, 50/host→500) * perf: skip model_copy() on every chunk — only copy usage-bearing chunks * perf: replace list+join O(n²) with str+= O(n) in async_data_generator * perf: cache model-level guardrail lookup per request, not per chunk * test: add comprehensive Vitest coverage for CostTrackingSettings Add 88 tests across 9 test files for the CostTrackingSettings component directory: - provider_display_helpers.test.ts: 9 tests for helper functions - how_it_works.test.tsx: 9 tests for discount 
calculator component - add_provider_form.test.tsx: 7 tests for provider form validation - add_margin_form.test.tsx: 9 tests for margin form with type toggle - provider_discount_table.test.tsx: 12 tests for table editing and interactions - provider_margin_table.test.tsx: 13 tests for margin table with sorting - use_discount_config.test.ts: 11 tests for discount hook logic - use_margin_config.test.ts: 12 tests for margin hook logic - cost_tracking_settings.test.tsx: 15 tests for main component and role-based rendering All tests passing. Coverage includes form validation, user interactions, API calls, state management, and conditional rendering. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com> * [Feature] Key list endpoint: Add project_id and access_group_id filters Add filtering capabilities to /key/list endpoint for project_id and access_group_id parameters. Both filters work globally across all visibility rules and stack with existing sort/pagination params. Added comprehensive unit tests for the new filters. 
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com> * [Feature] UI - Projects: Add Project Details page with Edit modal - Add ProjectDetailsPage with header, details card, spend/budget progress, model spend bar chart, keys placeholder, and team info card - Refactor CreateProjectModal into base form pattern (ProjectBaseForm) shared between Create and Edit flows - Add EditProjectModal with pre-filled form data from backend - Add useProjectDetails and useUpdateProject hooks - Add duplicate key validation for model limits and metadata - Wire project ID click in table to navigate to detail view - Move pagination inline with search bar Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Update ui/litellm-dashboard/src/components/Projects/ProjectModals/CreateProjectModal.tsx Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix(types): filter null fields from reasoning output items in ResponsesAPIResponse When providers return reasoning items without status/content/encrypted_content, Pydantic's Optional defaults serialize them as null. This breaks downstream SDKs (e.g., the OpenAI C# SDK crashes on status=null). Add a field_serializer on ResponsesAPIResponse.output that removes null status, content, and encrypted_content from reasoning items during serialization. This mirrors the request-side filtering already done in OpenAIResponsesAPIConfig._handle_reasoning_item(). 
Fixes #16824 --------- Co-authored-by: Zero Clover <zero@root.me> Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu> Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com> Co-authored-by: Dylan Duan <dylan.duan@assemblyai.com> Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Shivam Rawat <161387515+shivamrawat1@users.noreply.github.com> Co-authored-by: Brian Caswell <bcaswell@microsoft.com> Co-authored-by: Brian Caswell <bcaswell@gmail.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: rasmi <rrelasmar@gmail.com> Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
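
For reference, the null filtering described in the fix(types) entry above could look roughly like the following. This is a minimal sketch on plain dicts, not the actual field_serializer added to ResponsesAPIResponse; the function name `strip_null_reasoning_fields` is hypothetical, and it assumes reasoning items are identified by `type == "reasoning"`.

```python
from typing import Any, Dict, List

# Fields the commit message names as the ones to drop when they are null.
NULLABLE_REASONING_FIELDS = ("status", "content", "encrypted_content")


def strip_null_reasoning_fields(output: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: return a copy of `output` in which reasoning items
    no longer carry null status/content/encrypted_content keys.
    Non-reasoning items are passed through unchanged."""
    cleaned: List[Dict[str, Any]] = []
    for item in output:
        # Assumption: reasoning items are tagged with type == "reasoning".
        if item.get("type") == "reasoning":
            item = {
                key: value
                for key, value in item.items()
                if not (key in NULLABLE_REASONING_FIELDS and value is None)
            }
        cleaned.append(item)
    return cleaned
```

Only the three named fields are dropped, and only when they are null, so a reasoning item with a real `status` or `content` value serializes as before.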

…on check for GA path (#22369) * fix(image_generation): propagate extra_headers to OpenAI image generation Add headers parameter to image_generation() and aimage_generation() methods in OpenAI provider, and pass headers from images/main.py to ensure custom headers like cf-aig-authorization are properly forwarded to the OpenAI API. Aligns behavior with completion() method and Azure provider implementation. * test(image_generation): add tests for extra_headers propagation Verify that extra_headers are correctly forwarded to OpenAI's images.generate() in both sync and async paths, and that they are absent when not provided. * Add Prometheus child_exit cleanup for gunicorn workers When a gunicorn worker exits (e.g. from max_requests recycling), its per-process prometheus .db files remain on disk. For gauges using livesum/liveall mode, this means the dead worker's last-known values persist as if the process were still alive. Wire gunicorn's child_exit hook to call mark_process_dead() so live-tracking gauges accurately reflect only running workers. * docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway (#21130) * docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway provider config * feat: add AssemblyAI LLM Gateway as OpenAI-compatible provider * fix(mcp): update test mocks to use renamed filter_server_ids_by_ip_with_info Tests were mocking the old method name `filter_server_ids_by_ip` but production code at server.py:774 calls `filter_server_ids_by_ip_with_info` which returns a (server_ids, blocked_count) tuple. The unmocked method on AsyncMock returned a coroutine, causing "cannot unpack non-iterable coroutine object" errors. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): update realtime guardrail test assertions for voice violation behavior Tests were asserting no response.create/conversation.item.create sent to backend when guardrail blocks, but the implementation intentionally sends these to have the LLM voice the guardrail violation message to the user. Updated assertions to verify the correct guardrail flow: - response.cancel is sent to stop any in-progress response - conversation.item.create with violation message is injected - response.create is sent to voice the violation - original blocked content is NOT forwarded Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(bedrock): restore parallel_tool_calls mapping in map_openai_params The revert in 8565c70 removed the parallel_tool_calls handling from map_openai_params, and the subsequent fix d0445e1 only re-added the transform_request consumption but forgot to re-add the map_openai_params producer that sets _parallel_tool_use_config. This meant parallel_tool_calls was silently ignored for all Bedrock models. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): update Azure pass-through test to mock litellm.completion Commit 99c62ca removed "azure" from _RESPONSES_API_PROVIDERS, routing Azure models through litellm.completion instead of litellm.responses. The test was not updated to match, causing it to assert against the wrong mock. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add in_flight_requests metric to /health/backlog + prometheus (#22319) * feat: add in_flight_requests metric to /health/backlog + prometheus * refactor: clean class with static methods, add tests, fix sentinel pattern * docs: add in_flight_requests to prometheus metrics and latency troubleshooting * fix(db): add missing migration for LiteLLM_ClaudeCodePluginTable PR #22271 added the LiteLLM_ClaudeCodePluginTable model to schema.prisma but did not include a corresponding migration file, causing test_aaaasschema_migration_check to fail. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: update stale docstring to match guardrail voicing behavior Addresses Greptile review feedback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(caching): store background task references in LLMClientCache._remove_key to prevent unawaited coroutine warnings Fixes #22128 * [Feat] Agent RBAC Permission Fix - Ensure Internal Users cannot create agents (#22329) * fix: enforce RBAC on agent endpoints — block non-admin create/update/delete - Add /v1/agents/{agent_id} to agent_routes so internal users can access GET-by-ID (previously returned 403 due to missing route pattern) - Add _check_agent_management_permission() guard to POST, PUT, PATCH, DELETE agent endpoints — only PROXY_ADMIN may mutate agents - Add user_api_key_dict param to delete_agent so the role check works - Add comprehensive unit tests for RBAC enforcement across all roles Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: mock prisma_client in internal user get-agent-by-id test Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * feat(ui): hide agent create/delete controls for non-admin users Match MCP servers pattern: wrap '+ Add New Agent' button in isAdmin conditional so internal users see a read-only agents view. Delete buttons in card and table were already gated. 
Update empty-state copy for non-admin users. Add 7 Vitest tests covering role-based visibility. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: Add PROXY_ADMIN role to system user for key rotation (#21896) * fix: Add PROXY_ADMIN role to system user for key rotation The key rotation worker was failing with 'You are not authorized to regenerate this key' when rotating team keys. This was because the system user created by get_litellm_internal_jobs_user_api_key_auth() was missing the user_role field. Without user_role=PROXY_ADMIN, the system user couldn't bypass team permission checks in can_team_member_execute_key_management_endpoint(), causing authorization failures for team key rotation. This fix adds user_role=LitellmUserRoles.PROXY_ADMIN to the system user, allowing it to bypass team permission checks and successfully rotate keys for all teams. * test: Add unit test for system user PROXY_ADMIN role - Verify internal jobs system user has PROXY_ADMIN role - Critical for key rotation to bypass team permission checks - Regression test for PR #21896 * fix: populate user_id and user_info for admin users in /user/info (#22239) * fix: populate user_id and user_info for admin users in /user/info endpoint Fixes #22179 When admin users call /user/info without a user_id parameter, the endpoint was returning null for both user_id and user_info fields. This broke budgeting tooling that relies on /user/info to look up current budget and spend. Changes: - Modified _get_user_info_for_proxy_admin() to accept user_api_key_dict parameter - Added logic to fetch admin's own user info from database - Updated function to return admin's user_id and user_info instead of null - Updated unit test to verify admin user_id is populated The fix ensures admin users get their own user information just like regular users. 
* test: make mock get_data signature match real method - Updated MockPrismaClientDB.get_data() to accept all parameters that the real method accepts - Makes mock more robust against future refactors - Added datetime and Union imports - Mock now returns None when user_id is not provided * [Fix] Pass MCP auth headers from request into tool fetch for /v1/responses and chat completions (#22291) * fixed dynamic auth for /responses with mcp * fixed greptile concern * fix(bedrock): filter internal json_tool_call when mixed with real tools Fixes #18381: When using both tools and response_format with Bedrock Converse API, LiteLLM internally adds json_tool_call to handle structured output. Bedrock may return both this internal tool AND real user-defined tools, breaking consumers like OpenAI Agents SDK. Changes: - Non-streaming: Added _filter_json_mode_tools() to handle 3 scenarios: only json_tool_call (convert to content), mixed (filter it out), or no json_tool_call (pass through) - Streaming: Added json_mode tracking to AWSEventStreamDecoder to suppress json_tool_call chunks and convert to text content - Fixed optional_params.pop() mutation issue Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * refactor: extract duplicated JSON unwrapping into helper method Addresses review comment from greptile-apps: #21107 (review) Changes: - Added `_unwrap_bedrock_properties()` helper method to eliminate code duplication - Replaced two identical JSON unwrapping blocks (lines 1592-1601 and 1612-1620) with calls to the new helper method - Improves maintainability - single source of truth for Bedrock properties unwrapping logic The helper method: - Parses JSON string - Checks for single "properties" key structure - Unwraps and returns the properties value - Returns original string if unwrapping not needed or parsing fails No functional changes - pure refactoring. 
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: use correct class name AmazonConverseConfig in helper method calls Fixed MyPy errors where BedrockConverseConfig was used instead of AmazonConverseConfig in the _unwrap_bedrock_properties() calls. Errors: - Line 1619: BedrockConverseConfig -> AmazonConverseConfig - Line 1631: BedrockConverseConfig -> AmazonConverseConfig Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: shorten guardrail benchmark result filenames for Windows long path support Fixes #21941 The generated result filenames from _save_confusion_results contained parentheses, dots, and full yaml filenames, producing paths that exceed the Windows 260-char MAX_PATH limit. Rework the safe_label logic to produce short {topic}_{method_abbrev} filenames (e.g. insults_cf.json) while preserving the full label inside the JSON content. Rename existing tracked result files to match the new naming convention. * Update litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/guardrail_benchmarks/test_eval.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Remove Apache 2 license from SKILL.md (#22322) * fix(mcp): default available_on_public_internet to true (#22331) * fix(mcp): default available_on_public_internet to true MCPs were defaulting to private (available_on_public_internet=false) which was a breaking change. This reverts the default to public (true) across: - Pydantic models (AddMCPServerRequest, UpdateMCPServerRequest, LiteLLM_MCPServerTable) - Prisma schema @default - mcp_server_manager.py YAML config + DB loading fallbacks - UI form initialValue and setFieldValue defaults * fix(ui): add forceRender to Collapse.Panel so toggle defaults render correctly Ant Design's Collapse.Panel lazy-renders children by default. 
Without forceRender, the Form.Item for 'Available on Public Internet' isn't mounted when the useEffect fires form.setFieldValue, causing the Switch to visually show OFF even though the intended default is true. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(mcp): update remaining schema copies and MCPServer type default to true Missed in previous commit per Greptile review: - schema.prisma (root) - litellm-proxy-extras/litellm_proxy_extras/schema.prisma - litellm/types/mcp_server/mcp_server_manager.py MCPServer class * ui(mcp): reframe network access as 'Internal network only' restriction Replace scary 'Available on Public Internet' toggle with 'Internal network only' opt-in restriction. Toggle OFF (default) = all networks allowed. Toggle ON = restricted to internal network only. Auth is always required either way. - MCPPermissionManagement: new label/tooltip/description, invert display via getValueProps/getValueFromEvent so underlying available_on_public_internet value is unchanged - mcp_server_view: 'Public' → 'All networks', 'Internal' → 'Internal only' (orange) - mcp_server_columns: same badge updates --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(jwt): OIDC discovery URLs, roles array handling, dot-notation error hints (#22336) * fix(jwt): support OIDC discovery URLs, handle roles array, improve error hints Three fixes for Azure AD JWT auth: 1. OIDC discovery URL support - JWT_PUBLIC_KEY_URL can now be set to .well-known/openid-configuration endpoints. The proxy fetches the discovery doc, extracts jwks_uri, and caches it. 2. Handle roles claim as array - when team_id_jwt_field points to a list (e.g. AAD's "roles": ["team1"]), auto-unwrap the first element instead of crashing with 'unhashable type: list'. 3. 
Better error hint for dot-notation indexing - when team_id_jwt_field is set to "roles.0" or "roles[0]", the 401 error now explains to use "roles" instead and that LiteLLM auto-unwraps lists. * Add integration demo script for JWT auth fixes (OIDC discovery, array roles, dot-notation hints) Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Add demo_servers.py for manual JWT auth testing with mock JWKS/OIDC endpoints Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Add demo screenshots for PR comment Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Add integration test results with screenshots for PR review Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * address greptile review feedback (greploop iteration 1) - fix: add HTTP status code check in _resolve_jwks_url before parsing JSON - fix: remove misleading bracket-notation hint from debug log (get_nested_value does not support it) * Update tests/test_litellm/proxy/auth/test_handle_jwt.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * remove demo scripts and assets --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * perf: streaming latency improvements — 4 targeted hot-path fixes (#22346) * perf: raise aiohttp connection pool limits (300→1000, 50/host→500) * perf: skip model_copy() on every chunk — only copy usage-bearing chunks * perf: replace list+join O(n²) with str+= O(n) in async_data_generator * perf: cache model-level guardrail lookup per request, not per chunk * test: add comprehensive Vitest coverage for CostTrackingSettings Add 88 tests across 9 test files for the CostTrackingSettings component directory: - provider_display_helpers.test.ts: 9 tests for helper functions - how_it_works.test.tsx: 9 tests for discount 
calculator component - add_provider_form.test.tsx: 7 tests for provider form validation - add_margin_form.test.tsx: 9 tests for margin form with type toggle - provider_discount_table.test.tsx: 12 tests for table editing and interactions - provider_margin_table.test.tsx: 13 tests for margin table with sorting - use_discount_config.test.ts: 11 tests for discount hook logic - use_margin_config.test.ts: 12 tests for margin hook logic - cost_tracking_settings.test.tsx: 15 tests for main component and role-based rendering All tests passing. Coverage includes form validation, user interactions, API calls, state management, and conditional rendering. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com> * [Feature] Key list endpoint: Add project_id and access_group_id filters Add filtering capabilities to /key/list endpoint for project_id and access_group_id parameters. Both filters work globally across all visibility rules and stack with existing sort/pagination params. Added comprehensive unit tests for the new filters. 
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com> * [Feature] UI - Projects: Add Project Details page with Edit modal - Add ProjectDetailsPage with header, details card, spend/budget progress, model spend bar chart, keys placeholder, and team info card - Refactor CreateProjectModal into base form pattern (ProjectBaseForm) shared between Create and Edit flows - Add EditProjectModal with pre-filled form data from backend - Add useProjectDetails and useUpdateProject hooks - Add duplicate key validation for model limits and metadata - Wire project ID click in table to navigate to detail view - Move pagination inline with search bar Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Update ui/litellm-dashboard/src/components/Projects/ProjectModals/CreateProjectModal.tsx Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix(azure): forward realtime_protocol from config and relax api_version check for GA path The realtime_protocol parameter set in config.yaml litellm_params was not reliably reaching the Azure realtime handler. Add fallback chain: kwargs → litellm_params → LITELLM_AZURE_REALTIME_PROTOCOL env var → beta. Also relax the api_version validation to only require it for the beta protocol path, since the GA/v1 path does not use api_version in the URL. Make protocol matching case-insensitive so 'ga', 'GA', 'v1', 'V1' all work consistently. Fix _construct_url type signature to accept Optional api_version. 
Fixes #22127 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Zero Clover <zero@root.me> Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu> Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com> Co-authored-by: Dylan Duan <dylan.duan@assemblyai.com> Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Shivaang <shivaang.05@gmail.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Shivam Rawat <161387515+shivamrawat1@users.noreply.github.com> Co-authored-by: Brian Caswell <bcaswell@microsoft.com> Co-authored-by: Brian Caswell <bcaswell@gmail.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: rasmi <rrelasmar@gmail.com> Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
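
The fallback chain from the fix(azure) entry above (kwargs → litellm_params → LITELLM_AZURE_REALTIME_PROTOCOL env var → beta) can be sketched as below. This is an illustration under stated assumptions, not the code from the PR: the function names `resolve_realtime_protocol` and `requires_api_version` are hypothetical, and treating every value other than 'ga'/'v1' as the beta path is an assumption of this sketch.

```python
import os
from typing import Any, Dict

ENV_VAR = "LITELLM_AZURE_REALTIME_PROTOCOL"  # env var named in the commit message
GA_ALIASES = {"ga", "v1"}  # matched case-insensitively


def resolve_realtime_protocol(kwargs: Dict[str, Any], litellm_params: Dict[str, Any]) -> str:
    """Hypothetical helper: pick the first non-empty protocol value in the
    order kwargs, litellm_params, environment, then default to 'beta'."""
    candidates = (
        kwargs.get("realtime_protocol"),
        litellm_params.get("realtime_protocol"),
        os.environ.get(ENV_VAR),
    )
    for candidate in candidates:
        if candidate:
            # Normalise so 'ga', 'GA', 'v1' and 'V1' all compare equal.
            return str(candidate).strip().lower()
    return "beta"


def requires_api_version(protocol: str) -> bool:
    """Assumption: only the beta path needs api_version; GA/v1 does not."""
    return protocol.strip().lower() not in GA_ALIASES
```

With this ordering, a per-call `realtime_protocol` always wins over config.yaml, and the environment variable only applies when neither is set.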

* fix(image_generation): propagate extra_headers to OpenAI image generation Add headers parameter to image_generation() and aimage_generation() methods in OpenAI provider, and pass headers from images/main.py to ensure custom headers like cf-aig-authorization are properly forwarded to the OpenAI API. Aligns behavior with completion() method and Azure provider implementation. * test(image_generation): add tests for extra_headers propagation Verify that extra_headers are correctly forwarded to OpenAI's images.generate() in both sync and async paths, and that they are absent when not provided. * Add Prometheus child_exit cleanup for gunicorn workers When a gunicorn worker exits (e.g. from max_requests recycling), its per-process prometheus .db files remain on disk. For gauges using livesum/liveall mode, this means the dead worker's last-known values persist as if the process were still alive. Wire gunicorn's child_exit hook to call mark_process_dead() so live-tracking gauges accurately reflect only running workers. * docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway (#21130) * docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway provider config * feat: add AssemblyAI LLM Gateway as OpenAI-compatible provider * fix(mcp): update test mocks to use renamed filter_server_ids_by_ip_with_info Tests were mocking the old method name `filter_server_ids_by_ip` but production code at server.py:774 calls `filter_server_ids_by_ip_with_info` which returns a (server_ids, blocked_count) tuple. The unmocked method on AsyncMock returned a coroutine, causing "cannot unpack non-iterable coroutine object" errors. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): update realtime guardrail test assertions for voice violation behavior Tests were asserting no response.create/conversation.item.create sent to backend when guardrail blocks, but the implementation intentionally sends these to have the LLM voice the guardrail violation message to the user. Updated assertions to verify the correct guardrail flow: - response.cancel is sent to stop any in-progress response - conversation.item.create with violation message is injected - response.create is sent to voice the violation - original blocked content is NOT forwarded Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(bedrock): restore parallel_tool_calls mapping in map_openai_params The revert in 8565c70 removed the parallel_tool_calls handling from map_openai_params, and the subsequent fix d0445e1 only re-added the transform_request consumption but forgot to re-add the map_openai_params producer that sets _parallel_tool_use_config. This meant parallel_tool_calls was silently ignored for all Bedrock models. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): update Azure pass-through test to mock litellm.completion Commit 99c62ca removed "azure" from _RESPONSES_API_PROVIDERS, routing Azure models through litellm.completion instead of litellm.responses. The test was not updated to match, causing it to assert against the wrong mock. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add in_flight_requests metric to /health/backlog + prometheus (#22319) * feat: add in_flight_requests metric to /health/backlog + prometheus * refactor: clean class with static methods, add tests, fix sentinel pattern * docs: add in_flight_requests to prometheus metrics and latency troubleshooting * fix(db): add missing migration for LiteLLM_ClaudeCodePluginTable PR #22271 added the LiteLLM_ClaudeCodePluginTable model to schema.prisma but did not include a corresponding migration file, causing test_aaaasschema_migration_check to fail. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: update stale docstring to match guardrail voicing behavior Addresses Greptile review feedback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(caching): store background task references in LLMClientCache._remove_key to prevent unawaited coroutine warnings Fixes #22128 * [Feat] Agent RBAC Permission Fix - Ensure Internal Users cannot create agents (#22329) * fix: enforce RBAC on agent endpoints — block non-admin create/update/delete - Add /v1/agents/{agent_id} to agent_routes so internal users can access GET-by-ID (previously returned 403 due to missing route pattern) - Add _check_agent_management_permission() guard to POST, PUT, PATCH, DELETE agent endpoints — only PROXY_ADMIN may mutate agents - Add user_api_key_dict param to delete_agent so the role check works - Add comprehensive unit tests for RBAC enforcement across all roles Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: mock prisma_client in internal user get-agent-by-id test Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * feat(ui): hide agent create/delete controls for non-admin users Match MCP servers pattern: wrap '+ Add New Agent' button in isAdmin conditional so internal users see a read-only agents view. Delete buttons in card and table were already gated. 
Update empty-state copy for non-admin users. Add 7 Vitest tests covering role-based visibility. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: Add PROXY_ADMIN role to system user for key rotation (#21896) * fix: Add PROXY_ADMIN role to system user for key rotation The key rotation worker was failing with 'You are not authorized to regenerate this key' when rotating team keys. This was because the system user created by get_litellm_internal_jobs_user_api_key_auth() was missing the user_role field. Without user_role=PROXY_ADMIN, the system user couldn't bypass team permission checks in can_team_member_execute_key_management_endpoint(), causing authorization failures for team key rotation. This fix adds user_role=LitellmUserRoles.PROXY_ADMIN to the system user, allowing it to bypass team permission checks and successfully rotate keys for all teams. * test: Add unit test for system user PROXY_ADMIN role - Verify internal jobs system user has PROXY_ADMIN role - Critical for key rotation to bypass team permission checks - Regression test for PR #21896 * fix: populate user_id and user_info for admin users in /user/info (#22239) * fix: populate user_id and user_info for admin users in /user/info endpoint Fixes #22179 When admin users call /user/info without a user_id parameter, the endpoint was returning null for both user_id and user_info fields. This broke budgeting tooling that relies on /user/info to look up current budget and spend. Changes: - Modified _get_user_info_for_proxy_admin() to accept user_api_key_dict parameter - Added logic to fetch admin's own user info from database - Updated function to return admin's user_id and user_info instead of null - Updated unit test to verify admin user_id is populated The fix ensures admin users get their own user information just like regular users. 
* test: make mock get_data signature match real method - Updated MockPrismaClientDB.get_data() to accept all parameters that the real method accepts - Makes mock more robust against future refactors - Added datetime and Union imports - Mock now returns None when user_id is not provided * [Fix] Pass MCP auth headers from request into tool fetch for /v1/responses and chat completions (#22291) * fixed dynamic auth for /responses with mcp * fixed greptile concern * fix(bedrock): filter internal json_tool_call when mixed with real tools Fixes #18381: When using both tools and response_format with Bedrock Converse API, LiteLLM internally adds json_tool_call to handle structured output. Bedrock may return both this internal tool AND real user-defined tools, breaking consumers like OpenAI Agents SDK. Changes: - Non-streaming: Added _filter_json_mode_tools() to handle 3 scenarios: only json_tool_call (convert to content), mixed (filter it out), or no json_tool_call (pass through) - Streaming: Added json_mode tracking to AWSEventStreamDecoder to suppress json_tool_call chunks and convert to text content - Fixed optional_params.pop() mutation issue Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * refactor: extract duplicated JSON unwrapping into helper method Addresses review comment from greptile-apps: #21107 (review) Changes: - Added `_unwrap_bedrock_properties()` helper method to eliminate code duplication - Replaced two identical JSON unwrapping blocks (lines 1592-1601 and 1612-1620) with calls to the new helper method - Improves maintainability - single source of truth for Bedrock properties unwrapping logic The helper method: - Parses JSON string - Checks for single "properties" key structure - Unwraps and returns the properties value - Returns original string if unwrapping not needed or parsing fails No functional changes - pure refactoring. 
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: use correct class name AmazonConverseConfig in helper method calls Fixed MyPy errors where BedrockConverseConfig was used instead of AmazonConverseConfig in the _unwrap_bedrock_properties() calls. Errors: - Line 1619: BedrockConverseConfig -> AmazonConverseConfig - Line 1631: BedrockConverseConfig -> AmazonConverseConfig Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: shorten guardrail benchmark result filenames for Windows long path support Fixes #21941 The generated result filenames from _save_confusion_results contained parentheses, dots, and full yaml filenames, producing paths that exceed the Windows 260-char MAX_PATH limit. Rework the safe_label logic to produce short {topic}_{method_abbrev} filenames (e.g. insults_cf.json) while preserving the full label inside the JSON content. Rename existing tracked result files to match the new naming convention. * Update litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/guardrail_benchmarks/test_eval.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Remove Apache 2 license from SKILL.md (#22322) * fix(mcp): default available_on_public_internet to true (#22331) * fix(mcp): default available_on_public_internet to true MCPs were defaulting to private (available_on_public_internet=false) which was a breaking change. This reverts the default to public (true) across: - Pydantic models (AddMCPServerRequest, UpdateMCPServerRequest, LiteLLM_MCPServerTable) - Prisma schema @default - mcp_server_manager.py YAML config + DB loading fallbacks - UI form initialValue and setFieldValue defaults * fix(ui): add forceRender to Collapse.Panel so toggle defaults render correctly Ant Design's Collapse.Panel lazy-renders children by default. 
Without forceRender, the Form.Item for 'Available on Public Internet' isn't mounted when the useEffect fires form.setFieldValue, causing the Switch to visually show OFF even though the intended default is true. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(mcp): update remaining schema copies and MCPServer type default to true Missed in previous commit per Greptile review: - schema.prisma (root) - litellm-proxy-extras/litellm_proxy_extras/schema.prisma - litellm/types/mcp_server/mcp_server_manager.py MCPServer class * ui(mcp): reframe network access as 'Internal network only' restriction Replace scary 'Available on Public Internet' toggle with 'Internal network only' opt-in restriction. Toggle OFF (default) = all networks allowed. Toggle ON = restricted to internal network only. Auth is always required either way. - MCPPermissionManagement: new label/tooltip/description, invert display via getValueProps/getValueFromEvent so underlying available_on_public_internet value is unchanged - mcp_server_view: 'Public' → 'All networks', 'Internal' → 'Internal only' (orange) - mcp_server_columns: same badge updates --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(jwt): OIDC discovery URLs, roles array handling, dot-notation error hints (#22336) * fix(jwt): support OIDC discovery URLs, handle roles array, improve error hints Three fixes for Azure AD JWT auth: 1. OIDC discovery URL support - JWT_PUBLIC_KEY_URL can now be set to .well-known/openid-configuration endpoints. The proxy fetches the discovery doc, extracts jwks_uri, and caches it. 2. Handle roles claim as array - when team_id_jwt_field points to a list (e.g. AAD's "roles": ["team1"]), auto-unwrap the first element instead of crashing with 'unhashable type: list'. 3. 
Better error hint for dot-notation indexing - when team_id_jwt_field is set to "roles.0" or "roles[0]", the 401 error now explains to use "roles" instead and that LiteLLM auto-unwraps lists. * Add integration demo script for JWT auth fixes (OIDC discovery, array roles, dot-notation hints) Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Add demo_servers.py for manual JWT auth testing with mock JWKS/OIDC endpoints Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Add demo screenshots for PR comment Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Add integration test results with screenshots for PR review Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * address greptile review feedback (greploop iteration 1) - fix: add HTTP status code check in _resolve_jwks_url before parsing JSON - fix: remove misleading bracket-notation hint from debug log (get_nested_value does not support it) * Update tests/test_litellm/proxy/auth/test_handle_jwt.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * remove demo scripts and assets --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * perf: streaming latency improvements — 4 targeted hot-path fixes (#22346) * perf: raise aiohttp connection pool limits (300→1000, 50/host→500) * perf: skip model_copy() on every chunk — only copy usage-bearing chunks * perf: replace list+join O(n²) with str+= O(n) in async_data_generator * perf: cache model-level guardrail lookup per request, not per chunk * test: add comprehensive Vitest coverage for CostTrackingSettings Add 88 tests across 9 test files for the CostTrackingSettings component directory: - provider_display_helpers.test.ts: 9 tests for helper functions - how_it_works.test.tsx: 9 tests for discount 
calculator component - add_provider_form.test.tsx: 7 tests for provider form validation - add_margin_form.test.tsx: 9 tests for margin form with type toggle - provider_discount_table.test.tsx: 12 tests for table editing and interactions - provider_margin_table.test.tsx: 13 tests for margin table with sorting - use_discount_config.test.ts: 11 tests for discount hook logic - use_margin_config.test.ts: 12 tests for margin hook logic - cost_tracking_settings.test.tsx: 15 tests for main component and role-based rendering All tests passing. Coverage includes form validation, user interactions, API calls, state management, and conditional rendering. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com> * [Feature] Key list endpoint: Add project_id and access_group_id filters Add filtering capabilities to /key/list endpoint for project_id and access_group_id parameters. Both filters work globally across all visibility rules and stack with existing sort/pagination params. Added comprehensive unit tests for the new filters. 
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com> * [Feature] UI - Projects: Add Project Details page with Edit modal - Add ProjectDetailsPage with header, details card, spend/budget progress, model spend bar chart, keys placeholder, and team info card - Refactor CreateProjectModal into base form pattern (ProjectBaseForm) shared between Create and Edit flows - Add EditProjectModal with pre-filled form data from backend - Add useProjectDetails and useUpdateProject hooks - Add duplicate key validation for model limits and metadata - Wire project ID click in table to navigate to detail view - Move pagination inline with search bar Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Update ui/litellm-dashboard/src/components/Projects/ProjectModals/CreateProjectModal.tsx Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix(anthropic): handle OAuth tokens in count_tokens endpoint The count_tokens API's get_required_headers() always set x-api-key, which is incorrect for OAuth tokens (sk-ant-oat*). These tokens must use Authorization: Bearer instead. Changes: - Add optionally_handle_anthropic_oauth() call in get_required_headers() to convert OAuth tokens from x-api-key to Authorization: Bearer - Add _merge_beta_headers() helper to preserve existing anthropic-beta values (e.g. 
token-counting) when appending the OAuth beta header - Add 7 tests covering regular and OAuth header generation Fixes #22040 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Zero Clover <zero@root.me> Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu> Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com> Co-authored-by: Dylan Duan <dylan.duan@assemblyai.com> Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Shivaang <shivaang.05@gmail.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Shivam Rawat <161387515+shivamrawat1@users.noreply.github.com> Co-authored-by: Brian Caswell <bcaswell@microsoft.com> Co-authored-by: Brian Caswell <bcaswell@gmail.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: rasmi <rrelasmar@gmail.com> Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
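For the JWT fix in the commit body above (a roles claim delivered as an array), the unwrapping step can be sketched as follows. This is a minimal illustration, not the code from the PR: the function name `unwrap_team_id` is hypothetical, and returning `None` for an empty list is an assumption the commit message does not state.

```python
from typing import Any, Optional


def unwrap_team_id(claim_value: Any) -> Optional[str]:
    """Return a hashable team id from a claim that may be a string or a list.

    Hypothetical helper: the name and the empty-list behaviour are assumptions.
    """
    if isinstance(claim_value, list):
        # Take the first element, as the commit message describes.
        return claim_value[0] if claim_value else None
    return claim_value


# An Azure AD style token payload where "roles" is an array.
token = {"roles": ["team1"], "sub": "user-123"}
team_id = unwrap_team_id(token.get("roles"))

# A list cannot be a dict key ("unhashable type: list"); the unwrapped string can.
teams = {team_id: "Team One"}
```

With this shape, `team_id_jwt_field` would be set to `roles` rather than `roles.0` or `roles[0]`, which matches the hint the commit says the 401 error now gives.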

* Add Prometheus child_exit cleanup for gunicorn workers When a gunicorn worker exits (e.g. from max_requests recycling), its per-process prometheus .db files remain on disk. For gauges using livesum/liveall mode, this means the dead worker's last-known values persist as if the process were still alive. Wire gunicorn's child_exit hook to call mark_process_dead() so live-tracking gauges accurately reflect only running workers. * docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway (#21130) * docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway provider config * feat: add AssemblyAI LLM Gateway as OpenAI-compatible provider * fix(mcp): update test mocks to use renamed filter_server_ids_by_ip_with_info Tests were mocking the old method name `filter_server_ids_by_ip` but production code at server.py:774 calls `filter_server_ids_by_ip_with_info` which returns a (server_ids, blocked_count) tuple. The unmocked method on AsyncMock returned a coroutine, causing "cannot unpack non-iterable coroutine object" errors. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): update realtime guardrail test assertions for voice violation behavior Tests were asserting no response.create/conversation.item.create sent to backend when guardrail blocks, but the implementation intentionally sends these to have the LLM voice the guardrail violation message to the user. 
Updated assertions to verify the correct guardrail flow: - response.cancel is sent to stop any in-progress response - conversation.item.create with violation message is injected - response.create is sent to voice the violation - original blocked content is NOT forwarded Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(bedrock): restore parallel_tool_calls mapping in map_openai_params The revert in 8565c70 removed the parallel_tool_calls handling from map_openai_params, and the subsequent fix d0445e1 only re-added the transform_request consumption but forgot to re-add the map_openai_params producer that sets _parallel_tool_use_config. This meant parallel_tool_calls was silently ignored for all Bedrock models. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): update Azure pass-through test to mock litellm.completion Commit 99c62ca removed "azure" from _RESPONSES_API_PROVIDERS, routing Azure models through litellm.completion instead of litellm.responses. The test was not updated to match, causing it to assert against the wrong mock. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add in_flight_requests metric to /health/backlog + prometheus (#22319) * feat: add in_flight_requests metric to /health/backlog + prometheus * refactor: clean class with static methods, add tests, fix sentinel pattern * docs: add in_flight_requests to prometheus metrics and latency troubleshooting * fix(db): add missing migration for LiteLLM_ClaudeCodePluginTable PR #22271 added the LiteLLM_ClaudeCodePluginTable model to schema.prisma but did not include a corresponding migration file, causing test_aaaasschema_migration_check to fail. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: update stale docstring to match guardrail voicing behavior Addresses Greptile review feedback. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [Feat] Agent RBAC Permission Fix - Ensure Internal Users cannot create agents (#22329) * fix: enforce RBAC on agent endpoints — block non-admin create/update/delete - Add /v1/agents/{agent_id} to agent_routes so internal users can access GET-by-ID (previously returned 403 due to missing route pattern) - Add _check_agent_management_permission() guard to POST, PUT, PATCH, DELETE agent endpoints — only PROXY_ADMIN may mutate agents - Add user_api_key_dict param to delete_agent so the role check works - Add comprehensive unit tests for RBAC enforcement across all roles Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: mock prisma_client in internal user get-agent-by-id test Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * feat(ui): hide agent create/delete controls for non-admin users Match MCP servers pattern: wrap '+ Add New Agent' button in isAdmin conditional so internal users see a read-only agents view. Delete buttons in card and table were already gated. Update empty-state copy for non-admin users. Add 7 Vitest tests covering role-based visibility. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: exclude gpt-5.2-chat from temperature passthrough (#21911) gpt-5.2-chat and gpt-5.2-chat-latest only support temperature=1 (like base gpt-5), not arbitrary values (like gpt-5.2). Update is_model_gpt_5_1_model() to exclude gpt-5.2-chat variants so drop_params correctly drops unsupported temperature values. 
Fixes #21911 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu> Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com> Co-authored-by: Dylan Duan <dylan.duan@assemblyai.com> Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
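The in_flight_requests commit listed above is described in the PR as a pure ASGI middleware that increments a class-level counter when a request starts and decrements it in `finally`. A rough sketch of that pattern, using only the standard library, might look like this. The class name `InFlightCounterMiddleware`, the `get_count` helper, and the pass-through for non-HTTP scopes are assumptions for illustration; the Prometheus gauge is left out entirely.

```python
import asyncio


class InFlightCounterMiddleware:
    """Pure ASGI middleware that counts HTTP requests currently being handled.

    Illustrative only: names and the non-HTTP pass-through are assumptions.
    """

    _in_flight = 0  # class-level, so shared by every instance in this process

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Lifespan and websocket events are passed through uncounted.
            await self.app(scope, receive, send)
            return
        InFlightCounterMiddleware._in_flight += 1
        try:
            await self.app(scope, receive, send)
        finally:
            # Runs on success and on error, so the counter cannot leak.
            InFlightCounterMiddleware._in_flight -= 1

    @staticmethod
    def get_count():
        return InFlightCounterMiddleware._in_flight


async def demo():
    seen = []

    async def ok_app(scope, receive, send):
        # Record the counter value while the request is being handled.
        seen.append(InFlightCounterMiddleware.get_count())

    async def failing_app(scope, receive, send):
        raise RuntimeError("handler failed")

    await InFlightCounterMiddleware(ok_app)({"type": "http"}, None, None)
    try:
        await InFlightCounterMiddleware(failing_app)({"type": "http"}, None, None)
    except RuntimeError:
        pass
    return seen, InFlightCounterMiddleware.get_count()
```

Running `asyncio.run(demo())` exercises one successful and one failing request. Because the decrement sits in `finally`, the counter returns to zero after the failing handler as well.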


* fix(image_generation): propagate extra_headers to OpenAI image generation Add headers parameter to image_generation() and aimage_generation() methods in OpenAI provider, and pass headers from images/main.py to ensure custom headers like cf-aig-authorization are properly forwarded to the OpenAI API. Aligns behavior with completion() method and Azure provider implementation. * test(image_generation): add tests for extra_headers propagation Verify that extra_headers are correctly forwarded to OpenAI's images.generate() in both sync and async paths, and that they are absent when not provided. * Add Prometheus child_exit cleanup for gunicorn workers When a gunicorn worker exits (e.g. from max_requests recycling), its per-process prometheus .db files remain on disk. For gauges using livesum/liveall mode, this means the dead worker's last-known values persist as if the process were still alive. Wire gunicorn's child_exit hook to call mark_process_dead() so live-tracking gauges accurately reflect only running workers. * docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway (#21130) * docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway provider config * feat: add AssemblyAI LLM Gateway as OpenAI-compatible provider * fix(mcp): update test mocks to use renamed filter_server_ids_by_ip_with_info Tests were mocking the old method name `filter_server_ids_by_ip` but production code at server.py:774 calls `filter_server_ids_by_ip_with_info` which returns a (server_ids, blocked_count) tuple. The unmocked method on AsyncMock returned a coroutine, causing "cannot unpack non-iterable coroutine object" errors. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): update realtime guardrail test assertions for voice violation behavior Tests were asserting no response.create/conversation.item.create sent to backend when guardrail blocks, but the implementation intentionally sends these to have the LLM voice the guardrail violation message to the user. Updated assertions to verify the correct guardrail flow: - response.cancel is sent to stop any in-progress response - conversation.item.create with violation message is injected - response.create is sent to voice the violation - original blocked content is NOT forwarded Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(bedrock): restore parallel_tool_calls mapping in map_openai_params The revert in 8565c70 removed the parallel_tool_calls handling from map_openai_params, and the subsequent fix d0445e1 only re-added the transform_request consumption but forgot to re-add the map_openai_params producer that sets _parallel_tool_use_config. This meant parallel_tool_calls was silently ignored for all Bedrock models. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): update Azure pass-through test to mock litellm.completion Commit 99c62ca removed "azure" from _RESPONSES_API_PROVIDERS, routing Azure models through litellm.completion instead of litellm.responses. The test was not updated to match, causing it to assert against the wrong mock. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add in_flight_requests metric to /health/backlog + prometheus (#22319) * feat: add in_flight_requests metric to /health/backlog + prometheus * refactor: clean class with static methods, add tests, fix sentinel pattern * docs: add in_flight_requests to prometheus metrics and latency troubleshooting * fix(db): add missing migration for LiteLLM_ClaudeCodePluginTable PR #22271 added the LiteLLM_ClaudeCodePluginTable model to schema.prisma but did not include a corresponding migration file, causing test_aaaasschema_migration_check to fail. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: update stale docstring to match guardrail voicing behavior Addresses Greptile review feedback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(caching): store background task references in LLMClientCache._remove_key to prevent unawaited coroutine warnings Fixes #22128 * [Feat] Agent RBAC Permission Fix - Ensure Internal Users cannot create agents (#22329) * fix: enforce RBAC on agent endpoints — block non-admin create/update/delete - Add /v1/agents/{agent_id} to agent_routes so internal users can access GET-by-ID (previously returned 403 due to missing route pattern) - Add _check_agent_management_permission() guard to POST, PUT, PATCH, DELETE agent endpoints — only PROXY_ADMIN may mutate agents - Add user_api_key_dict param to delete_agent so the role check works - Add comprehensive unit tests for RBAC enforcement across all roles Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: mock prisma_client in internal user get-agent-by-id test Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * feat(ui): hide agent create/delete controls for non-admin users Match MCP servers pattern: wrap '+ Add New Agent' button in isAdmin conditional so internal users see a read-only agents view. Delete buttons in card and table were already gated. 
Update empty-state copy for non-admin users. Add 7 Vitest tests covering role-based visibility. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: Add PROXY_ADMIN role to system user for key rotation (#21896) * fix: Add PROXY_ADMIN role to system user for key rotation The key rotation worker was failing with 'You are not authorized to regenerate this key' when rotating team keys. This was because the system user created by get_litellm_internal_jobs_user_api_key_auth() was missing the user_role field. Without user_role=PROXY_ADMIN, the system user couldn't bypass team permission checks in can_team_member_execute_key_management_endpoint(), causing authorization failures for team key rotation. This fix adds user_role=LitellmUserRoles.PROXY_ADMIN to the system user, allowing it to bypass team permission checks and successfully rotate keys for all teams. * test: Add unit test for system user PROXY_ADMIN role - Verify internal jobs system user has PROXY_ADMIN role - Critical for key rotation to bypass team permission checks - Regression test for PR #21896 * fix: populate user_id and user_info for admin users in /user/info (#22239) * fix: populate user_id and user_info for admin users in /user/info endpoint Fixes #22179 When admin users call /user/info without a user_id parameter, the endpoint was returning null for both user_id and user_info fields. This broke budgeting tooling that relies on /user/info to look up current budget and spend. Changes: - Modified _get_user_info_for_proxy_admin() to accept user_api_key_dict parameter - Added logic to fetch admin's own user info from database - Updated function to return admin's user_id and user_info instead of null - Updated unit test to verify admin user_id is populated The fix ensures admin users get their own user information just like regular users. 
* test: make mock get_data signature match real method - Updated MockPrismaClientDB.get_data() to accept all parameters that the real method accepts - Makes mock more robust against future refactors - Added datetime and Union imports - Mock now returns None when user_id is not provided * [Fix] Pass MCP auth headers from request into tool fetch for /v1/responses and chat completions (#22291) * fixed dynamic auth for /responses with mcp * fixed greptile concern * fix(bedrock): filter internal json_tool_call when mixed with real tools Fixes #18381: When using both tools and response_format with Bedrock Converse API, LiteLLM internally adds json_tool_call to handle structured output. Bedrock may return both this internal tool AND real user-defined tools, breaking consumers like OpenAI Agents SDK. Changes: - Non-streaming: Added _filter_json_mode_tools() to handle 3 scenarios: only json_tool_call (convert to content), mixed (filter it out), or no json_tool_call (pass through) - Streaming: Added json_mode tracking to AWSEventStreamDecoder to suppress json_tool_call chunks and convert to text content - Fixed optional_params.pop() mutation issue Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * refactor: extract duplicated JSON unwrapping into helper method Addresses review comment from greptile-apps: #21107 (review) Changes: - Added `_unwrap_bedrock_properties()` helper method to eliminate code duplication - Replaced two identical JSON unwrapping blocks (lines 1592-1601 and 1612-1620) with calls to the new helper method - Improves maintainability - single source of truth for Bedrock properties unwrapping logic The helper method: - Parses JSON string - Checks for single "properties" key structure - Unwraps and returns the properties value - Returns original string if unwrapping not needed or parsing fails No functional changes - pure refactoring. 
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: use correct class name AmazonConverseConfig in helper method calls Fixed MyPy errors where BedrockConverseConfig was used instead of AmazonConverseConfig in the _unwrap_bedrock_properties() calls. Errors: - Line 1619: BedrockConverseConfig -> AmazonConverseConfig - Line 1631: BedrockConverseConfig -> AmazonConverseConfig Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: shorten guardrail benchmark result filenames for Windows long path support Fixes #21941 The generated result filenames from _save_confusion_results contained parentheses, dots, and full yaml filenames, producing paths that exceed the Windows 260-char MAX_PATH limit. Rework the safe_label logic to produce short {topic}_{method_abbrev} filenames (e.g. insults_cf.json) while preserving the full label inside the JSON content. Rename existing tracked result files to match the new naming convention. * Update litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/guardrail_benchmarks/test_eval.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Remove Apache 2 license from SKILL.md (#22322) * fix(mcp): default available_on_public_internet to true (#22331) * fix(mcp): default available_on_public_internet to true MCPs were defaulting to private (available_on_public_internet=false) which was a breaking change. This reverts the default to public (true) across: - Pydantic models (AddMCPServerRequest, UpdateMCPServerRequest, LiteLLM_MCPServerTable) - Prisma schema @default - mcp_server_manager.py YAML config + DB loading fallbacks - UI form initialValue and setFieldValue defaults * fix(ui): add forceRender to Collapse.Panel so toggle defaults render correctly Ant Design's Collapse.Panel lazy-renders children by default. 
Without forceRender, the Form.Item for 'Available on Public Internet' isn't mounted when the useEffect fires form.setFieldValue, causing the Switch to visually show OFF even though the intended default is true. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(mcp): update remaining schema copies and MCPServer type default to true Missed in previous commit per Greptile review: - schema.prisma (root) - litellm-proxy-extras/litellm_proxy_extras/schema.prisma - litellm/types/mcp_server/mcp_server_manager.py MCPServer class * ui(mcp): reframe network access as 'Internal network only' restriction Replace scary 'Available on Public Internet' toggle with 'Internal network only' opt-in restriction. Toggle OFF (default) = all networks allowed. Toggle ON = restricted to internal network only. Auth is always required either way. - MCPPermissionManagement: new label/tooltip/description, invert display via getValueProps/getValueFromEvent so underlying available_on_public_internet value is unchanged - mcp_server_view: 'Public' → 'All networks', 'Internal' → 'Internal only' (orange) - mcp_server_columns: same badge updates --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(jwt): OIDC discovery URLs, roles array handling, dot-notation error hints (#22336) * fix(jwt): support OIDC discovery URLs, handle roles array, improve error hints Three fixes for Azure AD JWT auth: 1. OIDC discovery URL support - JWT_PUBLIC_KEY_URL can now be set to .well-known/openid-configuration endpoints. The proxy fetches the discovery doc, extracts jwks_uri, and caches it. 2. Handle roles claim as array - when team_id_jwt_field points to a list (e.g. AAD's "roles": ["team1"]), auto-unwrap the first element instead of crashing with 'unhashable type: list'. 3. 
Better error hint for dot-notation indexing - when team_id_jwt_field is set to "roles.0" or "roles[0]", the 401 error now explains to use "roles" instead and that LiteLLM auto-unwraps lists. * Add integration demo script for JWT auth fixes (OIDC discovery, array roles, dot-notation hints) Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Add demo_servers.py for manual JWT auth testing with mock JWKS/OIDC endpoints Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Add demo screenshots for PR comment Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Add integration test results with screenshots for PR review Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * address greptile review feedback (greploop iteration 1) - fix: add HTTP status code check in _resolve_jwks_url before parsing JSON - fix: remove misleading bracket-notation hint from debug log (get_nested_value does not support it) * Update tests/test_litellm/proxy/auth/test_handle_jwt.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * remove demo scripts and assets --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * perf: streaming latency improvements — 4 targeted hot-path fixes (#22346) * perf: raise aiohttp connection pool limits (300→1000, 50/host→500) * perf: skip model_copy() on every chunk — only copy usage-bearing chunks * perf: replace list+join O(n²) with str+= O(n) in async_data_generator * perf: cache model-level guardrail lookup per request, not per chunk * test: add comprehensive Vitest coverage for CostTrackingSettings Add 88 tests across 9 test files for the CostTrackingSettings component directory: - provider_display_helpers.test.ts: 9 tests for helper functions - how_it_works.test.tsx: 9 tests for discount 
calculator component - add_provider_form.test.tsx: 7 tests for provider form validation - add_margin_form.test.tsx: 9 tests for margin form with type toggle - provider_discount_table.test.tsx: 12 tests for table editing and interactions - provider_margin_table.test.tsx: 13 tests for margin table with sorting - use_discount_config.test.ts: 11 tests for discount hook logic - use_margin_config.test.ts: 12 tests for margin hook logic - cost_tracking_settings.test.tsx: 15 tests for main component and role-based rendering All tests passing. Coverage includes form validation, user interactions, API calls, state management, and conditional rendering. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com> * [Feature] Key list endpoint: Add project_id and access_group_id filters Add filtering capabilities to /key/list endpoint for project_id and access_group_id parameters. Both filters work globally across all visibility rules and stack with existing sort/pagination params. Added comprehensive unit tests for the new filters. 
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com> * [Feature] UI - Projects: Add Project Details page with Edit modal - Add ProjectDetailsPage with header, details card, spend/budget progress, model spend bar chart, keys placeholder, and team info card - Refactor CreateProjectModal into base form pattern (ProjectBaseForm) shared between Create and Edit flows - Add EditProjectModal with pre-filled form data from backend - Add useProjectDetails and useUpdateProject hooks - Add duplicate key validation for model limits and metadata - Wire project ID click in table to navigate to detail view - Move pagination inline with search bar Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Update ui/litellm-dashboard/src/components/Projects/ProjectModals/CreateProjectModal.tsx Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix(types): filter null fields from reasoning output items in ResponsesAPIResponse When providers return reasoning items without status/content/encrypted_content, Pydantic's Optional defaults serialize them as null. This breaks downstream SDKs (e.g., the OpenAI C# SDK crashes on status=null). Add a field_serializer on ResponsesAPIResponse.output that removes null status, content, and encrypted_content from reasoning items during serialization. This mirrors the request-side filtering already done in OpenAIResponsesAPIConfig._handle_reasoning_item(). 
Fixes #16824 --------- Co-authored-by: Zero Clover <zero@root.me> Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu> Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com> Co-authored-by: Dylan Duan <dylan.duan@assemblyai.com> Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Shivam Rawat <161387515+shivamrawat1@users.noreply.github.com> Co-authored-by: Brian Caswell <bcaswell@microsoft.com> Co-authored-by: Brian Caswell <bcaswell@gmail.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: rasmi <rrelasmar@gmail.com> Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
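The last fix in the commit body above removes null `status`, `content`, and `encrypted_content` fields from reasoning items when the response is serialized. The commit does this with a Pydantic `field_serializer`; the sketch below shows only the filtering idea on plain dictionaries. The function name is hypothetical, and identifying reasoning items by `"type": "reasoning"` is an assumption about the item shape.

```python
from typing import Any, Dict, List

# Fields the commit message says are dropped when they are null.
NULLABLE_REASONING_FIELDS = ("status", "content", "encrypted_content")


def strip_null_reasoning_fields(output: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a copy of `output` with null optional fields removed from reasoning items.

    Hypothetical helper operating on plain dicts, not the Pydantic serializer
    from the commit.
    """
    cleaned = []
    for item in output:
        if item.get("type") == "reasoning":
            # Drop only the listed fields, and only when their value is None.
            item = {
                key: value
                for key, value in item.items()
                if not (key in NULLABLE_REASONING_FIELDS and value is None)
            }
        cleaned.append(item)
    return cleaned


example_output = [
    {"type": "reasoning", "id": "rs_1", "summary": [], "status": None, "content": None},
    {"type": "message", "id": "msg_1", "status": None},
]
result = strip_null_reasoning_fields(example_output)
```

Only the reasoning item is touched. Other item types keep their fields unchanged, including any nulls.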


…on check for GA path (#22369) * fix(image_generation): propagate extra_headers to OpenAI image generation Add headers parameter to image_generation() and aimage_generation() methods in OpenAI provider, and pass headers from images/main.py to ensure custom headers like cf-aig-authorization are properly forwarded to the OpenAI API. Aligns behavior with completion() method and Azure provider implementation. * test(image_generation): add tests for extra_headers propagation Verify that extra_headers are correctly forwarded to OpenAI's images.generate() in both sync and async paths, and that they are absent when not provided. * Add Prometheus child_exit cleanup for gunicorn workers When a gunicorn worker exits (e.g. from max_requests recycling), its per-process prometheus .db files remain on disk. For gauges using livesum/liveall mode, this means the dead worker's last-known values persist as if the process were still alive. Wire gunicorn's child_exit hook to call mark_process_dead() so live-tracking gauges accurately reflect only running workers. * docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway (#21130) * docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway provider config * feat: add AssemblyAI LLM Gateway as OpenAI-compatible provider * fix(mcp): update test mocks to use renamed filter_server_ids_by_ip_with_info Tests were mocking the old method name `filter_server_ids_by_ip` but production code at server.py:774 calls `filter_server_ids_by_ip_with_info` which returns a (server_ids, blocked_count) tuple. The unmocked method on AsyncMock returned a coroutine, causing "cannot unpack non-iterable coroutine object" errors. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): update realtime guardrail test assertions for voice violation behavior Tests were asserting no response.create/conversation.item.create sent to backend when guardrail blocks, but the implementation intentionally sends these to have the LLM voice the guardrail violation message to the user. Updated assertions to verify the correct guardrail flow: - response.cancel is sent to stop any in-progress response - conversation.item.create with violation message is injected - response.create is sent to voice the violation - original blocked content is NOT forwarded Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(bedrock): restore parallel_tool_calls mapping in map_openai_params The revert in 8565c70 removed the parallel_tool_calls handling from map_openai_params, and the subsequent fix d0445e1 only re-added the transform_request consumption but forgot to re-add the map_openai_params producer that sets _parallel_tool_use_config. This meant parallel_tool_calls was silently ignored for all Bedrock models. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): update Azure pass-through test to mock litellm.completion Commit 99c62ca removed "azure" from _RESPONSES_API_PROVIDERS, routing Azure models through litellm.completion instead of litellm.responses. The test was not updated to match, causing it to assert against the wrong mock. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add in_flight_requests metric to /health/backlog + prometheus (#22319) * feat: add in_flight_requests metric to /health/backlog + prometheus * refactor: clean class with static methods, add tests, fix sentinel pattern * docs: add in_flight_requests to prometheus metrics and latency troubleshooting * fix(db): add missing migration for LiteLLM_ClaudeCodePluginTable PR #22271 added the LiteLLM_ClaudeCodePluginTable model to schema.prisma but did not include a corresponding migration file, causing test_aaaasschema_migration_check to fail. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: update stale docstring to match guardrail voicing behavior Addresses Greptile review feedback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(caching): store background task references in LLMClientCache._remove_key to prevent unawaited coroutine warnings Fixes #22128 * [Feat] Agent RBAC Permission Fix - Ensure Internal Users cannot create agents (#22329) * fix: enforce RBAC on agent endpoints — block non-admin create/update/delete - Add /v1/agents/{agent_id} to agent_routes so internal users can access GET-by-ID (previously returned 403 due to missing route pattern) - Add _check_agent_management_permission() guard to POST, PUT, PATCH, DELETE agent endpoints — only PROXY_ADMIN may mutate agents - Add user_api_key_dict param to delete_agent so the role check works - Add comprehensive unit tests for RBAC enforcement across all roles Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: mock prisma_client in internal user get-agent-by-id test Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * feat(ui): hide agent create/delete controls for non-admin users Match MCP servers pattern: wrap '+ Add New Agent' button in isAdmin conditional so internal users see a read-only agents view. Delete buttons in card and table were already gated. 
Update empty-state copy for non-admin users. Add 7 Vitest tests covering role-based visibility. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: Add PROXY_ADMIN role to system user for key rotation (#21896) * fix: Add PROXY_ADMIN role to system user for key rotation The key rotation worker was failing with 'You are not authorized to regenerate this key' when rotating team keys. This was because the system user created by get_litellm_internal_jobs_user_api_key_auth() was missing the user_role field. Without user_role=PROXY_ADMIN, the system user couldn't bypass team permission checks in can_team_member_execute_key_management_endpoint(), causing authorization failures for team key rotation. This fix adds user_role=LitellmUserRoles.PROXY_ADMIN to the system user, allowing it to bypass team permission checks and successfully rotate keys for all teams. * test: Add unit test for system user PROXY_ADMIN role - Verify internal jobs system user has PROXY_ADMIN role - Critical for key rotation to bypass team permission checks - Regression test for PR #21896 * fix: populate user_id and user_info for admin users in /user/info (#22239) * fix: populate user_id and user_info for admin users in /user/info endpoint Fixes #22179 When admin users call /user/info without a user_id parameter, the endpoint was returning null for both user_id and user_info fields. This broke budgeting tooling that relies on /user/info to look up current budget and spend. Changes: - Modified _get_user_info_for_proxy_admin() to accept user_api_key_dict parameter - Added logic to fetch admin's own user info from database - Updated function to return admin's user_id and user_info instead of null - Updated unit test to verify admin user_id is populated The fix ensures admin users get their own user information just like regular users. 
* test: make mock get_data signature match real method - Updated MockPrismaClientDB.get_data() to accept all parameters that the real method accepts - Makes mock more robust against future refactors - Added datetime and Union imports - Mock now returns None when user_id is not provided * [Fix] Pass MCP auth headers from request into tool fetch for /v1/responses and chat completions (#22291) * fixed dynamic auth for /responses with mcp * fixed greptile concern * fix(bedrock): filter internal json_tool_call when mixed with real tools Fixes #18381: When using both tools and response_format with Bedrock Converse API, LiteLLM internally adds json_tool_call to handle structured output. Bedrock may return both this internal tool AND real user-defined tools, breaking consumers like OpenAI Agents SDK. Changes: - Non-streaming: Added _filter_json_mode_tools() to handle 3 scenarios: only json_tool_call (convert to content), mixed (filter it out), or no json_tool_call (pass through) - Streaming: Added json_mode tracking to AWSEventStreamDecoder to suppress json_tool_call chunks and convert to text content - Fixed optional_params.pop() mutation issue Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * refactor: extract duplicated JSON unwrapping into helper method Addresses review comment from greptile-apps: #21107 (review) Changes: - Added `_unwrap_bedrock_properties()` helper method to eliminate code duplication - Replaced two identical JSON unwrapping blocks (lines 1592-1601 and 1612-1620) with calls to the new helper method - Improves maintainability - single source of truth for Bedrock properties unwrapping logic The helper method: - Parses JSON string - Checks for single "properties" key structure - Unwraps and returns the properties value - Returns original string if unwrapping not needed or parsing fails No functional changes - pure refactoring. 
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: use correct class name AmazonConverseConfig in helper method calls Fixed MyPy errors where BedrockConverseConfig was used instead of AmazonConverseConfig in the _unwrap_bedrock_properties() calls. Errors: - Line 1619: BedrockConverseConfig -> AmazonConverseConfig - Line 1631: BedrockConverseConfig -> AmazonConverseConfig Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: shorten guardrail benchmark result filenames for Windows long path support Fixes #21941 The generated result filenames from _save_confusion_results contained parentheses, dots, and full yaml filenames, producing paths that exceed the Windows 260-char MAX_PATH limit. Rework the safe_label logic to produce short {topic}_{method_abbrev} filenames (e.g. insults_cf.json) while preserving the full label inside the JSON content. Rename existing tracked result files to match the new naming convention. * Update litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/guardrail_benchmarks/test_eval.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Remove Apache 2 license from SKILL.md (#22322) * fix(mcp): default available_on_public_internet to true (#22331) * fix(mcp): default available_on_public_internet to true MCPs were defaulting to private (available_on_public_internet=false) which was a breaking change. This reverts the default to public (true) across: - Pydantic models (AddMCPServerRequest, UpdateMCPServerRequest, LiteLLM_MCPServerTable) - Prisma schema @default - mcp_server_manager.py YAML config + DB loading fallbacks - UI form initialValue and setFieldValue defaults * fix(ui): add forceRender to Collapse.Panel so toggle defaults render correctly Ant Design's Collapse.Panel lazy-renders children by default. 
Without forceRender, the Form.Item for 'Available on Public Internet' isn't mounted when the useEffect fires form.setFieldValue, causing the Switch to visually show OFF even though the intended default is true. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(mcp): update remaining schema copies and MCPServer type default to true Missed in previous commit per Greptile review: - schema.prisma (root) - litellm-proxy-extras/litellm_proxy_extras/schema.prisma - litellm/types/mcp_server/mcp_server_manager.py MCPServer class * ui(mcp): reframe network access as 'Internal network only' restriction Replace scary 'Available on Public Internet' toggle with 'Internal network only' opt-in restriction. Toggle OFF (default) = all networks allowed. Toggle ON = restricted to internal network only. Auth is always required either way. - MCPPermissionManagement: new label/tooltip/description, invert display via getValueProps/getValueFromEvent so underlying available_on_public_internet value is unchanged - mcp_server_view: 'Public' → 'All networks', 'Internal' → 'Internal only' (orange) - mcp_server_columns: same badge updates --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(jwt): OIDC discovery URLs, roles array handling, dot-notation error hints (#22336) * fix(jwt): support OIDC discovery URLs, handle roles array, improve error hints Three fixes for Azure AD JWT auth: 1. OIDC discovery URL support - JWT_PUBLIC_KEY_URL can now be set to .well-known/openid-configuration endpoints. The proxy fetches the discovery doc, extracts jwks_uri, and caches it. 2. Handle roles claim as array - when team_id_jwt_field points to a list (e.g. AAD's "roles": ["team1"]), auto-unwrap the first element instead of crashing with 'unhashable type: list'. 3. 
Better error hint for dot-notation indexing - when team_id_jwt_field is set to "roles.0" or "roles[0]", the 401 error now explains to use "roles" instead and that LiteLLM auto-unwraps lists. * Add integration demo script for JWT auth fixes (OIDC discovery, array roles, dot-notation hints) Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Add demo_servers.py for manual JWT auth testing with mock JWKS/OIDC endpoints Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Add demo screenshots for PR comment Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Add integration test results with screenshots for PR review Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * address greptile review feedback (greploop iteration 1) - fix: add HTTP status code check in _resolve_jwks_url before parsing JSON - fix: remove misleading bracket-notation hint from debug log (get_nested_value does not support it) * Update tests/test_litellm/proxy/auth/test_handle_jwt.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * remove demo scripts and assets --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * perf: streaming latency improvements — 4 targeted hot-path fixes (#22346) * perf: raise aiohttp connection pool limits (300→1000, 50/host→500) * perf: skip model_copy() on every chunk — only copy usage-bearing chunks * perf: replace list+join O(n²) with str+= O(n) in async_data_generator * perf: cache model-level guardrail lookup per request, not per chunk * test: add comprehensive Vitest coverage for CostTrackingSettings Add 88 tests across 9 test files for the CostTrackingSettings component directory: - provider_display_helpers.test.ts: 9 tests for helper functions - how_it_works.test.tsx: 9 tests for discount 
calculator component - add_provider_form.test.tsx: 7 tests for provider form validation - add_margin_form.test.tsx: 9 tests for margin form with type toggle - provider_discount_table.test.tsx: 12 tests for table editing and interactions - provider_margin_table.test.tsx: 13 tests for margin table with sorting - use_discount_config.test.ts: 11 tests for discount hook logic - use_margin_config.test.ts: 12 tests for margin hook logic - cost_tracking_settings.test.tsx: 15 tests for main component and role-based rendering All tests passing. Coverage includes form validation, user interactions, API calls, state management, and conditional rendering. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com> * [Feature] Key list endpoint: Add project_id and access_group_id filters Add filtering capabilities to /key/list endpoint for project_id and access_group_id parameters. Both filters work globally across all visibility rules and stack with existing sort/pagination params. Added comprehensive unit tests for the new filters. 
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com> * [Feature] UI - Projects: Add Project Details page with Edit modal - Add ProjectDetailsPage with header, details card, spend/budget progress, model spend bar chart, keys placeholder, and team info card - Refactor CreateProjectModal into base form pattern (ProjectBaseForm) shared between Create and Edit flows - Add EditProjectModal with pre-filled form data from backend - Add useProjectDetails and useUpdateProject hooks - Add duplicate key validation for model limits and metadata - Wire project ID click in table to navigate to detail view - Move pagination inline with search bar Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Update ui/litellm-dashboard/src/components/Projects/ProjectModals/CreateProjectModal.tsx Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix(azure): forward realtime_protocol from config and relax api_version check for GA path The realtime_protocol parameter set in config.yaml litellm_params was not reliably reaching the Azure realtime handler. Add fallback chain: kwargs → litellm_params → LITELLM_AZURE_REALTIME_PROTOCOL env var → beta. Also relax the api_version validation to only require it for the beta protocol path, since the GA/v1 path does not use api_version in the URL. Make protocol matching case-insensitive so 'ga', 'GA', 'v1', 'V1' all work consistently. Fix _construct_url type signature to accept Optional api_version. 
Fixes #22127

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Zero Clover <zero@root.me>
Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: Dylan Duan <dylan.duan@assemblyai.com>
Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Shivaang <shivaang.05@gmail.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Shivam Rawat <161387515+shivamrawat1@users.noreply.github.com>
Co-authored-by: Brian Caswell <bcaswell@microsoft.com>
Co-authored-by: Brian Caswell <bcaswell@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: rasmi <rrelasmar@gmail.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
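
The realtime_protocol fallback chain described in the Azure fix above (kwargs, then litellm_params, then the LITELLM_AZURE_REALTIME_PROTOCOL env var, then beta) could look roughly like the sketch below. This is an illustration only, not the code from the PR: the function names resolve_realtime_protocol and requires_api_version are hypothetical, and the only details taken from the commit message are the lookup order, the env var name, the case-insensitive matching, and the rule that api_version is needed only on the beta path.

```python
import os
from typing import Any, Mapping, Optional

# Protocol spellings the commit message treats as the GA/v1 path.
GA_ALIASES = {"ga", "v1"}


def resolve_realtime_protocol(
    kwargs: Mapping[str, Any],
    litellm_params: Mapping[str, Any],
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: return the first protocol found, lowercased."""
    env = environ if environ is not None else os.environ
    candidates = (
        kwargs.get("realtime_protocol"),
        litellm_params.get("realtime_protocol"),
        env.get("LITELLM_AZURE_REALTIME_PROTOCOL"),
    )
    for candidate in candidates:
        if candidate:
            # Lowercasing makes 'ga', 'GA', 'v1', 'V1' behave the same.
            return str(candidate).strip().lower()
    return "beta"  # final fallback named in the commit message


def requires_api_version(protocol: str) -> bool:
    """Hypothetical helper: only the beta path needs api_version."""
    return protocol.strip().lower() not in GA_ALIASES
```

Passing the environment as an argument keeps the lookup order easy to test without touching the real process environment.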


* fix(image_generation): propagate extra_headers to OpenAI image generation Add headers parameter to image_generation() and aimage_generation() methods in OpenAI provider, and pass headers from images/main.py to ensure custom headers like cf-aig-authorization are properly forwarded to the OpenAI API. Aligns behavior with completion() method and Azure provider implementation. * test(image_generation): add tests for extra_headers propagation Verify that extra_headers are correctly forwarded to OpenAI's images.generate() in both sync and async paths, and that they are absent when not provided. * Add Prometheus child_exit cleanup for gunicorn workers When a gunicorn worker exits (e.g. from max_requests recycling), its per-process prometheus .db files remain on disk. For gauges using livesum/liveall mode, this means the dead worker's last-known values persist as if the process were still alive. Wire gunicorn's child_exit hook to call mark_process_dead() so live-tracking gauges accurately reflect only running workers. * docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway (#21130) * docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway provider config * feat: add AssemblyAI LLM Gateway as OpenAI-compatible provider * fix(mcp): update test mocks to use renamed filter_server_ids_by_ip_with_info Tests were mocking the old method name `filter_server_ids_by_ip` but production code at server.py:774 calls `filter_server_ids_by_ip_with_info` which returns a (server_ids, blocked_count) tuple. The unmocked method on AsyncMock returned a coroutine, causing "cannot unpack non-iterable coroutine object" errors. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): update realtime guardrail test assertions for voice violation behavior Tests were asserting no response.create/conversation.item.create sent to backend when guardrail blocks, but the implementation intentionally sends these to have the LLM voice the guardrail violation message to the user. Updated assertions to verify the correct guardrail flow: - response.cancel is sent to stop any in-progress response - conversation.item.create with violation message is injected - response.create is sent to voice the violation - original blocked content is NOT forwarded Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(bedrock): restore parallel_tool_calls mapping in map_openai_params The revert in 8565c70 removed the parallel_tool_calls handling from map_openai_params, and the subsequent fix d0445e1 only re-added the transform_request consumption but forgot to re-add the map_openai_params producer that sets _parallel_tool_use_config. This meant parallel_tool_calls was silently ignored for all Bedrock models. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): update Azure pass-through test to mock litellm.completion Commit 99c62ca removed "azure" from _RESPONSES_API_PROVIDERS, routing Azure models through litellm.completion instead of litellm.responses. The test was not updated to match, causing it to assert against the wrong mock. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add in_flight_requests metric to /health/backlog + prometheus (#22319) * feat: add in_flight_requests metric to /health/backlog + prometheus * refactor: clean class with static methods, add tests, fix sentinel pattern * docs: add in_flight_requests to prometheus metrics and latency troubleshooting * fix(db): add missing migration for LiteLLM_ClaudeCodePluginTable PR #22271 added the LiteLLM_ClaudeCodePluginTable model to schema.prisma but did not include a corresponding migration file, causing test_aaaasschema_migration_check to fail. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: update stale docstring to match guardrail voicing behavior Addresses Greptile review feedback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(caching): store background task references in LLMClientCache._remove_key to prevent unawaited coroutine warnings Fixes #22128 * [Feat] Agent RBAC Permission Fix - Ensure Internal Users cannot create agents (#22329) * fix: enforce RBAC on agent endpoints — block non-admin create/update/delete - Add /v1/agents/{agent_id} to agent_routes so internal users can access GET-by-ID (previously returned 403 due to missing route pattern) - Add _check_agent_management_permission() guard to POST, PUT, PATCH, DELETE agent endpoints — only PROXY_ADMIN may mutate agents - Add user_api_key_dict param to delete_agent so the role check works - Add comprehensive unit tests for RBAC enforcement across all roles Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: mock prisma_client in internal user get-agent-by-id test Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * feat(ui): hide agent create/delete controls for non-admin users Match MCP servers pattern: wrap '+ Add New Agent' button in isAdmin conditional so internal users see a read-only agents view. Delete buttons in card and table were already gated. 
Update empty-state copy for non-admin users. Add 7 Vitest tests covering role-based visibility. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: Add PROXY_ADMIN role to system user for key rotation (#21896) * fix: Add PROXY_ADMIN role to system user for key rotation The key rotation worker was failing with 'You are not authorized to regenerate this key' when rotating team keys. This was because the system user created by get_litellm_internal_jobs_user_api_key_auth() was missing the user_role field. Without user_role=PROXY_ADMIN, the system user couldn't bypass team permission checks in can_team_member_execute_key_management_endpoint(), causing authorization failures for team key rotation. This fix adds user_role=LitellmUserRoles.PROXY_ADMIN to the system user, allowing it to bypass team permission checks and successfully rotate keys for all teams. * test: Add unit test for system user PROXY_ADMIN role - Verify internal jobs system user has PROXY_ADMIN role - Critical for key rotation to bypass team permission checks - Regression test for PR #21896 * fix: populate user_id and user_info for admin users in /user/info (#22239) * fix: populate user_id and user_info for admin users in /user/info endpoint Fixes #22179 When admin users call /user/info without a user_id parameter, the endpoint was returning null for both user_id and user_info fields. This broke budgeting tooling that relies on /user/info to look up current budget and spend. Changes: - Modified _get_user_info_for_proxy_admin() to accept user_api_key_dict parameter - Added logic to fetch admin's own user info from database - Updated function to return admin's user_id and user_info instead of null - Updated unit test to verify admin user_id is populated The fix ensures admin users get their own user information just like regular users. 
* test: make mock get_data signature match real method - Updated MockPrismaClientDB.get_data() to accept all parameters that the real method accepts - Makes mock more robust against future refactors - Added datetime and Union imports - Mock now returns None when user_id is not provided * [Fix] Pass MCP auth headers from request into tool fetch for /v1/responses and chat completions (#22291) * fixed dynamic auth for /responses with mcp * fixed greptile concern * fix(bedrock): filter internal json_tool_call when mixed with real tools Fixes #18381: When using both tools and response_format with Bedrock Converse API, LiteLLM internally adds json_tool_call to handle structured output. Bedrock may return both this internal tool AND real user-defined tools, breaking consumers like OpenAI Agents SDK. Changes: - Non-streaming: Added _filter_json_mode_tools() to handle 3 scenarios: only json_tool_call (convert to content), mixed (filter it out), or no json_tool_call (pass through) - Streaming: Added json_mode tracking to AWSEventStreamDecoder to suppress json_tool_call chunks and convert to text content - Fixed optional_params.pop() mutation issue Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * refactor: extract duplicated JSON unwrapping into helper method Addresses review comment from greptile-apps: #21107 (review) Changes: - Added `_unwrap_bedrock_properties()` helper method to eliminate code duplication - Replaced two identical JSON unwrapping blocks (lines 1592-1601 and 1612-1620) with calls to the new helper method - Improves maintainability - single source of truth for Bedrock properties unwrapping logic The helper method: - Parses JSON string - Checks for single "properties" key structure - Unwraps and returns the properties value - Returns original string if unwrapping not needed or parsing fails No functional changes - pure refactoring. 
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: use correct class name AmazonConverseConfig in helper method calls Fixed MyPy errors where BedrockConverseConfig was used instead of AmazonConverseConfig in the _unwrap_bedrock_properties() calls. Errors: - Line 1619: BedrockConverseConfig -> AmazonConverseConfig - Line 1631: BedrockConverseConfig -> AmazonConverseConfig Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: shorten guardrail benchmark result filenames for Windows long path support Fixes #21941 The generated result filenames from _save_confusion_results contained parentheses, dots, and full yaml filenames, producing paths that exceed the Windows 260-char MAX_PATH limit. Rework the safe_label logic to produce short {topic}_{method_abbrev} filenames (e.g. insults_cf.json) while preserving the full label inside the JSON content. Rename existing tracked result files to match the new naming convention. * Update litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/guardrail_benchmarks/test_eval.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Remove Apache 2 license from SKILL.md (#22322) * fix(mcp): default available_on_public_internet to true (#22331) * fix(mcp): default available_on_public_internet to true MCPs were defaulting to private (available_on_public_internet=false) which was a breaking change. This reverts the default to public (true) across: - Pydantic models (AddMCPServerRequest, UpdateMCPServerRequest, LiteLLM_MCPServerTable) - Prisma schema @default - mcp_server_manager.py YAML config + DB loading fallbacks - UI form initialValue and setFieldValue defaults * fix(ui): add forceRender to Collapse.Panel so toggle defaults render correctly Ant Design's Collapse.Panel lazy-renders children by default. 
Without forceRender, the Form.Item for 'Available on Public Internet' isn't mounted when the useEffect fires form.setFieldValue, causing the Switch to visually show OFF even though the intended default is true. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(mcp): update remaining schema copies and MCPServer type default to true Missed in previous commit per Greptile review: - schema.prisma (root) - litellm-proxy-extras/litellm_proxy_extras/schema.prisma - litellm/types/mcp_server/mcp_server_manager.py MCPServer class * ui(mcp): reframe network access as 'Internal network only' restriction Replace scary 'Available on Public Internet' toggle with 'Internal network only' opt-in restriction. Toggle OFF (default) = all networks allowed. Toggle ON = restricted to internal network only. Auth is always required either way. - MCPPermissionManagement: new label/tooltip/description, invert display via getValueProps/getValueFromEvent so underlying available_on_public_internet value is unchanged - mcp_server_view: 'Public' → 'All networks', 'Internal' → 'Internal only' (orange) - mcp_server_columns: same badge updates --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(jwt): OIDC discovery URLs, roles array handling, dot-notation error hints (#22336) * fix(jwt): support OIDC discovery URLs, handle roles array, improve error hints Three fixes for Azure AD JWT auth: 1. OIDC discovery URL support - JWT_PUBLIC_KEY_URL can now be set to .well-known/openid-configuration endpoints. The proxy fetches the discovery doc, extracts jwks_uri, and caches it. 2. Handle roles claim as array - when team_id_jwt_field points to a list (e.g. AAD's "roles": ["team1"]), auto-unwrap the first element instead of crashing with 'unhashable type: list'. 3. 
Better error hint for dot-notation indexing - when team_id_jwt_field is set to "roles.0" or "roles[0]", the 401 error now explains to use "roles" instead and that LiteLLM auto-unwraps lists.

* Add integration demo script for JWT auth fixes (OIDC discovery, array roles, dot-notation hints)
  Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* Add demo_servers.py for manual JWT auth testing with mock JWKS/OIDC endpoints
  Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* Add demo screenshots for PR comment
  Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* Add integration test results with screenshots for PR review
  Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* address greptile review feedback (greploop iteration 1)
  - fix: add HTTP status code check in _resolve_jwks_url before parsing JSON
  - fix: remove misleading bracket-notation hint from debug log (get_nested_value does not support it)
* Update tests/test_litellm/proxy/auth/test_handle_jwt.py
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* remove demo scripts and assets

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* perf: streaming latency improvements - 4 targeted hot-path fixes (#22346)
  * perf: raise aiohttp connection pool limits (300→1000, 50/host→500)
  * perf: skip model_copy() on every chunk - only copy usage-bearing chunks
  * perf: replace list+join O(n²) with str+= O(n) in async_data_generator
  * perf: cache model-level guardrail lookup per request, not per chunk
* test: add comprehensive Vitest coverage for CostTrackingSettings
  Add 88 tests across 9 test files for the CostTrackingSettings component directory:
  - provider_display_helpers.test.ts: 9 tests for helper functions
  - how_it_works.test.tsx: 9 tests for discount calculator component
  - add_provider_form.test.tsx: 7 tests for provider form validation
  - add_margin_form.test.tsx: 9 tests for margin form with type toggle
  - provider_discount_table.test.tsx: 12 tests for table editing and interactions
  - provider_margin_table.test.tsx: 13 tests for margin table with sorting
  - use_discount_config.test.ts: 11 tests for discount hook logic
  - use_margin_config.test.ts: 12 tests for margin hook logic
  - cost_tracking_settings.test.tsx: 15 tests for main component and role-based rendering
  All tests passing. Coverage includes form validation, user interactions, API calls, state management, and conditional rendering.
  Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
* [Feature] Key list endpoint: Add project_id and access_group_id filters
  Add filtering capabilities to /key/list endpoint for project_id and access_group_id parameters. Both filters work globally across all visibility rules and stack with existing sort/pagination params. Added comprehensive unit tests for the new filters.
  Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
* [Feature] UI - Projects: Add Project Details page with Edit modal
  - Add ProjectDetailsPage with header, details card, spend/budget progress, model spend bar chart, keys placeholder, and team info card
  - Refactor CreateProjectModal into base form pattern (ProjectBaseForm) shared between Create and Edit flows
  - Add EditProjectModal with pre-filled form data from backend
  - Add useProjectDetails and useUpdateProject hooks
  - Add duplicate key validation for model limits and metadata
  - Wire project ID click in table to navigate to detail view
  - Move pagination inline with search bar
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Update ui/litellm-dashboard/src/components/Projects/ProjectModals/CreateProjectModal.tsx
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* fix(anthropic): handle OAuth tokens in count_tokens endpoint
  The count_tokens API's get_required_headers() always set x-api-key, which is incorrect for OAuth tokens (sk-ant-oat*). These tokens must use Authorization: Bearer instead.
  Changes:
  - Add optionally_handle_anthropic_oauth() call in get_required_headers() to convert OAuth tokens from x-api-key to Authorization: Bearer
  - Add _merge_beta_headers() helper to preserve existing anthropic-beta values (e.g. token-counting) when appending the OAuth beta header
  - Add 7 tests covering regular and OAuth header generation
  Fixes #22040
  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Zero Clover <zero@root.me>
Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: Dylan Duan <dylan.duan@assemblyai.com>
Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Shivaang <shivaang.05@gmail.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Shivam Rawat <161387515+shivamrawat1@users.noreply.github.com>
Co-authored-by: Brian Caswell <bcaswell@microsoft.com>
Co-authored-by: Brian Caswell <bcaswell@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: rasmi <rrelasmar@gmail.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

feat: add in_flight_requests metric to /health/backlog + prometheus

d511087
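
For readers skimming the timeline, the counting technique this commit introduces (a pure ASGI middleware that increments a class-level counter on request start and decrements it in `finally`) can be sketched roughly as below. This is an illustrative sketch, not the code from the PR: the class name `InFlightCounterMiddleware` and the `demo` helper are hypothetical, and the Prometheus gauge is left out to keep the example self-contained.

```python
import asyncio


class InFlightCounterMiddleware:
    """Hypothetical pure ASGI middleware counting HTTP requests in progress."""

    in_flight = 0  # class-level counter, shared by all instances in this process

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Lifespan and websocket events pass through uncounted.
            await self.app(scope, receive, send)
            return
        InFlightCounterMiddleware.in_flight += 1
        try:
            await self.app(scope, receive, send)
        finally:
            # Runs on success and on error, so a failing handler cannot leak a count.
            InFlightCounterMiddleware.in_flight -= 1


async def demo():
    """Run two overlapping fake requests, one of which raises."""
    seen = []

    async def app(scope, receive, send):
        seen.append(InFlightCounterMiddleware.in_flight)
        await asyncio.sleep(0)  # yield so the second request starts meanwhile
        if scope.get("fail"):
            raise RuntimeError("handler error")

    mw = InFlightCounterMiddleware(app)
    await asyncio.gather(
        mw({"type": "http"}, None, None),
        mw({"type": "http", "fail": True}, None, None),
        return_exceptions=True,
    )
    return seen, InFlightCounterMiddleware.in_flight
```

Calling `asyncio.run(demo())` lets you check that the counter rises while requests overlap and returns to zero afterwards, even though one handler raised. A single-threaded event loop makes the plain integer increment safe here; a per-worker counter like this only reflects one process, which is presumably why the PR pairs it with a `livesum` gauge for multi-worker aggregation.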

vercel bot deployed to Preview February 27, 2026 22:17 View deployment

greptile-apps bot reviewed Feb 27, 2026


refactor: clean class with static methods, add tests, fix sentinel pattern

c9e84f5

vercel bot deployed to Preview February 28, 2026 01:48 View deployment

greptile-apps bot reviewed Feb 28, 2026


docs: add in_flight_requests to prometheus metrics and latency troubleshooting

dfe0c7f

vercel bot deployed to Preview February 28, 2026 01:59 View deployment

ishaan-jaff merged commit 15fcd90 into main Feb 28, 2026
29 of 34 checks passed

ishaan-jaff merged 3 commits into main from worktree-melodic-sniffing-pearl

greptile-apps bot commented Feb 27, 2026

Greptile Summary

Confidence Score: 4/5


ishaan-jaff commented Feb 28, 2026
