Skip to content

docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway#21130

Merged
krrishdholakia merged 2 commits intoBerriAI:mainfrom
dylan-duan-aai:feat/assemblyai-llm-gateway
Feb 28, 2026
Merged

docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway#21130
krrishdholakia merged 2 commits intoBerriAI:mainfrom
dylan-duan-aai:feat/assemblyai-llm-gateway

Conversation

@dylan-duan-aai
Copy link
Contributor

@dylan-duan-aai dylan-duan-aai commented Feb 13, 2026

Summary

  • Adds new /assemblyai-llm-gateway/* pass-through route targeting https://llm-gateway.assemblyai.com
  • Adds route to authorization allowlist in _types.py
  • Updates AssemblyAI docs with LLM Gateway section, Universal-3 Pro examples, and Speech Understanding features
  • Adds integration tests for the LLM Gateway pass-through

Test plan

  • Integration tests in tests/pass_through_tests/test_assemblyai_llm_gateway.py (chat completion, multi-turn, multiple models, usage tracking)
  • Verified against live AssemblyAI LLM Gateway with Claude Sonnet and Haiku models
  • Existing AssemblyAI STT pass-through routes unaffected

@vercel
Copy link

vercel bot commented Feb 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Feb 27, 2026 4:17am

Request Review

@CLAassistant
Copy link

CLAassistant commented Feb 13, 2026

CLA assistant check
All committers have signed the CLA.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 13, 2026

Greptile Summary

This PR modernizes the AssemblyAI pass-through documentation with consistent branding ("AssemblyAI" instead of "Assembly AI"), updated Quick Start examples showcasing Universal-3 Pro speech models and Speech Understanding features, and a new LLM Gateway section documenting how to use AssemblyAI's OpenAI-compatible LLM Gateway as a provider through LiteLLM's standard model_list config.

  • Docs improvements: Replaces embedded Loom video with a structured Supported Routes table, modernizes code examples with error handling and TranscriptionConfig, and adds a prompting example for Universal-3 Pro
  • LLM Gateway section: Documents using AssemblyAI's LLM Gateway via openai/* model prefix with api_base pointing to https://llm-gateway.assemblyai.com/v1 — this is a standard OpenAI-compatible provider config, not a new pass-through route
  • Stale _types.py changes: The diff includes unrelated changes to litellm/proxy/_types.py (new routes, access_group_ids fields, LiteLLM_ManagedVectorStoreTable) that already exist on the target branch — these are artifacts of a stale branch fork and will be no-ops at merge time
  • PR description mismatch: The description claims new pass-through routes and integration tests that are not present in the changeset

Confidence Score: 4/5

  • This PR is safe to merge — the documentation changes are well-written and the _types.py changes are already on main.
  • The actual net change is documentation-only. The _types.py diff is a no-op since all those changes already exist on the target branch. The docs improvements are substantive and correct. Minor style inconsistency in the EU endpoints section. The PR description overstates the scope (mentions routes and tests not in the changeset).
  • litellm/proxy/_types.py — contains unrelated changes from a stale branch fork; recommend rebasing onto latest main for a clean diff.

Important Files Changed

Filename Overview
docs/my-website/docs/pass_through/assembly_ai.md Documentation updated with consistent AssemblyAI branding, modernized Quick Start examples with Universal-3 Pro and Speech Understanding features, and new LLM Gateway section using OpenAI-compatible provider config. EU endpoint section still uses the older code style unlike the updated US section.
litellm/proxy/_types.py Adds access_group_ids to KeyRequestBase, TeamBase, UpdateTeamRequest, LiteLLM_VerificationToken; adds daily activity routes to management_routes; adds LiteLLM_ManagedVectorStoreTable class. None of these changes are related to AssemblyAI documentation — they appear to be from the author's branch being based on a stale fork of main. All changes already exist on the base branch, so the merge should be a no-op for this file.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph STT["Speech-to-Text (Pass-through)"]
        A[Client App] -->|"/assemblyai/*"| B[LiteLLM Proxy]
        A -->|"/eu.assemblyai/*"| B
        B -->|US| C["api.assemblyai.com"]
        B -->|EU| D["eu.api.assemblyai.com"]
    end

    subgraph LLM["LLM Gateway (OpenAI-compatible)"]
        E[Client App] -->|"/v1/chat/completions\nmodel: assemblyai/*"| F[LiteLLM Proxy]
        F -->|"openai/* via api_base"| G["llm-gateway.assemblyai.com/v1"]
        G --> H["Claude / GPT / Gemini"]
    end
Loading

Last reviewed commit: b60bef9

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

@dylan-duan-aai
Copy link
Contributor Author

CI failures are pre-existing on main and unrelated to this PR

@dylan-duan-aai dylan-duan-aai force-pushed the feat/assemblyai-llm-gateway branch from c390bfb to b60bef9 Compare February 13, 2026 21:51
@dylan-duan-aai dylan-duan-aai changed the title feat: add AssemblyAI LLM Gateway pass-through support docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway Feb 13, 2026
@dylan-duan-aai
Copy link
Contributor Author

No code changes needed. This PR is now docs-only.

Removed the pass-through route and instead documented the LLM Gateway as an OpenAI-compatible provider via api_base. Updated the STT docs with Universal-3 Pro, prompting, and Speech Understanding examples.

@dylan-duan-aai dylan-duan-aai force-pushed the feat/assemblyai-llm-gateway branch 3 times, most recently from 8fb4124 to 6edbff1 Compare February 27, 2026 04:13
@krrishdholakia krrishdholakia merged commit af6fe18 into BerriAI:main Feb 28, 2026
27 of 30 checks passed
krrishdholakia pushed a commit that referenced this pull request Feb 28, 2026
* Add Prometheus child_exit cleanup for gunicorn workers

When a gunicorn worker exits (e.g. from max_requests recycling), its
per-process prometheus .db files remain on disk. For gauges using
livesum/liveall mode, this means the dead worker's last-known values
persist as if the process were still alive. Wire gunicorn's child_exit
hook to call mark_process_dead() so live-tracking gauges accurately
reflect only running workers.

* docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway (#21130)

* docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway provider config

* feat: add AssemblyAI LLM Gateway as OpenAI-compatible provider

* fix(mcp): update test mocks to use renamed filter_server_ids_by_ip_with_info

Tests were mocking the old method name `filter_server_ids_by_ip` but production
code at server.py:774 calls `filter_server_ids_by_ip_with_info` which returns
a (server_ids, blocked_count) tuple. The unmocked method on AsyncMock returned
a coroutine, causing "cannot unpack non-iterable coroutine object" errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): update realtime guardrail test assertions for voice violation behavior

Tests were asserting no response.create/conversation.item.create sent to
backend when guardrail blocks, but the implementation intentionally sends
these to have the LLM voice the guardrail violation message to the user.

Updated assertions to verify the correct guardrail flow:
- response.cancel is sent to stop any in-progress response
- conversation.item.create with violation message is injected
- response.create is sent to voice the violation
- original blocked content is NOT forwarded

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(bedrock): restore parallel_tool_calls mapping in map_openai_params

The revert in 8565c70 removed the parallel_tool_calls handling from
map_openai_params, and the subsequent fix d0445e1 only re-added the
transform_request consumption but forgot to re-add the map_openai_params
producer that sets _parallel_tool_use_config. This meant parallel_tool_calls
was silently ignored for all Bedrock models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): update Azure pass-through test to mock litellm.completion

Commit 99c62ca removed "azure" from _RESPONSES_API_PROVIDERS,
routing Azure models through litellm.completion instead of
litellm.responses. The test was not updated to match, causing it
to assert against the wrong mock.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add in_flight_requests metric to /health/backlog + prometheus (#22319)

* feat: add in_flight_requests metric to /health/backlog + prometheus

* refactor: clean class with static methods, add tests, fix sentinel pattern

* docs: add in_flight_requests to prometheus metrics and latency troubleshooting

* fix(db): add missing migration for LiteLLM_ClaudeCodePluginTable

PR #22271 added the LiteLLM_ClaudeCodePluginTable model to
schema.prisma but did not include a corresponding migration file,
causing test_aaaasschema_migration_check to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update stale docstring to match guardrail voicing behavior

Addresses Greptile review feedback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [Feat] Agent RBAC Permission Fix - Ensure Internal Users cannot create agents (#22329)

* fix: enforce RBAC on agent endpoints — block non-admin create/update/delete

- Add /v1/agents/{agent_id} to agent_routes so internal users can
  access GET-by-ID (previously returned 403 due to missing route pattern)
- Add _check_agent_management_permission() guard to POST, PUT, PATCH,
  DELETE agent endpoints — only PROXY_ADMIN may mutate agents
- Add user_api_key_dict param to delete_agent so the role check works
- Add comprehensive unit tests for RBAC enforcement across all roles

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: mock prisma_client in internal user get-agent-by-id test

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* feat(ui): hide agent create/delete controls for non-admin users

Match MCP servers pattern: wrap '+ Add New Agent' button in
isAdmin conditional so internal users see a read-only agents view.
Delete buttons in card and table were already gated.
Update empty-state copy for non-admin users.
Add 7 Vitest tests covering role-based visibility.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: exclude gpt-5.2-chat from temperature passthrough (#21911)

gpt-5.2-chat and gpt-5.2-chat-latest only support temperature=1
(like base gpt-5), not arbitrary values (like gpt-5.2).
Update is_model_gpt_5_1_model() to exclude gpt-5.2-chat variants
so drop_params correctly drops unsupported temperature values.

Fixes #21911

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: Dylan Duan <dylan.duan@assemblyai.com>
Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
krrishdholakia pushed a commit that referenced this pull request Feb 28, 2026
* fix(image_generation): propagate extra_headers to OpenAI image generation

Add headers parameter to image_generation() and aimage_generation() methods
in OpenAI provider, and pass headers from images/main.py to ensure custom
headers like cf-aig-authorization are properly forwarded to the OpenAI API.
Aligns behavior with completion() method and Azure provider implementation.

* test(image_generation): add tests for extra_headers propagation

Verify that extra_headers are correctly forwarded to OpenAI's
images.generate() in both sync and async paths, and that they
are absent when not provided.

* Add Prometheus child_exit cleanup for gunicorn workers

When a gunicorn worker exits (e.g. from max_requests recycling), its
per-process prometheus .db files remain on disk. For gauges using
livesum/liveall mode, this means the dead worker's last-known values
persist as if the process were still alive. Wire gunicorn's child_exit
hook to call mark_process_dead() so live-tracking gauges accurately
reflect only running workers.

* docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway (#21130)

* docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway provider config

* feat: add AssemblyAI LLM Gateway as OpenAI-compatible provider

* fix(mcp): update test mocks to use renamed filter_server_ids_by_ip_with_info

Tests were mocking the old method name `filter_server_ids_by_ip` but production
code at server.py:774 calls `filter_server_ids_by_ip_with_info` which returns
a (server_ids, blocked_count) tuple. The unmocked method on AsyncMock returned
a coroutine, causing "cannot unpack non-iterable coroutine object" errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): update realtime guardrail test assertions for voice violation behavior

Tests were asserting no response.create/conversation.item.create sent to
backend when guardrail blocks, but the implementation intentionally sends
these to have the LLM voice the guardrail violation message to the user.

Updated assertions to verify the correct guardrail flow:
- response.cancel is sent to stop any in-progress response
- conversation.item.create with violation message is injected
- response.create is sent to voice the violation
- original blocked content is NOT forwarded

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(bedrock): restore parallel_tool_calls mapping in map_openai_params

The revert in 8565c70 removed the parallel_tool_calls handling from
map_openai_params, and the subsequent fix d0445e1 only re-added the
transform_request consumption but forgot to re-add the map_openai_params
producer that sets _parallel_tool_use_config. This meant parallel_tool_calls
was silently ignored for all Bedrock models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): update Azure pass-through test to mock litellm.completion

Commit 99c62ca removed "azure" from _RESPONSES_API_PROVIDERS,
routing Azure models through litellm.completion instead of
litellm.responses. The test was not updated to match, causing it
to assert against the wrong mock.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add in_flight_requests metric to /health/backlog + prometheus (#22319)

* feat: add in_flight_requests metric to /health/backlog + prometheus

* refactor: clean class with static methods, add tests, fix sentinel pattern

* docs: add in_flight_requests to prometheus metrics and latency troubleshooting

* fix(db): add missing migration for LiteLLM_ClaudeCodePluginTable

PR #22271 added the LiteLLM_ClaudeCodePluginTable model to
schema.prisma but did not include a corresponding migration file,
causing test_aaaasschema_migration_check to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update stale docstring to match guardrail voicing behavior

Addresses Greptile review feedback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(caching): store background task references in LLMClientCache._remove_key to prevent unawaited coroutine warnings

Fixes #22128

* [Feat] Agent RBAC Permission Fix - Ensure Internal Users cannot create agents (#22329)

* fix: enforce RBAC on agent endpoints — block non-admin create/update/delete

- Add /v1/agents/{agent_id} to agent_routes so internal users can
  access GET-by-ID (previously returned 403 due to missing route pattern)
- Add _check_agent_management_permission() guard to POST, PUT, PATCH,
  DELETE agent endpoints — only PROXY_ADMIN may mutate agents
- Add user_api_key_dict param to delete_agent so the role check works
- Add comprehensive unit tests for RBAC enforcement across all roles

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: mock prisma_client in internal user get-agent-by-id test

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* feat(ui): hide agent create/delete controls for non-admin users

Match MCP servers pattern: wrap '+ Add New Agent' button in
isAdmin conditional so internal users see a read-only agents view.
Delete buttons in card and table were already gated.
Update empty-state copy for non-admin users.
Add 7 Vitest tests covering role-based visibility.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: Add PROXY_ADMIN role to system user for key rotation (#21896)

* fix: Add PROXY_ADMIN role to system user for key rotation

The key rotation worker was failing with 'You are not authorized to regenerate this key'
when rotating team keys. This was because the system user created by
get_litellm_internal_jobs_user_api_key_auth() was missing the user_role field.

Without user_role=PROXY_ADMIN, the system user couldn't bypass team permission checks
in can_team_member_execute_key_management_endpoint(), causing authorization failures
for team key rotation.

This fix adds user_role=LitellmUserRoles.PROXY_ADMIN to the system user, allowing
it to bypass team permission checks and successfully rotate keys for all teams.

* test: Add unit test for system user PROXY_ADMIN role

- Verify internal jobs system user has PROXY_ADMIN role
- Critical for key rotation to bypass team permission checks
- Regression test for PR #21896

* fix: populate user_id and user_info for admin users in /user/info (#22239)

* fix: populate user_id and user_info for admin users in /user/info endpoint

Fixes #22179

When admin users call /user/info without a user_id parameter, the endpoint
was returning null for both user_id and user_info fields. This broke
budgeting tooling that relies on /user/info to look up current budget and spend.

Changes:
- Modified _get_user_info_for_proxy_admin() to accept user_api_key_dict parameter
- Added logic to fetch admin's own user info from database
- Updated function to return admin's user_id and user_info instead of null
- Updated unit test to verify admin user_id is populated

The fix ensures admin users get their own user information just like regular users.

* test: make mock get_data signature match real method

- Updated MockPrismaClientDB.get_data() to accept all parameters that the real method accepts
- Makes mock more robust against future refactors
- Added datetime and Union imports
- Mock now returns None when user_id is not provided

* [Fix] Pass MCP auth headers from request into tool fetch for /v1/responses and chat completions (#22291)

* fixed dynamic auth for /responses with mcp

* fixed greptile concern

* fix(bedrock): filter internal json_tool_call when mixed with real tools

Fixes #18381: When using both tools and response_format with Bedrock
Converse API, LiteLLM internally adds json_tool_call to handle structured
output. Bedrock may return both this internal tool AND real user-defined
tools, breaking consumers like OpenAI Agents SDK.

Changes:
- Non-streaming: Added _filter_json_mode_tools() to handle 3 scenarios:
  only json_tool_call (convert to content), mixed (filter it out), or
  no json_tool_call (pass through)
- Streaming: Added json_mode tracking to AWSEventStreamDecoder to suppress
  json_tool_call chunks and convert to text content
- Fixed optional_params.pop() mutation issue

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: extract duplicated JSON unwrapping into helper method

Addresses review comment from greptile-apps:
#21107 (review)

Changes:
- Added `_unwrap_bedrock_properties()` helper method to eliminate code duplication
- Replaced two identical JSON unwrapping blocks (lines 1592-1601 and 1612-1620)
  with calls to the new helper method
- Improves maintainability - single source of truth for Bedrock properties unwrapping logic

The helper method:
- Parses JSON string
- Checks for single "properties" key structure
- Unwraps and returns the properties value
- Returns original string if unwrapping not needed or parsing fails

No functional changes - pure refactoring.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: use correct class name AmazonConverseConfig in helper method calls

Fixed MyPy errors where BedrockConverseConfig was used instead of
AmazonConverseConfig in the _unwrap_bedrock_properties() calls.

Errors:
- Line 1619: BedrockConverseConfig -> AmazonConverseConfig
- Line 1631: BedrockConverseConfig -> AmazonConverseConfig

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: shorten guardrail benchmark result filenames for Windows long path support

Fixes #21941

The generated result filenames from _save_confusion_results contained
parentheses, dots, and full yaml filenames, producing paths that exceed
the Windows 260-char MAX_PATH limit. Rework the safe_label logic to
produce short {topic}_{method_abbrev} filenames (e.g. insults_cf.json)
while preserving the full label inside the JSON content.

Rename existing tracked result files to match the new naming convention.

* Update litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/guardrail_benchmarks/test_eval.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Remove Apache 2 license from SKILL.md (#22322)

* fix(mcp): default available_on_public_internet to true (#22331)

* fix(mcp): default available_on_public_internet to true

MCPs were defaulting to private (available_on_public_internet=false) which
was a breaking change. This reverts the default to public (true) across:
- Pydantic models (AddMCPServerRequest, UpdateMCPServerRequest, LiteLLM_MCPServerTable)
- Prisma schema @default
- mcp_server_manager.py YAML config + DB loading fallbacks
- UI form initialValue and setFieldValue defaults

* fix(ui): add forceRender to Collapse.Panel so toggle defaults render correctly

Ant Design's Collapse.Panel lazy-renders children by default. Without
forceRender, the Form.Item for 'Available on Public Internet' isn't
mounted when the useEffect fires form.setFieldValue, causing the Switch
to visually show OFF even though the intended default is true.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(mcp): update remaining schema copies and MCPServer type default to true

Missed in previous commit per Greptile review:
- schema.prisma (root)
- litellm-proxy-extras/litellm_proxy_extras/schema.prisma
- litellm/types/mcp_server/mcp_server_manager.py MCPServer class

* ui(mcp): reframe network access as 'Internal network only' restriction

Replace scary 'Available on Public Internet' toggle with 'Internal network only'
opt-in restriction. Toggle OFF (default) = all networks allowed. Toggle ON =
restricted to internal network only. Auth is always required either way.

- MCPPermissionManagement: new label/tooltip/description, invert display via
  getValueProps/getValueFromEvent so underlying available_on_public_internet
  value is unchanged
- mcp_server_view: 'Public' → 'All networks', 'Internal' → 'Internal only' (orange)
- mcp_server_columns: same badge updates

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(jwt): OIDC discovery URLs, roles array handling, dot-notation error hints (#22336)

* fix(jwt): support OIDC discovery URLs, handle roles array, improve error hints

Three fixes for Azure AD JWT auth:

1. OIDC discovery URL support - JWT_PUBLIC_KEY_URL can now be set to
   .well-known/openid-configuration endpoints. The proxy fetches the
   discovery doc, extracts jwks_uri, and caches it.

2. Handle roles claim as array - when team_id_jwt_field points to a list
   (e.g. AAD's "roles": ["team1"]), auto-unwrap the first element instead
   of crashing with 'unhashable type: list'.

3. Better error hint for dot-notation indexing - when team_id_jwt_field is
   set to "roles.0" or "roles[0]", the 401 error now explains to use
   "roles" instead and that LiteLLM auto-unwraps lists.

* Add integration demo script for JWT auth fixes (OIDC discovery, array roles, dot-notation hints)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add demo_servers.py for manual JWT auth testing with mock JWKS/OIDC endpoints

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add demo screenshots for PR comment

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add integration test results with screenshots for PR review

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* address greptile review feedback (greploop iteration 1)

- fix: add HTTP status code check in _resolve_jwks_url before parsing JSON
- fix: remove misleading bracket-notation hint from debug log (get_nested_value does not support it)

* Update tests/test_litellm/proxy/auth/test_handle_jwt.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* remove demo scripts and assets

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* perf: streaming latency improvements — 4 targeted hot-path fixes (#22346)

* perf: raise aiohttp connection pool limits (300→1000, 50/host→500)

* perf: skip model_copy() on every chunk — only copy usage-bearing chunks

* perf: replace list+join O(n²) with str+= O(n) in async_data_generator

* perf: cache model-level guardrail lookup per request, not per chunk

* test: add comprehensive Vitest coverage for CostTrackingSettings

Add 88 tests across 9 test files for the CostTrackingSettings component directory:
- provider_display_helpers.test.ts: 9 tests for helper functions
- how_it_works.test.tsx: 9 tests for discount calculator component
- add_provider_form.test.tsx: 7 tests for provider form validation
- add_margin_form.test.tsx: 9 tests for margin form with type toggle
- provider_discount_table.test.tsx: 12 tests for table editing and interactions
- provider_margin_table.test.tsx: 13 tests for margin table with sorting
- use_discount_config.test.ts: 11 tests for discount hook logic
- use_margin_config.test.ts: 12 tests for margin hook logic
- cost_tracking_settings.test.tsx: 15 tests for main component and role-based rendering

All tests passing. Coverage includes form validation, user interactions, API calls, state management, and conditional rendering.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* [Feature] Key list endpoint: Add project_id and access_group_id filters

Add filtering capabilities to /key/list endpoint for project_id and access_group_id parameters. Both filters work globally across all visibility rules and stack with existing sort/pagination params. Added comprehensive unit tests for the new filters.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* [Feature] UI - Projects: Add Project Details page with Edit modal

- Add ProjectDetailsPage with header, details card, spend/budget progress,
  model spend bar chart, keys placeholder, and team info card
- Refactor CreateProjectModal into base form pattern (ProjectBaseForm)
  shared between Create and Edit flows
- Add EditProjectModal with pre-filled form data from backend
- Add useProjectDetails and useUpdateProject hooks
- Add duplicate key validation for model limits and metadata
- Wire project ID click in table to navigate to detail view
- Move pagination inline with search bar

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update ui/litellm-dashboard/src/components/Projects/ProjectModals/CreateProjectModal.tsx

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix(types): filter null fields from reasoning output items in ResponsesAPIResponse

When providers return reasoning items without status/content/encrypted_content,
Pydantic's Optional defaults serialize them as null. This breaks downstream SDKs
(e.g., the OpenAI C# SDK crashes on status=null).

Add a field_serializer on ResponsesAPIResponse.output that removes null
status, content, and encrypted_content from reasoning items during
serialization. This mirrors the request-side filtering already done in
OpenAIResponsesAPIConfig._handle_reasoning_item().

Fixes #16824

---------

Co-authored-by: Zero Clover <zero@root.me>
Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: Dylan Duan <dylan.duan@assemblyai.com>
Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Shivam Rawat <161387515+shivamrawat1@users.noreply.github.com>
Co-authored-by: Brian Caswell <bcaswell@microsoft.com>
Co-authored-by: Brian Caswell <bcaswell@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: rasmi <rrelasmar@gmail.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
krrishdholakia pushed a commit that referenced this pull request Feb 28, 2026
…on check for GA path (#22369)

* fix(image_generation): propagate extra_headers to OpenAI image generation

Add headers parameter to image_generation() and aimage_generation() methods
in OpenAI provider, and pass headers from images/main.py to ensure custom
headers like cf-aig-authorization are properly forwarded to the OpenAI API.
Aligns behavior with completion() method and Azure provider implementation.

* test(image_generation): add tests for extra_headers propagation

Verify that extra_headers are correctly forwarded to OpenAI's
images.generate() in both sync and async paths, and that they
are absent when not provided.

* Add Prometheus child_exit cleanup for gunicorn workers

When a gunicorn worker exits (e.g. from max_requests recycling), its
per-process prometheus .db files remain on disk. For gauges using
livesum/liveall mode, this means the dead worker's last-known values
persist as if the process were still alive. Wire gunicorn's child_exit
hook to call mark_process_dead() so live-tracking gauges accurately
reflect only running workers.

* docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway (#21130)

* docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway provider config

* feat: add AssemblyAI LLM Gateway as OpenAI-compatible provider

* fix(mcp): update test mocks to use renamed filter_server_ids_by_ip_with_info

Tests were mocking the old method name `filter_server_ids_by_ip` but production
code at server.py:774 calls `filter_server_ids_by_ip_with_info` which returns
a (server_ids, blocked_count) tuple. The unmocked method on AsyncMock returned
a coroutine, causing "cannot unpack non-iterable coroutine object" errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): update realtime guardrail test assertions for voice violation behavior

Tests were asserting no response.create/conversation.item.create sent to
backend when guardrail blocks, but the implementation intentionally sends
these to have the LLM voice the guardrail violation message to the user.

Updated assertions to verify the correct guardrail flow:
- response.cancel is sent to stop any in-progress response
- conversation.item.create with violation message is injected
- response.create is sent to voice the violation
- original blocked content is NOT forwarded

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(bedrock): restore parallel_tool_calls mapping in map_openai_params

The revert in 8565c70 removed the parallel_tool_calls handling from
map_openai_params, and the subsequent fix d0445e1 only re-added the
transform_request consumption but forgot to re-add the map_openai_params
producer that sets _parallel_tool_use_config. This meant parallel_tool_calls
was silently ignored for all Bedrock models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): update Azure pass-through test to mock litellm.completion

Commit 99c62ca removed "azure" from _RESPONSES_API_PROVIDERS,
routing Azure models through litellm.completion instead of
litellm.responses. The test was not updated to match, causing it
to assert against the wrong mock.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add in_flight_requests metric to /health/backlog + prometheus (#22319)

* feat: add in_flight_requests metric to /health/backlog + prometheus

* refactor: clean class with static methods, add tests, fix sentinel pattern

* docs: add in_flight_requests to prometheus metrics and latency troubleshooting

* fix(db): add missing migration for LiteLLM_ClaudeCodePluginTable

PR #22271 added the LiteLLM_ClaudeCodePluginTable model to
schema.prisma but did not include a corresponding migration file,
causing test_aaaasschema_migration_check to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update stale docstring to match guardrail voicing behavior

Addresses Greptile review feedback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(caching): store background task references in LLMClientCache._remove_key to prevent unawaited coroutine warnings

Fixes #22128

* [Feat] Agent RBAC Permission Fix - Ensure Internal Users cannot create agents (#22329)

* fix: enforce RBAC on agent endpoints — block non-admin create/update/delete

- Add /v1/agents/{agent_id} to agent_routes so internal users can
  access GET-by-ID (previously returned 403 due to missing route pattern)
- Add _check_agent_management_permission() guard to POST, PUT, PATCH,
  DELETE agent endpoints — only PROXY_ADMIN may mutate agents
- Add user_api_key_dict param to delete_agent so the role check works
- Add comprehensive unit tests for RBAC enforcement across all roles

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: mock prisma_client in internal user get-agent-by-id test

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* feat(ui): hide agent create/delete controls for non-admin users

Match MCP servers pattern: wrap '+ Add New Agent' button in
isAdmin conditional so internal users see a read-only agents view.
Delete buttons in card and table were already gated.
Update empty-state copy for non-admin users.
Add 7 Vitest tests covering role-based visibility.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: Add PROXY_ADMIN role to system user for key rotation (#21896)

* fix: Add PROXY_ADMIN role to system user for key rotation

The key rotation worker was failing with 'You are not authorized to regenerate this key'
when rotating team keys. This was because the system user created by
get_litellm_internal_jobs_user_api_key_auth() was missing the user_role field.

Without user_role=PROXY_ADMIN, the system user couldn't bypass team permission checks
in can_team_member_execute_key_management_endpoint(), causing authorization failures
for team key rotation.

This fix adds user_role=LitellmUserRoles.PROXY_ADMIN to the system user, allowing
it to bypass team permission checks and successfully rotate keys for all teams.

* test: Add unit test for system user PROXY_ADMIN role

- Verify internal jobs system user has PROXY_ADMIN role
- Critical for key rotation to bypass team permission checks
- Regression test for PR #21896

* fix: populate user_id and user_info for admin users in /user/info (#22239)

* fix: populate user_id and user_info for admin users in /user/info endpoint

Fixes #22179

When admin users call /user/info without a user_id parameter, the endpoint
was returning null for both user_id and user_info fields. This broke
budgeting tooling that relies on /user/info to look up current budget and spend.

Changes:
- Modified _get_user_info_for_proxy_admin() to accept user_api_key_dict parameter
- Added logic to fetch admin's own user info from database
- Updated function to return admin's user_id and user_info instead of null
- Updated unit test to verify admin user_id is populated

The fix ensures admin users get their own user information just like regular users.

* test: make mock get_data signature match real method

- Updated MockPrismaClientDB.get_data() to accept all parameters that the real method accepts
- Makes mock more robust against future refactors
- Added datetime and Union imports
- Mock now returns None when user_id is not provided

* [Fix] Pass MCP auth headers from request into tool fetch for /v1/responses and chat completions (#22291)

* fixed dynamic auth for /responses with mcp

* fixed greptile concern

* fix(bedrock): filter internal json_tool_call when mixed with real tools

Fixes #18381: When using both tools and response_format with Bedrock
Converse API, LiteLLM internally adds json_tool_call to handle structured
output. Bedrock may return both this internal tool AND real user-defined
tools, breaking consumers like OpenAI Agents SDK.

Changes:
- Non-streaming: Added _filter_json_mode_tools() to handle 3 scenarios:
  only json_tool_call (convert to content), mixed (filter it out), or
  no json_tool_call (pass through)
- Streaming: Added json_mode tracking to AWSEventStreamDecoder to suppress
  json_tool_call chunks and convert to text content
- Fixed optional_params.pop() mutation issue

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: extract duplicated JSON unwrapping into helper method

Addresses review comment from greptile-apps:
#21107 (review)

Changes:
- Added `_unwrap_bedrock_properties()` helper method to eliminate code duplication
- Replaced two identical JSON unwrapping blocks (lines 1592-1601 and 1612-1620)
  with calls to the new helper method
- Improves maintainability - single source of truth for Bedrock properties unwrapping logic

The helper method:
- Parses JSON string
- Checks for single "properties" key structure
- Unwraps and returns the properties value
- Returns original string if unwrapping not needed or parsing fails

No functional changes - pure refactoring.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: use correct class name AmazonConverseConfig in helper method calls

Fixed MyPy errors where BedrockConverseConfig was used instead of
AmazonConverseConfig in the _unwrap_bedrock_properties() calls.

Errors:
- Line 1619: BedrockConverseConfig -> AmazonConverseConfig
- Line 1631: BedrockConverseConfig -> AmazonConverseConfig

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: shorten guardrail benchmark result filenames for Windows long path support

Fixes #21941

The generated result filenames from _save_confusion_results contained
parentheses, dots, and full yaml filenames, producing paths that exceed
the Windows 260-char MAX_PATH limit. Rework the safe_label logic to
produce short {topic}_{method_abbrev} filenames (e.g. insults_cf.json)
while preserving the full label inside the JSON content.

Rename existing tracked result files to match the new naming convention.

* Update litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/guardrail_benchmarks/test_eval.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Remove Apache 2 license from SKILL.md (#22322)

* fix(mcp): default available_on_public_internet to true (#22331)

* fix(mcp): default available_on_public_internet to true

MCPs were defaulting to private (available_on_public_internet=false) which
was a breaking change. This reverts the default to public (true) across:
- Pydantic models (AddMCPServerRequest, UpdateMCPServerRequest, LiteLLM_MCPServerTable)
- Prisma schema @default
- mcp_server_manager.py YAML config + DB loading fallbacks
- UI form initialValue and setFieldValue defaults

* fix(ui): add forceRender to Collapse.Panel so toggle defaults render correctly

Ant Design's Collapse.Panel lazy-renders children by default. Without
forceRender, the Form.Item for 'Available on Public Internet' isn't
mounted when the useEffect fires form.setFieldValue, causing the Switch
to visually show OFF even though the intended default is true.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(mcp): update remaining schema copies and MCPServer type default to true

Missed in previous commit per Greptile review:
- schema.prisma (root)
- litellm-proxy-extras/litellm_proxy_extras/schema.prisma
- litellm/types/mcp_server/mcp_server_manager.py MCPServer class

* ui(mcp): reframe network access as 'Internal network only' restriction

Replace scary 'Available on Public Internet' toggle with 'Internal network only'
opt-in restriction. Toggle OFF (default) = all networks allowed. Toggle ON =
restricted to internal network only. Auth is always required either way.

- MCPPermissionManagement: new label/tooltip/description, invert display via
  getValueProps/getValueFromEvent so underlying available_on_public_internet
  value is unchanged
- mcp_server_view: 'Public' → 'All networks', 'Internal' → 'Internal only' (orange)
- mcp_server_columns: same badge updates

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(jwt): OIDC discovery URLs, roles array handling, dot-notation error hints (#22336)

* fix(jwt): support OIDC discovery URLs, handle roles array, improve error hints

Three fixes for Azure AD JWT auth:

1. OIDC discovery URL support - JWT_PUBLIC_KEY_URL can now be set to
   .well-known/openid-configuration endpoints. The proxy fetches the
   discovery doc, extracts jwks_uri, and caches it.

2. Handle roles claim as array - when team_id_jwt_field points to a list
   (e.g. AAD's "roles": ["team1"]), auto-unwrap the first element instead
   of crashing with 'unhashable type: list'.

3. Better error hint for dot-notation indexing - when team_id_jwt_field is
   set to "roles.0" or "roles[0]", the 401 error now explains to use
   "roles" instead and that LiteLLM auto-unwraps lists.

* Add integration demo script for JWT auth fixes (OIDC discovery, array roles, dot-notation hints)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add demo_servers.py for manual JWT auth testing with mock JWKS/OIDC endpoints

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add demo screenshots for PR comment

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add integration test results with screenshots for PR review

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* address greptile review feedback (greploop iteration 1)

- fix: add HTTP status code check in _resolve_jwks_url before parsing JSON
- fix: remove misleading bracket-notation hint from debug log (get_nested_value does not support it)

* Update tests/test_litellm/proxy/auth/test_handle_jwt.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* remove demo scripts and assets

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* perf: streaming latency improvements — 4 targeted hot-path fixes (#22346)

* perf: raise aiohttp connection pool limits (300→1000, 50/host→500)

* perf: skip model_copy() on every chunk — only copy usage-bearing chunks

* perf: replace list+join O(n²) with str+= O(n) in async_data_generator

* perf: cache model-level guardrail lookup per request, not per chunk

* test: add comprehensive Vitest coverage for CostTrackingSettings

Add 88 tests across 9 test files for the CostTrackingSettings component directory:
- provider_display_helpers.test.ts: 9 tests for helper functions
- how_it_works.test.tsx: 9 tests for discount calculator component
- add_provider_form.test.tsx: 7 tests for provider form validation
- add_margin_form.test.tsx: 9 tests for margin form with type toggle
- provider_discount_table.test.tsx: 12 tests for table editing and interactions
- provider_margin_table.test.tsx: 13 tests for margin table with sorting
- use_discount_config.test.ts: 11 tests for discount hook logic
- use_margin_config.test.ts: 12 tests for margin hook logic
- cost_tracking_settings.test.tsx: 15 tests for main component and role-based rendering

All tests passing. Coverage includes form validation, user interactions, API calls, state management, and conditional rendering.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* [Feature] Key list endpoint: Add project_id and access_group_id filters

Add filtering capabilities to /key/list endpoint for project_id and access_group_id parameters. Both filters work globally across all visibility rules and stack with existing sort/pagination params. Added comprehensive unit tests for the new filters.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* [Feature] UI - Projects: Add Project Details page with Edit modal

- Add ProjectDetailsPage with header, details card, spend/budget progress,
  model spend bar chart, keys placeholder, and team info card
- Refactor CreateProjectModal into base form pattern (ProjectBaseForm)
  shared between Create and Edit flows
- Add EditProjectModal with pre-filled form data from backend
- Add useProjectDetails and useUpdateProject hooks
- Add duplicate key validation for model limits and metadata
- Wire project ID click in table to navigate to detail view
- Move pagination inline with search bar

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update ui/litellm-dashboard/src/components/Projects/ProjectModals/CreateProjectModal.tsx

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix(azure): forward realtime_protocol from config and relax api_version check for GA path

The realtime_protocol parameter set in config.yaml litellm_params was
not reliably reaching the Azure realtime handler. Add fallback chain:
kwargs → litellm_params → LITELLM_AZURE_REALTIME_PROTOCOL env var → beta.

Also relax the api_version validation to only require it for the beta
protocol path, since the GA/v1 path does not use api_version in the URL.

Make protocol matching case-insensitive so 'ga', 'GA', 'v1', 'V1' all
work consistently. Fix _construct_url type signature to accept Optional
api_version.

Fixes #22127

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Zero Clover <zero@root.me>
Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: Dylan Duan <dylan.duan@assemblyai.com>
Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Shivaang <shivaang.05@gmail.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Shivam Rawat <161387515+shivamrawat1@users.noreply.github.com>
Co-authored-by: Brian Caswell <bcaswell@microsoft.com>
Co-authored-by: Brian Caswell <bcaswell@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: rasmi <rrelasmar@gmail.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
krrishdholakia pushed a commit that referenced this pull request Feb 28, 2026
* fix(image_generation): propagate extra_headers to OpenAI image generation

Add headers parameter to image_generation() and aimage_generation() methods
in OpenAI provider, and pass headers from images/main.py to ensure custom
headers like cf-aig-authorization are properly forwarded to the OpenAI API.
Aligns behavior with completion() method and Azure provider implementation.

* test(image_generation): add tests for extra_headers propagation

Verify that extra_headers are correctly forwarded to OpenAI's
images.generate() in both sync and async paths, and that they
are absent when not provided.

* Add Prometheus child_exit cleanup for gunicorn workers

When a gunicorn worker exits (e.g. from max_requests recycling), its
per-process prometheus .db files remain on disk. For gauges using
livesum/liveall mode, this means the dead worker's last-known values
persist as if the process were still alive. Wire gunicorn's child_exit
hook to call mark_process_dead() so live-tracking gauges accurately
reflect only running workers.

* docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway (#21130)

* docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway provider config

* feat: add AssemblyAI LLM Gateway as OpenAI-compatible provider

* fix(mcp): update test mocks to use renamed filter_server_ids_by_ip_with_info

Tests were mocking the old method name `filter_server_ids_by_ip` but production
code at server.py:774 calls `filter_server_ids_by_ip_with_info` which returns
a (server_ids, blocked_count) tuple. The unmocked method on AsyncMock returned
a coroutine, causing "cannot unpack non-iterable coroutine object" errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): update realtime guardrail test assertions for voice violation behavior

Tests were asserting no response.create/conversation.item.create sent to
backend when guardrail blocks, but the implementation intentionally sends
these to have the LLM voice the guardrail violation message to the user.

Updated assertions to verify the correct guardrail flow:
- response.cancel is sent to stop any in-progress response
- conversation.item.create with violation message is injected
- response.create is sent to voice the violation
- original blocked content is NOT forwarded

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(bedrock): restore parallel_tool_calls mapping in map_openai_params

The revert in 8565c70 removed the parallel_tool_calls handling from
map_openai_params, and the subsequent fix d0445e1 only re-added the
transform_request consumption but forgot to re-add the map_openai_params
producer that sets _parallel_tool_use_config. This meant parallel_tool_calls
was silently ignored for all Bedrock models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): update Azure pass-through test to mock litellm.completion

Commit 99c62ca removed "azure" from _RESPONSES_API_PROVIDERS,
routing Azure models through litellm.completion instead of
litellm.responses. The test was not updated to match, causing it
to assert against the wrong mock.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add in_flight_requests metric to /health/backlog + prometheus (#22319)

* feat: add in_flight_requests metric to /health/backlog + prometheus

* refactor: clean class with static methods, add tests, fix sentinel pattern

* docs: add in_flight_requests to prometheus metrics and latency troubleshooting

* fix(db): add missing migration for LiteLLM_ClaudeCodePluginTable

PR #22271 added the LiteLLM_ClaudeCodePluginTable model to
schema.prisma but did not include a corresponding migration file,
causing test_aaaasschema_migration_check to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update stale docstring to match guardrail voicing behavior

Addresses Greptile review feedback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(caching): store background task references in LLMClientCache._remove_key to prevent unawaited coroutine warnings

Fixes #22128

* [Feat] Agent RBAC Permission Fix - Ensure Internal Users cannot create agents (#22329)

* fix: enforce RBAC on agent endpoints — block non-admin create/update/delete

- Add /v1/agents/{agent_id} to agent_routes so internal users can
  access GET-by-ID (previously returned 403 due to missing route pattern)
- Add _check_agent_management_permission() guard to POST, PUT, PATCH,
  DELETE agent endpoints — only PROXY_ADMIN may mutate agents
- Add user_api_key_dict param to delete_agent so the role check works
- Add comprehensive unit tests for RBAC enforcement across all roles

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: mock prisma_client in internal user get-agent-by-id test

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* feat(ui): hide agent create/delete controls for non-admin users

Match MCP servers pattern: wrap '+ Add New Agent' button in
isAdmin conditional so internal users see a read-only agents view.
Delete buttons in card and table were already gated.
Update empty-state copy for non-admin users.
Add 7 Vitest tests covering role-based visibility.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: Add PROXY_ADMIN role to system user for key rotation (#21896)

* fix: Add PROXY_ADMIN role to system user for key rotation

The key rotation worker was failing with 'You are not authorized to regenerate this key'
when rotating team keys. This was because the system user created by
get_litellm_internal_jobs_user_api_key_auth() was missing the user_role field.

Without user_role=PROXY_ADMIN, the system user couldn't bypass team permission checks
in can_team_member_execute_key_management_endpoint(), causing authorization failures
for team key rotation.

This fix adds user_role=LitellmUserRoles.PROXY_ADMIN to the system user, allowing
it to bypass team permission checks and successfully rotate keys for all teams.

* test: Add unit test for system user PROXY_ADMIN role

- Verify internal jobs system user has PROXY_ADMIN role
- Critical for key rotation to bypass team permission checks
- Regression test for PR #21896

* fix: populate user_id and user_info for admin users in /user/info (#22239)

* fix: populate user_id and user_info for admin users in /user/info endpoint

Fixes #22179

When admin users call /user/info without a user_id parameter, the endpoint
was returning null for both user_id and user_info fields. This broke
budgeting tooling that relies on /user/info to look up current budget and spend.

Changes:
- Modified _get_user_info_for_proxy_admin() to accept user_api_key_dict parameter
- Added logic to fetch admin's own user info from database
- Updated function to return admin's user_id and user_info instead of null
- Updated unit test to verify admin user_id is populated

The fix ensures admin users get their own user information just like regular users.

* test: make mock get_data signature match real method

- Updated MockPrismaClientDB.get_data() to accept all parameters that the real method accepts
- Makes mock more robust against future refactors
- Added datetime and Union imports
- Mock now returns None when user_id is not provided

* [Fix] Pass MCP auth headers from request into tool fetch for /v1/responses and chat completions (#22291)

* fixed dynamic auth for /responses with mcp

* fixed greptile concern

* fix(bedrock): filter internal json_tool_call when mixed with real tools

Fixes #18381: When using both tools and response_format with Bedrock
Converse API, LiteLLM internally adds json_tool_call to handle structured
output. Bedrock may return both this internal tool AND real user-defined
tools, breaking consumers like OpenAI Agents SDK.

Changes:
- Non-streaming: Added _filter_json_mode_tools() to handle 3 scenarios:
  only json_tool_call (convert to content), mixed (filter it out), or
  no json_tool_call (pass through)
- Streaming: Added json_mode tracking to AWSEventStreamDecoder to suppress
  json_tool_call chunks and convert to text content
- Fixed optional_params.pop() mutation issue

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: extract duplicated JSON unwrapping into helper method

Addresses review comment from greptile-apps:
#21107 (review)

Changes:
- Added `_unwrap_bedrock_properties()` helper method to eliminate code duplication
- Replaced two identical JSON unwrapping blocks (lines 1592-1601 and 1612-1620)
  with calls to the new helper method
- Improves maintainability - single source of truth for Bedrock properties unwrapping logic

The helper method:
- Parses JSON string
- Checks for single "properties" key structure
- Unwraps and returns the properties value
- Returns original string if unwrapping not needed or parsing fails

No functional changes - pure refactoring.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: use correct class name AmazonConverseConfig in helper method calls

Fixed MyPy errors where BedrockConverseConfig was used instead of
AmazonConverseConfig in the _unwrap_bedrock_properties() calls.

Errors:
- Line 1619: BedrockConverseConfig -> AmazonConverseConfig
- Line 1631: BedrockConverseConfig -> AmazonConverseConfig

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: shorten guardrail benchmark result filenames for Windows long path support

Fixes #21941

The generated result filenames from _save_confusion_results contained
parentheses, dots, and full yaml filenames, producing paths that exceed
the Windows 260-char MAX_PATH limit. Rework the safe_label logic to
produce short {topic}_{method_abbrev} filenames (e.g. insults_cf.json)
while preserving the full label inside the JSON content.

Rename existing tracked result files to match the new naming convention.

* Update litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/guardrail_benchmarks/test_eval.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Remove Apache 2 license from SKILL.md (#22322)

* fix(mcp): default available_on_public_internet to true (#22331)

* fix(mcp): default available_on_public_internet to true

MCPs were defaulting to private (available_on_public_internet=false) which
was a breaking change. This reverts the default to public (true) across:
- Pydantic models (AddMCPServerRequest, UpdateMCPServerRequest, LiteLLM_MCPServerTable)
- Prisma schema @default
- mcp_server_manager.py YAML config + DB loading fallbacks
- UI form initialValue and setFieldValue defaults

* fix(ui): add forceRender to Collapse.Panel so toggle defaults render correctly

Ant Design's Collapse.Panel lazy-renders children by default. Without
forceRender, the Form.Item for 'Available on Public Internet' isn't
mounted when the useEffect fires form.setFieldValue, causing the Switch
to visually show OFF even though the intended default is true.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(mcp): update remaining schema copies and MCPServer type default to true

Missed in previous commit per Greptile review:
- schema.prisma (root)
- litellm-proxy-extras/litellm_proxy_extras/schema.prisma
- litellm/types/mcp_server/mcp_server_manager.py MCPServer class

* ui(mcp): reframe network access as 'Internal network only' restriction

Replace scary 'Available on Public Internet' toggle with 'Internal network only'
opt-in restriction. Toggle OFF (default) = all networks allowed. Toggle ON =
restricted to internal network only. Auth is always required either way.

- MCPPermissionManagement: new label/tooltip/description, invert display via
  getValueProps/getValueFromEvent so underlying available_on_public_internet
  value is unchanged
- mcp_server_view: 'Public' → 'All networks', 'Internal' → 'Internal only' (orange)
- mcp_server_columns: same badge updates

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(jwt): OIDC discovery URLs, roles array handling, dot-notation error hints (#22336)

* fix(jwt): support OIDC discovery URLs, handle roles array, improve error hints

Three fixes for Azure AD JWT auth:

1. OIDC discovery URL support - JWT_PUBLIC_KEY_URL can now be set to
   .well-known/openid-configuration endpoints. The proxy fetches the
   discovery doc, extracts jwks_uri, and caches it.

2. Handle roles claim as array - when team_id_jwt_field points to a list
   (e.g. AAD's "roles": ["team1"]), auto-unwrap the first element instead
   of crashing with 'unhashable type: list'.

3. Better error hint for dot-notation indexing - when team_id_jwt_field is
   set to "roles.0" or "roles[0]", the 401 error now explains to use
   "roles" instead and that LiteLLM auto-unwraps lists.

* Add integration demo script for JWT auth fixes (OIDC discovery, array roles, dot-notation hints)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add demo_servers.py for manual JWT auth testing with mock JWKS/OIDC endpoints

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add demo screenshots for PR comment

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add integration test results with screenshots for PR review

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* address greptile review feedback (greploop iteration 1)

- fix: add HTTP status code check in _resolve_jwks_url before parsing JSON
- fix: remove misleading bracket-notation hint from debug log (get_nested_value does not support it)

* Update tests/test_litellm/proxy/auth/test_handle_jwt.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* remove demo scripts and assets

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* perf: streaming latency improvements — 4 targeted hot-path fixes (#22346)

* perf: raise aiohttp connection pool limits (300→1000, 50/host→500)

* perf: skip model_copy() on every chunk — only copy usage-bearing chunks

* perf: replace list+join O(n²) with str+= O(n) in async_data_generator

* perf: cache model-level guardrail lookup per request, not per chunk

* test: add comprehensive Vitest coverage for CostTrackingSettings

Add 88 tests across 9 test files for the CostTrackingSettings component directory:
- provider_display_helpers.test.ts: 9 tests for helper functions
- how_it_works.test.tsx: 9 tests for discount calculator component
- add_provider_form.test.tsx: 7 tests for provider form validation
- add_margin_form.test.tsx: 9 tests for margin form with type toggle
- provider_discount_table.test.tsx: 12 tests for table editing and interactions
- provider_margin_table.test.tsx: 13 tests for margin table with sorting
- use_discount_config.test.ts: 11 tests for discount hook logic
- use_margin_config.test.ts: 12 tests for margin hook logic
- cost_tracking_settings.test.tsx: 15 tests for main component and role-based rendering

All tests passing. Coverage includes form validation, user interactions, API calls, state management, and conditional rendering.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* [Feature] Key list endpoint: Add project_id and access_group_id filters

Add filtering capabilities to /key/list endpoint for project_id and access_group_id parameters. Both filters work globally across all visibility rules and stack with existing sort/pagination params. Added comprehensive unit tests for the new filters.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* [Feature] UI - Projects: Add Project Details page with Edit modal

- Add ProjectDetailsPage with header, details card, spend/budget progress,
  model spend bar chart, keys placeholder, and team info card
- Refactor CreateProjectModal into base form pattern (ProjectBaseForm)
  shared between Create and Edit flows
- Add EditProjectModal with pre-filled form data from backend
- Add useProjectDetails and useUpdateProject hooks
- Add duplicate key validation for model limits and metadata
- Wire project ID click in table to navigate to detail view
- Move pagination inline with search bar

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update ui/litellm-dashboard/src/components/Projects/ProjectModals/CreateProjectModal.tsx

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix(anthropic): handle OAuth tokens in count_tokens endpoint

The count_tokens API's get_required_headers() always set x-api-key,
which is incorrect for OAuth tokens (sk-ant-oat*). These tokens must
use Authorization: Bearer instead.

Changes:
- Add optionally_handle_anthropic_oauth() call in get_required_headers()
  to convert OAuth tokens from x-api-key to Authorization: Bearer
- Add _merge_beta_headers() helper to preserve existing anthropic-beta
  values (e.g. token-counting) when appending the OAuth beta header
- Add 7 tests covering regular and OAuth header generation

Fixes #22040

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Zero Clover <zero@root.me>
Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: Dylan Duan <dylan.duan@assemblyai.com>
Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Shivaang <shivaang.05@gmail.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Shivam Rawat <161387515+shivamrawat1@users.noreply.github.com>
Co-authored-by: Brian Caswell <bcaswell@microsoft.com>
Co-authored-by: Brian Caswell <bcaswell@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: rasmi <rrelasmar@gmail.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sameerlite pushed a commit that referenced this pull request Mar 2, 2026
* Add Prometheus child_exit cleanup for gunicorn workers

When a gunicorn worker exits (e.g. from max_requests recycling), its
per-process prometheus .db files remain on disk. For gauges using
livesum/liveall mode, this means the dead worker's last-known values
persist as if the process were still alive. Wire gunicorn's child_exit
hook to call mark_process_dead() so live-tracking gauges accurately
reflect only running workers.

* docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway (#21130)

* docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway provider config

* feat: add AssemblyAI LLM Gateway as OpenAI-compatible provider

* fix(mcp): update test mocks to use renamed filter_server_ids_by_ip_with_info

Tests were mocking the old method name `filter_server_ids_by_ip` but production
code at server.py:774 calls `filter_server_ids_by_ip_with_info` which returns
a (server_ids, blocked_count) tuple. The unmocked method on AsyncMock returned
a coroutine, causing "cannot unpack non-iterable coroutine object" errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): update realtime guardrail test assertions for voice violation behavior

Tests were asserting no response.create/conversation.item.create sent to
backend when guardrail blocks, but the implementation intentionally sends
these to have the LLM voice the guardrail violation message to the user.

Updated assertions to verify the correct guardrail flow:
- response.cancel is sent to stop any in-progress response
- conversation.item.create with violation message is injected
- response.create is sent to voice the violation
- original blocked content is NOT forwarded

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(bedrock): restore parallel_tool_calls mapping in map_openai_params

The revert in 8565c70 removed the parallel_tool_calls handling from
map_openai_params, and the subsequent fix d0445e1 only re-added the
transform_request consumption but forgot to re-add the map_openai_params
producer that sets _parallel_tool_use_config. This meant parallel_tool_calls
was silently ignored for all Bedrock models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): update Azure pass-through test to mock litellm.completion

Commit 99c62ca removed "azure" from _RESPONSES_API_PROVIDERS,
routing Azure models through litellm.completion instead of
litellm.responses. The test was not updated to match, causing it
to assert against the wrong mock.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add in_flight_requests metric to /health/backlog + prometheus (#22319)

* feat: add in_flight_requests metric to /health/backlog + prometheus

* refactor: clean class with static methods, add tests, fix sentinel pattern

* docs: add in_flight_requests to prometheus metrics and latency troubleshooting

* fix(db): add missing migration for LiteLLM_ClaudeCodePluginTable

PR #22271 added the LiteLLM_ClaudeCodePluginTable model to
schema.prisma but did not include a corresponding migration file,
causing test_aaaasschema_migration_check to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update stale docstring to match guardrail voicing behavior

Addresses Greptile review feedback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [Feat] Agent RBAC Permission Fix - Ensure Internal Users cannot create agents (#22329)

* fix: enforce RBAC on agent endpoints — block non-admin create/update/delete

- Add /v1/agents/{agent_id} to agent_routes so internal users can
  access GET-by-ID (previously returned 403 due to missing route pattern)
- Add _check_agent_management_permission() guard to POST, PUT, PATCH,
  DELETE agent endpoints — only PROXY_ADMIN may mutate agents
- Add user_api_key_dict param to delete_agent so the role check works
- Add comprehensive unit tests for RBAC enforcement across all roles

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: mock prisma_client in internal user get-agent-by-id test

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* feat(ui): hide agent create/delete controls for non-admin users

Match MCP servers pattern: wrap '+ Add New Agent' button in
isAdmin conditional so internal users see a read-only agents view.
Delete buttons in card and table were already gated.
Update empty-state copy for non-admin users.
Add 7 Vitest tests covering role-based visibility.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: exclude gpt-5.2-chat from temperature passthrough (#21911)

gpt-5.2-chat and gpt-5.2-chat-latest only support temperature=1
(like base gpt-5), not arbitrary values (like gpt-5.2).
Update is_model_gpt_5_1_model() to exclude gpt-5.2-chat variants
so drop_params correctly drops unsupported temperature values.

Fixes #21911

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: Dylan Duan <dylan.duan@assemblyai.com>
Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sameerlite pushed a commit that referenced this pull request Mar 2, 2026
* fix(image_generation): propagate extra_headers to OpenAI image generation

Add headers parameter to image_generation() and aimage_generation() methods
in OpenAI provider, and pass headers from images/main.py to ensure custom
headers like cf-aig-authorization are properly forwarded to the OpenAI API.
Aligns behavior with completion() method and Azure provider implementation.

* test(image_generation): add tests for extra_headers propagation

Verify that extra_headers are correctly forwarded to OpenAI's
images.generate() in both sync and async paths, and that they
are absent when not provided.

* Add Prometheus child_exit cleanup for gunicorn workers

When a gunicorn worker exits (e.g. from max_requests recycling), its
per-process prometheus .db files remain on disk. For gauges using
livesum/liveall mode, this means the dead worker's last-known values
persist as if the process were still alive. Wire gunicorn's child_exit
hook to call mark_process_dead() so live-tracking gauges accurately
reflect only running workers.

* docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway (#21130)

* docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway provider config

* feat: add AssemblyAI LLM Gateway as OpenAI-compatible provider

* fix(mcp): update test mocks to use renamed filter_server_ids_by_ip_with_info

Tests were mocking the old method name `filter_server_ids_by_ip` but production
code at server.py:774 calls `filter_server_ids_by_ip_with_info` which returns
a (server_ids, blocked_count) tuple. The unmocked method on AsyncMock returned
a coroutine, causing "cannot unpack non-iterable coroutine object" errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): update realtime guardrail test assertions for voice violation behavior

Tests were asserting no response.create/conversation.item.create sent to
backend when guardrail blocks, but the implementation intentionally sends
these to have the LLM voice the guardrail violation message to the user.

Updated assertions to verify the correct guardrail flow:
- response.cancel is sent to stop any in-progress response
- conversation.item.create with violation message is injected
- response.create is sent to voice the violation
- original blocked content is NOT forwarded

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(bedrock): restore parallel_tool_calls mapping in map_openai_params

The revert in 8565c70 removed the parallel_tool_calls handling from
map_openai_params, and the subsequent fix d0445e1 only re-added the
transform_request consumption but forgot to re-add the map_openai_params
producer that sets _parallel_tool_use_config. This meant parallel_tool_calls
was silently ignored for all Bedrock models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): update Azure pass-through test to mock litellm.completion

Commit 99c62ca removed "azure" from _RESPONSES_API_PROVIDERS,
routing Azure models through litellm.completion instead of
litellm.responses. The test was not updated to match, causing it
to assert against the wrong mock.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add in_flight_requests metric to /health/backlog + prometheus (#22319)

* feat: add in_flight_requests metric to /health/backlog + prometheus

* refactor: clean class with static methods, add tests, fix sentinel pattern

* docs: add in_flight_requests to prometheus metrics and latency troubleshooting

* fix(db): add missing migration for LiteLLM_ClaudeCodePluginTable

PR #22271 added the LiteLLM_ClaudeCodePluginTable model to
schema.prisma but did not include a corresponding migration file,
causing test_aaaasschema_migration_check to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update stale docstring to match guardrail voicing behavior

Addresses Greptile review feedback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(caching): store background task references in LLMClientCache._remove_key to prevent unawaited coroutine warnings

Fixes #22128

* [Feat] Agent RBAC Permission Fix - Ensure Internal Users cannot create agents (#22329)

* fix: enforce RBAC on agent endpoints — block non-admin create/update/delete

- Add /v1/agents/{agent_id} to agent_routes so internal users can
  access GET-by-ID (previously returned 403 due to missing route pattern)
- Add _check_agent_management_permission() guard to POST, PUT, PATCH,
  DELETE agent endpoints — only PROXY_ADMIN may mutate agents
- Add user_api_key_dict param to delete_agent so the role check works
- Add comprehensive unit tests for RBAC enforcement across all roles

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: mock prisma_client in internal user get-agent-by-id test

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* feat(ui): hide agent create/delete controls for non-admin users

Match MCP servers pattern: wrap '+ Add New Agent' button in
isAdmin conditional so internal users see a read-only agents view.
Delete buttons in card and table were already gated.
Update empty-state copy for non-admin users.
Add 7 Vitest tests covering role-based visibility.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: Add PROXY_ADMIN role to system user for key rotation (#21896)

* fix: Add PROXY_ADMIN role to system user for key rotation

The key rotation worker was failing with 'You are not authorized to regenerate this key'
when rotating team keys. This was because the system user created by
get_litellm_internal_jobs_user_api_key_auth() was missing the user_role field.

Without user_role=PROXY_ADMIN, the system user couldn't bypass team permission checks
in can_team_member_execute_key_management_endpoint(), causing authorization failures
for team key rotation.

This fix adds user_role=LitellmUserRoles.PROXY_ADMIN to the system user, allowing
it to bypass team permission checks and successfully rotate keys for all teams.

* test: Add unit test for system user PROXY_ADMIN role

- Verify internal jobs system user has PROXY_ADMIN role
- Critical for key rotation to bypass team permission checks
- Regression test for PR #21896

* fix: populate user_id and user_info for admin users in /user/info (#22239)

* fix: populate user_id and user_info for admin users in /user/info endpoint

Fixes #22179

When admin users call /user/info without a user_id parameter, the endpoint
was returning null for both user_id and user_info fields. This broke
budgeting tooling that relies on /user/info to look up current budget and spend.

Changes:
- Modified _get_user_info_for_proxy_admin() to accept user_api_key_dict parameter
- Added logic to fetch admin's own user info from database
- Updated function to return admin's user_id and user_info instead of null
- Updated unit test to verify admin user_id is populated

The fix ensures admin users get their own user information just like regular users.

* test: make mock get_data signature match real method

- Updated MockPrismaClientDB.get_data() to accept all parameters that the real method accepts
- Makes mock more robust against future refactors
- Added datetime and Union imports
- Mock now returns None when user_id is not provided

* [Fix] Pass MCP auth headers from request into tool fetch for /v1/responses and chat completions (#22291)

* fixed dynamic auth for /responses with mcp

* fixed greptile concern

* fix(bedrock): filter internal json_tool_call when mixed with real tools

Fixes #18381: When using both tools and response_format with Bedrock
Converse API, LiteLLM internally adds json_tool_call to handle structured
output. Bedrock may return both this internal tool AND real user-defined
tools, breaking consumers like OpenAI Agents SDK.

Changes:
- Non-streaming: Added _filter_json_mode_tools() to handle 3 scenarios:
  only json_tool_call (convert to content), mixed (filter it out), or
  no json_tool_call (pass through)
- Streaming: Added json_mode tracking to AWSEventStreamDecoder to suppress
  json_tool_call chunks and convert to text content
- Fixed optional_params.pop() mutation issue

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: extract duplicated JSON unwrapping into helper method

Addresses review comment from greptile-apps:
#21107 (review)

Changes:
- Added `_unwrap_bedrock_properties()` helper method to eliminate code duplication
- Replaced two identical JSON unwrapping blocks (lines 1592-1601 and 1612-1620)
  with calls to the new helper method
- Improves maintainability - single source of truth for Bedrock properties unwrapping logic

The helper method:
- Parses JSON string
- Checks for single "properties" key structure
- Unwraps and returns the properties value
- Returns original string if unwrapping not needed or parsing fails

No functional changes - pure refactoring.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: use correct class name AmazonConverseConfig in helper method calls

Fixed MyPy errors where BedrockConverseConfig was used instead of
AmazonConverseConfig in the _unwrap_bedrock_properties() calls.

Errors:
- Line 1619: BedrockConverseConfig -> AmazonConverseConfig
- Line 1631: BedrockConverseConfig -> AmazonConverseConfig

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: shorten guardrail benchmark result filenames for Windows long path support

Fixes #21941

The generated result filenames from _save_confusion_results contained
parentheses, dots, and full yaml filenames, producing paths that exceed
the Windows 260-char MAX_PATH limit. Rework the safe_label logic to
produce short {topic}_{method_abbrev} filenames (e.g. insults_cf.json)
while preserving the full label inside the JSON content.

Rename existing tracked result files to match the new naming convention.

* Update litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/guardrail_benchmarks/test_eval.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Remove Apache 2 license from SKILL.md (#22322)

* fix(mcp): default available_on_public_internet to true (#22331)

* fix(mcp): default available_on_public_internet to true

MCPs were defaulting to private (available_on_public_internet=false) which
was a breaking change. This reverts the default to public (true) across:
- Pydantic models (AddMCPServerRequest, UpdateMCPServerRequest, LiteLLM_MCPServerTable)
- Prisma schema @default
- mcp_server_manager.py YAML config + DB loading fallbacks
- UI form initialValue and setFieldValue defaults

* fix(ui): add forceRender to Collapse.Panel so toggle defaults render correctly

Ant Design's Collapse.Panel lazy-renders children by default. Without
forceRender, the Form.Item for 'Available on Public Internet' isn't
mounted when the useEffect fires form.setFieldValue, causing the Switch
to visually show OFF even though the intended default is true.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(mcp): update remaining schema copies and MCPServer type default to true

Missed in previous commit per Greptile review:
- schema.prisma (root)
- litellm-proxy-extras/litellm_proxy_extras/schema.prisma
- litellm/types/mcp_server/mcp_server_manager.py MCPServer class

* ui(mcp): reframe network access as 'Internal network only' restriction

Replace scary 'Available on Public Internet' toggle with 'Internal network only'
opt-in restriction. Toggle OFF (default) = all networks allowed. Toggle ON =
restricted to internal network only. Auth is always required either way.

- MCPPermissionManagement: new label/tooltip/description, invert display via
  getValueProps/getValueFromEvent so underlying available_on_public_internet
  value is unchanged
- mcp_server_view: 'Public' → 'All networks', 'Internal' → 'Internal only' (orange)
- mcp_server_columns: same badge updates

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(jwt): OIDC discovery URLs, roles array handling, dot-notation error hints (#22336)

* fix(jwt): support OIDC discovery URLs, handle roles array, improve error hints

Three fixes for Azure AD JWT auth:

1. OIDC discovery URL support - JWT_PUBLIC_KEY_URL can now be set to
   .well-known/openid-configuration endpoints. The proxy fetches the
   discovery doc, extracts jwks_uri, and caches it.

2. Handle roles claim as array - when team_id_jwt_field points to a list
   (e.g. AAD's "roles": ["team1"]), auto-unwrap the first element instead
   of crashing with 'unhashable type: list'.

3. Better error hint for dot-notation indexing - when team_id_jwt_field is
   set to "roles.0" or "roles[0]", the 401 error now explains to use
   "roles" instead and that LiteLLM auto-unwraps lists.

* Add integration demo script for JWT auth fixes (OIDC discovery, array roles, dot-notation hints)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add demo_servers.py for manual JWT auth testing with mock JWKS/OIDC endpoints

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add demo screenshots for PR comment

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add integration test results with screenshots for PR review

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* address greptile review feedback (greploop iteration 1)

- fix: add HTTP status code check in _resolve_jwks_url before parsing JSON
- fix: remove misleading bracket-notation hint from debug log (get_nested_value does not support it)

* Update tests/test_litellm/proxy/auth/test_handle_jwt.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* remove demo scripts and assets

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* perf: streaming latency improvements — 4 targeted hot-path fixes (#22346)

* perf: raise aiohttp connection pool limits (300→1000, 50/host→500)

* perf: skip model_copy() on every chunk — only copy usage-bearing chunks

* perf: replace list+join O(n²) with str+= O(n) in async_data_generator

* perf: cache model-level guardrail lookup per request, not per chunk

* test: add comprehensive Vitest coverage for CostTrackingSettings

Add 88 tests across 9 test files for the CostTrackingSettings component directory:
- provider_display_helpers.test.ts: 9 tests for helper functions
- how_it_works.test.tsx: 9 tests for discount calculator component
- add_provider_form.test.tsx: 7 tests for provider form validation
- add_margin_form.test.tsx: 9 tests for margin form with type toggle
- provider_discount_table.test.tsx: 12 tests for table editing and interactions
- provider_margin_table.test.tsx: 13 tests for margin table with sorting
- use_discount_config.test.ts: 11 tests for discount hook logic
- use_margin_config.test.ts: 12 tests for margin hook logic
- cost_tracking_settings.test.tsx: 15 tests for main component and role-based rendering

All tests passing. Coverage includes form validation, user interactions, API calls, state management, and conditional rendering.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* [Feature] Key list endpoint: Add project_id and access_group_id filters

Add filtering capabilities to /key/list endpoint for project_id and access_group_id parameters. Both filters work globally across all visibility rules and stack with existing sort/pagination params. Added comprehensive unit tests for the new filters.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* [Feature] UI - Projects: Add Project Details page with Edit modal

- Add ProjectDetailsPage with header, details card, spend/budget progress,
  model spend bar chart, keys placeholder, and team info card
- Refactor CreateProjectModal into base form pattern (ProjectBaseForm)
  shared between Create and Edit flows
- Add EditProjectModal with pre-filled form data from backend
- Add useProjectDetails and useUpdateProject hooks
- Add duplicate key validation for model limits and metadata
- Wire project ID click in table to navigate to detail view
- Move pagination inline with search bar

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update ui/litellm-dashboard/src/components/Projects/ProjectModals/CreateProjectModal.tsx

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix(types): filter null fields from reasoning output items in ResponsesAPIResponse

When providers return reasoning items without status/content/encrypted_content,
Pydantic's Optional defaults serialize them as null. This breaks downstream SDKs
(e.g., the OpenAI C# SDK crashes on status=null).

Add a field_serializer on ResponsesAPIResponse.output that removes null
status, content, and encrypted_content from reasoning items during
serialization. This mirrors the request-side filtering already done in
OpenAIResponsesAPIConfig._handle_reasoning_item().

Fixes #16824

---------

Co-authored-by: Zero Clover <zero@root.me>
Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: Dylan Duan <dylan.duan@assemblyai.com>
Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Shivam Rawat <161387515+shivamrawat1@users.noreply.github.com>
Co-authored-by: Brian Caswell <bcaswell@microsoft.com>
Co-authored-by: Brian Caswell <bcaswell@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: rasmi <rrelasmar@gmail.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Sameerlite pushed a commit that referenced this pull request Mar 2, 2026
…on check for GA path (#22369)

* fix(image_generation): propagate extra_headers to OpenAI image generation

Add headers parameter to image_generation() and aimage_generation() methods
in OpenAI provider, and pass headers from images/main.py to ensure custom
headers like cf-aig-authorization are properly forwarded to the OpenAI API.
Aligns behavior with completion() method and Azure provider implementation.

* test(image_generation): add tests for extra_headers propagation

Verify that extra_headers are correctly forwarded to OpenAI's
images.generate() in both sync and async paths, and that they
are absent when not provided.

* Add Prometheus child_exit cleanup for gunicorn workers

When a gunicorn worker exits (e.g. from max_requests recycling), its
per-process prometheus .db files remain on disk. For gauges using
livesum/liveall mode, this means the dead worker's last-known values
persist as if the process were still alive. Wire gunicorn's child_exit
hook to call mark_process_dead() so live-tracking gauges accurately
reflect only running workers.

* docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway (#21130)

* docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway provider config

* feat: add AssemblyAI LLM Gateway as OpenAI-compatible provider

* fix(mcp): update test mocks to use renamed filter_server_ids_by_ip_with_info

Tests were mocking the old method name `filter_server_ids_by_ip` but production
code at server.py:774 calls `filter_server_ids_by_ip_with_info` which returns
a (server_ids, blocked_count) tuple. The unmocked method on AsyncMock returned
a coroutine, causing "cannot unpack non-iterable coroutine object" errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): update realtime guardrail test assertions for voice violation behavior

Tests were asserting no response.create/conversation.item.create sent to
backend when guardrail blocks, but the implementation intentionally sends
these to have the LLM voice the guardrail violation message to the user.

Updated assertions to verify the correct guardrail flow:
- response.cancel is sent to stop any in-progress response
- conversation.item.create with violation message is injected
- response.create is sent to voice the violation
- original blocked content is NOT forwarded

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(bedrock): restore parallel_tool_calls mapping in map_openai_params

The revert in 8565c70 removed the parallel_tool_calls handling from
map_openai_params, and the subsequent fix d0445e1 only re-added the
transform_request consumption but forgot to re-add the map_openai_params
producer that sets _parallel_tool_use_config. This meant parallel_tool_calls
was silently ignored for all Bedrock models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): update Azure pass-through test to mock litellm.completion

Commit 99c62ca removed "azure" from _RESPONSES_API_PROVIDERS,
routing Azure models through litellm.completion instead of
litellm.responses. The test was not updated to match, causing it
to assert against the wrong mock.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add in_flight_requests metric to /health/backlog + prometheus (#22319)

* feat: add in_flight_requests metric to /health/backlog + prometheus

* refactor: clean class with static methods, add tests, fix sentinel pattern

* docs: add in_flight_requests to prometheus metrics and latency troubleshooting

* fix(db): add missing migration for LiteLLM_ClaudeCodePluginTable

PR #22271 added the LiteLLM_ClaudeCodePluginTable model to
schema.prisma but did not include a corresponding migration file,
causing test_aaaasschema_migration_check to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update stale docstring to match guardrail voicing behavior

Addresses Greptile review feedback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(caching): store background task references in LLMClientCache._remove_key to prevent unawaited coroutine warnings

Fixes #22128

* [Feat] Agent RBAC Permission Fix - Ensure Internal Users cannot create agents (#22329)

* fix: enforce RBAC on agent endpoints — block non-admin create/update/delete

- Add /v1/agents/{agent_id} to agent_routes so internal users can
  access GET-by-ID (previously returned 403 due to missing route pattern)
- Add _check_agent_management_permission() guard to POST, PUT, PATCH,
  DELETE agent endpoints — only PROXY_ADMIN may mutate agents
- Add user_api_key_dict param to delete_agent so the role check works
- Add comprehensive unit tests for RBAC enforcement across all roles

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: mock prisma_client in internal user get-agent-by-id test

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* feat(ui): hide agent create/delete controls for non-admin users

Match MCP servers pattern: wrap '+ Add New Agent' button in
isAdmin conditional so internal users see a read-only agents view.
Delete buttons in card and table were already gated.
Update empty-state copy for non-admin users.
Add 7 Vitest tests covering role-based visibility.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: Add PROXY_ADMIN role to system user for key rotation (#21896)

* fix: Add PROXY_ADMIN role to system user for key rotation

The key rotation worker was failing with 'You are not authorized to regenerate this key'
when rotating team keys. This was because the system user created by
get_litellm_internal_jobs_user_api_key_auth() was missing the user_role field.

Without user_role=PROXY_ADMIN, the system user couldn't bypass team permission checks
in can_team_member_execute_key_management_endpoint(), causing authorization failures
for team key rotation.

This fix adds user_role=LitellmUserRoles.PROXY_ADMIN to the system user, allowing
it to bypass team permission checks and successfully rotate keys for all teams.

* test: Add unit test for system user PROXY_ADMIN role

- Verify internal jobs system user has PROXY_ADMIN role
- Critical for key rotation to bypass team permission checks
- Regression test for PR #21896

* fix: populate user_id and user_info for admin users in /user/info (#22239)

* fix: populate user_id and user_info for admin users in /user/info endpoint

Fixes #22179

When admin users call /user/info without a user_id parameter, the endpoint
was returning null for both user_id and user_info fields. This broke
budgeting tooling that relies on /user/info to look up current budget and spend.

Changes:
- Modified _get_user_info_for_proxy_admin() to accept user_api_key_dict parameter
- Added logic to fetch admin's own user info from database
- Updated function to return admin's user_id and user_info instead of null
- Updated unit test to verify admin user_id is populated

The fix ensures admin users get their own user information just like regular users.

* test: make mock get_data signature match real method

- Updated MockPrismaClientDB.get_data() to accept all parameters that the real method accepts
- Makes mock more robust against future refactors
- Added datetime and Union imports
- Mock now returns None when user_id is not provided

* [Fix] Pass MCP auth headers from request into tool fetch for /v1/responses and chat completions (#22291)

* fixed dynamic auth for /responses with mcp

* fixed greptile concern

* fix(bedrock): filter internal json_tool_call when mixed with real tools

Fixes #18381: When using both tools and response_format with Bedrock
Converse API, LiteLLM internally adds json_tool_call to handle structured
output. Bedrock may return both this internal tool AND real user-defined
tools, breaking consumers like OpenAI Agents SDK.

Changes:
- Non-streaming: Added _filter_json_mode_tools() to handle 3 scenarios:
  only json_tool_call (convert to content), mixed (filter it out), or
  no json_tool_call (pass through)
- Streaming: Added json_mode tracking to AWSEventStreamDecoder to suppress
  json_tool_call chunks and convert to text content
- Fixed optional_params.pop() mutation issue

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: extract duplicated JSON unwrapping into helper method

Addresses review comment from greptile-apps:
#21107 (review)

Changes:
- Added `_unwrap_bedrock_properties()` helper method to eliminate code duplication
- Replaced two identical JSON unwrapping blocks (lines 1592-1601 and 1612-1620)
  with calls to the new helper method
- Improves maintainability - single source of truth for Bedrock properties unwrapping logic

The helper method:
- Parses JSON string
- Checks for single "properties" key structure
- Unwraps and returns the properties value
- Returns original string if unwrapping not needed or parsing fails

No functional changes - pure refactoring.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: use correct class name AmazonConverseConfig in helper method calls

Fixed MyPy errors where BedrockConverseConfig was used instead of
AmazonConverseConfig in the _unwrap_bedrock_properties() calls.

Errors:
- Line 1619: BedrockConverseConfig -> AmazonConverseConfig
- Line 1631: BedrockConverseConfig -> AmazonConverseConfig

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: shorten guardrail benchmark result filenames for Windows long path support

Fixes #21941

The generated result filenames from _save_confusion_results contained
parentheses, dots, and full yaml filenames, producing paths that exceed
the Windows 260-char MAX_PATH limit. Rework the safe_label logic to
produce short {topic}_{method_abbrev} filenames (e.g. insults_cf.json)
while preserving the full label inside the JSON content.

Rename existing tracked result files to match the new naming convention.

* Update litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/guardrail_benchmarks/test_eval.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Remove Apache 2 license from SKILL.md (#22322)

* fix(mcp): default available_on_public_internet to true (#22331)

* fix(mcp): default available_on_public_internet to true

MCPs were defaulting to private (available_on_public_internet=false) which
was a breaking change. This reverts the default to public (true) across:
- Pydantic models (AddMCPServerRequest, UpdateMCPServerRequest, LiteLLM_MCPServerTable)
- Prisma schema @default
- mcp_server_manager.py YAML config + DB loading fallbacks
- UI form initialValue and setFieldValue defaults

* fix(ui): add forceRender to Collapse.Panel so toggle defaults render correctly

Ant Design's Collapse.Panel lazy-renders children by default. Without
forceRender, the Form.Item for 'Available on Public Internet' isn't
mounted when the useEffect fires form.setFieldValue, causing the Switch
to visually show OFF even though the intended default is true.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(mcp): update remaining schema copies and MCPServer type default to true

Missed in previous commit per Greptile review:
- schema.prisma (root)
- litellm-proxy-extras/litellm_proxy_extras/schema.prisma
- litellm/types/mcp_server/mcp_server_manager.py MCPServer class

* ui(mcp): reframe network access as 'Internal network only' restriction

Replace scary 'Available on Public Internet' toggle with 'Internal network only'
opt-in restriction. Toggle OFF (default) = all networks allowed. Toggle ON =
restricted to internal network only. Auth is always required either way.

- MCPPermissionManagement: new label/tooltip/description, invert display via
  getValueProps/getValueFromEvent so underlying available_on_public_internet
  value is unchanged
- mcp_server_view: 'Public' → 'All networks', 'Internal' → 'Internal only' (orange)
- mcp_server_columns: same badge updates

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(jwt): OIDC discovery URLs, roles array handling, dot-notation error hints (#22336)

* fix(jwt): support OIDC discovery URLs, handle roles array, improve error hints

Three fixes for Azure AD JWT auth:

1. OIDC discovery URL support - JWT_PUBLIC_KEY_URL can now be set to
   .well-known/openid-configuration endpoints. The proxy fetches the
   discovery doc, extracts jwks_uri, and caches it.

2. Handle roles claim as array - when team_id_jwt_field points to a list
   (e.g. AAD's "roles": ["team1"]), auto-unwrap the first element instead
   of crashing with 'unhashable type: list'.

3. Better error hint for dot-notation indexing - when team_id_jwt_field is
   set to "roles.0" or "roles[0]", the 401 error now explains to use
   "roles" instead and that LiteLLM auto-unwraps lists.

* Add integration demo script for JWT auth fixes (OIDC discovery, array roles, dot-notation hints)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add demo_servers.py for manual JWT auth testing with mock JWKS/OIDC endpoints

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add demo screenshots for PR comment

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add integration test results with screenshots for PR review

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* address greptile review feedback (greploop iteration 1)

- fix: add HTTP status code check in _resolve_jwks_url before parsing JSON
- fix: remove misleading bracket-notation hint from debug log (get_nested_value does not support it)

* Update tests/test_litellm/proxy/auth/test_handle_jwt.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* remove demo scripts and assets

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* perf: streaming latency improvements — 4 targeted hot-path fixes (#22346)

* perf: raise aiohttp connection pool limits (300→1000, 50/host→500)

* perf: skip model_copy() on every chunk — only copy usage-bearing chunks

* perf: replace list+join O(n²) with str+= O(n) in async_data_generator

* perf: cache model-level guardrail lookup per request, not per chunk

* test: add comprehensive Vitest coverage for CostTrackingSettings

Add 88 tests across 9 test files for the CostTrackingSettings component directory:
- provider_display_helpers.test.ts: 9 tests for helper functions
- how_it_works.test.tsx: 9 tests for discount calculator component
- add_provider_form.test.tsx: 7 tests for provider form validation
- add_margin_form.test.tsx: 9 tests for margin form with type toggle
- provider_discount_table.test.tsx: 12 tests for table editing and interactions
- provider_margin_table.test.tsx: 13 tests for margin table with sorting
- use_discount_config.test.ts: 11 tests for discount hook logic
- use_margin_config.test.ts: 12 tests for margin hook logic
- cost_tracking_settings.test.tsx: 15 tests for main component and role-based rendering

All tests passing. Coverage includes form validation, user interactions, API calls, state management, and conditional rendering.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* [Feature] Key list endpoint: Add project_id and access_group_id filters

Add filtering capabilities to /key/list endpoint for project_id and access_group_id parameters. Both filters work globally across all visibility rules and stack with existing sort/pagination params. Added comprehensive unit tests for the new filters.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* [Feature] UI - Projects: Add Project Details page with Edit modal

- Add ProjectDetailsPage with header, details card, spend/budget progress,
  model spend bar chart, keys placeholder, and team info card
- Refactor CreateProjectModal into base form pattern (ProjectBaseForm)
  shared between Create and Edit flows
- Add EditProjectModal with pre-filled form data from backend
- Add useProjectDetails and useUpdateProject hooks
- Add duplicate key validation for model limits and metadata
- Wire project ID click in table to navigate to detail view
- Move pagination inline with search bar

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update ui/litellm-dashboard/src/components/Projects/ProjectModals/CreateProjectModal.tsx

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix(azure): forward realtime_protocol from config and relax api_version check for GA path

The realtime_protocol parameter set in config.yaml litellm_params was
not reliably reaching the Azure realtime handler. Add fallback chain:
kwargs → litellm_params → LITELLM_AZURE_REALTIME_PROTOCOL env var → beta.

Also relax the api_version validation to only require it for the beta
protocol path, since the GA/v1 path does not use api_version in the URL.

Make protocol matching case-insensitive so 'ga', 'GA', 'v1', 'V1' all
work consistently. Fix _construct_url type signature to accept Optional
api_version.

Fixes #22127

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Zero Clover <zero@root.me>
Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: Dylan Duan <dylan.duan@assemblyai.com>
Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Shivaang <shivaang.05@gmail.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Shivam Rawat <161387515+shivamrawat1@users.noreply.github.com>
Co-authored-by: Brian Caswell <bcaswell@microsoft.com>
Co-authored-by: Brian Caswell <bcaswell@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: rasmi <rrelasmar@gmail.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sameerlite pushed a commit that referenced this pull request Mar 2, 2026
* fix(image_generation): propagate extra_headers to OpenAI image generation

Add headers parameter to image_generation() and aimage_generation() methods
in OpenAI provider, and pass headers from images/main.py to ensure custom
headers like cf-aig-authorization are properly forwarded to the OpenAI API.
Aligns behavior with completion() method and Azure provider implementation.

* test(image_generation): add tests for extra_headers propagation

Verify that extra_headers are correctly forwarded to OpenAI's
images.generate() in both sync and async paths, and that they
are absent when not provided.

* Add Prometheus child_exit cleanup for gunicorn workers

When a gunicorn worker exits (e.g. from max_requests recycling), its
per-process prometheus .db files remain on disk. For gauges using
livesum/liveall mode, this means the dead worker's last-known values
persist as if the process were still alive. Wire gunicorn's child_exit
hook to call mark_process_dead() so live-tracking gauges accurately
reflect only running workers.

* docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway (#21130)

* docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway provider config

* feat: add AssemblyAI LLM Gateway as OpenAI-compatible provider

* fix(mcp): update test mocks to use renamed filter_server_ids_by_ip_with_info

Tests were mocking the old method name `filter_server_ids_by_ip` but production
code at server.py:774 calls `filter_server_ids_by_ip_with_info` which returns
a (server_ids, blocked_count) tuple. The unmocked method on AsyncMock returned
a coroutine, causing "cannot unpack non-iterable coroutine object" errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): update realtime guardrail test assertions for voice violation behavior

Tests were asserting no response.create/conversation.item.create sent to
backend when guardrail blocks, but the implementation intentionally sends
these to have the LLM voice the guardrail violation message to the user.

Updated assertions to verify the correct guardrail flow:
- response.cancel is sent to stop any in-progress response
- conversation.item.create with violation message is injected
- response.create is sent to voice the violation
- original blocked content is NOT forwarded

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(bedrock): restore parallel_tool_calls mapping in map_openai_params

The revert in 8565c70 removed the parallel_tool_calls handling from
map_openai_params, and the subsequent fix d0445e1 only re-added the
transform_request consumption but forgot to re-add the map_openai_params
producer that sets _parallel_tool_use_config. This meant parallel_tool_calls
was silently ignored for all Bedrock models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): update Azure pass-through test to mock litellm.completion

Commit 99c62ca removed "azure" from _RESPONSES_API_PROVIDERS,
routing Azure models through litellm.completion instead of
litellm.responses. The test was not updated to match, causing it
to assert against the wrong mock.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add in_flight_requests metric to /health/backlog + prometheus (#22319)

* feat: add in_flight_requests metric to /health/backlog + prometheus

* refactor: clean class with static methods, add tests, fix sentinel pattern

* docs: add in_flight_requests to prometheus metrics and latency troubleshooting

* fix(db): add missing migration for LiteLLM_ClaudeCodePluginTable

PR #22271 added the LiteLLM_ClaudeCodePluginTable model to
schema.prisma but did not include a corresponding migration file,
causing test_aaaasschema_migration_check to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update stale docstring to match guardrail voicing behavior

Addresses Greptile review feedback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(caching): store background task references in LLMClientCache._remove_key to prevent unawaited coroutine warnings

Fixes #22128

* [Feat] Agent RBAC Permission Fix - Ensure Internal Users cannot create agents (#22329)

* fix: enforce RBAC on agent endpoints — block non-admin create/update/delete

- Add /v1/agents/{agent_id} to agent_routes so internal users can
  access GET-by-ID (previously returned 403 due to missing route pattern)
- Add _check_agent_management_permission() guard to POST, PUT, PATCH,
  DELETE agent endpoints — only PROXY_ADMIN may mutate agents
- Add user_api_key_dict param to delete_agent so the role check works
- Add comprehensive unit tests for RBAC enforcement across all roles

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: mock prisma_client in internal user get-agent-by-id test

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* feat(ui): hide agent create/delete controls for non-admin users

Match MCP servers pattern: wrap '+ Add New Agent' button in
isAdmin conditional so internal users see a read-only agents view.
Delete buttons in card and table were already gated.
Update empty-state copy for non-admin users.
Add 7 Vitest tests covering role-based visibility.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: Add PROXY_ADMIN role to system user for key rotation (#21896)

* fix: Add PROXY_ADMIN role to system user for key rotation

The key rotation worker was failing with 'You are not authorized to regenerate this key'
when rotating team keys. This was because the system user created by
get_litellm_internal_jobs_user_api_key_auth() was missing the user_role field.

Without user_role=PROXY_ADMIN, the system user couldn't bypass team permission checks
in can_team_member_execute_key_management_endpoint(), causing authorization failures
for team key rotation.

This fix adds user_role=LitellmUserRoles.PROXY_ADMIN to the system user, allowing
it to bypass team permission checks and successfully rotate keys for all teams.

* test: Add unit test for system user PROXY_ADMIN role

- Verify internal jobs system user has PROXY_ADMIN role
- Critical for key rotation to bypass team permission checks
- Regression test for PR #21896

* fix: populate user_id and user_info for admin users in /user/info (#22239)

* fix: populate user_id and user_info for admin users in /user/info endpoint

Fixes #22179

When admin users call /user/info without a user_id parameter, the endpoint
was returning null for both user_id and user_info fields. This broke
budgeting tooling that relies on /user/info to look up current budget and spend.

Changes:
- Modified _get_user_info_for_proxy_admin() to accept user_api_key_dict parameter
- Added logic to fetch admin's own user info from database
- Updated function to return admin's user_id and user_info instead of null
- Updated unit test to verify admin user_id is populated

The fix ensures admin users get their own user information just like regular users.

* test: make mock get_data signature match real method

- Updated MockPrismaClientDB.get_data() to accept all parameters that the real method accepts
- Makes mock more robust against future refactors
- Added datetime and Union imports
- Mock now returns None when user_id is not provided

* [Fix] Pass MCP auth headers from request into tool fetch for /v1/responses and chat completions (#22291)

* fixed dynamic auth for /responses with mcp

* fixed greptile concern

* fix(bedrock): filter internal json_tool_call when mixed with real tools

Fixes #18381: When using both tools and response_format with Bedrock
Converse API, LiteLLM internally adds json_tool_call to handle structured
output. Bedrock may return both this internal tool AND real user-defined
tools, breaking consumers like OpenAI Agents SDK.

Changes:
- Non-streaming: Added _filter_json_mode_tools() to handle 3 scenarios:
  only json_tool_call (convert to content), mixed (filter it out), or
  no json_tool_call (pass through)
- Streaming: Added json_mode tracking to AWSEventStreamDecoder to suppress
  json_tool_call chunks and convert to text content
- Fixed optional_params.pop() mutation issue

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: extract duplicated JSON unwrapping into helper method

Addresses review comment from greptile-apps:
#21107 (review)

Changes:
- Added `_unwrap_bedrock_properties()` helper method to eliminate code duplication
- Replaced two identical JSON unwrapping blocks (lines 1592-1601 and 1612-1620)
  with calls to the new helper method
- Improves maintainability - single source of truth for Bedrock properties unwrapping logic

The helper method:
- Parses JSON string
- Checks for single "properties" key structure
- Unwraps and returns the properties value
- Returns original string if unwrapping not needed or parsing fails

No functional changes - pure refactoring.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: use correct class name AmazonConverseConfig in helper method calls

Fixed MyPy errors where BedrockConverseConfig was used instead of
AmazonConverseConfig in the _unwrap_bedrock_properties() calls.

Errors:
- Line 1619: BedrockConverseConfig -> AmazonConverseConfig
- Line 1631: BedrockConverseConfig -> AmazonConverseConfig

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: shorten guardrail benchmark result filenames for Windows long path support

Fixes #21941

The generated result filenames from _save_confusion_results contained
parentheses, dots, and full yaml filenames, producing paths that exceed
the Windows 260-char MAX_PATH limit. Rework the safe_label logic to
produce short {topic}_{method_abbrev} filenames (e.g. insults_cf.json)
while preserving the full label inside the JSON content.

Rename existing tracked result files to match the new naming convention.

* Update litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/guardrail_benchmarks/test_eval.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Remove Apache 2 license from SKILL.md (#22322)

* fix(mcp): default available_on_public_internet to true (#22331)

* fix(mcp): default available_on_public_internet to true

MCPs were defaulting to private (available_on_public_internet=false) which
was a breaking change. This reverts the default to public (true) across:
- Pydantic models (AddMCPServerRequest, UpdateMCPServerRequest, LiteLLM_MCPServerTable)
- Prisma schema @default
- mcp_server_manager.py YAML config + DB loading fallbacks
- UI form initialValue and setFieldValue defaults

* fix(ui): add forceRender to Collapse.Panel so toggle defaults render correctly

Ant Design's Collapse.Panel lazy-renders children by default. Without
forceRender, the Form.Item for 'Available on Public Internet' isn't
mounted when the useEffect fires form.setFieldValue, causing the Switch
to visually show OFF even though the intended default is true.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(mcp): update remaining schema copies and MCPServer type default to true

Missed in previous commit per Greptile review:
- schema.prisma (root)
- litellm-proxy-extras/litellm_proxy_extras/schema.prisma
- litellm/types/mcp_server/mcp_server_manager.py MCPServer class

* ui(mcp): reframe network access as 'Internal network only' restriction

Replace scary 'Available on Public Internet' toggle with 'Internal network only'
opt-in restriction. Toggle OFF (default) = all networks allowed. Toggle ON =
restricted to internal network only. Auth is always required either way.

- MCPPermissionManagement: new label/tooltip/description, invert display via
  getValueProps/getValueFromEvent so underlying available_on_public_internet
  value is unchanged
- mcp_server_view: 'Public' → 'All networks', 'Internal' → 'Internal only' (orange)
- mcp_server_columns: same badge updates

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(jwt): OIDC discovery URLs, roles array handling, dot-notation error hints (#22336)

* fix(jwt): support OIDC discovery URLs, handle roles array, improve error hints

Three fixes for Azure AD JWT auth:

1. OIDC discovery URL support - JWT_PUBLIC_KEY_URL can now be set to
   .well-known/openid-configuration endpoints. The proxy fetches the
   discovery doc, extracts jwks_uri, and caches it.

2. Handle roles claim as array - when team_id_jwt_field points to a list
   (e.g. AAD's "roles": ["team1"]), auto-unwrap the first element instead
   of crashing with 'unhashable type: list'.

3. Better error hint for dot-notation indexing - when team_id_jwt_field is
   set to "roles.0" or "roles[0]", the 401 error now explains to use
   "roles" instead and that LiteLLM auto-unwraps lists.

* Add integration demo script for JWT auth fixes (OIDC discovery, array roles, dot-notation hints)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add demo_servers.py for manual JWT auth testing with mock JWKS/OIDC endpoints

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add demo screenshots for PR comment

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add integration test results with screenshots for PR review

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* address greptile review feedback (greploop iteration 1)

- fix: add HTTP status code check in _resolve_jwks_url before parsing JSON
- fix: remove misleading bracket-notation hint from debug log (get_nested_value does not support it)

* Update tests/test_litellm/proxy/auth/test_handle_jwt.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* remove demo scripts and assets

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* perf: streaming latency improvements — 4 targeted hot-path fixes (#22346)

* perf: raise aiohttp connection pool limits (300→1000, 50/host→500)

* perf: skip model_copy() on every chunk — only copy usage-bearing chunks

* perf: replace list+join O(n²) with str+= O(n) in async_data_generator

* perf: cache model-level guardrail lookup per request, not per chunk

* test: add comprehensive Vitest coverage for CostTrackingSettings

Add 88 tests across 9 test files for the CostTrackingSettings component directory:
- provider_display_helpers.test.ts: 9 tests for helper functions
- how_it_works.test.tsx: 9 tests for discount calculator component
- add_provider_form.test.tsx: 7 tests for provider form validation
- add_margin_form.test.tsx: 9 tests for margin form with type toggle
- provider_discount_table.test.tsx: 12 tests for table editing and interactions
- provider_margin_table.test.tsx: 13 tests for margin table with sorting
- use_discount_config.test.ts: 11 tests for discount hook logic
- use_margin_config.test.ts: 12 tests for margin hook logic
- cost_tracking_settings.test.tsx: 15 tests for main component and role-based rendering

All tests passing. Coverage includes form validation, user interactions, API calls, state management, and conditional rendering.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* [Feature] Key list endpoint: Add project_id and access_group_id filters

Add filtering capabilities to /key/list endpoint for project_id and access_group_id parameters. Both filters work globally across all visibility rules and stack with existing sort/pagination params. Added comprehensive unit tests for the new filters.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* [Feature] UI - Projects: Add Project Details page with Edit modal

- Add ProjectDetailsPage with header, details card, spend/budget progress,
  model spend bar chart, keys placeholder, and team info card
- Refactor CreateProjectModal into base form pattern (ProjectBaseForm)
  shared between Create and Edit flows
- Add EditProjectModal with pre-filled form data from backend
- Add useProjectDetails and useUpdateProject hooks
- Add duplicate key validation for model limits and metadata
- Wire project ID click in table to navigate to detail view
- Move pagination inline with search bar

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update ui/litellm-dashboard/src/components/Projects/ProjectModals/CreateProjectModal.tsx

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix(anthropic): handle OAuth tokens in count_tokens endpoint

The count_tokens API's get_required_headers() always set x-api-key,
which is incorrect for OAuth tokens (sk-ant-oat*). These tokens must
use Authorization: Bearer instead.

Changes:
- Add optionally_handle_anthropic_oauth() call in get_required_headers()
  to convert OAuth tokens from x-api-key to Authorization: Bearer
- Add _merge_beta_headers() helper to preserve existing anthropic-beta
  values (e.g. token-counting) when appending the OAuth beta header
- Add 7 tests covering regular and OAuth header generation

Fixes #22040

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Zero Clover <zero@root.me>
Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: Dylan Duan <dylan.duan@assemblyai.com>
Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Shivaang <shivaang.05@gmail.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Shivam Rawat <161387515+shivamrawat1@users.noreply.github.com>
Co-authored-by: Brian Caswell <bcaswell@microsoft.com>
Co-authored-by: Brian Caswell <bcaswell@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: rasmi <rrelasmar@gmail.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants