fix: make db migration failure exit opt-in via --enforce_prisma_migration_check #23675

Merged
2 commits merged into main from
krrishdholakia/db-exit-opt-in
Mar 16, 2026

Conversation


@ghost ghost commented Mar 15, 2026

Summary

  • Database migration failures now warn and continue by default instead of calling sys.exit(1).
  • Added --enforce_prisma_migration_check CLI flag (or ENFORCE_PRISMA_MIGRATION_CHECK=true env var) to opt into exiting on migration failure.
  • Fixed two pre-existing pyright errors: reportArgumentType on get_secret return type and reportPossiblyUnboundVariable on litellm_settings.

Pre-Submission checklist

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement (see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🐛 Bug Fix

Changes

  • Replaced --skip_db_migration_check with --enforce_prisma_migration_check in proxy_cli.py
  • Default: warn and continue on db migration failure
  • Opt-in: set --enforce_prisma_migration_check to exit on failure
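
The opt-in semantics listed above can be sketched roughly as follows. This is a minimal illustration built on the standard-library argparse, not the proxy's actual CLI code. The helper names `resolve_enforce_flag` and `on_migration_failure` are hypothetical, and treating only the string "true" (case-insensitive) as enabling the env var is an assumption.

```python
import argparse
import os
import sys


def resolve_enforce_flag(argv, environ=None):
    """Return True only if strict enforcement was explicitly requested."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # Hypothetical stand-in for the proxy's real option definition.
    parser.add_argument("--enforce_prisma_migration_check", action="store_true")
    args, _unknown = parser.parse_known_args(argv)
    env_value = environ.get("ENFORCE_PRISMA_MIGRATION_CHECK", "")
    # Assumption: only the literal string "true" enables the env var.
    return args.enforce_prisma_migration_check or env_value.strip().lower() == "true"


def on_migration_failure(enforce):
    """Exit when enforcement is on; otherwise warn and let startup continue."""
    if enforce:
        print("Database setup failed; exiting.", file=sys.stderr)
        sys.exit(1)
    print("Database migration failed but continuing startup.", file=sys.stderr)
    return "continued"
```

With an empty argument list and an empty environment mapping, `resolve_enforce_flag` returns False, which corresponds to the new warn-and-continue default.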

🤖 Generated with Claude Code

… proxy_cli

- Clarify --skip_db_migration_check messaging so users know how to opt
  into warn-and-continue behavior when database setup fails
- Fix pyright reportArgumentType error by casting get_secret result to str
- Fix pyright reportPossiblyUnboundVariable by initializing litellm_settings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

vercel bot commented Mar 15, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
litellm | Ready | Preview, Comment | Mar 15, 2026 1:56am

Contributor

greptile-apps bot commented Mar 15, 2026

Greptile Summary

This PR renames the --skip_db_migration_check CLI flag to --enforce_prisma_migration_check and fixes a misleading error message that told users to pass a flag they had already passed. It also fixes two pre-existing pyright errors: a type coercion on the get_secret return value and an unbound litellm_settings variable.

Key issues:

  • Breaking default behavior change: The most critical problem is that the default behavior has been silently inverted. Previously, sys.exit(1) was the default (safe) outcome when DB setup failed; passing --skip_db_migration_check was the opt-in to continue with a warning. After this PR, the default is now "warn and continue" — meaning every deployment that omitted the flag (the majority of users) will now no longer have the proxy exit on DB failure. This is a backwards-incompatible regression that violates the project's policy of not changing default behavior without a feature flag.
  • Dropped env var / CLI flag without deprecation: SKIP_DB_MIGRATION_CHECK and --skip_db_migration_check are silently removed. Operators with these flags in Dockerfiles, Helm values, or startup scripts will get the new permissive default instead of their intended opt-in, with no error or deprecation warning.
  • PR description does not match actual changes: The description says "when database setup fails, the proxy exits with sys.exit(1) by default" and mentions --skip_db_migration_check throughout, but the code has replaced that flag and changed the default to the opposite behavior.
  • No tests added: The pre-submission checklist's testing item remains unchecked; the new code path (warn and continue as default) and the strict-exit opt-in path lack unit-test coverage.
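
One way to address the deprecation concern raised in the list above is a shim that still recognizes the old variable and emits a warning. The following is a minimal sketch under stated assumptions, not what the merged PR does: `resolve_enforcement` is a hypothetical helper, and the warning text is illustrative only.

```python
import warnings


def resolve_enforcement(environ):
    """Map the old and new env vars onto a single 'enforce' boolean."""

    def is_true(name):
        return environ.get(name, "").strip().lower() == "true"

    if "SKIP_DB_MIGRATION_CHECK" in environ:
        # Hypothetical shim: the merged PR drops this variable without a warning.
        warnings.warn(
            "SKIP_DB_MIGRATION_CHECK is deprecated; warn-and-continue is now the "
            "default. Use ENFORCE_PRISMA_MIGRATION_CHECK=true to exit on failure.",
            DeprecationWarning,
            stacklevel=2,
        )
    return is_true("ENFORCE_PRISMA_MIGRATION_CHECK")
```

The old variable no longer changes behavior in this sketch; it only makes the removal visible to operators who still set it.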

Confidence Score: 1/5

  • Not safe to merge — the default behavior change silently makes DB-failure handling permissive for all existing deployments that omit the flag.
  • The core intent of the PR (fixing a confusing error message) is sound, but the implementation inverts the default behavior from "exit on failure" to "warn and continue". This is a backwards-incompatible change that could cause production deployments to silently start without a database. The old flag and env var are also dropped without a deprecation path.
  • litellm/proxy/proxy_cli.py — specifically lines 559–565 (flag definition) and 868–881 (the branching logic that changed the default).

Important Files Changed

Filename | Overview
litellm/proxy/proxy_cli.py | Renames --skip_db_migration_check to --enforce_prisma_migration_check and inverts the default behavior from "exit on failure" to "warn and continue", introducing a backwards-incompatible behavioral change. Also fixes a pyright type error with str(database_url) and initializes litellm_settings = None to prevent an unbound variable warning.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Proxy startup] --> B{DATABASE_URL set and prisma found?}
    B -- No --> C[Print warning: prisma not found]
    B -- Yes --> D{should_update_prisma_schema?}
    D -- False --> E[check_prisma_schema_diff]
    D -- True --> F[PrismaManager.setup_database]
    F --> G{Setup succeeded?}
    G -- Yes --> H[Continue startup]
    G -- No --> I{enforce_prisma_migration_check set?}
    I -- True opt-in --> J[Print error and sys.exit 1]
    I -- False default --> K[Print warning and continue startup]

    subgraph NEW[NEW default after this PR]
        K
    end

    subgraph OLD[OLD default before this PR]
        OB[Print error and sys.exit 1 was the default]
    end

    style J fill:#ff4444,color:#fff
    style K fill:#ffcc00,color:#000
    style OB fill:#ff4444,color:#fff

Last reviewed commit: a53ecd5

Comment on lines +834 to +836
modified_url = append_query_params(
str(database_url) if database_url else None, params
)

str() coercion may silently corrupt URL

get_secret has no explicit return type annotation; pyright infers it can return str | bool | None. The guard str(database_url) if database_url else None resolves the type error, but if the secrets manager ever returns bytes (e.g. from an AWS Secrets Manager binary secret), str(b"postgresql://...") produces the literal string "b'postgresql://...'", which is an invalid URL that would silently be set as DATABASE_URL and break the DB connection.

A safer fix that also satisfies the type checker would be:

database_url = get_secret("DATABASE_URL", default_value=None)
if isinstance(database_url, bytes):
    database_url = database_url.decode("utf-8")
modified_url = append_query_params(
    database_url if isinstance(database_url, str) else None, params
)

This is a low-probability edge case, but worth guarding against explicitly since DATABASE_URL is security-critical.
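
To make the failure mode concrete, here is a small sketch contrasting `str()` on a bytes value with explicit decoding. `normalize_secret` is a hypothetical helper, not part of LiteLLM, and it assumes binary secrets hold UTF-8 text.

```python
from typing import Optional, Union


def normalize_secret(value: Union[str, bytes, bool, None]) -> Optional[str]:
    """Return the secret as a str, or None if it is not usable as a URL."""
    if isinstance(value, bytes):
        # Assumption: binary secrets are UTF-8 encoded text.
        return value.decode("utf-8")
    if isinstance(value, str) and value:
        return value
    # bool, None, and empty strings are treated as "not set".
    return None


raw = b"postgresql://user@host/db"
print(str(raw))               # the repr, including the b'...' wrapper
print(normalize_secret(raw))  # the decoded URL
```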

Comment on lines 877 to 882
 print(  # noqa
     "\033[1;31mLiteLLM Proxy: Database setup failed after multiple retries. "
-    "The proxy cannot start safely. Please check your database connection and migration status.\033[0m"
+    "The proxy cannot start safely. Please check your database connection and migration status. "
+    "Pass --skip_db_migration_check or set SKIP_DB_MIGRATION_CHECK=true to warn and continue instead of exiting.\033[0m"
 )
 sys.exit(1)

Missing test for the new error path

The PR description's own pre-submission checklist notes "Adding at least 1 test is a hard requirement," but the tests checkbox is unchecked and no test was added. The new sys.exit(1) code path (when skip_db_migration_check is False and setup fails) and the updated warning branch (when skip_db_migration_check is True) both lack unit-test coverage. Without tests, a future refactor could silently regress the corrected message or remove the sys.exit(1) call.

Consider adding tests in tests/test_litellm/ that mock PrismaManager.setup_database to return False and assert:

  1. The correct exit message is printed and sys.exit(1) is called when skip_db_migration_check=False.
  2. The correct warning message is printed and startup continues when skip_db_migration_check=True.
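
The two assertions suggested above could look roughly like this. It is only a sketch: `FakePrismaManager` and `start_with_db_check` are hypothetical stand-ins that model the branch as it appears in the reviewed snippet, and a real test would patch the actual `PrismaManager.setup_database` and drive `run_server` instead.

```python
import sys
from unittest import mock


class FakePrismaManager:
    """Hypothetical stand-in for PrismaManager; only setup_database is modeled."""

    @staticmethod
    def setup_database(use_migrate=True):
        return True


def start_with_db_check(skip_db_migration_check):
    """Simplified model of the startup branch in the reviewed snippet."""
    if not FakePrismaManager.setup_database(use_migrate=True):
        if skip_db_migration_check:
            print("warning: database migration failed, continuing startup")
            return "started"
        print("error: database setup failed")
        sys.exit(1)
    return "started"


def test_exits_by_default_when_setup_fails():
    with mock.patch.object(FakePrismaManager, "setup_database", return_value=False):
        try:
            start_with_db_check(skip_db_migration_check=False)
        except SystemExit as exc:
            assert exc.code == 1
        else:
            raise AssertionError("expected SystemExit")


def test_continues_when_skip_flag_set():
    with mock.patch.object(FakePrismaManager, "setup_database", return_value=False):
        assert start_with_db_check(skip_db_migration_check=True) == "started"
```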

Rule Used: What: Ensure that any PR claiming to fix an issue ... (source)

…n_check

Flip the default behavior: database migration failures now warn and
continue by default. Only when --enforce_prisma_migration_check (or
ENFORCE_PRISMA_MIGRATION_CHECK=true) is explicitly set will the proxy
exit on migration failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

codspeed-hq bot commented Mar 15, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing krrishdholakia/db-exit-opt-in (a53ecd5) with main (be20a8a)

Comment on lines 559 to +881
@@ -602,7 +602,7 @@ def run_server( # noqa: PLR0915
     skip_server_startup,
     keepalive_timeout,
     max_requests_before_restart,
-    skip_db_migration_check: bool,
+    enforce_prisma_migration_check: bool,
 ):
     args = locals()
     if local:
@@ -716,6 +716,7 @@ def run_server( # noqa: PLR0915
 for k, v in new_env_var.items():
     os.environ[k] = v

+litellm_settings = None
 if config is not None:
     """
     Allow user to pass in db url via config
@@ -830,7 +831,9 @@ def run_server( # noqa: PLR0915
     "pool_timeout": db_connection_timeout,
 }
 database_url = get_secret("DATABASE_URL", default_value=None)
-modified_url = append_query_params(database_url, params)
+modified_url = append_query_params(
+    str(database_url) if database_url else None, params
+)
 os.environ["DATABASE_URL"] = modified_url
 if os.getenv("DIRECT_URL", None) is not None:
     ### add connection pool + pool timeout args
@@ -865,17 +868,17 @@ def run_server( # noqa: PLR0915
 if not PrismaManager.setup_database(
     use_migrate=not use_prisma_db_push
 ):
-    if skip_db_migration_check:
-        print(  # noqa
-            "\033[1;33mLiteLLM Proxy: Database migration failed but continuing startup. "
-            "Pass --skip_db_migration_check to allow this.\033[0m"
-        )
-    else:
+    if enforce_prisma_migration_check:
         print(  # noqa
             "\033[1;31mLiteLLM Proxy: Database setup failed after multiple retries. "
             "The proxy cannot start safely. Please check your database connection and migration status.\033[0m"
         )
         sys.exit(1)
+    else:
+        print(  # noqa
+            "\033[1;33mLiteLLM Proxy: Database migration failed but continuing startup. "
+            "Set --enforce_prisma_migration_check or ENFORCE_PRISMA_MIGRATION_CHECK=true to exit on failure.\033[0m"
+        )

Default behavior inverted — backwards-incompatible change

The PR inverts the default behavior in a backwards-incompatible way, violating the project's policy of never making breaking changes without a user-controlled feature flag.

Before (original code):

  • Default (skip_db_migration_check=False): sys.exit(1) on DB failure — safe/strict by default
  • Opt-in (--skip_db_migration_check): warn and continue — permissive, explicit user choice

After (this PR):

  • Default (enforce_prisma_migration_check=False): warn and continue — permissive by default
  • Opt-in (--enforce_prisma_migration_check): sys.exit(1) on DB failure

This means any production deployment that previously relied on the proxy stopping when the database was unreachable will now silently start up and attempt to serve requests without a database. This is a regression: operators who deployed without --skip_db_migration_check intentionally wanted the strict behavior.

Additionally, the old env var SKIP_DB_MIGRATION_CHECK and CLI flag --skip_db_migration_check are silently dropped. Any Dockerfiles, Helm charts, or startup scripts referencing those flags will now be ignored without error, and those deployments will get the new (permissive) default rather than the intended (permissive) opt-in they explicitly configured.

To fix this without breaking existing deployments, the original semantics should be preserved: keep sys.exit(1) as the default and fix only the misleading message. The message fix from this PR is correct and should be kept:

if skip_db_migration_check:
    print(  # noqa
        "\033[1;33mLiteLLM Proxy: Database migration failed but continuing startup. "
        "Set SKIP_DB_MIGRATION_CHECK=true to suppress this warning.\033[0m"
    )
else:
    print(  # noqa
        "\033[1;31mLiteLLM Proxy: Database setup failed after multiple retries. "
        "The proxy cannot start safely. Pass --skip_db_migration_check or set "
        "SKIP_DB_MIGRATION_CHECK=true to warn and continue instead of exiting.\033[0m"
    )
    sys.exit(1)

Rule Used: What: avoid backwards-incompatible changes without... (source)

Author

This is an intentional change. We do not want to block on startup, as the initial PR added this week caused that regression. We want to make strict enforcement opt in for now, and move to it being default, with time.

ghost changed the title from "fix: clarify db migration failure exit with opt-in skip flag" to "fix: make db migration failure exit opt-in via --enforce_prisma_migration_check" on Mar 15, 2026
@ghost ghost merged commit cd37ee1 into main Mar 16, 2026
36 of 39 checks passed
yuneng-jiang added a commit that referenced this pull request Mar 16, 2026
…nforcement

The --enforce_prisma_migration_check flag is now required to trigger
sys.exit(1) on DB migration failure, after #23675 flipped the default
behavior to warn-and-continue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ghost pushed a commit that referenced this pull request Mar 16, 2026
* fix(test): add missing mocks for test_streamable_http_mcp_handler_mock

The test was missing mocks for extract_mcp_auth_context and set_auth_context,
causing the handler to fail silently in the except block instead of reaching
session_manager.handle_request. This mirrors the fix already applied to the
sibling test_sse_mcp_handler_mock.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): route OpenAI models through chat completions in pass-through tests

The test_anthropic_messages_openai_model_streaming_cost_injection test fails
because the OpenAI Responses API returns 400 for requests routed through the
Anthropic Messages endpoint. Setting LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES=true
routes OpenAI models through the stable chat completions path instead.
Cost injection still works since it happens at the proxy level.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): fix assemblyai custom auth and router wildcard test flakiness

1. custom_auth_basic.py: Add user_role='proxy_admin' so the custom auth
   user can access management endpoints like /key/generate. The test
   test_assemblyai_transcribe_with_non_admin_key was hidden behind an
   earlier -x failure and was never reached before.

2. test_router_utils.py: Add flaky(retries=3) and increase sleep from 1s
   to 2s for test_router_get_model_group_usage_wildcard_routes. The async
   callback needs time to write usage to cache, and 1s is insufficient on
   slower CI hardware.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* ci: retrigger CI pipeline

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(mypy): use LitellmUserRoles enum instead of raw string in custom_auth_basic

Fixes mypy error: Argument 'user_role' has incompatible type 'str'; expected 'LitellmUserRoles | None'

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: don't close HTTP/SDK clients on LLMClientCache eviction (#22926)

* fix: don't close HTTP/SDK clients on LLMClientCache eviction

Removing the _remove_key override that eagerly called aclose()/close()
on evicted clients. Evicted clients may still be held by in-flight
streaming requests; closing them causes:

  RuntimeError: Cannot send a request, as the client has been closed.

This is a regression from commit fb72979. Clients that are no longer
referenced will be garbage-collected naturally. Explicit shutdown cleanup
happens via close_litellm_async_clients().

Fixes production crashes after the 1-hour cache TTL expires.

* test: update LLMClientCache unit tests for no-close-on-eviction behavior

Flip the assertions: evicted clients must NOT be closed. Replace
test_remove_key_closes_async_client → test_remove_key_does_not_close_async_client
and equivalents for sync/eviction paths.

Add test_remove_key_removes_plain_values for non-client cache entries.
Remove test_background_tasks_cleaned_up_after_completion (no more _background_tasks).
Remove test_remove_key_no_event_loop variant that depended on old behavior.

* test: add e2e tests for OpenAI SDK client surviving cache eviction

Add two new e2e tests using real AsyncOpenAI clients:
- test_evicted_openai_sdk_client_stays_usable: verifies size-based eviction
  doesn't close the client
- test_ttl_expired_openai_sdk_client_stays_usable: verifies TTL expiry
  eviction doesn't close the client

Both tests sleep after eviction so any create_task()-based close would
have time to run, making the regression detectable.

Also expand the module docstring to explain why the sleep is required.

* docs(AGENTS.md): add rule — never close HTTP/SDK clients on cache eviction

* docs(CLAUDE.md): add HTTP client cache safety guideline

* [Fix] Install bsdmainutils for column command in security scans

The security_scans.sh script uses `column` to format vulnerability
output, but the package wasn't installed in the CI environment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: handle string callback values in prometheus multiproc setup

When callbacks are configured as a plain string (e.g., `callbacks: "my_callback"`)
instead of a list, the proxy crashes on startup with:
  TypeError: can only concatenate str (not "list") to str

Normalize each callback setting to a list before concatenating.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* bump: version 1.82.2 → 1.82.3

* fix(test): update test_startup_fails_when_db_setup_fails for opt-in enforcement

The --enforce_prisma_migration_check flag is now required to trigger
sys.exit(1) on DB migration failure, after #23675 flipped the default
behavior to warn-and-continue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(cost_calculator): use model name for per-request custom pricing when router_model_id has no pricing

When custom pricing is passed as per-request kwargs (input_cost_per_token/output_cost_per_token),
completion() registers pricing under the model name, but _select_model_name_for_cost_calc was
selecting the router deployment hash (which has no pricing data), causing response_cost to be 0.0.

Now checks whether the router_model_id entry actually has pricing before preferring it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
ishaan-jaff added a commit that referenced this pull request Mar 17, 2026
…3382)

* fix: langfuse trace leak key on model params

* fix: pop sensitive keys from langfuse

* fixes

* fix: set oauth2_flow when building MCPServer in _execute_with_mcp_client

* fix: add oauth2_flow to NewMCPServerRequest and guard auto-detect with token_url

* fix: narrow oauth2_flow type to Literal in NewMCPServerRequest

* fix: align DefaultInternalUserParams Pydantic default with runtime fallback

The Pydantic default for user_role was INTERNAL_USER, but all runtime
provisioning paths (SSO, SCIM, JWT) fall back to INTERNAL_USER_VIEW_ONLY
when no settings are saved. This caused the UI to show "Internal User"
on fresh instances while new users actually got "Internal Viewer".

* test: add regression test for fresh-instance default role sync

Asserts that GET /get/internal_user_settings returns
INTERNAL_USER_VIEW_ONLY on a fresh DB with no saved settings,
matching the runtime fallback in SSO/SCIM/JWT provisioning.

* Update tests/test_litellm/proxy/ui_crud_endpoints/test_proxy_setting_endpoints.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Add unit tests for 5 previously untested UI dashboard files

Tests added for: UiLoadingSpinner, HashicorpVaultEmptyPlaceholder,
PageVisibilitySettings, errorUtils, and mcpToolCrudClassification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove skip decorators from m2m tests now that oauth2_flow is set

* [Fix] Privilege escalation: restrict /key/block, /key/unblock, and max_budget updates to admins

Non-admin users (INTERNAL_USER) could call /key/block and /key/unblock on
arbitrary keys, and modify max_budget on their own keys via /key/update.
These endpoints are now restricted to proxy admins, team admins, or org admins.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore(ui): migrate DefaultUserSettings buttons from Tremor to antd

* [Infra] Merging RC Branch with Main (#23786)

* fix(test): add missing mocks for test_streamable_http_mcp_handler_mock

The test was missing mocks for extract_mcp_auth_context and set_auth_context,
causing the handler to fail silently in the except block instead of reaching
session_manager.handle_request. This mirrors the fix already applied to the
sibling test_sse_mcp_handler_mock.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): route OpenAI models through chat completions in pass-through tests

The test_anthropic_messages_openai_model_streaming_cost_injection test fails
because the OpenAI Responses API returns 400 for requests routed through the
Anthropic Messages endpoint. Setting LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES=true
routes OpenAI models through the stable chat completions path instead.
Cost injection still works since it happens at the proxy level.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): fix assemblyai custom auth and router wildcard test flakiness

1. custom_auth_basic.py: Add user_role='proxy_admin' so the custom auth
   user can access management endpoints like /key/generate. The test
   test_assemblyai_transcribe_with_non_admin_key was hidden behind an
   earlier -x failure and was never reached before.

2. test_router_utils.py: Add flaky(retries=3) and increase sleep from 1s
   to 2s for test_router_get_model_group_usage_wildcard_routes. The async
   callback needs time to write usage to cache, and 1s is insufficient on
   slower CI hardware.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* ci: retrigger CI pipeline

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(mypy): use LitellmUserRoles enum instead of raw string in custom_auth_basic

Fixes mypy error: Argument 'user_role' has incompatible type 'str'; expected 'LitellmUserRoles | None'

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: don't close HTTP/SDK clients on LLMClientCache eviction (#22926)

* fix: don't close HTTP/SDK clients on LLMClientCache eviction

Removing the _remove_key override that eagerly called aclose()/close()
on evicted clients. Evicted clients may still be held by in-flight
streaming requests; closing them causes:

  RuntimeError: Cannot send a request, as the client has been closed.

This is a regression from commit fb72979. Clients that are no longer
referenced will be garbage-collected naturally. Explicit shutdown cleanup
happens via close_litellm_async_clients().

Fixes production crashes after the 1-hour cache TTL expires.

* test: update LLMClientCache unit tests for no-close-on-eviction behavior

Flip the assertions: evicted clients must NOT be closed. Replace
test_remove_key_closes_async_client → test_remove_key_does_not_close_async_client
and equivalents for sync/eviction paths.

Add test_remove_key_removes_plain_values for non-client cache entries.
Remove test_background_tasks_cleaned_up_after_completion (no more _background_tasks).
Remove test_remove_key_no_event_loop variant that depended on old behavior.

* test: add e2e tests for OpenAI SDK client surviving cache eviction

Add two new e2e tests using real AsyncOpenAI clients:
- test_evicted_openai_sdk_client_stays_usable: verifies size-based eviction
  doesn't close the client
- test_ttl_expired_openai_sdk_client_stays_usable: verifies TTL expiry
  eviction doesn't close the client

Both tests sleep after eviction so any create_task()-based close would
have time to run, making the regression detectable.

Also expand the module docstring to explain why the sleep is required.

* docs(AGENTS.md): add rule — never close HTTP/SDK clients on cache eviction

* docs(CLAUDE.md): add HTTP client cache safety guideline

* [Fix] Install bsdmainutils for column command in security scans

The security_scans.sh script uses `column` to format vulnerability
output, but the package wasn't installed in the CI environment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: handle string callback values in prometheus multiproc setup

When callbacks are configured as a plain string (e.g., `callbacks: "my_callback"`)
instead of a list, the proxy crashes on startup with:
  TypeError: can only concatenate str (not "list") to str

Normalize each callback setting to a list before concatenating.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* bump: version 1.82.2 → 1.82.3

* fix(test): update test_startup_fails_when_db_setup_fails for opt-in enforcement

The --enforce_prisma_migration_check flag is now required to trigger
sys.exit(1) on DB migration failure, after #23675 flipped the default
behavior to warn-and-continue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(cost_calculator): use model name for per-request custom pricing when router_model_id has no pricing

When custom pricing is passed as per-request kwargs (input_cost_per_token/output_cost_per_token),
completion() registers pricing under the model name, but _select_model_name_for_cost_calc was
selecting the router deployment hash (which has no pricing data), causing response_cost to be 0.0.

Now checks whether the router_model_id entry actually has pricing before preferring it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Update litellm/proxy/management_endpoints/key_management_endpoints.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix: clear oauth2_flow when client_credentials set without token_url

* chore(ui): use antd danger prop instead of tailwind for Remove button

* feat: fetch blog posts from docs RSS feed instead of static JSON on GitHub

* fix: remove unused Any import from get_blog_posts

* [Fix] UI - Logs: Fix empty filter results showing stale data

Remove `.length > 0` check so that when a backend filter returns an
empty result set the table correctly shows no data instead of falling
back to the previous unfiltered logs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [Fix] Reapply empty filter fix after merge with main

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [Fix] Prevent internal users from creating invalid keys via key/generate and key/update

Internal users could exploit key/generate and key/update to create unbound
keys (no user_id, no budget) or attach keys to non-existent teams. This
adds validation for non-admin callers: auto-assign user_id on generate,
reject invalid team_ids, and prevent removing user_id on update.

Closes LIT-1884

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [Fix] Remove duplicate get_team_object call in _validate_update_key_data

Move the non-admin team validation into the existing get_team_object call
site to avoid an extra DB round-trip. The existing call already fetches
the team for limits checking — we now add the LIT-1884 guard there when
team_obj is None for non-admin callers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [Fix] Skip key_alias re-validation on update/regenerate when alias unchanged

When updating or regenerating a key without changing its key_alias, the
existing alias was being re-validated against current format rules. This
caused keys with legacy aliases (created before stricter validation) to
become uneditable. Now validation only runs when the alias actually changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [Fix] Update log filter test to match empty-result behavior

The test expected fallback to all logs when backend filters return empty,
but the source was intentionally changed to show empty results instead of
stale data. Updated test to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [Feature] Disable custom API key values via UI setting

Add disable_custom_api_keys UI setting that prevents users from specifying
custom key values during key generation and regeneration. When enabled, all
keys must be auto-generated, eliminating the risk of key hash collisions
in multi-tenant environments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [Fix] Add disable_custom_api_keys to UISettings Pydantic model

Without this field on the model, GET /get/ui_settings omits the setting
from the response and field_schema, preventing the UI from reading it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: Register DynamoAI guardrail initializer and enum entry (#23752)

* fix: Register DynamoAI guardrail initializer and enum entry

Fix the "Unsupported guardrail: dynamoai" error by:
1. Adding DYNAMOAI to SupportedGuardrailIntegrations enum
2. Implementing initialize_guardrail() and registries in dynamoai/__init__.py

The DynamoAI guardrail was added in PR #15920 but never properly registered
in the initialization system. The __init__.py was missing the
guardrail_initializer_registry and guardrail_class_registry dictionaries
that the dynamic discovery mechanism looks for at module load time.

Fixes #22773

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* Update litellm/proxy/guardrails/guardrail_hooks/dynamoai/__init__.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update litellm/proxy/guardrails/guardrail_hooks/dynamoai/__init__.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* test: Add tests for DynamoAI guardrail registration

Verifies enum entry, initializer registry, class registry,
instance creation, and global registry discovery.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* docs: add v1.82.3 release notes and update provider_endpoints_support.json (#23816)

* [Feature] Add disable_custom_api_keys toggle to UI Settings page

Adds a toggle switch to the admin UI Settings page so administrators can
enable/disable custom API key values without making direct API calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revert "docs: add v1.82.3 release notes and update provider_endpoints_support…" (#23817)

This reverts commit 9661249.

* [Fix] Rename toggle label to "Disable custom Virtual key values"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [Fix] Remove "API" from custom key description text

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ui): CSV export empty on Global Usage page

Aggregated endpoint returns empty breakdown.entities; fall back to
grouping breakdown.api_keys by team_id.

* Revert "fix: langfuse trace leak key on model params"

* fix: support served_model_name for Baseten dedicated deployments

Baseten dedicated deployments use an 8-char deployment ID for URL routing,
but the vLLM server may expect a different model name in the request body
(e.g. baseten-hosted/zai-org/GLM-5 vs wd1lndkw). Add served_model_name
litellm_param to override the model field in the request body, and declare
it in LiteLLMParamsTypedDict and GenericLiteLLMParams for IDE support.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Harshit Jain <harshitjain0562@gmail.com>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: AlexKer <AlexKer@users.noreply.github.com>
Co-authored-by: joereyna <joseph.reyna@gmail.com>
Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
This pull request was closed.