
[Fix] Remove dangerous schema diff force-apply from Prisma migration startup#23464

Draft
yuneng-jiang wants to merge 12 commits into litellm_internal_dev_03_12_2026 from litellm_fix_prisma_migrations

Conversation

@yuneng-jiang
Contributor

Relevant issues

Reported by a user who experienced schema thrashing during rolling deployments due to the migration system's diff-and-apply mechanism.

Summary

Failure Path (Before Fix)

_resolve_all_migrations was called after every successful prisma migrate deploy as a "sanity check." This method:

  1. Generated a diff between the DB and schema.prisma
  2. Force-applied that diff via prisma db execute, bypassing Prisma's migration tracking

During rolling deployments, if version A and version B had different schema.prisma files, each startup would force-push its own schema over the other's, causing a thrashing loop. Additionally, non-idempotent P3009 migration failures were silently retried instead of failing fast.
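The thrashing failure mode can be sketched with a toy simulation (hypothetical code, not from this repo): each pod's startup force-applies its own schema, so alternating restarts flip the database between two states instead of converging.

```python
# Hypothetical illustration of the rolling-deployment thrashing loop:
# each pod version force-applies its own schema.prisma on startup,
# so the database never converges while both versions are running.

def force_apply(db_state: dict, desired_schema: dict) -> dict:
    """Mimics diff-and-apply: make the DB match this pod's schema exactly."""
    return dict(desired_schema)

schema_a = {"LiteLLM_MCPServerTable": ["id", "approval_status"]}  # version A
schema_b = {"LiteLLM_MCPServerTable": ["id"]}                     # version B

db = dict(schema_a)
history = []
for startup_schema in [schema_a, schema_b, schema_a, schema_b]:  # alternating pod restarts
    db = force_apply(db, startup_schema)
    history.append(db["LiteLLM_MCPServerTable"])

# The column set flip-flops on every startup instead of settling.
print(history)
```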

Fix

  • Removed _resolve_all_migrations and the post-deploy "sanity check" call — if prisma migrate deploy succeeds, the DB is already correct
  • Replaced with _mark_all_migrations_applied which only updates the _prisma_migrations tracking table without generating or applying diffs
  • P3005 (non-empty DB, first migration) now uses _mark_all_migrations_applied + prisma migrate deploy instead of the diff mechanism
  • P3009 non-idempotent errors now raise RuntimeError with actionable instructions instead of silently retrying
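As a rough sketch of the new marking behavior (assumptions: the method globs migrations_dir/migrations/*/migration.sql and runs prisma migrate resolve --applied per folder; the resolver is stubbed here, and all names are illustrative):

```python
import pathlib
import tempfile

def mark_all_migrations_applied(migrations_dir: str, resolve_applied) -> list:
    """Sketch: mark every migration folder as applied without running its SQL.
    `resolve_applied` stands in for `prisma migrate resolve --applied <name>`."""
    marked = []
    for sql in sorted(pathlib.Path(migrations_dir, "migrations").glob("*/migration.sql")):
        name = sql.parent.name
        try:
            resolve_applied(name)
        except RuntimeError as exc:
            if "already recorded as applied" in str(exc):
                continue  # skip migrations Prisma already tracks as applied
            raise
        marked.append(name)
    return marked

# Usage with a fake resolver in a temp directory; one migration is
# already tracked as applied and is skipped:
def fake_resolver(name):
    if name == "20240101_init":
        raise RuntimeError(f"{name} is already recorded as applied in the database.")

with tempfile.TemporaryDirectory() as d:
    for name in ["20240101_init", "20240202_add_col"]:
        folder = pathlib.Path(d, "migrations", name)
        folder.mkdir(parents=True)
        (folder / "migration.sql").write_text("-- ddl here")
    seen = mark_all_migrations_applied(d, fake_resolver)

print(seen)
```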

Testing

  • All 24 unit tests pass (tests/litellm-proxy-extras/test_litellm_proxy_extras_utils.py)
  • Added 5 new tests:
    • TestMarkAllMigrationsApplied: verifies the new method marks migrations without generating/applying diffs, and skips already-applied migrations
    • TestSetupDatabaseFailFast: verifies P3009 non-idempotent errors raise RuntimeError, and successful deploys don't call diff/resolve logic
  • Audited all 110 existing migration SQL files — all contain complete, self-contained DDL and do not depend on the removed diff mechanism
  • Verified no CI tests reference _resolve_all_migrations or depend on the removed behavior

Type

🐛 Bug Fix
✅ Test

The `_resolve_all_migrations` method was generating a diff between the DB
and schema.prisma and force-applying it via `prisma db execute`, bypassing
migration tracking entirely. This caused schema thrashing during rolling
deployments when two versions with different schemas were running.

- Remove `_resolve_all_migrations` and its post-deploy "sanity check" call
- Replace with `_mark_all_migrations_applied` (marks only, no diff/apply)
- P3005 handler now uses `_mark_all_migrations_applied` + `prisma migrate deploy`
- P3009 non-idempotent errors now fail fast with actionable error messages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vercel

vercel bot commented Mar 12, 2026

The latest updates on your projects:

Project: litellm | Deployment: Error | Updated (UTC): Mar 12, 2026 7:26pm

@yuneng-jiang yuneng-jiang changed the base branch from main to litellm_internal_dev_03_12_2026 March 12, 2026 18:04
@greptile-apps
Contributor

greptile-apps bot commented Mar 12, 2026

Greptile Summary

This PR addresses a real and critical rolling-deployment bug where _resolve_all_migrations was force-applying schema diffs on every startup, causing thrashing between pods running different schema versions. The fix removes that mechanism and replaces it with a cleaner _deploy_with_idempotent_resolution loop that handles P3009/P3018 Prisma errors natively. The new approach is architecturally sound for the normal deploy path.

Key changes and remaining concerns:

  • Core fix is correct: Removing _resolve_all_migrations eliminates the schema-thrashing root cause. If prisma migrate deploy succeeds, no post-deploy diff is needed.
  • _deploy_with_idempotent_resolution properly loops through multiple idempotent failures and raises RuntimeError for non-recoverable ones — this is a significant improvement over the previous silent-retry pattern.
  • Deleted migration file (20260309115809_add_missing_indexes): Removing this file without a corresponding DROP INDEX migration leaves LiteLLM_VerificationToken_key_alias_idx and LiteLLM_SpendLogs_user_startTime_idx orphaned in any database that already applied this migration. Schema drift will be permanent unless manually resolved. Additionally, new deployments will silently lose the index-backed query performance those entries were providing.
  • P3005 baseline path (_mark_all_migrations_applied then return True): Multiple prior review threads have detailed that this path marks all migrations — including newly-shipped ones — as applied without executing their SQL, and then returns success without re-running deploy. This remains an open concern.
  • Tests: New tests for _deploy_with_idempotent_resolution and _mark_all_migrations_applied are a good addition, but several assertions against the exact Prisma command array do not mock _get_prisma_command(), making them sensitive to the PRISMA_OFFLINE_MODE environment variable.
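A minimal illustration of the environment-sensitivity concern (all names are hypothetical stand-ins for _get_prisma_command and the deploy call):

```python
import os

def get_prisma_command() -> str:
    # Stand-in for _get_prisma_command(): env-dependent binary choice.
    return "prisma-offline" if os.environ.get("PRISMA_OFFLINE_MODE") else "prisma"

def build_deploy_argv(command_lookup=get_prisma_command) -> list:
    return [command_lookup(), "migrate", "deploy"]

# Without pinning the lookup, the asserted argv depends on the environment:
env_dependent = build_deploy_argv()

# Pinning the lookup (what mocking _get_prisma_command achieves in a test)
# makes the assertion deterministic regardless of PRISMA_OFFLINE_MODE:
pinned = build_deploy_argv(lambda: "prisma")

print(pinned)
```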

Confidence Score: 2/5

  • Not safe to merge without resolving the deleted migration file issue and the open P3005 baseline path concerns from prior threads
  • The core rolling-deployment fix is correct and well-tested. However, deleting 20260309115809_add_missing_indexes without a DROP INDEX migration creates permanent schema drift and potential index loss for any environment that already applied it. Additionally, multiple substantive concerns about the P3005 path (new migrations silently skipped) and the idempotent resolution re-deploy gap remain open from prior review threads and have not been addressed in this revision.
  • Pay close attention to litellm-proxy-extras/litellm_proxy_extras/migrations/20260309115809_add_missing_indexes/migration.sql (deleted without drop migration) and litellm-proxy-extras/litellm_proxy_extras/utils.py lines 443–460 (P3005 baseline path).

Important Files Changed

Filename Overview
litellm-proxy-extras/litellm_proxy_extras/utils.py Core migration logic refactored to remove dangerous diff-and-apply mechanism; adds _deploy_with_idempotent_resolution and _mark_all_migrations_applied; multiple logic concerns in P3005 baseline path and error handling already flagged in prior threads
litellm-proxy-extras/litellm_proxy_extras/migrations/20260311180521_schema_sync/migration.sql New migration drops six columns and one index from LiteLLM_MCPServerTable — destructive DDL already flagged as unsafe for rolling deployments in prior threads
litellm-proxy-extras/litellm_proxy_extras/migrations/20260309115809_add_missing_indexes/migration.sql Migration file deleted without a corresponding DROP INDEX migration; any database that already applied this migration will have orphaned indexes that are no longer tracked by the Prisma schema
litellm-proxy-extras/litellm_proxy_extras/schema.prisma Removes @@index([key_alias]) from LiteLLM_VerificationToken and @@index([user, startTime]) from LiteLLM_SpendLogs without any DROP INDEX migration, creating schema drift for existing deployments and silent performance regression for new ones
tests/litellm-proxy-extras/test_litellm_proxy_extras_utils.py Adds meaningful new tests for _mark_all_migrations_applied and _deploy_with_idempotent_resolution; assertions against exact Prisma command arrays are environment-sensitive because _get_prisma_command() is not mocked

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[setup_database called\nuse_migrate=True] --> B[_deploy_with_idempotent_resolution]
    B --> C{prisma migrate deploy}
    C -->|Success| D[return True ✅]
    C -->|CalledProcessError| E{P3009 or P3018?}
    E -->|No| F[raise CalledProcessError]
    E -->|Yes| G[_resolve_failed_migration]
    G --> H{idempotent error?}
    H -->|No - non-idempotent| I[_roll_back_migration\nthen raise RuntimeError ❌]
    H -->|Yes - idempotent| J[_roll_back_migration\n+ _resolve_specific_migration]
    J --> K[continue loop\nRe-deploy]
    K --> C
    B -->|max_resolutions exceeded| L[raise RuntimeError\nmax iterations ❌]

    C -->|CalledProcessError\nP3005 - non-empty DB| M[_create_baseline_migration]
    M --> N[_mark_all_migrations_applied\nmarks ALL migrations applied\nwithout running SQL]
    N --> O[return True ⚠️\nNew migrations not deployed]

    style D fill:#90EE90
    style I fill:#FFB6C1
    style L fill:#FFB6C1
    style O fill:#FFD700
```

Comments Outside Diff (1)

  1. litellm-proxy-extras/litellm_proxy_extras/migrations/20260309115809_add_missing_indexes/migration.sql

    Deleted migration file causes orphaned indexes and schema drift

    This migration file created LiteLLM_VerificationToken_key_alias_idx and LiteLLM_SpendLogs_user_startTime_idx via CREATE INDEX CONCURRENTLY. Deleting the file without adding a corresponding DROP INDEX migration in its place leaves two problems:

    1. Schema drift for existing deployments: Any database that already applied 20260309115809_add_missing_indexes (e.g. a staging or CI environment on the litellm_internal_dev_03_12_2026 branch) will have these indexes present in the database. Because schema.prisma no longer declares @@index([key_alias]) or @@index([user, startTime]), Prisma will permanently consider the DB to be drifted, and prisma migrate diff/prisma migrate status will flag it as out of sync.

    2. Silent performance regression for new deployments: New databases will never get these indexes. The comments in the deleted schema entries indicate they were specifically added to accelerate ORDER BY key_alias queries on LiteLLM_VerificationToken and time-bounded spend queries on LiteLLM_SpendLogs. Removing them without documentation will silently degrade those query paths.

    The standard Prisma approach is to add a new migration that drops the indexes rather than delete the migration that created them:

```sql
-- DropIndex
DROP INDEX CONCURRENTLY IF EXISTS "LiteLLM_VerificationToken_key_alias_idx";

-- DropIndex
DROP INDEX CONCURRENTLY IF EXISTS "LiteLLM_SpendLogs_user_startTime_idx";
```

    If no environment has ever run 20260309115809_add_missing_indexes (i.e. it was only ever on an internal dev branch), please add a comment confirming this so reviewers can verify the deletion is safe.

Last reviewed commit: b4ee36c

Comment on lines +357 to +363
```python
raise RuntimeError(
    f"Migration {failed_migration} failed and requires manual intervention. "
    f"Please inspect the migration and database state, then either:\n"
    f" - Fix the issue and restart, or\n"
    f" - Run: prisma migrate resolve --applied {failed_migration}\n"
    f"Original error: {e.stderr}"
) from e
```

Misleading remediation instruction in error message

The RuntimeError message suggests running prisma migrate resolve --applied {failed_migration} as an alternative to fixing and restarting. However, --applied marks the migration as applied in Prisma's tracking table without actually executing the migration SQL, which can leave the database schema in an inconsistent state if the migration contained important DDL changes.

A user following this instruction after a partial migration failure would silently skip the migration entirely. A clearer message would clarify that --applied should only be used when the changes were already applied to the database by other means (e.g., run manually):

```python
raise RuntimeError(
    f"Migration {failed_migration} failed and requires manual intervention. "
    f"The migration has been marked as rolled-back. Please either:\n"
    f"  1. Fix the migration SQL and restart to re-run it, or\n"
    f"  2. Manually apply the changes to the DB, then run:\n"
    f"       prisma migrate resolve --applied {failed_migration}\n"
    f"Original error: {e.stderr}"
) from e
```

- P3009 with unparseable migration name now fails fast instead of
  silently falling through to the generic retry handler
- Removed --applied suggestion from RuntimeError message since it
  skips running migration SQL and can leave schema inconsistent
- Increased retry attempts from 4 to 5

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
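The fail-fast parsing described in the commit above can be sketched as follows (the regex mirrors the r"`(\d+_.*)` migration" pattern shown in the diff; the helper name and error text are illustrative):

```python
import re

def extract_failed_migration(stderr: str) -> str:
    """Pull the failed migration name out of a P3009 message, failing fast
    when the name cannot be parsed instead of falling through to a retry."""
    match = re.search(r"`(\d+_.*)` migration", stderr)
    if match is None:
        raise RuntimeError(
            f"P3009 reported but migration name could not be parsed: {stderr}"
        )
    return match.group(1)

stderr = (
    "P3009: migrate found failed migrations in the target database, "
    "`20250329084805_new_cron_job_table` migration."
)
print(extract_failed_migration(stderr))  # 20250329084805_new_cron_job_table
```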
Comment on lines +332 to +345
```python
                if ProxyExtrasDBManager._is_idempotent_error(e.stderr):
                    logger.info(
                        f"Migration {failed_migration} failed due to idempotent error (e.g., column already exists), resolving as applied"
                    )
                    ProxyExtrasDBManager._roll_back_migration(
                        failed_migration
                    )
                    ProxyExtrasDBManager._resolve_specific_migration(
                        failed_migration
                    )
                    logger.info(
                        f"✅ Migration {failed_migration} resolved."
                    )
                    return True
```

P3009 idempotent path returns True without re-running deploy

After resolving the idempotent P3009 failure the code immediately returns True, but there may be additional pending migrations after the now-resolved one that were never applied. prisma migrate deploy stops at the first failing migration, so any migrations B, C, … that followed the resolved migration A are still unapplied when the function returns successfully.

This stands in contrast to the P3005 path added in this same PR, which correctly re-runs prisma migrate deploy after _mark_all_migrations_applied to pick up remaining pending migrations. The inconsistency means an idempotent P3009 failure leaves the database in a partially-migrated state that is only corrected on the next startup (when deploy runs again cleanly).

Suggested fix — add a re-deploy after the rollback/resolve, mirroring the P3005 pattern:

```python
                ProxyExtrasDBManager._roll_back_migration(
                    failed_migration
                )
                ProxyExtrasDBManager._resolve_specific_migration(
                    failed_migration
                )
                logger.info(
                    f"✅ Migration {failed_migration} resolved."
                )
                # Re-run deploy to apply any remaining pending migrations
                result = subprocess.run(
                    [_get_prisma_command(), "migrate", "deploy"],
                    timeout=60,
                    check=True,
                    capture_output=True,
                    text=True,
                    env=_get_prisma_env(),
                )
                logger.info(f"prisma migrate deploy stdout: {result.stdout}")
                return True
```

Comment on lines +210 to +224
```python
    def test_p3009_non_idempotent_raises_runtime_error(
        self, mock_run, mock_dir, mock_getcwd, mock_chdir
    ):
        """P3009 with non-idempotent error should raise RuntimeError, not silently retry"""
        error = subprocess.CalledProcessError(
            1,
            "prisma",
            stderr="P3009: migrate found failed migrations in the target database, `20250329084805_new_cron_job_table` migration. Error: syntax error at or near 'ALTR'",
            output="",
        )
        mock_run.side_effect = error

        with pytest.raises(RuntimeError, match="requires manual intervention"):
            ProxyExtrasDBManager.setup_database(use_migrate=True)
```

Test doesn't isolate rollback call from initial deploy call

mock_run.side_effect = error causes every subprocess.run call to raise the same P3009 CalledProcessError. This means _roll_back_migration (which calls subprocess.run internally) also raises, is silently caught by except Exception as rollback_error, and then RuntimeError is raised. The test passes, but it is actually testing the "rollback-also-fails" code path rather than the primary one.

A more representative test would let _roll_back_migration succeed (mock side-effects as an iterable — first call raises, second call returns normally) so the test exercises the path where rollback succeeds but RuntimeError is still raised:

```python
mock_run.side_effect = [
    error,              # prisma migrate deploy → raises P3009
    MagicMock(),        # _roll_back_migration → succeeds
]
```

With the current single side-effect this also makes it impossible to assert that the rollback was actually attempted (mock_run.call_count is unverified here).

- P3009 idempotent path now re-runs prisma migrate deploy after resolving
  the failed migration, so subsequent migrations are not left unapplied
- _mark_all_migrations_applied raises RuntimeError on unexpected errors
  instead of silently swallowing them
- P3009 non-idempotent test now exercises the rollback-succeeds path
  and verifies rollback was called

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous approach resolved one idempotent migration per outer retry
attempt, exhausting all 5 retries when a DB had 4+ idempotent failures
(e.g., tables/columns already created by the old force-apply logic).

Now _deploy_with_idempotent_resolution loops internally: deploy → detect
idempotent P3009/P3018 → rollback + resolve → re-deploy, until all
idempotent migrations are resolved or a non-recoverable error is hit.
This keeps outer retries available for transient errors (timeouts, etc).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
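The internal loop described above can be sketched with stubbed deploy/resolve steps (all names hypothetical):

```python
def deploy_with_idempotent_resolution(deploy, resolve, max_resolutions: int = 10) -> int:
    """Sketch of the loop: deploy, resolve one idempotent failure, re-deploy,
    until deploy succeeds or the resolution budget is exhausted."""
    for attempt in range(max_resolutions):
        failed = deploy()   # returns None on success, else a failed migration name
        if failed is None:
            return attempt  # number of resolutions that were needed
        resolve(failed)     # rollback + mark-applied for one idempotent failure
    raise RuntimeError(f"Exceeded maximum idempotent resolutions ({max_resolutions})")

# Fake DB with three migrations that fail idempotently until resolved:
pending = ["m1", "m2", "m3"]
resolved = []

def deploy():
    for m in pending:
        if m not in resolved:
            return m
    return None

def resolve(name):
    resolved.append(name)

n = deploy_with_idempotent_resolution(deploy, resolve)
print(n)
```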
Comment on lines +395 to +414
```python
                        text=True,
                        env=_get_prisma_env(),
                    )
                    logger.info(f"prisma migrate deploy stdout: {result.stdout}")
                    logger.info("✅ prisma migrate deploy completed")
                    return
                except subprocess.CalledProcessError as e:
                    logger.info(f"prisma db error: {e.stderr}, e: {e.stdout}")
                    if "P3009" in e.stderr or "P3018" in e.stderr:
                        # Raises RuntimeError for non-recoverable errors,
                        # returns normally for resolved idempotent errors
                        ProxyExtrasDBManager._resolve_failed_migration(e)
                        logger.info("Re-deploying remaining migrations...")
                        continue
                    else:
                        raise

            raise RuntimeError(
                f"Exceeded maximum idempotent resolutions ({max_resolutions}). "
                f"This likely indicates a deeper issue with migration state."
```

P3005 path: new pending migrations silently skipped

_mark_all_migrations_applied globs all migration folders under migrations_dir/migrations/*/migration.sql and marks each one as applied via prisma migrate resolve --applied. This includes any migration file that was shipped in the current version but has not yet been run against the database.

Concretely: if version N+1 ships a new migration (20250401_add_new_feature/migration.sql), the P3005 path will:

  1. Create the 0_init baseline (correct – captures current DB state)
  2. Call _mark_all_migrations_applied, which marks 20250401_add_new_feature as applied without executing its SQL
  3. Call prisma migrate deploy – this is now a no-op because every migration is already recorded as applied

The net result is that the new migration's DDL changes are never applied to the database, and the function returns True, silently reporting success.

The old _resolve_all_migrations avoided this by generating and executing a diff from the current DB to schema.prisma, so new schema changes were actually materialised even when all migrations were then marked as applied. The new approach drops that safety net for the P3005 path.

A minimal fix is to only mark migrations that are truly already reflected in the DB — i.e., those whose SQL was captured in the 0_init baseline — and let prisma migrate deploy run the genuinely pending ones normally:

```python
# After _create_baseline_migration, only mark 0_init as applied
# (it already does this internally). Then let prisma migrate deploy
# apply any truly pending migrations.
logger.info(
    "Baseline migration created; running prisma migrate deploy for pending migrations..."
)
result = subprocess.run(
    [_get_prisma_command(), "migrate", "deploy"],
    timeout=60,
    check=True,
    capture_output=True,
    text=True,
    env=_get_prisma_env(),
)
```

_create_baseline_migration already marks 0_init as applied, so Prisma will skip it and only run the truly pending migrations on the subsequent deploy. _mark_all_migrations_applied would no longer be needed for this path.

```python
                ) from e
            migration_name = migration_match.group(1)
        else:
            raise  # Not a P3009/P3018 — let outer handler deal with it
```

Bare raise is dead code and relies on fragile implicit exception context

This raise statement appears in the else branch meaning neither "P3009" nor "P3018" appears in stderr. However, _resolve_failed_migration is only ever called from _deploy_with_idempotent_resolution after an explicit check if "P3009" in e.stderr or "P3018" in e.stderr:, so this else branch is unreachable dead code.

Additionally, a bare raise inside a @staticmethod called from an except block works only because Python implicitly preserves the active exception in sys.exc_info() across function call frames during exception handling. If _resolve_failed_migration is ever called from a non-exception context in the future (e.g. a new call site), this line will instead raise RuntimeError: No active exception to re-raise, obscuring the original problem.

Since the method already receives e as a parameter, it should use raise e to make intent explicit and avoid the fragile implicit dependency:

Suggested change

```diff
-            raise  # Not a P3009/P3018 — let outer handler deal with it
+            raise e  # Not a P3009/P3018 — let outer handler deal with it
```

- Fix bare `raise` → `raise e` in unreachable else branch of
  _resolve_failed_migration for correctness and clarity
- Remove _mark_all_migrations_applied from P3005 path — it was marking
  ALL migrations (including new ones) as applied, causing new migration
  SQL to never run. Now baseline creation is followed directly by
  _deploy_with_idempotent_resolution which actually runs pending migrations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines 444 to 458
```diff
@@ -456,86 +450,14 @@ def setup_database(use_migrate: bool = False) -> bool:
                         )
                         ProxyExtrasDBManager._create_baseline_migration(schema_path)
                         logger.info(
-                            "Baseline migration created, resolving all migrations"
-                        )
-                        ProxyExtrasDBManager._resolve_all_migrations(
-                            migrations_dir, schema_path
+                            "Baseline migration created, running deploy with idempotent resolution "
+                            "to apply remaining migrations..."
                         )
-                        logger.info("✅ All migrations resolved.")
+                        ProxyExtrasDBManager._deploy_with_idempotent_resolution()
+                        logger.info("✅ All migrations applied.")
                         return True
```

P3005 path silently skips new migrations on non-empty databases

In the P3005 path, _create_baseline_migration creates and marks 0_init as applied (representing the current DB state), then immediately calls _deploy_with_idempotent_resolution(). Any subsequent migration that performs a DROP COLUMN, DROP INDEX, or DROP TABLE on an object that doesn't yet exist in this particular DB will fail with a "does not exist" error. The _is_idempotent_error pattern list includes r"does not exist", so this failure is treated as idempotent and the migration is silently marked as applied without its SQL ever running.

Concrete scenario: a database was set up via db push (not migrate), so it lacks the columns being dropped by 20260311180521_schema_sync. The P3005 path will silently skip that migration, which is fine here — but if a future migration adds something that depends on a prior migration's ALTER TABLE ... ADD COLUMN having run, the silent skip will leave the DB in a broken state.

The PR description states "P3005 now uses _mark_all_migrations_applied + prisma migrate deploy", but the actual code uses _create_baseline_migration + _deploy_with_idempotent_resolution without calling _mark_all_migrations_applied at all. The two approaches differ: the current code attempts to re-execute each migration (relying on idempotent resolution to skip already-applied ones), whereas _mark_all_migrations_applied would skip them unconditionally. Consider whether the stated intent matches the implementation, and whether a conditional _mark_all_migrations_applied call before _deploy_with_idempotent_resolution would be safer for the P3005 case.

Comment on lines +1 to +11
```sql
-- DropIndex
DROP INDEX "LiteLLM_MCPServerTable_approval_status_idx";

-- AlterTable
ALTER TABLE "LiteLLM_MCPServerTable" DROP COLUMN "approval_status",
DROP COLUMN "review_notes",
DROP COLUMN "reviewed_at",
DROP COLUMN "source_url",
DROP COLUMN "submitted_at",
DROP COLUMN "submitted_by";
```

Destructive column removal is not safe for rolling deployments

This migration permanently drops approval_status, review_notes, reviewed_at, source_url, submitted_at, and submitted_by from LiteLLM_MCPServerTable. During a rolling deployment, pods still running the previous version will attempt to read or write these columns and receive a column does not exist database error the moment the new pod applies this migration.

The PR is specifically addressing rolling-deployment safety, so this bundled schema change works against that goal. The standard zero-downtime pattern is: (1) deploy new code that no longer references the columns, then (2) in a subsequent deploy, run the DROP COLUMN migration once all pods have moved to the new code. If these columns are already unreferenced by application code in the current branch, please confirm that before merging, and consider documenting that a brief read-path error window is expected.
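The expand/contract sequence suggested above could look like this pair of releases (illustrative SQL; the column subset and migration name are made up, not files from this PR):

```sql
-- Release N: application code stops reading/writing the columns; no DDL yet.

-- Release N+1 (after all pods run release N code): drop the now-unused columns.
-- e.g. 20260401_drop_mcp_review_columns/migration.sql
ALTER TABLE "LiteLLM_MCPServerTable"
    DROP COLUMN IF EXISTS "approval_status",
    DROP COLUMN IF EXISTS "review_notes";
```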

Comment on lines 431 to +460
```diff
@@ -382,72 +437,11 @@ def setup_database(use_migrate: bool = False) -> bool:
                 if use_migrate:
                     logger.info("Running prisma migrate deploy")
                     try:
-                        # Set migrations directory for Prisma
-                        result = subprocess.run(
-                            [_get_prisma_command(), "migrate", "deploy"],
-                            timeout=60,
-                            check=True,
-                            capture_output=True,
-                            text=True,
-                            env=_get_prisma_env(),
-                        )
-                        logger.info(f"prisma migrate deploy stdout: {result.stdout}")
-
-                        logger.info("prisma migrate deploy completed")
-
-                        # Run sanity check to ensure DB matches schema
-                        logger.info("Running post-migration sanity check...")
-                        ProxyExtrasDBManager._resolve_all_migrations(
-                            migrations_dir, schema_path, mark_all_applied=False
-                        )
-                        logger.info("✅ Post-migration sanity check completed")
+                        ProxyExtrasDBManager._deploy_with_idempotent_resolution()
                         return True
                     except subprocess.CalledProcessError as e:
                         logger.info(f"prisma db error: {e.stderr}, e: {e.stdout}")
-                        if "P3009" in e.stderr:
-                            # Extract the failed migration name from the error message
-                            migration_match = re.search(
-                                r"`(\d+_.*)` migration", e.stderr
-                            )
-                            if migration_match:
-                                failed_migration = migration_match.group(1)
-                                if ProxyExtrasDBManager._is_idempotent_error(e.stderr):
-                                    logger.info(
-                                        f"Migration {failed_migration} failed due to idempotent error (e.g., column already exists), resolving as applied"
-                                    )
-                                    ProxyExtrasDBManager._roll_back_migration(
-                                        failed_migration
-                                    )
-                                    ProxyExtrasDBManager._resolve_specific_migration(
-                                        failed_migration
-                                    )
-                                    logger.info(
-                                        f"✅ Migration {failed_migration} resolved."
-                                    )
-                                    return True
-                                else:
-                                    logger.info(
-                                        f"Found failed migration: {failed_migration}, marking as rolled back"
-                                    )
-                                    # Mark the failed migration as rolled back
-                                    subprocess.run(
-                                        [
-                                            _get_prisma_command(),
-                                            "migrate",
-                                            "resolve",
-                                            "--rolled-back",
-                                            failed_migration,
-                                        ],
-                                        timeout=60,
-                                        check=True,
-                                        capture_output=True,
-                                        text=True,
-                                        env=_get_prisma_env(),
-                                    )
-                                    logger.info(
-                                        f"✅ Migration {failed_migration} marked as rolled back... retrying"
-                                    )
-                        elif (
+                        if (
                             "P3005" in e.stderr
                             and "database schema is not empty" in e.stderr
                         ):
@@ -456,86 +450,14 @@ def setup_database(use_migrate: bool = False) -> bool:
                             )
                             ProxyExtrasDBManager._create_baseline_migration(schema_path)
                             logger.info(
-                                "Baseline migration created, resolving all migrations"
-                            )
-                            ProxyExtrasDBManager._resolve_all_migrations(
-                                migrations_dir, schema_path
+                                "Baseline migration created, running deploy with idempotent resolution "
+                                "to apply remaining migrations..."
                             )
-                            logger.info("✅ All migrations resolved.")
+                            ProxyExtrasDBManager._deploy_with_idempotent_resolution()
+                            logger.info("✅ All migrations applied.")
                             return True
-                        elif "P3018" in e.stderr:
-                            # Check if this is a permission error or idempotent error
-                            if ProxyExtrasDBManager._is_permission_error(e.stderr):
-                                # Permission errors should NOT be marked as applied
-                                # Extract migration name for logging
-                                migration_match = re.search(
-                                    r"Migration name: (\d+_.*)", e.stderr
-                                )
-                                migration_name = (
-                                    migration_match.group(1)
-                                    if migration_match
-                                    else "unknown"
-                                )
-
-                                logger.error(
-                                    f"❌ Migration {migration_name} failed due to insufficient permissions. "
-                                    f"Please check database user privileges. Error: {e.stderr}"
-                                )
-
-                                # Mark as rolled back and exit with error
-                                if migration_match:
-                                    try:
-                                        ProxyExtrasDBManager._roll_back_migration(
-                                            migration_name
-                                        )
-                                        logger.info(
-                                            f"Migration {migration_name} marked as rolled back"
-                                        )
-                                    except Exception as rollback_error:
-                                        logger.warning(
-                                            f"Failed to mark migration as rolled back: {rollback_error}"
-                                        )
-
-                                # Re-raise the error to prevent silent failures
-                                raise RuntimeError(
-                                    f"Migration failed due to permission error. Migration {migration_name} "
-                                    f"was NOT applied. Please grant necessary database permissions and retry."
-                                ) from e
-
-                            elif ProxyExtrasDBManager._is_idempotent_error(e.stderr):
-                                # Idempotent errors mean the migration has effectively been applied
-                                logger.info(
-                                    "Migration failed due to idempotent error (e.g., column already exists), "
-                                    "resolving as applied"
-                                )
-                                # Extract the migration name from the error message
-                                migration_match = re.search(
-                                    r"Migration name: (\d+_.*)", e.stderr
-                                )
-                                if migration_match:
-                                    migration_name = migration_match.group(1)
-                                    logger.info(
-                                        f"Rolling back migration {migration_name}"
-                                    )
-                                    ProxyExtrasDBManager._roll_back_migration(
-                                        migration_name
-                                    )
-                                    logger.info(
-                                        f"Resolving migration {migration_name} that failed "
-                                        f"due to existing schema objects"
-                                    )
-                                    ProxyExtrasDBManager._resolve_specific_migration(
-                                        migration_name
-                                    )
-                                    logger.info("✅ Migration resolved.")
-                            else:
-                                # Unknown P3018 error - log and re-raise for safety
-                                logger.warning(
-                                    f"P3018 error encountered but could not classify "
-                                    f"as permission or idempotent error. "
-                                    f"Error: {e.stderr}"
-                                )
-                                raise
                         else:
                             raise
```

RuntimeError from _deploy_with_idempotent_resolution bypasses all retries and propagates uncaught

_deploy_with_idempotent_resolution raises RuntimeError for non-idempotent failures (via _resolve_failed_migration). In setup_database, the outer retry loop only catches subprocess.CalledProcessError and subprocess.TimeoutExpired:

```python
for attempt in range(5):
    try:
        ...
        ProxyExtrasDBManager._deploy_with_idempotent_resolution()  # can raise RuntimeError
        ...
    except subprocess.TimeoutExpired:
        ...
    except subprocess.CalledProcessError as e:
        ...  # retried
    # RuntimeError is NOT caught → escapes setup_database entirely
```

A RuntimeError from a non-idempotent migration failure escapes setup_database without being caught, which is intentional for fail-fast semantics. However, call sites of setup_database that previously only expected a bool return value (or a CalledProcessError) will now receive an uncaught RuntimeError. The callers should be updated to handle this explicitly, or at minimum the function docstring should document that RuntimeError can be raised.
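A stripped-down model of this control flow (hypothetical stand-ins, not the actual setup_database) shows the fail-fast escape:

```python
import subprocess

def setup_database_like(step) -> bool:
    """Retry loop that, like setup_database, only catches subprocess errors.
    Any RuntimeError raised by `step` propagates to the caller immediately."""
    for attempt in range(5):
        try:
            step()
            return True
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
            continue  # transient errors are retried
    return False

def non_idempotent_failure():
    raise RuntimeError("Migration requires manual intervention")

try:
    setup_database_like(non_idempotent_failure)
    escaped = False
except RuntimeError:
    escaped = True  # fail-fast: no retries consumed, caller must handle it

print(escaped)
```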

"duplicate key value violates" on CREATE UNIQUE INDEX means the index
can't be created due to duplicate data — this is NOT idempotent. Marking
it as applied silently skips index creation, leaving the DB without the
unique constraint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
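The distinction drawn in the commit message can be expressed as a small classifier. This is a hypothetical helper for illustration only; the substring checks mirror the error messages quoted above and are assumptions about the Postgres error text, not the PR's actual logic:

```python
def is_idempotent_p3018(stderr: str) -> bool:
    """Classify a P3018 failure by its stderr text (illustrative only).

    "already exists" means the object was created by a previous partial
    run, so the migration is safe to mark as applied. A duplicate-key
    violation means the data itself blocks a CREATE UNIQUE INDEX;
    marking that migration applied would silently drop the unique
    constraint.
    """
    if "duplicate key value violates" in stderr:
        return False  # non-idempotent: duplicate rows block the index
    return "already exists" in stderr
```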
Comment on lines 244 to +283
@@ -354,10 +273,145 @@ def _resolve_all_migrations(
)
logger.debug(f"Resolved migration: {migration_name}")
except subprocess.CalledProcessError as e:
-                if "is already recorded as applied in the database." not in e.stderr:
-                    logger.warning(
-                        f"Failed to resolve migration {migration_name}: {e.stderr}"
-                    )
+                if "is already recorded as applied in the database." in e.stderr:
+                    logger.debug(
+                        f"Migration {migration_name} already recorded as applied"
+                    )
+                else:
+                    raise RuntimeError(
+                        f"Failed to mark migration {migration_name} as applied: {e.stderr}"
+                    ) from e
_mark_all_migrations_applied is dead production code

This method is defined, documented, and covered by four new tests, but it is never called in any production code path. The PR description states "P3005 (non-empty DB, first migration) now uses _mark_all_migrations_applied + prisma migrate deploy", but the actual implementation uses _create_baseline_migration (which marks only 0_init as applied) followed by _deploy_with_idempotent_resolution.

A quick grep confirms no call site exists outside of tests:

grep -n "_mark_all_migrations_applied" utils.py
# 245: def _mark_all_migrations_applied(migrations_dir: str):  ← definition only

The intended semantics (mark all shipped migrations as already reflected in the DB) are important to document correctly. If this method is meant as a utility for future use, a clear "# NOTE: not currently called in production; see _create_baseline_migration for the P3005 path" comment would help. If it is genuinely needed in the P3005 flow, it should be wired in; otherwise, the tests cover a code path that never executes in production.

P3005 fires only during the db-push-to-migrate transition, when the
_prisma_migrations table doesn't exist yet. Since db push already kept
the schema in sync, all shipped migrations are already reflected in the
DB. Mark them all as applied (standard Prisma baselining) instead of
running deploy-with-resolution, which would fail on pre-existing data
issues like duplicate rows preventing unique index creation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
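Standard Prisma baselining, as described in the commit message above, can be sketched like this. This is an illustrative helper, not the PR's exact _mark_all_migrations_applied; the injectable run parameter is an assumption added so the sketch can be tested without a real Prisma CLI:

```python
import os
import subprocess

def mark_all_migrations_applied(migrations_dir: str, run=subprocess.run):
    """Sketch of Prisma baselining: mark every shipped migration as
    applied without executing its SQL.

    Each migration lives in its own subdirectory of migrations_dir;
    names sort chronologically because of their timestamp prefix.
    """
    names = sorted(
        d for d in os.listdir(migrations_dir)
        if os.path.isdir(os.path.join(migrations_dir, d))
    )
    for name in names:
        # `prisma migrate resolve --applied` only updates the
        # _prisma_migrations tracking table; it never runs migration SQL.
        run(["prisma", "migrate", "resolve", "--applied", name], check=True)
    return names
```

Because `migrate resolve --applied` touches only the tracking table, this path cannot thrash the schema the way the old diff-and-apply mechanism could.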
Comment on lines +188 to +196
    return_value=["20250326162113_baseline"],
)
def test_skips_already_applied_migration(self, mock_get_names, mock_run):
    """Verify already-applied migrations are silently skipped"""
    mock_run.side_effect = subprocess.CalledProcessError(
        1,
        "prisma",
        stderr="Migration `20250326162113_baseline` is already recorded as applied in the database.",
    )
Tests assert exact Prisma command without mocking _get_prisma_command()

Multiple tests assert against a hardcoded ["prisma", "migrate", "deploy"] array:

assert cmd == ["prisma", "migrate", "deploy"]

_get_prisma_command() is environment-sensitive — when PRISMA_OFFLINE_MODE=true is set (a valid production and potentially CI configuration), it returns a full binary path like /app/.cache/prisma-python/binaries/node_modules/.bin/prisma or a PRISMA_CLI_PATH override instead of "prisma". These assertions would then fail despite the production code working correctly.

The same pattern appears in test_p3009_non_idempotent_raises_runtime_error (line 222), test_p3009_idempotent_redeploys (line 341), and test_successful_deploy (line 343). Patching _get_prisma_command to return a fixed value in each test class would make the assertions environment-agnostic:

@patch("litellm_proxy_extras.utils._get_prisma_command", return_value="prisma")
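The suggested patching pattern can be shown on a stand-in class. FakeManager is hypothetical; it only mimics the shape of ProxyExtrasDBManager so the example is self-contained and does not depend on the real module:

```python
from unittest.mock import patch

class FakeManager:
    """Stand-in for ProxyExtrasDBManager (assumed shape, for illustration)."""

    @staticmethod
    def _get_prisma_command():
        # Environment-sensitive in reality: may be a PRISMA_CLI_PATH
        # override or a cached binary path under PRISMA_OFFLINE_MODE.
        return "/app/.cache/prisma-python/.bin/prisma"

    @classmethod
    def deploy(cls):
        return [cls._get_prisma_command(), "migrate", "deploy"]

# Pinning the command makes the assertion environment-agnostic:
with patch.object(FakeManager, "_get_prisma_command", return_value="prisma"):
    assert FakeManager.deploy() == ["prisma", "migrate", "deploy"]
```

With the patch in place, the test asserts on intent ("a migrate deploy was issued") rather than on whichever binary path the host environment happens to resolve.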

@yuneng-jiang yuneng-jiang marked this pull request as draft March 12, 2026 19:32