feat: add STARTING and FINALIZING deployment statuses #5206
📝 Walkthrough

Introduces new deployment lifecycle states ("starting", "finalizing") across the DB schema, generated models, API (OpenAPI + protobuf), and frontend. Adds SQL exec queries for cleaning deployment topology. Reworks the deploy workflow into a multi-phase Restate-based flow with a Compensation stack and consolidates several generated DB query return types from wrapper rows to concrete types.
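The walkthrough mentions a multi-phase flow with a Compensation stack. A minimal sketch of that undo-stack pattern, with illustrative names (not the actual `svc/ctrl/worker/deploy/compensation.go` API):

```go
package main

import "fmt"

// Compensation is a minimal sketch of the undo-stack pattern: each
// completed phase pushes an undo action, and on failure the stack is
// replayed in reverse (LIFO) order so later phases are undone first.
type Compensation struct {
	undos []func()
}

// Push records an undo action for a phase that just succeeded.
func (c *Compensation) Push(undo func()) {
	c.undos = append(c.undos, undo)
}

// Rollback runs the recorded compensations newest-first.
func (c *Compensation) Rollback() {
	for i := len(c.undos) - 1; i >= 0; i-- {
		c.undos[i]()
	}
}

func main() {
	var c Compensation
	c.Push(func() { fmt.Println("delete topologies") })
	c.Push(func() { fmt.Println("remove routing") })
	// Rolls back in reverse order: "remove routing", then "delete topologies".
	c.Rollback()
}
```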
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Workflow as DeployWorkflow
    participant DB as Database
    participant Builder as Depot/Builder
    participant Routing as RoutingService
    rect rgba(70,130,180,0.5)
    Client->>Workflow: Start Deploy
    Workflow->>DB: Insert Deployment, Insert Topologies (createTopologies)
    DB-->>Workflow: OK
    Workflow->>Builder: Build or Resolve Image (buildImage)
    Builder-->>Workflow: ImageRef / Error
    Workflow->>DB: Insert Sentinels / Ensure Sentinels
    DB-->>Workflow: OK
    Workflow->>Routing: Configure Routing (configureRouting)
    Routing-->>Workflow: OK / Error
    Workflow->>Workflow: Swap Live Deployment (swapLiveDeployment)
    Workflow->>DB: Update App current_deployment_id
    DB-->>Workflow: OK
    Workflow->>Workflow: waitForDeployments (concurrent region checks)
    Workflow-->>Client: Success / Public error on failure
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/[deploymentId]/(deployment-progress)/deployment-progress.tsx (1)
67-70: ⚠️ Potential issue | 🟠 Major: Finalizing phase is bypassed by the redirect condition.
Line 67 redirects on network?.completed, so the new finalizing step can be skipped in the UI. Also, when a deployment fails before finalizing, the finalizing card shows pending instead of skipped.
Suggested minimal fix
```diff
-  useEffect(() => {
-    if (network?.completed) {
+  useEffect(() => {
+    if (finalizing?.completed) {
       router.push(`/${workspaceSlug}/projects/${projectId}/deployments/${deployment.id}`);
     }
-  }, [network?.completed, router, workspaceSlug, projectId, deployment.id]);
+  }, [finalizing?.completed, router, workspaceSlug, projectId, deployment.id]);
@@
-  title="Deployment finalizing"
+  title="Deployment Finalizing"
@@
-  : "Pending"
+  : isFailed
+    ? "Skipped"
+    : "Pending"
@@
-  : "pending"
+  : isFailed
+    ? "skipped"
+    : "pending"
```

Also applies to: 198-218
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/apps/dashboard/app/`(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/[deploymentId]/(deployment-progress)/deployment-progress.tsx around lines 67 - 70, The useEffect redirect triggers on network?.completed and causes the new "finalizing" UI to be skipped; update the redirect and the finalizing status computation so the finalizing step is shown (or marked skipped) before navigating. Specifically, change the router.push condition in the effect that currently checks network?.completed to also ensure the deployment has no pending finalizing step (e.g., confirm the finalizing phase is complete via a dedicated flag or check the step state) before calling router.push(`/${workspaceSlug}/projects/${projectId}/deployments/${deployment.id}`); and in the code that computes the finalizing card status (the logic around the finalizing step displayed at lines ~198-218), ensure that when a deployment fails before finalizing you set that step's state to "skipped" instead of "pending". Adjust both places so finalizing is either rendered or correctly set to skipped before redirecting.

svc/ctrl/worker/deploy/deploy_handler.go (1)
399-406: ⚠️ Potential issue | 🔴 Critical: Sort map-derived regions before making workflow/service calls to ensure deterministic execution order.
In durable workflows, non-deterministic map iteration can reorder side effects between runs, breaking idempotency. The regions slice built from regionConfig keys (lines 403-405) is passed directly to versioning service calls (line 411) without sorting, which will cause unpredictable topology creation order on retries.
Suggested minimal fix
```diff
 import (
 	"context"
 	"database/sql"
 	"errors"
 	"fmt"
+	"sort"
 	"time"
@@
 	if len(regionConfig) == 0 {
 		regions = w.availableRegions
 	} else {
 		for r := range regionConfig {
 			regions = append(regions, r)
 		}
+		sort.Strings(regions)
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@svc/ctrl/worker/deploy/deploy_handler.go` around lines 399 - 406, The regions slice built from regionConfig keys (or taken from w.availableRegions) is not deterministic; after assembling regions (the variable regions created from regionConfig or w.availableRegions) sort it (e.g., sort.Strings(regions)) before passing it to the versioning/service calls so workflow side-effects happen in a stable order; add the sort import if missing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@svc/ctrl/api/github_webhook.go`:
- Around line 219-229: The transaction closure currently swallows
InsertDeploymentStep errors by unconditionally returning nil; update the closure
used for the transaction so that after calling db.Query.InsertDeploymentStep
(inside the tx closure) you check its returned error and return that error from
the closure instead of returning nil, ensuring the transaction aborts on
failure; specifically modify the anonymous closure passed to the transaction
(where db.Query.InsertDeploymentStep is invoked) to propagate the err from
InsertDeploymentStep (and any other DB calls) by returning err immediately
rather than returning nil.
In `@svc/ctrl/worker/deploy/deployment_step.go`:
- Around line 69-79: The current persistence call in restate.RunVoid uses
fault.UserFacingMessage(stepErr) which can be empty for non-wrapped errors and
if db.Query.EndDeploymentStep itself fails you lose the original stepErr; update
the EndDeploymentStep write to persist the full step error string (use
stepErr.Error() or fmt.Sprintf("%+v", stepErr) when stepErr != nil) into the
Error field instead of fault.UserFacingMessage, and if EndDeploymentStep returns
an error while stepErr != nil, return a combined error that preserves both the
original stepErr and the DB error (e.g., wrap the DB error with the original
stepErr or return a multi-error) so the original failure is not masked — touch
restate.RunVoid invocation handling, the db.Query.EndDeploymentStep call, and
the error return path to implement this change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: e4dc1f7e-16a5-4eb1-bd78-6bf1ca3c0cd0
⛔ Files ignored due to path filters (2)
- gen/proto/ctrl/v1/deployment.pb.go is excluded by !**/*.pb.go, !**/gen/**
- web/apps/dashboard/gen/proto/ctrl/v1/deployment_pb.ts is excluded by !**/gen/**
📒 Files selected for processing (24)
- pkg/db/BUILD.bazel
- pkg/db/deployment_topology_delete_by_deployment_id.sql_generated.go
- pkg/db/deployment_topology_delete_by_deployment_region_version.sql_generated.go
- pkg/db/models_generated.go
- pkg/db/querier_generated.go
- pkg/db/queries/deployment_topology_delete_by_deployment_id.sql
- pkg/db/queries/deployment_topology_delete_by_deployment_region_version.sql
- pkg/db/schema.sql
- svc/api/openapi/gen.go
- svc/api/openapi/openapi-generated.yaml
- svc/api/openapi/spec/paths/v2/deploy/getDeployment/V2DeployGetDeploymentResponseData.yaml
- svc/api/routes/v2_deploy_get_deployment/handler.go
- svc/ctrl/api/github_webhook.go
- svc/ctrl/proto/ctrl/v1/deployment.proto
- svc/ctrl/services/deployment/get_deployment.go
- svc/ctrl/worker/deploy/BUILD.bazel
- svc/ctrl/worker/deploy/compensation.go
- svc/ctrl/worker/deploy/deploy_handler.go
- svc/ctrl/worker/deploy/deployment_step.go
- svc/ctrl/worker/deploy/helpers.go
- svc/ctrl/worker/deploy/promote_handler.go
- web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/[deploymentId]/(deployment-progress)/deployment-progress.tsx
- web/internal/db/src/schema/deployment_steps.ts
- web/internal/db/src/schema/deployments.ts
💤 Files with no reviewable changes (1)
- svc/ctrl/worker/deploy/helpers.go
63b45ef to a2be884
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
🧹 Nitpick comments (1)
svc/ctrl/api/github_webhook.go (1)
189-231: LGTM: error propagation now correct.
The transaction properly returns errors from both InsertDeployment and InsertDeploymentStep, addressing the prior review concern. Minor style nit: using short variable declarations inside the closure is more idiomatic and avoids capturing the outer err.
Optional: use short variable declarations
```diff
 err = db.Tx(ctx, s.db.RW(), func(txCtx context.Context, tx db.DBTX) error {
-
-	err = db.Query.InsertDeployment(txCtx, tx, db.InsertDeploymentParams{
+	if err := db.Query.InsertDeployment(txCtx, tx, db.InsertDeploymentParams{
 		ID:      deploymentID,
 		K8sName: uid.DNS1035(12),
 		...
 		Healthcheck: envSettings.EnvironmentRuntimeSetting.Healthcheck,
-	})
-	if err != nil {
+	}); err != nil {
 		return err
 	}
-	err = db.Query.InsertDeploymentStep(txCtx, tx, db.InsertDeploymentStepParams{
+	if err := db.Query.InsertDeploymentStep(txCtx, tx, db.InsertDeploymentStepParams{
 		WorkspaceID: project.WorkspaceID,
 		...
 		StartedAt: uint64(time.Now().UnixMilli()),
-	})
-	if err != nil {
+	}); err != nil {
 		return err
 	}
 	return nil
 })
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@svc/ctrl/api/github_webhook.go` around lines 189 - 231, The transaction closure in db.Tx currently assigns to the outer err variable (using "err = ...") when calling db.Query.InsertDeployment and db.Query.InsertDeploymentStep; change those assignments to short declarations (":=") so each call declares a local err inside the closure (e.g., use "err := db.Query.InsertDeployment(...)" and "err := db.Query.InsertDeploymentStep(...)") and keep the existing if err != nil checks, ensuring you don't inadvertently capture/modify the outer err or shadow variables like deploymentID/txCtx.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: c1c61019-c5ca-4076-8fca-01382b85e7c1
📒 Files selected for processing (1)
svc/ctrl/api/github_webhook.go
…_propagation_during_deployments
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@svc/ctrl/worker/deploy/deploy_handler.go`:
- Around line 405-407: Validate and reject or sanitize region replica counts
before populating regionConfig: when iterating
runtimeSettings.AppRuntimeSetting.RegionConfig in the deploy handler, ensure
each count is a positive integer (>0) and handle invalid values by returning an
error (or skipping with a logged warning) rather than inserting zeros or
negatives into regionConfig; apply the same validation logic used for the other
RegionConfig loop near the later conversion to topology (the second loop at
lines around 431-434) so the topology builder never receives non-positive
replica counts.
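A small sketch of the replica-count validation described above, using a simplified stand-in map instead of the real runtime-settings types:

```go
package main

import "fmt"

// validateRegionConfig rejects non-positive replica counts before they
// reach the topology builder, instead of silently inserting zeros or
// negatives into regionConfig.
func validateRegionConfig(cfg map[string]int32) (map[string]int32, error) {
	out := make(map[string]int32, len(cfg))
	for region, replicas := range cfg {
		if replicas <= 0 {
			return nil, fmt.Errorf("region %q has invalid replica count %d", region, replicas)
		}
		out[region] = replicas
	}
	return out, nil
}

func main() {
	if _, err := validateRegionConfig(map[string]int32{"us-east-1": 0}); err != nil {
		fmt.Println("rejected:", err)
	}
}
```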
- Around line 812-813: The readiness calculation uses requiredHealthyRegions :=
min(2, len(topologies)) and can incorrectly pass when len(topologies)==0; add a
guard that checks if len(topologies) == 0 and treat it as an invalid/no-targets
condition (return an error or mark not-ready) before computing
requiredHealthyRegions and healthyRegions. Update the same pattern around the
later block referenced (the logic at the 838-843 region) so both places validate
topologies is non-empty and short-circuit appropriately instead of allowing a
zero requiredHealthyRegions to let readiness succeed.
In `@svc/ctrl/worker/deploy/promote_handler.go`:
- Around line 56-59: The public-facing error strings use “project” for
app-related failures; update the fault.Public messages to say “app” instead.
Specifically, in the fault.Wrap call that wraps
restate.TerminalError(fmt.Errorf("app not found: %s", targetDeployment.AppID),
404) replace fault.Public("The project could not be found") with language like
"The app could not be found"; also make the same change for the other occurrence
around the restate/fault.Public usage later in promote_handler.go (the second
fault.Public that currently mentions "project"). Ensure you keep the existing
error wrapping (restate.TerminalError and fault.Wrap) and only change the
wording in fault.Public.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 9f337d52-587b-4bb5-b98b-62ed7639850e
⛔ Files ignored due to path filters (2)
- gen/proto/ctrl/v1/deployment.pb.go is excluded by !**/*.pb.go, !**/gen/**
- web/apps/dashboard/gen/proto/ctrl/v1/deployment_pb.ts is excluded by !**/gen/**
📒 Files selected for processing (27)
- pkg/db/BUILD.bazel
- pkg/db/app_find_by_id.sql_generated.go
- pkg/db/app_update_deployments.sql_generated.go
- pkg/db/environment_find_by_id.sql_generated.go
- pkg/db/models_generated.go
- pkg/db/project_find_by_id.sql_generated.go
- pkg/db/querier_generated.go
- pkg/db/queries/app_find_by_id.sql
- pkg/db/queries/app_update_deployments.sql
- pkg/db/queries/environment_find_by_id.sql
- pkg/db/queries/project_find_by_id.sql
- pkg/db/schema.sql
- svc/api/internal/testutil/seed/seed.go
- svc/api/openapi/gen.go
- svc/api/openapi/openapi-generated.yaml
- svc/ctrl/integration/seed/seed.go
- svc/ctrl/proto/ctrl/v1/deployment.proto
- svc/ctrl/worker/customdomain/verify_handler.go
- svc/ctrl/worker/deploy/cilium_policy.go
- svc/ctrl/worker/deploy/deploy_handler.go
- svc/ctrl/worker/deploy/deployment_step.go
- svc/ctrl/worker/deploy/promote_handler.go
- svc/ctrl/worker/deploy/rollback_handler.go
- svc/ctrl/worker/deployment/deployment_state.go
- svc/ctrl/worker/githubwebhook/handle_push.go
- web/internal/db/src/schema/deployment_steps.ts
- web/internal/db/src/schema/deployments.ts
✅ Files skipped from review due to trivial changes (2)
- svc/ctrl/integration/seed/seed.go
- svc/api/internal/testutil/seed/seed.go
🚧 Files skipped from review as they are similar to previous changes (4)
- svc/ctrl/proto/ctrl/v1/deployment.proto
- web/internal/db/src/schema/deployments.ts
- pkg/db/BUILD.bazel
- web/internal/db/src/schema/deployment_steps.ts
🧹 Nitpick comments (1)
web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/components/deployment-status-badge.tsx (1)
5-13: Use the shared DeploymentStatus type instead of a local union.
This local redeclaration can drift from the canonical status domain and miss future status additions elsewhere. Prefer importing the shared type from the deployment filters schema so illegal states stay unrepresentable across modules.
♻️ Proposed minimal refactor
```diff
 import { CircleWarning } from "@unkey/icons";
 import { Badge } from "@unkey/ui";
 import { cn } from "@unkey/ui/src/lib/utils";
+import type { DeploymentStatus } from "../(overview)/deployments/filters.schema";

-type DeploymentStatus =
-  | "pending"
-  | "starting"
-  | "building"
-  | "deploying"
-  | "network"
-  | "finalizing"
-  | "ready"
-  | "failed";
```

As per coding guidelines, "Make illegal states unrepresentable by modeling domain with discriminated unions and parsing inputs at boundaries into typed structures".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/apps/dashboard/app/`(app)/[workspaceSlug]/projects/[projectId]/components/deployment-status-badge.tsx around lines 5 - 13, Replace the local DeploymentStatus union with the canonical shared type: remove the local type declaration for DeploymentStatus in deployment-status-badge.tsx and import the exported DeploymentStatus from the deployment filters schema module (use the exported symbol name DeploymentStatus) and update any local references (props, variables, function signatures) to use that imported type so the component uses the single source-of-truth for statuses. Ensure any remaining uses that assumed string literal union still type-check against the imported DeploymentStatus and adjust imports/exports accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: aa91587e-9085-491f-9c2f-4f161b741420
📒 Files selected for processing (4)
- web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/deployment-status-badge.tsx
- web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/filters.schema.ts
- web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/components/deployment-status-badge.tsx
- web/apps/dashboard/lib/collections/deploy/deployments.ts

What does this PR do?
THIS REQUIRES A DATABASE MIGRATION AND CONTROL_PLANE DEPLOYMENT
Adds two new deployment status states (STARTING and FINALIZING) to provide more granular progress tracking during deployments. The deployment workflow now includes five distinct phases: starting, building, deploying, network configuration, and finalizing.

The changes include:
- DEPLOYMENT_STATUS_STARTING (7) and DEPLOYMENT_STATUS_FINALIZING (8) added to the protobuf enum
- starting and finalizing states in the deployment and deployment_steps tables

Type of change
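For illustration, a consumer of the expanded status set (the lowercase values mirror the status union discussed in the dashboard review; the display labels here are assumptions) could handle it exhaustively like this:

```go
package main

import "fmt"

// phaseLabel maps the expanded status set, including the new "starting"
// and "finalizing" states, to a display label. Unknown statuses are
// surfaced via the second return value rather than guessed at.
func phaseLabel(status string) (string, bool) {
	switch status {
	case "pending":
		return "Pending", true
	case "starting":
		return "Starting", true
	case "building":
		return "Building", true
	case "deploying":
		return "Deploying", true
	case "network":
		return "Configuring network", true
	case "finalizing":
		return "Finalizing", true
	case "ready":
		return "Ready", true
	case "failed":
		return "Failed", true
	default:
		return "", false // unknown status
	}
}

func main() {
	label, ok := phaseLabel("finalizing")
	fmt.Println(label, ok) // Finalizing true
}
```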
How should this be tested?
Checklist
Required
- pnpm build
- pnpm fmt
- make fmt on /go directory
- console.logs
- git pull origin main