Skip to content

feat: sticky rollbacks#3992

Closed
chronark wants to merge 5 commits intomainfrom
09-18-feat_sticky_rollbacks
Closed

feat: sticky rollbacks#3992
chronark wants to merge 5 commits intomainfrom
09-18-feat_sticky_rollbacks

Conversation

@chronark
Copy link
Collaborator

@chronark chronark commented Sep 18, 2025

What does this PR do?

This looks worse than it is due to a lot of proto generation

context: https://app.cushion.so/unkey/post/post_VU8chrcBBmG1ugC-VsOfg

Fixes # (issue)

If there is not an issue for this, please create one first. This is used to tracking purposes and also helps use understand why this PR exists

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Chore (refactoring code, technical debt, workflow improvements)
  • Enhancement (small improvements)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How should this be tested?

  • Test A
  • Test B

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Contributing Guide
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand areas
  • Ran pnpm build
  • Ran pnpm fmt
  • Checked for warnings, there are none
  • Removed all console.logs
  • Merged the latest changes from main onto my branch with git pull origin main
  • My changes don't cause any responsiveness issues

Appreciated

  • If a UI change was made: Added a screen recording or screenshots to this PR
  • Updated the Unkey Docs if changes were necessary

Summary by CodeRabbit

  • New Features

    • Clear “Live” vs “Target” deployment labeling in the rollback dialog and deployment list.
    • Live domains are displayed in the rollback dialog.
    • Environment slug shown in the deployments table (replaces Production/Preview).
  • Improvements

    • Deployments list auto-refreshes every 5 seconds.
    • Consistent “Live” status badge and terminology across the UI.
    • Stricter rollback eligibility to prevent invalid actions.
  • Backend/Infrastructure

    • More reliable rollback path and domain reassignment behavior (sticky domains).
  • Chores

    • Removed legacy routing service.
    • Updated internal dependencies and container image.

@changeset-bot
Copy link

changeset-bot bot commented Sep 18, 2025

⚠️ No Changeset found

Latest commit: 4e98428

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@vercel
Copy link

vercel bot commented Sep 18, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
dashboard Ignored Ignored Preview Sep 19, 2025 6:50am
engineering Ignored Ignored Preview Sep 19, 2025 6:50am

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 18, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

Renames “active/current” deployment semantics to “live/target” across dashboard UI and hooks, updates rollback flow to call a new DeploymentService.Rollback RPC, adds domain “sticky” and rollback-aware schema and queries, removes the Routing service and its proto/tests, extends gateway schema with deployment_id, adjusts generated code, and updates codegen/proto exports.

Changes

Cohort / File(s) Summary
Dashboard UI: Rollback dialog + components
apps/dashboard/app/(app)/projects/[projectId]/deployments/components/rollback-dialog.tsx
Props switched to liveDeployment/targetDeployment; internal flags renamed to isLive; hostname removed; rollback mutation uses only targetDeploymentId; shows live domains via collection query; UI text updated.
Dashboard UI: Table actions and badges
.../components/table/components/actions/deployment-list-table-action.popover.constants.tsx, .../components/table/components/env-status-badge.tsx, .../components/table/deployments-list.tsx
Actions now accept { liveDeployment?, selectedDeployment }; rollback availability logic updated; removed router/edit root-key; EnvStatusBadge default variant live (replaces current); table renders environment slug and “Live” badge; passes environment to renderers.
Dashboard hook/collections
.../hooks/use-deployments.ts, apps/dashboard/lib/collections/deploy/deployments.ts
Hook returns { deployments, liveDeployment } (replaces activeDeployment{,Id}); project liveDeploymentId access simplified; deployments collection set to refetch every 5s.
Dashboard domains collection/types
apps/dashboard/lib/collections/deploy/domains.ts, apps/dashboard/lib/trpc/routers/deploy/domains/list.ts
Domain schema adds deploymentId and sticky; listDomains projects new fields.
tRPC rollback endpoint
apps/dashboard/lib/trpc/routers/deploy/deployment/rollback.ts
Switches from HTTP routing to Connect RPC client calling DeploymentService.Rollback; input drops hostname; validates targetDeploymentId; returns { domains }; updates audit logging.
Ctrl service: deployment workflow/domains
go/apps/ctrl/services/deployment/deploy_workflow.go, go/apps/ctrl/services/deployment/domains.go
Domain generation now returns domain+sticky; CLI uploads may suffix git SHA for uniqueness; persist Domain.Sticky; gateway config creation function extracted; per-domain upserts include DeploymentID; continue on per-domain config errors.
Ctrl service: rollback RPC
go/apps/ctrl/services/deployment/rollback.go
Adds Service.Rollback to switch sticky domains to target deployment, upsert gateways by deployment, update project live deployment, and return affected domains with validations and error handling.
Ctrl server wiring and Routing removal
go/apps/ctrl/run.go, go/apps/ctrl/services/routing/service.go, go/apps/ctrl/services/routing/service_test.go
Removes Routing service registration and its implementation/tests.
Partition DB: gateways schema and queries
go/pkg/partition/db/schema.sql, .../queries/gateway_upsert.sql, .../gateway_upsert.sql_generated.go, .../bulk_gateway_upsert.sql_generated.go, .../queries/gateway_find_by_deployment_id.sql, .../gateway_find_by_deployment_id.sql_generated.go, .../querier_generated.go, .../models_generated.go
Gateways gain deployment_id; upserts now include/update deployment_id; add find-by-deployment-id query; generated structs and interfaces updated.
Core DB: domains and projects schema/queries
go/pkg/db/schema.sql, .../queries/domain_insert.sql, .../domain_insert.sql_generated.go, .../bulk_domain_insert.sql_generated.go, .../queries/domain_find_by_deployment_id.sql, .../domain_find_by_deployment_id.sql_generated.go, .../domain_find_by_domain.sql_generated.go, .../domain_rollback.sql, .../domain_rollback.sql_generated.go, .../querier_generated.go, .../models_generated.go, .../queries/project_find_by_id.sql, .../project_find_by_id.sql_generated.go
Domains table adds sticky enum and is_rolled_back; insert/upsert include sticky and updated_at; find queries return sticky and rollback flag; add domain rollback exec; models include DomainsSticky types and null handling; projects find includes live_deployment_id.
Protos and codegen
go/proto/ctrl/v1/deployment.proto, go/proto/ctrl/v1/routing.proto, go/buf.gen.yaml, internal/proto/src/index.ts, internal/proto/package.json
Adds DeploymentService.Rollback request/response; removes Routing proto; updates buf ES plugin to v2 and consolidates; adjusts TS exports to new generated path and drops connect exports; bumps protobuf dep.
Internal DB (TypeScript)
internal/db/src/schema/domains.ts, internal/db/src/schema/projects.ts
Adds sticky and is_rolled_back to domains; adds activeDeployment relation via liveDeploymentId.
Hydra logging
go/pkg/hydra/step.go, go/pkg/hydra/worker.go, go/pkg/hydra/workflow.go
Adds per-execution logger to workflow context; replace one tracing error record with logger error; set workflow status attribute on span on error.
Deployment config
deployment/docker-compose.yaml
Changes MinIO image tag to bitnamilegacy/minio:2025.4.3.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor U as User
  participant UI as Dashboard UI
  participant TRPC as tRPC (dashboard)
  participant CTRL as Ctrl Client (Connect)
  participant SVC as Ctrl DeploymentService
  participant DB as Core DB
  participant PDB as Partition DB (gateways)

  U->>UI: Open Rollback Dialog (select target)
  UI->>TRPC: mutate.rollback({ targetDeploymentId })
  TRPC->>CTRL: DeploymentService.Rollback(projectId, targetDeploymentId)
  CTRL->>SVC: RPC Rollback
  SVC->>DB: Load project (liveDeploymentId), target & live deployments
  SVC->>DB: Validate target ready, same project/workspace
  SVC->>DB: Fetch sticky domains for live deployment
  SVC->>PDB: Get gateway config for target deployment
  SVC->>PDB: Upsert gateways per domain with deployment_id
  SVC->>DB: Update project.live_deployment_id = target
  SVC-->>CTRL: { domains }
  CTRL-->>TRPC: { domains }
  TRPC-->>UI: Success
  UI-->>U: Show success and updated live/target states
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested labels

Core Team, UI

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description contains the repository template but is largely unfilled: the "What does this PR do?" section is a brief note with a link, the "Fixes #" placeholder is empty, the Type of change boxes are unset, "How should this be tested?" lists only placeholders, and the Required checklist items are all unchecked; the PR is also marked Draft. Given the wide-ranging backend, DB, proto, and UI changes shown in the diff, the description lacks the concrete summary, issue reference, explicit test steps, and checklist confirmations reviewers need to validate and test the change. For these reasons the description does not meet the repository template requirements. Please populate the template before review: add a concise "What does this PR do?" summary and link the issue number, mark the appropriate "Type of change" checkbox(es), and provide concrete testing instructions (commands to run, migration steps, expected results and any required data setup). Complete the Required checklist (run pnpm build, pnpm fmt, remove console.logs, merge latest main) and attach screenshots or recordings for UI changes; convert the draft to ready when those items are done. After updating the description and checklist, request review again so reviewers can validate and test the changes.
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The title "feat: sticky rollbacks" is concise, single-line, and accurately reflects the primary change set in the diff (introduction of sticky domain semantics and a rollback flow across DB, backend, proto, and UI). It uses conventional-commit style and is relevant to the changes summarized in the PR. This makes it suitable for a teammate scanning history to understand the main intent.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Collaborator Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@chronark
Copy link
Collaborator Author

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 18, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
go/pkg/partition/db/schema.sql (1)

13-18: Index deployment_id to support new lookup.

FindGatewayByDeploymentId will full‑scan without an index. Add an index.

Apply:

   `config` blob NOT NULL,   -- Protobuf with all configuration including deployment_id, workspace_id
   PRIMARY KEY (`id`),
   UNIQUE KEY `gateways_pk` (`hostname`)
+  , KEY `idx_gateways_deployment_id` (`deployment_id`)
 )
deployment/docker-compose.yaml (1)

167-176: Fix MinIO console port mapping; image tag verified.

MINIO_CONSOLE_PORT_NUMBER=3903 but compose maps 2903:2903 — change to 3903:3903. Docker Hub shows bitnamilegacy/minio:2025.4.3 exists (tag active, last_pushed 2025-07-03).

File: deployment/docker-compose.yaml Lines: 167-176

   s3:
@@
-    image: bitnamilegacy/minio:2025.4.3
+    image: bitnamilegacy/minio:2025.4.3
@@
-      - 2903:2903
+      - 3903:3903
internal/proto/package.json (1)

11-21: Unify @bufbuild/protobuf to a single runtime dependency (^2.8.0).

Remove the duplicate devDependencies entry in internal/proto/package.json and keep "@bufbuild/protobuf": "^2.8.0" in dependencies.

  "devDependencies": {
    "@bufbuild/buf": "^1.47.2",
-    "@bufbuild/protobuf": "^1.10.0",
    "@connectrpc/protoc-gen-connect-es": "^1.6.1",
    "@bufbuild/protoc-gen-es": "^1.10.0",
    "typescript": "^5.7.3"
  },
  "dependencies": {
-    "@bufbuild/protobuf": "^2.8.0",
+    "@bufbuild/protobuf": "^2.8.0",
    "@connectrpc/connect": "^1.6.1"
  }

Generated files import types from @bufbuild/protobuf (e.g. internal/proto/generated/ctrl/v1/deployment_pb.ts), so keep the package as a dependency. No active *_connect imports were found; only a commented export in internal/proto/src/index.ts referencing ../generated/ctrl/v1/deployment_connect.

apps/dashboard/app/(app)/projects/[projectId]/deployments/components/table/deployments-list.tsx (1)

297-307: Fix VirtualTable row typing — use the wrapper's deployment for key/row class

VirtualTable rows are { deployment: Deployment; environment?: Environment } (columns destructure that wrapper). keyExtractor and rowClassName currently treat the row as a plain Deployment; update to use the wrapper's deployment.

File: apps/dashboard/app/(app)/projects/[projectId]/deployments/components/table/deployments-list.tsx (VirtualTable props, ~lines 300–307)

-      keyExtractor={(deployment) => deployment.id}
+      keyExtractor={(item) => item.deployment.id}

-      rowClassName={(deployment) => getRowClassName(deployment, selectedDeployment?.deployment.id)}
+      rowClassName={(item) => getRowClassName(item.deployment, selectedDeployment?.deployment.id)}
🧹 Nitpick comments (41)
go/pkg/hydra/workflow.go (2)

107-107: Nil‑safety for logger.

workflowContext.logger can be nil if any alternative constructor misses initialization; Step() calls wctx.logger.Error(...) and would panic. Either default to a no‑op logger or guard before logging.

I can wire a safe default via worker initialization (see suggested diff in worker.go comment).


254-257: Great: span status on outcome; also set status on early‑return errors.

Early failures (bad payload type, unmarshal error) currently record the error but don’t set hydra.workflow.status=failed. Add the attribute before returning.

Apply this diff in the indicated error sites:

@@
  if !ok {
     err := fmt.Errorf("expected RawPayload, got %T", req)
     tracing.RecordError(span, err)
+    span.SetAttributes(attribute.String("hydra.workflow.status", "failed"))
     return err
  }
@@
  if err := wctx.marshaller.Unmarshal(rawPayload.Data, &typedReq); err != nil {
     tracing.RecordError(span, err)
+    span.SetAttributes(attribute.String("hydra.workflow.status", "failed"))
     return fmt.Errorf("failed to unmarshal workflow request: %w", err)
  }
go/pkg/hydra/step.go (1)

239-239: Use a step‑scoped child logger and guard nil.

Avoid potential nil‑interface panics and add step_name for context.

Apply this diff:

@@
-  response, err := fn(stepCtx)
+  // Prepare step-scoped logger (nil-safe)
+  stepLog := wctx.logger
+  if stepLog != nil {
+    stepLog = stepLog.With("step_name", stepName)
+  }
+  response, err := fn(stepCtx)
   if err != nil {
     tracing.RecordError(span, err)
-    wctx.logger.Error("step execution failed", "error", err.Error())
+    if stepLog != nil {
+      stepLog.Error("step execution failed", "error", err.Error())
+    }
go/pkg/hydra/worker.go (1)

457-457: Enrich per-execution logger with trace IDs (nil-safe)

Build the child logger instead of inlining it; attach execution_id, namespace, workflow_name and, when present, trace_id/span_id — guard against nil engine.logger.
Location: go/pkg/hydra/worker.go:457 (workflowContext initializer — only instance found)

-        logger:          w.engine.logger.With("execution_id", e.ID, "namespace", e.Namespace, "workflow_name", e.WorkflowName),
+        // Build per-execution logger with common fields (and trace linkage if available)
+        logger: func() logging.Logger {
+          l := w.engine.logger
+          if l == nil {
+            return l // assume non-nil in prod; consider a no-op default if available
+          }
+          l = l.With("execution_id", e.ID, "namespace", e.Namespace, "workflow_name", e.WorkflowName)
+          if e.TraceID.Valid && e.TraceID.String != "" {
+            l = l.With("trace_id", e.TraceID.String)
+          }
+          if e.SpanID.Valid && e.SpanID.String != "" {
+            l = l.With("span_id", e.SpanID.String)
+          }
+          return l
+        }(),
apps/dashboard/lib/collections/deploy/deployments.ts (1)

51-52: Guard polling and avoid background churn.

Add refetchIntervalInBackground: false and consider staleTime to reduce tab‑hidden polling.

Apply:

       retry: 3,
-      refetchInterval: 5000,
+      refetchInterval: 5000,
+      refetchIntervalInBackground: false,
+      staleTime: 2_000,
go/pkg/partition/db/schema.sql (1)

10-18: Plan migration/backfill for NOT NULL column.

Adding deployment_id NOT NULL will fail on non-empty tables unless backfilled. Ensure a migration adds the column nullable, backfills, then enforces NOT NULL.

go/pkg/partition/db/queries/gateway_upsert.sql (1)

1-18: Confirm workspace_id updates on hostname conflicts are intended.

UPSERT will move an existing hostname to a different workspace by updating workspace_id; verify this is allowed. If not, omit workspace_id from UPDATE clause.

Alternative:

 ON DUPLICATE KEY UPDATE
-    workspace_id = sqlc.arg(workspace_id),
     deployment_id = sqlc.arg(deployment_id),
     config = sqlc.arg(config);
apps/dashboard/lib/collections/deploy/domains.ts (1)

12-14: Expose rollback state in client Domain schema

The DB/API surface adds is_rolled_back; the dashboard Domain schema should include it for parity and UI logic.

 const schema = z.object({
   id: z.string(),
   domain: z.string(),
   type: z.enum(["custom", "wildcard"]),
   projectId: z.string().nullable(),
   deploymentId: z.string().nullable(),
   sticky: z.enum(["branch", "environment", "live"]).nullable(),
+  isRolledBack: z.boolean(),
 });
go/pkg/db/queries/domain_find_by_deployment_id.sql (1)

8-9: Add an index for this lookup

This query will scan by deployment_id; add an index on domains(deployment_id) to keep it fast at scale.

go/pkg/db/queries/project_find_by_id.sql (1)

10-11: Consider indexing live_deployment_id and verify migration/backfill

If lookups by live_deployment_id occur, add an index on projects(live_deployment_id). Also ensure migrations populate this nullable column safely.

go/pkg/db/schema.sql (1)

420-420: Add supporting indexes for new access patterns

Add indexes to speed up domain lookups by deployment and potential project lookups by live deployment.

 CREATE INDEX `workspace_idx` ON `acme_challenges` (`workspace_id`);
 CREATE INDEX `status_idx` ON `acme_challenges` (`status`);
+
+-- New indexes for sticky rollback flows
+CREATE INDEX `deployment_idx` ON `domains` (`deployment_id`);
+CREATE INDEX `live_deployment_idx` ON `projects` (`live_deployment_id`);
go/pkg/partition/db/queries/gateway_find_by_deployment_id.sql (1)

1-6: Scope by tenant and make “latest” deterministic.

  • Consider scoping by workspace_id (or partition key) to avoid cross-tenant leakage when deployment_ids collide or are mis-routed.
  • Prefer ORDER BY created_at DESC, id DESC for recency determinism vs relying on id alone.
  • Add an index to support this query.

Proposed SQL change:

-- name: FindGatewayByDeploymentId :one
-SELECT hostname, config
-FROM gateways
-WHERE deployment_id = ?
-ORDER BY id DESC
-LIMIT 1;
+SELECT hostname, config
+FROM gateways
+WHERE
+  /* optional, if available in this partition: */ /* workspace_id = ? AND */
+  deployment_id = ?
+ORDER BY created_at DESC, id DESC
+LIMIT 1;

Suggested index (migration):

CREATE INDEX IF NOT EXISTS idx_gateways_deployment_id_created_at
  ON gateways (deployment_id, created_at DESC, id DESC);

If workspace scoping applies, prefer a composite index (workspace_id, deployment_id, created_at DESC).

go/pkg/db/domain_find_by_deployment_id.sql_generated.go (1)

13-27: Input type for deploymentID is sql.NullString; confirm call sites set Valid.

Generated signature takes deploymentID sql.NullString while other queries often use string. Passing an invalid NullString will return no rows (WHERE … = NULL). If callers always have a concrete ID, prefer a non-nullable param in the source SQL to regenerate this as string.

If you want a non-nullable input, update the underlying SQL (domain_find_by_deployment_id.sql) to ensure sqlc infers string (e.g., by comparing against CAST(? AS CHAR) or using a named arg with explicit override in sqlc.yaml), then regenerate.

go/proto/ctrl/v1/deployment.proto (1)

150-151: Clarify semantics and tighten auth in server impl.

  • Comment nit: “project’s” instead of “projects”.
  • Server must validate target_deployment_id belongs to project_id and caller is authorized within workspace. Consider returning both before/after live deployment IDs and a count of domains changed.
-// Reassign the sticky domains of the projects live deployment to the target deployment
+// Reassign the sticky domains of the project's live deployment to the target deployment.
+// Server must enforce:
+// - target_deployment_id belongs to project_id
+// - caller has permissions in the owning workspace
+// - optional precondition: expected_live_deployment_id (if provided) matches
 rpc Rollback(RollbackRequest) returns (RollbackResponse) {}
internal/db/src/schema/domains.ts (1)

14-29: Make sticky NOT NULL with default 'live' and add index on deploymentId

  • Change sticky to .notNull().default("live") and add an index on deploymentId to support FindDomainsByDeploymentId and rollback flows.
  • File: internal/db/src/schema/domains.ts (snippet referenced).
-    sticky: mysqlEnum("sticky", ["branch", "environment", "live"]),
+    sticky: mysqlEnum("sticky", ["branch", "environment", "live"]).notNull().default("live"),
@@
   (table) => ({
     workspaceIdx: index("workspace_idx").on(table.workspaceId),
     projectIdx: index("project_idx").on(table.projectId),
+    deploymentIdx: index("deployment_idx").on(table.deploymentId),
   }),

Verification: sandbox search failed (missing tooling). Run: rg -nP --type=ts '(sticky\s*:\s*null|sticky?.)' and confirm no callers rely on sticky being null; also check for existing deploymentIdx/index name collisions.

go/pkg/partition/db/querier_generated.go (1)

20-27: FindGatewayByDeploymentId: ensure deterministic “latest” and indexing

ORDER BY id DESC LIMIT 1 assumes id is monotonic and correlates with recency. Consider ordering by created_at if available, and ensure an index on (deployment_id, id DESC) (or created_at) to avoid table scans.

apps/dashboard/app/(app)/projects/[projectId]/deployments/hooks/use-deployments.ts (2)

18-21: Null-safe access for .data.at(0)

useLiveQuery may return undefined data initially. Avoid potential runtime errors by null‑checking.

Apply:

-  }).data.at(0);
-  const liveDeploymentId = project?.liveDeploymentId;
-  const liveDeployment = useLiveQuery(
+  });
+  const project = project?.data?.at(0);
+  const liveDeploymentId = project?.liveDeploymentId;
+  const liveDeploymentQuery = useLiveQuery(
@@
-  ).data.at(0);
+  );
+  const liveDeployment = liveDeploymentQuery.data?.at(0);

Also applies to: 28-28


20-27: Nit: superfluous orderBy/limit when querying by id

When filtering by unique id, orderBy + limit(1) is redundant. Optional cleanup only.

go/pkg/db/domain_rollback.sql_generated.go (2)

13-18: Rollback may need to set is_rolled_back (and/or honor sticky semantics)

If the schema adds is_rolled_back and sticky modes, consider updating those fields here (or ensure callers do). Otherwise rollbacks may be indistinguishable from normal rebinds.


33-35: Run as part of a transaction when updating multiple domains

Ensure callers pass a Tx (DBTX) and wrap multi-row rollback operations atomically.

go/apps/ctrl/services/deployment/rollback.go (5)

45-55: Wrong deployment ID in error/log for live deployment fetch.

This block fetches the live deployment but logs/returns messages referencing targetDeploymentID. Use the live ID.

- if db.IsNotFound(err) {
-   return nil, connect.NewError(connect.CodeNotFound, fmt.Errorf("deployment not found: %s", targetDeploymentID))
- }
- s.logger.Error("failed to get deployment",
-   "deployment_id", targetDeploymentID,
+ if db.IsNotFound(err) {
+   return nil, connect.NewError(connect.CodeNotFound, fmt.Errorf("deployment not found: %s", project.LiveDeploymentID.String))
+ }
+ s.logger.Error("failed to get deployment",
+   "deployment_id", project.LiveDeploymentID.String,

145-149: Normalize error mapping to Connect.

Returning a plain error here breaks consistency with the rest of the method’s connect.NewError usage.

- s.logger.Error("failed to upsert gateway", "error", err.Error())
- return nil, fmt.Errorf("failed to upsert gateway: %w", err)
+ s.logger.Error("failed to upsert gateway", "error", err.Error())
+ return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to upsert gateway: %w", err))

151-162: CAS update and error normalization for LiveDeploymentID.

Consider compare‑and‑swap to avoid races with concurrent deploys, and keep error mapping consistent. At minimum, normalize the error:

- if err != nil {
-   s.logger.Error("failed to update project active deployment ID",
+ if err != nil {
+   s.logger.Error("failed to update project live deployment ID",
      "project_id", project.ID,
      "error", err.Error(),
    )
-   return nil, err
+   return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to update project live deployment ID: %w", err))
 }

Optionally, add a WHERE clause on the update to ensure LiveDeploymentID still equals the previously read value and detect contention.


164-169: Return domain names, not IDs (API clarity).

The response field reads as “domains”; returning IDs is surprising. Capture the domain names you actually changed and return those.

- res := &ctrlv1.RollbackResponse{
-   Domains: make([]string, len(domainChanges)),
- }
- for i, domain := range domainChanges {
-   res.Domains[i] = domain.ID
- }
+ changed := make([]string, 0, len(domainChanges))
+ // While building domainChanges:
+ //   changed = append(changed, domain.Domain)
+ res := &ctrlv1.RollbackResponse{Domains: changed}

Adjust the loop above (Lines 122–138) to append domain.Domain into changed wherever you add a change.


93-98: Prefer FailedPrecondition over NotFound for “no VMs running.”

Semantically, the deployment exists but isn’t runnable; CodeFailedPrecondition communicates this better to clients.

- return nil, connect.NewError(connect.CodeNotFound, fmt.Errorf("no VMs found for deployment: %s", targetDeployment.ID))
+ return nil, connect.NewError(connect.CodeFailedPrecondition, fmt.Errorf("no VMs running for deployment: %s", targetDeployment.ID))
go/pkg/partition/db/gateway_find_by_deployment_id.sql_generated.go (1)

12-18: Query is fine; consider returning only what you use.

If callers only need config, drop hostname from the SELECT to reduce I/O. Update the source SQL (not this generated file):

-- go/pkg/partition/db/queries/gateway_find_by_deployment_id.sql
-SELECT hostname, config
+SELECT config
 FROM gateways
 WHERE deployment_id = ?
 ORDER BY id DESC
 LIMIT 1

This will regenerate the row type accordingly.

go/apps/ctrl/services/deployment/domains.go (3)

27-33: Collision risk for 4‑digit suffix; consider slightly larger entropy.

IntN(9000) yields ~1/9000 collision odds per same‑SHA CLI upload burst. If bursts are common, bump to 5–6 digits or add a short timestamp to reduce collisions. Keep label length in mind.


60-66: Guard domain label length (<=63 chars).

<project>-<env>-<workspace> can exceed a single-label limit even with an 80‑char branch slug cap. Truncate components to keep each label <=63.

Example approach: compute remaining budget per label and truncate projectSlug/workspaceSlug accordingly before formatting.


74-101: sluggify: handle empty result.

Purely non‑alphanumeric inputs can yield an empty slug, producing -- seams. Add a minimal fallback (e.g., “x”) to keep labels valid.

   s = strings.TrimSuffix(s, "-")
-  return s
+  if s == "" {
+    return "x"
+  }
+  return s
go/apps/ctrl/services/deployment/deploy_workflow.go (3)

286-286: Hardcoded SourceType — parameterize or derive from request.

Avoid baking in SOURCE_TYPE_CLI_UPLOAD. Plumb a sourceType into DeployRequest or derive from context to keep telemetry accurate.


301-307: All domains saved as wildcard — derive type from hostname.

Setting Type to DomainsTypeWildcard for every domain can be wrong. Detect wildcard from the value.

Apply:

-                Type:         db.DomainsTypeWildcard,
+                Type: func() db.DomainsType {
+                  if strings.HasPrefix(domain.domain, "*.") {
+                    return db.DomainsTypeWildcard
+                  }
+                  return db.DomainsTypeCustom
+                }(),

330-332: Use domain.domain in logs consistently.

In later logs (e.g., Line 339, Line 349), the whole struct is logged. Log the hostname string for readability and stable JSON.

- w.logger.Error("failed to create gateway config for domain",
-   "domain", domain,
+ w.logger.Error("failed to create gateway config for domain",
+   "domain", domain.domain,
   "error", err,
   "deployment_id", req.DeploymentID)

…and similarly where marshalling fails.

apps/dashboard/lib/trpc/routers/deploy/deployment/rollback.ts (3)

2-7: Use Node transport for server-side RPC.

This runs on the server; prefer @connectrpc/connect-node for proper Node defaults/timeouts.

-import { createClient } from "@connectrpc/connect";
-import { createConnectTransport } from "@connectrpc/connect-web";
+import { createClient } from "@connectrpc/connect";
+import { createConnectTransport } from "@connectrpc/connect-node";

25-28: Fail fast with a TRPCError and actionable message.

Throw a TRPCError so callers get a typed error; include env hint.

- if (!ctrlUrl) {
-   throw new Error("ctrl service is not configured");
- }
+ if (!ctrlUrl) {
+   throw new TRPCError({
+     code: "INTERNAL_SERVER_ERROR",
+     message: "CTRL_URL is not configured",
+   });
+ }

101-103: Return shape — consider guarding undefined.

If ctrl returns no domains, avoid undefined at callers.

- return { domains: rolledBack.domains };
+ return { domains: rolledBack?.domains ?? [] };
apps/dashboard/app/(app)/projects/[projectId]/deployments/components/rollback-dialog.tsx (2)

42-51: Guard live query results to avoid first-render crash.

useLiveQuery can be pending; protect .map usage.

- const domains = useLiveQuery((q) =>
+ const domains = useLiveQuery((q) =>
    q
      .from({ domain: domainCollection })
      .where(({ domain }) => inArray(domain.sticky, ["environment", "live"]))
      .where(({ domain }) => eq(domain.deploymentId, liveDeployment.id)),
  );

…and when rendering:

- {domains.data.map((domain) => (
+ {(domains.data ?? []).map((domain) => (

91-101: Improve button a11y while loading.

Add aria-busy/aria-disabled for assistive tech.

- <Button
+ <Button
    variant="primary"
    size="xlg"
    onClick={handleRollback}
    disabled={rollback.isLoading}
    loading={rollback.isLoading}
+   aria-busy={rollback.isLoading}
+   aria-disabled={rollback.isLoading}
    className="w-full rounded-lg"
  >
apps/dashboard/app/(app)/projects/[projectId]/deployments/components/table/components/actions/deployment-list-table-action.popover.constants.tsx (3)

49-53: Ensure canRollback is strictly boolean.

The current && chain can yield undefined (when liveDeployment is absent). Coerce to boolean for clearer types and JSX prop safety.

Apply:

-  const canRollback =
-    liveDeployment &&
-    environment?.slug === "production" &&
-    selectedDeployment.status === "ready" &&
-    selectedDeployment.id !== liveDeployment.id;
+  const canRollback = !!(
+    liveDeployment &&
+    environment?.slug === "production" &&
+    selectedDeployment.status === "ready" &&
+    selectedDeployment.id !== liveDeployment.id
+  );

30-36: Redundant check for selectedDeployment in dialog guard.

selectedDeployment is required in props; the extra check is noise.

-      {liveDeployment && selectedDeployment && (
+      {liveDeployment && (
         <RollbackDialog
           isOpen={isRollbackModalOpen}
           onOpenChange={setIsRollbackModalOpen}
           liveDeployment={liveDeployment}
           targetDeployment={selectedDeployment}
         />
       )}

43-45: Align helper signature with prop optionality.

For consistency with DeploymentListTableActionsProps, make liveDeployment optional in the helper’s params.

-const getDeploymentListTableActionItems = (
-  selectedDeployment: Deployment,
-  liveDeployment: Deployment | undefined,
+const getDeploymentListTableActionItems = (
+  selectedDeployment: Deployment,
+  liveDeployment?: Deployment,
   environment: Environment | undefined,
   setIsRollbackModalOpen: (open: boolean) => void,
 ): MenuItem[] => {

Also applies to: 8-12

apps/dashboard/app/(app)/projects/[projectId]/deployments/components/table/deployments-list.tsx (1)

91-97: Optional: add a fallback when environment is missing.

Prevents an empty line if environment is undefined.

-                    {environment?.slug}
+                    {environment?.slug ?? "—"}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1e3a016 and 2a3deb6.

⛔ Files ignored due to path filters (46)
  • go/gen/proto/ctrl/v1/ctrlv1connect/deployment.connect.go is excluded by !**/gen/**
  • go/gen/proto/ctrl/v1/ctrlv1connect/routing.connect.go is excluded by !**/gen/**
  • go/gen/proto/ctrl/v1/deployment.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/ctrl/v1/routing.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • internal/proto/generated/assetmanagerd/v1/asset_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/billaged/v1/billing_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/builderd/v1/builder_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/ctrl/v1/acme_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/ctrl/v1/build_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/ctrl/v1/deployment_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/ctrl/v1/openapi_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/ctrl/v1/service_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/metald/v1/deployment_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/metald/v1/metald_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/metald/v1/network_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/metald/v1/storage_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/metald/v1/vm_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/partition/v1/gateway_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/ctrl/v1/acme_connect.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/ctrl/v1/acme_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/ctrl/v1/build_connect.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/ctrl/v1/build_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/ctrl/v1/deployment_connect.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/ctrl/v1/deployment_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/ctrl/v1/openapi_connect.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/ctrl/v1/openapi_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/ctrl/v1/routing_connect.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/ctrl/v1/routing_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/ctrl/v1/service_connect.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/ctrl/v1/service_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/deploy/assetmanagerd/v1/asset_connect.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/deploy/assetmanagerd/v1/asset_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/deploy/billaged/v1/billing_connect.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/deploy/billaged/v1/billing_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/deploy/builderd/v1/builder_connect.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/deploy/builderd/v1/builder_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/metal/vmprovisioner/v1/vmprovisioner_connect.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/metal/vmprovisioner/v1/vmprovisioner_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/metal/vmprovisioner/v1/wip_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/partition/v1/gateway_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/vault/v1/object_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/vault/v1/service_connect.ts is excluded by !**/generated/**
  • internal/proto/generated/proto/vault/v1/service_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/vault/v1/object_pb.ts is excluded by !**/generated/**
  • internal/proto/generated/vault/v1/service_pb.ts is excluded by !**/generated/**
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (47)
  • apps/dashboard/app/(app)/projects/[projectId]/deployments/components/rollback-dialog.tsx (4 hunks)
  • apps/dashboard/app/(app)/projects/[projectId]/deployments/components/table/components/actions/deployment-list-table-action.popover.constants.tsx (1 hunks)
  • apps/dashboard/app/(app)/projects/[projectId]/deployments/components/table/components/env-status-badge.tsx (1 hunks)
  • apps/dashboard/app/(app)/projects/[projectId]/deployments/components/table/deployments-list.tsx (4 hunks)
  • apps/dashboard/app/(app)/projects/[projectId]/deployments/hooks/use-deployments.ts (2 hunks)
  • apps/dashboard/lib/collections/deploy/deployments.ts (1 hunks)
  • apps/dashboard/lib/collections/deploy/domains.ts (1 hunks)
  • apps/dashboard/lib/trpc/routers/deploy/deployment/rollback.ts (2 hunks)
  • apps/dashboard/lib/trpc/routers/deploy/domains/list.ts (1 hunks)
  • deployment/docker-compose.yaml (1 hunks)
  • go/apps/ctrl/run.go (0 hunks)
  • go/apps/ctrl/services/deployment/deploy_workflow.go (6 hunks)
  • go/apps/ctrl/services/deployment/domains.go (1 hunks)
  • go/apps/ctrl/services/deployment/rollback.go (1 hunks)
  • go/apps/ctrl/services/routing/service.go (0 hunks)
  • go/apps/ctrl/services/routing/service_test.go (0 hunks)
  • go/buf.gen.yaml (1 hunks)
  • go/pkg/db/bulk_domain_insert.sql_generated.go (3 hunks)
  • go/pkg/db/domain_find_by_deployment_id.sql_generated.go (4 hunks)
  • go/pkg/db/domain_find_by_domain.sql_generated.go (2 hunks)
  • go/pkg/db/domain_insert.sql_generated.go (3 hunks)
  • go/pkg/db/domain_rollback.sql_generated.go (1 hunks)
  • go/pkg/db/models_generated.go (2 hunks)
  • go/pkg/db/project_find_by_id.sql_generated.go (4 hunks)
  • go/pkg/db/querier_generated.go (5 hunks)
  • go/pkg/db/queries/domain_find_by_deployment_id.sql (1 hunks)
  • go/pkg/db/queries/domain_insert.sql (1 hunks)
  • go/pkg/db/queries/domain_rollback.sql (1 hunks)
  • go/pkg/db/queries/project_find_by_id.sql (1 hunks)
  • go/pkg/db/schema.sql (2 hunks)
  • go/pkg/hydra/step.go (3 hunks)
  • go/pkg/hydra/worker.go (1 hunks)
  • go/pkg/hydra/workflow.go (3 hunks)
  • go/pkg/partition/db/bulk_gateway_upsert.sql_generated.go (3 hunks)
  • go/pkg/partition/db/gateway_find_by_deployment_id.sql_generated.go (1 hunks)
  • go/pkg/partition/db/gateway_upsert.sql_generated.go (1 hunks)
  • go/pkg/partition/db/models_generated.go (1 hunks)
  • go/pkg/partition/db/querier_generated.go (2 hunks)
  • go/pkg/partition/db/queries/gateway_find_by_deployment_id.sql (1 hunks)
  • go/pkg/partition/db/queries/gateway_upsert.sql (1 hunks)
  • go/pkg/partition/db/schema.sql (1 hunks)
  • go/proto/ctrl/v1/deployment.proto (1 hunks)
  • go/proto/ctrl/v1/routing.proto (0 hunks)
  • internal/db/src/schema/domains.ts (2 hunks)
  • internal/db/src/schema/projects.ts (1 hunks)
  • internal/proto/package.json (1 hunks)
  • internal/proto/src/index.ts (1 hunks)
💤 Files with no reviewable changes (4)
  • go/apps/ctrl/run.go
  • go/apps/ctrl/services/routing/service_test.go
  • go/apps/ctrl/services/routing/service.go
  • go/proto/ctrl/v1/routing.proto
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-07-25T19:09:43.284Z
Learnt from: mcstepp
PR: unkeyed/unkey#3662
File: apps/dashboard/lib/trpc/routers/deployment/list.ts:11-11
Timestamp: 2025-07-25T19:09:43.284Z
Learning: In apps/dashboard/lib/trpc/routers/deployment/list.ts, the listDeployments procedure intentionally queries the versions table rather than a deployments table. The user mcstepp indicated that renaming the table would require a database migration, which was deferred for the current PR focused on UI features.

Applied to files:

  • apps/dashboard/lib/collections/deploy/deployments.ts
  • apps/dashboard/lib/trpc/routers/deploy/domains/list.ts
  • apps/dashboard/app/(app)/projects/[projectId]/deployments/components/rollback-dialog.tsx
  • apps/dashboard/app/(app)/projects/[projectId]/deployments/hooks/use-deployments.ts
  • apps/dashboard/lib/trpc/routers/deploy/deployment/rollback.ts
  • apps/dashboard/app/(app)/projects/[projectId]/deployments/components/table/deployments-list.tsx
  • apps/dashboard/app/(app)/projects/[projectId]/deployments/components/table/components/actions/deployment-list-table-action.popover.constants.tsx
📚 Learning: 2025-09-01T01:57:42.227Z
Learnt from: imeyer
PR: unkeyed/unkey#3899
File: go/proto/metald/v1/metald.proto:5-9
Timestamp: 2025-09-01T01:57:42.227Z
Learning: In the unkeyed/unkey repository, buf is configured to properly resolve metald proto imports like "metald/v1/vm.proto" without needing the full "go/proto/" prefix. The buf lint command `buf lint --path proto/metald` passes successfully with these relative import paths.

Applied to files:

  • go/buf.gen.yaml
  • go/apps/ctrl/services/deployment/deploy_workflow.go
📚 Learning: 2025-09-01T02:33:43.791Z
Learnt from: imeyer
PR: unkeyed/unkey#3899
File: go/buf.gen.yaml:0-0
Timestamp: 2025-09-01T02:33:43.791Z
Learning: In the unkeyed/unkey repository, buf commands are executed from the `go/` directory where the `buf.yaml` file is located. This means the `out: gen` configuration in `go/buf.gen.yaml` generates files to `go/gen/` relative to the repository root, which aligns with the expected directory structure.

Applied to files:

  • go/buf.gen.yaml
📚 Learning: 2025-08-08T14:55:11.981Z
Learnt from: imeyer
PR: unkeyed/unkey#3755
File: deployment/docker-compose.yaml:179-184
Timestamp: 2025-08-08T14:55:11.981Z
Learning: In deployment/docker-compose.yaml (development only), MinIO is configured with API on 3902 and console on 3903; ports should map 3902:3902 and 3903:3903 to match MINIO_API_PORT_NUMBER and MINIO_CONSOLE_PORT_NUMBER.

Applied to files:

  • deployment/docker-compose.yaml
📚 Learning: 2025-09-18T17:31:16.158Z
Learnt from: Flo4604
PR: unkeyed/unkey#3994
File: go/pkg/partition/db/queries/gateway_find_config_by_hostname.sql:2-4
Timestamp: 2025-09-18T17:31:16.158Z
Learning: Gateway hostname lookups in Unkey are infrastructure-level operations that discover workspace_id by hostname, not operations that should be scoped by workspace_id. The FindGatewayByHostname query retrieves the workspace_id as part of the result to determine which workspace a hostname belongs to.

Applied to files:

  • go/pkg/partition/db/querier_generated.go
📚 Learning: 2025-09-12T17:57:18.337Z
Learnt from: mcstepp
PR: unkeyed/unkey#3952
File: apps/dashboard/app/(app)/projects/[projectId]/deployments/components/rollback-dialog.tsx:70-79
Timestamp: 2025-09-12T17:57:18.337Z
Learning: In the deployment rollback functionality, self-rollback scenarios are prevented at the UI level in the actions menu through the `canRollback` condition which includes `!("active" in deployment && deployment.active)`, making additional checks in the dialog component redundant.

Applied to files:

  • apps/dashboard/app/(app)/projects/[projectId]/deployments/components/rollback-dialog.tsx
  • apps/dashboard/app/(app)/projects/[projectId]/deployments/components/table/components/actions/deployment-list-table-action.popover.constants.tsx
📚 Learning: 2025-09-12T17:57:18.337Z
Learnt from: mcstepp
PR: unkeyed/unkey#3952
File: apps/dashboard/app/(app)/projects/[projectId]/deployments/components/rollback-dialog.tsx:70-79
Timestamp: 2025-09-12T17:57:18.337Z
Learning: In the rollback functionality, the actions menu prevents self-rollback scenarios by including `!deployment.active` in the `canRollback` condition, ensuring only inactive deployments show the rollback option.

Applied to files:

  • apps/dashboard/app/(app)/projects/[projectId]/deployments/components/rollback-dialog.tsx
  • apps/dashboard/app/(app)/projects/[projectId]/deployments/components/table/components/actions/deployment-list-table-action.popover.constants.tsx
📚 Learning: 2025-07-17T14:24:20.403Z
Learnt from: Flo4604
PR: unkeyed/unkey#3631
File: go/pkg/db/bulk_keyring_insert.sql.go:23-25
Timestamp: 2025-07-17T14:24:20.403Z
Learning: In go/pkg/db/bulk_keyring_insert.sql.go and similar bulk insert generated files, hardcoded zero values for fields like size_approx and size_last_updated_at are intentional and reflect the original SQL query structure, not missing parameters.

Applied to files:

  • go/pkg/partition/db/bulk_gateway_upsert.sql_generated.go
  • go/pkg/db/bulk_domain_insert.sql_generated.go
📚 Learning: 2025-09-12T18:11:33.481Z
Learnt from: mcstepp
PR: unkeyed/unkey#3952
File: apps/dashboard/lib/trpc/routers/deploy/rollback.ts:23-24
Timestamp: 2025-09-12T18:11:33.481Z
Learning: In apps/dashboard/lib/trpc/routers/deploy/rollback.ts, the CTRL_URL environment variable should fail fast with a clear error message if missing in non-development environments, rather than defaulting to localhost which can mask production configuration issues.

Applied to files:

  • apps/dashboard/lib/trpc/routers/deploy/deployment/rollback.ts
📚 Learning: 2025-07-25T19:11:00.208Z
Learnt from: mcstepp
PR: unkeyed/unkey#3662
File: apps/dashboard/lib/trpc/routers/deployment/getOpenApiDiff.ts:110-147
Timestamp: 2025-07-25T19:11:00.208Z
Learning: In apps/dashboard/lib/trpc/routers/deployment/getOpenApiDiff.ts, the user mcstepp prefers to keep mock data fallbacks in POC/demonstration code for simplicity, even if it wouldn't be production-ready. This aligns with the PR being work-in-progress for demonstration purposes.

Applied to files:

  • apps/dashboard/lib/trpc/routers/deploy/deployment/rollback.ts
📚 Learning: 2025-09-11T14:24:40.988Z
Learnt from: Flo4604
PR: unkeyed/unkey#3944
File: go/apps/ctrl/services/deployment/deploy_workflow.go:326-334
Timestamp: 2025-09-11T14:24:40.988Z
Learning: The InsertDomains method in the bulk queries uses ON DUPLICATE KEY UPDATE, making it an upsert operation that is idempotent and safe for retries, despite the "Insert" naming convention.

Applied to files:

  • go/pkg/db/bulk_domain_insert.sql_generated.go
🔇 Additional comments (34)
go/pkg/hydra/workflow.go (1)

10-10: Import looks correct.

Package name matches usage (logging.Logger), no action needed.

go/pkg/hydra/step.go (1)

139-151: SQL whitespace only — LGTM.

The EXISTS lease checks remain semantically identical.

Also applies to: 185-193

internal/db/src/schema/projects.ts (1)

39-42: Rename activeDeployment → liveDeployment

Search shows the only occurrence of activeDeployment is in internal/db/src/schema/projects.ts; rename aligns with projects.liveDeploymentId/UI and is safe.

-  activeDeployment: one(deployments, {
+  liveDeployment: one(deployments, {
     fields: [projects.liveDeploymentId],
     references: [deployments.id],
   }),
go/buf.gen.yaml (1)

9-12: Approve ES plugin upgrade — no active connect‑es consumers found

Only a commented export remains at internal/proto/src/index.ts:3 — //export * from "../generated/ctrl/v1/deployment_connect"; include_imports: true increases generated output size — acceptable if intended.

go/pkg/db/project_find_by_id.sql_generated.go (1)

22-25: LGTM: scan order and nullability for LiveDeploymentID look correct

Generated mapping aligns with the SELECT list; LiveDeploymentID as sql.NullString matches schema.

Please re-run sqlc/codegen in CI to ensure no drift with the .sql file.

Also applies to: 37-39, 68-69

go/pkg/db/schema.sql (1)

373-375: LGTM: new columns align with sticky rollback design

Nullable sticky and NOT NULL DEFAULT false for is_rolled_back look appropriate.

go/pkg/db/domain_find_by_domain.sql_generated.go (1)

12-19: LGTM: extended projection and scan for sticky/is_rolled_back

The added fields are correctly projected and scanned into Domain.

Also applies to: 29-31

go/pkg/partition/db/models_generated.go (1)

115-119: DeploymentID is non-nullable — schema and writers present; confirm backfill

Found evidence: deployment_id is VARCHAR(...) NOT NULL in go/pkg/partition/db/schema.sql:13; models use non-null string (go/pkg/partition/db/models_generated.go:117,139); UpsertGateway/UpsertVM and bulk upserts require DeploymentID (gateway_upsert.sql_generated.go, vm_upsert.sql_generated.go, bulk_*_upsert.sql_generated.go). Ensure a backfill migration has been created and applied (or perform nullable → backfill → set NOT NULL rollout) so existing rows won’t break.

apps/dashboard/lib/trpc/routers/deploy/domains/list.ts (1)

20-23: Return isRolledBack to match schema and support UI

Add the boolean to the returned columns so the client can render rollback state — schema defines it at internal/db/src/schema/domains.ts:28 and there are no existing usages in apps/dashboard.

File: apps/dashboard/lib/trpc/routers/deploy/domains/list.ts

         columns: {
           id: true,
           domain: true,
           projectId: true,
           deploymentId: true,
           type: true,
-          sticky: true,
+          sticky: true,
+          isRolledBack: true,
         },
go/pkg/db/domain_find_by_deployment_id.sql_generated.go (1)

29-38: Types look consistent with new columns.

Sticky as NullDomainsSticky and IsRolledBack as bool align with schema updates. No issues with scanning order.

internal/proto/src/index.ts (1)

2-3: Restore deprecated re-export for one release — avoid breaking public API

Removing ../generated/ctrl/v1/deployment_connect is a breaking change; keep a deprecated shim for one release, then remove in the next major. Repo search matched only internal/proto/src/index.ts (lines 2–3).

internal/proto/src/index.ts — change:

-export * from "../generated/ctrl/v1/deployment_pb";
//export * from "../generated/ctrl/v1/deployment_connect";
+export * from "../generated/ctrl/v1/deployment_pb";
/** @deprecated kept for compatibility; remove in next major */
+export * from "../generated/ctrl/v1/deployment_connect";
apps/dashboard/app/(app)/projects/[projectId]/deployments/components/table/components/env-status-badge.tsx (1)

11-18: No alias required — no usages of variant="current" found.
Repo-wide search shows EnvStatusBadge is only used with variant="live" (apps/dashboard/app/(app)/projects/[projectId]/deployments/components/table/deployments-list.tsx:89). Keep change as-is.

go/pkg/partition/db/querier_generated.go (1)

56-72: UpsertGateway UPDATE set now includes workspace_id/deployment_id — validate uniqueness and drift

If hostname is the unique key, updating workspace_id on conflict can move a hostname across workspaces. Confirm your unique index and intended semantics (is workspace_id part of the unique key?).

go/pkg/partition/db/bulk_gateway_upsert.sql_generated.go (1)

27-39: Placeholder/order expansion looks correct

Four placeholders per row and per-row args append in (workspace_id, deployment_id, hostname, config) order LGTM.

go/pkg/db/domain_insert.sql_generated.go (2)

21-41: Sticky column integration looks correct

INSERT/UPDATE lists and placeholder counts align; moving to explicit assignments in ON DUPLICATE is fine for single-row execs.

Please confirm the DB migration sets a default for sticky (or allows NULL via NullDomainsSticky) and backfills existing rows as needed.


44-53: Struct update matches SQL

InsertDomainParams includes Sticky with the expected nullable wrapper. LGTM.

apps/dashboard/app/(app)/projects/[projectId]/deployments/hooks/use-deployments.ts (1)

116-118: Breaking hook API: active → live

useDeployments now returns liveDeployment and removes activeDeployment/activeDeploymentId — update all consumers. File: apps/dashboard/app/(app)/projects/[projectId]/deployments/hooks/use-deployments.ts (lines 116–118).

Verification in the sandbox was inconclusive (ripgrep found no files). Run locally to find usages:
rg -nP -S --hidden --glob '!/node_modules/' '\b(activeDeploymentId|activeDeployment|useDeployments)\b' -g '**/*.{ts,tsx,js,jsx}'

go/pkg/partition/db/gateway_upsert.sql_generated.go (2)

57-65: Args ordering looks correct.

Insert(4) + Update(3) placeholders match the 7 arguments. No action needed.


13-24: Enforce per-workspace hostname uniqueness and make workspace_id immutable.

The generated upsert (go/pkg/partition/db/gateway_upsert.sql_generated.go — const upsertGateway) uses ON DUPLICATE KEY UPDATE workspace_id = ?, which will overwrite workspace_id on conflict and indicates the duplicate key isn’t scoped by workspace. Add a DB migration to create UNIQUE(workspace_id, hostname) if missing, remove workspace_id from the ON DUPLICATE KEY UPDATE clause (only update deployment_id and config), and regenerate sqlc artifacts.

⛔ Skipped due to learnings
Learnt from: ogzhanolguncu
PR: unkeyed/unkey#2872
File: apps/dashboard/lib/trpc/routers/ratelimit/createNamespace.ts:36-39
Timestamp: 2025-04-08T09:34:24.576Z
Learning: In the Unkey dashboard, when making database queries involving workspaces, use `ctx.workspace.id` directly instead of fetching the workspace separately for better performance and security.
Learnt from: Flo4604
PR: unkeyed/unkey#3994
File: go/pkg/partition/db/queries/gateway_find_config_by_hostname.sql:2-4
Timestamp: 2025-09-18T17:31:16.158Z
Learning: Gateway hostname lookups in Unkey are infrastructure-level operations that discover workspace_id by hostname, not operations that should be scoped by workspace_id. The FindGatewayByHostname query retrieves the workspace_id as part of the result to determine which workspace a hostname belongs to.
Learnt from: Flo4604
PR: unkeyed/unkey#3952
File: go/apps/ctrl/services/routing/service.go:69-91
Timestamp: 2025-09-15T20:45:05.696Z
Learning: In Unkey's routing service, gateway lookups should be workspace-scoped using FindGatewayByHostnameAndWorkspace instead of hostname-only queries to prevent cross-tenant access issues.
Learnt from: ogzhanolguncu
PR: unkeyed/unkey#2872
File: apps/dashboard/lib/trpc/routers/ratelimit/createNamespace.ts:36-39
Timestamp: 2025-04-08T09:34:24.576Z
Learning: When querying or updating namespaces in the Unkey dashboard, always scope the operations to the current workspace using `eq(table.workspaceId, ctx.workspace.id)` to prevent cross-workspace access.
go/apps/ctrl/services/deployment/domains.go (1)

5-5: No action required — toolchain ≥ Go 1.22 confirmed

go/go.mod declares "go 1.25" and CI/Dockerfiles reference Go 1.24/1.25, so the math/rand/v2 import is supported; keep as-is.

go/pkg/db/querier_generated.go (4)

176-177: Domains now include sticky and rollback flags in lookups — looks good.

The SELECT expansion to include sticky and is_rolled_back for FindDomainByDomain matches the new schema and Domain struct.


186-193: FindDomainsByDeploymentId: added sticky/is_rolled_back — looks good.

This aligns with the new Domain fields and enables rollback-aware UI.


599-604: Project query surfaces live_deployment_id — looks good.

The additional column is consistent with models_generated.go: Project.LiveDeploymentID.


909-929: Incorrect: no UNIQUE(domain) — but bulk upsert is broken and needs a fix

  • Schema (go/pkg/db/schema.sql): domains has PRIMARY KEY(id) and NO UNIQUE on domain. The original cross-workspace-by-domain upsert concern is incorrect.
  • Single-row insert (go/pkg/db/domain_insert.sql_generated.go / queries/domain_insert.sql): ON DUPLICATE KEY UPDATE workspace_id,... is consistent with an id-based upsert.
  • Critical bug (go/pkg/db/bulk_domain_insert.sql_generated.go): the bulk insert appends ON DUPLICATE KEY UPDATE parameters only once (uses args[0]), so if multiple rows are inserted and any duplicate occurs, all duplicates will be updated with the first row’s workspace_id/project_id/deployment_id/type/sticky/updated_at — causing incorrect overwrites and potential cross-workspace assignment.
    Suggested fixes: update the generator/SQL so the UPDATE uses the correct per-row values (append one set of update params per VALUES row or reference the inserted row values via VALUES(...) / equivalent), or remove workspace_id/project_id from the UPDATE if those must be immutable.
  • Files to act on: go/pkg/db/schema.sql, go/pkg/db/domain_insert.sql_generated.go, go/pkg/db/bulk_domain_insert.sql_generated.go.

Likely an incorrect or invalid review comment.

go/pkg/db/models_generated.go (2)

239-281: DomainsSticky enum + NullDomainsSticky — looks correct.

Scan/Value implementations and constants (branch/environment/live) are consistent and idiomatic for sqlc-generated enums.


599-609: Domain struct gains Sticky and IsRolledBack — looks consistent.

Types align with the new enum and boolean column. Zero-value for IsRolledBack = false is sensible.

go/apps/ctrl/services/deployment/deploy_workflow.go (2)

354-358: Gateway upsert includes WorkspaceID/DeploymentID — good addition.

Strengthens partition data integrity and helps future queries/cleanup.


583-607: createGatewayConfig standalone — LGTM; Go version unverified

Function looks correct; protojson encoding matches storage policy. Repository lacks go.mod (script output: "go.mod: No such file or directory"), so cannot confirm Go directive >= 1.22 — confirm the project uses Go 1.22+ to safely rely on the 'for i := range ' usage.

apps/dashboard/app/(app)/projects/[projectId]/deployments/components/rollback-dialog.tsx (2)

18-25: Prop rename to isLive — looks good.

API reads clearer and matches live/target terminology.


110-124: Domain list rendering — LGTM with new sticky filter.

Filtering on ["environment","live"] matches backend enum constants.

apps/dashboard/app/(app)/projects/[projectId]/deployments/components/table/components/actions/deployment-list-table-action.popover.constants.tsx (1)

31-36: Avoid duplicating rollback eligibility in the dialog.

Per prior learning (PR #3952), eligibility is enforced in the actions menu; the dialog should assume inputs are valid and avoid redundant checks.

Please confirm RollbackDialog hasn’t reintroduced self-rollback or environment gating logic internally.

apps/dashboard/app/(app)/projects/[projectId]/deployments/components/table/deployments-list.tsx (3)

61-63: Correct live badge computation and rendering.

Using liveDeployment?.id === deployment.id with EnvStatusBadge variant="live" is consistent with the refactor.

Also applies to: 89-90


289-292: Actions prop update is consistent.

Passing { selectedDeployment, liveDeployment, environment } matches the new API.


49-50: Migration to liveDeployment incomplete — confirm hook and remove leftover activeDeployment.

Evidence: internal/db/src/schema/projects.ts contains activeDeployment (around line 39). Verify the useDeployments hook implementation returns { liveDeployment, deployments } and update any remaining callers/schema entries still referencing activeDeployment.

Comment on lines +110 to +117
gatewayConfig, err := pdb.Query.FindGatewayByDeploymentId(ctx, s.partitionDB.RO(), targetDeployment.ID)
if err != nil {
s.logger.Error("failed to get gateway config",
"deployment_id", targetDeployment.ID,
"error", err.Error(),
)
return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to get gateway config: %w", err))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Handle “no gateway for target deployment” gracefully (and/or synthesize one).

If no gateway exists yet for the target deployment, FindGatewayByDeploymentId likely returns sql.ErrNoRows. Treat that as a known case and build the config (e.g., via your createGatewayConfig used in deploy flow) instead of failing the rollback.

- gatewayConfig, err := pdb.Query.FindGatewayByDeploymentId(ctx, s.partitionDB.RO(), targetDeployment.ID)
- if err != nil {
+ gatewayConfig, err := pdb.Query.FindGatewayByDeploymentId(ctx, s.partitionDB.RO(), targetDeployment.ID)
+ if errors.Is(err, sql.ErrNoRows) {
+   cfg, genErr := s.createGatewayConfig(ctx, targetDeployment) // adjust to real helper
+   if genErr != nil {
+     s.logger.Error("failed to build gateway config", "deployment_id", targetDeployment.ID, "error", genErr.Error())
+     return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to build gateway config: %w", genErr))
+   }
+   gatewayConfig = pdb.FindGatewayByDeploymentIdRow{Hostname: "", Config: cfg}
+ } else if err != nil {
    s.logger.Error("failed to get gateway config",
      "deployment_id", targetDeployment.ID,
      "error", err.Error(),
    )
    return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to get gateway config: %w", err))
- }
+ }

If no builder exists here, wire in the one used during deploy workflow.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In go/apps/ctrl/services/deployment/rollback.go around lines 110 to 117, the
current code treats any error from FindGatewayByDeploymentId as fatal; instead
detect the “no rows” case (sql.ErrNoRows) and handle it by synthesizing a
gateway config (reuse or call the same createGatewayConfig/builder used in the
deploy flow) so rollback can proceed; log the missing-gateway as info/debug
rather than error, construct the config (wire in the builder or import the
createGatewayConfig helper), and continue without returning an internal error;
only return on truly unexpected errors.

Comment on lines +119 to +149
domainChanges := []db.RollbackDomainParams{}
gatewayChanges := []pdb.UpsertGatewayParams{}

for _, domain := range domains {
if domain.Sticky.Valid &&
(domain.Sticky.DomainsSticky == db.DomainsStickyLive ||
domain.Sticky.DomainsSticky == db.DomainsStickyEnvironment) {

domainChanges = append(domainChanges, db.RollbackDomainParams{
ID: domain.ID,
DeploymentID: sql.NullString{Valid: true, String: targetDeployment.ID},
})

gatewayChanges = append(gatewayChanges, pdb.UpsertGatewayParams{
WorkspaceID: project.WorkspaceID,
DeploymentID: targetDeployment.ID,
Hostname: domain.Domain,
Config: gatewayConfig.Config,
})
}

}

if len(domainChanges) == 0 {
return nil, connect.NewError(connect.CodeInvalidArgument, errors.New("no domains to rollback"))
}
err = pdb.BulkQuery.UpsertGateway(ctx, s.partitionDB.RW(), gatewayChanges)
if err != nil {
s.logger.Error("failed to upsert gateway", "error", err.Error())
return nil, fmt.Errorf("failed to upsert gateway: %w", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Domain changes are computed but never persisted.

You build domainChanges yet never apply them. Rollback won’t update domain bindings in the control-plane DB.

Apply domain rollbacks before switching the project’s live deployment ID:

   if len(domainChanges) == 0 {
     return nil, connect.NewError(connect.CodeInvalidArgument, errors.New("no domains to rollback"))
   }
-  err = pdb.BulkQuery.UpsertGateway(ctx, s.partitionDB.RW(), gatewayChanges)
+  // 1) Ensure gateways are ready for the switch
+  err = pdb.BulkQuery.UpsertGateway(ctx, s.partitionDB.RW(), gatewayChanges)
   if err != nil {
-    s.logger.Error("failed to upsert gateway", "error", err.Error())
-    return nil, fmt.Errorf("failed to upsert gateway: %w", err)
+    s.logger.Error("failed to upsert gateway", "error", err.Error())
+    return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to upsert gateway: %w", err))
   }
+  // 2) Persist domain -> targetDeployment bindings
+  if err := db.BulkQuery.RollbackDomain(ctx, s.db.RW(), domainChanges); err != nil {
+    s.logger.Error("failed to rollback domains", "error", err.Error())
+    return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to rollback domains: %w", err))
+  }

Adjust the exact BulkQuery method name/signature to match the generated querier.

🤖 Prompt for AI Agents
In go/apps/ctrl/services/deployment/rollback.go around lines 119 to 149,
domainChanges is populated but never persisted to the database; insert a call to
the generated querier BulkQuery method that applies the RollbackDomainParams
(e.g., BulkQuery.RollbackDomains or appropriate Upsert/Rollback method on
s.partitionDB.RW()) immediately after building domainChanges and before
switching the project's live deployment ID and before UpsertGateway; ensure you
handle and log the returned error consistently (wrap and return it) and adjust
the exact method name/signature to match the generated querier.

Comment on lines +12 to 18
const bulkInsertDomain = `INSERT INTO domains ( id, workspace_id, project_id, deployment_id, domain, type, sticky, created_at, updated_at ) VALUES %s ON DUPLICATE KEY UPDATE
workspace_id = ?,
project_id = ?,
deployment_id = ?,
type = ?,
sticky = ?,
updated_at = ?`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

🧩 Analysis chain

ON DUPLICATE KEY UPDATE should use per-row values, not args[0].

Binding update columns from args[0] applies the first row’s values to all duplicates, which is error-prone if a mixed batch slips through. Use VALUES() (MySQL) or an INSERT alias to reference each row’s values.

-const bulkInsertDomain = `INSERT INTO domains ( id, workspace_id, project_id, deployment_id, domain, type, sticky, created_at, updated_at ) VALUES %s ON DUPLICATE KEY UPDATE
-    workspace_id = ?,
-    project_id = ?,
-    deployment_id = ?,
-    type = ?,
-    sticky = ?,
-    updated_at = ?`
+// Prefer row-scoped updates. Option A (supported widely, though VALUES() is deprecated in recent MySQL):
+const bulkInsertDomain = `INSERT INTO domains ( id, workspace_id, project_id, deployment_id, domain, type, sticky, created_at, updated_at ) VALUES %s
+ON DUPLICATE KEY UPDATE
+    workspace_id = VALUES(workspace_id),
+    project_id = VALUES(project_id),
+    deployment_id = VALUES(deployment_id),
+    type = VALUES(type),
+    sticky = VALUES(sticky),
+    updated_at = VALUES(updated_at)`

And drop the extra trailing args:

-  // Add ON DUPLICATE KEY UPDATE parameters (only once, not per row)
-  if len(args) > 0 {
-    allArgs = append(allArgs, args[0].WorkspaceID)
-    allArgs = append(allArgs, args[0].ProjectID)
-    allArgs = append(allArgs, args[0].DeploymentID)
-    allArgs = append(allArgs, args[0].Type)
-    allArgs = append(allArgs, args[0].Sticky)
-    allArgs = append(allArgs, args[0].UpdatedAt)
-  }
+  // No extra parameters needed; UPDATE uses VALUES() from the row.

If you must avoid VALUES() deprecation, use an INSERT alias:

INSERT INTO domains (...) VALUES %s AS new
ON DUPLICATE KEY UPDATE
  workspace_id = new.workspace_id, project_id = new.project_id, ...

If keeping current approach, ensure batches are homogeneous:


🏁 Script executed:

#!/bin/bash
# Inspect call sites to confirm all rows share the same workspace_id/project_id/deployment_id/type/sticky
rg -nP 'InsertDomains\([^,]+,\s*[^,]+,\s*\[(?:\]|[^\]]+)\]\)' -A5 -B3

Length of output: 0


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "PWD: $(pwd)"
echo "--- locate file ---"
rg -n --hidden -S 'bulk_domain_insert\.sql_generated\.go' || true

echo "--- search for bulkInsertDomain symbol and INSERT INTO domains ---"
rg -n --hidden -S 'bulkInsertDomain' -C3 || true
rg -n --hidden -S 'INSERT INTO domains' -C3 || true

echo "--- print file if present ---"
if [ -f "go/pkg/db/bulk_domain_insert.sql_generated.go" ]; then
  sed -n '1,240p' go/pkg/db/bulk_domain_insert.sql_generated.go
else
  echo "file not found: go/pkg/db/bulk_domain_insert.sql_generated.go"
fi

echo "--- search for InsertDomains usages/definitions ---"
rg -n --hidden -S '\bInsertDomains\s*\(' -C3 || true
rg -n --hidden -S 'func\s+.*InsertDomains' -C3 || true

echo "--- search for args[0], ON DUPLICATE KEY UPDATE, VALUES(, AS new ---"
rg -n --hidden -S 'args\[0\]' -C3 || true
rg -n --hidden -S 'ON DUPLICATE KEY UPDATE' -C3 || true
rg -n --hidden -S 'VALUES\(' -C2 || true
rg -n --hidden -S 'AS new' -C2 || true

Length of output: 90151


Use per-row VALUES() (or INSERT ... AS new) in ON DUPLICATE KEY UPDATE — stop binding update columns from args[0]

Generated InsertDomains binds ON DUPLICATE KEY UPDATE columns from args[0], causing the first row’s values to be applied to every duplicate in the batch. Change the SQL to use per-row VALUES() (or an INSERT alias) and remove the extra appended args; update the generator template so future generated code does the same.

Locations:

  • go/pkg/db/bulk_domain_insert.sql_generated.go — const bulkInsertDomain (SQL) and the "Add ON DUPLICATE KEY UPDATE parameters" append block.
  • go/pkg/db/plugins/bulk-insert/bulk_insert.go.tmpl — template that appends args[0].{{.}}.
  • go/pkg/db/plugins/bulk-insert/generator.go — logic that derives ValuesFields vs UpdateFields (adjust so generator emits VALUES() or INSERT ... AS new instead of using args[0]).

Recommended change (replace const and drop trailing args):

-const bulkInsertDomain = `INSERT INTO domains ( id, workspace_id, project_id, deployment_id, domain, type, sticky, created_at, updated_at ) VALUES %s ON DUPLICATE KEY UPDATE
-    workspace_id = ?,
-    project_id = ?,
-    deployment_id = ?,
-    type = ?,
-    sticky = ?,
-    updated_at = ?`
+const bulkInsertDomain = `INSERT INTO domains ( id, workspace_id, project_id, deployment_id, domain, type, sticky, created_at, updated_at ) VALUES %s
+ON DUPLICATE KEY UPDATE
+    workspace_id = VALUES(workspace_id),
+    project_id = VALUES(project_id),
+    deployment_id = VALUES(deployment_id),
+    type = VALUES(type),
+    sticky = VALUES(sticky),
+    updated_at = VALUES(updated_at)`

And remove the appended-args block:

-  // Add ON DUPLICATE KEY UPDATE parameters (only once, not per row)
-  if len(args) > 0 {
-    allArgs = append(allArgs, args[0].WorkspaceID)
-    allArgs = append(allArgs, args[0].ProjectID)
-    allArgs = append(allArgs, args[0].DeploymentID)
-    allArgs = append(allArgs, args[0].Type)
-    allArgs = append(allArgs, args[0].Sticky)
-    allArgs = append(allArgs, args[0].UpdatedAt)
-  }
+  // No extra parameters needed; UPDATE uses VALUES() from the row.

If you prefer to avoid the (deprecated) VALUES() function in newer MySQL versions, use an INSERT alias:

INSERT INTO domains (...) VALUES %s AS new
ON DUPLICATE KEY UPDATE
  workspace_id = new.workspace_id,
  project_id = new.project_id,
  ...
🤖 Prompt for AI Agents
In go/pkg/db/bulk_domain_insert.sql_generated.go around lines 12 to 18, the
generated SQL and accompanying argument append logic bind ON DUPLICATE KEY
UPDATE columns from args[0], causing the first row’s values to be reused for
every duplicate; change the const to use per-row references (either VALUES(col)
or use VALUES clause aliasing like "VALUES %s AS new" and then reference
new.col) and remove the code that appends args[0] update values; then update the
generator/template files (go/pkg/db/plugins/bulk-insert/bulk_insert.go.tmpl and
go/pkg/db/plugins/bulk-insert/generator.go) so the template no longer emits
appending of args[0].{{.}} and so the generator emits UPDATE clauses that
reference per-row VALUES (or the INSERT alias) instead of using args[0].

Comment on lines +1526 to +1532
//RollbackDomain
//
// UPDATE domains
// SET deployment_id = ?,
// updated_at = ?
// WHERE id = ?
RollbackDomain(ctx context.Context, db DBTX, arg RollbackDomainParams) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

RollbackDomain doesn’t toggle is_rolled_back — clarify semantics or update SQL.

Sticky rollbacks likely need to mark the affected domain(s) as rolled back. The generated method only sets deployment_id and updated_at. Consider updating the underlying SQL (domain_rollback.sql) to also set is_rolled_back = TRUE (or appropriately clear it when rolling forward).

Suggest updating go/pkg/db/queries/domain_rollback.sql:

UPDATE domains
SET deployment_id = ?,
    is_rolled_back = TRUE,
    updated_at = ?
WHERE id = ?;
🤖 Prompt for AI Agents
In go/pkg/db/querier_generated.go around lines 1526 to 1532, the RollbackDomain
method updates only deployment_id and updated_at but does not toggle
is_rolled_back as the review suggests; update the domain_rollback.sql used to
generate this method to include setting is_rolled_back = TRUE in the UPDATE (or
the appropriate boolean value per your rollback semantics), then regenerate the
queries so the generated RollbackDomain sets deployment_id, is_rolled_back and
updated_at; also ensure any corresponding "rollforward" SQL explicitly clears
is_rolled_back if needed and add/update tests to cover the flag change.

Comment on lines +23 to +28
workspace_id = sqlc.arg(workspace_id),
project_id = sqlc.arg(project_id),
deployment_id = sqlc.arg(deployment_id),
type = sqlc.arg(type),
sticky = sqlc.arg(sticky),
updated_at = sqlc.arg(updated_at);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Keep tenant/project ownership immutable on upsert.

Allowing workspace_id and project_id to change on duplicate insert can move a domain across tenants/projects unintentionally and is risky for isolation.

Tighten the update set:

-    workspace_id = sqlc.arg(workspace_id),
-    project_id = sqlc.arg(project_id),
     deployment_id = sqlc.arg(deployment_id),
     type = sqlc.arg(type),
     sticky = sqlc.arg(sticky),
     updated_at = sqlc.arg(updated_at);

If you need to migrate ownership, do it via an explicit admin path with checks, not implicit upserts.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
workspace_id = sqlc.arg(workspace_id),
project_id = sqlc.arg(project_id),
deployment_id = sqlc.arg(deployment_id),
type = sqlc.arg(type),
sticky = sqlc.arg(sticky),
updated_at = sqlc.arg(updated_at);
deployment_id = sqlc.arg(deployment_id),
type = sqlc.arg(type),
sticky = sqlc.arg(sticky),
updated_at = sqlc.arg(updated_at);
🤖 Prompt for AI Agents
In go/pkg/db/queries/domain_insert.sql around lines 23 to 28, the ON CONFLICT DO
UPDATE clause currently allows workspace_id and project_id to be overwritten;
change the update set to NOT modify ownership fields — remove workspace_id and
project_id from the RHS of the update and only update non-ownership columns
(e.g., deployment_id, type, sticky, updated_at). Keep tenant/project ownership
immutable on upsert so ownership changes must be performed through an explicit
admin migration/endpoint with proper checks.

Comment on lines +1 to +5
-- name: RollbackDomain :exec
UPDATE domains
SET deployment_id = sqlc.arg(deployment_id),
updated_at = sqlc.arg(updated_at)
WHERE id = sqlc.arg(id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

🧩 Analysis chain

Persist sticky rollback state.

If domains has sticky and is_rolled_back columns (per PR), set them here to make rollbacks sticky.

Apply (adjust types/defaults as needed):

 -- name: RollbackDomain :exec
 UPDATE domains
 SET deployment_id = sqlc.arg(deployment_id),
-    updated_at = sqlc.arg(updated_at)
+    is_rolled_back = TRUE,
+    sticky = TRUE,
+    updated_at = sqlc.arg(updated_at)
 WHERE id = sqlc.arg(id);

Ensure column names exist and align; otherwise add a separate query to toggle stickiness.


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "PWD: $(pwd)"
echo "---- show file if present: go/pkg/db/queries/domain_rollback.sql ----"
if [ -f go/pkg/db/queries/domain_rollback.sql ]; then
  sed -n '1,200p' go/pkg/db/queries/domain_rollback.sql || true
else
  echo "MISSING: go/pkg/db/queries/domain_rollback.sql"
fi

echo
echo "---- rg: search for columns is_rolled_back|is_rollback|rolled_back|sticky ----"
if command -v rg >/dev/null 2>&1; then
  rg -n --hidden -S '\b(is_rolled_back|is_rollback|rolled_back|sticky)\b' || true
else
  echo "rg not found; falling back to grep/find"
  find . -type f -name '*.sql' -o -name '*.go' -o -name '*.md' | xargs grep -nE '\b(is_rolled_back|is_rollback|rolled_back|sticky)\b' || true
fi

echo
echo "---- rg: search for CREATE TABLE domains / UPDATE domains / RollbackDomain ----"
if command -v rg >/dev/null 2>&1; then
  rg -n --hidden -S 'CREATE TABLE\s+domains' || true
  rg -n --hidden -S '\bUPDATE\s+domains\b' || true
  rg -n --hidden -S 'RollbackDomain\b' || true
  rg -n --hidden -S '\bdomains\b' -C2 || true
else
  find . -type f -name '*.sql' -o -name '*.go' | xargs grep -nE 'CREATE TABLE\s+domains|\bUPDATE\s+domains\b|RollbackDomain|\bdomains\b' || true
fi

echo
echo "---- list top-level .sql files (first 200) ----"
find . -type f -iname '*.sql' -maxdepth 6 | sed -n '1,200p' || true

Length of output: 73779


Persist rollback state — set is_rolled_back; do not set sticky = TRUE

sticky is an ENUM('branch','environment','live') and is_rolled_back is a boolean; setting sticky = TRUE is invalid. Update the rollback SQL to set is_rolled_back = TRUE instead:

File: go/pkg/db/queries/domain_rollback.sql

 -- name: RollbackDomain :exec
 UPDATE domains
 SET deployment_id = sqlc.arg(deployment_id),
-    updated_at = sqlc.arg(updated_at)
+    is_rolled_back = TRUE,
+    updated_at = sqlc.arg(updated_at)
 WHERE id = sqlc.arg(id);

Regenerate sqlc bindings after applying this change.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-- name: RollbackDomain :exec
UPDATE domains
SET deployment_id = sqlc.arg(deployment_id),
updated_at = sqlc.arg(updated_at)
WHERE id = sqlc.arg(id);
-- name: RollbackDomain :exec
UPDATE domains
SET deployment_id = sqlc.arg(deployment_id),
is_rolled_back = TRUE,
updated_at = sqlc.arg(updated_at)
WHERE id = sqlc.arg(id);
🤖 Prompt for AI Agents
In go/pkg/db/queries/domain_rollback.sql lines 1-5, the rollback SQL incorrectly
tries to set sticky = TRUE (invalid for ENUM) — replace that assignment with
is_rolled_back = TRUE (boolean) and remove any sticky assignment while keeping
deployment_id and updated_at updates and the WHERE id = sqlc.arg(id) clause;
after updating the SQL, regenerate sqlc bindings so the Go code reflects the
changed query.

Comment on lines +12 to 16
const bulkUpsertGateway = `INSERT INTO gateways ( workspace_id, deployment_id, hostname, config ) VALUES %s ON DUPLICATE KEY UPDATE
workspace_id = ?,
deployment_id = ?,
config = ?`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Critical: ON DUPLICATE KEY UPDATE uses args[0] constants — corrupts multi-row upserts

All conflicting rows will be updated with the first row’s values. Use per-row values (VALUES(...)) to preserve row-specific updates, consistent with other generated queries (e.g., UpsertVM).

Apply:

-const bulkUpsertGateway = `INSERT INTO gateways ( workspace_id, deployment_id, hostname, config ) VALUES %s ON DUPLICATE KEY UPDATE
-    workspace_id = ?,
-    deployment_id = ?,
-    config = ?`
+const bulkUpsertGateway = `INSERT INTO gateways ( workspace_id, deployment_id, hostname, config ) VALUES %s ON DUPLICATE KEY UPDATE
+    workspace_id = VALUES(workspace_id),
+    deployment_id = VALUES(deployment_id),
+    config = VALUES(config)`

And drop the extra bound params:

-// Add ON DUPLICATE KEY UPDATE parameters (only once, not per row)
-if len(args) > 0 {
-  allArgs = append(allArgs, args[0].WorkspaceID)
-  allArgs = append(allArgs, args[0].DeploymentID)
-  allArgs = append(allArgs, args[0].Config)
-}

Also applies to: 41-46

🤖 Prompt for AI Agents
In go/pkg/partition/db/bulk_gateway_upsert.sql_generated.go around lines 12 to
16 (and similarly at lines 41-46), the ON DUPLICATE KEY UPDATE clause uses bound
parameters (args[0] constants) which causes every conflicting row to be updated
with the first row’s values; change the UPDATE side to use per-row VALUES(...)
references (e.g., workspace_id = VALUES(workspace_id), deployment_id =
VALUES(deployment_id), config = VALUES(config)) so each row preserves its own
values, and remove the extra bound parameters that were added for the UPDATE
side.

Comment on lines +25 to +28
ON DUPLICATE KEY UPDATE
workspace_id = ?,
deployment_id = ?,
config = ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Do not mutate workspace_id on upsert; avoid cross-tenant domain moves.

Updating workspace_id inside ON DUPLICATE KEY UPDATE can silently reassign a hostname across workspaces if the unique key is on hostname (or even (workspace_id, hostname) with non-conflicting targets). This is a data isolation/security smell.

Update the source query (not this generated file) to keep workspace_id immutable on duplicate:

-- go/pkg/partition/db/queries/gateway_upsert.sql
-ON DUPLICATE KEY UPDATE
-    workspace_id = ?,
-    deployment_id = ?,
-    config = ?
+ON DUPLICATE KEY UPDATE
+    deployment_id = ?,
+    config = ?

Also ensure the parameter list in the exec aligns after removing one placeholder.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ON DUPLICATE KEY UPDATE
workspace_id = ?,
deployment_id = ?,
config = ?
ON DUPLICATE KEY UPDATE
deployment_id = ?,
config = ?
🤖 Prompt for AI Agents
In go/pkg/partition/db/gateway_upsert.sql_generated.go around lines 25 to 28,
the ON DUPLICATE KEY UPDATE currently reassigns workspace_id which can move
hostnames across tenants; remove workspace_id from the UPDATE clause so it only
updates deployment_id and config, and then update the originating source SQL
(not this generated file) to delete the workspace_id placeholder from the
VALUES/parameters list so the exec/prepare call has one fewer argument and the
remaining parameters are shifted appropriately to match the new placeholder
order.

Comment on lines +133 to +141
message RollbackRequest {
string project_id = 1;
string target_deployment_id = 2;
}

message RollbackResponse {
// the rolled back domains
repeated string domains = 1;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add idempotency and concurrency guards to RollbackRequest.

Without a precondition, two admins could race a promotion/rollback, causing domain churn. Include an optional expected_live_deployment_id for CAS semantics and a request_id for idempotency; optionally support dry_run.

 message RollbackRequest {
   string project_id = 1;
   string target_deployment_id = 2;
+  // Optional concurrency guard: proceed only if the live deployment still matches.
+  string expected_live_deployment_id = 3;
+  // Optional: provide idempotency across retries.
+  string request_id = 4;
+  // Optional: if true, validate and return the plan without mutating.
+  bool dry_run = 5;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
message RollbackRequest {
string project_id = 1;
string target_deployment_id = 2;
}
message RollbackResponse {
// the rolled back domains
repeated string domains = 1;
}
message RollbackRequest {
string project_id = 1;
string target_deployment_id = 2;
// Optional concurrency guard: proceed only if the live deployment still matches.
string expected_live_deployment_id = 3;
// Optional: provide idempotency across retries.
string request_id = 4;
// Optional: if true, validate and return the plan without mutating.
bool dry_run = 5;
}
message RollbackResponse {
// the rolled back domains
repeated string domains = 1;
}
🤖 Prompt for AI Agents
In go/proto/ctrl/v1/deployment.proto around lines 133 to 141, the
RollbackRequest lacks idempotency and concurrency guards; update the message to
add optional string request_id for idempotency, optional string
expected_live_deployment_id to provide CAS-style precondition for the current
live deployment, and optional bool dry_run to allow validation-only checks;
ensure new fields use the next available tag numbers and include short comments
documenting semantics (request_id for deduplication, expected_live_deployment_id
for conditional rollback, dry_run for simulation).

@chronark chronark closed this Sep 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant