Skip to content

refactor: remove hydra in favor of restate#4053

Merged
chronark merged 5 commits intomainfrom
10-02-refactor_remove_hydra_in_favor_of_restate
Oct 3, 2025
Merged

refactor: remove hydra in favor of restate#4053
chronark merged 5 commits intomainfrom
10-02-refactor_remove_hydra_in_favor_of_restate

Conversation

@chronark
Copy link
Collaborator

@chronark chronark commented Oct 2, 2025

What does this PR do?

This is mostly generated code changes, don't be scared

I've already chatted with @Flo4604 earlier, but I noticed our hydra implementation is quite flawed.
We had race conditions, cases where a workflow was not durable at all when workers crashed and after spending a few hours trying to fix it, I figured it's not worth it.

This PR rips our hydra and powers our durable workflows via restate.dev instead.
It's already in sandbox btw.

I realised we can either spend a lot of time to build this inhouse, or focus on what really matters.
Restate has a very nice balance between features and dependencies. It ships as a single binary for devmode and they also have a k8s operator, which we can use later in prod.

Flo also has experience running it in prod for steamsets.

The actual workflow implementation is fairly similar, so the business logic is almost the same.
The exception is that we use a dedicated "Virtual Object" for domain and gateway assignments, so these are always serialized and we can not have race conditions there.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Chore (refactoring code, technical debt, workflow improvements)
  • Enhancement (small improvements)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How should this be tested?

  • Pretend like nothing changed and deploy an api, it should still work

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Contributing Guide
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand areas
  • Ran pnpm build
  • Ran pnpm fmt
  • Checked for warnings, there are none
  • Removed all console.logs
  • Merged the latest changes from main onto my branch with git pull origin main
  • My changes don't cause any responsiveness issues

Appreciated

  • If a UI change was made: Added a screen recording or screenshots to this PR
  • Updated the Unkey Docs if changes were necessary

This is mostly generated code changes, don't be scared

I've already chatted with @Flo4604 earlier, but I noticed our hydra implementation is quite flawed.
We had race conditions, cases where a workflow was not durable at all when workers crashed and after spending a few hours trying to fix it, I figured it's not worth it.

This PR rips our hydra and powers our durable workflows via restate.dev instead.
It's already in sandbox btw.

I realised we can either spend a lot of time to build this inhouse, or focus on what really matters.
Restate has a very nice balance between features and dependencies. It ships as a single binary for devmode and they also have a k8s operator, which we can use later in prod.

Flo also has experience running it in prod for steamsets.

The actual workflow implementation is fairly similar, so the business logic is almost the same.
The exception is that we use a dedicated "Virtual Object" for domain and gateway assignments, so these are always serialized and we can not have race conditions there.
@vercel
Copy link

vercel bot commented Oct 2, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
engineering Ready Ready Preview Comment Oct 2, 2025 6:11pm
1 Skipped Deployment
Project Deployment Preview Comments Updated (UTC)
dashboard Ignored Ignored Preview Oct 2, 2025 6:11pm

@changeset-bot
Copy link

changeset-bot bot commented Oct 2, 2025

⚠️ No Changeset found

Latest commit: b24d63e

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 2, 2025

Warning

Rate limit exceeded

@chronark has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 3 minutes and 42 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 0d2c6ed and b24d63e.

📒 Files selected for processing (2)
  • apps/engineering/content/docs/infrastructure/database-schema.mdx (4 hunks)
  • go/apps/ctrl/workflows/deploy/deploy_handler.go (1 hunks)
📝 Walkthrough

Walkthrough

Replaces Hydra-based in-process workflows with Restate virtual-object workflows across deploy, routing, and certificate; adds Restate infra (Docker/K8s/Tilt/Make), new protobufs and buf/generation, updates ctrl config/CLI/run wiring, adds Go workflow packages and DB query, and adds a Mermaid MDX component for docs.

Changes

Cohort / File(s) Summary
Control-plane: Restate wiring & config
go/apps/ctrl/config.go, go/cmd/ctrl/main.go, go/apps/ctrl/run.go, go/Makefile, go/Tiltfile
Add RestateConfig and CLI flags; remove Hydra DB flag; initialize Restate client/server, register workflow services, update Make/Tilt targets to deploy Restate.
Workflows: deploy / routing / certificate (new Restate packages)
go/apps/ctrl/workflows/deploy/*, go/apps/ctrl/workflows/routing/*, go/apps/ctrl/workflows/certificate/*, go/apps/ctrl/workflows/*/*.go
New Restate-backed workflow packages: deploy (Deploy/Promote/Rollback), routing (AssignDomains/SwitchDomains), certificate (ProcessChallenge), helpers, docs, and service constructors.
Ctrl services: migration to Restate ingress
go/apps/ctrl/services/deployment/service.go, go/apps/ctrl/services/deployment/create_deployment.go, go/apps/ctrl/services/deployment/promote.go, go/apps/ctrl/services/deployment/rollback.go, go/apps/ctrl/services/acme/*
Replace Hydra engine usage with Restate ingress client; refactor constructors to Config structs; delegate create/promote/rollback to Restate workflows; remove Hydra ACME workflow and engine fields.
Removed legacy Hydra workflows (ctrl)
go/apps/ctrl/services/deployment/deploy_workflow.go, go/apps/ctrl/services/acme/certificate_workflow.go
Delete prior Hydra-based workflow implementations and related types/methods.
Protobufs, buf, and go mod changes
go/proto/hydra/v1/*.proto, go/buf.gen.restate.yaml, go/buf.gen.connect.yaml, go/buf.yaml, go/go.mod
Add hydra.v1 protos for deployment/routing/certificate (Restate virtual objects), new buf generation config for Restate plugin, and go.mod dependencies (restatedev/sdk-go, etc.).
Database: query + generated code
go/pkg/db/queries/domain_find_by_ids.sql, go/pkg/db/domain_find_by_ids.sql_generated.go, go/pkg/db/querier_generated.go
Add FindDomainsByIds SQL, generated Go code, and new Querier method for batch domain lookup.
Kubernetes / Docker / deployment topology
deployment/docker-compose.yaml, go/k8s/manifests/restate.yaml, go/k8s/manifests/ctrl.yaml
Add restate service (docker-compose & k8s), expose ctrl port 9080, add Restate env vars and healthchecks, PVC/StatefulSet for Restate.
Workflows: service-level runtime changes in ctrl
go/apps/ctrl/services/*, go/apps/ctrl/run.go
Update service constructors and runtime wiring to pass Restate client; register workflow servers with Restate admin optionally.
Frontend docs: Mermaid MDX integration
apps/engineering/app/components/mermaid.tsx, apps/engineering/app/docs/[[...slug]]/page.tsx, apps/engineering/package.json
Add client-side Mermaid React component (theme-aware), register component in MDX renderer, add mermaid and next-themes dependencies.
Engineering docs: workflows content and meta
apps/engineering/content/docs/architecture/meta.json, apps/engineering/content/docs/architecture/workflows/*
Add Durable Workflows docs: index, deployment-service, routing-service, creating-services, and workflows meta.
Docs & misc updates
apps/engineering/content/docs/infrastructure/database-schema.mdx, go/GO_DOCUMENTATION_GUIDELINES.md, go/demo_api/main.go
Remove Hydra references from DB schema doc, rewrite Go doc guidelines, tighten demo API validation (minLength).
Misc: generated/other helpers
go/pkg/db/*, various new helper files under go/apps/ctrl/*
Add helper functions (PEM handling, domain slugging, gateway config, status updates), new generated code and package docs.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant API as ctrl (gRPC/HTTP)
  participant Ingress as Restate Ingress
  participant Server as Restate Server
  participant WF as Workflow (Deploy/Routing/Cert)
  participant DB as DB/PartitionDB
  participant Krane as Krane
  Note over API,Ingress: ctrl delegates long-running ops to Restate virtual objects

  Client->>API: Deploy / Promote / Rollback request
  API->>Ingress: WorkflowSend / Object.Request (hydra.v1)
  Ingress->>Server: route to virtual object
  Server->>WF: Invoke Deploy/Promote/Rollback handler
  WF->>DB: durable steps (load/update entities)
  WF->>Krane: create & poll deployment (Deploy)
  alt domain config needed
    WF->>Ingress: Object.Request (RoutingService.AssignDomains / SwitchDomains)
    Ingress->>Server: route to Routing workflow
    Server->>WF: Routing handler upserts domains & gateway configs
    WF->>DB: store gateway configs
  end
  WF-->>Server: complete
  Server-->>Ingress: response
  Ingress-->>API: workflow started/response
  API-->>Client: 202/2xx or error
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

Possibly related PRs

Suggested labels

No Rabbit

Suggested reviewers

  • perkinsjr
  • mcstepp
  • imeyer
  • Flo4604
  • MichaelUnkey
  • ogzhanolguncu

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description follows the template structure but omits the required “Fixes # (issue)” reference to link an issue, and does not list any dependencies where requested. Although motivation, context, and testing instructions are provided, the missing issue link prevents proper tracking and template compliance. Please add a “Fixes #” line under the “What does this PR do?” section to reference the tracked issue and include any dependency listings if applicable to fully satisfy the repository’s description template.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The title succinctly and accurately describes the main change of the PR—removing the in-house Hydra implementation in favor of Restate—and follows conventional commit style without extraneous details. It clearly conveys the primary refactor purpose and would be meaningful to teammates reviewing history.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Collaborator Author

chronark commented Oct 2, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

@chronark
Copy link
Collaborator Author

chronark commented Oct 2, 2025

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 2, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@chronark chronark marked this pull request as ready for review October 2, 2025 17:30
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
go/GO_DOCUMENTATION_GUIDELINES.md (1)

401-402: Clarify package comment style

Line 401 says to reserve /* */ for package docs, but the doc.go example above (Lines 39-77) uses // line comments. That contradiction will confuse contributors. Please align the rule and example—either bless // for package docs or adjust the example accordingly.

Apply this diff if you want to keep using // consistently:

-2. Use `//` for function comments, `/* */` only for package-level overviews
+2. Prefer `//` line comments everywhere (including package docs) for consistency with Go tooling expectations; reserve `/* */` only when absolutely necessary.
go/apps/ctrl/services/deployment/promote.go (1)

20-31: Enforce workspace ownership in Promote
In go/apps/ctrl/services/deployment/promote.go, after loading targetDeployment, compare its WorkspaceID against the caller’s workspace ID (extracted from auth context) and return a permission‐denied error if they differ.

🧹 Nitpick comments (17)
deployment/docker-compose.yaml (1)

293-293: Consider pinning Restate image version.

Using latest tag in production-like configurations can lead to unexpected behavior when the image is updated. Consider pinning to a specific version for reproducibility.

-    image: docker.io/restatedev/restate:latest
+    image: docker.io/restatedev/restate:1.2.3  # Replace with actual version
go/apps/ctrl/config.go (1)

103-103: Consider adding Restate configuration validation.

The Config.Validate() method (lines 106-115) validates ACME/Cloudflare configuration but does not validate the new Restate field. If the control plane requires Restate to function, consider adding validation checks for required Restate configuration (e.g., non-empty URLs, valid port range).

For example, you could add validation like:

 func (c Config) Validate() error {
 	// Validate Cloudflare configuration if enabled
 	if c.Acme.Enabled && c.Acme.Cloudflare.Enabled {
 		if err := assert.NotEmpty(c.Acme.Cloudflare.ApiToken, "cloudflare API token is required when cloudflare is enabled"); err != nil {
 			return err
 		}
 	}
+
+	// Validate Restate configuration if workflows are enabled
+	if err := assert.NotEmpty(c.Restate.IngressURL, "restate ingress URL is required"); err != nil {
+		return err
+	}
+	if err := assert.NotEmpty(c.Restate.AdminURL, "restate admin URL is required"); err != nil {
+		return err
+	}
+	if c.Restate.HttpPort <= 0 || c.Restate.HttpPort > 65535 {
+		return fmt.Errorf("restate HTTP port must be between 1 and 65535, got %d", c.Restate.HttpPort)
+	}
 
 	return nil
 }
go/apps/ctrl/workflows/routing/assign_domains_handler.go (2)

111-150: Revisit operation ordering for stronger consistency guarantees.

Consider upserting gateway configs first, then applying domain updates (or use a saga with compensation) to match the “gateway-before-domain” invariant and avoid windows of inconsistent routing.


116-135: Avoid duplicate gateway upserts for repeated domains.

Dedup domains locally before building gatewayParams.

Example:

-		for _, domainName := range changedDomains {
+		seen := map[string]struct{}{}
+		for _, domainName := range changedDomains {
+			if _, ok := seen[domainName]; ok {
+				continue
+			}
+			seen[domainName] = struct{}{}
go/apps/ctrl/workflows/routing/service.go (1)

43-51: Validate required config in New().

Fail fast if DB/PartitionDB/logger are nil or DefaultDomain is empty to prevent subtle runtime issues (e.g., local-domain filtering).

Example:

 func New(cfg Config) *Service {
+	if cfg.DB == nil || cfg.PartitionDB == nil {
+		panic("routing.Service: DB and PartitionDB are required")
+	}
+	if cfg.Logger == (logging.Logger{}) {
+		panic("routing.Service: Logger is required")
+	}
+	if cfg.DefaultDomain == "" {
+		panic("routing.Service: DefaultDomain is required")
+	}
 	return &Service{
go/k8s/manifests/restate.yaml (1)

33-58: Harden the pod securityContext.

We’re shipping this StatefulSet without any pod/container securityContext, so by default the container can run as root with privilege escalation permitted. That trips CKV_K8S_20/23 and leaves us exposed if the Restate image is ever compromised. Please explicitly set runAsNonRoot: true, allowPrivilegeEscalation: false, and related hardening fields (fsGroup, seccomp, etc.) so we get the expected defense-in-depth posture.

go/apps/ctrl/services/deployment/create_deployment.go (3)

147-151: Validate docker image and verify KeyAuthId mapping

  • Validate req.Msg.GetDockerImage() is non-empty before inserting/dispatching.
  • Confirm KeyAuthId should be populated from KeyspaceId. If the field expects a key-auth credential id, this may be incorrect.

154-160: Add a per-call timeout to the Restate send

Prevent hangs if Restate is slow/unreachable.

Apply this diff:

-	invocation := restateingress.WorkflowSend[*hydrav1.DeployRequest](
+	ctx, cancel := context.WithTimeout(ctx, 15*time.Second)
+	defer cancel()
+	invocation := restateingress.WorkflowSend[*hydrav1.DeployRequest](
 		s.restate,
 		"hydra.v1.DeploymentService",
 		project.ID,
 		"Deploy",
 	).Send(ctx, deployReq)

164-167: Optional: persist invocation_id for traceability

Consider storing invocation.Id on the deployment record (if a column exists) for easier cross-system debugging.

apps/engineering/content/docs/architecture/workflows/creating-services.mdx (1)

126-146: Recommend per-call timeouts in samples

Add context.WithTimeout around ingress calls to avoid indefinite waits and improve resilience.

Example:

ctx, cancel := context.WithTimeout(ctx, 15*time.Second)
defer cancel()

resp, err := restateingress.Object[...](...).Request(ctx, request)
go/apps/ctrl/services/deployment/promote.go (2)

35-42: Add per-call timeout to Restate request

Guard against slow/unreachable Restate.

Apply this diff:

-import (
-	"context"
-	"fmt"
-
-	"connectrpc.com/connect"
-	restateingress "github.com/restatedev/sdk-go/ingress"
-	ctrlv1 "github.com/unkeyed/unkey/go/gen/proto/ctrl/v1"
-	hydrav1 "github.com/unkeyed/unkey/go/gen/proto/hydra/v1"
-	"github.com/unkeyed/unkey/go/pkg/db"
-)
+import (
+	"context"
+	"fmt"
+	"time"
+
+	"connectrpc.com/connect"
+	restateingress "github.com/restatedev/sdk-go/ingress"
+	ctrlv1 "github.com/unkeyed/unkey/go/gen/proto/ctrl/v1"
+	hydrav1 "github.com/unkeyed/unkey/go/gen/proto/hydra/v1"
+	"github.com/unkeyed/unkey/go/pkg/db"
+)
@@
-	_, err = restateingress.Object[*hydrav1.PromoteRequest, *hydrav1.PromoteResponse](
+	ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
+	defer cancel()
+	_, err = restateingress.Object[*hydrav1.PromoteRequest, *hydrav1.PromoteResponse](
 		s.restate,
 		"hydra.v1.DeploymentService",
 		targetDeployment.ProjectID,
 		"Promote",
 	).Request(ctx, &hydrav1.PromoteRequest{
 		TargetDeploymentId: req.Msg.GetTargetDeploymentId(),
 	})

44-50: Optional: propagate workflow error codes

If the workflow returns structured Connect errors (e.g., NotFound, InvalidArgument), unwrap/map them instead of always returning CodeInternal to improve client UX.

apps/engineering/content/docs/architecture/workflows/index.mdx (1)

27-27: Consider adding concrete code examples.

The phrase "See the Go implementation files for examples" is somewhat vague. Consider adding a brief inline code snippet showing a typical service communication pattern or linking to specific example files/line numbers.

go/apps/ctrl/workflows/deploy/rollback_handler.go (1)

130-138: Remove redundant filtering logic.

The loop at lines 132-136 re-filters domains by sticky type, but the query at lines 111-117 already filters by these exact sticky types. The domain slice returned from the query only contains domains with sticky types of Live or Environment.

Apply this diff to simplify:

-	// Collect domain IDs
-	var domainIDs []string
-	for _, domain := range domains {
-		if domain.Sticky.Valid &&
-			(domain.Sticky.DomainsSticky == db.DomainsStickyLive ||
-				domain.Sticky.DomainsSticky == db.DomainsStickyEnvironment) {
-			domainIDs = append(domainIDs, domain.ID)
-		}
-	}
+	// Collect domain IDs (already filtered by query)
+	domainIDs := make([]string, len(domains))
+	for i, domain := range domains {
+		domainIDs[i] = domain.ID
+	}
go/apps/ctrl/workflows/certificate/pem.go (3)

16-30: Add nil check for robustness.

The function does not validate that privateKey is non-nil before marshaling. While the caller may ensure this, adding a defensive check prevents potential panics.

Apply this diff:

 func privateKeyToString(privateKey *ecdsa.PrivateKey) (string, error) {
+	if privateKey == nil {
+		return "", fmt.Errorf("private key cannot be nil")
+	}
 	// Marshal the private key to DER format
 	privKeyBytes, err := x509.MarshalECPrivateKey(privateKey)

35-49: Consider validating PEM block type.

The function decodes the PEM block but does not verify that block.Type is "EC PRIVATE KEY". While x509.ParseECPrivateKey will fail on invalid data, an explicit type check would provide clearer error messages for mismatched PEM types.

 func stringToPrivateKey(pemString string) (*ecdsa.PrivateKey, error) {
 	// Decode PEM format
 	block, _ := pem.Decode([]byte(pemString))
 	if block == nil {
 		return nil, fmt.Errorf("failed to decode PEM block")
 	}
+	if block.Type != "EC PRIVATE KEY" {
+		return nil, fmt.Errorf("invalid PEM block type: expected 'EC PRIVATE KEY', got '%s'", block.Type)
+	}
 
 	// Parse the EC private key
 	privateKey, err := x509.ParseECPrivateKey(block.Bytes)

55-67: Consider validating PEM block type.

Similar to stringToPrivateKey, this function could benefit from validating that block.Type is "CERTIFICATE" to provide clearer error messages when the wrong PEM type is provided.

 func getCertificateExpiry(certPEM string) (time.Time, error) {
 	block, _ := pem.Decode([]byte(certPEM))
 	if block == nil {
 		return time.Time{}, fmt.Errorf("failed to decode PEM block")
 	}
+	if block.Type != "CERTIFICATE" {
+		return time.Time{}, fmt.Errorf("invalid PEM block type: expected 'CERTIFICATE', got '%s'", block.Type)
+	}
 
 	cert, err := x509.ParseCertificate(block.Bytes)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 027ce45 and 4828887.

⛔ Files ignored due to path filters (18)
  • go/buf.lock is excluded by !**/*.lock
  • go/gen/proto/ctrl/v1/acme.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/ctrl/v1/build.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/ctrl/v1/deployment.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/ctrl/v1/openapi.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/ctrl/v1/service.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/hydra/v1/certificate.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/hydra/v1/certificate_restate.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/hydra/v1/deployment.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/hydra/v1/deployment_restate.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/hydra/v1/routing.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/hydra/v1/routing_restate.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/krane/v1/deployment.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/partition/v1/gateway.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/vault/v1/object.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/vault/v1/service.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/go.sum is excluded by !**/*.sum
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (53)
  • apps/engineering/app/components/mermaid.tsx (1 hunks)
  • apps/engineering/app/docs/[[...slug]]/page.tsx (2 hunks)
  • apps/engineering/content/docs/architecture/meta.json (1 hunks)
  • apps/engineering/content/docs/architecture/workflows/creating-services.mdx (1 hunks)
  • apps/engineering/content/docs/architecture/workflows/deployment-service.mdx (1 hunks)
  • apps/engineering/content/docs/architecture/workflows/index.mdx (1 hunks)
  • apps/engineering/content/docs/architecture/workflows/meta.json (1 hunks)
  • apps/engineering/content/docs/architecture/workflows/routing-service.mdx (1 hunks)
  • apps/engineering/content/docs/infrastructure/database-schema.mdx (4 hunks)
  • apps/engineering/package.json (1 hunks)
  • deployment/docker-compose.yaml (4 hunks)
  • go/GO_DOCUMENTATION_GUIDELINES.md (8 hunks)
  • go/Makefile (5 hunks)
  • go/Tiltfile (5 hunks)
  • go/apps/ctrl/config.go (2 hunks)
  • go/apps/ctrl/run.go (5 hunks)
  • go/apps/ctrl/services/acme/certificate_workflow.go (0 hunks)
  • go/apps/ctrl/services/acme/service.go (0 hunks)
  • go/apps/ctrl/services/deployment/create_deployment.go (2 hunks)
  • go/apps/ctrl/services/deployment/deploy_workflow.go (0 hunks)
  • go/apps/ctrl/services/deployment/promote.go (2 hunks)
  • go/apps/ctrl/services/deployment/rollback.go (2 hunks)
  • go/apps/ctrl/services/deployment/service.go (1 hunks)
  • go/apps/ctrl/workflows/certificate/doc.go (1 hunks)
  • go/apps/ctrl/workflows/certificate/pem.go (1 hunks)
  • go/apps/ctrl/workflows/certificate/process_challenge_handler.go (1 hunks)
  • go/apps/ctrl/workflows/certificate/service.go (1 hunks)
  • go/apps/ctrl/workflows/deploy/deploy_handler.go (1 hunks)
  • go/apps/ctrl/workflows/deploy/doc.go (1 hunks)
  • go/apps/ctrl/workflows/deploy/domains.go (3 hunks)
  • go/apps/ctrl/workflows/deploy/helpers.go (1 hunks)
  • go/apps/ctrl/workflows/deploy/promote_handler.go (1 hunks)
  • go/apps/ctrl/workflows/deploy/rollback_handler.go (1 hunks)
  • go/apps/ctrl/workflows/deploy/service.go (1 hunks)
  • go/apps/ctrl/workflows/routing/assign_domains_handler.go (1 hunks)
  • go/apps/ctrl/workflows/routing/doc.go (1 hunks)
  • go/apps/ctrl/workflows/routing/helpers.go (1 hunks)
  • go/apps/ctrl/workflows/routing/service.go (1 hunks)
  • go/apps/ctrl/workflows/routing/switch_domains_handler.go (1 hunks)
  • go/buf.gen.connect.yaml (1 hunks)
  • go/buf.gen.restate.yaml (1 hunks)
  • go/buf.yaml (2 hunks)
  • go/cmd/ctrl/main.go (2 hunks)
  • go/demo_api/main.go (1 hunks)
  • go/go.mod (4 hunks)
  • go/k8s/manifests/ctrl.yaml (4 hunks)
  • go/k8s/manifests/restate.yaml (1 hunks)
  • go/pkg/db/domain_find_by_ids.sql_generated.go (1 hunks)
  • go/pkg/db/querier_generated.go (1 hunks)
  • go/pkg/db/queries/domain_find_by_ids.sql (1 hunks)
  • go/proto/hydra/v1/certificate.proto (1 hunks)
  • go/proto/hydra/v1/deployment.proto (1 hunks)
  • go/proto/hydra/v1/routing.proto (1 hunks)
💤 Files with no reviewable changes (3)
  • go/apps/ctrl/services/acme/service.go
  • go/apps/ctrl/services/acme/certificate_workflow.go
  • go/apps/ctrl/services/deployment/deploy_workflow.go
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-09-01T02:33:43.791Z
Learnt from: imeyer
PR: unkeyed/unkey#3899
File: go/buf.gen.yaml:0-0
Timestamp: 2025-09-01T02:33:43.791Z
Learning: In the unkeyed/unkey repository, buf commands are executed from the `go/` directory where the `buf.yaml` file is located. This means the `out: gen` configuration in `go/buf.gen.yaml` generates files to `go/gen/` relative to the repository root, which aligns with the expected directory structure.

Applied to files:

  • go/Makefile
📚 Learning: 2025-09-01T01:57:42.227Z
Learnt from: imeyer
PR: unkeyed/unkey#3899
File: go/proto/metald/v1/metald.proto:5-9
Timestamp: 2025-09-01T01:57:42.227Z
Learning: In the unkeyed/unkey repository, buf is configured to properly resolve metald proto imports like "metald/v1/vm.proto" without needing the full "go/proto/" prefix. The buf lint command `buf lint --path proto/metald` passes successfully with these relative import paths.

Applied to files:

  • go/Makefile
📚 Learning: 2025-04-22T14:43:11.724Z
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.

Applied to files:

  • apps/engineering/content/docs/infrastructure/database-schema.mdx
📚 Learning: 2025-07-02T11:51:58.572Z
Learnt from: chronark
PR: unkeyed/unkey#3420
File: go/pkg/hydra/store/gorm/gorm.go:486-498
Timestamp: 2025-07-02T11:51:58.572Z
Learning: The Hydra package (go/pkg/hydra) is planned to be migrated from GORM to sqlc for database operations, which explains why raw SQL queries are acceptable in the current implementation.

Applied to files:

  • apps/engineering/content/docs/infrastructure/database-schema.mdx
📚 Learning: 2025-09-12T08:01:20.792Z
Learnt from: Flo4604
PR: unkeyed/unkey#3944
File: go/pkg/db/acme_challenge_update_verified_with_expiry.sql_generated.go:31-39
Timestamp: 2025-09-12T08:01:20.792Z
Learning: Do not review or suggest changes to files with sql_generated.go suffix or other files marked as auto-generated (containing "Code generated by" comments), as these are generated by tools like sqlc and changes would be overwritten on regeneration.

Applied to files:

  • apps/engineering/content/docs/infrastructure/database-schema.mdx
📚 Learning: 2025-09-15T18:12:01.503Z
Learnt from: mcstepp
PR: unkeyed/unkey#3952
File: go/apps/ctrl/run.go:212-213
Timestamp: 2025-09-15T18:12:01.503Z
Learning: In Go connect-go generated handlers, mux.Handle can accept New*ServiceHandler functions directly even though they return (string, http.Handler) tuples. The pattern mux.Handle(ctrlv1connect.New*ServiceHandler(...)) is valid and compiles successfully in unkey codebase.

Applied to files:

  • go/apps/ctrl/run.go
🪛 Buf (1.57.2)
go/proto/hydra/v1/routing.proto

5-5: import "dev/restate/sdk/go.proto": file does not exist

(COMPILE)

go/proto/hydra/v1/certificate.proto

5-5: import "dev/restate/sdk/go.proto": file does not exist

(COMPILE)

go/proto/hydra/v1/deployment.proto

5-5: import "dev/restate/sdk/go.proto": file does not exist

(COMPILE)

🪛 Checkov (3.2.334)
go/k8s/manifests/restate.yaml

[medium] 15-64: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 15-64: Minimize the admission of root containers

(CKV_K8S_23)

🔇 Additional comments (39)
go/buf.gen.connect.yaml (1)

2-3: LGTM! Standard buf managed mode enablement.

Enabling managed mode is a standard practice for buf-managed proto generation and aligns with the broader Restate integration.

go/pkg/db/queries/domain_find_by_ids.sql (1)

1-14: LGTM! Well-formed batch domain query.

The query correctly uses sqlc.slice(ids) for parameterized IN clause and selects all relevant domain fields. This supports the new multi-id retrieval capability mentioned in the generated code.

apps/engineering/content/docs/architecture/meta.json (1)

6-6: LGTM! Adds workflows documentation page.

The addition of "workflows" to the architecture pages aligns with the new durable workflows documentation introduced in this PR.

go/apps/ctrl/workflows/deploy/domains.go (2)

18-35: LGTM! Excellent documentation of domain generation strategy.

The expanded documentation clearly explains the three domain types (per-commit, per-branch, per-environment) and their sticky behavior, which is critical for understanding the domain assignment logic in Restate-based workflows.


85-124: LGTM! Well-implemented sluggify helper.

The function correctly transforms branch names into URL-safe slugs with appropriate safeguards:

  • Removes special characters while preserving alphanumerics
  • Normalizes separators (hyphens/underscores → spaces → hyphens)
  • Enforces length limit (80 chars)
  • Cleans up trailing artifacts

The use of precompiled regexes (lines 82-83) is efficient for repeated invocations.

go/Tiltfile (3)

114-124: LGTM! Restate service properly integrated.

The Restate service is correctly configured with appropriate port forwards (8081 for ingress, 9070 for admin) and dependencies. The conditional setup aligns with the service toggle pattern used for other services.

Note: The service is labeled as 'database' which is unconventional for a workflow engine, but appears intentional given the infrastructure grouping in this codebase.


346-347: LGTM! Restate endpoints documented in UI.

The Ingress and Admin endpoints are properly documented for developer access.


245-245: Verify Restate availability handling in ctrl startup

  • Confirm Tiltfile’s addition of 'restate' to ctrl_deps is intentional
  • Ensure ctrl’s calls to Restate include retry/backoff or graceful fallback when Restate is temporarily unavailable
go/proto/hydra/v1/deployment.proto (1)

9-14: LGTM! Virtual Object ensures serialization.

The VIRTUAL_OBJECT service type is the correct choice for the DeploymentService. This ensures that operations on the same deployment are serialized, directly addressing the race condition issues mentioned in the PR objectives.

deployment/docker-compose.yaml (3)

289-307: LGTM! Restate service properly configured.

The Restate service is correctly set up with:

  • Appropriate port mappings (8081 for ingress, 9070 for admin)
  • Health check on the admin endpoint
  • Tracing integration with OpenTelemetry
  • Proper dependency on the otel service

347-351: LGTM! Restate configuration properly wired.

The environment variables correctly configure:

  • Ingress URL pointing to restate service internal endpoint
  • Admin URL for management operations
  • HTTP port matching the exposed ctrl port (9080)
  • Registration URL pointing to ctrl's restate service endpoint

This aligns with the Restate integration documented in the PR.


321-321: All port 9080 configurations are consistent. Verified that docker-compose, environment variables, Restate registration URL, Kubernetes manifests, and RestateConfig all use port 9080.

go/buf.yaml (1)

17-18: LGTM! Restate SDK dependency correctly added.

Adding buf.build/restatedev/sdk-go to the deps list enables the proto imports from the Restate SDK used in deployment.proto and other workflow definitions.

go/apps/ctrl/config.go (1)

32-46: LGTM! Clear Restate configuration structure.

The RestateConfig struct is well-documented with helpful inline comments explaining each field's purpose and providing example values. The field names are descriptive and consistent with Go naming conventions.

apps/engineering/app/docs/[[...slug]]/page.tsx (1)

1-1: LGTM! Mermaid component integration is correct.

The import and component mapping enable Mermaid diagram rendering in MDX documentation. The integration follows the existing pattern for custom components (Tabs, Tab, Banner) and the import path is correct.

Also applies to: 43-43

go/go.mod (1)

205-205: LGTM! Indirect dependencies align with Restate integration.

The addition of four indirect dependencies supports the Restate SDK integration and related authentication/schema handling:

  • restatedev/sdk-go v0.20.0 provides the core Restate SDK
  • golang-jwt/jwt/v5 v5.2.3 supports JWT-based authentication
  • invopop/jsonschema v0.13.0 enables dynamic schema validation
  • mr-tron/base58 v1.2.0 supports encoding operations

These transitive dependencies are expected given the Restate integration across the control plane.

Also applies to: 233-233, 287-287, 325-325

go/Makefile (3)

27-27: LGTM! Restate added as a core dependency.

The up target now includes restate in the docker-compose startup, treating it as a first-class dependency alongside planetscale, mysql, redis, clickhouse, s3, and otel. This aligns with the broader Restate integration.


37-41: LGTM! Code generation workflow updated for Restate.

The generate target now:

  1. Installs the protoc-gen-go-restate plugin
  2. Runs buf generate with the Restate template for hydra protos
  3. Formats generated code with go fmt

This workflow correctly integrates Restate protobuf code generation alongside the existing Connect-based generation.


90-90: LGTM! Kubernetes workflow integrates Restate.

The changes correctly integrate Restate into the Kubernetes development workflow:

  • k8s-up waits for the Restate pod to be ready before proceeding
  • New start-restate target enables deploying Restate independently
  • start-dependencies includes Restate as a core dependency

Also applies to: 145-147, 162-162

apps/engineering/package.json (1)

24-24: LGTM! Dependencies support Mermaid diagram rendering.

The addition of mermaid and next-themes dependencies enables:

  • mermaid ^11.12.0: Rendering Mermaid diagrams in MDX documentation
  • next-themes ^0.4.6: Theme support for consistent light/dark mode rendering

These dependencies align with the Mermaid component integration in the docs page component.

Also applies to: 26-26

apps/engineering/content/docs/architecture/workflows/meta.json (1)

1-7: LGTM! Well-structured metadata for workflows documentation.

The metadata correctly defines a new "Durable Workflows" documentation section with:

  • Clear title and description
  • Appropriate icon reference
  • List of workflow-related pages (index, deployment-service, routing-service, creating-services)
  • Correct root flag (false) indicating this is a subsection
go/buf.gen.restate.yaml (1)

1-10: LGTM! Correct buf configuration for Restate code generation.

The configuration properly sets up Restate protobuf code generation:

  • Uses buf v2 format with managed generation enabled
  • Remote Go plugin (v1.36.8) generates standard protobuf code
  • Local protoc-gen-go-restate plugin generates Restate-specific workflow code
  • Both plugins output to gen/proto with paths=source_relative for consistent import paths

This aligns with the Makefile's generate target which installs the local plugin and runs this template.

go/k8s/manifests/ctrl.yaml (4)

26-26: LGTM! Restate HTTP port correctly exposed.

The container now exposes port 9080 for Restate HTTP communication, consistent with the UNKEY_RESTATE_HTTP_PORT environment variable and the service port configuration.


77-85: LGTM! Complete Restate configuration.

The environment variables correctly configure Restate integration:

  • UNKEY_RESTATE_INGRESS_URL: Points to the Restate ingress endpoint (port 8080) for invoking workflows
  • UNKEY_RESTATE_ADMIN_URL: Points to the Restate admin endpoint (port 9070) for service registration
  • UNKEY_RESTATE_HTTP_PORT: Specifies the port where ctrl listens for Restate HTTP requests (9080)
  • UNKEY_RESTATE_REGISTER_AS: Provides the self-registration URL for Restate discovery

These values align with the RestateConfig structure in go/apps/ctrl/config.go.


99-99: LGTM! Init container waits for Restate readiness.

The init container now checks that the Restate service is ready (restate:8080) before allowing the ctrl container to start, ensuring proper service initialization order alongside mysql and s3.


118-121: LGTM! Service correctly exposes Restate port.

The Service definition now exposes port 9080 (named "restate") for Restate HTTP communication, enabling other services and Restate itself to communicate with the ctrl's workflow endpoints.

go/pkg/db/domain_find_by_ids.sql_generated.go (1)

58-69: LGTM — generated query expansion is correct.

Parameter expansion and empty-slice handling are safe and appropriate.

go/apps/ctrl/workflows/deploy/deploy_handler.go (1)

100-104: updateDeploymentStatus already uses Restate for durability
The helper wraps the status update in restate.Run, so all calls (including lines 100–104, 131–134, and 335–339) are durable.

apps/engineering/content/docs/architecture/workflows/deployment-service.mdx (1)

6-94: Clear, actionable workflow docs

Flows, responsibilities, and references look consistent with the Restate migration. No issues.

go/apps/ctrl/workflows/certificate/doc.go (1)

1-79: Great package documentation

Concise, accurate overview of the Restate-backed ACME workflow and guarantees. Looks good.

apps/engineering/content/docs/architecture/workflows/index.mdx (1)

1-92: LGTM! Well-structured documentation.

The documentation provides a clear, comprehensive overview of the Restate-based durable workflow architecture. The structure logically progresses from core concepts through specific services to configuration and best practices. Error handling and observability guidance are appropriate.

go/apps/ctrl/workflows/deploy/rollback_handler.go (7)

32-42: LGTM! Clear validation and error handling.

The function documentation is comprehensive, and the initial validation correctly prevents self-rollback attempts with an appropriate terminal error.


43-72: LGTM! Proper durable step usage and validation.

The deployment lookups use named durable steps correctly, distinguish terminal (404) from retryable errors appropriately, and validate that both deployments belong to the same environment and project.


73-88: LGTM! Correct live deployment validation.

The project lookup and validation that the source deployment is currently live are implemented correctly with appropriate error handling.


89-108: LGTM! Proper VM status validation.

The workflow correctly verifies that the target deployment has at least one running VM before proceeding, preventing rollback to a non-functional deployment.


110-128: LGTM! Proper domain retrieval.

The domain query correctly filters for sticky domains (Live and Environment types) and appropriately rejects rollback attempts when no domains are found.


139-148: LGTM! Correct atomic domain switch.

The routing service call uses the project ID as the Virtual Object key, ensuring serialized execution per project. The blocking Request pattern is appropriate for this critical atomic operation.


149-173: LGTM! Proper state update and completion.

The project state update correctly sets the new live deployment and the rolled-back flag within a named durable step. The final logging provides a good audit trail.

go/apps/ctrl/services/deployment/service.go (1)

1-36: LGTM! Clean refactor to Config-based constructor.

The refactor from individual parameters to a Config struct improves maintainability and aligns with the Restate migration. All fields are properly initialized from the config.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 2, 2025

Thank you for following the naming conventions for pull request titles! 🙏

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 23

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
go/demo_api/main.go (1)

796-803: Enforce minLength: 3 in runtime to match OpenAPI (avoid contract drift).

Spec requires name length ≥ 3, but the handler only checks non-empty. Clients generated from the spec will expect 400 for 1–2 char names; current server returns 200.

Apply:

--- a/go/demo_api/main.go
+++ b/go/demo_api/main.go
@@
 import (
   "crypto/rand"
   "encoding/hex"
   "encoding/json"
   "fmt"
   "io"
   "log"
   "math"
   "net/http"
   "os"
   "regexp"
   "strconv"
   "strings"
   "sync"
   "time"
+  "unicode/utf8"
 )
@@
-    name := r.URL.Query().Get("name")
-    if name == "" {
-      respondWithError(w, http.StatusBadRequest, "MISSING_PARAMETER", "Missing or invalid name parameter")
-      return
-    }
+    name := strings.TrimSpace(r.URL.Query().Get("name"))
+    if utf8.RuneCountInString(name) < 3 {
+      respondWithError(w, http.StatusBadRequest, "INVALID_PARAMETER", "Missing or invalid name parameter")
+      return
+    }

Also consider adding a quick test to assert 400 for names of length 1–2 and 200 for ≥3.

Also applies to: 3-18, 395-401

apps/engineering/content/docs/infrastructure/database-schema.mdx (1)

10-13: Remove Hydra DB reference (outdated with Restate migration).

This section still lists a Hydra MySQL database. With Hydra removed in favor of Restate, this is misleading. Remove Hydra here (and anywhere else in this doc), and, if needed, add a short note that Restate is not backed by this MySQL schema.

Proposed edit:

- - **hydra**: Workflow orchestration engine database for managing deployment workflows
go/k8s/manifests/restate.yaml (1)

64-85: Avoid exposing admin (9070) via LoadBalancer

LoadBalancing both ingress and admin ports publicly is risky. Prefer ClusterIP for admin or split services and expose only ingress.

 kind: Service
 metadata:
   name: restate
@@
 spec:
   selector:
     app: restate
   ports:
     - name: ingress
       port: 8080
       targetPort: 8080
       protocol: TCP
     - name: admin
       port: 9070
       targetPort: 9070
       protocol: TCP
-  type: LoadBalancer
+  type: ClusterIP

Optionally create a separate LoadBalancer service for 8080 only if external access is required.

go/k8s/manifests/ctrl.yaml (1)

21-27: Harden ctrl container and set resources

Add securityContext and resource requests/limits; name the new port.

         - name: ctrl
           image: unkey-ctrl:latest
           imagePullPolicy: Never # Use local images
           ports:
             - containerPort: 7091
-            - containerPort: 9080
+            - containerPort: 9080
+              name: restate
+          securityContext:
+            allowPrivilegeEscalation: false
+            runAsNonRoot: true
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop: ["ALL"]
+            seccompProfile:
+              type: RuntimeDefault
+          resources:
+            requests:
+              cpu: "200m"
+              memory: "256Mi"
+            limits:
+              cpu: "2"
+              memory: "1Gi"
♻️ Duplicate comments (1)
go/apps/ctrl/run.go (1)

200-206: Restate start error propagation (completes silently).

Current goroutine only logs. Propagate error (e.g., via channel) and exit; see prior comment for diff.

🧹 Nitpick comments (30)
go/Makefile (2)

90-90: Wait for app=restate — labels verified; consider longer timeout or rollout status

Labels in go/k8s/manifests/restate.yaml correctly use app: restate. For slower startups, you may increase the wait timeout to --timeout=300s or replace the kubectl wait with a kubectl rollout status deployment/restate -n unkey step.


145-147: Declare start-restate and start-dependencies as phony targets
Add the missing targets to .PHONY to avoid filename clashes:

-.PHONY: install tools fmt test-unit test-integration test-integration-long test-stress test build generate pull up clean k8s-check k8s-up k8s-down k8s-reset k8s-status start-mysql start-ctrl start-all dev
+.PHONY: install tools fmt test-unit test-integration test-integration-long test-stress test build generate pull up clean k8s-check k8s-up k8s-down k8s-reset k8s-status start-mysql start-ctrl start-all dev start-restate start-dependencies

The manifest go/k8s/manifests/restate.yaml already uses app: restate.

apps/engineering/content/docs/infrastructure/database-schema.mdx (1)

94-95: Add pointer to the custom sqlc generation step.

Helpful to link to the exact entry point (go:generate directive or Makefile target) so engineers know what to run when not using sqlc generate directly.

Example addition:

  • See Makefile target generate or go:generate in go/pkg/db/ for the custom generation pipeline.
go/buf.yaml (1)

17-18: Consider pinning the Restate SDK version.

The dependency buf.build/restatedev/sdk-go is declared without a version constraint. For reproducible builds and to avoid unexpected breaking changes, consider pinning to a specific version or version range.

Apply this diff to pin the dependency version:

 deps:
-  - buf.build/restatedev/sdk-go
+  - buf.build/restatedev/sdk-go:v1.0.0  # Update to the actual version being used

Alternatively, you can verify the appropriate version by checking the Buf registry or the project's dependency management strategy.

go/apps/ctrl/config.go (1)

103-103: Consider adding Restate configuration validation.

The Restate field is added to the Config struct, but the Validate() method (lines 106-115) doesn't include any checks for the Restate configuration. If Restate is required for the system to function, consider adding validation to ensure critical fields like IngressURL and AdminURL are not empty.

Example validation to add in the Validate() method:

// Validate Restate configuration
if c.Restate.IngressURL != "" || c.Restate.AdminURL != "" {
    if err := assert.NotEmpty(c.Restate.IngressURL, "restate ingress URL is required"); err != nil {
        return err
    }
    if err := assert.NotEmpty(c.Restate.AdminURL, "restate admin URL is required"); err != nil {
        return err
    }
    if err := assert.NotEmpty(c.Restate.RegisterAs, "restate register URL is required"); err != nil {
        return err
    }
}
go/apps/ctrl/workflows/deploy/domains.go (1)

61-69: Consider validating sluggify output.

If branchName consists entirely of special characters or results in an empty string after sluggification, the domain could become malformed (e.g., project-git--workspace.apex with double hyphens). Consider adding validation to skip branch domain creation if sluggify returns an empty string.

Apply this diff to add validation:

 if branchName != "" {
+    slug := sluggify(branchName)
+    if slug == "" {
+        // Skip branch domain if sluggify results in empty string
+        goto skipBranchDomain
+    }
     domains = append(
         domains,
         newDomain{
-            domain: fmt.Sprintf("%s-git-%s-%s.%s", projectSlug, sluggify(branchName), workspaceSlug, apex),
+            domain: fmt.Sprintf("%s-git-%s-%s.%s", projectSlug, slug, workspaceSlug, apex),
             sticky: db.NullDomainsSticky{Valid: true, DomainsSticky: db.DomainsStickyBranch},
         },
     )
 }
+skipBranchDomain:
apps/engineering/content/docs/architecture/workflows/creating-services.mdx (1)

132-135: Consider rephrasing the editorial comment.

The phrase "These are ugly, they're working on..." is informal for technical documentation. Consider rephrasing more professionally while still acknowledging the temporary nature of the API.

Suggested rewording:

-These are ugly, they're working on generating proper clients from the proto definitions
-https://github.com/restatedev/sdk-go/issues/103
+**Note:** The current client API is temporary. The Restate team is working on generating proper clients from proto definitions. See [sdk-go#103](https://github.com/restatedev/sdk-go/issues/103) for details.
apps/engineering/app/components/mermaid.tsx (1)

7-23: Consider adding error handling for mermaid.run.

The component uses void mermaid.run(...) which discards the promise and silently ignores any rendering errors. If a diagram fails to render (e.g., invalid syntax, browser compatibility issues), users won't see any feedback.

Consider adding error handling:

-    void mermaid.run({
-      nodes: [ref.current],
-    });
+    mermaid.run({
+      nodes: [ref.current],
+    }).catch((error) => {
+      console.error("Failed to render Mermaid diagram:", error);
+      // Optionally: set error state and display user-friendly message
+    });
go/k8s/manifests/restate.yaml (1)

14-63: Consider volumeClaimTemplates for StatefulSet storage

Using a static PVC works for 1 replica, but volumeClaimTemplates is the idiomatic way for StatefulSets and eases scaling.

No functional change needed now; just a future-proofing note.

go/k8s/manifests/ctrl.yaml (1)

93-101: Make dependency wait more robust

busybox nc -z can be flaky; prefer probing Restate health endpoint.

-              "until nc -z mysql 3306 && nc -z s3 3902 && nc -z restate 8080; do echo waiting for dependencies; sleep 2; done;",
+              "until nc -z mysql 3306 && nc -z s3 3902 && wget -q --spider http://restate:9070/health; do echo waiting for dependencies; sleep 2; done;",
go/apps/ctrl/workflows/routing/helpers.go (1)

32-55: Harden local-host detection and keep logic DRY across packages

Add ::1/0.0.0.0 and trim trailing dot. Also consider centralizing this logic to avoid drift with deploy/helpers.go.

Apply:

 func isLocalHostname(hostname, defaultDomain string) bool {
-	hostname = strings.ToLower(hostname)
+	hostname = strings.ToLower(hostname)
+	// Normalize FQDN style
+	hostname = strings.TrimSuffix(hostname, ".")
@@
-	if hostname == "localhost" || hostname == "127.0.0.1" {
+	if hostname == "localhost" || hostname == "127.0.0.1" || hostname == "::1" || hostname == "0.0.0.0" {
 		return true
 	}

Optionally extract to a shared helper to reuse here and in deploy/helpers.go. Based on relevant code snippets.

go/apps/ctrl/workflows/deploy/helpers.go (2)

59-89: Match routing’s local-host rules and add ::1/0.0.0.0 + trailing dot trim

Keep behavior consistent and robust; consider extracting to shared helper.

Apply:

 func isLocalHostname(hostname, defaultDomain string) bool {
 	// Lowercase for case-insensitive comparison
-	hostname = strings.ToLower(hostname)
+	hostname = strings.ToLower(hostname)
+	hostname = strings.TrimSuffix(hostname, ".")
@@
-	if hostname == "localhost" || hostname == "127.0.0.1" {
+	if hostname == "localhost" || hostname == "127.0.0.1" || hostname == "::1" || hostname == "0.0.0.0" {
 		return true
 	}

98-109: Fix error message to include actual status

Avoid hardcoding “building”.

Apply:

-		if updateErr != nil {
-			return restate.Void{}, fmt.Errorf("failed to update version status to building: %w", updateErr)
-		}
+		if updateErr != nil {
+			return restate.Void{}, fmt.Errorf("failed to update deployment status to %s: %w", status, updateErr)
+		}
go/apps/ctrl/workflows/routing/assign_domains_handler.go (1)

111-149: Deduplicate domains, handle nil GatewayConfig, and fail fast on marshal errors

  • Return early if req.GetGatewayConfig() is nil to avoid no-op upserts.
  • Deduplicate changedDomains before filtering local hosts to prevent redundant upserts.
  • Return an error on protojson.Marshal failure instead of logging and continuing.
go/cmd/ctrl/main.go (1)

82-91: Restate flags: add sane default/validation for restate-register-as

Consider one of:

  • Make restate-register-as required, or
  • Provide a safe default (e.g., http://ctrl:9080) for local/dev to avoid empty config.

Also validate URL schemes for ingress/admin/register-as to catch typos early.

Example change:

- cli.String("restate-register-as", "URL of this service for self-registration with Restate. Example: http://ctrl:9080",
-   cli.EnvVar("UNKEY_RESTATE_REGISTER_AS")),
+ cli.String("restate-register-as", "URL of this service for self-registration with Restate. Example: http://ctrl:9080",
+   cli.Default("http://ctrl:9080"), // dev-friendly default
+   cli.EnvVar("UNKEY_RESTATE_REGISTER_AS")),

If your CLI has a Validate option, attach a URL validator to restate-ingress-url, restate-admin-url, and restate-register-as.

go/proto/hydra/v1/certificate.proto (1)

23-26: Prefer enum over free-form status string

Using an enum improves type safety and avoids value drift.

Example:

-message ProcessChallengeResponse {
-  string certificate_id = 1;
-  string status = 2; // "success", "failed", "pending"
-}
+enum ProcessChallengeStatus {
+  PROCESS_CHALLENGE_STATUS_UNSPECIFIED = 0;
+  PROCESS_CHALLENGE_STATUS_SUCCESS = 1;
+  PROCESS_CHALLENGE_STATUS_FAILED = 2;
+  PROCESS_CHALLENGE_STATUS_PENDING = 3;
+}
+
+message ProcessChallengeResponse {
+  string certificate_id = 1;
+  ProcessChallengeStatus status = 2;
+}
go/apps/ctrl/services/deployment/service.go (1)

20-25: Config field naming consistency (nit)

Other packages tend to use DB over Database; consider renaming Config.Database to DB for consistency (optional).

go/apps/ctrl/services/deployment/promote.go (1)

33-42: Map Restate workflow errors to precise Connect codes

The workflow can return TerminalError(400/404/409...). Instead of always CodeInternal, map known statuses to Connect codes (InvalidArgument, NotFound, FailedPrecondition, etc.) for clearer API semantics.

If the ingress client exposes a typed error with HTTP status, assert and map it; else parse status when available. Example sketch:

// pseudo-code; adjust to actual ingress error type
if e, ok := err.(interface{ StatusCode() int }); ok {
  switch e.StatusCode() {
  case 400: return nil, connect.NewError(connect.CodeInvalidArgument, err)
  case 404: return nil, connect.NewError(connect.CodeNotFound, err)
  case 409: return nil, connect.NewError(connect.CodeFailedPrecondition, err)
  default:  return nil, connect.NewError(connect.CodeInternal, err)
  }
}

This preserves user-visible error intent from the workflow.

Also applies to: 45-49

deployment/docker-compose.yaml (2)

293-297: Pin Restate image tag instead of latest

Using :latest can break local dev unexpectedly. Pin a known-good version (and update via PRs).

-    image: docker.io/restatedev/restate:latest
+    image: docker.io/restatedev/restate:v0.XX.X

303-307: Healthcheck assumes curl is present in the image

Confirm curl exists in restatedev/restate. If not, use CMD-SHELL with wget or switch to /healthz tooling.

-    healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:9070/health"]
+    healthcheck:
+      test: ["CMD-SHELL", "wget -qO- http://localhost:9070/health >/dev/null 2>&1 || exit 1"]
       timeout: 5s
       retries: 10
       start_period: 10s
       interval: 5s
go/apps/ctrl/workflows/routing/switch_domains_handler.go (1)

37-44: Consider terminal error on missing gateway config

If no gateway config exists for the target deployment, return a terminal error to avoid futile retries.

-	gatewayConfig, err := restate.Run(ctx, func(stepCtx restate.RunContext) (partitiondb.FindGatewayByDeploymentIdRow, error) {
-		return partitiondb.Query.FindGatewayByDeploymentId(stepCtx, s.partitionDB.RO(), req.GetTargetDeploymentId())
-	}, restate.WithName("fetch-gateway-config"))
+	gatewayConfig, err := restate.Run(ctx, func(stepCtx restate.RunContext) (partitiondb.FindGatewayByDeploymentIdRow, error) {
+		return partitiondb.Query.FindGatewayByDeploymentId(stepCtx, s.partitionDB.RO(), req.GetTargetDeploymentId())
+	}, restate.WithName("fetch-gateway-config"))
 	if err != nil {
-		return nil, fmt.Errorf("failed to fetch gateway config for deployment %s: %w", req.GetTargetDeploymentId(), err)
+		// If not found, classify as terminal (bad request)
+		if err == sql.ErrNoRows {
+			return nil, restate.TerminalError(
+				fmt.Errorf("no gateway config for deployment %s", req.GetTargetDeploymentId()),
+				400,
+			)
+		}
+		return nil, fmt.Errorf("failed to fetch gateway config for deployment %s: %w", req.GetTargetDeploymentId(), err)
 	}
go/apps/ctrl/workflows/deploy/deploy_handler.go (3)

137-145: Avoid for i := range 300 unless on Go 1.22+

Use a classic counter for wider compatibility, or confirm the module requires Go 1.22+.

-		for i := range 300 {
+		for i := 0; i < 300; i++ {

If you intend to use this syntax, ensure go.mod specifies go 1.22+.


134-206: Reduce DB churn in the polling loop

Upserting all VMs every second for up to 5 minutes is heavy. Track last-seen status and only upsert on changes or at successful completion.


134-206: Prefer non-blocking durable waits

Sleeping inside a single restate.Run step ties up the worker. If available, use Restate’s sleep/yield API between polls or split into smaller steps with backoff.

Can you confirm if sdk-go exposes a sleep/yield API (e.g., stepCtx.Sleep or similar)? If yes, consider replacing time.Sleep with it to free capacity during waits.

go/apps/ctrl/services/deployment/create_deployment.go (1)

147-151: Ensure KeyAuthId assignment matches proto oneof type and semantics

DeployRequest.KeyAuthId is a *string. Avoid assigning a non-pointer. Also confirm that KeyspaceId maps to KeyAuthId (naming differs).

-	deployReq := &hydrav1.DeployRequest{
-		DeploymentId: deploymentID,
-		DockerImage:  req.Msg.GetDockerImage(),
-		KeyAuthId:    req.Msg.KeyspaceId,
-	}
+	deployReq := &hydrav1.DeployRequest{
+		DeploymentId: deploymentID,
+		DockerImage:  req.Msg.GetDockerImage(),
+	}
+	if ks := req.Msg.GetKeyspaceId(); ks != "" {
+		deployReq.KeyAuthId = &ks
+	}
go/apps/ctrl/workflows/certificate/process_challenge_handler.go (1)

154-157: Consider returning CertificateId in response

ProcessChallengeResponse defines certificate_id; you currently omit it. Populate from the insert/update result when available.

If the generated insert returns the ID, wire it through; otherwise I can adjust the SQL to RETURNING id and plumb it into the response.

go/apps/ctrl/services/deployment/rollback.go (1)

48-55: Add a timeout around the workflow call

Ingress calls can block; set a deadline to avoid hanging request threads.

Apply:

-_, err = restateingress.Object[*hydrav1.RollbackRequest, *hydrav1.RollbackResponse](
+ctxCall, cancel := context.WithTimeout(ctx, 30*time.Second)
+defer cancel()
+_, err = restateingress.Object[*hydrav1.RollbackRequest, *hydrav1.RollbackResponse](
   s.restate,
   "hydra.v1.DeploymentService",
   sourceDeployment.ProjectID,
   "Rollback",
-).Request(ctx, &hydrav1.RollbackRequest{
+).Request(ctxCall, &hydrav1.RollbackRequest{
   SourceDeploymentId: req.Msg.GetSourceDeploymentId(),
   TargetDeploymentId: req.Msg.GetTargetDeploymentId(),
 })

Add import:

+ "time"
apps/engineering/content/docs/architecture/workflows/index.mdx (1)

27-27: Align terminology with Restate Go SDK.

“Object.Request” and “WorkflowSend.Send” are conceptual; consider matching actual SDK method names used in code to avoid confusion (e.g., typed stubs’ Request/Send names).

go/apps/ctrl/run.go (2)

103-105: Delete leftover “make go happy” line.

vaultSvc is used later; the blank assignment is now dead code.

Apply:

-	// make go happy
-	_ = vaultSvc

193-199: Guard nil Vault in Certificate workflow.

In run.go vaultSvc is nil when cfg.VaultMasterKeys is empty, and certificate.New assigns it directly to Service.vault without checks. While no current handlers dereference vault, any future use will panic. Require vaultSvc (fail startup) or add nil checks in the service (or only bind when vaultSvc ≠ nil).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 027ce45 and e2f5837.

⛔ Files ignored due to path filters (18)
  • go/buf.lock is excluded by !**/*.lock
  • go/gen/proto/ctrl/v1/acme.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/ctrl/v1/build.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/ctrl/v1/deployment.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/ctrl/v1/openapi.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/ctrl/v1/service.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/hydra/v1/certificate.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/hydra/v1/certificate_restate.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/hydra/v1/deployment.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/hydra/v1/deployment_restate.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/hydra/v1/routing.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/hydra/v1/routing_restate.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/krane/v1/deployment.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/partition/v1/gateway.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/vault/v1/object.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/gen/proto/vault/v1/service.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • go/go.sum is excluded by !**/*.sum
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (53)
  • apps/engineering/app/components/mermaid.tsx (1 hunks)
  • apps/engineering/app/docs/[[...slug]]/page.tsx (2 hunks)
  • apps/engineering/content/docs/architecture/meta.json (1 hunks)
  • apps/engineering/content/docs/architecture/workflows/creating-services.mdx (1 hunks)
  • apps/engineering/content/docs/architecture/workflows/deployment-service.mdx (1 hunks)
  • apps/engineering/content/docs/architecture/workflows/index.mdx (1 hunks)
  • apps/engineering/content/docs/architecture/workflows/meta.json (1 hunks)
  • apps/engineering/content/docs/architecture/workflows/routing-service.mdx (1 hunks)
  • apps/engineering/content/docs/infrastructure/database-schema.mdx (4 hunks)
  • apps/engineering/package.json (1 hunks)
  • deployment/docker-compose.yaml (4 hunks)
  • go/GO_DOCUMENTATION_GUIDELINES.md (8 hunks)
  • go/Makefile (5 hunks)
  • go/Tiltfile (5 hunks)
  • go/apps/ctrl/config.go (2 hunks)
  • go/apps/ctrl/run.go (5 hunks)
  • go/apps/ctrl/services/acme/certificate_workflow.go (0 hunks)
  • go/apps/ctrl/services/acme/service.go (0 hunks)
  • go/apps/ctrl/services/deployment/create_deployment.go (2 hunks)
  • go/apps/ctrl/services/deployment/deploy_workflow.go (0 hunks)
  • go/apps/ctrl/services/deployment/promote.go (2 hunks)
  • go/apps/ctrl/services/deployment/rollback.go (2 hunks)
  • go/apps/ctrl/services/deployment/service.go (1 hunks)
  • go/apps/ctrl/workflows/certificate/doc.go (1 hunks)
  • go/apps/ctrl/workflows/certificate/pem.go (1 hunks)
  • go/apps/ctrl/workflows/certificate/process_challenge_handler.go (1 hunks)
  • go/apps/ctrl/workflows/certificate/service.go (1 hunks)
  • go/apps/ctrl/workflows/deploy/deploy_handler.go (1 hunks)
  • go/apps/ctrl/workflows/deploy/doc.go (1 hunks)
  • go/apps/ctrl/workflows/deploy/domains.go (3 hunks)
  • go/apps/ctrl/workflows/deploy/helpers.go (1 hunks)
  • go/apps/ctrl/workflows/deploy/promote_handler.go (1 hunks)
  • go/apps/ctrl/workflows/deploy/rollback_handler.go (1 hunks)
  • go/apps/ctrl/workflows/deploy/service.go (1 hunks)
  • go/apps/ctrl/workflows/routing/assign_domains_handler.go (1 hunks)
  • go/apps/ctrl/workflows/routing/doc.go (1 hunks)
  • go/apps/ctrl/workflows/routing/helpers.go (1 hunks)
  • go/apps/ctrl/workflows/routing/service.go (1 hunks)
  • go/apps/ctrl/workflows/routing/switch_domains_handler.go (1 hunks)
  • go/buf.gen.connect.yaml (1 hunks)
  • go/buf.gen.restate.yaml (1 hunks)
  • go/buf.yaml (2 hunks)
  • go/cmd/ctrl/main.go (2 hunks)
  • go/demo_api/main.go (1 hunks)
  • go/go.mod (4 hunks)
  • go/k8s/manifests/ctrl.yaml (4 hunks)
  • go/k8s/manifests/restate.yaml (1 hunks)
  • go/pkg/db/domain_find_by_ids.sql_generated.go (1 hunks)
  • go/pkg/db/querier_generated.go (1 hunks)
  • go/pkg/db/queries/domain_find_by_ids.sql (1 hunks)
  • go/proto/hydra/v1/certificate.proto (1 hunks)
  • go/proto/hydra/v1/deployment.proto (1 hunks)
  • go/proto/hydra/v1/routing.proto (1 hunks)
💤 Files with no reviewable changes (3)
  • go/apps/ctrl/services/acme/service.go
  • go/apps/ctrl/services/deployment/deploy_workflow.go
  • go/apps/ctrl/services/acme/certificate_workflow.go
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-09-01T02:33:43.791Z
Learnt from: imeyer
PR: unkeyed/unkey#3899
File: go/buf.gen.yaml:0-0
Timestamp: 2025-09-01T02:33:43.791Z
Learning: In the unkeyed/unkey repository, buf commands are executed from the `go/` directory where the `buf.yaml` file is located. This means the `out: gen` configuration in `go/buf.gen.yaml` generates files to `go/gen/` relative to the repository root, which aligns with the expected directory structure.

Applied to files:

  • go/buf.gen.restate.yaml
  • go/Makefile
📚 Learning: 2025-04-22T14:43:11.724Z
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.

Applied to files:

  • apps/engineering/content/docs/infrastructure/database-schema.mdx
📚 Learning: 2025-07-02T11:51:58.572Z
Learnt from: chronark
PR: unkeyed/unkey#3420
File: go/pkg/hydra/store/gorm/gorm.go:486-498
Timestamp: 2025-07-02T11:51:58.572Z
Learning: The Hydra package (go/pkg/hydra) is planned to be migrated from GORM to sqlc for database operations, which explains why raw SQL queries are acceptable in the current implementation.

Applied to files:

  • apps/engineering/content/docs/infrastructure/database-schema.mdx
📚 Learning: 2025-09-12T08:01:20.792Z
Learnt from: Flo4604
PR: unkeyed/unkey#3944
File: go/pkg/db/acme_challenge_update_verified_with_expiry.sql_generated.go:31-39
Timestamp: 2025-09-12T08:01:20.792Z
Learning: Do not review or suggest changes to files with sql_generated.go suffix or other files marked as auto-generated (containing "Code generated by" comments), as these are generated by tools like sqlc and changes would be overwritten on regeneration.

Applied to files:

  • apps/engineering/content/docs/infrastructure/database-schema.mdx
📚 Learning: 2025-09-01T01:57:42.227Z
Learnt from: imeyer
PR: unkeyed/unkey#3899
File: go/proto/metald/v1/metald.proto:5-9
Timestamp: 2025-09-01T01:57:42.227Z
Learning: In the unkeyed/unkey repository, buf is configured to properly resolve metald proto imports like "metald/v1/vm.proto" without needing the full "go/proto/" prefix. The buf lint command `buf lint --path proto/metald` passes successfully with these relative import paths.

Applied to files:

  • go/Makefile
📚 Learning: 2025-09-15T18:12:01.503Z
Learnt from: mcstepp
PR: unkeyed/unkey#3952
File: go/apps/ctrl/run.go:212-213
Timestamp: 2025-09-15T18:12:01.503Z
Learning: In Go connect-go generated handlers, mux.Handle can accept New*ServiceHandler functions directly even though they return (string, http.Handler) tuples. The pattern mux.Handle(ctrlv1connect.New*ServiceHandler(...)) is valid and compiles successfully in unkey codebase.

Applied to files:

  • go/apps/ctrl/run.go
🧬 Code graph analysis (19)
go/pkg/db/querier_generated.go (2)
go/pkg/partition/db/database.go (1)
  • DBTX (10-10)
go/pkg/db/domain_find_by_ids.sql_generated.go (1)
  • FindDomainsByIdsRow (30-41)
go/apps/ctrl/workflows/routing/service.go (5)
go/gen/proto/hydra/v1/routing_restate.pb.go (5)
  • UnimplementedRoutingServiceServer (71-71)
  • UnimplementedRoutingServiceServer (73-75)
  • UnimplementedRoutingServiceServer (76-78)
  • UnimplementedRoutingServiceServer (79-79)
  • RoutingServiceServer (57-64)
go/apps/ctrl/services/deployment/service.go (3)
  • Service (10-18)
  • New (27-36)
  • Config (20-25)
go/apps/ctrl/workflows/certificate/service.go (3)
  • Service (19-25)
  • New (45-52)
  • Config (30-42)
go/apps/ctrl/services/acme/service.go (3)
  • Service (9-14)
  • New (22-29)
  • Config (16-20)
go/apps/ctrl/workflows/deploy/service.go (2)
  • New (52-60)
  • Config (34-49)
go/apps/ctrl/workflows/certificate/service.go (2)
go/gen/proto/hydra/v1/certificate_restate.pb.go (4)
  • UnimplementedCertificateServiceServer (61-61)
  • UnimplementedCertificateServiceServer (63-65)
  • UnimplementedCertificateServiceServer (66-66)
  • CertificateServiceServer (50-54)
go/apps/ctrl/workflows/routing/service.go (3)
  • Service (18-24)
  • New (44-51)
  • Config (29-41)
go/apps/ctrl/workflows/certificate/process_challenge_handler.go (7)
go/pkg/partition/db/models_generated.go (1)
  • Certificate (104-112)
go/apps/ctrl/workflows/certificate/service.go (1)
  • Service (19-25)
go/gen/proto/hydra/v1/certificate.pb.go (6)
  • ProcessChallengeRequest (25-31)
  • ProcessChallengeRequest (44-44)
  • ProcessChallengeRequest (59-61)
  • ProcessChallengeResponse (77-83)
  • ProcessChallengeResponse (96-96)
  • ProcessChallengeResponse (111-113)
go/pkg/db/models_generated.go (3)
  • AcmeChallengesStatusPending (20-20)
  • AcmeChallengesStatusFailed (22-22)
  • AcmeChallengesStatusVerified (21-21)
go/pkg/db/handle_err_no_rows.go (1)
  • IsNotFound (8-10)
go/pkg/partition/db/certificate_insert.sql_generated.go (1)
  • InsertCertificateParams (23-30)
go/pkg/db/acme_challenge_update_verified_with_expiry.sql_generated.go (1)
  • UpdateAcmeChallengeVerifiedWithExpiryParams (19-24)
go/apps/ctrl/services/deployment/rollback.go (3)
go/apps/ctrl/services/deployment/service.go (1)
  • Service (10-18)
go/gen/proto/hydra/v1/deployment.pb.go (6)
  • RollbackRequest (121-127)
  • RollbackRequest (140-140)
  • RollbackRequest (155-157)
  • RollbackResponse (173-177)
  • RollbackResponse (190-190)
  • RollbackResponse (205-207)
go/gen/proto/ctrl/v1/deployment.pb.go (6)
  • RollbackRequest (832-838)
  • RollbackRequest (851-851)
  • RollbackRequest (866-868)
  • RollbackResponse (884-888)
  • RollbackResponse (901-901)
  • RollbackResponse (916-918)
go/cmd/ctrl/main.go (2)
go/pkg/cli/flag.go (4)
  • String (419-451)
  • Default (364-416)
  • EnvVar (320-339)
  • Int (532-569)
go/apps/ctrl/config.go (1)
  • RestateConfig (32-46)
go/apps/ctrl/workflows/routing/helpers.go (2)
go/gen/proto/hydra/v1/routing.pb.go (7)
  • DomainSticky (26-26)
  • DomainSticky (61-63)
  • DomainSticky (65-67)
  • DomainSticky (74-76)
  • DomainSticky_DOMAIN_STICKY_BRANCH (30-30)
  • DomainSticky_DOMAIN_STICKY_ENVIRONMENT (31-31)
  • DomainSticky_DOMAIN_STICKY_LIVE (32-32)
go/pkg/db/models_generated.go (5)
  • NullDomainsSticky (259-262)
  • DomainsSticky (239-239)
  • DomainsStickyBranch (242-242)
  • DomainsStickyEnvironment (243-243)
  • DomainsStickyLive (244-244)
go/apps/ctrl/services/deployment/create_deployment.go (1)
go/gen/proto/hydra/v1/deployment.pb.go (3)
  • DeployRequest (25-32)
  • DeployRequest (45-45)
  • DeployRequest (60-62)
go/apps/ctrl/workflows/routing/assign_domains_handler.go (7)
go/apps/ctrl/workflows/routing/service.go (3)
  • Service (18-24)
  • New (44-51)
  • Config (29-41)
go/gen/proto/hydra/v1/routing.pb.go (6)
  • AssignDomainsRequest (79-93)
  • AssignDomainsRequest (106-106)
  • AssignDomainsRequest (121-123)
  • AssignDomainsResponse (226-232)
  • AssignDomainsResponse (245-245)
  • AssignDomainsResponse (260-262)
go/pkg/db/tx.go (1)
  • Tx (196-201)
go/pkg/db/handle_err_no_rows.go (1)
  • IsNotFound (8-10)
go/pkg/db/domain_insert.sql_generated.go (1)
  • InsertDomainParams (39-49)
go/pkg/db/domain_reassign.sql_generated.go (1)
  • ReassignDomainParams (22-27)
go/pkg/partition/db/gateway_upsert.sql_generated.go (1)
  • UpsertGatewayParams (31-36)
go/pkg/db/domain_find_by_ids.sql_generated.go (1)
go/pkg/db/models_generated.go (2)
  • NullDomainsSticky (259-262)
  • DomainsType (282-282)
go/apps/ctrl/workflows/deploy/rollback_handler.go (10)
go/apps/ctrl/workflows/deploy/service.go (1)
  • Workflow (22-29)
go/pkg/db/deployment_find_by_id.sql_generated.go (1)
  • FindDeploymentByIdRow (35-51)
go/pkg/db/handle_err_no_rows.go (1)
  • IsNotFound (8-10)
go/pkg/db/project_find_by_id.sql_generated.go (1)
  • FindProjectByIdRow (30-42)
go/pkg/partition/db/models_generated.go (2)
  • Vm (137-145)
  • VmsStatusRunning (63-63)
go/pkg/db/domain_find_for_rollback.sql_generated.go (2)
  • FindDomainsForRollbackRow (37-47)
  • FindDomainsForRollbackParams (32-35)
go/pkg/db/models_generated.go (4)
  • NullDomainsSticky (259-262)
  • DomainsSticky (239-239)
  • DomainsStickyLive (244-244)
  • DomainsStickyEnvironment (243-243)
go/gen/proto/hydra/v1/routing_restate.pb.go (1)
  • NewRoutingServiceClient (30-37)
go/gen/proto/hydra/v1/routing.pb.go (3)
  • SwitchDomainsRequest (272-279)
  • SwitchDomainsRequest (292-292)
  • SwitchDomainsRequest (307-309)
go/pkg/db/project_update_deployments.sql_generated.go (1)
  • UpdateProjectDeploymentsParams (22-27)
go/apps/ctrl/workflows/deploy/promote_handler.go (11)
go/apps/ctrl/workflows/deploy/service.go (1)
  • Workflow (22-29)
go/gen/proto/hydra/v1/deployment.pb.go (6)
  • PromoteRequest (209-214)
  • PromoteRequest (227-227)
  • PromoteRequest (242-244)
  • PromoteResponse (253-257)
  • PromoteResponse (270-270)
  • PromoteResponse (285-287)
go/pkg/db/deployment_find_by_id.sql_generated.go (1)
  • FindDeploymentByIdRow (35-51)
go/pkg/db/handle_err_no_rows.go (1)
  • IsNotFound (8-10)
go/pkg/db/project_find_by_id.sql_generated.go (1)
  • FindProjectByIdRow (30-42)
go/pkg/db/models_generated.go (5)
  • DeploymentsStatusReady (200-200)
  • NullDomainsSticky (259-262)
  • DomainsSticky (239-239)
  • DomainsStickyLive (244-244)
  • DomainsStickyEnvironment (243-243)
go/pkg/partition/db/models_generated.go (2)
  • Vm (137-145)
  • VmsStatusRunning (63-63)
go/pkg/db/domain_find_for_promotion.sql_generated.go (2)
  • FindDomainsForPromotionRow (37-47)
  • FindDomainsForPromotionParams (32-35)
go/gen/proto/hydra/v1/routing_restate.pb.go (1)
  • NewRoutingServiceClient (30-37)
go/gen/proto/hydra/v1/routing.pb.go (3)
  • SwitchDomainsRequest (272-279)
  • SwitchDomainsRequest (292-292)
  • SwitchDomainsRequest (307-309)
go/pkg/db/project_update_deployments.sql_generated.go (1)
  • UpdateProjectDeploymentsParams (22-27)
go/apps/ctrl/services/deployment/service.go (5)
go/apps/ctrl/workflows/certificate/service.go (3)
  • New (45-52)
  • Config (30-42)
  • Service (19-25)
go/apps/ctrl/workflows/deploy/service.go (2)
  • New (52-60)
  • Config (34-49)
go/apps/ctrl/workflows/routing/service.go (3)
  • New (44-51)
  • Config (29-41)
  • Service (18-24)
go/apps/ctrl/services/acme/service.go (3)
  • New (22-29)
  • Config (16-20)
  • Service (9-14)
go/apps/ctrl/config.go (1)
  • Config (48-104)
go/apps/ctrl/workflows/routing/switch_domains_handler.go (6)
go/apps/ctrl/workflows/routing/service.go (2)
  • Service (18-24)
  • Config (29-41)
go/gen/proto/hydra/v1/routing.pb.go (6)
  • SwitchDomainsRequest (272-279)
  • SwitchDomainsRequest (292-292)
  • SwitchDomainsRequest (307-309)
  • SwitchDomainsResponse (325-329)
  • SwitchDomainsResponse (342-342)
  • SwitchDomainsResponse (357-359)
go/pkg/partition/db/gateway_find_by_deployment_id.sql_generated.go (1)
  • FindGatewayByDeploymentIdRow (20-23)
go/pkg/db/domain_find_by_ids.sql_generated.go (1)
  • FindDomainsByIdsRow (30-41)
go/pkg/partition/db/gateway_upsert.sql_generated.go (1)
  • UpsertGatewayParams (31-36)
go/pkg/db/domain_reassign.sql_generated.go (1)
  • ReassignDomainParams (22-27)
go/apps/ctrl/services/deployment/promote.go (2)
go/apps/ctrl/services/deployment/service.go (1)
  • Service (10-18)
go/gen/proto/hydra/v1/deployment.pb.go (6)
  • PromoteRequest (209-214)
  • PromoteRequest (227-227)
  • PromoteRequest (242-244)
  • PromoteResponse (253-257)
  • PromoteResponse (270-270)
  • PromoteResponse (285-287)
go/apps/ctrl/workflows/deploy/helpers.go (5)
go/gen/proto/krane/v1/deployment.pb.go (3)
  • Instance (520-527)
  • Instance (540-540)
  • Instance (555-557)
go/gen/proto/partition/v1/gateway.pb.go (12)
  • GatewayConfig (26-37)
  • GatewayConfig (50-50)
  • GatewayConfig (65-67)
  • Deployment (104-110)
  • Deployment (123-123)
  • Deployment (138-140)
  • VM (208-213)
  • VM (226-226)
  • VM (241-243)
  • AuthConfig (253-258)
  • AuthConfig (271-271)
  • AuthConfig (286-288)
go/apps/ctrl/workflows/deploy/service.go (1)
  • Workflow (22-29)
go/pkg/db/models_generated.go (1)
  • DeploymentsStatus (193-193)
go/pkg/db/deployment_update_status.sql_generated.go (1)
  • UpdateDeploymentStatusParams (19-23)
go/apps/ctrl/workflows/deploy/service.go (3)
go/gen/proto/hydra/v1/deployment_restate.pb.go (7)
  • UnimplementedDeploymentServiceServer (73-73)
  • UnimplementedDeploymentServiceServer (75-77)
  • UnimplementedDeploymentServiceServer (78-80)
  • UnimplementedDeploymentServiceServer (81-83)
  • UnimplementedDeploymentServiceServer (84-84)
  • DeploymentServiceClient (15-19)
  • DeploymentServiceServer (62-66)
go/apps/ctrl/workflows/routing/service.go (2)
  • New (44-51)
  • Config (29-41)
go/apps/ctrl/config.go (1)
  • Config (48-104)
go/apps/ctrl/workflows/deploy/deploy_handler.go (14)
go/apps/ctrl/workflows/deploy/service.go (1)
  • Workflow (22-29)
go/gen/proto/hydra/v1/deployment.pb.go (6)
  • DeployRequest (25-32)
  • DeployRequest (45-45)
  • DeployRequest (60-62)
  • DeployResponse (85-89)
  • DeployResponse (102-102)
  • DeployResponse (117-119)
go/pkg/db/deployment_find_by_id.sql_generated.go (1)
  • FindDeploymentByIdRow (35-51)
go/pkg/db/models_generated.go (5)
  • DeploymentsStatusFailed (201-201)
  • Workspace (814-832)
  • DeploymentsStatusBuilding (197-197)
  • DeploymentsStatusDeploying (198-198)
  • DeploymentsStatusReady (200-200)
go/pkg/db/project_find_by_id.sql_generated.go (1)
  • FindProjectByIdRow (30-42)
go/pkg/db/environment_find_by_id.sql_generated.go (1)
  • FindEnvironmentByIdRow (19-25)
go/pkg/db/deployment_step_insert.sql_generated.go (1)
  • InsertDeploymentStepParams (33-40)
go/gen/proto/krane/v1/deployment.pb.go (6)
  • DeploymentRequest (76-86)
  • DeploymentRequest (99-99)
  • DeploymentRequest (114-116)
  • Instance (520-527)
  • Instance (540-540)
  • Instance (555-557)
go/gen/proto/partition/v1/gateway.pb.go (15)
  • Deployment (104-110)
  • Deployment (123-123)
  • Deployment (138-140)
  • GatewayConfig (26-37)
  • GatewayConfig (50-50)
  • GatewayConfig (65-67)
  • VM (208-213)
  • VM (226-226)
  • VM (241-243)
  • AuthConfig (253-258)
  • AuthConfig (271-271)
  • AuthConfig (286-288)
  • ValidationConfig (298-304)
  • ValidationConfig (317-317)
  • ValidationConfig (332-334)
go/pkg/partition/db/models_generated.go (5)
  • VmsStatus (57-57)
  • VmsStatusProvisioning (61-61)
  • VmsStatusRunning (63-63)
  • VmsStatusStopping (64-64)
  • VmsStatusAllocated (60-60)
go/pkg/partition/db/vm_upsert.sql_generated.go (1)
  • UpsertVMParams (24-31)
go/pkg/db/deployment_update_openapi_spec.sql_generated.go (1)
  • UpdateDeploymentOpenapiSpecParams (19-23)
go/gen/proto/hydra/v1/routing.pb.go (10)
  • DomainToAssign (174-180)
  • DomainToAssign (193-193)
  • DomainToAssign (208-210)
  • DomainSticky_DOMAIN_STICKY_UNSPECIFIED (29-29)
  • DomainSticky_DOMAIN_STICKY_BRANCH (30-30)
  • DomainSticky_DOMAIN_STICKY_ENVIRONMENT (31-31)
  • DomainSticky_DOMAIN_STICKY_LIVE (32-32)
  • AssignDomainsRequest (79-93)
  • AssignDomainsRequest (106-106)
  • AssignDomainsRequest (121-123)
go/gen/proto/hydra/v1/routing_restate.pb.go (1)
  • NewRoutingServiceClient (30-37)
go/apps/ctrl/run.go (8)
go/gen/proto/hydra/v1/deployment_restate.pb.go (1)
  • NewDeploymentServiceServer (93-107)
go/apps/ctrl/services/deployment/service.go (2)
  • New (27-36)
  • Config (20-25)
go/apps/ctrl/workflows/certificate/service.go (2)
  • New (45-52)
  • Config (30-42)
go/apps/ctrl/workflows/deploy/service.go (2)
  • New (52-60)
  • Config (34-49)
go/apps/ctrl/workflows/routing/service.go (2)
  • New (44-51)
  • Config (29-41)
go/gen/proto/hydra/v1/routing_restate.pb.go (1)
  • NewRoutingServiceServer (88-101)
go/gen/proto/hydra/v1/certificate_restate.pb.go (1)
  • NewCertificateServiceServer (75-87)
go/pkg/retry/retry.go (2)
  • Attempts (81-86)
  • Backoff (91-97)
🪛 Buf (1.57.2)
go/proto/hydra/v1/certificate.proto

5-5: import "dev/restate/sdk/go.proto": file does not exist

(COMPILE)

go/proto/hydra/v1/routing.proto

5-5: import "dev/restate/sdk/go.proto": file does not exist

(COMPILE)

go/proto/hydra/v1/deployment.proto

5-5: import "dev/restate/sdk/go.proto": file does not exist

(COMPILE)

🪛 Checkov (3.2.334)
go/k8s/manifests/restate.yaml

[medium] 15-64: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[medium] 15-64: Minimize the admission of root containers

(CKV_K8S_23)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Test API / API Test Local
  • GitHub Check: Test Go API Local / Test
  • GitHub Check: Test Packages / Test
  • GitHub Check: Build / Build

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
go/apps/ctrl/run.go (1)

206-251: Harden Restate admin registration (duplicate issue).

As noted in previous reviews, the registration logic needs hardening:

  1. Guard on both RegisterAs and AdminURL (line 207) - currently only checks RegisterAs.
  2. JSON-marshal the payload (line 213) - using fmt.Sprintf for JSON is fragile and doesn't escape properly.
  3. Use a custom http.Client with timeout (line 232) - http.DefaultClient has no timeout.
  4. Replace fixed 5s backoff with jittered backoff (lines 219-221) - all retries occur at exactly 5s intervals.

Apply these changes:

+	import "encoding/json"
+
-	if cfg.Restate.RegisterAs != "" {
+	if cfg.Restate.RegisterAs != "" && cfg.Restate.AdminURL != "" {
 		go func() {
 			time.Sleep(2 * time.Second)
 
 			registerURL := fmt.Sprintf("%s/deployments", cfg.Restate.AdminURL)
-			payload := fmt.Sprintf(`{"uri": "%s"}`, cfg.Restate.RegisterAs)
+			type registrationPayload struct {
+				URI string `json:"uri"`
+			}
+			body, _ := json.Marshal(registrationPayload{URI: cfg.Restate.RegisterAs})
 
 			logger.Info("Registering with Restate", "admin_url", registerURL, "service_uri", cfg.Restate.RegisterAs)
 
 			retrier := retry.New(
 				retry.Attempts(10),
 				retry.Backoff(func(n int) time.Duration {
-					return 5 * time.Second
+					// Exponential backoff with jitter: 2^n seconds + random jitter up to 1s
+					base := time.Duration(1<<uint(n)) * time.Second
+					jitter := time.Duration(rand.Intn(1000)) * time.Millisecond
+					return base + jitter
 				}),
 			)
 
 			err := retrier.Do(func() error {
-				req, err := http.NewRequestWithContext(ctx, "POST", registerURL, bytes.NewBufferString(payload))
+				req, err := http.NewRequestWithContext(ctx, http.MethodPost, registerURL, bytes.NewReader(body))
 				if err != nil {
 					return fmt.Errorf("failed to create registration request: %w", err)
 				}
 
 				req.Header.Set("Content-Type", "application/json")
 
-				resp, err := http.DefaultClient.Do(req)
+				client := &http.Client{Timeout: 5 * time.Second}
+				resp, err := client.Do(req)
 				if err != nil {
 					return fmt.Errorf("failed to register with Restate: %w", err)
 				}

Note: You'll need to import encoding/json and math/rand for the jittered backoff.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e2f5837 and 0d2c6ed.

📒 Files selected for processing (1)
  • go/apps/ctrl/run.go (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-15T18:12:01.503Z
Learnt from: mcstepp
PR: unkeyed/unkey#3952
File: go/apps/ctrl/run.go:212-213
Timestamp: 2025-09-15T18:12:01.503Z
Learning: In Go connect-go generated handlers, mux.Handle can accept New*ServiceHandler functions directly even though they return (string, http.Handler) tuples. The pattern mux.Handle(ctrlv1connect.New*ServiceHandler(...)) is valid and compiles successfully in unkey codebase.

Applied to files:

  • go/apps/ctrl/run.go
🧬 Code graph analysis (1)
go/apps/ctrl/run.go (8)
go/gen/proto/hydra/v1/deployment_restate.pb.go (1)
  • NewDeploymentServiceServer (93-107)
go/apps/ctrl/services/deployment/service.go (2)
  • New (27-36)
  • Config (20-25)
go/apps/ctrl/workflows/certificate/service.go (2)
  • New (45-52)
  • Config (30-42)
go/apps/ctrl/workflows/deploy/service.go (2)
  • New (52-60)
  • Config (34-49)
go/apps/ctrl/workflows/routing/service.go (2)
  • New (44-51)
  • Config (29-41)
go/gen/proto/hydra/v1/routing_restate.pb.go (1)
  • NewRoutingServiceServer (88-101)
go/gen/proto/hydra/v1/certificate_restate.pb.go (1)
  • NewCertificateServiceServer (75-87)
go/pkg/retry/retry.go (2)
  • Attempts (81-86)
  • Backoff (91-97)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test API / API Test Local
  • GitHub Check: Test Packages / Test
🔇 Additional comments (2)
go/apps/ctrl/run.go (2)

274-279: LGTM!

The deployment service handler correctly uses the new deployment.Config structure with the Restate client. The wiring aligns with the Restate migration architecture.


198-204: Fail fast on Restate server start errors.

  • Add a buffered error channel and send any Start failures to it; before the signal wait, select on the channel to exit on error:
+   restateErrCh := make(chan error, 1)
    go func() {
        addr := fmt.Sprintf(":%d", cfg.Restate.HttpPort)
        logger.Info("Starting Restate server", "addr", addr)
        if startErr := restateSrv.Start(ctx, addr); startErr != nil {
            logger.Error("failed to start restate server", "error", startErr.Error())
+           restateErrCh <- startErr
        }
    }()
+   select {
+   case err := <-restateErrCh:
+       return err
+   default:
+   }
  • Check if the Restate SDK provides a Close or Shutdown method and register it with shutdowns.RegisterCtx for graceful teardown.

@chronark chronark merged commit efbe477 into main Oct 3, 2025
17 checks passed
@chronark chronark deleted the 10-02-refactor_remove_hydra_in_favor_of_restate branch October 3, 2025 10:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants