refactor: remove hydra in favor of restate#4053
Conversation
This is mostly generated code changes, don't be scared I've already chatted with @Flo4604 earlier, but I noticed our hydra implementation is quite flawed. We had race conditions, cases where a workflow was not durable at all when workers crashed and after spending a few hours trying to fix it, I figured it's not worth it. This PR rips our hydra and powers our durable workflows via restate.dev instead. It's already in sandbox btw. I realised we can either spend a lot of time to build this inhouse, or focus on what really matters. Restate has a very nice balance between features and dependencies. It ships as a single binary for devmode and they also have a k8s operator, which we can use later in prod. Flo also has experience running it in prod for steamsets. The actual workflow implementation is fairly similar, so the business logic is almost the same. The exception is that we use a dedicated "Virtual Object" for domain and gateway assignments, so these are always serialized and we can not have race conditions there.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
|
Warning Rate limit exceeded@chronark has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 3 minutes and 42 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughReplaces Hydra-based in-process workflows with Restate virtual-object workflows across deploy, routing, and certificate; adds Restate infra (Docker/K8s/Tilt/Make), new protobufs and buf/generation, updates ctrl config/CLI/run wiring, adds Go workflow packages and DB query, and adds a Mermaid MDX component for docs. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Client
participant API as ctrl (gRPC/HTTP)
participant Ingress as Restate Ingress
participant Server as Restate Server
participant WF as Workflow (Deploy/Routing/Cert)
participant DB as DB/PartitionDB
participant Krane as Krane
Note over API,Ingress: ctrl delegates long-running ops to Restate virtual objects
Client->>API: Deploy / Promote / Rollback request
API->>Ingress: WorkflowSend / Object.Request (hydra.v1)
Ingress->>Server: route to virtual object
Server->>WF: Invoke Deploy/Promote/Rollback handler
WF->>DB: durable steps (load/update entities)
WF->>Krane: create & poll deployment (Deploy)
alt domain config needed
WF->>Ingress: Object.Request (RoutingService.AssignDomains / SwitchDomains)
Ingress->>Server: route to Routing workflow
Server->>WF: Routing handler upserts domains & gateway configs
WF->>DB: store gateway configs
end
WF-->>Server: complete
Server-->>Ingress: response
Ingress-->>API: workflow started/response
API-->>Client: 202/2xx or error
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120+ minutes Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
go/GO_DOCUMENTATION_GUIDELINES.md (1)
401-402: Clarify package comment styleLine 401 says to reserve
/* */for package docs, but the doc.go example above (Lines 39-77) uses//line comments. That contradiction will confuse contributors. Please align the rule and example—either bless//for package docs or adjust the example accordingly.Apply this diff if you want to keep using
//consistently:-2. Use `//` for function comments, `/* */` only for package-level overviews +2. Prefer `//` line comments everywhere (including package docs) for consistency with Go tooling expectations; reserve `/* */` only when absolutely necessary.go/apps/ctrl/services/deployment/promote.go (1)
20-31: Enforce workspace ownership in Promote
In go/apps/ctrl/services/deployment/promote.go, after loadingtargetDeployment, compare itsWorkspaceIDagainst the caller’s workspace ID (extracted from auth context) and return a permission‐denied error if they differ.
🧹 Nitpick comments (17)
deployment/docker-compose.yaml (1)
293-293: Consider pinning Restate image version.Using
latesttag in production-like configurations can lead to unexpected behavior when the image is updated. Consider pinning to a specific version for reproducibility.- image: docker.io/restatedev/restate:latest + image: docker.io/restatedev/restate:1.2.3 # Replace with actual versiongo/apps/ctrl/config.go (1)
103-103: Consider adding Restate configuration validation.The
Config.Validate()method (lines 106-115) validates ACME/Cloudflare configuration but does not validate the newRestatefield. If the control plane requires Restate to function, consider adding validation checks for required Restate configuration (e.g., non-empty URLs, valid port range).For example, you could add validation like:
func (c Config) Validate() error { // Validate Cloudflare configuration if enabled if c.Acme.Enabled && c.Acme.Cloudflare.Enabled { if err := assert.NotEmpty(c.Acme.Cloudflare.ApiToken, "cloudflare API token is required when cloudflare is enabled"); err != nil { return err } } + + // Validate Restate configuration if workflows are enabled + if err := assert.NotEmpty(c.Restate.IngressURL, "restate ingress URL is required"); err != nil { + return err + } + if err := assert.NotEmpty(c.Restate.AdminURL, "restate admin URL is required"); err != nil { + return err + } + if c.Restate.HttpPort <= 0 || c.Restate.HttpPort > 65535 { + return fmt.Errorf("restate HTTP port must be between 1 and 65535, got %d", c.Restate.HttpPort) + } return nil }go/apps/ctrl/workflows/routing/assign_domains_handler.go (2)
111-150: Revisit operation ordering for stronger consistency guarantees.Consider upserting gateway configs first, then applying domain updates (or use a saga with compensation) to match the “gateway-before-domain” invariant and avoid windows of inconsistent routing.
116-135: Avoid duplicate gateway upserts for repeated domains.Dedup domains locally before building gatewayParams.
Example:
- for _, domainName := range changedDomains { + seen := map[string]struct{}{} + for _, domainName := range changedDomains { + if _, ok := seen[domainName]; ok { + continue + } + seen[domainName] = struct{}{}go/apps/ctrl/workflows/routing/service.go (1)
43-51: Validate required config in New().Fail fast if DB/PartitionDB/logger are nil or DefaultDomain is empty to prevent subtle runtime issues (e.g., local-domain filtering).
Example:
func New(cfg Config) *Service { + if cfg.DB == nil || cfg.PartitionDB == nil { + panic("routing.Service: DB and PartitionDB are required") + } + if cfg.Logger == (logging.Logger{}) { + panic("routing.Service: Logger is required") + } + if cfg.DefaultDomain == "" { + panic("routing.Service: DefaultDomain is required") + } return &Service{go/k8s/manifests/restate.yaml (1)
33-58: Harden the pod securityContext.We’re shipping this StatefulSet without any pod/container securityContext, so by default the container can run as root with privilege escalation permitted. That trips CKV_K8S_20/23 and leaves us exposed if the Restate image is ever compromised. Please explicitly set
runAsNonRoot: true,allowPrivilegeEscalation: false, and related hardening fields (fsGroup, seccomp, etc.) so we get the expected defense-in-depth posture.go/apps/ctrl/services/deployment/create_deployment.go (3)
147-151: Validate docker image and verify KeyAuthId mapping
- Validate req.Msg.GetDockerImage() is non-empty before inserting/dispatching.
- Confirm KeyAuthId should be populated from KeyspaceId. If the field expects a key-auth credential id, this may be incorrect.
154-160: Add a per-call timeout to the Restate sendPrevent hangs if Restate is slow/unreachable.
Apply this diff:
- invocation := restateingress.WorkflowSend[*hydrav1.DeployRequest]( + ctx, cancel := context.WithTimeout(ctx, 15*time.Second) + defer cancel() + invocation := restateingress.WorkflowSend[*hydrav1.DeployRequest]( s.restate, "hydra.v1.DeploymentService", project.ID, "Deploy", ).Send(ctx, deployReq)
164-167: Optional: persist invocation_id for traceabilityConsider storing invocation.Id on the deployment record (if a column exists) for easier cross-system debugging.
apps/engineering/content/docs/architecture/workflows/creating-services.mdx (1)
126-146: Recommend per-call timeouts in samplesAdd context.WithTimeout around ingress calls to avoid indefinite waits and improve resilience.
Example:
ctx, cancel := context.WithTimeout(ctx, 15*time.Second) defer cancel() resp, err := restateingress.Object[...](...).Request(ctx, request)go/apps/ctrl/services/deployment/promote.go (2)
35-42: Add per-call timeout to Restate requestGuard against slow/unreachable Restate.
Apply this diff:
-import ( - "context" - "fmt" - - "connectrpc.com/connect" - restateingress "github.com/restatedev/sdk-go/ingress" - ctrlv1 "github.com/unkeyed/unkey/go/gen/proto/ctrl/v1" - hydrav1 "github.com/unkeyed/unkey/go/gen/proto/hydra/v1" - "github.com/unkeyed/unkey/go/pkg/db" -) +import ( + "context" + "fmt" + "time" + + "connectrpc.com/connect" + restateingress "github.com/restatedev/sdk-go/ingress" + ctrlv1 "github.com/unkeyed/unkey/go/gen/proto/ctrl/v1" + hydrav1 "github.com/unkeyed/unkey/go/gen/proto/hydra/v1" + "github.com/unkeyed/unkey/go/pkg/db" +) @@ - _, err = restateingress.Object[*hydrav1.PromoteRequest, *hydrav1.PromoteResponse]( + ctx, cancel := context.WithTimeout(ctx, 30*time.Second) + defer cancel() + _, err = restateingress.Object[*hydrav1.PromoteRequest, *hydrav1.PromoteResponse]( s.restate, "hydra.v1.DeploymentService", targetDeployment.ProjectID, "Promote", ).Request(ctx, &hydrav1.PromoteRequest{ TargetDeploymentId: req.Msg.GetTargetDeploymentId(), })
44-50: Optional: propagate workflow error codesIf the workflow returns structured Connect errors (e.g., NotFound, InvalidArgument), unwrap/map them instead of always returning CodeInternal to improve client UX.
apps/engineering/content/docs/architecture/workflows/index.mdx (1)
27-27: Consider adding concrete code examples.The phrase "See the Go implementation files for examples" is somewhat vague. Consider adding a brief inline code snippet showing a typical service communication pattern or linking to specific example files/line numbers.
go/apps/ctrl/workflows/deploy/rollback_handler.go (1)
130-138: Remove redundant filtering logic.The loop at lines 132-136 re-filters domains by sticky type, but the query at lines 111-117 already filters by these exact sticky types. The domain slice returned from the query only contains domains with sticky types of Live or Environment.
Apply this diff to simplify:
- // Collect domain IDs - var domainIDs []string - for _, domain := range domains { - if domain.Sticky.Valid && - (domain.Sticky.DomainsSticky == db.DomainsStickyLive || - domain.Sticky.DomainsSticky == db.DomainsStickyEnvironment) { - domainIDs = append(domainIDs, domain.ID) - } - } + // Collect domain IDs (already filtered by query) + domainIDs := make([]string, len(domains)) + for i, domain := range domains { + domainIDs[i] = domain.ID + }go/apps/ctrl/workflows/certificate/pem.go (3)
16-30: Add nil check for robustness.The function does not validate that
privateKeyis non-nil before marshaling. While the caller may ensure this, adding a defensive check prevents potential panics.Apply this diff:
func privateKeyToString(privateKey *ecdsa.PrivateKey) (string, error) { + if privateKey == nil { + return "", fmt.Errorf("private key cannot be nil") + } // Marshal the private key to DER format privKeyBytes, err := x509.MarshalECPrivateKey(privateKey)
35-49: Consider validating PEM block type.The function decodes the PEM block but does not verify that
block.Typeis "EC PRIVATE KEY". Whilex509.ParseECPrivateKeywill fail on invalid data, an explicit type check would provide clearer error messages for mismatched PEM types.func stringToPrivateKey(pemString string) (*ecdsa.PrivateKey, error) { // Decode PEM format block, _ := pem.Decode([]byte(pemString)) if block == nil { return nil, fmt.Errorf("failed to decode PEM block") } + if block.Type != "EC PRIVATE KEY" { + return nil, fmt.Errorf("invalid PEM block type: expected 'EC PRIVATE KEY', got '%s'", block.Type) + } // Parse the EC private key privateKey, err := x509.ParseECPrivateKey(block.Bytes)
55-67: Consider validating PEM block type.Similar to
stringToPrivateKey, this function could benefit from validating thatblock.Typeis "CERTIFICATE" to provide clearer error messages when the wrong PEM type is provided.func getCertificateExpiry(certPEM string) (time.Time, error) { block, _ := pem.Decode([]byte(certPEM)) if block == nil { return time.Time{}, fmt.Errorf("failed to decode PEM block") } + if block.Type != "CERTIFICATE" { + return time.Time{}, fmt.Errorf("invalid PEM block type: expected 'CERTIFICATE', got '%s'", block.Type) + } cert, err := x509.ParseCertificate(block.Bytes)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (18)
go/buf.lockis excluded by!**/*.lockgo/gen/proto/ctrl/v1/acme.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/ctrl/v1/build.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/ctrl/v1/deployment.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/ctrl/v1/openapi.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/ctrl/v1/service.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/hydra/v1/certificate.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/hydra/v1/certificate_restate.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/hydra/v1/deployment.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/hydra/v1/deployment_restate.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/hydra/v1/routing.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/hydra/v1/routing_restate.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/krane/v1/deployment.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/partition/v1/gateway.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/vault/v1/object.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/vault/v1/service.pb.gois excluded by!**/*.pb.go,!**/gen/**go/go.sumis excluded by!**/*.sumpnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (53)
apps/engineering/app/components/mermaid.tsx(1 hunks)apps/engineering/app/docs/[[...slug]]/page.tsx(2 hunks)apps/engineering/content/docs/architecture/meta.json(1 hunks)apps/engineering/content/docs/architecture/workflows/creating-services.mdx(1 hunks)apps/engineering/content/docs/architecture/workflows/deployment-service.mdx(1 hunks)apps/engineering/content/docs/architecture/workflows/index.mdx(1 hunks)apps/engineering/content/docs/architecture/workflows/meta.json(1 hunks)apps/engineering/content/docs/architecture/workflows/routing-service.mdx(1 hunks)apps/engineering/content/docs/infrastructure/database-schema.mdx(4 hunks)apps/engineering/package.json(1 hunks)deployment/docker-compose.yaml(4 hunks)go/GO_DOCUMENTATION_GUIDELINES.md(8 hunks)go/Makefile(5 hunks)go/Tiltfile(5 hunks)go/apps/ctrl/config.go(2 hunks)go/apps/ctrl/run.go(5 hunks)go/apps/ctrl/services/acme/certificate_workflow.go(0 hunks)go/apps/ctrl/services/acme/service.go(0 hunks)go/apps/ctrl/services/deployment/create_deployment.go(2 hunks)go/apps/ctrl/services/deployment/deploy_workflow.go(0 hunks)go/apps/ctrl/services/deployment/promote.go(2 hunks)go/apps/ctrl/services/deployment/rollback.go(2 hunks)go/apps/ctrl/services/deployment/service.go(1 hunks)go/apps/ctrl/workflows/certificate/doc.go(1 hunks)go/apps/ctrl/workflows/certificate/pem.go(1 hunks)go/apps/ctrl/workflows/certificate/process_challenge_handler.go(1 hunks)go/apps/ctrl/workflows/certificate/service.go(1 hunks)go/apps/ctrl/workflows/deploy/deploy_handler.go(1 hunks)go/apps/ctrl/workflows/deploy/doc.go(1 hunks)go/apps/ctrl/workflows/deploy/domains.go(3 hunks)go/apps/ctrl/workflows/deploy/helpers.go(1 hunks)go/apps/ctrl/workflows/deploy/promote_handler.go(1 hunks)go/apps/ctrl/workflows/deploy/rollback_handler.go(1 hunks)go/apps/ctrl/workflows/deploy/service.go(1 hunks)go/apps/ctrl/workflows/routing/assign_domains_handler.go(1 hunks)go/apps/ctrl/workflows/routing/doc.go(1 hunks)go/apps/ctrl/workflows/routing/helpers.go(1 hunks)go/apps/ctrl/workflows/routing/service.go(1 hunks)go/apps/ctrl/workflows/routing/switch_domains_handler.go(1 hunks)go/buf.gen.connect.yaml(1 hunks)go/buf.gen.restate.yaml(1 hunks)go/buf.yaml(2 hunks)go/cmd/ctrl/main.go(2 hunks)go/demo_api/main.go(1 hunks)go/go.mod(4 hunks)go/k8s/manifests/ctrl.yaml(4 hunks)go/k8s/manifests/restate.yaml(1 hunks)go/pkg/db/domain_find_by_ids.sql_generated.go(1 hunks)go/pkg/db/querier_generated.go(1 hunks)go/pkg/db/queries/domain_find_by_ids.sql(1 hunks)go/proto/hydra/v1/certificate.proto(1 hunks)go/proto/hydra/v1/deployment.proto(1 hunks)go/proto/hydra/v1/routing.proto(1 hunks)
💤 Files with no reviewable changes (3)
- go/apps/ctrl/services/acme/service.go
- go/apps/ctrl/services/acme/certificate_workflow.go
- go/apps/ctrl/services/deployment/deploy_workflow.go
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-09-01T02:33:43.791Z
Learnt from: imeyer
PR: unkeyed/unkey#3899
File: go/buf.gen.yaml:0-0
Timestamp: 2025-09-01T02:33:43.791Z
Learning: In the unkeyed/unkey repository, buf commands are executed from the `go/` directory where the `buf.yaml` file is located. This means the `out: gen` configuration in `go/buf.gen.yaml` generates files to `go/gen/` relative to the repository root, which aligns with the expected directory structure.
Applied to files:
go/Makefile
📚 Learning: 2025-09-01T01:57:42.227Z
Learnt from: imeyer
PR: unkeyed/unkey#3899
File: go/proto/metald/v1/metald.proto:5-9
Timestamp: 2025-09-01T01:57:42.227Z
Learning: In the unkeyed/unkey repository, buf is configured to properly resolve metald proto imports like "metald/v1/vm.proto" without needing the full "go/proto/" prefix. The buf lint command `buf lint --path proto/metald` passes successfully with these relative import paths.
Applied to files:
go/Makefile
📚 Learning: 2025-04-22T14:43:11.724Z
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
Applied to files:
apps/engineering/content/docs/infrastructure/database-schema.mdx
📚 Learning: 2025-07-02T11:51:58.572Z
Learnt from: chronark
PR: unkeyed/unkey#3420
File: go/pkg/hydra/store/gorm/gorm.go:486-498
Timestamp: 2025-07-02T11:51:58.572Z
Learning: The Hydra package (go/pkg/hydra) is planned to be migrated from GORM to sqlc for database operations, which explains why raw SQL queries are acceptable in the current implementation.
Applied to files:
apps/engineering/content/docs/infrastructure/database-schema.mdx
📚 Learning: 2025-09-12T08:01:20.792Z
Learnt from: Flo4604
PR: unkeyed/unkey#3944
File: go/pkg/db/acme_challenge_update_verified_with_expiry.sql_generated.go:31-39
Timestamp: 2025-09-12T08:01:20.792Z
Learning: Do not review or suggest changes to files with sql_generated.go suffix or other files marked as auto-generated (containing "Code generated by" comments), as these are generated by tools like sqlc and changes would be overwritten on regeneration.
Applied to files:
apps/engineering/content/docs/infrastructure/database-schema.mdx
📚 Learning: 2025-09-15T18:12:01.503Z
Learnt from: mcstepp
PR: unkeyed/unkey#3952
File: go/apps/ctrl/run.go:212-213
Timestamp: 2025-09-15T18:12:01.503Z
Learning: In Go connect-go generated handlers, mux.Handle can accept New*ServiceHandler functions directly even though they return (string, http.Handler) tuples. The pattern mux.Handle(ctrlv1connect.New*ServiceHandler(...)) is valid and compiles successfully in unkey codebase.
Applied to files:
go/apps/ctrl/run.go
🪛 Buf (1.57.2)
go/proto/hydra/v1/routing.proto
5-5: import "dev/restate/sdk/go.proto": file does not exist
(COMPILE)
go/proto/hydra/v1/certificate.proto
5-5: import "dev/restate/sdk/go.proto": file does not exist
(COMPILE)
go/proto/hydra/v1/deployment.proto
5-5: import "dev/restate/sdk/go.proto": file does not exist
(COMPILE)
🪛 Checkov (3.2.334)
go/k8s/manifests/restate.yaml
[medium] 15-64: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 15-64: Minimize the admission of root containers
(CKV_K8S_23)
🔇 Additional comments (39)
go/buf.gen.connect.yaml (1)
2-3: LGTM! Standard buf managed mode enablement.Enabling managed mode is a standard practice for buf-managed proto generation and aligns with the broader Restate integration.
go/pkg/db/queries/domain_find_by_ids.sql (1)
1-14: LGTM! Well-formed batch domain query.The query correctly uses
sqlc.slice(ids)for parameterized IN clause and selects all relevant domain fields. This supports the new multi-id retrieval capability mentioned in the generated code.apps/engineering/content/docs/architecture/meta.json (1)
6-6: LGTM! Adds workflows documentation page.The addition of "workflows" to the architecture pages aligns with the new durable workflows documentation introduced in this PR.
go/apps/ctrl/workflows/deploy/domains.go (2)
18-35: LGTM! Excellent documentation of domain generation strategy.The expanded documentation clearly explains the three domain types (per-commit, per-branch, per-environment) and their sticky behavior, which is critical for understanding the domain assignment logic in Restate-based workflows.
85-124: LGTM! Well-implemented sluggify helper.The function correctly transforms branch names into URL-safe slugs with appropriate safeguards:
- Removes special characters while preserving alphanumerics
- Normalizes separators (hyphens/underscores → spaces → hyphens)
- Enforces length limit (80 chars)
- Cleans up trailing artifacts
The use of precompiled regexes (lines 82-83) is efficient for repeated invocations.
go/Tiltfile (3)
114-124: LGTM! Restate service properly integrated.The Restate service is correctly configured with appropriate port forwards (8081 for ingress, 9070 for admin) and dependencies. The conditional setup aligns with the service toggle pattern used for other services.
Note: The service is labeled as 'database' which is unconventional for a workflow engine, but appears intentional given the infrastructure grouping in this codebase.
346-347: LGTM! Restate endpoints documented in UI.The Ingress and Admin endpoints are properly documented for developer access.
245-245: Verify Restate availability handling in ctrl startup
- Confirm Tiltfile’s addition of
'restate'toctrl_depsis intentional- Ensure ctrl’s calls to Restate include retry/backoff or graceful fallback when Restate is temporarily unavailable
go/proto/hydra/v1/deployment.proto (1)
9-14: LGTM! Virtual Object ensures serialization.The
VIRTUAL_OBJECTservice type is the correct choice for the DeploymentService. This ensures that operations on the same deployment are serialized, directly addressing the race condition issues mentioned in the PR objectives.deployment/docker-compose.yaml (3)
289-307: LGTM! Restate service properly configured.The Restate service is correctly set up with:
- Appropriate port mappings (8081 for ingress, 9070 for admin)
- Health check on the admin endpoint
- Tracing integration with OpenTelemetry
- Proper dependency on the otel service
347-351: LGTM! Restate configuration properly wired.The environment variables correctly configure:
- Ingress URL pointing to restate service internal endpoint
- Admin URL for management operations
- HTTP port matching the exposed ctrl port (9080)
- Registration URL pointing to ctrl's restate service endpoint
This aligns with the Restate integration documented in the PR.
321-321: All port 9080 configurations are consistent. Verified that docker-compose, environment variables, Restate registration URL, Kubernetes manifests, and RestateConfig all use port 9080.go/buf.yaml (1)
17-18: LGTM! Restate SDK dependency correctly added.Adding
buf.build/restatedev/sdk-goto the deps list enables the proto imports from the Restate SDK used in deployment.proto and other workflow definitions.go/apps/ctrl/config.go (1)
32-46: LGTM! Clear Restate configuration structure.The
RestateConfigstruct is well-documented with helpful inline comments explaining each field's purpose and providing example values. The field names are descriptive and consistent with Go naming conventions.apps/engineering/app/docs/[[...slug]]/page.tsx (1)
1-1: LGTM! Mermaid component integration is correct.The import and component mapping enable Mermaid diagram rendering in MDX documentation. The integration follows the existing pattern for custom components (Tabs, Tab, Banner) and the import path is correct.
Also applies to: 43-43
go/go.mod (1)
205-205: LGTM! Indirect dependencies align with Restate integration.The addition of four indirect dependencies supports the Restate SDK integration and related authentication/schema handling:
restatedev/sdk-go v0.20.0provides the core Restate SDKgolang-jwt/jwt/v5 v5.2.3supports JWT-based authenticationinvopop/jsonschema v0.13.0enables dynamic schema validationmr-tron/base58 v1.2.0supports encoding operationsThese transitive dependencies are expected given the Restate integration across the control plane.
Also applies to: 233-233, 287-287, 325-325
go/Makefile (3)
27-27: LGTM! Restate added as a core dependency.The
uptarget now includesrestatein the docker-compose startup, treating it as a first-class dependency alongside planetscale, mysql, redis, clickhouse, s3, and otel. This aligns with the broader Restate integration.
37-41: LGTM! Code generation workflow updated for Restate.The
generatetarget now:
- Installs the
protoc-gen-go-restateplugin- Runs buf generate with the Restate template for hydra protos
- Formats generated code with
go fmtThis workflow correctly integrates Restate protobuf code generation alongside the existing Connect-based generation.
90-90: LGTM! Kubernetes workflow integrates Restate.The changes correctly integrate Restate into the Kubernetes development workflow:
k8s-upwaits for the Restate pod to be ready before proceeding- New
start-restatetarget enables deploying Restate independentlystart-dependenciesincludes Restate as a core dependencyAlso applies to: 145-147, 162-162
apps/engineering/package.json (1)
24-24: LGTM! Dependencies support Mermaid diagram rendering.The addition of
mermaidandnext-themesdependencies enables:
mermaid ^11.12.0: Rendering Mermaid diagrams in MDX documentationnext-themes ^0.4.6: Theme support for consistent light/dark mode renderingThese dependencies align with the Mermaid component integration in the docs page component.
Also applies to: 26-26
apps/engineering/content/docs/architecture/workflows/meta.json (1)
1-7: LGTM! Well-structured metadata for workflows documentation.The metadata correctly defines a new "Durable Workflows" documentation section with:
- Clear title and description
- Appropriate icon reference
- List of workflow-related pages (index, deployment-service, routing-service, creating-services)
- Correct root flag (false) indicating this is a subsection
go/buf.gen.restate.yaml (1)
1-10: LGTM! Correct buf configuration for Restate code generation.The configuration properly sets up Restate protobuf code generation:
- Uses buf v2 format with managed generation enabled
- Remote Go plugin (v1.36.8) generates standard protobuf code
- Local
protoc-gen-go-restateplugin generates Restate-specific workflow code- Both plugins output to
gen/protowithpaths=source_relativefor consistent import pathsThis aligns with the Makefile's
generatetarget which installs the local plugin and runs this template.go/k8s/manifests/ctrl.yaml (4)
26-26: LGTM! Restate HTTP port correctly exposed.The container now exposes port 9080 for Restate HTTP communication, consistent with the
UNKEY_RESTATE_HTTP_PORTenvironment variable and the service port configuration.
77-85: LGTM! Complete Restate configuration.The environment variables correctly configure Restate integration:
UNKEY_RESTATE_INGRESS_URL: Points to the Restate ingress endpoint (port 8080) for invoking workflowsUNKEY_RESTATE_ADMIN_URL: Points to the Restate admin endpoint (port 9070) for service registrationUNKEY_RESTATE_HTTP_PORT: Specifies the port where ctrl listens for Restate HTTP requests (9080)UNKEY_RESTATE_REGISTER_AS: Provides the self-registration URL for Restate discoveryThese values align with the
RestateConfigstructure ingo/apps/ctrl/config.go.
99-99: LGTM! Init container waits for Restate readiness.The init container now checks that the Restate service is ready (
restate:8080) before allowing the ctrl container to start, ensuring proper service initialization order alongside mysql and s3.
118-121: LGTM! Service correctly exposes Restate port.The Service definition now exposes port 9080 (named "restate") for Restate HTTP communication, enabling other services and Restate itself to communicate with the ctrl's workflow endpoints.
go/pkg/db/domain_find_by_ids.sql_generated.go (1)
58-69: LGTM — generated query expansion is correct.Parameter expansion and empty-slice handling are safe and appropriate.
go/apps/ctrl/workflows/deploy/deploy_handler.go (1)
100-104: updateDeploymentStatus already uses Restate for durability
The helper wraps the status update inrestate.Run, so all calls (including lines 100–104, 131–134, and 335–339) are durable.apps/engineering/content/docs/architecture/workflows/deployment-service.mdx (1)
6-94: Clear, actionable workflow docsFlows, responsibilities, and references look consistent with the Restate migration. No issues.
go/apps/ctrl/workflows/certificate/doc.go (1)
1-79: Great package documentationConcise, accurate overview of the Restate-backed ACME workflow and guarantees. Looks good.
apps/engineering/content/docs/architecture/workflows/index.mdx (1)
1-92: LGTM! Well-structured documentation.The documentation provides a clear, comprehensive overview of the Restate-based durable workflow architecture. The structure logically progresses from core concepts through specific services to configuration and best practices. Error handling and observability guidance are appropriate.
go/apps/ctrl/workflows/deploy/rollback_handler.go (7)
32-42: LGTM! Clear validation and error handling.The function documentation is comprehensive, and the initial validation correctly prevents self-rollback attempts with an appropriate terminal error.
43-72: LGTM! Proper durable step usage and validation.The deployment lookups use named durable steps correctly, distinguish terminal (404) from retryable errors appropriately, and validate that both deployments belong to the same environment and project.
73-88: LGTM! Correct live deployment validation.The project lookup and validation that the source deployment is currently live are implemented correctly with appropriate error handling.
89-108: LGTM! Proper VM status validation.The workflow correctly verifies that the target deployment has at least one running VM before proceeding, preventing rollback to a non-functional deployment.
110-128: LGTM! Proper domain retrieval.The domain query correctly filters for sticky domains (Live and Environment types) and appropriately rejects rollback attempts when no domains are found.
139-148: LGTM! Correct atomic domain switch.The routing service call uses the project ID as the Virtual Object key, ensuring serialized execution per project. The blocking Request pattern is appropriate for this critical atomic operation.
149-173: LGTM! Proper state update and completion.The project state update correctly sets the new live deployment and the rolled-back flag within a named durable step. The final logging provides a good audit trail.
go/apps/ctrl/services/deployment/service.go (1)
1-36: LGTM! Clean refactor to Config-based constructor.The refactor from individual parameters to a Config struct improves maintainability and aligns with the Restate migration. All fields are properly initialized from the config.
apps/engineering/content/docs/architecture/workflows/routing-service.mdx
Show resolved
Hide resolved
|
Thank you for following the naming conventions for pull request titles! 🙏 |
There was a problem hiding this comment.
Actionable comments posted: 23
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
go/demo_api/main.go (1)
796-803: Enforce minLength: 3 in runtime to match OpenAPI (avoid contract drift).Spec requires name length ≥ 3, but the handler only checks non-empty. Clients generated from the spec will expect 400 for 1–2 char names; current server returns 200.
Apply:
--- a/go/demo_api/main.go +++ b/go/demo_api/main.go @@ import ( "crypto/rand" "encoding/hex" "encoding/json" "fmt" "io" "log" "math" "net/http" "os" "regexp" "strconv" "strings" "sync" "time" + "unicode/utf8" ) @@ - name := r.URL.Query().Get("name") - if name == "" { - respondWithError(w, http.StatusBadRequest, "MISSING_PARAMETER", "Missing or invalid name parameter") - return - } + name := strings.TrimSpace(r.URL.Query().Get("name")) + if utf8.RuneCountInString(name) < 3 { + respondWithError(w, http.StatusBadRequest, "INVALID_PARAMETER", "Missing or invalid name parameter") + return + }Also consider adding a quick test to assert 400 for names of length 1–2 and 200 for ≥3.
Also applies to: 3-18, 395-401
apps/engineering/content/docs/infrastructure/database-schema.mdx (1)
10-13: Remove Hydra DB reference (outdated with Restate migration).This section still lists a Hydra MySQL database. With Hydra removed in favor of Restate, this is misleading. Remove Hydra here (and anywhere else in this doc), and, if needed, add a short note that Restate is not backed by this MySQL schema.
Proposed edit:
- - **hydra**: Workflow orchestration engine database for managing deployment workflowsgo/k8s/manifests/restate.yaml (1)
64-85: Avoid exposing admin (9070) via LoadBalancerLoadBalancing both ingress and admin ports publicly is risky. Prefer ClusterIP for admin or split services and expose only ingress.
kind: Service metadata: name: restate @@ spec: selector: app: restate ports: - name: ingress port: 8080 targetPort: 8080 protocol: TCP - name: admin port: 9070 targetPort: 9070 protocol: TCP - type: LoadBalancer + type: ClusterIPOptionally create a separate LoadBalancer service for 8080 only if external access is required.
go/k8s/manifests/ctrl.yaml (1)
21-27: Harden ctrl container and set resourcesAdd securityContext and resource requests/limits; name the new port.
- name: ctrl image: unkey-ctrl:latest imagePullPolicy: Never # Use local images ports: - containerPort: 7091 - - containerPort: 9080 + - containerPort: 9080 + name: restate + securityContext: + allowPrivilegeEscalation: false + runAsNonRoot: true + readOnlyRootFilesystem: true + capabilities: + drop: ["ALL"] + seccompProfile: + type: RuntimeDefault + resources: + requests: + cpu: "200m" + memory: "256Mi" + limits: + cpu: "2" + memory: "1Gi"
♻️ Duplicate comments (1)
go/apps/ctrl/run.go (1)
200-206: Restate start error propagation (completes silently).Current goroutine only logs. Propagate error (e.g., via channel) and exit; see prior comment for diff.
🧹 Nitpick comments (30)
go/Makefile (2)
90-90: Wait forapp=restate— labels verified; consider longer timeout or rollout statusLabels in
go/k8s/manifests/restate.yamlcorrectly useapp: restate. For slower startups, you may increase the wait timeout to--timeout=300sor replace thekubectl waitwith akubectl rollout status deployment/restate -n unkeystep.
145-147: Declare start-restate and start-dependencies as phony targets
Add the missing targets to.PHONYto avoid filename clashes:-.PHONY: install tools fmt test-unit test-integration test-integration-long test-stress test build generate pull up clean k8s-check k8s-up k8s-down k8s-reset k8s-status start-mysql start-ctrl start-all dev +.PHONY: install tools fmt test-unit test-integration test-integration-long test-stress test build generate pull up clean k8s-check k8s-up k8s-down k8s-reset k8s-status start-mysql start-ctrl start-all dev start-restate start-dependenciesThe manifest
go/k8s/manifests/restate.yamlalready usesapp: restate.apps/engineering/content/docs/infrastructure/database-schema.mdx (1)
94-95: Add pointer to the custom sqlc generation step.Helpful to link to the exact entry point (go:generate directive or Makefile target) so engineers know what to run when not using
sqlc generatedirectly.Example addition:
- See Makefile target
generateorgo:generateingo/pkg/db/for the custom generation pipeline.go/buf.yaml (1)
17-18: Consider pinning the Restate SDK version.The dependency
buf.build/restatedev/sdk-gois declared without a version constraint. For reproducible builds and to avoid unexpected breaking changes, consider pinning to a specific version or version range.Apply this diff to pin the dependency version:
deps: - - buf.build/restatedev/sdk-go + - buf.build/restatedev/sdk-go:v1.0.0 # Update to the actual version being usedAlternatively, you can verify the appropriate version by checking the Buf registry or the project's dependency management strategy.
go/apps/ctrl/config.go (1)
103-103: Consider adding Restate configuration validation.The
Restatefield is added to theConfigstruct, but theValidate()method (lines 106-115) doesn't include any checks for the Restate configuration. If Restate is required for the system to function, consider adding validation to ensure critical fields likeIngressURLandAdminURLare not empty.Example validation to add in the
Validate()method:// Validate Restate configuration if c.Restate.IngressURL != "" || c.Restate.AdminURL != "" { if err := assert.NotEmpty(c.Restate.IngressURL, "restate ingress URL is required"); err != nil { return err } if err := assert.NotEmpty(c.Restate.AdminURL, "restate admin URL is required"); err != nil { return err } if err := assert.NotEmpty(c.Restate.RegisterAs, "restate register URL is required"); err != nil { return err } }go/apps/ctrl/workflows/deploy/domains.go (1)
61-69: Consider validating sluggify output.If
branchNameconsists entirely of special characters or results in an empty string after sluggification, the domain could become malformed (e.g.,project-git--workspace.apexwith double hyphens). Consider adding validation to skip branch domain creation if sluggify returns an empty string.Apply this diff to add validation:
if branchName != "" { + slug := sluggify(branchName) + if slug == "" { + // Skip branch domain if sluggify results in empty string + goto skipBranchDomain + } domains = append( domains, newDomain{ - domain: fmt.Sprintf("%s-git-%s-%s.%s", projectSlug, sluggify(branchName), workspaceSlug, apex), + domain: fmt.Sprintf("%s-git-%s-%s.%s", projectSlug, slug, workspaceSlug, apex), sticky: db.NullDomainsSticky{Valid: true, DomainsSticky: db.DomainsStickyBranch}, }, ) } +skipBranchDomain:apps/engineering/content/docs/architecture/workflows/creating-services.mdx (1)
132-135: Consider rephrasing the editorial comment.The phrase "These are ugly, they're working on..." is informal for technical documentation. Consider rephrasing more professionally while still acknowledging the temporary nature of the API.
Suggested rewording:
-These are ugly, they're working on generating proper clients from the proto definitions -https://github.com/restatedev/sdk-go/issues/103 +**Note:** The current client API is temporary. The Restate team is working on generating proper clients from proto definitions. See [sdk-go#103](https://github.com/restatedev/sdk-go/issues/103) for details.apps/engineering/app/components/mermaid.tsx (1)
7-23: Consider adding error handling for mermaid.run.The component uses
void mermaid.run(...)which discards the promise and silently ignores any rendering errors. If a diagram fails to render (e.g., invalid syntax, browser compatibility issues), users won't see any feedback.Consider adding error handling:
- void mermaid.run({ - nodes: [ref.current], - }); + mermaid.run({ + nodes: [ref.current], + }).catch((error) => { + console.error("Failed to render Mermaid diagram:", error); + // Optionally: set error state and display user-friendly message + });go/k8s/manifests/restate.yaml (1)
14-63: Consider volumeClaimTemplates for StatefulSet storageUsing a static PVC works for 1 replica, but volumeClaimTemplates is the idiomatic way for StatefulSets and eases scaling.
No functional change needed now; just a future-proofing note.
go/k8s/manifests/ctrl.yaml (1)
93-101: Make dependency wait more robustbusybox nc -z can be flaky; prefer probing Restate health endpoint.
- "until nc -z mysql 3306 && nc -z s3 3902 && nc -z restate 8080; do echo waiting for dependencies; sleep 2; done;", + "until nc -z mysql 3306 && nc -z s3 3902 && wget -q --spider http://restate:9070/health; do echo waiting for dependencies; sleep 2; done;",go/apps/ctrl/workflows/routing/helpers.go (1)
32-55: Harden local-host detection and keep logic DRY across packagesAdd ::1/0.0.0.0 and trim trailing dot. Also consider centralizing this logic to avoid drift with deploy/helpers.go.
Apply:
func isLocalHostname(hostname, defaultDomain string) bool { - hostname = strings.ToLower(hostname) + hostname = strings.ToLower(hostname) + // Normalize FQDN style + hostname = strings.TrimSuffix(hostname, ".") @@ - if hostname == "localhost" || hostname == "127.0.0.1" { + if hostname == "localhost" || hostname == "127.0.0.1" || hostname == "::1" || hostname == "0.0.0.0" { return true }Optionally extract to a shared helper to reuse here and in deploy/helpers.go. Based on relevant code snippets.
go/apps/ctrl/workflows/deploy/helpers.go (2)
59-89: Match routing’s local-host rules and add ::1/0.0.0.0 + trailing dot trimKeep behavior consistent and robust; consider extracting to shared helper.
Apply:
func isLocalHostname(hostname, defaultDomain string) bool { // Lowercase for case-insensitive comparison - hostname = strings.ToLower(hostname) + hostname = strings.ToLower(hostname) + hostname = strings.TrimSuffix(hostname, ".") @@ - if hostname == "localhost" || hostname == "127.0.0.1" { + if hostname == "localhost" || hostname == "127.0.0.1" || hostname == "::1" || hostname == "0.0.0.0" { return true }
98-109: Fix error message to include actual statusAvoid hardcoding “building”.
Apply:
- if updateErr != nil { - return restate.Void{}, fmt.Errorf("failed to update version status to building: %w", updateErr) - } + if updateErr != nil { + return restate.Void{}, fmt.Errorf("failed to update deployment status to %s: %w", status, updateErr) + }go/apps/ctrl/workflows/routing/assign_domains_handler.go (1)
111-149: Deduplicate domains, handle nil GatewayConfig, and fail fast on marshal errors
- Return early if
req.GetGatewayConfig()is nil to avoid no-op upserts.- Deduplicate
changedDomainsbefore filtering local hosts to prevent redundant upserts.- Return an error on
protojson.Marshalfailure instead of logging and continuing.go/cmd/ctrl/main.go (1)
82-91: Restate flags: add sane default/validation for restate-register-asConsider one of:
- Make restate-register-as required, or
- Provide a safe default (e.g., http://ctrl:9080) for local/dev to avoid empty config.
Also validate URL schemes for ingress/admin/register-as to catch typos early.
Example change:
- cli.String("restate-register-as", "URL of this service for self-registration with Restate. Example: http://ctrl:9080", - cli.EnvVar("UNKEY_RESTATE_REGISTER_AS")), + cli.String("restate-register-as", "URL of this service for self-registration with Restate. Example: http://ctrl:9080", + cli.Default("http://ctrl:9080"), // dev-friendly default + cli.EnvVar("UNKEY_RESTATE_REGISTER_AS")),If your CLI has a Validate option, attach a URL validator to restate-ingress-url, restate-admin-url, and restate-register-as.
go/proto/hydra/v1/certificate.proto (1)
23-26: Prefer enum over free-form status stringUsing an enum improves type safety and avoids value drift.
Example:
-message ProcessChallengeResponse { - string certificate_id = 1; - string status = 2; // "success", "failed", "pending" -} +enum ProcessChallengeStatus { + PROCESS_CHALLENGE_STATUS_UNSPECIFIED = 0; + PROCESS_CHALLENGE_STATUS_SUCCESS = 1; + PROCESS_CHALLENGE_STATUS_FAILED = 2; + PROCESS_CHALLENGE_STATUS_PENDING = 3; +} + +message ProcessChallengeResponse { + string certificate_id = 1; + ProcessChallengeStatus status = 2; +}go/apps/ctrl/services/deployment/service.go (1)
20-25: Config field naming consistency (nit)Other packages tend to use DB over Database; consider renaming Config.Database to DB for consistency (optional).
go/apps/ctrl/services/deployment/promote.go (1)
33-42: Map Restate workflow errors to precise Connect codesThe workflow can return TerminalError(400/404/409...). Instead of always CodeInternal, map known statuses to Connect codes (InvalidArgument, NotFound, FailedPrecondition, etc.) for clearer API semantics.
If the ingress client exposes a typed error with HTTP status, assert and map it; else parse status when available. Example sketch:
// pseudo-code; adjust to actual ingress error type if e, ok := err.(interface{ StatusCode() int }); ok { switch e.StatusCode() { case 400: return nil, connect.NewError(connect.CodeInvalidArgument, err) case 404: return nil, connect.NewError(connect.CodeNotFound, err) case 409: return nil, connect.NewError(connect.CodeFailedPrecondition, err) default: return nil, connect.NewError(connect.CodeInternal, err) } }This preserves user-visible error intent from the workflow.
Also applies to: 45-49
deployment/docker-compose.yaml (2)
293-297: Pin Restate image tag instead of latestUsing :latest can break local dev unexpectedly. Pin a known-good version (and update via PRs).
- image: docker.io/restatedev/restate:latest + image: docker.io/restatedev/restate:v0.XX.X
303-307: Healthcheck assumes curl is present in the imageConfirm curl exists in restatedev/restate. If not, use CMD-SHELL with wget or switch to /healthz tooling.
- healthcheck: - test: ["CMD", "curl", "-f", "http://localhost:9070/health"] + healthcheck: + test: ["CMD-SHELL", "wget -qO- http://localhost:9070/health >/dev/null 2>&1 || exit 1"] timeout: 5s retries: 10 start_period: 10s interval: 5sgo/apps/ctrl/workflows/routing/switch_domains_handler.go (1)
37-44: Consider terminal error on missing gateway configIf no gateway config exists for the target deployment, return a terminal error to avoid futile retries.
- gatewayConfig, err := restate.Run(ctx, func(stepCtx restate.RunContext) (partitiondb.FindGatewayByDeploymentIdRow, error) { - return partitiondb.Query.FindGatewayByDeploymentId(stepCtx, s.partitionDB.RO(), req.GetTargetDeploymentId()) - }, restate.WithName("fetch-gateway-config")) + gatewayConfig, err := restate.Run(ctx, func(stepCtx restate.RunContext) (partitiondb.FindGatewayByDeploymentIdRow, error) { + return partitiondb.Query.FindGatewayByDeploymentId(stepCtx, s.partitionDB.RO(), req.GetTargetDeploymentId()) + }, restate.WithName("fetch-gateway-config")) if err != nil { - return nil, fmt.Errorf("failed to fetch gateway config for deployment %s: %w", req.GetTargetDeploymentId(), err) + // If not found, classify as terminal (bad request) + if err == sql.ErrNoRows { + return nil, restate.TerminalError( + fmt.Errorf("no gateway config for deployment %s", req.GetTargetDeploymentId()), + 400, + ) + } + return nil, fmt.Errorf("failed to fetch gateway config for deployment %s: %w", req.GetTargetDeploymentId(), err) }go/apps/ctrl/workflows/deploy/deploy_handler.go (3)
137-145: Avoidfor i := range 300unless on Go 1.22+Use a classic counter for wider compatibility, or confirm the module requires Go 1.22+.
- for i := range 300 { + for i := 0; i < 300; i++ {If you intend to use this syntax, ensure go.mod specifies go 1.22+.
134-206: Reduce DB churn in the polling loopUpserting all VMs every second for up to 5 minutes is heavy. Track last-seen status and only upsert on changes or at successful completion.
134-206: Prefer non-blocking durable waitsSleeping inside a single restate.Run step ties up the worker. If available, use Restate’s sleep/yield API between polls or split into smaller steps with backoff.
Can you confirm if sdk-go exposes a sleep/yield API (e.g., stepCtx.Sleep or similar)? If yes, consider replacing time.Sleep with it to free capacity during waits.
go/apps/ctrl/services/deployment/create_deployment.go (1)
147-151: Ensure KeyAuthId assignment matches proto oneof type and semanticsDeployRequest.KeyAuthId is a *string. Avoid assigning a non-pointer. Also confirm that KeyspaceId maps to KeyAuthId (naming differs).
- deployReq := &hydrav1.DeployRequest{ - DeploymentId: deploymentID, - DockerImage: req.Msg.GetDockerImage(), - KeyAuthId: req.Msg.KeyspaceId, - } + deployReq := &hydrav1.DeployRequest{ + DeploymentId: deploymentID, + DockerImage: req.Msg.GetDockerImage(), + } + if ks := req.Msg.GetKeyspaceId(); ks != "" { + deployReq.KeyAuthId = &ks + }go/apps/ctrl/workflows/certificate/process_challenge_handler.go (1)
154-157: Consider returning CertificateId in responseProcessChallengeResponse defines certificate_id; you currently omit it. Populate from the insert/update result when available.
If the generated insert returns the ID, wire it through; otherwise I can adjust the SQL to RETURNING id and plumb it into the response.
go/apps/ctrl/services/deployment/rollback.go (1)
48-55: Add a timeout around the workflow callIngress calls can block; set a deadline to avoid hanging request threads.
Apply:
-_, err = restateingress.Object[*hydrav1.RollbackRequest, *hydrav1.RollbackResponse]( +ctxCall, cancel := context.WithTimeout(ctx, 30*time.Second) +defer cancel() +_, err = restateingress.Object[*hydrav1.RollbackRequest, *hydrav1.RollbackResponse]( s.restate, "hydra.v1.DeploymentService", sourceDeployment.ProjectID, "Rollback", -).Request(ctx, &hydrav1.RollbackRequest{ +).Request(ctxCall, &hydrav1.RollbackRequest{ SourceDeploymentId: req.Msg.GetSourceDeploymentId(), TargetDeploymentId: req.Msg.GetTargetDeploymentId(), })Add import:
+ "time"apps/engineering/content/docs/architecture/workflows/index.mdx (1)
27-27: Align terminology with Restate Go SDK.“Object.Request” and “WorkflowSend.Send” are conceptual; consider matching actual SDK method names used in code to avoid confusion (e.g., typed stubs’ Request/Send names).
go/apps/ctrl/run.go (2)
103-105: Delete leftover “make go happy” line.vaultSvc is used later; the blank assignment is now dead code.
Apply:
- // make go happy - _ = vaultSvc
193-199: Guard nil Vault in Certificate workflow.In run.go vaultSvc is nil when cfg.VaultMasterKeys is empty, and certificate.New assigns it directly to Service.vault without checks. While no current handlers dereference vault, any future use will panic. Require vaultSvc (fail startup) or add nil checks in the service (or only bind when vaultSvc ≠ nil).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (18)
go/buf.lockis excluded by!**/*.lockgo/gen/proto/ctrl/v1/acme.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/ctrl/v1/build.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/ctrl/v1/deployment.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/ctrl/v1/openapi.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/ctrl/v1/service.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/hydra/v1/certificate.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/hydra/v1/certificate_restate.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/hydra/v1/deployment.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/hydra/v1/deployment_restate.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/hydra/v1/routing.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/hydra/v1/routing_restate.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/krane/v1/deployment.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/partition/v1/gateway.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/vault/v1/object.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/vault/v1/service.pb.gois excluded by!**/*.pb.go,!**/gen/**go/go.sumis excluded by!**/*.sumpnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (53)
apps/engineering/app/components/mermaid.tsx(1 hunks)apps/engineering/app/docs/[[...slug]]/page.tsx(2 hunks)apps/engineering/content/docs/architecture/meta.json(1 hunks)apps/engineering/content/docs/architecture/workflows/creating-services.mdx(1 hunks)apps/engineering/content/docs/architecture/workflows/deployment-service.mdx(1 hunks)apps/engineering/content/docs/architecture/workflows/index.mdx(1 hunks)apps/engineering/content/docs/architecture/workflows/meta.json(1 hunks)apps/engineering/content/docs/architecture/workflows/routing-service.mdx(1 hunks)apps/engineering/content/docs/infrastructure/database-schema.mdx(4 hunks)apps/engineering/package.json(1 hunks)deployment/docker-compose.yaml(4 hunks)go/GO_DOCUMENTATION_GUIDELINES.md(8 hunks)go/Makefile(5 hunks)go/Tiltfile(5 hunks)go/apps/ctrl/config.go(2 hunks)go/apps/ctrl/run.go(5 hunks)go/apps/ctrl/services/acme/certificate_workflow.go(0 hunks)go/apps/ctrl/services/acme/service.go(0 hunks)go/apps/ctrl/services/deployment/create_deployment.go(2 hunks)go/apps/ctrl/services/deployment/deploy_workflow.go(0 hunks)go/apps/ctrl/services/deployment/promote.go(2 hunks)go/apps/ctrl/services/deployment/rollback.go(2 hunks)go/apps/ctrl/services/deployment/service.go(1 hunks)go/apps/ctrl/workflows/certificate/doc.go(1 hunks)go/apps/ctrl/workflows/certificate/pem.go(1 hunks)go/apps/ctrl/workflows/certificate/process_challenge_handler.go(1 hunks)go/apps/ctrl/workflows/certificate/service.go(1 hunks)go/apps/ctrl/workflows/deploy/deploy_handler.go(1 hunks)go/apps/ctrl/workflows/deploy/doc.go(1 hunks)go/apps/ctrl/workflows/deploy/domains.go(3 hunks)go/apps/ctrl/workflows/deploy/helpers.go(1 hunks)go/apps/ctrl/workflows/deploy/promote_handler.go(1 hunks)go/apps/ctrl/workflows/deploy/rollback_handler.go(1 hunks)go/apps/ctrl/workflows/deploy/service.go(1 hunks)go/apps/ctrl/workflows/routing/assign_domains_handler.go(1 hunks)go/apps/ctrl/workflows/routing/doc.go(1 hunks)go/apps/ctrl/workflows/routing/helpers.go(1 hunks)go/apps/ctrl/workflows/routing/service.go(1 hunks)go/apps/ctrl/workflows/routing/switch_domains_handler.go(1 hunks)go/buf.gen.connect.yaml(1 hunks)go/buf.gen.restate.yaml(1 hunks)go/buf.yaml(2 hunks)go/cmd/ctrl/main.go(2 hunks)go/demo_api/main.go(1 hunks)go/go.mod(4 hunks)go/k8s/manifests/ctrl.yaml(4 hunks)go/k8s/manifests/restate.yaml(1 hunks)go/pkg/db/domain_find_by_ids.sql_generated.go(1 hunks)go/pkg/db/querier_generated.go(1 hunks)go/pkg/db/queries/domain_find_by_ids.sql(1 hunks)go/proto/hydra/v1/certificate.proto(1 hunks)go/proto/hydra/v1/deployment.proto(1 hunks)go/proto/hydra/v1/routing.proto(1 hunks)
💤 Files with no reviewable changes (3)
- go/apps/ctrl/services/acme/service.go
- go/apps/ctrl/services/deployment/deploy_workflow.go
- go/apps/ctrl/services/acme/certificate_workflow.go
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-09-01T02:33:43.791Z
Learnt from: imeyer
PR: unkeyed/unkey#3899
File: go/buf.gen.yaml:0-0
Timestamp: 2025-09-01T02:33:43.791Z
Learning: In the unkeyed/unkey repository, buf commands are executed from the `go/` directory where the `buf.yaml` file is located. This means the `out: gen` configuration in `go/buf.gen.yaml` generates files to `go/gen/` relative to the repository root, which aligns with the expected directory structure.
Applied to files:
go/buf.gen.restate.yamlgo/Makefile
📚 Learning: 2025-04-22T14:43:11.724Z
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
Applied to files:
apps/engineering/content/docs/infrastructure/database-schema.mdx
📚 Learning: 2025-07-02T11:51:58.572Z
Learnt from: chronark
PR: unkeyed/unkey#3420
File: go/pkg/hydra/store/gorm/gorm.go:486-498
Timestamp: 2025-07-02T11:51:58.572Z
Learning: The Hydra package (go/pkg/hydra) is planned to be migrated from GORM to sqlc for database operations, which explains why raw SQL queries are acceptable in the current implementation.
Applied to files:
apps/engineering/content/docs/infrastructure/database-schema.mdx
📚 Learning: 2025-09-12T08:01:20.792Z
Learnt from: Flo4604
PR: unkeyed/unkey#3944
File: go/pkg/db/acme_challenge_update_verified_with_expiry.sql_generated.go:31-39
Timestamp: 2025-09-12T08:01:20.792Z
Learning: Do not review or suggest changes to files with sql_generated.go suffix or other files marked as auto-generated (containing "Code generated by" comments), as these are generated by tools like sqlc and changes would be overwritten on regeneration.
Applied to files:
apps/engineering/content/docs/infrastructure/database-schema.mdx
📚 Learning: 2025-09-01T01:57:42.227Z
Learnt from: imeyer
PR: unkeyed/unkey#3899
File: go/proto/metald/v1/metald.proto:5-9
Timestamp: 2025-09-01T01:57:42.227Z
Learning: In the unkeyed/unkey repository, buf is configured to properly resolve metald proto imports like "metald/v1/vm.proto" without needing the full "go/proto/" prefix. The buf lint command `buf lint --path proto/metald` passes successfully with these relative import paths.
Applied to files:
go/Makefile
📚 Learning: 2025-09-15T18:12:01.503Z
Learnt from: mcstepp
PR: unkeyed/unkey#3952
File: go/apps/ctrl/run.go:212-213
Timestamp: 2025-09-15T18:12:01.503Z
Learning: In Go connect-go generated handlers, mux.Handle can accept New*ServiceHandler functions directly even though they return (string, http.Handler) tuples. The pattern mux.Handle(ctrlv1connect.New*ServiceHandler(...)) is valid and compiles successfully in unkey codebase.
Applied to files:
go/apps/ctrl/run.go
🧬 Code graph analysis (19)
go/pkg/db/querier_generated.go (2)
go/pkg/partition/db/database.go (1)
DBTX(10-10)go/pkg/db/domain_find_by_ids.sql_generated.go (1)
FindDomainsByIdsRow(30-41)
go/apps/ctrl/workflows/routing/service.go (5)
go/gen/proto/hydra/v1/routing_restate.pb.go (5)
UnimplementedRoutingServiceServer(71-71)UnimplementedRoutingServiceServer(73-75)UnimplementedRoutingServiceServer(76-78)UnimplementedRoutingServiceServer(79-79)RoutingServiceServer(57-64)go/apps/ctrl/services/deployment/service.go (3)
Service(10-18)New(27-36)Config(20-25)go/apps/ctrl/workflows/certificate/service.go (3)
Service(19-25)New(45-52)Config(30-42)go/apps/ctrl/services/acme/service.go (3)
Service(9-14)New(22-29)Config(16-20)go/apps/ctrl/workflows/deploy/service.go (2)
New(52-60)Config(34-49)
go/apps/ctrl/workflows/certificate/service.go (2)
go/gen/proto/hydra/v1/certificate_restate.pb.go (4)
UnimplementedCertificateServiceServer(61-61)UnimplementedCertificateServiceServer(63-65)UnimplementedCertificateServiceServer(66-66)CertificateServiceServer(50-54)go/apps/ctrl/workflows/routing/service.go (3)
Service(18-24)New(44-51)Config(29-41)
go/apps/ctrl/workflows/certificate/process_challenge_handler.go (7)
go/pkg/partition/db/models_generated.go (1)
Certificate(104-112)go/apps/ctrl/workflows/certificate/service.go (1)
Service(19-25)go/gen/proto/hydra/v1/certificate.pb.go (6)
ProcessChallengeRequest(25-31)ProcessChallengeRequest(44-44)ProcessChallengeRequest(59-61)ProcessChallengeResponse(77-83)ProcessChallengeResponse(96-96)ProcessChallengeResponse(111-113)go/pkg/db/models_generated.go (3)
AcmeChallengesStatusPending(20-20)AcmeChallengesStatusFailed(22-22)AcmeChallengesStatusVerified(21-21)go/pkg/db/handle_err_no_rows.go (1)
IsNotFound(8-10)go/pkg/partition/db/certificate_insert.sql_generated.go (1)
InsertCertificateParams(23-30)go/pkg/db/acme_challenge_update_verified_with_expiry.sql_generated.go (1)
UpdateAcmeChallengeVerifiedWithExpiryParams(19-24)
go/apps/ctrl/services/deployment/rollback.go (3)
go/apps/ctrl/services/deployment/service.go (1)
Service(10-18)go/gen/proto/hydra/v1/deployment.pb.go (6)
RollbackRequest(121-127)RollbackRequest(140-140)RollbackRequest(155-157)RollbackResponse(173-177)RollbackResponse(190-190)RollbackResponse(205-207)go/gen/proto/ctrl/v1/deployment.pb.go (6)
RollbackRequest(832-838)RollbackRequest(851-851)RollbackRequest(866-868)RollbackResponse(884-888)RollbackResponse(901-901)RollbackResponse(916-918)
go/cmd/ctrl/main.go (2)
go/pkg/cli/flag.go (4)
String(419-451)Default(364-416)EnvVar(320-339)Int(532-569)go/apps/ctrl/config.go (1)
RestateConfig(32-46)
go/apps/ctrl/workflows/routing/helpers.go (2)
go/gen/proto/hydra/v1/routing.pb.go (7)
DomainSticky(26-26)DomainSticky(61-63)DomainSticky(65-67)DomainSticky(74-76)DomainSticky_DOMAIN_STICKY_BRANCH(30-30)DomainSticky_DOMAIN_STICKY_ENVIRONMENT(31-31)DomainSticky_DOMAIN_STICKY_LIVE(32-32)go/pkg/db/models_generated.go (5)
NullDomainsSticky(259-262)DomainsSticky(239-239)DomainsStickyBranch(242-242)DomainsStickyEnvironment(243-243)DomainsStickyLive(244-244)
go/apps/ctrl/services/deployment/create_deployment.go (1)
go/gen/proto/hydra/v1/deployment.pb.go (3)
DeployRequest(25-32)DeployRequest(45-45)DeployRequest(60-62)
go/apps/ctrl/workflows/routing/assign_domains_handler.go (7)
go/apps/ctrl/workflows/routing/service.go (3)
Service(18-24)New(44-51)Config(29-41)go/gen/proto/hydra/v1/routing.pb.go (6)
AssignDomainsRequest(79-93)AssignDomainsRequest(106-106)AssignDomainsRequest(121-123)AssignDomainsResponse(226-232)AssignDomainsResponse(245-245)AssignDomainsResponse(260-262)go/pkg/db/tx.go (1)
Tx(196-201)go/pkg/db/handle_err_no_rows.go (1)
IsNotFound(8-10)go/pkg/db/domain_insert.sql_generated.go (1)
InsertDomainParams(39-49)go/pkg/db/domain_reassign.sql_generated.go (1)
ReassignDomainParams(22-27)go/pkg/partition/db/gateway_upsert.sql_generated.go (1)
UpsertGatewayParams(31-36)
go/pkg/db/domain_find_by_ids.sql_generated.go (1)
go/pkg/db/models_generated.go (2)
NullDomainsSticky(259-262)DomainsType(282-282)
go/apps/ctrl/workflows/deploy/rollback_handler.go (10)
go/apps/ctrl/workflows/deploy/service.go (1)
Workflow(22-29)go/pkg/db/deployment_find_by_id.sql_generated.go (1)
FindDeploymentByIdRow(35-51)go/pkg/db/handle_err_no_rows.go (1)
IsNotFound(8-10)go/pkg/db/project_find_by_id.sql_generated.go (1)
FindProjectByIdRow(30-42)go/pkg/partition/db/models_generated.go (2)
Vm(137-145)VmsStatusRunning(63-63)go/pkg/db/domain_find_for_rollback.sql_generated.go (2)
FindDomainsForRollbackRow(37-47)FindDomainsForRollbackParams(32-35)go/pkg/db/models_generated.go (4)
NullDomainsSticky(259-262)DomainsSticky(239-239)DomainsStickyLive(244-244)DomainsStickyEnvironment(243-243)go/gen/proto/hydra/v1/routing_restate.pb.go (1)
NewRoutingServiceClient(30-37)go/gen/proto/hydra/v1/routing.pb.go (3)
SwitchDomainsRequest(272-279)SwitchDomainsRequest(292-292)SwitchDomainsRequest(307-309)go/pkg/db/project_update_deployments.sql_generated.go (1)
UpdateProjectDeploymentsParams(22-27)
go/apps/ctrl/workflows/deploy/promote_handler.go (11)
go/apps/ctrl/workflows/deploy/service.go (1)
Workflow(22-29)go/gen/proto/hydra/v1/deployment.pb.go (6)
PromoteRequest(209-214)PromoteRequest(227-227)PromoteRequest(242-244)PromoteResponse(253-257)PromoteResponse(270-270)PromoteResponse(285-287)go/pkg/db/deployment_find_by_id.sql_generated.go (1)
FindDeploymentByIdRow(35-51)go/pkg/db/handle_err_no_rows.go (1)
IsNotFound(8-10)go/pkg/db/project_find_by_id.sql_generated.go (1)
FindProjectByIdRow(30-42)go/pkg/db/models_generated.go (5)
DeploymentsStatusReady(200-200)NullDomainsSticky(259-262)DomainsSticky(239-239)DomainsStickyLive(244-244)DomainsStickyEnvironment(243-243)go/pkg/partition/db/models_generated.go (2)
Vm(137-145)VmsStatusRunning(63-63)go/pkg/db/domain_find_for_promotion.sql_generated.go (2)
FindDomainsForPromotionRow(37-47)FindDomainsForPromotionParams(32-35)go/gen/proto/hydra/v1/routing_restate.pb.go (1)
NewRoutingServiceClient(30-37)go/gen/proto/hydra/v1/routing.pb.go (3)
SwitchDomainsRequest(272-279)SwitchDomainsRequest(292-292)SwitchDomainsRequest(307-309)go/pkg/db/project_update_deployments.sql_generated.go (1)
UpdateProjectDeploymentsParams(22-27)
go/apps/ctrl/services/deployment/service.go (5)
go/apps/ctrl/workflows/certificate/service.go (3)
New(45-52)Config(30-42)Service(19-25)go/apps/ctrl/workflows/deploy/service.go (2)
New(52-60)Config(34-49)go/apps/ctrl/workflows/routing/service.go (3)
New(44-51)Config(29-41)Service(18-24)go/apps/ctrl/services/acme/service.go (3)
New(22-29)Config(16-20)Service(9-14)go/apps/ctrl/config.go (1)
Config(48-104)
go/apps/ctrl/workflows/routing/switch_domains_handler.go (6)
go/apps/ctrl/workflows/routing/service.go (2)
Service(18-24)Config(29-41)go/gen/proto/hydra/v1/routing.pb.go (6)
SwitchDomainsRequest(272-279)SwitchDomainsRequest(292-292)SwitchDomainsRequest(307-309)SwitchDomainsResponse(325-329)SwitchDomainsResponse(342-342)SwitchDomainsResponse(357-359)go/pkg/partition/db/gateway_find_by_deployment_id.sql_generated.go (1)
FindGatewayByDeploymentIdRow(20-23)go/pkg/db/domain_find_by_ids.sql_generated.go (1)
FindDomainsByIdsRow(30-41)go/pkg/partition/db/gateway_upsert.sql_generated.go (1)
UpsertGatewayParams(31-36)go/pkg/db/domain_reassign.sql_generated.go (1)
ReassignDomainParams(22-27)
go/apps/ctrl/services/deployment/promote.go (2)
go/apps/ctrl/services/deployment/service.go (1)
Service(10-18)go/gen/proto/hydra/v1/deployment.pb.go (6)
PromoteRequest(209-214)PromoteRequest(227-227)PromoteRequest(242-244)PromoteResponse(253-257)PromoteResponse(270-270)PromoteResponse(285-287)
go/apps/ctrl/workflows/deploy/helpers.go (5)
go/gen/proto/krane/v1/deployment.pb.go (3)
Instance(520-527)Instance(540-540)Instance(555-557)go/gen/proto/partition/v1/gateway.pb.go (12)
GatewayConfig(26-37)GatewayConfig(50-50)GatewayConfig(65-67)Deployment(104-110)Deployment(123-123)Deployment(138-140)VM(208-213)VM(226-226)VM(241-243)AuthConfig(253-258)AuthConfig(271-271)AuthConfig(286-288)go/apps/ctrl/workflows/deploy/service.go (1)
Workflow(22-29)go/pkg/db/models_generated.go (1)
DeploymentsStatus(193-193)go/pkg/db/deployment_update_status.sql_generated.go (1)
UpdateDeploymentStatusParams(19-23)
go/apps/ctrl/workflows/deploy/service.go (3)
go/gen/proto/hydra/v1/deployment_restate.pb.go (7)
UnimplementedDeploymentServiceServer(73-73)UnimplementedDeploymentServiceServer(75-77)UnimplementedDeploymentServiceServer(78-80)UnimplementedDeploymentServiceServer(81-83)UnimplementedDeploymentServiceServer(84-84)DeploymentServiceClient(15-19)DeploymentServiceServer(62-66)go/apps/ctrl/workflows/routing/service.go (2)
New(44-51)Config(29-41)go/apps/ctrl/config.go (1)
Config(48-104)
go/apps/ctrl/workflows/deploy/deploy_handler.go (14)
go/apps/ctrl/workflows/deploy/service.go (1)
Workflow(22-29)go/gen/proto/hydra/v1/deployment.pb.go (6)
DeployRequest(25-32)DeployRequest(45-45)DeployRequest(60-62)DeployResponse(85-89)DeployResponse(102-102)DeployResponse(117-119)go/pkg/db/deployment_find_by_id.sql_generated.go (1)
FindDeploymentByIdRow(35-51)go/pkg/db/models_generated.go (5)
DeploymentsStatusFailed(201-201)Workspace(814-832)DeploymentsStatusBuilding(197-197)DeploymentsStatusDeploying(198-198)DeploymentsStatusReady(200-200)go/pkg/db/project_find_by_id.sql_generated.go (1)
FindProjectByIdRow(30-42)go/pkg/db/environment_find_by_id.sql_generated.go (1)
FindEnvironmentByIdRow(19-25)go/pkg/db/deployment_step_insert.sql_generated.go (1)
InsertDeploymentStepParams(33-40)go/gen/proto/krane/v1/deployment.pb.go (6)
DeploymentRequest(76-86)DeploymentRequest(99-99)DeploymentRequest(114-116)Instance(520-527)Instance(540-540)Instance(555-557)go/gen/proto/partition/v1/gateway.pb.go (15)
Deployment(104-110)Deployment(123-123)Deployment(138-140)GatewayConfig(26-37)GatewayConfig(50-50)GatewayConfig(65-67)VM(208-213)VM(226-226)VM(241-243)AuthConfig(253-258)AuthConfig(271-271)AuthConfig(286-288)ValidationConfig(298-304)ValidationConfig(317-317)ValidationConfig(332-334)go/pkg/partition/db/models_generated.go (5)
VmsStatus(57-57)VmsStatusProvisioning(61-61)VmsStatusRunning(63-63)VmsStatusStopping(64-64)VmsStatusAllocated(60-60)go/pkg/partition/db/vm_upsert.sql_generated.go (1)
UpsertVMParams(24-31)go/pkg/db/deployment_update_openapi_spec.sql_generated.go (1)
UpdateDeploymentOpenapiSpecParams(19-23)go/gen/proto/hydra/v1/routing.pb.go (10)
DomainToAssign(174-180)DomainToAssign(193-193)DomainToAssign(208-210)DomainSticky_DOMAIN_STICKY_UNSPECIFIED(29-29)DomainSticky_DOMAIN_STICKY_BRANCH(30-30)DomainSticky_DOMAIN_STICKY_ENVIRONMENT(31-31)DomainSticky_DOMAIN_STICKY_LIVE(32-32)AssignDomainsRequest(79-93)AssignDomainsRequest(106-106)AssignDomainsRequest(121-123)go/gen/proto/hydra/v1/routing_restate.pb.go (1)
NewRoutingServiceClient(30-37)
go/apps/ctrl/run.go (8)
go/gen/proto/hydra/v1/deployment_restate.pb.go (1)
NewDeploymentServiceServer(93-107)go/apps/ctrl/services/deployment/service.go (2)
New(27-36)Config(20-25)go/apps/ctrl/workflows/certificate/service.go (2)
New(45-52)Config(30-42)go/apps/ctrl/workflows/deploy/service.go (2)
New(52-60)Config(34-49)go/apps/ctrl/workflows/routing/service.go (2)
New(44-51)Config(29-41)go/gen/proto/hydra/v1/routing_restate.pb.go (1)
NewRoutingServiceServer(88-101)go/gen/proto/hydra/v1/certificate_restate.pb.go (1)
NewCertificateServiceServer(75-87)go/pkg/retry/retry.go (2)
Attempts(81-86)Backoff(91-97)
🪛 Buf (1.57.2)
go/proto/hydra/v1/certificate.proto
5-5: import "dev/restate/sdk/go.proto": file does not exist
(COMPILE)
go/proto/hydra/v1/routing.proto
5-5: import "dev/restate/sdk/go.proto": file does not exist
(COMPILE)
go/proto/hydra/v1/deployment.proto
5-5: import "dev/restate/sdk/go.proto": file does not exist
(COMPILE)
🪛 Checkov (3.2.334)
go/k8s/manifests/restate.yaml
[medium] 15-64: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 15-64: Minimize the admission of root containers
(CKV_K8S_23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Test API / API Test Local
- GitHub Check: Test Go API Local / Test
- GitHub Check: Test Packages / Test
- GitHub Check: Build / Build
apps/engineering/content/docs/infrastructure/database-schema.mdx
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
go/apps/ctrl/run.go (1)
206-251: Harden Restate admin registration (duplicate issue).As noted in previous reviews, the registration logic needs hardening:
- Guard on both
RegisterAsandAdminURL(line 207) - currently only checksRegisterAs.- JSON-marshal the payload (line 213) - using
fmt.Sprintffor JSON is fragile and doesn't escape properly.- Use a custom
http.Clientwith timeout (line 232) -http.DefaultClienthas no timeout.- Replace fixed 5s backoff with jittered backoff (lines 219-221) - all retries occur at exactly 5s intervals.
Apply these changes:
+ import "encoding/json" + - if cfg.Restate.RegisterAs != "" { + if cfg.Restate.RegisterAs != "" && cfg.Restate.AdminURL != "" { go func() { time.Sleep(2 * time.Second) registerURL := fmt.Sprintf("%s/deployments", cfg.Restate.AdminURL) - payload := fmt.Sprintf(`{"uri": "%s"}`, cfg.Restate.RegisterAs) + type registrationPayload struct { + URI string `json:"uri"` + } + body, _ := json.Marshal(registrationPayload{URI: cfg.Restate.RegisterAs}) logger.Info("Registering with Restate", "admin_url", registerURL, "service_uri", cfg.Restate.RegisterAs) retrier := retry.New( retry.Attempts(10), retry.Backoff(func(n int) time.Duration { - return 5 * time.Second + // Exponential backoff with jitter: 2^n seconds + random jitter up to 1s + base := time.Duration(1<<uint(n)) * time.Second + jitter := time.Duration(rand.Intn(1000)) * time.Millisecond + return base + jitter }), ) err := retrier.Do(func() error { - req, err := http.NewRequestWithContext(ctx, "POST", registerURL, bytes.NewBufferString(payload)) + req, err := http.NewRequestWithContext(ctx, http.MethodPost, registerURL, bytes.NewReader(body)) if err != nil { return fmt.Errorf("failed to create registration request: %w", err) } req.Header.Set("Content-Type", "application/json") - resp, err := http.DefaultClient.Do(req) + client := &http.Client{Timeout: 5 * time.Second} + resp, err := client.Do(req) if err != nil { return fmt.Errorf("failed to register with Restate: %w", err) }Note: You'll need to import
encoding/jsonandmath/randfor the jittered backoff.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
go/apps/ctrl/run.go(4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-15T18:12:01.503Z
Learnt from: mcstepp
PR: unkeyed/unkey#3952
File: go/apps/ctrl/run.go:212-213
Timestamp: 2025-09-15T18:12:01.503Z
Learning: In Go connect-go generated handlers, mux.Handle can accept New*ServiceHandler functions directly even though they return (string, http.Handler) tuples. The pattern mux.Handle(ctrlv1connect.New*ServiceHandler(...)) is valid and compiles successfully in unkey codebase.
Applied to files:
go/apps/ctrl/run.go
🧬 Code graph analysis (1)
go/apps/ctrl/run.go (8)
go/gen/proto/hydra/v1/deployment_restate.pb.go (1)
NewDeploymentServiceServer(93-107)go/apps/ctrl/services/deployment/service.go (2)
New(27-36)Config(20-25)go/apps/ctrl/workflows/certificate/service.go (2)
New(45-52)Config(30-42)go/apps/ctrl/workflows/deploy/service.go (2)
New(52-60)Config(34-49)go/apps/ctrl/workflows/routing/service.go (2)
New(44-51)Config(29-41)go/gen/proto/hydra/v1/routing_restate.pb.go (1)
NewRoutingServiceServer(88-101)go/gen/proto/hydra/v1/certificate_restate.pb.go (1)
NewCertificateServiceServer(75-87)go/pkg/retry/retry.go (2)
Attempts(81-86)Backoff(91-97)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test API / API Test Local
- GitHub Check: Test Packages / Test
🔇 Additional comments (2)
go/apps/ctrl/run.go (2)
274-279: LGTM!The deployment service handler correctly uses the new
deployment.Configstructure with the Restate client. The wiring aligns with the Restate migration architecture.
198-204: Fail fast on Restate server start errors.
- Add a buffered error channel and send any
Startfailures to it; before the signal wait, select on the channel to exit on error:+ restateErrCh := make(chan error, 1) go func() { addr := fmt.Sprintf(":%d", cfg.Restate.HttpPort) logger.Info("Starting Restate server", "addr", addr) if startErr := restateSrv.Start(ctx, addr); startErr != nil { logger.Error("failed to start restate server", "error", startErr.Error()) + restateErrCh <- startErr } }() + select { + case err := <-restateErrCh: + return err + default: + }
- Check if the Restate SDK provides a
CloseorShutdownmethod and register it withshutdowns.RegisterCtxfor graceful teardown.

What does this PR do?
This is mostly generated code changes, don't be scared
I've already chatted with @Flo4604 earlier, but I noticed our hydra implementation is quite flawed.
We had race conditions, cases where a workflow was not durable at all when workers crashed and after spending a few hours trying to fix it, I figured it's not worth it.
This PR rips our hydra and powers our durable workflows via restate.dev instead.
It's already in sandbox btw.
I realised we can either spend a lot of time to build this inhouse, or focus on what really matters.
Restate has a very nice balance between features and dependencies. It ships as a single binary for devmode and they also have a k8s operator, which we can use later in prod.
Flo also has experience running it in prod for steamsets.
The actual workflow implementation is fairly similar, so the business logic is almost the same.
The exception is that we use a dedicated "Virtual Object" for domain and gateway assignments, so these are always serialized and we can not have race conditions there.
Type of change
How should this be tested?
Checklist
Required
pnpm buildpnpm fmtconsole.logsgit pull origin mainAppreciated