Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
📝 WalkthroughWalkthroughThis change re-enables ACME certificate management functionality as a Restate cron job. Route53 is introduced as an alternative DNS-01 provider alongside Cloudflare. A centralized in-memory caching layer is added for domains and challenges. Provider implementations are refactored to use a generic wrapper pattern that handles DNS operations and database persistence. Changes
Sequence DiagramsequenceDiagram
actor Restate as Restate Client
participant CertSvc as Certificate Service
participant DB as Database
participant Provider as DNS/HTTP Provider
participant ACME as ACME Client
Note over Restate,ACME: Certificate Renewal Cron
Restate->>CertSvc: RenewExpiringCertificates(ctx, req)
activate CertSvc
CertSvc->>DB: Query expiring/waiting challenges<br/>(DNS01, HTTP01)
DB-->>CertSvc: List of challenges
loop For each challenge domain
CertSvc->>Provider: Resolve domain from cache/DB<br/>(exact or wildcard match)
Provider-->>CertSvc: CustomDomain
CertSvc->>ACME: Create client with user/registration
ACME-->>CertSvc: Client ready
CertSvc->>Provider: ProcessChallenge (fire-and-forget)
activate Provider
Provider->>Provider: Present DNS/HTTP record
Provider->>DB: Record ACME challenge state<br/>(DomainID, Status, Token, etc.)
DB-->>Provider: ✓ Persisted
Provider-->>CertSvc: Done
deactivate Provider
CertSvc->>CertSvc: Throttle (brief sleep)
end
Note over CertSvc: Collect metrics<br/>(checked, triggered, failed)
CertSvc->>Restate: Schedule next run<br/>with 24h delay & idempotency key
Restate-->>CertSvc: ✓ Scheduled
deactivate CertSvc
CertSvc-->>Restate: RenewExpiringCertificatesResponse<br/>(counts, failed_domains)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45–75 minutes
Areas requiring extra attention:
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
go/apps/ctrl/workflows/certificate/service.go (1)
54-64: Validate required configuration fields.The new fields
DNSProvider,EmailDomain, andDefaultDomain(lines 60-62) are assigned without validation. If these are required for the service to function correctly (e.g., DNSProvider for DNS-01 challenges), the constructor should validate them before creating the service instance.Consider adding validation:
func New(cfg Config) *Service { + // Validate required fields + if cfg.DNSProvider == nil { + panic("DNSProvider is required") + } + if cfg.EmailDomain == "" { + panic("EmailDomain is required") + } + if cfg.DefaultDomain == "" { + panic("DefaultDomain is required") + } + return &Service{ UnimplementedCertificateServiceServer: hydrav1.UnimplementedCertificateServiceServer{}, db: cfg.DB, vault: cfg.Vault, logger: cfg.Logger, emailDomain: cfg.EmailDomain, defaultDomain: cfg.DefaultDomain, dnsProvider: cfg.DNSProvider, } }Alternatively, if any of these fields are optional, document this in the
Configstruct comments and handle nil/empty cases appropriately in the methods that use them.
🧹 Nitpick comments (10)
go/apps/ctrl/config.go (2)
53-65: AcmeConfig additions (EmailDomain & Route53) are coherently modeled.
EmailDomain,Cloudflare, andRoute53are grouped cleanly underAcmeConfig, which will make it easier to extend to additional providers later. Very minor nit: theConfigstruct comment at Line 155 still readsACME/Cloudflare Configurationeven though Route53 is now supported—renaming that to justACME Configurationwould avoid future confusion, but it can be done opportunistically.
219-236: Route53 validation mirrors Cloudflare; consider also validating EmailDomain when ACME is enabled.The new Route53 block correctly enforces that
AccessKeyID,SecretAccessKey, andRegionare non-empty when ACME + Route53 are enabled, mirroring the existing Cloudflare API token check from earlier work. This will fail fast on misconfigurations, which is good. Based on learnings, this keeps Route53 in line with the prior Cloudflare validation behavior.You might additionally want to reject an empty
EmailDomainwhenever ACME is enabled, so misconfigured registration email domains are caught at startup instead of at issuance time. One way to do that:func (c Config) Validate() error { - // Validate Cloudflare configuration if enabled - if c.Acme.Enabled && c.Acme.Cloudflare.Enabled { + if c.Acme.Enabled { + if err := assert.NotEmpty(c.Acme.EmailDomain, "acme email domain is required when acme is enabled"); err != nil { + return err + } + } + + // Validate Cloudflare configuration if enabled + if c.Acme.Enabled && c.Acme.Cloudflare.Enabled { if err := assert.NotEmpty(c.Acme.Cloudflare.ApiToken, "cloudflare API token is required when cloudflare is enabled"); err != nil { return err } } // Validate Route53 configuration if enabled if c.Acme.Enabled && c.Acme.Route53.Enabled { if err := assert.All( assert.NotEmpty(c.Acme.Route53.AccessKeyID, "route53 access key ID is required when route53 is enabled"), assert.NotEmpty(c.Acme.Route53.SecretAccessKey, "route53 secret access key is required when route53 is enabled"), assert.NotEmpty(c.Acme.Route53.Region, "route53 region is required when route53 is enabled"), ); err != nil { return err } }go/cmd/ctrl/main.go (1)
103-116: ACME CLI flags cover email domain and Route53 options in a sensible way.The new
acme-email-domainandacme-route53-*flags (with matchingUNKEY_ACME_*env vars and a reasonable default region) round out ACME configuration nicely. The defaultacme-email-domain="unkey.com"is convenient for your hosted environment; if you expect many self‑hosted users, you might eventually consider dropping the default to force an explicit choice, but that’s more of a product call than a correctness issue.go/apps/ctrl/internal/caches/caches.go (1)
22-43: Consider validating or defaulting the Logger field.The
Clockfield is defaulted if nil (lines 23-26), but theLoggerfield at line 32 is passed directly tocache.Newwithout validation. If a nilLoggeris passed and the cache implementation requires a valid logger, this could cause issues.Consider adding validation or a default logger:
func New(cfg Config) (*Caches, error) { clk := cfg.Clock if clk == nil { clk = clock.New() } + + if cfg.Logger == nil { + return nil, fmt.Errorf("logger is required") + } domains, err := cache.New(cache.Config[string, db.CustomDomain]{Alternatively, if
cache.Newalready validates the logger internally, document this requirement in theConfigstruct comments.go/apps/ctrl/workflows/certificate/renew_handler.go (1)
49-76: Consider adding pagination or limits to prevent overwhelming the system.The renewal loop processes all executable challenges without any limit (line 49). If thousands of certificates need renewal, this could overwhelm the system with renewal requests, despite the 100ms throttle delay.
Consider one of these approaches:
- Process in batches: Limit the number of renewals per run (e.g., max 100), and let subsequent cron runs handle the rest.
- Add a configurable limit: Allow operators to tune the batch size based on system capacity.
- Track progress: Store progress state and resume from where the last run left off.
Example with a limit:
+const maxRenewalsPerRun = 100 + func (s *Service) RenewExpiringCertificates( ctx restate.ObjectContext, req *hydrav1.RenewExpiringCertificatesRequest, ) (*hydrav1.RenewExpiringCertificatesResponse, error) { // ... existing code ... var failedDomains []string renewalsTriggered := int32(0) - for _, challenge := range challenges { + for i, challenge := range challenges { + if i >= maxRenewalsPerRun { + s.logger.Info("reached max renewals per run", "limit", maxRenewalsPerRun) + break + } + s.logger.Info("triggering certificate renewal",go/apps/ctrl/services/acme/user.go (1)
71-109: Consider logging success after registration URI persistence.The registration completion logic is well-structured with proper conditional handling. However, you only log a warning on failure at line 107 but don't log success. For consistency and observability, consider adding an info log when the registration URI is successfully persisted.
if updateErr := db.Query.UpdateAcmeUserRegistrationURI(ctx, cfg.DB.RW(), db.UpdateAcmeUserRegistrationURIParams{ ID: foundUser.ID, RegistrationUri: sql.NullString{Valid: true, String: reg.URI}, }); updateErr != nil { cfg.Logger.Warn("failed to persist registration URI", "error", updateErr) + } else { + cfg.Logger.Info("persisted registration URI", "workspace_id", cfg.WorkspaceID) }go/apps/ctrl/services/acme/providers/provider.go (2)
94-124: Present() creates a detached context, losing trace propagation.Using
context.Background()at line 96 discards any tracing/logging context that may have been passed through the ACME challenge flow. Since the legochallenge.Providerinterface doesn't accept a context, consider at minimum logging a trace ID if available from the caller, or documenting this limitation.If trace context is important for debugging ACME challenges, consider storing a context reference in the Provider struct during construction or using a context-aware wrapper pattern. For now, this is acceptable given lego's API constraints, but worth noting for observability.
126-136: CleanUp intentionally swallows errors — consider adding a metric.The decision to log warnings but not fail on cleanup errors is reasonable (DNS records can often be cleaned up manually, and failing would block certificate issuance). However, for operational visibility, consider emitting a metric or structured log field that can be easily alerted on if cleanup failures become frequent.
go/apps/ctrl/workflows/certificate/process_challenge_handler.go (2)
85-85: Step comment numbering is off — says "Step 5" but should be "Step 4".Minor documentation inconsistency: the comment says "Step 5" but this is actually the 4th step in the flow (after resolve domain, claim challenge, obtain certificate).
- // Step 5: Persist certificate to DB + // Step 4: Persist certificate to DB
118-141: TheworkspaceIDparameter in getOrCreateAcmeClient is unused.The function signature accepts a
workspaceIDparameter (named_on line 121), butglobalAcmeUserIDis always used instead. Either remove the parameter or document why it's reserved for future use.If the intention is to always use a global ACME user, simplify the signature:
-func (s *Service) getOrCreateAcmeClient(ctx context.Context, _ string) (*lego.Client, error) { +func (s *Service) getOrCreateAcmeClient(ctx context.Context) (*lego.Client, error) {And update the call site at line 147:
- client, err := s.getOrCreateAcmeClient(ctx, workspaceID) + client, err := s.getOrCreateAcmeClient(ctx)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
go/gen/proto/hydra/v1/certificate.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/hydra/v1/certificate_restate.pb.gois excluded by!**/*.pb.go,!**/gen/**go/go.sumis excluded by!**/*.sum
📒 Files selected for processing (21)
go/apps/ctrl/config.go(2 hunks)go/apps/ctrl/internal/caches/caches.go(1 hunks)go/apps/ctrl/run.go(5 hunks)go/apps/ctrl/services/acme/pem.go(1 hunks)go/apps/ctrl/services/acme/providers/cloudflare_provider.go(1 hunks)go/apps/ctrl/services/acme/providers/provider.go(1 hunks)go/apps/ctrl/services/acme/providers/route53_provider.go(1 hunks)go/apps/ctrl/services/acme/user.go(5 hunks)go/apps/ctrl/workflows/certificate/process_challenge_handler.go(6 hunks)go/apps/ctrl/workflows/certificate/renew_handler.go(1 hunks)go/apps/ctrl/workflows/certificate/service.go(4 hunks)go/cmd/ctrl/main.go(2 hunks)go/go.mod(1 hunks)go/pkg/db/bulk_custom_domain_upsert.sql_generated.go(1 hunks)go/pkg/db/custom_domain_find_by_domain_or_wildcard.sql_generated.go(1 hunks)go/pkg/db/custom_domain_upsert.sql_generated.go(1 hunks)go/pkg/db/querier_bulk_generated.go(1 hunks)go/pkg/db/querier_generated.go(2 hunks)go/pkg/db/queries/custom_domain_find_by_domain_or_wildcard.sql(1 hunks)go/pkg/db/queries/custom_domain_upsert.sql(1 hunks)go/proto/hydra/v1/certificate.proto(2 hunks)
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2025-09-11T14:24:40.988Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/apps/ctrl/services/deployment/deploy_workflow.go:326-334
Timestamp: 2025-09-11T14:24:40.988Z
Learning: The InsertDomains method in the bulk queries uses ON DUPLICATE KEY UPDATE, making it an upsert operation that is idempotent and safe for retries, despite the "Insert" naming convention.
Applied to files:
go/pkg/db/querier_generated.gogo/pkg/db/custom_domain_upsert.sql_generated.gogo/pkg/db/queries/custom_domain_upsert.sqlgo/pkg/db/bulk_custom_domain_upsert.sql_generated.gogo/pkg/db/querier_bulk_generated.go
📚 Learning: 2025-09-11T14:12:30.570Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/apps/ctrl/run.go:292-299
Timestamp: 2025-09-11T14:12:30.570Z
Learning: In go/apps/ctrl/config.go, the Validate() method already includes validation for Cloudflare API token when ACME and Cloudflare are both enabled, checking that cfg.Acme.Cloudflare.ApiToken is not empty.
Applied to files:
go/apps/ctrl/config.gogo/cmd/ctrl/main.gogo/apps/ctrl/services/acme/providers/cloudflare_provider.go
📚 Learning: 2025-09-11T14:12:30.570Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/apps/ctrl/run.go:292-299
Timestamp: 2025-09-11T14:12:30.570Z
Learning: In go/apps/ctrl/config.go, the Validate() method already includes validation for Cloudflare API token using assert.NotEmpty() when both c.Acme.Enabled and c.Acme.Cloudflare.Enabled are true, with the error message "cloudflare API token is required when cloudflare is enabled".
Applied to files:
go/apps/ctrl/config.gogo/cmd/ctrl/main.gogo/apps/ctrl/services/acme/providers/cloudflare_provider.go
📚 Learning: 2025-07-17T14:24:20.403Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3631
File: go/pkg/db/bulk_keyring_insert.sql.go:23-25
Timestamp: 2025-07-17T14:24:20.403Z
Learning: In go/pkg/db/bulk_keyring_insert.sql.go and similar bulk insert generated files, hardcoded zero values for fields like size_approx and size_last_updated_at are intentional and reflect the original SQL query structure, not missing parameters.
Applied to files:
go/pkg/db/bulk_custom_domain_upsert.sql_generated.gogo/pkg/db/querier_bulk_generated.go
📚 Learning: 2025-09-12T08:01:20.792Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/pkg/db/acme_challenge_update_verified_with_expiry.sql_generated.go:31-39
Timestamp: 2025-09-12T08:01:20.792Z
Learning: Do not review or suggest changes to files with sql_generated.go suffix or other files marked as auto-generated (containing "Code generated by" comments), as these are generated by tools like sqlc and changes would be overwritten on regeneration.
Applied to files:
go/pkg/db/bulk_custom_domain_upsert.sql_generated.gogo/pkg/db/custom_domain_find_by_domain_or_wildcard.sql_generated.go
📚 Learning: 2025-09-25T18:49:28.948Z
Learnt from: mcstepp
Repo: unkeyed/unkey PR: 4010
File: go/apps/ctrl/run.go:224-233
Timestamp: 2025-09-25T18:49:28.948Z
Learning: In go/apps/ctrl/run.go, mcstepp prefers to keep ACME service handlers wrapped with auth interceptor during the demo phase for simplicity, even though typically ACME routes would be exempt from authentication. This is part of the temporary demo authentication system.
Applied to files:
go/cmd/ctrl/main.go
📚 Learning: 2025-09-11T14:13:12.124Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/apps/ctrl/services/acme/providers/cloudflare_provider.go:56-69
Timestamp: 2025-09-11T14:13:12.124Z
Learning: For the default domain in the Unkey system (go/apps/ctrl/services/acme/providers/cloudflare_provider.go), only wildcard certificates are issued. When domain == p.defaultDomain, the code should always convert to the wildcard form ("*." + domain) and not attempt fallback lookups to the base domain form.
Applied to files:
go/apps/ctrl/run.go
🧬 Code graph analysis (13)
go/pkg/db/querier_generated.go (4)
go/pkg/db/interface.go (1)
DBTX(29-34)go/pkg/db/custom_domain_find_by_domain_or_wildcard.sql_generated.go (1)
FindCustomDomainByDomainOrWildcardParams(20-24)go/pkg/db/models_generated.go (1)
CustomDomain(727-734)go/pkg/db/custom_domain_upsert.sql_generated.go (1)
UpsertCustomDomainParams(22-29)
go/apps/ctrl/workflows/certificate/renew_handler.go (3)
go/gen/proto/hydra/v1/certificate.pb.go (6)
RenewExpiringCertificatesRequest(129-135)RenewExpiringCertificatesRequest(148-148)RenewExpiringCertificatesRequest(163-165)RenewExpiringCertificatesResponse(174-181)RenewExpiringCertificatesResponse(194-194)RenewExpiringCertificatesResponse(209-211)go/pkg/db/models_generated.go (3)
AcmeChallengesChallengeType(16-16)AcmeChallengesChallengeTypeDNS01(20-20)AcmeChallengesChallengeTypeHTTP01(19-19)go/pkg/db/acme_challenge_list_executable.sql_generated.go (1)
ListExecutableChallengesRow(21-25)
go/pkg/db/custom_domain_upsert.sql_generated.go (3)
go/pkg/db/models_generated.go (1)
CustomDomainsChallengeType(144-144)go/pkg/db/queries.go (1)
Queries(3-3)go/pkg/db/interface.go (1)
DBTX(29-34)
go/apps/ctrl/services/acme/user.go (3)
go/pkg/db/models_generated.go (1)
AcmeUser(639-646)go/pkg/db/acme_user_update_registration_uri.sql_generated.go (1)
UpdateAcmeUserRegistrationURIParams(17-20)go/pkg/db/types/null_string.go (1)
NullString(10-10)
go/apps/ctrl/services/acme/providers/route53_provider.go (3)
go/pkg/cache/interface.go (1)
Cache(7-49)go/apps/ctrl/config.go (1)
Route53Config(35-51)go/apps/ctrl/services/acme/providers/provider.go (3)
Provider(34-39)NewProvider(49-66)ProviderConfig(41-46)
go/apps/ctrl/internal/caches/caches.go (2)
go/pkg/cache/interface.go (1)
Cache(7-49)go/pkg/clock/interface.go (1)
Clock(12-18)
go/apps/ctrl/config.go (2)
go/apps/ctrl/services/acme/providers/cloudflare_provider.go (1)
CloudflareConfig(13-18)go/apps/ctrl/services/acme/providers/route53_provider.go (1)
Route53Config(14-24)
go/pkg/db/bulk_custom_domain_upsert.sql_generated.go (3)
go/pkg/db/queries.go (1)
BulkQueries(4-4)go/pkg/db/interface.go (1)
DBTX(29-34)go/pkg/db/custom_domain_upsert.sql_generated.go (1)
UpsertCustomDomainParams(22-29)
go/cmd/ctrl/main.go (4)
go/pkg/cli/flag.go (4)
String(419-453)Default(364-416)EnvVar(320-339)Bool(497-535)go/apps/ctrl/services/acme/providers/cloudflare_provider.go (1)
CloudflareConfig(13-18)go/apps/ctrl/config.go (2)
CloudflareConfig(27-33)Route53Config(35-51)go/apps/ctrl/services/acme/providers/route53_provider.go (1)
Route53Config(14-24)
go/apps/ctrl/workflows/certificate/service.go (3)
go/apps/ctrl/services/acme/service.go (1)
Service(9-13)go/apps/ctrl/workflows/routing/service.go (1)
Service(18-23)go/apps/ctrl/services/acme/providers/provider.go (1)
DNSProvider(26-30)
go/pkg/db/custom_domain_find_by_domain_or_wildcard.sql_generated.go (3)
go/pkg/db/queries.go (1)
Queries(3-3)go/pkg/db/interface.go (1)
DBTX(29-34)go/pkg/db/models_generated.go (1)
CustomDomain(727-734)
go/pkg/db/querier_bulk_generated.go (1)
go/pkg/db/interface.go (1)
DBTX(29-34)
go/apps/ctrl/workflows/certificate/process_challenge_handler.go (5)
go/gen/proto/hydra/v1/certificate.pb.go (6)
ProcessChallengeRequest(25-31)ProcessChallengeRequest(44-44)ProcessChallengeRequest(59-61)ProcessChallengeResponse(77-83)ProcessChallengeResponse(96-96)ProcessChallengeResponse(111-113)go/apps/ctrl/services/acme/user.go (2)
GetOrCreateUser(49-112)UserConfig(41-47)go/apps/ctrl/services/acme/pem.go (1)
GetCertificateExpiry(11-23)go/pkg/uid/uid.go (1)
CertificatePrefix(53-53)go/pkg/db/certificate_insert.sql_generated.go (1)
InsertCertificateParams(23-31)
🔇 Additional comments (23)
go/apps/ctrl/config.go (1)
35-51: Route53 ACME config struct matches provider needs and is clear.Field names and comments line up with the runtime
Route53Configingo/apps/ctrl/services/acme/providers/route53_provider.go(access key, secret, region, hosted zone override), so wiring from CLI → config → provider should be straightforward. No changes needed here.go/apps/ctrl/services/acme/pem.go (1)
10-23: PEM certificate expiry helper is correct and straightforward.Decoding the first PEM block, guarding against
nil, then usingx509.ParseCertificateandNotAfter.UnixMilli()is exactly what’s needed here. Error messages are clear and wrapped, and this should work fine with typical leaf-first certificate bundles.go/pkg/db/querier_generated.go (2)
168-187: Wildcard-aware custom domain lookup behavior looks correct.The new
FindCustomDomainByDomainOrWildcardmethod and its SQL comment implement the expected behavior: check both the exact domain and its wildcard variant, order so that an exact match wins (CASE WHEN domain = ? THEN 0 ELSE 1 END), andLIMIT 1to avoid ambiguity. This lines up with the existingFindCustomDomainByDomainand theCustomDomainmodel shape.
2114-2122: UpsertCustomDomain interface matches the SQL upsert semantics and is idempotent.The
UpsertCustomDomainentry mirrors the new SQL (INSERT ... ON DUPLICATE KEY UPDATE workspace_id, challenge_type, updated_at) and the generatedUpsertCustomDomainParamsstruct, giving you a clean, idempotent upsert forcustom_domains. This is consistent with prior “insert but really upsert” patterns described in earlier learnings for domain-related queries.go/pkg/db/querier_bulk_generated.go (1)
16-16: Bulk upsert for custom domains integrates cleanly with BulkQuerier.
UpsertCustomDomain(ctx, db, []UpsertCustomDomainParams)follows the same pattern as other bulk insert/upsert methods and matches the generated params struct, giving you an efficient batched path for domain upserts.go/pkg/db/queries/custom_domain_upsert.sql (1)
1-7: Custom domain upsert SQL is idempotent and matches the generated Go API.The
UpsertCustomDomainquery inserts(id, workspace_id, domain, challenge_type, created_at)and on duplicate key updatesworkspace_id,challenge_type, andupdated_at. This gives you a clean, retry‑safe upsert semantics forcustom_domains, similar to the existing “insert but upsert” patterns called out in previous learnings.go/pkg/db/queries/custom_domain_find_by_domain_or_wildcard.sql (1)
1-6: Domain-or-wildcard lookup query correctly prefers exact matches.
WHERE domain IN (?, ?)combined withORDER BY CASE WHEN domain = ? THEN 0 ELSE 1 END LIMIT 1gives you the intended behavior: try the exact domain and its wildcard variant and return at most one row, preferring the exact match when both exist. This fits the newFindCustomDomainByDomainOrWildcardmethod in the generated Go code.go/cmd/ctrl/main.go (1)
207-220: Ctrl config wiring for ACME EmailDomain and Route53 matches CLI and config structs.
Acme.Enabled,Acme.EmailDomain,Cloudflare, andRoute53are all populated from their respective flags/env vars, and the Route53 fields (Enabled,AccessKeyID,SecretAccessKey,Region,HostedZoneID) align withctrl.Route53Configand the provider’s expectations. This should work smoothly with the new validation logic inctrl.Config.Validate().go/go.mod (1)
79-79: Route53 service module addition supports the lego-based Route53 ACME provider.Adding
github.com/aws/aws-sdk-go-v2/service/route53 v1.53.1as an indirect dependency is justified: it's pulled in by lego'sroute53DNS provider (github.com/go-acme/lego/v4/providers/dns/route53), used inroute53_provider.go. This fits naturally with the existingaws-sdk-go-v2stack (config, credentials, s3) already in use.go/proto/hydra/v1/certificate.proto (1)
17-20: LGTM! Well-designed API for certificate renewal.The
RenewExpiringCertificatesRPC is well-documented with clear guidance on its singleton key pattern and periodic execution model. The request/response structure provides appropriate configurability and observability metrics.go/apps/ctrl/workflows/certificate/renew_handler.go (1)
87-93: Verify idempotency key behavior for same-day reruns.The idempotency key at line 92 uses only the date (formatted as "2006-01-02"), which prevents duplicate schedules on the same day. However, if the job is manually triggered multiple times in a day, each run will attempt to schedule the next run for the same
nextRunDate, and the idempotency key will prevent all but the first from succeeding.This is likely the intended behavior, but verify that:
- Manual reruns during the same day won't cause issues with scheduling.
- The 24-hour delay is always calculated from the current time, not a fixed time of day.
If you want the job to run at a consistent time each day (e.g., always at midnight), consider using a fixed timestamp rather than adding 24 hours to the current time.
go/apps/ctrl/services/acme/user.go (3)
22-31: LGTM on the EmailDomain addition and GetEmail refactor.The new
EmailDomainfield properly decouples the email domain from hardcoded values, andGetEmail()correctly constructs the email using the workspace ID and configurable domain.
49-56: Good defensive error handling added.The added error wrapping distinguishes between "not found" (which triggers registration) and other errors (which now propagate with context). This is a solid improvement.
114-172: LGTM on the register function updates.The
registerfunction correctly:
- Includes
EmailDomainin the new user creation- Persists the registration URI after successful ACME registration
- Uses proper error wrapping throughout
The flow is consistent with the
GetOrCreateUserfunction's handling.go/apps/ctrl/services/acme/providers/provider.go (3)
48-66: LGTM on NewProvider validation.The validation using
assert.Allensures all required dependencies are provided before constructing the provider. This is a clean pattern for constructor validation.
68-92: LGTM on resolveDomain with SWR caching.The domain resolution logic correctly:
- Builds a cache key covering both exact and wildcard lookups
- Uses stale-while-revalidate caching to reduce DB load
- Returns a specific sentinel error for not-found cases
11-11: The import path is correct and no issue exists.The import
github.com/unkeyed/unkey/go/internal/services/cachesat line 11 points togo/internal/services/caches/caches.go, which exists in the repository. Whilego/apps/ctrl/internal/caches/also exists, the file correctly imports from the shared internal services package rather than the app-specific one. This is a valid import path and will compile without errors.Likely an incorrect or invalid review comment.
go/apps/ctrl/run.go (2)
357-371: Idempotency key is static — restarts will skip the initial send after first run.The idempotency key
"cert-renewal-cron-startup"is constant. After the first successful send, subsequent restarts will be deduplicated by Restate and won't trigger a new renewal check. While this prevents duplicate crons on rapid restarts, it may also prevent the cron from being re-triggered if it fails or is canceled.Consider whether this is the intended behavior, or if you want to use a time-based idempotency key (e.g., including the date) similar to how the renewal handler schedules its next run.
492-527: LGTM on bootstrapWildcardDomain logic.The function correctly:
- Checks for existing wildcard domain before creating
- Uses an internal workspace ID for platform-managed resources
- Creates both the custom domain and an initial ACME challenge with "waiting" status
- Logs appropriately at each step
The error handling is defensive (logs and returns rather than propagating), which is appropriate for a bootstrap operation that shouldn't block startup.
go/apps/ctrl/workflows/certificate/process_challenge_handler.go (4)
65-70: Defer compensation logic is correct but consider clarifying the condition.The defer correctly marks the challenge as failed when an error occurs. The
resp.Status == "failed"check handles the case at lines 79-83 whereobtainCertificatefails but returns a response with "failed" status instead of an error. This is intentional but could benefit from a comment explaining why both conditions are checked.
181-186: Good fallback for certificate expiry parsing failure.Using a 90-day default (matching Let's Encrypt's certificate validity) when parsing fails is a reasonable fallback that ensures the renewal cron can still function.
205-223: LGTM on persistCertificate.The function correctly uses the pre-generated certificate ID from
EncryptedCertificateand leverages the upsert pattern for handling both new certificates and renewals.
225-236: Compensation runs inside Restate context correctly.Using
restate.Runfor the compensation ensures the failure status update is durable and will complete even if the process crashes during cleanup.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
go/apps/ctrl/workflows/certificate/process_challenge_handler.go (1)
72-101: Fix step numbering: jumps from Step 3 to Step 5.The step comments skip Step 4: Line 72 says "Step 3", Line 85 says "Step 5", and Line 93 says "Step 6". Either renumber sequentially or add a comment explaining what Step 4 was (if removed).
- // Step 5: Persist certificate to DB + // Step 4: Persist certificate to DB certID, err := restate.Run(ctx, func(stepCtx restate.RunContext) (string, error) { return s.persistCertificate(stepCtx, dom, req.GetDomain(), cert) }, restate.WithName("persist certificate")) if err != nil { return nil, err } - // Step 6: Mark challenge as verified + // Step 5: Mark challenge as verified _, err = restate.Run(ctx, func(stepCtx restate.RunContext) (restate.Void, error) {
🧹 Nitpick comments (2)
go/apps/ctrl/run.go (1)
525-542: Consider extracting the hardcoded workspace ID to a constant.The hardcoded
"unkey_internal"workspace ID is used for platform-managed resources. Consider defining this as a package-level constant for maintainability and to ensure consistency if it's used elsewhere.+const internalWorkspaceID = "unkey_internal" + func bootstrapWildcardDomain(ctx context.Context, database db.Database, logger logging.Logger, defaultDomain string) { wildcardDomain := "*." + defaultDomain // ... - workspaceID := "unkey_internal" + workspaceID := internalWorkspaceIDgo/apps/ctrl/services/acme/providers/http_provider.go (1)
17-20: Consider removing unuseddbfield fromhttpDNS.The
dbfield is assigned inNewHTTPProvider(line 59) but never referenced inPresentorCleanUpmethods. Since the comment states "The actual DB update is handled by the generic Provider wrapper," this field appears to be dead code.type httpDNS struct { - db db.Database logger logging.Logger }Also update the constructor:
DNS: &httpDNS{ - db: cfg.DB, logger: cfg.Logger, },
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
go/apps/api/routes/v2_identities_list_identities/handler.go(1 hunks)go/apps/ctrl/run.go(6 hunks)go/apps/ctrl/services/acme/providers/http_provider.go(1 hunks)go/apps/ctrl/services/acme/providers/route53_provider.go(1 hunks)go/apps/ctrl/workflows/certificate/process_challenge_handler.go(6 hunks)go/apps/ctrl/workflows/certificate/service.go(4 hunks)go/pkg/db/handle_err_no_rows.go(1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-08-21T15:54:45.198Z
Learnt from: chronark
Repo: unkeyed/unkey PR: 3825
File: go/internal/services/usagelimiter/limit.go:38-0
Timestamp: 2025-08-21T15:54:45.198Z
Learning: In go/internal/services/usagelimiter/limit.go, the UpdateKeyCreditsDecrement operation cannot be safely wrapped with db.WithRetry due to the lack of idempotency mechanisms in the current tech stack. Retrying this non-idempotent write operation risks double-charging users if the first attempt commits but the client sees a transient error.
Applied to files:
go/apps/api/routes/v2_identities_list_identities/handler.go
📚 Learning: 2025-09-11T14:13:12.124Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/apps/ctrl/services/acme/providers/cloudflare_provider.go:56-69
Timestamp: 2025-09-11T14:13:12.124Z
Learning: For the default domain in the Unkey system (go/apps/ctrl/services/acme/providers/cloudflare_provider.go), only wildcard certificates are issued. When domain == p.defaultDomain, the code should always convert to the wildcard form ("*." + domain) and not attempt fallback lookups to the base domain form.
Applied to files:
go/apps/ctrl/run.go
📚 Learning: 2025-09-11T14:12:30.570Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/apps/ctrl/run.go:292-299
Timestamp: 2025-09-11T14:12:30.570Z
Learning: In go/apps/ctrl/config.go, the Validate() method already includes validation for Cloudflare API token using assert.NotEmpty() when both c.Acme.Enabled and c.Acme.Cloudflare.Enabled are true, with the error message "cloudflare API token is required when cloudflare is enabled".
Applied to files:
go/apps/ctrl/run.go
📚 Learning: 2025-09-11T14:12:30.570Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/apps/ctrl/run.go:292-299
Timestamp: 2025-09-11T14:12:30.570Z
Learning: In go/apps/ctrl/config.go, the Validate() method already includes validation for Cloudflare API token when ACME and Cloudflare are both enabled, checking that cfg.Acme.Cloudflare.ApiToken is not empty.
Applied to files:
go/apps/ctrl/run.go
📚 Learning: 2025-09-25T18:49:28.948Z
Learnt from: mcstepp
Repo: unkeyed/unkey PR: 4010
File: go/apps/ctrl/run.go:224-233
Timestamp: 2025-09-25T18:49:28.948Z
Learning: In go/apps/ctrl/run.go, mcstepp prefers to keep ACME service handlers wrapped with auth interceptor during the demo phase for simplicity, even though typically ACME routes would be exempt from authentication. This is part of the temporary demo authentication system.
Applied to files:
go/apps/ctrl/run.go
🧬 Code graph analysis (6)
go/apps/ctrl/workflows/certificate/service.go (4)
go/apps/ctrl/services/acme/service.go (1)
Service(9-13)go/apps/ctrl/workflows/routing/service.go (1)
Service(18-23)go/apps/ctrl/services/acme/providers/provider.go (1)
DNSProvider(26-30)go/apps/ctrl/services/acme/providers/http_provider.go (1)
HTTPProvider(45-45)
go/apps/api/routes/v2_identities_list_identities/handler.go (1)
go/pkg/db/handle_err_no_rows.go (1)
IsNotFound(10-12)
go/apps/ctrl/services/acme/providers/route53_provider.go (2)
go/apps/ctrl/config.go (1)
Route53Config(35-51)go/apps/ctrl/services/acme/providers/provider.go (3)
Provider(34-39)NewProvider(49-66)ProviderConfig(41-46)
go/apps/ctrl/run.go (6)
go/apps/ctrl/internal/caches/caches.go (2)
New(22-43)Config(17-20)go/pkg/uid/uid.go (1)
New(103-152)go/apps/ctrl/services/acme/providers/http_provider.go (3)
NewHTTPProvider(54-64)HTTPConfig(47-51)HTTPProvider(45-45)go/apps/ctrl/services/acme/providers/cloudflare_provider.go (2)
NewCloudflareProvider(21-38)CloudflareConfig(13-18)go/apps/ctrl/services/acme/providers/route53_provider.go (2)
NewRoute53Provider(33-55)Route53Config(13-23)go/pkg/db/custom_domain_upsert.sql_generated.go (1)
UpsertCustomDomainParams(22-29)
go/apps/ctrl/workflows/certificate/process_challenge_handler.go (7)
go/pkg/db/models_generated.go (1)
Certificate(703-711)go/apps/ctrl/workflows/certificate/service.go (1)
Service(20-29)go/gen/proto/hydra/v1/certificate.pb.go (6)
ProcessChallengeRequest(25-31)ProcessChallengeRequest(44-44)ProcessChallengeRequest(59-61)ProcessChallengeResponse(77-83)ProcessChallengeResponse(96-96)ProcessChallengeResponse(111-113)go/apps/ctrl/services/acme/user.go (2)
GetOrCreateUser(49-112)UserConfig(41-47)go/pkg/retry/retry.go (2)
Attempts(83-88)Backoff(93-98)go/apps/ctrl/services/acme/pem.go (1)
GetCertificateExpiry(11-23)go/pkg/db/certificate_insert.sql_generated.go (1)
InsertCertificateParams(23-31)
go/apps/ctrl/services/acme/providers/http_provider.go (1)
go/apps/ctrl/services/acme/providers/provider.go (1)
ProviderConfig(41-46)
🔇 Additional comments (20)
go/pkg/db/handle_err_no_rows.go (1)
8-11: LGTM!Clean utility function for consistent not-found error handling across the codebase. Using
errors.Iscorrectly handles wrapped errors.go/apps/api/routes/v2_identities_list_identities/handler.go (1)
107-111: LGTM!Good adoption of the centralized
db.IsNotFoundhelper for consistent not-found error handling.go/apps/ctrl/services/acme/providers/route53_provider.go (1)
25-55: LGTM! Previous thread-safety concern has been addressed.The
os.Setenv("LEGO_DISABLE_CNAME_SUPPORT", "true")call has been correctly moved to application startup inrun.go(line 60), and the documentation here properly notes this prerequisite. The implementation follows the established pattern fromCloudflareProvider.go/apps/ctrl/workflows/certificate/service.go (2)
22-28: LGTM!The Service struct is cleanly extended with the new ACME provider fields. The inline comments clearly distinguish DNS-01 (wildcard certs) from HTTP-01 (regular certs) use cases.
43-55: LGTM!Config fields are well-documented and properly aligned with the Service struct fields.
go/apps/ctrl/run.go (4)
57-60: LGTM! Thread-safety concern addressed.Setting the environment variable once at application startup before any concurrent execution is the correct fix for the previously identified thread-safety issue.
260-313: LGTM! Previous DomainCache concerns addressed.The cache initialization at lines 265-270 properly precedes provider creation, and
DomainCacheis correctly passed to all providers (Cloudflare at line 290, Route53 at line 305, HTTP at line 276).
391-403: LGTM! Idempotent cron startup.Using
restate.WithIdempotencyKey("cert-renewal-cron-startup")ensures that multiple application restarts don't create duplicate cron jobs.
544-559: Database constraints already prevent duplicate ACME challenges.The
acme_challengestable enforces a PRIMARY KEY constraint ondomain_id, making duplicates impossible at the database level. The code's guard checking for existing custom domains (FindCustomDomainByDomain) ensures the function returns early on subsequent restarts, preventing any insertion attempt. The proposed fix referencesFindAcmeChallengeByDomainID, which does not exist in the codebase—onlyFindAcmeChallengeByTokenis available. No changes are needed.Likely an incorrect or invalid review comment.
go/apps/ctrl/services/acme/providers/http_provider.go (3)
22-35: LGTM!The
PresentandCleanUpmethods correctly delegate actual challenge handling to the genericProviderwrapper while maintaining proper logging. The no-opCleanUpis appropriate since HTTP-01 tokens can remain in DB until overwritten.
37-41: LGTM!The timeout values (90s timeout, 3s interval) are appropriate for HTTP-01 challenges which typically resolve faster than DNS-based challenges.
43-64: LGTM!The type alias provides a clear public interface while reusing the generic
Providerimplementation. TheHTTPConfigstruct aligns withProviderConfigrequirements, and the constructor properly wires all dependencies.go/apps/ctrl/workflows/certificate/process_challenge_handler.go (8)
20-26: LGTM!The
EncryptedCertificatestruct is well-designed to hold all necessary certificate data including the encrypted private key for secure storage.
65-70: LGTM!The saga compensation correctly uses a defer to capture the final error state, ensuring
markChallengeFailedis called on any failure path after claiming the challenge. The nil check onrespprevents panic when returning early with an error.
121-124: LGTM!The wildcard detection correctly identifies domains starting with
*.and handles edge cases by checking minimum length.
126-160: LGTM!The provider selection logic correctly routes wildcard domains to DNS-01 and regular domains to HTTP-01. The nil checks provide clear error messages when providers aren't configured.
178-199: LGTM!The backoff calculation now correctly uses
(attempt-1)to produce the documented delays of 30s, 60s, 120s with a 5-minute cap. The retry logic is appropriate for handling transient ACME failures.
201-206: LGTM!The 90-day fallback matches Let's Encrypt's standard certificate validity period, and logging the warning ensures visibility when parsing fails.
208-222: LGTM!Encrypting the private key before storage is essential for security. Using
dom.WorkspaceIDas the keyring provides proper key isolation per workspace.
245-256: LGTM!The compensation correctly logs failures without propagating errors, ensuring the main workflow's error state isn't overwritten. Discarding the outer
restate.Runerror is intentional since this is best-effort cleanup.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
go/apps/ctrl/run.go (1)
526-561: Consider extracting "unkey_internal" workspace ID to a package-level constant.The hardcoded
"unkey_internal"string should be defined as a constant for clarity and to enable reuse across the codebase.The empty
TokenandAuthorizationfields in the bootstrap challenge record are intentional and correct—the renewal cron queries challenges by status (viaListExecutableChallenges, which filtersWHERE status = 'waiting'), not by token, so this record will be properly discovered during renewal.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
go/apps/ctrl/internal/caches/caches.go(1 hunks)go/apps/ctrl/run.go(7 hunks)go/apps/ctrl/services/acme/certificate_verification.go(2 hunks)go/apps/ctrl/services/acme/service.go(1 hunks)go/apps/ctrl/workflows/certificate/process_challenge_handler.go(6 hunks)
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-09-15T18:12:01.503Z
Learnt from: mcstepp
Repo: unkeyed/unkey PR: 3952
File: go/apps/ctrl/run.go:212-213
Timestamp: 2025-09-15T18:12:01.503Z
Learning: In Go connect-go generated handlers, mux.Handle can accept New*ServiceHandler functions directly even though they return (string, http.Handler) tuples. The pattern mux.Handle(ctrlv1connect.New*ServiceHandler(...)) is valid and compiles successfully in unkey codebase.
Applied to files:
go/apps/ctrl/services/acme/service.gogo/apps/ctrl/run.go
📚 Learning: 2025-09-25T18:49:28.948Z
Learnt from: mcstepp
Repo: unkeyed/unkey PR: 4010
File: go/apps/ctrl/run.go:224-233
Timestamp: 2025-09-25T18:49:28.948Z
Learning: In go/apps/ctrl/run.go, mcstepp prefers to keep ACME service handlers wrapped with auth interceptor during the demo phase for simplicity, even though typically ACME routes would be exempt from authentication. This is part of the temporary demo authentication system.
Applied to files:
go/apps/ctrl/run.go
📚 Learning: 2025-09-11T14:13:12.124Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/apps/ctrl/services/acme/providers/cloudflare_provider.go:56-69
Timestamp: 2025-09-11T14:13:12.124Z
Learning: For the default domain in the Unkey system (go/apps/ctrl/services/acme/providers/cloudflare_provider.go), only wildcard certificates are issued. When domain == p.defaultDomain, the code should always convert to the wildcard form ("*." + domain) and not attempt fallback lookups to the base domain form.
Applied to files:
go/apps/ctrl/run.gogo/apps/ctrl/services/acme/certificate_verification.go
📚 Learning: 2025-09-11T14:12:30.570Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/apps/ctrl/run.go:292-299
Timestamp: 2025-09-11T14:12:30.570Z
Learning: In go/apps/ctrl/config.go, the Validate() method already includes validation for Cloudflare API token using assert.NotEmpty() when both c.Acme.Enabled and c.Acme.Cloudflare.Enabled are true, with the error message "cloudflare API token is required when cloudflare is enabled".
Applied to files:
go/apps/ctrl/run.go
📚 Learning: 2025-09-11T14:12:30.570Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/apps/ctrl/run.go:292-299
Timestamp: 2025-09-11T14:12:30.570Z
Learning: In go/apps/ctrl/config.go, the Validate() method already includes validation for Cloudflare API token when ACME and Cloudflare are both enabled, checking that cfg.Acme.Cloudflare.ApiToken is not empty.
Applied to files:
go/apps/ctrl/run.go
📚 Learning: 2025-07-15T14:59:30.212Z
Learnt from: chronark
Repo: unkeyed/unkey PR: 3560
File: go/deploy/metald/internal/database/repository.go:0-0
Timestamp: 2025-07-15T14:59:30.212Z
Learning: go/deploy/metald cannot currently import helpers from go/pkg/db because it is not yet part of the main Go module; avoid suggesting such imports until the modules are unified.
Applied to files:
go/apps/ctrl/services/acme/certificate_verification.go
🧬 Code graph analysis (3)
go/apps/ctrl/internal/caches/caches.go (6)
go/pkg/cache/interface.go (1)
Cache(7-49)go/pkg/db/models_generated.go (2)
CustomDomain(727-734)AcmeChallenge(627-637)go/pkg/clock/interface.go (1)
Clock(12-18)go/apps/ctrl/services/acme/service.go (2)
New(25-33)Config(18-23)go/apps/ctrl/workflows/certificate/service.go (2)
New(58-69)Config(34-55)go/apps/ctrl/config.go (1)
Config(95-171)
go/apps/ctrl/services/acme/service.go (5)
go/pkg/cache/interface.go (1)
Cache(7-49)go/pkg/db/models_generated.go (2)
CustomDomain(727-734)AcmeChallenge(627-637)go/apps/ctrl/internal/caches/caches.go (2)
New(23-58)Config(18-21)go/apps/ctrl/workflows/certificate/service.go (3)
New(58-69)Config(34-55)Service(20-29)go/apps/ctrl/workflows/routing/service.go (3)
New(40-47)Config(28-37)Service(18-23)
go/apps/ctrl/workflows/certificate/process_challenge_handler.go (4)
go/pkg/db/models_generated.go (2)
Certificate(703-711)CustomDomain(727-734)go/apps/ctrl/workflows/certificate/service.go (2)
Service(20-29)New(58-69)go/apps/ctrl/services/acme/user.go (2)
GetOrCreateUser(49-112)UserConfig(41-47)go/pkg/db/handle_err_no_rows.go (1)
IsNotFound(10-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Go API Local / Test
- GitHub Check: Lint Go Code / Lint
- GitHub Check: Test API / API Test Local
- GitHub Check: Build / Build
- GitHub Check: Test Packages / Test
🔇 Additional comments (14)
go/apps/ctrl/services/acme/certificate_verification.go (2)
20-35: LGTM on domain cache lookup with SWR pattern.The stale-while-revalidate cache pattern is correctly implemented with proper error handling - checking for non-NotFound errors first, then handling cache misses via
cache.Nullordb.IsNotFound.
37-55: Challenge cache lookup is well-structured.The composite key format (
domainID|token) ensures uniqueness, and the short TTL (10s fresh / 30s stale from caches.go) is appropriate for challenge data that changes during the ACME flow.go/apps/ctrl/services/acme/service.go (1)
10-32: LGTM on cache dependency injection.The cache fields follow the established service configuration pattern in the codebase (e.g.,
routing.Service,certificate.Service). The caches are properly wired from the centralizedcaches.Cachescontainer.go/apps/ctrl/internal/caches/caches.go (1)
23-57: Well-designed cache configuration.Good choices:
- Clock injection for testability with sensible default
- Short TTL (10s/30s) for challenges that change during ACME flow
- Longer TTL (5m/10m) for relatively stable domain data
- Proper error propagation from cache initialization
go/apps/ctrl/run.go (4)
260-266: LGTM on cache initialization placement.Caches are correctly initialized before the ACME providers that depend on them.
268-313: Provider setup correctly wires domain cache.All three providers (HTTP, Cloudflare, Route53) now receive the
DomainCacheas required. The previous review comments about missing cache initialization have been addressed.
389-403: LGTM on renewal cron initialization.Good practices:
- Idempotency key prevents duplicate crons across restarts
Send()for fire-and-forget is appropriate for cron startup- Non-fatal warning on failure allows service to continue
57-60: The environment variable timing here is correct. The lego library readsLEGO_DISABLE_CNAME_SUPPORTfrom the environment at the time providers are created, not at import/init time. Sinceos.Setenvis called at line 60 and providers are created much later (lines 268-313), the variable is reliably available when needed. The existing code comment adequately documents the requirement.go/apps/ctrl/workflows/certificate/process_challenge_handler.go (6)
65-70: Compensation logic has a subtle issue with response handling.The defer checks
resp != nil && resp.Status == "failed", but at line 78-83 whenobtainCertificatefails, you return a response withStatus: "failed"anderr = nil. This correctly triggers the compensation.However, the current code has
(resp *hydrav1.ProcessChallengeResponse, err error)as named returns. If any later step fails with a non-nil error,respwill benil, and the checkresp != nil && resp.Status == "failed"won't match, buterr != nilwill. This is correct.One edge case: if
err != nilat line 78, you still returnnilfor error - verify this is intentional for Restate workflow semantics (returning a "failed" status vs propagating error).
176-192: Backoff calculation now matches documentation.The formula
30<<(attempt-1)correctly produces 30s, 60s, 120s delays for attempts 1, 2, 3, matching the comment. The past review issue has been addressed.
218-248: Certificate ID handling for renewals is now correct.The code properly handles the renewal case by:
- Checking for existing certificate by hostname (line 224)
- Reusing the existing ID for renewals (lines 228-231)
- Only using the new ID for first-time issuance
This addresses the past review concern about ID mismatches.
126-160: LGTM on ACME client setup with provider selection.Clean separation of concerns:
- Single global ACME user for simplified certificate management
- DNS-01 for wildcard domains (required by ACME spec)
- HTTP-01 for regular domains (faster, no DNS propagation)
- Clear error messages when required providers are missing
250-261: Compensation handler correctly uses best-effort semantics.Ignoring the error from
restate.Runis appropriate for compensation - the operation is best-effort and shouldn't fail the main workflow if it can't mark the challenge as failed. The error is logged for debugging.
162-216: LGTM on certificate obtain flow.Good practices:
- Comment explains why ACME client creation is inside this function (not serializable)
- Sensible 90-day default expiry matches Let's Encrypt standard
- Private key encryption via Vault before storage
- Retry with exponential backoff for transient ACME failures
13acff3 to
7b70331
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
go/apps/ctrl/services/acme/user.go (1)
103-108: Consider the impact of silently failing to persist the registration URI.If
UpdateAcmeUserRegistrationURIfails, subsequent calls will repeatedly trigger ACME registration (lines 90-99), potentially hitting rate limits. While a warning log is reasonable for transient failures, consider whether repeated failures warrant returning an error to avoid unnecessary ACME calls.go/apps/ctrl/config.go (1)
227-236: LGTM on Route53 validation, but consider validating EmailDomain.The Route53 validation follows the same pattern as Cloudflare. However,
EmailDomain(line 58) is used in ACME registration emails but isn't validated when ACME is enabled. Consider adding:// Validate Cloudflare configuration if enabled if c.Acme.Enabled && c.Acme.Cloudflare.Enabled { + if err := assert.NotEmpty(c.Acme.EmailDomain, "acme email domain is required when acme is enabled"); err != nil { + return err + } if err := assert.NotEmpty(c.Acme.Cloudflare.ApiToken, "cloudflare API token is required when cloudflare is enabled"); err != nil { return err } }Or validate it when any DNS provider is enabled.
go/apps/ctrl/workflows/certificate/service.go (1)
58-68: Consider validating the new configuration fields.The
Newconstructor doesn't validate thatEmailDomain,DefaultDomain,DNSProvider, orHTTPProviderare properly set. If these are required for ACME operations, missing values could cause runtime failures later.Consider adding validation similar to other services in the codebase:
func New(cfg Config) (*Service, error) { if cfg.EmailDomain == "" { return nil, errors.New("email domain is required") } if cfg.DefaultDomain == "" { return nil, errors.New("default domain is required") } // ... rest of initialization }Alternatively, if these fields are optional depending on which challenge types are enabled, document that clearly.
go/apps/ctrl/services/acme/providers/provider.go (1)
71-92: Consider documenting the cache key format.The composite cache key
domain + "|" + wildcardDomainis a good approach for caching both exact and wildcard lookups together. However, if the domain itself could contain|, this could cause cache key collisions.Consider using a separator that's invalid in domain names, or add a brief comment explaining the key format:
func (p *Provider) resolveDomain(ctx context.Context, domain string) (db.CustomDomain, error) { wildcardDomain := "*." + domain + // Cache key combines exact and wildcard domains for single lookup caching cacheKey := domain + "|" + wildcardDomain
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
go/gen/proto/hydra/v1/certificate.pb.gois excluded by!**/*.pb.go,!**/gen/**go/gen/proto/hydra/v1/certificate_restate.pb.gois excluded by!**/*.pb.go,!**/gen/**go/go.sumis excluded by!**/*.sum
📒 Files selected for processing (26)
go/apps/api/routes/v2_identities_list_identities/handler.go(1 hunks)go/apps/ctrl/config.go(2 hunks)go/apps/ctrl/internal/caches/caches.go(1 hunks)go/apps/ctrl/run.go(7 hunks)go/apps/ctrl/services/acme/certificate_verification.go(2 hunks)go/apps/ctrl/services/acme/pem.go(1 hunks)go/apps/ctrl/services/acme/providers/cloudflare_provider.go(1 hunks)go/apps/ctrl/services/acme/providers/http_provider.go(1 hunks)go/apps/ctrl/services/acme/providers/provider.go(1 hunks)go/apps/ctrl/services/acme/providers/route53_provider.go(1 hunks)go/apps/ctrl/services/acme/service.go(1 hunks)go/apps/ctrl/services/acme/user.go(5 hunks)go/apps/ctrl/workflows/certificate/process_challenge_handler.go(6 hunks)go/apps/ctrl/workflows/certificate/renew_handler.go(1 hunks)go/apps/ctrl/workflows/certificate/service.go(4 hunks)go/cmd/ctrl/main.go(2 hunks)go/go.mod(1 hunks)go/pkg/db/bulk_custom_domain_upsert.sql_generated.go(1 hunks)go/pkg/db/custom_domain_find_by_domain_or_wildcard.sql_generated.go(1 hunks)go/pkg/db/custom_domain_upsert.sql_generated.go(1 hunks)go/pkg/db/handle_err_no_rows.go(1 hunks)go/pkg/db/querier_bulk_generated.go(1 hunks)go/pkg/db/querier_generated.go(2 hunks)go/pkg/db/queries/custom_domain_find_by_domain_or_wildcard.sql(1 hunks)go/pkg/db/queries/custom_domain_upsert.sql(1 hunks)go/proto/hydra/v1/certificate.proto(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
- go/pkg/db/querier_generated.go
- go/apps/api/routes/v2_identities_list_identities/handler.go
- go/go.mod
- go/pkg/db/bulk_custom_domain_upsert.sql_generated.go
- go/pkg/db/queries/custom_domain_find_by_domain_or_wildcard.sql
- go/apps/ctrl/workflows/certificate/renew_handler.go
- go/proto/hydra/v1/certificate.proto
- go/cmd/ctrl/main.go
- go/pkg/db/custom_domain_find_by_domain_or_wildcard.sql_generated.go
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: mcstepp
Repo: unkeyed/unkey PR: 4010
File: go/apps/ctrl/run.go:224-233
Timestamp: 2025-09-25T18:49:28.948Z
Learning: In go/apps/ctrl/run.go, mcstepp prefers to keep ACME service handlers wrapped with auth interceptor during the demo phase for simplicity, even though typically ACME routes would be exempt from authentication. This is part of the temporary demo authentication system.
📚 Learning: 2025-09-11T14:24:40.988Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/apps/ctrl/services/deployment/deploy_workflow.go:326-334
Timestamp: 2025-09-11T14:24:40.988Z
Learning: The InsertDomains method in the bulk queries uses ON DUPLICATE KEY UPDATE, making it an upsert operation that is idempotent and safe for retries, despite the "Insert" naming convention.
Applied to files:
go/pkg/db/queries/custom_domain_upsert.sqlgo/pkg/db/querier_bulk_generated.gogo/pkg/db/custom_domain_upsert.sql_generated.go
📚 Learning: 2025-07-17T14:24:20.403Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3631
File: go/pkg/db/bulk_keyring_insert.sql.go:23-25
Timestamp: 2025-07-17T14:24:20.403Z
Learning: In go/pkg/db/bulk_keyring_insert.sql.go and similar bulk insert generated files, hardcoded zero values for fields like size_approx and size_last_updated_at are intentional and reflect the original SQL query structure, not missing parameters.
Applied to files:
go/pkg/db/querier_bulk_generated.go
📚 Learning: 2025-09-11T14:12:30.570Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/apps/ctrl/run.go:292-299
Timestamp: 2025-09-11T14:12:30.570Z
Learning: In go/apps/ctrl/config.go, the Validate() method already includes validation for Cloudflare API token when ACME and Cloudflare are both enabled, checking that cfg.Acme.Cloudflare.ApiToken is not empty.
Applied to files:
go/apps/ctrl/config.gogo/apps/ctrl/services/acme/providers/cloudflare_provider.gogo/apps/ctrl/run.go
📚 Learning: 2025-09-11T14:12:30.570Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/apps/ctrl/run.go:292-299
Timestamp: 2025-09-11T14:12:30.570Z
Learning: In go/apps/ctrl/config.go, the Validate() method already includes validation for Cloudflare API token using assert.NotEmpty() when both c.Acme.Enabled and c.Acme.Cloudflare.Enabled are true, with the error message "cloudflare API token is required when cloudflare is enabled".
Applied to files:
go/apps/ctrl/config.gogo/apps/ctrl/services/acme/providers/cloudflare_provider.gogo/apps/ctrl/run.go
📚 Learning: 2025-09-11T14:13:12.124Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/apps/ctrl/services/acme/providers/cloudflare_provider.go:56-69
Timestamp: 2025-09-11T14:13:12.124Z
Learning: For the default domain in the Unkey system (go/apps/ctrl/services/acme/providers/cloudflare_provider.go), only wildcard certificates are issued. When domain == p.defaultDomain, the code should always convert to the wildcard form ("*." + domain) and not attempt fallback lookups to the base domain form.
Applied to files:
go/apps/ctrl/services/acme/certificate_verification.gogo/apps/ctrl/services/acme/providers/cloudflare_provider.gogo/apps/ctrl/run.go
📚 Learning: 2025-07-15T14:59:30.212Z
Learnt from: chronark
Repo: unkeyed/unkey PR: 3560
File: go/deploy/metald/internal/database/repository.go:0-0
Timestamp: 2025-07-15T14:59:30.212Z
Learning: go/deploy/metald cannot currently import helpers from go/pkg/db because it is not yet part of the main Go module; avoid suggesting such imports until the modules are unified.
Applied to files:
go/apps/ctrl/services/acme/certificate_verification.go
📚 Learning: 2025-09-15T18:12:01.503Z
Learnt from: mcstepp
Repo: unkeyed/unkey PR: 3952
File: go/apps/ctrl/run.go:212-213
Timestamp: 2025-09-15T18:12:01.503Z
Learning: In Go connect-go generated handlers, mux.Handle can accept New*ServiceHandler functions directly even though they return (string, http.Handler) tuples. The pattern mux.Handle(ctrlv1connect.New*ServiceHandler(...)) is valid and compiles successfully in unkey codebase.
Applied to files:
go/apps/ctrl/services/acme/service.gogo/apps/ctrl/run.go
📚 Learning: 2025-09-25T18:49:28.948Z
Learnt from: mcstepp
Repo: unkeyed/unkey PR: 4010
File: go/apps/ctrl/run.go:224-233
Timestamp: 2025-09-25T18:49:28.948Z
Learning: In go/apps/ctrl/run.go, mcstepp prefers to keep ACME service handlers wrapped with auth interceptor during the demo phase for simplicity, even though typically ACME routes would be exempt from authentication. This is part of the temporary demo authentication system.
Applied to files:
go/apps/ctrl/run.go
📚 Learning: 2025-10-30T15:10:52.743Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 4190
File: go/internal/services/keys/verifier.go:51-53
Timestamp: 2025-10-30T15:10:52.743Z
Learning: PR #4190 for unkeyed/unkey is focused solely on database schema and query changes for identity-based credits. It adds IdentityCredits and KeyCredits fields to structs and queries, but does not implement the priority enforcement logic in the usagelimiter. The logic implementation is intentionally deferred to a later PR in the stack.
Applied to files:
go/apps/ctrl/workflows/certificate/process_challenge_handler.go
🧬 Code graph analysis (12)
go/apps/ctrl/services/acme/providers/provider.go (5)
go/pkg/cache/interface.go (2)
Cache(7-49)Null(59-59)go/pkg/db/models_generated.go (2)
CustomDomain(727-734)AcmeChallengesStatusPending(62-62)go/pkg/db/queries.go (1)
Query(29-29)go/pkg/db/custom_domain_find_by_domain_or_wildcard.sql_generated.go (1)
FindCustomDomainByDomainOrWildcardParams(20-24)go/internal/services/caches/op.go (1)
DefaultFindFirstOp(9-22)
go/pkg/db/querier_bulk_generated.go (2)
go/pkg/db/interface.go (1)
DBTX(29-34)go/pkg/db/custom_domain_upsert.sql_generated.go (1)
UpsertCustomDomainParams(22-29)
go/pkg/db/custom_domain_upsert.sql_generated.go (3)
go/pkg/db/models_generated.go (1)
CustomDomainsChallengeType(144-144)go/pkg/db/queries.go (1)
Queries(3-3)go/pkg/db/interface.go (1)
DBTX(29-34)
go/apps/ctrl/config.go (2)
go/apps/ctrl/services/acme/providers/cloudflare_provider.go (1)
CloudflareConfig(13-18)go/apps/ctrl/services/acme/providers/route53_provider.go (1)
Route53Config(13-23)
go/apps/ctrl/services/acme/certificate_verification.go (8)
go/pkg/db/models_generated.go (2)
CustomDomain(727-734)AcmeChallenge(627-637)go/pkg/db/queries.go (1)
Query(29-29)go/internal/services/caches/op.go (1)
DefaultFindFirstOp(9-22)go/pkg/db/handle_err_no_rows.go (1)
IsNotFound(10-12)go/pkg/cache/interface.go (1)
Null(59-59)go/apps/ctrl/internal/caches/caches.go (1)
New(23-58)go/apps/ctrl/services/acme/service.go (1)
New(25-33)go/pkg/db/acme_challenge_find_by_token.sql_generated.go (1)
FindAcmeChallengeByTokenParams(16-20)
go/apps/ctrl/services/acme/providers/cloudflare_provider.go (4)
go/pkg/cache/interface.go (1)
Cache(7-49)go/pkg/db/models_generated.go (1)
CustomDomain(727-734)go/apps/ctrl/config.go (1)
CloudflareConfig(27-33)go/apps/ctrl/services/acme/providers/provider.go (3)
Provider(34-39)NewProvider(49-66)ProviderConfig(41-46)
go/apps/ctrl/internal/caches/caches.go (4)
go/pkg/cache/interface.go (1)
Cache(7-49)go/pkg/db/models_generated.go (2)
CustomDomain(727-734)AcmeChallenge(627-637)go/pkg/clock/interface.go (1)
Clock(12-18)go/apps/ctrl/config.go (1)
Config(95-171)
go/apps/ctrl/services/acme/user.go (4)
go/pkg/db/models_generated.go (1)
AcmeUser(639-646)go/pkg/db/queries.go (1)
Query(29-29)go/pkg/db/acme_user_update_registration_uri.sql_generated.go (1)
UpdateAcmeUserRegistrationURIParams(17-20)go/pkg/db/types/null_string.go (1)
NullString(10-10)
go/apps/ctrl/services/acme/service.go (3)
go/pkg/cache/interface.go (1)
Cache(7-49)go/pkg/db/models_generated.go (2)
CustomDomain(727-734)AcmeChallenge(627-637)go/apps/ctrl/internal/caches/caches.go (2)
New(23-58)Config(18-21)
go/apps/ctrl/run.go (7)
go/apps/ctrl/internal/caches/caches.go (2)
New(23-58)Config(18-21)go/apps/ctrl/services/acme/providers/provider.go (2)
Provider(34-39)DNSProvider(26-30)go/apps/ctrl/services/acme/providers/http_provider.go (3)
NewHTTPProvider(54-64)HTTPConfig(47-51)HTTPProvider(45-45)go/apps/ctrl/services/acme/providers/cloudflare_provider.go (2)
NewCloudflareProvider(21-38)CloudflareConfig(13-18)go/apps/ctrl/services/acme/providers/route53_provider.go (2)
NewRoute53Provider(33-55)Route53Config(13-23)go/pkg/db/custom_domain_upsert.sql_generated.go (1)
UpsertCustomDomainParams(22-29)go/pkg/db/acme_challenge_insert.sql_generated.go (1)
InsertAcmeChallengeParams(37-47)
go/apps/ctrl/workflows/certificate/service.go (2)
go/apps/ctrl/services/acme/providers/provider.go (1)
DNSProvider(26-30)go/apps/ctrl/services/acme/providers/http_provider.go (1)
HTTPProvider(45-45)
go/apps/ctrl/workflows/certificate/process_challenge_handler.go (8)
go/pkg/db/models_generated.go (2)
Certificate(703-711)CustomDomain(727-734)go/apps/ctrl/workflows/certificate/service.go (2)
Service(20-29)New(58-69)go/apps/ctrl/services/acme/user.go (2)
GetOrCreateUser(49-112)UserConfig(41-47)go/pkg/retry/retry.go (2)
Attempts(83-88)Backoff(93-98)go/apps/ctrl/services/acme/pem.go (1)
GetCertificateExpiry(11-23)go/gen/proto/vault/v1/service.pb.go (3)
EncryptRequest(104-110)EncryptRequest(123-123)EncryptRequest(138-140)go/pkg/uid/uid.go (1)
CertificatePrefix(53-53)go/pkg/db/certificate_insert.sql_generated.go (1)
InsertCertificateParams(23-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test API / API Test Local
- GitHub Check: Test Go API Local / Test
- GitHub Check: Lint Go Code / Lint
- GitHub Check: Test Packages / Test
- GitHub Check: Build / Build
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (34)
go/pkg/db/handle_err_no_rows.go (1)
8-12: LGTM!Clean utility function using
errors.Isfor proper error chain handling. This centralizes not-found error detection correctly.go/apps/ctrl/services/acme/user.go (3)
22-31: LGTM!Good refactoring to make the email domain configurable via struct field rather than hardcoded value.
41-47: LGTM!Clear struct extension with documented purpose for the email domain field.
114-173: LGTM!The
registerfunction correctly handles the full user creation flow. The stricter error handling on line 169 (vs. warning-only on line 107) makes sense: for new users, failing to persist the registration URI would leave an orphaned user record.go/pkg/db/queries/custom_domain_upsert.sql (1)
1-7: LGTM!Clean upsert query using MySQL's
ON DUPLICATE KEY UPDATE. The separate placeholder forupdated_atin the UPDATE clause is correct since it's not included in the INSERT values. Based on learnings, this pattern is idempotent and safe for retries.go/pkg/db/querier_bulk_generated.go (1)
16-16: LGTM!Generated interface method follows the established pattern and is correctly placed.
go/apps/ctrl/services/acme/pem.go (1)
10-23: LGTM!Clean implementation for extracting certificate expiry. Correctly handles the common case where only the first (leaf) certificate's expiry is needed from a potential chain.
go/apps/ctrl/services/acme/certificate_verification.go (2)
30-35: LGTM!The error handling correctly distinguishes between "not found" (returns 404) and other errors (returns 500). The dual check for
cache.Nullanddb.IsNotFound(err)covers both cached null values and fresh not-found results.
37-55: LGTM!The composite cache key format and challenge lookup follow the same pattern as the domain lookup. The key construction is straightforward and the error handling is consistent.
go/apps/ctrl/services/acme/service.go (1)
10-32: LGTM!Clean extension of the service with domain and challenge caches via dependency injection. The pattern is consistent with other service dependencies (db, logger).
go/apps/ctrl/config.go (2)
35-52: LGTM!Well-documented Route53 configuration struct following the same pattern as CloudflareConfig. The HostedZoneID comment helpfully explains when the bypass is needed.
53-65: LGTM!AcmeConfig cleanly extended to support both Cloudflare and Route53 DNS providers, plus the configurable email domain.
go/apps/ctrl/internal/caches/caches.go (1)
1-58: LGTM! Well-structured cache container with appropriate TTL settings.The short TTL for challenges (10s/30s) is appropriate for the time-sensitive ACME flow, and the longer TTL for domains (5m/10m) balances cache efficiency with data freshness. The clock injection pattern supports testability.
go/pkg/db/custom_domain_upsert.sql_generated.go (1)
1-49: Generated code looks correct.This is sqlc-generated code implementing an idempotent upsert operation. The
ON DUPLICATE KEY UPDATEpattern is consistent with other upsert operations in this codebase. Based on learnings, this naming convention is established practice.go/apps/ctrl/services/acme/providers/cloudflare_provider.go (1)
13-37: Clean refactor to use the generic Provider wrapper.The simplified implementation correctly delegates to the generic
Providerwhile maintaining consistent configuration (TTL, propagation timeout) with the Route53 provider. TheDomainCacherequirement will be validated byNewProvider.go/apps/ctrl/services/acme/providers/provider.go (2)
94-124: Note:context.Background()is a constraint of the lego interface.The lego
challenge.Providerinterface doesn't accept a context parameter, so usingcontext.Background()here is unavoidable. This is worth noting for future maintainers, as it means these operations cannot be cancelled externally.The DB update after DNS presentation follows an "at-least-once" pattern - if the DB update fails after DNS record creation, the record exists but isn't tracked. This is acceptable since the record will be cleaned up by TTL or subsequent attempts.
126-136: CleanUp intentionally swallows errors - verify this is the desired behavior.The
CleanUpmethod logs DNS cleanup failures as warnings but returnsnil, which means cleanup errors won't propagate. This is a reasonable choice to avoid blocking the certificate flow, but could leave orphaned DNS records.Is this the intended behavior? Orphaned TXT records will eventually expire based on TTL (10 minutes), but if cleanup consistently fails, you may want alerting on these warnings.
go/apps/ctrl/services/acme/providers/route53_provider.go (1)
25-33: The environment variableLEGO_DISABLE_CNAME_SUPPORTis already properly set to"true"at application startup ingo/apps/ctrl/run.go:60, confirming the documented requirement is fulfilled and no additional action is needed.go/apps/ctrl/run.go (6)
57-60: LGTM! Good guard against CNAME resolution issues.Setting this environment variable before provider creation prevents lego from following wildcard CNAMEs to load balancers, which would cause Route53 zone lookup failures. The comment clearly explains the purpose.
260-266: LGTM! Cache initialization addresses prior review feedback.The shared caches are properly initialized before provider setup, resolving the previous issue about missing DomainCache parameters. Creating caches unconditionally aligns with the comment that they're needed for verification endpoints regardless of ACME configuration.
268-313: LGTM! Provider setup properly addresses past issues.Both DNS providers (Cloudflare and Route53) now receive the DomainCache parameter, resolving the critical issues from previous reviews. The separation between HTTP-01 (regular domains) and DNS-01 (wildcard domains) is clear and well-commented.
315-325: LGTM! Appropriate timeout for DNS-01 challenges.The 15-minute inactivity timeout accounts for DNS propagation delays (5-10 minutes) during DNS-01 challenges. All required ACME configuration parameters are properly wired.
383-403: LGTM! Good use of idempotency and appropriate error handling.The idempotency key prevents duplicate cron jobs on restart, which is essential for reliable operation. Bootstrap and cron startup appropriately run only when DNS-01 is available (required for wildcard certificates). Using warnings instead of hard failures for these operations is sensible since they're initialization tasks.
441-446: LGTM! Cache wiring is complete.Both DomainCache and ChallengeCache are properly passed to the ACME service, enabling efficient domain and challenge lookups.
Based on learnings, the auth interceptor is intentionally kept during the demo phase.
go/apps/ctrl/services/acme/providers/http_provider.go (2)
15-41: LGTM! Refactoring aligns with generic provider architecture.The httpDNS type correctly implements the DNSProvider interface. The Present and CleanUp methods serve as hooks for the generic Provider wrapper (from provider.go), which handles actual database persistence. The comments clearly explain this delegation pattern.
The 90-second timeout and 3-second interval are appropriate for HTTP-01 challenges, which resolve faster than DNS-01.
43-63: LGTM! Clean provider construction pattern.The type alias
HTTPProvider = Providermakes the provider's purpose explicit while leveraging the generic Provider wrapper. The DomainCache is properly passed through to enable efficient domain lookups during challenge processing. This pattern is consistent with other providers in the PR (Cloudflare, Route53).go/apps/ctrl/workflows/certificate/process_challenge_handler.go (8)
20-26: LGTM! CertificateID field enables proper certificate tracking.Adding the CertificateID field to the struct enables proper certificate identification and management, particularly important for renewal scenarios handled in persistCertificate.
36-39: LGTM! Named returns enable saga compensation pattern.The named return values allow the deferred compensation function (lines 65-70) to inspect both the error and response status, which is essential for properly marking failed challenges.
65-70: LGTM! Well-implemented saga compensation pattern.The deferred compensation ensures that challenges are properly marked as failed if any error occurs or if the workflow explicitly returns a "failed" status. This prevents challenges from remaining in the "pending" state indefinitely and is a solid implementation of the saga pattern for distributed workflows.
118-124: LGTM! Single global ACME user simplifies certificate management.Using a fixed global ACME user ID for all certificates simplifies account management and aligns with common ACME deployment patterns. The isWildcard helper is correct and clear.
126-160: LGTM! Clear DNS-01 vs HTTP-01 provider selection.The function correctly routes wildcard domains to DNS-01 challenges and regular domains to HTTP-01 challenges. Provider availability is checked before use, with clear error messages. The logging helps with debugging by indicating which challenge type is being used.
162-216: LGTM! Robust certificate obtainment with proper error handling.The backoff calculation
30<<(attempt-1)correctly produces the documented sequence (30s, 60s, 120s), addressing previous review feedback. The retry logic with exponential backoff is appropriate for ACME operations which can be transiently unavailable.The 90-day fallback for certificate expiry is reasonable (matches Let's Encrypt standard), and vault-based private key encryption ensures secure storage. The newly generated CertificateID at line 211 is correctly handled in persistCertificate which preserves existing IDs during renewals.
218-248: LGTM! Properly handles certificate ID during renewals.This function correctly addresses the previous critical issue about certificate ID mismatches on renewals. By checking for an existing certificate (lines 224-231) and preserving its ID, the function ensures that renewals update the existing row rather than creating orphaned IDs. The returned ID matches what's actually stored in the database.
250-261: LGTM! Appropriate error handling for compensation logic.The function correctly marks challenges as failed during compensation. Wrapping in
restate.Runensures durability, and logging errors without propagating them is appropriate for cleanup operations—you don't want compensation failures to cascade.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
go/apps/ctrl/workflows/certificate/renew_handler.go (1)
87-95: Use injected clock for testability.Line 87 uses
time.Now()directly, making this code difficult to test deterministically. TheServicestruct should have aclock.Clockfield, allowing tests to control time progression.This was flagged in a prior review. Consider adding clock injection for testability.
🧹 Nitpick comments (5)
go/apps/ctrl/workflows/certificate/process_challenge_handler.go (3)
139-157: Consider logging the challenge type decision before provider setup.The logic correctly selects DNS-01 for wildcards and HTTP-01 for regular domains. However, the info logs appear after the provider setup attempt. If setup fails, the user won't know which path was taken.
// Wildcard certificates require DNS-01 challenge // Regular domains use HTTP-01 (faster, no DNS propagation wait) if isWildcard(domain) { + s.logger.Info("wildcard domain detected, using DNS-01 challenge", "domain", domain) if s.dnsProvider == nil { return nil, fmt.Errorf("DNS provider required for wildcard certificate: %s", domain) } if err := client.Challenge.SetDNS01Provider(s.dnsProvider); err != nil { return nil, fmt.Errorf("failed to set DNS-01 provider: %w", err) } - s.logger.Info("using DNS-01 challenge for wildcard domain", "domain", domain) } else { + s.logger.Info("regular domain, using HTTP-01 challenge", "domain", domain) if s.httpProvider == nil { return nil, fmt.Errorf("HTTP provider required for certificate: %s", domain) } if err := client.Challenge.SetHTTP01Provider(s.httpProvider); err != nil { return nil, fmt.Errorf("failed to set HTTP-01 provider: %w", err) } - s.logger.Info("using HTTP-01 challenge for domain", "domain", domain) }
162-162: Remove unused parameter.The
_ stringparameter (originallyworkspaceID) is not used in the function body. If it's intentionally unused, consider removing it from the signature to avoid confusion.-func (s *Service) obtainCertificate(ctx context.Context, _ string, dom db.CustomDomain, domain string) (EncryptedCertificate, error) { +func (s *Service) obtainCertificate(ctx context.Context, dom db.CustomDomain, domain string) (EncryptedCertificate, error) {Note: This would require updating the caller at line 76:
- return s.obtainCertificate(stepCtx, req.GetWorkspaceId(), dom, req.GetDomain()) + return s.obtainCertificate(stepCtx, dom, req.GetDomain())
251-262: Errors from markChallengeFailed are silently discarded.The return values from
restate.Runare discarded with_, _ =. While logging occurs inside, if the entirerestate.Runcall fails (not just the inner DB update), that error is lost. Consider at minimum logging at the call site.func (s *Service) markChallengeFailed(ctx restate.ObjectContext, domainID string) { - _, _ = restate.Run(ctx, func(stepCtx restate.RunContext) (restate.Void, error) { + _, runErr := restate.Run(ctx, func(stepCtx restate.RunContext) (restate.Void, error) { if updateErr := db.Query.UpdateAcmeChallengeStatus(stepCtx, s.db.RW(), db.UpdateAcmeChallengeStatusParams{ DomainID: domainID, Status: db.AcmeChallengesStatusFailed, UpdatedAt: sql.NullInt64{Valid: true, Int64: time.Now().UnixMilli()}, }); updateErr != nil { s.logger.Error("failed to update challenge status", "error", updateErr, "domain_id", domainID) } return restate.Void{}, nil }, restate.WithName("mark failed")) + if runErr != nil { + s.logger.Error("failed to run mark-failed step", "error", runErr, "domain_id", domainID) + } }go/apps/ctrl/workflows/certificate/renew_handler.go (1)
27-28:DaysBeforeExpiryfield is unused.The
DaysBeforeExpiryfield from the request (line 27) is never read in this handler. The queryListExecutableChallengesseems to have its own logic for determining which certificates need processing. Either use this field or document that it's reserved for future use.Also applies to: 91-91
go/apps/ctrl/run.go (1)
534-535: Document or make the internal workspace ID configurable.The hardcoded
"unkey_internal"workspace ID is used for platform-managed resources. Consider extracting this to a constant or making it configurable to avoid magic strings.+const ( + // internalWorkspaceID is used for platform-managed resources like wildcard certificates + internalWorkspaceID = "unkey_internal" +) + func bootstrapWildcardDomain(ctx context.Context, database db.Database, logger logging.Logger, defaultDomain string) { ... - workspaceID := "unkey_internal" + workspaceID := internalWorkspaceID
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
go/apps/ctrl/run.go(7 hunks)go/apps/ctrl/services/acme/user.go(5 hunks)go/apps/ctrl/workflows/certificate/process_challenge_handler.go(6 hunks)go/apps/ctrl/workflows/certificate/renew_handler.go(1 hunks)
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2025-10-30T15:10:52.743Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 4190
File: go/internal/services/keys/verifier.go:51-53
Timestamp: 2025-10-30T15:10:52.743Z
Learning: PR #4190 for unkeyed/unkey is focused solely on database schema and query changes for identity-based credits. It adds IdentityCredits and KeyCredits fields to structs and queries, but does not implement the priority enforcement logic in the usagelimiter. The logic implementation is intentionally deferred to a later PR in the stack.
Applied to files:
go/apps/ctrl/workflows/certificate/process_challenge_handler.go
📚 Learning: 2025-09-11T14:13:12.124Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/apps/ctrl/services/acme/providers/cloudflare_provider.go:56-69
Timestamp: 2025-09-11T14:13:12.124Z
Learning: For the default domain in the Unkey system (go/apps/ctrl/services/acme/providers/cloudflare_provider.go), only wildcard certificates are issued. When domain == p.defaultDomain, the code should always convert to the wildcard form ("*." + domain) and not attempt fallback lookups to the base domain form.
Applied to files:
go/apps/ctrl/workflows/certificate/process_challenge_handler.gogo/apps/ctrl/run.go
📚 Learning: 2025-09-25T18:49:28.948Z
Learnt from: mcstepp
Repo: unkeyed/unkey PR: 4010
File: go/apps/ctrl/run.go:224-233
Timestamp: 2025-09-25T18:49:28.948Z
Learning: In go/apps/ctrl/run.go, mcstepp prefers to keep ACME service handlers wrapped with auth interceptor during the demo phase for simplicity, even though typically ACME routes would be exempt from authentication. This is part of the temporary demo authentication system.
Applied to files:
go/apps/ctrl/run.go
📚 Learning: 2025-09-11T14:12:30.570Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/apps/ctrl/run.go:292-299
Timestamp: 2025-09-11T14:12:30.570Z
Learning: In go/apps/ctrl/config.go, the Validate() method already includes validation for Cloudflare API token using assert.NotEmpty() when both c.Acme.Enabled and c.Acme.Cloudflare.Enabled are true, with the error message "cloudflare API token is required when cloudflare is enabled".
Applied to files:
go/apps/ctrl/run.go
📚 Learning: 2025-09-11T14:12:30.570Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/apps/ctrl/run.go:292-299
Timestamp: 2025-09-11T14:12:30.570Z
Learning: In go/apps/ctrl/config.go, the Validate() method already includes validation for Cloudflare API token when ACME and Cloudflare are both enabled, checking that cfg.Acme.Cloudflare.ApiToken is not empty.
Applied to files:
go/apps/ctrl/run.go
📚 Learning: 2025-07-15T14:59:30.212Z
Learnt from: chronark
Repo: unkeyed/unkey PR: 3560
File: go/deploy/metald/internal/database/repository.go:0-0
Timestamp: 2025-07-15T14:59:30.212Z
Learning: go/deploy/metald cannot currently import helpers from go/pkg/db because it is not yet part of the main Go module; avoid suggesting such imports until the modules are unified.
Applied to files:
go/apps/ctrl/run.go
📚 Learning: 2025-09-15T18:12:01.503Z
Learnt from: mcstepp
Repo: unkeyed/unkey PR: 3952
File: go/apps/ctrl/run.go:212-213
Timestamp: 2025-09-15T18:12:01.503Z
Learning: In Go connect-go generated handlers, mux.Handle can accept New*ServiceHandler functions directly even though they return (string, http.Handler) tuples. The pattern mux.Handle(ctrlv1connect.New*ServiceHandler(...)) is valid and compiles successfully in unkey codebase.
Applied to files:
go/apps/ctrl/run.go
🧬 Code graph analysis (3)
go/apps/ctrl/workflows/certificate/process_challenge_handler.go (5)
go/pkg/db/models_generated.go (1)
Certificate(703-711)go/apps/ctrl/workflows/certificate/service.go (2)
Service(20-29)New(58-69)go/gen/proto/hydra/v1/certificate.pb.go (6)
ProcessChallengeRequest(25-31)ProcessChallengeRequest(44-44)ProcessChallengeRequest(59-61)ProcessChallengeResponse(77-83)ProcessChallengeResponse(96-96)ProcessChallengeResponse(111-113)go/apps/ctrl/services/acme/user.go (2)
GetOrCreateUser(49-114)UserConfig(41-47)go/pkg/db/certificate_insert.sql_generated.go (1)
InsertCertificateParams(23-31)
go/apps/ctrl/run.go (9)
go/apps/ctrl/internal/caches/caches.go (2)
New(23-58)Config(18-21)go/apps/ctrl/config.go (3)
Config(95-171)CloudflareConfig(27-33)Route53Config(35-51)go/pkg/clock/interface.go (1)
Clock(12-18)go/apps/ctrl/services/acme/providers/http_provider.go (3)
NewHTTPProvider(54-64)HTTPConfig(47-51)HTTPProvider(45-45)go/apps/ctrl/services/acme/providers/cloudflare_provider.go (2)
NewCloudflareProvider(21-38)CloudflareConfig(13-18)go/apps/ctrl/services/acme/providers/route53_provider.go (2)
NewRoute53Provider(33-55)Route53Config(13-23)go/gen/proto/hydra/v1/certificate.pb.go (3)
RenewExpiringCertificatesRequest(129-135)RenewExpiringCertificatesRequest(148-148)RenewExpiringCertificatesRequest(163-165)go/pkg/db/custom_domain_upsert.sql_generated.go (1)
UpsertCustomDomainParams(22-29)go/pkg/db/models_generated.go (2)
CustomDomainsChallengeTypeDNS01(148-148)AcmeChallengesStatusWaiting(61-61)
go/apps/ctrl/services/acme/user.go (4)
go/pkg/db/models_generated.go (1)
AcmeUser(639-646)go/pkg/db/queries.go (1)
Query(29-29)go/pkg/db/acme_user_update_registration_uri.sql_generated.go (1)
UpdateAcmeUserRegistrationURIParams(17-20)go/pkg/db/types/null_string.go (1)
NullString(10-10)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Go API Local / Test
- GitHub Check: Lint Go Code / Lint
- GitHub Check: Build / Build
- GitHub Check: Test API / API Test Local
- GitHub Check: Test Packages / Test
🔇 Additional comments (18)
go/apps/ctrl/workflows/certificate/process_challenge_handler.go (5)
21-26: LGTM on the EncryptedCertificate struct.The struct cleanly encapsulates all certificate data needed for the workflow, including the new
CertificateIDandExpiresAtfields for tracking certificate lifecycle.
118-124: LGTM on the global ACME user approach and wildcard detection.Using a single global ACME account simplifies key management. The
isWildcardhelper correctly identifies wildcard domains by checking for the*.prefix.
177-184: Backoff formula now matches the documented delays.The formula
30 << (attempt-1)correctly produces delays of 30s, 60s, 120s (capped at 5min) for attempts 1, 2, 3, matching the comment. This addresses the prior review feedback.
219-249: Certificate ID mismatch issue addressed.The code now correctly checks for an existing certificate by hostname and reuses its ID on renewal. This ensures the returned
certIDmatches what's stored in the database, addressing the prior review feedback about ID mismatches.
65-70: Defer compensation behavior is correct as written.The defer at line 30-35 is properly positioned after the successful claim (line 28). With named return values, the defer captures references to
respanderrwhich are evaluated when the function returns. The compensation logic correctly triggersmarkChallengeFailedfor all failure scenarios after claiming:
- If
err != nil(failures at obtain, persist, or mark verified steps)- If response status is "failed" (certificate obtain returns failed status but nil error)
Early returns before the claim (lines 14-15, 26-27) execute before the defer is registered, so no compensation runs for errors before claiming—which is correct behavior. The code is sound.
go/apps/ctrl/services/acme/user.go (4)
22-31: LGTM on EmailDomain integration.The
EmailDomainfield allows configurable ACME registration emails. TheGetEmail()method correctly composes the email using the workspace ID and domain.
71-84: LGTM on user rehydration with optional registration.The code correctly initializes the user with the decrypted key and only sets
Registrationwhen a valid URI exists in the database.
116-175: LGTM on the register flow.The registration flow correctly:
- Generates a new ECDSA P-256 key
- Encrypts and persists the key before registration
- Registers with the CA
- Persists the registration URI
The order ensures the key is safely stored before any CA interaction.
92-111: Registration URI persistence failure does not require fatal error handling.ACME specification (RFC 8555) makes the newAccount operation idempotent by account key. If registration is attempted again with the same key, the server must return the existing account with a 200 OK response—it will not reject the duplicate registration. This means re-attempting registration if the URI persists fails is safe and will consistently return the same account. The current approach of logging a warning is appropriate.
go/apps/ctrl/workflows/certificate/renew_handler.go (3)
11-17: LGTM on constants for renewal scheduling.Clear constants for the 24-hour renewal interval and the singleton key pattern.
25-43: LGTM on the Restate durable workflow pattern.The handler correctly uses
restate.Runto durably query for executable challenges, ensuring the query is replayed correctly on workflow resume.
49-76: Fire-and-forget pattern handles failures gracefully.The loop correctly:
- Triggers each renewal via
Send()(non-blocking)- Collects failed domains for reporting
- Uses
restate.Sleepfor throttling (durable sleep)The 100ms delay prevents overwhelming the system while still being responsive.
go/apps/ctrl/run.go (6)
58-61: LGTM on disabling CNAME support for lego.Setting
LEGO_DISABLE_CNAME_SUPPORTprevents lego from following CNAMEs that could cause Route53 zone lookup failures for wildcard domains.
261-268: LGTM on cache initialization.Caches are correctly initialized before provider setup, addressing prior review feedback about missing
DomainCache.
270-315: LGTM on ACME provider setup.Both HTTP-01 and DNS-01 providers are correctly configured with the shared
DomainCache. The DNS provider selection (Cloudflare vs Route53) is mutually exclusive as expected.
519-528: LGTM on not-found error handling.The code correctly uses
db.IsNotFound(err)instead of comparing tosql.ErrNoRowsdirectly, addressing prior review feedback.
445-450: LGTM on ACME service handler wiring.The ACME service correctly receives both
DomainCacheandChallengeCachefor efficient lookups during HTTP-01 challenge verification.
550-561: Token and Authorization fields are intentionally empty initially and are populated during the ACME challenge flow.The challenge is created with empty
TokenandAuthorizationfields because these values are only provided by the ACME server during the challenge lifecycle. When the renewal cron triggersProcessChallenge, the ACME workflow calls the DNS provider'sPresentmethod (viaprovider.go), which receives the actual token and key authorization from the lego library and updates the database withUpdateAcmeChallengePending. TheVerifyCertificateservice then reads these populated values to return the authorization to clients.This design is correct—initializing with empty strings and populating during the ACME flow prevents storing incomplete data.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
go/apps/ctrl/run.go (1)
416-432: Remove or relax the overly restrictive renewal cron condition.The renewal cron is gated by
cfg.Acme.Enabled && dnsProvider != nil, but this is incorrect. TheRenewExpiringCertificateshandler explicitly supports both DNS-01 (lines 31-34 in renew_handler.go) and HTTP-01 challenges. TheProcessChallengeworkflow dynamically selects the appropriate provider based on domain type: DNS-01 for wildcards and HTTP-01 for regular domains (lines 141-157 in process_challenge_handler.go).This condition prevents HTTP-01 certificate renewals for regular domains, even though the workflow supports them. Regular domains configured with HTTP-01 will fail to renew because the cron job never starts.
Correct the condition to:
cfg.Acme.Enabled && (dnsProvider != nil || httpProvider != nil)
🧹 Nitpick comments (1)
go/apps/ctrl/config.go (1)
233-242: Consider validating HostedZoneID when auto-discovery is expected to fail.The validation correctly requires
AccessKeyID,SecretAccessKey, andRegionwhen Route53 is enabled. However,HostedZoneIDis optional and only validated when provided. According to the comment in the struct definition (line 48-49),HostedZoneIDis "required when domains have CNAMEs that confuse the zone lookup."While making it always required would be too strict, consider whether the validation should warn or document when it should be set based on domain configuration.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
go/go.sumis excluded by!**/*.sum
📒 Files selected for processing (5)
go/apps/ctrl/config.go(2 hunks)go/apps/ctrl/run.go(7 hunks)go/cmd/ctrl/main.go(2 hunks)go/go.mod(1 hunks)go/pkg/db/querier_generated.go(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- go/go.mod
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2025-09-11T14:13:12.124Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/apps/ctrl/services/acme/providers/cloudflare_provider.go:56-69
Timestamp: 2025-09-11T14:13:12.124Z
Learning: For the default domain in the Unkey system (go/apps/ctrl/services/acme/providers/cloudflare_provider.go), only wildcard certificates are issued. When domain == p.defaultDomain, the code should always convert to the wildcard form ("*." + domain) and not attempt fallback lookups to the base domain form.
Applied to files:
go/apps/ctrl/run.go
📚 Learning: 2025-09-11T14:12:30.570Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/apps/ctrl/run.go:292-299
Timestamp: 2025-09-11T14:12:30.570Z
Learning: In go/apps/ctrl/config.go, the Validate() method already includes validation for Cloudflare API token using assert.NotEmpty() when both c.Acme.Enabled and c.Acme.Cloudflare.Enabled are true, with the error message "cloudflare API token is required when cloudflare is enabled".
Applied to files:
go/apps/ctrl/run.gogo/cmd/ctrl/main.gogo/apps/ctrl/config.go
📚 Learning: 2025-09-11T14:12:30.570Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/apps/ctrl/run.go:292-299
Timestamp: 2025-09-11T14:12:30.570Z
Learning: In go/apps/ctrl/config.go, the Validate() method already includes validation for Cloudflare API token when ACME and Cloudflare are both enabled, checking that cfg.Acme.Cloudflare.ApiToken is not empty.
Applied to files:
go/apps/ctrl/run.gogo/cmd/ctrl/main.gogo/apps/ctrl/config.go
📚 Learning: 2025-07-15T14:59:30.212Z
Learnt from: chronark
Repo: unkeyed/unkey PR: 3560
File: go/deploy/metald/internal/database/repository.go:0-0
Timestamp: 2025-07-15T14:59:30.212Z
Learning: go/deploy/metald cannot currently import helpers from go/pkg/db because it is not yet part of the main Go module; avoid suggesting such imports until the modules are unified.
Applied to files:
go/apps/ctrl/run.go
📚 Learning: 2025-09-25T18:49:28.948Z
Learnt from: mcstepp
Repo: unkeyed/unkey PR: 4010
File: go/apps/ctrl/run.go:224-233
Timestamp: 2025-09-25T18:49:28.948Z
Learning: In go/apps/ctrl/run.go, mcstepp prefers to keep ACME service handlers wrapped with auth interceptor during the demo phase for simplicity, even though typically ACME routes would be exempt from authentication. This is part of the temporary demo authentication system.
Applied to files:
go/apps/ctrl/run.gogo/cmd/ctrl/main.go
📚 Learning: 2025-09-15T18:12:01.503Z
Learnt from: mcstepp
Repo: unkeyed/unkey PR: 3952
File: go/apps/ctrl/run.go:212-213
Timestamp: 2025-09-15T18:12:01.503Z
Learning: In Go connect-go generated handlers, mux.Handle can accept New*ServiceHandler functions directly even though they return (string, http.Handler) tuples. The pattern mux.Handle(ctrlv1connect.New*ServiceHandler(...)) is valid and compiles successfully in unkey codebase.
Applied to files:
go/apps/ctrl/run.go
📚 Learning: 2025-09-11T14:24:40.988Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3944
File: go/apps/ctrl/services/deployment/deploy_workflow.go:326-334
Timestamp: 2025-09-11T14:24:40.988Z
Learning: The InsertDomains method in the bulk queries uses ON DUPLICATE KEY UPDATE, making it an upsert operation that is idempotent and safe for retries, despite the "Insert" naming convention.
Applied to files:
go/pkg/db/querier_generated.go
🧬 Code graph analysis (4)
go/apps/ctrl/run.go (9)
go/apps/ctrl/internal/caches/caches.go (2)
New(23-58)Config(18-21)go/pkg/uid/uid.go (1)
New(103-152)go/apps/ctrl/config.go (2)
Config(95-177)CloudflareConfig(27-33)go/pkg/clock/interface.go (1)
Clock(12-18)go/apps/ctrl/services/acme/providers/http_provider.go (3)
NewHTTPProvider(54-64)HTTPConfig(47-51)HTTPProvider(45-45)go/apps/ctrl/services/acme/providers/cloudflare_provider.go (2)
NewCloudflareProvider(21-38)CloudflareConfig(13-18)go/pkg/db/handle_err_no_rows.go (1)
IsNotFound(10-12)go/pkg/db/custom_domain_upsert.sql_generated.go (1)
UpsertCustomDomainParams(22-29)go/pkg/db/models_generated.go (2)
CustomDomainsChallengeTypeDNS01(148-148)AcmeChallengesStatusWaiting(61-61)
go/pkg/db/querier_generated.go (4)
go/pkg/db/interface.go (1)
DBTX(29-34)go/pkg/db/custom_domain_find_by_domain_or_wildcard.sql_generated.go (1)
FindCustomDomainByDomainOrWildcardParams(20-24)go/pkg/db/models_generated.go (1)
CustomDomain(769-776)go/pkg/db/custom_domain_upsert.sql_generated.go (1)
UpsertCustomDomainParams(22-29)
go/cmd/ctrl/main.go (2)
go/pkg/cli/flag.go (4)
String(419-453)Default(364-416)EnvVar(320-339)Bool(497-535)go/apps/ctrl/config.go (2)
CloudflareConfig(27-33)Route53Config(35-51)
go/apps/ctrl/config.go (2)
go/apps/ctrl/services/acme/providers/cloudflare_provider.go (1)
CloudflareConfig(13-18)go/apps/ctrl/services/acme/providers/route53_provider.go (1)
Route53Config(13-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test API / API Test Local
- GitHub Check: Lint Go Code / Lint
- GitHub Check: Test Go API Local / Test
- GitHub Check: Test Packages / Test
- GitHub Check: Build / Build
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (13)
go/pkg/db/querier_generated.go (2)
180-187: LGTM!The
FindCustomDomainByDomainOrWildcardmethod correctly implements the logic to find a custom domain by either exact match or wildcard pattern, prioritizing exact matches via theORDER BY CASEclause.
2142-2150: LGTM. This generated method correctly implements the upsert pattern. The only call site ingo/apps/ctrl/run.gouses a hardcoded"unkey_internal"workspace for platform-managed ACME certificate infrastructure, so workspace reassignment is controlled and intentional.go/cmd/ctrl/main.go (3)
116-116: LGTM!The
acme-email-domainflag is properly configured with a default value and environment variable support. The default domain "unkey.com" aligns with the organization's namespace for ACME registration emails.
235-241: LGTM!The Route53 configuration is properly wired from CLI flags into the
AcmeConfigstruct, mapping all five Route53-specific fields correctly.
118-127: Route53 credential validation is already enforced in the config layer.The
Validate()method ingo/apps/ctrl/config.go(lines 234-238) ensures that whenacme-route53-enabledis true, the following fields are required:AccessKeyID,SecretAccessKey, andRegion. The CLI-level flags are correctly deferred to config validation, making this deferral appropriate.go/apps/ctrl/config.go (2)
35-51: LGTM!The
Route53Configstruct is well-documented and follows the same pattern asCloudflareConfig. The comment onHostedZoneIDclearly explains when it's needed (bypassing auto-discovery for domains with wildcard CNAMEs).
57-64: LGTM!The
AcmeConfigstruct is properly extended withEmailDomainandRoute53fields, maintaining consistency with the existingCloudflarefield structure. The comments clearly document the purpose of each field.go/apps/ctrl/run.go (6)
286-293: LGTM!The caches are initialized unconditionally (even when ACME is disabled) because they're needed for the ACME verification endpoint regardless of provider configuration. This is a good design decision that ensures the verification endpoint is always available.
295-340: LGTM! DNS provider selection follows proper exclusive pattern.The provider setup correctly implements:
- HTTP-01 provider is always created when ACME is enabled (for non-wildcard domains)
- DNS-01 provider is conditionally created based on Cloudflare OR Route53 being enabled
- Proper error handling for provider creation failures
- Both providers receive the shared
DomainCachefrom the caches moduleThe
else ifensures only one DNS provider is active at a time, which is the correct behavior.
342-352: LGTM! Certificate service properly configured with extended timeout.The certificate service is correctly wired with:
- All required dependencies (database, vault, logger, caches)
- New
EmailDomainandDefaultDomainconfiguration- Both
DNSProviderandHTTPProviderfor handling wildcard and regular certificates- Extended inactivity timeout (15 minutes) appropriate for DNS-01 propagation delays
The configuration aligns with the service's requirements for ACME challenge handling.
411-414: Bootstrap condition correctly gates wildcard domain creation.The bootstrap is properly gated by:
cfg.Acme.Enabled- ACME must be enableddnsProvider != nil- DNS-01 provider must be configured (wildcards require DNS-01)cfg.DefaultDomain != ""- Must have a default domainThis ensures the wildcard domain is only created when all prerequisites are met.
470-475: LGTM! ACME service properly wired with domain and challenge caches.The ACME service handler is correctly configured with
DomainCacheandChallengeCachefrom the shared caches module, enabling proper challenge verification and domain lookups.
539-593: LGTM! Bootstrap helper properly implements domain and challenge creation.The
bootstrapWildcardDomainfunction correctly:
- Constructs wildcard domain with "*." prefix (line 542)
- Uses
db.IsNotFound(err)for not-found detection (line 550), addressing past review feedback- Generates unique domain ID using
uid.New(uid.DomainPrefix)(line 556)- Uses
"unkey_internal"workspace for platform-managed resources (line 560)- Creates domain with
DNS-01challenge type (line 566)- Creates ACME challenge with
waitingstatus so renewal cron picks it up (line 581)- Properly handles errors with informative logging
The implementation is robust and follows the codebase's patterns.
| // Disable CNAME following in lego to prevent it from following wildcard CNAMEs | ||
| // (e.g., *.example.com -> loadbalancer.aws.com) and failing Route53 zone lookup. | ||
| // Must be set before creating any ACME DNS providers. | ||
| os.Setenv("LEGO_DISABLE_CNAME_SUPPORT", "true") |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n go/apps/ctrl/run.go | head -80 | tail -30Repository: unkeyed/unkey
Length of output: 1104
🏁 Script executed:
rg -n "LEGO_DISABLE_CNAME_SUPPORT" go/apps/ctrl/run.go -B 5 -A 5Repository: unkeyed/unkey
Length of output: 466
🏁 Script executed:
fd "cloudflare" go/apps/ctrl --type fRepository: unkeyed/unkey
Length of output: 116
🏁 Script executed:
sed -n '50,300p' go/apps/ctrl/run.go | head -150Repository: unkeyed/unkey
Length of output: 4600
🏁 Script executed:
cat go/apps/ctrl/services/acme/providers/cloudflare_provider.goRepository: unkeyed/unkey
Length of output: 1094
🏁 Script executed:
sed -n '61,400p' go/apps/ctrl/run.go | grep -n -A 5 -B 5 "cfg.Acme\|ACME" | head -100Repository: unkeyed/unkey
Length of output: 3909
🏁 Script executed:
rg -n "cfg\.Acme\.Enabled|if.*Acme" go/apps/ctrl/run.go -B 2 -A 2Repository: unkeyed/unkey
Length of output: 1728
🌐 Web query:
lego LEGO_DISABLE_CNAME_SUPPORT cloudflare DNS-01 challenge impact
💡 Result:
Short answer: LEGO_DISABLE_CNAME_SUPPORT controls whether the lego ACME client follows CNAMEs when resolving dns‑01 challenge names. If set (true), lego will NOT follow CNAME aliases; if unset (default) lego will follow them. [1][2]
Impact when using Cloudflare and dns‑01:
- If you use CNAME aliases (e.g., delegating _acme-challenge.example.com to another zone or to acme-dns), disabling CNAME support stops lego from following that alias and will look for/create TXT records in the original zone — causing validation failures. [1][4]
- Cloudflare features (proxying / CNAME flattening or a proxied record) can already break dns‑01 verification because the TXT/CNAME may not resolve as expected; combining that with LEGO_DISABLE_CNAME_SUPPORT=true makes debugging harder and is likely to fail if the TXT actually lives under the aliased zone. See Cloudflare CNAME/CNAME-flattening notes. [3]
- Some users/tools (e.g., Traefik integrations) mention toggling this env var when CNAME handling causes issues; often the root problem is DNS configuration or resolver choices rather than lego itself. Setting explicit DNS resolvers or fixing the CNAME/proxy settings on Cloudflare is another remedy. [5][4]
Recommendations:
- If you delegate _acme-challenge via CNAME to a different provider (acme-dns or another Cloudflare zone), leave CNAME support enabled (unset LEGO_DISABLE_CNAME_SUPPORT). [1][4]
- Ensure the CNAME record is DNS‑only (not proxied) and that Cloudflare CNAME flattening won’t interfere. [3]
- If you must disable CNAME following, make sure the TXT records are writable in the zone lego will query (or change your provider config so the TXT is created where lego expects). [1][4]
- If failures persist, check logs, confirm TXT resolution from public resolvers, or configure explicit resolvers for the DNS challenge. [5]
Sources:
- lego CLI options (LEGO_DISABLE_CNAME_SUPPORT description). [1]
- lego project features (CNAME support default). [2]
- Cloudflare DNS docs on CNAME / flattening / proxied records. [3]
- go-acme/lego GitHub issue discussing CNAME/dns‑01 delegation problems. [4]
- Traefik community post referencing LEGO_DISABLE_CNAME_SUPPORT and resolver workarounds. [5]
Environment variable set globally before ACME providers are created; clarify Cloudflare DNS-01 impact.
The LEGO_DISABLE_CNAME_SUPPORT setting is correctly placed after validation but before any ACME DNS providers are instantiated. However, this global environment variable affects both Route53 and Cloudflare providers:
- Route53: Disabling CNAME following prevents issues with wildcard CNAME records (e.g.,
*.example.com → loadbalancer.aws.com) that would fail zone lookups. - Cloudflare: Disabling CNAME support will break dns-01 challenges if the setup delegates
_acme-challengevia CNAME to another zone or relies on CNAME flattening. This is only safe if using Cloudflare's standard direct zone record management (not CNAME-delegated challenges).
If Cloudflare users employ CNAME-based challenge delegation, this environment variable will cause dns-01 validation failures. Consider either:
- Documenting that Cloudflare integration requires direct zone records (not CNAME delegation), or
- Making this setting conditional per-provider (more complex but provider-specific).
🤖 Prompt for AI Agents
In go/apps/ctrl/run.go around lines 58-61 the global
os.Setenv("LEGO_DISABLE_CNAME_SUPPORT","true") unconditionally disables CNAME
following for all ACME DNS providers which breaks Cloudflare setups that
delegate _acme-challenge via CNAME; change this so the env var is only set when
using providers that require CNAME following disabled (e.g., Route53) or make it
configurable: detect the configured DNS provider(s) before creating providers
and either (a) set LEGO_DISABLE_CNAME_SUPPORT only if the provider is
Route53/other non-CNAME-delegated provider, or (b) add a config
flag/documentation indicating Cloudflare requires direct zone records and do not
set the env var when Cloudflare is in use. Ensure the check runs before provider
construction and keep the default behavior unchanged for existing Route53 users.
* feat: add lets encrypt challenges * always disable cname following * cleanup some code * cleanup some code * cleanup some code * cleanup some code * cleanup some code * fix golint issues * fix golint issues
* feat: add environment variables db schema and queries * fix db query * feat: add SecretsConfig proto for encrypted env vars * [autofix.ci] apply automated fixes * feat: dashboard UI for environment variables management * fix comment and rename file * fix file export name * Remove unnecessary comments from add-env-vars * add toasts for environment variable operations * [autofix.ci] apply automated fixes * fix: add try/catch error handling to env var mutations * unfmt file * [autofix.ci] apply automated fixes * feat: decrypt env vars in CTRL workflow before passing to Krane * feat: inject env vars into pod spec via Krane * feat: add customer-workload service account for pod isolation * remove gw from k8s manifest, add agent fix ctrl vault for certs * seperate master keys too * add inital webhook stuff * add generated stuff * adjust comments * use otel lgtm stack in k8s too * fix some rabbit comments * fix some rabbit comments * get rid of some unncessary comments * actually add unkey env cmd gitignores... * fix golint issues * Fix/update validation issues status label (#4478) * fix: update API key status label from 'Potential issues' to 'High Error Rate' Changed the validation-issues status label to more clearly communicate that the key is receiving invalid requests, rather than implying the API or key itself is broken. Changes: - Label: 'Potential issues' → 'High Error Rate' - Tooltip: Updated to clarify that requests are invalid (rate limited, unauthorized, etc.) rather than suggesting system issues Fixes #4474 * chore: apply biome formatting * fix: update status label to 'Elevated Rejections' per review --------- Co-authored-by: CodeReaper <148160799+MichaelUnkey@users.noreply.github.com> * chore: Remove un-used UI components (#4472) * removed un used components * updated members refs --------- Co-authored-by: James P <james@unkey.dev> Co-authored-by: Andreas Thomas <dev@chronark.com> * perf: fix n+1 (#4484) * fix: add 403 error when 0 key verification perms (#4483) * fix: add 403 error when 0 key verification perms * cleanup tests * feat: add environment variables db schema and queries (#4450) * feat: add environment variables db schema and queries * fix db query * feat: add SecretsConfig proto for encrypted env vars (#4451) * feat: add environment variables db schema and queries * fix db query * feat: add SecretsConfig proto for encrypted env vars * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Andreas Thomas <dev@chronark.com> * feat: add GetPullToken * feat: dashboard UI for environment variables management (#4452) * feat: add environment variables db schema and queries * fix db query * feat: add SecretsConfig proto for encrypted env vars * [autofix.ci] apply automated fixes * feat: dashboard UI for environment variables management * fix comment and rename file * fix file export name * Remove unnecessary comments from add-env-vars * add toasts for environment variable operations * [autofix.ci] apply automated fixes * fix: add try/catch error handling to env var mutations * unfmt file * [autofix.ci] apply automated fixes --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Andreas Thomas <dev@chronark.com> * feat: decrypt env vars in CTRL workflow before passing to Krane (#4453) * feat: add environment variables db schema and queries * fix db query * feat: add SecretsConfig proto for encrypted env vars * [autofix.ci] apply automated fixes * feat: dashboard UI for environment variables management * fix comment and rename file * fix file export name * Remove unnecessary comments from add-env-vars * add toasts for environment variable operations * [autofix.ci] apply automated fixes * fix: add try/catch error handling to env var mutations * unfmt file * [autofix.ci] apply automated fixes * feat: decrypt env vars in CTRL workflow before passing to Krane --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Andreas Thomas <dev@chronark.com> * feat: inject env vars into pod spec via Krane (#4454) * feat: add environment variables db schema and queries * fix db query * feat: add SecretsConfig proto for encrypted env vars * [autofix.ci] apply automated fixes * feat: dashboard UI for environment variables management * fix comment and rename file * fix file export name * Remove unnecessary comments from add-env-vars * add toasts for environment variable operations * [autofix.ci] apply automated fixes * fix: add try/catch error handling to env var mutations * unfmt file * [autofix.ci] apply automated fixes * feat: decrypt env vars in CTRL workflow before passing to Krane * feat: inject env vars into pod spec via Krane --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Andreas Thomas <dev@chronark.com> * feat: add customer-workload service account for pod isolation (#4455) * feat: add environment variables db schema and queries * fix db query * feat: add SecretsConfig proto for encrypted env vars * [autofix.ci] apply automated fixes * feat: dashboard UI for environment variables management * fix comment and rename file * fix file export name * Remove unnecessary comments from add-env-vars * add toasts for environment variable operations * [autofix.ci] apply automated fixes * fix: add try/catch error handling to env var mutations * unfmt file * [autofix.ci] apply automated fixes * feat: decrypt env vars in CTRL workflow before passing to Krane * feat: inject env vars into pod spec via Krane * feat: add customer-workload service account for pod isolation --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Andreas Thomas <dev@chronark.com> * remove gw from k8s manifest, add agent fix ctrl vault for certs (#4463) * feat: add environment variables db schema and queries * fix db query * feat: add SecretsConfig proto for encrypted env vars * [autofix.ci] apply automated fixes * feat: dashboard UI for environment variables management * fix comment and rename file * fix file export name * Remove unnecessary comments from add-env-vars * add toasts for environment variable operations * [autofix.ci] apply automated fixes * fix: add try/catch error handling to env var mutations * unfmt file * [autofix.ci] apply automated fixes * feat: decrypt env vars in CTRL workflow before passing to Krane * feat: inject env vars into pod spec via Krane * feat: add customer-workload service account for pod isolation * remove gw from k8s manifest, add agent fix ctrl vault for certs * seperate master keys too --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Andreas Thomas <dev@chronark.com> * chore: Make Stripe Great Again (#4479) * fix: Make stripe webhooks more robust * chore: Move alert to UI (#4485) * Moved alert to ui and swapped usages * feat: better env var injection (#4468) * feat: add environment variables db schema and queries * fix db query * feat: add SecretsConfig proto for encrypted env vars * [autofix.ci] apply automated fixes * feat: dashboard UI for environment variables management * fix comment and rename file * fix file export name * Remove unnecessary comments from add-env-vars * add toasts for environment variable operations * [autofix.ci] apply automated fixes * fix: add try/catch error handling to env var mutations * unfmt file * [autofix.ci] apply automated fixes * feat: decrypt env vars in CTRL workflow before passing to Krane * feat: inject env vars into pod spec via Krane * feat: add customer-workload service account for pod isolation * remove gw from k8s manifest, add agent fix ctrl vault for certs * seperate master keys too * add inital webhook stuff * add generated stuff * adjust comments * use otel lgtm stack in k8s too * fix some rabbit comments * fix some rabbit comments * get rid of some unncessary comments * actually add unkey env cmd gitignores... * fix golint issues (#4477) * [autofix.ci] apply automated fixes * fix fmt * linter be happy --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Andreas Thomas <dev@chronark.com> * make token pod owned * feat: add lets encrypt challenges (#4471) * feat: add lets encrypt challenges * always disable cname following * cleanup some code * cleanup some code * cleanup some code * cleanup some code * cleanup some code * fix golint issues * fix golint issues * fmt * remove old webhook code * remove old webhook code * make build id not optiona * cleanup * cleanup * fmt * fmt --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: abhay <88815641+theabhayprajapati@users.noreply.github.com> Co-authored-by: CodeReaper <148160799+MichaelUnkey@users.noreply.github.com> Co-authored-by: James P <james@unkey.dev> Co-authored-by: Andreas Thomas <dev@chronark.com>

What does this PR do?
Fixes #4054
We got automated certificates again!
Adds route53 as certificate provider (aws)
Uses self scheduling restate job to automatically refresh soon to be expiring certs or ones that we weren't able to issue yet.
Type of change
How should this be tested?
Checklist
Required
pnpm buildpnpm fmtmake fmton/godirectoryconsole.logsgit pull origin mainAppreciated