Skip to content

engineering docs#5350

Closed
chronark wants to merge 3 commits intomainfrom
engineering-docs
Closed

engineering docs#5350
chronark wants to merge 3 commits intomainfrom
engineering-docs

Conversation

@chronark
Copy link
Collaborator

  • feat: create and delete projects
  • fix: cascade
  • docs: engineering on unkey?

@vercel
Copy link

vercel bot commented Mar 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
dashboard Ready Ready Preview, Comment Mar 17, 2026 9:22am

Request Review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 17, 2026

📝 Walkthrough

Walkthrough

This PR introduces project and app deletion workflows in the control plane via new services and gRPC endpoints. It adds cascading deletion logic through Restate workers, corresponding SQL/database operations, and updates the dashboard to delegate project operations to remote control plane services instead of local logic.

Changes

Cohort / File(s) Summary
Documentation & Infrastructure
docs/engineering/Dockerfile, docs/engineering/Makefile, docs/engineering/style.css, docs/engineering/scripts/sync-infra-docs.ts, docs/engineering/docs.json
Added Bun-based development environment configuration, Makefile with doc sync/dev/lint targets, CSS styling for headings, and TypeScript script to sync infra docs from GitHub into docs.json navigation structure. Restructured docs.json with theme/color updates, added Company tab, and reorganized Infra section.
Documentation Removals
docs/engineering/hosting/*, docs/engineering/infra/operations/*, docs/engineering/architecture/workflows/*, docs/engineering/infra/index.mdx
Removed hosting documentation (backup, configuration, install, scaling, security, troubleshooting, upgrade) and operations/infrastructure runbooks; deleted workflows index page. Created .gitignore for generated infra docs folder.
Documentation Link Formatting
docs/engineering/architecture/*, docs/engineering/contributing/quality/testing/http-handler-tests.mdx
Converted HTML anchor tags to Markdown-style links across architecture documentation and test documentation; updated YAML front matter formatting in company pages from quoted to unquoted scalars.
SQL & Database Queries
pkg/db/queries/*, pkg/db/*_generated.go (26 files)
Added 26 new sqlc-generated database operations for cascading deletions and bulk operations: delete operations for apps, projects, environments, deployments, custom domains, routes, connections, and related settings; insert/list/count operations for environments, apps, instances, and sentinels.
Control Plane Services
svc/ctrl/services/{app,project}/*, svc/ctrl/worker/{app,environment,project}/*
Implemented AppService and ProjectService with create/delete endpoints in the control API; added Restate worker virtual objects handling cascading deletion of environments, apps, and projects with transactional cleanup of all associated resources.
Protocol Buffers
svc/ctrl/proto/ctrl/v1/{app,project}.proto, svc/ctrl/proto/hydra/v1/{app,environment,project}.proto
Defined AppService, ProjectService, and EnvironmentService gRPC interfaces with create/delete RPCs and associated request/response message types for both control API and Restate worker layers.
Build Configuration
pkg/db/BUILD.bazel, svc/ctrl/api/BUILD.bazel, svc/ctrl/services/{app,project}/BUILD.bazel, svc/ctrl/worker/{app,environment,project}/BUILD.bazel, go.mod
Added Bazel build rules for new service packages and updated dependencies; included new Go module indirect dependencies for testing/container support (testcontainers, backoff, moby utilities).
Dashboard Frontend
web/apps/dashboard/app/(app)/[workspaceSlug]/projects/_components/list/, web/apps/dashboard/lib/trpc/routers/{ctrl.ts,deploy/project/*}, web/apps/dashboard/lib/collections/deploy/environment-settings.ts
Replaced local project creation/deletion logic with remote control plane calls via new ctrl service client; added DeleteProjectDialog integration; updated environment settings filtering to exclude null regions.
Integration Tests
svc/ctrl/integration/cleanup_test.go
Added comprehensive integration test verifying cascade deletion of project and all dependent resources (apps, environments, deployments, settings, routes) using Restate ingress and polling for eventual consistency.

Sequence Diagram

sequenceDiagram
    actor User
    participant Dashboard
    participant CtrlAPI as Control API
    participant Restate
    participant Database
    
    User->>Dashboard: Click Delete Project
    Dashboard->>CtrlAPI: DeleteProject(projectId)
    CtrlAPI->>Restate: Trigger ProjectService.Delete()
    
    Restate->>Database: ListAppIdsByProject(projectId)
    Database-->>Restate: [appId1, appId2, ...]
    
    Restate->>Restate: For each app
    Restate->>Restate: Trigger AppService.Delete(appId)
    
    Restate->>Database: ListEnvironmentIdsByApp(appId)
    Database-->>Restate: [envId1, envId2, ...]
    
    Restate->>Restate: For each environment
    Restate->>Restate: Trigger EnvironmentService.Delete(envId)
    Restate->>Database: Delete all settings by environment<br/>(deployments, routes, sentinels, etc.)
    Database-->>Restate: Success
    Restate->>Database: DeleteEnvironmentById(envId)
    Database-->>Restate: Success
    
    Restate->>Database: DeleteGithubRepoConnectionsByAppId(appId)
    Database-->>Restate: Success
    Restate->>Database: DeleteAppById(appId)
    Database-->>Restate: Success
    
    Restate->>Database: DeleteProjectById(projectId)
    Database-->>Restate: Success
    
    Restate-->>CtrlAPI: Deletion complete
    CtrlAPI-->>Dashboard: Success
    Dashboard-->>User: Project deleted
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch engineering-docs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
web/apps/dashboard/lib/trpc/routers/deploy/project/create.ts (1)

41-50: ⚠️ Potential issue | 🟠 Major

Same error type mismatch as delete.ts - ConnectError vs TRPCError.

The instanceof TRPCError check on line 42 will never match since ctrl.project.createProject uses ConnectRPC, which throws ConnectError. Apply the same fix as suggested for delete.ts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/apps/dashboard/lib/trpc/routers/deploy/project/create.ts` around lines 41
- 50, The catch block in the createProject handler incorrectly checks for
TRPCError (which never matches ConnectRPC failures); update the error handling
to detect and rethrow ConnectError thrown by ctrl.project.createProject (similar
to delete.ts) — import or reference ConnectError from the ConnectRPC client, if
the caught error is a ConnectError rethrow it (or map it appropriately),
otherwise log the error and throw a TRPCError with the existing
INTERNAL_SERVER_ERROR message; ensure you update the instanceof check from
TRPCError to ConnectError and preserve the existing console.error and TRPCError
creation for non-ConnectError cases.
🟡 Minor comments (6)
web/apps/dashboard/app/(app)/[workspaceSlug]/projects/_components/list/project-actions.tsx-25-26 (1)

25-26: ⚠️ Potential issue | 🟡 Minor

Non-null assertion violates coding guidelines.

The ! operator on workspace?.slug! compromises type safety. Consider using a guard clause or providing a fallback, such as returning early if workspace?.slug is undefined, or disabling the menu items when workspace is not available.

Suggested fix
 export const ProjectActions = ({
   projectId,
   projectName,
   children,
 }: PropsWithChildren<ProjectActionsProps>) => {
   const router = useRouter();
   const { workspace } = useWorkspace();
-  // biome-ignore lint/style/noNonNullAssertion: This cannot be null
-  const menuItems = getProjectActionItems(projectId, projectName, workspace?.slug!, router);
+  
+  if (!workspace?.slug) {
+    return <>{children}</>;
+  }
+
+  const menuItems = getProjectActionItems(projectId, projectName, workspace.slug, router);

   return <TableActionPopover items={menuItems}>{children}</TableActionPopover>;
 };

As per coding guidelines: **/*.{ts,tsx}: Never compromise type safety: No any, no ! (non-null assertion), no as Type.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@web/apps/dashboard/app/`(app)/[workspaceSlug]/projects/_components/list/project-actions.tsx
around lines 25 - 26, Remove the non-null assertion on workspace?.slug and guard
against undefined before calling getProjectActionItems: check workspace?.slug
(or workspace) and either early-return/disable the menu when it's missing or
call getProjectActionItems only when slug is defined (e.g., const slug =
workspace?.slug; if (!slug) return /* disabled menu or placeholder */; const
menuItems = getProjectActionItems(projectId, projectName, slug, router);).
Update the surrounding logic that consumes menuItems to handle the
disabled/empty case so type safety is preserved and no `!` is used.
docs/engineering/infra/.gitignore-1-2 (1)

1-2: ⚠️ Potential issue | 🟡 Minor

Fix grammar in the generated-folder notice.

Line 1 reads awkwardly (“content ... generated”). Small wording fix will make this clearer.

✏️ Proposed wording tweak
-# The content of this folder generated by running bun `scripts/sync-infra-docs.ts`. DO NOT EDIT.
-# To change these, go to github.com/unkeyed/infra
+# The content of this folder is generated by running `bun scripts/sync-infra-docs.ts`. DO NOT EDIT.
+# To change this content, go to github.com/unkeyed/infra
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/engineering/infra/.gitignore` around lines 1 - 2, Update the
generated-folder notice to read more clearly: replace the first line text ("#
The content of this folder generated by running bun
`scripts/sync-infra-docs.ts`. DO NOT EDIT.") with a grammatically correct
sentence such as "This folder's contents are generated by running bun
`scripts/sync-infra-docs.ts`. DO NOT EDIT." and keep the second line ("# To
change these, go to github.com/unkeyed/infra") unchanged; locate and edit the
two comment lines at the top of docs/engineering/infra/.gitignore to apply this
wording fix.
docs/engineering/Dockerfile-9-13 (1)

9-13: ⚠️ Potential issue | 🟡 Minor

Avoid running make dev as root.

The final process still runs as root, so when this image is used with a bind mount any generated files come back root-owned on the host. Add a non-root USER before the final CMD and ensure /app is owned by that user.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/engineering/Dockerfile` around lines 9 - 13, The Dockerfile currently
ends with CMD ["make", "dev"] which leaves the container running as root; create
a non-root user and switch to it before the final CMD and ensure /app is owned
by that user so bind-mounted files aren’t root-owned on the host. Add steps (in
the Dockerfile) to create a dedicated user/group (e.g., app), chown -R /app to
that user, and add a USER app line immediately before CMD ["make","dev"];
reference the final CMD and the /app directory so reviewers can find where to
apply the ownership and USER change.
docs/engineering/architecture/services/control-plane/worker/overview.mdx-12-12 (1)

12-12: ⚠️ Potential issue | 🟡 Minor

Use /tree/ for the worker package directory link.

svc/ctrl/worker is a directory, not a file, so it should use /tree/ instead of /blob/ in the GitHub URL for correct semantics.

🔗 Suggested fix
-[`svc/ctrl/worker`](https://github.com/unkeyed/unkey/blob/main/svc/ctrl/worker)
+[`svc/ctrl/worker`](https://github.com/unkeyed/unkey/tree/main/svc/ctrl/worker)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/engineering/architecture/services/control-plane/worker/overview.mdx` at
line 12, The Markdown link currently points to
https://github.com/unkeyed/unkey/blob/main/svc/ctrl/worker (using /blob/), but
that path is a directory; update the link so it uses /tree/ instead of /blob/
(i.e., change the URL in the link for `svc/ctrl/worker` to
https://github.com/unkeyed/unkey/tree/main/svc/ctrl/worker) to correctly
reference the package directory.
docs/engineering/architecture/services/control-plane/worker/workflows/routing.mdx-10-10 (1)

10-10: ⚠️ Potential issue | 🟡 Minor

Use the GitHub tree view for package links.

svc/ctrl/worker/routing is a directory, so the current /blob/ URL will not render correctly.

🔗 Suggested fix
-- Routing service ([`svc/ctrl/worker/routing`](https://github.com/unkeyed/unkey/blob/main/svc/ctrl/worker/routing)).
+- Routing service ([`svc/ctrl/worker/routing`](https://github.com/unkeyed/unkey/tree/main/svc/ctrl/worker/routing)).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@docs/engineering/architecture/services/control-plane/worker/workflows/routing.mdx`
at line 10, The link to the routing service currently points to a blob URL for
`svc/ctrl/worker/routing` which is a directory and won't render; update the URL
to use the GitHub tree view (change `/blob/` to `/tree/`) for the
`svc/ctrl/worker/routing` link so it correctly navigates to the package
directory in
docs/engineering/architecture/services/control-plane/worker/workflows/routing.mdx.
docs/engineering/scripts/sync-infra-docs.ts-84-86 (1)

84-86: ⚠️ Potential issue | 🟡 Minor

Remove the non-null assertion in buildTree.

The code can stay fully typed without the ! on line 86, which aligns with the repo's TypeScript safety requirements.

♻️ Minimal refactor
-      if (!subdirs.has(dir)) subdirs.set(dir, []);
-      subdirs.get(dir)!.push(page);
+      const dirPages = subdirs.get(dir);
+      if (dirPages) {
+        dirPages.push(page);
+      } else {
+        subdirs.set(dir, [page]);
+      }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/engineering/scripts/sync-infra-docs.ts` around lines 84 - 86, In
buildTree, remove the non-null assertion on subdirs.get(dir)! by ensuring a
typed array is retrieved or created before pushing: after computing dir, fetch
const arr = subdirs.get(dir); if arr is undefined create a new array and set it
on subdirs (subdirs.set(dir, arr)); then call arr.push(page). This keeps types
safe without using the `!` operator and references the existing symbols
buildTree, subdirs, dir, and page.
🧹 Nitpick comments (3)
docs/engineering/Dockerfile (1)

5-6: Use apk add --no-cache in one layer.

The current form keeps the package index in the image and adds an extra layer for no benefit.

Suggested cleanup
-RUN apk add github-cli
-RUN apk add make
+RUN apk add --no-cache github-cli make
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/engineering/Dockerfile` around lines 5 - 6, Replace the two separate RUN
apk add lines (the statements installing github-cli and make) with a single RUN
that installs both packages in one layer and uses --no-cache to avoid leaving
the package index in the image; update the lines referencing "RUN apk add
github-cli" and "RUN apk add make" to a single invocation that includes
--no-cache and both package names so the image has one layer and no cached
index.
svc/ctrl/proto/hydra/v1/environment.proto (1)

16-21: Buf linter flags naming convention, but pattern appears consistent with codebase.

Static analysis reports that DeleteEnvironmentRequest/DeleteEnvironmentResponse should be named DeleteRequest or EnvironmentServiceDeleteRequest per Buf's standard naming conventions. However, comparing with svc/ctrl/proto/ctrl/v1/project.proto in this PR (which uses DeleteProjectRequest/DeleteProjectResponse), the Delete{Entity}Request/Response pattern appears to be the established convention in this codebase.

Consider either:

  1. Suppressing the Buf lint rule if this naming convention is intentional
  2. Aligning all proto files to the standard naming convention
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@svc/ctrl/proto/hydra/v1/environment.proto` around lines 16 - 21, The Buf
linter flags DeleteEnvironmentRequest/DeleteEnvironmentResponse for naming, but
the codebase uses the Delete{Entity}Request/Response pattern (see
DeleteProjectRequest/DeleteProjectResponse); decide which approach to take and
apply it consistently: either (A) suppress the specific Buf lint rule in your
buf configuration (e.g., update the lint ruleset to ignore the naming rule) so
DeleteEnvironmentRequest/DeleteEnvironmentResponse remain unchanged, or (B)
rename the messages to the standard form (e.g., DeleteRequest/DeleteResponse or
EnvironmentServiceDeleteRequest) across this proto and any codegen consumers,
updating all references to DeleteEnvironmentRequest/DeleteEnvironmentResponse
(including RPC signature and generated bindings) to the new names to keep
linter-compliant naming.
svc/ctrl/integration/cleanup_test.go (1)

49-208: This still only proves the single-child path.

The fixture creates one app and one environment, so it never exercises the new fan-out over project apps and app environments. A delete flow that only handles the first child would still pass; add at least one second app and one extra environment, ideally as separate t.Run scenarios.

As per coding guidelines, "**/*_test.go: Organize tests using t.Run() for scenarios."

Also applies to: 210-252

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@svc/ctrl/integration/cleanup_test.go` around lines 49 - 208, The test only
seeds one app and one environment (calls to h.Seed.CreateApp and
h.Seed.CreateEnvironment) so it doesn't exercise fan-out deletion across
multiple apps/environments; update cleanup_test.go to create at least a second
App and a second Environment (using another h.Seed.CreateApp and
h.Seed.CreateEnvironment with distinct IDs) and assert the cleanup behavior for
both children, and refactor into separate t.Run subtests (e.g., "multiple-apps"
and "multiple-environments") to cover the multi-child paths rather than only the
single-child case.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/engineering/Makefile`:
- Around line 1-12: The Makefile is missing a fmt target while CI/autofix.ci
invokes make fmt; either add a fmt target to the Makefile or remove/adjust the
CI job to stop calling it. To fix, add a new target named fmt (alongside sync,
dev, lint) that runs the repository’s formatting command used by CI (or mirror
whatever formatter command CI expects), or alternatively update the CI workflow
to call one of the existing targets (e.g., lint) instead of make fmt; reference
the Makefile targets sync, dev, lint and the missing fmt target when making the
change.

In `@docs/engineering/scripts/sync-infra-docs.ts`:
- Line 12: The GROUP_NAME constant is outdated and doesn't match docs.json
("Infra Repo"), causing lookups to fail and a duplicate generated group to be
appended; update the GROUP_NAME value from "Infra" to "Infra Repo" (and update
any other uses of GROUP_NAME in the same file, e.g., the lookup/replacement
logic around the block that currently references GROUP_NAME at lines ~120-125)
so the script matches and replaces the existing group instead of creating a
second one.

In `@pkg/db/queries/deployment_delete_by_environment.sql`:
- Around line 1-2: When deleting an environment, ensure you remove dependent
instances first to avoid orphaned rows: call the DeleteDeploymentInstances query
(the generated function for the instances deletion) before calling
DeleteDeploymentsByEnvironmentId in the environment deletion handler (where the
environment delete flow currently invokes DeleteDeploymentsByEnvironmentId);
update that handler to first execute DeleteDeploymentInstances filtered by the
same environment/deployment identifiers, then execute
DeleteDeploymentsByEnvironmentId, preserving transactional ordering so both ops
run in the same transaction.

In `@svc/ctrl/integration/cleanup_test.go`:
- Around line 258-263: The countRows helper currently accepts a variadic args
...any which weakens type safety; change its signature to accept a single arg
string (func countRows(t *testing.T, ctx context.Context, database db.Database,
query string, arg string) int), update the QueryRowContext call to use arg (not
args...), and update the two call sites that pass a single string (lines
referenced) to pass that string as the new arg parameter; keep the
require.NoError check and return unchanged.

In `@svc/ctrl/proto/hydra/v1/app.proto`:
- Around line 16-21: Rename the RPC request/response message types to
service-scoped names to satisfy Buf lint: change the messages currently named
DeleteAppRequest and DeleteAppResponse and update the rpc Delete signature to
use the new names (e.g., AppDeleteRequest and AppDeleteResponse or another
App-prefixed pair); ensure you update all references to
DeleteAppRequest/DeleteAppResponse to the new symbol names (rpc Delete, message
DeleteAppRequest, message DeleteAppResponse) so the proto compiles and the
RPC_*_STANDARD_NAME lint passes.

In `@svc/ctrl/proto/hydra/v1/project.proto`:
- Around line 16-21: The RPC method rpc Delete currently uses non-standard
message names DeleteProjectRequest and DeleteProjectResponse which violates
RPC_REQUEST_STANDARD_NAME / RPC_RESPONSE_STANDARD_NAME; rename those messages to
DeleteRequest and DeleteResponse and update the rpc signature to use
DeleteRequest and return DeleteResponse (i.e., change "rpc
Delete(DeleteProjectRequest) returns (DeleteProjectResponse) {}" to "rpc
Delete(DeleteRequest) returns (DeleteResponse) {}" and rename the message
definitions accordingly).

In `@svc/ctrl/services/app/service.go`:
- Around line 9-11: The constructor New for the AppService currently accepts a
Config that allows Restate to be nil, enabling a partially constructed service
that only fails on delete; change New to validate and reject a nil Restate (or
accept separate create/delete dependency variants) so the service cannot be
instantiated without a Restate implementation—add a nil-check in New (and update
Config validation if present) to return an error when Restate is nil, and update
callers/tests to supply a non-nil Restate or use the split constructor.

In `@svc/ctrl/services/project/create_project.go`:
- Around line 34-55: The project insert (db.Query.InsertProject) and subsequent
app creation (s.appService.CreateApp / ctrlv1.CreateAppRequest) must be made
atomic: either run both inside a DB transaction or, if transactions across the
external appService are impossible, implement compensating cleanup to delete the
inserted project on CreateApp failure. Update CreateProject to start a
transaction around InsertProject and commit only after CreateApp succeeds, or
roll back and return a wrapped error; and ensure error returns are consistent by
wrapping the CreateApp error with connect.NewError(connect.CodeInternal,
fmt.Errorf("failed to create default app: %w", err)) similar to the
InsertProject error handling.

In `@svc/ctrl/services/project/delete_project.go`:
- Around line 24-26: The current error handling around client.Delete().Send(ctx,
...) rewraps all errors as connect.CodeInternal; change it to detect and
preserve context errors first by checking errors.Is(err, context.Canceled) and
errors.Is(err, context.DeadlineExceeded) (or equivalently using errors.Is for
both) and return connect.NewError(connect.CodeCanceled, err) or
connect.NewError(connect.CodeDeadlineExceeded, err) respectively, otherwise fall
back to connect.NewError(connect.CodeInternal, fmt.Errorf("failed to trigger
project deletion: %w", err)); update the return path where Delete().Send is
called so ctx cancellations/timeouts are not converted to 500s.

In `@svc/ctrl/services/project/service.go`:
- Around line 27-34: The New constructor currently allows a zero-value Config
that yields a Service with nil dependencies (used by DeleteProject via
s.restate); change New to validate required deps (at minimum cfg.Restate,
cfg.Database and cfg.AppService) and reject nils by returning an error (i.e.,
change signature to return (*Service, error)) or alternatively panic if your
project prefers, and update call sites accordingly so callers handle the error;
ensure the guard checks happen inside New before constructing Service to avoid
creating an invalid Service instance.

In `@svc/ctrl/worker/app/delete_handler.go`:
- Around line 31-36: The loop calling hydrav1.NewEnvironmentServiceClient(ctx,
envID) currently ignores the return from envClient.Delete().Send(); modify the
deletion call in the delete handler so it captures the error return and logs
failures (e.g., use envClient.Delete().Send(...) error check) and emit a
logger.Warn including app_id, environment_id and the error; follow the same
pattern used in other workers (certificate renewal, infra bootstrap) to avoid
silently failing environment deletions.

In `@svc/ctrl/worker/environment/delete_handler.go`:
- Around line 23-93: The delete sequence runs many independent restate.RunVoid
steps (e.g., DeleteCiliumNetworkPoliciesByEnvironmentId,
DeleteSentinelsByEnvironmentId, ... DeleteEnvironmentById) which allows
concurrent writers to insert new rows between steps; add an initial atomic step
to mark the environment as "deleting" (e.g., call an Update/SetDeleting method
on the environment record using envID within a single restate.Run or a DB
transaction) before any Delete* calls, and ensure all write paths
(inserts/updates that accept envID) check that "deleting" is set and reject new
writes; this serializes the cascade and prevents new rows from blocking
DeleteEnvironmentById.

In `@svc/ctrl/worker/project/delete_handler.go`:
- Around line 34-40: The call to
appClient.Delete().Send(&hydrav1.DeleteAppRequest{}) ignores the returned error,
so detect and handle the send error before proceeding to restate.RunVoid;
capture the result and error from appClient.Delete().Send (as in
certificate/renew_handler.go's sendErr pattern), log the failure via the
appropriate logger and return the error to abort further work so that
DeleteProjectById is not executed if the app delete failed; locate
NewAppServiceClient, the Delete().Send call, and the restate.RunVoid block to
implement this error check and early return using projectID and s.db.RW().

In `@web/apps/dashboard/lib/collections/deploy/environment-settings.ts`:
- Around line 129-135: The mapping over `regional` uses non-null assertions
(`r.region!`) which violates type-safety rules; replace the filter+map with a
single type-narrowing operation (e.g., use `flatMap` or a combined
filter-then-map callback) so `r.region` is guaranteed non-null without `!`.
Update the `regions` assignment that currently references `r.region!.id`,
`r.region!.name`, and `r.replicas` to instead only iterate over entries where
`r.region` is not null and then safely access `id` and `name` from `r.region`
(keeping `replicas` from `r`) to remove the non-null assertions.

In `@web/apps/dashboard/lib/trpc/routers/deploy/project/delete.ts`:
- Around line 41-50: The catch block in the delete handler incorrectly checks
for TRPCError (which never matches ConnectRPC errors) so replace the instanceof
TRPCError check with an instanceof ConnectError check and, where appropriate,
inspect error.code against Code.* constants; import ConnectError and Code from
"@connectrpc/connect", keep rethrowing TRPCError as-is but for ConnectError
propagate a more specific TRPCError (e.g., map ConnectError with
Code.NotFound/AlreadyExists to corresponding TRPCError codes) or rethrow the
ConnectError details in the processLogger before throwing a
TRPCError(INTERNAL_SERVER_ERROR). Apply the same fix to the create handler where
the same pattern occurs (references: ctrl.project.deleteProject and
ctrl.project.createProject, the catch blocks currently checking TRPCError).

---

Outside diff comments:
In `@web/apps/dashboard/lib/trpc/routers/deploy/project/create.ts`:
- Around line 41-50: The catch block in the createProject handler incorrectly
checks for TRPCError (which never matches ConnectRPC failures); update the error
handling to detect and rethrow ConnectError thrown by ctrl.project.createProject
(similar to delete.ts) — import or reference ConnectError from the ConnectRPC
client, if the caught error is a ConnectError rethrow it (or map it
appropriately), otherwise log the error and throw a TRPCError with the existing
INTERNAL_SERVER_ERROR message; ensure you update the instanceof check from
TRPCError to ConnectError and preserve the existing console.error and TRPCError
creation for non-ConnectError cases.

---

Minor comments:
In `@docs/engineering/architecture/services/control-plane/worker/overview.mdx`:
- Line 12: The Markdown link currently points to
https://github.com/unkeyed/unkey/blob/main/svc/ctrl/worker (using /blob/), but
that path is a directory; update the link so it uses /tree/ instead of /blob/
(i.e., change the URL in the link for `svc/ctrl/worker` to
https://github.com/unkeyed/unkey/tree/main/svc/ctrl/worker) to correctly
reference the package directory.

In
`@docs/engineering/architecture/services/control-plane/worker/workflows/routing.mdx`:
- Line 10: The link to the routing service currently points to a blob URL for
`svc/ctrl/worker/routing` which is a directory and won't render; update the URL
to use the GitHub tree view (change `/blob/` to `/tree/`) for the
`svc/ctrl/worker/routing` link so it correctly navigates to the package
directory in
docs/engineering/architecture/services/control-plane/worker/workflows/routing.mdx.

In `@docs/engineering/Dockerfile`:
- Around line 9-13: The Dockerfile currently ends with CMD ["make", "dev"] which
leaves the container running as root; create a non-root user and switch to it
before the final CMD and ensure /app is owned by that user so bind-mounted files
aren’t root-owned on the host. Add steps (in the Dockerfile) to create a
dedicated user/group (e.g., app), chown -R /app to that user, and add a USER app
line immediately before CMD ["make","dev"]; reference the final CMD and the /app
directory so reviewers can find where to apply the ownership and USER change.

In `@docs/engineering/infra/.gitignore`:
- Around line 1-2: Update the generated-folder notice to read more clearly:
replace the first line text ("# The content of this folder generated by running
bun `scripts/sync-infra-docs.ts`. DO NOT EDIT.") with a grammatically correct
sentence such as "This folder's contents are generated by running bun
`scripts/sync-infra-docs.ts`. DO NOT EDIT." and keep the second line ("# To
change these, go to github.com/unkeyed/infra") unchanged; locate and edit the
two comment lines at the top of docs/engineering/infra/.gitignore to apply this
wording fix.

In `@docs/engineering/scripts/sync-infra-docs.ts`:
- Around line 84-86: In buildTree, remove the non-null assertion on
subdirs.get(dir)! by ensuring a typed array is retrieved or created before
pushing: after computing dir, fetch const arr = subdirs.get(dir); if arr is
undefined create a new array and set it on subdirs (subdirs.set(dir, arr)); then
call arr.push(page). This keeps types safe without using the `!` operator and
references the existing symbols buildTree, subdirs, dir, and page.

In
`@web/apps/dashboard/app/`(app)/[workspaceSlug]/projects/_components/list/project-actions.tsx:
- Around line 25-26: Remove the non-null assertion on workspace?.slug and guard
against undefined before calling getProjectActionItems: check workspace?.slug
(or workspace) and either early-return/disable the menu when it's missing or
call getProjectActionItems only when slug is defined (e.g., const slug =
workspace?.slug; if (!slug) return /* disabled menu or placeholder */; const
menuItems = getProjectActionItems(projectId, projectName, slug, router);).
Update the surrounding logic that consumes menuItems to handle the
disabled/empty case so type safety is preserved and no `!` is used.

---

Nitpick comments:
In `@docs/engineering/Dockerfile`:
- Around line 5-6: Replace the two separate RUN apk add lines (the statements
installing github-cli and make) with a single RUN that installs both packages in
one layer and uses --no-cache to avoid leaving the package index in the image;
update the lines referencing "RUN apk add github-cli" and "RUN apk add make" to
a single invocation that includes --no-cache and both package names so the image
has one layer and no cached index.

In `@svc/ctrl/integration/cleanup_test.go`:
- Around line 49-208: The test only seeds one app and one environment (calls to
h.Seed.CreateApp and h.Seed.CreateEnvironment) so it doesn't exercise fan-out
deletion across multiple apps/environments; update cleanup_test.go to create at
least a second App and a second Environment (using another h.Seed.CreateApp and
h.Seed.CreateEnvironment with distinct IDs) and assert the cleanup behavior for
both children, and refactor into separate t.Run subtests (e.g., "multiple-apps"
and "multiple-environments") to cover the multi-child paths rather than only the
single-child case.

In `@svc/ctrl/proto/hydra/v1/environment.proto`:
- Around line 16-21: The Buf linter flags
DeleteEnvironmentRequest/DeleteEnvironmentResponse for naming, but the codebase
uses the Delete{Entity}Request/Response pattern (see
DeleteProjectRequest/DeleteProjectResponse); decide which approach to take and
apply it consistently: either (A) suppress the specific Buf lint rule in your
buf configuration (e.g., update the lint ruleset to ignore the naming rule) so
DeleteEnvironmentRequest/DeleteEnvironmentResponse remain unchanged, or (B)
rename the messages to the standard form (e.g., DeleteRequest/DeleteResponse or
EnvironmentServiceDeleteRequest) across this proto and any codegen consumers,
updating all references to DeleteEnvironmentRequest/DeleteEnvironmentResponse
(including RPC signature and generated bindings) to the new names to keep
linter-compliant naming.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 1f2e5d56-d3d2-4ae0-a096-2e87ee474a7f

📥 Commits

Reviewing files that changed from the base of the PR and between fc8d44f and 9e93c42.

⛔ Files ignored due to path filters (20)
  • docs/engineering/infra/operations/infrastructure/aws/gw-custom-attributes.png is excluded by !**/*.png
  • gen/proto/ctrl/v1/BUILD.bazel is excluded by !**/gen/**
  • gen/proto/ctrl/v1/app.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • gen/proto/ctrl/v1/ctrlv1connect/BUILD.bazel is excluded by !**/gen/**
  • gen/proto/ctrl/v1/ctrlv1connect/app.connect.go is excluded by !**/gen/**
  • gen/proto/ctrl/v1/ctrlv1connect/project.connect.go is excluded by !**/gen/**
  • gen/proto/ctrl/v1/project.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • gen/proto/hydra/v1/BUILD.bazel is excluded by !**/gen/**
  • gen/proto/hydra/v1/app.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • gen/proto/hydra/v1/app_restate.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • gen/proto/hydra/v1/environment.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • gen/proto/hydra/v1/environment_restate.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • gen/proto/hydra/v1/project.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • gen/proto/hydra/v1/project_restate.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • gen/rpc/ctrl/BUILD.bazel is excluded by !**/gen/**
  • gen/rpc/ctrl/app_generated.go is excluded by !**/gen/**
  • gen/rpc/ctrl/project_generated.go is excluded by !**/gen/**
  • go.sum is excluded by !**/*.sum
  • web/apps/dashboard/gen/proto/ctrl/v1/app_pb.ts is excluded by !**/gen/**
  • web/apps/dashboard/gen/proto/ctrl/v1/project_pb.ts is excluded by !**/gen/**
📒 Files selected for processing (157)
  • docs/engineering/Dockerfile
  • docs/engineering/Makefile
  • docs/engineering/architecture/index.mdx
  • docs/engineering/architecture/rfcs/0014-sentinel-middleware.mdx
  • docs/engineering/architecture/services/api/configuration.mdx
  • docs/engineering/architecture/services/api/overview.mdx
  • docs/engineering/architecture/services/control-plane/api/configuration.mdx
  • docs/engineering/architecture/services/control-plane/worker/configuration.mdx
  • docs/engineering/architecture/services/control-plane/worker/overview.mdx
  • docs/engineering/architecture/services/control-plane/worker/workflows/certificates.mdx
  • docs/engineering/architecture/services/control-plane/worker/workflows/custom-domains.mdx
  • docs/engineering/architecture/services/control-plane/worker/workflows/deployments.mdx
  • docs/engineering/architecture/services/control-plane/worker/workflows/github-app.mdx
  • docs/engineering/architecture/services/control-plane/worker/workflows/routing.mdx
  • docs/engineering/architecture/services/frontline/configuration.mdx
  • docs/engineering/architecture/services/frontline/routing.mdx
  • docs/engineering/architecture/services/krane/configuration.mdx
  • docs/engineering/architecture/services/krane/secrets.mdx
  • docs/engineering/architecture/services/sentinel/configuration.mdx
  • docs/engineering/architecture/services/vault/auth.mdx
  • docs/engineering/architecture/services/vault/configuration.mdx
  • docs/engineering/architecture/services/vault/overview.mdx
  • docs/engineering/architecture/workflows/index.mdx
  • docs/engineering/company/index.mdx
  • docs/engineering/company/meetings.mdx
  • docs/engineering/contributing/quality/testing/http-handler-tests.mdx
  • docs/engineering/docs.json
  • docs/engineering/hosting/backup-restore.mdx
  • docs/engineering/hosting/configuration.mdx
  • docs/engineering/hosting/index.mdx
  • docs/engineering/hosting/install.mdx
  • docs/engineering/hosting/scaling.mdx
  • docs/engineering/hosting/security.mdx
  • docs/engineering/hosting/troubleshooting.mdx
  • docs/engineering/hosting/upgrade.mdx
  • docs/engineering/infra/.gitignore
  • docs/engineering/infra/index.mdx
  • docs/engineering/infra/operations/cli/deploy/index.mdx
  • docs/engineering/infra/operations/cli/healthcheck/index.mdx
  • docs/engineering/infra/operations/cli/index.mdx
  • docs/engineering/infra/operations/cli/run/api/index.mdx
  • docs/engineering/infra/operations/cli/run/ctrl/index.mdx
  • docs/engineering/infra/operations/cli/run/index.mdx
  • docs/engineering/infra/operations/cli/run/ingress/index.mdx
  • docs/engineering/infra/operations/cli/run/krane/index.mdx
  • docs/engineering/infra/operations/cli/version/get/index.mdx
  • docs/engineering/infra/operations/cli/version/index.mdx
  • docs/engineering/infra/operations/cli/version/list/index.mdx
  • docs/engineering/infra/operations/cli/version/rollback/index.mdx
  • docs/engineering/infra/operations/index.mdx
  • docs/engineering/infra/operations/infra/deployments.mdx
  • docs/engineering/infra/operations/infra/eks-cluster.mdx
  • docs/engineering/infra/operations/infra/index.mdx
  • docs/engineering/infra/operations/infrastructure/accounts.mdx
  • docs/engineering/infra/operations/infrastructure/aws/google-idp.mdx
  • docs/engineering/infra/operations/infrastructure/aws/index.mdx
  • docs/engineering/infra/operations/infrastructure/aws/user-group-management.mdx
  • docs/engineering/infra/operations/infrastructure/dev/github-app.mdx
  • docs/engineering/infra/operations/infrastructure/dev/seeding-db-and-clickhouse.mdx
  • docs/engineering/infra/operations/infrastructure/dev/workos.mdx
  • docs/engineering/infra/operations/infrastructure/index.mdx
  • docs/engineering/infra/operations/infrastructure/stripe/subscriptions.mdx
  • docs/engineering/infra/operations/reference/index.mdx
  • docs/engineering/infra/operations/runbooks/clickhouse-user-provisioning.mdx
  • docs/engineering/infra/operations/runbooks/index.mdx
  • docs/engineering/infra/operations/runbooks/services/agent/deployment-issues.mdx
  • docs/engineering/infra/operations/verification-gaps.md
  • docs/engineering/scripts/sync-infra-docs.ts
  • docs/engineering/style.css
  • go.mod
  • pkg/db/BUILD.bazel
  • pkg/db/app_build_settings_delete_by_environment.sql_generated.go
  • pkg/db/app_delete_by_id.sql_generated.go
  • pkg/db/app_env_var_delete_by_environment.sql_generated.go
  • pkg/db/app_env_var_insert.sql_generated.go
  • pkg/db/app_list_ids_by_project.sql_generated.go
  • pkg/db/app_regional_settings_delete_by_environment.sql_generated.go
  • pkg/db/app_runtime_settings_delete_by_environment.sql_generated.go
  • pkg/db/bulk_app_env_var_insert.sql_generated.go
  • pkg/db/cilium_network_policy_delete_by_environment.sql_generated.go
  • pkg/db/custom_domain_delete_by_environment.sql_generated.go
  • pkg/db/custom_domain_delete_by_project.sql_generated.go
  • pkg/db/deployment_delete_by_environment.sql_generated.go
  • pkg/db/deployment_step_delete_by_environment.sql_generated.go
  • pkg/db/deployment_topology_delete_by_environment.sql_generated.go
  • pkg/db/environment_delete_by_id.sql_generated.go
  • pkg/db/environment_list_ids_by_app.sql_generated.go
  • pkg/db/frontline_route_delete_by_environment.sql_generated.go
  • pkg/db/frontline_route_delete_by_project.sql_generated.go
  • pkg/db/github_repo_connection_delete_by_app.sql_generated.go
  • pkg/db/github_repo_connection_delete_by_project.sql_generated.go
  • pkg/db/instance_count_by_app.sql_generated.go
  • pkg/db/project_delete_by_id.sql_generated.go
  • pkg/db/querier_bulk_generated.go
  • pkg/db/querier_generated.go
  • pkg/db/queries/app_build_settings_delete_by_environment.sql
  • pkg/db/queries/app_delete_by_id.sql
  • pkg/db/queries/app_env_var_delete_by_environment.sql
  • pkg/db/queries/app_env_var_insert.sql
  • pkg/db/queries/app_list_ids_by_project.sql
  • pkg/db/queries/app_regional_settings_delete_by_environment.sql
  • pkg/db/queries/app_runtime_settings_delete_by_environment.sql
  • pkg/db/queries/cilium_network_policy_delete_by_environment.sql
  • pkg/db/queries/custom_domain_delete_by_environment.sql
  • pkg/db/queries/custom_domain_delete_by_project.sql
  • pkg/db/queries/deployment_delete_by_environment.sql
  • pkg/db/queries/deployment_step_delete_by_environment.sql
  • pkg/db/queries/deployment_topology_delete_by_environment.sql
  • pkg/db/queries/environment_delete_by_id.sql
  • pkg/db/queries/environment_list_ids_by_app.sql
  • pkg/db/queries/frontline_route_delete_by_environment.sql
  • pkg/db/queries/frontline_route_delete_by_project.sql
  • pkg/db/queries/github_repo_connection_delete_by_app.sql
  • pkg/db/queries/github_repo_connection_delete_by_project.sql
  • pkg/db/queries/instance_count_by_app.sql
  • pkg/db/queries/project_delete_by_id.sql
  • pkg/db/queries/sentinel_count_by_app.sql
  • pkg/db/queries/sentinel_count_by_project.sql
  • pkg/db/queries/sentinel_delete_by_environment.sql
  • pkg/db/queries/sentinel_delete_by_project.sql
  • pkg/db/sentinel_count_by_app.sql_generated.go
  • pkg/db/sentinel_count_by_project.sql_generated.go
  • pkg/db/sentinel_delete_by_environment.sql_generated.go
  • pkg/db/sentinel_delete_by_project.sql_generated.go
  • svc/ctrl/api/BUILD.bazel
  • svc/ctrl/api/run.go
  • svc/ctrl/integration/cleanup_test.go
  • svc/ctrl/proto/ctrl/v1/app.proto
  • svc/ctrl/proto/ctrl/v1/project.proto
  • svc/ctrl/proto/hydra/v1/app.proto
  • svc/ctrl/proto/hydra/v1/environment.proto
  • svc/ctrl/proto/hydra/v1/project.proto
  • svc/ctrl/services/app/BUILD.bazel
  • svc/ctrl/services/app/create_app.go
  • svc/ctrl/services/app/delete_app.go
  • svc/ctrl/services/app/service.go
  • svc/ctrl/services/project/BUILD.bazel
  • svc/ctrl/services/project/create_project.go
  • svc/ctrl/services/project/delete_project.go
  • svc/ctrl/services/project/service.go
  • svc/ctrl/worker/BUILD.bazel
  • svc/ctrl/worker/app/BUILD.bazel
  • svc/ctrl/worker/app/delete_handler.go
  • svc/ctrl/worker/app/service.go
  • svc/ctrl/worker/environment/BUILD.bazel
  • svc/ctrl/worker/environment/delete_handler.go
  • svc/ctrl/worker/environment/service.go
  • svc/ctrl/worker/project/BUILD.bazel
  • svc/ctrl/worker/project/delete_handler.go
  • svc/ctrl/worker/project/service.go
  • svc/ctrl/worker/run.go
  • web/apps/dashboard/app/(app)/[workspaceSlug]/projects/_components/list/index.tsx
  • web/apps/dashboard/app/(app)/[workspaceSlug]/projects/_components/list/project-actions.tsx
  • web/apps/dashboard/lib/collections/deploy/environment-settings.ts
  • web/apps/dashboard/lib/trpc/routers/ctrl.ts
  • web/apps/dashboard/lib/trpc/routers/deploy/project/create.ts
  • web/apps/dashboard/lib/trpc/routers/deploy/project/delete.ts
💤 Files with no reviewable changes (41)
  • docs/engineering/infra/operations/cli/index.mdx
  • docs/engineering/infra/operations/infrastructure/index.mdx
  • docs/engineering/infra/operations/cli/run/krane/index.mdx
  • docs/engineering/infra/operations/infrastructure/accounts.mdx
  • docs/engineering/hosting/index.mdx
  • docs/engineering/infra/operations/cli/version/list/index.mdx
  • docs/engineering/infra/operations/infrastructure/aws/user-group-management.mdx
  • docs/engineering/infra/operations/cli/deploy/index.mdx
  • docs/engineering/infra/operations/infrastructure/dev/github-app.mdx
  • docs/engineering/infra/operations/cli/run/api/index.mdx
  • docs/engineering/hosting/security.mdx
  • docs/engineering/infra/operations/cli/run/index.mdx
  • docs/engineering/infra/operations/infra/index.mdx
  • docs/engineering/infra/operations/cli/run/ctrl/index.mdx
  • docs/engineering/infra/operations/infrastructure/dev/workos.mdx
  • docs/engineering/hosting/scaling.mdx
  • docs/engineering/hosting/backup-restore.mdx
  • docs/engineering/infra/operations/infra/deployments.mdx
  • docs/engineering/infra/operations/runbooks/index.mdx
  • docs/engineering/infra/operations/cli/healthcheck/index.mdx
  • docs/engineering/infra/index.mdx
  • docs/engineering/hosting/configuration.mdx
  • docs/engineering/infra/operations/runbooks/services/agent/deployment-issues.mdx
  • docs/engineering/infra/operations/runbooks/clickhouse-user-provisioning.mdx
  • docs/engineering/infra/operations/cli/version/get/index.mdx
  • docs/engineering/infra/operations/cli/run/ingress/index.mdx
  • docs/engineering/infra/operations/index.mdx
  • docs/engineering/infra/operations/infra/eks-cluster.mdx
  • docs/engineering/infra/operations/cli/version/index.mdx
  • docs/engineering/hosting/troubleshooting.mdx
  • docs/engineering/infra/operations/verification-gaps.md
  • docs/engineering/infra/operations/cli/version/rollback/index.mdx
  • docs/engineering/infra/operations/infrastructure/aws/google-idp.mdx
  • docs/engineering/infra/operations/reference/index.mdx
  • docs/engineering/infra/operations/infrastructure/dev/seeding-db-and-clickhouse.mdx
  • docs/engineering/architecture/index.mdx
  • docs/engineering/hosting/upgrade.mdx
  • docs/engineering/infra/operations/infrastructure/stripe/subscriptions.mdx
  • docs/engineering/architecture/workflows/index.mdx
  • docs/engineering/infra/operations/infrastructure/aws/index.mdx
  • docs/engineering/hosting/install.mdx

Comment on lines +1 to +12
.PHONY: sync dev lint

sync:
@bun run scripts/sync-infra-docs.ts

dev: sync
bunx mintlify@latest dev

lint: sync
bunx mintlify@latest broken-links
bunx mintlify@latest validate
bunx mintlify@latest a11y
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add a fmt target or stop calling it from CI.

autofix.ci is already failing on make fmt, but this Makefile only defines sync, dev, and lint. The target names and workflow need to match before this goes green.

🧰 Tools
🪛 checkmake (0.2.2)

[warning] 1-1: Missing required phony target "all"

(minphony)


[warning] 1-1: Missing required phony target "clean"

(minphony)


[warning] 1-1: Missing required phony target "test"

(minphony)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/engineering/Makefile` around lines 1 - 12, The Makefile is missing a fmt
target while CI/autofix.ci invokes make fmt; either add a fmt target to the
Makefile or remove/adjust the CI job to stop calling it. To fix, add a new
target named fmt (alongside sync, dev, lint) that runs the repository’s
formatting command used by CI (or mirror whatever formatter command CI expects),
or alternatively update the CI workflow to call one of the existing targets
(e.g., lint) instead of make fmt; reference the Makefile targets sync, dev, lint
and the missing fmt target when making the change.

const REPO = "unkeyed/infra";
const SRC_DIR = "docs";
const TAB_NAME = "Infra";
const GROUP_NAME = "Infra";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

GROUP_NAME no longer matches the Infra section being updated.

docs.json now labels that group Infra Repo, so this lookup never matches and each sync appends a second generated group instead of replacing the existing one.

🛠️ Minimal fix
-const GROUP_NAME = "Infra";
+const GROUP_NAME = "Infra Repo";

Also applies to: 120-125

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/engineering/scripts/sync-infra-docs.ts` at line 12, The GROUP_NAME
constant is outdated and doesn't match docs.json ("Infra Repo"), causing lookups
to fail and a duplicate generated group to be appended; update the GROUP_NAME
value from "Infra" to "Infra Repo" (and update any other uses of GROUP_NAME in
the same file, e.g., the lookup/replacement logic around the block that
currently references GROUP_NAME at lines ~120-125) so the script matches and
replaces the existing group instead of creating a second one.

Comment on lines +1 to +2
-- name: DeleteDeploymentsByEnvironmentId :exec
DELETE FROM deployments WHERE environment_id = sqlc.arg(environment_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Call sites for DeleteDeploymentsByEnvironmentId:"
rg -n -C3 --glob '*.go' '\bDeleteDeploymentsByEnvironmentId\s*\('

echo
echo "Queries touching instances:"
rg -n -C3 --glob 'pkg/db/queries/*.sql' 'DELETE FROM instances|FROM instances'

echo
echo "Schema references from deployment_id to deployments:"
rg -n -C2 'FOREIGN KEY.*deployment_id|REFERENCES deployments'

Repository: unkeyed/unkey

Length of output: 4213


🏁 Script executed:

# Get the full context of the delete_handler.go to see the complete deletion sequence
head -n 100 svc/ctrl/worker/environment/delete_handler.go | tail -n 30

Repository: unkeyed/unkey

Length of output: 1284


🏁 Script executed:

# Find schema or migration files that define the FK constraints
fd -e sql -e go 'schema|migration|ddl' | head -20

Repository: unkeyed/unkey

Length of output: 758


🏁 Script executed:

# Look for the table definition or schema constraints for deployments and instances
rg -n 'CREATE TABLE.*instances|CREATE TABLE.*deployments' --type sql

Repository: unkeyed/unkey

Length of output: 353


🏁 Script executed:

# Check if there's a delete sequence for environments that may handle cascading
rg -n -A 20 'delete_handler' svc/ctrl/worker/environment/delete_handler.go | head -50

Repository: unkeyed/unkey

Length of output: 39


🏁 Script executed:

# Get the instances table definition and FK constraints
sed -n '629,700p' pkg/db/schema.sql

Repository: unkeyed/unkey

Length of output: 2722


🏁 Script executed:

# Also check the deployments table for any related constraints
sed -n '484,550p' pkg/db/schema.sql

Repository: unkeyed/unkey

Length of output: 2713


🏁 Script executed:

# Search for any DELETE on instances before deployments in the handler
rg -n 'DeleteInstance|delete instance' svc/ctrl/worker/environment/delete_handler.go

Repository: unkeyed/unkey

Length of output: 39


🏁 Script executed:

# Search for any instance deletion queries
rg -n 'DELETE.*instances|DeleteInstance' pkg/db/queries/ --type sql

Repository: unkeyed/unkey

Length of output: 326


🏁 Script executed:

# Check how many queries reference instances
rg -c 'instances' pkg/db/queries/ --type sql | grep -v ':0$'

Repository: unkeyed/unkey

Length of output: 390


🏁 Script executed:

# Look for any environment-based instance deletions
rg -n 'instance.*environment|environment.*instance' pkg/db/queries/ --type sql

Repository: unkeyed/unkey

Length of output: 39


Delete instances before deleting deployments to prevent orphaned data.

instances reference deployments via deployment_id, but the schema has no foreign key constraint. Deleting deployments before instances will leave orphaned instance rows with dangling deployment_id references. The codebase has a DeleteDeploymentInstances query; call it before DeleteDeploymentsByEnvironmentId in the environment deletion handler.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/db/queries/deployment_delete_by_environment.sql` around lines 1 - 2, When
deleting an environment, ensure you remove dependent instances first to avoid
orphaned rows: call the DeleteDeploymentInstances query (the generated function
for the instances deletion) before calling DeleteDeploymentsByEnvironmentId in
the environment deletion handler (where the environment delete flow currently
invokes DeleteDeploymentsByEnvironmentId); update that handler to first execute
DeleteDeploymentInstances filtered by the same environment/deployment
identifiers, then execute DeleteDeploymentsByEnvironmentId, preserving
transactional ordering so both ops run in the same transaction.

Comment on lines +258 to +263
func countRows(t *testing.T, ctx context.Context, database db.Database, query string, args ...any) int {
t.Helper()
var count int
err := database.RO().QueryRowContext(ctx, query, args...).Scan(&count)
require.NoError(t, err)
return count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -nP 'func countRows\(|countRows\(t,\s*ctx,\s*h\.DB,\s*c\.query,\s*c\.arg\)' svc/ctrl/integration/cleanup_test.go

Repository: unkeyed/unkey

Length of output: 291


Replace variadic any with single string parameter.

The function takes one string argument on all call sites (lines 235, 250), so args ...any weakens type safety unnecessarily. Refactor to arg string to match the coding guideline: "Never compromise type safety: No any, no ! (non-null assertion), no as Type."

Suggested change
-func countRows(t *testing.T, ctx context.Context, database db.Database, query string, args ...any) int {
+func countRows(t *testing.T, ctx context.Context, database db.Database, query string, arg string) int {
 	t.Helper()
 	var count int
-	err := database.RO().QueryRowContext(ctx, query, args...).Scan(&count)
+	err := database.RO().QueryRowContext(ctx, query, arg).Scan(&count)
 	require.NoError(t, err)
 	return count
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@svc/ctrl/integration/cleanup_test.go` around lines 258 - 263, The countRows
helper currently accepts a variadic args ...any which weakens type safety;
change its signature to accept a single arg string (func countRows(t *testing.T,
ctx context.Context, database db.Database, query string, arg string) int),
update the QueryRowContext call to use arg (not args...), and update the two
call sites that pass a single string (lines referenced) to pass that string as
the new arg parameter; keep the require.NoError check and return unchanged.

Comment on lines +16 to +21
rpc Delete(DeleteAppRequest) returns (DeleteAppResponse) {}
}

message DeleteAppRequest {}

message DeleteAppResponse {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Rename the RPC messages to pass Buf lint.

Buf already flags Line 16: DeleteAppRequest / DeleteAppResponse violate RPC_*_STANDARD_NAME, so proto checks stay red until these are renamed. Prefer service-scoped names here to avoid package collisions.

🔧 Suggested rename
-  rpc Delete(DeleteAppRequest) returns (DeleteAppResponse) {}
+  rpc Delete(AppServiceDeleteRequest) returns (AppServiceDeleteResponse) {}
 }
 
-message DeleteAppRequest {}
+message AppServiceDeleteRequest {}
 
-message DeleteAppResponse {}
+message AppServiceDeleteResponse {}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
rpc Delete(DeleteAppRequest) returns (DeleteAppResponse) {}
}
message DeleteAppRequest {}
message DeleteAppResponse {}
rpc Delete(AppServiceDeleteRequest) returns (AppServiceDeleteResponse) {}
}
message AppServiceDeleteRequest {}
message AppServiceDeleteResponse {}
🧰 Tools
🪛 Buf (1.66.1)

[error] 16-16: RPC request type "DeleteAppRequest" should be named "DeleteRequest" or "AppServiceDeleteRequest".

(RPC_REQUEST_STANDARD_NAME)


[error] 16-16: RPC response type "DeleteAppResponse" should be named "DeleteResponse" or "AppServiceDeleteResponse".

(RPC_RESPONSE_STANDARD_NAME)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@svc/ctrl/proto/hydra/v1/app.proto` around lines 16 - 21, Rename the RPC
request/response message types to service-scoped names to satisfy Buf lint:
change the messages currently named DeleteAppRequest and DeleteAppResponse and
update the rpc Delete signature to use the new names (e.g., AppDeleteRequest and
AppDeleteResponse or another App-prefixed pair); ensure you update all
references to DeleteAppRequest/DeleteAppResponse to the new symbol names (rpc
Delete, message DeleteAppRequest, message DeleteAppResponse) so the proto
compiles and the RPC_*_STANDARD_NAME lint passes.

Comment on lines +31 to +36
for _, envID := range envIDs {
logger.Info("deleting environment", "app_id", appID, "environment_id", envID)

envClient := hydrav1.NewEnvironmentServiceClient(ctx, envID)
envClient.Delete().Send(&hydrav1.DeleteEnvironmentRequest{})
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how other workers handle Send() calls in similar patterns
rg -n "\.Send\(" svc/ctrl/worker/ -A2 -B2

Repository: unkeyed/unkey

Length of output: 5158


🏁 Script executed:

#!/bin/bash
# Read the delete_handler.go file to see full context and comments
cat -n svc/ctrl/worker/app/delete_handler.go

Repository: unkeyed/unkey

Length of output: 2156


🏁 Script executed:

#!/bin/bash
# Read the certificate renew_handler.go around the Send() call to see how error is handled
sed -n '50,65p' svc/ctrl/worker/certificate/renew_handler.go | cat -n

Repository: unkeyed/unkey

Length of output: 673


🏁 Script executed:

#!/bin/bash
# Read the bootstrap_infra_certs.go around Send() to see error handling pattern
sed -n '145,160p' svc/ctrl/worker/certificate/bootstrap_infra_certs.go | cat -n

Repository: unkeyed/unkey

Length of output: 733


🏁 Script executed:

#!/bin/bash
# Check if there are any comments in project/delete_handler.go (similar pattern)
cat -n svc/ctrl/worker/project/delete_handler.go

Repository: unkeyed/unkey

Length of output: 1817


Capture and log errors from Send() calls to match established patterns elsewhere.

Line 35 ignores the response and error from envClient.Delete().Send() without justification. Other workers in the codebase (certificate renewal, infrastructure bootstrap) that call Send() properly capture and log errors, even for async fire-and-forget operations. Missing error handling here could silently fail environment deletion while proceeding with app cleanup, leaving orphaned resources. Capture the error and log at minimum:

if err := envClient.Delete().Send(&hydrav1.DeleteEnvironmentRequest{}); err != nil {
    logger.Warn("failed to trigger environment deletion", "app_id", appID, "environment_id", envID, "error", err)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@svc/ctrl/worker/app/delete_handler.go` around lines 31 - 36, The loop calling
hydrav1.NewEnvironmentServiceClient(ctx, envID) currently ignores the return
from envClient.Delete().Send(); modify the deletion call in the delete handler
so it captures the error return and logs failures (e.g., use
envClient.Delete().Send(...) error check) and emit a logger.Warn including
app_id, environment_id and the error; follow the same pattern used in other
workers (certificate renewal, infra bootstrap) to avoid silently failing
environment deletions.

Comment on lines +23 to +93
if err := restate.RunVoid(ctx, func(runCtx restate.RunContext) error {
return db.Query.DeleteCiliumNetworkPoliciesByEnvironmentId(runCtx, s.db.RW(), envID)
}, restate.WithName("delete network policies")); err != nil {
return nil, fmt.Errorf("delete network policies: %w", err)
}

if err := restate.RunVoid(ctx, func(runCtx restate.RunContext) error {
return db.Query.DeleteSentinelsByEnvironmentId(runCtx, s.db.RW(), envID)
}, restate.WithName("delete sentinels")); err != nil {
return nil, fmt.Errorf("delete sentinels: %w", err)
}

if err := restate.RunVoid(ctx, func(runCtx restate.RunContext) error {
return db.Query.DeleteCustomDomainsByEnvironmentId(runCtx, s.db.RW(), envID)
}, restate.WithName("delete custom domains")); err != nil {
return nil, fmt.Errorf("delete custom domains: %w", err)
}

if err := restate.RunVoid(ctx, func(runCtx restate.RunContext) error {
return db.Query.DeleteFrontlineRoutesByEnvironmentId(runCtx, s.db.RW(), envID)
}, restate.WithName("delete frontline routes")); err != nil {
return nil, fmt.Errorf("delete frontline routes: %w", err)
}

if err := restate.RunVoid(ctx, func(runCtx restate.RunContext) error {
return db.Query.DeleteAppEnvVarsByEnvironmentId(runCtx, s.db.RW(), envID)
}, restate.WithName("delete env vars")); err != nil {
return nil, fmt.Errorf("delete env vars: %w", err)
}

if err := restate.RunVoid(ctx, func(runCtx restate.RunContext) error {
return db.Query.DeleteAppRegionalSettingsByEnvironmentId(runCtx, s.db.RW(), envID)
}, restate.WithName("delete regional settings")); err != nil {
return nil, fmt.Errorf("delete regional settings: %w", err)
}

if err := restate.RunVoid(ctx, func(runCtx restate.RunContext) error {
return db.Query.DeleteAppBuildSettingsByEnvironmentId(runCtx, s.db.RW(), envID)
}, restate.WithName("delete build settings")); err != nil {
return nil, fmt.Errorf("delete build settings: %w", err)
}

if err := restate.RunVoid(ctx, func(runCtx restate.RunContext) error {
return db.Query.DeleteAppRuntimeSettingsByEnvironmentId(runCtx, s.db.RW(), envID)
}, restate.WithName("delete runtime settings")); err != nil {
return nil, fmt.Errorf("delete runtime settings: %w", err)
}

if err := restate.RunVoid(ctx, func(runCtx restate.RunContext) error {
return db.Query.DeleteDeploymentStepsByEnvironmentId(runCtx, s.db.RW(), envID)
}, restate.WithName("delete deployment steps")); err != nil {
return nil, fmt.Errorf("delete deployment steps: %w", err)
}

if err := restate.RunVoid(ctx, func(runCtx restate.RunContext) error {
return db.Query.DeleteDeploymentTopologiesByEnvironmentId(runCtx, s.db.RW(), envID)
}, restate.WithName("delete deployment topologies")); err != nil {
return nil, fmt.Errorf("delete deployment topologies: %w", err)
}

if err := restate.RunVoid(ctx, func(runCtx restate.RunContext) error {
return db.Query.DeleteDeploymentsByEnvironmentId(runCtx, s.db.RW(), envID)
}, restate.WithName("delete deployments")); err != nil {
return nil, fmt.Errorf("delete deployments: %w", err)
}

if err := restate.RunVoid(ctx, func(runCtx restate.RunContext) error {
return db.Query.DeleteEnvironmentById(runCtx, s.db.RW(), envID)
}, restate.WithName("delete environment")); err != nil {
return nil, fmt.Errorf("delete environment: %w", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

For github.com/restatedev/sdk-go, if several RunVoidblocks in one handler succeed and a later one fails, are the earlierRunVoid blocks re-executed on retry/replay? What pattern does Restate recommend for multi-table delete workflows that must not miss rows inserted concurrently during the delete?

💡 Result:

1) Will earlier RunVoid blocks be re-executed on retry/replay?

Normally, no. In Restate, a Run/RunVoid step durably records its final outcome (success or terminal error) in the invocation journal; on replay, Restate reuses that recorded outcome instead of re-running the step. (pkg.go.dev)

Edge case: Restate documents a small window where the most recent run action may be re-executed if the process fails after the action completed but before its result was durably persisted; Restate also guarantees it won’t start a later run while the previous one isn’t yet durable—so at worst you risk re-execution of only the latest step, not earlier ones. (restatedev.github.io)

So if you have multiple RunVoid steps and a later one fails, the earlier successful RunVoids are not re-executed on replay, except for that documented “small window” risk affecting only the latest step at the time of failure. (restatedev.github.io)


2) Recommended pattern for “multi-table delete” that must not miss concurrently inserted rows

Restate’s recommended shape for correctness here is to serialize all mutations for the same logical entity/key through an exclusive Virtual Object (or a Workflow keyed by that entity), so inserts and the delete workflow can’t interleave unpredictably. Exclusive handlers for Virtual Objects / Workflows run one-at-a-time per key. (docs.restate.dev)

A common pattern is:

  1. Route all writers through the same Virtual Object key (e.g., Account/{accountId}), not directly to the DB.
  2. In the delete handler:
    • Set a “deleting/tombstone” flag in object state first (durably), then
    • Perform the multi-table delete (often in chunks/pages) via RunVoid steps,
    • Finally mark “deleted” (or clear state).
  3. In every insert/update handler:
    • Check the flag; if “deleting/deleted”, reject the write or treat it as a no-op / compensating cleanup.

Because calls to that object are queued and processed serially, no concurrent insert can slip in “during” the delete (it will be processed either before deletion starts or after the tombstone is set, depending on arrival order). (docs.restate.dev)

Citations:


Mark environment as deleting before cascading deletes.

Each RunVoid is independently durable; earlier successful steps won't replay if a later one fails. Concurrent inserts landing between steps can leave rows that block DeleteEnvironmentById. Set a "deleting" flag atomically before the first child delete, then reject new writes for that environment. This serializes the delete workflow and prevents interleaving.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@svc/ctrl/worker/environment/delete_handler.go` around lines 23 - 93, The
delete sequence runs many independent restate.RunVoid steps (e.g.,
DeleteCiliumNetworkPoliciesByEnvironmentId, DeleteSentinelsByEnvironmentId, ...
DeleteEnvironmentById) which allows concurrent writers to insert new rows
between steps; add an initial atomic step to mark the environment as "deleting"
(e.g., call an Update/SetDeleting method on the environment record using envID
within a single restate.Run or a DB transaction) before any Delete* calls, and
ensure all write paths (inserts/updates that accept envID) check that "deleting"
is set and reject new writes; this serializes the cascade and prevents new rows
from blocking DeleteEnvironmentById.

Comment on lines +34 to +40
appClient := hydrav1.NewAppServiceClient(ctx, appID)
appClient.Delete().Send(&hydrav1.DeleteAppRequest{})
}

if err := restate.RunVoid(ctx, func(runCtx restate.RunContext) error {
return db.Query.DeleteProjectById(runCtx, s.db.RW(), projectID)
}, restate.WithName("delete project")); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="$(fd '^app\.pb\.go$' gen/proto/hydra/v1 | head -n1)"

echo "== Generated App client API =="
rg -n -C3 'type AppServiceClient|func NewAppServiceClient|func \(.*\) Delete\(|func \(.*\) Send\(|func \(.*\) Await\(|func \(.*\) Response\(' "$FILE"

echo
echo "== Current call site =="
rg -n -C2 'NewAppServiceClient|Delete\(\)\.Send' svc/ctrl/worker/project/delete_handler.go

Repository: unkeyed/unkey

Length of output: 87


🏁 Script executed:

# Find and read the delete_handler.go file to see full context
fd 'delete_handler\.go' -type f | head -5

Repository: unkeyed/unkey

Length of output: 227


🏁 Script executed:

# Search for proto definitions related to AppService
fd '.*\.proto' | xargs rg 'AppService|DeleteAppRequest' 2>/dev/null | head -20

Repository: unkeyed/unkey

Length of output: 552


🏁 Script executed:

# Look for similar Send() patterns in the codebase to see if results are checked
rg 'Delete\(\)\.Send\(|\.Send\(' svc/ctrl/worker --type go -B2 -A2 | head -40

Repository: unkeyed/unkey

Length of output: 3005


🏁 Script executed:

# Read the full delete_handler.go file to understand context
cat -n svc/ctrl/worker/project/delete_handler.go

Repository: unkeyed/unkey

Length of output: 1817


🏁 Script executed:

# Find Restate framework usage and client patterns
rg 'NewAppServiceClient|NewDeployServiceClient' --type-add 'proto:*.proto' -g '*.proto' -A5 -B2 | head -50

Repository: unkeyed/unkey

Length of output: 39


🏁 Script executed:

# Check if there's documentation or examples about Send() behavior
fd 'doc.go|README|example' svc/ctrl/worker -x cat {} \; | head -100

Repository: unkeyed/unkey

Length of output: 5398


🏁 Script executed:

# Search for Restate SDK usage patterns and documentation
rg -i 'restate.*client|Send\(\)|NewAppServiceClient' svc/ctrl/worker --context 5 | head -80

Repository: unkeyed/unkey

Length of output: 4786


🏁 Script executed:

# Check if there are tests that show the expected pattern
fd 'test\.go' svc/ctrl/worker -x grep -l 'Delete\|Send' {} \;

Repository: unkeyed/unkey

Length of output: 176


🏁 Script executed:

# Look for any comments or patterns around service client usage
rg 'virtual object|fire.and.forget|async' svc/ctrl/worker -i | head -20

Repository: unkeyed/unkey

Length of output: 2325


🏁 Script executed:

# Search the entire codebase for how Delete().Send() is typically handled
rg 'Delete\(\)\.Send' --type-add 'go:*.go' -g '*.go' -B5 -A5

Repository: unkeyed/unkey

Length of output: 3651


🏁 Script executed:

# Find and examine the app service client definition and Send() signature
rg 'type AppServiceClient|type AppServiceIngressClient|func.*Send\(' gen/proto/hydra/v1 -A3 2>/dev/null | head -50

Repository: unkeyed/unkey

Length of output: 743


🏁 Script executed:

# Look for the generated proto code location
fd '\.pb\.go$' | grep -i 'hydra\|proto' | head -10

Repository: unkeyed/unkey

Length of output: 392


🏁 Script executed:

# Search for how Restate client generation works
rg 'AppServiceClient.*Send|type.*Client.*Send' -A2 | head -40

Repository: unkeyed/unkey

Length of output: 39


🏁 Script executed:

# Check if there's a comment or pattern explaining why errors are ignored
rg 'fire.and.forget|async|background|schedule' svc/ctrl/worker/project -i -B2 -A2

Repository: unkeyed/unkey

Length of output: 39


🏁 Script executed:

# Get the full signature of the Client type and Send method from Restate SDK
rg 'sdk_go.Client\[|ingress.Requester\[' gen/proto/hydra/v1/app_restate.pb.go -B2 -A10

Repository: unkeyed/unkey

Length of output: 1933


🏁 Script executed:

# Look for any Restate SDK types or Send() implementation in the codebase
rg 'func.*Send\(' --type-add 'go:*.go' -g '*.go' | grep -i 'restate\|sdk' | head -20

Repository: unkeyed/unkey

Length of output: 39


🏁 Script executed:

# Check the Restate SDK import and find documentation or type signatures
rg 'restatedev/sdk-go|sdk_go|Client\[.*Send' --type-add 'go:*.go' -g '*.go' -B2 -A5 | head -60

Repository: unkeyed/unkey

Length of output: 4278


🏁 Script executed:

# Examine one of the actual generated files to see the full Send() method signature
sed -n '1,200p' gen/proto/hydra/v1/app_restate.pb.go | tail -100

Repository: unkeyed/unkey

Length of output: 3863


🏁 Script executed:

# Search for any Send() calls with error checking in the worker handlers
rg 'Send\(' svc/ctrl/worker --type-add 'go:*.go' -g '*.go' -B2 -A3 | grep -A5 'Send.*err'

Repository: unkeyed/unkey

Length of output: 39


🏁 Script executed:

# Look for Restate SDK Client type definition in imports or usage
rg 'sdk_go\.Client|\.Send\(.*\)' svc/ctrl/worker -A1 -B1 | head -60

Repository: unkeyed/unkey

Length of output: 998


🏁 Script executed:

# Check Restate SDK documentation or examples in go.mod
cat go.mod | grep -i restate

Repository: unkeyed/unkey

Length of output: 146


🏁 Script executed:

# Search for any comments or patterns about Send() being async/fire-and-forget
rg 'Send.*fire|Send.*async|Send.*schedule|fire.and.forget' svc/ctrl/worker -i

Repository: unkeyed/unkey

Length of output: 680


🏁 Script executed:

# Look at the renew_handler context for fire-and-forget pattern
sed -n '1,50p' svc/ctrl/worker/certificate/renew_handler.go

Repository: unkeyed/unkey

Length of output: 1834


🏁 Script executed:

# Search for any error handling related to Send() or virtual object calls
rg 'Send.*error|error.*Send|err.*Send' svc/ctrl/worker -B2 -A2 | head -50

Repository: unkeyed/unkey

Length of output: 537


🏁 Script executed:

# Check what sdk_go.Client provides - look at how the returned Client is used
rg 'Delete\(\)\.Send|Delete\(\)\.' svc/ctrl/worker/project -B1 -A3

Repository: unkeyed/unkey

Length of output: 452


🏁 Script executed:

# Look for Restate SDK usage patterns in the codebase for Send()
rg 'sdk_go\.Client|Requester.*Send' -A5 | head -80

Repository: unkeyed/unkey

Length of output: 6937


🏁 Script executed:

# Look at how sdk_go.Client is actually used - find any Send() calls that check the result
rg 'Client\[.*\]\.Send|sdk_go.*Send' gen/proto/hydra/v1 -A2 -B2 | head -60

Repository: unkeyed/unkey

Length of output: 39


🏁 Script executed:

# Check if there are any tests that show Send() usage
fd '_test\.go$' gen/proto -x cat {} \; | grep -A5 'Send' | head -40

Repository: unkeyed/unkey

Length of output: 39


🏁 Script executed:

# Look for the actual Send method implementation or type definition
rg 'func.*Send\(|type.*Client.*interface' gen/proto/hydra/v1/*restate.pb.go | head -20

Repository: unkeyed/unkey

Length of output: 1822


🏁 Script executed:

# Check how certificate/renew_handler uses the Send() call around line 47+
sed -n '40,80p' svc/ctrl/worker/certificate/renew_handler.go

Repository: unkeyed/unkey

Length of output: 1197


Check the app delete result before deleting the project.

Line 35 drops the error from Send(). If the app deletion request fails to schedule (network error, serialization failure, etc.), the project record will still be deleted, leaving orphaned apps in the database. Compare with certificate/renew_handler.go which checks sendErr and logs failures—follow that pattern here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@svc/ctrl/worker/project/delete_handler.go` around lines 34 - 40, The call to
appClient.Delete().Send(&hydrav1.DeleteAppRequest{}) ignores the returned error,
so detect and handle the send error before proceeding to restate.RunVoid;
capture the result and error from appClient.Delete().Send (as in
certificate/renew_handler.go's sendErr pattern), log the failure via the
appropriate logger and return the error to abort further work so that
DeleteProjectById is not executed if the app delete failed; locate
NewAppServiceClient, the Delete().Send call, and the restate.RunVoid block to
implement this error check and early return using projectID and s.db.RW().

Comment on lines +129 to +135
regions: regional
.filter((r) => r.region !== null)
.map((r) => ({
id: r.region!.id,
name: r.region!.name,
replicas: r.replicas,
})),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify there are no remaining non-null assertions in this file
rg -n '\!\.' web/apps/dashboard/lib/collections/deploy/environment-settings.ts

Repository: unkeyed/unkey

Length of output: 1772


🏁 Script executed:

#!/bin/bash
# Verify there are no remaining non-null assertions in this file
rg -n '\!\.' web/apps/dashboard/lib/collections/deploy/environment-settings.ts

Repository: unkeyed/unkey

Length of output: 127


🏁 Script executed:

#!/bin/bash
# Verify there are no remaining non-null assertions in this file
rg -n '!\.' web/apps/dashboard/lib/collections/deploy/environment-settings.ts || echo "No matches found"

Repository: unkeyed/unkey

Length of output: 120


🏁 Script executed:

#!/bin/bash
# Check for other non-null assertion patterns (including standalone !)
rg -n '!' web/apps/dashboard/lib/collections/deploy/environment-settings.ts | head -20

Repository: unkeyed/unkey

Length of output: 641


🏁 Script executed:

#!/bin/bash
# View the broader context around the problematic lines
sed -n '125,140p' web/apps/dashboard/lib/collections/deploy/environment-settings.ts | cat -n

Repository: unkeyed/unkey

Length of output: 608


Remove non-null assertions in region mapping.

Lines 132–133 use !, which violates the TypeScript safety guideline ("Never compromise type safety: No any, no ! (non-null assertion), no as Type") and the active lint check. The .filter((r) => r.region !== null) on line 130 already guarantees r.region is non-null, making the ! redundant.

Use flatMap to narrow the type safely without assertions:

Suggested fix
-    regions: regional
-      .filter((r) => r.region !== null)
-      .map((r) => ({
-        id: r.region!.id,
-        name: r.region!.name,
-        replicas: r.replicas,
-      })),
+    regions: regional.flatMap((r) =>
+      r.region
+        ? [{ id: r.region.id, name: r.region.name, replicas: r.replicas }]
+        : [],
+    ),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
regions: regional
.filter((r) => r.region !== null)
.map((r) => ({
id: r.region!.id,
name: r.region!.name,
replicas: r.replicas,
})),
regions: regional.flatMap((r) =>
r.region
? [{ id: r.region.id, name: r.region.name, replicas: r.replicas }]
: [],
),
🧰 Tools
🪛 GitHub Actions: autofix.ci

[error] 132-132: lint/style/noNonNullAssertion: Forbidden non-null assertion. Use optional chaining (r.region?.id) instead of r.region!.id.


[error] 133-133: lint/style/noNonNullAssertion: Forbidden non-null assertion. Use optional chaining (r.region?.name) instead of r.region!.name.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/apps/dashboard/lib/collections/deploy/environment-settings.ts` around
lines 129 - 135, The mapping over `regional` uses non-null assertions
(`r.region!`) which violates type-safety rules; replace the filter+map with a
single type-narrowing operation (e.g., use `flatMap` or a combined
filter-then-map callback) so `r.region` is guaranteed non-null without `!`.
Update the `regions` assignment that currently references `r.region!.id`,
`r.region!.name`, and `r.replicas` to instead only iterate over entries where
`r.region` is not null and then safely access `id` and `name` from `r.region`
(keeping `replicas` from `r`) to remove the non-null assertions.

Comment on lines +41 to +50
} catch (err) {
if (err instanceof TRPCError) {
throw err;
}
console.error("Failed to delete project:", err);
throw new TRPCError({
code: "INTERNAL_SERVER_ERROR",
message: "Failed to delete project",
});
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify what error types are used in the ctrl client setup
ast-grep --pattern 'import { $$$ConnectError$$$ } from "@connectrpc/connect"'
rg -n "ConnectError" --type=ts web/apps/dashboard/lib/trpc/routers/

Repository: unkeyed/unkey

Length of output: 343


🏁 Script executed:

cat -n web/apps/dashboard/lib/trpc/routers/deploy/project/delete.ts | head -60

Repository: unkeyed/unkey

Length of output: 2125


🏁 Script executed:

# Check what getCtrlClients returns and how errors should be handled
rg -n "getCtrlClients" --type=ts web/apps/dashboard/lib/trpc/routers/ -A 2 | head -40

Repository: unkeyed/unkey

Length of output: 1333


🏁 Script executed:

# Check create.ts mentioned in the review comment's notes
cat -n web/apps/dashboard/lib/trpc/routers/deploy/project/create.ts | grep -A 15 "catch (err)"

Repository: unkeyed/unkey

Length of output: 450


🏁 Script executed:

# Look for ConnectError imports in the same directory
rg -n "ConnectError" --type=ts web/apps/dashboard/lib/trpc/routers/deploy/project/

Repository: unkeyed/unkey

Length of output: 39


🏁 Script executed:

cat -n web/apps/dashboard/lib/trpc/routers/ctrl.ts | head -60

Repository: unkeyed/unkey

Length of output: 1184


🏁 Script executed:

# Check what transport is used and if it's ConnectRPC
rg -n "getTransport" --type=ts web/apps/dashboard/lib/trpc/routers/ -B 2 -A 5 | head -50

Repository: unkeyed/unkey

Length of output: 1253


Error type check will never match — ConnectRPC throws ConnectError, not TRPCError.

The ctrl.project.deleteProject call uses a ConnectRPC client created by getCtrlClients(), which throws ConnectError instances. The instanceof TRPCError check on line 42 will never match, causing all ctrl service errors to be caught by the generic handler and rethrown as INTERNAL_SERVER_ERROR, masking actual error details.

Compare with the correct pattern in add.ts:

import { Code, ConnectError, createClient } from "@connectrpc/connect";
// ...
if (error instanceof ConnectError && error.code === Code.AlreadyExists) {
🐛 Proposed fix
+import { ConnectError } from "@connectrpc/connect";
 import { TRPCError } from "@trpc/server";
     } catch (err) {
-      if (err instanceof TRPCError) {
-        throw err;
+      if (err instanceof ConnectError) {
+        throw new TRPCError({
+          code: "INTERNAL_SERVER_ERROR",
+          message: err.message,
+          cause: err,
+        });
       }
       console.error("Failed to delete project:", err);
       throw new TRPCError({

Note: The same issue exists in create.ts (lines 42-44).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/apps/dashboard/lib/trpc/routers/deploy/project/delete.ts` around lines 41
- 50, The catch block in the delete handler incorrectly checks for TRPCError
(which never matches ConnectRPC errors) so replace the instanceof TRPCError
check with an instanceof ConnectError check and, where appropriate, inspect
error.code against Code.* constants; import ConnectError and Code from
"@connectrpc/connect", keep rethrowing TRPCError as-is but for ConnectError
propagate a more specific TRPCError (e.g., map ConnectError with
Code.NotFound/AlreadyExists to corresponding TRPCError codes) or rethrow the
ConnectError details in the processLogger before throwing a
TRPCError(INTERNAL_SERVER_ERROR). Apply the same fix to the create handler where
the same pattern occurs (references: ctrl.project.deleteProject and
ctrl.project.createProject, the catch blocks currently checking TRPCError).

@chronark chronark closed this Mar 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants