
regions fixes #5246

Merged
chronark merged 26 commits into main from regions-fixes
Mar 8, 2026

Conversation

@chronark
Collaborator

@chronark chronark commented Mar 8, 2026

  • fix: improve rollbacks
  • feat: scaling and region tables
  • fix: replace AVAILABLE_REGIONS env var with clusterRegions table query
  • fix: update instances mutation to use appScalingSettings table
  • fix: update CPU mutation to use appScalingSettings table
  • fix: update memory mutation to use appScalingSettings table
  • fix: update regions mutation to use appScalingSettings and clusterRegions tables
  • fix: revert cpu/memory mutations back to appRuntimeSettings
  • feat: rename app_scaling_settings to app_regional_settings, add horizontal_autoscaling_policies table
  • feat: replace regionConfig with appRegionalSettings across dashboard
  • fix: remove AVAILABLE_REGIONS from .env.example
  • feat: add migration script for region_config to app_regional_settings
  • feat: add Heartbeat RPC to ClusterService proto
  • feat: add heartbeat loop to krane agent
  • feat: implement Heartbeat RPC handler and rename cluster_regions to regions
  • feat: deploy workflow reads from appRegionalSettings, add region_id to deployment_topology
  • fix: display region names in UI, add local region support
  • feat: add X-Krane-Platform header, app_id to ApplySentinel, and appId labels
  • fix: step conflict
  • fix: rabbit
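
As a rough illustration of the AVAILABLE_REGIONS removal listed above, the lookup moves from parsing an env var to a table query. This TypeScript sketch assumes a hypothetical `RegionStore` interface; it is not the actual dashboard code:

```typescript
// Sketch only: the store interface and region shape are assumptions based on
// the PR summary, not the real schema or query layer.
type Region = { id: string; name: string };

interface RegionStore {
  listRegions(): Promise<Region[]>;
}

async function getAvailableRegions(store: RegionStore): Promise<Region[]> {
  // Previously something like: (process.env.AVAILABLE_REGIONS ?? "").split(",")
  // Now the source of truth is the regions table, populated by heartbeats.
  return store.listRegions();
}
```
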

chronark added 22 commits March 7, 2026 11:34
This addresses a few things.

The most obvious one is lowering the freshness of some frontline caches
to make rollbacks faster.

I also moved the entire deployment workflow to a virtual object, keyed
by workspace ID. The reason is that we had some potential races, and this
is the pragmatic fix for right now. I could have done it per app too, but
I actually think limiting concurrency in the beginning is not a bad
thing; after all, we're paying for all of these resources.
Krane now sends periodic heartbeats to the control plane every 30s,
reporting its region and platform. This also adds a Platform config field,
fixes duplicate r.Recover()/r.DeferCtx calls, and updates the mock client.
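
A minimal sketch of such a heartbeat loop, in TypeScript for illustration (the real agent is Go, and the client interface here is hypothetical):

```typescript
// Sketch only: `HeartbeatClient` and `startHeartbeat` are illustrative names,
// not the actual krane API.
type HeartbeatClient = {
  heartbeat(req: { region: string; platform: string }): Promise<void>;
};

function startHeartbeat(
  client: HeartbeatClient,
  region: string,
  platform: string,
  intervalMs = 30_000,
): () => void {
  const timer = setInterval(() => {
    // A failed heartbeat is logged and retried on the next tick rather than
    // crashing the agent.
    client
      .heartbeat({ region, platform })
      .catch((err) => console.error("heartbeat failed", err));
  }, intervalMs);
  return () => clearInterval(timer);
}
```

In the real agent the interval is 30s; a short interval would only be used when exercising the sketch.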
…egions

- Add Heartbeat handler on ctrl that upserts regions and clusters tables
- Rename cluster_regions table to regions for clarity
- Remove AvailableRegions config from ctrl api/worker
- Deploy workflow now fails with terminal error if no regions configured
- Certificate bootstrap queries regions from DB instead of config
- Add SQL queries: UpsertRegion, FindRegionByNameAndPlatform, ListRegions, UpsertCluster
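
The upsert flow above can be sketched with an in-memory store; this is illustrative TypeScript, not the actual Go handler, and all names are assumptions:

```typescript
// Sketch only: regions are unique per (name, platform), and each heartbeat
// upserts the region before refreshing the cluster's last_heartbeat_at.
type RegionRow = { id: string; name: string; platform: string };
type ClusterRow = { id: string; regionId: string; lastHeartbeatAt: number };

class HeartbeatStore {
  private regions = new Map<string, RegionRow>(); // keyed by `${name}/${platform}`
  private clusters = new Map<string, ClusterRow>(); // keyed by regionId

  // Mirrors UpsertRegion: create on first sight, return the existing row after.
  upsertRegion(name: string, platform: string): RegionRow {
    const key = `${name}/${platform}`;
    let region = this.regions.get(key);
    if (region === undefined) {
      region = { id: `region_${this.regions.size + 1}`, name, platform };
      this.regions.set(key, region);
    }
    return region;
  }

  // Mirrors the Heartbeat handler: upsert region, then upsert cluster heartbeat.
  handleHeartbeat(clusterId: string, name: string, platform: string, now: number): ClusterRow {
    const region = this.upsertRegion(name, platform);
    const cluster: ClusterRow = { id: clusterId, regionId: region.id, lastHeartbeatAt: now };
    this.clusters.set(region.id, cluster);
    return cluster;
  }
}
```
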
…o deployment_topology

- Deploy handler now reads regions from app_regional_settings table
  instead of deprecated regionConfig on app_runtime_settings
- Fails with terminal error if no regions are configured
- Added region_id column to deployment_topology table
- Added FindAppRegionalSettingsByAppAndEnv SQL query with regions join
- Updated integration harness to create regions before topologies
- Join regions table when fetching regional settings to get region name
- Use region name (not ID) for display in runtime settings
- Add "local" flag code to sentinel node types and REGION_INFO
… labels

- Add X-Krane-Platform header alongside X-Krane-Region in the krane
  interceptor so ctrl can identify regions by (name, platform) pair
- All ctrl RPC handlers now validate both region and platform from headers
- Add app_id field to ApplySentinel proto message
- Add AppID label to all sentinel sub-resources (Service, PDB, gossip
  Service, CiliumNetworkPolicy) for consistency with deployment labels
- Remove hardcoded region list from sentinel config validation (regions
  are now dynamic from DB)
- Add platform field to sentinel config
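
The (region, platform) header validation can be sketched as follows; the header names match the PR, while the helper itself is illustrative:

```typescript
// Sketch only: ctrl identifies a caller by the (name, platform) pair, so both
// headers must be present and non-empty.
function parseKraneHeaders(headers: Map<string, string>): { region: string; platform: string } {
  const region = headers.get("X-Krane-Region") ?? "";
  const platform = headers.get("X-Krane-Platform") ?? "";
  if (region === "" || platform === "") {
    throw new Error("missing X-Krane-Region or X-Krane-Platform header");
  }
  return { region, platform };
}
```
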
@vercel

vercel bot commented Mar 8, 2026

The latest updates on your projects.

Project: dashboard | Deployment: Ready | Actions: Preview, Comment | Updated (UTC): Mar 8, 2026 9:06am


@coderabbitai
Contributor

coderabbitai bot commented Mar 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f37d5620-4c29-46a9-ad5b-0a34da402bf9

📥 Commits

Reviewing files that changed from the base of the PR and between 078a045 and b52f848.

📒 Files selected for processing (2)
  • web/internal/db/src/schema/clusters.ts
  • web/internal/db/src/schema/deployment_topology.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • web/internal/db/src/schema/deployment_topology.ts

📝 Walkthrough


Replaces static available_regions with a DB-backed, platform-aware regional model: adds regions/clusters/app_regional_settings tables and queries, Heartbeat RPC and handler, platform header propagation and client heartbeat, migration script, generated DB methods, schema/type updates, and frontend changes to use regions array (id,name,replicas).

Changes

Cohort / File(s) Summary
Config & Manifests
dev/config/ctrl-api.toml, dev/config/krane.toml, dev/k8s/manifests/..., dev/k8s/manifests/frontline.yaml
Removed available_regions; added platform = "dev" and changed region values (e.g., local.devlocal) in dev configs and k8s ConfigMaps.
Database schema & migrations
pkg/db/schema.sql, web/internal/db/src/schema/*, web/tools/migrate/region-config-to-regional-settings.ts
Added regions, clusters, app_regional_settings, horizontal_autoscaling_policies schemas; added region_id/regionId to deployment_topology; added indexes; migration script to convert legacy region_config into app_regional_settings rows.
SQL queries & generated DB layer
pkg/db/queries/*, pkg/db/*sql_generated.go, pkg/db/querier*.go, pkg/db/models_generated.go, pkg/db/BUILD.bazel
Added sqlc queries and generated code: ListRegions, FindRegionByNameAndPlatform, FindAppRegionalSettingsByAppAndEnv, UpsertRegion/Cluster, bulk upserts; updated deployment insert/topology SQL (ON DUPLICATE KEY, added region_id); new generated models and build entries.
Control plane API & worker
svc/ctrl/api/*.go, svc/ctrl/worker/*.go, svc/ctrl/integration/*
Removed AvailableRegions config fields; certificate bootstrap now lists regions from DB; worker deploy flow switched to app_regional_settings (uses RegionName/RegionID/Replicas) and errors if no regional settings.
Cluster service & RPCs
svc/ctrl/proto/ctrl/v1/cluster.proto, svc/ctrl/services/cluster/*.go
Added Heartbeat RPC/messages; updated ApplySentinel fields; multiple RPC handlers now require/validate X-Krane-Region and X-Krane-Platform headers; Heartbeat handler upserts region and cluster heartbeat.
Krane / sentinel / frontline services
svc/krane/*, svc/krane/internal/sentinel/*, svc/sentinel/*, svc/frontline/*
Added Platform config propagation and usage; krane runs periodic Heartbeat; sentinel/frontline accept and propagate platform header; frontline routing uses region.platform composite for nearest-region decisions.
UIDs & models
pkg/uid/prefix.go, pkg/db/models_generated.go
Added ClusterPrefix, RegionPrefix; adjusted SentinelPrefix; added models: Region, Cluster, AppRegionalSetting, HorizontalAutoscalingPolicy; DeploymentTopology includes RegionID.
Frontend — data & UI
web/apps/dashboard/..., web/internal/db/src/schema/*, web/apps/dashboard/lib/trpc/routers/deploy/environment-settings/*, web/apps/dashboard/lib/collections/deploy/environment-settings.ts
Replaced regionConfig map with regions array ({id,name,replicas}); getAvailableRegions now queries DB and returns id/name objects; updated forms, validation, UI (including local flag/avatar).
Worker & deploy flow
svc/ctrl/worker/deploy/*.go, svc/ctrl/worker/deploy/service.go
Topology creation iterates app_regional_settings; uses RegionName/RegionID/Replicas; desired status includes running; absence of regional settings now yields an error instead of defaulting.
Build / misc
svc/krane/BUILD.bazel, other BUILD changes
Added generated sources and new dependencies (proto ctrl/v1, repeat); added rpc_heartbeat.go to cluster service build.
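
The region_config to app_regional_settings migration listed above boils down to turning a legacy `{ regionName: replicas }` map into rows keyed by region id. A hedged sketch, assuming a name-to-id lookup (the helper and shapes are illustrative, not the actual migration script):

```typescript
// Sketch only: legacy names are resolved to canonical region ids before
// insertion; unresolved names fail loudly rather than writing bad rows.
function migrateRegionConfig(
  regionConfig: Record<string, number>,
  regionIdsByName: Map<string, string>,
): { regionId: string; replicas: number }[] {
  const rows: { regionId: string; replicas: number }[] = [];
  for (const [name, replicas] of Object.entries(regionConfig)) {
    const regionId = regionIdsByName.get(name);
    if (regionId === undefined) {
      throw new Error(`unknown region in legacy config: ${name}`);
    }
    rows.push({ regionId, replicas });
  }
  return rows;
}
```
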

Sequence Diagram(s)

sequenceDiagram
    participant Krane as Krane (client)
    participant Ctrl as Control Plane ClusterService
    participant DB as Database
    rect rgba(0,128,255,0.5)
    Krane->>Ctrl: Heartbeat(region, platform)
    Ctrl->>DB: UpsertRegion(name=region, platform)
    DB-->>Ctrl: OK
    Ctrl->>DB: FindRegionByNameAndPlatform(name=region, platform)
    DB-->>Ctrl: region_id
    Ctrl->>DB: UpsertCluster(id=clusterID, region_id, last_heartbeat_at)
    DB-->>Ctrl: OK
    Ctrl-->>Krane: HeartbeatResponse
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

🚥 Pre-merge checks | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

  • Description check (⚠️ Warning): The pull request description is largely incomplete and does not follow the provided template structure, missing required sections like 'What does this PR do?', 'Type of change', 'How should this be tested?', and the completion checklist. Resolution: fill out all required sections from the template: provide a concise summary in 'What does this PR do?', mark the type of change, describe the testing approach, and complete the required checklist items (build, format, code review steps, etc.).
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 70.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check (❓ Inconclusive): The title is vague and generic ('regions fixes') and does not convey the actual scope or nature of the changes. Resolution: provide a more specific and descriptive title, such as 'Replace region configuration with database-backed regional settings' or 'Migrate to app_regional_settings table and add cluster heartbeat mechanism'.


@chronark chronark marked this pull request as ready for review March 8, 2026 07:05
@chronark chronark requested a review from imeyer as a code owner March 8, 2026 07:05
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
web/internal/db/src/schema/app_runtime_settings.ts (1)

45-49: ⚠️ Potential issue | 🟡 Minor

Use proper JSDoc syntax for deprecation annotation.

The deprecation comment uses single-line syntax // instead of the TypeScript JSDoc standard /** @deprecated ... */, which prevents IDEs and type checkers from recognizing and warning about the deprecated field usage.

Suggested fix
-    // `@deprecated`, use appRegionalSettings instead
+    /** `@deprecated` use appRegionalSettings instead */
     regionConfig: json("region_config")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/internal/db/src/schema/app_runtime_settings.ts` around lines 45 - 49,
Replace the single-line deprecation comment with a proper JSDoc deprecation
annotation for the regionConfig schema field: change the leading // `@deprecated`
comment to a JSDoc block /** `@deprecated` use appRegionalSettings instead */
immediately above the regionConfig declaration in app_runtime_settings.ts so
IDEs/type checkers recognize the deprecation for the regionConfig
json("region_config") field.
web/apps/dashboard/app/(app)/[workspaceSlug]/projects/new/steps/onboarding-environment-provider.tsx (1)

74-82: ⚠️ Potential issue | 🟠 Major

Don't seal initialization on an empty region list.

If availableRegions first resolves to [], Line 81 still flips hasInitializedRef and Lines 105-106 leave draft.regions empty. Because the ref never resets, later refetches with real regions are ignored, leaving new projects with no regions configured. That now blocks the new deploy flow.

Also applies to: 105-106

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@web/apps/dashboard/app/(app)/[workspaceSlug]/projects/new/steps/onboarding-environment-provider.tsx`
around lines 74 - 82, The effect is marking initialization done even when
availableRegions is an empty array so later real regions are ignored; change the
guard in the useEffect to treat an empty regions list as not-ready (e.g.,
require availableRegions.length > 0 before proceeding) and only set
hasInitializedRef.current = true after you have a non-empty availableRegions;
likewise, when assigning draft.regions (the code that writes to draft.regions at
the later block around lines 105-106), only set draft.regions if
availableRegions is non-empty to avoid persisting an empty regions list.
web/internal/db/src/schema/deployment_topology.ts (1)

22-45: ⚠️ Potential issue | 🟠 Major

Make regionId the canonical topology key.

regionId is introduced here, but this schema still allows missing IDs via default("") and still keys uniqueness on the legacy region label. Since the new regions model is only unique per (name, platform), two platforms sharing the same region name can still collide on unique_version_per_region, and "" will hide missing wiring at insert time.

Proposed fix
-    regionId: varchar("region_id", { length: 64 }).notNull().default(""),
+    regionId: varchar("region_id", { length: 64 }).notNull(),
@@
-    uniqueIndex("unique_region_per_deployment").on(table.deploymentId, table.region),
-    uniqueIndex("unique_version_per_region").on(table.region, table.version),
+    uniqueIndex("unique_region_per_deployment").on(table.deploymentId, table.regionId),
+    uniqueIndex("unique_version_per_region").on(table.regionId, table.version),

As per coding guidelines, "Make illegal states unrepresentable by modeling domain with discriminated unions and parsing inputs at boundaries into typed structures".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/internal/db/src/schema/deployment_topology.ts` around lines 22 - 45, The
schema allows empty regionId and still keys uniqueness on the legacy region
label; change regionId (varchar("region_id")) to not allow default "" (remove
.default("") and make it .notNull()), switch unique indexes that currently use
table.region to use table.regionId (replace unique_region_per_deployment and
unique_version_per_region to key on table.regionId instead of table.region), and
ensure any insertion/ORM layer validates/normalizes inputs so regionId is always
provided (parse at API/DTO boundary into a typed structure) so the canonical
topology key is regionId and legacy region remains non-unique label only.
🧹 Nitpick comments (5)
svc/ctrl/services/cluster/rpc_watch_deployments.go (1)

40-47: platform validated but unused.

The platform header is extracted and validated but never used in this function. If it's required for future functionality or consistency across RPCs, consider adding a comment. Otherwise, this validation adds overhead without effect.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@svc/ctrl/services/cluster/rpc_watch_deployments.go` around lines 40 - 47, The
code extracts and validates req.Header values region and platform (via region :=
req.Header().Get("X-Krane-Region"), platform :=
req.Header().Get("X-Krane-Platform") and assert.All) but platform is never used;
either remove the platform extraction/validation or make its intent explicit.
Fix by either deleting the platform variable and its NotEmpty assertion from the
assert.All call (leaving region validation only) or, if platform is required for
consistency/future use, keep the extraction/validation and add a brief comment
above (referencing platform, req.Header(), and assert.All) explaining why
platform must be validated now even though it is not yet used. Ensure the change
updates the assert.All invocation accordingly so there are no unused variables
or dead validation.
web/apps/dashboard/lib/trpc/routers/deploy/environment-settings/runtime/update-regions.ts (1)

54-66: Consider bulk insert to reduce N+1 queries.

The current implementation executes individual INSERT statements for each new region. When multiple regions are added, this results in N+1 database round-trips.

♻️ Proposed bulk insert optimization
       // Insert rows for newly added regions
-      for (const regionId of input.regionIds) {
-        const existing = existingSettings.find((s) => s.regionId === regionId);
-        if (!existing) {
-          await db.insert(appRegionalSettings).values({
-            workspaceId: ctx.workspace.id,
-            appId: env.appId,
-            environmentId: envId,
-            regionId,
-            replicas: defaults.replicas,
-          });
-        }
-      }
+      const existingRegionIds = new Set(existingSettings.map((s) => s.regionId));
+      const newRegions = input.regionIds.filter((regionId) => !existingRegionIds.has(regionId));
+
+      if (newRegions.length > 0) {
+        await db.insert(appRegionalSettings).values(
+          newRegions.map((regionId) => ({
+            workspaceId: ctx.workspace.id,
+            appId: env.appId,
+            environmentId: envId,
+            regionId,
+            replicas: defaults.replicas,
+          })),
+        );
+      }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@web/apps/dashboard/lib/trpc/routers/deploy/environment-settings/runtime/update-regions.ts`
around lines 54 - 66, The loop that inserts new region rows performs one INSERT
per region (see the for loop over input.regionIds, existingSettings lookup, and
db.insert(appRegionalSettings).values({...})), causing N+1 queries; instead,
collect the missing region rows into an array (using ctx.workspace.id,
env.appId, envId, regionId and defaults.replicas) and perform a single bulk
insert call (db.insert(appRegionalSettings).values(rows)) for all new regions in
one operation to eliminate multiple round-trips.
svc/sentinel/config.go (1)

77-80: Consider removing empty Validate method.

The Validate() method body is now empty. If no cross-field validation is needed, consider removing the method entirely or adding a comment explaining it's reserved for future use.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@svc/sentinel/config.go` around lines 77 - 80, The empty Validate method on
the Config type (func (c *Config) Validate()) should be removed if there's no
cross-field validation required; either delete the method entirely to avoid dead
code or, if you intend to keep it for future validation, add a short comment
above Config.Validate explaining it's intentionally no-op/reserved for future
use so linters/reviewers understand it's deliberate.
svc/krane/internal/sentinel/apply.go (1)

170-177: AppID missing from PodTemplateSpec labels.

AppID is added to the Deployment's ObjectMeta labels (line 173) but not to PodTemplateSpec labels (lines 198-202). If AppID is needed for pod-level queries or policies, consider adding it to the pod template labels as well.

Proposed fix
 			Template: corev1.PodTemplateSpec{
 				ObjectMeta: metav1.ObjectMeta{
 					Annotations: map[string]string{
 						"reloader.stakater.com/auto": "true",
 					},
 					Labels: labels.New().
 						WorkspaceID(sentinel.GetWorkspaceId()).
+						AppID(sentinel.GetAppId()).
 						EnvironmentID(sentinel.GetEnvironmentId()).
 						SentinelID(sentinel.GetSentinelId()).
 						ComponentSentinel(),
 				},

Also applies to: 193-203

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@svc/krane/internal/sentinel/apply.go` around lines 170 - 177, The
PodTemplateSpec labels currently omit the AppID which is present on the
Deployment ObjectMeta; update the labels builder used for the pod template (the
labels.New() chain that currently calls WorkspaceID, ProjectID, EnvironmentID,
SentinelID, ComponentSentinel, ManagedByKrane) to include AppID by calling
AppID(sentinel.GetAppId()), and apply the same addition to the other
PodTemplateSpec labels block referenced in the review so pod-level
queries/policies can use the AppID label.
web/internal/db/src/schema/app_regional_settings.ts (1)

27-30: Consider adding relation for horizontalAutoscalingPolicyId.

The schema defines horizontalAutoscalingPolicyId as an optional reference, but there's no corresponding relation defined in appRegionalSettingsRelations. If ORM-level joins to the autoscaling policy are needed, consider adding:

horizontalAutoscalingPolicy: one(horizontalAutoscalingPolicies, {
  fields: [appRegionalSettings.horizontalAutoscalingPolicyId],
  references: [horizontalAutoscalingPolicies.id],
}),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/internal/db/src/schema/app_regional_settings.ts` around lines 27 - 30,
The schema declares horizontalAutoscalingPolicyId but
appRegionalSettingsRelations lacks a relation, so add a one-to-one relation
entry named horizontalAutoscalingPolicy to appRegionalSettingsRelations that
links appRegionalSettings.horizontalAutoscalingPolicyId to
horizontalAutoscalingPolicies.id (use the one(...) relation helper with fields:
[appRegionalSettings.horizontalAutoscalingPolicyId] and references:
[horizontalAutoscalingPolicies.id]) so ORM joins can resolve the referenced
autoscaling policy.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/db/queries/cluster_upsert.sql`:
- Around line 13-14: The upsert only updates last_heartbeat_at and ignores
conflicts on the other unique key (region_id), leaving stale mappings; update
the ON DUPLICATE KEY UPDATE clause in the cluster_upsert.sql so it also
refreshes the id and region_id from the incoming values (e.g., set id =
VALUES(id) or id = sqlc.arg(id), region_id = VALUES(region_id) or region_id =
sqlc.arg(region_id), and last_heartbeat_at = VALUES(last_heartbeat_at) /
sqlc.arg(last_heartbeat_at)) so that conflicts on either unique key reconcile to
the new mapping for the clusters table (referenced symbols: clusters, id,
region_id, last_heartbeat_at).

In `@svc/ctrl/proto/ctrl/v1/cluster.proto`:
- Around line 235-247: The proto was renumbered which breaks wire compatibility;
do not change existing tags for environment_id, sentinel_id, image, replicas,
cpu_millicores, memory_mib—keep their original tag numbers and instead add the
new field app_id at a new unused tag (e.g., tag 10 or another free tag) so
existing serialized messages still decode correctly; update the message
definition to preserve the original tags for environment_id, sentinel_id, image,
replicas, cpu_millicores, memory_mib and place app_id at a new unique tag while
leaving field names and types unchanged.

In `@svc/ctrl/services/cluster/rpc_get_desired_cilium_network_policy_state.go`:
- Around line 27-32: You validate platform but then call
FindCiliumNetworkPolicyByIDAndRegion with only Region (variable region) which
ignores platform and can return the wrong row because regions are unique by
(name, platform); either thread platform into the lookup by adding/using a
repository method that queries WHERE region = ? AND platform = ? (e.g.,
implement FindCiliumNetworkPolicyByIDAndRegionAndPlatform or extend
FindCiliumNetworkPolicyByIDAndRegion to accept platform), or resolve the region
to its region_id first (lookup region by name+platform then call
FindCiliumNetworkPolicyByIDAndRegion with region_id); update the call site in
rpc_get_desired_cilium_network_policy_state.go to pass platform (or resolved
region_id) instead of only region.

In `@svc/ctrl/services/cluster/rpc_heartbeat.go`:
- Around line 57-60: The UpsertCluster call currently uses
uid.New(uid.ClusterPrefix) (in db.Query.UpsertCluster with UpsertClusterParams)
which produces a random id per heartbeat; instead derive a stable cluster ID
from the region/request context (e.g., a deterministic function
deriveClusterID(regionID) or use a namespaced UID function) at the boundary and
pass that as ID so the cluster id is stable and represents region ownership;
update the upsert semantics to key on RegionID (respecting the
clusters_region_id_unique constraint) so subsequent heartbeats update
last_heartbeat_at for that region while preserving the deterministic ID.
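
One way to implement the deterministic cluster ID this prompt asks for is to hash the region id; a sketch, with the prefix and helper name as assumptions:

```typescript
// Sketch only: same region id always yields the same cluster id, so repeated
// heartbeats upsert one row instead of minting a random id each time.
import { createHash } from "node:crypto";

function deriveClusterID(regionId: string): string {
  return "cluster_" + createHash("sha256").update(regionId).digest("hex").slice(0, 16);
}
```
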

In `@svc/ctrl/services/cluster/rpc_report_deployment_status.go`:
- Around line 36-45: The code currently reads X-Krane-Platform into platform but
doesn't use it to scope DB operations; update the transaction inside db.TxRetry
to include platform in all deployment resolution and instance deletion/upsert
filters so DB lookups use the composite (platform, region) key (e.g., when
resolving the deployment by K8s name and when deleting/upserting instances).
Ensure functions/methods that build the mutation path or queries (the logic
called inside the db.TxRetry callback, and any helpers that resolve deployments
or filter instances) accept and apply the platform value and/or validate the
region+platform pair before executing mutations.

In
`@web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/settings/components/runtime-settings/regions.tsx`:
- Around line 28-29: The effect is being retriggered because defaultRegionNames
is a fresh array every render; instead compute a stable representation and use
that in your effects: memoize the region names inside this component (e.g. const
defaultRegionNamesMemo = useMemo(() => regions.map(r => r.name), [regions.map(r
=> r.name).join('|')]) or otherwise derive a stable key like namesKey =
regions.map(r=>r.name).sort().join(',')) and then use defaultRegionNamesMemo (or
namesKey) as the dependency for the effects that call setSelectedRegionNames and
update hasChanges (references: defaultRegionNames, regions,
setSelectedRegionNames, hasChanges, getAvailableRegions); this prevents
unrelated rerenders (like getAvailableRegions refetch) from resetting
in-progress selections.
- Around line 80-87: The form currently keys selected regions by region.name and
falls back to writing name into draft.regions.id (see draft.regions mapping,
availableRegions, defaultReplicas), which can create invalid ids; change the
combobox/value wiring so the selected value is the region.id (use id for value
and name only for label/search), update the mapping in draft.regions to lookup
availableRegions by id (not name) and set id from the selected id (never set id
= name), and apply the same id-based change for the other occurrence mentioned
(around the 142-145 region selection code) so onSubmit and rehydration no longer
reconstruct ids from names. Ensure all related props/state that currently expect
names are adjusted to expect ids.
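
The stable-key suggestion in the first comment above can be reduced to a small helper (name illustrative):

```typescript
// Sketch only: derives an order-independent key from region names so effects
// depend on content, not on a fresh array identity each render.
function regionNamesKey(regions: { name: string }[]): string {
  return regions.map((r) => r.name).sort().join(",");
}
```
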

In
`@web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/components/region-flag.tsx`:
- Around line 61-63: The BoringAvatar render in region-flag.tsx currently omits
accessible text; update the BoringAvatar invocation (the branch that renders
when flagCode === "local") to pass an explicit name and enable title so the
avatar exposes the accessible label — e.g., set name to the flag identifier (use
name={flagCode} or "local" for that branch) and set title={true} on the
BoringAvatar component.
- Around line 61-63: The local branch rendering of BoringAvatar doesn't preserve
the container's rounded-square shape; update the JSX where flagCode === "local"
renders BoringAvatar to pass square={shape === "rounded"} (using the existing
shape variable) so the avatar matches the container (rounded square when shape
=== "rounded", circular when shape === "circle"); keep other props (size,
variant) unchanged and ensure this change is applied in the component that
contains the flagCode check and BoringAvatar usage.

In `@web/apps/dashboard/lib/collections/deploy/environment-settings.ts`:
- Around line 215-237: The code only compares replicas using
original.regions.at(0) / modified.regions.at(0) and skips sending instance
updates when regionsChanged, so edits to replicas in non-first regions or
simultaneous region+replica changes are dropped; update the logic in the block
that computes origReplicas/modReplicas/instancesChanged to compute per-region
replica diffs by matching regions by id (use original.regions and
modified.regions maps keyed by id), build a payload of changed
regionIds->replicas (or a replicasPerRegion map) and call
trpcClient.deploy.environmentSettings.runtime.updateInstances.mutate with that
full per-region update (and do not suppress this mutation just because
regionsChanged), referencing origRegionIds, modRegionIds, original.regions,
modified.regions, instancesChanged, and updateInstances.mutate to locate where
to implement the change.
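
The per-region diff this prompt describes might look like the following sketch (shapes assumed):

```typescript
// Sketch only: match regions by id and collect only the replica counts that
// changed, so edits to non-first regions are not dropped.
function diffReplicas(
  original: { id: string; replicas: number }[],
  modified: { id: string; replicas: number }[],
): Record<string, number> {
  const orig = new Map(original.map((r) => [r.id, r.replicas]));
  const changed: Record<string, number> = {};
  for (const r of modified) {
    if (orig.get(r.id) !== r.replicas) {
      changed[r.id] = r.replicas;
    }
  }
  return changed;
}
```
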

In
`@web/apps/dashboard/lib/trpc/routers/deploy/environment-settings/runtime/update-instances.ts`:
- Around line 17-27: The query on appRegionalSettings updates replicas for all
apps in the given environment because it lacks an appId filter; fetch the
environment record to get its appId (like update-regions.ts does) and add
eq(appRegionalSettings.appId, environment.appId) to the WHERE clause alongside
eq(appRegionalSettings.workspaceId, ctx.workspace.id) and
inArray(appRegionalSettings.environmentId, envIds) so only the current app's
regional settings are updated when setting input.replicasPerRegion.

In
`@web/apps/dashboard/lib/trpc/routers/deploy/environment-settings/runtime/update-regions.ts`:
- Around line 30-66: The bug is that existingSettings is queried only for
input.environmentId but then reused for every envId in the loop; change to fetch
or index existing settings per environment before inserting to avoid wrong
lookups and duplicate-key inserts. Either (1) move the
appRegionalSettings.findMany query into the for (const envId of envIds) loop and
use that per-env result when computing existing via existingSettings.find((s) =>
s.regionId === regionId), or (2) query all appRegionalSettings for the workspace
and envIds once and build a map keyed by environmentId (and regionId) and use
that map when checking for existence before db.insert; update references to
existingSettings, envIds, input.environmentId, env.appId, appRegionalSettings
and ensure replicas use defaults.replicas.

In `@web/internal/db/src/schema/clusters.ts`:
- Around line 21-26: The exported relation name and field are misleading: update
the export and relation field to reflect that this relation links clusters ->
regions (e.g., rename regionsRelations to clustersRelations or
clusterRegionsRelation) and rename the relation field currently named workspace
to something like region to match the referenced table; modify the
relations(...) call on clusters (the relations function, clusters, and inner
mapping) so it references clusters.regionId -> regions.id but exports and the
key use a clear name (reference symbols: regionsRelations, clusters, relations,
workspace, regions, regionId, id).

In `@web/internal/db/src/schema/horizontal_autoscaling_policies.ts`:
- Around line 20-26: The schema for horizontal_autoscaling_policies currently
allows invalid states (negative replicas, replicasMin > replicasMax, and
thresholds outside 0–100); fix this by adding DB-level constraints in the
table/migration for the columns replicasMin (replicas_min), replicasMax
(replicas_max), and memoryThreshold/cpuThreshold/rpsThreshold (memory_threshold,
cpu_threshold, rps_threshold): enforce replicas_min >= 0, replicas_max >=
replicas_min, and for each threshold allow NULL or require value BETWEEN 0 AND
100 (or convert to unsigned tinyint with CHECK), and include a migration step to
either sanitize or reject existing rows that violate these constraints before
applying them. Ensure the constraints are added in the same migration that
defines/updates horizontal_autoscaling_policies so the DB, not just the UI,
prevents illegal states.
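
As an application-side mirror of those constraints, the boundary can parse input into a typed structure before the insert. This is only a sketch; the DB-level CHECK constraints described above remain the source of truth, and the field names are assumptions based on the columns listed:

```typescript
// Mirrors the proposed CHECKs: replicas_min >= 0, replicas_max >= replicas_min,
// and each threshold either NULL or between 0 and 100.
type AutoscalingPolicy = {
  replicasMin: number;
  replicasMax: number;
  cpuThreshold: number | null;
  memoryThreshold: number | null;
  rpsThreshold: number | null;
};

function parsePolicy(input: AutoscalingPolicy): AutoscalingPolicy {
  if (!Number.isInteger(input.replicasMin) || input.replicasMin < 0) {
    throw new Error(`replicasMin must be an integer >= 0, got ${input.replicasMin}`);
  }
  if (!Number.isInteger(input.replicasMax) || input.replicasMax < input.replicasMin) {
    throw new Error(`replicasMax must be an integer >= replicasMin`);
  }
  const thresholds = [
    ["cpuThreshold", input.cpuThreshold],
    ["memoryThreshold", input.memoryThreshold],
    ["rpsThreshold", input.rpsThreshold],
  ] as const;
  for (const [name, v] of thresholds) {
    if (v !== null && (v < 0 || v > 100)) {
      throw new Error(`${name} must be null or between 0 and 100, got ${v}`);
    }
  }
  return input;
}
```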

In `@web/tools/migrate/region-config-to-regional-settings.ts`:
- Around line 36-37: The migration currently treats regionConfig and regions as
simple key/value pairs and writes legacy region names directly into
app_regional_settings.region_id; update the boundary parsing so you
parse/validate the incoming regionConfig into a typed structure, resolve each
legacy region key to the canonical region row (e.g., via an existing regions
lookup or a helper like findRegionByName), and insert the resolved region's id
into app_regional_settings.region_id instead of the legacy name; ensure you do
this where regionConfig and regions are created/iterated, avoid any use of any,
as-casts or non-null assertions, and return/throw on unresolved keys so illegal
states are unrepresentable.
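
The resolution step can be sketched as a lookup that fails loudly on unresolved keys, so a legacy name is never persisted as a fake id. Shapes are illustrative; the real rows come from the regions table:

```typescript
// Minimal stand-in for a row from the regions table.
type Region = { id: string; name: string };

// Build the lookup once from the canonical regions, keyed by legacy name.
function buildRegionLookup(regions: Region[]): Map<string, string> {
  return new Map(regions.map((r) => [r.name, r.id]));
}

// Resolve a legacy region_config key to the canonical region id, or throw so
// the migration aborts instead of writing an unresolvable value.
function resolveRegionId(lookup: Map<string, string>, legacyName: string): string {
  const id = lookup.get(legacyName);
  if (id === undefined) {
    throw new Error(`unknown region "${legacyName}" in region_config`);
  }
  return id;
}
```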

---

Outside diff comments:
In
`@web/apps/dashboard/app/(app)/[workspaceSlug]/projects/new/steps/onboarding-environment-provider.tsx`:
- Around line 74-82: The effect is marking initialization done even when
availableRegions is an empty array so later real regions are ignored; change the
guard in the useEffect to treat an empty regions list as not-ready (e.g.,
require availableRegions.length > 0 before proceeding) and only set
hasInitializedRef.current = true after you have a non-empty availableRegions;
likewise, when assigning draft.regions (the code that writes to draft.regions at
the later block around lines 105-106), only set draft.regions if
availableRegions is non-empty to avoid persisting an empty regions list.

In `@web/internal/db/src/schema/app_runtime_settings.ts`:
- Around line 45-49: Replace the single-line deprecation comment with a proper
JSDoc deprecation annotation for the regionConfig schema field: change the
leading // `@deprecated` comment to a JSDoc block /** `@deprecated` use
appRegionalSettings instead */ immediately above the regionConfig declaration in
app_runtime_settings.ts so IDEs/type checkers recognize the deprecation for the
regionConfig json("region_config") field.

In `@web/internal/db/src/schema/deployment_topology.ts`:
- Around line 22-45: The schema allows empty regionId and still keys uniqueness
on the legacy region label; change regionId (varchar("region_id")) to not allow
default "" (remove .default("") and make it .notNull()), switch unique indexes
that currently use table.region to use table.regionId (replace
unique_region_per_deployment and unique_version_per_region to key on
table.regionId instead of table.region), and ensure any insertion/ORM layer
validates/normalizes inputs so regionId is always provided (parse at API/DTO
boundary into a typed structure) so the canonical topology key is regionId and
legacy region remains non-unique label only.

---

Nitpick comments:
In `@svc/ctrl/services/cluster/rpc_watch_deployments.go`:
- Around line 40-47: The code extracts and validates req.Header values region
and platform (via region := req.Header().Get("X-Krane-Region"), platform :=
req.Header().Get("X-Krane-Platform") and assert.All) but platform is never used;
either remove the platform extraction/validation or make its intent explicit.
Fix by either deleting the platform variable and its NotEmpty assertion from the
assert.All call (leaving region validation only) or, if platform is required for
consistency/future use, keep the extraction/validation and add a brief comment
above (referencing platform, req.Header(), and assert.All) explaining why
platform must be validated now even though it is not yet used. Ensure the change
updates the assert.All invocation accordingly so there are no unused variables
or dead validation.

In `@svc/krane/internal/sentinel/apply.go`:
- Around line 170-177: The PodTemplateSpec labels currently omit the AppID which
is present on the Deployment ObjectMeta; update the labels builder used for the
pod template (the labels.New() chain that currently calls WorkspaceID,
ProjectID, EnvironmentID, SentinelID, ComponentSentinel, ManagedByKrane) to
include AppID by calling AppID(sentinel.GetAppId()), and apply the same addition
to the other PodTemplateSpec labels block referenced in the review so pod-level
queries/policies can use the AppID label.

In `@svc/sentinel/config.go`:
- Around line 77-80: The empty Validate method on the Config type (func (c
*Config) Validate()) should be removed if there's no cross-field validation
required; either delete the method entirely to avoid dead code or, if you intend
to keep it for future validation, add a short comment above Config.Validate
explaining it's intentionally no-op/reserved for future use so linters/reviewers
understand it's deliberate.

In
`@web/apps/dashboard/lib/trpc/routers/deploy/environment-settings/runtime/update-regions.ts`:
- Around line 54-66: The loop that inserts new region rows performs one INSERT
per region (see the for loop over input.regionIds, existingSettings lookup, and
db.insert(appRegionalSettings).values({...})), causing N+1 queries; instead,
collect the missing region rows into an array (using ctx.workspace.id,
env.appId, envId, regionId and defaults.replicas) and perform a single bulk
insert call (db.insert(appRegionalSettings).values(rows)) for all new regions in
one operation to eliminate multiple round-trips.

In `@web/internal/db/src/schema/app_regional_settings.ts`:
- Around line 27-30: The schema declares horizontalAutoscalingPolicyId but
appRegionalSettingsRelations lacks a relation, so add a one-to-one relation
entry named horizontalAutoscalingPolicy to appRegionalSettingsRelations that
links appRegionalSettings.horizontalAutoscalingPolicyId to
horizontalAutoscalingPolicies.id (use the one(...) relation helper with fields:
[appRegionalSettings.horizontalAutoscalingPolicyId] and references:
[horizontalAutoscalingPolicies.id]) so ORM joins can resolve the referenced
autoscaling policy.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f755f3a1-064d-49f2-b5cf-8a6608ca3017

📥 Commits

Reviewing files that changed from the base of the PR and between c37a7e6 and 5c941a4.

⛔ Files ignored due to path filters (4)
  • gen/proto/ctrl/v1/cluster.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • gen/proto/ctrl/v1/ctrlv1connect/cluster.connect.go is excluded by !**/gen/**
  • gen/rpc/ctrl/cluster_generated.go is excluded by !**/gen/**
  • web/apps/dashboard/gen/proto/ctrl/v1/cluster_pb.ts is excluded by !**/gen/**
📒 Files selected for processing (86)
  • dev/config/ctrl-api.toml
  • dev/config/krane.toml
  • dev/k8s/manifests/ctrl-api.yaml
  • dev/k8s/manifests/ctrl-worker.yaml
  • dev/k8s/manifests/krane.yaml
  • docs/engineering/architecture/services/control-plane/api/configuration.mdx
  • docs/engineering/architecture/services/control-plane/worker/configuration.mdx
  • pkg/db/BUILD.bazel
  • pkg/db/app_regional_settings_find_by_app_and_env.sql_generated.go
  • pkg/db/bulk_cluster_region_upsert.sql_generated.go
  • pkg/db/bulk_cluster_upsert.sql_generated.go
  • pkg/db/bulk_deployment_step_insert.sql_generated.go
  • pkg/db/bulk_deployment_topology_insert.sql_generated.go
  • pkg/db/cluster_region_find_by_name.sql_generated.go
  • pkg/db/cluster_region_list.sql_generated.go
  • pkg/db/cluster_region_upsert.sql_generated.go
  • pkg/db/cluster_upsert.sql_generated.go
  • pkg/db/deployment_step_insert.sql_generated.go
  • pkg/db/deployment_topology_insert.sql_generated.go
  • pkg/db/deployment_topology_list_by_versions.sql_generated.go
  • pkg/db/deployment_topology_list_desired.sql_generated.go
  • pkg/db/models_generated.go
  • pkg/db/querier_bulk_generated.go
  • pkg/db/querier_generated.go
  • pkg/db/queries/app_regional_settings_find_by_app_and_env.sql
  • pkg/db/queries/cluster_region_find_by_name.sql
  • pkg/db/queries/cluster_region_list.sql
  • pkg/db/queries/cluster_region_upsert.sql
  • pkg/db/queries/cluster_upsert.sql
  • pkg/db/queries/deployment_step_insert.sql
  • pkg/db/queries/deployment_topology_insert.sql
  • pkg/db/schema.sql
  • pkg/uid/prefix.go
  • svc/ctrl/api/certificate.go
  • svc/ctrl/api/config.go
  • svc/ctrl/api/harness_test.go
  • svc/ctrl/api/run.go
  • svc/ctrl/integration/harness.go
  • svc/ctrl/integration/harness/harness.go
  • svc/ctrl/proto/ctrl/v1/cluster.proto
  • svc/ctrl/services/cluster/BUILD.bazel
  • svc/ctrl/services/cluster/rpc_get_desired_cilium_network_policy_state.go
  • svc/ctrl/services/cluster/rpc_get_desired_deployment_state.go
  • svc/ctrl/services/cluster/rpc_get_desired_sentinel_state.go
  • svc/ctrl/services/cluster/rpc_heartbeat.go
  • svc/ctrl/services/cluster/rpc_report_deployment_status.go
  • svc/ctrl/services/cluster/rpc_report_sentinel_status.go
  • svc/ctrl/services/cluster/rpc_watch_cilium_network_policies.go
  • svc/ctrl/services/cluster/rpc_watch_deployments.go
  • svc/ctrl/services/cluster/rpc_watch_sentinels.go
  • svc/ctrl/services/deployment/create_deployment.go
  • svc/ctrl/services/deployment/doc.go
  • svc/ctrl/worker/config.go
  • svc/ctrl/worker/deploy/deploy_handler.go
  • svc/ctrl/worker/deploy/service.go
  • svc/ctrl/worker/run.go
  • svc/krane/BUILD.bazel
  • svc/krane/config.go
  • svc/krane/internal/sentinel/apply.go
  • svc/krane/internal/sentinel/controller.go
  • svc/krane/internal/testutil/mock_cluster_client.go
  • svc/krane/pkg/controlplane/client.go
  • svc/krane/pkg/controlplane/interceptor.go
  • svc/krane/run.go
  • svc/sentinel/config.go
  • svc/sentinel/run.go
  • web/apps/dashboard/.env.example
  • web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/[deploymentId]/network/unkey-flow/components/nodes/types.ts
  • web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/settings/components/runtime-settings/instances.tsx
  • web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/settings/components/runtime-settings/regions.tsx
  • web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/components/region-flag.tsx
  • web/apps/dashboard/app/(app)/[workspaceSlug]/projects/new/steps/onboarding-environment-provider.tsx
  • web/apps/dashboard/lib/collections/deploy/environment-settings.ts
  • web/apps/dashboard/lib/trpc/routers/deploy/environment-settings/get-available-regions.ts
  • web/apps/dashboard/lib/trpc/routers/deploy/environment-settings/get.ts
  • web/apps/dashboard/lib/trpc/routers/deploy/environment-settings/runtime/update-instances.ts
  • web/apps/dashboard/lib/trpc/routers/deploy/environment-settings/runtime/update-regions.ts
  • web/apps/dashboard/lib/trpc/routers/deploy/network/utils.ts
  • web/internal/db/src/schema/app_regional_settings.ts
  • web/internal/db/src/schema/app_runtime_settings.ts
  • web/internal/db/src/schema/clusters.ts
  • web/internal/db/src/schema/deployment_topology.ts
  • web/internal/db/src/schema/horizontal_autoscaling_policies.ts
  • web/internal/db/src/schema/index.ts
  • web/internal/db/src/schema/regions.ts
  • web/tools/migrate/region-config-to-regional-settings.ts
💤 Files with no reviewable changes (9)
  • svc/ctrl/api/run.go
  • svc/ctrl/api/config.go
  • svc/ctrl/worker/config.go
  • dev/k8s/manifests/ctrl-worker.yaml
  • docs/engineering/architecture/services/control-plane/worker/configuration.mdx
  • svc/ctrl/services/deployment/create_deployment.go
  • dev/config/ctrl-api.toml
  • web/apps/dashboard/.env.example
  • dev/k8s/manifests/ctrl-api.yaml

Comment on lines +13 to +14
ON DUPLICATE KEY UPDATE
last_heartbeat_at = sqlc.arg(last_heartbeat_at);

⚠️ Potential issue | 🟠 Major

Handle duplicate hits on both unique keys.

clusters is unique on both id and region_id, but this upsert only refreshes last_heartbeat_at. If a cluster is re-registered under a new region, or a region gets a new cluster ID, the stale mapping survives and the control plane keeps the old association.

Proposed fix
 ON DUPLICATE KEY UPDATE
+	id = sqlc.arg(id),
+	region_id = sqlc.arg(region_id),
 	last_heartbeat_at = sqlc.arg(last_heartbeat_at);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:
	ON DUPLICATE KEY UPDATE
		last_heartbeat_at = sqlc.arg(last_heartbeat_at);

After:
	ON DUPLICATE KEY UPDATE
		id = sqlc.arg(id),
		region_id = sqlc.arg(region_id),
		last_heartbeat_at = sqlc.arg(last_heartbeat_at);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/db/queries/cluster_upsert.sql` around lines 13 - 14, The upsert only
updates last_heartbeat_at and ignores conflicts on the other unique key
(region_id), leaving stale mappings; update the ON DUPLICATE KEY UPDATE clause
in the cluster_upsert.sql so it also refreshes the id and region_id from the
incoming values (e.g., set id = VALUES(id) or id = sqlc.arg(id), region_id =
VALUES(region_id) or region_id = sqlc.arg(region_id), and last_heartbeat_at =
VALUES(last_heartbeat_at) / sqlc.arg(last_heartbeat_at)) so that conflicts on
either unique key reconcile to the new mapping for the clusters table
(referenced symbols: clusters, id, region_id, last_heartbeat_at).

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
svc/frontline/services/router/service.go (1)

129-145: ⚠️ Potential issue | 🔴 Critical

Key shape inconsistent.

Line 139 looks up region.platform, but Line 113 stores gw.Region and Line 124 still checks s.region. One path will miss: if gw.Region is region-only, proximity falls back to an arbitrary region and the proxy can build a host without platform; if it already includes platform, local routing never matches. Normalize this key once, then use it for map population, local lookup, and proximity lookup.

As per coding guidelines, "Make illegal states unrepresentable by modeling domain with ADTs/discriminated unions and parsing inputs at boundaries into typed structures".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@svc/frontline/services/router/service.go` around lines 129 - 145, The code
builds and looks up region keys inconsistently (sometimes using gw.Region,
sometimes s.region plus s.platform), causing missed matches; update the logic to
normalize region keys into a single canonical form (e.g., "region.platform") at
the boundary where healthyByRegion is populated and then use that same canonical
key for lookups in findNearestRegionPlatform (the self variable and proximity
map lookup), and when setting decision.NearestNLBRegionPlatform ensure you store
the canonical key; specifically, change the code paths that populate
healthyByRegion (where gw.Region is used), the creation of self in
findNearestRegionPlatform, and the regionProximity keys to all use the same
normalized representation so lookups always match.
🧹 Nitpick comments (1)
svc/frontline/services/router/interface.go (1)

14-16: Don't smuggle region + platform in one string.

NearestNLBRegionPlatform makes malformed targets representable and forces every caller to know the "<region>.<platform>" encoding. Prefer a small typed value here and render the DNS label only at the proxy boundary.

As per coding guidelines, "Make illegal states unrepresentable by modeling domain with ADTs/discriminated unions and parsing inputs at boundaries into typed structures".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@svc/frontline/services/router/interface.go` around lines 14 - 16,
NearestNLBRegionPlatform currently smuggles "region.platform" into a single
string which allows malformed values and leaks encoding to callers; replace this
with a small typed ADT (e.g., a struct/type named RegionPlatform with explicit
Region and Platform fields and a constructor ParseRegionPlatform) and update the
interface to expose that type instead of NearestNLBRegionPlatform string; add a
method on the new type (e.g., ToDNSLabel or String) that renders the DNS-safe
"<region>.<platform>" only where the proxy/NLB boundary needs it, and update
callers to construct/parse at input boundaries via ParseRegionPlatform and to
call ToDNSLabel only at the proxy layer.
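
A language-agnostic sketch (in TypeScript here, though the reviewed file is Go) of the typed value the comment suggests: parse "region.platform" once at the boundary, render the DNS label only at the proxy edge. The names RegionPlatform, parseRegionPlatform, and toDNSLabel are illustrative:

```typescript
// Typed value replacing the "<region>.<platform>" string: malformed keys are
// rejected at parse time instead of flowing into routing code.
type RegionPlatform = { region: string; platform: string };

function parseRegionPlatform(raw: string): RegionPlatform {
  const parts = raw.split(".");
  if (parts.length !== 2 || parts[0] === "" || parts[1] === "") {
    throw new Error(`malformed region/platform key: "${raw}"`);
  }
  return { region: parts[0], platform: parts[1] };
}

// Render the encoding only where the NLB hostname is actually built.
function toDNSLabel(rp: RegionPlatform): string {
  return `${rp.region}.${rp.platform}`;
}
```

Using one constructor for parsing and one renderer for the hostname also gives the router a single canonical key for map population and proximity lookups.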
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@svc/frontline/config.go`:
- Around line 33-35: The Platform field in the config struct is only marked
required but not validated, so invalid (non-DNS-label-safe) values can flow into
routing code; update the Validate() method to check the Platform string on load
and return an error if it is not a valid DNS label (RFC1123 style: lowercase
letters, digits and hyphens, starts/ends with alphanumeric, max 63 chars).
Locate the config struct's Platform field and add this check in the Validate()
function (return a descriptive error mentioning Platform), ensuring all callers
that load config will fail fast on invalid platform values.
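
A possible shape for that check (sketched in TypeScript, though Config.Validate() itself is Go): an RFC 1123 label is lowercase alphanumerics and hyphens, alphanumeric at both ends, at most 63 characters.

```typescript
// RFC 1123 label: starts and ends alphanumeric, hyphens allowed inside,
// 1-63 characters total.
const dnsLabel = /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/;

function validatePlatform(platform: string): void {
  if (!dnsLabel.test(platform)) {
    throw new Error(`platform "${platform}" is not a valid DNS label`);
  }
}
```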

In `@svc/frontline/services/proxy/director.go`:
- Around line 40-42: The sentinel director (makeSentinelDirector) is currently
forwarding X-Unkey-Frontline-Id, X-Unkey-Region, and X-Unkey-Request-Id but
omits the platform header, so same-region/local sentinel hops lose platform
context; update makeSentinelDirector to also set the HeaderPlatform
(X-Unkey-Platform) on the outgoing request alongside HeaderFrontlineID and
HeaderRegion (and ensure any use of req.Header.Set(HeaderPlatform, ...) mirrors
the remote-hop logic used elsewhere) so local sentinel paths include the
platform header consistently.

---

Outside diff comments:
In `@svc/frontline/services/router/service.go`:
- Around line 129-145: The code builds and looks up region keys inconsistently
(sometimes using gw.Region, sometimes s.region plus s.platform), causing missed
matches; update the logic to normalize region keys into a single canonical form
(e.g., "region.platform") at the boundary where healthyByRegion is populated and
then use that same canonical key for lookups in findNearestRegionPlatform (the
self variable and proximity map lookup), and when setting
decision.NearestNLBRegionPlatform ensure you store the canonical key;
specifically, change the code paths that populate healthyByRegion (where
gw.Region is used), the creation of self in findNearestRegionPlatform, and the
regionProximity keys to all use the same normalized representation so lookups
always match.

---

Nitpick comments:
In `@svc/frontline/services/router/interface.go`:
- Around line 14-16: NearestNLBRegionPlatform currently smuggles
"region.platform" into a single string which allows malformed values and leaks
encoding to callers; replace this with a small typed ADT (e.g., a struct/type
named RegionPlatform with explicit Region and Platform fields and a constructor
ParseRegionPlatform) and update the interface to expose that type instead of
NearestNLBRegionPlatform string; add a method on the new type (e.g., ToDNSLabel
or String) that renders the DNS-safe "<region>.<platform>" only where the
proxy/NLB boundary needs it, and update callers to construct/parse at input
boundaries via ParseRegionPlatform and to call ToDNSLabel only at the proxy
layer.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 532e7bcf-c4ce-42d6-9b55-0aa8d7cb4d4c

📥 Commits

Reviewing files that changed from the base of the PR and between 5c941a4 and 3099de6.

📒 Files selected for processing (11)
  • dev/k8s/manifests/frontline.yaml
  • svc/frontline/config.go
  • svc/frontline/routes/proxy/handler.go
  • svc/frontline/run.go
  • svc/frontline/services/proxy/director.go
  • svc/frontline/services/proxy/forward.go
  • svc/frontline/services/proxy/headers.go
  • svc/frontline/services/proxy/interface.go
  • svc/frontline/services/proxy/service.go
  • svc/frontline/services/router/interface.go
  • svc/frontline/services/router/service.go

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@svc/ctrl/services/cluster/rpc_watch_deployments.go`:
- Around line 40-45: The request currently requires the X-Krane-Platform header
(variable platform) via assert.All(assert.NotEmpty(...)) even though the RPC
only uses region to scope results; remove platform from the mandatory checks so
older callers without that header aren't rejected—i.e., update the validation in
rpc_watch_deployments.go to only assert.NotEmpty(region, "region is required")
(leave platform as an optional header and validate it later only when it is
actually used).

In `@svc/ctrl/worker/deploy/deploy_handler.go`:
- Around line 391-408: The query that loads regionalSettings uses the
read-replica accessor w.db.RO(), which can return stale results and incorrectly
trigger the terminal "No regions configured" error; change the DB accessor
inside the restate.Run closure to use w.db.RW() (i.e., call
db.Query.FindAppRegionalSettingsByAppAndEnv with w.db.RW() instead of w.db.RO())
so the just-saved region is visible; keep the restate.Run,
FindAppRegionalSettingsByAppAndEnv call, and error handling unchanged.
- Around line 413-430: Reject or filter out regions with non-positive replica
counts before creating deployment topologies: validate regionalSettings entries
(the loop over regionalSettings and the rs.Replicas field) and return a
validation error or skip entries where rs.Replicas <= 0 so you never append a
topology with DesiredReplicas <= 0 (see where you build
db.InsertDeploymentTopologyParams and set DesiredReplicas). Ensure this check is
performed at the boundary (prior to appending to topologies and prior to
creating sentinels) so waitForDeployments cannot treat zero/non-positive
DesiredReplicas as healthy; update the handler that builds topologies (the loop
using regionalSettings and rs) to either filter invalid regions or return a
clear error indicating the invalid rs.Replicas value.
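
That boundary check can be sketched as a validation pass over the regional settings before any topology row is built (TypeScript sketch; the handler itself is Go, and field names are assumptions mirroring rs.Replicas):

```typescript
// Reject non-positive replica counts up front so no topology row can ever
// carry DesiredReplicas <= 0.
type RegionalSetting = { regionId: string; replicas: number };

function validReplicaSettings(settings: RegionalSetting[]): RegionalSetting[] {
  const bad = settings.filter((s) => s.replicas <= 0);
  if (bad.length > 0) {
    throw new Error(
      `non-positive replica counts for regions: ${bad.map((s) => s.regionId).join(", ")}`,
    );
  }
  return settings;
}
```

Returning an error (rather than silently skipping regions) keeps the failure visible to whoever configured the invalid value; filtering is the alternative the comment mentions.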

In `@web/internal/db/src/schema/deployment_topology.ts`:
- Around line 22-23: The schema uses the display column region as a
uniqueness/version key which causes collisions across platforms; change the
uniqueness and version index definitions in deployment_topology to use regionId
(the stable key) instead of region. Locate the deployment_topology table
definition where region and regionId are declared and update any unique
constraints/indexes and version stream indexes that reference region to
reference regionId (ensure columns named regionId are used in place of region
for uniqueness/versioning).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6eece26e-2534-48fb-90ca-33448d6a8299

📥 Commits

Reviewing files that changed from the base of the PR and between 3099de6 and 078a045.

📒 Files selected for processing (7)
  • pkg/db/models_generated.go
  • pkg/db/schema.sql
  • svc/ctrl/integration/harness.go
  • svc/ctrl/services/cluster/rpc_watch_deployments.go
  • svc/ctrl/worker/deploy/deploy_handler.go
  • svc/ctrl/worker/deployment/deployment_state.go
  • web/internal/db/src/schema/deployment_topology.ts

@chronark chronark merged commit ddc4896 into main Mar 8, 2026
12 checks passed
@chronark chronark deleted the regions-fixes branch March 8, 2026 09:34