
feat: implement krane gateway RPCs for docker#4365

Merged
chronark merged 4 commits into main from
11-19-feat_implement_krane_gateway_rpcs_for_docker
Nov 26, 2025

Conversation

@chronark
Collaborator

@chronark chronark commented Nov 19, 2025

What does this PR do?

This PR adds gateway management functionality to the Docker backend in the Krane application. It introduces three new files for gateway operations:

  1. gateway_create.go - Implements container creation for gateways with specified replica counts, port mapping, and resource limits
  2. gateway_delete.go - Provides functionality to remove all containers associated with a gateway
  3. gateway_get.go - Retrieves container status and addresses for gateway instances

The PR also renames existing deployment files for consistency:

  • create_deployment.go → deployment_create.go
  • delete_deployment.go → deployment_delete.go
  • get_deployment.go → deployment_get.go

Additionally, it updates the service interface to implement the GatewayServiceHandler.
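
The service-interface wiring follows the usual connect-go pattern: embed the generated Unimplemented handler in the service struct and add a compile-time assertion. A minimal, self-contained sketch of that pattern, using hypothetical stand-in types in place of the generated kranev1connect code:

```go
package main

import "fmt"

// GatewayServiceHandler stands in for the generated connect handler interface.
type GatewayServiceHandler interface {
	CreateGateway() string
}

// UnimplementedGatewayServiceHandler provides default stubs, in the spirit of
// kranev1connect.UnimplementedGatewayServiceHandler.
type UnimplementedGatewayServiceHandler struct{}

func (UnimplementedGatewayServiceHandler) CreateGateway() string { return "unimplemented" }

// docker embeds the unimplemented handler; overriding CreateGateway on the
// real type replaces the stub while keeping the interface satisfied.
type docker struct {
	UnimplementedGatewayServiceHandler
}

// Compile-time assertion that *docker implements GatewayServiceHandler.
var _ GatewayServiceHandler = (*docker)(nil)

func main() {
	var d docker
	fmt.Println(d.CreateGateway())
}
```

The embed means new RPCs added to the proto later won't break the build; the `var _` line fails compilation immediately if the struct stops satisfying the interface.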

Type of change

  • New feature (non-breaking change which adds functionality)

How should this be tested?

  • Create a gateway with multiple replicas and verify containers are created with proper labels and port mappings
  • Retrieve gateway status and confirm instance addresses are correctly formatted
  • Delete a gateway and verify all associated containers are removed
  • Verify existing deployment functionality continues to work as expected

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Contributing Guide
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand areas
  • Ran pnpm build
  • Ran pnpm fmt
  • Ran make fmt on /go directory
  • Checked for warnings, there are none
  • Removed all console.logs
  • Merged the latest changes from main onto my branch with git pull origin main
  • My changes don't cause any responsiveness issues

@changeset-bot

changeset-bot bot commented Nov 19, 2025

⚠️ No Changeset found

Latest commit: 4693bcf

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@vercel

vercel bot commented Nov 19, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Preview | Comments | Updated (UTC)
dashboard | Ready | Preview | Comment | Nov 26, 2025 0:26am
engineering | Ready | Preview | Comment | Nov 26, 2025 0:26am

@coderabbitai
Contributor

coderabbitai bot commented Nov 19, 2025

📝 Walkthrough

Walkthrough

The changes implement gateway lifecycle management for Docker containers. Three new handler methods manage gateway creation (with container replicas), deletion, and retrieval. The docker service is updated to implement the GatewayServiceHandler interface.

Changes

Cohort / File(s) | Change Summary
Gateway Management (go/apps/krane/backend/docker/gateway_create.go, gateway_delete.go, gateway_get.go) | New methods CreateGateway, DeleteGateway, and GetGateway implemented. CreateGateway creates one container per replica with port binding, resource limits, and labels. DeleteGateway removes containers by gateway ID label. GetGateway lists containers, maps states to gateway statuses, and returns instance details with host addresses and ports.
Service Integration (go/apps/krane/backend/docker/service.go) | Embedded UnimplementedGatewayServiceHandler in docker struct and added compile-time assertion for GatewayServiceHandler implementation.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Handler as Handler (docker)
    participant Docker as Docker API

    rect rgb(200, 220, 255)
    Note over Client,Docker: Create Gateway
    Client->>Handler: CreateGateway(gatewayId, replicas)
    Handler->>Docker: EnsureImage(image)
    Handler->>Docker: ContainerCreate(...) × replicas
    Handler->>Docker: ContainerStart(...)
    Handler->>Client: CreateGatewayResponse (PENDING)
    end

    rect rgb(220, 240, 220)
    Note over Client,Docker: Get Gateway Status
    Client->>Handler: GetGateway(gatewayId)
    Handler->>Docker: ContainerList (filter by label)
    Handler->>Handler: Map container state → GatewayStatus
    Handler->>Client: GetGatewayResponse (instances + status)
    end

    rect rgb(255, 220, 220)
    Note over Client,Docker: Delete Gateway
    Client->>Handler: DeleteGateway(gatewayId)
    Handler->>Docker: ContainerList (filter by label)
    Handler->>Docker: ContainerRemove(...) × containers
    Handler->>Client: DeleteGatewayResponse
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • gateway_create.go: Review container creation logic, configuration assembly (CPU, memory, environment, labels), replica loop, and error handling for Docker API failures
  • gateway_delete.go: Verify container filtering by gateway ID label and cascading removal with volumes
  • gateway_get.go: Validate state-to-status mapping logic and response assembly with address/port binding
  • service.go: Confirm interface implementation assertion is correctly structured

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title clearly and specifically summarizes the main change: implementing gateway RPCs for the Docker backend in Krane.
Description check | ✅ Passed | The description is well-structured, covering what the PR does, type of change, testing approach, and completed checklist items aligned with the template.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

Collaborator Author

chronark commented Nov 19, 2025

@vercel vercel bot temporarily deployed to Preview – engineering November 19, 2025 22:19 Inactive
@vercel vercel bot temporarily deployed to Preview – dashboard November 19, 2025 22:19 Inactive
@chronark chronark mentioned this pull request Nov 20, 2025
19 tasks
Base automatically changed from 11-19-feat_define_protos_for_kranes_gateway_management to main November 20, 2025 13:37
@chronark
Collaborator Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Nov 20, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@chronark chronark marked this pull request as ready for review November 20, 2025 13:39
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
go/apps/krane/backend/docker/service.go (1)

27-32: GatewayServiceHandler wiring looks correct; optional symmetry nit in constructor

The new UnimplementedGatewayServiceHandler embed and compile-time assertion look correct and will keep the Docker backend aligned with the gateway RPC surface. You don't strictly need to initialize this embed in New, but for symmetry with UnimplementedDeploymentServiceHandler you could set it explicitly in the composite literal.

If you want that symmetry, something like this would work:

 	d := &docker{
-		registryAuth:                          "",
-		UnimplementedDeploymentServiceHandler: kranev1connect.UnimplementedDeploymentServiceHandler{},
-		logger:                                logger,
-		client:                                dockerClient,
+		registryAuth:                          "",
+		UnimplementedDeploymentServiceHandler: kranev1connect.UnimplementedDeploymentServiceHandler{},
+		UnimplementedGatewayServiceHandler:    kranev1connect.UnimplementedGatewayServiceHandler{},
+		logger:                                logger,
+		client:                                dockerClient,
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f74ce8c and 67c453c.

📒 Files selected for processing (4)
  • go/apps/krane/backend/docker/gateway_create.go (1 hunks)
  • go/apps/krane/backend/docker/gateway_delete.go (1 hunks)
  • go/apps/krane/backend/docker/gateway_get.go (1 hunks)
  • go/apps/krane/backend/docker/service.go (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 4190
File: go/internal/services/keys/verifier.go:51-53
Timestamp: 2025-10-30T15:10:52.743Z
Learning: PR #4190 for unkeyed/unkey is focused solely on database schema and query changes for identity-based credits. It adds IdentityCredits and KeyCredits fields to structs and queries, but does not implement the priority enforcement logic in the usagelimiter. The logic implementation is intentionally deferred to a later PR in the stack.
Learnt from: imeyer
Repo: unkeyed/unkey PR: 3755
File: .github/workflows/job_detect_changes.yaml:111-114
Timestamp: 2025-08-08T16:10:00.224Z
Learning: Repo: unkeyed/unkey — When imeyer writes “make me an issue” on a PR thread, automatically open a thorough GitHub issue (Summary, Impact, Where, Repro, Observed vs Expected, Acceptance Criteria, Validation Plan, Out of Scope, References), include backlinks to the PR and comment, and assign to imeyer.
Learnt from: imeyer
Repo: unkeyed/unkey PR: 3755
File: .github/actions/setup-node/action.yaml:0-0
Timestamp: 2025-08-08T15:10:46.436Z
Learning: Repo: unkeyed/unkey — Preference: If imeyer comments “issue” on a PR thread, automatically open a thorough GitHub issue (Summary, Impact, Where, Observed vs Expected, Acceptance Criteria, Validation Plan, Out of Scope, References), include backlinks to the PR and comment, and assign to imeyer.
📚 Learning: 2025-10-15T10:12:40.810Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 4098
File: go/proto/ctrl/v1/deployment.proto:33-36
Timestamp: 2025-10-15T10:12:40.810Z
Learning: In the Unkey codebase proto files (ctrl/v1/build.proto, ctrl/v1/deployment.proto, hydra/v1/deployment.proto), use `dockerfile_path` (not `docker_file_path`) for consistency in generated Go field names.

Applied to files:

  • go/apps/krane/backend/docker/gateway_delete.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (javascript-typescript)

Comment on lines +19 to +105
func (d *docker) CreateGateway(ctx context.Context, req *connect.Request[kranev1.CreateGatewayRequest]) (*connect.Response[kranev1.CreateGatewayResponse], error) {
	gateway := req.Msg.GetGateway()
	d.logger.Info("creating gateway",
		"gateway_id", gateway.GetGatewayId(),
		"image", gateway.GetImage(),
	)

	// Ensure image exists locally (pull if not present)
	if err := d.ensureImageExists(ctx, gateway.GetImage()); err != nil {
		return nil, connect.NewError(connect.CodeInternal,
			fmt.Errorf("failed to ensure image exists: %w", err))
	}

	// Configure port mapping
	exposedPorts := nat.PortSet{
		"8040/tcp": struct{}{},
	}

	portBindings := nat.PortMap{
		"8040/tcp": []nat.PortBinding{
			{
				HostIP:   "0.0.0.0",
				HostPort: "0", // Docker will assign a random available port
			},
		},
	}

	// Configure resource limits
	cpuNanos := int64(gateway.GetCpuMillicores()) * 1_000_000 // Convert millicores to NanoCPUs (1 millicore = 1,000,000 NanoCPUs)
	memoryBytes := int64(gateway.GetMemorySizeMib()) * 1024 * 1024 //nolint:gosec // Intentional conversion

	//nolint:exhaustruct // Docker SDK types have many optional fields
	containerConfig := &container.Config{
		Image: gateway.GetImage(),
		Labels: map[string]string{
			"unkey.gateway.id": gateway.GetGatewayId(),
			"unkey.managed.by": "krane",
		},
		ExposedPorts: exposedPorts,
		Env: []string{
			fmt.Sprintf("UNKEY_WORKSPACE_ID=%s", gateway.GetWorkspaceId()),
			fmt.Sprintf("UNKEY_GATEWAY_ID=%s", gateway.GetGatewayId()),
			fmt.Sprintf("UNKEY_IMAGE=%s", gateway.GetImage()),
		},
	}

	//nolint:exhaustruct // Docker SDK types have many optional fields
	hostConfig := &container.HostConfig{
		PortBindings: portBindings,
		RestartPolicy: container.RestartPolicy{
			Name: "unless-stopped",
		},
		Resources: container.Resources{
			NanoCPUs: cpuNanos,
			Memory:   memoryBytes,
		},
	}

	//nolint:exhaustruct // Docker SDK types have many optional fields
	networkConfig := &network.NetworkingConfig{}

	// Create container

	for i := range req.Msg.GetGateway().GetReplicas() {
		//nolint:exhaustruct // Docker SDK types have many optional fields
		resp, err := d.client.ContainerCreate(
			ctx,
			containerConfig,
			hostConfig,
			networkConfig,
			nil,
			fmt.Sprintf("%s-%d", gateway.GetGatewayId(), i),
		)
		if err != nil {
			return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to create container: %w", err))
		}

		//nolint:exhaustruct // Docker SDK types have many optional fields
		err = d.client.ContainerStart(ctx, resp.ID, container.StartOptions{})
		if err != nil {
			return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to start container: %w", err))
		}
	}

	return connect.NewResponse(&kranev1.CreateGatewayResponse{
		Status: kranev1.GatewayStatus_GATEWAY_STATUS_PENDING,
	}), nil
}
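
A side note on the resource-limit math in CreateGateway: Docker's container.Resources.NanoCPUs is denominated in billionths of a CPU (1 CPU = 1e9 NanoCPUs), so millicores (thousandths of a CPU) scale by a factor of 1,000,000, and MiB memory limits scale by 1024 * 1024 to bytes. A standalone sketch of the two conversions (helper names are illustrative, not from the PR):

```go
package main

import "fmt"

// millicoresToNanoCPUs converts millicores (1000 = one full CPU) into the
// NanoCPUs unit used by Docker's container.Resources (1 CPU = 1e9 NanoCPUs),
// i.e. 1 millicore = 1,000,000 NanoCPUs.
func millicoresToNanoCPUs(millicores uint32) int64 {
	return int64(millicores) * 1_000_000
}

// mibToBytes converts a memory limit in MiB to bytes for Resources.Memory.
func mibToBytes(mib uint32) int64 {
	return int64(mib) * 1024 * 1024
}

func main() {
	fmt.Println(millicoresToNanoCPUs(500)) // half a CPU: 500000000 NanoCPUs
	fmt.Println(mibToBytes(256))           // 268435456 bytes
}
```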

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Guard against nil gateway and double‑check replica loop semantics

The overall shape of CreateGateway looks good: one image existence check, shared configs for ports/resources, and per‑replica container creation & start.

Two things are worth tightening up:

  1. Nil gateway can panic

gateway := req.Msg.GetGateway() can be nil if the client sends a request without the embedded message set. Calling gateway.GetGatewayId()/GetImage() will then panic the handler.

Adding a fast validation guard and returning INVALID_ARGUMENT (or similar) avoids a crash:

-	gateway := req.Msg.GetGateway()
-	d.logger.Info("creating gateway",
+	gateway := req.Msg.GetGateway()
+	if gateway == nil {
+		return nil, connect.NewError(
+			connect.CodeInvalidArgument,
+			fmt.Errorf("gateway is required"),
+		)
+	}
+
+	d.logger.Info("creating gateway",
 		"gateway_id", gateway.GetGatewayId(),
 		"image", gateway.GetImage(),
 	)
  2. Replica loop likely wants an explicit count, not range

The loop

for i := range req.Msg.GetGateway().GetReplicas() {

assumes GetReplicas() is something you can range over. If replicas is actually a numeric count (common for deployment-style APIs), this won’t compile; if it’s a slice, you’re also re-fetching gateway from the request on each iteration instead of using the local gateway.

Assuming replicas is a scalar count, something like this is safer and clearer:

-	// Create container
-
-	for i := range req.Msg.GetGateway().GetReplicas() {
+	// Create containers (one per replica)
+	replicas := int(gateway.GetReplicas()) // adjust type conversion as needed
+	for i := 0; i < replicas; i++ {
 		//nolint:exhaustruct // Docker SDK types have many optional fields
 		resp, err := d.client.ContainerCreate(
 			ctx,
@@
-			nil,
-			fmt.Sprintf("%s-%d", gateway.GetGatewayId(), i),
+			nil,
+			fmt.Sprintf("%s-%d", gateway.GetGatewayId(), i),
 		)

Please verify the actual protobuf type of replicas (int32, uint32, or a repeated message) and adjust the conversion accordingly.

To confirm the field shape, you can run:


🏁 Script executed:

#!/bin/bash
# Inspect the krane gateway proto to see how replicas is defined.
rg -n "message Gateway" -S
rg -n "replicas" -S




---

🏁 Script executed:

```shell
#!/bin/bash
# Check the complete Gateway proto message definition
cat go/proto/krane/v1/gateway.proto | head -50

# Also verify nil-check patterns in similar code (ensureImageExists, container creation)
rg -n "if.*== nil" go/apps/krane/backend/docker/gateway_create.go -A 2

```



---

🏁 Script executed:

```shell
#!/bin/bash
# View the exact loop line in gateway_create.go
sed -n '80,95p' go/apps/krane/backend/docker/gateway_create.go

```


Fix compilation error in replica loop and add nil guard for gateway

The overall shape of CreateGateway looks good: one image existence check, shared configs for ports/resources, and per‑replica container creation & start.

Two critical issues must be fixed:

  1. Nil gateway can panic

gateway := req.Msg.GetGateway() can be nil if the client sends a request without the embedded message set. Calling gateway.GetGatewayId()/GetImage() will then panic the handler.

Adding a fast validation guard and returning INVALID_ARGUMENT (or similar) avoids a crash:

-	gateway := req.Msg.GetGateway()
-	d.logger.Info("creating gateway",
+	gateway := req.Msg.GetGateway()
+	if gateway == nil {
+		return nil, connect.NewError(
+			connect.CodeInvalidArgument,
+			fmt.Errorf("gateway is required"),
+		)
+	}
+
+	d.logger.Info("creating gateway",
 		"gateway_id", gateway.GetGatewayId(),
 		"image", gateway.GetImage(),
 	)
  2. Replica loop cannot compile: GetReplicas() is uint32, not iterable

The proto defines uint32 replicas = 5; in GatewayRequest. In Go, range only works on arrays, slices, maps, strings, and channels—not scalar integers.

The current code:

for i := range req.Msg.GetGateway().GetReplicas() {

will fail to compile. Use an explicit numeric loop instead:

-	// Create container
-
-	for i := range req.Msg.GetGateway().GetReplicas() {
+	// Create containers (one per replica)
+	replicas := gateway.GetReplicas() // uint32
+	for i := uint32(0); i < replicas; i++ {
 		//nolint:exhaustruct // Docker SDK types have many optional fields
 		resp, err := d.client.ContainerCreate(
 			ctx,
 			containerConfig,
 			hostConfig,
 			networkConfig,
 			nil,
-			fmt.Sprintf("%s-%d", gateway.GetGatewayId(), i),
+			fmt.Sprintf("%s-%d", gateway.GetGatewayId(), i),
 		)
🤖 Prompt for AI Agents
In go/apps/krane/backend/docker/gateway_create.go around lines 19-105, add a nil
check for gateway right after retrieving it from the request and return a
connect.NewError(connect.CodeInvalidArgument, fmt.Errorf("gateway must be
provided")) if nil; move any uses of gateway.Get... (including the initial
d.logger.Info call) to after this guard so we don't panic. Replace the invalid
range over GetReplicas() with a numeric loop such as: for i := 0; i <
int(gateway.GetReplicas()); i++ { ... } (use int conversion for
indexing/naming), leaving the container creation/start logic unchanged and
returning the same response on success.

Comment on lines +13 to +48
// DeleteGateway removes all containers for a gateway.
//
// Finds containers by gateway ID label and forcibly removes them with
// volumes and network links to ensure complete cleanup.
func (d *docker) DeleteGateway(ctx context.Context, req *connect.Request[kranev1.DeleteGatewayRequest]) (*connect.Response[kranev1.DeleteGatewayResponse], error) {
	gatewayID := req.Msg.GetGatewayId()

	d.logger.Info("getting gateway", "gateway_id", gatewayID)

	containers, err := d.client.ContainerList(ctx, container.ListOptions{
		Size:   false,
		Latest: false,
		Since:  "",
		Before: "",
		Limit:  0,
		All:    true,
		Filters: filters.NewArgs(
			filters.Arg("label", fmt.Sprintf("unkey.gateway.id=%s", gatewayID)),
		),
	})
	if err != nil {
		return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to list containers: %w", err))
	}

	for _, c := range containers {
		err := d.client.ContainerRemove(ctx, c.ID, container.RemoveOptions{
			RemoveVolumes: true,
			RemoveLinks:   true,
			Force:         true,
		})
		if err != nil {
			return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to remove container: %w", err))
		}
	}
	return connect.NewResponse(&kranev1.DeleteGatewayResponse{}), nil
}

⚠️ Potential issue | 🟡 Minor

❓ Verification inconclusive

Tighten DeleteGateway logging and consider best‑effort, idempotent semantics

The core deletion flow (list by unkey.gateway.id label, All: true, then ContainerRemove with Force and RemoveVolumes) looks good for aggressive cleanup.

Two small points:

  1. The log line currently says "getting gateway" in a delete handler, which is confusing when scanning logs.
  2. Right now, any ContainerRemove error aborts the whole RPC. If a container is already gone between ContainerList and ContainerRemove, that will surface as an internal error rather than a successful, idempotent delete.

You might want to:

  • Fix the log message.
  • Optionally treat "not found" as a non-fatal condition and continue deleting other containers.

Example diff:

-	d.logger.Info("getting gateway", "gateway_id", gatewayID)
+	d.logger.Info("deleting gateway", "gateway_id", gatewayID)
@@
-	for _, c := range containers {
-		err := d.client.ContainerRemove(ctx, c.ID, container.RemoveOptions{
+	for _, c := range containers {
+		err := d.client.ContainerRemove(ctx, c.ID, container.RemoveOptions{
 			RemoveVolumes: true,
 			RemoveLinks:   true,
 			Force:         true,
 		})
 		if err != nil {
-			return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to remove container: %w", err))
+			// Optionally, treat “already gone” as success and continue:
+			// if client.IsErrNotFound(err) { continue }
+			return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to remove container: %w", err))
 		}
 	}

If you choose to special‑case "not found", please double‑check the exact helper (e.g. client.IsErrNotFound) against the Docker client version used in this repo.


🏁 Script executed:

#!/bin/bash
# Inspect available helpers for not-found errors in the Docker client version used here.
rg -n "IsErrNotFound" --type go



I'll search for Docker error handling patterns in the codebase, check the Docker client documentation, and verify the actual file structure and imports.

Based on my verification through web search and documentation:

Findings:

  1. The Docker client library has IsErrNotFound which is an alias for cerrdefs.IsNotFound, though it is deprecated in favor of using cerrdefs.IsNotFound directly.
  2. The library also provides IsErrContainerNotFound specifically for checking when containers are not found.
  3. The review comment correctly identifies two issues: the misleading log message and the non-idempotent error handling.
  4. The suggested solution of using either client.IsErrNotFound (deprecated, use cerrdefs.IsNotFound instead) is valid.

The review comment's analysis is accurate and the suggested error handling improvement is valid. The log message should indeed be changed from "getting gateway" to "deleting gateway," and optionally checking for "not found" errors to make the delete operation more resilient is a reasonable design choice.


Fix the misleading log message and consider best-effort error handling for idempotent semantics

The deletion flow (list by unkey.gateway.id label with All: true, then ContainerRemove with Force and RemoveVolumes) is sound.

Two small improvements:

  1. The log message says "getting gateway" in a delete handler; change it to "deleting gateway" to match the operation.
  2. Currently, any ContainerRemove error aborts. If a container is already gone between ContainerList and ContainerRemove, it surfaces as an internal error. Optionally, treat "not found" as non-fatal and continue deleting other containers for better idempotency.

Suggested changes:

-	d.logger.Info("getting gateway", "gateway_id", gatewayID)
+	d.logger.Info("deleting gateway", "gateway_id", gatewayID)
@@
-	for _, c := range containers {
-		err := d.client.ContainerRemove(ctx, c.ID, container.RemoveOptions{
+	for _, c := range containers {
+		err := d.client.ContainerRemove(ctx, c.ID, container.RemoveOptions{
 			RemoveVolumes: true,
 			RemoveLinks:   true,
 			Force:         true,
 		})
 		if err != nil {
-			return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to remove container: %w", err))
+			// Optionally treat "container not found" as success and continue:
+			// import "github.com/docker/docker/errdefs"
+			// if errdefs.IsNotFound(err) { continue }
+			return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to remove container: %w", err))
 		}
 	}
🤖 Prompt for AI Agents
In go/apps/krane/backend/docker/gateway_delete.go around lines 13 to 48, the
handler logs "getting gateway" but is performing a delete and also treats any
ContainerRemove error as fatal; change the log entry to "deleting gateway" and
update the ContainerRemove error handling so that if the error is a Docker "not
found" error (use cerrdefs.IsNotFound or the appropriate client helper) you
treat it as non-fatal and continue deleting other containers, while still
returning a connect internal error for any other failures.

Comment on lines +13 to +65
// GetGateway retrieves container status and addresses for a deployment.
//
// Finds containers by gateway ID label and returns instance information
// with host.docker.internal addresses using dynamically assigned ports.
func (d *docker) GetGateway(ctx context.Context, req *connect.Request[kranev1.GetGatewayRequest]) (*connect.Response[kranev1.GetGatewayResponse], error) {
	gatewayID := req.Msg.GetGatewayId()
	d.logger.Info("getting gateway", "gateway_id", gatewayID)

	//nolint:exhaustruct // Docker SDK types have many optional fields
	containers, err := d.client.ContainerList(ctx, container.ListOptions{
		All: true,
		Filters: filters.NewArgs(
			filters.Arg("label", fmt.Sprintf("unkey.gateway.id=%s", gatewayID)),
		),
	})
	if err != nil {
		return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to list containers: %w", err))
	}

	res := &kranev1.GetGatewayResponse{
		Instances: []*kranev1.GatewayInstance{},
	}

	for _, c := range containers {
		d.logger.Info("container found", "container", c)

		// Determine container status
		status := kranev1.GatewayStatus_GATEWAY_STATUS_UNSPECIFIED
		switch c.State {
		case container.StateRunning:
			status = kranev1.GatewayStatus_GATEWAY_STATUS_RUNNING
		case container.StateExited:
			status = kranev1.GatewayStatus_GATEWAY_STATUS_TERMINATING
		case container.StateCreated:
			status = kranev1.GatewayStatus_GATEWAY_STATUS_PENDING
		}

		d.logger.Info("gateway found",
			"gateway_id", gatewayID,
			"container_id", c.ID,
			"status", status.String(),
			"port", c.Ports[0].PublicPort,
		)

		res.Instances = append(res.Instances, &kranev1.GatewayInstance{
			Id:      c.ID,
			Address: fmt.Sprintf("host.docker.internal:%d", c.Ports[0].PublicPort),
			Status:  status,
		})
	}

	return connect.NewResponse(res), nil
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Avoid panics on empty port lists and fix minor gateway doc/comment details

The general approach in GetGateway (label‑filtered ContainerList, per‑container status mapping, returning GatewayInstances with host+port) looks solid, but there are a couple of sharp edges:

  1. Potential panic on c.Ports[0]

You index c.Ports[0].PublicPort without checking that Ports is non‑empty:

"port", c.Ports[0].PublicPort,
...
Address: fmt.Sprintf("host.docker.internal:%d", c.Ports[0].PublicPort),

If Docker returns a container summary with no published ports (empty slice), this will panic with index out of range. That can happen if:

  • The container was created without port bindings (e.g. via docker CLI or another tool but still labeled).
  • The Docker API behavior changes or a gateway is misconfigured.
  • The container is in a transient state where Ports is empty.

A small defensive check avoids crashing the whole RPC:

-	for _, c := range containers {
+	for _, c := range containers {
@@
-		d.logger.Info("gateway found",
+		if len(c.Ports) == 0 {
+			d.logger.Info("gateway container has no published ports; skipping instance",
+				"gateway_id", gatewayID,
+				"container_id", c.ID,
+				"state", c.State,
+			)
+			continue
+		}
+
+		publicPort := c.Ports[0].PublicPort
+
+		d.logger.Info("gateway found",
 			"gateway_id", gatewayID,
 			"container_id", c.ID,
 			"status", status.String(),
-			"port", c.Ports[0].PublicPort,
+			"port", publicPort,
 		)
 
 		res.Instances = append(res.Instances, &kranev1.GatewayInstance{
 			Id:      c.ID,
-			Address: fmt.Sprintf("host.docker.internal:%d", c.Ports[0].PublicPort),
+			Address: fmt.Sprintf("host.docker.internal:%d", publicPort),
 			Status:  status,
 		})
  2. Doc/naming nit

The comment says:

// GetGateway retrieves container status and addresses for a deployment.

Since this is the gateway handler, consider updating “deployment” to “gateway” for clarity:

-// GetGateway retrieves container status and addresses for a deployment.
+// GetGateway retrieves container status and addresses for a gateway.

I'll verify this review by checking the actual code, examining Docker SDK behavior, and searching for related patterns in the codebase.





🏁 Script executed:

#!/bin/bash
# Locate and verify the file exists
fd -t f "gateway_get.go" --path "*/krane/backend/docker/*"



🏁 Script executed:

#!/bin/bash
# Verify the code matches the snippet — check for c.Ports[0] access
rg -n "c\.Ports\[0\]" --type go go/apps/krane/backend/docker/



🏁 Script executed:

#!/bin/bash
# Search for other instances where Ports is checked for length
rg -n "len.*Ports" --type go go/apps/krane/backend/docker/ -C 2



🏁 Script executed:

#!/bin/bash
# Search for ContainerList usage in the codebase to see if other handlers check ports
rg -n "ContainerList" --type go go/apps/krane/backend/docker/ -C 3



🌐 Web query:

Docker SDK Go container.Types Ports field empty slice behavior

💡 Result:

Short answer: In the Go Docker SDK the Container.Ports field is a slice ([]Port). If that slice is nil it will marshal to JSON null; if it is a non-nil empty slice (len==0) it will marshal to an empty array ([]). To guarantee clients see [] (not null) initialize the slice (e.g. Ports: []types.Port{}). [1][2][3]

Sources

  • Docker / moby types.Container shows Ports is a slice. [1]
  • Go JSON marshal behavior: nil slice → null, non-nil empty slice → []. (see encoding/json notes / docs). [2]
  • Docker Engine API notes: the API may omit empty fields in responses; if you need stable JSON arrays prefer returning [] explicitly. [3]

References
[1] moby/api/types (Container.Ports).
[2] encoding/json behavior for nil vs empty slices (Go docs / encoding/json v1/v2 notes).
[3] Docker Engine API docs — empty/unsupported fields may be omitted in responses.



---

🏁 Script executed:

```shell
#!/bin/bash
# Verify deployment_get.go has the same issue
rg -n -A 15 "c.Ports\[0\]" go/apps/krane/backend/docker/deployment_get.go | head -30

```


Add defensive port check to prevent panic; also exists in deployment_get.go

Verification confirms a critical panic risk in GetGateway. The code accesses c.Ports[0] without checking that the slice is non-empty (lines 54 and 59). If Docker returns a container with no published ports—a realistic scenario with misconfigured containers or API edge cases—the handler will crash with an index-out-of-bounds panic.

The same vulnerability also exists identically in deployment_get.go (lines 54 and 59), making this a systemic issue across both handlers.

Fixes needed:

  1. Add defensive check before accessing c.Ports[0] in both gateway_get.go and deployment_get.go:
if len(c.Ports) == 0 {
	d.logger.Info("container has no published ports; skipping instance",
		"gateway_id", gatewayID,
		"container_id", c.ID,
		"state", c.State,
	)
	continue
}

publicPort := c.Ports[0].PublicPort

Then use publicPort in both the logging and Address field.

  2. Update the doc comment in gateway_get.go (line 13) from "for a deployment" to "for a gateway" for consistency.
🤖 Prompt for AI Agents
In go/apps/krane/backend/docker/gateway_get.go (lines 13-65) and the identical
pattern in deployment_get.go (check around lines ~54 and ~59), the code indexes
c.Ports[0] without a nil/length check causing a panic if no published ports
exist; add a defensive check: if len(c.Ports) == 0 { d.logger.Info("container
has no published ports; skipping instance", "gateway_id", gatewayID,
"container_id", c.ID, "state", c.State); continue } then extract publicPort :=
c.Ports[0].PublicPort and use publicPort in the existing logging and Address
fmt.Sprintf("host.docker.internal:%d", publicPort); additionally update the doc
comment in gateway_get.go line 13 from "for a deployment" to "for a gateway".

@github-actions
Contributor

Thank you for following the naming conventions for pull request titles! 🙏

@vercel vercel bot temporarily deployed to Preview – dashboard November 26, 2025 12:26 Inactive
@vercel vercel bot temporarily deployed to Preview – engineering November 26, 2025 12:26 Inactive
@chronark chronark merged commit 7a6a99d into main Nov 26, 2025
20 checks passed
@chronark chronark deleted the 11-19-feat_implement_krane_gateway_rpcs_for_docker branch November 26, 2025 12:39
