
feat(deploy): redesign metald network architecture #3818

Merged

chronark merged 2 commits into main from push-nuymlkoqsrmv on Sep 3, 2025

Conversation


@imeyer imeyer commented Aug 20, 2025

What does this PR do?

Fixes # (issue)

If there is not an issue for this, please create one first. This is used for tracking purposes and also helps us understand why this PR exists

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Chore (refactoring code, technical debt, workflow improvements)
  • Enhancement (small improvements)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How should this be tested?

  • Test A
  • Test B

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Contributing Guide
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand areas
  • Ran pnpm build
  • Ran pnpm fmt
  • Checked for warnings, there are none
  • Removed all console.logs
  • Merged the latest changes from main onto my branch with git pull origin main
  • My changes don't cause any responsiveness issues

Appreciated

  • If a UI change was made: Added a screen recording or screenshots to this PR
  • Updated the Unkey Docs if changes were necessary

Summary by CodeRabbit

  • New Features
    • Optional step-based Docker build pipeline in builderd (feature flag).
    • Firecracker backend with integrated jailer and VM metrics for metald.
    • New cleanup script to remove all Deploy components.
  • Improvements
    • More robust startup/shutdown and retryable base-asset initialization.
    • Systemd units now enforce dependencies (spire-agent, assetmanagerd, metald).
  • Breaking Changes
    • metald client/CLI now use ProjectID and EnvironmentID (UserID removed); config templates removed.
  • Bug Fixes
    • Safer rootfs optimization with path validation; adjusted telemetry timeouts.
  • Documentation
    • Removed legacy guides, examples, and environment templates.
  • Chores
    • Dependency updates across modules.


changeset-bot bot commented Aug 20, 2025

⚠️ No Changeset found

Latest commit: c2cb27f

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types



vercel bot commented Aug 20, 2025

The latest updates on your projects.

2 Skipped Deployments

  • dashboard — Ignored (Preview) — updated Sep 3, 2025 7:23pm (UTC)
  • engineering — Ignored (Preview) — updated Sep 3, 2025 7:23pm (UTC)


coderabbitai bot commented Aug 22, 2025

📝 Walkthrough

Removes multiple repository docs; adjusts Makefiles and systemd units; adds a Makefile target for proto generation; modifies DeployWorkflow to insert a build record; introduces builderd asset init with retry, a step-based Docker pipeline executor, shutdown tests, and metrics changes; overhauls metald: client/config/service, database (sqlc), networking model, Firecracker backend, Docker backend, health tests; adds a cleanup script.

Changes

Cohort / File(s) Summary
Repository docs cleanup
CLAUDE.md, go/deploy/CLAUDE.md, go/deploy/metald/contrib/systemd/README.md, go/deploy/metald/contrib/systemd/environment.example, go/deploy/metald/contrib/systemd/fedora-installation.md, go/deploy/metald/client/examples/*, go/deploy/metald/internal/jailer/README.md, go/deploy/metald/internal/service/CLEANUP_BENCHMARKS.md
Remove various guidance and example configuration/documentation files.
Control-plane DeployWorkflow
go/apps/ctrl/services/deployment/deploy_workflow.go
Remove partitionDB field; add Step 3 to insert build into DB; minor imports update.
Top-level Go Makefile
go/Makefile
Add target generate-builder to run buf codegen for proto/deploy/builderd.
Assetmanagerd service changes
go/deploy/assetmanagerd/Makefile, .../contrib/systemd/assetmanagerd.service, .../go.mod, .../internal/service/service.go
Simplify Makefile (remove create-user/env-example/generate, adjust all/build/install); systemd now Requires/After spire-agent; bump protobuf indirect; remove tenant context usage in queries.
Builderd build/run config
go/deploy/builderd/Makefile, .../contrib/systemd/builderd.service, .../environment.example, .../go.mod
Trim Makefile (remove create-user, lint/proto targets, quick-build, generate prereq); systemd adds Requires (spire-agent, assetmanagerd) and env flag to disable pipeline executor; delete env example; bump protobuf.
Builderd startup and shutdown
go/deploy/builderd/cmd/builderd/main.go, .../cmd/builderd/shutdown_test.go
Switch to assets.BaseAssetManager with retry/metrics; add robust shutdown with mutex/timeout/promo server handling; add concurrent/sequence/timeout shutdown tests.
Builderd assets and metrics
go/deploy/builderd/internal/assets/base.go, .../internal/observability/metrics.go, .../internal/observability/otel.go
Add MetricsRecorder, WithMetrics, InitializeBaseAssetsWithRetry; replace tenant metrics with base-asset-init metrics, adjust method signatures; reduce OTLP export timeouts to 10s.
Builderd executor (pipeline feature flag)
go/deploy/builderd/internal/config/config.go, .../internal/executor/registry.go, .../internal/executor/docker_pipeline.go, .../internal/executor/steps.go, .../internal/executor/docker_steps.go
Add UsePipelineExecutor config; register DockerPipelineExecutor when enabled; add pipeline types/steps and orchestration for Docker builds.
Builderd executor (monolithic edits)
go/deploy/builderd/internal/executor/docker.go
Remove tenant context in logs/metrics; drop TenantID from BuildResult literal.
Builderd minor imports
go/deploy/builderd/client/client.go, .../internal/assetmanager/client.go, .../internal/service/builder.go, .../internal/tenant/manager.go
Import reordering and metrics call arg updates; no behavior change otherwise.
Metald CLI and client
go/deploy/metald/client/client.go, .../client/cmd/metald-cli/main.go, .../client/config.go, .../client/vmconfig.go, .../client/example_test.go, .../client/examples/*, .../client/go.mod
Client adds ProjectID/EnvironmentID; headers updated; CLI adds flags; remove built-in config templates and file-save helpers; drop template constructor; remove examples; update toolchain and replaces.
Metald service entrypoints
go/deploy/metald/cmd/metald/main.go, .../cmd/metald-init/main.go
Replace reconciler/network manager flow; use new database handler and new proto handler; raise log level; update octal literals and cmdline parsing in metald-init.
Metald systemd and Makefile
go/deploy/metald/contrib/systemd/metald.service, .../Makefile, .../README.md, .../.gitignore
Systemd adds Requires/After deps; remove mkdir netns prestart; Makefile adds bridge targets, changes build/test flags, removes dev/metrics; doc wording updates; ignore networks-seed.sql.
Metald backend (Docker)
go/deploy/metald/internal/backend/docker/*
Use crypto/rand for ports and safe conversions; adjust proto imports; port mapping uses 32-bit; minor logging and network info omissions.
Metald backend (Firecracker new)
go/deploy/metald/internal/backend/firecracker/*
Add new Client/types; implement CreateVM/BootVM/DeleteVM/Shutdown/Pause/Resume/Reboot/GetVMInfo/GetVMMetrics; asset management helpers; kernel args and metadata helpers; config builder; validation. Remove old SDK v4 and tests.
Metald backend removals (Cloud Hypervisor)
go/deploy/metald/internal/backend/cloudhypervisor/*
Remove Cloud Hypervisor client and types.
Metald backend types
go/deploy/metald/internal/backend/types/backend.go
Update proto import path; remove VMInfo.NetworkInfo.
Metald networking (overhaul)
go/deploy/metald/internal/network/*
Remove old allocator/manager/metrics/protection/impl/tests; add simplified Manager/Config, bridge ensure, tap placeholder; introduce subnet calculation utilities and expanded VMNetwork fields; change device name generation (Bridge field).
Metald database (migration to sqlc)
go/deploy/metald/internal/database/*
Remove old Database and VMRepository; add sqlc-generated DBTX, Queries, models, querier, and queries; add Database handler constructors.
Metald service layer
go/deploy/metald/internal/service/vm.go, .../internal/service/auth.go
VMService now uses database.Querier; simplifies VM ops; CreateVm returns HostPortPair; remove ListVms and ownership checks. Auth context shifts to User/Tenant/Project/Environment with new extractors; update dev tokens and baggage keys.
Metald health tests
go/deploy/metald/internal/health/handler_test.go, .../internal/health/system_test.go
Add comprehensive health/system info test suites with a mocked backend.
Cleanup tooling
go/deploy/cleanup-unkey-deploy.sh
Add root-only teardown script cleaning services, networks, binaries, configs, data, users/groups, iptables, and cgroups with confirmations.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Operator
  participant Builderd as builderd main
  participant Assets as BaseAssetManager
  participant Metrics as BuildMetrics

  Operator->>Builderd: start
  Builderd->>Assets: NewBaseAssetManager().WithMetrics(Metrics)
  loop retry until success/cancel
    Builderd->>Assets: InitializeBaseAssetsWithRetry(ctx)
    alt retryable error
      Assets-->>Metrics: RecordBaseAssetInitRetry(attempt, reason)
    end
  end
  alt failed after retries
    Assets-->>Metrics: RecordBaseAssetInitFailure(total_attempts, error)
  end
  Builderd-->>Operator: ready
sequenceDiagram
  autonumber
  actor API
  participant ExecReg as ExecutorRegistry
  participant DPExec as DockerPipelineExecutor
  participant Pipe as BuildPipeline
  participant Steps as StepExecutors

  API->>ExecReg: select executor (docker)
  alt UsePipelineExecutor=true
    ExecReg-->>API: DockerPipelineExecutor
    API->>DPExec: Execute(request)
    DPExec->>Pipe: Execute(StepInput)
    Pipe->>Steps: PullImage → CreateContainer → ExtractMetadata → ExtractFilesystem → OptimizeRootfs → Cleanup
    Steps-->>Pipe: StepOutput propagation
    Pipe-->>DPExec: BuildResult(rootfs, metadata)
    DPExec-->>API: BuildResult
  else
    ExecReg-->>API: DockerExecutor (monolithic)
  end
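The executor selection in the diagram above boils down to a feature-flag check at registry time. A minimal sketch, assuming illustrative type names rather than builderd's actual registry API:

```go
package main

import "fmt"

// Executor is a stand-in for builderd's build executor interface.
type Executor interface{ Name() string }

type monolithicDocker struct{}

func (monolithicDocker) Name() string { return "docker" }

type dockerPipeline struct{}

func (dockerPipeline) Name() string { return "docker-pipeline" }

// Config mirrors the UsePipelineExecutor flag described in the walkthrough.
type Config struct{ UsePipelineExecutor bool }

// NewDockerExecutor returns the step-based pipeline executor only when the
// feature flag is enabled, otherwise the existing monolithic executor.
func NewDockerExecutor(cfg Config) Executor {
	if cfg.UsePipelineExecutor {
		return dockerPipeline{}
	}
	return monolithicDocker{}
}

func main() {
	fmt.Println(NewDockerExecutor(Config{UsePipelineExecutor: true}).Name()) // docker-pipeline
	fmt.Println(NewDockerExecutor(Config{}).Name())                         // docker
}
```

Gating the new pipeline behind a flag like this lets the monolithic path remain the default while the step-based path is rolled out.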
sequenceDiagram
  autonumber
  actor Client
  participant Service as metald VMService
  participant DB as database.Querier
  participant Backend as Backend(Firecracker/Docker)

  Client->>Service: CreateVm(request)
  Service->>DB: AllocateNetwork()
  DB-->>Service: Network allocation
  Service->>Backend: CreateVM(config)
  Backend-->>Service: vm_id
  Service-->>Client: CreateVmResponse(state=CREATED, host_port_pair)
sequenceDiagram
  autonumber
  participant Ctrl as ctrl DeployWorkflow
  participant DB as db.Queries

  Ctrl->>Ctrl: log-deployment-pending
  Ctrl->>DB: InsertBuild(build_id, workspace_id, project_id, deployment_id, created_at)
  alt error
    DB-->>Ctrl: error
    Ctrl-->>Ctrl: abort
  else
    DB-->>Ctrl: ok
    Ctrl->>Ctrl: proceed to build version
  end
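Separately, the Docker backend cohort above notes a switch to crypto/rand for host-port selection with safe 32-bit conversions. A minimal sketch of that pattern; the ephemeral range below and the absence of a collision-retry loop are assumptions for illustration:

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

const (
	portMin int32 = 32768
	portMax int32 = 60999
)

// randomHostPort picks a uniformly random port in [portMin, portMax] using a
// cryptographically secure source, avoiding math/rand's predictability.
func randomHostPort() (int32, error) {
	n, err := rand.Int(rand.Reader, big.NewInt(int64(portMax-portMin)+1))
	if err != nil {
		return 0, fmt.Errorf("read random: %w", err)
	}
	// Safe conversion: portMin + n is at most portMax, which fits in int32.
	return portMin + int32(n.Int64()), nil
}

func main() {
	p, err := randomHostPort()
	if err != nil {
		panic(err)
	}
	fmt.Println(p >= portMin && p <= portMax) // true
}
```

A real allocator would still need to handle the chosen port already being in use, typically by retrying or by asking the kernel to bind port 0.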

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~180 minutes

fix(metald): stuff

fix(metald): clean up network package

feat: overhaul metald
@imeyer imeyer marked this pull request as ready for review September 3, 2025 19:04
@imeyer imeyer enabled auto-merge September 3, 2025 19:04
@vercel vercel bot temporarily deployed to Preview – dashboard September 3, 2025 19:05 Inactive
@vercel vercel bot temporarily deployed to Preview – engineering September 3, 2025 19:05 Inactive

chronark commented Sep 3, 2025

[Screenshot: CleanShot 2025-09-03 at 21.04.52@2x]

I declare bankruptcy


imeyer commented Sep 3, 2025

broke.. like my spirit


chronark commented Sep 3, 2025

[Image: image.png]

Collaborator:

lgtm except this file, can you revert this file, or use the one that is in main, cause we need the partition db and we don't need the insert-build step.
I'd be surprised if this actually builds successfully right now

Collaborator:

happy to merge this as is after that

Contributor Author:

ah that's weird.. my LSP didn't see that as an issue. on it.

Contributor Author:

🤔 Hm, I thought the side I chose was main but I think I chose the wrong side oops. Well that makes me worried for the rest of the rebase action I had yesterday but.. the changes shouldn't impact anything in production so I think we just fix as necessary and do better in the future lol.

Contributor Author:

can you look at this and resolve the convo if it's correct @chronark


github-actions bot commented Sep 3, 2025

Thank you for following the naming conventions for pull request titles! 🙏


imeyer commented Sep 3, 2025

lmao i forgot to commit the changes i made while we were on the hangout


chronark commented Sep 3, 2025

works for me


chronark commented Sep 3, 2025

you sure about those uints? you had pretty strong opinions about keeping it as int earlier this week 😛


chronark commented Sep 3, 2025

ok cool. if CI is green, I’ll merge


imeyer commented Sep 3, 2025

i'm one of those "i'll address it in the next iteration" kind of people... the rabbit just tries to hold me back!!!


chronark commented Sep 3, 2025

I should’ve led with a sarcasm warning

@chronark chronark disabled auto-merge September 3, 2025 19:31
@chronark chronark merged commit f4183e2 into main Sep 3, 2025
16 checks passed
@chronark chronark deleted the push-nuymlkoqsrmv branch September 3, 2025 19:31

imeyer commented Sep 3, 2025

biggest yeet this side of 2025

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 181

 google.golang.org/genproto/googleapis/rpc v0.0.0-20250707201910-8d1bb00bc6a7 // indirect
 google.golang.org/grpc v1.73.0 // indirect
-google.golang.org/protobuf v1.36.6 // indirect
+google.golang.org/protobuf v1.36.8 // indirect

💡 Verification agent

🧩 Analysis chain

Proto bump looks fine; verify repo-wide alignment and regen if needed.

Confirm all modules now resolve google.golang.org/protobuf v1.36.8 and regenerate any generated protobuf code if you vendor stubs or pin protoc-gen-go. Run go mod tidy afterwards.


🏁 Script executed:

#!/bin/bash
# Check protobuf version across all modules
fd go.mod | xargs -I{} bash -lc 'echo "== {} =="; rg -n "google.golang.org/protobuf" {} || true'
# Inspect effective deps in this module
( cd go/deploy/assetmanagerd && go list -m all | rg "^google.golang.org/(grpc|protobuf|genproto)" )
# Optional: locate protoc-gen-go usage to decide if regen is needed
rg -nP "(protoc|buf).*protoc-gen-go" -g '!**/node_modules/**' -C2

Length of output: 3027


Align google.golang.org/protobuf to v1.36.8 repository-wide
Bump protobuf to v1.36.8 in the following go.mod files (currently on older versions):

  • apps/agent/go.mod (v1.36.5)
  • apps/chproxy/go.mod (v1.36.6)
  • go/deploy/billaged/go.mod (v1.36.6)
  • go/deploy/metald/go.mod (v1.36.6)
  • go/deploy/pkg/observability/interceptors/go.mod (v1.36.6)
  • go/deploy/pkg/spiffe/go.mod (v1.36.5)
  • go/deploy/pkg/telemetry/go.mod (v1.36.6)
  • go/deploy/pkg/tls/go.mod (v1.36.5)

After aligning versions, regenerate any vendored protobuf stubs or run your protoc-gen-go pipeline (none found via rg search) and run go mod tidy in each module.


 # Targets (alphabetically ordered)
-.PHONY: all build check clean create-user deps dev env-example fmt generate help install lint lint-proto run service-logs service-logs-tail service-restart service-start service-status service-stop setup test test-coverage uninstall version vet
+.PHONY: all build check clean deps dev env-example fmt generate help install lint lint-proto run service-logs service-logs-tail service-restart service-start service-status service-stop setup test test-coverage uninstall version vet

⚠️ Potential issue

PHONY lists non-existent targets (generate, env-example)

They were removed but still declared; this breaks expectations and confuses help.

-.PHONY: all build check clean deps dev env-example fmt generate help install lint lint-proto run service-logs service-logs-tail service-restart service-start service-status service-stop setup test test-coverage uninstall version vet
+.PHONY: all build check clean deps dev fmt help install lint lint-proto run service-logs service-logs-tail service-restart service-start service-status service-stop setup test test-coverage uninstall version vet
🤖 Prompt for AI Agents
In go/deploy/assetmanagerd/Makefile around line 39, the .PHONY declaration still
lists targets that no longer exist (generate, env-example); remove those entries
from the .PHONY line or restore the corresponding targets so the declaration
matches actual Makefile targets, and then run make help to verify output and
update any documentation referencing them.

 .PHONY: all build check clean deps dev env-example fmt generate help install lint lint-proto run service-logs service-logs-tail service-restart service-start service-status service-stop setup test test-coverage uninstall version vet

-all: clean generate build ## Clean, generate, and build
+all: clean build ## Clean, generate, and build

🧹 Nitpick (assertive)

Update help text for all target

It no longer generates; adjust description.

-all: clean build ## Clean, generate, and build
+all: clean build ## Clean and build
🤖 Prompt for AI Agents
In go/deploy/assetmanagerd/Makefile around line 41, the `all` target help text
still says "Clean, generate, and build" but the target no longer performs
generation; update the description to accurately reflect behavior (e.g., "Clean
and build" or similar). Edit the comment after the target to replace "Clean,
generate, and build" with the corrected short description and ensure formatting
matches the existing help comment style.

Comment on lines 57 to 58
dev: generate ## Run in development mode
@go run ./cmd/$(SERVICE_NAME)

⚠️ Potential issue

Broken deps: dev/setup depend on missing “generate”

These targets will fail with “No rule to make target ‘generate’”.

-dev: generate ## Run in development mode
+dev: ## Run in development mode
 	@go run ./cmd/$(SERVICE_NAME)
@@
-setup: deps generate ## Complete development setup
+setup: deps ## Complete development setup

Also applies to: 111-112

🤖 Prompt for AI Agents
go/deploy/assetmanagerd/Makefile around lines 57-58 (and also 111-112): the dev
and dev/setup targets declare a dependency on a non-existent "generate" target
which causes "No rule to make target 'generate'"; add a concrete "generate"
target before those lines that runs the project’s code-generation steps (for
example run go generate ./... and any protoc/Swagger/gen commands your repo
requires) or, if generation is not needed for development, remove the "generate"
dependency from the dev and dev/setup targets; ensure the new target is marked
.PHONY and invokes the exact generation commands used elsewhere in the repo so
the make targets succeed.

Comment on lines +14 to +15
assetv1 "github.com/unkeyed/unkey/go/gen/proto/deploy/assetmanagerd/v1"
"github.com/unkeyed/unkey/go/gen/proto/deploy/assetmanagerd/v1/assetmanagerdv1connect"

💡 Verification agent

🧩 Analysis chain

Import reorder is a no-op — OK

No behavioral change. Proceed.

Given significant builderd changes in this PR, remember the learning to bump the patch-level version for the service. If applicable here, please confirm the version bump.


🏁 Script executed:

#!/bin/bash
# Search for builderd version declarations/bakes to confirm a patch bump happened.
rg -nP -C2 '(?i)\b(version|appVersion|buildVersion)\b' go/deploy/builderd | sed -n '1,200p'

Length of output: 9672


Bump all version references to 0.5.1
Makefile is at 0.5.1, but README.md still shows 0.1.0 (go/deploy/builderd/README.md:278) and the CLI help default OTEL version shows “default: 0.1.0” (go/deploy/builderd/cmd/builderd/main.go:478). Update both to 0.5.1.

🤖 Prompt for AI Agents
In go/deploy/builderd/internal/assetmanager/client.go around lines 14-15 and
also update go/deploy/builderd/README.md (around line ~278) and
go/deploy/builderd/cmd/builderd/main.go (around line ~478), bump all version
references from 0.1.0 to 0.5.1; specifically replace the README’s displayed
version and the CLI help default OTEL version string with "0.5.1" and ensure any
other occurrences in these files or nearby lines are updated to match 0.5.1 so
Makefile, docs, and CLI defaults are consistent.

Comment on lines 198 to 201
// TODO: In production, implement proper tenant-user relationship checks
// This should query a tenant membership service or database
// This should query a tenant membership service or database using ctx for timeouts/tracing
_ = ctx // Will be used for database queries in production implementation


🧹 Nitpick (assertive)

Upgrade TODO to AIDEV-TODO per repo guidelines and emphasize context usage.

-	// TODO: In production, implement proper tenant-user relationship checks
-	// This should query a tenant membership service or database using ctx for timeouts/tracing
+	// AIDEV-TODO: In production, implement proper tenant-user relationship checks.
+	// Use ctx for deadlines/tracing; query an authz service or DB for membership/roles.
 	_ = ctx // Will be used for database queries in production implementation
🤖 Prompt for AI Agents
In go/deploy/metald/internal/service/auth.go around lines 198 to 201, replace
the generic TODO comment and the placeholder "_ = ctx" with an AIDEV-TODO per
repo guidelines and update the note to explicitly require using the passed ctx
for timeouts, cancellation and tracing when querying the tenant membership
service or DB; in practice change the comment to start with "AIDEV-TODO:" and
state that the implementation must query a tenant membership store using ctx
(context-aware DB calls, propagate tracing/span IDs, respect
deadlines/cancellations) so future implementers know to use ctx rather than
ignore it.

Comment on lines +209 to 211
if requestedTenantID == "restricted-tenant" && customerCtx.UserID != "admin-user" {
return fmt.Errorf("access denied: user %s cannot access restricted tenant", customerCtx.UserID)
}

🛠️ Refactor suggestion

Gate the demo “restricted-tenant” rule to dev mode.

Prevents accidental prod behavior differences.

-	if requestedTenantID == "restricted-tenant" && customerCtx.UserID != "admin-user" {
+	if os.Getenv("UNKEY_METALD_AUTH_MODE") == "dev" &&
+		requestedTenantID == "restricted-tenant" && customerCtx.UserID != "admin-user" {
 		return fmt.Errorf("access denied: user %s cannot access restricted tenant", customerCtx.UserID)
 	}

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In go/deploy/metald/internal/service/auth.go around lines 209-211, the hardcoded
rule allowing only admin-user to access "restricted-tenant" must be limited to
development mode; modify the conditional to check a dev-mode flag (preferably an
injected config like cfg.IsDev or, if not available, os.Getenv("APP_ENV") ==
"development") in addition to requestedTenantID == "restricted-tenant" and
customerCtx.UserID != "admin-user", so the access-deny logic only runs when
running in dev mode.

 GOOS ?= $(shell go env GOOS)
 GOARCH ?= $(shell go env GOARCH)
-LDFLAGS := -ldflags "-s -w -X main.version=$(VERSION)"
+LDFLAGS := -ldflags "-s=false -w=false -X main.version=$(VERSION)"

⚠️ Potential issue

Debug flags inappropriately set in production build.

The LDFLAGS have been modified to disable stripping (-s=false -w=false), which will significantly increase binary size and potentially expose sensitive debug information in production builds. These flags should only be used in debug builds.

Apply this diff to fix the production build flags:

-LDFLAGS := -ldflags "-s=false -w=false -X main.version=$(VERSION)"
+LDFLAGS := -ldflags "-s -w -X main.version=$(VERSION)"

For debug builds, you should use separate debug flags:

+DEBUG_LDFLAGS := -ldflags "-X main.version=$(VERSION)"
🤖 Prompt for AI Agents
In go/deploy/metald/Makefile around line 11, LDFLAGS currently disable symbol
and DWARF stripping (-s=false -w=false); change production LDFLAGS to enable
stripping (-s -w) while preserving the version injection, and implement a
separate debug path (e.g., a DEBUG flag or distinct DEBUG_LDFLAGS) that sets
unstripped flags for local/dev builds; ensure the default Makefile target uses
the stripped production LDFLAGS and only uses the debug flags when an explicit
DEBUG=true (or similar) is provided.

Comment on lines +80 to 83
Metald runs as root to manage network namespaces, interfaces, and iptables operations. This is acceptable as metald is designed to be the sole application on dedicated VM hosts. The integrated jailer still drops privileges to specified UID/GID for individual VM processes, ensuring proper isolation.

The `make install` command configures these automatically.
The `make install` command configures the service with appropriate permissions automatically.


🧹 Nitpick (assertive)

Fix double-spacing and add an AIDEV note on privileged operation.

Fixes two grammar nits and adds an anchor reminding maintainers to revisit hardening.

-Metald runs as root to manage network namespaces, interfaces, and iptables operations. This is acceptable as metald is designed to be the sole application on dedicated VM hosts. The integrated jailer still drops privileges to specified UID/GID for individual VM processes, ensuring proper isolation.
+Metald runs as root to manage network namespaces, interfaces, and iptables operations. This is acceptable as metald is designed to be the sole application on dedicated VM hosts. The integrated jailer still drops privileges to specified UID/GID for individual VM processes, ensuring proper isolation.
+
+<!-- AIDEV-NOTE: Running as root -->
+When feasible, apply systemd hardening (ProtectSystem, ProtectControlGroups, PrivateUsers, CapabilityBoundingSet) and network namespace isolation to reduce blast radius.
 
-The `make install` command configures the service with appropriate permissions automatically.
+The `make install` command configures the service with appropriate permissions automatically.
🧰 Tools
🪛 LanguageTool

[grammar] ~80-~80: Use correct spacing
Context: ...VM processes, ensuring proper isolation. The make install command configures th...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~82-~82: Use correct spacing
Context: ...h appropriate permissions automatically. ## Contributing See [Development Setup](./...

(QB_NEW_EN_OTHER_ERROR_IDS_5)

🤖 Prompt for AI Agents
In go/deploy/metald/README.md around lines 80 to 83, fix the double-spacing in
the paragraph about metald running as root (remove any extra spaces between
sentences) and append a short AIDEV note/anchor reminding maintainers to revisit
hardening of privileged operations (e.g., add a single-line anchor or NOTE
(AIDEV): comment stating "revisit privileged operation hardening and service
isolation" and linkable anchor name). Ensure the sentence flow remains
grammatical and the new AIDEV note is concise and clearly identifiable for
future reference.
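For reference, the hardening directives named above might appear in the metald unit file roughly like this (directive values are illustrative assumptions, not what `make install` currently ships; `PrivateUsers` is omitted because metald must run as root):

```ini
[Service]
# Illustrative sketch; every directive must be tested against metald's need
# for CAP_NET_ADMIN (network namespaces, interfaces, iptables).
ProtectSystem=full
ProtectControlGroups=yes
ProtectHome=yes
CapabilityBoundingSet=CAP_NET_ADMIN CAP_NET_RAW CAP_SYS_ADMIN CAP_SETUID CAP_SETGID
```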

Comment on lines +55 to +57
generate-builder:
buf generate --path proto/deploy/builderd


🧹 Nitpick (assertive)

Add generate-builder target — add PHONY and tools dependency

Looks good. Make it phony and depend on tools so buf is present.

Apply:

- .PHONY: install tools fmt test-unit test-integration test-integration-long test-stress test build generate pull up clean
+ .PHONY: install tools fmt test-unit test-integration test-integration-long test-stress test build generate generate-builder pull up clean
@@
-generate-builder:
-	buf generate --path proto/deploy/builderd
+generate-builder: tools
+	buf generate --path proto/deploy/builderd

Note: Per prior learning, running buf from go/ with out: gen is correct.

🤖 Prompt for AI Agents
In go/Makefile around lines 55 to 57, the generate-builder target should be
declared phony and made to depend on the tools target so buf is available;
update the Makefile to add generate-builder to the .PHONY list and change the
target to depend on tools (e.g., "generate-builder: tools") while keeping the
existing command "buf generate --path proto/deploy/builderd".

@coderabbitai coderabbitai bot left a comment

Review continued from previous batch...

Comment on lines +7 to +8
"io"
"net/http"

⚠️ Potential issue

Remove unused imports (io, net/http) — currently cause build failure.

These are only referenced in a large block comment. Go rejects unused imports.

Apply:

-	"io"
-	"net/http"
🤖 Prompt for AI Agents
In go/apps/ctrl/services/deployment/deploy_workflow.go around lines 7-8, the
imports "io" and "net/http" are unused (only mentioned in a block comment) and
cause a build failure; remove "io" and "net/http" from the import block (or only
re-add them when actual code uses them) so the file compiles cleanly.

Comment on lines +90 to +109
// Step 3: Insert build into database
err = hydra.StepVoid(ctx, "insert-build", func(stepCtx context.Context) error {
w.logger.Info("inserting build into database", "build_id", buildID)
insertErr := db.Query.InsertBuild(stepCtx, w.db.RW(), db.InsertBuildParams{
ID: buildID,
WorkspaceID: req.WorkspaceID,
ProjectID: req.ProjectID,
DeploymentID: req.DeploymentID,
CreatedAt: time.Now().UnixMilli(),
})
if insertErr != nil {
return fmt.Errorf("failed to create build record: %w", insertErr)
}
w.logger.Info("build record created successfully", "build_id", buildID)
return nil
})
if err != nil {
w.logger.Error("failed to insert build", "error", err, "build_id", buildID)
return err
}

⚠️ Potential issue

buildID is undefined — initialize before InsertBuild.

Compilation fails without a value.

Apply:

 	}
 
-	// Step 3: Insert build into database
+	// Generate build ID
+	buildID := uid.New("build")
+
+	// Step 3: Insert build into database
 	err = hydra.StepVoid(ctx, "insert-build", func(stepCtx context.Context) error {
🤖 Prompt for AI Agents
In go/apps/ctrl/services/deployment/deploy_workflow.go around lines 90 to 109,
buildID is used but not defined; initialize a buildID before the InsertBuild
step (for example: buildID := uuid.NewString() or via your project's ID
generator) and use that variable in the db.InsertBuild call and subsequent logs,
and add the necessary import for the ID generator you choose.

Comment on lines +4 to 7
After=network.target spire-agent.service
Wants=network.target
Requires=spire-agent.service


💡 Verification agent

🧩 Analysis chain

Good: spire-agent ordering and dependency; suggest tighter coupling and graceful stop

  • Consider BindsTo for tighter lifetime coupling with spire-agent.
  • Add TimeoutStopSec for graceful shutdowns.
@@
-After=network.target spire-agent.service
+After=network.target spire-agent.service
 Wants=network.target
-Requires=spire-agent.service
+Requires=spire-agent.service
+BindsTo=spire-agent.service

Additionally, consider:

# outside this hunk
TimeoutStopSec=30s

Also ensure the referenced local storage path exists at boot:

# outside this hunk
ExecStartPre=+/usr/bin/mkdir -p /opt/builderd/rootfs

🏁 Script executed:

#!/bin/bash
# Verify builderd path reference and pre-create directive presence
rg -n 'UNKEY_ASSETMANAGERD_LOCAL_STORAGE_PATH|/opt/builderd/rootfs' go/deploy/assetmanagerd/contrib/systemd -C2

Length of output: 730


Add tighter service coupling and graceful shutdown steps

  • Under the existing Requires=spire-agent.service, add
     After=network.target spire-agent.service
     Wants=network.target
     Requires=spire-agent.service
    +BindsTo=spire-agent.service
  • Outside that hunk, add for graceful stop:
    TimeoutStopSec=30s
  • And pre-create the local storage directory at boot:
    ExecStartPre=+/usr/bin/mkdir -p /opt/builderd/rootfs
🤖 Prompt for AI Agents
In go/deploy/assetmanagerd/contrib/systemd/assetmanagerd.service around lines 4
to 7, add tighter coupling and graceful shutdown/start steps: add a
Wants=spire-agent.service line beneath the existing
Requires=spire-agent.service, and add TimeoutStopSec=30s (to the Service
section) and ExecStartPre=+/usr/bin/mkdir -p /opt/builderd/rootfs (to the
Service section before ExecStart) so the unit pre-creates the local storage
directory and allows a 30s graceful stop.
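Putting the suggestions together, the affected parts of assetmanagerd.service would look roughly like this (the `ExecStart` path is illustrative; the coupling, timeout, and pre-create lines are the values suggested above):

```ini
[Unit]
After=network.target spire-agent.service
Wants=network.target
Requires=spire-agent.service
BindsTo=spire-agent.service

[Service]
# '+' runs the command with full privileges even if the unit later drops them
ExecStartPre=+/usr/bin/mkdir -p /opt/builderd/rootfs
ExecStart=/usr/local/bin/assetmanagerd
TimeoutStopSec=30s
```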

Comment on lines +59 to +60
atomic.StoreInt64(&shutdownStarted, 0)


🛠️ Refactor suggestion

Don’t reset the shared shutdown flag inside each goroutine.

This defeats the purpose of racing multiple shutdown attempts against the same guard.

-      // Reset shutdown state before each attempt
-      atomic.StoreInt64(&shutdownStarted, 0)
+      // Intentionally share shutdownStarted across goroutines to exercise the guard.
🤖 Prompt for AI Agents
In go/deploy/builderd/cmd/builderd/shutdown_test.go around lines 59-60, do not
reset the shared shutdown flag inside each goroutine (remove the
atomic.StoreInt64(&shutdownStarted, 0) call from within the goroutine); instead
initialize or reset the shared atomic shutdownStarted once before spawning the
concurrent goroutines so all goroutines race against the same guard, ensuring
the test exercises multiple concurrent shutdown attempts correctly.

Comment on lines +131 to +146
// Start servers in background
go func() {
if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
t.Logf("HTTP server error: %v", err)
}
}()

go func() {
if err := promServer.ListenAndServe(); err != nil && err != http.ErrServerClosed {
t.Logf("Prometheus server error: %v", err)
}
}()

// Give servers time to start
time.Sleep(100 * time.Millisecond)


🧹 Nitpick (assertive)

Optional: use Server.Serve with a listener in TestShutdownSequence for symmetry.

This mirrors the timeout test, avoids spurious log noise on shutdown, and makes startup deterministic.

If flaky behavior emerges, switch to Serve on an explicit net.Listener and wait until it’s bound before proceeding.

🤖 Prompt for AI Agents
In go/deploy/builderd/cmd/builderd/shutdown_test.go around lines 131 to 146,
replace the goroutines calling server.ListenAndServe and
promServer.ListenAndServe with starting each server via server.Serve(listener)
and promServer.Serve(promListener) where the listeners are created with
net.Listen; create the listeners before the goroutines, wait until each listener
is bound (e.g., immediately after net.Listen returns) to make startup
deterministic, and use the Serve form to avoid spurious "use of closed network
connection" logs during shutdown and to mirror the timeout test’s symmetry.

Comment on lines +30 to +38
// Config holds network configuration
type Config struct {
BaseNetwork *net.IPNet
BridgeName string
DNSServers []string // Default: ["8.8.8.8", "8.8.4.4"]
EnableIPv6 bool
EnableRateLimit bool
RateLimitMbps int // Per VM rate limit in Mbps
}

🧹 Nitpick (assertive)

Document field constraints in Config.

Add comments for invariants (non-empty BridgeName, non-nil BaseNetwork, RateLimitMbps > 0 when enabled) to guide callers.

 type Config struct {
-  BaseNetwork     *net.IPNet
-  BridgeName      string
-  DNSServers      []string // Default: ["8.8.8.8", "8.8.4.4"]
-  EnableIPv6      bool
-  EnableRateLimit bool
-  RateLimitMbps   int // Per VM rate limit in Mbps
+  BaseNetwork     *net.IPNet // required
+  BridgeName      string     // required, linux bridge device name (e.g., "br0")
+  DNSServers      []string   // optional; IP literals. Default: ["8.8.8.8", "8.8.4.4"]
+  EnableIPv6      bool
+  EnableRateLimit bool
+  RateLimitMbps   int        // required if EnableRateLimit; per-VM Mbps
 }

Add an AIDEV-NOTE near NewManager stating the validation policy.

🤖 Prompt for AI Agents
In go/deploy/metald/internal/network/manager.go around lines 30 to 38, the
Config struct lacks comments documenting field invariants; add inline comments
for BridgeName (must be non-empty), BaseNetwork (must be non-nil), and
RateLimitMbps (must be >0 when EnableRateLimit is true) and clarify default
DNSServers; also add an AIDEV-NOTE comment immediately above NewManager
describing the validation policy (which fields are validated by NewManager vs.
callers' responsibility and whether NewManager returns an error on invalid
config) so callers know expectations and invariants.

Comment on lines +40 to +45
type Manager struct {
logger *slog.Logger
config *Config
mu sync.RWMutex
bridgeMu sync.RWMutex
}

🧹 Nitpick (assertive)

Add GoDoc for exported Manager and document locking.

Public types must be documented; also record which mutex protects which fields.

- type Manager struct {
+// Manager orchestrates bridge/tap network lifecycle.
+// AIDEV-NOTE: mu guards Manager-wide state; bridgeMu guards bridge creation/configuration.
+type Manager struct {
   logger   *slog.Logger
   config   *Config
   mu       sync.RWMutex
   bridgeMu sync.RWMutex
 }
🤖 Prompt for AI Agents
In go/deploy/metald/internal/network/manager.go around lines 40 to 45, the
exported Manager type lacks GoDoc and documentation of its locking strategy; add
a GoDoc comment immediately above the type declaration that briefly describes
Manager's purpose and explicitly documents which mutex protects which fields
(e.g., mu guards config and other mutable state, bridgeMu guards bridge-related
fields), and mention that mu is an RWMutex for concurrent read access and
bridgeMu is a separate RWMutex for bridge-specific state and why both are
needed.

Comment on lines +1 to +29
package network

// setupVMNetworking creates and configures TAP and veth devices for a VM
func (m *Manager) setupTunTap(ip string) error {
// // tap := &netlink.Tuntap{
// // LinkAttrs: netlink.LinkAttrs{
// // Name: deviceNames.TAP,
// // },
// // Mode: netlink.TUNTAP_MODE_TAP,
// // }

// // if err := netlink.LinkAdd(tap); err != nil {
// // return fmt.Errorf("failed to create TAP device %s: %w", deviceNames.TAP, err)
// // }

// // tapLink, err := netlink.LinkByName(deviceNames.TAP)
// // if err != nil {
// // return fmt.Errorf("failed to get TAP device: %w", err)
// // }

// // Set TAP device up
// if err := netlink.LinkSetUp(tap); err != nil {
// return fmt.Errorf("failed to bring TAP device up: %w", err)
// }

// m.logger.Info("TAP device created and up", slog.String("tap", tapLink.Attrs().Name))

return nil
}

⚠️ Potential issue

Stub silently returns success; add linux build tag and fail fast until implemented.

Silent no-op can mislead callers. Gate to linux, fix doc, add TODO, and return NotImplemented.

-package network
+//go:build linux
+
+package network
+
+import "fmt"
 
-// setupVMNetworking creates and configures TAP and veth devices for a VM
+// setupTunTap creates and configures TAP and veth devices for a VM.
+// AIDEV-TODO: Implement using netlink (requires CAP_NET_ADMIN). Ensure bridge attach, MTU, and ownership.
 func (m *Manager) setupTunTap(ip string) error {
@@
-	return nil
+	// AIDEV-TODO: temporary NotImplemented to avoid silent success paths.
+	return fmt.Errorf("setupTunTap: not implemented for ip=%s", ip)
 }
🤖 Prompt for AI Agents
In go/deploy/metald/internal/network/tuntap.go lines 1-29: add a linux-only
build tag at the top of the file (//go:build linux and the legacy +build line)
so this stub is only compiled on Linux, update the function comment to note it’s
unimplemented and include a TODO, and replace the silent no-op with an immediate
failing return (e.g. return fmt.Errorf("setupTunTap: not implemented")) so
callers fail fast; also add the required fmt import if missing.

build: deps ## Build the binary
@mkdir -p $(BUILD_DIR)
@go build $(LDFLAGS) -o $(BUILD_DIR)/$(BINARY_NAME) ./cmd/metald
go build $(LDFLAGS) -gcflags="all=-N -l" -o $(BUILD_DIR)/$(BINARY_NAME) ./cmd/metald

⚠️ Potential issue

Debug flags included in regular build target.

The build target now includes debug flags (-gcflags="all=-N -l"), which disable optimizations and inlining. This will significantly impact performance and should only be used in debug builds.

Apply this diff to separate production and debug builds:

 build: deps ## Build the binary
 	@mkdir -p $(BUILD_DIR)
-	go build $(LDFLAGS) -gcflags="all=-N -l" -o $(BUILD_DIR)/$(BINARY_NAME) ./cmd/metald
+	go build $(LDFLAGS) -o $(BUILD_DIR)/$(BINARY_NAME) ./cmd/metald
+
+debug-build: deps ## Build the binary with debug symbols
+	@mkdir -p $(BUILD_DIR)
+	go build $(DEBUG_LDFLAGS) -gcflags="all=-N -l" -o $(BUILD_DIR)/$(BINARY_NAME)-debug ./cmd/metald


test: ## Run all tests
@go test ./... -v
@go test -json -failfast -v ./... | tparse

💡 Verification agent

🧩 Analysis chain

Test runner changed to use tparse for better output formatting.

The change from go test ./... -v to go test -json -failfast -v ./... | tparse provides better test output formatting. However, ensure that tparse is installed as a dependency.

Run the following script to verify tparse availability:


🏁 Script executed:

#!/bin/bash
# Check if tparse is available in go.mod or as a tool dependency
echo "Checking for tparse in go.mod dependencies..."
rg -n "tparse" go/deploy/metald/go.mod

echo "Checking for tparse installation in Makefile or scripts..."
rg -n "go install.*tparse" go/deploy/metald/

echo "Checking if tparse is mentioned in documentation..."
fd README -t f | xargs rg -l "tparse"

Length of output: 464


Add tparse installation in Makefile and track as a Go tool dependency
No references to tparse were found in go.mod, scripts, or docs, so invoking go test … | tparse will break in a clean environment.

  • In go/deploy/metald/Makefile, add a target or prerequisite to install tparse, for example:
    install-tools:
	go install github.com/jstemmer/tparse/cmd/tparse@latest
    and make test depend on install-tools.
  • Track tparse in version control by adding a tools.go under go/deploy/metald/:
    // +build tools
    
    package tools
    
    import _ "github.com/jstemmer/tparse/cmd/tparse"
  • Update README or developer setup docs to include the above steps.
🤖 Prompt for AI Agents
In go/deploy/metald/Makefile around line 88 the test target pipes output to
tparse which isn't installed or tracked; add a Makefile target (e.g.,
install-tools) that runs `go install
github.com/jstemmer/tparse/cmd/tparse@latest` and make the test target depend on
that target so tparse is available in a clean environment; add a new
go/deploy/metald/tools.go file with the tools build tag that imports `_
"github.com/jstemmer/tparse/cmd/tparse"` to pin tparse as a Go tool dependency
in version control; and update the repository README/developer setup docs to
mention the tools installation or the Makefile target.

@coderabbitai coderabbitai bot left a comment

Review continued from previous batch...

Comment on lines +411 to +413
// AIDEV-NOTE: Signal handling continues during graceful shutdown to prevent SIGABRT panics
shutdownSignalReceived := make(chan struct{})


🛠️ Refactor suggestion

⚠️ Potential issue

Signal ignore loop never runs due to closed-channel select; fix to actually drain signals.

After closing shutdownSignalReceived, the select always takes that branch and returns immediately, so additional signals are not ignored/logged.

-	// Continue handling signals during graceful shutdown to prevent SIGABRT panics
-	go func() {
-		for {
-			select {
-			case <-shutdownSignalReceived:
-				// Already shutting down, ignore
-				return
-			case sig := <-sigChan:
-				logger.Warn("received additional signal during shutdown, ignoring",
-					slog.String("signal", sig.String()),
-				)
-				// Continue listening for more signals
-			}
-		}
-	}()
+	// Continue handling (and ignoring) subsequent signals during shutdown
+	go func() {
+		// Wait until shutdown starts
+		<-shutdownSignalReceived
+		for sig := range sigChan {
+			logger.Warn("received additional signal during shutdown, ignoring",
+				slog.String("signal", sig.String()),
+			)
+		}
+	}()

Optionally call signal.Stop(sigChan) just before process exit to release resources.

Also applies to: 435-450

Comment on lines 295 to 311
// Register via assetmanager client
assetID, err := m.assetClient.RegisterBuildArtifact(ctx, "base-assets", localPath, asset.Type, labels)
if err != nil {
return fmt.Errorf("failed to register asset: %w", err)
// Already exists errors are fine, connection errors should cause retry
errStr := strings.ToLower(err.Error())
if strings.Contains(errStr, "already exists") || strings.Contains(errStr, "duplicate") ||
strings.Contains(errStr, "conflict") {
m.logger.InfoContext(ctx, "base asset already registered, skipping",
"asset", asset.Name,
"error", err,
)
return nil // Success - asset already registered
} else {
// This is likely a connection/service unavailable error - should trigger retry
return fmt.Errorf("failed to register base asset %s (service may not be ready): %w", asset.Name, err)
}
}

🛠️ Refactor suggestion

Avoid substring matching for “already exists” errors; use typed/status errors

String contains checks are brittle and locale-dependent. Prefer:

  • gRPC: status.Code(err) == codes.AlreadyExists
  • Sentinel/typed errors: errors.Is(err, assetmanager.ErrAlreadyExists)

If neither is available, introduce an error classifying helper with unit tests.

Comment on lines +16 to +44
// copyMetadataForRootDevice copies metadata and creates container.cmd for a root device
func (c *Client) copyMetadataForRootDevice(ctx context.Context, vmID string, disk *metaldv1.StorageDevice, jailerRoot string, diskDst string) error {
baseName := strings.TrimSuffix(filepath.Base(disk.GetPath()), filepath.Ext(disk.GetPath()))
metadataSrc := filepath.Join(filepath.Dir(disk.GetPath()), baseName+".metadata.json")

// Check if metadata file exists
if _, err := os.Stat(metadataSrc); err != nil {
if os.IsNotExist(err) {
return nil // No metadata file is OK
}
return fmt.Errorf("failed to stat metadata file: %w", err)
}

// Copy metadata file
metadataDst := filepath.Join(jailerRoot, filepath.Base(metadataSrc))
if err := copyFileWithOwnership(metadataSrc, metadataDst, int(c.jailerConfig.UID), int(c.jailerConfig.GID)); err != nil {
return fmt.Errorf("failed to copy metadata file: %w", err)
}

c.logger.LogAttrs(ctx, slog.LevelInfo, "copied metadata file to jailer root",
slog.String("src", metadataSrc),
slog.String("dst", metadataDst),
)

// Load and process metadata to create container.cmd
metadata, err := c.loadContainerMetadata(ctx, disk.GetPath())
if err != nil || metadata == nil {
return nil // Can't create container.cmd without metadata
}

🧹 Nitpick (assertive)

Log metadata load failures at debug; add an anchor for future behavior.

Currently returns nil on errors, which can hide issues.

Apply this diff:

 	metadata, err := c.loadContainerMetadata(ctx, disk.GetPath())
 	if err != nil || metadata == nil {
-		return nil // Can't create container.cmd without metadata
+		// AIDEV-NOTE: Metadata missing/unreadable; skipping container.cmd generation is intentional.
+		// Consider promoting to debug with context to aid troubleshooting.
+		c.logger.DebugContext(ctx, "skipping container.cmd (no metadata)", slog.String("disk", disk.GetPath()), slog.Any("error", err))
+		return nil
 	}

Comment on lines +61 to +67
// Create temporary mount directory
mountDir := filepath.Join("/tmp", fmt.Sprintf("mount-%s", vmID))
if err := os.MkdirAll(mountDir, 0o755); err != nil {
return fmt.Errorf("failed to create mount directory: %w", err)
}
defer os.RemoveAll(mountDir)

🛠️ Refactor suggestion

⚠️ Potential issue

Avoid RemoveAll on an active mount; use MkdirTemp to prevent collisions.

Deferring RemoveAll means it can run while the image is still mounted if umount fails, deleting guest files; the fixed /tmp path also risks collisions between concurrent VMs.

Apply this diff:

-	// Create temporary mount directory
-	mountDir := filepath.Join("/tmp", fmt.Sprintf("mount-%s", vmID))
-	if err := os.MkdirAll(mountDir, 0o755); err != nil {
-		return fmt.Errorf("failed to create mount directory: %w", err)
-	}
-	defer os.RemoveAll(mountDir)
+	// Create a unique temporary mount directory to avoid collisions
+	mountDir, err := os.MkdirTemp("", fmt.Sprintf("metald-mnt-%s-", vmID))
+	if err != nil {
+		return fmt.Errorf("failed to create mount directory: %w", err)
+	}


Comment on lines +68 to +83
// Mount the rootfs ext4 image
mountCmd := exec.CommandContext(ctx, "mount", "-o", "loop", diskDst, mountDir)
if err := mountCmd.Run(); err != nil {
return fmt.Errorf("failed to mount rootfs: %w", err)
}
defer func() {
// Always unmount
umountCmd := exec.CommandContext(ctx, "umount", mountDir)
if err := umountCmd.Run(); err != nil {
c.logger.WarnContext(ctx, "failed to unmount rootfs",
"error", err,
"mountDir", mountDir,
)
}
}()

🛠️ Refactor suggestion

⚠️ Potential issue

Harden mount options and only remove mountpoint after successful unmount.

Use nosuid,nodev,noexec and explicit fs type. Don’t RemoveAll if unmount fails.

Apply this diff:

-	// Mount the rootfs ext4 image
-	mountCmd := exec.CommandContext(ctx, "mount", "-o", "loop", diskDst, mountDir)
+	// AIDEV-NOTE: Requires CAP_SYS_ADMIN/root. Mount rw to write container.cmd but restrict capabilities.
+	// Mount the rootfs ext4 image
+	mountCmd := exec.CommandContext(ctx, "mount", "-t", "ext4", "-o", "loop,rw,nosuid,nodev,noexec", diskDst, mountDir)
 	if err := mountCmd.Run(); err != nil {
 		return fmt.Errorf("failed to mount rootfs: %w", err)
 	}
 	defer func() {
 		// Always unmount
 		umountCmd := exec.CommandContext(ctx, "umount", mountDir)
 		if err := umountCmd.Run(); err != nil {
 			c.logger.WarnContext(ctx, "failed to unmount rootfs",
 				"error", err,
 				"mountDir", mountDir,
 			)
-		}
+			// Do not remove mountDir if unmount failed to avoid deleting guest files from the image.
+			return
+		}
+		// Safe to remove the mount directory after a successful unmount
+		_ = os.RemoveAll(mountDir)
 	}()

Comment on lines +10 to 16
// PortMapping represents a mapping from VM port to host port
type PortMapping struct {
-	ContainerPort int    `json:"container_port"`
-	HostPort      int    `json:"host_port"`
-	Protocol      string `json:"protocol"` // tcp or udp
-	VMID          string `json:"vm_id"`
+	VMPort   int    `json:"vm_port"`
+	HostPort int    `json:"host_port"`
+	Protocol string `json:"protocol"` // tcp or udp
+	VMID     string `json:"vm_id"`
}
🧹 Nitpick (assertive)

Backward-compat risk: JSON field rename to vm_port.

If persisted state or external APIs still send container_port, the value is silently dropped during unmarshalling and VMPort stays zero. Consider a custom UnmarshalJSON that accepts both keys.

// Optional: accept both "vm_port" and "container_port" during decode.
func (p *PortMapping) UnmarshalJSON(b []byte) error {
	type Alias PortMapping
	aux := struct {
		Alias
		ContainerPort *int `json:"container_port"`
	}{}
	if err := json.Unmarshal(b, &aux); err != nil {
		return err
	}
	*p = PortMapping(aux.Alias)
	if p.VMPort == 0 && aux.ContainerPort != nil {
		p.VMPort = *aux.ContainerPort
	}
	return nil
}

Comment on lines +10 to +19
// BridgeManager manages workspace allocation across multiple bridges
type BridgeManager struct {
bridgeCount int // 8 or 32 bridges
bridgePrefix string // "br-vms" -> br-vms-0, br-vms-1, etc.
workspaces map[string]*WorkspaceAllocation // workspace_id -> allocation
bridgeUsage map[int]map[string]bool // bridge_num -> workspace_id -> exists
mu sync.RWMutex
statePath string // Path to state persistence file
logger *slog.Logger // Structured logger for state operations
}
🧹 Nitpick (assertive)

Mutex/logger fields: document concurrency and logging contracts.

Since these types carry mu and logger fields, add AIDEV-NOTE comments outlining which methods take read/write locks and how logger context is propagated.

Also applies to: 29-39


Comment on lines +29 to +39
type MultiBridgeManager struct {
bridgeCount int // 8 or 32 bridges
bridgePrefix string // "br-vms" -> br-vms-0, br-vms-1, etc.
workspaces map[string]*WorkspaceAllocation // workspace_id -> allocation
bridgeUsage map[int]map[string]bool // bridge_num -> workspace_id -> exists
mu sync.RWMutex
vlanRangeStart int // Starting VLAN ID (100)
vlanRangeEnd int // Ending VLAN ID (4000)
statePath string // Path to state persistence file
logger *slog.Logger // Structured logger for state operations
}
🛠️ Refactor suggestion

Duplicated manager types (BridgeManager vs MultiBridgeManager).

Both carry near-identical fields; this invites drift. Consolidate into a single manager with config options (bridge/vlan ranges) to reduce complexity.

Add an AIDEV-NOTE explaining the design choice if you keep both.

Comment on lines +43 to +48
WorkspaceID string `json:"workspace_id"`
BridgeNumber int `json:"bridge_number"` // 0-31
BridgeName string `json:"bridge_name"` // br-vms-N
CreatedAt string `json:"created_at"`
VMCount int `json:"vm_count"` // Track VM count for IP allocation
}
🛠️ Refactor suggestion

CreatedAt should be time.Time for consistency and safer JSON.

WorkspaceAllocation.CreatedAt is string while VMNetwork.CreatedAt is time.Time. Use time.Time and let encoding/json handle RFC3339; note that omitempty has no effect on a zero time.Time, so use *time.Time if the field must be optional.

-	CreatedAt    string `json:"created_at"`
+	CreatedAt    time.Time `json:"created_at"`

Update serialization/deserialization sites accordingly.


Comment on lines +52 to 67
VMID string `json:"vm_id"`
NetworkID string `json:"network_id"` // AIDEV-NOTE: Internal 8-char ID for network device naming
WorkspaceID string `json:"workspace_id"` // AIDEV-NOTE: Track workspace for proper IP release
Namespace string `json:"namespace"`
TapDevice string `json:"tap_device"`
IPAddress net.IP `json:"ip_address"`
Netmask net.IPMask `json:"netmask"`
Gateway net.IP `json:"gateway"`
MacAddress string `json:"mac_address"`
DNSServers []string `json:"dns_servers"`
CreatedAt time.Time `json:"created_at"`

// Optional fields for advanced configurations
VLANID int `json:"vlan_id,omitempty"`
// Optional fields for advanced configuration
IPv6Address net.IP `json:"ipv6_address,omitempty"`
Routes []Route `json:"routes,omitempty"`
}
🧹 Nitpick (assertive)

JSON of net.IPMask is awkward; prefer CIDR prefix or IPNet.

net.IP marshals nicely, but net.IPMask becomes base64-encoded bytes in JSON. Replace Netmask with a CIDR prefix int or use an IPNet for the interface address.

-	Netmask     net.IPMask `json:"netmask"`
+	Prefix      int        `json:"prefix"` // e.g., 24

Add an AIDEV-NOTE on why this change improves API ergonomics.


@coderabbitai coderabbitai bot left a comment
Review continued from previous batch...

Comment on lines +1 to +24
package executor

import (
"context"
"fmt"
"log/slog"
"os"
"path/filepath"
"time"

"github.com/unkeyed/unkey/go/deploy/builderd/internal/config"
"github.com/unkeyed/unkey/go/deploy/builderd/internal/observability"
"github.com/unkeyed/unkey/go/deploy/pkg/observability/interceptors"
builderv1 "github.com/unkeyed/unkey/go/gen/proto/deploy/builderd/v1"
)

// DockerPipelineExecutor wraps the existing DockerExecutor with step-based execution
type DockerPipelineExecutor struct {
dockerExecutor *DockerExecutor
pipeline *BuildPipeline
logger *slog.Logger
config *config.Config
buildMetrics *observability.BuildMetrics
}
🧹 Nitpick (assertive)

Add high-level AIDEV anchor for pipeline semantics.

Given the orchestration role here, add a brief AIDEV-NOTE explaining idempotency, failure cleanup expectations, and resume assumptions.

Example:

// AIDEV-NOTE: Pipeline executor orchestrates idempotent steps. On failure, we still expect
// best-effort cleanup (container/image) and consistent metrics. Resume assumes step outputs
// can be reconstructed from workspaceDir/rootfsDir.

Comment on lines +44 to +55
// Get tenant context for logging and metrics
tenantID := "unknown"
if auth, ok := interceptors.TenantFromContext(ctx); ok {
tenantID = auth.TenantID
}

logger := d.logger.With(
slog.String("tenant_id", tenantID),
slog.String("build_id", buildID),
slog.String("image_uri", request.GetConfig().GetSource().GetDockerImage().GetImageUri()),
)

⚠️ Potential issue

Nil-pointer risk: image URI accessed before validation.

logger construction reaches through request.Config.Source.DockerImage before any validation; with nil-unsafe accessors this panics on malformed input, and even with nil-safe protobuf getters it silently logs an empty image_uri. Validate and extract imageURI first, then build the logger.

Apply:

-	// Get tenant context for logging and metrics
+	// Validate request and extract image info
 	tenantID := "unknown"
 	if auth, ok := interceptors.TenantFromContext(ctx); ok {
 		tenantID = auth.TenantID
 	}
 
-	logger := d.logger.With(
-		slog.String("tenant_id", tenantID),
-		slog.String("build_id", buildID),
-		slog.String("image_uri", request.GetConfig().GetSource().GetDockerImage().GetImageUri()),
-	)
+	if request == nil || request.GetConfig() == nil || request.GetConfig().GetSource() == nil ||
+		request.GetConfig().GetSource().GetDockerImage() == nil ||
+		request.GetConfig().GetSource().GetDockerImage().GetImageUri() == "" {
+		return nil, fmt.Errorf("invalid request: docker image source is required")
+	}
+	imageURI := request.GetConfig().GetSource().GetDockerImage().GetImageUri()
+
+	logger := d.logger.With(
+		slog.String("tenant_id", tenantID),
+		slog.String("build_id", buildID),
+		slog.String("image_uri", imageURI),
+	)
@@
-	dockerSource := request.GetConfig().GetSource().GetDockerImage()
-	if dockerSource == nil {
-		return nil, fmt.Errorf("docker image source is required")
-	}
+	// dockerSource validated above

AIDEV-NOTE: Validate nested request fields before logging to avoid panics on malformed input.

Also applies to: 68-72

Comment on lines +82 to +91
// Create directories
if err := os.MkdirAll(workspaceDir, 0755); err != nil {
logger.ErrorContext(ctx, "failed to create workspace directory", slog.String("error", err.Error()))
return nil, fmt.Errorf("failed to create workspace directory: %w", err)
}

if err := os.MkdirAll(rootfsDir, 0755); err != nil {
logger.ErrorContext(ctx, "failed to create rootfs directory", slog.String("error", err.Error()))
return nil, fmt.Errorf("failed to create rootfs directory: %w", err)
}
🧹 Nitpick (assertive)

Workspace/rootfs perms: consider 0750/0700 for build isolation.

0755 allows world-read/exec. If builds can contain sensitive artifacts, default to least privilege.

Apply:

-	if err := os.MkdirAll(workspaceDir, 0755); err != nil {
+	if err := os.MkdirAll(workspaceDir, 0750); err != nil {
@@
-	if err := os.MkdirAll(rootfsDir, 0755); err != nil {
+	if err := os.MkdirAll(rootfsDir, 0750); err != nil {

Comment on lines +102 to +111
// Execute the pipeline
result, err := d.pipeline.Execute(ctx, input)
if err != nil {
logger.ErrorContext(ctx, "pipeline execution failed", slog.String("error", err.Error()))
if d.buildMetrics != nil {
d.buildMetrics.RecordBuildComplete(ctx, "docker", "docker", time.Since(start), false)
}
return nil, err
}

🧹 Nitpick (assertive)

Unify metrics completion via defer to avoid duplication.

You record completion in both branches. Use a single deferred recorder keyed on err for consistency.

Example:

start := time.Now()
defer func() {
	success := err == nil
	if d.buildMetrics != nil {
		d.buildMetrics.RecordBuildComplete(ctx, "docker", "docker", time.Since(start), success)
	}
}()

Apply within function scope.

Also applies to: 112-116


Comment on lines +120 to +136
// ResumeBuild resumes a build from a specific step
func (d *DockerPipelineExecutor) ResumeBuild(ctx context.Context, request *builderv1.CreateBuildRequest, buildID string, lastCompletedStep int) (*BuildResult, error) {
start := time.Now()

// Get tenant context for logging and metrics
tenantID := "unknown"
if auth, ok := interceptors.TenantFromContext(ctx); ok {
tenantID = auth.TenantID
}

logger := d.logger.With(
slog.String("tenant_id", tenantID),
slog.String("build_id", buildID),
slog.String("image_uri", request.GetConfig().GetSource().GetDockerImage().GetImageUri()),
slog.Int("resume_from_step", lastCompletedStep+1),
)

⚠️ Potential issue

ResumeBuild has the same nil-pointer hazard.

It dereferences the image URI for logging without validation. Mirror the guard used in ExtractDockerImageWithID.

Apply:

-	// Get tenant context for logging and metrics
+	// Get tenant context and validate request
 	tenantID := "unknown"
 	if auth, ok := interceptors.TenantFromContext(ctx); ok {
 		tenantID = auth.TenantID
 	}
 
-	logger := d.logger.With(
+	if request == nil || request.GetConfig() == nil || request.GetConfig().GetSource() == nil ||
+		request.GetConfig().GetSource().GetDockerImage() == nil ||
+		request.GetConfig().GetSource().GetDockerImage().GetImageUri() == "" {
+		return nil, fmt.Errorf("invalid request: docker image source is required")
+	}
+	imageURI := request.GetConfig().GetSource().GetDockerImage().GetImageUri()
+
+	logger := d.logger.With(
 		slog.String("tenant_id", tenantID),
 		slog.String("build_id", buildID),
-		slog.String("image_uri", request.GetConfig().GetSource().GetDockerImage().GetImageUri()),
+		slog.String("image_uri", imageURI),
 		slog.Int("resume_from_step", lastCompletedStep+1),
 	)

Comment on lines 68 to 74
// DEBUG: Log full request config for debugging
if config != nil {
configJSON, _ := json.Marshal(config)
-		s.logger.LogAttrs(ctx, slog.LevelInfo, "DEBUG: Full VM config received",
+		s.logger.LogAttrs(ctx, slog.LevelDebug, "full VM config received",
slog.String("config_json", string(configJSON)),
)
}
🛠️ Refactor suggestion

Don’t log full VM config; redact or drop sensitive fields

Config may contain paths, keys, or credentials. Keep logs safe-by-default.

- configJSON, _ := json.Marshal(config)
- s.logger.LogAttrs(ctx, slog.LevelDebug, "full VM config received",
-   slog.String("config_json", string(configJSON)),
- )
+ // AIDEV-BUSINESS_RULE: Avoid logging full VM config; include only non-sensitive summary.
+ s.logger.LogAttrs(ctx, slog.LevelDebug, "vm config received (summary)",
+   slog.Any("cpu", config.GetCpu()),
+   slog.Any("memory", config.GetMemory()),
+   // omit boot/storage/net details that may be sensitive
+ )

Comment on lines 85 to +95
// Validate required fields
if validateErr := s.validateVMConfig(config); validateErr != nil {
s.logger.LogAttrs(ctx, slog.LevelError, "invalid vm config",
slog.String("error", validateErr.Error()),
// if validateErr := s.validateVMConfig(config); validateErr != nil {
// s.logger.LogAttrs(ctx, slog.LevelError, "invalid vm config",
// slog.String("error", validateErr.Error()),
// )
// if s.vmMetrics != nil {
// s.vmMetrics.RecordVMCreateFailure(ctx, s.getBackendType(), "invalid_config")
// }
// return nil, connect.NewError(connect.CodeInvalidArgument, validateErr)
// }


🛠️ Refactor suggestion

Re-enable minimal config validation

Basic guardrails prevent backend errors and undefined behavior.

- // if validateErr := s.validateVMConfig(config); validateErr != nil {
- //   ...
- // }
+ if validateErr := s.validateVMConfig(config); validateErr != nil {
+   s.logger.LogAttrs(ctx, slog.LevelError, "invalid vm config", slog.String("error", validateErr.Error()))
+   if s.vmMetrics != nil {
+     s.vmMetrics.RecordVMCreateFailure(ctx, s.getBackendType(), "invalid_config")
+   }
+   return nil, connect.NewError(connect.CodeInvalidArgument, validateErr)
+ }

If you prefer staged rollout, gate with an env flag and add an AIDEV-TODO to remove the flag later.

Also applies to: 486-543

🤖 Prompt for AI Agents
In go/deploy/metald/internal/service/vm.go around lines 85-95 (and similarly
apply to lines 486-543), re-enable the minimal VM config validation that was
commented out: call s.validateVMConfig(config), log the validation error with
s.logger.LogAttrs at slog.LevelError including validateErr.Error(), record a VM
create failure via s.vmMetrics.RecordVMCreateFailure with reason
"invalid_config" if vmMetrics is non-nil, and return a
connect.NewError(connect.CodeInvalidArgument, validateErr). If you want a staged
rollout, wrap the validation call in an environment-flag check (e.g., if
os.Getenv("ENABLE_MINIMAL_VM_VALIDATION") == "1") and add an AIDEV-TODO comment
to remove the flag later.

Comment on lines +96 to 106
network, netErr := s.queries.AllocateNetwork(ctx)
if netErr != nil {
s.logger.Info("failed to allocate network",
slog.String("error", netErr.Error()),
)
if s.vmMetrics != nil {
s.vmMetrics.RecordVMCreateFailure(ctx, s.getBackendType(), "invalid_config")
}
return nil, connect.NewError(connect.CodeInvalidArgument, validateErr)
return nil, connect.NewError(connect.CodeInternal, netErr)
}

// Add tenant context to logs for audit trail
// AIDEV-NOTE: In multi-tenant systems, all VM operations should be logged with tenant context
s.logWithTenantContext(ctx, slog.LevelInfo, "creating vm",
slog.Int("vcpus", int(config.GetCpu().GetVcpuCount())),
slog.Int64("memory_bytes", config.GetMemory().GetSizeBytes()),
s.logger.Info("network allocated",
slog.String("network_cidr", network.BaseNetwork),
)

🧹 Nitpick (assertive)

🛠️ Refactor suggestion

Allocate-and-forget network; add cleanup on failure or defer allocation

If backend CreateVM fails, the allocated network is leaked.

Proposed pattern:

 network, netErr := s.queries.AllocateNetwork(ctx)
 if netErr != nil { ... }
+// AIDEV-TODO: release allocated network on CreateVM failure
 vmID, err := s.backend.CreateVM(ctx, config)
 if err != nil {
-  ...
+  // _ = s.queries.ReleaseNetwork(ctx, network.Id) // implement this
+  ...
 }

Alternatively, allocate the network after VM creation if the backend doesn’t require it during create.

Also applies to: 114-126

Comment on lines 145 to 149
return connect.NewResponse(&metaldv1.CreateVmResponse{
VmId: vmID,
State: metaldv1.VmState_VM_STATE_CREATED,
HostPortPair: "0.0.0.0:9999",
State: metaldv1.VmState_VM_STATE_CREATED,
}), nil
}

🧹 Nitpick (assertive)

🛠️ Refactor suggestion

⚠️ Potential issue

CreateVm response should surface a handle to manage the VM; avoid hard-coded host:port

Returning a fixed "0.0.0.0:9999" is a placeholder and clients have no durable identifier for follow-up ops.

Option A (no proto change): return vmID in a response header.

- return connect.NewResponse(&metaldv1.CreateVmResponse{
-   HostPortPair: "0.0.0.0:9999",
-   State:        metaldv1.VmState_VM_STATE_CREATED,
- }), nil
+ // AIDEV-TODO: surface the actual endpoint from networking/backends once available
+ resp := connect.NewResponse(&metaldv1.CreateVmResponse{
+   State: metaldv1.VmState_VM_STATE_CREATED,
+ })
+ resp.Header().Set("x-unkey-vm-id", vmID)
+ return resp, nil

Option B (preferred): add vm_id to CreateVmResponse in the proto.

📝 Committable suggestion


Suggested change
- return connect.NewResponse(&metaldv1.CreateVmResponse{
-   VmId:         vmID,
-   HostPortPair: "0.0.0.0:9999",
-   State:        metaldv1.VmState_VM_STATE_CREATED,
- }), nil
+ // AIDEV-TODO: surface the actual endpoint from networking/backends once available
+ resp := connect.NewResponse(&metaldv1.CreateVmResponse{
+   State: metaldv1.VmState_VM_STATE_CREATED,
+ })
+ resp.Header().Set("x-unkey-vm-id", vmID)
+ return resp, nil
 }

Comment on lines 545 to 547
func (s *VMService) getBackendType() string {
// Try to determine backend type from the backend implementation
switch s.backend.(type) {
case interface{ GetProcessInfo() map[string]interface{} }:
return "firecracker"
default:
return "cloudhypervisor"
}
return "firecracker"
}

🛠️ Refactor suggestion

Hard-coded backend type breaks Docker-dev metrics; pass backendType in ctor

Return the actual backend type instead of constant “firecracker”.

Apply both diffs.

  1. Service:
 type VMService struct {
   backend          types.Backend
   ...
-  tracer           trace.Tracer
-  queries          database.Querier
+  tracer           trace.Tracer
+  queries          database.Querier
+  backendType      string
   metaldv1connect.UnimplementedVmServiceHandler
 }
 
-func NewVMService(backend types.Backend, logger *slog.Logger, metricsCollector *billing.MetricsCollector, vmMetrics *observability.VMMetrics, queries database.Querier) *VMService {
+func NewVMService(backend types.Backend, logger *slog.Logger, metricsCollector *billing.MetricsCollector, vmMetrics *observability.VMMetrics, queries database.Querier, backendType string) *VMService {
   tracer := otel.Tracer("metald.service.vm")
   return &VMService{
     backend:          backend,
     ...
-    tracer:           tracer,
+    tracer:           tracer,
+    backendType:      backendType,
   }
 }
 
 func (s *VMService) getBackendType() string {
-  return "firecracker"
+  return s.backendType
 }
  2. main.go (wire actual type):
- vmService := service.NewVMService(backend, logger, metricsCollector, vmMetrics, db.Queries)
+ vmService := service.NewVMService(backend, logger, metricsCollector, vmMetrics, db.Queries, string(cfg.Backend.Type))

Also applies to: 35-45

🤖 Prompt for AI Agents
In go/deploy/metald/internal/service/vm.go around lines 545-547 (and also check
lines 35-45), the method getBackendType currently returns the hard-coded string
"firecracker"; change the VMService to store a backendType field set from the
constructor and have getBackendType return that field. Update the VMService
constructor signature to accept backendType (string) and assign it to the new
field, and then update main.go where VMService is instantiated to pass the
actual backend type value wired from configuration or the caller so Docker-dev
metrics receive the correct backend type.

@coderabbitai coderabbitai bot left a comment

Review continued from previous batch...

Comment on lines +4 to +6
After=network.target spire-agent.service assetmanagerd.service metald.service
Wants=network.target
Requires=spire-agent.service assetmanagerd.service metald.service

💡 Verification agent

🧩 Analysis chain

Avoid cascading failures; use Wants(+After) for metald and wait for network-online.

Billaged likely doesn’t need a hard dependency on metald. Requires=metald means billaged is stopped if metald stops. Prefer Wants=metald with ordering, and wait for network-online to avoid TLS/mTLS races on boot.

Apply:

-After=network.target spire-agent.service assetmanagerd.service metald.service
-Wants=network.target
-Requires=spire-agent.service assetmanagerd.service metald.service
+After=network-online.target spire-agent.service assetmanagerd.service metald.service
+Wants=network-online.target metald.service
+Requires=spire-agent.service assetmanagerd.service

🏁 Script executed:

#!/bin/bash
set -e

echo "Locating billaged.service file:"
rg -l 'billaged.service'

echo
echo "Printing first 20 lines of billaged.service:"
sed -n '1,20p' go/deploy/billaged/contrib/systemd/billaged.service

echo
echo "Searching for network-online.target usage in all .service files:"
rg -n 'network-online.target' -g '*.service'

Length of output: 2049


🏁 Script executed:

#!/bin/bash
set -e

echo "Locating assetmanagerd.service"
find go -type f -name 'assetmanagerd.service'

echo
echo "Printing first 20 lines of metald.service"
sed -n '1,20p' go/deploy/docker/systemd/metald.service

echo
echo "Printing first 20 lines of spire-agent.service"
sed -n '1,20p' go/deploy/spire/contrib/systemd/spire-agent.service

Length of output: 1733


go/deploy/billaged/contrib/systemd/billaged.service – use Wants=metald.service (+After) and network-online.target

-After=network.target spire-agent.service assetmanagerd.service metald.service
-Wants=network.target
-Requires=spire-agent.service assetmanagerd.service metald.service
+After=network-online.target spire-agent.service assetmanagerd.service metald.service
+Wants=network-online.target metald.service
+Requires=spire-agent.service assetmanagerd.service
📝 Committable suggestion


Suggested change
-After=network.target spire-agent.service assetmanagerd.service metald.service
-Wants=network.target
-Requires=spire-agent.service assetmanagerd.service metald.service
+After=network-online.target spire-agent.service assetmanagerd.service metald.service
+Wants=network-online.target metald.service
+Requires=spire-agent.service assetmanagerd.service
🤖 Prompt for AI Agents
In go/deploy/billaged/contrib/systemd/billaged.service around lines 4 to 6,
change the unit dependencies so metald is a soft dependency and the service
waits for the network to be online: replace After=network.target with
After=network-online.target spire-agent.service assetmanagerd.service
metald.service; add Wants=network-online.target metald.service (so metald is
pulled in but not a hard require); and remove metald.service from the Requires=
line leaving Requires=spire-agent.service assetmanagerd.service.

Environment=UNKEY_BUILDERD_SCRATCH_DIR=/opt/builderd/scratch
Environment=UNKEY_BUILDERD_ROOTFS_OUTPUT_DIR=/opt/builderd/rootfs
Environment=UNKEY_BUILDERD_WORKSPACE_DIR=/opt/builderd/workspace
Environment=UNKEY_BUILDERD_USE_PIPELINE_EXECUTOR=false

🧹 Nitpick (assertive)

Feature flag default set to false — confirm rollout plan and configurability.

If staged rollout is intended, consider sourcing an EnvironmentFile to flip this per-host without editing the unit.

 Environment=UNKEY_BUILDERD_USE_PIPELINE_EXECUTOR=false
+EnvironmentFile=-/etc/unkey/builderd.env
+# AIDEV-NOTE: Toggle UNKEY_BUILDERD_USE_PIPELINE_EXECUTOR=true via /etc/unkey/builderd.env during staged rollout
📝 Committable suggestion


Suggested change
 Environment=UNKEY_BUILDERD_USE_PIPELINE_EXECUTOR=false
+EnvironmentFile=-/etc/unkey/builderd.env
+# AIDEV-NOTE: Toggle UNKEY_BUILDERD_USE_PIPELINE_EXECUTOR=true via /etc/unkey/builderd.env during staged rollout
🤖 Prompt for AI Agents
In go/deploy/builderd/contrib/systemd/builderd.service around line 38, the
feature flag is hard-coded to false which prevents per-host toggling and staged
rollout; update the unit to source an EnvironmentFile (e.g. add
EnvironmentFile=/etc/default/builderd and move
Environment=UNKEY_BUILDERD_USE_PIPELINE_EXECUTOR to that file) so operators can
flip the flag per host without editing the unit, keep a safe default in the
EnvironmentFile or override in /etc/systemd/system/builderd.service.d/ for
controlled rollout, and add a short note in deploy docs describing the rollout
plan and where to configure the flag.

Comment on lines +17 to +31
func (s *PullImageStep) Execute(ctx context.Context, input StepInput) (StepOutput, error) {
imageName := input.Config.GetSource().GetDockerImage().GetImageUri()

input.Logger.InfoContext(ctx, "pulling Docker image", slog.String("image", imageName))

err := s.executor.pullDockerImage(ctx, input.Logger, imageName)
if err != nil {
return StepOutput{Success: false, Error: err}, err
}

return StepOutput{
ImageName: imageName,
Success: true,
}, nil
}

⚠️ Potential issue

Validate image source before use.

Guard against nil Config/Source/DockerImage and empty URI to avoid panics.

Apply:

 func (s *PullImageStep) Execute(ctx context.Context, input StepInput) (StepOutput, error) {
-	imageName := input.Config.GetSource().GetDockerImage().GetImageUri()
+	if input.Config == nil || input.Config.GetSource() == nil || input.Config.GetSource().GetDockerImage() == nil {
+		err := fmt.Errorf("docker image source is required")
+		return StepOutput{Success: false, Error: err}, err
+	}
+	imageName := input.Config.GetSource().GetDockerImage().GetImageUri()
+	if imageName == "" {
+		err := fmt.Errorf("docker image URI is empty")
+		return StepOutput{Success: false, Error: err}, err
+	}

Comment on lines +42 to +55
func (s *CreateContainerStep) Execute(ctx context.Context, input StepInput) (StepOutput, error) {
input.Logger.InfoContext(ctx, "creating container", slog.String("image", input.ImageName))

containerID, err := s.executor.createContainer(ctx, input.Logger, input.ImageName)
if err != nil {
return StepOutput{Success: false, Error: err}, err
}

return StepOutput{
ImageName: input.ImageName,
ContainerID: containerID,
Success: true,
}, nil
}

🛠️ Refactor suggestion

Require image name before creating container.

If a previous step failed to set ImageName, this will fail later. Validate early.

Apply:

 func (s *CreateContainerStep) Execute(ctx context.Context, input StepInput) (StepOutput, error) {
+	if input.ImageName == "" {
+		err := fmt.Errorf("image name is required to create container")
+		return StepOutput{Success: false, Error: err}, err
+	}
 	input.Logger.InfoContext(ctx, "creating container", slog.String("image", input.ImageName))
📝 Committable suggestion


Suggested change
 func (s *CreateContainerStep) Execute(ctx context.Context, input StepInput) (StepOutput, error) {
+	if input.ImageName == "" {
+		err := fmt.Errorf("image name is required to create container")
+		return StepOutput{Success: false, Error: err}, err
+	}
 	input.Logger.InfoContext(ctx, "creating container", slog.String("image", input.ImageName))

 	containerID, err := s.executor.createContainer(ctx, input.Logger, input.ImageName)
 	if err != nil {
 		return StepOutput{Success: false, Error: err}, err
 	}

 	return StepOutput{
 		ImageName:   input.ImageName,
 		ContainerID: containerID,
 		Success:     true,
 	}, nil
 }
🤖 Prompt for AI Agents
In go/deploy/builderd/internal/executor/docker_steps.go around lines 42 to 55,
the Execute function does not validate that input.ImageName is set before
attempting to create a container; add an early check for a non-empty ImageName,
and if empty log an error with input.Logger (using ErrorContext) and return a
StepOutput with Success:false and a descriptive error (e.g., fmt.Errorf("missing
image name")) without calling s.executor.createContainer to avoid downstream
failures. Ensure the function returns the error and does not proceed when
ImageName is empty.

Comment on lines +91 to +99
func (s *ExtractFilesystemStep) Execute(ctx context.Context, input StepInput) (StepOutput, error) {
input.Logger.InfoContext(ctx, "extracting container filesystem",
slog.String("container_id", input.ContainerID),
slog.String("rootfs_dir", input.RootfsDir))

err := s.executor.extractFilesystem(ctx, input.Logger, input.ContainerID, input.RootfsDir, input.Metadata)
if err != nil {
return StepOutput{Success: false, Error: err}, err
}

⚠️ Potential issue

Validate container ID and metadata before filesystem extraction.

Prevent ambiguous errors from the executor.

Apply:

 func (s *ExtractFilesystemStep) Execute(ctx context.Context, input StepInput) (StepOutput, error) {
+	if input.ContainerID == "" {
+		err := fmt.Errorf("container ID is required to extract filesystem")
+		return StepOutput{Success: false, Error: err}, err
+	}
+	if input.Metadata == nil {
+		err := fmt.Errorf("image metadata is required to extract filesystem")
+		return StepOutput{Success: false, Error: err}, err
+	}
📝 Committable suggestion


Suggested change
 func (s *ExtractFilesystemStep) Execute(ctx context.Context, input StepInput) (StepOutput, error) {
+	if input.ContainerID == "" {
+		err := fmt.Errorf("container ID is required to extract filesystem")
+		return StepOutput{Success: false, Error: err}, err
+	}
+	if input.Metadata == nil {
+		err := fmt.Errorf("image metadata is required to extract filesystem")
+		return StepOutput{Success: false, Error: err}, err
+	}
 	input.Logger.InfoContext(ctx, "extracting container filesystem",
 		slog.String("container_id", input.ContainerID),
 		slog.String("rootfs_dir", input.RootfsDir))

 	err := s.executor.extractFilesystem(ctx, input.Logger, input.ContainerID, input.RootfsDir, input.Metadata)
 	if err != nil {
 		return StepOutput{Success: false, Error: err}, err
 	}
🤖 Prompt for AI Agents
In go/deploy/builderd/internal/executor/docker_steps.go around lines 91 to 99,
the Execute method calls s.executor.extractFilesystem without validating
input.ContainerID and input.Metadata, which can cause ambiguous errors; add
precondition checks that return a clear StepOutput and error if ContainerID is
empty (or only whitespace) and if Metadata is nil or missing required fields,
logging a descriptive message via input.Logger before returning so the executor
is never invoked with invalid inputs.

Comment on lines +231 to +245
const returnIPJSON = `-- name: ReturnIPJSON :exec
UPDATE network_allocations
SET available_ips = json_insert(available_ips, '$[#]', ?)
WHERE deployment_id = ?
`

type ReturnIPJSONParams struct {
JsonInsert interface{} `db:"json_insert" json:"json_insert"`
DeploymentID string `db:"deployment_id" json:"deployment_id"`
}

func (q *Queries) ReturnIPJSON(ctx context.Context, arg ReturnIPJSONParams) error {
_, err := q.db.ExecContext(ctx, returnIPJSON, arg.JsonInsert, arg.DeploymentID)
return err
}

🛠️ Refactor suggestion

Tighten JSON insert typing

Use a concrete string and CAST to TEXT to avoid type ambiguity across drivers.

Apply this diff (reflect in queries.sql and regenerate):

-SET available_ips = json_insert(available_ips, '$[#]', ?)
+SET available_ips = json_insert(available_ips, '$[#]', CAST(? AS TEXT))
@@
-type ReturnIPJSONParams struct {
-	JsonInsert   interface{} `db:"json_insert" json:"json_insert"`
+type ReturnIPJSONParams struct {
+	JsonInsert   string `db:"json_insert" json:"json_insert"`
 	DeploymentID string      `db:"deployment_id" json:"deployment_id"`
 }
🤖 Prompt for AI Agents
In go/deploy/metald/internal/database/queries.sql.go around lines 231 to 245,
the JSON insert parameter is currently typed as interface{} and passed directly
to json_insert causing driver type ambiguity; change the SQL to CAST the
placeholder to TEXT (e.g. json_insert(available_ips, '$[#]', CAST(? AS TEXT)))
and update the ReturnIPJSONParams.JsonInsert field to string (and ensure the
queries.sql entry is updated accordingly), then regenerate the queries to
reflect the concrete string typing.

Comment on lines +12 to +19
if link, err := netlink.LinkByName(m.config.BridgeName); err == nil {
m.logger.Debug("bridge exists",
slog.String("bridge", m.config.BridgeName),
slog.String("type", link.Type()),
slog.String("state", link.Attrs().OperState.String()),
)
return nil
}

🛠️ Refactor suggestion

⚠️ Potential issue

Do not early-return on existing bridge; ensure address and admin-up state.

Current early return skips reconciling IP and link state. Make ensureBridge idempotent: when the bridge exists, still ensure it’s up and has the configured address.

Example approach (conceptual; see addr/Replace comment below):

  • Keep a reference to br whether created or found.
  • Always set link up.
  • Replace or ensure the address on br.
🤖 Prompt for AI Agents
In go/deploy/metald/internal/network/bridge.go around lines 12-19, the function
currently early-returns if the bridge exists which skips reconciling its IP
address and admin-up state; change it to keep a reference to the found link (br)
instead of returning, then always ensure the link is set UP (netlink.LinkSetUp)
and ensure/replace the configured IP address on that link (use addr.Replace or
the appropriate netlink addr add/replace flow) so the method is idempotent; log
actions and propagate errors from the LinkSetUp and address reconciliation steps
rather than returning immediately when the bridge exists.

Comment on lines +27 to +33
if err := netlink.LinkAdd(bridge); err != nil {
m.logger.Error("failed to create bridge",
slog.String("bridge", m.config.BridgeName),
slog.String("error", err.Error()),
)
return fmt.Errorf("failed to create bridge: %w", err)
}

🛠️ Refactor suggestion

Handle EEXIST race on LinkAdd.

If another process creates the bridge between the check and add, LinkAdd may return EEXIST. Treat as success.

-	if err := netlink.LinkAdd(bridge); err != nil {
+	if err := netlink.LinkAdd(bridge); err != nil {
+		// treat already-exists as success for idempotency
+		if !os.IsExist(err) {
 			m.logger.Error("failed to create bridge",
 				slog.String("bridge", m.config.BridgeName),
 				slog.String("error", err.Error()),
 			)
 			return fmt.Errorf("failed to create bridge: %w", err)
-	}
+		}
+	}

Add import:

 import (
 	"fmt"
 	"log/slog"
+	"os"
📝 Committable suggestion


Suggested change
 // In the import block, add "os":
 import (
 	"fmt"
 	"log/slog"
+	"os"
 )

 // ... later, around the call to netlink.LinkAdd:
 if err := netlink.LinkAdd(bridge); err != nil {
+	// treat already-exists as success for idempotency
+	if !os.IsExist(err) {
 		m.logger.Error("failed to create bridge",
 			slog.String("bridge", m.config.BridgeName),
 			slog.String("error", err.Error()),
 		)
 		return fmt.Errorf("failed to create bridge: %w", err)
+	}
 }
🤖 Prompt for AI Agents
In go/deploy/metald/internal/network/bridge.go around lines 27 to 33, the call
to netlink.LinkAdd returns an error if the bridge was created by another process
(EEXIST) and should be treated as success; update the error handling to check
errors.Is(err, syscall.EEXIST) (or os.IsExist-equivalent) and ignore that case
(do not return an error), while keeping logging for other errors, and add the
syscall import to the file.

Comment on lines +40 to +46
addr, err := netlink.ParseAddr(m.config.BaseNetwork.String())
if err != nil {
return fmt.Errorf("failed to parse bridge IP: %w", err)
}

if err := netlink.AddrAdd(br, addr); err != nil {
m.logger.Error("failed to add IP to bridge",

🛠️ Refactor suggestion

⚠️ Potential issue

Wrong address assigned: using the network address as the bridge IP.

Parsing BaseNetwork (e.g., 172.16.0.0/12) yields the network address, not a host IP. Assign a host address (e.g., first usable: 172.16.0.1/12) or take a dedicated BridgeCIDR from config/env.

Minimal fix options:

  • Prefer a new config field (BridgeCIDR string) and parse that here.
  • Or derive first host from BaseNetwork before calling ParseAddr.

I can provide a patch once the desired source of truth (env vs code) is confirmed.

🤖 Prompt for AI Agents
In go/deploy/metald/internal/network/bridge.go around lines 40 to 46, the code
parses m.config.BaseNetwork which yields the network address (e.g.,
172.16.0.0/12) and assigns that as the bridge IP; instead, either parse a
dedicated BridgeCIDR config field or derive and use the first usable host
address from BaseNetwork (e.g., .1) before calling netlink.AddrAdd. Update the
config to include BridgeCIDR (preferred) and parse that here, or implement logic
to compute the first host IP from the parsed network prefix and construct an
address with the same mask, then pass that address to netlink.AddrAdd; ensure
validation and clear error messages on parse/failure.

Comment on lines +45 to +51
if err := netlink.AddrAdd(br, addr); err != nil {
m.logger.Error("failed to add IP to bridge",
slog.String("bridge", m.config.BridgeName),
slog.String("error", err.Error()),
)
return fmt.Errorf("failed to add IP to bridge: %w", err)
}

🛠️ Refactor suggestion

Prefer AddrReplace and ignore EEXIST for idempotency.

AddrAdd fails if address already present. Use AddrReplace so re-runs converge.

-	if err := netlink.AddrAdd(br, addr); err != nil {
+	if err := netlink.AddrReplace(br, addr); err != nil {
 		m.logger.Error("failed to add IP to bridge",
 			slog.String("bridge", m.config.BridgeName),
 			slog.String("error", err.Error()),
 		)
 		return fmt.Errorf("failed to add IP to bridge: %w", err)
 	}
📝 Committable suggestion


Suggested change
-	if err := netlink.AddrAdd(br, addr); err != nil {
+	if err := netlink.AddrReplace(br, addr); err != nil {
 		m.logger.Error("failed to add IP to bridge",
 			slog.String("bridge", m.config.BridgeName),
 			slog.String("error", err.Error()),
 		)
 		return fmt.Errorf("failed to add IP to bridge: %w", err)
 	}
🤖 Prompt for AI Agents
In go/deploy/metald/internal/network/bridge.go around lines 45-51, replace
netlink.AddrAdd with netlink.AddrReplace to make adding the IP idempotent; call
netlink.AddrReplace(br, addr) and update the error handling to treat EEXIST as
non-fatal (i.e., if errors.Is(err, syscall.EEXIST) return nil), otherwise log
and return the wrapped error as before; ensure syscall is imported and the log
still includes bridge name and error details.
