Skip to content

feat(metald): database/network layer#3941

Merged
chronark merged 1 commit intomainfrom
push-tokvskrsrkut
Sep 11, 2025
Merged

feat(metald): database/network layer#3941
chronark merged 1 commit intomainfrom
push-tokvskrsrkut

Conversation

@imeyer
Copy link
Contributor

@imeyer imeyer commented Sep 10, 2025

  • Reimplement network management/creation
  • Rabbit is gonna ruin this commit message

What does this PR do?

Fixes # (issue)

If there is not an issue for this, please create one first. This is used to tracking purposes and also helps use understand why this PR exists

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Chore (refactoring code, technical debt, workflow improvements)
  • Enhancement (small improvements)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How should this be tested?

  • Test A
  • Test B

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Contributing Guide
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand areas
  • Ran pnpm build
  • Ran pnpm fmt
  • Checked for warnings, there are none
  • Removed all console.logs
  • Merged the latest changes from main onto my branch with git pull origin main
  • My changes don't cause any responsiveness issues

Appreciated

  • If a UI change was made: Added a screen recording or screenshots to this PR
  • Updated the Unkey Docs if changes were necessary

Summary by CodeRabbit

  • New Features

    • Deployments auto-provision networking: create bridge, compute available IPs, and allocate per-VM addresses.
    • Deployment responses include generated VM IDs.
    • Network allocations now include and expose a bridge name.
  • Refactor

    • Simplified, stateless resource ID generation.
    • Added tracing to deployment workflows for improved observability.
    • Streamlined authentication middleware configuration (tenant-auth option removed).

@changeset-bot
Copy link

changeset-bot bot commented Sep 10, 2025

⚠️ No Changeset found

Latest commit: df5c3f0

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@vercel
Copy link

vercel bot commented Sep 10, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
dashboard Ready Ready Preview Comment Sep 11, 2025 11:12am
engineering Ready Ready Preview Comment Sep 11, 2025 11:12am

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 10, 2025

📝 Walkthrough

Walkthrough

Removed tenant-auth interceptor options; added bridge_name to DB schema and models; refactored bridge creation to a standalone CreateBridge function; replaced stateful ID generator with stateless GenerateID; added GenerateAvailableIPs; large parts of network package commented out; CreateDeployment and VM flows updated with tracing and network/bridge allocation changes.

Changes

Cohort / File(s) Summary
Observability / Tenant auth removal
go/deploy/pkg/observability/interceptors/options.go, go/deploy/metald/cmd/metald/main.go
Removed tenant-auth-related Options fields and WithTenantAuth constructor; removed usage/exemption from interceptor setup in main.
Database: bridge_name addition
go/deploy/metald/sqlc/schema.sql, go/deploy/metald/sqlc/queries.sql, go/deploy/metald/internal/database/queries.sql.go, go/deploy/metald/internal/database/models.go
Added bridge_name column to network_allocations schema; sqlc query and generated structs updated to include BridgeName; model struct NetworkAllocation gains BridgeName.
Network: bridge creation refactor
go/deploy/metald/internal/network/bridge.go
Replaced method ensureBridge on Manager with exported CreateBridge(logger *slog.Logger, cfg Config) (string, error) returning bridge name; refactored to use passed logger/config and wrapped errors.
Network: ID generation & device names removed
go/deploy/metald/internal/network/idgen.go
Removed stateful IDGenerator and NetworkDeviceNames; added stateless GenerateID() string (base32, br- prefix).
Network: IP utilities added
go/deploy/metald/internal/network/iputil.go
Added GenerateAvailableIPs(cidr string) (string, error) to compute usable host IPs and return JSON array string.
Network: core logic disabled
go/deploy/metald/internal/network/network.go
Entire file replaced with commented-out implementation; public network helpers/constants removed (file effectively deactivated).
Service: deployment & vm changes
go/deploy/metald/internal/service/deployment.go, go/deploy/metald/internal/service/vm.go
CreateDeployment now starts an OpenTelemetry span, allocates network, creates bridge, generates available IPs, persists NetworkAllocation, loops per-VM to pop/assign IPs and logs VM creation (VM creation call left as placeholder). VM service gained generateVMID helper and logging label changes; removed some validation.
Backend asset_manager change
go/deploy/metald/internal/backend/firecracker/asset_manager.go
Stopped setting TenantId in BuildOptions returned by createBuildOptions (tenantID param no longer propagated).
Service auth cleanup
go/deploy/metald/internal/service/auth.go
Removed large block of commented-out authentication scaffolding; file now empty except package declaration.
Build / infra change
go/deploy/spire/Makefile
install-agent target dependency create-users removed.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant API as metald.API
  participant DB as Database
  participant NET as network package
  participant BR as Bridge subsystem
  participant BACK as Backend (VM)
  Note over API: CreateDeployment request
  Client ->> API: CreateDeployment(ctx, req)
  API ->> API: start span "metald.vm.create"
  API ->> DB: CreateNetworkAllocation(deploymentID, networkID, availableIPs, bridgeName)
  API ->> NET: GenerateAvailableIPs(baseCIDR)
  NET -->> API: availableIPs JSON
  API ->> BR: CreateBridge(logger, cfg{BridgeName, BaseNetwork,...})
  BR -->> API: bridgeName
  loop per VM
    API ->> DB: allocate/pop IP for VM
    DB -->> API: assigned IP
    API ->> BACK: create VM (placeholder / RPC)
    BACK -->> API: VM created (or placeholder)
  end
  API -->> Client: CreateDeploymentResponse (vm ids)
  Note over API,DB: span ended
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

No Rabbit

Suggested reviewers

  • perkinsjr
  • chronark
  • ogzhanolguncu
  • Flo4604

Pre-merge checks (1 passed, 1 warning, 1 inconclusive)

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The current description includes placeholder comments and a brief bullet list but does not follow the repository’s required template. The “What does this PR do?” section remains unpopulated aside from template text, the issue number is not specified, the Type of change checkboxes are all unchecked, the testing steps are generic, and the checklist items are left blank. This indicates multiple required sections are missing or incomplete, making it difficult for reviewers to understand the changes, test them, or verify checklist compliance. Fill out the “What does this PR do?” section with a concise summary of the changes and reference the related issue number. Select the appropriate Type of change checkbox, provide concrete test steps under “How should this be tested?”, and complete all required checklist items to demonstrate adherence to repository standards. Ensuring these sections are fully populated will align the description with the template and give reviewers the necessary context.
Title Check ❓ Inconclusive The current pull request title, “feat(metald): database/network layer,” is broad and does not clearly convey the primary change of reimplementing network management and creation within the metald service. It refers generally to database and network layers but omits the specific feature and motivation behind the changeset. As a result, it is only partially related to the main objectives and lacks sufficient clarity for someone scanning history to understand the primary purpose. Consider renaming the title to explicitly summarize the core feature, for example “feat(metald): implement network management and database integration,” so that it succinctly highlights the reimplementation of network creation and its database support. This will help reviewers and future maintainers quickly grasp the intent of the PR.
✅ Passed checks (1 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.

Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The current PR description is mostly the unmodified template plus a short informal note and is missing critical details: there is no referenced issue number, no selected "Type of change", only placeholder test steps ("Test A"/"Test B"), and all required checklist items remain unchecked; the raw summary shows substantial DB and network changes that demand migration/testing details. Because these required sections are incomplete or absent, reviewers lack the information needed to assess risk, run tests, or validate migrations. This makes the description insufficient for review. Please update the PR description to reference an issue (or create one), select the appropriate "Type of change" checkboxes, and provide concrete "How should this be tested" steps including database migration/rollback steps and expected outcomes; run and report pnpm build/fmt and any tests and then mark the checklist items accordingly. Also document the schema change (network_allocations.bridge_name), any required DB migration scripts or manual upgrade steps, and note backward-compatibility impacts so reviewers can validate correctness and risks. After updating, re-run CI and request reviewers to re-check the PR.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title "feat(metald): database/network layer" is concise and directly reflects the main changes in this changeset (network reimplementation and database/network layer work), so it meaningfully summarizes the primary change and is suitable for scanability by teammates. It avoids noise and is appropriately formatted for a single-line PR title.
Docstring Coverage ✅ Passed Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch push-tokvskrsrkut

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

github-actions bot commented Sep 10, 2025

Thank you for following the naming conventions for pull request titles! 🙏

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 21

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (24)
go/deploy/spire/Makefile (2)

21-25: Fix production trust domain typo.

production.unkey.cliud should be production.unkey.cloud. Current value will produce incorrect SPIFFE IDs.

-TRUST_DOMAIN = production.unkey.cliud
+TRUST_DOMAIN = production.unkey.cloud

127-129: Add missing create-users target and run SPIRE services as non-root

  • Define an idempotent create-users target before the uninstall target in go/deploy/spire/Makefile since install-server depends on it.
  • Update go/deploy/spire/contrib/systemd/spire-server.service (line 9) to User=spire-server and Group=spire-server.
  • Update go/deploy/spire/contrib/systemd/spire-agent.service (line 10) to User=spire-agent and Group=spire-agent.
go/deploy/metald/internal/network/network.go (1)

3-118: No callers for deprecated APIs
Ripgrep across the repo found zero references to the now‐commented exports (GetSubnetInfo, CalculateIndex, ValidateIdentifier, etc.), so builds won’t break. You can safely delete go/deploy/metald/internal/network/network.go and add a doc.go outlining the new API surface to avoid confusion.

go/deploy/metald/sqlc/schema.sql (4)

13-17: Enforce Linux bridge name length (<=15) to prevent runtime netlink errors.

Add a CHECK constraint now so invalid names never persist.

-  bridge_name TEXT NOT NULL UNIQUE,
+  bridge_name TEXT NOT NULL UNIQUE CHECK(length(bridge_name) <= 15),

9-17: Clarify FK delete behavior and enforce via explicit actions.

Decide whether to CASCADE or RESTRICT deletes to avoid orphans; encode it explicitly.

-  FOREIGN KEY (network_id) REFERENCES networks(id)
+  FOREIGN KEY (network_id) REFERENCES networks(id) ON DELETE RESTRICT
-  FOREIGN KEY (network_allocation_id) REFERENCES network_allocations(id),
+  FOREIGN KEY (network_allocation_id) REFERENCES network_allocations(id) ON DELETE RESTRICT,

37-39: Redundant index: deployment_id already has a UNIQUE constraint.

SQLite creates an index for UNIQUE(deployment_id); the extra index duplicates it. Remove to save writes and space.

-CREATE INDEX idx_network_allocations_deployment
-  ON network_allocations(deployment_id);

34-35: Likely-needed index on network_allocations(network_id).

Joins/filtering by network_id will benefit from an index.

+CREATE INDEX idx_network_allocations_network
+  ON network_allocations(network_id);
go/deploy/metald/cmd/metald/main.go (5)

318-336: Auth removed; ensure mTLS-only access or refuse to start when TLS is disabled.

With tenant auth interceptor gone, running with TLS.Mode="disabled" exposes unauthenticated RPCs. Fail fast or add request-layer auth.

Minimal guard (outside this hunk, after config is loaded):

// hard-fail if no transport security AND no request auth
if cfg.TLS.Mode == "disabled" {
	logger.Error("TLS disabled and no auth interceptors configured; refusing to start")
	os.Exit(1)
}

If you rely on mTLS identity, add a lightweight Connect interceptor to enforce client SVIDs:

type mtlsAuth struct{}
func (mtlsAuth) WrapUnary(next connect.UnaryFunc) connect.UnaryFunc {
	return func(ctx context.Context, req connect.AnyRequest) (connect.AnyResponse, error) {
		info := connect.RequestMetadataFromContext(ctx)
		if info == nil || len(info.PeerCertificates) == 0 {
			return nil, connect.NewError(connect.CodeUnauthenticated, errors.New("client cert required"))
		}
		// TODO: verify SPIFFE IDs / allowed subjects here
		return next(ctx, req)
	}
}

348-355: Duplicate /metrics endpoints (main mux and dedicated Prom server). Pick one.

Serving metrics twice can double scrape cost and confuse operators.

-  if cfg.OpenTelemetry.Enabled && cfg.OpenTelemetry.PrometheusEnabled {
-    mux.Handle("/metrics", otelProviders.PrometheusHTTP)
-    logger.Info("Prometheus metrics endpoint enabled",
-      slog.String("path", "/metrics"),
-    )
-  }
+  // metrics are exposed via the dedicated Prometheus server below

Also applies to: 408-436


55-58: Comment/code mismatch: says “First 8 chars” but returns 7.

Align the comment with behavior.

-        return "dev-" + setting.Value[:7] // First 8 chars of commit hash
+        return "dev-" + setting.Value[:7] // First 7 chars of commit hash

359-371: H2C allowed when TLS is off. Combined with no auth, this is risky.

If keeping H2C for local dev, restrict bind to loopback when TLS disabled.

Example (outside hunk):

if cfg.TLS.Mode == "disabled" && (cfg.Server.Address != "127.0.0.1" && cfg.Server.Address != "localhost") {
	logger.Error("insecure H2C requires loopback bind; set UNKEY_METALD_ADDRESS=127.0.0.1")
	os.Exit(1)
}

Also applies to: 373-380


481-491: Docs: default OTEL service name in help text doesn’t match runtime (“metald”).

Update help to avoid confusion.

-  UNKEY_METALD_OTEL_SERVICE_NAME         Service name (default: vmm-controlplane)
+  UNKEY_METALD_OTEL_SERVICE_NAME         Service name (default: metald)
go/deploy/metald/internal/database/models.go (1)

25-32: Explicitly order columns in CreateNetworkAllocation and GetNetworkAllocation queries
RETURNING * and SELECT na.* emit available_ips before bridge_name (table order), but the Go model declares BridgeName before AvailableIps, swapping those fields at scan time. Change both SQL statements to explicitly list columns in struct order—e.g.:

-- CreateNetworkAllocation
INSERT INTO network_allocations (
  id, deployment_id, network_id, bridge_name, available_ips, allocated_at
)
VALUES (DEFAULT, ?, ?, ?, ?, DEFAULT)
RETURNING id, deployment_id, network_id, bridge_name, available_ips, allocated_at;

-- GetNetworkAllocation
SELECT
  na.id,
  na.deployment_id,
  na.network_id,
  na.bridge_name,
  na.available_ips,
  na.allocated_at,
  n.base_network
FROM network_allocations na
JOIN networks n ON na.network_id = n.id
WHERE na.deployment_id = ?;
go/deploy/metald/internal/service/vm.go (7)

93-104: CreateVm allocates a network but never creates a network_allocation or bridge.

This leaves the DB in a partial state and leaks the network. Either:

  • Move single-VM provisioning under the new CreateDeployment flow, or
  • Create a network_allocation row (with available_ips and bridge_name) and clean up on failure.

I can draft a transactional flow if you confirm CreateVm should remain supported.


105-111: Compile-time break: AllocateIP called without required fields.

database.AllocateIPParams requires IpAddr and NetworkAllocationID. Also, IP must be popped first via PopAvailableIPJSON.

Proposed fix (assuming CreateVmRequest has deployment_id; adjust if different):

-	ip, ipErr := s.queries.AllocateIP(ctx, database.AllocateIPParams{
-		VmID: req.Msg.GetVmId(),
-	})
-	if ipErr != nil {
-		return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to allocate IP for vm: %w", ipErr))
-	}
+	// Pop an IP and allocate it to this VM
+	ipRow, popErr := s.queries.PopAvailableIPJSON(ctx, req.Msg.GetDeploymentId())
+	if popErr != nil {
+		return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to pop available IP: %w", popErr))
+	}
+	ipAlloc, ipErr := s.queries.AllocateIP(ctx, database.AllocateIPParams{
+		VmID:                req.Msg.GetVmId(), // or use backend vmID below
+		IpAddr:              ipRow.AllocatedIp, // ensure alias matches query
+		NetworkAllocationID: ipRow.ID,
+	})
+	if ipErr != nil {
+		return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to allocate IP for vm: %w", ipErr))
+	}

Follow-up: Use the backend’s returned vmID for IP allocation to avoid mismatch if req.Msg.VmId is empty.


112-151: ID/IP flow mismatch and potential leakage.

  • You allocate IP before knowing the actual vmID returned by backend.CreateVM, risking orphaned IPs if creation fails.
  • Consider: pop IP → attempt VM creation with vmID generated by service → on backend failure, return IP to pool; on success, persist ip_allocations with the final vmID.

I can provide a transaction-aware sequence and compensating actions.


76-82: Full config logging may leak secrets.

Dumping the entire VM config at debug can expose credentials/paths. Redact sensitive fields or gate behind an explicit debug flag.


154-212: DeleteVm should release IPs and possibly free network resources.

Currently only calls backend.DeleteVM and stops metrics. Add DB cleanup:

  • ReleaseIP(vm_id) and, if last VM in deployment, consider ReleaseNetwork + DeleteNetworkAllocation.

I can add the DB teardown sequence if you confirm the lifecycle.


484-541: Validation is commented out; minimal checks recommended.

Re-enable at least CPU/memory/boot sanity checks to fail fast before backend.

I can reintroduce a trimmed validator aligned with current proto fields.


105-111: Remove incorrect GetDeploymentId usage CreateVmRequest has no deployment_id field—only vm_id; replace any GetDeploymentId calls with req.Msg.GetVmId().

go/deploy/metald/sqlc/queries.sql (2)

25-31: PopAvailableIPJSON returns the wrong IP.

After the UPDATE, json_extract reads the new 0th element—not the popped one—causing allocation of a still-available IP. Use a CTE to capture the original element.

Apply:

--- a/go/deploy/metald/sqlc/queries.sql
+++ b/go/deploy/metald/sqlc/queries.sql
@@
--- name: PopAvailableIPJSON :one
-UPDATE network_allocations
-SET available_ips = json_remove(available_ips, '$[0]')
-WHERE deployment_id = ?
-AND json_array_length(available_ips) > 0
-RETURNING json_extract(available_ips, '$[0]') as allocated_ip, id;
+WITH orig AS (
+  SELECT id, available_ips
+  FROM network_allocations
+  WHERE deployment_id = ?
+    AND json_array_length(available_ips) > 0
+  LIMIT 1
+),
+upd AS (
+  UPDATE network_allocations
+  SET available_ips = json_remove(available_ips, '$[0]')
+  WHERE id = (SELECT id FROM orig)
+  RETURNING id
+)
+SELECT json_extract(orig.available_ips, '$[0]') AS allocated_ip, orig.id
+FROM orig;

49-72: Add supporting indexes for hot paths.

Queries frequently filter by deployment_id and network_allocation_id. Add indexes:

  • network_allocations(deployment_id)
  • ip_allocations(vm_id)
  • ip_allocations(network_allocation_id)

I can add a migration snippet if desired.

go/deploy/metald/internal/service/deployment.go (1)

140-143: Great place to include metrics.

Record deployment success/failure and durations similar to VM paths.

go/deploy/metald/internal/database/queries.sql.go (1)

200-206: Bug: PopAvailableIPJSON returns the wrong IP (post-update value).

RETURNING json_extract(available_ips, '$[0]') runs after the update, so it returns the new first IP, not the popped one. Make it atomic and return the pre-update value using a CTE.

Apply this diff to fix the query logic:

-UPDATE network_allocations
-SET available_ips = json_remove(available_ips, '$[0]')
-WHERE deployment_id = ?
-AND json_array_length(available_ips) > 0
-RETURNING json_extract(available_ips, '$[0]') as allocated_ip, id
+WITH popped AS (
+  SELECT
+    deployment_id,
+    json_extract(available_ips, '$[0]') AS allocated_ip
+  FROM network_allocations
+  WHERE deployment_id = ?
+    AND json_array_length(available_ips) > 0
+)
+UPDATE network_allocations
+SET available_ips = json_remove(available_ips, '$[0]')
+WHERE deployment_id = (SELECT deployment_id FROM popped)
+RETURNING (SELECT allocated_ip FROM popped) AS allocated_ip, id
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0086690 and 11cf7bd.

📒 Files selected for processing (15)
  • go/deploy/metald/cmd/metald/main.go (1 hunks)
  • go/deploy/metald/internal/backend/firecracker/asset_manager.go (0 hunks)
  • go/deploy/metald/internal/database/models.go (1 hunks)
  • go/deploy/metald/internal/database/queries.sql.go (4 hunks)
  • go/deploy/metald/internal/network/bridge.go (1 hunks)
  • go/deploy/metald/internal/network/idgen.go (1 hunks)
  • go/deploy/metald/internal/network/iputil.go (1 hunks)
  • go/deploy/metald/internal/network/network.go (1 hunks)
  • go/deploy/metald/internal/service/auth.go (0 hunks)
  • go/deploy/metald/internal/service/deployment.go (1 hunks)
  • go/deploy/metald/internal/service/vm.go (3 hunks)
  • go/deploy/metald/sqlc/queries.sql (1 hunks)
  • go/deploy/metald/sqlc/schema.sql (1 hunks)
  • go/deploy/pkg/observability/interceptors/options.go (0 hunks)
  • go/deploy/spire/Makefile (1 hunks)
💤 Files with no reviewable changes (3)
  • go/deploy/metald/internal/backend/firecracker/asset_manager.go
  • go/deploy/pkg/observability/interceptors/options.go
  • go/deploy/metald/internal/service/auth.go
🧰 Additional context used
🧬 Code graph analysis (2)
go/deploy/metald/internal/database/queries.sql.go (1)
go/deploy/metald/internal/database/models.go (1)
  • NetworkAllocation (25-32)
go/deploy/metald/internal/service/deployment.go (5)
go/deploy/metald/internal/service/vm.go (1)
  • VMService (26-34)
go/deploy/metald/internal/network/idgen.go (1)
  • GenerateID (13-19)
go/deploy/metald/internal/network/bridge.go (1)
  • CreateBridge (11-68)
go/deploy/metald/internal/network/iputil.go (1)
  • GenerateAvailableIPs (11-48)
go/deploy/metald/internal/database/queries.sql.go (2)
  • CreateNetworkAllocationParams (65-70)
  • AllocateIPParams (19-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test API / API Test Local
  • GitHub Check: Test Go API Local / Test
  • GitHub Check: Build / Build
🔇 Additional comments (11)
go/deploy/metald/sqlc/schema.sql (1)

13-13: Ignore migration warning for bridge_name constraint
This schema.sql represents the initial (greenfield) DDL—there’s no existing data or migration pipeline in go/deploy/metald—so adding NOT NULL UNIQUE here does not pose a migration risk.

Likely an incorrect or invalid review comment.

go/deploy/metald/internal/database/models.go (1)

25-32: BridgeName mapping looks correct and aligns with schema.

DB/json tags are consistent with schema and API expectations.

go/deploy/metald/internal/service/vm.go (5)

145-151: Returned endpoint uses ip from earlier broken AllocateIP call.

After fixing IP allocation, ensure Host uses the allocated IP (ipAlloc.IpAddr). Also, avoid hardcoding the port if backend dictates it.

Would you like me to wire the backend port or make it configurable?


36-47: Service label change to "metald" is fine.

Unifies logging tags across services.


65-73: Good: metrics hooks on VM create path.

Create/failure metrics are recorded consistently.


176-184: Good: stop metrics collection on delete/shutdown before backend calls.

Avoids dangling collectors.


214-482: Boot/Shutdown/Pause/Resume/GetVmInfo flows look consistent.

Error handling and tracing are reasonable.

go/deploy/metald/sqlc/queries.sql (1)

14-17: No action needed: parameter order in CreateNetworkAllocationParams matches the SQL definition.

go/deploy/metald/internal/service/deployment.go (1)

70-85: Good: precomputing available IPs and persisting bridge_name.

This aligns with the new schema and flow.

go/deploy/metald/internal/database/queries.sql.go (2)

60-63: CreateNetworkAllocation insert/return scan order looks correct.

Columns and scan order for bridge_name align with the struct; placeholders prevent injection. No issues.

Also applies to: 73-78, 84-84


142-142: GetNetworkAllocation bridge_name propagation is consistent.

Selection, struct field addition, and scan order are coherent. Good.

Also applies to: 152-152, 165-165

Comment on lines +60 to 63
INSERT INTO network_allocations (deployment_id, network_id, available_ips, bridge_name)
VALUES (?, ?, ?, ?)
RETURNING id, deployment_id, network_id, bridge_name, available_ips, allocated_at
`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add a DB-level safeguard: enforce Linux IFNAMSIZ (15 chars) on bridge_name.

Linux interface names are limited to 15 characters. The schema appears to define bridge_name TEXT NOT NULL UNIQUE, but without a length constraint you can persist invalid names that will later fail in netlink. Add a CHECK constraint in schema (and validate in application) to cap length at 15.

Example (schema.sql):

ALTER TABLE network_allocations
ADD CONSTRAINT chk_bridge_name_len CHECK (length(bridge_name) <= 15);

Also applies to: 69-69, 73-78, 84-84

🤖 Prompt for AI Agents
In go/deploy/metald/internal/database/queries.sql.go around lines 60-63 (and
also apply to lines 69, 73-78, 84), add a DB-level CHECK constraint enforcing
Linux IFNAMSIZ by ensuring bridge_name length <= 15 and update the related
INSERT/UPDATE DDL or ALTER TABLE statements to include that constraint; also add
complementary application-side validation to reject or truncate bridge names
longer than 15 characters before DB write so netlink calls never receive invalid
names.

Comment on lines +10 to 19
// CreateBridge creates the bridge if it doesn't exist
func CreateBridge(logger *slog.Logger, cfg Config) (string, error) {
link, linkErr := netlink.LinkByName(cfg.BridgeName)
if linkErr == nil {
logger.Debug("bridge exists",
slog.String("bridge", cfg.BridgeName),
slog.Int("mtu", link.Attrs().MTU),
)
return nil
return link.Attrs().Name, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Make CreateBridge idempotent: ensure address and link-up even when bridge exists.

Early-return on existing bridge skips address reconciliation and LinkSetUp. Convert to “ensure” semantics: reuse the existing link, add/replace the IP, and bring it up.

Apply this diff:

 func CreateBridge(logger *slog.Logger, cfg Config) (string, error) {
-	link, linkErr := netlink.LinkByName(cfg.BridgeName)
-	if linkErr == nil {
-		logger.Debug("bridge exists",
-			slog.String("bridge", cfg.BridgeName),
-			slog.Int("mtu", link.Attrs().MTU),
-		)
-		return link.Attrs().Name, nil
-	}
+	link, linkErr := netlink.LinkByName(cfg.BridgeName)
+	var br netlink.Link

-	bridge := &netlink.Bridge{ //nolint:exhaustruct
-		LinkAttrs: netlink.LinkAttrs{
-			Name: cfg.BridgeName, //nolint:exhaustruct
-		},
-	}
-
-	if err := netlink.LinkAdd(bridge); err != nil {
-		logger.Error("failed to create bridge",
-			slog.String("bridge", cfg.BridgeName),
-			slog.String("error", err.Error()),
-		)
-		return "", fmt.Errorf("failed to create bridge: %w", err)
-	}
-
-	br, brErr := netlink.LinkByName(cfg.BridgeName)
-	if brErr != nil {
-		return "", fmt.Errorf("failed to get bridge: %w", brErr)
-	}
+	if linkErr == nil {
+		logger.Debug("bridge exists",
+			slog.String("bridge", cfg.BridgeName),
+			slog.Int("mtu", link.Attrs().MTU),
+		)
+		br = link
+	} else {
+		bridge := &netlink.Bridge{ //nolint:exhaustruct
+			LinkAttrs: netlink.LinkAttrs{Name: cfg.BridgeName}, //nolint:exhaustruct
+		}
+		if err := netlink.LinkAdd(bridge); err != nil {
+			logger.Error("failed to create bridge",
+				slog.String("bridge", cfg.BridgeName),
+				slog.String("error", err.Error()),
+			)
+			return "", fmt.Errorf("failed to create bridge: %w", err)
+		}
+		var brErr error
+		br, brErr = netlink.LinkByName(cfg.BridgeName)
+		if brErr != nil {
+			return "", fmt.Errorf("failed to get bridge: %w", brErr)
+		}
+	}
@@
-	if err := netlink.AddrAdd(br, addr); err != nil {
+	if err := netlink.AddrReplace(br, addr); err != nil {
 		logger.Error("failed to add IP to bridge",
 			slog.String("bridge", cfg.BridgeName),
 			slog.String("error", err.Error()),
 		)
-		return "", fmt.Errorf("failed to add IP to bridge: %w", err)
+		return "", fmt.Errorf("failed to configure IP on bridge: %w", err)
 	}
@@
-	logger.Debug("bridge created",
+	logger.Debug("bridge configured",
 		slog.String("bridge", cfg.BridgeName),
 		slog.Int("mtu", br.Attrs().MTU),
 	)
 
 	return cfg.BridgeName, nil

Note: AddrReplace makes the operation idempotent and safe on re-runs.

Also applies to: 35-38, 53-60, 62-67

🤖 Prompt for AI Agents
In go/deploy/metald/internal/network/bridge.go around lines 10-19 (and also
adjust the similar logic at 35-38, 53-60, 62-67), the function currently
early-returns when the bridge exists which skips address reconciliation and
LinkSetUp; change it to “ensure” semantics by reusing the existing link instead
of returning: after obtaining link := netlink.LinkByName(cfg.BridgeName) when no
error, parse cfg.BridgeIP into an *net.IPNet, call netlink.AddrReplace(link,
&netlink.Addr{IPNet: parsedIPNet}) to idempotently add/replace the IP address,
then call netlink.LinkSetUp(link) to bring the interface up, handling and
returning any errors from AddrReplace/LinkSetUp instead of exiting early.

Comment on lines +40 to 43
addr, err := netlink.ParseAddr(cfg.BaseNetwork.String())
if err != nil {
return fmt.Errorf("failed to parse bridge IP: %w", err)
return "", fmt.Errorf("failed to parse bridge IP: %w", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Wrong IP assigned to bridge: using network address instead of a host address.

netlink.ParseAddr(cfg.BaseNetwork.String()) yields the network address (e.g., 192.168.0.0/24). Assign a valid host (e.g., first host .1) within the prefix.

Apply this diff (minimal fix) and add a small helper:

-	addr, err := netlink.ParseAddr(cfg.BaseNetwork.String())
+	prefixSize, _ := cfg.BaseNetwork.Mask.Size()
+	gwIP, err := firstHostIPv4(cfg.BaseNetwork.IP)
+	if err != nil {
+		return "", fmt.Errorf("failed to derive bridge IP: %w", err)
+	}
+	addr, err := netlink.ParseAddr(fmt.Sprintf("%s/%d", gwIP.String(), prefixSize))
 	if err != nil {
 		return "", fmt.Errorf("failed to parse bridge IP: %w", err)
 	}

Add this helper in the same package:

// firstHostIPv4 returns the first usable IPv4 address in the subnet (network IP + 1).
func firstHostIPv4(ip net.IP) (net.IP, error) {
	ip4 := ip.To4()
	if ip4 == nil {
		return nil, fmt.Errorf("only IPv4 is supported for bridge IP assignment")
	}
	res := make(net.IP, 4)
	copy(res, ip4)
	for i := 3; i >= 0; i-- {
		res[i]++
		if res[i] != 0 {
			break
		}
	}
	return res, nil
}

Also add net to imports.

🤖 Prompt for AI Agents
In go/deploy/metald/internal/network/bridge.go around lines 40 to 43, the code
parses the network CIDR as the bridge address which yields the network address
(e.g., 192.168.0.0/24) instead of a usable host; replace the netlink.ParseAddr
call with logic that parses the CIDR to get the network IP, compute the first
usable host (network IP + 1) using a new helper, and then create the
netlink.Addr with that host IP; add the provided firstHostIPv4 helper in the
same package (which validates IPv4 and increments the IP bytes), and add the net
import to the file imports.

Comment on lines 9 to +12
// IDGenerator generates short, unique IDs for network devices
// Network interface names in Linux are limited to 15 characters,
// so we generate 8-character IDs to leave room for prefixes like "tap-", "vh-", etc.
type IDGenerator struct {
mu sync.Mutex
generated map[string]struct{} // Track generated IDs to ensure uniqueness
}

// NewIDGenerator creates a new ID generator
func NewIDGenerator() *IDGenerator {
//exhaustruct:ignore
return &IDGenerator{
generated: make(map[string]struct{}),
}
}

// GenerateNetworkID generates a unique 8-character ID for network devices
// The ID is guaranteed to be unique within this generator instance
func (g *IDGenerator) GenerateNetworkID() (string, error) {
g.mu.Lock()
defer g.mu.Unlock()

// Try up to 10 times to generate a unique ID
for range 10 {
// Generate 4 random bytes (8 hex characters)
bytes := make([]byte, 4)
if _, err := rand.Read(bytes); err != nil {
return "", fmt.Errorf("failed to generate random bytes: %w", err)
}

id := hex.EncodeToString(bytes)

// Check if this ID already exists
if _, exists := g.generated[id]; !exists {
g.generated[id] = struct{}{}
return id, nil
}
}

return "", fmt.Errorf("failed to generate unique ID after 10 attempts")
}

// ReleaseID removes an ID from the tracking set, allowing it to be reused
func (g *IDGenerator) ReleaseID(id string) {
g.mu.Lock()
defer g.mu.Unlock()
delete(g.generated, id)
}

// NetworkDeviceNames holds all network device names for a VM
// AIDEV-NOTE: All names follow consistent patterns and stay within 15-char limit
type NetworkDeviceNames struct {
ID string // 8-character internal ID
Namespace string // Network namespace name (no length limit)
TAP string // TAP device name (15 char limit)
Bridge string // Bridge name (15 char limit)
}
// Generates an for the bridge, it's not guaranteed unique until it's saved to the DB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Stale/incomplete comments.

The file-level comments still refer to the removed IDGenerator and have a sentence fragment. Update or remove to avoid confusion.

🤖 Prompt for AI Agents
In go/deploy/metald/internal/network/idgen.go around lines 9 to 12, the
top-of-file comments are stale and contain a sentence fragment referencing a
removed IDGenerator and an incomplete sentence about the bridge; update these
comments to accurately describe the current purpose of the file (or remove them
entirely). Replace the outdated references with a concise, complete file comment
that explains what the package/file now provides (e.g., short unique ID
generation for network resources or whatever the implementation actually does),
ensuring no sentence fragments remain and Linux interface name length notes are
only present if relevant.

Comment on lines +12 to 19
// Generates an for the bridge, it's not guaranteed unique until it's saved to the DB
func GenerateID() string {
b := make([]byte, 8)
rand.Read(b)
id := base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(b)[:12]

// GenerateDeviceNames creates a consistent set of network device names
func GenerateDeviceNames(networkID string) *NetworkDeviceNames {
return &NetworkDeviceNames{
ID: networkID,
Namespace: fmt.Sprintf("ns_vm_%s", networkID),
TAP: fmt.Sprintf("tap_%s", networkID), // 12 chars
Bridge: fmt.Sprintf("br-%s", networkID),
}
return fmt.Sprintf("br-%s", id)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Persistence-level retry on UNIQUE constraint recommended.

Since IDs are stateless now, implement a small retry loop (e.g., 3–5 attempts) around DB insert on UNIQUE(bridge_name) violation to keep call sites simple and robust.


🛠️ Refactor suggestion

⚠️ Potential issue

Don’t ignore crypto/rand errors and normalize to lowercase for interface-name safety.

Handle rand.Reader errors and ensure the name is lowercase to avoid surprises on systems that expect lowercased netdev names.

 import (
 	"crypto/rand"
 	"encoding/base32"
 	"fmt"
+	"io"
+	"strings"
 )
@@
-// Generates an for the bridge, it's not guaranteed unique until it's saved to the DB
+// GenerateID returns a 15-char bridge name: "br-" + 12 base32 chars (~60 bits).
+// Not guaranteed unique; enforce a UNIQUE constraint in DB and retry on conflict.
 func GenerateID() string {
-	b := make([]byte, 8)
-	rand.Read(b)
-	id := base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(b)[:12]
-
-	return fmt.Sprintf("br-%s", id)
+	b := make([]byte, 8)
+	if _, err := io.ReadFull(rand.Reader, b); err != nil {
+		panic(fmt.Errorf("crypto/rand failed: %w", err))
+	}
+	enc := base32.StdEncoding.WithPadding(base32.NoPadding)
+	id := enc.EncodeToString(b)
+	id = strings.ToLower(id[:12])
+	return "br-" + id
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Generates an for the bridge, it's not guaranteed unique until it's saved to the DB
func GenerateID() string {
b := make([]byte, 8)
rand.Read(b)
id := base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(b)[:12]
// GenerateDeviceNames creates a consistent set of network device names
func GenerateDeviceNames(networkID string) *NetworkDeviceNames {
return &NetworkDeviceNames{
ID: networkID,
Namespace: fmt.Sprintf("ns_vm_%s", networkID),
TAP: fmt.Sprintf("tap_%s", networkID), // 12 chars
Bridge: fmt.Sprintf("br-%s", networkID),
}
return fmt.Sprintf("br-%s", id)
}
import (
"crypto/rand"
"encoding/base32"
"fmt"
"io"
"strings"
)
// GenerateID returns a 15-char bridge name: "br-" + 12 base32 chars (~60 bits).
// Not guaranteed unique; enforce a UNIQUE constraint in DB and retry on conflict.
func GenerateID() string {
b := make([]byte, 8)
if _, err := io.ReadFull(rand.Reader, b); err != nil {
panic(fmt.Errorf("crypto/rand failed: %w", err))
}
enc := base32.StdEncoding.WithPadding(base32.NoPadding)
id := enc.EncodeToString(b)
id = strings.ToLower(id[:12])
return "br-" + id
}
🤖 Prompt for AI Agents
In go/deploy/metald/internal/network/idgen.go around lines 12-19, the code
ignores crypto/rand errors and does not normalize the interface name to
lowercase; change GenerateID to return (string, error), call rand.Read correctly
and check its error, return a wrapped error on failure, then base32-encode the
bytes, trim to 12 chars and run strings.ToLower on the result before formatting
the final "br-%s" string; update imports (crypto/rand, strings) and adjust all
callers to handle the returned error.

Comment on lines +92 to +101
logger.Debug("network allocated",
slog.Any("network_cidr", netConfig.BaseNetwork.String()),
slog.Any("network_id", na.ID),
)

logger.Debug("bridge allocated",
slog.Any("network_cidr", netConfig.BaseNetwork.String()),
slog.String("bridge_id", br),
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Use concrete slog types.

Prefer slog.String for CIDR and slog.Int64 for IDs.

-	logger.Debug("network allocated",
-		slog.Any("network_cidr", netConfig.BaseNetwork.String()),
-		slog.Any("network_id", na.ID),
+	logger.Debug("network allocated",
+		slog.String("network_cidr", netConfig.BaseNetwork.String()),
+		slog.Int64("network_id", na.ID),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
logger.Debug("network allocated",
slog.Any("network_cidr", netConfig.BaseNetwork.String()),
slog.Any("network_id", na.ID),
)
logger.Debug("bridge allocated",
slog.Any("network_cidr", netConfig.BaseNetwork.String()),
slog.String("bridge_id", br),
)
logger.Debug("network allocated",
slog.String("network_cidr", netConfig.BaseNetwork.String()),
slog.Int64("network_id", na.ID),
)
logger.Debug("bridge allocated",
slog.Any("network_cidr", netConfig.BaseNetwork.String()),
slog.String("bridge_id", br),
)
🤖 Prompt for AI Agents
In go/deploy/metald/internal/service/deployment.go around lines 92 to 101, the
logs use slog.Any for the CIDR and network ID; change those fields to use
concrete slog types: use slog.String("network_cidr",
netConfig.BaseNetwork.String()) for the CIDR and slog.Int64("network_id", na.ID)
for the network ID (keep slog.String for the bridge_id if it is a string).
Update the two logger.Debug calls accordingly so they pass the proper typed slog
attributes.

Comment on lines +102 to +138
logger.DebugContext(ctx, "creating vms", slog.Int64("count", int64(vmCount)))
for vm := range vmCount {
id := s.generateVMID(ctx)

vmid := fmt.Sprintf("ud-%s", id)

// Pop an available IP from the pool
ipRow, popErr := s.queries.PopAvailableIPJSON(ctx, req.Msg.Deployment.GetDeploymentId())
if popErr != nil {
logger.ErrorContext(ctx, "failed to pop available IP",
slog.String("error", popErr.Error()),
)
return nil, connect.NewError(connect.CodeInternal, popErr)
}

// Now allocate the IP with all required fields
ipAlloc, allocErr := s.queries.AllocateIP(ctx, database.AllocateIPParams{
VmID: vmid,
IpAddr: ipRow.JsonExtract.(string),
NetworkAllocationID: ipRow.ID,
})
if allocErr != nil {
logger.ErrorContext(ctx, "failed to pop available IP",
slog.String("error", allocErr.Error()),
)
return nil, connect.NewError(connect.CodeInternal, allocErr)
}

// CREATION call here

logger.Debug("created vm",
slog.String("id", ipAlloc.VmID),
slog.String("ip", ipAlloc.IpAddr),
slog.Any("requested", vmCount),
slog.Any("fulfilled", vm),
)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Loop is invalid; field name mismatch; not returning real IDs.

  • for vm := range vmCount is not legal Go.
  • PopAvailableIPJSON row field should use the alias (AllocatedIp), not JsonExtract.
  • Collect vmIDs and return them; don’t return static placeholders.

Apply:

-	logger.DebugContext(ctx, "creating vms", slog.Int64("count", int64(vmCount)))
-	for vm := range vmCount {
+	logger.DebugContext(ctx, "creating vms", slog.Int("count", int(vmCount)))
+	vmIDs := make([]string, 0, int(vmCount))
+	for i := 0; i < int(vmCount); i++ {
 		id := s.generateVMID(ctx)
 
 		vmid := fmt.Sprintf("ud-%s", id)
 
 		// Pop an available IP from the pool
 		ipRow, popErr := s.queries.PopAvailableIPJSON(ctx, req.Msg.Deployment.GetDeploymentId())
 		if popErr != nil {
 			logger.ErrorContext(ctx, "failed to pop available IP",
 				slog.String("error", popErr.Error()),
 			)
 			return nil, connect.NewError(connect.CodeInternal, popErr)
 		}
 
 		// Now allocate the IP with all required fields
 		ipAlloc, allocErr := s.queries.AllocateIP(ctx, database.AllocateIPParams{
 			VmID:                vmid,
-			IpAddr:              ipRow.JsonExtract.(string),
+			IpAddr:              ipRow.AllocatedIp,
 			NetworkAllocationID: ipRow.ID,
 		})
 		if allocErr != nil {
 			logger.ErrorContext(ctx, "failed to pop available IP",
 				slog.String("error", allocErr.Error()),
 			)
 			return nil, connect.NewError(connect.CodeInternal, allocErr)
 		}
 
 		// CREATION call here
+		// TODO: invoke s.backend.CreateVM with config and vmid; handle rollback on failure.
 
 		logger.Debug("created vm",
 			slog.String("id", ipAlloc.VmID),
 			slog.String("ip", ipAlloc.IpAddr),
-			slog.Any("requested", vmCount),
-			slog.Any("fulfilled", vm),
+			slog.Int("requested", int(vmCount)),
+			slog.Int("fulfilled", i+1),
 		)
+		vmIDs = append(vmIDs, vmid)
 	}

And update the response:

-	return connect.NewResponse(&metaldv1.CreateDeploymentResponse{
-		VmIds: []string{"ud-001", "ud-002", "ud-003"},
-	}), nil
+	return connect.NewResponse(&metaldv1.CreateDeploymentResponse{
+		VmIds: vmIDs,
+	}), nil
🤖 Prompt for AI Agents
In go/deploy/metald/internal/service/deployment.go around lines 102–138, the
loop and IP fields are wrong and we never return real VM IDs: replace the
invalid "for vm := range vmCount" with a conventional index loop (e.g. for i :=
0; i < vmCount; i++) so we iterate vmCount times; read the popped IP from the
correct field (use ipRow.AllocatedIp or the DB alias instead of JsonExtract);
call AllocateIP and on success append ipAlloc.VmID to a vmIDs slice you build;
fix the error log for allocation to say "failed to allocate IP" when
appropriate; and when returning, return the collected vmIDs (and update the
debug log to use the requested count and fulfilled count from len(vmIDs) or
i+1). Ensure error returns still wrap the original DB errors with
connect.NewError(connect.CodeInternal, err).

Comment on lines +49 to +53
func (s *VMService) generateVMID(ctx context.Context) string {
b := make([]byte, 15)
rand.Read(b)
return base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(b)[:24]
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider making generateVMID return (string, error).

Propagating errors avoids silent fallbacks for identifiers.

🤖 Prompt for AI Agents
In go/deploy/metald/internal/service/vm.go around lines 49 to 53, change
generateVMID to return (string, error) instead of just string, use
crypto/rand.Read (or check the current rand.Read) and capture its error, return
"", err if the read fails, otherwise encode the bytes and return the ID with nil
error; update all callers to handle the returned error and propagate it
appropriately.

⚠️ Potential issue

crypto/rand.Read error is ignored; add handling.

If rand.Read fails, you’ll silently encode zeroed bytes. Log and handle the error.

Apply:

 func (s *VMService) generateVMID(ctx context.Context) string {
-	b := make([]byte, 15)
-	rand.Read(b)
-	return base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(b)[:24]
+	b := make([]byte, 15)
+	if _, err := rand.Read(b); err != nil {
+		s.logger.LogAttrs(ctx, slog.LevelError, "generateVMID: crypto/rand failed", slog.String("error", err.Error()))
+		// Best-effort fallback: still return a value; consider surfacing the error instead.
+	}
+	return base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(b)[:24]
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (s *VMService) generateVMID(ctx context.Context) string {
b := make([]byte, 15)
rand.Read(b)
return base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(b)[:24]
}
func (s *VMService) generateVMID(ctx context.Context) string {
b := make([]byte, 15)
if _, err := rand.Read(b); err != nil {
s.logger.LogAttrs(ctx, slog.LevelError, "generateVMID: crypto/rand failed", slog.String("error", err.Error()))
// Best-effort fallback: still return a value; consider surfacing the error instead.
}
return base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(b)[:24]
}
🤖 Prompt for AI Agents
In go/deploy/metald/internal/service/vm.go around lines 49–53, rand.Read error
is ignored; change generateVMID to return (string, error), call rand.Read using
b := make([]byte, 15) and check err := rand.Read(b); if err != nil return "",
fmt.Errorf("generateVMID: rand.Read failed: %w", err), then continue to encode
and return the ID; update all callers to handle the error (propagate or log) and
ensure you import fmt (and adjust any tests).

Comment on lines 101 to 103
s.logger.Info("network allocated",
slog.String("network_cidr", network.BaseNetwork),
slog.Any("network_cidr", network.BaseNetwork),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Use slog.String for CIDR.

BaseNetwork appears to be a string; slog.Any is noisy here.

-	slog.Any("network_cidr", network.BaseNetwork),
+	slog.String("network_cidr", network.BaseNetwork),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
s.logger.Info("network allocated",
slog.String("network_cidr", network.BaseNetwork),
slog.Any("network_cidr", network.BaseNetwork),
)
s.logger.Info("network allocated",
slog.String("network_cidr", network.BaseNetwork),
)
🤖 Prompt for AI Agents
In go/deploy/metald/internal/service/vm.go around lines 101 to 103, the logger
uses slog.Any for network.BaseNetwork which is a string and produces noisy
output; replace slog.Any("network_cidr", network.BaseNetwork) with
slog.String("network_cidr", network.BaseNetwork) so the CIDR is logged cleanly
as a string (no other changes needed).

- Reimplement network management/creation
- Rabbit is gonna ruin this commit message
@vercel vercel bot temporarily deployed to Preview – engineering September 11, 2025 11:12 Inactive
@vercel vercel bot temporarily deployed to Preview – dashboard September 11, 2025 11:12 Inactive
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (14)
go/deploy/spire/Makefile (1)

23-25: Fix production trust domain typo.

Use cloud, not cliud.

-TRUST_DOMAIN = production.unkey.cliud
+TRUST_DOMAIN = production.unkey.cloud
go/deploy/metald/internal/network/network.go (1)

3-118: Remove dead commented API from internal/network/network.go

  • Delete the large commented-out subnet API in go/deploy/metald/internal/network/network.go (or gate it behind a build tag); leaving it commented invites drift.
  • Add a short doc.go in the package pointing callers to the maintained APIs: network.GenerateAvailableIPs (iputil.go) and CreateBridge (bridge.go).
  • Repo scan found no live references to the old symbols (CalculateIndex, CalculateIndexOneBased, GetSubnetInfo, GetSubnetInfoByIndex, GetNetwork, GetGateway, ValidateIdentifier, GetAllSubnetsInRange, IsValidIndex); callers import package network but use CreateBridge/GenerateAvailableIPs — removal is safe.
go/deploy/metald/sqlc/schema.sql (3)

11-17: Enforce 1:1 mapping and add basic validations.

Prevent double-allocation of the same network, validate JSON, and add cascade to keep FK integrity.

 CREATE TABLE network_allocations (
   id INTEGER PRIMARY KEY,
   deployment_id TEXT NOT NULL UNIQUE,
   network_id INTEGER NOT NULL,
   bridge_name TEXT NOT NULL UNIQUE,
-  available_ips TEXT NOT NULL,  -- JSON array: ["10.0.0.18","10.0.0.19",...]
+  available_ips TEXT NOT NULL CHECK (json_valid(available_ips)),  -- JSON array: ["10.0.0.18","10.0.0.19",...]
   allocated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-  FOREIGN KEY (network_id) REFERENCES networks(id)
+  FOREIGN KEY (network_id) REFERENCES networks(id) ON DELETE CASCADE,
+  UNIQUE(network_id)
 );

3-7: Constrain boolean column.

Make intent explicit for is_allocated.

 CREATE TABLE networks (
   id INTEGER PRIMARY KEY,
   base_network TEXT UNIQUE NOT NULL,  -- CIDR notation: "10.0.0.16/28"
-  is_allocated INTEGER NOT NULL DEFAULT 0
+  is_allocated INTEGER NOT NULL DEFAULT 0 CHECK (is_allocated IN (0,1))
 );

19-27: Cascade child rows on parent delete.

Avoid orphaned ip_allocations when a network allocation is removed.

   FOREIGN KEY (network_allocation_id) REFERENCES network_allocations(id),
+  -- Keep child rows in sync with parent lifecycle
+  FOREIGN KEY (network_allocation_id) REFERENCES network_allocations(id) ON DELETE CASCADE,
   UNIQUE(network_allocation_id, ip_addr)
go/deploy/metald/cmd/metald/main.go (2)

312-347: Expose /health on the main server.

You instantiate a health handler but never register it on the primary mux; only the Prometheus server serves /health. Most LB/ingress health checks will probe the main port. Add it here.

- // Create unified health handler
- healthHandler := healthpkg.Handler("metald", getVersion(), startTime)
+ // Create unified health handler
+ healthHandler := healthpkg.Handler("metald", getVersion(), startTime)
@@
- path, handler := metaldv1connect.NewVmServiceHandler(vmService,
+ path, handler := metaldv1connect.NewVmServiceHandler(vmService,
 		connect.WithInterceptors(interceptorList...),
 	)
 	mux.Handle(path, handler)
+ // Health on main server
+ mux.HandleFunc("/health", healthHandler)

318-336: Auth/interceptor change may block health checks; verify behavior.

You removed tenant-auth options; ensure default interceptors don’t inadvertently require auth for /health or RPCs you expect to be public. If auth is required, add an explicit bypass in the interceptor package or mount /health before auth.

Would you like a patch in the interceptors package to whitelist /health?

go/deploy/metald/internal/service/vm.go (3)

105-110: Compile-time mismatch with sqlc AllocateIP signature.

AllocateIP requires VmID, IpAddr, NetworkAllocationID. Here only VmID is provided. Also, CreateVm lacks a network context to select a pool.

  • Fix by requiring deployment_id (or network_allocation_id) in CreateVmRequest and passing all three params, or route single-VM creation through CreateDeployment.
  • Until then, remove this call to avoid broken DB writes.

Example:

- ip, ipErr := s.queries.AllocateIP(ctx, database.AllocateIPParams{
-   VmID: req.Msg.GetVmId(),
- })
+ // TODO: require deployment_id and use PopAvailableIPJSON + AllocateIP with IpAddr and NetworkAllocationID.

112-151: VM ID/IP mapping inconsistency.

You allocate IPs (incorrectly) for req.Msg.VmId but backend returns vmID; response logs/use vmID while IP is tied to a different ID.

  • Generate the VM ID first, use it consistently for AllocateIP and backend.CreateVM, and return that same ID.
- vmID, err := s.backend.CreateVM(ctx, config)
+ genID, genErr := s.generateVMID(ctx)
+ if genErr != nil {
+   return nil, connect.NewError(connect.CodeInternal, genErr)
+ }
+ vmID, err := s.backend.CreateVM(ctx, config /* pass genID if backend supports */)

Also update any AllocateIP to use vmID.


145-151: Hard-coded port.

Port 35428 should come from backend or config; avoid magic numbers in API responses.

- Port: 35428,
+ Port: info.Port, // or from backend response/config

If backend doesn’t expose it yet, extend Backend.CreateVM to return endpoint details.

go/deploy/metald/sqlc/queries.sql (1)

25-31: Name the JSON column alias to produce a typed field from sqlc.

Use an alias like allocated_ip to avoid JsonExtract interface{} and driver-dependent types.

-RETURNING json_extract(available_ips, '$[0]') as allocated_ip, id;
+RETURNING json_extract(available_ips, '$[0]') AS allocated_ip, id;

Then re-run sqlc so PopAvailableIPJSONRow has AllocatedIp string.

go/deploy/metald/internal/database/queries.sql.go (2)

117-122: Typed count preferred.

sqlc generated GetAvailableIPCount returns interface{}; cast issues likely. Make SQL cast to INTEGER so sqlc infers int64.

-SELECT json_array_length(available_ips) as count
+SELECT CAST(json_array_length(available_ips) AS INTEGER) AS count

Re-run sqlc to get GetAvailableIPCount(ctx string) (int64, error).


200-216: Rename json_extract alias to allocated_ip and regenerate.

Avoid interface{} + type assertions at call sites.

-RETURNING json_extract(available_ips, '$[0]') as allocated_ip, id
+RETURNING json_extract(available_ips, '$[0]') AS allocated_ip, id

Then re-run sqlc so:

type PopAvailableIPJSONRow struct {
    AllocatedIp string `db:"allocated_ip" json:"allocated_ip"`
    ID          int64  `db:"id" json:"id"`
}
```<!-- review_comment_end -->

</blockquote></details>
<details>
<summary>go/deploy/metald/internal/service/deployment.go (1)</summary><blockquote>

`140-143`: **Return actual IDs instead of placeholders.**



```diff
- return connect.NewResponse(&metaldv1.CreateDeploymentResponse{
-   VmIds: []string{"ud-001", "ud-002", "ud-003"},
- }), nil
+ return connect.NewResponse(&metaldv1.CreateDeploymentResponse{
+   VmIds: vmIDs,
+ }), nil
♻️ Duplicate comments (18)
go/deploy/spire/Makefile (1)

128-128: Remove stale create-users prerequisite from install-server.

This target no longer exists; current Makefile will fail to resolve it. (Repeat from prior review.)

-install-server: download-spire create-users setup-directories ## Install SPIRE server
+install-server: download-spire setup-directories ## Install SPIRE server
go/deploy/metald/internal/service/vm.go (2)

101-103: Use slog.String for CIDR.

BaseNetwork is string; avoid slog.Any.

-	slog.Any("network_cidr", network.BaseNetwork),
+	slog.String("network_cidr", network.BaseNetwork),

49-53: Handle crypto/rand errors; return (string, error).

Ignoring rand.Read can yield zeroed IDs. Return an error and propagate to callers.

-func (s *VMService) generateVMID(ctx context.Context) string {
-	b := make([]byte, 15)
-	rand.Read(b)
-	return base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(b)[:24]
-}
+func (s *VMService) generateVMID(ctx context.Context) (string, error) {
+	b := make([]byte, 15)
+	if _, err := rand.Read(b); err != nil {
+		return "", fmt.Errorf("generateVMID: rand.Read failed: %w", err)
+	}
+	return base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(b)[:24], nil
+}
go/deploy/metald/internal/network/iputil.go (3)

9-24: Return Go slice (or add a helper) and validate IPv4.

Returning JSON couples callers; prefer []string (or add a GenerateAvailableIPsJSON wrapper). Also validate IPv4 and handle small subnets.

-func GenerateAvailableIPs(cidr string) (string, error) {
+func GenerateAvailableIPs(cidr string) ([]string, error) {
 	_, ipnet, err := net.ParseCIDR(cidr)
 	if err != nil {
-		return "", fmt.Errorf("invalid CIDR: %w", err)
+		return nil, fmt.Errorf("invalid CIDR: %w", err)
 	}
-	// Calculate the number of hosts
-	ones, bits := ipnet.Mask.Size()
-	hostBits := bits - ones
-	numHosts := (1 << hostBits) - 2 // Exclude network and broadcast
+	base := ipnet.IP.To4()
+	if base == nil {
+		return nil, fmt.Errorf("only IPv4 CIDRs are supported: %s", cidr)
+	}
+	ones, bits := ipnet.Mask.Size() // expect 32
+	hostBits := uint(bits - ones)
+	total := uint64(1) << hostBits
+	usable := uint64(0)
+	if total > 3 {
+		usable = total - 3 // exclude network, gateway, broadcast
+	}
-	if numHosts < 1 {
-		return "[]", nil // No usable IPs in this network
+	if usable == 0 {
+		return []string{}, nil
 	}

26-39: Cap output size to avoid memory blowups on broad CIDRs and guard broadcast calc.

-	ips := make([]string, 0, numHosts-1) // -1 to also exclude gateway
+	const maxIPs uint64 = 65536
+	count := usable
+	if count > maxIPs {
+		count = maxIPs
+	}
+	ips := make([]string, 0, int(count))
@@
-	ip := ipnet.IP.Mask(ipnet.Mask)
+	ip := base.Mask(ipnet.Mask)
@@
-	broadcast := getBroadcast(ipnet)
-	for i := 0; i < numHosts-1 && !ip.Equal(broadcast); i++ {
+	broadcast := getBroadcast(ipnet)
+	if broadcast == nil {
+		return nil, fmt.Errorf("failed to compute broadcast for %s", cidr)
+	}
+	for i := uint64(0); i < count && !ip.Equal(broadcast); i++ {
 		ips = append(ips, ip.String())
 		incIP(ip)
 	}
-
-	// Convert to JSON array
-	jsonBytes, err := json.Marshal(ips)
-	if err != nil {
-		return "", fmt.Errorf("failed to marshal IPs to JSON: %w", err)
-	}
-
-	return string(jsonBytes), nil
+	return ips, nil

Optionally add:

func GenerateAvailableIPsJSON(cidr string) (string, error) { ips, err := GenerateAvailableIPs(cidr); if err != nil { return "", err }; b, err := json.Marshal(ips); if err != nil { return "", err }; return string(b), nil }
```<!-- review_comment_end -->

---

`60-71`: **Handle non-IPv4 input in getBroadcast.**

To4() may be nil; return nil to let caller fail fast.




```diff
 func getBroadcast(ipnet *net.IPNet) net.IP {
-	ip := ipnet.IP.To4()
+	ip := ipnet.IP.To4()
+	if ip == nil {
+		return nil
+	}
 	mask := ipnet.Mask
 	broadcast := make(net.IP, len(ip))
go/deploy/metald/internal/database/queries.sql.go (1)

59-70: BridgeName: enforce IFNAMSIZ in schema and validate in app.

Add a CHECK (length <= 15) in schema. This prevents persisting invalid names that netlink rejects.

go/deploy/metald/internal/network/bridge.go (3)

10-19: Make CreateBridge idempotent (don’t early-return).

If the bridge exists, still ensure IP address and link-up; use AddrReplace.

- link, linkErr := netlink.LinkByName(cfg.BridgeName)
- if linkErr == nil {
-   logger.Debug("bridge exists",
-     slog.String("bridge", cfg.BridgeName),
-     slog.Int("mtu", link.Attrs().MTU),
-   )
-   return link.Attrs().Name, nil
- }
+ link, linkErr := netlink.LinkByName(cfg.BridgeName)
+ var br netlink.Link
+ if linkErr == nil {
+   logger.Debug("bridge exists",
+     slog.String("bridge", cfg.BridgeName),
+     slog.Int("mtu", link.Attrs().MTU),
+   )
+   br = link
+ } else {
+   bridge := &netlink.Bridge{LinkAttrs: netlink.LinkAttrs{Name: cfg.BridgeName}} //nolint:exhaustruct
+   if err := netlink.LinkAdd(bridge); err != nil {
+     logger.Error("failed to create bridge", slog.String("bridge", cfg.BridgeName), slog.String("error", err.Error()))
+     return "", fmt.Errorf("failed to create bridge: %w", err)
+   }
+   var brErr error
+   br, brErr = netlink.LinkByName(cfg.BridgeName)
+   if brErr != nil {
+     return "", fmt.Errorf("failed to get bridge: %w", brErr)
+   }
+ }

40-51: Wrong address: using network address; assign first host with correct mask.

Assign a usable host IP (e.g., .1) and replace existing addr.

- addr, err := netlink.ParseAddr(cfg.BaseNetwork.String())
+ prefixSize, _ := cfg.BaseNetwork.Mask.Size()
+ gwIP, err := firstHostIPv4(cfg.BaseNetwork.IP)
+ if err != nil {
+   return "", fmt.Errorf("failed to derive bridge IP: %w", err)
+ }
+ addr, err := netlink.ParseAddr(fmt.Sprintf("%s/%d", gwIP.String(), prefixSize))
  if err != nil {
    return "", fmt.Errorf("failed to parse bridge IP: %w", err)
  }
-
- if err := netlink.AddrAdd(br, addr); err != nil {
+ if err := netlink.AddrReplace(br, addr); err != nil {
    logger.Error("failed to add IP to bridge",
      slog.String("bridge", cfg.BridgeName),
      slog.String("error", err.Error()),
    )
-   return "", fmt.Errorf("failed to add IP to bridge: %w", err)
+   return "", fmt.Errorf("failed to configure IP on bridge: %w", err)
  }

Add helper:

// firstHostIPv4 returns the first usable IPv4 in the subnet (network IP + 1).
func firstHostIPv4(ip net.IP) (net.IP, error) {
    ip4 := ip.To4()
    if ip4 == nil { return nil, fmt.Errorf("only IPv4 is supported") }
    out := make(net.IP, 4); copy(out, ip4)
    for i := 3; i >= 0; i-- { out[i]++; if out[i] != 0 { break } }
    return out, nil
}

(Remember to import net.)


62-67: Message nit: “bridge configured” is clearer for idempotent flow.

- logger.Debug("bridge created",
+ logger.Debug("bridge configured",
go/deploy/metald/internal/service/deployment.go (6)

19-25: Span name: use deployment-specific name.

Improves trace readability.

- ctx, span := s.tracer.Start(ctx, "metald.vm.create",
+ ctx, span := s.tracer.Start(ctx, "metald.deployment.create",

39-45: Return the correct error variable.

bnErr is the parse error; you’re returning netErr.

- return nil, connect.NewError(connect.CodeInternal, netErr)
+ return nil, connect.NewError(connect.CodeInternal, bnErr)

47-54: Avoid hard-coded DNS resolvers.

Make DNSServers configurable or use infra defaults.

- DNSServers: []string{"4.2.2.2", "4.2.2.1"},
+ DNSServers: s.resolveDNSServersFromConfigOrDefault(),

Add a helper to read from config/env.


56-67: Cleanup message and rollback path.

You release the network, not a bridge; fix log. Consider wrapping AllocateNetwork→CreateBridge→CreateNetworkAllocation in a tx and releasing/rolling back on any failure.

- logger.Debug("cleaned up bridge")
+ logger.Debug("released network after bridge creation failure")

I can propose a tx-scoped flow with s.queries if desired.


92-100: Use concrete slog types.

- slog.Any("network_cidr", netConfig.BaseNetwork.String()),
- slog.Any("network_id", na.ID),
+ slog.String("network_cidr", netConfig.BaseNetwork.String()),
+ slog.Int64("network_id", na.ID),

102-138: Invalid loop; unsafe type assertion; return real VM IDs.

  • for vm := range vmCount is invalid.
  • ipRow.JsonExtract may be []byte; assertion to string can panic.
  • Collect and return actual IDs instead of static placeholders.
- logger.DebugContext(ctx, "creating vms", slog.Int64("count", int64(vmCount)))
- for vm := range vmCount {
-   id := s.generateVMID(ctx)
+ logger.DebugContext(ctx, "creating vms", slog.Int("count", int(vmCount)))
+ vmIDs := make([]string, 0, int(vmCount))
+ for i := 0; i < int(vmCount); i++ {
+   id, idErr := s.generateVMID(ctx)
+   if idErr != nil { return nil, connect.NewError(connect.CodeInternal, idErr) }
    vmid := fmt.Sprintf("ud-%s", id)
@@
-   ipAlloc, allocErr := s.queries.AllocateIP(ctx, database.AllocateIPParams{
+   // Safely coerce allocated IP from sqlc row until we alias it
+   var ipStr string
+   switch v := ipRow.JsonExtract.(type) {
+   case string: ipStr = v
+   case []byte: ipStr = string(v)
+   default:
+     return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("unexpected IP type %T", v))
+   }
+   ipAlloc, allocErr := s.queries.AllocateIP(ctx, database.AllocateIPParams{
      VmID:                vmid,
-     IpAddr:              ipRow.JsonExtract.(string),
+     IpAddr:              ipStr,
      NetworkAllocationID: ipRow.ID,
    })
@@
-   logger.Debug("created vm",
+   logger.Debug("created vm",
      slog.String("id", ipAlloc.VmID),
      slog.String("ip", ipAlloc.IpAddr),
-     slog.Any("requested", vmCount),
-     slog.Any("fulfilled", vm),
+     slog.Int("requested", int(vmCount)),
+     slog.Int("fulfilled", i+1),
    )
+   vmIDs = append(vmIDs, vmid)
  }

And update the response (below).

go/deploy/metald/internal/network/idgen.go (2)

9-12: Fix stale/incomplete comments; convert to proper doc comment.

These lines reference the removed IDGenerator and contain a fragment. Replace with a concise doc comment for GenerateID and note the DB uniqueness contract.

-// IDGenerator generates short, unique IDs for network devices
-// Network interface names in Linux are limited to 15 characters,
-
-// Generates an for the bridge, it's not guaranteed unique until it's saved to the DB
+// GenerateID returns a Linux-safe 15-char bridge name: "br-" + 12 base32 chars.
+// Not guaranteed unique; enforce UNIQUE(bridge_name) at persistence and retry on conflict.
+// Linux netdev names are limited to 15 characters.

13-19: Don’t ignore crypto/rand errors; normalize to lowercase; minor string concat nit.

rand.Read error must be handled; lowercase avoids surprises with netdev naming; concatenation avoids an unnecessary Sprintf.

-func GenerateID() string {
-	b := make([]byte, 8)
-	rand.Read(b)
-	id := base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(b)[:12]
-
-	return fmt.Sprintf("br-%s", id)
-}
+func GenerateID() string {
+	b := make([]byte, 8)
+	if _, err := io.ReadFull(rand.Reader, b); err != nil {
+		panic(fmt.Errorf("idgen: crypto/rand failure: %w", err))
+	}
+	enc := base32.StdEncoding.WithPadding(base32.NoPadding)
+	id := strings.ToLower(enc.EncodeToString(b)[:12])
+	return "br-" + id
+}

Add required imports:

 import (
 	"crypto/rand"
 	"encoding/base32"
 	"fmt"
+	"io"
+	"strings"
 )

If panicking here is undesirable, consider changing the signature to func GenerateID() (string, error) and propagate errors to callers. I can help refactor call sites if desired.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 11cf7bd and df5c3f0.

📒 Files selected for processing (15)
  • go/deploy/metald/cmd/metald/main.go (1 hunks)
  • go/deploy/metald/internal/backend/firecracker/asset_manager.go (0 hunks)
  • go/deploy/metald/internal/database/models.go (1 hunks)
  • go/deploy/metald/internal/database/queries.sql.go (4 hunks)
  • go/deploy/metald/internal/network/bridge.go (1 hunks)
  • go/deploy/metald/internal/network/idgen.go (1 hunks)
  • go/deploy/metald/internal/network/iputil.go (1 hunks)
  • go/deploy/metald/internal/network/network.go (1 hunks)
  • go/deploy/metald/internal/service/auth.go (0 hunks)
  • go/deploy/metald/internal/service/deployment.go (1 hunks)
  • go/deploy/metald/internal/service/vm.go (3 hunks)
  • go/deploy/metald/sqlc/queries.sql (1 hunks)
  • go/deploy/metald/sqlc/schema.sql (1 hunks)
  • go/deploy/pkg/observability/interceptors/options.go (0 hunks)
  • go/deploy/spire/Makefile (1 hunks)
💤 Files with no reviewable changes (3)
  • go/deploy/pkg/observability/interceptors/options.go
  • go/deploy/metald/internal/service/auth.go
  • go/deploy/metald/internal/backend/firecracker/asset_manager.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-10T04:47:17.076Z
Learnt from: imeyer
PR: unkeyed/unkey#3941
File: go/deploy/spire/Makefile:110-110
Timestamp: 2025-09-10T04:47:17.076Z
Learning: In the SPIRE Makefile (go/deploy/spire/Makefile), the create-users target has been removed, so any references to create-users in target dependencies should also be removed. Both install-agent and install-server targets had create-users dependencies that needed to be cleaned up.

Applied to files:

  • go/deploy/spire/Makefile
🧬 Code graph analysis (2)
go/deploy/metald/internal/database/queries.sql.go (1)
go/deploy/metald/internal/database/models.go (1)
  • NetworkAllocation (25-32)
go/deploy/metald/internal/service/deployment.go (6)
go/deploy/metald/internal/service/vm.go (1)
  • VMService (26-34)
go/pkg/otel/tracing/trace.go (1)
  • Start (59-62)
go/deploy/metald/internal/network/idgen.go (1)
  • GenerateID (13-19)
go/deploy/metald/internal/network/bridge.go (1)
  • CreateBridge (11-68)
go/deploy/metald/internal/network/iputil.go (1)
  • GenerateAvailableIPs (11-48)
go/deploy/metald/internal/database/queries.sql.go (2)
  • CreateNetworkAllocationParams (65-70)
  • AllocateIPParams (19-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Test Go API Local / Test
  • GitHub Check: Build / Build
  • GitHub Check: Test API / API Test Local
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (4)
go/deploy/spire/Makefile (2)

110-110: LGTM: removed nonexistent dependency.

Dropping create-users from install-agent is correct per recent cleanup.


111-121: Incorrect — systemd units run as root; no user creation required

Both go/deploy/spire/contrib/systemd/spire-agent.service and go/deploy/spire/contrib/systemd/spire-server.service set User=root, so the install targets do not need to create dedicated spire users; remove/resolve the original suggestion.

Likely an incorrect or invalid review comment.

go/deploy/metald/sqlc/schema.sql (1)

11-15: Migration warning for NOT NULL bridge_name.

If this alters an existing DB, adding a NOT NULL column without a default in SQLite will fail unless the table is empty. Provide a migration that backfills bridge_name (e.g., br-) and then adds the NOT NULL + UNIQUE constraint.

go/deploy/metald/internal/network/idgen.go (1)

5-5: LGTM on base32 choice.

Using base32 without padding is appropriate for fixed-length, interface-safe IDs.

@graphite-app
Copy link

graphite-app bot commented Sep 11, 2025

Movie gif. Robert Downey Jr as Tony Stark in Iron Man slides on sunglasses with a smooth grin on his face. He gives a thumbs up as people in business attire and military uniforms stand and clap behind him.  (Added via Giphy)

@chronark chronark merged commit f84edb6 into main Sep 11, 2025
18 checks passed
@chronark chronark deleted the push-tokvskrsrkut branch September 11, 2025 14:16
@graphite-app
Copy link

graphite-app bot commented Sep 11, 2025

Graphite Automations

"Post a GIF when PR approved" took an action on this PR • (09/11/25)

1 gif was posted to this PR based on Andreas Thomas's automation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants