Skip to content

Add internal Certificate Authority for peer TLS certificates#5491

Closed
zgv163 wants to merge 18 commits intonetbirdio:mainfrom
zgv163:feature/internal-ca
Closed

Add internal Certificate Authority for peer TLS certificates#5491
zgv163 wants to merge 18 commits intonetbirdio:mainfrom
zgv163:feature/internal-ca

Conversation

@zgv163
Copy link
Copy Markdown

@zgv163 zgv163 commented Mar 3, 2026

Describe your changes

Adds a built-in Certificate Authority to NetBird that issues TLS certificates for peer hostnames, eliminating browser security warnings when accessing services over NetBird peer domains.

Management server:

  • Internal CA module with ECDSA P-256 root certificate generation, constrained to the account's DNS domain via x509 NameConstraints
  • Certificate signing via gRPC with CSR-based flow (peer private keys never leave the device)
  • CA initialization with configurable display name, organization, and validity period
  • CA rotation (creates new CA alongside existing one for graceful transition; old CA can be deactivated after distribution)
  • Certificate revocation by serial number
  • REST API and OpenAPI spec for CA management and issued certificates
  • Wildcard DNS record support for peers with wildcard certificates
  • Permission module for certificate authority operations
  • Certificate validity tied to peer login expiration
  • Integration tests wired into the black-box test server

Client:

  • netbird cert request — request a TLS certificate (supports --wildcard flag)
  • netbird cert status — show current certificate details
  • netbird cert trust-ca — install CA into OS trust store
  • netbird cert untrust-ca — remove CA from OS trust store
  • Certificate auto-renewal before expiry
  • CA distribution via management sync
  • Platform trust store integration (macOS, Linux, Windows)
  • Proto definitions for certificate daemon RPCs and gRPC signing

Account settings:

  • cert_wildcard_allowed setting to control whether peers can request wildcard certificates

Issue ticket number and link

Closes #5479

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

netbirdio/docs#643

Companion PRs:

Summary by CodeRabbit

  • New Features

    • Full certificate lifecycle: request/status/trust/untrust via CLI, automatic renewal, wildcard support, and OS CA trust on macOS/Linux/Windows.
    • CA management API/UI: initialize, rotate, deactivate CAs; list/revoke issued certificates; CA certs propagated to peers.
  • Tests

    • Extensive unit and integration tests covering certificate manager, CA manager, signers, gRPC/HTTP handlers, CLI and end-to-end flows.
  • Other

    • Per-peer rate limiting and activity events for certificate operations.

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 3, 2026

CLA assistant check
All committers have signed the CLA.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 3, 2026

📝 Walkthrough

Walkthrough

Adds full certificate management: daemon CLI and RPCs, client-side cert manager with OS trust installers, engine-driven renewal, management CA subsystem (internal CA + ACME stub), storage and APIs (gRPC/HTTP), and sync changes to distribute CA PEMs and wildcard-peer data.

Changes

Cohort / File(s) Summary
Client CLI
client/cmd/cert.go, client/cmd/root.go
New cert command group and subcommands (request, status, trust-ca, untrust-ca) that call daemon RPCs and print human-friendly outputs.
Client cert manager & platform trust
client/internal/cert/manager.go, client/internal/cert/manager_test.go, client/internal/cert/trust_darwin.go, client/internal/cert/trust_linux.go, client/internal/cert/trust_windows.go
New Manager for keys/CSRs/certs, atomic disk persistence, inspection/renewal helpers, tests, and OS-specific CA install/uninstall/is-trusted implementations.
Client runtime wiring
client/internal/connect.go, client/internal/engine.go
Expose/set certManager on ConnectClient/Engine; add background renewal loop, renewal logic, and engine hooks to trigger renewals and store CA data.
Daemon API & server
client/proto/daemon.proto, client/server/cert.go, client/server/server.go
New daemon RPCs (RequestCertificate, GetCertificateStatus, TrustCA, UntrustCA) and server-side implementations handling CSR/key gen, signing via management, storage, and OS trust operations.
Management CA core
management/server/ca/signer.go, management/server/ca/types.go, management/server/ca/manager.go, management/server/ca/internal_ca.go, management/server/ca/acme_stub.go, management/server/ca/*_test.go
CA models, CertSigner interface, internal CA implementation, ACME stub signer, CA Manager with signing, rotation, rate-limiting, persistence, and extensive unit tests.
Management gRPC & sync
management/internals/shared/grpc/cert_service.go, management/internals/shared/grpc/conversion.go, management/internals/shared/grpc/server.go, management/internals/controllers/network_map/controller/...
New gRPC methods (SignCertificate, GetCACertificates), add CA PEMs into PeerConfig/ToSyncResponse, thread wildcard-peer discovery into zone resolution and call sites.
Management HTTP API & handlers
management/server/http/handlers/ca/ca_handler.go, management/server/http/handler.go, management/server/http/handlers/ca/ca_handler_test.go
New REST endpoints and handler wiring for CA lifecycle and issued-certificate operations, permissions checks, and tests.
Storage & store interface
management/server/store/store.go, management/server/store/sql_store.go
Store interface and SQL implementation extended for CACertificate, IssuedCertificate, CertIssuanceLog, queries, migrations; encrypt/decrypt sensitive fields and new persistence methods.
Management types & settings
management/server/types/settings.go, management/server/types/account.go
Settings extended with CA-related fields (enable, wildcard, validity, rate limits, ACME config); GetPeersCustomZone updated to emit wildcard DNS records for wildcard peers.
Management server init & modules
management/internals/server/modules.go, management/internals/server/boot.go
CA manager constructed and registered (ACME stub), CAManager wired into server and gRPC API handlers.
Permissions & activity codes
management/server/permissions/modules/module.go, management/server/activity/codes.go
New CertificateAuthority permission module and activity codes for CA/certificate events.
Shared management client & REST
shared/management/client/*.go, shared/management/client/rest/ca.go, shared/management/client/rest/client.go, shared/management/client/rest/ca_test.go
Added SignCertificate/GetCACertificates client methods, REST CA client, mocks, integration tests, and expose CertificateAuthority API on REST client.
API spec & generated types
shared/management/http/api/openapi.yml, shared/management/http/api/types.gen.go
OpenAPI additions for CA endpoints/schemas and generated types (CACertificateResponse, IssuedCertificateResponse); AccountSettings exposes cert_wildcard_allowed.
Shared protos
shared/management/proto/management.proto
Added SignCertificate/GetCACertificates RPCs, CertSigningType enum, SignCertificate messages, CACertificateInfo, and PeerConfig ca_certificates_pem field.
Tests & harness updates
management/server/*_test.go, management/server/http/testing/testing_tools/..., management/server/peer_test.go, management/server/account_test.go
Many tests and test harness changes to initialize CA manager and adapt to extended ToSyncResponse/GetPeersCustomZone signatures.

Sequence Diagram(s)

sequenceDiagram
    actor Peer as Peer/Client
    participant CLI as NetBird CLI
    participant Daemon as Daemon
    participant MGM as Management (MGM)
    participant CA as CA Manager
    participant Store as SQL Store

    Peer->>CLI: netbird cert request --type internal
    CLI->>Daemon: RequestCertificate(signing_type, wildcard)
    Daemon->>Daemon: generate key, create CSR
    Daemon->>MGM: SignCertificate(CSR, signing_type, wildcard)
    MGM->>CA: SignCertificate(CSR, peerFQDN, wildcard)
    CA->>Store: Persist issued certificate
    Store-->>CA: Confirm
    CA-->>MGM: Signed cert + chain + expires_at
    MGM-->>Daemon: Certificate + chain + expires_at
    Daemon->>Daemon: Store cert/key on disk
    Daemon-->>CLI: Return cert paths and metadata
Loading
sequenceDiagram
    actor Admin as Administrator
    participant HTTP as Management API
    participant CA as CA Manager
    participant Store as SQL Store
    participant OS as Client OS

    Admin->>HTTP: POST /api/ca (init)
    HTTP->>CA: InitForAccount(domain, options)
    CA->>Store: CreateCACertificate
    Store-->>CA: CA record
    CA-->>HTTP: CA response (fingerprint)
    Admin->>HTTP: Trust CA (trigger)
    HTTP->>Store: GetActiveCACertificates
    HTTP->>OS: InstallCA(certPEM)
    OS-->>HTTP: Success + fingerprints
    HTTP-->>Admin: Success
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Poem

🐰 I hopped through keys and CSRs so bright,

I begged the CA to sign by moonlight,
Wildcards bloom for every domain,
Peers don HTTPS — no browser pain,
A rabbit hums: renew, trust, delight!

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟠 Major comments (18)
client/internal/cert/trust_linux.go-75-90 (1)

75-90: ⚠️ Potential issue | 🟠 Major

IsCATrusted can report true when trust is not effectively active.

Line 88 checks only whether the NetBird cert file exists. If trust-store regeneration previously failed, this returns true while clients may still not trust the CA.

💡 Suggested direction
 func IsCATrusted(caPEM []byte) bool {
-    certDir, _, err := detectDistro()
+    certDir, updateCmd, err := detectDistro()
     if err != nil {
         return false
     }
@@
     certPath := fmt.Sprintf("%s/netbird-%s.crt", certDir, fp[:16])
-    _, err = os.Stat(certPath)
-    return err == nil
+    if _, err = os.Stat(certPath); err != nil {
+        return false
+    }
+    // Keep trust metadata in sync before reporting trusted.
+    return exec.Command(updateCmd).Run() == nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/cert/trust_linux.go` around lines 75 - 90, IsCATrusted
currently returns true merely if the NetBird cert file exists in certDir (using
detectDistro and certFingerprint), which can be stale if trust-store
regeneration failed; update IsCATrusted to also verify the cert is actually
present in the system trust bundle (use the second return value from
detectDistro which currently is ignored) or otherwise confirm trust-store
regeneration succeeded: after computing fp and certPath, open the
distro-provided bundle path (the second value from detectDistro), search the
bundle for the PEM block or the certificate fingerprint (certFingerprint or
matching subject/serial), and only return true if both the per-cert file exists
and the bundle contains the cert; handle and propagate errors accordingly in
IsCATrusted so a missing/failed regeneration does not produce a false positive.
client/internal/cert/trust_linux.go-92-103 (1)

92-103: ⚠️ Potential issue | 🟠 Major

detectDistro relies on PATH environment variable and can fail in restricted daemon/service environments.

Using exec.LookPath() here may fail when /usr/sbin is not in PATH, causing false "unsupported Linux distribution" errors on otherwise supported Debian/Ubuntu and RHEL/Fedora systems. This is a practical problem in containerized deployments and service environments where PATH is restrictive.

💡 Suggested fix (resolve absolute executable path)
+func resolveCmd(name string) (string, error) {
+    candidates := []string{
+        "/usr/sbin/" + name,
+        "/usr/bin/" + name,
+        "/sbin/" + name,
+        "/bin/" + name,
+    }
+    for _, p := range candidates {
+        if st, err := os.Stat(p); err == nil && st.Mode().Perm()&0111 != 0 {
+            return p, nil
+        }
+    }
+    return exec.LookPath(name)
+}
+
 func detectDistro() (certDir, updateCmd string, err error) {
     if _, err := os.Stat(debianCertDir); err == nil {
-        if _, err := exec.LookPath(debianUpdateCmd); err == nil {
-            return debianCertDir, debianUpdateCmd, nil
+        if cmd, err := resolveCmd(debianUpdateCmd); err == nil {
+            return debianCertDir, cmd, nil
         }
     }
     if _, err := os.Stat(rhelCertDir); err == nil {
-        if _, err := exec.LookPath(rhelUpdateCmd); err == nil {
-            return rhelCertDir, rhelUpdateCmd, nil
+        if cmd, err := resolveCmd(rhelUpdateCmd); err == nil {
+            return rhelCertDir, cmd, nil
         }
     }
     return "", "", fmt.Errorf("unsupported Linux distribution: neither %s nor %s found", debianUpdateCmd, rhelUpdateCmd)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/cert/trust_linux.go` around lines 92 - 103, The detectDistro
function should not rely solely on exec.LookPath (which fails in restricted
PATHs); update detectDistro to fall back to known absolute locations for the
update commands when LookPath returns nothing: try
exec.LookPath(debianUpdateCmd) and if that fails check common paths (e.g.
/usr/sbin/update-ca-certificates, /usr/local/sbin/update-ca-certificates) before
giving up, and similarly for rhelUpdateCmd check e.g. /sbin/update-ca-trust and
/usr/sbin/update-ca-trust; retain the existing cert directory checks for
debianCertDir and rhelCertDir and return the first certDir + resolved update
command path found, otherwise return the unsupported distro error as before.
shared/management/http/api/types.gen.go-180-410 (1)

180-410: ⚠️ Potential issue | 🟠 Major

EventActivityCode.Valid() is missing the newly introduced certificate activity codes.

The server now defines/emits certificate activity codes in management/server/activity/codes.go (e.g., certificate.authority.create, certificate.issue, certificate.revoke), but this enum validator does not recognize them. Any client code relying on EventActivityCode.Valid() will treat real events as invalid. Please update the OpenAPI enum for event activity codes and regenerate this file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shared/management/http/api/types.gen.go` around lines 180 - 410,
EventActivityCode.Valid() is missing the new certificate activity entries;
update the OpenAPI enum that generates EventActivityCode to include the
certificate activity codes (e.g., certificate.authority.create,
certificate.issue, certificate.revoke) so the generated constants (e.g.,
EventActivityCodeCertificateAuthorityCreate, EventActivityCodeCertificateIssue,
EventActivityCodeCertificateRevoke) appear, then regenerate
shared/management/http/api/types.gen.go so the EventActivityCode.Valid() switch
includes those new constants; ensure the new constant names match the generator
output and rebuild the generated file.
client/server/server.go-120-126 (1)

120-126: ⚠️ Potential issue | 🟠 Major

Use profile/config-derived cert directory instead of global default path.

Line 120 hardcodes profilemanager.DefaultConfigPathDir. With custom configFile or multi-profile setups, cert data can land in the wrong directory.

🐛 Suggested fix
-	certDir := filepath.Join(profilemanager.DefaultConfigPathDir, "certs")
+	certBaseDir := profilemanager.DefaultConfigPathDir
+	if configFile != "" {
+		certBaseDir = filepath.Dir(configFile)
+	}
+	certDir := filepath.Join(certBaseDir, "certs")
 	certMgr, err := cert.NewManager(certDir)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/server/server.go` around lines 120 - 126, The code currently builds
certDir using profilemanager.DefaultConfigPathDir which ignores a custom
configFile/profile; change construction of certDir to derive the config path
from the active server config/profile rather than the global default—use the
server's loaded config or profile manager function that returns the actual
config directory (the same source used for other config files) when creating
certDir before calling cert.NewManager; set s.certManager on success exactly as
now (keep cert.NewManager and s.certManager usage).
client/internal/engine.go-426-426 (1)

426-426: ⚠️ Potential issue | 🟠 Major

Start the cert-renewal goroutine only after successful engine startup.

startCertRenewalLoop() is invoked before Start() completes. If Start() exits early on error, the renewal goroutine can keep running against a partially initialized engine context.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/engine.go` at line 426, The cert-renewal goroutine is being
launched via startCertRenewalLoop() before Start() finishes, which can leave it
running against a partially-initialized engine; move the call so it is only
invoked after a successful engine startup (i.e., only after Start() returns nil
or at the end of Start() once initialization completes). Remove or defer the
current pre-Start call to startCertRenewalLoop() and add a single call site that
runs only on successful initialization (reference the startCertRenewalLoop()
method and the Start() method on the same engine instance).
management/server/http/handlers/ca/ca_handler.go-84-85 (1)

84-85: ⚠️ Potential issue | 🟠 Major

Reject invalid JSON bodies instead of silently applying defaults.

parseCAOptions ignores decoder errors, so malformed payloads can still create/rotate CAs with default options. This should be a 400 path.

Suggested fix
 import (
+	"errors"
 	"encoding/json"
+	"io"
 	"net/http"
 	"time"
@@
-	opts := parseCAOptions(r)
+	opts, err := parseCAOptions(r)
+	if err != nil {
+		util.WriteError(r.Context(), err, w)
+		return
+	}
@@
-	opts := parseCAOptions(r)
+	opts, err := parseCAOptions(r)
+	if err != nil {
+		util.WriteError(r.Context(), err, w)
+		return
+	}
@@
-func parseCAOptions(r *http.Request) nbca.CAOptions {
+func parseCAOptions(r *http.Request) (nbca.CAOptions, error) {
 	var req api.CAInitRequest
-	_ = json.NewDecoder(r.Body).Decode(&req)
+	if err := json.NewDecoder(r.Body).Decode(&req); err != nil && !errors.Is(err, io.EOF) {
+		return nbca.CAOptions{}, status.Errorf(status.InvalidArgument, "invalid request body: %v", err)
+	}
@@
-	return opts
+	return opts, nil
 }

Also applies to: 168-169, 269-285

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/http/handlers/ca/ca_handler.go` around lines 84 - 85,
parseCAOptions currently swallows JSON decode errors which lets malformed bodies
fall back to defaults; change parseCAOptions to return (opts, error) (or return
error when already parsing), propagate that error to callers in the CA handlers
(the request handlers that call parseCAOptions around Create/Rotate/Update CA
logic) and when a JSON decode error is returned respond with HTTP 400 and an
explanatory message; specifically, update parseCAOptions to check the
decoder.Decode error and return it, then update each call site (the handlers
that invoke parseCAOptions at the top of the request flow) to check the error
and write a 400 response instead of proceeding with defaults.
client/internal/engine.go-2133-2143 (1)

2133-2143: ⚠️ Potential issue | 🟠 Major

Wildcard certificates are downgraded during renewal.

Renewal always uses CreateCSR(..., false) and SignCertificate(..., false), so peers with wildcard certs will be reissued as non-wildcard certs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/engine.go` around lines 2133 - 2143, Renewal always passes
false for the wildcard flag, so wildcard certs get reissued as non-wildcard;
change the flow in the renewal path to detect whether the certificate/FQDN is a
wildcard (e.g., inspect the currently stored cert or check if fqdn starts with
"*.") and pass that boolean into e.certManager.CreateCSR(key, fqdn,
<isWildcard>) and into e.mgmClient.SignCertificate(..., <isWildcard>) instead of
hardcoding false so the wildcard property is preserved during CSR creation and
signing.
client/internal/engine.go-1043-1051 (1)

1043-1051: ⚠️ Potential issue | 🟠 Major

Certificate renewal can run concurrently from multiple triggers.

updateConfig can launch renewal goroutines, and the periodic loop can do the same, with no in-flight guard. This can cause duplicate signing requests and racing certificate writes.

Also applies to: 2066-2090

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/engine.go` around lines 1043 - 1051, Multiple triggers can
start concurrent renewals; add an in-flight guard to prevent duplicate work by
introducing a renewal lock/flag on Engine (e.g. add renewInFlight uint32 or
renewMu + renewing bool to the struct) and modify the renewal pathway so
Engine.renewCertificate checks and sets this flag atomically (or acquires the
mutex) and returns immediately if a renewal is already running, then clears the
flag on completion (with defer) and recovers from panics; apply the same check
before launching goroutines from updateConfig and the periodic loop (the other
renewal site around lines 2066-2090) so only one renewal runs at a time.
management/server/http/handlers/ca/ca_handler.go-93-101 (1)

93-101: ⚠️ Potential issue | 🟠 Major

Do not return success when account setting enablement fails.

initCA can create a CA but still return success even if CertificateAuthorityEnabled fails to persist. That leaves CA state and account settings out of sync.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/http/handlers/ca/ca_handler.go` around lines 93 - 101, The
handler currently logs failures when GetAccountSettings or SaveAccountSettings
fail but still returns success, letting initCA create a CA while account
settings aren't persisted; update the logic in the CA creation flow (the code
around h.accountManager.GetStore().GetAccountSettings,
settings.CertificateAuthorityEnabled assignment, and
h.accountManager.GetStore().SaveAccountSettings) so that any error returned by
GetAccountSettings or SaveAccountSettings is propagated as a non-success
response (return the error / write an HTTP error and stop further success
handling) instead of only logging; ensure you reference and return/propagate
errors from SaveAccountSettings and from GetAccountSettings (and consider
aborting/cleaning up if CA creation already happened) so CA state and account
settings cannot become out of sync.
management/internals/controllers/network_map/controller/controller.go-265-265 (1)

265-265: ⚠️ Potential issue | 🟠 Major

CA certificates are not being propagated in sync updates.

Both grpc.ToSyncResponse calls pass nil for caCertsPEM, so peers won’t receive CA certs through these network-map update paths.

Also applies to: 401-401

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/internals/controllers/network_map/controller/controller.go` at
line 265, The ToSyncResponse calls are passing nil for the caCertsPEM argument
so CA certificates aren't included in sync updates; locate both
grpc.ToSyncResponse usages and pass the correct CA PEM data (e.g. the variable
that holds CA certs such as caCertsPEM or account.Settings.CaCertsPEM /
c.config.CACertsPEM) instead of nil so the CA certs are propagated to peers.
management/server/ca/manager_test.go-353-357 (1)

353-357: ⚠️ Potential issue | 🟠 Major

Rotation test expectation conflicts with the stated rotation behavior.

This test expects two active CAs after rotation, but the objective says old CA should be deactivated. Keeping this assertion can mask a real regression.

💡 Proposed fix
-	// Both should be active
+	// Old CA should be deactivated after rotation
 	active, err := mgr.GetActiveCACertificates(context.Background(), "account1")
 	require.NoError(t, err)
-	assert.Len(t, active, 2)
+	assert.Len(t, active, 1)
+	assert.Equal(t, ca2.ID, active[0].ID)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/ca/manager_test.go` around lines 353 - 357, The test in
manager_test.go incorrectly asserts two active CAs after rotation; update the
assertion to match the intended rotation behavior (old CA deactivated). Replace
the assert.Len(t, active, 2) check around the call to
mgr.GetActiveCACertificates(context.Background(), "account1") so it asserts a
single active certificate (or explicitly verifies that the new CA is present and
the old CA is not) and, if helpful, add an assertion that the old CA's status is
deactivated using the same manager methods used elsewhere in the test.
management/server/ca/internal_ca.go-105-114 (1)

105-114: ⚠️ Potential issue | 🟠 Major

Cap issued certificate expiry to the CA expiry.

Leaf NotAfter is currently now + s.validity; it can exceed s.caCert.NotAfter and produce an already-doomed lifetime window.

💡 Proposed fix
 	now := time.Now().UTC()
+	notAfter := now.Add(s.validity)
+	if notAfter.After(s.caCert.NotAfter) {
+		notAfter = s.caCert.NotAfter
+	}
+	if !notAfter.After(now) {
+		return nil, fmt.Errorf("CA certificate is expired or has insufficient remaining validity")
+	}
 	template := &x509.Certificate{
 		SerialNumber: serialNumber,
 		Subject: pkix.Name{
 			CommonName: peerFQDN,
 		},
 		DNSNames:  dnsNames,
 		NotBefore: now,
-		NotAfter:  now.Add(s.validity),
+		NotAfter:  notAfter,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/ca/internal_ca.go` around lines 105 - 114, The issued
certificate NotAfter is set to now.Add(s.validity) and may exceed the CA's
expiration; update the code that builds the x509.Certificate template (where
template.NotAfter is assigned) to compute expiry := now.Add(s.validity) and cap
it to s.caCert.NotAfter (use the earlier of the two timestamps), then assign
template.NotAfter = expiry so leaf certs never outlive the CA.
management/server/ca/internal_ca.go-220-228 (1)

220-228: ⚠️ Potential issue | 🟠 Major

Add PermittedDNSDomainsCritical: true to mark DNS NameConstraints as critical.

RFC 5280 (PKIX) explicitly requires that the nameConstraints extension on CA certificates MUST be marked critical. Without this field, the constraints may not be enforced by conforming validators and violates the standard. Go's x509.Certificate template requires PermittedDNSDomainsCritical: true to set the critical bit on this extension.

Proposed fix
 		NotBefore:             now,
 		NotAfter:              now.Add(validity),
 		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
 		BasicConstraintsValid: true,
 		IsCA:                  true,
 		MaxPathLen:            0,
 		MaxPathLenZero:        true,
+		PermittedDNSDomainsCritical: true,
 		PermittedDNSDomains:   []string{"." + dnsDomain, dnsDomain},
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/ca/internal_ca.go` around lines 220 - 228, The Certificate
template that sets PermittedDNSDomains (the x509.Certificate block with fields
NotBefore/NotAfter/KeyUsage/IsCA/MaxPathLenZero/PermittedDNSDomains) must also
set PermittedDNSDomainsCritical: true so the DNS nameConstraints extension is
marked critical per RFC 5280; add the PermittedDNSDomainsCritical: true field to
that x509.Certificate template alongside PermittedDNSDomains.
shared/management/http/api/openapi.yml-4215-4228 (1)

4215-4228: ⚠️ Potential issue | 🟠 Major

Constrain CA init inputs to prevent invalid/over-permissive cert parameters.

Line 4225 currently accepts any integer for validity_days, and subject fields are unbounded. This weakens API-side validation and can allow invalid or overly long-lived CA inputs.

Suggested validation constraints
     CAInitRequest:
       type: object
       description: Optional parameters for initializing a CA certificate
       properties:
         display_name:
           description: Display name for the CA certificate CommonName. Defaults to "{domain} Internal CA ({suffix})"
           type: string
+          minLength: 1
+          maxLength: 64
         organization:
           description: Organization name for the CA certificate. Defaults to "NetBird Self-Hosted"
           type: string
+          minLength: 1
+          maxLength: 64
         validity_days:
           description: CA certificate validity in days. Defaults to 3650 (~10 years)
           type: integer
+          minimum: 1
+          maximum: 3650
+          default: 3650
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shared/management/http/api/openapi.yml` around lines 4215 - 4228, The
CAInitRequest schema accepts unbounded values; tighten validation by adding
constraints on CAInitRequest properties: for validity_days set integer minimum
and maximum (e.g., minimum: 1 and maximum: 36500) to prevent negative or
excessively long-lived CAs, and for display_name and organization add string
constraints such as minLength (e.g., 1), maxLength (e.g., 255) and an optional
pattern to restrict disallowed characters; update the OpenAPI CAInitRequest
definition (properties display_name, organization, validity_days) to include
these validation keywords so the API rejects invalid/over-permissive inputs.
client/server/cert.go-89-101 (1)

89-101: ⚠️ Potential issue | 🟠 Major

Validate certificate payload before persisting files.

StoreCert is called even if the selected cert bytes are empty. That can persist unusable cert state and break subsequent status/renewal flows.

🔧 Suggested fix
 	// Select cert and chain based on signing type
 	var certPEM, chainPEM []byte
 	switch signingType {
 	case mgmProto.CertSigningType_CERT_SIGNING_ACME:
 		certPEM = signResp.AcmeCertPem
 		chainPEM = signResp.AcmeChainPem
 	default:
 		certPEM = signResp.InternalCertPem
 		chainPEM = signResp.InternalChainPem
 	}
+
+	if len(certPEM) == 0 {
+		return nil, gstatus.Errorf(codes.Internal, "sign certificate: empty certificate returned")
+	}

 	if err := s.certManager.StoreCert(certPEM, chainPEM, keyPEM); err != nil {
 		return nil, gstatus.Errorf(codes.Internal, "store certificate: %v", err)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/server/cert.go` around lines 89 - 101, Before calling
s.certManager.StoreCert ensure the selected cert payloads are non-empty: after
computing certPEM and chainPEM from signingType (using signResp and
mgmProto.CertSigningType_CERT_SIGNING_ACME) validate certPEM and chainPEM (and
keyPEM) are not nil/empty; if any required byte slice is empty, return an
appropriate gstatus error (e.g., codes.InvalidArgument or
codes.FailedPrecondition) instead of calling s.certManager.StoreCert so you
don't persist invalid certificate state.
management/server/ca/types.go-45-48 (1)

45-48: ⚠️ Potential issue | 🟠 Major

Require field encryptor for CA private key operations.

When DataStoreEncryptionKey is not configured in boot, the fieldEncrypt remains nil and CA private keys are persisted unencrypted to the database. Add validation to reject plaintext storage of CA credentials.

🔒 Suggested fix
 func (c *CACertificate) EncryptSensitiveData(enc *crypt.FieldEncrypt) error {
 	if enc == nil {
+		if c.PrivateKeyPEM != "" {
+			return fmt.Errorf("field encryptor is required for CA private key")
+		}
 		return nil
 	}

 func (c *CACertificate) DecryptSensitiveData(enc *crypt.FieldEncrypt) error {
 	if enc == nil {
+		if c.PrivateKeyPEM != "" {
+			return fmt.Errorf("field encryptor is required to decrypt CA private key")
+		}
 		return nil
 	}

Also applies to: 62-65

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/ca/types.go` around lines 45 - 48, The code currently
allows EncryptSensitiveData to return nil when enc is nil, letting CA private
keys be stored unencrypted; update CACertificate.EncryptSensitiveData to return
a validation error if enc is nil (e.g., "field encryptor required for CA
credentials") instead of silently doing nothing, and apply the same defensive
check to the corresponding DecryptSensitiveData (or similar method around lines
62-65) so neither encryption nor decryption paths accept a nil
*crypt.FieldEncrypt; reference the CACertificate.EncryptSensitiveData method and
the sibling decrypt method/class methods when adding the explicit error return.
management/server/ca/manager.go-155-162 (1)

155-162: ⚠️ Potential issue | 🟠 Major

Persist issued certificate and issuance log atomically.

The issuance log is currently best-effort. If log write fails after certificate persistence, rate-limit counters become inaccurate and repeated issuance can bypass intended limits.

🔒 Suggested direction
-	if err := m.store.CreateIssuedCertificate(ctx, issued); err != nil {
-		return nil, nil, fmt.Errorf("store issued certificate: %w", err)
-	}
-
-	logEntry := NewCertIssuanceLog(accountID, peerID, trigger)
-	if err := m.store.CreateCertIssuanceLog(ctx, logEntry); err != nil {
-		log.WithContext(ctx).Warnf("failed to log certificate issuance: %v", err)
-	}
+	logEntry := NewCertIssuanceLog(accountID, peerID, trigger)
+	if err := m.store.CreateIssuedCertificateWithLog(ctx, issued, logEntry); err != nil {
+		return nil, nil, fmt.Errorf("store issued certificate and issuance log: %w", err)
+	}

(Equivalent transaction-based implementation in the store is also fine.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/ca/manager.go` around lines 155 - 162, The certificate
issuance and its issuance log must be persisted atomically instead of
best-effort: replace the separate calls to m.store.CreateIssuedCertificate and
m.store.CreateCertIssuanceLog (and the creation of logEntry via
NewCertIssuanceLog) with a single transactional operation — either add/use a
store method like CreateIssuedCertificateWithLog(ctx, issued, logEntry) or run
both writes inside the store's transaction API (Begin/Commit/Rollback or
RunInTx) so that failure to write the log rolls back the certificate persist;
ensure errors are returned to the caller and that the transaction is committed
only if both created successfully so rate-limit counters remain consistent.
management/server/store/sql_store.go-5242-5248 (1)

5242-5248: ⚠️ Potential issue | 🟠 Major

Avoid in-place encryption mutation of the caller-provided CA object.

CreateCACertificate mutates caCert before persistence. On retry/error paths, the same instance can be encrypted multiple times and become unreadable.

🔧 Proposed fix
 func (s *SqlStore) CreateCACertificate(ctx context.Context, caCert *ca.CACertificate) error {
-	if err := caCert.EncryptSensitiveData(s.fieldEncrypt); err != nil {
+	if caCert == nil {
+		return status.Errorf(status.InvalidArgument, "CA certificate is nil")
+	}
+
+	caCopy := *caCert
+	if err := caCopy.EncryptSensitiveData(s.fieldEncrypt); err != nil {
 		return fmt.Errorf("encrypt CA certificate: %w", err)
 	}
 
-	result := s.db.Create(caCert)
+	result := s.db.Create(&caCopy)
 	if result.Error != nil {
 		log.WithContext(ctx).Errorf("failed to create CA certificate in store: %v", result.Error)
 		return status.Errorf(status.Internal, "failed to create CA certificate in store")
 	}
 	return nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/store/sql_store.go` around lines 5242 - 5248,
CreateCACertificate currently calls caCert.EncryptSensitiveData(s.fieldEncrypt)
which mutates the caller's object; instead make a copy/clone of the incoming
*ca.CACertificate (preserving all fields), call EncryptSensitiveData on that
copy using s.fieldEncrypt, and pass the encrypted copy to s.db.Create so the
original caCert remains unmodified for retry/error paths; reference the
CreateCACertificate function, the ca.CACertificate object, and the
EncryptSensitiveData method when locating where to implement the non-mutating
copy-before-encrypt change.
🟡 Minor comments (4)
client/cmd/cert.go-141-148 (1)

141-148: ⚠️ Potential issue | 🟡 Minor

Return an error when trust/untrust reports Success=false.

Both commands currently return nil even when the daemon reports failure, which makes CLI exit status unreliable for users/scripts.

Suggested fix
 func certTrustCAFn(cmd *cobra.Command, _ []string) error {
@@
 	if resp.Success {
 		cmd.Printf("CA certificate(s) installed into OS trust store\n")
 		for _, fp := range resp.CaFingerprints {
 			cmd.Printf("  Fingerprint: %s\n", fp)
 		}
+		return nil
 	}
-
-	return nil
+	return fmt.Errorf("trust CA failed")
 }
@@
 func certUntrustCAFn(cmd *cobra.Command, _ []string) error {
@@
 	if resp.Success {
 		cmd.Println("CA certificate(s) removed from OS trust store")
+		return nil
 	}
-
-	return nil
+	return fmt.Errorf("untrust CA failed")
 }

Also applies to: 164-168

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/cmd/cert.go` around lines 141 - 148, The CLI handlers currently always
return nil even when resp.Success is false; update the trust and untrust
handling (the blocks that inspect resp.Success and resp.CaFingerprints around
the current printing logic) to detect when resp.Success is false, print/log the
daemon-provided error detail (e.g., resp.Error or resp.Message) and return a
non-nil error (use fmt.Errorf with a clear message including the daemon error
text) so the command exits nonzero on failure; apply the same change to the
other similar block (the 164-168 counterpart) and add fmt to imports if needed.
shared/management/client/rest/ca.go-28-38 (1)

28-38: ⚠️ Potential issue | 🟡 Minor

Return nil on parse errors for pointer-returning methods.

These methods currently return &ret, err; on decode failure this returns a non-nil zero-value pointer with an error.

💡 Proposed fix (apply same pattern to `GetCA` and `RotateCA`)
 func (a *CertificateAuthorityAPI) InitCA(ctx context.Context) (*api.CACertificateResponse, error) {
 	resp, err := a.c.NewRequest(ctx, "POST", "/api/ca", nil, nil)
 	if err != nil {
 		return nil, err
 	}
 	if resp.Body != nil {
 		defer resp.Body.Close()
 	}
 	ret, err := parseResponse[api.CACertificateResponse](resp)
-	return &ret, err
+	if err != nil {
+		return nil, err
+	}
+	return &ret, nil
 }

Based on learnings: In Go codebases like netbirdio/netbird, methods returning (T, error) should propagate the error and return early; only inspect/use the value when err is nil.

Also applies to: 41-51, 67-77

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shared/management/client/rest/ca.go` around lines 28 - 38, The InitCA method
currently returns &ret, err which yields a non-nil zero-value pointer when
parseResponse fails; change the flow to check err from parseResponse and return
nil, err on failure (i.e., ret, err :=
parseResponse[api.CACertificateResponse](resp); if err != nil { return nil, err
}; return &ret, nil). Apply the same pattern to GetCA and RotateCA so
pointer-returning methods return nil on parse/decode errors and only return
&value when err is nil.
management/server/ca/internal_ca.go-271-279 (1)

271-279: ⚠️ Potential issue | 🟡 Minor

Normalize expected DNS names before map lookup.

Incoming CSR names are lowercased, but expected-name keys are built from raw peerFQDN. Mixed-case input can be rejected unexpectedly.

💡 Proposed fix
-	expectedNames := map[string]bool{peerFQDN: true}
+	expectedNames := map[string]bool{strings.ToLower(peerFQDN): true}
 	if wildcard {
-		expectedNames["*."+peerFQDN] = true
+		expectedNames["*."+strings.ToLower(peerFQDN)] = true
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/ca/internal_ca.go` around lines 271 - 279, The map keys in
expectedNames are created from peerFQDN with original casing, but the CSR names
are compared using strings.ToLower(name), causing mixed-case peerFQDN to be
rejected; update the expectedNames construction to use strings.ToLower(peerFQDN)
(and strings.ToLower("*."+peerFQDN) when wildcard is true) so the keys match the
lowercased CSR DNSNames (references: expectedNames, peerFQDN, wildcard,
csr.DNSNames, strings.ToLower).
client/internal/cert/manager.go-155-159 (1)

155-159: ⚠️ Potential issue | 🟡 Minor

FQDN change detection should not depend on SAN order.

This currently checks only the first SAN entry. If SAN order changes, rename detection can trigger incorrectly and force unnecessary renewals.

🔧 Suggested fix
 func (m *Manager) FQDNChanged(currentFQDN string) bool {
 	m.mu.RLock()
 	defer m.mu.RUnlock()

 	cert, err := m.loadCert()
 	if err != nil {
 		return false
 	}
 	if len(cert.DNSNames) == 0 {
 		return true
 	}
-	return cert.DNSNames[0] != currentFQDN
+	for _, dns := range cert.DNSNames {
+		if dns == currentFQDN {
+			return false
+		}
+	}
+	return true
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/cert/manager.go` around lines 155 - 159, The FQDN-change
check incorrectly compares only cert.DNSNames[0] to currentFQDN which is
sensitive to SAN ordering; update the logic in the function (the block
referencing cert.DNSNames and currentFQDN) to scan all entries (or build a set)
and return false if any cert.DNSNames entry equals currentFQDN, otherwise return
true—i.e., replace the single-index comparison with a loop or membership test
over cert.DNSNames.
🧹 Nitpick comments (13)
client/internal/connect.go (1)

481-485: Propagate cert manager to an already-running engine in setter.

Right now SetCertManager updates only ConnectClient state. If called after engine creation, engine keeps stale manager.

♻️ Suggested update
 func (c *ConnectClient) SetCertManager(m *cert.Manager) {
 	c.engineMutex.Lock()
 	c.certManager = m
+	engine := c.engine
 	c.engineMutex.Unlock()
+
+	if engine != nil {
+		engine.SetCertManager(m)
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/connect.go` around lines 481 - 485, SetCertManager currently
only updates ConnectClient.certManager and leaves an already-running engine with
a stale manager; modify SetCertManager to atomically set c.certManager under
c.engineMutex and, if c.engine != nil, propagate the new manager to the running
engine (e.g., call a corresponding method like c.engine.SetCertManager(m) or
c.engine.UpdateCertManager(m)); ensure you perform a nil-check on c.engine and
do this while holding the same c.engineMutex to avoid races.
management/internals/server/modules.go (1)

202-203: Prefer direct store dependency in CAManager() construction.

Using s.AccountManager().GetStore() couples CA initialization to account-manager initialization. s.Store() keeps this module boundary cleaner.

♻️ Suggested simplification
 func (s *BaseServer) CAManager() *ca.Manager {
 	return Create(s, func() *ca.Manager {
-		mgr := ca.NewManager(s.AccountManager().GetStore())
+		mgr := ca.NewManager(s.Store())
 		mgr.RegisterSigner(ca.NewACMEPersistSigner())
 		return mgr
 	})
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/internals/server/modules.go` around lines 202 - 203, The CA
manager is being constructed with s.AccountManager().GetStore(), coupling CA
init to the account manager; change the dependency to use the service store
directly by calling ca.NewManager with s.Store() when creating mgr (the manager
variable created by ca.NewManager) and keep the existing
mgr.RegisterSigner(ca.NewACMEPersistSigner()) call unchanged so the CA module
depends only on s.Store() rather than AccountManager.
shared/management/client/rest/ca_test.go (1)

315-318: Make rotation-state semantics explicit in assertions.

At Line 315 the test only checks count after rotation. Add assertions for active/inactive flags by CA ID so intended rotation behavior is locked down and regression-resistant.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shared/management/client/rest/ca_test.go` around lines 315 - 318, After the
ListCAs call in the test (c.CertificateAuthority.ListCAs and the local variable
cas), add explicit assertions that verify each returned CA's rotation state
(e.g., active/inactive flag or RotationState field) by CA ID so the rotation
outcome is unambiguous; locate the CA entries in cas by their ID values and
assert the expected active=true for the new/active CA and active=false (or
RotationState=="inactive") for the rotated-out CA to lock the intended behavior
into the test.
management/server/peer_test.go (1)

1183-1183: Add a non-nil caCertsPEM assertion path in TestToSyncResponse.

Line 1183 verifies only the nil path. A second case with PEM entries would validate end-to-end CA cert propagation in sync responses.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/peer_test.go` at line 1183, Add a second assertion path in
TestToSyncResponse that passes a non-nil slice of PEM strings to ToSyncResponse
(replace the current nil caCertsPEM argument used in the call to
grpc.ToSyncResponse) and assert the returned response contains those PEM entries
(e.g., compare to response.CACertsPEM or the struct field that holds CA certs).
Specifically, duplicate the existing call in TestToSyncResponse but provide a
representative []string{"-----BEGIN CERTIFICATE-----..."} for caCertsPEM, call
grpc.ToSyncResponse, and add assertions that the response's CA cert field equals
the provided PEM slice and is correctly non-nil.
management/server/account_test.go (1)

404-405: Add one non-nil wildcard-peers case for this path.

Line 404 validates only the compatibility call path. Consider adding a case with populated wildcardPeers to exercise the new wildcard DNS behavior directly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/account_test.go` around lines 404 - 405, Add a test case
that supplies a non-nil wildcardPeers value when calling GetPeerNetworkMap to
exercise the new wildcard DNS behavior: create a testCase that populates
wildcardPeers (e.g., a map/slice with at least one wildcard entry), call
GetPeersCustomZone(...) as before, then call GetPeerNetworkMap(..., customZone,
wildcardPeers, validatedPeers, ...) instead of nil, and add assertions that
verify the wildcard DNS behavior (expected entries/addresses) for the returned
network map; update the table of test cases in account_test.go so this new
scenario runs alongside the existing compatibility-only case.
client/proto/daemon.proto (1)

849-856: Adopt UNSPECIFIED = 0 enum default for safer proto3 schema evolution.

Proto3 best practice reserves the 0 enum value as a non-semantic placeholder. This makes it explicit when a field is truly unspecified versus set to a meaningful value, improving API safety across language bindings and future versions. Renumber INTERNAL to 1 and ACME to 2, then validate that incoming requests reject the unspecified value on the server.

Suggested fix
 enum DaemonCertSigningType {
-  DAEMON_CERT_SIGNING_INTERNAL = 0;
-  DAEMON_CERT_SIGNING_ACME = 1;
+  DAEMON_CERT_SIGNING_UNSPECIFIED = 0;
+  DAEMON_CERT_SIGNING_INTERNAL = 1;
+  DAEMON_CERT_SIGNING_ACME = 2;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/proto/daemon.proto` around lines 849 - 856, Update the
DaemonCertSigningType enum so the zero value is an explicit placeholder: add
UNSPECIFIED = 0, renumber DAEMON_CERT_SIGNING_INTERNAL to 1 and
DAEMON_CERT_SIGNING_ACME to 2, and then update server-side validation for
CertificateRequest.signing_type to reject UNSPECIFIED (zero) as an
invalid/unspecified value before processing.
shared/management/http/api/openapi.yml (2)

341-343: Add explicit default/example for cert_wildcard_allowed.

cert_wildcard_allowed is documented but has no explicit default/example, which makes client behavior less clear when the field is omitted.

Suggested OpenAPI tweak
         cert_wildcard_allowed:
           description: Allows peers to request wildcard subdomain certificates
           type: boolean
+          default: false
+          example: false
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shared/management/http/api/openapi.yml` around lines 341 - 343, The OpenAPI
property cert_wildcard_allowed lacks an explicit default/example; update the
cert_wildcard_allowed schema entry to include a clear default and/or example
(e.g., default: false and example: false) so clients know the implicit behavior
when the field is omitted; modify the cert_wildcard_allowed property in the
OpenAPI definition to add these keys alongside description and type.

9992-10192: Model lifecycle conflict responses explicitly for CA endpoints.

/api/ca, /api/ca/{caId}, and /api/ca/rotate would benefit from explicit conflict/precondition response codes (e.g., already initialized, no active CA, invalid deactivation state) so clients can handle stateful flows deterministically.

Example response additions
   /api/ca:
     post:
@@
       responses:
         '200':
           description: A JSON Object of the created CA Certificate
@@
+        '409':
+          description: CA already initialized for this account
+          content: { }
   /api/ca/rotate:
     post:
@@
       responses:
         '200':
           description: A JSON Object of the new CA Certificate
@@
+        '412':
+          description: No active CA available to rotate
+          content: { }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shared/management/http/api/openapi.yml` around lines 9992 - 10192, The
OpenAPI spec for the CA endpoints (/api/ca GET/POST, /api/ca/{caId} GET/DELETE,
and /api/ca/rotate POST) is missing explicit 409/412 conflict or precondition
responses to indicate stateful errors (e.g., CA already initialized, no active
CA to rotate/deactivate, invalid deactivation state); update each operation to
add appropriate response entries (409 Conflict and/or 412 Precondition Failed)
with descriptive descriptions and referenceable response components (e.g.,
create or reuse components/responses/ca_conflict and
components/responses/ca_precondition_failed) so clients can deterministically
handle lifecycle errors, and ensure the new responses are wired into the POST
/api/ca, POST /api/ca/rotate, and DELETE /api/ca/{caId} operations.
management/server/ca/types.go (1)

95-103: Clone DNS name input to avoid external slice mutation.

dnsNames is stored by reference. A caller mutating the original slice can mutate the certificate record unexpectedly.

🔧 Suggested refactor
 import (
 	"fmt"
+	"slices"
 	"time"
@@
 	return &IssuedCertificate{
 		ID:           xid.New().String(),
 		AccountID:    accountID,
 		PeerID:       peerID,
 		SerialNumber: serialNumber,
-		DNSNames:     dnsNames,
+		DNSNames:     slices.Clone(dnsNames),
 		HasWildcard:  hasWildcard,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/ca/types.go` around lines 95 - 103, NewIssuedCertificate
stores the incoming dnsNames slice by reference causing external mutations to
affect the IssuedCertificate; fix by copying/cloning the slice before assigning
to the struct (e.g. allocate a new []string and copy or use append to create a
new slice) inside NewIssuedCertificate so the IssuedCertificate.DNSNames is
independent of the caller's slice.
management/internals/shared/grpc/cert_service.go (2)

180-186: Return an explicit error for unknown signing types.

Defaulting unknown values to internal signing makes forward compatibility and validation harder to reason about.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/internals/shared/grpc/cert_service.go` around lines 180 - 186, The
function certSigningTypeToString currently defaults unknown
proto.CertSigningType values to ca.SigningTypeInternal; change it to return an
explicit error for unrecognized types by updating its signature to return
(string, error) and mapping known cases (CERT_SIGNING_ACME ->
ca.SigningTypeACME) to return the string with nil error, while returning a
non-nil error (e.g., fmt.Errorf("unknown CertSigningType: %d", t)) for any
default/unknown branch; update all callers of certSigningTypeToString to handle
the error (propagate or return a validation error) so unknown/forward-compatible
enum values are rejected instead of silently treated as internal.

56-59: CSR signature verification already occurs downstream in the CA signing layer.

The signature is validated in InternalCASigner.Sign() (management/server/ca/internal_ca.go) before the certificate is issued. However, moving the check to the entry point in this handler would be a reasonable defense-in-depth improvement for earlier validation and clearer error attribution.

Suggested enhancement (not critical)
 	csr, err := x509.ParseCertificateRequest(signReq.CsrDer)
 	if err != nil {
 		return nil, status.Errorf(codes.InvalidArgument, "invalid CSR: %v", err)
 	}
+	if err := csr.CheckSignature(); err != nil {
+		return nil, status.Errorf(codes.InvalidArgument, "invalid CSR signature: %v", err)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/internals/shared/grpc/cert_service.go` around lines 56 - 59, The
parsed CSR from x509.ParseCertificateRequest(signReq.CsrDer) should have its
signature verified immediately for defense-in-depth: after obtaining csr (from
ParseCertificateRequest) call csr.CheckSignature() and if it returns an error
return status.Errorf(codes.InvalidArgument, "invalid CSR signature: %v", err);
keep the existing downstream verification in InternalCASigner.Sign() intact.
This uses the parsed CSR variable (csr) in cert_service.go and does not change
the CA signing logic in InternalCASigner.Sign().
client/server/cert.go (1)

300-306: Avoid implicit fallback to internal signing for unknown enum values.

Defaulting unknown values to internal can silently change behavior for future/invalid client inputs. Prefer returning an explicit error.

🔧 Suggested refactor
-func daemonSigningTypeToMgmt(t proto.DaemonCertSigningType) mgmProto.CertSigningType {
+func daemonSigningTypeToMgmt(t proto.DaemonCertSigningType) (mgmProto.CertSigningType, error) {
 	switch t {
+	case proto.DaemonCertSigningType_DAEMON_CERT_SIGNING_INTERNAL:
+		return mgmProto.CertSigningType_CERT_SIGNING_INTERNAL, nil
 	case proto.DaemonCertSigningType_DAEMON_CERT_SIGNING_ACME:
-		return mgmProto.CertSigningType_CERT_SIGNING_ACME
+		return mgmProto.CertSigningType_CERT_SIGNING_ACME, nil
 	default:
-		return mgmProto.CertSigningType_CERT_SIGNING_INTERNAL
+		return 0, fmt.Errorf("unsupported signing type: %v", t)
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/server/cert.go` around lines 300 - 306, The function
daemonSigningTypeToMgmt currently falls back to
mgmProto.CertSigningType_CERT_SIGNING_INTERNAL for unknown
proto.DaemonCertSigningType values; change the function to return
(mgmProto.CertSigningType, error) instead of a bare mgmProto.CertSigningType,
map known cases (e.g., DAEMON_CERT_SIGNING_ACME -> CERT_SIGNING_ACME) and return
a descriptive error (e.g., fmt.Errorf("unsupported daemon signing type: %v", t))
for the default branch, and update all callers to handle the error and
propagate/fail appropriately; reference the function name
daemonSigningTypeToMgmt and the types proto.DaemonCertSigningType /
mgmProto.CertSigningType when making the changes.
management/server/store/sql_store.go (1)

5388-5390: Use UTC consistently in certificate time filters.

Using time.Now().UTC() here keeps wildcard-active checks aligned with the UTC timestamps used elsewhere in certificate lifecycle logic.

🕒 Proposed tweak
 	result := s.db.Model(&ca.IssuedCertificate{}).
-		Where("account_id = ? AND has_wildcard = ? AND revoked = ? AND not_after > ?", accountID, true, false, time.Now()).
+		Where("account_id = ? AND has_wildcard = ? AND revoked = ? AND not_after > ?", accountID, true, false, time.Now().UTC()).
 		Distinct().Pluck("peer_id", &peerIDs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/store/sql_store.go` around lines 5388 - 5390, The WHERE
clause filtering issued certificates uses time.Now() which can yield local time
and mismatch UTC-based certificate timestamps; update the time comparison in the
query that builds on s.db.Model(&ca.IssuedCertificate{}) (the Where(...) that
checks "not_after > ?") to pass time.Now().UTC() so the
has_wildcard/revoked/not_after checks for peerIDs use UTC consistently with the
rest of certificate lifecycle logic.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 05b66e7 and 1623875.

⛔ Files ignored due to path filters (5)
  • client/proto/daemon.pb.go is excluded by !**/*.pb.go
  • client/proto/daemon_grpc.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/management.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/management_grpc.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/proxy_service.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (49)
  • client/cmd/cert.go
  • client/cmd/root.go
  • client/internal/cert/manager.go
  • client/internal/cert/manager_test.go
  • client/internal/cert/trust_darwin.go
  • client/internal/cert/trust_linux.go
  • client/internal/cert/trust_windows.go
  • client/internal/connect.go
  • client/internal/engine.go
  • client/proto/daemon.proto
  • client/server/cert.go
  • client/server/server.go
  • management/internals/controllers/network_map/controller/controller.go
  • management/internals/controllers/network_map/controller/repository.go
  • management/internals/server/boot.go
  • management/internals/server/modules.go
  • management/internals/shared/grpc/cert_service.go
  • management/internals/shared/grpc/conversion.go
  • management/internals/shared/grpc/server.go
  • management/server/account_test.go
  • management/server/activity/codes.go
  • management/server/ca/acme_stub.go
  • management/server/ca/internal_ca.go
  • management/server/ca/internal_ca_test.go
  • management/server/ca/manager.go
  • management/server/ca/manager_test.go
  • management/server/ca/signer.go
  • management/server/ca/types.go
  • management/server/http/handler.go
  • management/server/http/handlers/accounts/accounts_handler.go
  • management/server/http/handlers/ca/ca_handler.go
  • management/server/http/handlers/ca/ca_handler_test.go
  • management/server/http/testing/testing_tools/channel/channel.go
  • management/server/peer_test.go
  • management/server/permissions/modules/module.go
  • management/server/store/sql_store.go
  • management/server/store/store.go
  • management/server/types/account.go
  • management/server/types/settings.go
  • management/server/types/testdata/networkmap_golden_new_with_onpeeradded_router.json
  • shared/management/client/cert.go
  • shared/management/client/client.go
  • shared/management/client/mock.go
  • shared/management/client/rest/ca.go
  • shared/management/client/rest/ca_test.go
  • shared/management/client/rest/client.go
  • shared/management/http/api/openapi.yml
  • shared/management/http/api/types.gen.go
  • shared/management/proto/management.proto

Comment thread management/internals/shared/grpc/cert_service.go Outdated
Comment thread management/server/ca/internal_ca.go Outdated
Comment thread management/server/ca/manager.go Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

♻️ Duplicate comments (1)
management/server/ca/manager.go (1)

107-111: ⚠️ Potential issue | 🟠 Major

Validate first SAN content, not just SAN count, before wildcard derivation.

len(csr.DNSNames) > 0 still allows csr.DNSNames[0] == "", which can produce invalid names (for example *.) and inconsistent issuance metadata. This was partially addressed, but non-empty SAN validation is still missing.

🐛 Proposed fix
 import (
 	"context"
 	"crypto/x509"
 	"fmt"
+	"strings"
 	"time"
@@
-	if len(csr.DNSNames) == 0 {
+	if len(csr.DNSNames) == 0 {
 		return nil, nil, fmt.Errorf("csr must include at least one DNS SAN")
 	}
-	peerFQDN := csr.DNSNames[0]
+	peerFQDN := strings.TrimSpace(csr.DNSNames[0])
+	if peerFQDN == "" {
+		return nil, nil, fmt.Errorf("csr first DNS SAN must be non-empty")
+	}
@@
-	dnsNames := csr.DNSNames
-	if wildcard && !containsName(dnsNames, "*."+peerFQDN) {
-		dnsNames = append(dnsNames, "*."+peerFQDN)
+	dnsNames := append([]string(nil), csr.DNSNames...)
+	if wildcard {
+		wildcardName := "*." + strings.TrimPrefix(peerFQDN, "*.")
+		if !containsName(dnsNames, wildcardName) {
+			dnsNames = append(dnsNames, wildcardName)
+		}
 	}

Also applies to: 149-152

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/ca/manager.go` around lines 107 - 111, The code checks
len(csr.DNSNames) > 0 but doesn't validate the actual content, so ensure the
first SAN (csr.DNSNames[0]) is non-empty and not just whitespace before using it
(peerFQDN) or deriving wildcards; if it's empty, return an error like "csr must
include at least one non-empty DNS SAN". Apply the same validation to the other
occurrence that derives a peerFQDN/wildcard (the block around the second check
at the later section) so both places validate csr.DNSNames[0] content rather
than only count.
🧹 Nitpick comments (3)
management/server/http/handlers/ca/ca_handler_test.go (1)

167-175: Assert exact permission operation in tests.

Line 168 and Line 173 use gomock.Any() for operation, so these tests won't catch a handler accidentally checking the wrong operation (Read/Create/Delete) for a route.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/http/handlers/ca/ca_handler_test.go` around lines 167 -
175, The test currently stubs permissionsManagerMock.ValidateUserPermissions
with gomock.Any() for the operation argument, which hides incorrect operation
checks; update both expectations (the true and false branches) to assert the
exact permission operation constant the handler should require (replace the
operation gomock.Any() with the specific modules.* operation constant the route
uses, e.g. modules.OperationRead / modules.OperationCreate /
modules.OperationDelete) while keeping the same account/user args
(testAccountID, testUserID) so the test fails if the handler checks the wrong
operation.
shared/management/http/api/openapi.yml (1)

341-343: Keep cert_wildcard_allowed required in AccountSettings for schema consistency.

This new settings field is modeled as optional, while nearby account settings flags are modeled as required. Consider keeping it required to preserve a stable object shape for SDK/client generation.

Proposed change
       required:
         - peer_login_expiration_enabled
         - peer_login_expiration
         - peer_inactivity_expiration_enabled
         - peer_inactivity_expiration
         - regular_users_view_blocked
         - peer_expose_enabled
         - peer_expose_groups
+        - cert_wildcard_allowed

Based on learnings: Team preference is to keep new fields required for consistency, and avoid per-field requiredness drift without a global API design update.

Also applies to: 365-373

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shared/management/http/api/openapi.yml` around lines 341 - 343, The
AccountSettings schema currently defines cert_wildcard_allowed as an optional
boolean; make it required to keep the object shape stable by adding
"cert_wildcard_allowed" into the AccountSettings required array (the same place
other flags are listed), and apply the same change for the other newly-added
flags in that same settings block (the fields referenced around lines 365-373)
so all account flag fields remain consistently required for SDK/client
generation.
management/server/ca/manager.go (1)

97-105: Unify ACME and internal post-sign persistence/audit flow.

The ACME branch returns before CreateIssuedCertificate and CreateCertIssuanceLog. Once ACME signing is enabled, issued-certificate listing/revocation/rate accounting won’t have complete data.

Also applies to: 159-166

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/ca/manager.go` around lines 97 - 105, The ACME branch in
manager.go currently returns immediately after calling signer.Sign for
SigningTypeACME, skipping the post-sign persistence and audit flow; modify the
SigningTypeACME branch so it captures the signer.Sign result and error (using
m.signers and signer.Sign), then invoke CreateIssuedCertificate and
CreateCertIssuanceLog exactly as the internal branch does (handling and
propagating any persistence/logging errors), and finally return the same tuple
(issued certificate, issuance log, error) without an early return; ensure error
handling preserves the original sign error when appropriate and that both
CreateIssuedCertificate and CreateCertIssuanceLog are always attempted when sign
succeeds.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/internal/cert/manager.go`:
- Around line 75-83: StoreCert currently writes cert, chain, and key directly
into m.certDir via atomicWrite which can leave a mixed/partial state on failure;
change StoreCert so it stages the three files into a temporary directory (or
temp filenames) first (e.g., create tempDir := m.certDir + ".tmp-<uniq>" and
write certPEM/chainPEM/keyPEM to tempDir using atomicWrite), then perform a
single commit step that atomically moves the staged files into place (use
os.Rename to move tempDir/<certFileName|chainFileName|keyFileName> to
m.certDir/<...> or rename temp files over the targets), and on any error perform
rollback by deleting the temp files/dir and restoring any backed-up originals;
reference the existing symbols StoreCert, atomicWrite, m.certDir, certFileName,
chainFileName, keyFileName when implementing staging + commit/rollback so writes
become all-or-nothing.
- Around line 51-55: In Manager.CreateCSR, validate the fqdn parameter before
building dnsNames: if fqdn is empty or only whitespace, return a clear error
immediately instead of constructing SAN/CN with an empty string; implement the
check at the top of CreateCSR (use strings.TrimSpace on fqdn) and return a
descriptive error (e.g., "fqdn must not be empty") so downstream CSR creation
never receives invalid SAN/CN values.

In `@client/internal/cert/trust_linux.go`:
- Line 39: The filename uses a truncated fingerprint slice fp[:16] when building
certPath (e.g., certPath := fmt.Sprintf("%s/netbird-%s.crt", certDir, fp[:16]))
which risks collisions; update all occurrences (the certPath construction lines
at the three instances) to use the full SHA-256 hex fingerprint (fp) instead of
fp[:16] so filenames are deterministic and unique; locate the usages in
trust_linux.go where certPath is formed and replace the truncated slice with the
full fp string while preserving the existing fmt.Sprintf pattern and surrounding
logic.
- Around line 44-47: Replace the unbounded exec.Command(...).CombinedOutput()
calls for updateCmd with a context-aware execution to enforce a timeout: create
a context with timeout (e.g., 30s) via context.WithTimeout, use
exec.CommandContext(ctx, updateCmd, ...) to run the command, call
cmd.CombinedOutput() (or cmd.Run()/Output() as appropriate), and if the context
deadline is exceeded return a wrapped timeout error that includes the command
output; apply the same change to both occurrences of
exec.Command(updateCmd).CombinedOutput() (the updateCmd invocations in
trust_linux.go) and import context and time as needed.

In `@client/internal/engine.go`:
- Around line 2159-2163: The renewal code currently hardcodes
mgmProto.CertSigningType_CERT_SIGNING_INTERNAL when calling
e.mgmClient.SignCertificate, which can downgrade ACME-issued certs; change it to
determine the signing type dynamically (e.g., read the existing certificate's
signing mode or a persisted policy) and pass that variable instead of
CERT_SIGNING_INTERNAL, falling back to INTERNAL only if no existing/explicit
policy is available; update references around signCtx,
e.mgmClient.SignCertificate, and isWildcard to use the computed signingType so
renewal preserves ACME-issued certificates.
- Around line 1039-1044: Remove the length check so CA bundle reconciliation
always runs: call e.certManager.StoreCA(caCerts) unconditionally after
retrieving caCerts from conf.GetCaCertificatesPem(), guarded only by
e.certManager != nil, so that StoreCA is invoked even when caCerts is empty
(allowing StoreCA to clear stale CAs); update the block around e.certManager and
StoreCA accordingly and keep the existing error logging (log.Warnf("failed to
store CA certificates: %v", err)) if StoreCA returns an error.

In `@management/server/ca/internal_ca.go`:
- Around line 307-314: The generateSerialNumber function can return 0 from
rand.Int which yields an invalid certificate serial; update generateSerialNumber
to loop (retry) calling rand.Int(rand.Reader, max) until the returned *big.Int
has Sign() > 0 (i.e., is non-zero/positive) and then return that value (handle
and propagate any rand.Int error on each attempt); this ensures every generated
serial in generateSerialNumber is strictly positive.
- Around line 86-88: The Sign method on InternalCASigner can panic when csr is
nil; add an explicit nil check at the start of InternalCASigner.Sign (before
calling csr.CheckSignature) that returns a clear error (e.g., fmt.Errorf("nil
CSR provided")) so the function safely fails instead of dereferencing a nil CSR
in csr.CheckSignature; update any tests/callers if they rely on a different
error string.
- Around line 61-80: NewInternalCASigner currently constructs InternalCASigner
with caCert and caKey without verifying they match; add a validation after
parseECPrivateKeyPEM and parseCertificatePEM that compares the certificate's
public key to the private key's derived public key (for ECDSA, compare the X and
Y coordinates) and return an error like "CA certificate and private key do not
match" if they differ; update the constructor flow around parseCertificatePEM,
parseECPrivateKeyPEM, and the returned InternalCASigner to perform this check
before creating the struct (reference NewInternalCASigner, parseCertificatePEM,
parseECPrivateKeyPEM, and InternalCASigner fields caCert/caKey).

In `@management/server/ca/manager.go`:
- Around line 197-200: Update the comment on the CheckRateLimit method to
reflect both exemptions: change the current single-line comment that says
"domain_change triggers are exempt from rate limiting." to mention that both
domain_change and session_renewal triggers are exempt, referencing the
TriggerDomainChange and TriggerSessionRenewal symbols so the comment matches the
behavior in the CheckRateLimit function.

In `@management/server/http/handlers/ca/ca_handler_test.go`:
- Around line 433-437: Tests currently call POST
"/api/ca/certificates/{serialNumber}/revoke" but the real API is DELETE
"/api/ca/certificates/{serialNumber}"; update the tests to use http.MethodDelete
and the path "/api/ca/certificates/1234" (or the parameterized form) when
creating requests with newAuthenticatedRequest, and ensure the router
registration (router.HandleFunc(..., h.revokeCert).Methods("DELETE")) and any
other test instance (the second occurrence around line 458) use the DELETE
method and the canonical path so the test exercises the production route shape.

In `@management/server/store/sql_store.go`:
- Around line 5243-5245: The CreateCACertificate method dereferences caCert
immediately; add a nil guard at the top of CreateCACertificate that returns a
descriptive error if caCert == nil before copying or calling
caCopy.EncryptSensitiveData, then proceed as before; apply the same defensive
nil-check pattern to the other create-* methods in this file that follow the
same pattern so inputs are validated before dereference.

In `@shared/management/client/rest/ca.go`:
- Around line 106-108: The RevokeCertificate implementation is calling the wrong
HTTP verb/path; update CertificateAuthorityAPI.RevokeCertificate to use a DELETE
request to the server endpoint and adjust the URL to
"/api/ca/certificates/"+serialNumber (instead of POST
"/api/ca/certificates/{serialNumber}/revoke") by invoking a.c.NewRequest with
method "DELETE" and the corrected path so it matches the server wiring.

In `@shared/management/http/api/openapi.yml`:
- Around line 10023-10050: The OpenAPI spec for the "Initialize CA" POST
(summary: "Initialize CA") omits the 422 response even though the implementation
returns 422 for validation/invalid JSON; add a '422' response under responses
pointing to a shared validation/unprocessable response (e.g. "$ref":
"#/components/responses/unprocessable_entity" or create one) so clients see
validation errors, and apply the same change to the CA rotate endpoint (summary:
"Rotate CA") that uses CAInitRequest/CACertificateResponse and the same security
entries (BearerAuth, TokenAuth). Ensure the new response is referenced where
CAInitRequest and CACertificateResponse are used.

---

Duplicate comments:
In `@management/server/ca/manager.go`:
- Around line 107-111: The code checks len(csr.DNSNames) > 0 but doesn't
validate the actual content, so ensure the first SAN (csr.DNSNames[0]) is
non-empty and not just whitespace before using it (peerFQDN) or deriving
wildcards; if it's empty, return an error like "csr must include at least one
non-empty DNS SAN". Apply the same validation to the other occurrence that
derives a peerFQDN/wildcard (the block around the second check at the later
section) so both places validate csr.DNSNames[0] content rather than only count.

---

Nitpick comments:
In `@management/server/ca/manager.go`:
- Around line 97-105: The ACME branch in manager.go currently returns
immediately after calling signer.Sign for SigningTypeACME, skipping the
post-sign persistence and audit flow; modify the SigningTypeACME branch so it
captures the signer.Sign result and error (using m.signers and signer.Sign),
then invoke CreateIssuedCertificate and CreateCertIssuanceLog exactly as the
internal branch does (handling and propagating any persistence/logging errors),
and finally return the same tuple (issued certificate, issuance log, error)
without an early return; ensure error handling preserves the original sign error
when appropriate and that both CreateIssuedCertificate and CreateCertIssuanceLog
are always attempted when sign succeeds.

In `@management/server/http/handlers/ca/ca_handler_test.go`:
- Around line 167-175: The test currently stubs
permissionsManagerMock.ValidateUserPermissions with gomock.Any() for the
operation argument, which hides incorrect operation checks; update both
expectations (the true and false branches) to assert the exact permission
operation constant the handler should require (replace the operation
gomock.Any() with the specific modules.* operation constant the route uses, e.g.
modules.OperationRead / modules.OperationCreate / modules.OperationDelete) while
keeping the same account/user args (testAccountID, testUserID) so the test fails
if the handler checks the wrong operation.

In `@shared/management/http/api/openapi.yml`:
- Around line 341-343: The AccountSettings schema currently defines
cert_wildcard_allowed as an optional boolean; make it required to keep the
object shape stable by adding "cert_wildcard_allowed" into the AccountSettings
required array (the same place other flags are listed), and apply the same
change for the other newly-added flags in that same settings block (the fields
referenced around lines 365-373) so all account flag fields remain consistently
required for SDK/client generation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0bbffefb-1c19-44ae-b457-bd68cee1d34c

📥 Commits

Reviewing files that changed from the base of the PR and between 1623875 and b7562ac.

📒 Files selected for processing (18)
  • client/cmd/cert.go
  • client/internal/cert/manager.go
  • client/internal/cert/trust_linux.go
  • client/internal/connect.go
  • client/internal/engine.go
  • client/server/cert.go
  • client/server/server.go
  • management/internals/shared/grpc/cert_service.go
  • management/server/account.go
  • management/server/ca/internal_ca.go
  • management/server/ca/manager.go
  • management/server/http/handlers/accounts/accounts_handler_test.go
  • management/server/http/handlers/ca/ca_handler.go
  • management/server/http/handlers/ca/ca_handler_test.go
  • management/server/store/sql_store.go
  • shared/management/client/rest/ca.go
  • shared/management/http/api/openapi.yml
  • shared/management/http/api/types.gen.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • management/server/http/handlers/ca/ca_handler.go
  • client/server/server.go
  • client/cmd/cert.go
  • client/internal/connect.go

Comment on lines +51 to +55
func (m *Manager) CreateCSR(key crypto.Signer, fqdn string, wildcard bool) ([]byte, error) {
dnsNames := []string{fqdn}
if wildcard {
dnsNames = append(dnsNames, "*."+fqdn)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Validate empty FQDN before CSR creation.

On Line 52, an empty fqdn creates invalid SAN/CN content and pushes the error downstream. Fail fast with a clear client-side validation error.

🔧 Suggested fix
 func (m *Manager) CreateCSR(key crypto.Signer, fqdn string, wildcard bool) ([]byte, error) {
+	if fqdn == "" {
+		return nil, fmt.Errorf("fqdn is required")
+	}
 	dnsNames := []string{fqdn}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (m *Manager) CreateCSR(key crypto.Signer, fqdn string, wildcard bool) ([]byte, error) {
dnsNames := []string{fqdn}
if wildcard {
dnsNames = append(dnsNames, "*."+fqdn)
}
func (m *Manager) CreateCSR(key crypto.Signer, fqdn string, wildcard bool) ([]byte, error) {
if fqdn == "" {
return nil, fmt.Errorf("fqdn is required")
}
dnsNames := []string{fqdn}
if wildcard {
dnsNames = append(dnsNames, "*."+fqdn)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/cert/manager.go` around lines 51 - 55, In Manager.CreateCSR,
validate the fqdn parameter before building dnsNames: if fqdn is empty or only
whitespace, return a clear error immediately instead of constructing SAN/CN with
an empty string; implement the check at the top of CreateCSR (use
strings.TrimSpace on fqdn) and return a descriptive error (e.g., "fqdn must not
be empty") so downstream CSR creation never receives invalid SAN/CN values.

Comment thread client/internal/cert/manager.go Outdated
Comment on lines +75 to +83
if err := atomicWrite(filepath.Join(m.certDir, certFileName), certPEM, 0644); err != nil {
return fmt.Errorf("write cert: %w", err)
}
if err := atomicWrite(filepath.Join(m.certDir, chainFileName), chainPEM, 0644); err != nil {
return fmt.Errorf("write chain: %w", err)
}
if err := atomicWrite(filepath.Join(m.certDir, keyFileName), keyPEM, 0600); err != nil {
return fmt.Errorf("write key: %w", err)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

StoreCert can leave inconsistent certificate state on partial failure.

If Line 75 succeeds and Line 81 fails, disk state may contain a new cert/chain with an old key (or vice versa), breaking TLS load until repaired. Please make bundle writes all-or-nothing (staging + commit/rollback).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/cert/manager.go` around lines 75 - 83, StoreCert currently
writes cert, chain, and key directly into m.certDir via atomicWrite which can
leave a mixed/partial state on failure; change StoreCert so it stages the three
files into a temporary directory (or temp filenames) first (e.g., create tempDir
:= m.certDir + ".tmp-<uniq>" and write certPEM/chainPEM/keyPEM to tempDir using
atomicWrite), then perform a single commit step that atomically moves the staged
files into place (use os.Rename to move
tempDir/<certFileName|chainFileName|keyFileName> to m.certDir/<...> or rename
temp files over the targets), and on any error perform rollback by deleting the
temp files/dir and restoring any backed-up originals; reference the existing
symbols StoreCert, atomicWrite, m.certDir, certFileName, chainFileName,
keyFileName when implementing staging + commit/rollback so writes become
all-or-nothing.

Comment thread client/internal/cert/trust_linux.go Outdated
Comment thread client/internal/cert/trust_linux.go Outdated
Comment thread client/internal/engine.go Outdated
Comment thread management/server/ca/manager.go Outdated
Comment on lines +433 to +437
req := newAuthenticatedRequest(t, http.MethodPost, "/api/ca/certificates/1234/revoke")

router := mux.NewRouter()
router.HandleFunc("/api/ca/certificates/{serialNumber}/revoke", h.revokeCert).Methods("POST")
router.ServeHTTP(recorder, req)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Align revoke tests with the actual API contract.

Line 433/436 and Line 458 validate POST /api/ca/certificates/{serialNumber}/revoke, but the registered endpoint is DELETE /api/ca/certificates/{serialNumber}. These tests currently pass against a non-production route shape.

🔧 Proposed fix
-	req := newAuthenticatedRequest(t, http.MethodPost, "/api/ca/certificates/1234/revoke")
+	req := newAuthenticatedRequest(t, http.MethodDelete, "/api/ca/certificates/1234")

 	router := mux.NewRouter()
-	router.HandleFunc("/api/ca/certificates/{serialNumber}/revoke", h.revokeCert).Methods("POST")
+	router.HandleFunc("/api/ca/certificates/{serialNumber}", h.revokeCert).Methods("DELETE")
 	router.ServeHTTP(recorder, req)
-		{"POST", "/api/ca/certificates/1234/revoke", "/api/ca/certificates/{serialNumber}/revoke", h.revokeCert},
+		{"DELETE", "/api/ca/certificates/1234", "/api/ca/certificates/{serialNumber}", h.revokeCert},

Also applies to: 458-458

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/http/handlers/ca/ca_handler_test.go` around lines 433 -
437, Tests currently call POST "/api/ca/certificates/{serialNumber}/revoke" but
the real API is DELETE "/api/ca/certificates/{serialNumber}"; update the tests
to use http.MethodDelete and the path "/api/ca/certificates/1234" (or the
parameterized form) when creating requests with newAuthenticatedRequest, and
ensure the router registration (router.HandleFunc(...,
h.revokeCert).Methods("DELETE")) and any other test instance (the second
occurrence around line 458) use the DELETE method and the canonical path so the
test exercises the production route shape.

Comment thread management/server/store/sql_store.go
Comment on lines +106 to +108
func (a *CertificateAuthorityAPI) RevokeCertificate(ctx context.Context, serialNumber string) error {
resp, err := a.c.NewRequest(ctx, "POST", "/api/ca/certificates/"+serialNumber+"/revoke", nil, nil)
if err != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix revoke request method/path to match server endpoint.

Line 107 calls POST /api/ca/certificates/{serialNumber}/revoke, but server wiring exposes DELETE /api/ca/certificates/{serialNumber}. This call will fail against the registered API.

🔧 Proposed fix
 import (
 	"context"
+	"net/url"

 	"github.com/netbirdio/netbird/shared/management/http/api"
 )
@@
 func (a *CertificateAuthorityAPI) RevokeCertificate(ctx context.Context, serialNumber string) error {
-	resp, err := a.c.NewRequest(ctx, "POST", "/api/ca/certificates/"+serialNumber+"/revoke", nil, nil)
+	resp, err := a.c.NewRequest(ctx, "DELETE", "/api/ca/certificates/"+url.PathEscape(serialNumber), nil, nil)
 	if err != nil {
 		return err
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (a *CertificateAuthorityAPI) RevokeCertificate(ctx context.Context, serialNumber string) error {
resp, err := a.c.NewRequest(ctx, "POST", "/api/ca/certificates/"+serialNumber+"/revoke", nil, nil)
if err != nil {
import (
"context"
"net/url"
"github.com/netbirdio/netbird/shared/management/http/api"
)
func (a *CertificateAuthorityAPI) RevokeCertificate(ctx context.Context, serialNumber string) error {
resp, err := a.c.NewRequest(ctx, "DELETE", "/api/ca/certificates/"+url.PathEscape(serialNumber), nil, nil)
if err != nil {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shared/management/client/rest/ca.go` around lines 106 - 108, The
RevokeCertificate implementation is calling the wrong HTTP verb/path; update
CertificateAuthorityAPI.RevokeCertificate to use a DELETE request to the server
endpoint and adjust the URL to "/api/ca/certificates/"+serialNumber (instead of
POST "/api/ca/certificates/{serialNumber}/revoke") by invoking a.c.NewRequest
with method "DELETE" and the corrected path so it matches the server wiring.

Comment on lines +10023 to +10050
post:
summary: Initialize CA
description: Initialize a new certificate authority for the account
tags: [ Certificate Authority ]
security:
- BearerAuth: [ ]
- TokenAuth: [ ]
requestBody:
description: Optional CA configuration parameters
content:
application/json:
schema:
$ref: '#/components/schemas/CAInitRequest'
responses:
'200':
description: A JSON Object of the created CA Certificate
content:
application/json:
schema:
$ref: '#/components/schemas/CACertificateResponse'
'400':
"$ref": "#/components/responses/bad_request"
'401':
"$ref": "#/components/responses/requires_authentication"
'403':
"$ref": "#/components/responses/forbidden"
'500':
"$ref": "#/components/responses/internal_error"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Document 422 responses for CA init/rotate validation errors.

These endpoints currently omit 422, but the PR behavior indicates invalid JSON/validation paths return 422. The spec should expose that to clients.

Proposed change
       responses:
         '200':
           description: A JSON Object of the created CA Certificate
           content:
             application/json:
               schema:
                 $ref: '#/components/schemas/CACertificateResponse'
         '400':
           "$ref": "#/components/responses/bad_request"
+        '422':
+          "$ref": "#/components/responses/validation_failed"
         '401':
           "$ref": "#/components/responses/requires_authentication"
         '403':
           "$ref": "#/components/responses/forbidden"
         '500':
           "$ref": "#/components/responses/internal_error"
@@
       responses:
         '200':
           description: A JSON Object of the new CA Certificate
           content:
             application/json:
               schema:
                 $ref: '#/components/schemas/CACertificateResponse'
         '400':
           "$ref": "#/components/responses/bad_request"
+        '422':
+          "$ref": "#/components/responses/validation_failed"
         '401':
           "$ref": "#/components/responses/requires_authentication"
         '403':
           "$ref": "#/components/responses/forbidden"
         '500':
           "$ref": "#/components/responses/internal_error"

Also applies to: 10112-10140

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shared/management/http/api/openapi.yml` around lines 10023 - 10050, The
OpenAPI spec for the "Initialize CA" POST (summary: "Initialize CA") omits the
422 response even though the implementation returns 422 for validation/invalid
JSON; add a '422' response under responses pointing to a shared
validation/unprocessable response (e.g. "$ref":
"#/components/responses/unprocessable_entity" or create one) so clients see
validation errors, and apply the same change to the CA rotate endpoint (summary:
"Rotate CA") that uses CAInitRequest/CACertificateResponse and the same security
entries (BearerAuth, TokenAuth). Ensure the new response is referenced where
CAInitRequest and CACertificateResponse are used.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (5)
client/internal/cert/manager.go (2)

52-54: ⚠️ Potential issue | 🟡 Minor

Trim whitespace before FQDN validation.

Line 52 only rejects ""; " " still passes and creates an invalid CSR input. Normalize first, then validate.

Suggested fix
 import (
 	"crypto"
 	"crypto/ecdsa"
 	"crypto/elliptic"
 	"crypto/rand"
 	"crypto/x509"
 	"crypto/x509/pkix"
 	"encoding/pem"
 	"fmt"
 	"os"
 	"path/filepath"
+	"strings"
 	"sync"
 	"time"
 )
@@
 func (m *Manager) CreateCSR(key crypto.Signer, fqdn string, wildcard bool) ([]byte, error) {
-	if fqdn == "" {
+	fqdn = strings.TrimSpace(fqdn)
+	if fqdn == "" {
 		return nil, fmt.Errorf("FQDN is required for CSR creation")
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/cert/manager.go` around lines 52 - 54, Trim and normalize the
fqdn string before validating it: wherever the CSR creation logic checks "if
fqdn == "" { ... }" (in the CSR generation function handling the fqdn variable),
first call strings.TrimSpace(fqdn) and assign it back to fqdn, then perform the
empty-string check and error return; this ensures inputs like "   " are rejected
the same as "" and prevents invalid CSR input.

106-115: ⚠️ Potential issue | 🟠 Major

StoreCert can still leave mixed cert/key state on partial commit.

If rename on Line 108 succeeds for one file and a later rename fails, disk can hold a new cert with an old key (or vice versa). That breaks TLS loading until repaired.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/cert/manager.go` around lines 106 - 115, StoreCert can leave
mixed cert/key state because it renames temp files into place one-by-one (loop
over files and os.Rename(tmpPaths[i], f.path)), so a later rename failure yields
partially updated files; fix by performing an atomic directory swap instead of
per-file renames: create a temporary directory next to the target cert
directory, write all cert/key files into that temp dir (using the same filenames
as in files/tmpPaths), then atomically replace the target cert directory with
os.Rename(tempDir, certDir) (or, if directory swap is not possible in your
layout, implement the three-step swap: rename existing target files to backups,
rename all temps into place, then remove backups) so that StoreCert (and symbols
files, tmpPaths, and the rename loop) no longer leaves a mixed state on partial
failure.
client/internal/engine.go (1)

2162-2163: ⚠️ Potential issue | 🟠 Major

Renewal hardcodes INTERNAL signing type.

Line 2162 forces CERT_SIGNING_INTERNAL for every renewal. If the active cert came from a different signing mode, renewal can silently change issuer/trust behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/engine.go` around lines 2162 - 2163, The renewal currently
hardcodes mgmProto.CertSigningType_CERT_SIGNING_INTERNAL when calling
e.mgmClient.SignCertificate, which can change issuer/trust; modify the renewal
flow to determine the original signing type from the active certificate/metadata
(e.g. from e.activeCert or its stored signing type) and pass that signing type
into SignCertificate instead of CERT_SIGNING_INTERNAL (use the same variable
names signCtx, csrDER, and signResp). Ensure fallback behavior if the original
signing type is missing (preserve current default), and update any references in
the renewal function that assume INTERNAL.
management/server/store/sql_store.go (1)

5413-5415: ⚠️ Potential issue | 🟠 Major

Add a nil guard to CreateCertIssuanceLog for input safety consistency.

Line 5414 writes entry directly without validating it, while other new Create* methods already guard nil inputs.

🔧 Suggested fix
 func (s *SqlStore) CreateCertIssuanceLog(ctx context.Context, entry *ca.CertIssuanceLog) error {
+	if entry == nil {
+		return status.Errorf(status.InvalidArgument, "cert issuance log entry is nil")
+	}
 	result := s.db.Create(entry)
 	if result.Error != nil {
 		log.WithContext(ctx).Errorf("failed to create cert issuance log in store: %v", result.Error)
 		return status.Errorf(status.Internal, "failed to create cert issuance log in store")
 	}
 	return nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/store/sql_store.go` around lines 5413 - 5415, Add a nil
guard at the start of CreateCertIssuanceLog: check if entry == nil and return
the same response other Create* methods use for nil inputs (match their behavior
— e.g., return nil or a specific validation error) before calling
s.db.Create(entry); this keeps input-safety consistent with the other Create
methods in SqlStore.
management/server/ca/manager.go (1)

162-165: ⚠️ Potential issue | 🟡 Minor

Avoid mutating CSR-backed SAN slice when adding wildcard DNS name.

Line 162 aliases csr.DNSNames directly; line 164's append may mutate the original backing array if spare capacity exists. Go's x509.ParseCertificateRequest allocates DNSNames using append(), which can over-allocate capacity beyond the final length. Copy the slice first to prevent unexpected mutations of the input CSR object.

🔧 Suggested fix
-	dnsNames := csr.DNSNames
+	dnsNames := append([]string(nil), csr.DNSNames...)
 	if wildcard && !containsName(dnsNames, "*."+peerFQDN) {
 		dnsNames = append(dnsNames, "*."+peerFQDN)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/ca/manager.go` around lines 162 - 165, The code currently
aliases csr.DNSNames into dnsNames and may append to it, potentially mutating
the CSR's backing array; instead create a copy of the CSR slice before mutating
it (e.g., allocate a new slice and copy csr.DNSNames into it) so appending the
wildcard ("*."+peerFQDN) does not modify csr.DNSNames; update the block around
csr.DNSNames / dnsNames in manager.go (functions using containsName, peerFQDN,
wildcard) to use the copied slice before any append.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/internal/engine.go`:
- Around line 2181-2185: Validate the signed certificate payload from signResp
before calling e.certManager.StoreCert: ensure signResp.InternalCertPem and
signResp.InternalChainPem are non-empty, attempt to parse/validate them (e.g.,
combine certPEM+chainPEM and call tls.X509KeyPair with keyPEM or decode PEM
blocks and parse with x509.ParseCertificate/x509.ParseCertificates) and only
call e.certManager.StoreCert(certPEM, chainPEM, keyPEM) if parsing succeeds; on
parse/validation failure, log a clear error including trigger and the parse
error and skip storing to avoid overwriting a valid cert set.

In `@management/internals/shared/grpc/cert_service.go`:
- Around line 200-216: The SAN comparison in cert_service.go (the block that
builds expected := map[string]struct{}{peerFQDN: {}} and then iterates over
csr.DNSNames) must be made case-insensitive: normalize keys and inputs (e.g.,
use strings.ToLower on peerFQDN, the wildcard entry "*."+peerFQDN, and each
entry from csr.DNSNames) before building the expected map and before checking
membership and duplicates (seen map should track the normalized name), or
alternatively use strings.EqualFold when comparing; also add the strings import
if needed so CSR SAN checks in this function treat DNS names case-insensitively.

---

Duplicate comments:
In `@client/internal/cert/manager.go`:
- Around line 52-54: Trim and normalize the fqdn string before validating it:
wherever the CSR creation logic checks "if fqdn == "" { ... }" (in the CSR
generation function handling the fqdn variable), first call
strings.TrimSpace(fqdn) and assign it back to fqdn, then perform the
empty-string check and error return; this ensures inputs like "   " are rejected
the same as "" and prevents invalid CSR input.
- Around line 106-115: StoreCert can leave mixed cert/key state because it
renames temp files into place one-by-one (loop over files and
os.Rename(tmpPaths[i], f.path)), so a later rename failure yields partially
updated files; fix by performing an atomic directory swap instead of per-file
renames: create a temporary directory next to the target cert directory, write
all cert/key files into that temp dir (using the same filenames as in
files/tmpPaths), then atomically replace the target cert directory with
os.Rename(tempDir, certDir) (or, if directory swap is not possible in your
layout, implement the three-step swap: rename existing target files to backups,
rename all temps into place, then remove backups) so that StoreCert (and symbols
files, tmpPaths, and the rename loop) no longer leaves a mixed state on partial
failure.

In `@client/internal/engine.go`:
- Around line 2162-2163: The renewal currently hardcodes
mgmProto.CertSigningType_CERT_SIGNING_INTERNAL when calling
e.mgmClient.SignCertificate, which can change issuer/trust; modify the renewal
flow to determine the original signing type from the active certificate/metadata
(e.g. from e.activeCert or its stored signing type) and pass that signing type
into SignCertificate instead of CERT_SIGNING_INTERNAL (use the same variable
names signCtx, csrDER, and signResp). Ensure fallback behavior if the original
signing type is missing (preserve current default), and update any references in
the renewal function that assume INTERNAL.

In `@management/server/ca/manager.go`:
- Around line 162-165: The code currently aliases csr.DNSNames into dnsNames and
may append to it, potentially mutating the CSR's backing array; instead create a
copy of the CSR slice before mutating it (e.g., allocate a new slice and copy
csr.DNSNames into it) so appending the wildcard ("*."+peerFQDN) does not modify
csr.DNSNames; update the block around csr.DNSNames / dnsNames in manager.go
(functions using containsName, peerFQDN, wildcard) to use the copied slice
before any append.

In `@management/server/store/sql_store.go`:
- Around line 5413-5415: Add a nil guard at the start of CreateCertIssuanceLog:
check if entry == nil and return the same response other Create* methods use for
nil inputs (match their behavior — e.g., return nil or a specific validation
error) before calling s.db.Create(entry); this keeps input-safety consistent
with the other Create methods in SqlStore.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9a54db83-9202-4ad0-bc41-b09abc951a80

📥 Commits

Reviewing files that changed from the base of the PR and between b7562ac and b8ce917.

📒 Files selected for processing (9)
  • client/internal/cert/manager.go
  • client/internal/cert/trust_linux.go
  • client/internal/engine.go
  • management/internals/shared/grpc/cert_service.go
  • management/server/ca/internal_ca.go
  • management/server/ca/manager.go
  • management/server/ca/manager_test.go
  • management/server/ca/types.go
  • management/server/store/sql_store.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • management/server/ca/manager_test.go
  • client/internal/cert/trust_linux.go

Comment thread client/internal/engine.go
Comment on lines +2181 to +2185
certPEM := signResp.InternalCertPem
chainPEM := signResp.InternalChainPem

if err := e.certManager.StoreCert(certPEM, chainPEM, keyPEM); err != nil {
log.Warnf("cert renewal (%s): store certificate: %v", trigger, err)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate signed cert payload before persisting.

signResp.InternalCertPem/InternalChainPem are written directly without client-side sanity checks. A malformed/empty response can overwrite a working cert set and leave the client unable to load TLS certs.

Suggested fix
 	certPEM := signResp.InternalCertPem
 	chainPEM := signResp.InternalChainPem
+	if len(certPEM) == 0 {
+		log.Warnf("cert renewal (%s): empty certificate payload", trigger)
+		return
+	}
+	block, _ := pem.Decode(certPEM)
+	if block == nil {
+		log.Warnf("cert renewal (%s): invalid certificate PEM", trigger)
+		return
+	}
+	if _, err := x509.ParseCertificate(block.Bytes); err != nil {
+		log.Warnf("cert renewal (%s): parse signed cert: %v", trigger, err)
+		return
+	}
 
 	if err := e.certManager.StoreCert(certPEM, chainPEM, keyPEM); err != nil {
 		log.Warnf("cert renewal (%s): store certificate: %v", trigger, err)
 		return
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
certPEM := signResp.InternalCertPem
chainPEM := signResp.InternalChainPem
if err := e.certManager.StoreCert(certPEM, chainPEM, keyPEM); err != nil {
log.Warnf("cert renewal (%s): store certificate: %v", trigger, err)
certPEM := signResp.InternalCertPem
chainPEM := signResp.InternalChainPem
if len(certPEM) == 0 {
log.Warnf("cert renewal (%s): empty certificate payload", trigger)
return
}
block, _ := pem.Decode(certPEM)
if block == nil {
log.Warnf("cert renewal (%s): invalid certificate PEM", trigger)
return
}
if _, err := x509.ParseCertificate(block.Bytes); err != nil {
log.Warnf("cert renewal (%s): parse signed cert: %v", trigger, err)
return
}
if err := e.certManager.StoreCert(certPEM, chainPEM, keyPEM); err != nil {
log.Warnf("cert renewal (%s): store certificate: %v", trigger, err)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/engine.go` around lines 2181 - 2185, Validate the signed
certificate payload from signResp before calling e.certManager.StoreCert: ensure
signResp.InternalCertPem and signResp.InternalChainPem are non-empty, attempt to
parse/validate them (e.g., combine certPEM+chainPEM and call tls.X509KeyPair
with keyPEM or decode PEM blocks and parse with
x509.ParseCertificate/x509.ParseCertificates) and only call
e.certManager.StoreCert(certPEM, chainPEM, keyPEM) if parsing succeeds; on
parse/validation failure, log a clear error including trigger and the parse
error and skip storing to avoid overwriting a valid cert set.

Comment thread management/internals/shared/grpc/cert_service.go Outdated
zgv163 added 18 commits March 5, 2026 18:18
  Wire certificate management end-to-end: management client methods
  (SignCertificate, GetCACertificates), daemon RPC handlers, and CLI
  commands (netbird cert request|status|trust-ca|untrust-ca).
  Wire cert manager into the engine via ConnectClient so CA certificates
  from SyncResponse are stored locally, FQDN changes trigger re-issuance,
  and a background goroutine renews certificates approaching expiry.
  Define CACertificateResponse and IssuedCertificateResponse schemas in
  the OpenAPI spec with 7 endpoint paths, regenerate types, and update
  the CA handler to use the generated api types instead of local structs.
  Add REST client (CertificateAuthorityAPI) with mock-based unit tests.
The initCA handler creates the CA but did not flip
CertificateAuthorityEnabled in account settings, so peers requesting
certificates got a "certificate authority is not enabled" error until
the setting was toggled manually.
For peers with login expiration enabled, issue certificates whose
validity matches the account's PeerLoginExpiration duration instead of
the default 90 days. Non-expiring peers keep the 90-day default.

Server: add validity parameter to SignCertificate and a
TriggerSessionRenewal trigger exempt from rate limiting.

Client: replace the fixed 30-day renewal threshold with a proportional
one (1/3 of cert lifetime, clamped 1h–30d) and detect expired
certificates after sync for immediate renewal.
TrustCA() now proactively fetches CA certificates from the management
server via GetCACertificates gRPC call when ca.pem is not yet available
locally, instead of failing with "no CA certificates available".
…tting

Support display name, organization, and validity options when
initializing or rotating a CA. Persist resolved subject fields on the
CACertificate record and expose them in the REST API. Wire the
cert_wildcard_allowed account setting through the accounts handler.
Fix critical, major, and minor issues from CodeRabbit review:

- Enforce exact SAN allowlist in CSR validation (C1)
- Use FillBytes for serial number to prevent panic (C2)
- Add nil CSR and empty DNSNames checks (C3)
- Return 422 on invalid JSON in CA init/rotate (M6)
- Propagate settings errors in initCA as HTTP errors (M9)
- Cap leaf cert NotAfter to CA expiry (M12)
- Mark NameConstraints as critical (M13)
- Add resolveCmd helper for trust_linux.go (M2)
- Derive cert dir from config file path (M4)
- Move cert renewal loop after engine init (M5)
- Detect wildcard on cert renewal (M7)
- Add atomic guard for concurrent renewal (M8)
- Validate cert PEM before storing (M15)
- Deep-copy CA cert before encryption (M18)
- Add certificate activity codes to OpenAPI spec (M3, M14)
- Fix CLI trust/untrust error returns (m1)
- Fix REST client nil pointer on parse error (m2)
- Lowercase FQDN in name validation (m3)
- Check all DNSNames in FQDNChanged (m4)
- Propagate cert manager to running engine (n1)
- Preserve internal CA settings on account update (Bug netbirdio#4)
- Reject non-DNS SANs (IP, Email, URI) in CSR validation
- Add nil CSR guard in Sign method
- Validate CA cert/key match in constructor with safe type assertion
- Ensure non-zero serial numbers
- Use SignRequest/IssuedCertParams/CAOptions structs to reduce param counts
- Extract validateCSRSANs helper to reduce cognitive complexity
- Two-phase atomic StoreCert with temp file cleanup on failure
- Add 30s timeout on trust store update commands (Linux)
- Use full SHA-256 fingerprint in trust store cert filenames
- Remove len>0 guard so empty CA list clears local bundle
- Add nil guards on CreateCACertificate/CreateIssuedCertificate
- Detect duplicate DNS SANs in CSR validation
- Fix comment to mention both rate limit exemptions
… feedback

- Fix merge conflict artifacts in handler.go and sql_store.go from upstream rebase
- Validate signed cert PEM is non-empty before persisting during renewal (engine.go)
- Use case-insensitive DNS SAN comparison in CSR validation (cert_service.go)
@zgv163 zgv163 force-pushed the feature/internal-ca branch from b8ce917 to 4f9be46 Compare March 5, 2026 21:32
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Mar 5, 2026

Quality Gate Failed Quality Gate failed

Failed conditions
6 New issues
6 New Code Smells (required ≤ 0)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
management/internals/controllers/network_map/controller/controller.go (1)

230-261: ⚠️ Potential issue | 🟠 Major

Use the goroutine argument (p.ID) instead of the outer loop variable (peer.ID) on line 259.

The goroutine is explicitly passed the current iteration's peer as parameter p, but line 259 incorrectly uses the captured outer peer.ID for the proxy network map lookup. This inconsistency can cause incorrect proxy data to be merged with the wrong peer's network map.

🔧 Suggested fix
-			proxyNetworkMap, ok := proxyNetworkMaps[peer.ID]
+			proxyNetworkMap, ok := proxyNetworkMaps[p.ID]
 			if ok {
 				remotePeerNetworkMap.Merge(proxyNetworkMap)
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/internals/controllers/network_map/controller/controller.go` around
lines 230 - 261, The goroutine captures the loop variable but receives the
current peer as parameter p (*nbpeer.Peer); change the proxy map lookup to use
p.ID instead of the outer peer.ID so the correct proxyNetworkMaps entry is
merged into remotePeerNetworkMap (update the proxyNetworkMaps[peer.ID] ->
proxyNetworkMaps[p.ID] reference where remotePeerNetworkMap.Merge(...) is
called).
♻️ Duplicate comments (6)
client/internal/engine.go (2)

2159-2163: ⚠️ Potential issue | 🟠 Major

Preserve certificate signing policy during renewal.

Line 2162 hardcodes CERT_SIGNING_INTERNAL; renewal can unintentionally switch issuer mode.

🔧 Proposed direction
- signResp, err := e.mgmClient.SignCertificate(signCtx, csrDER, mgmProto.CertSigningType_CERT_SIGNING_INTERNAL, isWildcard)
+ signingType := e.certManager.CurrentSigningTypeOrDefault(mgmProto.CertSigningType_CERT_SIGNING_INTERNAL)
+ signResp, err := e.mgmClient.SignCertificate(signCtx, csrDER, signingType, isWildcard)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/engine.go` around lines 2159 - 2163, The renewal call to
e.mgmClient.SignCertificate currently hardcodes
mgmProto.CertSigningType_CERT_SIGNING_INTERNAL which can change issuer mode;
update the call to pass the certificate signing policy currently in use instead
of the constant. Retrieve the active signing type from the engine state or
certificate metadata (e.g., an existing field like e.currentCertSigningType or
from the existing cert object) and use that variable in the SignCertificate
invocation (preserving signCtx, csrDER and isWildcard), so renewal respects the
original cert signing policy rather than switching to CERT_SIGNING_INTERNAL.

2181-2190: ⚠️ Potential issue | 🟠 Major

Validate cert/chain payload before persisting.

Line 2184 only checks leaf PEM non-empty. Malformed cert/chain can still overwrite a working local cert set.

🔧 Proposed fix
 	certPEM := signResp.InternalCertPem
 	chainPEM := signResp.InternalChainPem

-	if len(certPEM) == 0 {
+	if len(certPEM) == 0 || len(chainPEM) == 0 {
 		log.Warnf("cert renewal (%s): server returned empty certificate", trigger)
 		return
 	}
+	if _, err := tls.X509KeyPair(append(certPEM, chainPEM...), keyPEM); err != nil {
+		log.Warnf("cert renewal (%s): invalid signed payload: %v", trigger, err)
+		return
+	}

 	if err := e.certManager.StoreCert(certPEM, chainPEM, keyPEM); err != nil {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/engine.go` around lines 2181 - 2190, The code only checks
certPEM non-empty before persisting; update the renewal flow to validate the
payloads (signResp.InternalCertPem, signResp.InternalChainPem, and keyPEM)
before calling e.certManager.StoreCert: decode PEM blocks and parse the leaf
certificate with crypto/x509 (ensure x509.ParseCertificate succeeds), parse and
validate the chain PEM blocks, parse the private key (PKCS1/PKCS8/EC) and verify
the private key matches the leaf certificate's public key; if any
parse/validation step fails log a descriptive message with log.Warnf and return
without calling e.certManager.StoreCert to avoid overwriting a working cert set.
shared/management/client/rest/ca.go (1)

106-108: ⚠️ Potential issue | 🔴 Critical

Fix revoke endpoint verb/path to match server API.

Line 107 calls POST /api/ca/certificates/{serialNumber}/revoke, but server-side route is DELETE /api/ca/certificates/{serialNumber}. This request will fail as implemented.

🔧 Suggested fix
 func (a *CertificateAuthorityAPI) RevokeCertificate(ctx context.Context, serialNumber string) error {
-	resp, err := a.c.NewRequest(ctx, "POST", "/api/ca/certificates/"+serialNumber+"/revoke", nil, nil)
+	resp, err := a.c.NewRequest(ctx, "DELETE", "/api/ca/certificates/"+url.PathEscape(serialNumber), nil, nil)
 	if err != nil {
 		return err
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shared/management/client/rest/ca.go` around lines 106 - 108, In
CertificateAuthorityAPI.RevokeCertificate, the request currently uses
a.c.NewRequest with method "POST" and path
"/api/ca/certificates/"+serialNumber+"/revoke"; change it to use the DELETE verb
and the server route "/api/ca/certificates/"+serialNumber by updating the
a.c.NewRequest call in RevokeCertificate so it issues a DELETE to the correct
path.
shared/management/http/api/openapi.yml (1)

10061-10074: ⚠️ Potential issue | 🟠 Major

Document 422 responses for CA init/rotate validation failures (still missing).

/api/ca (POST) and /api/ca/rotate (POST) still omit 422 from the OpenAPI responses, despite validation failure paths needing to be represented for clients.

Proposed OpenAPI response additions
       responses:
         '200':
           description: A JSON Object of the created CA Certificate
           content:
             application/json:
               schema:
                 $ref: '#/components/schemas/CACertificateResponse'
         '400':
           "$ref": "#/components/responses/bad_request"
+        '422':
+          "$ref": "#/components/responses/validation_failed"
         '401':
           "$ref": "#/components/responses/requires_authentication"
         '403':
           "$ref": "#/components/responses/forbidden"
         '500':
           "$ref": "#/components/responses/internal_error"
@@
       responses:
         '200':
           description: A JSON Object of the new CA Certificate
           content:
             application/json:
               schema:
                 $ref: '#/components/schemas/CACertificateResponse'
         '400':
           "$ref": "#/components/responses/bad_request"
+        '422':
+          "$ref": "#/components/responses/validation_failed"
         '401':
           "$ref": "#/components/responses/requires_authentication"
         '403':
           "$ref": "#/components/responses/forbidden"
         '500':
           "$ref": "#/components/responses/internal_error"

Also applies to: 10151-10164

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shared/management/http/api/openapi.yml` around lines 10061 - 10074, Add a 422
Unprocessable Entity response to the OpenAPI responses for the CA init and
rotate endpoints to document validation failures: update the POST /api/ca and
POST /api/ca/rotate response objects in shared/management/http/api/openapi.yml
to include a '422' entry referencing the shared validation response (e.g.
"$ref": "#/components/responses/validation_error" or equivalent component name
used in the spec), and mirror the same 422 addition for the other identical
response blocks noted around the other range. Ensure the reference matches the
existing component key for validation errors so clients see the schema for
validation failures.
management/server/store/sql_store.go (1)

5604-5611: ⚠️ Potential issue | 🟡 Minor

Add a nil guard for entry to match create-method defensive checks.

This method should mirror the same input validation pattern used by other Create* store methods.

🔧 Proposed fix
 func (s *SqlStore) CreateCertIssuanceLog(ctx context.Context, entry *ca.CertIssuanceLog) error {
+	if entry == nil {
+		return status.Errorf(status.InvalidArgument, "cert issuance log entry is nil")
+	}
 	result := s.db.Create(entry)
 	if result.Error != nil {
 		log.WithContext(ctx).Errorf("failed to create cert issuance log in store: %v", result.Error)
 		return status.Errorf(status.Internal, "failed to create cert issuance log in store")
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/store/sql_store.go` around lines 5604 - 5611,
CreateCertIssuanceLog omits the nil-input guard used by other Create* methods;
add a check at the start of CreateCertIssuanceLog to return an InvalidArgument
status when the provided *ca.CertIssuanceLog entry is nil (log the error with
log.WithContext(ctx).Errorf) before calling s.db.Create(entry) so the method
matches the defensive validation pattern of other create methods.
management/server/ca/manager.go (1)

123-165: ⚠️ Potential issue | 🟠 Major

Remove SAN-order dependency when deriving peerFQDN.

Using csr.DNSNames[0] is brittle. If the first SAN is wildcard, wildcard augmentation can generate invalid entries and inconsistent issuance metadata.

🔧 Proposed fix
 import (
 	"context"
 	"crypto/x509"
 	"fmt"
+	"slices"
+	"strings"
 	"time"
@@
-	peerFQDN := csr.DNSNames[0]
+	var peerFQDN string
+	for _, name := range csr.DNSNames {
+		n := strings.TrimSpace(name)
+		if n == "" || strings.HasPrefix(n, "*.") {
+			continue
+		}
+		peerFQDN = n
+		break
+	}
+	if peerFQDN == "" {
+		return nil, nil, fmt.Errorf("csr must include a non-wildcard DNS SAN")
+	}
@@
-	dnsNames := csr.DNSNames
+	dnsNames := slices.Clone(csr.DNSNames)
 	if wildcard && !containsName(dnsNames, "*."+peerFQDN) {
 		dnsNames = append(dnsNames, "*."+peerFQDN)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/ca/manager.go` around lines 123 - 165, The code currently
picks peerFQDN from csr.DNSNames[0], which is order-dependent and can be a
wildcard; change it to derive peerFQDN deterministically by selecting the first
non-wildcard entry from csr.DNSNames (i.e., the first name that does not start
with "*.") and if none exists, fall back to csr.Subject.CommonName (and finally
to the first DNSName if CommonName is empty); update references to peerFQDN used
in NewInternalCASigner and wildcard augmentation so containsName and the
"*."+peerFQDN append use this validated non-wildcard FQDN.
🧹 Nitpick comments (10)
client/internal/cert/trust_windows.go (2)

57-72: Consider extracting shared helpers to reduce duplication.

The writeTempPEM and sha1Fingerprint functions are nearly identical to the Darwin implementation. Consider extracting these to a shared file (e.g., trust_common.go without build tags, or with appropriate build constraints) to reduce duplication and ensure consistent behavior across platforms.

Also applies to: 74-85

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/cert/trust_windows.go` around lines 57 - 72, The writeTempPEM
and sha1Fingerprint implementations in trust_windows.go duplicate logic from
Darwin; extract both functions into a new shared file (e.g., trust_common.go)
without platform build tags (or with appropriate tags that include all OSes that
need them) and remove the duplicates from trust_windows.go and the Darwin file;
ensure the new file exports or keeps the same unexported function names
(writeTempPEM, sha1Fingerprint) so existing callers (e.g., in trust_windows.go
and the Darwin counterpart) continue to compile and behavior remains identical.

24-28: Consider adding timeout protection like Linux.

Similar to the Darwin implementation, the Windows certutil commands run without timeout protection. The Linux version uses a 30-second timeout. While certutil operations are generally fast, consistency across platforms would improve reliability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/cert/trust_windows.go` around lines 24 - 28, The certutil
invocation in trust_windows.go uses exec.Command without a timeout; wrap the
call in a context with a timeout (e.g., 30s) and use exec.CommandContext so the
process is killed if it hangs. Locate the exec.Command("certutil", "-addstore",
"-f", "Root", tmpFile) call and replace it to create a context with timeout,
defer cancel, run the command with exec.CommandContext, and preserve the
existing CombinedOutput error handling and tmpFile usage.
client/internal/cert/manager_test.go (1)

84-98: Note: Test uses placeholder PEM content.

The CA PEM content (CA1, CA2) is not valid certificate data. This works if StoreCA simply concatenates and writes the bytes, but if validation is later added to StoreCA, this test will fail. Consider using generateTestCert to create valid CA PEM data for more realistic testing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/cert/manager_test.go` around lines 84 - 98, TestStoreCA
currently writes placeholder PEMs ("CA1"/"CA2") which aren’t real certs; replace
those with valid PEMs generated by generateTestCert and use them when calling
mgr.StoreCA so the test remains valid if StoreCA adds cert validation.
Specifically, in TestStoreCA use generateTestCert (or a helper that returns a CA
PEM []byte) to produce two CA PEM blobs, pass them to mgr.StoreCA, then read
mgr.CAPath() and assert the expected subject/contents; keep the NewManager and
CAPath usage and retain error checks.
client/internal/cert/trust_linux.go (1)

110-122: Minor: Consider handling partial distro detection failures.

If the cert directory exists but the update command is not found, detectDistro silently falls through to try the next distro. This could mask configuration issues (e.g., Debian system missing update-ca-certificates). Consider logging or returning a more specific error when the directory exists but the command is missing.

💡 Optional improvement
 func detectDistro() (certDir, updateCmd string, err error) {
 	if _, err := os.Stat(debianCertDir); err == nil {
-		if cmd, err := resolveCmd(debianUpdateCmd); err == nil {
+		cmd, cmdErr := resolveCmd(debianUpdateCmd)
+		if cmdErr == nil {
 			return debianCertDir, cmd, nil
 		}
+		// Debian cert dir exists but command not found - continue to try RHEL
 	}
 	if _, err := os.Stat(rhelCertDir); err == nil {
-		if cmd, err := resolveCmd(rhelUpdateCmd); err == nil {
+		cmd, cmdErr := resolveCmd(rhelUpdateCmd)
+		if cmdErr == nil {
 			return rhelCertDir, cmd, nil
 		}
 	}
 	return "", "", fmt.Errorf("unsupported Linux distribution: neither %s nor %s found", debianUpdateCmd, rhelUpdateCmd)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/cert/trust_linux.go` around lines 110 - 122, detectDistro
currently ignores cases where the cert directory (debianCertDir or rhelCertDir)
exists but resolveCmd(debianUpdateCmd / rhelUpdateCmd) fails, which can hide
misconfiguration; update detectDistro to detect and return a specific error (or
log) when a cert dir is present but resolveCmd fails so callers see "certificate
directory present but update command missing" instead of falling through—check
the debian branch and rhel branch separately (using debianCertDir,
debianUpdateCmd, rhelCertDir, rhelUpdateCmd and resolveCmd) and return an
explicit error mentioning the missing update command for that distro instead of
continuing silently.
client/internal/cert/trust_darwin.go (1)

24-28: Consider adding timeout protection for consistency with Linux.

The Linux trust implementation uses runWithTimeout with a 30-second timeout for system commands. The macOS security commands here run without timeout protection. While less common, macOS Keychain operations can also hang (e.g., waiting for user authentication prompts or system lock contention).

♻️ Suggested approach
+import (
+	"context"
 	"crypto/sha1" //nolint:gosec // SHA-1 used for macOS Keychain fingerprint matching
 	"crypto/x509"
 	"encoding/hex"
 	"encoding/pem"
 	"fmt"
 	"os"
 	"os/exec"
 	"strings"
+	"time"
 )

+const trustUpdateTimeout = 30 * time.Second

 // InstallCA adds a CA certificate to the macOS System Keychain as a trusted root.
 func InstallCA(caPEM []byte) error {
 	tmpFile, err := writeTempPEM(caPEM)
 	if err != nil {
 		return err
 	}
 	defer os.Remove(tmpFile)

-	out, err := exec.Command("security", "add-trusted-cert", "-d", "-r", "trustRoot",
-		"-k", "/Library/Keychains/System.keychain", tmpFile).CombinedOutput()
+	ctx, cancel := context.WithTimeout(context.Background(), trustUpdateTimeout)
+	defer cancel()
+
+	out, err := exec.CommandContext(ctx, "security", "add-trusted-cert", "-d", "-r", "trustRoot",
+		"-k", "/Library/Keychains/System.keychain", tmpFile).CombinedOutput()
 	if err != nil {
+		if ctx.Err() == context.DeadlineExceeded {
+			return fmt.Errorf("security add-trusted-cert timed out after %s", trustUpdateTimeout)
+		}
 		return fmt.Errorf("security add-trusted-cert: %s: %w", strings.TrimSpace(string(out)), err)
 	}
 	return nil
 }

Apply similar changes to UninstallCA.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/cert/trust_darwin.go` around lines 24 - 28, The macOS
InstallCA code calls exec.Command("security", "add-trusted-cert", ...) without
timeout protection; wrap that call using the existing runWithTimeout helper with
a 30-second timeout (same pattern used in Linux) so the security command uses
CombinedOutput via runWithTimeout(ctx, 30*time.Second, "security",
"add-trusted-cert", ...) and return errors including command output as before;
apply the same change to the corresponding UninstallCA call that runs the
"security" command so both InstallCA and UninstallCA use runWithTimeout for
consistency and to avoid hangs.
management/internals/server/modules.go (1)

212-216: Reduce bootstrap coupling in CAManager() construction.

Using s.AccountManager().GetStore() here couples CA initialization to account manager creation. Prefer using s.Store() directly to keep module wiring independent and avoid side effects from eager AccountManager init.

♻️ Suggested simplification
-		mgr := ca.NewManager(s.AccountManager().GetStore())
+		mgr := ca.NewManager(s.Store())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/internals/server/modules.go` around lines 212 - 216, CAManager()
currently calls s.AccountManager().GetStore(), which forces AccountManager
initialization; change it to use s.Store() when constructing the CA manager to
decouple wiring. In BaseServer.CAManager(), replace the Create callback that
calls ca.NewManager(s.AccountManager().GetStore()) with
ca.NewManager(s.Store()), keep the subsequent
mgr.RegisterSigner(ca.NewACMEPersistSigner()) and return flow unchanged so CA
initialization no longer depends on AccountManager.
shared/management/client/client.go (1)

31-32: Consider interface segregation to reduce API breakage blast radius.

Line 31 and Line 32 add mandatory methods to the root Client interface. If external implementations exist, this becomes a hard compile-time break. A smaller cert-capability interface would make rollout safer.

♻️ Possible refactor
 type Client interface {
 	io.Closer
 	Sync(ctx context.Context, sysInfo *system.Info, msgHandler func(msg *proto.SyncResponse) error) error
 	Job(ctx context.Context, msgHandler func(msg *proto.JobRequest) *proto.JobResponse) error
 	GetServerPublicKey() (*wgtypes.Key, error)
 	Register(serverKey wgtypes.Key, setupKey string, jwtToken string, sysInfo *system.Info, sshKey []byte, dnsLabels domain.List) (*proto.LoginResponse, error)
 	Login(serverKey wgtypes.Key, sysInfo *system.Info, sshKey []byte, dnsLabels domain.List) (*proto.LoginResponse, error)
 	GetDeviceAuthorizationFlow(serverKey wgtypes.Key) (*proto.DeviceAuthorizationFlow, error)
 	GetPKCEAuthorizationFlow(serverKey wgtypes.Key) (*proto.PKCEAuthorizationFlow, error)
 	GetNetworkMap(sysInfo *system.Info) (*proto.NetworkMap, error)
 	IsHealthy() bool
 	SyncMeta(sysInfo *system.Info) error
 	Logout() error
 	CreateExpose(ctx context.Context, req ExposeRequest) (*ExposeResponse, error)
 	RenewExpose(ctx context.Context, domain string) error
 	StopExpose(ctx context.Context, domain string) error
-	SignCertificate(ctx context.Context, csrDER []byte, signingType proto.CertSigningType, wildcard bool) (*proto.SignCertificateResponse, error)
-	GetCACertificates(ctx context.Context) (*proto.GetCACertificatesResponse, error)
 }
+
+type CertificateClient interface {
+	SignCertificate(ctx context.Context, csrDER []byte, signingType proto.CertSigningType, wildcard bool) (*proto.SignCertificateResponse, error)
+	GetCACertificates(ctx context.Context) (*proto.GetCACertificatesResponse, error)
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shared/management/client/client.go` around lines 31 - 32, The root Client
interface was extended with SignCertificate and GetCACertificates which forces
all implementations to change; extract those methods into a new smaller
interface (e.g., CertificateClient or CertManager) and remove them from the root
Client interface, update callers that need cert capabilities to depend on the
new CertificateClient interface (or accept an optional interface assertion) and
update any internal types that currently implement
SignCertificate/GetCACertificates to implement the new interface; reference the
existing Client interface and the two methods SignCertificate and
GetCACertificates to locate the change.
management/server/ca/internal_ca_test.go (1)

85-87: Make CN fallback assertion less brittle.

Line 86 hardcodes expected string length; a regex assertion is more stable for randomized suffix formatting.

🔧 Suggested test tweak
-	assert.Len(t, cert.Subject.CommonName, len("mynetwork.selfhosted Internal CA (")+len("abcdef)"))
+	assert.Regexp(t, `^mynetwork\.selfhosted Internal CA \([0-9a-f]{6}\)$`, cert.Subject.CommonName)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/ca/internal_ca_test.go` around lines 85 - 87, The current
test uses a brittle length-based assertion for cert.Subject.CommonName; replace
the assert.Len check for the CN suffix with a regex match to allow randomized
suffixes—locate the lines using cert.Subject.CommonName and the
assert.Contains/assert.Len calls and change the length assertion to assert that
cert.Subject.CommonName matches a regexp like `^mynetwork\.selfhosted Internal
CA \([A-Za-z0-9-]+\)$` (use regexp.MustCompile or similar and
assert.Regexp/assert.True with re.MatchString) while keeping the existing
assert.Contains and Organization equality checks.
management/internals/shared/grpc/conversion.go (1)

118-123: Avoid building PeerConfig twice in ToSyncResponse.

toPeerConfig(...) is called twice with identical arguments, and NetworkMap.PeerConfig is reassigned again at Line 132. Build once and reuse to remove redundant work.

♻️ Suggested refactor
 func ToSyncResponse(..., caCertsPEM [][]byte) *proto.SyncResponse {
+	peerCfg := toPeerConfig(peer, networkMap.Network, dnsName, settings, httpConfig, deviceFlowConfig, networkMap.EnableSSH, caCertsPEM)
 	response := &proto.SyncResponse{
-		PeerConfig: toPeerConfig(peer, networkMap.Network, dnsName, settings, httpConfig, deviceFlowConfig, networkMap.EnableSSH, caCertsPEM),
+		PeerConfig: peerCfg,
 		NetworkMap: &proto.NetworkMap{
 			Serial:     networkMap.Network.CurrentSerial(),
 			Routes:     toProtocolRoutes(networkMap.Routes),
 			DNSConfig:  toProtocolDNSConfig(networkMap.DNSConfig, dnsCache, dnsFwdPort),
-			PeerConfig: toPeerConfig(peer, networkMap.Network, dnsName, settings, httpConfig, deviceFlowConfig, networkMap.EnableSSH, caCertsPEM),
+			PeerConfig: peerCfg,
 		},
 		Checks: toProtocolChecks(ctx, checks),
 	}
@@
-	response.NetworkMap.PeerConfig = response.PeerConfig
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/internals/shared/grpc/conversion.go` around lines 118 - 123, In
ToSyncResponse, toPeerConfig(...) is invoked twice with identical args—create a
single local variable (e.g., peerCfg := toPeerConfig(peer, networkMap.Network,
dnsName, settings, httpConfig, deviceFlowConfig, networkMap.EnableSSH,
caCertsPEM)) and reuse it for both PeerConfig fields (the top-level PeerConfig
and NetworkMap.PeerConfig) instead of calling toPeerConfig again; remove the
redundant second call/assignment so the function builds the PeerConfig once and
reuses the result.
management/server/ca/types.go (1)

108-123: Clone DNSNames in constructor to avoid slice aliasing.

DNSNames: p.DNSNames keeps shared backing storage; later caller mutations can leak into persisted model state.

♻️ Proposed fix
 	return &IssuedCertificate{
 		ID:           xid.New().String(),
 		AccountID:    p.AccountID,
 		PeerID:       p.PeerID,
 		SerialNumber: p.SerialNumber,
-		DNSNames:     p.DNSNames,
+		DNSNames:     append([]string(nil), p.DNSNames...),
 		HasWildcard:  p.HasWildcard,
 		NotBefore:    p.NotBefore,
 		NotAfter:     p.NotAfter,
 		SigningType:   p.SigningType,
 		SignedByCAID: p.SignedByCAID,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/ca/types.go` around lines 108 - 123, NewIssuedCertificate
currently assigns the DNSNames slice by reference (DNSNames: p.DNSNames) which
allows caller-side mutations to leak into the IssuedCertificate model; update
NewIssuedCertificate to clone p.DNSNames into a new slice before assigning
(e.g., allocate a new slice with the same length/capacity and copy elements) so
IssuedCertificate.DNSNames has its own backing array and cannot be mutated via
the original slice.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/server/cert.go`:
- Around line 219-241: The UntrustCA handler currently always returns Success:
true even when no certificate blocks are decoded or cert.UninstallCA fails;
update the logic in the UntrustCA function to set the response Success field
based on whether any CA was actually removed (i.e., removed > 0), and return a
non-success response or error when removed == 0 (and log a warning via
log.Warnf) so callers can detect failure; adjust the final return that
constructs proto.UntrustCAResponse to reflect the removed count and success
boolean instead of unconditionally true.

In `@management/server/ca/internal_ca.go`:
- Around line 287-307: validateCSRNames currently only checks DNSNames but must
reject CSRs containing IPAddresses, EmailAddresses, or URIs and must reject
duplicate DNS SANs; update validateCSRNames to return an error if
csr.IPAddresses, csr.EmailAddresses, or csr.URIs are non-empty, ensure DNS name
comparisons use case-insensitive matching against peerFQDN and the optional
"*."+peerFQDN entry, and detect duplicates in csr.DNSNames (e.g., via a
map[string]bool) to reject CSRs with repeated DNS SANs; preserve the final check
that peerFQDN is present (use the containsName helper or inline case-insensitive
check) and produce clear error messages referencing the offending SAN type or
duplicate name.

In `@management/server/http/handlers/ca/ca_handler.go`:
- Around line 299-301: The code currently sets opts.Validity =
time.Duration(*req.ValidityDays) * 24 * time.Hour without validating
req.ValidityDays; update the CA handler to first check that req.ValidityDays is
non-nil and a positive, non-zero value (and optionally enforce an upper bound if
desired), and return a clear HTTP error (bad request) when the value is zero or
negative instead of converting it; locate the logic around req.ValidityDays and
opts.Validity in ca_handler.go and add the validation/early-return before
assigning opts.Validity.

In `@shared/management/client/rest/ca.go`:
- Around line 45-46: The code concatenates caID directly into the request path
(e.g., in the NewRequest call), which can break URLs for IDs with reserved
chars; wrap caID with url.PathEscape before building the path (and import
net/url if missing), and apply the same change to the other occurrence that
builds "/api/ca/"+caID (the second call around the other NewRequest usage) so
both use url.PathEscape(caID) when constructing the URI.

In `@shared/management/http/api/openapi.yml`:
- Around line 10139-10140: Update the OpenAPI operation description string
"description: Rotate the certificate authority, deactivating the current one and
creating a new one" (the operation under tag "Certificate Authority") to
accurately reflect the actual behavior: state that rotation creates a new CA
while retaining the current CA for a configurable/graceful overlap period and
only deactivates the old CA after the overlap, rather than saying it deactivates
immediately; keep the description concise and explicit about the overlap
behavior.

In `@shared/management/http/api/types.gen.go`:
- Around line 1489-1490: Ensure parseCAOptions() validates the ValidityDays
field from the request against the OpenAPI bounds (minimum 1, maximum 36500)
before converting it to a duration: check if ValidityDays is nil (use default if
intended), and if non-nil verify *ValidityDays >= 1 && *ValidityDays <= 36500,
returning a validation error (with clear message) when out of range; do this
validation in parseCAOptions() (where the conversion to time.Duration currently
happens) so invalid values are rejected per the API contract.

---

Outside diff comments:
In `@management/internals/controllers/network_map/controller/controller.go`:
- Around line 230-261: The goroutine captures the loop variable but receives the
current peer as parameter p (*nbpeer.Peer); change the proxy map lookup to use
p.ID instead of the outer peer.ID so the correct proxyNetworkMaps entry is
merged into remotePeerNetworkMap (update the proxyNetworkMaps[peer.ID] ->
proxyNetworkMaps[p.ID] reference where remotePeerNetworkMap.Merge(...) is
called).

---

Duplicate comments:
In `@client/internal/engine.go`:
- Around line 2159-2163: The renewal call to e.mgmClient.SignCertificate
currently hardcodes mgmProto.CertSigningType_CERT_SIGNING_INTERNAL which can
change issuer mode; update the call to pass the certificate signing policy
currently in use instead of the constant. Retrieve the active signing type from
the engine state or certificate metadata (e.g., an existing field like
e.currentCertSigningType or from the existing cert object) and use that variable
in the SignCertificate invocation (preserving signCtx, csrDER and isWildcard),
so renewal respects the original cert signing policy rather than switching to
CERT_SIGNING_INTERNAL.
- Around line 2181-2190: The code only checks certPEM non-empty before
persisting; update the renewal flow to validate the payloads
(signResp.InternalCertPem, signResp.InternalChainPem, and keyPEM) before calling
e.certManager.StoreCert: decode PEM blocks and parse the leaf certificate with
crypto/x509 (ensure x509.ParseCertificate succeeds), parse and validate the
chain PEM blocks, parse the private key (PKCS1/PKCS8/EC) and verify the private
key matches the leaf certificate's public key; if any parse/validation step
fails log a descriptive message with log.Warnf and return without calling
e.certManager.StoreCert to avoid overwriting a working cert set.

In `@management/server/ca/manager.go`:
- Around line 123-165: The code currently picks peerFQDN from csr.DNSNames[0],
which is order-dependent and can be a wildcard; change it to derive peerFQDN
deterministically by selecting the first non-wildcard entry from csr.DNSNames
(i.e., the first name that does not start with "*.") and if none exists, fall
back to csr.Subject.CommonName (and finally to the first DNSName if CommonName
is empty); update references to peerFQDN used in NewInternalCASigner and
wildcard augmentation so containsName and the "*."+peerFQDN append use this
validated non-wildcard FQDN.

In `@management/server/store/sql_store.go`:
- Around line 5604-5611: CreateCertIssuanceLog omits the nil-input guard used by
other Create* methods; add a check at the start of CreateCertIssuanceLog to
return an InvalidArgument status when the provided *ca.CertIssuanceLog entry is
nil (log the error with log.WithContext(ctx).Errorf) before calling
s.db.Create(entry) so the method matches the defensive validation pattern of
other create methods.

In `@shared/management/client/rest/ca.go`:
- Around line 106-108: In CertificateAuthorityAPI.RevokeCertificate, the request
currently uses a.c.NewRequest with method "POST" and path
"/api/ca/certificates/"+serialNumber+"/revoke"; change it to use the DELETE verb
and the server route "/api/ca/certificates/"+serialNumber by updating the
a.c.NewRequest call in RevokeCertificate so it issues a DELETE to the correct
path.

In `@shared/management/http/api/openapi.yml`:
- Around line 10061-10074: Add a 422 Unprocessable Entity response to the
OpenAPI responses for the CA init and rotate endpoints to document validation
failures: update the POST /api/ca and POST /api/ca/rotate response objects in
shared/management/http/api/openapi.yml to include a '422' entry referencing the
shared validation response (e.g. "$ref":
"#/components/responses/validation_error" or equivalent component name used in
the spec), and mirror the same 422 addition for the other identical response
blocks noted around the other range. Ensure the reference matches the existing
component key for validation errors so clients see the schema for validation
failures.

---

Nitpick comments:
In `@client/internal/cert/manager_test.go`:
- Around line 84-98: TestStoreCA currently writes placeholder PEMs ("CA1"/"CA2")
which aren’t real certs; replace those with valid PEMs generated by
generateTestCert and use them when calling mgr.StoreCA so the test remains valid
if StoreCA adds cert validation. Specifically, in TestStoreCA use
generateTestCert (or a helper that returns a CA PEM []byte) to produce two CA
PEM blobs, pass them to mgr.StoreCA, then read mgr.CAPath() and assert the
expected subject/contents; keep the NewManager and CAPath usage and retain error
checks.

In `@client/internal/cert/trust_darwin.go`:
- Around line 24-28: The macOS InstallCA code calls exec.Command("security",
"add-trusted-cert", ...) without timeout protection; wrap that call using the
existing runWithTimeout helper with a 30-second timeout (same pattern used in
Linux) so the security command uses CombinedOutput via runWithTimeout(ctx,
30*time.Second, "security", "add-trusted-cert", ...) and return errors including
command output as before; apply the same change to the corresponding UninstallCA
call that runs the "security" command so both InstallCA and UninstallCA use
runWithTimeout for consistency and to avoid hangs.

In `@client/internal/cert/trust_linux.go`:
- Around line 110-122: detectDistro currently ignores cases where the cert
directory (debianCertDir or rhelCertDir) exists but resolveCmd(debianUpdateCmd /
rhelUpdateCmd) fails, which can hide misconfiguration; update detectDistro to
detect and return a specific error (or log) when a cert dir is present but
resolveCmd fails so callers see "certificate directory present but update
command missing" instead of falling through—check the debian branch and rhel
branch separately (using debianCertDir, debianUpdateCmd, rhelCertDir,
rhelUpdateCmd and resolveCmd) and return an explicit error mentioning the
missing update command for that distro instead of continuing silently.

In `@client/internal/cert/trust_windows.go`:
- Around line 57-72: The writeTempPEM and sha1Fingerprint implementations in
trust_windows.go duplicate logic from Darwin; extract both functions into a new
shared file (e.g., trust_common.go) without platform build tags (or with
appropriate tags that include all OSes that need them) and remove the duplicates
from trust_windows.go and the Darwin file; ensure the new file exports or keeps
the same unexported function names (writeTempPEM, sha1Fingerprint) so existing
callers (e.g., in trust_windows.go and the Darwin counterpart) continue to
compile and behavior remains identical.
- Around line 24-28: The certutil invocation in trust_windows.go uses
exec.Command without a timeout; wrap the call in a context with a timeout (e.g.,
30s) and use exec.CommandContext so the process is killed if it hangs. Locate
the exec.Command("certutil", "-addstore", "-f", "Root", tmpFile) call and
replace it to create a context with timeout, defer cancel, run the command with
exec.CommandContext, and preserve the existing CombinedOutput error handling and
tmpFile usage.

In `@management/internals/server/modules.go`:
- Around line 212-216: CAManager() currently calls
s.AccountManager().GetStore(), which forces AccountManager initialization;
change it to use s.Store() when constructing the CA manager to decouple wiring.
In BaseServer.CAManager(), replace the Create callback that calls
ca.NewManager(s.AccountManager().GetStore()) with ca.NewManager(s.Store()), keep
the subsequent mgr.RegisterSigner(ca.NewACMEPersistSigner()) and return flow
unchanged so CA initialization no longer depends on AccountManager.

In `@management/internals/shared/grpc/conversion.go`:
- Around line 118-123: In ToSyncResponse, toPeerConfig(...) is invoked twice
with identical args—create a single local variable (e.g., peerCfg :=
toPeerConfig(peer, networkMap.Network, dnsName, settings, httpConfig,
deviceFlowConfig, networkMap.EnableSSH, caCertsPEM)) and reuse it for both
PeerConfig fields (the top-level PeerConfig and NetworkMap.PeerConfig) instead
of calling toPeerConfig again; remove the redundant second call/assignment so
the function builds the PeerConfig once and reuses the result.

In `@management/server/ca/internal_ca_test.go`:
- Around line 85-87: The current test uses a brittle length-based assertion for
cert.Subject.CommonName; replace the assert.Len check for the CN suffix with a
regex match to allow randomized suffixes—locate the lines using
cert.Subject.CommonName and the assert.Contains/assert.Len calls and change the
length assertion to assert that cert.Subject.CommonName matches a regexp like
`^mynetwork\.selfhosted Internal CA \([A-Za-z0-9-]+\)$` (use regexp.MustCompile
or similar and assert.Regexp/assert.True with re.MatchString) while keeping the
existing assert.Contains and Organization equality checks.

In `@management/server/ca/types.go`:
- Around line 108-123: NewIssuedCertificate currently assigns the DNSNames slice
by reference (DNSNames: p.DNSNames) which allows caller-side mutations to leak
into the IssuedCertificate model; update NewIssuedCertificate to clone
p.DNSNames into a new slice before assigning (e.g., allocate a new slice with
the same length/capacity and copy elements) so IssuedCertificate.DNSNames has
its own backing array and cannot be mutated via the original slice.

In `@shared/management/client/client.go`:
- Around line 31-32: The root Client interface was extended with SignCertificate
and GetCACertificates which forces all implementations to change; extract those
methods into a new smaller interface (e.g., CertificateClient or CertManager)
and remove them from the root Client interface, update callers that need cert
capabilities to depend on the new CertificateClient interface (or accept an
optional interface assertion) and update any internal types that currently
implement SignCertificate/GetCACertificates to implement the new interface;
reference the existing Client interface and the two methods SignCertificate and
GetCACertificates to locate the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4e9674af-7560-4b75-b79a-493211b8b7cb

📥 Commits

Reviewing files that changed from the base of the PR and between b8ce917 and 4f9be46.

⛔ Files ignored due to path filters (5)
  • client/proto/daemon.pb.go is excluded by !**/*.pb.go
  • client/proto/daemon_grpc.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/management.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/management_grpc.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/proxy_service.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (51)
  • client/cmd/cert.go
  • client/cmd/root.go
  • client/internal/cert/manager.go
  • client/internal/cert/manager_test.go
  • client/internal/cert/trust_darwin.go
  • client/internal/cert/trust_linux.go
  • client/internal/cert/trust_windows.go
  • client/internal/connect.go
  • client/internal/engine.go
  • client/proto/daemon.proto
  • client/server/cert.go
  • client/server/server.go
  • management/internals/controllers/network_map/controller/controller.go
  • management/internals/controllers/network_map/controller/repository.go
  • management/internals/server/boot.go
  • management/internals/server/modules.go
  • management/internals/shared/grpc/cert_service.go
  • management/internals/shared/grpc/conversion.go
  • management/internals/shared/grpc/server.go
  • management/server/account.go
  • management/server/account_test.go
  • management/server/activity/codes.go
  • management/server/ca/acme_stub.go
  • management/server/ca/internal_ca.go
  • management/server/ca/internal_ca_test.go
  • management/server/ca/manager.go
  • management/server/ca/manager_test.go
  • management/server/ca/signer.go
  • management/server/ca/types.go
  • management/server/http/handler.go
  • management/server/http/handlers/accounts/accounts_handler.go
  • management/server/http/handlers/accounts/accounts_handler_test.go
  • management/server/http/handlers/ca/ca_handler.go
  • management/server/http/handlers/ca/ca_handler_test.go
  • management/server/http/testing/testing_tools/channel/channel.go
  • management/server/peer_test.go
  • management/server/permissions/modules/module.go
  • management/server/store/sql_store.go
  • management/server/store/store.go
  • management/server/types/account.go
  • management/server/types/settings.go
  • management/server/types/testdata/networkmap_golden_new_with_onpeeradded_router.json
  • shared/management/client/cert.go
  • shared/management/client/client.go
  • shared/management/client/mock.go
  • shared/management/client/rest/ca.go
  • shared/management/client/rest/ca_test.go
  • shared/management/client/rest/client.go
  • shared/management/http/api/openapi.yml
  • shared/management/http/api/types.gen.go
  • shared/management/proto/management.proto
🚧 Files skipped from review as they are similar to previous changes (6)
  • client/internal/connect.go
  • management/server/account.go
  • management/server/store/store.go
  • client/cmd/cert.go
  • client/proto/daemon.proto
  • client/internal/cert/manager.go

Comment thread client/server/cert.go
Comment on lines +219 to +241
rest := caPEMData
var removed int
for {
var block *pem.Block
block, rest = pem.Decode(rest)
if block == nil {
break
}

singlePEM := pem.EncodeToMemory(block)
if err := cert.UninstallCA(singlePEM); err != nil {
log.Warnf("failed to remove CA from trust store: %v", err)
continue
}
removed++
}

log.Infof("removed %d CA certificate(s) from OS trust store", removed)

return &proto.UntrustCAResponse{
Success: true,
}, nil
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don’t return success when no CA was actually removed.

UntrustCA currently reports success even if all uninstall operations fail or no valid cert blocks are found.

🔧 Proposed fix
 	rest := caPEMData
 	var removed int
+	var failed int
 	for {
 		var block *pem.Block
 		block, rest = pem.Decode(rest)
 		if block == nil {
 			break
 		}
@@
 		singlePEM := pem.EncodeToMemory(block)
 		if err := cert.UninstallCA(singlePEM); err != nil {
 			log.Warnf("failed to remove CA from trust store: %v", err)
+			failed++
 			continue
 		}
 		removed++
 	}
 
+	if removed == 0 {
+		return nil, gstatus.Errorf(codes.FailedPrecondition, "no CA certificates were removed")
+	}
+
 	log.Infof("removed %d CA certificate(s) from OS trust store", removed)
 
 	return &proto.UntrustCAResponse{
-		Success: true,
+		Success: failed == 0,
 	}, nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/server/cert.go` around lines 219 - 241, The UntrustCA handler
currently always returns Success: true even when no certificate blocks are
decoded or cert.UninstallCA fails; update the logic in the UntrustCA function to
set the response Success field based on whether any CA was actually removed
(i.e., removed > 0), and return a non-success response or error when removed ==
0 (and log a warning via log.Warnf) so callers can detect failure; adjust the
final return that constructs proto.UntrustCAResponse to reflect the removed
count and success boolean instead of unconditionally true.

Comment on lines +223 to +229
func certSigningTypeToString(t proto.CertSigningType) string {
switch t {
case proto.CertSigningType_CERT_SIGNING_ACME:
return ca.SigningTypeACME
default:
return ca.SigningTypeInternal
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Reject unknown signing types instead of silently defaulting to internal.

default -> internal can produce unintended issuance behavior when a client sends an unrecognized enum value.

🔧 Proposed fix
-func certSigningTypeToString(t proto.CertSigningType) string {
+func certSigningTypeToString(t proto.CertSigningType) (string, error) {
 	switch t {
+	case proto.CertSigningType_CERT_SIGNING_INTERNAL:
+		return ca.SigningTypeInternal, nil
 	case proto.CertSigningType_CERT_SIGNING_ACME:
-		return ca.SigningTypeACME
+		return ca.SigningTypeACME, nil
 	default:
-		return ca.SigningTypeInternal
+		return "", status.Errorf(codes.InvalidArgument, "unsupported signing_type: %v", t)
 	}
 }
-	signingType := certSigningTypeToString(signReq.SigningType)
+	signingType, err := certSigningTypeToString(signReq.SigningType)
+	if err != nil {
+		return nil, err
+	}

Comment on lines +287 to +307
func (s *InternalCASigner) validateCSRNames(csr *x509.CertificateRequest, peerFQDN string, wildcard bool) error {
if len(csr.DNSNames) == 0 {
return fmt.Errorf("CSR must contain at least one DNS name")
}

expectedNames := map[string]bool{strings.ToLower(peerFQDN): true}
if wildcard {
expectedNames["*."+strings.ToLower(peerFQDN)] = true
}

for _, name := range csr.DNSNames {
if !expectedNames[strings.ToLower(name)] {
return fmt.Errorf("CSR contains unexpected DNS name %q, expected %v", name, peerFQDN)
}
}

if !containsName(csr.DNSNames, peerFQDN) {
return fmt.Errorf("CSR must contain the peer FQDN %q", peerFQDN)
}

return nil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Enforce strict SAN rejection in CSR validation (IP/Email/URI + duplicate DNS SANs).

validateCSRNames currently allows CSRs that include IP, Email, URI SANs, and duplicate DNS entries. Even if SANs are overwritten at issuance, this weakens strict validation guarantees for the signing API.

🛡️ Suggested hardening
 func (s *InternalCASigner) validateCSRNames(csr *x509.CertificateRequest, peerFQDN string, wildcard bool) error {
+	if len(csr.IPAddresses) > 0 || len(csr.EmailAddresses) > 0 || len(csr.URIs) > 0 {
+		return fmt.Errorf("CSR must not contain IP, Email, or URI SANs")
+	}
 	if len(csr.DNSNames) == 0 {
 		return fmt.Errorf("CSR must contain at least one DNS name")
 	}
 
 	expectedNames := map[string]bool{strings.ToLower(peerFQDN): true}
 	if wildcard {
 		expectedNames["*."+strings.ToLower(peerFQDN)] = true
 	}
 
+	seen := make(map[string]struct{}, len(csr.DNSNames))
 	for _, name := range csr.DNSNames {
-		if !expectedNames[strings.ToLower(name)] {
+		normalized := strings.ToLower(name)
+		if _, exists := seen[normalized]; exists {
+			return fmt.Errorf("CSR contains duplicate DNS SAN %q", name)
+		}
+		seen[normalized] = struct{}{}
+		if !expectedNames[normalized] {
 			return fmt.Errorf("CSR contains unexpected DNS name %q, expected %v", name, peerFQDN)
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/ca/internal_ca.go` around lines 287 - 307, validateCSRNames
currently only checks DNSNames but must reject CSRs containing IPAddresses,
EmailAddresses, or URIs and must reject duplicate DNS SANs; update
validateCSRNames to return an error if csr.IPAddresses, csr.EmailAddresses, or
csr.URIs are non-empty, ensure DNS name comparisons use case-insensitive
matching against peerFQDN and the optional "*."+peerFQDN entry, and detect
duplicates in csr.DNSNames (e.g., via a map[string]bool) to reject CSRs with
repeated DNS SANs; preserve the final check that peerFQDN is present (use the
containsName helper or inline case-insensitive check) and produce clear error
messages referencing the offending SAN type or duplicate name.

Comment on lines +299 to +301
if req.ValidityDays != nil {
opts.Validity = time.Duration(*req.ValidityDays) * 24 * time.Hour
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate ValidityDays before converting to duration.

Line 299 accepts zero/negative values, which can generate invalid CA lifetimes.

🔧 Proposed fix
 if req.ValidityDays != nil {
+	if *req.ValidityDays <= 0 {
+		return nbca.CAOptions{}, status.Errorf(status.InvalidArgument, "validity_days must be greater than 0")
+	}
 	opts.Validity = time.Duration(*req.ValidityDays) * 24 * time.Hour
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if req.ValidityDays != nil {
opts.Validity = time.Duration(*req.ValidityDays) * 24 * time.Hour
}
if req.ValidityDays != nil {
if *req.ValidityDays <= 0 {
return nbca.CAOptions{}, status.Errorf(status.InvalidArgument, "validity_days must be greater than 0")
}
opts.Validity = time.Duration(*req.ValidityDays) * 24 * time.Hour
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/http/handlers/ca/ca_handler.go` around lines 299 - 301, The
code currently sets opts.Validity = time.Duration(*req.ValidityDays) * 24 *
time.Hour without validating req.ValidityDays; update the CA handler to first
check that req.ValidityDays is non-nil and a positive, non-zero value (and
optionally enforce an upper bound if desired), and return a clear HTTP error
(bad request) when the value is zero or negative instead of converting it;
locate the logic around req.ValidityDays and opts.Validity in ca_handler.go and
add the validation/early-return before assigning opts.Validity.

Comment on lines +45 to +46
resp, err := a.c.NewRequest(ctx, "GET", "/api/ca/"+caID, nil, nil)
if err != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Escape path parameters before concatenating API paths.

caID is inserted raw into the URI. Use url.PathEscape to prevent malformed requests when IDs contain reserved characters.

🔒 Suggested fix
 import (
 	"context"
+	"net/url"
@@
 func (a *CertificateAuthorityAPI) GetCA(ctx context.Context, caID string) (*api.CACertificateResponse, error) {
-	resp, err := a.c.NewRequest(ctx, "GET", "/api/ca/"+caID, nil, nil)
+	resp, err := a.c.NewRequest(ctx, "GET", "/api/ca/"+url.PathEscape(caID), nil, nil)
@@
 func (a *CertificateAuthorityAPI) DeactivateCA(ctx context.Context, caID string) error {
-	resp, err := a.c.NewRequest(ctx, "DELETE", "/api/ca/"+caID, nil, nil)
+	resp, err := a.c.NewRequest(ctx, "DELETE", "/api/ca/"+url.PathEscape(caID), nil, nil)

Also applies to: 61-61

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shared/management/client/rest/ca.go` around lines 45 - 46, The code
concatenates caID directly into the request path (e.g., in the NewRequest call),
which can break URLs for IDs with reserved chars; wrap caID with url.PathEscape
before building the path (and import net/url if missing), and apply the same
change to the other occurrence that builds "/api/ca/"+caID (the second call
around the other NewRequest usage) so both use url.PathEscape(caID) when
constructing the URI.

Comment on lines +10139 to +10140
description: Rotate the certificate authority, deactivating the current one and creating a new one
tags: [ Certificate Authority ]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Rotate CA description appears behaviorally inaccurate.

Line [10139] says rotation deactivates the current CA, but this PR’s objective describes a graceful overlap period with both CAs present. This description should be corrected to avoid operator/client misunderstanding.

Suggested doc fix
-      description: Rotate the certificate authority, deactivating the current one and creating a new one
+      description: Rotate the certificate authority by creating a new CA alongside the current one for graceful transition
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
description: Rotate the certificate authority, deactivating the current one and creating a new one
tags: [ Certificate Authority ]
description: Rotate the certificate authority by creating a new CA alongside the current one for graceful transition
tags: [ Certificate Authority ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shared/management/http/api/openapi.yml` around lines 10139 - 10140, Update
the OpenAPI operation description string "description: Rotate the certificate
authority, deactivating the current one and creating a new one" (the operation
under tag "Certificate Authority") to accurately reflect the actual behavior:
state that rotation creates a new CA while retaining the current CA for a
configurable/graceful overlap period and only deactivates the old CA after the
overlap, rather than saying it deactivates immediately; keep the description
concise and explicit about the overlap behavior.

Comment on lines +1489 to +1490
// ValidityDays CA certificate validity in days. Defaults to 3650 (~10 years)
ValidityDays *int `json:"validity_days,omitempty"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "[1/2] Check schema definitions for CAInitRequest.validity_days constraints"
fd -HI -e yaml -e yml -e json | while read -r f; do
  if rg -q 'CAInitRequest|validity_days' "$f"; then
    echo "---- $f ----"
    rg -n -C2 'CAInitRequest|validity_days|minimum|maximum|exclusiveMinimum|exclusiveMaximum' "$f" || true
  fi
done

echo
echo "[2/2] Check server-side validation paths"
rg -n -C3 'ValidityDays|validity_days|CAInitRequest|PostApiCa|PostApiCaRotate|InitCA|RotateCA'

Repository: netbirdio/netbird

Length of output: 19125


Add validation in handler for validity_days constraints.

The OpenAPI schema defines constraints for validity_days (minimum: 1, maximum: 36500 at lines 4258-4259 in openapi.yml), but the handler code in management/server/http/handlers/ca/ca_handler.go (lines 299–301) accepts the value without validating it against these bounds. A client can send invalid values that violate the API contract, and they will be accepted and processed.

Add validation in parseCAOptions() to enforce the documented constraints before converting to duration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shared/management/http/api/types.gen.go` around lines 1489 - 1490, Ensure
parseCAOptions() validates the ValidityDays field from the request against the
OpenAPI bounds (minimum 1, maximum 36500) before converting it to a duration:
check if ValidityDays is nil (use default if intended), and if non-nil verify
*ValidityDays >= 1 && *ValidityDays <= 36500, returning a validation error (with
clear message) when out of range; do this validation in parseCAOptions() (where
the conversion to time.Duration currently happens) so invalid values are
rejected per the API contract.

@zgv163 zgv163 marked this pull request as draft March 5, 2026 22:12
@zgv163
Copy link
Copy Markdown
Author

zgv163 commented Mar 6, 2026

Closing to resubmit with fixes and cleaner commits. The original submission had several critical bugs (CSR SAN validation, panics) and SonarCloud failures that need addressing. Will reopen with a clean PR once all issues are resolved and tests pass locally.

@zgv163
Copy link
Copy Markdown
Author

zgv163 commented Mar 6, 2026

Closing this in favor of a Let's Encrypt DNS-PERSIST-01 approach instead. The Internal CA route doesn't provide a smooth experience on iOS and Android — see updated comment on #5479 for details.

The code here is preserved for reference and could be useful for air-gapped/enterprise environments in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Proposal] Internal HTTPS Certificates for Peer Resources

2 participants