Skip to content

[self-hosted] add netbird server#5232

Merged
mlsmaycon merged 87 commits intomainfrom
combined
Feb 12, 2026
Merged

[self-hosted] add netbird server#5232
mlsmaycon merged 87 commits intomainfrom
combined

Conversation

@braginini
Copy link
Copy Markdown
Collaborator

@braginini braginini commented Feb 1, 2026

Describe your changes

Add a combined version of NetBird's control plane.

docs are coming.

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • New Features
    • Unified NetBird combined server (Management, Signal, Relay, STUN) as a single executable with richer YAML configuration, validation, and defaults.
    • Official Dockerfile/image for single-container deployment.
    • Optional in-process profiling endpoint for diagnostics.
    • Multiplexing to route HTTP/gRPC/WebSocket traffic via one port; runtime hooks to inject custom handlers.
  • Chores
    • Updated deployment scripts, compose files, and reverse-proxy templates to target the combined server; added example configs and getting-started updates.

shyam0904a and others added 30 commits December 13, 2025 20:54
- dialer_generic.go: Change FailedPrecondition to Internal, add docs for Linux root check
- dialer.go: Preserve both native and WebSocket errors for diagnostics
- dialer.go: Document timeout rationale (10s native, 30s WebSocket)
- fallback.go: Remove codes.Internal, add comments explaining each error code
- flow/client.go: Document why simpler fallback strategy is acceptable
- dialer_generic.go: Use fmt.Errorf for consistent error wrapping, remove unused imports
- fallback.go: Simplify by removing manual unwrap loop (status.FromError handles it)
  - --enable-stun requires non-empty --stun-ports
  - Ports must be in range 1-65535
  - No duplicate ports allowed
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@combined/cmd/root.go`:
- Around line 526-529: The code trusts X-Real-Ip / X-Real-Port unconditionally
when building connRemoteAddr (variables/functions: connRemoteAddr, r.RemoteAddr,
r.Header.Get("X-Real-Ip")/X-Real-Port); change it to only honor those headers
when the immediate peer (r.RemoteAddr) is a configured/trusted reverse proxy
(e.g., check membership against TrustedHTTPProxies or the existing
trust-checking helper used elsewhere), otherwise fall back to r.RemoteAddr;
preserve use of net.JoinHostPort to compose the address when headers are
accepted and silently ignore/validate malformed header values.
- Around line 305-311: Replace the fatal exit inside the healthcheck goroutine
so failures don't bypass shutdown: instead of calling log.Fatalf in the
goroutine that calls httpHealthcheck.ListenAndServe(), capture the error and
send it to a dedicated error channel (or call a provided cancel function) so the
main routine can initiate graceful shutdown; update the goroutine around
httpHealthcheck.ListenAndServe() to check errors.Is(err, http.ErrServerClosed)
and on any other error send that err over the channel (or log.Errorf+call
cancel) rather than invoking log.Fatalf so wg, management, relay, and STUN
shutdown logic can run.
🧹 Nitpick comments (2)
combined/cmd/config.go (1)

242-266: Edge case: exposedAddress without an explicit port.

If a user sets exposedAddress: "https://example.com" (no port), net.SplitHostPort will fail and hostPort will be "example.com". Auto-generated STUN URIs become stun:example.com:3478 (correct), but relay/signal URIs become rels://example.com and https://example.com — which technically work using default ports but may confuse users or downstream parsers expecting explicit ports.

Consider documenting that exposedAddress should include the port, or defaulting to :443/:80 based on protocol when no port is present.

combined/cmd/root.go (1)

222-226: Port extraction from ListenAddress is duplicated.

The same net.SplitHostPort + strconv.Atoi pattern appears in both createManagementServer (method, Line 222) and createManagementServer (function, Line 444). Consider extracting a helper like listenPort(addr string) int to avoid drift.

Also applies to: 444-449

Comment thread combined/cmd/root.go
Comment on lines +305 to +311
wg.Add(1)
go func() {
defer wg.Done()
if err := httpHealthcheck.ListenAndServe(); !errors.Is(err, http.ErrServerClosed) {
log.Fatalf("failed to start healthcheck server: %v", err)
}
}()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

log.Fatalf in healthcheck goroutine will os.Exit(1) bypassing graceful shutdown.

If the healthcheck listener fails (e.g., port conflict), log.Fatalf kills the process immediately without running the shutdown sequence for management, relay, or STUN servers. Consider sending the error through a channel or using log.Errorf + cancellation instead.

🤖 Prompt for AI Agents
In `@combined/cmd/root.go` around lines 305 - 311, Replace the fatal exit inside
the healthcheck goroutine so failures don't bypass shutdown: instead of calling
log.Fatalf in the goroutine that calls httpHealthcheck.ListenAndServe(), capture
the error and send it to a dedicated error channel (or call a provided cancel
function) so the main routine can initiate graceful shutdown; update the
goroutine around httpHealthcheck.ListenAndServe() to check errors.Is(err,
http.ErrServerClosed) and on any other error send that err over the channel (or
log.Errorf+call cancel) rather than invoking log.Fatalf so wg, management,
relay, and STUN shutdown logic can run.

Comment thread combined/cmd/root.go
Comment on lines +526 to +529
connRemoteAddr := r.RemoteAddr
if r.Header.Get("X-Real-Ip") != "" && r.Header.Get("X-Real-Port") != "" {
connRemoteAddr = net.JoinHostPort(r.Header.Get("X-Real-Ip"), r.Header.Get("X-Real-Port"))
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

X-Real-Ip / X-Real-Port headers are trusted without validation against TrustedHTTPProxies.

These headers are used to construct the remote address for the relay connection. If the server is directly exposed (no reverse proxy), a client could spoof them. The impact is low since the relay authenticates via HMAC and these are primarily used for connection tracking/logging, but it's inconsistent with the management server's reverse proxy trust model.

🤖 Prompt for AI Agents
In `@combined/cmd/root.go` around lines 526 - 529, The code trusts X-Real-Ip /
X-Real-Port unconditionally when building connRemoteAddr (variables/functions:
connRemoteAddr, r.RemoteAddr, r.Header.Get("X-Real-Ip")/X-Real-Port); change it
to only honor those headers when the immediate peer (r.RemoteAddr) is a
configured/trusted reverse proxy (e.g., check membership against
TrustedHTTPProxies or the existing trust-checking helper used elsewhere),
otherwise fall back to r.RemoteAddr; preserve use of net.JoinHostPort to compose
the address when headers are accepted and silently ignore/validate malformed
header values.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@infrastructure_files/getting-started.sh`:
- Line 198: The docker-compose log commands use the wrong service name "netbird"
and should use the actual service name "netbird-server"; locate the calls to
$DOCKER_COMPOSE_COMMAND logs --tail=20 netbird (look for the
$DOCKER_COMPOSE_COMMAND invocation and the logs --tail=20 pattern) and change
the service argument to netbird-server in both occurrences (the one around the
health-check timeout and the other later at the second logs call) so
docker-compose prints the server logs instead of failing with "no such service".
🧹 Nitpick comments (1)
infrastructure_files/getting-started.sh (1)

800-802: Consider adding keepalive to the netbird_server upstream.

The netbird_dashboard upstream (line 797-798) has keepalive 10 for connection reuse, but netbird_server does not. For the API and OAuth2 routes (short-lived HTTP requests), a keepalive pool would reduce connection churn to the backend.

Proposed fix
 upstream netbird_server {
     server ${server_addr};
+    keepalive 10;
 }

Comment thread infrastructure_files/getting-started.sh Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@combined/cmd/root.go`:
- Around line 631-650: The logEnvVars function currently masks values only when
the env var key contains "secret", "key", or "password" so NB_ vars containing
"token" are logged in cleartext; update the conditional in logEnvVars to also
check for "token" (case-insensitive) so keys like "NB_API_TOKEN" are masked by
maskSecret, ensuring you reference the logEnvVars function and the maskSecret
helper when making the change.

In `@signal/cmd/run.go`:
- Line 44: signalLetsencryptEmail is declared and exposed as the
--letsencrypt-email flag but never used; either thread it into the cert manager
creation or remove it. To fix, locate where encryption.CreateCertManager is
called (and any related cert manager setup in main/run), and add
signalLetsencryptEmail as an argument (and propagate it through to
CreateCertManager and its callers) so the LetsEncrypt email is passed through,
or if you prefer to drop the option, remove the signalLetsencryptEmail variable
and its flag binding for --letsencrypt-email to eliminate the dead flag.
🧹 Nitpick comments (2)
combined/cmd/root.go (1)

223-227: Silently ignoring strconv.Atoi error on port string.

If portStr is non-numeric (unlikely after SplitHostPort, but possible with malformed config), mgmtPort silently becomes 0. The same pattern repeats at line 450. Consider returning an error or at least logging a warning.

Suggested fix
-	mgmtPort, _ := strconv.Atoi(portStr)
+	mgmtPort, err := strconv.Atoi(portStr)
+	if err != nil {
+		return fmt.Errorf("invalid port in listen address %q: %w", cfg.Server.ListenAddress, err)
+	}

Apply the same pattern at line 450.

signal/cmd/run.go (1)

329-331: Both --letsencrypt-data-dir and deprecated --ssl-dir bind to the same variable.

This is a clean deprecation approach. However, both flags have an empty string default. If a user enables Let's Encrypt (--letsencrypt-domain) but forgets to specify a data directory, encryption.CreateCertManager("") will attempt to use filepath.Join("", "letsencrypt") as the cert cache directory. Consider validating that the data dir is non-empty when Let's Encrypt is enabled.

#!/bin/bash
# Check what happens when CreateCertManager receives an empty datadir
rg -n 'func CreateCertManager' --type=go -A 15

Comment thread combined/cmd/root.go
Comment on lines +631 to +650
// logEnvVars logs all NB_ environment variables that are currently set
func logEnvVars() {
log.Info("=== Environment Variables ===")
found := false
for _, env := range os.Environ() {
if strings.HasPrefix(env, "NB_") {
key, _, _ := strings.Cut(env, "=")
value := os.Getenv(key)
if strings.Contains(strings.ToLower(key), "secret") || strings.Contains(strings.ToLower(key), "key") || strings.Contains(strings.ToLower(key), "password") {
value = maskSecret(value)
}
log.Infof(" %s=%s", key, value)
found = true
}
}
if !found {
log.Info(" (none set)")
}
log.Info("=== End Environment Variables ===")
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Environment variable masking misses common sensitive key patterns like "token".

The mask check on line 639 covers "secret", "key", and "password", but NB_ prefixed env vars containing "token" (e.g., API tokens) would be logged in cleartext.

Suggested fix
-			if strings.Contains(strings.ToLower(key), "secret") || strings.Contains(strings.ToLower(key), "key") || strings.Contains(strings.ToLower(key), "password") {
+			if strings.Contains(strings.ToLower(key), "secret") || strings.Contains(strings.ToLower(key), "key") || strings.Contains(strings.ToLower(key), "password") || strings.Contains(strings.ToLower(key), "token") {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// logEnvVars logs all NB_ environment variables that are currently set
func logEnvVars() {
log.Info("=== Environment Variables ===")
found := false
for _, env := range os.Environ() {
if strings.HasPrefix(env, "NB_") {
key, _, _ := strings.Cut(env, "=")
value := os.Getenv(key)
if strings.Contains(strings.ToLower(key), "secret") || strings.Contains(strings.ToLower(key), "key") || strings.Contains(strings.ToLower(key), "password") {
value = maskSecret(value)
}
log.Infof(" %s=%s", key, value)
found = true
}
}
if !found {
log.Info(" (none set)")
}
log.Info("=== End Environment Variables ===")
}
// logEnvVars logs all NB_ environment variables that are currently set
func logEnvVars() {
log.Info("=== Environment Variables ===")
found := false
for _, env := range os.Environ() {
if strings.HasPrefix(env, "NB_") {
key, _, _ := strings.Cut(env, "=")
value := os.Getenv(key)
if strings.Contains(strings.ToLower(key), "secret") || strings.Contains(strings.ToLower(key), "key") || strings.Contains(strings.ToLower(key), "password") || strings.Contains(strings.ToLower(key), "token") {
value = maskSecret(value)
}
log.Infof(" %s=%s", key, value)
found = true
}
}
if !found {
log.Info(" (none set)")
}
log.Info("=== End Environment Variables ===")
}
🤖 Prompt for AI Agents
In `@combined/cmd/root.go` around lines 631 - 650, The logEnvVars function
currently masks values only when the env var key contains "secret", "key", or
"password" so NB_ vars containing "token" are logged in cleartext; update the
conditional in logEnvVars to also check for "token" (case-insensitive) so keys
like "NB_API_TOKEN" are masked by maskSecret, ensuring you reference the
logEnvVars function and the maskSecret helper when making the change.

Comment thread signal/cmd/run.go
signalPort int
metricsPort int
signalLetsencryptDomain string
signalLetsencryptEmail string
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

signalLetsencryptEmail is declared and has a CLI flag but is never used.

The variable is bound to --letsencrypt-email (line 332) but is never passed to encryption.CreateCertManager or any other function. Either wire it into the cert manager flow or remove the dead flag to avoid confusing users who set it expecting it to take effect.

#!/bin/bash
# Verify signalLetsencryptEmail is truly unused
rg -n 'signalLetsencryptEmail' --type=go

Also applies to: 218-219, 332-332

🤖 Prompt for AI Agents
In `@signal/cmd/run.go` at line 44, signalLetsencryptEmail is declared and exposed
as the --letsencrypt-email flag but never used; either thread it into the cert
manager creation or remove it. To fix, locate where encryption.CreateCertManager
is called (and any related cert manager setup in main/run), and add
signalLetsencryptEmail as an argument (and propagate it through to
CreateCertManager and its callers) so the LetsEncrypt email is passed through,
or if you prefer to drop the option, remove the signalLetsencryptEmail variable
and its flag binding for --letsencrypt-email to eliminate the dead flag.

@mlsmaycon mlsmaycon changed the title Combined [self-hosted] Combined Feb 12, 2026
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Feb 12, 2026

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
3.0% Duplication on New Code

See analysis details on SonarQube Cloud

@mlsmaycon mlsmaycon changed the title [self-hosted] Combined [self-hosted] add netbird server Feb 12, 2026
@mlsmaycon mlsmaycon merged commit 64b849c into main Feb 12, 2026
93 of 98 checks passed
@mlsmaycon mlsmaycon deleted the combined branch February 12, 2026 18:24
@coderabbitai coderabbitai Bot mentioned this pull request Apr 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants