[client] Trust wg interface in firewalld to bypass owner-flagged chains#5928
[client] Trust wg interface in firewalld to bypass owner-flagged chains#5928
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new firewalld package (DBus + firewall-cmd fallback) and integrates it across iptables, nftables, uspfilter, and router code to add/remove the WireGuard interface from firewalld's "trusted" zone and to exclude firewalld-owned nftables tables from external-chain logic. Changes
Sequence Diagram(s)sequenceDiagram
participant App as Application
participant Mgr as Firewall Manager\n(iptables / nftables / uspfilter)
participant FWD as firewalld Package
participant DBus as DBus\n(org.fedoraproject.FirewallD1)
participant CMD as firewall-cmd\n(CLI)
App->>Mgr: Init() / AllowNetbird()
Mgr->>FWD: TrustInterface(iface)
FWD->>FWD: create 3s timeout context
FWD->>DBus: probe getDefaultZone()
alt DBus available
DBus-->>FWD: zone info
FWD->>DBus: addInterface(iface)
alt Success or idempotent error
DBus-->>FWD: OK / handled
else Assigned to other zone
FWD->>DBus: changeZoneOfInterface(iface, trusted)
DBus-->>FWD: OK
end
else DBus unavailable
FWD->>CMD: firewall-cmd --zone=trusted --change-interface=<iface>
CMD-->>FWD: stdout/exit
FWD->>FWD: parse output for known non-errors
end
FWD-->>Mgr: success or warning (logged)
App->>Mgr: Close() / Reset()
Mgr->>FWD: UntrustInterface(iface)
alt DBus available
FWD->>DBus: removeInterface(iface)
DBus-->>FWD: OK or idempotent
else DBus unavailable
FWD->>CMD: firewall-cmd --zone=trusted --remove-interface=<iface>
CMD-->>FWD: stdout/exit
end
FWD-->>Mgr: success or aggregated error
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
client/firewall/iptables/manager_linux.go (1)
183-212:Closeappending untrust error tomerrintentionally blocksDeleteState.This is a subtle but correct behavior: if firewalld untrust fails, the
ShutdownStateis preserved so the crash-recovery path can retry cleanup on next startup. Worth a short code comment here since the pattern differs fromInit/AllowNetbirdwhich only warn.💡 Optional clarifying comment
if err := m.untrustInFirewalld(); err != nil { + // Append to merr so ShutdownState is preserved for the next startup's + // crash-recovery path to retry firewalld cleanup. merr = multierror.Append(merr, err) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/firewall/iptables/manager_linux.go` around lines 183 - 212, The Close method intentionally appends the error from m.untrustInFirewalld() to merr so that DeleteState on stateManager is skipped when untrust fails (preserving ShutdownState for crash-recovery); add a short clarifying comment in Manager.Close near the m.untrustInFirewalld() append (and/or above the "attempt to delete state only if all other operations succeeded" block) referencing m.untrustInFirewalld, the merr variable, and stateManager.DeleteState to explain this deliberate difference from Init/AllowNetbird which only warn.client/firewall/firewalld/firewalld_linux.go (1)
30-67: Consider consolidating the per-caller boilerplate into the package.Every caller (
iptables.Manager,nftables.Manager,nftables.router) now wrapsIsRunning+Add/RemoveTrustedInterfacewith the same 3s-timeout context and identical logging pattern — three copies oftrustInFirewalldand two ofuntrustInFirewalld. Exposing higher-level helpers from this package would remove the duplication and keep thefirewalldCallTimeout/IsRunningpolicy in one place.♻️ Suggested API
// in client/firewall/firewalld/firewalld.go (or a new helpers file) const CallTimeout = 3 * time.Second // Trust adds iface to the trusted zone if firewalld is running. No-op otherwise. // Uses an internal bounded context so callers don't have to manage timeouts. func Trust(iface string) error { ctx, cancel := context.WithTimeout(context.Background(), CallTimeout) defer cancel() if !IsRunning(ctx) { return nil } if err := AddTrustedInterface(ctx, iface); err != nil { return fmt.Errorf("add %s to firewalld trusted zone: %w", iface, err) } log.Infof("added %s to firewalld trusted zone", iface) return nil } // Untrust is the inverse of Trust. func Untrust(iface string) error { /* symmetric */ }Callers collapse to a single line, e.g.:
-if err := m.trustInFirewalld(); err != nil { +if err := firewalld.Trust(m.wgIface.Name()); err != nil { log.Warnf("failed to trust interface in firewalld: %v", err) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/firewall/firewalld/firewalld_linux.go` around lines 30 - 67, The package has duplicated caller-side boilerplate that creates a 3s timeout context, checks IsRunning, calls AddTrustedInterface/RemoveTrustedInterface and logs; centralize this by adding package-level helpers (e.g., a CallTimeout constant and two functions Trust(iface string) error and Untrust(iface string) error) that create an internal context.WithTimeout(context.Background(), CallTimeout), call IsRunning(ctx) and then AddTrustedInterface/RemoveTrustedInterface, wrap and return errors with context, and emit the shared log messages; update callers to invoke firewalld.Trust/Untrust instead of repeating the timeout+IsRunning+call pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@client/firewall/firewalld/firewalld_linux.go`:
- Around line 30-67: The package has duplicated caller-side boilerplate that
creates a 3s timeout context, checks IsRunning, calls
AddTrustedInterface/RemoveTrustedInterface and logs; centralize this by adding
package-level helpers (e.g., a CallTimeout constant and two functions
Trust(iface string) error and Untrust(iface string) error) that create an
internal context.WithTimeout(context.Background(), CallTimeout), call
IsRunning(ctx) and then AddTrustedInterface/RemoveTrustedInterface, wrap and
return errors with context, and emit the shared log messages; update callers to
invoke firewalld.Trust/Untrust instead of repeating the timeout+IsRunning+call
pattern.
In `@client/firewall/iptables/manager_linux.go`:
- Around line 183-212: The Close method intentionally appends the error from
m.untrustInFirewalld() to merr so that DeleteState on stateManager is skipped
when untrust fails (preserving ShutdownState for crash-recovery); add a short
clarifying comment in Manager.Close near the m.untrustInFirewalld() append
(and/or above the "attempt to delete state only if all other operations
succeeded" block) referencing m.untrustInFirewalld, the merr variable, and
stateManager.DeleteState to explain this deliberate difference from
Init/AllowNetbird which only warn.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 89fd9754-f845-4869-981b-4ecdfc9b6d64
📒 Files selected for processing (6)
client/firewall/firewalld/firewalld.goclient/firewall/firewalld/firewalld_linux.goclient/firewall/firewalld/firewalld_other.goclient/firewall/iptables/manager_linux.goclient/firewall/nftables/manager_linux.goclient/firewall/nftables/router_linux.go
There was a problem hiding this comment.
🧹 Nitpick comments (2)
client/firewall/firewalld/firewalld_linux.go (2)
95-121: Minor: scoperetinside theZONE_ALREADY_SETbranch.
retis only consumed bymove.Store(&ret)in the migration path; declaring it at the top ofaddDBusis slightly misleading (impliesaddInterfaceresult is being captured too). Declaring it next tomove.Storemakes the intent clearer and keeps the success/early-return paths cleaner. Purely cosmetic.♻️ Suggested change
- var ret string call := obj.CallWithContext(ctx, dbusZoneIface+".addInterface", 0, TrustedZone, iface) if call.Err == nil { return nil } if dbusErrContains(call.Err, errAlreadyEnabled) { return nil } if dbusErrContains(call.Err, errZoneAlreadySet) { move := obj.CallWithContext(ctx, dbusZoneIface+".changeZoneOfInterface", 0, TrustedZone, iface) if move.Err == nil { + var ret string return move.Store(&ret) } return fmt.Errorf("firewalld changeZoneOfInterface: %w", move.Err) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/firewall/firewalld/firewalld_linux.go` around lines 95 - 121, The local variable ret is declared at the top of addDBus but only used when handling the dbusZoneIface+".changeZoneOfInterface" branch; move the declaration of ret into that branch immediately before calling move.Store(&ret) to limit its scope and clarify intent (update references in the dbusZoneIface+".changeZoneOfInterface" handling where move and move.Store are used within addDBus).
142-154: Consider toleratingALREADY_ENABLEDoutput inaddCLIfor parity withremoveCLI.
firewall-cmd --change-interfacegenerally moves/binds the interface and returns 0 even when it's already bound to the target zone, so this is usually fine in practice. However, if some firewalld versions/states emit a non-zero exit with anALREADY_ENABLED-style message, the DBus path tolerates it while the CLI fallback surfaces an error. For symmetry withremoveCLI(which already filtersUNKNOWN_INTERFACE/NOT_ENABLED), consider the same non-error mapping here.♻️ Optional
out, err := exec.CommandContext(ctx, "firewall-cmd", "--zone="+TrustedZone, "--change-interface="+iface, ).CombinedOutput() if err != nil { - return fmt.Errorf("firewall-cmd change-interface: %w: %s", err, strings.TrimSpace(string(out))) + msg := strings.TrimSpace(string(out)) + if strings.Contains(msg, errAlreadyEnabled) { + return nil + } + return fmt.Errorf("firewall-cmd change-interface: %w: %s", err, msg) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/firewall/firewalld/firewalld_linux.go` around lines 142 - 154, The addCLI function currently treats any non-zero exit from firewall-cmd as an error; update addCLI to tolerate ALREADY_ENABLED output like removeCLI does by checking the CombinedOutput string for "ALREADY_ENABLED" (or equivalent casing/whitespace) after exec.CommandContext("firewall-cmd", "--zone="+TrustedZone, "--change-interface="+iface). If the output contains ALREADY_ENABLED, return nil instead of an error; otherwise keep the existing fmt.Errorf behavior. Keep the existing exec.LookPath("firewall-cmd") check and mirror the same string-filtering approach used in removeCLI for symmetry.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@client/firewall/firewalld/firewalld_linux.go`:
- Around line 95-121: The local variable ret is declared at the top of addDBus
but only used when handling the dbusZoneIface+".changeZoneOfInterface" branch;
move the declaration of ret into that branch immediately before calling
move.Store(&ret) to limit its scope and clarify intent (update references in the
dbusZoneIface+".changeZoneOfInterface" handling where move and move.Store are
used within addDBus).
- Around line 142-154: The addCLI function currently treats any non-zero exit
from firewall-cmd as an error; update addCLI to tolerate ALREADY_ENABLED output
like removeCLI does by checking the CombinedOutput string for "ALREADY_ENABLED"
(or equivalent casing/whitespace) after exec.CommandContext("firewall-cmd",
"--zone="+TrustedZone, "--change-interface="+iface). If the output contains
ALREADY_ENABLED, return nil instead of an error; otherwise keep the existing
fmt.Errorf behavior. Keep the existing exec.LookPath("firewall-cmd") check and
mirror the same string-filtering approach used in removeCLI for symmetry.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ae21212d-eaac-41b7-aa4b-e04fa394cea8
📒 Files selected for processing (1)
client/firewall/firewalld/firewalld_linux.go
e95d703 to
45d682f
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@client/firewall/firewalld/firewalld_linux.go`:
- Around line 47-55: The timeout context currently created once in
TrustInterface (ctx from context.WithTimeout using callTimeout) is reused for
isRunning and subsequent DBus/CLI calls, which can leave addCLI/removeCLI
running with an expired context; change the code so that isRunning gets its own
short timeout and each DBus or CLI attempt (addTrusted, addCLI, removeCLI, and
their counterparts in the other affected blocks) creates its own context via
context.WithTimeout(callTimeout) and defers cancel immediately before the call
so every probe/mutation gets a fresh deadline; update TrustInterface and the
similar blocks (functions/methods addTrusted, addCLI, removeCLI and the code
paths referenced at lines 66-74 and 79-109) to follow this pattern.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f741b3c6-725d-4d58-86f9-73753b12a803
📒 Files selected for processing (6)
client/firewall/firewalld/firewalld_linux.goclient/firewall/firewalld/firewalld_linux_test.goclient/firewall/firewalld/firewalld_other.goclient/firewall/iptables/manager_linux.goclient/firewall/nftables/manager_linux.goclient/firewall/nftables/router_linux.go
🚧 Files skipped from review as they are similar to previous changes (4)
- client/firewall/firewalld/firewalld_other.go
- client/firewall/iptables/manager_linux.go
- client/firewall/nftables/router_linux.go
- client/firewall/nftables/manager_linux.go
28c1739 to
ddc0ada
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@client/firewall/iptables/manager_linux.go`:
- Around line 81-87: The code trusts the WG interface via
firewalld.TrustInterface(m.wgIface.Name()) before performing later fatal init
steps (e.g., m.aclMgr.init(stateManager)), so if a subsequent init fails the
interface remains trusted; update Init() to either move the TrustInterface call
until after all fatal initializations succeed or add a rollback on error: if
m.aclMgr.init(...) (or any subsequent fatal step) returns an error, call the
complementary firewalld untrust/untust method (e.g., firewalld.UntrustInterface
or similar) with m.wgIface.Name() before returning, and log any untrust
failures; ensure the rollback is applied for every early return path so the
interface is not left trusted on failed startup.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4a9bf26b-fe88-4f0f-b7b4-d1eb89cdcb70
📒 Files selected for processing (9)
client/firewall/firewalld/firewalld_linux.goclient/firewall/firewalld/firewalld_linux_test.goclient/firewall/firewalld/firewalld_other.goclient/firewall/iptables/manager_linux.goclient/firewall/nftables/manager_linux.goclient/firewall/nftables/router_linux.goclient/firewall/uspfilter/allow_netbird.goclient/firewall/uspfilter/common/iface.goclient/firewall/uspfilter/filter_test.go
✅ Files skipped from review due to trivial changes (1)
- client/firewall/firewalld/firewalld_other.go
🚧 Files skipped from review as they are similar to previous changes (4)
- client/firewall/nftables/manager_linux.go
- client/firewall/nftables/router_linux.go
- client/firewall/firewalld/firewalld_linux_test.go
- client/firewall/firewalld/firewalld_linux.go
7cb5aaa to
28fde7c
Compare
28fde7c to
9327582
Compare
|
There was a problem hiding this comment.
🧹 Nitpick comments (2)
client/firewall/firewalld/firewalld_linux.go (2)
35-61: Process-global parent context may surprise concurrent callers.
parentCtxis process-global, so the last caller toSetParentContextwins for every subsequentTrustInterface. That is fine in the current single-engine client, but if this package is ever used from tests running engines in parallel, or a future second engine instance, cancellation scope could leak across instances. Also,trustLogOnceis package-scoped, so an engine restart within the same process will silence the "added to trusted zone" Info log on the second run.Consider scoping state via a small struct (e.g.
type Client struct { parent context.Context; logOnce sync.Once }) and returning a handle from a constructor, keeping the current free functions as thin wrappers over a package-default instance for compatibility. Not a blocker.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/firewall/firewalld/firewalld_linux.go` around lines 35 - 61, The code currently uses process-global state (parentCtx, parentCtxMu, trustLogOnce) which can leak cancellation/logging across concurrent engine instances; refactor by introducing a Client struct (e.g. type Client struct { parent context.Context; parentMu sync.RWMutex; logOnce sync.Once }) with a constructor NewClient(ctx context.Context) and make SetParentContext / getParentContext / trustLogOnce usages methods on *Client (SetParentContext -> (*Client).SetParentContext, getParentContext -> (*Client).getParentContext) while keeping package-level free functions as thin wrappers that forward to a package-default Client for backward compatibility; update TrustInterface/UntrustInterface and any references to trustLogOnce to use the instance's logOnce so state is scoped per Client instance.
145-157:isRunningsilently returns false for non-sentinel DBus errors; consider a debug log.Only
errDBusUnavailable(fromdbus.SystemBus()failures at Line 148) andcontext.DeadlineExceededtrigger thefirewall-cmdfallback. Any other DBus error fromgetDefaultZone— e.g.org.freedesktop.DBus.Error.ServiceUnknownwhen firewalld isn't registered,AccessDeniedfrom polkit, or wrapped peer errors that don't expose the deadline — causesisRunningto returnfalsewith no diagnostic. That's the correct outcome for "firewalld not running", but can make field debugging confusing (trust/untrust appear to silently no-op even though DBus was reachable).Consider a single
log.Debugfat Line 112 before thereturn falseso operators have a breadcrumb when something is odd with the firewalld DBus service.🪵 Suggested debug log
if errors.Is(err, errDBusUnavailable) || errors.Is(err, context.DeadlineExceeded) { ctx, cancel = newCallContext(parent) defer cancel() return isRunningCLI(ctx) } + log.Debugf("firewalld: dbus probe returned non-fallback error, treating as not running: %v", err) return false }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/firewall/firewalld/firewalld_linux.go` around lines 145 - 157, The isRunningDBus function currently swallows non-sentinel DBus errors when obj.CallWithContext(ctx, dbusRootIface+".getDefaultZone", 0).Store(&zone) fails; add a debug log there to record the error before returning false so operators can see DBus-level failures. Specifically, in isRunningDBus, just before the existing "return false, fmt.Errorf(...)" for the getDefaultZone call, call the package logger's Debugf (or log.Debugf) to include the error value and a short context message mentioning getDefaultZone and dbusDest/dbusPath; keep the existing error return behavior unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@client/firewall/firewalld/firewalld_linux.go`:
- Around line 35-61: The code currently uses process-global state (parentCtx,
parentCtxMu, trustLogOnce) which can leak cancellation/logging across concurrent
engine instances; refactor by introducing a Client struct (e.g. type Client
struct { parent context.Context; parentMu sync.RWMutex; logOnce sync.Once })
with a constructor NewClient(ctx context.Context) and make SetParentContext /
getParentContext / trustLogOnce usages methods on *Client (SetParentContext ->
(*Client).SetParentContext, getParentContext -> (*Client).getParentContext)
while keeping package-level free functions as thin wrappers that forward to a
package-default Client for backward compatibility; update
TrustInterface/UntrustInterface and any references to trustLogOnce to use the
instance's logOnce so state is scoped per Client instance.
- Around line 145-157: The isRunningDBus function currently swallows
non-sentinel DBus errors when obj.CallWithContext(ctx,
dbusRootIface+".getDefaultZone", 0).Store(&zone) fails; add a debug log there to
record the error before returning false so operators can see DBus-level
failures. Specifically, in isRunningDBus, just before the existing "return
false, fmt.Errorf(...)" for the getDefaultZone call, call the package logger's
Debugf (or log.Debugf) to include the error value and a short context message
mentioning getDefaultZone and dbusDest/dbusPath; keep the existing error return
behavior unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cb06b6c7-e46a-4ec3-9c30-ad701993130f
📒 Files selected for processing (11)
client/firewall/firewalld/firewalld.goclient/firewall/firewalld/firewalld_linux.goclient/firewall/firewalld/firewalld_linux_test.goclient/firewall/firewalld/firewalld_other.goclient/firewall/iptables/manager_linux.goclient/firewall/nftables/manager_linux.goclient/firewall/nftables/router_linux.goclient/firewall/uspfilter/allow_netbird.goclient/firewall/uspfilter/common/iface.goclient/firewall/uspfilter/filter_test.goclient/internal/engine.go
✅ Files skipped from review due to trivial changes (2)
- client/firewall/firewalld/firewalld.go
- client/internal/engine.go
🚧 Files skipped from review as they are similar to previous changes (7)
- client/firewall/uspfilter/common/iface.go
- client/firewall/iptables/manager_linux.go
- client/firewall/nftables/manager_linux.go
- client/firewall/nftables/router_linux.go
- client/firewall/firewalld/firewalld_linux_test.go
- client/firewall/uspfilter/allow_netbird.go
- client/firewall/uspfilter/filter_test.go



Describe your changes
When firewalld is running on recent versions it creates its nftables chains with the
NFT_CHAIN_OWNERflag, which makes any attempt by another process to insert rules into them return EPERM. NetBird currently tries to insert accept rules into firewalld-ownedfilter_FORWARD/filter_INPUTchains and logs a failure. This change delegates acceptance to firewalld itself by adding the wg interface to firewalld'strustedzone, matching what Tailscale does.client/firewall/firewalldpackage that adds/removes an interface from thetrustedzone via DBus, with afirewall-cmdfallbackAllowNetbirdso userspace-firewall mode is covered toofirewalldfrom the "add accept rules to external chains" walk so we stop hitting EPERMShutdownStatecrash-recovery pathIssue ticket number and link
Stack
Checklist
Documentation
Select exactly one:
Internal compatibility shim, no user-facing configuration or behavior change beyond the interface being placed in firewalld's trusted zone while NetBird is running.
Docs PR URL (required if "docs added" is checked)
Paste the PR link from https://github.com/netbirdio/docs here:
https://github.com/netbirdio/docs/pull/__
Summary by CodeRabbit
New Features
Tests