[client] Fix stale entries in nftables with no handle #5272
📝 Walkthrough
This change makes the nftables router more resilient: it treats stale rules (Handle == 0, meaning the kernel never assigned a handle) as non-fatal, prunes them from in-memory state, introduces rollbackRules for cleanup on failed flushes, aggregates deletion errors with multierror, and updates refreshRulesMap to rebuild the rule map from kernel state.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
client/firewall/nftables/router_linux.go (1)
993-1012: ⚠️ Potential issue | 🟡 Minor
RemoveAllLegacyRouteRules does not handle Handle == 0 entries.
All other delete paths (DeleteRouteRule, removeNatRule, removeLegacyRouteRule, DeleteDNATRule, RemoveInboundDNAT) now guard against stale entries with Handle == 0. However, RemoveAllLegacyRouteRules at line 1004 calls r.conn.DelRule(rule) directly without checking rule.Handle == 0, which would fail for stale entries just like the original bug this PR addresses.
Proposed fix
```diff
 	for k, rule := range r.rules {
 		if !strings.HasPrefix(k, firewall.ForwardingFormatPrefix) {
 			continue
 		}
+		if rule.Handle == 0 {
+			log.Warnf("legacy forwarding rule %s has no handle, removing stale entry", k)
+			delete(r.rules, k)
+			continue
+		}
 		if err := r.conn.DelRule(rule); err != nil {
 			merr = multierror.Append(merr, fmt.Errorf("remove legacy forwarding rule: %v", err))
 		} else {
 			delete(r.rules, k)
 		}
```
6a53884 to ca101f8


Describe your changes
Summary
Problem
When conn.Flush() fails in AddNatRule, rules are stored in the map with Handle == 0 (the kernel never assigned handles). The old refreshRulesMap() merged kernel rules into the existing map, so these stale entries persisted indefinitely. Any subsequent delete operation would fail on Handle == 0, which blocked further operations such as re-adding the same NAT rule.
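The difference between merging and rebuilding can be sketched as follows. This is an illustrative model only: `rule`, `mergeRefresh`, and `rebuildRefresh` are hypothetical names, and rules are keyed by a plain string ID rather than by whatever key the router actually uses.

```go
package main

import "fmt"

// rule is a hypothetical stand-in for a kernel rule; only the fields needed
// for this sketch are modeled.
type rule struct {
	ID     string // key under which the rule is tracked
	Handle uint64 // 0 means the kernel never assigned a handle
}

// mergeRefresh mimics the old behavior described above: kernel rules are
// merged into the existing map, so entries the kernel does not know about
// survive every refresh.
func mergeRefresh(existing map[string]rule, kernel []rule) map[string]rule {
	for _, r := range kernel {
		existing[r.ID] = r
	}
	return existing
}

// rebuildRefresh mimics the new behavior: the map is rebuilt from kernel
// state alone, so entries without a kernel counterpart disappear.
func rebuildRefresh(kernel []rule) map[string]rule {
	fresh := make(map[string]rule, len(kernel))
	for _, r := range kernel {
		fresh[r.ID] = r
	}
	return fresh
}

func main() {
	kernel := []rule{{ID: "nat-a", Handle: 7}}
	// "nat-b" was stored after a failed flush and never got a handle.
	tracked := map[string]rule{
		"nat-a": {ID: "nat-a", Handle: 7},
		"nat-b": {ID: "nat-b"},
	}

	_, stale := mergeRefresh(tracked, kernel)["nat-b"]
	fmt.Println("merge keeps stale entry:", stale)

	_, stale = rebuildRefresh(kernel)["nat-b"]
	fmt.Println("rebuild keeps stale entry:", stale)
}
```

Rebuilding treats the kernel as the single source of truth, which is why a handle-less entry cannot outlive the next refresh.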